2025 11 21

GCP – Four agentic workflows you can build for life sciences for R&D

AI agents, powered by generative AI, are rapidly transforming industries by acting as intelligent, collaborative partners that can interpret goals, plan multi-step actions, and work independently across systems. This marks a significant shift in how businesses find, understand, and act on their data. Our recent blog post outlines how AI agents are transforming several industries.

Below we describe how to create a modular, end-to-end platform that accelerates the discovery and preclinical optimization of novel therapeutic candidates through a multi-agentic system. The system is designed to move from a high-level disease concept to a set of lead candidates with a high probability of success, regardless of the specific disease or therapeutic modality.

We see a few key roles for specialized AI agents. Several of them are built on specialized open-weight models from Google (MedGemma and TxGemma). Because these models are open-weight, they can be fine-tuned and further trained for even more specialized purposes, which leaves a lot of room to build powerful agents.

Four agents you can build for life sciences

1. MedGemma: “The strategic intelligence agent”

- Expertise: Deep comprehension and synthesis of unstructured biomedical text, medical imaging, clinical data, and scientific literature.
- Function: Acts as a specialized knowledge agent. When directed by the Cognitive Orchestrator, it executes deep search and synthesis across biomedical corpora (e.g., PubMed, patient text records, and other modalities such as chest X-rays) to extract findings, build cohorts, and summarize knowledge. MedGemma is especially useful for use cases that require strict version control (e.g., regulated products), lower inference costs, or substantial adaptation to a specific task. Its speed and cost efficiency also make it well suited to high-volume medical use cases, which are common in life sciences.

2. TxGemma: “The preclinical analyst”

- Expertise: Predicting functional and safety properties of therapeutic molecules.
- Function: Predicts preclinical properties of drug candidates in silico, such as pharmacokinetics, permeability, toxicity, or efficacy.

3. Gemini 2.5 Pro: “The cognitive orchestrator agent”

- Expertise: Advanced multi-step reasoning, dynamic planning, and contextual understanding to manage the end-to-end drug discovery workflow.
- Function: Directs the specialized AI agents by interpreting high-level goals, sequencing tasks, evaluating results, and dynamically adapting the workflow to help scientists achieve the final therapeutic objective. The orchestrator also has access to various tools. A tool can be a complete, specialized agent (like MedGemma) or a specific model endpoint (like AlphaFold), and each tool is given a clear, natural-language description of its function. For example, the MedGemma tool might be described as: “A tool that searches and synthesizes biomedical literature to identify potential disease targets based on a given pathology.”
- Note: For use cases that need a version-locked model and change control, users can instead use the open-weight Gemma models for this orchestration.
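The orchestrator’s tool list can be pictured as a small registry in which every tool pairs a natural-language description with something callable. The following is a minimal Python sketch, not the Gemini or Vertex AI function-calling API: the tool names, the stub functions, and the `manifest()` text format are all hypothetical, and the step where the orchestrator model chooses a tool is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tool:
    name: str
    description: str  # natural-language description the orchestrator model reads
    run: Callable[[str], str]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self._tools[tool.name] = tool

    def manifest(self) -> str:
        # One line per tool; this is the text that would be shown to the orchestrator.
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def dispatch(self, name: str, argument: str) -> str:
        # Invoke the tool the orchestrator selected.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(argument)


# Hypothetical stubs; a real deployment would call the MedGemma agent or an AlphaFold endpoint.
def literature_search_stub(pathology: str) -> str:
    return f"candidate targets for {pathology}: (stub result)"


def structure_prediction_stub(sequence: str) -> str:
    return f"predicted structure for sequence of length {len(sequence)} (stub result)"


registry = ToolRegistry()
registry.register(Tool(
    name="medgemma_literature",
    description="A tool that searches and synthesizes biomedical literature to identify "
                "potential disease targets based on a given pathology.",
    run=literature_search_stub,
))
registry.register(Tool(
    name="alphafold_structure",
    description="A tool that predicts the 3D structure of a protein from its amino acid sequence.",
    run=structure_prediction_stub,
))

if __name__ == "__main__":
    print(registry.manifest())
    print(registry.dispatch("medgemma_literature", "Parkinson's disease"))
```

Keeping descriptions next to the callables means the text the model reasons over and the code that actually runs cannot drift apart.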

4. AlphaFold-2 & molecular docking tools: “The molecular architect”

- Expertise: Predicting the precise 3D structure of molecular targets and simulating how candidate molecules physically interact (dock) with them.
- Function: Creates the essential structural blueprint of the drug-target interaction, enabling structure-based design, virtual screening, and specificity analysis.

Here’s the step-by-step process

Phase 1: Find the target

A scientist prompts the system (e.g., “Find novel targets for Parkinson’s”). The MedGemma agent (“Strategic Intelligence Agent”) rapidly scans millions of publications and clinical records to identify promising biological targets. The Orchestrator delivers a concise report, and the scientist approves the final target.
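Because the scientist, not the system, approves the target, it can help to make that hand-off an explicit gate in code. A minimal sketch, assuming exactly one approved target is required before Phase 2 starts; `ApprovalGate`, `TargetProposal`, and the placeholder target names are hypothetical, and recording the reviewer with a UTC timestamp is one possible way to keep an audit trail, not a prescribed one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TargetProposal:
    target: str
    rationale: str
    approved_by: Optional[str] = None
    approved_at: Optional[datetime] = None


@dataclass
class ApprovalGate:
    proposals: List[TargetProposal] = field(default_factory=list)

    def propose(self, target: str, rationale: str) -> None:
        # Called with the targets the Orchestrator's report recommends.
        self.proposals.append(TargetProposal(target, rationale))

    def approve(self, target: str, reviewer: str) -> TargetProposal:
        # Record who approved which target, and when, for later review.
        for proposal in self.proposals:
            if proposal.target == target:
                proposal.approved_by = reviewer
                proposal.approved_at = datetime.now(timezone.utc)
                return proposal
        raise KeyError(f"no proposal for target {target!r}")

    def approved_target(self) -> str:
        # Phase 2 may only start once exactly one target has a human approval.
        approved = [p for p in self.proposals if p.approved_by is not None]
        if len(approved) != 1:
            raise RuntimeError("exactly one target must be approved before Phase 2")
        return approved[0].target


if __name__ == "__main__":
    gate = ApprovalGate()
    # Placeholder names; in practice these would come from the Orchestrator's report.
    gate.propose("TARGET_A", "summary of supporting evidence")
    gate.propose("TARGET_B", "summary of supporting evidence")
    gate.approve("TARGET_A", reviewer="scientist@example.com")
    print(gate.approved_target())  # TARGET_A
```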

Phase 2: Generate candidates

The AlphaFold agent (“Molecular Architect”) builds a 3D model of the target. Then, using molecular docking tools, the Molecular Architect performs virtual screening, testing thousands of potential drug “keys” to see how well they “fit” the target “lock,” and produces a shortlist of candidates.
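The shortlisting step can be sketched in a few lines, assuming each candidate already has a docking score and that, as with many docking programs, a lower (more negative) score means a better predicted fit. The function name, candidate IDs, and score values are made-up placeholders, not results.

```python
import heapq
from typing import Dict, List, Tuple


def shortlist(docking_scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k candidates with the lowest (most favorable) docking scores.

    Assumes lower is better, e.g. an estimated binding energy in kcal/mol.
    """
    if k <= 0:
        return []
    # nsmallest avoids a full sort when k is much smaller than the number of candidates.
    return heapq.nsmallest(k, docking_scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical placeholder scores for illustration only.
    scores = {"cand-001": -7.2, "cand-002": -9.1, "cand-003": -5.4, "cand-004": -8.3}
    print(shortlist(scores, 2))  # [('cand-002', -9.1), ('cand-004', -8.3)]
```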

Phase 3: The “Design-test-refine” loop

This is the core engine for rapidly improving candidates.

- Predict: TxGemma (“Preclinical Analyst”) predicts each candidate’s properties in silico, such as potency and toxicity, as an estimate of its real-world performance.
- Triage: The Orchestrator sorts the candidates into “Promote” (looks excellent), “Archive” (a dead end), or “Optimize” (promising, but flawed).
- Refine: “Optimize” candidates are automatically refined to fix their specific flaws and sent right back into the loop.
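The triage rules can be sketched as a small function, assuming TxGemma’s output for each candidate has been reduced to two numbers: a predicted potency and a predicted probability of toxicity. The property names, the pIC50-style potency scale, and every threshold below are hypothetical; a real program would choose its own endpoints and cut-offs. The `flaws` helper shows one way to tell the Refine step what to fix.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Decision(Enum):
    PROMOTE = "promote"
    OPTIMIZE = "optimize"
    ARCHIVE = "archive"


@dataclass(frozen=True)
class Prediction:
    candidate_id: str
    potency: float        # hypothetical predicted pIC50; higher means more potent
    toxicity_prob: float  # hypothetical predicted probability of toxicity, 0 to 1


# Hypothetical cut-offs, for illustration only.
PROMOTE_MIN_POTENCY = 7.0
ARCHIVE_BELOW_POTENCY = 5.0  # weaker than this: treated as a dead end
PROMOTE_MAX_TOX = 0.2
ARCHIVE_MIN_TOX = 0.7        # at or above this: treated as a dead end


def flaws(p: Prediction) -> List[str]:
    """List the properties that keep a candidate from being promoted."""
    issues = []
    if p.potency < PROMOTE_MIN_POTENCY:
        issues.append("potency")
    if p.toxicity_prob > PROMOTE_MAX_TOX:
        issues.append("toxicity")
    return issues


def triage(p: Prediction) -> Decision:
    # Dead ends first: too weak or too likely to be toxic to be worth refining.
    if p.potency < ARCHIVE_BELOW_POTENCY or p.toxicity_prob >= ARCHIVE_MIN_TOX:
        return Decision.ARCHIVE
    # No remaining flaws: promote.
    if not flaws(p):
        return Decision.PROMOTE
    # Promising, but with specific flaws the Refine step should target.
    return Decision.OPTIMIZE
```

For example, a candidate with a predicted potency of 8.0 and a toxicity probability of 0.4 is neither a dead end nor ready to promote, so it is marked "Optimize" with `["toxicity"]` as the flaw to fix.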

This Design -> Dock -> Predict -> Refine cycle can run thousands of times on Google Cloud’s high-performance computing infrastructure, iterating on drug designs far faster than is possible in a physical lab.
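The cycle can be sketched as a driver loop with each stage passed in as a function, so the same loop works whichever agent or endpoint backs a stage. This is a hypothetical, single-machine sketch: `run_design_loop`, the string candidates, and the toy stage functions in the usage example stand in for calls to AlphaFold, docking tools, TxGemma, and the Orchestrator. `max_rounds` caps how many refinement passes an "Optimize" candidate gets.

```python
from typing import Callable, List, Tuple

Candidate = str  # hypothetical: a real system might use SMILES strings or richer records


def run_design_loop(
    designs: List[Candidate],
    dock: Callable[[Candidate], float],
    predict: Callable[[Candidate, float], float],
    triage: Callable[[float], str],
    refine: Callable[[Candidate], Candidate],
    max_rounds: int = 10,
) -> Tuple[List[Candidate], List[Candidate], List[Candidate]]:
    """Run Dock -> Predict -> Triage -> Refine until nothing is left to optimize.

    Returns (promoted, archived, unresolved); unresolved holds candidates still
    marked "optimize" when max_rounds ran out.
    """
    promoted: List[Candidate] = []
    archived: List[Candidate] = []
    queue = list(designs)
    for _ in range(max_rounds):
        if not queue:
            break
        next_queue: List[Candidate] = []
        for cand in queue:
            score = predict(cand, dock(cand))
            decision = triage(score)
            if decision == "promote":
                promoted.append(cand)
            elif decision == "archive":
                archived.append(cand)
            else:  # "optimize": fix the flaw and send it back into the loop
                next_queue.append(refine(cand))
        queue = next_queue
    return promoted, archived, queue


if __name__ == "__main__":
    # Toy stages: the "score" is the string length, and refining appends a character.
    promoted, archived, unresolved = run_design_loop(
        designs=["A", "BB", "CCCC"],
        dock=lambda c: float(len(c)),
        predict=lambda c, docking_score: docking_score,
        triage=lambda s: "promote" if s >= 4 else "optimize",
        refine=lambda c: c + "*",
    )
    print(promoted, archived, unresolved)  # ['CCCC', 'BB**', 'A***'] [] []
```

Returning unresolved candidates separately, rather than silently archiving them, keeps the round cap from hiding designs a scientist might still want to review.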

Phase 4: Nominate lab-ready leads

After the loop, the Orchestrator presents the human scientist with the final, highly optimized lead candidates. The scientist makes the final selection, and MedGemma re-engages to help design the optimal strategy for real-world lab testing.

By moving the costly “test-and-fail” part of discovery into this rapid, in silico workflow, we can focus lab resources on the candidates with the highest probability of success, creating a faster, more intelligent path to new therapies.

Reference architecture

This diagram shows the foundational services, how data flows between them, and how they work together.

Executing this sophisticated, iterative workflow requires a robust, scalable, and secure cloud platform. Google Cloud provides a comprehensive suite of services that map directly to the needs of each AI agent and the overall workflow, ensuring data integrity, compliance, and computational power.

How to get started using Google Cloud

For the strategic intelligence agent (MedGemma), Vertex AI Search is the core service. It can power a Retrieval-Augmented Generation (RAG) system over a corpus of private biomedical data, such as internal research documents, PubMed literature, and clinical trial data. This lets the agent answer natural language queries and synthesize information with citations.
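Vertex AI Search handles indexing, retrieval, and grounding as a managed service, so none of this needs to be built by hand. To illustrate the idea behind answering "with citations", here is a toy, self-contained stand-in: it ranks documents by bag-of-words cosine similarity (a deliberately simple technique swapped in for illustration, not the ranking Vertex AI Search performs) and builds a prompt whose context passages carry document IDs the model is asked to cite. The function names, prompt wording, and sample corpus are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _bag_of_words(text: str) -> Counter:
    # Lowercase alphanumeric tokens; a real system would use embeddings or a proper tokenizer.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, score) pairs with a nonzero score, best first."""
    q = _bag_of_words(query)
    scored = [(doc_id, _cosine(q, _bag_of_words(text))) for doc_id, text in corpus.items()]
    hits = [hit for hit in scored if hit[1] > 0]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:k]


def grounded_prompt(query: str, corpus: Dict[str, str], k: int = 3) -> str:
    """Build a prompt whose context passages are labeled with citable document IDs."""
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using only the sources below and cite them by their [ID].\n"
        f"{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Placeholder documents standing in for an indexed private corpus.
    corpus = {
        "doc-1": "Internal report on protein aggregation in neurons",
        "doc-2": "Clinical trial summary for a kinase inhibitor",
        "doc-3": "Assay protocol notes for cell viability",
    }
    print(retrieve("protein aggregation neurons", corpus))
    print(grounded_prompt("protein aggregation neurons", corpus))
```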
For the molecular architect agent, the core service is Vertex AI. Google Cloud offers managed, optimized AlphaFold environments and integrations, and for high-throughput needs, Vertex AI Training with GPU or TPU acceleration can run thousands of protein folding and docking simulations in parallel. To build the agents themselves, use Vertex AI Agent Builder.
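On Google Cloud the heavy computation would run as Vertex AI Training jobs; the client-side pattern for fanning work out and collecting results, including failures, can be sketched locally with Python’s standard `concurrent.futures`. `run_in_parallel` and `fake_simulation` are hypothetical. The sketch assumes each call would submit and then wait on a remote folding or docking job, which is why it uses a thread pool (well suited to waiting on I/O) rather than a process pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def run_in_parallel(
    candidates: List[str],
    job: Callable[[str], float],
    max_workers: int = 8,
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Run job(candidate) concurrently; return (results, failures) keyed by candidate."""
    results: Dict[str, float] = {}
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_candidate = {pool.submit(job, c): c for c in candidates}
        for future in as_completed(future_to_candidate):
            candidate = future_to_candidate[future]
            try:
                results[candidate] = future.result()
            except Exception as exc:  # keep going; record the failure for retry or review
                failures[candidate] = repr(exc)
    return results, failures


def fake_simulation(candidate: str) -> float:
    """Hypothetical stand-in for a remote folding or docking job."""
    if candidate == "bad-input":
        raise ValueError("simulation failed")
    return float(len(candidate))


if __name__ == "__main__":
    results, failures = run_in_parallel(["cand-1", "cand-22", "bad-input"], fake_simulation)
    print(sorted(results.items()))  # [('cand-1', 6.0), ('cand-22', 7.0)]
    print(failures)
```

Collecting failures instead of letting the first exception abort the batch matters at this scale: one failed simulation out of thousands should not discard the rest.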

We would like to thank Ryan Ye Min Thein (Customer Engineer, Google Cloud) and Justin Chen (Clinician Specialist, Google Health) for their contributions.
