GCP – Accelerating AI in healthcare using NVIDIA BioNeMo Framework and Blueprints on GKE
The quest to develop new medical treatments has historically been a slow, arduous process, screening billions of molecular compounds across decade-long development cycles. The vast majority of therapeutic candidates do not even make it out of clinical trials.
Now, AI is poised to dramatically accelerate this timeline.
As part of our wide-ranging, cross-industry collaboration, NVIDIA and Google Cloud have supported the development of generative AI applications and platforms. NVIDIA BioNeMo is a powerful open-source collection of models specifically tuned to the needs of medical and pharmaceutical researchers.
Medical and biopharma organizations of all sizes are looking closely at predictive modeling and AI foundation models to help disrupt this space. With AI, they’re working to accelerate the identification and optimization of potential drug candidates, significantly shortening development timelines and addressing unmet medical needs. This marks a turning point for analyzing DNA, RNA, protein sequences, and chemical compounds; predicting molecular interactions; and designing novel therapeutics at scale.
With BioNeMo, companies in this space gain a more data-driven approach to developing medicines while reducing reliance on time-consuming experimental methods. But these breakthroughs are not without their own challenges. The shift to generative medicine requires a robust tech stack, including: powerful infrastructure to build, scale, and customize models; efficient resource utilization; agility for faster iteration; fault tolerance; and orchestration of distributed workloads.
Google Kubernetes Engine (GKE) offers a powerful foundation for running many of these demanding workloads, and when paired with NVIDIA BioNeMo, it can accelerate work across the platform. With BioNeMo running on GKE, organizations can pursue medical breakthroughs and new research with unprecedented speed and effectiveness.
In this blog, we’ll show you how to build and customize models and launch reference blueprints using the NVIDIA BioNeMo platform on GKE.
NVIDIA’s BioNeMo platform on GKE
NVIDIA BioNeMo is a generative AI framework that enables researchers to model and simulate biological sequences and structures. It places major demands on infrastructure: powerful GPUs for compute, scalable systems for handling large datasets and complex models, and robust managed services for storage, networking, and security.
GKE offers a highly scalable and flexible platform ideal for AI and machine learning — and particularly the demanding workloads found in biopharma research and development. GKE’s autoscaling features ensure efficient resource utilization, while its integration with other Google Cloud services simplifies the AI workflow.
NVIDIA’s BioNeMo platform offers two synergistic components:
1. BioNeMo Framework: Large-Scale Training Platform for Drug Discovery AI
A scalable, open-source training system for biomolecular AI models like ESM-2 and Evo2. It provides an optimized environment for training and fine-tuning biomolecular AI models. Built on NVIDIA NeMo and PyTorch Lightning, it offers:
- Domain-specific optimization: Provides performant biomolecular AI architectures that can be scaled to billions of parameters (e.g., BERT, Striped Hyena), along with representative model examples (e.g., ESM-2, Geneformer) built with CUDA-accelerated tooling tailored for drug discovery workflows.
- GPU-accelerated performance: Delivers industry-leading speed through native integration with NVIDIA GPUs at scale, reducing training time for large language models and predictive models.
- Comprehensive open-source resources: Includes programming tools, libraries, prepackaged datasets, and detailed documentation to support researchers and developers in deploying biomolecular AI solutions.
Explore the preprint here for details.
2. BioNeMo Blueprints: Production-Ready Workflows for Drug Discovery
BioNeMo Blueprints provide ready-to-use reference workflows for tasks such as protein binder design, virtual screening, and molecular docking. These workflows integrate advanced AI models like AlphaFold2, DiffDock 2.0, RFdiffusion, MolMIM, and ProteinMPNN to accelerate drug discovery processes. The blueprints capture solution patterns identified across several other industry use cases. Scientific developers can try NVIDIA inference microservices (NIMs) at build.nvidia.com and test them via an NVIDIA developer license.
The following graphic shows the components and features of GKE used by the BioNeMo platform. In this blog, we demonstrate how to deploy these components on GKE, combining NVIDIA’s domain-specific AI tools with Google Cloud’s managed Kubernetes infrastructure for:
- Distributed pretraining and finetuning of models across NVIDIA GPU clusters
- Blueprint-driven workflows using NIMs
- Cost-optimized scaling via GKE’s dynamic node pools and preemptible VMs
Figure 1: NVIDIA BioNeMo Framework and BioNeMo Blueprints on GKE
Solution Architecture of BioNeMo framework
Here, we will walk through setting up the BioNeMo Framework on GKE to perform ESM-2 pretraining and fine-tuning.
Figure 2: BioNeMo framework on GKE
The above diagram shows an architectural overview of deploying the NVIDIA BioNeMo Framework on GKE for AI model pre-training, fine-tuning, and inferencing. Here’s a breakdown from an architectural perspective:
- GKE: The core orchestration platform, with the control plane managing the deployment and scaling of the BioNeMo Framework. This is deployed as a regional cluster, and can optionally be configured as a zonal cluster.
- Node Pool: A group of worker nodes within the GKE cluster, specifically configured with NVIDIA GPUs for accelerated AI workloads.
- Nodes: Individual machines within the node pool, equipped with NVIDIA GPUs.
- NVIDIA BioNeMo Framework: The AI software platform running within GKE, enabling pre-training, fine-tuning, and inferencing of AI models.
Networking:
- Virtual Private Cloud (VPC): A logically isolated network within GCP, ensuring secure communication between resources.
- Load Balancer: Distributes incoming traffic to the BioNeMo services running in the GKE cluster, enhancing availability and scalability.
Storage:
- Filestore (NFS): Provides high-performance network file storage for datasets and model checkpoints.
- Cloud Storage: Object storage for datasets and other large files.
NVIDIA NGC Image Registry: Provides container images for BioNeMo and related software, ensuring consistent and optimized deployments.
Steps
We have published an example to pre-train, fine-tune, and infer an ESM-2 model using BioNeMo Framework on GKE in Pretraining and Fine-tuning ESM-2 LLM on GKE using BioNeMo Framework 2.0 GitHub repo. Here is an outline of the steps for pretraining:
1. Create a GKE cluster
```
gcloud container clusters create "gke-bionemo-esm2" \
  --num-nodes="1" \
  --location="<GCP region / zone>" \
  --machine-type="e2-standard-2" \
  --addons=GcpFilestoreCsiDriver
```
2. Add node pool with NVIDIA GPUs
```
gcloud container node-pools create "gke-bionemo-esm2-np" \
  --cluster="gke-bionemo-esm2" \
  --location="<GCP region / zone>" \
  --node-locations="<GCP region / zone>" \
  --num-nodes="1" \
  --machine-type="g2-standard-2" \
  --accelerator="type=nvidia-l4,count=1,gpu-driver-version=LATEST" \
  --placement-type="COMPACT" \
  --disk-type="pd-ssd" \
  --disk-size="300GB"
```
3. Mount Google Cloud Filestore across all the nodes
```
kubectl apply -f create-mount-fs.yaml
```
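For context, a Filestore mount shared across all nodes is typically expressed as a PersistentVolumeClaim against the Filestore CSI driver's `ReadWriteMany` storage class. The following is an illustrative sketch only, not the repo's actual create-mount-fs.yaml; the resource name and capacity are assumptions:

```yaml
# Illustrative sketch -- see the GitHub repo for the actual create-mount-fs.yaml.
# Assumes the GcpFilestoreCsiDriver addon was enabled at cluster creation.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bionemo-filestore-pvc    # hypothetical name
spec:
  accessModes:
    - ReadWriteMany              # NFS share mountable by every node in the pool
  storageClassName: standard-rwx # GKE's Filestore-backed storage class
  resources:
    requests:
      storage: 1Ti               # Filestore basic-tier minimum capacity
```

Because the claim is `ReadWriteMany`, training pods on any node can mount the same volume for datasets and checkpoints.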
4. Run the pretraining job
```
kubectl apply -f esm2-pretraining.yaml
```
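Conceptually, the pretraining manifest is a Kubernetes Job that runs the BioNeMo Framework container on a GPU node and writes checkpoints to the shared Filestore volume. The sketch below is an illustration, not the repo's actual esm2-pretraining.yaml; the image tag, entrypoint, and names are assumptions:

```yaml
# Illustrative sketch -- see the GitHub repo for the actual esm2-pretraining.yaml.
apiVersion: batch/v1
kind: Job
metadata:
  name: esm2-pretraining                    # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: bionemo
          # Real NGC repository; the exact tag is an assumption
          image: nvcr.io/nvidia/clara/bionemo-framework:2.0
          # Entrypoint shown is an assumption; the repo defines the real command
          command: ["python", "-m", "bionemo.esm2.scripts.train_esm2"]
          resources:
            limits:
              nvidia.com/gpu: 1             # one L4 GPU from the node pool above
          volumeMounts:
            - name: workspace
              mountPath: /workspace/data    # datasets and checkpoints
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: bionemo-filestore-pvc  # hypothetical; use the PVC name from your create-mount-fs.yaml
```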
5. Visualize results in TensorBoard
```
kubectl port-forward pod/<pod-bionemo> 8000:6006
```
Open a web browser pointing to http://localhost:8000/#timeseries to see the loss curves. The details for fine-tuning and inference are laid out in the GitHub repo.
Solution Architecture of BioNeMo Blueprints
The graphic below shows a BioNeMo Blueprint deployed on GKE for inferencing. From an infrastructure standpoint, the components used across the compute, networking, and storage layers are similar to Figure 2:
- NIMs are packaged as a unit with the runtime and model-specific weights. Blueprints deploy one or more NIMs using Helm charts. Alternatively, they can be deployed using gcloud or docker commands and configured using kubectl commands. Each NIM needs a minimum of one NVIDIA GPU, accessible through a GKE node pool.
- Three NIMs (AlphaFold2, DiffDock, and MolMIM) are deployed as individual Kubernetes Deployments. Each deployment uses a GPU and a NIM container image, mounting a persistent volume claim for storing model checkpoints and data. Services expose each application on different ports. The number of GPUs can be configured to a higher value for better scalability.
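To make the pattern concrete, a single NIM might be sketched as the Deployment and Service below. This is illustrative only; the image path, tag, cache path, and resource names are assumptions rather than the repo's actual manifests:

```yaml
# Illustrative sketch of one NIM (MolMIM) as a Deployment plus Service;
# image path, ports, and names are assumptions -- see the GitHub repo.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: molmim-nim
spec:
  replicas: 1
  selector:
    matchLabels:
      app: molmim-nim
  template:
    metadata:
      labels:
        app: molmim-nim
    spec:
      containers:
        - name: molmim
          image: nvcr.io/nim/nvidia/molmim:latest  # NGC image path is an assumption
          ports:
            - containerPort: 8000                  # NIM HTTP port
          env:
            - name: NGC_API_KEY                    # pulled from a secret created beforehand
              valueFrom:
                secretKeyRef:
                  name: ngc-api-secret             # hypothetical secret name
                  key: NGC_API_KEY
          resources:
            limits:
              nvidia.com/gpu: 1                    # each NIM needs at least one GPU
          volumeMounts:
            - name: model-cache
              mountPath: /opt/nim/.cache           # checkpoint cache path is an assumption
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: nim-model-cache-pvc         # hypothetical PVC name
---
apiVersion: v1
kind: Service
metadata:
  name: molmim-service
spec:
  selector:
    app: molmim-nim
  ports:
    - port: 8000
      targetPort: 8000
```

The other NIMs follow the same shape, each with its own Deployment, Service port, and GPU allocation.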
Figure 3: NIM Blueprint on GKE
Steps
We have an example of deploying a BioNeMo Blueprint for generative virtual screening in the Generative Virtual Screening for Drug Discovery on GKE GitHub repo. The setup steps (creating the GKE cluster, adding the node pool, and mounting Filestore) are similar to BioNeMo training. The following steps outline deploying the BioNeMo Blueprint and using it for inference:
1. Deploy the BioNeMo blueprint
```
kubectl create -f nim-bionemo-generative-virtual-screening.yaml
```
2. Use port forwarding to interact with the pod
```
kubectl port-forward pod/<molmim-pod> 8010:8000 &
```
3. Test the MolMIM NIM locally using a curl command. The response will contain the generated molecules.
```
curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "smi": "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C",
    "num_molecules": 5,
    "algorithm": "CMA-ES",
    "property_name": "QED",
    "min_similarity": 0.7,
    "iterations": 10
  }' \
  "http://localhost:8010/generate"
```
NVIDIA BioNeMo Blueprints workflows can be adapted to various domain-specific use cases beyond drug discovery. For example, researchers can leverage generative AI models like RFdiffusion and ProteinMPNN in protein engineering to design stable protein binders with high affinity, drastically reducing the experimental iteration cycles.
By integrating modular NIM microservices with scalable platforms like GKE, industries ranging from biopharma to agriculture can deploy AI-driven solutions tailored to their unique challenges, enabling faster insights and more efficient processes at scale.
Conclusion
As we’ve explored in this blog post, GKE provides a robust and versatile platform for deploying and running both the NVIDIA BioNeMo Framework and NVIDIA BioNeMo Blueprints. By leveraging GKE’s scalability, container orchestration capabilities, and integration with Google Cloud’s ecosystem, you can streamline the development and deployment of AI solutions in the life sciences and other domains.
Whether you’re accelerating drug discovery with BioNeMo or deploying generative AI models with NIMs, GKE empowers you to harness the power of AI and drive innovation. By leveraging the strengths of both platforms, you can streamline the deployment process, optimize performance, and scale your AI workloads seamlessly.
Ready to experience the power of NVIDIA BioNeMo on Google Cloud? Get started today by exploring the BioNeMo Framework and NIM catalog, deploying your first generative AI model on GKE, and unlocking new possibilities for your applications.
We’d like to thank the NVIDIA team members who helped contribute to this guide, Juan Pablo Guerra, Solutions Architect, and Kushal Shah, Senior Solutions Architect.