GCP – Google Cloud at GTC: A4 VMs now generally available, A4X VMs in preview
At Google Cloud, we’re thrilled to return to NVIDIA’s GTC AI Conference in San Jose, CA, this March 17-21 with our largest presence ever. The annual conference brings together thousands of developers, innovators, and business leaders to experience how AI and accelerated computing are helping humanity solve its most complex challenges. Join us to discover how to build and deploy AI with optimized training and inference, apply AI with real-world solutions, and experience AI with our interactive demos.
After being the first hyperscaler to make both NVIDIA’s HGX B200 and GB200 NVL72 available to customers with A4 and A4X VMs, we’re pleased to announce that A4 VMs are now generally available, and that A4X VMs are in preview with general availability coming soon.
- A4X VMs: Accelerated by NVIDIA GB200 NVL72 GPUs, A4X VMs are purpose-built for training and serving the most demanding, extra-large-scale AI workloads — particularly reasoning models, large language models (LLMs) with long context windows, and scenarios that require massive concurrency. This is enabled by unified memory across a large GPU domain and ultra-low-latency GPU-to-GPU connectivity. Each A4X VM contains four GPUs, and an entire 72-GPU system is connected via fifth-generation NVLink to deliver 720 petaflops (FP8) of performance. A4X has achieved 860,000 tokens/sec of inference performance on a full NVL72 running Llama 2 70B.
- A4 VMs: Built on NVIDIA HGX B200 GPUs, the A4 VM provides excellent performance and versatility for diverse AI model architectures and workloads, including training, fine-tuning, and serving. Each A4 VM contains eight GPUs for a total of 72 petaflops (FP8) of performance. A4 offers easy portability from prior generations of Cloud GPUs, enabling a straightforward upgrade with a 2.2x increase in training performance over A3 Mega (NVIDIA H100 GPUs).
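As a rough sketch of what provisioning an A4 VM looks like, the command below uses the `gcloud` CLI. The instance name and zone are placeholders, and the `a4-highgpu-8g` machine type and exact flags should be verified against the Compute Engine documentation for your project before use:

```
# Hypothetical sketch: create a single A4 VM (8x NVIDIA B200).
# Instance name and zone are placeholders; confirm the machine type
# and available zones in the Compute Engine docs for your project.
gcloud compute instances create my-a4-vm \
    --machine-type=a4-highgpu-8g \
    --zone=ZONE \
    --maintenance-policy=TERMINATE
```

For large-scale training, blocks of A4 capacity are typically provisioned through Cluster Director or GKE rather than as individual instances.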
“We’re excited that we were among the first to test A4 VMs, powered by NVIDIA Blackwell GPUs and Google Cloud’s AI Hypercomputer architecture. The sheer compute and memory advancements, combined with the 3.2 Tbps GPU-to-GPU interconnect via NVLink and the Titanium ML network adapter, are critical for us to train our models. Leveraging the Cluster Director simplifies the deployment and management of our large-scale training workloads. This gives our researchers the speed and flexibility to experiment, iterate, and refine trading models more efficiently.” – Gerard Bernabeu Altayo, Compute Lead, Hudson River Trading
The Google Cloud advantage
A4 and A4X VMs are part of Google Cloud’s AI Hypercomputer, our supercomputing architecture designed for high performance, reliability, and efficiency for AI workloads. AI Hypercomputer brings together Google Cloud’s workload-optimized hardware, open software, and flexible consumption models to help simplify deployments, improve performance, and optimize costs. A4 and A4X VMs benefit from the following AI Hypercomputer capabilities:
- AI-optimized architecture: A4 and A4X VMs are built on servers with our Titanium ML network adapter, which builds on NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience for AI workloads. Combined with our datacenter-wide 4-way rail-aligned network, A4 VMs deliver non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). You can scale to tens of thousands of NVIDIA Blackwell GPUs with our Jupiter network fabric, which provides 13 Petabits/sec of bisection bandwidth.
- Simplified deployment with pre-built solutions: For large training workloads, Cluster Director offers dense co-location of accelerator resources, to help ensure host machines are allocated physically close to one another, provisioned as blocks of resources, and interconnected with a dynamic ML network fabric that minimizes network hops and optimizes for the lowest latency.
- Scalable infrastructure: With support for up to 65,000 nodes per cluster, Google Kubernetes Engine (GKE) running on AI Hypercomputer is the most scalable Kubernetes service with which to implement a robust, production-ready AI platform. A4 and A4X VMs are natively integrated with GKE. And with integration to other Google Cloud services such as Hyperdisk ML for storage or BigQuery as a data warehouse, GKE facilitates data processing and distributed computing for AI workloads.
- Fully-integrated, open software: In addition to support for CUDA, we work closely with NVIDIA to optimize popular frameworks with XLA such as PyTorch and JAX (including the reference implementation, MaxText), enabling increased performance of GPU infrastructure. Developers can easily incorporate powerful techniques like a latency hiding scheduler to minimize communication overhead (see XLA optimizations).
- Flexible consumption models: In addition to the on-demand, committed use discount, and Spot consumption models, we reimagined cloud consumption for the unique needs of AI workloads with Dynamic Workload Scheduler, which offers two modes for different workloads: Flex Start mode for enhanced obtainability and better economics, and Calendar mode for predictable job start times and durations. Dynamic Workload Scheduler improves your access to AI accelerator resources, helps you optimize your spend, and can improve the experience of workloads such as training and fine-tuning jobs, by scheduling all the accelerators needed simultaneously.
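To make the XLA optimizations mentioned above concrete, here is a minimal sketch of how a JAX job might opt into XLA’s latency-hiding scheduler on GPU. The flag name is a real XLA GPU option, but the surrounding setup is an illustrative assumption, not a prescribed configuration:

```python
import os

# Sketch: enable XLA's latency-hiding scheduler, which overlaps
# collective communication with compute to reduce exposed network time.
# XLA reads XLA_FLAGS at backend initialization, so set this
# *before* the first `import jax` in your training script.
xla_flags = [
    "--xla_gpu_enable_latency_hiding_scheduler=true",
]
os.environ["XLA_FLAGS"] = " ".join(xla_flags)

# ... `import jax` and the training loop would follow here ...
print(os.environ["XLA_FLAGS"])
```

Frameworks built on XLA, such as MaxText, typically expose similar tuning through their own launch configuration.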
NVIDIA and Google Cloud: Better together
We’re continuously working together to provide our joint customers with an optimized experience. One of our recent collaborations brings together software innovations to accelerate AI-driven drug discovery. Using the NVIDIA BioNeMo framework and blueprints on GKE, together with PyTorch Lightning, we’re providing ready-to-use reference workflows for domain-specific tasks. The NVIDIA BioNeMo Framework provides an optimized environment for training and fine-tuning biomolecular AI models. Read more here.
Meet Google Cloud
To connect with Google Cloud, please visit us at booth #914 at NVIDIA GTC, join our expert-led sessions listed below, or email us to set up a private meeting. Whether it’s your first time speaking with Google Cloud or the first time connecting with us at NVIDIA GTC, we’re looking forward to meeting with you.
Deep dive into AI at expert-led sessions
Join our expert-led sessions to gain in-depth knowledge and develop practical skills in AI development on Google.
Tuesday, March 18
Optimizing the Future of Ads with MoE Models
Time: 2:00 PM – 2:40 PM PDT
Speaker: Tris Warkentin, Director of Product Management, Google DeepMind
Wired for AI: Lessons from Networking 100K+ GPU AI Data Centers and Clouds
Time: 4:00 PM – 5:00 PM PDT
Speakers: Dan Lenoski, VP Networking, Google and more industry leaders
Wednesday, March 19
Accelerate AI: Enhance Performance and Efficiency Using Google Cloud
Time: 10:00 AM – 10:40 AM PDT
Speakers: Roy Kim, Director Google Cloud GPUs, and Scott Dietzen, CEO, Augment Code
Optimize your Workloads for Rack-Scale Interconnected GPU systems
Time: 3:00 PM – 3:40 PM PDT
Speakers: Jon Olson, Software Engineer, Google Cloud, and Pramod Ramarao, Product Manager, Google
Thursday, March 20
Unlock the Speed of Light for Data Science Workflows With Gemini Coding Assistant
Time: 8:00 AM – 8:40 AM PDT
Speaker: Paige Bailey, Engineering Manager, Developer Relations, Google
Build Next-Generation AI Factories With DOCA-Accelerated Networking
Time: 9:00 AM – 9:40 AM PDT
Speakers: Valas Valancius, Senior Staff Software Engineer, Google Cloud; Ariel Kit, Director, Product Management, NVIDIA; David Wetherall, Distinguished Engineer, Google Cloud
Physical AI for Humanoids: How Google Robotics Uses Simulation to Accelerate Humanoid Robotics Training
Time: 9:00 AM – 9:40 AM PDT
Speaker: Erik Frey, Lead Researcher, Google
Toward Rational Drug Design With AlphaFold 3
Time: 10:00 AM – 10:40 AM PDT
Speakers: Max Jaderberg, Chief AI Officer, Isomorphic Labs (DeepMind) and Sergei Yakneen, Chief Technology Officer, Isomorphic Labs (DeepMind)
AI in Action: Optimize Your AI Infrastructure
Time: 11:00 AM – 11:40 AM PDT
Speakers: Chelsie Czop, Senior Product Manager, Google Cloud; Kshetrajna Raghavan, Machine Learning Engineer, Shopify; Ashwin Kannan, Principal Machine Learning Engineer, Palo Alto Networks; Jia Li, Chief AI Officer, Livex.AI
Horizontal Scaling of LLM Training with JAX
Time: 2:00 PM – 2:40 PM PDT
Speakers: Andi Gavrilescu, Sr. Engineering Manager, Google; Matthew Johnson, Research Scientist, Google; Abhinav Goel, Senior Deep Learning Architect, Google
On-Demand, Virtual Sessions
S74318: Deploy AI and HPC on NVIDIA GPUs With Google
Speakers: Annie Ma-Weaver, HPC group Product Manager, Google Cloud; Wyatt Gorman, HPC and AI Solutions Manager, Google Cloud; Sam Skillman, HPC Software Engineer, Google Cloud
S74319: Supercharge Large-Scale AI with Google Cloud AI Hypercomputer
Speakers: Rajesh Anantharaman, Product Management Lead, ML Software, Google Cloud and Deepak Patil, Product Manager, Google Cloud
In addition to our expert-led sessions at NVIDIA GTC, we invite you to join us at the following events onsite (limited space available):
- Executive Roundtable, Wednesday, March 19 at 8 AM
- DGX Cloud on Google Cloud Roundtable, Thursday, March 20 at 8 AM
- Developer Hands On Lab, Thursday, March 20 at 10 AM
Read more for the details.