Blackwell is here — new A4 VMs powered by NVIDIA B200 now in preview
Modern AI workloads require powerful accelerators and high-speed interconnects to run sophisticated model architectures across an ever-growing range of model sizes and modalities. In addition to large-scale training, these complex models need the latest high-performance computing solutions for fine-tuning and inference.
Today, we’re excited to bring the highly anticipated NVIDIA Blackwell GPUs to Google Cloud with the preview of A4 VMs, powered by NVIDIA HGX B200. The A4 VM features eight Blackwell GPUs interconnected by fifth-generation NVIDIA NVLink, and offers a significant performance boost over the previous-generation A3 High VM. Each GPU delivers 2.25 times the peak compute and 2.25 times the HBM capacity, making A4 VMs a versatile option for training and fine-tuning a wide range of model architectures, while the increased compute and HBM capacity make them well-suited for low-latency serving.
The A4 VM integrates Google’s infrastructure innovations with Blackwell GPUs to bring the best cloud experience for Google Cloud customers, from scale and performance, to ease-of-use and cost optimization. Some of these innovations include:
- Enhanced networking: A4 VMs are built on servers with our Titanium ML network adapter, optimized to deliver a secure, high-performance cloud experience for AI workloads, building on NVIDIA ConnectX-7 network interface cards (NICs). Combined with our datacenter-wide 4-way rail-aligned network, A4 VMs deliver 3.2 Tbps of non-blocking GPU-to-GPU bandwidth with RDMA over Converged Ethernet (RoCE). Customers can scale to tens of thousands of GPUs with our Jupiter network fabric, which provides 13 Petabits/sec of bisectional bandwidth.
- Google Kubernetes Engine: With support for up to 65,000 nodes per cluster, GKE is the most scalable and fully automated Kubernetes service for customers to implement a robust, production-ready AI platform. A4 VMs are natively integrated with GKE out of the box, and GKE’s integrations with other Google Cloud services provide a robust environment for the data processing and distributed computing that underpin AI workloads.
- Vertex AI: A4 VMs will be accessible through Vertex AI, our fully managed, unified AI development platform for building and using generative AI, powered under the hood by the AI Hypercomputer architecture.
- Open software: In addition to PyTorch and CUDA, we work closely with NVIDIA to optimize JAX and XLA, enabling the overlap of collective communication and computation on GPUs. We also provide optimized model configurations and example scripts for GPUs with the recommended XLA flags enabled.
- Hypercompute Cluster: Our new highly scalable clustering system streamlines infrastructure and workload provisioning, as well as ongoing operations of AI supercomputers, with tight GKE and Slurm integration.
- Multiple consumption models: In addition to the On-demand, Committed use discount, and Spot consumption models, we reimagined cloud consumption for the unique needs of AI workloads with Dynamic Workload Scheduler, which offers two modes for different workloads: Flex Start mode for enhanced obtainability and better economics, and Calendar mode for predictable job start times and durations.
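The JAX and XLA point above can be made concrete. In the sketch below (shapes and names are illustrative, and it runs on a single device on a CPU-only machine), a `pmap`-ed step mixes a local matmul with a `psum` collective; programs of this shape are where XLA can overlap the collective with independent local computation:

```python
from functools import partial

import jax
import jax.numpy as jnp

@partial(jax.pmap, axis_name="d")
def train_step(x, w):
    # Local computation on each device ...
    y = x @ w
    # ... plus a cross-device reduction. XLA's scheduler can overlap
    # collectives like this psum with independent local compute.
    return jax.lax.psum(y, axis_name="d")

n = jax.local_device_count()  # 1 on a CPU-only machine
x = jnp.ones((n, 4, 8))
w = jnp.ones((n, 8, 2))
out = train_step(x, w)        # shape (n, 4, 2)
```

Each per-device result is a (4, 2) matmul output, summed across the `d` axis by `psum` and stacked back along the device dimension.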
Hudson River Trading, a multi-asset-class quantitative trading firm, will leverage A4 VMs to train models for its next generation of capital markets research. The A4 VM, with its enhanced inter-GPU connectivity and high-bandwidth memory, is ideal for the demands of larger datasets and sophisticated algorithms, accelerating Hudson River Trading’s ability to react to the market.
Better together: A4 VMs and Hypercompute Cluster
Effectively scaling AI model training requires precise and scalable orchestration of infrastructure resources. These workloads often stretch across thousands of VMs, pushing the limits of compute, storage, and networking.
Hypercompute Cluster enables you to deploy and manage these large clusters of A4 VMs with compute, storage and networking as a single unit. This makes it easy to manage complexity while delivering exceptionally high performance and resilience for large distributed workloads. Hypercompute Cluster is engineered to:
- Deliver high performance through dense co-location of A4 VMs, enabling optimal workload placement
- Optimize resource scheduling and workload performance with GKE and Slurm, packed with intelligent features like topology-aware scheduling
- Increase reliability with built-in self-healing capabilities, proactive health checks, and automated recovery from failures
- Enhance observability and monitoring for timely and customized insights
- Automate provisioning, configuration, and scaling, integrated with GKE and Slurm
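To illustrate what topology-aware scheduling means in practice (this is a conceptual sketch, not the Hypercompute Cluster, GKE, or Slurm API — all names here are hypothetical): a scheduler that knows the network topology prefers hosts that share a network block, so a collective-heavy job spans as few blocks as possible and pays for fewer cross-block hops:

```python
from collections import defaultdict

def place(job_size, free_hosts):
    """Greedy topology-aware placement sketch.

    free_hosts: list of (host, block) pairs, where a "block" is a unit of
    network locality (e.g. a rack or rail group). Returns a list of chosen
    hosts spanning as few blocks as possible, or None if capacity is short.
    """
    by_block = defaultdict(list)
    for host, block in free_hosts:
        by_block[block].append(host)
    chosen = []
    # Fill from the emptiest-first? No: take the fullest blocks first so the
    # job concentrates into the minimum number of blocks.
    for block in sorted(by_block, key=lambda b: -len(by_block[b])):
        take = min(job_size - len(chosen), len(by_block[block]))
        chosen += by_block[block][:take]
        if len(chosen) == job_size:
            return chosen
    return None

hosts = [("h1", "r1"), ("h2", "r1"), ("h3", "r2"), ("h4", "r2"), ("h5", "r2")]
print(place(3, hosts))  # ['h3', 'h4', 'h5'] — all three land in block r2
```

A real scheduler also weighs fragmentation, failures, and fairness, but the core preference — co-locate a job within the fewest network blocks — is the same.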
We’re excited to be the first hyperscaler to announce preview availability of an NVIDIA Blackwell B200-based offering. Together, A4 VMs and Hypercompute Cluster make it easier for organizations to create and deliver AI solutions across all industries. If you’re interested in learning more, please reach out to your Google Cloud representative.