GCP – New G4 VMs with NVIDIA RTX PRO 6000 Blackwell power AI, graphics, gaming and beyond
Today, we’re excited to announce the preview of our new G4 VMs based on the NVIDIA RTX PRO 6000 Blackwell Server Edition, making Google Cloud the first cloud provider to offer this GPU. This follows the introduction earlier this year of A4 and A4X VMs powered by NVIDIA Blackwell GPUs, which are designed for large-scale AI training and serving. At the same time, we’re seeing growing demand for GPUs that can power a diverse range of workloads and data formats. G4 VMs round out our fourth-generation NVIDIA GPU portfolio and bring a new level of performance and flexibility to enterprises and creators.
G4 VMs combine eight NVIDIA RTX PRO 6000 GPUs, two AMD Turin CPUs, and Google Titanium offloads:
- RTX PRO 6000 Blackwell GPUs provide fifth-generation Tensor Cores, a second-generation Transformer Engine supporting FP6 and FP4 precision, fourth-generation Ray Tracing (RT) Cores, and Multi-Instance GPU (MIG) capabilities, delivering 4x the compute and memory and 6x the memory bandwidth of G2 VMs (see the verification sketch after this list).
- Turin CPUs offer up to 384 vCPUs and 1.4 TB of DDR5 memory, a ratio of 48 vCPUs per GPU. This supports embedding models that precompute features on the CPU, as well as graphics workloads where the CPU orchestrates simulations.
- Titanium offloads provide dedicated network processing with up to 400 Gbps of bandwidth, 4x faster than G2 VMs.
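To ground the list above, here is a minimal verification sketch using the NVIDIA Management Library’s Python bindings (the nvidia-ml-py package, which we assume is installed in the guest along with the NVIDIA driver); it inventories the visible GPUs and reports each device’s memory and current MIG mode.

```python
# Minimal sketch: inventory the GPUs on a G4 VM and check MIG mode.
# Assumes the NVIDIA driver and the nvidia-ml-py (pynvml) package are installed.
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    print(f"GPUs visible: {count}")  # expect 8 on g4-standard-384
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        try:
            # Returns (current, pending) mode flags; 1 means MIG is enabled.
            current, _pending = pynvml.nvmlDeviceGetMigMode(handle)
            mig = "enabled" if current == 1 else "disabled"
        except pynvml.NVMLError:
            mig = "unsupported"
        print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB, MIG {mig}")
finally:
    pynvml.nvmlShutdown()
```

On a full g4-standard-384 instance, this should report eight devices totaling the 768 GB of GDDR7 memory described below.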
The G4 VM can power a variety of workloads, from cost-efficient inference to advanced physical AI, robotics simulations, generative AI-enabled content creation, and next-generation game rendering. For example, with advanced ray-tracing cores that simulate the physical behavior of light, the NVIDIA RTX PRO 6000 Blackwell delivers over 2x the performance of the prior generation, enabling hyper-realistic graphics for complex, real-time rendering. For demanding graphics and physical AI-enabled applications, the ability to run NVIDIA Omniverse workloads natively unlocks new possibilities for the manufacturing, automotive, and logistics industries, where digital twins and real-time simulation are rapidly transforming operations. G4 VMs also support the NVIDIA Dynamo inference framework to enable high-throughput, low-latency AI inference for generative models at scale.
Customers across industries — from media and entertainment to manufacturing, automotive, and gaming — are onboarding to use G4 VMs to accelerate AI-powered content creation, advanced simulation, and high-performance visualization:
“Our initial tests of the G4 VM show great potential, especially for self-hosted LLM inference use cases. We are excited to benchmark the G4 VM for a variety of other ranking workloads in the future.” – Vinay Kola, Senior Manager, Software Engineering, Snap
- Altair will help customers accelerate their computer-aided engineering (CAE) workloads with the performance and large memory of Google Cloud’s G4 instances.
- Ansys will help its customers leverage Google Cloud’s G4 instances to accelerate their simulation workloads.
- AppLovin is excited to use G4 for ad serving and recommendations.
- WPP is excited to use G4 to continue its ground-breaking work with physically accurate generative AI and robotics simulation.
- Nuro is looking to run drive simulations on G4 via NVIDIA Omniverse.
- A major player in the video game industry is looking to use G4 for next-generation game rendering.
G4 VMs provide 768 GB of GDDR7 memory and 384 vCPUs with 12 TiB of Titanium local SSD, extensible with up to 512 TiB of Hyperdisk network block storage. For design and simulation workloads, G4 VMs support third-party engineering and graphics applications like Altair HyperWorks, Ansys Fluent, Autodesk AutoCAD, Blender, Dassault SolidWorks, and Unity.
G4 VMs are available as part of AI Hypercomputer, Google Cloud’s fully integrated AI supercomputing system, and work natively with Google Cloud services like Google Kubernetes Engine, Google Cloud Storage, and Vertex AI. Many customers combine services such as Vertex AI or GKE with NVIDIA GPUs on Google Compute Engine and Google Cloud Hyperdisk ML for AI inference. Hyperdisk provides ultra-low latency and supports up to 500K IOPS and 10,000 MiB/s of throughput per instance, making it well-suited for demanding inference workloads.
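As a back-of-the-envelope check on those Hyperdisk numbers, the sketch below estimates how long streaming model weights would take at the quoted 10,000 MiB/s; the 140 GB checkpoint size is a hypothetical figure for illustration, not a measured workload.

```python
# Back-of-the-envelope: time to stream model weights from Hyperdisk
# at the quoted per-instance throughput. The checkpoint size is hypothetical.
WEIGHTS_GB = 140                       # hypothetical model checkpoint size
THROUGHPUT_MIB_S = 10_000              # quoted Hyperdisk per-instance maximum

weights_bytes = WEIGHTS_GB * 10**9     # decimal gigabytes to bytes
throughput_bytes = THROUGHPUT_MIB_S * 2**20

seconds = weights_bytes / throughput_bytes
print(f"~{seconds:.1f} s to load {WEIGHTS_GB} GB at {THROUGHPUT_MIB_S} MiB/s")
# ~13.4 s, assuming the instance sustains the quoted maximum throughput.
```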
| Machine Type | GPUs | GPU Memory (GB) | vCPUs | Host Memory (GB) | Local SSD (GB) |
|---|---|---|---|---|---|
| g4-standard-384 | 8 | 768 | 384 | 1,440 | 12,000 |
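For readers who want to try the machine type above, here is a minimal provisioning sketch using the google-cloud-compute Python client. The project ID, zone, boot image, and disk size are placeholder assumptions, and since G4 is in preview, access and zone availability may vary.

```python
# Minimal sketch: create a g4-standard-384 instance with the
# google-cloud-compute client. Project, zone, and image are assumptions;
# G4 is in preview, so access and zone availability may vary.
from google.cloud import compute_v1

PROJECT = "my-project"     # placeholder project ID
ZONE = "us-central1-a"     # placeholder; check G4 zone availability

instance = compute_v1.Instance(
    name="g4-demo",
    machine_type=f"zones/{ZONE}/machineTypes/g4-standard-384",
    # GPU VMs cannot live-migrate, so terminate on host maintenance.
    scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
                disk_size_gb=200,
            ),
        )
    ],
    network_interfaces=[
        compute_v1.NetworkInterface(network="global/networks/default")
    ],
)

client = compute_v1.InstancesClient()
operation = client.insert(project=PROJECT, zone=ZONE, instance_resource=instance)
operation.result()  # block until the create operation finishes
print(f"Created {instance.name} in {ZONE}")
```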
G4 is currently in preview and will be available globally by the end of the year. Reach out to your Google Cloud Sales representative to learn more.