GCP – Announcing smaller machine types for A3 High VMs
Today, an increasing number of organizations are using GPUs to run inference¹ on their AI/ML models. Since the number of GPUs needed to serve a single inference workload varies, organizations need more granularity in the number of GPUs in their virtual machines (VMs) to keep costs low while scaling with user demand.
A3 High VMs powered by NVIDIA H100 80GB GPUs are now generally available in four machine types: 1 (new), 2 (new), 4 (new), and 8 GPUs.
Accessing smaller H100 machine types
All A3 machine types are available through the fully managed Vertex AI, as nodes through Google Kubernetes Engine (GKE), and as VMs through Google Compute Engine.
The 1, 2, and 4 A3 High GPU machine types are available as Spot VMs and through Dynamic Workload Scheduler (DWS) Flex Start mode.
A3 VM portfolio powered by NVIDIA H100 GPUs

| Machine type (GPU count, GPU memory) | Vertex AI | Google Kubernetes Engine | Google Compute Engine |
| --- | --- | --- | --- |
| a3-highgpu-1g NEW (1 GPU, 80 GB) | ✓ᵃ | ✓ | ✓ |
| a3-highgpu-2g NEW (2 GPUs, 160 GB) | ✓ᵃ | ✓ | ✓ |
| a3-highgpu-4g NEW (4 GPUs, 320 GB) | ✓ᵃ | ✓ | ✓ |
| a3-highgpu-8g (8 GPUs, 640 GB) | ✓ | ✓ | ✓ |
| a3-megagpu-8g (8 GPUs, 640 GB) | ✓ | ✓ | ✓ |

ᵃ Available only through Model Garden owned capacity.
Google Kubernetes Engine
For almost a decade, GKE has been the platform of choice for running web applications and microservices, and now it provides a cost-efficient, highly scalable, and open platform for training and serving AI workloads. GKE Autopilot reduces operational cost and offers workload-level SLAs, making it a fantastic choice for inference workloads: bring your workload and let Google do the rest. You can use the 1, 2, and 4 A3 High GPU machine types through both the GKE Standard and GKE Autopilot modes of operation.
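On Autopilot, there are no node pools to manage; you request the accelerator directly in the pod spec and GKE provisions matching A3 High capacity for you. Here is a minimal sketch (the pod name and image are placeholders), assuming the standard cloud.google.com/gke-accelerator node selector:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: h100-inference        # placeholder name
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-h100-80gb
  containers:
  - name: inference
    image: IMAGE              # your serving image
    resources:
      limits:
        nvidia.com/gpu: 1     # one H100, matching the new 1-GPU shape
EOF

Because you request only the GPU count your workload needs, Autopilot can place the pod on the smallest matching A3 High shape.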
On GKE Standard, below are two examples of creating node pools in your cluster with the a3-highgpu-1g machine type, using Spot VMs and Dynamic Workload Scheduler Flex Start mode.
Using Spot VMs with GKE
Here’s how to request and deploy an a3-highgpu-1g Spot VM node pool on GKE using the gcloud CLI.
gcloud container node-pools create NODEPOOL_NAME \
  --cluster CLUSTER_NAME \
  --region CLUSTER_REGION \
  --node-locations GPU_ZONE1,GPU_ZONE2 \
  --machine-type a3-highgpu-1g \
  --accelerator type=nvidia-h100-80gb,count=1,gpu-driver-version=latest \
  --image-type COS_CONTAINERD \
  --spot
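Once the node pool is created, a quick way to confirm that the H100 nodes have joined the cluster is to filter on the accelerator label that GKE applies to GPU nodes:

kubectl get nodes -l cloud.google.com/gke-accelerator=nvidia-h100-80gb

GKE also labels Spot capacity with cloud.google.com/gke-spot=true, which you can use in node selectors to steer interruption-tolerant pods onto these nodes.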
Using Dynamic Workload Scheduler Flex Start mode with GKE
Here’s how to request a3-highgpu-1g capacity using Dynamic Workload Scheduler Flex Start mode with GKE.
gcloud beta container node-pools create NODEPOOL_NAME \
  --cluster CLUSTER_NAME \
  --region CLUSTER_REGION \
  --node-locations GPU_ZONE1,GPU_ZONE2 \
  --enable-queued-provisioning \
  --machine-type=a3-highgpu-1g \
  --accelerator type=nvidia-h100-80gb,count=1,gpu-driver-version=latest \
  --enable-autoscaling \
  --num-nodes=0 \
  --total-max-nodes TOTAL_MAX_NODES \
  --location-policy=ANY \
  --reservation-affinity=none \
  --no-enable-autorepair
This creates a GKE node pool with Dynamic Workload Scheduler enabled that initially contains zero nodes, so you aren't billed for GPU nodes until capacity is provisioned. You can then run your workloads with Dynamic Workload Scheduler.
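Workloads then obtain capacity through the ProvisioningRequest API that queued provisioning is built on (Kueue can create these objects for you automatically). Below is a minimal sketch under that assumption, with placeholder object names and image:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PodTemplate
metadata:
  name: a3-1g-template        # placeholder name
template:
  spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: NODEPOOL_NAME
    containers:
    - name: inference
      image: IMAGE            # your serving image
      resources:
        limits:
          nvidia.com/gpu: 1   # one H100 per a3-highgpu-1g node
---
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: ProvisioningRequest
metadata:
  name: a3-1g-request         # placeholder name
spec:
  provisioningClassName: queued-provisioning.gke.io
  podSets:
  - count: 1
    podTemplateRef:
      name: a3-1g-template
EOF

When Dynamic Workload Scheduler provisions the request, pods that reference it through the cluster autoscaler's consume-provisioning-request annotation are scheduled onto the newly created nodes.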
Vertex AI
Vertex AI is Google Cloud’s fully managed, unified AI development platform for building and using predictive and generative AI. With the new 1, 2, and 4 A3 High GPU machine types, Model Garden customers can deploy hundreds of open models cost-effectively and with strong performance.
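For illustration, here is a hedged sketch of deploying a model to a Vertex AI endpoint on one of the new shapes with the gcloud CLI. The endpoint, model, and region values are placeholders, the H100 accelerator string is an assumption based on the Compute Engine naming, and, per the table above, the new shapes surface on Vertex AI through Model Garden owned capacity:

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=REGION \
  --model=MODEL_ID \
  --display-name=DEPLOYMENT_NAME \
  --machine-type=a3-highgpu-1g \
  --accelerator=type=nvidia-h100-80gb,count=1 \
  --min-replica-count=1 \
  --max-replica-count=2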
What our customers are saying
“We use Google Kubernetes Engine to run the backend for our AI-assisted software development product. Smaller A3 machine types have enabled us to reduce the latency of our real-time code assist models by 36% compared to A2 machine types, significantly improving user experience.” – Eran Dvey Aharon, VP R&D, Tabnine
Get started today
At Google Cloud, our goal is to provide you with the flexibility you need to run inference for your AI and ML models cost-effectively and with great performance. The availability of A3 High VMs with NVIDIA H100 80GB GPUs in smaller machine types gives you the granularity you need to scale with user demand while keeping costs in check.
1. AI or ML inference is the process by which a trained AI model applies what it learned during training to generate outputs or make predictions for new data points or scenarios.