GCP – With MultiKueue, grab GPUs for your GKE cluster, wherever they may be
Artificial Intelligence (AI) and large language models (LLMs) are experiencing explosive growth, powering applications from machine translation to artistic creation. These technologies rely on intensive computations that require specialized hardware resources, like GPUs. But access to GPUs can be challenging, both in terms of availability and cost.
For Google Cloud users, the introduction of Dynamic Workload Scheduler (DWS) transformed how you can access and use GPU resources, particularly within a Google Kubernetes Engine (GKE) cluster. Dynamic Workload Scheduler optimizes AI/ML resource access and spending by simultaneously scheduling necessary accelerators like TPUs and GPUs across various Google Cloud services, improving the performance of training and fine-tuning jobs.
Further, Dynamic Workload Scheduler offers an easy and straightforward integration between GKE and Kueue, a cloud-native job scheduler, making it easier to access GPUs as quickly as possible, in a given region, for a given GKE cluster.
But what if you want to deploy your workload in any available region, as soon as possible, as soon as DWS provides you the resources your workload needs?
This is where MultiKueue, a Kueue feature, comes into play. With MultiKueue, GKE, and Dynamic Workload Scheduler, you can wait for accelerators in multiple regions. Dynamic Workload Scheduler automatically provisions resources in the best GKE clusters as soon as they are available. By submitting workloads to a global queue, MultiKueue executes them in the region with available GPU resources, helping to optimize global resource usage, lowering costs, and speeding up processing.
MultiKueue
MultiKueue enables workload distribution across multiple GKE clusters in different regions. By identifying clusters with available resources, MultiKueue simplifies the process of dispatching jobs to the optimal location.
Dynamic Workload Scheduler is supported on GKE Autopilot, our managed Kubernetes service that automatically handles the provisioning, scaling, security, and maintenance of your container infrastructure, starting with GKE Autopilot version 1.30.3. Let’s take a deeper look at how to set up and manage MultiKueue with Dynamic Workload Scheduler, so you can obtain GPU resources faster.
MultiKueue cluster roles
MultiKueue provides two distinct cluster roles:
- Manager cluster – establishes and maintains the connection with the worker clusters, and creates and monitors remote objects (workloads or jobs) while keeping the local ones in sync (see the sketch after this list for how that connection is declared).
- Worker cluster – a simple standalone Kueue cluster that executes the jobs submitted by the manager cluster.
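Under the hood, the manager learns about each worker through a MultiKueueCluster resource that references a kubeconfig stored in a Secret. The following is a minimal sketch of that wiring, assuming a Secret name chosen for illustration; the deployment script used later in this example creates the real objects for you.

```
# Sketch only: registers one worker cluster with the manager.
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-dws-worker-eu
spec:
  kubeConfig:
    locationType: Secret
    # Secret in the kueue-system namespace whose "kubeconfig" key
    # holds credentials for the worker cluster (name is illustrative).
    location: worker-eu-kubeconfig
```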
Creating a MultiKueue cluster
In this example, we create four GKE Autopilot clusters:
- One manager cluster in europe-west4
- Three worker clusters, one in each of the following regions:
  - europe-west4
  - us-east4
  - asia-southeast1
Let’s take a look at how this works in the following step-by-step example. You can access the files for this example in this GitHub repository.
1. Clone the GitHub repository
```
git clone https://github.com/GoogleCloudPlatform/ai-on-gke.git
cd ai-on-gke/tutorials-and-examples/workflow-orchestration/dws-multiclusters-example
```
2. Create GKE clusters
```
terraform -chdir=tf init
terraform -chdir=tf plan
terraform -chdir=tf apply -var project_id=<YOUR_PROJECT_ID>
```
This Terraform script creates the required GKE clusters and adds four entries to your kubeconfig file:
- manager-europe-west4
- worker-us-east4
- worker-europe-west4
- worker-asia-southeast1
Then you can switch between contexts easily with:
```
kubectl config use-context <context name>
```
3. Install and configure MultiKueue
```
./deploy-multikueue.sh
```
This script:
- Installs Kueue in the four clusters
- Enables and configures MultiKueue in the manager cluster
- Creates a PodMonitoring resource in each cluster so that Kueue metrics are sent to Google Cloud Managed Service for Prometheus
- Configures the connection between the manager cluster and the worker clusters
- Configures Kueue in the worker clusters
GKE clusters, Kueue with MultiKueue, and DWS are now configured and ready to use. Once you submit your jobs, the Kueue manager distributes them across the three worker clusters.
In the dws-multi-worker.yaml file, you’ll find the Kueue configuration for the worker clusters, including the manager configuration.
The following manifest provides a basic example of how to set up the MultiKueue AdmissionCheck with three worker clusters.
```
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-dws-multikueue
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-dws
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
  name: multikueue-dws
spec:
  clusters:
  - multikueue-dws-worker-asia
  - multikueue-dws-worker-us
  - multikueue-dws-worker-eu
---
```
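To take effect, this admission check must be referenced by a ClusterQueue on the manager, and each worker needs a Dynamic Workload Scheduler-backed admission check of its own. The repository’s manifests wire this up for you; the following is a minimal sketch of both pieces, with the queue name, quotas, and flavor name as illustrative assumptions.

```
# Manager side (sketch): a ClusterQueue that runs every workload
# through the MultiKueue admission check defined above.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: dws-cluster-queue        # illustrative name
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor       # illustrative ResourceFlavor
      resources:
      - name: "cpu"
        nominalQuota: 1000
      - name: "memory"
        nominalQuota: 1000Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 100
  admissionChecks:
  - sample-dws-multikueue
---
# Worker side (sketch): an admission check backed by a ProvisioningRequest,
# which is how Kueue asks Dynamic Workload Scheduler for capacity on GKE.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  managedResources:
  - nvidia.com/gpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov                 # illustrative name
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
```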
4. Submit jobs
Ensure you’re using the manager kubecontext when submitting jobs.
```
kubectl config use-context manager-europe-west4
kubectl create -f job-multi-dws-autopilot.yaml
```
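The job manifest itself is in the repository; conceptually, it is a regular Kubernetes Job that is created suspended, points at a LocalQueue, and declares its GPU needs and maximum run duration so Dynamic Workload Scheduler can place it. Here is a minimal sketch of such a job; the queue name, accelerator type, image, and duration are illustrative assumptions, not the repository’s exact values.

```
# Sketch of a DWS-friendly job (values are illustrative).
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-dws-job-
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue   # illustrative LocalQueue
spec:
  suspend: true                    # Kueue unsuspends the job once admitted
  template:
    metadata:
      annotations:
        # Tells DWS how long the provisioned capacity is needed.
        provreq.kueue.x-k8s.io/maxRunDurationSeconds: "600"
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: trainer
        image: nvidia/cuda:12.2.2-base-ubuntu22.04
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: "1"
      restartPolicy: Never
```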
To observe how the MultiKueue admission check distributes jobs among worker clusters, you can submit the job creation request multiple times.
5. Get jobs status
To check the job status and determine the scheduled region, execute the following command:
```
kubectl get workloads.kueue.x-k8s.io -o jsonpath='{range .items[*]}{.status.admissionChecks}{"\n"}{end}'
```
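For a workload that has been dispatched, each entry printed by this command follows Kueue’s AdmissionCheckState shape, along the lines of the sketch below, where the message names the worker cluster that won the reservation. This is an illustrative shape, not verbatim output; the exact message text varies between Kueue versions.

```
# Illustrative shape of one .status.admissionChecks entry.
- name: sample-dws-multikueue
  state: Ready
  message: 'The workload got reservation on "multikueue-dws-worker-us"'
  lastTransitionTime: "2025-01-01T12:00:00Z"
```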
6. Delete resources
Finally, be sure to delete the four GKE clusters you created to try out this functionality:
```
terraform -chdir=tf destroy -var project_id=<YOUR_PROJECT_ID>
```
What’s next
So that’s how you can leverage MultiKueue, GKE, and DWS to streamline global job execution, optimize speed, and eliminate the need for manual node management!
This setup also addresses the needs of those with data residency requirements, allowing you to dedicate subsets of clusters for different workloads and ensure compliance.
To further enhance your setup, you can leverage advanced Kueue features like team management with LocalQueues, or workload priority classes. Additionally, you can gain valuable insights by creating a Grafana or Cloud Monitoring dashboard that utilizes Kueue metrics, which are automatically handled by Google Cloud Managed Service for Prometheus via the PodMonitoring resources (a sketch of one follows below).
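For reference, a PodMonitoring resource along these lines can scrape the Kueue controller’s metrics. This is a sketch only: the namespace, label selector, and port shown here are assumptions that must be adjusted to match your Kueue installation.

```
# Sketch only: scrape Kueue controller metrics with
# Google Cloud Managed Service for Prometheus.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: kueue-metrics
  namespace: kueue-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager   # default label on the Kueue deployment
  endpoints:
  - port: 8080                            # adjust to your Kueue metrics port
    interval: 30s
```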
Read More for the details.