GCP – Meet the new GKE: Extending Autopilot to all qualifying clusters
Autopilot is an operational mode for Google Kubernetes Engine (GKE) that provides a fully managed environment and takes care of operational details, like provisioning compute capacity for your workloads. Autopilot allows you to spend more time on developing your own applications and less time on managing node-level details. This year, we upgraded Autopilot’s autoscaling stack to a fully dynamic container-optimized compute platform that rapidly scales horizontally and vertically to support your workloads. Simply attach a horizontal pod autoscaler (HPA) or vertical pod autoscaler (VPA) to your environment, and experience a fully dynamic platform that can scale rapidly to serve your users.
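For example, attaching an HPA is a single standard Kubernetes manifest. Here's a minimal sketch (the target Deployment name, timeserver, and the scaling thresholds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: timeserver-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: timeserver          # illustrative target Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # scale out when average CPU crosses 60%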
More and more customers, including Hotspring and Contextual AI, are finding that Autopilot dramatically simplifies Kubernetes cluster operations and improves resource efficiency for their critical workloads. In fact, in 2024, 30% of active GKE clusters were created in Autopilot mode. The new container-optimized compute platform has also proved popular with customers, who report significantly faster provisioning times. The faster GKE provisions capacity, the more responsive your workloads become, improving your customers’ experience and optimizing costs.
Today, we are pleased to announce that the best of Autopilot is now available in all qualified GKE clusters, not just dedicated Autopilot ones. You can now use Autopilot’s container-optimized compute platform and ease of operation from your existing GKE clusters. It’s generally available, starting with clusters enrolled in the Rapid release channel and running GKE version 1.33.1-gke.1107000 or later. Most clusters will qualify and gain access to these new features as they roll out to the other release channels; the exceptions are clusters enrolled in the Extended channel and those that use the older routes-based networking. To access these new features, enroll in the Rapid channel and upgrade your cluster version, or wait to be auto-upgraded.
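As a sketch, enrolling an existing cluster in the Rapid channel is a one-line gcloud command (CLUSTER_NAME and the location are placeholders for your own values):

# Enroll the cluster in the Rapid release channel.
gcloud container clusters update CLUSTER_NAME \
    --location=us-central1 \
    --release-channel=rapid

# Confirm the channel and current version afterwards.
gcloud container clusters describe CLUSTER_NAME \
    --location=us-central1 \
    --format="value(releaseChannel.channel, currentMasterVersion)"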
How to use it
Autopilot features are offered in Standard clusters via compute classes, a modern way to group and specify compute requirements for workloads in GKE. GKE now has two built-in compute classes, autopilot and autopilot-spot, that are pre-installed on all qualified clusters running GKE 1.33.1-gke.1107000 or later and enrolled in the Rapid release channel. Running your workload on Autopilot’s container-optimized compute platform is as easy as specifying the autopilot (or autopilot-spot) compute class, like so:
apiVersion: v1
kind: Pod
metadata:
  name: timeserver
  labels:
    pod: timeserver-pod
spec:
  nodeSelector:
    cloud.google.com/compute-class: autopilot
  containers:
  - name: timeserver-container
    image: docker.io/wdenniss/timeserver:1
    resources:
      requests:
        cpu: "50m"
Better still, you can make the Autopilot container-optimized compute platform the default for a namespace, a great way to save both time and money. You get efficient bin-packing, where the workload is charged for resource requests (and can even still burst!), rapid scaling, and you don’t have to plan your node shapes and sizes.
Here’s how to set Autopilot as your default for a namespace:
NAMESPACE_NAME=your_namespace
kubectl label namespaces $NAMESPACE_NAME cloud.google.com/default-compute-class=autopilot
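Once the namespace is labeled, Pods in it pick up the compute class without any nodeSelector of their own. A minimal sketch (the namespace name is a placeholder, and must be a valid DNS label):

apiVersion: v1
kind: Pod
metadata:
  name: timeserver
  namespace: your-namespace   # placeholder: the namespace you labeled above
spec:
  # No nodeSelector needed: the namespace default applies the
  # autopilot compute class to this Pod automatically.
  containers:
  - name: timeserver-container
    image: docker.io/wdenniss/timeserver:1
    resources:
      requests:
        cpu: "50m"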
Pod sizes for the container-optimized compute platform start at 50 milli-CPU (that’s just 5% of one CPU core!) and scale up to 28 vCPU. With the container-optimized compute platform you only pay for the resources your Pod requests, so you don’t have to worry about system overhead or empty nodes. Pods that are larger than 28 vCPU or that have specific hardware requirements can also run in Autopilot mode on specialized compute with node-based pricing, via customized compute classes.
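Bursting follows the standard Kubernetes requests/limits pattern: you’re billed for what the Pod requests, while limits set the ceiling it can burst to when capacity is available. A sketch (the sizes are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: bursty-app
spec:
  nodeSelector:
    cloud.google.com/compute-class: autopilot
  containers:
  - name: app
    image: docker.io/wdenniss/timeserver:1
    resources:
      requests:
        cpu: "250m"      # what the Pod is billed for
        memory: "256Mi"
      limits:
        cpu: "1"         # ceiling the Pod can burst to
        memory: "512Mi"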
Run AI workloads on GPUs and TPUs with Autopilot
It’s easy to pair Autopilot’s container-optimized compute platform with specific hardware such as GPUs, TPUs, and high-performance CPUs to run your AI workloads. You can run those workloads in the same cluster, side by side with Pods on the container-optimized compute platform. By choosing Autopilot mode for these AI workloads, you benefit from Autopilot’s managed node properties, where we take a more active role in management. Furthermore, you get our enterprise-grade privileged admission controls, which require workloads to run in user space, for better supportability, reliability, and an improved security posture.
Here’s how to define your own customized compute class that runs in Autopilot mode on specific hardware, in this example G2 machine types with NVIDIA L4 GPUs, using two priority rules:
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-l4-ap
spec:
  autopilot:
    enabled: true
  priorities:
  - machineType: g2-standard-48
    spot: true
    gpu:
      type: nvidia-l4
      count: 4
  - machineType: g2-standard-24
    spot: true
    gpu:
      type: nvidia-l4
      count: 2
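Workloads target this class the same way as the built-in ones, with a nodeSelector, plus a GPU resource request. A sketch (the Pod name, image, and sizes are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference
spec:
  nodeSelector:
    cloud.google.com/compute-class: gpu-l4-ap
  containers:
  - name: inference
    image: us-docker.pkg.dev/PROJECT/REPO/inference:latest   # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
      limits:
        nvidia.com/gpu: "2"   # matches the g2-standard-24 priority rule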
A new way to use compute classes
We’re also making compute classes work better with a new provisioning mode that automatically provisions resources for compute classes, without changing how other workloads are scheduled on existing node pools. This means you can now adopt the compute class deployment paradigm (including the new Autopilot-enabled compute classes) at your own pace, without affecting existing workloads and deployment strategies.
Until now, to use compute classes in Standard clusters with automatic node provisioning, you needed to enable node auto-provisioning for the entire cluster. Node auto-provisioning has been part of GKE for many years, but it was previously an all-or-nothing decision: you couldn’t easily combine a manual node pool with a compute class provisioned by node auto-provisioning without potentially changing how workloads outside the compute class were scheduled. Now you can, with our new automatically provisioned compute classes. All Autopilot compute classes use this system, so it’s easy to run workloads in Autopilot mode side by side with your existing deployments (e.g., on manual node pools). You can also enable this feature on any compute class, starting with clusters in the Rapid channel running GKE version 1.33.3-gke.1136000 or later.
Here’s how:
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-l4
spec:
  nodePoolAutoCreation:
    enabled: true
  priorities:
  - machineType: g2-standard-48
    spot: true
    gpu:
      type: nvidia-l4
      count: 4
  - machineType: g2-standard-24
    spot: true
    gpu:
      type: nvidia-l4
      count: 2
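To see what’s defined and watch provisioning happen, a couple of read-only commands are handy (CLUSTER_NAME and the location are placeholders; the computeclasses resource name assumes the ComputeClass CRD is installed, as on qualified clusters):

# List the compute classes defined in the cluster,
# including the built-in autopilot classes where available.
kubectl get computeclasses

# Watch the node pools GKE auto-creates for the class.
gcloud container node-pools list \
    --cluster=CLUSTER_NAME \
    --location=us-central1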
With Autopilot mode for compute classes in Standard clusters, and the new automatic provisioning mode for all compute classes, you can now introduce compute classes to more clusters without impacting how any of your existing workloads are scheduled. Customers we’ve spoken to like this approach, as they can adopt these new patterns gradually, using them for new workloads and migrating existing ones over time, without needing to plan a disruptive switch-over.
Autopilot for all
At Google Cloud, we believe in the power of GKE’s Autopilot mode to simplify operations for your GKE clusters and make them more efficient. Now, those benefits are available to all GKE customers! To learn more about GKE Autopilot and how to enable it for your clusters, check out these resources.
- Learn how the container-optimized compute platform works under the hood to drive performance.
- Watch the GKE Spotlight from NEXT ‘25, and read the announcement.