GCP – GKE under the hood: Container-optimized compute delivers fast autoscaling for Autopilot
The promise of Google Kubernetes Engine (GKE) is the power of Kubernetes with ease of management, including planning and creating clusters, deploying and managing applications, configuring networking, ensuring security, and scaling workloads. However, when it comes to autoscaling workloads, customers tell us the fully managed mode of operation, GKE Autopilot, hasn’t always delivered the speed and efficiency they need. That’s because autoscaling a Kubernetes cluster involves creating and adding new nodes, which can sometimes take several minutes. That’s just not good enough for high-volume, fast-scale applications.
Enter the container-optimized compute platform for GKE Autopilot, a completely reimagined autoscaling stack for GKE that we introduced earlier this year. In this blog, we take a deeper look at autoscaling in GKE Autopilot, and how to start using the new container-optimized compute platform for your workloads today.
Understanding GKE Autopilot and its scaling challenges
With GKE Autopilot, the fully managed mode of GKE, users are primarily responsible for their applications, while GKE takes on the heavy lifting of managing nodes and node pools, creating new nodes, and scaling applications. With traditional Autopilot, if an application needed to scale quickly, GKE first needed to provision new nodes onto which the application could scale, which sometimes took several minutes.
To work around this, users often employed techniques like “balloon pods”: low-priority placeholder pods that hold onto nodes so that capacity is immediately available for demanding scaling use cases. However, this approach is costly, since it reserves resources that sit idle, and it is also difficult to maintain.
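For context, a typical balloon-pod setup pairs a low-priority PriorityClass with a Deployment of placeholder pods that reserve node capacity until a real workload preempts them. The manifest below is a minimal sketch of that pattern; all names, replica counts, and resource sizes are illustrative, not taken from GKE documentation.

```yaml
# Illustrative balloon-pod setup; names and sizes are hypothetical.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon-priority
value: -10                      # low priority so real workloads preempt balloon pods
preemptionPolicy: Never         # balloon pods never preempt anything themselves
globalDefault: false
description: "Placeholder priority class for capacity ballooning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balloon
spec:
  replicas: 5
  selector:
    matchLabels:
      app: balloon
  template:
    metadata:
      labels:
        app: balloon
    spec:
      priorityClassName: balloon-priority
      terminationGracePeriodSeconds: 0   # vacate the node immediately when preempted
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9  # does nothing; only holds the resource reservation
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
```

The requested CPU and memory here are what the cluster keeps warm; that reservation is exactly the idle cost the container-optimized compute platform eliminates.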
Introducing the container-optimized compute platform
We developed the container-optimized compute platform with a clear mission: to provide you with near-real-time, vertically and horizontally scalable compute capacity precisely when you need it, at optimal price and performance. We achieved this through a fundamental redesign of GKE’s underlying compute stack.
The container-optimized compute platform runs GKE Autopilot nodes on a new family of virtual machines that can be dynamically resized while they are running, starting from fractions of a CPU, all without disrupting workloads. To improve the speed of scaling and resizing, GKE clusters now also maintain a pool of dedicated pre-provisioned compute capacity that is automatically allocated to workloads in response to increased resource demands. Importantly, because GKE Autopilot bills only for the compute capacity that you request, this pre-provisioned capacity does not impact your bill.
The result is a flexible compute platform that provides capacity where and when it is required. Key improvements include:

- Up to 7x faster pod scheduling time compared to clusters without container-optimized compute
- Significantly improved response times for applications with autoscaling enabled
- In-place pod resize, introduced in Kubernetes 1.33, which allows pods to be resized without disruption
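In-place resize is controlled per container through the resizePolicy field. The sketch below assumes a cluster on Kubernetes 1.33 or later with the feature available; the pod name and image are illustrative.

```yaml
# Hypothetical pod that allows CPU and memory resizing without a restart.
apiVersion: v1
kind: Pod
metadata:
  name: resizable-demo
spec:
  containers:
  - name: app
    image: nginx               # illustrative image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired   # CPU can change while the container keeps running
    - resourceName: memory
      restartPolicy: NotRequired
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
```

In recent Kubernetes versions, the running pod can then be resized through the resize subresource, along the lines of `kubectl patch pod resizable-demo --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"1"}}}]}}'`.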
The container-optimized compute platform also includes a pre-enabled high-performance Horizontal Pod Autoscaler (HPA) profile, which delivers:

- Highly consistent horizontal scaling reaction times
- Up to 3x faster HPA calculations
- Higher-resolution metrics, leading to improved scheduling decisions
- Accelerated performance for up to 1,000 HPA objects
All these features are now available out of the box in GKE Autopilot 1.32 or later.
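The HPA profile applies to ordinary HorizontalPodAutoscaler objects, so no new API is involved. As a reference point, a minimal autoscaling/v2 autoscaler targeting CPU utilization looks like this; the names and thresholds are illustrative:

```yaml
# Standard autoscaling/v2 HPA; the fast-scaling profile applies automatically on Autopilot 1.32+.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # hypothetical deployment to scale
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas when average CPU exceeds 70%
```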
The power of the new platform is evident in demonstrations where replica counts are rapidly scaled, showcasing how quickly new pods get scheduled.
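You can run a similar demonstration yourself with standard kubectl commands by scaling a deployment sharply and watching pods get scheduled; the deployment name and label below are illustrative.

```shell
# Scale an existing deployment to 100 replicas,
# then watch how quickly the new pods are scheduled and become Ready.
kubectl scale deployment web --replicas=100
kubectl get pods -l app=web --watch
```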
How to leverage container-optimized compute
To benefit from these improvements, simply create a new Autopilot cluster running GKE version 1.32 or later.
```shell
gcloud container clusters create-auto <cluster_name> \
    --location=<region> \
    --project=<project_id>
```
If your existing cluster is on an older version, upgrade it to 1.32 or newer to take advantage of the container-optimized compute platform's new features.
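Assuming the standard gcloud CLI, an existing cluster's control plane can be upgraded with a command along these lines (placeholders as above):

```shell
# Upgrade the cluster control plane to a 1.32+ version.
gcloud container clusters upgrade <cluster_name> \
    --location=<region> \
    --master \
    --cluster-version=1.32
```

On Autopilot, node upgrades are handled for you once the control plane is updated.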
To optimize performance, we recommend using the general-purpose compute class for your workloads. While the container-optimized compute platform supports various types of workloads, it works best with services that scale gradually and have small resource requests (2 CPU or less), such as web applications.
While the container-optimized compute platform is versatile, it is not currently suitable for certain deployment types:

- One-pod-per-node deployments, such as those enforced through pod anti-affinity
- Batch workloads
The container-optimized compute platform marks a significant leap forward in improving application autoscaling within GKE and will unlock more capabilities in the future. We encourage you to try it out today in GKE Autopilot.