Run your fault-tolerant workloads cost-effectively with Google Cloud Spot VMs, now GA
Spot VMs are generally available today, so you can begin deploying them in your Google Cloud projects and start saving right away. For an overview of Spot VMs, see our Preview launch blog, and for a deeper dive, check out our Spot VM documentation.
Modern applications such as microservices, containerized workloads, and horizontally scalable applications are engineered to keep running even when the underlying machine does not. This architecture lets you use Spot VMs to access capacity and run applications at a low price: Spot VMs are priced 60–91% lower than our on-demand VMs.
To make it even easier to use Spot VMs, we've added Spot VM support to a variety of tools.
Google Kubernetes Engine (GKE)
Containerized workloads are often a good fit for Spot VMs because they are generally stateless and fault tolerant. Google Kubernetes Engine (GKE) provides container orchestration, and with native support for Spot VMs you can now let GKE manage your Spot VMs and capture the cost savings. On clusters running GKE version 1.20 and later, the kubelet graceful node shutdown feature is enabled by default, which allows the kubelet to detect the preemption notice and gracefully terminate the Pods running on the node; GKE then restarts Spot VMs and reschedules the Pods. As part of this launch, Spot VM support in GKE is now GA.
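If you manage your clusters as code, one way to pick this up is a dedicated Spot VM node pool. Below is a minimal, hypothetical sketch using the Terraform Google provider (Terraform support is covered later in this post); the cluster name, location, machine type, and node counts are placeholders rather than recommendations.

```hcl
# Hypothetical example: a Spot VM node pool attached to an existing GKE cluster.
resource "google_container_node_pool" "spot_pool" {
  name               = "spot-pool"
  cluster            = "my-cluster"  # placeholder cluster name
  location           = "us-central1" # placeholder location
  initial_node_count = 1

  # Let the pool grow and shrink with demand, independently of on-demand pools.
  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  node_config {
    machine_type = "e2-standard-4"
    # Provision this pool's nodes as Spot VMs. GKE labels them
    # cloud.google.com/gke-spot=true so workloads can target or avoid them.
    spot = true
  }
}
```

Workloads that cannot tolerate preemption can steer clear of the pool with a node selector or affinity on that label, while batch jobs and other fault-tolerant Pods can target it.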
For best practices on how to use GKE with Spot VMs, see our architectural walkthrough on running web applications on GKE using cost-optimized Spot VMs as well as our GKE Spot VM documentation.
GKE Autopilot Spot Pods
Kubernetes is a powerful and highly configurable system, but not everyone needs that much control and choice. GKE Autopilot is a mode of operating GKE that automatically applies industry best practices to minimize the burden of node management. With GKE Autopilot, your compute capacity is automatically adjusted and optimized based on your workloads' needs. To take efficiency to the next level, mix in Spot Pods to drastically reduce the cost of your nodes. GKE Autopilot handles preemption events gracefully: it redirects requests away from nodes with preempted Spot Pods and manages autoscaling and scheduling so that replacement nodes are created to maintain sufficient resources.
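To make that concrete, here is one hedged sketch of requesting Spot Pods by adding the cloud.google.com/gke-spot node selector to a workload. It uses the Terraform Kubernetes provider to stay consistent with the other snippets in this post; the Deployment name, labels, replica count, and container image are placeholders.

```hcl
# Hypothetical example: a Deployment whose Pods ask Autopilot for Spot capacity.
resource "kubernetes_deployment" "batch_worker" {
  metadata {
    name = "batch-worker"
  }

  spec {
    replicas = 3

    selector {
      match_labels = {
        app = "batch-worker"
      }
    }

    template {
      metadata {
        labels = {
          app = "batch-worker"
        }
      }

      spec {
        # Request Spot Pods on GKE Autopilot.
        node_selector = {
          "cloud.google.com/gke-spot" = "true"
        }

        container {
          name  = "worker"
          image = "nginx:1.25" # placeholder image
        }
      }
    }
  }
}
```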
Spot Pods for GKE Autopilot are now GA, and you can learn more in the GKE Autopilot and Spot Pods documentation.
Terraform
Terraform makes managing infrastructure as code easy, and Spot VM support is now available for Terraform on Google Cloud. Using Terraform templates to define your entire environment, including the networking, disks, and service accounts to use with Spot VMs, makes spinning deployments up and tearing them down a convenient, repeatable process. This is especially important when working with Spot VMs, because the resources should be treated as ephemeral.
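As a rough sketch of what such a template can look like, the resource below requests a Spot VM through the Google provider's scheduling block; the instance name, zone, machine type, and boot image are placeholders.

```hcl
# Hypothetical example: a single Spot VM declared as infrastructure as code.
resource "google_compute_instance" "spot_worker" {
  name         = "spot-worker-1"
  machine_type = "e2-standard-4"
  zone         = "us-central1-a"

  # Request Spot provisioning instead of on-demand capacity.
  scheduling {
    provisioning_model          = "SPOT"
    preemptible                 = true
    automatic_restart           = false
    instance_termination_action = "STOP"
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}
```

Because Spot VMs can be preempted at any time, keeping them fully described in configuration like this makes recreating a preempted deployment a repeatable terraform apply rather than a manual rebuild.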
Terraform works even better in conjunction with GKE to define and manage a node pool separately from the cluster control plane. This combination gives you the best of both worlds: Terraform sets up your compute resources, while GKE handles autoscaling and autohealing to make sure you have sufficient VMs after preemptions.
Slurm
Slurm is one of the leading open-source HPC workload managers, used by many of the TOP500 supercomputers around the world. Over the past five years, we've worked with SchedMD, the company behind Slurm, to release ever-improving versions of Slurm on Google Cloud. SchedMD recently released the newest Slurm for Google Cloud scripts, available through the Google Cloud Marketplace and in SchedMD's GitHub repository. This latest version of Slurm for Google Cloud includes support for Spot VMs via the Bulk API. You can read more about the release in the Google Cloud blog post.