GCP – 11 ways to reduce your Google Cloud compute costs today
As the saying goes, “a penny saved is a penny earned,” and that’s especially true for cloud infrastructure. In today’s competitive business landscape, you need to control costs while maintaining the performance your business needs. Luckily, Google Cloud’s Compute Engine and block storage services offer numerous opportunities to reduce costs without sacrificing performance, especially in the context of your migration and modernization initiatives.
In this article, we’ll explore 11 key ways to optimize your infrastructure spending on Google Cloud, from simple adjustments to strategic decisions that can result in significant long-term savings.
1. Choose the right VM instances
One of the most effective ways to reduce Compute Engine costs is to ensure that you’ve properly selected and right-sized your virtual machines (VMs) for their workloads to support your migration and modernization efforts. Whether you’re new to Google Cloud or already using Compute Engine, adopting the latest-generation VMs — such as N4, C4, C4D, and C4A — can deliver substantial savings and improved price-performance.
Powered by Google Cloud’s Titanium architecture, our latest-generation VMs offer faster CPUs, higher memory bandwidth, and more efficient virtualization than their predecessors, so you can handle the same workloads with fewer resources. For existing customers, migrating from older VM generations to the newest VMs can significantly lower total costs while meeting or exceeding current performance levels. Organizations that have made the switch often report 20–40% better performance along with meaningful reductions in cloud compute spend. For example, Elastic leveraged the general-purpose C4A machine series, based on Google Cloud’s Arm-based Axion CPUs, to achieve a compelling efficiency and performance uplift for their workloads.
Beyond general-purpose VMs, we also offer specialized machine types to address unique customer requirements. Compute-optimized HPC VMs like H4D are designed for high-performance computing and data analytics, offering extreme performance for demanding workloads. M4 and X4 instances cater to memory-intensive applications, while Z3 instances are ideal for storage-intensive workloads. Furthermore, if you need complete control over your hardware environment and maximum performance isolation, we offer bare metal instances.
These options help ensure that even the most specialized and performance-sensitive workloads can find an optimal and cost-effective home within the Compute Engine portfolio.
2. Optimize your block storage selections
The best way to lower your block storage TCO, while keeping your workloads successful, is to drive high resource efficiency. Hyperdisk makes it simple to achieve both high performance and high efficiency in two ways: by letting you optimize your block storage for each workload, and through Storage Pools. Below, we discuss each of these capabilities and how you can use them to lower your block storage TCO.
Workload Optimized: With Hyperdisk, you can independently provision performance and capacity at the volume level, tuning your block storage resources to match each workload. You can leverage this capability to purchase just the capacity and performance you need, no more and no less. And by taking advantage of Hyperdisk Balanced’s baseline performance, which is included free with every volume, you can serve the vast majority of your VMs without purchasing any extra performance.
Storage Pools: Hyperdisk is the only hyperscale cloud block storage to offer thin-provisioned performance and capacity. With Hyperdisk Storage Pools, you provision the aggregate performance and capacity your workloads require, while still provisioning, at the volume level, the capacity and performance each workload requests (also known as thin-provisioning). This lets you pay for the resources you actually need, not the sum of the volumes you’ve provisioned. As a result, you can lower your overall block storage TCO by as much as 50%.
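To see how the thin-provisioning math works out, here is a small sketch. The per-GiB price, volume sizes, and headroom factor are all illustrative placeholders, not actual Hyperdisk rates:

```python
# Hypothetical illustration of thin-provisioned Storage Pool savings.
# Price and volume figures are placeholders, not actual Hyperdisk rates.

PRICE_PER_GIB_MONTH = 0.10  # assumed $/GiB-month for illustration

# Per-volume provisioned capacity (GiB) vs. what each volume actually uses.
volumes = [
    {"provisioned": 500, "used": 200},
    {"provisioned": 1000, "used": 350},
    {"provisioned": 250, "used": 125},
]

# Without a pool: you pay for every provisioned GiB on every volume.
standalone_cost = sum(v["provisioned"] for v in volumes) * PRICE_PER_GIB_MONTH

# With a pool: you pay for the pool's capacity, sized to aggregate usage
# plus headroom, while each volume still "sees" its full provisioned size.
HEADROOM = 1.3  # assumed 30% growth buffer
pool_capacity = sum(v["used"] for v in volumes) * HEADROOM
pool_cost = pool_capacity * PRICE_PER_GIB_MONTH

savings_pct = 100 * (1 - pool_cost / standalone_cost)
print(f"Standalone: ${standalone_cost:.2f}/mo, pooled: ${pool_cost:.2f}/mo, "
      f"savings: {savings_pct:.0f}%")
```

With these made-up numbers the pool lands right around the 50% savings figure cited above; your results depend entirely on how over-provisioned your volumes are today.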
For more information on how to select the right block storage for your workload and to see how customers have benefitted from Hyperdisk, read this blog.
3. Consider custom compute classes
To get the most out of our latest-generation VMs, Google Kubernetes Engine (GKE) custom compute classes (CCC) offer an advanced way to optimize compute choices and provide high availability. Instead of being limited to a single machine type for your workloads, you can define a prioritized list of VM instance types. This allows you to set the newest, most price-performant VMs — including our latest-generation VMs — as your top priority. GKE custom compute classes then automatically and seamlessly spin up instances based on your specified priority list. This helps you maximize the availability of your compute capacity while still aiming for the most cost-effective options, so your workloads can scale reliably without manual intervention.
Here are some specific use cases for how custom compute classes can help you optimize costs:
- Autoscaling cost-performant fallbacks: When demand peaks, you might be tempted to autoscale using a highly available but less cost-efficient VM type. CCC allows you to take a tiered approach. You can set up several cost-efficient fallback alternatives, so that as demand increases, GKE first attempts to use the most cost-effective options, and progressively moves to the other choices in your list when necessary to meet demand.
- AI/ML inference: Running AI/ML inference workloads often involves significant compute resources. Instead of maintaining a large, static reservation that might sit idle during off-peak times, CCC lets you provision a minimal base reservation and leverage more cost-effective capacity types, such as Spot VMs, to handle peak inference demand — all orchestrated through your CCC configuration.
- Adopting new VM generations: Combine the power of GKE custom compute classes with Compute Flexible committed use discounts (Flex CUDs) to de-risk the adoption of new, cost-efficient VM series like N4 and C4. With CCC, you can define fallback options, providing workload resilience, while Flex CUDs offer financial adaptability, as the discounts apply across your total eligible compute spend, regardless of the specific VM series you use. This dual approach is a safe, cost-effective strategy for leveraging the latest hardware without disruption. For more information, read this blog.
- Using flexible Spot VMs: Spot VMs offer significant savings but can be preempted. Being constrained to a single Spot VM shape increases the risk that capacity will not be available. With CCC, you can define multiple fallback Spot VM types. This “spot surfing” capability allows the application to remain on cost-efficient Spot capacity by automatically pivoting to alternative Spot instance types if the primary choice is unavailable.
In short, by leveraging GKE CCC, you can artfully mix and match various VM types and consumption models, including On-Demand, Spot, DWS FlexStart, and instances covered by CUDs, to build a resilient and highly cost-optimized infrastructure that adapts to the unique needs and patterns of your workloads.
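A prioritized fallback list like the ones described above is expressed as a ComputeClass manifest. The sketch below is a minimal illustration: the name is hypothetical, and field support varies by GKE version, so verify field names against the current ComputeClass reference before use.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: cost-optimized        # illustrative name
spec:
  # GKE tries each priority in order; later entries are fallbacks.
  priorities:
  - machineFamily: n4
    spot: true                # cheapest first: Spot capacity on N4
  - machineFamily: c4
    spot: true                # "spot surfing" fallback to another family
  - machineFamily: n4         # on-demand as the last resort
  # If no priority can be satisfied, let GKE scale up anyway.
  whenUnsatisfiable: ScaleUpAnyway
```

Workloads opt in by selecting the compute class, and GKE walks the priority list at scale-up time.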
4. Leverage custom machine types (CMT)
Custom machine types, available on N4 VMs, allow you to precisely configure virtual machines to your exact specifications. Rather than selecting from predefined machine types that might include excess capacity, you can tailor the CPU-to-memory ratio specifically for your workloads, so you only pay for resources you actually use. This targeted approach minimizes waste and can significantly reduce your cloud spend, especially when migrating from on-premises to Google Cloud or from other cloud providers.
This flexibility becomes particularly valuable if your applications have unique resource profiles that don’t align well with our standard offerings. Custom machine types let you create the perfect environment for your needs. By avoiding the compromise of over-provisioning certain resources while potentially constraining others, you can achieve both better performance and more efficient spending across your Compute Engine deployment.
As an example, take a memory-intensive workload that runs best with 16 vCPUs and 70 GB of memory. With standard shapes, on Google Cloud or other clouds, you would need to pick a VM with 128 GB of memory, resulting in higher costs due to the extra provisioned resources. Instead, with custom machine types, you can launch a VM with exactly 16 vCPUs and 70 GB of memory, resulting in an 18% cost savings versus a standard n4-highmem-16 VM.
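The arithmetic behind that comparison looks like the sketch below. Custom machine types bill linearly per vCPU and per GB of memory; the rates here are made-up placeholders chosen for illustration, not actual N4 pricing:

```python
# Illustrative custom-machine-type cost comparison.
# Rates are placeholders, not real N4 pricing.
VCPU_RATE = 0.030   # assumed $/vCPU-hour
MEM_RATE = 0.0025   # assumed $/GB-hour

def hourly_cost(vcpus: int, memory_gb: float) -> float:
    """Linear cost model: pay per vCPU and per GB provisioned."""
    return vcpus * VCPU_RATE + memory_gb * MEM_RATE

standard = hourly_cost(16, 128)  # highmem-16-like shape: 16 vCPU, 128 GB
custom = hourly_cost(16, 70)     # custom shape matched to the workload

savings_pct = 100 * (1 - custom / standard)
print(f"standard: ${standard:.3f}/h, custom: ${custom:.3f}/h, "
      f"savings: {savings_pct:.1f}%")
```

With these placeholder rates the 58 GB of unneeded memory accounts for roughly the 18% savings quoted above; the exact percentage depends on the real per-vCPU and per-GB rates in your region.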
5. Make the most of committed use discounts
Committed use discounts (CUDs) are a strategic cost-saving opportunity for organizations with steady, predictable computing needs. By committing to resource usage over one- or three-year periods, you can reduce cloud costs by up to 70% compared to on-demand pricing. This approach not only helps ensure budget predictability but also converts fixed infrastructure spending into a financial advantage, making it ideal for stable workloads that support core business functions.
Google Cloud offers flexible CUD structures to align with various operational models. Resource-based commitments target specific machine types and regions, while flexible commitments apply discounts across projects, regions, and machine series, making them well suited to dynamic environments. By analyzing historical usage and forecasting future needs, you can identify workloads suited for these discounts, reinvesting the savings into innovation and scaling initiatives.
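A quick way to sanity-check a commitment is to find the utilization level at which a CUD beats on-demand. Because a commitment is billed around the clock at the discounted rate while on-demand is billed only for hours used, the break-even point falls straight out of the discount. The discount figures below are illustrative, not official rates:

```python
# Break-even utilization for a committed use discount (CUD).
# A CUD is billed 24/7 at the discounted rate; on-demand is billed
# only for hours actually used. Discounts below are illustrative.

def breakeven_utilization(discount: float) -> float:
    """Fraction of hours a VM must run for a CUD to beat on-demand.

    cud_cost = (1 - discount) * on_demand_rate * total_hours
    od_cost  = on_demand_rate * used_hours
    The CUD wins when used_hours / total_hours > (1 - discount).
    """
    return 1.0 - discount

print(f"37% discount -> break-even at {breakeven_utilization(0.37):.0%} utilization")
print(f"55% discount -> break-even at {breakeven_utilization(0.55):.0%} utilization")
```

In other words, an illustrative 37% discount pays off once the resource runs more than about 63% of the time, which is why CUDs fit steady, always-on workloads best.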
6. Manage unused disk space
You pay for the total provisioned disk space, regardless of how much you actually use. Many organizations tend to over-provision storage “just in case,” which often leads to unnecessary and costly waste. For instance, if you provision a 100GB disk but only use 20GB, you’re still paying for the entire 100GB. Being intentional and precise with your storage allocations — rather than rounding up to common sizes — can lead to significant cost savings.
To optimize spending, it’s important to adopt a few best practices. Using Ops Agent, regularly audit disk usage across your infrastructure to identify and eliminate inefficiencies. Resize disks to align with actual consumption, allowing a reasonable buffer for growth. Implement automated alerts in Cloud Monitoring to detect underutilized disks and take corrective action. For stateless applications, consider using smaller boot disk images to minimize overhead and reduce costs even further.
In addition, consider the following optimization strategies to further reduce costs and improve efficiency:
- Use Google Cloud’s monitoring tools to track CPU, memory, and disk usage over time.
- Establish a regular review cycle to identify and right-size over-provisioned resources.
- Test workloads across different VM configurations to find the optimal balance between cost and performance.
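To put numbers on the 100GB/20GB example above, a simple audit can rank disks by wasted spend. In the sketch below, the per-GB price is a placeholder and the disk names and usage figures are hypothetical stand-ins for data you would pull from Ops Agent metrics in Cloud Monitoring:

```python
# Rank disks by monthly spend on provisioned-but-unused space.
# Price, disk names, and usage figures are hypothetical placeholders.
PRICE_PER_GB_MONTH = 0.04  # assumed $/GB-month

disks = [
    {"name": "web-boot", "provisioned_gb": 100, "used_gb": 20},
    {"name": "db-data", "provisioned_gb": 500, "used_gb": 410},
    {"name": "ci-cache", "provisioned_gb": 200, "used_gb": 35},
]

def monthly_waste(disk: dict) -> float:
    """Dollars per month spent on provisioned space that sits unused."""
    return (disk["provisioned_gb"] - disk["used_gb"]) * PRICE_PER_GB_MONTH

# Worst offenders first, with their utilization for context.
for disk in sorted(disks, key=monthly_waste, reverse=True):
    util = disk["used_gb"] / disk["provisioned_gb"]
    print(f"{disk['name']}: {util:.0%} utilized, "
          f"${monthly_waste(disk):.2f}/mo wasted")
```

Sorting by absolute waste rather than utilization percentage matters: a large disk at 80% utilization can still waste more dollars than a small disk at 20%.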
7. Use Spot VMs
Spot VMs provide the same machine types and configuration options as standard virtual machines but at a significantly reduced cost — typically offering a 60% to 91% discount. This cost efficiency comes with the tradeoff of potential preemption at short notice, making them most suitable for workloads that are fault-tolerant and can recover quickly from unexpected interruptions. Spot VMs are designed to take advantage of unused compute capacity, allowing you to optimize your cloud spending without compromising access to high-performance resources.
Strong use cases for Spot VMs include batch processing jobs, big data and analytics workloads, continuous integration and deployment (CI/CD) pipelines, stateless web servers running in autoscaling groups, and compute-heavy tasks. When properly architected to handle interruptions — for example, by using job checkpointing, load balancing, task queues, or via GKE custom compute classes (see more above) — Spot VMs can play a critical role in minimizing infrastructure costs while maintaining high availability and system resilience. Leveraging Spot VMs in these scenarios lets you scale cost-effectively, especially when compute demand is variable or time-flexible.
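As a back-of-the-envelope check on what the 60% to 91% discount range means for a fault-tolerant batch fleet, consider the sketch below. The on-demand rate and fleet size are placeholders, not published prices:

```python
# Illustrative Spot VM savings for a nightly batch fleet.
# The on-demand rate and fleet shape are placeholders, not real prices.
ON_DEMAND_RATE = 0.20  # assumed $/hour per VM
VMS = 50
HOURS = 8              # nightly batch window

on_demand_cost = ON_DEMAND_RATE * VMS * HOURS
for discount in (0.60, 0.91):  # ends of the Spot discount range
    spot_cost = on_demand_cost * (1 - discount)
    print(f"{discount:.0%} discount: ${spot_cost:.2f}/night "
          f"vs ${on_demand_cost:.2f} on-demand")
```

Even at the low end of the range, the fleet costs less than half of its on-demand equivalent, which is why Spot capacity plus checkpointing is often the default choice for interruptible batch work.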
8. Use optimization recommendations
Google Cloud’s Recommenders are a powerful tool designed to help you optimize your cloud resources efficiently. When browsing the Google Cloud console, you may see lightbulb icons next to specific resources — these indicate potential improvements identified by Google’s recommendation engine. By analyzing real-time usage patterns and current resource configurations, the Recommender delivers actionable insights tailored to each user’s unique environment. This intelligent system highlights opportunities not only to reduce costs but also to enhance security, performance, reliability, management efficiency, and environmental sustainability.
For example, idle VM recommendations help you identify VM instances that have gone unused over the last 1 to 14 days. Common recommendations include switching to more suitable machine types, rightsizing underutilized compute instances, or adopting more cost-effective storage solutions. The tool allows you to apply many of these changes directly, streamlining the optimization process. By continuously evaluating workloads and offering these automated, data-driven suggestions, the Recommender helps organizations maintain cloud performance while managing costs more effectively.
9. Take advantage of auto-scaling and scheduling
Matching your compute resources to actual demand patterns is one of the most effective ways to reduce cloud waste and improve overall cost efficiency. Many organizations over-provision their resources to handle peak workloads, leaving machines underutilized during off-peak periods. By aligning compute capacity more closely with real-time or predictable usage patterns, such as business hours or seasonal trends, you can significantly cut unnecessary spending without sacrificing performance.
Autoscaling is the key to achieving this efficiency. In fact, customers who leverage Google Compute Engine autoscaling for their virtual machines have seen average infrastructure cost savings of more than 40%.
You can implement autoscaling strategies to dynamically adjust resources based on CPU utilization, load balancing capacity, or custom application metrics, so that workloads receive the necessary compute power when needed, while scaling down automatically during low-demand periods.
For workloads with predictable patterns, such as those that fluctuate with business hours or planned seasonal events, schedule-based scaling is a particularly powerful tool. This approach allows you to proactively increase resources in anticipation of high demand and scale them down during lulls, for the performance you need without constant over-provisioning.
In addition to autoscaling, several practical implementation techniques can further optimize your resource usage. Setting up instance scheduling lets you automatically start and stop development and test environments according to business hours — a simple yet highly effective approach that can lead to cost savings of up to 70%. You can also leverage maintenance windows to reduce disruptions and resource consumption, by concentrating updates and system changes into low-usage periods. Together, these tactics help maintain high availability and performance while keeping infrastructure costs under control.
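The "up to 70%" figure for instance scheduling follows directly from the calendar: a dev or test environment that runs only during business hours is on for a small fraction of the week. A quick sketch, assuming a 10-hour weekday window:

```python
# Savings from stopping dev/test VMs outside business hours.
# Assumes a 10-hour weekday window; adjust to your own schedule.
HOURS_PER_WEEK = 24 * 7      # 168

business_hours = 10 * 5      # e.g. 08:00-18:00, Mon-Fri = 50 h/week
always_on_fraction = business_hours / HOURS_PER_WEEK
savings_pct = 100 * (1 - always_on_fraction)
print(f"Running {business_hours}h/week instead of 24/7 saves {savings_pct:.0f}%")
```

Fifty hours out of 168 is about 30% of the week, so shutting environments down outside that window trims roughly 70% of their compute cost.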
10. Understand your spend with detailed billing analysis
Before implementing any cost-saving strategies in Google Cloud, it’s essential to understand your current spending in detail. Google Cloud’s billing panel offers granular visibility into your expenses, including costs broken down by individual SKUs. This level of transparency lets you track where your money is going and identify potential inefficiencies. Begin by regularly reviewing your billing dashboard to monitor usage trends and spot anomalies. Applying labels and tags to your resources can further help categorize and attribute costs accurately, especially in complex environments with multiple projects or departments.
In addition, setting up budget alerts is a practical way to stay ahead of overspending by notifying you when costs approach or exceed predefined thresholds. It’s also important to identify and eliminate unused or idle resources, such as virtual machines or persistent disks that are no longer in active use — these can often be shut down or deleted to immediately reduce costs. By thoroughly analyzing your cost structure, you can uncover “low-hanging fruit” — resources that provide little or no value — and make data-driven decisions to optimize your cloud usage efficiently.
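Labels pay off when you aggregate billing data by them. The sketch below shows that roll-up on hypothetical rows standing in for a billing export; the teams, costs, and label keys are all made up for illustration:

```python
from collections import defaultdict

# Hypothetical billing rows: (cost in $, labels on the resource).
# These stand in for rows you would read from a billing export.
rows = [
    (120.0, {"team": "web", "env": "prod"}),
    (45.5, {"team": "web", "env": "dev"}),
    (300.0, {"team": "data", "env": "prod"}),
    (12.0, {}),  # unlabeled spend, worth chasing down
]

def cost_by(label: str, billing_rows) -> dict:
    """Sum spend per value of one label; unlabeled goes to 'unattributed'."""
    totals = defaultdict(float)
    for cost, labels in billing_rows:
        totals[labels.get(label, "unattributed")] += cost
    return dict(totals)

print(cost_by("team", rows))
```

The "unattributed" bucket is itself a useful signal: spend that no team claims is often exactly the idle resource worth shutting down.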
11. Consider serverless alternatives
Last but not least, Google Cloud’s serverless computing offerings provide a compelling alternative to traditional virtual machines, delivering better cost efficiency, simplified operations, and greater scalability. By abstracting away infrastructure management, serverless platforms allow teams to focus on writing and deploying code without worrying about provisioning, scaling, or maintaining servers. This shift not only reduces operational overhead but also cuts costs by aligning compute spending directly with application usage.
There are multiple serverless options available, each tailored to different workloads. Cloud Run is designed for running containerized applications that need rapid scaling and flexible deployment. Cloud Run Functions supports lightweight, event-driven code execution for microservices or automation tasks. GKE in Autopilot mode simplifies Kubernetes operations by automatically managing nodes and scaling, allowing you to run Kubernetes workloads without handling the underlying infrastructure. All of these options charge based on usage, not allocation, significantly reducing the costs associated with idle resources and over-provisioning; this makes them especially beneficial for variable or unpredictable workloads. Both Cloud Run and GKE support GPUs, and you have the flexibility to move between the two: you can start with Cloud Run and later move to GKE, or vice versa, and some customers use both. The rule of thumb is to start with GKE if you need access to the Kubernetes API; otherwise, start with Cloud Run.
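The "usage, not allocation" billing difference is easy to see in a toy model. The rates below are placeholders for illustration, not published Cloud Run or Compute Engine prices:

```python
# Toy comparison: always-on VM vs. request-driven serverless billing.
# All rates are illustrative placeholders, not published prices.
VM_RATE = 0.05          # assumed $/hour, billed 24/7
SERVERLESS_RATE = 0.05  # assumed $/hour of *active* request handling

def monthly_cost(active_hours_per_day: float) -> tuple:
    """Return (VM cost, serverless cost) for a 30-day month."""
    vm = VM_RATE * 24 * 30
    serverless = SERVERLESS_RATE * active_hours_per_day * 30
    return vm, serverless

for active in (2, 12, 24):
    vm, sls = monthly_cost(active)
    print(f"{active}h/day active: VM ${vm:.2f} vs serverless ${sls:.2f}")
```

Even at identical hourly rates, serverless only pulls ahead when the service sits idle part of the day; a workload busy around the clock costs the same either way, which is why serverless shines for spiky, variable traffic.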
Start reducing your costs today
Migrate to Google Cloud and optimize your infrastructure costs without compromising on what your workloads need. If you are new to Google Cloud, start with a migration assessment. Google Cloud’s Migration Center can give you a clear understanding of your potential savings from migrating to Google Cloud, with detailed recommended paths for your workloads along with TCO reports. Apply the strategies in this article and unlock substantial cost savings.