GCP – Selecting the right Hyperdisk block storage for your workloads
As you adopt Google Cloud or migrate to the latest Compute Engine VMs or to Google Kubernetes Engine (GKE), selecting the right block storage for your workload is crucial. Hyperdisk, Google Cloud’s workload-optimized block storage that’s designed for our latest VM families (C4, N4, M4, and more), delivers high-performance storage volumes that are cost-efficient, easily managed at scale, and enterprise-ready. In this post, we guide you through the basics and help you choose the optimal Hyperdisk for your environment.
Introduction to Hyperdisk block storage
With Hyperdisk, you can independently tune capacity and performance to match your block storage resources to your workload. Hyperdisk is available in a few flavors:
- Hyperdisk Balanced: Designed to fit most workloads, offering the best balance of price and performance. It also serves as the boot disk for your compute instances. With Hyperdisk Balanced, you can independently configure the capacity, throughput, and IOPS of each volume (see the provisioning sketch after this list). Hyperdisk Balanced is available in High Availability and Multi-Writer modes.
- Hyperdisk Extreme: Delivers the highest IOPS of all Hyperdisk offerings and is suited for high-end, performance-critical databases. With Hyperdisk Extreme, you can drive up to 350K IOPS from a single volume.
- Hyperdisk Throughput: Delivers capacity priced like cold object storage, with the semantics of a disk. Hyperdisk Throughput offers high throughput for bandwidth- and capacity-intensive workloads that do not require low latency. It can also deliver cost-effective disks for cost-sensitive workloads (e.g., cold disks).
- Hyperdisk ML: Purpose-built for loading static data into your compute clusters. With Hyperdisk ML, you hydrate a volume with a fixed data set (such as model weights or binaries), then connect up to 2,500 compute instances to that same volume in read-only mode, so a single volume can serve over 150x more compute instances than competitive block storage volumes1. You get exceptionally high aggregate throughput across all of those nodes, enabling you to accelerate inference startup, train models faster, and keep your valuable compute resources highly utilized.
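To make the independent tuning concrete, here is a minimal sketch of provisioning a Hyperdisk Balanced volume with the google-cloud-compute Python client. The project, zone, and sizing values are illustrative assumptions, not recommendations.

```python
from google.cloud import compute_v1

# Sketch: create a Hyperdisk Balanced volume whose capacity, IOPS, and
# throughput are each provisioned independently. Values are illustrative.
def create_balanced_disk(project: str, zone: str, name: str) -> None:
    disk = compute_v1.Disk(
        name=name,
        size_gb=500,                              # capacity
        type_=f"zones/{zone}/diskTypes/hyperdisk-balanced",
        provisioned_iops=10_000,                  # IOPS, tuned separately
        provisioned_throughput=400,               # throughput in MiB/s
    )
    # insert() returns an operation; result() blocks until the disk exists.
    compute_v1.DisksClient().insert(
        project=project, zone=zone, disk_resource=disk
    ).result()

create_balanced_disk("my-project", "us-central1-a", "balanced-disk-1")
```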
You can also leverage Hyperdisk Storage Pools, which lower TCO and simplify operations: you pre-provision an aggregate amount of capacity and performance, which is then dynamically consumed by the volumes in the pool. You create a storage pool with the aggregate capacity and performance your workloads will need, create disks in that pool, and attach the disks to your VMs. When you create the disks, you can give them a much larger size or provisioned performance limit than they currently need. This simplifies planning and leaves room for growth, without later changes to each disk’s provisioned size or performance.
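As a sketch of that flow, assuming the storage-pool surface as exposed by the compute_v1 client (the pool type, sizing numbers, and names below are illustrative):

```python
from google.cloud import compute_v1

PROJECT, ZONE = "my-project", "us-central1-a"  # illustrative

# Pre-provision an aggregate pool of capacity and performance.
pool = compute_v1.StoragePool(
    name="db-pool",
    storage_pool_type=f"zones/{ZONE}/storagePoolTypes/hyperdisk-balanced",
    pool_provisioned_capacity_gb=10_240,    # ~10 TiB shared by all disks
    pool_provisioned_iops=50_000,
    pool_provisioned_throughput=1_024,      # MiB/s
)
compute_v1.StoragePoolsClient().insert(
    project=PROJECT, zone=ZONE, storage_pool_resource=pool
).result()

# Disks created in the pool draw capacity and performance from the pool,
# so each disk can be sized generously without being paid for individually.
disk = compute_v1.Disk(
    name="db-disk-1",
    size_gb=2_048,
    type_=f"zones/{ZONE}/diskTypes/hyperdisk-balanced",
    storage_pool=f"zones/{ZONE}/storagePools/db-pool",
)
compute_v1.DisksClient().insert(
    project=PROJECT, zone=ZONE, disk_resource=disk
).result()
```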
You can also use a set of comprehensive data protection capabilities, such as high availability, cross-region replication and recovery, backup, and snapshots, to protect your business-critical workloads.
For specifics around capabilities, capacity, machine support, and performance, please visit the documentation.
Recommendations for the most common workloads
To make choosing the right Hyperdisk architecture simpler, here are high-level recommendations for some of the most common workloads we see. For an enterprise, the Hyperdisk portfolio lets you optimize an entire three-tier application by matching the needs of each component to the appropriate flavor of Hyperdisk.
Enterprise applications, including general-purpose databases
Hyperdisk Balanced combined with Storage Pools offers an excellent solution for a wide variety of general-purpose workloads, including common database workloads. Hyperdisk Balanced can meet the IOPS and throughput needs of most databases, including ClickHouse, MySQL, and PostgreSQL, at general-purpose pricing, and it offers 160K IOPS per volume — 10x better than AWS EBS gp3 volumes2. With Storage Pools, you can enhance efficiency and radically simplify planning: customers can save approximately 20-40% on storage costs for typical database workloads compared to standalone Hyperdisk Balanced volumes or AWS EBS gp3 volumes3.
“At Sentry.io, a platform used by over 4 million developers and 130,000 teams worldwide to quickly debug and resolve issues, adopting Google Cloud’s Hyperdisk has enabled us to create a flexible architecture for the next generation of our Event Analytics Platform, a product at the core of our business. Hyperdisk Storage Pools with advanced capacity and performance enabled us to reduce our planning cycles from weeks to minutes, while saving 37% in storage costs, compared to persistent disks.” – Dave Rosenthal, CTO, Sentry
“High Availability is essential for Blackline — we run database failover clustering, at massive scale, for our global and mission-critical deployment of Financial Close Management. We are excited to bring this workload to Google Cloud, leveraging Hyperdisk Balanced High Availability to meet the performance, capacity, cost-efficiency, and resilience requirements that our customers demand and to address our customers’ financial regulatory needs globally.” – Justin Brodley, SVP of Cloud Engineering and Operations, Blackline
Tier-0 databases
For high-end, performance-critical databases like SAP HANA, SQL Server, and Oracle Database, Hyperdisk Extreme delivers uncompromising performance. With Hyperdisk Extreme, you can obtain up to 350K IOPS and 10 GiB/s of throughput from a single volume.
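For example, a minimal sketch of a maximally provisioned Hyperdisk Extreme volume for a Tier-0 database (the project, zone, name, and capacity are illustrative assumptions):

```python
from google.cloud import compute_v1

# Sketch: Hyperdisk Extreme provisioned at the per-volume IOPS ceiling
# cited above, for a performance-critical database.
zone = "us-central1-a"
disk = compute_v1.Disk(
    name="hana-data-1",                    # illustrative name
    size_gb=4_096,
    type_=f"zones/{zone}/diskTypes/hyperdisk-extreme",
    provisioned_iops=350_000,              # per-volume maximum
)
compute_v1.DisksClient().insert(
    project="my-project", zone=zone, disk_resource=disk
).result()
```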
AI, analytics, and scale-out workloads
Hyperdisk offers excellent solutions for the most demanding next-generation machine learning and high performance computing workloads.
Dynamically scaling AI and analytics workloads and high-performance file systems
Workloads with fluctuating demand and high peak throughput and IOPS, such as customer-managed parallel file systems and scratch disks for accelerator clusters, benefit from Hyperdisk Balanced and Storage Pools. Storage Pools’ dynamic resource allocation helps ensure that these workloads get the performance they need during peak times without constant manual adjustments or inefficient over-provisioning. Further, once your Storage Pool is set up, planning at the per-disk level is significantly simpler. Note: if you want a fully managed file system, Managed Lustre is an excellent option to consider.
“Combining our use of cutting-edge machine learning in quantitative trading at Hudson River Trading (HRT) with Google Cloud’s accelerator-optimized machines, Dynamic Workload Scheduler (DWS) and Hyperdisk has been transformative in enabling us to develop [state-of-the-art] models. Hyperdisk storage pools have delivered substantial cost savings, lowering our storage expenses by approximately 50% compared to standard Hyperdisk while minimizing the amount of planning needed.” – Ragnar Kjørstad, Systems Engineer, Hudson River Trading
AI/ML and HPC data-load acceleration
Hyperdisk ML is specifically optimized for accelerating data load times for inference, training, and HPC workloads — it accelerates model load time by 3-5x compared to common alternatives4. Hyperdisk ML is particularly well suited to serving tasks compared to other storage services on Google Cloud because it can concurrently deliver exceptionally high aggregate throughput to many VMs (up to 1.2 TiB/s per volume, more than 100x the performance of competitive offerings)5. You write once (up to 64 TiB per disk) and attach multiple VM instances to the same volume in read-only mode, accelerating data load times for your most expensive compute resources, like GPUs and TPUs. For more, check out g.co/cloud/storage-design-ai.
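As a sketch of the write-once, read-many pattern (the project, instance names, and disk name are illustrative assumptions):

```python
from google.cloud import compute_v1

# Sketch: attach one hydrated Hyperdisk ML volume, read-only, to many VMs.
def attach_read_only(project: str, zone: str, instance: str, disk: str) -> None:
    attached = compute_v1.AttachedDisk(
        source=f"projects/{project}/zones/{zone}/disks/{disk}",
        mode="READ_ONLY",  # read-only mode is what permits wide sharing
    )
    compute_v1.InstancesClient().attach_disk(
        project=project, zone=zone, instance=instance,
        attached_disk_resource=attached,
    ).result()

# The same volume supports up to 2,500 attachments; eight shown here.
for i in range(8):
    attach_read_only("my-project", "us-central1-a",
                     f"inference-vm-{i}", "model-weights")
```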
“At Resemble AI, we leverage our proprietary deep-learning models to generate high-quality AI audio through text-to-speech and speech-to-speech synthesis. By combining Google Cloud’s A3 VMs with NVIDIA H100 GPUs and Hyperdisk ML, we’ve achieved significant improvements in our training workflows. Hyperdisk ML has drastically improved our data loader performance, enabling 2x faster epoch cycles compared to similar solutions. This acceleration has empowered our engineering team to experiment more freely, train at scale, and accelerate the path from prototype to production.” – Zohaib Ahmed, CEO, Resemble AI
“Abridge AI is revolutionizing clinical documentation by leveraging generative AI to summarize patient-clinician conversations in real time. By adopting Hyperdisk ML, we’ve accelerated model loading speeds by up to 76% and reduced pod initialization times.” – Taruj Goyal, Software Engineer, Abridge
High-capacity analytics workloads
For large-scale data analytics workloads like Hadoop and Kafka, which are less sensitive to disk latency fluctuations, Hyperdisk Throughput provides a cost-effective solution with high throughput. Its low cost per GiB and configurable throughput are ideal for processing large volumes of data with low TCO.
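A minimal sketch, with illustrative sizing, of a Hyperdisk Throughput volume for such a workload (the project, zone, and name are assumptions):

```python
from google.cloud import compute_v1

# Sketch: a cost-oriented, high-throughput volume for a Kafka broker.
zone = "us-central1-a"
disk = compute_v1.Disk(
    name="kafka-log-1",                    # illustrative name
    size_gb=16_384,                        # large, low-cost capacity
    type_=f"zones/{zone}/diskTypes/hyperdisk-throughput",
    provisioned_throughput=600,            # MiB/s, configured independently
)
compute_v1.DisksClient().insert(
    project="my-project", zone=zone, disk_resource=disk
).result()
```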
How to size and set up your Hyperdisk
To select and size the right Hyperdisk volume types for your workload, answer a few key questions:
- Storage management. Decide whether you want to manage the block storage for your workloads in a pool or individually. If your workload will have more than 10 TiB of capacity in a single project and zone, consider using Hyperdisk Storage Pools to lower your TCO and simplify planning. Note that Storage Pools do not affect disk performance; however, some data protection features, such as Replication and High Availability, are not supported in Storage Pools.
- Latency. If your workload requires SSD-like latency (i.e., sub-millisecond), it likely should be served by Hyperdisk Balanced or Hyperdisk Extreme.
- IOPS or throughput. If your application requires less than 160K IOPS or 2.4 GiB/s of throughput from a single volume, Hyperdisk Balanced is a great fit. If it needs more than that, consider Hyperdisk Extreme.
- Sizing performance and capacity. Hyperdisk offers independently configurable capacity and performance, allowing you to pay for just the resources you need. You can leverage this to lower your TCO by understanding how much capacity your workload needs (i.e., how much data, in GiB or TiB, is stored on the disks that serve this workload) and the peak IOPS and throughput of those disks. If the workload is already running on Google Cloud, you can see many of these metrics in the console under Metrics Explorer (see the sketch after this list).
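As a sketch of that sizing exercise, here is one way to pull a workload’s peak disk read IOPS from Cloud Monitoring (the API behind Metrics Explorer) and apply the per-volume thresholds above; the project, lookback window, and aggregation choices are assumptions to adapt to your workload:

```python
import time
from google.cloud import monitoring_v3

# Sketch: find peak disk read-IOPS over the past week, then apply the
# rule of thumb above (under 160K IOPS per volume, Balanced fits).
client = monitoring_v3.MetricServiceClient()
now = int(time.time())
results = client.list_time_series(
    request={
        "name": "projects/my-project",  # illustrative project
        "filter": 'metric.type = "compute.googleapis.com/instance/disk/read_ops_count"',
        "interval": monitoring_v3.TimeInterval(
            {"start_time": {"seconds": now - 7 * 24 * 3600},
             "end_time": {"seconds": now}}
        ),
        # ALIGN_RATE converts the delta counter into ops per second.
        "aggregation": monitoring_v3.Aggregation(
            {"alignment_period": {"seconds": 300},
             "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_RATE}
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
peak_read_iops = max(
    (point.value.double_value for series in results for point in series.points),
    default=0.0,
)
volume = "hyperdisk-balanced" if peak_read_iops < 160_000 else "hyperdisk-extreme"
print(f"peak read IOPS: {peak_read_iops:,.0f} -> consider {volume}")
```

Repeat the same query for write ops and for throughput metrics to get a complete picture before provisioning.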
Another important consideration is the level of business continuity and data protection your workloads require. Different workloads have different Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements, each with different costs, so think about your workload tiers when making data-protection decisions. The more critical an application or workload, the lower the tolerance for data loss and downtime: applications critical to business operations likely require zero RPO and an RTO on the order of seconds. Hyperdisk business continuity and data protection helps customers meet the performance, capacity, cost-efficiency, and resilience requirements they demand, and helps them address their financial regulatory needs globally.
Here are a few questions to consider when selecting which variety of Hyperdisk to use for a workload:
- How do I protect my workloads from attacks and malicious insiders? Use Google Cloud Backup vault for cyber resilience (backup immutability and indelibility) and for managed backup reporting and compliance. If you prefer to self-manage your own backups, Hyperdisk standard snapshots are an option.
- How do I protect data from user errors and bad upgrades cost-efficiently, with low RPO/RTO? You can use point-in-time recovery with Instant Snapshots. This feature minimizes the risk of data loss from user error and bad upgrades with ultra-low RPO and RTO — creating a checkpoint is nearly instantaneous.
- How do I easily deploy a critical workload (e.g., MySQL) with resilience across multiple locations? You can utilize Hyperdisk HA, a great fit for scenarios that require high availability and fast failover, such as SQL Server deployments that leverage failover clustering. For such workloads, you can also choose Hyperdisk Balanced High Availability with Multi-Writer support, which lets you run clustered compute with workload-optimized storage across two zones with RPO=0 synchronous replication (see the sketch after this list).
- When a disaster occurs, how do I recover my workload elsewhere quickly and reliably, and run drills to confirm my recovery process? Utilize Hyperdisk Async Replication, which enables cross-region continuous replication and recovery from a regional failure, with fast validation support for disaster recovery drills via cloning. Further, consistency group policies help ensure that workload data distributed across multiple disks is recoverable when a workload needs to fail over between regions.
In short, Hyperdisk provides a wealth of options for matching your block storage to the needs of your workloads. Further, selecting the right Hyperdisk and leveraging features such as Storage Pools can help you lower your TCO and simplify management. To learn more, please visit our website. For tailored recommendations, consult your Google Cloud account team.
1. As of March 2025, based on published information for Amazon EBS and Azure managed disks.
2. As of May 2025, compared to the maximum IOPS per volume of Amazon EBS gp3 volumes.
3. As of March 2025, at list price, for 50 to 150 TiB of capacity, peak IOPS of 25K to 75K, and 25% compressibility, compared to Amazon EBS gp3 volumes.
4. As of March 2025, based on internal Google benchmarking, compared to Rapid Storage, GCSFuse with Anywhere Cache, Parallelstore, and Lustre for larger node sizes.
5. As of March 2025, based on published performance for Microsoft Azure Ultra SSD and Amazon EBS io2 Block Express.
The authors would like to thank David Seidman and Ruwen Hess for their contributions to this post.