GCP – Introducing gcloud storage: up to 94% faster data transfers for Cloud Storage
Cloud Storage customers often ask us about the fastest way to ingest and retrieve data from their buckets. Getting the best performance has often required users to know the right flags and parameters to optimize transfer speeds. In many situations, customers use Cloud Storage alongside other Google Cloud services and are looking for one tool that can manage all their Google Cloud assets.
Introducing gcloud storage – the latest addition to the Google Cloud CLI
The Google Cloud CLI (a.k.a. the gcloud CLI) can be used to create and manage Google Cloud resources and services directly on the command line or via scripts. gcloud storage is the newest addition to this set of commands, and it modernizes the CLI experience for Cloud Storage.
Data Transfer Performance
Data transfer rates matter to customers because they determine how quickly data becomes available for analysis and business insights. The new gcloud storage CLI offers significant performance improvements over the existing gsutil tool, a Python application that lets you access Cloud Storage from the command line.
To demonstrate the performance difference between gsutil and gcloud storage, we tested single-file and multi-file scenarios. When transferring 100 files of 100 MB each, gcloud storage is 79% faster than gsutil on download and 33% faster on upload using a parallel composite upload strategy. See Figure 1. With a single 10 GB file, gcloud storage is 94% faster than gsutil on download and 57% faster on upload. See Figure 2. These tests were performed on Google Cloud Platform using n2d-standard-16 (8 vCPUs, 32 GB memory) and 1x375GB NVME in RAID0 in us-east4.
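The parallel composite upload strategy mentioned above splits one large file into byte ranges, uploads them as separate component objects in parallel, and then composes them into the final object. The sketch below only illustrates the planning step. The function name `plan_components` and the 150 MiB component size are assumptions made for illustration, not gcloud storage's actual logic or defaults. The one grounded constant is that a single Cloud Storage compose request accepts at most 32 source objects.

```python
def plan_components(total_size, component_size, max_components=32):
    """Split the byte range [0, total_size) into contiguous (start, end) pairs.

    Hypothetical helper: if the requested component size would produce more
    than max_components pieces, the size is increased so that one compose
    request can combine all of them.
    """
    if total_size < 0 or component_size <= 0 or max_components <= 0:
        raise ValueError("sizes and max_components must be positive")
    # Smallest component size that keeps the count within max_components.
    min_size = -(-total_size // max_components)  # ceiling division
    size = max(component_size, min_size, 1)
    return [
        (start, min(start + size, total_size))
        for start in range(0, total_size, size)
    ]


if __name__ == "__main__":
    MiB = 1024 * 1024
    # A 10 GiB file with an assumed 150 MiB starting component size.
    ranges = plan_components(10 * 1024 * MiB, 150 * MiB)
    print(len(ranges), "components; first:", ranges[0], "last:", ranges[-1])
```

Each range could then be handed to a separate upload worker, with a final compose call once every component has landed in the bucket.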
Faster transfer rates are a result of two primary innovations in gcloud storage. First, gcloud storage uses faster hashing tools for CRC32C data integrity checking that skip the complicated setup required for gsutil. Second, it utilizes a new parallelization strategy that treats task management as a graph problem, which allows more work to be done in parallel with far less overhead.
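To make the "task management as a graph problem" idea concrete, here is a minimal sketch of a dependency-graph scheduler: every task whose prerequisites are finished is submitted to a worker pool immediately, instead of waiting for a whole batch to complete. This is an illustration under our own assumptions, not the gcloud storage implementation. The names `run_task_graph`, `chunk-0`, and `compose` are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, deps, max_workers=4):
    """Run callables in `tasks` (name -> callable) in dependency order.

    `deps` maps a task name to the set of task names that must finish first.
    Returns a dict of name -> result.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, prereqs in remaining.items():
        for prereq in prereqs:
            dependents.setdefault(prereq, []).append(name)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start with every task that has no prerequisites.
        running = {
            pool.submit(tasks[name]): name
            for name, prereqs in remaining.items()
            if not prereqs
        }
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                # Unblock dependents as soon as their last prerequisite ends.
                for child in dependents[name]:
                    remaining[child].discard(name)
                    if not remaining[child]:
                        running[pool.submit(tasks[child])] = child

    if len(results) != len(tasks):
        raise ValueError("dependency cycle or unknown prerequisite")
    return results


if __name__ == "__main__":
    def make_task(label):
        return lambda: f"{label} done"

    names = ["chunk-0", "chunk-1", "chunk-2", "compose"]
    tasks = {name: make_task(name) for name in names}
    # The compose step waits for all three chunk uploads.
    deps = {"compose": {"chunk-0", "chunk-1", "chunk-2"}}
    print(run_task_graph(tasks, deps))
```

The point of the design is that independent tasks never wait on each other: only real dependencies, such as composing after all chunks are uploaded, impose ordering.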
Improved usability
Beyond the performance improvements, the gcloud CLI provides a consistent way to manage all Google Cloud resources, such as Cloud Storage buckets, Compute Engine VMs, and Google Kubernetes Engine clusters.
gcloud storage automatically detects optimal settings and speeds up transfers without requiring any flags from the users. In gcloud storage, all operations happen in parallel. As an example, parallel composite uploads are enabled automatically based on bucket configuration. This is a vast improvement compared to gsutil, which requires the -m (parallel operations) flag to improve the performance for uploads and downloads.
gcloud storage significantly reduces the number of top-level commands that users need to learn to manage their Cloud Storage resources. It does this by grouping commands under common headers: all bucket operations are grouped under gcloud storage buckets <command>, and all object operations are grouped under gcloud storage objects <command>.
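As a rough illustration of the two points above, the sketch below builds the equivalent command lines as argument lists, the form you would pass to `subprocess.run`. The bucket name `gs://my-bucket`, the local paths, and the helper function names are placeholders chosen for this example. The commands themselves (`gsutil -m cp -r`, `gcloud storage cp --recursive`, `gcloud storage buckets list`, `gcloud storage objects describe`) are standard invocations, but check `--help` for your installed version.

```python
import shlex


def gsutil_copy(source, destination):
    """gsutil needs the top-level -m flag to run the copy in parallel."""
    return ["gsutil", "-m", "cp", "-r", source, destination]


def gcloud_copy(source, destination):
    """gcloud storage parallelizes by default, so no extra flag is needed."""
    return ["gcloud", "storage", "cp", "--recursive", source, destination]


def gcloud_storage(group, command, *args):
    """Build a grouped command: gcloud storage buckets|objects <command>."""
    if group not in ("buckets", "objects"):
        raise ValueError(f"unknown command group: {group}")
    return ["gcloud", "storage", group, command, *args]


if __name__ == "__main__":
    examples = [
        gsutil_copy("./data", "gs://my-bucket/data"),
        gcloud_copy("./data", "gs://my-bucket/data"),
        gcloud_storage("buckets", "list"),
        gcloud_storage("objects", "describe", "gs://my-bucket/report.csv"),
    ]
    for argv in examples:
        # shlex.join renders the list as a copy-pasteable shell command.
        print(shlex.join(argv))
```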
The transition to gcloud storage CLI is simple because we have introduced a shim that enables existing gsutil scripts to be executed as gcloud storage. This allows you to get all the performance benefits of the new CLI without having to rewrite any existing gsutil scripts for Cloud Storage.
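As we understand the gsutil documentation, the shim is switched on through gsutil's boto configuration file by setting `use_gcloud_storage=True` in the `[GSUtil]` section. Treat that option name as something to verify against the docs for your gsutil version. The sketch below writes the setting with Python's `configparser`. The helper name `enable_shim` is ours, and because `configparser` drops comments when it rewrites a file, it is safer to try this on a copy of your configuration first.

```python
import configparser
from pathlib import Path


def enable_shim(boto_path):
    """Set use_gcloud_storage=True under [GSUtil] in the given config file.

    Existing sections and options are preserved; comments are not.
    """
    # Interpolation is disabled so values containing '%' are left untouched.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    path = Path(boto_path)
    if path.exists():
        config.read(path)
    if not config.has_section("GSUtil"):
        config.add_section("GSUtil")
    config.set("GSUtil", "use_gcloud_storage", "True")
    with path.open("w") as handle:
        config.write(handle)


if __name__ == "__main__":
    import tempfile

    # Work on a scratch file rather than a real ~/.boto.
    scratch = Path(tempfile.mkdtemp()) / "boto-copy"
    enable_shim(scratch)
    print(scratch.read_text())
```

With the setting in place, existing gsutil invocations in your scripts stay unchanged while the work is carried out by gcloud storage.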
Enabling gcloud storage
The gcloud storage CLI is available now, and you can use it at no additional charge. Install or upgrade to the latest version of the Google Cloud SDK to get the new CLI. To learn more, refer to the gcloud storage documentation.