GCP – Introducing Cassandra-compatible APIs in Spanner for zero-code-change migration
You heard at Next ‘25 that native support for the Cassandra Query Language (CQL) API in Spanner is available in preview, providing near-zero code-change application migration, along with a suite of complementary tools for zero-downtime data migration.
Spanner is known for its robust SQL support, ACID transactions, and change data capture (CDC), provided as a fully managed service. At the same time, Spanner excels at the basics of NoSQL: single-digit-millisecond read and write latencies, virtually unlimited horizontal scalability, and industry-leading high availability. With the CQL addition to Spanner, migrating your application is typically a one-line code change (your application and CQL statements stay the same), while new data migration tooling simplifies live and bulk migration of data.
Why migrate from Cassandra to Spanner?
But first, let’s talk about why an organization might want to migrate from Cassandra to Spanner in the first place.
NoSQL databases like Apache Cassandra gained popularity for their horizontal scalability and high availability, attributes that are essential for modern, web-scale applications that manage large datasets and user traffic. While Cassandra’s masterless architecture and tunable consistency provide flexibility, some key challenges persist:
-
Cassandra suffers from high total cost of ownership (TCO) via inelastic scaling and the substantial operational overhead it requires.
-
Queries that use secondary indexes may suffer from significant performance impact.
-
CQL lacks complex joins and sophisticated aggregation functionality.
In contrast, Spanner offers several key advantages for demanding NoSQL workloads:
-
Easy operations: Managing Cassandra requires dedicated expertise and effort for tasks such as hardware/VM provisioning, configuration tuning, scaling, and patching and upgrades. With Spanner, Google Cloud manages these operational tasks, dramatically reducing TCO and freeing engineers to focus on your application, not database administration.
-
Optimized elastic scale: Spanner offers truly elastic scalability via an autoscaler that automatically adjusts capacity in response to real-time load, helping avoid overprovisioning for peak workloads. Spanner handles data rebalancing in the background, so as not to impact availability or performance of production workloads.
-
Global ACID transactions: Cassandra’s Lightweight Transactions (LWTs) offer limited compare-and-set functionality within a single partition and cannot guarantee atomicity for multi-partition or multi-step operations without complex, often brittle, application-level workarounds. Spanner supports fully ACID (atomic, consistent, isolated, durable) transactions, simplifying development and helping to ensure data integrity. In fact, Gartner ranks Spanner as #1 in the Lightweight Transactions Use Case and #3 in the OLTP Transactions Use Case in its Critical Capabilities for Cloud Database Management Systems for Operational Use Cases report.
-
Strongly consistent global secondary indexes: Cassandra secondary indexes are local indexes, stored as hidden tables on each node that contains the primary data to be indexed. This means that queries on a secondary index that access multiple nodes can greatly impact your application’s performance. Spanner offers global secondary indexes that can be used to efficiently query data across the entire database. In addition, Spanner’s secondary indexes are strongly consistent, so your different queries all return results that are consistent and up to date. Spanner also offers an index advisor that analyzes your queries to recommend new or altered indexes that can improve query performance.
-
Normalized data for data integrity: Data in Cassandra is often arranged as one query per table, and data is repeated amongst many tables, a process known as denormalization. Data denormalization has several drawbacks: increased data redundancy, inconsistencies between datasets, and increased storage and maintenance requirements as the number of tables grows. In contrast, data in Spanner is normalized, with full support for relational semantics, including enforced foreign keys and rich SQL query capabilities with efficient server-side aggregation.
Using Spanner’s new Cassandra-compatible APIs
The new APIs and the migration tools are meant to provide a seamless experience for your Cassandra to Spanner migrations. Here’s a quick guide:
-
Set up Spanner: Begin by creating a Spanner instance — regional or multi-regional. For each Cassandra keyspace you intend to migrate, create a corresponding database in your Spanner instance. Using the same name as the Cassandra keyspace is recommended to minimize application code modifications.
-
Convert your schema: Utilize the spanner-cassandra-schema-tool to automate schema conversion and Spanner table creation. This tool processes your CQL table definitions, maps Cassandra data types to their Spanner equivalents (e.g., text to STRING(MAX), bigint to INT64, map to JSON), and automatically generates the transformed Spanner tables.
-
Migrate data: Migrate your data from Cassandra to Spanner and subsequently verify its integrity. We typically recommend a two-phased approach to minimize downtime:
-
Live migration (for zero downtime): To achieve a near-zero-downtime migration, set up live migration for incoming data, using the included Docker file or Terraform template, before initiating the bulk data transfer. The tool leverages the ZDM Proxy for dual-writes and the Cassandra-Spanner Proxy to translate CQL into Spanner API calls.
-
Bulk data migration: Use the SourceDB to Spanner Dataflow template for a highly parallelized transfer. This template reads data from Cassandra, performs the necessary transformations, and writes it to Spanner.
-
Data validation: After the bulk data migration is complete, it’s time to validate data accuracy and integrity. Common methods include comparing row counts, sampling row data, or performing an exhaustive comparison between the Cassandra and Spanner data.
-
Switch your application to use the native endpoint for Cassandra: We provide two options for doing this:
-
Embed the Spanner Cassandra Client Library: For applications written in Java or Go, you can include the Spanner Cassandra client library as a dependency. The primary code modification involves changing how you build the CqlSession object to use SpannerCqlSession.builder() and providing the Spanner database URI. Your existing data access logic using the CqlSession interface remains unchanged.
```java
CqlSession session =
    SpannerCqlSession.builder()
        .setDatabaseUri("projects/your_gcp_project/instances/your_spanner_instance/databases/your_spanner_database")
        .withKeyspace("your_spanner_database") // Corresponds to the Spanner database name
        .withConfigLoader(
            DriverConfigLoader.fromFile(new File("/path/to/config/file"))) // Optional: path to driver config
        .build();

// Your existing code using the 'session' object remains unchanged.
```
-
Use the Spanner Cassandra sidecar proxy: For applications where embedding the client library is not feasible or for connecting standard tools like cqlsh, you can deploy the Spanner Cassandra sidecar proxy. This proxy runs as a separate process, often containerized, alongside your application and handles the translation between the Cassandra wire protocol and Spanner’s API.
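To make the schema conversion step above more concrete, here is a minimal Java sketch of the kind of type lookup it describes. It is not the spanner-cassandra-schema-tool's implementation: the class name CqlTypeMappingSketch and the method toSpannerType are hypothetical, and the table holds only the three mappings quoted in this post (text, bigint, and map). Any other type is reported as unknown rather than guessed.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Illustrative lookup of the three CQL-to-Spanner type mappings named in this post. */
final class CqlTypeMappingSketch {

    // Assumption: only the mappings quoted in the post; this is not the tool's full table.
    private static final Map<String, String> CQL_TO_SPANNER = Map.of(
            "text", "STRING(MAX)",
            "bigint", "INT64",
            "map", "JSON");

    /** Returns the Spanner type for a CQL type, or empty if this sketch does not cover it. */
    static Optional<String> toSpannerType(String cqlType) {
        // Normalize case and drop type parameters, e.g. "map<text, text>" becomes "map".
        String base = cqlType.trim().toLowerCase(Locale.ROOT);
        int angle = base.indexOf('<');
        if (angle >= 0) {
            base = base.substring(0, angle).trim();
        }
        return Optional.ofNullable(CQL_TO_SPANNER.get(base));
    }

    public static void main(String[] args) {
        for (String cqlType : new String[] {"text", "bigint", "map<text, text>", "duration"}) {
            String mapped = toSpannerType(cqlType).orElse("(not covered by this sketch)");
            System.out.println(cqlType + " -> " + mapped);
        }
    }
}
```

Returning an empty Optional for unlisted types keeps the sketch honest: a type it does not know is surfaced for review instead of being silently mapped.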
At this point, your application and data should be successfully migrated to Spanner. We have more detailed migration instructions available in the documentation.
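Before you treat the migration as complete, the data validation step above (comparing row counts and sampled rows) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not part of the migration tooling: the class MigrationValidationSketch and its methods are hypothetical, and rows are modeled as in-memory maps standing in for whatever your Cassandra and Spanner queries return. It also assumes each row includes its primary key, so identical rows never repeat within a sample.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper that compares a sample of rows read from both databases. */
final class MigrationValidationSketch {

    /** SHA-256 over column names and values, visited in sorted column order. */
    static String fingerprint(Map<String, Object> row) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, Object> e : new TreeMap<>(row).entrySet()) {
                digest.update((e.getKey() + "=" + e.getValue() + "\n").getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // SHA-256 is mandatory on every Java platform
        }
    }

    /** Returns human-readable discrepancies; an empty list means the samples match. */
    static List<String> compare(List<Map<String, Object>> cassandraRows,
                                List<Map<String, Object>> spannerRows) {
        List<String> problems = new ArrayList<>();
        if (cassandraRows.size() != spannerRows.size()) {
            problems.add("row count differs: " + cassandraRows.size() + " vs " + spannerRows.size());
        }
        Set<String> source = new HashSet<>();
        for (Map<String, Object> row : cassandraRows) source.add(fingerprint(row));
        Set<String> target = new HashSet<>();
        for (Map<String, Object> row : spannerRows) target.add(fingerprint(row));

        Set<String> missing = new HashSet<>(source);
        missing.removeAll(target);
        Set<String> unexpected = new HashSet<>(target);
        unexpected.removeAll(source);
        if (!missing.isEmpty()) problems.add(missing.size() + " source row(s) have no identical target row");
        if (!unexpected.isEmpty()) problems.add(unexpected.size() + " target row(s) have no identical source row");
        return problems;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> cassandra = List.of(
                Map.<String, Object>of("id", 1L, "name", "alice"),
                Map.<String, Object>of("id", 2L, "name", "bob"));
        List<Map<String, Object>> spanner = List.of(
                Map.<String, Object>of("name", "bob", "id", 2L),
                Map.<String, Object>of("name", "alice", "id", 1L));
        System.out.println(compare(cassandra, spanner).isEmpty() ? "match" : "mismatch"); // prints "match"
    }
}
```

Fingerprints compare values by their string form, so in practice you would first normalize any column whose representation changes during conversion (for example, a Cassandra map stored as JSON in Spanner).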
Spanner is cost-effective for your Cassandra workloads
Now that you’ve migrated your workloads to Spanner, you may wonder about costs. In fact, Spanner is a cost-effective alternative to Cassandra. Here’s why:
-
Fully managed service: This reduces your operational costs.
-
License-free, usage-based billing: Unlike commercial Cassandra distributions, where per-node licensing fees can accumulate, Spanner has no licensing fees. Spanner usage is billed hourly, meaning no upfront costs, and you only pay for what you use.
-
Elastic scaling with Autoscaler: Spanner’s Autoscaler allows precise scaling according to workload requirements. Avoid over-provisioning for peak performance and pay only for the resources you consume.
-
Granular instances: Start small with a Spanner granular instance (100 processing units) for as little as $65/month.
-
Built-in observability dashboards: No need to pay extra to store and analyze your usage logs. The Spanner console has everything you need.
-
Tiered storage: For storage-heavy workloads, Spanner offers tiered storage options that can be up to 80% cheaper than solid-state drive (SSD) storage.
-
Incremental backups: Benefit from substantial cost savings on backups with Spanner’s incremental backups.
A recent Total Economic Impact study by Forrester Consulting found that Spanner delivered a 132% ROI and $7.74 million in total benefits over three years for a composite organization representative of interviewed customers. These gains largely stemmed from retiring self-managed databases and leveraging Spanner’s elastic scalability and built-in, hands-free high availability operations.
Migrating to Spanner is just the starting point
Migrating your Cassandra application to Spanner is the first step to unlocking a world of new possibilities for your data. While your application continues interacting with Spanner via CQL, you can build new microservices that access the same data. You can even consider moving your connecting ETL pipelines to Spanner, leveraging Spanner’s full multi-model capabilities. You can also implement sophisticated graph queries to analyze relationships between users, utilize full-text search across user profiles or product catalogs, or add vector embedding support for recommendation engines or similarity searches, combining transactional data with AI/ML workloads directly. Lastly, Spanner is tightly integrated with BigQuery, allowing you to unlock valuable insights and drive better decision-making by connecting operational and analytical workloads.
Get started today by testing our migration kit via the Spanner free trial. We believe this will empower next-gen application builders like you to modernize your Cassandra workloads and unlock the full potential of your data on Google Cloud with Spanner.