GCP – How Yahoo Calendar broke free from hardware queues and DBA bottlenecks
Editor’s note: Yahoo Mail is in the midst of one of its largest infrastructure transformations to date: a multi-year effort to modernize hundreds of petabytes of services by moving to Google Cloud. The Yahoo Mail migration, a high-scale, always-on workload, began with Yahoo Calendar, a product that is an essential part of the experience for hundreds of millions of Yahoo Mail users. It was a massive undertaking with no room for error, and the result was a smooth cutover with no customer impact that proved Cloud SQL could handle the complexity and pace of Yahoo’s operations. It also marked a shift in how Yahoo works by reducing manual overhead, unlocking developer agility, and laying the foundation for what comes next.
At Yahoo, we knew that migrating our cornerstone platform, Yahoo Mail, to the cloud would be one of the most significant infrastructure efforts we’d ever taken on. With over 500 petabytes of interconnected systems, we needed to start with a smaller, high-impact workload to build early confidence. That’s how Yahoo Calendar, a product that is an essential part of the experience for hundreds of millions of Yahoo Mail users, became the first production service to make the move.
We needed to migrate a high-scale, always-on service without disrupting the experience users rely on every day — or risk millions of people missing standups, birthday dinners, or that dentist appointment they actually remembered to schedule.
We chose Google Cloud to help us modernize our operations with managed infrastructure, reduce manual effort, and tap into a trusted ecosystem for large-scale transformation. Migrating Yahoo Calendar became our proving ground for running mission-critical services on Cloud SQL and would set the pace for the rest of our multi-year migration plan for Yahoo Mail.
Modernizing infrastructure without skipping a single invite
The infrastructure we were replacing included tens of on-premises MySQL (Percona) instances. It was solid but not built for operational speed. Scaling meant filing hardware requests and often waiting weeks or even months. Routine tasks like backups or upgrades had to go through a separate database administration (DBA) team. And as demand grew, the need for agility grew with it. To meet that growing need with more flexibility and speed, we took on a massive lift:
- Migrating tens of database shards across multiple regions
- Moving more than 20 TB of storage (excluding replicas)
- Supporting peak traffic of 1 million QPS reads and 2,500 QPS writes
- Replatforming our application stack to run on Google Kubernetes Engine (GKE) to support the Calendar experience for the hundreds of millions of Yahoo Mail users
Cloud SQL’s support for our existing MySQL workloads with minimal changes let us replicate our on-prem shards without a full re-architecture. That compatibility provided the foundation to restructure our full stack. To make it all work, we migrated the UI, API, and backend to GKE and connected everything to Cloud SQL deployments in multiple Google Cloud regions. All of this had to be migrated incrementally, with no downtime for public users. Traffic continued flowing through existing endpoints, and our proxy layer routed requests based on each user’s location and migration state. As database shards became ready, we carefully flipped them into read-write mode on Cloud SQL, keeping Calendar users running on schedule while we shifted the backend in stages.
Fig. 1 – MySQL On-Prem to Cloud SQL Initial Load + CDC
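To make the routing idea concrete, here is a minimal sketch of how a proxy layer might pick a backend from a user's location and shard migration state. It is an illustration under stated assumptions, not Yahoo's actual implementation: the names `ShardState`, `Backend`, `route`, and `flip_to_cloud`, the shard IDs, and the region placeholders are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ShardState(Enum):
    ON_PREM = "on_prem"            # on-prem MySQL still serves this shard
    CLOUD_READ_WRITE = "cloud_rw"  # shard has been flipped to Cloud SQL


@dataclass(frozen=True)
class Backend:
    platform: str  # "on_prem" or "cloud_sql"
    region: str


# Hypothetical migration registry: shard id -> current state.
SHARD_STATE = {
    "shard-01": ShardState.CLOUD_READ_WRITE,
    "shard-02": ShardState.ON_PREM,
}

# Hypothetical mapping from a user's location to a serving region.
REGION_FOR_LOCATION = {"us": "region-a", "eu": "region-b"}
DEFAULT_REGION = "region-a"


def route(user_shard: str, user_location: str) -> Backend:
    """Pick a backend from the user's shard migration state and location."""
    # Unknown shards are treated as not yet migrated, the conservative default.
    state = SHARD_STATE.get(user_shard, ShardState.ON_PREM)
    region = REGION_FOR_LOCATION.get(user_location, DEFAULT_REGION)
    if state is ShardState.CLOUD_READ_WRITE:
        return Backend("cloud_sql", region)
    return Backend("on_prem", region)


def flip_to_cloud(shard: str) -> None:
    """Record that a shard is now read-write on Cloud SQL."""
    SHARD_STATE[shard] = ShardState.CLOUD_READ_WRITE
```

In this sketch, calling `flip_to_cloud("shard-02")` changes where later `route("shard-02", ...)` calls land without touching the public endpoint, which is the property that allows a shard-by-shard cutover. A real proxy would also need to handle in-flight writes and replication lag at the moment of the flip, which this example leaves out.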
A migration this big needed backup
Google Cloud’s Professional Services Organization (PSO) played a critical role in getting us there. From the earliest stages, they were embedded with our team. They helped us evaluate Cloud SQL and Database Migration Service (DMS), guide proof-of-concept work, and stress-test our migration architecture.
When we hit a roadblock replicating data with DMS, PSO worked closely with Cloud SQL engineering and our internal security and DBA teams to design a custom workaround. During cutover, they were right there with us to help with hiccups like debugging capacity constraints or troubleshooting connection spikes during shadow traffic. They also helped us resolve reverse replication failures caused by permission changes — an edge case we wouldn’t have anticipated without their guidance.
Fig. 2 – Yahoo Calendar migration diagram
Cloud SQL helped us block time for what matters
With managed infrastructure, we’ve significantly reduced manual operations, reduced database admin overhead, and gained the agility to scale up without the wait. Our application teams now deploy and manage database shards ourselves using infrastructure as code (IaC), without relying on manual processes. Backups, patching, and failovers are automated to reduce risk and manual effort. Usage and cost monitoring are built-in, helping us optimize across the board. And thanks to tight integration with our security protocols, we’re able to maintain high confidence in operating a large-scale public-facing service.
Today, Yahoo Calendar processes hundreds of thousands of queries per second, operates 26 Cloud SQL instances with disaster recovery (DR), and runs on infrastructure that includes 2,500 virtual CPUs and 17 TB of memory for databases alone. Our application tier spans 850 pods and 2,200 vCPUs, with 10 TB of memory to match. We now run at scale, with confidence — and without waiting on hardware or handoffs.
Fig. 3 – Architecture diagram of Yahoo Calendar’s services
Up next on our calendar
We’re seeing the benefits of infrastructure that works with us, not against us. And we’re doing it all without compromising on scale, performance, or security. Now that we’ve pressure-tested our migration strategy and refined how we operate in the cloud, we’re ready to take on Yahoo Mail’s full environment — 500 petabytes and counting.
The next couple of years will be about scaling smart, staying nimble, and proving that modernization doesn’t have to mean disruption. With the hardest part of any journey (starting) behind us and a calendar that runs on Cloud SQL, we’re in sync and right on schedule.
Learn more:
- Discover how Cloud SQL can transform your business! Start a free trial today!
- Download this IDC report to learn how migrating to Cloud SQL can lower costs, boost agility, and speed up deployments.
- Learn how Ford and Lightricks gained high performance and cut costs by modernizing with Cloud SQL.