How MLB keeps fans connected to the game – one cache hit at a time
Editor’s note: Major League Baseball (MLB) delivers data in real time to millions of fans, apps, and broadcasters — tracking everything from pitch speeds to player positions. To keep pace, the Baseball Data Platform team turned to Memorystore for Valkey, Google Cloud’s fully managed in-memory data service. It’s helped MLB handle billions of daily requests, recover from outages in seconds, and focus more on innovation than infrastructure management.
There’s a saying in baseball: It’s a game of inches. A ball that lands fair by a few blades of grass, a runner safe by the edge of a cleat – tiny margins can change everything.
For the Baseball Data Platform team at Major League Baseball (MLB), it’s also a game of milliseconds. Our team is responsible for building and maintaining the Stats API, the backbone of MLB’s data delivery platform. It powers everything from live game trackers to broadcast graphics, internal apps, and more. On a busy day, we handle nearly 10 billion requests at the edge and 15,000 API requests per second at peak load.
Whether fans tap in through an app, a screen in the stadium, or a second-screen experience at home, we’re the ones making sure the data shows up accurately and instantly. And Memorystore for Valkey helps us do that.
Bases loaded, cache full
Our original caching layer was built on self-managed Memcached VMs. But as our data needs evolved, so did the pressure on our systems. If a VM failed, its cached data was lost, and recovery could take hours. During that window, we’d divert traffic to a backup region and rebuild caches, all while watching the risk of a second failure rise.
Meanwhile, our data had grown more complex. We were now streaming high-volume telemetry from the field, including player movement data that maps 29 points on each athlete’s body in 3D space at 30 frames per second. The sheer volume and velocity of this data tested the limits of our stack.
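To give a feel for that volume, here is a back-of-the-envelope calculation using the figures above (29 points, 3D coordinates, 30 frames per second). The 4-byte coordinate size is an assumption made for illustration, not a detail of the actual wire format.

```python
# Figures from the text: 29 tracked points per athlete, 3D space, 30 frames per second.
POINTS_PER_ATHLETE = 29
DIMENSIONS = 3
FRAMES_PER_SECOND = 30

# Assumption (not from the article): each coordinate is stored as a 4-byte float.
BYTES_PER_COORDINATE = 4


def coordinates_per_second(athletes: int) -> int:
    """Coordinate values produced each second for `athletes` tracked bodies."""
    return athletes * POINTS_PER_ATHLETE * DIMENSIONS * FRAMES_PER_SECOND


def raw_bytes_per_second(athletes: int) -> int:
    """Raw payload size per second under the assumed coordinate width."""
    return coordinates_per_second(athletes) * BYTES_PER_COORDINATE
```

For a single athlete this works out to 29 × 3 × 30 = 2,610 coordinate values every second, before any metadata or encoding overhead.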
That’s what led us to Memorystore for Valkey, Google Cloud’s managed in-memory service built on Valkey. What stood out immediately was the built-in high availability, cross-region replication, and the ability to scale clusters with zero downtime. And because it’s a managed service, we could finally shift our focus from maintaining cache infrastructure to optimizing the experience around it.
Covering the field with Memorystore for Valkey
Today, some of our most critical data flows run through Memorystore for Valkey. To make the most of it, we’ve developed a caching strategy built around three major use cases.
1. Ballpark buffering
This is our edge-layer caching, deployed directly in Google Distributed Cloud at the stadium level. Here, we process and temporarily store operator-generated metadata and real-time tracking data before pushing it into our cloud services. Memorystore for Valkey acts as a message buffer in this setup. It gives us a layer of protection between the ballpark systems and our backend. That way, if there’s ever a network hiccup, we don’t lose data – we just replay it from the cache once connectivity is restored.
We’re running this on Valkey 8.0, with latency as low as 1–2 milliseconds during live games and peak command volumes around 11,000 per second. It’s fast, reliable, and invisible to the people in the stands – as it should be.
Fig. 1 – Architecture diagram of ballpark caching
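The buffer-and-replay idea can be sketched in a few lines. This is an illustration only, not the production implementation: an in-process queue stands in for the cache, and `ReplayBuffer` and its `send` callback are hypothetical names. The point it shows is that a message leaves the buffer only after the backend accepts it, so an outage delays delivery without dropping or reordering data.

```python
from collections import deque
from typing import Callable, Deque


class ReplayBuffer:
    """Holds messages until the uplink accepts them, preserving order."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        # `send` is a hypothetical uplink call; it returns True once the
        # backend has accepted the message and False while the link is down.
        self._send = send
        self._pending: Deque[str] = deque()

    def publish(self, message: str) -> None:
        """Buffer the message first, then try to drain the buffer."""
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Replay buffered messages oldest-first; return how many were delivered."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # link is down: keep this message and everything behind it
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

Calling `flush()` again once connectivity returns delivers the backlog in its original order.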
2. Live GUMBO, served fast
We call it GUMBO, but it’s not stew – it’s our Grand Unified Master Baseball Object, a JSON structure that represents the full state of a live game. This includes play-by-play updates, pitch data, player positions, and more. After every pitch and every play, the GUMBO gets updated. Fans and internal systems alike need that data to be fresh and fast, no matter where they are. That’s why we use a multi-region read setup for GUMBO, with Memorystore for Valkey helping us meet our SLA of under two seconds for cross-region availability.
Previously, we had to build and manage our own replication pipeline to make this work. With Memorystore for Valkey’s cross-region replication (CRR), we’ve started simplifying that stack, replacing custom infrastructure with built-in capabilities that are faster and easier to operate.
Fig. 2 – Architecture diagram of live GUMBO caching
3. Everything else: Stats, rosters, leaderboards
Not all data is real-time. Some of it updates every few seconds, some every few minutes, and some post-game. But all of it – batting stats, player rosters, standings, schedules – needs to be delivered fast when requested. For these semi-live and core datasets, we use a read-through caching model. If a user request hits an empty cache, our system fetches the data from the database, updates the cache, and returns the result.
We’ve configured two Memorystore clusters per region: a ‘regular’ cluster and a ‘heavy’ cluster. Together, they handle upwards of 200,000 commands per second with average memory usage below 75%, and we’re still seeing room to grow.
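The read-through flow described above (check the cache, fall back to the database on a miss, populate the cache, return the result) looks roughly like this. It is a minimal sketch under stated assumptions: `InMemoryCache` is a dict-backed stand-in for a real cache client, and `read_through`, `load_from_db`, and the TTL values are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Dict-backed stand-in for a cache client, with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def read_through(
    cache: InMemoryCache,
    key: str,
    load_from_db: Callable[[str], str],
    ttl_seconds: float,
) -> str:
    """Return the cached value, loading and caching it on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)  # hypothetical database lookup
    cache.set(key, value, ttl_seconds)
    return value
```

A short TTL suits data that changes every few seconds, while post-game data can tolerate a much longer one; the right values depend on the dataset.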
Staying ahead of the game
With Memorystore for Valkey at the core of our caching strategy, we’ve stepped up to the plate in terms of performance, reliability, and operational efficiency.
The shift to managed caching has freed up our team to focus on what’s next. We’re investing more in tuning performance, refining cache patterns, and building tools to deepen our observability. We’ve started exploring better ways to monitor key distribution and slot activity within our clusters to help us detect hotspots before they become bottlenecks.
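As one possible starting point for that kind of analysis, the sketch below computes which cluster hash slot each key maps to and counts keys per slot. It follows the standard cluster key-hashing rule (CRC16 of the key modulo 16,384, with `{...}` hash tags honored). The function names and the example key names are hypothetical and not taken from the Stats API.

```python
import binascii
from collections import Counter
from typing import Iterable

SLOT_COUNT = 16384  # number of hash slots in a cluster


def key_slot(key: str) -> int:
    """Map a key to its hash slot: CRC16 (XMODEM variant) modulo 16384."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty {tag} is hashed on its own
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def slot_histogram(keys: Iterable[str]) -> Counter:
    """Count how many of the given keys land in each slot."""
    return Counter(key_slot(k) for k in keys)


# Hypothetical key names: a shared hash tag pins related keys to one slot.
sample = ["{game-1}:plays", "{game-1}:pitches", "roster:team-a"]
hottest = slot_histogram(sample).most_common(1)
```

Keys that share a hash tag always land in the same slot, which is convenient for multi-key operations but is also a common source of hotspots when one tag attracts most of the traffic.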
We’re also interested in automating how we scale clusters throughout the season. Game schedules can affect traffic in predictable ways, and being able to dynamically adjust cache size – either manually or through a rules-based approach – would help us right-size our infrastructure with even more precision.
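A rules-based approach could be as simple as a lookup from the day's schedule to a target cluster size. The sketch below is purely illustrative: `ScalingRule`, the thresholds, and the shard counts are all hypothetical, and applying the result to a real cluster (through the service's admin tooling) is out of scope here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingRule:
    min_concurrent_games: int  # rule applies at or above this many games
    shard_count: int           # target cluster size when the rule applies


# Hypothetical thresholds, for illustration only.
DEFAULT_RULES = (
    ScalingRule(min_concurrent_games=0, shard_count=3),
    ScalingRule(min_concurrent_games=5, shard_count=6),
    ScalingRule(min_concurrent_games=12, shard_count=10),
)


def target_shard_count(
    concurrent_games: int, rules: Sequence[ScalingRule] = DEFAULT_RULES
) -> int:
    """Pick the shard count of the highest threshold the schedule reaches."""
    if concurrent_games < 0:
        raise ValueError("concurrent_games must be non-negative")
    eligible = [r for r in rules if concurrent_games >= r.min_concurrent_games]
    if not eligible:
        raise ValueError("no rule covers this schedule; add a rule with threshold 0")
    return max(eligible, key=lambda r: r.min_concurrent_games).shard_count
```

Keeping the rules as data rather than code makes it easy to adjust thresholds as the season's traffic patterns become clearer.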
With Memorystore for Valkey, we’re in a position to move faster, build smarter, and deliver better experiences for everyone who depends on our data – from fans in the stands to broadcasters, analysts, and club staff.
Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.