GCP – Songkick harmonized their infrastructure with Memorystore
Songkick is a U.K.-based concert discovery service and live music platform, owned by Warner Music Group, that connects music fans to live events. Each year, we help over 175 million music fans around the world track their favorite artists, discover concerts and live streams, and buy tickets with confidence through our mobile apps and website.
We have about 15 developers across four teams, all based in London, and my role is to support those teams by helping them make technical decisions and architect solutions. After migrating to Google Cloud, we wanted a fully managed caching solution that would integrate well with the other Google tools we'd come to love, and free our developers to work on innovative, customer-delighting products. Memorystore, Google's scalable, secure, and highly available in-memory service for Redis, helped us meet those goals.
Fully managed Memorystore removed hassles
Our original caching infrastructure was built solely with on-premises Memcached, which we found simple to use at the time. Eventually, we turned to Redis for its more advanced features, such as hashes (dictionaries) and atomic increments. In our service-oriented architecture, we had both of these open source data stores working for us. We had two Redis clusters—one for persistent data, and one as a straightforward caching layer between our front end and our services.
When we were deciding how to use Google Cloud, we realized there was no real advantage to running two caching technologies (Memcached and Redis), so we chose to standardize on Redis: it could handle everything we used Memcached for, and consolidating meant we no longer needed expertise in both data stores. We knew Redis can be more complex to use and manage, but that wasn't a big concern, because with Memorystore it would be completely managed by Google Cloud. With Memorystore automating complex Redis tasks like high availability, failover, patching, and monitoring, we could redirect that time to new engineering opportunities.
We considered the hours we had spent fixing broken Redis clusters and debugging network problems. Our team's experience is weighted toward development rather than infrastructure management, so problems with Redis had proven distracting and time-consuming. And with a self-managed tool, there was always the potential for user-facing downtime. Memorystore, by contrast, was a secure, fully managed option that was cost-effective and promised to spare us those hassles: the benefits of Redis without the cost of managing it. Choosing it was a no-brainer.
How Memorystore works for us
Let’s look at a couple of our use cases for Memorystore. We have two levels of caching on Memorystore: the front end caches results from API calls to our services, and some services cache database results. For the front-end services, the caching key is usually the URL plus the query parameters passed with it. Using that key, the front end checks whether it already has a cached result or whether it needs to go and talk to the service.
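As a minimal sketch (not our actual code), a front-end lookup along those lines might look like this with the redis-py client; the key scheme, host address, and `fetch_from_service` callable are illustrative assumptions:

```python
import hashlib
import json

import redis

# Illustrative host: a Memorystore instance is reached via its private IP.
r = redis.Redis(host="10.0.0.3", port=6379)

def cached_get(url, params, fetch_from_service, ttl_seconds=600):
    """Cache-aside lookup keyed on the URL plus its query parameters."""
    # Build a deterministic key from the URL and the sorted query parameters.
    key = "frontend:" + hashlib.sha1(
        (url + "?" + json.dumps(params, sort_keys=True)).encode()
    ).hexdigest()

    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the service call

    result = fetch_from_service(url, params)  # cache miss: call the backing service
    r.set(key, json.dumps(result), ex=ttl_seconds)  # store with a TTL
    return result
```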
A few services also have a caching layer within the service itself: the service talks to Redis first, before deciding whether it needs to invoke our business logic and query the databases. That cache sits in front of the business logic, operating on the same principle as the front-end caching.
We also use Fastly as a caching layer in front of our front ends. So, on an individual page level, the whole page may be heavily cached in Fastly, such as when a page is for a leaderboard of the top artists on the platform.
Memorystore comes in for user-level content—for example, an event page that pulls some information about the artist, some information about the event, and maybe some artist recommendations. If the Fastly cache for that page has expired, the request goes to the front end, which knows to talk to the various services to assemble everything the page needs. In this case, there might be three separate pieces of data sitting in our Redis cache. Our artist pages also have components that are not cached in Fastly, so there we rely much more heavily on Redis.
Our Redis cache TTL (time-to-live) tends to be quite low; sometimes we use just a ten-minute cache. For very static data, we can cache in Redis for a few hours. We determine a reasonable caching time for each data item and set the TTL accordingly. A particular artist's data might be requested 100,000 times a day, so even a ten-minute cache makes a huge difference in how many calls reach our service.
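For illustration only, those per-item decisions could be captured in a simple lookup table; the categories and durations below are hypothetical, not our production values:

```python
# Hypothetical TTLs per data category (in seconds); real values are tuned per item.
CACHE_TTLS = {
    "artist_profile": 600,       # hot, frequently requested: a short ten-minute cache
    "event_details": 30 * 60,
    "venue_info": 3 * 60 * 60,   # very static data can live for a few hours
}

def ttl_for(category):
    # Fall back to ten minutes for anything uncategorized.
    return CACHE_TTLS.get(category, 600)
```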
For this use case, we have one highly available Memorystore instance with about 4 GB of memory, using the allkeys-lru (least recently used) cache eviction policy. Right now, that instance handles about 400 requests per second at peak. That's the busy period of an average day; it can spike much higher in certain circumstances.
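As a sketch, an instance like that could be provisioned with the gcloud CLI; the instance name and region here are placeholders:

```bash
# Illustrative: a 4 GB highly available (standard tier) Memorystore instance
# configured with the allkeys-lru eviction policy.
gcloud redis instances create songkick-cache \
  --size=4 \
  --region=europe-west2 \
  --tier=standard \
  --redis-config maxmemory-policy=allkeys-lru
```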
We had two different Redis clusters in our old infrastructure. The first was used as just described; the second was persistent Redis. When planning our migration to Google Cloud, we decided to use Redis for what it really excels at and to simplify: we re-architected the four or five features that used persistent Redis to use either Cloud SQL for MySQL or BigQuery. For example, we had used Redis to aggregate data; now that we're on Google Cloud, we can use BigQuery for that and get far better analysis options than we ever had aggregating on Redis.
We also use Memorystore as a distributed mutex. There are certain actions in our system where we don’t want the same thing happening concurrently—for example, a migration of data for a particular event, where two admins might be trying to pick up the same piece of work at the same time. If that data migration happened concurrently, it could prove damaging to our system. So we use Redis here as a mutex lock between different processes, to ensure they happen consecutively instead of concurrently.
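Here's a minimal sketch of such a lock using the redis-py client; the key names and timeout are illustrative, and a production implementation (for example, redis-py's built-in Lock class, or a Lua script for atomic release) handles more edge cases than shown:

```python
import uuid

import redis

r = redis.Redis(host="10.0.0.3", port=6379)  # illustrative Memorystore address

def acquire_lock(name, timeout_seconds=300):
    """Try to take the mutex; returns a token on success, None if already held."""
    token = str(uuid.uuid4())
    # SET with NX and EX: succeeds only if no other process holds the lock,
    # and the TTL guarantees the lock expires if the holder crashes.
    if r.set(f"lock:{name}", token, nx=True, ex=timeout_seconds):
        return token
    return None

def release_lock(name, token):
    """Release only if we still own the lock (compare-then-delete).

    Note: this check-then-delete is not atomic; a Lua script would make it so.
    """
    key = f"lock:{name}"
    if r.get(key) == token.encode():
        r.delete(key)

# Usage: only one admin process migrates a given event at a time.
token = acquire_lock("event-migration:12345")
if token:
    try:
        pass  # run the data migration here
    finally:
        release_lock("event-migration:12345", token)
else:
    print("Another process is already migrating this event.")
```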
Memorystore and Redis work for us in peaceful harmony
We have not seen any problems with Redis since the migration. We also love the monitoring capabilities you get out of the box with Memorystore. When we deploy a new feature, we can easily check whether it is suddenly filling the cache, or whether a really low hit ratio indicates we've made an error in our implementation.
Another benefit: the Memorystore interface works exactly as if you were talking to Redis directly. We run ordinary Redis in a Docker container in our development environments, so when we're working locally, it's seamless to check that our caching code is doing exactly what it's meant to.
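As an illustrative sketch, the same client code can point at either Memorystore or a local Docker Redis purely through configuration; the environment variable names here are assumptions:

```python
import os

import redis

# Locally: `docker run -d -p 6379:6379 redis` and leave REDIS_HOST unset.
# In production, REDIS_HOST would be set to the Memorystore instance's private IP.
r = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
)
r.ping()  # same wire protocol either way, so the caching code is unchanged
```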
We have production and staging environments, each in its own Virtual Private Cloud and each with its own Memorystore instance. We have unit tests, which never touch Redis; integration tests, which talk to a local MySQL and a local Redis, both running in Docker; and acceptance tests—browser automation tests that run in the staging environment and talk to Cloud SQL and Memorystore.
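A hypothetical integration test against that Dockerized Redis might look something like this (pytest assumed; the key name is illustrative):

```python
import redis

def test_cache_roundtrip():
    # Talks to the local Docker Redis, not Memorystore.
    r = redis.Redis(host="localhost", port=6379)
    r.flushdb()  # start from a clean cache

    r.set("frontend:test-key", "cached-value", ex=60)
    assert r.get("frontend:test-key") == b"cached-value"
    assert 0 < r.ttl("frontend:test-key") <= 60  # TTL was applied
```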
Planning encores with Memorystore
As a future use case for Memorystore, we're almost certainly going to add Pub/Sub to our infrastructure, and we'll use Redis to deduplicate some of the messages coming from Pub/Sub—for example, when we don't want to send the same email twice in quick succession. We're also looking forward to Pub/Sub being fully managed, since we currently run RabbitMQ, which too often requires debugging. We ran an experiment using Pub/Sub for the same use case, and it worked really well, so that was another easy decision.
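A sketch of that deduplication pattern, assuming the google-cloud-pubsub client; the project, subscription, attribute name, and ten-minute window are all illustrative:

```python
import redis
from google.cloud import pubsub_v1

r = redis.Redis(host="10.0.0.3", port=6379)  # illustrative Memorystore address
subscriber = pubsub_v1.SubscriberClient()
# Project and subscription names are placeholders.
subscription_path = subscriber.subscription_path("my-project", "email-notifications")

def callback(message):
    # A hypothetical "dedupe_key" attribute identifying the logical email;
    # fall back to the Pub/Sub message ID for plain redeliveries.
    key = message.attributes.get("dedupe_key", message.message_id)
    # SET NX with a TTL: the first message in the window wins; duplicates
    # arriving within ten minutes are acked without sending another email.
    if r.set(f"dedupe:{key}", 1, nx=True, ex=600):
        pass  # send the email / handle the message
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
future.result()  # block the main thread and keep pulling messages
```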
Memorystore is just one of the Google data cloud solutions we use every day. Others include Cloud SQL, BigQuery, and Dataflow, which power an ETL pipeline, our data warehousing, and our analytics products. There, we aggregate the data an artist is interested in, feed it back into MySQL, and surface it in our artist products. Once we add Pub/Sub, we'll be using virtually every type of Google Cloud data service. That's evidence of how we feel about Google Cloud's tools.
Learn more about the services and products making music at Songkick. Curious to learn more about Memorystore? Check out the Google Cloud blog for a look at performance tuning best practices for Memorystore for Redis.