GCP – Instacart migrates to Memorystore and sees a 23 percent reduction in latency and costs
With millions of products available across the catalog and more than 1,200 retail banner partners who deliver from more than 100,000 stores across more than 14,000 cities in the U.S. and Canada, Instacart is one of the world’s largest grocery technology services, and its customers have come to love and rely on its convenience.
In Instacart’s E-Commerce Engineering organization, Dennis Turko leads the Storefront Pro Foundations team, which is responsible for operating Instacart’s white-label digital storefront product. On a daily basis, this team strives to improve the reliability, performance, scalability, and cloud cost management of a platform that powers the online grocery experience for dozens of prominent retailers, including major national retailers and regional favorites. The platform leverages Memcached as a distributed cache to accelerate data retrieval and alleviate database load. It is a critical component of Instacart’s enterprise architecture and contributes to the system’s ability to seamlessly handle fluctuations in traffic resulting from various trends, promotions, and holidays.
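To make the cache's role concrete, here is a minimal sketch of the generic cache-aside pattern that a Memcached tier typically serves. It is not Instacart's code: `InMemoryCache` is a hypothetical dict-backed stand-in for a Memcached client, and `cache_aside_get` and `load_from_db` are illustrative names. Only a cache miss reaches the database, which is how the cache absorbs read load.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for a Memcached client: string values with per-key TTLs."""

    def __init__(self) -> None:
        # key -> (expiry time on the monotonic clock, value)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # an expired entry is treated as a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)


def cache_aside_get(cache: InMemoryCache, key: str,
                    load_from_db: Callable[[str], str],
                    ttl_seconds: float = 300.0) -> str:
    """Cache-aside read: serve from cache on a hit; on a miss, load and populate."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)  # only misses reach the database
    cache.set(key, value, ttl_seconds)
    return value
```

In a real deployment the stand-in would be replaced by an actual Memcached client pointed at the cache tier; the read path itself stays the same.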
Recently, the Foundations team migrated from self-managed Memcached on Compute Engine to Google Cloud’s fully managed Memorystore for Memcached service. As a result of this migration, the Instacart team observed the following benefits:
23% reduction in the Google Cloud bill for Memcached
80-100 hours of engineering maintenance time eliminated annually
80-90% reduction in cache timeouts and transient read errors
23% improvement in p95 latency, leading to API performance improvements
Simplified compliance processes, reducing engineering toil
Instacart’s previous DIY architecture
Before switching to Memorystore, Dennis and his team managed their Memcached clusters on Compute Engine and were responsible for the complete lifecycle of this self-hosted caching infrastructure, including:
Provisioning, monitoring, and maintaining Compute Engine VMs using Terraform (infrastructure as code) and Ansible (configuration management)
Installing and configuring Memcached
Optimizing and tuning performance
Compliance, patching, and data security across VMs and Memcached
The entire management process was further complicated by Instacart’s requirement to maintain a separate cache instance for each retailer.
Like many engineering teams, the Instacart team is responsible for a variety of software stacks and Memcached represents a single component in a large and complex architecture. The team did not possess deep expertise in Memcached, and had limited time to invest in caching improvements. In particular, the team found self-managing to be challenging in the following areas:
Scaling up caches during high-traffic periods and keeping VMs patched and compliant. This work was prohibitively manual and error-prone.
Suboptimal Memcached configurations and performance. This was due to lack of expertise and bandwidth for optimization.
Connectivity without auto-discovery. This created configuration complexity and required maintenance downtime.
To address these pain points, the team began to investigate Google Cloud’s fully managed Memorystore for Memcached offering so they could offload their operational burden to Google Cloud.
The team spent the final weeks of 2022 benchmarking Memorystore and planning the production migration. The production migration began in early January 2023 and lasted two weeks. All retailer environments were migrated during business hours without downtime, which was made significantly easier by Google’s strong Terraform provider support for infrastructure as code and by Memorystore’s auto-discovery endpoint.
Instacart’s new Memorystore architecture
The return on investment from migrating to Memorystore exceeded the Instacart team’s expectations. In addition to offloading the burden of patching, configuration, and Compute Engine VM management to Google, the team observed surprising reliability and performance improvements. While the self-managed Memcached nodes showed deteriorating performance with more than 10,000 active connections per node, Google’s Memorystore instances were optimized to support higher levels of traffic, comfortably allowing 65,000 connections per node.
Memorystore’s Memcached Auto-Discovery service lets clients adapt automatically as node IP addresses are added or removed during scaling. With minimal code changes, the team simplified the application client and no longer had to reconfigure it after scaling operations. From the application’s perspective, nodes are now added or removed seamlessly during scaling.
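Conceptually, an auto-discovery client periodically asks the discovery endpoint for the current node list and swaps in the new list when it changes. The sketch below assumes the `config get cluster` reply layout used by ElastiCache-compatible auto-discovery clients (a header line, a config version, space-separated `host|ip|port` entries, then `END`); check the Memorystore documentation for the exact format. `parse_cluster_config`, `ClusterConfig`, and `NodeListTracker` are hypothetical names, and the network call itself is omitted.

```python
from dataclasses import dataclass
from typing import Tuple

Node = Tuple[str, int]  # (ip, port)


@dataclass(frozen=True)
class ClusterConfig:
    version: int
    nodes: Tuple[Node, ...]


def parse_cluster_config(response: str) -> ClusterConfig:
    """Parse an assumed discovery reply of the form:

        CONFIG cluster 0 <length>
        <version>
        <host>|<ip>|<port> <host>|<ip>|<port> ...
        END
    """
    lines = [line.strip() for line in response.strip().splitlines()]
    if len(lines) != 4 or not lines[0].startswith("CONFIG cluster") or lines[3] != "END":
        raise ValueError("unexpected discovery response")
    version = int(lines[1])
    nodes = []
    for entry in lines[2].split():
        _hostname, ip, port = entry.split("|")
        nodes.append((ip, int(port)))
    return ClusterConfig(version=version, nodes=tuple(nodes))


class NodeListTracker:
    """Holds the client's current node list and replaces it only when the version changes."""

    def __init__(self) -> None:
        self.version = -1
        self.nodes: Tuple[Node, ...] = ()

    def apply(self, response: str) -> bool:
        """Return True if the node list changed (the client should remap keys to nodes)."""
        config = parse_cluster_config(response)
        if config.version == self.version:
            return False
        self.version = config.version
        self.nodes = config.nodes
        return True
```

Keying the refresh on the config version keeps routine polling cheap: the client only rebuilds its key-to-node mapping when a scaling operation has actually changed the node set.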
The scaling operation itself takes only minutes and requires just a simple Terraform pull request. Streamlined horizontal scaling not only saves engineering time and reduces the potential for human error but also allows more precise cost control, because each retailer’s Memcached instance can be right-sized to its seasonal traffic patterns. Right-sizing, combined with the higher throughput of each Memorystore node, enabled Instacart to save 23% on its enterprise Memcached cloud bill.
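A rough way to see how right-sizing and higher per-node capacity interact is to compute a node count per season from a forecast peak. This is a deliberately simplified, connections-only model: it uses the per-node figures quoted earlier (degradation beyond roughly 10,000 connections on self-managed nodes, 65,000 on Memorystore), a hypothetical 80% target utilization, and invented seasonal peaks. Real sizing would also account for memory and throughput per node.

```python
from typing import Dict

# Per-node connection figures quoted in the article.
SELF_MANAGED_CONN_LIMIT = 10_000   # self-managed nodes degraded beyond this
MEMORYSTORE_CONN_LIMIT = 65_000    # Memorystore nodes handled this comfortably


def nodes_needed(peak_connections: int, per_node_limit: int,
                 target_utilization_pct: int = 80) -> int:
    """Smallest node count keeping each node at or below the target share of its limit."""
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be in (0, 100]")
    usable_per_node = per_node_limit * target_utilization_pct // 100
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-peak_connections // usable_per_node))


def seasonal_plan(peaks: Dict[str, int], per_node_limit: int) -> Dict[str, int]:
    """Map each season's forecast peak connection count to a node count."""
    return {season: nodes_needed(peak, per_node_limit) for season, peak in peaks.items()}


# Hypothetical forecast for one retailer's cache instance.
forecast = {"baseline": 40_000, "holiday": 150_000}
```

With these invented peaks, the Memorystore plan calls for 1 node at baseline and 3 at the holiday peak, versus 5 and 19 for nodes capped at 10,000 connections. Each season's result could then become the `node_count` value changed in a Terraform pull request.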
Since Memorystore instances do not require additional configuration, the team was able to deprecate a sizable portion of their configuration management codebase devoted to Memcached and simplify a number of operational runbooks used at various levels of the organization. “Had we known the full scope of benefits from switching to Memorystore earlier, we could have saved more engineering time for delivering value to other parts of our e-commerce platform,” said Dennis Turko.
Memorystore also significantly improved the reliability of their Memcached cache. Overall, their rate of cache timeouts decreased by an astonishing 80-90%.
Perhaps most importantly for a cache, Instacart also observed a significant improvement in latency after the migration to Memorystore. The team had decided to migrate production workloads to Memorystore after benchmarking its performance with memtier, a popular open source tool. Using custom inputs to mimic production traffic, the team found that Memorystore delivered higher throughput and lower latency than their self-managed clusters. After cutting over production traffic to Memorystore, Instacart found that the improvements surpassed even the gains observed during benchmarking.
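Comparisons like these usually come down to a tail percentile, such as p95, computed over per-request latency samples. The sketch below uses the nearest-rank definition of a percentile and reports the fractional reduction between two runs; the function names are hypothetical, and the samples would come from a benchmark tool or production telemetry.

```python
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct/100 * n), 1-based, in integer math
    return ordered[rank - 1]


def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction: 0.23 means `after` is 23% lower than `before`."""
    return (before - after) / before
```

For example, 100 latency samples of 1 to 100 ms have a nearest-rank p95 of 95 ms; if every sample were halved, the p95 would drop to 47.5 ms and `relative_improvement` would return 0.5.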
Because caching is such an important optimization, these command latency improvements added up to an 18.5% overall improvement in average performance for a subset of the API endpoints that drive the e-commerce product. This reduced time-to-interactive and improved the user experience.
Last, but certainly not least for the engineering team, was the opportunity to streamline compliance processes. With Memorystore, Instacart’s engineers used the automatic maintenance window policy to meet internal compliance requirements and to simplify the Standard Operating Procedures (SOP) for patching. Shedding the more stringent requirements of self-managed clusters further reduced the toil of managing the cache, allowing Instacart to focus on adding value elsewhere.
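A compliance runbook built around a maintenance window often reduces to one question: did a given event fall inside the approved weekly window? The sketch below models a weekly window loosely after the day, start time, and duration fields of a Memorystore maintenance policy; `WeeklyWindow` and its fields are hypothetical, times are UTC, and the duration is assumed to be shorter than one week. It also checks the previous week's window so that a window starting late Sunday still covers early Monday.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class WeeklyWindow:
    """Hypothetical weekly maintenance window, in UTC."""
    weekday: int          # Monday=0 ... Sunday=6, matching datetime.weekday()
    start_hour: int
    start_minute: int
    duration: timedelta   # assumed shorter than one week

    def contains(self, moment: datetime) -> bool:
        """True if the timezone-aware `moment` falls inside a window occurrence."""
        moment = moment.astimezone(timezone.utc)
        # Midnight UTC on the Monday of the week containing `moment`.
        week_start = (moment - timedelta(days=moment.weekday())).replace(
            hour=0, minute=0, second=0, microsecond=0)
        start = week_start + timedelta(days=self.weekday, hours=self.start_hour,
                                       minutes=self.start_minute)
        # Check this week's window and last week's, which may spill past Sunday midnight.
        for candidate in (start, start - timedelta(weeks=1)):
            if candidate <= moment < candidate + self.duration:
                return True
        return False
```

For a window of Sunday 23:00 UTC for four hours, an event at Monday 00:30 UTC counts as inside, while one at Wednesday noon does not.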
Delivered by Memorystore: Faster checkouts
After experiencing a seamless migration and realizing numerous impactful benefits, the Instacart team is thrilled to recommend Memorystore to other organizations using Google Cloud. Reduced maintenance time, simplified operating procedures, and tighter cost control have brought Instacart much-needed peace of mind when it comes to managing its e-commerce caching infrastructure. This means the team is free to tackle its next infrastructure challenge, and perhaps take a well-earned vacation!
Want to learn more about Memorystore?
Check out the product pages for Memorystore for Memcached and Memorystore for Redis
Learn how you can save up to 40% on Memorystore with Committed Use Discounts (CUDs)
Read about how OpenX used Memorystore to “improve performance, reduce response time and optimize overall costs”