GCP – Rightsize your Memorystore for Redis Clusters with open-source Autoscaler
One of the most compelling aspects of cloud computing is the ability to scale resources up automatically and, almost as importantly, to scale them back down to manage cost and performance. This is standard practice for virtual machines, for example with Compute Engine managed instance groups, but it is less common for stateful services such as databases because of their inherent complexity.
Last year we released Memorystore for Redis Cluster with the ability to manually scale clusters out and in. Today, to support the highly elastic nature of modern Memorystore workloads, we’re excited to announce the open-source Memorystore Cluster Autoscaler, available on GitHub. It builds on the open-source Spanner Autoscaler we released in 2020.
Understanding cluster scaling
Memorystore for Redis Cluster capacity is determined by the number of shards in your cluster, which can be increased or decreased without downtime, and by the cluster’s shard size, which maps to the underlying node type. At this time, the node type of a cluster is immutable, so to scale capacity in or out you change the number of shards. To automate this process, you can deploy the Memorystore Cluster Autoscaler to monitor your cluster’s metrics and rightsize the cluster based on that information. The Autoscaler performs the necessary resource adjustments using rulesets that evaluate memory and CPU utilization, without impacting cluster availability.
The following chart shows the Autoscaler in action, with a Memorystore for Redis Cluster instance automatically scaling out as memory utilization increases. The green line represents data being written to the cluster at the rate of one gigabyte every five minutes. The blue line represents the number of shards in the cluster. You can see that the cluster scales out, with the number of shards increasing in proportion to the memory utilization, then plateaus when the writes stop, and finally scales back in when the keys are flushed at the end of the test.
Experience and deployment
To use the Autoscaler, deploy it to one of your Google Cloud projects. The Autoscaler is flexible and supports several deployment options, so the repository contains example Terraform deployment configurations, along with documentation describing the various deployment models.
Once you’ve deployed the Autoscaler, configure it to suit the scaling requirements and workload characteristics of the Memorystore instances it manages. You do this by setting Autoscaler configuration parameters for each instance. Once configured, the Autoscaler autonomously manages and scales those instances. You can read more about these parameters later in this post and in the Autoscaler documentation.
Autoscaler architecture
The Autoscaler consists of two main components, the Poller and the Scaler. You can deploy these to either Cloud Run functions or Google Kubernetes Engine (GKE) via Terraform, and configure them so that the Autoscaler runs according to a user-defined schedule. The Poller queries the Memorystore metrics in Cloud Monitoring at a pre-defined interval to determine utilization, and passes them to the Scaler. The Scaler then compares the metrics against the recommended thresholds specified in the rule set, and determines if the instance should be scaled in or out, and if so, by how many shards. You can modify the sample configuration to determine minimum and maximum cluster sizes and any other thresholds suitable for your environment.
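For illustration, a minimal configuration for a single managed instance might look like the following sketch. It uses only parameters mentioned in this post; the project and instance IDs are placeholders and the size bounds are illustrative, so refer to the repository documentation for the authoritative schema, as exact field names may differ.

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingProfile": "CPU_AND_MEMORY",
    "scalingMethod": "LINEAR",
    "minSize": 3,
    "maxSize": 30
  }
]
```

Here minSize and maxSize bound the shard counts the Autoscaler may choose, and the CPU_AND_MEMORY profile applies the default rule set described in the next section.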
Throughout the flow, the Autoscaler writes a step-by-step summary of its recommendations and actions to Cloud Logging for tracking and auditing, as well as metrics to Cloud Monitoring to provide insight into its actions.
Scaling rubrics
Memorystore performance is most commonly limited by in-memory storage and by CPU. By default, the Autoscaler takes both of these factors into consideration when scaling, using the CPU_AND_MEMORY profile. This is a good starting point for your deployment, and you can replace it with a custom configuration if required to better suit your needs.
Defaults
| Metric | Average default setting | Max default setting |
| --- | --- | --- |
| CPU scale OUT | CPU > 70% | Max CPU > 80% and average CPU > 50% |
| CPU scale IN | CPU < 50% * | Max CPU < 60% and average CPU < 40% * |
| Memory scale OUT | Usage > 70% | Max usage > 80% and average usage > 50% |
| Memory scale IN | Usage < 50% * | Max usage < 60% and average usage < 40% * |
* Scale-in will be blocked if there are ongoing key evictions, which occur when the keyspace is full and keys are removed from the cache to make room. Scale-in is enabled by default, but it can be configured using a custom scaling profile. Refer to the Scaling Profiles section of the documentation for more information on how to do this.
Scaling scenarios and methods
Let’s take a look at some typical scenarios and their specific utilization patterns, and the Autoscaler configurations best suited to each of them. You can read more about the options described in the following section in the configuration documentation.
Standard workloads
With many applications backed by Memorystore, users interact with the application at certain times of day more than others, in a regular pattern — think a banking application where users check their accounts in the morning, make transactions during the afternoon and early evening, but don’t use the application much at night.
We refer to this fairly typical scenario as a “standard workload” whose time series shows:
- Large utilization increase or decrease at certain points of the day
- Small spikes over and under the threshold
A recommended base configuration for these types of workload, sketched below, should include:
- The LINEAR scalingMethod to cover large scale events
- A small value for scaleOutCoolingMinutes (between 5 and 10 minutes) to minimize the Autoscaler’s reaction time
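For example, a standard-workload configuration combining these recommendations might look like this sketch. The project and instance IDs are placeholders and the size bounds are illustrative; check the repository documentation for the exact configuration schema.

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingMethod": "LINEAR",
    "scaleOutCoolingMinutes": 5,
    "minSize": 3,
    "maxSize": 30
  }
]
```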
Plateau workloads
Another common scenario is applications with more consistent utilization throughout the day, such as global apps, games, or chat applications. User interactions with these applications are more evenly distributed, so the jumps in utilization are less pronounced than for a standard workload.
These scenarios create a “plateau workload” whose time series shows:
- A pattern composed of various plateaus during the day
- Some larger spikes within the same plateau
A recommended base configuration for these types of workload, sketched below, should include one of the following:
- The STEPWISE scalingMethod, with a stepSize sufficient to cover the largest utilization jump using only a few steps during a normal day
- The LINEAR scalingMethod, if there is likely to be a considerable increase or reduction in utilization at certain times, for example when breaking news is shared. Use this method together with a scaleInLimit to avoid reducing the capacity of your instance too quickly
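As an illustration, a plateau-workload configuration using the STEPWISE approach might look like this sketch. The stepSize and size bounds are illustrative and should be derived from your own utilization pattern; the IDs are placeholders.

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingMethod": "STEPWISE",
    "stepSize": 4,
    "minSize": 5,
    "maxSize": 40
  }
]
```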
Batch workloads
Customers often need increased capacity for their Memorystore clusters to handle batch processes or a sales event, where the timing is usually known in advance. These scenarios comprise a “batch workload” with the following properties:
- A scheduled, well-known peak that requires additional compute capacity
- A drop in utilization when the process or event is over
A recommended base configuration for these types of workloads should include two separate scheduled jobs:
- One for the batch process or event, that includes an object in the configuration using the DIRECT scalingMethod, and a minSize value of the peak number of shards/nodes to cover the process or event
- One for regular operations, that includes a configuration with the same projectId and instanceId, but using the LINEAR or STEPWISE method. This job will take care of decreasing the capacity when the process or event is over
Be sure to choose an appropriate scaling schedule so that the two configurations don’t conflict. For both Cloud Run functions and GKE deployments, make sure the batch operation starts before the Autoscaler starts to scale the instance back in again. You can use the scaleInLimit parameter to slow the scale-in operation down if needed.
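As an illustration, the two scheduled jobs might use configurations like the following sketches. The IDs and sizes are placeholders, and the field names follow the parameters discussed in this post; check the repository schema before use. The first job, scheduled ahead of the batch window, scales directly to peak capacity:

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingMethod": "DIRECT",
    "minSize": 20
  }
]
```

The second job, covering regular operations, scales the cluster back down gradually once utilization drops:

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingMethod": "LINEAR",
    "scaleInLimit": 2,
    "minSize": 5,
    "maxSize": 30
  }
]
```

The minSize of 20 in the first job stands in for the peak shard count needed by the event, and the scaleInLimit value is likewise illustrative; see the configuration documentation for its exact semantics.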
Spiky workloads
Depending on load, it can take several minutes for Memorystore to update the cluster topology and fully utilize new capacity. If your workload is characterized by very spiky traffic or sudden-onset load patterns, the Autoscaler might therefore not be able to provision capacity quickly enough to avoid increased latency, or efficiently enough to yield cost savings.
For these spiky workloads, a base configuration, sketched below, should:
- Set a minSize that slightly over-provisions the usual instance workload
- Use the LINEAR scalingMethod, in combination with a scaleInLimit to avoid further latency when the spike is over
- Choose scaling thresholds large enough to smooth out some smaller spikes, while still being reactive to large ones
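A spiky-workload configuration might therefore look like this sketch, where minSize sits above the typical steady-state size and a scaleInLimit slows scale-in after a spike. The values and IDs are illustrative, and threshold tuning itself is done through the scaling profile and rules rather than these parameters.

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingMethod": "LINEAR",
    "scaleInLimit": 1,
    "minSize": 10,
    "maxSize": 40
  }
]
```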
Advanced usage
As described above, the Autoscaler is preconfigured with scaling rules designed to optimize cluster size based on CPU and memory utilization. However, depending on your workload(s), you may find that you need to modify these rules to suit your utilization, performance and/or budget goals.
There are several ways to customize the rule sets that are used for scaling, in increasing order of effort required:
- Choose to scale on only memory or only CPU metrics by specifying a scalingProfile of either CPU or MEMORY to override the default CPU_AND_MEMORY in the Autoscaler configuration (see the sketch after this list). This can help if you find your clusters flapping, i.e., alternating rapidly between sizes.
- Use your own custom scaling rules by specifying a scalingProfile of CUSTOM and supplying a custom rule set in the Autoscaler configuration, as shown in the example here.
- Create your own custom rule sets and make them available for everyone in your organization to use as part of a scaling profile. You can do this by customizing one of the existing scaling profiles to suit your needs. We recommend starting by looking at the existing scaling rules and profiles, and creating your own customizations.
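For the first option, for example, scaling on memory only is a matter of overriding the scaling profile in the instance configuration, as in this sketch (placeholder IDs and illustrative size bounds):

```json
[
  {
    "projectId": "my-project",
    "instanceId": "my-redis-cluster",
    "scalingProfile": "MEMORY",
    "scalingMethod": "LINEAR",
    "minSize": 3,
    "maxSize": 30
  }
]
```

A CUSTOM profile is declared the same way, with the custom rule set supplied alongside it in the configuration; see the example linked above and the repository documentation for the rule schema.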
Next steps
The OSS Autoscaler comes with a Terraform configuration to get you started, which can be integrated into your codebase for production deployments. We recommend starting with non-production environments and progressing to production once you are confident in the Autoscaler’s behavior alongside your application(s). You can find more tips for production deployments in the documentation.
If there are additional features you would like to see in the Autoscaler — or would like to contribute to it yourself — please don’t hesitate to raise an issue via the GitHub issues page. We’re looking forward to hearing from you.