GCP – Avoid cost overruns: How to manage your quotas programmatically
One important aspect of managing a cloud environment is setting up financial governance to safeguard against budget overruns. Fortunately, Google Cloud lets you set quotas for a variety of services, which can play a key role in establishing guardrails and protecting against unforeseen cost spikes. To help you set and manage quotas programmatically, we’re pleased to announce that the Service Usage API now supports quota limits in Preview.
The Service Usage API is a service that lets you view and manage other APIs and services in your cloud projects. With support for quota limits, you can now leverage the Service Usage API to manage service quotas, such as those from Compute Engine.
In this blog post, we’ll take a look at how to use this new functionality with Google Cloud operations tools, so you can track the resources consumed by your projects, set alerts, and right-size your deployments for better cost control.
Understanding quotas
Quotas limit the resources that a project or organization is authorized to consume. Whether it is the type and number of Compute Engine CPUs or the maximum number of requests made to an API over a certain period of time, each quota metric has associated quota limits that express the ceiling the metric can reach.
A quota limit may be applied globally. This is when there is only one quota limit for the project, independent of where the resource is consumed. Other quota limits may be applied separately for each cloud region (a regional limit) or cloud zone (a zonal limit). As a project administrator, you can use these quota limits to control how much and where a project can use resources, so that costs stay under control.
As an example, you may want to allow production workloads to use a substantial number of high-end CPUs, and a large number of external VPN gateways to allow for scaling flexibility. Experimental projects, meanwhile, may have significantly lower limits to make sure they stay within their allocated research budget.
Initially, quota limits could be managed only through the Google Cloud Console. This interface is ideal when you only need to apply a few changes. However, when you need to adjust a large number of quota limits, or when you need to apply these changes as part of an automated workflow, a programmatic approach is preferable.
Setting quota limits programmatically
With the Service Usage API, you can discover the quota limits that are available and set new, lower ceilings (called consumer overrides). The API lets you set quota limits programmatically in workflows and scripts when projects are created, or through automation tools that you might already be using, such as Terraform. Note that you can’t use the Service Usage API to increase the available quota above what is allowed by default. For this, you need to place a Quota Increase Request (QIR) via the Quota page.
You can invoke the Service Usage API by making direct HTTP requests, or by using the client libraries that Google provides in your favorite languages (Go, Java, Python, etc.).
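As a rough illustration of a consumer override, the Python sketch below only assembles the HTTP request; it does not send it. The endpoint path, the overrideValue and dimensions fields, and the "/project/region" limit unit reflect our understanding of the v1beta1 Preview surface and should be checked against the current Service Usage reference. The project number and the helper name build_override_request are hypothetical.

```python
import json
from urllib.parse import quote

# Assumed base URL of the Preview (v1beta1) Service Usage API.
BASE_URL = "https://serviceusage.googleapis.com/v1beta1"


def build_override_request(project_number, service, metric, limit_unit,
                           value, dimensions=None):
    """Return (url, json_body) for creating a consumer override.

    Nothing is sent over the network; this only assembles the request.
    """
    # Metric names and limit units contain slashes, so they are
    # percent-encoded before being used as URL path segments.
    url = (
        f"{BASE_URL}/projects/{project_number}/services/{service}"
        f"/consumerQuotaMetrics/{quote(metric, safe='')}"
        f"/limits/{quote(limit_unit, safe='')}/consumerOverrides"
    )
    # The override value is carried as a string (assumed int64-as-string).
    body = {"overrideValue": str(value)}
    if dimensions:
        # For example, {"region": "us-central1"} for a regional limit.
        body["dimensions"] = dict(dimensions)
    return url, json.dumps(body, sort_keys=True)


# Hypothetical example: cap CPUs in us-central1 at 24 for one project.
url, body = build_override_request(
    "123456789",
    "compute.googleapis.com",
    "compute.googleapis.com/cpus",
    "/project/region",
    24,
    {"region": "us-central1"},
)
print(url)
print(body)
```

To actually apply the override, you would send this as an authenticated POST request, or use one of the client libraries or Terraform instead of hand-built requests.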
Monitoring and alerting on quota
You can now monitor quotas, graph historical usage, and set alerts when certain thresholds are reached with the help of Cloud Monitoring, from both the user interface and its API (see Using quota metrics).
Cloud Monitoring starts tracking each of the quotas supported by the Service Usage API the moment the project starts consuming them. Allocation quota usage, rate quota usage, quota limits, and quota exceeded errors (failed attempts to go over quota) are all stored automatically by Cloud Monitoring under the “Consumer Quota” resource type.
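To make the four kinds of quota data more concrete, here is a small Python sketch that builds Cloud Monitoring filter strings for them. The metric type names, the consumer_quota resource type identifier, and the service label are assumptions based on our reading of the quota metrics documentation, so verify them before use. The helper name quota_filter is hypothetical.

```python
# Assumed metric types for the four kinds of quota data described above.
QUOTA_METRIC_TYPES = {
    "allocation_usage": "serviceruntime.googleapis.com/quota/allocation/usage",
    "rate_usage": "serviceruntime.googleapis.com/quota/rate/net_usage",
    "limit": "serviceruntime.googleapis.com/quota/limit",
    "exceeded": "serviceruntime.googleapis.com/quota/exceeded",
}


def quota_filter(kind, service=None):
    """Build a Monitoring filter string for one kind of quota data."""
    parts = [
        f'metric.type="{QUOTA_METRIC_TYPES[kind]}"',
        # Assumed identifier of the "Consumer Quota" resource type.
        'resource.type="consumer_quota"',
    ]
    if service:
        # Optionally narrow the filter to a single service.
        parts.append(f'resource.labels.service="{service}"')
    return " AND ".join(parts)


print(quota_filter("allocation_usage", "compute.googleapis.com"))
```

A filter string like this could be passed to the Monitoring API when listing time series, or used as a starting point in Metrics explorer.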
You can use Metrics explorer to query quota data, create charts and easily incorporate them in a monitoring dashboard. This enables you and your team to see historical events, track trends, and monitor usage over time.
You can also create alerts on quota data in order to be notified when consumption thresholds you define are exceeded or when you are approaching a quota limit. You have to define which conditions trigger the alert, and where you want to be notified (notification channels include email, SMS, Cloud Console app, PagerDuty, Slack, Pub/Sub, and Webhooks). Cloud Monitoring offers both a UI and an API to create and configure these alerts.
Ratio alerting
The new Monitoring Query Language (MQL) makes it possible to create flexible and powerful ratio alerts. With ratio alerts, you can set an alerting threshold as a percentage of a quota limit instead of a fixed number. The advantage of an alert based on a ratio is that you don’t need to redefine the alert when the quota limit changes. For example, you could set an alert threshold as “75%” for the CPUs quota, which triggers the alert if the number of CPUs in use exceeds 75, given a quota limit of 100. If you then increase the quota limit to 300 CPUs, the alert triggers if the number of CPUs in use exceeds 225.
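The arithmetic behind the 75% example can be sketched in a few lines of Python. This is not MQL and not how Cloud Monitoring evaluates alerts internally; it only illustrates why a ratio threshold follows the quota limit automatically. The function name ratio_alert_fires is hypothetical.

```python
def ratio_alert_fires(usage, limit, percent=75):
    """Return True when usage exceeds the given percentage of the limit."""
    if limit <= 0:
        raise ValueError("quota limit must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    return usage * 100 > percent * limit


# With a limit of 100 CPUs, the alert fires above 75 CPUs in use.
print(ratio_alert_fires(75, 100))   # False
print(ratio_alert_fires(76, 100))   # True

# After raising the limit to 300 CPUs, the same rule fires above 225.
print(ratio_alert_fires(225, 300))  # False
print(ratio_alert_fires(226, 300))  # True
```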
Combined with wildcard filters, MQL can help set up powerful alerts, e.g., “alert me if any of my quotas reach 80% of their limits.” This allows you to create one alert that covers a significant portion of your quotas.
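The idea of one rule covering many quotas can be illustrated with plain Python over a snapshot of usage and limit values. The quota names and numbers below are made up for the example, and quotas_near_limit is a hypothetical helper; in practice the wildcard matching and evaluation happen inside Cloud Monitoring.

```python
def quotas_near_limit(quotas, percent=80):
    """Return the sorted names of quotas at or above `percent` of their limit.

    `quotas` maps a quota metric name to a (usage, limit) pair.
    """
    return sorted(
        name
        for name, (usage, limit) in quotas.items()
        # Skip non-positive limits; integer math keeps the 80% boundary exact.
        if limit > 0 and usage * 100 >= percent * limit
    )


# Illustrative snapshot only; not real project data.
snapshot = {
    "compute.googleapis.com/cpus": (82, 100),
    "compute.googleapis.com/external_vpn_gateways": (3, 15),
    "compute.googleapis.com/disks_total_storage": (4096, 5120),
}

for name in quotas_near_limit(snapshot):
    print(name)
```

Here the CPU quota (82%) and the disk storage quota (exactly 80%) would be reported, while the VPN gateway quota (20%) would not.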
Get started
Any project owner, viewer or editor can access quota usage within the Cloud Console. You can get started by reviewing the Quota and Service Usage documentation and then Managing service quota using the Service Usage API. For quota monitoring and alerting, start with the documentation on using quota metrics, followed by more in-depth documentation on MQL, ratio alerting, and wildcards.