GCP – Understand the change in Cloud Monitoring service discovery and how to adapt
If you’ve opened the SLOs Overview in the Google Cloud console recently, you may have seen this notice:
This notice announces a recent change in how services are defined for Cloud Monitoring. Before the change, Cloud Monitoring automatically discovered services that were provisioned in App Engine, Cloud Run, or Google Kubernetes Engine (GKE), and populated the Services Overview dashboard with them.
Now, every service in the Services Overview dashboard has to be created explicitly. To simplify this task, when you define a new service in the console UI, you are presented with a list of candidates built from the auto-discovered services. The full list of auto-discovered services includes managed services from App Engine, Cloud Run, and Istio, as well as GKE workloads and services.
Besides using the UI, you can add managed services to Cloud Monitoring using the services.create API or using the Terraform google_monitoring_service resource.
For example, if you have a GKE cluster named cluster-001 provisioned in the us-central1 region that has a service frontend in the default namespace, the following command in Cloud Shell defines this service for Cloud Monitoring:
```shell
curl -X POST \
  "https://monitoring.googleapis.com/v3/projects/${GOOGLE_CLOUD_PROJECT}/services?service_id=frontend" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d '
{
  "gkeService": {
    "clusterName": "cluster-001",
    "location": "us-central1",
    "namespaceName": "default",
    "serviceName": "frontend"
  }
}'
```
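After creating a service, it can be useful to read the definition back and confirm that it was stored as expected. The following is a minimal sketch that uses the `services.get` method of the same API. The helper names `service_url` and `get_service` are hypothetical, and the sketch assumes that `GOOGLE_CLOUD_PROJECT` is set and `gcloud` is authenticated, as in Cloud Shell.

```shell
# Sketch: read back a Cloud Monitoring service definition with services.get.
# service_url and get_service are hypothetical helper names.

# Build the REST URL for a service ID in the given project.
service_url() {
  local project="$1" service_id="$2"
  printf 'https://monitoring.googleapis.com/v3/projects/%s/services/%s\n' \
    "$project" "$service_id"
}

# Fetch the service definition. Assumes gcloud credentials are available.
get_service() {
  local service_id="$1"
  curl -sS -X GET "$(service_url "${GOOGLE_CLOUD_PROJECT}" "$service_id")" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)"
}
```

Calling `get_service frontend` should return the JSON definition of the service created above, or an error response if no service with that ID exists.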
When you use the Terraform resource, convert the keys of the service_labels argument from the camel case notation used in the API documentation to snake case. For example, the request above looks like this in Terraform:
```hcl
resource "google_monitoring_service" "frontend" {
  service_id = "frontend"
  basic_service {
    service_type = "GKE_SERVICE"
    service_labels = {
      location       = "us-central1"
      cluster_name   = "cluster-001"
      namespace_name = "default"
      service_name   = "frontend"
    }
  }
}
```
When your definition of a service does not map one-to-one to a managed service, you can add it to Cloud Monitoring as a custom service. You use the same API method with a different request body:
```shell
curl -X POST \
  "https://monitoring.googleapis.com/v3/projects/${GOOGLE_CLOUD_PROJECT}/services?service_id=custom_svc" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d '
{
  "displayName": "custom service",
  "custom": {}
}'
```
Alternatively, use the dedicated Terraform resource, google_monitoring_custom_service:
```hcl
resource "google_monitoring_custom_service" "custom_svc" {
  service_id   = "custom_svc"
  display_name = "custom service"
}
```
Unlike custom services, auto-detected services come with two predefined SLIs, one for availability and one for latency. These SLIs use metrics that are captured automatically for managed services, such as request processing time or HTTP request status. For custom services, you have to define the SLIs explicitly, as either request-based or window-based SLIs.
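As an illustration of an explicitly defined SLI, the sketch below builds an availability SLO on a request-based SLI and posts it with the `serviceLevelObjectives.create` method. The metric type `custom.googleapis.com/http/request_count`, its `status_class` label, and the `global` resource type are placeholders chosen for this example, not values from this article, so replace them with a metric that your service actually reports. The helper names `slo_body` and `create_slo` are also hypothetical.

```shell
# Sketch: availability SLO for a custom service, using a request-based SLI.
# The metric type, label, and resource type in the filters are placeholders.

# Print the JSON request body.
# Arguments: goal as a fraction (e.g. 0.99), rolling period in days.
slo_body() {
  local goal="$1" days="$2"
  local seconds=$(( days * 86400 ))
  cat <<EOF
{
  "displayName": "${goal} availability over ${days} days",
  "goal": ${goal},
  "rollingPeriod": "${seconds}s",
  "serviceLevelIndicator": {
    "requestBased": {
      "goodTotalRatio": {
        "totalServiceFilter": "metric.type=\"custom.googleapis.com/http/request_count\" resource.type=\"global\"",
        "goodServiceFilter": "metric.type=\"custom.googleapis.com/http/request_count\" resource.type=\"global\" metric.label.status_class=\"2xx\""
      }
    }
  }
}
EOF
}

# Create the SLO under an existing service. Assumes gcloud credentials
# are available and GOOGLE_CLOUD_PROJECT is set, as in Cloud Shell.
create_slo() {
  local service_id="$1" goal="$2" days="$3"
  slo_body "$goal" "$days" | curl -sS -X POST \
    "https://monitoring.googleapis.com/v3/projects/${GOOGLE_CLOUD_PROJECT}/services/${service_id}/serviceLevelObjectives" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @-
}
```

For example, `create_slo custom_svc 0.99 28` would request a 99% availability target over a rolling 28-day period for the custom service defined earlier. A window-based SLI would replace the `requestBased` object with a `windowsBased` one.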
To learn more about tracking your service SLOs and error budgets, see the documentation on creating SLOs and SLO-based alerts. To learn about the predefined SLIs used in availability and latency SLOs, see this blog.