GCP – Announcing gcpdiag – Open Source Troubleshooting Tool for Google Cloud Platform
We strive to make Google Cloud as easy to use as possible, and one of the ways we do this is to listen closely to the customer feedback that comes in through our Google Cloud Support team. Besides helping customers immediately, the support team also uses the experience gained through this feedback to feed improvement suggestions to the product development and documentation teams. Every time a customer requires help from support, we see it as an opportunity to improve the product experience, and we call this work “supportability”, as you can read in a previous blog post.
Supportability in practice
Let’s consider the following example: A DevOps engineer conducts a security review and realizes they can increase security by removing excessive permissions from several service accounts. However, they inadvertently remove a permission from a GCE service account that is needed to ingest system logs from GCE instances into Cloud Logging.
As part of the supportability work, we will verify that our products use sensible defaults, that the UI is intuitive, and that our public documentation is clear on how to do things correctly. In this example, we will verify and possibly improve our documentation on access control for Cloud Logging, although it already seems clear on this topic. The UI is not misleading either, and the default permissions are set correctly so that logging should work.
So what else might we do to help customers to determine the root cause of this permissions issue?
Introducing gcpdiag
Today, we are announcing gcpdiag, an open source tool to detect configuration issues in Google Cloud projects, maintained by the Google Cloud Support team and with contributions from the open source community.
gcpdiag was born out of the idea that we could automate troubleshooting certain common issues using information returned from Google Cloud API calls. Going back to the logging ingestion example, the automated process might be implemented like this:
Retrieve the list of GCE instances in the project (API call).
Retrieve the IAM policy of the project (API call).
Verify that the service accounts used by the listed GCE instances have the logging.logEntries.create permission.
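The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not gcpdiag's actual implementation: the `Instance` class, the function names, the sample service accounts, and the abbreviated role-to-permission table are all hypothetical, and the data that would normally come from the API calls is hard-coded so that the example runs on its own.

```python
from dataclasses import dataclass

REQUIRED_PERMISSION = "logging.logEntries.create"


@dataclass
class Instance:
    """Hypothetical stand-in for a GCE instance returned by an API call."""
    name: str
    service_account: str  # email of the attached service account


def permissions_of(member, iam_bindings, role_permissions):
    """Collect the permissions a member holds through project-level bindings."""
    granted = set()
    for binding in iam_bindings:
        if member in binding["members"]:
            granted.update(role_permissions.get(binding["role"], ()))
    return granted


def instances_missing_permission(instances, iam_bindings, role_permissions,
                                 permission=REQUIRED_PERMISSION):
    """Return the names of instances whose service account lacks `permission`."""
    failing = []
    for inst in instances:
        member = f"serviceAccount:{inst.service_account}"
        if permission not in permissions_of(member, iam_bindings, role_permissions):
            failing.append(inst.name)
    return failing


# Step 1 (assumed result of an API call): instances in the project.
instances = [
    Instance("web-1", "web@example-project.iam.gserviceaccount.com"),
    Instance("batch-1", "batch@example-project.iam.gserviceaccount.com"),
]

# Step 2 (assumed result of an API call): bindings from the project IAM policy.
iam_bindings = [
    {"role": "roles/logging.logWriter",
     "members": ["serviceAccount:web@example-project.iam.gserviceaccount.com"]},
    {"role": "roles/monitoring.metricWriter",
     "members": ["serviceAccount:web@example-project.iam.gserviceaccount.com",
                 "serviceAccount:batch@example-project.iam.gserviceaccount.com"]},
]

# Abbreviated, illustrative subset of the permissions each role contains.
role_permissions = {
    "roles/logging.logWriter": {"logging.logEntries.create"},
    "roles/monitoring.metricWriter": {"monitoring.timeSeries.create"},
}

# Step 3: report instances that cannot write log entries.
print(instances_missing_permission(instances, iam_bindings, role_permissions))
# prints ['batch-1']
```

In this sample data, the service account of batch-1 lost its logging role during the security review, so it is the only instance reported.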
gcpdiag is a command-line tool that runs many such automated checks, called rules, and creates a report about all the issues that it detects. Rules are classified by the category of issues that they detect, such as ERR for likely mistakes, BP for best practices, and SEC for security issues. Similar to code linting tools that you run against your code, you run gcpdiag to inspect your GCP resources and catch these common configuration issues.
Currently, gcpdiag ships with more than 70 rules that detect common issues encountered by the Cloud Support team troubleshooting customer issues. We have particularly good coverage for GKE-related issues, and are improving our coverage of the detection of issues in other products.
Try out gcpdiag today
We encourage you to try gcpdiag out to see if you have any of the many issues that gcpdiag can detect in your own projects. gcpdiag only checks configurations and doesn’t make any changes to your resources, so you can run it without worry. The easiest way to run it is using Cloud Shell, where gcpdiag is pre-installed. Just type in your Cloud Shell instance:
gcpdiag lint --project=PROJECT_ID
You can restrict which rules to run and fine-tune the output to your liking, for example as follows:
gcpdiag lint --project=PROJECT_ID --include=gke --exclude=bp --hide-ok
This will run only GKE-related rules and avoid best-practice rules (“bp”). Also, it will only show failed rules in the output.
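If you want to run the same command from a script, for example as part of a periodic check, the argument list can be assembled programmatically. The helper below is a minimal sketch: `build_lint_command` is a hypothetical name, and it only uses the flags shown above (`--project`, `--include`, `--exclude`, `--hide-ok`).

```python
import shlex


def build_lint_command(project_id, include=(), exclude=(), hide_ok=False):
    """Assemble a `gcpdiag lint` argument list from the options shown above."""
    cmd = ["gcpdiag", "lint", f"--project={project_id}"]
    cmd += [f"--include={pattern}" for pattern in include]
    cmd += [f"--exclude={pattern}" for pattern in exclude]
    if hide_ok:
        cmd.append("--hide-ok")
    return cmd


command = build_lint_command("PROJECT_ID", include=["gke"], exclude=["bp"],
                             hide_ok=True)
print(shlex.join(command))
# prints: gcpdiag lint --project=PROJECT_ID --include=gke --exclude=bp --hide-ok
```

The resulting list can be passed to `subprocess.run(command)` in an environment where gcpdiag is available, such as Cloud Shell.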
If you prefer running it in your own terminal instead of Cloud Shell, follow these instructions to set it up locally with Docker. With the Docker image, you can run gcpdiag on demand or periodically to validate that your projects remain free of known issues.
Help us make gcpdiag more useful
gcpdiag is not an official Google product, but instead an open source community project, released on GitHub. We released gcpdiag as open source because we want to encourage community contributions, and because we believe that a “troubleshooting as code” tool like gcpdiag should be available to all.
Please let us know about bugs or feature requests via a GitHub issue. We also welcome external code contributions in the form of pull requests. Help us help all Google Cloud users troubleshoot configuration issues and run their resources according to best practices by improving gcpdiag!