GCP – Don’t just speculate, investigate! Gemini Cloud Assist now offers root-cause analysis
Debugging in a complex, distributed cloud environment can feel like searching for a needle in a haystack. The sheer volume of data, intertwined dependencies, and ephemeral issues make traditional troubleshooting methods time-consuming and often reactive. Just as modern software development demands more context for effective debugging, so too does cloud operations.
Gemini Cloud Assist, a key product in the Google Cloud with Gemini portfolio, simplifies the way you manage your applications with AI-powered assistance to help you design, deploy, and optimize your apps, so you can reach your efficiency, cost, reliability, and security goals.
Then there’s Gemini Cloud Assist investigations, a root-cause analysis (RCA) AI agent for troubleshooting infrastructure and applications that is now available in preview.
When you encounter an issue, you can initiate an investigation from various places like the Logs Explorer, Cloud Monitoring alerts, or directly from the Gemini chat panel. Cloud Assist then analyzes data from multiple sources, including logs, configurations, and metrics, to produce ranked and filtered “Observations” that provide insights into your environment’s state. It synthesizes these observations to diagnose probable root causes, explains the context, and recommends the next steps or fixes to resolve the problem. If you need more help, your investigation, along with all its context, can be seamlessly transferred into a Google Cloud support case to expedite resolution with a support engineer.
How Gemini Cloud Assist investigations works
Gemini Cloud Assist investigations helps to find the root cause of an issue using a combination of capabilities:
- Programmatic, proactive, and interactive access: Trigger or consume your investigation through API calls, chat menu, or UI for proactive or interactive troubleshooting.
- Contextualization: Investigations discover the most relevant resources, data sources, and APIs in your environment to provide focused troubleshooting.
- Comprehensive signal analysis: Investigations perform deep analysis in parallel across Cloud Logs, Cloud Asset Inventory, App Hub, Metrics, Errors, and Log Themes to uncover anomalies, configuration changes, performance bottlenecks, and recurring issues.
- AI-powered insights and recommendations: Utilizing specialized knowledge sources, like Google Cloud support knowledgebases and internal runbooks, investigations generate probable root cause and actionable recommendations.
- Interactive collaboration – Chat with and share investigations across your team for collaborative troubleshooting between you, your team, and Gemini Cloud Assist.
- Handoff to Google Cloud Support: Convert your investigation directly to a support case without losing any time or context.
Programmatic, proactive, and interactive investigations
Early users are thrilled with the speed and effectiveness with which Cloud Assist investigations helps them troubleshoot and resolve tough problems.
“At ZoomInfo, maintaining uptime is critical, but equally important is ensuring our engineers can swiftly and effectively troubleshoot complex issues. By integrating Gemini Cloud Assist investigations early into our development process, we’ve accelerated troubleshooting across all levels of our engineering team. Engineers at every experience level can now rapidly diagnose and resolve problems, reducing some resolution times from hours to minutes. This efficiency enables our teams to spend more energy innovating and less time on reactive problem-solving. Gemini Cloud Assist investigations isn’t just a troubleshooting aid; it’s a key driver of productivity and innovation.” – Yasin Senturk, DevOps Engineer at ZoomInfo
“I’m really impressed by how Gemini Cloud Assist Investigations feature in 2 minutes turned over with some valid suggestions on the potential root causes, and the first one being the actual culprit! I was able to mitigate the whole issue within an hour. Gemini Cloud Assist really saved my weekend!” – Chuanzhen Wu, SRE, Google Waze
Let’s walk through Gemini Cloud Assist investigations’ capabilities in a bit more detail.
Programmatic, proactive, and interactive access
You can start an investigation directly from various points within Google Cloud, such as error messages in Logs Explorer or specific product pages (like Google Kubernetes Engine or Cloud Run), or from the central Investigations page, where you can provide context like error messages, affected resources, and observation time. Gemini Cloud Assist investigations also provides an API, allowing you to integrate it into existing workflows such as Slack or other incident management tools. If the root cause of an issue requires further assistance, you can trigger a Google Cloud support case with the Investigation response so support engineers can proceed from that point.
Contextualization
Investigations can start with a natural language description, error message, log snippets, or any combination of information that you have about your issue. It starts by gathering the initial context related to your issue, then builds a topology of relevant resources and all the associated data sources that might provide insights to the root cause.
Investigations uses both public and private knowledge, playbooks informed by Google SRE and Google Cloud Support issues, and your topology, grounding itself in similar issues before generating a troubleshooting plan for your issue. This context becomes key in providing focused and comprehensive signal analysis.
Comprehensive signal analysis
Once the investigation runs, you’ll see the observations that it starts to collect from your project. The investigation goes beyond surface-level observations; it automatically analyzes critical data sources across your Google Cloud environment, including:
- Google Cloud logs: Sifting through vast log data to identify anomalies and critical events
- Cloud Asset Inventory: Understanding changes in your resource configurations and their potential impact
- Metrics (coming soon): Correlating performance data to pinpoint resource exhaustion or unexpected behavior
- Errors: Aggregating and categorizing errors to highlight patterns and recurring problems
- Log themes: Identifying common patterns and themes within log data to provide a higher-level view of issues
AI-powered insights and recommendations
Observations are the basis of Gemini Cloud Assist investigations’ root-cause insights and recommendations. Leveraging Gemini’s analytical capabilities, Cloud Assist synthesizes observations from disparate data sources, ranking and filtering information to focus on the most relevant details. Crucially, investigations draw upon differentiated knowledge sources and publicly available documentation, such as extensive Google Cloud support troubleshooting knowledge and internal runbooks, to generate highly accurate and relevant insights and observations. It then generates:
-
Probable root cause: Provides clear hypotheses about the underlying cause of the issue, complete with contextual explanations
-
Actionable recommendations: Offers concrete next steps for troubleshooting or even direct fixes, helping you resolve incidents faster
Handoff to Google Support teams
If an issue proves particularly elusive, with the click of a button, investigations packages context, observations, and hypotheses into a support case, for faster issue resolution. This is why you might want to run an investigation before contacting Google support teams about an issue.
Get started with Gemini Cloud Assist investigations today
Ready to get to the root of your troubles faster? Try investigations now by investigating any error logs from the Log Explorer console. Or create an investigation directly and describe any issues you might be having.
Read More for the details.