The pace of innovation in open-source AI is breathtaking, with models like Meta's Llama4 and DeepSeek AI's DeepSeek family leading the way. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and machine learning (ML) engineers need reproducible, verified recipes that articulate the steps for trying out these models on available accelerators.
Today, we're excited to announce enhanced support and new, optimized recipes for the latest Llama4 and DeepSeek models, leveraging our cutting-edge AI Hypercomputer platform. AI Hypercomputer helps build a strong AI infrastructure foundation using a set of purpose-built infrastructure components that are designed to work well together for AI workloads like training and inference. It is a systems-level approach that draws from our years of experience serving AI experiences to billions of users, and combines purpose-built hardware, optimized software and frameworks, and flexible consumption models. Our AI Hypercomputer resources repository on GitHub, your hub for these recipes, continues to grow.
In this blog, we’ll show you how to access Llama4 and DeepSeek models today on AI Hypercomputer.
Added support for new Llama4 models
Meta recently released the Scout and Maverick models in the Llama4 herd. Llama 4 Scout is a 17-billion-active-parameter model with 16 experts, and Llama 4 Maverick is a 17-billion-active-parameter model with 128 experts. Both models deliver innovations and optimizations based on a Mixture of Experts (MoE) architecture, and both support multimodal capability and long context lengths.
But serving these models can present challenges in terms of deployment and resource management. To help simplify this process, we’re releasing new recipes for serving Llama4 models on Google Cloud Trillium TPUs and A3 Mega and A3 Ultra GPUs.
JetStream, Google's throughput and memory-optimized engine for LLM inference on XLA devices, now supports Llama-4-Scout-17B-16E and Llama-4-Maverick-17B-128E inference on Trillium, the sixth-generation TPU. New recipes now provide the steps to deploy these models using JetStream and MaxText on a Trillium TPU GKE cluster. vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. New recipes now demonstrate how to use vLLM to serve the Llama4 Scout and Maverick models on A3 Mega and A3 Ultra GPU GKE clusters.
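To give a feel for the GPU path, here is a minimal offline-inference sketch using vLLM's Python API. The model ID, GPU count, and sampling settings are illustrative assumptions; the published recipes deploy the vLLM server on a GKE cluster rather than running it inline like this.

```python
# Minimal vLLM sketch (assumptions: model ID, GPU count, context length).
# The published recipes deploy the vLLM server on GKE instead of running it inline.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # hypothetical HF model ID
    tensor_parallel_size=8,   # e.g., one A3 node with 8 GPUs
    max_model_len=8192,       # shorten the context window to fit memory
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain what a Mixture of Experts model is."], params)
for out in outputs:
    print(out.outputs[0].text)
```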
For serving the Maverick model on TPUs, we utilize Pathways on Google Cloud. Pathways is a system which simplifies large-scale machine learning computations by enabling a single JAX client to orchestrate workloads across multiple large TPU slices. In the context of inference, Pathways enables multi-host serving across multiple TPU slices. Pathways is used internally at Google to train and serve large models like Gemini.
MaxText provides high-performance, highly scalable, open-source LLM reference implementations written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference. MaxText now includes reference implementations for the Llama4 Scout and Maverick models, along with information on how to perform checkpoint conversion, training, and decoding for Llama4 models.
Added support for DeepSeek Models
Earlier this year, DeepSeek released two open-source models: the DeepSeek-V3 model, followed by the DeepSeek-R1 model. The V3 model delivers innovations and optimizations based on an MoE architecture. The R1 model provides reasoning capabilities through a chain-of-thought thinking process.
To help simplify deployment and resource management, we’re releasing new recipes for serving DeepSeek models on Google Cloud Trillium TPUs and A3 Mega and A3 Ultra GPUs.
JetStream now supports DeepSeek-R1-Distill-Llama-70B inference on Trillium. A new recipe provides the steps to deploy DeepSeek-R1-Distill-Llama-70B using JetStream and MaxText on a Trillium TPU VM. With the recent ability to work with Google Cloud TPUs, vLLM users can leverage the performance-cost benefits of TPUs with a few configuration changes. vLLM on TPU now supports all DeepSeek R1 distilled models on Trillium. Here's a recipe that demonstrates how to use vLLM, a high-throughput inference engine, to serve the DeepSeek distilled Llama model on Trillium TPUs.
You can also deploy DeepSeek models using the SGLang inference stack on our A3 Ultra VMs, powered by eight NVIDIA H200 GPUs, with this recipe. A recipe for A3 Mega VMs with SGLang is also available, which shows you how to deploy multihost inference across two A3 Mega nodes. Cloud GPU users using the vLLM inference engine can also deploy DeepSeek models on A3 Mega (recipe) and A3 Ultra (recipe) VMs.
MaxText now also includes support for architectural innovations from DeepSeek, such as Multi-Head Latent Attention (MLA), MoE shared and routed experts with loss-free load balancing, dropless expert parallelism, mixed decoder layers (dense and MoE), and YaRN RoPE embeddings. The reference implementations for the DeepSeek family of models allow you to rapidly experiment with your models by incorporating some of these newer architectural enhancements.
Recipe example
The reproducible recipes show the steps to deploy and benchmark inference with the new Llama4 and DeepSeek models. For example, this TPU recipe outlines the steps to deploy the Llama-4-Scout-17B-16E model with the JetStream MaxText Engine on Trillium TPUs. The recipe shows the steps to provision the TPU cluster, download the model weights, and set up JetStream and MaxText. It then shows you how to convert the checkpoint to a format compatible with MaxText, deploy it on a JetStream server, and run your benchmarks.
You can deploy the Llama4 Scout and Maverick models or the DeepSeek V3/R1 models today using inference recipes from the AI Hypercomputer GitHub repository. These recipes provide a starting point for deploying and experimenting with Llama4 and DeepSeek models on Google Cloud. Explore the recipes and resources linked below, and stay tuned for future updates. We hope you have fun building, and please share your feedback!
When you deploy open models like DeepSeek and Llama, you are responsible for their security and legal compliance. You should follow responsible AI best practices, adhere to each model's specific licensing terms, and ensure your deployment is secure and compliant with all regulations in your area.
Looking to fine-tune multimodal AI models for your specific domain but facing infrastructure and implementation challenges? This guide demonstrates how to overcome the multimodal implementation gap using Google Cloud and Axolotl, with a complete hands-on example fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset. Learn how to scale from concept to production while addressing the typical challenges of managing GPU resources, data preparation, and distributed training.
Filling in the Gap
Organizations across industries are rapidly adopting multimodal AI to transform their operations and customer experiences. Gartner analysts predict 40% of generative AI solutions will be multimodal (text, image, audio and video) by 2027, up from just 1% in 2023, highlighting the accelerating demand for solutions that can process and understand multiple types of data simultaneously.
Healthcare providers are already using these systems to analyze medical images alongside patient records, speeding up diagnosis. Retailers are building shopping experiences where customers can search with images and get personalized recommendations. Manufacturing teams are spotting quality issues by combining visual inspections with technical data. Customer service teams are deploying agents that process screenshots and photos alongside questions, reducing resolution times.
Multimodal AI applications powerfully mirror human thinking. We don’t experience the world in isolated data types – we combine visual cues, text, sound, and context to understand what’s happening. Training multimodal models on your specific business data helps bridge the gap between how your teams work and how your AI systems operate.
Key challenges organizations face in production deployment
Moving from prototype to production with multimodal AI isn’t easy. PwC survey data shows that while companies are actively experimenting, most expect fewer than 30% of their current experiments to reach full scale in the next six months. The adoption rate for customized models remains particularly low, with only 20-25% of organizations actively using custom models in production.
The following technical challenges consistently stand in the way of success:
Infrastructure complexity: Multimodal fine-tuning demands substantial GPU resources – often 4-8x more than text-only models. Many organizations lack access to the necessary hardware and struggle to configure distributed training environments efficiently.
Data preparation hurdles: Preparing multimodal training data is fundamentally different from text-only preparation. Organizations struggle with properly formatting image-text pairs, handling diverse file formats, and creating effective training examples that maintain the relationship between visual and textual elements.
Training workflow management: Configuring and monitoring distributed training across multiple GPUs requires specialized expertise most teams don’t have. Parameter tuning, checkpoint management, and optimization for multimodal models introduce additional layers of complexity.
These technical barriers create what we call “the multimodal implementation gap” – the difference between recognizing the potential business value and successfully delivering it in production.
How Google Cloud and Axolotl together solve these challenges
Our collaboration brings together complementary strengths to directly address these challenges. Google Cloud provides the enterprise-grade infrastructure foundation necessary for demanding multimodal workloads. Our specialized hardware accelerators such as NVIDIA B200 Tensor Core GPUs and Ironwood are optimized for these tasks, while our managed services like Google Cloud Batch, Vertex AI Training, and GKE Autopilot minimize the complexities of provisioning and orchestrating multi-GPU environments. This infrastructure seamlessly integrates with the broader ML ecosystem, creating smooth end-to-end workflows while maintaining the security and compliance controls required for production deployments.
Axolotl complements this foundation with a streamlined fine-tuning framework that simplifies implementation. Its configuration-driven approach abstracts away technical complexity, allowing teams to focus on outcomes rather than infrastructure details. Axolotl supports multiple open source and open weight foundation models and efficient fine-tuning methods like QLoRA. This framework includes optimized implementations of performance-enhancing techniques, backed by community-tested best practices that continuously evolve through real-world usage.
Together, we enable organizations to implement production-grade multimodal fine-tuning without reinventing complex infrastructure or developing custom training code. This combination accelerates time-to-value, turning what previously required months of specialized development into weeks of standardized implementation.
Solution Overview
Our multimodal fine-tuning pipeline consists of five essential components:
Foundational model: Choose a base model that meets your task requirements. Axolotl supports a variety of open source and open weight multimodal models including Llama 4, Pixtral, LLaVA-1.5, Mistral-Small-3.1, Qwen2-VL, and others. For this example, we’ll use Gemma 3, our latest open and multimodal model family.
Data preparation: Create properly formatted multimodal training data that maintains the relationship between images and text. This includes organizing image-text pairs, handling file formats, and splitting data into training/validation sets.
Training configuration: Define your fine-tuning parameters using Axolotl’s YAML-based approach, which simplifies settings for adapters like QLoRA, learning rates, and model-specific optimizations.
Infrastructure orchestration: Select the appropriate compute environment based on your scale and operational requirements. Options include Google Cloud Batch for simplicity, Google Kubernetes Engine for flexibility, or Vertex AI Custom Training for MLOps integration.
Production integration: Streamlined pathways from fine-tuning to deployment.
The pipeline structure above represents the conceptual components of a complete multimodal fine-tuning system. In our hands-on example later in this guide, we’ll demonstrate these concepts through a specific implementation tailored to the SIIM-ISIC Melanoma dataset, using GKE for orchestration. While the exact implementation details may vary based on your specific dataset characteristics and requirements, the core components remain consistent.
Selecting the Right Google Cloud Environment
Google Cloud offers multiple approaches to orchestrating multimodal fine-tuning workloads. Let’s explore three options with different tradeoffs in simplicity, flexibility, and integration:
Google Cloud Batch
Google Cloud Batch is best for teams seeking maximum simplicity for GPU-intensive training jobs with minimal infrastructure management. It handles all resource provisioning, scheduling, and dependencies automatically, eliminating the need for container orchestration or complex setup. This fully managed service balances performance and cost effectiveness, making it ideal for teams who need powerful computing capabilities without operational overhead.
Vertex AI Custom Training
Vertex AI Custom Training is best for teams prioritizing integration with Google Cloud’s MLOps ecosystem and managed experiment tracking. Vertex AI Custom Training jobs automatically integrate with Experiments for tracking metrics, the Model Registry for versioning, Pipelines for workflow orchestration, and Endpoints for deployment.
Google Kubernetes Engine (GKE)
GKE is best for teams seeking flexible integration with containerized workloads. It enables unified management of training jobs alongside other services in your container ecosystem while leveraging Kubernetes' sophisticated scheduling capabilities. GKE offers fine-grained control over resource allocation, making it ideal for complex ML pipelines. For our hands-on example, we'll use GKE in Autopilot mode, which maintains these integration benefits while Google Cloud automates infrastructure management including node provisioning and scaling. This lets you focus on your ML tasks rather than cluster administration, combining the flexibility of Kubernetes with the operational simplicity of a managed service.
Take a look at our code sample here for a complete implementation that demonstrates how to orchestrate a multimodal fine-tuning job on GKE:
This repository includes ready-to-use Kubernetes manifests for deploying Axolotl training jobs on GKE in Autopilot mode, covering automated cluster setup with GPUs, persistent storage configuration, job specifications, and monitoring integration.
Hands-on example: Fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset
The SIIM-ISIC Melanoma Classification dataset consists of dermoscopic images of skin lesions, each labeled as malignant or benign. With melanoma accounting for 75% of skin cancer deaths despite its relative rarity, early and accurate detection is critical for patient survival. By applying multimodal AI to this challenge, we can help dermatologists improve diagnostic accuracy and potentially save lives through faster, more reliable identification of dangerous lesions. So, let's walk through a complete example of fine-tuning Gemma 3 on this dataset.
For this implementation, we’ll leverage GKE in Autopilot mode to orchestrate our training job and monitoring, allowing us to focus on the ML workflow while Google Cloud handles the infrastructure management.
Data Preparation
The SIIM-ISIC Melanoma Classification dataset requires specific formatting for multimodal fine-tuning with Axolotl. Our data preparation process involves two main steps: (1) efficiently transferring the dataset to Cloud Storage using Storage Transfer Service, and (2) processing the raw data into the format required by Axolotl. To start, transfer the dataset.
Create a TSV file that contains the URLs for the ISIC dataset files:
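As a rough sketch of that file (the download URLs below are placeholders; use the actual links from the ISIC Challenge site), Storage Transfer Service expects a URL list in TSV form whose first line is the TsvHttpData-1.0 header:

```python
# Sketch: build the URL-list TSV expected by Storage Transfer Service.
# The ISIC download URLs below are placeholders; substitute the real links.
urls = [
    "https://example.com/ISIC_2020_Training_JPEG.zip",          # hypothetical URL
    "https://example.com/ISIC_2020_Training_GroundTruth.csv",   # hypothetical URL
]

with open("melanoma_dataset_urls.tsv", "w") as f:
    f.write("TsvHttpData-1.0\n")      # required header for URL lists
    for url in urls:
        f.write(f"{url}\n")           # size and MD5 columns may optionally follow
```

Upload the resulting melanoma_dataset_urls.tsv to your bucket so the transfer job can reference it.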
Set up appropriate IAM permissions for the Storage Transfer Service:
```bash
# Get your current project ID
export PROJECT_ID=$(gcloud config get-value project)

# Get your project number
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")

# Enable the Storage Transfer API
echo "Enabling Storage Transfer API..."
gcloud services enable storagetransfer.googleapis.com --project=${PROJECT_ID}

# Important: The Storage Transfer Service account is created only after you access the service.
# Access the Storage Transfer Service in the Google Cloud Console to trigger its creation:
# https://console.cloud.google.com/transfer/cloud
echo "IMPORTANT: Before continuing, please visit the Storage Transfer Service page in the Google Cloud Console"
echo "Go to: https://console.cloud.google.com/transfer/cloud"
echo "This ensures the Storage Transfer Service account is properly created."
echo "After visiting the page, wait approximately 60 seconds for account propagation, then continue."
echo ""
echo "Press Enter once you've completed this step..."
read -p ""

# Grant Storage Transfer Service the necessary permissions
export STS_SERVICE_ACCOUNT_EMAIL="project-${PROJECT_NUMBER}@storage-transfer-service.iam.gserviceaccount.com"
echo "Granting permissions to Storage Transfer Service account: ${STS_SERVICE_ACCOUNT_EMAIL}"

gcloud storage buckets add-iam-policy-binding gs://${GCS_BUCKET_NAME} \
  --member=serviceAccount:${STS_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.objectViewer \
  --condition=None

gcloud storage buckets add-iam-policy-binding gs://${GCS_BUCKET_NAME} \
  --member=serviceAccount:${STS_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.objectUser \
  --condition=None
```
Set up a storage transfer job using the URL list:
Navigate to Cloud Storage > Transfer
Click “Create Transfer Job”
Select “URL list” as Source type and “Google Cloud Storage” as Destination type
Enter the path to your TSV file: gs://<GCS_BUCKET_NAME>/melanoma_dataset_urls.tsv
Select your destination bucket
Use the default job settings and click Create
The transfer will download approximately 32GB of data from the ISIC Challenge repository directly to your Cloud Storage bucket. Once the transfer is complete, you'll need to extract the ZIP files before proceeding to the next step, where we format this data for Axolotl. See the notebook in the GitHub repository for a full walkthrough of how to format the data for Axolotl.
Preparing Multimodal Training Data
For multimodal models like Gemma 3, we need to structure our data following the extended chat_template format, which defines conversations as a series of messages with both text and image content.
Below is an example of a single training input example:
```json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "You are a dermatology assistant that helps identify potential melanoma from skin lesion images."}
      ]
    },
    {
      "role": "user",
      "content": [
        {"type": "image", "path": "/path/to/image.jpg"},
        {"type": "text", "text": "Does this appear to be malignant melanoma?"}
      ]
    },
    {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "Yes, this appears to be malignant melanoma."}
      ]
    }
  ]
}
```
This format allows Axolotl to properly process both the images and their corresponding labels, maintaining the relationship between visual and textual elements during training.
We split the data into training (80%), validation (10%), and test (10%) sets, using stratified sampling to maintain the class distribution in each split.
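For illustration, a stratified 80/10/10 split can be done with scikit-learn along these lines; the variable names and dummy data here are assumptions, and the notebook in the repository is the authoritative implementation:

```python
# Sketch: 80/10/10 stratified split of the prepared examples.
from sklearn.model_selection import train_test_split

# Hypothetical prepared examples and their binary labels (0 = benign, 1 = malignant).
examples = [{"id": i} for i in range(100)]
labels = [1 if i < 20 else 0 for i in range(100)]

train_ex, temp_ex, train_y, temp_y = train_test_split(
    examples, labels, test_size=0.2, stratify=labels, random_state=42
)
val_ex, test_ex, val_y, test_y = train_test_split(
    temp_ex, temp_y, test_size=0.5, stratify=temp_y, random_state=42
)
# 80% train, 10% validation, 10% test, each preserving the malignant/benign ratio
```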
Creating the Axolotl Configuration File
Next, we’ll create a configuration file for Axolotl that defines how we’ll fine-tune Gemma 3. We’ll use QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization to efficiently fine-tune the model while keeping memory requirements manageable. While A100 40GB GPUs have substantial memory, the 4-bit quantization with QLoRA allows us to train with larger batch sizes or sequence lengths if needed, providing additional flexibility for our melanoma classification task. The slight reduction in precision is typically an acceptable tradeoff, especially for fine-tuning tasks where we’re adapting a pre-trained model rather than training from scratch.
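The file we will reference later when creating the ConfigMap is gemma3-melanoma.yaml. As a hedged sketch of what such a QLoRA configuration might contain, the snippet below generates a minimal version programmatically; the key names follow Axolotl's conventions, but the base model ID and hyperparameter values are illustrative assumptions, and the dataset and chat-template sections are omitted:

```python
# Sketch: generate a minimal gemma3-melanoma.yaml Axolotl config.
# Key names follow common Axolotl conventions; values here are illustrative only.
# Requires PyYAML (pip install pyyaml).
import yaml

config = {
    "base_model": "google/gemma-3-4b-it",   # hypothetical Gemma 3 variant
    "load_in_4bit": True,                    # 4-bit quantization for QLoRA
    "adapter": "qlora",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "sequence_len": 2048,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_epochs": 2,
    "learning_rate": 2e-4,
    "output_dir": "/workspace/outputs/gemma3-melanoma",
    # dataset and chat-template settings omitted in this sketch
}

with open("gemma3-melanoma.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```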
The repository's configuration sets up QLoRA fine-tuning with parameters tuned for our melanoma classification task. Next, we'll set up our GKE Autopilot environment to run the training.
Setting up GKE Autopilot for GPU Training
Now that we have our configuration file ready, let’s set up the GKE Autopilot cluster we’ll use for training. As mentioned earlier, Autopilot mode lets us focus on our ML task while Google Cloud handles the infrastructure management.
Let’s create our GKE Autopilot cluster:
```bash
# Set up environment variables for cluster configuration
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export CLUSTER_NAME=melanoma-training-cluster
export RELEASE_CHANNEL=regular

# Enable required Google APIs
echo "Enabling required Google APIs..."
gcloud services enable container.googleapis.com --project=${PROJECT_ID}
gcloud services enable compute.googleapis.com --project=${PROJECT_ID}

# Create a GKE Autopilot cluster in the same region as your data
echo "Creating GKE Autopilot cluster ${CLUSTER_NAME}..."
gcloud container clusters create-auto ${CLUSTER_NAME} \
  --location=${REGION} \
  --project=${PROJECT_ID} \
  --release-channel=${RELEASE_CHANNEL}

# Install kubectl if not already installed
if ! command -v kubectl &> /dev/null; then
  echo "Installing kubectl..."
  gcloud components install kubectl
fi

# Install the GKE auth plugin required for kubectl
echo "Installing GKE auth plugin..."
gcloud components install gke-gcloud-auth-plugin

# Configure kubectl to use the cluster
echo "Configuring kubectl to use the cluster..."
gcloud container clusters get-credentials ${CLUSTER_NAME} \
  --location=${REGION} \
  --project=${PROJECT_ID}

# Verify kubectl is working correctly
echo "Verifying kubectl connection to cluster..."
kubectl get nodes
```
Now set up Workload Identity Federation for GKE to securely authenticate with Google Cloud APIs without using service account keys:
```bash
# Set variables for Workload Identity Federation
export PROJECT_ID=$(gcloud config get-value project)
export NAMESPACE="axolotl-training"
export KSA_NAME="axolotl-training-sa"
export GSA_NAME="axolotl-training-sa"

# Create a Kubernetes namespace for the training job
kubectl create namespace ${NAMESPACE} || echo "Namespace ${NAMESPACE} already exists"

# Create a Kubernetes ServiceAccount
kubectl create serviceaccount ${KSA_NAME} \
  --namespace=${NAMESPACE} || echo "ServiceAccount ${KSA_NAME} already exists"

# Create an IAM service account
if ! gcloud iam service-accounts describe ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com &>/dev/null; then
  echo "Creating IAM service account ${GSA_NAME}..."
  gcloud iam service-accounts create ${GSA_NAME} \
    --display-name="Axolotl Training Service Account"

  # Wait for IAM propagation
  echo "Waiting for IAM service account creation to propagate..."
  sleep 15
else
  echo "IAM service account ${GSA_NAME} already exists"
fi

# Grant necessary permissions to the IAM service account
echo "Granting storage.objectAdmin role to IAM service account..."
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Wait for IAM propagation
echo "Waiting for IAM policy binding to propagate..."
sleep 10

# Allow the Kubernetes ServiceAccount to impersonate the IAM service account
echo "Binding Kubernetes ServiceAccount to IAM service account..."
gcloud iam service-accounts add-iam-policy-binding ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Annotate the Kubernetes ServiceAccount
echo "Annotating Kubernetes ServiceAccount..."
kubectl annotate serviceaccount ${KSA_NAME} \
  --namespace=${NAMESPACE} \
  iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com --overwrite

# Verify the configuration
echo "Verifying Workload Identity Federation setup..."
kubectl get serviceaccount ${KSA_NAME} -n ${NAMESPACE} -o yaml
```
Now create a PersistentVolumeClaim for our model outputs. In Autopilot mode, Google Cloud manages the underlying storage classes, so we don’t need to create our own:
```bash
# Apply the PVC configuration
kubectl apply -f model-storage-pvc.yaml
```
Deploying the Training Job to GKE Autopilot
In Autopilot mode, we specify our GPU requirements using annotations and resource requests within the Pod template section of our Job definition. We'll create a Kubernetes Job that requests a single A100 40GB GPU; the full Job manifest (axolotl-training-job.yaml) is included in the repository.
Create a ConfigMap with our Axolotl configuration:
```bash
# Create the ConfigMap
kubectl create configmap axolotl-config --from-file=gemma3-melanoma.yaml -n ${NAMESPACE}
```
Create a Secret with Hugging Face credentials:
```bash
# Create a Secret with your Hugging Face token
# This token is required to access the Gemma 3 model from Hugging Face Hub
# Generate a Hugging Face token at https://huggingface.co/settings/tokens if you don't have one
kubectl create secret generic huggingface-credentials -n ${NAMESPACE} --from-literal=token=YOUR_HUGGING_FACE_TOKEN
```
Apply training job YAML to start the training process:
```bash
# Start training job
kubectl apply -f axolotl-training-job.yaml
```
Monitor the Training Process
Fetch the pod name to monitor progress:
```bash
# Get the pod name for the training job
POD_NAME=$(kubectl get pods -n ${NAMESPACE} --selector=job-name=gemma3-melanoma-training -o jsonpath='{.items[0].metadata.name}')

# Monitor logs in real-time
kubectl describe pod $POD_NAME -n ${NAMESPACE}
kubectl logs -f $POD_NAME -n ${NAMESPACE}
```
To visualize training metrics, you can also deploy TensorBoard and access it through its external IP:

```bash
# Deploy TensorBoard
kubectl apply -f tensorboard.yaml

# Get the external IP to access TensorBoard
kubectl get service tensorboard -n ${NAMESPACE}
```
Model Export and Evaluation Setup
After training completes, we need to export our fine-tuned model and evaluate its performance against the base model. First, let's export the model from our training environment to Cloud Storage.
After creating the model-export.yaml file, apply it:
```bash
# Export the model
kubectl apply -f model-export.yaml
```
This will start the export process, which copies the fine-tuned model from the Kubernetes PersistentVolumeClaim to your Cloud Storage bucket for easier access and evaluation.
Once exported, we have several options for evaluating our fine-tuned model. You can deploy both the base and fine-tuned models to their own respective Vertex AI Endpoints for systematic testing via API calls, which works well for high-volume automated testing and production-like evaluation. Alternatively, for exploratory analysis and visualization, a GPU-enabled notebook environment such as a Vertex AI Workbench instance or Colab Enterprise offers significant advantages, allowing for real-time visualization of results, interactive debugging, and rapid iteration on evaluation metrics.
In this example, we use a notebook environment to leverage its visualization capabilities and interactive nature. Our evaluation approach involves:
Loading both the base and fine-tuned models
Running inference on a test set of dermatological images from the SIIM-ISIC dataset
Computing standard classification metrics (accuracy, precision, recall, etc.)
Analyzing the confusion matrices to understand error patterns
Generating visualizations to highlight performance differences
For the complete evaluation code and implementation details, check out our evaluation notebook in the GitHub repository.
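As a simplified sketch of the metrics step, assuming you have already collected binary predictions from each model on the test set (the labels and predictions below are purely illustrative dummies):

```python
# Sketch: compare predictions from two models on the same test labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hypothetical ground truth and model outputs (1 = melanoma, 0 = benign).
y_true      = [1, 0, 0, 1, 0, 0, 1, 0]
y_base      = [1, 1, 1, 1, 1, 0, 1, 1]   # base model over-predicts melanoma
y_finetuned = [1, 0, 0, 0, 0, 0, 1, 0]   # tuned model is more balanced

for name, y_pred in [("base", y_base), ("fine-tuned", y_finetuned)]:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = tn / (tn + fp)
    print(
        f"{name}: accuracy={accuracy_score(y_true, y_pred):.3f} "
        f"precision={precision_score(y_true, y_pred):.3f} "
        f"recall={recall_score(y_true, y_pred):.3f} "
        f"specificity={specificity:.3f}"
    )
```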
Performance Results
Our evaluation demonstrated that domain-specific fine-tuning can transform a general-purpose multimodal model into a much more effective tool for specialized tasks like medical image classification. The improvements were significant across multiple dimensions of model performance.
The most notable finding was the base model’s tendency to over-diagnose melanoma. It showed perfect recall (1.000) but extremely poor specificity (0.011), essentially labeling almost every lesion as melanoma. This behavior is problematic in clinical settings where false positives lead to unnecessary procedures, patient anxiety, and increased healthcare costs.
Fine-tuning significantly improved the model’s ability to correctly identify benign lesions, reducing false positives from 3,219 to 1,438. While this came with a decrease in recall (from 1.000 to 0.603), the tradeoff resulted in much better overall diagnostic capability, with balanced accuracy improving substantially.
In our evaluation, we also included results from the newly announced MedGemma—a collection of Gemma 3 variants trained specifically for medical text and image comprehension recently released at Google I/O. These results further contribute to our understanding of how different model starting points affect performance on specialized healthcare tasks.
Below we can see the performance metrics across all three models:
Accuracy jumped from a mere 0.028 for base Gemma 3 to 0.559 for our tuned Gemma 3 model, representing an astounding 1870.2% improvement. MedGemma achieved 0.893 accuracy without any task-specific fine-tuning—a 3048.9% improvement over the base model and substantially better than our custom-tuned version.
While precision saw a significant 34.2% increase in our tuned model (from 0.018 to 0.024), MedGemma delivered a substantial 112.5% improvement (to 0.038). The most remarkable transformation occurred in specificity—the model’s ability to correctly identify non-melanoma cases. Our tuned model’s specificity increased from 0.011 to 0.558 (a 4947.2% improvement), while MedGemma reached 0.906 (an 8088.9% improvement over the base model).
These numbers highlight how fine-tuning helped our model develop a more nuanced understanding of skin lesion characteristics rather than simply defaulting to melanoma as a prediction. MedGemma’s results demonstrate that starting with a medically-trained foundation model provides considerable advantages for healthcare applications.
The confusion matrices further illustrate these differences:
Looking at the base Gemma 3 matrix (left), we can see it correctly identified all 58 actual positive cases (perfect recall) but also incorrectly classified 3,219 negative cases as positive (poor specificity). Our fine-tuned model (center) shows a more balanced distribution, correctly identifying 1,817 true negatives while still catching 35 of the 58 true positives. MedGemma (right) shows strong performance in correctly identifying 2,948 true negatives, though with more false negatives (46 missed melanoma cases) than the other models.
To illustrate the practical impact of these differences, let’s examine a real example, image ISIC_4908873, from our test set:
Disclaimer: Image for example case use only.
The base model incorrectly classified it as melanoma. Its rationale focused on general warning signs, citing its “significant variation in color,” “irregular, poorly defined border,” and “asymmetry” as definitive indicators of malignancy, without fully contextualizing these within broader benign patterns.
In contrast, our fine-tuned model correctly identified it as benign. While acknowledging a “heterogeneous mix of colors” and “irregular borders,” it astutely noted that such color mixes can be “common in benign nevi.” Crucially, it interpreted the lesion’s overall “mottled appearance with many small, distinct color variations” as being “more characteristic of a common mole rather than melanoma.”
Interestingly, MedGemma also misclassified this lesion as melanoma, stating, “The lesion shows a concerning appearance with irregular borders, uneven coloration, and a somewhat raised surface. These features are suggestive of melanoma. Yes, this appears to be malignant melanoma.” Despite MedGemma’s overall strong statistical performance, this example illustrates that even domain-specialized models can benefit from task-specific fine-tuning for particular diagnostic challenges.
These results underscore a critical insight for organizations building domain-specific AI systems: while foundation models provide powerful starting capabilities, targeted fine-tuning is often essential to achieve the precision and reliability required for specialized applications. The significant performance improvements we achieved—transforming a model that essentially labeled everything as melanoma into one that makes clinically useful distinctions—highlight the value of combining the right infrastructure, training methodology, and domain-specific data.
MedGemma’s strong statistical performance demonstrates that starting with a domain-focused foundation model significantly improves baseline capabilities and can reduce the data and computation needed for building effective medical AI applications. However, our example case also shows that even these specialized models would benefit from task-specific fine-tuning for optimal diagnostic accuracy in clinical contexts.
Next steps for your multimodal journey
By combining Google Cloud’s enterprise infrastructure with Axolotl’s configuration-driven approach, you can transform what previously required months of specialized development into weeks of standardized implementation, bringing custom multimodal AI capabilities from concept to production with greater efficiency and reliability.
For deeper exploration, check out these resources:
BigQuery provides a powerful platform for analyzing large-scale datasets with high performance. However, as data volumes and query complexity increase, maintaining operational efficiency is essential. BigQuery workload management provides comprehensive control mechanisms to optimize workloads and resource allocation, preventing performance issues and resource contention, especially in high-volume environments. And today, we’re excited to announce several updates to BigQuery workload management that make it more effective and easy to use.
But first, what exactly is BigQuery workload management?
At its core, BigQuery workload management is a suite of features that allows you to prioritize, isolate, and manage the execution of queries and other operations (aka workloads) within your BigQuery project. It provides granular control over how BigQuery resources are allocated and consumed, enabling you to:
Ensure critical workloads get the resources they need:
Reservations provide dedicated BigQuery slots, representing defined compute capacity.
Control and optimize cost with:
Slot commitments: Establish a predictable expenditure for BigQuery compute capacity in a specific Edition.
Spend-based commitments: Hourly spend-based commitments with one-year and three-year discount options for BigQuery compute that work across editions.
Auto-scaling, which allows reservations to dynamically adjust their slot capacity in response to demand fluctuations, operating within predefined parameters. This lets you accommodate peak workloads while preventing over-provisioning during periods of reduced activity.
Enjoy reliability and availability:
Dedicated reservations and commitments provide predictable performance for critical workloads by reducing resource contention.
Help ensure business continuity through managed disaster recovery, providing compute and data availability resilience.
Implementing BigQuery workload management is crucial for organizations seeking to maximize the efficiency, reliability, and cost-effectiveness of their cloud-based data analytics infrastructure.
Updates to BigQuery workload management
BigQuery workload management is focused on providing efficiency and control. The newest features and updates provide better resource allocation and optimized performance. Key improvements include reservation fairness for optimal slot distribution, reservation predictability for consistent performance, runtime reservation specification for flexibility, reservation labels for enhanced visibility, and autoscaler improvements for rapid and granular scalability.
Reservation fairness
Previously, using the fair-sharing method, BigQuery distributed capacity equally across projects. With reservation fairness, BigQuery prioritizes and allocates idle slots equally across all reservations within the same admin project, regardless of the number of projects running jobs in each reservation. Each reservation receives a similar share of available capacity in the idle slot pool, and then its slots are distributed fairly within its projects. Note: allocation assumes presence of demand. Idle slots are not allocated to reservations if no queries are running. This feature is only applicable to BigQuery Enterprise or Enterprise Plus editions, as Standard Edition does not support idle slots.
Figure 1: Project-based fairness
Configurations represent reservations with 0 baseline: The “Number” under the reservation is the total slots the projects in that reservation get through (Project) fair sharing. Note: Allocation assumes presence of demand. Idle slots are not allocated if no queries are running.
Figure 2: Reservation fairness enabled
Here, configurations represent reservations with 0 baseline: under each reservation, you can see the total slots the projects in that reservation get through (Reservation) fair-sharing. Note: Allocation assumes presence of demand. Idle slots are not allocated if no queries are running.
Reservation predictability
This feature allows you to set the absolute maximum number of consumed slots on a reservation, enhancing control over cost and performance fluctuations in your slot consumption. BigQuery offers baseline slots, idle slots, and autoscaling slots as potential capacity resources. When you create a reservation with a maximum size, confirm the number of baseline slots and the appropriate configuration of autoscaling and idle slots based on your past workloads. Note: To use predictable reservations, you must enable reservation fairness. Baselines are optional.
Reservation flexibility and securability
BigQuery lets you specify which reservation a query should run on at runtime. Enhanced flexibility and securability features provide greater control over resource allocation, including the ability to grant role-based access. You can specify a reservation at runtime using the CLI, UI, SQL, or API, overriding the default reservation assignment for your project, folder, or organization. The assigned reservation must be in the same region as the query you are running.
Reservation labels
When you add labels to your reservations, they are included in your billing data. This adds granular visibility into BigQuery slot consumption for specific workloads or teams, making tracking and optimization easier. You can then use these labels to filter your Cloud Billing data by the Analysis Slots Attribution SKU, giving you a powerful tool to track and analyze your spending on BigQuery slots based on the specific labels you have assigned.
Autoscaler improvements
Last but not least, the BigQuery autoscaler now delivers enhanced performance and adaptability for resource management. You get near-instant scale-up, finer granularity (50-slot increments instead of 100-slot increments), and faster scale-down. These features provide rapid capacity adjustments to meet workload demands, along with greater predictability and understanding of usage. The 50-slot increment also applies when setting baseline and reservation maximum capacities.
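To make this concrete, here is a hedged sketch of creating a reservation with a baseline and an autoscaling ceiling using the Python client library. The project, location, reservation ID, and slot counts are placeholders, and the field names (in particular autoscale.max_slots) should be verified against the current bigquery_reservation_v1 reference:

```python
# Sketch: create a reservation with a 100-slot baseline that can autoscale
# up to 200 additional slots. Field names should be checked against the
# bigquery_reservation_v1 client library reference; IDs below are placeholders.
from google.cloud import bigquery_reservation_v1 as reservation_v1

client = reservation_v1.ReservationServiceClient()
parent = "projects/my-admin-project/locations/US"   # placeholder admin project

reservation = reservation_v1.Reservation(
    slot_capacity=100,                                        # baseline slots
    autoscale=reservation_v1.Reservation.Autoscale(max_slots=200),  # autoscaling ceiling
)

created = client.create_reservation(
    parent=parent,
    reservation_id="analytics-reservation",
    reservation=reservation,
)
print(created.name)
```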
BigQuery workload management is an essential tool for optimizing both your performance and costs. By using reservations, spend-based commitments, and new features such as reservation predictability and fairness, you can significantly improve your data analysis performance. This leads to better data-driven decision-making by optimizing resource allocation and cutting costs, allowing your team to gain more meaningful insights from their data and experience consistent performance.
Today, we are excited to announce that Gartner® has named Google as a Leader in the 2025 Magic Quadrant™ for Data Science and Machine Learning Platforms report (DSML). We believe that this recognition is a reflection of continued innovations to address the needs of data science and machine learning teams, as well as new types of practitioners working alongside data scientists in the dynamic space of generative AI.
Download the complimentary 2025 Gartner Magic Quadrant™ for Data Science and Machine Learning Platforms.
AI is driving a radical transformation in how organizations operate, compete, and innovate. Working closely with customers, we’re delivering the innovations for a unified data and AI platform to meet the demands of the AI era, including data engineering and analysis, data science, MLOps, gen AI application and agent development tools, and a central layer of governance.
Unified AI platform with best-in-class multimodal AI
Google Cloud offers a wide spectrum of AI capabilities, starting from the foundational hardware like Tensor Processing Units (TPUs) to AI agents, and the tools for building them. These capabilities are powered by our pioneering AI research and development, and our expertise in taking AI to production with large-scale applications such as YouTube, Maps, Search, Ads, Workspace, Photos, and more.
All of this research and experience fuels Vertex AI, our unified AI platform for MLOps tooling and for predictive and gen AI use cases, which sits at the heart of Google's DSML offering. Vertex AI provides a comprehensive suite of tools covering the entire AI lifecycle, including data engineering and analysis tools, data science workbenches, MLOps capabilities for deploying and managing models, and specialized features for developing gen AI applications and agents. Moreover, our Self-Deploy capability enables our partners to not only build and host their models within Vertex AI for internal users, but also distribute and commercialize those models. Customer use of Vertex AI has grown 20x in the last year, driven by Gemini, Imagen, and Veo models.
Vertex AI Model Garden offers a curated selection of over 200 enterprise-ready models from Google like Gemini, partners like Anthropic, and the open ecosystem. Model Garden helps customers access the highest performing foundation models suited for their business needs and easily customize them with their own data, deploy to applications with just one click, and scale with end-to-end MLOps built-in.
Building on Google DeepMind research, we recently announced Gemini 2.5, our most intelligent AI model yet. Gemini 2.5 models are now thinking models, capable of reasoning (and showing its reasoning) before responding, resulting in dramatically improved performance. Transparent step-by-step reasoning is crucial for enterprise trust and compliance. We also launched Gemini 2.5 Flash, our cost-effective, low-latency workhorse model. Gemini 2.5 Flash will be generally available for all Vertex AI users in early June, with 2.5 Pro generally available soon after.
Vertex AI is now the only platform with generative media models across all modalities — video, image, speech, and music. At Google I/O, we announced several innovations in this portfolio, including the availability of Veo 3, Lyria 2, and Imagen 4 on Vertex AI. Veo 3 combines video and audio generation, taking content generation to a new level. The state-of-the-art model features improved quality when generating videos from text and image prompts. In addition, Veo 3 also generates videos with speech (dialogue and voice-overs) and audio (music and sound effects). Lyria 2, Google’s latest music generation model, features high-fidelity music across a range of styles. And Imagen 4, Google’s highest-quality image generation model, delivers outstanding text rendering and prompt adherence, higher overall image quality across all styles, and multilingual prompt support to help creators globally. Imagen 4 also supports multiple model variants to help customers optimize around quality, speed and cost.
All of this innovation resides on Vertex AI, so that AI projects can reach production and deliver business value while teams collaborate to improve models throughout the development lifecycle.
For instance, customers like Radisson Hotel Group have redefined personalized marketing with Google Cloud. Partnering with Accenture, the global hotel chain leveraged BigQuery, Vertex AI, Google Ads, and Google’s multimodal Gemini models to build a generative AI agent to help create locally relevant ad content and translate it into more than 30 languages — reducing content creation time from weeks to hours. This AI-driven approach has increased team productivity by 50%, boosted return on ad spend by 35%, and driven a 22% increase in ad-driven revenue.
A new era of multi-agent management
Eventually, we believe that every enterprise will rely on multi-agent systems, including those built on different frameworks or providers. We recently announced multiple enhancements to Vertex AI so you can build agents with an open approach and deploy them with enterprise-grade controls. This includes an Agent Development Kit (ADK), available for Python and Java, with an open-source framework for designing agents built on the same framework that powers Google Agentspace and Google Customer Engagement Suite agents. Many powerful examples and extensible sample agents are readily available in Agent Garden. You can also take advantage of Agent Engine, a fully managed runtime in Vertex AI that helps you deploy your custom agents to production with built-in testing, release, and reliability at global scale.
Connecting all your data to AI
Enterprise agents need to be grounded in relevant data to be successful. Whether helping a customer learn more about a product catalog or helping an employee navigate company policies, agents are only as effective as the data they are connected to. At Google Cloud, we do this by making it easy to leverage any data source. Whether it’s structured data in a relational database or unstructured content like presentations and videos, Google Cloud tools let customers easily use their existing data architectures as retrieval-augmented generation (RAG) solutions. With this approach, developers get the benefits of Google’s decades of search experience from out-of-the-box offerings, or can build their own RAG system with best-in-class components.
For RAG on an enterprise corpus, Vertex AI Search is our out-of-the-box solution that delivers high quality at scale, with minimal development or maintenance overhead. Customers who prefer to fully customize their solution can use our suite of individual components including the Layout Parser to prepare unstructured data, Vertex embedding models to create multimodal embeddings, Vertex Vector Search to index and serve the embeddings at scale, and the Ranking API to optimize the results. And RAG Engine provides an easy way for developers to orchestrate these components, or mix and match with third-party and open-source tools. BigQuery customers can also use its built-in vector search capabilities for RAG, or leverage the new connector with Vertex Vector Search to get the best of both worlds, by combining the data in BigQuery with a purpose-built high performance vector search tool.
Unified data and AI governance
With built-in governance, customers can simplify how they discover, manage, monitor, govern, and use their data and AI assets. Dataplex Universal Catalog brings together a data catalog and a fully managed, serverless metastore, enabling interoperability across Vertex AI, BigQuery, and open-source engines and formats such as Apache Spark and Apache Iceberg with a common metadata layer. Customers can also use a business glossary for a shared understanding of data and define company terms, creating a consistent foundation for AI.
At Google Cloud, we’re committed to helping organizations build and deploy AI and we are investing heavily in bringing new predictive and gen AI capabilities to Vertex AI. For more, download the full 2025 Gartner Magic Quadrant™ for Data Science and Machine Learning Platforms report.
Gartner Magic Quadrant for Data Science and Machine Learning Platforms – Afraz Jaffri, Maryam Hassanlou, Tong Zhang, Deepak Seth, Yogesh Bhatt, May 28, 2025
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google.
GARTNER is a registered trademark and service mark of Gartner Inc., and/or its affiliates in the U.S and internationally, and MAGIC QUADRANT is a registered trademark of Gartner Inc., and/or its affiliates and are used herein with permission. All rights reserved.
Here's a common question that can feel confusing when building AI agents: how can you use the latest Gemini models and open-source frameworks like LangChain and LangGraph to create multimodal agents that can detect objects?
Detecting objects is critically important for use cases from content moderation to multimedia search and retrieval. LangChain provides tools to chain together LLM calls and external data, while LangGraph provides a graph structure for building more controlled and complex multi-agent apps.
In this post, we’ll show you which decisions you need to make to combine Gemini, LangChain and LangGraph to build multimodal agents that can identify objects. This will provide a foundation for you to start building enterprise use cases like:
Object identification: Using different sources of data to verify whether an object exists on a map
Multimedia search and retrieval: Finding files that contain a specific object
First decision: No-code/low-code, or custom agents?
The first decision enterprises have to make is whether to use no-code/low-code options or to build custom agents. If you are building a simple agent like a customer service chatbot, you can use Google's Vertex AI Agent Builder to build a simple agent in a few minutes, or start from pre-built agents that are available in the Google Agentspace Agent Gallery.
But if your use case requires orchestration of multiple agents and integration with custom tooling, you will have to build custom agents, which leads to the next question.
Second decision: What agentic framework to use?
It's hard to keep up with the many agentic frameworks out there releasing new features every week. Top contenders include CrewAI, Autogen, LangGraph, and Google's ADK. Some of them, like ADK and CrewAI, offer higher levels of abstraction, while others, like LangGraph, allow a higher degree of control.
That's why, in this blog, we center the discussion on building a custom agent using the open-source LangChain and LangGraph as the agentic framework, and Gemini 2.0 Flash as the LLM brain.
Code deep dive
This example code identifies an object in an image, an audio file, and a video. In this case, we will use a dog as the object to be identified. We have different agents (an image analysis agent, an audio analysis agent, and a video analysis agent) performing different tasks but all working together toward a common goal: object identification.
Generative AI workflow for object detection
This gen AI workflow entails a user asking the agent to verify if a specific object exists in the provided files. The Orchestrator Agent will call relevant worker agents: image_agent, audio_agent, and video_agent while passing the user question and the relevant files. Each worker agent will call respective tooling to convert the provided file to base64 encoding. The final finding of each agent is then passed back to the Orchestrator Agent. The Orchestrator Agent then synthesizes the findings and makes the final determination. This code can be used as a starting point template where you need to ask an agent to reason and make a decision or generate conclusions from different sources.
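To make that structure concrete, here is a pared-down sketch of the pattern with LangGraph and Gemini via LangChain. The state fields, prompts, model name, and file paths are illustrative assumptions, and only the image worker is shown; the audio and video agents follow the same shape.

```python
# Sketch of the orchestrator/worker pattern with LangGraph and Gemini.
# Assumes GOOGLE_API_KEY (or equivalent credentials) is configured.
import base64
from typing import TypedDict

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, END

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

class AgentState(TypedDict):
    question: str
    image_path: str
    image_finding: str
    decision: str

def image_agent(state: AgentState) -> dict:
    # Tool step: convert the provided file to base64 encoding, then ask Gemini.
    with open(state["image_path"], "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    msg = HumanMessage(content=[
        {"type": "text", "text": state["question"]},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ])
    return {"image_finding": llm.invoke([msg]).content}

def orchestrator(state: AgentState) -> dict:
    # Synthesize the worker findings into a final determination.
    prompt = (f"Question: {state['question']}\n"
              f"Image agent finding: {state['image_finding']}\n"
              "Give a final yes/no answer with a short justification.")
    return {"decision": llm.invoke(prompt).content}

graph = StateGraph(AgentState)
graph.add_node("image_agent", image_agent)
graph.add_node("orchestrator", orchestrator)
graph.set_entry_point("image_agent")   # audio and video agents would be added similarly
graph.add_edge("image_agent", "orchestrator")
graph.add_edge("orchestrator", END)
app = graph.compile()

result = app.invoke({"question": "Is there a dog in this file?",
                     "image_path": "dog.jpg",
                     "image_finding": "", "decision": ""})
print(result["decision"])
```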
If you want to create multiagent systems with ADK, here is a video production agent built by a Googler which generates video commercials from user prompts and utilizes Veo for video content generation, Lyria for composing music, and Google Text-to-Speech for narration. This example demonstrates the fact that many ingredients can be used to meet your agentic goals, in this case an AI agent as a production studio. If you want to try ADK, here is an ADK Quickstart to help you kick things off.
Third decision: Where to deploy the agents?
If you are building a simple app that needs to go live quickly, Cloud Run is an easy way to deploy your app. Just like any serverless web app, you can follow the same instructions to deploy on Cloud Run. Watch this video on building AI agents on Cloud Run. However, if you want a more enterprise-grade managed runtime with quality and evaluation, context management, and monitoring, Agent Engine is the way to go. Here is a quick start for Agent Engine. Agent Engine is a fully managed runtime that you can integrate with many of the previously mentioned frameworks, such as ADK, LangGraph, and CrewAI (see the image below, from the official Google Cloud docs).
Get started
Building intelligent agents with generative AI, especially those capable of multimodal understanding, is akin to solving a complex puzzle. Many developers are finding that a prototypical agentic build involves a LangChain agent with Gemini Flash as the LLM. This post explored how to combine the power of Gemini models with open-source frameworks like LangChain and LangGraph. To get started right away, use this ADK Quickstart or visit our Agent Development GitHub.
The latest version of the Bigtable Spark connector opens up a world of possibilities for Bigtable and Apache Spark applications, not least of which is additional support for Bigtable and Apache Iceberg, the open table format for large analytical datasets. In this blog post, we explore how to use the Bigtable Spark connector to interact with data stored in Bigtable from Apache Spark, and delve into powerful use cases that leverage Apache Iceberg.
The Bigtable Spark connector allows you to directly read and write Bigtable data using Apache Spark in Scala, SparkSQL, and DataFrames. This integration gives you direct access to your operational data for building data pipelines that support training ML models, ETL/ELT, or generating real-time dashboards. When combined with Bigtable Data Boost, Bigtable’s serverless compute service, you can get high-throughput read jobs on operational data without impacting Bigtable application performance. Apache Spark is commonly used as a processing engine for working with data lakehouses and data stored in open table formats, including Apache Iceberg. We’ve enhanced the Bigtable Spark connector for working with data across both Bigtable and Iceberg, including query optimizations such as join pushdowns and support for dynamic column filtering.
This opens up Bigtable and Apache Iceberg integrations for:
Accelerated data science: In the past, Bigtable developers and administrators had to generate datasets for analytics and move them out of Bigtable for analytical processing in tools like notebooks and PySpark. Now, data scientists can directly interact with Bigtable’s operational data within their Apache Spark environments using a combination of both Bigtable and Apache Iceberg data, streamlining data preparation, exploration, analysis, and even the creation of Iceberg tables. When combined with Data Boost, this can be done without any impact to production applications.
Low-latency serving: Write-back capabilities support making real-time updates to Bigtable. This means you can use Iceberg data to create predictions or features in batch and easily serve those features from Bigtable for low-latency online access within an end-user application.
To get started, follow the Quickstart or read on to learn more about the two use cases outlined above.
What the Bigtable Spark connector can do for you
Now, let’s take a look at some ways you could put the Bigtable Spark connector into service.
Accelerated data science
Bigtable is designed for throughput-intensive applications, offering throughput that can be adjusted by adding and removing nodes. If you are writing in batch over the Apache Spark connector, you can achieve even more throughput through the use of the spark.bigtable.batch.mutate.size option, which takes advantage of Bigtable’s mutation batching functionality.
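As a rough illustration, a batch write through the connector might look like the following PySpark sketch. The project, instance, table, and source file names are hypothetical, the connector JAR is assumed to be on the Spark classpath, and the catalog layout and spark.bigtable.* option names should be checked against the current connector documentation before use:

import json
from pyspark.sql import SparkSession

# Assumes the job is launched with the Bigtable Spark connector package on the classpath.
spark = SparkSession.builder.appName("bigtable-batch-write").getOrCreate()

# The catalog maps DataFrame columns to a Bigtable row key and column families.
catalog = json.dumps({
    "table": {"name": "vehicle_telemetry"},            # assumed to already exist
    "rowkey": "vehicle_id",
    "columns": {
        "vehicle_id": {"cf": "rowkey",  "col": "vehicle_id", "type": "string"},
        "speed":      {"cf": "metrics", "col": "speed",      "type": "double"},
        "engine_rpm": {"cf": "metrics", "col": "engine_rpm", "type": "long"},
    },
})

df = spark.read.parquet("gs://my-bucket/telemetry/")    # hypothetical source data

(df.write
   .format("bigtable")
   .option("catalog", catalog)
   .option("spark.bigtable.project.id", "my-project")
   .option("spark.bigtable.instance.id", "my-instance")
   # Larger batches lean on Bigtable's mutation batching for higher write throughput.
   .option("spark.bigtable.batch.mutate.size", "1000")
   .save())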
Throughput and queries per second (QPS) can be autoscaled, resized without any restarting, and the data is automatically replicated for high availability and faster region-specific access. There are also specialized data types that make it easy to build distributed counters, which can give you up-to-date metrics on what is happening in your system.
In contrast, Apache Iceberg is a high-performance, open-source table format for large analytical datasets. Iceberg lets you build analytics tables, often with aggregated data, that can be shared across engines such as Apache Spark and BigQuery.
Customers have found that event collection in Bigtable with advanced analytics of those events using Apache Spark and Apache Iceberg can be a powerful combination. For example, you may want to collect clicks, views, sensor readings, device usage, gaming activity, engagement, or other telemetry in real time, and have a view of what is happening in the system using Bigtable’s continuous materialized views. You might then use Apache Spark’s batch processing and ML capabilities and even join with historical Iceberg data to run advanced analytics and understand the trends over time, identify anomalies, or generate machine learning models on the data. When these advanced analytics in Apache Spark are done using a Data Boost application profile, this analysis can be done without impacting real-time data collection and operational analytics.
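Here is a hedged PySpark sketch of that pattern: scan recent events from Bigtable through a Data Boost app profile, then join them with a historical Iceberg table in SparkSQL. The project, instance, app profile, Iceberg catalog, and table names are all illustrative assumptions, and the spark.bigtable.* option names should be verified against the connector documentation:

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigtable-iceberg-analytics").getOrCreate()

events_catalog = json.dumps({
    "table": {"name": "click_events"},
    "rowkey": "event_id",
    "columns": {
        "event_id": {"cf": "rowkey", "col": "event_id", "type": "string"},
        "user_id":  {"cf": "event",  "col": "user_id",  "type": "string"},
        "clicks":   {"cf": "event",  "col": "clicks",   "type": "long"},
    },
})

events = (spark.read
          .format("bigtable")
          .option("catalog", events_catalog)
          .option("spark.bigtable.project.id", "my-project")
          .option("spark.bigtable.instance.id", "my-instance")
          # A Data Boost app profile keeps this scan off the cluster serving live traffic.
          .option("spark.bigtable.app.profile.id", "data-boost-profile")
          .load())

events.createOrReplaceTempView("events")

# Join fresh operational events with historical Iceberg data to look at trends.
trends = spark.sql("""
    SELECT h.user_segment,
           SUM(e.clicks)       AS clicks_today,
           AVG(h.daily_clicks) AS clicks_baseline
    FROM events e
    JOIN iceberg_catalog.analytics.user_history h ON e.user_id = h.user_id
    GROUP BY h.user_segment
""")
trends.show()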
Low-latency serving: Bigtable for model serving of BigQuery Iceberg Managed Tables
Apache Iceberg provides an efficient way to combine and manage large datasets for machine learning tasks. By storing your data in Iceberg tables, multiple engines can write to the same warehouse and leverage Spark or BigQuery to train and evaluate the ML models. Once you have a trained model, you often need to publish feature tables or feature vectors into a low-latency database for online application access.
Bigtable is well suited for low-latency applications that require lookups against these large-scale datasets. Let’s say you have a dataset of customer transactions stored across multiple Iceberg tables. You can use SparkSQL to combine this data and SparkML to train a fraud detection model on this data. Once the model is trained, you can use it to predict the probability of fraud for new transactions. You can then write these predictions back to Bigtable using the Bigtable Spark connector, where they can be accessed by your fraud detection application.
Use case: Vehicle telemetry using Bigtable and the Apache Spark connector
Let’s look at an abbreviated example of how Bigtable and the Apache Spark connector might work together for a company that is tracking vehicle telemetry and wants to enable their fleet managers with immediate access to real-time KPIs of equipment effectiveness, while also allowing data scientists to build a predictive maintenance schedule that they can provide to drivers.
While this specific use case relies on vehicles as a case study, it is a generally applicable architecture pattern that can be used for a variety of telemetry and IoT use cases, ranging from measuring telecommunications equipment reliability to building KPIs for Overall Equipment Effectiveness (OEE) in a manufacturing operation.
Let’s take a look at the various components of this architecture.
Bigtable is an excellent choice for the high-throughput, low-latency writes that are often required for telemetry data, where vast amounts of data are continuously streamed in. With telemetry data, the data schema changes often, requiring the flexible schema that Bigtable provides. Bigtable clusters can be deployed throughout the globe with different autoscaling configurations that can match the local demand for writes. The ingested data is automatically replicated to all clusters, giving you a single unified view of the data. There are also open-source streaming connectors for both Apache Kafka and Apache Flink, as well as industry-specific connectors such as NATS for automotive data.
Bigtable continuous materialized views offer real-time data transformations and aggregations on streaming data, enabling vehicle managers to gain immediate insights into their fleet’s activity and make data-driven adjustments.
Keeping all data within Bigtable facilitates advanced analytics on historical information using Apache Spark. Data scientists can directly access this data in Apache Spark using the Bigtable Spark connector without needing to create copies. Furthermore, Bigtable Data Boost enables the execution of large batch or machine learning jobs, such as training predictive models or generating comprehensive reports, without impacting the performance of live applications. These jobs can involve joining streaming event data (e.g., real-time vehicle telemetry like GPS coordinates, speed, engine RPM, fuel consumption, or acceleration/braking patterns) with historical or static datasets stored in Apache Iceberg (e.g., vehicle master data including make, model, year, VIN, vehicle type, maintenance history, or driver assignments). Apache Iceberg may also include additional data sources such as weather and traffic analysis. This allows for richer insights, such as correlating specific driving behaviors with maintenance needs, predicting component failures based on operational data, or optimizing routes by combining real-time traffic with vehicle capacity and destination information. You can also provide analytics teams with secure Bigtable data access through Bigtable Authorized Views to limit data access to sensitive information like GPS.
Machine learning-driven insights, such as predictive maintenance recommendations that are often generated in batch processes and potentially stored in Iceberg tables, can be written back to Bigtable using the Bigtable Spark connector. This makes these valuable insights immediately accessible to user-facing applications.
Bigtable excels at high-scale reads in user-facing applications for this vehicle application thanks to its distributed architecture and design that’s optimized for massive, time-series data. It can handle billions of rows and thousands of columns. Bigtable can quickly retrieve this data with low latency because it distributes data across many nodes and performs fast, single-row lookups and efficient range scans, helping to ensure a smooth and responsive user experience even with millions of vehicles constantly streaming data.
Igniting the spark
The Bigtable Spark connector, combined with the recent connector enhancements for Apache Iceberg and Bigtable Data Boost, unlocks new possibilities for large-scale data processing on operational data. Whether you’re training ML models or performing serverless analytics, this powerful combination can help you implement new use cases and ease the operational burden of running complex ETL jobs. By leveraging the scalability, performance, and flexibility of these technologies, you can build robust and efficient data pipelines that can handle your most demanding workloads.
On Google Cloud, Dataproc Serverless simplifies running Apache Spark batch workloads by removing the need to manage clusters. When processing data via Bigtable’s serverless Data Boost, these jobs become highly cost-effective: you only pay for the precise amount of processing power you consume and solely for the duration your workload is running, without needing to configure any compute infrastructure.
For years, BigQuery has been synonymous with fully managed, fast, petabyte-scale analytics. Its columnar architecture and decoupled storage and compute have made it the go-to data warehouse for deriving insights from massive datasets.
But what about the moments between the big analyses? What if you need to:
Modify a handful of customer records across huge tables without consuming all your slots or running for minutes on end?
Track exactly how some data has evolved row by row?
Act immediately on incoming streaming data, updating records on the fly?
Historically, these types of “transactional” needs might have sent you searching for a database solution or required you to build complex ETL/ELT pipelines around BigQuery. The thinking was clear: BigQuery was for analysis, and you used something else for dynamic data manipulation.
That’s changing. At Google Cloud, we’ve been steadily evolving BigQuery, adding powerful capabilities that blur these lines and bring near-real-time, transactional-style operations directly into your data warehouse. This isn’t about turning BigQuery into a traditional OLTP database; rather, it’s about empowering you to handle common data management tasks more efficiently within the BigQuery ecosystem.
This shift means fewer complex workarounds, faster reactions to changing data, and the ability to build more dynamic and responsive applications right where your core data lives.
Today, we’ll explore three game-changing features that are enabling this evolution:
Efficient fine-grained DML mutations: Forget costly table rewrites for small modifications. Discover how BigQuery now handles targeted UPDATEs, DELETEs, and MERGEs with significantly improved performance and resource efficiency.
Change history support for updates and deletes: Go beyond simple snapshots. See how BigQuery can now capture the granular history of UPDATEs and DELETEs, providing a detailed audit trail of data within your tables.
Real-time updates with DML over streaming data: Don’t wait for data to settle. Learn how you can apply UPDATE, DELETE, and MERGE operations directly to data as it streams into BigQuery, enabling immediate data correction, enrichment, or state management.
Ready to see how these capabilities can simplify your workflows and unlock new possibilities within BigQuery? Let’s dive in and see them in action!
1. Efficient fine-grained DML mutations
BigQuery has supported Data Manipulation Language (DML) statements like UPDATE, DELETE, and MERGE for years, allowing you to modify data without recreating entire tables. However, historically, performing these operations — especially small, targeted changes on very large tables — was less efficient than you might have hoped for. The challenge? Write amplification.
When you executed a DML mutation, BigQuery needed to rewrite entire underlying storage blocks (think of them as internal file groups) containing the rows you modified. Even if your statement only affected a few rows within a block, the whole block might have needed to be rewritten. This phenomenon, sometimes called “write amplification,” could lead to significant slot consumption and longer execution times, particularly for sparse mutations (changes scattered across many different blocks in a large table). This sometimes made operations like implementing GDPR’s “right to be forgotten” by deleting specific user records slow or costly.
To address this, we introduced fine-grained DML in BigQuery, a set of performance enhancements that optimize sparse DML mutation operations.
When enabled, instead of always rewriting large storage blocks, BigQuery fine-grained DML can pinpoint and modify data with much finer granularity. It leverages optimized metadata indexes to rewrite only the necessary mutated data, drastically reducing the processing, I/O, and consequently, the slot time consumed for sparse DML. The result? Faster, more cost-effective DML, making BigQuery much more practical for workloads involving frequent, targeted data changes.
Grupo Catalana Occidente, a leading global insurance provider, is excited about fine-grained DML’s ability to help them integrate changes to their data in real time:
“In our integration project between Google BigQuery, SAP, and MicroStrategy, we saw an 83% improvement in DML query runtime when we enabled BigQuery fine-grained DML. Fine-grained DML allows us to achieve adequate performance and reduces the time of handling large volumes of data. This is an essential functionality for implementing the various data initiatives we have in our pipeline.” – Mayker Oviedo, Chief Data Officer, Grupo Catalana Occidente
Let’s quantify this improvement ourselves. To really see the difference, we need a large table where updates are likely to be sparse. We’ll use a copy of the bigquery-public-data.wikipedia.pageviews_2024 dataset, which contains approximately 58.7 billion rows and weighs in at ~2.4 TB.
(Important Note: Running the following queries involves copying a large dataset and processing significant amounts of data. This will incur BigQuery storage and compute costs based on your pricing model. Proceed with awareness if you choose to replicate this experiment.)
Step 1: Create the Table Copy
First, let’s copy the public dataset into our own project. We’ll also enable change history, which we’ll use later on.
code_block
-- Make a copy of the public 2024 Wikipedia page views table
CREATE OR REPLACE TABLE `my_dataset.wikipedia_pageviews_copy`
COPY `bigquery-public-data.wikipedia.pageviews_2024`;

-- Enable change history on your new table. We'll use this later.
ALTER TABLE `my_dataset.wikipedia_pageviews_copy`
SET OPTIONS(
  enable_change_history = TRUE
);
Step 2: Run Baseline UPDATE (without optimization)
Now, let’s perform a sparse update, modifying about 0.1% of the rows scattered across the table.
code_block
-- Baseline UPDATE: Modify ~0.1% of rows
UPDATE `my_dataset.wikipedia_pageviews_copy`
SET views = views + 1000
WHERE title LIKE '%Goo%'
  AND datehour IS NOT NULL;
Result: This update modified approximately 61.2 million records. In our test environment, without the optimization enabled, it took roughly 10 minutes and 49 seconds to complete and consumed ~787.3 million slot milliseconds.
Step 3: Enable fine-grained mutations
Next, we’ll enable the optimization using a simple ALTER TABLE statement.
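For example, you could issue it with the BigQuery Python client as in the minimal sketch below; treat the enable_fine_grained_mutations option name as an assumption to verify against the fine-grained DML documentation:

from google.cloud import bigquery

client = bigquery.Client()

# Assumed option name for enabling fine-grained DML on the table we copied earlier.
client.query("""
    ALTER TABLE `my_dataset.wikipedia_pageviews_copy`
    SET OPTIONS (enable_fine_grained_mutations = TRUE)
""").result()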
Let’s run a similar update, again modifying roughly 0.1% of the data.
code_block
-- Optimized UPDATE: Modify the same number of rows
UPDATE `my_dataset.wikipedia_pageviews_copy`
SET views = views - 999  -- Change the value slightly for a distinct operation
WHERE title LIKE '%Goo%'
  AND datehour IS NOT NULL;
Result: This time, the update (again affecting ~61.2 million sparse records) completed dramatically faster. It took only 44 seconds and consumed ~51.8 million slot milliseconds.
Now let’s compare the results:
Metric               | Baseline (No Optimization) | Optimized with fine-grained DML | Improvement Factor
Query execution time | 10 min 49 sec              | 44 sec                          | ~14.8x Faster
Slot Milliseconds    | ~787.3 million             | ~51.8 million                   | ~15.2x Less
Wow! Enabling fine-grained mutations resulted in a massive ~14.8x reduction in query time and a ~15.2x reduction in slot consumption! This illustrates how this optimization makes targeted DML operations significantly more performant and cost-effective on large BigQuery tables.
2. Tracking row-level history with the CHANGES TVF
Understanding how data evolves row by row is crucial for auditing, debugging unexpected data states, and building downstream processes that react to specific modifications. While BigQuery’s time travel feature lets you query historical snapshots of a table, it doesn’t easily provide a granular log of individual UPDATE, DELETE, and INSERT operations. Another feature, the APPENDS Table-Valued Function (TVF), only tracks additions, but not modifications or deletions.
Enter the BigQuery change history function, CHANGES TVF, which provides access to a detailed, row-level history of appends and modifications made to a BigQuery table. It allows you to see not just what data exists now, but how it got there — including the sequence of insertions, updates, and deletions.
It’s important to note that you must enable change history tracking on the table before the changes you want to query occur. BigQuery retains this detailed change history for a table’s configured time travel duration. By default, this is 7 days. Also, the CHANGES function can’t query the last ten minutes of a table’s history. Therefore, the end_timestamp argument value must be at least ten minutes prior to the current time.
To explore this further, let’s look at the changes we made to our Wikipedia pageviews table earlier. We’ll look for changes made to the Google Wikipedia article from January 1st, 2024.
code_block
-- Query the same Wikipedia pageviews table described above. Keep in mind this must run 10 min after
-- you ran the DML update above, and you must have already set enable_change_history to TRUE.
SELECT
  *
FROM
  CHANGES(TABLE `my_dataset.wikipedia_pageviews_copy`, NULL, TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 601 SECOND))
WHERE
  title LIKE "Google"
  AND wiki = "en"
  AND datehour = "2024-01-01"
ORDER BY _CHANGE_TIMESTAMP ASC
As you can see from the query results, there are two new pseudo columns within our table, _CHANGE_TYPE and _CHANGE_TIMESTAMP. The _CHANGE_TYPE column refers to the type of change that produced the row, while the _CHANGE_TIMESTAMP column indicates the commit time of the transaction that made the change.
Thus, parsing the changes made to the table, you can see:
Our table initially received an INSERT with this record’s views totaling 288. This resulted from the initial copy from the Wikipedia pageviews public dataset.
The table then simultaneously recorded an UPDATE and DELETE operation from our first DML statement, which added 1,000 views to the record. This is to reflect our original event of 288 views being deleted and replaced with an event showing 1,288 views.
Then finally, our table again simultaneously recorded an UPDATE and DELETE operation for our second DML. The delete was for the record with 1,288 views, and the update was for the final event, showing 289 views.
This detailed, row-level change tracking provided by the CHANGES TVF is incredibly powerful for building robust audit trails, debugging data anomalies by tracing their history, and even for building disaster recovery pipelines that replicate BigQuery changes to other systems in near real-time.
3. Real-time mutations: DML on freshly streamed data
BigQuery’s Storage Write API provides a high-throughput, low-latency way to stream data into your tables, making it immediately available for querying. This is fantastic for powering real-time dashboards and immediate analysis.
While the Storage Write API lets you instantly query this freshly streamed data, historically, you couldn’t immediately modify it using DML statements like UPDATE, DELETE, or MERGE. The incoming data first lands in a temporary, write-optimized storage (WOS) buffer, designed for efficient data ingestion. Before DML could target these rows, they needed to be automatically flushed and organized into BigQuery’s main columnar, read-optimized storage (ROS) by a background process. This optimization step, while essential for query performance, meant there was often a delay (potentially varying from minutes up to ~30 minutes or more) before you could apply corrections or updates via DML to the newest data.
That waiting period is no longer a hard requirement! BigQuery now supports executing UPDATE, DELETE, and MERGE statements that can directly target rows residing in write-optimized storage, before they are flushed to the columnar storage.
Why does this matter? This is a significant enhancement for real-time data architectures built on BigQuery. It eliminates the delay between data arrival and the ability to manipulate it within the warehouse itself. You can now react instantly to incoming events, correct errors on the fly, or enrich data as it lands, without waiting for background processes to complete or implementing complex pre-ingestion logic outside of BigQuery.
This capability unlocks powerful scenarios directly within your data warehouse like:
Immediate data correction: Did a sensor stream an obviously invalid reading? Or did an event arrive with incorrect formatting? Run an UPDATE or DELETE immediately after ingestion to fix or remove the bad record before it impacts real-time dashboards or downstream consumers.
Real-time enrichment: As events stream in, UPDATE them instantly with contextual information looked up from other dimension tables within BigQuery (e.g., adding user details to a clickstream event).
On-the-fly filtering/flagging: Implement real-time quality checks. If incoming data fails validation, immediately DELETE it or UPDATE it with a ‘quarantine’ flag.
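For instance, the on-the-fly filtering scenario might look like the following minimal sketch. It assumes rows were just streamed into a hypothetical my_dataset.sensor_events table via the Storage Write API (column names are illustrative), and it immediately deletes obviously invalid readings without waiting for the write-optimized buffer to flush:

from google.cloud import bigquery

client = bigquery.Client()

# DML can now target rows that are still in write-optimized (streaming) storage.
client.query("""
    DELETE FROM `my_dataset.sensor_events`
    WHERE (reading IS NULL OR reading < 0)   -- obviously invalid sensor values
      AND ingest_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
""").result()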
By enabling DML operations directly on data in the streaming buffer, BigQuery significantly shortens the cycle time for acting on real-time data, simplifying workflows and allowing for faster, more accurate data-driven responses.
BigQuery for dynamic data management
As we’ve explored, we’ve significantly expanded BigQuery’s capabilities beyond its traditional analytical strengths. Features like fine-grained DML, change history support for updates and deletes, and the ability to run DML directly on freshly streamed data represent a major leap forward.
While we’re not aiming to replace your specialized OLTP databases with BigQuery for high-volume, low-latency transactions, it’s undeniably becoming a far more versatile platform. These enhancements mean data practitioners can increasingly:
Perform targeted UPDATEs and DELETEs efficiently, even on massive tables
Track the precise history of data modifications for auditing and debugging
React to and modify streaming data in near real-time
All of this happens within the familiar, scalable, and powerful BigQuery environment you already use for analytics. This convergence simplifies data architectures, reduces the need for complex external pipelines, and enables faster, more direct action on your data.
Customers like Statsig, a leading product development company that enables its customers to build faster and make smarter decisions, can now use BigQuery for new use cases:
“BigQuery adding new features like fine-grained DML allows us to use BigQuery for more transactional use cases here at Statsig.” – Pablo Beltran, Staff Software Engineer, Statsig
So, the next time your project requires a blend of deep analysis and more dynamic data management, remember these powerful tools in your BigQuery toolkit.
Ready to learn more? Explore the official Google Cloud documentation.
Last month at Google Cloud Next ‘25, we announced MCP Toolbox for Databases to make it easier to connect generative AI agents to databases and automate core enterprise workflows. MCP Toolbox for Databases (Toolbox) is an open-source Model Context Protocol (MCP) server that allows developers to easily connect gen AI agents to enterprise data. It supports BigQuery, AlloyDB (including AlloyDB Omni), Cloud SQL for MySQL, Cloud SQL for PostgreSQL, Cloud SQL for SQL Server, Spanner, and self-managed open-source databases including PostgreSQL, MySQL, and SQLite, as well as databases from a growing list of other vendors including Neo4j, Dgraph, and more.
Today, we are announcing additional capabilities in Toolbox specifically designed to empower AI-assisted development. Toolbox now makes it easy to connect databases to AI assistants in your IDE.
MCP is an emerging open standard created by Anthropic for connecting AI systems with data sources through a standardized protocol, replacing today’s fragmented, custom integrations. Now, with Toolbox, any MCP-compatible AI assistant (including Claude Code, Cursor, Windsurf, Cline, and many more) can help you write application code that queries your database, design a schema for a new application, refactor code when the data model changes, generate data for integration testing, explore the data in your database, and much more.
Today, we’ll explore these new capabilities and how you can get started.
Using MCP with Google Cloud databases
As you carry out AI-assisted tasks like code generation, code refactoring, code completion, automated testing, and documentation writing using AI-native IDEs like Claude Code, Cursor, Windsurf or established IDEs such as VSCode, you’re probably looking for the most efficient way to connect with your data. Let’s see how this can be done with MCP Toolbox and Google Cloud databases.
Toolbox’s new pre-built tools enable you to integrate with Cloud SQL, AlloyDB, Spanner, and BigQuery, or with your self-managed PostgreSQL database, all directly within your preferred IDE. And since every application manages data in some capacity, Toolbox’s new capabilities unlock new opportunities to automate the software development process.
AI-assisted development connected to your database
Let’s see how a developer uses these new tools to accelerate their work:
Sara has recently joined a development team that maintains an e-commerce application. She has access to the source code and the Google Cloud SQL for PostgreSQL development database. She uses Cline, an open source AI assistant that can be integrated with the VS Code IDE. Sara quickly sets up Toolbox and connects it to Cline and the database.
Next, Sara explores the database to understand how the information is structured and how it can be queried. She doesn’t need to know the SQL syntax or remember the nuances of PostgreSQL; Cline can handle this for her, looking up metadata about the database and then seamlessly connecting to it to run the queries. Sara can simply ask questions in plain English and Cline can bring her answers.
Until now, she would have had to write complex SQL queries and remember specific table schemas just to get answers. For example, to find the last three orders she needs to know the correct table and write a SELECT query; to count the open orders along with their product type and purchase date, she needs another query that joins the orders table with the items table. The SQL quickly becomes more complex.
Now, she can use these simple natural language prompts and AI can handle the rest for her.
NL prompts
code_block
List all tables

How many open orders are there? List the product type and purchase date

For items delivered last year, what is their current inventory quantity?
After just a few minutes, Sara has a good understanding of the data in the database. She’s ready for her first assignment.
Now, Sara’s team has been asked to integrate vendor management features into their system, so Sara turns to Cline and asks it to set up a new ‘vendors‘ table with columns for id, business name, address, city, state, email, and phone. She also needs to add a vendor ID column to the ‘inventory’ table and set up an appropriate index. Once again, Sara doesn’t need to write SQL or code for these tasks; she just instructs Cline, which figures out how to make these changes to the database and executes them via Toolbox.
Until now, a change like adding vendor information meant a cascade of manual updates: writing SQL for table creation (e.g., for ‘vendors‘ with all its columns), altering existing tables (like ‘inventory‘ to add a vendor_id and an index), updating model classes in her application code, and finally ensuring her InventoryDAO tests were still valid and covered the new structure.
Now, Sara can achieve all of this with a few simple natural language prompts:
NL prompts
code_block
Set up a new ‘vendors’ table with columns for id, business name, address, city, state, email, and phone.

Modify the ‘inventory’ table: add a vendor_id column and make sure it’s indexed.

Reflect these database changes in the application’s model classes.

And can you also update the tests for the InventoryDAO?
Because Cline has access to the database via Toolbox, it has full context of the revised schema and can make the code changes accordingly. Finally, Sara asks Cline to update the tests for the InventoryDAO class. The tests pass, Sara reviews the changes and checks them in.
A task that might have taken a day or more for a new developer to figure out and implement – even for a developer familiar with PostgreSQL syntax – has been finished in minutes. Sara has completed her first task for her new team and it’s not even lunchtime yet!
Getting started
These expanded capabilities within MCP Toolbox signify our ongoing commitment to providing you with powerful and intuitive tools that accelerate the database development lifecycle and unlock the potential of AI-assisted workflows.
Learn more about Toolbox, connect it to your favorite AI-assisted coding platform, and experience the future of AI-accelerated, database-connected software development today.
In today’s cloud environments, security teams need more than just surface-level visibility; they require actionable insight to ensure that their cloud workloads are safe. Unlike third-party cloud security tools that rely on data available via public APIs, Security Command Center (SCC) is built directly into Google Cloud. This gives us unmatched visibility into the safety of cloud workloads and the ability to orchestrate fixes when necessary.
We are using this unique vantage point to further enhance the ability of Security Command Center to protect customers’ Google Cloud environments. Here are four new capabilities designed to help security teams do just that:
Simplify vulnerability management: Introducing agentless scanning for Compute Engine and GKE
Exploiting software vulnerabilities is a frequently observed initial infection vector in cyber attacks. According to M-Trends 2025, exploited vulnerabilities accounted for 33% of initial infection vectors.
For security teams, proactively identifying and remediating these vulnerabilities is crucial, yet traditional agent-based software scanning can introduce significant overhead and deployment headaches.
Security Command Center now offers a powerful alternative: vulnerability scanning for Google Compute Engine and Google Kubernetes Engine (GKE), without the requirement to deploy and manage software on each asset. This new capability, available in preview, allows your team to discover software and OS vulnerabilities in virtual machine instances, GKE Kubernetes objects, and GKE clusters — at no additional charge.
Three key benefits of agentless vulnerability scanning include:
Reduce operational overhead: Eliminates agent deployment, configuration, updates, and potential performance impact, helping to simplify security workflows
Expand coverage: Scans virtual machines (VMs) even where agent installation is challenging or restricted, and when unauthorized VMs are provisioned by an adversary.
Maintain data residency: Respects Google Cloud environment boundaries you’ve established for scan results and data.
Security Command Center displays detailed vulnerability information.
Security Command Center also enriches the vulnerability report with data from Google Threat Intelligence, derived from defending billions of users and spending hundreds of thousands of hours investigating incidents. Insights include the impact and exploitability of each identified vulnerability, which are then aggregated. Overall findings are presented in a visual heat map to help security teams gain a better understanding of the threat landscape — and which vulnerabilities should be prioritized for remediation.
Security Command Center’s vulnerability heat map.
Find vulnerabilities in container images with Artifact Analysis integration
In today’s cloud-native world, container images are the building blocks of modern applications. Ensuring these images are free from known software vulnerabilities is a critical first line of defense. Security Command Center now supports vulnerability scanning for container images by integrating results from Google Cloud’s Artifact Analysis service.
For Security Command Center Enterprise customers, Artifact Registry scans are now included at no additional cost. This means customers can get alerted to vulnerabilities in their container images when they are deployed to a GKE cluster, Cloud Run, or App Engine as part of their SCC Enterprise subscription — enabling vulnerability management without additional costs.
At the heart of the service is automated integration. Images are stored in Artifact Registry and then scanned by Artifact Analysis to identify known vulnerabilities in both operating system and software packages.
Any image that has been scanned in Artifact Registry will be associated with the container image version deployed to a GKE cluster, Cloud Run job or service, or App Engine instance, and have its vulnerability data linked directly. This can help ensure that the findings you see in the Security Command Center risk dashboard are relevant to your active deployments.
Security Command Center shows known vulnerabilities in Cloud Run images.
The integration allows security teams to directly view potential vulnerabilities in their deployed container images alongside all other Google Cloud security findings, and discover broader risks that could result from exploitation using virtual red teaming. This consolidated view simplifies risk assessment, streamlines remediation, and also can help reduce alert fatigue and tool sprawl.
Security Command Center integration with Artifact Analysis is now generally available.
Secure your serverless applications: Threat detection for Cloud Run
Serverless computing platforms like Google Cloud Run allow organizations to build applications and websites without needing to manage the underlying infrastructure.
Security Command Center now integrates threat detection for Cloud Run services and jobs, available in preview. It employs 16 specialized detectors that continuously analyze Cloud Run deployments for potentially malicious activities. This scope of detection is not possible with third-party products, and includes:
Behavioral analysis, which can identify activities such as the execution of unexpected binaries, connections to known malicious URLs, and attempts to establish reverse shells.
Malicious code detection, which can detect known malicious binaries and libraries used at runtime.
NLP-powered analysis, which uses natural language processing techniques to analyze Bash and Python code-execution patterns for signs of malicious intent.
Control plane monitoring, which analyzes Google Cloud Audit Logs (specifically IAM System Event and Admin Activity logs) to identify potential security threats, such as known cryptomining commands executed in Cloud Run jobs, or the default Compute Engine service account being used to modify a Cloud Run service’s IAM policy, which could indicate a post-exploit privilege escalation attempt.
This layered detection strategy provides comprehensive visibility into potential threats targeting your Cloud Run applications, from code execution to control plane activities.
Uncover network anomalies with foundational log analysis
Because Security Command Center is built into the Google Cloud infrastructure, it has direct, first-party access to log sources that can be analyzed to find anomalous and malicious activity. For instance, Security Command Center can automatically detect connections to known bad IP addresses — public IPs flagged for suspicious or malicious behavior by Google Threat Intelligence — by analyzing this internal network traffic.
Now generally available, this built-in capability offers a distinct advantage. While third-party cloud security products require customers to undertake the costly and complex process of purchasing, ingesting, storing, and analyzing VPC Flow Logs (often at additional expense) to gain similar network insights, Security Command Center provides this critical analysis natively and without having to export logs.
Take the next step
To evaluate Security Command Center capabilities and explore subscription options, please contact a Google Cloud sales representative or authorized Google Cloud partner. You can also learn how to activate Security Command Center here.
Organizations are increasingly relying on diverse digital communication channels for essential business operations. The way employees interact with colleagues, access corporate resources, and, especially, receive information technology (IT) support is often conducted through calls, chat platforms, and other remote technologies. While these methods enhance both efficiency and global accessibility, they also introduce an expanded attack surface that can pose a significant risk if overlooked. The prevalence of in-person social interaction has diminished, and remote IT structures, such as outsourced service desks, have normalized employees’ engagement with external or less familiar personnel. As a result, threat actors continue to use social engineering tactics.
Vishing in the Wild: A Tale of Two Actors
Social engineering is the psychological manipulation of people into performing unsolicited actions or divulging confidential information. It is an effective strategy that preys on human emotions and built-in vulnerabilities like trust and the desire to be helpful. Financially motivated threat actors have increasingly adopted voice-based social engineering, or “vishing,” as a primary vector for initial access, though their specific methods and end goals can vary significantly.
Two prominent examples illustrate the versatility of this threat. The cluster tracked as UNC3944 (which overlaps with “Scattered Spider”) has historically used vishing as a flexible entry point for a range of criminal enterprises. Their operators frequently call corporate service desks, impersonating employees to have credentials and multi-factor authentication (MFA) methods reset. This access is then leveraged for broader attacks, including SIM swapping, ransomware deployment, and data theft extortion.
More recently, the financially motivated actor UNC6040 has demonstrated a different vishing playbook. Its operators also impersonate IT support, but with the specific goal of deceiving employees into navigating to Salesforce’s connected app page and authorizing a malicious, actor-controlled version of the Data Loader application. This single action grants the actor the ability to perform large-scale data exfiltration from the victim’s Salesforce environment, which is then used for subsequent extortion attempts. While both actors rely on vishing, their distinct objectives—UNC3944’s focus on account takeover for broad network access versus UNC6040’s targeted theft of CRM data—highlight the diverse risks organizations face from this tactic.
By reviewing the techniques, tactics, and procedures (TTPs) of actors like UNC3944 and UNC6040, organizations can better assess their own internal policies and guidelines when it comes to employee identification and protection of infrastructure and confidential data. Red teamers can also learn from their methodologies to better emulate real-world attacks and assist organizations in developing defense-in-depth strategies.
Mandiant has successfully used the following approaches to perform voice-based social engineering during Red Team Assessments for clients of varying sizes. The described techniques have enabled Mandiant to mimic TTPs from sophisticated vishing actors like UNC3944 and UNC6040, resulting in administrative-level user impersonation, corporate network perimeter breaches, and sensitive data access. Mandiant has additionally convinced multiple service desks to reset credentials and alter several forms of MFA. These simulated incidents have empowered organizations to proactively identify and resolve deficiencies that otherwise may have gone unnoticed and potentially exploited by a real threat actor.
Open-Source Intelligence Gathering (OSINT)
Effective social engineering campaigns are built upon extensive reconnaissance. The amount of information an attacker can source about corporate culture, employees, policies, procedures, and technologies in use directly impacts the maturity of a phishing scenario’s development. A thorough search to provide a comprehensive overview of an organization from an outside perspective would include, but is not limited to, discovery of the following items:
Network ranges and IP address space
Top-level domains and subdomains
Cloud service providers and email infrastructure
Internet-accessible and internally used web applications
Code repositories
Corporate phone numbers and email address formats
Employee positions and titles
Physical office locations
Publicly exposed internal documentation
Much of this information can often be found through publicly accessible resources. Company websites and marketing materials often list corporate contact information, including numbers for main lines, specific departments, or even individual employees. Social media platforms provide another means of profiling an organization. Professional networking services can be utilized to scrape the full names of employees and recreate corporate emails matching discovered naming conventions. Resumes shared on these platforms may also contain additional contact information including phone numbers and personal email addresses. Attackers may attempt to elicit private information by sending messages to employees from disposable email accounts, aiming to retrieve details through direct interaction or from out-of-office auto-replies. Additionally, public forums, where employees might seek troubleshooting assistance, can inadvertently reveal company-specific details.
Search engines, such as Google, DuckDuckGo, and Bing, provide advanced filtering capabilities to narrow results from targeted queries based on keywords, file types, and other parameters. Figure 1 includes an example of a search filter designed to uncover sensitive files for a given target that may be unknowingly exposed.
“TARGET” filetype:pdf | filetype:doc | filetype:docx | filetype:xls |
filetype:xlsx | filetype:ppt | filetype:pptx intext:"confidential" |
intext:"internal use only" | intext:"not for public release" |
intext:"restricted access"
Figure 1: Searching for documents with search filters
Anonymity networks, like The Onion Router (TOR), can be used to access hidden services, obtain restricted content, and identify supplemental data such as leaked employee IDs, usernames, passwords, and personally identifiable information (PII).
The internet offers a vast array of resources, and a good amount of intelligence can be discovered without any overt interaction with your target.
Leveraging Automated Phone Services
Some organizations make use of automated phone systems that have pre-recorded messages and interactive menus. These systems can provide callers with business-related information, facilitate employee self-service, or route calls to appropriate departments. If not found online, an attacker may attempt to obtain the phone number for an automated service by contacting an employee, often at a reception desk, claiming to have misplaced the number. Calling into these automated services allows an attacker to anonymously identify common issues faced by end users, names of internal applications, additional phone numbers for specific support teams, and, occasionally, alerts about company-wide technical issues. This type of information can be used to craft pretexts for subsequent activity that involves impersonating IT support.
Discovering Employee Identification Processes
Actors engaged in voice-based social engineering ultimately aim to interact with a human operator. While some automated systems provide a direct option to speak with a live agent, others can require some initial information to be provided, such as an employee ID. However, even in these cases, it is common for repeated incorrect entries to result in the transfer to a live agent anyway. Service desk agents handle a high volume of inbound calls ranging from internal employees needing a password reset to external customers experiencing problems with a public-facing application. They are generally given a scripted process for call handling including information they need to request from the caller for identification as well as where to escalate if they are unable to address the issue directly.
During the reconnaissance phase in social engineering a service desk, an attacker may feign ignorance or push boundaries of information disclosure before a requirement for identification is enforced. It is also important for an attacker to take note of how service desk personnel react to incorrect or insufficient information being provided. For example, an attacker may provide an employee ID with an incorrect associated name to observe the response, potentially eliciting the correct full name or determining the validity of the employee ID format. Attackers may also call at different times to converse with varying staff members, use different voice modulations to conceal repeated reconnaissance attempts, and iteratively learn more about the service desk’s identification process each time.
Alternatively, once a service desk number has been identified, an attacker can better target standard employees directly. Using publicly available resources, attackers can spoof the inbound number of a phone call to match that of the legitimate service desk. Without a procedure for verifying inbound callers claiming to be from IT, unsuspecting targets may be convinced by threat actors to perform actions that grant account access or divulge information that can be used to better impersonate staff.
Crafting a Convincing Narrative
With sufficient reconnaissance data, an attacker can formulate targeted campaigns reflecting plausible employee scenarios. A common pretext for contacting a service desk is a forgotten password. Many organizations verify employees using multiple factors. While initial reconnaissance might provide an attacker with answers for knowledge-based authentication methods, challenges arise if device-based verification is required. An attacker might impersonate an employee who claims their phone is unavailable (e.g., damaged or lost during travel) and who needs urgent account access. Another common practice is for actors to impersonate employees identified as being on personal time off (PTO) via out-of-office replies, leveraging a sense of urgency to persuade service desk personnel. Responses to such situations can vary, especially for executive-level users. In the event of a successful MFA reset, the attacker can then call back and try to get a different agent on the phone to further reset the impersonated user’s password for a full account compromise. If the legitimate employee is genuinely unavailable, unauthorized account access can persist for an extended period of time.
The Evolution of an Exploit
The compromise of a single account can serve as a foundation for more complex social engineering campaigns. Breaching the perimeter of an organization often grants an attacker access to internal workflows, chats, documents, meeting invites, and ways to better uncover verified intelligence on existing employees. Open-source tools such as ROADrecon can extract details from entire Entra ID tenants, potentially revealing phone numbers, employee IDs, and organizational hierarchy. Attackers may also seek access to IT ticketing systems and support channels to impersonate service desk staff to end-users who have open requests. The more information an attacker possesses, the more believable their pretext becomes, increasing the probability of success.
Strategic Recommendations and Best Practices
Modern features in mobile technology, such as AI-powered Scam Detection on Android, demonstrate how software may be able to offer personal protection, but a comprehensive defense for organizations against vishing and related social engineering threats requires broad, proactive security initiatives and a defense-in-depth strategy. Mandiant recommends organizations consider the following best practices to reinforce their external perimeter and develop secure communication channels, particularly those involving IT support and employee verification.
Positive Identity Verification for Service Desk Interactions
Train service desk personnel to rigorously perform positive identity verification for all employees before modifying accounts or providing security-sensitive information (including during initial enrollment). This is critical for any privileged accounts.
Mandated verification methods should include options such as:
On-camera/video conference verification where the employee presents a corporate badge or government-issued ID
Utilization of an internal, up-to-date employee photo database
Challenge/response questions based on information not easily discoverable externally (avoiding reliance on publicly available PII like date of birth or the last four digits of a Social Security number, as actors often possess this data)
For high-risk changes, such as MFA resets or password changes for privileged accounts, implement out-of-band verification (e.g., a call-back to a registered phone number or confirmation via a known corporate email address of the employee or their manager).
During periods of heightened threat or suspected compromise, consider temporarily disabling self-service password or MFA reset methods and routing all such requests through a manual service desk workflow with enhanced scrutiny.
Enforce Strong, Phishing-Resistant MFA
MFA should be enforced on all sensitive and internet-facing portals to prevent unauthorized access even in the event of a password compromise.
Standardize on one primary MFA solution for most employees to simplify the security architecture and centralize detections and alerts on a single platform.
Remove weak forms of MFA, such as SMS, voice calls, or simple email links, as primary authentication factors. These are susceptible to vishing, SIM swapping, and other attacks.
Prioritize phishing-resistant MFA methods:
FIDO2-compliant security keys (hardware tokens), especially for administrative and privileged users
Authenticator applications providing number matching or robust geo-verification features
Soft-tokens that are not reliant on easily intercepted channels
Ensure administrative users cannot register or use legacy/weak MFA methods, even if those are permitted for other user tiers.
Secure MFA Registration and Modification Processes
Do not permit employees to self-register new MFA devices without stringent controls. Implement an IT-managed or otherwise secure enrollment process.
Restrict MFA registration and modification actions to only be permissible from trusted IP locations and/or compliant corporate devices.
Alert on and investigate suspicious MFA registration activities, such as the same MFA method or phone number being registered across multiple user accounts.
Manager Involvement and Segregation of Duties
Service desks should notify managers (via verified contact channels sourced from internal directories) upon an employee’s password reset, especially for sensitive accounts.
Require manager approval, through a verified channel, for all MFA resets. This creates third-party awareness and an additional record.
For larger organizations, consider segregating service desk responsibilities. Customer-facing support desks should generally not have permissions to modify internal corporate employee accounts.
Employee Training and Vishing Awareness
Conduct regular phishing simulation exercises that include vishing scenarios to educate employees about the specific risks of voice-based social engineering.
Train employees to always verify unexpected calls or requests for sensitive information, especially those claiming to be from IT support or other internal departments, by using an official internal directory to initiate a call-back or by contacting their manager.
Train employees to recognize common vishing pretexts (e.g., urgent requests to avoid negative consequences, claims of system issues requiring immediate action, unexpected MFA prompts).
Equip service desk employees with access to logs of previous calls and tickets to help identify abnormal patterns, such as repeated calls from unrecognized numbers or sequential MFA reset and password reset requests for the same user.
Security Monitoring and Alerting for Vishing-Related Activity
Utilize security information and event management (SIEM) and security orchestration, automation, and response (SOAR) technologies to monitor employee sign-in activity and service desk interactions.
Create specific alerts for the following:
Password reset activity, particularly for privileged accounts or outside of expected patterns
New MFA device enrollment or modification of existing MFA methods
Multiple failed login attempts followed by a successful password or MFA reset
All activities flagged as abnormal should be reviewed by an internal security team and investigated with the impacted employee and their manager.
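To make the last correlation concrete, here is a minimal, hedged sketch of how such an alert could be expressed as a query over sign-in and credential-change logs exported to BigQuery. The dataset, table, and field names (identity_logs.sign_in_events, identity_logs.credential_changes) are illustrative assumptions; the same logic can be ported to whichever query language your SIEM uses.
```
# Illustrative only: surface accounts with three or more failed sign-ins in the
# hour preceding a password reset or new MFA enrollment. Adapt table and field
# names to your own log export schema.
bq query --use_legacy_sql=false '
SELECT
  r.user_email,
  r.change_type,
  r.change_time,
  COUNT(f.event_time) AS recent_failures
FROM `identity_logs.credential_changes` AS r
JOIN `identity_logs.sign_in_events` AS f
  ON f.user_email = r.user_email
  AND f.status = "FAILED"
  AND f.event_time BETWEEN TIMESTAMP_SUB(r.change_time, INTERVAL 1 HOUR) AND r.change_time
WHERE r.change_type IN ("PASSWORD_RESET", "MFA_DEVICE_ENROLLED")
GROUP BY r.user_email, r.change_type, r.change_time
HAVING recent_failures >= 3'
```
Results from a query like this can feed a SOAR playbook that opens a ticket for the impacted employee and their manager, consistent with the review process described above.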
Further guidance on hardening against UNC3944-style threats, including broader identity, endpoint, and network infrastructure recommendations, is detailed by the Google Threat Intelligence Group (GTIG).
Conclusion
This discussion of voice-based social engineering and its proposed resolutions aims to provide insight into attack methodologies and preventative measures relevant to this threat vector. Organizations seeking direct support on this subject or other services related to attack simulation and red team exercises are encouraged to contact Mandiant for assistance. Mandiant can discuss specific needs in detail and explore tailored recommendations to better equip security postures against advanced and persistent threats.
Google Threat Intelligence Group (GTIG) is tracking UNC6040, a financially motivated threat cluster that specializes in voice phishing (vishing) campaigns specifically designed to compromise organizations’ Salesforce instances for large-scale data theft and subsequent extortion. Over the past several months, UNC6040 has demonstrated repeated success in breaching networks by having its operators impersonate IT support personnel in convincing telephone-based social engineering engagements. This approach has proven particularly effective in tricking employees, often within English-speaking branches of multinational corporations, into actions that grant the attackers access or lead to the sharing of sensitive credentials, ultimately facilitating the theft of those organizations’ Salesforce data. In all observed cases, attackers relied on manipulating end users, not exploiting any vulnerability inherent to Salesforce.
A prevalent tactic in UNC6040’s operations involves deceiving victims into authorizing a malicious connected app to their organization’s Salesforce portal. This application is often a modified version of Salesforce’s Data Loader, not authorized by Salesforce. During a vishing call, the actor guides the victim to visit Salesforce’s connected app setup page to approve a version of the Data Loader app with a name or branding that differs from the legitimate version. This step inadvertently grants UNC6040 significant capabilities to access, query, and exfiltrate sensitive information directly from the compromised Salesforce customer environments. This methodology of abusing Data Loader functionalities via malicious connected apps is consistent with recent observations detailed by Salesforce in their guidance on protecting Salesforce environments from such threats.
In some instances, extortion activities haven’t been observed until several months after the initial UNC6040 intrusion activity, which could suggest that UNC6040 has partnered with a second threat actor that monetizes access to the stolen data. During these extortion attempts, the actor has claimed affiliation with the well-known hacking group ShinyHunters, likely as a method to increase pressure on their victims.
Figure 1: Data Loader attack flow
UNC6040
GTIG is currently tracking a significant portion of the investigated activity as UNC6040. UNC6040 is a financially motivated threat cluster that accesses victim networks through voice phishing social engineering. Upon obtaining access, UNC6040 has been observed immediately exfiltrating data from the victim’s Salesforce environment using Salesforce’s Data Loader application. Following this initial data theft, UNC6040 was observed moving laterally through the victim’s network, accessing and exfiltrating data from other platforms such as Okta, Workplace, and Microsoft 365.
Attacker Infrastructure
UNC6040 utilized infrastructure to access Salesforce applications that also hosted an Okta phishing panel. This panel was used to trick victims into visiting it from their mobile phones or work computers during the social engineering calls. In these interactions, UNC6040 also directly requested user credentials and multifactor authentication codes to authenticate and add the Salesforce Data Loader application, facilitating data exfiltration and subsequent lateral movement.
Alongside the phishing infrastructure, UNC6040 primarily used Mullvad VPN IP addresses to access victims’ Salesforce environments and other services on their networks and to perform data exfiltration.
Overlap with Groups Linked to “The Com”
GTIG has observed infrastructure across various intrusions that shares characteristics with elements previously linked to UNC6040 and threat groups suspected of ties to the broader, loosely organized collective known as “The Com“. We’ve also observed overlapping tactics, techniques, and procedures (TTPs), including social engineering via IT support, the targeting of Okta credentials, and an initial focus on English-speaking users at multinational companies. It’s plausible that these similarities stem from associated actors operating within the same communities, rather than indicating a direct operational relationship between the threat actors.
Data Loader
Data Loader is an application developed by Salesforce, designed for the efficient import, export, and update of large data volumes within the Salesforce platform. It offers both a user interface and a command-line component, the latter providing extensive customization and automation capabilities. The application supports OAuth and allows for direct “app” integration via the “connected apps” functionality in Salesforce. Threat actors abuse this by persuading a victim over the phone to open the Salesforce connected apps setup page and enter a “connection code,” thereby linking the actor-controlled Data Loader to the victim’s environment.
Figure 2: The victim needs to enter a code to connect the threat actor controlled Data Loader
Modifications
In some of the intrusions using Data Loader, threat actors utilized modified versions of Data Loader to exfiltrate Salesforce data from victim organizations. Their proficiency with the tool, as reflected in the queries they executed, appears to differ from one intrusion to another.
In one instance, a threat actor used small chunk sizes for data exfiltration from Salesforce but was only able to retrieve approximately 10% of the data before detection and access revocation. In another case, numerous test queries were made with small chunk sizes initially. Once sufficient information was gathered, the actor rapidly increased the exfiltration volume to extract entire tables.
There were also cases where the threat actors configured their Data Loader application with the name “My Ticket Portal”, aligning the tool’s appearance with the social engineering pretext used during the vishing calls.
Outlook & Implications
Voice phishing (vishing) as a social engineering method is not, in itself, a novel or innovative technique; it has been widely adopted by numerous financially motivated threat groups over recent years with varied results. However, this campaign by UNC6040 is particularly notable due to its focus on exfiltrating data specifically from Salesforce environments. Furthermore, this activity underscores a broader and concerning trend: threat actors are increasingly targeting IT support personnel as a primary vector for gaining initial access, exploiting their roles to compromise valuable enterprise data.
The success of campaigns like UNC6040’s, leveraging these refined vishing tactics, demonstrates that this approach remains an effective threat vector for financially motivated groups seeking to breach organizational defenses.
Given the extended time frame between initial compromise and extortion, it is possible that multiple victim organizations and potentially downstream victims could face extortion demands in the coming weeks or months.
Readiness, Mitigations, and Hardening
This campaign underscores the importance of a shared responsibility model for cloud security. While platforms like Salesforce provide robust, enterprise-grade security controls, it’s essential for customers to configure and manage access, permissions, and user training according to best practices.
To defend against social engineering threats, particularly those abusing tools like Data Loader for data exfiltration, organizations should implement a defense-in-depth strategy. GTIG recommends the following key mitigations and hardening steps:
Adhere to the Principle of Least Privilege, Especially for Data Access Tools: Grant users only the permissions essential for their roles—no more, no less. Specifically for tools like Data Loader, which often require the “API Enabled” permission for full functionality, limit its assignment strictly. This permission allows broad data export capabilities; therefore, its assignment must be carefully controlled. Per Salesforce’s guidance, review and configure Data Loader access to restrict the number of users who can perform mass data operations, and regularly audit profiles and permission sets to ensure appropriate access levels.
Manage Access to Connected Applications Rigorously: Control how external applications, including Data Loader, interact with your Salesforce environment. Diligently manage access to your connected apps, specifying which users, profiles, or permission sets can use them and from where. Critically, restrict powerful permissions such as “Customize Application” and “Manage Connected Apps”—which allow users to authorize or install new connected applications—only to essential and trusted administrative personnel. Consider developing a process to review and approve connected apps, potentially allowlisting known safe applications to prevent the unauthorized introduction of malicious ones, such as modified Data Loader instances.
Enforce IP-Based Access Restrictions: To counter unauthorized access attempts, including those from threat actors using commercial VPNs, implement IP address restrictions. Set login ranges and trusted IPs, thereby restricting access to your defined enterprise and VPN networks. Define permitted IP ranges for user profiles and, where applicable, for connected app policies to ensure that logins and app authorizations from unexpected or non-trusted IP addresses are denied or appropriately challenged.
Leverage Advanced Security Monitoring and Policy Enforcement with Salesforce Shield: For enhanced alerting, visibility, and automated response capabilities, utilize tools within Salesforce Shield. Transaction Security Policies allow you to monitor activities like large data downloads (a common sign of Data Loader abuse) and automatically trigger alerts or block these actions. Complement this with “Event Monitoring” to gain deep visibility into user behavior, data access patterns (e.g., who viewed what data and when), API usage, and other critical activities, helping to detect anomalies indicative of compromise. These logs can also be ingested into your internal security tools for broader analysis.
Enforce Multi-Factor Authentication (MFA) Universally: While the social engineering tactics described may involve tricking users into satisfying an MFA prompt (e.g., for authorizing a malicious connected app), MFA remains a foundational security control. Salesforce states that “MFA is an essential, effective tool to enhance protection against unauthorized account access” and requires it for direct logins. Ensure MFA is robustly implemented across your organization and that users are educated on MFA fatigue tactics and social engineering attempts designed to circumvent this critical protection.
By implementing these measures, organizations can significantly strengthen their security posture against the types of vishing and data exfiltration activity UNC6040 has conducted, as detailed in this report. Regularly review Salesforce’s security documentation, including the Salesforce Security Guide, for additional detailed guidance.
Read our vishing technical analysis for more details on the vishing threat, and strategic recommendations and best practices to stay ahead of it.
Many organizations in regulated industries and the public sector that want to start using generative AI face significant challenges in adopting cloud-based AI solutions due to stringent regulatory mandates, sovereignty requirements, the need for low-latency processing, and the sheer scale of their on-premises data. Together, these can all present institutional blockers to AI adoption, and force difficult choices between using advanced AI capabilities and adhering to operational and compliance frameworks.
GDC Sandbox can help organizations harness Google’s gen AI technologies while maintaining control over data, meeting rigorous regulatory obligations, and unlocking a new era of on-premises AI-driven innovation. With flexible deployment models, a robust security architecture, and transformative AI applications like Google Agentspace search, GDC Sandbox enables organizations to accelerate innovation, enhance security, and realize the full potential of AI.
Secure development in isolated environments
For sovereign entities and regulated industries, a secure Zero Trust architecture via platforms like GDC Sandbox is a prerequisite for leveraging advanced AI. GDC Sandbox lets organizations implement powerful use cases — from agentic automation and secure data analysis to compliant interactions — while upholding sovereign Zero Trust mandates for security and compliance.
“GDC Sandbox provides Elastic with a unique opportunity to enable air-gapped gen AI app development with Elasticsearch, as well as enable customers to rapidly deploy our Security Incident & Event Management (SIEM) capabilities.” – Ken Exner, Chief Product Officer, Elastic
“Accenture is excited to offer Google Distributed Cloud air-gapped to customers worldwide as a unique solution for highly secure workloads. By using GDC Sandbox, an emulator for air-gapped workloads, we can expedite technical reviews, enabling end-customers to see their workloads running in GDC without the need for lengthy proofs of concept on dedicated hardware.” – Praveen Gorur, Managing Director, Accenture
Air-gapped environments are challenging
Public sector agencies, financial institutions, and other organizations that handle sensitive, secret, and top-secret data are intentionally isolated (air-gapped) from the public internet to enhance security. This physical separation prevents cyberattacks and unauthorized data access from external networks, helping to create a secure environment for critical operations and highly confidential information. However, this isolation significantly hinders the development and testing of cutting-edge technologies. Traditional air-gapped development often requires complex hardware setups, lengthy procurement cycles, and limits access to the latest tools and frameworks. These limitations hinder the rapid iteration cycles essential to development.
Video Analysis Application Built on GDC Sandbox
According to Gartner® analyst Michael Brown in the recent report U.S. Federal Government Context: Magic Quadrant for Strategic Cloud Platform Services, where Google Cloud is evaluated as a Notable Vendor, “Federal CIOs will need to consider cost and feature availability in selecting a GCC [government community cloud] provider. Careful review of available services within the compliance scope is necessary. A common pitfall is the use of commercially available services in early solution development and subsequently finding that some of those services are not available in the target government community environment. This creates technical debt requiring refactoring, which results in delays and additional expense.”
GDC Sandbox: A virtualized air-gapped environment
GDC Sandbox addresses these challenges head-on. This virtual environment emulates the experience of GDC air-gapped, allowing you to build, test, and deploy gen AI applications using popular development tools and CI/CD pipelines. With it, you don’t need to procure hardware or set up air-gapped infrastructure to test applications with stringent security requirements before moving them to production. Customers can leverage Vertex AI APIs for key integrations with GDC Sandbox – AI Optimized, including:
GPUs: Dedicated user-space GPUs for gen AI development
Interacting with GDC Sandbox
One of the things that sets GDC Sandbox apart is its consistent user interface. Developers familiar with Google Cloud will find themselves in a comfortable and familiar environment, which helps streamline the development process and reduces the learning curve. This means you can jump right into building and testing your gen AI applications without missing a beat.
“GDC Sandbox has proven to be an invaluable tool to develop and test our solutions for highly regulated customers who are looking to bring their air-gapped infrastructures into the cloud age.” – David Olivier, Defense and Homeland Security Director, Sopra Steria Group
“GDC Sandbox provides a secure playground for public sector customers and other regulated industries to prototype and test how Google Cloud and AI can solve their unique challenges. By ensuring consistency with other forms of compute, we simplify development and deployment, making it easier for our customers to bring their ideas to life. We’re excited to see how our customers use the GDC Sandbox to push the boundaries of what’s possible.” – Will Grannis, VP & CTO, Google Cloud
The GDC Sandbox architecture and experience
GDC Sandbox offers developers a familiar and intuitive environment by mirroring the API, UI, and CLI experience of GDC air-gapped and GDC air-gapped appliance. It offers a comprehensive suite of services, including virtual machines, Kubernetes clusters, storage, observability, and identity management. This allows developers to build and deploy a wide range of gen AI applications, and leverage the power of Google’s AI and machine learning expertise within a secure, dedicated environment.
GDC Sandbox – Product Architecture
Use cases for GDC Sandbox
GDC Sandbox offers numerous benefits for organizations with air-gapped environments. Let’s explore some compelling use cases:
Gen AI development: Develop and test Vertex AI and gen AI applications on GPUs, validating them cost-effectively before they move to secure production environments.
Partner enablement: Empower partners to build applications, host GDC Marketplace offerings, train personnel, and prepare services for production.
Training and proof of concepts: Provide hands-on training for developers and engineers on GDC air-gapped technologies and best practices. Deliver ground-breaking new capabilities and showcase the art of the possible for customers and partners.
Building applications in GDC Sandbox
GDC Sandbox leverages containers and Kubernetes to host your applications. To get your application up and running, follow these steps:
Build and push: Build your application image locally using Docker, ensuring your Dockerfile includes all necessary dependencies. Tag the image with the Harbor instance URI and push it to the provided Harbor repository.
Deploy with Kubernetes: Create a Kubernetes deployment YAML file that defines your application’s specifications, including the Harbor image URI and the necessary credentials to access the image. Apply this file using the kubectl command-line tool to deploy your application to the Kubernetes cluster within the Sandbox.
Expose and access: Create a Kubernetes service to expose your application within the air-gapped environment. Retrieve the service’s external IP using kubectl get svc to access your application. A combined sketch of these steps appears after this list.
Migrate and port: Move your solutions from GDC Sandbox to GDC air-gapped and appliance deployments.
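As a minimal, hedged sketch of the build, deploy, and expose steps above, the commands below use a hypothetical application name, Harbor registry URI, and pull secret; substitute the values provided for your GDC Sandbox instance.
```
# Build and push: build locally, tag with the Harbor instance URI, and push.
# harbor.sandbox.example.com and my-project are placeholders.
docker build -t my-app:v1 .
docker tag my-app:v1 harbor.sandbox.example.com/my-project/my-app:v1
docker push harbor.sandbox.example.com/my-project/my-app:v1

# Deploy with Kubernetes, then expose the application with a service.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      imagePullSecrets:
      - name: harbor-pull-secret   # secret holding your Harbor credentials
      containers:
      - name: my-app
        image: harbor.sandbox.example.com/my-project/my-app:v1
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF

# Retrieve the external IP once the service is provisioned.
kubectl get svc my-app
```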
U.S. Federal Government Context: Magic Quadrant for Strategic Cloud Platform Services, By Michael Brown, 3 February 2025
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
As the first fully cloud-native private bank in Switzerland, Alpian stands at the forefront of digital innovation in the financial services sector. With its unique model blending personal wealth management and digital convenience, Alpian offers clients a seamless, high-value banking experience.
Through its digital-first approach built on the cloud, Alpian has achieved unprecedented agility, scalability, and compliance capabilities, setting a new standard for private banking in the 21st century. In particular, its use of generative AI gives us a glimpse of the future of banking.
The Challenge: Innovating in a Tightly Regulated Environment
The financial industry is one of the most regulated sectors in the world, and Switzerland’s banking system is no exception. Alpian faced a dual challenge: balancing the need for innovation to provide cutting-edge services while adhering to stringent compliance standards set by the Swiss Financial Market Supervisory Authority (FINMA).
Especially when it came to deploying a new technology like generative AI, the teams at Alpian and Google Cloud knew there was virtually no room for error.
Tools like Gemini have streamlined traditionally complex processes, allowing developers to interact with infrastructure through simple conversational commands. For instance, instead of navigating through multiple repositories and manual configurations, developers can now deploy a new service by simply typing their request into a chat interface.
This approach not only accelerates deployment times, reducing them from days to mere hours, but also empowers teams to focus on innovative rather than repetitive tasks.
There are limits, to be sure, both to ensure security and compliance and to maintain focus on the part of teams.
Thanks to this platform with generative AI, we haven’t opened the full stack to our engineers, but we have created a defined scope where they can interact with different elements of our IT using a simplified conversational interface. It’s within these boundaries that they have the ability to be autonomous and put AI to work.
Faster deployment times translate directly into better client experiences, offering quicker access to new features like tailored wealth management tools and enhanced security. This integration of generative AI has not only optimized internal workflows but also set a new benchmark for operational excellence in the banking sector.
A Collaborative Journey to Success
Alpian worked closely with its team at Google Cloud to find just the right solutions to meet its evolving needs. Through strong trust, dedicated support, and expertise, they were able to optimize infrastructure, implement scalable solutions, and leverage AI-powered tools like Vertex AI and BigQuery.
“Google Cloud’s commitment to security, compliance, and innovation gave us the confidence to break new ground in private banking,” Damien Chambon, head of cloud at Alpian, said.
Key Results
Alpian’s cloud and AI work has already had a meaningful impact on the business:
Enhanced developer productivity with platform engineering, enabling more independence and creativity within teams.
Automated compliance workflows, aligning seamlessly with FINMA’s rigorous standards.
Simplified deployment processes, reducing infrastructure complexity with tools like Gemini.
These achievements have enabled Alpian to break down traditional operational silos, empowering cross-functional teams to work in harmony while delivering customer-focused solutions.
Shaping the Future of Private Banking
Alpian’s journey is just beginning. With plans to expand its AI capabilities further, the bank is exploring how tools like machine learning and data analytics can enhance client personalization and operational efficiency. By leveraging insights from customer interactions and integrating them with AI-driven workflows, Alpian aims to refine its offerings continually and remain a leader in the competitive digital banking space.
By aligning technological advancements with regulatory requirements, Alpian is creating a model for the future of banking — one where agility, security, and customer-centricity can come together seamlessly and confidently.
As an AI/ML developer, you have a lot of decisions to make when it comes to choosing your infrastructure — even if you’re running on top of a fully managed Google Kubernetes Engine (GKE) environment. While GKE acts as the central orchestrator for your AI/ML workloads — managing compute resources, scaling your workloads, and simplifying complex workflows — you still need to choose an ML framework, your preferred compute (TPUs or GPUs), a scheduler (Ray, Kueue, Slurm), and how you want to scale your workloads. By the time you have to configure storage, you’re facing decision fatigue!
You could simply choose Google’s Cloud Storage for its size, scale, and cost efficiency. However, Cloud Storage may not be a good fit for every use case. For instance, you might benefit from a storage accelerator in front of Cloud Storage, such as Hyperdisk ML, for faster loading of model weights. But to benefit from that acceleration, you would need to develop custom workflows to orchestrate data transfers across storage systems.
Introducing GKE Volume Populator
GKE Volume Populator is targeted at organizations that want to store their data in one data source and let GKE orchestrate the data transfers. To achieve this, GKE leverages the Kubernetes Volume Populator feature through the same PersistentVolumeClaim API that customers use today.
GKE Volume Populator, along with the relevant CSI drivers, dynamically provisions a new destination storage volume and transfers data from your Cloud Storage bucket to that volume. Your workload pods then wait to be scheduled until the data transfer is complete.
Using GKE Volume Populator provides a number of benefits:
Low management overhead: As part of a managed solution that’s enabled by default, GKE Volume Populator handles the data transfer, so you don’t need to build a bespoke solution for data hydration; you can leave it to GKE.
Optimized resource utilization: Your workload pods remain unscheduled until the data transfer completes, so you can use your GPUs and TPUs for other tasks while data is being transferred.
Easy progress tracking: Monitor the data transfer progress by checking the event message on your PVC object.
Customers like Abridge AI report that GKE Volume Populator is helping them streamline their AI development processes.
“Abridge AI is revolutionizing clinical documentation by leveraging generative AI to summarize patient-clinician conversations in real time. By adopting Google Cloud Hyperdisk ML, we’ve accelerated model loading speeds by up to 76% and reduced pod initialization times. Additionally, the new GKE Volume Populator feature has significantly streamlined access to large models and LoRA adapters stored in Cloud Storage buckets. These performance improvements enable us to process and generate clinical notes with unprecedented efficiency — especially during periods of high clinician demand.” – Taruj Goyal, Software Engineer, Abridge
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud containers and Kubernetes’), (‘body’, <wagtail.rich_text.RichText object at 0x3e0ff9d562b0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectpath=/marketplace/product/google/container.googleapis.com’), (‘image’, None)])]>
Accelerate your data via Hyperdisk ML
Let’s say you have an AI/ML inference workload whose data is stored in a Cloud Storage bucket, and you want to move that data to a Hyperdisk ML instance to accelerate the loading of model weights, scale up to 2,500 concurrent nodes, and reduce pod over-provisioning. Here’s how to do this with GKE Volume Populator:
1. Prepare your GKE Cluster: Create a GKE cluster with the corresponding CSI driver, and enable Workload Identity Federation for GKE.
2. Set up necessary permissions: Configure permissions so that GKE Volume Populator has read access to your Cloud Storage bucket.
3. Define your data source: Create a GCPDataSource resource (a sketch appears after this list). This specifies:
The URL of the Cloud Storage bucket that contains your data
The Kubernetes Service Account you created with read access to the bucket
4. Create your PersistentVolumeClaim: Create a PVC that refers to the GCPDataSource you created in step 3 and the corresponding StorageClass for the destination storage.
5. Deploy your AI/ML workload: Create your inference workload and configure it to use the PVC you created in step 4.
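For steps 3 and 4, a minimal, hedged sketch of the two resources might look like the following. The GCPDataSource apiVersion and field names shown here are assumptions drawn from the workflow described above, and the bucket, service account, and StorageClass names are placeholders; consult the GKE Volume Populator documentation for the exact schema before applying.
```
cat <<'EOF' | kubectl apply -f -
# Step 3 (sketch): point GKE Volume Populator at the source bucket.
# apiVersion and spec fields are assumptions; verify against the GKE docs.
apiVersion: datalayer.gke.io/v1
kind: GCPDataSource
metadata:
  name: model-weights-source
spec:
  cloudStorage:
    serviceAccountName: gcs-reader-ksa   # Kubernetes SA with read access to the bucket
    uri: gs://my-model-bucket/weights/
---
# Step 4 (sketch): a PVC that references the GCPDataSource and the destination
# Hyperdisk ML StorageClass (name assumed).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-weights
spec:
  accessModes: ["ReadOnlyMany"]
  storageClassName: hyperdisk-ml
  resources:
    requests:
      storage: 300Gi
  dataSourceRef:
    apiGroup: datalayer.gke.io
    kind: GCPDataSource
    name: model-weights-source
EOF

# Track transfer progress via the events on the PVC.
kubectl describe pvc model-weights
```
Once the transfer completes, the workload that mounts the model-weights PVC (step 5) is scheduled automatically.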
GKE Volume Populator is generally available, and support for Hyperdisk ML is in preview. To enable it in your console, reach out to your account team.
“There’s no red teaming on the factory floor,” isn’t an OSHA safety warning, but it should be — and for good reason. Adversarial testing in most, if not all, manufacturing production environments is prohibited because the safety and productivity risks outweigh the value.
If resources were not a constraint, the security team would go build another factory with identical equipment and systems and use it to conduct proactive security testing. Almost always, the costs outweigh the benefits, and most businesses simply cannot support the expense.
This is where digital twins can help. Digital twins are essentially IT stunt doubles, cloud-based replicas of physical systems that use real-time data to create a safe environment for security and resilience testing. The digital twin environment can be used to test for essential subsystem interactions and repercussions as the systems transition from secure states to insecure states.
Security teams can operationalize digital twins and resilience analysis using the following approach:
Gain a deep understanding of the correlations between the leading indicators of cyber resilience and the role of digital twins in achieving resilience. The table below offers this mapping.
Get buy-in from business leaders, including the CMO, CIO, and CTO. Security teams should be able to demonstrate the strategic value to the organization by using digital twins for adversarial security testing without disrupting production.
Identify the right mix of engineers and security experts, as well as appropriate technologies to execute the strategy. Google Cloud’s security and infrastructure stack is positioned to help security teams achieve operational digital twins for security (see table below).
Cyber resilience leading indicators and the role of digital twins:
Hard-restart recovery time: Simulate various system failure scenarios on the digital twins and discover the subsequent rebuild processes. Identify areas of improvement, optimal recovery procedures, and bottlenecks.
Cyber-physical modularity: Use digital twins to quantify the impact of single points of failure on the overall production process. Use the digital twin environment to measure metrics such as the mean operational capability of a service in a degraded state, and to track the number of modules impacted by each single point of failure.
Internet denial and communications resilience: Simulate the loss of internet connectivity to the digital twins and measure the proportion of critical services that continue operating successfully. Assess the effectiveness of backup communication systems and the speed of response. This process can also be applied to the twins of non-internet-facing systems.
Manual operations: Disrupt the automation controls on the digital twins and measure the degree to which simulated manual control can sustain a minimum viable operational delivery objective. Incorporate environmental and operational constraints, such as the time taken for personnel to assume manual control.
Control pressure index (CPI): Model the enablement of security controls and their dependencies on the digital twins to calculate CPI. Then simulate failures of individual controls, or a combination of controls, to assess the impact. Discover defense-in-depth improvement opportunities.
Software reproducibility: Not applicable.
Preventative maintenance levels: Explore and test simulated failures to optimize and measure the effectiveness of preventative maintenance. Simulate the impact of maintenance activities, measure downtime reduction, and evaluate return on investment (ROI).
Inventory completeness: Inventory completeness will become apparent during the digital twin construction process.
Stress-testing vibrancy: Conduct red teaming, apply chaos engineering principles, and stress test the digital twin environment to assess the overall impact.
Common mode failures: In the twin environment, discover and map critical dependencies and identify potential common mode failures that could impact the production process. In a measurable manner, identify and test methods of reducing the risk of cascading failures during disruption events.
What digital twins architecture can look like with Google Cloud
To build an effective digital twin, the physics of the electrical and mechanical systems must be represented with sufficient accuracy.
The data needed for the construction of the twin should either come from the physical sensors or be computed using mathematical representations of the physical process. The twin should be modeled across three facets:
Subsystems: Modeling the subsystems of the system and the pertinent interactions between them (such as a robotic arm, its controller, and software interactions).
Networks: Modeling the network of systems and pertinent interactions (such as plant-wide data flow and machine-to-machine communication).
Influencers: Modeling the environmental and operational parameters, such as temperature variations, user interactions, and physical anomalies causing system and network interruptions.
Developing digital twins in diverse OT environments requires secure data transmission, compatible data storage and processing, and digital engines using AI, physics modeling, applications, and visualization. This is where comprehensive end-to-end monitoring, detection, logging, and response processes using tools such as Google Security Operations and partner solutions come in.
The following outlines one potential architecture for building and deploying digital twins with Google Cloud:
Compute Engine to replicate physical systems on a digital plane
Cloud Storage to store data, simulate backup and recovery
Cloud Monitoring to emulate on-prem monitoring and evaluate recovery process
BigQuery to store, query, and analyze the data streams received from Manufacturing Data Engine (MDE) and to perform post-mortem analysis of adversarial testing
Spanner Graph and partner solutions such as neo4j to build and enumerate the industrial process based on graph-based relationship modeling
Machine learning services (including Vertex AI, Gemini in Security, partner models through Vertex AI Model Garden) to rapidly generate relevant failure scenarios and discover opportunities of secure customized production optimization. Similarly, use Vision AI tools to enhance the digital twin environment, bringing it closer to the real-world physical environment.
Cloud Run functions for a serverless compute platform that can run failure-event-driven code and trigger actions based on digital twin insights
Looker to visualize and create interactive dashboards and reports based on digital twin and event data
Apigee to securely expose and manage APIs for the digital twin environment. This allows for controlled access to real-time data from on-prem OT applications and systems. For example, Apigee can manage APIs for accessing building OT sensor data, controlling HVAC systems, and integrating with third-party applications for energy management.
Google Distributed Cloud to run digital twins in an air-gapped, on-premises, containerized environment
An architectural reference for building and deploying digital twins with Google Cloud.
Security and engineering teams can use the above Google Cloud services illustration as a foundation and customize it to their specific requirements. While building and using digital twins, both security of the twins and security by the twins are critical. To ensure that the lifecycle of the digital twins is secure, cybersecurity hardening, logging, monitoring, detection, and response should be at the core of the design, build, and execution processes.
This structured approach enables modelers to identify essential tools and services, define in-scope systems and their data capabilities, map communication and network routes, and determine applications needed for business and engineering functions.
Getting started with digital twins
Digital twins are a powerful tool for security teams. They help us better understand and measure cyber-physical resilience through safe application of cyber-physical resilience leading indicators. They also allow for the adversarial testing and analysis of subsystem interactions and the effects of systems moving between secure and insecure conditions without compromising safety or output.
Security teams can begin right away to use Google Cloud to build and scale digital twins for security:
1. Identify the purpose and function that security teams would like to simulate, monitor, optimize, design, and maintain for resilience.
2. Select and identify the right physical or industrial object, system, or process to be replicated as the digital twin.
3. Identify pertinent data flows, interfaces, and dependencies for data collection and integration.
4. Be sure to understand the available IT and OT, cloud, and on-premises telemetry across the physical or industrial object, system, or process.
5. Create the virtual model that accurately represents its physical counterpart in all necessary aspects.
6. Connect the replica to its physical counterpart to facilitate real-time data flow to the digital twin. Use a secure on-premises connector such as MDE to make a secure connection between the physical environment and the digital environment running on a Google Cloud VPC.
7. To operationalize the digital twin, build the graph-based entity relationship model using Spanner Graph and partner solutions like neo4j. This uses the live data stream from the physical system and represents it on the digital twin.
8. Use a combination of Cloud Storage and BigQuery to store discrete and continuous IT and OT data, such as system measurements, states, and file dumps from the source and digital twin (see the sketch after this list).
9. Discover common mode failures based on the mapped processes that include internal and external dependencies.
10. Use at least one leading indicator with Google Threat Intelligence to perform threat modeling and evaluate the impact on the digital twin model.
11. Run Google’s AI models on the digital twins to further advance the complexity of cyber-resilience studies.
12. Look for security and observability gaps. Improve model fidelity. Recreate and update the digital twin environment. Repeat step 10 with a new leading indicator, new threat intelligence, or an updated threat model.
13. Based on the security discoveries from the resilience studies on the digital twin, design and implement security controls and risk mitigations in the physical counterpart.
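As a concrete illustration of step 8, the following minimal, hedged sketch loads exported twin telemetry from Cloud Storage into BigQuery and computes a simulated hard-restart recovery time, one of the leading indicators mapped above, for each failure scenario. The bucket, dataset, table, and field names are illustrative assumptions and would need to be adapted to your own twin's data model.
```
# Step 8 sketch: load exported twin telemetry into BigQuery.
# Bucket, dataset, table, and field names are hypothetical.
bq mk --dataset digital_twin
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
  digital_twin.events gs://twin-telemetry/events/*.json

# Measure simulated hard-restart recovery time per failure scenario, assuming
# each scenario_id covers a single injected failure followed by recovery.
bq query --use_legacy_sql=false '
SELECT
  scenario_id,
  TIMESTAMP_DIFF(
    MIN(IF(state = "OPERATIONAL", event_time, NULL)),
    MIN(IF(state = "FAILED", event_time, NULL)),
    SECOND) AS recovery_seconds
FROM `digital_twin.events`
GROUP BY scenario_id'
```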
To learn more about how to build a digital twin, you can read this ebook chapter and contact Google Cloud’s Office of the CISO.
In today’s digital world, we spend countless hours in our browsers. It’s where we work, collaborate, and access information. But have you ever stopped to consider whether you’re fully leveraging the browser security features available to protect your organization? We explore this in our new paper “The Security Blindspot: Real Attack Insights From Real Browser Attacks,” and the answer might surprise you.
Written in partnership with Mandiant Incident Response experts, the new paper highlights how traditional security measures often overlook available security features within the browser, leaving organizations vulnerable to sophisticated attacks that could be prevented with additional browser security policies. Phishing, data breaches, insider threats, and malicious browser extensions are just some of the risks. Attackers are increasingly using legitimate browser features to trick users and carry out their malicious activities, making them harder to detect.
The paper delves into real-world case studies where increased browser security could have prevented significant security breaches and financial losses. These examples underscore the urgent need for organizations to adopt proactive and comprehensive security strategies within the browser.
Key takeaways from the report include:
Browsers are a major entry point for attacks: Attackers exploit users working on the web to launch advanced attacks.
Traditional security often overlooks the browser: Focusing solely on network and endpoint security leaves a significant gap.
Real-world attacks demonstrate the risks: Case studies reveal the consequences of neglecting security at the browser layer.
Advanced threat and data protection within the browser is essential: Solutions like Chrome Enterprise Premium can help mitigate these risks.
Browser insights for your security teams: Leverage telemetry and advanced browser data to provide a detailed view of your environment, identify risks and enable proactive measures to protect data.
Organizations that don’t take advantage of security within the browser are open to an array of threats, including phishing, data breaches, insider attacks, and malicious browser extensions, making robust browser protection essential. Don’t let your unprotected browser be your biggest security blind spot. To learn more about how to protect your organization from browser-based attacks, read the full whitepaper.
You can never be sure when you’ll be the target of a distributed denial-of-service (DDoS) attack. For investigative journalist Brian Krebs, that day came on May 12, when his site KrebsOnSecurity experienced one of the largest DDoS attacks seen to date.
At 6.3 terabits per second (Tbps), or roughly 63,000 times the speed of broadband internet in the U.S., the attack was 10 times the size of the DDoS attack Krebs faced in 2016 from the Mirai botnet. That 2016 incident took down KrebsOnSecurity.com for four days, and was so severe that his then-DDoS protection service asked him to find another provider, Krebs said in his report on the May attack.
Following the 2016 incident, Krebs signed up for Project Shield, a free Google service that offers at-risk, eligible organizations protection against DDoS attacks. Since then, his site has stayed reliably online in the face of attacks — including the latest incident.
The brunt of the May 12 attack lasted less than a minute and peaked above 6.3 Tbps, one of the largest DDoS attacks observed to date.
Organizations in eligible categories, including news publishers, government elections, and human rights defenders, can use the power of Google Cloud’s networking services in conjunction with Jigsaw to help keep their websites available and online.
Project Shield acts as a reverse proxy service — customers change their DNS settings to send traffic to an IP address provided by Project Shield, and configure Project Shield with information about their hosting server. The customer retains control over both their DNS settings and their hosting server, making it easy to enable or disable Project Shield at any time with a simple DNS switch.
Built on the strength of Google Cloud networking services, including Cloud Load Balancing, Cloud CDN, and Cloud Armor, Project Shield’s services can be configured through the Project Shield dashboard as a managed experience. These services work together to mitigate attacks and serve cached content from multiple points on Google’s edge network. It’s a combination that has protected KrebsOnSecurity before, and has successfully defended many websites against some of the world’s largest DDoS attacks.
In the May incident against Krebs, the attack was filtered instantly by Google Cloud’s network. Requests for websites protected by Project Shield pass through Google Cloud Load Balancing, which automatically blocks layer 3 and layer 4 volumetric DDoS attacks.
In the May incident, the attacker sent large data packets to random ports at a rate of approximately 585 million packets per second, which is over 1,000 times the usual rate for KrebsOnSecurity.
The attack came from infected devices all around the world.
Cloud Armor, which embeds protection into every load balancer deployment, blocked the attack at the load balancing level because Project Shield sits behind the Google Cloud Load Balancer, which proxies only HTTP/HTTPS traffic. Had the attack occurred with well-formed requests (such as at Layer 7, also known as the application layer), additional defenses from the Google Cloud global front end would have been ready to defend the site.
Cloud CDN, for example, makes it possible to serve content for sites like KrebsOnSecurity from cache, lessening the load on a site’s servers. Cloud Armor would have actively filtered incoming requests for any remaining traffic that may have bypassed the cache to allow only legitimate traffic through.
Additionally, Cloud Armor’s Adaptive Protection uses real-time machine learning, which helps identify attack signatures and dynamically tailor rate limits. These rate limits are actively and continuously refined, allowing Project Shield to harness Google Cloud’s capabilities to mitigate almost all DDoS attacks in seconds.
Project Shield defenses are automated, with no customer defense configuration needed. They’re optimized to capitalize on the powerful blend of defensive tools in Google Cloud’s networking arsenal, which are available to any Google Cloud customer.
As KrebsOnSecurity and others have experienced, DDoS attacks have been getting larger, more sophisticated, and more frequent in recent years. Let the power and scale of Google Cloud help protect your site against attacks when you least expect them. Eligible organizations can apply for Project Shield today, and all organizations can set up their own Cloud Networking configuration like Project Shield by following this guide.
Developers love Cloud Run, Google Cloud’s serverless runtime, for its simplicity, flexibility, and scalability. And today, we’re thrilled to announce that NVIDIA GPU support for Cloud Run is now generally available, offering a powerful runtime for a variety of use cases that’s also remarkably cost-efficient.
Now, you can enjoy the following benefits across both GPUs and CPUs:
Pay-per-second billing: You are only charged for the GPU resources you consume, down to the second.
Scale to zero: Cloud Run automatically scales your GPU instances down to zero when no requests are received, eliminating idle costs. This is a game-changer for sporadic or unpredictable workloads.
Rapid startup and scaling: Go from zero to an instance with a GPU and drivers installed in under 5 seconds, allowing your applications to respond to demand very quickly. For example, when scaling from zero (cold start), we achieved an impressive time-to-first-token of approximately 19 seconds for a gemma3:4b model (this includes startup time, model loading time, and running the inference).
Full streaming support: Build truly interactive applications with out-of-the-box support for HTTP and WebSocket streaming, allowing you to provide LLM responses to your users as they are generated.
Support for GPUs in Cloud Run is a significant milestone, underscoring our leadership in making GPU-accelerated applications simpler, faster, and more cost-effective than ever before.
“Serverless GPU acceleration represents a major advancement in making cutting-edge AI computing more accessible. With seamless access to NVIDIA L4 GPUs, developers can now bring AI applications to production faster and more cost-effectively than ever before.” – Dave Salvator, director of accelerated computing products, NVIDIA
AI inference for everyone
One of the most exciting aspects of this GA release is that NVIDIA L4 GPUs on Cloud Run are now available to everyone, with no quota request required. This removes a significant barrier to entry, allowing you to immediately tap into GPU acceleration for your Cloud Run services. Simply use --gpu 1 from the Cloud Run command line, or check the “GPU” checkbox in the console; there’s no need to request quota.
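For example, a deployment might look like the following sketch. The service name, image, and region are placeholders, and flags other than --gpu (such as --gpu-type and the CPU/memory minimums shown) are assumptions that may vary by gcloud release; check gcloud run deploy --help for the exact options.
```
# Hedged sketch: deploy a Cloud Run service with one NVIDIA L4 GPU attached.
gcloud run deploy my-inference-service \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=4 \
  --memory=16Gi \
  --no-cpu-throttling
```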
Production-ready
With general availability, Cloud Run with GPU support is now covered by Cloud Run’s Service Level Agreement (SLA), providing you with assurances for reliability and uptime. By default, Cloud Run offers zonal redundancy, helping to ensure enough capacity for your service to be resilient to a zonal outage; this also applies to Cloud Run with GPUs. Alternatively, you can turn off zonal redundancy and benefit from a lower price for best-effort failover of your GPU workloads in case of a zonal outage.
Multi-regional GPUs
To support global applications, Cloud Run GPUs are available in five Google Cloud regions: us-central1 (Iowa, USA), europe-west1 (Belgium), europe-west4 (Netherlands), asia-southeast1 (Singapore), and asia-south1 (Mumbai, India), with more to come.
Cloud Run also simplifies deploying your services across multiple regions. For instance, you can deploy a service across the US, Europe and Asia with a single command, providing global users with lower latency and higher availability. Here’s how you might deploy Ollama, one of the easiest ways to run open models, on Cloud Run across three regions.
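The exact single multi-region command is not reproduced here; as a hedged approximation, the loop below deploys the same Ollama service to three regions one at a time. The image path is a placeholder for a copy of the Ollama container pushed to Artifact Registry, port 11434 is Ollama's default listening port, and the GPU flags follow the assumptions noted earlier.
```
# Sketch: deploy Ollama with an L4 GPU to three regions, one region at a time.
for region in us-central1 europe-west1 asia-southeast1; do
  gcloud run deploy ollama \
    --image=us-docker.pkg.dev/my-project/images/ollama:latest \
    --region="$region" \
    --port=11434 \
    --gpu=1 \
    --gpu-type=nvidia-l4 \
    --no-cpu-throttling
done
```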
See it in action: 0 to 100 NVIDIA GPUs in four minutes
You can witness the incredible scalability of Cloud Run with GPUs for yourself with this live demo from Google Cloud Next 25, showcasing how we scaled from 0 to 100 GPUs in just four minutes.
Load testing a Stable Diffusion service running on Cloud Run GPUs to 100 GPU instances in four minutes.
Unlock new use cases with NVIDIA GPUs on Cloud Run jobs
The power of Cloud Run with GPUs isn’t just for real-time inference using request-driven Cloud Run services. We’re also excited to announce the availability of GPUs on Cloud Run jobs, unlocking new use cases, particularly for batch processing and asynchronous tasks:
Model fine-tuning: Easily fine-tune a pre-trained model on specific datasets without having to manage the underlying infrastructure. Spin up a GPU-powered job, process your data, and scale down to zero when it’s complete.
Batch AI inferencing: Run large-scale batch inference tasks efficiently. Whether you’re analyzing images, processing natural language, or generating recommendations, Cloud Run jobs with GPUs can handle the load.
Batch media processing: Transcode videos, generate thumbnails, or perform complex image manipulations at scale.
What Cloud Run customers are saying
Don’t just take our word for it. Here’s what some early adopters of Cloud Run GPUs are saying:
“Cloud Run helps vivo quickly iterate AI applications and greatly reduces our operation and maintenance costs. The automatically scalable GPU service also greatly improves the efficiency of our AI going overseas.” – Guangchao Li, AI Architect, vivo
“L4 GPUs offer really strong performance at a reasonable cost profile. Combined with the fast auto scaling, we were really able to optimize our costs and saw an 85% reduction in cost. We’ve been very excited about the availability of GPUs on Cloud Run.” – John Gill, Sr. Software Engineer, Wayfair, speaking at Next ’25
“At Midjourney, we have found Cloud Run GPUs to be incredibly valuable for our image processing tasks. Cloud Run has a simple developer experience that lets us focus more on innovation and less on infrastructure management. Cloud Run GPU’s scalability also lets us easily analyze and process millions of images.” – Sam Schickler, Data Team Lead, Midjourney
Welcome to the second Cloud CISO Perspectives for May 2025. Today, Enrique Alvarez, public sector advisor, Office of the CISO, explores how government agencies can use AI to improve threat detection — and save money at the same time.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
Do more with less: How governments can use AI to save money and improve threat detection
By Enrique Alvarez, public sector advisor, Office of the CISO
Government agencies have long been a pressure chamber for some of cybersecurity’s most confounding problems, particularly constrained budgets and alert fatigue. While there may not be a single, sharp kopis that can slice through this Gordian knot, AI offers a potential solution that we’d be foolish to ignore.
By many measures, the situation government agencies face is dire. Headcounts and budgets are shrinking, cyber threats are increasing, and security alerts routinely threaten to overwhelm security operations center (SOC) team members, increasing toil and reducing effectiveness. The fiscal austerity facing government agencies is further exacerbated by not being able to fill open cybersecurity positions — nor replace departing experienced workers.
Fortunately, advances in AI models and tools provide a way forward.
Discussions around what AI is and what it can do are often sensationalized. For government agencies, a clear understanding of the different AI types is crucial. At its core, AI refers to the ability of machines to simulate human-like cognitive functions such as learning, problem-solving, and decision-making. This broad definition encompasses everything from rule-based systems to complex neural networks.
Scoping the threat: Unique risk profile for government agencies
Cybersecurity threats present a significant challenge for government agencies, one exacerbated by decades of patchwork defensive measures.
The lack of a clear strategy and standardization across agencies has led to a fragmented security posture and a limited common operational picture, hindering effective threat detection and coordinated response. This decentralized approach creates vulnerabilities and makes it difficult to share timely and actionable threat intelligence.
Many public sector entities operate smaller SOCs with limited teams. This resource constraint makes it challenging to effectively monitor complex networks, analyze the ever-increasing volume of alerts, and proactively hunt for threats. Alert fatigue and burnout are significant concerns in these environments.
Heightened risk from vendor lock-in
A crucial additional factor is that many government agencies operate in de facto vendor lock-in environments. A heavy reliance on one vendor for operating systems, productivity software, and mission-critical operations comes with greatly increased risk.
While these tools are familiar to the workforce, their ubiquity makes them an attractive vector for phishing campaigns and vulnerability exploitation. The Department of Homeland Security’s Cyber Safety Review Board highlighted this risk and provided recommendations focused on protecting digital identity standards. Agencies should be vigilant about securing these environments and mitigating the risks associated with vendor lock-in, which can limit flexibility and increase costs in the long run.
The prevalence of legacy on-premises databases and increasingly complex multicloud infrastructure adds another layer of difficulty. Securing outdated systems alongside diverse cloud environments requires specialized skills and tools, further straining resources and potentially introducing vulnerabilities.
Addressing these multifaceted challenges requires a strategic and coordinated effort focused on standardization, robust security practices, and resource optimization.
How AI can help: Automating the future (of threat detection)
AI-based threat detection models offer a promising path toward a more resilient cybersecurity posture. By combining AI’s advanced capabilities with real-time cybersecurity intelligence and tooling, key cybersecurity workflows can be greatly streamlined.
Workflows such as root cause analysis, threat analysis, and vulnerability impact assessment previously required heavy personnel investment. As we’ve seen, AI-driven automation can provide a crucial assist in scaling to the true scope of the threat landscape, while also accelerating time-to-completion. At Google Cloud, we are seeing the benefits of AI in security today, as these three examples demonstrate.
However, achieving optimal effectiveness for government agencies requires a tailored approach.
Public sector networks often have unique configurations, legacy systems, and security-focused workflows that differ from commercial enterprises. By ingesting agency-specific data — logs, network traffic patterns, and historical incident data — AI models can learn baseline behaviors, identify deviations more accurately, reduce false positives, and improve detection rates for threats specific to public sector networks.
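To make the baselining idea concrete, here is a minimal, illustrative sketch that learns "normal" behavior from log-derived features and flags deviations with an off-the-shelf anomaly detector (scikit-learn's IsolationForest). The feature set, sample values, and thresholds are assumptions for illustration only, not a description of any Google Cloud detection model.

```python
# Hypothetical sketch: learn a per-agency baseline from aggregated log features
# and flag deviations for analyst review. All feature names and numbers are
# illustrative placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [logins_per_hour, bytes_out_mb, failed_auth_count, distinct_dest_ips],
# aggregated from agency logs and network telemetry over a time window.
baseline_window = np.array([
    [12, 40.5, 1, 8],
    [15, 38.0, 0, 7],
    [11, 42.1, 2, 9],
    [14, 39.7, 1, 8],
])

model = IsolationForest(contamination=0.01, random_state=42)
model.fit(baseline_window)  # learn what routine activity looks like here

# Score new observations against the learned baseline.
incoming = np.array([
    [13, 41.0, 1, 8],      # consistent with routine activity
    [95, 900.0, 30, 240],  # large deviation: candidate for analyst review
])
flags = model.predict(incoming)  # 1 = fits baseline, -1 = anomaly
for row, flag in zip(incoming, flags):
    print("ANOMALY" if flag == -1 else "ok", row)
```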
Adding the automation inherent in agentic AI-driven threat detection leads to better security and more sustainable operations. By automating the initial triage and analysis of security alerts, agencies can better respond, predict resource allocation, and develop more accurate cybersecurity budgets. This automation can reduce the need for constant manual intervention in routine tasks, leading to more predictable operational costs and a more effective cybersecurity team.
Ultimately, automating threat detection will maximize the capabilities of SOC staff and reduce toil so that teams can focus on the most important alerts. By offloading repetitive tasks like initial alert analysis and basic threat correlation to agentic AI, human analysts can focus on more complex investigations, proactive threat hunting, and strategic security planning. This shift can improve job satisfaction and also enhance the overall effectiveness and efficiency of the SOC.
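As a rough illustration of what automated first-pass triage can look like, the sketch below scores alerts by sensor severity, asset criticality, and threat-intelligence matches, then routes only the highest-scoring ones to a human analyst. The fields, weights, and thresholds are hypothetical and would be tuned to an agency's own environment.

```python
# Hypothetical triage sketch: enrich, score, and route alerts so analysts only
# see what needs human judgment. Not a Google Cloud API; values are illustrative.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str
    severity: int           # 1 (low) .. 5 (critical), as reported by the sensor
    asset_criticality: int  # 1 .. 5, from the agency's asset inventory
    matches_known_ioc: bool

def triage_score(alert: Alert) -> float:
    """Combine sensor severity, asset value, and threat-intel matches."""
    score = 0.4 * alert.severity + 0.4 * alert.asset_criticality
    if alert.matches_known_ioc:
        score += 2.0  # known indicator of compromise: escalate strongly
    return score

def route(alert: Alert) -> str:
    score = triage_score(alert)
    if score >= 4.0:
        return "escalate_to_analyst"
    if score >= 2.5:
        return "auto_enrich_and_queue"
    return "auto_close_with_audit_log"

alerts = [
    Alert("edr", severity=2, asset_criticality=1, matches_known_ioc=False),
    Alert("ids", severity=4, asset_criticality=5, matches_known_ioc=True),
]
for a in alerts:
    print(a.source, route(a))
```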
At Google Cloud’s Office of the CISO, we’re optimistic that embracing AI can help improve threat detection even as overall budgets are reduced. Sometimes, you really can do more with less.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
10 actionable lessons for modernizing security operations: Google Cloud’s Office of the CISO shares lessons learned from the manufacturing sector on how to modernize security operations. Read more.
Tracking the cost of quantum factoring: Our latest research updates how we characterize the size and performance of a future quantum computer that could likely break current cryptography algorithms. Read more.
How Confidential Computing lays the foundation for trusted AI: Confidential Computing has redefined how organizations can securely process their most sensitive data in the cloud. Here’s what’s new. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
How cybercriminals weaponize fake AI-themed websites: Since November, Mandiant Threat Defense has been investigating a UNC6032 campaign that uses fake AI video generator websites to distribute malware. Here’s what we know. Read more.
Pwning calendars for command and control: Google Threat Intelligence Group (GTIG) has observed malware, hosted on an exploited government website, that used Google Calendar for command and control and was subsequently used to attack other government websites. The activity has been attributed to APT41. Read more.
Cybercrime hardening guidance from the frontlines: The U.S. retail sector is currently being targeted in ransomware operations that GTIG suspects are linked to UNC3944, also known as Scattered Spider. UNC3944 is a financially motivated threat actor characterized by its persistent use of social engineering and brazen communications with victims. Here are our latest proactive hardening recommendations to combat their threat activities. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
Betting on the future of security operations with AI-native MDR: What does AI-first managed detection and response get right? What does it miss? How does it compare to traditional security operations? Tenex.AI’s Eric Foster and Venkata Koppaka join hosts Anton Chuvakin and Tim Peacock for a lively discussion about the future of MDR. Listen here.
AI supply chain security: Old lessons, new poisons, and agentic dreams: How does the AI supply chain differ from other software supply chains? Can agentic AI secure itself? Christine Sizemore, Google Cloud security architect, connects the supply-chain links with Anton and Tim. Listen here.
What we learned at RSAC 2025: Anton and Tim discuss their RSA Conference experiences this year. How did the show floor hold up to the complicated reality of today’s information security landscape? Listen here.
How boards can address AI risk: Christian Karam, strategic advisor and investor, joins Office of the CISO’s Alicja Cade and David Homovich to chat about the important role that boards can play in addressing AI-driven risks. Listen here.
Defender’s Advantage: Confronting a North Korean IT worker incident: Mandiant Consulting’s J.P. Glab joins host Luke McNamara to walk through North Korean IT worker activity — and how Mandiant responds. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
In today’s rapidly evolving technological landscape, artificial intelligence (AI) stands as a transformative force, reshaping industries and redefining possibilities. Recognizing AI’s potential and leveraging its data landscape on Google Cloud, Magyar Telekom, Deutsche Telekom’s Hungarian operator, embarked on a journey to empower its workforce with AI knowledge and tools. This endeavor led to the creation of Pluto AI — an internal AI platform that has grown into a comprehensive framework for diverse AI solutions.
As one of Hungary’s largest telecommunications operators, Magyar Telekom’s ultimate vision is to embed AI into every aspect of its operations, empowering every employee to leverage AI’s potential. Pluto AI is a significant step toward achieving this goal, fostering a culture of innovation and data-driven decision-making.
Magyar Telekom’s leadership recognized that AI proficiency is now essential for future success. However, the company faced challenges, including employees with varying levels of AI understanding and a lack of accessible tools for experimentation and practical application. As a result, Magyar Telekom aimed to democratize AI knowledge and foster a culture of experimentation by building a scalable solution that could adapt to its evolving AI needs and support a wide range of use cases.
To enable business teams across Magyar Telekom to utilize generative AI, the Pluto AI team developed a simple tool that provided a safe and compliant way to prompt large language models (LLMs). They also created educational content and training for business teams on how to use gen AI and what opportunities it brings. This approach provided other teams with the building blocks to quickly construct the AI solutions they needed.
With Pluto AI, Magyar Telekom spearheaded the successful adoption of gen AI across the company, quickly expanding the platform to support additional use cases without the need for the central platform team to have a deep understanding of them.
Developing Pluto AI
Magyar Telekom’s AI Team partnered with Google Cloud Consulting to accelerate the development of Pluto AI. This collaboration ensured that the platform was built on best practices, aligned with industry standards, and met security and compliance requirements of a regulated industry.
Here are some of the key features and functionality of Pluto AI:
1. Modular framework
Pluto AI’s modular architecture allows teams to seamlessly integrate, change, and update AI models, tools, and architectural patterns. This flexibility enables the platform to cater to a wide range of use cases and rapidly evolve alongside Magyar Telekom’s AI strategy.
The core modules of Pluto AI include:
Large language models: Pluto AI integrates with state-of-the-art LLMs, enabling natural language understanding, text and image generation, and conversational AI applications.
Code generation and assistance: The platform supports code generation, autocompletion, and debugging, boosting developer productivity and code quality. Pluto AI provides both a coding model, accessible via its user interface, for all development levels and IDE integration for experienced coders.
API: Pluto AI’s models can be called via API, enabling all parts of Magyar Telekom to utilize and integrate AI capabilities into their existing and new solutions.
Retrieval augmented generation (RAG) with grounding capabilities: RAG combines LLMs with internal knowledge sources, including multimodal content like images and videos. This enables teams to build AI assistants that can access and synthesize information from vast datasets and add evidence like extended citations from both corporate and public data to their responses.
Customizable AI assistants: Users can create tailored, personalized AI assistants by defining system prompts, uploading documents, and fine-tuning model behavior to meet their business needs (a minimal illustration follows after this list).
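Pluto AI’s own API is internal and not public, so the following is only a minimal sketch of the kind of call such an API module might wrap, given that the platform builds on Vertex AI’s Model Garden (described in the next section). The project, region, model name, and system prompt are placeholders, not Magyar Telekom’s actual configuration.

```python
# Minimal sketch using the Vertex AI Python SDK (google-cloud-aiplatform).
# Project, region, model name, and the system prompt are illustrative only.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="europe-west1")

# A "customizable assistant": behavior is fixed by a system instruction,
# then exposed to business teams as a single callable endpoint.
contract_assistant = GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=(
        "You are a contract-review assistant. Summarize key obligations, "
        "flag unusual clauses, and cite the relevant section numbers."
    ),
)

response = contract_assistant.generate_content(
    "Review the attached supplier agreement excerpt: ..."
)
print(response.text)
```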
2. Technical implementation
Pluto AI runs on Compute Engine using virtual machines, providing scalability, reliability, and efficient resource management. The platform also utilizes foundation models from the Model Garden on Vertex AI, including Google’s Gemini, Imagen, and Veo models, Anthropic’s Claude 3.5 Sonnet, and more. Magyar Telekom also deployed Elasticsearch on Google Cloud to store the knowledge bases necessary for enabling RAG workflows.
In addition to these core components, Pluto AI also utilizes other Google Cloud services to help develop production-ready applications, such as Cloud Logging, Pub/Sub, Cloud Storage, Firestore, and Looker.
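As a rough sketch of how the RAG workflow described above might tie these pieces together, the snippet below retrieves passages from an Elasticsearch knowledge base and grounds a Gemini prompt on them. The index name, document fields, and model choice are assumptions for illustration, not Magyar Telekom’s actual setup.

```python
# Minimal RAG sketch: retrieve passages from an Elasticsearch knowledge base,
# then ground a Gemini prompt on them. Index and field names ("kb", "text",
# "source") and the model name are placeholders.
from elasticsearch import Elasticsearch
import vertexai
from vertexai.generative_models import GenerativeModel

es = Elasticsearch("http://localhost:9200")
vertexai.init(project="your-gcp-project", location="europe-west1")
model = GenerativeModel("gemini-1.5-flash")

def answer(question: str) -> str:
    # 1. Retrieve the most relevant passages for the question.
    hits = es.search(
        index="kb",
        query={"match": {"text": question}},
        size=3,
    )["hits"]["hits"]
    passages = [h["_source"]["text"] for h in hits]
    sources = [h["_source"].get("source", "unknown") for h in hits]

    # 2. Ground the prompt: answer only from retrieved context, with citations.
    context = "\n\n".join(
        f"[{i+1}] ({src}) {p}" for i, (p, src) in enumerate(zip(passages, sources))
    )
    prompt = (
        "Answer using only the context below and cite passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return model.generate_content(prompt).text

print(answer("What is our retention policy for customer call records?"))
```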
3. User interface and experience
Pluto AI’s intuitive interface makes AI tools accessible to users with varying technical expertise. A dropdown menu allows users to easily navigate between different modules and functionalities. The platform’s design prioritizes user experience, ensuring that employees can leverage AI capabilities without a steep learning curve.
Impact and adoption
Pluto AI has seen impressive adoption rates, with hundreds of daily active users across different departments. The platform’s user-friendly design and practical applications have garnered positive feedback from Magyar Telekom employees.
In addition, Pluto AI has enabled the development of various AI assistants, including legal and compliance assistants that accelerate contract review, identify compliance risks, and analyze legal documents. Knowledge management assistants have enhanced knowledge sharing and retrieval across the organization, while software development has benefited from code generation and assistance tools. Additionally, AI-powered chatbots that handle routine inquiries have significantly improved customer support experiences.
Magyar Telekom has seen quantifiable results since rolling out Pluto AI. These include hundreds of daily unique users, tens of thousands of API calls, an estimated 20% reduction in the time spent reviewing legal documents, and a 15% decrease in code defects.
Vision and future roadmap for Pluto AI
Magyar Telekom sees Pluto AI as a key part of its AI strategy going forward. To maximize its impact, the company intends to expand the platform to more markets, business units, and departments within the organization. Additionally, Magyar Telekom is looking into the possibility of offering Pluto AI as a service or a product to other Deutsche Telekom markets. The company is also planning to build a library of reusable AI modules and frameworks that can be easily adapted to different use cases.
Magyar Telekom is pursuing several key initiatives to enhance Pluto AI and expand its capabilities. These efforts include investigating the potential of agent-based AI systems to automate complex tasks and workflows, adding a language selector for multilingual support to cater to a diverse user base, and developing an enhanced interface for managing RAG solutions, monitoring usage, and tracking performance metrics. Magyar Telekom also plans to continue developing dashboards for monitoring and optimizing cloud resource usage and costs.
Pluto AI has transformed Magyar Telekom’s AI landscape, making AI accessible, practical, and impactful. By providing a user-friendly platform, fostering experimentation, and delivering tangible business value, Pluto AI has set a new standard for internal AI adoption.