Google Cloud

2025 06 23

GCP – Work Smarter with Chromebook Plus and Google AI

The way we use technology at work is changing at a rapid pace. Innovation in AI is leading to new experiences and expectations for what can be done on laptops. That’s why we’re excited to unveil the next evolution of Chromebook Plus, a powerful leap forward and designed to help businesses unlock productivity, creativity, and collaboration for employees.

We’ve been hard at work, not only refining the features you already know and love, but also integrating even more Google AI capabilities directly into your devices. We’re also introducing the next wave of Chromebook Plus devices, including the brand-new Lenovo Chromebook Plus, an innovative device powered by the most advanced processor in a Chromebook ever—the MediaTek Kompanio Ultra.

This moment also marks a milestone in our larger effort to improve our hybrid computing approach to AI. With built-in NPU (neural processing unit) capabilities on Chromebook Plus, we now offer on-device AI for offline use, complemented by cloud-based capabilities that benefit from continuous updates and advancements. This hybrid approach allows us to balance performance, efficiency, privacy, cost, and reliability in Chromebook Plus.

The latest in Chromebook Plus

The Lenovo Chromebook Plus (14”, 10) is the world’s first Chromebook with an NPU and NPU-enabled capabilities. Powered by MediaTek’s Kompanio Ultra processor, and boasting 50 TOPS (trillions of operations per second) of AI processing power to enable on-device generative AI experiences, this intelligent device is built to keep up with modern workers and offer up to 17 hours of battery life. Learn more.

The Lenovo Chromebook Plus also comes with exclusive AI features built in. Organization is easy with Smart grouping, which provides you with a glanceable chip of your recent tabs and apps. You can also automatically group related items, move them to a new desk, or reopen all tabs in a single window. And with On device image generation, you can effortlessly turn any image into a sticker or standalone graphic with a transparent background, ready for use in Google Slides, Docs, and more.

Plus_SmartGrouping_May_2025_16x9_1x — Smart Grouping

A device for every need

We also understand that every business has its own unique needs and requirements. That’s why we’re so excited to expand the Chromebook portfolio with additional devices, including the ASUS Chromebook Plus CX15, ASUS Chromebook Plus CX14, and the ASUS Chromebook CX14. These additions further broaden the range of choices available, ensuring businesses can find a device that aligns with both their operational needs and budget.

When it comes to modernizing your team, the right device can make all the difference.

For cost-conscious businesses, who prioritize a highly affordable and reliable solution for essential tasks like email, web browsing, and cloud-based applications, standard Chromebooks offer exceptional value.
For enhanced interactions and versatility, especially for teams in retail, field services, or more creative roles, we offer touchscreen options, as well as detachable and convertible form factors so you can adapt to various work environments and presentation styles.
For advanced use cases and future-proofing, and employees that require cutting-edge performance, Chromebook Plus devices are the ideal choice. With powerful processors, more memory, double the storage of standard Chromebooks, and on-device AI capabilities, these devices are equipped to handle the next generation of productivity tools and smart features, future-proofing your investment.

New Google AI features to supercharge your workforce

Along with all of this new hardware, we’re also introducing new and updated features built directly into Chromebook and Chromebook Plus.

For productivity, we’ve enhanced Help me read, which now can simplify complex language into more straightforward, digestible text. This is perfect for quickly grasping complicated topics, technical documents, or anything that might otherwise require more time to understand. Additionally, we’re introducing the new Text capture feature. Leveraging generative AI, it extracts specific information from anything on your screen and provides contextual recommendations. Imagine automatically adding events to your calendar directly from an email banner, or effortlessly taking a receipt screenshot and pulling that data into a spreadsheet for easier tracking. Finally, Select to search with Lens helps you get more information from whatever is on your screen. Whether you’re curious about a landmark, a product, or anything else, this feature helps you quickly identify and learn more about it.

Plus_TextCapture_May_2025_16x10 — Text Capture

Just as critical as productivity is empowering teams to unleash their creativity. With that in mind, we’ve improved Quick Insert to now include image generation capabilities. With just the press of a button, you can generate high-quality AI images and instantly insert them into your emails, slides, documents, and more. Need a unique visual for a presentation or an email? Simply describe it, and let AI bring your vision to life.

Plus_QuickInsertImageGen_May_2025_16x10 — Quick Insert with Image Generation

As always, these features come with built-in policies, ensuring IT admins maintain full control over your organization’s access and usage of AI.

Preparing for the future of work

We continue to invest in making Chromebook Plus the definitive choice for businesses seeking to modernize their operations, empower their end-users with productivity and creativity, and prepare for the evolving demands of the future of work. With Chromebook Plus, your organization gains a secure, intelligent, and powerful platform designed to drive progress today and into tomorrow.

Click here to learn about ChromeOS devices, and discover which device is best for your business.

Read More for the details.

2025 06 18

GCP – BigQuery under the hood: Enhanced vectorization in the advanced runtime

Tibor Kiss Cloud, Google Cloud gcp

Under the hood, there’s a lot of technology and expertise that goes into delivering the performance you get from BigQuery, Google Cloud’s data to AI platform. Separating storage and compute provides unique resource allocation flexibility and enables petabyte-scale analysis, while features like compressed storage, compute autoscaling, and flexible pricing contribute to its efficiency. Then there’s the infrastructure — technologies like Borg, Colossus, Jupiter, and Dremel, as we discussed in a previous post.

BigQuery is continually pushing the limits of query price/performance. Google infrastructure innovations such as L4 in Colossus, userspace host networking, optimized BigQuery storage formats, and a cutting-edge data center network have allowed us to do a complete modernization of BigQuery’s core data warehousing technology. We do this while adhering to core principles of self-tuning and zero user intervention, to guarantee the best possible price/performance for all queries. Collectively, we group these improvements into BigQuery’s advanced runtime. In this blog post, we introduce you to one of these improvements, enhanced vectorization, now in preview. Then, stay tuned for future blog posts where we’ll go deep on other technologies and techniques in the advanced runtime family.

Enhanced vectorization: next-level query execution

Before diving into enhanced vectorization, let’s talk about vectorized query execution. In vectorized query execution, columnar data is processed in blocks the size of the CPU cache using Single Instruction Multiple Data (SIMD) instructions, which is now the de-facto industry standard for efficient query processing. BigQuery’s enhanced vectorization expands on vectorized query execution by applying it to key aspects of query processing, such as filter evaluation in BigQuery storage, support for parallel execution of query algorithms, and through specialized data encodings and optimization techniques. Let’s take a closer look.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3e7a058daf40>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>

Data encodings
Modern columnar storage formats use space-efficient data encodings such as dictionary and run-length encodings. For instance, if a column has a million rows but only 10 unique values, dictionary encoding stores those 10 values once and assigns a smaller integer ID to each row rather than repeating the full value. Enhanced vectorization can directly process this encoded data, eliminating redundant computations and significantly boosting query performance. The smaller memory footprint of this encoded data also improves cache locality, creating more opportunities for vectorization.

1 Dictionary and run-length encodings — Figure 1: Dictionary and run-length encodings

For example, as figure 1 demonstrates, “Sedan”, “Wagon” and “SUV” string values are encoded in the dictionary, replacing the repeated string literals with integers that represent indices in the dictionary built from those string values. Subsequently, the repeated integer values can be further represented with run-length encoding. Both types of encodings can offer substantial space and processing savings.

Expression folding and common subexpression elimination
Enhanced vectorization integrates native support for dictionary and run-length encoded data directly into its algorithms. This, combined with optimization techniques such as expression folding, folding propagation, and common subexpression elimination, allows it to intelligently reshape query execution plans. The result can be a significant reduction, or indeed complete removal, of unnecessary data processing.

Consider a scenario where REGEXP_CONTAINS(id, '[0-9]{2}$') AS shard receives dictionary-encoded input. The REGEXP_CONTAINS calculation is performed only once for each unique dictionary value, and the resulting expression is also dictionary-encoded, reducing the number of evaluations significantly and leading to performance improvements.

2 Dictionary folding — Figure 2: Dictionary folding

Here, the calculation is applied to the input dictionary-encoded data directly, producing output of dictionary-encoded data and skipping the dictionary expansion.

With enhanced vectorization, we take expression folding optimization even further by, in some cases, converting an expression into a constant. Consider this query:

code_block: <ListValue: [StructValue([(‘code’, “SELECT SUM(number) FROM tablernWHERE REGEXP_CONTAINS(id, ‘^.*[0-9]{2}’);”), (‘language’, ‘lang-sql’), (‘caption’, <wagtail.rich_text.RichText object at 0x3e7a06941640>)])]>

If the id in the Capacitor file for this table is dictionary-encoded, the system’s expression folding will evaluate all dictionary values, and, because none of its values contain two digits, determine that the REGEXP_CONTAINS condition is always false, and replace the WHERE clause with a constant false. As a result, BigQuery completely skips scanning the Capacitor file for this table, significantly boosting performance. Of course, these optimizations are applicable across a broad range of scenarios and not just to the query used in this example.

Data-encoding-enabled optimizations
Our state-of-the art join algorithm tries to preserve dictionary and run-length-encoded data wherever possible and makes runtime decisions taking data encoding into account. For example, if the probe side in the join key is dictionary-encoded, we can use that knowledge to avoid repeated hash-table lookups. Also, during aggregation, we can skip building a hashmap if data is already dictionary-encoded and its cardinality is known.

Parallelizable join and aggregation algorithms
Enhanced vectorization harnesses sophisticated parallelizable algorithms for efficient joins and aggregations. When parallel execution is enabled in a Dremel leaf node for certain query-execution modes, the join algorithm can build and probe the right-hand side hash table in parallel using multiple threads. Similarly, aggregation algorithms can perform both local and global aggregations across multiple threads simultaneously. This parallel execution of join and aggregation algorithms leads to a substantial acceleration of query execution.

Tighter integration with Capacitor
We re-engineered Capacitor for the enhanced vectorization runtime, making it smarter and more efficient. This updated version now natively supports semi-structured and JSON data, using sophisticated operators to rebuild JSON data efficiently. Capacitor enables enhanced vectorization runtime to directly access dictionary and run-length-encoded data and apply various optimizations based on data. It intelligently applies folding to a constant optimization when an entire column has the same value. And it can prune expressions in functions expecting NULL, such as IF_NULL and COALESCE, when a column is confirmed to be NULL-free.

Filter pushdown in Capacitor
Capacitor leverages the same vectorized engine as enhanced vectorization to efficiently push down filters and computations. This allows for tailored optimizations based on specific file characteristics and the expressions used. When combined with dictionary and run-length-encoded data, this approach delivers exceptionally fast and efficient data scans, enabling further optimizations like expression folding.

Enhanced vectorization in action

Let’s illustrate the power of these techniques with a concrete example. Enhanced vectorization accelerated one query by 21 times, slashing execution time from over one minute (61 seconds) down to 2.9 seconds.

The query that achieved this dramatic speedup was:

code_block: <ListValue: [StructValue([(‘code’, ‘SELECTrn ANY_VALUE(id) AS id,rn hash_idrnFROM (rn SELECTrn CAST(source_id AS STRING) AS id,rn TO_HEX(SHA1(CAST(source_id AS STRING))) AS hash_idrn FROM `source_data`)rnWHERErn hash_id IS NOT NULLrnGROUP BYrn hash_id’), (‘language’, ‘lang-sql’), (‘caption’, <wagtail.rich_text.RichText object at 0x3e7a06582850>)])]>

This query ran against a table with over 13 billion logical rows spread across 167 partitions, stored in Capacitor columnar storage format and optimized with dictionary and run-length-encoded data.

Without enhanced vectorization

Executing this query with a regular query engine would involve several steps:

Reading all data for each partition, fully expanding the dictionary and run-length-encoded columnar data.
Computing CAST(source_id AS STRING) and TO_HEX(SHA1(CAST(source_id AS STRING))) for every single columnar data value.
Building a hashmap from all the non-NULL hash_id values.

With enhanced vectorization

When enhanced vectorization processed the same query over the same dataset, it automatically applied these crucial optimizations:

It directly scanned the columnar data in the Capacitor file while preserving its dictionary-encoded data.
It detected and eliminated duplicate computations for CAST(source_id AS STRING) by identifying them as common subexpressions.
It folded the TO_HEX(SHA1(CAST(source_id AS STRING))) computation, propagating the resulting dictionary-encoded data directly to the aggregation step.
The aggregation step recognized the data was already dictionary-encoded, allowing it to completely skip building a hashmap for aggregation.

This example of 21-times query speedup vividly demonstrates how tight integration between enhanced vectorization runtime and Capacitor and various optimization techniques can lead to substantial query performance improvements.

What’s next

BigQuery’s enhanced vectorization significantly improves query price/performance. Internally, we’ve seen a substantial reduction in query latency with comparable or even lower slot utilization with enhanced vectorization runtime, though individual query results can differ. This performance gain comes from innovations in both enhanced vectorization and BigQuery’s storage formats.

We’re dedicated to continuously improving both, applying even more advanced optimizations alongside Google’s infrastructure advancements in storage, compute, and networking to further boost query efficiency and expand the range of queries that the advanced runtime can handle. Over the coming months, BigQuery’s advanced runtime’s enhanced vectorization will be enabled for all customers by default, but you can enable it earlier for your project today. Next up: We’ll offer BigQuery enhanced vectorization for Parquet files and Iceberg tables!

Read More for the details.

2025 06 18

GCP – Automate data resilience at scale with Eon and Google Cloud Backup

Tibor Kiss Cloud, Google Cloud gcp

Cloud backups were once considered as little more than an insurance policy. Now, your backups should do more! They should be autonomous, cost-efficient, and analytics-ready by default.

That’s why Eon built a platform purposefully aligned with Google Cloud to eliminate backup blind spots, simplify recovery, and unlock the value inside backup data without requiring teams to become policy experts or infrastructure wranglers.

Still, no matter what platform you use, it’s critical to understand what resilient cloud backup looks like and how to get there with Google Cloud’s native capabilities.

What makes cloud backup resilient?

Before diving into tooling, it’s worth asking: What does a resilient backup strategy look like in the cloud? In our work with Google Cloud users across industries, we’ve found five common criteria:

5 signs your backup posture may be at risk

You can’t easily see what’s backed up (or not)
Retention policies vary across projects and teams
Data is duplicated or stored inefficiently, driving up spend
Cloud ransomware protection is reactive rather than policy-driven
Recovery requires full restores even when you only need one object

aside_block: <ListValue: [StructValue([(‘title’, ‘Try Google Cloud for free’), (‘body’, <wagtail.rich_text.RichText object at 0x3e79daa4f550>), (‘btn_text’, ‘Get started for free’), (‘href’, ‘https://console.cloud.google.com/freetrial?redirectPath=/welcome’), (‘image’, None)])]>

Best practices for data protection

Google Cloud provides foundational capabilities to protect your data if you configure and use them consistently. Here’s how to maximize native protection:

1. Versioning and retention: first lines of defense

Enable Object Versioning in Cloud Storage to retain multiple object versions, making it easier to recover from accidental deletions. Pair this with Retention Policies to enforce minimum storage lifetimes for regulatory or critical datasets.

Tip: Use Bucket Lock for write-once-read-many (WORM) protection in the areas where compliance matters most.

2. Monitor for gaps in coverage

Use native services like Cloud SQL backups, GKE snapshots, and Persistent Disk images, but be mindful that backup responsibilities can fall to different teams. Without centralized visibility, coverage becomes inconsistent.

Tip: Use Cloud Asset Inventory or scheduled BigQuery queries to audit coverage.

3. Design for granular recovery

Plan for partial restores since not everything needs a full rollback. Whether it’s a single BigQuery table or a specific Cloud Storage object, restoring only what you need saves time and cost.

Tip: Use Object Lifecycle Management to automatically transition older or less critical Cloud Storage objects to colder storage classes.

Automating the complexity away

Managing cloud backup at scale is hard to do manually. From onboarding new workloads to applying consistent policies, human-led approaches don’t scale well.

That’s why more teams are exploring autonomous Cloud Backup Posture Management (CBPM) solutions, like Eon, that detect new assets in real time, apply smart backup rules automatically, and enforce consistent protection across environments.

With Eon, you don’t have to tag resources or write custom scripts. Our platform classifies and protects your Google Cloud assets out of the box—whether you’re working with GKE, Cloud SQL, BigQuery, or another solution.

From backups to business insights

Traditionally, backup data was siloed, underused, and only meant to be retrieved in emergencies. But, increasingly, teams are unlocking that data to:

Run analysis directly on backups using BigQuery and Dataproc,
Feed training and monitoring pipelines via Vertex AI,
Deliver audit-ready dashboards with Looker, powered by backup snapshots.

With Eon, this is built-in. We transform backups into zero-ETL data lakes that reduce pipeline costs and provide immediate access to structured data with no reprocessing required.

What a “mature” backup posture looks like

The end goal for many cloud-native teams is not just to “have backups.” It’s to develop a resilient, intelligent backup strategy that adapts to scale and risk.

Here’s what that looks like:

Automated discovery of new resources
Policy-driven protection tailored to data type and criticality
Immutable backups with time-locked retention
Search-first recovery instead of full snapshot restores
Cost-aware tiering and storage deduplication

Eon helps Google Cloud users reach this level of maturity faster without the burden of custom tooling or constant policy updates.

Ready to simplify backup?

If your team spends hours managing scripts, storage tiers, or backup tags across cloud environments, it may be time to rethink your approach.

Eon was built to make cloud backup resilient, autonomous, and actually useful. From ransomware protection to instant, object-level recovery—and now, zero-ETL access to analytics—we’re here to help you unlock the full potential of your backup data.

Book a demo to see how Eon can modernize your Google Cloud data protection strategy.

To discover how Google Cloud can support your startup, visit our program page. You can also sign up for our newsletter to stay informed about community activities, digital events, special offers, and more.

Read More for the details.

2025 06 18

GCP – Google is a Leader in the 2025 Gartner® Magic Quadrant™ for Analytics and Business Intelligence Platforms

Tibor Kiss Cloud, Google Cloud gcp

We are pleased to share that Gartner® has named Google a Leader in the 2025 Magic Quadrant™ for Analytics and Business Intelligence, for the second consecutive year. We believe this validates our strategy of delivering a comprehensive BI platform for self-service and governed environments that’s accessible to entire organizations through natural language, and backed by trusted data enabled by a semantic modeling layer.

figure1 (1) — Download the complimentary 2025 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms.

Generative AI has redefined what a business intelligence platform can offer. In the past year, we introduced Conversational Analytics, a new way for people in all parts of your organization to talk with their data and get answers, using simple, natural language, while also delivering many AI-powered capabilities to the Looker platform, including slide generation, formula creation and more. We also set the stage for AI agents grounded in truth with Looker’s trusted metrics, expanded our semantic layer to new third-party providers, introduced Looker reports for Google-easy dashboarding, and debuted continuous integration to help developers build and test faster.

The goal: infusing trusted data into a company’s every workflow and decision.

The generative AI revolution in BI

The deep integration of Google’s foundational Gemini models into the Looker platform has ushered in a new era of AI-powered business intelligence, making data exploration and analysis more accessible and insightful than ever before.

The AI-powered capabilities that we introduced in the past year are fundamentally changing how users interact with their data:

Conversational Analytics: Users can now ask complex questions of their data in natural language and instantly receive intelligent, visualized answers. This empowers business users to self-serve their data needs without writing a single line of code, freeing up data teams to focus on more strategic initiatives.
Code Interpreter in Conversational Analytics translates your natural-language questions into Python code, and executes that code to provide advanced analysis and visualizations. This helps with more complex scenarios, such as “what if” questions, period-over-period growth analysis and more.
AI-powered development: Every action in the Looker platform is powered by Google’s Gemini models, accelerating all of your BI actions, from writing and debugging LookML, developing robust and reliable data models, to building new reports and slides.
Automated slide generation and formula creation: With Google’s latest Gemini models, Looker re-envisions the way you create and share information in the AI age. You can create Google Slides presentations with insightful chart summaries in seconds, or tap into the formula assistant to build calculated fields that leverage metrics and dimensions based on your own unique data.

Looker agents, leveraging the Conversational Analytics API, will soon be available in Agentspace, providing a central repository for discovery and access so they are simple to deploy and manage.

Google-easy data storytelling with Looker reports

Building on our commitment to flexible and powerful data visualization, we introduced a new, more intuitive Looker reports experience. This reimagined reporting capability provides a beautiful and collaborative canvas for data exploration and storytelling, complete with:

Enhanced visualization capabilities: New chart types and customization options give users more control over how they present their data to tell compelling narratives.
Simplified, collaborative workflows: The new reporting interface makes it easier than ever to build, share, and collaborate on reports, fostering a more inclusive data culture.
Responsive canvas: Reports are now more responsive, for a smooth viewing experience across screen sizes for devices ranging from desktops, to tablets, to mobile.

Empowering developers and embedded experiences

At Looker, we know the only limitations on developer creativity are those you set on yourself. That is, unless tools get in the way. With that in mind, we continue to invest heavily in the developer experience.

Our new Conversational Analytics API allows you to embed natural-language querying directly into your applications and workflows, unlocking a new level of interactivity and user engagement for embedded analytics experiences. When applied in combination with Looker Embedded and the emerging Model Context Protocol (MCP) standard, developers can now build and design custom conversational agents for BI for their own applications and innovations.

Agentspace will serve as a centralized hub for managing and sharing Looker agents, enhancing discoverability and simplifying deployment. With this approach, teams can quickly leverage AI-powered insights and share agents across teams, promoting a more data-driven culture. And with the Agent Development Kit announced at Google Cloud Next ‘25, Google is providing a rich model and tooling ecosystem designed for multi-agent capabilities.

Code Interpreter in Conversational Analytics enables users to perform advanced analysis that historically has required specialized knowledge of advanced coding or statistical methods. Code Interpreter excels at questions that go beyond the scope of the standard BI query, such as “What were the key drivers of sales in my data?” or “What were our quarterly sales in 2023 and 2024, and what was the quarter-over-quarter growth?”. Also, we know that in the world of AI-powered data, an answer holds little value if you can’t verify how it was generated. That’s why Code Interpreter shows its work. For every answer it produces, you can expand a “How is this calculated?” section to see the exact Python code that was run, ensuring it’s not a black box.

We also know developers need to trust that their applications and dashboards are accurate and will build properly every time. To enhance the reliability and speed of LookML development, the Spectacles.dev team joined Google Cloud, and is working hard to deliver powerful continuous integration (CI) and automated testing capabilities to the Looker platform, helping to ensure data quality and consistency at scale.

Bringing trust to every gen-AI-powered business

In the AI era, data drives your business, your apps and your decisions. You need your data to be accurate and consistent, but that hasn’t always been the case with traditional tools. In this new world, trusted definitions managed by a semantic layer are a must-have, backed by unique information about your business. It is not enough to have reports and dashboards be available or simply delightful — they must take full advantage of data agents for specific use cases, or be embedded in third-party apps that your organization uses every day.

The combination of Looker’s powerful semantic model with Google’s leading AI capabilities delivers a new foundation for business intelligence — one that is more intelligent, intuitive, and impactful than ever before. Our own testing shows that by building with Looker’s semantic layer, data errors in gen AI natural language queries are reduced by as much as two thirds. Data consistency and quality are top priorities for modern organizations. We are building for this moment.

To download the full 2025 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms report, click here, and for more information on Looker see here.

^{2025 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms – Anirudh Ganeshan, Edgar Macari, Jamie O’Brien, Kurt Schlegel, Christopher Long, June 16, 2025}

^{GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved.}

^{Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.}

^{This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google.}

Read More for the details.

2025 06 18

GCP – What’s in an ASP? Creative Phishing Attack on Prominent Academics and Critics of Russia

Tibor Kiss Cloud, Google Cloud gcp

Written by: Gabby Roncone, Wesley Shields

In cooperation with external partners, Google Threat Intelligence Group (GTIG) observed a Russia state-sponsored cyber threat actor impersonating the U.S. Department of State. From at least April through early June 2025, this actor targeted prominent academics and critics of Russia, often using extensive rapport building and tailored lures to convince the target to set up application specific passwords (ASPs). Once the target shares the ASP passcode, the attackers establish persistent access to the victim’s mailbox. Two distinct campaigns are detailed in this post. This activity aligns with Citizen Lab’s recent research on social engineering attacks against ASPs, another useful resource for high risk users.

GTIG tracks this activity as UNC6293, a likely Russia state-sponsored cyber actor we assess with low confidence is associated with APT29 / ICECAP. After establishing rapport, the attacker sent phishing lures disguised as meeting invitations, and added spoofed Department of State email addresses on the cc line of the initial outreach to increase the legitimacy of the contact attempt. The initial phishing email itself is not directly malicious, but encourages the victim to respond to set up a meeting.

Figure 1: Keir Giles, a prominent British researcher on Russia, posted this screenshot of an email header with fake U.S. Department of State emails that was part of a UNC6293 campaign

Targets who responded received an email with a benign PDF lure attached. The State Department themed lure is customized to the target and contains instructions to securely access a fake Department of State cloud environment. This included directing victims to go to https://account.google.com and create an Application Specific Password (ASP) or “app passwords.” ASPs are randomly generated 16-character passcodes that allow third-party applications to access your Google Account, intended for applications and devices that do not support features like 2-step verification (2SV). To use an ASP you must set it up and provide a name for the application.

Figure 2: Benign PDF document with instructions

In campaign one, the ASP name suggested in the lure PDF was “ms.state.gov” and in campaign two, we observed a Ukrainian and Microsoft themed ASP name. After creating the ASP, the attackers directed the target to send them the 16-character code. The attackers then set up a mail client to use the ASP, likely with the end goal of accessing and reading the victim’s email correspondence. This method also allows the attackers to have persistent access to accounts.

Campaign	Sender Theme	ASP Name	Attacker Infrastructure Used
Campaign 1	State Department	ms.state.gov	91.190.191.117 – Residential proxy
Campaign 2	Unknown	Ukrainian and Microsoft-themed ASP	91.190.191.117 – Residential proxy

Attackers logged into victim accounts primarily using residential proxies and VPS servers, in some cases re-using infrastructure to access different victim or attacker accounts. As a result, we were able to connect the two distinct campaigns we observed to the same cluster. We have re-secured the Gmail accounts compromised by these campaigns.

Mitigations

GTIG is committed to our mission of understanding and countering advanced threats. We use the results of our research to ensure that Google’s products are secure and to protect our users and enterprise customers.

Users have complete control over their ASPs and may create or revoke them on demand. Google Workspace administrators also have options for restricting their use, or revoking ones created by their users. Upon creation, Google sends a notification to the corresponding account Gmail, recovery email address, and any device signed in with that Google account to ensure the user intended to enable this form of authentication.

Figure 3: Google Account Help documentation on app passwords

Google provides enhanced security resources such as the Advanced Protection Program (APP), intended for individuals at high risk of targeted attacks and exposure to other serious threats. Opting to use the APP prevents an account from creating an ASP due to the program’s heightened security requirements.

We are committed to sharing our findings with the security community and with companies and individuals that may have been targeted by these activities, and we hope that improved understanding of tactics and techniques will enhance threat hunting capabilities and lead to stronger user protections across the industry.

Lure PDF Document

SHA256: 329fda9939930e504f47d30834d769b30ebeaced7d73f3c1aadd0e48320d6b39

Read More for the details.

2025 06 17

GCP – Enhancing backup vaults with support for Persistent Disk, Hyperdisk, and multi-regions

Tibor Kiss Cloud, Google Cloud gcp

To help protect against evolving digital threats like ransomware and malicious deletions, last year, we introduced backup vault in the Google Cloud Backup and DR service, with support for Compute Engine VM backups. This provided immutable and indelible backup capabilities for mission-critical VMs, for both VM metadata and all their attached disks.

Today, we’re announcing two enhancements to backup vaults that can help you protect more types of workloads, better:

Backup vaults now support standalone Persistent Disk (PD) and Hyperdisk backups. Now in preview, it enables the direct backup of data on individual disks, providing a granular alternative to backing up the entire virtual machine.
Backup vaults can now be created in multi-region locations. Now generally available it supports regional data resilience and helping to meet business continuity requirements.

Immutability and indelibility

Traditional backups have a well-known vulnerability. If a malicious actor gains access to your environment, if they attempt to delete or corrupt the backup, preventing recovery and thus causing business loss, there is nothing preventing this from happening. This is where backup vaults fundamentally change the game.

A backup vault provides a secure, isolated storage environment in Google-managed projects that helps ensure your backups are immutable (secured against data modification) and indelible (secured against data deletion), providing protection against cyber attacks such as ransomware. When creating a backup vault, you can specify that vaulted backups must be secured against modification and deletion — even by a backup administrator who would traditionally have the ability to expire backups — until the specified minimum enforced retention timeframe has elapsed.

Once a backup is stored in a vault, it’s logically air-gapped from your Google Cloud project, and cannot be changed during its user-defined enforced retention period. This means:

No deletion: The backup can’t be accidentally or deliberately deleted before its enforced retention period expires.
No alteration: The backup data cannot be changed, and remains exactly as it was when it was created.

This gives you the confidence that your crucial recovery points have not been modified, so they are available when you need them.

Backup Vault now supports Persistent Disk and Hyperdisk

Many applications rely on the durable storage provided by Persistent Disk and Hyperdisk. With support for Persistent Disk and Hyperdisk in addition to Compute Engine VMs, backup vaults now offer a holistic defense strategy for your entire compute environment:

For your VMs: Backup vaults can help protect your Compute Engine VMs (including VM metadata and all the attached disks). They can provide rapid and secure recovery of operating systems, configurations, application binaries, and all associated disks.
For critical data disks: Now you can secure specific Persistent Disks and Hyperdisks that contain application data, databases, and file shares. They can provide granular protection, for scenarios where a full VM backup isn’t necessary, or you want to optimize costs.

This integrated approach ensures that whether you need to restore an entire VM or a specific disk, your recovery points are secured in a backup vault.

Key benefits of unified backup vault protection

By centralizing your Compute Engine VM, Persistent Disk, and Hyperdisk backups within backup vaults, you gain a powerful suite of advantages that transform your data protection strategy from reactive to proactively resilient:

Unified interface for easy management: Easily define and enforce consistent backup policies (including backup frequency and retention period) across your entire organization. Manage backups for your Compute Engine VMs, Persistent Disks, and Hyperdisks from a unified interface, even across multiple Google Cloud projects, simplifying administration.
Comprehensive monitoring and reporting: Benefit from centralized monitoring, detailed reporting, and timely alerting capabilities that streamline your day-to-day backup management. This enhanced visibility also significantly aids in meeting stringent audit and compliance requirements by providing clear, verifiable records of your backup posture.
Proactive security integration: Elevate your overall security posture with integration to Security Command Center, enabling proactive detection of anomalous activities, such as unauthorized backup deletion attempts or suspicious policy changes, so you can respond swiftly and decisively to threats.
Reduced operational complexity: Consolidate your backup management processes, moving away from disparate, script-based, or manual solutions. Backup and DR service provides a streamlined, fully managed service that simplifies operations, reduces human error, and frees up valuable IT resources, so you can focus on innovation.

Here’s how it works

Create a backup vault: Begin by establishing a secure backup vault. This vault acts as your designated, isolated, and highly protected storage destination for all your managed backups.
Define a backup plan: Next, create a comprehensive backup plan, specifying parameters such as the desired backup frequency (how often your disks will be backed up), backup retention period, and designating the specific backup vault where the backup data will be stored.
Schedule your backups: Now you are ready to apply your backup plan to your desired Persistent Disks or Hyperdisks. The Backup and DR service automatically takes incremental crash-consistent backups according to your defined schedule, with no manual intervention on your part.

Once these backups are created and stored in your designated vault, the vault’s enforced retention policy is automatically applied, making the backups immutable and indelible for the specified enforced retention period.

1 - persistent-disk-backups-to-backup-vault-for-cyber-resilience

Secure disaster recovery with multi-region backup vaults

In addition, you can now create backup vaults in Google-managed, multi-region locations. When using a multi-region backup vault, data is stored in more than one geographic region, thereby providing the security benefits of backup vault, while also making critical backup data available during unforeseen events.

Using multi-region backup vaults lets you:

Retain data access: Maintain accessibility and recoverability of critical backup data during a regional service disruption (such as natural disasters, power outages).
Satisfy business continuity requirements: Instill confidence in your business operations with your ability to perform on-demand, backup-based recoveries.
Secure your data: Retain all of the critical security benefits delivered by backup vaults.

Multi-region backup vault storage is generally available and currently supports Compute Engine full VM backups and disk backups to supported Locations. Complete this form to request access to the new feature.

2 - Backup vault creation screen - multi-region

Protect all your critical Compute Engine data

With the addition of multi-region backup vaults and disk-level backup, Backup and DR service can secure and recover critical Compute Engine data better than ever. Try the new capabilities yourself to optimize your VM data protection strategy.

To learn more about disk backup, start here.
To learn more about multi-region backup vaults, start here.
To request access to use multi-region backup vaults, please complete this form.
See here for pricing information relating to the new capabilities.

Read More for the details.

2025 06 17

GCP – GKE workload scheduling: Strategies for when resources get tight

Tibor Kiss Cloud, Google Cloud gcp

As a customer of Google Kubernetes Engine (GKE), you’ve selected a container runtime with a high degree of managed operations, encompassing everything from automatic upgrades to effortless node management. This inherent efficiency allows you to focus more on your applications, and less on the underlying infrastructure. In an ideal world, this streamlined experience, coupled with GKE’s robust autoscaling capabilities, ensures perfect workload scheduling all the time. Your applications seamlessly scale up and down, always finding the resources they need, precisely when they need them.

Unfortunately though, the real world presents a few more challenges that need to be addressed. GKE offers powerful four-way autoscaling (Horizontal Pod Autoscaler, Vertical Pod Autoscaler, Cluster Autoscaler, and Node Auto Provisioning) that provides the building blocks to address the scalability needs for workloads and infrastructure. However, running an efficient platform for today’s dynamic workloads involves more than just ensuring scalability. Factors like cost optimization, capacity availability, the speed at which resources can scale, overall performance, and the flexibility of your infrastructure all profoundly affect and constrain how workload scheduling can be effectively planned on GKE. Honestly, it can get a bit cloudy on what is the best strategy and what are the trade offs between these parameters.

In this blog we will focus on specifically the GKE scheduler and the factors that can influence its workload placement decisions when capacity constraints exist. We will explore how to plan and design for these scenarios using various GKE features and workload configurations.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud containers and Kubernetes’), (‘body’, <wagtail.rich_text.RichText object at 0x3ecd5756d0d0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectpath=/marketplace/product/google/container.googleapis.com’), (‘image’, None)])]>

It’s (mostly) a constraint optimization problem

At its core, effective workload scheduling in GKE is not about finding the single best solution, but rather navigating a multi-dimensional constraint optimization problem. For many use cases it is less about overcoming strict limitations and more about finding trade-offs between competing priorities:

Cost: You want to minimize overall infrastructure spend, optimize utilization, avoid over-provisioning, and leverage cost-effective solutions.
Performance: You want to ensure the workloads that run on the platform can meet their SLOs according to their relative importance to the business.
Flexibility and agility: You want to be able to react to changes in demand of your workloads by providing the necessary capacity when needed.

Understanding your individual preferences and tolerances across these dimensions is critical to understanding how to navigate this constraint space and how to design and configure your GKE environment.

Core building blocks

While not the only factor, autoscaling and its configuration plays a key role in workload scheduling. The configuration of scaling is particular to each environment, and some best practices have been documented. GKE supports autoscaling across four dimensions:

Horizontal Pod Autoscaler (HPA) – adjusts the number of pod replicas
Vertical Pod Autoscaler (VPA) – adjusts pod resource requests based on actual usage
Cluster Autoscaler (CA) – automatically adjusts the number of nodes
Node Auto Provisioner (NAP) – adjusts node pool size based on workload demands

When capacity is a concern, it’s crucial to understand how much resource your workload requests and consumes. The GKE scheduler relies on the pod resource.request value to make an optimum scheduling decision. If this is not set, this can result in incorrect placement (e.g., on nodes with not enough capacity) and workload instability due to pre-emption. The importance of setting requests is discussed in more detail here.

Workload scheduling constraint scenarios

What are good options for running an efficient and performant platform when capacity is constrained?

Let’s take some examples of common scenarios and discuss our options for getting the best result for our workload in terms of cost, performance, and flexibility.

Capacity is fixed or limited – but some high-priority workloads need guaranteed capacity

In this scenario the number of available nodes is considered to be static but the workloads still scale with demand. This creates the need to guarantee resources for critical workloads and explicitly define priority orders.

Solution: Workload priority classes and taints/tolerations

Priority classes implement a hierarchy of workload importance, where higher priority workloads take precedence over lower ones during scheduling decisions. As shown in the diagram above, under capacity constraints, the scheduler evicts lower priority (blue) workloads to successfully schedule those with a higher priority (red).
Taints and tolerations allow capacity targeting by ensuring workloads are not scheduled onto inappropriate nodes. They make sure all the capacity on certain (tainted) nodes (e.g.. with GPUs or SSDs) is only available to specific workloads. Not even workloads with a higher priority class than those with a toleration will be scheduled on the tainted node.

Applications experience sudden spikes in demand, and workloads need to be scaled quickly without performance degradation / errors

In this scenario we need to quickly schedule workloads on a horizontally scalable cluster. Even though GKE has features like container-optimized compute and image streaming that can drastically reduce provisioning time on new nodes, scheduling pods is still much quicker than scaling nodes. This can lead to resource bottlenecks and a degraded SLO.

Solution: Placeholder pods and scaling profiles

Placeholder pods, or “balloon” pods, have the effect of holding or reserving spare running capacity in the cluster. When there’s a sudden spike and new pods need to be scheduled, these balloon pods are evicted, releasing capacity and allowing new pods to be scaled rapidly in their place. New nodes are provisioned by the cluster autoscaler to accommodate the evicted placeholder pods, and provide more capacity if needed.
Auto-scaling profiles configure node scale-down behaviours based on either cost or performance. There are two cluster-based profiles in GKE: balanced and optimize-utilization. The balanced profile scales down nodes in a less aggressive manner compared to the optimize-utilization profile, meaning nodes are available for longer. Any further spikes in demand therefore are not delayed by new node provisioning times.

Workload-specific node scale-down is also available through the use of compute classes (described in more detail below). These allow for node consolidation triggers such as utilization and time delays to customize node lifetimes for different conditions.

I need to provision additional nodes in my cluster but my preferred node type is not available

In this scenario, we address the need to scale out a cluster without knowing if our required or preferred node type, such as a certain hardware accelerator or spot instance, will be available.

Solution: GKE custom compute classes.

These allow you to specify the preferred and fallback nodes that can be used to scale out your clusters. Priorities can be defined for specific node properties like CPU and accelerators, node characteristics (VM family, min CPU/Memory, Spot), or specific instance types (n4-standard-16).
Compute classes also adopt active migration to top priorities, meaning that workloads will always be reconciled to the highest priority option (e.g. Spot instance) when it becomes available, if it was not available at deployment time.
For users of resource-based committed use discounts (CUDs), compute classes can be configured in a way that prefers their committed resources before moving to other resources. To allow for full flexibility and between machine families, regions, and even compute platforms, you should also consider moving to flexible CUDs in the future.

code_block: <ListValue: [StructValue([(‘code’, ‘apiVersion: cloud.google.com/v1rnkind: ComputeClassrnmetadata:rn name: my-classrnspec:rn priorities:rn rules:rn – machineFamily: n4rn minCores: 16rn – machineType: e2-standard-16rn – nodepools: [pool1, pool2]rn autoscalingPolicy:rn consolidationDelayMinutes: 20rn nodepoolAutoCreation:rn enabled: true’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ecd540c93d0>)])]>

My preferred resources are not available in a given region

In this scenario, the capacity required by the workload is very specific and in high demand. There might even be a possibility that the resources cannot be obtained in a region even through compute classes. This is especially important for AI-based workloads that require high-performance infrastructure and GPU or TPU accelerators.

Solution: Multi-Cluster Orchestrator and Multi-Cluster Gateway

Multi-Cluster Orchestrator is an open-source project whose primary goal is ¨simplifying multi-cluster deployments, optimizing resource utilization and costs, and enhancing workload reliability, scalability, and performance.” Using this technology in GKE, platform engineers can in effect “capacity chase” across Google Cloud regions where a workload´s capacity requirements are matched with the regions where that capacity is available. Multi-Cluster Orchestrator then initiates cluster provisioning in that region to run the workload.
Multi-Cluster Gateway is a networking solution for GKE that leverages the Kubernetes Gateway API to manage application traffic across multiple GKE clusters, potentially spanning different regions. It simplifies the complex task of exposing services and balancing workloads in geographically distributed GKE environments

Conclusion and next steps

GKE offers platform engineers a robust set of tools to optimize resource allocation, even when they face capacity constraints. Effective and holistic capacity planning depends on a clear understanding of the workloads, including their criticality, usage profiles, and capacity requirements. Managing constrained capacity can be a strategic way to control costs, making it crucial to optimize performance under these conditions.

To further enhance your capacity planning consider the following resources:

Understand GKE cluster and workload signals such as utilisation and rightsizing and how they are important in capacity planning.
Monitor scaling events such as failed pod scheduling events and node/pod number changes in the Unschedulable Pods dashboard template.
Take a look at the recently released feature — GKE Horizontal Pod Autoscaling Observability Events — which provides the ability to view HPA autoscaler decision events in logs. This can help with the tracking and understanding of scaling event decisions which may influence platform design.

Read More for the details.

2025 06 17

GCP – Spanner’s enduring impact: Celebrating the 2025 ACM SIGMOD Systems Award

Tibor Kiss Cloud, Google Cloud gcp

Earlier this year, the Association for Computing Machinery’s Special Interest Group on Management of Data (ACM SIGMOD) announced that Spanner, Google’s globally distributed database, was awarded the 2025 SIGMOD Systems Award. The SIGMOD Systems Award specifically honors systems whose technical contributions have profoundly impacted the theory or practice of large-scale data management. On behalf of the entire Spanner team, especially the engineers who were there at the beginning of Spanner’s journey, it is with deep humility and immense pride that we receive this recognition from such a distinguished community. We’re thrilled to be participating in the 2025 SIGMOD conference from June 22-27 in Berlin, Germany as a Platinum sponsor.

This honor feels particularly significant following Spanner’s 2022 SIGOPS Hall of Fame Award, which highlighted the crucial role of technologies like TrueTime and our network infrastructure, reaffirming the lasting significance of the original vision laid out in the first Spanner paper.

Spanner’s core innovation: TrueTime and external consistency

For Spanner to be recognized in this way is a powerful affirmation of the vision we set out to achieve years ago and the new ways Spanner enables applications to be built. According to the award citation, Spanner is recognized “for reimagining relational data management to enable serializability with external consistency at global scale.”

Why “reimagining”? Before Spanner, databases offered a stark choice: you could have ACID transactions and SQL, or you could have scale and multi-datacenter reliability. Scale and availability required a distributed system, and that meant eventual consistency and other forms of best-effort synchronization. Spanner showed that this choice was not fundamental — that it was possible to build a database that offered the horizontal scalability of a distributed system with the power and ease of use of transactions and SQL. It enabled companies for whom scale is job #1 to regain developer velocity and agility. Spanner drastically simplifies the logic required in distributed applications. Developers can reason about the state of the database as if it were a single, consistent entity, even when it spans the globe.

The key enabler for Spanner’s ability to deliver external consistency is TrueTime. Beyond just a synchronized global clock, TrueTime is an API that cleverly exposes clock uncertainty as a bounded interval, which allows higher-level algorithms to reason about the ordering of events. Google’s TrueTime implementation uses specialized hardware references like GPS receivers and atomic clocks to provide highly trustworthy and very tight time bounds. Spanner leverages this bounded uncertainty to achieve external consistency. When a transaction commits, Spanner assigns it a commit timestamp derived from TrueTime. Spanner then enforces a “commit wait” — which can be overlapped with making the transaction durable – to ensure that the commit timestamp is definitively in the past before making the transaction’s effects visible. This ensures that the assigned commit timestamps definitively reflect the true global serialization order of transactions, even across data centers. The result is remarkable: external consistency with no performance cost.

Addressing the consistency-scale dilemma

To truly appreciate the journey, it helps to travel back to the early and mid-2000s at Google. The internet was exploding, and our biggest challenge was scaling our software infrastructure to keep pace. We needed databases that could store and process a copy of the internet using vast fleets of commodity servers. This spurred the development of internal systems that delivered incredible performance and scalability, but they came with trade-offs.

As we gained more familiarity with these systems, and started using them to build big interactive applications like Gmail, we consistently heard from internal developers about the challenge of working with eventual consistency and cross-shard synchronization, as well as the friction of modeling every problem (no matter how complex) as key-value pairs. It quickly became apparent that we needed to build a globally distributed database that offered the familiarity and guarantees of traditional relational databases — including ACID transactions, serializability, and external consistency — without giving up Google’s ever-growing need for bigger databases serving bigger audiences. Moreover, working closely with our customers, it became clear that actually, it was something we could build. The rest is history!

Spanner as a cloud service

As a cornerstone of Google’s infrastructure, Spanner powers some of our most critical, planet-scale services, including Google Ads, Google Search indexing, Gmail, YouTube, Google Photos, metadata for Cloud Storage, and BigQuery, demonstrating its robustness and scalability under extreme load.

The next logical step was to make these capabilities available externally to customers through Google Cloud. The launch and subsequent evolution of Spanner aimed to democratize this technology, bringing the power of a globally consistent, scalable database to organizations of all sizes, from startups to global enterprises, simplifying their application development and operations.

Spanner’s core value proposition for customers stems directly from its unique architecture:

Global scale with strong consistency: Spanner delivers on the original promise: ACID transactions with external consistency across a database that can scale horizontally across regions and continents, automatically managing data distribution (sharding) as needed. This directly addresses the capability highlighted by the SIGMOD award.
Unmatched availability: Leveraging synchronous, Paxos-based replication across multiple zones or regions, Spanner offers an industry-leading 99.999% availability Service Level Agreement (SLA) for multi-region configurations. This helps provide extreme fault tolerance and minimizes downtime risk for mission-critical applications.
Simplified operations: As a fully managed service, Spanner automates complex operational tasks like sharding, replication management, backups, and maintenance. This frees development teams from significant operational burdens, allowing them to focus on building applications rather than managing database infrastructure. This contrasts sharply with the manual effort often required for traditional sharded databases or the complexity of implementing consistency logic at the application layer for NoSQL systems.
Developer productivity: Spanner offers familiar SQL query interfaces, supporting both Google SQL and PostgreSQL dialects, which significantly flattens the learning curve for developers. Furthermore, its strong consistency eliminates entire classes of complex problems related to data synchronization and reconciliation that often plague applications built on eventually consistent systems.

Empowering customers and industries

Of course, today’s data landscape has changed since we published the first Spanner paper in 2012. In the modern AI-first data world, we see customers more focused than ever on getting full value from their data, which is often spread out over many systems with different data models, varying scalability, and uneven reliability. We’re addressing these new challenges head on, introducing Spanner Graph, vector search for AI applications, and integrated full-text search. These allow you to bring together a wide range of data and iterate on it rapidly within a single, consistent, scalable platform. We’ve also helped you run even more cost-effectively as your data grows, with increased compute and storage density, tiered pricing through editions, and cost-optimizing features such as tiered storage. Finally, we’ve made it easier to bring scale-out workloads into Spanner through enhanced interoperability with tools and capabilities, such as Cassandra-compatible APIs.

While academic recognition such as the SIGMOD award is gratifying, we’ve always felt that the true measure of a system’s impact lies in how it empowers users to solve real-world problems and build innovative applications. Spanner’s unique combination of capabilities have proven transformative across various industries.

In financial services, where consistency, availability, and security are paramount, Spanner provides the foundation for many critical systems. Companies like Goldman Sachs use it to consolidate trade ledgers, while others like Arigato Bank rely on it to handle high volumes of financial transactions with perfect consistency, even during peak loads. Digital-native banks like Minna Bank have built their entire infrastructure on Spanner, leveraging its availability and consistency to meet stringent regulatory requirements and customer expectations.

The gaming industry is constantly pushing the boundaries of scale and real-time interaction. Spanner has helped game developers launch globally successful titles like Dragon Quest Walk by Colopl, handling millions of concurrent players from day one. Its ability to manage player profiles, in-game inventory, and leaderboards consistently across a global player base, while elastically scaling to handle unpredictable traffic spikes, has been crucial for delivering a seamless player experience

In retail and e-commerce, Spanner helps businesses manage the complexities of modern commerce. Walmart uses Spanner to modernize its inventory and payment management, providing a real-time, consistent view across online and physical stores. MercadoLibre, a global online marketplace and e-commerce provider, leverages Spanner to handle the global needs of their customers, including massive demand spikes during major product launches.

Leaders in transportation, such as Uber, rely on Spanner to handle millions of concurrent users and billions of trips per month across over 10,000 cities and billions of database transactions per day.

These examples illustrate that for many modern applications, Spanner’s specific blend of global consistency, massive scalability, high availability and interoperable multi-model isn’t just a technical advantage; it’s a fundamental enabler. Business models built around real-time global inventory, instantaneously consistent financial records, or seamless worldwide multiplayer experiences become significantly simpler, less risky, and more feasible to implement with Spanner.

The future with Spanner

The SIGMOD award recognizes the contributions of over 30 individuals, and countless other current and prior Googlers who have played roles throughout its development and evolution. It’s been a privilege to witness the journey from the initial ambitious concepts to the globally impactful system it is today. The journey from the original OSDI paper published to this 2025 SIGMOD Systems Award highlights over a decade of sustained research, engineering, and investment by Google. This long-term commitment is rare and is a key factor behind Spanner’s enduring success and impact.

Thanks to you, our customers, Spanner remains a living system that continues to evolve, benefiting from ongoing improvements, driven by both internal use and the needs of our cloud customers. Looking ahead, we remain committed to pushing the boundaries of what’s possible with distributed databases. We’re excited about enabling new kinds of applications, including those leveraging AI, and continuing our mission to simplify the complexities of data management for developers building the next generation of world-changing applications.

Experience the Spanner difference

The 2025 ACM SIGMOD Systems Award is a tremendous honor and a validation of the path we embarked on years ago. If you are at the 2025 SIGMOD conference, join us on Tuesday, June 24 at 4:30 pm to hear from and meet with the Google team.

If Spanner’s capabilities for consistency, scale, and availability resonate with the challenges you face, we encourage you to learn more:

Explore Spanner to learn about supported features and use cases, and get started with a 90-day Spanner free trial instance.
Dive deeper with documentation
For the academically inclined, read the original and successor papers on Spanner

In closing, we believe Spanner unlocks new possibilities for building reliable, scalable, and globally consistent applications, and we are excited to see what you, our customers, will build with it.

Read More for the details.

2025 06 17

GCP – Graduating the Google for Startups Accelerator: AI First in Europe & Israel

Tibor Kiss Cloud, Google Cloud gcp

Today, we’re incredibly proud to announce the graduation of the latest cohort from the Google for Startups Accelerator: AI First from Europe & Israel! This milestone marks the culmination of an intensive three-months journey for these 14 innovative startups, who’ve dedicated themselves to growing their businesses and pushing the boundaries of artificial intelligence. The hybrid program offered expert mentorship, robust technical support, and access to a powerful global network, empowering founders to scale their impact.

“With Google’s support, we brought our AI recruitment platform into its next generation — the most advanced in the world, with a business model built for $7M+ ARR within a year. Their guidance and exposure to breakthrough models took our tech years ahead.“ – Shira Spetter, CEO, iVERSE

With AI projected to contribute a staggering $15.7 trillion to the global economy by 2030 (PwC), supporting these AI startups is crucial for accelerating groundbreaking innovations, deploying scalable solutions, and ensuring AI’s benefits are widely accessible to businesses and communities worldwide. Because when AI innovation thrives, the world moves forward.

“We were initially cautious about deploying LLMs in production for key functionalities. However, after experiencing Gemini 2.5, we’re not just convinced – we’re actively integrating it to power exciting new features.” – Maria Fe Paz, CEO and founder of Connect by Circular-Lab.

The cohort celebrated the milestone at Viva Technology in Paris where they presented their companies, met potential venture capitalists and had an intimate fireside chat with Joëlle Barral, Research & Engineering Senior Director at Google DeepMind and Arno Amabile, Advisor to the French President Special Envoy for AI.

Learn more about the graduating startups and their inspiring work:

Ambr AI (UK) helps professionals master difficult workplace conversations through realistic Voice AI practice simulations, providing a safe, convenient environment with instant feedback to build crucial communication skills.
Connect by Circular-Lab (Spain) uses AI/ML to structure and centralize diagnostic data, making it accessible for labs, hospitals, and industry stakeholders.
Folio (Israel) empowers industrial sales and application engineers by turning technical specs, configuration data, and application info into instant answers, recommendations, and agentic workflows, speeding work, cutting errors, and boosting revenue for industrial manufacturers and distributors.
Good With (UK) delivers ai-driven financial behaviour analytics for real-time credit risk assessment, enabling lenders to convert binary ‘Yes/No’ decisioning into a ‘Safe Journey to Yes’ which increases ‘good’ customer acquisition and reduces loss.
Hybr (UK) is a SaaS enabled lettings platform for letting agents cut workloads by up to 80%, turning leads into lets faster, and building transparency into the rental process.
iVerse (Israel) is an AI talent platform built on 50+ years of occupational psychology — combining real-time behavioral analysis, proprietary evaluation layers, and a scalable model to match top AI talent with top AI opportunities worldwide.
Material Evolution (UK) is transforming cement with a novel AI-driven tech that uses industrial waste that requires no heat and significantly reduces carbon emissions.
Metsystem (Denmark) develops an AI powered metastasis-targeting platform to predict personalized cancer treatments and help pharmaceutical companies stratify patients for drug trials.
Noxon (Germany) builds wearable Muscle-Computer Interface (MCI) that makes muscle diagnostics and therapies more accessible, user-friendly, and scalable for remote care.
Punto Health (UK/ Spain) is transforming dementia care with an AI-powered platform that delivers continuous, personalised support for patients and carers, while improving monitoring and coordination for providers.
ShareID (France) enables real-time, privacy-first identity verification without storing personal or biometric data, redefining digital trust.
Tech1M (UK) is an intelligent recruitment engine with AI Agents for sourcing, screening, interviewing and hiring talents anywhere in the world.
V-Art (Ukraine) is a DeepTech startup streamlining IP monetization for brands and AI with a solution to manage and license any digital content at scale.
Whering (UK) is a digital wardrobe & AI styling app that allows users to unlock infinite outfit combinations from the clothes they already own.

We can’t wait to see what comes next for these AI-first solutions and incredible teams driving them. Learn more about Google for Startups Accelerator programs on startup.google.com.

aside_block: <ListValue: [StructValue([(‘title’, ‘Try Google Cloud for free’), (‘body’, <wagtail.rich_text.RichText object at 0x3ecd540be310>), (‘btn_text’, ‘Get started for free’), (‘href’, ‘https://console.cloud.google.com/freetrial?redirectPath=/welcome’), (‘image’, None)])]>

Read More for the details.

2025 06 17

GCP – Gemini momentum continues with launch of 2.5 Flash-Lite and general availability of 2.5 Flash and Pro on Vertex AI

Tibor Kiss Cloud, Google Cloud gcp

The momentum of the Gemini 2.5 era continues to build. Following our recent announcements, we’re empowering enterprise builders and developers with even greater access to the intelligence, and flexibility of our most capable models yet, directly within Vertex AI, our unified platform for enterprise-scale AI development.

The significant updates announced today are designed to help your organization build sophisticated, customized, and efficient AI solutions, more confidently. These include:

Gemini 2.5 Flash and 2.5 Pro now generally available: Our most intelligent models for speed and advanced reasoning are production-ready providing organizations with the stability, reliability and scalability needed to confidently deploy the most advanced AI capabilities into mission-critical applications.
New Gemini 2.5 Flash-Lite in public preview: Experience the cost-efficient Gemini 2.5 model yet with optimized performance for high-volume tasks.
New Supervised Fine-Tuning (SFT) for Gemini 2.5 Flash is generally available: Tailor our high-speed model to your unique enterprise data and needs.
New updated Live API with native audio in public preview: Streamline the development of complex, real-time audio AI systems.

Build with confidence using production-ready Gemini 2.5

Gemini 2.5 Flash: Optimized for speed, efficiency, and scale

Gemini 2.5 Flash, is now generally available in Vertex AI, the Gemini API, and Google AI Studio, engineered for high-throughput enterprise tasks such as large-scale summarization, responsive chat applications, and efficient data extraction. These advancements provide a comprehensive toolkit to elevate your enterprise applications and unlock new levels of productivity and innovation. Build with confidence on this production-ready foundation.

“SmartBear uses AI to power Test Hub, its solution for building and executing regression tests for web, desktop, and mobile. With Gemini 2.5 Flash on Vertex AI, we can accelerate tasks like translating extensive manual test scripts into robust automated tests with remarkable speed and cost-effectiveness. The ROI is multifaceted: we’re empowering our customers to realize the benefits of automation execution, while simultaneously producing intent-based, resilient-to-change test plans. This drastically increases testing velocity and enables faster feature delivery—helping our customers move with greater speed and confidence, powered by a more efficient and scalable AI foundation.”
– Fitz Nowlan, PhD, VP of AI, SmartBear

“At Connective Health, our mission is empowering healthcare providers and driving better patient outcomes. Gemini 2.5 Flash on Vertex AI is instrumental in helping us extract vital medical records from complex free-text records. Customer trust is paramount, so our AI initiatives are always developed in close collaboration with healthcare providers, ensuring its use is accurate and impactful. The rapid advancements in Gemini’s capabilities allow us to continually enhance how we deliver these critical insights, and we’re excited to explore further applications to improve the lives of more patients and providers.”
– Joe Athman, CTO, Connective Health

“At Suggestic, we’re advancing the future of personalized nutrition by making nutritional data instantly actionable through our next-generation, image-based inference API. By leveraging Gemini 2.5 Flash as our core model, we’ve consistently achieved exceptional accuracy and processing efficiency, significantly outperforming alternative models on the Nutrition5k dataset. Gemini 2.5 Flash delivered a remarkable 25% improvement across critical benchmarks, including processing speed, enabling us to implement advanced image modification tools that enhance inference accuracy without sacrificing response times. Its native support for structured output and unparalleled capability in handling complex, tool-augmented tasks ensures seamless, real-time experiences, making Gemini 2.5 Flash the optimal choice for robust, production-grade solutions.”
– Shai Rozen, Co-founder, Suggestic

Gemini 2.5 Pro: Unlock state-of-the-art intelligence

Our most capable model, Gemini 2.5 Pro, is also now generally available in Vertex AI, the Gemini API, and Google AI Studio. Designed for your most demanding enterprise AI challenges like making sense of massive datasets for scientific discovery or accelerating migration of critical legacy code, it excels at highly complex reasoning, advanced code generation, and deep multimodal understanding.

“At Snap, we believe today’s devices and user interfaces can constrain the full potential of AI. So, we’re bringing AI into the world through Spectacles, our standalone, see-through, immersive AR glasses, and Gemini on Google Cloud. Through the powerful combination of our Depth Module API and Gemini 2.5 Pro, it’s already possible to translate 2D coordinates of an image into 3D space, enabling information and annotations to be anchored on the real world – even as you move around. We’re excited to unlock a whole new paradigm for spatial intelligence on Spectacles.”
– Terek Judi, Staff Product Manager, Snap Inc.

“At Multimodal, we’re reimagining how business and IT teams in finance and insurance co-create intelligent agentic workflows. By integrating Gemini 2.5 Pro into our AgentFlow platform, we’ve transformed how customers experience Zero Shot AI—enabling them to instantly see how AI agents operate on their own documents, workflows, and use cases, without needing lengthy pilots or custom demos. Gemini 2.5’s large context window and structured reasoning unlock a level of depth and adaptability that’s been impossible before, allowing our agents to understand, reason through, and act across highly specific domain workflows. This fundamentally changes the go-to-market experience: business teams can now visualize and validate impact on day one. For industries where trust, compliance, and precision are paramount, that’s a game-changer.”
– Andrew McKishnie, VP of Engineering, Multimodal

Enhanced customization and efficiency for your needs

Gemini 2.5 Flash-Lite in public preview: Gain cost-efficiency with low latency
Get an early look at Gemini 2.5 Flash-Lite, the most cost-effective Gemini 2.5 model yet, optimized for performance in high-volume workloads. Delivering higher performance than the previous Flash-Lite model, 2.5 Flash-Lite is 1.5 times faster than 2.0 Flash, at a lower cost, on Vertex AI. It’s ideal for tasks like classification, translation, intelligent routing, and other cost-sensitive, high-scale operations.

Supervised Fine-Tuning (SFT) for Gemini 2.5 Flash: Customized AI for your business
Achieve unparalleled customization with the GA release of Supervised Fine-Tuning (SFT) for Gemini 2.5 Flash on Vertex AI. Adapt Gemini to your enterprise’s specific datasets, industry-specific terminology, and unique brand voice, leading to higher accuracy on specialized tasks.

Live API with native audio in public preview: Build real-time interactive services
Streamline the development of sophisticated, real-time AI systems with the Live API, now in public preview with native audio-to-audio capabilities. This enables more natural and responsive voice-driven applications and complex AI agent interactions.

“Newo.ai enables small and medium businesses to deploy fully functional AI receptionists that handle all incoming communication channels—voice and text—in just 3 minutes with one click. We’ve worked through thousands of customer scenarios to enable AI Employee creation using only a Google Maps listing or website. While this appears simple, we deliver sophisticated conversation flows requiring advanced reasoning, low latency, multilingual capabilities, and empathetic responses—features powered by the Live API and Gemini 2.5 Flash on Vertex AI. This combination allows us to deliver production-ready AI employees that generate up to 30x ROI for our clients.”
– David Yang, Co-founder, Newo.ai

Driving your enterprise AI initiatives forward, these comprehensive Vertex AI updates enable you to continue to scale confidently with robust, production-grade models. You can now tailor powerful AI precisely to your unique operational needs and data, optimize for cost-efficiency in high-throughput scenarios, and build next-generation, interconnected AI solutions that push the boundaries of innovation.

“At Citizen Health, we develop AI advocates that empower rare‑disease patients/caretakers to understand and navigate their healthcare journeys. Our data pipelines stream longitudinal EHR data – decades of clinician notes, imaging reports, and genomic panels – directly into Gemini  2.5 Pro’s million‑token context windows, enabling patients and caretakers to receive concise, context‑rich answers in near real-time. We orchestrate Gemini 2.5 Flash and Gemini 2.5 Pro models within a LangGraph‑powered multi‑agent framework, ensuring the most relevant evidence reaches patients and caretakers without hallucinations. Gemini’s long‑context comprehension coupled with rapid inference converts exhaustive document review into a seamless conversation, allowing families to spend less time deciphering records and more time making informed care decisions.”
– Daniel Wang, CTO, Citizen Health

Pricing and availability
The Gemini 2.5 family of models offers a range of options to meet diverse enterprise needs. With Gemini 2.5 Flash moving to general availability, its pricing has been updated to reflect its improved quality and comprehensive capabilities. We are also introducing preview pricing for Gemini 2.5 Flash-Lite, our most cost efficient Gemini 2.5 model yet. For complete details on pricing for Gemini 2.5 Flash, Gemini 2.5 Pro, and the Gemini 2.5 Flash-Lite preview, please visit our pricing page.

Start moving to production today with Gemini 2.5 Flash and Gemini 2.5 Pro, now generally available on Vertex AI.

Read More for the details.

2025 06 17

GCP – Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes

Tibor Kiss Cloud, Google Cloud gcp

Integrating context from tools and data sources into LLMs can be challenging, which impacts ease-of-use in the development of AI agents. To address this challenge, Anthropic introduced the Model Context Protocol (MCP), which standardizes how applications provide context to LLMs. Imagine you want to build an MCP server for your API to make it available to fellow developers so they can use it as context in their own AI applications. But where do you deploy it? Google Cloud Run could be a great option.

Drawing directly from the official Cloud Run documentation for hosting MCP servers, this guide shows you the straightforward process of setting up your very own remote MCP server. Get ready to transform how you leverage context in your AI endeavors!

MCP Transports

MCP follows a client-server architecture, and for a while, only supported running the server locally using the stdio transport.

MCP-blog-image — https://modelcontextprotocol.io/introduction

MCP has evolved and now supports remote access transports: streamable-http and sse. Server-Sent Events (SSE) has been deprecated in favor of Streamable HTTP in the latest MCP specification but is still supported for backwards compatibility. Both of these two transports allow for running MCP servers remotely.

With Streamable HTTP, the server operates as an independent process that can handle multiple client connections. This transport uses HTTP POST and GET requests.

The server MUST provide a single HTTP endpoint path (hereafter referred to as the MCP endpoint) that supports both POST and GET methods. For example, this could be a URL like https://example.com/mcp.

You can read more about the different transports in the official MCP docs.

Benefits of running an MCP server remotely

Running an MCP server remotely on Cloud Run can provide several benefits:

Scalability: Cloud Run is built to rapidly scale out to handle all incoming requests. Cloud Run will scale your MCP server automatically based on demand.
Centralized server: You can share access to a centralized MCP server with team members through IAM privileges, allowing them to connect to it from their local machines instead of all running their own servers locally. If a change is made to the MCP server, all team members will benefit from it.
Security: Cloud Run provides an easy way to force authenticated requests. This allows only secure connections to your MCP server, preventing unauthorized access.

IMPORTANT: The security benefit is critical. If you don’t enforce authentication, anyone on the public internet can potentially access and call your MCP server.

Prerequisites

Python 3.10+
Uv (for package and project management, see docs for installation)
Google Cloud SDK (gcloud)

Installation

Create a folder, mcp-on-cloudrun, to store the code for our server and deployment:

code_block: <ListValue: [StructValue([(‘code’, ‘mkdir mcp-on-cloudrunrncd mcp-on-cloudrun’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93a60>)])]>

Let’s get started by using uv to create a project. Uv is a powerful and fast package and project manager.

code_block: <ListValue: [StructValue([(‘code’, ‘uv init –name “mcp-on-cloudrun” –description “Example of deploying a MCP server on Cloud Run” –bare –python 3.10’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93e20>)])]>

After running the above command, you should see the following pyproject.toml:

code_block: <ListValue: [StructValue([(‘code’, ‘[project]rnname = “mcp-on-cloudrun”rnversion = “0.1.0”rndescription = “Example of deploying a MCP server on Cloud Run”rnrequires-python = “>=3.10″rndependencies = []’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93f10>)])]>

Next, let’s create the additional files we will need: a server.py for our MCP server code, a test_server.py that we will use to test our remote server, and a Dockerfile for our Cloud Run deployment.

code_block: <ListValue: [StructValue([(‘code’, ‘touch server.py test_server.py Dockerfile’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b932b0>)])]>

Our file structure should now be complete:

code_block: <ListValue: [StructValue([(‘code’, ‘├── mcp-on-cloudrunrn│ ├── pyproject.tomlrn│ ├── server.pyrn│ ├── test_server.pyrn│ └── Dockerfile’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93700>)])]>

Now that we have our file structure taken care of, let’s configure our Google Cloud credentials and set our project:

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud auth loginrnexport PROJECT_ID=<your-project-id>rngcloud config set project $PROJECT_ID’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93070>)])]>

Math MCP Server

LLMs are great at non-deterministic tasks: understanding intent, generating creative text, summarizing complex ideas, and reasoning about abstract concepts. However, they are notoriously unreliable for deterministic tasks – things that have one, and only one, correct answer.

Enabling LLMs with deterministic tools (such as math operations) is one example of how tools can provide valuable context to improve the use of LLMs using MCP.

We will use FastMCP to create a simple math MCP server that has two tools: add and subtract. FastMCP provides a fast, Pythonic way to build MCP servers and clients.

Add FastMCP as a dependency to our pyproject.toml:

code_block: <ListValue: [StructValue([(‘code’, ‘uv add fastmcp==2.6.1 –no-sync’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93220>)])]>

Copy and paste the following code into server.py for our math MCP server:

code_block: <ListValue: [StructValue([(‘code’, ‘import asynciornimport loggingrnimport osrnrnfrom fastmcp import FastMCP rnrnlogger = logging.getLogger(__name__)rnlogging.basicConfig(format=”[%(levelname)s]: %(message)s”, level=logging.INFO)rnrnmcp = FastMCP(“MCP Server on Cloud Run”)rnrn@mcp.tool()rndef add(a: int, b: int) -> int:rn “””Use this to add two numbers together.rn rn Args:rn a: The first number.rn b: The second number.rn rn Returns:rn The sum of the two numbers.rn “””rn logger.info(f”>>> ?️ Tool: ‘add’ called with numbers ‘{a}’ and ‘{b}'”)rn return a + brnrn@mcp.tool()rndef subtract(a: int, b: int) -> int:rn “””Use this to subtract two numbers.rn rn Args:rn a: The first number.rn b: The second number.rn rn Returns:rn The difference of the two numbers.rn “””rn logger.info(f”>>> Tool: ‘subtract’ called with numbers ‘{a}’ and ‘{b}'”)rn return a – brnrnif __name__ == “__main__”:rn logger.info(f” MCP server started on port {os.getenv(‘PORT’, 8080)}”)rn # Could also use ‘sse’ transport, host=”0.0.0.0″ required for Cloud Run.rn asyncio.run(rn mcp.run_async(rn transport=”streamable-http”, rn host=”0.0.0.0″, rn port=os.getenv(“PORT”, 8080),rn )rn )’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b5b93850>)])]>

Transport

We are using the streamable-http transport for this example as it is the recommended transport for remote servers, but you can also still use sse if you prefer as it is backwards compatible.

If you want to use sse, you will need to update the last line of server.py to use transport="sse".

Deploying to Cloud Run

Now let’s deploy our simple MCP server to Cloud Run.

Copy and paste the below code into our empty Dockerfile; it uses uv to run our server.py:

code_block: <ListValue: [StructValue([(‘code’, ‘# Use the official Python lightweight imagernFROM python:3.13-slimrnrn# Install uvrnCOPY –from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/rnrn# Install the project into /apprnCOPY . /apprnWORKDIR /apprnrn# Allow statements and log messages to immediately appear in the logsrnENV PYTHONUNBUFFERED=1rnrn# Install dependenciesrnRUN uv syncrnrnEXPOSE $PORTrnrn# Run the FastMCP serverrnCMD [“uv”, “run”, “server.py”]’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473b80>)])]>

You can deploy directly from source, or by using a container image.

For both options we will use the --no-allow-unauthenticated flag to require authentication.

This is important for security reasons. If you don’t require authentication, anyone can call your MCP server and potentially cause damage to your system.

Option 1 – Deploy from source

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud run deploy mcp-server –no-allow-unauthenticated –region=us-central1 –source .’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473ee0>)])]>

Option 2 – Deploy from a container image

Create an Artifact Registry repository to store the container image.

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud artifacts repositories create remote-mcp-servers \rn –repository-format=docker \rn –location=us-central1 \rn –description=”Repository for remote MCP servers” \rn –project=$PROJECT_ID’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473070>)])]>

Build the container image and push it to Artifact Registry with Cloud Build.

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud builds submit –region=us-central1 –tag us-central1-docker.pkg.dev/$PROJECT_ID/remote-mcp-servers/mcp-server:latest’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b04732e0>)])]>

Deploy our MCP server container image to Cloud Run.

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud run deploy mcp-server \rn –image us-central1-docker.pkg.dev/$PROJECT_ID/remote-mcp-servers/mcp-server:latest \rn –region=us-central1 \rn –no-allow-unauthenticated’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473d30>)])]>

Once you have completed either option, if your service has successfully deployed you will see a message like the following:

code_block: <ListValue: [StructValue([(‘code’, ‘Service [mcp-server] revision [mcp-server-12345-abc] has been deployed and is serving 100 percent of traffic.’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473340>)])]>

Authenticating MCP Clients

Since we specified --no-allow-unauthenticated to require authentication, any MCP client connecting to our remote MCP server will need to authenticate.

The official docs for Host MCP servers on Cloud Run provides more information on this topic depending on where you are running your MCP client.

For this example, we will run the Cloud Run proxy to create an authenticated tunnel to our remote MCP server on our local machines.

By default, the URL of Cloud Run services requires all requests to be authorized with the Cloud Run Invoker (roles/run.invoker) IAM role. This IAM policy binding ensures that a strong security mechanism is used to authenticate your local MCP client.

Make sure that you or any team members trying to access the remote MCP server have the roles/run.invoker IAM role bound to their IAM principal (Google Cloud account).

NOTE: The following command may prompt you to download the Cloud Run proxy if it is not already installed. Follow the prompts to download and install it.

code_block: <ListValue: [StructValue([(‘code’, ‘gcloud run services proxy mcp-server –region=us-central1’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473eb0>)])]>

You should see the following output:

code_block: <ListValue: [StructValue([(‘code’, ‘Proxying to Cloud Run service [mcp-server] in project [<YOUR_PROJECT_ID>] region [us-central1]rnhttp://127.0.0.1:8080 proxies to https://mcp-server-abcdefgh-uc.a.run.app’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473970>)])]>

All traffic to http://127.0.0.1:8080 will now be authenticated and forwarded to our remote MCP server.

Testing the remote MCP server

Let’s test and connect to the remote MCP server using the FastMCP client to connect to http://127.0.0.1:8080/mcp (note the /mcp at the end as we are using the Streamable HTTP transport) and call the add and subtract tools.

Add the following code to the empty test_server.py file:

code_block: <ListValue: [StructValue([(‘code’, ‘import asynciornrnfrom fastmcp import Clientrnrnasync def test_server():rn # Test the MCP server using streamable-http transport.rn # Use “/sse” endpoint if using sse transport.rn async with Client(“http://localhost:8080/mcp”) as client:rn # List available toolsrn tools = await client.list_tools()rn for tool in tools:rn print(f”>>> Tool found: {tool.name}”)rn # Call add toolrn print(“>>> Calling add tool for 1 + 2”)rn result = await client.call_tool(“add”, {“a”: 1, “b”: 2})rn print(f”<<< Result: {result[0].text}”)rn # Call subtract toolrn print(“>>> Calling subtract tool for 10 – 3”)rn result = await client.call_tool(“subtract”, {“a”: 10, “b”: 3})rn print(f”<<< Result: {result[0].text}”)rnrnif __name__ == “__main__”:rn asyncio.run(test_server())’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473580>)])]>

NOTE: Make sure you have the Cloud Run proxy running before running the test server.

In a new terminal run:

code_block: <ListValue: [StructValue([(‘code’, ‘uv run test_server.py’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473130>)])]>

You should see the following output:

code_block: <ListValue: [StructValue([(‘code’, ‘>>> Tool found: addrn>>> Tool found: subtractrn>>> Calling add tool for 1 + 2rn<<< Result: 3rn>>> Calling subtract tool for 10 – 3rn<<< Result: 7’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3ec3b0473c10>)])]>

You’ve done it! You have successfully deployed a remote MCP server to Cloud Run and tested it using the FastMCP client.

Want to learn more about deploying AI applications on Cloud Run? Check out this blog from Google I/O to learn the latest on Easily Deploying AI Apps to Cloud Run!

Continue Reading

Read More for the details.

2025 06 16

GCP – C4D now GA: up to 80% higher performance for your business critical workloads

Tibor Kiss Cloud, Google Cloud gcp

We’re excited to announce the general availability of our next-generation C4D virtual machine family. Powered by 5th Gen AMD EPYC processors (Turin) paired with Google Titanium’s latest advancements, C4D provides customers with meaningful performance improvements — up to 80% higher throughput for web serving and 30% better performance for general computing workloads compared to the previous generation. This improvement in performance enables you to maximize your cloud investment and achieve more with fewer resources.

Beyond raw performance, C4D supports key enterprise capabilities including our first AMD-based Bare Metal instances offering direct access to all the resources on the server for maximum control and performance (bare metal will be available in the coming weeks); and the next-gen Titanium Local SSD, which enhances I/O-intensive operations with up to 35% lower latency vs. the prior generation. These hardware advancements are paired with enterprise-grade security and reliability, featuring a 30-day uptime window between planned maintenance events. With this combination of peak performance, expanded capabilities and enterprise-grade controls, C4D is suited for a wide range of general-purpose computing workloads, from databases, AI inference, web, application and game servers, to mission-critical business applications.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud infrastructure’), (‘body’, <wagtail.rich_text.RichText object at 0x3e64ec35a670>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/compute’), (‘image’, None)])]>

Supercharge your workloads while optimizing costs

General computing and web serving workloads

For general-purpose computing workloads, C4D VMs deliver up to 25% better performance per dollar than C3D, based on estimated SPECrate®2017_int_base benchmark results. C4D also delivers up to 20% better performance per dollar than comparable, generally available offerings from other cloud providers on the same benchmark. This helps you meet demanding performance requirements for a wide range of workloads — including web servers, game, ad and application servers, and containerized microservices — while optimizing resource usage and reducing costs.

For web-serving workloads, C4D leverages AMD Turin’s improved L3 cache efficiency and next-generation branch prediction to deliver higher throughput/vcpu, resulting in up to 75% increase in price-performance compared to the previous-generation C3D. This drives faster page rendering and a smoother end-user experience, with up to 45% higher price-performance compared to competitive offerings.

“AppLovin, a global leader in mobile advertising, is constantly looking for cutting-edge infrastructure innovations to deliver exceptional performance for our clients. Google Cloud’s C4D VMs enable us to do just that — driving up to 40% improvement over the prior generation, which leads to significant efficiency gains and latency reduction.” – Basil Shikin, Chief Technology Officer, AppLovin

“On C4D, our ad servers perform 191% faster than N2D, and 81% faster than C3D to serve the same amount of ad-requests. This improved performance comes at a lower overall cost because we can run a smaller number of more efficient nodes, but not only that, for us more performance/less latency means not only savings but more revenue since the fill-rate for ads (successful bid/ask matching) grows exponentially.” – Pablo Loschi, Principal Systems Engineer, Verve Group

“C4D Performance was impressive — most workloads, including page rendering & video processing, were 40+% faster than the previous generation. This kind of improvement makes a real difference to users of platforms like SpareRoom.” – Dimitrios Kechagias, Principal Developer, Cloud Infrastructure Optimisation Lead, SpareRoom

Databases and data-intensive applications

The C4D family is purpose-built for data-intensive applications such as databases and data analytics, by offering the latest generation compute and advanced storage capabilities. C4D’s high core frequency of up to 4.1Ghz and improved Instructions Per Clock (IPC) accelerate transactional workloads such as MySQL by up to 55% versus the prior generation with faster, more efficient query processing. For applications that require large datasets in memory, C4D provides VM sizes scaling up to 384vcpu and 3TB of high-bandwidth DDR5 memory. To scale database I/O performance, customers benefit from the integrated Hyperdisk Extreme storage with up to 500k IOPS and the new Titanium Local SSD that reduces read latency by 35% compared to the prior generation. Together, these capabilities increase performance and responsiveness for your mission-critical databases, delivering up to 35% better price-performance for Redis and MySQL workloads than comparable generally available VMs from other hyperscalers.

“We are constantly looking for advanced computing options to improve experience for the players. With Google’s new C4D VMs, we see drastically improved performance for our observability stack which handles 50k inserts/sec concurrently. Compared to C3D, we were able to cut our resource footprint by half, while reducing CPU load by 35% and seeing a 30% improvement in indexing latency. We look forward to adopting C4D at scale.” – Grzegorz Dlugolecki, Principal Cloud & Kubernetes Engineer, Chess.com

“Across over 100 benchmarks, going from the C3D to C4D yielded 1.7x the performance! This is a heck of a generational improvement for Google Cloud or any public cloud provider for that matter. C4D performance is extremely compelling and opens up a lot of new compute possibilities in the public cloud.” – Michael Larabel, Founder and Principal Author, Phoronix (Read the full study here)

AI inference and complex computations
C4D’s processor offers full support for AVX-512 with a 512 bit datapath, a 50% increase in memory channels, and higher IPC compared to the prior generation. This provides significant improvements for compute-heavy tasks such as CPU-based inference, matrix operations, financial modeling and simulations, and analytics. For recommendation inference, C4D VMs demonstrate an up to 75% price-performance uplift compared to C3D and up to 35% better price-performance versus the comparable competitive offerings from leading hyperscalers, to accelerate the time to results and reduce TCO.

“Silk has tested C4D, and found it to deliver a dramatic increase of up to 40% in performance compared to the previous generation, C3D, enabling our customers to enjoy significant gains in efficiency and agility of their mission critical workloads, over the transactional, analytics and AI use cases.” – Adik Sokolovski, Chief R&D Officer, Silk

Security, maintenance controls and shapes

With Titanium, C4D offers improved infrastructure performance, lifecycle management, reliability, and security. Storage and network management is offloaded to the Titanium adapter, reserving the host resources for running customer workloads.

Titanium also enables our first AMD-based Bare Metal instances, which provide direct access to server resources. Bare metal instances are ideal for workloads that require low-level system access — like custom hypervisors, container platforms, or applications with specialized performance or licensing needs. Sectors such as financial services, security, and private cloud platforms will particularly benefit from C4D Bare Metal offerings.

C4D VMs support Hyperdisk, Google Cloud’s workload-optimized block storage. Designed for high performance and scalability, Hyperdisk is cost-efficient, easy to manage at scale, and enterprise-ready. C4D VMs are compatible with Hyperdisk Balanced and Extreme, supporting up to 512 TiB of capacity per instance. With up to 320K IOPS per instance, Hyperdisk Balanced offers an optimal mix of performance and cost-efficiency, for a broad range of workloads. Hyperdisk Extreme delivers ultra-low latency and supports up to 500K IOPS and 10,000 MiB/s throughput per instance — making it well-suited for demanding workloads like databases and caching layers. With real-time tuning of IOPS and bandwidth, Hyperdisk helps ensure your applications always have the storage resources they need.

To enhance security, C4D VMs support confidential computing with AMD Secure Encrypted Virtualization (AMD SEV), utilizing hardware-based memory encryption to help protect your data and applications while in use. This makes C4D an excellent choice for sensitive data, privileged information, PII, and workloads subject to data privacy regulations and compliance requirements.

Experience C4D today

C4D delivers exceptional performance, scalability, and efficiency for today’s most demanding workloads. Powered by next-gen AMD processors, Titanium infrastructure, and Hyperdisk storage, C4D delivers the performance and capabilities needed to make the most of your cloud resources. Whether you’re just getting started with Compute Engine or planning to upgrade from previous generations, C4D offers a clear path to greater efficiency and performance. C4D is now available in 12 regions and 28 zones — check regional availability on our Regions and Zones page and deploy your first instance in the Google Cloud console or with Google Kubernetes Engine.

Read More for the details.

2025 06 16

GCP – Simplify your multi-cloud strategy with Cloud Location Finder, now in preview

Tibor Kiss Cloud, Google Cloud gcp

As cloud environments expand beyond traditional architectures to include multiple clouds, managing your infrastructure effectively becomes more complex. Imagine easily accessing consistent and up-to-date location information across different cloud providers, so your multi-cloud applications are designed and optimized with performance, security, and regulatory compliance in mind.

Today, we are making this a reality with Cloud Location Finder, a new Google Cloud service which provides up-to-date location data across Google Cloud, Amazon Web Services (AWS), Microsoft Azure, and Oracle Cloud Infrastructure (OCI). Now, you can strategically deploy workloads across different cloud providers with confidence and control.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud infrastructure’), (‘body’, <wagtail.rich_text.RichText object at 0x3e64f09e92b0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/compute’), (‘image’, None)])]>

Why use Cloud Location Finder?

Unified location data: Cloud Location Finder makes it easy for you to access the latest location information, providing a single source of truth for cloud location data for Google Cloud, AWS, Azure, and OCI.
Rich location attributes: Data includes public cloud region and zone information, and location metadata like proximity¹, territory code, and carbon footprint.
Up-to-date information: Cloud Location Finder provides 24-hour data freshness for active regions, removing outdated information promptly when locations are turned down. This eliminates inconsistencies from hard-coded lists and ensures your information is current without manual monitoring.
Programmatically accessible: Especially for organizations with diverse cloud and location requirements, Cloud Location Finder eliminates the need to manually manage and choose new locations. And because it is an API, it is easy to integrate into your applications.

How can Cloud Location Finder help you?

Whether you’re a partner or customer, a Cloud Architect or application developer designing a hybrid setup or a platform admin ensuring governance, Cloud Location Finder offers valuable insights:

Optimize deployments: Easily identify the nearest Google Cloud region or zone to an existing AWS/Azure/OCI deployment, to help you optimize your multi-cloud application for performance and latency.
Meet sustainability goals: As your business grows, choose nearby cloud locations that are also sustainable.
Help ensure compliance: Find a list of regions and zones in a specified territory to help you ensure compliant log storage or data processing across multiple clouds.
Improve reliability: Rely on a consistent source of truth for location data, which can be integrated directly into your applications.

Getting started

Cloud Location Finder is accessible via REST APIs and gcloud CLI, and available at no cost. You can easily list locations, get specific location details, find nearby locations, and filter these based on criteria such as cloud provider, location type, territory, or carbon footprint.

Ready to streamline your multi-cloud location strategy? Explore the Cloud Location Finder documentation to learn more and start building with consistent, accurate location data today!

^{1. Currently, for GCP regions only}

Read More for the details.

2025 06 16

GCP – Build a multi-agent KYC workflow in three steps using Google’s Agent Development Kit and Gemini

Tibor Kiss Cloud, Google Cloud gcp

Know Your Customer (KYC) processes are foundational to any Financial Services Institution’s (FSI) regulatory compliance practices and risk mitigation strategies. KYC is how financial institutions verify the identity of their customers and assess associated risks. But as customers expect instant approvals, FSIs face pressure to streamline their manual, time-consuming and error-prone KYC processes.

The good news: As LLMs get more capable and gain access to more tools to perform useful actions, employing a robust ‘agentic’ architecture to bolster the KYC process is just what FSIs need. The challenge? Building robust AI agents is complex. Google’s Agent Development Kit (ADK) gives you essential tooling to build multi-agent workflows. Plus, combining ADK with Search Grounding via Gemini can help give you higher fidelity and trustworthiness for tasks requiring external knowledge. Together, this can give FSIs:

Improved efficiency: Automate large portions of the KYC workflow, reducing manual effort and turnaround times.
Enhanced accuracy: Leverage AI for consistent document analysis and comprehensive external checks.
Strengthened compliance: Improve auditability through clear reporting and source attribution (via grounding).

To that end, this post illustrates how Google Cloud’s cutting-edge AI technologies – the Agent Development Kit (ADK), Vertex AI Gemini models, Search Grounding, and BigQuery – can be leveraged to build such a multi-agent KYC solution.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud AI and ML’), (‘body’, <wagtail.rich_text.RichText object at 0x3e64ec38b8e0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/vertex-ai/’), (‘image’, None)])]>

Tech stack from Google Cloud

This multi-agent architecture we’ll show you today effectively utilizes several key Google Cloud services:

Agent Development Kit (ADK): Simplifies the creation and orchestration of agents. ADK handles agent definition, tool integration, state management, and inter-agent communication. It’s a platform and model-agnostic agentic framework which provides the scaffolding upon which complex agentic workflows can be built.
Vertex AI & Gemini models: The agents are powered by Gemini models (like gemini-2.0-flash) hosted on Vertex AI. These models provide the core reasoning, instruction-following, and language understanding capabilities. Gemini’s potential for multimodal analysis (processing images in IDs or documents) and multilingual support further enhances the KYC process for diverse customer bases.
Search Grounding: The google_search tool, used by the Resume_Crosschecker and External_Search agents, leverages Gemini’s Google Search grounding capabilities. This connects the Gemini model’s responses to real-time information from Google Search, significantly reducing hallucinations and ensuring that external checks are based on up-to-date, verifiable public data. The agents are instructed to cite sources (URIs) provided by the grounding mechanism, enhancing transparency and auditability.
BigQuery: The search_internal_database custom tool demonstrates direct integration with BigQuery. The KYC_Agent uses this tool early in the workflow to check if a customer profile already exists within the institution’s internal data warehouse, preventing duplicate entries and leveraging existing information. This showcases how agents can securely interact with internal, structured datasets.

Deep dive: How to build a KYC agent in three steps

Our example KYC solution utilizes a root agent (KYC Agent) that orchestrates several specialized sub-agents:

Document Checker: Analyzes uploaded documents (ID, proof of address, bank statements, etc.) for consistency, validity, and potential discrepancies across documents.
Resume Crosschecker: Verifies information on a customer’s resume against public sources like LinkedIn and company websites using grounded web searches.
External Search: Conducts external due diligence, searching for adverse media, Politically Exposed Person (PEP) status, and sanctions list appearances using grounded web searches.
Wealth Calculator: Assesses the client’s financial position by analyzing financial documents, calculating net worth, and verifying the source of wealth legitimacy.

The root KYC_Agent manages the overall workflow, calling these child agents sequentially and handling tasks like checking if the customer is already present in internal databases and generating unique case IDs to track KYC requests.

Diagram showing the KYC Agent’s structure with sub-agents and tools

Step 1: Define your root agent (which receives the initial request from the user) and the child agents which handle the specialised tasks involved in the KYC process.

code_block: <ListValue: [StructValue([(‘code’, ‘# kyc_agent/agent.py (Illustrative Snippet)rnrn# Child Agents Definitions (Simplified)rndocument_checker_agent = Agent(rn model=MODEL, # e.g. gemini-2.0-flash-001rn name=”Document_Checker”,rn description=’Analyses documents and finds discrepancies…’,rn instruction=instructions_dict[‘Document_Checker’],rngenerate_content_config=GenerateContentConfig(temperature=0.27),rn)rnrnresume_crosschecker = Agent(rn model=MODEL,rn name=’Resume_Crosschecker’,rn description=’Uses `google_search` tool for verifying resume…’,rn instruction=instructions_dict[‘Resume_Crosschecker’],rn tools=[google_search], # Leverages Search Groundingrn generate_content_config=GenerateContentConfig(temperature=0.27),rn)rnrnexternal_search_agent = Agent(rn model=MODEL,rn name=”External_Search”,rn description=’Uses `google_search` tool to find negative news…’,rn instruction=instructions_dict[‘External_Search’],rn tools=[google_search], # Leverages Search Groundingrn generate_content_config=GenerateContentConfig(temperature=0.27),rn)rnrnwealth_calculator_agent = Agent(rn model=MODEL,rn name=”Wealth_Calculator”,rn description=”Assesses the client’s financial position…”,rn instruction=instructions_dict[‘Wealth_Calculator’],rn generate_content_config=GenerateContentConfig(temperature=0.27),rn)rnrn# Wrap Resume_Crosschecker Agentrnresume_crosschecker_tool = AgentTool(agent=resume_crosschecker_agent)rnrn# Wrap External_Search Agentrnexternal_search_tool = AgentTool(agent=external_search_agent)rnrn# Root KYC Agent orchestrating the workflowrnroot_agent = Agent(rn model=MODEL,rn name=”KYC_Agent”,rn description=”KYC Onboarding Assistant”,rn # Add the AgentTool wrappers to the tools list, alongside the original toolsrn tools=[rn generate_case_id,rn search_internal_database,rn resume_crosschecker_tool, # AgentToolrn external_search_tool # AgentToolrn ],rn sub_agents=[rn document_checker_agent,rn wealth_calculator_agentrn ],rn generate_content_config=GenerateContentConfig(temperature=0.27),rn instruction=instructions_dict[‘KYC_Agent’], # Instructions should still guide the LLM to call the tools by namern global_instruction=’You will always give detailed responses and follow instructions’rn)’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x3e64ec38b8b0>)])]>

Step 2: Define the tools needed by your agents in order to perform their respective tasks

code_block: <ListValue: [StructValue([(‘code’, ‘# kyc_agent/custom_tools.py (Illustrative Snippet)rnrndef search_internal_database(input_name: str) -> Dict[str, Any]:rn “””rn Finds names in an internal BigQuery table…rn “””rn try:rn client = bigquery.Client(project=PROJECT_ID)rn query = f”””rn SELECT `Full Name`, `UID`, `Risk Level`, `Citizenship`, `Networth`rn FROM `{TABLE_NAME}` # Defined in constants.pyrn WHERE LOWER(`Full Name`) LIKE LOWER(‘%{input_name}%’)rn “””rn query_job = client.query(query)rn results = query_job.result()rn df = results.to_dataframe()rn return df.to_dict(‘records’)rn except Exception as e:rn error_message = f”An error occurred with BigQuery: {e}”rn # Handle errors, potentially fallback to alternate data sourcern # Fallback logic would go here if neededrn return {“error”: error_message}’), (‘language’, ‘lang-py’), (‘caption’, <wagtail.rich_text.RichText object at 0x3e64ec38ba30>)])]>

Step 3: Run your agent locally using the command “adk web”. ADK provides a built-in UI for developers to visualise and debug the agent during the development process:

Screenshot of the ADK Dev UI used for developing agents

Start building now

This multi-agent KYC architecture demonstrates the power of combining ADK, Gemini, Search Grounding, and BigQuery. It provides a blueprint for building intelligent, automated solutions for complex business processes.

Learn more: Dive deeper into the technologies used:

Build your own: Adapt this pattern to your specific KYC requirements and integrate it with your existing systems on Google Cloud using services like Cloud Run for deployment.

Contact us: Reach out to Google Cloud Sales for a deeper discussion on implementing AI-driven KYC solutions tailored to your organization.

By embracing a multi-agent approach powered by Google Cloud’s AI stack, FSIs can transform their KYC processes, achieving greater efficiency, accuracy, and compliance in an increasingly digital world.

Read More for the details.

2025 06 16

GCP – How Google Cloud is securing open-source credentials at scale

Tibor Kiss Cloud, Google Cloud gcp

Credentials are an essential part of modern software development and deployment, granting bearers privileged access to systems, applications, and data. However, credential-related vulnerabilities remain the predominant entry point exploited by threat actors in the cloud.

Stolen credentials “are now the second-highest initial infection vector, making up 16% of our investigations,” said Jurgen Kutscher, vice-president, Mandiant Consulting, in his summary of our M-Trends 2025 report.

Ensuring the safe management of these credentials is a vital task. Developers may accidentally include credentials in artifacts like source code, built software packages, or Docker images. If these credentials fall into the wrong hands, they can be used by malicious actors for data exfiltration, cryptojacking, ransomware attacks, and general resource abuse.

Safeguarding credentials is particularly acute for open-source developers because when a credential is accidentally included in an artifact that is pushed to a public repository (like GitHub, PyPI or DockerHub), that credential becomes available to anyone on the Internet.

To address this critical issue, we’ve developed a powerful tool to scan open-source package and image files by default for leaked Google Cloud credentials to help protect Google Cloud customers who publish open-source artifacts. Created by Google’s deps.dev team in collaboration with Google Cloud’s credential protection team, we’ve seen significant results in identifying and reporting exposed credentials like API keys, service account keys, and OAuth client secrets in historical artifacts.

While this effort has initially focused on Google Cloud credentials, we plan to expand scanning to include third-party credentials later this year.

Beyond retrospective reporting, the tool also scans newly published open-source artifacts for leaked credentials. This pivotal advance can help drive remediation for immediate security breach threats, significantly reducing the risk of developer compromise.

The tool can also cultivate a culture of improved security by effectively shifting security to earlier in the development lifecycle when problems are easier to solve. By shifting left and encouraging earlier security awareness, the tool can help foster improved credential management practices in the open-source community, ultimately strengthening the resilience and security of the entire software supply chain.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud security products’), (‘body’, <wagtail.rich_text.RichText object at 0x3e64d41f80d0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/welcome’), (‘image’, None)])]>

Understanding the dangers of exposed cloud credentials

Exposed credentials present a serious security risk to cloud users because they allow an individual to gain access to a user’s cloud environment, including their resources, applications and managed user data. A malicious actor can exploit this access for nefarious purposes such as data theft, cryptojacking, ransomware attacks, and general resource abuse which can result in severe financial, reputational, and operational damage.

Once a credential is obtained by malicious actors it should be considered permanently compromised because compromised credentials are easily copied and shared.

Open source developers, while contributing to the collaborative ecosystem, face the risk of inadvertently exposing sensitive credentials. While source code repository hosts like GitHub and GitLab already scan public source code (and, in some cases, package repositories) for exposed credentials, the challenge extends significantly beyond source code.

Built packages and Docker images often include configuration, compiled binaries, and build scripts, all potential sources of leaked credentials. Publishing these artifacts on open-source repositories like Maven Central, PyPI, or DockerHub can expose leaked credentials to exploitation by any individual on the internet. The ease and speed with which open-source artifacts are shared and distributed magnifies the potential damage, making strong credential management and proactive leak detection and remediation critical.

How to scan open source code for credentials

The deps.dev team provides services to help developers better understand the structure, construction, and security of open-source software. The team maintains and analyzes a continuously updated corpus of over 5 billion unique files, across hundreds of millions of open-source software artifacts like source code repositories, software packages and Docker containers.

The pipeline to support this corpus automatically ingests hundreds of millions of public artifacts from a variety of open source repositories. These include package managers (such as npm, Maven Central, PyPI,) source code repository hosts (such as GitHub and GitLab) and Docker images.

Once artifacts are ingested, they undergo a comprehensive decomposition process, which extracts all constituent parts: every file at every commit in a Git repository, every unarchived or unzipped file in a software package, and every file in every individual layer of a Docker image — not just the files in the final image filesystem. These files are then analyzed which includes scanning them for exposed Google Cloud credentials.

When a suspected Google Cloud credential is detected, the credential reporting backend immediately alerts the credential protection program. Since its creation, we’ve observed this system detect and remediate leaked credentials in minutes of their publication, matching or exceeding the speed with which malicious actors have been demonstrated to exploit them.

Credential containment and recovery

We’ve set up a web endpoint so vetted Google Cloud users and security researchers can submit suspected exposed credentials for review.Once a submitter’s identity is validated, the Google Cloud credential protection system proceeds to confirm the validity of the reported credentials. If the credential is confirmed to be active, Google Cloud provides immediate customer notification through multiple channels, including email, telemetry logs, and in-product alerts.

Google Cloud may take automated remediation steps to mitigate potential damage in accordance with customer configurable policy, such as disabling affected service account keys.

What’s next?

We are actively working to further secure open source communities and protect Google Cloud customers alike by taking a proactive approach to credential exposure. Our efforts in this area include several key initiatives:

Broadening the scope of credential scanning: We’re expanding the range of credential types the tool can scan for, which can help protect more organizations and developers.
Increasing open-source coverage: We’re scanning more open-source platforms and repositories to discover exposed credentials, which can help mitigate risks across more of the ecosystem.
Empowering open-source communities with preventative measures: We’re developing and offering tools that allow open-source communities to integrate credential exposure checks directly into their publish workflow, which can help prevent credential leaks before they happen.

By focusing on both detection and prevention, we aim to foster a more secure and resilient open source environment. To report exposed Google Cloud credentials, please contact gcp-credentials-reports@google.com. If you are a credential provider and would like to talk about partnering with us to scan for your credentials, please contact depsdev@google.com.

Read More for the details.

2025 06 16

GCP – Save early and often with multi-tier checkpointing to optimize large AI training jobs

Tibor Kiss Cloud, Google Cloud gcp

As foundation model training infrastructure scales to tens of thousands of accelerators, efficient utilization of those high-value resources becomes paramount. In particular, as the cluster gets larger, hardware failures become more frequent (~ few hours) and recovery from previously saved checkpoints becomes slower (up to 30 minutes), significantly slowing down training progress. A checkpoint represents the saved state of a model’s training progress at any given time and consists of a set of intermediary model weights and other parameters.

We recently introduced multi-tier checkpointing in AI Hypercomputer, our integrated supercomputing system that incorporates lessons from more than a decade of Google’s expertise in AI. This solution increases the ML Goodput of large training jobs (e.g. by 6.59% in a 35K-chip workload on TPU v5p) by utilizing multiple tiers of storage, including in-cluster memory (RAM) and replication, and Google Cloud Storage, thereby minimizing lost progress during a training job and improving mean-time-to-recovery (MTTR). This solution is compatible with JAX using MaxText as a reference architecture as well as NeMo with PyTorch / GPUs.

Multi-tier checkpointing architecture: checkpoints are stored in (1) each node’s RAM, (2) in a different slice or superblock, and (3) in Cloud Storage.

What this means is that you can take a checkpoint at the most optimal frequency (the checkpoint save scales sub-linearly to < 5 minutes) for the biggest models and across a very large node cluster and restore in under a minute across a cluster with thousands of nodes.

Increases in Goodput can translate directly to decreases in infrastructure costs. For example, consider the case where you are using accelerator chips to train a model that takes one month to complete. Even with a somewhat smaller training workload, the cost savings with optimal checkpointing can be significant. If you have a week-long training job spanning 1K VMs that cost $88/hour (a3-highgpu-8g), a 6.5% increase in Goodput on this training task could result in almost $1M in infrastructure savings.

More failures require more checkpointing

Probabilistically, the mean time between failure (MTBF) of a training job decreases — failures happen more frequently — as the size of the cluster increases. Therefore, it is important that foundation model producers take checkpoints more frequently so they don’t lose too much progress on their training job. In the past, Google Kubernetes Engine (GKE) customers could only write a checkpoint every 30 minutes (saving it to Cloud Storage) and had to wait up to 30 minutes to read the last saved checkpoint and distribute it to all the nodes in the cluster.

Multi-tier checkpointing allows for much faster checkpoint writes and more frequent saves by writing data asynchronously to memory (RAM) on the node and then periodically replicating this data inside the cluster, and backing that data up to Cloud Storage. In the event of a failure, a job’s progress can be recovered quickly by using data from a nearby neighbor’s in-memory checkpoint. If the checkpoint data isn’t available in a nearby node’s RAM, checkpoints are downloaded from Cloud Storage bucket backups. With this solution, checkpoint write latency does not increase with the number of nodes in a cluster — it remains constant. Reads are also constant and scale independently, enabling faster checkpoint loading and reducing MTTR.

Architectural details

Conceptually, the multi-tier checkpointing solution provides a single “magic” local filesystem volume for ML training jobs to use for saving checkpoints and from which to restore. It’s “magic” because while it provides ramdisk-level read/write speeds, it also provides data durability associated with Cloud Storage.

When enabled, local volume (Node storage) is the only storage tier visible to ML jobs. The checkpoints written there are automatically replicated in-cluster to one/two/or more peer nodes and are regularly backed up to Cloud Storage.

When the job restarts, the checkpoint data specific for the new portion of the training job running on the node (i.e., NodeRank) automatically appears on the local volume for ML jobs to use. Behind the scenes, the necessary data may be fetched from another node in the cluster, or from Cloud Storage. Finding the most recent fully written checkpoint (no matter where it is) also happens transparently for ML jobs.

The component responsible for data movement across tiers is called Replicator and is running on every Node as a part of a CSI driver that provides local volume mount.

Delving deeper, the Replicator performs the following critical functions:

Centralized intelligence: It analyzes Cloud Storage backups and the collective in-cluster data to determine the most recent, complete checkpoint with which to restore a job upon restart. Furthermore, it detects successful checkpoint saves by all nodes, signaling when older data can be safely garbage-collected, and strategically decides which checkpoints to back up to Cloud Storage.
Smart peer selection: Because it’s aware of the underlying network topology used by both TPUs and GPUs, the Replicator employs smart criteria to select replication peers for each node. This involves prioritizing a “near” peer with high bandwidth and low latency. This “near” peer may have a potentially higher risk of correlated failure (e.g., within the same TPU Slice or GPU Superblock) and as such, it also selects a “far” peer — one with slightly increased networking overhead but enhanced resilience to independent failures (e.g., that resides in a different GPU Superblock). In data parallelism scenarios, preference is given to any peers that possess identical data.
Automatic data deduplication: When data parallelism is employed, multiple nodes run identical training pipelines, resulting in the saving of identical checkpoints. The Replicator’s peer selection ensures these nodes are paired, eliminating the need for actual data replication. Instead, each node verifies the data integrity of its peers; no additional bandwidth is consumed, replication is instantaneous, and local storage usage is significantly reduced. If peers are misconfigured, standard checkpoint copying is maintained.
Huge-model mode with data parallelism assumption: Beyond optimization, this mode caters to the largest models, where local node storage is insufficient to house both a node’s own checkpoint as well as a peer’s data. In such cases, the ML job configures the Replicator to assume data parallelism, drastically reducing local storage requirements. This extends to scenarios where dedicated nodes handle Cloud Storage backups rather than the nodes storing the most recent checkpoints themselves.
Optimized Cloud Storage utilization: Leveraging data deduplication, all unique data is stored in Cloud Storage only once, optimizing storage space, bandwidth consumption, and associated costs.
Automated garbage collection: The Replicator continuously monitors checkpoint saves across all nodes. Once the latest checkpoint is confirmed to have been successfully saved everywhere, it automatically initiates the deletion of older checkpoints, while ensuring that checkpoints still being backed up to Cloud Storage are retained until the process is complete.

A wide range of checkpointing solutions

At Google Cloud, we offer a comprehensive portfolio of checkpointing solutions to meet diverse AI training needs. Options like direct Cloud Storage and Cloud Storage FUSE are simpler approaches and serve smaller to medium-scale workloads very effectively. Parallel file systems such as Lustre offer high throughput for large clusters, while multi-tier checkpointing is purpose-built for the most demanding, highest-scale (>1K nodes) training jobs that require very frequent saves and rapid recovery.

Multi-tier checkpointing is currently in preview, focused on JAX for Cloud TPUs and PyTorch on GPUs. Get started with it today by following our user guide, and don’t hesitate to reach out to your account team if you have any questions or feedback.

Read More for the details.

2025 06 13

GCP – How to benchmark and scale your Google Cloud Managed Service for Kafka deployment

Tibor Kiss Cloud, Google Cloud gcp

Businesses that rely on real-time data for decision-making and application development need a robust and scalable streaming platform, and Apache Kafka has emerged as the leading solution.

At its core, Kafka is a distributed streaming platform that allows applications to publish and subscribe to streams of records, much like a message queue or enterprise messaging system, and goes beyond traditional messaging with features like high throughput, persistent storage, and real-time processing capabilities. However, deploying, managing, and scaling Kafka clusters can be challenging. This is what Google Cloud’s Managed Service for Apache Kafka solves. This managed Kafka service is open-source compatible and portable, easy to operate, and secure, allowing you to focus on building and deploying streaming applications without worrying about infrastructure management, software upgrades, or scaling. It’s also integrated for optimal performance with other Google Cloud data services such as BigQuery, Cloud Storage and Dataflow.

While Apache Kafka offers immense power, achieving optimal performance isn’t automatic. It requires careful tuning and benchmarking. This post provides a hands-on guide to optimize your deployments for throughput and latency.

Note: We assume a high-level understanding of Apache Kafka and BASH scripting. For an introduction and overview of Apache Kafka, visit the Apache Software Foundation website. For an introduction to BASH, please visit this Geeks for Geeks tutorial.

aside_block: <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3e29765f6280>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>

Benchmarking Kafka producers, consumers and latencies

Benchmarking your Kafka deployment is crucial for understanding its performance characteristics and ensuring it can serve your application’s requirements. This involves a deep dive into metrics like throughput and latency, along with systematic experimentation by optimizing your producer and consumer configurations. It’s important to note that this is done at a topic / application level and should be replicated for each topic.

Optimizing for throughput and latency

The Apache Kafka bundle includes two utilities – kafka-producer-perf-test.sh and kafka-consumer-perf-test.sh – to assess producer and consumer performance as well as latencies.

Note: while we are using some config values in order to demonstrate tool usage, it’s recommended that you use configurations (e.g. message size, message rates, etc…) that mirror your workloads.

kafka-producer-perf-test

This tool simulates producer behavior by sending a specified number of messages to a topic while measuring throughput and latencies, and takes the following flags:

topic (required): Specifies the target Kafka topic
num-records (required): Sets the total number of messages to send
record-size (required): Defines the size of each message in bytes
throughput (required): Sets a target throughput in messages per second (use -1 to disable throttling)
producer-props:

bootstrap.servers (required): Comma-separated list of Kafka bootstrap server or broker addresses.
acks (optional): Controls the level of acknowledgment required from brokers (0, 1, or all). 0 for no broker, 1 for leader broker and ‘all’ for all brokers. The default value is ‘all’.
batch.size (optional): The maximum size of a batch of messages in bytes. The default value is 16KB.
linger.ms (optional): The maximum time to wait for a batch to fill before sending. The default value is 0 ms.
Compression.type (optional): any one of: none, gzip, snappy, lz4, zstd. The default value is none.

Sample code block #1: Kafka producer performance test

code_block: <ListValue: [StructValue([(‘code’, ‘./kafka-producer-perf-test.sh \rn–topic <topic_name> \rn–num-records 5000000 \rn–record-size 1024 \rn–throughput -1 \rn–producer-props bootstrap.servers=<bootstrap_servers> \rnacks=1 \rnbatch.size=10000 \rnlinger.ms=10 \rncompression.type=<compression_type>’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e29765f6520>)])]>

Important considerations

The most crucial properties are acks, batch.size, linger.ms, and compression because they directly influence producer throughput and latency. While exact settings depend on your application, we suggest these baseline configurations:

acks: acks=1 requires acknowledgement from the leader broker only. This will give the best performance unless you need acks from all the leaders and followers.
batch.size: 10000B, or 10 KB, is a good baseline value to start with. Increasing the batch size allows producers to send more messages in a single request, reducing overhead.
linger.ms: 10ms is a good value as a baseline. You can try within a range of 0-50ms. Increasing linger time further can result in increased latencies.
compression: The recommendation is to use compression to further increase your throughput and reduce latencies.

kafka-consumer-perf-test

This tool simulates consumer behavior by fetching messages from a Kafka topic and measuring the achieved throughput and latencies. Key properties include:

topic (required): Specifies the Kafka topic to consume from.
bootstrap-server (required): Comma-separated list of kafka bootstrap server or broker addresses.
messages (required): The total number of messages to consume.
group (optional): The consumer group ID.
fetch-size (optional): The maximum amount of data to fetch in a single request. The default value is 1048576 bytes or 1.04MB.

Sample code block #2: Kafka consumer test

code_block: <ListValue: [StructValue([(‘code’, ‘./kafka-consumer-perf-test.sh \rn–topic <topic_name> \rn–bootstrap-server <bootstrap_servers> \rn–messages 1000000 \rn–group <consumer_group>\rn–fetch-size 10000000’), (‘language’, ”), (‘caption’, <wagtail.rich_text.RichText object at 0x3e29765f6700>)])]>

Important considerations

To achieve optimal consumer throughput, the fetch-size property is crucial for tuning. The default fetch-size configuration is largely determined by your consumption and throughput needs, and can range from up to 1MB for smaller messages to 1-50MB for larger ones. It’s advisable to analyze the effects of different fetch sizes on both application responsiveness and throughput. By carefully documenting these tests and examining the resulting information, you can pinpoint performance limitations and refine your settings accordingly.

How to benchmark throughput and latencies

Benchmarking the producer

When conducting tests to measure the throughput and latencies of Kafka producers, the key parameters are batch.size, or the maximum size of a batch of messages, and linger.ms, the maximum time to wait for a batch to fill before sending. For the purposes of this benchmark, we suggest keeping acks at 1 (acknowledgment from the leader broker) to balance durability and performance. This helps us to estimate the expected throughput and latencies for a producer. Note that message size is kept constant as 1KB.

Throughput (messages/s)	Throughput (MBs)	Latency(ms)	ack(=1)	batch_size	linger_ms
48049	45	608	Leader	1KB	10
160694	153	171	Leader	10KB	10
117187	111	268	Leader	100KB	10
111524	106	283	Leader	100KB	100

Analysis and findings

The impact of batch size: As expected, increasing batch size generally leads to higher throughput (messages/s and MBs). We see a significant jump in throughput as we move from 1KB to 10KB batch sizes. However, further increasing the batch size to 100KB does not show a significant improvement in throughput. This suggests that an optimal batch size exists beyond which further increases may not yield substantial throughput gains.
Impact of linger time: Increasing the linger time from 10ms to 100ms with a 100KB batch size slightly reduced throughput (from 117,187 to 111,524 messages/s). This indicates that, in this scenario, a longer linger time might not be much beneficial for maximizing throughput.
Latency considerations: Latency tends to increase with larger batch sizes. This is because messages wait longer to be included in a larger batch before being sent. This is clearly visible when batch_size is increased from 10KB to 100KB.

Together, these findings highlight the importance of careful tuning when configuring Kafka producers. Finding the optimal balance between batch.size and linger.ms is crucial for achieving desired throughput and latency goals.

Benchmarking for consumer

To assess consumer performance, we conducted a series of experiments using kafka-consumer-perf-test, systematically varying the fetch size.

Throughput(messages/sec)	Throughput(MBs)	fetch-size
2825	2.6951	10KB
3645	3.477	100KB
18086	17.8	1MB
49048	46	10MB
61334	58	100MB
62562	60	500MB

Analysis and findings

Impact of fetch size on throughput: The results clearly demonstrate a strong correlation between fetch.size and consumer throughput. As we increase the fetch size, both message throughput (messages/s) and data throughput (MBs) improve significantly. This is because larger fetch sizes allow the consumer to retrieve more messages in a single request, reducing the overhead of frequent requests and improving data transfer efficiency.
Diminishing returns: While increasing fetch.size generally improves throughput, we observe diminishing returns as we move beyond 100MB. The difference in throughput between 100MB and 500MB is not significant, suggesting that there’s a point where further increasing the fetch size provides minimal additional benefit.

Scaling the Google Managed Service for Apache Kafka

Based on some more experiments, we explored optimal configurations for the managed Kafka cluster. Please note that for this exercise, we kept message size as 1KB, batch size as 10KB, the topic has 1000 partitions, and the replication number is 3. The results were as follows.

Producer threads	cluster_bytes_in_count (MBs)	CPU Util	Memory Util	vCPU	Memory
1	56	98%	58%	3	12gb
1	61	24%	41%	12	48gb
2	104	56%	57%	12	48gb
4	199	64%	60%	12	48gb

Scaling your managed Kafka cluster effectively is crucial to ensure optimal performance as your requirements grow. To determine the right cluster configuration, we conducted experiments with varying numbers of producer threads, vCPUs, and memory. Our findings indicate that vertical scaling, by increasing vCPUs and memory from 3 vCPUs/12GB to 12 vCPUs/48GB, significantly improved resource utilization. With two producer threads, the cluster’s byte_in_count metric doubled and CPU utilization increased to 56% from 24%. Your throughput requirements play a vital role. With 12 vCPUs/48GB, moving from 2 to 4 producer threads nearly doubled the cluster’s bytes_in_count. You also need to monitor resource utilization to avoid bottlenecks, as increasing throughput can increase CPU and memory utilization. Ultimately, optimizing managed Kafka service performance requires a careful balance between vertical scaling of the cluster and your throughput requirements, tailored to your specific workload and resource constraints.

Build the Kafka cluster you need

In conclusion, optimizing your Google Cloud Managed Service for Apache Kafka deployment involves a thorough understanding of producer and consumer behavior, careful benchmarking, and strategic scaling. By actively monitoring resource utilization and adjusting your configurations based on your specific workload demands, you can ensure your managed Kafka clusters deliver the high throughput and low latency required for your real-time data streaming applications.

Interested in diving deeper? Explore the resources and documentations linked below:

Apache Kafka

Google Cloud Managed Service for Apache Kafka

Read More for the details.

2025 06 13

GCP – How good is your AI? Gen AI evaluation at every stage, explained

Tibor Kiss Cloud, Google Cloud gcp

As AI moves from promising experiments to landing core business impact, the most critical question is no longer “What can it do?” but “How well does it do it?”.

Ensuring the quality, reliability, and safety of your AI applications is a strategic imperative. To guide you, evaluation must be your North Star—a constant process that validates your direction throughout the entire development lifecycle. From crafting the perfect prompt and choosing the right model to deciding if tuning is worthwhile and evaluating your agents, robust evaluation provides the answers.

One year ago, we launched the Gen AI evaluation service, offering capabilities to evaluate various models including Google’s foundation models, open models, proprietary foundation models, and customized models. It provided online evaluation modes with pointwise and pairwise criteria, utilizing computation and Autorater methods.

Since then, we’ve listened closely to your feedback and focused on addressing your most important needs. That’s why today we’re excited to dive into the new features of the Gen AI Evaluation Service, designed to help you scale your evaluations, evaluate your autorater, customize your autorater with rubrics and evaluate your agents in production.

1 Framework to evaluate your generative AI — Framework to evaluate your generative AI

1. Scale your valuation with Gen AI batch evaluation

One of the most pressing questions for AI developers is, “How can I run evaluation at scale”? Previously, scaling evaluations could be heavy-engineered, hard to maintain, and expensive. You have to build your own batch evaluation processes combining multiple Google Cloud services.

The new batch evaluation feature simplifies this process, providing a single API for large datasets. This means you can evaluate large volumes of data efficiently, supporting all methods and metrics available in the Gen AI evaluation service in Vertex AI. It’s designed to be cheaper and more efficient than previous approaches.

You can learn more about how to run batch evaluation with the Gemini API in Vertex AI in this tutorial.

2. Scrutinize your autorater and build trust

A common and critical concern we hear from developers is, “How can I customize and truly evaluate my autorater?” While using an LLM to assess an LLM-based application offers scale and efficiency, it also introduces valid questions about its limitations, robustness, and potential biases. The fundamental challenge is building trust in its results.

We believe that trust isn’t given; it’s built through transparency and control. Our features are designed to empower you to rigorously scrutinize and refine your autorater. This is achieved through two key capabilities:

First, you can evaluate your autorater’s quality. By creating a benchmark dataset of human-rated examples, you can directly compare the autorater’s judgments against your “source of truth.” This allows you to calibrate its performance, measure its alignment with you, and gain a clear understanding of areas that need improvement.
Second, you can actively improve its alignment. We provide several approaches to customize your autorater’s behavior. You can refine the autorater’s prompt with specific criteria, chain-of-thought reasoning, and detailed scoring guidelines. Furthermore, advanced settings and the ability to bring and tune the autorater with your own reference data ensures it meets your specific needs and is able to capture unique use cases.

Here is an example of analysis you can build with the new autorater customization features.

Check out the Advanced judge model customization series in the official documentation to learn more about how to evaluate and configure the judge model. For a practical example, here is a tutorial on how to customize your evaluations using an open autorater with Vertex AI Gen AI Evaluation.

3. Rubrics-driven evaluation

Evaluating complex AI applications can sometimes present a frustrating challenge: how can you use a fixed set of criteria when every input is different? A generic list of evaluation criteria often fails to capture the nuance of a complex multimodal use case, such as image understanding.

To solve this, our rubrics-driven evaluation feature breaks the evaluation experience into a two-step approach.

Step 1 – Rubric generation: First, instead of asking users to provide a static list of criteria, the system acts like a tailored test-maker. For each individual data point in your evaluation set, it automatically generates a unique set of rubrics—specific, measurable criteria adapted to that entry’s content. You can review and customize these tests, if needed.
Step 2 – Targeted autorating: Next, the autorater uses these custom-generated rubrics to assess the AI’s response. This is like a teacher writing unique questions for each student’s essay based on its specific topic, rather than using the same generic questions for the whole class.

This process ensures that every evaluation is contextual and insightful. It enhances interpretability by tying every score to criteria that are directly relevant to the specific task, giving you a far more accurate measure of your model’s true performance.

Here, you can see an example of the rubric-driven pairwise evaluation you will be able to produce with Gen AI evaluation service on Vertex AI.

Check out these examples of running rubric-based evaluation for instruction-following, multimodal, and text quality. Also, we have worked with our research team to implement rubrics-based autorater for text- to-image and text-to-video.

4. Agent evaluation

We are at the beginning of the agentic era, where agents reason, plan, and use tools to accomplish complex tasks. However, evaluating these agents presents a unique challenge. It’s no longer sufficient to just assess the final response; we need to validate the entire decision-making process. “Did the agent choose the right tool?”, “Did it follow a logical sequence of steps?”, “Did it effectively store and use information to provide personalized answers?”. These are some of the critical questions that determine an agent’s reliability.

To address some of these challenges, the Gen AI evaluation service in Vertex AI introduces capabilities specifically for agent evaluation. You can evaluate not only the agent’s final output but also gain insights into its “trajectory”—the sequence of actions and tool calls it makes. With specialized metrics for trajectory, you can assess your agent’s reasoning path. Whether you’re building with Agent Development Kit, LangGraph, CrewAI, or other frameworks, and hosting them locally or on Vertex AI Agent Engine, you can analyze if the agent’s actions were logical and if the right tools were used at the right time. All results are integrated with Vertex AI Experiments, providing a robust system to track, compare, and visualize performance, enabling you to build more reliable and effective AI agents.

Here you can find a detailed documentation with several examples of agent evaluation with Gen AI evaluation service on Vertex AI.

Finally, we recognize that evaluation remains a research frontier. We believe that collaborative efforts are key to addressing current challenges. Therefore, we are actively working with companies like Weights & Biases, Arize, and Maxim AI. Together, we aim to find solutions for open challenges such as the cold-start data problem, multi-agent evaluation, and real-world agent simulation for validation.

Get started today

Ready to build reliable LLMs applications ready for production on Vertex AI? The Gen AI evaluation service in Vertex AI addresses the most requested features from users, providing a powerful, comprehensive suite for evaluating your AI application. By enabling you to scale evaluations, build trust in your autorater, and assess multimodal and agentic use cases, we want to foster confidence and efficiency, ensuring your LLM-based applications perform as expected in production.

Check the comprehensive documentation and code examples for the Gen AI evaluation service.

Read More for the details.

2025 06 12

GCP – What’s new with Google Data Cloud

Tibor Kiss Cloud, Google Cloud gcp

June 9 – June 13

Introducing Pub/Sub Single Message Transforms (SMTs), to make it easy to perform simple data transformations such as validate, filter, enrich, and alter individual messages as they move in real time right within Pub/Sub. The first SMT is available now: JavaScript User-Defined Functions (UDFs), which allows you to perform simple, lightweight modifications to message attributes and/or the data directly within Pub/Sub via snippets of JavaScript code. Learn more in the launch blog.
Serverless Spark is now generally available directly within BigQuery. Formerly Dataproc Serverless, the fully managed Google Cloud Serverless for Apache Spark helps to reduce TCO, provides strong performance with the new Lightning Engine, integrates and leverages AI, and is enterprise-ready. And by bringing Apache Spark directly into BigQuery, you can now develop, run and deploy Spark code interactively in BigQuery Studio. Read all about it here.
Next-Gen data pipelines: Airflow 3 arrives on Google Cloud Composer: Google is the first hyperscaler to provide selected customers with access to Apache Airflow 3, integrated into our fully managed Cloud Composer 3 service. This is a significant step forward, allowing data teams to explore the next generation of workflow orchestration within a robust Google Cloud environment. Airflow 3 introduces powerful capabilities, including DAG versioning for enhanced auditability, scheduler-managed backfills for simpler historical data reprocessing, a modern React-based UI for more efficient operations, and many more features.

June 2 – June 6

Enhancing BigQuery workload management: BigQuery workload management provides comprehensive control mechanisms to optimize workloads and resource allocation, preventing performance issues and resource contention, especially in high-volume environments. To make it even more useful, we announced several updates to BigQuery workload management around reservation fairness, predictability, flexibility and “securability,” new reservation labels, as well as autoscaler improvements. Get all the details here.
Bigtable Spark connector is now GA: The latest version of the Bigtable Spark connector opens up a world of possibilities for Bigtable and Apache Spark applications, not least of which is additional support for Bigtable and Apache Iceberg, the open table format for large analytical datasets. Learn how to use the Bigtable Spark connector to interact with data stored in Bigtable from Apache Spark, and delve into powerful use cases that leverage Apache Iceberg in this post.
BigQuery gets transactional: Over the years, we’ve added several capabilities to BigQuery to bring near-real-time, transactional-style operations directly into your data warehouse, so you can handle common data management tasks more efficiently from within the BigQuery ecosystem. In this blog post, you can learn about three of them: efficient fine-grained DML mutations; change history support for updates and deletes; and real-time updates with DML over streaming data.
Google Cloud databases integrate with MCP: We announced capabilities in MCP Toolbox for Databases (Toolbox) to make it easier to connect databases to AI assistants in your IDE. MCP Toolbox supports BigQuery, AlloyDB (including AlloyDB Omni), Cloud SQL for MySQL, Cloud SQL for PostgreSQL, Cloud SQL for SQL Server, Spanner, self-managed open-source databases including PostgreSQL, MySQL and SQLLite, as well as databases from other growing list of vendors including Neo4j, Dgraph, and more. Get all the details here.

Read More for the details.

2025 06 12

GCP – Cloud CISO Perspectives: How Google secures AI Agents

Tibor Kiss Cloud, Google Cloud gcp

Welcome to the first Cloud CISO Perspectives for June 2025. Today, Anton Chuvakin, security advisor for Google Cloud’s Office of the CISO, discusses a new Google report on securing AI agents, and the new security paradigm they demand.

As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.

aside_block: <ListValue: [StructValue([(‘title’, ‘Get vital board insights with Google Cloud’), (‘body’, <wagtail.rich_text.RichText object at 0x3e598bb2a580>), (‘btn_text’, ‘Visit the hub’), (‘href’, ‘https://cloud.google.com/solutions/security/board-of-directors?utm_source=cloud_sfdc&utm_medium=email&utm_campaign=FY24-Q2-global-PROD941-physicalevent-er-CEG_Boardroom_Summit&utm_content=-&utm_term=-‘), (‘image’, <GAEImage: GCAT-replacement-logo-A>)])]>

How Google secures AI Agents

By Anton Chuvakin, security advisor, Office of the CISO

Anton_Chuvakin_Headshot_18L8044_SQ1_Hi_Res — Anton Chuvakin, security advisor, Office of the CISO

The emergence of AI agents promises to reshape our interactions with information systems — and ultimately with the real world, too. These systems, distinct from the foundation models they’re built on, possess the unique ability to act on information they’ve been given to achieve user-defined goals. However, this newfound capability introduces a critical challenge: agent security.

Agents strive to be more autonomous. They can take information and use it in conjunction with tools to devise and execute complex plans, so it’s critical that developers align agent behavior with user intent to prevent unintended and harmful actions.

With this great power comes a great responsibility for agent developers. To help mitigate the potential risks posed by rogue agent actions, we should invest in a new field of study focused specifically on securing agent systems.

While there are similarities to securing AI, securing AI agents is distinct and evolving, and demands a new security paradigm.

Google advocates for a hybrid defense-in-depth approach that combines the strengths of both traditional (deterministic) and reasoning-based (dynamic) security measures. This creates layered defenses that can help prevent catastrophic outcomes while preserving agent usefulness.

To help detail what we believe are the core issues, we’ve published a comprehensive guide covering our approach to securing AI agents that addresses concerns for both AI agent developers and security practitioners. Our goal is to provide a clear and actionable foundation for building secure and trustworthy AI agent systems that benefit society.

We cover the security challenges of agent architecture, the specific risks of rogue actions and sensitive data disclosure, and detail the three fundamental agent security principles: well-defined human controllers, limited agent powers, and observable agent actions.

Agents must have well-defined human controllers: Agents must operate under clear human oversight, with the ability to distinguish authorized user instructions from other inputs.
Agent powers must have limitations: Agent actions and resource access must be carefully limited and dynamically aligned with their intended purpose and user risk tolerance. This emphasizes the least-privilege principle.
Agent actions and planning must be observable: Agent activities must be transparent and auditable through robust logging and clear action characterization.

Google’s hybrid approach: Agentic defense-in-depth

We believe that the most effective and efficient defense-in-depth path forward secures agents with both classic and AI controls. Our approach advocates for two distinct layers:

Layer 1: Use traditional, deterministic measures, such as runtime policy enforcement. Runtime policy engines act as external guardrails, monitoring and controlling agent actions before execution based on predefined rules. These engines use action manifests to capture the security properties of agent actions, such as dependency types, effects, authentication, and data types.
Layer 2: Deploy reasoning-based defense strategies. This layer uses the AI model’s own reasoning to enhance security. Techniques such as adversarial training and using specialized models as security analysts can help the agent distinguish legitimate commands from malicious ones, making it more resilient against attacks, data theft, and even model theft.

Of course, each of the above two layers should have their own layers of defense. For example, model-based input filtering coupled with adversarial training and other techniques can help reduce the risk of prompt injection, but not completely eliminate it. Similarly, these defense measures would make data theft more difficult, but would also need to be enhanced by traditional controls such as rule-based and algorithmic threat detection.

Key risks, limitations, and challenges

Traditional security paradigms, designed for static software or general AI, are insufficient for AI agents. They often lack the contextual awareness needed to know what the agent is reasoning about and can overly restrict an agent’s utility.

Similarly, relying solely on a model’s judgment for security is also inadequate because of the risk posed by vulnerabilities such as prompt injection, which can compromise the integrity and functionality of an agent over time.

In the wide universe of risks to AI, two risks associated with AI agents stand out from the crowd by being both more likely to manifest and more damaging if ignored.

Rogue actions are unintended, harmful, and policy-violating behaviors an agent might exhibit. They can stem from several factors, including the stochastic nature of underlying models, the emergence of unexpected behaviors, and challenges in aligning agent actions with user intent. Prompt injections are a significant vector for inducing rogue actions.

For example, imagine an agent designed to automate tasks in a cloud environment. A user intends to use the agent to deploy a virtual machine. However, due to a prompt injection attack, the agent instead attempts to delete all databases. A runtime policy engine, acting as a guardrail, would detect the “delete all databases” action (from its action manifest) and block it because it violates predefined rules.

Sensitive data disclosure involves the unauthorized revelation of private or confidential information by agents. Security measures would help ensure that access to sensitive data is strictly controlled.

For example, an agent in the cloud might have access to customer data to generate reports. If not secured, the agent might retain this sensitive data and then be coaxed to expose it. A malicious user could then ask a follow-up question that triggers the agent to inadvertently disclose some of that retained data.

However, securing AI agents is inherently challenging due to four factors:

Unpredictability (non-deterministic nature)
Emergent behaviors
Autonomy in decision-making
Alignment issues (ensuring actions match user intent)

Practical security considerations

Our recommended hybrid approach addresses several critical areas.

Agent/plugin user controls: Emphasizes human confirmation for critical and irreversible actions, clear distinction between user input and other data, and verifiable sharing of agent configurations.
Agent permissions: Adherence to the least-privilege principle, confining agent actions to its domain, limiting permissions, and allowing for user authority revocation. This level of granular control often surprises security leaders because such a traditional 1980s-style security control delivers high value for securing 2020s AI agents.
Orchestration and tool calls: The intricate relationship between AI agents and external tools and services they use for orchestration can present unique security risks, especially with “Actions as Code.” Robust authentication, authorization, and semantic tool definitions are crucial risk mitigations here.
Agent memory: Data stored in an agent’s memory can lead to persistent prompt injections and information leakage.
Response rendering: Safely rendering AI agent outputs into user-readable content is vital to prevent classic web vulnerabilities.

Assurance and future directions

Continuous assurance efforts are essential to validate agent security. This includes regression testing, variant analysis, red teaming, user feedback, and external research programs to ensure security measures remain effective against evolving threats.

Securing AI agents requires a multi-faceted, hybrid approach that carefully balances the utility of these systems with the imperative to mitigate their inherent risks. Google Cloud offers controls in Agentspace that follow these guidelines, such as authentication and authorization, model safeguards, posture assessment, and of course logging and detection.

To learn more about how Google is approaching securing AI agents, please read our research paper.

aside_block: <ListValue: [StructValue([(‘title’, ‘Join the Google Cloud CISO Community’), (‘body’, <wagtail.rich_text.RichText object at 0x3e59a87e8b20>), (‘btn_text’, ‘Learn more’), (‘href’, ‘https://rsvp.withgoogle.com/events/ciso-community-interest?utm_source=cgc-blog&utm_medium=blog&utm_campaign=2024-cloud-ciso-newsletter-events-ref&utm_content=-&utm_term=-‘), (‘image’, <GAEImage: GCAT-replacement-logo-A>)])]>

In case you missed it

Here are the latest updates, products, services, and resources from our security teams so far this month:

Project Shield blocked a massive recent DDoS attack. Here’s how: Project Shield, Google’s free service that protects at-risk sites against DDoS attacks, kept KrebsOnSecurity up during a recent, massive one. Here’s what happened. Read more.
Don’t test in prod. Use digital twins for safer, smarter resilience: Digital twins are replicas of physical systems using real-time data to create a safe test environment. Here’s how they can help business and security leaders. Read more.
How to build a digital twin with Google Cloud: Digital twins are essentially IT stunt doubles, cloud-based replicas of physical systems for testing. Learn how to build them on Google Cloud. Read mo11re.
Enhancing protection: 4 new Security Command Center capabilities: Security Command Center has a unique vantage point to protect Google Cloud environments. Here are four new SCC capabilities. Read more.

Please visit the Google Cloud blog for more security stories published this month.

aside_block: <ListValue: [StructValue([(‘title’, ‘Tell us what you think’), (‘body’, <wagtail.rich_text.RichText object at 0x3e59a87e8df0>), (‘btn_text’, ‘Vote now’), (‘href’, ‘https://www.linkedin.com/feed/update/urn:li:activity:7338626074882240512’), (‘image’, <GAEImage: GCAT-replacement-logo-A>)])]>

Threat Intelligence news

The cost of a call, from voice phishing to data extortion: Google Threat Intelligence Group (GTIG) is tracking threat actors who specialize in voice phishing (vishing) campaigns designed to compromise Salesforce instances for large-scale data theft and subsequent extortion. Here’s several defensive measures you can take. Read more.
A technical analysis of vishing threats: Financially motivated threat actors have increasingly adopted voice-based social engineering, or “vishing,” as a primary vector for initial access, though their specific methods and end goals can vary significantly. Here’s how they do it — and what you can do to stop them. Read more.

Please visit the Google Cloud blog for more threat intelligence stories published this month.

Now hear this: Podcasts from Google Cloud

Debunking cloud breach myths (and what DBIR says now): Everything (and we mean everything) you wanted to know about cloud breaches, but were (legitimately, of course) afraid to ask. Verizon Data Breach Report lead Alex Pinto joins hosts Anton Chuvakin and Tim Peacock for a lively chat on breaching clouds. Listen here.
Is SIEM in 2025 still too hard: Alan Braithwaite, co-founder and CTO, RunReveal, discusses the future of SIEM and security telemetry data with Anton and Tim. Listen here.
Cyber-Savvy Boardroom: Jamie Collier on today’s threat landscape: Jamie Collier, lead Europe advisor, GTIG, joins Office of the CISO’s David Homovich and Anton Chuvakin to talk about what boards need to know about today’s threat actors. Listen here.
Defender’s Advantage: Confronting a North Korean IT worker incident: Mandiant Consulting’s Nick Guttilla and Emily Astranova join Luke McNamara for an episode on the AI-driven use of voice-based phishing, or “vishing,” and how they use it during red team engagements. Listen here.
Behind the Binary: Protecting software intellectual property: Tim Blazytko, chief scientist and head of engineering, Emproof, talks with host Josh Stroschein about the essential strategies for protecting software intellectual property. Listen here.

To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.

Read More for the details.

Google Cloud

Enhanced vectorization: next-level query execution

Enhanced vectorization in action

What’s next

What makes cloud backup resilient?

5 signs your backup posture may be at risk

Best practices for data protection

1. Versioning and retention: first lines of defense

2. Monitor for gaps in coverage

3. Design for granular recovery

Automating the complexity away

From backups to business insights

What a “mature” backup posture looks like

Ready to simplify backup?

The generative AI revolution in BI

Google-easy data storytelling with Looker reports

Empowering developers and embedded experiences

Bringing trust to every gen-AI-powered business

Mitigations

Lure PDF Document

Immutability and indelibility

Backup Vault now supports Persistent Disk and Hyperdisk

Here’s how it works

Secure disaster recovery with multi-region backup vaults

Protect all your critical Compute Engine data

Introducing backup vaults for cyber resilience and simplified Compute Engine backups

It’s (mostly) a constraint optimization problem

Core building blocks

Workload scheduling constraint scenarios

Conclusion and next steps

Spanner’s core innovation: TrueTime and external consistency

Addressing the consistency-scale dilemma

Spanner as a cloud service

Empowering customers and industries

The future with Spanner

Experience the Spanner difference

Learn more about the graduating startups and their inspiring work:

Build with confidence using production-ready Gemini 2.5

Enhanced customization and efficiency for your needs

MCP Transports

Benefits of running an MCP server remotely

Prerequisites

Installation

Math MCP Server

Transport

Deploying to Cloud Run

Option 1 – Deploy from source

Option 2 – Deploy from a container image

Authenticating MCP Clients

Testing the remote MCP server

Continue Reading

Supercharge your workloads while optimizing costs

Security, maintenance controls and shapes

Experience C4D today

Driving enterprise transformation with new compute innovations and offerings

Why use Cloud Location Finder?

How can Cloud Location Finder help you?

Getting started

Tech stack from Google Cloud

Deep dive: How to build a KYC agent in three steps

Start building now

Understanding the dangers of exposed cloud credentials

How to scan open source code for credentials

Credential containment and recovery

What’s next?

More failures require more checkpointing

Architectural details

A wide range of checkpointing solutions

Benchmarking Kafka producers, consumers and latencies

Optimizing for throughput and latency

How to benchmark throughput and latencies

Scaling the Google Managed Service for Apache Kafka

Build the Kafka cluster you need

1. Scale your valuation with Gen AI batch evaluation

2. Scrutinize your autorater and build trust

3. Rubrics-driven evaluation

4. Agent evaluation

Get started today

June 9 – June 13

June 2 – June 6