Today, we introduced Gemma 3, a family of lightweight, open models built with the cutting-edge technology behind Gemini 2.0. The Gemma 3 family of models has been designed for speed and portability, empowering developers to build sophisticated AI applications at scale. Combined with Cloud Run, deploying serverless workloads with AI models has never been easier.
In this post, we’ll explore the functionalities of Gemma 3, and how you can run it on Cloud Run.
Gemma 3: Power and efficiency for Cloud deployments
Gemma 3 is engineered for exceptional performance with lower memory footprints, making it ideal for cost-effective inference workloads.
Built with the world's best single-accelerator model: Gemma 3 delivers optimal performance for its size, outperforming Llama-405B, DeepSeek-V3 and o3-mini in preliminary human preference evaluations on LMArena's leaderboard. This helps you create engaging user experiences with a model that fits on a single GPU or TPU.
Create AI with advanced text and visual reasoning capabilities: Easily build applications that analyze images, text and short videos, opening up possibilities for interactive applications.
Handle complex tasks with a large context window: Gemma 3 offers a 128k-token context window to let your applications process and understand vast amounts of information — even entire novels — enabling more sophisticated AI capabilities.
Serverless inference with Gemma 3 and Cloud Run
Gemma 3 is a great fit for inference workloads on Cloud Run using NVIDIA L4 GPUs. Cloud Run is Google Cloud's fully managed serverless platform, helping developers leverage container runtimes without having to concern themselves with the underlying infrastructure. Services scale to zero when inactive and scale dynamically with demand, which optimizes both cost and performance: you only pay for what you use.
For example, you could host an LLM on one Cloud Run service and a chat agent on another, enabling independent scaling and management. And with GPU acceleration, a Cloud Run service can be ready with the first AI inference results in under 30 seconds, with only 5 seconds to start an instance. This rapid deployment ensures that your applications deliver responsive user experiences. We also reduced the GPU price in Cloud Run down to ~$0.6/hr. And of course, if your service isn’t receiving requests, it will scale down to zero.
Get started today
Cloud Run and Gemma 3 combine to create a powerful, cost-effective, and scalable solution for deploying advanced AI applications. Gemma 3 is supported by a variety of tools and frameworks, such as Hugging Face Transformers, Ollama, and vLLM.
To get started, visit this guide which will show you how to build a service with Gemma 3 on Cloud Run with Ollama.
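If you follow that guide and end up with a Cloud Run service fronting Ollama, calling it from Python can be as simple as the sketch below. The service URL and model tag are placeholders, and if your service requires authentication you would also attach an identity token; adjust to match your deployment.

```python
import requests

SERVICE_URL = "https://YOUR-CLOUD-RUN-SERVICE-URL.run.app"  # placeholder: your deployed service
MODEL = "gemma3:4b"  # placeholder: whichever Gemma 3 variant you pulled into Ollama

# Ollama exposes a simple /api/generate endpoint; with "stream": False the full
# completion comes back in a single JSON response.
resp = requests.post(
    f"{SERVICE_URL}/api/generate",
    json={"model": MODEL, "prompt": "Write a haiku about serverless GPUs.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```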
As AI creates opportunities for business growth and societal benefit, we're working to reduce its carbon intensity through efforts like optimizing software, improving hardware efficiency, and supporting our operations with carbon-free energy.
At Google, we’re committed to understanding the entirety of our environmental impact so we can apply the best, boldest, and most holistic solutions. In this post, we’ll talk through an assessment technique called Life Cycle Assessment (LCA) to understand the complete picture of carbon emissions.
Measuring environmental impact with Life Cycle Assessment
LCA is a process-analysis method for evaluating the environmental impact of a product system or service throughout its entire life cycle. This spans everything from raw material extraction and processing, through manufacturing, transportation, and use, to end-of-life treatment (recycling, disposal, etc.). LCA enables us to measure emissions along every step of our hardware manufacturing, find the sources of those emissions, identify ways to reduce them, and track our progress towards global net-zero emissions.
The Google Cloud Carbon Footprinting team has developed a best-in-class LCA approach to evaluate the embodied carbon emissions associated with the supply chain of our data center hardware1, including AI/ML accelerators, compute machines, storage platforms, and networking equipment.
Figure 1. LCA stages and system boundary
The approach is consistent with global LCA standards, ISO 14040/14044, and is specifically tailored to Google Cloud’s data center technology portfolio and underlying manufacturing production processes. In addition, Google Cloud’s LCA methodology has been critically reviewed by Fraunhofer IZM, ensuring completeness, accuracy, and adherence to industry standards. This enables Google to accurately account for emissions that come from the manufacturing of various types of data center hardware, all the way down to the smallest components that compose the fleet.
Driving the industry forward
By collaborating closely with our supply chain partners, academic leaders, and industry peers, we’re pioneering the development of highly configurable Life Cycle Inventory (LCI) models. This innovative approach empowers us to move beyond generic assessments, unlocking the potential for detailed, customized environmental insights for vital components like semiconductors, hard disk drives, PCBAs, and thermal management solutions.
To achieve unparalleled accuracy, Google Cloud is transforming LCA data collection by partnering directly with suppliers to gather primary data. This means capturing the direct flows (i.e., material and energy transactions with the natural environment) that occur throughout manufacturing. These custom LCIs are powerful tools, enabling us to precisely measure our environmental impact and accelerate our journey towards net-zero.
In addition to driving accuracy, Google is driving standardization in the hardware industry by participating in a collaborative effort to develop consistent LCA guidelines. This initiative aims to create Product Category Rules (PCRs) that facilitate primary data collection and improve comparability across product assessments. By building on established ISO standards and aligning with GHG protocol and Product Environmental Footprint (PEF), this collaboration seeks to enhance the accuracy and transparency of environmental accounting efforts.
In a recent LCA study, we evaluated the environmental impact of our Tensor Processing Units (TPUs) throughout their entire lifespan. The introduction of a new metric, Compute Carbon Intensity (CCI), helped uncover findings showing that over two generations, more efficient TPU hardware design has led to a 3x improvement in the carbon efficiency of AI workloads. LCA studies like this are crucial for understanding and reducing the carbon footprint of hardware across the ecosystem.
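While the study defines CCI precisely, the intuition is a ratio of life-cycle emissions to useful compute delivered. One plausible formulation (our sketch, not necessarily the study's exact definition) is:

$$\mathrm{CCI} \;=\; \frac{C_{\text{embodied}} + C_{\text{operational}}}{\text{total ML compute delivered over the hardware's lifetime}}$$

Under a definition like this, a lower CCI means more useful work per unit of carbon, which is the sense in which newer TPU generations are reported as roughly 3x more carbon-efficient.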
Advancements in LCA
At Google, we believe that informed action is essential, and that requires a foundation of accurate measurement. Through our advancements in LCA and by fostering collaboration within the global community, we’re driving meaningful, measurable progress towards a more resilient future.
Executive Summary – ScaNN for AlloyDB is the first Postgres-based vector search extension that supports vector indexes of all sizes, while providing fast index builds, fast transactional updates, a small memory footprint, and fast, accurate search.
Many customers use AlloyDB for PostgreSQL to power sophisticated semantic search and generative AI use cases, performing vector search on 100 million to 1 billion+ vectors. At the same time, they want a large vector search index that works with the rest of their operational database, and the first place they look is the pgvector HNSW graph algorithm from the Postgres OSS community. pgvector extends the PostgreSQL SQL language to support combining SQL queries with filters and joins, together with vector search, an invaluable combination for modern applications.
We have supported the popular pgvector extension featuring HNSW in AlloyDB since 2023, and we are committed to continuing to do so. But while pgvector HNSW does well on query performance for small datasets, for larger datasets the pgvector HNSW graph algorithm is not as effective. For workloads with very large numbers of vectors there can be challenges with the time and cost of building the index, the size of the resulting index, and the impaired performance of the index if it grows too large to fit in main memory. These items will no doubt be addressed in due course, and some of the impact can be mitigated by careful tuning and hardware provisioning, but for AlloyDB we felt we needed to look for an alternative.
In that spirit, we released the ScaNN for AlloyDB extension in October 2024, providing a market-leading vector search solution for all use cases. ScaNN for AlloyDB incorporates ScaNN vector search technology developed by Google Research over the last 12 years. It is no surprise that ScaNN works well for large datasets, since we use it in Google Search, YouTube, Ads, and other applications that involve hundreds of billions of vectors or more. It's also a cost-effective and flexible option, providing an index that is pgvector-compatible, works at all sizes, has a 4x smaller memory footprint, and delivers up to 4x better latency even for small datasets.
In the Benchmarks section of this blog, we show that ScaNN for AlloyDB builds indices for 1 billion vectors at up to 60x lower cost than other PostgreSQL systems. It also delivers up to 10x better latency when the indices (ScaNN and HNSW) don’t fit in main memory, since HNSW is a graph structure which can lead to expensive random access I/O when not in memory. We also show that ScaNN for AlloyDB is a competitive option for small sizes, offering up to 4x better latency than pgvector HNSW in addition to the faster index build time. Finally, in the Algorithms section, we provide the key reasons behind ScaNN for AlloyDB’s performance. Read on for more.
Benchmarks
For our performance tests, we experimented with two popular benchmark datasets: Glove-100 (~1 million vectors, 100 dimensions) and BigANN-1B (1 billion vectors, 128 dimensions). We use Glove-100 to show the performance of pgvector HNSW and of ScaNN for AlloyDB when the indices fit in main memory, and BigANN-1B when they do not. First, let's take a look at search performance.
Search performance
We tested ScaNN for AlloyDB and pgvector HNSW 0.8.0 on OSS Postgres 15, both running on a 16 vCPU 128GB memory instance. We also tested pgvector HNSW 0.7.4 on a third-party cloud (which we will refer to as Cloud Vendor X) on their 16 vCPU 128GB memory instance, following the configuration and results published in a blog by Cloud Vendor X in early 2024. The indices fit in main memory for the Glove-100 benchmark but not for the BigANN-1B benchmark.
Naturally, the performance of all the indices is much lower for BigANN-1B, where they don't fit in main memory. However, the latency of pgvector HNSW is >4s (yes, that is seconds!), which is unacceptable for online applications, while ScaNN for AlloyDB delivers 10x better latency (431ms). This is important for use cases that require latency on the order of hundreds of milliseconds, but that also need to be cost-effective.
Note that given the generally much smaller footprint of ScaNN for AlloyDB, there are many use cases where the ScaNN for AlloyDB index fits in main memory while the pgvector HNSW index does not. For example, had we used an AlloyDB instance with 64 vCPUs and 512GB of memory, ScaNN for AlloyDB on BigANN-1B would deliver 30ms latency, roughly two orders of magnitude faster than pgvector HNSW (which had a latency of >4 seconds)!
Glove-100 (~1M vectors, 100 dimensions)

| Metric | ScaNN for AlloyDB (16 vCPU 128GB memory) | Postgres 15 + pgvector HNSW 0.8.0 (16 vCPU 128GB memory) | Cloud Vendor X + pgvector HNSW 0.7.4 (16 vCPU 128GB memory) |
| --- | --- | --- | --- |
| Index Size (GB) | 0.27 | 0.70 | 0.90 |
| Single Worker Search p95 Latency @ 95% recall (ms) | 1.78 | 10.98 | 10.96 |
| Max Sustained Search Throughput @ 95% recall (qps) | 7,882 | 1,908 | 1,700 |
| Single Worker Insert p95 Latency (ms) | 4.85 | 15.24 | 12.79 |
| Max Sustained Insert Throughput (qps) | 20,226 | 1,064 | 1,211 |

BigANN-1B (1B vectors, 128 dimensions)

| Metric | ScaNN for AlloyDB (16 vCPU 128GB memory) | Postgres 15 + pgvector HNSW 0.8.0 (16 vCPU 128GB memory) | Cloud Vendor X + pgvector HNSW 0.7.4 (16 vCPU 128GB memory) |
| --- | --- | --- | --- |
| Index Size (GB) | 133.19 | 530.68 | 773.13 |
| Single Worker Search p95 Latency @ 95% recall (ms) | 431 | 4,321 | 6,745 |
| Max Sustained Search Throughput @ 95% recall (qps) | 383 | 167 | 222 |
| Single Worker Insert p95 Latency (ms) | 4.00 | 281.18 | 311.70 |
| Max Sustained Insert Throughput (qps) | 27,800 | 2,096 | 2,368 |
Index build performance
Now let’s look at how long it took us to build our indices. Many customers correctly complain that pgvector HNSW is too slow when creating the index for large datasets. This becomes evident when building the pgvector HNSW index for BigANN-1B. Note that for both PostgreSQL and Cloud Vendor X, we were unable to build the index with the 16 vCPU 128GB memory machine. We then made multiple labor-intensive attempts with larger machines & configurations, and ultimately used extra-large instances to successfully build the pgvector HNSW indices within reasonable time. But with ScaNN for AlloyDB, we used the same 16 vCPU 128GB memory instance that lists for about 1/10th the cost of these extra-large instances. Customers appreciate the convenience of building the index quickly for lower cost.
| Algorithm | Machine | Build Time |
| --- | --- | --- |
| ScaNN for AlloyDB | 16 vCPU 128GB memory | 6.8 hours |
| Postgres 15 + pgvector HNSW 0.8.0 | 64 vCPU 512GB memory | Failed after 4 weeks with 100GB `maintenance_work_mem` and 0 swap space. Increasing `maintenance_work_mem` to 450GB and enabling 500GB of swap space, it succeeded in ~14 days. |
| Postgres 15 + pgvector HNSW 0.8.0 | 360 vCPU 2880GB memory | 5.5 hours, but this machine costs $20+ per hour and is 10 times more expensive than AlloyDB for comparable performance. |
| Cloud Vendor X + pgvector HNSW 0.7.4 | 16 vCPU 128GB memory | Failed with errors after 3 days at 25% progress. |
| Cloud Vendor X + pgvector HNSW 0.7.4 | 48 vCPU 384GB memory | Failed. Indexing reached 70% in a month, then progressed at ~1% per day. We canceled the job after more than a month. |
| Cloud Vendor X + pgvector HNSW 0.7.4 | 128 vCPU 1024GB memory | 36 hours, but this machine costs $20+ per hour and is 10 times more expensive than AlloyDB for comparable performance. |
Algorithms
We showed in the Benchmarks section that the performance difference between ScaNN for AlloyDB and pgvector HNSW is very much amplified when the two vector indices do not fit in main memory. Indeed, the weakness of the HNSW algorithm is well known in the pgvector community, e.g. in this ticket. Furthermore, we showed that ScaNN for AlloyDB has a 4x smaller memory footprint, which allows ScaNN for AlloyDB to fit in memory in cases where pgvector HNSW does not. Fundamental differences between the data organization and algorithms of the two indices explain these differences. To understand why, let’s start with an explanation of the memory footprint difference.
HNSW is a graph-based index, whereas ScaNN is a tree-quantization-based index. Generally, in graph-based indices the index is a graph and each vector corresponds to a node of the graph. Each node (i.e., each vector) is connected to a selected set of neighboring nodes. A typical recommendation is to connect each node to about m=20 other nodes, where m is the maximum number of neighbors per graph node. Furthermore, HNSW features multiple, hierarchical layers, where the upper layers provide entry points for the lower ones.
In contrast, ScaNN has a shallow-tree data structure, much like a B-tree. Each leaf node corresponds to a centroid and the leaf contains all the vectors that are close to this centroid. In effect, the centroids partition the space, as shown in the figure below depicting a two-level index. The memory footprint difference between ScaNN for AlloyDB and pgvector HNSW is due in large part to the fact that a tree has far fewer edges than a graph that connects the same number of nodes with 20 edges per node.
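As a rough back-of-envelope illustration (our arithmetic, not a figure from the benchmark), just storing the edges of a graph index over the 1-billion-vector BigANN dataset, at about m = 20 neighbors per node and 4 bytes per neighbor ID, takes on the order of

$$10^{9}\ \text{nodes} \times 20\ \text{edges/node} \times 4\ \text{bytes/edge} \approx 80\ \text{GB}$$

before accounting for the vectors themselves, whereas a shallow tree only needs a comparatively tiny set of centroids plus one leaf assignment per vector.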
Next, let's examine the difference in performance. Starting from the entry point, HNSW performs a greedy search in the top layer to find the nearest neighbors to the searched vector. The greedy search iteratively moves to the neighbors closest to the searched vector until no closer neighbors can be found. It then descends to the next lower layer and repeats the greedy search process until it reaches the bottom layer and the closest neighbors are returned.
Notice that with HNSW, the graph traversal access is random. Thus for a >100 million vector dataset where the graph nodes have to page in and out between buffer and disk, these random accesses cause a rapid deterioration of performance (see ticket #700). In contrast to HNSW’s random access, the ScaNN for AlloyDB index is cache-friendly, optimizes for block-based access when the index is in secondary storage and optimizes for efficient SIMD operations when the index is cached. As is often the case for out-of-memory database algorithms, sequential and block-based access outperforms random access.
Next steps
At Google, ScaNN vector search is integral to delivering the performance required for billion-user applications. And now, with ScaNN for AlloyDB, you can use it to power your own vector-based search applications. To learn more about the ScaNN for AlloyDB index, check out our introduction to the ScaNN for AlloyDB index, or read our ScaNN for AlloyDB whitepaper for an introduction to vector search at large, and then a deep dive into the ScaNN algorithm and how we implemented it in PostgreSQL and AlloyDB.
This post reflects the work of the AlloyDB semantic search team: Bohan Liu, Yingjie He, Bin Song, Peiqin Zhao, Jessica Chan. Thanks to the AlloyDB performance engineering team and others who contributed to the benchmarking results: Shrikant Awate, Rajeev Rastogi, Mohit Agarwal, Rishwitha Gunuganti, Hardik Shah, Jahnavi Malhotra, Hari Jeyamani. And a special thanks to the ScaNN team for their research.
For businesses to modernize and meet the needs of workers, embracing the web is essential. Forrester's research confirms this trend, with 78% of IT respondents stating that companies failing to embrace the web will fall behind.1 ChromeOS can be the catalyst for an organization to improve security, IT management, and user experience, and to reduce IT-related costs through its use of web applications.
Unlike traditional desktop applications available on other operating systems, web applications ensure their data and software are always up to date and easy to manage, while inheriting all of the security benefits of ChromeOS without the need for additional software. These benefits are especially true for Microsoft 365 web applications on Chromebooks — including Excel, PowerPoint and Word.
On ChromeOS, we've taken steps to enhance the Microsoft 365 web app experience to deliver a familiar experience without compromising on IT. Customers such as FedEx are using Microsoft 365 web apps throughout their ChromeOS fleet to meet their security standards, leveraging features such as data loss prevention, while keeping users productive.
Here are 5 ways ChromeOS enhances the Microsoft 365 experience for organizations:
1 – Microsoft 365 applications are made available to ChromeOS users with a familiar, desktop-like user experience
Traditionally, Microsoft 365 web applications are only available as a website through the browser. And getting there requires multiple steps just to start editing a file. On Chromebooks, Microsoft 365 web apps can look and behave like desktop applications, as you see in the image above. They are simply opened from the ChromeOS app tray with a click, rather than navigating to the M365 login page and finding the specific application you want to use, saving users time and simplifying access.
2 – ChromeOS devices can be set up to automatically log into Microsoft 365 applications with SSO powered by Microsoft Entra ID and other third-party identity providers
With SSO integration, end users can log into their Chromebook with the same credentials they use across all of their corporate services. The user just logs into the Chromebook once, and they are logged into their services, including their Microsoft 365 web applications, without the need to reauthenticate.
3 – Microsoft OneDrive can be integrated with the ChromeOS Files app
Users can access their files stored on Microsoft OneDrive directly from the ChromeOS Files application. With this integration, files can be easily opened by Microsoft 365, attached to emails with a simple drag and drop, and instantly available, ensuring users stay focused on their tasks.
4 – ChromeOS devices can be set up to use OneDrive-only storage, without leaving local data on the device
IT admins can set up the Chromebook to ensure all files on the device are automatically saved to OneDrive, and not to the device. With ChromeOS devices, admins can ensure all downloads and screenshots are automatically stored to OneDrive, and can disable local storage.
5 – All of these features are easily configurable for IT admins from the Google Admin console
From the Google Admin console, IT admins have access to policies to pre-configure a business’s Microsoft 365 environment, ensuring Microsoft 365 applications “just work” for end users. Configurations include:
SSO integration and account restrictions
OneDrive ChromeOS integrations
File Type association to Office for the web & automatic upload of files to OneDrive
Deactivate Basic Editor and Quick Office
Block local storage, configure automatic file upload of downloads & screenshots to OneDrive
Pin and pre-install M365 web applications to users' taskbars
Today, we are excited to announce fully managed tiered storage for Spanner, a new capability that lets you use larger datasets with Spanner by striking the right balance between cost and performance, while minimizing operational overhead through a simple, easy-to-use, interface.
Spanner powers mission-critical operational applications at organizations in financial services, retail, gaming, and many other industries. These workloads rely on Spanner's elastic scalability and global consistency to deliver always-on experiences at any size. For example, a global trade ledger at a bank or a multi-channel order and inventory management system at a retailer depend on Spanner to provide a consistent view of real-time data to make trades and assess risk, fulfill orders, or dynamically optimize prices.
But over time, settled trade records or fulfilled orders become less important to running the business, and instead drive historical reporting or legal compliance. These datasets don’t require the same real-time performance as “hot,” active, transactional data, prompting customers to look for ways to move this “cold” data to lower-cost storage.
However, moving to alternative types of storage typically requires complicated data pipelines and can impact the performance of the operational system. Manually separating data across storage solutions can result in inconsistent reads that require application-level reconciliation. Furthermore, the separation imposes significant limits on how applications can query across current and historical data for things like responding to regulators; it also increases governance touchpoints that need to be audited.
Tiered storage with Spanner addresses these challenges with a new storage tier based on hard disk drives (HDD) that is 80% cheaper than the existing tier based on solid-state drives (SSD), which is optimized for low-latency and high-throughput queries.
Beyond the cost savings, benefits include:
Ease of management: Storage tiering with Spanner is entirely policy-driven, minimizing the toil and complexity of building and managing additional pipelines, or splitting/duplicating data across solutions. Asynchronous background processes automatically move the data from SSD to HDD as part of background maintenance tasks.
Unified and consistent experience: In Spanner, the location of data storage is transparent to you. Queries on Spanner can access data across both SSD and HDD tiers without modification. Similarly, backup policies are applied consistently across the data, enabling consistent restores across data in both storage tiers.
Flexibility and control: Tiering policies can be applied to the database, table, column, or a secondary index, allowing you to choose what data to move to HDD. For example, data in a column that is rarely queried, e.g., JSON blobs for a long tail of product attributes, can easily be moved to HDD without having to split database tables. You can also choose to have some indexes on SSD, while the data resides in HDD.
“At Mercari, we use Spanner as the database for Merpay, our mobile payments platform that supports over 18.7 million users. With our ever-growing transaction volume, we were exploring options to store accumulated historic transaction data, but did not want to take on the overhead of constantly migrating data to another solution. The launch of Spanner tiered storage will allow us to store old data more cost-effectively, without requiring the use of another solution, while giving us the flexibility of querying it as needed.” – Shingo Ishimura, GAE Meister, Mercari
Let’s take a closer look
To get started, use GoogleSQL/PostgreSQL data definition language (DDL) to configure a locality group that defines the storage option ('ssd', the default, or 'hdd'). Locality groups are a mechanism to provide data locality and isolation along a dimension (e.g., table, column) to optimize performance. While configuring a locality group, you can also use `ssd_to_hdd_spill_timespan` to specify how long data should be stored on SSD before it moves to HDD as part of a subsequent compaction cycle.
```sql
# An HDD-only locality group.
CREATE LOCALITY GROUP hdd_only OPTIONS (storage = 'hdd');

# An SSD-only locality group.
CREATE LOCALITY GROUP ssd_only OPTIONS (storage = 'ssd');

# An SSD to HDD spill policy.
CREATE LOCALITY GROUP recent_on_ssd OPTIONS (storage = 'ssd', ssd_to_hdd_spill_timespan = '15d');

# Update the tiering policy on the entire database.
ALTER LOCALITY GROUP `default` SET OPTIONS (storage = 'ssd', ssd_to_hdd_spill_timespan = '30d');

# Apply a locality group policy to a new table.
CREATE TABLE PaymentLedger (
  TxnId INT64 NOT NULL,
  Amount INT64 NOT NULL,
  Account INT64 NOT NULL,
  Description STRING(MAX)
) PRIMARY KEY (TxnId), OPTIONS (locality_group = 'recent_on_ssd');

# Apply a locality group policy to an existing column.
ALTER TABLE PaymentLedger ALTER COLUMN Description SET OPTIONS (locality_group = 'hdd_only');
```
Once the DDL has been configured, movement of data from SSD to HDD takes place asynchronously during weekly compaction cycles at the underlying storage layer without any user involvement.
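As a minimal sketch of what this transparency looks like from an application (the instance and database IDs are placeholders, the DDL statement is the column-level policy from the example above, and the client usage assumes the google-cloud-spanner Python library):

```python
from google.cloud import spanner  # pip install google-cloud-spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")  # placeholder IDs

# Apply a tiering policy to an existing column (same statement as in the DDL above).
# Schema changes are long-running operations, so wait for completion.
op = database.update_ddl([
    "ALTER TABLE PaymentLedger ALTER COLUMN Description "
    "SET OPTIONS (locality_group = 'hdd_only')"
])
op.result()

# Reads are unchanged: Spanner serves rows transparently whether the underlying
# data currently lives on SSD or has already spilled to HDD.
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT TxnId, Amount, Description FROM PaymentLedger WHERE TxnId = @id",
        params={"id": 1},
        param_types={"id": spanner.param_types.INT64},
    )
    for row in rows:
        print(row)
```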
HDD usage can be monitored from System Insights, which displays the amount of HDD storage used per locality group and the disk load at the instance level.
Spanner tiered storage supports both GoogleSQL and PostgreSQL-dialect databases and is available in all regions in which Spanner is available. This functionality is available with Enterprise/Enterprise Plus editions of Spanner for no additional cost beyond the cost of the HDD storage.
Get started with Spanner today
With tiered storage, customers can onboard larger datasets on Spanner by optimizing costs, while minimizing operational overhead through a unified customer experience. Visit our documentation to learn more.
Want to learn more about what makes Spanner unique and how to use tiered storage? Try it yourself for free for 90 days or for as little as $88 USD/month (Enterprise edition) for a production-ready instance that grows with your business without downtime or disruptive re-architecture.
This blog post presents an in-depth exploration of Microsoft's Time Travel Debugging (TTD) framework, a powerful record-and-replay debugging framework for Windows user-mode applications. TTD relies heavily on accurate CPU instruction emulation to faithfully replay program executions. However, subtle inaccuracies within this emulation process can lead to significant security and reliability issues, masking vulnerabilities or misleading critical investigations such as incident response and malware analysis, and potentially causing analysts to overlook threats or draw incorrect conclusions. Furthermore, attackers can exploit these inaccuracies to intentionally evade detection or disrupt forensic analyses, severely compromising investigative outcomes.
The blog post examines specific challenges, provides historical context, and analyzes real-world emulation bugs, highlighting the critical importance of accuracy and ongoing improvement to ensure the effectiveness and reliability of investigative tooling. Ultimately, addressing these emulation issues directly benefits users by enhancing security analyses, improving reliability, and ensuring greater confidence in their debugging and investigative processes.
Overview
We begin with an introduction to TTD, detailing its use of a sophisticated CPU emulation layer powered by the Nirvana runtime engine. Nirvana translates guest instructions into host-level micro-operations, enabling detailed capture and precise replay of a program’s execution history.
The discussion transitions into exploring historical challenges in CPU emulation, particularly for the complex x86 architecture. Key challenges include issues with floating-point and SIMD operations, memory model intricacies, peripheral and device emulation, handling of self-modifying code, and the constant trade-offs between performance and accuracy. These foundational insights lay the groundwork for our deeper examination of specific instruction emulation bugs discovered within TTD.
These include:
A bug involving the emulation of the pop r16 instruction, resulting in critical discrepancies between native execution and TTD instrumentation.
An issue with the push segment instruction that demonstrates differences between Intel and AMD CPU implementations, highlighting the importance of accurate emulation aligned with hardware behavior.
Errors in the implementation of the lodsb and lodsw instructions, where TTD incorrectly clears upper bits that should remain unchanged.
An issue within the WinDbg TTDAnalyze debugging extension, where a fixed output buffer resulted in truncated data during symbol queries, compromising debugging accuracy.
Each case is supported by detailed analyses, assembly code proof-of-concept samples, and debugging traces, clearly illustrating the subtle but significant pitfalls in modern CPU emulation as it pertains to TTD.
Additional bugs discovered beyond those detailed here are pending disclosure until addressed by Microsoft. All bugs discussed in this post have been resolved as of TTD version 1.11.410.
Intro to TTD
Time Travel Debugging (TTD) is a powerful user-mode record-and-replay framework developed by Microsoft, originally introduced in a 2006 whitepaper under a different name. It is a staple of our workflows in Windows environments.
TTD allows a user to capture a comprehensive recording of a process (and potential child processes) during the lifetime of the process’s execution. This is done by injecting a dynamic-link library (DLL) into the intended target process and capturing each state of the execution. This comprehensive historical view of the program’s runtime behavior is stored in a database-like trace file (.trace), which, much like a database, can be further indexed to produce a corresponding .idx file for efficient querying and analysis.
Once recorded, trace files can be consumed by a compatible client that supports replaying the entire execution history. In other words, TTD effectively functions as a record/replay debugger, enabling analysts to move backward and forward through execution states as if navigating a temporal snapshot of the program’s lifecycle.
TTD relies on a CPU emulation layer to accurately record and replay program executions. This layer is implemented by the Nirvana runtime engine, which simulates guest instructions by translating them into a sequence of simpler, host-level micro-operations. By doing so, Nirvana provides fine-grained control at the instruction and sub-instruction level, allowing instrumentation to be inserted at each stage of instruction processing (e.g., fetching, memory reads, writes). This approach not only ensures that TTD can capture the complete dynamic behavior of the original binary but also makes it possible to accurately re-simulate executions later.
Nirvana’s dynamic binary translation and code caching techniques improve performance by reusing translated sequences when possible. In cases where code behaves unpredictably—such as self-modifying code scenarios—Nirvana can switch to a pure interpretation mode or re-translate instructions as needed. These adaptive strategies ensure that TTD maintains fidelity and efficiency during the record and replay process, enabling it to store execution traces that can be fully re-simulated to reveal intricate details of the code’s behavior under analysis.
The TTD framework is composed of several core components:
TTD: The main TTD client executable that takes as input a wide array of input arguments that dictate how the trace will be conducted.
TTDRecord: The main DLL responsible for the recording that runs within the TTD client executable. It initiates the injection sequence into the target binary by injecting TTDLoader.dll.
TTDLoader: DLL that gets injected into the guest process and initiates the recorder within the guest through the TTDRecordCPU DLL. It also establishes a process instrumentation callback within the guest process that allows Nirvana to monitor the egress of any system calls the guest makes.
TTDRecordCPU: The recorder responsible for capturing the execution states into the .trace file. This is injected as a DLL into the guest process and communicates the status of the trace with TTDRecord. The core logic works by emulating the respective CPU.
TTDReplay and TTDReplayClient: The replay components that read the captured state from the trace file and allow users to step through the recorded execution. WinDbg uses these to provide support for replaying trace files.
TTDAnalyze: A WinDbg extension that integrates with the replay client, providing exclusive TTD capabilities to WinDbg. Most notable of these are the Calls and Memory data model methods.
CPU Emulation
Historically, CPU emulation—particularly for architectures as intricate as x86—has been a persistent source of engineering challenges. Early attempts struggled with instruction coverage and correctness, as documentation gaps and hardware errata made it difficult to replicate every nuanced corner case. Over time, a number of recurring problem areas and bug classes emerged:
Floating-Point and SIMD Operations: Floating-point instructions, with their varying precision modes and extensive register states, have often been a source of subtle bugs. Miscalculating floating-point rounding, mishandling denormalized numbers, or incorrectly implementing special instructions like FSIN or FCOS can lead to silent data corruption or outright crashes. Similarly, SSE, AVX, and other vectorized instructions introduce complex states that must be tracked accurately.
Memory Model and Addressing Issues: The x86 architecture’s memory model, which includes segmentation, paging, alignment constraints, and potential misalignments in legacy code, can introduce complex bugs. Incorrectly emulating memory accesses, not enforcing proper page boundaries, or failing to handle “lazy” page faults and cache coherency can result in subtle errors that only appear under very specific conditions.
Peripheral and Device Emulation: Emulating the behavior of x86-specific peripherals—such as serial I/O ports, PCI devices, PS/2 keyboards, and legacy controllers—can be particularly troublesome. These components often rely on undocumented behavior or timing quirks. Misinterpreting device-specific registers or neglecting to reproduce timing-sensitive interactions can lead to erratic emulator behavior or device malfunctions.
Compatibility with Older or Unusual Processors: Emulating older generations of x86 processors, each with their own peculiarities and less standardized features, poses its own set of difficulties. Differences in default mode settings, instruction variants, and protected-mode versus real-mode semantics can cause unexpected breakages. A once-working emulator may fail after it encounters code written for a slightly different microarchitecture or an instruction that was deprecated or implemented differently in an older CPU.
Self-Modifying Code and Dynamic Translation: Code that modifies itself at runtime demands adaptive strategies, such as invalidating cached translations or re-checking original code bytes on the fly. Handling these scenarios incorrectly can lead to stale translations, misapplied optimizations, and difficult-to-trace logic errors.
Performance vs. Accuracy Trade-Offs: Historically, implementing CPU emulators often meant juggling accuracy with performance. Naïve instruction-by-instruction interpretation provided correctness but was slow. Introducing caching or just-in-time (JIT)-based optimizations risked subtle synchronization issues and bugs if not properly synchronized with memory updates or if instruction boundaries were not well preserved.
Collectively, these historical challenges underscore that CPU emulation is not just about instruction decoding. It requires faithfully recreating intricate details of processor states, memory hierarchies, peripheral interactions, and timing characteristics. Even as documentation and tooling have improved, achieving both correctness and efficiency remains a delicate balancing act, and emulation projects continue to evolve to address these enduring complexities.
The Initial TTD Bug
Executing a heavily obfuscated 32-bit Windows Portable Executable (PE) file under TTD instrumentation resulted in a crash. The same sample file did not cause a crash while executing in a real computer or in a virtual machine. We suspected either the sample is detecting TTD execution and or TTD itself has a bug in emulating an instruction. A good thing about debugging TTD issues is that the TTD trace file itself can be used to pinpoint the cause of the issue most of the time. Figure 1 points to the crash while in TTD emulation.
Figure 1: Crash while accessing an address pointed by register ESI
Backtracing the ESI register value of 0xfb3e took stepping back hundreds of instructions and ended up in the following sequence of instructions, as shown in Figure 2.
Figure 2: Register ESI getting populated by pop si and xchg si,bp
There are two instructions populating the ESI register, both working with the 16-bit SI subregister while completely ignoring the upper 16 bits of ESI. If we look closely at the results after the pop si instruction in Figure 2, the upper 16 bits of the ESI register appear to be nulled out. This looked like a bug in emulating pop r16 instructions, and we quickly wrote proof-of-concept code for verification (Figure 3).
Figure 3: Proof-of-concept for pop r16
Running the resulting binary natively and with TTD instrumentation as shown in Figure 4 confirmed our suspicion that the pop r16 instructions are emulated differently in TTD than on a real CPU.
Figure 4: Running the code natively and with TTD instrumentation
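To make the semantics concrete, here is a small Python sketch of the difference (the register values are made up for illustration): a native pop si writes only the low 16 bits of ESI, while the pre-fix TTD emulation effectively zeroed the upper half.

```python
def pop_si_native(esi: int, stack_word: int) -> int:
    """Native x86 semantics: `pop si` replaces only the low 16 bits of ESI."""
    return (esi & 0xFFFF0000) | (stack_word & 0xFFFF)

def pop_si_ttd_prefix_bug(esi: int, stack_word: int) -> int:
    """Behavior observed under TTD before the fix: the upper 16 bits end up cleared."""
    return stack_word & 0xFFFF

esi, stack_word = 0x12340000, 0xFB3E  # illustrative values
print(hex(pop_si_native(esi, stack_word)))          # 0x1234fb3e
print(hex(pop_si_ttd_prefix_bug(esi, stack_word)))  # 0xfb3e -> the bad pointer seen in Figure 1
```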
We reported this issue and the fuzzing results to the TTD team at Microsoft.
Fuzzing TTD
Given that there was one instruction emulation bug (an instruction sequence that produces different results in native vs. TTD execution), we decided to fuzz TTD to find similar bugs. A rudimentary harness was created to execute a random sequence of instructions and record the resulting values. This harness was executed on a real CPU and under TTD instrumentation, providing us with two sets of results. Any changes in results, or a partial lack of results, points us to a likely instruction emulation bug.
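The comparison step of the harness is conceptually simple; a minimal sketch (the register-dump format here is hypothetical) just diffs the two result sets:

```python
def diff_register_dumps(native: dict, ttd: dict) -> dict:
    """Return registers whose final values differ between the native run and the TTD run."""
    return {
        reg: (native[reg], ttd[reg])
        for reg in native
        if reg in ttd and native[reg] != ttd[reg]
    }

# Hypothetical dumps for a single fuzz case:
native_run = {"eax": 0x0000FB3E, "esi": 0x1234FB3E}
ttd_run = {"eax": 0x0000FB3E, "esi": 0x0000FB3E}
print(diff_register_dumps(native_run, ttd_run))  # esi differs -> likely emulation bug
```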
The fuzzer soon surfaced a new bug that was fairly similar to the original pop r16 bug, but involving a push segment instruction. This bug also came with a bit of a twist. While our fuzzer was running on an Intel CPU-based machine and one of us verified the bug locally, the other person was not able to reproduce it. Interestingly, the failure happened on an AMD-based CPU, tipping us off to the possibility that the push segment instruction implementation varies between Intel and AMD CPUs.
Looking at both the Intel and AMD CPU specifications, the Intel specification goes into detail about how recent processors implement the push segment register instruction:
If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Intel Core and Intel Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified. (INTEL spec Vol.2B 4-517)
We reported the discrepancy to AMD PSIRT, which concluded that this is not a security vulnerability. It seems that sometime circa 2007, Intel and AMD CPUs started implementing the push segment instruction differently, and TTD's emulation followed the old way.
The lodsb and lodsw instructions are not correctly implemented for both 32-bit and 64-bit modes. Both clear the upper bits of the register (rax/eax), whereas the original instructions only modify their respective granularities (i.e., lodsb will only overwrite 1 byte, lodsw only 2 bytes).
Figure 6: Proof-of-concept for lodsb/lodsw
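Conceptually, the discrepancy looks like this (a Python sketch with illustrative values, mirroring the proof-of-concept in Figure 6):

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def lodsb_native(rax: int, byte_at_rsi: int) -> int:
    """Native semantics: lodsb overwrites only AL, the low 8 bits of RAX."""
    return (rax & (MASK64 ^ 0xFF)) | (byte_at_rsi & 0xFF)

def lodsw_native(rax: int, word_at_rsi: int) -> int:
    """Native semantics: lodsw overwrites only AX, the low 16 bits of RAX."""
    return (rax & (MASK64 ^ 0xFFFF)) | (word_at_rsi & 0xFFFF)

def lodsb_ttd_prefix_bug(rax: int, byte_at_rsi: int) -> int:
    """Pre-fix TTD behavior described above: the upper bits of rax/eax are cleared."""
    return byte_at_rsi & 0xFF

rax = 0x1122334455667788  # illustrative starting value
print(hex(lodsb_native(rax, 0xAA)))          # 0x11223344556677aa
print(hex(lodsb_ttd_prefix_bug(rax, 0xAA)))  # 0xaa
```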
There are additional instruction emulation bugs pending fixes from Microsoft.
As we were pursuing our efforts in the CPU emulator, we accidentally stumbled on another bug, this time not in the emulator but inside the WinDbg extension exposed by TTD: TTDAnalyze.dll.
This extension leverages the debugger's data model to allow a user to interact with the trace file in an interactive manner. This is done by exposing a TTD data model namespace under certain parts of the data model, such as the current process (@$curprocess), the current thread (@$curthread), and the current debugging session (@$cursession).
Figure 7: TTD query types
As an example, the @$cursession.TTD.Calls method allows a user to query all call locations captured within the trace. It takes as input either an address or a case-insensitive symbol name with support for regex. The symbol name can either be in the format of a string (with quotes) or a parsed symbol name (without quotes). The former is only applicable when the symbols are resolved fully (e.g., private symbols), as the data model has support for converting private symbols into an ObjectTargetObject object, thus making it consumable by the dx evaluation expression parser.
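For example, a query against the trace from the WinDbg command window typically looks like the following (the module and patterns here are illustrative):

```
0:000> dx @$cursession.TTD.Calls("kernelbase!Create*")
0:000> dx @$cursession.TTD.Calls("kernelbase!C*")
```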
The bug in question directly affects the exposed Calls method under @$cursession.TTD.Calls because it uses a fixed, static buffer to capture the results of the symbol query. In Figure 8, we illustrate this by passing in two similar regex strings that produce inconsistent results.
Figure 8: TTD Calls query
When we query C* and Create*, the C* query results do not include the other Create APIs that were clearly captured in the trace. Under the hood, TTDAnalyze executes the examine debugger command "x KERNELBASE!C*" with a custom output capture to process the results. This output capture truncates any captured data if it is greater than 64 KB in size.
If we take the disassembly of the global buffer and output capture routine in TTDAnalyze (SHA256 CC5655E29AFA87598E0733A1A65D1318C4D7D87C94B7EBDE89A372779FF60BAD) prior to the fix, we can see the following (Figure 9 and Figure 10):
Figure 9: TTD implementation disassembly
Figure 10: TTD implementation disassembly
The capture for the examine command is capped at 64 KB. When the returned data exceeds this limit, truncation is performed at address 0x180029960. Naturally querying symbols starting with C* typically yields a large volume of results, not just those beginning with Create*, leading to the observed truncation of the data.
Final Thoughts
The analysis presented in this blog post highlights the critical nature of accuracy in instruction emulation—not just for debugging purposes, but also for ensuring robust security analysis. The observed discrepancies, while subtle, underscore a broader security concern: even minor deviations in emulation behavior can misrepresent the true execution of code, potentially masking vulnerabilities or misleading forensic investigations.
From a security perspective, the work emphasizes several key takeaways:
Reliability of Debugging Tools: TTD and similar frameworks are invaluable for reverse engineering and incident response. However, any inaccuracies in emulation, such as those revealed by the misinterpretation of pop r16, push segment, or lods* instructions, can compromise the fidelity of the analysis. This raises important questions about trust in our debugging tools when they are used to analyze potentially malicious or critical code.
Impact on Threat Analysis: The ability to replay a process’s execution with high fidelity is crucial for uncovering hidden behaviors in malware or understanding complex exploits. Instruction emulation bugs may inadvertently alter the execution path or state, leading to incomplete or skewed insights that could affect the outcome of a security investigation.
Collaboration and Continuous Improvement: The discovery of these bugs, followed by their detailed documentation and reporting to the relevant teams at Microsoft and AMD, highlights the importance of a collaborative approach to security research. Continuous testing, fuzzing, and cross-platform comparisons are essential in maintaining the integrity and security of our analysis tools.
In conclusion, this exploration not only sheds light on the nuanced challenges of CPU emulation within TTD, but also serves as a call to action for enhanced scrutiny and rigorous validation of debugging frameworks. By ensuring that these tools accurately mirror native execution, we bolster our security posture and improve our capacity to detect, analyze, and respond to sophisticated threats in an ever-evolving digital landscape.
Acknowledgments
We extend our gratitude to the Microsoft Time Travel Debugging team for their readiness and support in addressing the issues we reported. Their prompt and clear communication not only resolved the bugs but also underscored their commitment to keeping TTD robust and reliable. We further appreciate that they have made TTD publicly available—a resource invaluable for both troubleshooting and advancing Windows security research.
AI Hypercomputer is a fully integrated supercomputing architecture for AI workloads – and it’s easier to use than you think. In this blog, we break down four common use cases, including reference architectures and tutorials, representing just a few of the many ways you can use AI Hypercomputer today.
Short on time? Here’s a quick summary.
Affordable inference. JAX, Google Kubernetes Engine (GKE) and NVIDIA Triton Inference Server are a winning combination, especially when you pair them with Spot VMs for up to 90% cost savings. We have several tutorials, like this one on how to serve LLMs like Llama 3.1 405B on GKE.
Large and ultra-low latency training clusters. Hypercompute Cluster gives you physically co-located accelerators, targeted workload placement, advanced maintenance controls to minimize workload disruption, and topology-aware scheduling. You can get started by creating a cluster with GKE or try this pretraining NVIDIA GPU recipe.
High-reliability inference. Pair new cloud load balancing capabilities like custom metrics and service extensions with GKE Autopilot, which includes features like node auto-repair to automatically replace unhealthy nodes, and horizontal pod autoscaling to adjust resources based on application demand.
Easy cluster setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You can get started with one of our AI/ML blueprints.
If you want to see a broader set of reference implementations, benchmarks and recipes, go to the AI Hypercomputer GitHub.
Why it matters
Deploying and managing AI applications is tough. You need to choose the right infrastructure, control costs, and reduce delivery bottlenecks. AI Hypercomputer helps you deploy AI applications quickly, easily, and with more efficiency relative to just buying the raw hardware and chips.
Take Moloco, for example. Using the AI Hypercomputer architecture they achieved 10x faster model training times and reduced costs by 2-4x.
Let’s dive deeper into each use case.
1. Reliable AI inference
According to Futurum, in 2023 Google had ~3x fewer outage hours than Azure, and ~3x fewer than AWS. Those numbers fluctuate over time, but maintaining high availability is a challenge for everyone. The AI Hypercomputer architecture offers fully integrated capabilities for high-reliability inference.
Many customers start with GKE Autopilot because of its 99.95% pod-level uptime SLA. Autopilot enhances reliability by automatically managing nodes (provisioning, scaling, upgrades, repairs) and applying security best practices, freeing you from manual infrastructure tasks. This automation, combined with resource optimization and integrated monitoring, minimizes downtime and helps your applications run smoothly and securely.
There are several configurations available, but in this reference architecture we use TPUs with the JetStream Engine to accelerate inference, plus JAX, GCS Fuse, and SSDs (like Hyperdisk ML) to speed up the loading of model weights. As you can see, there are two notable additions to the stack that get us to high reliability: Service Extensions and custom metrics.
Service extensions allow you to customize the behavior of Cloud Load Balancer by inserting your own code (written as plugins) into the data path, enabling advanced traffic management and manipulation.
Custom metrics, utilizing the Open Request Cost Aggregation (ORCA) protocol, allow applications to send workload-specific performance data (like model serving latency) to Cloud Load Balancer, which then uses this information to make intelligent routing and scaling decisions.
2. Large and ultra-low latency training clusters
Training large AI models demands massive, efficiently scaled compute. Hypercompute Cluster is a supercomputing solution built on AI Hypercomputer that lets you deploy and manage a large number of accelerators as a single unit, using a single API call. Here are a few things that set Hypercompute Cluster apart:
Clusters are densely physically co-located for ultra-low-latency networking. They come with pre-configured and validated templates for reliable and repeatable deployments, and with cluster-level observability, health monitoring, and diagnostic tooling.
To simplify management, Hypercompute Clusters are designed to integrate with orchestrators like GKE and Slurm, and are deployed via the Cluster Toolkit. GKE provides support for over 50,000 TPU chips to train a single ML model.
In this reference architecture, we use GKE Autopilot and A3 Ultra VMs.
GKE supports up to 65,000 nodes — we believe this is more than 10X larger scale than the other two largest public cloud providers.
A3 Ultra uses NVIDIA H200 GPUs with twice the GPU-to-GPU network bandwidth and twice the high bandwidth memory (HBM) compared to A3 Mega GPUs. They are built with our new Titanium ML network adapter and incorporate NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience, perfect for large multi-node workloads on GPUs.
3. Affordable inference
Serving AI, especially large language models (LLMs), can become prohibitively expensive. AI Hypercomputer combines open software, flexible consumption models, and a wide range of specialized hardware to minimize costs.
Cost savings are everywhere, if you know where to look. Beyond the tutorials, there are two cost-efficient deployment models you should know. GKE Autopilot reduces the cost of running containers by up to 40% compared to standard GKE by automatically scaling resources based on actual needs, while Spot VMs can save up to 90% on batch or fault-tolerant jobs. You can combine the two to save even more — “Spot Pods” are available in GKE Autopilot to do just that.
In this reference architecture, after training with JAX, we convert into NVIDIA’s Faster Transformer format for inferencing. Optimized models are served via NVIDIA’s Triton on GKE Autopilot. Triton’s multi-model support allows for easy adaptation to evolving model architectures, and a pre-built NeMo container simplifies setup.
4. Easy cluster setup
You need tools that simplify, not complicate, your infrastructure setup. The open-source Cluster Toolkit offers pre-built blueprints and modules for rapid, repeatable cluster deployments. You get easy integration with JAX, PyTorch, and Keras. Platform teams get simplified management with Slurm, GKE, and Google Batch, plus flexible consumption models like Dynamic Workload Scheduler and a wide range of hardware options. In this reference architecture, we set up an A3 Ultra cluster with Slurm:
Kubernetes, the container orchestration platform, is inherently a complex, distributed system. While it provides resilience and scalability, it can also introduce operational complexities, particularly when troubleshooting. Even with Kubernetes’ self-healing capabilities, identifying the root cause of an issue often requires deep dives into the logs of various independent components.
At Google Cloud, our engineers have been directly confronting this Kubernetes troubleshooting challenge for years as we support large-scale, complex deployments. In fact, the Google Cloud Support team has developed deep expertise in diagnosing issues within Kubernetes environments through routinely analyzing a vast number of customer support tickets, diving into user environments, and leveraging our collective knowledge to pinpoint the root causes of problems. To address this pervasive challenge, the team developed an internal tool: the Kubernetes History Inspector (KHI), and today, we’ve released it as open source for the community.
The Kubernetes troubleshooting challenge
In Kubernetes, each pod, deployment, service, node, and control-plane component generates its own stream of logs. Effective troubleshooting requires collecting, correlating, and analyzing these disparate log streams. But manually configuring logging for each of these components can be a significant burden, requiring careful attention to detail and a thorough understanding of the Kubernetes ecosystem. Fortunately, managed Kubernetes services such as Google Kubernetes Engine (GKE) simplify log collection. For example, GKE offers built-in integration with Cloud Logging, aggregating logs from all parts of the Kubernetes environment. This centralized repository is a crucial first step.
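For a sense of the volume involved, even a narrow programmatic query against those aggregated logs returns a lot of data. The sketch below uses the google-cloud-logging Python client; the project ID, cluster name, and time range are placeholders.

```python
from google.cloud import logging  # pip install google-cloud-logging

client = logging.Client(project="my-project")  # placeholder project ID

# All container logs for one GKE cluster over a one-hour window; even this
# narrow filter can return tens of thousands of entries on a busy cluster.
log_filter = (
    'resource.type="k8s_container" '
    'resource.labels.cluster_name="my-cluster" '
    'timestamp>="2025-01-01T00:00:00Z" timestamp<"2025-01-01T01:00:00Z"'
)

count = 0
for entry in client.list_entries(filter_=log_filter, page_size=1000):
    count += 1
print(f"{count} log entries in one hour for one cluster")
```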
However, simply collecting the logs solves only half the problem. The real challenge lies in analyzing them effectively. Many issues you’ll encounter in a Kubernetes deployment are not revealed by a single, obvious error message. Instead, they manifest as a chain of events, requiring a deep understanding of the causal relationships between numerous log entries across multiple components.
Consider the scale: a moderately sized Kubernetes cluster can easily generate gigabytes of log data, comprising tens of thousands of individual entries, within a short timeframe. Manually sifting through this volume of data to identify the root cause of a performance degradation, intermittent failure, or configuration error is, at best, incredibly time-consuming, and at worst, practically impossible for human operators. The signal-to-noise ratio is extremely low.
Introducing the Kubernetes History Inspector
KHI is a powerful tool that analyzes logs collected by Cloud Logging, extracts state information for each component, and visualizes it in a chronological timeline. Furthermore, KHI links this timeline back to the raw log data, allowing you to track how each element evolved over time.
The Google Cloud Support team often assists users in critical, time-sensitive situations. A tool that requires lengthy setup or agent installation would be impractical. That’s why we packaged KHI as a container image — it requires no prior setup, and is ready to be launched with a single command.
It’s easier to show than to tell. Imagine a scenario where end users are reporting “Connection Timed Out” errors on a service running on your GKE cluster. Launching KHI, you might see something like this:
First, notice the colorful, horizontal rectangles on the left. These represent the state changes of individual components over time, extracted from the logs – the timeline. This timeline provides a macroscopic view of your Kubernetes environment. In contrast, the right side of the interface displays microscopic details: raw logs, manifests, and their historical changes related to the component selected in the timeline. By providing both macroscopic and microscopic perspectives, KHI makes it easy to explore your logs.
Now, let’s go back to our hypothetical problem. Notice the alternating green and orange sections in the “Ready” row of the timeline:
This indicates that the readiness probe is fluctuating between failure (orange) and success (green). That’s a smoking gun! You now know exactly where to focus your troubleshooting efforts.
KHI also excels at visualizing the relationships between components at any given point in the past. The complex interdependencies within a Kubernetes cluster are presented in a clear, understandable way.
What’s next for KHI and Kubernetes troubleshooting
We’ve only scratched the surface of what KHI can do. There’s a lot more under the hood: how the timeline colors actually work, what those little diamond markers mean, and many other features that can speed up your troubleshooting. To make this available to everyone, we open-sourced KHI.
For detailed specifications, a full explanation of the visual elements, and instructions on how to deploy KHI on your own managed Kubernetes cluster, visit the KHI GitHub page. Currently KHI only works with GKE and Kubernetes on Google Cloud combined with Cloud Logging, but we plan to extend its capabilities to the vanilla open-source Kubernetes setup soon.
While KHI represents a significant leap forward in Kubernetes log analysis, it’s designed to amplify your existing expertise, not replace it. Effective troubleshooting still requires a solid understanding of Kubernetes concepts and your application’s architecture. KHI helps you, the engineer, navigate the complexity by providing a powerful map to view your logs to diagnose issues more quickly and efficiently.
KHI is just the first step in our ongoing commitment to simplifying Kubernetes operations. We’re excited to see how the community uses and extends KHI to build a more observable and manageable future for containerized applications. The journey to simplify Kubernetes troubleshooting is ongoing, and we invite you to join us.
Many businesses today use Software-as-a-Service (SaaS) applications, choosing them for their accessibility, scalability, and to reduce infrastructure overhead. These cloud-based tools provide immediate access to powerful functionality, allowing companies to streamline operations and focus on core business activities.
However, as companies grow and their data needs expand, they often find their SaaS data scattered across multiple applications. This is a significant hurdle, because when valuable information is siloed, it’s hard to generate a holistic view of business performance and make informed, data-driven decisions. Further, the SaaS provider landscape is fragmented — each has its own unique APIs, authentication methods, and data formats. This creates a complex integration challenge characterized by significant development effort, high maintenance costs, and potentially, security vulnerabilities.
Integrating data to establish a unified view
Salesforce Data Cloud (SFDC), a leading customer relationship management (CRM) solution, is a commonly used SaaS application that provides a comprehensive view of customer interactions, sales activities, and marketing campaigns, allowing businesses to identify trends and predict future behavior. But as enterprises accelerate their cloud modernization initiatives, organizations struggle to efficiently, reliably, and securely extract data from Salesforce.
Yet, achieving a truly unified view of that data is imperative. Consolidating SaaS data with operational data is not merely beneficial, but essential, providing the holistic perspective needed for decisive action, operational efficiency, and profound customer understanding. Consequently, the desire to leverage Salesforce data within Google Cloud for advanced analytics and generative AI is strong. However, realizing this potential is hindered by the significant complexity of the required integrations.
To help, we recently expanded Datastream, our fully managed change data capture (CDC) service, to support Salesforce as a source. Now in preview, this capability simplifies connecting to Salesforce, automatically capturing changes and delivering them to BigQuery, Cloud Storage, and other Google Cloud destinations. Datastream already supports real-time data replication from operational databases such as PostgreSQL, MySQL, SQL Server, and Oracle; by extending this support to Salesforce, customers can now easily merge their Salesforce data with other data sources to gain valuable insights.
Key benefits of Datastream
Support for Salesforce Data Cloud rounds out the Datastream offering, which provides a number of capabilities:
Better decisions and actionable intelligence: With Datastream’s low-latency replication, you can provide your business with up-to-the-minute insights from your Salesforce data.
Scalability and reliability: Datastream scales to handle large volumes of data, providing reliable replication.
Fully managed: No need to manage infrastructure or worry about maintenance, freeing your team to focus on core tasks.
Multiple authentication methods: Salesforce connectivity in Datastream supports both OAuth and username/password authentication.
Support for backfill and CDC: Datastream supports both backfill and change data capture from a Salesforce source.
Get started with Salesforce source in Datastream
Integrating Datastream with Salesforce lets your business use Salesforce CRM to gain a comprehensive view of your data. By replicating data to Google Cloud for analysis, businesses can unlock deeper insights, improve accuracy, and streamline data pipelines. Learn more in the documentation.
Today, we’re announcing built-in performance monitoring and alerts for Gemini and other managed foundation models – right from Vertex AI’s homepage.
Monitoring the performance of generative AI models is crucial when building lightning-fast, reliable, and scalable applications. But understanding the performance of these models has historically had a steep learning curve: you had to know where the metrics were stored and how to find them in the Cloud Console.
Now, these metrics are available right on Vertex AI’s home page, where you can easily find and understand the health of your models. Cloud Monitoring shows a built-in dashboard providing information about usage, latency, and error rates on your gen AI models. You can also quickly configure an alert if any requests have failed or been delayed.
How it works
If you’re using Vertex AI foundation models, you can find overview metrics for your models on the Dashboard tab in Vertex AI, and click into an out-of-the-box dashboard in Cloud Monitoring to get more detail and customize the view. There, you can better understand capacity constraints, predict costs, and troubleshoot errors. You can also configure alerts that quickly inform you about failures and their causes.
View Model Observability in Vertex AI
Configure an alert
Let’s say you’re an SRE who is responsible for ensuring the uptime of your company’s new customer service chatbot. You want to find a dashboard that gives you a bird’s eye view of possible issues with the chatbot, whether they include slowness, errors, or unexpected usage volume. Instead of hunting for the right metrics and creating a dashboard that displays them, you can now go to the Vertex Dashboard page to view high level metrics, and click “Show all metrics” to view a detailed, opinionated dashboard with information about query rates, character and token throughput, latency, and errors.
Then, let’s say that you notice that your model returned a 429 error for a number of your requests. This happens when the ML serving region associated with your model runs out of aggregate capacity across customers. You can remediate the issue by purchasing provisioned throughput, switching ML processing locations, or scheduling non-urgent requests for a less busy time using batch requests. You can also quickly turn on a recommended alert that will let you know if more than 1% of your requests return 429 errors ever again.
Get started today
If you’re a user of managed gen AI models from Vertex AI Model Garden, check out the “Model Observability” tab in your project’s Vertex Dashboard page. Click “Show all metrics” to find the built-in dashboard. To configure recommended alerts related to your gen AI workloads, check out the Vertex AI Integration in Cloud Monitoring.
For many employees, the browser has become where they spend the majority of their working day. As more work is being done on the web, IT and security teams continue to invest in enterprise browsing experiences that offer more protections for corporate data, while making it easy for employees to get work done. Chrome Enterprise has given businesses the best of both worlds—allowing workers to use the browser they are most familiar with and giving IT and security leaders the advanced protections and controls to safeguard their business.
Whether it is the foundational policies and customizations available to all businesses through Chrome Enterprise Core, or the advanced data protections and secure access capabilities available in Chrome Enterprise Premium, businesses can count on Chrome to help keep employees productive and safe.
New improvements to Chrome Enterprise are continuing to level up how employees experience their enterprise browser with better transparency around the separation of work and personal browsing. This helps build more trust from employees, offering them better visibility into how data types are treated differently by their organization when using Chrome, especially when they are using their personal devices for work. IT and security teams can also benefit from enhanced profile reporting and data protections for unmanaged devices using Chrome profiles, ideal for bring your own device (BYOD) environments.
More transparency for corporate browsers
Many organizations are using the browser as a secure endpoint, enforcing secure access to critical apps and data for employees and contractors right at the browser layer. With the browser playing a more critical role in daily work, it’s more important than ever for IT teams to make it clear to employees that they are logged into a corporate browsing experience that is managed and monitored by their company. Chrome Enterprise makes this easier to signal than ever before by now allowing organizations to customize browser profiles with their company logo.
Companies can create a branded Chrome profile experience for their users while they work on the web. This clear visual identity helps employees understand that they are working in a secure enterprise profile, distinct from their personal browsing experience. Within the enterprise profile, additional settings and controls may be in place, and employees can get more information about what their companies are managing.
Employees will see more clearly that they are in a managed browser profile, and they can go a level deeper to understand more about their work browser. This allows IT and security teams to offer more visibility to their users about the protections in place.
In upcoming releases of Chrome, even if IT teams do not customize the browser experience with their logo, employees will still see an indication that they are in a managed “Work” profile whenever their organization applies policies to the browser profile.
More streamlined sign-in experience
For businesses using Google Workspace or Google Identity, employees will see a new sign-in experience when they sign in to their Chrome profile for work. This updated experience gives users more visibility into what’s being managed and shared with their organization as soon as they sign in, and it allows them to create a separate profile for work to keep their bookmarks, history, and more distinct from their personal profile.
New Chrome Profile Reporting
While the experience for employees is improving for Chrome Profiles, we’ve recently also added more capabilities for IT teams. Now, enterprises can turn on reporting for signed-in managed users across platforms including Windows, Mac, Linux, and Android. In one streamlined view they can get critical information about browser versions, the operating system, policies, extensions and whether the device is corporate managed or personal. This is ideal for getting more visibility into BYOD or contractor scenarios.
Applying additional protections through managed profiles on unmanaged devices
Through Chrome profiles, enterprises can even enforce secure access and data protections on devices that are personally owned or unmanaged. Once a user is signed into a work Chrome profile, organizations can use Chrome Enterprise Premium to apply critical data controls and make access decisions for business apps. For example, your company can require a contractor to log into a work Chrome profile to access a CRM tool, with copy-and-paste restrictions or screenshot blocking turned on for added protection. This offers an easy and secure way to ensure company policies are enforced on both managed and unmanaged devices, right through Chrome.
Getting started with Chrome Profiles
Organizations can customize Chrome to show their organization’s logo and name today using Chrome Enterprise Core, which is available to all businesses at no additional cost. They can also manage Chrome profiles, get reporting at the profile level and get security insights. If they want to apply more advanced data protections and enforce context aware access, they can try out Chrome Enterprise Premium.
We’re thrilled to launch our cloud region in Sweden. More than just another region, it represents a significant investment in Sweden’s future and Google’s ongoing commitment to empowering businesses and individuals with the power of the cloud. This new region, our 42nd globally and 13th in Europe, opens doors to opportunities for innovation, sustainability, and growth — within Sweden and across the globe. We’re excited about the potential it holds for your digital transformations and AI aspirations.
One of Sweden’s most globally recognized companies, IKEA, worked closely with Google Cloud on this new region:
“IKEA is delighted to collaborate with Google Cloud in celebrating the new region in Sweden, underscoring our shared commitment to fostering innovation in the country. Google Cloud’s scalable and reliable infrastructure helps us to deliver a seamless shopping experience for our customers, helping us make interior design more accessible to everyone.” – Francesco Marzoni, Chief Data & Analytics Officer, IKEA Retail (Ingka Group)
Swedish audio-streaming service Spotify, one of Google Cloud’s earliest customers, is also excited to welcome a Google Cloud region in its home country:
“Over the past decade, we’ve forged a valuable partnership with Google Cloud, growing and innovating together. In a space where speed is paramount and even milliseconds matter, the new Google Cloud region in Sweden will be a catalyst for accelerating innovation for Swedish businesses and digital unicorns. We’re excited to be part of this evolution and growing cloud community.” – Tyson Singer, VP of Technology & Platforms, Spotify
Fueling Swedish innovation and growth
This new region provides Swedish businesses, organizations, and individuals with a powerful new platform for growth, powered by Google Cloud technologies like AI, machine learning, and data analytics. By offering high-performance, low-latency cloud services in Sweden, we’re enabling faster application development, richer user experiences, and enhanced business agility for customers such as Tradera, an online marketplace:
“Google Cloud’s technology has empowered Tradera to enhance customers’ selling experience, enabling them to sell more and faster. The launch of Google Cloud’s new Swedish region will empower a wider range of businesses to reap the benefits we’ve experienced and innovate at the pace we have, and that’s exciting.” – Linus Sjöberg, CTO, Tradera
This region also directly addresses data residency requirements and digital sovereignty concerns, removing key barriers to cloud adoption for many Swedish organizations. For the first time, these organizations can harness the full potential of Google Cloud’s services while maintaining control over their data’s location. We see this as a pivotal moment, unlocking possibilities and empowering Swedish ingenuity to flourish. From startups disrupting traditional industries to established enterprises undergoing digital transformation, the new region in Sweden will provide the infrastructure and tools needed to thrive in the AI era.
In fact, the Swedish government’s AI Commission recently launched its AI roadmap proposing an AI factory for the public sector to collaborate on a common AI infrastructure. This new region highlights our partnership in the Swedish AI-innovation ecosystem by strengthening the country’s AI infrastructure capabilities:
“Swedish AI infrastructure investments, such as a Swedish cloud region, strengthens Swedish AI development and enables AI innovations with data stored in Sweden.” – Martin Svensson, Director of AI Sweden, the Swedish National Center for Applied AI
Google Cloud also offers organizations in Sweden a new way to build resiliently, reduce costs, and accelerate sustainability impact through the smarter use of data and AI. Current projections indicate this region will operate at or above 99% carbon-free energy (CFE) in its first full year of operation in 2026, due to the Swedish grid’s electricity mix. We estimate our Swedish operations will have one of the highest Google CFE1 scores among all electricity grid regions where Google operates. Google also announced its first Swedish power purchase agreement (PPA) in 2013, and has since signed additional long-term agreements with clean energy developers that enabled more than 700 megawatts (MW) of onshore wind projects in the country.
Technical advantages
Beyond its local impact, the region in Sweden offers a host of technical benefits. Digital bank Nordnet looks forward to taking advantage of them:
“By partnering with Google Cloud, Nordnet has built a new, cloud-native platform that takes advantage of faster time-to-market, improved scalability, and enhanced security. Google Cloud’s new Swedish region gives us the possibility to further strengthen these benefits, enabling Nordnet to enhance its platform for savings and investments, offer exceptional customer experience, and accelerate our growth.” – Elias Lindholm, CTO, Nordnet
The new region’s technical benefits include:
High performance and low latency: Experience just milliseconds of latency to Stockholm and significantly reduced latency for users across Sweden and neighboring countries. This translates to faster application response times, smoother streaming, and enhanced online experiences, boosting productivity and user satisfaction. One of our customers, Bonnier News, exemplifies this technological edge:
“In today’s fast-paced media landscape, timely news delivery and rapid adaptation are crucial. The new Google Cloud region in Sweden offers Bonnier News the agility and speed we need to innovate and stay ahead. With faster data processing and lower latency, we can ensure our readers get the latest news and insights, whenever and wherever they need it.” – Lina Hallmer, CTO, Bonnier News
Uncompromising data sovereignty and security: This new region in Sweden benefits from our robust infrastructure, including data encryption at rest and in transit, granular data access controls, data residency, and sophisticated threat detection systems. We adhere to the highest international security and data protection standards to help ensure the confidentiality, integrity, and sovereignty of your data.
Scalability and flexibility on demand: Google Cloud’s infrastructure is designed to scale easily with your business. Whether you’re a small startup or a large corporation, you can easily adjust your resources to meet your evolving needs.
Investing in Sweden’s digital future
Google’s commitment to Sweden extends beyond this new cloud region. We’re making significant investments in the country’s digital ecosystem to foster talent development and support local communities. Initiatives include:
Exclusive launch partnerships: We’re thrilled to announce the launch of our new cloud region in collaboration with our exclusive launch partners: Devoteam and Tietoevry Tech Services. The deep engineering and consulting expertise of our launch partners helps customers quickly realize the benefits of the new region.
Local collaboration: We’re working with Swedish businesses, educational institutions, and government organizations to create a thriving cloud ecosystem. These collaborations focus on skills development, knowledge sharing, and supporting local innovation.
Looking ahead: A partnership for progress
The launch of the new cloud region in Sweden is just the first step in our journey together. We’re dedicated to ongoing investment in Sweden, partnering with local businesses and organizations to build a thriving digital future. This region will be a powerful engine for innovation and growth, empowering Swedish organizations to transform their industries, unlock new opportunities, and shape the world of tomorrow. We can’t wait to see what you create. Look for Stockholm (europe-north2) in the console to get started today! Välkommen till Google Cloud, Sweden!
1. Carbon-free energy is any type of electricity generation that doesn’t directly emit carbon dioxide, including (but not limited to) solar, wind, geothermal, hydropower, and nuclear. Sustainable biomass and carbon capture and storage (CCS) are special cases considered on a case-by-case basis, but are often also considered carbon-free energy sources. For more information on our carbon-free energy strategy and plans, see Google’s 2024 Environmental Report.
Is your legacy database sticking you with rising costs, frustrating downtime, and scalability challenges? For organizations that strive for top performance and agility, legacy database systems can become significant roadblocks to innovation.
But there’s good news. According to a new Forrester Total Economic Impact™ (TEI) study, organizations may realize significant benefits by deploying Spanner, Google Cloud’s always-on, globally consistent, multi-model database with virtually unlimited scale. What kind of benefits? We’re talking an ROI of 132% over three years, and multi-million-dollar benefits and cost savings for a representative composite organization.
Read on for more, then download the full study to see the results and learn how Spanner can help your organization increase cost savings and profit, as well as reliability and operational efficiencies.
The high cost of the status quo
Legacy, on-premises databases often come with a hefty price tag that goes far beyond initial hardware and software investments. According to the Forrester TEI study, these databases can be a burden to maintain, requiring dedicated IT staff and specialized expertise, as well as high capital expenditures and operational overhead. Outdated systems can also limit your ability to respond quickly to changing market demands and customer needs, such as demand spiking for a new game or a viral new product.
To quantify the benefits that Spanner can bring to an organization, Forrester used its TEI methodology, conducting in-depth interviews with seven leading organizations across the globe that had adopted Spanner. These organizations came from a variety of industries, including retail, financial services, software and technology, gaming, and transportation. Based on its findings, Forrester created a representative composite organization: a business-to-consumer (B2C) company with revenue of $1 billion per year, and modeled the potential financial impact of adopting Spanner.
In addition to a 132% return on investment (ROI) with a 9-month payback period, Forrester found that the composite organization also realized $7.74M in total benefits over the three years, from a variety of sources:
Cost savings from retiring an on-prem legacy database: By retiring the on-prem legacy database and transitioning to Spanner, the composite organization can save $3.8 million over three years, driven by reduced infrastructure capital expenditure, maintenance costs, and system licensing expenses.
“The system before migration was more expensive. It was the cost of the entire system including the application, database, monitoring, and everything. We paid within the $5 million to $10 million range for a mainframe, and I expect that the cost of it would almost double within the next few years. Currently, we pay 90% less for Spanner.” – Senior Principal Architect at a software and technology organization
Profit retention and cost savings from reduced unplanned downtime: Prior to adopting Spanner, organizations suffered unplanned database downtime triggered by technical malfunctions, human errors, data integration issues, or natural disasters. With up to 99.999% availability, Spanner virtually eliminates unplanned downtime. Forrester calculates that the composite organization achieves $1.2 million in cost savings and profit retention due to reduced unplanned downtime.
“In the last seven years since we migrated to Spanner, the total number of failures caused by Spanner is zero. Prior to Spanner, some sort of problem would occur about once a month including a major problem once a year.” – Tech Lead, gaming organization
Cost savings from reduced overprovisioning for peak usage: With on-prem database systems, long infrastructure procurement cycles and large up-front expenditures mean that organizations typically provision for peak usage — even if that means they are over-provisioned most of the time. Spanner’s elastic scalability allows organizations to start small and scale up and down effortlessly as usage changes. Databases can scale up for however long you need, and then down again, cutting costs and the need to predict usage. For the composite organization, this results in cost savings of $1 million over three years.
“The number of transactions we are able to achieve is one of the main reasons that we use Spanner. Additionally, Spanner is highly consistent, and we save on the number of engineers needed for managing our databases.” – Head of SRE, DevOps, and Infrastructure, financial services organization
Efficiencies gained in onboarding new applications: Spanner accelerates development of new applications by eliminating the need to preplan resources. This resulted in an 80% reduction in time to onboard new applications and $981,000 in cost savings for the composite organization.
Beyond the numbers
Beyond the quantifiable ROI, the Forrester TEI study highlights unquantified benefits that amplify Spanner’s value. These include:
Improved budget predictability, as Spanner shifts expenditures from capex to opex, enabling more effective resource allocation and forecasting.
Greater testing and deployment flexibility, allowing software development engineers to rapidly scale development environments for testing, conduct thorough load tests, and quickly shut down resources.
Expert Google Cloud customer service, providing helpful guidance to maximize Spanner’s benefits.
“The Spanner team are experts. They have a deep understanding of the product they’ve built with deep insights on how we’re using the product if we ask them.” – Head of Engineering, financial services organization
An innovation-friendly architecture, facilitating the design and implementation of new business capabilities and expansion, improving automation and customer satisfaction, all without incurring downtime.
Together, these strategic advantages contribute to organizational agility and long-term success.
Unlock the potential of your data with Spanner
We believe the Forrester TEI study clearly demonstrates that Spanner is more than just a database; it’s a catalyst for business transformation. By eliminating the constraints of legacy systems, Spanner empowers organizations to achieve significant cost savings, improve operational efficiencies, and unlock new levels of innovation. Are you ready to transform your data infrastructure and unlock your organization’s full potential?
It’s indisputable. Over just a few short years, AI and machine learning have redefined day-to-day operations across the federal government, from vital public agencies to federally funded research NGOs to specialized departments within the military, delivering results and positively serving the public good. We stand at a pivotal moment in time, a New Era of American Innovation, where AI is reshaping every aspect of our lives.
At Google, we recognize the immense potential of this moment, and we’re deeply invested in ensuring that this innovation benefits all Americans. Our commitment goes beyond simply developing cutting-edge technology. We’re focused on building a stronger and safer America.
Let’s take a closer look at just a few examples of AI-powered innovations and the transformative impact they are having across agencies.
The National Archives and Records Administration (NARA) serves as the U.S. Government’s central recordkeeper, digitizing and cataloging billions of federal documents and other historical records at the National Archives, starting with the original Constitution and Declaration of Independence. As the sheer volume of these materials inevitably grows over time, NARA’s mission includes leveraging new technologies to expand, yet simplify, public access for novice info-seekers and seasoned researchers alike.
Sifting through NARA’s massive repositories traditionally required some degree of detective work—often weaving archival terminology into complex manual queries. As part of a 2023 initiative to improve core operations, NARA incorporated Google Cloud’s Vertex AI and Gemini into their searchable database, creating an advanced level of intuitive AI-powered semantic search. This allowed NARA to more accurately interpret a user’s context and intent behind queries, leading to faster and more relevant results.
The Aerospace Corporation is a federally funded nonprofit dedicated to exploring and solving challenges within humankind’s “space enterprise.” Their important work extends to monitoring space weather, such as solar flares, geomagnetic storms, and other cosmic anomalies, which can affect orbiting satellites as well as communications systems and power grids back on Earth. The Aerospace Corporation partnered with Google Public Sector to revolutionize space weather forecasting using AI. This collaboration leverages Google Cloud’s AI and machine learning capabilities to improve the accuracy and timeliness of space weather predictions, and better safeguard critical infrastructure and national security from the impacts of space weather events.
The Air Force Research Laboratory (AFRL) leads the U.S. Air Force’s development and deployment of new strategic technologies to defend air, space and cyberspace. AFRL partnered with Google Cloud to integrate AI and machine learning into key areas of research, such as bioinformatics, web application efficiency, human performance, and streamlined AI-based data modeling. By leveraging Google App Engine, BigQuery, and Vertex AI, AFRL has accelerated and improved performance of its research and development platforms while aligning with broader Department of Defense initiatives to adopt and integrate leading-edge AI technologies.
Google’s AI innovations are truly powering the next wave of transformation and mission impact across the public sector—from transforming how we access our history, to understanding the cosmos, to strengthening national defense back on Earth, with even more promise on the horizon.
At Google Public Sector, we’re passionate about supporting your mission. Learn more about how Google’s AI solutions can empower your agency and hear more about how we are accelerating mission impact with AI by joining us at Google Cloud Next 25 in Las Vegas.
As AI use increases, security remains a top concern, and we often hear that organizations are worried about risks that can come with rapid adoption. Google Cloud is committed to helping our customers confidently build and deploy AI in a secure, compliant, and private manner.
Today, we’re introducing a new solution that can help you mitigate risk throughout the AI lifecycle. We are excited to announce AI Protection, a set of capabilities designed to safeguard AI workloads and data across clouds and models — irrespective of the platforms you choose to use.
AI Protection helps teams comprehensively manage AI risk by:
Discovering AI inventory in your environment and assessing it for potential vulnerabilities
Securing AI assets with controls, policies, and guardrails
Managing threats against AI systems with detection, investigation, and response capabilities
AI Protection is integrated with Security Command Center (SCC), our multicloud risk-management platform, so that security teams can get a centralized view of their AI posture and manage AI risks holistically in context with their other cloud risks.
AI Protection helps organizations discover AI inventory, secure AI assets, and manage AI threats, and is integrated with Security Command Center.
Discovering AI inventory
Effective AI risk management begins with a comprehensive understanding of where and how AI is used within your environment. Our capabilities help you automatically discover and catalog AI assets, including the use of models, applications, and data — and their relationships.
Understanding what data supports AI applications and how it’s currently protected is paramount. Sensitive Data Protection (SDP) now extends automated data discovery to Vertex AI datasets to help you understand data sensitivity and data types that make up training and tuning data. It can also generate data profiles that provide deeper insight into the type and sensitivity of your training data.
Once you know where sensitive data exists, AI Protection can use Security Command Center’s virtual red teaming to identify AI-related toxic combinations and potential paths that threat actors could take to compromise this critical data, and recommend steps to remediate vulnerabilities and make posture adjustments.
Securing AI assets
Model Armor, a core capability of AI Protection, is now generally available. It guards against prompt injection, jailbreak, data loss, malicious URLs, and offensive content. Model Armor can support a broad range of models across multiple clouds, so customers get consistent protection for the models and platforms they want to use — even if that changes in the future.
Model Armor provides multi-model, multicloud support for generative AI applications.
Today, developers can easily integrate Model Armor’s prompt and response screening into applications using a REST API or through an integration with Apigee. The ability to deploy Model Armor in-line without making any app changes is coming soon through integrations with Vertex AI and our Cloud Networking products.
“We are using Model Armor not only because it provides robust protection against prompt injections, jailbreaks, and sensitive data leaks, but also because we’re getting a unified security posture from Security Command Center. We can quickly identify, prioritize, and respond to potential vulnerabilities — without impacting the experience of our development teams or the apps themselves. We view Model Armor as critical to safeguarding our AI applications and being able to centralize the monitoring of AI security threats alongside our other security findings within SCC is a game-changer,” said Jay DePaul, chief cybersecurity and technology risk officer, Dun & Bradstreet.
Organizations can use AI Protection to strengthen the security of Vertex AI applications by applying postures in Security Command Center. These posture controls, designed with first-party knowledge of the Vertex AI architecture, define secure resource configurations and help organizations prevent drift or unauthorized changes.
Managing AI threats
AI Protection operationalizes security intelligence and research from Google and Mandiant to help defend your AI systems. Detectors in Security Command Center can be used to uncover initial access attempts, privilege escalation, and persistence attempts against AI workloads. New AI Protection detectors, based on the latest frontline intelligence, are coming soon to help identify and manage runtime threats such as foundation model hijacking.
“As AI-driven solutions become increasingly commonplace, securing AI systems is paramount and surpasses basic data protection. AI security — by its nature — necessitates a holistic strategy that includes model integrity, data provenance, compliance, and robust governance,” said Dr. Grace Trinidad, research director, IDC.
“Piecemeal solutions can leave and have left critical vulnerabilities exposed, rendering organizations susceptible to threats like adversarial attacks or data poisoning, and added to the overwhelm experienced by security teams. A comprehensive, lifecycle-focused approach allows organizations to effectively mitigate the multi-faceted risks surfaced by generative AI, as well as manage increasingly expanding security workloads. By taking a holistic approach to AI protection, Google Cloud simplifies and thus improves the experience of securing AI for customers,” she said.
Complement AI Protection with frontline expertise
The Mandiant AI Security Consulting Portfolio offers services to help organizations assess and implement robust security measures for AI systems across clouds and platforms. Consultants can evaluate the end-to-end security of AI implementations and recommend opportunities to harden AI systems. We also provide red teaming for AI, informed by the latest attacks on AI services seen in frontline engagements.
Building on a secure foundation
Customers can also benefit from using Google Cloud’s infrastructure for building and running AI workloads. Our secure-by-design, secure-by-default cloud platform is built with multiple layers of safeguards, encryption, and rigorous software supply chain controls.
For customers whose AI workloads are subject to regulation, we offer Assured Workloads to easily create controlled environments with strict policy guardrails that enforce controls such as data residency and customer-managed encryption. Audit Manager can produce evidence of regulatory and emerging AI standards compliance. Confidential Computing can help ensure data remains protected throughout the entire processing pipeline, reducing the risk of unauthorized access, even by privileged users or malicious actors within the system.
Additionally, for organizations looking to discover unsanctioned use of AI, or shadow AI, in their workforce, Chrome Enterprise Premium can provide visibility into end-user activity as well as prevent accidental and intentional exfiltration of sensitive data in gen AI applications.
Next steps
Google Cloud is committed to helping your organization protect its AI innovations. Read more in this showcase paper from Enterprise Strategy Group and attend our upcoming online Security Talks event on March 12.
To evaluate AI Protection in Security Command Center and explore subscription options, please contact a Google Cloud sales representative or authorized Google Cloud partner.
More exciting capabilities are coming soon, and we will share in-depth details on AI Protection and how Google Cloud can help you securely develop and deploy AI solutions at Google Cloud Next in Las Vegas, April 9 to 11.
In our day-to-day work, the FLARE team often encounters malware written in Go that is protected using garble. While recent advancements in Go analysis from tools like IDA Pro have simplified the analysis process, garble presents a set of unique challenges, including stripped binaries, function name mangling, and encrypted strings.
Garble’s string encryption, while relatively straightforward, significantly hinders static analysis. In this blog post, we’ll detail garble’s string transformations and the process of automatically deobfuscating them.
We’re also introducing GoStringUngarbler, a command-line tool written in Python that automatically decrypts strings found in garble-obfuscated Go binaries. This tool can streamline the reverse engineering process by producing a deobfuscated binary with all strings recovered and shown in plain text, thereby simplifying static analysis, malware detection, and classification.
Before detailing the GoStringUngarbler tool, we want to briefly explain how the garble compiler modifies the build process of Go binaries. By wrapping around the official Go compiler, garble performs transformations on the source code during compilation through Abstract Syntax Tree (AST) manipulation using Go’s go/ast library. Here, the obfuscating compiler modifies program elements to obfuscate the produced binary while preserving the semantic integrity of the program. Once transformed by garble, the program’s AST is fed back into the Go compilation pipeline, producing an executable that is harder to reverse engineer and analyze statically.
While garble can apply a variety of transformations to the source code, this blog post will focus on its “literal” transformations. When garble is executed with the -literals flag, it transforms all literal strings in the source code and imported Go libraries into an obfuscated form. Each string is encoded and wrapped behind a decrypting function, thwarting static string analysis.
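Conceptually, every literal ends up hidden behind a small runtime decoder. The following hand-written Go sketch illustrates the general shape (it is not garble’s actual output, and the byte values are purely illustrative):

package main

import "fmt"

func main() {
	// Before obfuscation, the source would simply contain: s := "secret"
	// After a -literals build, the literal is replaced by a wrapper that
	// decodes it at runtime.
	s := func() string {
		data := []byte{0x2a, 0x10, 0x17, 0x06, 0x0c, 0x1c} // encoded bytes
		key := []byte{0x59, 0x75, 0x74, 0x74, 0x69, 0x68}  // per-string key
		for i := range data {
			data[i] ^= key[i] // one of several reversible operators
		}
		return string(data)
	}()
	fmt.Println(s) // prints "secret"
}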
For each string, the obfuscating compiler can randomly apply one of the following literal transformations. We’ll explore each in greater detail in subsequent sections.
Stack transformation: This method applies runtime encoding to strings stored directly on the stack.
Seed transformation: This method employs a dynamic seed-based encryption mechanism where the seed value evolves with each encrypted byte, creating a chain of interdependent encryption operations.
Split transformation: This method fragments the encrypted strings into multiple chunks, each to be decrypted independently in a block of a main switch statement.
Stack Transformation
The stack transformation in garble implements runtime encryption techniques that operate directly on the stack, using three distinct transformation types: simple, swap, and shuffle. These names are taken directly from garble’s source code. All three perform cryptographic operations with the string residing on the stack, but each differs in complexity and approach to data manipulation.
Simple transformation: This transformation applies byte-by-byte encoding using a randomly generated mathematical operator and a randomly generated key of equal length to the input string.
Swap transformation: This transformation applies a combination of byte-pair swapping and position-dependent encoding, where pairs of bytes are shuffled and encrypted using dynamically generated local keys.
Shuffle transformation: This transformation applies multiple layers of encryption by encoding the data with random keys, interleaving the encrypted data with its keys, and applying a permutation with XOR-based index mapping to scatter the encrypted data and keys throughout the final output.
Simple Transformation
This transformation implements a straightforward byte-level encoding scheme at the AST level. Figure 1 shows the implementation from the garble repository; in this and subsequent code samples taken from the repository, comments were added by the author for readability.
// Generate a random key with the same length as the input string
key := make([]byte, len(data))
// Fill the key with random bytes
obfRand.Read(key)
// Select a random operator (XOR, ADD, SUB) to be used for encryption
op := randOperator(obfRand)
// Encrypt each byte of the data with the key using the random operator
for i, b := range key {
data[i] = evalOperator(op, data[i], b)
}
Figure 1: Simple transformation implementation
The obfuscator begins by generating a random key of equal length to the input string. It then randomly selects a reversible arithmetic operator (XOR, addition, or subtraction) that will be used throughout the encoding process.
The obfuscation is performed by iterating through the data and key bytes simultaneously, applying the chosen operator between each corresponding pair to produce the encoded output.
Figure 2 shows the IDA-decompiled code of a decrypting subroutine for this transformation type.
Figure 2: Decompiled code of a simple transformation decrypting subroutine
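For reference, the computation such a subroutine performs can be expressed in a few lines of Go. This is a hand-written sketch, assuming the randomly chosen operator was ADD (so decryption subtracts); the byte values are illustrative rather than taken from a real sample.

package main

import "fmt"

// decryptSimple reverses the simple transformation: it walks the encoded
// bytes and applies the inverse of the chosen operator with the embedded key.
func decryptSimple(data, key []byte) string {
	out := make([]byte, len(data))
	for i := range data {
		out[i] = data[i] - key[i] // inverse of ADD; XOR would be its own inverse
	}
	return string(out)
}

func main() {
	// Illustrative values: "go" encoded with ADD and key {0x11, 0x22}.
	data := []byte{0x78, 0x91}
	key := []byte{0x11, 0x22}
	fmt.Println(decryptSimple(data, key)) // prints "go"
}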
Swap Transformation
// Determines how many swap operations to perform based on data length
func generateSwapCount(obfRand *mathrand.Rand, dataLen int) int {
// Start with number of swaps equal to data length
swapCount := dataLen
// Calculate maximum additional swaps (half of data length)
maxExtraPositions := dataLen / 2
// Add a random amount if we can add extra positions
if maxExtraPositions > 1 {
swapCount += obfRand.Intn(maxExtraPositions)
}
// Ensure swap count is even by incrementing if odd
if swapCount%2 != 0 {
swapCount++
}
return swapCount
}
func (swap) obfuscate(obfRand *mathrand.Rand, data []byte) *ast.BlockStmt {
// Generate number of swap operations to perform
swapCount := generateSwapCount(obfRand, len(data))
// Generate a random shift key
shiftKey := byte(obfRand.Uint32())
// Select a random reversible operator for encryption
op := randOperator(obfRand)
// Generate list of random positions for swapping bytes
positions := genRandIntSlice(obfRand, len(data), swapCount)
// Process pairs of positions in reverse order
for i := len(positions) - 2; i >= 0; i -= 2 {
// Generate a position-dependent local key for each pair
localKey := byte(i) + byte(positions[i]^positions[i+1]) + shiftKey
// Perform swap and encryption:
// - Swap positions[i] and positions[i+1]
// - Encrypt the byte at each position with the local key
data[positions[i]], data[positions[i+1]] = evalOperator(op,
data[positions[i+1]], localKey), evalOperator(op, data[positions[i]],
localKey)
}
...
Figure 3: Swap transformation implementation
The transformation begins by determining an even number of swap operations, based on the data length plus a random number of additional positions (limited to half the data length). The compiler then generates a list of random swap positions of this length.
The core obfuscation process operates by iterating through pairs of positions in reverse order, performing both a swap operation and encryption on each pair. For each iteration, it generates a position-dependent local encryption key by combining the iteration index, the XOR result of the current position pair, and a random shift key. This local key is then used to encrypt the swapped bytes with a randomly selected reversible operator.
Figure 4 shows the IDA-decompiled code of a decrypting subroutine for the swap transformation.
Figure 4: Decompiled code of a swap transformation decrypting subroutine
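The inverse of this transformation simply walks the same position pairs in the opposite order, rebuilding each local key and undoing the swap. The following hand-written Go sketch models both directions, assuming XOR as the operator and illustrative positions:

package main

import "fmt"

// encryptSwap mirrors the swap transformation with XOR as the operator:
// pairs of positions are processed in reverse order, swapped, and encoded
// with a position-dependent local key.
func encryptSwap(data, positions []byte, shiftKey byte) {
	for i := len(positions) - 2; i >= 0; i -= 2 {
		localKey := byte(i) + (positions[i] ^ positions[i+1]) + shiftKey
		a, b := positions[i], positions[i+1]
		data[a], data[b] = data[b]^localKey, data[a]^localKey
	}
}

// decryptSwap is the inverse: same local keys, opposite iteration order.
func decryptSwap(data, positions []byte, shiftKey byte) {
	for i := 0; i < len(positions); i += 2 {
		localKey := byte(i) + (positions[i] ^ positions[i+1]) + shiftKey
		a, b := positions[i], positions[i+1]
		data[a], data[b] = data[b]^localKey, data[a]^localKey
	}
}

func main() {
	data := []byte("garble")
	positions := []byte{0, 3, 2, 5, 1, 4} // illustrative swap positions
	encryptSwap(data, positions, 0x5a)
	decryptSwap(data, positions, 0x5a)
	fmt.Println(string(data)) // prints "garble"
}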
Shuffle Transformation
The shuffle transformation is the most complicated of the three stack transformation types. Here, garble applies its obfuscation by encrypting the original string with random keys, interleaving the encrypted data with its keys, and scattering the encrypted data and keys throughout the final output. Figure 5 shows the implementation from the garble repository.
// Generate a random key with the same length as the original string
key := make([]byte, len(data))
obfRand.Read(key)
// Constants for the index key size bounds
const (
minIdxKeySize = 2
maxIdxKeySize = 16
)
// Initialize index key size to minimum value
idxKeySize := minIdxKeySize
// Potentially increase index key size based on input data length
if tmp := obfRand.Intn(len(data)); tmp > idxKeySize {
idxKeySize = tmp
}
// Cap index key size at maximum value
if idxKeySize > maxIdxKeySize {
idxKeySize = maxIdxKeySize
}
// Generate a secondary key (index key) for index scrambling
idxKey := make([]byte, idxKeySize)
obfRand.Read(idxKey)
// Create a buffer that will hold both the encrypted data and the key
fullData := make([]byte, len(data)+len(key))
// Generate random operators for each position in the full data buffer
operators := make([]token.Token, len(fullData))
for i := range operators {
operators[i] = randOperator(obfRand)
}
// Encrypt data and store it with its corresponding key
// First half contains encrypted data, second half contains the key
for i, b := range key {
fullData[i], fullData[i+len(data)] = evalOperator(operators[i],
data[i], b), b
}
// Generate a random permutation of indices
shuffledIdxs := obfRand.Perm(len(fullData))
// Apply the permutation to scatter encrypted data and keys
shuffledFullData := make([]byte, len(fullData))
for i, b := range fullData {
shuffledFullData[shuffledIdxs[i]] = b
}
// Prepare AST expressions for decryption
args := []ast.Expr{ast.NewIdent("data")}
for i := range data {
// Select a random byte from the index key
keyIdx := obfRand.Intn(idxKeySize)
k := int(idxKey[keyIdx])
// Build AST expression for decryption:
// 1. Uses XOR with the index key to find the real positions of the data and key
// 2. Applies the reverse operator to decrypt the data using the corresponding key
args = append(args, operatorToReversedBinaryExpr(
operators[i],
// Access encrypted data using the XOR-ed index
ah.IndexExpr("fullData", &ast.BinaryExpr{
X: ah.IntLit(shuffledIdxs[i] ^ k), Op: token.XOR,
Y: ah.CallExprByName("int", ah.IndexExpr("idxKey", ah.IntLit(keyIdx)))}),
// Access corresponding key using the XOR-ed index
ah.IndexExpr("fullData", &ast.BinaryExpr{
X: ah.IntLit(shuffledIdxs[len(data)+i] ^ k), Op: token.XOR,
Y: ah.CallExprByName("int", ah.IndexExpr("idxKey", ah.IntLit(keyIdx)))}),
))
}
Figure 5: Shuffle transformation implementation
Garble begins by generating two types of keys: a primary key of equal length to the input string for data encryption and a smaller index key (between two and 16 bytes) for index scrambling. The transformation process then occurs in the following four steps:
Initial encryption: Each byte of the input data is encrypted using a randomly generated reversible operator with its corresponding key byte.
Data interleaving: The encrypted data and key bytes are combined into a single buffer, with encrypted data in the first half and corresponding keys in the second half.
Index permutation: The key-data buffer undergoes a random permutation, scattering both the encrypted data and keys throughout the buffer.
Index encryption: Access to the permuted data is further obfuscated by XOR-ing the permuted indices with randomly selected bytes from the index key.
Figure 6 shows the IDA-decompiled code of a decrypting subroutine for the shuffle transformation.
Figure 6: Decompiled code of a shuffle transformation decrypting subroutine
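Stripped of the AST plumbing, the generated decryptor recovers each plaintext byte by locating the scattered ciphertext byte and its matching key byte through the permutation, then applying the inverse operator. The following is a simplified hand-written Go sketch of that recovery, assuming XOR throughout; the index-key masking, which cancels out at runtime, is omitted for clarity.

package main

import "fmt"

// decryptShuffle reverses the shuffle transformation: shuffled holds the
// interleaved ciphertext and key bytes after permutation, perm maps each
// original slot to its scattered position, and n is the plaintext length.
func decryptShuffle(shuffled []byte, perm []int, n int) string {
	out := make([]byte, n)
	for i := 0; i < n; i++ {
		enc := shuffled[perm[i]]   // scattered ciphertext byte
		key := shuffled[perm[n+i]] // its matching key byte
		out[i] = enc ^ key         // inverse of the chosen operator (XOR here)
	}
	return string(out)
}

func main() {
	// Illustrative 2-byte example: plaintext "go", key {0x10, 0x20}.
	// fullData before permutation: [g^0x10, o^0x20, 0x10, 0x20];
	// perm scatters it so that fullData[j] lands at position perm[j].
	perm := []int{2, 0, 3, 1}
	fullData := []byte{'g' ^ 0x10, 'o' ^ 0x20, 0x10, 0x20}
	shuffled := make([]byte, len(fullData))
	for j, b := range fullData {
		shuffled[perm[j]] = b
	}
	fmt.Println(decryptShuffle(shuffled, perm, 2)) // prints "go"
}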
Seed Transformation
The seed transformation implements a chained encoding scheme where each byte’s encryption depends on the previous encryptions through a continuously updated seed value. Figure 7 shows the implementation from the garble repository.
// Generate random initial seed value
seed := byte(obfRand.Uint32())
// Store original seed for later use in decryption
originalSeed := seed
// Select a random reversible operator for encryption
op := randOperator(obfRand)
var callExpr *ast.CallExpr
// Encrypt each byte while building chain of function calls
for i, b := range data {
// Encrypt current byte using current seed value
encB := evalOperator(op, b, seed)
// Update seed by adding encrypted byte
seed += encB
if i == 0 {
// Start function call chain with first encrypted byte
callExpr = ah.CallExpr(ast.NewIdent("fnc"), ah.IntLit(int(encB)))
} else {
// Add subsequent encrypted bytes to function call chain
callExpr = ah.CallExpr(callExpr, ah.IntLit(int(encB)))
}
}
...
Figure 7: Seed transformation implementation
Garble begins by randomly generating a seed value to be used for encryption. As the compiler iterates through the input string, each byte is encrypted by applying the random operator with the current seed, and the seed is updated by adding the encrypted byte. In this seed transformation, each byte’s encryption depends on the result of the previous one, creating a chain of dependencies through the continuously updated seed.
In the decryption setup, as shown in the IDA-decompiled code in Figure 8, the obfuscator generates a chain of calls to a decrypting function. For each encrypted byte, starting with the first one, the decrypting function applies the operator to decrypt it with the current seed and then updates the seed by adding the encrypted byte to it. Because of this setup, subroutines of this transformation type are easily recognizable in the decompiler and disassembly views due to the multiple function calls they make in the decryption process.
Figure 8: Decompiled code of a seed transformation decrypting subroutine
Figure 9: Disassembled code of a seed transformation decrypting subroutine
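The chained computation that the decrypting function performs can be summarized in a few lines of Go. This is a hand-written equivalent assuming XOR as the operator, not code lifted from a sample:

package main

import "fmt"

// decryptSeed reverses the seed transformation: each plaintext byte is
// recovered with the current seed, and the seed is then advanced by adding
// the encrypted byte, reproducing the chain built at obfuscation time.
func decryptSeed(encrypted []byte, seed byte) string {
	out := make([]byte, len(encrypted))
	for i, encB := range encrypted {
		out[i] = encB ^ seed // inverse of the chosen operator (XOR here)
		seed += encB         // seed update depends on the ciphertext byte
	}
	return string(out)
}

func main() {
	// Illustrative round-trip for "hi" with initial seed 0x42.
	seed := byte(0x42)
	plain := []byte("hi")
	enc := make([]byte, len(plain))
	s := seed
	for i, b := range plain {
		enc[i] = b ^ s
		s += enc[i]
	}
	fmt.Println(decryptSeed(enc, seed)) // prints "hi"
}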
Split Transformation
The split transformation is one of the more sophisticated string transformation techniques used by garble, implementing a multilayered approach that combines data fragmentation, encryption, and control flow manipulation. Figure 10 shows the implementation from the garble repository.
func (split) obfuscate(obfRand *mathrand.Rand, data []byte) *ast.BlockStmt {
var chunks [][]byte
// For small input, split into single bytes
// This ensures even small payloads get sufficient obfuscation
if len(data)/maxChunkSize < minCaseCount {
chunks = splitIntoOneByteChunks(data)
} else {
chunks = splitIntoRandomChunks(obfRand, data)
}
// Generate random indexes for all chunks plus two special cases:
// - One for the final decryption operation
// - One for the exit condition
indexes := obfRand.Perm(len(chunks) + 2)
// Initialize the decryption key with a random value
decryptKeyInitial := byte(obfRand.Uint32())
decryptKey := decryptKeyInitial
// Calculate the final decryption key by XORing it with position-dependent values
for i, index := range indexes[:len(indexes)-1] {
decryptKey ^= byte(index * i)
}
// Select a random reversible operator for encryption
op := randOperator(obfRand)
// Encrypt all data chunks using the selected operator and key
encryptChunks(chunks, op, decryptKey)
// Get special indexes for decrypt and exit states
decryptIndex := indexes[len(indexes)-2]
exitIndex := indexes[len(indexes)-1]
// Create the decrypt case that reassembles the data
switchCases := []ast.Stmt{&ast.CaseClause{
List: []ast.Expr{ah.IntLit(decryptIndex)},
Body: shuffleStmts(obfRand,
// Exit case: Set next state to exit
&ast.AssignStmt{
Lhs: []ast.Expr{ast.NewIdent("i")},
Tok: token.ASSIGN,
Rhs: []ast.Expr{ah.IntLit(exitIndex)},
},
// Iterate through the assembled data and decrypt each byte
&ast.RangeStmt{
Key: ast.NewIdent("y"),
Tok: token.DEFINE,
X: ast.NewIdent("data"),
Body: ah.BlockStmt(&ast.AssignStmt{
Lhs: []ast.Expr{ah.IndexExpr("data", ast.NewIdent("y"))},
Tok: token.ASSIGN,
Rhs: []ast.Expr{
// Apply the reverse of the encryption operation
operatorToReversedBinaryExpr(
op,
ah.IndexExpr("data", ast.NewIdent("y")),
// XOR with position-dependent key
ah.CallExpr(ast.NewIdent("byte"), &ast.BinaryExpr{
X: ast.NewIdent("decryptKey"),
Op: token.XOR,
Y: ast.NewIdent("y"),
}),
),
},
}),
},
),
}}
// Create switch cases for each chunk of data
for i := range chunks {
index := indexes[i]
nextIndex := indexes[i+1]
chunk := chunks[i]
appendCallExpr := &ast.CallExpr{
Fun: ast.NewIdent("append"),
Args: []ast.Expr{ast.NewIdent("data")},
}
...
// Create switch case for this chunk
switchCases = append(switchCases, &ast.CaseClause{
List: []ast.Expr{ah.IntLit(index)},
Body: shuffleStmts(obfRand,
// Set next state
&ast.AssignStmt{
Lhs: []ast.Expr{ast.NewIdent("i")},
Tok: token.ASSIGN,
Rhs: []ast.Expr{ah.IntLit(nextIndex)},
},
// Append this chunk to the collected data
&ast.AssignStmt{
Lhs: []ast.Expr{ast.NewIdent("data")},
Tok: token.ASSIGN,
Rhs: []ast.Expr{appendCallExpr},
},
),
})
}
// Final block creates the state machine loop structure
return ah.BlockStmt(
...
// Update decrypt key based on current state and counter
Body: ah.BlockStmt(
&ast.AssignStmt{
Lhs: []ast.Expr{ast.NewIdent("decryptKey")},
Tok: token.XOR_ASSIGN,
Rhs: []ast.Expr{
&ast.BinaryExpr{
X: ast.NewIdent("i"),
Op: token.MUL,
Y: ast.NewIdent("counter"),
},
},
},
// Main switch statement as the core of the state machine
&ast.SwitchStmt{
Tag: ast.NewIdent("i"),
Body: ah.BlockStmt(shuffleStmts(obfRand, switchCases...)...),
}),
Figure 10: Split transformation implementation
The transformation begins by splitting the input string into chunks of varying sizes. Shorter strings are broken into individual bytes, while longer strings are divided into random-sized chunks of up to four bytes.
The transformation then constructs a decrypting mechanism using a switch-based control flow pattern. Rather than processing chunks sequentially, the compiler generates a randomized execution order through a series of switch cases. Each case handles a specific chunk of data, encrypting it with a position-dependent key derived from both the chunk’s position and a global encryption key.
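The following Python sketch models this state machine under simplifying assumptions (XOR as the operator, chunks of one to four bytes, and a plain while loop in place of the generated switch); it mirrors the structure of the Go snippet above but is not garble's emitted code.
import random

def split_round_trip(data: bytes, seed: int = 0) -> bytes:
    rng = random.Random(seed)

    # Fragment the input into chunks of one to four bytes.
    chunks, pos = [], 0
    while pos < len(data):
        size = rng.randrange(1, 5)
        chunks.append(data[pos:pos + size])
        pos += size

    # Random state order: one state per chunk, plus a decrypt state and an exit state.
    indexes = rng.sample(range(len(chunks) + 2), len(chunks) + 2)
    decrypt_state, exit_state = indexes[-2], indexes[-1]

    # Precompute the key the final pass will have accumulated at runtime.
    key = rng.randrange(256)
    runtime_start_key = key
    for counter, state in enumerate(indexes[:-1]):
        key ^= (state * counter) & 0xFF

    # Encrypt every byte with a position-dependent key (XOR stands in for the operator).
    flat = b"".join(chunks)
    enc = bytes(b ^ ((key ^ y) & 0xFF) for y, b in enumerate(flat))
    enc_chunks, pos = [], 0
    for c in chunks:
        enc_chunks.append(enc[pos:pos + len(c)])
        pos += len(c)

    # What the emitted state machine does at runtime.
    state, counter, buf, k = indexes[0], 0, bytearray(), runtime_start_key
    while state != exit_state:
        k ^= (state * counter) & 0xFF              # key evolves with the execution path
        if state == decrypt_state:                 # final case: XOR-decrypt the reassembled buffer
            for y in range(len(buf)):
                buf[y] ^= (k ^ y) & 0xFF
            state = exit_state
        else:                                      # chunk case: append its fragment, jump to the next state
            i = indexes.index(state)
            buf += enc_chunks[i]
            state = indexes[i + 1]
        counter += 1
    return bytes(buf)

assert split_round_trip(b"garble split transformation") == b"garble split transformation"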
In the decryption setup, as shown in the IDA decompiled code in Figure 11, the obfuscator first collects the encrypted data by stepping through the chunk cases in their intended order. In the final switch case, the generated code performs a final pass that XOR-decrypts the assembled buffer, using a continuously updated key that depends on both the byte position and the execution path taken through the switch statement.
Figure 11: Decompiled code of a split transformation decrypting subroutine
GoStringUngarbler: Automatic String Deobfuscator
To systematically approach string decryption automation, we first consider how this can be done manually. From our experience, the most efficient manual approach leverages dynamic analysis through a debugger. Upon finding a decrypting subroutine, we can manipulate the program counter to target the subroutine’s entry point, execute until the ret instruction, and extract the decrypted string from the return buffer.
To perform this process automatically, the primary challenge lies in identifying all decrypting subroutines introduced by garble’s transformations. Our analysis revealed a consistent pattern—decrypted strings are always processed through Go’s runtime_slicebytetostring function before being returned by the decrypting subroutine. This observation provides a reliable anchor point, allowing us to construct regular expression (regex) patterns to automatically detect these subroutines.
String Encryption Subroutine Patterns
Through analyzing the disassembled code, we have identified consistent instruction patterns for each string transformation variant. For each transformation on 64-bit binaries, rbx stores the pointer to the decrypted string and rcx holds its length. The main difference between the transformations is how these two registers are populated before the call to runtime_slicebytetostring.
Figure 12: Epilogue patterns of garble's decrypting subroutines
Through the assembly patterns in Figure 12, we develop regex patterns corresponding to each of garble’s transformation types, which allows us to automatically identify string decrypting subroutines with high precision.
To extract the decrypted string, we must find the subroutine's prologue and perform instruction-level emulation from this entry point until runtime_slicebytetostring is called. For binaries built with Go versions 1.21 through 1.23, we observe two main instruction patterns in the subroutine prologue that perform the Go stack check.
Figure 13: Prologue instruction patterns of Go subroutines
These instruction patterns in the Go prologue serve as reliable entry point markers for emulation. The implementation in GoStringUngarbler leverages these structural patterns to establish reliable execution contexts for the unicorn emulation engine, ensuring accurate string recovery across various garble string transformations.
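A heavily simplified sketch of that emulation step is shown below. It assumes the code bytes, the prologue offset, and the offset of the call to runtime_slicebytetostring have already been located by the regex matching described above, and that the binary targets the Go 1.17+ amd64 register ABI (g pointer in r14); the mapping addresses and helper names are illustrative rather than GoStringUngarbler's actual implementation.
from unicorn import Uc, UC_ARCH_X86, UC_MODE_64
from unicorn.x86_const import (
    UC_X86_REG_R14, UC_X86_REG_RBX, UC_X86_REG_RCX, UC_X86_REG_RSP,
)

IMAGE_BASE = 0x400000       # illustrative mapping address for the code section
STACK_TOP = 0x7fff0000
G_STRUCT = 0x600000         # fake goroutine (g) structure so the prologue stack check passes

def emulate_decrypt_sub(code: bytes, entry_off: int, call_off: int) -> bytes:
    """Emulate one decrypting subroutine from its prologue up to (but not including)
    the call to runtime_slicebytetostring, then read the resulting slice."""
    mu = Uc(UC_ARCH_X86, UC_MODE_64)
    mu.mem_map(IMAGE_BASE, 0x200000)              # code (assumes the section fits in this range)
    mu.mem_map(STACK_TOP - 0x10000, 0x20000)      # stack
    mu.mem_map(G_STRUCT, 0x1000)                  # zeroed g struct: stackguard0 (+0x10) stays 0
    mu.mem_write(IMAGE_BASE, code)
    mu.reg_write(UC_X86_REG_RSP, STACK_TOP)
    mu.reg_write(UC_X86_REG_R14, G_STRUCT)        # Go 1.17+ amd64 keeps the g pointer in r14
    # Run until execution reaches the call instruction found by the epilogue regex.
    mu.emu_start(IMAGE_BASE + entry_off, IMAGE_BASE + call_off)
    # Per Figure 12: rbx = pointer to the decrypted bytes, rcx = their length.
    ptr = mu.reg_read(UC_X86_REG_RBX)
    length = mu.reg_read(UC_X86_REG_RCX)
    return bytes(mu.mem_read(ptr, length))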
Figure 14 shows the output of our automated extraction framework, where GoStringUngarbler is able to identify and emulate all decrypting subroutines.
From these instruction patterns, we have derived a YARA rule for detecting samples that are obfuscated with garble’s literal transformation. The rule can be found in Mandiant’s GitHub repository.
Deobfuscation: Subroutine Patching
While extracting obfuscated strings can aid malware detection through signature-based analysis, this alone is not useful for reverse engineers conducting static analysis. To aid reverse engineering efforts, we’ve implemented a binary deobfuscation approach leveraging the emulation results.
Although developing an IDA plugin would have streamlined our development process, we recognize that not all malware analysts have access to, or prefer to use, IDA Pro. To make our tool more accessible, we developed GoStringUngarbler as a standalone Python utility to process binaries protected by garble. The tool can deobfuscate and produce functionally identical executables with recovered strings stored in plain text, improving both reverse engineering analysis and malware detection workflows.
For each identified decrypting subroutine, we implement a strategic patching methodology, replacing the original code with an optimized stub while padding the remaining subroutine space with INT3 instructions (Figure 15).
xor eax, eax ; clear return register
lea rbx, <string addr> ; Load effective address of decrypted string
mov ecx, <string len> ; populate string length
call runtime_slicebytetostring ; convert slice to Go string
ret ; return the decrypted string
Figure 15: Function stub to patch over garble’s decrypting subroutines
Initially, we considered storing recovered strings within an existing binary section for efficient referencing from the patched subroutines. However, after examining obfuscated binaries, we found that there is not enough space within existing sections to consistently accommodate the deobfuscated strings. On the other hand, adding a new section, while feasible, would introduce unnecessary complexity to our tool.
Instead, we opt for a more elegant space utilization strategy that leverages the inherent characteristics of garble's string transformations. In our tool, we implement in-place string storage by writing the decrypted string directly after the patched stub, capitalizing on the space guaranteed by the original decrypting routines (a sketch of the patch layout follows the list below):
Stack transformation: The decrypting subroutine stores and processes encrypted strings on the stack, providing adequate space through their data manipulation instructions. The instructions originally used for pushing encrypted data onto the stack create a natural storage space for the decrypted string.
Seed transformation: For each character, the decrypting subroutine requires a call instruction to decrypt it and update the seed. This is more than enough space to store the decrypted bytes.
Split transformation: The decrypting subroutine contains multiple switch cases to handle fragmented data recovery and decryption. These extensive instruction sequences guarantee sufficient space for the decrypted string data.
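As an illustration, the following Python sketch builds such a patch by hand for a 64-bit image: the stub from Figure 15, the recovered plaintext stored immediately after it, and INT3 padding. The opcode encodings are standard x86-64, but the RVA handling and helper names are our own simplifications, not GoStringUngarbler's actual code, and they assume the patch and runtime_slicebytetostring live in the same image without relocation concerns.
import struct

INT3 = b"\xCC"

def build_patch(sub_rva: int, sub_size: int, slicebytetostring_rva: int, decrypted: bytes) -> bytes:
    """Return sub_size bytes that replace a decrypting subroutine: the stub from
    Figure 15, the recovered plaintext right after it, and INT3 padding."""
    stub_len = 2 + 7 + 5 + 5 + 1                       # xor + lea + mov + call + ret
    stub = b"\x31\xC0"                                 # xor eax, eax
    # lea rbx, [rip + disp32]; RIP here is the end of the lea, so disp points just past the stub.
    stub += b"\x48\x8D\x1D" + struct.pack("<i", stub_len - (2 + 7))
    stub += b"\xB9" + struct.pack("<I", len(decrypted))                          # mov ecx, <string len>
    call_next_rva = sub_rva + 2 + 7 + 5 + 5                                      # RVA right after the call
    stub += b"\xE8" + struct.pack("<i", slicebytetostring_rva - call_next_rva)   # call rel32
    stub += b"\xC3"                                                              # ret
    patched = stub + decrypted
    assert len(patched) <= sub_size, "subroutine too small for in-place string storage"
    return patched + INT3 * (sub_size - len(patched))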
Figure 16 and Figure 17 show the disassembled and decompiled output of our patching framework, where GoStringUngarbler has deobfuscated a decrypting subroutine to reveal the recovered original string.
Figure 16: Disassembly view of a deobfuscated decrypting subroutine
Figure 17: Decompiled view of a deobfuscated decrypting subroutine
Downloading GoStringUngarbler
GoStringUngarbler is now available as an open-source tool in Mandiant's GitHub repository.
Installation requires Python 3 and the Python dependencies listed in the requirements.txt file.
Future Work
Deobfuscating binaries generated by garble presents a specific challenge: because garble depends on the Go compiler for obfuscation, the calling convention can change between Go versions, potentially invalidating the regular expression patterns used in our deobfuscation process. To mitigate this, we've designed GoStringUngarbler with a modular plugin architecture, which allows new plugins with updated regular expressions to be added easily to handle variations introduced by new Go releases. This design ensures the tool's long-term adaptability to future changes in garble's output.
Currently, GoStringUngarbler primarily supports garble-obfuscated PE and ELF binaries compiled with Go versions 1.21 through 1.23. We are continuously working to expand this range as the Go compiler and garble are updated.
Acknowledgments
Special thanks to Nino Isakovic and Matt Williams for their review and continuous feedback throughout the development of GoStringUngarbler. Their insights and suggestions have been invaluable in shaping and refining the tool’s final implementation.
We are also grateful to the FLARE team members for their review of this blog post publication to ensure its technical accuracy and clarity.
Finally, we want to acknowledge the developers of garble for their outstanding work on this obfuscating compiler. Their contributions to the software protection field have greatly advanced both offensive and defensive security research on Go binary analysis.
Unico is a leading biometric verification and authentication company addressing the global challenges of identity management and fraud prevention.
With nearly two decades of experience in the Brazilian market, Unico has become a reliable supplier to over 800 companies, including four of the five largest banks and leading retailers. Since 2021, Unico has facilitated more than 1.2 billion authentications through digital identity, and by 2023 its solutions are estimated to have thwarted $14 billion in fraud. Valued at over $2.6 billion, the company stands as the second most valuable SaaS company in Latin America, backed by General Atlantic, SoftBank, and Goldman Sachs, and was recognized as the third most innovative company in Latin America by Fast Company in 2024.
Currently working on its global expansion, Unico has an ambitious vision to become the main identity network in the world, moving beyond traditional ID verification and embracing a broader spectrum of identity-related technologies. In this article, we'll explore how Google Cloud and Spanner, Google's always-on, virtually unlimited-scale database, are helping Unico achieve this goal.
Why vector search shines in Unico’s solutions
Unico is committed to delivering innovative, cutting-edge digital identity solutions. A cornerstone of this effort is the use of vector search technology, which enables powerful capabilities like 1:N search — the ability to search for a single face within a large set of many others. This technology drives Unico’s identity solutions by retrieving and ranking multiple relevant matches for a given query with high precision and speed.
However, developing 1:N searches poses a significant challenge: efficiently verifying facial matches within databases containing millions or billions of registered face vectors. Comparing an individual's facial characteristics against each entry one by one is impractical. To address this, vector databases are often employed to perform approximate nearest neighbor (ANN) searches and return the top-N most similar faces.
Unico found that Spanner supports vector search capabilities to solve these issues, providing:
Semantic retrieval: Leveraging vector embeddings, Unico’s solutions can retrieve results based on deeper semantic relationships rather than exact matches. This improves the quality of identity verification, such as identifying relevant facial matches even when minor variations exist between the source and target images.
Diversity and relevance: Using algorithms like ANN and exact K-nearest neighbors (KNN), vector search balances the need for diverse and relevant results, ensuring high reliability in fraud detection and identity verification.
Multimodal applications: Vector search supports embeddings from multiple data types, such as text, images, and audio, enabling its use in complex, multimodal identity scenarios.
Hybrid search: Modern vector search frameworks combine similarity search with metadata filters, allowing tailored results based on context, such as region or user preferences.
By integrating vector search, Unico provides customers with faster and smarter fraud detection tools. Leveraging high-precision algorithms, these tools can identify fraudulent faces with exceptional accuracy, effectively safeguarding businesses and individuals against identity theft and other threats. This innovation not only solidifies Unico's position as a technology leader but also underscores its mission to build a safer and more trusted world by creating a unified ecosystem to validate people's real identities.
Some results
Operating at low latency while maintaining accuracy is crucial for Unico’s business, especially in applications that demand real-time performance, such as banking. Spanner was Unico’s first choice because its integrated support for vector search eliminates the need for separate, specialized vector database solutions.
Spanner provides transactional guarantees for operational data, delivers fresh and consistent vector search results, and offers horizontal scalability. Its features also include gRPC (Google Remote Procedure Call) support, geo-partitioning, multi-region storage configurations, RAG and LLM integrations, a high SLA (99.99%), and a maintenance-free architecture. Spanner currently supports both KNN and ANN vector searches.
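As a rough illustration of what an exact KNN lookup can look like with the Spanner client library for Python, assuming a FLOAT64 array embedding column and GoogleSQL's COSINE_DISTANCE function (the table, column, and instance names below are placeholders, not Unico's schema):
from google.cloud import spanner
from google.cloud.spanner_v1 import param_types

client = spanner.Client()
database = client.instance("identity-instance").database("faces-db")   # placeholder names

def top_matches(query_embedding, k=10):
    # Exact KNN: rank stored face embeddings by cosine distance to the query embedding.
    sql = """
        SELECT person_id, COSINE_DISTANCE(embedding, @query) AS distance
        FROM FaceEmbeddings
        ORDER BY distance
        LIMIT @k
    """
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            sql,
            params={"query": query_embedding, "k": k},
            param_types={
                "query": param_types.Array(param_types.FLOAT64),
                "k": param_types.INT64,
            },
        )
        return list(rows)
For ANN at Unico's scale, the same query pattern would instead target a vector index and Spanner's approximate distance functions; the exact configuration depends on the schema and index options chosen.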
Unico currently operates 1:N services in Brazil and Mexico, storing more than 1 billion facial embeddings in Spanner to date. This setup enables Unico to achieve low latency at high percentiles, high throughput of 840 RPM, and a precision/recall of 96%. And it’s just the start — Unico processes around 35 million new faces every month and that number continues to grow.
Unico remains focused on growing its customer base, enhancing its existing products, and exploring new opportunities in international markets, with the aim of expanding the reach of its secure digital identity services beyond Brazil's borders. With Spanner and the ability to tap into the full power of the Google Cloud ecosystem, Unico is confident that it can bring its ambitious vision to life and deliver innovative solutions that forge trust between people and companies.
We’re pleased to announce that Google has been recognized as a Leader in The Forrester Wave™: Data Security Platforms, Q1 2025 report. We believe this is a testament to our unwavering commitment to providing cutting-edge data security in the cloud.
In today’s AI era, comprehensive data security is paramount. Organizations are grappling with increasingly sophisticated threats, growing data volumes, and the complexities of managing data across diverse environments. That’s why a holistic, integrated approach to data security is no longer a nice-to-have — it’s a necessity.
A vision driven by customer needs and market trends
Our vision for data security is directly aligned with the evolving needs of our customers and the broader market. This vision is built on five key pillars:
We see cloud as the place where most critical business data lives, therefore we continue to build ubiquitous, platform-level controls and capabilities for data security, while working to centralize administration and governance capabilities.
We engineer security directly at each layer of the new AI technology stack and throughout the entire data lifecycle to secure the intersection of data and new AI systems.
We see the continued efforts of nation-state and criminal actors targeting sensitive enterprise data, which drives increased need for comprehensive data security posture management, better risk-based prioritization, and use of frontline intelligence to prevent, detect and disrupt sophisticated attacks.
We see increasing mandates for data security, privacy, and sovereignty, therefore we continue to expand capabilities for audit, governance, and specific sovereign controls.
We must account for ongoing technology change, addressing new attack vectors such as adversarial AI and the emergence of quantum computing technology that can render foundational controls obsolete.
“Google differentiates with data threat and risk visibility, access controls, masking, encryption, and addressing supplier risk. It is superior for privacy (including confidential computing), information governance, and AI security and governance use cases,” wrote Forrester in their report.
Building on strengths
Google received the highest scores possible in 10 criteria, including: Vision, Innovation, Data threat and risk visibility, Data access controls, Data masking or redaction, Encryption, Supplier risk, and Use cases for privacy, Information governance, and AI security and governance.
“Organizations focused on going all-in on cloud and Zero Trust — especially those innovating with data and AI — that desire an integrated experience should consider Google,” the report states. As AI adoption accelerates, the need for a data security platform that can protect sensitive data while boosting innovation is paramount.
Learn more
We invite you to read the full report to understand why Google Cloud is a leader in this space and how we can help your organization. This independent analysis provides valuable insights as you evaluate your data security strategy. We're excited to continue this journey with you.
Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change. For more information, read about Forrester's objectivity here.
A few weeks ago, Google DeepMind released Gemini 2.0 for everyone, including Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, and Gemini 2.0 Pro (Experimental). All models support at least 1 million input tokens, which makes many tasks easier, from image generation to creative writing. It has also changed how we convert documents into structured data. Manual document processing is slow and expensive, but Gemini 2.0 has transformed how we chunk PDFs for RAG systems and can even turn PDFs directly into insights.
Today, we'll take a deep dive into a multi-step approach to document extraction that uses Gemini 2.0 to combine large language models (LLMs) with structured, externalized rules.
A multi-step approach to document extraction, made easy
A multi-step architecture, as opposed to relying on a single, monolithic prompt, offers significant advantages for robust extraction. This approach begins with modular extraction, where initial tasks are broken down into smaller, more focused prompts targeting specific content locations within a document. This modularity not only enhances accuracy but also reduces the cognitive load on the LLM.
Another benefit of a multi-step approach is externalized rule management. By managing post-processing rules externally, for instance, using Google Sheets or a BigQuery table, we gain the benefits of easy CRUD (Create, Read, Update, Delete) operations, improving both maintainability and version control of the rules. This decoupling also separates the logic of extraction from the logic of processing, allowing for independent modification and optimization of each.
Ultimately, this hybrid approach combines the strengths of LLM-powered extraction with a structured rules engine. LLMs handle the complexities of understanding and extracting information from unstructured data, while the rules engine provides a transparent and manageable system for enforcing business logic and decision-making. The following steps outline a practical implementation.
Step 1: Extraction
Let's test a sample prompt with a configurable set of rules. This hands-on example will demonstrate how easily you can define and apply business logic to extracted data, all powered by Gemini and Vertex AI.
First, we extract data from a document. Let's use Google's 2023 Environment Report as the source document, and use Gemini with the initial prompt below to extract data. This is not a known schema, but a prompt we've created for the purposes of this post. To create specific response schemas, use controlled generation with Gemini.
<PERSONA>
You are a meticulous AI assistant specializing in extracting key sustainability metrics and performance data from corporate environmental reports. Your task is to accurately identify and extract specific data points from a provided document, ensuring precise values and contextual information are captured. Your analysis is crucial for tracking progress against sustainability goals and supporting informed decision-making.

<INSTRUCTIONS>

**Task:**
Analyze the provided Google Environmental Report 2023 (PDF) and extract the following `key_metrics`. For each metric:

1. **`metric_id`**: A short, unique identifier for the metric (provided below).
2. **`description`**: A brief description of the metric (provided below).
3. **`value`**: The numerical value of the metric as reported in the document. Be precise (e.g., "10.2 million", not "about 10 million"). If a range is given, and a single value is not clearly indicated, you must use the largest of the range.
4. **`unit`**: The unit of measurement for the metric (e.g., "tCO2e", "million gallons", "%"). Use the units exactly as they appear in the report.
5. **`year`**: The year to which the metric applies (2022, unless otherwise specified).
6. **`page_number`**: The page number(s) where the metric's value is found. If the information is spread across multiple pages, list all relevant pages, separated by commas. If the value requires calculations based on the page, list the final answer page.
7. **`context`**: One sentence to put the metric in context.

**Metrics to Extract:**

```json
[
  { "metric_id": "ghg_emissions_total", "description": "Total GHG Emissions (Scope 1, 2 market-based, and 3)" },
  { "metric_id": "ghg_emissions_scope1", "description": "Scope 1 GHG Emissions" },
  { "metric_id": "ghg_emissions_scope2_market", "description": "Scope 2 GHG Emissions (market-based)" },
  { "metric_id": "ghg_emissions_scope3_total", "description": "Total Scope 3 GHG Emissions" },
  { "metric_id": "renewable_energy_capacity", "description": "Clean energy generation capacity from signed agreements (2010-2022)" },
  { "metric_id": "water_replenishment", "description": "Water replenished" },
  { "metric_id": "water_consumption", "description": "Water consumption" },
  { "metric_id": "waste_diversion_landfill", "description": "Percentage of food waste diverted from landfill" },
  { "metric_id": "recycled_material_plastic", "description": "Percentage of plastic used in manufactured products that was recycled content" },
  { "metric_id": "packaging_plastic_free", "description": "Percentage of product packaging that is plastic-free" }
]
```
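One way to issue this extraction is with the Vertex AI SDK for Python. The snippet below is a sketch rather than a full pipeline: the project, region, bucket path, and model name are placeholders, and EXTRACTION_PROMPT is assumed to hold the prompt text above.
import json
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")    # placeholder project/region
model = GenerativeModel("gemini-2.0-flash")

# The source PDF is passed alongside the prompt as a multimodal part.
report_pdf = Part.from_uri(
    "gs://my-bucket/google-2023-environmental-report.pdf",     # placeholder Cloud Storage path
    mime_type="application/pdf",
)

response = model.generate_content(
    [report_pdf, EXTRACTION_PROMPT],                            # EXTRACTION_PROMPT = prompt above
    generation_config=GenerationConfig(response_mime_type="application/json"),
)
extracted_data = json.loads(response.text)                      # list of metric dictionaries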
The JSON output below, which we’ll assign to the variable `extracted_data`, represents the results of the initial data extraction by Gemini. This structured data is now ready for the next critical phase: applying our predefined business rules.
extracted_data = [
  {"metric_id": "ghg_emissions_total", "description": "Total GHG Emissions (Scope 1, 2 market-based, and 3)", "value": "14.3 million", "unit": "tCO2e", "year": 2022, "page_number": "23", "context": "In 2022 Google's total GHG emissions, including Scope 1, 2 (market-based), and 3, amounted to 14.3 million tCO2e."},
  {"metric_id": "ghg_emissions_scope1", "description": "Scope 1 GHG Emissions", "value": "0.23 million", "unit": "tCO2e", "year": 2022, "page_number": "23", "context": "In 2022, Google's Scope 1 GHG emissions were 0.23 million tCO2e."},
  {"metric_id": "ghg_emissions_scope2_market", "description": "Scope 2 GHG Emissions (market-based)", "value": "0.03 million", "unit": "tCO2e", "year": 2022, "page_number": "23", "context": "Google's Scope 2 GHG emissions (market-based) in 2022 totaled 0.03 million tCO2e."},
  {"metric_id": "ghg_emissions_scope3_total", "description": "Total Scope 3 GHG Emissions", "value": "14.0 million", "unit": "tCO2e", "year": 2022, "page_number": "23", "context": "Total Scope 3 GHG emissions for Google in 2022 reached 14.0 million tCO2e."},
  {"metric_id": "renewable_energy_capacity", "description": "Clean energy generation capacity from signed agreements (2010-2022)", "value": "7.5", "unit": "GW", "year": 2022, "page_number": "14", "context": "By the end of 2022, Google had signed agreements for a clean energy generation capacity of 7.5 GW since 2010."},
  {"metric_id": "water_replenishment", "description": "Water replenished", "value": "2.4 billion", "unit": "gallons", "year": 2022, "page_number": "30", "context": "Google replenished 2.4 billion gallons of water in 2022."},
  {"metric_id": "water_consumption", "description": "Water consumption", "value": "3.4 billion", "unit": "gallons", "year": 2022, "page_number": "30", "context": "In 2022 Google's water consumption totalled 3.4 billion gallons."},
  {"metric_id": "waste_diversion_landfill", "description": "Percentage of food waste diverted from landfill", "value": "70", "unit": "%", "year": 2022, "page_number": "34", "context": "Google diverted 70% of its food waste from landfills in 2022."},
  {"metric_id": "recycled_material_plastic", "description": "Percentage of plastic used in manufactured products that was recycled content", "value": "50", "unit": "%", "year": 2022, "page_number": "32", "context": "In 2022 50% of plastic used in manufactured products was recycled content."},
  {"metric_id": "packaging_plastic_free", "description": "Percentage of product packaging that is plastic-free", "value": "34", "unit": "%", "year": 2022, "page_number": "32", "context": "34% of Google's product packaging was plastic-free in 2022."}
]
Step 2: Feed the extracted data into a rules engine
Next, we’ll feed this `extracted_data` into a rules engine, which, in our implementation, is another call to Gemini, acting as a powerful and flexible rules processor. Along with the extracted data, we’ll provide a set of validation rules defined in the `analysis_rules` variable. This engine, powered by Gemini, will systematically check the extracted data for accuracy, consistency, and adherence to our predefined criteria. Below is the prompt we provide to Gemini to accomplish this, along with the rules themselves.
<PERSONA>
You are a sustainability data analyst responsible for verifying the accuracy and consistency of extracted data from corporate environmental reports. Your task is to apply a set of predefined rules to the extracted data to identify potential inconsistencies, highlight areas needing further investigation, and assess progress towards stated goals. You are detail-oriented and understand the nuances of sustainability reporting.

<INSTRUCTIONS>

**Input:**

1. `extracted_data`: (JSON) The `extracted_data` variable contains the values extracted from the Google Environmental Report 2023, as provided in the previous turn. This is the output from the first Gemini extraction.
2. `analysis_rules`: (JSON) The `analysis_rules` variable contains a JSON string defining a set of rules to apply to the extracted data. Each rule includes a `rule_id`, `description`, `condition`, `action`, and `alert_message`.

**Task:**

1. **Iterate through Rules:** Process each rule defined in the `analysis_rules`.
2. **Evaluate Conditions:** For each rule, evaluate the `condition` using the data in `extracted_data`. Conditions may involve:
   * Accessing specific `metric_id` values within the `extracted_data`.
   * Comparing values across different metrics.
   * Checking for data types (e.g., ensuring a value is a number).
   * Checking page numbers for consistency.
   * Using logical operators (AND, OR, NOT) and mathematical comparisons (>, <, >=, <=, ==, !=).
   * Checking for the existence of data.
3. **Execute Actions:** If a rule's condition evaluates to TRUE, execute the `action` specified in the rule. The action describes *what* the rule is checking.
4. **Trigger Alerts:** If the condition is TRUE, generate the `alert_message` associated with that rule. Include relevant `metric_id` values and page numbers in the alert message to provide context.

**Output:**

Return a JSON array containing the triggered alerts. Each alert should be a dictionary with the following keys:

* `rule_id`: The ID of the rule that triggered the alert.
* `alert_message`: The alert message, potentially including specific values from the `extracted_data`.
`analysis_rules` is a JSON object that contains the business rules we want to apply to the extracted data. Each rule defines a specific condition to check, an action to take if the condition is met, and an optional alert message if a violation occurs. The power of this approach lies in the flexibility of these rules; you can easily add, modify, or remove them without altering the core extraction process. The beauty of using Gemini is that the rules can be written in human-readable language and maintained by non-coders.
analysis_rules = {
  "rules": [
    {
      "rule_id": "AR001",
      "description": "Check if all required metrics were extracted.",
      "condition": "extracted_data contains all metric_ids from the original extraction prompt",
      "action": "Verify the presence of all expected metrics.",
      "alert_message": "Missing metrics in the extracted data. The following metric IDs are missing: {missing_metrics}"
    },
    {
      "rule_id": "AR002",
      "description": "Check if total GHG emissions equal the sum of Scope 1, 2, and 3.",
      "condition": "extracted_data['ghg_emissions_total']['value'] != (extracted_data['ghg_emissions_scope1']['value'] + extracted_data['ghg_emissions_scope2_market']['value'] + extracted_data['ghg_emissions_scope3_total']['value']) AND extracted_data['ghg_emissions_total']['page_number'] == extracted_data['ghg_emissions_scope1']['page_number'] == extracted_data['ghg_emissions_scope2_market']['page_number'] == extracted_data['ghg_emissions_scope3_total']['page_number']",
      "action": "Sum Scope 1, 2, and 3 emissions and compare to the reported total.",
      "alert_message": "Inconsistency detected: Total GHG emissions ({total_emissions} {total_unit}) on page {total_page} do not equal the sum of Scope 1 ({scope1_emissions} {scope1_unit}), Scope 2 ({scope2_emissions} {scope2_unit}), and Scope 3 ({scope3_emissions} {scope3_unit}) emissions on page {scope1_page}. Sum is {calculated_sum}"
    },
    {
      "rule_id": "AR003",
      "description": "Check for unusually high water consumption compared to replenishment.",
      "condition": "extracted_data['water_consumption']['value'] > (extracted_data['water_replenishment']['value'] * 5) AND extracted_data['water_consumption']['unit'] == extracted_data['water_replenishment']['unit']",
      "action": "Compare water consumption to water replenishment.",
      "alert_message": "High water consumption: Consumption ({consumption_value} {consumption_unit}) is more than five times replenishment ({replenishment_value} {replenishment_unit}) on page {consumption_page} and {replenishment_page}."
    }
  ]
}
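Wiring the two steps together can be a second generate_content call that acts as the rules engine. In the sketch below, model and GenerationConfig come from the earlier extraction snippet, RULES_ENGINE_PROMPT holds the analyst prompt above, and extracted_data and analysis_rules are the Python objects shown earlier; these names are our own, not a fixed API.
import json

rules_engine_input = (
    RULES_ENGINE_PROMPT
    + "\n\nextracted_data = " + json.dumps(extracted_data, indent=2)
    + "\n\nanalysis_rules = " + json.dumps(analysis_rules, indent=2)
)

rules_response = model.generate_content(
    rules_engine_input,
    generation_config=GenerationConfig(response_mime_type="application/json"),
)

# Each alert carries the rule_id and the populated alert_message.
triggered_alerts = json.loads(rules_response.text)
for alert in triggered_alerts:
    print(alert["rule_id"], "-", alert["alert_message"])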
Step 3: Integrate your insights
Finally – and crucially – integrate the alerts and insights generated by the rules engine into existing data pipelines and workflows. This is where the real value of this multi-step process is unlocked. For our example, we can build robust APIs and systems using Google Cloud tools to automate downstream actions triggered by the rule-based analysis. Some examples of downstream tasks are:
Automated task creation: Trigger Cloud Functions to create tasks in project management systems, assigning data verification to the appropriate teams.
Data quality pipelines: Integrate with Dataflow to flag potential data inconsistencies in BigQuery tables, triggering validation workflows.
Vertex AI integration: Leverage Vertex AI Model Registry for tracking data lineage and model performance related to extracted metrics and corrections made.
Dashboard integration: Use Looker, Google Sheets, or Data Studio to display alerts.
Human-in-the-loop trigger: Build a trigger system for human-in-the-loop review, using Cloud Tasks, to show which extractions to focus on and double-check (see the sketch after this list).
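For example, the human-in-the-loop trigger could look roughly like the sketch below, which uses the Cloud Tasks client library to enqueue one review task per triggered alert; the project, location, queue, and review endpoint are placeholders.
import json
from google.cloud import tasks_v2

tasks_client = tasks_v2.CloudTasksClient()
queue = tasks_client.queue_path("my-project", "us-central1", "extraction-review")  # placeholders

def enqueue_reviews(triggered_alerts):
    # One task per alert so an analyst can double-check the corresponding extraction.
    for alert in triggered_alerts:
        task = tasks_v2.Task(
            http_request=tasks_v2.HttpRequest(
                http_method=tasks_v2.HttpMethod.POST,
                url="https://review.example.com/alerts",       # placeholder review endpoint
                headers={"Content-Type": "application/json"},
                body=json.dumps(alert).encode("utf-8"),
            )
        )
        tasks_client.create_task(parent=queue, task=task)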
Make document extraction easier today
This hands-on approach provides a solid foundation for building robust, rule-driven document extraction pipelines. To get started, explore these resources:
Gemini for document understanding: For a comprehensive, one-stop solution to your document processing needs, check out Gemini for document understanding. It simplifies many common extraction challenges.
Few-shot prompting: Begin your Gemini journey with few-shot prompting. This powerful technique can significantly improve the quality of your extractions with minimal effort, providing examples within the prompt itself.
Fine-tuning Gemini models: When you need highly specialized, domain-specific extraction results, consider fine-tuning Gemini models. This allows you to tailor the model’s performance to your exact requirements.
Cloud SQL, Google Cloud's fully managed database service for PostgreSQL, MySQL, and SQL Server workloads, offers strong availability SLAs depending on which edition you choose: a 99.95% SLA (excluding maintenance) for Enterprise edition, and a 99.99% SLA (including maintenance) for Enterprise Plus edition. In addition, Cloud SQL offers numerous high availability and scalability features that are crucial for maintaining business continuity and minimizing downtime, especially for mission-critical databases.
These features can help address some common database deployment challenges:
Combined read/write instances: Using a single instance for both reads and writes creates a single point of failure. If the primary instance goes down, both read and write operations are impacted. In the event that your storage is full and auto-scaling is disabled, even a failover would not help.
Downtime during maintenance: Planned maintenance can disrupt business operations.
Time-consuming scaling: Manually scaling instance size for planned workload spikes is a lengthy process that requires significant planning.
Complex cross-region disaster recovery: Setting up and managing cross-region DR requires manual configuration and connection string updates after a failover.
In this blog, we show you how to maximize your business continuity efforts with Cloud SQL’s high availability and scalability features, as well as how to use Cloud SQL Enterprise Plus features to build resilient database architectures that can handle workload spikes, unexpected outages, and read scaling needs.
Architecting a highly available and robust database
Using the Cloud SQL high availability feature, which automatically fails over to a standby instance, is a good starting point but not sufficient: scenarios such as storage full issues, regional outages, or failover problems can still cause disruptions. Separating read workloads from write workloads is essential for a more robust architecture.
A best-practice approach involves implementing Cloud SQL read replicas alongside high availability. Read traffic should be directed to dedicated read replica instances, while write operations are handled by the primary instance. You can enable high availability on the primary, on the read replica(s), or on both, depending on your specific requirements. This separation helps ensure that the primary can serve production traffic predictably, and that read operations can continue uninterrupted via the read replicas even during primary downtime.
Below is a sample regional architecture with high availability and read replicas enabled.
You can deploy this architecture regionally across multiple zones or extend it cross-regionally for disaster recovery and geographically distributed read access. A regional deployment with a highly available primary and a highly available read replica spanning three availability zones provides resilience against zonal failures: even if two zones fail, the database remains accessible for both read and write operations after failover. Cross-region read replicas enhance this further, providing regional DR capabilities.
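From the application side, the read/write split can be as simple as maintaining two connection pools, one pointed at the primary and one at a read replica. The sketch below uses the Cloud SQL Python Connector with SQLAlchemy; the connection names, credentials, and schema are placeholders, not a prescribed setup.
import sqlalchemy
from google.cloud.sql.connector import Connector

connector = Connector()

def make_engine(instance_connection_name: str) -> sqlalchemy.engine.Engine:
    # Each engine opens connections through the Cloud SQL connector to one instance.
    return sqlalchemy.create_engine(
        "postgresql+pg8000://",
        creator=lambda: connector.connect(
            instance_connection_name,
            "pg8000",
            user="app-user",          # placeholder credentials
            password="change-me",
            db="appdb",
        ),
    )

# Writes go to the HA primary; reads go to a read replica.
primary_engine = make_engine("my-project:us-central1:orders-primary")
replica_engine = make_engine("my-project:us-central1:orders-replica-1")

with primary_engine.begin() as conn:
    conn.execute(sqlalchemy.text("INSERT INTO orders (sku, qty) VALUES ('A1', 2)"))

with replica_engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text("SELECT sku, qty FROM orders")).fetchall()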
Cloud SQL Enterprise Plus features
Cloud SQL Enterprise Plus offers significant advantages for performance and availability:
Enhanced hardware: Run databases on high-performance hardware with up to 128 vCPUs and 824GB of RAM.
Data cache: Enable data caching for faster read performance.
Near-zero downtime operations: Experience near-zero downtime maintenance and sub-second (<1s) downtime for instance scaling.
Advanced disaster recovery: Streamline disaster recovery with failover to a cross-region DR replica and automatic reinstatement of the old primary. The application can keep connecting through the same write endpoint, which is automatically assigned to the new primary after failover.
Enterprise Plus edition addresses the previously mentioned challenges:
Improved performance: Benefit from higher core-to-memory ratios for better database performance.
Faster reads: Data caching improves read performance for read-heavy workloads. The data cache can be enabled on the primary, on the read replicas, or on both, as needed.
Easy scaling: Scale instances up quickly to handle traffic spikes or planned events, and scale them back down when traffic subsides, with sub-second downtime in both directions.
Minimized maintenance downtime: Reduce downtime during maintenance to less than a second and provide better business continuity.
Handle regional failures: Easily fail over to a cross-region DR replica, and Cloud SQL automatically rebuilds your architecture as the original region recovers. This lessens the hassle of DR drills and helps ensure application availability.
Automatic IP address re-pointing: Leverage the write endpoint to automatically connect to the current primary after a switchover or failover, with no IP address changes required on the application side.
To test out these benefits quickly, there's an easy, near-zero-downtime upgrade path from Cloud SQL Enterprise edition to Enterprise Plus edition.
Cloud SQL also provides several maintenance controls that help minimize the impact of planned maintenance:
Staging environment testing: To identify potential issues, use the maintenance timing feature to deploy maintenance to test/staging environments at least a week before production.
Read-replica maintenance: Apply self-service maintenance to one of the read replicas before the primary instance to avoid simultaneous downtime for read and write operations. Make sure the primary and remaining replicas are updated shortly afterwards, as we recommend keeping the same maintenance version on the primary and all replicas.
Maintenance window: Always configure a maintenance window during off-peak hours to control when maintenance is performed.
Maintenance notifications: Opt in to maintenance notifications to make sure you receive an email at least one week before scheduled maintenance.
Reschedule maintenance: Use the reschedule maintenance feature if a maintenance activity conflicts with a critical business period.
Deny maintenance period: Use the deny maintenance period feature to postpone maintenance for up to 90 days during sensitive periods.
By combining these strategies, you can build highly available and scalable database solutions in Cloud SQL, helping to ensure your business continuity and minimize downtime. Refer to the maintenance FAQ for more detailed information.