AWS Glue now supports read and write operations from AWS Glue 5.0 Apache Spark jobs on AWS Lake Formation registered tables when the job role has full table access. This capability enables Data Definition Language (DDL) and Data Manipulation Language (DML) operations, including CREATE, ALTER, DELETE, UPDATE, and MERGE INTO statements, on Apache Hive and Iceberg tables from within the same Apache Spark application.
While Lake Formation’s fine-grained access control (FGAC) offers granular security controls at row, column, and cell levels, many ETL workloads simply need full table access. This new feature enables AWS Glue 5.0 Spark jobs to directly read and write data when full table access is granted, removing limitations that previously restricted certain Extract, Transform, and Load (ETL) operations. You can now leverage advanced Spark capabilities including Resilient Distributed Datasets (RDDs), custom libraries, and User Defined Functions (UDFs) with Lake Formation tables. Additionally, data teams can run complex, interactive Spark applications through SageMaker Unified Studio in compatibility mode while maintaining Lake Formation’s table-level security boundaries.
This feature is available in all AWS Regions where AWS Glue and AWS Lake Formation are supported. To learn more, visit the AWS Glue product page and documentation.
Amazon S3 Tables are now available in two additional AWS Regions: Asia Pacific (Thailand) and Mexico (Central). S3 Tables deliver the first cloud object store with built-in Apache Iceberg support, and the easiest way to store tabular data at scale.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C7g instances are available in the AWS Israel (Tel Aviv) Region. These instances are powered by AWS Graviton3 processors that provide up to 25% better compute performance compared to AWS Graviton2 processors, and built on top of the AWS Nitro System, a collection of AWS designed innovations that deliver efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage.
Amazon EC2 Graviton3 instances also use up to 60% less energy for the same performance than comparable EC2 instances, reducing your cloud carbon footprint. For increased scalability, these instances are available in 9 different instance sizes, including bare metal, and offer up to 30 Gbps networking bandwidth and up to 20 Gbps of bandwidth to Amazon Elastic Block Store (Amazon EBS).
Traditional data warehouses simply can’t keep up with today’s analytics workloads. That’s because today, most data that’s generated is both unstructured and multimodal (documents, audio files, images, and videos). With the complexity of cleaning and transforming unstructured data, organizations have historically had to maintain siloed data pipelines for unstructured and structured data, and for analytics and AI/ML use cases. Between these fragmented data platforms, data access restrictions, slow consumption, and outdated information, enterprises struggle to unlock the full potential of their data. The same issues hinder AI initiatives.
Today we’re introducing a new data type, ObjectRef, now in preview in BigQuery, that represents a reference to any object in Cloud Storage with a URI and additional metadata. ObjectRef complements Object Tables, read-only tables over unstructured data objects in Cloud Storage, to integrate unstructured data like images and audio into existing BigQuery tables. The ObjectRef data type removes fragmentation in data processing and access control, providing a unified, multimodal, and governed way to process all modalities of data. You can process unstructured data with large language models (LLMs), ML models, and open-source Python libraries using the same SQL or Python scripts that process tabular data. You can also store structured and unstructured data in the same row throughout different data engineering stages (extract, load, transform a.k.a. ELT), and govern it using a similar access control model.
For example, to answer the question “of the customers who complained about performance issues during interactions last month, show me the top 10 by revenue” you need to perform natural language processing (NLP) on audio calls, emails and online chat transcripts to normalize the data, identify whether the interaction discussed “performance issues” and detect whether the customer complained. For each of these steps, you need to decide how to build a pipeline over data in Cloud Storage, run AI/ML models on the data, and host the models (e.g., on Compute Engine, Google Kubernetes Engine, or Vertex AI). The normalized and extracted data would then need to be saved in structured format (e.g., in a BigQuery table) and joined with each customer’s revenue data.
With the launch of ObjectRef, you can now answer this question with a simple SQL query. Suppose you’ve combined call center audio files and agent chat text into one BigQuery table customer_interactions with two columns: (1) audio_ref of type ObjectRef and (2) chat of type STRING. Filtering for customers who complained about performance issues is as easy as adding one more condition in the WHERE clause:
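Here is a minimal sketch of such a query. The revenue and interaction_date columns, the connection name, and the exact use of AI.GENERATE_BOOL are illustrative assumptions rather than part of the schema described above:

-- Hypothetical query: revenue is assumed to already be joined into the table,
-- and interaction_date, the connection, and the endpoint are placeholders.
SELECT customer_id, revenue
FROM customer_interactions
WHERE interaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
  AND AI.GENERATE_BOOL(
    prompt => ("Did this customer complain about performance issues?",
               OBJ.GET_ACCESS_URL(audio_ref, "r"), chat),
    connection_id => "analysis.US.gemini-connection",
    endpoint => "gemini-2.0-flash"
  ).result
ORDER BY revenue DESC
LIMIT 10;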
BigQuery with ObjectRef unlocks unique platform capabilities across data and AI:
Multimodality: Natively handle structured (tabular) data, unstructured data, and a combination of the two, in a single table via ObjectRef. Now, you can build multimodal ELT data pipelines to process both structured and unstructured data.
Full SQL and Python support: Use your favorite language without worrying about interoperability. If it works in SQL, it works in Python (via BigQuery DataFrames), and vice versa. Object transformations, saving transformed objects back to Cloud Storage, and any other aggregations or filtering, can all be done in one SQL or Python script.
Gen-AI-ready, serverless, and auto-scaled data processing: Spend more time building your data pipelines, not managing infrastructure. Process unstructured data with LLMs, or use serverless Python UDFs with your favorite open-source library. Create embeddings, generate summaries using a prompt, use a BigQuery table as an input to Vertex AI jobs, and much more.
Unified governance and access control: Use familiar BigQuery governance features such as fine-grained access control, data masking, and connection-delegated access on unstructured data. There is no need to manage siloed governance models for structured versus unstructured data (see the example following this list).
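To make the governance point concrete, here is a minimal sketch of a row-level access policy applied to a table that carries an ObjectRef column (the analysis.sessions table used later in this post); the region column and the group address are hypothetical:

-- Hypothetical policy: EMEA analysts only see EMEA rows, including the
-- audio ObjectRefs stored in those rows.
CREATE ROW ACCESS POLICY emea_only
ON analysis.sessions
GRANT TO ("group:emea-analysts@example.com")
FILTER USING (region = "EMEA");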
ObjectRef in action
Let’s take a closer look at how to use the ObjectRef data type.
What is an ObjectRef?
First, it’s good to understand ObjectRef under the hood. Simply put, ObjectRef is a STRUCT containing object storage and access control metadata. With this launch, when you create an Object Table, it is populated with a new ObjectRef column named ‘ref’.
struct {
  uri string,
  authorizer string,
  version string,
  details json {
    gcs_metadata json
  }
}
Create a BigQuery table with ObjectRefs
Imagine a call center that stores structured information in standard BigQuery tables ingestion.sessions, and call audio in a Cloud Storage bucket, with a BigQuery Object Table ingestion.audios created on the Cloud Storage bucket. While this example is based on audio, ObjectRefs can also represent images, documents, and videos.
In the following diagrams, ObjectRefs are highlighted in red.
With ObjectRef, you can join these two tables on sessions.RecordingID and audios.Ref.uri columns to create a single BigQuery table. The new table contains an Audio column of type ObjectRef, using the Ref column from the ingestion.audios table.
CREATE OR REPLACE TABLE analysis.sessions
AS
SELECT sessions.session_id, sessions.date, sessions.customer_id, object_table.ref AS audio
FROM ingestion.sessions INNER JOIN ingestion.audios object_table
ON object_table.uri = sessions.recording_id;
Capturing the object version allows BigQuery zero-copy snapshots and clones of analysis.sessions to be reproducible and consistent across structured and unstructured data. This allows reproducibility in downstream applications such as ML training and LLM fine-tuning.
Being a STRUCT, ObjectRef also supports nesting in ARRAY. The main audio file represented by Audio can be chunked (for example, into segments per agent ID), and the resulting objects represented in a new column Chunked of type ARRAY<ObjectRef>. This preserves the order of chunks, and stores them alongside the main audio file in the same row. This data transformation lets you report the number of agent handoffs per call and further analyze each call segment separately.
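As a small illustration of that reporting step, assuming the chunk column is named Chunked as described above, counting handoffs is an ordinary SQL aggregation:

-- Each element of Chunked is an ObjectRef for one per-agent audio segment,
-- so the number of handoffs is one less than the number of segments.
SELECT session_id,
       ARRAY_LENGTH(Chunked) - 1 AS agent_handoffs
FROM analysis.sessions
WHERE ARRAY_LENGTH(Chunked) > 0;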
Process using serverless Python
With Python UDF integration, you can bring your favorite open-source Python library to BigQuery as a user-defined function (UDF). Easily derive structured and unstructured data from the source ObjectRef and store them in the same row.
The new function OBJ.GET_ACCESS_URL(ref ObjectRef, mode STRING) -> ObjectRefRuntime enables delegated access to the object in Cloud Storage. ObjectRefRuntime provides signed URLs to read and write data, allowing you to manage governance and access control entirely in BigQuery, and removing the need for Cloud Storage access control.
Serverless Python use case 1: Multimodal data to structured data
For example, imagine you want to get the duration of every audio file in the analysis.sessions table. Assume that a Python UDF function analysis.GET_DURATION(object_ref_runtime_json STRING) -> INT has already been registered in BigQuery. GET_DURATION uses signed URLs from ObjectRefRuntime to read Cloud Storage bytes.
SQL:

-- Object is passed to Python UDF using read-only signed URLs
SELECT analysis.GET_DURATION(TO_JSON_STRING(OBJ.GET_ACCESS_URL(audio, "R"))) AS duration
FROM analysis.sessions
WHERE audio IS NOT NULL
Python:

import bigframes.pandas as bpd
df = bpd.read_gbq("analysis.sessions")
func = bpd.read_gbq_function("analysis.get_duration")
# Object is passed to Python UDF using read-only signed URLs
df["duration"] = df["audio"].blob.get_runtime_json_str(mode="R").apply(func).cache()  # cache to execute
Serverless Python use case 2: Multimodal data to processed multimodal data
As another example, here’s how to remove noise from every audio file in the analysis.sessions table, assuming that a Python UDF function analysis.DENOISE(src_object_ref_runtime_json STRING, dst_object_ref_runtime_json STRING) -> object_ref_runtime_json STRING has already been registered in BigQuery. This function reads from the source audio, writes the new noise-removed audio to Cloud Storage, and returns ObjectRefs for the new audio files.
ObjectRefRuntime provides signed URLs for reading and writing object bytes.
SQL:

SELECT analysis.DENOISE(
  -- Source is accessed using read-only signed URL
  TO_JSON_STRING(OBJ.GET_ACCESS_URL(audio, "R")),
  -- Destination is written using read-write signed URL with prefix "denoised-"
  TO_JSON_STRING(OBJ.GET_ACCESS_URL(
    OBJ.MAKE_REF(
      CONCAT("denoised-", audio.uri), audio.authorizer),
    "RW")))
FROM analysis.sessions
WHERE audio IS NOT NULL
Python:

import bigframes.pandas as bpd
df = bpd.read_gbq("analysis.sessions")

df["denoised"] = ("denoised-" + df["audio"].blob.uri()).str.to_blob()
func_df = df[["audio", "denoised"]]

func = bpd.read_gbq_function("analysis.denoise")
# Source is accessed using read-only signed URL
func_df["audio"] = func_df["audio"].blob.get_runtime_json_str("R")
# Destination is written using read-write signed URL with prefix "denoised-"
func_df["denoised"] = func_df["denoised"].blob.get_runtime_json_str("RW")
func_df.apply(func, axis=1).cache()  # cache to execute
Process using Gemini and BigQuery ML
All BigQuery ML generative AI functions such as AI.GENERATE, ML.GENERATE_TEXT and ML.GENERATE_EMBEDDING now support ObjectRefs as first-class citizens. This enables a number of use cases.
BQML use case 1: Multimodal inference using Gemini
You can now pass multiple ObjectRefs in the same Gemini prompt for inference.
Here, you can use Gemini to evaluate noise removal quality by comparing the original audio file and the noise-removed audio file. This script assumes the noise-reduced audio file ObjectRef is already stored in column Denoised.
SQL:

SELECT AI.GENERATE(
  prompt => ("Compare original audio file to audio file with noise removed, and output quality of noise removal as either good or bad. Original audio is", OBJ.GET_ACCESS_URL(audio, "r"), "and noise removed audio is", OBJ.GET_ACCESS_URL(denoised, "r")),
  -- BQ connection with permission to call Gemini
  connection_id => "analysis.US.gemini-connection",
  endpoint => "gemini-2.0-flash"
).result
FROM analysis.sessions WHERE audio IS NOT NULL AND denoised IS NOT NULL;
Python:

import bigframes.pandas as bpd
from bigframes.ml import llm

gemini = llm.GeminiTextGenerator(model_name="gemini-2.0-flash", connection_name="analysis.US.gemini-connection")
df = bpd.read_gbq("analysis.sessions")
result = gemini.predict(df, prompt=["Compare original audio file to audio file with noise removed, and output quality of noise removal as either good or bad. Original audio is", df["audio"], "and denoised audio is", df["denoised"]])
result[["ml_generate_text_llm_result"]]
As another example, here’s how to transcribe the Audio file using Gemini.
SQL:

SELECT AI.GENERATE(
  prompt => ("Transcribe this audio file", OBJ.GET_ACCESS_URL(audio, "r")),
  -- BQ connection with permission to call Gemini
  connection_id => "analysis.US.gemini-connection",
  endpoint => "gemini-2.0-flash").result AS transcript
FROM analysis.sessions
WHERE audio IS NOT NULL
Python:

import bigframes.pandas as bpd
from bigframes.ml import llm

gemini = llm.GeminiTextGenerator(model_name="gemini-2.0-flash", connection_name="analysis.US.gemini-connection")
df = bpd.read_gbq("analysis.sessions")
result = gemini.predict(df, prompt=["Transcribe this audio file", df["audio"]])
result[["ml_generate_text_llm_result"]]
With BQML + Gemini, you can also generate structured or semi-structured results from multimodal inference. For example, you can do speaker diarization in the Audio file using Gemini to identify the operator vs. the customer.
SQL:

SELECT AI.GENERATE(
  prompt => ("Generate audio diarization for this interview. Use JSON format for the output, with the following keys: speaker, transcription. If you can classify the speaker as customer vs operator, please do. If not, use speaker A, speaker B, etc.", OBJ.GET_ACCESS_URL(audio, "r")),
  -- BQ connection with permission to call Gemini
  connection_id => "analysis.US.gemini_connection",
  endpoint => "gemini-2.0-flash").result AS diarized_json
FROM analysis.sessions
WHERE audio IS NOT NULL;
Python:

import bigframes.pandas as bpd
from bigframes.ml import llm

gemini = llm.GeminiTextGenerator(model_name="gemini-2.0-flash", connection_name="analysis.US.gemini-connection")
df = bpd.read_gbq("analysis.sessions")
result = gemini.predict(df, prompt=["Generate audio diarization for this interview. Use JSON format for the output, with the following keys: speaker, transcription. If you can classify the speaker as customer vs operator, please do. If not, use speaker A, speaker B, etc.", df["audio"]])
result[["ml_generate_text_llm_result"]]
BQML use case 2: Multimodal embeddings using Gemini
With ML.GENERATE_EMBEDDING support, you can use ObjectRefs with text embedding and multimodal embedding models to create vector indices, and power RAG workflows to ground LLMs.
Assume we have an Object Table ingestion.images with the ref column containing image ObjectRefs.
CREATE OR REPLACE MODEL `ingestion.multimodal_embedding_model`
REMOTE WITH CONNECTION 'ingestion.US.gemini-connection'
OPTIONS (ENDPOINT = 'multimodalembedding@001');

SELECT ref, ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `ingestion.multimodal_embedding_model`,
  (
    SELECT OBJ.GET_ACCESS_URL(ref, 'r') AS content, ref
    FROM ingestion.images
  ),
  STRUCT (256 AS output_dimensionality)
);
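To make these embeddings searchable, for example to ground RAG workflows, one rough sketch is to materialize them into a table and build a vector index over it; the ingestion.image_embeddings table name below is an assumption:

-- Materialize the embeddings, then index them for approximate nearest-neighbor search.
CREATE OR REPLACE TABLE ingestion.image_embeddings AS
SELECT ref, ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `ingestion.multimodal_embedding_model`,
  (SELECT OBJ.GET_ACCESS_URL(ref, 'r') AS content, ref FROM ingestion.images),
  STRUCT (256 AS output_dimensionality)
);

CREATE VECTOR INDEX image_embedding_index
ON ingestion.image_embeddings(embedding)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE');

Queries can then run VECTOR_SEARCH against this table to retrieve the closest images for a given query embedding.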
To summarize, here’s a list of all the new capabilities for performing analytics on unstructured and/or multimodal data using BigQuery:
New types and functions for handling multimodal data (documents, audio files, images, and videos):
ObjectRef and ObjectRefRuntime types along with new functions: OBJ.MAKE_REF, OBJ.GET_ACCESS_URL and OBJ.FETCH_METADATA
Object Table enhancements:
Scalability: Object Tables now support consistent views of Cloud Storage buckets, scaling 5x from 65M to 300M+ objects per table, and ingesting up to 1M object changes per hour per table
Interop with ObjectRef: New ref column provides pre-constructed ObjectRefs directly from Object Tables
BQML Gen-AI multimodal capabilities:
Support for multimodal inference in TVFs such as ML.GENERATE_TEXT and AI.GENERATE_TABLE, and in scalar functions such as AI.GENERATE and AI.GENERATE_BOOL, by encapsulating multiple objects in the same prompt for Gemini using ObjectRef. Objects can be sourced from different columns and from complex types such as arrays.
Support for embedding ObjectRefs via the ML.GENERATE_EMBEDDING function
An extension to pandas-like dataframe to include unstructured data (powered by ObjectRef) as just another column
Wrangle, process and filter mixed modality data with the familiarity of dataframe operations
Special transformers for unstructured data, such as chunking, image processing, and transcription, made available through server-side processing functions and BQML
Leverage the rich Python library ecosystem for advanced unstructured data manipulation in a fully managed, serverless experience with BigQuery governance
Get started today
ObjectRef is now in preview. Follow these simple steps to get started:
Learn by doing – try out ObjectRefs with the multimodal data tutorial, available in both SQL and Python versions.
Build your use case – locate the Cloud Storage bucket containing the unstructured data you want to analyze. Create an Object Table or set up automatic Cloud Storage discovery to pull this data into BigQuery. The Object Table will contain a column of ObjectRefs, and you’re ready to start transforming the data; a minimal DDL sketch follows.
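As a rough sketch of the Object Table step (the bucket path and connection name below are placeholders), the DDL looks like this:

-- Object Table over a Cloud Storage prefix; it exposes a ref column of
-- ObjectRefs that you can join into your analysis tables.
CREATE EXTERNAL TABLE ingestion.audios
WITH CONNECTION `us.gcs-connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://your-call-center-bucket/audio/*']
);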
In today’s dynamic cloud market, true growth comes from strategic clarity. For Google Cloud partners, unlocking immense market potential and building a thriving services practice hinges on a definitive roadmap. That’s why we partnered with global technology analyst firm Canalys to independently study the Partner Ecosystem Multiplier (PEM) – a measure of the incremental revenue you can capture when working with Google Cloud.
The study confirms a key finding: For every US$1 a customer invests in Google Cloud, partners delivering comprehensive services across the customer lifecycle stand to capture up to $7.05 in incremental revenue through their own offerings. This top-tier potential is strongly linked to expanding your services across the entire customer lifecycle – a journey many Google Cloud partners are already on, influencing nearly 80% of Google Cloud’s YoY incremental revenue growth in 2024.
Beyond the number: the strategic path leading up to $7.05
The real takeaway goes beyond the number; it’s about how you can strategically navigate this journey and build towards comprehensive service delivery. Canalys’ research visualizes this through a “partner ecosystem flywheel,” which maps typical partner activities across a three-year customer journey. This powerful framework (illustrated below) outlines how leading partners strategically engage customers across six distinct stages: Advise, Design, Procure, Build, Adopt, and Manage.
To achieve this top-tier potential, Canalys highlights the importance of being in a mature cloud region and developing capabilities across the entire customer lifecycle. If you’re already familiar with multipliers, what sets this apart is how partners can leverage Google Cloud’s strengths in analytics, data, and Generative AI to unlock significant revenue over a three-year project cycle, especially by driving GenAI solutions to production.
How partners create value with the ecosystem flywheel
This flywheel is a roadmap for your growth. The study found that partners who haven’t yet achieved the full multiplier potential might currently focus on specific stages by choice, or haven’t yet fully developed maturity across all these service areas. Achieving maximum potential means expanding your practice within each of these flywheel segments.
Figure 1: Google Cloud Partner Ecosystem Multiplier by Service Category (Source: Canalys, Google Cloud Partner Ecosystem Multiplier Study, January 2025)
Advise: influence the customer journey (11% of Multiplier): Vital for influencing long-term customer engagement and shaping their cloud destiny, laying groundwork for subsequent opportunities.
Design: build a strong technical foundation (25% of Multiplier): While Design represents nearly a quarter of the total multiplier, successful partners use this stage strategically to set the foundation for higher-value opportunities. By architecting solutions that fully leverage Google Cloud’s AI and data capabilities from the start, partners create pathways for expanded returns and long-term revenue growth from the Build, Adopt, and Manage phases of the cycle.
Procure: optimize commercial foundations (5% of Multiplier): The smallest category of the multiplier, this focuses on re-sell and commercial management, laying essential commercial groundwork.
Build: unlock AI’s transformative power (24% of Multiplier): Google Cloud’s most compelling growth engine. Overwhelmingly driven by Generative AI, this segment is where partners create customized AI solutions and integrations, moving projects from proof of concept to production. Beyond Gen AI, cybersecurity, application modernization, and infrastructure support are also major revenue drivers.
Adopt: drive expansion and prove ROI (17% of Multiplier): Ensures customers effectively use Google Cloud and realize its value, fueling the overall PEM. Partners that focus here are best positioned to identify cross-sell and upsell opportunities, drive increased Google Cloud consumption, and set the stage for subsequent “Build” opportunities within the same customer.
Manage: secure recurring revenue (18% of Multiplier): Provides ongoing operational support through managed services, offering a clear pathway to recurring revenue and ensuring continuous customer value.
For a detailed breakdown of each flywheel segment’s contribution, including specific dollar values, we encourage you to explore the accompanying Canalys factsheet.
Your long-term edge: The Google Cloud multiplier
The revenue opportunity unfolds strategically across three years. The first year, largely advisory and migration services, hold 51.6% of the multiplier opportunity, but the most significant growth for Google Cloud partners unfolds afterward. Transformative opportunities, particularly transitioning Generative AI proofs of concept to production, typically emerge from year three.
Google Cloud’s ongoing innovation in AI and data powers these later-year opportunities. By building your practice to leverage these advancements and guiding customers to deeper, innovative usage, you achieve sustained growth and a thriving services practice. This approach creates an enduring, valuable services practice, powered by Google Cloud, that supports customers throughout their entire journey.
We’re committed to supporting partner success. Connect with your Partner Development Manager and utilize Partner Network Hub resources to strategize your services growth. Let’s grow and innovate, together.
As organizations build new generative AI applications and AI agents to automate business workflows, security and risk management leaders face a new set of governance challenges. The complex, often opaque nature of AI models and agents, coupled with their reliance on vast datasets and potential for autonomous action, creates an urgent need to apply better governance, risk, and compliance (GRC) controls.
Today’s standard compliance practices struggle to keep pace with AI, and leave critical questions unanswered. These include:
How do we prove our AI systems operate in line with internal policies and evolving regulations?
How can we verify that data access controls are consistently enforced across the entire AI lifecycle, from training to inference to large scale production?
What is the mechanism for demonstrating the integrity of our models and the sensitive data they handle?
We need more than manual checks to answer these questions, which is why Google Cloud has developed an automated approach that is scalable and evidence-based: the Recommended AI Controls framework, available now as a standalone service and as part of Security Command Center.
Google Cloud’s AI Protection provides full lifecycle safety and security capabilities for AI workloads, from development and training to runtime and large-scale production. It is also paramount not only to secure AI workloads, but to audit whether they adhere to compliance requirements, define controls for AI assets, and monitor drift. Google Cloud has taken a holistic approach to defining best practices for these platform components.
Below is an example of an AI workload:
Foundation components of AI workloads.
How the Recommended AI Controls Framework can help audit AI workloads
Audit Manager helps you identify compliance issues earlier in your AI compliance and audit process by integrating checks directly into your operational workflows. Here’s how you can move from manual checklists to automated assurance for your generative AI workloads:
Establish your security controls baseline. Audit Manager provides a baseline to audit your generative AI workloads. These baselines are based on industry best practices and frameworks to help give you a clear, traceable directive for your audit.
Understand control responsibilities. Aligned with Google’s shared fate approach, the framework can help you understand the responsibility for each control — what you manage versus what the cloud platform provides — so you can focus your efforts effectively.
Run the audit with automated evidence collection. Evaluate your generative AI workloads against industry-standard technical controls in a simple, automated manner. Audit Manager can reduce manual audit preparation by automatically collecting evidence relative to the defined controls for your Vertex AI usage and supporting services.
Assess findings and remediate. The audit report will highlight control violations and deviations from recommended best practices. This can help your teams perform timely remediation before minor issues escalate into significant risks.
Create and share reports. Generate and share comprehensive, evidence-backed reports with a single click, which can support continuous compliance monitoring efforts with internal stakeholders and external auditors.
Enable continuous monitoring. Move beyond point-in-time snapshots. Establish a consistent methodology for ongoing compliance by scheduling regular assessments. This allows you to continuously monitor AI model usage, permissions, and configurations against best practices, and can help maintain a strong GRC posture over time.
Inside the Recommended AI Controls framework
The framework provides controls specifically designed for generative AI workloads, mapped across critical security domains. Crucially, these high-level principles are backed by auditable, technical checks linked directly to data sources from Vertex AI and its supporting Google Cloud services.
Here are a few examples of the controls included:
Access control:
Disable automatic IAM grants for default service accounts: This control restricts default service accounts with excessive permissions.
Disable root access on new Vertex AI Workbench user-managed notebooks and instances: This boolean constraint, when enforced, prevents newly created Vertex AI Workbench user-managed notebooks and instances from enabling root access. By default, root access is enabled.
Data controls:
Customer Managed Encryption Keys (CMEK): Google Cloud offers organization policy constraints to help ensure CMEK usage across an organization. Using Cloud KMS CMEK gives you ownership and control of the keys that protect your data at rest in Google Cloud.
Configure data access control lists: You can customize these lists based on a user’s need to know. Apply data access control lists, also known as access permissions, to local and remote file systems, databases, and applications.
System and information integrity:
Vulnerability scanning: Our Artifact Analysis service scans for vulnerabilities in images and packages in Artifact Registry.
Audit and accountability:
Audit and accountability policy and procedures requirements: Google Cloud services write audit log entries to track who did what, where, and when with Google Cloud resources.
Configuration management:
Restrict resource service usage: This constraint ensures only customer-approved Google Cloud services are used in the right places. For example, production and highly sensitive folders have a list of Google Cloud services approved to store data. The sandbox folder may have a more permissive list of services, with accompanying data security controls to prevent data exfiltration in the event of a breach.
How to automate your AI audit in three steps
Security and compliance teams can immediately use this framework to move from manual checklists to automated, continuous assurance.
Select the framework: In the Google Cloud console, navigate to Audit Manager and select Google Recommended AI Controls framework from the library.
Define the scope: Specify the Google Cloud projects, folders, or organization where your generative AI workloads are deployed. Audit Manager automatically understands the relevant resources within that scope.
Run the assessment: Initiate an audit. Audit Manager collects evidence from the relevant services (including Vertex AI, IAM, and Cloud Storage) against the controls. The result is a detailed report showing your compliance status for each control, complete with direct links to the collected evidence.
Automate your AI assurance today
You can access the Audit Manager directly from your Google Cloud console. Navigate to the Compliance tab in your Google Cloud console, and select Audit Manager. For a comprehensive guide on using Audit Manager, please refer to our detailed product documentation.
We encourage you to share your feedback on this service to help us improve Audit Manager’s user experience.
Financial analysts spend hours grappling with ever-increasing volumes of market and company data to extract key signals, combine diverse data sources, and produce company research. Schroders is a leading global active investment manager. Being an active manager means understanding investment opportunities — combining rigorous research, innovative thinking and deep market perspective — to help build resilience and capture returns for clients.
To maximise its edge as an active manager, Schroders wants to enable its analysts to shift from data collection to the higher-value strategic thinking that is critical for business scalability and client investment performance.
To achieve this, Schroders and Google Cloud collaborated to build a multi-agent research assistant prototype using Vertex AI Agent Builder.
Why multi-agent systems?
At Schroders, analysts are typically responsible for conducting in-depth research on 20 to 30 companies, with another 20 under close watch. An initial report on a new company can take days to complete, most of which are primarily spent gathering quality data. Reducing this research down to minutes would allow analysts to screen more companies, directly increasing their potential to discover promising investment opportunities for their clients. An AI assistant offers a significant productivity boost in driving early-stage company research.
An AI agent is a software system that can perceive its environment, take actions, and employ tools to achieve specific goals. It exhibits reasoning, planning, and memory, and has a level of autonomy to make decisions, learn, and adapt. Tools are crucial functions or external resources that an agent can utilize to interact with its environment and enhance its capabilities, enabling it to take actions on a user’s behalf.
Standalone generative AI models often struggle with complex, multi-step financial research workflows, which require ordered data retrieval and reasoning (i.e., fetching fundamentals, filings, and news, and then synthesizing analysis). Given the complexity of its use case, Schroders opted to build a multi-agent system due to the following characteristics:
Specialization: Designing agents which are hyper-focused on specific tasks (e.g., R&D Agent, Working Capital Agent, etc.) with only the necessary tools and knowledge for their respective domains.
Modularity and scalability: Each agent is a distinct component developed, tested, and updated independently thereby simplifying development and debugging.
Complex workflow orchestration: Multi-agent systems model their workflows as graphs of interacting agents. For example, a Porter’s 5 Forces Agent designed to identify and analyze industry competition, could trigger child agents like a Threat of New Entrants Agent, in parallel or sequence, to better manage dependencies between deterministic (e.g., calculations) and non-deterministic (e.g., summarization) tasks.
Simplified tool integration: Specialized agents can handle specific toolsets (i.e., an R&D Agent using SQL database query tools) rather than having a single agent manage numerous APIs.
Leveraging Vertex AI Agent Builder
Schroders selected Vertex AI Agent Builder as the core platform for developing and deploying its multi-agent system. This choice provided several key benefits that helped accelerate development, including access to state-of-the-art Google foundation models like Gemini and pre-built connectors for various tools and data sources.
For example, Vertex AI Agent Builder provided easy tool integration for leveraging:
Internal knowledge: The Grounding with Vertex AI Search tool was used to ground Gemini in a private document corpus, such as internal research notes, enabling agents to answer questions based on Schroders’ proprietary data.
Example tool call: search_internal_docs(query="analyst notes for $COMPANY", company_id="XYZ").
Structured data: To simplify financial data querying for analysts in BigQuery, agents employed a custom tool to translate natural language into SQL queries.
Example flow: User: “What were $COMPANY’s revenues for the last 3 quarters?” -> Agent -> SQL Query on BigQuery.
Public web data: The team integrated Grounding with Google Search tool for handling real-time public information like news and market sentiment.
Example tool call: google_search(query="latest news $COMPANY stock sentiment").
Vertex AI’s flexible orchestration supports both native function calling and frameworks like LangGraph, CrewAI, and LangChain, allowing the team to prototype its multi-agent system with function calling before transitioning to a specific framework. In addition, Vertex AI offers seamless integration with other Google Cloud services and tools that help facilitate rapid agent governance and management, including Cloud Logging, Cloud Monitoring, IAM Access Control, Vertex AI evaluation, BigQuery and more.
The evolution of Vertex AI to support building multi-agent systems, including the latest Agent Development Kit (ADK) and Agent2Agent (A2A) protocol, offers future opportunities to further streamline agent development, productization, and integration with existing agent deployments.
Framework choices and implementation tradeoffs
One of the most critical decisions was framework selection for agent orchestration. Initially, native function calling helped Schroders get familiar with Vertex AI Agent Builder and develop agent-building best practices. This approach kept things simple to start with and allowed finer-grained control and reliability over agent interactions and tool invocation, providing easier debugging and faster iterative development for simple, linear agent design and workflows. However, it also required significant custom code to manage state and errors, track dependencies, and handle retry and control logic — all of which created significant complexity.
With a solid foundation in individual agents, Schroders decided to explore integrating multiple agents to achieve complex tasks, and quickly recognized the need for a framework that offered better workflow state and inter-agent dependency management. The team subsequently transitioned to LangGraph, an open-source multi-agent framework, primarily for its state management capabilities and its native support for cyclical, complex workflows and human-in-the-loop checkpoints, which allow an agent to complete a task, update the state, and pass it to the configured sub-agent. The adopted parent-child graph structure requires managing both parent and child agent states; child agents complete tasks while the parent graph leads the orchestration. This structured hierarchy often ends with a “summary” node aggregating child results. Each child stores its tool calls and AI messages before writing its final output to the parent.
Key features and system architecture deep dive
Schroders’ multi-agent system is designed for intuitive, flexible end-user interaction. An analyst creates an agent by providing a name, a description, prompt template sections (e.g., objective, instructions, constraints), and selecting tools. For example, an agent that receives the user query, “Summarize recent earnings and news sentiment for Company X, highlighting any changes in management guidance,” would need access to company documents and market news tools. Agent configurations are versioned in Firestore, ensuring robust management for Create, Read, Update, and Delete (CRUD) operations.
A “quick chat” function allows users to smoke-test agents and tweak prompts. Tested agents join a pool of available agents, which users can then combine into “workflows” — directed graphs for multi-step processes. For instance, a Porter’s 5 Forces analysis agent will use pre-built agents and tools like Vertex AI AutoSxS Model Evaluation alongside child agents that integrate current information or internal document insights.
The following diagram illustrates the Google Cloud architecture for orchestrating agents:
Here is an example query flow:
Router agent: Receives the user query, uses Gemini to classify intent and identify the target specialized agent or workflow (e.g., “Analyze Company XYZ” routes to the Porter’s 5 Forces Agent).
Task delegation: The router requests parameters and routes to the appropriate agent and workflow.
Agent execution and tools: Specialized agents execute tasks, interacting with configured tools, such as APIs and databases via secure gateways.
Response: Combined results from workflows or individual agent responses are returned.
Follow-ups: Conversation history is stored in Firestore, maintaining full context.
Here is an example workflow for a user wanting to analyse a company:
This distributed approach ensures each component focuses on its strength, providing vital flexibility that encourages user adoption.
Personalization and user adaptation
Personalization was key: a core goal of Schroders’ use case was to support analysts’ unique workflows, not force rigid processes. To achieve this, the system uses customizable system instructions — underlying prompts that can be tuned by analysts and developers. A templating system gives developers control over the generalized parts of the prompt and analysts control over the business logic, helping to foster cross-functional collaboration. In addition, the system allows for personalized agent configuration. Analysts can prioritize or toggle on different tools and data sources depending on the research context. These tools are developer-built, restricting direct access to any underlying files like PDF documents. The team also decided to expose model parameters like temperature, allowing users to make small adjustments and modifications during development.
Measuring success: Agent evaluation and iteration
An agentic system is only valuable if it’s accurate, reliable, and truly helpful. These attributes are also important in generating quality investment research, which is vital for client trust and Schroders’ active capability. To address this, Schroders implemented a multi-faceted evaluation strategy, using Vertex AI Generative AI Evaluation. This approach includes:
Human-in-the-loop (HITL): Analysts review outputs for accuracy, relevance, completeness, and conciseness.
A ground truth dataset: This dataset is built based on structured analyst feedback, including corrections and data source indications.
Iterative refinement: Data is fed back into development to refine prompts, tool descriptions, orchestration logic, and identify needs for new agents, rapidly improving performance and trust.
Building a new financial future
Working together, Schroders and Google Cloud developed a successful prototype with Vertex AI Agent Builder, demonstrating that multi-agent systems are capable of tackling complex financial workflows. By combining specialized agents, good architecture and robust evaluation, the collaboration proved the feasibility of developing an AI equity research assistant that can significantly enhance analyst productivity — reducing the time required to complete a detailed company analysis from days to minutes.
Along the way, the team also discovered several key learnings for building effective agents:
Meticulously decompose tasks. Thoroughly map analyst workflows, breaking them into the smallest logical, atomic units for clear multi-agent roles. Single-task agents are more effective at accomplishing their defined objective.
Prompt engineering is key. Generative AI foundation models rely heavily on tool descriptions, and ambiguity can impact reliability. Effective prompts, especially precise tool descriptions, are critical.
Tool reliability is non-negotiable. Agents are limited by their tools. Instability and bugs in tools can degrade performance and lead to incorrect outputs, which can then impact investment decision making. Implement robust error handling (retries, circuit breakers) and ensure good tool debugging.
Limit tool scope per agent. Agents perform better with fewer (e.g., fewer than five) highly relevant tools, which helps avoid misuse.
Managing state is complex. Orchestrating multiple agents demands careful management of history and careful tracking of intermediate results. Frameworks like LangGraph or ADK can help significantly.
Leverage Agent-of-Agents. Power comes from collaboration, not overly complex individual agents. For complex tasks, it’s better to build single-responsibility, reusable atomic agents that can work together, carefully orchestrating their interactions.
User trust is earned. Always be transparent and consistent. High-quality user feedback is essential for driving results that gain user trust and engagement.
In order to scale the prototype in the future, Schroders plans to explore more agents with sophisticated reasoning, support for new multimodal data types like images and charts, enhanced discoverability (Agent-to-Agent protocols), and more autonomy for routine tasks.
Amazon GameLift Servers, a fully managed service for deploying, operating, and scaling game servers for multiplayer games, is now available in two additional AWS Regions: Asia Pacific (Thailand) and Asia Pacific (Malaysia). With this launch, customers can now deploy GameLift fleets closer to players in Thailand and Malaysia, helping reduce latency and improve gameplay responsiveness.
This regional expansion supports both Amazon GameLift Servers managed EC2 and container-based hosting options. Developers can take advantage of features such as FlexMatch for customizable matchmaking, FleetIQ for cost-optimized instance management, and auto-scaling to manage player demand dynamically. The addition of these new regions enables game developers and publishers to better serve growing player communities across Southeast Asia while maintaining high performance and reliability.
Starting today, domain name system (DNS) delegation for private hosted zone subdomains can be used with Route 53 inbound and outbound Resolver endpoints. This allows you to delegate the authority for a subdomain from your on-premises infrastructure to the Route 53 Resolver cloud service and vice versa, enabling a simplified cloud experience across namespaces in AWS and on your own local infrastructure.
AWS customers allow multiple organizations within their enterprise to individually manage their respective subdomains and subzones, whereas apex domains and parent hosted zones are typically overseen by a central team. Previously, these customers had to create and maintain conditional forwarding rules in their existing network infrastructure to enable services to discover one another across subdomains. However, conditional forwarding rules are difficult to maintain across large organizations and, in many cases, are not supported by on-premises infrastructure. With today’s release, customers can instead delegate authority of subdomains to Route 53 using name server records and vice versa, achieving compatibility with common, on-premises DNS infrastructure and removing the need for teams to use conditional forwarding rules throughout their organization.
Inbound and outbound delegation for Resolver endpoints is available in all AWS Regions where Resolver endpoints are available, except the AWS GovCloud Regions and the Amazon Web Services in China Regions. Inbound and outbound delegation is provided at no additional cost beyond standard Resolver endpoint usage. For more details on pricing, visit the Route 53 pricing page, and to learn more about this feature, visit the developer guide.
Today, Amazon EMR on EKS announces support for Service Quotas, improving visibility and control over EMR on EKS quotas.
Previously, to request an increase for EMR on EKS quotas, such as maximum number of StartJobRun API calls per second, customers had to open a support ticket and wait for the support team to process the increase. Now, customers can view and manage their EMR on EKS quota limits directly in the Service Quotas console. This enables automated limit increase approvals for eligible requests, improving response times and reducing the number of support tickets. Customers can also set up Amazon CloudWatch alarms to get automatically notified when their usage reaches a certain percentage of a maximum quota.
Amazon EMR on EKS support for Service Quotas is available in all Regions where Amazon EMR on EKS is currently available. To get started, visit the Service Quotas User Guide.
Now generally available, the Amazon CloudWatch investigations capability helps you accelerate operational investigations across your AWS environment in a fraction of the time. With a deep understanding of your AWS cloud environment and resources, the capability uses an AI agent to look for anomalies in your environment, surface related signals, identify root-cause hypotheses, and suggest remediation steps, significantly reducing mean time to resolution (MTTR).
This new CloudWatch investigations capability works alongside you throughout your operational troubleshooting journey, from issue triage through remediation. You can initiate an investigation by selecting the Investigate action on any CloudWatch data widget across the AWS Management Console. You can also start investigations from more than 80 AWS consoles, configure them to trigger automatically from a CloudWatch alarm action, or initiate them from an Amazon Q chat. The new investigation experience in CloudWatch allows teams to collaborate and add findings, view related signals and anomalies, and review suggestions for potential root cause hypotheses. This new capability also provides remediation suggestions for common operational issues across your AWS environment by surfacing relevant AWS Systems Manager Automation runbooks, AWS re:Post articles, and documentation. It also integrates with popular communication channels such as Slack and Microsoft Teams.
The Amazon CloudWatch investigations capability is available in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (Spain), and Europe (Stockholm).
The CloudWatch investigations capability is now generally available at no additional cost. It was previously launched in preview as Amazon Q Developer operational investigations. To learn more, see getting started and best practice documentation.
Amazon Bedrock Guardrails announces tiers for content filters and denied topics, offering additional flexibility and ease of use in choosing safeguards and expanded language support based on your use case. With the new Standard tier, Guardrails detects and filters undesirable content with better contextual understanding, including modifications such as typographical errors, and supports up to 60 languages.
Bedrock Guardrails provides configurable safeguards to help detect and block harmful content and prompt attacks, define topics to deny and disallow specific topics, and helps redact personally identifiable information (PII) such as personal data from input prompts and model responses. Additionally, Bedrock Guardrails helps detect and block model hallucinations, and identify, correct, and explain factual claims in model responses using Automated Reasoning checks. Guardrails can be applied across any foundation model including those hosted with Amazon Bedrock, self-hosted models, and third-party models outside Bedrock using the ApplyGuardrail API, providing a consistent user experience and helping to standardize safety and privacy controls.
The new Standard tier enhances the content filters and denied topics safeguards within Bedrock Guardrails by offering more robust detection of prompt and response variations, strengthened defense across all content filter categories including prompt attacks, and broader language support. The improved prompt attacks filter distinguishes between jailbreaks and prompt injection on the backend while protecting against other threats, including output manipulation. To access the Standard tier’s capabilities, customers must explicitly opt in to cross-region inference with Bedrock Guardrails.
Today, AWS launches Intelligent Search on AWS re:Post and AWS re:Post Private — offering a more efficient and intuitive way to access AWS knowledge across multiple sources. This new capability transforms how builders find information, providing synthesized answers from various AWS resources in one place.
Intelligent Search streamlines the process of finding relevant AWS information by unifying results from re:Post community discussions, AWS Official documentation, and other public AWS knowledge sources. Instead of manually searching through multiple pages, users receive contextually relevant answers directly, saving time and effort. For instance, when troubleshooting an IAM permissions error, developers can ask a question in natural language and immediately receive a comprehensive response drawing from diverse AWS resources.
This feature is particularly valuable for developers, architects, and technical leaders who need quick access to accurate information for problem-solving and decision-making. By consolidating knowledge from various AWS sources, Intelligent Search helps users find solutions faster, accelerating development processes and improving productivity.
Intelligent Search is now available on repost.aws. re:Post Private customers can also utilize this feature if artificial intelligence capabilities are enabled in their instance. For setup instructions, see the re:Post Private Administration Guide.
Box is one of the original information sharing and collaboration platforms of the digital era. They’ve helped define how we work, and have continued to evolve those practices alongside successive waves of new technology. One of the most exciting advances of the generative AI era is that now, with all the data that Box users have stored, they can get considerably more value out of those files by using AI to search and synthesize their information in new ways.
That’s why Box created Box AI Agents, to intelligently discern and structure complex unstructured data. Today, we’re excited to announce the availability of the Box AI Enhanced Extract Agent. The Enhanced Extract Agent runs on Google’s most advanced Gemini 2.5 models, and it also features Google’s Agent2Agent protocol, which allows secure connection and collaboration between AI agents across dozens of platforms in the A2A network.
The Box AI Enhanced Extract Agent gives enterprise users confidence in their AI, helping overcome any hesitations they might feel about gen AI technology and using it for business-critical tasks.
In this post, we’ll take a closer look at how our teams created the Box AI Enhanced Extract Agent and what others building new agentic AI systems might consider when developing their own solutions.
Getting more content with confidence
When it comes to data extraction, simply pulling out text from documents is no longer sufficient. A core objective that businesses need peace of mind on is uncertainty estimation, which we define as understanding how uncertain the model is about a particular extraction. This is paramount when an organization is processing vast quantities of documents — such as searching tens of thousands of items and trying to extract all the relevant and related values in each of them — and you need to guide human review effectively and with confidence. The goal isn’t just high accuracy, but also a reliable confidence score for each piece of extracted data.
With the Box AI Enhanced Extract Agent, we wanted to transform how businesses interact with their most complex content, whether scanned PDFs, images, slides, or other diverse materials, and turn it all into structured, actionable intelligence.
For instance, financial services organizations can automate loan application reviews by accurately extracting applicant details and income data; legal teams can accelerate discovery by pinpointing critical clauses in contracts; and HR departments can streamline onboarding by processing new hire paperwork automatically. In each of these cases, extracted data like key dates and contractual terms can be validated using the confidence scores that this Box and Google collaboration delivers. These confidence scores help ensure reliable, AI-vetted information powers efficient operations and proactive compliance without extensive manual effort.
Powering enhanced data extraction with Gemini 2.5 Pro
Box’s Enhanced Extract Agent leverages the sophisticated multimodal and agentic reasoning capabilities of Google’s Gemini 2.5 Pro as its core intelligence engine. However, the relationship goes beyond simple API calls.
“Gemini 2.5 Pro is way ahead due to its multimodal, deep reasoning, and code generation capabilities in terms of accuracy compared to previous models for these complex extraction tasks,” said Ben Kus, CTO at Box. “These capabilities make Gemini a crucial component for achieving Box’s ambitious goals of turning unstructured content into structured content through enhanced extraction agents.”
To build robust confidence scores and enable deeper understanding, Box’s AI Agents acquire specific, granular information that the Gemini 2.5 Pro model is uniquely adept at providing.
An agent-to-agent protocol for deeper collaboration
Box is championing an open AI ecosystem by embracing Google Cloud’s Agent2Agent protocol, enabling all Box AI Agents to securely collaborate with diverse external agents from dozens of partners (a list that keeps growing). By adopting the latest A2A specification, Box AI can ensure efficient and secure communication for complex, multi-system processes. This lets organizations run complex, cross-system workflows, bringing intelligence directly to where content lives and boosting productivity through seamless agent collaboration. This advanced interplay leverages the proposed agent-to-agent protocol in the following ways:
Box’s AI Agents: Orchestrate the overall extraction task, manage user interactions, apply business logic, and, crucially, perform the confidence scoring and uncertainty analysis.
Google’s Gemini 2.5 Pro: Provides the core text comprehension, reasoning, and generation; in this enhanced protocol, Gemini models also aim to furnish deeper operational data (like token likelihoods) to their counterpart.
This protocol, for example, allows Box’s Enhanced Extract Agent to “look under the hood” of Gemini 2.5 Pro to a greater extent than typical AI model integrations. This deeper insight is essential for:
Building Reliable Confidence Scores: Understanding how certain Gemini 2.5 Pro is about each generated token allows Box AI’s enhanced data extraction capabilities to construct more accurate and meaningful confidence metrics for the end-user (see the sketch after this list).
Enhancing Robustness: Another key area of focus is model robustness, ensuring consistent outputs. As Kus put it: “For us robustness is if you run the same model multiple times, how much variation we would see in the values. We want to reduce the variations to be minimal. And with Gemini, we can achieve this.”
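To make the idea concrete, here is a minimal sketch, and not Box’s actual scoring method: if a model exposes per-token log probabilities for an extracted value, one simple confidence score is the geometric mean of those token probabilities, which can then be used to route low-confidence fields to human review.

```python
# A minimal sketch of logprob-based confidence scoring; the threshold and the
# example logprobs are illustrative, not Box's production logic.
import math


def field_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean probability of the tokens that make up one extracted field."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Hypothetical per-field token log probabilities returned alongside an extraction.
extracted_fields = {
    "invoice_total": [-0.02, -0.10, -0.01],
    "due_date": [-1.90, -2.40],
}

for field, logprobs in extracted_fields.items():
    score = field_confidence(logprobs)
    action = "auto-accept" if score >= 0.8 else "send to human review"
    print(f"{field}: confidence={score:.2f} -> {action}")
```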
Furthering this commitment to an open and extensible ecosystem, Box AI Agents will be published on Agentspace and will be able to interact with other agents using the A2A protocol. Box has also published support for Google’s Agent Development Kit (ADK) so developers can build Box capabilities into their ADK agents, truly integrating Box intelligence across their enterprise applications.
The Google ADK, an open-source, code-first Python toolkit, empowers developers to build, evaluate, and deploy sophisticated AI agents with flexibility and control. To expand these capabilities, we have created the Box Agent for Google ADK, which allows developers to integrate Box’s Intelligent Content Management platform with agents built with Google ADK, enabling the creation of custom AI-powered solutions that enhance content workflows and automation.
This integration with ADK is particularly valuable for developers, as it allows them to harness the power of Box’s Intelligent Content Management capabilities using familiar software development tools and practices to craft sophisticated AI applications. Together, these tools provide a powerful, streamlined approach to build innovative AI solutions within the Box ecosystem.
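To give a flavor of what building with ADK looks like, here is a minimal sketch, assuming the google-adk Python package and a purely hypothetical search_box_files tool; it is an illustration of wiring a content tool into an ADK agent, not the published Box Agent for Google ADK.

```python
# A minimal sketch using Google's Agent Development Kit (google-adk); the Box tool
# below is hypothetical and simply returns a canned result.
from google.adk.agents import Agent


def search_box_files(query: str) -> dict:
    """Hypothetical tool: look up Box content matching a query.

    A real integration would call Box APIs here; this stub returns placeholder data.
    """
    return {"status": "success", "results": [f"Placeholder Box result for: {query}"]}


box_content_agent = Agent(
    name="box_content_agent",
    model="gemini-2.5-pro",  # assumption: any Gemini model ID supported by ADK
    description="Answers questions by searching content stored in Box.",
    instruction=(
        "Use the search_box_files tool to find relevant Box documents, "
        "then summarize what you find for the user."
    ),
    tools=[search_box_files],
)
```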
Continual learning and human-in-the-loop, for the most flexible AI
The vision for enhanced extract includes a dynamic, self-improving system. “We want to implement that cycle so that you can get higher and higher confidence,” Kus, Box’s CTO, said. “This involves a human-in-the-loop process where low-confidence extractions are reviewed, and this feedback is used to refine the system.”
Here, the flexibility of Gemini 2.5 Pro, particularly concerning fine-tuning, enables continual improvement. Box is exploring advanced continual learning approaches, including:
In-context learning: Providing corrected examples within the prompt to Gemini 2.5 Pro.
Supervised fine-tuning: Google Cloud’s Vertex AI allows Box to store the fine-tuned weights in the company’s system and then just use them to run their fine-tuned model.
Box AI’s Enhanced Extract Agent would manage these fine-tuned adaptations (for example through small LoRA layers specific to a customer or document template) and provide them to the Gemini 2.5 Pro agent at inference time. “Gemini 2.5 Pro can be used to leverage these adaptations efficiently, using the context caching capability of Gemini models on Vertex AI to tailor its responses for specific, high-value extraction tasks using in-context learning. This allows for ‘true adaptive learning,’ where the system continuously improves based on user feedback and specific document nuances,” Kus said.
The future: Premium document intelligence powered by advanced AI collaboration
The Enhanced Extract Agent, underpinned by Gemini 2.5 Pro’s features such as multimodality, intelligent reasoning, planning and tool-calling, and large context windows, is envisioned as a key differentiator that Box leverages in developing their AI Hub and Agent family. Box views the Enhanced Extract Agent as a fundamental way in which organizations can build more confidence in how they deploy AI in the enterprise.
For the Google team, it’s been exciting to see Box’s production-grade, scalable use of our Gemini models. Their solution provides not only extracted data but also metadata and semantics that enable a high degree of confidence, along with a system that layers Box content and agents on top of Gemini models so the Enhanced Extract Agent can adapt and learn over time.
The ongoing collaboration between Box and Google Cloud focuses on unlocking the full potential of models like Gemini 2.5 Pro for complex enterprise use cases, which are rapidly redefining the future of work and paving the way for the next generation of document intelligence powering the agentic workforce.
The Customer Carbon Footprint Tool (CCFT) and Data Exports now show emissions calculated using the location-based method (LBM), alongside emissions calculated using the market-based method (MBM) which were already present. In addition, you can now see the estimated emissions from CloudFront usage in the service breakdown, alongside EC2 and S3 estimates.
LBM reflects the average emissions intensity of the grids on which energy consumption occurs. Electricity grids in different parts of the world use various sources of power, from carbon-intense fuels like coal, to renewable energy like solar. With LBM, you can view and validate trends in monthly carbon emissions that more directly align to your cloud usage, and get insights into the carbon intensity of the underlying electricity grids in which AWS data centers operate. This empowers you to make more informed decisions about optimizing your cloud usage and achieving your overall sustainability objectives. To learn more about the differences between LBM and MBM, see the GHG Protocol Scope 2 Guidance.
Amazon S3 now supports sort and z-order compaction for Apache Iceberg tables, available both in Amazon S3 Tables and general purpose S3 buckets using AWS Glue Data Catalog optimization. Sort compaction in Iceberg tables minimizes the number of data files scanned by query engines, leading to improved query performance and reduced costs. Z-order compaction provides additional performance benefits through efficient file pruning when querying across multiple columns simultaneously.
S3 Tables provide a fully managed experience where hierarchical sorting is automatically applied on columns during compaction when a sort order is defined in table metadata. When multiple query predicates need to be prioritized equally, you can enable z-order compaction through the S3 Tables maintenance API. If you are using Iceberg tables in general purpose S3 buckets, optimization can be enabled in the AWS Glue Data Catalog console, where you can specify your preferred compaction method.
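For illustration, here is a minimal sketch, assuming boto3’s s3tables client and hypothetical table identifiers, of enabling compaction through the S3 Tables maintenance API; the shape of the compaction strategy setting shown here is an assumption based on this announcement, so verify it against the maintenance documentation.

```python
# A minimal sketch: enable Iceberg compaction on an S3 Tables table via the
# maintenance configuration API. The bucket ARN, namespace, and table name are
# hypothetical, and the "strategy" field is an assumption based on this announcement.
import boto3

s3tables = boto3.client("s3tables", region_name="us-east-1")

s3tables.put_table_maintenance_configuration(
    tableBucketARN="arn:aws:s3tables:us-east-1:111122223333:bucket/example-table-bucket",
    namespace="example_namespace",
    name="example_table",
    type="icebergCompaction",
    value={
        "status": "enabled",
        "settings": {
            "icebergCompaction": {
                "targetFileSizeMB": 512,
                "strategy": "sort",  # assumed values per the announcement: "sort" or "z-order"
            }
        },
    },
)
```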
These additional compaction capabilities are available in all AWS Regions where S3 Tables or optimization with the AWS Glue Data Catalog are available. To learn more, read the AWS News Blog, and visit the S3 Tables maintenance documentation and AWS Glue Data Catalog optimization documentation.
Today, Amazon SageMaker HyperPod announces the general availability of Amazon EC2 P6-B200 instances powered by NVIDIA B200 GPUs. Amazon EC2 P6-B200 instances offer up to 2x performance compared to P5en instances for AI training.
P6-B200 instances feature 8 Blackwell GPUs with 1440 GB of high-bandwidth GPU memory and a 60% increase in GPU memory bandwidth compared to P5en, 5th Generation Intel Xeon processors (Emerald Rapids), and up to 3.2 terabits per second of Elastic Fabric Adapter (EFAv4) networking. P6-B200 instances are powered by the AWS Nitro System, so you can reliably and securely scale AI workloads within Amazon EC2 UltraClusters to tens of thousands of GPUs.
The instances are available through SageMaker HyperPod flexible training plans in the US West (Oregon) AWS Region. For on-demand reservation of B200 instances, please reach out to your account manager.
Amazon SageMaker AI lets you easily train machine learning models at scale using fully managed infrastructure optimized for performance and cost. To get started with SageMaker HyperPod, visit the webpage and documentation.
We’re excited to announce the general availability of UDP ping beacons for Amazon GameLift Servers, a new feature that enables game developers to measure real-time network latency between game clients and game servers hosted on Amazon GameLift Servers. With UDP ping beacons, you can now accurately measure latency for UDP (User Datagram Protocol) packet payloads across all AWS Regions and Local Zones where Amazon GameLift Servers is available.
Most multiplayer games use UDP as their primary packet transmission protocol due to its performance benefits for real-time gaming, and optimizing network latency is crucial for delivering the best possible player experience. UDP ping beacons provide a reliable way to measure actual UDP packet latency between players and game servers, helping you make better decisions about player-to-server matching and game session placement.
The beacon endpoints are available in all AWS Global Regions and Local Zones supported by Amazon GameLift Servers, except AWS China, and can be retrieved through the ListLocations API, making it easy to access the endpoints programmatically.
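As a rough sketch, assuming boto3’s GameLift client, a game backend could enumerate locations and read each location’s ping beacon endpoint from the ListLocations response; the exact response field carrying the beacon endpoint is an assumption here, so check the API reference.

```python
# A minimal sketch: list Amazon GameLift Servers locations and print the UDP ping
# beacon endpoint for each. The "PingBeacon" field name is an assumption based on
# this announcement; consult the ListLocations API reference for the exact schema.
import boto3

gamelift = boto3.client("gamelift", region_name="us-west-2")

next_token = None
while True:
    kwargs = {"Filters": ["AWS"]}
    if next_token:
        kwargs["NextToken"] = next_token
    response = gamelift.list_locations(**kwargs)
    for location in response.get("Locations", []):
        print(location.get("LocationName"), location.get("PingBeacon"))
    next_token = response.get("NextToken")
    if not next_token:
        break
```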
Hospitals, while vital for our well-being, can be sources of stress and uncertainty. What if we could make hospitals safer and more efficient — not only for patients but also for the hard-working staff who care for them? Imagine if technology could provide an additional safeguard, predicting falls, or sensing distress before it’s even visible to the human eye.
Many hospitals today still rely on paper-based processes before transferring critical information to digital systems, leading to frequent, and sometimes remarkably absurd, inefficiencies. In-person patient monitoring, while standard practice, can be slow, incomplete, and subject to human error and bias. In one serious incident, shared by hospital staff, a patient fell shortly after getting out of bed at 5 a.m. and wasn’t discovered until the routine 6:30 a.m. check. Events like this underscore the need for continuous, 24/7 in-room monitoring solutions that can alert staff immediately in high-risk and emergency situations.
Driven by a shared vision to enhance patient care, healthcare innovator Hypros and Google Cloud joined forces to develop an AI-assisted patient monitoring system that detects and alerts staff to in-hospital patient emergencies, such as out-of-bed falls, delirium onset, or pressure ulcers. This innovative privacy-preserving solution enables better prioritization of care and a strong foundation for clinical decision-making — all without the use of invasive cameras.
While the need for 24/7 patient monitoring is clear, developing these solutions raises important concerns around privacy and professional conduct. Privacy is paramount in any patient-monitoring technology for both the individuals receiving care and the professionals providing it. Even seemingly simple aspects, such as interventions within the patient’s immediate surroundings, require strict compliance with hospital hygiene policies — a lesson reinforced during the COVID-19 pandemic.
It’s crucial to monitor and correct any mistakes without singling out individuals. By using tools like low-resolution sensors, we can protect people’s identities and reduce the risk of unfair judgment, keeping the focus squarely on improving care. This approach is especially valuable, since the root cause of errors, more often than not, extends beyond the individual. As a result, ethical deployment of monitoring technology, AI or otherwise, means ensuring that the efficiencies or insights gained never compromise fundamental rights and well-being.
Figure 1: Patient monitoring device from Hypros.
The approach for continuous patient monitoring hinges on two key innovations:
Non-invasive IoT devices: Hypros developed a novel battery-powered Internet of Things (IoT) device that can be mounted on the ceiling. This device uses low-resolution sensors to capture minimal optical and environmental data, creating a very low-detail image of the scene. The device is designed to be non-invasive, preserving anonymity while still gathering the crucial information needed to detect any meaningful changes in a patient’s environment or condition.
Two-stage AI workflow: Hypros uses a two-stage machine learning (ML) workflow. Initially, they trained a camera-based vision model using AutoML on Vertex AI to label sensor data from simulated hospital scenarios. Next, they use this labeled dataset to train a second model to interpret low-resolution sensor data.
The following sections explain how Hypros implemented these innovations into their patient monitoring solution, and how Google Cloud assisted Hypros in this endeavor.
Low resolution, high information: Securing patient privacy
To address the critical need for patient privacy while enabling effective hospital bed monitoring, Hypros developed a compact, mountable IoT device (see Figure 1) equipped with low-resolution optical and environmental sensors. This innovative solution operates on battery power, facilitating easy installation and relocation to various bed locations as needed.
Figure 2: How a scene of a patient in a bed is abstracted into low-resolution sensor data.
While the device’s low-resolution optical sensors are effective for protecting patient privacy, they can also make data interpretation and analysis more complex. Additionally, low sampling rates and environmental factors can introduce noise and sparsity into the data, resulting in an incomplete representation of human behavior in the hospital. The combination of low-resolution imaging, limited sampling rates, and environmental noise creates a complex data landscape that requires sophisticated algorithms and interpretive models to extract meaningful insights.
Figure 3: Real-world data: bed sheets changed by staff, and a patient gets into bed. This is a “simple” scenario.
Despite these challenges, Hypros’ device represents a significant advancement in privacy-preserving patient monitoring, offering the potential to enhance hospital workflow efficiency and patient care without compromising individual privacy.
Patient monitoring with AI: Overcoming low-resolution data challenges
While customized parametric algorithms can partially interpret sensor data, they have difficulty handling complex relationships and edge cases. ML algorithms offer clear advantages, making AI a vital tool for a patient monitoring system.
However, the complexity of their sensor data makes it difficult for AI to independently learn to detect critical patient conditions, and thus unsupervised learning techniques would not yield useful results. In addition, manual data labeling can quickly become expensive, as tight monitoring sends readings every few seconds, producing large volumes of data.
To solve these issues, Hypros adopted an innovative approach that would allow AI to learn how to detect scenarios from their monitoring devices with minimal labeling effort. They found that using pre-trained AI models, which require fewer examples to learn a new image-based task, can simplify labeling image data. However, these models struggled to interpret their low-resolution sensor data directly.
Therefore, they use a two-step process. First, they train a camera-based vision model using camera data to produce a larger, labeled dataset. Then, they transfer these labels to concurrently recorded sensor data, which they use to train a patient monitoring model. This unique approach enables the system to reliably detect events of interest, such as falls or early signs of delirium, without compromising patient privacy.
Driving healthcare innovation with Google Cloud
Hypros relied heavily on Google Cloud to build their patient monitoring system, particularly its data and AI services. The first crucial step was collecting useful data to train their AI models.
They began by replicating a physical hospital room environment within their offices. This controlled setting enabled them to simulate various realistic scenarios, gather data, and record video. During this phase, they also collaborated closely with hospitals to ensure that the characteristics specific to each use case were accurately determined.
Next, they trained a camera-based vision model with AutoML on Vertex AI to label sensor data. This process was remarkably straightforward and efficient. Within approximately two weeks, their initial AutoML camera-based vision model used for labeling achieved an average precision exceeding 91% across all confidence thresholds. Already impressive, the actual performance was likely even higher, as labeling discrepancies artificially lowered the results.
Subsequently, they labeled various video recordings from hospital beds and correlated these labels with their device data for model training. This approach allowed the model to learn how to interpret sensor data sequences by observing and learning from the corresponding video. For training use cases that didn’t incorporate video information, they relied on data or simulation methodologies from their hospital partners.
The speed of development cycles is also a critical competitive advantage. Therefore, they mapped every step in their workflow and model development cycles (see Figure 4) to the following Google Cloud services:
Cloud Storage: Stores all raw data, enabling easy rollbacks and establishing a clear baseline for ongoing improvements.
BigQuery: Stores labeled data for easier querying and analysis. Easy access to the right data helps them iterate, analyze, debug, and refine their models more efficiently.
Artifact Registry: Hosts their custom Docker images for ETL and training pipelines. Fewer downloads, shorter builds, and better software dependency management provide smoother, more optimized operations.
Apache Beam with Dataflow Runner: Processes large volumes of data at high speed, keeping their pipelines fast and maximizing their development time (see the sketch after this list).
Vertex AI: Provides a unified platform for model registration, experiment tracking, and visualizing results in TensorBoard. Training is done with TensorFlow and TFRecords using customized resources (like GPUs), and easy deployment options simplify rolling out new model versions.
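As a rough illustration of how one such pipeline step might look, here is a minimal sketch, assuming Apache Beam on the Dataflow runner, hypothetical bucket paths, and a simplified record schema; it is not Hypros’ actual pipeline.

```python
# A minimal sketch: convert labeled sensor readings (JSON lines in Cloud Storage)
# into TFRecords for model training. Project, region, and paths are hypothetical.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def to_tf_example(record: dict) -> bytes:
    """Packs one labeled sensor reading into a serialized tf.train.Example."""
    import tensorflow as tf  # imported inside so only the workers need TensorFlow

    features = {
        "sensor_frame": tf.train.Feature(
            float_list=tf.train.FloatList(value=record["frame"])
        ),
        "label": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[record["label"].encode("utf-8")])
        ),
    }
    return tf.train.Example(features=tf.train.Features(feature=features)).SerializeToString()


options = PipelineOptions(
    runner="DataflowRunner",
    project="example-project",
    region="europe-west3",
    temp_location="gs://example-bucket/tmp",
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadRawJSON" >> beam.io.ReadFromText("gs://example-bucket/labeled/*.jsonl")
        | "Parse" >> beam.Map(json.loads)
        | "ToTFExample" >> beam.Map(to_tf_example)
        | "WriteTFRecords" >> beam.io.WriteToTFRecord("gs://example-bucket/tfrecords/train")
    )
```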
Figure 4: Simple workflow directed graph to highlight technologies used
With Google Cloud’s ability to handle petabytes of data, they know their workflows are highly scalable. Having a powerful, flexible platform lets them focus on delivering value from data insights, rather than worrying about infrastructure.
Further possibilities: Distilling nuanced information
The development of their system has sparked more ideas about ways hospitals can benefit from using sensor data and AI. They see three main areas of care where continuous patient monitoring can help: patient-centric care for better outcomes, staff-centric support to optimize their time, and environmental monitoring for safer spaces.
Some potential use cases include:
People detection: Anonymously detect individuals to improve operations, such as bed occupancy for patient flow management.
Fall prevention and detection: Alert staff about patient falls or flag restless behavior to prevent them.
Pressure ulcers: Monitor 24/7 movement to aid clinical staff in repositioning patients effectively to prevent the development of pressure ulcers (bedsores).
Delirium risk indicators: Track sleep disruption factors like light and noise, which are potential indicators of delirium risk (final correlation requires additional data from other sources).
General environmental analysis: Monitor temperature, humidity, noise, and other environmental data for smarter building responses in the future (e.g., energy savings through optimized heating) and more effective patient recovery.
Hand hygiene compliance: Anonymously track hand disinfection compliance to improve hygiene practices, in combination with solutions like Hypros’ Hand Hygiene Monitoring solution, NosoEx.
Instead of stockpiling sensor data, their system uses advanced AI models to interpret and connect data from multiple streams — turning simple raw readings into practical insights that guide better decisions. Real-time alerts also bring timely attention to critical situations — ensuring patients receive the swift and focused care they deserve, and staff can perform at their very best.
The path forward with patient care
Already, Hypros’ patient monitoring system is gaining momentum, with real-world trials at leading institutions like UKSH (University Hospital Schleswig-Holstein) in Germany. As highlighted by their recent press release, the UKSH recognizes the potential of their solutions to transform patient care and improve operational efficiency. In addition, their clinical partner, the University Medical Center Greifswald, has experienced benefits firsthand as an early adopter.
Dr. Robert Fleishmann, a managing senior physician and deputy medical director at the University Medical Center Greifswald, is convinced of its usefulness, saying:
“The prevention of delirium is crucial for patient safety. The Hypros patient monitoring solution provides us with vital data to examine risk factors (e.g., light intensity, noise levels, patient movements) contributing to the development of delirium on a 24/7 basis. We are very excited about this innovative partnership.”
This positive feedback, alongside the voices of other customers, fuels Hypros’ ongoing commitment to revolutionize patient care through ethical and data-driven technology.
By harnessing the power of AI and cloud computing, in close collaboration with Google Cloud, Hypros is dedicated to developing privacy-preserving patient monitoring solutions that directly address critical healthcare challenges such as staffing shortages and the ever-increasing need for enhanced patient safety.
Building on this foundation, Hypros envisions a future where their AI-powered patient monitoring solutions are seamlessly integrated into healthcare systems worldwide. The goal is to empower clinicians with real-time, actionable insights, ultimately improving patient outcomes, optimizing resource allocation, and fostering a more sustainable and patient-centric healthcare ecosystem for all.
Recently, we announced Gemini 2.5 is generally available on Vertex AI. As part of this update, tuning capabilities have extended beyond text outputs – now, you can tune image, audio, and video outputs on Vertex AI.
Supervised fine-tuning is a powerful technique to customize LLM output using your own data. Through tuning, LLMs become specialized in your business context and task by learning from the tuning examples, therefore achieving higher quality output. With video outputs, here are some use cases our customers have unlocked:
Automated video summarization: Tuning LLMs to generate concise and coherent summaries of long videos, capturing the main themes, events, and narratives. This is useful for content discovery, archiving, and quick reviews.
Detailed event recognition and localization: Fine-tuning allows LLMs to identify and pinpoint specific actions, events, or objects within a video timeline with greater accuracy. For example, identifying all instances of a particular product in a marketing video or a specific action in sports footage.
Content moderation: Specialized tuning can improve an LLM’s ability to detect sensitive, inappropriate, or policy-violating content within videos, going beyond simple object detection to understand context and nuance.
Video captioning and subtitling: While already a common application, tuning can improve the accuracy, fluency, and context-awareness of automatically generated captions and subtitles, including descriptions of nonverbal cues.
Today, we will share actionable best practices for conducting truly effective tuning experiments using the Vertex AI tuning service. In this blog, we will cover the following steps:
Craft your prompt
Detect multiple labels
Conduct single-label video task analysis
Prepare video tuning dataset
Set the hyperparameters for tuning
Evaluate the tuned checkpoint on the video tasks
I. Craft your prompt
Designing the right prompt is a cornerstone of any effective tuning, directly influencing model behavior and output quality. An effective prompt for video tuning typically comprises several key components, ensuring clarity in the prompt.
Task context: This component sets the overall context and defines the intention of the task. It should clearly articulate the primary objective of the video analysis. For example….
Task definition: This component provides specific, detailed guidance on how the model should perform the task including label definitions for tasks such as classification or temporal localization. For example, in video classification, clearly define positive and negative matches within your prompt to ensure accurate model guidance.
Output specification: This component specifies how the model is expected to produce its output. This includes specific rules or a schema for structured formats such as JSON. To maximize clarity, embed a sample JSON object directly in your prompt, specifying its expected structure, schema, data types, and any formatting conventions.
II: Detect multiple labels
Multi-label video analysis involves detecting multiple labels corresponding to a single video. This is a desirable setup for video tasks since the user can train a single model for several labels and obtain predictions for all the labels via a single query request to the tuned model at inference time. These tasks are usually quite challenging for off-the-shelf models and often need tuning.
See an example prompt below.
```
Focus: you are a machine learning data labeller with sports expertise.

### Task definition ###
Given a video and an entity definition, your task is to find out the video segments that match the definition for any of the entities listed below and provide the detailed reason on why you believe it is a good match. Please do not hallucinate. There are generally only few or even no positive matches in most cases. You can just output nothing if there are no positive matches.

Entity Name: "entity1"
Definition: "define entity 1"
Labeling instruction: provide instruction for entity1

Entity Name: "entity 2"
Definition: "define entity 2"
Labeling instruction: provide instruction for entity 2

..
..

### Output specification ###
You should provide the output in a strictly valid JSON format same as the following example.
[{
"cls": {the entity name},
"start_time": "Start time of the video segment in mm:ss format.",
"end_time": "End time of the video segment in mm:ss format.",
},
{
"cls": {the entity name},
"start_time": "Start time of the video segment in mm:ss format.",
"end_time": "End time of the video segment in mm:ss format.",
}]
Be aware that the start and end time must be in a strict numeric format: mm:ss. Do not output anything after the JSON content.

Your answer (as a JSON LIST):
```
Challenges and mitigations for multi-label video tasks:
The tuned model tends to learn dominant labels (i.e., labels that appear more frequently in the dataset).
Mitigation: We recommend balancing the target label distribution as much as possible.
When working with video data, skewed label distributions are further complicated by the temporal aspect. For instance, in action localization, a video segment might not contain “event X” but instead feature “event Y” or simply be background footage.
Mitigation: For such use cases, we recommend using multi-class single-label design described below.
Mitigation: Improving the positive:negative instance ratio per label would further improve the tuned model’s performance.
The tuned model tends to hallucinate if the video task involves a large number of labels per instance (typically >10 labels per video input).
Mitigation: For effective tuning, we recommend using the multi-label formulation for video tasks that involve fewer than 10 labels per video.
For video tasks that require temporal understanding in dynamic scenes (e.g. event detection, action localization), the tuned model may not be effective for multiple temporal labels that are overlapping or are very close.
III: Conduct single-label video task analysis
Multi-class single-label analysis involves video tasks where a single video is assigned exactly one label from a predefined set of mutually exclusive labels. In contrast to multi-label tuning, multi-class single-label tuning recipes show good scalability with an increasing number of distinct labels. This makes the multi-class single-label formulation a viable and robust option for complex tasks. For example, tasks that involve categorizing videos into one of many possible exclusive categories or detecting several overlapping temporal events in the video.
In such a case, the prompt must explicitly state that only one label from a defined set is applicable to the video input. List all possible labels within the prompt to provide the model with the complete set of options. It is also important to clarify how a model should handle negative instances, i.e., when none of the labels occur in the video.
See an example prompt below:
```
You are a video analysis expert.

### Task definition ###
Detect which animal appears in the video. The video can only have one of the following animals: dog, cat, rabbit. If you detect none of these animals, output NO_MATCH.

### Output specification ###
Generate output in the following JSON format:
[{
"animal_name": "<CATEGORY>",
}]
```
Challenges and mitigations for multi-class single-label video tasks
Using highly skewed data distributions may cause quality regression on the tuned model. The model may simply learn to predict the majority class, failing to identify the rare positive instances.
Mitigation: Undersampling the negative instances or oversampling the positive instances to balance the distributions are effective strategies for tuning recipes. The undersampling/oversampling rate depends on the specific use case at hand (see the balancing sketch after this list).
Some video use cases can be formulated as both multi-class single-label tasks and multi-label tasks. For example, detecting time intervals for several events in a video.
For fewer event types with non-overlapping time intervals (typically fewer than 10 labels per video), multi-label formulation is a good option.
On the other hand, for several similar event types with dense time intervals, multi-class single-label recipes yield better model performance. Model inference then involves sending a separate query for each class (e.g., “Is event A present?”, then “Is event B present?”). This approach effectively treats the multi-class problem as a series of N binary decisions, which means that for N classes you will need to send N inference requests to the tuned model.
This is a tradeoff between higher inference latency and cost versus target performance. The choice should be made based on the expected target performance from the model for the use case.
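As a rough illustration of the undersampling mitigation mentioned above, here is a minimal sketch; it assumes a simple intermediate file in which each example carries a single label string (not the final tuning format) and a ratio that would need to be adjusted per use case.

```python
# A minimal sketch: undersample the dominant negative class ("NO_MATCH") so that
# positives are not drowned out. File names, the label field, and the 3:1 ratio
# are illustrative assumptions.
import json
import random

random.seed(0)

with open("all_examples.jsonl") as f:
    examples = [json.loads(line) for line in f]

positives = [e for e in examples if e["label"] != "NO_MATCH"]
negatives = [e for e in examples if e["label"] == "NO_MATCH"]

# Keep roughly three negatives per positive; the right ratio depends on the use case.
kept_negatives = random.sample(negatives, min(len(negatives), 3 * len(positives)))

balanced = positives + kept_negatives
random.shuffle(balanced)

with open("train_balanced.jsonl", "w") as f:
    for example in balanced:
        f.write(json.dumps(example) + "\n")
```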
IV. Prepare video tuning dataset
The Vertex AI tuning API uses *.jsonl files for both training and validation datasets. Validation data is used to select a checkpoint from the tuning process. Ideally, there should be no overlap in the JSON objects contained within train.jsonl and validation.jsonl. Learn more about how to prepare a tuning dataset, and its limitations, in the Vertex AI tuning documentation.
For maximum efficiency when tuning Gemini 2.0 (and newer) models on video, we recommend using the MEDIA_RESOLUTION_LOW setting, located within the generationConfig object for each video in your input file. It dictates the number of tokens used to represent each frame, directly impacting training speed and cost.
You have two options:
MEDIA_RESOLUTION_LOW (default): Encodes each frame using 64 tokens.
MEDIA_RESOLUTION_MEDIUM: Encodes each frame using 256 tokens.
While MEDIA_RESOLUTION_MEDIUM may offer slightly better performance on tasks that rely on subtle visual cues, it comes with a significant trade-off: training is approximately four times slower. Given that the lower-resolution setting provides comparable performance for most applications, sticking with the default MEDIA_RESOLUTION_LOW is the most effective strategy for balancing performance with crucial gains in training speed.
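To make this concrete, here is a minimal sketch, assuming hypothetical Cloud Storage URIs and labels, of writing one training example to train.jsonl with the per-example generationConfig carrying the media resolution setting; the exact field names and casing should be checked against the tuning dataset documentation.

```python
# A minimal sketch of one line in train.jsonl; the video URI, prompt, and label are
# hypothetical, and the field names are assumptions to be verified against the docs.
import json

example = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "fileData": {
                        "mimeType": "video/mp4",
                        "fileUri": "gs://example-bucket/videos/clip_001.mp4",
                    }
                },
                {"text": "Detect which animal appears in the video. Generate output in JSON."},
            ],
        },
        {
            "role": "model",
            "parts": [{"text": "[{\"animal_name\": \"dog\"}]"}],
        },
    ],
    # Per-example media resolution, as described above.
    "generationConfig": {"mediaResolution": "MEDIA_RESOLUTION_LOW"},
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```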
V. Set the hyperparameters for tuning
After preparing your tuning dataset, you are ready to submit your first video tuning job! The service supports three hyperparameters:
epochs: specifies the number of iterations over the entire training dataset. With a dataset size of ~500 examples, epochs = 5 is a good default starting value for video tuning tasks. Increase the number of epochs when you have <500 samples and decrease it when you have >500 samples.
learning_rate_multiplier: specifies a multiplier for the learning rate. We recommend experimenting with values less than 1 if the model is overfitting and values greater than 1 if the model is underfitting.
adapter_size: specifies the rank of the LoRA adapter. The default value is adapter_size = 8 for flash model tuning. For most use cases, you won’t need to adjust this, but a higher size allows the model to learn more complex tasks.
To streamline your tuning process, Vertex AI provides intelligent, automatic hyperparameter defaults. These values are carefully selected based on the specific characteristics of your dataset, including its size, modality, and context length. For the most direct path to a quality model, we recommend starting your experiments with these pre-configured values. Advanced users looking to further optimize performance can then treat these defaults as a strong baseline, systematically adjusting them based on the evaluation metrics from their completed tuning jobs.
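As a starting point, here is a minimal sketch, assuming the vertexai Python SDK’s supervised tuning entry point, a hypothetical project, and hypothetical dataset URIs, of submitting a tuning job with the three hyperparameters set explicitly.

```python
# A minimal sketch: submit a supervised tuning job with explicit hyperparameters.
# Project, bucket paths, and the source model ID are illustrative assumptions.
import vertexai
from vertexai.tuning import sft

vertexai.init(project="example-project", location="us-central1")

tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",  # assumption: any tunable Gemini model ID
    train_dataset="gs://example-bucket/train.jsonl",
    validation_dataset="gs://example-bucket/validation.jsonl",
    tuned_model_display_name="video-animal-classifier",
    epochs=5,                      # ~500 examples: start at 5, adjust with dataset size
    learning_rate_multiplier=1.0,  # <1 if overfitting, >1 if underfitting
    adapter_size=8,                # LoRA rank; the default for flash model tuning
)
print(tuning_job.resource_name)
```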
VI. Evaluate the tuned checkpoint on the video tasks
The Vertex AI tuning service provides loss and accuracy graphs for the training and validation datasets out of the box. The monitoring graphs are updated in real time as your tuning job progresses. Intermediate checkpoints are automatically deployed for you. We recommend selecting the checkpoint from the epoch at which the loss on the validation dataset has saturated.
To evaluate the tuned model endpoint, see the sample code snippet below.
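Below is a minimal sketch, assuming the google-genai SDK and a hypothetical tuned model endpoint; it also reflects the guidance that follows on matching the media resolution used during training and turning off thinking for thinking models.

```python
# A minimal sketch: query a tuned model endpoint on Vertex AI with a held-out video.
# The project, endpoint ID, video URI, and prompt are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="example-project", location="us-central1")

response = client.models.generate_content(
    # Assumption: the tuned model's endpoint resource name from the tuning job.
    model="projects/example-project/locations/us-central1/endpoints/1234567890",
    contents=[
        types.Part.from_uri(
            file_uri="gs://example-bucket/videos/holdout_clip.mp4",
            mime_type="video/mp4",
        ),
        "Detect which animal appears in the video. Generate output in JSON.",
    ],
    config=types.GenerateContentConfig(
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,  # match training
        thinking_config=types.ThinkingConfig(thinking_budget=0),      # off for tuned tasks
        temperature=0.0,
    ),
)
print(response.text)
```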
For best performance, it is critical that the format, context and distribution of the inference prompts align with the tuning dataset. Also, we recommend using the same mediaResolution for evaluation as the one used during training.
For thinking models like Gemini 2.5 Flash, we recommend setting the thinking budget to 0 to turn off thinking on tuned tasks for optimal performance and cost efficiency. During supervised fine-tuning, the model learns to mimic the ground truth in the tuning dataset, omitting the thinking process.
Get started on Vertex AI today
The ability to derive deep, contextual understanding from video is no longer a futuristic concept—it’s a present-day reality. By applying the best practices we’ve discussed for prompt engineering, tuning dataset design, and leveraging the intelligent defaults in Vertex AI, you are now equipped to effectively tune Gemini models for your specific video-based tasks.
What challenges will you solve? What novel user experiences will you create? The tools are ready and waiting. We can’t wait to see what you build.