Today, we’re thrilled to announce another significant milestone for our Google Public Sector business: Google Distributed Cloud (GDC) and the GDC air-gapped appliance have achieved Department of Defense (DoD) Impact Level 6 (IL6) authorization. Google Public Sector can now provide DoD customers with a secure, compliant, and cutting-edge cloud environment at IL6, enabling them to leverage the full power of GDC for their most sensitive Secret classified data and applications. This accreditation builds on our existing IL5 and Top Secret accreditations, and solidifies Google Cloud’s ability to deliver secure solutions for digital sovereignty and critical national security and defense missions for the U.S. government.
Secure, distributed cloud for critical missions
This authorization comes at a crucial time, as the digital landscape is becoming increasingly complex, and the need for robust security measures is growing more urgent. Google’s collaboration with the U.S. Navy under the JWCC contract exemplifies its commitment to providing advanced infrastructure and cloud services for a resilient hybrid-cloud environment. Google Distributed Cloud provides a fully-managed solution designed specifically to uphold stringent security requirements, allowing U.S. intelligence and DoD agencies to host, control, and manage their infrastructure and services.
GDC can operate within Google’s trusted, secure, and managed data centers, or in forward deployed locations to provide the DoD and Intelligence Community with a comprehensive suite of secure cloud solutions. This platform unlocks the power of advanced cloud capabilities like data analytics, machine learning (ML), and artificial intelligence (AI). The isolated platform, physically located and managed by Google, ensures customers can trust the foundation of their sensitive workloads.
Google has dramatically accelerated AI services to support the DoD. Vertex AI and Google’s state-of-the-art Gemini models are available now at IL6 and Top Secret (TS), supporting missions at the highest classification levels.
Next-gen cloud and AI capabilities at the tactical edge
In harsh, disconnected, or mobile environments, organizations face significant challenges in providing computing capabilities. The Google Distributed Cloud air-gapped appliance brings Google Cloud and AI capabilities to tactical edge environments. These capabilities unlock real-time local data processing for use cases such as cyber analysis, predictive maintenance, tactical communications kits, sensor kits, or field translation. The appliance includes Vertex AI and Pre-Trained Model APIs (Speech to Text, Translate, and OCR).
The appliance can be conveniently transported in a rugged case or mounted in a rack within customer-specific local operating environments and remain disconnected indefinitely based on mission need.
Enabling efficiency through digital transformation
Customers throughout the federal government today are using Google Cloud to help achieve their missions. For example, the Defense Innovation Unit (DIU) is using Google Cloud technology to develop AI models to assist augmented reality microscope (ARM) detection of certain types of cancer; the U.S. Air Force is using Vertex AI to overhaul their manual processes; and the U.S. Air Force Rapid Sustainment Office (RSO) is using Google Cloud technology for aircraft maintenance.
Learn more about how Google Cloud solutions can empower your agency and accelerate mission impact, and stay up to date with our latest innovations by signing up for the Google Public Sector newsletter.
Everyone’s talking about AI agents, but the real magic happens when they collaborate to tackle complex tasks. Think: complex processes, data analysis, content creation, and customer support. In this hackathon, you’ll build autonomous multi-agent AI systems using Google Cloud and the open source Agent Development Kit (ADK).
This is your chance to dive deep into cutting-edge AI, showcase your skills, and contribute to the future of agent development.
Hands-on learning with the ADK: Try out and contribute to the open-source Agent Development Kit (ADK). We’ll provide you with the resources, support, and expert guidance you need to build sophisticated multi-agent systems.
Real-world impact: Tackle real-world problems that directly impact how work gets done, from automating complex processes and deriving data insights to transforming customer service and content creation.
A showcase for your talent: Present your project to a panel of judges and demonstrate your expertise to a wide audience. Build working agents that can streamline your workflows and serve as the foundation for a future product.
And the rewards? We’re offering a range of exciting prizes:
Overall grand prize: $15,000 in USD, $3,000 in Google Cloud Credits for use with a Cloud Billing Account, 1 year of Google Developer Program Premium subscription at no cost, virtual coffee with a Google team member, and social promo
Regional winners: $8,000 in USD, $1,000 in Google Cloud Credits for use with a Cloud Billing Account, virtual coffee with a Google team member, and social promo
Honorable mentions: $1,000 in USD and $500 in Google Cloud Credits for use with a Cloud Billing Account
Unleash the power of the Agent Development Kit (ADK):
ADK is a flexible and modular framework designed for developing and deploying AI agents. It’s an open-source framework that offers tight integration with the Google ecosystem and Gemini models. ADK makes it easy to get started with simple agents powered by Gemini models and Google AI tools, while also providing the control needed for more complex agent architectures and orchestration.
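To make that concrete, here is a minimal single-agent sketch using ADK’s Python API (pip install google-adk); the model name and the tool function are illustrative assumptions, not part of the hackathon materials:

```python
# A minimal ADK agent sketch; the model name and tool are illustrative.
from google.adk.agents import Agent

def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: look up an order's status (stubbed for this sketch)."""
    return {"order_id": order_id, "status": "shipped"}

root_agent = Agent(
    name="support_agent",
    model="gemini-2.0-flash",  # any Gemini model available to your project
    description="Answers customer order questions.",
    instruction="Use the get_order_status tool to answer order questions.",
    tools=[get_order_status],
)
```

From here, ADK’s orchestration primitives let you compose agents like this one into larger multi-agent systems.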
What to build
Your project should demonstrate how to design and orchestrate interactions between multiple autonomous agents using ADK. Build in one of these categories:
Automation of complex processes: Design multi-agent workflows to automate complex, multi-step business processes, software development lifecycle, or manage intricate tasks.
Data analysis and insights: Create multi-agent systems that autonomously analyze data from various sources, derive meaningful insights using tools like BigQuery, and collaboratively present findings.
Customer service and engagement: Develop intelligent virtual assistants or support agents built with ADK as multi-agent systems to handle complex customer inquiries, provide personalized support, and proactively engage with customers.
Content creation and generation: Build multi-agent systems that can autonomously generate different forms of content, such as marketing materials, reports, or code, by orchestrating agents with specialized content generation capabilities.
Crucial note: Your project must be built using the Agent Development Kit (ADK), focusing on the design and interactions between multiple agents. Think ADK first, but feel free to supercharge your solution by integrating with other awesome Google Cloud technologies!
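As a sketch of what a multi-agent entry might look like, the following hedged example wires two illustrative specialist agents into a pipeline with ADK’s SequentialAgent; all names and instructions are placeholders:

```python
# A hedged multi-agent orchestration sketch using ADK's SequentialAgent.
from google.adk.agents import Agent, SequentialAgent

researcher = Agent(
    name="researcher",
    model="gemini-2.0-flash",
    instruction="Gather the key facts for the user's topic.",
)
writer = Agent(
    name="writer",
    model="gemini-2.0-flash",
    instruction="Turn the researcher's notes into a short report.",
)

# The pipeline is itself an agent, so it can be nested in larger systems
# or combined with ParallelAgent and LoopAgent for richer workflows.
root_agent = SequentialAgent(name="report_pipeline", sub_agents=[researcher, writer])
```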
Ready to start building?
Head over to our hackathon website and watch our webinar to learn more, review the rules, and register.
Google Cloud’s Vertex AI platform makes it easy to experiment with and customize over 200 advanced foundation models – like the latest Google Gemini models, and third-party partner models such as Meta’s Llama and Anthropic’s Claude. And now, thanks to a major refresh focused on developer feedback, it’s even more efficient and intuitive.
The redesigned, developer-first experience will be your source for generative AI media models across all modalities. You’ll have access to Google’s powerful generative AI media models, such as Veo, Imagen, Chirp, and Lyria, in the Vertex AI Media Studio. These aren’t just cosmetic changes; they translate directly into five workflow benefits, from accelerated prototyping to experimentation:
Stay cutting-edge: Get hands-on experience with Google’s latest AI models and features as soon as they’re available.
Easier to start with AI in Cloud: The new design makes it easier for developers of all experience levels to start building with generative AI.
Accelerated prototyping: Quickly test ideas, iterate on prompts, and prototype applications faster than before.
Integrated end-to-end workflow: Move easily from ideation and prompting to grounding, tuning, code generation, and even test deployment, all within a single, cohesive environment, in just a couple of clicks. Less tool-switching, more building!
Efficient experimentation: Vertex AI Studio provides a place to explore different models, parameters, and prompting techniques.
Dive in to see the key improvements.
What’s new and how it works for you
We heard you wanted features to explore, iterate and boost your productivity. That’s why we’re making things easier and more powerful in three ways: faster prompting, easier ways to build, and a fresh interface.
Enhanced prompting capabilities:
Faster prompting: Our revamped overview provides quick access to samples and tools, complemented by a unified UI that combines Chat and Freeform prompting for a smoother workflow.
Prompt management & enhancement: Simplify your prompt engineering by easily managing the lifecycle (create, refine, compare, save, track history) while simultaneously improving prompt quality and capabilities through techniques like variables, function calling, and adding examples.
Integrated prompt engineering: Access to tuning, evaluation, and batch prediction, all designed to optimize model performance.
Prompt with gen AI models in Vertex AI Studio
Better ways to build
Build with Gemini: Access and experiment with the latest Gemini models, such as Gemini 2.5, directly within the Studio to test:
Text generation
Image creation
Audio generation
Multimodal capabilities
Live API
Build trust with grounded AI: Easily connect models to real-world, up-to-date information or your specific private data. Grounding with Google Search or Google Maps is simpler than ever. Need custom knowledge? Integrate effortlessly with your data via Vertex AI RAG Engine or Vertex AI Search. This dramatically improves the reliability and factual accuracy of model outputs, letting you build applications your users can trust (see the sketch after this list).
Code generation & app deployment: Get sample code (Python, Android, Swift, Web, Flutter, cURL), including direct integration to open Python in Colab Enterprise. You can also deploy the prompt as a test web application for quick proof-of-concept validation.
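To show what grounding looks like outside the console, here is a minimal sketch using the google-genai Python SDK against Vertex AI; the project, location, model name, and prompt are placeholder assumptions:

```python
# Minimal grounding sketch with the google-genai SDK (pip install google-genai).
from google import genai
from google.genai.types import GenerateContentConfig, GoogleSearch, Tool

# Placeholders: point these at your own project and region.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What changed in the latest Gemini release?",
    # Grounding with Google Search anchors the answer in fresh web results.
    config=GenerateContentConfig(tools=[Tool(google_search=GoogleSearch())]),
)
print(response.text)
```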
Fresher interface
Dark mode is here: Recognizing that many developers prefer darker interfaces for extended sessions, you can now experience dark mode across the entire Vertex AI platform for improved visual comfort and focus. Activate it easily in your Cloud profile user preferences.
Get started with Vertex AI today
We’re committed to continually refining Vertex AI Studio based on your feedback, which you can share right in the console, ensuring you have the tools you need for building the next generation of AI applications.
Since November 2024, Mandiant Threat Defense has been investigating a UNC6032 campaign that weaponizes the interest around AI tools, in particular those tools which can be used to generate videos based on user prompts. UNC6032 utilizes fake “AI video generator” websites to distribute malware leading to the deployment of payloads such as Python-based infostealers and several backdoors. Victims are typically directed to these fake websites via malicious social media ads that masquerade as legitimate AI video generator tools like Luma AI, Canva Dream Lab, and Kling AI, among others. Mandiant Threat Defense has identified thousands of UNC6032-linked ads that have collectively reached millions of users across various social media platforms like Facebook and LinkedIn. We suspect similar campaigns are active on other platforms as well, as cybercriminals consistently evolve tactics to evade detection and target multiple platforms to increase their chances of success.
Mandiant Threat Defense has observed UNC6032 compromises culminating in the exfiltration of login credentials, cookies, credit card data, and Facebook information through the Telegram API. This campaign has been active since at least mid-2024 and has impacted victims across different geographies and industries. Google Threat Intelligence Group (GTIG) assesses UNC6032 to have a Vietnam nexus.
Mandiant Threat Defense acknowledges Meta’s collaborative and proactive threat hunting efforts in removing the identified malicious ads, domains, and accounts. Notably, a significant portion of Meta’s detection and removal began in 2024, prior to Mandiant alerting them of additional malicious activity we identified.
Threat actors haven’t wasted a moment capitalizing on the global fascination with Artificial Intelligence. As AI’s popularity surged over the past couple of years, cybercriminals quickly moved to exploit the widespread excitement. Their actions have fueled a massive and rapidly expanding campaign centered on fraudulent websites masquerading as cutting-edge AI tools. These websites have been promoted by a large network of misleading social media ads, similar to the ones shown in Figure 1 and Figure 2.
Figure 1: Malicious Facebook ads
Figure 2: Malicious LinkedIn ads
As part of Meta’s implementation of the Digital Services Act, the Ad Library displays additional information (ad campaign dates, targeting parameters and ad reach) on all ads that target people from the European Union. LinkedIn has also implemented a similar transparency tool.
Our research through both Ad Library tools identified over 30 different websites, mentioned across thousands of ads, active since mid-2024, all displaying similar ad content. The majority of the ads we found ran on Facebook, with only a handful also advertised on LinkedIn. The ads were published using both attacker-created Facebook pages and compromised Facebook accounts. Mandiant Threat Defense performed further analysis of a sample of over 120 malicious ads and, from the EU transparency section of the ads, found their total reach for EU countries was over 2.3 million users. Table 1 displays the top 5 Facebook ads by reach. It should be noted that reach does not equate to the number of victims. According to Meta, the reach of an ad is an estimated number of how many Account Center accounts saw the ad at least once.
| Ad Library ID | Ad Start Date | Ad End Date | EU Reach |
| --- | --- | --- | --- |
| 1589369811674269 | 14.12.2024 | 18.12.2024 | 300,943 |
| 559230916910380 | 04.12.2024 | 09.12.2024 | 298,323 |
| 926639029419602 | 07.12.2024 | 09.12.2024 | 270,669 |
| 1097376935221216 | 11.12.2024 | 12.12.2024 | 124,103 |
| 578238414853201 | 07.12.2024 | 10.12.2024 | 111,416 |

Table 1: Top 5 Facebook ads by reach
The threat actor constantly rotates the domains mentioned in the Facebook ads, likely to avoid detection and account bans. We noted that once a domain is registered, it will be referenced in ads within a few days if not the same day. Moreover, most of the ads are short lived, with new ones being created on a daily basis.
On LinkedIn, we identified roughly 10 malicious ads, each directing users to hxxps://klingxai[.]com. This domain was registered on September 19, 2024, and the first ad appeared just a day later. These ads have a total impression estimate of 50k-250k. For each ad, the United States was the region with the highest percentage of impressions, although the targeting included other regions such as Europe and Australia.
| Ad Library ID | Ad Start Date | Ad End Date | Total Impressions | % Impressions in the US |
| --- | --- | --- | --- | --- |
| 490401954 | 20.09.2024 | 20.09.2024 | <1k | 22 |
| 508076723 | 27.09.2024 | 28.09.2024 | 10k-50k | 68 |
| 511603353 | 30.09.2024 | 01.10.2024 | 10k-50k | 61 |
| 511613043 | 30.09.2024 | 01.10.2024 | 10k-50k | 40 |
| 511613633 | 30.09.2024 | 01.10.2024 | 10k-50k | 54 |
| 511622353 | 30.09.2024 | 01.10.2024 | 10k-50k | 36 |

Table 2: LinkedIn ads
The websites investigated by Mandiant Threat Defense have similar interfaces and offer purported functionalities such as text-to-video or image-to-video generation. Once the user provides a prompt to generate a video, regardless of the input, the website will serve one of the static payloads hosted on the same (or related) infrastructure.
The downloaded payload is the STARKVEIL malware. It drops three different modular malware families, primarily designed for information theft and capable of downloading plugins to extend their functionality. The presence of multiple, similar payloads suggests a fail-safe mechanism, allowing the attack to persist even if some payloads are detected or blocked by security defenses.
In the next section, we will delve deeper into one particular compromise Mandiant Threat Defense responded to.
Luma AI Investigation
Infection Chain
Figure 3: Infection chain lifecycle
This blog post provides a detailed analysis of our findings on the key components of this campaign:
Lure: The threat actors leverage social networks to push AI-themed ads that direct users to fake AI websites, resulting in malware downloads.
Malware: The payload contains several malware components, including the STARKVEIL dropper, which deploys the XWORM and FROSTRIFT backdoors and the GRIMPULL downloader.
Execution: The malware makes extensive use of DLL side-loading, in-memory droppers, and process injection to execute its payloads.
Persistence: It uses an AutoRun registry key for its two backdoors (XWORM and FROSTRIFT).
Anti-VM and Anti-analysis: GRIMPULL checks for commonly used artifacts from known sandbox and analysis tools.
Reconnaissance
Host reconnaissance: XWORM and FROSTRIFT survey the host by collecting information, including OS, username, role, hardware identifiers, and installed AV.
Software reconnaissance: FROSTRIFT checks the existence of certain messaging applications and browsers.
Command-and-control (C2)
Tor: GRIMPULL utilizes a Tor Tunnel to fetch additional .NET payloads.
Telegram: XWORM sends victim notifications via Telegram, including information gathered during host reconnaissance.
TCP: The malware connects to its C2 using ports 7789, 25699, and 56001.
Information stealer
Keylogger: XWORM logs keystrokes from the host.
Browser extensions: FROSTRIFT scans for 48 browser extensions related to Password managers, Authenticators, and Digital wallets potentially for data theft.
Backdoor Commands: XWORM supports multiple commands for further compromise.
The Lure
This particular case began with a Facebook ad for “Luma Dream AI Machine,” masquerading as a well-known text-to-video AI tool, Luma AI. The ad, as seen in Figure 4, redirected the user to an attacker-created website hosted at hxxps://lumalabsai[.]in/.
Figure 4: The ad the victim clicked on
Once on the fake Luma AI website, the user can click the “Start Free Now” button and choose from various video generation functionalities. Regardless of the selected option, the same prompt is displayed, as shown in the GIF in Figure 5.
This multi-step process, made to resemble any other legitimate text-to-video or image-to-video generation tool website, creates a sense of familiarity to the user and does not give any immediate indication of malicious intent. Once the user hits the generate button, a loading bar appears, mimicking an AI model hard at work. After a few seconds, when the new video is supposedly ready, a Download button is displayed. This leads to the download of a ZIP archive file on the victim host.
Figure 5: Fake AI video generation website
Unsurprisingly, the ready-to-download archive is one of many payloads already hosted on the same server, with no connection to the user input. In this case, several archives were hosted at the path hxxps://lumalabsai[.]in/complete/. Mandiant determined that the website will serve the archive file with the most recent “Last Modified” value, indicating continuous updates by the threat actor. Mandiant compared some of these payloads and found them to be functionally similar, with different obfuscation techniques applied, thus resulting in different sizes.
Figure 6: Payloads hosted at hxxps://lumalabsai[.]in/complete
Execution
The previously downloaded ZIP archive contains an executable with a double extension (.mp4 and .exe) in its name, separated by thirteen Braille Pattern Blank (Unicode: U+2800, UTF-8: E2 A0 80) characters, a special whitespace character from the Braille Patterns block in Unicode.
Figure 7: Braille Pattern Blank characters in the file name
The resulting file name, Lumalabs_1926326251082123689-626.mp4⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀.exe, aims to make the binary less suspicious by pushing the .exe extension out of the user’s view. The number of Braille Pattern Blank characters used varies across different samples served, ranging from 13 to more than 30. To further hide the true purpose of this binary, the default .mp4 Windows icon is used on the malicious file.
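As a simple defensive illustration, the padding trick described above is easy to flag programmatically; the following Python sketch (the function name and heuristic are our own, not from the campaign tooling) checks a filename for U+2800 padding in front of a trailing .exe:

```python
# Hedged defensive sketch: flag names that hide a real .exe extension behind
# runs of Braille Pattern Blank (U+2800) characters, as described above.
BRAILLE_BLANK = "\u2800"

def is_suspicious_name(filename: str) -> bool:
    # Benign filenames should not contain U+2800 at all; combined with a
    # trailing .exe after a fake media extension, it is a strong signal.
    return BRAILLE_BLANK in filename and filename.lower().endswith(".exe")

name = "Lumalabs_1926326251082123689-626.mp4" + BRAILLE_BLANK * 13 + ".exe"
print(is_suspicious_name(name))  # True
```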
Figure 8 shows how the file looks on Windows 11, compared to a legitimate .mp4 file.
Figure 8: Malicious binary vs legitimate .mp4 file
STARKVEIL
The binary Lumalabs_1926326251082123689-626.mp4⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀.exe, tracked by Mandiant as STARKVEIL, is a dropper written in Rust. Once executed, it extracts an embedded archive containing benign executables and its malware components. These are later utilized to inject malicious code into several legitimate processes.
Executing the malware displays an error window, as seen in Figure 9, to trick the user into believing that the file is corrupted and into executing it again.
Figure 9: Error window displayed when executing STARKVEIL
For a successful compromise, the executable needs to run twice; the initial execution results in the extraction of all the embedded files under the C:\winsystem directory.
Figure 10: Files in the winsystem directory
During the second execution, the main executable spawns the Python launcher, py.exe, with an obfuscated Python command as an argument. The Python command decodes embedded Python code, which Mandiant tracks as the COILHATCH dropper. COILHATCH performs the following actions (note that the script has been deobfuscated and renamed for improved readability):
The command takes a Base85-encoded string, decodes it, decompresses the result using zlib, deserializes the resulting data using the marshal module, and then executes the final deserialized data as Python code.
Figure 11: Python command
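For analysts, the same chain can be reproduced in a few lines of Python; this hedged sketch stops at recovering the code object rather than executing it, and the round-trip example at the end is our own:

```python
# Analyst-side sketch of the decode chain: Base85 -> zlib -> marshal.
# Deliberately returns the code object instead of exec()-ing it.
import base64
import marshal
import zlib

def decode_stage(payload: str):
    raw = base64.b85decode(payload)       # step 1: Base85 decode
    code_bytes = zlib.decompress(raw)     # step 2: zlib decompress
    return marshal.loads(code_bytes)      # step 3: deserialize the code object

# Harmless round-trip to demonstrate the chain end to end:
blob = base64.b85encode(
    zlib.compress(marshal.dumps(compile("print('hi')", "<s>", "exec")))
).decode()
print(decode_stage(blob))  # <code object ...>
```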
The decompiled first-stage Python code combines RSA, AES, RC4, and XOR techniques to decrypt the second-stage Python bytecode.
Figure 12: First-stage Python
The decrypted second-stage Python script executes C:\winsystem\heif\heif.exe, a legitimate, digitally signed executable used to side-load a malicious DLL. This serves as the launcher to execute the other malware components.
As mentioned, the STARKVEIL malware drops its components during its first execution and executes a launcher on its second execution. The complete analysis of all the malware components and their roles is provided in the next sections.
Each of these DLLs operates as an in-memory dropper and spawns a new victim process to perform code injection through process replacement.
Launcher
The execution of C:\winsystem\heif\heif.exe results in the side-loading of the malicious heif.dll, located in the same directory. This DLL is an in-memory dropper that spawns a legitimate Windows process (which may vary) and performs code injection through process replacement.
The injected code is a .NET executable that acts as a launcher and performs the following:
Moves multiple folders from C:\winsystem to %APPDATA%. The destination folders are:
%APPDATA%\python
%APPDATA%\pythonw
%APPDATA%\ffplay
%APPDATA%\Launcher
Launches three legitimate processes to side-load associated malicious DLLs. The malicious DLLs for each process are:
python.exe: %APPDATA%\python\avcodec-61.dll
pythonw.exe: %APPDATA%\pythonw\heif.dll
ffplay.exe: %APPDATA%\ffplay\libde265.dll
Establishes persistence via an AutoRun registry key.
value: Dropbox
key: SOFTWARE\Microsoft\Windows\CurrentVersion\Run
root: HKCU
value data: "cmd.exe /c "cd /d "<exePath>" && "Launcher.exe""
Figure 14: Main function of launcher
The AutoRun key executes %APPDATA%\Launcher\Launcher.exe, which side-loads the DLL file libde265.dll. This DLL spawns and injects its payload into AddInProcess32.exe via PE hollowing. The injected code’s main purpose is to execute the legitimate binaries C:\winsystem\heif2rgb\heif2rgb.exe and C:\winsystem\heif-info\heif-info.exe, which, in turn, side-load the backdoors XWORM and FROSTRIFT, respectively.
GRIMPULL
Of the three executables, the launcher first executes %APPDATA%\python\python.exe, which side-loads the DLL avcodec-61.dll and injects the malware GRIMPULL into a legitimate Windows process.
GRIMPULL is a .NET-based downloader that incorporates anti-VM capabilities and utilizes Tor for C2 server connections.
Anti-VM and Anti-Analysis
GRIMPULL begins by checking for the presence of the mutex value aff391c406ebc4c3, and terminates itself if it is found. Otherwise, the malware proceeds to perform further anti-VM checks, exiting if any of them succeeds.
| Check | Details |
| --- | --- |
| Module detection | Checks for sandbox/analysis tool DLLs: SbieDll.dll (Sandboxie), cuckoomon.dll (Cuckoo Sandbox) |
| BIOS information | Queries Win32_BIOS via WMI and checks the version and serial number for: VMware, VIRTUAL, A M I (AMI BIOS), Xen |
| Parent process | Checks if the parent process is cmd (command line) |
| VM file detection | Checks for the existence of vmGuestLib.dll in the System folder |
| System manufacturer | Queries Win32_ComputerSystem via WMI and checks the manufacturer and model for: Microsoft (Hyper-V), VMWare, Virtual |
| Display and system configuration | Checks for specific screen resolutions (1440×900, 1024×768, 1280×1024) and whether the OS is 32-bit |
| Username | Checks for common analysis environment usernames: john, anna, or any username containing xxxxxxxx |

Table 4: Anti-VM and Anti-analysis checks
Download Function
GRIMPULL verifies the presence of a Tor process. If a Tor process is not detected, it proceeds to download, decompress, and execute Tor from a hardcoded URL.
GRIMPULL then attempts to connect to the following C2 server via the Tor tunnel over TCP.
strokes[.]zapto[.]org:7789
The malware maintains this connection and periodically checks for .NET payloads. Fetched payloads are decrypted using TripleDES in ECB mode with the MD5 hash of the campaign ID aff391c406ebc4c3 as the decryption key, decompressed with GZip (using a 4-byte length prefix), reversed, and then loaded into memory as .NET assemblies.
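The decryption chain lends itself to a short analyst script. The following Python sketch (using pycryptodome) mirrors the steps described above; the little-endian length prefix and its exact placement are our assumptions, and real samples may differ:

```python
# Hedged reconstruction of GRIMPULL's payload decryption:
# 3DES-ECB (key = MD5 of the campaign ID) -> gunzip -> reverse.
import gzip
import hashlib
from Crypto.Cipher import DES3  # pip install pycryptodome

CAMPAIGN_ID = b"aff391c406ebc4c3"

def decrypt_payload(blob: bytes) -> bytes:
    key = hashlib.md5(CAMPAIGN_ID).digest()  # 16 bytes -> two-key 3DES
    plaintext = DES3.new(key, DES3.MODE_ECB).decrypt(blob)
    # Assumption: a 4-byte little-endian length prefix frames the GZip body.
    length = int.from_bytes(plaintext[:4], "little")
    decompressed = gzip.decompress(plaintext[4:4 + length])
    return decompressed[::-1]  # reversed before being loaded as a .NET assembly
```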
Malware Configuration
The configuration elements are encoded as base64 strings, as shown in Figure 16.
Figure 16: Encoded malware configuration
Table 5 shows the extracted malware configuration.
| Field | Value |
| --- | --- |
| C2 domain/server | strokes[.]zapto[.]org |
| Port number | 7789 |
| Unique identifier/campaign ID | aff391c406ebc4c3 |
| Configuration profile name | Default |

Table 5: GRIMPULL configuration
XWORM
Second, the launcher executes the file %APPDATA%\pythonw\pythonw.exe, which side-loads the DLL heif.dll and injects XWORM into a legitimate Windows process.
XWORM is a .NET-based backdoor that communicates using a custom binary protocol over TCP. Its core functionality involves expanding its capabilities through a plugin management system. Downloaded plugins are written to disk and executed. Supported capabilities include keylogging, command execution, screen capture, and spreading to USB drives.
XWORM Configuration
The malware begins by decoding its configuration using the AES algorithm.
Figure 17: Decryption of configuration
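XWorm variants are commonly reported to decrypt each config field with AES in ECB mode using a key derived from the MD5 of the mutex string. The following Python sketch illustrates that widely documented scheme; the 32-byte key expansion and the padding handling are assumptions that vary across versions:

```python
# Hedged sketch of XWorm-style config decryption (scheme varies by version).
import base64
import hashlib
from Crypto.Cipher import AES  # pip install pycryptodome

MUTEX = b"ZMChdfiKw2dqF51X"  # mutex value from Table 6

def decrypt_field(encoded: str) -> str:
    digest = hashlib.md5(MUTEX).digest()
    key = (digest + digest)[:32]  # assumed expansion of the 16-byte MD5 digest
    plaintext = AES.new(key, AES.MODE_ECB).decrypt(base64.b64decode(encoded))
    return plaintext.rstrip(b"\x00").decode(errors="replace")
```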
Table 6 shows the extracted malware configuration.
| Field | Value |
| --- | --- |
| Host | artisanaqua[.]ddnsking[.]com |
| Port number | 25699 |
| KEY | <123456789> |
| SPL | <Xwormmm> |
| Version | XWorm V5.2 |
| USBNM | USB.exe |
| Telegram Token | 8060948661:AAFwePyBCBu9X-gOemLYLlv1owtgo24fcO0 |
| Telegram ChatID | -1002475751919 |
| Mutex | ZMChdfiKw2dqF51X |

Table 6: XWORM configuration
Host Reconnaissance
The malware then performs a system survey to gather the following information:
Bot ID
Username
OS Name
If it’s running on USB
CPU Name
GPU Name
Ram Capacity
AV Products list
Sample of collected information:
☠ [KW-2201]
New Clinet : <client_id_from_machine_info_hash>
UserName : <victim_username>
OSFullName : <victim_OS_name>
USB : <is_sample_name_USB.exe>
CPU : <cpu_description>
GPU : <gpu_description>
RAM : <ram_size_in_GBs>
Groub : <installed_av_solutions>
Then the sample waits for any of the following supported commands:
| Command | Description |
| --- | --- |
| pong | Echo back to server |
| rec | Restart bot |
| CLOSE | Shutdown bot |
| uninstall | Self delete |
| update | Uninstall and execute received new version |
| DW | Execute file on disk via PowerShell |
| FM | Execute .NET file in memory |
| LN | Download file from supplied URL and execute on disk |
| Urlopen | Perform network request via browser |
| Urlhide | Perform network request in process |
| PCShutdown | Shutdown PC now |
| PCRestart | Restart PC now |
| PCLogoff | Log off |
| RunShell | Execute CMD on shell |
| StartDDos | Spam HTTP requests over TCP to target |
| StopDDos | Kill DDoS threads |
| StartReport | List running processes continuously |
| StopReport | Kill process monitoring threads |
| Xchat | Send C2 message |
| Hosts | Get hosts file contents |
| Shosts | Write to file, likely to overwrite hosts file contents |
| DDos | Unimplemented |
| ngrok | Unimplemented |
| plugin | Load a bot plugin |
| savePlugin | Save plugin to registry and load it: HKCU\Software\<victim_id>\<plugin_name>=<plugin_bytes> |
| RemovePlugins | Delete all plugins in registry |
| OfflineGet | Read keylog |
| $Cap | Get screen capture |

Table 7: Supported commands
FROSTRIFT
Lastly, the launcher executes the file %APPDATA%\ffplay\ffplay.exe to side-load the DLL %APPDATA%\ffplay\libde265.dll and inject FROSTRIFT into a legitimate Windows process.
FROSTRIFT is a .NET backdoor that collects system information, installed applications, and crypto wallets. Instead of receiving C2 commands, it receives .NET modules that are stored in the registry to be loaded in-memory. It communicates with the C2 server using GZIP-compressed protobuf messages over TCP/SSL.
Malware Configuration
The malware starts by decoding its configuration, which is a Base64-encoded and GZIP-compressed protobuf message embedded within the strings table.
Figure 18: FROSTRIFT configuration
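The outer two layers of the configuration are trivial to peel off for analysis; this Python sketch decodes the Base64 wrapper and decompresses the GZIP layer, leaving the raw protobuf bytes (full parsing would additionally require the unpublished message definition):

```python
# Sketch: strip the Base64 and GZIP layers of a FROSTRIFT-style config blob.
import base64
import gzip

def decode_config(encoded: str) -> bytes:
    compressed = base64.b64decode(encoded)
    return gzip.decompress(compressed)  # raw protobuf message bytes
```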
Table 8 shows the extracted malware configuration.
| Field | Value |
| --- | --- |
| Protobuf Tag | 38 |
| C2 Domain | strokes.zapto[.]org |
| C2 Port | 56001 |
| SSL Certificate | <Base64 encoded SSL certificate> |
| Unknown | Default |
| Installation folder | APPDATA |
| Mutex | 7d9196467986 |

Table 8: FROSTRIFT configuration
Persistence
To establish persistence, FROSTRIFT copies itself to %APPDATA% and adds a new registry value under HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Run with the new file path as data, ensuring execution at each system startup.
Host Reconnaissance
The following information is initially collected and submitted by the malware to the C2:
| Category | Collected Information |
| --- | --- |
| Host information | Installed anti-virus, web camera, hostname, username and role, OS name, local time |
| Victim ID | HEX digest of the MD5 hash of the following, combined: sample process ID, disk drive serial number, physical memory serial number, victim user name |
| Malware version | 4.1.8 |
| Software applications | com.liberty.jaxx, Foxmail, Telegram, browsers (see Table 10) |
| Standalone crypto wallets | Atomic, Bitcoin-Qt, Dash-Qt, Electrum, Ethereum, Exodus, Litecoin-Qt, Zcash, Ledger Live |
| Browser extensions | Password managers, authenticators, and digital wallets (see Table 11) |
| Others | 5th entry from the config (“Default” in this sample), malware full file path |

Table 9: Collected information
FROSTRIFT checks for the existence of the browsers listed in Table 10.
FROSTRIFT also checks for the existence of 48 browser extensions related to Password managers, Authenticators, and Digital wallets. The full list is provided in Table 11.
| Extension ID | Extension |
| --- | --- |
| ibnejdfjmmkpcnlpebklmnkoeoihofec | TronLink |
| nkbihfbeogaeaoehlefnkodbefgpgknn | MetaMask |
| fhbohimaelbohpjbbldcngcnapndodjp | Binance Chain Wallet |
| ffnbelfdoeiohenkjibnmadjiehjhajb | Yoroi |
| cjelfplplebdjjenllpjcblmjkfcffne | Jaxx Liberty |
| fihkakfobkmkjojpchpfgcmhfjnmnfpi | BitApp Wallet |
| kncchdigobghenbbaddojjnnaogfppfj | iWallet |
| aiifbnbfobpmeekipheeijimdpnlpgpp | Terra Station |
| ijmpgkjfkbfhoebgogflfebnmejmfbml | BitClip |
| blnieiiffboillknjnepogjhkgnoapac | EQUAL Wallet |
| amkmjjmmflddogmhpjloimipbofnfjih | Wombat |
| jbdaocneiiinmjbjlgalhcelgbejmnid | Nifty Wallet |
| afbcbjpbpfadlkmhmclhkeeodmamcflc | Math Wallet |
| hpglfhgfnhbgpjdenjgmdgoeiappafln | Guarda |
| aeachknmefphepccionboohckonoeemg | Coin98 Wallet |
| imloifkgjagghnncjkhggdhalmcnfklk | Trezor Password Manager |
| oeljdldpnmdbchonielidgobddffflal | EOS Authenticator |
| gaedmjdfmmahhbjefcbgaolhhanlaolb | Authy |
| ilgcnhelpchnceeipipijaljkblbcobl | GAuth Authenticator |
| bhghoamapcdpbohphigoooaddinpkbai | Authenticator |
| mnfifefkajgofkcjkemidiaecocnkjeh | TezBox |
| dkdedlpgdmmkkfjabffeganieamfklkm | Cyano Wallet |
| aholpfdialjgjfhomihkjbmgjidlcdno | Exodus Web3 |
| jiidiaalihmmhddjgbnbgdfflelocpak | BitKeep |
| hnfanknocfeofbddgcijnmhnfnkdnaad | Coinbase Wallet |
| egjidjbpglichdcondbcbdnbeeppgdph | Trust Wallet |
| hmeobnfnfcmdkdcmlblgagmfpfboieaf | XDEFI Wallet |
| bfnaelmomeimhlpmgjnjophhpkkoljpa | Phantom |
| fcckkdbjnoikooededlapcalpionmalo | MOBOX WALLET |
| bocpokimicclpaiekenaeelehdjllofo | XDCPay |
| flpiciilemghbmfalicajoolhkkenfel | ICONex |
| hfljlochmlccoobkbcgpmkpjagogcgpk | Solana Wallet |
| cmndjbecilbocjfkibfbifhngkdmjgog | Swash |
| cjmkndjhnagcfbpiemnkdpomccnjblmj | Finnie |
| knogkgcdfhhbddcghachkejeap | Keplr |
| kpfopkelmapcoipemfendmdcghnegimn | Liquality Wallet |
| hgmoaheomcjnaheggkfafnjilfcefbmo | Rabet |
| fnjhmkhhmkbjkkabndcnnogagogbneec | Ronin Wallet |
| klnaejjgbibmhlephnhpmaofohgkpgkd | ZilPay |
| ejbalbakoplchlghecdalmeeeajnimhm | MetaMask |
| ghocjofkdpicneaokfekohclmkfmepbp | Exodus Web3 |
| heaomjafhiehddpnmncmhhpjaloainkn | Trust Wallet |
| hkkpjehhcnhgefhbdcgfkeegglpjchdc | Braavos Smart Wallet |
| akoiaibnepcedcplijmiamnaigbepmcb | Yoroi |
| djclckkglechooblngghdinmeemkbgci | MetaMask |
| acdamagkdfmpkclpoglgnbddngblgibo | Guarda Wallet |
| okejhknhopdbemmfefjglkdfdhpfmflg | BitKeep |
| mijjdbgpgbflkaooedaemnlciddmamai | Waves Keeper |

Table 11: List of browser extensions
C2 Communication
The malware expects the C2 to respond by sending GZIP-compressed Protobuf messages with the following fields:
registry_val: A registry value under HKCU\Software\<victim_id> to store the loader_bytes.
loader_bytes: Assembly module that loads the loaded_bytes (stored in the registry in reverse order).
loaded_bytes: GZIP-compressed assembly module to be loaded in-memory.
The sample receives loader_bytes only in the first message, storing it under the registry value HKCU\Software\<victim_id>\registry_val. For subsequent messages, it receives only registry_val, which it uses to fetch loader_bytes from the registry.
The sample sends empty GZIP-compressed Protobuf messages as a keep-alive mechanism until the C2 sends another assembly module to be loaded.
The malware also has the ability to download and execute extra payloads from hardcoded URLs, though this feature is not enabled in this sample.
The files are WebDrivers for browsers that can be used for testing, automation, and interacting with the browser. They can also be used by attackers for malicious purposes, such as deploying additional payloads.
Conclusion
As AI has gained tremendous momentum recently, our research highlights some of the ways threat actors have taken advantage of it. Although our investigation was limited in scope, we discovered that well-crafted fake “AI websites” pose a significant threat to both organizations and individual users. These lures no longer target just graphic designers; the temptation to try the latest AI tool can lead to anyone becoming a victim. We advise users to exercise caution when engaging with AI tools and to verify the legitimacy of the website’s domain.
Acknowledgements
Special thanks to Stephen Eckels, Muhammad Umair, and Mustafa Nasser for their assistance in analyzing the malware samples; Richmond Liclican for his inputs and attribution; and Ervin Ocampo, Swapnil Patil, Muhammad Umer Khan, and Muhammad Hasib Latif for providing the detection opportunities.
Detection Opportunities
The following indicators of compromise (IOCs) and YARA rules are also available as a collection and rule pack in Google Threat Intelligence (GTI).
```yara
rule G_Backdoor_FROSTRIFT_1 {
  meta:
    author = "Mandiant"
  strings:
    $guid = "$23e83ead-ecb2-418f-9450-813fb7da66b8"
    $r1 = "IdentifiableDecryptor.DecryptorStack"
    $r2 = "$ProtoBuf.Explorers.ExplorerDecryptor"
    $s1 = "\\User Data\\" wide
    $s2 = "SELECT * FROM AntiVirusProduct" wide
    $s3 = "Telegram.exe" wide
    $s4 = "SELECT * FROM Win32_PnPEntity WHERE (PNPClass = 'Image' OR PNPClass = 'Camera')" wide
    $s5 = "Litecoin-Qt" wide
    $s6 = "Bitcoin-Qt" wide
  condition:
    uint16(0) == 0x5a4d and (all of ($s*) or $guid or all of ($r*))
}
```
YARA-L Rules
Mandiant has made the relevant rules available in the Google SecOps Mandiant Intel Emerging Threats curated detections rule set. The activity discussed in the blog post is detected under the rule names:
At Google Cloud, we’re committed to providing the most open and flexible AI ecosystem for you to build solutions best suited to your needs. Today, we’re excited to announce our expanded AI offerings with Mistral AI on Google Cloud:
Le Chat Enterprise on Google Cloud Marketplace: An AI assistant that offers enterprise search, agent builders, custom data and tool connectors, custom models, document libraries, and more in a unified platform.
Mistral OCR 25.05 on Vertex AI Model Garden: A Model-as-a-Service (MaaS) offering for document understanding that converts content-rich documents into AI-ready formats.
Available today on Google Cloud Marketplace, Mistral AI’s Le Chat Enterprise is a generative AI work assistant designed to connect tools and data in a unified platform for enhanced productivity.
Use cases include:
Building agents: With Le Chat Enterprise, you can customize and deploy a variety of agents that understand and synchronize with your unique context, including no-code agents.
Accelerating research and analysis: With Le Chat Enterprise, you can quickly summarize lengthy reports, extract key data from documents, and perform rapid web searches to gather information efficiently.
Generating actionable insights: With Le Chat Enterprise, industries — like finance — can convert complex data into actionable insights, generate text-to-SQL queries for financial analysis, and automate financial report generation.
Accelerating software development: With Le Chat Enterprise, you can debug and optimize existing code, generate and review code, or create technical documentation.
Enhancing content creation: With Le Chat Enterprise, you can help marketers generate and refine marketing copy across channels, analyze campaign performance data, and collaborate on visual content creation through Canvas.
By deploying Le Chat Enterprise through Google Cloud Marketplace, organizations can leverage the scalability and security of Google Cloud’s infrastructure, while also benefiting from a simplified procurement process and integrations with existing Google Cloud services such as BigQuery and Cloud SQL.
Mistral OCR 25.05 excels in document understanding: it can comprehend elements of content-rich papers, like media, text, charts, tables, graphs, and equations, with powerful accuracy. Example use cases include:
Digitizing scientific research: Research institutions can use Mistral OCR 25.05 to accelerate scientific workflows by converting scientific papers and journals into AI-ready formats, making them accessible to downstream intelligence engines.
Preserving historical and cultural heritage: Digitizing historical documents and artifacts to assist with preservation and making them more accessible to a broader audience.
Streamlining customer service: Customer service departments can reduce response times and improve customer satisfaction by using Mistral OCR 25.05 to transform documentation and manuals into indexed knowledge.
Making literature AI-ready across design, education, legal, and more: Mistral OCR 25.05 can discover insights and accelerate productivity across a large volume of documents by helping companies convert technical literature, engineering drawings, lecture notes, presentations, regulatory filings, and more into indexed, answer-ready formats.
When building with Mistral OCR 25.05 as a Model-as-a-Service (MaaS) on Vertex AI, you get a comprehensive AI platform to scale with fully managed infrastructure and build confidently with enterprise-grade security and compliance. Mistral OCR 25.05 joins a curated selection of over 200 foundation models in Vertex AI Model Garden, empowering you to choose the ideal solution for your specific needs.
To start building with Mistral OCR 25.05 on Vertex AI, visit the Mistral OCR 25.05 model card in Vertex AI Model Garden, select “Enable”, and follow the instructions.
Today, we’re expanding the choice of third-party models available in Vertex AI Model Garden with the addition of Anthropic’s newest generation of the Claude model family: Claude Opus 4 and Claude Sonnet 4. Both Claude Opus 4 and Claude Sonnet 4 are hybrid reasoning models, meaning they offer modes for near-instant responses and extended thinking for deeper reasoning.
Claude Opus 4 is Anthropic’s most powerful model to date. Claude Opus 4 excels at coding, with sustained performance on complex, long-running tasks and agent workflows. Use cases include advanced coding work, autonomous AI agents, agentic search and research, tasks that require complex problem solving, and long-running tasks that require precise content management.
Claude Sonnet 4 is Anthropic’s mid-size model that balances performance with cost. It surpasses its predecessor, Claude Sonnet 3.7, across coding and reasoning while responding more precisely to steering. Use cases include coding tasks such as code reviews and bug fixes, AI assistants, efficient research, and large-scale content generation and analysis.
Claude Opus 4 and Claude Sonnet 4 are generally available as a Model-as-a-Service (MaaS) offering on Vertex AI. For more information on the newest Claude models, visit Anthropic’s blog.
Build advanced agents on Vertex AI
Vertex AI is Google Cloud’s comprehensive platform for orchestrating your production AI workflows across three pillars: data, models, and agents—a combination that would otherwise require multiple fragmented solutions. A key component of the model pillar is Vertex AI Model Garden, which offers a curated selection of over 200 foundation models, including Google’s models, third-party models, and open models—empowering you to choose the ideal solution for your specific needs.
You can leverage Vertex AI’s Model-as-a-Service (MaaS) to rapidly deploy and scale Claude-powered intelligent agents and applications, benefiting from integrated agentic tooling, fully managed infrastructure, and enterprise-grade security.
By building on Vertex AI, you can:
Orchestrate sophisticated multi-agent systems: Build agents with an open approach using Google’s Agent Development Kit (ADK) or your preferred framework. Deploy your agents to production with enterprise-grade controls directly in Agent Engine.
Harness the power of Google Cloud integrations: You can connect Claude directly within BigQuery ML to facilitate functions like text generation, summarization, translation, and more.
Optimize performance with provisioned throughput: Reserve dedicated capacity and prioritized processing for critical production workloads with Claude models at a fixed fee. To get started with provisioned throughput, contact your Google Cloud sales representative.
Maximize Claude model utilization: Reduce latency and costs while increasing throughput by employing Vertex AI’s advanced features for Claude models such as batch predictions, prompt caching, token counting, and citations. For detailed information, refer to our documentation.
Scale with fully managed infrastructure: Vertex AI’s fully managed and AI-optimized infrastructure simplifies how you deploy your AI workloads in production. Additionally, Vertex AI’s new global endpoints for Claude (public preview) enhance availability by dynamically serving traffic from the nearest available region.
Build confidently with enterprise-grade security and compliance: Benefit from Vertex AI’s built-in security and compliance measures that satisfy stringent enterprise requirements.
Customers achieving real impact with Claude on Vertex AI
To date, more than 4,000 customers have started using Anthropic’s Claude models on Vertex AI. Here’s a look at how top organizations are driving impactful results with this powerful integration:
Augment Code is running its AI coding assistant, which specializes in helping developers navigate and contribute to production-grade codebases, with Anthropic’s Claude models on Vertex AI.
“What we’re able to get out of Anthropic is truly extraordinary, but all of the work we’ve done to deliver knowledge of customer code, used in conjunction with Anthropic and the other models we host on Google Cloud, is what makes our product so powerful.” – Scott Dietzen, CEO, Augment Code
Palo Alto Networks is accelerating software development and security by deploying Claude on Vertex AI.
“With Claude running on Vertex AI, we saw a 20% to 30% increase in code development velocity. Running Claude on Google Cloud’s Vertex AI not only accelerates development projects, it enables us to hardwire security into code before it ships.” – Gunjan Patel, Director of Engineering, Office of the CPO, Palo Alto Networks
Replit leverages Claude on Vertex AI to power Replit Agent, which empowers people across the world to use natural language prompts to turn their ideas into applications, regardless of coding experience.
“Our AI agent is made more powerful through Anthropic’s Claude models running on Vertex AI. This integration allows us to easily connect with other Google Cloud services, like Cloud Run, to work together behind the scenes to help customers turn their ideas into apps.” – Amjad Masad, Founder and CEO, Replit
Get started
To get started with the new Claude models on Vertex AI, navigate to the Claude Opus 4 or the Claude Sonnet 4 model card in Vertex AI Model Garden, select “Enable”, and follow the instructions.
In today’s data-driven world, understanding large datasets often requires numerous, complex non-additive1 aggregation operations. But as the size of the data becomes massive2, these types of operations become computationally expensive and time-consuming using traditional methods. That’s where Apache DataSketches come in. We’re excited to announce the availability of Apache DataSketches functions within BigQuery, providing powerful tools for approximate analytics at scale.
Apache DataSketches is an open-source library of sketches, specialized streaming algorithms that efficiently summarize large datasets. Sketches are small probabilistic data structures that enable accurate estimates of distinct counts, quantiles, histograms, and other statistical measures – all with minimal memory, minimal computational overhead, and a single pass through the data. All but a few of these sketches provide mathematically proven error bounds, i.e., the maximum possible difference between a true value and its estimated value. Users can adjust these error bounds as a trade-off between the size of the sketch and the size of the error bounds: the larger the configured sketch, the smaller the error bounds.
With sketches, you can quickly gain insights from massive datasets, especially when exact computations are impractical or impossible. The sketches themselves can be merged, making them additive and highly parallelizable, so you can combine sketches from multiple datasets for further analysis. This combination of small size and mergeability can translate into orders-of-magnitude improvement in speed of computational workload compared to traditional methods.
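The same mergeability is visible in the open-source Python bindings; this hedged sketch (pip install datasketches) builds one KLL sketch per partition and merges them for a global quantile estimate, with the partition count and k value chosen purely for illustration:

```python
# Illustrative per-partition sketching and merging with Apache DataSketches.
import random
from datasketches import kll_floats_sketch

partitions = [[random.random() for _ in range(100_000)] for _ in range(10)]

merged = kll_floats_sketch(200)      # k=200 trades sketch size for error bounds
for part in partitions:
    sk = kll_floats_sketch(200)
    for x in part:
        sk.update(x)                 # one pass over each partition
    merged.merge(sk)                 # sketches are additive across partitions

print(merged.get_quantile(0.5))      # approximate median of all 1M values
```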
Why DataSketches in BigQuery?
BigQuery is known for its ability to process petabytes of data, and DataSketches are a natural fit for this environment. With DataSketches functions, BigQuery lets you:
Perform rapid approximate queries: Get near-instantaneous results for distinct counts, quantile analysis, adaptive histograms and other non-additive aggregate calculations on massive datasets.
Save on resources: Reduce query costs and storage requirements by working with compact sketches instead of raw data.
Move between systems: DataSketches have well-defined stored binary representations that let sketches be transported between systems and interpreted by three major languages: Java, C++, and Python, all without losing any accuracy.
Apache DataSketches come to BigQuery through custom C++ implementations using the Apache DataSketches C++ core library compiled to WebAssembly (WASM) libraries, and then loaded within BigQuery Javascript user-defined aggregate functions (JS UDAFs).
How BigQuery customers use Apache DataSketches
Yahoo started the Apache DataSketches project in 2011, open-sourced it in 2015, and still uses the Apache DataSketches library. They use approximate results in various analytic query operations such as count distinct, quantiles, and most frequent items (a.k.a. Heavy Hitters). More recently, Yahoo adapted the DataSketches library to leverage the large scale of BigQuery, using the Google-defined JavaScript User Defined Aggregate Functions (UDAF) interface to the Google Cloud and BigQuery platform.
“Yahoo has successfully used the Apache DataSketches library to analyze massive data in our internal production processing systems for more than 10 years. Data sketching has allowed us to respond to a wide range of queries summarizing data in seconds, at a fraction of the time and cost of brute-force computation. As an early innovator in developing this powerful technology, we are excited about this fast, accurate, large-scale, open-source technology becoming available to those already working in a Google Cloud BigQuery environment.” – Matthew Sajban, Director of Software Development Engineering, Yahoo
Featured sketches
So, what can you do with Apache DataSketches? Let’s take a look at the sketches integrated with BigQuery.
Cardinality sketches
Hyper Log Log Sketch (HLL): The DataSketches library implements this historically famous sketch algorithm with lots of versatility. It is best suited for straightforward distinct counting (or cardinality) estimation. It can be adapted to a range of sizes from roughly 50 bytes to about 2MB depending on the accuracy requirements. It also comes in three flavors: HLL_4, HLL_6, HLL_8 that enable additional tuning of speed and size.
Theta Sketch: This sketch specializes in set expressions and allows not only normal additive unions but also full set expressions between sketches with set-intersection and set-difference. Because of its algebraic capability, this sketch is one of the most popular sketches. It has a range of sizes from a few hundred bytes to many megabytes, depending on the accuracy requirements.
CPC Sketch: This cardinality sketch takes advantage of recent algorithmic research and enables smaller stored size, for the same accuracy, than the classic HLL sketch. It is targeted for situations where accuracy per stored size is the most critical metric.
Tuple Sketch: This extends Theta Sketch to enable the association of other values with each unique item retained by the sketch. This allows the computation of summaries of attributes like impressions or clicks as well as more complex analysis of customer engagement, etc.
Quantile sketches
KLL Sketch: This Sketch is designed for quantile estimation (e.g., median, percentiles), and ideal for understanding distributions, creating density and histogram plots, and partitioning large data sets. The KLL algorithm used in this sketch has been proven to have statistically optimal quantile approximation accuracy for a given size. The KLL Sketch can be used with any kind of data that is comparable, i.e., has a defined sorting order between items. The accuracy of KLL is insensitive to the input data distribution.
REQ Sketch: This quantile sketch is designed for situations where accuracy at the ends of the rank domain is more important than at the median. In other words, if you’re most interested in accuracy at the 99.99th percentile and not so interested in the accuracy at the 50th percentile, this is the sketch to choose. Like the KLL Sketch, this sketch has mathematically proven error bounds. The REQ sketch can be used with any kind of data that is comparable, i.e., has a defined sorting order between items. By design, the accuracy of REQ is sensitive to how close an item is to the ends of the normalized rank domain (i.e., close to rank 0.0 or rank 1.0), otherwise it is insensitive to the input distribution.
T-Digest Sketch: This is also a quantile sketch, but it’s based on a heuristic algorithm and doesn’t have mathematically proven error properties. It is also limited to strictly numeric data. The accuracy of the T-Digest Sketch can be sensitive to the input data distribution. However, it’s a very good heuristic sketch, fast, has a small footprint, and can provide excellent results in most situations.
Frequency sketches
Frequent Items Sketch: This sketch is also known as a Heavy-Hitter sketch. Given a stream of items, this sketch identifies, in a single pass, the items that occur more frequently than a noise threshold, which is user-configured by the size of the sketch. This is especially useful in real-time situations. For example, what are the most popular items from a web site that are being actively queried, over the past hour, day, or minute? Its output is effectively an ordered list of the most frequently visited items. This list changes dynamically, which means you can query the sketch, say, every hour to help you understand the query dynamics over the course of a day. In static situations, for example, it can be used to discover the largest files in your database in a single pass and with only a modest amount of memory.
How to get started
To leverage the power of DataSketches in BigQuery, you can find the new functions within the bqutil.datasketches dataset (for US multi-region location) or bqutil.datasketches_<bq_region> dataset (for any other regions and locations). For detailed information on available functions and their usage, refer to the DataSketches README. You can also find demo notebooks in our GitHub repo for the KLL Sketch, Theta Sketch, and FI Sketch.
Example: Obtaining estimates of Min, Max, Median, 75th, 95th percentiles and total count using the KLL Quantile Sketch
Suppose you have 1 million comparable3 records in 100 different partitions or groups. You would like to understand how the records are distributed by their percentile or rank, without having to bring them all together in memory or even sort them.
SQL:
```sql
## Create sample data with 1 million records split into 100 groups of nearly equal size

CREATE TEMP TABLE sample_data AS
SELECT
  CONCAT("group_key_", CAST(RAND() * 100 AS INT64)) AS group_key,
  RAND() AS x
FROM
  UNNEST(GENERATE_ARRAY(1, 1000000));

## Create KLL merge sketches for a group key

CREATE TEMP TABLE agg_sample_data AS
SELECT
  group_key,
  COUNT(*) AS total_count,
  bqutil.datasketches.kll_sketch_float_build_k(x, 250) AS kll_sketch
FROM sample_data
GROUP BY group_key;

## Merge group-based sketches into a single sketch, then get approximate quantiles

WITH agg_data AS (
  SELECT
    bqutil.datasketches.kll_sketch_float_merge_k(kll_sketch, 250) AS merged_kll_sketch,
    SUM(total_count) AS total_count
  FROM agg_sample_data
)
SELECT
  bqutil.datasketches.kll_sketch_float_get_quantile(merged_kll_sketch, 0.0, true) AS minimum,
  bqutil.datasketches.kll_sketch_float_get_quantile(merged_kll_sketch, 0.5, true) AS p50,
  bqutil.datasketches.kll_sketch_float_get_quantile(merged_kll_sketch, 0.75, true) AS p75,
  bqutil.datasketches.kll_sketch_float_get_quantile(merged_kll_sketch, 0.95, true) AS p95,
  bqutil.datasketches.kll_sketch_float_get_quantile(merged_kll_sketch, 1.0, true) AS maximum,
  total_count
FROM agg_data;
```
The DataSketches Tuple Sketch is a powerful tool to analyze properties that have a natural association with unique identifiers.
For example, imagine you have a large-scale web application that records user identifiers and their clicks on various elements. You would like to analyze this massive dataset efficiently to obtain approximate metrics for clicks per unique user. The Tuple Sketch computes the number of unique users and allows you to track additional properties that are naturally associated with the unique identifiers as well.
SQL:
## Creating sample data with 100M records (1 through 100M) split into 10 nearly equal-sized groups of 10M values each

CREATE TEMP TABLE sample_data_100M AS
SELECT
  CONCAT("group_key_", CAST(RAND() * 10 AS INT64)) AS group_key,
  1000000 * x2 + x1 AS user_id,
  x2 AS clicks
FROM UNNEST(GENERATE_ARRAY(1, 1000000)) AS x1,
  UNNEST(GENERATE_ARRAY(0, 99)) AS x2;

## Creating Tuple sketches for each group key (the group key can be any dimension, for example date, product, or location)

CREATE TEMP TABLE agg_sample_data_100M AS
SELECT
  group_key,
  COUNT(DISTINCT user_id) AS exact_uniq_users_ct,
  SUM(clicks) AS exact_clicks_ct,
  bqutil.datasketches.tuple_sketch_int64_agg_int64(user_id, clicks) AS tuple_sketch
FROM sample_data_100M
GROUP BY group_key;

## Merge group-based sketches into a single sketch, then extract relevant metrics such as
## the distinct count estimate and the estimate of the sum of clicks with its upper and lower bounds

WITH agg_data AS (
  SELECT
    bqutil.datasketches.tuple_sketch_int64_agg_union(tuple_sketch) AS merged_tuple_sketch,
    SUM(exact_uniq_users_ct) AS total_uniq_users_ct
  FROM agg_sample_data_100M
)
SELECT
  total_uniq_users_ct,
  bqutil.datasketches.tuple_sketch_int64_get_estimate(merged_tuple_sketch) AS distinct_count_estimate,
  bqutil.datasketches.tuple_sketch_int64_get_sum_estimate_and_bounds(merged_tuple_sketch, 2) AS sum_estimate_and_bounds
FROM agg_data;

## The average clicks per unique user can be obtained by simple division.
## Note: the number of digits of precision in the estimates above reflects the fact that the returned values are floating point.
In short, DataSketches in BigQuery unlocks a new dimension of approximate analytics, helping you gain valuable insights from massive datasets quickly and efficiently. Whether you’re tracking website traffic, analyzing user behavior, or performing any other large-scale data analysis, DataSketches are your go-to tools for fast, accurate estimations.
To start using DataSketches in BigQuery, refer to the DataSketches-BigQuery repository README for building, installing and testing the DataSketches-BigQuery library in your own environment. In each sketch folder there is a README that details the specific function specifications available for that sketch.
If you are working in a BigQuery environment, the DataSketches-BigQuery library is already available for you to use in all regional public BigQuery datasets.
1. Examples include distinct counting, quantiles, topN, K-means, density estimation, graph analysis, etc. The results from one parallel partition cannot be simply “added” to the results of another partition – thus the term non-additive (a.k.a. non-linear operations).
2. Massive ~ typically, much larger than what can be conveniently kept in random-access memory.
3. Any two items can be compared to establish their order, i.e., if A < B, then A precedes B.
Want to save some money on large AI training? For a typical PyTorch LLM training workload that spans thousands of accelerators for several weeks, a 1% improvement in ML Goodput can translate to more than a million dollars in cost savings1. Therefore, improving ML Goodput is an important goal for model training — both from an efficiency perspective, as well as for model iteration velocity.
However, there are several challenges to improving ML Goodput today: frequent interruptions that necessitate restarts from the latest checkpoint, slow inline checkpointing that interrupts training, and limited observability that makes it difficult to detect failures. These issues contribute to a significant increase in the time-to-market (TTM) and cost-to-train. There have been several industry publications articulating these issues, e.g., this Arxiv paper.
Improving ML Goodput
In order to improve ML Goodput, you need to minimize the impact of disruptive events on the progress of the training workload. To resume a job quickly, you can automatically scale down the job, or swap failed resources from spare capacity. At Google Cloud, we call this elastic training. Further, you can reduce workload interruptions during checkpointing and speed up checkpoint loads on failures from the nearest available storage location. We call these capabilities asynchronous checkpointing and multi-tier checkpointing.
The following picture illustrates how these techniques provide an end-to-end remediation workflow to improve ML Goodput for training. An example workload of nine nodes is depicted with three-way data parallelism (DP) and three-way pipeline parallelism (PP), with various remediation actions shown based on the failures and spare capacity.
You can customize the remediation policy for your specific workload. For example, you can choose between hotswap and scale-down remediation strategies, or configure the checkpointing frequency. A supervisor process receives failure, degradation, and straggler signals from a diagnostic service and uses the policy to manage these events. For correctable errors, the supervisor might request an in-job restart, potentially restoring from a local checkpoint. For uncorrectable hardware failures, a hot swap can replace the faulty node, potentially restoring from a peer checkpoint. If no spare resources are available, the system can scale down. These mechanisms make training more resilient and adaptable to resource changes. When a replacement node is available, training scales up automatically to maximize GPU utilization. During scale-down and scale-up, user-defined callbacks help adjust hyperparameters such as learning rate and batch size. You can set remediation policies using a Python script.
Let’s take a deeper look at the key techniques you can use when optimizing ML Goodput.
Elastic training
Elastic training enhances the resiliency of LLM training by enabling failure sensing and mitigation capabilities for workloads. This allows jobs to automatically continue with remediation strategies including GPU reset, node hot swap, and scaling down the data-parallel dimension of a workload to avoid using faulty nodes, thereby reducing job interruption time and improving ML Goodput. Furthermore, elastic training enables automatic scaling up of data-parallel replicas when replacement nodes become available, maximizing training throughput.
Watch this short video to see elastic training techniques in action:
Optimized checkpointing
Sub-optimal checkpointing can lead to unnecessary overhead during training and significant loss of training productivity when interruptions occur and previous checkpoints are restored. You can substantially reduce these impacts by defining a dedicated asynchronous checkpointing process and optimizing it to quickly offload the training state from GPU high-bandwidth memory to host memory. Tuning the checkpoint frequency — based on factors such as the job interruption rate and the asynchronous overhead — is vital, as the best interval may range from several hours to mere minutes, depending on the workload and cluster size. An optimal checkpoint frequency minimizes both checkpoint overhead during training operation and computational loss during unexpected interruptions.
A robust way to meet the demands of frequent checkpointing is to leverage three levels of storage: local node storage, e.g., local SSD; peer node storage in the same cluster; and Google Cloud Storage. This multi-tiered checkpointing approach automatically replicates data across these storage tiers during save and restore operations via the host network interface or NCCL (the NVIDIA Collective Communications Library), allowing the system to use the fastest accessible storage option. By combining asynchronous checkpointing with a multi-tier storage strategy, you can achieve quicker recovery times and more resilient training workflows while maintaining high productivity and minimizing the loss of computational progress.
Watch this short video to see optimized checkpointing techniques in action:
These ML Goodput improvement techniques leverage NVIDIA Resiliency Extension, which provides failure signaling and in-job restart capabilities, as well as recent improvements to PyTorch’s distributed checkpointing, which support several of the previously mentioned checkpoint-related optimizations. Further, these capabilities are integrated with Google Kubernetes Engine (GKE) and the NVIDIA NeMo training framework, pre-packaged into a container image and available with an ML Goodput optimization recipe for easy deployment.
Elastic training in action
In a recent internal case study with 1,024 A3 Mega GPU-accelerated instances (built on NVIDIA Hopper), workload ML Goodput improved from 80%+ to 90%+ using a combination of these techniques. While every workload may not benefit in the same way, this table shows the specific metric improvements and ML Goodput contribution of each of the techniques.
Example: Case study experiment used an A3 Mega cluster with 1024 GPUs running ~40hr jobs with ~5 simulated interruptions per day
Conclusion
In summary, elastic training and optimized checkpointing, along with easy deployment options, are key strategies for maximizing ML Goodput for large PyTorch training workloads. As seen in the case study above, they can contribute meaningful ML Goodput improvements and significant efficiency savings. These capabilities are customizable and composable through a Python script. If you’re running PyTorch GPU training workloads on Google Cloud today, we encourage you to try out our ML Goodput optimization recipe, which provides a starting point with recommended configurations for elastic training and checkpointing. We hope you have fun building and share your feedback!
Various teams and individuals within Google Cloud contributed to this effort. Special thanks to Jingxin Ye, Nicolas Grande, Gerson Kroiz, and Slava Kovalevskyi, as well as our collaborative partners Jarek Kazmierczak, David Soto, Dmitry Kakurin, Matthew Cary, Nilay Goyal, and Parmita Mehta for their immense contributions to developing all of the components that made this project a success.
1. Assuming A3 Ultra pricing for 20,000 GPUs with jobs spanning 8 weeks or longer
Confidential Computing has redefined how organizations can securely process their sensitive workloads in the cloud. The growth in our hardware ecosystem is fueling a new wave of adoption, enabling customers to use Confidential Computing to support cutting-edge uses such as building privacy-preserving AI and securing multi-party data analytics.
We are thrilled to share our latest Confidential Computing innovations, highlighting the creative ways our customers are using Confidential Computing to protect their most sensitive workloads including AI workloads.
Building on our foundational work last year, we’ve seen remarkable progress through our deep collaborations with industry leaders including Intel, AMD, and NVIDIA. Together, we’ve significantly expanded the reach of Confidential Computing, embedding critical security features across the latest generations of CPUs, and also extending them to high-performance GPUs.
Confidential VMs and GKE Nodes with NVIDIA H100 GPUs for AI workloads, in preview
An ongoing, top goal for Confidential Computing is to expand our capabilities for secure computation.
We unveiled Confidential Virtual Machines on the accelerator-optimized A3 machine series with NVIDIA H100 GPUs last year, which extends hardware-based data protection from the CPU to GPUs. Confidential VMs can help ensure the confidentiality and integrity of artificial intelligence, machine learning, and scientific simulation workloads using protected GPUs while the data is in use.
“AI and Agentic workflows are accelerating and transforming every aspect of business. As these technologies are integrated into the fabric of everyday operations — data security and protection of intellectual property are key considerations for businesses, researchers and governments,” said Daniel Rohrer, vice president, software product security, NVIDIA. “Putting data and model owners in direct control of their data’s journey — NVIDIA’s Confidential Computing brings advanced hardware-backed security for accelerated computing providing more confidence when creating and adopting innovative AI solutions and services.”
Confidential Vertex AI Workbench, in preview
We are expanding Confidential Computing support on Vertex AI. Vertex AI Workbench customers can now use Confidential Computing, in preview, to strengthen the privacy of their data. This integration offers greater privacy and confidentiality with just a few clicks.
How to enable Confidential VMs in Vertex AI Workbench instances.
Confidential Space with Intel TDX (generally available) and NVIDIA H100 GPUs, in preview
We are excited to announce that Confidential Space is now generally available on the general-purpose C3 machine series with Intel® Trust Domain Extensions (Intel® TDX) technology, and coming soon in preview on the accelerator-optimized A3 machine series with NVIDIA H100 GPUs.
Built on our Confidential Computing portfolio, Confidential Space provides a secure enclave, also known as a Trusted Execution Environment (TEE), that Google Cloud customers can use for privacy-focused applications such as joint data analysis, joint machine learning (ML) model training or secure sharing of proprietary ML models.
Importantly, Confidential Space is designed to protect data from all parties involved — including removing the operator of the environment from the trust boundary along with hardened protection against cloud service provider access. These properties can help organizations harden their products from insider threats, and ultimately provide stronger data privacy guarantees to their own customers.
Confidential Space enables secure collaboration.
Confidential GKE Nodes on C3 machines with Intel TDX and built-in acceleration, generally available
Confidential GKE Nodes are now generally available with Intel TDX. These nodes are powered by the general purpose C3 machine series, which run on the 4th generation Intel Xeon Scalable processors (code-named Sapphire Rapids) and have the Intel Advanced Matrix Extensions (Intel AMX) built in and on by default.
Confidential GKE Nodes with Intel TDX add an isolation layer between nodes and the host and hypervisor, protecting nodes against a broad range of software and hardware attacks.
“Intel Xeon processors deliver outstanding performance and value for many machine learning and AI inference workloads, especially with Intel AMX acceleration,” said Anand Pashupathy, vice president and general manager, Security Software and Services, Intel. “Google Cloud’s C3 machine series will not only impress with their performance on AI and other workloads, but also protect the confidentiality of the user’s data.”
How to enable Confidential GKE Nodes with Intel TDX.
Confidential GKE Nodes on N2D machines with AMD SEV-SNP, generally available
Confidential GKE Nodes are also now generally available with AMD Secure Encrypted Virtualization-Secure Nested Paging (AMD SEV-SNP) technology. These nodes use the general purpose N2D machine series and run on the 3rd generation AMD EPYC™ (code-named Milan) processors. Confidential GKE Nodes with AMD SEV-SNP provide security for cloud workloads by helping assure that workloads run encrypted on secured hardware.
Confidential VMs on C4D machines with AMD SEV, in preview
The C4D machine series are powered by the 5th generation AMD EPYC™ (code-named Turin) processors and designed to deliver optimal, reliable, and consistent performance with Google’s Titanium hardware.
Today, we offer global availability of Confidential Compute on AMD machine families such as N2D, C2D, and C3D. We’re happy to share that Confidential VMs on general purpose C4D machine series with AMD Secure Encrypted Virtualization (AMD SEV) technology are in preview today, and will be generally available soon.
Unlocking new use cases with Confidential Computing
We’re seeing impact across all major verticals where organizations are using Confidential Computing to unlock business innovations.
AiGenomix
AiGenomix is leveraging Google Cloud Confidential Computing to deliver highly differentiated infectious disease surveillance, early detection of cancer, and therapeutics intelligence with a global ecosystem of collaborators in the public and private sector.
“Our customers are dealing with extremely sensitive data about pathogens. Adding relevant data sets like patient information and personalized therapeutics further adds to the complexity of compliance. Preserving privacy and security of pathogens, patients’ genomic and related health data assets is a requirement for our customers and partners,” said Dr. Jonathan Monk, head of bioinformatics, AiGenomix.
“Our Trusted AI for Healthcare solutions leveraging Google Cloud Confidential Computing overcome the barriers to accelerated global adoption by making sure that our assets and processes are secure and compliant. With this, we are able to contribute towards the mitigation of the ever-growing risk emerging from infectious diseases and drug resistance resulting in loss of lives and livelihood,” said Dr. Harsh Sharma, chief AI strategist, AiGenomix.
Google Ads
Google Ads has introduced confidential matching to securely connect customers’ first-party data for their marketing. This marks the first use of Confidential Computing in Google Ads products, and there are plans to bring this privacy-enhancing technology to more products over time.
“Confidential matching is now the default for any data connections made for Customer Match including Google Ads Data Manager — with no action required from you. For advertisers with very strict data policies, it also means the ability to encrypt the data yourself before it ever leaves your servers,” said Kamal Janardhan, senior director, Product Management, Measurement, Google Ads.
Google Ads plans to further integrate Confidential Computing across more services, such as the new Google tag gateway for advertisers. This update will give marketers conversion tag data encrypted in the browser, by default, and at no extra cost. The Google tag gateway for advertisers can help drive performance improvements and strengthen the resilience of advertisers’ measurement signals, while also boosting security and increasing transparency on how data is collected and processed.
Swift
Swift is using Confidential Computing to ensure that sensitive data from some of the largest banks remains completely private while powering a money laundering detection model.
“We are exploring how to leverage the latest technologies to build a global anomaly detection model that is trained on the historic fraud data of an entire community of institutions in a secure and scalable way. With a community of banks we are exploring an architecture which leverages Google Cloud Confidential Computing and verifiable attestation, so participants can ensure that their data is secure even during computation as they locally train the global model and rely on verifiable attestation to ensure the security posture of every environment in the architecture,” said Rachel Levi, head of artificial intelligence, Swift.
Expedite your Confidential Compute journey with Gemini Cloud Assist, in preview
To make it easy for you to use Confidential Computing, we’re providing AI-powered assistance directly in existing configuration workflows by integrating Gemini Cloud Assist across Confidential Compute, now in preview.
Through natural language chat, Google Cloud administrators can get tailored explanations, recommendations, and step-by-step guidance for many security and compliance tasks. One such example is Confidential Space, where Gemini Cloud Assist can guide you through the journey of setting up the environment as a Workload Author, Workloads Operator, or a Data Collaborator. This significantly reduces the complexity and the time to set up such an environment for organizations.
Gemini Cloud Assist for Confidential Space
Next steps
By continuously innovating and collaborating, we’re committed to making Confidential Computing the cornerstone of a secure and thriving cloud ecosystem.
Our latest video covers several creative ways organizations are using Confidential Computing to move their AI journeys forward. You can watch it here.
Welcome to the first Cloud CISO Perspectives for May 2025. Today, Iain Mulholland, senior director, Security Engineering, pulls back the curtain on how Google Cloud approaches security engineering and how we take secure by design from mindset to production.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
How Google Cloud’s security team helps engineers build securely
By Iain Mulholland, senior director, Security Engineering
Velocity is a chief concern in every executive office, but it falls to CISOs to balance the tension between keeping the business secure and ensuring the business keeps up. At Google, we’re constantly thinking about how to enable both resilience and innovation.
For decades, we’ve been taking a holistic approach to how security decision-making can work better. We believe that the success we’ve seen with our security teams is achievable at many organizations, and can help lead to better security and business outcomes.
My team is responsible for ensuring Google Cloud is the most secure cloud, and we approach security as an engineering function. It’s a different lens than traditional IT or compliance views, the two parts of the business where security priorities are often set, and it results in improved decision-making and security outcomes.
We’re still seeing too many organizations rely on defenses that were designed for the desktop era — despite successful efforts to convince business leaders to invest in more modern security tools, as Phil Venables and Andy Wen noted last year.
“To be truly resilient in today’s security landscape, organizations must consider an IT overhaul and rethink their strategy toward solutions with modern, secure-by-design architectures that nullify classes of vulnerabilities and attack vectors,” they said.
To turn this core security philosophy into reality, we’ve used it to guide how we build our teams. Cloud security engineers are embedded with product teams to help the entire organization “shift left” and take an engineering-centered approach to security. Our Office of the CISO security engineering team partners with product team software engineers at all stages of the software development lifecycle (SDLC) to find paths to ship secure software — all while maintaining product-release velocity and adhering to secure-by-design principles.
You can see this in action with our threat modelling practice. Security engineers and software development teams work closely to analyze potential threats to the product and to identify actions and product capabilities that can mitigate risks. Because this happens in the design phase, the team can eliminate these threats early in the SDLC, ensuring our products are secure by design.
With engineering as our security foundation, we can build capabilities at breadth, at depth, and in clear relationship to each other, so that our total power exceeds the sum of these parts.
Protecting against threats is a great example of the impact of this approach. We characterize the vast cloud threat landscape in three specific areas: outbound network attacks (such as DDoS, outbound intrusion attempts, and vulnerability scans); resource misuse (such as cryptocurrency mining, illegal video streaming, and bots); and content-based threats (such as phishing and malware).
Across that landscape, threat actors often use similar techniques and exploit similar vulnerabilities. To combat these tactics, the team generates intelligence to prevent, detect, and mitigate risks in Google Cloud offerings before they become problems for our customers.
We “shift left” on threats, too: Identifying this systemic risk feeds into the lifecycle of software and product development. Once we identify a threat vector, we work closely with our security and product engineers to harden product defenses to help eliminate threats before they can take root.
We use AI, advanced data science, and analytics solutions to protect Google Cloud and our customers from future threats by focusing on three key capabilities: predicting future user behavior, proactively identifying risky security patterns, and improving the efficiency and measurability of threats and security operations.
It’s vital to our mission that we find attack paths before attackers do, reducing unknown security risks by finding vulnerabilities in our products and services before they are made available to customers. In addition to simulating risk, we push our researchers to consider the whole cloud as an attack surface. They chain vulnerabilities in novel ways to improve our overall security architecture.
Responding to threats is a critical third element of our engineering environment’s interlocking capabilities. Our security response operations assess and implement remediation strategies that come from external parties, and we frequently participate in comprehensive, industry-wide responses. Regular collaboration with Google Cloud’s Vulnerability Rewards Program has been a major driver of our success in this area.
Across all of these areas, there is incredible complexity, but the philosophy that guides the work is simple: By baking security into engineering processes, you can secure systems better and earlier than bolting security on at the end. Investing in a deep engineering bench coupled with embedding security personnel, processes, and procedures as early as possible in the development lifecycle can strengthen decision-making confidence and business resilience across the organization.
You can learn more about how you can incorporate security best practices into your organization’s engineering environment from our Office of the CISO.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
How boards can boost resiliency with the updated U.K. cyber code: Here’s how Google Cloud can help your organization and board of directors adapt to the newly updated U.K. cyber code. Read more.
What’s new in IAM, Access Risk, and Cloud Governance: A core part of our mission is to help you meet your policy, compliance, and business objectives. Here’s what’s new for IAM, Access Risk, and Cloud Governance. Read more.
3 new ways to use AI as your security sidekick: Generative AI is already providing clear and impactful security results. Here are three decisive examples that organizations can adopt right now. Read more.
Expanding our Risk Protection Program with new insurance partners and AI coverage: We unveiled at Next ‘25 major updates to our Risk Protection Program, an industry-first collaboration between Google and cyber insurers. Here’s what’s new. Read more.
From insight to action: M-Trends, agentic AI, and how we’re boosting defenders at RSAC 2025: From the latest M-Trends report to updates across Google Unified Security, our product portfolio, and our AI capabilities, here’s what’s new from us at RSAC. Read more.
The dawn of agentic AI in security operations: Agentic AI promises a fundamental, tectonic shift for security teams, where intelligent agents work alongside human analysts. Here’s our vision for the agentic future. Read more.
What’s new in Android security and privacy in 2025: We’re announcing new features and enhancements that build on our industry-leading protections to help keep you safe from scams, fraud, and theft on Android. Read more.
Please visit the Google Cloud blog for more security stories published this month.
COLDRIVER using new malware to steal data from Western targets and NGOs: Google Threat Intelligence Group (GTIG) has attributed new malware to the Russian government-backed threat group COLDRIVER (also known as UNC4057, Star Blizzard, and Callisto) that has been used to steal data from western governments and militaries, as well as journalists, think tanks, and NGOs. Read more.
Cybercrime hardening guidance from the frontlines: The U.S. retail sector is currently being targeted in ransomware operations that GTIG suspects are linked to UNC3944, also known as Scattered Spider. UNC3944 is a financially-motivated threat actor characterized by its persistent use of social engineering and brazen communications with victims. Here are our latest proactive hardening recommendations to combat their threat activities. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
How cyber-savvy is your board: We’ve long extolled the importance of bringing boards of directors up to speed on cybersecurity challenges both foundational and cutting-edge, which is why we’ve launched “Cyber Savvy Boardroom,” a new monthly podcast from our Office of the CISO’s David Homovich, Alicja Cade, and Nick Godfrey. Our first three episodes feature security and business leaders known for their intuition, expertise, and guidance, including Karenann Terrell, Christian Karam, and Don Callahan. Listen here.
From AI agents to provenance in MLSecOps: What is MLSecOps, and what should CISOs know about it? Diana Kelley, CSO, Protect AI, goes deep on machine-learning model security with hosts Anton Chuvakin and Tim Peacock. Listen here.
What we learned at RSAC 2025: Anton and Tim discuss their RSA Conference experiences this year. How did the show floor hold up to the complicated reality of today’s information security landscape? Listen here.
Deconstructing this year’s M-Trends: Kirstie Failey, GTIG, and Scott Runnels, Mandiant Incident Response, chat with Anton and Tim about the challenges of turning standard incident reports into the bigger-picture review found in this year’s M-Trends. Listen here.
Defender’s Advantage: How UNC5221 targeted Ivanti Connect Secure VPNs: Mandiant’s Matt Lin and Ivanti’s Daniel Spicer join host Luke McNamara as they dive into the research and response of UNC5221’s campaigns against Ivanti. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
The telecommunications industry is undergoing a profound transformation, with AI and generative AI emerging as key catalysts. Communication service providers (CSPs) are increasingly recognizing that these technologies are not merely incremental improvements but fundamental drivers for achieving strategic business and operational objectives. This includes enabling digital transformation, fostering service innovation, optimizing monetization strategies, and enhancing customer retention.
To provide a comprehensive and data-driven analysis of this evolving landscape, Google Cloud partnered with Analysys Mason to conduct an in-depth study, “Gen AI in the network: CSP progress in adopting gen AI for network operations.” This research examines CSPs’ progress, priorities, challenges, and best practices in leveraging gen AI to reshape their networks, offering quantifiable insights into this critical transformation.
Key findings: A data-driven roadmap
The Analysys Mason study offers valuable insights into the current state of gen AI adoption in telecom, providing a data-driven roadmap for CSPs seeking to navigate this transformative journey:
1. Widespread gen AI adoption and future intentions
Demonstrating the strong momentum behind gen AI, 82% of CSPs surveyed are currently trialing or using it in at least one network operations area, and this adoption is set to expand further, with an additional 9% planning to implement it within the next 2 years.
2. Strategic importance of gen AI
Gen AI empowers CSPs to achieve strategic goals within the network: 57% of those surveyed see it as a key enabler of autonomous, cloud-based network transformation initiatives, and 52% see it as a key enabler of the transition to new business models such as NetCo/ServCo and more digitally driven organizations, all with the aim of enhancing customer experience and driving broader transformation.
3. Key drivers for gen AI investment
CSPs are strategically prioritizing gen AI investments to achieve a range of network objectives, including optimizing network performance and reliability, enhancing application quality of experience (QoE), and improving network resource utilization, recognizing gen AI’s potential to move beyond a productivity tool and become a cornerstone of future network operations and automation.
4. Challenges in achieving model accuracy
While gen AI offers significant potential, the study found that 80% of CSPs face challenges in achieving the expected accuracy from gen AI models, a hurdle that impacts use case scaling and ROI. These accuracy issues are linked to data-related problems, which many CSPs across different maturity levels are still working to resolve, and the complexity of customizing models for specific network operations.
5. Addressing the skills gap
With over 50% of CSPs citing it as a key concern, employee skillsets represent a major challenge, highlighting the urgent imperative for CSPs to invest in upskilling and reskilling initiatives to cultivate in-house expertise in AI, gen AI, and data science related fields.
6. Gen AI implementation strategies
Many CSPs begin their gen AI implementation by utilizing vendor-provided applications with embedded gen AI capabilities, the most common approach. However, the study emphasizes that to fully address their diverse network needs, CSPs also seek to customize models using techniques like fine-tuning and prompt engineering. This customization is heavily reliant on a strong data strategy to overcome challenges such as data silos and data quality issues, which significantly impact the accuracy and effectiveness of the resulting gen AI solutions.
7. Deployment preferences
While 51% of CSPs indicated hybrid cloud environments as the predominant deployment choice for gen AI platforms in network operations, reflecting the need for flexibility and control, a significant 39% of CSPs show a strong preference for private cloud-only deployments specifically for their data platforms, driven by the critical importance of data security and control. Public cloud deployments are preferred for AI model deployments.
Recommendations for CSPs
In summary, to secure a competitive edge, CSPs will need to: prioritize gen AI use cases with clear ROI by adopting early-win use cases while developing a long-term strategy; transform their organizational structure and invest in upskilling initiatives; develop and implement a robust data strategy to support all AI initiatives; and cultivate strong partnerships with expert vendors to accelerate their gen AI journey.
Google Cloud: Your partner for network transformation
Google Cloud empowers CSPs’ data-driven transformation by providing expertise in operating planetary-scale networks, a unified data platform, AI model optimization, professional services for gen AI, hybrid cloud solutions, and a rich partner ecosystem. This is further strengthened by Google Cloud’s proven success in driving network transformation for major telcos, leveraging infrastructure, platforms, and tools that deliver the required near real-time processing and scale.
The explosion of digital content from social media, smartphones, and other sources has created a massive amount of unstructured data like images, videos, and documents. To help you analyze this data, BigQuery is connected with Vertex AI, Google Cloud’s powerful AI platform, so you can use advanced AI models, like Gemini 2.5 Pro/Flash, to understand the meaning hidden within your unstructured data.
Google’s advanced AI models can analyze a wide range of data formats, from text and images to audio and video. They can extract key information like names, dates, and keywords, transforming raw data into structured insights that integrate with your existing tools. Plus, with new techniques like constrained decoding, these models can even generate structured data in JSON format, helping to ensure compatibility with your workflows.
To further streamline this process, we recently added a new BigQuery feature called AI.GENERATE_TABLE(), which builds upon the capabilities of ML.GENERATE_TEXT(). This function allows you to automatically convert the insights from your unstructured data into a structured table within BigQuery, based on the provided prompt and table schema. This streamlined process allows you to easily analyze the extracted information using your existing data analysis tools.
Extracting structured data from images
Let’s dive deeper into how this new feature works with an example that uses three images. First, you have a picture of the Seattle skyline featuring the iconic Space Needle. Next, you have a city view of New York City. Finally, you have an image of cookies and flowers, which is unrelated to cityscapes.
To use these images with BigQuery’s generative AI functions, you first need to make them accessible to BigQuery. You can do this by creating an external table named “image_dataset” that connects to the Google Cloud Storage bucket where the images are stored.
CREATE OR REPLACE EXTERNAL TABLE
  bqml_tutorial.image_dataset
WITH CONNECTION DEFAULT
OPTIONS(
  object_metadata = "DIRECTORY",
  uris = ["gs://bqml-tutorial-bucket/images/*"]);
Now that you’ve prepared your image data, let’s connect to the powerful Gemini 2.5 Flash model. You do this by creating a “remote model” within BigQuery, which acts as a bridge to this advanced AI.
CREATE OR REPLACE MODEL
  bqml_tutorial.gemini25flash001
REMOTE WITH CONNECTION DEFAULT
OPTIONS (endpoint = "gemini-2.5-flash-001");
Now, let’s use the AI.GENERATE_TABLE() function to analyze the images. You’ll need to provide the function with two things: the remote model you created (connected to Gemini 2.5 Flash) and the table containing your images.
You’ll ask the model to “Recognize the city from the picture and output its name, belonging state, brief history, and tourist attractions. Please output nothing if the image is not a city.” To ensure the results are organized and easy to use, we’ll specify a structured output format with the following fields:
city_name (string)
state (string)
brief_history (string)
attractions (array of strings)
This format, known as a schema, ensures the output is consistent and compatible with other BigQuery tools. You’ll notice that the syntax for defining this schema is the same as the CREATE TABLE command in BigQuery.
SELECT
  city_name,
  state,
  brief_history,
  attractions,
  uri
FROM
  AI.GENERATE_TABLE(
    MODEL bqml_tutorial.gemini25flash001,
    (
      SELECT
        ("Recognize the city from the picture and output its name, belonging state, brief history, and tourist attractions. Please output nothing if the image is not a city.", ref) AS prompt,
        uri
      FROM
        bqml_tutorial.image_dataset),
    STRUCT(
      "city_name STRING, state STRING, brief_history STRING, attractions ARRAY<STRING>" AS output_schema,
      8192 AS max_output_tokens));
When you run the AI.GENERATE_TABLE() function, it produces a table with five columns. Four of these columns match the schema you defined (city_name, state, brief_history, and attractions), while the fifth column contains the image URI from the input table.
As you can see, the model successfully identified the cities in the first two images, providing their names and the states in which they are found. It even generated a brief history and a list of attractions for each city based on its internal knowledge. This demonstrates the power of large language models to extract information and insights directly from images.
Extracting structured data from medical transcriptions
Now let’s see another example where you can use AI.GENERATE_TABLE to extract information from unstructured data stored in a BigQuery managed table. We are going to use the Kaggle Medical Transcriptions dataset, which contains sample medical transcriptions from various specialties.
Transcriptions are long and verbose, containing all kinds of information, e.g., a patient’s age, weight, blood pressure, and conditions. It is challenging and time-consuming to process them manually and organize the results well. But now, we can let the LLM and AI.GENERATE_TABLE help us, as sketched below.
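Here is a minimal sketch of what that query can look like. The table name (bqml_tutorial.medical_transcriptions), its transcription column, and the output schema below are illustrative assumptions; adapt them to wherever you have loaded the Kaggle dataset.

## Illustrative only: extract structured fields from free-text transcriptions.
SELECT
  age,
  medications,
  conditions
FROM
  AI.GENERATE_TABLE(
    MODEL bqml_tutorial.gemini25flash001,
    (
      SELECT
        CONCAT(
          "Extract the patient's age, current medications, and medical conditions ",
          "from the following transcription: ", transcription) AS prompt
      FROM
        bqml_tutorial.medical_transcriptions),  ## hypothetical table name
    STRUCT(
      "age INT64, medications ARRAY<STRING>, conditions ARRAY<STRING>" AS output_schema));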
You can see that the model successfully extracts the information from the medical transcriptions and, with the help of AI.GENERATE_TABLE, organizes the results according to the specified schema.
The AI.GENERATE_TABLE() function can help you transform your data and create a BigQuery table for easy analysis and integration with your existing workflows. To learn more about the full syntax, refer to the documentation. Have feedback on these new features or have additional feature requests? Let us know at bqml-feedback@google.com.
If you’re building a generative AI application or an AI agent, there’s a high likelihood you’ll need to perform simultaneous searches on structured and unstructured data. For example, the prompt “Show me all pictures of sunsets I took in the past month” includes a structured part (the date is within the past month) and an unstructured part (the picture contains a sunset). In recent years, modern relational databases such as AlloyDB for PostgreSQL have added vector search capabilities to cover the unstructured part.
At Google Cloud Next 2025, we announced a series of key innovations in AlloyDB AI’s ScaNN index to improve performance and quality of search over structured and unstructured data. By deeply integrating with the AlloyDB query planner, the ScaNN index is able to optimize the ordering of SQL filters in vector search based on your workload characteristics. Let’s dive into what filter selectivity is and how AlloyDB ScaNN’s index leverages it to improve the performance and quality of your search.
Filtered vector search
To illustrate the power of filtered vector search in AlloyDB, imagine you’re an online retailer managing a product catalog within AlloyDB. With more than 100,000 items, this product catalog includes references to images, textual descriptions, inventory information, and catalog metadata in your products table.
To search through this data, you can leverage vector search with SQL filters to enable search across unstructured and structured data, providing users with higher quality search results. In the metadata, there may be fields such as color, gender, size and price stored in your table that you can leverage as search filters.
Say a user searches for a “maroon puffer jacket”. You might use “maroon” as a filter and “puffer jacket” as the part of the query upon which you perform a vector search. So you might have a SQL statement like:
SELECT *
FROM products
WHERE color = 'maroon'
ORDER BY text_embedding <-> google_ml.embedding('text-embedding-005', 'puffer jacket')
LIMIT 100;
In the products table, we have set a vector index on our text_embedding column and a B-tree index on our metadata column, color.
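For reference, a setup along these lines would put both indexes in place. The index names and the num_leaves tuning value are illustrative assumptions; ScaNN indexes require the alloydb_scann extension, and you should tune parameters for your own data size.

-- Illustrative index setup for the products table described above.
CREATE EXTENSION IF NOT EXISTS alloydb_scann;

-- ANN (ScaNN) index for vector search over product text embeddings.
CREATE INDEX products_text_embedding_scann ON products
  USING scann (text_embedding cosine)
  WITH (num_leaves = 1000);

-- B-tree index for filtering on the color metadata column.
CREATE INDEX products_color_idx ON products (color);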
Depending on how commonly maroon appears in the color column of the dataset, which is called selectivity in database terminology, the AlloyDB query planner may choose to apply the filter before, after, or in-line with the vector search query. Let’s dive into why the planner may choose one option over the other.
High selectivity
When a filter is highly selective, it means that only a small percentage of your data meets the specified criteria. In our example, “maroon” is a rare color, with only 0.2% of the 100,000 products in the catalog being that color.
In cases where we have highly selective filters, the AlloyDB query planner often chooses to apply a pre-filter, i.e., applying the filtering conditions prior to the vector search. In our example, we would apply our filter condition WHERE color='maroon' before the vector search. Since “maroon” is rare, the B-tree index on the color column efficiently identifies a small subset of products (e.g., 200 out of 100,000). Subsequently, the computationally intensive vector search is performed only on this significantly reduced set of candidates. This strategy uses a K-Nearest Neighbors (KNN) vector search, which delivers results with 100% recall, i.e., the exact closest neighbors, within the set of results after the filter is applied.
Low selectivity
Conversely, if the filter isn’t highly selective (e.g., if 90% of products are “blue”), pre-filtering is inefficient because it doesn’t significantly narrow down the search space. In such cases, when a large proportion of your data satisfies the filtering conditions, a filter is considered to have low selectivity.
Say you’re searching for “blue puffer jackets”; if 90% of our catalog is blue, applying the filter first isn’t beneficial because it doesn’t narrow down our list of candidates all that much. If you applied a pre-filter, you would end up performing a KNN vector search against the majority of the dataset, which would be computationally expensive. Therefore, the AlloyDB query planner would choose to apply a post-filter.
Post-filtering means performing the vector search first, leveraging an Approximate Nearest Neighbors (ANN) vector index such as ScaNN on the text_embedding column to quickly identify a set of candidate results. Only after retrieving these initial candidates — the top 100 based on vector similarity — is the filter condition, WHERE color=’blue’, applied.
If your filter had high selectivity, there’s a risk this approach would yield very few candidates meeting your filter criteria. However, because the condition WHERE color='blue' has low selectivity, you would likely obtain the approximate top 100 results. In the unlikely case you do not retrieve 100 results, the vector search performs additional scans on the vector index to retrieve more candidates until the desired limit is reached. While effective for filters with low selectivity, post-filtering can become less efficient with highly selective filters, as the vector index might need to scan through many non-matching candidates.
Medium selectivity
When a filter has medium selectivity, the AlloyDB query planner may choose to apply either a pre-filter or a post-filter. However, for filters in the roughly 0.5% to 10% selectivity range (such as, say, the color “purple”), AlloyDB supports a method called inline filtering, or in-filtering, which applies the filter conditions in tandem with the vector search. With inline filtering, AlloyDB leverages a bitmap from a B-tree index to select candidates matching the filter condition while the vector search proceeds, all in one pass.
So in this example, while the bitmap identifies which candidates are purple, AlloyDB simultaneously searches for the approximate neighbors of the query vector across items in the catalog. This approach balances the benefits of reducing the search space, as pre-filtering does, without the risk of returning too few results, a potential issue when post-filtering is combined with a highly selective filter.
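Note that across all three selectivity regimes the SQL itself is unchanged; only the execution plan differs. As a quick sanity check, standard PostgreSQL tooling shows which strategy the planner chose for a given query (the exact plan output varies by version and configuration):

-- Inspect the chosen plan: a B-tree scan before the vector search indicates
-- pre-filtering; a ScaNN index scan followed by a filter indicates post-filtering.
EXPLAIN ANALYZE
SELECT *
FROM products
WHERE color = 'purple'
ORDER BY text_embedding <-> google_ml.embedding('text-embedding-005', 'puffer jacket')
LIMIT 100;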
Adaptive filtration
While the cases detailed above seem to clearly partition the search space across three different kinds of filtering, in practice it’s not so simple. At times the query planner may misjudge the selectivity of a filter due to outdated statistics, resulting in the vector search and filtering conditions being applied in a suboptimal order and yielding lower-quality results. This is where AlloyDB ScaNN’s latest innovation, adaptive filtration, comes in. With adaptive filtration, AlloyDB learns the selectivity of your filters at query time based on actual observed statistics and can adaptively change its execution plan. This results in more optimal ordering of filters and vector search, and greatly mitigates cases of planner misestimation.
In summary, real-world workloads are complex, and distinct filtering conditions have different selectivities that may change over time as your data and workloads grow. That’s where an intelligent database engine powering your vector search can make a difference — by optimizing and adapting filtering for your workload, helping to ensure consistently high-quality and performant search results as your data evolves.
Get started today
Get started with vector search leveraging AlloyDB’s ScaNN index today. Then, learn how you can use AlloyDB AI’s latest features to power multimodal vector search. Adaptive filtration is available in preview; get started by turning on the feature flag.
Like most organizations, Google Cloud is continually engaging with customers, partners, and policymakers to deliver technology capabilities that reflect their needs. When it comes to digital sovereignty solutions, Google Cloud has worked with customers for nearly a decade.
Today, we’re pleased to announce significant technical and commercial updates on our sovereign cloud solutions for customers, and details on how we’re helping them achieve greater control, choice, and security in the cloud — without compromising functionality.
Building on the first sovereign solutions we introduced years ago, we’ve massively scaled our infrastructure footprint globally, now consisting of more than 42 cloud regions, 127 zones, 202 network edge locations, and 33 subsea cable investments.
We have also forged key partnerships in Asia, Europe, the Middle East, and the United States to help deliver these sovereign solutions, including Schwarz Group and T-Systems (Germany), S3NS (France), Minsait (Spain), Telecom Italia (Italy), Clarence (Belgium and Luxembourg), CNTXT (Saudi Arabia), KDDI (Japan), and World Wide Technology (United States).
A commitment to customer choice
Digital sovereignty is about more than just controlling encryption keys. At its core, it’s about giving customers the flexibility their global businesses require. It’s about enabling them to operate on multiple clouds. And it’s about securing data with the most advanced technologies.
We’ve long been committed to enabling customers to choose the cloud provider and solution that best fit their needs, and not locking them into a single option. Sovereignty in the cloud is not one-size-fits-all. We offer customers a portfolio of solutions that align with their business needs, regulatory requirements, and risk profiles.
Our strong contractual commitments to our customers are backed by robust sovereign controls and solutions that are all available today. Our updated sovereign cloud solution portfolio includes:
Google Cloud Data Boundary gives customers the ability to deploy a sovereign data boundary and control where their content is stored and processed. This boundary also allows customers to store and manage their encryption keys outside Google’s infrastructure, which can help customers meet their specific data access and control requirements no matter what market.
Google Cloud Data Boundary customers have access to a large set of Google Cloud products, including AI services, and can enable capabilities, including Confidential Computing and External Key Management with Key Access Justifications to control access to their data and deny access for any reason.
In addition, Google Workspace customers can take advantage of Google Cloud Data Boundary’s sovereign controls to limit the processing of their content to the United States or EU, choose a country to locally store data, and use client-side encryption to prevent unauthorized access (even by Google) to their most critical content.
Today, we are also announcing User Data Shield, a solution that adds Mandiant services to validate the security of customer applications built on top of Google Cloud Data Boundary. User Data Shield provides recurring security testing of customer applications to validate sovereignty postures.
Google Cloud Dedicated delivers a solution designed to meet local sovereignty requirements, enabled by independent local and regional partners. As an example, Google Cloud has partnered with Thales since 2021 to build a first-of-its-kind Trusted Cloud by S3NS for Europe.
This offering with Thales is designed to provide a rich set of Google Cloud services with GPUs to support AI workloads, and is operated by S3NS, a standalone French entity. Currently in preview, S3NS’ solution is designed to meet the rigorous security and operational resilience requirements of France’s SecNumCloud standards. We are expanding our Google Cloud Dedicated footprint globally, launching next in Germany.
“For France to truly embrace digital sovereignty, it is essential to have a cloud solution that marries the immense power of hyperscale technology with the strictest local security and operational controls. S3NS is committed to providing French organizations with access to advanced cloud services, including critical AI capabilities, all operated within France by a European operator to meet and exceed the rigorous SecNumCloud standards,” said Christophe Salomon, EVP, Information Systems and Secured Communication, at Thales.
Google Cloud Air-Gapped offers a fully standalone and air-gapped solution that does not require connectivity to an external network. This solution is tailored for customers in the intelligence, defense, and other sectors with strict data security and residency requirements. The air-gapped solution can be deployed and operated by Google, the customer, or a Google partner.
It is built with open-source components and comes with a targeted set of AI, database, and infrastructure services. Because air-gapped solutions run on open-source components, they are designed to provide business continuity and survivability in the event of service disruptions. Google Cloud Air-Gapped received authorization in 2024 to host U.S. government Top Secret and Secret-level data.
“Working with Google Cloud to introduce sovereign offerings can give our joint clients greater control, choice, and security in the cloud, without compromising the functionality of their underlying cloud architectures,” said Scott Alfieri, Senior Managing Director and Google Business Group Lead at Accenture. “Google Cloud’s extensive global infrastructure, coupled with Accenture’s transformation and industry expertise, helps organizations build an agile and scalable foundation, unlocking opportunities for growth and continuous innovation.”
Local control, global security
Security and sovereignty are two sides of the same coin. Local control of data and operations can provide customers a greater level of confidence in their security, but it’s also true that no organization can be considered sovereign if dependencies on legacy infrastructure leave its data vulnerable to loss or theft.
Analysis from the Google Threat Intelligence Group and Google Cloud’s Office of the CISO suggests that the global cyber threat landscape will only become more complex as malicious actors tap into AI-powered tools and techniques to prey on older software products, platforms, and outdated infrastructures.
With Google Cloud, customers not only get sovereign solutions, but also gain access to our leading security capabilities. This includes our rigorous focus on secure by design technology and deep expertise from Google Threat Intelligence Group and Mandiant Consulting, who operate on the frontlines of cyber conflicts worldwide and maintain trusted partnerships with more than 80 governments around the world.
In addition, Google Cloud CyberShield provides AI and intelligence-driven cyber defense to help governments defend against threats at national scale. And Mandiant Managed Defense services make it easy for customers worldwide to extend their security teams with our security team.
Google Sovereign Cloud solutions ultimately enable customers to leverage the secure foundation of Google Cloud, while gaining access to advanced security features — such as Confidential Computing, Zero Trust, post-quantum cryptography, and AI-powered platform defenses — faster and more cost-effectively than they could achieve on their own.
Sovereign solutions for any organization
We remain dedicated to fostering an environment of trust and control for our customers, empowering organizations globally to navigate the complex landscape of digital sovereignty with confidence. We continue to work with customers, partners, and policymakers around the world to refine our sovereign cloud offerings and deliver technologies that address their needs.
To learn more about how we are enabling our customers’ digital sovereignty capabilities, visit our web page or contact your account manager.
Today, we’re excited to announce Google AI Edge Portal in private preview, Google Cloud’s new solution for testing and benchmarking on-device machine learning (ML) at scale.
Machine learning on mobile devices enables amazing app experiences. But how will your model truly perform across the vast, diverse, and ever-changing landscape of mobile devices? Manually testing at scale – across hundreds of device types – is a laborious task that often requires a dedicated device lab. It’s slow, prohibitively expensive, and often out of reach for most developers, leaving you guessing about performance on users’ devices and risking a subpar user experience.
Google AI Edge Portal solves the above challenges, enabling you to benchmark LiteRT models so you can find the best configuration for large-scale deployment of ML models across devices. Now, you can:
Simplify & accelerate testing cycles across the diverse hardware landscape: Effortlessly assess model performance across hundreds of representative mobile devices in minutes.
Proactively assure model quality & identify issues early: Pinpoint hardware-specific performance variations or regressions (like on particular chipsets or memory-constrained devices) before deployment.
Lower device testing cost & access the latest hardware: Test on a diverse and continually growing fleet of physical devices (currently 100+ device models from various Android OEMs) without the expense and complexity of maintaining your own lab.
Unlock powerful, data-driven decisions & business intelligence: Google AI Edge Portal delivers rich performance data and comparisons, providing the crucial business intelligence needed to confidently guide model optimization and validate deployment readiness.
Fig. 1. Interactive dashboard to gain insights on model performance across devices
In this post, we’ll share how our partners are already using Google AI Edge Portal, the user journey, and how you can get started.
What our partners are saying
We’ve been fortunate to work with several innovative teams during the early development of Google AI Edge Portal. Here’s what a few of them had to say about its potential:
How Google AI Edge Portal helps you benchmark your LiteRT models
Upload & configure: Upload your model file via the UI or point to it in your Google Cloud Storage bucket.
Select accelerators: Specify testing against CPU or GPU (with automatic CPU fallback). NPU support is planned for future releases.
Select devices: Choose target devices from our diverse pool using filters (device tier, brand, chipset, RAM) or select curated lists with convenient shortcuts.
Fig. 2. Create a New Benchmark Job on 100+ Devices. (Note: GIF is accelerated and edited for brevity)
From there, submit your job and await completion. Once ready, explore the results in the Interactive Dashboard:
Compare configurations: Easily visualize how performance metrics (e.g., average latency, peak memory) differ when using different accelerators across all tested devices.
Analyze device impact: See how a specific model configuration performs across the range of selected devices. Use histograms and scatter plots to quickly identify performance variations tied to device characteristics.
Detailed metrics: Access a detailed, sortable table showing specific metrics (initialization time, inference latency, memory usage) for each individual device, alongside its hardware specifications.
Fig. 3. View Benchmark Results on the interactive Dashboard. (Note: GIF is accelerated and edited for brevity)
Help us shape the future of Google AI Edge Portal
Your feedback is crucial as we expand availability and enhance capabilities based on developer needs. In the future, we are keen to explore integrating features such as:
Bulk inference & evaluation: Run your models with custom datasets on diverse devices to validate functional correctness and enable qualitative GenAI evaluations.
LLM benchmarking: Introduce dedicated workflows and metrics specifically tailored for benchmarking the unique characteristics of large language models on edge devices.
Model optimization tools: Explore integrated tooling to potentially assist with tasks like model conversion and quantization within the portal.
Expanded platform & hardware support: Work towards supporting additional accelerators like NPUs, and other platforms beyond Android in the future.
Join the Google AI Edge Portal private preview
Google AI Edge Portal is available starting today in private preview for allowlisted Google Cloud customers. During this private preview period, access is provided at no charge, subject to the preview terms.
This preview is ideal for developers and teams building mobile ML applications with LiteRT who need reliable benchmarking data across diverse Android hardware and are willing to provide feedback to help shape the product’s future. To request access, complete our sign-up form here to express interest. Access is granted via allowlisting.
We are committed to making Google AI Edge Portal a valuable tool for the entire on-device ML community and we look forward to your feedback and collaboration!
Cloud Run has become a go-to app hosting solution for its remarkable simplicity, flexibility, and scalability. But the age of AI-assisted development is here, and going from idea to application is faster and more streamlined than ever. Today, we’re excited to make AI deployments easier and more accessible by introducing new ways to deploy your apps to Cloud Run:
Deploy applications in Google AI Studio to Cloud Run with a single button click
Scale your Gemma projects with direct deployment of Gemma 3 models from Google AI Studio to Cloud Run
Empower MCP-compatible AI agents to deploy apps with the new Cloud Run MCP server
1. Streamlining app development and deployment with AI Studio and Cloud Run
Google AI Studio is the fastest way to start building with Gemini. Once you develop an app in AI Studio, you can deploy it to Cloud Run with a single button click, allowing you to go from code to shareable URL in seconds (video at 2x speed):
Build apps in AI Studio and deploy to Cloud Run
Once deployed, the app is available at a stable HTTPS endpoint that automatically scales, including down to zero when not in use. You can re-deploy with updates from AI Studio, or continue your development journey in the Cloud Run source editor. Plus, your Gemini API key remains securely managed server-side on Cloud Run and is not accessible from the client device.
It’s also a very economical solution for hosting apps developed with AI Studio: Cloud Run has request-based billing with 100ms granularity and a free tier of 2 million requests per month, in addition to any free Google Cloud credits.
2. Bring your Gemma app to production in a click with Cloud Run
Gemma is a leading open model for single-GPU performance. To help you scale your Gemma projects, AI Studio now enables direct deployment of Gemma 3 models to Cloud Run:
Selecting Gemma from AI Studio and deploying it to Cloud Run with GPU via a single click in under a minute, with no quota request requirements (video at 4x speed)
This provides an endpoint running on Cloud Run’s simple, pay-per-second, scale-to-zero infrastructure, with GPU instances starting in less than five seconds and scaling to zero when not in use. It’s even compatible with the Google Gen AI SDK out of the box: simply update two parameters in your code to use the newly deployed endpoint:
```python
from google import genai
from google.genai.types import HttpOptions

# Configure the client to use your Cloud Run endpoint and API key
client = genai.Client(api_key="KEY_RECEIVED_WHEN_DEPLOYING",
                      http_options=HttpOptions(base_url="CLOUD_RUN_ENDPOINT_URL"))

# Example: Stream generate content
response = client.models.generate_content_stream(
    model="gemma-3-4b-it",
    contents=["Write a story about a magic backpack. You are the narrator of an interactive text adventure game."]
)
for chunk in response:
    print(chunk.text, end="")
```
3. Empower AI agents to deploy apps with the new Cloud Run MCP server
The Model Context Protocol (MCP) is an open protocol standardizing how AI agents interact with their environment. At Google I/O, we shared that supporting open standards for how agents will interact with tools is a top priority for us.
Today, we are introducing the Cloud Run MCP server to enable MCP-compatible AI agents to deploy apps to Cloud Run. Let’s see it in action with a variety of MCP clients: AI assistant apps, AI-powered Integrated Development Environments (IDEs), and agent SDKs.
1. AI assistant apps
Using the Claude desktop application to generate a Node.js app and deploy it to Cloud Run (video at 4x speed)
2. AI-powered IDEs
Updating a FastAPI Python app from VS Code with Copilot in agent mode using Gemini 2.5 Pro, and deploying it using the Cloud Run MCP server (video at 4x speed)
3. Agent SDKs
Agent SDKs, like the Google Gen AI SDK or the Agent Development Kit, also support calling tools via MCP, and can therefore deploy to Cloud Run using the Cloud Run MCP server.
Add the Cloud Run MCP server to your favorite MCP client:
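MCP clients are typically configured with a small JSON snippet that tells them how to launch the server. The following is a minimal sketch; the `npx` package reference shown is an assumption, so check the Cloud Run MCP server documentation for the exact command:

```json
{
  "mcpServers": {
    "cloud-run": {
      "command": "npx",
      "args": ["-y", "https://github.com/GoogleCloudPlatform/cloud-run-mcp"]
    }
  }
}
```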
Today, we are introducing the next wave of generative AI media models on Vertex AI: Imagen 4, Veo 3, and Lyria 2.
We’ve already seen customers generate stunning, photorealistic images with Imagen 3, Google’s image generation model. Customers have taken these images and transformed them into high quality videos and assets with Veo 2. We’ve even seen customers take these remarkable videos and bring them to life with professional-grade audio using Lyria, Google’s advanced AI music generation model.
With a surge of momentum in the generative AI media space across marketing, media, and more, storytelling has never been easier. Users are creating campaign assets quicker, and building breakthrough creative content. Let’s take a look into each model and the ways you can get started today.
Imagen 4: Higher quality image generation
Today we’re introducing Imagen 4 text-to-image generation on Vertex AI in public preview. As Google’s highest quality image generation model, Imagen 4 delivers:
Outstanding text rendering and prompt adherence
Higher overall image quality across all styles
Multilingual prompt support to help creators globally
Prompt: Capture an intimate close-up bathed in warm, soft, late-afternoon sunlight filtering into a quintessential 1960s kitchen. The focal point is a charmingly designed vintage package of all-purpose flour, resting invitingly on a speckled Formica countertop. The packaging itself evokes pure nostalgia: perhaps thick, slightly textured paper in a warm cream tone, adorned with simple, bold typography (a friendly serif or script) in classic red and blue “ALL-PURPOSE FLOUR”, featuring a delightful illustration like a stylized sheaf of wheat or a cheerful baker character. In smaller bold print at the bottom of the package: “NET WT 5 LBS (80 OZ) 2.27kg”. Focus sharply on the package details – the slightly soft edges of the paper bag, the texture of the vintage printing, the inviting “All-Purpose Flour” text. Subtle hints of the 1960s kitchen frame the shot – the chrome edge of the counter gleaming softly, a blurred glimpse of a pastel yellow ceramic tile backsplash, or the corner of a vintage metal canister set just out of focus. The shallow depth of field keeps attention locked on the beautifully designed package, creating an aesthetic rich in warmth, authenticity, and nostalgic appeal.
Prompt: This four-panel comic strip uses a charming, deliberately pixelated art style reminiscent of classic 8-bit video games, featuring simple shapes and a limited, bright color palette dominated by greens, blues, browns, and the dinosaur’s iconic grey/black. The setting is a stylized pixel beach. Panel one shows the familiar Google Chrome T-Rex dinosaur, complete with its characteristic pixelated form, wearing tiny pixel sunglasses and lounging on a pixelated beach towel under a blocky yellow sun. Pixelated palm trees sway gently in the background against a blue pixel sky. A caption box with pixelated font reads, “Even error messages need a vacation.” Panel two is a close-up of the T-Rex attempting to build a pixel sandcastle. It awkwardly pats a mound of brown pixels with its tiny pixel arms, looking focused. Small pixelated shells dot the sand around it. Panel three depicts the T-Rex joyfully hopping over a series of pixelated cacti planted near the beach, mimicking its game obstacle avoidance. Small “Boing! Boing!” sound effect text appears in a blocky font above each jump. A pixelated crab watches from the side, waving its pixel claw. The final panel shows the T-Rex floating peacefully on its back in the blocky blue pixel water, sunglasses still on, with a contented expression. A small thought bubble above it contains pixelated “Zzz…” indicating relaxation.
Prompt: Filmed cinematically from the driver’s seat, offering a clear profile view of the young passenger on the front seat with striking red hair. Her gaze is fixed ahead, concentrated on navigating the dusty, lonely highway visible through her side window, which shows a blurred expanse of dry earth and perhaps distant, hazy mountains. Her arm rests on the window ledge or steering wheel. The shot includes part of the aged truck interior beside her – the door panel, maybe a glimpse of the worn seat fabric. The lighting could be late afternoon sun, casting long shadows and warm highlights across her face and the truck’s interior. This angle emphasizes her individual presence and contemplative state within the vast, empty landscape.
To get started with Imagen 4 in public preview on Vertex AI, you can use Media Studio or run the following code sample, which uses the Google Gen AI SDK for Python.
```python
from google import genai

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
client = genai.Client(vertexai=True, project=project_id, location="us-central1")

prompt = """
A white wall with two Art Deco travel posters mounted. First poster has the text: "NEPTUNE", tagline: "The jewel of the solar system!" Second poster has the text: "JUPITER", tagline: "Travel with the giants!"
"""

image = client.models.generate_images(
    model="imagen-4.0-generate-preview-05-20",
    prompt=prompt,
)

# OPTIONAL: View the generated image in a notebook
# image.generated_images[0].image.show()
```
Veo 3: Higher-quality video generation with audio and speech
Veo 3 is our latest state-of-the-art video generation model from Google DeepMind. With Veo 3, you can generate videos with:
Improved quality when generating videos from text and image prompts
Speech, such as dialogue and voice-overs
Audio, such as music and sound effects
Here’s what a few of our customers have to say about productivity and creative gains with Veo:
Klarna, a leader in digital payments, is leveraging Veo and Imagen on Vertex AI to boost content creation efficiency. From b-roll to YouTube bumpers, the company is significantly reducing production timelines.
“At Klarna, we’re constantly exploring ways to push the boundaries of innovation in our marketing efforts, and Veo has been a game-changer in our creative workflows. With Veo and Imagen, we’ve transformed what used to be time-intensive production processes into quick, efficient tasks that allow us to scale content creation rapidly. Whether it’s producing engaging b-roll, crafting eye-catching YouTube bumpers, or developing dynamic social media animations, these tools have empowered our teams to be more agile and creative. The results speak for themselves, driving increased engagement and content performance. With Google Cloud, we’re laying the groundwork for the future of commerce and revolutionizing how we bring our brand to life.” – David Sandström, Chief Marketing Officer, Klarna
Jellyfish, a renowned digital marketing company within The Brandtech Group, has integrated Veo into their top performing AI marketing platform, Pencil, and teamed up with Japan Airlines to offer AI generated in-flight entertainment.
“The addition of Veo 2 in Pencil reinforces our commitment to empowering marketers with sophisticated AI, enabling them to produce campaigns that are not only smarter and faster but also bolder and more artistically inspired. Our pilots have shown incredible results, with an average 50% reduction in costs and time-to-market efficiencies. This step change in control and quality turns previously impossible ideas into real marketing content in minutes. Japan Airlines is leading the way in applying Gen AI to the travel industry, and we’re excited to see how other brands follow suit.” – David Jones, Founder & CEO, Brandtech
Kraft Heinz’s Tastemaker platform empowers their teams with access to Imagen and Veo, dramatically accelerating creative and campaign development processes.
“With Veo and Imagen on Vertex AI as part of our Tastemaker platform, Kraft Heinz has unlocked unprecedented speed and efficiency in our creative workflows. What once took us eight weeks is now only taking eight hours, resulting in substantial cost savings.” – Justin Thomas, Head of Digital Experience & Growth
Envato, a global leader for digital creative assets and templates, used Veo 2 to develop their newly launched video generation feature, VideoGen, to enable creative professionals to turn text or images into hyper realistic and cinematic video content.
“We’ve tried many of the top video models, and Veo 2 has driven the most impressive results in terms of speed and quality across a diverse set of text and image inputs. Within the first few days of launch, tens of thousands of Envato subscribers were already accessing VideoGen, with nearly 60% of their generated videos being downloaded for use in creative projects. Since March, Envato has seen VideoGen usage grow more than 100% month over month. It’s been a pleasure working with Google Cloud to bring Envato’s VideoGen feature to life with Veo.” said Aaron Rutley, Head of Product for AI at Envato.
See how it works: Veo 3 is capable of handling intricate prompt details, as demonstrated in the following examples.
Prompt: A medium shot, historical adventure setting: Warm lamplight illuminates a cartographer in a cluttered study, poring over an ancient, sprawling map spread across a large table. Cartographer: “According to this old sea chart, the lost island isn’t myth! We must prepare an expedition immediately!”
Prompt: A low-angle shot shows an open, light purple door leading from a room with light purple walls and a gray floor to a vibrant outdoor scene. Lush green grass and wildflowers spill from the doorway onto the indoor floor, creating a whimsical transition between spaces. Beyond the door, rolling green hills dotted with more wildflowers stretch towards a bright, clear sky. A single tree stands prominently in the foreground of the outdoor scene, its leaves adding depth to the view. The sunlight and natural elements contrast with the simplicity of the indoor space, inviting a sense of wonder and escape.
Veo 3 is in private preview on Vertex AI and will be available more broadly in the coming weeks. If you’re interested in early access, please fill out this form.
Lyria 2: Greater creative control with music generation
At Google Cloud Next 2025, we announced Lyria in Vertex AI, Google’s text-to-music model. Today, we’re announcing Lyria 2 is generally available in Vertex AI. As Google’s latest music generation model, Lyria 2 features high-fidelity music across a range of styles. As your next creative collaborator, Lyria 2 provides:
High-quality audio content from text prompts
Greater creative control over instruments, BPM, and other characteristics
To start creating content with Lyria 2, check out Media Studio on Vertex AI. Once there, you can start generating music from text prompts or access the model API via Vertex AI. For inspiration, check out some of the music clips and prompts below.
Prompt: Upbeat, Rhythmic Peruvian Cumbia with a psychedelic edge, LA, Live performance at a Latin music Festival, incorporating electric guitars, bass, and often utilizing a prominent timbales percussion section, creating a powerful and danceable vibe. Vibrant and energetic.
Prompt: Sweeping Orchestral Film Score, Pristine Studio recording, London, 100-piece Orchestra, Majestic and profound. A blend of soaring melodies, dramatic harmonic shifts, and powerful percussive elements, with instruments such as french horns, strings, and timpani, and a thematic approach, featuring intricate orchestrations, dynamic range, and emotional depth, evoking a cinematic and awe-inspiring atmosphere.
See what some of our customers have to say about Lyria 2 so far:
Captions is an AI-powered video creation tool that allows users to create studio-grade talking videos quickly and easily. They have integrated Lyria 2 into their Mirage Edit feature enabling customers to quickly generate complete videos with customized sound.
“At Captions, our Mirage Edit feature already gives subscribers the power to go from prompt to fully-edited AI talking video — complete with images, B-roll clips, voiceovers, and transitions. Now, we’re adding a keystone element: adaptive music powered by Google’s Lyria 2. With a single prompt, Lyria composes a score that syncs to the script, pacing, and transitions at every emotional beat, so our customers can publish cinematic short-form videos without ever leaving Captions or shuffling through stock libraries.” said Dwight Churchill, Co-Founder and COO, Captions.ai
Dashverse, owner of digital content platforms such as Dashtoon and DashReels, is leveraging Google’s Lyria 2 on Vertex AI to provide the next generation of AI-native creators with advanced music generation capabilities. This integration allows users to craft dynamic and emotionally responsive soundtracks that seamlessly adapt to the narrative and pacing of their content on platforms like DashReels.
“We’ve always believed in empowering everyday creators at Dashverse — whether they’re making comics with Dashtoon or short dramas on DashReels. Our move into dynamic, emotionally resonant storytelling with DashReels needed a music engine that was just as expressive and responsive. Lyria 2 on Vertex AI delivers exactly that. It gives our users studio-level control over music — adapting to emotion, scene, and pacing — without the overhead. It’s not just a soundtrack generator; it’s a storytelling amplifier. We’re incredibly excited about what this unlocks for the next generation of AI-native creators.” said Soumyadeep Mukherjee, CTO, Dashverse
Create securely and share responsibly
The security and safety of AI-generated content is crucial, so these models are designed with built-in safeguards, allowing you to concentrate on your creative work. Veo 3, Imagen 4, and Lyria 2 are all built with safety as a fundamental design principle in partnership with Google DeepMind.
Watermarking: By default, all creations generated with Veo, Imagen, and Lyria utilize SynthID, a technology that embeds an invisible watermark directly into the generated output. This watermark allows AI-generated media to be identified, ensuring transparency.
Safety filters: Both input prompts and output content for all generative AI media models are assessed against a list of safety filters. By configuring how aggressively content is filtered, you can ensure the assets meet your brand values. For visual output, you also have control over person generation, as sketched below.
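As an illustration, here is how filter aggressiveness and person generation might be configured with the Google Gen AI SDK for Python. The specific enum values shown are assumptions to verify against the Vertex AI documentation:

```python
from google import genai
from google.genai.types import GenerateImagesConfig

client = genai.Client(vertexai=True, project="PROJECT_ID", location="us-central1")

# Tune how aggressively content is filtered, and whether people may be generated.
# The values below are illustrative; check the docs for the full list of options.
image = client.models.generate_images(
    model="imagen-4.0-generate-preview-05-20",
    prompt="A vintage package of all-purpose flour on a 1960s kitchen countertop",
    config=GenerateImagesConfig(
        safety_filter_level="BLOCK_MEDIUM_AND_ABOVE",
        person_generation="DONT_ALLOW",
    ),
)
```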
Get started
You can learn more about these new models by checking out the resources below:
Today at Google I/O, we’re expanding Gemini 2.5 Flash and Pro model capabilities that help enterprises build more sophisticated and secure AI-driven applications and agents:
Thought summaries: For enterprise-grade AI, we’re bringing clarity and auditability with thought summaries. This feature organizes a model’s raw thoughts — including key details and tool usage — into a clear format. Customers can now validate complex AI tasks, ensure alignment with business logic, and dramatically simplify debugging, leading to more trustworthy and dependable AI systems.
Deep Think mode: This enhanced reasoning mode uses new research techniques that enable the model to consider multiple hypotheses before responding, helping Gemini 2.5 Pro get even better. It is designed for highly complex use cases like math and coding. We will be making 2.5 Pro Deep Think available to trusted testers soon on Vertex AI.
Advanced security: We’ve significantly increased Gemini’s protection rate against indirect prompt injection attacks during tool use, a critical factor for enterprise adoption. Our new security approach makes Gemini 2.5 our most secure model family to date.
Gemini 2.5 Flash will be generally available for everyone in Vertex AI in early June, with 2.5 Pro generally available soon after. Let’s dive into how these advancements can impact your business, from operations to customer engagement.
Powering diverse enterprise needs with Gemini 2.5 Flash and Pro on Vertex AI
Our customers are seeing real value and boosting efficiency with Gemini 2.5 models on Vertex AI. From delivering faster response times to tackling complex extraction, enterprises are pushing the boundaries of automation.
“With respect to Geotab Ace (our data analytics agent for commercial fleets), Gemini 2.5 Flash on Vertex AI strikes an excellent balance. It maintains good consistency in the agent’s ability to provide relevant insight to the customer question, while also delivering 25% faster response times on subjects where it has less familiarity. What’s more, our early analysis suggests it could operate at potentially 85% lower cost per question compared to the Gemini 1.5 Pro baseline. This efficiency is vital for scaling AI insights affordably to our customers via Ace.” –Mike Branch, Vice President Data & Analytics, Geotab.
For the most complex enterprise solutions, Gemini 2.5 Pro is our most advanced and capable model. The introduction of Deep Think mode will help make 2.5 Pro even better by using our latest cutting-edge research in thinking and reasoning, including parallel thinking techniques.
We’ve enhanced 2.5 Pro usability with features like configurable Thinking Budgets (up to 32K tokens), allowing for fine-tuned control over processing. This means enterprises can tackle more intricate challenges and gain deeper insights.
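For example, with the Google Gen AI SDK for Python, a thinking budget and thought summaries can be requested per call. The following is a minimal sketch; the model name and parameter values are illustrative:

```python
from google import genai
from google.genai.types import GenerateContentConfig, ThinkingConfig

client = genai.Client(vertexai=True, project="PROJECT_ID", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Plan a three-step rollout for a new inventory API.",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_budget=32768,   # cap the tokens spent on reasoning
            include_thoughts=True,   # return organized thought summaries
        ),
    ),
)

# Thought summaries arrive as content parts flagged with `thought`.
for part in response.candidates[0].content.parts:
    print("[thought]" if part.thought else "[answer]", part.text)
```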
“Box is revolutionizing how enterprises interact with their vast, and rarely organized, amounts of content. With Box AI Extract Agents, powered by Gemini 2.5 on Vertex AI, users can instantly extract precise insights from complex, unstructured content – whether it’s scanned PDFs, handwritten forms, or image-heavy documents. Gemini 2.5 Pro’s advanced reasoning makes it the top choice for tackling complex enterprise tasks, delivering 90%+ accuracy on complex extraction use cases and outperforming previous models in both clause interpretation and temporal reasoning, leading to a significant reduction in manual review efforts. This evolution pushes the boundaries of automation, allowing businesses to unlock and act upon their most valuable information with even greater impact and efficiency.” – Yashodha Bhavnani, Vice President of AI Product Management, Box
The versatility of the Gemini 2.5 family allows diverse organizations like LiveRamp to innovate and democratize data access.
“With its improved reasoning capabilities and insightful responses, Gemini 2.5 provides tremendous potential for LiveRamp. Its advanced features can enhance our data analysis agents and add support across our product suite, including segmentation, activation, and clean room-powered measurement for advertisers, publishers, and retail media networks. We are committed to assessing the model’s impact across a wide array of features and functionalities to ensure our clients and partners can unlock new use cases and enhance existing ones.” – Roopak Gupta, Vice President Engineering, LiveRamp
Google Developer Experts building with Gemini 2.5
Google Developer Experts (GDEs) are a global community of tech experts, influencers, developers, and thought leaders. The community is already testing out Gemini 2.5 and building solutions. Take a look at some examples:
Kalev built a persona-based news recommender service for a supply chain analyst using Gemini 2.5 Pro’s large context window and reasoning abilities to filter and summarize global news relevant to their specific role.
Rubens built the Xtreme Weather App, a disaster preparedness multi-agent system, using Gemini 2.5 Pro for intelligent query routing and generating clear, actionable emergency guidance from diverse weather and hazard data. This use case demonstrates Gemini’s benefit in transforming complex environmental information into personalized advice, empowering users to prepare effectively for potential climate threats in their specific location.
Truong built a GitHub Action that automatically reviews pull requests using Google’s Gemini AI. This action helps catch errors, inconsistencies, and potential bugs early on. This leads to more robust and reliable software.
These developers are showing how Gemini on Vertex AI enables businesses and developers to achieve new efficiencies and foster innovation.
A big thank you to Fran Hinkelmann and Aaron Wanjala for their contributions and support in making this blog post happen.
After a period of intense development, Spring AI 1.0 has officially landed, bringing a robust and comprehensive solution for AI engineering right to your Java ecosystem. This isn’t just another library; it’s a strategic move to position Java and Spring at the forefront of the AI revolution.
With an overwhelming number of enterprises already running on Spring Boot, the path to integrating AI into existing business logic and data has never been smoother. Spring AI 1.0 empowers developers to seamlessly connect their applications with cutting-edge AI models, unlocking new possibilities without the typical integration headaches. Get ready to level up your JVM applications with intelligent capabilities!
Spring AI provides support for various AI models and technologies:
Image models can generate images from text prompts that are provided to them.
Transcription models can take audio files and convert them to text.
Embedding models can convert arbitrary data into vectors, which are data types that are optimized for semantic similarity search.
Chat models should be familiar! You’ve no doubt even had a brief conversation with one somewhere.
They’re versatile. You can get them to help you correct a document or write a poem or seemingly anything. They’re awesome, but they have some drawbacks.
Chat models are open-minded and given to distraction. You can help manage the chat model by providing additional capabilities to do the following:
Keep chat models on track: use system prompts to define their behavior.
Give them a sense of memory: implement memory to track conversational context.
Let AI models access external functions: enable tool calling.
Provide relevant information directly in the request: use prompt stuffing for private data.
Fetch and utilize specific enterprise data: leverage vector stores for retrieval augmented generation (RAG).
Ensure accuracy: evaluation uses another model to validate outputs.
Let AI applications connect with other services: use the Model Context Protocol (MCP). MCP works regardless of the application’s programming language, so that you can build agentic workflows for complex tasks.
Spring AI integrates smoothly with Spring Boot, offering familiar abstractions and starter dependencies from the Spring Initializr, giving you the convention-over-configuration setup that you expect. You can easily connect your existing logic and data to AI models within Spring Boot applications.
With Spring AI, you can leverage robust solutions to make chat models more effective and to deeply integrate these models into larger systems.
Prerequisites
To make a call to a Gemini model in Vertex AI, you will need to obtain credentials for the service that you want to use and then configure your local development environment.
You start by setting up your Google Cloud environment:
In your terminal, install the gcloud CLI, which is essential for managing resources and setting up authentication for local development where you run your application.
```bash
# initialize gcloud
gcloud init

# set the Project ID you have configured
gcloud config set project <PROJECT_ID>

# authenticate with your user account
gcloud auth application-default login
```
You’re in. Now you get to build.
The Build
You start by building a simple Google Cloud-optimized Spring AI application. On the Spring Initializr (start.spring.io), configure the project:
For the Project type, select Apache Maven (used in this blog) or Gradle.
Click Add Dependencies, and then select these dependencies: Spring Web, Spring Boot Actuator, GraalVM Native Support, Spring Data JDBC, Vertex AI Gemini, Vertex AI Embeddings, PGvector Vector Database, MCP Client, and Docker Compose Support.
Select the Java version that you want to use. We recommend the latest available version. This blog uses GraalVM, a distribution of OpenJDK with extra utilities that let you compile your code into native images specific to the OS and architecture of the machine. These images run in a fraction of the RAM and start up in a fraction of the time compared to regular JRE-based applications.
If you’re using sdkman.io, you can install GraalVM on your local machine by using the following command:
```bash
sdk install java 24-graalce
```
Click Generate and then save the .zip file that you can open in your IDE.
Add the required dependencies
Extract the .zip file that you downloaded, and then open the pom.xml file.
In the pom.xml file, at the end of the <dependencyManagement>, add the following lines:
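A minimal sketch, assuming the standard Spring AI bill-of-materials import (verify the version against the Spring AI documentation):

```xml
<!-- A sketch of the Spring AI BOM import; the version shown is an assumption. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-bom</artifactId>
    <version>1.0.0</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>
```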
Next, update the configuration values in the src/main/resources/application.properties file:
```properties
# set application name
spring.application.name=google

# configure actuators supported in the app
management.endpoints.web.exposure.include=*

# docker compose configuration
spring.docker.compose.lifecycle-management=start_only

# configure the chat and embedding models
# vertex embedding
spring.ai.vertex.ai.embedding.project-id=<your_project_id>
spring.ai.vertex.ai.embedding.location=us-central1
# vertex chat
spring.ai.vertex.ai.gemini.project-id=<your_project_id>
spring.ai.vertex.ai.gemini.location=us-central1
spring.ai.vertex.ai.gemini.chat.options.model=gemini-2.5-pro-preview-05-06

# initialize the schema in the vector store
spring.ai.vectorstore.pgvector.initialize-schema=true

# database connection parameters
spring.datasource.password=secret
spring.datasource.username=myuser
spring.datasource.url=jdbc:postgresql://localhost/mydatabase
```
The following are more details about the sections that you updated:
configure actuators supported in the app: tells Spring Boot’s observability integration, the Spring Boot Actuator, to expose all Actuator endpoints.
docker compose configuration: leverages the built-in Docker Compose support. Spring Boot automatically detects the compose.yml file in the root of the directory and runs the Docker image for you before the application starts. The Spring Initializr generated the compose.yml file for you. But you don’t want Spring Boot to restart the container each time, because PostgreSQL isn’t serverless! Tell Spring Boot to start the container only if it isn’t already running.
configure the chat and embedding models: the vertex embedding and vertex chat values configure which Gemini chat and Gemini embedding models to use from Google Cloud. This example uses two suitable models for the use case.
initialize the schema in the vector store: configures the app to use PostgreSQL loaded with a plugin that supports the vector type. Spring AI has an abstraction called VectorStore which handles writing data to various vector stores. This configuration ensures that Spring AI will initialize the storage that’s required to treat PostgreSQL as a vector store.
database connection parameters: configures the connection to the SQL database. Do you need this, strictly speaking? No. The Docker Compose support in Spring Boot will automatically connect to the SQL database. It can be handy for reference, though.
The Database
You have a SQL database with no data. Imagine! If a database has no data, is it really a database? Or is it just a base? You can get Spring Boot to initialize the database by running some SQL commands and installing some data on startup. Two files will run on startup: src/main/resources/schema.sql and /src/main/resources/data.sql. First, the schema.sql file:
```sql
drop table if exists dog;
create table if not exists dog(
    id serial primary key,
    name text not null,
    description text not null,
    owner text
);
```
Simple enough. It defines a dog table. The src/main/resources/data.sql file gets some dogs in there:
```sql
...
INSERT INTO dog(id, name, description) values (45, 'Prancer', 'A silly, goofy dog who slobbers all over everyone.');
```
Nice! The database will be initialized on every restart. This configuration avoids duplicate data by dropping the table on every restart and re-inserting the same rows. If this were a real database, you might use an upsert, which PostgreSQL supports with its insert on conflict ... do syntax.
To make short work of building a data access repository and an entity, you use Spring Data JDBC. You’ll create an entity called Dog to model and map to the data in the repository.
In the GoogleApplication.java file, after the GoogleApplication class, add the following lines:
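A minimal sketch of what that entity and repository might look like, assuming Spring Data JDBC conventions and the dog table defined above:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.repository.ListCrudRepository;

// A sketch, assuming Spring Data JDBC conventions; maps to the dog table above.
record Dog(@Id int id, String name, String owner, String description) {
}

interface DogRepository extends ListCrudRepository<Dog, Integer> {
}
```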
Now for the meat… or the dog bone of the matter, as our pal Prancer might say!
Add Chat clients
First, you build an AssistantController class, by adding the following, again at the end of the GoogleApplication.java file:
```java
@Controller
@ResponseBody
class AssistantController {

    private final ChatClient ai;

    private final Map<String, PromptChatMemoryAdvisor> advisors = new ConcurrentHashMap<>();

    AssistantController(ChatClient.Builder ai) {
        var system = """
                You are an AI powered assistant to help people adopt a dog from the adoption
                agency named Pooch Palace with locations in Mountain View, Seoul, Tokyo,
                Singapore, Paris, Mumbai, New Delhi, Barcelona, San Francisco, and London.
                Information about the dogs available will be presented below. If there is no
                information, then return a polite response suggesting we don't have any dogs available.
                """;
        this.ai = ai
                .defaultSystem(system)
                .build();
    }
}
```
The ChatClient that’s added in the preceding code is your one-stop-shop for all your chat model interactions. It in turn depends on the (autoconfigured) ChatModel that talks to, in this case, Google’s legendary Gemini. You typically only have one ChatModel configured in the application, but you might reasonably have many ChatClients, with different defaults and scenarios configured appropriately. You create a new ChatClient by using the ChatClient.Builder, which the preceding code injects into the constructor.
Set the HTTP endpoint
Next, you set up an HTTP endpoint, /{user}/inquire. When a request comes in, you use a system prompt to ensure that the model acts like it’s an actual employee at our fictitious dog adoption agency, Pooch Palace.
Add the following method to the AssistantController controller class file:
```java
@GetMapping("/{user}/inquire")
String inquire(@PathVariable String user,
               @RequestParam String question) {
    var memory = MessageWindowChatMemory.builder()
            .chatMemoryRepository(new InMemoryChatMemoryRepository())
            .build();
    var advisor = this.advisors
            .computeIfAbsent(user, _ -> PromptChatMemoryAdvisor.builder(memory).build());
    return this.ai
            .prompt()
            .advisors(advisor)
            .user(question)
            .call()
            .content();
}
```
This method defines a PromptChatMemoryAdvisor that keeps track of everything that’s said between a particular user and the model. The method then transmits that transcript to the model on every subsequent request to remind it.
Test the endpoint
Try the following requests out:
```bash
http :8080/lee/inquire question=="my name is Lee."
```
and:
```bash
http :8080/lee/inquire question=="what's my name?"
```
It should confirm that your name is Lee. And it might even try to keep you on track in adopting a dog. Let’s see what else it can do. Ask it:
```bash
http :8080/lee/inquire question=="do you have any silly dogs?"
```
It will respond that it doesn’t have any information about dogs in any Pooch Palace location, and it might even encourage you to check the local listings.
Source data
The problem is that it doesn’t have access to the data in the database. We shouldn’t give it all of the data…
We could perhaps give it all of the data. There are only 18 or so records in this particular database. That would fit easily into the token limit for most (if not all?) LLMs. It would certainly fit into a request made to Google’s LLMs, with their large context windows.
All LLMs have this concept of tokens – an approximation for the amount of data consumed and produced by an LLM. Google Gemini 2.5 Pro has a very large context window. If you’re using a local model like the open Gemma model, which you can run locally, then the only cost of running a model is the complexity and CPU cost. If you run a hosted model like Gemini, then there’s also a dollars-and-cents cost.
So, even though you could send all of the data along with your request to one of these models, you should try to limit the data that you send. It’s the principle of the thing! Instead, store everything in a vector store, find the entries that might be germane to the request, and transmit only that data to the model for consideration. This process of sourcing data from a database and then using that data to inform the response that’s produced by a model is called retrieval augmented generation (RAG).
Add a parameter of type VectorStore vectorStore to the class constructor. You’ll read all of the data from the dog table and then write it all out to the VectorStore implementation that’s backed by PostgreSQL. Add the following to the constructor, at the very top:
```java
// `db` is an injected JdbcClient; seed the vector store only if it's empty.
if (db.sql("select count(*) from vector_store").query(Integer.class).single().equals(0)) {
    repository.findAll().forEach(d -> {
        // one Document (a "dogument") per dog, written to the vector store
        var dogument = new Document("id: %s, name: %s, description: %s".formatted(
                d.id(), d.name(), d.description()));
        vectorStore.add(List.of(dogument));
    });
}
```
The only thing to do now is to tell the ChatClient to first consult the VectorStore for relevant data to include in the body of the request to the model. You do this with an advisor. Change the definition of the ChatClient:
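A minimal sketch of that change, assuming Spring AI’s QuestionAnswerAdvisor (the advisor that retrieves relevant documents from the VectorStore and includes them in the prompt before each call):

```java
// In the AssistantController constructor: a sketch assuming Spring AI's
// QuestionAnswerAdvisor, which performs the vector store lookup per request.
this.ai = ai
        .defaultSystem(system)
        .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
        .build();
```

Now restart the application and ask the same question again: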
```bash
http :8080/lee/inquire question=="do you have any silly dogs?"
```
This time, it should respond that indeed there’s a silly dog named Prancer (hi, buddy!) in the shelter who might be just the dog for us!
Well, naturally the next thing you’ll want to do is adopt this dog. But when might you stop by to adopt and pick up this dog? You’ll need to connect your LLM to the patent-pending, class-leading scheduling algorithm.
Add the DogAdoptionsScheduler class to the GoogleApplication.java file:
```java
@Component
class DogAdoptionsScheduler {

    @Tool(description = "schedule an appointment to pickup " +
            "or adopt a dog at a Pooch Palace location")
    String scheduleAppointment(
            @ToolParam(description = "the id of the dog") String dogId,
            @ToolParam(description = "the name of the dog") String dogName) throws Exception {
        var i = Instant
                .now()
                .plus(3, ChronoUnit.DAYS)
                .toString();
        System.out.println("scheduled appointment for " + i +
                " for dog " + dogName + " with id " + dogId + ".");
        return i;
    }
}
```
You’ve annotated the method with Spring AI’s @Tool and @ToolParam annotations. These annotations furnish descriptions that the model will use, along with the shape of the methods, to intuit whether they might be of use. You’ll want to tell the model these tools are available, too. Inject the newly defined DogAdoptionsScheduler scheduler into the constructor of the controller in the GoogleApplication.java file and then add the following to the definition of the ChatClient:
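A sketch of the updated builder chain, assuming the ChatClient.Builder’s defaultTools method for registering @Tool-annotated objects:

```java
// A sketch: register the scheduler's @Tool methods so the model can call them.
this.ai = ai
        .defaultSystem(system)
        .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
        .defaultTools(scheduler)
        .build();
```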
Restart the program and try it out. Ask about Prancer again:
```bash
http :8080/lee/inquire question=="do you have any silly dogs?"
```
The response has information now, so then ask how you can get Prancer:
```bash
http :8080/lee/inquire question=="fantastic. When can I schedule an appointment to pick up Prancer from the San Francisco location?"
```
Did it confirm that it’s going to schedule an adoption for three days hence? Congratulations! You’ve successfully given the model access to the data and business logic of the service.
Introducing Model Context Protocol
This service is all written in Spring and it uses Spring AI, but of course there are other languages and technology stacks out there, and those stacks might want to leverage this patent-pending, industry-leading scheduling algorithm. You’ll want to extract that functionality and make it a tool that’s available to all interactions with the LLM by using Model Context Protocol. This protocol was first designed by Anthropic and it provides an easy way for any LLM to assimilate tools into their toolbox, no matter what programming language the tools were written in.
3. Choose the latest version of Java and Maven, as before.
4. Unzip the resulting .zip file and open it in your IDE.
5. Transplant (cut and paste for the win!) the DogAdoptionsScheduler component from the google project to the SchedulerApplication.java file in this new project.
6. Start the new project on a different port; add server.port=8081 to application.properties.
7. Define the following bean in SchedulerApplication:
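A minimal sketch, assuming Spring AI’s MCP server support, which exposes @Tool-annotated methods through a ToolCallbackProvider so MCP clients can discover and call them:

```java
// A sketch, assuming Spring AI's MCP server support: expose the scheduler's
// @Tool-annotated methods to any connecting MCP client.
@Bean
MethodToolCallbackProvider dogTools(DogAdoptionsScheduler scheduler) {
    return MethodToolCallbackProvider.builder()
            .toolObjects(scheduler)
            .build();
}
```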
11. Change the AssistantController to inject a McpSyncClient client instead of the DogAdoptionsScheduler that was there before. Modify the ai definition accordingly:
```java
...
this.ai = ai.defaultToolCallbacks(
        new SyncMcpToolCallbackProvider(client))
...
```
12. Restart the application and then try the interaction again:
```bash
http :8080/dogs/lee/inquire question=="do you have any silly dogs?"

http :8080/dogs/lee/inquire question=="fantastic. When can I schedule an appointment to pick up Prancer from the San Francisco location?"
```
You should see this time that the model responds basically the same as last time, except that the request was handled in the newly minted scheduler component, not in the google service! We love it when a plan comes together.
Production-worthy AI
When you deploy your application to production, Google Cloud offers two great choices that are compatible with PostgreSQL, both supporting pgvector capabilities. You can use Google Cloud SQL, which is basically just PostgreSQL hosted by Google. What else could you need? We reckon, for a lot of use cases, not much! But if you really want superior performance, availability, and scale, then look no further than Google Cloud’s AlloyDB.
AlloyDB scales for workloads of all sizes and offers a 99.99% availability SLA, inclusive of maintenance. And it’s purpose-built for your AI workloads. We’re using PostgreSQL in a Docker image in this example, but when it’s time to deploy, we’re going to spin up an AlloyDB instance.
Codebase
The code for this walkthrough is available on GitHub.
Next Steps
You’ve just built a production-worthy, AI-ready, Spring AI- and Google-powered application in no time at all. You’ve only begun to scratch the surface! Check out Spring AI 1.0 at the Spring Initializr today and learn more about Google Cloud here.
SAP and Google Cloud are deepening their collaboration across data analytics, AI, security, and more to deliver what customers need most: faster paths to business value, lower risk on complex projects, and smart, enterprise-ready innovation that’s built to serve real workloads — not hypothetical use cases.
At SAP Sapphire 2025, we are excited to highlight multiple updates that will simplify how customers access SAP data with BigQuery, power new AI agents and capabilities with Gemini and Agentspace, and bring Google Cloud’s leading infrastructure to SAP customers’ most demanding workloads.
AI collaboration that drives business value
AI is at the core of work between SAP and Google Cloud. Today, we’re highlighting innovations that will help enterprise customers bring our leading AI technology to SAP data and systems, enabling measurable impact without rebuilding everything from scratch.
Building AI-powered applications: The Vertex AI platform provides full access to leading foundation models, an SDK for SAP’s ABAP language, and Model Garden, a single place to search, discover, and interact with pre-integrated models from Google and Google partners. For SAP customers, these capabilities include improving business processes like invoice resolution, enabling generative AI chatbots that provide faster, more accurate customer service experiences, and generating personalized marketing content — all with data from SAP systems.
Bringing Gemini to SAP Joule: SAP continues to build and refine Joule, its native agentic intelligence layer that enables AI-powered agents to operate across SAP applications. Now, with the addition of newer Gemini models embedded into SAP’s Generative AI Hub on SAP Business Technology Platform (BTP), those agents can leverage Google Cloud’s most advanced language models to power more intelligent, autonomous behavior. With 10 SAP BTP regions available on Google Cloud globally, customers can now utilize these capabilities to scale AI agent applications that comply with enterprise requirements for governance, data locality, and operational control.
Accessing SAP data in Agentspace: Google Agentspace unifies access to enterprise agents and applications including SAP agents, offering users a single place to streamline task automation and secure enterprise knowledge access across various systems, applications, and data formats. This empowers employees to leverage enterprise intelligence within their current tools, regardless of its location, including in popular partner CRM, IT, and data platforms. Accessing SAP data in Agentspace improves access to AI agents, structured and unstructured data, and contextual recommendations, all within the employee’s existing environment.
Enabling Agent2Agent interoperability: SAP supports and is contributing to Google Cloud’s Agent2Agent (A2A) interoperability protocol, which enables AI agents across platforms to securely exchange information, share context, and work together to achieve business outcomes. For example, an SAP orchestrator can use A2A to trigger a Vertex AI agent using BigQuery with full context, replacing manual, brittle integrations with autonomous, standards-based collaboration. (The second sketch after this list illustrates the shape of an A2A request.)
AI-powered SAP operations with Gemini: As customer adoption of RISE with SAP continues to grow, SAP and Google Cloud are partnering on AI co-innovations to deliver faster, more intelligent SAP operations at scale. As part of this effort, we have co-developed specialized AI agents with SAP, powered by the latest Gemini models, to prevent outages, optimize performance, and accelerate troubleshooting.
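To make the Vertex AI item above concrete, here is a minimal Java sketch that calls a Gemini model through the Vertex AI Java client library. The project ID, region, model name, and prompt are placeholders; SAP developers would make the analogous call through the ABAP SDK:

```java
import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.GenerativeModel;
import com.google.cloud.vertexai.generativeai.ResponseHandler;

public class InvoiceAssistantSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder project and region; substitute your own.
        try (VertexAI vertexAi = new VertexAI("my-project", "us-central1")) {
            // Placeholder model name; pick any Gemini model available to you.
            GenerativeModel model = new GenerativeModel("gemini-2.0-flash", vertexAi);

            // Hypothetical prompt over invoice data exported from an SAP system.
            GenerateContentResponse response = model.generateContent(
                "Classify this invoice dispute and suggest a resolution: "
                    + "invoice 4711, PO quantity mismatch, amount 12,400 EUR.");

            System.out.println(ResponseHandler.getText(response));
        }
    }
}
```

And here is a sketch of the A2A scenario described above: an orchestrator sending a request to a remote agent. A2A is JSON-RPC 2.0 over HTTP; the endpoint is hypothetical, and the method and field names follow the public draft specification at the time of writing, so treat them as illustrative rather than a fixed contract:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class A2aRequestSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint of a Vertex AI agent that speaks A2A.
        String agentUrl = "https://agents.example.com/sales-insights";

        // A JSON-RPC 2.0 envelope; method and field names are illustrative.
        String jsonRpcRequest = """
            {
              "jsonrpc": "2.0",
              "id": "1",
              "method": "message/send",
              "params": {
                "message": {
                  "role": "user",
                  "messageId": "msg-001",
                  "parts": [
                    { "kind": "text",
                      "text": "Summarize open sales orders for plant 1000." }
                  ]
                }
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(agentUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonRpcRequest))
            .build();

        // The agent replies with a task or message object describing the result.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```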
AMD, for example, is leveraging Google’s AI capabilities to improve user experiences with SAP. The company developed several gen AI chatbots that allow finance, HR, and customer service teams to use natural language to find the information they need from SAP systems. One customer operations chatbot pulls data related to order scheduling and deliveries to help answer customers’ questions quickly, while a finance chatbot built with Vertex AI and Cortex Framework helps accelerate cash-flow forecasting.
Unlock and enrich the value of enterprise data
Enterprise data is only truly valuable when it’s connected, current, and actionable. Customers can leverage the speed of BigQuery, enhanced by Cortex Framework, to accomplish this. Cortex provides prebuilt data models for key systems (like SAP, Google Ads, and other enterprise data sources), simplifying connections to disparate data and the integration of real-time signals. The result is a reliable data model, built from the data you already trust, that BigQuery’s native ML can act on directly, improving AI readiness, analytics, and operational agility.
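As a concrete illustration of pairing a Cortex-style SAP view with BigQuery’s native ML, here is a minimal Java sketch using the BigQuery client library. The project, dataset, view, and column names are hypothetical; your Cortex deployment defines the real ones:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class CortexForecastSketch {
    public static void main(String[] args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Train a demand-forecasting model with BigQuery ML on top of a
        // Cortex-style SAP sales view (names below are placeholders).
        String createModel = """
            CREATE OR REPLACE MODEL `my_project.cortex_demo.sales_forecast`
            OPTIONS (model_type = 'ARIMA_PLUS',
                     time_series_timestamp_col = 'order_date',
                     time_series_data_col = 'net_value') AS
            SELECT order_date, net_value
            FROM `my_project.cortex_demo.sales_orders`
            """;
        bigquery.query(QueryJobConfiguration.newBuilder(createModel).build());

        // Forecast the next 30 days of demand from the trained model.
        TableResult forecast = bigquery.query(QueryJobConfiguration.newBuilder(
            "SELECT * FROM ML.FORECAST(MODEL `my_project.cortex_demo.sales_forecast`, "
                + "STRUCT(30 AS horizon))").build());
        forecast.iterateAll().forEach(row -> System.out.println(row));
    }
}
```

ML.FORECAST returns predicted values with confidence intervals, ready to feed a dashboard, a report, or an agent.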
Suzano, the world’s largest pulp company and a Google Cloud RISE with SAP customer, uses Cortex to centralize its vast operational data, which was previously difficult for its global workforce to access. The company then developed a gen AI tool that translates employees’ natural language questions into SQL queries, allowing non-technical users to easily retrieve insights from their SAP data. This significantly democratized data access, led to a 90% reduction in data consultation time, and enabled faster, more informed decision-making across the organization.
SAP and Google Cloud are continuing to jointly engineer data integrations between SAP source systems and BigQuery using SAP Business Data Cloud, making SAP data instantly available for AI, analytics, and visualization — without moving workloads off SAP. Customers can build new intelligence pipelines using tools they already know and with data they already trust.
Leading infrastructure for SAP workloads
Choosing the right cloud infrastructure is critical for your SAP systems. Google Cloud now offers new SAP-optimized virtual machines (the M4 and C4D families) designed to directly boost your business performance and efficiency. M4, now generally available, offers our most performant memory-optimized Compute Engine VM instances, with up to 127% better performance at 36% better TCO compared to previous generations. This means you can run your core SAP workloads significantly faster while reducing costs.
Google Cloud infrastructure is designed from the ground up for exceptional reliability and security, and now supports an industry-leading 99.95% single-instance uptime service level agreement (SLA) to minimize disruption to your business-critical operations. Plus, with our expanded large-scale X4 instances, you can confidently operate even your most demanding S/4HANA environments under RISE at enterprise scale, ensuring your IT foundation scales with your business ambitions, not against them.
Enhancing security and compliance
Google Cloud safeguards your critical SAP systems and valuable business data through a comprehensive, multi-layered approach to security, starting with secure-by-design global infrastructure and data encryption by default to reduce risk from the outset. This is enhanced by Google Unified Security, which provides precise access controls, continuous monitoring, and centralized security management for proactive threat visibility across your SAP landscape. For sensitive SAP workloads, especially in the public sector, Google Cloud enables compliance through the differentiated Assured Workloads control platform, which now supports ITAR for SAP Ariba and RISE with SAP. Together, these solutions help ensure compliant operations while enabling rapid detection, investigation, and neutralization of sophisticated threats, minimizing business impact and enhancing operational resilience.
What’s next: Learn, try, build
SAP customers seek clear wins: faster time to value, simplified architectures, and better use of their existing data and systems. This is precisely what SAP and Google Cloud are building together, and we expect to deliver even greater value in 2025 and beyond with our advanced enterprise AI and planet-scale cloud infrastructure.
Visit us at SAP Sapphire to explore what’s possible with the advances we have been sharing here and elsewhere in recent weeks:
Learn how Google Cloud’s approach to enterprise AI translates into real-world value for our SAP customers. See hands-on demos of AI-assisted SAP workflows using Google Agentspace.
Get details on the A2A protocol and agentic integration scenarios.
See how RISE with SAP on Google Cloud infrastructure is engineered for scale, security and reliability — and proven in the field.
Discover how Google’s security solutions are protecting SAP customers’ most valued data and operations.