Today, Amazon CloudWatch announces support for a new tag-based telemetry experience to help customers monitor their metrics and set up their alarms using AWS resource tags. This new capability simplifies monitoring cloud infrastructure at scale by automatically adapting alarms and metrics analysis as resources change. DevOps engineers and cloud administrators can now create dynamic monitoring views that align with their organizational structure using their existing AWS resource tags.
Tag-based filtering eliminates the manual overhead of updating alarms and dashboards after deployments, freeing teams to focus on innovation rather than maintenance. It provides faster, targeted insights that match how teams organize their systems. Teams can query AWS default metrics using their existing resource tags, making it easier to troubleshoot issues and maintain operational visibility while focusing on core business initiatives.
CloudWatch tag-based filtering is available in the following regions: US East (N. Virginia); US East (Ohio); US West (N. California); US West (Oregon); Asia Pacific (Tokyo); Asia Pacific (Seoul); Asia Pacific (Singapore); Asia Pacific (Sydney); Asia Pacific (Mumbai); Asia Pacific (Osaka); Canada (Central); Europe (Frankfurt); Europe (Ireland); Europe (London); Europe (Paris); Europe (Stockholm) and South America (São Paulo).
You can now preview your Amazon S3 Tables directly in the S3 console without having to write a SQL query. You can view the schema and sample rows of your tables stored in S3 Tables to better understand and gather key information about your data quickly, without any setup.
AWS X-Ray, a service that helps developers analyze and debug distributed applications by providing request tracing capabilities, now offers adaptive sampling to solve a common challenge for DevOps teams, Site Reliability Engineers (SREs), and application developers. These customers often face a difficult trade-off: setting sampling rates too low risks missing critical traces during incidents, while setting them too high unnecessarily increases observability costs during normal operations. Today, with adaptive sampling, you can automatically adjust sampling rates within user-defined limits to ensure you capture the most important traces precisely when you need them. This helps development teams reduce mean time to resolution (MTTR) during incidents by providing comprehensive trace data for root cause analysis, while maintaining cost-efficient sampling rates during normal operations.
Adaptive sampling supports two approaches, Sampling Boost and Anomaly Span Capture, which can be applied independently or combined. Customers can use Sampling Boost to temporarily increase sampling rates when anomalies are detected to capture complete traces, and Anomaly Span Capture to ensure anomaly-related spans are always captured, even when the full trace isn't sampled.
Adaptive sampling is currently available in all commercial Regions where AWS X-Ray is offered. For more information, see the X-Ray documentation, and see the CloudWatch pricing page for X-Ray pricing details.
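For context, sampling behavior is governed by X-Ray sampling rules, and adaptive sampling adjusts rates within limits you define. A minimal sketch of a baseline rule using the existing create-sampling-rule command (rule name and rates are illustrative; the announcement does not document boost configuration fields, so none are shown here):

# Baseline rule: sample 5% of requests after the first 1 request per second,
# across all services. Adaptive sampling adjusts rates within limits like these.
aws xray create-sampling-rule --sampling-rule '{
  "RuleName": "baseline-5pct",
  "Priority": 100,
  "FixedRate": 0.05,
  "ReservoirSize": 1,
  "ServiceName": "*",
  "ServiceType": "*",
  "Host": "*",
  "HTTPMethod": "*",
  "URLPath": "*",
  "ResourceARN": "*",
  "Version": 1
}'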
Allowed AMIs, the Amazon EC2 account-wide setting that enables you to limit the discovery and use of Amazon Machine Images (AMIs) within your AWS accounts, adds support for four new parameters: marketplace codes, deprecation time, creation date, and AMI names.
Previously, you could specify accounts or owner aliases that you trust in your Allowed AMIs setting. Starting today, you can use the four new parameters to define additional criteria to further reduce the risk of inadvertently launching instances with non-compliant or unauthorized AMIs. Marketplace codes can be provided to limit the use of Marketplace AMIs, the deprecation time and creation date parameters can be used to limit the use of outdated AMIs, and the AMI name parameter can be used to restrict usage to AMIs with specific naming patterns. You can also leverage Declarative Policies to configure these parameters to perform AMI governance across your organization.
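A hedged sketch of what this configuration can look like with the Allowed AMIs CLI (the ImageProviders criterion is previously documented; the field names shown for the new criteria are illustrative placeholders, not confirmed syntax):

# Allow only Amazon-provided AMIs matching a naming pattern and created within
# the last year. "ImageNames" and "CreationDateCondition" are placeholder names
# for the new criteria; check the EC2 documentation for the exact fields.
aws ec2 replace-image-criteria-in-allowed-images-settings --image-criteria '[
  {
    "ImageProviders": ["amazon"],
    "ImageNames": ["al2023-ami-*"],
    "CreationDateCondition": {"MaximumDaysSinceCreated": 365}
  }
]'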
These additional parameters are now supported in all AWS Regions, including the AWS China (Beijing) Region, operated by Sinnet, the AWS China (Ningxia) Region, operated by NWCD, and AWS GovCloud (US). To learn more, please visit the documentation.
Amazon RDS for PostgreSQL 18.0 is now available in the Amazon RDS Database Preview Environment, allowing you to evaluate the latest PostgreSQL features while leveraging the benefits of a fully managed database service. This preview environment provides you a sandbox where you can test applications and explore new PostgreSQL 18.0 capabilities before they become generally available.
PostgreSQL 18.0 includes "skip scan" support for multicolumn B-tree indexes and improves WHERE clause handling for OR and IN conditions. It introduces parallel Generalized Inverted Index (GIN) builds and updates join operations. It now supports Universally Unique Identifiers Version 7 (UUIDv7), which combines timestamp-based ordering with traditional UUID uniqueness to boost performance in high-throughput distributed systems. Observability improvements show buffer usage counts and index lookups during query execution, along with per-connection I/O utilization metrics. Please refer to the RDS PostgreSQL release documentation for more details.
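For example, two of these features can be exercised directly in SQL on a preview instance (table and index names are illustrative):

-- UUIDv7: time-ordered UUIDs, new in PostgreSQL 18
SELECT uuidv7();

-- Skip scan: a multicolumn B-tree index can now serve queries that
-- omit the leading index column
CREATE INDEX orders_region_status_idx ON orders (region, status);
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';  -- may use the index via skip scan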
Amazon RDS Database Preview Environment database instances are retained for a maximum period of 60 days and are automatically deleted after the retention period. Amazon RDS database snapshots that are created in the preview environment can only be used to create or restore database instances within the preview environment. You can use the PostgreSQL dump and load functionality to import or export your databases from the preview environment.
Amazon RDS Database Preview Environment database instances are priced the same as instances in the US East (Ohio) Region.
A year ago today, Google Cloud filed a formal complaint with the European Commission about Microsoft’s anti-competitive cloud licensing practices — specifically those that impose financial penalties on businesses that use Windows Server software on Azure’s biggest competitors.
Despite regulatory scrutiny, it’s clear that Microsoft intends to keep its restrictive licensing policies in place for most cloud customers. In fact, it’s getting worse.
As part of a recent earnings call, Microsoft disclosed that its efforts to force software customers to use Azure are “not anywhere close to the finish line,” and represented one of three pillars “driving [its] growth.” As we approach the end of September, Microsoft is imposing another wave of licensing changes to force more customers to Azure by preventing managed service providers from hosting certain workloads on Azure’s competitors.
Regulators have taken notice. As part of a comprehensive investigation, the U.K.'s Competition and Markets Authority (CMA) recently found that restrictive licensing harms cloud customers, competition, economic growth, and innovation. At the same time, a growing number of regulators around the world are also scrutinizing Microsoft's anti-competitive conduct — proving that fair competition is an issue that transcends politics and borders.
While some progress has been made, restrictive licensing continues to be a global problem, locking in cloud customers, harming economic growth, and stifling innovation.
Economic, security, and innovation harms
Restrictive cloud licensing has caused an enormous amount of harm to the global economy over the last year. This includes direct penalties that Microsoft forces businesses to pay, and downstream harms to economic growth, cybersecurity, and innovation. Ending restrictive licensing could help supercharge economies around the world.
Microsoft still imposes a 400% price markup on customers who choose to move legacy workloads to competitors’ clouds. This penalty forces customers onto Azure by making it more expensive to use a competitor. A mere 5% increase in cloud pricing due to lack of competition costs U.K. cloud customers £500 million annually, according to the CMA. A separate study in the EU found restrictive licensing amounted to a billion-Euro tax on businesses.
With AI technologies disrupting the business market in dramatic ways, ending Microsoft’s anti-competitive licensing is more important than ever as customers move to the cloud to access AI at scale. Customers, not Microsoft, should decide what cloud — and therefore what AI tools — work best for their business.
The ongoing risk of inaction
Perhaps most telling of all, the CMA found that since some of the most restrictive licensing terms went into place over the last few years, Microsoft Azure has gained customers at two or even three times the rate of competitors. Less choice and weaker competition is exactly the type of "existential challenge" to Europe's competitiveness that the Draghi report warned of.
Ending restrictive licensing could help governments “unlock up to €1.2 trillion in additional EU GDP by 2030” and “generate up to €450 billion per year in fiscal savings and productivity gains,” according to a recent study by the European Centre for International Political Economy. Now is the time for regulators and policymakers globally to act to drive forward digital transformation and innovation.
In the year since our complaint to the European Commission, our message is as clear as ever: Restrictive cloud licensing practices harm businesses and undermine European competitiveness. To drive the next century of technology innovation and growth, regulators must act now to end these anti-competitive licensing practices that harm businesses.
AWS Lambda now offers Code Signing in GovCloud Regions (AWS GovCloud (US-West) and AWS GovCloud (US-East)), which allows administrators to ensure that only trusted and verified code is deployed to Lambda functions. This feature uses AWS Signer, a managed code signing service. When code is deployed, Lambda checks the signatures to confirm the code hasn’t been altered and is signed by trusted developers.
Administrators can create Signing Profiles in AWS Signer and use AWS Identity and Access Management (IAM) to manage user access. Within Lambda, they can specify allowed signing profiles for each function and configure whether to warn or reject deployments if signature checks fail.
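A minimal sketch of the end-to-end setup with the AWS CLI (profile names, function names, and ARNs are illustrative):

# 1. Create a signing profile for the Lambda platform in AWS Signer
aws signer put-signing-profile \
    --profile-name MyTrustedProfile \
    --platform-id AWSLambda-SHA384-ECDSA

# 2. Create a code signing config that rejects unsigned or untrusted artifacts
aws lambda create-code-signing-config \
    --allowed-publishers SigningProfileVersionArns=arn:aws-us-gov:signer:us-gov-west-1:111122223333:/signing-profiles/MyTrustedProfile/ABCDEFGH \
    --code-signing-policies UntrustedArtifactOnDeployment=Enforce

# 3. Attach the config to a function
aws lambda put-function-code-signing-config \
    --function-name my-function \
    --code-signing-config-arn arn:aws-us-gov:lambda:us-gov-west-1:111122223333:code-signing-config:csc-0123456789abcdef0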
Autopilot is an operational mode for Google Kubernetes Engine (GKE) that provides a fully managed environment and takes care of operational details, like provisioning compute capacity for your workloads. Autopilot allows you to spend more time on developing your own applications and less time on managing node-level details. This year, we upgraded Autopilot’s autoscaling stack to a fully dynamic container-optimized compute platform that rapidly scales horizontally and vertically to support your workloads. Simply attach a horizontal pod autoscaler (HPA) or vertical pod autoscaler (VPA) to your environment, and experience a fully dynamic platform that can scale rapidly to serve your users.
More and more customers, including Hotspring and Contextual AI, understand that Autopilot can dramatically simplify Kubernetes cluster operations and enhance resource efficiency for their critical workloads. In fact, in 2024, 30% of active GKE clusters were created in Autopilot mode. The new container-optimized compute platform has also proved popular with customers, who report markedly faster provisioning times. The faster GKE provisions capacity, the more responsive your workloads become, improving your customers' experience and optimizing costs.
Today, we are pleased to announce that the best of Autopilot is now available in all qualified GKE clusters, not just dedicated Autopilot ones. Now, you can utilize Autopilot’s container-optimized compute platform and ease of operation from existing GKE clusters. It’s generally available, starting with clusters enrolled in the Rapid release channel and running GKE version 1.33.1-gke.1107000 or later. Most clusters will qualify and be able to access these new features as they roll out to the other release channels, except clusters enrolled in the Extended channel and those that use the older routes-based networking. To access these new features, enroll in the Rapid channel and upgrade your cluster version, or wait to be auto-upgraded.
Autopilot features are offered in Standard clusters via compute classes, which are a modern way to group and specify compute requirements for workloads in GKE. GKE now has two built-in compute classes, autopilot and autopilot-spot, that are pre-installed on all qualified clusters running on GKE 1.33.1-gke.1107000 or later and enrolled in the Rapid release channel. Running your workload on Autopilot’s container-optimized compute platform is as easy as specifying the autopilot (or autopilot-spot) compute class, like so:
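For example, a minimal sketch (the compute-class node selector is the documented mechanism; the Deployment itself is illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-autopilot
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-autopilot
  template:
    metadata:
      labels:
        app: hello-autopilot
    spec:
      # Request the built-in Autopilot compute class
      # (use autopilot-spot for Spot pricing)
      nodeSelector:
        cloud.google.com/compute-class: autopilot
      containers:
      - name: hello
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi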
Better still, you can make the Autopilot container-optimized compute platform the default for a namespace, a great way to save both time and money. You get efficient bin-packing, where the workload is charged for resource requests (and can even still burst!), rapid scaling, and you don’t have to plan your node shapes and sizes.
Here’s how to set Autopilot as your default for a namespace:
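A sketch, assuming the namespace-label mechanism GKE uses for default compute classes (verify the exact label key against the GKE compute class documentation):

apiVersion: v1
kind: Namespace
metadata:
  name: team-web
  labels:
    # Assumed label key for the namespace-level default compute class;
    # confirm in the GKE docs before relying on it.
    cloud.google.com/default-compute-class: autopilot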
Pod sizes for the container-optimized compute platform start at 50 milli-CPU (that's just 5% of 1 CPU core!), and can scale up to 28 vCPU. With the container-optimized compute platform you only pay for the resources your Pod requests, so you don't have to worry about system overhead or empty nodes. Pods larger than 28 vCPU, or those with specific hardware requirements, can also run in Autopilot mode on specialized compute with node-based pricing via customized compute classes.
Run AI workloads on GPUs and TPUs with Autopilot
It's easy to pair Autopilot's container-optimized compute platform with specific hardware such as GPUs, TPUs, and high-performance CPUs to run your AI workloads. You can run those workloads in the same cluster, side by side with Pods on the container-optimized compute platform. By choosing Autopilot mode for these AI workloads, you benefit from Autopilot's managed node properties, where we take a more active role in management. Furthermore, you also get our enterprise-grade privileged admission controls that require workloads to run in user-space, for better supportability, reliability, and an improved security posture.
Here’s how to define your own customized compute class that runs in Autopilot mode with specific hardware, in this example a G2 machine type with NVIDIA L4s with two priority rules:
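A sketch using the custom compute class CRD (machine shapes and names are illustrative, and the field that opts the class into Autopilot mode is not reproduced here, so consult the GKE docs for the exact spec):

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: l4-inference
spec:
  # Two priority rules: prefer an on-demand G2 machine with one NVIDIA L4,
  # then fall back to Spot capacity with the same shape.
  priorities:
  - machineType: g2-standard-8
    gpu:
      type: nvidia-l4
      count: 1
  - machineType: g2-standard-8
    spot: true
    gpu:
      type: nvidia-l4
      count: 1
  nodePoolAutoCreation:
    enabled: true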
We’re also making compute classes work better with a new provisioning mode that automatically provisions resources for compute classes, without changing how other workloads are scheduled on existing node pools. This means you can now adopt the new deployment paradigm of compute class (including the new Autopilot-enabled compute classes) at your own pace, without affecting existing workloads and deployment strategies.
Until now, to use compute class in Standard clusters with automatic node provisioning, you needed to enable node auto-provisioning for the entire cluster. Node auto-provisioning has been part of GKE for many years, but it was previously an all-or-nothing decision — you couldn’t easily combine a manual node pool with a compute class provisioned by node auto-provisioning without potentially changing how workloads outside of the compute class were scheduled. Now you can, with our new automatically provisioned compute classes. All Autopilot compute classes use this system, so it’s easy to run workloads in Autopilot mode side-by-side with your existing deployments (e.g., on manual node pools). You can also enable this feature on any compute class starting with clusters in the Rapid channel running GKE version 1.33.3-gke.1136000 or later.
With the Autopilot mode for compute classes in Standard clusters, and the new automatic provisioning mode for all compute classes, you can now introduce compute class as an option to more clusters without impacting how any of your existing workloads are scheduled. Customers we’ve spoken to like this, as they can adopt these new patterns gradually for new workloads and by migrating existing ones, without needing to plan a disruptive switch-over.
Autopilot for all
At Google Cloud, we believe in the power of GKE’s Autopilot mode to simplify operations for your GKE clusters and make them more efficient. Now, those benefits are available to all GKE customers! To learn more about GKE Autopilot and how to enable it for your clusters, check out these resources.
Region switch in Amazon Application Recovery Controller (ARC) is now available in the Asia Pacific (New Zealand) Region. Region switch allows you to orchestrate the specific steps to operate your cross-AWS account application resources out of another AWS Region. It provides dashboards for real-time visibility into the recovery process and gathers data from across resources and accounts required for reporting to regulators and compliance teams. Region switch supports failover and failback for active/passive multi-Region approaches, and shift-away and return for active/active multi-Region approaches. When you create a Region switch plan, it is replicated to all the Regions your application operates in. This removes dependencies on the Region you are leaving for your recovery.
The role of the data scientist is rapidly transforming. For the past decade, their mission has centered on analyzing the past to run predictive models that informed business decisions. Today, that is no longer enough. The market now demands that data scientists build the future by designing and deploying intelligent, autonomous agents that can reason, act, and learn on behalf of the enterprise.
This transition moves the data scientist from an analyst to an agentic architect. But the tools of the past — fragmented notebooks, siloed data systems, and complex paths to production — create friction that breaks the creative flow.
At Big Data London, we are announcing the next wave of data innovations built on an AI-native stack, designed to address these challenges. These capabilities help data scientists move beyond analysis to action by enabling them to:
Stop wasting time context-switching. We’re delivering a single, intelligent notebook environment where you can instantly use SQL, Python, and Spark together, letting you build and iterate in one place instead of fighting your tools.
Build agents that understand the real world. We’re giving you native, SQL-based access to the messy, real-time data — like live event streams and unstructured data — that your agents need to make smart, context-aware decisions.
Go from prototype to production in minutes, not weeks. We’re providing a complete ‘Build-Deploy-Connect’ toolkit to move your logic from a single notebook into a secure, production-grade fleet of autonomous agents.
Unifying the environment for data science
The greatest challenge of data science productivity is friction. Data scientists live in a state of constant, forced context-switching: writing SQL in one client, exporting data, loading it into a Python notebook, configuring a separate Spark cluster for heavy lifting, and then switching to a BI tool just to visualize results. Every switch breaks the creative “flow state” where real discovery happens. Our priority is to eliminate this friction by creating the single, intelligent environment an architect needs to engineer, build, and deploy — not just run predictive models.
Today, we are launching fundamental enhancements to Colab Enterprise notebooks in BigQuery and Vertex AI. We’ve added native SQL cells (preview), so you can now iterate on SQL queries and Python code in the same place. This lets you use SQL for data exploration and immediately pipe the results into a BigQuery DataFrame to build models in Python. Furthermore, rich interactive visualization cells (preview) automatically generate editable charts from your data to quickly assess the analysis. This integration breaks the barrier between SQL, Python, and visualization, transforming the notebook into an integrated development environment for data science tasks.
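For instance, a query's results can be pulled straight into a BigQuery DataFrame and explored in Python. A small sketch using the open-source bigframes library against a public dataset:

import bigframes.pandas as bpd

# Read query results into a BigQuery DataFrame; computation stays in BigQuery
df = bpd.read_gbq("""
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
""")

# Explore the result with familiar pandas-style operations
print(df.sort_values("total", ascending=False).head(10))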
But an integrated environment is only half the solution; it must also be intelligent. This is the power of our Data Science Agent, which acts as an "interactive partner" inside Colab. Recent enhancements to this agent mean it can now incorporate sophisticated tool usage (preview) within its detailed plans, including the use of BigQuery ML for training and inference, BigQuery DataFrames for analysis using Python, or large-scale Spark transformations. This means your analysis gets more advanced, your demanding workloads are more cost-effective to run, and your models get into production quicker.
In addition, we are also making our Lightning Engine generally available. The Lightning Engine accelerates Spark performance more than 4x compared to open-source Spark. And Lightning Engine is ML- and AI-ready by default, seamlessly integrating with BigQuery Notebooks, Vertex AI, and VS Code. This means you can use the same accelerated Spark runtime across your entire workflow in the tool of your choice — from initial exploration in a notebook to distributed training on Vertex AI. We're also announcing advanced support for Spark 4.0 (preview), bringing its latest innovations directly to you.
Building agents that understand the real world
Agentic architects build systems that will sense and respond to the world in real time. This requires access to data that has historically been siloed in separate, specialized systems such as live event streams and unstructured data. To address this challenge we are making real-time streams and unstructured data more accessible for data science teams.
First, to process real-time data using SQL we are announcing stateful processing for BigQuery continuous queries (preview). In the past, it was difficult to ask questions about patterns over time using just SQL on live data. This new capability changes that. It gives your SQL queries a “memory,” allowing you to ask complex, state-aware questions. For example, instead of just seeing a single transaction, you can ask, “Has this credit card’s average transaction value over the last 5 minutes suddenly spiked by 300%?” An agent can now detect this suspicious velocity pattern — which a human analyst reviewing individual alerts would miss — and proactively trigger a temporary block on the card before a major fraudulent charge goes through. This unlocks powerful new use cases, from real-time fraud detection to adaptive security agents that learn and identify new attack patterns as they happen.
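A hedged sketch of the fraud example in SQL (stateful continuous queries are in preview, so the exact syntax may differ; the schema and thresholds here are invented):

-- Continuously flag cards whose 5-minute average transaction value
-- spikes to 3x their trailing hourly average (illustrative schema).
INSERT INTO fraud.alerts (card_id, short_avg, long_avg, event_ts)
SELECT
  card_id,
  AVG(amount) OVER recent_5m   AS short_avg,
  AVG(amount) OVER trailing_1h AS long_avg,
  event_ts
FROM payments.transactions
WINDOW
  recent_5m AS (PARTITION BY card_id ORDER BY UNIX_SECONDS(event_ts)
                RANGE BETWEEN 300 PRECEDING AND CURRENT ROW),
  trailing_1h AS (PARTITION BY card_id ORDER BY UNIX_SECONDS(event_ts)
                  RANGE BETWEEN 3600 PRECEDING AND CURRENT ROW)
QUALIFY short_avg > 3 * long_avg;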
Second, we are removing the friction to build AI applications using a vector database, by helping data teams with autonomous embedding generation in BigQuery (preview) over multimodal data. Building on our BigQuery Vector Search capabilities, you no longer have to build, manage, or maintain a separate, complex data pipeline just to create and update your vector embeddings. BigQuery now takes care of this automatically as data arrives and as users search for new terms in natural language. This capability enables agents to connect user intent to enterprise data, and it’s already powering systems like the in-store product finder at Morrisons, which handles 50,000 customer searches on a busy day. Customers can use the product finder on their phones as they walk around the supermarket. By typing in the name of a product, they can immediately find which aisle a product is on and in which part of that aisle. The system uses semantic search to identify the specific product SKU, querying real-time store layout and product catalog data.
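This builds on BigQuery's existing VECTOR_SEARCH and ML.GENERATE_EMBEDDING primitives. A sketch of the query side (tables, model, and columns are invented; with autonomous embedding generation, the explicit embedding step shown here is handled for you):

-- Find the five products closest in meaning to a shopper's search term
SELECT base.product_name, base.aisle, distance
FROM VECTOR_SEARCH(
  TABLE store.products,   -- base table with a precomputed embedding column
  'embedding',
  (SELECT ml_generate_embedding_result AS embedding
   FROM ML.GENERATE_EMBEDDING(
     MODEL store.embedding_model,
     (SELECT 'gluten free pasta' AS content))),
  top_k => 5);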
Trusted, production-ready multi-agent development
When an analyst delivers a report, their job is done. When an architect deploys an autonomous application or agent, their job has just begun. This shift from notebook-as-prototype to agent-as-product introduces a critical new set of challenges: How do you move your notebook logic into a scalable, secure, and production-ready fleet of agents?
To solve this, we are providing a complete “Build-Deploy-Connect” toolkit for the agent architect. First, the Agent Development Kit (ADK) provides the framework to build, test, and orchestrate your logic into a fleet of specialized, production-grade agents. This is how you move from a single-file prototype to a robust, multi-agent system. And this agentic fleet doesn’t just find problems — it acts on them. ADK allows agents to ‘close the loop’ by taking intelligent, autonomous actions, from triggering alerts to creating and populating detailed case files directly in operational systems like ServiceNow or Salesforce.
A huge challenge until now was securely connecting these agents to your enterprise data, forcing developers to build and maintain their own custom integrations. To solve this, we launched first-party BigQuery tools directly integrated within ADK or via MCP. These are Google-maintained, secure tools that allow your agent to intelligently discover datasets, get table info, and execute SQL queries, freeing your team to focus on agent logic, not foundational plumbing. In addition, your agentic fleet can now easily connect to any data platform in Google Cloud using our MCP Toolbox. Available across BigQuery, AlloyDB, Cloud SQL, and Spanner, MCP Toolbox provides a secure, universal ‘plug’ for your agent fleet, connecting them to both the data sources and the tools they need to function.
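To make this concrete, here is a hedged sketch of an MCP Toolbox tools.yaml that exposes a parameterized BigQuery query as an agent tool (project, dataset, and tool names are invented; field names follow the open-source MCP Toolbox conventions, so verify against its documentation):

sources:
  analytics-bq:
    kind: bigquery
    project: my-project          # assumed project ID
tools:
  top_errors:
    kind: bigquery-sql
    source: analytics-bq
    description: Return the most common error codes for a service.
    statement: |
      SELECT error_code, COUNT(*) AS n
      FROM logs.app_errors       -- illustrative dataset and table
      WHERE service = @service
      GROUP BY error_code
      ORDER BY n DESC
      LIMIT 10
    parameters:
      - name: service
        type: string
        description: Service name to filter on.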
This “Build-Deploy-Connect” toolkit also extends to the architect’s own workflow. While ADK helps agents connect to data, the architect (the human developer) needs to manage this system using a new primary interface: the command line (CLI). To eliminate the friction of switching to a UI for data tasks, we are integrating data tasks directly into the terminal with our new Gemini CLI extensions for Data Cloud (preview). Through the agentic Gemini CLI, developers can now use natural language to find datasets, analyze data, or generate forecasts — for example, you can simply state gemini bq “analyze error rates for ‘checkout-service'” — and even pipe results to local tools like Matplotlib, all without leaving your terminal.
Architecting the future
These innovations transform the impact data scientists can have within the organization. Using an AI-native stack, we are now unifying the development environment in new ways, expanding data boundaries, and enabling trusted, production-ready development.
You can now automate tasks and use agents to become an agentic architect helping your organization to sense, reason, and act with intelligence. Ready to experience this transformation? Check out our new Data Science eBook with eight practical use cases and notebooks to get you started building today.
In June, Google introduced Gemini CLI, an open-source AI agent that brings the power of Gemini directly into your terminal. And today, we’re excited to announce open-source Gemini CLI extensions for Google Data Cloud services.
Building applications and analyzing trends with services like Cloud SQL, AlloyDB and BigQuery has never been easier — all from your local development environment! Whether you’re just getting started or a seasoned developer, these extensions make common data interactions such as app development, deployment, operations, and data analytics more productive and easier. So, let’s jump right in!
Using a Data Cloud Gemini CLI extension
Before you get started, make sure you have enabled the APIs and configured the IAM permissions required to access specific services.
To retrieve the newest functionality, install the latest release of the Gemini CLI (v0.6.0):
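A sketch of the install flow, assuming the npm distribution of the Gemini CLI and the gemini-cli-extensions GitHub organization these extensions are published from:

# Install or update the Gemini CLI
npm install -g @google/gemini-cli@latest

# Install a Data Cloud extension
gemini extensions install https://github.com/gemini-cli-extensions/<EXTENSION>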
Replace <EXTENSION> with the name of the service you want to use. For example, alloydb, cloud-sql-postgresql or bigquery-data-analytics.
Before starting the Gemini CLI, you’ll need to configure the extension to connect with your Google Cloud project by adding the required environment variables. The table below provides more information on the configuration required.
| Extension Name | Description | Configuration |
|----------------|-------------|---------------|
| alloydb | Create resources and interact with AlloyDB for PostgreSQL databases and data. | |
Now, you can start the Gemini CLI using the command gemini. You can view the installed extensions with the command /extensions, and list the MCP servers and tools included in an extension with the command /mcp list.
Using the Gemini CLI for Cloud SQL for PostgreSQL extension
The Cloud SQL for PostgreSQL extension lets you perform a number of actions. Some of the main ones are included below:
Create instance: Creates a new Cloud SQL instance for PostgreSQL (MySQL and SQL Server are also supported)
List instances: Lists all Cloud SQL instances in a given project
Get instance: Retrieves information about a specific Cloud SQL instance
Create user: Creates a new user account within a specified Cloud SQL instance, supporting both standard and Cloud IAM users
Curious about how to put it in action? Like any good project, start with a solid written plan of what you are trying to do. Then, you can provide that project plan to the CLI as a series of prompts, and the agent will start provisioning the database and other resources:
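For example, an opening prompt might look like this (illustrative, not taken from the original post):

> Create a Cloud SQL for PostgreSQL instance named blog-dev in project
  my-project in region us-central1, then create a database named blog
  and a user named blog_app.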
After configuring the extension to connect to the new database, the agent can generate the required tables based on the approved plan. For easy testing, you can prompt the agent to add test data.
Now the agent can use the context it has to generate an API to make the data accessible.
As you can see, these extensions make it incredibly easy to start building with Google Cloud databases!
Using the BigQuery Analytics extensions
For your analytical needs, we are thrilled to give you a first look at the Gemini CLI extension for BigQuery Data Analytics. We are also excited to give access to the Conversational Analytics API through the BigQuery Conversational Analytics extension. This is the first step in our journey to bring the full power of BigQuery directly into your local coding environment, creating an integrated and unified workflow.
With this extension you can:
Explore data: Use natural language to search for your tables.
Analyze: Ask business questions on the data and generate intelligent insights.
Dive deeper: Use conversational analytics APIs to dive deeper into the insights.
And extend: Use other tools or extensions to extend into advanced workflows like charting, reporting, code management, etc.
This initial release provides a comprehensive suite of tools to Gemini CLI:
Metadata tools: Discover and understand the BigQuery data landscape.
Query execution tool: Run any BigQuery query and get the results back, summarized to your console.
AI-powered forecasting: Leverage BigQuery's built-in AI.FORECAST function for powerful time-series predictions directly from the command line.
Deeper data insights: The "ask_data_insights" tool provides access to a server-side BigQuery agent for richer data insights.
And more …
[Note: To use the conversational analytics extension you need to enable additional APIs. Refer to documentation for additional info.]
Here is an example journey with analytics extensions:
Explore and analyze your data, e.g.:
> find tables related to PyPi downloads

✦ I found the following tables related to PyPi downloads:

 * file_downloads: projects/bigquery-public-data/datasets/pypi/tables/file_downloads
 * distribution_metadata: projects/bigquery-public-data/datasets/pypi/tables/distribution_metadata

> Using bigquery-public-data.pypi.file_downloads show me top 10 downloaded pypi packages this month

✦ Here are the top 10 most downloaded PyPI packages this month:

 1. boto3: 685,007,866 downloads
 2. botocore: 531,034,851 downloads
 3. urllib3: 512,611,825 downloads
 4. requests: 464,595,806 downloads
 5. typing-extensions: 459,505,780 downloads
 6. certifi: 451,929,759 downloads
 7. charset-normalizer: 428,716,731 downloads
 8. idna: 409,262,986 downloads
 9. grpcio-status: 402,535,938 downloads
 10. aiobotocore: 399,650,559 downloads
Run deeper insights
Use "ask_data_insights" to trigger an agent on BigQuery (via the Conversational Analytics API) to answer your questions. The server-side agent is smart enough to gather additional context about your data and offer deeper insights into your questions.
You can go further and generate charts and reports by mixing BigQuery data with your local tools. Here’s a prompt to try:
”using bigquery-public-data.pypi.file_downloads can you forecast downloads for the last four months of 2025 for package urllib3? Please plot a chart that includes actual downloads for the first 8 months, followed by the forecast for the last four months”
Get started today!
Ready to level up your Gemini CLI extensions for our Data Cloud services? Read more in the extensions documentation. Check out our templates and start building your own extensions to share with the community!
Written by: Sarah Yoder, John Wolfram, Ashley Pearson, Doug Bienstock, Josh Madeley, Josh Murchie, Brad Slaybaugh, Matt Lin, Geoff Carstairs, Austin Larsen
Introduction
Google Threat Intelligence Group (GTIG) is tracking BRICKSTORM malware activity, which is being used to maintain persistent access to victim organizations in the United States. Since March 2025, Mandiant Consulting has responded to intrusions across a range of industry verticals, most notably legal services, Software as a Service (SaaS) providers, Business Process Outsourcers (BPOs), and technology companies. The value of these targets extends beyond typical espionage missions, potentially providing data to feed development of zero-days and establishing pivot points for broader access to downstream victims.
BRICKSTORM Scanner: Get the tool at https://github.com/mandiant/brickstorm-scanner.
We attribute this activity to UNC5221 and closely related, suspected China-nexus threat clusters that employ sophisticated capabilities, including the exploitation of zero-day vulnerabilities targeting network appliances. While UNC5221 has been used synonymously with the actor publicly reported as Silk Typhoon, GTIG does not currently consider the two clusters to be the same.
These intrusions are conducted with a particular focus on maintaining long-term stealthy access by deploying backdoors on appliances that do not support traditional endpoint detection and response (EDR) tools. The actor employs methods for lateral movement and data theft that generate minimal to no security telemetry. This, coupled with modifications to the BRICKSTORM backdoor, has enabled them to remain undetected in victim environments for 393 days, on average. Mandiant strongly encourages organizations to reevaluate their threat model for appliances and conduct hunt exercises for this highly evasive actor. We are sharing an updated threat actor lifecycle for BRICKSTORM-associated intrusions, along with specific and actionable steps organizations should take to hunt for and protect themselves from this activity.
Figure 1: BRICKSTORM targeting
Threat Actor Lifecycle
The actor behind BRICKSTORM employs sophisticated techniques to maintain persistence and minimize the visibility traditional security tools have into their activities. This section reviews techniques observed across multiple Mandiant investigations, with customer details sanitized.
Initial Access
A consistent challenge across Mandiant investigations into BRICKSTORM intrusions has been determining the initial intrusion vector. In many cases, the average dwell time of 393 days exceeded log retention periods and the artifacts of the initial intrusion were no longer available. Despite these challenges, a pattern in the available evidence points to the actor’s focus on compromising perimeter and remote access infrastructure.
In at least one case, the actor gained access by exploiting a zero-day vulnerability. Mandiant has identified evidence of this actor operating from several other edge appliances early in the lifecycle, but could not find definitive evidence of vulnerability exploitation. As noted in our previous blog post from April 2025, Mandiant has identified the use of post-exploitation scripts that have included a wide range of anti-forensics functions designed to obscure entry.
Establish Foothold
The primary backdoor used by this actor is BRICKSTORM, as previously discussed by Mandiant and others. BRICKSTORM includes SOCKS proxy functionality and is written in Go, which has wide cross-platform support. This is essential to support the actor’s preference to deploy backdoors on appliance platforms that do not support traditional Endpoint Detection and Response (EDR) tools. Mandiant has found evidence of BRICKSTORM on Linux and BSD-based appliances from multiple manufacturers. Although there is evidence of a BRICKSTORM variant for Windows, Mandiant has not observed it in any investigation. Appliances are often poorly inventoried, not monitored by security teams, and excluded from centralized security logging solutions. While BRICKSTORM has been found on many appliance types, UNC5221 consistently targets VMware vCenter and ESXi hosts. In multiple cases, the threat actor deployed BRICKSTORM to a network appliance prior to pivoting to VMware systems. The actor moved laterally to a vCenter server in the environment using valid credentials, which were likely captured by the malware running on the network appliances.
Our analysis of samples recovered from different victim organizations has found evidence of active development of BRICKSTORM. While the core functionality has remained, some samples are obfuscated using Garble and some carry a new version of the custom wssoft library. Mandiant recovered one sample of BRICKSTORM with a “delay” timer built-in that waited for a hard-coded date months in the future before beginning to beacon to the configured command and control domain. Notably, this backdoor was deployed on an internal vCenter server after the victim organization had begun their incident response investigation, demonstrating that the threat actor was actively monitoring and capable of rapidly adapting their tactics to maintain persistence.
As previously reported, BRICKSTORM deployments are often designed to blend in with the target appliance, with the naming convention and even the functionality of the sample being designed to masquerade as legitimate activity. Mandiant has identified samples using Cloudflare Workers and Heroku applications for C2, as well as sslip.io or nip.io to resolve directly to C2 IP addresses. From the set of samples we’ve recovered, there has been no reuse of C2 domains across victims.
Escalate Privileges
In one investigation, Mandiant analyzed a vCenter server and found the threat actor installed a malicious Java Servlet filter for the Apache Tomcat server that runs the web interface for vCenter. A Servlet Filter is code that runs every time the web server receives an HTTP request. Normally, installing a filter requires modifying a configuration file and restarting or reloading the application; however, the actor used a custom dropper that made the modifications entirely in memory, making it very stealthy and negating the need for a restart. The malicious filter, tracked by Mandiant as BRICKSTEAL, runs on HTTP requests to the vCenter web login Uniform Resource Identifiers (URIs) /web/saml2/sso/*. If present, it decodes the HTTP Basic authentication header, which may contain a username and password. Many organizations use Active Directory authentication for vCenter, which means BRICKSTEAL could capture those credentials. Often, users who log in to vCenter have a high level of privilege in the rest of the enterprise. Previously shared hardening guidance for vSphere includes steps that can mitigate the ability of BRICKSTEAL to capture usable credentials in this scenario, such as enforcement of multi-factor authentication (MFA).
VMware vCenter is an attractive target for threat actors because it acts as the management layer for the vSphere virtualization platform and can take actions on VMs such as creating, snapshotting, and cloning. In at least two cases, the threat actor used their access to vCenter to clone Windows Server VMs for key systems such as Domain Controllers, SSO Identity Providers, and secret vaults. This is a technique that other threat actors have used. With a clone of the virtual machine, the threat actor can mount the filesystem and extract files of interest, such as the Active Directory Domain Services database (ntds.dit). Although these Windows Servers likely have security tools installed on them, the threat actor never powers on the clone so the tools are not executed. The following example shows vCenter VPXD logs of the threat actor using the local vSphere Administrator account to clone a VM.
2025-04-01 03:37:40 [vim.event.TaskEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [Task: VirtualMachine.clone]
2025-04-01 03:37:49 [vim.event.VmBeingClonedEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<same unique identifier>] [Cloning DC01 on esxi01, in <vCenter inventory object> to DC01-clone on esxi02, in <vCenter inventory object>]
2025-04-01 03:42:07 [vim.event.VmClonedEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [DC01 cloned to DC01-clone on esxi02, in <vCenter inventory object>]
2025-04-01 04:05:40 [vim.event.TaskEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [Task: VirtualMachine.destroy]
2025-04-01 04:05:47 [vim.event.VmRemovedEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [Removed DC01-Clone on esxi02 from <vCenter inventory object>]
In one instance the threat actor used legitimate server administrator credentials to repeatedly move laterally to a system running Delinea (formerly Thycotic) Secret Server. The forensic artifacts recovered from the system were consistent with the execution of a tool, such as a secret stealer, to automatically extract and decrypt all credentials stored by the Secret Server application.
Move Laterally
Typically, at least one instance of BRICKSTORM would be the primary source of hands-on keyboard activity, with two or more compromised appliances serving as backups. To install BRICKSTORM, the actor used legitimate credentials to connect to the appliance, often with SSH. In one instance the actor used credentials known to be stored in a password vault they previously accessed. In another instance they used credentials known to be stored in a PowerShell script the threat actor previously viewed. In multiple cases the actor logged in to either the ESXi web-based UI or the vCenter Appliance Management Interface (VAMI) to enable the SSH service so they could connect and install BRICKSTORM. The following are example VAMI access events that show the threat actor connecting to VAMI and making changes to the SSH settings for vCenter.
To maintain access to victim environments, the threat actor modified the init.d, rc.local, or systemd files to ensure BRICKSTORM started on appliance reboot. In multiple cases, the actor used the sed command line utility to modify legitimate startup scripts to launch BRICKSTORM. The following are a few example sed commands executed by the actor on vCenter.
sed -i 's/export TEXTDOMAIN=vami-lighttp/export TEXTDOMAIN=vami-lighttp\n\/path\/to\/brickstorm/g' /opt/vmware/etc/init.d/vami-lighttp
sed -i '$aSETCOLOR_WARNING="echo -en `/path/to/brickstorm`\033[0;33m"' /etc/sysconfig/init
The threat actor has also created a web shell tracked by Mandiant as SLAYSTYLE on vCenter servers. SLAYSTYLE, tracked by MITRE as BEEFLUSH, is a JavaServer Pages (JSP) web shell that functions as a backdoor. It is designed to receive and execute arbitrary operating system commands passed through an HTTP request. The output from these commands is returned in the body of the HTTP response.
Complete Mission
A common theme across investigations is the threat actor’s interest in the emails of key individuals within the victim organization. To access the email mailboxes of target accounts, the threat actor made use of Microsoft Entra ID Enterprise Applications with mail.read or full_access_as_app scopes. Both scopes allow the application to access mail in any mailbox. In some cases, the threat actor targeted the mailboxes of developers and system administrators while in other cases, they targeted the mailboxes of individuals involved in matters that align with PRC economic and espionage interests.
When the threat actor exfiltrated files from the victim environment, they used the SOCKS proxy feature of BRICKSTORM to tunnel traffic from their workstation and directly access systems and web applications of interest. In multiple cases the threat actor used legitimate credentials to log in to the web interface for internal code stores and download repositories as ZIP archives. In other cases the threat actor browsed to specific directories and files on remote machines by specifying Windows Universal Naming Convention (UNC) paths.
In several cases the BRICKSTORM samples deployed by the threat actor were removed from compromised systems. In these cases, the presence of BRICKSTORM was identified through forensic analysis of backup images.
Hunting Guidance
Mandiant has previously discussed the diminishing usefulness of atomic IOCs and the need to adopt TTP-based hunting. Across BRICKSTORM investigations we have not observed the reuse of C2 domains or malware samples, which, coupled with high operational security, means these indicators quickly expire or are never observed at all. Therefore, a TTP-based hunting approach is not only an ideal practice, but a necessity to detect patterns of attack that are unlikely to be detected by traditional signature-based defenses. The following is a checklist of the minimal set of hunts Mandiant recommends organizations conduct to search for BRICKSTORM and related activities.
| Step | Hunt | Data Sources |
|------|------|--------------|
| 0 | Create or update asset inventory that includes edge devices and other appliances | N/A |
| 1 | File and backup scan for BRICKSTORM | Appliance file system, backups |
| 2 | Internet traffic from edge devices and appliances | Firewall connection logs, DNS logs, IDS/IPS, netflow |
| 3 | Access to Windows servers and desktops from appliances | EDR telemetry, Security Event Logs, Terminal Service Logs, Windows UAL |
| 4 | Access to credentials and secrets | Windows Shellbags, EDR telemetry |
| 5 | Access to M365 mailboxes using Enterprise Application | M365 UAL |
| 6 | Cloning of sensitive virtual machines | vSphere VPXD logs |
| 7 | Creation of local vCenter and ESXi accounts | VMware audit events |
| 8 | SSH enablement on vSphere platform | VMware audit events, VAMI logs |
| 9 | Rogue VMs | VMware audit events, VM inventory reports |
Create or Update Asset Inventory
Foundational to the success of any threat hunt is an asset inventory that includes devices not covered by the standard security tool stack, such as edge devices and other appliances. Because these appliances lack support for traditional security tools, an inventory is critical for developing effective compensating controls and detections. It is especially important to track the management interface addresses of these appliances, as these are the interfaces from which malware beaconing and threat actor commands will egress.
Mandiant recommends organizations take a multi-step approach to building or updating this inventory:
Known knowns: Begin with the appliance classes that all organizations use: firewalls, VPN concentrators, virtualization platforms, conferencing systems, badging, and file storage.
Known unknowns: Work across teams to brainstorm appliance classes that may be more specialized to your organization, but the security organization likely lacks visibility into.
Unknown unknowns: These are the appliances that were supposed to be decommissioned but weren’t, sales POVs, and others. Consider using network visibility tools or your existing EDR to scan for “live” IP addresses that do not show in your EDR reports. This has the added benefit of identifying unmanaged devices that should have EDR but don’t.
Figure 2: Asset inventory
File and Backup Scan for BRICKSTORM
YARA rules have proven to be the most effective method for detecting BRICKSTORM binaries on appliances. We are sharing relevant YARA rules in the appendix section of this post. YARA can be difficult to run at scale, but some backup solutions provide the ability to run YARA across the backup data store. Mandiant is aware of multiple customers who have identified BRICKSTORM through this method.
To aid organizations in hunting for BRICKSTORM activity in their environments, Mandiant released a scanner script, which can run on appliances and other Linux or BSD-based systems.
Internet Traffic from Edge Devices and Appliances
Use the inventory of appliance management IP addresses to hunt for evidence of malware beaconing in network logs. In general, appliances should not communicate with the public Internet from management IP addresses except to download updates and send crash analytics to the manufacturer.
Established outbound traffic to domains or IP addresses not controlled by the appliance manufacturer should be regarded as very suspicious and warranting forensic review of the appliance. BRICKSTORM can use DNS over HTTP (DoH), which should be similarly rare when sourced from appliance management IP addresses.
Access to Windows Systems from Appliances
The threat actor primarily accessed Windows machines (both desktops and servers) using type 3 (network) logins, although in some cases the actor also established RDP sessions. Appliances should rarely log in to Windows desktops or servers and any connections should be treated as suspicious. Some examples of false positives could include VPN appliances using a known service account to connect to a domain controller in order to perform LDAP lookups and authenticated vulnerability scanners using a well-known service account.
In addition to EDR telemetry, Terminal Services logs, and Security event logs, defenders should obtain and parse the Windows User Access Log (UAL). The UAL is stored on Windows Servers inside the directory Windows\System32\LogFiles\Sum and can be parsed using open-source tools such as SumECmd. This log source records attempted authenticated connections to Windows systems and often retains artifacts going back much longer than typical Windows event logs. Note that this log source includes successful and unsuccessful logins, but is still useful to identify suspicious activity sourced from appliances.
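For example, a hedged sketch of parsing a collected Sum directory with SumECmd (paths are illustrative):

# Copy the Sum directory off the server first, then parse it to CSV
SumECmd.exe -d C:\Cases\ServerA\Sum --csv C:\Cases\ServerA\out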
Access to Credentials and Secrets
Use the forensic capabilities of EDR tools to acquire Windows Shellbags artifacts from Windows workstations and servers. Shellbags records folder paths that are browsed by a user with the Windows Explorer application. Use an open-source parser to extract the relevant data and look for patterns of activity that are suspicious:
Access to folder paths where the initiating user is a service account, especially service accounts that are unfamiliar or rarely used
File browsing activity sourced from servers that include a Windows Universal Naming Convention (UNC) path that points to a workstation (e.g., \\bobwin7.corp.local\browsingpath)
File browsing activity to folder paths that contain credential data, such as:
AppData locations used to store session tokens (e.g., Users\<username>\.azure)
Windows credential vault (%localappdata%\Microsoft\Credentials)
Data Protection API (DPAPI) keys (%appdata%\Microsoft\Protect\<SID>)
Access to M365 Mailboxes using Enterprise Application
Mandiant has observed this actor use common techniques to conduct bulk email access and exfiltration from Microsoft 365 Exchange Online. Organizations should follow our guidance outlined in our APT29 whitepaper to hunt for these techniques. Although the white paper specifically references APT29, these techniques have become widely used by many groups. In multiple investigations the threat actor used a Microsoft Entra ID Enterprise Application with mail.read or full_access_as_app scopes to access mailboxes of key individuals in the victim organization.
To hunt for this activity, we recommend a phased approach:
Enumerate the Enterprise Applications and Application Registrations with graph permissions that can read all mail.
For each application, validate that there is at least one secret or certificate configured for it. Record the Application (client) ID
Conduct a free text search against the Unified Audit Log or the OfficeActivity table in Sentinel for the client IDs from step 2 (see the sketch after this list). This will return the MailItemsAccessed events that recorded the application accessing mail.
For each application analyze the source IP addresses and user-agent strings for discrepancies. Legitimate usage of the applications should occur from well-defined IP addresses. Additionally, look for focused interest in key personnel mailboxes across multiple days.
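A minimal sketch of step 3 in Sentinel (the client ID is a placeholder):

// Free-text search for MailItemsAccessed events recorded for a given app
OfficeActivity
| where Operation == "MailItemsAccessed"
| search "00000000-0000-0000-0000-000000000000"   // Application (client) ID from step 2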
When accessing M365 and other internet-facing services the actor has used multiple commercial VPN and proxy providers. Mandiant has found evidence of the threat actor using PIA, NordVPN, Surfshark, VPN Unlimited, and PrivadoVPN, although there is no reason for these to be the only solutions used. There is also evidence that this actor has access to a purpose-built obfuscation network built from compromised small office/home office routers. Mandiant has no knowledge of how these routers are being compromised. The exit nodes for commercial VPNs and obfuscation networks change rapidly, and sharing atomic indicators for hunting purposes is unlikely to yield results. Instead, identify the key individuals in the organization, with respect to the organization's vertical and the likely goals of the threat actor. Fetch MailItemsAccessed logs for those mailboxes for the last year or as long as retention allows. Analyze the SessionID values of the log events and look for IDs that span multiple IP addresses where the IP addresses are not in the user's typical geographic location.
Cloning of Sensitive Virtual Machines
On VMware vCenter servers, VPXD logs contain valuable information for VM management related tasks such as clone events, powering on and off a VM, and creating snapshots. The threat actor often used the VSPHERE.LOCAL\Administrator account when cloning VMs and targeted VMs that would contain credentials such as password vaults and domain controllers. The threat actor would delete the cloned VM shortly after cloning, and primarily operated between the hours of 01:00 and 10:00 UTC. Investigators should search vCenter VPXD logs for activity that matches the aforementioned criteria and confirm if the cloning activity was intended or not.
Creation of Local vCenter and ESXi Accounts
Mandiant identified evidence the threat actor created a new local account to install BRICKSTORM and then removed the account after they were done. The following logs show the threat actor using the local Administrator account to create a new local account and add it to the BashShellAdministrators group. The threat actor established an SSH connection from a compromised appliance to vCenter using the newly created account and installed the BRICKSTORM backdoor on vCenter. Shortly after, the threat actor deleted the account. Investigators should review audit logs in /var/log/audit/sso-events/audit_events.log for the creation and deletion of local accounts and validate their purpose. In one instance, the threat actor named the account with a similar naming convention as a local service account used for backups on vCenter.
2025-04-01T06:45:32 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:45:32 GMT","description":"Creating local person user '<account_name>' with details ('','','','','<account_name>@vsphere.local')","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:45:55 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:45:55 GMT","description":"Adding users '[{Name: <account_name>, Domain: vsphere.local}]' to local group 'Administrators'","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:46:23 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:46:23 GMT","description":"Updating local group 'SystemConfiguration.BashShellAdministrators' details ('Access bash shell and manage local users on nodes').","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:52:03 <vcenter_hostname> sshd[36952]: Postponed keyboard-interactive/pam for <account_name>@vsphere.local from <compromised_system>
2025-04-01T06:52:30 <vcenter_hostname> sudo: pam_unix(sudo:session): session opened for user root
2025-04-01T06:53:39 Creation of BRICKSTORM on vCenter
2025-04-01T06:56:18 <vcenter_hostname> sudo: pam_unix(sudo:session): session closed for user root
2025-04-01T06:56:25 <vcenter_hostname> sshd[36952]: pam_unix(sshd:session): session closed for user <account_name>@vsphere.local
2025-04-01T06:56:57 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:56:57 GMT","description":"Removing principals '[{Name: <account_name>, Domain: vsphere.local}]' from local group 'Administrators'","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:58:12 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:58:12 GMT","description":"Deleting principal '<account_name>'","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
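Building on the log format shown above, a simple review script can pair account creations with deletions and flag short-lived accounts. This is a sketch that assumes each line is a timestamp followed by a JSON payload, as in the excerpt:

import json
import re
from datetime import datetime

# Pair local account creations with deletions in the SSO audit log and
# flag accounts that lived only briefly, a pattern seen in this campaign.
LINE = re.compile(r"^(\S+)\s+(\{.*\})$")
created, flagged = {}, []

with open("audit_events.log") as f:
    for raw in f:
        m = LINE.match(raw.strip())
        if not m:
            continue
        ts = datetime.fromisoformat(m.group(1))
        desc = json.loads(m.group(2)).get("description", "")
        if desc.startswith("Creating local person user"):
            created[desc.split("'")[1]] = ts
        elif desc.startswith("Deleting principal"):
            name = desc.split("'")[1]
            if name in created:
                flagged.append((name, created[name], ts - created[name]))

for name, start, lifetime in flagged:
    print(f"{name}: created {start}, deleted after {lifetime}")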
SSH Enablement on ESXi and vCenter
For ESXi servers, monitoring should be set up for SSH logins using local accounts. In most organizations, legitimate direct access to ESXi hosts over SSH is relatively rare, and in many cases the SSH server is disabled by default. Write rules to alert on log events when SSH is enabled for a vSphere platform appliance.
Rogue VMs
Organizations should review VMware audit events that track the creation and deletion of new VMs, particularly those using non-standard ISO images and operating systems. Audit events may also record the threat actor downloading archived ISO images to the datastore volumes used by vSphere.
Hardening Guidance
It is crucial to maintain an up-to-date inventory of appliances and other devices in the network that do not support the standard security tool stack. Any device in that inventory, whether internal or internet-facing, should be configured to follow a principle of least access.
Internet access: Appliances should not have unrestricted access to the internet. Work with your vendors or monitor your firewall logs to lock down internet access to only those domains or IP addresses that the appliance requires to function properly.
Internal network access: Appliances exposed to the internet should not have unrestricted access to internal IP address space. The management interface of most appliances does not need to establish connections to internal IP addresses. Work with the vendor to understand specific needs, such as LDAP queries to verify user attributes for VPN logins.
Mandiant has previously published guidance to secure the vSphere platform from threat actors. We recommend you follow the guidance, especially the forwarding of logs to a central SIEM, enabling vSphere lockdown mode, enforcing MFA for web logins, and enforcing the execInstalledOnly policy.
Organizations should assess and improve the isolation of any credential vaulting systems. In many cases, if a threat actor is able to gain access to the underlying operating system, any protected secrets can be exposed. Servers hosting credential vaulting applications should be considered Tier 0 systems and have strict access controls applied to them. Mandiant recommends organizations work with their vendors to adopt secure software practices, such as storing encryption keys in the Trusted Platform Module (TPM) of the server.
Outlook and Implications
Recent intrusion operations tied to BRICKSTORM likely serve an array of objectives, including geopolitical espionage, access operations, and intellectual property (IP) theft to enable exploit development. Based on evidence from recent investigations, the targeting of the US legal space is primarily to gather information related to US national security and international trade. Additionally, GTIG assesses with high confidence that the objective of BRICKSTORM targeting SaaS providers is to gain access to downstream customer environments or the data SaaS providers host on their customers' behalf. The targeting of technology companies presents an opportunity to steal valuable IP to further the development of zero-day exploits.
Acknowledgements
This analysis would not have been possible without the assistance from across Google Threat Intelligence Group, Mandiant Consulting and FLARE. We would like to specifically thank Nick Simonian from GTIG Research and Discovery (RAD).
Indicators of Compromise
The following indicators of compromise are available in a Google Threat Intelligence (GTI) collection. Note that Mandiant has not observed instances where the threat actor reused a malware sample and hunting for the exact indicators is unlikely to yield results.
Public sector agencies are under increasing pressure to operate with greater speed and agility, yet are often hampered by decades of legacy data. Critical information, essential for meeting tight deadlines and fulfilling mandates, frequently lies buried within vast collections of unstructured documents. This challenge of transforming institutional knowledge into actionable insight is a common hurdle on the path to modernization.
The Indiana Department of Transportation (INDOT) recently faced this exact scenario. To comply with Governor Mike Braun’s Executive Order 25-13, all state agencies were given 30 days to complete a government efficiency report, mapping all statutory responsibilities to their core purpose. For INDOT, the critical information needed to complete this report was buried in a mix of editable and static documents – decades of policies, procedures, and manuals scattered across internal sites. A manual review was projected to take hundreds of hours, making the deadline nearly impossible. This tight deadline necessitated an innovative approach to data processing and report generation.
Recognizing a complex challenge as an opportunity for transformation, INDOT’s leadership envisioned an AI-powered solution. The agency chose to build its pilot program on its existing Google Cloud environment, which allowed it to deploy Gemini’s capabilities immediately. By taking this strategic approach, the team was able to turn a difficult compliance requirement into a powerful demonstration of government efficiency.
From manual analysis to an AI-powered pilot in one week
Operating in an agile week-long sprint, INDOT’s team built an innovative workflow centered on Retrieval-Augmented Generation (RAG). This technique enhances generative AI models by grounding them in specific, private data, allowing them to provide accurate, context-aware answers.
The technical workflow began with data ingestion and pre-processing. The team quickly developed Python scripts to perform “Extract, Transform, Load” (ETL) on the fly, scraping internal websites for statutes and parsing text from numerous internal files. This crucial step cleaned and structured the data for the next stage: indexing. Using Vertex AI Search, they created a robust, searchable vector index of the curated documents, which formed the definitive knowledge base for the generative model.
With the data indexed, the RAG engine in Vertex AI could efficiently retrieve the most relevant document snippets in response to a query. This contextual information was then passed to Gemini via Vertex AI. This two-step process was critical, as it ensured the model’s responses were based solely on INDOT’s official documents, not on public internet data.
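A minimal sketch of that two-step flow using the Vertex AI SDK is shown below. The retrieval helper, project ID, and model name are illustrative stand-ins for INDOT's actual Vertex AI Search index and configuration:

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project", location="us-central1")  # assumed IDs

def retrieve_snippets(query: str) -> list[str]:
    # Stand-in for step 1: query the Vertex AI Search index built from
    # the curated documents and return the most relevant snippets.
    raise NotImplementedError("wire up your search index here")

def answer(query: str) -> str:
    # Step 2: ground Gemini in the retrieved context only.
    snippets = retrieve_snippets(query)
    prompt = (
        "Answer using ONLY the context below.\n\n"
        "Context:\n" + "\n---\n".join(snippets) +
        f"\n\nQuestion: {query}"
    )
    model = GenerativeModel("gemini-1.5-pro")  # model name is illustrative
    return model.generate_content(prompt).text

Constraining the prompt to retrieved snippets is what keeps the model's answers anchored to the agency's official documents.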
Setting a new standard for government efficiency
Within an intensive, week-long effort, the team delivered a functioning pilot that generated draft reports across nine INDOT divisions with an impressive 98% fidelity – a measure of how accurately the new reports reflected the information in the original source documents. This innovative approach saved an estimated 360 hours of manual effort, freeing agency staff from tedious data collection to focus on the high-value work of refining and validating the reports. The solution enabled INDOT to become the largest Indiana state agency to submit its government efficiency report on time.
The government efficiency report was a novel experience for many on our executive team, demonstrating firsthand the transformative potential of large language models like Gemini. This project didn’t just help us meet a critical deadline; it paved the way for broader executive support of AI initiatives that will ultimately enhance our ability to serve Indiana’s transportation needs.
Alison Grand
Deputy Commissioner and Chief Legal Counsel, Indiana Department of Transportation
The AI-generated report framework was so effective that it became the official template for 60 other state agencies, powerfully demonstrating a responsible use of AI and building significant trust in INDOT as a leader in statewide policy. By building a scalable, secure RAG system on Google Cloud, INDOT not only met its tight deadline but also created a reusable model for future innovation, accelerating its mission to better serve the people of Indiana.
Join us at Google Public Sector Summit
To see Google’s latest AI innovations in action, and learn more about how Google Cloud technology is empowering state and local government agencies, register to attend the Google Public Sector Summit taking place on October 29 in Washington, D.C.
Editor’s note: Today’s post is by Syed Mohammad Mujeeb, CIO, and Arsalan Mazhar, Head of Infrastructure, of JS Bank, a prominent and rapidly growing midsize commercial bank in Pakistan with a strong national presence of over 293 branches. JS Bank, always at the forefront of technology, deployed a Google stack to modernize operations while maintaining security and compliance.
Snapshot:
JS Bank’s IT department, strained across 293 branches, was hindered by endpoint instability, a complex security stack, and a lack of device standardization. This reactive environment limited their capacity for innovation.
Through a strategic migration to a unified Google ecosystem—including ChromeOS, Google Workspace, and Google Cloud—the bank transformed its operations. The deployment of 1,500 Chromebooks resulted in a more reliable, secure, and manageable IT infrastructure. This shift cut device management time by 40% and halved daily support tickets, empowering the IT team to pivot from routine maintenance to strategic initiatives like digitization and AI integration.
Reduced IT burden: device management time cut by 40%
Daily support tickets were halved, freeing up IT time for strategic, value-added projects
Nearly 90% endpoint standardization, creating a manageable and efficient IT architecture
A simplified, powerful security posture with the built-in protection of ChromeOS and Google Workspace
At JS Bank, we pride ourselves on being technology pioneers, always bringing new technology into banking. Our slogan, “Barhna Hai Aagey,” means we are always moving onward and upward. But a few years ago, our internal IT infrastructure was holding us back. We researched and evaluated different solutions, but found the combination of ChromeOS and Google Workspace to be a perfect fit for today’s technology landscape, which is surrounded by cyber threats. When we shifted to a unified Google stack, we paved the way for a future driven by AI, innovation, and operational excellence.
Before our transformation, our legacy solution was functional, but it was a constant struggle. Our IT team was spread thin across our 293 branches, dealing with a cumbersome setup that layered numerous security tools, including antivirus and anti-malware, on top of each other. Endpoints crashed frequently, and with a mixture of older devices and some devices running Ubuntu, we lacked the standardization needed for true efficiency and security. It was a reactive environment, and our team was spending too much time on basic fixes rather than driving innovation.
We decided to make a strategic change to align with our bank’s core mission of digitization, and that meant finding a partner with an end-to-end solution. We chose Google because we saw the value in their integrated ecosystem and anticipated the future convergence of public and private clouds. We deployed 1,500 Chromeboxes across branches and fully transitioned to Google Workspace.
Today, we have achieved nearly 90% standardization across our endpoints with Chromebooks and Chromeboxes, all deeply integrated with Google Workspace. This shift has led to significant improvements in security, IT management, and employee productivity. The built-in security features of the Google ecosystem provide peace of mind, especially during periods of heightened cybersecurity threats, as we trust the platform to inherently protect us from cyberattacks. It has also simplified security protocols in branches, eliminating the need for multiple antivirus and anti-malware tools and giving our security team confidence. Moreover, the lightweight nature of the Google solutions ensures applications are available from anywhere, anytime, and simplifies deployments in branches.
To strengthen security across all corporate devices, we made Chrome our required browser. This provides foundational protections like Safe Browsing to block malicious sites, browser reporting, and password reuse alerts. For 1,500 users, we adopted Chrome Enterprise Premium. This provides features like zero-trust enterprise security, centralized management, data loss prevention (DLP) to protect against accidental data loss, secure access to applications with context-aware access restrictions, and scanning of high-risk files.
With Google, our IT architecture is now manageable. The team’s focus has fundamentally shifted from putting out fires to supporting our customers and building value. We’ve seen a change in our own employees, too; the teams who once managed our legacy systems are now eager to work within the Google ecosystem. From an IT perspective, the results are remarkable: the team required to manage the ChromeOS environment has shrunk to 40% of its previous size. Daily support tickets have been halved, freeing IT staff from hardware troubleshooting to focus on more strategic application support and enhancing their job satisfaction and career development. Our IT staff now enjoy less taxing weekends thanks to reduced work hours and a lighter operational burden.
Our “One Platform” vision comes to life
We are simplifying our IT architecture using Google’s ecosystem to achieve our “One Platform” vision. As a Google shop, we’ve deployed Chromebooks enterprise-wide and unified user access with a “One Window” application and single sign-on. Our “One Data” platform uses an Elasticsearch data lake on Google Cloud, now being connected to Google’s LLMs. This integrated platform provides our complete AI toolkit, from Gemini and NotebookLM to upcoming Document and Vision AI. By exploring Vertex AI, we are on track to become the region’s most technologically advanced bank by 2026.
Our journey involved significant internal change, but by trusting the process and our partners, we have built a foundation that is not only simpler and more secure but is also ready for the next wave of innovation. We are truly living our mission of moving onward and upward.
Today, we are announcing the availability of Route 53 Resolver Query Logging in Asia Pacific (New Zealand), enabling you to log DNS queries that originate in your Amazon Virtual Private Cloud (Amazon VPC). With query logging enabled, you can see which domain names have been queried, the AWS resources from which the queries originated – including source IP and instance ID – and the responses that were received.
Route 53 Resolver is the Amazon-provided DNS server that is available by default in all Amazon VPCs. Route 53 Resolver responds to DNS queries from AWS resources within a VPC for public DNS records, Amazon VPC-specific DNS names, and Amazon Route 53 private hosted zones. With Route 53 Resolver Query Logging, customers can log DNS queries and responses for queries originating from within their VPCs, whether those queries are answered locally by Route 53 Resolver, resolved over the public internet, or forwarded to on-premises DNS servers via Resolver Endpoints. You can share your query logging configurations across multiple accounts using AWS Resource Access Manager (RAM). You can also choose to send your query logs to Amazon S3, Amazon CloudWatch Logs, or Amazon Data Firehose.
There is no additional charge to use Route 53 Resolver Query Logging, although you may incur usage charges from Amazon S3, Amazon CloudWatch, or Amazon Data Firehose. To learn more about Route 53 Resolver Query Logging or to get started, visit the Route 53 Resolver product page or the Route 53 documentation.
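For example, a query logging configuration can be created and associated with a VPC through the Route 53 Resolver API. The boto3 sketch below uses illustrative resource names and ARNs:

import boto3

resolver = boto3.client("route53resolver")

# Create a query log config that delivers logs to a CloudWatch Logs group
# (an S3 bucket or Data Firehose stream ARN works here as well).
config = resolver.create_resolver_query_log_config(
    Name="vpc-dns-query-logs",
    DestinationArn="arn:aws:logs:REGION:123456789012:log-group:dns-queries",  # illustrative
    CreatorRequestId="example-request-1",
)["ResolverQueryLogConfig"]

# Associate the config with the VPC whose queries should be logged.
resolver.associate_resolver_query_log_config(
    ResolverQueryLogConfigId=config["Id"],
    ResourceId="vpc-0123456789abcdef0",  # illustrative VPC ID
)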
Amazon GameLift Servers now supports a new AWS Local Zone in Dallas, Texas (us-east-1-dfw-2). You can use this Local Zone to deploy GameLift Fleets with EC2 C6gn, C6i, C6in, M6g, M6i, M6in, M8g, and R6i instances. Local Zones place AWS services closer to major player population and IT centers where no AWS region exists. From the Amazon GameLift Servers Console, you can enable the Dallas Local Zone and add it to your fleets, just as you would with any other Region or Local Zone.
With this launch, game studios can run latency-sensitive workloads such as real-time multiplayer gaming, responsive AR/VR experiences, and competitive tournaments closer to players in the Dallas metro area. Local Zones help deliver single-digit millisecond latency, giving players a smoother, more responsive experience by reducing network distance between your servers and players.
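As a sketch, the Dallas Local Zone can also be added to an existing fleet programmatically with the CreateFleetLocations API (the fleet ID below is illustrative):

import boto3

gamelift = boto3.client("gamelift")

# Add the Dallas Local Zone as a remote location on an existing fleet.
gamelift.create_fleet_locations(
    FleetId="fleet-0123456789abcdef0",           # illustrative fleet ID
    Locations=[{"Location": "us-east-1-dfw-2"}],  # the new Dallas Local Zone
)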
Today, AWS eliminated the networking bandwidth burst duration limitations for Amazon EC2 I7i and I8g instances on sizes larger than 4xlarge. This update doubles the network bandwidth available at all times for these instance sizes. Previously, they had a baseline bandwidth and used a network I/O credit mechanism to burst beyond it on a best-effort basis; now they can sustain their maximum performance indefinitely. With this improvement, customers running memory- and network-intensive workloads on larger instance sizes can consistently maintain their maximum network bandwidth without interruption, delivering more predictable performance for applications that require sustained high-throughput network connectivity. This change applies only to instance sizes larger than 4xlarge; smaller instances will continue to operate with their existing baseline and burst bandwidth configurations.
Amazon EC2 I7i and I8g instances are designed for I/O-intensive workloads that require rapid, real-time data access from storage. These instances excel at handling transactional, real-time, distributed databases, including MySQL, PostgreSQL, and HBase, and NoSQL solutions like Aerospike, MongoDB, ClickHouse, and Apache Druid. They’re also optimized for real-time analytics platforms such as Apache Spark, data lakehouses, and AI LLM pre-processing for training. These instances have up to 1.5 TiB of memory and 45 TB of local instance storage. They deliver up to 100 Gbps of network bandwidth and 60 Gbps of dedicated bandwidth for Amazon Elastic Block Store (EBS).
Amazon EC2 Auto Scaling now enables customers to force cancel instance refreshes immediately, without waiting for in-progress instance launches or terminations to complete. This enhancement provides greater control over Auto Scaling group (ASG) updates, especially during emergency situations such as when needing to rapidly roll forward to a new application deployment when the current deployment is causing service disruptions. Customers can now quickly abort ongoing deployments and immediately start new instance refreshes when needed.
Instance refreshes are used to update instances within an ASG, typically when configuration changes require instance replacement. To use this feature, set the WaitForTransitioningInstances parameter to false when calling the CancelInstanceRefresh API. This enables faster cancellation of the instance refresh by bypassing the wait for any pending instance activities, such as instance lifecycle hooks.
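A minimal boto3 sketch of a force cancel followed by a fresh refresh, per the parameter named above (the ASG name is illustrative):

import boto3

autoscaling = boto3.client("autoscaling")

# Force cancel: return immediately instead of waiting for in-progress
# launches/terminations or lifecycle hooks to finish.
autoscaling.cancel_instance_refresh(
    AutoScalingGroupName="my-asg",  # illustrative ASG name
    WaitForTransitioningInstances=False,
)

# A new refresh (e.g., rolling forward to a fixed deployment) can then
# be started right away.
autoscaling.start_instance_refresh(AutoScalingGroupName="my-asg")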
This feature is available in all AWS Regions, including the AWS GovCloud (US) Regions. To get started, please visit the Amazon EC2 Auto Scaling User Guide.
Amazon DataZone is now available in AWS Asia Pacific (Hong Kong), Asia Pacific (Malaysia) and Europe (Zurich) Regions.
Amazon DataZone is a fully managed data management service to catalog, discover, analyze, share, and govern data between data producers and consumers in your organization. With Amazon DataZone, data producers populate the business data catalog with structured data assets from AWS Glue Data Catalog and Amazon Redshift tables. Data consumers search and subscribe to data assets in the data catalog and share with other collaborators working on the same business use case. Consumers can analyze their subscribed data assets with tools—such as Amazon Redshift or Amazon Athena query editors—that are directly accessed from the Amazon DataZone portal. The integrated publishing and subscription workflow provides access to auditing capabilities across projects.
For more information on AWS Regions where Amazon DataZone is available, see supported regions.
Additionally, Amazon DataZone powers governance in the next generation of Amazon SageMaker, which simplifies the discovery, governance, and collaboration of data and AI across your lakehouse, AI models, and GenAI applications. With Amazon SageMaker Catalog (built on Amazon DataZone), users can securely discover and access approved data and models using semantic search with generative AI–created metadata, or simply ask Amazon Q Developer in natural language to find their data. For more information on AWS Regions where the next generation of SageMaker is available, see supported regions. To learn more about the next generation of SageMaker, visit the product webpage.
As a Python library for accelerator-oriented array computation and program transformation, JAX is widely recognized for its power in training large-scale AI models. But its core design as a system for composable function transformations unlocks its potential in a much broader scientific landscape. Following our recent post on solving high-order partial differential equations, or PDEs, we’re excited to highlight another frontier where JAX is making a significant impact: AI-driven protein engineering.
I recently spoke with April Schleck and Nick Boyd, two co-founders of Escalante, a startup using AI to train models that predict the impact of drugs on cellular protein expression levels. Their story is a powerful illustration of how JAX’s fundamental design choices — especially its functional and composable nature — are enabling researchers to tackle multi-faceted scientific challenges in ways that are difficult to achieve with other frameworks.
A new approach to protein design
April and Nick explained that Escalante’s long-term vision is to train machine learning (ML) models that can design drugs from the ground up. Unlike fields like natural language processing, which benefit from vast amounts of public data, biology currently lacks the specific datasets needed to train models that truly understand cellular systems. Thus, their immediate focus is to solve this data problem by using current AI tools to build new kinds of lab assays that can generate these massive, relevant biological datasets.
This short-term mission puts them squarely in the field of protein engineering, which they described as a complex, multi-objective optimization problem. When designing a new protein, they aren’t just optimizing one thing: the protein needs to bind to a specific target while also being soluble, thermostable, and expressible in bacteria. Each of these properties is predicted by a different ML model (see figure below), ranging from complex architectures like AlphaFold 2 (implemented in JAX) to simpler, custom-trained models. Their core challenge is to combine all of these objectives into a single optimization loop.
This is where, as April put it, “JAX became a game-changer for us.” She noted that while combining many AI models might be theoretically possible in other frameworks, JAX’s functional nature makes it incredibly natural to integrate a dozen different ones into a single loss function (see figure below).
Easily combine multiple objectives represented by different loss terms and models
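A toy sketch of this pattern in JAX, using simple differentiable stand-ins in place of the real AlphaFold-, ESM-, Boltz-1-, and ProteinMPNN-style models:

import jax
import jax.numpy as jnp

# Stand-in scorers over a soft sequence (length x 20 logits).
def af_loss(seq):                    # linear term 1 (AlphaFold-style)
    return jnp.sum(seq ** 2)

def esm_pll(seq):                    # linear term 2 (ESM-style pseudo-LL)
    return -jnp.sum(jax.nn.log_softmax(seq, axis=-1).max(axis=-1))

def boltz_fold(seq):                 # serial composition: fold first ...
    return jnp.tanh(seq @ seq.T)

def mpnn_likelihood(structure):      # ... then score the predicted fold
    return -jnp.mean(structure)

@jax.jit
def total_loss(seq):
    # Linear combination of independent terms plus a serially composed term.
    return (1.0 * af_loss(seq)
            + 0.5 * esm_pll(seq)
            + mpnn_likelihood(boltz_fold(seq)))

grad_fn = jax.grad(total_loss)  # the whole graph differentiates end to end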
In the above code, Nick explained, models are combined in at least two different ways: some loss terms are linearly combined (e.g., the AF loss plus the ESM pseudo-log-likelihood loss), while other terms compose models serially (e.g., in the first Boltz-1 term, the sequence is first folded with Boltz-1, and the sequence likelihood is then computed after inverse folding with another model, ProteinMPNN).
To make this work, they embraced the JAX ecosystem, even translating models from PyTorch themselves — a prime example being their JAX translation of the Boltz-2 structure prediction model.
This approach gives what April called an “expressive language for protein design,” where models can be composed, added, and transformed to define a final objective. April said that the most incredible part is that this entire, complex graph of models “can be wrapped in a single jax.jit call that gives great performance” — something they found very difficult to do in other frameworks.
Instead of a typical training run that optimizes a model’s weights, their workflow inverts the process to optimize the input itself, using a collection of fixed, pre-trained neural networks as a complex, multi-objective loss function. The approach is mechanically analogous to Google’s DeepDream. Just as DeepDream takes a fixed, pre-trained image classifier and uses gradient ascent to iteratively modify an input image’s pixels to maximize a chosen layer’s activation, Escalante’s method starts with a random protein sequence. This sequence is fed through a committee of “expert” models, each one a pre-trained scorer for a different desirable property, like binding affinity or stability. The outputs from all the models are combined into a single, differentiable objective function. They then calculate the gradient of this final score with respect to the input sequence via backpropagation. An optimizer uses this gradient to update the sequence, nudging it in a direction that better satisfies the collective requirements of all the models. This cycle repeats, evolving the random initial input into a novel, optimized protein sequence that the entire ensemble of models “believes” is ideal.
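Continuing the toy sketch above, the design loop itself is a few lines of JAX plus Optax: differentiate the combined score with respect to the input and repeatedly update the sequence logits (total_loss is the function from the earlier sketch):

import jax
import jax.numpy as jnp
import optax

# Optimize the *input* (soft sequence logits), not model weights, against
# the fixed ensemble scored by total_loss.
key = jax.random.PRNGKey(0)
seq = jax.random.normal(key, (120, 20))  # random starting sequence logits

opt = optax.adam(learning_rate=1e-2)
opt_state = opt.init(seq)

@jax.jit
def step(seq, opt_state):
    loss, grads = jax.value_and_grad(total_loss)(seq)
    updates, opt_state = opt.update(grads, opt_state)
    return optax.apply_updates(seq, updates), opt_state, loss

for i in range(200):
    seq, opt_state, loss = step(seq, opt_state)
    if i % 50 == 0:
        print(f"step {i}: combined loss {loss:.3f}")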
Nick said that the choice of JAX was critical for this process. Its ability to compile and automatically differentiate complex code makes it ideal for optimizing the sophisticated loss functions at the heart of Mosaic, Escalante’s library of tools for protein design. Furthermore, the framework’s native integration with TPU hardware via the XLA compiler allowed them to easily scale these workloads.
Escalante samples many potential protein designs for a given problem by optimizing the loss function. Each sampling job might generate 1K–50K potential designs, which are then ranked and filtered; by the end of the process, they test only about 10 designs in the wet lab. This has led them to adopt a unique infrastructure pattern. Using Google Kubernetes Engine (GKE), they instantly spin up 2,000 to 4,000 spot TPUs, run their optimization jobs for about half an hour, and then shut them all down.
Nick also shared the compelling economics driving this choice. Given current spot pricing, adopting Cloud TPU v6e (Trillium) over an H100 GPU translated to a gain of 3.65x in performance per dollar for their large-scale jobs. He stressed that this cost-effectiveness is critical for their long-term goal of designing protein binders against the entire human proteome, a task that requires immense computational scale.
To build their system, they rely on key libraries within the JAX ecosystem like Equinox and Optax. Nick prefers Equinox because it feels like “vanilla JAX,” calling its concept of representing a model as a simple PyTree “beautiful and easy to reason about.” Optax, meanwhile, gives them the flexibility to easily swap in different optimization algorithms for their design loops.
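A small illustration of that “model as a PyTree” idea in Equinox, using a toy scorer rather than one of Escalante’s models:

import equinox as eqx
import jax
import jax.numpy as jnp

class ToyScorer(eqx.Module):
    linear: eqx.nn.Linear

    def __init__(self, key):
        self.linear = eqx.nn.Linear(20, 1, key=key)

    def __call__(self, seq):
        # Score each position, then pool to a scalar.
        return jnp.mean(jax.vmap(self.linear)(seq))

model = ToyScorer(jax.random.PRNGKey(0))
# The model *is* a PyTree: standard JAX transforms apply to it directly,
# and eqx.filter_grad differentiates with respect to its array leaves.
score_grads = eqx.filter_grad(lambda m, s: m(s))(model, jnp.ones((120, 20)))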
They emphasized that this entire stack — JAX’s functional core, its powerful ecosystem libraries, and the scalable TPU hardware — is what makes their research possible.
We are excited to see community contributions like Escalante’s Mosaic library, which is now available on GitHub. It’s a fantastic addition to the landscape of JAX-native scientific tools.
Stories like this highlight a growing trend: JAX is much more than a framework for deep learning. Its powerful system of program transformations, like grad and jit, makes it a foundational library for the paradigm of differentiable programming, empowering a new generation of scientific discovery. The JAX team at Google is committed to supporting and growing this vibrant ecosystem, and that starts with hearing directly from you.
Share your story: Are you using JAX to tackle a challenging problem?
Help guide our roadmap: Are there new features or capabilities that would unlock your next breakthrough?
Your feature requests are essential for guiding the evolution of JAX. Please reach out to the team to share your work or discuss what you need from JAX via GitHub.
Our sincere thanks to April and Nick for sharing their insightful journey with us. We’re excited to see how they and other researchers continue to leverage JAX to solve the world’s most complex scientific problems.