The world of Generative AI is evolving rapidly, and AI Agents are at the forefront of this change. An AI agent is a software system designed to act on your behalf. They show reasoning, planning, and memory and have a level of autonomy to make decisions, learn, and adapt.
At its core, an AI agent uses a large language model (LLM), like Gemini, as its “brain” to understand and reason. This allows it to process information from various sources, create a plan, and execute a series of tasks to reach a predefined objective. This is the key difference between a simple prompt-and-response and an agent: the ability to act on a multi-step plan.
The great news is that you can now easily build your own AI agents, even without deep expertise, thanks to Agent Development Kit (ADK). ADK is an open-source Python and Java framework by Google designed to simplify agent creation.
To guide you, this post introduces three hands-on labs that cover the core patterns of agent development:
Building your first autonomous agent
Empowering that agent with tools to interact with external services
Orchestrating a multi-agent system where specialized agents collaborate
Build your first agent
This lab introduces the foundational principles of ADK by guiding you through the construction of a personal assistant agent.
You will write the code for the agent itself and will interact directly with the agent’s core reasoning engine, powered by Gemini, to see how it responds to a simple request. This lab is focused on building the fundamental scaffolding of every agent you’ll create.
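As a taste of what the lab covers, a minimal ADK agent definition looks roughly like this. The agent name, description, instruction text, and model id below are illustrative placeholders rather than the lab's exact code:

```python
from google.adk.agents import Agent

# A minimal personal-assistant agent, along the lines of what the lab builds.
root_agent = Agent(
    name="personal_assistant",
    model="gemini-2.0-flash",  # placeholder model id
    description="A friendly personal assistant.",
    instruction="Help the user with everyday questions, and keep answers short.",
)
```

From there, ADK's developer UI (adk web) lets you chat with the agent locally and watch how it reasons about each request.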
Empower your agent with tools
An agent without custom tools can only rely on its built-in knowledge. To make it more powerful for your specific use case, you can give it access to specialized tools. In this lab, you will learn different ways to add tools (a minimal custom-tool sketch follows the list):
Build a Custom Tool: Write a currency exchange tool from scratch.
Leverage a Third-Party Tool: Import and use a Wikipedia tool from the LangChain library.
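To make the custom-tool bullet concrete, here is a hedged sketch of a currency tool registered with an ADK agent. The hard-coded rates, names, and model id are placeholders for the lab's real implementation:

```python
from google.adk.agents import Agent

def get_exchange_rate(from_currency: str, to_currency: str) -> dict:
    """Look up the exchange rate between two currencies.

    The hard-coded rates below are placeholders for illustration only;
    the lab's tool would call a real exchange-rate service instead.
    """
    rates = {("USD", "EUR"): 0.92, ("EUR", "USD"): 1.09}
    rate = rates.get((from_currency.upper(), to_currency.upper()))
    if rate is None:
        return {"status": "error", "message": "Unsupported currency pair"}
    return {"status": "success", "rate": rate}

currency_agent = Agent(
    name="currency_agent",
    model="gemini-2.0-flash",  # placeholder model id
    instruction="Answer currency questions using the exchange rate tool.",
    tools=[get_exchange_rate],  # plain Python functions can be registered as tools
)
```

Because the tool is just a typed, documented Python function, the model reads its name, docstring, and parameters to decide when to call it.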
Build a Team of Specialized Agents
When a task is too complex for a single agent, you can build out a multi-agent team. This lab goes deep into the power of multi-agent systems by having you build a “movie pitch development team” that can research, write, and analyze a film concept.
You will learn how to use ADK’s Workflow Agents to control the flow of work automatically, without needing user input at every step. You’ll also learn how to use the session state to pass information between the agents.
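To give a flavor of how this looks in ADK, here is a hedged two-step sketch of a sequential workflow. The agent names, instructions, and model id are illustrative, and the lab's full pipeline covers research, writing, and analysis:

```python
from google.adk.agents import LlmAgent, SequentialAgent

researcher = LlmAgent(
    name="researcher",
    model="gemini-2.0-flash",  # placeholder model id
    instruction="Research the requested movie genre and list three current trends.",
    output_key="research_notes",  # the result is written into session state
)

writer = LlmAgent(
    name="writer",
    model="gemini-2.0-flash",
    instruction="Write a one-paragraph movie pitch based on: {research_notes}",
    output_key="pitch_draft",
)

# The SequentialAgent runs its sub-agents in order, passing data via session state.
pitch_team = SequentialAgent(name="pitch_team", sub_agents=[researcher, writer])
```

The output_key of one agent becomes part of the next agent's instruction through the {research_notes} placeholder, which is how session state carries information through the workflow without any user input in between.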
Summary: Build Your First AI Teammate Today
Ready to build your first AI agents? Dive into the codelabs introduced in this post and start building.
As machine learning models continue to scale, a specialized, co-designed hardware and software stack is no longer optional; it’s critical. Ironwood, our latest generation Tensor Processing Unit (TPU), is the cutting-edge hardware behind advanced models like Gemini and Nano Banana, from massive-scale training to high-throughput, low-latency inference. This blog details the core components of Google’s AI software stack that are woven into Ironwood, demonstrating how this deep co-design unlocks performance, efficiency, and scale. We cover the JAX and PyTorch ecosystems, the XLA compiler, and the high-level frameworks that make this power accessible.
1. The co-designed foundation
Foundation models today have trillions of parameters that require computation at ultra-large scale. We designed the Ironwood stack from the silicon up to meet this challenge.
The core philosophy behind the Ironwood stack is system-level co-design: treating the entire TPU pod not as a collection of discrete accelerators, but as a single, cohesive supercomputer. This architecture is built on a custom interconnect that enables massive-scale Remote Direct Memory Access (RDMA), allowing thousands of chips to exchange data directly at high bandwidth and low latency, bypassing the host CPU. A full Ironwood superpod provides a total of 1.77 PB of directly accessible HBM capacity; each chip carries eight stacks of HBM3E, delivering 192 GiB of capacity and a peak HBM bandwidth of 7.4 TB/s.
Unlike general-purpose parallel processors, TPUs are Application-Specific Integrated Circuits (ASICs) built for one purpose: accelerating large-scale AI workloads. The deep integration of compute, memory, and networking is the foundation of their performance. At a high level, the TPU consists of two parts:
Hardware core: The TPU core is centered around a dense Matrix Multiply Unit (MXU) for matrix operations, complemented by a powerful Vector Processing Unit (VPU) for element-wise operations (activations, normalizations) and SparseCores for scalable embedding lookups. This specialized hardware design is what delivers Ironwood’s 42.5 Exaflops of FP8 compute.
Software target: This hardware design is explicitly targeted by the Accelerated Linear Algebra (XLA) compiler, using a software co-design philosophy that combines the broad benefits of whole-program optimization with the precision of hand-crafted custom kernels. XLA’s compiler-centric approach provides a powerful performance baseline by fusing operations into optimized kernels that saturate the MXU and VPU. This approach delivers good “out of the box” performance with broad framework and model support. This general-purpose optimization is then complemented by custom kernels (detailed below in the Pallas section) to achieve peak performance on specific model-hardware combinations. This dual-pronged strategy is a fundamental tenet of the co-design.
The figure below shows the layout of the Ironwood chip:
This specialized design extends to the connectivity between TPU chips, enabling massive scale-up and scale-out with a total of 88,473.6 Tbps (11,059.2 TB/s) of aggregate bandwidth across a complete Ironwood superpod.
The building block: Cubes and ICI. Each physical Ironwood host has four TPU chips. A single rack of these hosts has 64 Ironwood chips and forms a “cube”. Within this cube, every chip is connected via multiple high-speed Inter-Chip Interconnect (ICI) links that form a direct 3D Torus topology. This creates an extremely dense, all-to-all network fabric, enabling massive bandwidth and low latency for distributed operations within the cube.
Scaling with OCS: Pods and Superpods. To scale beyond a single cube, multiple cubes are connected using an Optical Circuit Switch (OCS) network. This is a dynamic, reconfigurable optical network that connects entire cubes, allowing the system to scale from a small “pod” (e.g., a 256-chip Ironwood pod with four cubes) to a massive “superpod” (e.g., a 9,216-chip system with 144 cubes). This OCS-based topology is key to fault tolerance. If a cube or link fails, the OCS fabric manager instructs the OCS to optically bypass the unhealthy unit and establish new, complete optical circuits connecting only the healthy cubes, swapping in a designated spare. This dynamic reconfigurability allows for both resilient operation and the provisioning of efficient “slices” of any size. For the largest-scale systems, into the hundreds of thousands of chips, multiple superpods can then be connected via a standard Data-Center Network (DCN).
Chips can be configured in different “slices” with different OCS topologies as shown below.
Each chip is connected to 6 other chips in the 3D torus and provides 3 distinct axes for parallelism.
Ironwood delivers this performance while focusing on power efficiency, allowing AI workloads to run more cost-effectively. Ironwood delivers 2x performance per watt relative to Trillium, our previous-generation TPU. Our advanced liquid cooling solutions and optimized chip design can reliably sustain up to twice the performance of standard air cooling even under continuous, heavy AI workloads. Ironwood is nearly 30x more power efficient than our first Cloud TPU from 2018 and is our most power-efficient chip to date.
It’s the software stack’s job to translate high-level code into optimized instructions that leverage the full power of the hardware. The stack supports two primary frameworks: the JAX ecosystem, which offers maximum performance and flexibility, as well as PyTorch on TPUs, which provides a native experience for the PyTorch community.
2. Optimizing the entire AI lifecycle
We use the principle of a co-designed Ironwood hardware and software stack to deliver maximum performance and efficiency across every phase of model development, with specific hardware and software capabilities tuned for each stage.
Pre-training: This phase demands sustained, massive-scale computation. A full 9,216-chip Ironwood superpod leverages the OCS and ICI fabric to operate as a single, massive parallel processor, achieving maximum sustained FLOPS utilization through different data formats. Running a job of this magnitude also requires resilience, which is managed by high-level software frameworks like MaxText, detailed in Section 3.3, that handle fault tolerance and checkpointing transparently.
Post-training (Fine-tuning and alignment): This stage includes diverse, FLOPS-intensive tasks like supervised fine-tuning (SFT) and Reinforcement Learning (RL), all requiring rapid iteration. RL, in particular, introduces complex, heterogeneous compute patterns. This stage often requires two distinct types of jobs to run concurrently: high-throughput, inference-like sampling to generate new data (often called ‘actor rollouts’), and compute-intensive, training-like ‘learner’ steps that perform the gradient-based updates. Ironwood’s high-throughput, low-latency network and flexible OCS-based slicing are ideal for this type of rapid experimentation, efficiently managing the different hardware demands of both sampling and gradient-based updates. In Section 3.3, we discuss how we provide optimized software on Ironwood — including reference implementations and libraries — to make these complex fine-tuning and alignment workflows easier to manage and execute efficiently.
Inference (serving): In production, models must deliver low-latency predictions with high throughput and cost-efficiency. Ironwood is specifically engineered for this, with its large on-chip memory and compute power optimized for both the large-batch “prefill” phase and the memory-bandwidth-intensive “decode” phase of large generative models. To make this power easily accessible, we’ve optimized state-of-the-art serving engines. At launch, we’ve enabled vLLM, detailed in Section 3.3, providing the community with a top-tier, open-source solution that maximizes inference throughput on Ironwood.
3. The software ecosystem for TPUs
The TPU stack, and Ironwood’s stack in particular, is designed to be modular, allowing developers to operate at the level of abstraction they need. In this section, we focus on the compiler/runtime, framework, and AI stack libraries.
3.1 The JAX path: Performance and composability
JAX is a high-performance numerical computing system co-designed with the TPU architecture. It provides a familiar NumPy-like API backed by powerful function transformations:
jit (Just-in-Time compilation): Uses the XLA compiler to fuse operations into a single, optimized kernel for efficient TPU execution.
grad (automatic differentiation): Automatically computes gradients of Python functions, the fundamental mechanism for model training.
shard_map (parallelism): The primitive for expressing distributed computations, allowing explicit control over how functions and data are sharded across a mesh of TPU devices, directly mapping to the ICI/OCS topology.
This compositional approach allows developers to write clean, Pythonic code that JAX and XLA transform into highly parallelized programs optimized for TPU hardware. JAX is what Google DeepMind and other Google teams use to build, train, and serve their models.
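As a minimal illustration of how these transformations compose, here is a hedged sketch; the toy linear model, shapes, and loss are arbitrary assumptions:

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # A toy linear model; the mean squared error is purely illustrative.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

# grad differentiates the Python function; jit compiles the result with XLA
# into fused kernels that run efficiently on TPU (or CPU/GPU).
grad_fn = jax.jit(jax.grad(loss))

params = {"w": jnp.ones((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.zeros((8, 1))
grads = grad_fn(params, x, y)  # same pytree structure as params
```

shard_map follows the same compositional pattern but additionally requires a device mesh describing how the TPU slice is laid out.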
For most developers, these primitives are abstracted by high-level frameworks, like MaxText, built upon a foundation of composable, production-proven libraries (a short Optax example follows the list):
Optax: A flexible gradient processing and optimization library (e.g., AdamW)
Orbax: A library for asynchronous checkpointing of distributed arrays across large TPU slices
Qwix: A JAX quantization library supporting Quantization Aware Training (QAT) and Post-Training Quantization (PTQ)
Metrax: A library for collecting and processing evaluation metrics in a distributed setting
Tunix: A high-level library for orchestrating post-training jobs
Goodput: A library for measuring and monitoring real-time ML training efficiency, providing a detailed breakdown of badput (e.g., initialization, data loading, checkpointing)
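As an example of how these libraries slot into a training loop, here is a hedged Optax sketch; the parameters and gradients are dummy values purely for illustration:

```python
import jax.numpy as jnp
import optax

# Dummy parameters and gradients, just to show the optimizer API.
params = {"w": jnp.ones((4, 1))}
grads = {"w": jnp.full((4, 1), 0.1)}

optimizer = optax.adamw(learning_rate=1e-3)
opt_state = optimizer.init(params)

# One optimization step: transform the gradients, then apply the updates.
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```

In a real training job, the gradients would come from a jit-compiled jax.grad step like the one shown earlier, with Orbax handling checkpointing of the resulting state.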
3.2 The PyTorch path: A native eager experience
To bring Ironwood’s power to the PyTorch community, we are developing a new, native PyTorch experience complete with support for a “native eager mode”, which executes operations immediately as they are called. Our goal is to provide a more natural and developer-friendly way to access Ironwood’s scale, minimizing the code changes and level of effort required to adapt models for TPUs. This approach is designed to make the transition from local experimentation to large-scale training more straightforward.
This new framework is built on three core principles to ensure a truly PyTorch-native environment:
Full eager mode: Enables the rapid prototyping, debugging, and research workflows that developers expect from PyTorch.
Standard distributed APIs: Leverages the familiar torch.distributed API, built on DTensor, for scaling training workloads across TPU slices.
Idiomatic compilation: Uses torch.compile as the single, unified path to JIT compilation, utilizing XLA as its backend to trace the graph and compile it into efficient TPU machine code.
This ensures the transition from local experimentation to large-scale distributed training is a natural extension of the standard PyTorch workflow.
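As a rough sketch of that workflow under the principles above (the toy model is an assumption, and the exact TPU backend wiring is handled by the framework described in Section 3.3):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 1024), nn.GELU(), nn.Linear(1024, 512))

# Eager mode: operations run immediately, keeping debugging and prototyping
# identical to the standard PyTorch experience.
out = model(torch.randn(8, 512))

# torch.compile is the single JIT entry point; on TPU, the stack described
# above traces the graph and lowers it through XLA.
compiled_model = torch.compile(model)
out = compiled_model(torch.randn(8, 512))
```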
3.3 Frameworks: MaxText, PyTorch on TPU, and vLLM
While JAX and PyTorch provide the computational primitives, scaling to thousands of chips is a supercomputer management problem. High-level frameworks handle the complexities of resilience, fault tolerance, and infrastructure orchestration.
MaxText (JAX): MaxText is an open-source, high-performance LLM pre-training and post-training solution written in pure Python and JAX. MaxText demonstrates optimized training on its library of popular OSS models like DeepSeek, Qwen, gpt-oss, Gemma, and more. Whether users are pre-training large Mixture-of-Experts (MoE) models from scratch, or leveraging the latest Reinforcement Learning (RL) techniques on an OSS model, MaxText provides tutorials and APIs to make things easy. For scalability and resiliency, MaxText leverages Pathways, which was originally developed by Google DeepMind and now provides TPU users with differentiated capabilities like elastic training and multi-host inference during RL.
PyTorch on TPU: We recently shared our proposal about our PyTorch native experience on TPUs at PyTorch Conference 2025, including an early preview of training on TPU with minimal code changes. In addition to the framework itself, we are working with the community (RFC), investing in reproducible recipes, reference implementations, and migration tools to enable PyTorch users to use their favorite frameworks on TPUs. Expect further updates as this work matures.
vLLM TPU (Serving): vLLM TPU is now powered by tpu-inference, an expressive and powerful new hardware plugin that unifies JAX and PyTorch under a single lowering path – meaning both frameworks are translated to optimized TPU code through one common, shared backend. This new unified backend is not only faster than the previous generation of vLLM TPU but also offers broader model coverage. This integration provides more flexibility to JAX and PyTorch users, running PyTorch models performantly with no code changes while also extending native JAX support, all while retaining the standard vLLM user experience and interface.
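Because the standard vLLM user experience is retained, serving on TPU looks the same as it does elsewhere. A minimal sketch, with an arbitrary model checkpoint assumed for illustration:

```python
from vllm import LLM, SamplingParams

# The model name is an illustrative assumption; any supported checkpoint works,
# and the same code runs whether the backend is GPU or TPU.
llm = LLM(model="google/gemma-2b-it")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what a TPU superpod is."], sampling)
print(outputs[0].outputs[0].text)
```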
3.4 Extreme performance: Custom kernels via Pallas
While XLA is powerful, cutting-edge research often requires novel algorithms, such as new attention mechanisms, custom padding to handle dynamic ragged tensors, and other optimizations for custom MoE models, that the XLA compiler cannot yet optimize.
The JAX ecosystem solves this with Pallas, a JAX-native kernel programming language embedded directly in Python. Pallas presents a unified, Python-first experience, dramatically reducing cognitive load and accelerating the iteration cycle. Other platforms lack this unified, in-Python approach, forcing developers to fragment their workflow. To optimize these operations, they must drop into a disparate ecosystem of lower-level tools, from DSLs like Triton and CuTe to raw CUDA C++ and PTX. This introduces significant mental overhead by forcing developers to manually manage memory, streams, and kernel launches, pulling them out of their Python-based environment.
This is a clear example of co-design. Developers use Pallas to explicitly manage the accelerator’s memory hierarchy, defining how “tiles” of data are staged from HBM into the extremely fast on-chip SRAM to be operated on by the MXUs. Pallas has two main parts:
Pallas: The developer defines the high-level algorithmic structure and memory logistics in Python.
Mosaic: This compiler backend translates the Pallas definition into optimized TPU machine code. It handles operator fusion, determines optimal tiling strategies, and generates software pipelines to perfectly overlap data transfers (HBM-to-SRAM) with computation (on the MXUs), with the sole objective of saturating the compute units.
Because Pallas kernels are JAX-traceable, they are fully compatible with jit, vmap, and grad. This stack provides Python-native extensibility for both JAX and PyTorch, as PyTorch users can consume Pallas-optimized kernels without ever leaving the native PyTorch API. Pallas kernels for PyTorch and JAX models, on both TPU and GPU, are available via Tokamax, the ML ecosystem’s first multi-framework, multi-hardware kernel library.
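For illustration, here is a minimal Pallas kernel, a trivial element-wise add rather than a production attention or MoE kernel, showing the general shape of the API:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # The refs point at tiles staged into fast on-chip memory; Mosaic handles
    # the HBM-to-SRAM pipelining around this body.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8, dtype=jnp.float32)
print(add(x, x))  # JAX-traceable, so jit/vmap/grad compose with it
```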
3.5 Performance engineering: Observability and debugging
The Ironwood stack includes a full suite of tools for performance analysis, bottleneck detection, and debugging, allowing developers to fully optimize their workloads and operate large-scale clusters reliably.
Cloud TPU metrics: Exposes key system-level counters (FLOPS, HBM bandwidth, ICI traffic) to Google Cloud Monitoring that can then be exported to popular monitoring tools like Prometheus.
TensorBoard: Visualizes training metrics (loss, accuracy) and hosts the XProf profiler UI.
XProf (OpenXLA Profiler): The essential toolset for deep performance analysis. It captures detailed execution data from both the host-CPU and all TPU devices, providing:
Trace Viewer: A microsecond-level timeline of all operations, showing execution, collectives, and “bubbles” (idle time).
Input Pipeline Analyzer: Diagnoses host-bound vs. compute-bound bottlenecks.
Op Profile: Ranks all XLA/HLO operations by execution time to identify expensive kernels.
Memory Profiler: Visualizes HBM usage over time to debug peak memory and fragmentation.
Debugging Tools:
JAX Debugger (jax.debug): Enables print statements and breakpoints from within jit-compiled functions (see the sketch after this list).
TPU Monitoring Library: A real-time diagnostic dashboard (analogous to nvidia-smi) for live debugging of HBM utilization, MXU activity, and running processes.
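As a small illustration of the JAX debugger (the function and values are arbitrary):

```python
import jax

@jax.jit
def scale(x):
    # jax.debug.print works inside jit-compiled functions, where a plain
    # Python print would only fire once during tracing.
    jax.debug.print("x = {x}", x=x)
    return x * 2.0

scale(3.0)
```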
Beyond performance optimization, developers and infra admins can view fleet efficiency and goodput metrics at various levels (e.g., job, reservation) to ensure maximum utilization of their TPU infrastructure.
4. Conclusion
The Ironwood stack is a complete, system-level co-design, from the silicon to the software. It delivers performance through a dual-pronged strategy: the XLA compiler provides broad, “out-of-the-box” optimization, while the Pallas and Mosaic stack enables hand-tuned kernel performance.
This entire co-designed platform is accessible to all developers, providing first-class, native support for both the JAX and the PyTorch ecosystem. Whether you are pre-training a massive model, running complex RL alignment, or serving at scale, Ironwood provides a direct, resilient, and high-performance path from idea to supercomputer.
Get started today with vLLM on TPU for inference and MaxText for pre-training and post-training.
Decision makers and builders today face a constant challenge: managing rising cloud costs while delivering the performance their customers demand. As applications evolve to use scale-out microservices and handle ever-growing data volumes, organizations need maximum efficiency from their underlying infrastructure to support their growing general-purpose workloads.
To meet this need, we’re excited to announce our latest Axion-based virtual machine series: N4A, available in preview on Compute Engine, Google Kubernetes Engine (GKE), Dataproc, and Batch, with support in Dataflow and other services coming soon.
N4A is the most cost-effective N-series VM to date, delivering up to 2x better price-performance and 80% better performance-per-watt than comparable current-generation x86-based VMs. This makes it easier for customers to further optimize the Total Cost of Ownership (TCO) for a broad range of general-purpose workloads. We see this with cloud-native businesses running scale-out web servers and microservices on GKE, enterprise teams managing backend application servers and mid-sized databases, and engineering organizations operating large CI/CD build farms.
At Google Cloud, we co-design our compute offerings with storage, networking and software at every layer of the stack, from orchestrators to runtimes, to deliver exceptional system-level performance and cost-efficiency. N4A’s breakthrough price-performance is powered by our latest-generation Google Axion Processors, built on the Arm® Neoverse® N3 compute core, Google Dynamic Resource Management (DRM) technology, and Titanium, Google Cloud’s custom-designed hardware and software system that offloads networking and storage processing to free up the CPU. Titanium is part of Google Cloud’s vertically integrated software stack — from the custom silicon in our servers to our planet-scale network traversing 7.75 million kilometers of terrestrial and subsea fiber across 42 regions — that is engineered to maximize efficiency and provide the ultra-low latency and high bandwidth to customers at global scale.
Redefining general-purpose compute and enabling AI inference
N4A is engineered for versatility, with a feature set to support your general-purpose and CPU-based AI workloads. It comes in predefined and custom shapes, with up to 64 vCPUs and 512GB of DDR5 memory in high-cpu (2GB of memory per vCPU), standard (4GB per vCPU), and high-memory (8GB per vCPU) configurations, and instance networking of up to 50 Gbps of bandwidth. N4A VMs feature support for our latest generation Hyperdisk storage options, including Hyperdisk Balanced, Hyperdisk Throughput, and Hyperdisk ML (coming later), providing up to 160K IOPS and 2.4 GB/s of throughput per instance.
N4A performs well across a range of industry-standard benchmarks that represent the key workloads our customers run every day. For example, relative to comparable current-generation x86-based VM offerings, N4A delivers up to 105% better price-performance for compute-bound workloads, up to 90% better price-performance for scale-out web servers, up to 85% better price-performance for Java applications, and up to 20% better price-performance for general-purpose databases.
Footnote: As of October 2025. Performance based on the estimated SPECrate®2017_int_base, estimated SPECjbb2015, MySQL Transactions/minute (RO), and Google internal Nginx Reverse Proxy benchmark scores run in production on comparable latest-generation generally-available VMs with general purpose storage types. Price-performance claims based on published and upcoming list prices for Google Cloud.
In the real world, early adopters are seeing dramatic price-performance improvements from the new N4A instances.
“At ZoomInfo, we operate a massive data intelligence platform where efficiency is paramount. Our core data processing pipelines, which are critical for delivering timely insights to our customers, run extensively on Dataflow and Java services in GKE. In our preview of the new N4A instances, we measured a 60% improvement in price-performance for these key workloads compared to their x86-based counterparts. This allows us to scale our platform more efficiently and deliver more value to our customers, faster.” – Sergei Koren, Chief Infrastructure Architect, ZoomInfo
“Organizations today need performance, efficiency, flexibility, and scale to meet the computing demands of the AI era; this requires the close collaboration and co-design that is at the heart of our partnership with Google Cloud. As N4A redefines cost-efficiency, customers gain a new level of infrastructure optimization, enabling enterprises to choose the right infrastructure for their workload requirements with Arm and Google Cloud.” – Bhumik Patel, Director, Server Ecosystem Development, Infrastructure Business, Arm
Granular control with Custom Machine Types and Hyperdisk
A key advantage of our N-series VMs has always been flexibility, and with N4A, we are bringing one of our most popular features to the Axion family for the first time: Custom Machine Types (CMT). Instead of fitting your workload into a predefined shape, CMTs on N4A let you independently configure the amount of vCPU and memory to meet your application’s unique needs. This ability to right-size your instances means you pay only for the resources you use, minimizing waste and optimizing your total cost of ownership.
This same principle of matching resources to your specific workload applies to storage. N4A VMs feature support for our latest generation of Hyperdisk, allowing you to select the perfect storage profile for your application’s needs:
Hyperdisk Balanced: Offers an optimal mix of performance and cost for the majority of general-purpose workloads, with up to 160K IOPS per N4A VM.
Hyperdisk Throughput: Delivers up to 2.4 GiB/s of max throughput for bandwidth-intensive analytics workloads like Hadoop or Kafka, providing high-capacity storage at an excellent value.
Hyperdisk ML (post GA): Purpose-built for AI/ML workloads, allows you to attach a single disk containing your model weights or datasets to up to 32 N4A instances simultaneously for large-scale inference or training tasks.
Hyperdisk Storage Pools: Instead of provisioning capacity and performance on a per-volume basis, allows you to provision performance and capacity in aggregate, further optimizing costs by up to 50% and simplifying management.
“At Vimeo, we have long relied on Custom Machine Types to efficiently manage our massive video transcoding platform. Our initial tests on the new Axion-based N4A instances have been very compelling, unlocking a new level of efficiency. We’ve observed a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs. This points to a clear path for improving our unit economics and scaling our services more profitably, without changing our operational model.” – Joe Peled, Sr. Director of Hosting & Delivery Ops, Vimeo
A growing Arm-based Axion portfolio for customer choice
C-series VMs are designed for workloads that require consistently high performance, e.g., medium-to-large-scale databases and in-memory caches. Alongside them, N-series VMs have been a key Compute Engine pillar, offering a balance of price-performance and flexibility, lowering the cost of running workloads with variable resource needs such as scale-out Java/GKE workloads. We released our first Axion-based machine series, C4A, in October 2024, and the introduction of N4A complements C4A, providing a range of Google Axion instances suited to your workloads’ precise needs.
On top of that, GKE unlocks significant price-performance advantages by orchestrating Axion-based C4A and N4A machine types. GKE leverages Custom Compute Classes to provision and mix these machine types, matching workloads to the right hardware. This automated, heterogeneous cluster management allows teams to optimize their total cost of ownership across their entire application stack.
Also joining the Axion family is C4A metal, Google Cloud’s first Axion bare metal instance that helps builders meet use cases that require access to the underlying physical server to run specialized applications in a non-virtualized environment, such as automotive systems development, workloads with strict licensing requirements, and Android software development. C4A metal will be available in preview soon.
Supported by the broad and mature Arm ecosystem, adopting Axion is easier than ever, and the combination of C4A and N4A can help you lower the total cost of running your business, without compromising on performance or workload-specific requirements:
N4A for cost optimization and flexibility. Deliberately engineered for general-purpose workloads that need a balance of price and performance, including scale-out web servers, microservices, containerized applications, open-source databases, batch, data analytics, development environments, data preparation and AI/ML experimentation.
C4A for consistently high performance, predictability, and control. Powering workloads where every microsecond counts, such as medium- to large-scale databases, in-memory caches, cost-effective AI/ML inference, and high-traffic gaming servers. C4A delivers consistent performance, offering a controlled maintenance experience for mission-critical workloads, networking bandwidth up to 100 Gbps, and next-generation Titanium Local SSD storage.
“Migrating to Google Cloud’s Axion portfolio gave us a critical competitive advantage. We slashed our compute consumption by 20% while maintaining low and stable latency with C4A instances, such as our Supply-Side Platform (SSP) backend service. Additionally, C4A enabled us to leverage Hyperdisk with precisely the IOPS we need for our stateful workloads, regardless of instance size. This flexibility gives us the best of both worlds – allowing us to win more ad auctions for our clients while significantly improving our margins. We’re now testing the N4A family by running some of our key workloads that require the most flexibility, such as our API relay service. We are happy to share that several applications running in production are consuming 15% less CPU compared to our previous infrastructure, reducing our costs further, while ensuring that the right instance backs the workload characteristics required.” – Or Ben Dahan, Cloud & Software Architect at Rise
Get started with N4A today
N4A is available during preview in the following Google Cloud regions: us-central1 (Iowa), us-east4 (N. Virginia), europe-west3 (Frankfurt) and europe-west4 (Netherlands) with more regions to follow.
We can’t wait to see what you build. To get access, sign-up here. To learn more, check out the N4A documentation.
Today, we are thrilled to announce C4A metal, our first bare metal instance running on Google Axion processors, available in preview soon. C4A metal is designed for specialized workloads that require direct hardware access and Arm®-native compatibility.
Now, organizations running environments such as Android development, automotive simulation, CI/CD pipelines, security workloads, and custom hypervisors can run them on Google Cloud, without the performance overheads and complexity of nested virtualization.
C4A metal instances, like other Axion instances, are built on the standard Arm architecture, so your applications and operating systems compiled for Arm remain portable across your cloud, on-premises, and edge environments, protecting your development investment. C4A metal offers 96 vCPUs, 768GB of DDR5 memory, up to 100Gbps of networking bandwidth, with full support for Google Cloud Hyperdisk including Hyperdisk Balanced, Extreme, Throughput, and ML block storage options.
Google Cloud provides workload-optimized infrastructure to ensure the right resources are available for every task. C4A metal, like the Google Cloud Axion virtual machine family, is powered by Titanium, a key component for multi-tier offloads and security that is foundational to our infrastructure. Titanium’s custom-designed silicon offloads networking and storage processing to free up the CPU, and its dedicated SmartNIC manages all I/O, ensuring that Axion cores are reserved exclusively for your application’s performance. Titanium is part of Google Cloud’s vertically integrated software stack — from the custom silicon in our servers to our planet-scale network traversing 7.75 million kilometers of terrestrial and subsea fiber across 42 regions — that is engineered to maximize efficiency and provide the ultra-low latency and high bandwidth to customers at global scale.
Architectural parity for automotive workloads
Automotive customers can benefit from the Arm architecture’s performance, efficiency, and flexible design for in-vehicle systems such as infotainment and Advanced Driver Assistance Systems (ADAS). Axion C4A metal instances enable architectural parity between test environments and production silicon, allowing automotive technology providers to validate their software on the same Arm Neoverse instruction set architecture (ISA) used in production electronic control units (ECUs). This significantly reduces the risk of late-stage integration failures. For performance-sensitive tasks, these customers can execute demanding virtual hardware-in-the-loop (vHIL) simulations with the consistent, low-latency performance of physical hardware, ensuring test results are reliable and accurate. Finally, C4A metal lets providers move beyond the constraints of a physical lab, by dynamically scaling entire test farms and transforming them from fixed capital expenses into flexible operational ones.
“In the era of AI-defined vehicles, the accelerating pace and complexity of technology are pushing us to rethink traditional linear approaches to software development. Google Cloud’s introduction of Axion C4A metal is a major step forward in this journey. By offering full architectural parity on Arm between test environments and physical silicon, customers can benefit from accelerated development cycles, enabling continuous integration and compliance for a variety of specialized use cases.” – Dipti Vachani, Senior Vice President and General Manager, Automotive Business, Arm
“Our partners and customers rely on QNX to deliver the safety, security, reliability, and real-time performance required for their most mission-critical systems — from advanced driver assistance to digital cockpits. As the Software-Defined Vehicle era continues to gain momentum, decoupling software development from physical hardware is no longer optional — it’s essential for innovation at scale. The launch of Google Cloud’s C4A-metal instances on Axion introduces a powerful ARM-based bare metal platform that we are eager to test and support as this will enable transformative cloud infrastructure benefits for our automotive ecosystem.” –Grant Courville, Senior Vice President, Products and Strategy, QNX
“The future of automotive mobility demands unprecedented speed and precision in practice and development. For automakers and suppliers leveraging the Snapdragon Digital Chassis platform, aligning their cloud development and testing environments to ensure parity with the Snapdragon SoCs in the vehicle is absolutely crucial for efficiency and quality. We are excited about Google Cloud’s commitment to this segment — offering C4A-metal instances with Axion is a massive leap forward, giving the automotive ecosystem a true 1:1 physical to virtual environment in the cloud. This breakthrough significantly reduces integration challenges, slashes validation time, and allows our partners to unleash AI-driven features to market faster at scale.” – Laxmi Rayapudi, VP, Product Management, Qualcomm Technologies, Inc.
Align test and production for Android development
The Android platform was built for Arm-based processors, the standard for virtually all mobile devices. By running development and testing pipelines on the bare-metal instances of Axion processors with C4A metal, Android developers can benefit from native performance, eliminating the overhead of emulation management, such as slow instruction-by-instruction translation layers. In addition, they can significantly reduce latency for Android build toolchains and automated test systems, leading to faster feedback cycles. C4A metal also solves the performance challenges of nested virtualization, making it a great platform for scalable Cuttlefish (Cloud Android) environments.
Once available, developers can deploy scalable Cuttlefish environment farms on top of C4A metal instances with an upcoming release of Horizon or by directly leveraging Cloud Android Orchestration. C4A metal allows these virtual devices to run directly on the physical hardware, providing the performance needed to build and manage large, high-fidelity test farms for true continuous testing.
Bare metal access without compromise
As a cloud offering, C4A metal enables a lower total cost of ownership by replacing the entire lifecycle of physical hardware procurement and management with a predictable operational expense. This eliminates the direct capital expenditures of purchasing servers, along with the associated operational costs of hardware maintenance contracts, power, cooling, and physical data center space. You can programmatically provision and de-provision instances to match your exact testing demands, ensuring you are not paying for an over-provisioned fleet of servers sitting idle waiting for peak development cycles.
Operating as standard compute resources within your Virtual Private Cloud (VPC), C4A metal instances inherit and leverage the same security policies, audit logging, and network controls as virtual machines. Instances are designed to appear as physical servers to your toolchain and support common monitoring and security agents, allowing for straightforward integration with your existing Google Cloud environments. This integration extends to storage, where network-attached Hyperdisk allows you to manage persistent disks using the same snapshot and resizing tools your teams already use for your virtual machine fleet.
“For our build system, true isolation is paramount. Running on Google Cloud’s new C4A metal instance on Axion enables us to isolate our package builds with a strong hypervisor security boundary without compromising on build performance.” – Matthew Moore, Founder and CTO, Chainguard, Inc
Better together: the Axion C and N series
The addition of C4A metal to the Arm-based Axion portfolio allows customers to lower TCO by matching the right infrastructure to every workload. While Axion C4A virtual machines optimize for consistently high performance and N4A virtual machines (now in preview) optimize for price-performance and flexibility, C4A metal addresses the critical need for direct hardware access by specialized applications that require a non-virtualized Arm environment.
For example, an Android development company could create a highly efficient CI/CD pipeline by using C4A virtual machines for the build farm. For large-scale testing, they could use C4A metal to run Cuttlefish virtual devices directly on the physical hardware, eliminating nested virtualization overhead. To enable even higher fidelity, they can run Cuttlefish hybrid devices on C4A metal, reusing the system images from their physical hardware. Concurrently, supporting infrastructure such as CI/CD orchestrators and artifact repositories could run on cost-effective N4A instances, using Custom Machine Types to right-size resources and minimize operational expenses.
Coming soon to preview
C4A metal is scheduled for preview soon. Please fill out this form to sign up for early access and additional updates.
Today’s frontier models, including Google’s Gemini, Veo, and Imagen, and Anthropic’s Claude, are trained and served on Tensor Processing Units (TPUs). For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them. Constantly shifting model architectures, the rise of agentic workflows, and near-exponential growth in demand for compute define this new age of inference. In particular, agentic workflows that require orchestration and tight coordination between general-purpose compute and ML acceleration are creating new opportunities for custom silicon and vertically co-optimized system architectures.
We have been preparing for this transition for some time and today, we are announcing the availability of three new products built on custom silicon that deliver exceptional performance, lower costs, and enable new capabilities for inference and agentic workloads:
Ironwood, our seventh generation TPU, will be generally available in the coming weeks. Ironwood is purpose-built for the most demanding workloads: from large-scale model training and complex reinforcement learning (RL) to high-volume, low-latency AI inference and model serving. It offers a 10X peak performance improvement over TPU v5p and more than 4X better performance per chip for both training and inference workloads compared to TPU v6e (Trillium), making Ironwood our most powerful and energy-efficient custom silicon to date.
New Arm®-based Axion instances. N4A, our most cost-effective N series virtual machine to date, is now in preview. N4A offers up to 2x better price-performance than comparable current-generation x86-based VMs. We are also pleased to announce C4A metal, our first Arm-based bare metal instance, which will be coming soon in preview.
Ironwood and these new Axion instances are just the latest in a long history of custom silicon innovation at Google, including TPUs, Video Coding Units (VCU) for YouTube, and five generations of Tensor chips for mobile. In each case, we build these processors to enable breakthroughs in performance that are only possible through deep, system-level co-design, with model research, software, and hardware development under one roof. This is how we built the first TPU ten years ago, which in turn unlocked the invention of the Transformer eight years ago — the very architecture that powers most of modern AI. It has also influenced more recent advancements like our Titanium architecture, and advanced liquid cooling that we’ve deployed at GigaWatt scale with fleet-wide uptime of ~99.999% since 2020.
Pictured: An Ironwood board showing three Ironwood TPUs connected to liquid cooling.
Pictured: Third-generation Cooling Distribution Units, providing liquid cooling to an Ironwood superpod.
Ironwood: The fastest path from model training to planet-scale inference
The early response to Ironwood is overwhelmingly enthusiastic. Anthropic, for example, points to impressive price-performance gains that accelerate its path from training massive Claude models to serving them to millions of users. In fact, Anthropic plans to access up to 1 million TPUs:
“Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work. As demand continues to grow exponentially, we’re increasing our compute resources as we push the boundaries of AI research and product development. Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect.” – James Bradbury, Head of Compute, Anthropic
Ironwood is being used by organizations of all sizes and across industries:
“Our mission at Lightricks is to define the cutting edge of open creativity, and that demands AI infrastructure that eliminates friction and cost at scale. We relied on Google Cloud TPUs and its massive ICI domain to achieve our breakthrough training efficiency for LTX-2, our leading open-source multimodal generative model. Now, as we enter the age of inference, our early testing makes us highly enthusiastic about Ironwood. We believe that Ironwood will enable us to create more nuanced, precise, and higher-fidelity image and video generation for our millions of global customers.” – Yoav HaCohen, Research Director, GenAI Foundational Models, Lightricks
“At Essential AI, our mission is to build powerful, open frontier models. We need massive, efficient scale, and Google Cloud’s Ironwood TPUs deliver exactly that. The platform was incredibly easy to onboard, allowing our engineers to immediately leverage its power and focus on accelerating AI breakthroughs.” – Philip Monk, Infrastructure Lead, Essential AI
System-level design maximizes inference performance, reliability, and cost
TPUs are a key component of AI Hypercomputer, our integrated supercomputing system that brings together compute, networking, storage, and software to improve system-level performance and efficiency. At the macro level, according to a recent IDC report, AI Hypercomputer customers achieved on average 353% three-year ROI, 28% lower IT costs, and 55% more efficient IT teams.
Ironwood TPUs will help customers push the limits of scale and efficiency even further. When you deploy TPUs, the system connects each individual chip to each other, creating a pod — allowing the interconnected TPUs to work as a single unit. With Ironwood, we can scale up to 9,216 chips in a superpod linked with breakthrough Inter-Chip Interconnect (ICI) networking at 9.6 Tb/s. This massive connectivity allows thousands of chips to quickly communicate with each other and access a staggering 1.77 Petabytes of shared High Bandwidth Memory (HBM), overcoming data bottlenecks for even the most demanding models.
Pictured: Part of an Ironwood superpod, directly connecting 9,216 Ironwood TPUs in a single domain.
At that scale, services demand uninterrupted availability. That’s why our Optical Circuit Switching (OCS) technology acts as a dynamic, reconfigurable fabric, instantly routing around interruptions to restore the workload while your services keep running. And when you need more power, Ironwood scales across pods into clusters of hundreds of thousands of TPUs.
Pictured: Jupiter data center network enables the connection of multiple Ironwood superpods into clusters of hundreds of thousands of TPUs.
The AI Hypercomputer advantage: Hardware and software co-designed for faster, more efficient outcomes
On top of this hardware is a co-designed software layer, where our goal is to maximize Ironwood’s massive processing power and memory, and make it easy to use throughout the AI lifecycle.
To improve fleet efficiency and operations, we’re excited to announce that TPU customers can now benefit from Cluster Director capabilities in Google Kubernetes Engine. This includes advanced maintenance and topology awareness for intelligent scheduling and highly resilient clusters.
For pre-training and post-training, we’re also sharing new enhancements to MaxText, a high-performance, open source LLM framework, to make it easier to implement the latest training and reinforcement learning optimization techniques, such as Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO).
For inference, we recently announced enhanced support for TPUs in vLLM, allowing developers to switch between GPUs and TPUs, or run both, with only a few minor configuration changes, and GKE Inference Gateway, which intelligently load balances across TPU servers to reduce time-to-first-token (TTFT) latency by up to 96% and serving costs by up to 30%.
Our software layer is what enables AI Hypercomputer’s high performance and reliability for training, tuning, and serving demanding AI workloads at scale. Thanks to deep integrations across the stack — from data-center-wide hardware optimizations to open software and managed services — Ironwood TPUs are our most powerful and energy-efficient TPUs to date. Learn more about our approach to hardware and software co-design here.
Axion: Redefining general-purpose compute
Building and serving modern applications requires both highly specialized accelerators and powerful, efficient general-purpose compute. This was our vision for Axion, our custom Arm Neoverse®-based CPUs, which we designed to deliver compelling performance, cost and energy efficiency for everyday workloads.
Today, we are expanding our Axion portfolio with:
N4A (preview), our second general-purpose Axion VM, which is ideal for microservices, containerized applications, open-source databases, batch, data analytics, development environments, experimentation, data preparation and web serving jobs that make AI applications possible. Learn more about N4A here.
C4A metal (in preview soon), our first Arm-based bare-metal instance, which provides dedicated physical servers for specialized workloads such as Android development, automotive in-car systems, software with strict licensing requirements, scale test farms, or running complex simulations. Learn more about C4A metal here.
With today’s announcements, the Axion portfolio now includes three powerful options, N4A, C4A and C4A metal. Together, the C and N series allow you to lower the total cost of running your business without compromising on performance or workload-specific requirements.
The Axion portfolio at a glance:
N4A (preview): Optimized for price-performance and flexibility. Key features: up to 64 vCPUs, 512GB of DDR5 memory, and 50 Gbps networking, with support for Custom Machine Types, Hyperdisk Balanced, and Hyperdisk Throughput storage.
C4A metal (in preview soon): Optimized for specialized workloads, such as hypervisors and native Arm development. Key features: up to 96 vCPUs, 768GB of DDR5 memory, Hyperdisk storage, and up to 100 Gbps of networking.
C4A: Optimized for consistently high performance. Key features: up to 72 vCPUs, 576GB of DDR5 memory, 100 Gbps of Tier 1 networking, Titanium SSD with up to 6TB of local capacity, advanced maintenance controls, and support for Hyperdisk Balanced, Throughput, and Extreme.
Axion’s inherent efficiency also makes it a valuable option for modern AI workflows. While specialized accelerators like Ironwood handle the complex task of model serving, Axion excels at the operational backbone: supporting high-volume data preparation, ingestion, and running application servers that host your intelligent applications. Axion is already translating into customer impact:
“At Vimeo, we have long relied on Custom Machine Types to efficiently manage our massive video transcoding platform. Our initial tests on the new Axion-based N4A instances have been very compelling, unlocking a new level of efficiency. We’ve observed a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs. This points to a clear path for improving our unit economics and scaling our services more profitably, without changing our operational model.” – Joe Peled, Sr. Director of Hosting & Delivery Ops, Vimeo
“At ZoomInfo, we operate a massive data intelligence platform where efficiency is paramount. Our core data processing pipelines, which are critical for delivering timely insights to our customers, run extensively on Dataflow and Java services in GKE. In our preview of the new N4A instances, we measured a 60% improvement in price-performance for these key workloads compared to their x86-based counterparts. This allows us to scale our platform more efficiently and deliver more value to our customers, faster.” –Sergei Koren, Chief Infrastructure Architect, ZoomInfo
“Migrating to Google Cloud’s Axion portfolio gave us a critical competitive advantage. We slashed our compute consumption by 20% while maintaining low and stable latency with C4A instances, such as our Supply-Side Platform (SSP) backend service. Additionally, C4A enabled us to leverage Hyperdisk with precisely the IOPS we need for our stateful workloads, regardless of instance size. This flexibility gives us the best of both worlds – allowing us to win more ad auctions for our clients while significantly improving our margins. We’re now testing the N4A family by running some of our key workloads that require the most flexibility, such as our API relay service. We are happy to share that several applications running in production are consuming 15% less CPU compared to our previous infrastructure, reducing our costs further, while ensuring that the right instance backs the workload characteristics required.” – Or Ben Dahan, Cloud & Software Architect, Rise
A powerful combination for AI and everyday computing
To thrive in an era with constantly shifting model architectures, software, and techniques, you need a combination of purpose-built AI accelerators for model training and serving, alongside efficient, general-purpose CPUs for the everyday workloads, including the workloads that support those AI applications.
Ultimately, whether you use Ironwood and Axion together or mix and match them with the other compute options available on AI Hypercomputer, this system-level approach gives you the ultimate flexibility and capability for the most demanding workloads. Sign up to test Ironwood, Axion N4A, or C4A metal today.
If you’re a developer, you’ve seen generative AI everywhere. It can feel like a complex world of models and advanced concepts. It can be difficult to know where to actually start.
The good news is that building your first AI-powered application is more accessible than you might imagine. You don’t need to be an AI expert to get started. This post introduces a new codelab designed to bridge this gap and provide you with a first step. We’ll guide you through the entire process of building a functional, interactive travel chatbot using Google’s Gemini model.
In this codelab, you’ll step into the role of a developer at a travel company tasked with building a new chat application. You’ll start with a basic web application frontend and, step-by-step, you will bring it to life by connecting it to the power of generative AI.
By the end, you will have built a travel assistant that can:
Answer questions about travel destinations.
Provide personalized recommendations.
Fetch real-time data, like the weather, to give genuinely helpful advice.
The process is broken down into a few key stages.
Making the First Connection
Before you can do anything fancy, you need to get your application talking to the AI model. An easy way to do this is with the Vertex AI SDK, a complete library for interacting with the Vertex AI platform.
While the Vertex AI SDK is a powerful tool for the full machine learning lifecycle, this lab focuses on one of its most-used tools: building generative AI applications. This part of the Vertex AI SDK acts as the bridge between your application and the Gemini model. Without it, you would have to manually handle all the complex wiring yourself—writing code to manage authentication, formatting intricate API requests, and parsing the responses. The Vertex AI SDK handles all that complexity for you so you can focus on what you actually want to do: send a message and get a response.
In this codelab, you’ll see just how simple it is.
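To give a sense of scale, the core of that connection is only a few lines. The project id, region, and model name below are placeholder assumptions, not the codelab's exact values:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Project, region, and model name are illustrative placeholders.
vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Suggest three things to do in Lisbon.")
print(response.text)
```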
Giving your AI purpose with system instructions
Once your app is connected, you’ll notice the AI’s responses won’t be tailored to your purposes yet. One way you can make it more useful for your specific use case is by giving it system instructions.
Hot Tip: Use Google AI Studio to Create Your System Instructions
A great way to develop your system instructions is to leverage Gemini as a creative partner to draft them for you. For example, you could ask Gemini in Google AI Studio to generate a thorough set of instructions for a “sophisticated and friendly travel assistant.”
Once you have a draft, you can immediately test it, also in Google AI Studio. Start a new chat and in the panel to the right, set the Gemini model to the one you’re using in your app and paste the text into the system instruction field. This allows you to quickly interact with the model and see how it behaves with your instructions, all without writing any code. When you’re happy with the results, you can copy the final version directly into your application.
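Once you are happy with the instructions, wiring them into your application is a small change. Here is a sketch, with placeholder instruction text and model name:

```python
from vertexai.generative_models import GenerativeModel

travel_assistant = GenerativeModel(
    "gemini-2.0-flash",  # placeholder model name
    system_instruction=(
        "You are a sophisticated and friendly travel assistant. "
        "Only answer questions related to travel, and keep replies concise."
    ),
)
response = travel_assistant.generate_content("Where should I go for a beach holiday in June?")
print(response.text)
```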
Connecting Your AI to the Real World
This is where you break the model out of its knowledge silo and connect it to live data. By default, an AI model’s knowledge is limited to the data it was trained on; it doesn’t know today’s weather. However, you can provide Gemini with access to external knowledge using a powerful feature called function calling!
The concept is simple: you write a basic Python function (like one to check the weather) and then describe that tool to the model. Then, when a user asks about the weather, the model can ask your application to run your function and use the live result in its answer. This allows the model to answer questions far beyond its training data, making it a much more powerful and useful assistant with access to up-to-the-minute information.
In this lab, we used the Geocoding API and the Weather Forecast API to provide the app with the ability to factor in the weather when answering questions about travel.
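Here is a hedged sketch of the function-calling flow with the Vertex AI SDK; the get_weather function and its canned result are hypothetical stand-ins for the lab’s real Geocoding and Weather Forecast calls:

```python
# A sketch of function calling with the Vertex AI SDK; get_weather and its canned
# result are hypothetical stand-ins for real Geocoding/Weather Forecast API calls.
from vertexai.generative_models import (
    FunctionDeclaration,
    GenerativeModel,
    Part,
    Tool,
)

def get_weather(city: str) -> dict:
    # Placeholder: the lab would call the Geocoding and Weather Forecast APIs here.
    return {"city": city, "forecast": "sunny", "high_c": 24}

weather_tool = Tool(function_declarations=[
    FunctionDeclaration(
        name="get_weather",
        description="Get the current weather forecast for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    )
])

model = GenerativeModel("gemini-2.0-flash", tools=[weather_tool])
chat = model.start_chat()

response = chat.send_message("What's the weather like in Lisbon this weekend?")
# Assuming the model chose to call the tool, run the function ourselves and
# send the live result back so the model can use it in its answer.
call = response.candidates[0].content.parts[0].function_call
result = get_weather(**dict(call.args))
response = chat.send_message(
    Part.from_function_response(name="get_weather", response={"content": result})
)
print(response.text)
```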
Your Journey Starts Here
Building with AI isn’t about knowing everything at once. It’s about taking the first step, building something tangible, and learning key concepts along the way. This codelab was designed to be that first step. By the end, you won’t just have a working travel chatbot—you’ll have hands-on experience with the fundamental building blocks of a production-ready AI application. You’ll be surprised at what you can build.
Share your progress and connect with others on the journey using the hashtag #ProductionReadyAI. Happy learning!
Editor’s note: Today we hear from Buildertrend, a leading provider of cloud-based construction management software. Since 2006, the platform has helped more than a million users globally simplify business management, track financials, and improve communication. To support this massive scale and their ambitious vision, they rely on a robust technology stack on Google Cloud, including, recently, Memorystore for Valkey. Read on to hear about their migration from Memorystore for Redis to the new platform.
Running a construction business is a complex balancing act that requires a constant stream of real-time information to keep projects on track. At Buildertrend, we understand the challenges our customers face — from fluctuating material costs and supply chain delays to managing tight deadlines and the risk of budget overruns — and work to help construction professionals improve efficiency, reduce risk, and enhance collaboration, all while growing their bottom line.
The challenge: Caching at scale
The construction industry has historically been slow to adopt new technologies, hindering efficiency and scalability. At Buildertrend, we aim to change this by being at the forefront of adopting new technology. When Memorystore for Valkey became generally available, we spent time looking into whether it could help us modernize our stack and deliver value to customers. We were attracted by Valkey’s truly open source posture and its promised performance benefits over competing technologies.
Before adopting Memorystore for Valkey, we had used Memorystore for Redis. While it served our basic needs, we found ourselves hitting a wall when it came to a critical feature: native cross-regional replication. As we scaled, we needed a solution that could support a global user base and provide seamless failover in case of a disaster or other issues within a region. We also needed a modern connectivity model such as Google Cloud’s Private Service Connect to enhance network security and efficiency.
As a fully managed, scalable, and highly available in-memory data store, Memorystore for Valkey offered the key features we needed out of the box to take our platform to the next level.
A modern solution for a modern problem
Within this ecosystem, we use Memorystore for Valkey for a variety of critical functions, including:
Database-backed cache: Speeds up data retrieval for a faster user experience
Session state: Manages user sessions for web applications
Job storage: Handles asynchronous task queues for background processes
Pub/Sub idempotency keys: Ensures messages are processed exactly once, preventing data duplication
Authentication tokens: Securely validates user identity with cryptographically signed tokens, enabling fast, scalable authentication
By leveraging the cache in these scenarios, our application is fast, resilient, and ready to meet the demands of our growing customer base. The native cross-regional replication helped us support a global user base without having to worry about keeping global caches in sync.
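For illustration, here is a minimal sketch of the database-backed cache pattern using the Redis-compatible redis-py client against a Memorystore for Valkey endpoint; the host address and the fetch_project_from_db helper are placeholders, not production code:

```python
# A minimal sketch of a database-backed cache, assuming a Memorystore for Valkey
# endpoint reachable from the app and the Redis-compatible redis-py client.
# The host address and fetch_project_from_db are illustrative placeholders.
import json
import redis

client = redis.Redis(host="10.0.0.3", port=6379)  # replace with your Valkey endpoint

def fetch_project_from_db(project_id: str) -> dict:
    # Stand-in for a real (slower) database query.
    return {"id": project_id, "name": "Kitchen remodel", "status": "active"}

def get_project(project_id: str) -> dict:
    cache_key = f"project:{project_id}"
    cached = client.get(cache_key)
    if cached:
        return json.loads(cached)  # cache hit: skip the database entirely
    project = fetch_project_from_db(project_id)
    client.setex(cache_key, 300, json.dumps(project))  # cache for five minutes
    return project

print(get_project("42"))
```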
A seamless migration with minimal disruption
Migrating from Memorystore for Redis to Memorystore for Valkey was a smooth process, thanks to close collaboration with the Google Cloud team. We worked with the Google Cloud team to identify the best approach, which for us involved exporting data to Google Cloud Storage and seeding the data at Valkey instance creation, allowing us to migrate with minimal downtime. Because Memorystore for Valkey natively supports Private Service Connect, we were able to eliminate a proxy layer that our engineers used to connect to our Memorystore for Redis instances, simplifying our stack and improving our networking posture.
Looking ahead to a global future
Although it’s still early in our journey, the impact is already clear. Memorystore for Valkey has unlocked our ability to scale and drastically reduced our time to market. It has allowed our team to streamline and own deployment processes, so they can be more agile and responsive.
For us, the future is about global scalability. With nearly 300 Memorystore for Valkey instances in our fleet, we’re building a globally available, cloud-native stack. Our most critical instances are highly optimized to serve up to 30,000 requests per second each, demonstrating the foundation’s scalability and performance.
We strive to use scalable cloud-native technologies, and Memorystore for Valkey will enable us to continue down this path. By using the Memorystore for Valkey managed service, we not only solve technical problems, but also accelerate business growth and empower engineering teams to focus on what matters most: building great products.
Ready to build with Memorystore for Valkey?
Like Buildertrend, you can leverage the power of a fully managed, scalable, and highly available in-memory data store to accelerate your applications and empower your development teams.
Artificial intelligence is reshaping our world – accelerating discovery, optimising systems, and unlocking new possibilities across every sector. But with its vast potential comes a shared responsibility.
AI can be a powerful ally for transforming businesses and reducing cost. It can help organizations minimize carbon emissions, industries manage energy use, and scientists model complex climate systems in real time. Yet the way we design, deploy, and run AI also matters. Building software sustainably means making every stage of the digital journey – from architecture to inference – more efficient, transparent, and resilient.
Innovation that serves sustainability
At Google, we believe innovation and sustainability go hand in hand. The same intelligence that powers breakthroughs can also help us use resources more wisely.
Projects like Green Light, which uses AI to optimise traffic signals and reduce emissions, and Project Contrails, which helps airlines cut the warming effects of condensation trails, show what happens when technology serves both performance and planet.
Each example reveals a helpful truth – that sustainability doesn’t slow innovation but instead fuels it, enabling efficiency to become an engine of progress.
From footprint to framework
Every software system, including AI, has an environmental footprint – from the hardware and energy that powers data centers to the water used to cool them. Water is one of the planet’s most precious and increasingly scarce resources and protecting it must be part of any technology strategy. That’s why Google is investing in advanced cooling systems and water stewardship projects with the goal to replenish more than we consume, helping preserve local ecosystems and community supplies.
Understanding this footprint helps engineers and organisations make smarter choices, like selecting efficient accelerators, rightsizing workloads, and scheduling operations when the grid is cleanest.
Across Google Cloud, we’re continually improving efficiency. Our Ironwood Tensor Processing Units (TPUs) are nearly 30 times more energy-efficient than our first Cloud TPU from 2018, and our data centres operate at a fleet-wide Power Usage Effectiveness (PUE) of 1.09, which is amongst the best in the world.
By designing systems that consume less energy and run on more carbon-free power, we help close the gap between ambition and action – turning digital progress into tangible emissions reductions.
But this isn’t achieved through infrastructure alone. It’s the result of decisions made at every layer of the software lifecycle. That’s why we encourage teams to think Sustainable by Design, bringing efficiency, measurement, and responsibility into every stage of building software.
Sustainable by Design: a mindset for the AI era
Today’s sustainability questions aren’t coming just from sustainability teams; they’re coming directly from executives, financial operations teams, technology leads, and developers. And they often frame sustainability in infrastructure language: “Are we building the most price-performant AND efficient way to run AI?” This is not a niche environmental question; it’s relevant across industries and geographies, and it requires leaders to consider sustainability criteria when they design infrastructure. A Sustainable by Design infrastructure strategy makes AI training and operation dramatically more cost- and energy-efficient. It’s built around a set of principles known as the 4Ms, which lay out powerful ways to embed efficiency into software:
Machine – choose efficient computing resources that deliver more performance per watt.
Model – use or adapt existing models rather than starting from scratch — smaller, fine-tuned models can be faster and more resource efficient.
Mechanisation – automate data and AI operations through serverless and managed services to minimise idle compute.
Map – run workloads where and when the energy supply is cleanest.
The 4Ms help turn sustainability into a design principle, and a shared responsibility across every role in tech.
A collective journey toward resilience
As we host the AI Days in the Nordics, the conversation about AI’s environmental impact is accelerating, and so is the opportunity to act. Every software team, cloud architect, and product manager has a role to play in designing a digital ecosystem that enables and fuels innovation without compromising the environment.
Building software sustainably is essential for business resilience – AI applications that use fewer resources are not only more energy efficient; they’re also more scalable and cost-effective for the organisations that depend on them.
Many developers are prototyping AI agents, but moving to a scalable, secure, and well-managed production agent is far more complex.
Vertex AI Agent Builder is Google Cloud’s comprehensive and open platform to build, scale, and govern reliable agents. As a suite of products, it provides the choice builders need to create powerful agentic systems at global scale.
Since Agent Builder’s public launch earlier this year, we’ve seen tremendous traction with components such as our Python Agent Development Kit (ADK), which has been downloaded over 7 million times. Agent Development Kit also powers agents for customers using Gemini Enterprise and agents operating in products across Google.
Today, we build on that momentum by announcing new capabilities across the entire agent lifecycle to help you build, scale, and govern AI agents. Now, you can:
Build faster by controlling agent context and reducing token usage with configurable context layers (Static, Turn, User, Cache) via the ADK API.
Scale in production with new managed services from Vertex AI Agent Engine (AE), including new observability and evaluation capabilities.
Govern agents with confidence with new features, including native agent identities and security safeguards.
These new capabilities underscore our commitment to Agent Builder, and simplify the agent development lifecycle to meet you where you are, no matter which tech stack you choose.
For reference, here’s what to use, and when:
This diagram showcases the comprehensive makeup of Agent Builder neatly organized into the build, scale, and govern pillars.
1. Build your AI agents faster
Building an agent from a concept to a working product involves complex orchestration. That’s why we’ve improved ADK for your building experience:
Build more robust agents: Use our adaptable plugins framework for custom logic (like policy enforcement or usage tracking). Or use our prebuilt plugins, including a new plugin for tool use that helps agents ‘self-heal.’ This means the agent can recognize when a tool call has failed and automatically retry the action in a new way.
More language support: We are also enabling Go developers to build ADK agents (with a dedicated A2A Go SDK) alongside Python and Java, making the framework accessible to many more developers.
Single command deployment: Once you have built an agent, you can now use the ADK CLI to deploy it to the Agent Engine (AE) runtime with a single command, adk deploy (sketched below). This is a major upgrade to help you move your agent from local development to live testing and production use quickly and seamlessly.
You can start building today with adk-samples on GitHub or on Vertex AI Agent Garden – a growing repository of curated agent samples, solutions, and tools, designed to accelerate your development and support one-click deployment of your agents built with ADK.
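For a sense of the workflow, here is a minimal sketch, assuming the google-adk Python package; the agent definition and the deploy command shown in the comment are illustrative and may differ from the exact flags in the ADK CLI reference:

```python
# agent.py -- a minimal sketch, assuming the google-adk Python package; the agent
# name, model, and instruction are illustrative placeholders.
from google.adk.agents import Agent

root_agent = Agent(
    name="support_assistant",
    model="gemini-2.0-flash",
    instruction="Answer questions about our product documentation, concisely.",
)

# After testing locally (for example with `adk web`), a single command pushes the
# agent to the managed Agent Engine runtime, e.g.:
#   adk deploy agent_engine ./support_assistant
# (the exact subcommand and flags may differ; check the ADK CLI reference)
```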
2. Scale your AI agents effectively
Once your agent is built and deployed, the next step is running it in production. As you scale from one agent to many, managing them effectively becomes a key challenge. That’s why we continue to expand the managed services available in Agent Engine. It provides the core capabilities for deploying and scaling the agents you create in Agent Builder.
Observability: We’re bringing the local development environment that you know and love from adk web to Google Cloud to enable cloud-based production monitoring. Within Agent Engine, we are making it easy to:
Track key agent performance metrics with a dashboard that measures token consumption, latency, error rates, and tool calls over time.
Find and fix production issues faster in a traces tab so you can dive into flyouts to visualize and understand the sequence of actions your agents are taking.
Interact with your deployed agent (including past sessions or issues) with a playground to dramatically shorten your debug loop.
Quality & evaluation: You told us that evaluating non-deterministic systems is a major challenge. We agree. Now, you can simulate agent performance using the new Evaluation Layer that includes a User Simulator.
Simplified access: You can use the ADK CLI to deploy to the Agent Engine runtime and use AE sessions and memory without signing up for a Google Cloud account. Sign up using your Gmail address and get started for free for up to 90 days. If you have a Google Cloud account, the AE runtime now offers a free tier so you can deploy and experiment without hesitation.
Below is a demo showcasing the new observability features in action, such as the updated AE dashboard, traces, and playground within Agent Engine.
3. Govern your AI agents with confidence
Now that you can measure your agents’ performance at scale, the final stage of the lifecycle is ensuring they operate safely and responsibly. New and expanded capabilities include:
Agent identities: Building on our existing Cloud IAM capabilities, we are giving agents their own unique, native identities within Google Cloud. As first-class IAM principals, agent identities allow you to enforce true least-privilege access and establish granular policies and resource boundaries to meet your compliance and governance requirements.
Safeguards and advanced security: Existing protections are already available to protect and secure AI applications. Model Armor provides protection against input risks like prompt injection, while also screening tool calls and agent responses. For complete control, Model Armor provides built-in inline protection for Gemini models and a REST API to integrate with your agents. To provide full visibility, new integrations with AI Protection in Security Command Center will discover and inventory agentic assets as well as detect agentic threats such as unauthorized access and data exfiltration attempts by agents.
As a bonus, agents you build in Agent Builder can be registered for your teams to use directly within Gemini Enterprise.
Below is a mock of a dashboard in Gemini Enterprise, showing how custom agents built in Agent Builder can be registered and made available to your employees, creating a single place for them to accelerate their workflows.
How customers are achieving more with Agent Builder
“Color Health, with its affiliated medical group Color Medical, operates the nation’s only Virtual Cancer Clinic, delivering clinically guided, end-to-end cancer care across all 50 states, from prevention to survivorship. In partnership with Google Cloud and Google.org, we’re helping more women get screened for breast cancer using an AI-powered agent built with Vertex AI Agent Builder using ADK powered by Gemini LLMs and scaling them into production with Agent Engine. The Color Assistant determines if women are due for a mammogram, connects them with clinicians, and schedules care. The power of the agent lies in the scale it enables, helping us reach more women, collect diverse and context-rich answers, and respond in real time. Early detection saves lives: 1 in 8 women develop breast cancer, yet early detection yields a 99% survival rate. Check it out here: color.com/breast-cancer-screening” – Jayodita Sanghvi, PhD., Head of AI Platform, Color
“PayPal uses Vertex AI Agent Builder to rapidly build and deploy agents in production. Specifically, we use Agent Development Kit (ADK) CLI and visual tools to inspect agent interactions, follow state changes, and manage multi-agent workflows. We leverage the step-by-step visibility feature for tracing and debugging agent workflows. This lets the team easily trace requests/responses and visualize the flow of intent, cart, and payment mandates. Finally, Agent Payment Protocol (AP2) on Agent Builder provides us the critical foundation for trusted agent payments. AP2 helps our ecosystem accelerate the shipping of safe, secure agent-based commerce experiences.” – Nitin Sharma, Principal Engineer, AI
“Geotab uses Vertex AI Agent Builder to rapidly build and deploy agents in production. Specifically, we use Google’s Agent Development Kit (ADK) as the framework for our AI Agent Center of Excellence. It provides the flexibility to orchestrate various frameworks under a single, governable path to production, while offering an exceptional developer experience that dramatically accelerates our build-test-deploy cycle. For Geotab, ADK is the foundation that allows us to rapidly and safely scale our agentic AI solutions across the enterprise” – Mike Bench, Vice President, Data & Analytics
Get started
Vertex AI Agent Builder provides the unified platform to manage the entire agent lifecycle, helping you close the gap from prototype to a production-ready agent. To explore these new features, visit the updated Agent Builder documentation to learn more.
If you’re a startup and you’re interested in learning more about building and deploying agents, download the Startup Technical Guide: AI Agents. This guide provides the knowledge needed to go from an idea to prototype to scale, whether your goals are to automate tasks, enhance creativity, or launch entirely new user experiences for your startup.
Welcome to the first Cloud CISO Perspectives for November 2025. Today, Sandra Joyce, vice-president, Google Threat Intelligence, updates us on the state of the adversarial misuse of AI.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
Recent advances in how threat actors use AI tools
By Sandra Joyce, vice-president, Google Threat Intelligence
As defenders have made significant advances in using AI to boost their efforts this year, government-backed threat actors and cybercriminals have been trying to do the same. Google Threat Intelligence Group (GTIG) has observed threat actors moving beyond using AI solely for productivity gains: They’re experimenting with deploying novel AI-enabled malware in active operations.
This shift marks a new phase in how threat actors use AI, moving from experimentation to broader adoption of these tools. It follows our analysis of the adversarial misuse of generative AI, where we found that, up until the point when we published the report in January, threat actors were using Gemini mostly for productivity gains.
At Google, we are committed to developing AI responsibly and are taking proactive steps to disrupt malicious activity, disabling the projects and accounts associated with these threat actors.
Based on GTIG’s unique visibility into the misuse of AI tools and the broader threat landscape, the new report details four key findings on how government-backed threat actors and cybercriminals are integrating AI across their entire attack lifecycle. By understanding how adversaries are innovating with AI, security leaders can get ahead of threats and take proactive measures to update their security posture against a changing threat landscape.
1. AI generating commands to steal documents and data
For the first time, GTIG has identified malware families that use large language models (LLMs) during execution. These tools can dynamically generate malicious scripts, use self-modification to obfuscate their own code to evade detection, and receive commands from AI models rather than traditional command-and-control (C2) servers.
One such new malware detailed in the full report is a data miner we track as PROMPTSTEAL. In June, GTIG identified the Russian government-backed actor APT28 (also known as FROZENLAKE) using PROMPTSTEAL, which masquerades as an image generation program that guides the user through a series of prompts to generate images.
In the background, PROMPTSTEAL queries the API for Hugging Face, a platform for open-source machine learning including LLMs, to generate commands for execution, rather than hard-coding commands in the malware. The prompt specifically asks the LLM to output commands to gather system information, to copy documents to a specified directory, and to exfiltrate data.
Our analysis indicates continued development of this malware, with new samples adding obfuscation and changing the C2 method.
FROZENLAKE’s use of PROMPTSTEAL constitutes our first observation of malware querying an LLM deployed in live operations. Combined with other recent experimental implementations of novel AI techniques, this campaign provides an early indicator of how threats are evolving and how adversaries can potentially integrate AI capabilities into future intrusion activity.
What Google is doing: Google has taken action against this actor by disabling the assets associated with their activity. Google DeepMind has also used these insights to further strengthen our protections against misuse by strengthening both Google’s classifiers and the model itself. This enables the model to refuse to assist with these types of attacks moving forward.
2. Social engineering to bypass safeguards
Threat actors have been adopting social engineering pretexts in their prompts to bypass AI safeguards. We observed actors posing as cybersecurity researchers and as students in capture-the-flag (CTF) competitions to persuade Gemini to provide information that would otherwise receive a safety response from Gemini.
In one interaction, a threat actor asked Gemini to identify vulnerabilities on a compromised system, but received a safety response from Gemini that a detailed response would not be safe. They reframed the prompt by depicting themselves as a participant in a CTF exercise, and in response Gemini returned helpful information that could be misused to exploit the system.
The threat actor appeared to learn from this interaction and continued to use the CTF pretext over several weeks in support of phishing, exploitation, and webshell development.
What Google is doing: We took action against the CTF threat actor by disabling the assets associated with the actor’s activity. Google DeepMind was able to use these insights to further strengthen our protections against misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
3. Maturing cybercrime marketplace for AI tooling
In addition to misusing mainstream AI-enabled tools and services, there is a growing interest and marketplace for purpose-built AI tools and services that can enable illicit activities. To identify evolving threats, GTIG tracks posts and advertisements on underground forums related to AI tools and services as well as discussions surrounding the technology.
Many underground forum advertisements mirror language comparable to marketing for legitimate AI models, citing the need to improve the efficiency of workflows and effort while simultaneously offering guidance for prospective customers interested in their offerings.
The underground marketplace for illicit AI tools has matured in 2025. GTIG has identified multiple offerings of multifunctional tools designed to support phishing, malware development, vulnerability research, and other capabilities. This development has lowered the barrier to entry for less sophisticated, poorly-resourced threat actors.
What Google is doing: While there are no direct mitigations to prevent threat actors from developing their own AI tools, at Google we use threat intelligence to disrupt adversary operations — including monitoring the cybercrime AI tool marketplace.
4. Continued augmentation of the full attack lifecycle
State-sponsored actors from North Korea, Iran, and the People’s Republic of China (PRC) continue to misuse AI to enhance all stages of their operations, from reconnaissance and phishing lure creation to C2 development and data exfiltration.
In one example, GTIG observed a suspected PRC-nexus actor using Gemini to support multiple stages of an intrusion campaign, including conducting initial reconnaissance on targets, researching phishing techniques to deliver payloads, soliciting assistance from Gemini related to lateral movement, seeking technical support for C2 efforts once inside a victim’s system, and helping with data exfiltration.
What Google is doing: GTIG takes a holistic, intelligence-driven approach to detecting and disrupting threat activity. Our understanding of government-backed threat actors and their campaigns can help provide the needed context to identify threat-enabling activity. By tracking this activity, we’re able to leverage our insights to counter threats across Google platforms, including disrupting the activity of threat actors who have misused Gemini.
Our learnings from countering malicious activities are fed back into our product development to improve safety and security for our AI models. Google DeepMind was able to use these insights to further strengthen our protections against misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
Building AI safely and responsibly
At Google, we are committed to developing AI responsibly and are taking proactive steps to disrupt malicious activity, disabling the projects and accounts associated with these threat actors. In addition to taking action against accounts, we have proactively fed the intelligence back into our teams and products to better protect Google and its users. We continuously improve our models to make them less susceptible to misuse, and share our findings to arm defenders and enable stronger protections across the ecosystem.
We believe our approach to AI must be both bold and responsible. That means developing AI in a way that maximizes the positive benefits to society while addressing the challenges. Guided by our AI Principles, Google designs AI systems with robust security measures and strong safety guardrails, and we continuously test the security and safety of our models to improve them.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
How Google Does It: Threat modeling, from basics to AI: Threat modeling plays a critical role at Google in how we detect and respond to threats — and secure our use of the public cloud. Read more.
How rapid threat models inject more reality into tabletops: Using rapid threat models in tabletop exercises can help you better understand how defense should adapt to the dynamic threat environment. Read more.
How we’re helping customers prepare for a quantum-safe future: Google has been working on quantum-safe computing for nearly a decade. Here’s our latest on protecting data in transit, digital signatures, and public key infrastructure. Read more.
HTTPS by default coming to Chrome: One year from now, with the release of Chrome 154 in October 2026, we will change the default settings of Chrome to enable “Always Use Secure Connections”. This means Chrome will ask for the user’s permission before the first access to any public site without HTTPS. Read more.
How AI helps Android keep you safe from mobile scams: For years, Android has been on the frontlines in the battle against scammers, using the best of Google AI to build proactive, layered protections that can anticipate and block scams before they reach you. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
A defender’s guide to privileged account monitoring: Privileged access stands as the most critical pathway for adversaries seeking to compromise sensitive systems and data. This guide can help you protect the proverbial keys to your kingdom with recommendations and insights to prevent, detect, and respond to intrusions targeting privileged accounts. Read more.
Pro-Russia information operations leverage Russian drone incursions into Polish airspace: GTIG has observed multiple instances of pro-Russia information operations (IO) actors promoting narratives related to the reported incursion of Russian drones into Polish airspace that occurred in September. The IO activity appeared consistent with previously-observed instances of pro-Russia IO targeting Poland — and more broadly the NATO Alliance and the West. Read more.
Vietnamese actors using fake job posting campaigns to deliver malware and steal credentials: GTIG is tracking a cluster of financially-motivated threat actors operating from Vietnam that use fake job postings on legitimate platforms to target individuals in the digital advertising and marketing sectors. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
The end of ‘collect everything’: Moving from centralization to data access: Will the next big SIEM and SOC cost-savings come from managing security data access? Balazs Scheidler, CEO, Axoflow, and founder of syslog-ng, debates the future of security data with hosts Anton Chuvakin and Tim Peacock. Listen here.
Cyber Savvy Boardroom: Valuing investment beyond the balance sheet: Andreas Wuchner, cybersecurity and risk expert, and board advisor, shares his perspective on how smart investments can transform risk management into a brand promise. Listen here.
Behind the Binary: Building a robust network at Black Hat: Host Josh Stroschein is joined by Mark Overholser, a technical marketing engineer, Corelight, who also helps run the Black Hat Network Operations Center (NOC). He gives us an insider’s look at the philosophy and challenges behind building a robust network for a security conference. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
Based on recent analysis of the broader threat landscape, Google Threat Intelligence Group (GTIG) has identified a shift that occurred within the last year: adversaries are no longer leveraging artificial intelligence (AI) just for productivity gains, they are deploying novel AI-enabled malware in active operations. This marks a new operational phase of AI abuse, involving tools that dynamically alter behavior mid-execution.
This report serves as an update to our January 2025 analysis, “Adversarial Misuse of Generative AI,” and details how government-backed threat actors and cyber criminals are integrating and experimenting with AI across the industry throughout the entire attack lifecycle. Our findings are based on the broader threat landscape.
At Google, we are committed to developing AI responsibly and take proactive steps to disrupt malicious activity by disabling the projects and accounts associated with bad actors, while continuously improving our models to make them less susceptible to misuse. We also proactively share industry best practices to arm defenders and enable stronger protections across the ecosystem. Throughout this report we’ve noted steps we’ve taken to thwart malicious activity, including disabling assets and applying intel to strengthen both our classifiers and model so it’s protected from misuse moving forward. Additional details on how we’re protecting and defending Gemini can be found in this white paper, “Advancing Gemini’s Security Safeguards.”
Download the full report, “GTIG AI Threat Tracker: Advances in Threat Actor Usage of AI Tools”: https://services.google.com/fh/files/misc/advances-in-threat-actor-usage-of-ai-tools-en.pdf
Key Findings
First Use of “Just-in-Time” AI in Malware: For the first time, GTIG has identified malware families, such as PROMPTFLUX and PROMPTSTEAL, that use Large Language Models (LLMs) during execution. These tools dynamically generate malicious scripts, obfuscate their own code to evade detection, and leverage AI models to create malicious functions on demand, rather than hard-coding them into the malware. While still nascent, this represents a significant step toward more autonomous and adaptive malware.
“Social Engineering” to Bypass Safeguards: Threat actors are adopting social engineering-like pretexts in their prompts to bypass AI safety guardrails. We observed actors posing as students in a “capture-the-flag” competition or as cybersecurity researchers to persuade Gemini to provide information that would otherwise be blocked, enabling tool development.
Maturing Cyber Crime Marketplace for AI Tooling: The underground marketplace for illicit AI tools has matured in 2025. We have identified multiple offerings of multifunctional tools designed to support phishing, malware development, and vulnerability research, lowering the barrier to entry for less sophisticated actors.
Continued Augmentation of the Full Attack Lifecycle: State-sponsored actors including from North Korea, Iran, and the People’s Republic of China (PRC) continue to misuse Gemini to enhance all stages of their operations, from reconnaissance and phishing lure creation to command and control (C2) development and data exfiltration.
Threat Actors Developing Novel AI Capabilities
For the first time in 2025, GTIG discovered a code family that employed AI capabilities mid-execution to dynamically alter the malware’s behavior. Although some recent implementations of novel AI techniques are experimental, they provide an early indicator of how threats are evolving and how they can potentially integrate AI capabilities into future intrusion activity. Attackers are moving beyond “vibe coding” and the baseline observed in 2024 of using AI tools for technical support. We are only now starting to see this type of activity, but expect it to increase in the future.
Reverse shell (PowerShell): Publicly available reverse shell that establishes a remote connection to a configured command-and-control server and allows a threat actor to execute arbitrary commands on a compromised system. Notably, this code family contains hard-coded prompts meant to bypass detection or analysis by LLM-powered security systems.
PROMPTFLUX (VBScript dropper): Decodes and executes an embedded decoy installer to mask its activity. Its primary capability is regeneration, which it achieves by using the Google Gemini API. It prompts the LLM to rewrite its own source code, saving the new, obfuscated version to the Startup folder to establish persistence. PROMPTFLUX also attempts to spread by copying itself to removable drives and mapped network shares.
Ransomware (Go, proof of concept): Cross-platform ransomware that leverages an LLM to dynamically generate and execute malicious Lua scripts at runtime. Its capabilities include filesystem reconnaissance, data exfiltration, and file encryption on both Windows and Linux systems.
PROMPTSTEAL (Python data miner): Packaged with PyInstaller, it contains a compiled script that uses the Hugging Face API to query the LLM Qwen2.5-Coder-32B-Instruct to generate one-line Windows commands. Prompts used to generate the commands indicate that it aims to collect system information and documents in specific folders. PROMPTSTEAL then executes the commands and sends the collected data to an adversary-controlled server.
QUIETVAULT (JavaScript credential stealer): Targets GitHub and NPM tokens. Captured credentials are exfiltrated via creation of a publicly accessible GitHub repository. In addition to these tokens, QUIETVAULT leverages an AI prompt and on-host installed AI CLI tools to search for other potential secrets on the infected system and exfiltrate these files to GitHub as well.
Table 1: Overview of malware with novel AI capabilities GTIG detected in 2025
Experimental Malware Using Gemini for Self-Modification to Evade Detection
In early June 2025, GTIG identified experimental dropper malware tracked as PROMPTFLUX that suggests threat actors are experimenting with LLMs to develop dynamic obfuscation techniques. PROMPTFLUX is written in VBScript and interacts with Gemini’s API to request specific VBScript obfuscation and evasion techniques to facilitate “just-in-time” self-modification, likely to evade static signature-based detection.
Further examination of PROMPTFLUX samples suggests this code family is currently in a development or testing phase since some incomplete features are commented out and a mechanism exists to limit the malware’s Gemini API calls. The current state of this malware does not demonstrate an ability to compromise a victim network or device. We have taken action to disable the assets associated with this activity.
The most novel component of PROMPTFLUX is its “Thinking Robot” module, designed to periodically query Gemini to obtain new code for evading antivirus software. This is accomplished using a hard-coded API key to send a POST request to the Gemini API endpoint. The actor specifies the use of the “gemini-1.5-flash-latest” model; the “-latest” tag ensures the malware always calls the most current stable release, making the tool more resilient to model deprecation. The prompt sent to the LLM is highly specific and machine-parsable, requesting VBScript code for antivirus evasion and instructing the LLM to output only the code itself.
Although the self-modification function (AttemptToUpdateSelf) is commented out, its presence, combined with the active logging of AI responses to “%TEMP%\thinking_robot_log.txt“, clearly indicates the author’s goal of creating a metamorphic script that can evolve over time.
Figure 1: VBS “StartThinkingRobot” function
We identified multiple variations of PROMPTFLUX using LLM-driven code regeneration, including one with the “Thinking Robot” function replaced with a new “Thinging” function. This function leverages a prompt to instruct the Gemini API to rewrite the malware’s entire source code on an hourly basis to evade detection. The prompt directs the LLM to act as an “expert VBScript obfuscator,” ensuring the new variant remains viable by embedding the original decoy payload, the hard-coded API key, and the complete self-regeneration logic, thereby establishing a recursive cycle of mutation.
Although unattributed to a specific threat actor, the filenames associated with PROMPTFLUX highlight behaviors commonly associated with financially motivated actors. Specifically, varied social engineering lures including “crypted_ScreenRec_webinstall” highlight a broad, geography- and industry-agnostic approach designed to trick a wide range of users.
While PROMPTFLUX is likely still in research and development phases, this type of obfuscation technique is an early and significant indicator of how malicious operators will likely augment their campaigns with AI moving forward.
Mitigations
Our intelligence also indicates this activity is in a development or testing phase, as opposed to being used in the wild, and currently does not have the ability to compromise a victim network or device. Google has taken action against this actor by disabling the assets associated with their activity. Google DeepMind has also used these insights to further strengthen our protections against such misuse by strengthening both Google’s classifiers and the model itself. This enables the model to refuse to assist with these types of attacks moving forward.
LLM Generating Commands to Steal Documents and System Information
In June, GTIG identified the Russian government-backed actor APT28 (aka FROZENLAKE) using new malware against Ukraine that we track as PROMPTSTEAL, reported by CERT-UA as LAMEHUG. PROMPTSTEAL is a data miner that queries an LLM (Qwen2.5-Coder-32B-Instruct) to generate commands for execution via the API for Hugging Face, a platform for open-source machine learning including LLMs. APT28’s use of PROMPTSTEAL constitutes our first observation of malware querying an LLM deployed in live operations.
What is novel about PROMPTSTEAL is its use of an LLM to generate commands for the malware to execute, rather than hard-coding the commands directly in the malware itself. It masquerades as an “image generation” program that guides the user through a series of prompts to generate images, while querying the Hugging Face API in the background to generate commands for execution.
Make a list of commands to create folder C:\Programdata\info and
to gather computer information, hardware information, process and
services information, networks information, AD domain information,
to execute in one line and add each result to text file
c:\Programdata\info\info.txt. Return only commands, without markdown
Figure 2: PROMPTSTEAL prompt used to generate command to collect system information
Make a list of commands to copy recursively different office and
pdf/txt documents in user Documents,Downloads and Desktop
folders to a folder c:\Programdata\info to execute in one line.
Return only command, without markdown.
Figure 3: PROMPTSTEAL prompt used to generate command to collect targeted documents
PROMPTSTEAL likely uses stolen API tokens to query the Hugging Face API. The prompt specifically asks the LLM to output commands to gather system information and to copy documents to a specified directory. The output from these commands is then blindly executed locally by PROMPTSTEAL before being exfiltrated. Our analysis indicates continued development of this malware, with new samples adding obfuscation and changing the C2 method.
Social Engineering to Bypass Safeguards
Guided by our AI Principles, Google designs AI systems with robust security measures and strong safety guardrails. Threat actors are adopting social engineering-like pretexts in their prompts to bypass AI safety guardrails. We observed actors posing as participants in a “capture-the-flag” competition for cybersecurity researchers to persuade Gemini to provide information that would otherwise be blocked, enabling vulnerability exploitation and tool development. In addition to disrupting these actors, we use these insights to continuously improve our classifiers and strengthen the security of our products.
Capture-the-Flag: Identifying as a CTF Player to Enable Research for Mass Exploitation
A China-nexus threat actor misused Gemini to enhance the effectiveness of their campaigns by crafting lure content, building technical infrastructure, and developing tooling for data exfiltration.
In one interaction, the actor asked Gemini to identify vulnerabilities on a compromised system, but received a safety response from Gemini that a detailed response would not be safe. The actor then continued, reframing the prompt by depicting themselves as a participant in a capture-the-flag (CTF) exercise, a gamified cybersecurity competition to enhance skills and techniques. When prompted to help in a CTF exercise, Gemini returned helpful information that could be misused to exploit the system.
The actor appeared to learn from this interaction and used the CTF pretext in support of phishing, exploitation, and web shell development. The actor prefaced many of their prompts about exploitation of specific software and email services with comments such as “I am working on a CTF problem” or “I am currently in a CTF, and I saw someone from another team say …” This approach elicited advice on the next exploitation steps in a “CTF scenario.”
Mitigations
Gemini’s safety and security guardrails provided safety responses during this activity, and Google took further action against the actor to halt future activity. It’s also important to note the context of these prompts: if posed by a genuine CTF participant rather than a threat actor, they would be benign inquiries. This nuance highlights a critical differentiator between benign use and misuse of AI that we continue to analyze as we balance Gemini’s functionality with both usability and security. Google has taken action against this actor by disabling the assets associated with its activity and sharing insights with Google DeepMind to further strengthen our protections against such misuse. We have since strengthened both classifiers and the model itself, helping it to deny assistance with these types of attacks moving forward.
Figure 4: A China-nexus threat actor’s misuse of Gemini mapped across the attack lifecycle
The Iranian state-sponsored threat actor TEMP.Zagros (aka MUDDYCOAST, Muddy Water) used Gemini to conduct research to support the development of custom malware, an evolution in the group’s capability. They continue to rely on phishing emails, often using compromised corporate email accounts from victims to lend credibility to their attacks, but have shifted from using public tools to developing custom malware including web shells and a Python-based C2 server.
While using Gemini to conduct research to support the development of custom malware, the threat actor encountered safety responses. Much like the previously described CTF example, TEMP.Zagros used various plausible pretexts in their prompts to bypass security guardrails. These included pretending to be a student working on a final university project, or claiming to be “writing a paper” or an “international article” on cybersecurity.
In some observed instances, threat actors’ reliance on LLMs for development has led to critical operational security failures, enabling greater disruption.
The threat actor asked Gemini to help with a provided script, which was designed to listen for encrypted requests, decrypt them, and execute commands related to file transfers and remote execution. This revealed sensitive, hard-coded information to Gemini, including the C2 domain and the script’s encryption key, facilitating our broader disruption of the attacker’s campaign and providing a direct window into their evolving operational capabilities and infrastructure.
Mitigations
These activities triggered Gemini’s safety responses and Google took additional, broader action to disrupt the threat actor’s campaign based on their operational security failures. Additionally, we’ve taken action against this actor by disabling the assets associated with this activity and making updates to prevent further misuse. Google DeepMind has used these insights to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
Purpose-Built Tools and Services for Sale in Underground Forums
In addition to misusing existing AI-enabled tools and services across the industry, there is a growing interest and marketplace for AI tools and services purpose-built to enable illicit activities. Tools and services offered via underground forums can enable low-level actors to augment the frequency, scope, efficacy, and complexity of their intrusions despite their limited technical acumen and financial resources.
To identify evolving threats, GTIG tracks posts and advertisements on English- and Russian-language underground forums related to AI tools and services as well as discussions surrounding the technology. Many underground forum advertisements mirrored language comparable to traditional marketing of legitimate AI models, citing the need to improve the efficiency of workflows and effort while simultaneously offering guidance for prospective customers interested in their offerings.
The advertised capabilities and their typical threat actor applications include:
Deepfake/Image Generation: Create lure content for phishing operations or bypass know your customer (KYC) security requirements
Malware Generation: Create malware for specific use cases or improve upon pre-existing malware
Phishing Kits and Phishing Support: Create engaging lure content or distribute phishing emails to a wider audience
Research and Reconnaissance: Quickly research and summarize cybersecurity concepts or general topics
Technical Support and Code Generation: Expand a skill set or generate code, optimizing workflow and efficiency
Vulnerability Exploitation: Provide publicly available research or search for pre-existing vulnerabilities
Table 2: Advertised capabilities on English- and Russian-language underground forums related to AI tools and services
In 2025, the cybercrime marketplace for AI-enabled tooling matured, and GTIG identified multiple offerings for multifunctional tools designed to support stages of the attack lifecycle. Of note, almost every notable tool advertised in underground forums mentioned its ability to support phishing campaigns.
Underground advertisements indicate many AI tools and services promoted similar technical capabilities to support threat operations as those of conventional tools. Pricing models for illicit AI services also reflect those of conventional tools, with many developers injecting advertisements into the free version of their services and offering subscription pricing tiers to add on more technical features such as image generation, API access, and Discord access for higher prices.
Figure 5: Capabilities of notable AI tools and services advertised in English- and Russian-language underground forums
GTIG assesses that financially motivated threat actors and others operating in the underground community will continue to augment their operations with AI tools. Given the increasing accessibility of these applications, and the growing AI discourse in these forums, threat activity leveraging AI will increasingly become commonplace amongst threat actors.
Continued Augmentation of the Full Attack Lifecycle
State-sponsored actors from North Korea, Iran, and the People’s Republic of China (PRC) continue to misuse generative AI tools including Gemini to enhance all stages of their operations, from reconnaissance and phishing lure creation to C2 development and data exfiltration. This extends one of our core findings from our January 2025 analysis Adversarial Misuse of Generative AI.
Expanding Knowledge of Less Conventional Attack Surfaces
GTIG observed a suspected China-nexus actor leveraging Gemini for multiple stages of an intrusion campaign, conducting initial reconnaissance on targets of interest, researching phishing techniques to deliver payloads, soliciting assistance from Gemini related to lateral movement, seeking technical support for C2 efforts once inside a victim’s system, and leveraging help for data exfiltration.
In addition to supporting intrusion activity on Windows systems, the actor misused Gemini to support multiple stages of an intrusion campaign on attack surfaces they were unfamiliar with including cloud infrastructure, vSphere, and Kubernetes.
The threat actor demonstrated access to AWS tokens for EC2 (Elastic Compute Cloud) instances and used Gemini to research how to use the temporary session tokens, presumably to facilitate deeper access or data theft from a victim environment. In another case, the actor leaned on Gemini to assist in identifying Kubernetes systems and to generate commands for enumerating containers and pods. We also observed research into getting host permissions on macOS, indicating a threat actor focus on phishing techniques for that system.
Mitigations
These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
Figure 6: A suspected China-nexus threat actor’s misuse of Gemini across the attack lifecycle
North Korean Threat Actors Misuse Gemini Across the Attack Lifecycle
Threat actors associated with the Democratic People’s Republic of Korea (DPRK) continue to misuse generative AI tools to support operations across the stages of the attack lifecycle, aligned with their efforts to target cryptocurrency and provide financial support to the regime.
Specialized Social Engineering
In recent operations, UNC1069 (aka MASAN) used Gemini to research cryptocurrency concepts, and perform research and reconnaissance related to the location of users’ cryptocurrency wallet application data. This North Korean threat actor is known to conduct cryptocurrency theft campaigns leveraging social engineering, notably using language related to computer maintenance and credential harvesting.
The threat actor also generated lure material and other messaging related to cryptocurrency, likely to support social engineering efforts for malicious activity. This included generating Spanish-language work-related excuses and requests to reschedule meetings, demonstrating how threat actors can overcome the barriers of language fluency to expand the scope of their targeting and success of their campaigns.
To support later stages of the campaign, UNC1069 attempted to misuse Gemini to develop code to steal cryptocurrency, as well as to craft fraudulent instructions impersonating a software update to extract user credentials. We have disabled this account.
Mitigations
These activities are consistent with our January findings that threat actors are using Gemini for productivity gains rather than to develop novel capabilities. We took action by disabling the assets associated with this actor’s activity, and Google DeepMind used these insights to further strengthen our protections against such misuse. Our observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
Using Deepfakes
Beyond UNC1069’s misuse of Gemini, GTIG recently observed the group leverage deepfake images and video lures impersonating individuals in the cryptocurrency industry as part of social engineering campaigns to distribute its BIGMACHO backdoor to victim systems. The campaign prompted targets to download and install a malicious “Zoom SDK” from a provided link.
Figure 7: North Korean threat actor’s misuse of Gemini to support their operations
Attempting to Develop Novel Capabilities with AI
UNC4899 (aka PUKCHONG), a North Korean threat actor notable for their use of supply chain compromise, used Gemini for a variety of purposes including developing code, researching exploits, and improving their tooling. The research into vulnerabilities and exploit development likely indicates the group is developing capabilities to target edge devices and modern browsers. We have disabled the threat actor’s accounts.
Figure 8: UNC4899 (aka PUKCHONG) misuse of Gemini across the attack lifecycle
Capture-the-Data: Attempts to Develop a “Data Processing Agent”
The use of Gemini by APT42, an Iranian government-backed attacker, reflects the group’s focus on crafting successful phishing campaigns. In recent activity, APT42 used the text generation and editing capabilities of Gemini to craft material for phishing campaigns, often impersonating individuals from reputable organizations such as prominent think tanks and using lures related to security technology, event invitations, or geopolitical discussions. APT42 also used Gemini as a translation tool for articles and messages with specialized vocabulary, for generalized research, and for continued research into Israeli defense.
APT42 also attempted to build a “Data Processing Agent”, misusing Gemini to develop and test the tool. The agent converts natural language requests into SQL queries to derive insights from sensitive personal data. The threat actor provided Gemini with schemas for several distinct data types in order to perform complex queries such as linking a phone number to an owner, tracking an individual’s travel patterns, or generating lists of people based on shared attributes. We have disabled the threat actors’ accounts.
Mitigations
These activities are consistent with our January findings that threat actors are using Gemini for productivity gains rather than to develop novel capabilities. We took action by disabling the assets associated with this actor’s activity, and Google DeepMind used these insights to further strengthen our protections against such misuse. Our observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
Figure 9: APT42’s misuse of Gemini to support operations
Code Development: C2 Development and Support for Obfuscation
Threat actors continue to adapt generative AI tools to augment their ongoing activities, attempting to enhance their tactics, techniques, and procedures (TTPs) to move faster and at higher volume. For skilled actors, generative AI tools provide a helpful framework, similar to the use of Metasploit or Cobalt Strike in cyber threat activity. These tools also afford lower-level threat actors the opportunity to develop sophisticated tooling, quickly integrate existing techniques, and improve the efficacy of their campaigns regardless of technical acumen or language proficiency.
Throughout August 2025, GTIG observed threat activity associated with PRC-backed APT41 that used Gemini for assistance with code development. The group has a history of targeting a range of operating systems across mobile and desktop devices, as well as employing social engineering compromises in their operations. Specifically, the group leverages open forums both to lure victims to exploit-hosting infrastructure and to prompt installation of malicious mobile applications.
To support their campaigns, the actor sought technical support for C++ and Golang code for multiple tools, including a C2 framework the actor calls OSSTUN. The group was also observed prompting Gemini for help with code obfuscation, referencing two publicly available obfuscation libraries.
Figure 10: APT41 misuse of Gemini to support operations
Information Operations and Gemini
GTIG continues to observe IO actors use Gemini for research, content creation, and translation, in line with their previous use of Gemini to support malicious activity. We have identified Gemini activity indicating that threat actors are using the tool to help create articles or to build tooling that automates portions of their workflow. However, we have not identified these generated articles in the wild, nor have we identified evidence confirming successful automation of their workflows using this newly built tooling. None of these attempts have created breakthrough capabilities for IO campaigns.
Mitigations
For observed IO campaigns, we did not see evidence of successful automation or any breakthrough capabilities. These activities are consistent with our January findings that threat actors are using Gemini for productivity gains rather than to develop novel capabilities. We took action by disabling the assets associated with these actors’ activity, and Google DeepMind used these insights to further strengthen our protections against such misuse. Our observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.
Building AI Safely and Responsibly
We believe our approach to AI must be both bold and responsible. That means developing AI in a way that maximizes the positive benefits to society while addressing the challenges. Guided by our AI Principles, Google designs AI systems with robust security measures and strong safety guardrails, and we continuously test the security and safety of our models to improve them.
Our policy guidelines and prohibited use policies prioritize safety and responsible use of Google’s generative AI tools. Google’s policy development process includes identifying emerging trends, thinking end-to-end, and designing for safety. We continuously enhance safeguards in our products to offer scaled protections to users across the globe.
At Google, we leverage threat intelligence to disrupt adversary operations. We investigate abuse of our products, services, users, and platforms, including malicious cyber activities by government-backed threat actors, and work with law enforcement when appropriate. Moreover, our learnings from countering malicious activities are fed back into our product development to improve safety and security for our AI models. These changes, which can be made to both our classifiers and at the model level, are essential to maintaining agility in our defenses and preventing further misuse.
Google DeepMind also develops threat models for generative AI to identify potential vulnerabilities, and creates new evaluation and training techniques to address misuse. In conjunction with this research, Google DeepMind has shared how they’re actively deploying defenses in AI systems, along with measurement and monitoring tools, including a robust evaluation framework that can automatically red team an AI system against indirect prompt injection attacks.
Our AI development and Trust & Safety teams also work closely with our threat intelligence, security, and modeling teams to stem misuse.
The potential of AI, especially generative AI, is immense. As innovation moves forward, the industry needs security standards for building and deploying AI responsibly. That’s why we introduced the Secure AI Framework (SAIF), a conceptual framework to secure AI systems. We’ve shared a comprehensive toolkit for developers with resources and guidance for designing, building, and evaluating AI models responsibly. We’ve also shared best practices for implementing safeguards, evaluating model safety, and red teaming to test and secure AI systems.
Google also continuously invests in AI research, helping to ensure AI is built responsibly, and that we’re leveraging its potential to automatically find risks. Last year, we introduced Big Sleep, an AI agent developed by Google DeepMind and Google Project Zero, that actively searches and finds unknown security vulnerabilities in software. Big Sleep has since found its first real-world security vulnerability and assisted in finding a vulnerability that was imminently going to be used by threat actors, which GTIG was able to cut off beforehand. We’re also experimenting with AI to not only find vulnerabilities, but also patch them. We recently introduced CodeMender, an experimental AI-powered agent utilizing the advanced reasoning capabilities of our Gemini models to automatically fix critical code vulnerabilities.
About the Authors
Google Threat Intelligence Group focuses on identifying, analyzing, mitigating, and eliminating entire classes of cyber threats against Alphabet, our users, and our customers. Our work includes countering threats from government-backed attackers, targeted zero-day exploits, coordinated information operations (IO), and serious cyber crime networks. We apply our intelligence to improve Google’s defenses and protect our users and customers.
If you’ve ever wondered how multiple AI agents can actually work together to solve problems too complex for a single agent, you’re in the right place. This guide, based on our two-part video series, will walk you through the foundational concepts of Multi-Agent Systems (MAS) and show you how Google’s Agent Development Kit (ADK) makes building them easier for developers.
By the end of this post, you’ll understand what multi-agent systems are, how to structure them, and how to enable communication between your agents using ADK.
Let’s dive in.
What Is a Multi-Agent System?
At its core, a multi-agent system is a collection of individual, autonomous agents that collaborate to achieve a goal. To truly grasp this, let’s break it down into three key ideas:
Decentralized Control: There’s no single “boss” agent controlling everything. Each agent makes its own decisions based on its own rules and local information. Think of a flock of birds swirling in the sky: there’s no leader, but together they form incredible, coordinated patterns.
Local Views: Each agent only has a partial view of the system. It perceives and reacts to its immediate environment, not the entire system state. Imagine standing in a crowded stadium; you only see and react to the people directly around you, not the entire crowd simultaneously.
Emergent Behavior: This is where the magic happens. From these simple, local interactions, complex and intelligent global behaviors emerge. Agents working together in this way can solve tasks that no single agent could easily accomplish alone.
This collaborative approach allows for robust, scalable, and flexible solutions to complex problems.
How ADK Supports Multi-Agent Systems
Google’s Agent Development Kit (ADK) was built from the ground up with multi-agent systems in mind. Instead of forcing you to hack different components together, it provides a structured framework with three primary types of agents, each with a specific role:
LLM Agents: These are the “brains” of the operation. They leverage large language models like Gemini to understand natural language input, reason through problems, and decide on the next course of action.
Workflow Agents: These are the “managers” that orchestrate how tasks get done. They don’t perform the work themselves but instead direct the flow of execution among other agents. We’ll explore these in detail later.
Custom Agents: These are the “specialists.” When you need full control or specific logic that doesn’t fit the other agent types, you can write your own Python code by inheriting from BaseAgent.
The Foundational Concept: Agent Hierarchy
When you build with ADK, agents are organized into a hierarchy, much like a company’s organizational chart. This structure is the backbone of your system and is governed by two simple rules:
Parent & Sub-Agents: A parent agent can manage one or more sub-agents, delegating tasks to them.
Single Parent Rule: Each agent can have only one parent, ensuring a clear line of command and data flow.
Think of it like this: the root agent is the CEO, who oversees the entire operation. Its sub-agents might be VPs, who in turn manage directors, managers, and individual contributors. Everyone has a defined role, and together they accomplish the company’s mission. See example here.
This hierarchical structure is fundamental to organizing and scaling your multi-agent system.
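To make the hierarchy concrete, here is a minimal sketch in Python using the google-adk package’s LlmAgent; the agent names, instructions, and model string are illustrative, and the exact constructor arguments should be checked against the ADK documentation.
from google.adk.agents import LlmAgent

# Two "specialist" sub-agents, each with a narrow role.
researcher = LlmAgent(
    name="researcher",
    model="gemini-2.0-flash",
    description="Finds background information for a request.",
    instruction="Research the user's topic and return the key facts.",
)
writer = LlmAgent(
    name="writer",
    model="gemini-2.0-flash",
    description="Turns research notes into polished prose.",
    instruction="Write a short summary based on the research provided.",
)

# The root agent is the "CEO": it is the single parent of both sub-agents
# and delegates work down the hierarchy.
root_agent = LlmAgent(
    name="coordinator",
    model="gemini-2.0-flash",
    instruction="Delegate research questions to researcher and writing tasks to writer.",
    sub_agents=[researcher, writer],
)
Because of the single parent rule, researcher and writer belong to coordinator and to no other agent.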
Orchestrating Tasks with Workflow Agents
So, we have a hierarchy. But how do we control the flow of work within that structure? This is where Workflow Agents shine. ADK provides three pre-built orchestrators to manage sub-agents:
SequentialAgent: This agent functions like an assembly line. It runs its sub-agents one after another, in a predefined order. The output of one agent can be passed as the input to the next, making it perfect for multi-step pipelines like: fetch data → clean data → analyze data → summarize findings. See example here.
ParallelAgent: This agent acts like a manager assigning tasks to multiple employees at once. It runs all its sub-agents concurrently, which is ideal for independent tasks that can be performed simultaneously, such as calling three different APIs to gather information. See example here.
LoopAgent: This agent works like a while loop in programming. It repeatedly executes its sub-agents until a specific condition is met or a maximum number of iterations is reached. This is useful for tasks like polling an API for a status update or retrying an operation until it succeeds. See example here.
Using these workflow agents, you can build complex and dynamic execution paths without getting lost in boilerplate code.
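As a rough sketch of how these orchestrators compose (again assuming the google-adk package; the agent definitions are illustrative), a sequential pipeline with optional parallel and loop stages looks like this:
from google.adk.agents import LlmAgent, SequentialAgent, ParallelAgent, LoopAgent

fetch = LlmAgent(name="fetch_data", model="gemini-2.0-flash",
                 instruction="Fetch the data the user asked about.")
clean = LlmAgent(name="clean_data", model="gemini-2.0-flash",
                 instruction="Clean and deduplicate the fetched data.")
analyze = LlmAgent(name="analyze_data", model="gemini-2.0-flash",
                   instruction="Analyze the cleaned data and summarize the findings.")

# Assembly line: sub-agents run one after another, in order.
pipeline = SequentialAgent(name="pipeline", sub_agents=[fetch, clean, analyze])

# Fan-out: independent sub-agents run concurrently (e.g., three API callers).
# gather = ParallelAgent(name="gather", sub_agents=[api_a, api_b, api_c])

# Loop: sub-agents repeat until an exit condition or max_iterations is reached.
# poller = LoopAgent(name="poller", sub_agents=[check_status], max_iterations=5)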
How Do Agents Communicate?
We have our structure and our managers. The final piece of the puzzle is communication. How do agents actually share information and delegate work? ADK provides three primary communication mechanisms.
Shared Session State
Shared Session State is like a shared digital whiteboard. An agent can write its result to a common state object, and other agents in the hierarchy can read that information to inform their own actions. For example, an LLMAgent can analyze user input and save the key entities to the state, allowing a CustomAgent to then use those entities to query a database.
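Here is a minimal sketch of that whiteboard pattern, assuming ADK’s output_key parameter and state placeholders in instructions behave as described in the ADK docs; the names are illustrative:
from google.adk.agents import LlmAgent, SequentialAgent

# Writes its result onto the shared "whiteboard" under the key "entities".
extractor = LlmAgent(
    name="entity_extractor",
    model="gemini-2.0-flash",
    instruction="Extract the key entities from the user's message.",
    output_key="entities",  # saved into the shared session state
)

# Reads the shared state: {entities} is filled in from session state.
responder = LlmAgent(
    name="responder",
    model="gemini-2.0-flash",
    instruction="Answer the user's question using these entities: {entities}",
)

whiteboard_flow = SequentialAgent(name="whiteboard_flow",
                                  sub_agents=[extractor, responder])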
LLM-Driven Delegation
LLM-Driven Delegation is a more dynamic and intelligent form of communication. A parent agent (often an LLMAgent) can act as a coordinator. It analyzes the incoming request and uses its reasoning capabilities to decide which of its sub-agents is best suited to handle the task. For instance, if a user asks to “generate an invoice for last month,” the coordinator agent can dynamically route the request to a specialized BillingAgent.
Explicit Invocation (AgentTool)
Explicit Invocation (AgentTool) describes a pattern where one agent can directly call another agent as if it were a function. This is achieved by wrapping the target agent as a “tool” that the parent agent can choose to invoke. For example, a primary analysis agent might call a CalculatorAgent tool whenever it encounters a task requiring precise mathematical calculations.
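A sketch of this pattern, assuming ADK’s AgentTool wrapper in google.adk.tools works as documented (the agents and model string are illustrative):
from google.adk.agents import LlmAgent
from google.adk.tools import agent_tool

# A specialist agent that only does math.
calculator_agent = LlmAgent(
    name="calculator_agent",
    model="gemini-2.0-flash",
    instruction="Perform the requested calculation and return only the result.",
)

# The primary agent invokes the specialist like a function, via AgentTool.
analysis_agent = LlmAgent(
    name="analysis_agent",
    model="gemini-2.0-flash",
    instruction="Analyze the data. Call the calculator tool for any precise math.",
    tools=[agent_tool.AgentTool(agent=calculator_agent)],
)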
It’s important to understand the distinction between a sub-agent and an AgentTool:
A Sub-Agent is a permanent part of the hierarchy—an employee on the org chart, always managed by its parent.
An AgentTool is like an external consultant. You call on them when you need their specific expertise, but they aren’t part of your core team structure.
Wrapping up
Let’s quickly recap what we’ve covered:
Multi-Agent Systems are powerful because they use decentralized control and local views to produce complex, emergent behaviors.
ADK provides a robust framework with three agent categories: LLM (brains), Workflow (managers), and Custom (specialists).
Agent Hierarchy provides the organizational structure for your system, defining clear parent-child relationships.
Workflow Agents (Sequential, Parallel, Loop) give you the patterns to orchestrate complex task flows.
Communication Mechanisms (Shared State, Delegation, and Explicit Invocation) allow your agents to collaborate effectively.
Together, these concepts make your multi-agent systems not just structured, but truly collaborative, flexible, and intelligent. Now you have the foundational knowledge to start building your own multi-agent applications with ADK. You can start coding with the tutorial here!
Do you find yourself battling surprise cloud bills? Do you spend more time tracking down un-tagged resources and chasing development teams than you do on strategic financial planning? In the fast-paced world of cloud, manual cost management is a losing game. It’s time-consuming, prone to errors, and often, by the time you’ve identified a cost anomaly, it’s too late to prevent the impact.
What if you could codify your financial governance policies and automate their enforcement across your entire Google Cloud organization? Enter Workload Manager (WLM), a powerful tool that lets you automate the validation of your cloud workloads against best practices for security and compliance, including your own custom-defined FinOps rules. Better yet, we recently slashed the cost of using Workload Manager by up to 95% for certain scenarios, letting you run large-scale scans more economically, and added a small free tier to help you run small-scale tests. In this blog, we show you how to get started with automated financial governance policies in Workload Manager, so you can stop playing catch-up and start proactively managing your cloud spend.
The challenge with manual FinOps
Managing business-critical workloads in the cloud is complex. Staying on top of cost-control best practices is a significant and time-consuming effort. Manual reviews and audits can take weeks or even months to complete, by which time costs can spiral. This manual approach often leads to “configuration drift,” where systems deviate from your established cost management policies, making it difficult to detect and control spending.
Workload Manager helps you break free from these manual constraints by providing a framework for automated, continuous validation, helping FinOps teams to:
Improve standardization: Decouple team dependencies and drive consistent application of cost-control policies across the organization.
Enable ownership: Empower individual teams to build and manage their own detection rules for specific use cases, fostering a culture of financial accountability.
Simplify auditing: Easily run infrastructure checks across your entire organization and consolidate the findings into a single BigQuery dataset for streamlined reporting and analysis.
By codifying your FinOps policies, you can define them once and run continuous scans to detect violations across your entire cloud environment on a regular schedule.
Workload Manager makes this easy, providing out-of-the-box rules across security, cost, reliability, and more. Here are some examples of FinOps cost management policies that can be automated with Workload Manager:
Require a specific label or tag on a given Google Cloud resource (e.g., a BigQuery dataset)
Enforce lifecycle management or Autoclass configuration for every Cloud Storage bucket
Ensure appropriate data retention is set for storage (e.g., BigQuery tables)
Disable simultaneous multi-threading to optimize licensing costs (e.g., SQL Server)
Figure 1 – Default Workload Manager policies as per Google Cloud best practices
Don’t see what you need? You can always build your own custom policies using the examples in our GitHub repo.
Let’s take a closer look.
Automating FinOps policies: A step-by-step guide
Here’s how you can use Workload Manager to automate your cost management policies.
Step 1: Define your FinOps rules and create a new evaluation
First, you need to translate your cost management policies into a format that Workload Manager can understand. The tool uses Open Policy Agent (OPA) Rego for defining custom rules. In this blog, we focus on a primary FinOps use case: ensuring resources are properly labeled for cost allocation and showback.
You can choose from hundreds of predefined rules authored by Google Cloud experts that cover FinOps, reliability, security, and operations according to Google Cloud best practices, or create and customize your own rules (check out the examples in the Google Cloud GitHub repository). In our example, we use one of the predefined ‘Google Cloud Best Practices’ rules, bigquery-missing-labels, on a dataset. Navigate to the Workload Manager section in the Google Cloud console and start by creating a new evaluation.
Give your evaluation a name and select “Custom” as the workload type. This is where you can point Workload Manager to the Cloud Storage bucket that contains your custom FinOps rules if you’ve built one. The experience allows you to run both pre-defined and custom rule checks in one evaluation.
Figure 2 – Creating new evaluation rule
Step 2: Define the scope of your scan
Next, define the scope of your evaluation. You have the flexibility to scan your entire Google Cloud organization, specific folders, or individual projects. This allows you to apply broad cost-governance policies organization-wide, or create more targeted rules for specific teams or environments. You can also apply filters based on resource labels or names for more granular control. Region selection lets you choose where your evaluation data is processed, to help meet data residency requirements.
Figure 3 – Selecting scope and location for your evaluation rule
Step 3: Schedule and notify
With FinOps, automation is key. You can schedule your evaluation to run at a specific cadence, from hourly to monthly. This helps ensure continuous monitoring and provides a historical record of your policy compliance. Optionally, but highly recommended for FinOps, you can configure the evaluation to save all results to a BigQuery dataset for historical analysis and reporting.
You can also set up notifications to alert the right teams when an issue is found. Channels include email, Slack, PagerDuty, and more, so that policy violations can be addressed promptly.
Figure 4 – Export, schedule and notify evaluation rules
Step 4: Run, review, and report
Once saved, the evaluation will run on your defined schedule, or you can trigger it on-demand. The results of each scan are stored, providing a historical view of your compliance posture.
From the Workload Manager dashboard, you can see a summary of scanned resources, issues found, and trends over time. For deeper analysis, you can explore the violation data directly in the BigQuery dataset you configured earlier.
Figure 5 – Check out evaluations for Workload Manager
Visualize findings with Looker Studio
To make the data accessible and actionable for all stakeholders, you can easily connect your BigQuery results to Looker Studio. Create interactive dashboards that visualize your FinOps policy violations, such as assets missing required labels or resources that don’t comply with cost-saving rules. This provides a clear, at-a-glance view of your cost governance status.
You can find a Looker Studio template in the template gallery, connect it to your datasets, and modify it as needed. Here is how to use it:
Click “Use your own data” and connect the BigQuery table generated in the previous steps.
After you have connected the BigQuery dataset, click Edit to create a customizable copy that you can modify or share with your team.
Figure 6 – Set up preconfigured Looker Studio dashboard for reporting
Take control of your cloud costs today
Stop the endless cycle of manual cloud cost management. With Workload Manager, you can embed your FinOps policies directly into your cloud environment, automate enforcement, and provide teams with the feedback they need to stay on budget.
Ready to get started? Explore the sample policies on GitHub and check out the official documentation to begin automating your FinOps framework today, and take advantage of Workload Manager’s new pricing.
Check out a quick overview video on how Workload Manager Evaluations helps you do a lot more across Security, Reliability and FinOps.
When we talk about artificial intelligence (AI), we often focus on the models, the powerful TPUs and GPUs, and the massive datasets. But behind the scenes, there’s an unsung hero making it all possible: networking. While it’s often abstracted away, networking is the crucial connective tissue that enables your AI workloads to function efficiently, securely, and at scale.
In this post, we explore seven key ways networking interacts with your AI workloads on Google Cloud, from accessing public APIs to enabling next-generation, AI-driven network operations.
#1 – Securely accessing AI APIs
Many of the powerful AI models available today, like Gemini on Vertex AI, are accessed via public APIs. When you make a call to an endpoint like *-aiplatform.googleapis.com, you depend on a reliable network connection, and the endpoints require proper authentication. This ensures that only authorized users and applications can access these powerful models, helping to safeguard your data and your AI investments. You can also access these endpoints privately, which we cover in more detail in point #5.
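As an illustration, here is a minimal authenticated call to Gemini on Vertex AI using the Google Gen AI SDK; the project, location, and model values are placeholders, and credentials are assumed to come from Application Default Credentials.
from google import genai

# The SDK resolves credentials (e.g., Application Default Credentials) and
# calls the regional aiplatform.googleapis.com endpoint over the network.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="In one sentence, why does networking matter for AI workloads?",
)
print(response.text)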
#2 – Exposing models for inference
Once you’ve trained or tuned your model, you need to make it available for inference. In addition to managed offerings in Google Cloud, you also have the flexibility to deploy your models on infrastructure you control, using specialized VM families with powerful GPUs. For example, you can deploy your model on Google Kubernetes Engine (GKE) and use the GKE Inference Gateway, Cloud Load Balancing, or a ClusterIP to expose it for private or public inference. These networking components act as the entry point for your applications, allowing them to interact with your model deployments seamlessly and reliably.
#3 – High-speed GPU-to-GPU communication
AI workloads, especially training, involve moving massive amounts of data between GPUs. Traditional networking, which relies on CPU copy operations, can create bottlenecks. This is where protocols like Remote Direct Memory Access (RDMA) come in. RDMA bypasses the CPU, allowing for direct memory-to-memory communication between GPUs.
To support this, the underlying network must be lossless and high-performance. Google has built out a non-blocking rail-aligned network topology in its data center architecture to support RDMA communication and node scaling. Several high-performance GPU VM families support RDMA over Converged Ethernet (RoCEv2), providing the speed and efficiency needed for demanding AI workloads.
#4 – Data ingestion and storage connectivity
Your AI models are only as good as the data they’re trained on. This data needs to be stored, accessed, and retrieved efficiently. Google Cloud offers a variety of storage options, for example Google Cloud Storage, Hyperdisk ML and Managed Lustre. Networking is what connects your compute resources to your data. Whether you’re accessing data directly or over the network, having a high-throughput, low-latency connection to your storage is essential for keeping your AI pipeline running smoothly.
#5 – Private connectivity to AI workloads
Security is paramount, and you often need to ensure that your AI workloads are not exposed to the public internet. Google Cloud provides several ways to achieve private communication to both managed Vertex AI services and your own DIY AI deployments. These include:
Private Service Connect: Allows you to access Google APIs and managed services privately from your VPC. You can use PSC endpoints to connect to your own services or Google services.
#6 – Bridging the gap with hybrid cloud connections
Many enterprises have a hybrid cloud strategy, with sensitive data remaining on-premises. The Cross-Cloud Network allows you to architect your network for any-to-any connectivity. With design cases covering distributed applications, global front ends, and Cloud WAN, you can securely connect from on-premises, other clouds, or other VPCs to your AI workloads. This hybrid connectivity lets you leverage the scalability of Google Cloud’s AI services while keeping your data secure.
#7 – The Future: AI-driven network operations
The relationship between AI and networking is becoming a two-way street. With Gemini for Google Cloud, network engineers can now use natural language to design, optimize, and troubleshoot their network architectures. This is the first step towards what we call “agentic networking,” where autonomous AI agents can proactively detect, diagnose, and even mitigate network issues. This transforms network engineering from a reactive discipline to a predictive and proactive one, ensuring your network is always optimized for your AI workloads.
Upgrading a Kubernetes cluster has always been a one-way street: you move forward, and if the control plane has an issue, your only option is to roll forward with a fix. This adds significant risk to routine maintenance, a problem made worse as organizations upgrade more frequently for new AI features while demanding maximum reliability. Today, in partnership with the Kubernetes community, we are introducing a new capability in Kubernetes 1.33 that solves this: Kubernetes control-plane minor-version rollback. For the first time, you have a reliable path to revert a control-plane upgrade, fundamentally changing cluster lifecycle management. This feature is available in open-source Kubernetes and is integrated into Google Kubernetes Engine, where it will be generally available starting with GKE 1.33 soon.
The challenge: Why were rollbacks so hard?
Kubernetes’ control plane components, especially kube-apiserver and etcd, are stateful and highly sensitive to API version changes. When you upgrade, many new APIs and features are introduced in the new binary. Some data might be migrated to new formats and API versions. Downgrading was unsupported because there was no mechanism to safely revert changes, risking data corruption and complete cluster failure.
As a simple example, consider adding a new field to an existing resource. Until now, both the storage and API progressed in a single step, allowing clients to write data to that new field immediately. If a regression was detected, rolling back removed access to that field, but the data written to it would not be garbage-collected. Instead, it would persist silently in etcd. This left the administrator in an impossible situation. Worse, upon a future re-upgrade to that minor version, this stale “garbage” data could suddenly become “alive” again, introducing potentially problematic and indeterministic behavior.
The solution: Emulated versions
The Kubernetes Enhancement Proposal (KEP), KEP-4330: Compatibility Versions, introduces the concept of an “emulated version” for the control plane. Contributed by Googlers, this creates a new two-step upgrade process:
Step 1: Upgrade binaries. You upgrade the control plane binary, but the “emulated version” stays the same as the pre-upgrade version. At this stage, all APIs, features, and storage data formats remain unchanged. This makes it safe to roll back your control plane to the previously stable version if you find a problem.
Validate health and check for regressions. The first step creates a safe validation window during which you can verify that it’s safe to proceed: for example, confirming that your own components and workloads run healthily under the new binaries, and checking for performance regressions before committing to the new API versions.
Step 2: Finalize upgrade. After you complete your testing, you “bump” the emulated version to the new version. This enables all the new APIs and features of the latest Kubernetes release and completes the upgrade.
This two-step process gives you granular control, more observability, and a safe window for rollbacks. If an upgrade has an unexpected issue, you no longer need to scramble to roll forward. You now have a reliable way to revert to a known-good state, stabilize your cluster, and plan your next move calmly. This is all backed by comprehensive testing for the two-step upgrade in both open-source Kubernetes and GKE.
Enabling this was a major effort, and we want to thank all the Kubernetes contributors and feature owners whose collective work to test, comply, and adapt their features made this advanced capability a reality.
This feature, coming soon to GKE 1.33, gives you a new tool to de-risk upgrades and dramatically shorten recovery time from unforeseen complications.
A better upgrade experience in OSS Kubernetes
This rollback capability is just one part of our broader, long-term investment in improving the Kubernetes upgrade experience for the entire community. At Google, we’ve been working upstream on several other critical enhancements to make cluster operations smoother, safer, and more automated. Here are just a few examples:
Support for skip-version upgrades: Our work on KEP-4330 also makes it possible to enable “skip-level” upgrades for Kubernetes. This means that instead of having to upgrade sequentially through every minor version (e.g., v1.33 to v1.34 to v1.35), you will be able to upgrade directly from an older version to a newer one, potentially skipping one or more intermediate releases (e.g., v1.33 to v1.35). This aims to reduce the complexity and downtime associated with major upgrades, making the process more efficient and less disruptive for cluster operators.
Coordinated Leader Election (KEP-4355): This effort ensures that different control plane components (like kube-controller-manager and kube-scheduler) can gracefully handle leadership changes during an upgrade, so that the Kubernetes version skew policy is not violated.
Graceful Leader Transition (KEP-5366): Building on the above, this allows a leader to cleanly hand off its position before shutting down for an upgrade, enabling zero-downtime transitions for control plane components.
Mixed Version Proxy (KEP-4020): This feature improves API server reliability in mixed-version clusters (like during an upgrade). It prevents false “NotFound” errors by intelligently routing resource requests to a server that recognizes the resource. It also ensures discovery provides a complete list of all resources from all servers in a mixed-version cluster.
Component Health SLIs for Upgrades (KEP-3466): To upgrade safely, you need to know if the cluster is healthy. This KEP defines standardized Service Level Indicators (SLIs) for core Kubernetes components. This provides a clear, data-driven signal that can be used for automated upgrade canary analysis, stopping a bad rollout before it impacts the entire cluster.
Together, these features represent a major step forward in the maturity of Kubernetes cluster lifecycle management. We are incredibly proud to contribute this work to the open-source community and to bring these powerful capabilities to our GKE customers.
Learn more at KubeCon
Want to learn more about the open-source feature and how it’s changing upgrades? Come say hi to our team at KubeCon! You can find us at booths #200 and #1100 and at a variety of sessions, including:
This is what it looks like when open-source innovation and managed-service excellence come together. This new, safer upgrade feature is coming soon in GKE 1.33. To learn more about managing your clusters, check out the GKE documentation.
Every November, we make it our mission to equip organizations with the knowledge needed to stay ahead of threats we anticipate in the coming year. The Cybersecurity Forecast 2026 report, released today, provides comprehensive insights to help security leaders and teams prepare for those challenges.
This report does not contain “crystal ball” predictions. Instead, our forecasts are built on real-world trends and data we are observing right now. The information contained in the report comes directly from Google Cloud security leaders, and dozens of experts, analysts, researchers, and responders directly on the frontlines.
Artificial Intelligence, Cybercrime, and Nation States
Cybersecurity in the year ahead will be defined by rapid evolution and refinement by adversaries and defenders. Defenders will leverage artificial intelligence and agentic AI to protect against increasingly sophisticated and disruptive cybercrime operations, nation-state actors persisting on networks for long periods of time to conduct espionage and achieve other strategic goals, and adversaries who are also embracing artificial intelligence to scale and speed up attacks.
AI Threats
Adversaries Fully Embrace AI: We anticipate threat actors will move decisively from using AI as an exception to using it as the norm. They will leverage AI to enhance the speed, scope, and effectiveness of operations, streamlining and scaling attacks across the entire lifecycle.
Prompt Injection Risks: A critical and growing threat is prompt injection, an attack that manipulates AI to bypass its security protocols and follow an attacker’s hidden command. Expect a significant rise in targeted attacks on enterprise AI systems.
AI-Enabled Social Engineering: Threat actors will accelerate the use of highly manipulative AI-enabled social engineering. This includes vishing (voice phishing) with AI-driven voice cloning to create hyperrealistic impersonations of executives or IT staff, making attacks harder to detect and defend against.
AI Advantages
AI Agent Paradigm Shift: Widespread adoption of AI agents will create new security challenges, requiring organizations to develop new methodologies and tools to effectively map their new AI ecosystems. A key part of this will be the evolution of identity and access management (IAM) to treat AI agents as distinct digital actors with their own managed identities.
Supercharged Security Analysts: AI adoption will transform security analysts’ roles, shifting them from drowning in alerts to directing AI agents in an “Agentic SOC.” This will allow analysts to focus on strategic validation and high-level analysis, as AI handles data correlation, incident summaries, and threat intelligence drafting.
Cybercrime
Ransomware and Extortion: The combination of ransomware, data theft, and multifaceted extortion will remain the most financially disruptive category of cybercrime. The volume of activity is escalating, with focus on targeting third-party providers and exploiting zero-day vulnerabilities for high-volume data exfiltration.
The On-Chain Cybercrime Economy: As the financial sector increasingly adopts cryptocurrencies, threat actors are expected to migrate core components of their operations onto public blockchains for unprecedented resilience against traditional takedown efforts.
Virtualization Infrastructure Under Threat: As security controls mature in guest operating systems, adversaries are pivoting to the underlying virtualization infrastructure, which is becoming a critical blind spot. A single compromise here can grant control over the entire digital estate and render hundreds of systems inoperable in a matter of hours.
Nation States
Russia: Cyber operations are expected to undergo a strategic shift, prioritizing long-term global strategic goals and the development of advanced cyber capabilities over just tactical support for the conflict in Ukraine.
China: The volume of China-nexus cyber operations is expected to continue surpassing that of other nations. They will prioritize stealthy operations, aggressively targeting edge devices and exploiting zero-day vulnerabilities.
Iran: Driven by regional conflicts and the goal of regime stability, Iranian cyber activity will remain resilient, multifaceted, and semi-deniable, deliberately blurring the lines between espionage, disruption, and hacktivism.
North Korea: They will continue to conduct financial operations to generate revenue for the regime, cyber espionage against perceived adversaries, and seek to expand IT worker operations.
Be Prepared for 2026
Understanding threats is key to staying ahead of them. Read the full Cybersecurity Forecast 2026 report for a more in-depth look at the threats covered in this blog post. We have also released special reports that dive into some of the threats and challenges unique to EMEA and JAPAC organizations.
For an even deeper look at the threat landscape next year, register for our Cybersecurity Forecast 2026 webinar, which will be hosted once again by threat expert Andrew Kopcienski.
Data is the lifeblood of the modern enterprise, but the process of making it useful is often fraught with friction. Data engineers, analysts, and scientists—some of the most skilled and valuable talent in any organization—are spending a disproportionate amount of their time on repetitive, low-impact tasks. What if you could shift your focus from manually building and maintaining pipelines to defining the best practices and rules that automate them?
Today, we’re excited to announce a fundamental shift to solve this challenge: the preview of the Data Engineering Agent in BigQuery, a first-party agent powered by Gemini and designed to automate the most complex and time-consuming data engineering tasks.
The Data Engineering Agent isn’t just an incremental improvement; it’s fundamentally transforming the way we work, with truly autonomous data engineering operations. According to IDC, ‘GenAI and other automation solutions will drive over $1 trillion in productivity gains for companies by 2026’1.
Here is a closer look at the powerful capabilities you can access today:
Pipeline development and maintenance
The Data Engineering Agent makes it easy to build and maintain robust data pipelines. The agent is available in BigQuery pipelines and it can help you with:
Natural language pipeline creation: Describe your pipeline requirements in plain language, and the agent generates the necessary SQL code, adhering to data engineering best practices that you can customize through instruction files. For example: “Create a pipeline to load data from the ‘customer_orders’ bucket, standardize the date formats, remove duplicate entries, and load it into a BigQuery table named ‘clean_orders’.”
Intelligent pipeline modification: Need to update an existing pipeline? Just tell the agent what you want to change. It analyzes the existing code and proposes the necessary modifications, leaving you to simply review and approve the changes. The agent follows best-practice design principles and helps you optimize and redesign your existing pipelines to eliminate redundant operations and to leverage BigQuery’s query optimization features such as partitioning.
Dataplex Universal Catalog integration: The agent leverages Google Cloud’s Dataplex data governance offering. It automatically retrieves additional resource metadata, such as business glossaries and data profiles, from Dataplex to improve the relevance and performance of the generated pipelines and the metadata generated for new tables.
Custom agent instructions and logic: Incorporate your unique business logic and engineering best practices by providing custom instructions and leveraging User-Defined Functions (UDFs) within the pipeline.
Automated code documentation: The agent automatically generates clear and concise documentation for your pipelines along with column descriptions, making them easier to understand and maintain for the entire team.
PRISA Media, the Spanish-language news and entertainment group and an early access customer, has had a positive experience with the Data Engineering Agent.
“The agent provides solutions that enable us to explore new development approaches, showing strong potential to address complex data engineering tasks. It demonstrates an impressive ability to correctly interpret our requirements, even for sophisticated data modeling tasks like creating SCD Type 2 dimensions. In its current state, it already delivers value in automating maintenance and small optimizations, and we believe it has the foundation to become a truly distinctive tool in the future.” – Fernando Calo, Lead Data Engineer at the Spanish-language news and entertainment group PRISA
Data preparation, transformation and modeling
The first step in any data project is often the most time-consuming: understanding, preparing, and cleaning raw data. The Data Engineering Agent allows you, for example, to access raw files from Google Cloud Storage. It automatically cleans, deduplicates, formats and standardizes your data based on the provided instructions. Integration with Dataplex allows you to generate data quality assertions based on rules defined in the Dataplex repository and automatically encrypt columns that were flagged as containing Personally Identifiable Information (PII). No more writing complex queries to identify data quality issues or to standardize formats.
The agent can then generate the necessary code to perform essential data transformation tasks, significantly reducing the time it takes to get your data ready for analysis. This process covers operations like joining and aggregating datasets.
The agent assists with complex data modeling, too. You can use natural language prompts to generate sophisticated schemas, such as Data Vault or Star Schemas, directly from your source tables.
Pipeline troubleshooting
When issues arise, the Data Engineering Agent can help you quickly identify and resolve them. Instead of manually digging through logs and code, you invoke the agent to diagnose the problem. The Data Engineering Agent is integrated with Gemini Cloud Assist. It analyzes the execution logs, identifies the root cause of the failure, and suggests a solution, helping you get your pipelines back up and running in record time.
Pipeline migrations
For teams looking to modernize their data stack, the Data Engineering Agent can speed up the transition to a unified Google Cloud data platform. That’s what happened at Vodafone as it migrated to BigQuery.
“During the migration journey to a Dataform environment, the Data Engineer Agent successfully replicated all existing data and transformations scripts with 100% automation and zero manual intervention. This achievement resulted in a 90% reduction in the time typically required for manual ETL migration, significantly accelerating the transition.” – Chris Benfield, Head of Engineering, Vodafone
Customers have already migrated onto BigQuery pipelines to:
Standardize and unify code: If you’re looking to consolidate your processing engines, the agent helps you to standardize on BigQuery pipelines. Simply provide the agent with your existing code, and it will generate the equivalent, optimized BigQuery pipeline, reducing operational complexity and cost.
Migrate from legacy tools: The agent can translate proprietary formats and configurations from legacy data processing tools into native BigQuery pipelines.
The road ahead
This is just the beginning for the Data Engineering Agent. We are continuously working to expand its capabilities to address more challenges faced by data engineering teams. In the future, you can expect to see the agent extend its reach to include proactive troubleshooting, IDE integration, and pipeline orchestration in Cloud Composer.
Get started today
The BigQuery Data Engineering Agent is available now. We are excited to see how you integrate this new intelligent partner into your daily work.
Ready to transform your data engineering workflows?
Access the agent: Navigate to BigQuery Pipelines in BigQuery Studio or the Dataform UI. The Data Engineering Agent is accessible via the ‘Ask Agent’ button.
Learn more: Review the official documentation for setup instructions and best practices.
Mercado Libre, the e-commerce and fintech pioneer of Latin America, operates at a staggering scale, demanding an infrastructure that’s not just resilient and scalable, but also a catalyst for rapid innovation. While our use of Spanner for foundational consistency and scale is known, a deeper dive reveals a sophisticated, multi-layered strategy. Spanner is not just a database here; it’s a core engine powering our internal developer platform, diverse data models, advanced analytics loops, intelligent features, and even our roadmap for next-generation AI applications.
This blog explores the technical underpinnings of how Mercado Libre leverages Spanner in concert with our internal innovations like the Fury platform, achieving significant business impact and charting a course for an AI-driven future.
The dual challenge: internet-scale operations and developer velocity
Mercado Libre faces the classic challenges of internet-scale services: keeping millions of daily financial transactions safe, making it easy for developers to build apps, and maintaining near-perfect uptime. The solution required a database powerful enough for the core and an abstraction layer elegant enough for broad developer adoption.
Fury: Mercado Libre’s developer gateway
At the heart of Mercado Libre’s strategy is Fury, our in-house middleware platform. Fury is designed to abstract away the complexities of various backend technologies, providing developers with standardized, simplified interfaces to build applications.
Abstraction & Standardization: Fury allows development teams to focus on business logic rather than the nuances of distributed database management, schema design for specific engines, or optimal connection pooling.
Spanner as the Reliable Core: Spanner is an always-on, globally consistent, multi-model database with virtually unlimited scale. By designating Spanner as a choice within Fury, Mercado Libre ensures that applications built on the platform using Spanner inherit its best features – they stay consistent globally, scale without breaking, and rarely go down.
Fig. 1 – Fury’s core services
Spanner – the versatile backbone
Through Fury, Spanner empowers Mercado Libre’s developers with remarkable versatility. Some apps need complex transactions, others need fast lookups. Spanner handles both, which means teams can use just one system:
Relational prowess for complex transactions: For sophisticated transactional workloads like order management, payments, and inventory systems, Spanner’s relational capabilities (SQL, ACID transactions, joins) remain critical.
High-performance key-value store: Many modern applications require fast point lookups and simple data structures. While Spanner isn’t Mercado Libre’s default backend for typical key-value workloads, specific applications run large-scale, non-relational, KV-style workloads on Spanner.
Spanner’s foundational architecture — TrueTime for global consistency and automated sharding for effortless scaling — makes it an ideal candidate to reliably serve both these access patterns through the Fury platform.
Handling peak demand
Mercado Libre’s Spanner instances demonstrate significant processing capacity, handling around 214K queries per second (QPS) and 30K transactions per second (TPS). To manage this substantial workload, the Spanner infrastructure dynamically scales by 30%, to over 400 nodes, highlighting the robust and elastic nature of the underlying system in accommodating high-demand scenarios. This level of throughput and scalability is critical for maintaining the performance and reliability of Mercado Libre’s services during its busiest times.
Fig. 2 – Diagram of the solution built with Spanner, which uses current search data to predict and recommend products that a customer is most likely to purchase.
Turning data into action
Mercado Libre builds a dynamic data ecosystem around Spanner, leveraging advanced analytics to feed insights directly back into operational systems.
They achieve real-time analytics by combining Spanner Data Boost with BigQuery Federation. Data Boost isolates analytical queries, preventing them from impacting critical transactional performance. This allows for powerful, large-scale analytics to run directly on fresh Spanner data within BigQuery, integrating seamlessly with other data sources.
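As a sketch of what a federated query like this can look like in Python (the connection ID, table, and columns are illustrative, and whether Data Boost is used is governed by the BigQuery connection’s configuration):
from google.cloud import bigquery

client = bigquery.Client()

# Federated query: BigQuery pushes the inner SQL to Spanner via a connection
# and lets you analyze the fresh results alongside other BigQuery data.
sql = """
SELECT order_id, customer_id, total
FROM EXTERNAL_QUERY(
  'my-project.us.spanner_orders_connection',
  'SELECT order_id, customer_id, total FROM Orders WHERE status = "PAID"'
)
"""

for row in client.query(sql).result():
    print(row.order_id, row.total)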
Insights from BigQuery, such as customer segmentations or fraud scores, are then actioned via Reverse ETL, feeding directly back into Spanner. This enriches operational data, enabling immediate action by frontline applications like serving personalized content or performing real-time risk assessments.
Furthermore, Spanner Change Streams coupled with Dataflow drive crucial service integrations. By capturing real-time data modifications from Spanner, they establish robust pipelines. These enable loading changes into BigQuery for analytics or streaming them to services like Fury Stream for real-time consumption, ensuring low-latency data propagation and enabling event-driven architectures across their systems.
The impact: cost savings, agility, and future-proofing
The strategic adoption of Spanner, amplified by internal platforms like Fury and sophisticated data workflows, has yielded significant benefits for Mercado Libre:
Significant cost savings & low total cost of ownership: The combination of Spanner’s managed nature (reducing manual sharding, maintenance, and operational work), efficient resource utilization, and the abstraction provided by Fury has led to a lower total cost of ownership and substantial cost savings.
Business impact & agility: Developers, freed from infrastructure complexities by Fury and empowered by Spanner’s versatile capabilities, can deliver new features and applications faster. The reliability of Spanner underpins critical business operations, minimizing disruptions.
Low operational overhead: Automated scaling, sharding, and maintenance in Spanner significantly reduce the human effort required to manage large-scale database infrastructure.
Building for AI: Next-generation applications on Spanner
Looking ahead, Mercado Libre is exploring Spanner to support more AI workloads.
Spanner’s characteristics make it an ideal foundation:
Consistent state management: Critical for AI systems that need to maintain and reliably update their state context.
Scalable memory/knowledge store: Ability to store and retrieve vast amounts of data for AI system memory, logs, and contextual information.
Transactional operations: Enabling AI systems to perform reliable actions that interact with other systems.
Integration with analytics & Machine Learning (ML): The existing data loops and ML.PREDICT capabilities can enrich AI systems with real-time insights and intelligence.
Spanner provides the transactional foundation these sophisticated AI applications will require.
Conclusion: A Unified, Intelligent Data Foundation
Mercado Libre’s adoption of Spanner demonstrates how to use a powerful, globally consistent database not just for its core capabilities, but as a strategic enabler for developer productivity, operational efficiency, advanced analytics, and future AI ambitions. Through their Fury platform, they’ve simplified access to Spanner’s capabilities, allowing it to serve as a flexible foundation for both relational and non-relational needs. The integration with BigQuery via Data Boost demonstrates a comprehensive approach to building an intelligent, data-driven enterprise. As Mercado Libre builds AI applications, Spanner is set to continue its role as the consistent and scalable foundation for their next wave of innovation.
Engineering teams use Ray to scale AI workloads across a wide range of hardware, including both GPUs and Cloud TPUs. While Ray provides the core scaling capabilities, developers have often managed the unique architectural details of each accelerator. For Cloud TPUs, this included its specific networking model and Single Programming Multiple Data (SPMD) programming style.
As part of our partnership with Anyscale, we are working on reducing the engineering effort to get started with TPUs on Google Kubernetes Engine (GKE). Our goal is to make the Ray experience on TPUs as native and low-friction as possible.
Today, we are launching several key improvements that help make that possible.
Ray TPU Library for improved TPU awareness and scaling in Ray Core
TPUs have a unique architecture and a specific programming style called SPMD. Large AI jobs run on a TPU slice, which is a collection of chips connected by high-speed networking called interchip interconnect (ICI).
Previously, you needed to manually configure Ray to be aware of this specific hardware topology. This was a major setup step, and if done incorrectly, jobs could get fragmented resources from different, unconnected slices, causing severe performance bottlenecks.
This new library, ray.util.tpu, abstracts away these hardware details. It uses a feature called SlicePlacementGroup along with the new label_selector API to automatically reserve the entire, co-located TPU slice as one atomic unit. This guarantees the job runs on unified hardware, preventing performance issues from fragmentation. Because Ray couldn’t guarantee this single-slice atomicity before, building reliable true multi-slice training (which intentionally spans multiple unique slices) was impossible. This new API also provides the critical foundation for Ray users to use Multislice technology to scale using multiple TPU slices.
Expanded support for JAX, Ray Train, and Ray Serve
Our developments cover both training and inference. For training, Ray Train now offers alpha support for JAX (via JaxTrainer) and PyTorch on TPUs.
The JaxTrainer API simplifies running JAX workloads on multi-host TPUs. It now automatically handles the complex distributed host initialization. As shown in the code example below, you only need to define your hardware needs—like the number of workers, topology, and accelerator type—within a simple ScalingConfig object. The JaxTrainer takes care of the rest.
This is a significant improvement because it solves a critical performance problem: resource fragmentation. Previously, a job requesting a “4×4” topology (which must run on a single co-located hardware unit called a slice) could instead receive fragmented resources—for example, eight chips from one physical slice and eight chips from a different, unconnected slice. This fragmentation was a major bottleneck, as it prevented the workload from using the high-speed ICI interconnect that only exists within a single, unified slice.
Here is an example of how the JaxTrainer simplifies training on multi-host TPUs:

```python
import jax
import jax.numpy as jnp
import optax
import ray.train

from ray.train.v2.jax import JaxTrainer
from ray.train import ScalingConfig

def train_func():
    """This function is run on each distributed worker."""
    ...

# Define the hardware configuration for your distributed job.
scaling_config = ScalingConfig(
    num_workers=4,
    use_tpu=True,
    topology="4x4",
    accelerator_type="TPU-V6E",
    placement_strategy="SPREAD",
)

# Define and run the JaxTrainer.
trainer = JaxTrainer(
    train_loop_per_worker=train_func,
    scaling_config=scaling_config,
)
result = trainer.fit()
print("Training finished on TPU v6e 4x4 slice")
```
Ray Serve APIs support TPUs, and with the improvements we have made to vLLM TPU, you can continue to use vLLM with Ray when moving to TPUs. This allows you to keep the same stack you use on GPUs and run it on TPUs with minimal code changes.
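As a rough illustration of that combined stack (a sketch, not code from the announcement), a Ray Serve deployment that wraps vLLM might look like the following. The model name, TPU resource count, and request format are assumptions.

```python
from ray import serve
from vllm import LLM, SamplingParams

# Minimal sketch of serving vLLM behind Ray Serve. The model name and the
# TPU resource request below are illustrative assumptions, not values from
# the announcement.
@serve.deployment(ray_actor_options={"resources": {"TPU": 4}})
class LLMServer:
    def __init__(self):
        # Assumption: vLLM selects its TPU backend when running on TPU hosts.
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

    async def __call__(self, request):
        body = await request.json()
        params = SamplingParams(max_tokens=128, temperature=0.7)
        outputs = self.llm.generate([body["prompt"]], params)
        return {"text": outputs[0].outputs[0].text}

app = LLMServer.bind()
# Deploy with: serve.run(app)
```

The point of the sketch is that the deployment code itself is accelerator-agnostic; only the resource request changes when you move from GPUs to TPUs.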
Label-based Scheduling API for easy obtainability
The new Label-Based Scheduling API integrates with GKE custom compute classes. A custom compute class is a simple way to define a named hardware configuration. For example, you can create a class called cost-optimized that tells GKE to try acquiring a Spot instance first, then fall back to a Dynamic Workload Scheduler FlexStart instance, and finally to a reserved instance as a last resort. The new Ray API lets you use these classes directly from Python: with a simple label_selector, you can request hardware like "TPU-V6E" or target your cost-optimized class, all without managing separate YAML files.
This same label_selector mechanism also exposes deep hardware control for TPUs. As GKE provisions the TPU pods for a slice, it injects metadata (like worker rank and topology) into each one. KubeRay (which manages Ray on GKE) then reads this GKE-provided metadata and automatically translates it into Ray-specific labels as it creates the nodes. This provides key information like the TPU generation (ray.io/accelerator-type), the physical chip topology (ray.io/tpu-topology), and the worker rank within the slice (ray.io/tpu-worker-id). These node labels let you use a Ray label_selector to pin SPMD workloads to specific, co-located hardware, such as a "4x4" topology or a particular worker rank.
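For instance, a hypothetical sketch that pins a task to v6e hosts belonging to a 4x4-topology slice, using the labels above, might look like this; the TPU resource count and the task body are assumptions.

```python
import ray

# Illustrative sketch only: pin an SPMD worker task to v6e hosts that are
# part of a 4x4-topology slice, using the GKE-derived node labels described
# above. The TPU resource count and the task body are assumptions.
@ray.remote(
    resources={"TPU": 4},
    label_selector={
        "ray.io/accelerator-type": "TPU-V6E",
        "ray.io/tpu-topology": "4x4",
    },
)
def spmd_worker(shard_id: int):
    ...  # run one shard of the SPMD program

futures = [spmd_worker.remote(i) for i in range(4)]
ray.get(futures)
```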
In the example below, a Ray user requests a v6e-32 TPU slice but instructs GKE, via custom compute classes, to fall back to a v5e-16 slice if v6e is not available. Similarly, the user could start by requesting Spot or DWS resources and, if those are unavailable, fall back to reserved instances.
(Diagram: platform admins set up Kubernetes; developers select compute and node pools.)
```python
@ray.remote(
    num_cpus=1,
    label_selector={
        "ray.io/tpu-pod-type": "v6e-32",
        "gke-flex-start": "true",
    },
    fallback_strategy=[
        {
            "label_selector": {
                "ray.io/tpu-pod-type": "v5litepod-16",
                "reservation-name": "v5e-reservation",
            }
        },
    ],
)
def tpu_task():
    # Attempts to run on a node in a v6e 4x8 TPU slice, falling back to a
    # node in a v5e 4x4 TPU slice if v6e is unavailable.
    ...
```
You can now see key TPU performance metrics, like TensorCore utilization, duty cycle, High-Bandwidth Memory (HBM) usage, and memory bandwidth utilization, directly in the Ray Dashboard. We’ve also added low-level libtpu logs. This makes debugging much faster, as you can immediately check if a failure is caused by the code or by the TPU hardware itself.
Get started today
Together, these updates are a significant step toward making TPUs a seamless part of the Ray ecosystem. They make adapting your existing Ray applications between GPUs and TPUs a much more straightforward process. Here’s how to learn more and get started:
Scientific inquiry has always been a journey of curiosity, meticulous effort, and groundbreaking discoveries. Today, that journey is being redefined, fueled by the incredible capabilities of AI. AI is moving beyond simply processing data to actively participating in every stage of discovery, and Google Cloud is at the forefront of this transformation, building the tools and platforms that make it possible.
The sheer volume of data generated by modern research is immense, often too vast for human analysis alone. This is where AI steps in, not just as a tool, but as a collaborative force. We’re seeing powerful new models and AI agents assist with everything from identifying relevant literature and generating novel hypotheses to designing experiments, running simulations, and making sense of complex results. This collaboration doesn’t replace human intellect; it amplifies it, allowing researchers to explore more avenues, more quickly, and with greater precision.
At Google Cloud, we’re bringing together high-performance computing (HPC) and advanced AI on a single, integrated platform. This means you can seamlessly move from running massive-scale simulations to applying sophisticated machine learning models, all in one environment.
So, how can you leverage these capabilities to get to insights faster? The journey begins at the foundation of scientific inquiry: the hypothesis.
AI-enhanced scientific inquiry
Every great discovery starts with a powerful hypothesis. With millions of research papers published annually, identifying novel opportunities is a monumental task. To overcome this information overload, scientists can now turn to AI as a powerful research partner.
Our Deep Research agent tackles the first step: performing a comprehensive analysis of published literature to produce detailed reports on a given topic that would otherwise take months to compile. Building on that foundation, our Idea Generation agent then deploys an ensemble of AI collaborators to brainstorm, evaluate, propose, debate, and rank novel hypotheses. This powerful combination, available in Gemini Enterprise, transforms the initial phase of scientific inquiry, empowering researchers to augment their expertise and find connections they might otherwise miss.
Go from hypothesis to results, faster
Once a hypothesis is formed, the work of translating it into executable code begins. This is where AI coding assistants, such as Gemini Code Assist, excel. They automate the tedious tasks of writing analysis scripts and simulation models by generating code from natural language and providing real-time suggestions, dramatically speeding up the core development process.
But modern research is more than just a single script; it's a complete workflow of data, environments, and results managed from the command line. For this, Gemini CLI brings that same conversational power directly to your terminal. It acts as a workflow accelerator: you can synthesize research and generate hypotheses with simple commands, seamlessly transition to experimentation by generating sophisticated analysis scripts, and debug errors on the fly, all without breaking your focus. Gemini CLI can further accelerate your path to impact by transforming raw results into publication-ready text, generating the code for figures and tables, and refining your work for submission.
This capability extends to automating the entire research environment. Beyond single commands, Gemini CLI can manage complex, multi-step processes like cloning a scientific application, installing its dependencies, and then building and testing it—all with a simple prompt, maximizing your productivity.
The new era of discovery: Your expertise, AI agents, and Google Cloud
The new era of scientific discovery is here. By embedding AI into every stage of the scientific process – from sparking the initial idea to accelerating the final analysis – Google Cloud provides a single, unified platform for discovery. This new era of AI-enhanced scientific inquiry is built on a robust, intelligent infrastructure that combines the strengths of HPC simulation and AI. It includes purpose-built solutions like our H4D VMs optimized for scientific simulations, the A4 and A4X VMs powered by the latest NVIDIA GPUs, and Google Cloud Managed Lustre, a parallel file system that eliminates storage bottlenecks and lets your HPC and AI workloads create and analyze massive datasets simultaneously. We provide the power to streamline the entire process so you can focus on scientific creativity – and changing the world!
Join the Google Cloud Advanced Computing Community to connect with other researchers, share best practices, and stay up to date on the latest advancements in AI for scientific and technical computing, or contact sales to get started today.