Announcing Ironwood TPUs General Availability and new Axion VMs to power the age of inference
Today’s frontier models, including Google’s Gemini, Veo, and Imagen, as well as Anthropic’s Claude, train and serve on Tensor Processing Units (TPUs). For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them. Constantly shifting model architectures, the rise of agentic workflows, and near-exponential growth in demand for compute define this new age of inference. In particular, agentic workflows, which require orchestration and tight coordination between general-purpose compute and ML acceleration, are creating new opportunities for custom silicon and vertically co-optimized system architectures.
We have been preparing for this transition for some time, and today we are announcing three new products built on custom silicon that deliver exceptional performance, lower costs, and new capabilities for inference and agentic workloads:
- Ironwood, our seventh-generation TPU, will be generally available in the coming weeks. Ironwood is purpose-built for the most demanding workloads, from large-scale model training and complex reinforcement learning (RL) to high-volume, low-latency AI inference and model serving. It offers a 10X peak-performance improvement over TPU v5p and more than 4X better per-chip performance for both training and inference compared to TPU v6e (Trillium), making Ironwood our most powerful and energy-efficient custom silicon to date.
- New Arm®-based Axion instances: N4A, our most cost-effective N-series virtual machine to date, is now in preview, offering up to 2X better price-performance than comparable current-generation x86-based VMs. We are also announcing C4A metal, our first Arm-based bare-metal instance, which will be coming soon in preview.
Ironwood and these new Axion instances are just the latest in a long history of custom silicon innovation at Google, which includes TPUs, Video Coding Units (VCUs) for YouTube, and five generations of Tensor chips for mobile. In each case, we build these processors to enable breakthroughs in performance that are only possible through deep, system-level co-design, with model research, software, and hardware development under one roof. This is how we built the first TPU ten years ago, which in turn unlocked the invention of the Transformer eight years ago, the very architecture that powers most of modern AI. It has also shaped more recent advancements like our Titanium architecture and the advanced liquid cooling we have deployed at gigawatt scale, with fleet-wide uptime of ~99.999% since 2020.
Pictured: An Ironwood board showing three Ironwood TPUs connected to liquid cooling.
Pictured: Third-generation Cooling Distribution Units, providing liquid cooling to an Ironwood superpod.
Ironwood: The fastest path from model training to planet-scale inference
The early response to Ironwood has been overwhelmingly enthusiastic. Anthropic points to impressive price-performance gains that accelerate its path from training massive Claude models to serving them to millions of users. In fact, Anthropic plans to access up to 1 million TPUs:

“Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work. As demand continues to grow exponentially, we’re increasing our compute resources as we push the boundaries of AI research and product development. Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect.” – James Bradbury, Head of Compute, Anthropic
Ironwood is being used by organizations of all sizes and across industries:

“Our mission at Lightricks is to define the cutting edge of open creativity, and that demands AI infrastructure that eliminates friction and cost at scale. We relied on Google Cloud TPUs and its massive ICI domain to achieve our breakthrough training efficiency for LTX-2, our leading open-source multimodal generative model. Now, as we enter the age of inference, our early testing makes us highly enthusiastic about Ironwood. We believe that Ironwood will enable us to create more nuanced, precise, and higher-fidelity image and video generation for our millions of global customers.” – Yoav HaCohen, Research Director, GenAI Foundational Models, Lightricks

“At Essential AI, our mission is to build powerful, open frontier models. We need massive, efficient scale, and Google Cloud’s Ironwood TPUs deliver exactly that. The platform was incredibly easy to onboard, allowing our engineers to immediately leverage its power and focus on accelerating AI breakthroughs.” – Philip Monk, Infrastructure Lead, Essential AI
System-level design maximizes inference performance, reliability, and cost
TPUs are a key component of AI Hypercomputer, our integrated supercomputing system that brings together compute, networking, storage, and software to improve system-level performance and efficiency. At the macro level, a recent IDC report found that AI Hypercomputer customers achieved an average 353% three-year ROI, 28% lower IT costs, and 55% more efficient IT teams.
Ironwood TPUs will help customers push the limits of scale and efficiency even further. When you deploy TPUs, the system interconnects the individual chips into a pod, allowing them to work as a single unit. With Ironwood, we can scale up to 9,216 chips in a superpod, linked by breakthrough Inter-Chip Interconnect (ICI) networking at 9.6 Tb/s. This massive connectivity lets thousands of chips communicate with each other quickly and access a staggering 1.77 petabytes of shared High Bandwidth Memory (HBM), overcoming data bottlenecks for even the most demanding models.
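That 1.77 PB figure follows directly from the chip count. Here is a quick sanity check in Python, assuming the publicly stated 192 GB of HBM per Ironwood chip (a per-chip figure not given in this post):

```python
# Back-of-the-envelope check of the superpod's shared HBM capacity.
# Assumes 192 GB of HBM per Ironwood chip (not stated in this post).
chips_per_superpod = 9_216
hbm_per_chip_gb = 192

total_gb = chips_per_superpod * hbm_per_chip_gb
print(f"{total_gb / 1e6:.2f} PB")  # -> 1.77 PB
```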
Pictured: Part of an Ironwood superpod, directly connecting 9,216 Ironwood TPUs in a single domain.
At that scale, services demand uninterrupted availability. That’s why our Optical Circuit Switching (OCS) technology acts as a dynamic, reconfigurable fabric, instantly routing around interruptions to restore the workload while your services keep running. And when you need more power, Ironwood scales across pods into clusters of hundreds of thousands of TPUs.
Pictured: Jupiter data center network enables the connection of multiple Ironwood superpods into clusters of hundreds of thousands of TPUs.
The AI Hypercomputer advantage: Hardware and software co-designed for faster, more efficient outcomes
On top of this hardware is a co-designed software layer, where our goal is to maximize Ironwood’s massive processing power and memory, and make it easy to use throughout the AI lifecycle.
- To improve fleet efficiency and operations, we’re excited to announce that TPU customers can now benefit from Cluster Director capabilities in Google Kubernetes Engine. This includes advanced maintenance and topology awareness for intelligent scheduling and highly resilient clusters.
- For pre-training and post-training, we’re also sharing new enhancements to MaxText, a high-performance, open-source LLM framework, that make it easier to implement the latest training and reinforcement learning optimization techniques, such as Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). (A minimal sketch of the GRPO idea follows this list.)
- For inference, we recently announced enhanced support for TPUs in vLLM, allowing developers to switch between GPUs and TPUs, or run both, with only a few minor configuration changes (see the sketch below), and GKE Inference Gateway, which intelligently load balances across TPU servers to reduce time-to-first-token (TTFT) latency by up to 96% and serving costs by up to 30%.
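For readers unfamiliar with GRPO, the core idea is to score each sampled response relative to the other responses drawn for the same prompt, replacing PPO’s learned value model with a simple group statistic. Below is a minimal NumPy sketch of that advantage computation; it illustrates the general technique, not MaxText’s actual implementation:

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """Core idea behind GRPO: normalize each response's reward against the
    other responses sampled for the same prompt, so no value model is needed.

    rewards: shape (num_prompts, group_size), one scalar reward per response.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True) + 1e-6  # avoid division by zero
    return (rewards - mean) / std

# Example: 2 prompts, 4 sampled responses each.
rewards = np.array([[0.1, 0.9, 0.4, 0.6],
                    [0.2, 0.2, 0.8, 0.4]])
print(group_relative_advantages(rewards))
```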
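And to illustrate the portability point, here is a minimal sketch using vLLM’s offline API. The model name and sampling values are placeholders; the point is that the same script can target a GPU or TPU backend, since vLLM picks up the hardware platform from the installed environment rather than from the code:

```python
# A minimal vLLM offline-inference sketch. The same script runs on GPU or
# TPU backends; vLLM detects the platform from the installed environment.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model choice
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what a TPU superpod is."], params)
print(outputs[0].outputs[0].text)
```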
Our software layer is what enables AI Hypercomputer’s high performance and reliability for training, tuning, and serving demanding AI workloads at scale. Thanks to deep integrations across the stack, from data-center-wide hardware optimizations to open software and managed services, Ironwood TPUs are our most powerful and energy-efficient TPUs to date. Learn more about our approach to hardware and software co-design here.
Axion: Redefining general-purpose compute
Building and serving modern applications requires both highly specialized accelerators and powerful, efficient general-purpose compute. This was our vision for Axion, our custom Arm Neoverse®-based CPUs, which we designed to deliver compelling performance, cost efficiency, and energy efficiency for everyday workloads.
Today, we are expanding our Axion portfolio with:
- N4A (preview), our second general-purpose Axion VM, is ideal for microservices, containerized applications, open-source databases, batch and data analytics jobs, development environments, experimentation, data preparation, and the web-serving jobs that make AI applications possible. Learn more about N4A here.
- C4A metal (preview coming soon), our first Arm-based bare-metal instance, provides dedicated physical servers for specialized workloads such as Android development, automotive in-car systems, software with strict licensing requirements, scale test farms, and complex simulations. Learn more about C4A metal here.

With today’s announcements, the Axion portfolio now includes three powerful options: N4A, C4A, and C4A metal. Together, the C and N series let you lower the total cost of running your business without compromising on performance or workload-specific requirements.
| Axion-based instance | Optimized for | Key features |
| --- | --- | --- |
| N4A (preview) | Price-performance and flexibility | Up to 64 vCPUs, 512 GB of DDR5 memory, and 50 Gbps networking, with support for Custom Machine Types and Hyperdisk Balanced and Throughput storage |
| C4A metal (preview coming soon) | Specialized workloads, such as hypervisors and native Arm development | Up to 96 vCPUs, 768 GB of DDR5 memory, Hyperdisk storage, and up to 100 Gbps networking |
| C4A | Consistently high performance | Up to 72 vCPUs, 576 GB of DDR5 memory, 100 Gbps Tier 1 networking, Titanium SSD with up to 6 TB of local capacity, advanced maintenance controls, and support for Hyperdisk Balanced, Throughput, and Extreme |
Axion’s inherent efficiency also makes it a valuable option for modern AI workflows. While specialized accelerators like Ironwood handle the complex task of model serving, Axion excels at the operational backbone: high-volume data preparation and ingestion, and the application servers that host your intelligent applications. Axion is already translating into customer impact:
“At Vimeo, we have long relied on Custom Machine Types to efficiently manage our massive video transcoding platform. Our initial tests on the new Axion-based N4A instances have been very compelling, unlocking a new level of efficiency. We’ve observed a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs. This points to a clear path for improving our unit economics and scaling our services more profitably, without changing our operational model.” – Joe Peled, Sr. Director of Hosting & Delivery Ops, Vimeo
“At ZoomInfo, we operate a massive data intelligence platform where efficiency is paramount. Our core data processing pipelines, which are critical for delivering timely insights to our customers, run extensively on Dataflow and Java services in GKE. In our preview of the new N4A instances, we measured a 60% improvement in price-performance for these key workloads compared to their x86-based counterparts. This allows us to scale our platform more efficiently and deliver more value to our customers, faster.” – Sergei Koren, Chief Infrastructure Architect, ZoomInfo
“Migrating to Google Cloud’s Axion portfolio gave us a critical competitive advantage. We slashed our compute consumption by 20% while maintaining low and stable latency with C4A instances, such as our Supply-Side Platform (SSP) backend service. Additionally, C4A enabled us to leverage Hyperdisk with precisely the IOPS we need for our stateful workloads, regardless of instance size. This flexibility gives us the best of both worlds – allowing us to win more ad auctions for our clients while significantly improving our margins. We’re now testing the N4A family by running some of our key workloads that require the most flexibility, such as our API relay service. We are happy to share that several applications running in production are consuming 15% less CPU compared to our previous infrastructure, reducing our costs further, while ensuring that the right instance backs the workload characteristics required.” – Or Ben Dahan, Cloud & Software Architect, Rise
A powerful combination for AI and everyday computing
To thrive in an era of constantly shifting model architectures, software, and techniques, you need purpose-built AI accelerators for model training and serving, alongside efficient general-purpose CPUs for everyday workloads, including those that support your AI applications.
Ultimately, whether you use Ironwood and Axion together or mix and match them with the other compute options available on AI Hypercomputer, this system-level approach gives you the ultimate flexibility and capability for the most demanding workloads. Sign up to test Ironwood, Axion N4A, or C4A metal today.