Cloud

2025 11 07

AWS – AWS Advanced .NET Data Provider Driver is Generally Available

The Amazon Web Services (AWS) Advanced .NET Data Provider Driver is now generally available for Amazon RDS and Amazon Aurora PostgreSQL and MySQL-compatible databases. This advanced database driver reduces RDS Blue/Green switchover and database failover times, improving application availability. Additionally, it supports multiple authentication mechanisms for your database, including Federated Authentication, AWS Secrets Manager authentication, and token-based authentication with AWS Identity and Access Management (IAM).

The driver builds on top of Npgsql PostgreSQL, native MySql.Data, and MySqlConnector drivers to further enhance functionality beyond standard database connectivity. The driver is natively integrated with Aurora and RDS databases, enabling it to monitor database cluster status and quickly connect to newly promoted writers during unexpected failures that trigger database failovers. Furthermore, the driver seamlessly works with popular frameworks like NHibernate and supports Entity Framework (EF) with MySQL databases.

The driver is available as an open-source project under the Apache 2.0 license. Refer the instructions on the on the GitHub repository to get started.

Read More for the details.

2025 11 07

AWS – Amazon Cognito user pools now supports private connectivity with AWS PrivateLink

Tibor Kiss AWS, Cloud AWS

Amazon Cognito user pools now supports AWS PrivateLink for secure and private connectivity. With AWS PrivateLink, you can establish a private connection between your virtual private cloud (VPC) and Amazon Cognito user pools to configure, manage, and authenticate against your Cognito user pools without using the public internet. By enabling private network connectivity, this enhancement eliminates the need to use public IP addresses or relying solely on firewall rules to access Cognito. This feature supports user pool management operations (e.g., list user pools, describe user pools), administrative operations (e.g., admin-created users), and user authentication flows (sign in local users stored in Cognito). OAuth 2.0 authorization code flow (Cognito managed login, hosted UI, sign-in via social identity providers), client credentials flow (Cognito machine-to-machine authorization), and federated sign-ins via SAML and OIDC standards are not supported through VPC endpoints at this time.

You can use PrivateLink connections in all AWS Regions where Amazon Cognito user pools is available, except AWS GovCloud (US) Regions. Creating VPC endpoints on AWS PrivateLink will incur additional charges; refer to AWS PrivateLink pricing page for details. You can get started by creating an AWS PrivateLink interface endpoint for Amazon Cognito user pools using the AWS Management Console, AWS Command Line Interface (CLI), AWS Software Development Kits (SDKs), AWS Cloud Development Kit (CDK), or AWS CloudFormation. To learn more, refer to the documentation on creating an interface VPC endpoint and Amazon Cognito’s developer guide.

Read More for the details.

2025 11 07

AWS – AWS KMS now supports Edwards-curve Digital Signature Algorithm (EdDSA)

Tibor Kiss AWS, Cloud AWS

AWS Key Management Service (KMS) announces support for the Edwards-curve Digital Signature Algorithm (EdDSA). With this new capability, you can create an elliptic curve asymmetric KMS key or data key pairs to sign and verify EdDSA signatures using the Edwards25519 curve (Ed25519). Ed25519 provides 128-bit security level equivalent to NIST P-256, faster signing performance, and small signature size (64 bytes) and public key sizes (32 bytes).

Ed25519 is ideal for situations that require small key and signature sizes, such as Internet of Things (IoT) devices and blockchain applications like cryptocurrency.

This new capability is available in all AWS Regions, including the AWS GovCloud (US) Regions and the China Regions. To learn more about this new capability, see Asymmetric key specs section in the AWS KMS Developer Guide.

Read More for the details.

2025 11 06

AWS – Amazon SageMaker launches custom tags for project resources

Tibor Kiss AWS, Cloud AWS

Today, Amazon SageMaker Unified Studio announced new capabilities allowing SageMaker projects to add custom tags to resources created through the project. This helps customers enforce tagging standards that conform to Service Control Policies (SCP) and helps enable cost tracking reporting practices on resources created across the organization.

As an Amazon SageMaker Unified Studio administrator, you can configure a project profile with tag configurations that will be pushed down to all projects using the project profile. Project profiles can be setup to pass Key and Value tag pairings or pass the Key of the tag with a default Value that can be modified during project creation. All tag values passed to the project will result in the resources created by that project being tagged. This provides administrators a governance mechanism that enforces project resources have the expected tags.

This first release of custom tags for project resources is supported only through application programming interface (API).

Custom tags for project resources capability is available in all AWS Regions where Amazon SageMaker Unified Studio is supported, including: Asia Pacific (Tokyo), Europe (Ireland), US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), South America (São Paulo), Asia Pacific (Seoul), Europe (London), Asia Pacific (Singapore), Asia Pacific (Sydney), Canada (Central), Asia Pacific (Mumbai), Europe (Paris), Europe (Stockholm)

To learn more, visit Amazon SageMaker then get started with the custom tag API documentation.

Read More for the details.

2025 11 06

AWS – AWS B2B Data Interchange is now available in AWS Europe (Ireland) Region

Tibor Kiss AWS, Cloud AWS

Customers in AWS Europe (Ireland) Region can now use AWS B2B Data Interchange to build highly customizable, scalable and cost-efficient EDI workloads.

AWS B2B Data Interchange automates validation, transformation, and generation of EDI files such as ANSI X12 documents to and from JSON and XML data formats. With this launch, you can use AWS B2B Data Interchange to process your EDI documents in AWS Europe (Ireland) Region, which enables you to meet your compliance and data sovereignty obligations while modernizing your B2B integration workloads. As part of this launch, the AWS B2B Data Interchange generative AI mapping capability will also become available in AWS Europe (Ireland) Region, simplifying mapping code development and ultimately expediting trading partners onboarding.

To learn more about AWS B2B Data Interchange visit our product page, user-guide or take our self-paced workshop. See the AWS Region Table for complete regional availability.

Read More for the details.

2025 11 06

AWS – Amazon ECS announces non-root container support for managed EBS volumes

Tibor Kiss AWS, Cloud AWS

Amazon Elastic Container Service (ECS) now supports mounting Amazon Elastic Block Store (EBS) volumes to containers running as non-root users. With this launch, ECS automatically configures the EBS volume’s file system permissions to allow non-root users to read and write data securely, while preserving the root-level ownership of the volume. This enhancement simplifies security-first container deployments by removing the need for manual permission management or custom entrypoint scripts.

This feature enhances container security by allowing tasks to run as non-root users, reducing the risk of privilege escalation and unauthorized access to data. Previously, for a container in a task to write to a mounted Amazon EBS volume, it had to run as the root user. ECS now automatically manages EBS volume permissions, simplifying workflows and ensuring that all containers within a task — regardless of user ID — can securely read and write to the mounted volume.

This feature is now available in all AWS Regions where Amazon ECS and Amazon EBS are supported, for EC2, AWS Fargate, and ECS Managed Instances launch types. To learn more, see Use Amazon EBS volumes with Amazon ECS in the Amazon ECS Developer Guide.

Read More for the details.

2025 11 06

AWS – Amazon DynamoDB Streams expands AWS PrivateLink support to FIPS endpoints

Tibor Kiss AWS, Cloud AWS

Amazon DynamoDB Streams now supports AWS PrivateLink for all available Amazon DynamoDB Streams Federal Information Processing Standard (FIPS) endpoints in US and Canada commercial AWS Regions.

With this launch, you can establish a private connection between your virtual private cloud (VPC) and Amazon DynamoDB Streams FIPS endpoints instead of connecting over the public internet, helping you meet your organization’s business, compliance, and regulatory requirements to limit public internet connectivity.

Amazon DynamoDB Streams support for AWS PrivateLink FIPs endpoints is available with Amazon DynamoDB Streams in the US and Canada commercial AWS Regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), and Canada West (Calgary).

To learn more about Amazon DynamoDB Streams support for AWS PrivateLink FIPs endpoints, visit the Amazon DynamoDB Stream documentation. For more information about AWS PrivateLink and its benefits, visit the AWS PrivateLink product page.

Read More for the details.

2025 11 06

AWS – Amazon Keyspaces (for Apache Cassandra) is now available in the Middle East (UAE) Region

Tibor Kiss AWS, Cloud AWS

Amazon Keyspaces (for Apache Cassandra) is now available in the Middle East (UAE) Region, allowing customers in the Middle East to build Cassandra-compatible applications with lower latency while keeping their data within the Region to meet data residency requirements.

Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service. Amazon Keyspaces is serverless, so you pay for only the resources that you use and you can build applications that serve thousands of requests per second with virtually unlimited throughput and storage.

The Middle East (UAE) Region provides the same Amazon Keyspaces features available in other AWS Regions, including point-in-time recovery, Multi-Region replication, CDC streams, and IPv6 support. This regional expansion enables organizations in the Middle East to build highly scalable, low-latency applications using familiar Cassandra Query Language (CQL) without the operational burden of managing Cassandra clusters.

To learn more about on Keyspaces, visit the Amazon Keyspaces documentation.

Read More for the details.

2025 11 06

AWS – AWS IoT Greengrass v2.16 introduces system log forwarder and TPM2.0 capabilities

Tibor Kiss AWS, Cloud AWS

AWS announces the release of AWS IoT Greengrass v2.16, introducing new core components for nucleus and nucleus lite. AWS IoT Greengrass is an Internet of Things (IoT) edge runtime and cloud service that helps customers build, deploy, and manage device software at the edge. The latest version 2.16 release includes enhanced debugging capabilities through the system log forwarder component. This component uploads system log files to AWS Cloud Watch, making it easier for developers to troubleshoot IoT edge applications.

The AWS IoT Greengrass v2.16 release also features a new nucleus lite version (v2.3) with TPM2.0 specification support, enabling developers to manage edge device security for their resource constrained devices using hardware-based root of trust modules. The implementation helps developers to scale their IoT deployments with confidence while providing secure storage for secrets and streamlined device authentication.

AWS IoT Greengrass v2.16 is available in all AWS Regions where AWS IoT Greengrass is offered. To learn more about AWS IoT Greengrass v2.16 and its new features, visit the AWS IoT Greengrass documentation. Follow the Getting Started guide for a quick introduction to AWS IoT Greengrass.

Read More for the details.

2025 11 06

AWS – Amazon Elastic VMware Service (Amazon EVS) is now available in additional Regions

Tibor Kiss AWS, Cloud AWS

Today, we’re announcing that Amazon Elastic VMware Service (Amazon EVS) is now available in all availability zones in the Asia Pacific (Mumbai), Asia Pacific (Sydney), Canada (Central) and Europe (Paris) Regions. This expansion provides more options to leverage the scale and flexibility of AWS for running your VMware workloads in the cloud.

Amazon EVS lets you run VMware Cloud Foundation (VCF) directly within your Amazon Virtual Private Cloud (VPC) on EC2 bare-metal instances, powered by AWS Nitro. Using either our step-by-step configuration workflow or the AWS Command Line Interface (CLI) with automated deployment capabilities, you can set up a complete VCF environment in just a few hours. This rapid deployment enables faster workload migration to AWS, helping you eliminate aging infrastructure, reduce operational risks, and meet critical timelines for exiting your data center.

The added availability in the Asia Pacific (Mumbai), Asia Pacific (Sydney), Canada (Central) and Europe (Paris) Regions gives your VMware workloads lower latency through closer proximity to your end users, compliance with data residency or sovereignty requirements, and additional high availability and resiliency options for your enhanced redundancy strategy.

To get started, visit the Amazon EVS product detail page and user guide.

Read More for the details.

2025 11 06

AWS – AWS announces a new Regional planning tool in Builder Center

Tibor Kiss AWS, Cloud AWS

Today, AWS announced a new tool called AWS Capabilities by Region in Builder Center. This tool helps you discover and compare AWS services, features, APIs, CloudFormation resources across AWS Regions. You can explore service availability through an interactive interface, compare multiple Regions side-by-side, and view forward-looking roadmap information. This detailed visibility helps you make informed decisions about global deployments and prevent project delays due to service unavailability.

In addition to this tool, AWS also enhanced the AWS Knowledge Model Context Protocol (MCP) Server to include information about Regional capabilities in an LLM-compatible format. MCP clients and agentic frameworks can connect to the AWS Knowledge MCP Server to get real-time insights into regional service availability and suggestions for alternative solutions when specific services or features are unavailable.

You can begin exploring AWS Capabilities by Region in AWS Builder Center today. The Knowledge MCP server is also publicly accessible at no cost and does not require an AWS account. Usage is subject to rate limits. Follow the getting started guide for setup instructions.

Read More for the details.

2025 11 06

AWS – AWS Backup now supports AWS KMS customer managed keys with logically air-gapped vaults

Tibor Kiss AWS, Cloud AWS

AWS Backup now supports encrypting backups in logically air-gapped vaults with AWS Key Management Service (KMS) customer managed keys (CMKs). This enhancement provides additional encryption options beyond the existing AWS-owned keys, helping organizations meet their regulatory and compliance requirements.

You can now create logically air-gapped vaults using your own customer managed keys (CMKs) in AWS KMS, giving you more control over your backup protection strategy. Whether you want to use keys from the same account or across accounts, you maintain centralized key management while preserving the security benefits of logically air-gapped vaults. This integration works seamlessly with your existing logically air-gapped vaults and other AWS Backup features, ensuring no disruption to your backup workflows.

AWS KMS customer managed key support with logically air-gapped vaults is available in all AWS Regions where logically air-gapped vaults are currently supported.

You can get started with logically air-gapped vault support for CMKs using the AWS Backup console, API, or CLI. When creating a new logically air-gapped vault, you can now choose between an AWS-owned key or your own CMK for encryption. For more information about implementing this feature, visit the AWS Backup product page, documentation, and blog.

Read More for the details.

2025 11 06

AWS – Deadline Cloud expands support with latest 6th, 7th, and 8th generation instances

Tibor Kiss AWS, Cloud AWS

AWS announces expanded instance family support in Deadline Cloud, adding new 6th, 7th, and 8th generation EC2 instances to enhance visual effects and animation rendering workloads. This release includes support for C7i, C7a, M7i, M7a, R7a, R7i, M8a, M8i, and R8i instance families, along with additional 6th generation instance types that were previously unavailable. Deadline Cloud is a fully managed service that helps customers run visual compute workloads in the cloud without having to manage infrastructure.

With this enhancement, studios can utilize a broader range of AWS compute technology to optimize their rendering workflows. The compute-optimized (C-series), general-purpose (M-series), and memory-optimized (R-series) instances provide tailored options for different rendering workloads – from compute-intensive simulations to memory-heavy scene processing. The inclusion of latest-generation instances like M8a and R8i enables customers to access improved performance and efficiency for their most demanding rendering tasks.

These instance families are available in all 10 AWS Regions where Deadline Cloud is offered. The specific instance types available in each Region depend on the regional availability of the EC2 instance types themselves.

To learn more about the new instance types supported in Deadline Cloud and their regional availability, see the AWS Deadline Cloud pricing page.

Read More for the details.

2025 11 06

AWS – Amazon CloudWatch Application Signals now available in AWS GovCloud (US) Regions

Tibor Kiss AWS, Cloud AWS

Amazon CloudWatch Application Signals expands its availability to AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions, enabling government customers and regulated industries to automatically monitor and improve application performance in these regions. CloudWatch Application Signals provides comprehensive application monitoring capabilities by automatically collecting telemetry data from applications running on Amazon EC2, Amazon ECS, Amazon EKS and AWS Lambda, helping customers meet their compliance and monitoring requirements while maintaining workload visibility.

With CloudWatch Application Signals, customers in AWS GovCloud (US) regions can now monitor application health in real time, track performance against business goals, visualize service relationships and dependencies, and quickly identify and resolve performance issues. This automated observability solution eliminates the need for manual instrumentation while providing detailed insights into application behavior and performance patterns. The service automatically detects anomalies and helps correlate issues across different AWS services, enabling faster problem resolution and improved application reliability.

CloudWatch Application Signals will be available in AWS GovCloud (US-East) and AWS GovCloud (US-West). For pricing information, visit the Amazon CloudWatch pricing page. To get started, visit the Amazon CloudWatch Application Signals documentation.

Read More for the details.

2025 11 06

GCP – From silicon to softmax: Inside the Ironwood AI stack

Tibor Kiss Cloud, Google Cloud gcp

As machine learning models continue to scale, a specialized, co-designed hardware and software stack is no longer optional, it’s critical. Ironwood, our latest generation Tensor Processing Unit (TPU), is the cutting-edge hardware behind advanced models like Gemini and Nano Banana, from massive-scale training to high-throughput, low-latency inference. This blog details the core components of Google’s AI software stack that are woven into Ironwood, demonstrating how this deep co-design unlocks performance, efficiency, and scale. We cover the JAX and PyTorch ecosystems, the XLA compiler, and the high-level frameworks that make this power accessible.

1. The co-designed foundation

Foundation models today have trillions of parameters that require computation at ultra-large scale. We designed the Ironwood stack from the silicon up to meet this challenge.

The core philosophy behind the Ironwood stack is system-level co-design, treating the entire TPU pod not as a collection of discrete accelerators, but as a single, cohesive supercomputer. This architecture is built on a custom interconnect that enables massive-scale Remote Direct Memory Access (RDMA), allowing thousands of chips to exchange data directly at high bandwidth and low latency, bypassing the host CPU. Ironwood has a total of 1.77 PB of directly accessible HBM capacity, where each chip has eight stacks of HBM3E, with a peak HBM bandwidth of 7.4 TB/s and capacity of 192 GiB.

Unlike general-purpose parallel processors,TPUs are Application-Specific Integrated Circuits (ASICs) built for one purpose: accelerating large-scale AI workloads. The deep integration of compute, memory, and networking is the foundation of their performance. At a high level, the TPU consists of two parts:

Hardware core: The TPU core is centered around a dense Matrix Multiply Unit (MXU) for matrix operations, complemented by a powerful Vector Processing Unit (VPU) for element-wise operations (activations, normalizations) and SparseCores for scalable embedding lookups. This specialized hardware design is what delivers Ironwood’s 42.5 Exaflops of FP8 compute.

Software target: This hardware design is explicitly targeted by the Accelerated Linear Algebra (XLA) compiler, using a software co-design philosophy that combines the broad benefits of whole-program optimization with the precision of hand-crafted custom kernels. XLA’s compiler-centric approach provides a powerful performance baseline by fusing operations into optimized kernels that saturate the MXU and VPU. This approach delivers good “out of the box” performance with broad framework and model support. This general-purpose optimization is then complemented by custom kernels (detailed below in the Pallas section) to achieve peak performance on specific model-hardware combinations. This dual-pronged strategy is a fundamental tenet of the co-design.

The figure below shows the layout of the Ironwood chip:

This specialized design extends to the connectivity between TPU chips for massive scale-up and scale-out for a total of 88473.6 Tbps (11059.2TB/s) for a complete Ironwood superpod.

The building block: Cubes and ICI. Each physical Ironwood host has four TPU chips. A single rack of these hosts has 64 Ironwood chips and forms a “cube”. Within this cube, every chip is connected via multiple high-speed Inter-Chip Interconnect (ICI) links that form a direct 3D Torus topology. This creates an extremely dense, all-to-all network fabric, enabling massive bandwidth and low latency for distributed operations within the cube.

Scaling with OCS: Pods and Superpods To scale beyond a single cube, multiple cubes are connected using an Optical Circuit Switch (OCS) network. This is a dynamic, reconfigurable optical network that connects entire cubes, allowing the system to scale from a small “pod” (e.g., a 256-chip Ironwood pod with four cubes) to a massive “superpod” (e.g., a 9,216-chip system with 144 cubes). This OCS-based topology is key to fault tolerance. If a cube or link fails, the OCS fabric manager instructs the OCS to optically bypass the unhealthy unit and establish new, complete optical circuits connecting only the healthy cubes, swapping in a designated spare. This dynamic reconfigurability allows for both resilient operation and the provisioning of efficient “slices” of any size. For the largest-scale systems, into the hundreds of thousands of chips, multiple superpods can then be connected via a standard Data-Center Network (DCN).

Chips can be configured in different “slices” with different OCS topologies as shown below.

Each chip is connected to 6 other chips in the 3D torus and provides 3 distinct axes for parallelism.

Ironwood delivers this performance while focusing on power efficiency, allowing AI workloads to run more cost-effectively. Ironwood perf/watt is 2x relative to Trillium, our previous-generation TPU. Our advanced liquid cooling solutions and optimized chip design can reliably sustain up to twice the performance of standard air cooling even under continuous, heavy AI workloads. Ironwood is nearly 30x more power efficient than our first Cloud TPU from 2018 and is our most power-efficient chip to date.

It’s the software stack’s job to translate high-level code into optimized instructions that leverage the full power of the hardware. The stack supports two primary frameworks: the JAX ecosystem, which offers maximum performance and flexibility, as well as PyTorch on TPUs, which provides a native experience for the PyTorch community.

2. Optimizing the entire AI lifecycle

We use the principle of a co-designed Ironwood hardware and software stack to deliver maximum performance and efficiency across every phase of model development, with specific hardware and software capabilities tuned for each stage.

Pre-training: This phase demands sustained, massive-scale computation. A full 9,216-chip Ironwood superpod leverages the OCS and ICI fabric to operate as a single, massive parallel processor, achieving maximum sustained FLOPS utilization through different data formats. Running a job of this magnitude also requires resilience, which is managed by high-level software frameworks like MaxText, detailed in Section 3.3, that handle fault tolerance and checkpointing transparently.

Post-training (Fine-tuning and alignment): This stage includes diverse, FLOPS-intensive tasks like supervised fine-tuning (SFT) and Reinforcement Learning (RL), all requiring rapid iteration. RL, in particular, introduces complex, heterogeneous compute patterns. This stage often requires two distinct types of jobs to run concurrently: high-throughput, inference-like sampling to generate new data (often called ‘actor rollouts’), and compute-intensive, training-like ‘learner’ steps that perform the gradient-based updates. Ironwood’s high-throughput, low-latency network and flexible OCS-based slicing are ideal for this type of rapid experimentation, efficiently managing the different hardware demands of both sampling and gradient-based updates. In Section 3.3, we discuss how we provide optimized software on Ironwood — including reference implementations and libraries — to make these complex fine-tuning and alignment workflows easier to manage and execute efficiently.

Inference (serving): In production, models must deliver low-latency predictions with high throughput and cost-efficiency. Ironwood is specifically engineered for this, with its large on-chip memory and compute power optimized for both the large-batch “prefill” phase and the memory-bandwidth-intensive “decode” phase of large generative models. To make this power easily accessible, we’ve optimized state-of-the-art serving engines. At launch, we’ve enabled vLLM, detailed in Section 3.3, providing the community with a top-tier, open-source solution that maximizes inference throughput on Ironwood.

3. The software ecosystem for TPUs

The TPU stack, and Ironwood’s stack in particular, is designed to be modular, allowing developers to operate at the level of abstraction they need. In this section, we focus on the compiler/runtime, framework, and AI stack libraries.

3.1 The JAX path: Performance and composability

JAX is a high-performance numerical computing system co-designed with the TPU architecture. It provides a familiar NumPy-like API backed by powerful function transformations:

jit (Just-in-Time compilation): Uses the XLA compiler to fuse operations into a single, optimized kernel for efficient TPU execution.
grad (automatic differentiation): Automatically computes gradients of Python functions, the fundamental mechanism for model training.
shard_map (parallelism): The primitive for expressing distributed computations, allowing explicit control over how functions and data are sharded across a mesh of TPU devices, directly mapping to the ICI/OCS topology.

This compositional approach allows developers to write clean, Pythonic code that JAX and XLA transform into highly parallelized programs optimized for TPU hardware. JAX is what Google Deepmind and other Google teams use to build, train, and service their variety of models.

For most developers, these primitives are abstracted by high-level frameworks, like MaxText, built upon a foundation of composable, production-proven libraries:

Optax: A flexible gradient processing and optimization library (e.g., AdamW)

Orbax: A library for asynchronous checkpointing of distributed arrays across large TPU slices

Qwix: A JAX quantization library supporting Quantization Aware Training (QAT) and Post-Training Quantization (PTQ)

Metrax: A library for collecting and processing evaluation metrics in a distributed setting

Tunix: A high-level library for orchestrating post-training jobs

Goodput: A library for measuring and monitoring real-time ML training efficiency, providing a detailed breakdown of badput (e.g., initialization, data loading, checkpointing)

3.2 The PyTorch path: A native eager experience

To bring Ironwood’s power to the PyTorch community, we are developing a new, native PyTorch experience complete with support for a “native eager mode”, which executes operations immediately as they are called. Our goal is to provide a more natural and developer-friendly way to access Ironwood’s scale, minimizing the code changes and level of effort required to adapt models for TPUs. This approach is designed to make the transition from local experimentation to large-scale training more straightforward.

This new framework is built on three core principles to ensure a truly PyTorch-native environment:

Full eager mode: Enables the rapid prototyping, debugging, and research workflows that developers expect from PyTorch.
Standard distributed APIs: Leverages the familiar torch.distributed API, built on DTensor, for scaling training workloads across TPU slices.
Idiomatic compilation: Uses torch.compile as the single, unified path to JIT compilation, utilizing XLA as its backend to trace the graph and compile it into efficient TPU machine code.

This ensures the transition from local experimentation to large-scale distributed training is a natural extension of the standard PyTorch workflow.

3.3 Frameworks: MaxText, PyTorch on TPU, and vLLM

While JAX and PyTorch provide the computational primitives, scaling to thousands of chips is a supercomputer management problem. High-level frameworks handle the complexities of resilience, fault tolerance, and infrastructure orchestration.

MaxText (JAX): MaxText is an open-source, high-performance LLM pre-training and post-training solution written in pure Python and JAX. MaxText demonstrates optimized training on its library of popular OSS models like DeepSeek, Qwen, gpt-oss, Gemma, and more. Whether users are pre-training large Mixture-of-Experts (MoE) models from scratch, or leveraging the latest Reinforcement Learning (RL) techniques on an OSS model, MaxText provides tutorials and APIs to make things easy. For scalability and resiliency, MaxText leverages Pathways, which was originally developed by Google DeepMind and now provides TPU users with differentiated capabilities like elastic training and multi-host inference during RL.

PyTorch on TPU: We recently shared our proposal about our PyTorch native experience on TPUs at Pytorch Conference 2025, including an early preview of training on TPU with minimal code changes. In addition to the framework itself, we are working with the community (RFC), investing in reproducible recipes, reference implementations, and migration tools to enable PyTorch users to use their favorite frameworks on TPUs. Expect further updates as this work matures.

vLLM TPU (Serving): vLLM TPU is now powered by tpu-inference, an expressive and powerful new hardware plugin that unifies JAX and PyTorch under a single lowering path – meaning both frameworks are translated to optimized TPU code through one common, shared backend. This new unified backend is not only faster than the previous generation of vLLM TPU but also offers broader model coverage. This integration provides more flexibility to JAX and PyTorch users, running PyTorch models performantly with no code changes while also extending native JAX support, all while retaining the standard vLLM user experience and interface.

3.4 Extreme performance: Custom kernels via Pallas

While XLA is powerful, cutting-edge research often requires novel algorithms e.g. new attention mechanisms, custom padding to handle dynamic ragged tensors and other optimizations for custom MoE models that the XLA compiler cannot yet optimize.

The JAX ecosystem solves this with Pallas, a JAX-native kernel programming language embedded directly in Python. Pallas presents a unified, Python-first experience, dramatically reducing cognitive load and accelerating the iteration cycle. Other platforms lack this unified, in-Python approach, forcing developers to fragment their workflow. To optimize these operations, they must drop into a disparate ecosystem of lower-level tools—from DSLs like Triton and cuTE to raw CUDA C++ and PTX. This introduces significant mental overhead by forcing developers to manually manage memory, streams, and kernel launches, pulling them out of their Python-based environment

This is a clear example of co-design. Developers use Pallas to explicitly manage the accelerator’s memory hierarchy, defining how “tiles” of data are staged from HBM into the extremely fast on-chip SRAM to be operated on by the MXUs. Pallas has two main parts to it.

Pallas: The developer defines the high-level algorithmic structure and memory logistics in Python.
Mosaic: This compiler backend translates the Pallas definition into optimized TPU machine code. It handles operator fusion, determines optimal tiling strategies, and generates software pipelines to perfectly overlap data transfers (HBM-to-SRAM) with computation (on the MXUs), with the sole objective of saturating the compute units.

Because Pallas kernels are JAX-traceable, they are fully compatible with jit, vmap, and grad. This stack provides Python-native extensibility for both JAX and PyTorch, as PyTorch users can consume Pallas-optimized kernels without ever leaving the native PyTorch API. Pallas kernels for PyTorch and JAX models, on both TPU and GPU, are available via Tokamax, the ML ecosystem’s first multi-framework, multi-hardware kernel library.

3.5 Performance engineering: Observability and debugging

The Ironwood stack includes a full suite of tools for performance analysis, bottleneck detection, and debugging, allowing developers to fully optimize their workloads and operate large scale clusters reliably,

Cloud TPU metrics: Exposes key system-level counters (FLOPS, HBM bandwidth, ICI traffic) to Google Cloud Monitoring that can then be exported to popular monitoring tools like Prometheus.

TensorBoard: Visualizes training metrics (loss, accuracy) and hosts the XProf profiler UI.

XProf (OpenXLA Profiler): The essential toolset for deep performance analysis. It captures detailed execution data from both the host-CPU and all TPU devices, providing:

- Trace Viewer: A microsecond-level timeline of all operations, showing execution, collectives, and “bubbles” (idle time).
- Input Pipeline Analyzer: Diagnoses host-bound vs. compute-bound bottlenecks.
- Op Profile: Ranks all XLA/HLO operations by execution time to identify expensive kernels.
- Memory Profiler: Visualizes HBM usage over time to debug peak memory and fragmentation.

Debugging Tools:

- JAX Debugger (jax.debug): Enables print and breakpoints from within jit-compiled functions.
- TPU Monitoring Library: A real-time diagnostic dashboard (analogous to nvidia-smi) for live debugging of HBM utilization, MXU activity, and running processes.

Beyond performance optimization, developers and infra admins can view fleet efficiency and goodput metrics at various levels (e.g., job, reservation) to ensure maximum utilization of their TPU infrastructure.

4. Conclusion

The Ironwood stack is a complete, system-level co-design, from the silicon to the software. It delivers performance through a dual-pronged strategy: the XLA compiler provides broad, “out-of-the-box” optimization, while the Pallas and Mosaic stack enables hand-tuned kernel performance.

This entire co-designed platform is accessible to all developers, providing first-class, native support for both the JAX and the PyTorch ecosystem. Whether you are pre-training a massive model, running complex RL alignment, or serving at scale, Ironwood provides a direct, resilient, and high-performance path from idea to supercomputer.

Get started today with vLLM on TPU for inference and MaxText for pre-training and post-training.

Read More for the details.

2025 11 06

GCP – Unlock 2x better price-performance with Axion-based N4A VMs, now in preview

Tibor Kiss Cloud, Google Cloud gcp

Decision makers and builders today face a constant challenge: managing rising cloud costs while delivering the performance their customers demand. As applications evolve to use scale-out microservices and handle ever-growing data volumes, organizations need maximum efficiency from their underlying infrastructure to support their growing general-purpose workloads.

To meet this need, we’re excited to announce our latest Axion-based virtual machine series: N4A, available in preview on Compute Engine, Google Kubernetes Engine (GKE), Dataproc, and Batch, with support in Dataflow and other services coming soon.

N4A is the most cost-effective N-series VM to date, delivering up to 2x better price-performance and 80% better performance-per-watt than comparable current-generation x86-based VMs. This makes it easier for customers to further optimize the Total Cost of Ownership (TCO) for a broad range of general-purpose workloads. We see this with cloud-native businesses running scale-out web servers and microservices on GKE, enterprise teams managing backend application servers and mid-sized databases, and engineering organizations operating large CI/CD build farms.

At Google Cloud, we co-design our compute offerings with storage, networking and software at every layer of the stack, from orchestrators to runtimes, to deliver exceptional system-level performance and cost-efficiency. N4A’s breakthrough price-performance is powered by our latest-generation Google Axion Processors, built on the Arm® Neoverse® N3 compute core, Google Dynamic Resource Management (DRM) technology, and Titanium, Google Cloud’s custom-designed hardware and software system that offloads networking and storage processing to free up the CPU. Titanium is part of Google Cloud’s vertically integrated software stack — from the custom silicon in our servers to our planet-scale network traversing 7.75 million kilometers of terrestrial and subsea fiber across 42 regions — that is engineered to maximize efficiency and provide the ultra-low latency and high bandwidth to customers at global scale.

Redefining general-purpose compute and enabling AI inference

N4A is engineered for versatility, with a feature set to support your general-purpose and CPU-based AI workloads. It comes in predefined and custom shapes, with up to 64 vCPUs and 512GB of DDR5 in high-cpu (2GB of memory per vCPU), standard (4GB per vCPU), and high-memory (8GB per vCPU) configurations, with instance networking up to 50 Gbps of bandwidth. N4A VMs feature support for our latest generation Hyperdisk storage options, including Hyperdisk Balanced, Hyperdisk Throughput, and Hyperdisk ML (coming later), providing up to 160K IOPS, 2.4GB/s of throughput per instance.

N4A performs well across a range of industry-standard benchmarks that represent the key workloads our customers run every day. For example, relative to comparable current-generation x86-based VM offerings, N4A delivers up to 105% better price-performance for compute-bound workloads, up to 90% better price-performance for scale-out web servers, up to 85% better price-performance for Java applications, and up to 20% better price-performance for general-purpose databases.

Footnote: As of October 2025. Performance based on the estimated SPECrate®2017_int_base, estimated SPECjbb2015, MySQL Transactions/minute (RO), and Google internal Nginx Reverse Proxy benchmark scores run in production on comparable latest-generation generally-available VMs with general purpose storage types. Price-performance claims based on published and upcoming list prices for Google Cloud.

In the real world, early adopters are seeing dramatic price-performance improvements from the new N4A instances.

“At ZoomInfo, we operate a massive data intelligence platform where efficiency is paramount. Our core data processing pipelines, which are critical for delivering timely insights to our customers, run extensively on Dataflow and Java services in GKE. In our preview of the new N4A instances, we measured a 60% improvement in price-performance for these key workloads compared to their x86-based counterparts. This allows us to scale our platform more efficiently and deliver more value to our customers, faster.” – Sergei Koren, Chief Infrastructure Architect, ZoomInfo

“Organizations today need performance, efficiency, flexibility, and scale to meet the computing demands of the AI era; this requires the close collaboration and co-design that is at the heart of our partnership with Google Cloud. As N4A redefines cost-efficiency, customers gain a new level of infrastructure optimization, enabling enterprises to choose the right infrastructure for their workload requirements with Arm and Google Cloud.” – Bhumik Patel, Director, Server Ecosystem Development, Infrastructure Business, Arm

Granular control with Custom Machine Types and Hyperdisk

A key advantage of our N-series VMs has always been flexibility, and with N4A, we are bringing one of our most popular features to the Axion family for the first time: Custom Machine Types (CMT). Instead of fitting your workload into a predefined shape, CMTs on N4A lets you independently configure the amount of vCPU and memory to meet your application’s unique needs. This ability to right-size your instances means you pay only for the resources you use, minimizing waste and optimizing your total cost of ownership.

This same principle of matching resources to your specific workload applies to storage. N4A VMs feature support for our latest generation of Hyperdisk, allowing you to select the perfect storage profile for your application’s needs:

Hyperdisk Balanced: Offers an optimal mix of performance and cost for the majority of general-purpose workloads, with up to 160K IOPs per N4A VM.
Hyperdisk Throughput: Delivers up to 2.4GiBps of max throughput for bandwidth-intensive analytics workloads like Hadoop or Kafka, providing high-capacity storage at an excellent value.
Hyperdisk ML (post GA): Purpose-built for AI/ML workloads, allows you to attach a single disk containing your model weights or datasets to up to 32 N4A instances simultaneously for large-scale inference or training tasks.
Hyperdisk Storage Pools: Instead of provisioning capacity and performance on a per-volume basis, allows you to provision performance and capacity in aggregate, further optimizing costs by up to 50% and simplifying management.

“At Vimeo, we have long relied on Custom Machine Types to efficiently manage our massive video transcoding platform. Our initial tests on the new Axion-based N4A instances have been very compelling, unlocking a new level of efficiency. We’ve observed a 30% improvement in performance for our core transcoding workload compared to comparable x86 VMs. This points to a clear path for improving our unit economics and scaling our services more profitably, without changing our operational model.” – Joe Peled, Sr. Director of Hosting & Delivery Ops

A growing Arm-based Axion portfolio for customer choice

C-series VMs are designed for workloads that require consistently high performance, e.g., medium-to-large-scale databases and in-memory caches. Alongside them, N-series VMs have been a key Compute Engine pillar, offering a balance of price-performance and flexibility, lowering the cost of running workloads with variable resource needs such as scale-out Java/GKE workloads. We released our first Axion-based machine series, C4A, in October 2024, and the introduction of N4A complements C4A, providing a range of Google Axion instances suited to your workloads’ precise needs.

On top of that, GKE unlocks significant price-performance advantages by orchestrating Axion-based C4A and N4A machine types. GKE leverages Custom Compute Classes to provision and mix these machine types, matching workloads to the right hardware. This automated, heterogeneous cluster management allows teams to optimize their total cost of ownership across their entire application stack.

Also joining the Axion family is C4A.metal, Google Cloud’s first Axion bare metal instance that helps builders meet use cases that require access to the underlying physical server to run specialized applications in a non-virtualized environment, such as automotive systems development, workloads with strict licensing requirements, and Android software development. C4A.metal will be available in preview soon.

Supported by the broad and mature Arm ecosystem, adopting Axion is easier than ever, and the combination of C4A and N4A can help you lower the total cost of running your business, without compromising on performance or workload-specific requirements:

N4A for cost optimization and flexibility. Deliberately engineered for general-purpose workloads that need a balance of price and performance, including scale-out web servers, microservices, containerized applications, open-source databases, batch, data analytics, development environments, data preparation and AI/ML experimentation.
C4A for consistently high performance, predictability, and control. Powering workloads where every microsecond counts, such as medium- to large-scale databases, in-memory caches, cost-effective AI/ML inference, and high-traffic gaming servers. C4A delivers consistent performance, offering a controlled maintenance experience for mission-critical workloads, networking bandwidth up to 100 Gbps, and next-generation Titanium Local SSD storage.

“Migrating to Google Cloud’s Axion portfolio gave us a critical competitive advantage. We slashed our compute consumption by 20% while maintaining low and stable latency with C4A instances, such as our Supply-Side Platform (SSP) backend service. Additionally, C4A enabled us to leverage Hyperdisk with precisely the IOPS we need for our stateful workloads, regardless of instance size. This flexibility gives us the best of both worlds – allowing us to win more ad auctions for our clients while significantly improving our margins. We’re now testing the N4A family by running some of our key workloads that require the most flexibility, such as our API relay service. We are happy to share that several applications running in production are consuming 15% less CPU compared to our previous infrastructure, reducing our costs further, while ensuring that the right instance backs the workload characteristics required.” – Or Ben Dahan, Cloud & Software Architect at Rise

Get started with N4A today

N4A is available during preview in the following Google Cloud regions: us-central1 (Iowa), us-east4 (N. Virginia), europe-west3 (Frankfurt) and europe-west4 (Netherlands) with more regions to follow.

We can’t wait to see what you build. To get access, sign-up here. To learn more, check out the N4A documentation.

Read More for the details.

2025 11 06

GCP – Announcing Axion C4A metal: Arm-based Axion VMs for specialized use cases

Tibor Kiss Cloud, Google Cloud gcp

Today, we are thrilled to announce C4A metal, our first bare metal instance running on Google Axion processors, available in preview soon. C4A metal is designed for specialized workloads that require direct hardware access and Arm®-native compatibility.

Now, organizations running environments such as Android development, automotive simulation, CI/CD pipelines, security workloads, and custom hypervisors can run them on Google Cloud, without the performance overheads and complexity of nested virtualization.

C4A metal instances, like other Axion instances, are built on the standard Arm architecture, so your applications and operating systems compiled for Arm remain portable across your cloud, on-premises, and edge environments, protecting your development investment. C4A metal offers 96 vCPUs, 768GB of DDR5 memory, up to 100Gbps of networking bandwidth, with full support for Google Cloud Hyperdisk including Hyperdisk Balanced, Extreme, Throughput, and ML block storage options.

Google Cloud provides workload-optimized infrastructure to ensure the right resources are available for every task. C4A metal, like the Google Cloud Axion virtual machine family, is powered by Titanium, a key component for multi-tier offloads and security that is foundational to our infrastructure. Titanium’s custom-designed silicon offloads networking and storage processing to free up the CPU, and its dedicated SmartNIC manages all I/O, ensuring that Axion cores are reserved exclusively for your application’s performance. Titanium is part of Google Cloud’s vertically integrated software stack — from the custom silicon in our servers to our planet-scale network traversing 7.75 million kilometers of terrestrial and subsea fiber across 42 regions — that is engineered to maximize efficiency and provide the ultra-low latency and high bandwidth to customers at global scale.

Architectural parity for automotive workloads

Automotive customers can benefit from the Arm architecture’s performance, efficiency, and flexible design for in-vehicle systems such as infotainment and Advanced Driver Assistance Systems (ADAS). Axion C4A metal instances enable architectural parity between test environments and production silicon, allowing automotive technology providers to validate their software on the same Arm Neoverse instruction set architecture (ISA) used in production electronic control units (ECUs). This significantly reduces the risk of late-stage integration failures. For performance-sensitive tasks, these customers can execute demanding virtual hardware-in-the-loop (vHIL) simulations with the consistent, low-latency performance of physical hardware, ensuring test results are reliable and accurate. Finally, C4A metal lets providers move beyond the constraints of a physical lab, by dynamically scaling entire test farms and transforming them from fixed capital expenses into flexible operational ones.

“In the era of AI-defined vehicles, the accelerating pace and complexity of technology are pushing us to rethink traditional linear approaches to software development. Google Cloud’s introduction of Axion C4A metal is a major step forward in this journey. By offering full architectural parity on Arm between test environments and physical silicon, customers can benefit from accelerated development cycles, enabling continuous integration and compliance for a variety of specialized use cases.” – Dipti Vachani, Senior Vice President and General Manager, Automotive Business, Arm

“Our partners and customers rely on QNX to deliver the safety, security, reliability, and real-time performance required for their most mission-critical systems — from advanced driver assistance to digital cockpits. As the Software-Defined Vehicle era continues to gain momentum, decoupling software development from physical hardware is no longer optional — it’s essential for innovation at scale. The launch of Google Cloud’s C4A-metal instances on Axion introduces a powerful ARM-based bare metal platform that we are eager to test and support as this will enable transformative cloud infrastructure benefits for our automotive ecosystem.” – Grant Courville, Senior Vice President, Products and Strategy, QNX

“The future of automotive mobility demands unprecedented speed and precision in practice and development. For automakers and suppliers leveraging the Snapdragon Digital Chassis platform, aligning their cloud development and testing environments to ensure parity with the Snapdragon SoCs in the vehicle is absolutely crucial for efficiency and quality. We are excited about Google Cloud’s commitment to this segment — offering C4A-metal instances with Axion is a massive leap forward, giving the automotive ecosystem a true 1:1 physical to virtual environment in the cloud. This breakthrough significantly reduces integration challenges, slashes validation time, and allows our partners to unleash AI-driven features to market faster at scale.” – Laxmi Rayapudi, VP, Product Management, Qualcomm Technologies, Inc.

Align test and production for Android development

The Android platform was built for Arm-based processors, the standard for virtually all mobile devices. By running development and testing pipelines on the bare-metal instances of Axion processors with C4A metal, Android developers can benefit from native performance, eliminating the overhead of emulation management, such as slow instruction-by-instruction translation layers. In addition, they can significantly reduce latency for Android build toolchains and automated test systems, leading to faster feedback cycles. C4A metal also solves the performance challenges of nested virtualization, making it a great platform for scalable Cuttlefish (Cloud Android) environments.

Once available, developers can deploy scalable Cuttlefish environment farms on top C4A metal instances with an upcoming release of Horizon or by directly leveraging Cloud Android Orchestration. C4A metal allows these virtual devices to run directly on the physical hardware, providing the performance needed to build and manage large, high-fidelity test farms for true continuous testing.

Bare metal access without compromise

As a cloud offering, C4A metal enables a lower total cost of ownership by replacing the entire lifecycle of physical hardware procurement and management with a predictable operational expense. This eliminates the direct capital expenditures of purchasing servers, along with the associated operational costs of hardware maintenance contracts, power, cooling, and physical data center space. You can programmatically provision and de-provision instances to match your exact testing demands, ensuring you are not paying for an over-provisioned fleet of servers sitting idle waiting for peak development cycles.

Operating as standard compute resources within your Virtual Private Cloud (VPC), C4A metal instances inherit and leverage the same security policies, audit logging, and network controls as virtual machines. Instances are designed to appear as physical servers to your toolchain and support common monitoring and security agents, allowing for straightforward integration with your existing Google Cloud environments. This integration extends to storage, where network-attached Hyperdisk allows you to manage persistent disks using the same snapshot and resizing tools your teams already use for your virtual machine fleet.

“For our build system, true isolation is paramount. Running on Google Cloud’s new C4A metal instance on Axion enables us to isolate our package builds with a strong hypervisor security boundary without compromising on build performance.” – Matthew Moore, Founder and CTO, Chainguard, Inc

Better together: the Axion C and N series

The addition of C4A metal to the Arm-based Axion portfolio allows customers to lower TCO by matching the right infrastructure to every workload. While Axion C4A virtual machines optimize for consistently high performance and N4A virtual machines (now in preview) optimize for price-performance and flexibility, C4A metal addresses the critical need for direct hardware access by specialized applications that require a non-virtualized Arm environment.

For example, an Android development company could create a highly efficient CI/CD pipeline by using C4A virtual machines for the build farm. For large-scale testing, they could use C4A metal to run Cuttlefish virtual devices directly on the physical hardware, eliminating nested virtualization overhead. To enable even higher fidelity, they can run Cuttlefish hybrid devices on C4A metal, reusing the system images from their physical hardware. Concurrently, supporting infrastructure such as CI/CD orchestrators and artifact repositories could run on cost-effective N4A instances, using Custom Machine Types to right-size resources and minimize operational expenses.

Coming soon to preview

C4A metal is scheduled for preview soon. Please fill this form to sign up for early access and additional updates.

Read More for the details.

2025 11 06

GCP – Announcing Ironwood TPUs General Availability and new Axion VMs to power the age of inference

Tibor Kiss Cloud, Google Cloud gcp

Today’s frontier models, including Google’s Gemini, Veo, Imagen, and Anthropic’s Claude train and serve on Tensor Processing Units (TPUs). For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them. Constantly shifting model architectures, the rise of agentic workflows, plus near-exponential growth in demand for compute, define this new age of inference. In particular, agentic workflows that require orchestration and tight coordination between general-purpose compute and ML acceleration are creating new opportunities for custom silicon and vertically co-optimized system architectures.

We have been preparing for this transition for some time and today, we are announcing the availability of three new products built on custom silicon that deliver exceptional performance, lower costs, and enable new capabilities for inference and agentic workloads:

Ironwood, our seventh generation TPU, will be generally available in the coming weeks. Ironwood is purpose-built for the most demanding workloads: from large-scale model training and complex reinforcement learning (RL) to high-volume, low-latency AI inference and model serving. It offers a 10X peak performance improvement over TPU v5p and more than 4X better performance per chip for both training and inference workloads compared to TPU v6e (Trillium), making Ironwood our most powerful and energy-efficient custom silicon to date.
New Arm®-based Axion instances. N4A, our most cost-effective N series virtual machine to date, is now in preview. N4A offers up to 2x better price-performance than comparable current-generation x86-based VMs. We are also pleased to announce C4A metal, our first Arm-based bare metal instance, will be coming soon in preview.

Ironwood and these new Axion instances are just the latest in a long history of custom silicon innovation at Google, including TPUs, Video Coding Units (VCU) for YouTube, and five generations of Tensor chips for mobile. In each case, we build these processors to enable breakthroughs in performance that are only possible through deep, system-level co-design, with model research, software, and hardware development under one roof. This is how we built the first TPU ten years ago, which in turn unlocked the invention of the Transformer eight years ago — the very architecture that powers most of modern AI. It has also influenced more recent advancements like our Titanium architecture, and advanced liquid cooling that we’ve deployed at GigaWatt scale with fleet-wide uptime of ~99.999% since 2020.

Pictured: An Ironwood board showing three Ironwood TPUs connected to liquid cooling.

Pictured: Third-generation Cooling Distribution Units, providing liquid cooling to an Ironwood superpod.

Ironwood: The fastest path from model training to planet-scale inference

The early response to Ironwood is overwhelmingly enthusiastic. Anthropic is compelled by the impressive price-performance gains that accelerate their path from training massive Claude models to serving them to millions of users. In fact, Anthropic plans to access up to 1 million TPUs:

“Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work. As demand continues to grow exponentially, we’re increasing our compute resources as we push the boundaries of AI research and product development. Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect.” – James Bradbury, Head of Compute, Anthropic

Ironwood is being used by organizations of all sizes and across industries:

“Our mission at Lightricks is to define the cutting edge of open creativity, and that demands AI infrastructure that eliminates friction and cost at scale. We relied on Google Cloud TPUs and its massive ICI domain to achieve our breakthrough training efficiency for LTX-2, our leading open-source multimodal generative model. Now, as we enter the age of inference, our early testing makes us highly enthusiastic about Ironwood. We believe that Ironwood will enable us to create more nuanced, precise, and higher-fidelity image and video generation for our millions of global customers.” – Yoav HaCohen, Research Director, GenAI Foundational Models, Lightricks

“At Essential AI, our mission is to build powerful, open frontier models. We need massive, efficient scale, and Google Cloud’s Ironwood TPUs deliver exactly that. The platform was incredibly easy to onboard, allowing our engineers to immediately leverage its power and focus on accelerating AI breakthroughs.” – Philip Monk, Infrastructure Lead, Essential AI

System-level design maximizes inference performance, reliability, and cost

TPUs are a key component of AI Hypercomputer, our integrated supercomputing system that brings together compute, networking, storage, and software to improve system-level performance and efficiency. At the macro level, according to a recent IDC report, AI Hypercomputer customers achieved on average 353% three-year ROI, 28% lower IT costs, and 55% more efficient IT teams.

Ironwood TPUs will help customers push the limits of scale and efficiency even further. When you deploy TPUs, the system connects each individual chip to each other, creating a pod — allowing the interconnected TPUs to work as a single unit. With Ironwood, we can scale up to 9,216 chips in a superpod linked with breakthrough Inter-Chip Interconnect (ICI) networking at 9.6 Tb/s. This massive connectivity allows thousands of chips to quickly communicate with each other and access a staggering 1.77 Petabytes of shared High Bandwidth Memory (HBM), overcoming data bottlenecks for even the most demanding models.

Pictured: Part of an Ironwood superpod, directly connecting 9,216 Ironwood TPUs in a single domain.

At that scale, services demand uninterrupted availability. That’s why our Optical Circuit Switching (OCS) technology acts as a dynamic, reconfigurable fabric, instantly routing around interruptions to restore the workload while your services keep running. And when you need more power, Ironwood scales across pods into clusters of hundreds of thousands of TPUs.

Pictured: Jupiter data center network enables the connection of multiple Ironwood superpods into clusters of hundreds of thousands of TPUs.

The AI Hypercomputer advantage: Hardware and software co-designed for faster, more efficient outcomes

On top of this hardware is a co-designed software layer, where our goal is to maximize Ironwood’s massive processing power and memory, and make it easy to use throughout the AI lifecycle.

To improve fleet efficiency and operations, we’re excited to announce that TPU customers can now benefit from Cluster Director capabilities in Google Kubernetes Engine. This includes advanced maintenance and topology awareness for intelligent scheduling and highly resilient clusters.
For pre-training and post-training, we’re also sharing new enhancements to MaxText, a high-performance, open source LLM framework, to make it easier to implement the latest training and reinforcement learning optimization techniques, such as Supervised Fine-Tuning (SFT) and Generative Reinforcement Policy Optimization (GRPO).
For inference, we recently announced enhanced support for TPUs in vLLM, allowing developers to switch between GPUs and TPUs, or run both, with only a few minor configuration changes, and GKE Inference Gateway, which intelligently load balances across TPU servers to reduce time-to-first-token (TTFT) latency by up to 96% and serving costs by up to 30%.

Our software layer is what enables AI Hypercomputer’s high performance and reliability for training, tuning, and serving demanding AI workloads at scale. Thanks to deep integrations across the stack — from data-center-wide hardware optimizations to open software and managed services— Ironwood TPUs are our most powerful and energy-efficient TPUs to date. Learn more about our approach to hardware and software co-design here.

Axion: Redefining general-purpose compute

Building and serving modern applications requires both highly specialized accelerators and powerful, efficient general-purpose compute. This was our vision for Axion, our custom Arm Neoverse®-based CPUs, which we designed to deliver compelling performance, cost and energy efficiency for everyday workloads.

Today, we are expanding our Axion portfolio with:

N4A (preview), our second general-purpose Axion VM, which is ideal for microservices, containerized applications, open-source databases, batch, data analytics, development environments, experimentation, data preparation and web serving jobs that make AI applications possible. Learn more about N4A here.
C4A metal (in preview soon), our first Arm-based bare-metal instance, which provides dedicated physical servers for specialized workloads such Android development, automotive in-car systems, software with strict licensing requirements, scale test farms, or running complex simulations. Learn more about C4A metal here.

With today’s announcements, the Axion portfolio now includes three powerful options, N4A, C4A and C4A metal. Together, the C and N series allow you to lower the total cost of running your business without compromising on performance or workload-specific requirements.

Axion-based Instance	Optimized for	Key Features
N4A (preview)	Price-performance and flexibility	Up to 64 vCPUs, 512GB of DDR5 Memory, and 50 Gbps networking, with support for Custom Machine Types, Hyperdisk Balanced and Throughput storage.
C4A Metal (in preview soon)	Specialized workloads, such as Hypervisors and native Arm development	Up to 96 vCPUs, 768GB of DDR5 Memory, Hyperdisk storage and up to 100Gbps of networking
C4A	Consistently high performance	Up to 72 vCPUs, 576GB of DDR5 Memory, 100Gbps of Tier 1 networking, Titanium SSD with up to 6TB of local capacity, advanced maintenance controls and support for Hyperdisk Balanced, Throughput, and Extreme.

Axion’s inherent efficiency also makes it a valuable option for modern AI workflows. While specialized accelerators like Ironwood handle the complex task of model serving, Axion excels at the operational backbone: supporting high-volume data preparation, ingestion, and running application servers that host your intelligent applications. Axion is already translating into customer impact:

A powerful combination for AI and everyday computing

To thrive in an era with constantly shifting model architectures, software, and techniques, you need a combination of purpose-built AI accelerators for model training and serving, alongside efficient, general-purpose CPUs for the everyday workloads, including the workloads that support those AI applications.

Ultimately, whether you use Ironwood and Axion together or mix and match them with the other compute options available on AI Hypercomputer, this system-level approach gives you the ultimate flexibility and capability for the most demanding workloads. Sign up to test Ironwood, Axion N4A, or C4A metal today.

Read More for the details.

2025 11 06

GCP – Your First AI Application is Easier Than You Think

Tibor Kiss Cloud, Google Cloud gcp

If you’re a developer, you’ve seen generative AI everywhere. It can feel like a complex world of models and advanced concepts. It can be difficult to know where to actually start.

The good news is that building your first AI-powered application is more accessible than you might imagine. You don’t need to be an AI expert to get started. This post introduces a new codelab designed to bridge this gap and provide you with a first step. We’ll guide you through the entire process of building a functional, interactive travel chatbot using Google’s Gemini model.

Dive into the codelab and build your first AI application today!

Setting the Stage: Your First Project

In this codelab, you’ll step into the role of a developer at a travel company tasked with building a new chat application. You’ll start with a basic web application frontend and, step-by-step, you will bring it to life by connecting it to the power of generative AI.

By the end, you will have built a travel assistant that can:

Answer questions about travel destinations.
Provide personalized recommendations.
Fetch real-time data, like the weather, to give genuinely helpful advice.

The process is broken down into a few key stages.

Making the First Connection

Before you can do anything fancy, you need to get your application talking to the AI model. An easy way to do this is with the Vertex AI SDK, a complete library for interacting with the Vertex AI platform.

While the Vertex AI SDK is a powerful tool for the full machine learning lifecycle, this lab focuses on one of its most-used tools: building generative AI applications. This part of the Vertex AI SDK acts as the bridge between your application and the Gemini model. Without it, you would have to manually handle all the complex wiring yourself—writing code to manage authentication, formatting intricate API requests, and parsing the responses. The Vertex AI SDK handles all that complexity for you so you can focus on what you actually want to do: send a message and get a response.

In this codelab, you’ll see just how simple it is.

Giving your AI purpose with system instructions

Once your app is connected, you’ll notice the AI’s responses won’t be tailored to your purposes yet. One way you can make it more useful for your specific use case is by giving it system instructions.

Hot Tip: Use Google AI Studio to Create Your System Instructions

A great way to develop your system instructions is to leverage Gemini as a creative partner to draft them for you. For example, you could ask Gemini in Google AI Studio to generate a thorough set of instructions for a “sophisticated and friendly travel assistant.”

Once you have a draft, you can immediately test it, also in Google AI Studio. Start a new chat and in the panel to the right, set the Gemini model to the one you’re using in your app and paste the text into the system instruction field. This allows you to quickly interact with the model and see how it behaves with your instructions, all without writing any code. When you’re happy with the results, you can copy the final version directly into your application.

Connecting Your AI to the Real World

This is where you break the model out of its knowledge silo and connect it to live data. By default, an AI model‘s knowledge is limited to the data it was trained on; it doesn’t know today’s weather. However, you can provide Gemini with access to external knowledge using a powerful feature called function calling!

The concept is simple: you write a basic Python function (like one to check the weather) and then describe that tool to the model. Then, when a user asks about the weather, the model can ask your application to run your function and use the live result in its answer. This allows the model to answer questions far beyond its training data, making it a much more powerful and useful assistant with access to up-to-the-minute information.

In this lab, we used the Geocoding API and the Weather Forecast API to provide the app with the ability to factor in the weather when answering questions about travel.

Your Journey Starts Here

Building with AI isn’t about knowing everything at once. It’s about taking the first step, building something tangible, and learning key concepts along the way. This codelab was designed to be that first step. By the end, you won’t just have a working travel chatbot—you’ll have hands-on experience with the fundamental building blocks of a production-ready AI application. You’ll be surprised at what you can build.

Share your progress and connect with others on the journey using the hashtag #ProductionReadyAI. Happy learning!

Read More for the details.

2025 11 06

AWS – AWS End User Messaging SMS launches Carrier Lookup

Tibor Kiss AWS, Cloud AWS

Starting today, AWS End User Messaging customers can now lookup carrier information related to a phone number including the country, number type, dialing code, and mobile network and carrier codes. With Carrier Lookup, you can increase deliverability by checking important information about a phone number before you start sending messages, avoiding sending messages to the wrong destination, or to incorrect phone numbers.

AWS End User Messaging provides developers with a scalable and cost-effective messaging infrastructure without compromising the safety, security, or results of their communications. Developers can integrate messaging to support uses cases such as one-time passcodes (OTP) at sign-ups, account updates, appointment reminders, delivery notifications, promotions and more.

Support for Carrier Lookup is available in all AWS Regions where End User Messaging is available, see the AWS Region table.

To learn more, see AWS End User Messaging.

Read More for the details.