Amazon Connect now makes it easier to deliver high-quality voice experiences in Omnissa Virtual Desktop Infrastructure (VDI) environments. Amazon Connect automatically optimizes audio by redirecting media from your agent’s local desktop to Amazon Connect, simplifying the agent experience and improving audio quality by reducing network hops. Agents can simply log into their Omnissa remote desktop application (e.g., Omnissa Horizon) and start accepting calls through your custom agent user interface (i.e., a custom Contact Control Panel) built with the APIs in the Amazon Connect open source JavaScript libraries.
These new features are available in all AWS Regions where Amazon Connect is offered. To learn more, please see the documentation.
Amazon SageMaker HyperPod now integrates with Amazon EventBridge, enabling you to receive near real-time notifications about changes in your cluster’s status. With this integration, you can easily track key events such as HyperPod cluster status transitions and node health changes.
SageMaker HyperPod delivers two types of notifications via EventBridge: (1) cluster status change events, which notify you when your HyperPod cluster transitions between states such as InService or Failed, and (2) node health events, which notify you when nodes change health status (e.g., Healthy to Unhealthy) or are automatically replaced during recovery from failures. You can also write simple EventBridge rules to trigger automated actions when these events occur.
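For example, a rule that matches these events and forwards them to an SNS topic can be created with a few lines of boto3. This is a minimal sketch: the event pattern strings (the source and detail-type values) are assumptions to be confirmed against the HyperPod EventBridge documentation, and the topic ARN is a placeholder.

```python
import json
import boto3

events = boto3.client("events")

# Assumed event pattern; verify the exact source and detail-type strings
# in the SageMaker HyperPod EventBridge documentation.
pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": [
        "SageMaker HyperPod Cluster State Change",
        "SageMaker HyperPod Cluster Node Health Event",
    ],
}

events.put_rule(
    Name="hyperpod-health-events",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

# Forward matching events to an existing SNS topic (placeholder ARN).
events.put_targets(
    Rule="hyperpod-health-events",
    Targets=[{
        "Id": "notify-ops",
        "Arn": "arn:aws:sns:us-east-1:123456789012:ops-alerts",
    }],
)
```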
SageMaker HyperPod events via EventBridge are now available in all AWS Regions where both SageMaker HyperPod and Amazon EventBridge are generally available.
Customers can now create Amazon FSx for NetApp ONTAP file systems in the AWS Mexico (Central) Region, providing fully managed shared storage in the cloud with the data access and management capabilities of ONTAP.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for NetApp ONTAP provides the first and only complete, fully managed NetApp ONTAP file systems in the cloud. It offers the familiar features, performance, capabilities, and APIs of ONTAP with the agility, scalability, and simplicity of an AWS service.
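As a quick illustration, a file system in the new Region can be created with boto3; this is a minimal single-AZ sketch with placeholder subnet and sizing values, not a recommended production configuration (mx-central-1 is the Mexico (Central) Region code).

```python
import boto3

fsx = boto3.client("fsx", region_name="mx-central-1")

# Placeholder subnet ID and minimal sizing, for illustration only.
response = fsx.create_file_system(
    FileSystemType="ONTAP",
    StorageCapacity=1024,  # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    OntapConfiguration={
        "DeploymentType": "SINGLE_AZ_1",
        "ThroughputCapacity": 128,  # MBps
    },
)
print(response["FileSystem"]["FileSystemId"])
```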
To learn more about Amazon FSx for NetApp ONTAP, visit our product page, and see the AWS Region Table for complete regional availability information.
Customers can now create Amazon FSx for OpenZFS file systems in the AWS Mexico (Central) Region, providing fully managed shared file storage built on the OpenZFS file system.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for OpenZFS provides fully managed, cost-effective, shared file storage powered by the popular OpenZFS file system, and is designed to deliver sub-millisecond latencies and multi-GB/s throughput along with rich ZFS-powered data management capabilities (like snapshots, data cloning, and compression).
To learn more about Amazon FSx for OpenZFS, visit our product page, and see the AWS Region Table for complete regional availability information.
Customers can now create Amazon FSx for Lustre file systems in the AWS Mexico (Central) Region, providing fully managed shared storage with the scalability and performance of the popular Lustre file system.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for Lustre provides fully managed shared storage built on the world’s most popular high-performance file system, designed for fast processing of workloads such as machine learning, high performance computing (HPC), video processing, financial modeling, and electronic design automation (EDA).
To learn more about Amazon FSx for Lustre, visit our product page, and see the AWS Region Table for complete regional availability information.
Customers can now create Amazon FSx for Windows File Server file systems in the AWS Mexico (Central) Region, providing fully managed shared storage built on Windows Server.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for Windows File Server provides fully managed, highly reliable file storage built on Windows Server and can be accessed via the industry-standard Server Message Block (SMB) protocol.
To learn more about Amazon FSx for Windows File Server, visit our product page, and see the AWS Region Table for complete regional availability information.
Customers can now create Amazon FSx for Windows File Server file systems in the AWS Asia Pacific (Thailand) Region, providing fully managed shared storage built on Windows Server.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for Windows File Server provides fully managed, highly reliable file storage built on Windows Server and can be accessed via the industry-standard Server Message Block (SMB) protocol.
To learn more about Amazon FSx for Windows File Server, visit our product page, and see the AWS Region Table for complete regional availability information.
Amazon VPC Reachability Analyzer now supports the ability to exclude network resources when analyzing reachability between a source and destination, providing you greater flexibility to run reachability analyses.
VPC Reachability Analyzer is a configuration analysis feature that enables you to check network reachability between a source resource and a destination resource in your virtual private clouds (VPCs). With this launch, you can easily identify an alternative traffic path in your network. For example, if you want to identify any path from your internet gateway to Elastic Network Interfaces (ENIs) that is not passing through the network firewall for inspection, you can specify Network Firewall under resource exclusion and run the reachability analysis. If the analysis returns a reachable path, you know there is an alternative path in your network and can take the required actions.
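A sketch of that workflow in boto3 follows; the resource IDs and firewall ARN are placeholders, and the name of the exclusion parameter (FilterOutArns) is an assumption to verify against the current EC2 API reference.

```python
import boto3

ec2 = boto3.client("ec2")

# Define a path from an internet gateway to an ENI (placeholder IDs).
path = ec2.create_network_insights_path(
    Source="igw-0123456789abcdef0",
    Destination="eni-0123456789abcdef0",
    Protocol="tcp",
)

# Run the analysis while excluding the network firewall; the parameter
# name below is assumed, so confirm it in the EC2 API reference.
analysis = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path["NetworkInsightsPath"]["NetworkInsightsPathId"],
    FilterOutArns=[
        "arn:aws:network-firewall:us-east-1:123456789012:firewall/my-firewall"
    ],
)
print(analysis["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"])
```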
To learn more about Amazon VPC Reachability Analyzer, please visit the documentation. To view Reachability Analyzer prices, visit Amazon VPC Pricing.
Customers can now create Amazon FSx for OpenZFS file systems in the AWS Asia Pacific (Thailand) Region, providing fully managed shared file storage built on the OpenZFS file system.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for OpenZFS provides fully managed, cost-effective, shared file storage powered by the popular OpenZFS file system, and is designed to deliver sub-millisecond latencies and multi-GB/s throughput along with rich ZFS-powered data management capabilities (like snapshots, data cloning, and compression).
To learn more about Amazon FSx for OpenZFS, visit our product page, and see the AWS Region Table for complete regional availability information.
Customers can now create Amazon FSx for NetApp ONTAP file systems in the AWS Asia Pacific (Thailand) Region, providing fully managed shared storage in the cloud with the data access and management capabilities of ONTAP.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for NetApp ONTAP provides the first and only complete, fully managed NetApp ONTAP file systems in the cloud. It offers the familiar features, performance, capabilities, and APIs of ONTAP with the agility, scalability, and simplicity of an AWS service.
To learn more about Amazon FSx for NetApp ONTAP, visit our product page, and see the AWS Region Table for complete regional availability information.
Customers can now create Amazon FSx for Lustre file systems in the AWS Asia Pacific (Thailand) Region, providing fully managed shared storage with the scalability and performance of the popular Lustre file system.
Amazon FSx makes it easier and more cost effective to launch, run, and scale feature-rich, high-performance file systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and broad set of capabilities. Amazon FSx for Lustre provides fully managed shared storage built on the world’s most popular high-performance file system, designed for fast processing of workloads such as machine learning, high performance computing (HPC), video processing, financial modeling, and electronic design automation (EDA).
To learn more about Amazon FSx for Lustre, visit our product page, and see the AWS Region Table for complete regional availability information.
From retail to gaming, from code generation to customer care, an increasing number of organizations are running LLM-based applications, with 78% of organizations having them in development or production today. As the number of generative AI applications and the volume of users scale, the need for performant, scalable, and easy-to-use inference technologies is critical. At Google Cloud, we’re paving the way for this next phase of AI’s rapid evolution with our AI Hypercomputer.
At Google Cloud Next 25, we shared many updates to AI Hypercomputer’s inference capabilities, unveiling Ironwood, our newest Tensor Processing Unit (TPU) designed specifically for inference, coupled with software enhancements such as simple and performant inference using vLLM on TPU and the latest GKE inference capabilities — GKE Inference Gateway and GKE Inference Quickstart.
With AI Hypercomputer, we also continue to push the envelope for performance with optimized software, backed by strong benchmarks:
Google’s JetStream inference engine incorporates new performance optimizations, integrating Pathways for ultra-low latency multi-host, disaggregated serving.
MaxDiffusion, our reference implementation of latent diffusion models, delivers standout performance on TPUs for compute-heavy image generation workloads, and now supports Flux, one of the largest text-to-image generation models to date.
The latest performance results from MLPerf™ Inference v5.0 demonstrate the power and versatility of Google Cloud’s A3 Ultra (NVIDIA H200) and A4 (NVIDIA HGX B200) VMs for inference.
Optimizing performance for JetStream: Google’s JAX inference engine
To maximize performance and reduce inference costs, we are excited to offer more choice when serving LLMs on TPU: we are further enhancing JetStream and bringing vLLM, a widely adopted, fast, and efficient library for serving LLMs, to TPU. With both vLLM on TPU and JetStream, we deliver standout price-performance with low-latency, high-throughput inference and community support through open-source contributions and from Google AI experts.
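For reference, serving with vLLM uses its standard Python API; here is a minimal offline-inference sketch, where the model name and parallelism setting are illustrative and a TPU-enabled vLLM installation is assumed (see the vLLM on TPU docs for setup).

```python
from vllm import LLM, SamplingParams

# Illustrative model and parallelism; any supported checkpoint works.
llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", tensor_parallel_size=8)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Explain the difference between prefill and decode in LLM serving."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```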
JetStream is Google’s open-source, throughput- and memory-optimized inference engine, purpose-built for TPUs and based on the same inference stack used to serve Gemini models. Since we announced JetStream last April, we have invested significantly in further improving its performance across a wide range of open models. Using JetStream, our sixth-generation Trillium TPU now delivers 2.9x higher throughput for Llama2-70B and 2.8x higher throughput for Mixtral 8x7B compared to TPU v5e (using our reference implementation MaxText).
Figure 1: JetStream throughput (output tokens / second). Google internal data. Measured using Llama2-70B (MaxText) on Cloud TPU v5e-8 and Trillium 8-chips and Mixtral 8x7B (MaxText) on Cloud TPU v5e-4 and Trillium 4-chips. Maximum input length: 1024, maximum output length: 1024. As of April 2025.
Available for the first time for Google Cloud customers, Google’s Pathways runtime is now integrated into JetStream, enabling multi-host inference and disaggregated serving — two important features as model sizes grow exponentially and generative AI demands evolve.
Multi-host inference using Pathways distributes the model across multiple accelerator hosts at serving time. This enables inference of large models that don’t fit on a single host. With multi-host inference, JetStream achieves 1,703 tokens/s on Llama3.1-405B on Trillium. This translates to three times more inference per dollar compared to TPU v5e.
In addition, with Pathways, disaggregated serving capabilities allow workloads to dynamically scale LLM inference’s decode and prefill stages independently. This allows for better utilization of resources and can lead to improvements in performance and efficiency, especially for large models. For Llama2-70B, using multiple hosts with disaggregated serving performs seven times better for prefill (time-to-first-token, TTFT) operations, and nearly three times better for token generation (time-per-output-token, TPOT) compared with interleaving the prefill and decode stages of LLM request processing on the same server on Trillium.
Figure 2: Measured using Llama2-70B (MaxText) on Cloud TPU Trillium 16-chips (8 chips allocated for prefill server, 8 chips allocated for decode server). Measured using the OpenOrca dataset. Maximum input length: 1024, maximum output length: 1024. As of April 2025.
Customers like Osmos are using TPUs to maximize cost-efficiency for inference at scale:
“Osmos is building the world’s first AI Data Engineer. This requires us to deploy AI technologies at the cutting edge of what is possible today. We are excited to continue our journey building on Google TPUs as our AI infrastructure for training and inference. We have vLLM and JetStream in scaled production deployment on Trillium and are able to achieve industry leading performance at over 3500 tokens/sec per v6e node for long sequence inference for 70B class models. This gives us industry leading tokens/sec/$, comparable to not just other hardware infrastructure, but also fully managed inference services. The availability of TPUs and the ease of deployment on AI Hypercomputer lets us build out an Enterprise software offering with confidence.” – Kirat Pandya, CEO, Osmos
MaxDiffusion: High-performance diffusion model inference
Beyond LLMs, Trillium demonstrates standout performance on compute-heavy workloads like image generation. MaxDiffusion delivers a collection of reference implementations of various latent diffusion models. In addition to Stable Diffusion inference, we have expanded MaxDiffusion to now support Flux; with 12 billion parameters, Flux is one of the largest open source text-to-image models to date.
As demonstrated in MLPerf Inference v5.0, Trillium now delivers 3.5x higher throughput (queries/second) on Stable Diffusion XL (SDXL) than its predecessor, TPU v5e, achieved in the last performance round. This also improves on Trillium’s own MLPerf 4.1 submission throughput by 12%.
Figure 3: MaxDiffusion throughput (images per second). Google internal data. Measured using the SDXL model on Cloud TPU v5e-4 and Trillium 4-chip. Resolution: 1024×1024, batch size per device: 16, decode steps: 20. As of April 2025.
With this throughput, MaxDiffusion delivers a cost-efficient solution: the cost to generate 1,000 images is as low as 22 cents on Trillium, 35% less than on TPU v5e.
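The underlying arithmetic is simple: cost per 1,000 images is the time needed to generate them multiplied by the hourly instance price. A toy sketch, where the throughput and hourly price are placeholder values rather than published figures:

```python
# Cost per 1,000 images = (1000 / images_per_second) / 3600 * hourly_price.
# Both inputs below are placeholders, not published Trillium figures.
def cost_per_1000_images(images_per_second: float, hourly_price_usd: float) -> float:
    seconds_needed = 1000.0 / images_per_second
    return seconds_needed / 3600.0 * hourly_price_usd

# Example: 5 images/s at $4.00/hour comes to about $0.22 per 1,000 images.
print(f"${cost_per_1000_images(5.0, 4.0):.2f}")
```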
Figure 4: Diffusion cost to generate 1000 images. Google internal data. Measured using the SDXL model on Cloud TPU v5e-4 and Cloud TPU Trillium 4-chip. Resolution: 1024×1024, batch size per device: 2, decode steps: 4. Cost is based on the 3Y CUD prices for Cloud TPU v5e-4 and Cloud TPU Trillium 4-chip in the US. As of April 2025.
A3 Ultra and A4 VMs MLPerf 5.0 Inference results
For MLPerf™ Inference v5.0, we submitted 15 results, including our first submission with A3 Ultra (NVIDIA H200) and A4 (NVIDIA HGX B200) VMs. The A3 Ultra VM is powered by eight NVIDIA H200 Tensor Core GPUs and offers 3.2 Tbps of GPU-to-GPU non-blocking network bandwidth and twice the high bandwidth memory (HBM) compared to A3 Mega with NVIDIA H100 GPUs. Google Cloud’s A3 Ultra demonstrated highly competitive performance, achieving results comparable to NVIDIA’s peak GPU submissions across LLMs, MoE, image, and recommendation models.
Google Cloud was the only cloud provider to submit results on NVIDIA HGX B200 GPUs, demonstrating excellent performance of A4 VM for serving LLMs including Llama 3.1 405B (a new benchmark introduced in MLPerf 5.0). A3 Ultra and A4 VMs both deliver powerful inference performance, a testament to our deep partnership with NVIDIA to provide infrastructure for the most demanding AI workloads.
Customers like JetBrains are using Google Cloud GPU instances to accelerate their inference workloads:
“We’ve been using A3 Mega VMs with NVIDIA H100 Tensor Core GPUs on Google Cloud to run LLM inference across multiple regions. Now, we’re excited to start using A4 VMs powered by NVIDIA HGX B200 GPUs, which we expect will further reduce latency and enhance the responsiveness of AI in JetBrains IDEs.” – Vladislav Tankov, Director of AI, JetBrains
AI Hypercomputer is powering the age of AI inference
Google’s innovations in AI inference, including hardware advancements in Google Cloud TPUs and NVIDIA GPUs, plus software innovations such as JetStream, MaxText, and MaxDiffusion, are enabling AI breakthroughs with integrated software frameworks and hardware accelerators. Learn more about using AI Hypercomputer for inference. Then, check out these JetStream and MaxDiffusion recipes to get started today.
Amazon DocumentDB (with MongoDB compatibility), a fully managed, native JSON database that makes it simple and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure, is now available in the AWS Europe (Stockholm) Region. Amazon DocumentDB provides scalability and durability for mission-critical MongoDB workloads; it supports millions of requests per second and can be scaled to 15 low-latency read replicas in minutes without application downtime. Storage scales automatically up to 128 TiB without any impact to your application. In addition, Amazon DocumentDB natively integrates with AWS Database Migration Service (DMS), Amazon CloudWatch, AWS CloudTrail, AWS Lambda, AWS Backup, and more.
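For example, a cluster in the new Region can be created with boto3 (eu-north-1 is the Stockholm Region code; the identifiers, instance class, and credentials below are placeholders):

```python
import boto3

docdb = boto3.client("docdb", region_name="eu-north-1")

# Create the cluster; manage the password with Secrets Manager in practice.
docdb.create_db_cluster(
    DBClusterIdentifier="my-docdb-cluster",
    Engine="docdb",
    MasterUsername="admin_user",
    MasterUserPassword="REPLACE_ME",
)

# Add an instance so the cluster can serve traffic.
docdb.create_db_instance(
    DBInstanceIdentifier="my-docdb-instance-1",
    DBInstanceClass="db.r6g.large",
    Engine="docdb",
    DBClusterIdentifier="my-docdb-cluster",
)
```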
To learn more about Amazon DocumentDB, please visit the Amazon DocumentDB product page, and see the AWS Region Table for complete regional availability.
Amazon Connect now has new pricing models for external voice transfer and Contact Lens with external voice systems. The new pricing models have independent pricing for external voice connectors and external voice minutes and are effective for all customers from May 1, 2025.
External voice transfer directly transfers voice calls and metadata from Amazon Connect to another voice system, so you can use Amazon Connect telephony and Interactive Voice Response (IVR) to help improve customer experience. Each external transfer connector is now $3,100 per month and each external voice transfer is $0.005 per minute.
Contact Lens with external voice enables Connect Contact Lens contact records, call recording, real-time and post-call analytics, and agent evaluations with your existing voice system to help improve customer experience and agent performance. Each external voice connector is now $3,100 per month and each external voice minute is $0.012 per minute. Standard Contact Lens conversational analytics and performance evaluation charges apply when used with external voice.
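To make the pricing concrete, here is a back-of-the-envelope monthly cost calculation using the rates above; the minute volume is an assumed example, not a published figure.

```python
# Prices from the announcement: $3,100 per connector per month and
# $0.012 per external voice minute (Contact Lens with external voice).
CONNECTOR_MONTHLY_USD = 3_100
PER_MINUTE_USD = 0.012

def monthly_cost(connectors: int, minutes: int) -> float:
    return connectors * CONNECTOR_MONTHLY_USD + minutes * PER_MINUTE_USD

# Example: one connector analyzing 500,000 minutes per month.
print(f"${monthly_cost(1, 500_000):,.2f}")  # $9,100.00
```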
Amazon Connect external voice transfer and Contact Lens with external voice are available in the US East (N. Virginia) and US West (Oregon) AWS Regions. To learn more about using Amazon Connect with external voice systems, see the Amazon Connect documentation.
Today, AWS announced the opening of a new AWS Direct Connect location within the NEXTDC B2 data center near Brisbane, Australia. By connecting your network to AWS at the new location, you gain private, direct access to all public AWS Regions (except those in China), AWS GovCloud Regions, and AWS Local Zones. This site is the first AWS Direct Connect location in Brisbane and the eighth AWS Direct Connect location within Australia. This Direct Connect location offers dedicated 10 Gbps and 100 Gbps connections with MACsec encryption available.
The Direct Connect service enables you to establish a private, physical network connection between AWS and your data center, office, or colocation environment. These private connections can provide a more consistent network experience than those made over the public internet.
For more information on the over 148 Direct Connect locations worldwide, visit the locations section of the Direct Connect product detail pages. Or, visit our getting started page to learn more about how to purchase and deploy Direct Connect.
Amazon Aurora PostgreSQL Limitless Database is now available with PostgreSQL version 16.8 compatibility, bringing significant improvements and new features. This release contains product improvements and bug fixes made by the PostgreSQL community, along with Aurora Limitless-specific additions such as support for the ltree and btree_gist extensions and improved query performance.
Aurora PostgreSQL Limitless Database makes it easy for you to scale your relational database workloads by providing a serverless endpoint that automatically distributes data and queries across multiple Amazon Aurora Serverless instances while maintaining the transactional consistency of a single database. Aurora PostgreSQL Limitless Database offers capabilities such as distributed query planning and transaction management, removing the need for you to create custom solutions or manage multiple databases to scale. As your workloads increase, Aurora PostgreSQL Limitless Database adds additional compute resources while staying within your specified budget, so there is no need to provision for peak, and compute automatically scales down when demand is low.
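Because this release adds ltree and btree_gist support, a brief sketch of enabling and exercising ltree through the serverless endpoint may help; the endpoint, database name, and credentials are placeholders.

```python
import psycopg2

# Placeholder connection details for a Limitless serverless endpoint.
conn = psycopg2.connect(
    host="my-cluster.limitless-xxxx.us-east-1.rds.amazonaws.com",
    dbname="postgres_limitless",
    user="postgres",
    password="REPLACE_ME",
)
conn.autocommit = True
cur = conn.cursor()

# Enable the newly supported extensions.
cur.execute("CREATE EXTENSION IF NOT EXISTS ltree;")
cur.execute("CREATE EXTENSION IF NOT EXISTS btree_gist;")

# ltree stores hierarchical labels; <@ tests "is a descendant of".
cur.execute("CREATE TABLE IF NOT EXISTS org (path ltree);")
cur.execute("INSERT INTO org VALUES ('Company.Engineering.Databases');")
cur.execute("SELECT path FROM org WHERE path <@ 'Company.Engineering';")
print(cur.fetchall())
```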
Aurora PostgreSQL Limitless Database is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
Amazon Relational Database Service (RDS) for PostgreSQL now supports the latest minor versions 17.5, 16.9, 15.13, 14.18, and 13.21. We recommend that you upgrade to the latest minor versions to fix known security vulnerabilities in prior versions of PostgreSQL, and to benefit from the bug fixes added by the PostgreSQL community. This release also includes updates for PostgreSQL extensions such as pg_repack 1.5.1, pglogical 2.4.5, and others.
You can use automatic minor version upgrades to automatically upgrade your databases to more recent minor versions during scheduled maintenance windows. You can also use Amazon RDS Blue/Green deployments for RDS for PostgreSQL using physical replication for your minor version upgrades. Learn more about upgrading your database instances, including automatic minor version upgrades and Blue/Green Deployments in the Amazon RDS User Guide.
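Both upgrade paths can be driven through the RDS API; a minimal boto3 sketch, with placeholder instance identifier and ARN:

```python
import boto3

rds = boto3.client("rds")

# Opt an instance into automatic minor version upgrades during its
# maintenance window (placeholder identifier).
rds.modify_db_instance(
    DBInstanceIdentifier="my-postgres-db",
    AutoMinorVersionUpgrade=True,
    ApplyImmediately=True,
)

# Or stage the minor version upgrade with a Blue/Green deployment.
rds.create_blue_green_deployment(
    BlueGreenDeploymentName="pg-17-5-upgrade",
    Source="arn:aws:rds:us-east-1:123456789012:db:my-postgres-db",
    TargetEngineVersion="17.5",
)
```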
Amazon RDS for PostgreSQL makes it simple to set up, operate, and scale PostgreSQL deployments in the cloud. See Amazon RDS for PostgreSQL Pricing for pricing details and regional availability. Create or update a fully managed Amazon RDS database in the Amazon RDS Management Console.
Amazon SQS now supports interface VPC endpoints that have been validated under the Federal Information Processing Standard (FIPS) 140-3 program. You can now easily use AWS PrivateLink with Amazon SQS for regulated workloads that require a secure connection using a FIPS 140-3 validated cryptographic module.
FIPS compliant endpoints help companies contracting with the US federal government meet the FIPS security requirement to encrypt sensitive data in supported regions. To create an interface VPC endpoint that connects to an Amazon SQS FIPS endpoint, see Internetwork traffic privacy in Amazon SQS.
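A minimal boto3 sketch of creating such an endpoint follows; the VPC and subnet IDs are placeholders, and the FIPS service name shown follows the usual com.amazonaws.<region>.sqs-fips convention but should be confirmed by listing the available endpoint services.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                    # placeholder
    ServiceName="com.amazonaws.us-east-1.sqs-fips",   # assumed FIPS name
    SubnetIds=["subnet-0123456789abcdef0"],           # placeholder
    PrivateDnsEnabled=True,
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])
```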
The new capability is available in all AWS Commercial Regions in the United States and Canada. To learn more about FIPS 140-3 at AWS, visit FIPS 140-3 Compliance.
Starting today, customers can use AWS Control Tower in the AWS Asia Pacific (Thailand) and AWS Mexico (Central) Regions. With this launch, AWS Control Tower is available in 32 AWS Regions and the AWS GovCloud (US) Regions. AWS Control Tower offers the easiest way to set up and govern a secure, multi-account AWS environment. It simplifies AWS experiences by orchestrating multiple AWS services on your behalf while maintaining the security and compliance needs of your organization. You can set up a multi-account AWS environment within 30 minutes or less, govern new or existing account configurations, gain visibility into compliance status, and enforce controls at scale.
If you are new to AWS Control Tower, you can launch it today in any of the supported Regions, and you can use AWS Control Tower to build and govern your multi-account environment in all supported Regions. If you are already using AWS Control Tower and you want to extend its governance features to the newly supported Regions in your accounts, you can go to the settings page in your AWS Control Tower dashboard, select your Regions, and update your landing zone. Once you update all accounts that are governed by AWS Control Tower, your landing zone, managed accounts, and registered OUs will be under governance in the new Regions.
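The same landing zone update can be scripted with the Control Tower APIs; a hedged sketch, assuming the landing zone manifest exposes a governedRegions list as in the documented manifest schema (ap-southeast-7 and mx-central-1 are the new Region codes):

```python
import boto3

ct = boto3.client("controltower")

# Look up the landing zone and its current manifest rather than
# authoring a manifest from scratch.
lz_arn = ct.list_landing_zones()["landingZones"][0]["arn"]
current = ct.get_landing_zone(landingZoneIdentifier=lz_arn)["landingZone"]

manifest = current["manifest"]
manifest["governedRegions"] = sorted(
    set(manifest.get("governedRegions", [])) | {"ap-southeast-7", "mx-central-1"}
)

ct.update_landing_zone(
    landingZoneIdentifier=lz_arn,
    manifest=manifest,
    version=current["version"],
)
```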
For a full list of Regions where AWS Control Tower is available, see the AWS Region Table. To learn more, visit the AWS Control Tower homepage or see the AWS Control Tower User Guide.
AWS Security Incident Response is now available to customers in three additional AWS Regions: Asia Pacific (Mumbai), Europe (Paris), and South America (São Paulo). You can now use these additional Regions to prepare for, respond to, and recover from security events.
With AWS Security Incident Response, you can enhance your organization’s overall security posture and incident response readiness. AWS Security Incident Response offers three core features: monitoring and triaging of security findings from Amazon GuardDuty and third-party tools through AWS Security Hub; integrated communication and collaboration tools to streamline security escalation and response; and access to self-managed security investigation tools and 24/7 support from the AWS Customer Incident Response Team (CIRT), who can assist you in investigating, containing, eradicating, and recovering from security events. The AWS CIRT has years of experience helping customers recover from security events, building up deep institutional knowledge based on real-world scenarios.