AWS Lambda now supports creating or updating functions using container images stored in an Amazon Elastic Container Registry (Amazon ECR) repository located in a different AWS account than the Lambda function, in both AWS GovCloud (US) Regions (AWS GovCloud (US-West) and AWS GovCloud (US-East)). Previously, users could only access container images within the same AWS account as their Lambda function, which often required copying images to a local ECR repository if they were stored in a centralized account.
This enhancement streamlines the process by allowing access to container images across different accounts. To achieve this, you need to grant the necessary permissions to the Lambda resource and the Lambda service principal. This functionality is available in all AWS Regions where both Lambda and ECR are available; see the AWS Region table for more information. Visit the AWS Lambda documentation for further details on configuring these permissions.
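For a rough sketch of that setup (the repository name, account IDs, image URI, and role ARN below are illustrative assumptions, and the exact policy statements should be confirmed in the Lambda and ECR documentation), the central account attaches a repository policy that lets the function's account and the Lambda service principal pull the image, and the function account then creates the function directly from the cross-account image URI:

# Sketch only: names, account IDs, and policy statements are assumptions; verify against the AWS docs.
import json
import boto3

CENTRAL_ACCOUNT_REPO = "shared-images"           # hypothetical repository in the central account
FUNCTION_ACCOUNT_ID = "111122223333"             # hypothetical account that owns the Lambda function
IMAGE_URI = "999988887777.dkr.ecr.us-gov-west-1.amazonaws.com/shared-images:latest"  # hypothetical

ecr = boto3.client("ecr", region_name="us-gov-west-1")                # run in the central (image) account
lambda_client = boto3.client("lambda", region_name="us-gov-west-1")   # run in the function account

# 1) In the central account: allow the consuming account and the Lambda service
#    principal (scoped to that account) to pull image manifests and layers.
repo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowFunctionAccountPull",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws-us-gov:iam::{FUNCTION_ACCOUNT_ID}:root"},
            "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
        },
        {
            "Sid": "AllowLambdaServicePrincipalPull",
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
            "Condition": {"StringEquals": {"aws:SourceAccount": FUNCTION_ACCOUNT_ID}},
        },
    ],
}
ecr.set_repository_policy(repositoryName=CENTRAL_ACCOUNT_REPO, policyText=json.dumps(repo_policy))

# 2) In the function account: create the function directly from the cross-account image.
lambda_client.create_function(
    FunctionName="my-cross-account-image-fn",                          # hypothetical
    PackageType="Image",
    Code={"ImageUri": IMAGE_URI},
    Role="arn:aws-us-gov:iam::111122223333:role/lambda-exec-role",      # hypothetical execution role
)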
Amazon Corretto 25, a Long Term Support (LTS) version, is now generally available. Amazon Corretto is a no-cost, multi-platform, production-ready distribution of OpenJDK. You can download Corretto 25 for Linux, Windows, and macOS from our downloads page.
Amazon Corretto 25 new features include:
Two features that were initially released as experimental in JDK 24 are now LTS production-ready in JDK 25:
Compact Object Headers: designed to lower heap memory usage by shrinking object headers from 96-128 bits down to 64 bits.
Generational Shenandoah GC: engineered to provide sustainable throughput and lower p99 pause times, or similar pause times with a smaller heap and reduced CPU usage.
Ahead-of-Time (AOT) Caching: designed to improve cold-start and warm-up time by reusing pre-parsed pre-linked classes and compilation profiles between training and production runs.
Language improvements: primitive types in patterns, flexible constructors, module-wide imports, compact source files, scoped values for thread-local variables, and stable values for immutable data, all designed to cut boilerplate and keep everyday code shorter and safer.
Observability: JDK Flight Recorder gains CPU‑time sampling, cooperative sampling and method‑trace events for low‑overhead production profiling.
Structured Concurrency: designed to provide coordinated task management, allowing related tasks to fail or finish together.
Vector API: developed to express vector computations that compile to optimal vector instructions on supported CPUs.
Virtual Thread pinning improvements: reduce thread pinning in synchronized blocks for better scalability.
A detailed description of these features can be found on the OpenJDK 25 Project page. Amazon Corretto 25 is distributed by Amazon under an open source license and will be supported through October 2032.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) storage optimized I8ge instances are available in the AWS Europe (Frankfurt) Region. I8ge instances are powered by AWS Graviton4 processors to deliver up to 60% better compute performance compared to previous-generation Graviton2-based storage optimized Amazon EC2 instances. I8ge instances use the latest third-generation AWS Nitro SSDs, local NVMe storage that delivers up to 55% better real-time storage performance per TB while offering up to 60% lower storage I/O latency and up to 75% lower storage I/O latency variability compared to previous-generation Im4gn instances. At 120 TB, I8ge instances have the highest storage density among AWS Graviton-based storage optimized Amazon EC2 instances. These instances are built on the AWS Nitro System, which offloads CPU virtualization, storage, and networking functions to dedicated hardware and software, enhancing the performance and security of your workloads.
I8ge instances offer instance sizes up to 48xlarge including two metal sizes, 1,536 GiB of memory, and 120 TB instance storage. At 300 Gbps, these instances have the highest networking bandwidth among storage optimized Amazon EC2 instances. They are ideal for real-time applications that require much larger storage density such as relational databases, non-relational databases, streaming databases, search queries and data analytics.
Amazon Connect now provides you with agent hierarchy filters on the contact search page in the Amazon Connect UI. This launch enables contact center leaders to drill down into their hierarchy to review contacts handled by specific contact center sites, departments, or teams, for assessing contact quality or agent performance. This also enables centralized teams within contact centers, such as quality management and regulatory compliance, to efficiently locate and review contacts handled by specific teams or departments, streamlining their workflow for performance evaluation and compliance auditing.
This feature is available in all regions where Amazon Connect is offered. To learn more, please visit our documentation and our webpage.
As enterprises increasingly adopt the Model Context Protocol (MCP) to extend the capabilities of AI models and better integrate them with external tools, databases, and APIs, it becomes even more important to ensure secure MCP deployments.
While MCP unlocks new capabilities for AI systems, it can also introduce new risks, such as tool poisoning, prompt injection, and dynamic tool manipulation. These can lead to data exfiltration, identity subversion, and misuse of AI systems.
Securing an MCP deployment begins with a strong security foundation. Here are five key MCP deployment risks you should be aware of, and how using a centralized proxy architecture on Google Cloud can help mitigate them.
Top five MCP deployment risks you should know
While there are some broader risks unique to AI, these five are especially important to be aware of when designing MCP deployments:
Unauthorized tool exposure: A misconfigured MCP manifest can create a vulnerability that allows unauthorized individuals or agents to access sensitive tools, such as internal administration functions.
Session hijacking: An attacker may steal a legitimate user’s session ID to impersonate them. This allows the attacker to either make unauthorized API calls directly or, in stateful systems, inject malicious payloads into a shared data queue for delivery to the victim’s active session.
Tool shadowing and shadow MCP: Rogue MCP tools, mimicking legitimate services, can be deployed by malicious actors. This deceptive practice can trick both AI agents and human employees into interacting with harmful tools under the belief they are genuine.
Sensitive data exposure and token theft: Improperly configured environments or inadequate data-handling practices can lead to the accidental exposure of sensitive information, such as API keys, credentials, tokens, and personally identifiable information (PII). Attackers can exploit exposed credentials to gain unauthorized access to corporate resources, which could lead to significant data breaches.
Authentication bypass: Weak, misconfigured, or inadequately-enforced authentication mechanisms can be exploited by attackers to circumvent security controls, allowing them to gain unauthorized access by successfully impersonating legitimate users and trusted entities.
How a centralized MCP proxy architecture can help
To address these core risks in a scalable manner, we recommend a pattern built around a centralized MCP proxy that acts as a secure intermediary for all communication between clients and MCP servers.
Built on Cloud Run, Apigee, or Google Kubernetes Engine, this intelligent proxy intercepts all tool calls, enforcing your organization’s security policies and providing extensive monitoring. By serving as a centralized security enforcement point, the MCP proxy enables important risk-managing capabilities such as consistent access controls (acts as an authorization server), advanced traffic management, audit logging, secret scanning, resource limits, and real-time threat detection, all without requiring modifications to individual remote MCP servers.
Without centralization, organizations deploying MCP servers face challenges:
Fragmented authentication: Each MCP server enforces its own policies, creating inconsistencies.
Operational overhead: Updating agents with new endpoints is time-consuming and error-prone.
Security blind spots: Limited visibility into health, usage, and anomalies across distributed MCP servers.
Expanded attack surface: Vulnerabilities such as prompt injection, tool poisoning, and unmonitored traffic flows.
A unified MCP proxy on Google Cloud addresses these challenges by:
Enforcing organizational security policies at a single, managed entry point for all MCP requests and responses.
Screening model prompts and responses with Model Armor for prompt injection, jailbreak attempts, and sensitive data, helping ensure responsible AI practices.
Storing API keys, database credentials, and sensitive configuration values in Secret Manager.
Ensuring MCP server images in Artifact Registry are scanned, verified, and encrypted at rest with customer-managed encryption keys.
Google Cloud applies security at every layer of the architecture. Let’s dig into these capabilities below.
Network-level security control
Network segmentation can isolate MCP tool servers, load balancers, and databases using Virtual Private Cloud (VPC) networks and Cloud Run ingress and egress options to reduce lateral movement risk.
Web attack and DDoS protection can help block malicious traffic, enforce IP allow and deny lists, and provide resilience against DDoS attacks using the global load balancer with Cloud Armor.
Advanced traffic management can help provide custom domain and path matching for MCP tool servers, using an Internal Application Load Balancer (ALB) with advanced traffic management capabilities. By inspecting the incoming request host and path, the ALB intelligently directs requests to the appropriate backend service or instance of your MCP server.
The backend service supports various backend types, which can be MCP servers deployed in Google Cloud (such as on VMs, Cloud Run, or Google Kubernetes Engine), in a remote network (using a hybrid NEG), or on the internet (using an internet NEG).
Advanced Traffic Management uses Internal Application Load Balancer to help provide custom domain path matching for MCP tool servers.
The MCP proxy functions as the MCP Authorization Service (MAS) and integrates with an external identity provider (such as Google Identity Platform, Okta, or Entra ID). This allows for secure access to your MCP server without the need to build and maintain a complex identity system.
This highly scalable and reliable authentication solution can help users securely authenticate through their existing identity platform. Additionally, Cloud Identity and Access Management (IAM) can enforce role-based access and restrict access to cloud resources.
MCP authentication and authorization flow.
In-line protection of MCP client prompts and responses with Model Armor
The proxy leverages Model Armor to provide robust, real-time protection against runtime threats like prompt injection, jailbreaking, tool poisoning, dynamic tool manipulation, and sensitive data leakage.
Using Model Armor templates, you can configure the MCP client prompts to be scrutinized and sanitized by Model Armor before being forwarded to the application load balancer. This process ensures that only sanitized prompts reach your MCP tools servers, thereby preventing unintended actions.
All actions performed by MCP clients are logged by the proxy to Cloud Logging, such as accessing specific tools, modifying or writing to memory locations, and initiating or receiving communication. Each log entry includes a timestamp, agent ID, session ID, payload, and input/output signature fingerprints.
The comprehensive nature of these logs, combined with the detailed metadata, can help administrators and auditors collect the necessary information to reconstruct client activities, identify anomalies, and enforce security policies effectively.
Additional security controls
In addition to layered security controls, a robust defense-in-depth strategy requires continuous monitoring and proactive measures to identify and mitigate threats.
Secrets scanning with Sensitive Data Protection
Sensitive Data Protection (SDP) discovery periodically scans the MCP Tools server on Cloud Run, specifically targeting the detection of hardcoded secrets. This proactive measure involves regular scans of the server’s build and runtime environment variables.
These scans detect insecure coding-practice risks, such as instances where sensitive information (including API keys, database credentials, and access tokens) has been inadvertently embedded directly in the code or configuration.
Vulnerability scanning
MCP server images stored in Artifact Registry are scanned for vulnerabilities before they are deployed. When a new image is pushed to Artifact Registry, it is automatically scanned for known vulnerabilities, providing actionable insights into potential weaknesses.
You can enforce policies that block the deployment of images with critical vulnerabilities as part of securing your environment from the ground up.
Threat detection
Security Command Center (SCC) provides AI Protection, which helps manage the security posture of your AI workloads by detecting threats and helping mitigate risks to your AI asset inventory. This includes managing threats to the MCP deployment through detection, investigation, and response.
SCC can also identify unauthorized access and data exfiltration attempts, delivering real-time alerts and remediation recommendations.
AI Protection helps you manage the security posture of your AI workloads by detecting threats and helping you to mitigate risks to your AI asset inventory.
Bring it all together
By strategically implementing these Google Cloud security services, you can establish a secure and resilient environment for your Model Context Protocol remote servers. Prioritizing authentication, network security, data protection, and continuous monitoring will ensure the integrity and confidentiality of your AI models and the sensitive information they process.
This unified, Zero Trust approach can help mitigate emerging AI-specific risks, and also provide a scalable and future-proof foundation for the evolution of your AI-driven applications.
To learn more about securing your AI workload, please refer to our documentation.
MCP Toolbox for Databases (Toolbox) is an open-source MCP server that makes it easy for developers to connect gen AI agents to enterprise data, with initial support for databases like BigQuery, AlloyDB, Cloud SQL, and Spanner. Since launching earlier this year, Toolbox has made it easier for millions of developers to access enterprise data in databases.
Today, we're expanding Toolbox with a comprehensive new set of tools for Firestore. This will help developers build more modern web and mobile applications.
What MCP is, and how it unlocks AI-assisted workflows
MCP is an emerging open standard for connecting AI systems with tools and data sources through a standardized protocol, replacing fragmented, custom integrations. Think of MCP as a universal adapter for AI, allowing any compatible assistant to plug into any tool or database without needing a custom-built connector each time. Now with the MCP Toolbox, these assistants (such as those in an IDE or a CLI like the Gemini CLI) can connect directly to your Firestore database.
This is a massive step for AI-assisted workflows, from debugging data and testing security rules to managing your collections—all using natural language. For instance, a developer building a retail app can now ask their assistant, ‘Find all users whose wishlists contain discontinued product IDs,’ to perform data cleanup, without writing a single line of code.
Let’s explore how these new capabilities can improve your development process.
AI-assisted development meets the NoSQL world
As you carry out AI-assisted tasks, you’re probably looking for the most efficient way to interact with your data. Our new pre-built tools for Firestore enable you to do just that, directly from your Gemini CLI or other AI-powered development environment.
Firestore’s flexible document structure and powerful security rules offer unique capabilities for building modern mobile and web applications. These tools are crafted to empower the Firestore developer, helping them master both the flexibility of the document model and the creation of robust access controls that protect their app. You can now use your AI assistant to perform queries, carry out targeted document updates, and even validate your security rules before you deploy them, saving you time and preventing errors.
From QA bug to resilient feature: A developer’s story
Let's take a hypothetical example. Alex is a full-stack developer on a team building a new e-commerce application using Firestore. She uses Gemini CLI to help her code, debug, and test. This morning, a high-priority bug was filed by the QA team: an issue in the staging environment is causing items to reappear in a user's "wishlist" after being removed. Because of the bug, her release was blocked.
The bug hunt begins
Until now, investigating a bug meant Alex would have to manually click through the Cloud Console to inspect test documents or write a custom script just to query the staging database—a slow and cumbersome process. Now, she can simply have a conversation with Gemini CLI.
The bug report contains the test user accounts. Alex opens her terminal and asks:
“Hey, show me the Firestore data for the test users qa_user_123 and qa_user_456 from the users-staging collection.”
Gemini CLI understands this, calls the firestore-get-documents tool, and instantly displays the JSON for both user documents. Alex confirms the bug—the wishlist array contains stale data. She continues the conversation to understand the scope:
“Okay, that’s the bug. Find all users in the users-staging collection whose wishlist contains product-glasses(inactive).”
Gemini CLI uses the firestore-query-collection tool and reports back that 20 test accounts are affected. After developing a code fix, she needs to clean the test environment to verify it.
“For all 20 test users you just found, please remove product-glasses(inactive) from their wishlist.”
Gemini CLI confirms the plan and uses the firestore-update-document tool to perform the cleanup, clearing the way for a successful re-test.
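Under the hood, that cleanup amounts to a query-and-update pass over the collection. A rough equivalent using the Python Firestore client is sketched below (collection, field, and product names follow the hypothetical scenario above; the MCP tools perform this work for you):

# Sketch of the equivalent query-and-update pass; names follow the hypothetical scenario.
from google.cloud import firestore

db = firestore.Client()

STALE_ITEM = "product-glasses(inactive)"

# Find all staging users whose wishlist still contains the discontinued product.
# (Newer client versions prefer the filter=FieldFilter(...) form of where().)
affected = (
    db.collection("users-staging")
    .where("wishlist", "array_contains", STALE_ITEM)
    .stream()
)

# Remove the stale entry from each affected document.
for doc in affected:
    doc.reference.update({"wishlist": firestore.ArrayRemove([STALE_ITEM])})
    print(f"Cleaned wishlist for {doc.id}")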
The immediate bug is fixed, but Alex wants to ensure this class of error can never happen again. She decides to enforce the correct data structure with Firestore Security Rules.
Until now, validating security rules meant a disruptive context switch. Alex would have to copy her rules, navigate away from her terminal to the Firebase Console’s Rules Playground, or set up a local emulator just to check for syntax errors. This friction often discourages thorough, iterative testing within the main development loop.
Now, Alex drafts her new, stricter security rule. Before deploying, she asks Gemini CLI for a pre-flight check right from her terminal:
“new_rules.txt is a new Firestore Security Rule I’m working on for staging. Can you validate it for me?”
Gemini CLI uses the firestore-validate-rules tool and replies, "The issue is a missing semicolon at the end of the return statement." Alex fixes the typo instantly. For a final check, she asks:
“Show me the active Firestore security rules for this project.”
Using the firestore-get-rules tool, Gemini CLI displays the current ruleset, allowing Alex to do a final comparison. Confident, she deploys her changes.
A task that could have taken hours of manual investigation, scripting, and context switching was completed in minutes. Alex didn’t just fix a bug; she used her AI assistant to make the entire application more resilient.
Getting started
These new Firestore tools within the MCP Toolbox represent our commitment to providing you with powerful, intuitive tools that accelerate the entire development lifecycle. By connecting your Firestore database directly to your AI assistant, you can spend less time on tedious tasks and more time building incredible applications.
Learn more about MCP Toolbox for Databases, connect it to your favorite AI-assisted coding platform, and experience the future of AI-accelerated, database-connected software development today.
People often think of BigQuery in the context of data warehousing and analytics, but it is a crucial part of the AI ecosystem as well. And today, we’re excited to share significant performance improvements to BigQuery that make it even easier to extract insights from your data with generative AI.
In addition to native model inference, where computation takes place entirely in BigQuery, we offer several batch-oriented generative AI capabilities that combine distributed execution in BigQuery with remote LLM inference on Vertex AI, via functions such as:
ML.GENERATE_TEXT to generate text via Gemini, other Google-hosted partner LLMs (e.g., Anthropic Claude, Llama) or any open-source LLMs
AI.GENERATE_TABLE to generate structured tabular data via LLMs and their constrained decoding capabilities.
In addition to the above table-valued functions, you can use our row-wise functions such as AI.GENERATE for more convenient SQL query composition. All these functions are compatible with text data in managed tables and unstructured object files, such as images and documents. Thanks to the performance improvements we are unveiling today, users can expect dramatic gains in scalability, reliability, and usability across BigQuery and BigQuery ML:
Scalability: Over 100x gain for first-party LLM models (tens of millions of rows per six-hour job), over 30x gain for first-party embedding models (tens to hundreds of millions of rows per six-hour job), and added support for Provisioned Throughput quotas.
Reliability: Over 99.99% of LLM inference queries complete without any row failures, and over 99.99% row-level success rate across all jobs, with the rare per-row failures being easily retriable without failing the query.
Usability: Enhanced user experience by supporting default connections and enabling global endpoints. We also enabled a single place for quota management (in Vertex AI) by automatically retrieving quota from Vertex AI into BigQuery.
Let’s take a closer look at these improvements.
1. Scalability
We’ve dramatically improved throughput for our users on pay-as-you-go (PayGo) pricing model: ML.GENERATE_TEXT function throughput on first-party LLMs has increased by over 100x, and ML.GENERATE_EMBEDDING throughput by over 30x. For users requiring even higher performance, we’ve also added support for Vertex AI’s Provisioned Throughput (PT).
Google embedding LLMs
In BigQuery, each project has a fixed default quota for first-party text embedding models, including the most popular text-embedding-004/005 models. To enhance utilization and scalability, we introduced dynamic token-based batching to pack as many rows as possible into a single request, under the token constraint. Combined with other optimizations, this boosts scalability from 2.7 million to approximately 80 million rows (roughly 30x) per six-hour job, based on a default quota of 1,500 QPM and 50 tokens per row. You can further increase this capacity by raising your Vertex AI quota to 10,000 QPM without manual approval, which enables embedding over 500 million rows in a six-hour job.
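For reference, a batch embedding job of this shape is a single SQL call. Here is a minimal sketch run through the Python BigQuery client, where the dataset, table, column, and model names are placeholders:

# Sketch: batch text embedding with ML.GENERATE_EMBEDDING.
# Dataset, table, and model names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT content, ml_generate_embedding_result
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.embedding_model`,                         -- remote model over a text embedding endpoint
  (SELECT review AS content FROM `mydataset.product_reviews`),
  STRUCT(TRUE AS flatten_json_output)
)
"""

rows = client.query(sql).result()
print(f"Embedded {rows.total_rows} rows")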
Early access customers such as Faraday are excited for this scalability boost:
“I just did 12,678,259 embeddings in 45 min with BigQuery’s built-in Gemini. That’s about 5000 per second. Try doing that with an HTTP API!” – Seamus Abshere, Co-founder and CTO, Faraday
Google Gemini: PayGo users via dynamic shared quota
Gemini models served via Vertex AI now use dynamic shared quota (DSQ) for pay-as-you-go requests. DSQ provides access to a large, shared pool of resources, with throughput dynamically allocated based on real-time availability and demand across all customers. We rebuilt our remote inference system with adaptive traffic control and a producer-consumer-based error-retry mechanism, which lets us effectively leverage the larger, but less-guaranteed, quota available through DSQ. Based on internal benchmarking, we can now process roughly 10.2 million rows in a six-hour job with gemini-2.0-flash-001, or 9.3 million rows with gemini-2.5-flash, where each row has an average of 500 input and 50 output tokens. Specific numbers depend on factors such as token count and model. See the chart below for more details.
Google Gemini: Dedicated quota via Provisioned Throughput
While our generative AI inference with dynamic shared quota offers high throughput, it has an inherent upper bound and the potential for quota errors due to its non-guaranteed nature. To overcome these limitations, we've added support for Provisioned Throughput from Vertex AI. By purchasing dedicated capacity, you can help ensure consistently high throughput for demanding workloads and get a reliable, predictable user experience. After purchasing Vertex AI Provisioned Throughput, you can easily leverage it in BigQuery gen AI queries by setting the "request_type" argument to "dedicated".
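As a rough illustration (the dataset, table, and model names are placeholders; confirm the option against the ML.GENERATE_TEXT reference), routing a batch inference query onto your purchased Provisioned Throughput comes down to adding that request_type option:

# Sketch: send a batch ML.GENERATE_TEXT job over Provisioned Throughput.
# Dataset, table, and model names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT prompt, ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (SELECT review AS prompt FROM `mydataset.product_reviews`),
  STRUCT(
    TRUE AS flatten_json_output,
    'dedicated' AS request_type   -- use purchased Provisioned Throughput instead of shared quota
  )
)
"""

for row in client.query(sql).result():
    print(row.prompt, "->", row.ml_generate_text_llm_result)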
2. Reliability
Facing limited and non-guaranteed generative AI inference quotas, we implemented a partial failure mode that allows queries to succeed even if some rows fail. Via adaptive traffic control and a robust retry mechanism across all users' query traffic, we've now achieved: 1) over 99.99% of generative AI queries finish without a single row failure, and 2) a row-level success rate of over 99.99%. This greatly enhances BigQuery gen AI reliability.
Note that the above row-level success rate is achieved on independent queries that invoke our generative AI functions, including both table-valued functions and row-wise scalar functions, where error retries take place implicitly. If you have very large workloads that see occasional errors, you can use this simple SQL script or the Dataform package, which let you iteratively retry failed rows to reach a 100% row success rate in almost all cases.
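For a sense of what that retry pattern looks like, here is a simplified sketch that re-runs only the rows whose status column reports an error; the table and model names are placeholders, and the official SQL script or Dataform package mentioned above is the recommended starting point for production use:

# Simplified sketch of retrying only the rows that failed in a previous
# ML.GENERATE_TEXT run. Table and model names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# First pass: materialize results, including the per-row status column.
client.query("""
CREATE OR REPLACE TABLE `mydataset.gen_results` AS
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (SELECT id, review AS prompt FROM `mydataset.product_reviews`),
  STRUCT(TRUE AS flatten_json_output)
)
""").result()

# Retry pass: re-run only rows whose status column reports an error and
# merge the successful retries back into the results table.
client.query("""
MERGE `mydataset.gen_results` t
USING (
  SELECT *
  FROM ML.GENERATE_TEXT(
    MODEL `mydataset.gemini_model`,
    (SELECT id, prompt FROM `mydataset.gen_results` WHERE ml_generate_text_status != ''),
    STRUCT(TRUE AS flatten_json_output)
  )
  WHERE ml_generate_text_status = ''
) r
ON t.id = r.id
WHEN MATCHED THEN UPDATE SET
  t.ml_generate_text_llm_result = r.ml_generate_text_llm_result,
  t.ml_generate_text_status = r.ml_generate_text_status
""").result()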
3. Usability
We have also made the gen AI experience on BigQuery more user-friendly. We've streamlined complex workflows like quota management and BigQuery connection and permission setup, and enabled the global endpoint for users in all regions.
3.1 Default connection for remote model creation
Previously, setting up gen AI remote inference required a series of manual steps for users to configure a BigQuery connection and grant permissions to its service account. To solve this, we launched a new default connection. This feature automatically creates the connection and grants the necessary permissions, eliminating all manual configuration steps.
3.2 Global endpoint for first-party models
BigQuery gen AI inference now supports the global endpoint from Vertex AI, in addition to dozens of regional endpoints. The global endpoint provides higher availability across the world, which is particularly useful for smaller regions that may not have immediate access to the latest first-party gen AI models. Users without a data residency requirement for ML processing can use the global endpoint.
The SQL below illustrates how to create a Gemini model with the global endpoint and the BigQuery default connection.
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
REMOTE WITH CONNECTION DEFAULT
OPTIONS(
  endpoint = 'https://aiplatform.googleapis.com/v1/projects/my_project/locations/global/publishers/google/models/gemini-2.5-flash'
)
3.3 Automatic quota sync from Vertex AI
BigQuery now automatically syncs with Vertex AI to fetch your quota. If you have a higher-than-default quota, BigQuery uses it automatically, without any manual configuration on your part.
Get started today
Ready to build your next AI application? Start generating text, embedding, or structured tables using gen AI directly in BigQuery today. Dive into our gen AI overview page and its linked documentation for more details. If you have any questions or feedback, reach out to us at bqml-feedback@google.com.
It’s hard to believe it’s been over 10 years since Kubernetes first set sail, fundamentally changing how we build, deploy, and manage applications. Google Cloud was at the forefront of the Kubernetes revolution with Google Kubernetes Engine (GKE), providing a robust, scalable, and cutting-edge platform for your containerized workloads. Since then, Kubernetes has emerged as the preferred platform for workloads such as AI/ML.
Kubernetes is all about sharing machine resources among applications, and pod networking is essential for connectivity between workloads and services, implemented through the Container Network Interface (CNI).
As we celebrate the 10th anniversary of GKE, let's take a look at how we've built out network interfaces to provide performant, secure, and flexible pod networking, and how we've evolved our networking model to support AI workloads with the Kubernetes Network Driver.
Let’s take a look back in time and see how we got there.
2015-2017: Laying the CNI foundation with kubenet
In Kubernetes’s early days, we needed to establish reliable communication between containers. For GKE, we adopted a flat model of IP addressing so that the pods and the node could communicate freely with other resources in the Virtual Private Cloud (VPC) without going through tunnels and gateways. During these formative years, GKE’s early networking models often used kubenet, a basic network plugin. Kubenet provided a straightforward way to get clusters up and running, by creating a bridge on each node and allocating IP addresses to pods from a CIDR range dedicated to that node. While Google Cloud’s network handled routing between nodes, Kubenet was responsible for connecting pods to the node’s network and enabling basic pod-to-pod communication within the node.
During this time, we also introduced route-based clusters, which were based on Google Cloud routes, part of the Andromeda engine that powers all of cloud networking. The routes feature in Andromeda played a crucial role in IP address allocation and routing within the cluster network using VPC routing rules. This required advertising the pod IP ranges between the nodes.
However, as Kubernetes adoption exploded and applications grew in scale and complexity, we faced challenges around managing IP addresses and achieving high-performance communication directly between pods across different parts of a VPC. This pushed us to develop a networking solution that was more deeply integrated with the underlying cloud network.
2018-2019: Embracing VPC-native networking
To address these evolving needs and integrate with Google Cloud’s powerful networking capabilities, we introduced VPC-native networking for GKE. This marked a significant leap forward for how CNI operates in GKE, with alias IP ranges (the IP ranges that pods use in a node) becoming a cornerstone of the solution. VPC-native networking became the default and recommended approach, helping to increase the scale of the GKE clusters up to 15K nodes. With VPC-native clusters, the GKE CNI plugin ensures that pods receive IP addresses directly from your VPC network — they become first-class citizens on your network.
This shift brought a multitude of benefits:
Simplified IP management: GKE CNI plugin works with GKE to allocate pod IPs directly from the VPC, making them directly routable and easier to manage alongside your other cloud resources.
Enhanced security through VPC integration: Because pod IPs are VPC-native, you can apply VPC firewall rules directly to pods.
Improved performance and scalability: GKE CNI plugin facilitates direct routing within the VPC, reducing overhead and improving network throughput for pod traffic.
A foundation for advanced CNI features: VPC-native networking laid the groundwork for more sophisticated CNI functionalities that followed.
We referred to GKE's implementation of CNI with VPC-native networking as GKE standard networking with dataplane v1 (DPv1). During this time, we also announced GA support for network policies with Calico. Network policies allow you to specify rules for traffic flow within your cluster, and also between pods and the outside world.
2020 and beyond: The eBPF revolution
The next major evolution in GKE’s CNI strategy arrived with the power of extended Berkeley Packet Filter or eBPF, which lets you run sandboxed programs in a privileged context. eBPF makes it safe to program the Linux kernel dynamically, opening up new possibilities for networking, security, and observability at the CNI level without having to recompile the kernel.
Recognizing this potential, Google Cloud embraced Cilium, a leading open-source CNI project built on eBPF, to create GKE Dataplane V2 (DPv2). Reaching general availability in May 2021, GKE Dataplane V2 represented a significant leap in GKE’s CNI capabilities:
Enhanced performance and scalability: By leveraging eBPF, CNI can bypass traditional kernel networking paths (like kube-proxy’s iptables-based service routing) for highly efficient packet processing for services and network policy.
Built-in network policy enforcement: GKE Dataplane V2 comes with Kubernetes network policy enforcement out-of-the-box, meaning you don’t need to install or manage a separate CNI like Calico solely for policy enforcement when using DPv2.
Enhanced observability at the data plane layer: eBPF enables deep insights into network flows directly from the kernel. GKE Dataplane V2 provides the foundation for features like network policy logging, offering visibility into CNI-level decisions.
Integrated security in the dataplane: eBPF enforces network policies efficiently and with context-awareness directly within CNI’s dataplane.
Simplified operations: As a Google-managed CNI component, GKE Dataplane V2 simplifies operations for customer workloads.
Advanced networking capabilities: Dataplane V2 unlocks a suite of powerful features that were not available or harder to achieve with Data Plane V1. These include:
Multi-networking: Allowing pods to have multiple network interfaces, connecting to different VPC networks or specialized network attachments, crucial for use cases like cloud native network functions (CNFs) and traffic isolation.
Service steering: Providing fine-grained control over traffic flow by directing specific traffic through a chain of service functions (like virtual firewalls or inspection points) within the cluster.
Persistent IP addresses for pods: In conjunction with the Gateway API, GKE Dataplane V2 allows pods to retain the same IP address across restarts or rescheduling, which is vital for certain stateful applications or network functions.
GKE Dataplane V2 is now the default CNI for new clusters in GKE Autopilot mode and our recommended choice for GKE Standard clusters, underscoring our commitment to providing a cutting-edge, eBPF-powered network interface.
2024: Scaling new heights for AI Training and Inference
In 2024, we marked a monumental achievement in GKE’s scalability, with the announcement that GKE supports clusters of up to 65,000 nodes. This incredible feat, a significant jump from previous limits, was made possible in large part by GKE Dataplane V2’s robust, highly optimized architecture. Powering such massive clusters, especially for demanding AI/ML training and inference workloads, requires a dataplane that is not only performant but also incredibly efficient at scale. The version of GKE Dataplane V2 underpinning these 65,000-node clusters is specifically enhanced for extreme scale and the unique performance characteristics of large-scale AI/ML applications — a testament to CNI’s ability to push the boundaries of what’s possible in cloud-native computing.
For AI/ML workloads, GKE Dataplane V2 also supports ever-increasing bandwidth requirements, such as in our recently announced A4 instance. GKE Dataplane V2 also supports a variety of compute and AI/ML accelerators, such as the latest GB200 GPUs and Ironwood and Trillium TPUs.
For today's AI/ML workloads, networking plays a critical role: AI and machine learning workloads are pushing the boundaries of computing as well as networking, presenting unique challenges for GKE networking interfaces:
Extreme throughput: Training large models requires processing massive datasets that demand upwards of terabit networking orchestrated by GKE networking interfaces.
Ultra-low latency: Distributed training relies on near-instantaneous communication between processing units.
Multi-NIC capabilities: Providing pods with multiple network interfaces, managed by GKE Dataplane V2's multi-networking capability, can significantly boost bandwidth and allow for network segmentation.
2025 and beyond: Addressing next-gen pod networking challenges beyond CNI
Dynamic resource allocation (DRA) for networking
A promising Kubernetes innovation is dynamic resource allocation (DRA). Introduced to provide a more flexible and extensible way for workloads to request and consume resources beyond CPU and memory, DRA is poised to significantly impact how CNIs manage and expose network resources. While initially focused on resources like GPUs, its framework is designed for broader applicability.
In GKE, DRA (available in preview from GKE version 1.32.1-gke.1489001+) opens up possibilities for more efficient and tailored network resource management, helping demanding applications get the network performance they need using the Kubernetes Network Drivers (KNDs).
KNDs use DRA to expose network resources at the node level that can be referenced by Pods (or by individual containers). This is particularly relevant for AI/ML workloads, which often require very specific networking capabilities.
Looking ahead: Innovations shaping the future
The journey doesn’t stop here. With the increased adoption of accelerated workloads driving new architectures on GKE, the demands on Kubernetes networking will continue. One of the reference implementations for the Kubernetes Network Driver is the DRANET project. We look forward to continued discussions with the community and contributions to the DRANET project. We are committed to working with the community to deliver innovative customer centric solutions addressing these new challenges.
Second-generation AWS Outposts racks can now be shipped and installed at your data center and on-premises locations in Australia, Bahrain, Brazil, Brunei, Chile, Costa Rica, Egypt, European Union countries, Iceland, Indonesia, Israel, Japan, Jordan, Kenya, the Kingdom of Saudi Arabia, Kuwait, Malaysia, New Zealand, Peru, the Philippines, Singapore, Trinidad and Tobago, Türkiye, the United Arab Emirates (UAE), the United Kingdom, and Vietnam.
Outposts racks extend AWS infrastructure, AWS services, APIs, and tools to virtually any on-premises data center or colocation space for a truly consistent hybrid experience. Outposts racks are ideal for workloads that require low-latency access to on-premises systems, local data processing, and migration of applications with local system interdependencies. Outposts racks can also help meet data residency requirements. Second-generation Outposts racks support the latest generation of x86-powered Amazon Elastic Compute Cloud (Amazon EC2) instances, starting with C7i, M7i, and R7i instances. These instances provide up to 40% better performance compared to C5, M5, and R5 instances on first-generation Outposts racks. Second-generation Outposts racks also offer simplified network scaling and configuration, and support a new category of accelerated networking Amazon EC2 instances optimized for ultra-low latency and high throughput needs.
With the availability of second-generation Outposts racks in the above countries, you can use AWS services to run your workloads and data in country in your on-premises facilities and connect to the nearest available AWS Region for management and operations.
To learn more about second-generation Outposts racks, read this blog post and the user guide. For the most updated list of countries and territories and the AWS Regions where second-generation Outposts racks are supported, check out the Outposts racks FAQs page.
Amazon CloudWatch now offers cross-account and cross-region log centralization, allowing customers to copy log data from multiple AWS accounts and regions into a single destination account. This capability seamlessly integrates with AWS Organizations, enabling efficient aggregation of logs from workloads that span multiple accounts and regions into a single account without the need to manage custom solutions.
The log centralization feature provides the ability to scope the centralization rules to copy log data from their entire organization, specific organizational units, or selected accounts into a single account. To maintain source context and data lineage, log events are enriched with new system fields (@aws.account and @aws.region) that identify the original source account and region. Additional capabilities include selective log group copying, automatic merging of same-named log groups in the destination account, and optional backup region setup, simplifying centralized log management.
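Once logs land in the destination account, those enrichment fields can be queried like any other field. As a hedged sketch, the following Logs Insights query, run through the CloudWatch Logs API, breaks down error volume by originating account and region; the log group name, filter, and time window are illustrative:

# Sketch: query centralized logs by source account/region using the new
# @aws.account and @aws.region fields. Log group name and time window are
# illustrative assumptions.
import time
import boto3

logs = boto3.client("logs", region_name="us-east-1")

query = """
fields @timestamp, @aws.account, @aws.region, @message
| filter @message like /ERROR/
| stats count(*) as errors by @aws.account, @aws.region
| sort errors desc
"""

start = logs.start_query(
    logGroupName="/centralized/application-logs",   # hypothetical destination log group
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the query completes, then print the per-account/region error counts.
while True:
    result = logs.get_query_results(queryId=start["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})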
Log centralization is available in US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), and South America (São Paulo).
To learn more, visit the Amazon CloudWatch documentation. Customers can centralize one copy of logs for free. Additional copies are charged at $0.05/GB of logs centralized (the backup region feature is considered an additional copy). For details, visit the CloudWatch Pricing page.
Amazon EventBridge now extends AWS Key Management Service (KMS) customer managed key support to event bus rule filter patterns and input transformers. This capability enables you to use your own encryption keys to protect sensitive information in your event filtering and transformation logic to meet stringent security and compliance requirements while maintaining full control over your encryption keys.
Amazon EventBridge is a serverless event router that enables you to create scalable event-driven applications by routing events between your applications, third-party SaaS applications, and AWS services. Filter patterns determine which events match your rules, while input transformers allow you to customize the event data before sending it to targets. By encrypting these components with customer managed keys, you can help meet your organization’s compliance and governance requirements and use AWS CloudTrail to audit and track encryption key usage.
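For context, the components now covered by customer managed keys are the ones you already define on an event bus, its rules, and their targets. The sketch below shows where a filter pattern and an input transformer fit; the bus name, key ARN, pattern, and target ARN are placeholders, and the exact parameters should be confirmed in the EventBridge documentation:

# Sketch: a custom event bus encrypted with a customer managed key, plus a
# rule whose filter pattern and input transformer are protected by that key.
# Names, ARNs, and the pattern itself are placeholders; verify parameters
# against the EventBridge documentation.
import json
import boto3

events = boto3.client("events", region_name="us-east-1")

# Custom event bus encrypted with your own KMS key.
events.create_event_bus(
    Name="orders-bus",
    KmsKeyIdentifier="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
)

# Rule filter pattern: only match high-value orders.
events.put_rule(
    Name="high-value-orders",
    EventBusName="orders-bus",
    EventPattern=json.dumps({
        "source": ["com.example.orders"],
        "detail": {"amount": [{"numeric": [">", 1000]}]},
    }),
)

# Input transformer: reshape the event before it reaches the target.
events.put_targets(
    Rule="high-value-orders",
    EventBusName="orders-bus",
    Targets=[{
        "Id": "notify-lambda",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:notify-ops",
        "InputTransformer": {
            "InputPathsMap": {"orderId": "$.detail.orderId", "amount": "$.detail.amount"},
            "InputTemplate": '{"message": "Order <orderId> for $<amount> needs review"}',
        },
    }],
)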
This feature is available in all commercial AWS Regions. Using this feature incurs no additional cost, but standard AWS KMS pricing applies. To learn more, visit the EventBridge documentation and AWS KMS documentation.
State and local governments across the nation face a myriad of challenges, including strained budgets, aging infrastructure, and a complex regulatory landscape. In California, these challenges are compounded by a rapidly growing population and increasing demand for public services. To address these issues, the state is turning to technology as a catalyst for change.
California, a state renowned for its innovation and technological advancements, is embracing the future of public sector services through its partnership with Google Cloud. By leveraging Google Cloud's latest AI and machine learning, security, and infrastructure technologies, we are helping agencies across the state improve service delivery, streamline operations, bolster security, drive innovation, and ultimately improve the lives of Californians.
Streamlining health coverage: How Covered California innovates with Google Cloud
A great example of our partnership is Covered California, the state’s health insurance marketplace. By utilizing Google Cloud’s Document AI, Covered California has significantly streamlined its verification processes, enabling faster and more accurate eligibility determinations. This has led to increased efficiency and reduced administrative burdens, ultimately benefiting millions of Californians. Covered California also uses Assured Workloads to strengthen its security posture and comply with strict industry regulations, safeguarding sensitive patient data.
Covered California's commitment to innovation extends beyond administrative efficiency, as demonstrated by its adoption of Google Security Operations, our intelligence-driven and AI-powered security operations platform. This underscores its dedication to safeguarding sensitive patient data. By leveraging advanced threat detection and response capabilities, Covered California is bolstering its security posture and ensuring the integrity of its systems.
Beyond Covered California, Google Cloud is joining forces with other innovative California organizations to leverage AI and machine learning to drive innovation and solve complex challenges.
University of California, Riverside (UCR): Expanding academic frontiers with Google Cloud’s AI tools
The University of California, Riverside (UCR) is leveraging Google Cloud’s AI and machine learning tools to revolutionize its research and academic capabilities. By utilizing Google Cloud’s Vertex AI platform, UCR researchers can accelerate the development and deployment of AI models, analyze complex datasets, and gain valuable insights. This enables UCR to explore groundbreaking research areas, such as natural language processing, computer vision, and predictive analytics. Additionally, Google Cloud’s infrastructure and storage solutions, like Google Cloud Compute Engine and Google Cloud Storage, provide the necessary foundation to support these AI initiatives.
Pushing the frontiers of science: Caltech leverages Google Cloud AI for groundbreaking research
This commitment to empowering groundbreaking research extends to the very cutting edge of scientific inquiry. Building on the foundation of providing powerful AI tools, we are also helping top-tier institutions build the next generation of research infrastructure. A powerful illustration of this is our initiative to support AI-optimized High-Performance Computing (HPC) for researchers at the California Institute of Technology (Caltech), enabling them to tackle humanity’s most complex challenges and accelerate discoveries that will benefit society for generations to come.
Advancing public service, together
The ongoing advancements in AI are creating a pivotal moment for transformation within state and local government, and Google Cloud is delivering tools designed to meet these unique needs. Agentspace empowers teams by providing a unified way to search and access vital information spread across different agency systems, streamlining complex workflows for quicker, more informed decision-making. Google Workspace revolutionizes daily operations by enabling seamless collaboration through real-time document co-editing and integrated communication tools like Google Meet and Chat. With the AI assistance of Gemini in Workspace, teams can significantly boost productivity by simplifying tasks such as drafting communications or summarizing reports, all while ensuring data is protected by enterprise-grade security. Together, these solutions help break down silos, save valuable time, and allow your teams to focus more on serving your communities effectively.
Google Public Sector is dedicated to supporting California’s ongoing digital transformation and partnering with state and local governments across the United States. Our commitment is to help agencies leverage Google’s AI, security, and infrastructure solutions to enhance service delivery, build more resilient and equitable communities, and better meet the evolving needs of the citizens they serve.
Join us to dive deeper into the technologies and strategies powering the next generation of government at the Google Public Sector Summit on October 29th in Washington, D.C. to see these innovations firsthand. This is your opportunity to connect with innovators, hear from public sector leaders, and learn how Google’s latest advancements in AI, security, and cloud can help you meet your mission. Register now to secure your spot!
Today AWS announced custom time periods for AWS Budgets, a new capability that lets you create budgets with flexible start and end dates. This enhancement allows you to define budget periods that align with your organization’s specific needs, moving beyond traditional calendar-based periods like monthly, quarterly, or annual budgets.
Custom time periods help you accurately monitor costs for projects with specific duration and funding limits. For example, if you have a three-month development project starting mid-month, you can create a single budget for that exact time frame and receive alerts when spending approaches your thresholds. This eliminates the need to calculate and split your project budget across multiple calendar months or maintain separate spreadsheets to track time-bound initiatives.
Custom time periods in AWS Budgets is available today in all AWS commercial Regions, except the AWS GovCloud (US) Regions and the China Regions.
Trading in capital markets demands peak compute performance, with every microsecond impacting critical decisions and market outcomes. At Google Cloud, we’re committed to providing global markets with the cutting-edge infrastructure they need to create and participate in digital exchange ecosystems. Our industry investments enable a purpose-built, cloud-native market infrastructure solution leveraging a global network that was built for security and scale, data, and AI capabilities. At the same time, we’re building industry-specific innovations that offer performant, scalable, and resilient environments for exchanges and trading participants, ultimately transforming how they access markets, utilize data, and manage risk.
The general availability of our C3 and C4 machine series, powered by the latest Intel Xeon Scalable processors and our custom Titanium offload network investments, represents a significant leap forward for latency-sensitive trading applications. In addition, Citadel Securities recently joined us as part of our Cloud WAN layer 2 solution that enables point-to-point connections over Google’s proprietary global network and complements our existing Network Connectivity Center layer 3 solution. Together, these offerings help trading participants achieve low-latency compute and connectivity for their globally distributed trading infrastructure.
Building on these foundational investments, we’re excited to announce new benchmarks specifically tailored for trading participants who require minimal latency and jitter with maximal throughput to handle the increasing market velocity of their trading infrastructures. In collaboration with 28Stone, a consultancy with expertise in capital markets technology and electronic trading solutions, we’ve tested and validated the performance of our C3 machine types to meet the needs of trading participants.
28Stone’s published report highlights that Google Cloud can achieve a round-trip trading decision in less than 50 microseconds at P99 across a range of compute profiles.
“As a 24×7 digital exchange focused on delivering institutional exchange level consistency in the cloud, we are excited to see that Bullish’s participants can immediately leverage what 28Stone and Google Cloud have publicly demonstrated: the technical capabilities to rapidly spot opportunity and engage with confidence.” – Alan Fraser, VP of Platform & Operations, Bullish
The role of latency, jitter, and throughput
In today’s hyper-competitive electronic trading landscape, the performance of underlying technology infrastructure is not just a contributing factor to success — it’s fundamental. Three key performance metrics stand out for their impact on trading outcomes: latency, jitter, and throughput.
Latency: the race
In the context of trading, latency refers to the time it takes to receive, understand, and decide an action from a single datum. For trading systems, this means the time it takes for market data to reach the trading algorithm, for that algorithm to make a decision, and finally send a response back to the exchange. In a world of high-frequency trading (HFT) and algorithmic execution, single-microsecond delays can mean the difference between a profitable trade and a missed opportunity, a less favorable execution price, or getting filled on a resting order.
28Stone demonstrated latency for participants to make a straightforward trade decision — from receiving the ticks to processing a trade decision — of between 1.5µs and 3.5µs (P50 and P99, respectively) for normal replay speed of CME Group Equity market pcap files using open source Data Plane Development Kit (DPDK) network acceleration. Additionally, they demonstrated similar latency profiles with data rates increasing by up to 100x.
Jitter: the quest for consistency
Jitter refers to the variation in latency over time. While low latency is critical, high jitter — meaning unpredictable and inconsistent delays — can negatively impact trading performance. If the time it takes for an order to reach the exchange varies significantly, it becomes incredibly difficult to predict execution outcomes, manage risk, or implement trading strategies that rely on time certainty. If you were to use UberEats to order lunch but were told it would be delivered between 12 and 5pm, you would be unlikely to choose that option.
28Stone demonstrated that participants can expect their experience in Google Cloud to be uniform, regardless of volatile market conditions. Google Cloud infrastructure delivers low jitter, allowing trading algorithms to operate with a higher degree of certainty, leading to more stable and reliable market mechanics – as well as profitability from good trading signals.
As illustrated below for various C3 instance sizes using C++ with or without DPDK, the percentiles demonstrate strong consistent message performance. The increase in latency between each percentile demonstrates network and compute consistency measured in single and low tens of microseconds. The 28Stone report contains complete histograms for various configurations, allowing customers to see how to balance their specific latency and jitter requirements against the configurations’ cost profiles.
Throughput: handling the pressure of information
Throughput measures the amount of data that can be processed or transmitted within a given window of time (typically one second). In trading, throughput is a system’s capacity to handle large volumes of market data updates, process numerous events simultaneously, and aggregate trade events efficiently — especially during periods of high market volatility or peak trading hours. Insufficient throughput can lead to data queues, order rejections, increased risk exposure, and an inability to keep pace with market activity.
The C3 machine series leverages Google Cloud’s high-bandwidth networking capabilities, which include up to 200 Gbps per VM Tier 1 networking and services such as Cloud WAN. These machines are designed to provide the high throughput traders need to ingest vast streams of market data and execute on a large number of orders per second. The result is a trading system that performs optimally under strenuous load.
As highlighted, 28Stone tested increasingly higher replay speeds, resulting in higher bit rates that various trading systems may see in a variety of markets.
In summary, minimizing latency, reducing jitter, and maximizing throughput aren’t abstract technical pursuits; they are about consistency and certainty, akin to your lunch order arriving near your lunch break and not in the evening. Modern capital markets trading participants and digital exchanges demand these capabilities to enable market quality, fairness, and operational resilience.
Embracing new market dynamics with cloud
Financial markets are in a perpetual state of flux, characterized by evolving regulations, a proliferation of new asset classes (including those that trade 24/7 like FX and digital assets), and sudden, sharp spikes in trading volumes. In this dynamic environment, the ability to scale infrastructure rapidly, operate with flexibility, and manage costs effectively is not just an advantage — it’s a prerequisite for survival and growth.
With Google Cloud, exchanges and participants enjoy several benefits:
Elastic scalability: Automatically scale resources up or down based on real-time demand. This means that during unexpected volume spikes — driven by market news, geopolitical events, or algorithmic trading activity — trading infrastructure can dynamically access additional compute power to maintain optimal performance. When volumes normalize, resources can be scaled back down, so that firms only pay for what they use.
Flexibility for continuous trading: Cloud infrastructure provides the resilience and availability that continuous 24/7 trading requires, as demonstrated daily by critical workloads for global 24/7 industry platforms like air travel, retail banking, media, and retail. Google Cloud’s global network and multiple regions help ensure high uptime and fault tolerance, both critical for markets that never sleep. As seen with Deutsche Börse’s development of a digital asset trading platform on Google Cloud, the architecture is designed for 24/7 availability and can be rolled out quickly to new markets. Google Cloud’s continuous operations allow firms to capitalize on opportunities around the clock without the massive investment and operational overhead of maintaining private data centers with equivalent N+1 redundancy globally.
Rapid engagement in new markets: Historically, expanding into new geographical markets or launching new asset classes involved lengthy and costly infrastructure build-outs. With Google Cloud’s global regions, firms can deploy trading infrastructure in new regions in a fraction of the time. This agility allows for rapid market entry, thereby enabling businesses to seize new revenue streams and diversify their business with lower upfront investment costs and risk. Quickly provisioning and de-provisioning resources also means firms can experiment with new market strategies more freely, knowing they are not locked into long-term hardware commitments.
Want to see it for yourself?
By harnessing the scale, flexibility, and compelling cost-to-performance ratio of Google Cloud, including the powerful C3 and C4 family instances, trading participants can transform market volatility from a threat into an opportunity. They can confidently handle volume surges, support round-the-clock trading, and swiftly enter new markets, all while maintaining tight control over their operational expenditure and maximizing their competitive edge.
Want to apply these capabilities to the announced CME Group market migration to Google Cloud? Register with Google Cloud for notifications and engagements.
Amazon Relational Database Service (RDS) for MySQL now supports new Amazon RDS Extended Support minor version 5.7.44-RDS.20250818. We recommend that you upgrade to this version to fix known security vulnerabilities and bugs in prior versions of MySQL. Learn more about upgrading your database instances, including minor and major version upgrades, in the Amazon RDS User Guide.
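As an illustrative sketch outside the announcement itself, a minor version upgrade can be applied with the AWS SDK; the instance identifier below is a placeholder, and the target version should be confirmed against the versions offered for your instance.

```python
# Illustrative sketch: apply the Extended Support minor version to an existing
# RDS for MySQL 5.7 instance. The instance identifier is a placeholder;
# ApplyImmediately=False defers the change to the next maintenance window.
import boto3

rds = boto3.client("rds")
rds.modify_db_instance(
    DBInstanceIdentifier="my-mysql-57-instance",   # placeholder
    EngineVersion="5.7.44-RDS.20250818",           # version from this announcement
    ApplyImmediately=False,
)
```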
Amazon RDS Extended Support provides you more time, up to three years, to upgrade to a new major version to help you meet your business requirements. During Extended Support, Amazon RDS will provide critical security and bug fixes for your MySQL databases on Aurora and RDS after the community ends support for a major version. You can run your MySQL databases on Amazon RDS with Extended Support for up to three years beyond a major version’s end of standard support date. Learn more about Extended Support in the Amazon RDS User Guide and the Pricing FAQs.
Amazon RDS for MySQL makes it simple to set up, operate, and scale MySQL deployments in the cloud. See Amazon RDS for MySQL Pricing for pricing details and regional availability. Create or update a fully managed Amazon RDS database in the Amazon RDS Management Console.
Today, AWS End User Messaging SMS announces support for AWS CloudFormation, enabling customers to deploy and manage SMS resources using AWS CloudFormation templates. Using AWS CloudFormation, customers can standardize how they set up and manage their SMS resources alongside their other AWS resources in their development environments, simplifying deployments and delivery pipelines. SMS resources supported via CloudFormation include phone numbers, sender IDs, configuration sets, protection configurations, opt-out lists, resource policies, and phone pools.
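As a hedged sketch, not part of the announcement, one of these resources could be deployed through a CloudFormation stack from the AWS SDK; the stack name is a placeholder, and the resource type and property names shown are assumptions to be checked against the CloudFormation resource reference for AWS End User Messaging SMS.

```python
# Illustrative sketch: deploy an SMS opt-out list via a CloudFormation stack.
# The stack name is a placeholder, and the resource type / property names are
# assumptions to verify against the CloudFormation resource reference.
import json
import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "MarketingOptOutList": {
            "Type": "AWS::PinpointSMSVoiceV2::OptOutList",   # assumed type name
            "Properties": {"OptOutListName": "marketing-opt-outs"},
        }
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="sms-resources-example",   # placeholder
    TemplateBody=json.dumps(template),
)
```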
AWS End User Messaging provides developers with a scalable and cost-effective messaging infrastructure without compromising the safety, security, or results of their communications. Developers can integrate messaging to support use cases such as one-time passcodes (OTP) at sign-up, account updates, appointment reminders, delivery notifications, promotions, and more.
Support for CloudFormation for SMS resources is available in all AWS Regions where AWS End User Messaging is available; see the AWS Region table.
AWS Parallel Computing Service (PCS) now supports Amazon EC2 Capacity Blocks for ML. You can now use Amazon EC2 instances reserved using EC2 Capacity Blocks natively in PCS clusters.
Native support for EC2 Capacity Blocks in PCS simplifies capacity planning for cutting-edge GPU-based workloads in Slurm clusters, helping to ensure that GPU capacity is available when and where it’s needed. EC2 Capacity Blocks can be associated with PCS compute node groups via an EC2 Launch Template.
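As an illustrative sketch, a launch template that targets a Capacity Block reservation can be created with the AWS SDK and then referenced from a PCS compute node group; the reservation ID, template name, and instance type below are placeholders.

```python
# Illustrative sketch: create an EC2 Launch Template that launches instances
# into a Capacity Block reservation. The reservation ID and instance type are
# placeholders; reference the resulting template from a PCS compute node group.
import boto3

ec2 = boto3.client("ec2")
ec2.create_launch_template(
    LaunchTemplateName="pcs-capacity-block-gpu",       # placeholder
    LaunchTemplateData={
        "InstanceType": "p5.48xlarge",                 # placeholder GPU instance
        "InstanceMarketOptions": {"MarketType": "capacity-block"},
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationId": "cr-0123456789abcdef0"  # placeholder
            }
        },
    },
)
```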
PCS is a managed service that makes it easier for you to run and scale your high performance computing (HPC) workloads and build scientific and engineering models on AWS using Slurm. You can use PCS to build complete, elastic environments that integrate compute, storage, networking, and visualization tools. PCS simplifies cluster operations with managed updates and built-in observability features, helping to remove the burden of maintenance. You can work in a familiar environment, focusing on your research and innovation instead of worrying about infrastructure.
PCS now supports EC2 Capacity Blocks in all AWS Regions where both services are available. Read more about PCS support for EC2 Capacity Blocks in the PCS User Guide.
Today, Amazon Elastic Kubernetes Service (EKS) announced a new catalog of community add-ons that includes metrics-server, kube-state-metrics, cert-manager, prometheus-node-exporter, fluent-bit, and external-dns. This enables you to easily find, select, configure, and manage popular open-source Kubernetes add-ons directly through EKS. Each add-on has been packaged, scanned, and validated for compatibility by EKS, with container images securely hosted in an EKS-owned private Amazon Elastic Container Registry (ECR) repository.
To make Kubernetes clusters production-ready, you need to integrate various operational tools and add-ons. These add-ons can come from various sources including AWS and open-source community repositories. Now, EKS makes it easy for you to access a broader selection of add-ons, providing a unified management experience for AWS and community add-ons. You can view available add-ons, compatible versions, configuration options, and install and manage them directly through the EKS Console, API, CLI, eksctl, or IaC tools like AWS CloudFormation.
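For example, a community add-on named in this announcement can be discovered and installed through the EKS add-ons API; the cluster name below is a placeholder, and this sketch is only one of the management paths (Console, CLI, eksctl, and IaC tools work as well).

```python
# Illustrative sketch: list validated versions of a community add-on and
# install it on an existing cluster. The cluster name is a placeholder;
# omitting addonVersion lets EKS choose a compatible default.
import boto3

eks = boto3.client("eks")

# List versions of the community add-on that EKS has validated.
for addon in eks.describe_addon_versions(addonName="metrics-server")["addons"]:
    print(addon["addonName"], [v["addonVersion"] for v in addon["addonVersions"]])

# Install the add-on on an existing cluster.
eks.create_addon(
    clusterName="my-cluster",        # placeholder
    addonName="metrics-server",      # community add-on named in this announcement
)
```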
This feature is available in all AWS GovCloud (US) Regions. To learn more, visit the EKS documentation.
Amazon Web Services (AWS) announces the availability of high performance Storage Optimized Amazon EC2 I7i instances in the AWS South America (São Paulo) and Canada West (Calgary) Regions. Powered by 5th generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.2 GHz, these new instances deliver up to 23% better compute performance and more than 10% better price performance over previous generation I4i instances. Equipped with 3rd generation AWS Nitro SSDs, I7i instances offer up to 45TB of NVMe storage with up to 50% better real-time storage performance, up to 50% lower storage I/O latency, and up to 60% lower storage I/O latency variability compared to I4i instances.
I7i instances offer the best compute and storage performance among x86-based storage optimized instances in Amazon EC2, making them ideal for I/O-intensive and latency-sensitive workloads that demand very high random IOPS with real-time latency to access small- to medium-sized datasets (multiple TBs). Additionally, the torn write prevention feature supports block sizes up to 16KB, enabling customers to eliminate database performance bottlenecks.
I7i instances are available in eleven sizes – nine virtual sizes up to 48xlarge and two bare metal sizes – delivering up to 100Gbps of network bandwidth and 60Gbps of Amazon Elastic Block Store (EBS) bandwidth. To learn more, visit the I7i instances page.
Amazon Lex now allows you to leverage large language models (LLMs) to improve the natural language understanding of your deterministic conversational AI bots in eight new languages: Chinese, Japanese, Korean, Portuguese, Catalan, French, Italian, and German. With this capability, your voice- and chat-bots can better handle complex utterances, maintain accuracy despite spelling errors, and extract key information from verbose inputs to fulfill the customer’s request. For example, a customer could say ‘Hi I want to book a flight for my wife, my two kids and myself’, and the LLM will correctly determine that the request is to book flight tickets for four people.
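The sketch below does not show how the LLM assistance itself is enabled (that is configured on the bot); it only illustrates sending the example utterance to a Lex V2 bot at runtime and reading back the interpreted intent and slots. The bot ID, alias ID, and session ID are placeholders for an existing bot in one of the newly supported locales.

```python
# Illustrative sketch: send the verbose utterance from the example above to a
# Lex V2 bot and inspect the top interpretation. Bot ID, alias ID, and session
# ID are placeholders for an existing bot with the improved understanding enabled.
import boto3

lex = boto3.client("lexv2-runtime")
response = lex.recognize_text(
    botId="ABCDEFGHIJ",            # placeholder
    botAliasId="TSTALIASID",       # placeholder
    localeId="fr_FR",              # one of the newly supported locales
    sessionId="demo-session-1",    # placeholder
    text="Hi I want to book a flight for my wife, my two kids and myself",
)
top = response["interpretations"][0]["intent"]
print(top["name"], top["slots"])
```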
This feature is available in 10 commercial AWS Regions where Amazon Connect is available: Europe (Ireland), Europe (Frankfurt), US East (N. Virginia), Asia Pacific (Seoul), Europe (London), Asia Pacific (Tokyo), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), and Canada (Central). To learn more about this feature, visit the Amazon Lex documentation. To learn how Amazon Connect and Amazon Lex deliver cloud-based conversational AI experiences for contact centers, visit the Amazon Connect website.