Amazon Bedrock now offers comprehensive CloudWatch metrics support for Agents, enabling developers to monitor, troubleshoot, and optimize their agent-based applications with greater visibility. This new capability provides detailed runtime metrics for both InvokeAgent and InvokeInlineAgent operations, including invocation counts, latency measurements, token usage, and error rates, helping customers better understand their agents’ performance in production environments.
With CloudWatch metrics integration, developers can track critical performance indicators such as total processing time, time-to-first-token (TTFT), model latency, and token counts across different dimensions including operation type, model ID, and agent alias ARN. These metrics enable customers to identify bottlenecks, detect anomalies, and make data-driven decisions to improve their agents’ efficiency and reliability. Customers can also set up CloudWatch alarms to receive notifications when metrics exceed specified thresholds, allowing for proactive management of their agent deployments.
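For example, an alarm on agent latency could be created with a single AWS CLI call, sketched below. The namespace, metric name, and dimension shown are placeholders rather than confirmed values, so check the Amazon Bedrock documentation for the exact names emitted for InvokeAgent.

# Placeholder namespace, metric, and dimension names; replace with the values documented for Bedrock Agents
aws cloudwatch put-metric-alarm \
  --alarm-name bedrock-agent-latency-high \
  --namespace AWS/Bedrock/Agents \
  --metric-name TotalProcessingTime \
  --dimensions Name=Operation,Value=InvokeAgent \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 5000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:agent-alerts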
CloudWatch metrics for Amazon Bedrock Agents is now available in all AWS Regions where Amazon Bedrock is supported. To get started with monitoring your agents, ensure your IAM service role has the appropriate CloudWatch permissions. For more information about this feature and implementation details, visit the Amazon Bedrock documentation or refer to the CloudWatch User Guide for comprehensive monitoring best practices.
AWS CodeBuild now supports new IAM condition keys enabling granular access control on CodeBuild’s resource-modifying APIs. The new condition keys cover most of CodeBuild’s API request contexts, including network settings, credential configurations and compute restrictions. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
The new condition keys allow you to create IAM policies that better enforce your organizational policies on CodeBuild resources such as projects and fleets. For example, you can use the codebuild:vpcConfig.vpcId condition key to enforce VPC connectivity settings on projects or fleets, the codebuild:source.buildspec condition key to prevent unauthorized modifications to project buildspec commands, and the codebuild:computeConfiguration.instanceType condition key to restrict which compute types your builds can use. A hedged example follows.
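As an illustration, the sketch below creates a policy with a deny statement that blocks project creation or updates unless the project is configured for an approved VPC. The policy name, action list, and VPC ID are placeholders.

aws iam create-policy \
  --policy-name codebuild-vpc-guardrail \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "RequireApprovedVpc",
      "Effect": "Deny",
      "Action": ["codebuild:CreateProject", "codebuild:UpdateProject"],
      "Resource": "*",
      "Condition": {
        "StringNotEqualsIfExists": {
          "codebuild:vpcConfig.vpcId": "vpc-0123456789abcdef0"
        }
      }
    }]
  }'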
The new IAM condition keys are available in all regions where CodeBuild is offered. For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
For a full list of new CodeBuild IAM condition keys, please visit our documentation. To learn more about how to get started with CodeBuild, visit the AWS CodeBuild product page.
Today, Amazon DynamoDB announces the general availability of DynamoDB local on AWS CloudShell, a browser-based, pre-authenticated shell that you can launch directly from the AWS Management Console. With DynamoDB local, you can develop and test your applications by running DynamoDB in your local development environment without incurring any costs.
DynamoDB local works with your existing DynamoDB API calls without impacting your production environment. You can now start DynamoDB local simply by using the dynamodb-local alias in CloudShell and develop and test your DynamoDB tables anywhere in the console, without downloading or installing the AWS CLI or DynamoDB local. To interact with DynamoDB local running in CloudShell using CLI commands, use the --endpoint-url parameter and point it to localhost:8000.
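For example, a quick session in CloudShell might look like the following; the table name and key schema are illustrative.

# Start DynamoDB local in CloudShell (listens on port 8000 by default)
dynamodb-local

# In another CloudShell tab, point the AWS CLI at the local endpoint
aws dynamodb create-table \
  --table-name Music \
  --attribute-definitions AttributeName=Artist,AttributeType=S \
  --key-schema AttributeName=Artist,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --endpoint-url http://localhost:8000

aws dynamodb list-tables --endpoint-url http://localhost:8000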
Monitoring food intake is a key factor in maintaining a balanced diet, but traditional documentation can be cumbersome and ineffective. Oviva set out to change this. That’s why we developed an AI-powered meal logging app that simplifies the meal logging process and enhances the quality of feedback people receive, helping them make better dietary decisions.
The challenges of effective meal logging
To receive actionable, personalized feedback on dietary habits, people need to document meals and accumulate data. However, traditional meal logging can be tedious and often lacks immediate or tangible feedback to help people stay motivated. Users often find themselves overwhelmed by complex nutritional data, which can lead to confusion. Our goal was to create a simpler, engaging, rewarding, and educational experience that encourages long-term adherence to healthy eating habits.
Exploring AI-driven benefits
To address these challenges, we leveraged the power of Artificial Intelligence (AI) to transform meal logging from a simple data entry task into a dynamic, interactive experience. Initially, we explored OpenAI’s services but ultimately found that Gemini and Vertex AI offered a more compelling solution for us due to several factors:
Reliable GCP infrastructure: Google Cloud’s robust infrastructure ensured high availability and scalability for our application, which was a critical consideration for a user-facing app.
Technical support and expertise: Google Cloud’s dedicated support team and comprehensive documentation provided invaluable assistance throughout our development journey.
Beneficial pricing: Compared to OpenAI, Google Cloud's competitive pricing model for Gemini and Vertex AI aligned better with our long-term platform cost strategy.
Our AI-powered solution
Our AI algorithms analyze the logged meals in near real-time, providing people with feedback that is not just specific and personalized but also easy to understand and act upon. This feedback focuses on helping people maintain a balanced diet throughout the day, rather than overwhelming them with detailed nutritional breakdowns.
Key Features of the app
Personalized, actionable feedback: Our AI analyzes individual dietary patterns and provides feedback tailored to the person’s specific needs. For instance, if a user consistently logs meals low in protein, the system would suggest incorporating more lean meats or plant-based proteins, together with simple and accessible recipe suggestions. This feedback is immediate and contextual, allowing people to make continuously better decisions.
Simplified user experience: Logging meals is now simpler and more intuitive. People can log their meals by snapping a photo or selecting from frequent meals, and our AI takes care of describing and analysing the meal in depth. The system also learns from user behavior, making future logging faster and more accurate.
Integrated with daily goals: The AI-powered meal logging is integrated into goal tracking to keep an eye on behaviors. People can set goals, and the system will provide feedback on how well their meals align with these goals, reinforcing positive behavior and suggesting adjustments when necessary.
Positive reinforcement through gamification: We’ve incorporated gamification elements, such as daily streaks, to make meal logging more sticky. People can also earn rewards for consistent logging and meeting dietary goals, which helps build long-term habits. This positive reinforcement is crucial for keeping users motivated and invested in their journey for a healthier self.
Key technical challenges to overcome
One of our main technical challenges was ensuring reliable AI performance during peak usage hours. Our app's user base exhibits cyclical activity patterns, with meal logging concentrated around specific times of day. These spikes can require several orders of magnitude more processing capacity than off-peak periods. The platform's low latency and high availability have let us meet this demand reliably, without dedicating engineering resources to making our offering scalable.
The Impact on our users
Early feedback from our users indicates that the AI-powered meal logging feature is making a significant difference in how they approach their diets. People report feeling more confident in their food choices and more motivated to maintain healthy eating habits. The simplicity and immediacy of the feedback have also improved user retention, with more people consistently logging their meals over time.
As we continue to refine and enhance our AI capabilities, we plan to introduce even more personalized features. Future updates will include deeper integration with other metrics, such as physical activity and sleep patterns, to provide even more comprehensive dietary advice. Our vision is to create a holistic digital coach that supports people in every aspect of their journey to a healthier self.
The introduction of AI-powered meal logging, developed with the support of Google Cloud’s Gemini and Vertex AI, marks a significant step forward in our mission to improve people’s wellbeing through technology. By making meal logging easier, more engaging, and more informative, we are empowering people to take control of their diets and, ultimately, their health. We’re excited to see the positive impact this feature will have and look forward to continuing our work to make healthcare more personalized and effective for everyone.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is now available in Asia Pacific (Thailand) and Mexico (Central) regions. Customers can create Amazon MSK Provisioned clusters in these regions starting today.
Amazon MSK is a fully managed service for Apache Kafka and Kafka Connect that makes it easier for you to build and run applications that use Apache Kafka as a data store. Amazon MSK is fully compatible with Apache Kafka, which enables you to more quickly migrate your existing Apache Kafka workloads to Amazon MSK with confidence or build new ones from scratch. With Amazon MSK, you spend more time building innovative streaming applications and less time managing Kafka clusters.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 4.0, bringing the latest advancements in cluster management and performance to MSK Provisioned. Kafka 4.0 introduces a new consumer rebalance protocol, now generally available, that helps ensure smoother and faster group rebalances. In addition, Kafka 4.0 requires brokers and tools to use Java 17, providing improved security and performance, includes various bug fixes and improvements, and deprecates metadata management via Apache ZooKeeper.
To start using Apache Kafka 4.0 on Amazon MSK, simply select version 4.0.x when creating a new cluster via the AWS Management Console, AWS CLI, or AWS SDKs. You can also upgrade existing MSK provisioned clusters with an in-place rolling update. Amazon MSK orchestrates broker restarts to maintain availability and protect your data during the upgrade. Kafka version 4.0 support is available today across all AWS regions where Amazon MSK is offered. For more details, see the Amazon MSK Developer Guide and the Apache Kafka release notes for version 4.0.
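As a sketch of the in-place upgrade path, the CLI calls below first list compatible versions and then start a rolling update; the cluster ARN and version strings are placeholders, and the exact target version string should be taken from the list returned for your cluster.

# Check which Kafka versions your cluster can upgrade to
aws kafka get-compatible-kafka-versions \
  --cluster-arn arn:aws:kafka:us-east-1:111122223333:cluster/demo/abcd1234

# Start a rolling in-place upgrade; --current-version is the cluster's current
# configuration revision string shown by describe-cluster, not the Kafka version
aws kafka update-cluster-kafka-version \
  --cluster-arn arn:aws:kafka:us-east-1:111122223333:cluster/demo/abcd1234 \
  --current-version K3AEGXETSR30VB \
  --target-kafka-version 4.0.x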
Today, CloudWatch Synthetics, which allows monitoring of customer workflows on websites through periodically running custom code scripts, announces two new features: canary safe updates and automatic retries for failing canaries. The former allows you to test updates for your existing canaries before applying changes and the latter enables canaries to automatically attempt additional retries when a scheduled run fails, helping to differentiate between genuine and intermittent failures.
Canary safe updates help minimize potential monitoring disruptions caused by erroneous updates. By doing a dry run, you can verify canary compatibility with newly released runtimes, or with any configuration or code changes. This minimizes potential monitoring gaps by maintaining continuous monitoring during updates and reduces risk to the end-user experience while keeping canaries up to date. The automatic retries feature helps reduce false alarms. When enabled, it provides more reliable monitoring results by distinguishing between persistent issues and intermittent failures, preventing unnecessary disruption. Users can analyze temporary failures using the canary runs graph, which employs color-coded points to represent scheduled runs and their retries. You can start using these features by accessing CloudWatch Synthetics through the AWS Management Console, AWS CLI, or CloudFormation.
Dry runs for safe canary updates and automatic retries are priced the same as regular canary runs and are available in all commercial AWS Regions.
To learn more about safe canary updates and automatic retries, visit the Amazon CloudWatch Synthetics documentation, or get started with Synthetics monitoring by visiting the user guide.
Amazon Bedrock Data Automation (BDA) now supports video blueprints so you can generate tailored, accurate insights in a consistent format for your multimedia analysis applications. BDA automates the generation of insights from unstructured multimodal content such as documents, images, audio, and videos for your GenAI-powered applications. With video blueprints, you can customize insights — such as scene summaries, content tags, and object detection — by specifying what to generate, the output data type, and the natural language instructions to guide generation.
You can create a new video blueprint in minutes or select from a catalog of pre-built blueprints designed for use cases such as media search or highlight generation. With your blueprint, you can generate insights from a variety of video media including movies, television shows, advertisements, meeting recordings, and user-generated videos. For example, a customer analyzing a reality television episode for contextual ad placement can use a blueprint to summarize a scene where contestants are cooking, detect objects like 'tomato' and 'spaghetti', and identify the logos of condiments used for cooking. As part of the release, BDA also enhances logo detection and the Interactive Advertising Bureau (IAB) taxonomy in standard output.
Video blueprints are available in all AWS Regions where Amazon Bedrock Data Automation is supported.
Amazon Inspector now automatically maps your Amazon Elastic Container Registry (Amazon ECR) images to specific tasks running on Amazon Elastic Container Service (Amazon ECS) or pods running on Amazon Elastic Kubernetes Service (Amazon EKS), helping identify where the images are actively in use. This enables you to focus your limited resources on patching the most critical vulnerable images, those associated with running workloads, improving security and mean time to remediation.
With this launch, you can use the Amazon Inspector console or APIs to identify your actively used container images, when an image was last used, and which clusters are running it. This information is included in your findings and resource coverage details, and is routed to EventBridge. You can also control how long Inspector monitors an image after its 'last in use' date by updating the ECR re-scan duration using the console or APIs, in addition to the existing push and pull date settings. Your Amazon ECR images with continuous scanning enabled in Amazon Inspector will automatically get this updated data within your Amazon Inspector findings.
Amazon Inspector is a vulnerability management service that continually scans AWS workloads including Amazon EC2 instances, container images, and AWS Lambda functions for software vulnerabilities, code vulnerabilities, and unintended network exposure across your entire AWS organization.
This feature is available at no additional cost to Amazon Inspector customers scanning their container images in Amazon Elastic Container Registry (ECR). The feature is available in all commercial and AWS GovCloud (US) Regions where Amazon Inspector is available.
Amazon Lightsail now supports IPv6-only and dual-stack PrivateLink interface VPC endpoints. AWS PrivateLink is a highly available, scalable service that allows you to privately connect your VPC to services and resources as if they were in your VPC.
Previously, Lightsail supported private connectivity over PrivateLink using IPv4-only VPC endpoints. With today’s launch, customers can use IPv6-only, IPv4-only, or dual-stack VPC endpoints to create a private connection between their VPC and Lightsail, and access Lightsail without traversing the public internet.
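A dual-stack interface endpoint can be created with a single call, sketched below. The service name, VPC, subnet, and security group values are placeholders; confirm the Lightsail service name for your Region before running it.

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.lightsail \
  --ip-address-type dualstack \
  --dns-options DnsRecordIpType=dualstack \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0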
Lightsail supports connectivity using PrivateLink in all AWS Regions supporting Lightsail. To learn more about accessing Lightsail using PrivateLink, please see the documentation.
Starting today, AWS Entity Resolution is available in the AWS Canada (Central) and Africa (Cape Town) Regions. With AWS Entity Resolution, organizations can match and link related customer, product, business, or healthcare records stored across multiple applications, channels, and data stores. You can get started in minutes using matching workflows that are flexible, scalable, and can seamlessly connect to your existing applications, without any expertise in entity resolution or ML.
With this launch, AWS Entity Resolution rule-based and ML-powered workflows are now generally available in 12 AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Canada (Central), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), Europe (London), and Africa (Cape Town).
Additional AWS Config rules are now available in 17 AWS Regions. AWS Config rules help you automatically evaluate your AWS resource configurations for desired settings, enabling you to assess, audit, and evaluate configurations of your AWS resources.
When a resource violates a rule, an AWS Config rule evaluates it as non-compliant and can send you a notification through Amazon EventBridge. AWS Config provides managed rules, which are predefined, customizable rules that AWS Config uses to evaluate whether your AWS resources comply with common best practices.
With this expansion, AWS Config managed rules are now available in the following AWS Regions: Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Hyderabad), Asia Pacific (Jakarta), Asia Pacific (Kuala Lumpur), Asia Pacific (Melbourne), Asia Pacific (Osaka), Canada (Calgary), Europe (Milan), Europe (Paris), Europe (Stockholm), Europe (Zaragoza), Europe (Zurich), Middle East (Bahrain), Middle East (Tel Aviv), Middle East (UAE), and South America (São Paulo).
You will be charged per rule evaluation in your AWS account per AWS Region. Visit the AWS Config pricing page for more details. To learn more about AWS Config rules, visit our documentation.
Amazon Cognito announces support for the OpenID Connect (OIDC) prompt parameter in Cognito Managed Login. Managed Login provides a fully-managed, hosted sign-in and sign-up experience that customers can personalize to align with their company or application branding. This new capability enables customers to control authentication flows more precisely by supporting two commonly requested prompt values: 'login' for re-authentication scenarios and 'none' for silent authentication state checks. These prompt parameters respectively allow applications to specify whether users should be prompted to authenticate again or leverage existing sessions, enhancing both security and user experience. With this launch, Cognito can also pass through select_account and consent prompts to third-party OIDC providers when the user pool is configured for federated sign-in.
With the ‘login’ prompt, applications can now require users to re-authenticate explicitly while maintaining their existing authenticated sessions. This is particularly useful for scenarios requiring additional and more recent authentication verification, such as right before accessing sensitive information or performing transactions. The ‘none’ prompt enables a silent check on authentication state, allowing applications to check if users have an existing active authentication session without having to re-authenticate. This prompt can be valuable for implementing seamless single sign-on experiences across multiple applications sharing the same user pool.
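In practice, the prompt value is appended to the managed login authorize request, as in the hedged examples below; the domain, client ID, and redirect URI are placeholders.

# Force re-authentication even if a session already exists
https://your-domain.auth.us-east-1.amazoncognito.com/oauth2/authorize?client_id=YOUR_CLIENT_ID&response_type=code&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=openid&prompt=login

# Silent session check: no login UI is shown; if there is no active session, the request
# returns an OIDC error (such as login_required) instead of prompting the user
https://your-domain.auth.us-east-1.amazoncognito.com/oauth2/authorize?client_id=YOUR_CLIENT_ID&response_type=code&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=openid&prompt=none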
This enhancement is available in Amazon Cognito Managed Login to customers on the Essentials or Plus tiers in all AWS Regions where Amazon Cognito is available. To learn more about implementing these authentication flows, visit the Amazon Cognito documentation.
Today, Amazon SageMaker and Amazon DataZone announced a new data governance capability that enables customers to move a project from one domain unit to another. Domain units enable customers to create business unit/team level organization and manage authorization policies per their business needs. Customers can now take a project mapped to a domain unit and organize it under a new domain unit within their domain unit hierarchy. The move project feature lets customers reflect changes in team structures as business initiatives or organizations shift by allowing them to change a project’s owning domain unit.
As an Amazon SageMaker or Amazon DataZone administrator, you can now create domain units (e.g., Sales, Marketing) under the top-level domain and organize the catalog by moving existing projects to new owning domain units. Users can then log in to the portal to browse and search assets in the catalog by the domain units associated with their business units or teams.
The move project feature for domain units is available in all AWS Regions where Amazon SageMaker and Amazon DataZone are available.
Amazon Data Lifecycle Manager now offers customers the option to use Internet Protocol version 6 (IPv6) addresses for their new and existing endpoints. Customers moving to IPv6 can simplify their network stack by using Data Lifecycle Manager dual-stack endpoints, which support both IPv4 and IPv6, depending on the protocol used by their network and client.
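For example, the AWS CLI can be pointed at a dual-stack endpoint explicitly. The hostname below follows the common service.region.api.aws dual-stack pattern and is an assumption, so verify the exact endpoint for your Region in the documentation.

aws dlm get-lifecycle-policies --endpoint-url https://dlm.us-east-1.api.aws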
Customers create Amazon Data Lifecycle Manager policies to automate the creation, retention, and management of EBS Snapshots and EBS-backed Amazon Machine Images (AMIs). The policies can also automatically copy created resources across AWS Regions, move EBS Snapshots to EBS Snapshots Archive tier, and manage Fast Snapshot Restore. Customers can also create policies to automate creation and retention of application-consistent EBS Snapshots via pre and post-scripts, as well as create Default Policies for comprehensive protection for their account or AWS Organization.
Amazon Data Lifecycle Manager support for IPv6, already offered in all AWS commercial Regions, is now also available in the AWS GovCloud (US) Regions.
To learn more about configuring Amazon Data Lifecycle Manager endpoints for IPv6, please refer to our documentation.
AWS CodePipeline now supports Deploy Spec file configurations in the EC2 Deploy action, enabling you to specify deployment parameters directly in your source repository. You can now include either a Deploy Spec file name or deploy configurations in your EC2 Deploy action. The action accepts Deploy Spec files in YAML format and maintains compatibility with existing CodeDeploy AppSpec files.
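A minimal spec, kept AppSpec-compatible, might look like the sketch below; the paths, hook name, and timeout are illustrative, so consult the EC2 Deploy action documentation for the exact fields it supports.

cat > deployspec.yml <<'EOF'
version: 0.0
os: linux
files:
  - source: /
    destination: /opt/myapp
hooks:
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
EOF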
The deployment debugging experience for large-scale EC2 deployments is also enhanced. Previously, customers relied solely on action execution logs to track deployment status across multiple instances. While these logs provide comprehensive deployment details, tracking specific instance statuses in large deployments was challenging. The new deployment monitoring interface displays real-time status information for individual EC2 instances, eliminating the need to search through extensive logs to identify failed instances. This improvement streamlines troubleshooting for deployments targeting multiple EC2 instances.
To learn more about how to use the EC2 Deploy action, visit our documentation. For more information about AWS CodePipeline, visit our product page. These enhancements are available in all AWS Regions where AWS CodePipeline is supported, except the AWS GovCloud (US) Regions and the China Regions.
AWS CodePipeline now offers a new Lambda deploy action that simplifies application deployment to AWS Lambda. This feature enables seamless publishing of Lambda function revisions and supports multiple traffic-shifting strategies for safer releases.
For production workloads, you can now deploy software updates with confidence using either linear or canary deployment patterns. The new action integrates with CloudWatch alarms for automated rollback protection – if your specified alarms trigger during traffic shifting, the system automatically rolls back changes to minimize impact.
To learn more about using the Lambda deploy action in your pipeline, visit our documentation. For more information about AWS CodePipeline, visit our product page. The new action is available in all AWS Regions where AWS CodePipeline is supported, except the AWS GovCloud (US) Regions and the China Regions.
Today, we’re excited to announce the general availability (GA) of GKE Data Cache, a powerful new solution for Google Kubernetes Engine to accelerate the performance of read-heavy stateful or stateless applications that rely on persistent storage via network-attached disks. By intelligently utilizing high-speed local SSDs as a cache layer for persistent disks, GKE Data Cache helps you achieve lower read latency and higher queries per second (QPS), without complex manual configuration.
Using GKE Data Cache with PostgreSQL, we have seen:
Up to a 480% increase in transactions per second for PostgreSQL on GKE
Up to an 80% latency reduction for PostgreSQL on GKE
“The launch of GKE Persistent Disk with Data Cache enables significant improvements in vector search performance. Specifically, we’ve observed that Qdrant search response times are a remarkable 10x faster compared to balanced disks, and 2.5x faster compared to premium SSDs, particularly when operating directly from disk without caching all data and indexes in RAM. Qdrant Hybrid Cloud users on Google Cloud can leverage this advancement to efficiently handle massive datasets, delivering unmatched scalability and speed without relying on full in-memory caching.” – Bastian Hofmann, Director of Engineering, Qdrant
Stateful applications like databases, analytics platforms, and content management systems are critical to many businesses. However, their performance can often be limited by the I/O speed of the underlying storage. While persistent disks provide durability and flexibility, read-intensive workloads can experience bottlenecks, impacting application responsiveness and scalability.
GKE Data Cache addresses this challenge head-on, providing a managed block storage solution that integrates with your existing Persistent Disk or Hyperdisk volumes. When you enable GKE Data Cache on your node pools and configure your workloads to use it, frequently accessed data is automatically cached on the low-latency local SSDs attached to your GKE nodes.
This caching layer serves read requests directly from the local SSDs whenever the data is available, significantly reducing the need to access the underlying persistent disk and potentially allowing for the use of less system memory cache (RAM) to service requests in a timely manner.
The result is a substantial improvement in read performance, leading to:
Lower read latency: Applications experience faster data retrieval, improving the user experience and application responsiveness.
Higher throughput and QPS: The ability to serve more read requests in parallel allows your applications to handle increased load and perform more intensive data operations.
Potential cost optimization: By accelerating reads, you may be able to utilize smaller or lower-IOPS persistent disks for your primary storage while still achieving high performance through the cache. Additionally, you may be able to reduce the memory required for the machine's page cache by pushing read latency down to the local SSD; memory capacity is more expensive than capacity on a local SSD.
Simplified management: As a managed feature, GKE Data Cache simplifies the process of implementing and managing a high-performance caching solution for your stateful workloads.
“Nothing elevates developer experience like an instant feedback loop. Thanks to GKE Data Cache, developers can spin up pre-warmed Coder Workspaces on demand, blending local-speed coding with the consistency of ephemeral Kubernetes infrastructure.” – Ben Potter, VP of Product, Coder
GKE Data Cache supports all read/write Persistent Disk and Hyperdisk types as backing storage, so you can choose the right persistent storage for your needs while leveraging the performance benefits of local SSDs for reads. You can configure your node pools to dedicate a specific amount of local SSD space for data caching.
For data consistency, GKE Data Cache offers two write modes: writethrough (recommended for most production workloads to ensure data is written to both the cache and the persistent disk synchronously) and writeback (for workloads prioritizing write speed, with data written to the persistent disk asynchronously).
Getting started
Getting started with GKE Data Cache is straightforward. You’ll need a GKE Standard cluster running a compatible version (1.32.3-gke.1440000 or later), node pools configured with local SSDs, the data cache feature enabled, and a StorageClass that specifies the use of data cache acceleration. Your stateful workloads can then request storage with caching using PersistentVolumeClaims that reference this StorageClass. The amount of data to store in a cache for each disk is defined in the StorageClass.
Here’s how to create a data cache-enabled node pool in an existing cluster:
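A minimal sketch, assuming an existing regional cluster on a supported GKE version and a machine type that supports local SSDs (the pool, cluster, region, and machine-type values are placeholders):

gcloud container node-pools create data-cache-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --machine-type=n2-standard-16 \
  --data-cache-count=2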
When you create a node pool with the `data-cache-count` flag, local SSDs (LSSDs) are reserved for the data cache feature. This feature uses those LSSDs to cache data for all pods that have caching enabled and are scheduled onto that node pool.
The LSSDs not reserved for caching can be used as ephemeral storage. Note that we do not currently support using the remaining LSSDs as raw block storage.
Once you reserve the required local SSDs for caching, you set up the cache configuration in the StorageClass with `data-cache-mode` and `data-cache-size` then reference that StorageClass in a PersistentVolumeClaim.
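Putting that together, a hedged sketch of the StorageClass and PersistentVolumeClaim might look like the following; the Persistent Disk CSI provisioner and the parameter values shown are illustrative, so confirm them against the GKE Data Cache documentation.

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced-pd-with-cache
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  data-cache-mode: writethrough
  data-cache-size: "100Gi"
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: balanced-pd-with-cache
  resources:
    requests:
      storage: 500Gi
EOF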
Building cutting-edge AI models is exciting, whether you’re iterating in your notebook or orchestrating large clusters. However, scaling up training can present significant challenges, including navigating complex infrastructure, configuring software and dependencies across numerous instances, and pinpointing performance bottlenecks.
At Google Cloud, we’re focused on making AI training easier, whatever your scale. We’re continuously evolving our AI Hypercomputer system, not just with powerful hardware like TPUs and GPUs, but with a suite of tools and features designed to make you, the developer, more productive. Let’s dive into some recent enhancements that can help streamline your workflows, from interactive development to optimized training and easier deployment.
Scale from your notebook with Pathways on Cloud
You love the rapid iteration that Jupyter notebooks provide, but scaling to thousands of accelerators means leaving that familiar environment behind. At the same time, having to learn different tools for running workloads at scale isn’t practical; nor is tying up large clusters of accelerators for weeks for iterative experiments that might run only for a short time.
You shouldn’t have to choose between ease-of-use and massive scale. With JAX, it’s easy to write code for one accelerator and scale it up to thousands of accelerators. Pathways on Cloud, an orchestration system for creating large-scale, multi-task, and sparsely activated machine learning systems, takes this concept further, making interactive supercomputing a reality. Pathways dynamically manages pools of accelerators for you, orchestrating data movement and computation across potentially thousands of devices. The result? You can launch an experiment on just one accelerator directly from your Jupyter notebook, refine it, and then scale it to thousands of accelerators within the same interactive session. Now you can quickly iterate on research and development without sacrificing scale.
With Pathways on Cloud, you can finally stop rewriting code for different scales. Stop over-provisioning hardware for weeks when your experiments only need a few hours. Stay focused on your science, iterate faster, and leverage supercomputing power on demand. Watch this video to see how Pathways on Cloud delivers true interactive scaling — far beyond just running JupyterHub on a Google Kubernetes Engine (GKE) cluster.
When scaling up a job, simply knowing that your accelerators are being used isn’t enough. You need to understand how they’re being used and why things might be slow or crashing. How else would you find that pesky out-of-memory error that takes down your entire run?
Meet the Xprofiler library, your tool for deep performance analysis on Google Cloud accelerators. It lets you profile and trace your code execution, giving you critical insights, especially into the high level operations (HLO) generated by the XLA compiler. Getting actionable insights using Xprofiler is easy. Simply launch an Xprofiler instance from the command line to capture detailed profile and trace logs during your run. Then, use TensorBoard to quickly analyze this data. You can visualize performance bottlenecks, understand hardware limits with roofline analysis (is your workload compute- or memory-bound?), and quickly pinpoint the root cause of errors. Xprofiler helps you optimize your code for peak performance, so you can get the most out of your AI infrastructure.
Skip the setup hassle with container images
You have the choice of many powerful AI frameworks and libraries, but configuring them correctly — with the right drivers and dependencies — can be complex and time-consuming. Getting it wrong, especially when scaling to hundreds or thousands of instances, can lead to costly errors and delays. To help you bypass these headaches, we provide pre-built, optimized container images designed for common AI development needs.
For PyTorch on GPUs, our GPU-accelerated instance container images offer a ready-to-run environment. We partnered closely with NVIDIA to include tested versions of essential software like the NVIDIA CUDA Toolkit, NCCL, and frameworks such as NVIDIA NeMo. Thanks to Canonical, these run on optimized Ubuntu LTS. Now you can get started quickly with a stable environment that’s tuned for performance, avoiding compatibility challenges and saving significant setup time.
And if you’re working with JAX (on either TPUs or GPUs), our curated container images and recipes for JAX for AI on Google Cloud streamline getting started. Avoid the hassle of manual dependency tracking and configuration with these tested and ready-to-use JAX environments.
Boost GPU training efficiency with proven recipes
Beyond setup, maximizing useful compute time (“ML Goodput”) during training is crucial, especially at scale. Wasted cycles due to job failures can significantly inflate costs and delay results. To help, we provide techniques and ready-to-use recipes to tackle these challenges.
Techniques like asynchronous and multi-tier checkpointing increase checkpoint frequency without slowing down training and speed up save/restore operations. AI Hypercomputer can automatically handle interruptions, choosing intelligently between resets, hot-swaps, or scaling actions. Our ML Goodput recipe, created in partnership with NVIDIA, bundles these techniques, integrating NVIDIA NeMo and the NVIDIA Resiliency Extension (NVRx) for a comprehensive solution to boost the efficiency and reliability of your PyTorch training on Google Cloud.
We also added optimized recipes (complete with checkpointing) for you to benchmark training performance for different storage options like Google Cloud Storage and Parallelstore. Lastly, we added recipes for our A4 NVIDIA accelerated instance (built on NVIDIA Blackwell). The training recipes include sparse and dense model training up to 512 Blackwell GPUs with PyTorch and JAX.
Cutting-edge JAX LLM development with MaxText
For developers who use JAX for LLMs on Google Cloud, MaxText provides advanced training, tuning, and serving on both TPUs and GPUs. Recently, we added support for key fine-tuning techniques like Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO), alongside resilient training capabilities such as suspend-resume and elastic training. MaxText leverages JAX optimizations and pipeline parallelism techniques that we developed in collaboration with NVIDIA to improve training efficiency across tens of thousands of NVIDIA GPUs. And we also added support and recipes for the latest open models: Gemma 3, Llama 4 training and inference (Scout and Maverick), and DeepSeek v3 training and inference.
To help you get the best performance with Trillium TPU, we added microbenchmarking recipes including matrix multiplication, collective compute, and high-bandwidth memory (HBM) tests scaling up to multiple slices with hundreds of accelerators. These metrics are particularly useful for performance optimization. For production workloads on GKE, be sure to take a look at automatic application monitoring.
Harness PyTorch on TPU with PyTorch/XLA 2.7 and torchprime
We’re committed to providing an integrated, high-performance experience for PyTorch users on TPUs. To that end, the recently released PyTorch/XLA 2.7 includes notable performance improvements, particularly benefiting users working with vLLM on TPU for inference. This version also adds an important new flexibility and interoperability capability: you can now call JAX functions directly from within your PyTorch/XLA code.
Then, to help you harness the power of PyTorch/XLA on TPUs, we introduced torchprime, a reference implementation for training PyTorch models on TPUs. Torchprime is designed to showcase best practices for large-scale, high-performance model training, making it a great starting point for your PyTorch/XLA development journey.
Build cutting-edge recommenders with RecML
While generative AI often captures the spotlight, highly effective recommender systems remain a cornerstone of many applications, and TPUs offer unique advantages for training them at scale. Deep-learning recommender models frequently rely on massive embedding tables to represent users, items, and their features, and processing these embeddings efficiently is crucial. This is where TPUs shine, particularly with SparseCore, a specialized integrated dataflow processor. SparseCore is purpose-built to accelerate the lookup and processing of the vast, sparse embeddings that are typical in recommenders, dramatically speeding up training compared to alternatives.
To help you leverage this power, we now offer RecML: an easy-to-use, high-performance, large-scale deep-learning recommender system library optimized for TPUs. It provides reference implementations for training state-of-the-art recommender models such as BERT4Rec, Mamba4Rec, SASRec, and HSTU. RecML uses SparseCore to maximize performance, making it easy for you to efficiently utilize the TPU hardware for faster training and scaling of your recommender models.
Build with us!
Improving the AI developer experience on Google Cloud is an ongoing mission. From scaling your interactive experiments with Pathways, to pinpointing bottlenecks with Xprofiler, to getting started faster with optimized containers and framework recipes, these AI Hypercomputer improvements help remove friction so you can innovate faster, building on the other AI Hypercomputer innovations we announced at Google Cloud Next 25.
Explore these new features, spin up the container images, try the JAX and PyTorch recipes, and contribute back to open-source projects like MaxText, torchprime, and RecML. Your feedback shapes the future of AI development on Google Cloud. Let’s build it together.
Organizations depend on fast and accurate data-driven insights to make decisions, and SQL is at the core of how they access that data. With Gemini, Google can generate SQL directly from natural language (a.k.a. text-to-SQL). This capability increases developers' and analysts' productivity and empowers non-technical users to interact directly with the data they need.
Today, you can find text-to-SQL capabilities in many Google Cloud products:
“Help me code” functionality in Cloud SQL Studio (PostgreSQL, MySQL, and SQL Server), AlloyDB Studio, and Cloud Spanner Studio
AlloyDB AI with its direct natural language interface to the database, currently available as a public preview
Vertex AI, which lets you directly access the Gemini models that these products are built on
Recently, powerful large language models (LLMs) like Gemini, with their abilities to reason and synthesize, have driven remarkable advancements in the field of text-to-SQL. In this blog post, the first entry in a series, we explore the technical internals of Google Cloud's text-to-SQL agents. We cover state-of-the-art approaches to context building and table retrieval, how to evaluate text-to-SQL quality effectively with LLM-as-a-judge techniques, the best approaches to LLM prompting and post-processing, and the techniques that allow the system to offer answers that are virtually certified as correct.
The ‘Help me code’ feature in Cloud SQL Studio generates SQL from a text prompt
The challenges of text-to-SQL technology
Current state-of-the-art LLMs like Gemini 2.5 have reasoning capabilities that make them good at translating complex questions posed in natural language to functioning SQL, complete with joins, filters, aggregations and other difficult concepts.
To see this in action you can do a simple test in Vertex AI Studio. Given the prompt “I have a database schema that contains products and orders. Write a SQL query that shows the number of orders for shoes”, Gemini produces SQL for a hypothetical schema:
SELECT COUNT(DISTINCT o.order_id) AS NumberOfShoeOrders
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.product_name LIKE '%shoe%';
Great, this is a good-looking query. But what happens when you move beyond this trivial example and use Gemini for text-to-SQL against a real-world database and real-world user questions? It turns out that the problem is more difficult. The model needs to be complemented with methods to:
provide business-specific context
understand user intent
manage differences in SQL dialects
Let’s take a look at each of these challenges.
Problem #1: Provide business-specific context
Just like data analysts or engineers, LLMs need significant amounts of knowledge or “context” to generate accurate SQL. The context can be both explicit (what does the schema look like, what are the relevant columns, and what does the data itself look like?) or more implicit (what is the precise semantic meaning of a piece of data? what does it mean for the specific business case?).
Specialized model training, or fine tuning, is typically not a scalable solution to this problem. Training on the shape of every database or dataset, and keeping up with schema or data changes, is both difficult and cost-prohibitive. Business knowledge and semantics are often not well documented in the first place, and difficult to turn into training data.
For example, even the best DBA in the world would not be able to write an accurate query to track shoe sales if they didn’t know that cat_id2 = 'Footwear' in a pcat_extension table means that the product in question is a kind of shoe. The same is true for LLMs.
Problem #2: Understanding user intent
Natural language is less precise than SQL. An engineer or analyst faced with an ambiguous question can detect that they need more information and go back and ask the right follow-up questions. An LLM, on the other hand, tends to try to give you an answer, and when the question is ambiguous, can be prone to hallucinating.
Example: Take a question like "What are the best-selling shoes?" One obvious point of ambiguity is what "best selling" actually means in the context of the business or application: the most-ordered shoes, or the shoe brand that brought in the most revenue? Further, should the SQL count returned orders? And how many kinds of shoes should appear in the report?
Further, different users need different kinds of answers. If the user is a technical analyst or a developer asking a vague question, giving them a reasonable, but perhaps not 100% correct SQL query is a good starting point. On the other hand, if the user is less technical and does not understand SQL, providing precise, correct SQL is more important. Being able to reply with follow-up questions to disambiguate, explaining the reasoning that went into an answer, and guiding the user to what they are looking for is key.
Problem #3: Limits of LLM generation
Out of the box, LLMs are particularly good at tasks like creative writing, summarizing or extracting information from documents. But some models can struggle with following precise instructions and getting details exactly right, particularly when it comes to more obscure SQL features. To be able to produce correct SQL, the LLM needs to adhere closely to what can often turn into complex specifications.
Example: Consider the differences between SQL dialects, which are more subtle than differences between programming languages like Python and Java. As a simple example, if you’re using BigQuery SQL, the correct function for extracting a month from a timestamp column is EXTRACT(MONTH FROM timestamp_column). But if you are using MySQL, you use MONTH(timestamp_column).
Text-to-SQL techniques
At Google Cloud, we’re constantly evolving our text-to-SQL agents to improve their quality. To address the problems listed above, we apply a number of techniques.
Problem: Understanding schema, data, and business concepts
Solutions:
Intelligent retrieval and ranking of datasets, tables, and columns, based on semantic similarity
In-context learning with business-specific examples
Data linking and sampling
A semantic layer over raw data, which provides a bridge between complex data structures and the everyday language used by the customer
Usage pattern analysis and query history

Problem: Understanding user intent
Solutions:
Disambiguation using LLMs
Entity resolution
SQL-aware foundation models

Problem: Limits of LLM generation
Solutions:
Self-consistency
Validation and rewriting
Strong foundation models
In-context learning with dialect-specific examples
Model fine-tuning
The text-to-SQL architecture
Let’s take a closer look at some of these techniques.
SQL-aware models
Strong LLMs are the foundation of text-to-SQL solutions, and the Gemini family of models has a proven track record of high-quality code and SQL generation. Depending on the particular SQL generation task, we mix and match model versions, including some cases where we employ customized fine-tuning, for example to ensure that models provide sufficiently good SQL for certain dialects.
Disambiguation using LLMs
Disambiguation means having the system respond with a clarifying question when a request is not clear enough; in the example above, "What are the best-selling shoes?" should lead to a follow-up question from the text-to-SQL agent such as "Would you like to see the shoes ranked by order quantity or by revenue?" Here we typically orchestrate LLM calls to first identify whether a question can actually be answered given the available schema and data, and if not, to generate the follow-up questions needed to clarify the user's intent.
Retrieval and in-context learning
As mentioned above, providing models with the context they need to generate SQL is critical. We use a variety of indexing and retrieval techniques: first to identify relevant datasets, tables, and columns, typically using vector search for multi-stage semantic matching, then to load additional useful context. Depending on the product, this may include things like user-provided schema annotations, examples of similar SQL or of how to apply specific business rules, or samples of recent queries that a user has run against the same datasets. All of this data is organized into prompts and then passed to the model. Gemini's support for long context windows unlocks new capabilities here by allowing the model to handle large schemas and other contextual information.
Validation and reprompting
Even with a high-quality model, there is still some level of non-determinism or unpredictability in LLM-driven SQL generation. To address this, we have found that non-AI approaches like query parsing or doing a dry run of the generated SQL complement model-based workflows well. We get a clear, deterministic signal when the LLM has missed something crucial, which we then pass back to the model for a second pass. When provided an example of a mistake and some guidance, models can typically address what they got wrong.
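As a concrete illustration of that deterministic signal, a BigQuery dry run parses and plans a generated query without executing it, returning an error message that can be fed back to the model; the query and dataset below are made up.

bq query --use_legacy_sql=false --dry_run \
  'SELECT product_name, COUNT(*) AS num_orders FROM mydataset.orders GROUP BY product_name'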
Self-consistency
The idea of self-consistency is to not depend on a single round of generation, but to generate multiple queries for the same user question, potentially using different prompting techniques or model variants, and to pick the best one from all candidates. If several models agree that one answer looks particularly good, there is a greater chance that the final SQL query is accurate and matches what the user is looking for.
Evaluation and measuring improvements
Improving AI-driven capabilities depends on robust evaluation. The text-to-SQL benchmarks developed in the academic community, like the popular BIRD-bench, have been a very useful baseline to understand model and end-to-end system performance. However, these benchmarks are often lacking when it comes to representing broad real-world schemas and workloads. To address this we have developed our own suite of synthetic benchmarks that augment the baseline in many ways.
Coverage: We make sure to have benchmarks that cover a broad list of SQL engines and products, both dialects and engine-specific features. This includes not only queries, but also DDL, DML and other administrative needs, and questions that are representative for common usage patterns, including more complex queries and schemas.
Metrics: We combine user metrics and offline eval metrics, and employ both human and automated evaluation, particularly using LLM-as-a-judge techniques, which reduce cost but still allow us to understand performance on ambiguous and unclear tasks.
Continuous evals: Our engineering and research teams use evals to quickly be able to test out new models, new prompting techniques and other improvements. It can give us signals quickly to tell if an approach is showing promise and is worth pursuing.
Taken together, these techniques are driving the remarkable improvements in text-to-SQL that we are seeing in our labs, as well as in customers' environments. As you get ready to incorporate text-to-SQL in your own environment, stay tuned for more deep dives into our text-to-SQL solutions. Try Gemini text-to-SQL in BigQuery Studio, Cloud SQL Studio, AlloyDB Studio, and Spanner Studio, and in AlloyDB AI today.