AWS Lambda announces Provisioned Mode for event source mappings (ESMs) that subscribe to Amazon SQS, a feature that allows you to optimize the throughput of your SQS ESM by provisioning event polling resources that remain ready to handle sudden spikes in traffic. An SQS ESM configured with Provisioned Mode scales 3x faster (up to 1,000 concurrent executions per minute) and supports 16x higher concurrency (up to 20,000 concurrent executions) than the default SQS ESM capability. This allows you to build highly responsive and scalable event-driven applications with stringent performance requirements.
Customers use SQS as an event source for Lambda functions to build mission-critical applications using Lambda's fully managed SQS ESM, which automatically scales polling resources in response to events. However, for applications that need to handle unpredictable bursts of traffic, lack of control over the throughput of the ESM can lead to delays in event processing. Provisioned Mode for SQS ESM allows you to fine-tune the throughput of the ESM by provisioning a minimum and maximum number of polling resources called event pollers that are ready to handle sudden spikes in traffic. With this feature, you can process events with lower latency, handle sudden traffic spikes more effectively, and maintain precise control over your event processing resources.
This feature is generally available in all AWS Commercial Regions. You can activate Provisioned Mode for SQS ESM by configuring a minimum and maximum number of event pollers in the ESM API, AWS Console, AWS CLI, AWS SDK, AWS CloudFormation, and AWS SAM. You pay for the usage of event pollers, measured in a billing unit called an Event Poller Unit (EPU). To learn more, read the Lambda ESM documentation and AWS Lambda pricing.
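For illustration, a minimal boto3 sketch of enabling Provisioned Mode on an existing SQS ESM might look like the following; the ProvisionedPollerConfig shape shown here mirrors the parameter Lambda already exposes for other ESM types, so treat the exact field names as an assumption and confirm them in the ESM API reference.

```python
import boto3

# Sketch only: the ProvisionedPollerConfig shape is assumed to match the
# existing ESM API; verify the field names against the Lambda API reference.
lambda_client = boto3.client("lambda")

response = lambda_client.update_event_source_mapping(
    UUID="your-esm-uuid",  # placeholder: the UUID of your SQS event source mapping
    ProvisionedPollerConfig={
        "MinimumPollers": 5,    # event pollers kept warm for traffic spikes
        "MaximumPollers": 100,  # upper bound on provisioned event pollers
    },
)
print(response["State"])
```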
Amazon Elastic Container Service (Amazon ECS) now includes enhancements that improve service availability during rolling deployments. These enhancements help maintain availability when new application version tasks are failing, when current tasks are unexpectedly terminated, or when scale-out is triggered during deployments.
Previously, when tasks in your currently running version became unhealthy or were terminated during a rolling deployment, ECS would attempt to replace them with the new version to prioritize deployment progress. If the new version could not launch successfully—such as when new tasks fail health checks or fail to start—these replacements would fail and your service availability could drop. ECS now replaces unhealthy or terminated tasks using the same service revision they belong to. Unhealthy tasks in your currently running version are replaced with healthy tasks from that same version, independent of the new version’s status. Additionally, when Application Auto Scaling triggers during a rolling deployment, ECS applies scale-out to both service revisions, ensuring your currently running version can handle increased load even if the new version is failing.
These improvements respect your service's maximumPercent and minimumHealthyPercent settings. These enhancements are enabled by default for all services using the rolling deployment strategy and are available in all AWS Regions. To learn more about rolling update deployments, refer to the Amazon ECS documentation.
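For reference, both settings live in the service's deployment configuration; a minimal boto3 sketch with placeholder cluster and service names might look like this:

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder names; maximumPercent / minimumHealthyPercent bound how far ECS
# may scale above or below the desired count during a rolling deployment.
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    deploymentConfiguration={
        "maximumPercent": 200,        # allow up to 2x the desired count during deploys
        "minimumHealthyPercent": 100, # never drop below the desired count
    },
)
```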
Cloud infrastructure reliability is foundational, yet even the most sophisticated global networks can suffer from a critical issue: slow or failed recovery from routing outages. In massive, planetary-scale networks like Google’s, router failures or complex, hidden conditions can prevent traditional routing protocols from restoring service quickly, or sometimes at all. These brief but costly outages — what we call slow convergence or convergence failure — critically disrupt real-time applications with low tolerance to packet loss and, most acutely, today’s massive, sensitive AI/ML training jobs, where a brief network hiccup can waste millions of dollars in compute time.
To solve this problem, we pioneered Protective ReRoute (PRR), a radical shift that moves the responsibility for rapid failure recovery from the centralized network core to the distributed endpoints themselves. Since putting it into production over five years ago, this host-based mechanism has dramatically increased Google’s network’s resilience, proving effective in recovering from up to 84% of inter-data-center outages that would have been caused by slow convergence events. Google Cloud customers with workloads that are sensitive to packet loss can also enable it in their environments — read on to learn more.
The limits of in-network recovery
Traditional routing protocols are essential for network operation, but they are often not fast enough to meet the demands of modern, real-time workloads. When a router or link fails, the network must recalculate all affected routes, which is known as reconvergence. In a network the size of Google’s, this process can be complicated by the scale of the topology, leading to delays that range from many seconds to minutes. For distributed AI training jobs with their wide, fan-out communication patterns, even a few seconds of packet loss can lead to application failure and costly restarts. The problem is a matter of scale: as the network grows, the likelihood of these complex failure scenarios increases.
Protective ReRoute: A host-based solution
Protective ReRoute is a simple, effective concept: empower the communicating endpoints (the hosts) to detect a failure and intelligently re-steer traffic to a healthy, parallel path. Instead of waiting for a global network update, PRR capitalizes on the rich path diversity built into our network. The host detects packet loss or high latency on its current path, and then immediately initiates a path change by modifying carefully chosen packet header fields, which tells the network to use an alternate, pre-existing path.
This architecture represents a fundamental shift in network reliability thinking. Traditional networks rely on a combination of parallel and series reliability. Serialization of components tends to reduce the reliability of a system; in a large-diameter network with multiple forwarding stages, reliability degrades as the diameter increases. In other words, every forwarding stage affects the whole system. Even if a network stage is designed with parallel reliability, it creates a serial impact on the overall network while the parallel stage reconverges. By adding PRR at the edges, we treat the network as a highly parallel system of paths that appear as a single stage, where the overall reliability increases as the number of available paths grows exponentially, effectively circumventing the serialization effects of slow network convergence in a large-diameter network. The following diagram contrasts the system reliability model for a PRR-enabled network with that of a traditional network. Traditional network reliability is in inverse proportion to the number of forwarding stages; with PRR the reliability of the same network is in direct proportion to the number of composite paths, which is exponentially proportional to the network diameter.
How Protective ReRoute works
The PRR mechanism has three core functional components (a conceptual sketch follows this list):
End-to-end failure detection: Communicating hosts continuously monitor path health. On Linux systems, the standard mechanism uses TCP retransmission timeout (RTO) to signal a potential failure. The time to detect a failure is generally a single-digit multiple of the network’s round-trip time (RTT). There are also other methods for end-to-end failure detection that have varying speed and cost.
Packet-header modification at the host: Once a failure is detected, the transmitting host modifies a packet-header field to influence the forwarding path. To achieve this, Google pioneered and contributed the mechanism that modifies the IPv6 flow-label in the Linux kernel (version 4.20+). Crucially, the Google software-defined network (SDN) layer provides protection for IPv4 traffic and non-Linux hosts as well by performing the detection and repathing on the outer headers of the network overlay.
PRR-aware forwarding: Routers and switches in the multipath network respect this header modification and forward the packet onto a different, available path that bypasses the failed component.
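The kernel- and SDN-level details are not something an application normally touches, but the toy Python sketch below illustrates the control loop conceptually: detect loss on the current path within roughly an RTO, then change a path-selector value (standing in for the IPv6 flow label) so multipath routers hash the flow onto a different parallel path. It is an illustration of the idea, not Google's implementation.

```python
import random
import time

NUM_PARALLEL_PATHS = 8   # stand-in for the network's built-in path diversity
RTO_SECONDS = 0.2        # stand-in for a TCP retransmission timeout

def send_probe(path_selector: int) -> bool:
    """Pretend to send a packet; returns True if it was acknowledged.
    In reality this is the TCP stack observing ACKs and retransmission timeouts."""
    broken_path = 3
    return (path_selector % NUM_PARALLEL_PATHS) != broken_path

def protective_reroute_loop() -> None:
    path_selector = random.randint(0, 2**20)  # analogous to an IPv6 flow label
    for _ in range(5):
        if send_probe(path_selector):
            print(f"path {path_selector % NUM_PARALLEL_PATHS}: healthy")
        else:
            # Failure detected within ~RTO: repath by picking a new selector,
            # which multipath routers hash onto a different parallel path.
            time.sleep(RTO_SECONDS)
            path_selector = random.randint(0, 2**20)
            print("loss detected; re-steering onto a new path")

protective_reroute_loop()
```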
Proof of impact
PRR is not theoretical; it is a continuously deployed, 24×7 system that protects production traffic worldwide. Its impact is compelling: PRR has been shown to reduce network downtime caused by slow convergence and convergence failures by up to the above-mentioned 84%. This means that up to 8 out of every 10 network outages that would have been caused by a router failure or slow network-level recovery are now avoided by the host. Furthermore, host-initiated recovery is extremely fast, often resolving the problem in a single-digit multiple of the RTT, which is vastly faster than traditional network reconvergence times.
Key use cases for ultra-reliable networking
The need for PRR is growing, driven by modern application requirements:
AI/ML training and inference: Large-scale workloads, particularly those distributed across many accelerators (GPUs/TPUs), are uniquely sensitive to network reliability. PRR provides the ultra-reliable data distribution necessary to keep these high-value compute jobs running without disruption.
Data integrity and storage: Significant numbers of dropped packets can result in data corruption and data loss, not just reduced throughput. By reducing the outage window, PRR improves application performance and helps guarantee data integrity.
Real-time applications: Applications like gaming and services like video conferencing and voice calls are intolerant of even brief connectivity outages. PRR reduces the recovery time for network failures to meet these strict real-time requirements.
Frequent short-lived connections: Applications that rely on a large number of very frequent short-lived connections can fail when the network is unavailable for even a short time. By reducing the expected outage window, PRR helps these applications reliably complete their required connections.
Activating Protective ReRoute for your applications
The architectural shift to host-based reliability is an accessible technology for Google Cloud customers. The core mechanism is open and part of the mainline Linux kernel (version 4.20 and later).
You can benefit from PRR in two primary ways:
Hypervisor mode: PRR automatically protects traffic running across Google data centers without requiring any guest OS changes. Hypervisor mode provides recovery in single-digit seconds for traffic of moderate fan-out in specific areas of the network.
Guest mode: For critical, performance-sensitive applications with high fan-out and in any segment of the network, you can opt into guest-mode PRR, which enables the fastest possible recovery time and greatest control. This is the optimal setting for demanding mission-critical applications, AI/ML jobs, and other latency-sensitive services.
To activate guest-mode PRR for critical applications, follow the guidance in the documentation and be ready to ensure the following (a quick prerequisite check is sketched after this list):
Your VM runs a modern Linux kernel (4.20+).
Your applications use TCP.
The application traffic uses IPv6. For IPv4 protection, the application needs to use the gVNIC driver.
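As a convenience, a small script along these lines can sanity-check the first two prerequisites on a Linux VM before you opt in; anything beyond that (gVNIC, your traffic patterns) should be verified against the official documentation.

```python
import platform
import socket

def kernel_at_least(major: int, minor: int) -> bool:
    release = platform.release()                 # e.g. "6.1.0-18-cloud-amd64"
    try:
        k_major, k_minor = release.split(".")[:2]
        return (int(k_major), int(k_minor.split("-")[0])) >= (major, minor)
    except ValueError:
        return False

print("Linux kernel 4.20+ :", kernel_at_least(4, 20))
print("IPv6 available     :", socket.has_ipv6)
# TCP usage and gVNIC (for IPv4 protection) depend on your application and
# VM configuration; check those against the Google Cloud documentation.
```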
Get started
The availability of Protective ReRoute has profound implications for a variety of Google and Google Cloud users.
For cloud customers with critical workloads: Evaluate and enable guest-mode PRR for applications that are sensitive to packet loss and that require the fastest recovery time, such as large-scale AI/ML jobs or real-time services.
For network architects: Re-evaluate your network reliability architectures. Consider the benefits of designing for rich path diversity and empowering endpoints to intelligently route around failures, shifting your model from series to parallel reliability.
For the open-source community: Recognize the power of host-level networking innovations. Contribute to and advocate for similar reliability features across all major operating systems to create a more resilient internet for everyone.
With the pace of scientific discovery moving faster than ever, we’re excited to join the supercomputing community as it gets ready for its annual flagship event, SC25, in St. Louis from November 16-21, 2025. There, we’ll share how Google Cloud is poised to help with our lineup of HPC and AI technologies and innovations, helping researchers, scientists, and engineers solve some of humanity’s biggest challenges.
Redefining supercomputing with cloud-native HPC
Supercomputers are evolving from a rigid, capital-intensive resource into an adaptable, scalable service. To go from “HPC in the cloud” to “cloud-native HPC,” we leverage core principles of automation and elastic infrastructure to fundamentally change how you consume HPC resources, allowing you to spin up purpose-built clusters in minutes with the exact resources you need.
This cloud-native model is very flexible. You can augment an on-premises cluster to meet peak demand or build a cloud-native system tailored with the right mix of hardware for your specific problem — be it the latest CPUs, GPUs, or TPUs. With this approach, we’re democratizing HPC, putting world-class capabilities into the hands of startups, academics, labs, and enterprise teams alike.
Key highlights at SC25:
Next-generation infrastructure: We’ll be showcasing our latest H4D VMs, powered by 5th generation AMD EPYC processors and featuring Cloud RDMA for low-latency networking. You’ll also see our latest accelerated compute resources including A4X and A4X Max VMs featuring the latest NVIDIA GPUs with RDMA.
Powering your essential applications: Run your most demanding simulations at massive scale — from Computational Fluid Dynamics (CFD) with Ansys, to Computer-Aided Engineering with Siemens, computational chemistry with Schrodinger, and risk modeling in FSI.
Dynamic Workload Scheduler: Discover how Dynamic Workload Scheduler and its innovative Flex Start mode, integrated with familiar schedulers like Slurm, are reshaping HPC consumption. Move beyond static queues toward flexible, cost-effective, and efficient access to high-demand compute resources.
Easier HPC with Cluster Toolkit: Learn how Cluster Toolkit can help you deploy a supercomputer-scale cluster with less than 50 lines of code.
High-throughput, scalable storage: Get a deep dive into Google Cloud Managed Lustre, a fully managed, high-performance parallel file system that can handle your most demanding HPC and AI workloads.
Hybrid for the enterprise: For our enterprise customers, especially in financial services, we’re enabling hybrid cloud with IBM Spectrum Symphony Connectors, allowing you to migrate or burst workloads to Google Cloud and reduce time-to-solution.
AI-powered scientific discovery
There’s a powerful synergy between HPC and AI — where HPC builds more powerful AI, and AI makes HPC faster and more insightful. This complementary relationship is fundamentally changing how research is done, accelerating discovery in everything from drug development and climate modeling to new materials and engineering. At Google Cloud, we’re at the forefront of this transformation, building the models, tools, and platforms that make it possible.
What to look for:
AI for scientific productivity: We’ll be showcasing Google’s suite of AI tools designed to enhance the entire research lifecycle. From Idea Generation agent to Gemini Code Assist with Gemini Enterprise, you’ll see how AI can augment your capabilities and accelerate discovery.
AI-powered scientific applications: Learn about the latest advancements in our AI-powered scientific applications, including AlphaFold 3 and WeatherNext.
The power of TPUs: Explore Google’s TPUs, including the latest seventh-generation Ironwood model, and discover how they can enhance AI workload performance and efficiency.
Join Google Cloud at SC25: At Google Cloud, we believe the cloud is the supercomputer of the future. From purpose-built HPC and AI infrastructure to quantum breakthroughs and simplified open-source tools, let Google Cloud be the platform for your next discovery.
We invite you to connect with our experts and learn more. Join the Google Cloud Advanced Computing Community to engage in discussions with our partners and the broader HPC, AI, and quantum communities.
We can’t wait to see what you discover.
See us at the show:
Visit us in booth #3724: Stop by for live demos of our latest HPC and AI solutions, including Dynamic Workload Scheduler, Cluster Toolkit, our latest AI agents, and even see our TPUs. Our team of experts will be on hand to answer your questions and discuss how Google Cloud can meet your needs.
Attend our technical talks: Keep an eye on our SC25 schedule for Google Cloud presentations and technical talks, where our leaders and partners will share deep dives, insights, and best practices.
Passport program: Grab a passport card from the Google booth and visit our demos, labs, and talks to collect stamps and learn about how we’re working with organizations across the HPC ecosystem to democratize HPC. Come back to the Google booth with your completed passport card to choose your prize!
Play a game: Join us in the Google booth and at our events to enjoy some Gemini-driven games — test your tech trivia knowledge or compete head-to-head with others to build the best LEGO creation!
Join our community kickoff: Are you a member of the Google Cloud Advanced Computing Community? Secure your spot today for our SC25 Kickoff Happy Hour!
Celebrate with NVIDIA and Google Cloud: We’re proud to co-host a reception with NVIDIA, and we look forward to toasting another year of innovation with our customers and partners. Register today to secure your spot!
Editor’s note: The post is part of a series that highlights how organizations leverage Google Cloud’s unique data science capabilities over alternative cloud data platforms. Google Cloud’s vector embedding generation and search features are unique for their end-to-end, customizable platform that leverages Google’s advanced AI research, offering features like task-optimized embedding models and hybrid search to deliver highly relevant results for both semantic and keyword-based queries.
Zeotap's customer intelligence platform (CIP) helps brands understand their customers and predict behaviors, so that they can improve customer engagement. Zeotap partners with Google Cloud to build a customer data platform that offers privacy, security, and compliance. Zeotap CIP, built with BigQuery, enables digital marketers to build and use AI/ML models to predict customer behavior and personalize the customer experience.
The Zeotap platform includes a customer segmentation feature called lookalike audience extensions. A lookalike audience is a group of new potential customers identified by machine learning algorithms who share similar characteristics and behaviors with an existing, high-value customer base. However, sparse or incomplete first-party data can make it hard to create effective lookalike audiences, preventing advertising algorithms from accurately identifying the key characteristics of valuable customers that they need to find similar new prospects. To address such sparse features, Zeotap uses multiple machine learning (ML) methodologies that combine Zeotap's multigraph algorithm with high-quality data assets to more accurately extend customers' audiences between the CDP and lookalike models.
In this blog, we dive into how Zeotap uses BigQuery, including BigQuery ML and Vector Search to solve the end-to-end lookalike problem. By taking a practical approach, we transformed a complex nearest-neighbour problem into a simple inner-join problem, overcoming challenges of cost, scale and performance without a specialized vector database. We break down each step of the workflow, from data preparation to serving, highlighting how BigQuery addresses core challenges along the way. We illustrate one of the techniques, Jaccard similarity with embeddings, to address the low-cardinality categorical columns that dominate user-profile datasets.
The high-level flow is as follows, and happens entirely within the BigQuery ecosystem. Note: In this blog, we will not be covering the flow of high-cardinality columns.
Jaccard similarity
Among several similarity indexes, which return the vectors closest in embedding space, Zeotap finds the Jaccard similarity to be a fitting index for low-cardinality features. It is a measure of overlap between two sets with a simple formula: |A∩B| / |A∪B|. The Jaccard similarity answers the question, "Of all the unique attributes present in either of the two users, what percentage of them are shared?" It only cares about the features that are present in at least one of the entities (e.g., the 1s in a binary vector) and ignores attributes that are absent in both.
Jaccard similarity shines because it is simple and easily explainable over many other complex distance metrics and similarity indexes that only measure distance in the embeddings space — a real Occam’s razor, as it were.
Implementation blueprint
Generating the vector embeddings
After selecting the low-cardinality features, we create our vectors using BigQuery one-hot encoding and multi-hot encoding for primitive and array-based columns.
Again, it helps to visualize a sample vector table:
Challenge: Jaccard distance is not directly supported in BigQuery vector search!
BigQuery vector search supports three distance types: Euclidean, cosine, and dot product, but not Jaccard distance, at least not natively. However, for binary vectors we can express the Jaccard distance (1 – Jaccard similarity) as:
Jd(A,B) = 1 – |A∩B|/|A∪B| = (|A∪B| – |A∩B|)/|A∪B|
Using only dot products (for binary vectors, |A∩B| = A·B and |A∪B| = A·A + B·B – A·B), this can be rewritten as:
Jd(A,B) = 1 – A·B / (A·A + B·B – A·B)
So we can, in fact, arrive at the Jaccard distance using the dot product. We found BigQuery's out-of-the-box LP_NORM function for calculating the Manhattan norm useful, as the Manhattan norm for a binary vector is the dot product with itself. In other words, using the Manhattan norm function, we found that we can support the Jaccard distance in a way that it can be calculated using the supported "dot product" search in BigQuery.
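To make the algebra concrete, here is a small, self-contained Python check (plain lists rather than BigQuery) showing that, for binary vectors, the dot-product form matches the set-based Jaccard distance:

```python
def jaccard_distance_sets(a, b):
    # Classic set definition: 1 - |A ∩ B| / |A ∪ B| over the positions set to 1.
    a_on = {i for i, v in enumerate(a) if v}
    b_on = {i for i, v in enumerate(b) if v}
    return 1 - len(a_on & b_on) / len(a_on | b_on)

def jaccard_distance_dot(a, b):
    # Dot-product form: for binary vectors, the Manhattan norm equals dot(x, x).
    dot_ab = sum(x * y for x, y in zip(a, b))
    norm_a = sum(a)
    norm_b = sum(b)
    return 1 - dot_ab / (norm_a + norm_b - dot_ab)

a = [1, 0, 1, 1, 0, 0]
b = [1, 1, 0, 1, 0, 0]
print(jaccard_distance_sets(a, b))  # 0.5
print(jaccard_distance_dot(a, b))   # 0.5
```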
Building the vector index
Next, we needed to build our vector index. BigQuery supports two primary vector index types: IVF (Inverted File Index) and TREE_AH (Tree with Asymmetric Hashing), each tailored to different scenarios. The TREE_AH vector index type combines a tree-like structure with asymmetric hashing (AH), based on Google's ScaNN algorithm, which has performed exceptionally well on various ANN benchmarks. Also, since the use case was for large batch queries (e.g., hundreds of thousands to millions of users), this offered reduced latency and cost compared to alternate vector databases.
Lookalike delivery
Once we had a vector index to optimize searches, we asked ourselves, “Should we run our searches directly using the VECTOR_SEARCH function in BigQuery?” Taking this approach over the base table yielded a whopping 118 million user-encoded vectors for just one client! Additionally, and most importantly, since this computation called for a Cartesian product, our in-memory data sizes became very large and complex quickly. We needed to devise a strategy that would scale to all customers.
The rare feature strategy
A simple but super-effective strategy is to avoid searching for ubiquitous user features. In a two-step rare-feature process, we identify the “omnipresent” features, then proceed to create a signal-rich table that includes users who possess at least one of the rarer/discriminative features. Right off the bat, we achieved up to 78% reduction in search space. BigQuery VECTOR_SEARCH allows you to do this with pre-filtering, wherein you use a subquery to dynamically shrink the search space. The catch is that the subquery cannot be a classic join, so we introduce a “flag” column and make it part of the index. Note: If a column is not stored in the index, then the WHERE clause in the VECTOR_SEARCH will execute a post-filter.
Tip: Use the BigQuery UI or system tables to see whether a vector index was used to accelerate queries.
Batch strategy
Vector search compares query users (N, the users we're targeting) against base users (M, the total user pool, in this case 118M). The complexity increases with (M × N), making large-scale searches resource-intensive. To manage this, we batched the N query users, processing them in groups (e.g., 500,000 per batch), while M remained the full base set. This approach reduced the computational load, helping to efficiently match the top 100 similar users for each query user. We then used grid search to determine the optimal batch size for high-scale requirements.
To summarize
We partnered with Google Cloud to enable digital marketers to build and use AI/ML models for customer segmentation and personalized experiences, driving higher conversion rates and lower acquisition costs. We addressed the challenge of Jaccard distance not being directly supported in BigQuery Vector Search by using the dot product and Manhattan norm. This practical approach, leveraging BigQuery ML and vector offerings, allowed us to create bespoke lookalike models with just one single SQL script and overcome challenges of cost, scale, and performance without a specialized vector database.
Using BigQuery ML and vector offerings, coupled with its robust, serverless architecture, we were able to release bespoke lookalike models catering to individual customer domains and needs. Together, Zeotap and Google Cloud look forward to partnering to help marketers expand their reach everywhere.
The Built with BigQuery advantage for ISVs and data providers
Built with BigQuery helps companies like Zeotap build innovative applications with Google Data Cloud. Participating companies can:
Accelerate product design and architecture through access to designated experts who can provide insight into key use cases, architectural patterns, and best practices.
Amplify success with joint marketing programs to drive awareness, generate demand, and increase adoption.
BigQuery gives ISVs the advantage of a powerful, highly scalable unified Data Cloud for the agentic era that is integrated with Google Cloud's open, secure, sustainable platform. Click here to learn more about Built with BigQuery.
In the fast-evolving world of agentic development, natural language is becoming the standard for interaction. This shift is deeply connected to the power of operational databases, where a more accurate text-to-SQL capability is a major catalyst for building better, more capable agents. From empowering non-technical users to self-serve data, to accelerating analyst productivity, the ability to accurately translate natural language questions into SQL is a game-changer. As end-user engagements increasingly happen over chat, conversations become the fundamental connection between businesses and their customers.
In an earlier post, “Getting AI to write good SQL: Text-to-SQL techniques explained,” we explored the core challenges of text-to-SQL — handling complex business context, ambiguous user intent, and subtle SQL dialects — and the general techniques used to solve them.
Today, we’re moving from theory to practice. We’re excited to share that Google Cloud has scored a new state-of-the-art result on the BIRD benchmark’s Single Trained Model Track. We scored 76.13, ahead of any other single-model solution (higher is better). In general, the closer you get to the benchmark of human performance (92.96), the harder it is to score incremental gains.
BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is an industry standard for testing text-to-SQL solutions. BIRD spans over 12,500 unique question-SQL pairs from 95 databases with a total size of 33 GB. The Single Trained Model Track is designed to measure the raw, intrinsic capability of the model itself, restricting the use of complex preprocessing, retrieval, or agentic frameworks often used to boost model accuracy. In other words, success here reflects an advancement in the model’s core ability to generate SQL.
Gemini scores #1 place in BIRD (October ‘25)
From research to industry-leading products
This leap in more accurate natural-language-to-SQL capability, often referred to as NL2SQL, isn't just an internal research or engineering win; it fundamentally elevates the customer experience across several key data services. Our state-of-the-art research in this field is enabling us to create industry-leading products that customers leverage to activate their data with agentic AI.
Consider AlloyDB AI’s natural language capability, a tool that customers use to allow end users to query the most current operational data using natural language. For instance, companies like Hughes, an Echostar Corporation, depend on AlloyDB’s NL2SQL for critical tasks like call analytics. Numerous other retail, technology, and industry players also integrate this capability into their customer-facing applications. With NL2SQL that is near-100% accurate, customers gain the confidence to build and deploy applications in production workloads that rely on real-time data access.
The benefits of NL2SQL extend to analysis, as exemplified with conversational analytics in BigQuery. This service lets business users and data analysts explore data, run reports, and extract business intelligence from vast historical datasets using natural language. The introduction of a multi-turn chat experience, combined with a highly accurate NL2SQL engine, helps them make informed decisions with the confidence that the responses from BigQuery-based applications are consistently accurate.
Finally, developers are finding new efficiencies. They have long relied on Gemini Code Assist for code generation, aiding their application development with databases across Spanner, AlloyDB, and Cloud SQL Studio. With the availability of more accurate NL2SQL, developers will be able to use AI coding assistance to generate SQL code too.
BIRD: a proving ground for core model capability
The BIRD benchmark is one of the most commonly used benchmarks in the text-to-SQL field. It moves beyond simple, single-table queries to cover real-world challenges our models must handle, such as reasoning over very large schemas, dealing with ambiguous values, and incorporating external business knowledge. Crucially, BIRD measures a critical standard: execution-verified accuracy. This means a query is not just considered 'correct' if it appears right; it must also successfully run and return the correct data.
We specifically targeted the Single Trained Model Track because it allows us to isolate and measure the model's core ability to solve the text-to-SQL task (rather than an ensemble, i.e., a system with multiple components such as parallel models, re-rankers, etc.). This distinction is critical, as text-to-SQL accuracy can be improved with techniques like dynamic few-shot retrieval or schema preprocessing; this track reflects the model's true reasoning power. By focusing on a single-model solution, these BIRD results demonstrate that enhancing the core model creates a stronger foundation for systems built on top of it.
Our method: Specializing the model
Achieving a state-of-the-art score doesn’t happen only by using a powerful base model. The key is to specialize the model. We developed a recipe designed to transform the model from a general-purpose reasoner into a highly specialized SQL-generation expert.
This recipe consisted of three critical phases applied before inference:
Rigorous data filtering: Ensuring the model learns from a flawless, “gold standard” dataset.
Multitask learning: Teaching the model not just to translate, but to understand the implicit subtasks required for writing a correct SQL query.
Test-time scaling: Self-consistency, i.e., picking the best answer.
Let’s break down each step.
Our process for achieving SOTA result
Step 1: Start with a clean foundation (data filtering)
One important tenet of fine-tuning is “garbage in, garbage out.” A model trained on a dataset with incorrect, inefficient, or ambiguous queries may learn incorrect patterns. The training data provided by the BIRD benchmark is powerful, but like most large-scale datasets, it’s not perfect.
Before we could teach the model to be a SQL expert, we had to curate a gold-standard dataset. We used a rigorous two-stage pipeline: first, execution-based validation to execute every query and discard any that failed, returned an error, or gave an empty result. Second, we used LLM-based validation, where multiple LLMs act as a “judge” to validate the semantic alignment between the question and the SQL, catching queries that run but don’t actually answer the user’s question. This aggressive filtering resulted in a smaller, cleaner, and more trustworthy dataset that helped our model learn from a signal of pure quality rather than noise.
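A simplified sketch of this two-stage filter is shown below; run_query and llm_judge are hypothetical stand-ins for the BigQuery execution step and the LLM-as-judge call, not the actual pipeline code.

```python
def filter_training_examples(examples, run_query, llm_judge):
    """Keep only (question, sql) pairs that execute cleanly AND are judged
    semantically aligned. run_query and llm_judge are hypothetical hooks."""
    clean = []
    for question, sql in examples:
        # Stage 1: execution-based validation.
        try:
            rows = run_query(sql)
        except Exception:
            continue            # query errored out: drop it
        if not rows:
            continue            # empty result: drop it
        # Stage 2: LLM-based semantic validation (multiple judges, majority vote).
        votes = [llm_judge(question, sql, rows) for _ in range(3)]
        if sum(votes) >= 2:
            clean.append((question, sql))
    return clean
```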
Step 2: Make the model a SQL specialist (multitask learning)
With a clean dataset, we could move on to the supervised fine-tuning itself. This is the process of taking a large, general-purpose model (in our case, Gemini 2.5 Pro) and training it further on our narrow, specialized dataset to make it an expert in a specific task.
To build these skills directly into the model, we leveraged the publicly available Supervised Tuning API for Gemini on Vertex AI. This service provided the foundation for our multitask supervised fine-tuning (SFT) approach, where we trained Gemini 2.5 Pro on several distinct but related tasks simultaneously.
We also extended our training data to cover tasks outside of the main Text-to-SQL realm, helping enhance the model’s reasoning, planning, and self-correction capabilities.
By training on this combination of tasks in parallel, the model learns a much richer, more robust set of skills. It goes beyond simple question-to-query mapping — it learns to deeply analyze the problem, plan its approach, and refine its own logic, leading to drastically improved accuracy and fewer errors.
Step 3: Inference accuracy + test-time scaling with self-consistency
The final step was to ensure we could reliably pick the model’s single best answer at test time. For this, we used a technique called self-consistency.
With self-consistency, instead of asking the model for just one answer, we ask it to generate several query candidates for the same question. We then execute these queries, cluster them by their execution results, and select a representative query from the largest cluster. This approach is powerful because if the model arrives at the same answer through different reasoning paths, that answer has a much higher probability of being correct.
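A minimal sketch of that selection step, assuming hypothetical generate_sql_candidates and execute helpers (they are not part of any published API):

```python
from collections import defaultdict

def pick_by_self_consistency(question, generate_sql_candidates, execute, n=5):
    """Generate n candidate queries, cluster them by execution result,
    and return one query from the largest cluster."""
    clusters = defaultdict(list)                  # result signature -> queries
    for sql in generate_sql_candidates(question, n):
        try:
            result = execute(sql)                 # assumed to return rows as tuples
        except Exception:
            continue                              # failing candidates are ignored
        signature = tuple(sorted(map(tuple, result)))
        clusters[signature].append(sql)
    if not clusters:
        return None
    largest = max(clusters.values(), key=len)     # most common execution result
    return largest[0]                             # any representative query
```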
It’s important to note that self-consistency is a standard, efficient method, but it is not the only way to select a query. More complex, agentic frameworks can achieve even higher accuracy. For example, our team’s own research on CHASE-SQL (our state-of-the-art ensembling methodology) demonstrates that using diverse candidate generators and a trained selection agent can significantly outperform consistency-based methods.
For this benchmark, we wanted to focus on the model’s core performance. Therefore, we used the more direct self-consistency method: we generated several queries, executed them, and selected a query from the group that produced the most common result. This approach allowed us to measure the model’s raw text-to-SQL ability, minimizing the influence of a more complex filtering or reranking system.
The BIRD Single-Model Track explicitly allows for self-consistency, which reflects the model’s own internal capabilities. The benchmark categorizes submissions based on the number of candidates used (‘Few’, ‘Many’, or ‘Scale’). We found our “sweet spot” in the “Few” (1-7 candidates) category.
This approach gave us the final, critical boost in execution accuracy that pushed our model to the top of the leaderboard. More importantly, it proves our core thesis: by investing in high-quality data and instruction tuning, you can build a single model that is powerful enough to be production-ready without requiring a heavy, high-latency inference framework.
A recipe for customizing Gemini for text-to-SQL
A combination of clean data, multi-task learning, and efficient self-consistency allowed us to take the powerful Gemini 2.5 Pro model and build a specialist that achieved the top-ranking score on the BIRD single-model benchmark.
Our fine-tuned model represents a much stronger baseline for text-to-SQL. However, it’s important to note that this score is not the upper bound of accuracy. Rather, it is the new, higher baseline we have established for the core model’s capability in a constrained setting. These results can be further amplified by either
creating an ensemble, aka integrating this specialist model into a broader system that employs preprocessing (like example retrieval) or agentic scaffolding (like our CHASE-SQL research), or
optimizing model quality for your unique database by enhancing metadata and/or query examples (which is how our customers typically deploy production workloads).
Nevertheless, the insights from this research are actively informing how we build our next-generation AI-powered products for Google Data Cloud, and we’ll continue to deliver these enhancements in our data services.
Explore advanced text-to-SQL capabilities today
We’re constantly working to infuse our products with these state-of-the-art capabilities, starting with bringing natural language queries to applications built on AlloyDB and BigQuery. For AI-enhanced retrieval, customers especially value AlloyDB and its AI functions. AlloyDB integrates AI capabilities directly into the database, allowing developers to run powerful AI models using standard SQL queries without moving data. It offers specialized operators such as AI.IF() for intelligent filtering, AI.RANK() for semantic reranking of search results, and AI.GENERATE() for in-database text generation and data transformation.
And if you want to write some SQL yourself, Gemini Code Assist can help. With a simple prompt, you can instruct Gemini as to the query you want to create. Gemini will generate your code and you can immediately test it by executing it against your database. We look forward to hearing about what you build with it!
Editor’s note: Waze (a division of Google parent company Alphabet) depends on vast volumes of dynamic, real-time user session data to power its core navigation features, but scaling that data to support concurrent users worldwide required a new approach. Their team built a centralized Session Server backed by Memorystore for Redis Cluster, a fully managed service with 99.99% availability that supports partial updates and easily scales to Waze’s use case of over 1 million MGET commands per second with ~1ms latency. This architecture is the foundation for Waze’s continued backend modernization.
Real-time data drives the Waze app experience. Our turn-by-turn guidance, accident rerouting, and driver alerts depend on up-to-the-millisecond accuracy. But keeping that experience seamless for millions of concurrent sessions requires robust, battle-hardened infrastructure built to manage a massive stream of user session data. This includes active navigation routes, user location, and driver reports that can appear and evolve within seconds.
Behind the scenes, user sessions are large, complex objects that update frequently and contribute to an extremely high volume of read and write operations. Session data was once locked in a monolithic service, tightly coupled to a single backend instance. That made it hard to scale and blocked other microservices from accessing the real-time session state. To modernize, we needed a shared, low-latency solution that could handle these sessions in real time and at global scale. Memorystore for Redis Cluster made that possible.
Choosing the right route
As we planned the move to a microservices-based backend, we evaluated our options, including Redis Enterprise Cloud, a self-managed Redis cluster, or continuing with our existing Memcached via Memorystore deployment. In the legacy setup, Memcached stored session data behind the monolithic Realtime (RT) server, but it lacked the replication, advanced data types, and partial update capabilities we wanted. We knew Redis had the right capabilities, but managing it ourselves or through a third-party provider would add operational overhead.
Memorystore for Redis Cluster offered the best of both worlds. It’s a fully managed service from Google Cloud with the performance, scalability, and resilience to meet Waze’s real-time demands. It delivers a 99.99% SLA and a clustered architecture for horizontal scaling. With the database decision made, we planned a careful migration from Memcached to Memorystore for Redis using a dual-write approach. For a period, both systems were updated in parallel until data parity was confirmed. Then we cut over to Redis with zero downtime.
Waze’s new data engine
From there, we built a centralized Session Server – our new command center for active user sessions – as a wrapper around Memorystore for Redis Cluster. This service became the single source of truth for all active user sessions, replacing the tight coupling between session data and the monolithic RT server. The Session Server exposes simple gRPC APIs, allowing any backend microservice to read from or write to the session state directly, including RT during the migration. This eliminated the need for client affinity, freed us from routing all session traffic through a single service, and made session data accessible across the platform.
We designed the system for resilience and scale from the ground up. Redis clustering and sharding remove single points of contention, letting us scale horizontally as demand grows. Built-in replication and automatic failover are designed to keep sessions online. While node replacements may briefly increase failure rates and latency, sessions are designed to stay online, allowing the navigation experience to quickly stabilize. And with support for direct gRPC calls from the mobile client to any backend service, we can use more flexible design patterns while shaving precious milliseconds off the real-time path.
Fewer pit stops, faster rides
Moving from Memcached’s 99.9% SLA to Memorystore for Redis Cluster’s 99.99% means higher availability and resiliency from the service. Load testing proved the new architecture can sustain full production traffic, comfortably handling bursts of up to 1 million MGET commands per second with a stable sub-millisecond service latency.
Because Memorystore for Redis supports partial updates, we can change individual fields within a session object rather than rewriting the entire record. That reduces network traffic, speeds up write performance, and makes the system more efficient overall – especially important when sessions can grow to many megabytes in size. These efficiencies translate directly into giving our engineering teams more time to focus on application-level performance and new feature development.
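To illustrate the partial-update pattern in general terms (this is not Waze's actual data model), storing a session as a Redis hash lets a service update individual fields instead of rewriting a multi-megabyte object. A minimal redis-py sketch, with a placeholder endpoint and key layout:

```python
import redis

# Cluster client pointed at a Memorystore for Redis Cluster discovery endpoint
# (hostname is a placeholder).
r = redis.RedisCluster(host="redis-cluster.example.internal", port=6379)

session_key = "session:user:12345"   # hypothetical key layout

# Write the full session once...
r.hset(session_key, mapping={
    "active_route": "route-789",
    "last_lat": "37.42",
    "last_lng": "-122.08",
})

# ...then update only the fields that changed, rather than the whole object.
r.hset(session_key, mapping={"last_lat": "37.43", "last_lng": "-122.09"})

# Reads can also fetch just what they need.
print(r.hmget(session_key, ["active_route", "last_lat"]))
```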
Session data in Memorystore for Redis Cluster is now integral to Waze’s core features, from evaluating configurations to triggering real-time updates for drivers. It supports today’s demands and is built to handle what’s ahead.
The road ahead
By proving Memorystore for Redis Cluster in one of Waze’s most critical paths, we’ve built the confidence to use it in other high-throughput caching scenarios across the platform. The centralized Session Server and clustered Redis architecture are now standard building blocks in our backend, which we can apply to new services without starting from scratch.
With that initial critical path complete, our next major focus is the migration of all remaining legacy session management from our RT server. This work will ultimately give every microservice independent access to update session data. Looking ahead, we’re also focused on scaling Memorystore for Redis Cluster to meet future user growth and fine-tuning it for both cost and performance.
Learn more
Waze’s story showcases the power and flexibility of Memorystore for Redis Cluster, a fully managed service with 99.99% availability for high-scale, real-time workloads.
Learn more about the power of Memorystore and get started for free.
AWS Marketplace now delivers purchase agreement events via Amazon EventBridge, transitioning from our Amazon Simple Notification Service (SNS) notifications for Software as a Service and Professional Services product types. This enhancement simplifies event-driven workflows for both sellers and buyers by enabling seamless integration of AWS Marketplace Agreements, reducing operational overhead, and improving event monitoring and automation.
Marketplace sellers (Independent Software Vendors and Channel Partners) and buyers will receive notifications for all events in the lifecycle of their Marketplace Agreements, including when they are created, terminated, amended, replaced, renewed, cancelled or expired. Additionally, ISVs receive license-specific events to manage customer entitlements. With EventBridge integration, you can route these events to various AWS services such as AWS Lambda, Amazon S3, Amazon CloudWatch, AWS Step Functions, and Amazon SNS, maintaining compatibility with existing SNS-based workflows while gaining advanced routing capabilities.
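As a rough sketch (the Lambda function ARN is a placeholder), a boto3 rule that routes events from the 'aws.agreement-marketplace' source to a Lambda target might look like this:

```python
import json
import boto3

events = boto3.client("events")

# Match events published by AWS Marketplace agreements; the source name comes
# from this announcement, and the pattern here is intentionally broad.
rule_arn = events.put_rule(
    Name="marketplace-agreement-events",
    EventPattern=json.dumps({"source": ["aws.agreement-marketplace"]}),
)["RuleArn"]

# The Lambda function also needs a resource policy allowing EventBridge to invoke it.
events.put_targets(
    Rule="marketplace-agreement-events",
    Targets=[{
        "Id": "agreement-handler",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:handle-agreements",  # placeholder
    }],
)
print(rule_arn)
```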
EventBridge notifications are generally available and can be created in the AWS US East (N. Virginia) Region.
To learn more about AWS Marketplace event notifications, see the AWS Marketplace documentation. You can start using EventBridge notifications today by visiting the Amazon EventBridge console and enabling the ‘aws.agreement-marketplace’ event source.
Amazon SageMaker Catalog now supports read and write access to Amazon S3 general purpose buckets. This capability helps data scientists and analysts search for unstructured data, process it alongside structured datasets, and share transformed datasets with other teams. Data publishers gain additional controls to support analytics and generative AI workflows within SageMaker Unified Studio while maintaining security and governance controls over shared data.
When approving subscription requests or directly sharing S3 data within the SageMaker Catalog, data producers can choose to grant read-only or read and write access. If granted read and write access, data consumers can process datasets in SageMaker and store the results back to the S3 bucket or folder. The data can then be published and automatically discoverable by other teams. This capability is now available in all AWS Regions where Amazon SageMaker Unified Studio is supported. To get started, you can log into SageMaker Unified Studio, or you can use the Amazon DataZone API, SDK, or AWS CLI. To learn more, see the SageMaker Unified Studio guide.
Amazon RDS Blue/Green deployments now support safer, simpler, and faster updates for your Aurora Global Databases. With just a few clicks, you can create a staging (green) environment that mirrors your production (blue) Aurora Global Database, including primary and all secondary regions. When you’re ready to make your staging environment the new production environment, perform a blue/green switchover. This operation transitions your primary and all secondary regions to the green environment, which now serves as the active production environment. Your application begins accessing it immediately without any configuration changes, minimizing operational overhead.
With Global Database, a single Aurora cluster can span multiple AWS Regions, providing disaster recovery for your applications in case of single Region impairment and enabling fast local reads for globally distributed applications. With this launch, you can perform critical database operations including major and minor version upgrades, OS updates, parameter modifications, instance type validations, and schema changes with minimal downtime. During blue/green switchover, Aurora automatically renames clusters, instances, and endpoints to match the original production environment, enabling applications to continue operating without any modifications. You can leverage this capability using the AWS Management console, SDK, or CLI.
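The SDK calls are the existing RDS blue/green operations; a rough boto3 sketch follows, where the ARN and engine version are placeholders and any Global Database-specific parameters should be taken from the documentation.

```python
import boto3

rds = boto3.client("rds")

# Create the green (staging) environment from the blue (production) source.
# Placeholder ARN and version; for Global Database, the exact source to pass
# is described in the RDS documentation.
bg = rds.create_blue_green_deployment(
    BlueGreenDeploymentName="aurora-global-upgrade",
    Source="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
    TargetEngineVersion="8.0.mysql_aurora.3.08.0",
)
bg_id = bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]

# Later, once the green environment is validated, switch over.
rds.switchover_blue_green_deployment(
    BlueGreenDeploymentIdentifier=bg_id,
    SwitchoverTimeout=300,  # seconds
)
```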
Start planning your next Global Database upgrade using RDS Blue/Green deployments by following the steps in the blog. For more details, refer to our documentation.
AWS IoT Core, AWS IoT Device Management, and AWS IoT Device Defender have expanded support for Virtual Private Cloud (VPC) endpoints and IPv6. Developers can now use AWS PrivateLink to establish VPC endpoints for all data plane operations, management APIs, and credential provider. This enhancement allows IoT workloads to operate entirely within virtual private clouds without traversing the public internet, helping strengthen the security posture for IoT deployments.
Additionally, IPv6 support for both VPC and public endpoints gives developers the flexibility to connect IoT devices and applications using either IPv6 or IPv4. This helps organizations meet local requirements for IPv6 while maintaining compatibility with existing IPv4 infrastructure.
These features can be configured through the AWS Management Console, AWS CLI, and AWS CloudFormation. The functionality is now generally available in all AWS Regions where the relevant AWS IoT services are offered. For more information about the IPv6 and VPC endpoint support, customers can visit the AWS IoT technical documentation pages. For information about PrivateLink pricing, visit the AWS PrivateLink pricing page.
Welcome back to The Agent Factory! In this episode, we’re joined by Ravin Kumar, a Research Engineer at DeepMind, to tackle one of the biggest topics in AI right now: building and training open-source agentic models. We wanted to go beyond just using agents and understand what it takes to build the entire factory line—from gathering data and supervised fine-tuning to reinforcement learning and evaluations.
This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.
Before diving into the deep research, we looked at the latest developments in the fast-moving world of AI agents.
Gemini 2.5 Computer Use: Google’s new model can act as a virtual user, interacting with computer screens, clicking buttons, typing in forms, and scrolling. It’s a shift from agents that just know things to agents that can do tasks directly in a browser.
Vibe Coding in AI Studio: A new approach to app building where you describe the “vibe” of the application you want, and the AI handles the boilerplate. It includes an Annotation Mode to refine specific UI elements with simple instructions like “Change this to green.”
DeepSeek-OCR and Context Compression: DeepSeek introduced a method that treats documents like images to understand layout, compressing 10-20 text tokens into a single visual token. This drastically improves speed and reduces cost for long-context tasks.
Google Veo 3.1 and Flow: The new update to the AI video model adds rich audio generation and powerful editing features. You can now use “Insert” to add characters or “Remove” to erase objects from existing video footage, giving creators iterative control.
Ravin Kumar on Building Open Models
We sat down with Ravin to break down the end-to-end process of creating an open model with agent capabilities. It turns out the process mirrors a traditional ML lifecycle but with significantly more complex components.
Ravin explained that training data for agents looks vastly different from standard text datasets. It starts with identifying what users actually need. The data itself is a collection of trajectories, complex examples of the model making decisions and using tools. Ravin noted that they use a mix of human-curated data and synthetic data generated by their own internal “teacher” models and APIs to create a playground for the open models to learn in.
Training Techniques: SFT and Reinforcement Learning
Once the data is ready, the training process involves a two-phase approach. First comes Supervised Fine-Tuning (SFT), where frameworks update the model's weights to nudge it into new behaviors based on the examples. However, to handle generalization (new situations not in the original training data), they rely on Reinforcement Learning (RL). Ravin highlighted the difficulty of setting rewards in RL, warning that models are prone to "reward hacking," where they might collect intermediate rewards without ever completing the final task.
Ravin emphasized that evaluation is the most critical and high-stakes part of the process. You can't just trust the training process; you need a rigorous "final exam." They use a combination of broad public benchmarks to measure general capability and specific, custom evaluations to ensure the model is safe and effective for its intended use case.
Conclusion
This conversation with Ravin Kumar really illuminated that building open agentic models is a highly structured, rigorous process. It requires creating high-quality trajectories for data, a careful combination of supervised and reinforcement learning, and, crucially, intense evaluation.
Your turn to build
As Ravin advised, the best place to start is at the end. Before you write a single line of training code, define what success looks like by building a small, 50-example final exam for your agent. If you can’t measure it, you can’t improve it. We also encourage you to try mixing different approaches; for example, using a powerful API model like Gemini as a router and a specialized open-source model for specific tasks.
Check out the full episode for more details, and catch us next time!
Starting today, AWS Network Firewall is available in the AWS New Zealand (Auckland) Region, enabling customers to deploy essential network protections for all their Amazon Virtual Private Clouds (VPCs).
AWS Network Firewall is a managed firewall service that is easy to deploy. The service automatically scales with network traffic volume to provide high-availability protections without the need to set up and maintain the underlying infrastructure. It is integrated with AWS Firewall Manager to provide you with central visibility and control over your firewall policies across multiple AWS accounts.
To see which regions AWS Network Firewall is available in, visit the AWS Region Table. For more information, please see the AWS Network Firewall product page and the service documentation.
Amazon EventBridge introduces a new, intuitive console-based visual rule builder with a comprehensive event catalog for discovering and subscribing to events from custom applications and over 200 AWS services. The new rule builder integrates the EventBridge Schema Registry with an updated event catalog and an intuitive drag-and-drop canvas that simplifies building event-driven applications.
With the enhanced rule builder, developers can browse and search through events with readily available sample payloads and schemas, eliminating the need to find and reference individual service documentation. The schema-aware visual builder guides developers through creating event filter patterns and rules, reducing syntax errors and development time.
The EventBridge enhanced rule builder is available today in all regions where the Schema Registry is launched. Developers can get started through the Amazon EventBridge console at no additional cost beyond standard EventBridge usage charges.
For more information, visit the EventBridge documentation.
Amazon RDS for PostgreSQL now supports major version 18, starting with PostgreSQL version 18.1. PostgreSQL 18 introduces several important community updates that improve query performance and database management.
PostgreSQL 18 includes "skip scan" support for multicolumn B-tree indexes and improved WHERE clause handling for OR and IN conditions, enhancing query optimization. Parallel Generalized Inverted Index (GIN) builds and updated join operations boost overall database performance. The introduction of Universally Unique Identifiers Version 7 (UUIDv7) combines timestamp-based ordering with traditional UUID uniqueness, which is particularly beneficial for high-throughput distributed systems. PostgreSQL 18 also improves observability by providing buffer usage counts, index lookup statistics during query execution, and per-connection I/O utilization metrics. This release also includes support for the new pgcollection extension, and updates to existing extensions such as pgaudit 18.0, pgvector 0.8.1, pg_cron 1.6.7, pg_tle 1.5.2, mysql_fdw 2.9.3, and tds_fdw 2.0.5.
You can upgrade your database using several options, including RDS Blue/Green deployments, in-place upgrades, or restoring from a snapshot. Learn more about upgrading your database instances in the Amazon RDS User Guide.
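For the Blue/Green path specifically, a minimal sketch with boto3 might look like the following; the deployment name, source ARN, and target parameter group are placeholders rather than prescriptive values.

```python
# Sketch: starting an RDS Blue/Green deployment to move to PostgreSQL 18.1.
# The deployment name, source ARN, and parameter group are placeholders.
import boto3

rds = boto3.client("rds")

rds.create_blue_green_deployment(
    BlueGreenDeploymentName="pg18-upgrade",  # illustrative name
    Source="arn:aws:rds:us-east-1:123456789012:db:my-postgres-instance",
    TargetEngineVersion="18.1",
    TargetDBParameterGroupName="default.postgres18",  # assumed parameter group
)
```

Once the green environment is healthy and validated, you switch over when ready, which keeps downtime during the major version upgrade to a minimum.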
Amazon RDS for PostgreSQL makes it simple to set up, operate, and scale PostgreSQL deployments in the cloud. See Amazon RDS for PostgreSQL Pricing for pricing details and regional availability. Create or update a fully managed Amazon RDS database in the Amazon RDS Management Console.
Amazon DocumentDB (with MongoDB compatibility) announces version 8.0, which adds support for drivers built for MongoDB API versions 6.0, 7.0, and 8.0. Amazon DocumentDB 8.0 also improves query latency by up to 7x and compression ratio by up to 5x, enabling you to build high-performance applications at a lower cost.
The following are features and capabilities introduced in Amazon DocumentDB 8.0:
Compatibility with MongoDB 8.0: Amazon DocumentDB 8.0 provides compatibility with MongoDB 8.0 by adding support for MongoDB 8.0 API drivers. Amazon DocumentDB 8.0 also supports applications that are built using MongoDB API versions 6.0 and 7.0.
Planner version 3: The new query planner in Amazon DocumentDB 8.0 extends performance improvements to aggregation stage operators, and adds support for aggregation pipeline optimizations and distinct commands.
New aggregation stages and operators: Amazon DocumentDB 8.0 offers six new aggregation stages ($replaceWith, $vectorSearch, $merge, $set, $unset, $bucket) and three new aggregation operators ($pow, $rand, $dateTrunc); see the sketch after this list for an example.
Compression: Support for dictionary-based compression through the Zstandard compression algorithm improves compression ratio by up to 5x, thus improving storage efficiency and reducing I/O costs.
New capabilities: Amazon DocumentDB 8.0 supports collation and views.
A new version of text index: Text index v2 in Amazon DocumentDB 8.0 introduces additional tokens, enhancing text search capabilities.
Vector search improvements: Through parallel vector index build, Amazon DocumentDB 8.0 reduces index build time by up to 30x.
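As a rough illustration of the new aggregation stages and operators, the following pymongo sketch assumes a placeholder DocumentDB 8.0 cluster endpoint and an example orders collection; production connections typically also supply the Amazon DocumentDB CA bundle.

```python
# Sketch: exercising some of the 8.0 aggregation additions with pymongo.
# The connection string, database, collection, and field names are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://user:password@mydocdb-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017"
    "/?tls=true&replicaSet=rs0&retryWrites=false"
)
orders = client["shop"]["orders"]

pipeline = [
    # $set + $dateTrunc: add a day-level timestamp bucket to each order.
    {"$set": {"day": {"$dateTrunc": {"date": "$createdAt", "unit": "day"}}}},
    # $bucket: group orders into total-amount ranges.
    {"$bucket": {
        "groupBy": "$total",
        "boundaries": [0, 50, 100, 500],
        "default": "500+",
        "output": {"count": {"$sum": 1}},
    }},
]

for doc in orders.aggregate(pipeline):
    print(doc)
```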
You can use AWS Database Migration Service (DMS) to upgrade your Amazon DocumentDB 5.0 instance-based clusters to Amazon DocumentDB 8.0 clusters. Please see upgrading your DocumentDB cluster to learn more. Amazon DocumentDB 8.0 is available in all AWS Regions where Amazon DocumentDB is available. To learn more about Amazon DocumentDB 8.0, visit the documentation.
Amazon Simple Queue Service (Amazon SQS) now allows customers to make API requests over Internet Protocol version 6 (IPv6) in the AWS GovCloud (US) Regions. The new endpoints have also been validated under the Federal Information Processing Standard (FIPS) 140-3 program.
Amazon SQS is a fully managed message queuing service that enables decoupling and scaling of distributed systems, microservices, and serverless applications. With this update, customers have the option of using either IPv6 or IPv4 when sending requests over dual-stack public or VPC endpoints.
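As a minimal sketch of that option, the following boto3 snippet asks the SDK to resolve the dual-stack (IPv6-capable) and FIPS endpoints; the Region and queue name are placeholders.

```python
# Sketch: sending an SQS message over a dual-stack, FIPS-validated endpoint.
# The Region and queue name are placeholders.
import boto3
from botocore.config import Config

sqs = boto3.client(
    "sqs",
    region_name="us-gov-west-1",
    config=Config(use_dualstack_endpoint=True, use_fips_endpoint=True),
)

queue_url = sqs.get_queue_url(QueueName="my-queue")["QueueUrl"]  # placeholder queue
sqs.send_message(QueueUrl=queue_url, MessageBody="hello over IPv6")
```

With the dual-stack endpoint configured, the client can reach SQS over either IPv6 or IPv4, whichever the network path supports.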
Amazon SQS now supports IPv6 in all Regions where the service is available, including AWS Commercial, AWS GovCloud (US) and China Regions. For more information on using IPv6 with Amazon SQS, please refer to our developer guide.
AWS Lambda now supports building serverless applications using Rust. Previously, AWS classified Rust support in Lambda as ‘experimental’ and did not recommend using Rust for production workloads. With this launch, Rust support in Lambda is now Generally Available, backed by AWS Support and the Lambda SLA.
Rust is a popular programming language, offering high performance, memory efficiency, compile-time code safety features, and a mature package management and tooling ecosystem. This makes Rust an ideal choice for developers building performance-sensitive serverless applications. Developers can now build business-critical serverless applications in Rust and run them in Lambda, taking advantage of Lambda’s built-in event source integrations, fast scaling from zero, automatic patching, and usage-based pricing.
Lambda support for Rust is available in all AWS Regions, including the AWS GovCloud (US) Regions and the China Regions.
Building on the Hooks Invocation Summary launched in September 2025, AWS CloudFormation Hooks now supports granular invocation details. Hook authors can supplement their Hook evaluation responses with detailed findings, finding severity, and remediation advice. The Hooks console now displays these details at the individual control level within each invocation, enabling developers to quickly identify and resolve specific Hook failures.
Customers can easily drill down from the invocation summary to see exactly which controls passed, failed, or were skipped, along with specific remediation guidance for each failure. This granular visibility eliminates guesswork when debugging Hook failures, allowing teams to pinpoint the exact control that blocked a deployment and understand how to fix it. The detailed findings accelerate troubleshooting and streamline compliance reporting by providing actionable insights at the individual control level.
The Hooks invocation summary page is available in all AWS Commercial and AWS GovCloud (US) Regions. To learn more, visit the AWS CloudFormation Hooks View Invocations documentation.
In a world of increasing data volume and demand, businesses are looking to make faster decisions and separate insight from noise. Today, we’re bringing Conversational Analytics to general availability in Looker, delivering natural language queries to everyone in your organization, removing BI bottlenecks. With Conversational Analytics, we’re transforming the way you get answers, cutting through stale dashboards and accelerating data discovery. Our goal: make analytics and AI as easy and scalable as performing a Google search, extending BI to the broader enterprise as you go from prompt to full data exploration in seconds.
Instant AI-powered insights with Conversational Analytics in Looker
Now, with Conversational Analytics, getting an answer from your data is as simple as chatting with your most knowledgeable colleague. By tapping into human conversation, Conversational Analytics relieves you from struggling with complex dashboard filters, obscure field names, or the need to write custom SQL.
“At YouTube, we’re focused on helping creators succeed and bring their creativity to the world. We’ve been testing Conversational Analytics in Looker to give our partner managers instant, actionable data that lets them quickly guide creators and optimize creator support.” – Thomas Seyller, Senior Director, Technology & Insights, YouTube Business
The general availability of Conversational Analytics combines the reasoning power of Gemini, new capabilities in Google’s agentic frameworks, and the trusted data modeling of the Looker platform. Together, these set the stage for the next chapter in self-service analytics, making reliable data insights accessible to the entire enterprise. Conversational Analytics agents understand your questions and provide insightful answers about your data.
New at general availability is the ability to analyze data across domains. You can ask questions that integrate insights from up to five distinct Looker Explores (pre-joined views), spanning multiple business areas. Additionally, you can share the agents you build with colleagues, giving them faster access to a single source of truth, speeding consensus, and driving uniform decisions.
You can build and share agents with colleagues to have a consistent data picture.
Built on a trusted, governed foundation
The power of Conversational Analytics isn’t just in the conversation it enables; it’s in the trust of the underlying data. Conversational Analytics is grounded in Looker’s semantic layer, which ensures that every metric, field, and calculation is centrally defined and consistent, acting as a crucial context engine for AI. As more of your colleagues adopt these expanded capabilities, you need to know that the results they see and act on are accurate.
For analysts looking to explore data or everyday users receiving insights in the context of their business, Conversational Analytics also improves data self-service, minimizing the technical friction that can create bottlenecks and leave insights locked away.
You can now:
Ask anything, anytime: Get instant answers to simple questions like “Show me our website traffic last month for shoe sales,” leading to deeper questions and greater insights across business areas and domains.
Deepen the discovery: Move beyond the constraints of static dashboards and ask open-ended questions like, “Show me the trend of website traffic over the past six months and filter it by the California region.” The system intelligently generates the appropriate query and visualization instantly.
Extend enterprise BI: Connect your Looker models to your enterprise BI ecosystem, centralize and share agents, and create new dashboards, starting with a prompt. Built on top of Looker Explores, Conversational Analytics’ natural language interface uses LookML for fine-tuning and output accuracy.
Pivot quickly: The conversational interface supports multi-turn questions, so you can iterate on your findings. Ask for total sales, then follow up with, “Now show me that as an area chart, broken down by payment method.”
Gain full transparency: To build confidence and data literacy, the “How was this calculated?” feature provides a clear, natural language explanation of the underlying query that generated the results, so that you understand the source of your findings.
Empower the BI analyst and business user
Conversational Analytics democratizes data for business teams while keeping that data governed. At the same time, it enhances the productivity and influence of data analysts and developers.
When business users can self-serve trusted data insights, data analysts see fewer interruptions and “ad-hoc” ticket requests, and can instead focus on high-impact work. Analysts can customize their client teams’ BI experiences by building Conversational Analytics agents that define common questions, filters, and style guidelines, so different teams can act on the same data in different ways.
Get ready to start talking
Conversational Analytics is available now for all Looker platform users. Your admin can enable it in your Looker instance today, and you will discover how easy it is to move from simply asking “What?” to confidently determining “What’s next?” For more information, review the product documentation or watch this video tutorial.