When it comes to AI, inference is where today’s generative AI models can solve real-world business problems. Google Kubernetes Engine (GKE) is seeing increasing adoption of gen AI inference. For example, customers like HubX run inference of image-based models to serve over 250k images/day to power gen AI experiences, and Snap runs AI inference on GKE for its ad ranking system.
However, there are challenges when deploying gen AI inference. First, during the evaluation phase of this journey, you have to evaluate all your accelerator options. You need to choose the right one for your use case. While many customers are interested in using Tensor Processing Units (TPU), they are looking for compatibility with popular model servers. Then, once you’re in production, you need to load-balance traffic, manage price-performance with real traffic at scale, monitor performance, and debug any issues that arise.
To help, this week at Google Cloud Next, we introduced new gen AI inference capabilities for GKE:
GKE Inference Quickstart to help you set up inference environments according to best practices enhancements
GKE Inference Gateway, which introduces gen-AI-aware scaling and load balancing techniques
Together these capabilities help reduce serving costs by over 30%, tail latency by 60%, and increase throughput by up to 40% compared to other managed and open-source Kubernetes offerings.
GKE Inference Quickstart helps you select and optimize the best accelerator, model server and scaling configuration for your AI/ML inference applications. It includes information about instance types, their model compatibility across GPU and TPUs, and benchmarks for how a given accelerator can help you meet your performance goals. Then, once your accelerators are configured, GKE Inference Quickstart can help you with Kubernetes scaling, as well as new inference-specific metrics. In future releases, GKE Inference Quickstart will be available as a Gemini Cloud Assist experience.
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud containers and Kubernetes’), (‘body’, <wagtail.rich_text.RichText object at 0x3ece88cb05b0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectpath=/marketplace/product/google/container.googleapis.com’), (‘image’, None)])]>
GKE TPU serving stack
With support for TPUs and vLLM, one of the leading open-source model servers, you get seamless portability across GPUs and TPUs. This means you can use any open model, select the vLLM:TPU container image and just deploy on GKE without any TPU-specific changes. GKE Inference Quickstart also recommends TPU best practices so you can seamlessly run on TPUs without any switching costs. For customers who want to run state-of-the-art models, Pathways, used internally at Google for large models like Gemini, allows you to run multi-host and disaggregated serving.
GKE Inference Gateway
GKE Gateway is an abstraction backed by a load balancer to route incoming requests to your Kubernetes applications, and traditionally, it has been tuned for web serving applications, using load-balancing techniques such as round-robin, whose requests have very predictable patterns. But LLMs have high variability in their request patterns. This can result in high tail latencies and uneven compute utilization, which can negatively impact the end-user experience and unnecessarily increase inference costs. In addition, traditional Gateway does not support routing infrastructure for popular Parameter-Efficient Fine-Tuning (PEFT) techniques like Low-Rank Adaptation (LoRA), which can increase GPU efficiency by model reuse during inference.
For scale-out scenarios, the new GKE Inference Gateway provides gen-AI-aware load balancing, for optimal routing. With GKE Inference Gateway, you can define routing rules for safe rollouts, cross-regional preferences, and performance goals such as priority. Finally, GKE Inference Gateway supports LoRA, which lets you map multiple models to the same underlying service, for better efficiency.
To summarize, the visual below shows the needs of the customers during the different stages of the AI inference journey, and how GKE Inference Quickstart, GKE TPU serving stack and GKE Inference Gateway help simplify the evaluation, onboarding and production phases.
What our customers are saying
“Using TPUs on GKE, especially the newer Trillium for inference, particularly for image generation, has reduced latency by up to 66%, leading to a better user experience and increased conversion rates. Users get responses in under 10 seconds instead of waiting up to 30 seconds. This is crucial for user engagement and retention.” – Cem Ortabas, Co-founder, HubX
“Optimizing price-performance for generative AI inference is key for our customers. We are excited to see GKE Inference Gateway with its optimized load balancing and extensibility in open-source. The new GKE Inference Gateway capabilities could help us further improve performance for our customers’ inference workloads “ – Chaoyu Yang, CEO & Founder, BentoML
With GKE’s new inference capabilities, you get a powerful set of capabilities to take the next step with AI. To learn more, join our GKE gen AI inference breakout session at Next 25, and hear how Snap re-architected their inference platform.
Data is the fuel for AI, and organizations are racing to leverage enterprise data to build AI agents, intelligent search, and AI-powered analytics for productivity, deeper insights, and a competitive edge. To power their data clouds, tens of thousands of organizations already choose BigQuery and its integrated AI capabilities.
This decade requires AI-native, multimodal, and agentic data-to-AI platforms, with BigQuery leading the way as the autonomous data-to-AI platform. Finally, we have a platform that infuses AI, makes unstructured data a first class citizen, accelerates open lakehouses and embeds governance.
As an autonomous data-to-AI platform, BigQuery enables a self-managing multimodal data foundation that’s built for processing and activation of all data types, with advanced engines that can be operated on by specialized agents. The platform’s shared catalog and governance layer helps ensure consistent data access, metadata understanding, and security policies across all data and engines, minimizing silos and simplifying management. BigQuery is built on Google’s global infrastructure, leveraging high-bandwidth networks, low-latency storage, and AI-accelerated hardware (TPUs, GPUs), for virtually unlimited scalability. With our commitment to open standards and AI embedded at every layer, this fully integrated architecture accelerates your journey to AI-driven insights at the lowest cost possible.
AI assistance across the entire data lifecycle
Gemini in BigQuery brings a set of AI-powered assistive capabilities to automate data discovery and exploration, data preparation and engineering, analysis and insight generation, covering the entire data journey.
Thousands of organizations are using Gemini in BigQuery. In fact, usage of code assist in BigQuery grew 350% over the last 9 months, with over a 60% code generation acceptance rate across SQL and Python.
Yesterday we announced the general availability of several additional Gemini in BigQuery features and added new capabilities that further enhance and automate your analytics workflows.
Simplify data preparation: BigQuery Gemini-assisted data preparation (GA)provides intelligent suggestions for data enrichment, easily identifies and rectifies data inconsistencies, provides low-code visual data pipelines, and automates the execution and monitoring of your data pipelines.
Faster time to insights with data canvas: BigQuery data canvas allows you to find, transform, query, and visualize data using natural language prompts and a graphic interface. New dataset-level insights (preview) can surface hidden relationships between tables and generate cross-table queries by integrating query usage analysis and metadata.
Boost productivity with coding assistance for DataFrames: With AI code assistance in BigQuery, you can use natural language prompts to generate or suggest code in SQL or Python, or to explain an existing SQL query. We are now extending this code assist capabilities to BigQuery DataFrames (preview).
Improve data and AI governance: New automated metadata generation (preview) uses profile scans and Gemini to create clear and consistent descriptions for columns, tables, and glossary terms, even with large datasets. This metadata improves governance and helps AI agents find the data they need for exploration and analysis.
Accelerate BigQuery migrations: SQL translation assistance (GA) is an AI-based translator that lets you create Gemini-enhanced rules to customize your SQL translations. You can describe changes to the SQL translation output using natural language prompts or specify SQL patterns to find and replace. This can also help in rapidly increasing familiarity with BigQuery SQL.
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3ece8ad7a070>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>
A multimodal autonomous data foundation
BigQuery helps you develop an autonomous data foundation by unifying analytics capabilities across diverse data types and enabling the seamless, concurrent analysis of both structured and unstructured data within a single platform. In fact, customer data in BigQuery grew nearly 30% last year, adding to the multiple exabytes already stored. Furthermore, its native, first-party integration with Vertex AI allows you to apply powerful AI models directly to your data, eliminating the requirement for complex data movement or replication.
“BigQuery and Vertex AI bring all our data and AI together into a single platform. This has transformed how we take action on customer feedback from a lengthy manual process, to a simple natural language query in seconds, allowing us to get to customer insights in minutes instead of months.” – TJ Allard, Lead Data Scientist, Mattel
Yesterday we announced several innovations to enhance our unstructured data support and AI processing:
BigQuery tables for Apache Iceberg (preview:)Connect your Apache Iceberg data to SQL, Spark, AI and third-party engines in an open and interoperable manner so you can get the flexibility of an open data lakehouse alongside the performance and integrated tooling of BigQuery. This offering provides adaptive and autonomous table management, delivers high-performance streaming, auto-AI-generated insights, near-infinite serverless scale and advanced governance.
Native multimodal support for BigQuery tables: Built on object tables, the new ObjectRef data type (preview) enables storage and querying of unstructured and structured data using Python and SQL functions.
Multimodal capabilities for Python users: The BigQuery DataFrames library now has multimodal capabilities for unified structured and unstructured analytics, AI operators for semantic insights, and Gemini code assistance.
Easy capture of Unstructured Data Processing: BigQuery ML new capabilities in preview include AI.GENERATE_TABLE for capturing the output of LLM inference within SQL clauses. Additionally, we’ve expanded model choice to include Anthropic’s Claude, Llama, and Mistral models, and open-source models hosted on Vertex AI.
Scalable, faster and cost-efficient vector search: BigQuery vector search allows you to generate, manage, and search embeddings within a serverless, fully integrated environment for powerful analytics. We are introducing a new index type (GA) based on Google’s ScaNN model coupled with a CPU-optimized distance computation algorithm, enabling scalable, faster and more cost-efficient processing.
Easier time-series forecasting in BigQuery ML: BigQuery ML simplifies time-series forecasting with the new TimesFM model (preview). This pretrained model, developed by Google Research, is user-friendly, accurate, fast, and scalable.
Pinpoint the key factors driving changes in your metrics: Organizations constantly need to answer questions like “Why did our sales drop last month?”. ” Answering these “why” questions accurately is vital, but often involves complex manual analysis. BigQuery’scontribution analysis feature (GA) helps you pinpoint the key factors (or combinations of factors) responsible for the most significant changes in a metric.
Simplified and unified governance in BigQuery
BigQuery offers built-in governance capabilities that simplify how you discover, manage, monitor, govern, and use your data and AI assets. BigQuery universal catalog brings together a data catalog (formerly known as the Dataplex Catalog) and a fully managed, serverless metastore. Yesterday, we announced the following new capabilities for BigQuery governance:
Enable engine interoperability across BigQuery, Apache Spark, and Apache Flink engines with BigQuery metastore (GA). With support for the Iceberg Catalog it simplifies data discovery and querying across engines, mirroring the open-source experience.
Empower your organization with a business glossary (GA),which provides a shared understanding of data. Customers can define and administer company terms in a business glossary, identify data stewards for these terms, and attach them to data asset fields, to improve context, collaboration, and search.
Perform bulk extract of catalog entries into Cloud Storage with Catalog metadata export (GA). This enables a wide range of use cases including metadata analytics by making the export output queryable from BigQuery, programmatic workloads requiring access to a large scope of metadata, and metadata integration.
Automatic at-scale cataloging of BigLake and object tables (GA): BigQuery harvests up-to-date metadata for structured and unstructured data from Cloud Storage and automatically creates query-ready BigLake tables at scale.
Enhanced enterprise capabilities
BigQuery offers easy managed disaster recovery (GA) for compute and storage. It features automatic failover coordination, continuous near-real-time data replication to a secondary region, and fast, transparent recovery during outages. This provides business continuity with industry-leading recovery point objectives (RPO) and recovery time objectives (RTO).
We are also introducing new workload management capabilities (preview) for isolation, resource control, and observability. Users gain granular controls with flexible, securable reservations that allow users to assign to different jobs in the same project to different reservations. Features include reservation level fair sharing of slots, predictability in performance of reservations, and enhanced observability through reservation attribution in billing for better cost tracking.
Improved query performance
To further simplify analytics, we introduced several new innovations to help you get the most out of SQL and make your queries work better for you automatically. Query performance optimizations (GA) improve query performance and automatically identify and accelerate relevant workloads with no changes required to the schema or queries. These include:
Low latency API for short queries enables short-query-optimized mode to improve overall latency of short queries that are common in workloads such as data exploration or building dashboards by executing the query and returns the results inline for SELECT statements.
History-based optimizations use information from already-completed executions of similar queries to apply additional optimizations and further improve query performance such as query latency and slot-time consumed.
Column metadata index (CMETA) provides (almost) infinitely scalable and highly performant metadata management for BigQuery, where you can go from 10GB tables to 100PB and still get great price/performance, without having to worry about redesign or replatforming.
New analytics capabilities
SQL-based continuous queries (GA): Simplify real-time data processing by enabling users to express complex transformations in SQL. You can runcontinuously processing SQLstatements to help analyze, transform, and reverse ETL data the moment new events arrive in BigQuery. This feature now supports slot autoscaling, greater monitoring through Cloud Monitoring, and exports to other clouds.
Simplify SQL with BigQuery pipe syntax (GA): This unique feature extends standard SQL to make it simpler, more concise, and flexible. Pipe syntax lets you apply operators in any order and as often as you need, streamlining SQL queries for tasks like data exploration, dashboard creation, and log analysis. Pipe syntax enhances clarity, efficiency, and maintainability, and its compatibility with most standard SQL operators ensures broad usability.
Geospatial analytics (preview): We’re integrating rich, analysis-ready geospatial datasets from Earth Engine and Google Maps Platform directly into BigQuery data clean rooms. And with the ST_RegionStats function, BigQuery users can now use Earth Engine to efficiently extract statistics from raster data. For the first time, data analysts and decision-makers can access geospatial insights from Google Maps Platform and Earth Engine that lead to more informed and faster business and sustainability outcomes. Key decisions such as optimal site selection for a new business location, how to optimize operations and maintenance of your infrastructure assets, how to enable sustainable sourcing, and more are now enabled directly in BigQuery.
Continued innovation with the ISV ecosystem
Finally, BigQuery’s capabilities are being significantly extended by its vibrant partner ecosystem, through new and enhanced AI integrations and solutions. Anthropic’s Claude models are now accessible via BigQuery ML, facilitating functions like text generation and summarization. GrowthLoop introduced its Compound Marketing Engine built on BigQuery with Growth Agents powered by Gemini, so marketing can build personalized audiences and journeys that drive rapidly compounding growth. Furthermore, Informatica is expanding their services on Google Cloud to enable sophisticated analytical and AI governance use cases.
Significant advancements have also occurred in data management and observability. Fivetran introduced its Managed Data Lake Service for Cloud Storage with native integration with BigQuery metastore and automatic data conversion to open table formats like Apache Iceberg and Delta Lake, improving data lake management and discoverability. DBT is now integrated with BigQuery DataFrames and DBT Cloud is now on Google Cloud. Finally, Datadog has introduced expanded monitoring capabilities for BigQuery, providing granular visibility into query performance, usage attribution, and data quality metrics.
These partner innovations provide customers with expanded functionality, improved operational control, and streamlined access to sophisticated capabilities within the BigQuery ecosystem.
A data-to-AI platform for the autonomous era
BigQuery is evolving beyond a data warehouse and becoming the autonomous data-to-AI platform for all your data teams. The Gemini-powered agents, unified architecture, and commitment to open standards are lowering the barriers to entry for AI-powered analytics and enabling you to focus on what you do best: building innovative models and driving data-driven decisions.
As we bring together more capabilities within a unified platform we are making it easy for you to consume and use the platform with unified commercials with our new BigQuery spend commit. This provides commitments across our BigQuery unified platform, giving you the flexibility to move spend across data processing engines, streaming, governance and more.
Learn more about BigQuery and start exploring how these new features can transform your organization.
Special thanks to Geeta Banda, Head of Outbound Product Management, for her contributions to this blog post.
Data is the critical foundation for AI, yet a vast amount of data’s potential remains untapped. Why? Data quality remains a top barrier. To use enterprise data to drive analytics-driven decisions and build differentiated AI, businesses need to be able to find, understand, and trust their data assets. This requires effective data governance encompassing discovery, cataloging, metadata management, quality assurance, sharing, and access control.
The stakes are high. According to Gartner, “through 2026, those organizations that don’t enable and support their AI use cases through an AI-ready data practice will see over 60% of AI projects fail to deliver on business SLAs and be abandoned.”
At Google Cloud Next 25, we’re announcing BigQuery unified governance, powerful data governance capabilities that help enterprises keep pace with governance complexities. Data silos, fragmented metadata, and ambiguous ownership create significant risks and impede innovation. BigQuery unified governance provides services and tools organizations need to simplify data management and unlock actionable insights.
BigQuery’s built-in, intelligent governance simplifies data and AI management, helping organizations discover, understand, and leverage their assets, transforming governance from a burden into a powerful tool for data activation. Central to BigQuery governance is BigQuery universal catalog, a unified, AI-powered data catalog that natively integrates Dataplex, BigQuery sharing, security and metastore capabilities, bringing together business, technical, and runtime metadata.
BigQuery’s unified governance capabilities are:
1. Unified: BigQuery brings governance directly into the heart of your data-to-AI lifecycle, enabling discovery, understanding, governance, and utilization of your data assets and AI models. This gives data administrators, stewards, and custodians robust tools for metadata management and policy enforcement, providing end-to-end data-to-AI lineage, data profiling, insights, and secure sharing. And with the new universal semantic search, finding the right data is as simple as asking a question in natural language.
2. Intelligent: New governance capabilities powered by gen AI stand to revolutionize data management. By harnessing the power of large language models (LLMs), BigQuery universal catalog can help you uncover hidden relationships between BigQuery data assets, enable automated metadata curation and intelligent query recommendations at scale, automate governance, and democratize data-driven insights across the organization.
3. Open: BigQuery universal catalog insulates you from change with support for open storage standards such as Apache Iceberg, and a unified runtime metastore across SQL, open-source engines, and AI/ML. The BigQuery metastore, which is included in the BigQuery universal catalog, is Iceberg-compliant, enabling a multi-engine, multi-vendor architecture for governance and use of fully managed Iceberg data.
ANZ Bank, a multinational banking and financial services provider, uses the BigQuery universal catalog for comprehensive data governance, discovery, and observability.
“With BigQuery universal catalog, ANZ has significantly improved the reliability and trustworthiness of our data. The centralized data quality monitoring and automated validation features are increasing confidence and efficiency in critical business outputs and decisions based on accurate and consistent information. BigQuery governance has become a cornerstone of our data governance strategy, ensuring our data is not just available, but dependable.” Artur Kaluza, Head of Data Strategy and Transformation, Risk, ANZ
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3ece8b947df0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>
Noteworthy features
The new unified governance experience in BigQuery provides a centralized interface within the BigQuery UI for managing, securing, and sharing data and AI assets. In addition, we are introducing a wide range of key new features and capabilities across governance, sharing, and security.
Governance
1. Full-catalog search with semantic understanding (preview): Users can now discover data and AI resources across projects and data silos within BigQuery using full-catalog semantic search. This feature introduces natural-language search capabilities, making it easier for both technical and non-technical users to search the catalog.
2. Automated metadata curation (preview): BigQuery universal catalog can now automatically generate metadata for BigQuery tables, including table and column descriptions, improving data discovery and support gen AI applications.
3. AI-powered knowledge engine (preview): Users can efficiently discover hidden relationships within a dataset with automated entity-relationship visualization. By leveraging inferred relationships, BigQuery universal catalog generates suggestions for cross-table queries and natural language questions, getting new data teams up to speed fast on unfamiliar data assets.
4. Data products (preview): BigQuery data products allow data owners to create, share, and govern collections of data assets by use case, packaging and sharing them within and across organizations in a way that’s consistent, governed, and that follows security best practices.
5. Business glossary (GA): The BigQuery business glossary provides organizations with a shared understanding of their data. Customers can define and administer company terms, identify data stewards for these terms, and attach them to data asset fields, improving context, collaboration, and search.
6. Automatic at-scale cataloging of BigLake and object tables (GA): BigQuery universal catalog harvests up-to-date metadata for structured and unstructured data from Cloud Storage, and uses it to automatically create query-ready BigLake tables at scale.
7. Automated anomaly detection (preview): BigQuery universal catalog automates data anomaly detection to help you identify data errors, inconsistencies, and outliers in your data, reducing the time you spend identifying and resolving data issues.
Full catalog search with semantic understanding
Automated metadata curation
Sharing
8. BigQuery sharing integration with Google Cloud Marketplace (preview): Data owners can monetize datasets in BigQuery sharing (formerly Analytics Hub) through Google Cloud Marketplace.
9. Stream sharing in BigQuery (GA): Curate and share valuable real-time streams with Pub/Sub topics in BigQuery sharing.
10. Stored procedure sharing in BigQuery (preview): Share SQL stored procedures and enable execution in the subscriber’s project without revealing the actual code.
11. Query template sharing in BigQuery (preview): Customize, reuse, and restrict SQL queries in a clean room through publisher-defined query templates.
Security
12. Data policies on columns (preview): Create raw access and data-masking policies associated directly to a column and that can be reused across columns and tables.
13. Subquery support with row-level security (GA): BigQuery universal catalog now supports SQL subqueries in security access policy definitions, enabling row filtering without changing existing data models.
These built-in governance advancements within the BigQuery platform help organizations unlock the full potential of their data and AI initiatives.
In addition to the innovation in BigQuery, we continue to partner with third-party catalog providers to complement their governance capabilities. For example, Collibra’s enterprise-wide governance for data and AI extends BigQuery universal catalog capabilities to provide end-to-end visibility, quality and stewardship across hybrid and multicloud environments. This partnership helps ensure more teams can discover and trust the data they need to do AI, no matter where it lives, accelerating and strengthening every use case.
By embedding governance into BigQuery and automating metadata management, BigQuery universal catalog is helping businesses move beyond the challenges of data silos and operational inefficiency, ultimately driving innovation and accelerating business impact. Ready to learn more? You can join several sessions covering the latest in BigQuery governance, sharing, and security featuring customer speakers:
From unraveling the mysteries of our planet and the universe, to accelerating medical research and industrial innovation, scientific discovery impacts nearly every facet of human life. Today, scientific progress depends on the interplay of theory, experimentation, and computation, and increasingly, the most important and challenging problems require high-performance computing (HPC) and other advanced computing technologies and techniques.
In recent years, artificial intelligence (AI) has emerged as a powerful tool for information assessment and generation, while also becoming a powerful tool for scientific discovery, business innovation, and productivity. More recently, advances in quantum computing are increasing our confidence in shortening the timelines to solving problems beyond the reach of classical computers. Quantum computers under development now will lead to larger production systems that will catalyze the creation of new drugs and materials, reduce costs and risks in complex financial and logistics scenarios, and enable the development of more capable AI models.
At Google, our vision is to be the most comprehensive, capable, and accessible platform for science. Since 2008, Google Cloud has powered scientific discoveries, providing computational and data storage capabilities — including HPC clusters — to scientists, engineers, and developers worldwide. And this week, to enable continued revolutionary new science, we are bringing the best of Google DeepMind and Google Research together with new infrastructure and AI capabilities in Google Cloud, providing researchers with highly capable, cloud-scale tools for scientific computing. These new capabilities include:
Supercomputing-class infrastructure for scientific computing: Researchers can now deploy and use supercomputing clusters powered by the latest H4D VMs powered by AMD CPUs, and A4/A4X VMs powered by the latest NVIDIA GPUs. These VMs have new low-latency networking that provides supercomputer-like scaling and performance. We’re also announcing Google Cloud Managed Lustre for high performance storage I/O. These resources will enable scientists to tackle large-scale, complex science problems.
Advanced scientific applications powered by AI models for weather forecasting and biology: We’re now offering our first AI-powered science applications for the broader science community: AlphaFold 3 for predicting the structure and interactions of biomolecules, and WeatherNext models for weather forecasting.
AI agents for quicker ideas and faster discovery: Two new AI agents in Google Agentspace – Deep Research and Idea Generation – can help prepare comprehensive research reports and rapidly generate new scientific hypotheses.
Let’s take a look at these new capabilities in more detail.
aside_block
<ListValue: [StructValue([(‘title’, ‘Try Google Cloud for free’), (‘body’, <wagtail.rich_text.RichText object at 0x3e318d9f9d30>), (‘btn_text’, ‘Get started for free’), (‘href’, ‘https://console.cloud.google.com/freetrial?redirectPath=/welcome’), (‘image’, None)])]>
Supercomputing-class infrastructure and tools for science
Supercomputers are designed to achieve maximum performance on very large problems, as well as to train large AI models. With ongoing advances in science and AI, quick and easy access to supercomputing resources is critical.
Researchers can now deploy and use supercomputering-class HPC clusters in Google Cloud based on newH4D VMs (virtual machines), our most powerful CPU-based VMs that use 5th Generation AMD EPYCTM Processors. H4D clusters are connected with Remote Direct Memory Access (RDMA) networking utilizing Google’s Falcon and Titaniumoffload technologies, providing low-latency communications for HPC applications. By using standard message-passing libraries over RDMA, H4D VMs can efficiently scale applications up to tens of thousands of cores, resulting in faster time-to-solution. You can register for the H4D VM preview here.
Harvard University is using Google Cloud to advance heart disease research by simulating large-scale systems of red blood cells and other structures, including magnetically controlled artificial bacterial flagella (ABF), with the goal of developing therapies to attack and dissolve blood clots and circulating tumor cells in human vasculatures.
“With the power of Google’s new H4D-based clusters, we are poised to simulate systems approaching a trillion particles, unlocking unprecedented insights into circulatory functions and diseases. This leap in computational capability will dramatically accelerate our pursuit of breakthrough therapeutics, bringing us closer to effective precision therapies for blood vessel damage in heart disease.” – Petros Koumoutsakos, Harvard University
Professor Koumoutsakos’ research involves the simulation of blood flowing in a microfluidics device which is designed to capture circulating tumor cells.
HPC clusters based on our recently announced A4 and A4X VMs are also a critical component of our scientific discovery portfolio. A4 VMs,built on NVIDIA’s latest HGX B200 GPUs, are a versatile and powerful tool for multiple scientific computing applications, offering excellent performance for direct numerical simulation, and for AI training. A4X VMs, accelerated by NVIDIA GB200 NVL72 GPUs, are purpose-built for training and serving the most demanding, extra-large-scale AI workloads.
Clusters using these GPU-powered VMs can also unlock supercomputing-class performance for the next frontier of innovation: quantum computing. In the future, quantum computing systems will allow scientists to solve problems that are intractable even with the most powerful traditional supercomputers. In the meantime, HPC clusters based on A-series VMs can be used to design tomorrow’s quantum computers and optimize quantum algorithms, by simulating large quantum circuits using the quantum simulation solution blueprint.
For example, Google Research’s Quantum AI team leverages Google Cloud to simulate the intricate device physics of quantum hardware, develop sophisticated hybrid quantum-classical algorithms, and explore and test novel quantum algorithms. This robust simulation environment facilitates scientific breakthroughs by delivering the performance and scalability essential for demanding quantum research workflows.
“We observed excellent scalability simulating a 43-qubit circuit with a depth of 30 on Google Cloud’s new GPU-based supercomputers. These results underscore the potential for researchers to develop and test larger and deeper quantum circuits, which is important for understanding the performance of quantum algorithms and accelerating progress toward applications for today’s quantum computers.” – Sergio Boixo, Director, Computer Science, Google Quantum AI
HPC clusters demand high I/O performance to keep computational performance from stalling. Our new Google Cloud Managed Lustre storage service, developed in collaboration with DataDirect Networks and based on EXAScaler technology, provides the I/O performance needed for supercomputing-scale applications. Google Cloud Managed Lustre delivers a high-performance, fully-managed parallel file system optimized for HPC and AI applications. With petabyte-scale capacity and up to 1 TB/s throughput, Managed Lustre ensures researchers have the I/O performance they need to power their scientific discoveries. Request access to the Managed Lustre preview by contacting your account representative.
Advanced scientific applications powered by AI models
We recently announced our first AI-powered science applications for researchers and enterprises on Google Cloud: the groundbreaking AlphaFold 3 molecular structure and interaction prediction model, and the WeatherNext weather forecasting models.
AlphaFold 3,developed by Google DeepMind and Isomorphic Labs, is revolutionizing biology through its ability to predict the structure and interactions of all of life’s molecules with unprecedented accuracy. Understanding molecular structures and their interactions helps researchers better grasp complex interactions in human health and disease. AlphaFold 3 is now available for non-commercial use on Google Cloud.
“Having access to the scientific capabilities of AlphaFold on Google Cloud can help our research rapidly predict and explore the structure and interactions of all biomolecule classes. This change in capability will accelerate our understanding of diseases and enable the generation of therapeutic hypotheses.” – Sumaiya Iqbal, Senior group lead of the Ladders to Cures Accelerator, Broad Institute
To further support users, we’re simplifying access to AlphaFold 3 through a new high-throughput solution deployable via Cluster Toolkit. This turnkey solution enables efficient batch processing of hundreds to tens of thousands of sequences while minimizing costs by autoscaling infrastructure.
In the domain of weather, Google DeepMind and Google Research WeatherNext models use AI for fast and accurate weather forecasting, and we recently released live WeatherNext AI forecasts on BigQuery and Earth Engine. Today, we’re introducing access to WeatherNext AI models via Google Cloud’s Vertex AI Model Garden, enabling practitioners to customize and deploy these advanced models for energy prediction, logistics, agriculture, risk management, and more.
With easier and more affordable access to faster and more accurate weather forecasting models, researchers can study far more scenarios, and organizations can better prepare for weather events — such as heat waves, floods, and hurricanes — to reduce their impact on infrastructure, personnel, supply chains, and communities.
WeatherNext Graph forecasts visualized in Google Earth Engine, showing forecasted wind speed, wind direction, and precipitation as of September 8, 2023. The visualization demonstrates the projected path of Hurricane Lee over the Atlantic Ocean.
For instance, Carrier plans to leverage Google Cloud’s WeatherNext AI models as part of its Home Energy Management System (HEMS) to help enhance grid flexibility and enable smarter energy management. Once deployed, WeatherNext AI models are expected to help HEMS intelligently manage energy flows in real time — charging, discharging, and redirecting energy based on grid conditions, energy demands, and weather forecasts — contributing to a more balanced and sustainable energy grid.
Using AI as the ultimate research partner
Google’s robust ecosystem of information, productivity, and advanced AI tools has long helped drive scientific research, providing researchers with information and insight. Google Scholar is an indispensable resource for navigating the vast landscape of scientific literature and for discovering and tracking relevant publications. Then there’s Gemini, which can synthesize, summarize and explain information from highly scientific and technical content. And NotebookLM, an AI-powered research assistant, intelligently processes and summarizes selected research papers and datasets, dramatically accelerating literature reviews and extracting crucial information.
We’re excited to announce two new AI agents in Agentspace that have the potential to further accelerate scientific research and to revolutionize hypothesis generation. Deep Researchcondenses hours of research by synthesizing information across internal and external sources to generate in-depth research reports. Idea Generation helps rapidly develop novel ideas through AI agents that create ideas, then test them against each other to find the best hypotheses.
Scientists can also leverage AI StudioandVertex AIon Google Cloud to develop customized AI applications and advanced machine learning workflows. We also recently announced Gemma 3, a collection of lightweight, state-of-the-art open models built from the same research and technology that powers our Gemini 2.0 models. These are our most advanced, portable and responsibly developed open models yet, and can be used to create scientific applications on local devices. Finally, Google Research’s Geospatial Reasoning framework, leveraging Vertex AI Agent Engine, will allow scientists and analysts to unlock powerful insights about the world through new geospatial foundation models and generative AI.
Enabling transformational science today and tomorrow
Together, these new advanced infrastructure, AI applications, and AI productivity technologies provide new cloud-scale scientific capabilities for all kinds of computational science research. Combined with our discovery, collaboration, and productivity tools, we are providing scientists and researchers with a comprehensive array of cloud-powered scientific capabilities.
Argonne National Laboratory, a leading laboratory for open science computational research, is working with Google Cloud to explore how advanced computing technologies and AI tools can empower scientists and engineers to make groundbreaking discoveries faster than ever. Through the collaboration, ANL will use and evaluate Google Cloud solutions for computational research, providing feedback and guidance to further advance the design, performance, and usefulness of Google Cloud for supercomputing-scale science.
“Having access to powerful computational capabilities is critical for making new scientific discoveries and accelerating innovations that power business and society.We are eager to work with Google Cloud to leverage their comprehensive, global-scale AI and HPC infrastructure, software technologies and AI-powered applications such as AlphaFold 3. Argonne National Laboratory’s collaboration with Google Cloud will effectively drive innovation and enable discoveries that change the world — and bring these capabilities to researchers everywhere.” – Rick Stevens, Associate Laboratory Director for Computing, Environment and Life Sciences, Argonne National Laboratory
Scientific discoveries are more important than ever for solving the world’s greatest challenges. At Google, we’re building powerful advanced computing technologies to enable scientific discoveries and innovations, and we are excited to bring all these capabilities together in Google Cloud.
The transformative power of AI and intelligent agents is driving profound changes, where software can understand natural language questions and commands — and even autonomously act on our behalf. At the heart of this revolution is the “AI-ready” enterprise database, an active, intelligent engine that understands the semantics of structured and unstructured data, and uses the power of foundation models to create a platform where you can unlock unprecedented opportunities from enterprise data.
This week at Google Cloud Next, we’re announcing several new capabilities in AlloyDB AI to accelerate intelligent agent and application development. These include advanced semantic search with high-performance filtered vector search, automatic vector index maintenance, and a major increase in the quality of searches using the newly launched AlloyDB AI query engine and the Vertex AI Ranking API. The AI query engine brings AI-powered operators to SQL queries for filtering, as well.
We’re also launching natural language capabilities to provide users and agents with deep insights from natural language questions. Taken together, these innovations position AlloyDB as the foundation for agentic AI, evolving the database beyond data storage and conventional SQL querying to a future where intelligent agents can converse with the data and autonomously explore it on our behalf.
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud databases’), (‘body’, <wagtail.rich_text.RichText object at 0x3e318d9d2040>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/products?#databases’), (‘image’, None)])]>
High-performance, high-quality, and easy semantic search
Modern apps require smart data retrieval that combines structured data with unstructured, multimodal data such as text and images. Previously, AlloyDB AI enabled semantic queries over unstructured data, deeply integrating vector search with PostgreSQL so search results are always up to date. Our next set of AlloyDB AI capabilities addresses customer requests for higher performance, better search result quality, and low-cost automated maintenance.
Adaptive filtering: This innovative technique, now in preview, can help ensure that filters, joins, and vector indexes deliver optimal performance when used together. Adaptive filtering optimizes the query plan once it learns the actual filter selectivity as it access data, and then can appropriately switch between filtered vector search methods.
Vector index auto-maintenance, also in preview, reduces how often you need to rebuild your vector indexes, while ensuring that vector indexes remain accurate and performant even as data changes. You can enable vector index auto-maintenance during index creation or when altering the index.
Reranking: The newly-released AlloyDB AI query engine can enhance semantic search by combining vector search with high-accuracy AI reranking, through the new Vertex AI cross-attention Ranking API. Our reranking capability uses vector search to efficiently generate initial candidates (such as Top N) and then apply the high quality cross-attention Ranking API to accurately determine the final best results (such as Top 10) from those candidates. To give you as much flexibility as possible, AlloyDB AI can connect with any third-party ranking API, including custom ones.
Recall evaluator: Now generally available, this capability provides the transparency you need for managing and tuning the quality of vector search results. With a simple stored procedure, you can evaluate end-to-end recall for any query, including complex ones with filters, joins, and reranking.
Parallel index build: Now generally available, index build parallelization allows developers to build indexes of up to 1 billion rows in just hours, down from several times that number. To support this capability, AlloyDB AI spins up parallel processes to distribute the workload and create indexes faster.
These improvements are made possible by the deep integration of AlloyDB AI’s Scalable Nearest Neighbors (ScaNN) vector index with the PostgreSQL query planner, and they lead to notably faster performance:
10x faster index creation when compared to the HNSW index in standard PostgreSQL.
4x faster vector search when compared to the HNSW index in standard PostgreSQL.
AlloyDB AI natural language
Natural language interfaces on databases showed great progress in 2024, backed by AI technology that turns questions, posed either by end users or by agents, into SQL queries that provide answers.
To further improve accuracy, a quantum leap was needed. Building on the natural language support announced last year, we’re introducing new capabilities to help you build interactive natural language user interfaces that decipher user intent accurately, and can build highly-accurate mappings of user questions to SQL queries that answer them.
Disambiguation: Natural language is inherently ambiguous. The AlloyDB AI natural language interface will ask follow-up questions when it needs more information about user intent. Since ambiguity is often rooted deep in the data, the database is the best at solving it.
For example, a question may refer to “John Smith,” but there may be two John Smiths in the database, or perhaps there’s a “Jon Smith,” whose first name was spelled differently, or even misspelled. AlloyDB concept types and the AlloyDB values index enable finding the relevant entities and their concepts when they’re not obvious from the question.
High accuracy and intent explanation: AlloyDB AI natural language uses plain templates, which correspond to parameterized SQL queries, and faceted templates for providing highly-accurate, virtually-certified answers to predictable and important classes of questions.
For example, a retailer’s product search page could theoretically include dozens of product properties — far too daunting for a screen-based faceted search interface. In contrast, a faceted search template, even with one simple search field, can answer any question that directly or indirectly poses any combination of property requirements. AlloyDB can automatically produce templates from query logs, and you can provide additional templates to boost query coverage. To ensure confidence in results, AlloyDB offers a transparent explanation of its interpretation of user inquiries.
High accuracy and flexibility: For cases where questions are not predictable but question answering must provide flexibility, AlloyDB enables the user to raise accuracy by automatically enriching the context that is used in the mapping of the question to SQL with the rich data found in the schema, the data (such as sample data that can greatly enhance accuracy), and the query logs.
Parameterized secure views: AlloyDB offers parameterized secure views, a new kind of database view that locks down access to end-user data at the database level, to help protect against prompt injection attacks.
Beyond AlloyDB with Agentspace: AlloyDB AI natural language is available in Google Agentspace for building your own agents that, for example, may answer questions by combining AlloyDB data with data from other sources, such as the web or another database.
AlloyDB AI query engine
To empower you to build intuitive and powerful AI applications, AlloyDB AI query engine can unlock deep semantic insights from enterprise data through AI-powered SQL operators. AI query engine leverages Model Endpoint Management, a mechanism for calling any AI model on any platform.
Let’s review AlloyDB AI query engine and other capabilities newly available in AlloyDB AI via new AI models:
AI query engine: AlloyDB SQL now features simple but powerful AI operators — AI.IF() for filters and joins, and AI.RANK() for ordering. These operators use natural language in SQL queries to express the filtering conditions and the ranking criteria. They can use foundation models to bring reasoning and real-world knowledge to SQL queries, and they can use cross-attention models, which also draw their power from foundation models and their real-world knowledge. In particular, AI.RANK() can use the Vertex AI Ranking API to find the most relevant results.
Multimodal embedding generation: Previously, AlloyDB AI enabled a SQL developer to easily generate embeddings from text in SQL statements. We’ve expanded this capability to generate embeddings for any modality (text, images, and videos) so you can search using any modality.
Updated text embedding generation: AlloyDB AI query engine provides out-of-the-box integration with the text-embedding generation model from Google DeepMind.
Getting started
We believe today’s AlloyDB AI announcements — enhanced filtered vector search, next-generation natural language support, and the AI query engine — are the foundation for the future of databases. They provide proactive insights for agents that anticipate and act decisively, powered by AI-ready data. AlloyDB AI is building a database revolution, empowering you to step boldly into this intelligent future and unlock your data’s boundless potential.
Migrating data workloads to BigQuery, our unified Data to AI platform, just got significantly easier. You no longer have to choose between unlocking value from your data assets by migrating to a modern data platform, or mitigating risk by staying put. You can achieve both with BigQuery Migration Services, a collection of free-to-use, cloud-native services that enable large-scale transformations for data warehouses and data lakes by breaking down migrations into templated, iterative and manageable steps. They move data, code, and business logic from on-premises and cloud platforms to BigQuery, utilizing a “next-best action” approach that minimizes time-to-migrate and maximizes ROI for your business transformation.
At Google Cloud Next 25, we announced several new innovations in BigQuery Migration Services, including coverage for data science and expanding support for data engineering and data analytics workloads. New capabilities span across four stages of a data platform migration: 1) automated assessment and planning, 2) automatic code translation, 3) data migration, and 4) validation.
BigQuery Migration Services
Let’s look at the new innovations in BigQuery Migration Services.
1. Automated discovery and assessment with estimated total cost of ownership
Your data platform migration journey begins with automated discovery and assessment of the source environment. BigQuery Migration Services’ automated assessments provide details of the existing environment, create an insights-filled view of the workloads’ projected landed state on BigQuery (including performance and estimated total cost of ownership), and guide you on how to get to BigQuery (migration planning). You can run an assessment with the push of a button on the Google Cloud console, which delivers a detailed Looker-studio report and BigQuery datasets as output. Assessments are available for Teradata, Snowflake, and Redshift, and today, we also announced that assessments for Oracle/Exadata and Cloudera/Hive are available immediately, and that a Databricks assessment is coming soon.
To help with a structured and successful migration, we also announced a source lineage service in preview. This service automatically identifies and groups dependencies between workloads, creating an explicit ordering in which to move them, helping to minimize risk and disruption, and improving time-to-value.
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3ebab2fc1b80>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>
2. Automated code translations
Some of our heaviest investments in BigQuery Migration Services over the past years have been in our code translation services, which migrate code from 15+ sources. Today,we announced advancements inGemini-enhanced code translation, which was previously only available in interactive mode, letting you translate code, like you would with, say, Google Translate.
Now, Gemini-enhanced code translations are also available in batch and API modes, helping you migrate at scale. Coupled with a new unified Translation API that backs all three modes, you can first translate bulk code using batch or API modes, and then fine-tune and debug it using interactive mode.
Gemini-enhanced translations
Now, you canalso preprocess your code withGemini, so you can migrate code that’s not just SQL, but also other kinds of code, e.g., an ETL job with SQL embedded inside XML. This means you don’t need to submit perfectly clean SQL, and can translate code from sources that don’t have full compiler coverage yet.
Finally, there’s an enhanced user-experience in the console to guide you at each step of the translation process, suggesting the next-best action to get you to the finish line.
Enhanced User Experience
These advancements dramatically reduce code conversion times while continuing to deliver over 95% accuracy, helping you tackle large migration jobs with greater efficiency.
3. Data, metadata and permissions migration
Historically, BigQuery Migration Services have supported large-scale data migrations from Teradata and Amazon Redshift. Today, BigQuery Migration Services support incremental updates from Teradata, batch and incremental file and permission migrations from Cloudera, and batch and incremental data migration from Snowflake, all in preview. All migrated data is automatically validated as part of the migration process.
4.Intelligent end-to-end validation
Each step of the migration process will soon include an intelligent validation mechanism that can incorporate schema and data-type updates, vs. static data checksum comparisons that exist today. You can combine validation with source lineage, making it easy to quickly identify discrepancies between source and target environments. This comprehensive code, data, and dependency validation helps ensure your business applications stay intact as you incrementally move them.
Together, these investments in each of the four stages of a data platform migration help automate your journey while containing risk, providing deterministic outcomes, and faster ROI.
Customer successes
Customers trust BigQuery Migration Services for migrating their mission-critical workloads. BigQuery Migration Services usage has grown 3x year over year, with thousands of customers using the services to migrate workloads to BigQuery.
“By migrating from Databricks to BigQuery and combining our own models with the models provided by Google Cloud, we’ve improved the performance and efficiency of our machine learning processes and better positioned ourselves for ongoing growth.” – Hamdi Amroun, Head of AI, Yassir
“BigQuery has unlocked unprecedented scalability and flexibility for VMO2, improving data platform availability and uptime, which ultimately enhances customer experience. By moving all key functions to Google Cloud, VMO2 has reduced its TCO for equivalent on-premises platforms by approximately 30%.” – Vinay Pai, Head of Data Architecture, Virgin Media O2
Take the next steps
Ready to start migrating your data platform to BigQuery? We’re ready to help!
Sign up today for the BigQuery migration incentives program for additional benefits such as Google Cloud credits, implementation services and cloud egress credits.
Government agencies rely on IT providers to provide secure, compliant, and efficient technology to help complete their vital missions. At the same time, cost-savings and productivity are taking center stage. These priorities – lower cost with better security and productivity – may seem at odds, but with the right cloud provider, they don’t have to be.
Starting today, we are offering Google Workspace at a significant discount for U.S. federal government agencies. Workspace is a FedRAMP High authorized communication and collaboration platform that includes familiar apps, such as Gmail, Drive, Docs, Meet and more. Workspace comes with the best of Google AI, including Gemini and NotebookLM, at no additional cost, and is infused with efficient, time-saving features, such as real-time collaboration. Hundreds of thousands of personnel across the Department of Energy, the Air Force Research Laboratory, and others have access to Workspace to enhance their productivity and collaboration. Now, with Gemini being the first AI assistant toreceive FedRAMP High Authorization, Workspace is also paving the way for federal agencies to leverage state-of-the-art AI capabilities in a compliant manner.
Read on to learn more about how Google Workspace could help Federal agencies potentially save up to $2 billion over the next three years with government-wide adoption, while offering improved security and enabling greater productivity.
Cutting costs
Consistent with the U.S. General Services Administration’s strategy of treating the government as a single buyer, we have launched a temporary discount of 71% off the current Multiple Award Schedule (MAS IT) pricing for a bundled offering of Google Workspace Enterprise Plus and Assured Controls Plus, ensuring federal agencies of all sizes can access volume-based pricing. This pricing is effective until September 30, 2025.
aside_block
<ListValue: [StructValue([(‘title’, ‘Google Workspace cost-saving offer for U.S. federal government’), (‘body’, <wagtail.rich_text.RichText object at 0x3ebfa8ea9df0>), (‘btn_text’, ”), (‘href’, ”), (‘image’, None)])]>
Improving security
Workspace has a unique approach to ensuring that cost and productivity are addressed with security top of mind. First, it delivers a secure, reliable, and compliant cloud infrastructure for all customers, ensuring that government agencies receive the same benefits, capacity, and features at the same pace as commercial customers. Second, Workspace can nullify classes of attack vectors since it doesn’t require client desktop apps or on-premises software. And third, Workspace is built-in with AI defenses that leverage threat signals from billions of endpoints and Google’s vast threat intelligence. The result? Workspace blocks more than 99.9% of spam, phishing attempts, and malware, and comes with a 99.9% uptime SLA.
Supercharging productivity
Workspace is highly interoperable with other software tools, leading to faster deployment and increased productivity —including an estimated 30% improvement in collaboration. Designed for the cloud, Google Workspace is intuitive, reducing time to configure workstations by up to 90%, as well as facilitating simplified user onboarding and training.
Additionally, with Gemini in Workspace apps and the Gemini app having achieved FedRAMP High authorization, federal agencies can get more done with AI without additional costs for AI add-ons or subscriptions. Workspace with Gemini dramatically accelerates the creation and sharing of emails, documents, and even transcribed meeting notes. Users can save an average of 105 minutes per week generating text, summarizing content and automating tasks. Critically, 75% of daily Gemini users say it also improves the quality of their work.
Increasing efficiency
Workspace has long been a trusted partner to federal agencies, enabling efficiency through ease of collaboration and communication. Working with the Air Force Research Laboratory since 2021, Workspace has been able to create “a flexible, synergistic enterprise that capitalizes on the seamless integration of data and information through the use of modern methods, digital processes and tools and IT infrastructure.”
We are committed to ensuring the public sector can benefit from Google’s latest AI, innovations, and technologies, freed from redundancy or vendor lock-in.
Learn more about how Google Workspace can help your agency accelerate mission impact. Register now for digital access to Google Cloud Next ’25 to watch keynotes and explore sessions on-demand.
Hello from Google Cloud Next 25 in Las Vegas! This year, it’s all about how AI can reimagine work and improve our lives — even bringing Hollywood classics like The Wizard of Oz to life on one of the biggest screens in the world.
To start, CEO of Google Cloud Thomas Kurian celebrated the incredible momentum we’ve seen over the past year, with over 4 million developers already building with Gemini and a 20 times surge in Vertex AI usage. He also took a moment to acknowledge where this power comes from — our global infrastructure, which has grown to 42 regions, in 200 countries, connected by more than 2 million miles of fiber. This network moves data at “Google speed” — near-zero latency — for billions of users worldwide.
Then, Google and Alphabet CEO Sundar Pichai took the stage and announced our seventh-generation TPU — Ironwood — is coming later this year. Compared to our first publicly available TPU, Ironwood achieves 3600 times better performance. In the same period, our TPUs have become 29 times more energy efficient.
As Sundar said, we’re investing in the full stack of AI innovation — from infrastructure, to research, to models, to platforms. “One year ago, we stood here and talked about the future of AI for organizations. Today, that future is being built by all of us,” he said. Put simply: the opportunity with AI is as big as it gets. In fact, we’ve counted more than 600 use cases from massive brands, forward-thinking entrepreneurs, and major institutions all over the world, including Intuit, Verizon, and many more.
Here’s a taste of all the things that we announced today, across infrastructure, research and models, Vertex AI, and agents.
AI infrastructure
Demand for AI compute is growing — fast. To quote Amin Vadhat, VP & GM of AI, Systems, and Cloud AI, “For over 8 years, it has increased by over 10 times year over year – a factor of 100,000,000 in just 8 years!” Now, we’re bringing this AI infrastructure to you with innovations across our AI Hypercomputer stack.
What we announced:
Ironwood TPUs: Our seventh-generation TPU, Ironwood represents our largest and most powerful TPU to date, a more than 10x improvement from our most recent high-performance TPU. Featuring more than 9,000 chips per pod, Ironwood delivers a staggering 42.5 exaflops of compute per pod, meeting the exponentially growing demands of the most sophisticated thinking models, like Gemini 2.5. Read more here.
Gemini on Google Distributed Cloud: No matter where you are, Gemini can run on GDC locally in air-gapped and connected environments, with support for NVIDIA’s Confidential Computing and DGX B200 and HGX B200 Blackwell systems, with Dell as a key partner. Get the details.
It feels like a long time ago, but we first released Gemini just last year. As our first native multimodal model, Gemini delivered the first 2-million-token context window, and led in price performance with our flash models. Then, we launched Gemini 2.5 Pro, which hit #1 on Chatbot Arena!
Gemini 2.5 Flash — our workhorse model optimized specifically for low latency and cost efficiency — is coming soon to Vertex AI, AI Studio, and the Gemini app.
Imagen 3, our highest quality text-to-image model, now has improved image generation and inpainting capabilities for reconstructing missing or damaged portions of an image. Imagen delivers unmatched prompt adherence, bringing customers’ creative visions to life with incredible precision and is ranked #1 on LMArena.
Chirp 3, our groundbreaking audio generation model, now includes a new way to create custom voices with just 10 seconds of audio input, enabling enterprises to personalize call centers, develop accessible content, and establish unique brand voices—all while maintaining a consistent brand identity.
Lyria, the industry’s first enterprise-ready, text-to-music model, can transform simple text prompts into 30-second music clips, opening up new avenues for creative expression.
Veo 2, our industry-leading video generation model, is expanding with new features that help organizations create videos, edit them, and add visual effects, transforming Veo on Vertex AI from a generation tool to a comprehensive video creation and editing platform.
Help me analyze in Google Sheets which guides you through your data to complete expert-level analysis
Audio overviews in Google Docs, where you can interact with Docs in an entirely new way by creating high-quality audio versions of your content
Google Workspace Flows, to help you automate time-consuming, repetitive tasks and make decisions with more context
Vertex AI
Soon, every enterprise will soon rely on multi-agent systems — multiple AI agents working together — even when built on different frameworks or providers. Vertex AI — our comprehensive platform to orchestrate the three pillars of production AI: models, data, and agents — seamlessly brings these elements together. As Thomas said on stage, tens of thousands of enterprises are already building with Vertex AI and Gemini. In just the last year alone, we’ve seen over 40x growth in Gemini use on Vertex AI, now with billions of API calls per month. So what’s new and better with Vertex AI?
What we announced:
Meta’s Llama 4 is generally available on Vertex AI.
Vertex AI Dashboards: These help you monitor usage, throughput, latency, and troubleshoot errors, providing you with greater visibility and control.
Live API: To enable truly conversational interactions, Live API offers streaming audio and video directly into Gemini.
Agents
Now, let’s get to what everyone is talking about — agents. How to build them, where to build them, and how to scale them. Today we put AI agents in the hands of every employee with Google Agentspace. Employees can now find and synthesize information from within their organization, converse with AI agents, and take action with their enterprise applications. We also expanded Vertex AI to enable multi-agent ecosystems — we’re far beyond single-agent capabilities now.
Agent Development Kit (ADK): Our new AI Agent Development Kit (ADK) is an open-source framework that makes it easy to build multi-agent systems — with ADK, you can build an AI agent in under 100 lines of intuitive code.
Agent Garden: This collection of ready-to-use samples and tools is directly accessible in ADK. Agent Garden allows you to connect your agents to 100+ pre-built connectors, your custom APIs, integration workflows, or data stored within cloud systems like BigQuery and AlloyDB.
Interoperability: With Vertex AI, you can manage agents built on multiple agent frameworks, including LangGraph and Crew AI.
Agents for all
We also announced specialized agents for several of our key constituencies: developers, operators, data scientists and data analysts, as well as several verticals.
What we announced:
Customer Engagement Suite: The next generation of this CRM solution provides out-of-the-box functionality to build agents across web, mobile, call center, in-store and with third-party telephony and CRM systems.
Gemini Code Assist: To boost developer productivity, we now offer agents that can translate natural language requests into multi-step, multi-file solutions, new tools that make it easy to connect Code Assist to external services, third-party partners, or even other agents, and support for Gemini 2.5 and its enhanced coding capabilities.
Gemini Code Assist tools: These are prebuilt connections accessible within Gemini Code Assist’s chat that help you access information from Google apps and industry-leading tools from partners including Atlassian, Sentry, Snyk, and more.
Gemini Cloud Assist: Operators can lean on Gemini for help across a variety of IT tasks, from application design and operations, to troubleshooting, to cost optimization.
Data agents: From new data engineering agent capabilities, a data science agent embedded within Google’s Colab notebook, to Looker conversational analytics to allow business users to chat with their data, AI is making it easy to ask questions of your data — and get easy answers!
And that’s just the tip of the iceberg! There were also spotlights, hundreds of breakout sessions, hands-on labs, and thousands of show floor conversations. We can’t wait to see you again tomorrow, when we’ll share even more news, go deep on today’s announcements, and host the perennial favorite — the Developer Keynote. Have fun in Vegas tonight. But don’t stay out too late, because there’s lots more ahead tomorrow!
aside_block
<ListValue: [StructValue([(‘title’, ‘Turn your new insights from Google Cloud Next into action’), (‘body’, <wagtail.rich_text.RichText object at 0x3ee7bf776ac0>), (‘btn_text’, ‘Get started’), (‘href’, ‘https://cloud.google.com/events/next25-offer-11’), (‘image’, <GAEImage: next 25>)])]>
Anthropic’s Claude 3.7 Sonnet hybrid reasoning model, their most intelligent model to date, is now available through cross-region inference on Bedrock in Europe (Ireland), Europe (Paris), Europe (Frankfurt), and Europe (Stockholm). Claude 3.7 Sonnet represents a significant advancement in AI capabilities, offering both quick responses and extended, step-by-step thinking made visible to the user. This new model includes strong improvements in coding and brings enhanced performance across various tasks, like instruction following, math, and physics.
Claude 3.7 Sonnet introduces a unique approach to AI reasoning by integrating it seamlessly with other capabilities. Unlike traditional models that separate quick responses from those requiring deeper thought, Claude 3.7 Sonnet allows users to toggle between standard and extended thinking modes. In standard mode, it functions as an upgraded version of Claude 3.5 Sonnet. In extended thinking mode, it employs self-reflection to achieve improved results across a wide range of tasks. Amazon Bedrock customers can adjust how long the model thinks, offering a flexible trade-off between speed and answer quality. Additionally, users can control the reasoning budget by specifying a token limit, enabling more precise cost management.
Claude 3.7 Sonnet is also available on Amazon Bedrock in the US East (N. Virginia), US East (Ohio), and US West (Oregon) regions. To get started, visit the Amazon Bedrock console. Integrate it into your applications using the Amazon Bedrock API or SDK. For more information, see the AWS News Blog and Claude in Amazon Bedrock.
We are excited to announce that Amazon SageMaker Studio now supports recovery mode, enabling users to regain access to their JupyterLab and Code Editor applications when configuration issues prevent normal startup.
Starting today, when users encounter application startup failures due to issues such as corrupted Conda configuration or insufficient storage space, they can launch their application in recovery mode on Studio UI or using AWS CLI. When configuration issues occur, users see a warning banner with the recommended solution and can choose to run their space in recovery mode. This simplified environment provides access to essential features like terminal and file explorer, allowing users to diagnose and fix configuration issues without administrator intervention. This functionality provides users with an important self-service mechanism, helping them minimize workspace downtime.
This feature is available in all AWS Regions where Amazon SageMaker Studio is currently available, excluding China Regions and GovCloud (US) Regions. To learn more, visit our documentation.
Amazon Relational Database Service (Amazon RDS) for Oracle now supports R6id and M6id instances. These instances offer up to 7.6 TB of NVMe-based local storage, making them well-suited for database workloads that require access to large amounts of intermediate data beyond the instance’s memory capacity. Customers can configure their Oracle database to use the local storage for temporary tablespace and Database Smart Flash Cache.
Operations such as sorts, hash joins, and aggregations can generate large amounts of intermediate data that doesn’t fit in memory and is stored in temporary tablespace. With R6id and M6id, Customers can place temporary tablespaces in the local storage instead of the Amazon EBS volume attached to their instance to reduce latency, improve throughput, and lower the provisioned IOPS.
Customers with Oracle Enterprise Edition license can configure Database Smart Flash Cache to use the local storage. When configured, Smart Flash Cache will use the local storage to keep frequently accessed data that doesn’t fit in memory and improve the read performance of the database.
You can launch the new instance in the Amazon RDS Management Console or using the AWS CLI. Refer Amazon RDS for Oracle Pricing for available instance configurations, pricing details, and region availability.
Amazon Aurora PostgreSQL-Compatible Edition now supports PostgreSQL versions 16.8, 15.12, 14.17 and 13.20. Please note, this release supports the versions released by the PostgreSQL community on February 20,2025 which replaces the previous February 13, 2025 release. These releases contain product improvements and bug fixes made by the PostgreSQL community, along with Aurora-specific security and feature improvements such as dynamic resizing of the allocated space for Optimized Reads-enabled temporary objects on Aurora I/O-Optimized clusters and new features for Babelfish. For more details, please refer to the release notes.
These releases are now available in all commercial AWS regions and AWS GovCloud (US) Regions, except China regions. You can initiate a minor version upgrade by modifying your DB cluster. Please review the Aurora documentation to learn more. For a full feature parity list across regions, head to our feature parity page.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Aurora PostgreSQL-Compatible Edition now supports pgvector 0.8.0, an open-source extension for PostgreSQL for storing vector embeddings in your database. pgvector provides vector similarity search capabilities that enables Aurora use in generative artificial intelligence (AI) semantic search and retrieval-augemented generation (RAG) applications. pgvector 0.8.0 includes improvements to PostgreSQL query planner’s selection of index when filters are present, which can deliver better query performance and improve search result quality.
pgvector 0.8.0 improves data filtering using conditions in WHERE clauses and joins that can improve query performance and usability. Additionally, the iterative index scans help prevent ‘overfiltering’, ensuring generation of sufficient results to satisfy the conditions of a query. If an initial index scan doesn’t satisfy the query conditions, pgvector will continue to search the index until it hits a configurable threshold. pgvector 0.8.0 also has performance improvements for searching and building HNSW indexes.
pgvector 0.8.0 is available in Amazon Aurora clusters running PostgreSQL 16.8, 15.12, 14.17, and 13.20 and higher in all AWS Regions including AWS GovCloud (US) Regions, except China. You can initiate a minor version upgrade by modifying your DB cluster. Please review the Aurora documentation to learn more.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Graviton3-based M7g instances for both Standard brokers and Express brokers for MSK Provisioned clusters in Middle East (UAE) AWS region.
Graviton M7G instances for Standard brokers deliver up to 24% compute cost savings and up to 29% higher write and read throughput over comparable MSK clusters running on M5 instances. When you use Graviton instance on Express brokers, you can realize even more benefits with up to 3x more throughput per broker, scale up to 20x faster, and reduce recovery time by 90% compared to standard Apache Kafka brokers.
The latest Well-Architected Framework update is now available in the Well-Architected Tool, featuring updates and improvements for 78 new best practices that offer actionable guidance to help organizations build more secure, resilient, scalable, and sustainable workloads.
With this release, the Well-Architected Framework has refreshed 100% of each pillar, including the Reliability Pillar, with 14 of its best practices updated for the first time since major Framework improvements started in 2022.
With the refreshed AWS Well-Architected Framework, organizations can use our actionable guidance to help achieve more operable, secure, sustainable, scalable, and resilient environment and workload solutions.
The updated AWS Well-Architected Framework is available now for all AWS customers. To learn more about the AWS Well-Architected Framework, visit the AWS Well-Architected Framework documentation.
Today, AWS announces a new fulfillment experience for container products in AWS Marketplace, enhancing the deployment and management of container-based software from AWS Partners.
The new fulfillment experience helps to reduce complexity and improve workflow efficiency by making it easier to understand available deployment options, and providing explanations of each option’s purpose and implications. The fulfillment experience also offers readily accessible help resources, including detailed guides from AWS Marketplace sellers. The experience is available across all AWS Regions and in local languages, delivering a consistent experience worldwide.
To learn more about the new fulfillment experience for container products in AWS Marketplace and how it can benefit your organization, visit the AWS Marketplace Buyer Guide or start exploring container products in AWS Marketplace today.
Amazon OpenSearch Service expands its modernized operational analytics experience to the AWS Europe (Stockholm) and Asia Pacific (Hong Kong) Regions, enabling users to gain insights across data spanning managed domains and serverless collections from a single endpoint. The expansion includes Workspaces to enhance collaboration and productivity, allowing teams to create dedicated spaces. Discover is revamped to provide a unified log exploration experience supporting languages such as SQL and Piped-Processing-Language (PPL), in addition to DQL and Lucene. Discover now features a data selector to support multiple sources, new visual design and query autocomplete for improved usability. This experience ensures users can access the latest UI enhancements, regardless of version of underlying managed cluster or collection.
The expanded OpenSearch analytics helps users gain insights from their operational data by providing purpose-built features for observability, security analytics, and search use cases. With the enhanced Discover interface, users can now analyze data from multiple sources without switching tools, improving efficiency. Workspaces enable better collaboration by creating dedicated environments for teams to work on dashboards, saved queries, and other relevant content. Availability of the latest UI updates across all versions ensures uninterrupted access to the newest features and tools.
Starting today, PartyRock is supporting an image playground that uses the Amazon Nova Canvas foundation model to transform your ideas into customizable images. You can access the image playground directly through the “Images” section, featuring an intuitive interface and comprehensive customization options.
This new capability enhances PartyRock’s existing image generation features. While you could previously generate images using widgets in your apps, you can now also create images through the dedicated image playground. The playground offers configuration options including orientation choices (landscape, portrait, square), resolution sizes, and color guidance. The image playground comes with pre-filled prompts to help you get started, and provides suggested prompts after each generation to help refine and customize your images further.
We welcome your feedback and contributions to help shape our roadmap as we continue to enhance PartyRock’s capabilities for improving everyday productivity. You can experiment with PartyRock using a free daily use grant, without worrying about exhausting free trial credits. To begin creating with the image playground, try PartyRock today.
Today, Amazon Q Developer announced expanded multi-language support for the integrated development environment (IDE) and the Q Developer CLI. Among the many supported languages are Mandarin, French, German, Italian, Japanese, Spanish, Korean, Hindi and Portuguese, with more languages available.
To get started, simply start a conversation with Q Developer using your preferred language. Q Developer will then automatically detect it and provide answers, code suggestions, and responses in the appropriate language, making development more accessible and efficient for global teams.
Amazon MQ is now available in two new regions, Asia Pacific (Thailand) and Mexico (Central). With this launch, Amazon MQ is now available in a total of 36 regions.
Amazon MQ is a managed message broker service for open-source Apache ActiveMQ and RabbitMQ that makes it easier to set up and operate message brokers on AWS. Amazon MQ reduces your your operational responsibilities by managing the provisioning, setup, and maintenance of message brokers for you. Because Amazon MQ connects to your current applications with industry-standard APIs and protocols, you can more easily migrate to AWS without having to rewrite code.