Today, AWS announces a new flow management feature for AWS Network Firewall that enables customers to identify and control active network flows. This feature introduces two key functions: Flow Capture, which allows point-in-time snapshots of active flows, and Flow Flush, which enables selective termination of specific connections. With these new capabilities, customers can now view and manage active flows based on criteria such as source/destination IP addresses, ports, and protocols, providing enhanced control over their network traffic.
This new feature helps customers maintain consistent security policies when updating firewall rules and enables rapid response during security incidents. Network administrators can now easily validate security configurations and ensure that all traffic is evaluated against current policies. The flow management feature is particularly valuable for troubleshooting network issues and isolating suspicious traffic during security events. By providing granular control over active network flows, AWS Network Firewall enhances customers’ ability to maintain a secure and efficient network environment.
The new flow management feature is available in all regions where AWS Network Firewall is supported, allowing customers to benefit from these enhanced capabilities across their global infrastructure.
Amazon Bedrock RAG evaluation is now generally available. You can evaluate your retrieval-augmented generation (RAG) applications, either those built on Amazon Bedrock Knowledge Bases or a custom RAG system. You can evaluate either retrieval or end-to-end generation. Evaluations are powered by an LLM-as-a-judge, with a choice of several judge models. For retrieval, you can select from metrics such as context relevance and coverage. For end-to-end retrieve and generation, you can select from quality metrics such as correctness, completeness, and faithfulness (hallucination detection), and responsible AI metrics such as harmfulness, answer refusal, and stereotyping. You can also compare across evaluation jobs to iterate on your Knowledge Bases or custom RAG applications with different settings like chunking strategy or vector length, rerankers, or different content generating models.
*Brand new – more flexibility!* As of today, in addition to Bedrock Knowledge Bases, Amazon Bedrock's RAG evaluation supports custom RAG pipeline evaluations. Customers evaluating custom RAG pipelines can now bring their input-output pairs and retrieved contexts directly into the evaluation job in their input dataset, enabling them to bypass the call to a Bedrock Knowledge Base ("bring your own inference responses"). We also added citation precision and citation coverage metrics for Bedrock Knowledge Bases evaluation. If you use a Bedrock Knowledge Base as part of your evaluation, you can incorporate Amazon Bedrock Guardrails directly.
Amazon Bedrock Model Evaluation’s LLM-as-a-judge capability is now generally available. Amazon Bedrock Model Evaluation allows you to evaluate, compare, and select the right models for your use case. You can choose an LLM as your judge from several available on Bedrock to ensure you have the right combination of evaluator models and models being evaluated. You can select quality metrics such as correctness, completeness, and professional style and tone, as well as responsible AI metrics such as harmfulness and answer refusal. You can evaluate all available models on Amazon Bedrock, including serverless models, Bedrock Marketplace models compatible with Converse API, customized and distilled models, imported models, and model routers. You can also compare results across evaluation jobs.
*Brand new – more flexibility!* Today, you can evaluate any model or system hosted anywhere by bringing inference responses you have already generated into your input prompt dataset for the evaluation job ("bring your own inference responses"). These responses can come from an Amazon Bedrock model or from any model or application hosted outside of Amazon Bedrock, enabling you to bypass calling an Amazon Bedrock model in the evaluation job and allowing you to incorporate all the intermediate steps of your application into your final responses.
With LLM-as-a-judge, you can get human-like evaluation quality at lower cost, while saving weeks of time.
When it comes to managing the infrastructure and AI that powers Google’s products and platforms – from Search to YouTube to Google Cloud – every decision we make has an impact. Traditionally, meeting growing demands for machine capacity means deploying new machines and that has an associated embodied carbon impact. That’s why we’re working to reduce the embodied carbon impact at our data centers by optimizing machine placement and promoting the reuse of technical infrastructure hardware.
In this post, we shine a spotlight on our hardware harvesting program, an approach to fleet deployment that prioritizes the reuse of existing hardware.
The hardware harvesting program
The concept is simple: As we deploy new machines or components in our fleet, we repurpose older equipment for alternative and/or additional use cases. The harvesting program prioritizes the reuse of existing hardware, which reduces our carbon emissions compared to exclusively buying brand new machines from the market. This program also helps conserve valuable resources and minimize waste, which contributes to a more circular economy. By scrutinizing the carbon impact of deployment decisions, we’re not just reducing emissions — we’re embedding carbon considerations into the very core of our data center machine operations and business decisions.
Hardware harvesting is not without its challenges. For the program to be successful, we need to ensure the harvested machines meet the specific demands of our workloads and our customers’ requirements, which vary depending on the type of machine and its configuration. However, our heterogeneous fleet, with a wide variety of computational, storage, and accelerator machines, gives us the flexibility to find creative solutions that support both our services and our sustainability goals.
Hardware harvesting in action
Google’s harvesting program has already yielded strong benefits. By prioritizing the reuse of existing hardware, we’ve been able to optimize the use of new equipment, reduce our carbon footprint, minimize waste and lower costs.
For example, in 2024, we needed more specific models and configurations of certain components (PCBs, CPUs, motherboards, and HDDs). We harvested them from existing machines by migrating configuration-agnostic jobs from existing machines to more efficient ones, then reclaimed the components from these specific machines. In 2024, the harvesting program helped us reuse over 293,000 components to fulfill new demand, save carbon emissions, and reduce costs. Scaling this hardware harvesting approach across Google’s data center infrastructure presents an opportunity for cost, resource, and carbon reduction.
Looking ahead: Leading by example
Harvesting is just one example of how we’re embedding carbon considerations into our data center practices. We believe that these initiatives will play a role in helping us achieve our company-wide net-zero goal and build a more sustainable future for cloud computing and AI. Read our 2024 Environmental Report to learn more about our sustainability practices.
As we continue to refine our strategies, we aim to lead by example and encourage other companies, especially those in the cloud computing industry, to consider similar approaches.
Model endpoint management helps developers to build new experiences using SQL and provides a flexible interface to call gen AI models running anywhere — right from the database. You can generate embeddings inside the database, perform quality control on your vector search and analyze sentiment in the database, making it easier to monitor results. This feature is available through the google_ml_integration extension, which enables an integration with Vertex AI for both AlloyDB and Cloud SQL for PostgreSQL.
Previously, the google_ml_integration extension only allowed users to call models hosted on the Vertex AI platform. With model endpoint management, you can leverage models running on any platform — including your own local environment. We also added ease-of-use support for models running on OpenAI, Hugging Face, and Anthropic, as well as Vertex AI's latest embedding models, so you can easily access these models. We have preconfigured the connectivity details and input/output transformation functions for these providers, so you can simply register the model and set up the authentication details.
For Vertex AI models, we have pre-registered embedding and Gemini models so that you can start calling them right away. Newer embedding models also have built-in support, meaning you can access the latest versions of pre-registered models and start making prediction calls out of the box.
In this blog, we’ll walk you through three example workflows that leverage model endpoint management to build richer generative AI experiences.
Generating embeddings with OpenAI embedding models
Leveraging Gemini to evaluate vector search results
Running sentiment analysis to analyze user sentiment
First, register your model.
To use your own model, register it using the google_ml.create_model function, where you specify the model endpoint's connectivity details. You can then configure a set of optional parameters that transform the model's input and output arguments into a format suitable for your database. Here's an example of registering Anthropic's Claude model.
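A minimal sketch of what that registration can look like, assuming a Secret Manager secret has already been registered under the ID anthropic_secret (the parameter names and values below are illustrative, not exact):

CALL google_ml.create_model(
  model_id => 'claude-haiku',                                    -- name you will reference in queries
  model_provider => 'anthropic',                                 -- provider with ease-of-use support
  model_type => 'generic',                                       -- non-embedding model
  model_request_url => 'https://api.anthropic.com/v1/messages',  -- endpoint the database calls
  model_auth_type => 'secret_manager',                           -- authenticate via a registered secret
  model_auth_id => 'anthropic_secret');                          -- illustrative secret ID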
Once you register your model, you can call it with the predict_row function for any AI model — or you can use the embedding convenience function to call an embedding model.
#1: Generate embeddings with OpenAI embedding models
Model endpoint management allows you to leverage the embedding convenience function with any embedding model, even ones that don't run on Google Cloud. Say you want to generate embeddings with OpenAI's ada embedding model. With our ease-of-use support, you need only register your authentication credentials, register the model, and start generating embeddings. First, configure authentication for the endpoint you want to reach — either by creating a PostgreSQL function that specifies your API key in the header of the API call, or by creating a secret in Secret Manager and then registering the secret with model endpoint management.
To register your secret, you simply specify the secret path and create an ID for the secret. You can find the secret path in Secret Manager by clicking on the secret, and then clicking "copy resource name" on the specific version of the secret you want to use.
Once your secret has been registered, you can register the openai-ada model and point it at the secret, open_ai_secret. Our ease-of-use support handles the input and output formatting so that you can generate embeddings from data in your database and directly use the output embedding for vector search.
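As a rough sketch, assuming your API key lives in a Secret Manager secret (the resource path and qualified model name below are illustrative), the two registration calls might look like this:

-- Register the Secret Manager secret that holds your OpenAI API key
CALL google_ml.create_sm_secret(
  secret_id => 'open_ai_secret',
  secret_path => 'projects/my-project/secrets/openai-api-key/versions/1');

-- Register the OpenAI ada embedding model and point it at that secret
CALL google_ml.create_model(
  model_id => 'openai-ada-002',
  model_provider => 'open_ai',
  model_type => 'text_embedding',
  model_qualified_name => 'text-embedding-ada-002',
  model_request_url => 'https://api.openai.com/v1/embeddings',
  model_auth_type => 'secret_manager',
  model_auth_id => 'open_ai_secret');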
You then need only specify the name of the registered model in the first argument and the text in the second argument. For instance, to generate an embedding for the phrase "I love Google Databases", you would invoke the embedding function like so:
select google_ml.embedding('openai-ada-002', 'I love Google Databases');
If you want to generate an embedding in-line while performing a vector search, combine the embedding function with vector search in SQL using the following syntax:
select id, name from items
ORDER BY embedding <-> google_ml.embedding('openai-ada-002', 'I love Google Databases')
LIMIT 10;
Model endpoint management also has built-in integrations with Vertex AI's latest embedding models, allowing you to access any of Vertex AI's supported text embedding models. We recommend the embedding() function for in-line SQL queries or for generating stored embeddings on datasets smaller than 100k rows.
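For example, to backfill stored embeddings on the items table used above (a sketch; it assumes a text column named name and a pgvector embedding column):

-- Generate and store embeddings for rows that don't have one yet
UPDATE items
SET embedding = google_ml.embedding('text-embedding-005', name)::vector
WHERE embedding IS NULL;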
#2: Leverage Gemini to evaluate vector search results
In addition to a deep integration with embedding models, model endpoint management provides developers out-of-the-box support for the latest Gemini models. Gemini Pro and Gemini Flash Light are both available as pre-registered models in AlloyDB and Cloud SQL for PostgreSQL. Leveraging Gemini, you can generate content, perform sentiment analysis, or analyze the quality of vector search results. Let's see how you might analyze the quality of your vector search results with Gemini using the predict_row function.
Suppose you have a table apparels with an ID, product_description and embedding column. We can use model endpoint management to call Gemini to validate the vector search results by comparing a user’s search query against the product descriptions. This allows us to see discrepancies between the user’s query and the products returned by the vector search.
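For reference, a minimal version of that schema might look like the following (the 768-dimension vector column is an assumption and depends on the embedding model you use):

-- Minimal apparels table backing the query below
CREATE TABLE apparels (
  id BIGINT PRIMARY KEY,
  product_description TEXT,
  embedding vector(768)  -- dimension depends on the embedding model you use
);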
SELECT
  LLM_RESPONSE
FROM (
  SELECT
    json_array_elements(
      google_ml.predict_row(
        model_id => 'gemini-1.5-pro:streamGenerateContent',
        request_body => CONCAT('{
          "contents": [
            { "role": "user",
              "parts": [
                { "text": "Read this user search text: ', user_text, ' Compare it against the product inventory data set: ', content, ' Return a response with 3 values: 1) MATCH: if the 2 contexts are at least 85% matching or not: YES or NO 2) PERCENTAGE: percentage of match, make sure that this percentage is accurate 3) DIFFERENCE: A clear short easy description of the difference between the 2 products. Remember if the user search text says that some attribute should not be there, and the record has it, it should be a NO match." }
              ]
            }
          ] }')::json
      )
    ) -> 'candidates' -> 0 -> 'content' -> 'parts' -> 0 -> 'text' AS LLM_RESPONSE
  FROM (
    SELECT
      id || ' - ' || product_description AS literature,
      product_description AS content,
      'I want womens tops, pink casual only pure cotton.' AS user_text
    FROM
      apparels
    ORDER BY
      embedding <=> embedding('text-embedding-005',
        'I want womens tops, pink casual only pure cotton.')::vector
    LIMIT
      5 ) AS xyz ) AS X;
We pass the vector search results to Gemini to qualitatively evaluate how well the user's query matches the descriptions, and to note differences in natural language. This lets you build quality control into your vector search use case so that your vector search application improves over time. For the full end-to-end use case, follow this codelab.
#3: Run sentiment analysis to analyze user sentiment
One of the benefits of calling Gemini in the database is its versatility. Above, we showed how you can use it to check the quality of your vector search. Now, let’s take a look at how you might use it to analyze the sentiment of users.
Say you are an e-commerce company and you want to perform sentiment analysis on user review information stored in the database. You have a table products which stores the name of the product and their descriptions. You have another table of product reviews, product_reviews, storing user reviews of those products joined on the id of the product. You just added headphones to your online offering and want to see how well they are doing in terms of customer sentiment. You can use Gemini through model endpoint management to analyze the sentiment as positive or negative in the database and view the results as a separate column.
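A minimal sketch of those two tables might look like this (column names other than the ones mentioned above are illustrative):

-- Products and their user reviews, joined on product_id
CREATE TABLE products (
  product_id BIGINT PRIMARY KEY,
  name TEXT,
  description TEXT
);

CREATE TABLE product_reviews (
  review_id BIGINT PRIMARY KEY,
  product_id BIGINT REFERENCES products (product_id),
  product_review TEXT
);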
First, create a wrapper function in SQL that sends a prompt and the text you want to analyze to Gemini with the predict_row function.
-- Pass in the prompt for Gemini and the text you want to analyze the sentiment of
CREATE OR REPLACE FUNCTION get_sentiment(prompt text)
RETURNS VARCHAR(100)
LANGUAGE plpgsql
AS $$
DECLARE
  prompt_output VARCHAR(100);
  predict_row_input text;
BEGIN
  -- Build the Gemini request body from the prompt
  SELECT '{
    "contents": [{"role": "user","parts": [{"text": "Only return just the output value for the input. input: ' || prompt || '. output:"}]}]}' INTO predict_row_input;
  -- Call Gemini and extract the text of the first candidate response
  SELECT trim(replace(google_ml.predict_row('gemini-1.5-pro:generateContent', predict_row_input::json) -> 'candidates' -> 0 -> 'content' -> 'parts' -> 0 -> 'text' #>> '{}', E'\n', ''))
  INTO prompt_output;
  -- Return the sentiment value
  RETURN prompt_output;
END; $$;
Now let’s say you want to analyze the sentiment on a single review — you could do it like so:
SELECT
  get_sentiment(
    'Please output a sentiment for a given review input. The sentiment value return should be a single word positive/negative/neutral. Input review: These headphones are amazing! Great sound quality and comfortable to wear.');
You can then generate predictions on only the reviews containing the word "headphones" by using a LIKE clause and calling your get_sentiment function:
SELECT
  review_id,
  product_review,
  get_sentiment(
    'Please output a sentiment for a given review input. The sentiment value return should be a single word positive/negative/neutral. Input review: '
    || product_review)
FROM product_reviews
WHERE product_id IN (SELECT product_id FROM products WHERE name LIKE '%Headphones%');
This outputs whether each headphone review was positive, negative, or neutral, letting you see the user sentiment around this new product. Later, you can use aggregators to see whether the majority of the sentiment is positive or negative.
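For example, a rough aggregation over the same reviews might look like this (it reuses the get_sentiment function above; in practice you may prefer to store the computed sentiment in a column first rather than re-calling the model on every query):

-- Count headphone reviews per sentiment value
SELECT
  get_sentiment(
    'Please output a sentiment for a given review input. The sentiment value return should be a single word positive/negative/neutral. Input review: '
    || product_review) AS sentiment,
  COUNT(*) AS review_count
FROM product_reviews
WHERE product_id IN (SELECT product_id FROM products WHERE name LIKE '%Headphones%')
GROUP BY 1
ORDER BY review_count DESC;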
Get started
Model endpoint management is now available in AlloyDB, AlloyDB Omni and Cloud SQL for PostgreSQL. To get started with it, follow our documentation on AlloyDB and Cloud SQL for PostgreSQL.
If you used the internet today, you’ve probably already benefited from generative AI. Whether it helped you get your work done faster, research home repairs, or find the perfect gift, gen AI is transforming how we get things done. These generative AI experiences use searches against vector embeddings — multi-dimensional representations of data’s meaning — to match your intent with the best answer.
But integrating vector technology into existing applications can be challenging. Many databases have historically not supported vector search, so developers have had to integrate specialized vector databases side-by-side with their existing databases.
Enter MySQL similarity search
Cloud SQL for MySQL now supports vector storage and similarity search, which means you can transform your MySQL databases in place to integrate gen AI capabilities without a specialized vector database. Now generally available, it’s as simple as adding a new column to your existing table and loading in your vector embeddings, which you can generate using your favorite models; for example, you can use Vertex AI’s pre-trained text embeddings models. Once you’ve imported your dataset, you can perform both k-nearest neighbors (kNN) and approximate nearest neighbors (ANN) searches by adding the right index for your use case; these search indexes were developed using Google’s open-source ScaNN libraries. Our GA offering includes the same ACID support and crash recovery for vectors that you expect from a relational database.
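Here's a rough sketch of what that flow can look like in SQL; the vector type, string_to_vector, and vector_distance names below are assumptions based on our reading of the feature rather than exact syntax, so check the Cloud SQL for MySQL documentation before using them:

-- Hypothetical sketch: a 3-dimensional vector keeps the example readable;
-- real embedding models typically produce hundreds of dimensions (e.g., 768).
ALTER TABLE products ADD COLUMN embedding VECTOR(3) USING VARBINARY;

-- Store an embedding generated outside the database (for example, with a Vertex AI text embedding model)
UPDATE products
SET embedding = string_to_vector('[0.12, -0.41, 0.83]')
WHERE id = 42;

-- kNN similarity search: the 10 products closest to the query embedding
SELECT id, name
FROM products
ORDER BY vector_distance(embedding, string_to_vector('[0.10, -0.38, 0.79]'))
LIMIT 10;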
To think about this in action, imagine you’re the developer for a hardware store’s online shopping experience. By integrating ANN similarity search into your catalog, when a shopper asks “what do I need to fix a crack in my dining table?” you can convert this question into a vector embedding and match against all products in your catalog to find items that can be used to fix dining table cracks.
We’ve collaborated closely with companies that rely on MySQL to help them integrate generative AI into their existing applications. For instance, supply chain solution provider Manhattan Associates is exploring similarity search in MySQL to improve search results for customers using its applications.
“Similarity search in MySQL enables us to easily integrate gen AI capabilities into the fleet of applications we’ve built on Cloud SQL for MySQL. For example, we’re exploring how we can use similarity search against product information to render better search results. This can be expanded to various searches across the application solutions we provide.” – Sanjeev Siotia, Executive Vice President & Chief Technology Officer, Manhattan Associates
Get started building
Ready to build generative AI apps on top of your MySQL databases? We have a few solutions to help you get started:
Sample app: Lets you customize the datastore for a bot-based app, with Cloud SQL for MySQL as an option. This app uses kNN search as the search type.
Code lab: Walks you through the basics of deploying a gen AI app with Cloud SQL and LangChain, a popular gen AI app development framework.
Modern data teams want to use Git to collaborate effectively and adopt software engineering best practices for managing their data pipelines and analytics code. But most tools used by data teams don’t offer integration with Git version control systems, making a Git workflow feel out of reach. This forces users to copy and paste code between UIs, which is not only time-consuming but also error-prone.
To help, we’re introducing repositories in BigQuery in Preview, a new experience in BigQuery Studio that helps data teams collaborate on code stored in Git repositories.
Develop with Git in BigQuery Studio
BigQuery repositories provide a comprehensive set of features to integrate Git workflows directly into your BigQuery environment:
Set up new repositories in BigQuery Studio where you can develop SQL queries, Notebooks, data preparation, data canvases, or text files with any file extension.
Connect your repositories to remote Git hosts like GitHub, GitLab, and other popular Git platforms.
Edit the code in your repositories within a dedicated workspace, on your own copy of the code, before publishing changes to branches.
Perform most Git operations with a user-friendly interface that lets you inspect differences, commit changes, push updates, and create pull requests — all within BigQuery Studio.
Software engineering best practices for all data practitioners
BigQuery repositories help organizations standardize the way code is developed, versioned, and deployed. Data teams with members of different levels of technical expertise can all collaborate on the same code base, following the same software engineering best practices.
Data analysts can contribute to code repositories via a simple GUI interface that lets them create workspaces, commit changes, and push code to branches.
Data engineers can develop in BigQuery Studio or with their favorite local IDE on the same codebase.
Data scientists can develop Colab Enterprise notebooks from BigQuery Studio, within their organization’s VPC, but back the code in a remote repository where they can manage versions and ask peers for code reviews.
Getting started
To begin using BigQuery repositories, navigate to BigQuery Studio in the Google Cloud console or visit the documentation for detailed instructions.
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C7g instances are available in the AWS Canada West (Calgary) region. These instances are powered by AWS Graviton3 processors that provide up to 25% better compute performance compared to AWS Graviton2 processors, and are built on top of the AWS Nitro System, a collection of AWS-designed innovations that deliver efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage.
Amazon EC2 Graviton3 instances also use up to 60% less energy than comparable EC2 instances for the same performance, reducing your cloud carbon footprint. For increased scalability, these instances are available in 9 different instance sizes, including bare metal, and offer up to 30 Gbps networking bandwidth and up to 20 Gbps of bandwidth to Amazon Elastic Block Store (EBS).
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R7g instances are available in the AWS GovCloud (US-West) region. These instances are powered by AWS Graviton3 processors that provide up to 25% better compute performance compared to AWS Graviton2 processors, and are built on top of the AWS Nitro System, a collection of AWS-designed innovations that deliver efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage.
Amazon EC2 Graviton3 instances also use up to 60% less energy than comparable EC2 instances for the same performance, reducing your cloud carbon footprint. For increased scalability, these instances are available in 9 different instance sizes, including bare metal, and offer up to 30 Gbps networking bandwidth and up to 20 Gbps of bandwidth to Amazon Elastic Block Store (EBS).
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) M7gd instances with up to 3.8 TB of local NVMe-based SSD block-level storage are available in the Middle East (UAE) region.
These Graviton3-based instances with DDR5 memory are built on the AWS Nitro System and are a great fit for applications that need access to high-speed, low-latency local storage, including those that need temporary storage of data for scratch space, temporary files, and caches. They deliver up to 45% better real-time NVMe storage performance than comparable Graviton2-based instances. Graviton3-based instances also use up to 60% less energy than comparable EC2 instances for the same performance, enabling you to reduce your carbon footprint in the cloud.
M7gd instances are now available in the following AWS regions: US East (N. Virginia, Ohio), US West (Oregon, N. California), Europe (Spain, Stockholm, Ireland, Frankfurt, Paris), Asia Pacific (Tokyo, Mumbai, Singapore, Sydney), South America (São Paulo), and Middle East (UAE).
Amazon Nova now supports expanded Tool Choice parameter options in the Converse API, enhancing developers’ control over model interactions with tools. Today, developers already use the Converse API to create sophisticated conversational applications, such as customized chat bots to maintain conversations over multiple turns. With this update, Nova adds support for ‘Any’ and ‘Tool’ modes in addition to the existing ‘Auto’ mode support, enabling developers to use all three different modes.
Auto leaves tool selection entirely to Nova's discretion: the model decides whether to call a tool or generate text instead. Auto is useful in use cases like chatbots and assistants, where you may need to ask the user for more information, and it is the current default.
Any prompts Nova to return at least one tool call from the list of tools specified, while allowing it to choose which tool to use. Any is particularly useful in machine-to-machine interactions where your downstream components may not understand natural language but can parse a schema representation.
Tool enables developers to require that Nova return a call to a specific tool. Tool is particularly useful for forcing structured output: define a tool whose return type matches your desired output schema.
To learn about expanded Tool Choice parameter support in Amazon Nova’s Converse API, see the Amazon Nova user guide. Learn more about Amazon Nova foundation models at the Amazon Nova product page. You can get started with Amazon Nova foundation models in Amazon Bedrock from the Amazon Bedrock console.
Gen AI Toolbox for Databases is an open-source server that streamlines the development and management of sophisticated generative AI tools that can connect to databases. Currently, Toolbox can be used to build tools for a large number of databases: AlloyDB for PostgreSQL (including AlloyDB Omni), Spanner, Cloud SQL for PostgreSQL, Cloud SQL for MySQL, Cloud SQL for SQL Server, and self-managed MySQL and PostgreSQL. Because it's fully open source, it includes contributions from third-party databases such as Neo4j and Dgraph. This enables you to develop tools more easily, quickly, and securely by handling complexities such as connection pooling, authentication, and more.
LlamaIndex has emerged as a leading framework for building knowledge-driven and agentic systems. It offers a comprehensive suite of tools and functionality that facilitate the development of sophisticated AI agents. Notably, LlamaIndex provides both pre-built agent architectures that can be readily deployed for common use cases, as well as customizable workflows, which enable developers to tailor the behavior of AI agents to their specific requirements.
In this post, we’ll share how LlamaIndex support for Toolbox works, Toolbox and LlamaIndex use cases, and samples to get started.
Challenges in gen AI tool management
Building AI agents that use different tools, frameworks, and data sources creates challenges, particularly when querying databases. These include:
Complex database connections that require configuration, connection pooling, and caching for optimal performance.
Security vulnerabilities when ensuring secure access from gen AI models to sensitive data.
Scaling tool management due to repetitive code and modifications across multiple locations for each tool.
Inflexible tool updates that require a complete rebuild and redeployment of the application.
Limited workflow observability due to lack of built-in support for comprehensive monitoring and troubleshooting.
Gen AI Toolbox for Databases
Toolbox comprises two components: a server specifying the tools for application use, and a client interacting with this server to load these tools onto orchestration frameworks. This centralizes tool deployment and updates, incorporating built-in production best practices to enhance performance and security, and to simplify deployments.
Toolbox supported databases
How LlamaIndex support works
LlamaIndex is particularly useful for developers building knowledge assistants over enterprise data. LlamaIndex’s event-based Workflows provide a clean, easy abstraction for building production agents capable of finding information, synthesizing insights, generating reports, and taking action, even with the most complex enterprise data.
By connecting Large Language Models (LLMs) to virtually any data source to structure data, create indices, and build powerful query engines, LlamaIndex empowers developers to rapidly extract knowledge and build AI agents, accelerating the development and adoption of LLM applications across various industries.
For enterprises, LlamaCloud provides a turn-key solution for data ingestion, parsing, indexing and storage that integrates seamlessly with the rest of the framework to get from prototype to production quickly.
For building agents, the controlled and specified calling of tools, reliable execution, and seamless passing of context back to the LLM are essential. Toolbox handles the execution itself, seamlessly running the tool and returning results. Together, Toolbox and LlamaIndex create a powerful solution for tool calling in agent workflows.
Use cases
LlamaIndex supports a broad spectrum of different industry use cases, including agentic RAG, report generation, customer support, SQL agents, and productivity assistants. LlamaIndex’s multi-modal functionality extends to applications like retrieval-augmented image captioning, showcasing its versatility in integrating diverse data types. LlamaIndex’s hundreds of data integrations and industry-leading parsing solutions in LlamaParse make it a stand-out choice for building agents that interact with enterprise data sources.
“We’re delighted to work with Google on Gen AI Toolbox, which neatly addresses a number of real pain-points in getting production agentic applications off the ground. We think the simplified security story in particular is going to be really attractive to devs building with these popular databases,” said Laurie Voss, VP of Developer Relations at LlamaIndex.
Get started
Through our partnership with LlamaIndex, we’re thrilled to offer enhanced value to developers building production-grade agents across diverse knowledge retrieval use cases. Here are some resources to get you started:
Not all workloads are the same. This is especially the case for AI, ML, and scientific workloads. In this blog we show how Google Cloud makes the RDMA over Converged Ethernet version 2 (RoCE v2) protocol available for high-performance workloads.
Traditional workloads
Network communication in traditional workloads involves a well-known flow. This includes:
The application initiates requests to move data between source and destination.
The OS processes the data, adds TCP headers and passes it to the network interface card (NIC).
The NIC sends data on the wire based on networking and routing information.
The receiving NIC receives the data.
The OS on the receiving end strips the headers and delivers the data to the application.
This process involves both CPU and OS processing, and these networks can recover from latency and packet loss issues and handle data of varying sizes while functioning normally.
AI workloads
AI workloads are highly sensitive: they involve large datasets and may require high bandwidth, low latency, and lossless communication for training and inference. Because these types of jobs are costly to run, it's important that they complete as quickly as possible with optimized processing. This can be achieved with accelerators — specialized hardware designed to significantly speed up the training and execution of AI applications. Examples of accelerators include specialized hardware chips like TPUs and GPUs.
RDMA
Remote Direct Memory Access (RDMA) technology allows systems to exchange data directly between one another without involving the OS, networking stack and CPU. This allows faster processing times since the CPU, which can become a bottleneck, is bypassed.
Let’s take a look at how this works with GPUs.
An RDMA-capable application initiates an RDMA operation.
Kernel bypass takes place, avoiding the OS and CPU.
RDMA-capable network hardware gets involved and accesses source GPU memory to transfer the data to the destination GPU memory.
On the receiving end, the application can retrieve the information from the GPU memory, and a notification is sent to the sender as confirmation.
How RDMA with RoCE works
Previously, Google Cloud supported RDMA-like capabilities with its own native networking stacks, GPUDirect-TCPX and GPUDirect-TCPXO. Now, this capability has been expanded with RoCE v2, which implements RDMA over Ethernet.
RoCE-v2-capable compute
Both the A3 Ultra and A4 Compute Engine machine types leverage RoCE v2 for high-performance networking. Each node supports eight RDMA-capable NICs connected to the isolated RDMA network. Direct GPU-to-GPU communication within a node occurs via NVLink and between nodes via RoCE.
Adopting RoCEv2 networking capabilities offers more benefits including:
Lower latency
Increased bandwidth — from 1.6 Tbps to 3.2 Tbps of inter-node GPU to GPU traffic
Support for new VM series like A3 Ultras, A4 and beyond
Scalability support for large cluster deployments
Optimized rail-designed network
Overall, these features result in faster training and inference, directly improving application speed. This is achieved through a specialized VPC network optimized for this purpose. This high-performance connectivity is a key differentiator for demanding applications.
Get started
To enable these capabilities, follow these steps:
Create a reservation: Obtain your reservation ID; you may have to work with your support team for capacity requests.
With Gemini Code Assist, developers aim to boost their efficiency and code quality. But what’s the process to effectively adopt AI-assisted coding? How do you measure the impact of these tools on your team’s performance?
In this article, we’ll provide a practical framework for adopting AI-assisted code creation, and for evaluating the effectiveness of AI assistance in your software development workflow.
This post outlines a four-step framework to adopt AI code assistants like Gemini Code Assist on your software development team: Adoption, trust, acceleration, and impact.
Adoption: Ensure your developers are actively using the tool by tracking measures like daily active use and code suggestions.
Trust: Gauge developers’ confidence in the AI’s output by monitoring code suggestion acceptance rates and lines of code accepted.
Acceleration: Look for improvements in development speed and software quality through existing productivity metrics like DORA measures, story points, or ticket closures.
Impact: Connect these improvements to your overall business goals by assessing changes in key performance indicators like revenue, market share, or time to market.
The AI-assistance journey
Committing to code AI-assistance involves change management, a process defined by a transition from one position (development before AI-assistance) to another (development after). Put simply, it’s a journey.
The four phases of AI-assisted productivity improvement
The journey can be understood in four progressive phases.
Adoption: Evaluation and proof of concept activities to identify how and where code AI-assistance may contribute to developer outcomes.
Trust: Establishment of confidence in AI-assistance’s output.
Acceleration: Assessment of AI-assistance’s ability to improve team development speed with existing productivity metrics like DORA measures, story points, or ticket closures.
Impact: Confirmation of improvement to key business performance indicators such as revenue, market share, or time to market.
In addition to what each phase does, it's important to understand how involved parties contribute. Technology and business leaders frequently initiate evaluations (Adoption) and affirm ongoing success (Impact). Between these two activities, individual developers explore coding AI-assistance, familiarize themselves with its capabilities, and establish regular use (Trust). During this phase, it is important to allow developers ample time to learn and experiment with ways to utilize AI. As teams of developers explore and develop trust together, the group iterates through feedback to further optimize team productivity (Acceleration).
The four phases feed forward and cannot be skipped
One common mistake organizations make is believing that using code AI-assistance (Adoption) will yield immediate business results (Impact). Another way to express this is that they believe they can transition from Adoption immediately to Impact, skipping Trust and Acceleration.
If an organization does not meaningfully Adopt AI assistance tools, and has minimal Trust in its suggestions, it’s unrealistic to expect an Acceleration in team productivity, much less substantive business Impact.
Confirming Adoption and Trust benefits from 6 to 8 weeks of time
Another misconception is that adopting code AI-assistance will have an impact overnight. Every organization is different, but we’ve found that at least 6 to 8 weeks, or four two-week sprints, is needed before impact to your organization’s productivity (Acceleration) may be observed. With each phase progressing into the next, it takes time for the effects of adopting AI assistance to propagate forward. This awareness is particularly important when conducting an evaluation, which we discuss further later in this article.
Effect and measures that can be used with each phase
While the four phases of the AI assistance journey are conceptual, they can and should be measured to affirm progress and impact. Below we describe what measures may be used when and why.
Adoption. Daily activity (developer use of AI-assistance), code suggestion (AI code recommendations), and chat exposure (AI chat requests) volume are early signals indicating whether developers are taking advantage of AI-assistance. With this phase and using these measures, you want to confirm consistent and growing daily developer engagement. As Adoption grows, you can shift focus to establishing Trust.
Trust. Are adopting developers accepting AI-assistance? Code suggestions accepted (AI code recommendations accepted), acceptance rate (the number of code suggestions accepted divided by the total number of code suggestions), and lines of code accepted (number of lines of code accepted) measures can be used to assess trust. Monitoring for low acceptance rates for code suggestions and lines of code should prompt you to investigate why trust may be low. You may also elicit further understanding from developer interviews and surveys (sample survey questions).
Acceleration. You may already have developer productivity (Acceleration) measures in place, including established DORA software delivery measures. Alternatively, you may want to evaluate it through completed story points or ticket closures per time period, among other measures. Once you’ve established adoption and trust, monitoring acceleration measures for improvement can both confirm the productivity benefits of AI assistance as well as provide line of sight to business impact outcomes and measures.
Impact. This final phase is expressed in business key performance indicators. Specific impact measures differ among organizations, and should be monitored by organization leaders to evaluate the business yield of AI-assistance. Impact measures may include revenue, market share, reduced time to product improvements, and other business health criteria. An observed improvement in Acceleration would expect to positively contribute to an Impact measure(s) as well.
It is important to be aware that AI-assistance measures, those within Adoption and Trust phases, are not development productivity measures, those found in the Acceleration phase. To illustrate why, consider the following: would a high AI-assistance code suggestion acceptance rate or significant volume of AI-assisted lines of code accepted that negatively impacted DORA measures or average ticket closures still be considered a development productivity improvement? Most would agree it would not and this is why it is important to make the distinction. AI-assistance metrics measure Adoption and Trust of AI-assistance but development productivity metrics express the impact. Impact measures reveal the final effect.
With a journey, phases, and corresponding measures defined, you can collectively use these elements as a guiding framework to progress and affirm the impact of code AI-assistance.
Measuring impact with Gemini Code Assist
Gemini Code Assist supports Adoption and Trust measurement through Gemini for Cloud Logging logs, in Preview. Through these logs, active use, code suggestions, code suggestions accepted, acceptance rate, chat exposures, and lines of code accepted are made visible. This further includes discrete activity on a per-user, programming-language, IDE-client, and activity-time basis – deep insights not always available with aggregate AI-assistance measures. These insights can be used both to assess organization journey performance and to answer specific questions like "How many AI-assisted lines of code did we accept last week, by programming language? By developer?"
Gemini Code Assist logs provide discrete activity insights, including code exposure and acceptances by programming language, user, and time
While Gemini Code Assist logs provide discrete details per activity, we also provide a sample dashboard built on Log Analytics to assist with aggregate Adoption and Trust measures review.
Gemini Code Assist measures dashboard sample using Log Analytics.
In addition to the above, Cloud Monitoring metrics are provided to monitor active use across Gemini for Cloud, in Preview, including Gemini Code Assist.
The four phases of AI-assisted impact evaluation
Before making a commitment to code AI-assistance, many organizations choose to first conduct an evaluation. Like the AI-assistance adoption journey, an evaluation journey may be a phased process. Here again each phase would feed into the next and involve specific parties.
Success criteria. Before starting an evaluation, you need to define and baseline the evaluation success criteria. Two audiences need to be considered when defining success criteria, the development team and business decision makers, and both should agree on the definition of success. Success criteria could include improvements to Acceleration measures such as DORA, story point velocity, or tickets resolved. This phase is often skipped, but it is the most important one: we have found that organizations that fail to define, or collectively agree on, success criteria before initiating their evaluation struggle to retroactively assess AI-assistance's impact.
Participants. While there are multiple approaches to consider, the most common involve either selecting a single team of developers and conducting the evaluation with successive project efforts (the first project with AI-assistance and the following without), or comparing two teams' performance in an A/B cohort (one team using AI-assistance while the other does not). Whatever you choose, discuss and agree on who will be participating and why upfront. An "apples to apples" comparison should be prioritized. For example, utilizing an A/B cohort where your A team is significantly more experienced and is also using AI-assistance against a B team that is less experienced and is not using AI-assistance may lead to a lopsided evaluation that makes impact assessment difficult at best and unreliable at worst.
Measure. The AI-assistance journey can guide progressing and monitoring your evaluation. Quantitative and qualitative measures, alongside success criteria, can be regularly reviewed to ensure an evaluation progresses to a point where the impact of AI-assistance can be assessed.
Commit. If you have agreed on your success criteria and have been measuring and facilitating progress along the way, you will be able to confirm or reject a code AI-assistance commitment pending satisfaction of the accepted success criteria.
Levels of evaluation investment
The intensity of a code AI-assistance evaluation can vary. A Minimal evaluation may involve a handful of developers, qualitative surveys, and the monitoring of Acceleration measures only. A more typical approach in our experience, and one opted for by most organizations, is a Moderate investment that looks at both quantitative and qualitative measures and involves either a team performing successive development with and without code AI-assistance, or two teams in an A/B cohort doing the same. An Involved evaluation, listed for completeness, utilizes formal research, lab studies, and analysis. In practice, we have found that few organizations opt for an Involved evaluation.
Whatever intensity model is pursued, we want to again reiterate the importance of defining success criteria up front and gathering baseline data for comparison.
An evaluation may complete when you observe improvement to Acceleration measures
When defining commitment success criteria, target what is sufficient to confirm AI-assistance's impact. Often, improvements to Acceleration-phase productivity measures provide line of sight to improvements in Impact-phase measures (business key performance indicators). Conversely, choosing commitment success criteria that require significant time or effort to conclude may delay, or even prevent, confirming AI-assistance's impact.
Get started today
Ready to get started with AI assistance for software development? For more information about Gemini Code Assist, visit our product page, documentation, and quickstart.
DORA’s software delivery performance metrics are good indicators for the Acceleration and Impact phases of your AI assistance adoption journey. Teams that are not yet tracking these metrics can use the DORA Quick Check to capture their current software delivery performance.
DORA’s research also shows many capabilities lead to improved software delivery performance. Measurements for these capabilities can inform the Adoption and Trust phases of your adoption journey and can serve as leading indicators for later journey phases.
Two years ago, on a mission to redefine enterprise-grade databases, we released AlloyDB for PostgreSQL in production. We saw the immense popularity and flexibility of PostgreSQL — a database developers love for being open source — and we knew we could build upon that strong foundation to create a next-generation, fully managed database with a cloud-first architecture offering strong performance, scalability, autopilot features, and analytical capabilities.
To celebrate this journey marked by significant customer adoption and industry recognition, we are launching a series of blogs to explore the innovations, growth, and customer successes that define AlloyDB, and offer a glimpse into its future. You’ll get an inside look at AlloyDB’s key differentiators, including scalability, performance, analytics and our powerful generative AI capabilities, and learn how customers are leveraging its power.
But first, if you’re new to AlloyDB, here’s a recap of what makes it special, and the milestones we’ve achieved since launching it.
Built on the best: PostgreSQL and Google Cloud
AlloyDB’s strength lies in its foundation: the power and versatility of PostgreSQL combined with Google Cloud’s infrastructure and innovation. PostgreSQL’s popularity among developers is undeniable — it ranked as the most popular database in the 2024 Stack Overflow Developer Survey. And, according to the 2024 Gartner® Magic Quadrant™, Google Cloud is a Leader in both Strategic Cloud Platform Services (SCPS) and Cloud Database Management Systems (CDBMS). This combination makes AlloyDB uniquely positioned to meet the challenges of modern enterprise workloads.
Architectural innovations
AlloyDB’s exceptional performance and scalability are a direct result of Google Cloud’s differentiated infrastructure and an optimized database kernel. Here’s how:
Disaggregated compute and storage: AlloyDB separates database compute from an intelligent, PostgreSQL-optimized storage service. By offloading database log processing to the storage layer, we reduce compute resource usage, allowing those resources to be dedicated to user queries and transactions. The storage service itself further disaggregates compute (log processing) and storage (block storage), enabling independent scaling, dynamic adaptation to workloads, fault tolerance, and cost-effective horizontal scaling. Tiered caching with automatic data placement further boosts performance. This multi-layered disaggregation is a key driver of AlloyDB's scalability.
Optimized database kernel: AlloyDB has a highly optimized PostgreSQL engine. This includes efficient resource management, advanced query processing, highly scalable and improved concurrency control, all focused on minimizing overhead and maximizing transaction throughput. This allows AlloyDB to vertically scale transactional workloads to 128 vCPUs.
Superior read performance and scalability: Built on Google’s Colossus, a distributed regional storage service designed for exceptional durability and near-unlimited scalability, AlloyDB offers read pools, which enable massive horizontal scaling for read-intensive workloads. Scale effortlessly to thousands of vCPUs and handle immense concurrent read traffic without impacting transactional performance. Regional storage enables instant provisioning of read pools without data copying, and with improvements in the replication logic allows AlloyDB to achieve over 25x lower replication lag compared to standard PostgreSQL.
High availability and disaster recovery: AlloyDB provides high availability through instance redundancy, data replication, and automated failover mechanisms. Within a region, when configured in high availability, it maintains a standby in another zone, with automatic promotion if the primary instance fails. For disaster recovery, AlloyDB supports cross-regional replicas, which allow data to be available closer to your application’s region, helping to keep your data available and protected.
Analytics Accelerator: AlloyDB has a built-in, in-memory analytics accelerator that provides real-time analytics on your operational data.
Auto-pilot features: AlloyDB has a number of optimizations to help with automatic memory management, an adaptive autovacuum built on top of PostgreSQL's autovacuum, and an index advisor to help accelerate query performance.
Staying true to PostgreSQL, while pushing the boundaries
Choosing to build on PostgreSQL gave us a head start with AlloyDB, providing a mature, feature-rich database engine and a vast ecosystem of tools and extensions. This allowed the AlloyDB team to focus on developing unique, cloud-first optimizations, including its groundbreaking generative AI capabilities. AlloyDB includes innovations that directly address enterprises’ core performance, scalability, and analytical challenges, while developers benefit from PostgreSQL’s broad data-type support, advanced features (like full-text search and geospatial functions), and the flexibility of open standards to avoid vendor lock-in.
Two years of groundbreaking achievements
Since the AlloyDB launch, we’ve consistently pushed the boundaries of what’s possible with a PostgreSQL-compatible database. We’ve concentrated on strengthening the core platform, resulting in over 70 enterprise feature releases driven by direct customer feedback. Key highlights include:
Powerful generative AI Capabilities: AlloyDB is transforming how you interact with your data. Drawing on Google’s decades of investment in AI, we’ve added powerful gen AI capabilities, making it possible to build sophisticated AI applications without ever leaving your database. Imagine being able to easily generate natural language summaries of complex datasets, or build intelligent chatbots that understand your data. Additionally, AlloyDB’s ScaNN index (Scalable Nearest Neighbors) delivers lightning-fast vector similarity search — crucial for modern AI applications such as recommendation engines, image and video search, and anomaly detection.
Top-tier performance and scalability: AlloyDB is significantly faster than standard PostgreSQL, delivering up to 4x faster transactional processing and up to 100x faster analytical queries.
Run anywhere with AlloyDB Omni: Run the optimized AlloyDB database anywhere with AlloyDB Omni, a downloadable edition that you can run on-premises, in other clouds, or even on your laptop. Through a partnership with Aiven, a leading provider of managed open-source data platforms, AlloyDB Omni is available as a managed service on Google Cloud, AWS, and Azure, reinforcing our commitment to a flexible, multi-cloud strategy.
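To make the ScaNN capability mentioned above concrete, here is a minimal, hedged sketch of enabling the index and running a similarity search from Python. The connection details, the products table, its 768-dimensional embedding column, and the index parameters are illustrative assumptions rather than values from this post; consult the AlloyDB documentation for the exact extension and index options.

```python
# Hedged sketch: creating a ScaNN index on an AlloyDB table that is assumed to
# already have a pgvector column `embedding vector(768)`, then running an
# approximate nearest-neighbor query.
import psycopg2

conn = psycopg2.connect(
    host="10.0.0.5",            # AlloyDB instance IP (placeholder)
    dbname="appdb",
    user="postgres",
    password="your-password",
)
conn.autocommit = True

with conn.cursor() as cur:
    # pgvector stores the embeddings; alloydb_scann provides the ScaNN index method.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("CREATE EXTENSION IF NOT EXISTS alloydb_scann;")

    # Approximate nearest-neighbor index over the embedding column.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS product_embedding_scann
        ON products USING scann (embedding cosine)
        WITH (num_leaves = 1000);
    """)

    # Similarity search: <=> is pgvector's cosine-distance operator.
    query_vec = "[" + ", ".join(["0.1"] * 768) + "]"  # stand-in query embedding
    cur.execute(
        "SELECT id, name FROM products ORDER BY embedding <=> %s::vector LIMIT 5;",
        (query_vec,),
    )
    print(cur.fetchall())
```

The num_leaves value trades index build time against recall and query latency, so treat the number above as a placeholder to tune for your own dataset.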
A growing ecosystem and industry recognition
Real-world Successes: AlloyDB’s growing customer base spans e-commerce, fintech, and manufacturing. Major enterprises in the retail vertical, financial powerhouses across the globe, and technology innovators all trust AlloyDB. Ocean Network Express (ONE), a major player in global container shipping, consolidated their database fleet on AlloyDB, realizing an estimated 50% reduction in database licensing costs, and SEEBURGER’s Business Integration Suite (BIS) runs on AlloyDB to power its critical business operations.
Industry recognition: AlloyDB was recognized as a Leader in the Forrester Wave™: Translytical Data Platforms, Q4 2024, highlighting its strengths in accelerating the convergence of transactional and analytical workloads. The Forrester Wave cited Google Cloud’s unified platform approach, its ability to handle complex data operations, and its strong vision for the future of data management.
What’s next: Deep dives into AlloyDB’s differentiators
This blog post marks the beginning of an exciting new series where we’ll explore the core features that make AlloyDB stand out. In the coming weeks, we’ll publish dedicated posts on:
Scalability: How AlloyDB handles massive datasets and growing workloads, helping to ensure your database keeps pace with your business needs.
Performance: A closer look at the architectural innovations that deliver the speed you need for both transactional and analytical workloads.
Analytical capabilities: Uncovering how AlloyDB empowers you to gain deeper insights from your data with advanced analytics and reporting.
Vector capabilities and generative AI: Building AI agents with AlloyDB, leveraging its built-in ScaNN vector index for efficient similarity search. We’ll also explore how to create RAG applications using frameworks like LangChain and LlamaIndex, and how the Gen AI Toolbox for Databases can streamline these integrations.
Join the AlloyDB revolution!
Ready for a database that can handle your modern enterprise needs? Discover AlloyDB with a 30-day free trial. AlloyDB free trial clusters make it easy to get started.
Today, I’m pleased to announce the launch of AI Luminaries programming at the upcoming Google Cloud Next conference. This is a unique forum where some of the top researchers, scientists, and technology leaders in the world will engage with Google experts across a variety of critical AI domains, including research, infrastructure, distributed systems, and AI hardware.
The AI Luminaries content is a curated selection of talks and discussions that offer unparalleled access to Google’s leading AI innovators. Our customers and partners will be able to engage with a diverse range of content, including focused breakout sessions, interactive panels with AMAs, presentations on significant research papers, and networking opportunities within the Google Cloud Showcase.
AI Luminaries sessions
These speakers represent not only academic thought leadership, but engineering execution and strategic vision, offering a unique glimpse into the next era of technological advancement.
Building the future of AI: Intelligent infrastructure and sustainable energy: The explosive growth of artificial intelligence demands a paradigm shift in how we build and power its infrastructure. Join Urs Hölzle, Google Fellow, and Parthasarathy Ranganathan, VP, Engineering Fellow at Google Cloud, for a fireside chat exploring the critical intersection of intelligent infrastructure and sustainable energy. They will delve into the challenges and opportunities of scaling AI, from the hardware powering massive models to the energy sources that fuel them. Discover how innovative approaches to infrastructure design and energy efficiency are shaping the future of AI, look back on 25 years of cloud infrastructure design at Google, and hear what’s coming over the next 25 years.
Google Cloud TPUs and specialized AI hardware: Jeff Dean on what’s next: Join an insightful fireside chat with Jeff Dean, a pioneering force behind Google’s AI leadership, and Sabastian Mugazambi, Senior Product Manager, Cloud AI Infrastructure. As Google’s Chief Scientist at DeepMind & Research, Jeff will share his vision on AI and specialized AI hardware like Google Cloud TPUs. What exciting things might we expect to see next? What drives Google’s innovation in specialized AI hardware? We’ll discuss how TPUs enable efficient large-scale training and inference workloads, including exclusive, never-before-revealed details of the latest-generation TPU, and the differentiated chip design, data center infrastructure, and software stack co-design that make TPUs the most compelling choice for AI workloads.
Isomorphic Labs: Solving disease with AI: Isomorphic Labs has created an AI drug design engine that reimagined traditional drug discovery approaches and allowed the company to go after some of the hardest disease targets. IsoLabs’ frontier AI research has yielded models that can accurately predict properties of proteins and other molecules, allowing chemists to rationally design the next generation of medicines in silico, in record time. Featuring Isomorphic Labs Chief Technology Officer Sergei Yakneen.
In my session, Now in focus: The fifth, gen AI epoch computing infrastructure, I will share my thoughts on the future of AI infrastructure, exploring the design principles behind scale-out architectures and the shift toward specialized hardware, high-bandwidth networks, liquid cooling, and advanced memory technologies necessitated by generative AI.
Join us at Google Cloud Next
By sharing our knowledge and expertise in these sessions, we aim to foster a deeper understanding of the technologies we are building for the future of AI and inspire further innovation across the industry. We invite you to join us at Google Cloud Next to witness the next era of AI development.
Protecting your data in the cloud is more critical than ever. As your Google Cloud deployments grow, managing your data protection strategy for Compute Engine workloads can become complex. That’s why we’re excited to announce a streamlined experience that simplifies data protection with powerful capabilities that are purpose-built for Persistent Disk, Hyperdisk, and the Backup and DR Service.
Introducing the Data protection tab
Say hello to the new Data protection tab in the Compute Engine VM creation experience.
This new tab brings together backup and continuous data replication options for block storage in a single unified interface, making it possible to configure your data protection settings when creating a VM in the Google Cloud console, saving you time and effort, and allowing you to focus on what matters most: building and deploying your applications.
Selecting backup plans and snapshot schedules
Snapshot schedules for Persistent Disk and Hyperdisk provide foundational data protection for Compute Engine workloads, including basic scheduling automation at no charge. Snapshot schedules are a simple, cost-effective way to back up VM disks into the region or multi-region of your choice.
For more comprehensive protection, Backup and DR provides immutable backup vaults, securing data from accidental or malicious deletion, ransomware attacks, and project- or organization-level threats. These vaults help ensure that backups remain tamper-proof and resilient against unauthorized changes.
Beyond immutability, Backup and DR offers advanced backup management features, including centralized monitoring, detailed reporting, complex scheduling rules, and compliance-focused capabilities. Designed for teams with sophisticated data protection needs, Backup and DR supports centralized backup management across multiple cloud services (currently Compute Engine and Google Cloud VMware Engine) and includes a separate management pricing model.
Configure continuous data replication
In addition to selecting snapshots or backup plans, you can choose to protect your data with continuous replication from the new Data protection tab. All capabilities for backup, cross-zone replication, and cross-region replication are designed to work together for layered protection against different types of risk. Continuous replication helps ensure minimal data loss in case of a disaster, complementing the historical recovery points provided by backups.
While some workloads such as test environments may not need any protection at all, most production workloads need protection from temporary outages, disasters, user errors, and malware. For mission-critical workloads that require zero or near-zero recovery point objectives (RPOs) for high availability and disaster recovery, you can choose continuous data replication to another point of presence. In the Data protection tab, this is as simple as selecting a secondary zone or region.
How it works
When creating a new VM through the console, simply navigate to the Data protection tab. In a project where Backup and DR has been enabled, a default backup plan will be pre-selected; if not, a snapshot schedule will be applied.
We sometimes see customers lose important data because they haven’t configured any protection for their workloads. Using backup protection for production VMs is a valuable way to minimize the risk of data loss. For test environments, setting up backup protection can help you verify that your restore processes will work once deployed. You can always customize and fully control your backup protection:
Choose between snapshot schedules and backup plans. Select the pre-defined snapshot schedule or backup plan that best meets your needs and budget.
For workloads that need synchronous cross-zone replication (with RPO=0 between zones), you can choose to set your Persistent Disks to be Regional Persistent Disks, or your Hyperdisk Balanced disks to be Hyperdisk Balanced High Availability disks.
For workloads that need cross-region replication (with RPO < 1 minute between regions), select Asynchronous Replication and choose your secondary region. Replication will be started automatically.
You can opt out of using the default data protection options by simply selecting “No Backups” when configuring the VM or by modifying the Compute Engine Project settings. Learn more.
Data protection tab selections only affect the VM being created in the console. To configure VMs at scale, you can configure your data protection options with the API, gcloud, or Terraform; a minimal API sketch follows this list.
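As one example of the API route, here is a hedged sketch that uses the google-cloud-compute Python client to create a daily snapshot schedule and attach it to an existing disk. The project, region, zone, disk name, and schedule settings are placeholders, not recommendations from this post.

```python
# Hedged sketch: create a daily snapshot schedule (a regional resource policy)
# and attach it to an existing disk so it is backed up automatically.
from google.cloud import compute_v1

PROJECT, REGION, ZONE = "my-project", "us-central1", "us-central1-a"

policy = compute_v1.ResourcePolicy(
    name="daily-snapshots",
    snapshot_schedule_policy=compute_v1.ResourcePolicySnapshotSchedulePolicy(
        schedule=compute_v1.ResourcePolicySnapshotSchedulePolicySchedule(
            daily_schedule=compute_v1.ResourcePolicyDailyCycle(
                days_in_cycle=1, start_time="04:00"
            )
        ),
        retention_policy=compute_v1.ResourcePolicySnapshotSchedulePolicyRetentionPolicy(
            max_retention_days=14
        ),
    ),
)

policies = compute_v1.ResourcePoliciesClient()
policies.insert(
    project=PROJECT, region=REGION, resource_policy_resource=policy
).result()

disks = compute_v1.DisksClient()
disks.add_resource_policies(
    project=PROJECT,
    zone=ZONE,
    disk="my-app-disk",  # placeholder disk name
    disks_add_resource_policies_request_resource=compute_v1.DisksAddResourcePoliciesRequest(
        resource_policies=[
            f"projects/{PROJECT}/regions/{REGION}/resourcePolicies/daily-snapshots"
        ]
    ),
).result()
```

The same resource policy can be attached to many disks, which is how you would apply one schedule across a fleet.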
Get started today
Protect your VMs with a comprehensive data protection strategy using snapshot schedules or backup plans, along with continuous data replication to secondary zones and regions. Ensure your data is safeguarded and manage your data protection needs in one place.
The new Data protection tab is available now in the Compute Engine VM creation experience. Try it out today and streamline your data protection strategy! Learn more here.
Amazon CloudWatch Database Insights announces support for Amazon Aurora and RDS databases hosted in the AWS GovCloud (US) Regions. Database Insights is a database observability solution that provides a curated experience designed for DevOps engineers, application developers, and database administrators (DBAs) to expedite database troubleshooting and gain a holistic view into their database fleet health.
Database Insights consolidates logs and metrics from your applications, your databases, and the operating systems on which they run into a unified view in the console. Using its pre-built dashboards, recommended alarms, and automated telemetry collection, you can monitor the health of your database fleets and use a guided troubleshooting experience to drill down to individual instances for root-cause analysis. Application developers can correlate the impact of database dependencies with the performance and availability of their business-critical applications. This is because they can drill down from the context of their application performance view in Amazon CloudWatch Application Signals to the specific dependent database in Database Insights.
You can get started with Database Insights by enabling it on your Aurora and RDS clusters using the service consoles, AWS APIs, and SDKs. Database Insights delivers database health monitoring aggregated at the fleet level, as well as instance-level dashboards for detailed database and SQL query analysis.
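If you automate enablement with the SDKs, the sketch below shows roughly how turning on Database Insights for an Aurora cluster might look with boto3. The DatabaseInsightsMode parameter name and its "advanced" value reflect our reading of the RDS API and should be treated as assumptions; verify them against the current boto3 documentation before relying on this.

```python
# Hedged sketch: enable Database Insights advanced mode on an existing Aurora
# cluster in AWS GovCloud (US-West). Cluster name is a placeholder.
import boto3

rds = boto3.client("rds", region_name="us-gov-west-1")

response = rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",   # placeholder cluster identifier
    DatabaseInsightsMode="advanced",           # assumed parameter name and value
    EnablePerformanceInsights=True,            # advanced mode builds on Performance Insights;
                                               # it may also require extended retention (assumption)
    ApplyImmediately=True,
)
print(response["DBCluster"].get("DatabaseInsightsMode"))
```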
Database Insights is now available in the AWS GovCloud (US-West) and AWS GovCloud (US-East) Regions, and applies new vCPU-based pricing – see the pricing page for details. For further information, visit the Database Insights documentation.
At Google Cloud, we’re thrilled to return to NVIDIA’s GTC AI Conference in San Jose CA this March 17-21 with our largest presence ever. The annual conference brings together thousands of developers, innovators, and business leaders to experience how AI and accelerated computing are helping humanity solve the most complex challenges. Join us to discover how to build and deploy AI with optimized training and inference, apply AI with real-world solutions, and experience AI with our interactive demos.
After being the first hyperscaler to make both NVIDIA’s HGX B200 and GB200 NVL72 available to customers with A4 and A4X VMs, we’re pleased to announce that A4 VMs are generally available and that A4X VMs are in preview, with general availability coming soon.
A4X VMs: Accelerated by NVIDIA GB200 NVL72 GPUs, A4X VMs are purpose-built for training and serving the most demanding, extra-large-scale AI workloads — particularly those involving reasoning models, large language models (LLMs) with long context windows, and scenarios that require massive concurrency. This is enabled by unified memory across a large GPU domain and ultra-low-latency GPU-to-GPU connectivity. Each A4X VM contains 4 GPUs, and an entire 72-GPU system is connected via fifth-generation NVLink to deliver 720 petaflops of performance (FP8). A4X has achieved 860,000 tokens/sec of inference performance on a full NVL72 running Llama 2 70b.
A4 VMs: Built on NVIDIA HGX B200 GPUs, the A4 VM provides excellent performance and versatility for diverse AI model architectures and workloads, including training, fine-tuning, and serving. Each A4 VM contains eight GPUs for a total of 72 petaflops of performance (FP8). A4 offers easy portability from prior generations of Cloud GPUs, enabling a straightforward upgrade path with a 2.2x increase in training performance over A3 Mega (NVIDIA H100 GPUs).
“We’re excited that we were among the first to test A4 VMs, powered by NVIDIA Blackwell GPUs and Google Cloud’s AI Hypercomputer architecture. The sheer compute and memory advancements, combined with the 3.2 Tbps GPU-to-GPU interconnect via NVLink and the Titanium ML network adapter, are critical for us to train our models. Leveraging the Cluster Director simplifies the deployment and management of our large-scale training workloads. This gives our researchers the speed and flexibility to experiment, iterate, and refine trading models more efficiently.” – Gerard Bernabeu Altayo, Compute Lead, Hudson River Trading
The Google Cloud advantage
A4 and A4X VMs are part of Google Cloud’s AI Hypercomputer, our supercomputing architecture designed for high performance, reliability, and efficiency for AI workloads. AI Hypercomputer brings together Google Cloud’s workload-optimized hardware, open software, and flexible consumption models to help simplify deployments, improve performance, and optimize costs. A4 and A4X VMs benefit from the following AI Hypercomputer capabilities:
AI-optimized architecture: A4 and A4X VMs are built on servers with our Titanium ML network adapter, which builds on NVIDIA ConnectX-7 network interface cards (NICs) to deliver a secure, high-performance cloud experience for AI workloads. Combined with our datacenter-wide 4-way rail-aligned network, A4 VMs deliver non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). You can scale to tens of thousands of NVIDIA Blackwell GPUs with our Jupiter network fabric with 13 Petabits/sec of bi-sectional bandwidth.
Simplified deployment with pre-built solutions: For large training workloads, Cluster Director offers dense co-location of accelerator resources, to help ensure host machines are allocated physically close to one another, provisioned as blocks of resources, and interconnected with a dynamic ML network fabric that minimizes network hops and optimizes for the lowest latency.
Scalable infrastructure: With support for up to 65,000 nodes per cluster, Google Kubernetes Engine (GKE) running on AI Hypercomputer is the most scalable Kubernetes service with which to implement a robust, production-ready AI platform. A4 and A4X VMs are natively integrated with GKE. And with integration with other Google Cloud services such as Hyperdisk ML for storage or BigQuery as a data warehouse, GKE facilitates data processing and distributed computing for AI workloads (a minimal node-pool sketch follows this list).
Fully-integrated, open software: In addition to support for CUDA, we work closely with NVIDIA to optimize popular frameworks with XLA such as PyTorch and JAX (including the reference implementation, MaxText), enabling increased performance of GPU infrastructure. Developers can easily incorporate powerful techniques like a latency hiding scheduler to minimize communication overhead (see XLA optimizations).
Flexible consumption models: In addition to the on-demand, committed use discount, and Spot consumption models, we reimagined cloud consumption for the unique needs of AI workloads with Dynamic Workload Scheduler, which offers two modes for different workloads: Flex Start mode for enhanced obtainability and better economics, and Calendar mode for predictable job start times and durations. Dynamic Workload Scheduler improves your access to AI accelerator resources, helps you optimize your spend, and can improve the experience of workloads such as training and fine-tuning jobs, by scheduling all the accelerators needed simultaneously.
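To make the GKE integration concrete, here is a hedged sketch of adding an A4-class GPU node pool to an existing cluster with the google-cloud-container client. The machine type and accelerator type strings are assumptions for illustration only; check the Compute Engine and GKE documentation for the exact names available in your region.

```python
# Hedged sketch: add a GPU node pool for A4-class VMs to an existing GKE cluster.
# Project, location, cluster, machine type, and accelerator type are placeholders.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

node_pool = container_v1.NodePool(
    name="a4-gpu-pool",
    initial_node_count=2,
    config=container_v1.NodeConfig(
        machine_type="a4-highgpu-8g",            # assumed A4 machine type name
        accelerators=[
            container_v1.AcceleratorConfig(
                accelerator_count=8,
                accelerator_type="nvidia-b200",  # assumed accelerator type string
            )
        ],
    ),
)

client.create_node_pool(
    request=container_v1.CreateNodePoolRequest(
        parent="projects/my-project/locations/us-central1/clusters/ai-cluster",
        node_pool=node_pool,
    )
)
```

In practice, large training deployments would likely pair node pools like this with the Cluster Director capabilities described above for dense co-location.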
NVIDIA and Google Cloud: Better together
We’re continuously working together to provide our joint customers with an optimized experience. One of our recent collaborations brings together software innovations to accelerate AI-driven drug discovery. Using the NVIDIA BioNeMo framework and blueprints on GKE, together with PyTorch Lightning, we’re providing ready-to-use reference workflows for domain-specific tasks. The NVIDIA BioNeMo Framework provides an optimized environment for training and fine-tuning biomolecular AI models. Read more here.
Meet Google Cloud
To connect with Google Cloud, please visit us at booth #914 at NVIDIA GTC, join our expert-led sessions listed below, or email us to set up a private meeting. Whether it’s your first time speaking with Google Cloud or the first time connecting with us at NVIDIA GTC, we’re looking forward to meeting with you.
Deep dive into AI at expert-led sessions
Join our expert-led sessions to gain in-depth knowledge and develop practical skills in AI development on Google.
Build Next-Generation AI Factories With DOCA-Accelerated Networking
Time: 9:00 AM – 9:40 AM PDT
Speakers: Valas Valancius, Senior Staff Software Engineer, Google Cloud; Ariel Kit, Director, Product Management, NVIDIA; David Wetherall, Distinguished Engineer, Google Cloud

Toward Rational Drug Design With AlphaFold 3
Time: 10:00 AM – 10:40 AM PDT
Speakers: Max Jaderberg, Chief AI Officer, Isomorphic Labs (DeepMind), and Sergei Yakneen, Chief Technology Officer, Isomorphic Labs (DeepMind)

AI in Action: Optimize Your AI Infrastructure
Time: 11:00 AM – 11:40 AM PDT
Speakers: Chelsie Czop, Senior Product Manager, Google Cloud; Kshetrajna Raghavan, Machine Learning Engineer, Shopify; Ashwin Kannan, Principal Machine Learning Engineer, Palo Alto Networks; Jia Li, Chief AI Officer, Livex.AI

Horizontal Scaling of LLM Training with JAX
Time: 2:00 PM – 2:40 PM PDT
Speakers: Andi Gavrilescu, Sr. Engineering Manager, Google; Matthew Johnson, Research Scientist, Google; Abhinav Goel, Senior Deep Learning Architect, Google
On-Demand, Virtual Sessions
S74318: Deploy AI and HPC on NVIDIA GPUs With Google
Speakers: Annie Ma-Weaver, HPC group Product Manager, Google Cloud; Wyatt Gorman, HPC and AI Solutions Manager, Google Cloud; Sam Skillman, HPC Software Engineer, Google Cloud
The quest to develop new medical treatments has historically been a slow, arduous process, screening billions of molecular compounds across decade-long development cycles. The vast majority of therapeutic candidates do not even make it out of clinical trials.
Now, AI is poised to dramatically accelerate this timeline.
As part of our wide-ranging, cross-industry collaboration, NVIDIA and Google Cloud have supported the development of generative AI applications and platforms. NVIDIA BioNeMo is a powerful open-source collection of models specifically tuned to the needs of medical and pharmaceutical researchers.
Medical and biopharma organizations of all sizes are looking closely at predictive modeling and AI foundation models to help disrupt this space. With AI, they’re working on accelerating the identification and optimization of potential drug candidates to significantly shorten development timelines and address unmet medical needs. This has become a significant turning point for analyzing DNA, RNA, protein sequences, and chemicals; predicting molecular interactions; and designing novel therapeutics at scale.
With BioNeMo, companies in this space gain a more data-driven approach to developing medicines while reducing reliance on time-consuming experimental methods. But these breakthroughs are not without their own challenges. The shift to generative medicine requires a robust tech stack, including: powerful infrastructure to build, scale, and customize models; efficient resource utilization; agility for faster iteration; fault tolerance; and orchestration of distributed workloads.
Google Kubernetes Engine (GKE) offers a powerful platform for running many of these demanding workloads, and when combined with NVIDIA BioNeMo, GKE can accelerate work on the platform. With BioNeMo running on GKE, organizations can pursue medical breakthroughs and new research with levels of speed and effectiveness that were previously unheard of.
In this blog, we’ll show you how to build and customize models and launch reference blueprints using the NVIDIA BioNeMo platform on GKE.
NVIDIA’s BioNeMo platform on GKE
NVIDIA BioNeMo is a generative AI framework that enables researchers to model and simulate biological sequences and structures. It places major demands on its environment: powerful GPUs for compute, scalable infrastructure for handling large datasets and complex models, and robust managed services for storage, networking, and security.
GKE offers a highly scalable and flexible platform ideal for AI and machine learning — and particularly the demanding workloads found in biopharma research and development. GKE’s autoscaling features ensure efficient resource utilization, while its integration with other Google Cloud services simplifies the AI workflow.
NVIDIA’s BioNeMo platform offers two synergistic components:
1. BioNeMo Framework: Large-Scale Training Platform for Drug Discovery AI
A scalable, open-source training system for biomolecular AI models like ESM-2 and Evo2, providing an optimized environment for training and fine-tuning. Built on NVIDIA NeMo and PyTorch Lightning, it offers:
Domain-specific optimization: Provides performant biomolecular AI architectures that can be scaled to billions of parameters (e.g., BERT, Striped Hyena), along with representative model examples (e.g., ESM-2, Geneformer) built with CUDA-accelerated tooling tailored for drug discovery workflows.
GPU-accelerated performance: Delivers industry-leading speed through native integration with NVIDIA GPUs at scale, reducing training time for large language models and predictive models.
Comprehensive open-source resources: Includes programming tools, libraries, prepackaged datasets, and detailed documentation to support researchers and developers in deploying biomolecular AI solutions.
2. BioNeMo Blueprints: Production Ready Workflows for Drug Discovery
BioNeMo Blueprints provide ready-to-use reference workflows for tasks such as protein binder design, virtual screening, and molecular docking. These workflows integrate advanced AI models like AlphaFold2, DiffDock 2.0, RFdiffusion, MolMIM, and ProteinMPNN to accelerate drug discovery processes. These blueprints provide solutions to patterns identified across several other industry use cases. Scientific developers can try NVIDIA inference microservices (NIMs) at build.nvidia.com and access them for testing via an NVIDIA developer license.
The following graphic shows the components and features of GKE used by the BioNeMo platform. In this blog, we demonstrate how to deploy these components on GKE, combining NVIDIA’s domain-specific AI tools with Google Cloud’s managed Kubernetes infrastructure for:
Distributed pretraining and fine-tuning of models across NVIDIA GPU clusters
Blueprint-driven workflows using NIMs
Cost-optimized scaling via GKE’s dynamic node pools and preemptible VMs
Figure 1: NVIDIA BioNeMo Framework and BioNeMo Blueprints on GKE
Solution Architecture of BioNeMo framework
Here, we will walk through setting up the BioNeMo Framework on GKE to perform ESM-2 pretraining and fine-tuning; a minimal job-submission sketch follows the architecture overview below.
Figure 2: BioNeMo framework on GKE
The above diagram shows an architectural overview of deploying the NVIDIA BioNeMo Framework on GKE for AI model pre-training, fine-tuning, and inferencing. Here’s a breakdown from an architectural perspective:
GKE: The core orchestration platform, including the control plane that manages the deployment and scaling of the BioNeMo Framework. The cluster is deployed as a regional cluster and can optionally be configured as a zonal cluster.
Node Pool: A group of worker nodes within the GKE cluster, specifically configured with NVIDIA GPUs for accelerated AI workloads.
Nodes: Individual machines within the node pool, equipped with NVIDIA GPUs.
NVIDIA BioNeMo Framework: The AI software platform running within GKE, enabling pre-training, fine-tuning, and inferencing of AI models.
Networking:
Virtual Private Cloud (VPC): A logically isolated network within GCP, ensuring secure communication between resources.
Load Balancer: Distributes incoming traffic to the BioNeMo services running in the GKE cluster, enhancing availability and scalability.
Storage:
Filestore (NFS): Provides high-performance network file storage for datasets and model checkpoints.
Cloud Storage: Object storage for storing datasets and other large files.
NVIDIA NGC Image Registry: Provides container images for BioNeMo and related software, ensuring consistent and optimized deployments.
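The concrete deployment commands live in the GitHub repo; as a rough orientation, the hedged sketch below shows how a pretraining run could be submitted to the cluster as a Kubernetes Job using the official Python client. The container image tag, training command, and persistent volume claim name are placeholders.

```python
# Hedged sketch: submit a BioNeMo pretraining run to GKE as a Kubernetes Job.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="esm2-pretrain"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="bionemo",
                        image="nvcr.io/nvidia/clara/bionemo-framework:latest",  # assumed NGC image
                        command=["python", "/workspace/pretrain_esm2.py"],      # stand-in; use the command from the repo
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "8"}  # one node's worth of GPUs
                        ),
                        volume_mounts=[
                            client.V1VolumeMount(name="workspace", mount_path="/workspace")
                        ],
                    )
                ],
                volumes=[
                    client.V1Volume(
                        name="workspace",
                        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                            claim_name="bionemo-filestore"  # Filestore-backed PVC (assumed name)
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```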
Once pretraining is running, open a web browser pointing to http://localhost:8000/#timeseries to see the loss curves. The details for fine-tuning and inference are laid out in the GitHub repo.
Solution Architecture of BioNeMo Blueprints
The below graphic shows a BioNeMo Blueprint deployed on GKE for inferencing. From an infrastructure standpoint, the components used across the compute, networking, and storage layers are similar to Figure 2:
NIMs are packaged as a unit with runtime and model-specific weights. Blueprints deploy one or more NIMs using Helm charts. Alternatively, they can be deployed using gcloud or docker commands and configured using kubectl commands. Each NIM needs a minimum of one NVIDIA GPU accessible through a GKE node pool.
Three NIMs—AlphaFold2, DiffDock, and MolMIM—are deployed as individual Kubernetes deployments. Each deployment uses a GPU and a NIM container image, mounting a persistent volume claim for storing model checkpoints and data. Services expose each application on different ports. The number of GPUs can be configured to a higher value for better scalability.
Figure 3: NIM Blueprint on GKE
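For readers who prefer the raw Kubernetes API over Helm, the hedged sketch below mirrors the pattern in Figure 3 for a single NIM (MolMIM here), creating a GPU-backed Deployment and a Service that exposes it. The image path, port, and persistent volume claim name are placeholders; the Helm charts in the GitHub repo remain the reference deployment path.

```python
# Hedged sketch: deploy one NIM as a Deployment plus Service.
# Note: pulling from nvcr.io typically also requires an image pull secret and
# an NGC API key, omitted here for brevity.
from kubernetes import client, config

config.load_kube_config()
apps, core = client.AppsV1Api(), client.CoreV1Api()

labels = {"app": "molmim-nim"}
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="molmim-nim"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="molmim",
                        image="nvcr.io/nim/nvidia/molmim:latest",  # assumed NIM image path
                        ports=[client.V1ContainerPort(container_port=8000)],
                        resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
                        volume_mounts=[
                            client.V1VolumeMount(name="model-cache", mount_path="/opt/nim/.cache")
                        ],
                    )
                ],
                volumes=[
                    client.V1Volume(
                        name="model-cache",
                        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                            claim_name="nim-model-cache"  # assumed PVC name
                        ),
                    )
                ],
            ),
        ),
    ),
)
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="molmim-nim"),
    spec=client.V1ServiceSpec(
        selector=labels,
        ports=[client.V1ServicePort(port=8000, target_port=8000)],
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
core.create_namespaced_service(namespace="default", body=service)
```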
Steps
We have an example of deploying a BioNeMo blueprint for generative virtual screening in the Generative Virtual Screening for Drug Discovery on GKE GitHub repo. The setup steps, such as creating the GKE cluster and node pool and mounting Filestore, are similar to BioNeMo training. The repo also outlines how to deploy the BioNeMo blueprint and use it for inference.
By integrating modular NIM microservices with scalable platforms like GKE, industries ranging from biopharma to agriculture can deploy AI-driven solutions tailored to their unique challenges, enabling faster insights and more efficient processes at scale.
Conclusion
As we’ve explored in this blog post, GKE provides a robust and versatile platform for deploying and running both NVIDIA BioNeMo Framework and NVIDIA BioNeMo Blueprint. By leveraging GKE’s scalability, container orchestration capabilities, and integration with Google Cloud’s ecosystem, you can streamline the development and deployment of AI solutions in the life sciences and other domains.
Whether you’re accelerating drug discovery with BioNeMo or deploying generative AI models with NIMs, GKE empowers you to harness the power of AI and drive innovation. By leveraging the strengths of both platforms, you can streamline the deployment process, optimize performance, and scale your AI workloads seamlessly.
Ready to experience the power of NVIDIA BioNeMo on Google Cloud? Get started today by exploring the BioNeMo Framework and NIM catalog, deploying your first generative AI model on GKE, and unlocking new possibilities for your applications.
We’d like to thank the NVIDIA team members who helped contribute to this guide: Juan Pablo Guerra, Solutions Architect, and Kushal Shah, Senior Solutions Architect.