Cloud SQL, Google Cloud’s fully managed database service for PostgreSQL, MySQL, and SQL Server workloads, offers strong availability SLAs depending on which edition you choose: a 99.95% SLA, excluding maintenance, for Enterprise edition, and a 99.99% SLA, including maintenance, for Enterprise Plus edition. In addition, Cloud SQL offers numerous high availability and scalability features that are crucial for maintaining business continuity and minimizing downtime, especially for mission-critical databases.
These features can help address some common database deployment challenges:
Combined read/write instances: Using a single instance for both reads and writes creates a single point of failure. If the primary instance goes down, both read and write operations are impacted. In the event that your storage is full and auto-scaling is disabled, even a failover would not help.
Downtime during maintenance: Planned maintenance can disrupt business operations.
Time-consuming scaling: Manually scaling instance size for planned workload spikes is a lengthy process that requires significant planning.
Complex cross-region disaster recovery: Setting up and managing cross-region DR requires manual configuration and connection string updates after a failover.
In this blog, we show you how to maximize your business continuity efforts with Cloud SQL’s high availability and scalability features, as well as how to use Cloud SQL Enterprise Plus features to build resilient database architectures that can handle workload spikes, unexpected outages, and read scaling needs.
Architecting a highly available and robust database
Using the Cloud SQL high availability feature, which automatically fails over to a standby instance, is a good starting point but not sufficient: scenarios such as storage full issues, regional outages, or failover problems can still cause disruptions. Separating read workloads from write workloads is essential for a more robust architecture.
A best-practice approach involves implementing Cloud SQL read replicas alongside high availability. Read traffic should be directed to dedicated read-replica instances, while write operations are handled by the primary instance. You can enable high availability either on the primary, the read replica(s), or both, depending on your specific requirements. This separation helps ensure that the primary can serve production traffic predictably, and that read operations can continue uninterrupted via the read replicas even when there is downtime.
Below is a sample regional architecture with high availability and read-replica enabled.
You can deploy this architecture regionally across multiple zones or extend it cross-regionally for disaster recovery and geographically-distributed read access. A regional deployment with a highly available primary and a highly available read replica that spans three availability zones provides resilience against zonal failures: Even if two zones fail, the database remains accessible for both read and write operations after failover. Cross-region read replicas enhance this further, providing regional DR capabilities.
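As a rough sketch, the following gcloud commands create a highly available primary with automatic storage growth enabled and attach a read replica. The instance names, machine tier, and region are placeholders, so adapt them to your environment:

gcloud sql instances create prod-primary \
  --database-version=POSTGRES_15 \
  --tier=db-custom-4-16384 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --storage-auto-increase

# Read replica in the same region; REGIONAL makes the replica itself
# highly available, where supported for your engine and edition.
gcloud sql instances create prod-replica \
  --master-instance-name=prod-primary \
  --region=us-central1 \
  --availability-type=REGIONAL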
Cloud SQL Enterprise Plus features
Cloud SQL Enterprise Plus offers significant advantages for performance and availability:
Enhanced hardware: Run databases on high-performance hardware with up to 128 vCPUs and 824GB of RAM.
Data cache: Enable data caching for faster read performance.
Near-zero downtime operations: Experience near-zero downtime maintenance and sub-second (<1s) downtime for instance scaling.
Advanced disaster recovery: Streamline disaster recovery with failover to cross-region DR-Replica and automatic reinstatement of the old primary. The application can still connect using the same write endpoint, which is automatically assigned to the new primary after failover.
Enterprise Plus edition addresses the previously mentioned challenges:
Improved performance: Benefit from higher core-to-memory ratios for better database performance.
Faster reads: Data caching improves read performance for read-heavy workloads. The data cache can be enabled on the primary, the read replica(s), or both, as needed.
Easy scaling: Scale instances up quickly with sub-second downtime to handle traffic spikes or planned events, and scale them back down just as quickly when traffic is low.
Minimized maintenance downtime: Reduce downtime during maintenance to less than a second and provide better business continuity.
Handle regional failures: Easily fail over to a cross-region DR replica, and Cloud SQL automatically rebuilds your architecture as the original region recovers. This lessens the hassle of DR drills and helps ensure application availability.
Automatic IP address re-pointing: Leverage the write endpoint to automatically connect to the current primary after a switchover or failover, with no IP address changes needed on the application end.
To test out these benefits quickly, there’s an easy, near-zero downtime upgrade option from Cloud SQL Enterprise edition to Enterprise Plus edition.
Best practices for planning maintenance
Staging environment testing: To identify potential issues early, use the maintenance timing feature to deploy maintenance to test/staging environments at least a week before production.
Read-replica maintenance: Apply self-service maintenance to one of the read replicas before the primary instance to avoid simultaneous downtime for read and write operations. Make sure that the primary and other replicas are updated shortly afterwards, as we recommend maintaining the same maintenance version in the primary as well as all the other replicas.
Maintenance window: Always configure a maintenance window during off-peak hours to control when maintenance is performed.
Maintenance notifications: Opt in to maintenance notifications to make sure you receive an email at least one week before scheduled maintenance.
Reschedule maintenance: Use the reschedule maintenance feature if a maintenance activity conflicts with a critical business period.
Deny maintenance period: Use the deny maintenance period feature to postpone maintenance for up to 90 days during sensitive periods.
By combining these strategies, you can build highly available and scalable database solutions in Cloud SQL, helping to ensure your business continuity and minimize downtime. Refer to the maintenance FAQ for more detailed information.
As a technology leader and a steward of company resources, understanding AI costs isn’t just prudent – it’s essential for sustainable AI adoption. To help, we’ll lay out a comprehensive approach to understanding and managing your AI costs on Google Cloud, ensuring your organization captures maximum value from its AI investments.
Whether you’re just beginning your AI journey or scaling existing solutions, this approach will equip you with the insights needed to make informed decisions about your AI strategy.
Why understanding AI costs matters now
Google Cloud offers a vast and ever-expanding array of AI services, each with its own pricing structure. Without a clear understanding of these costs, you risk budget overruns, stalled projects, and ultimately, a failure to realize the full potential of your AI investments. This isn’t just about saving money; it’s about responsible AI development – building solutions that are both innovative and financially sustainable.
Breaking down the Total Cost of Ownership (TCO) for AI on Google Cloud
Let’s dissect the major cost components of running AI workloads on Google Cloud:
| Cost category | Description | Google Cloud services (examples) |
| --- | --- | --- |
| Model serving cost | The cost of running your trained AI model to make predictions (inference). This is often a per-request or per-unit-of-time cost. | OOTB models available in Vertex AI, Vertex AI Prediction, GKE (if self-managing), Cloud Run functions (for serverless inference) |
| Training and tuning costs | The expense of training your AI model on your data and fine-tuning it for optimal performance. This includes compute resources (GPUs/TPUs) and potentially the cost of the training data itself. | Vertex AI Training, Compute Engine (with GPUs/TPUs), GKE or Cloud Run (with GPUs/TPUs) |
| Cloud hosting costs | The fundamental infrastructure costs for running your AI application, including compute, networking, and storage. | Compute Engine, GKE or Cloud Run, Cloud Storage, Cloud SQL (if your application uses a database) |
| Training data storage and adapter layers costs | The cost of storing your training data and any “adapter layers” (intermediate representations or fine-tuned model components) created during the training process. | Cloud Storage, BigQuery |
| Application layer and setup costs | The expenses associated with any additional cloud services needed to support your AI application, such as API gateways, load balancers, monitoring tools, etc. | API Gateway, Cloud Load Balancing, Cloud Run functions, Cloud Logging and Monitoring |
| Operational support costs | The ongoing costs of maintaining and supporting your AI model, including monitoring performance, troubleshooting issues, and potentially retraining the model over time. | Google Cloud Support, internal staff time, potential third-party monitoring tools |
Let’s estimate costs with an example
Let’s illustrate this with a hypothetical, yet realistic, generative AI use case: Imagine you’re a retail customer with an automated customer support chatbot.
Scenario: A medium-sized e-commerce company wants to deploy a chatbot on their website to handle common customer inquiries (order status, returns, product information and more). They plan to use a pre-trained language model (like one available through Vertex AI Model Garden) and fine-tune it on their own customer support data.
Assumptions:
Model: Fine-tuning a low-latency language model (in this case, we will use Gemini 1.5 Flash).
Training data: 1 million customer support conversations (text data).
Traffic: 100K chatbot interactions per day.
Hosting: Vertex AI Prediction for serving the model.
Fine-tuning frequency: Monthly.
Cost estimation
As the retail customer in this example, here’s how you might approach this.
1. First, discover your model serving cost:
Vertex AI Prediction (Gemini 1.5 Flash for chat) pricing is modality-based; since our input and output are text, the usage unit is characters. Let’s assume an average of 1,000 input characters and 500 output characters per interaction.
Total model serving cost per month (~30 days): ~$337
Serving cost of the Gemini 1.5 Flash model
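As a quick sanity check, here’s a back-of-the-envelope calculation in Python. The per-character rates are illustrative assumptions chosen to reproduce the ~$337 figure; check the current Vertex AI pricing page for actual rates:

# Assumed illustrative rates, not official pricing
INPUT_RATE = 0.0375 / 1_000_000    # $ per input character (assumed)
OUTPUT_RATE = 0.15 / 1_000_000     # $ per output character (assumed)

interactions = 100_000 * 30                    # 100K interactions/day for ~30 days
cost = interactions * (1_000 * INPUT_RATE + 500 * OUTPUT_RATE)
print(f"Monthly serving cost: ${cost:,.2f}")   # -> $337.50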
2. Second, identify your training and tuning costs:
In this scenario, we aim to enhance the model’s accuracy and relevance to our specific use case through fine-tuning. This involves inputting a million past chat interactions, enabling the model to deliver more precise and customized interactions.
Cost per million training tokens: $8
Cost per million training characters: $2 (each token approximately equates to four characters)
Tuning cost (subsequent months): 100,000 conversations (new training data) * 1,500 characters (input + output) * $2 / 1,000,000 = $300
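Applying the same per-character rate, here is a minimal sketch of the tuning arithmetic; the initial pass over all 1M conversations, which isn’t broken out above, works out to about $3,000 at this rate:

CHAR_RATE = 2 / 1_000_000        # $2 per million training characters
CHARS_PER_CONVO = 1_500          # 1,000 input + 500 output characters

initial_tuning = 1_000_000 * CHARS_PER_CONVO * CHAR_RATE   # first pass: $3,000
monthly_tuning = 100_000 * CHARS_PER_CONVO * CHAR_RATE     # new data only: $300
print(initial_tuning, monthly_tuning)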
3. Third, understand the cloud hosting costs:
Since we’re using Vertex AI Prediction, the underlying infrastructure is managed by Google Cloud, and its cost is included in the per-request pricing. However, if we were self-managing the model on GKE or Compute Engine, we’d need to factor in VM costs, GPU/TPU costs (if applicable), and networking costs. For this example, we assume this is $0, as it is part of the Vertex AI cost.
4. Fourth, define the training data storage and adapter layers costs:
The infrastructure costs for deploying machine learning models often raise concerns, but the data storage components can be economical at moderate scales. When implementing a conversational AI system, storing both the training data and the specialized model adapters represents a minor fraction of the overall costs. Let’s break down these storage requirements and their associated expenses.
1M conversations, assuming an average size of 5KB per conversation, would be roughly 5GB of data.
Cloud Storage cost for 5GB is negligible: $0.10 per month.
Adapter layers (fine-tuned model weights) might add another 1GB of storage. This would still be very inexpensive: $0.02 per month.
Total storage cost: < $1/month
5. Fifth, consider the application layer and setup costs:
This depends heavily on the specific application. In this case, we use Cloud Run functions to handle pre- and post-processing of chatbot requests (e.g., formatting, database lookups), plus Cloud Logging. Let’s assume request-based billing, so we are only charged while a request is being processed. Processing 3M requests per month (100K * 30) at an average execution time of 1 second comes to: $14.30
Cloud Run function cost for request-based billing
Cloud Logging and Monitoring for tracking chatbot performance and debugging issues. Let’s estimate 100GB of logging volume (which is on the higher end) and retaining the logs for 3 months: $28
Cloud Logging costs for storage and retention
Total application layer cost per month: ~$40
6. Finally, incorporate the operational support cost:
This is the hardest to estimate, as it depends on the internal team’s size and responsibilities. Let’s assume a conservative estimate of 5 hours per week of an engineer’s time dedicated to monitoring and maintaining the chatbot, at an hourly rate of $100.
Total operational support cost per month: 5 hours/week * 4 weeks/month * $100/hour = $2000
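Putting the pieces together, here is a short summary of the monthly estimate, with numbers carried over from the steps above:

costs = {
    "model_serving": 337.50,
    "tuning_subsequent_months": 300.00,
    "cloud_hosting": 0.00,        # bundled into Vertex AI per-request pricing
    "storage": 0.12,              # training data + adapter layers
    "application_layer": 40.00,   # Cloud Run functions + Logging
    "operational_support": 2000.00,
}
print(f"Estimated total: ${sum(costs.values()):,.2f}/month")   # ~ $2,677.62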
You can find the full estimate of cost here. Note that this does not include tuning and operational costs, as they are not yet available in the pricing export.
Once you have a good understanding of your AI costs, it is important to develop an optimization strategy that encompasses infrastructure choices, resource utilization, and monitoring practices to maintain performance while controlling expenses. By understanding the various cost components and leveraging Google Cloud’s tools and resources, you can confidently embark on your AI journey. Cost management isn’t a barrier; it’s an enabler. It allows you to experiment, innovate, and build transformative AI solutions in a financially responsible way.
Rosetta 2 is Apple’s translation technology for running x86-64 binaries on Apple Silicon (ARM64) macOS systems.
Rosetta 2 translation creates a cache of Ahead-Of-Time (AOT) files that can serve as valuable forensic artifacts.
Mandiant has observed sophisticated threat actors leveraging x86-64 compiled macOS malware, likely due to broader compatibility and relaxed execution policies compared to ARM64 binaries.
Analysis of AOT files, combined with FSEvents and Unified Logs (with a custom profile), can assist in investigating macOS intrusions.
Introduction
Rosetta 2 (internally known on macOS as OAH) was introduced in macOS 11 (Big Sur) in 2020 to enable binaries compiled for x86-64 architectures to run on Apple Silicon (ARM64) architectures. Rosetta 2 translates signed and unsigned x86-64 binaries just-in-time or ahead-of-time at the point of execution. Mandiant has identified several new highly sophisticated macOS malware variants over the past year, notably compiled for x86-64 architecture. Mandiant assessed that this choice of architecture was most likely due to increased chances of compatibility on victim systems and more relaxed execution policies. Notably, macOS enforces stricter code signing requirements for ARM64 binaries compared to x86-64 binaries running under Rosetta 2, making unsigned ARM64 binaries more difficult to execute. Despite this, in the newly identified APT malware families observed by Mandiant over the past year, all were self-signed, likely to avoid other compensating security controls in place on macOS.
The Rosetta 2 Cache
When an x86-64 binary is executed on a system with Rosetta 2 installed, the Rosetta 2 Daemon process (oahd) checks whether an ahead-of-time (AOT) file already exists for the binary within the Rosetta 2 cache directory on the Data volume at /var/db/oah/<UUID>/. The UUID value in this file path appears to be randomly generated on install or update. If an AOT file does not exist, one is created by writing translation code to a .in_progress file and then renaming it to a .aot file named after the original binary. The Rosetta 2 Daemon process then runs the translated binary.
The /var/db/oah directory and its children are protected and owned by the OAH Daemon user account _oahd. Interaction with these files by other user accounts is only possible if System Integrity Protection (SIP) is disabled, which requires booting into recovery mode.
The directories under /var/db/oah/<UUID>/ are binary UUID values that correspond to translated binaries. Specifically, these binary UUID values are SHA-256 hashes generated from a combination of the binary file path, the Mach-O header, timestamps (created, modified, and changed), size, and ownership information. If the same binary is executed with any of these attributes changed, a new Rosetta AOT cache directory and file is created. While the content of the binaries is not part of this hashing function, changing the content of a file on an APFS file system will update the changed timestamp, which effectively means content changes can cause the creation of a new binary UUID and AOT file. Ultimately, the mechanism is designed to be extremely sensitive to any changes to x86-64 binaries at the byte and file system levels to reduce the risk of AOT poisoning.
Figure 1: Sample Rosetta 2 cache directory structure and contents
The Rosetta 2 cache binary UUID directories and the AOT files they contain appear to persist until macOS system updates. System updates have been found to cause the deletion of the cache directory (the Random UUID directory). After the upgrade, a directory with a different UUID value is created, and new Binary UUID directories and AOT files are created upon first launch of x86-64 binaries thereafter.
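For triage, a simple script can inventory the cache from a mounted forensic image. The sketch below assumes the Data volume is mounted read-only at a placeholder path and simply walks the directory layout described above:

import os
from datetime import datetime, timezone

OAH_ROOT = "/mnt/macos_data/var/db/oah"   # placeholder: mounted Data volume

for install_uuid in sorted(os.listdir(OAH_ROOT)):        # random UUID per install/update
    install_dir = os.path.join(OAH_ROOT, install_uuid)
    if not os.path.isdir(install_dir):
        continue
    for binary_uuid in sorted(os.listdir(install_dir)):  # SHA-256 over path/header/metadata
        binary_dir = os.path.join(install_dir, binary_uuid)
        if not os.path.isdir(binary_dir):
            continue
        for name in os.listdir(binary_dir):
            if name.endswith((".aot", ".in_progress")):
                st = os.stat(os.path.join(binary_dir, name))
                # st_birthtime exists on macOS/APFS; fall back to mtime elsewhere
                born = getattr(st, "st_birthtime", st.st_mtime)
                ts = datetime.fromtimestamp(born, tz=timezone.utc).isoformat()
                print(ts, binary_uuid, name)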
Translation and Universal Binaries
When universal binaries (containing both x86-64 and ARM64 code) are executed by a x86-64 process running through Rosetta 2 translation, the x86-64 version of these binaries is executed, resulting in the creation of AOT files.
Figure 2: Overview of execution of universal binaries with x86-64 processes translated through Rosetta 2 versus ARM64 processes
In a Democratic People’s Republic of Korea (DPRK) crypto heist investigation, Mandiant observed a x86-64 variant of the POOLRAT macOS backdoor being deployed and the attacker proceeding to execute universal system binaries including ping, chmod, sudo, id, and cat through the backdoor. This resulted in AOT files being created and provided evidence of attacker interaction on the system through the malware (Figure 5).
In some cases, the initial infection vector in macOS intrusions has involved legitimate x86-64 code that executes malware distributed as universal binaries. Because the initial x86-64 code runs under Rosetta 2, the x86-64 versions of malicious universal binaries are executed, leaving behind Rosetta 2 artifacts, including AOT files. In one case, a malicious Python 2 script led to the downloading and execution of a malicious universal binary. The Python 2 interpreter ran under Rosetta 2 since no ARM64 version was available, so the system executed the x86-64 version of the malicious universal binary, resulting in the creation of AOT files. Despite the attacker deleting the malicious binary later, we were able to analyze the AOT file to understand its functionality.
Unified Logs
The Rosetta 2 Daemon emits logs to the macOS Unified Log; however, the binary name values are marked as private. These values can be configured to be shown in the logs with a custom profile installed. Informational logs are recorded for AOT file lookups, when cached AOT files are available and utilized, and when translation occurs and completes. For binaries that are not configured to log to the Unified Log and are not launched interactively, in some cases this was found to be the only evidence of execution within the Unified Logs. Execution may be correlated with other supporting artifacts; however, this is not always possible.
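For reference, a minimal sketch of such a profile is shown below; the payload type and Enable-Private-Data key follow publicly documented examples, and the identifiers and UUIDs are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>PayloadContent</key>
  <array>
    <dict>
      <key>PayloadType</key>
      <string>com.apple.system.logging</string>
      <key>PayloadIdentifier</key>
      <string>com.example.logging.private</string>
      <key>PayloadUUID</key>
      <string>1A2B3C4D-0000-4000-8000-000000000001</string>
      <key>PayloadVersion</key>
      <integer>1</integer>
      <key>Enable-Private-Data</key>
      <true/>
    </dict>
  </array>
  <key>PayloadDisplayName</key>
  <string>Enable Unified Log Private Data</string>
  <key>PayloadIdentifier</key>
  <string>com.example.logging.profile</string>
  <key>PayloadType</key>
  <string>Configuration</string>
  <key>PayloadUUID</key>
  <string>1A2B3C4D-0000-4000-8000-000000000002</string>
  <key>PayloadVersion</key>
  <integer>1</integer>
</dict>
</plist>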
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Aot lookup request for <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Translating image <private> -> <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Translation finished for <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Aot lookup request for <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Using cached aot <private> -> <private>
Figure 3: macOS Unified Logs showing Rosetta lookups, using cached files, and translating with private data disabled (default)
0x2ec304 Info 0x0 668 0 oahd: my_binary (Re(34180):
Aot lookup request for /Users/Mandiant/my_binary
0x2ec304 Info 0x0 668 0 oahd: my_binary (Re(34180):
Translating image /Users/Mandiant/my_binary ->
/var/db/oah/237823680d6bdb1e9663d60cca5851b63e79f6c
8e884ebacc5f285253c3826b8/1c65adbef01f45a7a07379621
b5800fc337fc9db90d8eb08baf84e5c533191d9/my_binary.in_progress
0x2ec304 Info 0x0 668 0 oahd: my_binary (Re(34180):
Translation finished for /Users/Mandiant/my_binary
0x2ec304 Info 0x0 668 0 oahd: my_binary(34180):
Aot lookup request for /Users/Mandiant/my_binary
0x2ec304 Info 0x0 668 0 oahd: my_binary(34180):
Using cached aot /Users/Mandiant/my_binary ->
/var/db/oah/237823680d6bdb1e9663d60cca5851b63e
79f6c8e884ebacc5f285253c3826b8/1c65adbef01f45a7
a07379621b5800fc337fc9db90d8eb08baf84e5c533191d9/my_binary.aot
Figure 4: macOS Unified Logs showing Rosetta lookups, using cached files, and translating with private data enabled (with custom profile installed)
FSEvents
FSEvents can be used to identify historical execution of x86-64 binaries even if Unified Logs or files in the Rosetta 2 Cache are not available or have been cleared. These records will show the creation of directories within the Rosetta 2 cache directory, the creation of .in_progress files, and then the renaming of the file to the AOT file, which will be named after the original binary.
Figure 5: Decoded FSEvents records showing the translation of a x86-64 POOLRAT variant on macOS, and subsequent universal system binaries executed by the malware as x86-64
AOT File Analysis
The AOT files within the Rosetta 2 cache can provide valuable insight into historical evidence of execution of x86-64 binaries. In multiple cases over the past year, Mandiant identified macOS systems as the initial entry vector for APT groups targeting cryptocurrency organizations. In the majority of these cases, Mandiant identified evidence of the attackers deleting the malware on these systems within a few minutes of a cryptocurrency heist being perpetrated. However, the AOT files were left in place, likely due to the protection by SIP and the relative obscurity of this forensic artifact.
From a forensic perspective, the creation and modification timestamps on these AOT files provide evidence of the first time a specified binary was executed on the system with a unique combination of the attributes used to generate the SHA-256 hash. These timestamps can be corroborated with other artifacts related to binary execution where available (for example, Unified Logs or ExecPolicy, XProtect, and TCC Databases), and file system activity through FSEvents records, to build a more complete picture of infection and possible attacker activity if child processes were executed.
Where multiple AOT files exist for the same origin binary under different Binary UUID directories in the Rosetta 2 cache, and the content (file hashes) of those AOT files is the same, this is typically indicative of a change in file data sections, or more commonly, file system metadata only.
Mandiant has previously shown that AOT files can be analyzed and used for malware identification through correlation of symbols. AOT files are Mach-O binaries that contain ARM64 instructions translated from the original x86-64 code. They contain jump-backs into the original binary and contain no API calls to reference. Certain functionality can be determined through reverse engineering of AOT files; however, no static data, including network-based indicators or configuration data, is typically recoverable. In one macOS downloader observed in a notable DPRK cryptocurrency heist, Mandiant observed developer file path strings as part of the basic Mach-O information contained within the AOT file. The original binary was not recovered because the attacker deleted it after the heist, so this provided useful data points to support threat actor attribution and malware family assessment.
Figure 6: Interesting strings from an AOT file related to a malicious DPRK downloader that was unrecoverable
In any case, determining malware functionality is more effective using the original complete binary instead of the AOT file, because the AOT file lacks much of the contextual information present in the original binary. This includes static data and complete Mach-O headers.
Poisoning AOT Files
Much has been written within the industry about the potential for the poisoning of the Rosetta 2 cache through modification or introduction of AOT files. Where SIP is disabled, this is a valid attack vector. Mandiant has not yet seen this technique in the wild; however, during hunting or investigation activities, it is advisable to be on the lookout for evidence of AOT poisoning. The best way to do this is by comparing the contents of the ARM64 AOT files with what would be expected based on the original x86-64 executable. This can be achieved by taking the original x86-64 executable and using it to generate a known-good AOT file, then comparing this to the AOT file in the cache. Discrepancies, particularly the presence of injected shellcode, could indicate AOT poisoning.
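A minimal sketch of that comparison, assuming you have regenerated a known-good AOT on a lab system and exported both files (the paths are placeholders):

import hashlib

def sha256(path):
    # Stream the file to avoid loading large AOT files into memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

cached = "/evidence/oah_cache/my_binary.aot"    # AOT collected from the suspect system
known_good = "/lab/oah_cache/my_binary.aot"     # AOT regenerated from the same x86-64 binary

if sha256(cached) != sha256(known_good):
    print("AOT contents differ: examine for injected code / possible poisoning")
else:
    print("AOT contents match the known-good translation")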
Conclusion
There are several forensic artifacts on macOS that may record historical evidence of binary execution. However, in advanced intrusions involving forensically aware attackers who delete the original binaries, and with no further security monitoring solutions in place, combining FSEvents, Unified Logs, and, crucially, residual AOT files on disk has provided the remaining evidence of intrusion on a macOS system.
Whilst signed macOS ARM64 binaries may be the future, for now AOT files and the artifacts surrounding them should be reviewed in analysis of any suspected macOS intrusion and leveraged for hunting opportunities wherever possible.
The behavior identified in the cases presented here was identified on various versions of macOS between 13.5 and 14.7.2. Future or previous versions of macOS and Rosetta 2 may behave differently.
Acknowledgements
Special thanks to Matt Holley, Mohamed El-Banna, Robert Wallace, and Adrian Hernandez.
Welcome to the second Cloud CISO Perspectives for February 2025. Today, Christiane Peters from our Office of the CISO explains why post-quantum cryptography may seem like the future’s problem, but it will soon be ours if IT doesn’t move faster to prepare for it. Here’s what you need to know about how to get your post-quantum cryptography plans started.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
–Phil Venables, VP, TI Security & CISO, Google Cloud
Prepare early for PQC to be resilient against tomorrow’s cryptographic threats
By Christiane Peters, security architect, Office of the CISO, Google Cloud
Post-quantum cryptography adoption is rapidly becoming a reality, and the need for active deployment and implementation is becoming increasingly urgent — sooner than you might think.
We know that eventually, perhaps sooner than expected, cryptographically-relevant quantum computers (CRQC) will be able to break some of the critical cryptography that underpins today’s cybersecurity infrastructure. There are two CRQC risks we can prepare for now (with an in-depth analysis available here):
Harvest Now, Decrypt Later attacks, where a threat actor steals encrypted data that they anticipate decrypting by an as-yet unbuilt CRQC in the future.
Forged signatures, where a threat actor uses a CRQC to forge digital signatures and implant them in compromised firmware or software updates.
However, unless you have national security data, immensely valuable long-term intellectual property, long-term sensitive communications, or a cryptographic architecture where small numbers of keys can unlock all previously encrypted data, then neither of the above is quite as serious a risk as some people would have you think.
The more significant risk for most business leaders and organizations is that implementing post-quantum cryptography (PQC) will take a long time, as Phil Venables noted in a recent blog on how executives should take a tactical approach to implementing PQC.
PQC is the industry effort to defend against those risks — a bit like the Y2K movement, but scaled for the 21st century. PQC is defining the cryptographic standards and implementing newly-designed algorithms that are expected to be resistant to attacks by both classical and quantum computers.
Business leaders should be taking a closer look at PQC and discussing with their security teams how to implement it. Preparing for PQC can help you reduce the risks you’ll face in the future and make your organization more resilient to the challenges of evolving technology.
While a decade in the future may seem very far away, the reality is that the work needed will take that long to prepare — and waiting might mean you are already too late.
Many organizations are working on post-quantum cryptography, including the U.S. National Institute of Standards and Technology. NIST published quantum-safe cryptographic standards last summer, and in November suggested a transition timeline to retire some of today’s public-key cryptosystems by 2030, and no later than 2035.
Together, these efforts have begun enabling technology vendors to take steps toward PQC migrations. Crucially, all of NIST’s PQC standards run on the classical computers we currently use.
NIST’s new standards are an important step in the right direction, but PQC migration won’t happen in a mere 12 months. There are four key steps you can take today to prepare for post-quantum cryptography.
Develop a plan: CISOs, CIOs, and CTOs should craft a roadmap for implementing quantum-resistant cryptography. This plan should balance cost, risk, and usability, while ensuring the new algorithms are integrated into existing systems.
Identify and protect: Assess the data and systems most at risk from quantum threats, including all systems using asymmetric encryption and key exchange, systems using digital signatures such as PKI, software and firmware signatures, and authentication mechanisms. Refer back to Google’s quantum threat analysis to help determine which changes should be addressed first.
Anticipate system-wide effects: Analyze the broader risk that a PQC migration could pose to other systems. This could be similar to the Y2K problem where the format of data (for example, larger digital signatures) in databases and applications might need significant software changes beyond the cryptography.
Learn from experience: Reflect on how your organisation has tackled previous cryptography-related challenges, such as the Heartbleed vulnerability in TLS and retiring SHA1. Build an understanding of what worked well and what improvements were needed to help guide your approach to PQC adoption. Conducting a tabletop exercise with leadership teams can help identify potential challenges early by simulating the migration of cryptographic systems.
We don’t know exactly how far off a cryptographically relevant quantum computer is, but we’re already facing the associated risks today, and experience tells us that in the wrong hands quantum computing could be used to compromise the privacy and security of digital communications across industries and borders. Taking action early can help ensure a smooth transition to quantum-resistant cryptography and keep you ahead of evolving expectations.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Get ready for a unique, immersive security experience at Next ‘25: Here’s why Google Cloud Next is shaping up to be a must-attend event for security experts and the security-curious alike. Read more.
Next ‘25 can help elevate your cybersecurity skills. Here’s how: From red teaming to tabletop exercises to the SOC Arena, Next ’25 has something for security pros and newcomers alike. Read more.
How Google uses threat intelligence to uncover and track cybercrime: Google Threat Intelligence Group’s Kimberly Goody takes you behind the scenes and explains how threat intelligence helps us find and monitor cybercriminals. Read more.
5 key cybersecurity strategies for manufacturing executives: Here are five key governance strategies that can help manufacturing executives build a robust cybersecurity posture and better mitigate the evolving risks they face. Read more.
Announcing quantum-safe digital signatures in Cloud KMS: We’re introducing quantum-safe digital signatures in Cloud KMS, and we’re sharing more on our PQC strategy for Google Cloud encryption products. Read more.
Collaborate without compromise: Introducing Isolator open source: Isolator is a purpose-built, secure collaboration tool that can enable organizations to work with sensitive data in a controlled environment in Google Cloud. It can help solve the problem of giving collaborators access to restricted data and tools when building solutions that involve sensitive information. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
Multiple Russia-aligned threat actors targeting Signal: Google Threat Intelligence Group has observed increasing efforts from several Russia state-aligned threat actors to compromise Signal Messenger accounts used by individuals of interest to Russia’s intelligence services. Read more.
Phishing campaigns targeting higher-education institutions: Google’s Workspace Trust and Safety team and Mandiant have observed a notable increase in phishing attacks targeting the education industry, specifically U.S.-based universities, as well as a long-term campaign targeting thousands of educational institution users each month. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Google Cloud Security and Mandiant podcasts
Metrics, challenges, and SecOps hot takes from a modern bank CISO: Dave Hannigan, CISO, Nubank, discusses the ups, downs, and surprises that only CISOs at a cutting-edge financial institution can face, with hosts Anton Chuvakin and Tim Peacock. Listen here.
Using threat intelligence to decode the underground: Kimberly Goody, cybercrime analysis lead, Google Threat Intelligence Group, takes a behind-the-scenes look with Anton and Tim at how GTIG attributes cyberattacks with high confidence, the difficulty of correlating publicly-known tool names with threat actors’ aliases, and how GTIG does threat intelligence differently. Listen here.
Defender’s Advantage: Signals of trouble: Dan Black, principal analyst, GTIG, joins host Luke McNamara to discuss the research into Russia-aligned threat actors seeking to compromise Signal Messenger. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back next month with more security-related updates from Google Cloud.
It’s a persistent question: How do you know which generative AI model is the best choice for your needs? It all comes down to smart evaluation.
In this post, we’ll share how to perform pairwise model evaluations – a way of comparing two models directly against each other – using the Vertex AI evaluation service and LLM Comparator. We’ll introduce each tool’s useful features, explain why they help us evaluate LLM performance, and show how you can use them to create a robust evaluation framework.
Pairwise model evaluation to assess performance
Pairwise model evaluation means comparing two models directly against each other to assess their relative performance on a specific task. There are three main benefits to pairwise model evaluation for LLMs:
Make informed decisions: The increasing number and variety of LLMs means you need to carefully evaluate and choose the best model for your specific task. Considering the strengths and weaknesses of each option is table stakes.
Define “better” quantitatively: Generated content from generative AI models, such as natural language text or images, is usually unstructured, lengthy, and difficult to evaluate automatically without human intervention. Pairwise evaluation helps define the “better” response for each prompt in a way that stays close to human judgment, with human inspection.
Keep an eye out: LLMs should be continuously retrained and tuned with new data, and each new version should be compared against previous versions and the latest competing models.
The proposed evaluation process for LLMs.
Vertex AI evaluation service
The Gen AI evaluation service in Vertex AI lets you evaluate any generative model or application and benchmark the evaluation results against your own judgment, using your own evaluation criteria. It helps with:
Model selection among different models for specific use cases
Model configuration optimization with different model parameters
Prompt engineering for the preferred behavior and responses
Fine-tuning LLMs for improved accuracy, fairness, and safety
Optimizing RAG architectures
Migration between different versions of a model
Managing translation qualities between different languages
How to use Vertex AI evaluation service
The Vertex AI evaluation service can help you rigorously assess your generative AI models. You can define custom metrics, leveraging pre-built templates or your own expertise, to precisely measure performance against your specific goals. For standard NLP tasks, the service provides computation-based metrics like F1 scores for classification, BLEU for translation, and ROUGE-L for summarization.
For direct model comparison, pairwise evaluations allow you to quantify which model performs better. Metrics like candidate_model_win_rate and baseline_model_win_rate are automatically calculated, and judge models provide explanations for their scoring decisions, offering valuable insights. You can also perform pairwise comparisons using computation based metrics to compare against the ground truth data.
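As a minimal sketch, a pairwise run with the Python SDK might look like the following; the module path, class names, and metric name reflect the Gen AI evaluation SDK documentation at the time of writing and may differ across SDK versions:

import pandas as pd
from vertexai.evaluation import EvalTask, PairwiseMetric, MetricPromptTemplateExamples
from vertexai.generative_models import GenerativeModel

baseline = GenerativeModel("gemini-1.5-pro")      # placeholder model IDs
candidate = GenerativeModel("gemini-1.5-flash")

pairwise_quality = PairwiseMetric(
    metric="pairwise_summarization_quality",
    metric_prompt_template=MetricPromptTemplateExamples.get_prompt_template(
        "pairwise_summarization_quality"),
    baseline_model=baseline,
)

eval_task = EvalTask(
    dataset=pd.DataFrame({"prompt": ["Summarize: <your document here>"]}),
    metrics=[pairwise_quality],
    experiment="pairwise-demo",      # tracked in Vertex AI Experiments
)
result = eval_task.evaluate(model=candidate)
print(result.summary_metrics)        # includes candidate/baseline win rates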
Beyond pre-built metrics, you have the flexibility to define your own, either through mathematical formulas or by using prompts to help “judge models” aligned with the context of the user-defined metrics. Embedding-based metrics are also available for evaluating semantic similarity.
Vertex AI Experiments and Metadata seamlessly integrate with the evaluation service, automatically organizing and tracking your datasets, results, and models. You can easily initiate evaluation jobs using the REST API or Python SDK and export results to Cloud Storage for further analysis and visualization.
In essence, the Vertex AI evaluation service provides a comprehensive framework for:
Quantifying model performance: Using both standard and custom metrics.
Comparing models directly: Through pairwise evaluations and judge model insights.
Customizing evaluations: To meet your specific needs.
Streamlining your workflow: With integrated tracking and easy API access.
It also provides guidance and templates to help you define your own metrics, either by adapting the provided templates or by building them from scratch based on your prompt engineering and generative AI experience.
LLM Comparator: An open-source tool for human-in-the-loop LLM evaluation
LLM Comparator is an evaluation tool developed by PAIR (People + AI Research) at Google, and is an active research project.
LLM Comparator’s interface is highly intuitive for side-by-side comparisons of different model outputs, making it an excellent tool to augment automated LLM evaluation with human-in-the-loop processes. The tool provides useful features to help you evaluate the responses from two LLMs side-by-side using a range of informative metrics, such as the win rates of Model A or B, grouped by prompt category. It is also simple to extend the tool with user-defined metrics, via a feature called Custom Functions.
The dashboards and visualizations of LLM Comparator by PAIR of Google.
You can see the comparative performance of Model A and Model B across various metrics and prompt categories through the ‘Score Distribution’ and ‘Metrics by Prompt Category’ visualizations. In addition, the ‘Rationale Summary’ visualization provides insights into why one model outperforms another by visually summarizing the key rationales influencing the evaluation results.
The “Rationale Summary” panel visually explains why one model’s responses are determined to be better.
LLM Comparator is available as a Python package on PyPI, and can be installed on a local environment. Pairwise evaluation results from the Vertex AI Evaluation Service can also be loaded into LLM Comparator using provided libraries. To learn more about how you can transform the automated evaluation results to JSON files, please refer to the JSON data format and schema for LLM Comparator.
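As a rough sketch of that transformation, the snippet below writes a minimal input file; the field names follow the project’s published schema, but consult the LLM Comparator documentation for the authoritative format:

import json

data = {
    "metadata": {"custom_fields_schema": []},
    "models": [{"name": "model_a"}, {"name": "model_b"}],
    "examples": [
        {
            "input_text": "What is your return policy?",
            "tags": ["customer-support"],   # prompt category
            "output_text_a": "You can return items within 30 days ...",
            "output_text_b": "Returns are accepted for 30 days ...",
            "score": 0.5,   # sign convention per the schema docs: >0 favors A
        }
    ],
}

with open("llm_comparator_input.json", "w") as f:
    json.dump(data, f, indent=2)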
With features such as the Rationale Cluster visualization and Custom Functions, LLM Comparator can serve as an invaluable tool in the final stages of LLM evaluation where human-in-the-loop processes are needed to ensure overall quality.
Feedback from the field: How LLM Comparator adds value to Vertex AI evaluation service
By augmenting human evaluators with convenient, ready-to-use visualizations and automatically calculated performance metrics, LLM Comparator spares ML engineers much of the work of developing their own visualizations and quality-monitoring tools. Thanks to LLM Comparator’s JSON data format and schema, the Vertex AI evaluation service and LLM Comparator can be integrated conveniently without any serious development work.
We’ve heard from our teams that the most useful feature of LLM Comparator is the “Rationale Summary” visualization. It can be thought of as a kind of explainable AI (XAI) tool that helps you learn why a specific model is better in the judge model’s view. It can also be used to understand how one language model behaves differently from the other, which is often important for inferring why a model is more appropriate for specific tasks.
A limitation of LLM Comparator is that it supports only pairwise model evaluation, not simultaneous evaluation of multiple models. However, LLM Comparator already has the basic components for comparative LLM evaluation, and extending it to simultaneous multi-model evaluation may not be a big technical problem; it could be an excellent way to contribute to the project.
Conclusion
In this article, we discussed how to organize the evaluation process for LLMs with Vertex AI and LLM Comparator, an open-source LLM evaluation tool by PAIR. By combining the Vertex AI evaluation service and LLM Comparator, we presented a semi-automated approach to systematically evaluate and compare the performance of diverse LLMs on Google Cloud. Get started with Vertex AI evaluation service today.
We thank Rajesh Thallam, Skander Hannachi, and the Applied AI Engineering team for help with this blog post and guidance on overall best practices. We also thank Anant Nawalgaria for help with this blog post and technical guidance.
Yassir is a super app, supporting the daily lives of users in more than 45 cities across Algeria, Morocco, Tunisia, South Africa, and Senegal who rely on our ride-hailing, last-mile delivery, and financial services solutions. These users are both consumers and vendors — including drivers, couriers, restaurants, and more — that use our platform to run their businesses.
At Yassir, we process a wide variety of datasets to ensure we provide the best and most reliable solutions for our users across all of our offerings, and we depend on that data to continually improve those services. However, our previous infrastructure made unifying data and AI difficult.
Previously, we had two separate data systems: one using Databricks for deploying and training machine learning models and another through Google Cloud and BigQuery for storing and analyzing data. This setup led to several issues, such as formatting incompatibilities that we could not resolve. In addition, retrieving data from Databricks for processing within Google Cloud wasn’t possible, and this disconnect directly impacted our application performance.
These siloed environments meant our teams often duplicated work to develop and maintain data projects and paid to maintain separate environments. Despite all of this, teams still failed to get the information they needed at the desired pace.
To address these issues, we decided to consolidate our data infrastructure with Google Cloud to bring all of these functions into one place. This migration would allow us to provide better access to data and more scalability, and create new opportunities to analyze, review, and improve performance.
Creating a more flexible, unified data platform
Our existing relationship with the Google Cloud team provided a strong foundation to not only resolve our data connectivity roadblocks but also implement new data processing workflows using BigQuery and deploy new AI and machine learning models with Vertex AI. Consolidating with a single data provider also gave us a centralized place to review and control expenses as well as simple, centralized data governance controls. As a growing company, being able to scale our cloud usage up or down to optimize costs allows us to test and iterate without a hard commitment to every project, and that flexibility is invaluable.
We worked closely with the Google Cloud team to design a solution that aligns with our growth goals. This meant participating in technical and strategic workshops to help train our team on the ins and outs of BigQuery — and its real-time, governance, and open-source capabilities — empowering our engineers with the tools and resources they need to experiment. This collaborative approach allows us to nurture the type of engineering culture we want to promote at Yassir; rather than simply using out-of-the-box solutions, we can tackle more complex problems by adapting flexible, existing solutions to our specific use cases.
After conducting our internal compatibility reviews, we migrated individual models from our previous solution into Vertex AI to test their consistency, and now they’re up and running nearly autonomously. By migrating from Databricks to BigQuery and combining our own models with the models provided by Google Cloud, we’ve improved the performance and efficiency of our machine learning processes and better positioned ourselves for ongoing growth. We may not be processing petabytes of data yet, but we know that we have the capability to do so when needed.
Evolving from data processing to data insights
Our previously disconnected data solutions made it difficult to provide secure access to specific data for specific teams. Since we stored our data in BigQuery but deployed models with Databricks, granting access to information to a user or a team meant giving them the keys to everything. Now, we can implement role-based access controls as well as Infrastructure as Code (IaC) Terraform scripts to automatically grant and revoke access to datasets for individuals or teams. Sharing data through Looker Studio Pro and directly providing access to BigQuery tables for our more technical users also means we can ensure the required data reaches the right users.
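As an illustrative sketch, a single dataset grant in Terraform might look like this; the project, dataset, and group values are placeholders:

# Grant a team read access to one BigQuery dataset via IaC
resource "google_bigquery_dataset_iam_member" "ops_readers" {
  project    = "example-analytics-project"
  dataset_id = "operational_dashboards"
  role       = "roles/bigquery.dataViewer"
  member     = "group:ops-team@example.com"
}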
With our data unified in BigQuery and connected to our machine learning models, we can better support everything from customer growth and retention to marketplace optimization by providing insights into product usage, customer data, and more. To ensure we’re hitting our internal and customer-related goals, we closely monitor and create dashboards for operational and analytical datasets.
Our operational dashboards give our sales and marketing teams the insights they need to better target and reach merchants and consumers. They also include insights into our staffing processes, helping us to gradually reduce delivery times, complete more rides faster, and improve how we support specific markets. We also have product-level detection and monitoring that help us provide real-time dynamic pricing and identify fraudulent trips and orders. Each data point we collect gives us more opportunities to build a more personalized and consistent customer experience.
Our leadership team relies on our rapidly available datasets to drive strategic decision-making, including regional investment decisions to grow the business, macro-level plans for growth trajectories and marketing budgets, and identification of the areas of the business that need the most support or attention. These roadmap decisions are core to our overall growth strategy, and they wouldn’t be possible without the flexibility and scalability we’ve been able to achieve with BigQuery.
Many organizations use Terraform to manage their cloud deployments through Infrastructure as Code. Today we are excited to share the general availability of Terraform providers for Oracle Database@Google Cloud. You can now deploy and manage Oracle Autonomous Database and Oracle Exadata Database Service resources using the Google Terraform provider. The release complements the existing gcloud and Google Cloud console capabilities that we released with Oracle Database@Google Cloud at launch.
Provision Autonomous Database with Terraform
Creating an Oracle Autonomous Database on Google Cloud is as simple as defining a google_oracle_database_autonomous_database resource and running terraform init and terraform apply. The below code example does the following:
Collects the details of VPC network where the ADB will be placed
In this case, we are using an existing VPC named ora-vpc in the GCP project named project1
Defines a new ADB named adbtf1 with its core attributes (location, network, and database properties)
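Here is a minimal sketch of that configuration; the location, CIDR, and database properties are illustrative placeholders, and some required arguments are omitted for brevity (see the provider documentation for the full schema):

# Look up the existing VPC where the ADB will be placed
data "google_compute_network" "ora_vpc" {
  name    = "ora-vpc"
  project = "project1"
}

resource "google_oracle_database_autonomous_database" "adbtf1" {
  autonomous_database_id = "adbtf1"
  display_name           = "adbtf1"
  project                = "project1"
  location               = "us-east4"                # assumed region
  database               = "adbtf1"
  network                = data.google_compute_network.ora_vpc.id
  cidr                   = "10.1.0.0/24"             # assumed client subnet

  properties {
    compute_count        = 2
    data_storage_size_tb = 1
    db_version           = "23ai"
    db_workload          = "OLTP"
    license_type         = "LICENSE_INCLUDED"
  }
}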
Provision with ease using Google Cloud
Working with Terraform is simple on Google Cloud thanks to our native integration in Cloud Shell. Launch Cloud Shell from the Google Cloud console and create Terraform configuration files using our native editor or pull from your company’s source code repository. You can even run the terraform command directly in Cloud Shell. You’ll be authenticated using your Google Cloud IAM credentials and your resources will be immediately created and synchronized according to the infrastructure as code definitions. You can see the console, editor, and Cloud Shell in action below:
Why use Terraform for your Oracle Database@Google Cloud deployments?
Simplified management: Define your entire Oracle Autonomous Databases or Exadata infrastructure and VM clusters in declarative configuration files.
Integrated automation: Automate the provisioning, configuration, and management of your Oracle Database@Google Cloud resources.
Improved consistency: Ensure consistent and repeatable deployments across different environments.
Reduced errors: Minimize manual configuration and reduce the risk of human error.
Enhanced collaboration: Enable teams to collaborate effectively on infrastructure management.
Ready to experience the benefits of Infrastructure as Code for Oracle Database@Google Cloud?
The emergence of 5G networks opens a new frontier for connectivity, enabling advanced use cases that require ultra-low-latency, enhanced mobile broadband, and the Internet of Things (IoT) at scale. However, behind the promise of this hyper-connected future lies an equally critical challenge: managing the complexity of 5G networks. With dynamic demands on bandwidth, latency, and reliability, traditional static configurations and manual operations are no longer sufficient.
Enter intent-based service management, a pioneering approach that combines AI and automation with service orchestration and assurance. With intent-based service management, 5G network resources scale and adjust dynamically according to real-time demand: based on the business intent, they are orchestrated and adjusted just as cloud computing, storage, networking, and streaming resources are.
Together, Ericsson and Google Cloud are pushing the boundaries of this technology, exploring technology that not only simplifies operations but redefines how telecommunications companies can deliver on the promise of 5G.
The power of intent-based autonomous operations in 5G
Intent-based networking has entered the spotlight as Communications Service Providers (CSPs) recognize that they must master substantial network complexity with sophisticated automation if they want to create, deliver, and sustain services, especially in cloud-native network environments like standalone 5G. Automation of networks and related operations cannot evolve and mature to full closed-loop autonomy without intent-based implementations.
Take for example a hospital that needs dedicated 5G connectivity, provided by a 5G network slice, in order to support emergency communications with ultra-low latency. The required configuration in such a scenario should not be a static configuration; in order to provide optimal service experience it should evolve with the network conditions and the demand. The network layer needs to understand the intent behind every service request e.g. “provide ultra-low-latency video services” — and translate it into actionable configurations across the network, which then are dynamically updated based on real-time factors such as changes in traffic conditions, underlying network status, and so on.
With Ericsson’s Service Orchestration and Assurance and Google Cloud’s Vertex AI, this process becomes far easier. The solution takes high-level intents, expressed in natural language, and maps them to precise technical requirements like bandwidth, latency, and throughput, then orchestrates the creation of a tailored network slice in real time. The quality of the provisioned network slice is then assured through an intelligent closed-loop automation mechanism.
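As a rough illustration of the intent-translation step (a sketch, not Ericsson’s production implementation), a Gemini model on Vertex AI can be prompted to map a natural-language intent to structured slice requirements; the project, model choice, and JSON schema below are assumptions:

```python
# Sketch: translate a natural-language 5G service intent into structured
# slice requirements with Gemini on Vertex AI. Project, model, and schema
# are illustrative assumptions, not Ericsson's implementation.
import json

import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project
model = GenerativeModel("gemini-1.5-pro")

intent = "Provide video service for patient-doctor interactions with ultra-low latency."
prompt = (
    "Translate this 5G service intent into slice requirements as JSON with keys "
    "latency_ms (number), bandwidth_mbps (number), and slice_type (string): "
    + intent
)

response = model.generate_content(
    prompt,
    generation_config=GenerationConfig(response_mime_type="application/json"),
)
requirements = json.loads(response.text)
print(requirements)  # e.g. {"latency_ms": 10, "bandwidth_mbps": 50, "slice_type": "URLLC"}
```

The structured output can then be handed to the orchestrator that instantiates and configures the slice.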
TMF standardization for intent-based operations
To integrate different solution components with one another without expensive and fragile custom integrations, you need a standards-based interface and model.
The TeleManagement Forum (TMF) has specified APIs and a model with common intents to enable this sort of integration. TMF provides intent models and interfaces at the business, service, and resource layers, which derive from CSP-defined business objectives and follow the Autonomous Networks Reference Architecture. Ericsson and Google Cloud’s approach is fully compliant with these standards and intent models.
The role of AI and automation
At the heart of this solution lies AI-driven closed-loop automation. Once a service is instantiated, AI agents continuously monitor its performance against defined KPIs such as quality of service (QoS). Going back to our hospital example, if latency in the hospital’s network slice exceeds acceptable thresholds, the system automatically identifies the issue and proposes corrective actions, such as reallocating RAN or core network resources.
This closed-loop system isn’t just reactive — it’s predictive. By analyzing data from the network in real time, it can anticipate issues and resolve them before they impact users. For telcos, this means not just operational efficiency but also delivering a superior quality of service.
Ericsson’s closed-loop system leverages Google Cloud’s AI agents for intent translation, along with proposal and evaluation agents for slice assurance and remediation. These agents reason about how best to achieve a goal based on the inputs and tools at their disposal, drawing on models, tools, and orchestration.
In addition to Gemini-based AI agents, Ericsson Service Orchestration and Assurance uses Vertex AI to implement predictive AI services for the remaining closed-loop agents (such as assurance, measurement, and actuation).
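To make the closed-loop idea concrete, here is a toy sketch of an assurance loop in Python; the threshold, metrics source, and remediation action are placeholder assumptions rather than Ericsson’s implementation:

```python
# Toy closed-loop assurance sketch: watch a slice KPI and remediate when
# the intent is violated. Threshold, metrics source, and remediation are
# placeholder assumptions.
import random
import time

LATENCY_SLO_MS = 10.0  # hypothetical intent: ultra-low latency

def get_slice_latency_ms(slice_id: str) -> float:
    # Placeholder: in practice, query the assurance system's telemetry;
    # here we simulate a measurement.
    return random.uniform(5.0, 15.0)

def remediate(slice_id: str) -> None:
    # Placeholder: e.g., ask the orchestrator to reallocate RAN or core
    # network resources for this slice.
    print(f"remediating slice {slice_id}")

def assurance_loop(slice_id: str, interval_s: float = 30.0) -> None:
    while True:
        if get_slice_latency_ms(slice_id) > LATENCY_SLO_MS:
            remediate(slice_id)  # intent violated: propose and execute a fix
        time.sleep(interval_s)
```

A production loop would also forecast violations from telemetry trends rather than only reacting, as described above.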
Why Ericsson Service Orchestration and Assurance and Google Cloud
Ericsson Service Orchestration and Assurance is a multi-domain, multi-technology platform that enables cross-domain orchestration across access, transport, and core networks. It addresses CSPs’ critical need for simplification of open and programmable networks by automating the service lifecycle, ultimately boosting revenues and profitability.
The platform provides an open, standards-based architecture that fosters innovation and multi-vendor integration in the telecom ecosystem. It allows CSPs to design services with ease across both network and IT environments, reducing time-to-market and enabling collaboration across diverse teams and use cases with fewer manual errors.
Furthermore, intent-driven orchestration accelerates the testing and launch of new services in multi-vendor environments, potentially increasing the number of services that can be launched. SLA commitments are enhanced through closed-loop automation, improving customer satisfaction and operational efficiency. Designed to be vendor-, technology-, and service-agnostic, Ericsson’s platform empowers CSPs to implement multi-domain service orchestration effectively, enabling service expansion and revenue growth.
To support this sophisticated framework, Google Cloud provides a foundation to meet the demands of modern telecom networks:
Scalability for dynamic networks: As 5G networks expand, Google Cloud’s elastic compute power helps ensure even the most resource-intensive processes, such as intent translation and real-time assurance, run seamlessly without performance bottlenecks.
AI-first infrastructure: Leveraging Vertex AI, intelligent agents translate intents into actionable configurations while optimizing network performance. With AI tools like BigQuery ML, telcos can move beyond basic analytics to actionable insights, enabling smarter decision-making.
Vertex Model Garden for open innovation: Pre-trained models and open-source LLMs in Vertex AI Model Garden empower telcos to innovate and customize highly specialized 5G service management solutions.
Hardware acceleration: Google Cloud’s Tensor Processing Units (TPUs) and Nvidia GPUs provide the computational power necessary to efficiently train and deploy complex AI models for intent-based service management.
Hybrid and edge capabilities: 5G networks demand hybrid setups across cloud and edge environments. Google Cloud’s edge-to-cloud orchestration helps ensure seamless operations, whether optimizing central data centers or delivering ultra-low-latency services at the edge.
Security and compliance: Operating in a highly regulated environment, telcos benefit from Google Cloud’s secure, reliable infrastructure tailored to meet the stringent compliance requirements of telecom workloads.
Gemini for advanced reasoning: Gemini’s multimodal models enhance intent interpretation by processing natural language and network telemetry data. This enables nuanced and precise intent translation, leading to more effective automated actions, higher efficiency, and improved reliability of 5G service management.
Ericsson Service Orchestration and Assurance and Google Cloud are exploring technology to meet these challenges together, combining automation, intelligence, and scalability to empower CSPs to deliver innovative, efficient, and profitable 5G services.
Real-world impact: A hospital use case
Let’s return to the hospital scenario. A request for a network slice arrives from hospital users through the system’s intent translator: “Provide video service for patient-doctor interactions with ultra-low latency.”
The solution springs into action:
Translating intent: The request is converted into technical configurations for the RAN and core network.
Creating the slice: Using orchestration tools, the system allocates resources and configures the network slice with Quality of Service (QoS) goals across the 5G network.
Assuring performance: With closed-loop automation, AI agents continuously monitor quality of service elements such as latency or packet loss. If a deviation that violates the intent occurs or is forecasted to occur, corrective actions are proposed and executed automatically.
In this scenario, the hospital’s needs are met dynamically, helping to ensure critical operations proceed without disruption.
Beyond efficiency: Redefining 5G innovation
Intent-based service management enables telcos to innovate at scale. By automating complex tasks, telcos can focus on creating new services, entering new markets, and unlocking revenue streams. For instance, they can offer differentiated slices tailored to industries like gaming, manufacturing, and healthcare, creating a competitive edge in a crowded market.
Moreover, integrating gen AI models like Gemini helps tailor these solutions to telcos’ unique needs. These LLMs, trained on telecom-specific data, enable intelligent automation that feels intuitive and responsive.
Ericsson and Google Cloud’s collaboration showcases the potential of intent-based service management. By combining Ericsson’s technical expertise with Google Cloud’s AI infrastructure, telcos can navigate the complexities of 5G with confidence.
As 5G adoption accelerates, solutions like this will become the cornerstone of telecom operations, enabling seamless, dynamic networks that deliver on the promise of connectivity.
The future of 5G isn’t just about faster speeds or lower latency — it’s about intelligent, intent-driven networks that adapt to the world’s needs in real time. And together, Ericsson and Google Cloud are making that future a reality.
AI is positively transforming government operations and supporting mission outcomes across a wide range of services, from improving patient care and enhancing education to strengthening public safety and streamlining citizen services. Nearly 400 technology leaders from state and local government, research, and higher education institutions discussed the transformative impact of artificial intelligence and participated in hands-on gen AI labs during our Google Public Sector Gen AI Live & Labs event, held on February 27 in New York City. Participants included the New York City Office of Technology and Innovation (NYC OTI), Pace University, NYC Transit Authority, NYU, the NYC Department of Health and Mental Hygiene, SUNY, and partners Deloitte, Resultant, Quantiphi, and Slalom. Here are three key takeaways from the event:
AI is revolutionizing public service delivery
Our latest AI innovations are powering the next wave of transformation and mission impact across the public sector. Attendees learned how we accelerate innovation and deliver mission outcomes with our unique full-stack approach to AI, which includes Google innovation at every layer, from infrastructure to research and models to products and platforms. This approach allows us to drive efficiencies throughout the entire technology stack, from training and serving to developer productivity, and to deliver secure, scalable products.
A key takeaway from the event was the need to empower public sector workers with safe and secure AI tools. During the event, we demonstrated our recently launched Google Agentspace and showcased how to use the agentic powers of Gemini to bring together advanced reasoning, Google-quality search, and enterprise data, regardless of where it’s hosted. This powerful technology empowers employees with access to information and streamlined workflows, enabling them to make better decisions, improve service delivery, and drive greater impact.
The New York State Office of Information Technology Services announced a new enterprise cloud services agreement with Google Public Sector to provide its agencies with AI, data analytics, AI-powered security offerings, and more to fuel the state’s digital transformation. During a fireside chat, Dru Rai, CIO of New York State, shared how the state is leveraging AI to enhance government services, bridge the digital divide, and foster economic growth, while prioritizing transparency and responsible implementation to ensure this transformative technology benefits all residents.
Additionally, the New York Metropolitan Transportation Authority (MTA) is using Google AI and Google devices on subway cars to spot track issues, which could make inspectors’ jobs easier and riders’ trips smoother. In a recent pilot program with the MTA, we identified 92% of the defect locations found by track inspectors. We are excited to use AI to improve the reliability and efficiency of subway transportation in the city.
Collaboration is key
We are inspired by the tremendous opportunities in this New Era of American Innovation, and in collaboration with our customers and partners, we are proud to help America become stronger, safer, and more competitive than ever before.
According to a newly released study Google commissioned with GovExec, an overwhelming 94% of survey respondents from state and local governments foresee an increase in AI usage in their agency over the next 1-2 years. As we continue to bring Google AI to our customers, we also continue to invest in our partner ecosystem to ensure we bring the expertise needed to support public sector missions. We announced the recipients of our Google Public Sector Partner Expertise Badges, a new program recognizing Google partners who’ve led the way in a number of areas including AI & ML, Security, Customer Engagement, Data Analytics, Maps & Geospatial, and Work Transformation.
As we look to ensure the opportunities of AI are widely accessible, Google has partnered with the National League of Cities (NLC) to launch a new AI Toolkit. With AI explainers, examples of how other cities have leveraged AI, and step-by-step guides to help cities interested in exploring AI strategies, the Toolkit is designed to help city governments across the US harness the power of AI to enhance public services and improve the quality of life for residents.
Investing in the next generation of leaders
As customers accelerate their use of AI, we need to continue to invest in the next generation of leaders. According to our recent study, state and local governments say their number-one challenge when it comes to AI adoption is a lack of skilled staff. Furthermore, 69% of respondents are actively investing in agency staff to address the AI skills gap.
At Google, we are committed to investing in AI skills, and last October we announced a new AI training initiative through Google.org’s AI Opportunity Fund – with $15 million for AI skills training for US government workers. And last September, we announced the Google Cloud Launchpad for Veterans, a no-cost training and certification journey designed to equip veterans in all roles and at all levels with the cloud knowledge and skills they need to drive innovation, and contribute to their current or future employer’s digital transformation strategy.
Continue the conversation at Google Cloud Next
The transformative potential of AI and its impact on the public sector is immense. As AI continues to evolve, it’s crucial for leaders and practitioners from across the industry to come together to innovate, collaborate, and prioritize responsible implementation to ensure that this technology benefits all members of society.
At Google Public Sector, we’re passionate about supporting your mission. Learn more about how Google’s AI solutions can empower your agency and accelerate mission impact by joining us at Google Cloud Next 2025 in Las Vegas.
Telecoms, like all businesses, are wondering how AI can transform their operations. And there’s no better way to show how to build the AI-driven telecom than with demos. Join us at Mobile World Congress 2025, March 3-6 in Barcelona, Hall 2, Booth #2H40, where we’ll highlight key agent use cases in which AI is becoming an imperative:
Customer experience
Employee productivity
Field operations
Network operations
We know that the future of telecom isn’t just about gen AI — it’s also about data and security. We will show how to execute cloud migrations and implement our core security practices.
We can’t do this alone. Across our 12 demonstrations, we have collaborations with key customers and partners on display including: Amdocs, Bell, Deutsche Telekom, Ericsson, MasOrange, Nokia, TELUS, and Vodafone.
Lastly, we will showcase the power of cloud-native networks and the ability to monetize those networks. Discover how we harness the power of Open Gateway Initiative (OGI) APIs, expose them to Firebase developers, and enable a new way to implement electronic phone number verification (ePNV). You’ll see lots of amazing progress by the telecom community!
Here are the demos that will be on display:
Creating AI Agents: Share your requirements on the spot and we will use Gemini on Vertex AI and AI Studio to build your agent within minutes.
Employee Productivity with Agentspace: Discover how Agentspace leverages Gemini’s advanced reasoning and Google-quality search to put all your enterprise knowledge at your employees’ fingertips, dramatically boosting productivity and unlocking expertise.
CX with Amdocs: Discover how Google Cloud’s collaboration with Amdocs on amAIZ, powered by Gemini’s generative AI, is revolutionizing both customer experience management and autonomous network operations in telecom.
Developing AI Apps: Discover how developers can leverage a suite of Google tools, including Flutter, Firebase, Gemini, and Vertex AI, to create intelligent app experiences that dramatically improve the customer journey, featuring AI-powered phone-plan selection.
User Identity with Firebase and OGI: Experience a more secure and reliable user identity service built on Firebase and the Open Gateway Initiative, offering developers expanded reach and enhanced authentication capabilities.
Ericsson Networks with AI: Ericsson has a long-standing relationship with Google Cloud. In this demonstration, we will showcase how the two companies are reimagining 5G networks with AI agents driven by intent-based service management using Vertex AI.
Load Balance for Inferencing: Experience a gamified challenge against our AI-powered Cloud Load Balancer, then dive into the Google Cloud console to see how you can use custom metrics to optimize GPU utilization and achieve lightning-fast inference response times.
AI for Cybersecurity: This demo showcases how Google Threat Intelligence provides visibility into threats and how it delivers detailed and timely threat intelligence to security teams around the world. Gemini in Threat Intelligence acts as a force multiplier, immediately surfacing threats most relevant to your unique risk profile and summarizing complex threat reports.
AI Network Operations: This AI network operations demo shows advanced AI/ML-powered tools and capabilities to help CSPs understand the impacts of network issues, detect them, and resolve them faster through automated end-to-end correlation and root-cause analysis. The solution leverages several native Google Cloud services.
Nokia Network on Demand: This demo features a collaboration between Nokia AVA and Google Cloud to show customers how Nokia delivers an innovative approach to energy efficiency by combining AI-driven insights with gamification, engaging users and driving sustainability in telecom networks.
Gen AI Field Operations: Last but not least, discover how Google Cloud is helping CSPs supercharge field-agent productivity, elevate service quality, and reduce costs through the power of multi-modal gen AI and democratized data access.
And if you are up for some fun, come by and get behind the wheel of the Formula E simulator, powered by Google Cloud AI. Experience the same cutting-edge technology that’s transforming motorsport through our partnership with Formula E. Feel firsthand how AI and data are revolutionizing racing, and the future of telecommunications.
Engage with the technologies reshaping the industry and discover how Google Cloud can help you drive innovation. Don’t just hear about the future of telecom — build it with us at Mobile World Congress 2025, March 3-6 in Barcelona Hall 2 Booth #2H40. The future of telecommunications awaits!
AI is driving unprecedented change and evolution across every industry, and the telecommunications sector is at a particularly interesting crossroads: The industry is standardizing on 5G; data is growing exponentially; and customer expectations are quickly changing. Leading communication service providers (CSPs) are working with Google Cloud to adopt AI and lead the industry forward in terms of innovation.
At Google Cloud, we believe that AI, harnessed boldly and responsibly, has the potential to transform how CSPs operate on planetary scale — and we’re deeply committed to delivering the technologies, tools, and resources that help them do so. From building a strong data foundation, to optimizing network operations, to enhancing customer experiences, our AI-powered solutions and strategic partnerships enable CSPs to innovate and grow their businesses.
Heading into Mobile World Congress 2025 next week, let’s look at the announcements we’ve made with leaders in this industry to build AI-driven telecoms.
Data migrations are key to adopting AI
CSPs typically need to move their data into one place to use AI effectively. Google Cloud helps with this by offering tools like BigQuery and Looker, which can store and organize vast amounts of data, creating scalable data lakes and data oceans that can be analyzed by data scientists and businesspeople alike. Our open approach allows integration with existing systems, streamlining data management and getting benefits faster. Combined with our AI infrastructure, CSPs can gain actionable insights, create personalized experiences, and build new services.
DNA, part of Telenor, for instance, announced it has selected Google Cloud to help accelerate its transition to the cloud and support the delivery of fast and reliable services to its customers, allowing it to gradually migrate its on-premises workloads to Google Cloud and pave the way for generative AI adoption.
Vodafone Italy also announced it has modernized its data architecture by building a new AI-ready platform on Google Cloud called Nucleus. This gives the company enhanced process efficiency, scalability and real-time data processing. In partnership with Amdocs and Google Cloud, Vodafone re-engineered its data pipelines to optimize operational workflows and support business-critical functions. The migration to Nucleus was achieved in only 12 months without interruption or issues. The company’s modern architecture delivers key benefits, including enhanced agility, more efficient reporting, cost optimization, and improved real-time processing.
From reactive to proactive network operations with AI
Telecommunications providers face the constant challenge of maintaining optimal network performance and reliability in a dynamic and increasingly complex environment. Traditional reactive network-management approaches can lead to service disruptions, increased operational costs, and diminished customer satisfaction. Google Cloud’s AI-powered platform, which includes Vertex AI and Network Intelligence Center, provides intelligent automation for telecommunications providers, enabling them to analyze network data with BigQuery to identify anomalies, predict outages, and optimize traffic flow in real time. This shift from reactive to proactive management allows CSPs to not only improve reliability and reduce costs, but also create self-healing networks that dynamically adapt to changing conditions.
Bell Canada, for example, announced it is using Google Cloud’s AI technologies to automate network issue detection and resolution with its new AI operations (AI Ops) solution. Partnering with Google Cloud has enabled Bell to simultaneously boost software delivery productivity by 75%, accelerating time to market, and reduce customer-reported issues by 25%, significantly enhancing customer experience and solution reliability. And Deutsche Telekom is using Google Cloud’s Gemini 2.0 in Vertex AI to develop a network AI agent called RAN Guardian, which can analyze network behavior, detect performance issues, and implement corrective actions to improve network reliability and customer experiences. This collaboration aims to create autonomous, self-healing networks that proactively optimize RAN performance and reduce operational costs.
Then there’s Amdocs, a leading provider of software and services to communications and media companies, which in partnership with Google Cloud, announced recently that it has launched a new Network AIOps solution to help 5G network providers automate complex network operations, enhance service reliability, and improve customer experiences.
Through AI-powered insights, CSPs can deliver superior service quality while streamlining their operations, ultimately driving greater customer satisfaction and loyalty.
Empowering field technicians with AI
CSPs often face challenges in field service operations, such as inefficient dispatching and a lack of real-time diagnostics, leading to longer resolution times and increased customer downtime. Google Cloud’s platform, including a multimodal field-tech AI assistant and Customer Engagement Suite, empowers field technicians with real-time diagnostics, predictive maintenance, optimized routing, and agentic workflows powered by Gemini. This leads to fewer worker deployments and faster issue resolution. AI-driven diagnostics proactively identify and resolve potential issues. Predictive maintenance models anticipate equipment failures, enabling scheduled repairs. Agentic workflows and Gemini provide technicians with a comprehensive knowledge base for faster troubleshooting. Smart routing helps ensure efficient dispatch. Together, these AI-powered services help create a streamlined, customer-centric field service operation, providing faster and more effective support.

As an example, TELUS’ NEO Assistant, an innovative multimodal AI copilot that leverages advanced machine learning and AI technologies like Google Cloud’s Gemini model, provides field technicians with instant access to critical information and streamlines workflows. Since its launch, NEO Assistant has been adopted by three-quarters of TELUS field technicians, contributed to the creation of nearly 7,000 jobs, and significantly increased efficiency by allowing tasks to be completed in under a minute.
Delighting customers with AI-driven insights
CSPs are facing increasing pressure to deliver exceptional customer service while managing costs and navigating complex customer journeys across multiple channels. Google Cloud’s AI platform, featuring the Customer Engagement Suite, Vertex AI, and Vertex AI Search, empowers CSPs to create exceptional customer experiences. By harnessing AI-driven insights about customer behavior, CSPs can anticipate needs, proactively address issues, and deliver personalized recommendations and seamless interactions, ultimately boosting satisfaction and revenue.
Chunghwa Telecom is using Google Cloud’s AI to enhance strategic decision-making and data analysis with tools like NotebookLM Enterprise, and to improve customer service with a Customer Agent using Gemini 2.0 that slashes response times and is projected to reduce billing-related calls by 25% annually. Additionally, automating roaming-charge verification with Document AI streamlines operations and reduces costs by eliminating manual invoice processing, leading to potential savings for both the company and subscribers.
Glance leverages Google’s suite of AI models, including Gemini and Imagen, to deliver visually captivating, AI-powered experiences that position consumers as the ‘heroes’ of their digital world. Glance AI offers an immersive shopping experience built from inferred user interests and images uploaded by consumers, drawing them in and helping them visualize themselves in various styles so they can make real-time purchase decisions. Here, AI transforms a quick glance at a device into an interactive shopping opportunity, whether on a smartphone lock screen or an ambient TV screen.
Unlocking new revenue streams through APIs
CSPs face high network deployment costs and pressure to innovate, leading them to explore APIs and engage developers to monetize their networks. Google Cloud’s developer-centric platform simplifies network asset monetization by providing necessary tools and infrastructure. Initiatives like GSMA Open Gateway foster collaboration and developer access to network capabilities.
Building on the Open Gateway vision, yesterday we announced a new Firebase phone number verification service, providing more than three million Firebase developers with access to critical CSP network APIs such as phone number acquisition and verification. We’re partnering with Deutsche Telekom, Orange, Telefónica, T-Mobile, Telenor, Verizon, Vodafone, and others to offer this service to improve developer and end-user experiences, and strengthen security by mitigating fraud — creating new revenue opportunities for CSPs and developers alike.
The future of AI in telecommunications
MWC 2025 marks a pivotal moment for AI-driven telecom innovation, and Google Cloud is at the forefront of this transformation. As CSPs continue to embrace AI in the coming months, our platform ensures they unlock new efficiencies, elevate customer experiences, and drive business growth.
For more information, see our press page or visit Google Cloud at Mobile World Congress (MWC) from March 3-6 in Hall 2, Booth #2H40.
The telecommunications industry has always been on the leading edge of technology — but rarely has it been changing as quickly as it is today. Increased 5G adoption, the spread of edge computing, new monetization models, and growing consumer expectations are presenting opportunities and hurdles in equal measure.
Since its inception, Google Cloud has made supporting telecommunications innovation a priority — and a key part of that has been fostering a partner ecosystem that can help meet the constant evolution that communication service providers (CSPs) are undergoing.
The AI era has made such collaboration more important than ever, so ahead of Mobile World Congress (MWC), we’re showcasing the breadth of our ecosystem of telecom partners and the ways they are enabling customer success with Google Cloud. Our partners — spanning global system integrators (GSIs), specialized ISVs, and network equipment providers (NEPs) — are using AI to help CSPs optimize network performance, personalize customer experiences, automate operations, and identify new revenue streams.
Improving network operations
AI is enabling CSPs to move from reactive to proactive network management, with capabilities that span predictive maintenance, automated troubleshooting, and real-time optimization. Our partners are bringing the next-generation network to life by using Google Cloud’s Vertex AI and Network Intelligence Center. Together, we’re providing the necessary tools and technologies for building and deploying AI solutions across the modern network.
Amdocs yesterday announced Network AIOps, an AI-powered solution built on Google Cloud’s Vertex AI and BigQuery that can help telecommunications companies automate and optimize their 5G network operations, enhancing reliability and customer experiences.
CARTO, a geospatial analytics platform provider, uses BigQuery to enable telecoms to visualize and analyze massive location-based datasets, leading to more effective network planning and optimized service delivery.
Ericsson, a leading telecommunications company, is building AI agent-driven network slice instantiation and assurance for telecom customers with Google Cloud’s AI.
Fortinet, a cybersecurity leader, collaborates with Google Cloud to enhance real-time threat detection and response capabilities with AI, providing network security and resilience for telecom providers.
MATRIXX, a leading provider of cloud-native business support system (BSS) solutions, is using Google Cloud’s AI to simplify and automate tasks, helping telecoms reduce costs and deliver a better customer experience.
Nokia is collaborating with Google Cloud to enhance its energy-saving application, which lets telcos include their subscribers in their energy-savings goals.
Optimizing IT transformation and productivity
Improving IT systems is important for telecom companies to work faster and smarter. Using tools like BigQuery, Looker, and Vertex AI, our partners are helping telecom providers automate tasks and optimize the value they get from data and AI projects while improving overall IT capabilities.
BMC Helix Software uses Vertex AI to enhance its service assurance capabilities, helping telecoms streamline IT processes, predict and prevent outages, and automate resolutions.
Datatonic is building solutions for telecoms with Vertex AI and Google Cloud’s generative AI technology, helping it better understand customer behavior, personalize service recommendations, enhance network analytics, and boost employee productivity.
Zinkworks is using Vertex AI, Gemini models, and additional Google Cloud technology to develop and implement solutions that automate network operations, significantly reducing operating costs and enhancing efficiency for telecom workers in the field.
Enhancing the customer experience
AI helps CSPs better understand customer needs, personalize interactions, and provide proactive support. Partners are using Vertex AI, BigQuery, Customer Engagement Suite, and Telecom Subscriber Insights to develop solutions that deliver personalized and engaging experiences to telecom customers.
commercetools is helping telco businesses build agile, scalable, and flexible commerce on top of Google Cloud.
Pega is an AI decision-making and workflow automation platform that helps CSPs and other enterprises transform their operations, personalize customer engagement, and continuously adapt to change with the help of Google AI.
Sprinklr offers a unified customer experience management platform powered by Google AI that helps front-office teams and contact center agents to make every customer experience extraordinary.
Zeotap offers a secure customer data platform (CDP) on Google Cloud that empowers telco and other brands to integrate and orchestrate customer data, enabling targeted marketing while protecting customer privacy, designed both for today and for a cookieless future.
Revamping monetization tools
AI plays a key role in optimizing pricing strategies, personalizing offers, and automating billing processes. Partners are using AI and network APIs to create more effective monetization models with solutions that can analyze customer data to identify preferences, predict churn, and recommend optimal pricing strategies.
Aduna, a venture between Ericsson and leading CSPs, is simplifying global access to network APIs, making it easier for Google Cloud developers to build new applications leveraging mobile networks to accelerate digital transformation across industries.
Beyond Now offers an AI-powered digital marketplace for CSPs and technology providers to co-create and bundle partner solutions, helping support their customers’ digital transformation journeys.
Eureka.AI provides actionable market and risk intelligence products for industry verticals to help CSPs modernize their data monetization.
Glide Identity provides a platform built on Google Cloud for identity and anti-fraud products, helping telecoms create new revenue opportunities and delight users.
Kloudville offers a marketplace powered by Google Cloud that enables telecoms to offer digital products and services to their SMB customers, creating new revenue opportunities and enhancing the customer experience.
Nokia’s Network-as-Code platform simplifies the use of advanced network capabilities by exposing them as developer-friendly interfaces, enabling applications to work seamlessly across multiple public and private networks, and fostering new value creation and network monetization.
Enabling customer success with services partners
Services partners play a critical role in providing customers with the expertise and support needed to plan, deploy, and optimize AI projects. Many of these partners have launched services specifically for telecoms and are continuing to demonstrate their proven ability to help customers transform with AI and other Google Cloud technology.
Accenture empowers CSPs to reinvent customer engagement, drive revenue growth, and enhance network operations through AI, machine learning, and gen AI agents, while prioritizing cost reduction and sustainability.
Capgemini helps CSPs develop autonomous network capabilities and next-generation field operations, as well as strengthen AI-powered marketing, customer care, and lifetime-value management.
Prodapt is helping the global telecom and technology industry use Google Cloud products, including generative AI tools and solutions, to modernize technology infrastructure, fast-track data migration, improve productivity, and make operations seamless and cost-efficient.
TCS offers innovative end-to-end business solutions that encompass agentic AI frameworks for CSPs across customer experience, customer service, field service, and network operations; it also creates tailored offerings such as TCS CX Transformer for Telcos, which combines its vast industry expertise and Google Cloud’s data analytics and AI technology.
Tech Mahindra partners with CSPs to scale at speed, enabling them to simplify and modernize their business and technology and to monetize their assets for greater revenue growth, all powered by AI.
Supporting sustainable practices
On top of navigating evolving customer demands and technology needs, CSPs are juggling pressure to decarbonize and comply with ever-changing global regulations. AI is helping telecom organizations optimize energy consumption, reduce waste, and minimize their carbon footprint.
Geotab provides vehicle and driver behavior data services for more than 55,000 fleets and 4.7 million vehicles around the world. Using Google Cloud’s data analytics and machine learning, Geotab helps customers improve fleet productivity, optimize fleet management, and achieve strong compliance.
Watershed helps companies manage climate and ESG data, producing audit-ready metrics for voluntary and regulatory reporting using Google Cloud tools.
Google Cloud is committed to fostering a collaborative and mutually beneficial ecosystem where our partners can thrive and contribute to the success of our CSP customers. We believe that together, we can empower the telecom industry to embrace the full potential of digital transformation and shape the future of connectivity.
Visit Google Cloud at Hall 2, Booth #2H40 at Mobile World Congress (MWC) to learn more about how Google Cloud and our partners can help you transform your telecommunications business. We hope to see you in Barcelona!
Telecommunications companies face mounting pressure to reduce operational costs, enhance network resiliency, and deliver exceptional customer experiences. Earlier this week, Amdocs and Google Cloud announced a new network AI operations solution — Amdocs Network AIOps — that uses data to help communication service providers improve their networks and customer service.
This solution is designed to make networks more self-sufficient and efficient, leading to a better experience for customers, and it is a first step for telecoms evolving toward a fully autonomous network. In this blog, we explore more deeply how Amdocs achieved these innovations using its technology together with Google Cloud’s AI and cloud solutions.
Network analytics and automation
Amdocs Network AIOps is a comprehensive platform designed to address the complexities of modern telecom networks. It ingests massive quantities of telemetry data and uses Google Cloud’s data and AI capabilities, including BigQuery and Vertex AI, to deliver comprehensive observability and automated, AI-driven operations.
The solution is a network operations AI framework that consists of three layers — Observe, Decide, and Act — that leverages AI and ML to automate network operations, optimize performance, and enhance service reliability.
Key capabilities behind the framework of the Amdocs Network AIOps Solution include:
Data ingestion and mediation (Observe layer): The platform ingests data from diverse sources across the network, including network elements, operations support systems (OSS) and business support systems (BSS), and probes.
AI-driven insights (Decide layer): With tight integration into Google Cloud’s BigQuery, Vertex AI, and Gemini services, Amdocs Network AIOps provides actionable insights for predictive analytics and root-cause analysis. This integration empowers operators to harness cloud-based AI/ML capabilities for enhanced network automation, enabling proactive network management and reducing the mean time to resolution (MTTR) for incidents.
Automated workflows (Act layer): The platform automates routine tasks such as network configuration, performance optimization, and incident remediation. This frees up valuable resources and reduces the risk of human error.
Closed-loop automation (Act layer): Amdocs Network AIOps employs a closed-loop approach, where the system continuously learns from previous actions and data patterns to improve its accuracy and efficiency over time.
Actionable network insights built on AI
Telecom operators are already using Amdocs Network AIOps to drive insights in support of a variety of networking use cases such as anomaly detection, root cause analysis, and predictive maintenance.
Network downtime is a major concern for telecom operators, leading to lost revenue and dissatisfied customers. The Network AIOps solution significantly reduces downtime by correlating data from a variety of network sources and applying the predictive power of Vertex AI’s no-code and low-code models.
By training ML models on historical network data, including performance metrics, fault logs, and environmental factors, this solution can accurately predict the likelihood of future network failures. This allows operators to proactively schedule maintenance, replace aging equipment, and optimize network configurations to minimize downtime and ensure uninterrupted service.
Google Cloud services such as BigQuery and Vertex AI are foundational to Amdocs Network AIOps. These services enable the ingestion and transformation of petabytes of data, along with near-real-time predictive analytics, anomaly detection, and correlation, helping telecommunications businesses meet their goals of greater network efficiency, reliability, and customer satisfaction.
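As a simplified illustration of this pattern (the dataset, table, and column names are hypothetical), failure prediction of this kind can be prototyped directly against historical telemetry using the BigQuery client library and BigQuery ML:

```python
# Sketch: train and query a failure-prediction model with BigQuery ML.
# Dataset, table, and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Train a logistic-regression model on historical network telemetry.
client.query("""
    CREATE OR REPLACE MODEL `netops.failure_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['status']) AS
    SELECT cpu_util, packet_loss, temperature, status
    FROM `netops.cell_site_telemetry`
    WHERE event_date < '2025-01-01'
""").result()

# Score current telemetry and surface the sites most likely to fail.
rows = client.query("""
    SELECT
      site_id,
      (SELECT p.prob FROM UNNEST(predicted_status_probs) AS p
       WHERE p.label = 'FAIL') AS failure_prob
    FROM ML.PREDICT(MODEL `netops.failure_model`,
                    (SELECT * FROM `netops.cell_site_telemetry_current`))
    ORDER BY failure_prob DESC
    LIMIT 10
""")
for row in rows:
    print(row.site_id, round(row.failure_prob, 3))
```

Operators can then feed the highest-risk sites into maintenance scheduling, in line with the proactive approach described above.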
Your intelligent network copilot
Managing the complexity of 5G networks can be overwhelming for even the most skilled network engineers. The Amdocs Network Platform for Operations (Act layer), integrated with a multimodal gen-AI-powered live network assistant, provides a single interface for comprehensive network monitoring and management across the full stack of hardware infrastructure, containers, virtualization software, and network applications (RAN, 5G core, IMS core) in a multi-vendor environment.
This gen-AI-powered assistant acts as an intelligent agent for network engineers, offering:
Proactive alerts and insights: The assistant continuously monitors the network, analyzing data from the BigQuery data lake to identify potential issues, correlate events across the infrastructure stack and multi-vendor workloads, and proactively alert engineers.
Automated troubleshooting and remediation: By leveraging its deep knowledge base and AI capabilities, the gen AI assistant can automatically diagnose common network problems and even take corrective actions, such as restarting services or rerouting traffic.
Multimodal and natural language interaction: Network engineers can interact with the assistant using natural language phrases and sentences, as well as audio, video, and images — making it easier to ask questions, seek recommendations, and receive clear, concise answers.
Embrace the AI-powered future of telecom
The joint Network AIOps solution from Amdocs and Google Cloud delivers a wide range of benefits for telecom operators:
Reduced operational costs: Automation, predictive maintenance, and intelligent resource allocation lead to significant cost savings.
Improved network resiliency: Proactive issue identification and automated remediation help minimize downtime and enhance network stability.
Enhanced customer experience: AI-powered optimization provides an integrated and reliable user experience, fostering customer satisfaction and loyalty.
Increased efficiency: The gen AI Network Assistant helps network engineers work more efficiently, freeing up their time for strategic initiatives.
The collaboration between Amdocs and Google Cloud brings together the best of both worlds: deep telecom expertise and cutting-edge AI capabilities. The joint Network AIOps solution lets telecom operators embrace the AI-powered future, optimize their 5G networks, and deliver exceptional customer experiences.
To learn how to address the challenges of AI-driven 5G networks, visit the Amdocs Network AIOps solution website. And you can learn more about how Google Cloud is partnering with communication service providers to deliver AI-driven telecom at Google Cloud’s industry site.
Artificial intelligence is transforming how we work, learn, and interact with technology every day, offering never-before-seen opportunities to increase efficiency and improve end-user experiences. However, effectively managing a fleet of devices with so much new technology requires staying ahead of the curve as well. As IT administrators, it’s crucial to not only understand this opportunity, but to also find ways to utilize AI to empower your work.
This guide will walk you through key Google AI features available in the Google Admin console, as well as previously announced features built into Chromebooks, to show how your organization can make managing a fleet of devices easier with AI while enhancing user experiences in the process.
Part 1: Empowering IT
Managing a fleet of devices shouldn’t be a complex undertaking. Thankfully, we’re bringing AI enhancements to the Google Admin console that are designed to simplify your workday and give you greater control over your device fleet.
Management support with Chrome Admin assistance: Chrome Admin assistance, a new Gemini-powered chatbot for the Google Admin console, is designed to streamline device management and enhance user experience. This intelligent chatbot leverages natural language processing to understand and execute your requests, providing immediate support and actionable insights.
With Chrome Admin assistance, you can efficiently manage your device fleet without navigating complex menus or executing intricate commands. Simply ask a question or request an action in plain language, and the chatbot will interpret your input and respond accordingly. For instance, you can inquire about the status of a specific device, request a screenshot for troubleshooting purposes, or even initiate a reboot to resolve minor issues – all directly within the chat window.
This innovative tool not only saves time and effort but also empowers administrators with varying levels of technical expertise to effectively manage their Chrome device ecosystem. By automating routine tasks and providing instant support, Chrome Admin assistance enables you to focus on strategic initiatives and optimize the overall user experience. Chrome Admin assistance is currently available to US-based users in the trusted tester program, which you can learn more about here.
Device and policy search made easy with natural language processing: Don’t stress about remembering specific device or policy names or constructing complex search queries. Using natural language processing (NLP), you can locate specific devices or any of the hundreds of available policies by simply describing what you need in plain English. For instance, if you want to know which devices were enrolled last month, just type that query in plain terms. The Google Admin console will interpret your request and provide the most relevant results. This makes managing a fleet of devices much easier, and you can expect to see NLP search capabilities in the Google Admin console in the next few weeks!
Optimized settings with intelligent recommendations: Admins may lack the time or expertise to dive deep into every individual policy, so we’re launching a new feature called Related Settings. With this feature, when admins click into a policy details page, they’ll also see other relevant policies. For example, looking at microphone control settings will also surface audio output settings. This is part of our larger effort to deliver more support while you manage devices, and will help spark interest in exploring additional helpful settings that may apply to your organization.
Part 2: Powering end-users
Chromebook and Chromebook Plus devices are powerful productivity, creativity, and collaboration tools. With Google AI built into ChromeOS, end-users also have access to features that can support them wherever they do their best work, whether in the browser, Google applications, or even third-party applications.
Google Workspace with Gemini enhances productivity across the familiar Google applications you use everyday. In Gmail, Gemini can help you compose emails faster, suggest smart replies, and even summarize lengthy threads. Within tools like Docs, Sheets, and Slides, Gemini acts as a powerful assistant, providing writing suggestions, data analysis, and presentation support. And for even more support, admins can pin Gemini to the Chromebook shelf for even easier access throughout the day.
We also understand that work doesn’t always happen exclusively in Google Workspace – users often rely on a wide variety of third-party applications to complete their tasks. That’s where Google AI, built directly into ChromeOS, is able to meet users wherever they are, regardless of the application they’re using. Powerful features like Help me read and Help me write can assist with comprehension and content creation in virtually any text-based environment. AI enhanced video call controls improve sound quality and create generative backgrounds, while Live Translate breaks down language barriers in real-time. These AI experiences built into ChromeOS, as well as many others, enhance productivity and collaboration across any application, wherever users do their best work.
Part 3: Choosing the right device
It’s also important to choose the right hardware, and Chromebook Plus devices deliver advanced Google AI experiences and robust performance at an accessible price.
Lenovo Chromebook Plus 2-in-1 (14”, 10): This sleek 2-in-1 convertible boasts powerful performance with the latest Intel® Core processors, robust security features, and AI built in to help users accomplish more. At just 1.5 kg and 17.5 mm thin, with over 11 hours of battery life, it’s easy to carry around at work and beyond. Learn more here.
Samsung Galaxy Chromebook Plus: Ultra-lightweight with a 14th Gen Raptor Lake-R processor, a 15.6” OLED display, and long-lasting battery life, along with the Quick Insert key for instant help when you need it. Learn more here.
Acer Chromebook Plus Spin 714: Powered by the latest Intel® Core™ Ultra processors, the Acer Chromebook Plus Spin 714 is the embodiment of power and portability: a premium, thin-and-light 2-in-1 convertible Chromebook for work in the office or on the go. Learn more here.
Part 4: The Google AI advantage
Ready to experience the future of end user computing and IT administration? By embracing modern tools, you can empower IT and make device and policy management a breeze, all while helping your users achieve more across productivity, creativity, and collaboration.
Learn more about all things ChromeOS on our website, and dive deeper into Google AI on Chromebook by visiting our devices page.
To get the latest AI capabilities today, you can join our ChromeOS trusted tester program. Learn how here.
Vodafone Italy is reshaping its operations by building a modernized, AI-ready data architecture on Google Cloud, designed to enhance process efficiency, scalability, and real-time data processing. This transformation, powered by Vodafone Italy’s cloud-based platform called Nucleus, aims to unlock new AI-driven capabilities and streamline data management.
Nucleus, developed by Vodafone Italy’s engineering team, leverages Google’s AI infrastructure and BigQuery, along with a robust data movement application and a comprehensive ETL framework, to bring all analytical use cases into a cloud-native environment. This modern architecture enhances data agility, scalability, and AI-driven insights, while enabling Vodafone to consolidate fragmented data silos into a centralized, real-time data ecosystem. In partnership with Amdocs and Google Cloud, Vodafone Italy re-engineered its data pipelines within this flexible ecosystem to optimize operational workflows and support business-critical functions. By integrating its operational data store (ODS) and enterprise data platform (VID) on Nucleus, Vodafone Italy created a scalable foundation for analytics, AI, and machine learning.
Following only 12 months of careful planning and design, the migration to Nucleus was completed without interruption or issue for consumers or the business.
This modernized architecture delivers key benefits, including enhanced agility, more efficient regulatory reporting, cost optimization, and improved real-time processing across core functions like finance, operations, and marketing. Moreover, Nucleus provides a blueprint for Vodafone Group’s broader cloud modernization strategy, helping to ensure scalability and future-proofing the organization’s data ecosystem.
Laying the foundation for data modernization: The road to Nucleus
In 2017, Vodafone Italy launched “NEXT”, a bold, company-wide initiative aimed at modernizing its business support systems (BSS) and data management as part of a large-scale digital transformation.
At the heart of this journey was a data management transformation, designed to eliminate inefficiencies caused by fragmented legacy systems, including Teradata, SAP, and BSS workloads, and establish a streamlined, AI-ready data ecosystem. The project focused on harmonizing data processes across finance, commercial, and operations functions to ensure greater efficiency in reporting, real-time data accessibility, and regulatory compliance.
This transformation was pivotal in enhancing scalability, governance, and agility, enabling Vodafone Italy to accelerate time-to-market for new products and leverage data-driven decision-making across the organization. Recognizing the complexity of modern telecom operations, Vodafone Italy partnered with Amdocs as its primary systems integrator, capitalizing on its expertise in BSS and data management to navigate the challenges of a highly regulated industry.
The foundation laid through NEXT set the stage for Vodafone Italy’s next leap in innovation — Nucleus — a cloud-first data platform designed to unlock new AI-driven capabilities and future-proof its digital infrastructure.
Transforming the data architecture: A unified, intelligent approach
The NEXT program marks a breakthrough in data architecture, shifting from fragmented data silos to a centralized, two-tier business intelligence platform designed for scalability, governance, and real-time insights.
Tier 1: Operational Data Store
At the foundation, the Operational Data Store (ODS) is built on the Amdocs Logical Data Model (aLDM) — a TM-Forum-certified model tailored for telecommunications. Implemented via the Amdocs Data Hub, this tier seamlessly integrates data from diverse applications in near-real-time, delivering a cohesive, 360-degree view of customer behavior and interactions across all touchpoints.
Tier 2: Vodafone Integrated Dimensional Data model
Serving as the enterprise data warehouse, the Vodafone Integrated Dimensional Data (VID) model was co-developed by Amdocs and Vodafone’s business and IT teams. It acts as a single source of truth for financial and analytical reporting, aligning with Vodafone’s governance and compliance standards. By consolidating hundreds of terabytes of data into a streamlined model with a few hundred entities, VID simplifies data governance, accelerates decision-making, and enhances operational efficiency.
Together, these tiers power mission-critical applications across the organization — enabling advanced AI and analytics, financial reporting, campaign management, intelligent customer engagement, and martech solutions. With near real-time data access and self-service capabilities, Vodafone’s teams can derive deeper insights, drive innovation, and enhance customer experiences at scale.
Nucleus: Advancing Vodafone Italy’s data modernization
With a modern data foundation already in place, Vodafone Italy sought to further enhance flexibility and interoperability within its data ecosystem. To achieve this, the company embarked on the modernization of VID, integrating it seamlessly into a broader, cloud-first architecture designed for greater efficiency, scalability, and AI readiness.
In June 2023, Vodafone Italy initiated this transformation project, setting a 12-month roadmap focused on planning, testing, and executing the evolution of this mission-critical asset. The modernization process was strategically designed to optimize VID’s integration with Nucleus, providing integrated data processing and enhanced analytics capabilities.
To drive this success, Vodafone Italy partnered again with Amdocs, leveraging its deep expertise in VID and aLDM. Together, Vodafone Italy’s engineering team and Amdocs collaborated within the Nucleus environment to accelerate the shift towards a more agile, scalable, and future-ready data architecture.
Seamless data modernization for business continuity
Vodafone Italy successfully modernized its data infrastructure by migrating to Google Cloud, leveraging BigQuery and a “clone and shift” approach in collaboration with Google Cloud and Amdocs. This meticulously planned 12-month transformation helped ensure a smooth transition without disrupting business operations, maintaining full data integrity and user experience. By replicating their existing data platform within BigQuery, Vodafone unlocked the benefits of cloud-native capabilities, including enhanced scalability, agility, and real-time insights, while gaining access to a comprehensive suite of data management tools within the Nucleus framework.
“This program has unlocked features and capabilities that were once difficult to imagine. We now have a unified data platform, and a comprehensive data ocean that natively integrates AI and generative AI. This empowers us to better serve our customers and understand their needs, protect their data, deliver real-time reporting to our sales, customer operations, and marketing teams, and significantly improve the productivity and time-to-market of our IT department.” – Massimo Guarino, Head of Digital & IT, Data & Analytics and Always-on Marketing Systems, Vodafone Italy
“The Nucleus framework we developed fully harnesses the power of Google Cloud to create new value and transform our operations. This achievement was made possible through our exceptional partnership with Amdocs, whose expertise was instrumental in delivering a seamless and successful migration. Together, we have greatly improved operational speed and adaptability, setting the stage to seize new opportunities, deepen our collaboration, and drive significant growth.” – Michele Bertoni, BI Engineering and Delivery Manager, Vodafone Italy
“Implementing a new data management solution with an aLDM operational data store and enterprise data platform layer gives Vodafone a competitive edge — improving efficiency, enhancing business teams’ experience, and increasing customer and agent satisfaction, while reducing operational costs. It’s a privilege to support Vodafone Italy’s data-driven journey as their dedicated partner.” – Eran Katz, BI Consulting and Data Architecture Unit Lead, Amdocs
In short, Vodafone Italy’s transformation journey exemplifies the power of the cloud in driving data unification, AI integration, and operational efficiency at scale. This partnership, alongside Amdocs, showcases how organizations can unlock unprecedented value through cloud-powered innovation, optimizing both customer experience and IT efficiency. We are proud to support Vodafone and Amdocs in this journey and look forward to further enabling their data-driven growth. Click here to learn more about Amdocs and Google Cloud’s collaboration on AI and data services.
Connecting hybrid environments to the cloud is a critical aspect of cloud architecture. In addition to on-premises environments, many organizations also have multicloud environments that need to communicate. In this blog we will look at some reference architectures for hub-and-spoke communication using Cross-Cloud Network.
The power of Cross-Cloud Network
As your cloud projects grow and you add additional networks, you need inter-network communication. Cross-Cloud Network provides a set of functionality and architectures for any-to-any connectivity, leveraging Google’s software-defined, globally scaled backbone to connect your distributed applications.
#1 – Inter-VPC communication with VPC Network Peering example pattern
To understand how to think about designing your network, let’s look at the flow of a packet from an external network to an application in workload VPC network 1 in Google Cloud. This design focuses on the use of VPC Network Peering. The network is composed of an external network (on-prem and other clouds) and the Google Cloud network (transit VPC, services access VPC, managed services VPC, workloads VPC).
This design uses the following services for its end-to-end solution:
Cloud Interconnect (Direct, Partner, Cross-Cloud) – To connect from your on-prem or other clouds to the transit VPC
Cloud VPN – To connect from the services access VPC to the transit VPC and export custom routes from the private services access network
VPC Network Peering – To connect from workload VPC to transit VPC
Private services access – To connect to managed services privately in the services access VPC
Private Service Connect – To expose services in the managed services VPC network to be consumed in the services access VPC
#2 – Inter-VPC communication with Network Connectivity Center
In this more modern design, we use Network Connectivity Center with a star configuration and interconnect spokes. To understand how to think about designing your network in this configuration, let’s look at the flow of a packet from an external network to an application located in the workload VPC 1.
The network consists of an external network (on-prem and other clouds) and the Google Cloud network (transit VPC, services access VPC, managed services VPC, Private Service Connect consumer VPC, and workload VPC).
This design uses the following services to provide an end-to-end solution:
Cloud Interconnect (Direct, Partner, Cross-Cloud) – To connect from your on-prem or other clouds to the transit VPC. In this case multiple external locations are connecting in different regions.
Cloud VPN – To connect from the services access VPC to the transit VPC and export custom routes from the private services access network
VPC Network Peering – To connect from workload VPC to transit VPC
Private services access – To connect to managed services privately in the services access VPC
Private Service Connect – To expose services in the managed services VPC network to be consumed in the services access VPC and the Private Service Connect consumer VPC, with endpoints to services made available to connected peers.
Network Connectivity Center VPC spokes – To allow communication between workload VPCs if necessary
Network Connectivity Center topology – Utilizes preset topologies (choose mesh or star depending on your requirements)
For communication service providers (CSPs), a major hurdle in monetizing their networks is engaging the developer community effectively. Historically, complex, non-standardized APIs and a lack of developer-friendly resources have limited access to valuable network capabilities, preventing CSPs from fully capitalizing on the potential of their infrastructure.
However, by using platforms like Firebase, Google’s web and mobile application development platform, and embracing initiatives like the Open Gateway Initiative (OGI), CSPs can gain access to a vast pool of developers who are eager to build innovative applications with standardized APIs.
Today, we’re announcing a new Firebase phone number verification service, providing more than three million Firebase developers with access to critical CSP network APIs such as phone number acquisition and verification. We’re also partnering with Deutsche Telekom, Orange, Telefónica, T-Mobile, Telenor, Verizon, Vodafone and others to improve developer and end-user experiences, and to strengthen security by mitigating fraud — creating new revenue opportunities for CSPs and developers alike.
This work builds on the OGI vision to enable developers to build innovative applications and services through a global, standardized API framework that exposes network services. Google Cloud is proud to be part of this transformative movement, and we’re excited for the future it enables.
The challenge of traditional authentication methods
User authentication is critical for mobile and web app developers to build secure, personalized, and functional apps that protect user data, enable monetization, and foster trust. However, traditional methods, including SMS One-Time Passwords (OTPs), often create friction for users with limitations like slow delivery, interception risks, and unreliable cellular service. This can lead to frustrated users, abandoned sign-ups, and security vulnerabilities. Furthermore, bad actors increasingly exploit the SMS channel for fraudulent activities, such as artificially inflating traffic for profit. Developers and CSPs need more reliable, secure, and user-friendly verification alternatives.
OGI: A standardized solution for secure authentication
OGI, backed by the GSMA and global CSPs, has already set the stage for a more collaborative and innovative telecom ecosystem. The initiative currently provides a standardized framework for CSPs to expose their network capabilities through APIs, enabling developers to access core network functionalities, including secure user identity verification, in a consistent and predictable manner, regardless of the user’s mobile operator. By connecting the Firebase developer ecosystem to the broader OGI, we’re helping address some long-standing challenges, like complex APIs, consent models, fragmented billing experiences, and more.
Benefits for users, developers, and CSPs alike
Firebase’s standardized approach to authentication offers a multitude of benefits:
Enhanced security: By directly verifying user identity with CSPs, APIs significantly reduce the risk of fraud and unauthorized access.
Improved privacy: The operating system manages the consent experience, providing intuitive and transparent controls in line with regional requirements.
Improved user experience: Firebase enables smooth verification experiences, minimizing the friction of traditional methods like SMS one-time passwords (OTPs).
Increased accessibility: Firebase works across both Wi-Fi and cellular networks, providing reliable authentication even in areas with poor cellular connectivity.
Simplified development: Developers can access a global network of CSPs through a single, standardized API framework, reducing development time and complexity.
New revenue streams for CSPs: Firebase helps CSPs monetize their network capabilities and become key players in the digital ecosystem.
Simplified billing: Billing is centrally managed through Google Cloud, and CSPs get paid by Google.
Access to millions of developers: By leveraging Google’s relationship with over three million developers, CSPs can scale the business quickly.
“Digital identity APIs are the future of safe online transactions as they provide secure and reliable evidence of users’ identities in a world of ever-increasing cyber threats. The partnership with Google enables us to bring next-generation authentication services to our customers, strengthening our overall efforts to provide unparalleled cyber security now and in the future.” – Nicholas Nikrouyan, VP Voice & Mobile Services, Deutsche Telekom
“Privacy and security are key for Telefónica in the development of digital services. We are thrilled to partner with Google to accelerate the launch of more secure and user-friendly phone number verification solutions through Open Gateway’s APIs, seamlessly integrated with Cloud Firebase.” – Chema Alonso, Chief Digital Officer, Telefónica.
“We at Telenor Linx believe that simplicity is the key to unlocking secure digital experiences. By providing user-friendly verification and authentication solutions built on top of our core connectivity, we ensure that mobile operators remain central to protecting customers’ identities. Partnering with global development communities to drive open standards and reduce fraud opportunities is a natural next step for our industry, and we look forward to delivering safe, seamless access to services for businesses and consumers alike together with Google.” – Stig Waagbø, CEO, Telenor Linx
“Vodafone is collaborating with Google and other mobile operators to enable frictionless and cryptographically secure authentication. We believe this is key to enabling developers and stimulating the next wave of digital services,” – Johanna Wood, Director of Network APIs, Vodafone.
Privacy at the forefront
Both Google Cloud and the Open Gateway Initiative are committed to user privacy. OGI APIs are designed with strong privacy protections, ensuring that user data is handled responsibly and securely. Users maintain control over their data and provide explicit consent for any data sharing, consistent with established privacy best practices. Firebase expands on these promises with a unified, simple and consistent user experience independent of the market the user is operating in.
If you’re a CSP wanting to offer Firebase phone number verification on your network, please register your interest here. Going to Mobile World Congress (MWC), March 3-6, 2025 in Barcelona? Visit Google Cloud at Hall 2, Booth #2H40 to see experiences in action.
When it comes to data center power systems, batteries play an important role. The applications that run in our data centers require nearly continuous uptime. And while utility power is highly reliable, power outages are unavoidable.
When an outage happens, batteries can supply short-duration power, allowing servers to operate continuously when the facility switches between AC power sources, or to ride through transient power disturbances. Or, if a facility loses both primary and alternate power sources for an extended period of time, batteries can supply sufficient power to allow machines to execute a clean shutdown procedure. This is helpful in expediting machine restarts after the power outage. More importantly, it helps ensure that critical user data is safely stored to disk and not lost in the power disruption.
At Google, we rely on a 48Vdc rack power system with integrated battery backup units (BBUs), and in 2015, we became one of the first hyperscale data center providers to deploy Lithium-ion BBUs. These Li-ion batteries had twice the life, twice the power and half the volume of previous-generation lead-acid batteries. Switching from lead-acid batteries to Li-ion means we deploy only one-quarter the number of batteries, greatly reducing the battery waste generated by our data centers.
We recently reached an important milestone: Google has more than 100 million cells deployed in battery packs across our global data center fleet. This is remarkable, and only possible thanks to the safety-first approach we take to deploy Li-ion batteries at scale.
The main safety risk associated with Li-ion batteries is the battery going into thermal runaway if it is accidentally mishandled, exposed to excessive temperatures, or overcharged. While a rare event, the resulting fire is extremely difficult to extinguish due to the large amount of heat generated, which drives a thermal runaway chain reaction in nearby cells.
To deploy this large fleet of Li-ion cells, we have had to make safety a core principle of our battery design. Specifically, as an early adopter of the UL9540A thermal runaway test method, we subject our Li-ion BBU designs to rigorous flame safety testing that demonstrates their ability to limit thermal runaway. As a result, Google has successfully been granted permits to deploy BBUs in some of the world’s most stringent jurisdictions, including in the APAC region.
In addition, our Li-ion BBUs benefit from our distributed UPS architecture that offers significant availability and TCO benefits compared to traditional monolithic UPS systems. The distributed UPS architecture improves machine availability by: 1) reducing the failure-domain blast radius to a single rack, and 2) locating the batteries in the rack to eliminate intermediate points of failure between the UPS and machines. This architecture also provides TCO benefits by scaling the UPS with the deployment, i.e., reducing day-1 UPS cost. Additionally, locating the batteries in the rack on the same DC bus as the machines eliminates intermediate AC/DC power conversion steps that cause efficiency losses. In 2016 we shared the 48V rack power system spec with the Open Compute Project, including specs for the Li-ion BBUs.
Li-ion batteries have been crucial to ensuring the uninterrupted operation of Google Cloud data centers. By transitioning from lead-acid to Li-ion BBUs, we’ve significantly improved power availability, efficiency, and lifespan, even as we simultaneously address their critical safety risks. Our commitment to rigorous safety testing and adherence to standards and test methods like UL9540A has enabled us to deploy millions of Li-ion BBUs globally, providing our customers with the high level of reliability they expect from Google Cloud.
Getting to 100 million Li-ion batteries is just one of many examples of how we are building a reliable cloud and power-efficient AI. As data center power systems evolve to include new technologies including large battery energy storage systems (BESS) and new workload requirements (AI workloads), we remain dedicated to exploring and implementing innovative solutions to build the most efficient and safest cloud data centers.
The authors would like to acknowledge Vijay Boovaragavan, Matt Tamashiro, Sandeep Sebastian, Thibault Pelloux-Gervais, Ken Wong, Mike Meakins, Stanley Fung, and Scott Sharp for their contributions.
Many specialized vector databases today require you to create complex pipelines and applications in order to get the data you need. AlloyDB for PostgreSQL offers Google Research’s state-of-the-art vector search index, ScaNN, enabling you to optimize end-to-end retrieval of the freshest, most relevant data with a single SQL statement.
Today, we are introducing a set of new enhancements to help you get even more out of vector search in AlloyDB. First, we are launching inline filtering, a major performance enhancement to filtered vector search in AlloyDB. One of the most powerful features in AlloyDB is the ability to perform filtered vector search directly in the database, instead of post-processing on the application side. Inline filtering helps ensure that these types of searches are fast, accurate, and efficient — automatically combining the best of vector indexes and traditional indexes on metadata columns to achieve better query performance.
Second, we are launching enterprise-grade observability and management tooling for vector indexes to help you ensure stable performance and the highest quality search results. This includes a new recall evaluator, or built-in tooling for evaluating recall, a key metric of vector search quality. That means you no longer have to build your own measurement pipelines and processes for your applications to deliver good results. We’re also introducing vector index distribution statistics, allowing customers with rapidly changing real-time data to achieve more stable, consistent performance.
Together, these launches further strengthen our mission of providing performant, flexible, high-quality end-to-end solutions for vector search that enterprises can rely on.
A review of filtered vector search in AlloyDB
Many customers start their journey with vector search trying simple search on a single column. For example, a retailer might want to perform a semantic search on product descriptions to surface the right products to match end-user queries.
SELECT * FROM product
ORDER BY embedding <=> embedding('text-embedding-005', 'red cotton crew neck shirt')::vector
LIMIT 50;
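If you want to try the query above, note that it assumes a vector-enabled table and AlloyDB’s in-database embedding generation. The following setup is a minimal sketch under stated assumptions: the extension names reflect AlloyDB’s documented pgvector and ML-integration support, and the table shape and embedding dimension are illustrative only.

-- Minimal, illustrative setup for the examples in this post.
-- Assumptions: pgvector ships as the "vector" extension, and AlloyDB's
-- embedding() function is provided by google_ml_integration (exact
-- extension versions and required instance flags vary; see the docs).
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS google_ml_integration;

CREATE TABLE IF NOT EXISTS product (
  id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  category  text,
  size      text,
  price     numeric,
  embedding vector(768)  -- dimension is illustrative; match your model
);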
However, very quickly, as you look to productionize these solutions and improve the quality of your results, you may find that the queries themselves get more interesting. You might iterate — add filters, perform joins with other tables, and aggregate your data. For example, the retailer might want to allow users to filter by size, price, and more.
SELECT * FROM product
WHERE category = 'shirt' AND size = 'S' AND price < 100
ORDER BY embedding <=> embedding('text-embedding-005', 'red cotton crew neck')::vector
LIMIT 50;
AlloyDB’s PostgreSQL interface provides a strong developer experience for these types of workloads. Because vector search is integrated into the SQL interface, developers can very easily query structured and unstructured data together in a single SQL statement, as opposed to writing complex application code that pulls data from multiple sources.
Moreover, changing requirements such as adding new query filters typically don’t require schema or index updates. If our retailer, for example, wants to only show in-stock items at the end user’s local store, they can very easily join their products table with an existing store inventory table via the SQL interface.
SELECT * FROM product p
JOIN product_inventory pi ON p.id = pi.product_id
WHERE p.category = 'shirt' AND pi.inventory > 0
ORDER BY p.embedding <=> embedding('text-embedding-005', 'red cotton crew neck')::vector
LIMIT 50;
All of this, and more, is possible in AlloyDB!
Inline filtering
But as a developer, you don’t just want to execute the query — you also want excellent performance and recall. To deliver the best performance, the AlloyDB query optimizer makes choices about how to execute a query with filters. Inline filtering is a new query optimization technique that allows the AlloyDB query optimizer to evaluate both the metadata filtering conditions and the vector search in tandem, leveraging both vector indexes and indexes on the metadata columns. Inline filtering is now available for the ScaNN index in AlloyDB, a search technology based on over a decade of Google research into semantic search algorithms.
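As a concrete sketch, here is how you might create a ScaNN index on the product table from the earlier examples. This assumes the alloydb_scann extension described in the AlloyDB documentation; the num_leaves value is illustrative and should be tuned for your dataset.

-- Illustrative ScaNN index creation (assumes the alloydb_scann extension).
CREATE EXTENSION IF NOT EXISTS alloydb_scann;

CREATE INDEX product_embedding_scann
  ON product
  USING scann (embedding cosine)  -- cosine matches the <=> queries above
  WITH (num_leaves = 1000);       -- illustrative tuning value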
AlloyDB intelligently and automatically employs this technique when it’s most beneficial. Depending on the query and the distribution of the underlying data, the query planner automatically chooses the execution plan with the best performance. When filters are very selective, i.e., when a very small number of rows matches the filter, the query planner typically executes a pre-filter. This can leverage an index on a metadata column to find the small subset of rows that match the filter, and then perform a nearest-neighbor search on only those rows. Alternatively, the query planner may decide to execute a post-filter in cases of low selectivity — i.e., if a large percentage of rows match the filter condition. Here, the query planner starts with the vector index to come up with a list of relevant candidates, and then removes results that do not match the predicates on the metadata columns.
Inline filtering, on the other hand, is best for cases with medium selectivity. As AlloyDB searches through the vector index, it only computes distances for vectors that match the metadata filtering conditions. This massively improves performance for these queries, complementing the advantages of pre-filtering and post-filtering. With this feature, AlloyDB provides great performance across the whole gamut of filter selectivities when combined with vector search.
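Because plan selection is automatic, the usual way to see which strategy the optimizer chose for a given query is standard PostgreSQL EXPLAIN. A minimal sketch, assuming the same product table plus a conventional B-tree index on the metadata column so the planner has every option available:

-- Give the planner a metadata index to combine with the vector index.
CREATE INDEX product_category_idx ON product (category);

-- Inspect the chosen plan; node names vary by AlloyDB version, so treat
-- the output shape as illustrative.
EXPLAIN ANALYZE
SELECT * FROM product
WHERE category = 'shirt'
ORDER BY embedding <=> embedding('text-embedding-005', 'red cotton crew neck')::vector
LIMIT 50;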
Enterprise-grade observability
If you’re running similarity search or generative AI workloads in production, you need stable performance and quality of results, just as you do for any other database workload. Observability and manageability tooling are key to achieving that.
With the new recall evaluator, built directly into the database, you can now systematically measure, and ultimately tune, search quality with a single stored procedure rather than building custom evaluation pipelines.
Recall in similarity search is the fraction of relevant instances that were retrieved from a search, and it is the most common metric used for measuring search quality. One source of recall loss comes from the difference between approximate nearest neighbor search, or aNN, and exact k-nearest neighbor search, or kNN. Vector indexes like AlloyDB’s ScaNN implement aNN algorithms, allowing you to speed up vector search on large datasets in exchange for a small tradeoff in recall. Now, AlloyDB provides you with the ability to measure this tradeoff directly in the database for individual queries and to ensure that it is stable over time. You can update query and index parameters in response to this information to achieve better results and performance. This management tooling is critical if you care deeply about stable, high-quality results.
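As a hedged sketch of how this could look in practice: the function name evaluate_query_recall below is our assumption about the recall evaluator’s SQL surface, so check the AlloyDB documentation for the exact name and signature. The idea is to hand the evaluator the same query you run in production and let it compare the aNN results against exact kNN.

-- Hypothetical invocation of the recall evaluator; verify the function
-- name and signature against the AlloyDB documentation before use.
SELECT *
FROM evaluate_query_recall($$
  SELECT id FROM product
  ORDER BY embedding <=> embedding('text-embedding-005', 'red cotton crew neck')::vector
  LIMIT 50
$$);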
In addition to the recall evaluator, we’re also introducing vector index distribution statistics for the ScaNN index, allowing developers to see the distribution of vectors within the index. This is particularly useful for workloads with high write throughput or data change rates. In these scenarios, new real-time data is automatically added to the index and is ready for querying right away. Now, you can monitor any changes in vector-index distribution, and ensure that performance stays robust through these data changes.
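These statistics are queryable in-database as well. The interface below is purely hypothetical, used only to illustrate the kind of monitoring query you might schedule; consult the AlloyDB documentation for the actual view or function that exposes index distribution.

-- Hypothetical interface for ScaNN index distribution statistics; the
-- real AlloyDB view/function name may differ. The intent: watch how
-- vectors spread across index partitions as real-time data arrives.
SELECT * FROM scann_index_distribution('product_embedding_scann');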
To learn more about the ScaNN for AlloyDB index, check out our introduction to the ScaNN for AlloyDB index, or read our ScaNN for AlloyDB whitepaper for an introduction to vector search at large, and then a deep dive into the ScaNN algorithm and how we implemented it in PostgreSQL and AlloyDB.
The cloud is evolving fast — and that means you need to evolve fast. With the explosion of AI, it’s not enough to build skills; you have to be able to prove you have them.
As more companies pursue digital transformation and AI adoption, validating skills quickly and effectively is more critical than ever. That’s not just the message everyone is hearing from recruiters and executives — we also have new research demonstrating the impact of getting Google Cloud certified. And recognizing just how valuable certification is, we’ve got three new ones to help drive your success in 2025.
A recent Ipsos study commissioned by Google Cloud surveyed more than 3,000 cloud practitioners, students, and decision-makers and confirmed that certifications not only increase career opportunities, they also drive efficiencies for digital businesses (which is pretty much every business these days). From seasoned professionals to those just starting out, as well as the organizations for which they work, everyone benefits from validating their cloud skills with trusted certifications.
Certifications: A catalyst for confidence and career advancement
Certifications are a key driver of career growth for engineers, data scientists, and other cloud professionals. For Google Cloud learners, certifications are considered the most valuable part of their learning journey. This is supported by the Ipsos data: eight in 10 Google Cloud learners report that certification equips them with the skills needed for in-demand roles, accelerates their promotion potential, and contributes to their overall professional success when they share their credentials online.
The Ipsos research also reveals the significant impact of Google Cloud certifications on students. Empowered by certifications, students report higher salaries and faster time-to-hire. An impressive nine in 10 Google Cloud certified students say their training made them more competitive in the job market, leading to better career opportunities.
Helping cloud leaders find and build more efficient teams
Furthermore, certifications build confidence and efficiency for cloud leaders and decision-makers. Leaders from organizations using Google Cloud report that certifications significantly improve the efficiency of their cloud operations. They cite increased confidence in on-time project completion, accelerated onboarding to roles and projects, and greater confidence in a candidate’s knowledge during the hiring process. In fact, more than six in 10 leaders say one of the most important resources for cloud learners is getting certified, and approximately 70% believe certified employees are more productive.
Explore—and prepare for—the latest certifications from Google Cloud, integrated with AI concepts
To get started or take your training to the next level, you can explore the full catalog of Google Cloud certifications, which now includes these newly launched certifications:
Associate Data Practitioner Certification: This certification is a great fit for data scientists who want to validate their Google Cloud data skills and knowledge, like ensuring data is clean, secure, and usable for AI and machine learning models. Follow this learning path to prepare for the exam.
Associate Google Workspace Administrator Certification: Validate your proficiency in the core skills required to successfully manage Google Workspace environments, including effectively managing the AI-powered assistant. Follow this learning path to prepare for the exam.
Professional Cloud Architect Certification [Renewal]: Prove your skills as a professional cloud architect with this new, streamlined recertification exam, focused on the application of generative AI solutions to solve real-world business challenges. Check out the exam guide to prepare for the exam.
How certification moves the needle: Hear from certified professionals
People are already transforming their careers and supercharging their teams with the help of certifications. Hear from a Chief Technology Officer, a senior cloud architect, a risk manager, and a student in information systems about the difference a certification makes in their day-to-day:
The rapid pace of AI innovation has made skills validation an imperative — for cloud professionals and the companies they call home. Learn more about all your cloud credentialing options here or explore our full suite of Google Cloud learning tools at skills.google.