The pace of innovation in open-source AI is breathtaking, with models like Meta's Llama4 and DeepSeek AI's DeepSeek family leading the way. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and machine learning (ML) engineers need reproducible, verified recipes that articulate the steps for trying out these models on available accelerators.
Today, we're excited to announce enhanced support and new, optimized recipes for the latest Llama4 and DeepSeek models, leveraging our cutting-edge AI Hypercomputer platform. AI Hypercomputer helps build a strong AI infrastructure foundation using a set of purpose-built infrastructure components that are designed to work well together for AI workloads like training and inference. It is a systems-level approach that draws from our years of experience serving AI experiences to billions of users, and combines purpose-built hardware, optimized software and frameworks, and flexible consumption models. Our AI Hypercomputer resources repository on GitHub, your hub for these recipes, continues to grow.
In this blog, we’ll show you how to access Llama4 and DeepSeek models today on AI Hypercomputer.
Added support for new Llama4 models
Meta recently released the Scout and Maverick models in the Llama4 herd. Llama 4 Scout is a 17-billion-active-parameter model with 16 experts, and Llama 4 Maverick is a 17-billion-active-parameter model with 128 experts. Both models deliver innovations and optimizations based on a Mixture of Experts (MoE) architecture, and both support multimodal capability and long context lengths.
But serving these models can present challenges in terms of deployment and resource management. To help simplify this process, we’re releasing new recipes for serving Llama4 models on Google Cloud Trillium TPUs and A3 Mega and A3 Ultra GPUs.
JetStream, Google's throughput and memory-optimized engine for LLM inference on XLA devices, now supports Llama-4-Scout-17B-16E and Llama-4-Maverick-17B-128E inference on Trillium, the sixth-generation TPU. New recipes now provide the steps to deploy these models using JetStream and MaxText on a Trillium TPU GKE cluster. vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. New recipes now demonstrate how to use vLLM to serve the Llama4 Scout and Maverick models on A3 Mega and A3 Ultra GPU GKE clusters.
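To give a feel for the GPU path, here is a minimal offline-inference sketch using vLLM's Python API. The model ID, GPU count, and sampling settings are illustrative assumptions; the published recipes deploy the vLLM server on a GKE cluster rather than running it inline like this.

```python
# Minimal vLLM sketch (assumptions: model ID, GPU count, context length).
# The published recipes deploy the vLLM server on GKE instead of running it inline.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # hypothetical HF model ID
    tensor_parallel_size=8,   # e.g., one A3 node with 8 GPUs
    max_model_len=8192,       # shorten the context window to fit memory
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain what a Mixture of Experts model is."], params)
for out in outputs:
    print(out.outputs[0].text)
```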
For serving the Maverick model on TPUs, we utilize Pathways on Google Cloud. Pathways is a system which simplifies large-scale machine learning computations by enabling a single JAX client to orchestrate workloads across multiple large TPU slices. In the context of inference, Pathways enables multi-host serving across multiple TPU slices. Pathways is used internally at Google to train and serve large models like Gemini.
MaxText provides high-performance, highly scalable, open-source LLM reference implementations written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference. MaxText now includes reference implementations for the Llama4 Scout and Maverick models, along with information on how to perform checkpoint conversion, training, and decoding for Llama4 models.
Added support for DeepSeek Models
Earlier this year, DeepSeek released two open-source models: the DeepSeek-V3 model, followed by the DeepSeek-R1 model. The V3 model delivers innovations and optimizations based on an MoE architecture. The R1 model provides reasoning capabilities through a chain-of-thought thinking process.
To help simplify deployment and resource management, we’re releasing new recipes for serving DeepSeek models on Google Cloud Trillium TPUs and A3 Mega and A3 Ultra GPUs.
JetStream now supports DeepSeek-R1-Distill-Llama-70B inference on Trillium. A new recipe provides the steps to deploy DeepSeek-R1-Distill-Llama-70B using JetStream and MaxText on a Trillium TPU VM. With the recent ability to work with Google Cloud TPUs, vLLM users can leverage the performance-cost benefits of TPUs with a few configuration changes. vLLM on TPU now supports all DeepSeek R1 distilled models on Trillium. Here's a recipe that demonstrates how to use vLLM, a high-throughput inference engine, to serve the DeepSeek distilled Llama model on Trillium TPUs.
You can also deploy DeepSeek models using the SGLang inference stack on our A3 Ultra VMs, powered by eight NVIDIA H200 GPUs, with this recipe. A recipe for A3 Mega VMs with SGLang is also available, which shows you how to deploy multihost inference across two A3 Mega nodes. Cloud GPU users using the vLLM inference engine can also deploy DeepSeek models on A3 Mega (recipe) and A3 Ultra (recipe) VMs.
MaxText now also includes support for architectural innovations from DeepSeek, such as Multi-Head Latent Attention (MLA), MoE shared and routed experts with loss-free load balancing, dropless expert parallelism, mixed decoder layers (dense and MoE), and YaRN RoPE embeddings. The reference implementations for the DeepSeek family of models allow you to rapidly experiment with your models by incorporating some of these newer architectural enhancements.
Recipe example
The reproducible recipes show the steps to deploy and benchmark inference with the new Llama4 and DeepSeek models. For example, this TPU recipe outlines the steps to deploy the Llama-4-Scout-17B-16E model with the JetStream MaxText Engine on Trillium TPUs. The recipe shows the steps to provision the TPU cluster, download the model weights, and set up JetStream and MaxText. It then shows you how to convert the checkpoint to a format compatible with MaxText, deploy it on a JetStream server, and run your benchmarks.
You can deploy the Llama4 Scout and Maverick models or the DeepSeek V3/R1 models today using inference recipes from the AI Hypercomputer GitHub repository. These recipes provide a starting point for deploying and experimenting with Llama4 and DeepSeek models on Google Cloud. Explore the recipes and resources linked below, and stay tuned for future updates. We hope you have fun building, and please share your feedback!
When you deploy open models like DeepSeek and Llama, you are responsible for their security and legal compliance. You should follow responsible AI best practices, adhere to each model's specific licensing terms, and ensure your deployment is secure and compliant with all regulations in your area.
Looking to fine-tune multimodal AI models for your specific domain but facing infrastructure and implementation challenges? This guide demonstrates how to overcome the multimodal implementation gap using Google Cloud and Axolotl, with a complete hands-on example fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset. Learn how to scale from concept to production while addressing the typical challenges of managing GPU resources, data preparation, and distributed training.
Filling in the Gap
Organizations across industries are rapidly adopting multimodal AI to transform their operations and customer experiences. Gartner analysts predict 40% of generative AI solutions will be multimodal (text, image, audio and video) by 2027, up from just 1% in 2023, highlighting the accelerating demand for solutions that can process and understand multiple types of data simultaneously.
Healthcare providers are already using these systems to analyze medical images alongside patient records, speeding up diagnosis. Retailers are building shopping experiences where customers can search with images and get personalized recommendations. Manufacturing teams are spotting quality issues by combining visual inspections with technical data. Customer service teams are deploying agents that process screenshots and photos alongside questions, reducing resolution times.
Multimodal AI applications powerfully mirror human thinking. We don’t experience the world in isolated data types – we combine visual cues, text, sound, and context to understand what’s happening. Training multimodal models on your specific business data helps bridge the gap between how your teams work and how your AI systems operate.
Key challenges organizations face in production deployment
Moving from prototype to production with multimodal AI isn’t easy. PwC survey data shows that while companies are actively experimenting, most expect fewer than 30% of their current experiments to reach full scale in the next six months. The adoption rate for customized models remains particularly low, with only 20-25% of organizations actively using custom models in production.
The following technical challenges consistently stand in the way of success:
Infrastructure complexity: Multimodal fine-tuning demands substantial GPU resources – often 4-8x more than text-only models. Many organizations lack access to the necessary hardware and struggle to configure distributed training environments efficiently.
Data preparation hurdles: Preparing multimodal training data is fundamentally different from text-only preparation. Organizations struggle with properly formatting image-text pairs, handling diverse file formats, and creating effective training examples that maintain the relationship between visual and textual elements.
Training workflow management: Configuring and monitoring distributed training across multiple GPUs requires specialized expertise most teams don’t have. Parameter tuning, checkpoint management, and optimization for multimodal models introduce additional layers of complexity.
These technical barriers create what we call “the multimodal implementation gap” – the difference between recognizing the potential business value and successfully delivering it in production.
How Google Cloud and Axolotl together solve these challenges
Our collaboration brings together complementary strengths to directly address these challenges. Google Cloud provides the enterprise-grade infrastructure foundation necessary for demanding multimodal workloads. Our specialized hardware accelerators such as NVIDIA B200 Tensor Core GPUs and Ironwood are optimized for these tasks, while our managed services like Google Cloud Batch, Vertex AI Training, and GKE Autopilot minimize the complexities of provisioning and orchestrating multi-GPU environments. This infrastructure seamlessly integrates with the broader ML ecosystem, creating smooth end-to-end workflows while maintaining the security and compliance controls required for production deployments.
Axolotl complements this foundation with a streamlined fine-tuning framework that simplifies implementation. Its configuration-driven approach abstracts away technical complexity, allowing teams to focus on outcomes rather than infrastructure details. Axolotl supports multiple open source and open weight foundation models and efficient fine-tuning methods like QLoRA. This framework includes optimized implementations of performance-enhancing techniques, backed by community-tested best practices that continuously evolve through real-world usage.
Together, we enable organizations to implement production-grade multimodal fine-tuning without reinventing complex infrastructure or developing custom training code. This combination accelerates time-to-value, turning what previously required months of specialized development into weeks of standardized implementation.
Solution Overview
Our multimodal fine-tuning pipeline consists of five essential components:
Foundational model: Choose a base model that meets your task requirements. Axolotl supports a variety of open source and open weight multimodal models including Llama 4, Pixtral, LLaVA-1.5, Mistral-Small-3.1, Qwen2-VL, and others. For this example, we’ll use Gemma 3, our latest open and multimodal model family.
Data preparation: Create properly formatted multimodal training data that maintains the relationship between images and text. This includes organizing image-text pairs, handling file formats, and splitting data into training/validation sets.
Training configuration: Define your fine-tuning parameters using Axolotl’s YAML-based approach, which simplifies settings for adapters like QLoRA, learning rates, and model-specific optimizations.
Infrastructure orchestration: Select the appropriate compute environment based on your scale and operational requirements. Options include Google Cloud Batch for simplicity, Google Kubernetes Engine for flexibility, or Vertex AI Custom Training for MLOps integration.
Production integration: Streamlined pathways from fine-tuning to deployment.
The pipeline structure above represents the conceptual components of a complete multimodal fine-tuning system. In our hands-on example later in this guide, we’ll demonstrate these concepts through a specific implementation tailored to the SIIM-ISIC Melanoma dataset, using GKE for orchestration. While the exact implementation details may vary based on your specific dataset characteristics and requirements, the core components remain consistent.
Selecting the Right Google Cloud Environment
Google Cloud offers multiple approaches to orchestrating multimodal fine-tuning workloads. Let’s explore three options with different tradeoffs in simplicity, flexibility, and integration:
Google Cloud Batch
Google Cloud Batch is best for teams seeking maximum simplicity for GPU-intensive training jobs with minimal infrastructure management. It handles all resource provisioning, scheduling, and dependencies automatically, eliminating the need for container orchestration or complex setup. This fully managed service balances performance and cost effectiveness, making it ideal for teams who need powerful computing capabilities without operational overhead.
Vertex AI Custom Training
Vertex AI Custom Training is best for teams prioritizing integration with Google Cloud’s MLOps ecosystem and managed experiment tracking. Vertex AI Custom Training jobs automatically integrate with Experiments for tracking metrics, the Model Registry for versioning, Pipelines for workflow orchestration, and Endpoints for deployment.
Google Kubernetes Engine (GKE)
GKE is best for teams seeking flexible integration with containerized workloads. It enables unified management of training jobs alongside other services in your container ecosystem while leveraging Kubernetes' sophisticated scheduling capabilities. GKE offers fine-grained control over resource allocation, making it ideal for complex ML pipelines. For our hands-on example, we'll use GKE in Autopilot mode, which maintains these integration benefits while Google Cloud automates infrastructure management including node provisioning and scaling. This lets you focus on your ML tasks rather than cluster administration, combining the flexibility of Kubernetes with the operational simplicity of a managed service.
Take a look at our code sample here for a complete implementation that demonstrates how to orchestrate a multimodal fine-tuning job on GKE:
This repository includes ready-to-use Kubernetes manifests for deploying Axolotl training jobs on GKE in Autopilot mode, covering automated cluster setup with GPUs, persistent storage configuration, job specifications, and monitoring integration.
Hands-on example: Fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset
The SIIM-ISIC Melanoma Classification dataset consists of dermoscopic images of skin lesions, each labeled as malignant or benign. With melanoma accounting for 75% of skin cancer deaths despite its relative rarity, early and accurate detection is critical for patient survival. By applying multimodal AI to this challenge, we can help dermatologists improve diagnostic accuracy and potentially save lives through faster, more reliable identification of dangerous lesions. So, let's walk through a complete example of fine-tuning Gemma 3 on this dataset.
For this implementation, we’ll leverage GKE in Autopilot mode to orchestrate our training job and monitoring, allowing us to focus on the ML workflow while Google Cloud handles the infrastructure management.
Data Preparation
The SIIM-ISIC Melanoma Classification dataset requires specific formatting for multimodal fine-tuning with Axolotl. Our data preparation process involves two main steps: (1) efficiently transferring the dataset to Cloud Storage using Storage Transfer Service, and (2) processing the raw data into the format required by Axolotl. To start, transfer the dataset.
Create a TSV file that contains the URLs for the ISIC dataset files:
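As a rough sketch of that file (the download URLs below are placeholders; use the actual links from the ISIC Challenge site), Storage Transfer Service expects a URL list in TSV form whose first line is the TsvHttpData-1.0 header:

```python
# Sketch: build the URL-list TSV expected by Storage Transfer Service.
# The ISIC download URLs below are placeholders; substitute the real links.
urls = [
    "https://example.com/ISIC_2020_Training_JPEG.zip",          # hypothetical URL
    "https://example.com/ISIC_2020_Training_GroundTruth.csv",   # hypothetical URL
]

with open("melanoma_dataset_urls.tsv", "w") as f:
    f.write("TsvHttpData-1.0\n")      # required header for URL lists
    for url in urls:
        f.write(f"{url}\n")           # size and MD5 columns may optionally follow
```

Upload the resulting melanoma_dataset_urls.tsv to your bucket so the transfer job can reference it.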
Set up appropriate IAM permissions for the Storage Transfer Service:
```bash
# Get your current project ID
export PROJECT_ID=$(gcloud config get-value project)

# Get your project number
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")

# Enable the Storage Transfer API
echo "Enabling Storage Transfer API..."
gcloud services enable storagetransfer.googleapis.com --project=${PROJECT_ID}

# Important: The Storage Transfer Service account is created only after you access the service.
# Access the Storage Transfer Service in the Google Cloud Console to trigger its creation:
# https://console.cloud.google.com/transfer/cloud
echo "IMPORTANT: Before continuing, please visit the Storage Transfer Service page in the Google Cloud Console"
echo "Go to: https://console.cloud.google.com/transfer/cloud"
echo "This ensures the Storage Transfer Service account is properly created."
echo "After visiting the page, wait approximately 60 seconds for account propagation, then continue."
echo ""
echo "Press Enter once you've completed this step..."
read -p ""

# Grant Storage Transfer Service the necessary permissions
export STS_SERVICE_ACCOUNT_EMAIL="project-${PROJECT_NUMBER}@storage-transfer-service.iam.gserviceaccount.com"
echo "Granting permissions to Storage Transfer Service account: ${STS_SERVICE_ACCOUNT_EMAIL}"

gcloud storage buckets add-iam-policy-binding gs://${GCS_BUCKET_NAME} \
  --member=serviceAccount:${STS_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.objectViewer \
  --condition=None

gcloud storage buckets add-iam-policy-binding gs://${GCS_BUCKET_NAME} \
  --member=serviceAccount:${STS_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.objectUser \
  --condition=None
```
Set up a storage transfer job using the URL list:
Navigate to Cloud Storage > Transfer
Click “Create Transfer Job”
Select “URL list” as Source type and “Google Cloud Storage” as Destination type
Enter the path to your TSV file: gs://<GCS_BUCKET_NAME>/melanoma_dataset_urls.tsv
Select your destination bucket
Use the default job settings and click Create
The transfer will download approximately 32GB of data from the ISIC Challenge repository directly to your Cloud Storage bucket. Once the transfer is complete, you'll need to extract the ZIP files before proceeding to the next step, where we format this data for Axolotl. See the notebook in the GitHub repository for a full walkthrough of how to format the data for Axolotl.
Preparing Multimodal Training Data
For multimodal models like Gemma 3, we need to structure our data following the extended chat_template format, which defines conversations as a series of messages with both text and image content.
Below is an example of a single training input example:
```json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "You are a dermatology assistant that helps identify potential melanoma from skin lesion images."}
      ]
    },
    {
      "role": "user",
      "content": [
        {"type": "image", "path": "/path/to/image.jpg"},
        {"type": "text", "text": "Does this appear to be malignant melanoma?"}
      ]
    },
    {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "Yes, this appears to be malignant melanoma."}
      ]
    }
  ]
}
```
This format allows Axolotl to properly process both the images and their corresponding labels, maintaining the relationship between visual and textual elements during training.
We split the data into training (80%), validation (10%), and test (10%) sets, using stratified sampling to maintain the class distribution in each split.
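For illustration, a stratified 80/10/10 split can be done with scikit-learn along these lines; the variable names and dummy data here are assumptions, and the notebook in the repository is the authoritative implementation:

```python
# Sketch: 80/10/10 stratified split of the prepared examples.
from sklearn.model_selection import train_test_split

# Hypothetical prepared examples and their binary labels (0 = benign, 1 = malignant).
examples = [{"id": i} for i in range(100)]
labels = [1 if i < 20 else 0 for i in range(100)]

train_ex, temp_ex, train_y, temp_y = train_test_split(
    examples, labels, test_size=0.2, stratify=labels, random_state=42
)
val_ex, test_ex, val_y, test_y = train_test_split(
    temp_ex, temp_y, test_size=0.5, stratify=temp_y, random_state=42
)
# 80% train, 10% validation, 10% test, each preserving the malignant/benign ratio
```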
Creating the Axolotl Configuration File
Next, we’ll create a configuration file for Axolotl that defines how we’ll fine-tune Gemma 3. We’ll use QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization to efficiently fine-tune the model while keeping memory requirements manageable. While A100 40GB GPUs have substantial memory, the 4-bit quantization with QLoRA allows us to train with larger batch sizes or sequence lengths if needed, providing additional flexibility for our melanoma classification task. The slight reduction in precision is typically an acceptable tradeoff, especially for fine-tuning tasks where we’re adapting a pre-trained model rather than training from scratch.
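The file we will reference later when creating the ConfigMap is gemma3-melanoma.yaml. As a hedged sketch of what such a QLoRA configuration might contain, the snippet below generates a minimal version programmatically; the key names follow Axolotl's conventions, but the base model ID and hyperparameter values are illustrative assumptions, and the dataset and chat-template sections are omitted:

```python
# Sketch: generate a minimal gemma3-melanoma.yaml Axolotl config.
# Key names follow common Axolotl conventions; values here are illustrative only.
# Requires PyYAML (pip install pyyaml).
import yaml

config = {
    "base_model": "google/gemma-3-4b-it",   # hypothetical Gemma 3 variant
    "load_in_4bit": True,                    # 4-bit quantization for QLoRA
    "adapter": "qlora",
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "sequence_len": 2048,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_epochs": 2,
    "learning_rate": 2e-4,
    "output_dir": "/workspace/outputs/gemma3-melanoma",
    # dataset and chat-template settings omitted in this sketch
}

with open("gemma3-melanoma.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```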
The repository's configuration sets up QLoRA fine-tuning with parameters tuned for our melanoma classification task. Next, we'll set up our GKE Autopilot environment to run the training.
Setting up GKE Autopilot for GPU Training
Now that we have our configuration file ready, let’s set up the GKE Autopilot cluster we’ll use for training. As mentioned earlier, Autopilot mode lets us focus on our ML task while Google Cloud handles the infrastructure management.
Let’s create our GKE Autopilot cluster:
```bash
# Set up environment variables for cluster configuration
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export CLUSTER_NAME=melanoma-training-cluster
export RELEASE_CHANNEL=regular

# Enable required Google APIs
echo "Enabling required Google APIs..."
gcloud services enable container.googleapis.com --project=${PROJECT_ID}
gcloud services enable compute.googleapis.com --project=${PROJECT_ID}

# Create a GKE Autopilot cluster in the same region as your data
echo "Creating GKE Autopilot cluster ${CLUSTER_NAME}..."
gcloud container clusters create-auto ${CLUSTER_NAME} \
  --location=${REGION} \
  --project=${PROJECT_ID} \
  --release-channel=${RELEASE_CHANNEL}

# Install kubectl if not already installed
if ! command -v kubectl &> /dev/null; then
  echo "Installing kubectl..."
  gcloud components install kubectl
fi

# Install the GKE auth plugin required for kubectl
echo "Installing GKE auth plugin..."
gcloud components install gke-gcloud-auth-plugin

# Configure kubectl to use the cluster
echo "Configuring kubectl to use the cluster..."
gcloud container clusters get-credentials ${CLUSTER_NAME} \
  --location=${REGION} \
  --project=${PROJECT_ID}

# Verify kubectl is working correctly
echo "Verifying kubectl connection to cluster..."
kubectl get nodes
```
Now set up Workload Identity Federation for GKE to securely authenticate with Google Cloud APIs without using service account keys:
```bash
# Set variables for Workload Identity Federation
export PROJECT_ID=$(gcloud config get-value project)
export NAMESPACE="axolotl-training"
export KSA_NAME="axolotl-training-sa"
export GSA_NAME="axolotl-training-sa"

# Create a Kubernetes namespace for the training job
kubectl create namespace ${NAMESPACE} || echo "Namespace ${NAMESPACE} already exists"

# Create a Kubernetes ServiceAccount
kubectl create serviceaccount ${KSA_NAME} \
  --namespace=${NAMESPACE} || echo "ServiceAccount ${KSA_NAME} already exists"

# Create an IAM service account
if ! gcloud iam service-accounts describe ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com &>/dev/null; then
  echo "Creating IAM service account ${GSA_NAME}..."
  gcloud iam service-accounts create ${GSA_NAME} \
    --display-name="Axolotl Training Service Account"

  # Wait for IAM propagation
  echo "Waiting for IAM service account creation to propagate..."
  sleep 15
else
  echo "IAM service account ${GSA_NAME} already exists"
fi

# Grant necessary permissions to the IAM service account
echo "Granting storage.objectAdmin role to IAM service account..."
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Wait for IAM propagation
echo "Waiting for IAM policy binding to propagate..."
sleep 10

# Allow the Kubernetes ServiceAccount to impersonate the IAM service account
echo "Binding Kubernetes ServiceAccount to IAM service account..."
gcloud iam service-accounts add-iam-policy-binding ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Annotate the Kubernetes ServiceAccount
echo "Annotating Kubernetes ServiceAccount..."
kubectl annotate serviceaccount ${KSA_NAME} \
  --namespace=${NAMESPACE} \
  iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com --overwrite

# Verify the configuration
echo "Verifying Workload Identity Federation setup..."
kubectl get serviceaccount ${KSA_NAME} -n ${NAMESPACE} -o yaml
```
Now create a PersistentVolumeClaim for our model outputs. In Autopilot mode, Google Cloud manages the underlying storage classes, so we don’t need to create our own:
```bash
# Apply the PVC configuration
kubectl apply -f model-storage-pvc.yaml
```
Deploying the Training Job to GKE Autopilot
In Autopilot mode, we specify our GPU requirements using annotations and resource requests within the Pod template section of our Job definition. We'll create a Kubernetes Job that requests a single A100 40GB GPU; the full Job manifest (axolotl-training-job.yaml) is included in the repository.
Create a ConfigMap with our Axolotl configuration:
```bash
# Create the ConfigMap
kubectl create configmap axolotl-config --from-file=gemma3-melanoma.yaml -n ${NAMESPACE}
```
Create a Secret with Hugging Face credentials:
```bash
# Create a Secret with your Hugging Face token
# This token is required to access the Gemma 3 model from Hugging Face Hub
# Generate a Hugging Face token at https://huggingface.co/settings/tokens if you don't have one
kubectl create secret generic huggingface-credentials -n ${NAMESPACE} --from-literal=token=YOUR_HUGGING_FACE_TOKEN
```
Apply training job YAML to start the training process:
```bash
# Start training job
kubectl apply -f axolotl-training-job.yaml
```
Monitor the Training Process
Fetch the pod name to monitor progress:
```bash
# Get the pod name for the training job
POD_NAME=$(kubectl get pods -n ${NAMESPACE} --selector=job-name=gemma3-melanoma-training -o jsonpath='{.items[0].metadata.name}')

# Monitor logs in real-time
kubectl describe pod $POD_NAME -n ${NAMESPACE}
kubectl logs -f $POD_NAME -n ${NAMESPACE}
```
To visualize training metrics, you can also deploy TensorBoard and access it through its external IP:

```bash
# Deploy TensorBoard
kubectl apply -f tensorboard.yaml

# Get the external IP to access TensorBoard
kubectl get service tensorboard -n ${NAMESPACE}
```
Model Export and Evaluation Setup
After training completes, we need to export our fine-tuned model and evaluate its performance against the base model. First, let's export the model from our training environment to Cloud Storage.
After creating the model-export.yaml file, apply it:
```bash
# Export the model
kubectl apply -f model-export.yaml
```
This will start the export process, which copies the fine-tuned model from the Kubernetes PersistentVolumeClaim to your Cloud Storage bucket for easier access and evaluation.
Once exported, we have several options for evaluating our fine-tuned model. You can deploy both the base and fine-tuned models to their own respective Vertex AI Endpoints for systematic testing via API calls, which works well for high-volume automated testing and production-like evaluation. Alternatively, for exploratory analysis and visualization, a GPU-enabled notebook environment such as a Vertex AI Workbench instance or Colab Enterprise offers significant advantages, allowing for real-time visualization of results, interactive debugging, and rapid iteration on evaluation metrics.
In this example, we use a notebook environment to leverage its visualization capabilities and interactive nature. Our evaluation approach involves:
Loading both the base and fine-tuned models
Running inference on a test set of dermatological images from the SIIM-ISIC dataset
Computing standard classification metrics (accuracy, precision, recall, etc.)
Analyzing the confusion matrices to understand error patterns
Generating visualizations to highlight performance differences
For the complete evaluation code and implementation details, check out our evaluation notebook in the GitHub repository.
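As a simplified sketch of the metrics step, assuming you have already collected binary predictions from each model on the test set (the labels and predictions below are purely illustrative dummies):

```python
# Sketch: compare predictions from two models on the same test labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hypothetical ground truth and model outputs (1 = melanoma, 0 = benign).
y_true      = [1, 0, 0, 1, 0, 0, 1, 0]
y_base      = [1, 1, 1, 1, 1, 0, 1, 1]   # base model over-predicts melanoma
y_finetuned = [1, 0, 0, 0, 0, 0, 1, 0]   # tuned model is more balanced

for name, y_pred in [("base", y_base), ("fine-tuned", y_finetuned)]:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = tn / (tn + fp)
    print(
        f"{name}: accuracy={accuracy_score(y_true, y_pred):.3f} "
        f"precision={precision_score(y_true, y_pred):.3f} "
        f"recall={recall_score(y_true, y_pred):.3f} "
        f"specificity={specificity:.3f}"
    )
```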
Performance Results
Our evaluation demonstrated that domain-specific fine-tuning can transform a general-purpose multimodal model into a much more effective tool for specialized tasks like medical image classification. The improvements were significant across multiple dimensions of model performance.
The most notable finding was the base model’s tendency to over-diagnose melanoma. It showed perfect recall (1.000) but extremely poor specificity (0.011), essentially labeling almost every lesion as melanoma. This behavior is problematic in clinical settings where false positives lead to unnecessary procedures, patient anxiety, and increased healthcare costs.
Fine-tuning significantly improved the model’s ability to correctly identify benign lesions, reducing false positives from 3,219 to 1,438. While this came with a decrease in recall (from 1.000 to 0.603), the tradeoff resulted in much better overall diagnostic capability, with balanced accuracy improving substantially.
In our evaluation, we also included results from the newly announced MedGemma—a collection of Gemma 3 variants trained specifically for medical text and image comprehension recently released at Google I/O. These results further contribute to our understanding of how different model starting points affect performance on specialized healthcare tasks.
Below we can see the performance metrics across all three models:
Accuracy jumped from a mere 0.028 for base Gemma 3 to 0.559 for our tuned Gemma 3 model, representing an astounding 1870.2% improvement. MedGemma achieved 0.893 accuracy without any task-specific fine-tuning—a 3048.9% improvement over the base model and substantially better than our custom-tuned version.
While precision saw a significant 34.2% increase in our tuned model (from 0.018 to 0.024), MedGemma delivered a substantial 112.5% improvement (to 0.038). The most remarkable transformation occurred in specificity—the model’s ability to correctly identify non-melanoma cases. Our tuned model’s specificity increased from 0.011 to 0.558 (a 4947.2% improvement), while MedGemma reached 0.906 (an 8088.9% improvement over the base model).
These numbers highlight how fine-tuning helped our model develop a more nuanced understanding of skin lesion characteristics rather than simply defaulting to melanoma as a prediction. MedGemma’s results demonstrate that starting with a medically-trained foundation model provides considerable advantages for healthcare applications.
The confusion matrices further illustrate these differences:
Looking at the base Gemma 3 matrix (left), we can see it correctly identified all 58 actual positive cases (perfect recall) but also incorrectly classified 3,219 negative cases as positive (poor specificity). Our fine-tuned model (center) shows a more balanced distribution, correctly identifying 1,817 true negatives while still catching 35 of the 58 true positives. MedGemma (right) shows strong performance in correctly identifying 2,948 true negatives, though with more false negatives (46 missed melanoma cases) than the other models.
To illustrate the practical impact of these differences, let’s examine a real example, image ISIC_4908873, from our test set:
Disclaimer: Image for example case use only.
The base model incorrectly classified it as melanoma. Its rationale focused on general warning signs, citing its “significant variation in color,” “irregular, poorly defined border,” and “asymmetry” as definitive indicators of malignancy, without fully contextualizing these within broader benign patterns.
In contrast, our fine-tuned model correctly identified it as benign. While acknowledging a “heterogeneous mix of colors” and “irregular borders,” it astutely noted that such color mixes can be “common in benign nevi.” Crucially, it interpreted the lesion’s overall “mottled appearance with many small, distinct color variations” as being “more characteristic of a common mole rather than melanoma.”
Interestingly, MedGemma also misclassified this lesion as melanoma, stating, “The lesion shows a concerning appearance with irregular borders, uneven coloration, and a somewhat raised surface. These features are suggestive of melanoma. Yes, this appears to be malignant melanoma.” Despite MedGemma’s overall strong statistical performance, this example illustrates that even domain-specialized models can benefit from task-specific fine-tuning for particular diagnostic challenges.
These results underscore a critical insight for organizations building domain-specific AI systems: while foundation models provide powerful starting capabilities, targeted fine-tuning is often essential to achieve the precision and reliability required for specialized applications. The significant performance improvements we achieved—transforming a model that essentially labeled everything as melanoma into one that makes clinically useful distinctions—highlight the value of combining the right infrastructure, training methodology, and domain-specific data.
MedGemma’s strong statistical performance demonstrates that starting with a domain-focused foundation model significantly improves baseline capabilities and can reduce the data and computation needed for building effective medical AI applications. However, our example case also shows that even these specialized models would benefit from task-specific fine-tuning for optimal diagnostic accuracy in clinical contexts.
Next steps for your multimodal journey
By combining Google Cloud’s enterprise infrastructure with Axolotl’s configuration-driven approach, you can transform what previously required months of specialized development into weeks of standardized implementation, bringing custom multimodal AI capabilities from concept to production with greater efficiency and reliability.
For deeper exploration, check out these resources:
BigQuery provides a powerful platform for analyzing large-scale datasets with high performance. However, as data volumes and query complexity increase, maintaining operational efficiency is essential. BigQuery workload management provides comprehensive control mechanisms to optimize workloads and resource allocation, preventing performance issues and resource contention, especially in high-volume environments. And today, we’re excited to announce several updates to BigQuery workload management that make it more effective and easy to use.
But first, what exactly is BigQuery workload management?
At its core, BigQuery workload management is a suite of features that allows you to prioritize, isolate, and manage the execution of queries and other operations (aka workloads) within your BigQuery project. It provides granular control over how BigQuery resources are allocated and consumed, enabling you to:
Ensure critical workloads get the resources they need:
Reservations provide dedicated BigQuery slots, representing defined compute capacity.
Control and optimize cost with:
Slot commitments: Establish a predictable expenditure for BigQuery compute capacity in a specific Edition.
Spend-based commitments: Hourly spend-based commitments with one-year and three-year discount options for BigQuery compute that work across editions.
Auto-scaling, which allows reservations to dynamically adjust their slot capacity in response to demand fluctuations, operating within predefined parameters. This lets you accommodate peak workloads while preventing over-provisioning during periods of reduced activity.
Enjoy reliability and availability:
Dedicated reservations and commitments provide predictable performance for critical workloads by reducing resource contention.
Help ensure business continuity through managed disaster recovery, providing compute and data availability resilience.
Implementing BigQuery workload management is crucial for organizations seeking to maximize the efficiency, reliability, and cost-effectiveness of their cloud-based data analytics infrastructure.
Updates to BigQuery workload management
BigQuery workload management is focused on providing efficiency and control. The newest features and updates provide better resource allocation and optimized performance. Key improvements include reservation fairness for optimal slot distribution, reservation predictability for consistent performance, runtime reservation specification for flexibility, reservation labels for enhanced visibility, and autoscaler improvements for rapid and granular scalability.
Reservation fairness
Previously, using the fair-sharing method, BigQuery distributed capacity equally across projects. With reservation fairness, BigQuery prioritizes and allocates idle slots equally across all reservations within the same admin project, regardless of the number of projects running jobs in each reservation. Each reservation receives a similar share of available capacity in the idle slot pool, and then its slots are distributed fairly within its projects. Note: allocation assumes presence of demand. Idle slots are not allocated to reservations if no queries are running. This feature is only applicable to BigQuery Enterprise or Enterprise Plus editions, as Standard Edition does not support idle slots.
Figure 1: Project-based fairness
Configurations represent reservations with 0 baseline: The “Number” under the reservation is the total slots the projects in that reservation get through (Project) fair sharing. Note: Allocation assumes presence of demand. Idle slots are not allocated if no queries are running.
Figure 2: Reservation fairness enabled
Here, configurations represent reservations with 0 baseline: under each reservation, you can see the total slots the projects in that reservation get through (Reservation) fair-sharing. Note: Allocation assumes presence of demand. Idle slots are not allocated if no queries are running.
Reservation predictability
This feature allows you to set the absolute maximum number of consumed slots on a reservation, enhancing control over cost and performance fluctuations in your slot consumption. BigQuery offers baseline slots, idle slots, and autoscaling slots as potential capacity resources. When you create a reservation with a maximum size, confirm the number of baseline slots and the appropriate configuration of autoscaling and idle slots based on your past workloads. Note: To use predictable reservations, you must enable reservation fairness. Baselines are optional.
Reservation flexibility and securability
BigQuery lets you specify which reservation a query should run on at runtime. Enhanced flexibility and securability features provide greater control over resource allocation, including the ability to grant role-based access. You can specify a reservation at runtime using the CLI, UI, SQL, or API, overriding the default reservation assignment for your project, folder, or organization. The assigned reservation must be in the same region as the query you are running.
Reservation labels
When you add labels to your reservations, they are included in your billing data. This adds granular visibility into BigQuery slot consumption for specific workloads or teams, making tracking and optimization easier. You can then use these labels to filter your Cloud Billing data by the Analysis Slots Attribution SKU, giving you a powerful tool to track and analyze your spending on BigQuery slots based on the specific labels you have assigned.
Autoscaler improvements
Last but not least, the BigQuery autoscaler now delivers enhanced performance and adaptability for resource management. You get near-instant scale-up, finer granularity (50-slot increments instead of 100-slot increments), and faster scale-down. These features provide rapid capacity adjustments to meet workload demands, along with greater predictability and understanding of usage. The 50-slot increment also applies when setting baseline and reservation maximum capacities.
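To make this concrete, here is a hedged sketch of creating a reservation with a baseline and an autoscaling ceiling using the Python client library. The project, location, reservation ID, and slot counts are placeholders, and the field names (in particular autoscale.max_slots) should be verified against the current bigquery_reservation_v1 reference:

```python
# Sketch: create a reservation with a 100-slot baseline that can autoscale
# up to 200 additional slots. Field names should be checked against the
# bigquery_reservation_v1 client library reference; IDs below are placeholders.
from google.cloud import bigquery_reservation_v1 as reservation_v1

client = reservation_v1.ReservationServiceClient()
parent = "projects/my-admin-project/locations/US"   # placeholder admin project

reservation = reservation_v1.Reservation(
    slot_capacity=100,                                        # baseline slots
    autoscale=reservation_v1.Reservation.Autoscale(max_slots=200),  # autoscaling ceiling
)

created = client.create_reservation(
    parent=parent,
    reservation_id="analytics-reservation",
    reservation=reservation,
)
print(created.name)
```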
BigQuery workload management is an essential tool for optimizing both your performance and costs. By using reservations, spend-based commitments, and new features such as reservation predictability and fairness, you can significantly improve your data analysis performance. This leads to better data-driven decision-making by optimizing resource allocation and cutting costs, allowing your team to gain more meaningful insights from their data and experience consistent performance.
Today, we are excited to announce that Gartner® has named Google as a Leader in the 2025 Magic Quadrant™ for Data Science and Machine Learning Platforms report (DSML). We believe that this recognition is a reflection of continued innovations to address the needs of data science and machine learning teams, as well as new types of practitioners working alongside data scientists in the dynamic space of generative AI.
Download the complimentary 2025 Gartner Magic Quadrant™ for Data Science and Machine Learning Platforms.
AI is driving a radical transformation in how organizations operate, compete, and innovate. Working closely with customers, we’re delivering the innovations for a unified data and AI platform to meet the demands of the AI era, including data engineering and analysis, data science, MLOps, gen AI application and agent development tools, and a central layer of governance.
Unified AI platform with best-in-class multimodal AI
Google Cloud offers a wide spectrum of AI capabilities, starting from the foundational hardware like Tensor Processing Units (TPUs) to AI agents, and the tools for building them. These capabilities are powered by our pioneering AI research and development, and our expertise in taking AI to production with large-scale applications such as YouTube, Maps, Search, Ads, Workspace, Photos, and more.
All of this research and experience fuels Vertex AI, our unified AI platform for MLOps tooling and for predictive and gen AI use cases, which sits at the heart of Google's DSML offering. Vertex AI provides a comprehensive suite of tools covering the entire AI lifecycle, including data engineering and analysis tools, data science workbenches, MLOps capabilities for deploying and managing models, and specialized features for developing gen AI applications and agents. Moreover, our Self-Deploy capability enables our partners to not only build and host their models within Vertex AI for internal users, but also distribute and commercialize those models. Customer use of Vertex AI has grown 20x in the last year, driven by Gemini, Imagen, and Veo models.
Vertex AI Model Garden offers a curated selection of over 200 enterprise-ready models from Google like Gemini, partners like Anthropic, and the open ecosystem. Model Garden helps customers access the highest performing foundation models suited for their business needs and easily customize them with their own data, deploy to applications with just one click, and scale with end-to-end MLOps built-in.
Building on Google DeepMind research, we recently announced Gemini 2.5, our most intelligent AI model yet. Gemini 2.5 models are now thinking models, capable of reasoning (and showing its reasoning) before responding, resulting in dramatically improved performance. Transparent step-by-step reasoning is crucial for enterprise trust and compliance. We also launched Gemini 2.5 Flash, our cost-effective, low-latency workhorse model. Gemini 2.5 Flash will be generally available for all Vertex AI users in early June, with 2.5 Pro generally available soon after.
Vertex AI is now the only platform with generative media models across all modalities — video, image, speech, and music. At Google I/O, we announced several innovations in this portfolio, including the availability of Veo 3, Lyria 2, and Imagen 4 on Vertex AI. Veo 3 combines video and audio generation, taking content generation to a new level. The state-of-the-art model features improved quality when generating videos from text and image prompts. In addition, Veo 3 also generates videos with speech (dialogue and voice-overs) and audio (music and sound effects). Lyria 2, Google’s latest music generation model, features high-fidelity music across a range of styles. And Imagen 4, Google’s highest-quality image generation model, delivers outstanding text rendering and prompt adherence, higher overall image quality across all styles, and multilingual prompt support to help creators globally. Imagen 4 also supports multiple model variants to help customers optimize around quality, speed and cost.
All of this innovation resides on Vertex AI, so that AI projects can reach production and deliver business value while teams collaborate to improve models throughout the development lifecycle.
For instance, customers like Radisson Hotel Group have redefined personalized marketing with Google Cloud. Partnering with Accenture, the global hotel chain leveraged BigQuery, Vertex AI, Google Ads, and Google’s multimodal Gemini models to build a generative AI agent to help create locally relevant ad content and translate it into more than 30 languages — reducing content creation time from weeks to hours. This AI-driven approach has increased team productivity by 50%, boosted return on ad spend by 35%, and driven a 22% increase in ad-driven revenue.
A new era of multi-agent management
Eventually, we believe that every enterprise will rely on multi-agent systems, including those built on different frameworks or providers. We recently announced multiple enhancements to Vertex AI so you can build agents with an open approach and deploy them with enterprise-grade controls. This includes an Agent Development Kit (ADK), available for Python and Java, with an open-source framework for designing agents built on the same framework that powers Google Agentspace and Google Customer Engagement Suite agents. Many powerful examples and extensible sample agents are readily available in Agent Garden. You can also take advantage of Agent Engine, a fully managed runtime in Vertex AI that helps you deploy your custom agents to production with built-in testing, release, and reliability at global scale.
Connecting all your data to AI
Enterprise agents need to be grounded in relevant data to be successful. Whether helping a customer learn more about a product catalog or helping an employee navigate company policies, agents are only as effective as the data they are connected to. At Google Cloud, we do this by making it easy to leverage any data source. Whether it’s structured data in a relational database or unstructured content like presentations and videos, Google Cloud tools let customers easily use their existing data architectures as retrieval-augmented generation (RAG) solutions. With this approach, developers get the benefits of Google’s decades of search experience from out-of-the-box offerings, or can build their own RAG system with best-in-class components.
For RAG on an enterprise corpus, Vertex AI Search is our out-of-the-box solution that delivers high quality at scale, with minimal development or maintenance overhead. Customers who prefer to fully customize their solution can use our suite of individual components including the Layout Parser to prepare unstructured data, Vertex embedding models to create multimodal embeddings, Vertex Vector Search to index and serve the embeddings at scale, and the Ranking API to optimize the results. And RAG Engine provides an easy way for developers to orchestrate these components, or mix and match with third-party and open-source tools. BigQuery customers can also use its built-in vector search capabilities for RAG, or leverage the new connector with Vertex Vector Search to get the best of both worlds, by combining the data in BigQuery with a purpose-built high performance vector search tool.
Unified data and AI governance
With built-in governance, customers can simplify how they discover, manage, monitor, govern, and use their data and AI assets. Dataplex Universal Catalog brings together a data catalog and a fully managed, serverless metastore, enabling interoperability across Vertex AI, BigQuery, and open-source engines and formats such as Apache Spark and Apache Iceberg with a common metadata layer. Customers can also use a business glossary for a shared understanding of data and define company terms, creating a consistent foundation for AI.
At Google Cloud, we’re committed to helping organizations build and deploy AI and we are investing heavily in bringing new predictive and gen AI capabilities to Vertex AI. For more, download the full 2025 Gartner Magic Quadrant™ for Data Science and Machine Learning Platforms report.
Gartner Magic Quadrant for Data Science and Machine Learning Platforms – Afraz Jaffri, Maryam Hassanlou, Tong Zhang, Deepak Seth, Yogesh Bhatt, May 28, 2025
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Google.
GARTNER is a registered trademark and service mark of Gartner Inc., and/or its affiliates in the U.S and internationally, and MAGIC QUADRANT is a registered trademark of Gartner Inc., and/or its affiliates and are used herein with permission. All rights reserved.
Here's a common question that can feel confusing when building AI agents: how can you use the latest Gemini models and open-source frameworks like LangChain and LangGraph to create multimodal agents that can detect objects?
Detecting objects is critically important for use cases from content moderation to multimedia search and retrieval. LangChain provides tools to chain together LLM calls and external data, while LangGraph provides a graph structure for building more controlled and complex multi-agent apps.
In this post, we’ll show you which decisions you need to make to combine Gemini, LangChain and LangGraph to build multimodal agents that can identify objects. This will provide a foundation for you to start building enterprise use cases like:
Object identification: Using different sources of data to verify whether an object exists on a map
Multimedia search and retrieval: Finding files that contain a specific object
First decision: No-code/low-code, or custom agents?
The first decision enterprises have to make is whether to use no-code/low-code options or to build custom agents. If you are building a simple agent like a customer service chatbot, you can use Google's Vertex AI Agent Builder to build a simple agent in a few minutes, or start from pre-built agents that are available in the Google Agentspace Agent Gallery.
But if your use case requires orchestration of multiple agents and integration with custom tooling, you will have to build custom agents, which leads to the next question.
Second decision: What agentic framework to use?
It's hard to keep up with the many agentic frameworks out there releasing new features every week. Top contenders include CrewAI, Autogen, LangGraph, and Google's ADK. Some of them, like ADK and CrewAI, offer higher levels of abstraction, while others, like LangGraph, allow a higher degree of control.
That's why, in this blog, we center the discussion on building a custom agent using the open-source LangChain and LangGraph as the agentic framework, and Gemini 2.0 Flash as the LLM brain.
Code deep dive
This example code identifies an object in an image, an audio file, and a video. In this case, we will use a dog as the object to be identified. We have different agents (an image analysis agent, an audio analysis agent, and a video analysis agent) performing different tasks but all working together toward a common goal: object identification.
Generative AI workflow for object detection
This gen AI workflow entails a user asking the agent to verify if a specific object exists in the provided files. The Orchestrator Agent will call relevant worker agents: image_agent, audio_agent, and video_agent while passing the user question and the relevant files. Each worker agent will call respective tooling to convert the provided file to base64 encoding. The final finding of each agent is then passed back to the Orchestrator Agent. The Orchestrator Agent then synthesizes the findings and makes the final determination. This code can be used as a starting point template where you need to ask an agent to reason and make a decision or generate conclusions from different sources.
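To make that structure concrete, here is a pared-down sketch of the pattern with LangGraph and Gemini via LangChain. The state fields, prompts, model name, and file paths are illustrative assumptions, and only the image worker is shown; the audio and video agents follow the same shape.

```python
# Sketch of the orchestrator/worker pattern with LangGraph and Gemini.
# Assumes GOOGLE_API_KEY (or equivalent credentials) is configured.
import base64
from typing import TypedDict

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, END

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

class AgentState(TypedDict):
    question: str
    image_path: str
    image_finding: str
    decision: str

def image_agent(state: AgentState) -> dict:
    # Tool step: convert the provided file to base64 encoding, then ask Gemini.
    with open(state["image_path"], "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    msg = HumanMessage(content=[
        {"type": "text", "text": state["question"]},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ])
    return {"image_finding": llm.invoke([msg]).content}

def orchestrator(state: AgentState) -> dict:
    # Synthesize the worker findings into a final determination.
    prompt = (f"Question: {state['question']}\n"
              f"Image agent finding: {state['image_finding']}\n"
              "Give a final yes/no answer with a short justification.")
    return {"decision": llm.invoke(prompt).content}

graph = StateGraph(AgentState)
graph.add_node("image_agent", image_agent)
graph.add_node("orchestrator", orchestrator)
graph.set_entry_point("image_agent")   # audio and video agents would be added similarly
graph.add_edge("image_agent", "orchestrator")
graph.add_edge("orchestrator", END)
app = graph.compile()

result = app.invoke({"question": "Is there a dog in this file?",
                     "image_path": "dog.jpg",
                     "image_finding": "", "decision": ""})
print(result["decision"])
```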
If you want to create multiagent systems with ADK, here is a video production agent built by a Googler which generates video commercials from user prompts and utilizes Veo for video content generation, Lyria for composing music, and Google Text-to-Speech for narration. This example demonstrates the fact that many ingredients can be used to meet your agentic goals, in this case an AI agent as a production studio. If you want to try ADK, here is an ADK Quickstart to help you kick things off.
Third decision: Where to deploy the agents?
If you are building a simple app that needs to go live quickly, Cloud Run is an easy way to deploy your app. Just like any serverless web app, you can follow the same instructions to deploy on Cloud Run. Watch this video on building AI agents on Cloud Run. However, if you want a more enterprise-grade managed runtime with quality and evaluation, context management, and monitoring, Agent Engine is the way to go. Here is a quick start for Agent Engine. Agent Engine is a fully managed runtime that you can integrate with many of the previously mentioned frameworks, such as ADK, LangGraph, and CrewAI (see the image below, from the official Google Cloud docs).
Get started
Building intelligent agents with generative AI, especially those capable of multimodal understanding, is akin to solving a complex puzzle. Many developers are finding that a prototypical agentic build involves a LangChain agent with Gemini Flash as the LLM. This post explored how to combine the power of Gemini models with open-source frameworks like LangChain and LangGraph. To get started right away, use this ADK Quickstart or visit our Agent Development GitHub.
The latest version of the Bigtable Spark connector opens up a world of possibilities for Bigtable and Apache Spark applications, not least of which is additional support for Bigtable and Apache Iceberg, the open table format for large analytical datasets. In this blog post, we explore how to use the Bigtable Spark connector to interact with data stored in Bigtable from Apache Spark, and delve into powerful use cases that leverage Apache Iceberg.
The Bigtable Spark connector allows you to directly read and write Bigtable data using Apache Spark in Scala, SparkSQL, and DataFrames. This integration gives you direct access to your operational data for building data pipelines that support training ML models, ETL/ELT, or generating real-time dashboards. When combined with Bigtable Data Boost, Bigtable’s serverless compute service, you can get high-throughput read jobs on operational data without impacting Bigtable application performance. Apache Spark is commonly used as a processing engine for working with data lakehouses and data stored in open table formats, including Apache Iceberg. We’ve enhanced the Bigtable Spark connector for working with data across both Bigtable and Iceberg, including query optimizations such as join pushdowns and support for dynamic column filtering.
This opens up Bigtable and Apache Iceberg integrations for:
Accelerated data science: In the past, Bigtable developers and administrators had to generate datasets for analytics and move them out of Bigtable for analytical processing in tools like notebooks and PySpark. Now, data scientists can directly interact with Bigtable’s operational data within their Apache Spark environments using a combination of both Bigtable and Apache Iceberg data, streamlining data preparation, exploration, analysis, and even the creation of Iceberg tables. When combined with Data Boost, this can be done without any impact to production applications.
Low-latency serving: Write-back capabilities support making real-time updates to Bigtable. This means you can use Iceberg data to create predictions or features in batch and easily serve those features from Bigtable for low-latency online access within an end-user application.
To get started, follow the Quickstart or read on to learn more about the two use cases outlined above.
What the Bigtable Spark connector can do for you
Now, let’s take a look at some ways you could put the Bigtable Spark connector into service.
Accelerated data science
Bigtable is designed for throughput-intensive applications, offering throughput that can be adjusted by adding and removing nodes. If you are writing in batch over the Apache Spark connector, you can achieve even more throughput through the use of the spark.bigtable.batch.mutate.size option, which takes advantage of Bigtable’s mutation batching functionality.
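As a rough illustration, a batch write through the connector might look like the following PySpark sketch. The project, instance, table, and source file names are hypothetical, the connector JAR is assumed to be on the Spark classpath, and the catalog layout and spark.bigtable.* option names should be checked against the current connector documentation before use:

import json
from pyspark.sql import SparkSession

# Assumes the job is launched with the Bigtable Spark connector package on the classpath.
spark = SparkSession.builder.appName("bigtable-batch-write").getOrCreate()

# The catalog maps DataFrame columns to a Bigtable row key and column families.
catalog = json.dumps({
    "table": {"name": "vehicle_telemetry"},            # assumed to already exist
    "rowkey": "vehicle_id",
    "columns": {
        "vehicle_id": {"cf": "rowkey",  "col": "vehicle_id", "type": "string"},
        "speed":      {"cf": "metrics", "col": "speed",      "type": "double"},
        "engine_rpm": {"cf": "metrics", "col": "engine_rpm", "type": "long"},
    },
})

df = spark.read.parquet("gs://my-bucket/telemetry/")    # hypothetical source data

(df.write
   .format("bigtable")
   .option("catalog", catalog)
   .option("spark.bigtable.project.id", "my-project")
   .option("spark.bigtable.instance.id", "my-instance")
   # Larger batches lean on Bigtable's mutation batching for higher write throughput.
   .option("spark.bigtable.batch.mutate.size", "1000")
   .save())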
Throughput and queries per second (QPS) can be autoscaled, resized without any restarting, and the data is automatically replicated for high availability and faster region-specific access. There are also specialized data types that make it easy to build distributed counters, which can give you up-to-date metrics on what is happening in your system.
In contrast, Apache Iceberg is a high-performance, open-source table format for large analytical datasets. Iceberg lets you build analytics tables, often with aggregated data, that can be shared across engines such as Apache Spark and BigQuery.
Customers have found that event collection in Bigtable with advanced analytics of those events using Apache Spark and Apache Iceberg can be a powerful combination. For example, you may want to collect clicks, views, sensor readings, device usage, gaming activity, engagement, or other telemetry in real time, and have a view of what is happening in the system using Bigtable’s continuous materialized views. You might then use Apache Spark’s batch processing and ML capabilities and even join with historical Iceberg data to run advanced analytics and understand the trends over time, identify anomalies, or generate machine learning models on the data. When these advanced analytics in Apache Spark are done using a Data Boost application profile, this analysis can be done without impacting real-time data collection and operational analytics.
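Here is a hedged PySpark sketch of that pattern: scan recent events from Bigtable through a Data Boost app profile, then join them with a historical Iceberg table in SparkSQL. The project, instance, app profile, Iceberg catalog, and table names are all illustrative assumptions, and the spark.bigtable.* option names should be verified against the connector documentation:

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bigtable-iceberg-analytics").getOrCreate()

events_catalog = json.dumps({
    "table": {"name": "click_events"},
    "rowkey": "event_id",
    "columns": {
        "event_id": {"cf": "rowkey", "col": "event_id", "type": "string"},
        "user_id":  {"cf": "event",  "col": "user_id",  "type": "string"},
        "clicks":   {"cf": "event",  "col": "clicks",   "type": "long"},
    },
})

events = (spark.read
          .format("bigtable")
          .option("catalog", events_catalog)
          .option("spark.bigtable.project.id", "my-project")
          .option("spark.bigtable.instance.id", "my-instance")
          # A Data Boost app profile keeps this scan off the cluster serving live traffic.
          .option("spark.bigtable.app.profile.id", "data-boost-profile")
          .load())

events.createOrReplaceTempView("events")

# Join fresh operational events with historical Iceberg data to look at trends.
trends = spark.sql("""
    SELECT h.user_segment,
           SUM(e.clicks)       AS clicks_today,
           AVG(h.daily_clicks) AS clicks_baseline
    FROM events e
    JOIN iceberg_catalog.analytics.user_history h ON e.user_id = h.user_id
    GROUP BY h.user_segment
""")
trends.show()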
Low-latency serving: Bigtable for model serving of BigQuery Iceberg Managed Tables
Apache Iceberg provides an efficient way to combine and manage large datasets for machine learning tasks. By storing your data in Iceberg tables, multiple engines can write to the same warehouse and leverage Spark or BigQuery to train and evaluate the ML models. Once you have a trained model, you often need to publish feature tables or feature vectors into a low-latency database for online application access.
Bigtable is well suited for low-latency applications that require lookups against these large-scale datasets. Let’s say you have a dataset of customer transactions stored across multiple Iceberg tables. You can use SparkSQL to combine this data and SparkML to train a fraud detection model on this data. Once the model is trained, you can use it to predict the probability of fraud for new transactions. You can then write these predictions back to Bigtable using the Bigtable Spark connector, where they can be accessed by your fraud detection application.
Use case: Vehicle telemetry using Bigtable and the Apache Spark connector
Let’s look at an abbreviated example of how Bigtable and the Apache Spark connector might work together for a company that is tracking vehicle telemetry and wants to enable their fleet managers with immediate access to real-time KPIs of equipment effectiveness, while also allowing data scientists to build a predictive maintenance schedule that they can provide to drivers.
While this specific use case relies on vehicles as a case study, it is a generally applicable architecture pattern that can be used for a variety of telemetry and IoT use cases, ranging from measuring telecommunications equipment reliability to building KPIs for Overall Equipment Effectiveness (OEE) in a manufacturing operation.
Let’s take a look at the various components of this architecture.
Bigtable is an excellent choice for the high-throughput, low-latency writes that are often required for telemetry data, where vast amounts of data are continuously streamed in. With telemetry data, the data schema changes often, requiring the flexible schema that Bigtable provides. Bigtable clusters can be deployed throughout the globe with different autoscaling configurations that can match the local demand for writes. The ingested data is automatically replicated to all clusters, giving you a single unified view of the data. There are also open-source streaming connectors for both Apache Kafka and Apache Flink, as well as industry-specific connectors such as NATS for automotive data.
Bigtable continuous materialized views offer real-time data transformations and aggregations on streaming data, enabling vehicle managers to gain immediate insights into their fleet’s activity and make data-driven adjustments.
Keeping all data within Bigtable facilitates advanced analytics on historical information using Apache Spark. Data scientists can directly access this data in Apache Spark using the Bigtable Spark connector without needing to create copies. Furthermore, Bigtable Data Boost enables the execution of large batch or machine learning jobs, such as training predictive models or generating comprehensive reports, without impacting the performance of live applications. These jobs can involve joining streaming event data (e.g., real-time vehicle telemetry like GPS coordinates, speed, engine RPM, fuel consumption, or acceleration/braking patterns) with historical or static datasets stored in Apache Iceberg (e.g., vehicle master data including make, model, year, VIN, vehicle type, maintenance history, or driver assignments). Apache Iceberg may also include additional data sources such as weather and traffic analysis. This allows for richer insights, such as correlating specific driving behaviors with maintenance needs, predicting component failures based on operational data, or optimizing routes by combining real-time traffic with vehicle capacity and destination information. You can also provide analytics teams with secure Bigtable data access through Bigtable Authorized Views to limit data access to sensitive information like GPS.
Machine learning-driven insights, such as predictive maintenance recommendations that are often generated in batch processes and potentially stored in Iceberg tables, can be written back to Bigtable using the Bigtable Spark connector. This makes these valuable insights immediately accessible to user-facing applications.
Bigtable excels at high-scale reads in user-facing applications for this vehicle application thanks to its distributed architecture and design that’s optimized for massive, time-series data. It can handle billions of rows and thousands of columns. Bigtable can quickly retrieve this data with low latency because it distributes data across many nodes and performs fast, single-row lookups and efficient range scans, helping to ensure a smooth and responsive user experience even with millions of vehicles constantly streaming data.
Igniting the spark
The Bigtable Spark connector, combined with the recent connector enhancements for Apache Iceberg and Bigtable Data Boost, unlocks new possibilities for large-scale data processing on operational data. Whether you’re training ML models or performing serverless analytics, this powerful combination can help you implement new use cases and ease the operational burden of running complex ETL jobs. By leveraging the scalability, performance, and flexibility of these technologies, you can build robust and efficient data pipelines that can handle your most demanding workloads.
On Google Cloud, Dataproc Serverless simplifies running Apache Spark batch workloads by removing the need to manage clusters. When processing data via Bigtable’s serverless Data Boost, these jobs become highly cost-effective: you only pay for the precise amount of processing power you consume and solely for the duration your workload is running, without needing to configure any compute infrastructure.
For years, BigQuery has been synonymous with fully managed, fast, petabyte-scale analytics. Its columnar architecture and decoupled storage and compute have made it the go-to data warehouse for deriving insights from massive datasets.
But what about the moments between the big analyses? What if you need to:
Modify a handful of customer records across huge tables without consuming all your slots or running for minutes on end?
Track exactly how some data has evolved row by row?
Act immediately on incoming streaming data, updating records on the fly?
Historically, these types of “transactional” needs might have sent you searching for a database solution or required you to build complex ETL/ELT pipelines around BigQuery. The thinking was clear: BigQuery was for analysis, and you used something else for dynamic data manipulation.
That’s changing. At Google Cloud, we’ve been steadily evolving BigQuery, adding powerful capabilities that blur these lines and bring near-real-time, transactional-style operations directly into your data warehouse. This isn’t about turning BigQuery into a traditional OLTP database; rather, it’s about empowering you to handle common data management tasks more efficiently within the BigQuery ecosystem.
This shift means fewer complex workarounds, faster reactions to changing data, and the ability to build more dynamic and responsive applications right where your core data lives.
Today, we’ll explore three game-changing features that are enabling this evolution:
Efficient fine-grained DML mutations: Forget costly table rewrites for small modifications. Discover how BigQuery now handles targeted UPDATEs, DELETEs, and MERGEs with significantly improved performance and resource efficiency.
Change history support for updates and deletes: Go beyond simple snapshots. See how BigQuery can now capture the granular history of UPDATEs and DELETEs, providing a detailed audit trail of data within your tables.
Real-time updates with DML over streaming data: Don’t wait for data to settle. Learn how you can apply UPDATE, DELETE, and MERGE operations directly to data as it streams into BigQuery, enabling immediate data correction, enrichment, or state management.
Ready to see how these capabilities can simplify your workflows and unlock new possibilities within BigQuery? Let’s dive in and see them in action!
1. Efficient fine-grained DML mutations
BigQuery has supported Data Manipulation Language (DML) statements like UPDATE, DELETE, and MERGE for years, allowing you to modify data without recreating entire tables. However, historically, performing these operations — especially small, targeted changes on very large tables — was less efficient than you might have hoped for. The challenge? Write amplification.
When you executed a DML mutation, BigQuery needed to rewrite entire underlying storage blocks (think of them as internal file groups) containing the rows you modified. Even if your statement only affected a few rows within a block, the whole block might have needed to be rewritten. This phenomenon, sometimes called “write amplification,” could lead to significant slot consumption and longer execution times, particularly for sparse mutations (changes scattered across many different blocks in a large table). This sometimes made operations like implementing GDPR’s “right to be forgotten” by deleting specific user records slow or costly.
To address this, we introduced fine-grained DML in BigQuery, a set of performance enhancements that optimize sparse DML mutation operations.
When enabled, instead of always rewriting large storage blocks, BigQuery fine-grained DML can pinpoint and modify data with much finer granularity. It leverages optimized metadata indexes to rewrite only the necessary mutated data, drastically reducing the processing, I/O, and consequently, the slot time consumed for sparse DML. The result? Faster, more cost-effective DML, making BigQuery much more practical for workloads involving frequent, targeted data changes.
Grupo Catalana Occidente, a leading global insurance provider, is excited about fine-grained DML’s ability to help them integrate changes to their data in real time:
“In our integration project between Google BigQuery, SAP, and MicroStrategy, we saw an 83% improvement in DML query runtime when we enabled BigQuery fine-grained DML. Fine-grained DML allows us to achieve adequate performance and reduces the time of handling large volumes of data. This is an essential functionality for implementing the various data initiatives we have in our pipeline.” – Mayker Oviedo, Chief Data Officer, Grupo Catalana Occidente
Let’s quantify this improvement ourselves. To really see the difference, we need a large table where updates are likely to be sparse. We’ll use a copy of the bigquery-public-data.wikipedia.pageviews_2024 dataset, which contains approximately 58.7 billion rows and weighs in at ~2.4 TB.
(Important Note: Running the following queries involves copying a large dataset and processing significant amounts of data. This will incur BigQuery storage and compute costs based on your pricing model. Proceed with awareness if you choose to replicate this experiment.)
Step 1: Create the Table Copy
First, let’s copy the public dataset into our own project. We’ll also enable change history, which we’ll use later on.
code_block
-- Make a copy of the public 2024 Wikipedia page views table
CREATE OR REPLACE TABLE `my_dataset.wikipedia_pageviews_copy`
COPY `bigquery-public-data.wikipedia.pageviews_2024`;

-- Enable change history on your new table. We'll use this later.
ALTER TABLE `my_dataset.wikipedia_pageviews_copy`
SET OPTIONS(
  enable_change_history = TRUE
);
Step 2: Run Baseline UPDATE (without optimization)
Now, let’s perform a sparse update, modifying about 0.1% of the rows scattered across the table.
code_block
-- Baseline UPDATE: Modify ~0.1% of rows
UPDATE `my_dataset.wikipedia_pageviews_copy`
SET views = views + 1000
WHERE title LIKE '%Goo%'
  AND datehour IS NOT NULL;
Result: This update modified approximately 61.2 million records. In our test environment, without the optimization enabled, it took roughly 10 minutes and 49 seconds to complete and consumed ~787.3 million slot milliseconds.
Step 3: Enable fine-grained mutations
Next, we’ll enable the optimization using a simple ALTER TABLE statement.
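For example, you could issue it with the BigQuery Python client as in the minimal sketch below; treat the enable_fine_grained_mutations option name as an assumption to verify against the fine-grained DML documentation:

from google.cloud import bigquery

client = bigquery.Client()

# Assumed option name for enabling fine-grained DML on the table we copied earlier.
client.query("""
    ALTER TABLE `my_dataset.wikipedia_pageviews_copy`
    SET OPTIONS (enable_fine_grained_mutations = TRUE)
""").result()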
Let’s run a similar update, again modifying roughly 0.1% of the data.
code_block
-- Optimized UPDATE: Modify the same number of rows
UPDATE `my_dataset.wikipedia_pageviews_copy`
SET views = views - 999  -- Change the value slightly for a distinct operation
WHERE title LIKE '%Goo%'
  AND datehour IS NOT NULL;
Result: This time, the update (again affecting ~61.2 million sparse records) completed dramatically faster. It took only 44 seconds and consumed ~51.8 million slot milliseconds.
Now let’s compare the results:
Metric               | Baseline (No Optimization) | Optimized with fine-grained DML | Improvement Factor
Query execution time | 10 min 49 sec              | 44 sec                          | ~14.8x Faster
Slot Milliseconds    | ~787.3 million             | ~51.8 million                   | ~15.2x Less
Wow! Enabling fine-grained mutations resulted in a massive ~14.8x reduction in query time and a ~15.2x reduction in slot consumption! This illustrates how this optimization makes targeted DML operations significantly more performant and cost-effective on large BigQuery tables.
2. Tracking row-level history with the CHANGES TVF
Understanding how data evolves row by row is crucial for auditing, debugging unexpected data states, and building downstream processes that react to specific modifications. While BigQuery’s time travel feature lets you query historical snapshots of a table, it doesn’t easily provide a granular log of individual UPDATE, DELETE, and INSERT operations. Another feature, the APPENDS Table-Valued Function (TVF), only tracks additions, but not modifications or deletions.
Enter the BigQuery change history function, CHANGES TVF, which provides access to a detailed, row-level history of appends and modifications made to a BigQuery table. It allows you to see not just what data exists now, but how it got there — including the sequence of insertions, updates, and deletions.
It’s important to note that you must enable change history tracking on the table before the changes you want to query occur. BigQuery retains this detailed change history for a table’s configured time travel duration. By default, this is 7 days. Also, the CHANGES function can’t query the last ten minutes of a table’s history. Therefore, the end_timestamp argument value must be at least ten minutes prior to the current time.
To explore this further, let’s look at the changes we made to our Wikipedia pageviews table earlier. We’ll look for changes made to the Google Wikipedia article from January 1st, 2024.
code_block
-- Query the same Wikipedia pageviews table described above. Keep in mind this must run 10 min after
-- you ran the DML update above, and you must have already set enable_change_history to TRUE.
SELECT
  *
FROM
  CHANGES(TABLE `my_dataset.wikipedia_pageviews_copy`, NULL, TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 601 SECOND))
WHERE
  title LIKE "Google"
  AND wiki = "en"
  AND datehour = "2024-01-01"
ORDER BY _CHANGE_TIMESTAMP ASC
As you can see from the query results, there are two new pseudo columns within our table, _CHANGE_TYPE and _CHANGE_TIMESTAMP. The _CHANGE_TYPE column refers to the type of change that produced the row, while the _CHANGE_TIMESTAMP column indicates the commit time of the transaction that made the change.
Thus, parsing the changes made to the table, you can see:
Our table initially received an INSERT with this record’s views totaling 288. This resulted from the initial copy from the Wikipedia pageviews public dataset.
The table then simultaneously recorded an UPDATE and DELETE operation from our first DML statement, which added 1,000 views to the record. This is to reflect our original event of 288 views being deleted and replaced with an event showing 1,288 views.
Then finally, our table again simultaneously recorded an UPDATE and DELETE operation for our second DML. The delete was for the record with 1,288 views, and the update was for the final event, showing 289 views.
This detailed, row-level change tracking provided by the CHANGES TVF is incredibly powerful for building robust audit trails, debugging data anomalies by tracing their history, and even for building disaster recovery pipelines that replicate BigQuery changes to other systems in near real-time.
3. Real-time mutations: DML on freshly streamed data
BigQuery’s Storage Write API provides a high-throughput, low-latency way to stream data into your tables, making it immediately available for querying. This is fantastic for powering real-time dashboards and immediate analysis.
While the Storage Write API lets you instantly query this freshly streamed data, historically, you couldn’t immediately modify it using DML statements like UPDATE, DELETE, or MERGE. The incoming data first lands in a temporary, write-optimized storage (WOS) buffer, designed for efficient data ingestion. Before DML could target these rows, they needed to be automatically flushed and organized into BigQuery’s main columnar, read-optimized storage (ROS) by a background process. This optimization step, while essential for query performance, meant there was often a delay (potentially varying from minutes up to ~30 minutes or more) before you could apply corrections or updates via DML to the newest data.
That waiting period is no longer a hard requirement! BigQuery now supports executing UPDATE, DELETE, and MERGE statements that can directly target rows residing in write-optimized storage, before they are flushed to the columnar storage.
Why does this matter? This is a significant enhancement for real-time data architectures built on BigQuery. It eliminates the delay between data arrival and the ability to manipulate it within the warehouse itself. You can now react instantly to incoming events, correct errors on the fly, or enrich data as it lands, without waiting for background processes to complete or implementing complex pre-ingestion logic outside of BigQuery.
This capability unlocks powerful scenarios directly within your data warehouse like:
Immediate data correction: Did a sensor stream an obviously invalid reading? Or did an event arrive with incorrect formatting? Run an UPDATE or DELETE immediately after ingestion to fix or remove the bad record before it impacts real-time dashboards or downstream consumers.
Real-time enrichment: As events stream in, UPDATE them instantly with contextual information looked up from other dimension tables within BigQuery (e.g., adding user details to a clickstream event).
On-the-fly filtering/flagging: Implement real-time quality checks. If incoming data fails validation, immediately DELETE it or UPDATE it with a ‘quarantine’ flag.
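For instance, the on-the-fly filtering scenario might look like the following minimal sketch. It assumes rows were just streamed into a hypothetical my_dataset.sensor_events table via the Storage Write API (column names are illustrative), and it immediately deletes obviously invalid readings without waiting for the write-optimized buffer to flush:

from google.cloud import bigquery

client = bigquery.Client()

# DML can now target rows that are still in write-optimized (streaming) storage.
client.query("""
    DELETE FROM `my_dataset.sensor_events`
    WHERE (reading IS NULL OR reading < 0)   -- obviously invalid sensor values
      AND ingest_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE)
""").result()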
By enabling DML operations directly on data in the streaming buffer, BigQuery significantly shortens the cycle time for acting on real-time data, simplifying workflows and allowing for faster, more accurate data-driven responses.
BigQuery for dynamic data management
As we’ve explored, we’ve significantly expanded BigQuery’s capabilities beyond its traditional analytical strengths. Features like fine-grained DML, change history support for updates and deletes, and the ability to run DML directly on freshly streamed data represent a major leap forward.
While we’re not aiming to replace your specialized OLTP databases with BigQuery for high-volume, low-latency transactions, it’s undeniably becoming a far more versatile platform. These enhancements mean data practitioners can increasingly:
Perform targeted UPDATEs and DELETEs efficiently, even on massive tables
Track the precise history of data modifications for auditing and debugging
React to and modify streaming data in near real-time
All of this happens within the familiar, scalable, and powerful BigQuery environment you already use for analytics. This convergence simplifies data architectures, reduces the need for complex external pipelines, and enables faster, more direct action on your data.
Customers like Statsig, a leading product development company that enables its customers to build faster and make smarter decisions, can now use BigQuery for new use cases:
“BigQuery adding new features like fine-grained DML allows us to use BigQuery for more transactional use cases here at Statsig.” – Pablo Beltran, Staff Software Engineer, Statsig
So, the next time your project requires a blend of deep analysis and more dynamic data management, remember these powerful tools in your BigQuery toolkit.
Ready to learn more? Explore the official Google Cloud documentation.
Last month at Google Cloud Next ‘25, we announced MCP Toolbox for Databases to make it easier to connect generative AI agents to databases and automate core enterprise workflows. MCP Toolbox for Databases (Toolbox) is an open-source Model Context Protocol (MCP) server that allows developers to easily connect gen AI agents to enterprise data. It supports BigQuery, AlloyDB (including AlloyDB Omni), Cloud SQL for MySQL, Cloud SQL for PostgreSQL, Cloud SQL for SQL Server, Spanner, and self-managed open-source databases including PostgreSQL, MySQL, and SQLite, as well as databases from a growing list of other vendors including Neo4j, Dgraph, and more.
Today, we are announcing additional capabilities in Toolbox specifically designed to empower AI-assisted development. Toolbox now makes it easy to connect databases to AI assistants in your IDE.
MCP is an emerging open standard created by Anthropic for connecting AI systems with data sources through a standardized protocol, replacing today’s fragmented, custom integrations. Now, with Toolbox, any MCP-compatible AI assistant (including Claude Code, Cursor, Windsurf, Cline, and many more) can help you write application code that queries your database, design a schema for a new application, refactor code when the data model changes, generate data for integration testing, explore the data in your database, and much more.
Today, we’ll explore these new capabilities and how you can get started.
Using MCP with Google Cloud databases
As you carry out AI-assisted tasks like code generation, code refactoring, code completion, automated testing, and documentation writing using AI-native IDEs like Claude Code, Cursor, Windsurf or established IDEs such as VSCode, you’re probably looking for the most efficient way to connect with your data. Let’s see how this can be done with MCP Toolbox and Google Cloud databases.
Toolbox’s new pre-built tools enable you to integrate with Cloud SQL, AlloyDB, Spanner, and BigQuery, or with your self-managed PostgreSQL database, all directly within your preferred IDE. And since every application manages data in some capacity, Toolbox’s new capabilities unlock new opportunities to automate the software development process.
AI-assisted development connected to your database
Let’s see how a developer uses these new tools to accelerate their work:
Sara has recently joined a development team that maintains an e-commerce application. She has access to the source code and the Google Cloud SQL for PostgreSQL development database. She uses Cline, an open source AI assistant that can be integrated with the VS Code IDE. Sara quickly sets up Toolbox and connects it to Cline and the database.
Next, Sara explores the database to understand how the information is structured and how it can be queried. She doesn’t need to know the SQL syntax or remember the nuances of PostgreSQL; Cline can handle this for her, looking up metadata about the database and then seamlessly connecting to it to run the queries. Sara can simply ask questions in plain English and Cline can bring her answers.
Until now, she would have had to write complex SQL queries and remember specific table schemas just to get answers. For example, to find the last three orders she needs to know the correct table and write a SELECT query; to count the open orders along with their product type and purchase date, she needs another query that joins the orders table with the items table. The SQL quickly becomes more complex.
Now, she can use these simple natural language prompts and AI can handle the rest for her.
NL prompts
code_block
List all tables

How many open orders are there? List the product type and purchase date

For items delivered last year, what is their current inventory quantity?
After just a few minutes, Sara has a good understanding of the data in the database. She’s ready for her first assignment.
Now, Sara’s team has been asked to integrate vendor management features into their system, so Sara turns to Cline and asks it to set up a new ‘vendors‘ table with columns for id, business name, address, city, state, email, and phone. She also needs to add a vendor ID column to the ‘inventory’ table and set up an appropriate index. Once again, Sara doesn’t need to write SQL or code for these tasks; she just instructs Cline, which figures out how to make these changes to the database and executes them via Toolbox.
Until now, a change like adding vendor information meant a cascade of manual updates: writing SQL for table creation (e.g., for ‘vendors‘ with all its columns), altering existing tables (like ‘inventory‘ to add a vendor_id and an index), updating model classes in her application code, and finally ensuring her InventoryDAO tests were still valid and covered the new structure.
Now, Sara can achieve all of this with a few simple natural language prompts:
NL prompts
code_block
Set up a new ‘vendors’ table with columns for id, business name, address, city, state, email, and phone.

Modify the ‘inventory’ table: add a vendor_id column and make sure it’s indexed.

Reflect these database changes in the application’s model classes.

And can you also update the tests for the InventoryDAO?
Because Cline has access to the database via Toolbox, it has full context of the revised schema and can make the code changes accordingly. Finally, Sara asks Cline to update the tests for the InventoryDAO class. The tests pass, Sara reviews the changes and checks them in.
A task that might have taken a day or more for a new developer to figure out and implement – even for a developer familiar with PostgreSQL syntax – has been finished in minutes. Sara has completed her first task for her new team and it’s not even lunchtime yet!
Getting started
These expanded capabilities within MCP Toolbox signify our ongoing commitment to providing you with powerful and intuitive tools that accelerate the database development lifecycle and unlock the potential of AI-assisted workflows.
Learn more about Toolbox, connect it to your favorite AI-assisted coding platform, and experience the future of AI-accelerated, database-connected software development today.
In today’s cloud environments, security teams need more than just surface-level visibility; they require actionable insight to ensure that their cloud workloads are safe. Unlike third-party cloud security tools that rely on data available via public APIs, Security Command Center (SCC) is built directly into Google Cloud. This gives us unmatched visibility into the safety of cloud workloads and the ability to orchestrate fixes when necessary.
We are using this unique vantage point to further enhance the ability of Security Command Center to protect customers’ Google Cloud environments. Here are four new capabilities designed to help security teams do just that:
Simplify vulnerability management: Introducing agentless scanning for Compute Engine and GKE
Exploiting software vulnerabilities is a frequently observed initial infection vector in cyber attacks. According to M-Trends 2025, exploited vulnerabilities accounted for 33% of initial infection vectors.
For security teams, proactively identifying and remediating these vulnerabilities is crucial, yet traditional agent-based software scanning can introduce significant overhead and deployment headaches.
Security Command Center now offers a powerful alternative: vulnerability scanning for Google Compute Engine and Google Kubernetes Engine (GKE), without the requirement to deploy and manage software on each asset. This new capability, available in preview, allows your team to discover software and OS vulnerabilities in virtual machine instances, GKE Kubernetes objects, and GKE clusters — at no additional charge.
Three key benefits of agentless vulnerability scanning include:
Reduce operational overhead: Eliminates agent deployment, configuration, updates, and potential performance impact, helping to simplify security workflows
Expand coverage: Scans virtual machines (VMs) even where agent installation is challenging or restricted, and when unauthorized VMs are provisioned by an adversary.
Maintain data residency: Respects Google Cloud environment boundaries you’ve established for scan results and data.
Security Command Center displays detailed vulnerability information.
Security Command Center also enriches the vulnerability report with data from Google Threat Intelligence, derived from defending billions of users and spending hundreds of thousands of hours investigating incidents. Insights include the impact and exploitability of each identified vulnerability, which are then aggregated. Overall findings are presented in a visual heat map to help security teams gain a better understanding of the threat landscape — and which vulnerabilities should be prioritized for remediation.
Security Command Center’s vulnerability heat map.
Find vulnerabilities in container images with Artifact Analysis integration
In today’s cloud-native world, container images are the building blocks of modern applications. Ensuring these images are free from known software vulnerabilities is a critical first line of defense. Security Command Center now supports vulnerability scanning for container images by integrating results from Google Cloud’s Artifact Analysis service.
For Security Command Center Enterprise customers, Artifact Registry scans are now included at no additional cost. This means customers can get alerted to vulnerabilities in their container images when they are deployed to a GKE cluster, Cloud Run, or App Engine as part of their SCC Enterprise subscription — enabling vulnerability management without additional costs.
At the heart of the service is automated integration. Images are stored in Artifact Registry and then scanned by Artifact Analysis to identify known vulnerabilities in both operating system and software packages.
Any image that has been scanned in Artifact Registry will be associated with the container image version deployed to a GKE cluster, Cloud Run job or service, or App Engine instance, and have its vulnerability data linked directly. This can help ensure that the findings you see in the Security Command Center risk dashboard are relevant to your active deployments.
Security Command Center shows known vulnerabilities in Cloud Run images.
The integration allows security teams to directly view potential vulnerabilities in their deployed container images alongside all other Google Cloud security findings, and discover broader risks that could result from exploitation using virtual red teaming. This consolidated view simplifies risk assessment, streamlines remediation, and also can help reduce alert fatigue and tool sprawl.
Security Command Center integration with Artifact Analysis is now generally available.
Secure your serverless applications: Threat detection for Cloud Run
Serverless computing platforms like Google Cloud Run allow organizations to build applications and websites without needing to manage the underlying infrastructure.
Security Command Center now integrates threat detection for Cloud Run services and jobs, available in preview. It employs 16 specialized detectors that continuously analyze Cloud Run deployments for potentially malicious activities. This scope of detection is not possible with third-party products, and includes:
Behavioral analysis, which can identify activities such as the execution of unexpected binaries, connections to known malicious URLs, and attempts to establish reverse shells.
Malicious code detection, which can detect known malicious binaries and libraries used at runtime.
NLP-powered analysis, which uses natural language processing techniques to analyze Bash and Python code-execution patterns for signs of malicious intent.
Control plane monitoring, which analyzes Google Cloud Audit Logs (specifically IAM System Event and Admin Activity logs) to identify potential security threats, such as known cryptomining commands executed in Cloud Run jobs, or the default Compute Engine service account being used to modify a Cloud Run service’s IAM policy, which could indicate a post-exploit privilege escalation attempt.
This layered detection strategy provides comprehensive visibility into potential threats targeting your Cloud Run applications, from code execution to control plane activities.
Uncover network anomalies with foundational log analysis
Because Security Command Center is built into the Google Cloud infrastructure, it has direct, first-party access to log sources that can be analyzed to find anomalous and malicious activity. For instance, Security Command Center can automatically detect connections to known bad IP addresses — public IPs flagged for suspicious or malicious behavior by Google Threat Intelligence — by analyzing this internal network traffic.
Now generally available, this built-in capability offers a distinct advantage. While third-party cloud security products require customers to undertake the costly and complex process of purchasing, ingesting, storing, and analyzing VPC Flow Logs (often at additional expense) to gain similar network insights, Security Command Center provides this critical analysis natively and without having to export logs.
Take the next step
To evaluate Security Command Center capabilities and explore subscription options, please contact a Google Cloud sales representative or authorized Google Cloud partner. You can also learn how to activate Security Command Center here.
Organizations are increasingly relying on diverse digital communication channels for essential business operations. The way employees interact with colleagues, access corporate resources, and, especially, receive information technology (IT) support is often conducted through calls, chat platforms, and other remote technologies. While these methods enhance both efficiency and global accessibility, they also introduce an expanded attack surface that can pose a significant risk if overlooked. The prevalence of in-person social interaction has diminished, and remote IT structures, such as outsourced service desks, have normalized employees’ engagement with external or less familiar personnel. As a result, threat actors continue to use social engineering tactics.
Vishing in the Wild: A Tale of Two Actors
Social engineering is the psychological manipulation of people into performing unsolicited actions or divulging confidential information. It is an effective strategy that preys on human emotions and built-in vulnerabilities like trust and the desire to be helpful. Financially motivated threat actors have increasingly adopted voice-based social engineering, or “vishing,” as a primary vector for initial access, though their specific methods and end goals can vary significantly.
Two prominent examples illustrate the versatility of this threat. The cluster tracked as UNC3944 (which overlaps with “Scattered Spider”) has historically used vishing as a flexible entry point for a range of criminal enterprises. Their operators frequently call corporate service desks, impersonating employees to have credentials and multi-factor authentication (MFA) methods reset. This access is then leveraged for broader attacks, including SIM swapping, ransomware deployment, and data theft extortion.
More recently, the financially motivated actor UNC6040 has demonstrated a different vishing playbook. Its operators also impersonate IT support, but with the specific goal of deceiving employees into navigating to Salesforce’s connected app page and authorizing a malicious, actor-controlled version of the Data Loader application. This single action grants the actor the ability to perform large-scale data exfiltration from the victim’s Salesforce environment, which is then used for subsequent extortion attempts. While both actors rely on vishing, their distinct objectives—UNC3944’s focus on account takeover for broad network access versus UNC6040’s targeted theft of CRM data—highlight the diverse risks organizations face from this tactic.
By reviewing the techniques, tactics, and procedures (TTPs) of actors like UNC3944 and UNC6040, organizations can better assess their own internal policies and guidelines when it comes to employee identification and protection of infrastructure and confidential data. Red teamers can also learn from their methodologies to better emulate real-world attacks and assist organizations in developing defense-in-depth strategies.
Mandiant has successfully used the following approaches to perform voice-based social engineering during Red Team Assessments for clients of varying sizes. The described techniques have enabled Mandiant to mimic TTPs from sophisticated vishing actors like UNC3944 and UNC6040, resulting in administrative-level user impersonation, corporate network perimeter breaches, and sensitive data access. Mandiant has additionally convinced multiple service desks to reset credentials and alter several forms of MFA. These simulated incidents have empowered organizations to proactively identify and resolve deficiencies that otherwise may have gone unnoticed and potentially exploited by a real threat actor.
Open-Source Intelligence Gathering (OSINT)
Effective social engineering campaigns are built upon extensive reconnaissance. The amount of information an attacker can source about corporate culture, employees, policies, procedures, and technologies in use directly impacts the maturity of a phishing scenario’s development. A thorough search to provide a comprehensive overview of an organization from an outside perspective would include, but is not limited to, discovery of the following items:
Network ranges and IP address space
Top-level domains and subdomains
Cloud service providers and email infrastructure
Internet-accessible and internally used web applications
Code repositories
Corporate phone numbers and email address formats
Employee positions and titles
Physical office locations
Publicly exposed internal documentation
Much of this information can often be found through publicly accessible resources. Company websites and marketing materials often list corporate contact information, including numbers for main lines, specific departments, or even individual employees. Social media platforms provide another means of profiling an organization. Professional networking services can be utilized to scrape the full names of employees and recreate corporate emails matching discovered naming conventions. Resumes shared on these platforms may also contain additional contact information including phone numbers and personal email addresses. Attackers may attempt to elicit private information by sending messages to employees from disposable email accounts, aiming to retrieve details through direct interaction or from out-of-office auto-replies. Additionally, public forums, where employees might seek troubleshooting assistance, can inadvertently reveal company-specific details.
Search engines, such as Google, DuckDuckGo, and Bing, provide advanced filtering capabilities to narrow results from targeted queries based on keywords, file types, and other parameters. Figure 1 includes an example of a search filter designed to uncover sensitive files for a given target that may be unknowingly exposed.
“TARGET” filetype:pdf | filetype:doc | filetype:docx | filetype:xls |
filetype:xlsx | filetype:ppt | filetype:pptx intext:"confidential" |
intext:"internal use only" | intext:"not for public release" |
intext:"restricted access"
Figure 1: Searching for documents with search filters
Anonymity networks, like The Onion Router (TOR), can be used to access hidden services, obtain restricted content, and identify supplemental data such as leaked employee IDs, usernames, passwords, and personally identifiable information (PII).
The internet offers a vast array of resources, and a good amount of intelligence can be discovered without any overt interaction with your target.
Leveraging Automated Phone Services
Some organizations make use of automated phone systems that have pre-recorded messages and interactive menus. These systems can provide callers with business-related information, facilitate employee self-service, or route calls to appropriate departments. If not found online, an attacker may attempt to obtain the phone number for an automated service by contacting an employee, often at a reception desk, claiming to have misplaced the number. Calling into these automated services allows an attacker to anonymously identify common issues faced by end users, names of internal applications, additional phone numbers for specific support teams, and, occasionally, alerts about company-wide technical issues. This type of information can be used to craft pretexts for subsequent activity that involves impersonating IT support.
Discovering Employee Identification Processes
Actors engaged in voice-based social engineering ultimately aim to interact with a human operator. While some automated systems provide a direct option to speak with a live agent, others can require some initial information to be provided, such as an employee ID. However, even in these cases, it is common for repeated incorrect entries to result in the transfer to a live agent anyway. Service desk agents handle a high volume of inbound calls ranging from internal employees needing a password reset to external customers experiencing problems with a public-facing application. They are generally given a scripted process for call handling including information they need to request from the caller for identification as well as where to escalate if they are unable to address the issue directly.
During the reconnaissance phase in social engineering a service desk, an attacker may feign ignorance or push boundaries of information disclosure before a requirement for identification is enforced. It is also important for an attacker to take note of how service desk personnel react to incorrect or insufficient information being provided. For example, an attacker may provide an employee ID with an incorrect associated name to observe the response, potentially eliciting the correct full name or determining the validity of the employee ID format. Attackers may also call at different times to converse with varying staff members, use different voice modulations to conceal repeated reconnaissance attempts, and iteratively learn more about the service desk’s identification process each time.
Alternatively, once a service desk number has been identified, an attacker can better target standard employees directly. Using publicly available resources, attackers can spoof the inbound number of a phone call to match that of the legitimate service desk. Without a procedure for verifying inbound callers claiming to be from IT, unsuspecting targets may be convinced by threat actors to perform actions that grant account access or divulge information that can be used to better impersonate staff.
Crafting a Convincing Narrative
With sufficient reconnaissance data, an attacker can formulate targeted campaigns reflecting plausible employee scenarios. A common pretext for contacting a service desk is a forgotten password. Many organizations verify employees using multiple factors. While initial reconnaissance might provide an attacker with answers for knowledge-based authentication methods, challenges arise if device-based verification is required. An attacker might impersonate an employee who claims their phone is unavailable (e.g., damaged or lost during travel) and who needs urgent account access. Another common practice is for actors to impersonate employees identified as being on personal time off (PTO) via out-of-office replies, leveraging a sense of urgency to persuade service desk personnel. Responses to such situations can vary, especially for executive-level users. In the event of a successful MFA reset, the attacker can then call back and try to get a different agent on the phone to further reset the impersonated user’s password for a full account compromise. If the legitimate employee is genuinely unavailable, unauthorized account access can persist for an extended period of time.
The Evolution of an Exploit
The compromise of a single account can serve as a foundation for more complex social engineering campaigns. Breaching the perimeter of an organization often grants an attacker access to internal workflows, chats, documents, meeting invites, and ways to better uncover verified intelligence on existing employees. Open-source tools such as ROADrecon can extract details from entire Entra ID tenants, potentially revealing phone numbers, employee IDs, and organizational hierarchy. Attackers may also seek access to IT ticketing systems and support channels to impersonate service desk staff to end-users who have open requests. The more information an attacker possesses, the more believable their pretext becomes, increasing the probability of success.
Strategic Recommendations and Best Practices
Modern features in mobile technology, such as AI-powered Scam Detection on Android, demonstrate how software may be able to offer personal protection, but a comprehensive defense for organizations against vishing and related social engineering threats requires broad, proactive security initiatives and a defense-in-depth strategy. Mandiant recommends organizations consider the following best practices to reinforce their external perimeter and develop secure communication channels, particularly those involving IT support and employee verification.
Positive Identity Verification for Service Desk Interactions
Train service desk personnel to rigorously perform positive identity verification for all employees before modifying accounts or providing security-sensitive information (including during initial enrollment). This is critical for any privileged accounts.
Mandated verification methods should include options such as:
On-camera/video conference verification where the employee presents a corporate badge or government-issued ID
Utilization of an internal, up-to-date employee photo database
Challenge/response questions based on information not easily discoverable externally (avoiding reliance on publicly available PII like date of birth or the last four digits of a Social Security number, as actors often possess this data)
For high-risk changes, such as MFA resets or password changes for privileged accounts, implement out-of-band verification (e.g., a call-back to a registered phone number or confirmation via a known corporate email address of the employee or their manager).
During periods of heightened threat or suspected compromise, consider temporarily disabling self-service password or MFA reset methods and routing all such requests through a manual service desk workflow with enhanced scrutiny.
Enforce Strong, Phishing-Resistant MFA
MFA should be enforced on all sensitive and internet-facing portals to prevent unauthorized access even in the event of a password compromise.
Standardize on one primary MFA solution for most employees to simplify the security architecture and centralize detections and alerts on a single platform.
Remove weak forms of MFA, such as SMS, voice calls, or simple email links, as primary authentication factors. These are susceptible to vishing, SIM swapping, and other attacks.
Prioritize phishing-resistant MFA methods:
FIDO2-compliant security keys (hardware tokens), especially for administrative and privileged users
Authenticator applications providing number matching or robust geo-verification features
Soft-tokens that are not reliant on easily intercepted channels
Ensure administrative users cannot register or use legacy/weak MFA methods, even if those are permitted for other user tiers.
Secure MFA Registration and Modification Processes
Do not permit employees to self-register new MFA devices without stringent controls. Implement an IT-managed or otherwise secure enrollment process.
Restrict MFA registration and modification actions to only be permissible from trusted IP locations and/or compliant corporate devices.
Alert on and investigate suspicious MFA registration activities, such as the same MFA method or phone number being registered across multiple user accounts.
Manager Involvement and Segregation of Duties
Service desks should notify managers (via verified contact channels sourced from internal directories) upon an employee’s password reset, especially for sensitive accounts.
Require manager approval, through a verified channel, for all MFA resets. This creates third-party awareness and an additional record.
For larger organizations, consider segregating service desk responsibilities. Customer-facing support desks should generally not have permissions to modify internal corporate employee accounts.
Employee Training and Vishing Awareness
Conduct regular phishing simulation exercises that include vishing scenarios to educate employees about the specific risks of voice-based social engineering.
Train employees to always verify unexpected calls or requests for sensitive information, especially those claiming to be from IT support or other internal departments, by using an official internal directory to initiate a call-back or by contacting their manager.
Train employees to recognize common vishing pretexts (e.g., urgent requests to avoid negative consequences, claims of system issues requiring immediate action, unexpected MFA prompts).
Equip service desk employees with access to logs of previous calls and tickets to help identify abnormal patterns, such as repeated calls from unrecognized numbers or sequential MFA reset and password reset requests for the same user.
Security Monitoring and Alerting for Vishing-Related Activity
Utilize security information and event management (SIEM) and security orchestration, automation, and response (SOAR) technologies to monitor employee sign-in activity and service desk interactions.
Create specific alerts for the following:
Password reset activity, particularly for privileged accounts or outside of expected patterns
New MFA device enrollment or modification of existing MFA methods
Multiple failed login attempts followed by a successful password or MFA reset
All activities flagged as abnormal should be reviewed by an internal security team and investigated with the impacted employee and their manager.
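To make the last correlation concrete, here is a minimal, hedged sketch of how such an alert could be expressed as a query over sign-in and credential-change logs exported to BigQuery. The dataset, table, and field names (identity_logs.sign_in_events, identity_logs.credential_changes) are illustrative assumptions; the same logic can be ported to whichever query language your SIEM uses.
```
# Illustrative only: surface accounts with three or more failed sign-ins in the
# hour preceding a password reset or new MFA enrollment. Adapt table and field
# names to your own log export schema.
bq query --use_legacy_sql=false '
SELECT
  r.user_email,
  r.change_type,
  r.change_time,
  COUNT(f.event_time) AS recent_failures
FROM `identity_logs.credential_changes` AS r
JOIN `identity_logs.sign_in_events` AS f
  ON f.user_email = r.user_email
  AND f.status = "FAILED"
  AND f.event_time BETWEEN TIMESTAMP_SUB(r.change_time, INTERVAL 1 HOUR) AND r.change_time
WHERE r.change_type IN ("PASSWORD_RESET", "MFA_DEVICE_ENROLLED")
GROUP BY r.user_email, r.change_type, r.change_time
HAVING recent_failures >= 3'
```
Results from a query like this can feed a SOAR playbook that opens a ticket for the impacted employee and their manager, consistent with the review process described above.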
Further guidance on hardening against UNC3944-style threats, including broader identity, endpoint, and network infrastructure recommendations, is detailed by the Google Threat Intelligence Group (GTIG).
Conclusion
This discussion of voice-based social engineering and its proposed resolutions aims to provide insight into attack methodologies and preventative measures relevant to this threat vector. Organizations seeking direct support on this subject or other services related to attack simulation and red team exercises are encouraged to contact Mandiant for assistance. Mandiant can discuss specific needs in detail and explore tailored recommendations to better equip security postures against advanced and persistent threats.
Google Threat Intelligence Group (GTIG) is tracking UNC6040, a financially motivated threat cluster that specializes in voice phishing (vishing) campaigns specifically designed to compromise organizations’ Salesforce instances for large-scale data theft and subsequent extortion. Over the past several months, UNC6040 has demonstrated repeated success in breaching networks by having its operators impersonate IT support personnel in convincing telephone-based social engineering engagements. This approach has proven particularly effective in tricking employees, often within English-speaking branches of multinational corporations, into actions that grant the attackers access or lead to the sharing of sensitive credentials, ultimately facilitating the theft of those organizations’ Salesforce data. In all observed cases, attackers relied on manipulating end users, not exploiting any vulnerability inherent to Salesforce.
A prevalent tactic in UNC6040’s operations involves deceiving victims into authorizing a malicious connected app to their organization’s Salesforce portal. This application is often a modified version of Salesforce’s Data Loader, not authorized by Salesforce. During a vishing call, the actor guides the victim to visit Salesforce’s connected app setup page to approve a version of the Data Loader app with a name or branding that differs from the legitimate version. This step inadvertently grants UNC6040 significant capabilities to access, query, and exfiltrate sensitive information directly from the compromised Salesforce customer environments. This methodology of abusing Data Loader functionalities via malicious connected apps is consistent with recent observations detailed by Salesforce in their guidance on protecting Salesforce environments from such threats.
In some instances, extortion activities haven’t been observed until several months after the initial UNC6040 intrusion activity, which could suggest that UNC6040 has partnered with a second threat actor that monetizes access to the stolen data. During these extortion attempts, the actor has claimed affiliation with the well-known hacking group ShinyHunters, likely as a method to increase pressure on their victims.
Figure 1: Data Loader attack flow
UNC6040
GTIG is currently tracking a significant portion of the investigated activity as UNC6040. UNC6040 is a financially motivated threat cluster that accesses victim networks through voice phishing social engineering. Upon obtaining access, UNC6040 has been observed immediately exfiltrating data from the victim’s Salesforce environment using Salesforce’s Data Loader application. Following this initial data theft, UNC6040 was observed moving laterally through the victim’s network, accessing and exfiltrating data from other platforms such as Okta, Workplace, and Microsoft 365.
Attacker Infrastructure
UNC6040 utilized infrastructure to access Salesforce applications that also hosted an Okta phishing panel. This panel was used to trick victims into visiting it from their mobile phones or work computers during the social engineering calls. In these interactions, UNC6040 also directly requested user credentials and multifactor authentication codes to authenticate and add the Salesforce Data Loader application, facilitating data exfiltration and subsequent lateral movement.
Alongside the phishing infrastructure, UNC6040 primarily used Mullvad VPN IP addresses to access victims’ Salesforce environments and other services on their networks and to perform data exfiltration.
Overlap with Groups Linked to “The Com”
GTIG has observed infrastructure across various intrusions that shares characteristics with elements previously linked to UNC6040 and threat groups suspected of ties to the broader, loosely organized collective known as “The Com“. We’ve also observed overlapping tactics, techniques, and procedures (TTPs), including social engineering via IT support, the targeting of Okta credentials, and an initial focus on English-speaking users at multinational companies. It’s plausible that these similarities stem from associated actors operating within the same communities, rather than indicating a direct operational relationship between the threat actors.
Data Loader
Data Loader is an application developed by Salesforce, designed for the efficient import, export, and update of large data volumes within the Salesforce platform. It offers both a user interface and a command-line component, the latter providing extensive customization and automation capabilities. The application supports OAuth and allows for direct “app” integration via the “connected apps” functionality in Salesforce. Threat actors abuse this by persuading a victim over the phone to open the Salesforce connected apps setup page and enter a “connection code,” thereby linking the actor-controlled Data Loader to the victim’s environment.
Figure 2: The victim needs to enter a code to connect the threat actor controlled Data Loader
Modifications
In some of the intrusions using Data Loader, threat actors utilized modified versions of Data Loader to exfiltrate Salesforce data from victim organizations. Their proficiency with the tool, as reflected in the queries they executed, appears to differ from one intrusion to another.
In one instance, a threat actor used small chunk sizes for data exfiltration from Salesforce but was only able to retrieve approximately 10% of the data before detection and access revocation. In another case, numerous test queries were made with small chunk sizes initially. Once sufficient information was gathered, the actor rapidly increased the exfiltration volume to extract entire tables.
There were also cases where the threat actors configured their Data Loader application with the name “My Ticket Portal”, aligning the tool’s appearance with the social engineering pretext used during the vishing calls.
Outlook & Implications
Voice phishing (vishing) as a social engineering method is not, in itself, a novel or innovative technique; it has been widely adopted by numerous financially motivated threat groups over recent years with varied results. However, this campaign by UNC6040 is particularly notable due to its focus on exfiltrating data specifically from Salesforce environments. Furthermore, this activity underscores a broader and concerning trend: threat actors are increasingly targeting IT support personnel as a primary vector for gaining initial access, exploiting their roles to compromise valuable enterprise data.
The success of campaigns like UNC6040’s, leveraging these refined vishing tactics, demonstrates that this approach remains an effective threat vector for financially motivated groups seeking to breach organizational defenses.
Given the extended time frame between initial compromise and extortion, it is possible that multiple victim organizations and potentially downstream victims could face extortion demands in the coming weeks or months.
Readiness, Mitigations, and Hardening
This campaign underscores the importance of a shared responsibility model for cloud security. While platforms like Salesforce provide robust, enterprise-grade security controls, it’s essential for customers to configure and manage access, permissions, and user training according to best practices.
To defend against social engineering threats, particularly those abusing tools like Data Loader for data exfiltration, organizations should implement a defense-in-depth strategy. GTIG recommends the following key mitigations and hardening steps:
Adhere to the Principle of Least Privilege, Especially for Data Access Tools: Grant users only the permissions essential for their roles—no more, no less. Specifically for tools like Data Loader, which often require the “API Enabled” permission for full functionality, limit its assignment strictly. This permission allows broad data export capabilities; therefore, its assignment must be carefully controlled. Per Salesforce’s guidance, review and configure Data Loader access to restrict the number of users who can perform mass data operations, and regularly audit profiles and permission sets to ensure appropriate access levels.
Manage Access to Connected Applications Rigorously: Control how external applications, including Data Loader, interact with your Salesforce environment. Diligently manage access to your connected apps, specifying which users, profiles, or permission sets can use them and from where. Critically, restrict powerful permissions such as “Customize Application” and “Manage Connected Apps”—which allow users to authorize or install new connected applications—only to essential and trusted administrative personnel. Consider developing a process to review and approve connected apps, potentially allowlisting known safe applications to prevent the unauthorized introduction of malicious ones, such as modified Data Loader instances.
Enforce IP-Based Access Restrictions: To counter unauthorized access attempts, including those from threat actors using commercial VPNs, implement IP address restrictions. Set login ranges and trusted IPs, thereby restricting access to your defined enterprise and VPN networks. Define permitted IP ranges for user profiles and, where applicable, for connected app policies to ensure that logins and app authorizations from unexpected or non-trusted IP addresses are denied or appropriately challenged.
Leverage Advanced Security Monitoring and Policy Enforcement with Salesforce Shield: For enhanced alerting, visibility, and automated response capabilities, utilize tools within Salesforce Shield. Transaction Security Policies allow you to monitor activities like large data downloads (a common sign of Data Loader abuse) and automatically trigger alerts or block these actions. Complement this with “Event Monitoring” to gain deep visibility into user behavior, data access patterns (e.g., who viewed what data and when), API usage, and other critical activities, helping to detect anomalies indicative of compromise. These logs can also be ingested into your internal security tools for broader analysis.
Enforce Multi-Factor Authentication (MFA) Universally: While the social engineering tactics described may involve tricking users into satisfying an MFA prompt (e.g., for authorizing a malicious connected app), MFA remains a foundational security control. Salesforce states that “MFA is an essential, effective tool to enhance protection against unauthorized account access” and requires it for direct logins. Ensure MFA is robustly implemented across your organization and that users are educated on MFA fatigue tactics and social engineering attempts designed to circumvent this critical protection.
By implementing these measures, organizations can significantly strengthen their security posture against the types of vishing and data exfiltration activity UNC6040 has conducted, as detailed in this report. Regularly review Salesforce’s security documentation, including the Salesforce Security Guide, for additional detailed guidance.
Read our vishing technical analysis for more details on the vishing threat, and strategic recommendations and best practices to stay ahead of it.
Many organizations in regulated industries and the public sector that want to start using generative AI face significant challenges in adopting cloud-based AI solutions due to stringent regulatory mandates, sovereignty requirements, the need for low-latency processing, and the sheer scale of their on-premises data. Together, these can all present institutional blockers to AI adoption, and force difficult choices between using advanced AI capabilities and adhering to operational and compliance frameworks.
GDC Sandbox can help organizations harness Google’s gen AI technologies while maintaining control over data, meeting rigorous regulatory obligations, and unlocking a new era of on-premises AI-driven innovation. With flexible deployment models, a robust security architecture, and transformative AI applications like Google Agentspace search, GDC Sandbox enables organizations to accelerate innovation, enhance security, and realize the full potential of AI.
Secure development in isolated environments
For sovereign entities and regulated industries, a secure Zero Trust architecture via platforms like GDC Sandbox is a prerequisite for leveraging advanced AI. GDC Sandbox lets organizations implement powerful use cases — from agentic automation and secure data analysis to compliant interactions — while upholding sovereign Zero Trust mandates for security and compliance.
“GDC Sandbox provides Elastic with a unique opportunity to enable air-gapped gen AI app development with Elasticsearch, as well as enable customers to rapidly deploy our Security Incident & Event Management (SIEM) capabilities.” – Ken Exner, Chief Product Officer, Elastic
“Accenture is excited to offer Google Distributed Cloud air-gapped to customers worldwide as a unique solution for highly secure workloads. By using GDC Sandbox, an emulator for air-gapped workloads, we can expedite technical reviews, enabling end-customers to see their workloads running in GDC without the need for lengthy proofs of concept on dedicated hardware.” – Praveen Gorur, Managing Director, Accenture
Air-gapped environments are challenging
Public sector agencies, financial institutions, and other organizations that handle sensitive, secret, and top-secret data are intentionally isolated (air-gapped) from the public internet to enhance security. This physical separation prevents cyberattacks and unauthorized data access from external networks, helping to create a secure environment for critical operations and highly confidential information. However, this isolation significantly hinders the development and testing of cutting-edge technologies. Traditional air-gapped development often requires complex hardware setups, lengthy procurement cycles, and limits access to the latest tools and frameworks. These limitations hinder the rapid iteration cycles essential to development.
Video Analysis Application Built on GDC Sandbox
According to Gartner® analyst Michael Brown in the recent report U.S. Federal Government Context: Magic Quadrant for Strategic Cloud Platform Services, where Google Cloud is evaluated as a Notable Vendor, “Federal CIOs will need to consider cost and feature availability in selecting a GCC [government community cloud] provider. Careful review of available services within the compliance scope is necessary. A common pitfall is the use of commercially available services in early solution development and subsequently finding that some of those services are not available in the target government community environment. This creates technical debt requiring refactoring, which results in delays and additional expense.”
GDC Sandbox: A virtualized air-gapped environment
GDC Sandbox addresses these challenges head-on. This virtual environment emulates the experience of GDC air-gapped, allowing you to build, test, and deploy gen AI applications using popular development tools and CI/CD pipelines. With it, you don’t need to procure hardware or set up air-gapped infrastructure to test applications with stringent security requirements before moving them to production. Customers can leverage Vertex AI APIs for key integrations with GDC Sandbox – AI Optimized, including:
GPUs: Dedicated user-space GPUs for gen AI development
Interacting with GDC Sandbox
One of the things that sets GDC Sandbox apart is its consistent user interface. Developers familiar with Google Cloud will find themselves in a comfortable and familiar environment, which helps streamline the development process and reduces the learning curve. This means you can jump right into building and testing your gen AI applications without missing a beat.
“GDC Sandbox has proven to be an invaluable tool to develop and test our solutions for highly regulated customers who are looking to bring their air-gapped infrastructures into the cloud age.” – David Olivier, Defense and Homeland Security Director, Sopra Steria Group
“GDC Sandbox provides a secure playground for public sector customers and other regulated industries to prototype and test how Google Cloud and AI can solve their unique challenges. By ensuring consistency with other forms of compute, we simplify development and deployment, making it easier for our customers to bring their ideas to life. We’re excited to see how our customers use the GDC Sandbox to push the boundaries of what’s possible.” – Will Grannis, VP & CTO, Google Cloud
The GDC Sandbox architecture and experience
GDC Sandbox offers developers a familiar and intuitive environment by mirroring the API, UI, and CLI experience of GDC air-gapped and GDC air-gapped appliance. It offers a comprehensive suite of services, including virtual machines, Kubernetes clusters, storage, observability, and identity management. This allows developers to build and deploy a wide range of gen AI applications, and leverage the power of Google’s AI and machine learning expertise within a secure, dedicated environment.
GDC Sandbox – Product Architecture
Use cases for GDC Sandbox
GDC Sandbox offers numerous benefits for organizations with air-gapped environments. Let’s explore some compelling use cases:
Gen AI development: Develop and test Vertex AI and gen AI applications on GPUs, validating them cost-effectively before they move to secure production environments.
Partner enablement: Empower partners to build applications, host GDC Marketplace offerings, train personnel, and prepare services for production.
Training and proof of concepts: Provide hands-on training for developers and engineers on GDC air-gapped technologies and best practices. Deliver ground-breaking new capabilities and showcase the art of the possible for customers and partners.
Building applications in GDC Sandbox
GDC Sandbox leverages containers and Kubernetes to host your applications. To get your application up and running, follow these steps:
Build and push: Build your application image locally using Docker, ensuring your Dockerfile includes all necessary dependencies. Tag the image with the Harbor instance URI and push it to the provided Harbor repository.
Deploy with Kubernetes: Create a Kubernetes deployment YAML file that defines your application’s specifications, including the Harbor image URI and the necessary credentials to access the image. Apply this file using the kubectl command-line tool to deploy your application to the Kubernetes cluster within the Sandbox.
Expose and access: Create a Kubernetes service to expose your application within the air-gapped environment. Retrieve the service’s external IP using kubectl get svc to access your application. A combined sketch of these steps appears after this list.
Migrate and port: Move your solutions from GDC Sandbox to GDC air-gapped and appliance deployments.
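As a minimal, hedged sketch of the build, deploy, and expose steps above, the commands below use a hypothetical application name, Harbor registry URI, and pull secret; substitute the values provided for your GDC Sandbox instance.
```
# Build and push: build locally, tag with the Harbor instance URI, and push.
# harbor.sandbox.example.com and my-project are placeholders.
docker build -t my-app:v1 .
docker tag my-app:v1 harbor.sandbox.example.com/my-project/my-app:v1
docker push harbor.sandbox.example.com/my-project/my-app:v1

# Deploy with Kubernetes, then expose the application with a service.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      imagePullSecrets:
      - name: harbor-pull-secret   # secret holding your Harbor credentials
      containers:
      - name: my-app
        image: harbor.sandbox.example.com/my-project/my-app:v1
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF

# Retrieve the external IP once the service is provisioned.
kubectl get svc my-app
```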
U.S. Federal Government Context: Magic Quadrant for Strategic Cloud Platform Services, By Michael Brown, 3 February 2025
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
As the first fully cloud-native private bank in Switzerland, Alpian stands at the forefront of digital innovation in the financial services sector. With its unique model blending personal wealth management and digital convenience, Alpian offers clients a seamless, high-value banking experience.
Through its digital-first approach built on the cloud, Alpian has achieved unprecedented agility, scalability, and compliance capabilities, setting a new standard for private banking in the 21st century. In particular, its use of generative AI gives us a glimpse of the future of banking.
The Challenge: Innovating in a Tightly Regulated Environment
The financial industry is one of the most regulated sectors in the world, and Switzerland’s banking system is no exception. Alpian faced a dual challenge: balancing the need for innovation to provide cutting-edge services while adhering to stringent compliance standards set by the Swiss Financial Market Supervisory Authority (FINMA).
Especially when it came to deploying a new technology like generative AI, the teams at Alpian and Google Cloud knew there was virtually no room for error.
Tools like Gemini have streamlined traditionally complex processes, allowing developers to interact with infrastructure through simple conversational commands. For instance, instead of navigating through multiple repositories and manual configurations, developers can now deploy a new service by simply typing their request into a chat interface.
This approach not only accelerates deployment times, reducing them from days to mere hours, but also empowers teams to focus on innovative rather than repetitive tasks.
There are limits, to be sure, both to ensure security and compliance and to maintain focus on the part of teams.
Thanks to this platform with generative AI, we haven’t opened the full stack to our engineers, but we have created a defined scope where they can interact with different elements of our IT using a simplified conversational interface. It’s within these boundaries that they have the ability to be autonomous and put AI to work.
Faster deployment times translate directly into better client experiences, offering quicker access to new features like tailored wealth management tools and enhanced security. This integration of generative AI has not only optimized internal workflows but also set a new benchmark for operational excellence in the banking sector.
A Collaborative Journey to Success
Alpian worked closely with its team at Google Cloud to find just the right solutions to meet its evolving needs. Through strong trust, dedicated support, and expertise, they were able to optimize infrastructure, implement scalable solutions, and leverage AI-powered tools like Vertex AI and BigQuery.
“Google Cloud’s commitment to security, compliance, and innovation gave us the confidence to break new ground in private banking,” Damien Chambon, head of cloud at Alpian, said.
Key Results
Alpian’s cloud and AI work has already had a meaningful impact on the business:
Enhanced developer productivity with platform engineering, enabling more independence and creativity within teams.
Automated compliance workflows, aligning seamlessly with FINMA’s rigorous standards.
Simplified deployment processes, reducing infrastructure complexity with tools like Gemini.
These achievements have enabled Alpian to break down traditional operational silos, empowering cross-functional teams to work in harmony while delivering customer-focused solutions.
Shaping the Future of Private Banking
Alpian’s journey is just beginning. With plans to expand its AI capabilities further, the bank is exploring how tools like machine learning and data analytics can enhance client personalization and operational efficiency. By leveraging insights from customer interactions and integrating them with AI-driven workflows, Alpian aims to refine its offerings continually and remain a leader in the competitive digital banking space.
By aligning technological advancements with regulatory requirements, Alpian is creating a model for the future of banking — one where agility, security, and customer-centricity can come together seamlessly and confidently.
As an AI/ML developer, you have a lot of decisions to make when it comes to choosing your infrastructure — even if you’re running on top of a fully managed Google Kubernetes Engine (GKE) environment. While GKE acts as the central orchestrator for your AI/ML workloads — managing compute resources, scaling your workloads, and simplifying complex workflows — you still need to choose an ML framework, your preferred compute (TPUs or GPUs), a scheduler (Ray, Kueue, Slurm), and how you want to scale your workloads. By the time you have to configure storage, you’re facing decision fatigue!
You could simply choose Google’s Cloud Storage for its size, scale, and cost efficiency. However, Cloud Storage may not be a good fit for every use case. For instance, you might benefit from a storage accelerator in front of Cloud Storage, such as Hyperdisk ML, for faster loading of model weights. But to benefit from that acceleration, you would need to develop custom workflows to orchestrate data transfers across storage systems.
Introducing GKE Volume Populator
GKE Volume Populator is targeted at organizations that want to store their data in one data source and let GKE orchestrate the data transfers. To achieve this, GKE leverages the Kubernetes Volume Populator feature through the same PersistentVolumeClaim API that customers use today.
GKE Volume Populator, along with the relevant CSI drivers, dynamically provisions a new destination storage volume and transfers data from your Cloud Storage bucket to that volume. Your workload pods then wait to be scheduled until the data transfer is complete.
Using GKE Volume Populator provides a number of benefits:
Low management overhead: As part of a managed solution that’s enabled by default, GKE Volume Populator handles the data transfer, so you don’t need to build a bespoke solution for data hydration; you can leave it to GKE.
Optimized resource utilization: Your workload pods remain unscheduled until the data transfer completes, so you can use your GPUs and TPUs for other tasks while data is being transferred.
Easy progress tracking: Monitor the data transfer progress by checking the event message on your PVC object.
Customers like Abridge AI report that GKE Volume Populator is helping them streamline their AI development processes.
“Abridge AI is revolutionizing clinical documentation by leveraging generative AI to summarize patient-clinician conversations in real time. By adopting Google Cloud Hyperdisk ML, we’ve accelerated model loading speeds by up to 76% and reduced pod initialization times. Additionally, the new GKE Volume Populator feature has significantly streamlined access to large models and LoRA adapters stored in Cloud Storage buckets. These performance improvements enable us to process and generate clinical notes with unprecedented efficiency — especially during periods of high clinician demand.” – Taruj Goyal, Software Engineer, Abridge
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud containers and Kubernetes’), (‘body’, <wagtail.rich_text.RichText object at 0x3e0ff9d562b0>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectpath=/marketplace/product/google/container.googleapis.com’), (‘image’, None)])]>
Accelerate your data via Hyperdisk ML
Let’s say you have an AI/ML inference workload whose data is stored in a Cloud Storage bucket, and you want to move that data to a Hyperdisk ML instance to accelerate the loading of model weights, scale up to 2,500 concurrent nodes, and reduce pod over-provisioning. Here’s how to do this with GKE Volume Populator:
1. Prepare your GKE Cluster: Create a GKE cluster with the corresponding CSI driver, and enable Workload Identity Federation for GKE.
2. Set up necessary permissions: Configure permissions so that GKE Volume Populator has read access to your Cloud Storage bucket.
3. Define your data source: Create a GCPDataSource resource (a sketch appears after this list). This specifies:
The URL of the Cloud Storage bucket that contains your data
The Kubernetes Service Account you created with read access to the bucket
4. Create your PersistentVolumeClaim: Create a PVC that refers to the GCPDataSource you created in step 3 and the corresponding StorageClass for the destination storage.
5. Deploy your AI/ML workload: Create your inference workload and configure it to use the PVC you created in step 4.
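For steps 3 and 4, a minimal, hedged sketch of the two resources might look like the following. The GCPDataSource apiVersion and field names shown here are assumptions drawn from the workflow described above, and the bucket, service account, and StorageClass names are placeholders; consult the GKE Volume Populator documentation for the exact schema before applying.
```
cat <<'EOF' | kubectl apply -f -
# Step 3 (sketch): point GKE Volume Populator at the source bucket.
# apiVersion and spec fields are assumptions; verify against the GKE docs.
apiVersion: datalayer.gke.io/v1
kind: GCPDataSource
metadata:
  name: model-weights-source
spec:
  cloudStorage:
    serviceAccountName: gcs-reader-ksa   # Kubernetes SA with read access to the bucket
    uri: gs://my-model-bucket/weights/
---
# Step 4 (sketch): a PVC that references the GCPDataSource and the destination
# Hyperdisk ML StorageClass (name assumed).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-weights
spec:
  accessModes: ["ReadOnlyMany"]
  storageClassName: hyperdisk-ml
  resources:
    requests:
      storage: 300Gi
  dataSourceRef:
    apiGroup: datalayer.gke.io
    kind: GCPDataSource
    name: model-weights-source
EOF

# Track transfer progress via the events on the PVC.
kubectl describe pvc model-weights
```
Once the transfer completes, the workload that mounts the model-weights PVC (step 5) is scheduled automatically.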
GKE Volume Populator is generally available, and support for Hyperdisk ML is in preview. To enable it in your console, reach out to your account team.
“There’s no red teaming on the factory floor,” isn’t an OSHA safety warning, but it should be — and for good reason. Adversarial testing in most, if not all, manufacturing production environments is prohibited because the safety and productivity risks outweigh the value.
If resources were not a constraint, the security team would go build another factory with identical equipment and systems and use it to conduct proactive security testing. Almost always, the costs outweigh the benefits, and most businesses simply cannot support the expense.
This is where digital twins can help. Digital twins are essentially IT stunt doubles, cloud-based replicas of physical systems that use real-time data to create a safe environment for security and resilience testing. The digital twin environment can be used to test for essential subsystem interactions and repercussions as the systems transition from secure states to insecure states.
Security teams can operationalize digital twins and resilience analysis using the following approach:
Gain a deep understanding of the correlations between the leading indicators of cyber resilience and the role of digital twins in achieving resilience. The table below offers this mapping.
Get buy-in from business leaders, including the CMO, CIO, and CTO. Security teams should be able to demonstrate the strategic value to the organization by using digital twins for adversarial security testing without disrupting production.
Identify the right mix of engineers and security experts, as well as appropriate technologies to execute the strategy. Google Cloud’s security and infrastructure stack is positioned to help security teams achieve operational digital twins for security (see table below).
Cyber resilience leading indicators and the role of digital twins:
Hard-restart recovery time: Simulate various system failure scenarios on the digital twins and discover the subsequent rebuild processes. Identify areas of improvement, optimal recovery procedures, and bottlenecks.
Cyber-physical modularity: Use digital twins to quantify the impact of single points of failure on the overall production process. Use the digital twin environment to measure metrics such as the mean operational capability of a service in a degraded state, and to track the number of modules impacted by each single point of failure.
Internet denial and communications resilience: Simulate the loss of internet connectivity to the digital twins and measure the proportion of critical services that continue operating successfully. Assess the effectiveness of backup communication systems and the speed of response. This process can also be applied to the twins of non-internet-facing systems.
Manual operations: Disrupt the automation controls on the digital twins and measure the degree to which simulated manual control can sustain a minimum viable operational delivery objective. Incorporate environmental and operational constraints, such as the time taken for personnel to assume manual control.
Control pressure index (CPI): Model the enablement of security controls and their dependencies on the digital twins to calculate CPI. Then simulate failures of individual controls, or a combination of controls, to assess the impact. Discover defense-in-depth improvement opportunities.
Software reproducibility: Not applicable.
Preventative maintenance levels: Explore and test simulated failures to optimize and measure the effectiveness of preventative maintenance. Simulate the impact of maintenance activities, measure downtime reduction, and evaluate return on investment (ROI).
Inventory completeness: Inventory completeness will become apparent during the digital twin construction process.
Stress-testing vibrancy: Conduct red teaming, apply chaos engineering principles, and stress test the digital twin environment to assess the overall impact.
Common mode failures: In the twin environment, discover and map critical dependencies and identify potential common mode failures that could impact the production process. In a measurable manner, identify and test methods of reducing the risk of cascading failures during disruption events.
What digital twins architecture can look like with Google Cloud
To build an effective digital twin, the physics of the electrical and mechanical systems must be represented with sufficient accuracy.
The data needed for the construction of the twin should either come from the physical sensors or be computed using mathematical representations of the physical process. The twin should be modeled across three facets:
Subsystems: Modeling the subsystems of the system and the pertinent interactions between them (such as a robotic arm, its controller, and software interactions).
Networks: Modeling the network of systems and pertinent interactions (such as plant-wide data flow and machine-to-machine communication).
Influencers: Modeling the environmental and operational parameters, such as temperature variations, user interactions, and physical anomalies causing system and network interruptions.
Developing digital twins in diverse OT environments requires secure data transmission, compatible data storage and processing, and digital engines using AI, physics modeling, applications, and visualization. This is where comprehensive end-to-end monitoring, detection, logging, and response processes using tools such as Google Security Operations and partner solutions come in.
The following outlines one potential architecture for building and deploying digital twins with Google Cloud:
Compute Engine to replicate physical systems on a digital plane
Cloud Storage to store data, simulate backup and recovery
Cloud Monitoring to emulate on-prem monitoring and evaluate recovery process
BigQuery to store, query, and analyze the data streams received from Manufacturing Data Engine (MDE) and to perform post-mortem analysis of adversarial testing
Spanner Graph and partner solutions such as neo4j to build and enumerate the industrial process based on graph-based relationship modeling
Machine learning services (including Vertex AI, Gemini in Security, partner models through Vertex AI Model Garden) to rapidly generate relevant failure scenarios and discover opportunities of secure customized production optimization. Similarly, use Vision AI tools to enhance the digital twin environment, bringing it closer to the real-world physical environment.
Cloud Run functions for a serverless compute platform that can run failure-event-driven code and trigger actions based on digital twin insights
Looker to visualize and create interactive dashboards and reports based on digital twin and event data
Apigee to securely expose and manage APIs for the digital twin environment. This allows for controlled access to real-time data from on-prem OT applications and systems. For example, Apigee can manage APIs for accessing building OT sensor data, controlling HVAC systems, and integrating with third-party applications for energy management.
Google Distributed Cloud to run digital twins in an air-gapped, on-premises, containerized environment
An architectural reference for building and deploying digital twins with Google Cloud.
Security and engineering teams can use the above Google Cloud services illustration as a foundation and customize it to their specific requirements. While building and using digital twins, both security of the twins and security by the twins are critical. To ensure that the lifecycle of the digital twins is secure, cybersecurity hardening, logging, monitoring, detection, and response should be at the core of the design, build, and execution processes.
This structured approach enables modelers to identify essential tools and services, define in-scope systems and their data capabilities, map communication and network routes, and determine applications needed for business and engineering functions.
Getting started with digital twins
Digital twins are a powerful tool for security teams. They help us better understand and measure cyber-physical resilience through safe application of cyber-physical resilience leading indicators. They also allow for the adversarial testing and analysis of subsystem interactions and the effects of systems moving between secure and insecure conditions without compromising safety or output.
Security teams can begin right away to use Google Cloud to build and scale digital twins for security:
1. Identify the purpose and function that security teams would like to simulate, monitor, optimize, design, and maintain for resilience.
2. Select and identify the right physical or industrial object, system, or process to be replicated as the digital twin.
3. Identify pertinent data flows, interfaces, and dependencies for data collection and integration.
4. Be sure to understand the available IT and OT, cloud, and on-premises telemetry across the physical or industrial object, system, or process.
5. Create the virtual model that accurately represents its physical counterpart in all necessary aspects.
6. Connect the replica to its physical counterpart to facilitate real-time data flow to the digital twin. Use a secure on-premises connector such as MDE to make a secure connection between the physical environment and the digital environment running on a Google Cloud VPC.
7. To operationalize the digital twin, build the graph-based entity relationship model using Spanner Graph and partner solutions like neo4j. This uses the live data stream from the physical system and represents it on the digital twin.
8. Use a combination of Cloud Storage and BigQuery to store discrete and continuous IT and OT data, such as system measurements, states, and file dumps from the source and digital twin (see the sketch after this list).
9. Discover common mode failures based on the mapped processes that include internal and external dependencies.
10. Use at least one leading indicator with Google Threat Intelligence to perform threat modeling and evaluate the impact on the digital twin model.
11. Run Google’s AI models on the digital twins to further advance the complexity of cyber-resilience studies.
12. Look for security and observability gaps. Improve model fidelity. Recreate and update the digital twin environment. Repeat step 10 with a new leading indicator, new threat intelligence, or an updated threat model.
13. Based on the security discoveries from the resilience studies on the digital twin, design and implement security controls and risk mitigations in the physical counterpart.
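As a concrete illustration of step 8, the following minimal, hedged sketch loads exported twin telemetry from Cloud Storage into BigQuery and computes a simulated hard-restart recovery time, one of the leading indicators mapped above, for each failure scenario. The bucket, dataset, table, and field names are illustrative assumptions and would need to be adapted to your own twin's data model.
```
# Step 8 sketch: load exported twin telemetry into BigQuery.
# Bucket, dataset, table, and field names are hypothetical.
bq mk --dataset digital_twin
bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
  digital_twin.events gs://twin-telemetry/events/*.json

# Measure simulated hard-restart recovery time per failure scenario, assuming
# each scenario_id covers a single injected failure followed by recovery.
bq query --use_legacy_sql=false '
SELECT
  scenario_id,
  TIMESTAMP_DIFF(
    MIN(IF(state = "OPERATIONAL", event_time, NULL)),
    MIN(IF(state = "FAILED", event_time, NULL)),
    SECOND) AS recovery_seconds
FROM `digital_twin.events`
GROUP BY scenario_id'
```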
To learn more about how to build a digital twin, you can read this ebook chapter and contact Google Cloud’s Office of the CISO.
In today’s digital world, we spend countless hours in our browsers. It’s where we work, collaborate, and access information. But have you ever stopped to consider whether you’re fully leveraging the browser security features available to protect your organization? We explore this in our new paper “The Security Blindspot: Real Attack Insights From Real Browser Attacks,” and the answer might surprise you.
Written in partnership with Mandiant Incident Response experts, the new paper highlights how traditional security measures often overlook available security features within the browser, leaving organizations vulnerable to sophisticated attacks that could be prevented with additional browser security policies. Phishing, data breaches, insider threats, and malicious browser extensions are just some of the risks. Attackers are increasingly using legitimate browser features to trick users and carry out their malicious activities, making them harder to detect.
The paper delves into real-world case studies where increased browser security could have prevented significant security breaches and financial losses. These examples underscore the urgent need for organizations to adopt proactive and comprehensive security strategies within the browser.
Key takeaways from the report include:
Browsers are a major entry point for attacks: Attackers exploit users working on the web to launch advanced attacks.
Traditional security often overlooks the browser: Focusing solely on network and endpoint security leaves a significant gap.
Real-world attacks demonstrate the risks: Case studies reveal the consequences of neglecting security at the browser layer.
Advanced threat and data protection within the browser is essential: Solutions like Chrome Enterprise Premium can help mitigate these risks.
Browser insights for your security teams: Leverage telemetry and advanced browser data to provide a detailed view of your environment, identify risks and enable proactive measures to protect data.
Organizations that don’t take advantage of security within the browser are open to an array of threats, including phishing, data breaches, insider attacks, and malicious browser extensions, making robust browser protection essential. Don’t let your unprotected browser be your biggest security blind spot. To learn more about how to protect your organization from browser-based attacks, read the full whitepaper.
You can never be sure when you’ll be the target of a distributed denial-of-service (DDoS) attack. For investigative journalist Brian Krebs, that day came on May 12, when his site KrebsOnSecurity experienced one of the largest DDoS attacks seen to date.
At 6.3 terabits per second (Tbps), or roughly 63,000 times the speed of broadband internet in the U.S., the attack was 10 times the size of the DDoS attack Krebs faced in 2016 from the Mirai botnet. That 2016 incident took down KrebsOnSecurity.com for four days, and was so severe that his then-DDoS protection service asked him to find another provider, Krebs said in his report on the May attack.
Following the 2016 incident, Krebs signed up for Project Shield, a free Google service that offers at-risk, eligible organizations protection against DDoS attacks. Since then, his site has stayed reliably online in the face of attacks — including the latest incident.
The brunt of the May 12 attack lasted less than a minute and peaked above 6.3 Tbps, one of the largest DDoS attacks observed to date.
Organizations in eligible categories, including news publishers, government elections, and human rights defenders, can use the power of Google Cloud’s networking services in conjunction with Jigsaw to help keep their websites available and online.
Project Shield acts as a reverse proxy service — customers change their DNS settings to send traffic to an IP address provided by Project Shield, and configure Project Shield with information about their hosting server. The customer retains control over both their DNS settings and their hosting server, making it easy to enable or disable Project Shield at any time with a simple DNS switch.
Built on the strength of Google Cloud networking services, including Cloud Load Balancing, Cloud CDN, and Cloud Armor, Project Shield’s services can be configured through the Project Shield dashboard as a managed experience. These services work together to mitigate attacks and serve cached content from multiple points on Google’s edge network. It’s a combination that has protected KrebsOnSecurity before, and has successfully defended many websites against some of the world’s largest DDoS attacks.
In the May incident against Krebs, the attack was filtered instantly by Google Cloud’s network. Requests for websites protected by Project Shield pass through Google Cloud Load Balancing, which automatically blocks layer 3 and layer 4 volumetric DDoS attacks.
In the May incident, the attacker sent large data packets to random ports at a rate of approximately 585 million packets per second, which is over 1,000 times the usual rate for KrebsOnSecurity.
The attack came from infected devices all around the world.
Cloud Armor, which embeds protection into every load balancer deployment, blocked the attack at the load balancing level because Project Shield sits behind the Google Cloud Load Balancer, which proxies only HTTP/HTTPS traffic. Had the attack occurred with well-formed requests (such as at Layer 7, also known as the application layer), additional defenses from the Google Cloud global front end would have been ready to defend the site.
Cloud CDN, for example, makes it possible to serve content for sites like KrebsOnSecurity from cache, lessening the load on a site’s servers. Cloud Armor would have actively filtered incoming requests for any remaining traffic that may have bypassed the cache to allow only legitimate traffic through.
Additionally, Cloud Armor’s Adaptive Protection uses real-time machine learning, which helps identify attack signatures and dynamically tailor rate limits. These rate limits are actively and continuously refined, allowing Project Shield to harness Google Cloud’s capabilities to mitigate almost all DDoS attacks in seconds.
Project Shield defenses are automated, with no customer defense configuration needed. They’re optimized to capitalize on the powerful blend of defensive tools in Google Cloud’s networking arsenal, which are available to any Google Cloud customer.
As KrebsOnSecurity and others have experienced, DDoS attacks have been getting larger, more sophisticated, and more frequent in recent years. Let the power and scale of Google Cloud help protect your site against attacks when you least expect them. Eligible organizations can apply for Project Shield today, and all organizations can set up their own Cloud Networking configuration like Project Shield by following this guide.
Developers love Cloud Run, Google Cloud’s serverless runtime, for its simplicity, flexibility, and scalability. And today, we’re thrilled to announce that NVIDIA GPU support for Cloud Run is now generally available, offering a powerful runtime for a variety of use cases that’s also remarkably cost-efficient.
Now, you can enjoy the following benefits across both GPUs and CPUs:
Pay-per-second billing: You are only charged for the GPU resources you consume, down to the second.
Scale to zero: Cloud Run automatically scales your GPU instances down to zero when no requests are received, eliminating idle costs. This is a game-changer for sporadic or unpredictable workloads.
Rapid startup and scaling: Go from zero to an instance with a GPU and drivers installed in under 5 seconds, allowing your applications to respond to demand very quickly. For example, when scaling from zero (cold start), we achieved an impressive time-to-first-token of approximately 19 seconds for a gemma3:4b model (this includes startup time, model loading time, and running the inference).
Full streaming support: Build truly interactive applications with out-of-the-box support for HTTP and WebSocket streaming, allowing you to provide LLM responses to your users as they are generated.
Support for GPUs in Cloud Run is a significant milestone, underscoring our leadership in making GPU-accelerated applications simpler, faster, and more cost-effective than ever before.
“Serverless GPU acceleration represents a major advancement in making cutting-edge AI computing more accessible. With seamless access to NVIDIA L4 GPUs, developers can now bring AI applications to production faster and more cost-effectively than ever before.” – Dave Salvator, director of accelerated computing products, NVIDIA
AI inference for everyone
One of the most exciting aspects of this GA release is that NVIDIA L4 GPUs on Cloud Run are now available to everyone, with no quota request required. This removes a significant barrier to entry, allowing you to immediately tap into GPU acceleration for your Cloud Run services. Simply use --gpu 1 from the Cloud Run command line, or check the “GPU” checkbox in the console; there’s no need to request quota.
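For example, a deployment might look like the following sketch. The service name, image, and region are placeholders, and flags other than --gpu (such as --gpu-type and the CPU/memory minimums shown) are assumptions that may vary by gcloud release; check gcloud run deploy --help for the exact options.
```
# Hedged sketch: deploy a Cloud Run service with one NVIDIA L4 GPU attached.
gcloud run deploy my-inference-service \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=4 \
  --memory=16Gi \
  --no-cpu-throttling
```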
Production-ready
With general availability, Cloud Run with GPU support is now covered by Cloud Run’s Service Level Agreement (SLA), providing you with assurances for reliability and uptime. By default, Cloud Run offers zonal redundancy, helping to ensure enough capacity for your service to be resilient to a zonal outage; this also applies to Cloud Run with GPUs. Alternatively, you can turn off zonal redundancy and benefit from a lower price for best-effort failover of your GPU workloads in case of a zonal outage.
Multi-regional GPUs
To support global applications, Cloud Run GPUs are available in five Google Cloud regions: us-central1 (Iowa, USA), europe-west1 (Belgium), europe-west4 (Netherlands), asia-southeast1 (Singapore), and asia-south1 (Mumbai, India), with more to come.
Cloud Run also simplifies deploying your services across multiple regions. For instance, you can deploy a service across the US, Europe and Asia with a single command, providing global users with lower latency and higher availability. Here’s how you might deploy Ollama, one of the easiest ways to run open models, on Cloud Run across three regions.
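The exact single multi-region command is not reproduced here; as a hedged approximation, the loop below deploys the same Ollama service to three regions one at a time. The image path is a placeholder for a copy of the Ollama container pushed to Artifact Registry, port 11434 is Ollama's default listening port, and the GPU flags follow the assumptions noted earlier.
```
# Sketch: deploy Ollama with an L4 GPU to three regions, one region at a time.
for region in us-central1 europe-west1 asia-southeast1; do
  gcloud run deploy ollama \
    --image=us-docker.pkg.dev/my-project/images/ollama:latest \
    --region="$region" \
    --port=11434 \
    --gpu=1 \
    --gpu-type=nvidia-l4 \
    --no-cpu-throttling
done
```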
See it in action: 0 to 100 NVIDIA GPUs in four minutes
You can witness the incredible scalability of Cloud Run with GPUs for yourself with this live demo from Google Cloud Next 25, showcasing how we scaled from 0 to 100 GPUs in just four minutes.
Load testing a Stable Diffusion service running on Cloud Run GPUs to 100 GPU instances in four minutes.
Unlock new use cases with NVIDIA GPUs on Cloud Run jobs
The power of Cloud Run with GPUs isn’t just for real-time inference using request-driven Cloud Run services. We’re also excited to announce the availability of GPUs on Cloud Run jobs, unlocking new use cases, particularly for batch processing and asynchronous tasks:
Model fine-tuning: Easily fine-tune a pre-trained model on specific datasets without having to manage the underlying infrastructure. Spin up a GPU-powered job, process your data, and scale down to zero when it’s complete.
Batch AI inferencing: Run large-scale batch inference tasks efficiently. Whether you’re analyzing images, processing natural language, or generating recommendations, Cloud Run jobs with GPUs can handle the load.
Batch media processing: Transcode videos, generate thumbnails, or perform complex image manipulations at scale.
What Cloud Run customers are saying
Don’t just take our word for it. Here’s what some early adopters of Cloud Run GPUs are saying:
“Cloud Run helps vivo quickly iterate AI applications and greatly reduces our operation and maintenance costs. The automatically scalable GPU service also greatly improves the efficiency of our AI going overseas.” – Guangchao Li, AI Architect, vivo
“L4 GPUs offer really strong performance at a reasonable cost profile. Combined with the fast auto scaling, we were really able to optimize our costs and saw an 85% reduction in cost. We’ve been very excited about the availability of GPUs on Cloud Run.” – John Gill, Sr. Software Engineer, Wayfair, speaking at Next ’25
“At Midjourney, we have found Cloud Run GPUs to be incredibly valuable for our image processing tasks. Cloud Run has a simple developer experience that lets us focus more on innovation and less on infrastructure management. Cloud Run GPU’s scalability also lets us easily analyze and process millions of images.” – Sam Schickler, Data Team Lead, Midjourney
Welcome to the second Cloud CISO Perspectives for May 2025. Today, Enrique Alvarez, public sector advisor, Office of the CISO, explores how government agencies can use AI to improve threat detection — and save money at the same time.
As with all Cloud CISO Perspectives, the contents of this newsletter are posted to the Google Cloud blog. If you’re reading this on the website and you’d like to receive the email version, you can subscribe here.
Do more with less: How governments can use AI to save money and improve threat detection
By Enrique Alvarez, public sector advisor, Office of the CISO
Government agencies have long been a pressure chamber for some of cybersecurity’s most confounding problems, particularly constrained budgets and alert fatigue. While there may not be a single, sharp kopis that can slice through this Gordian knot, AI offers a potential solution that we’d be foolish to ignore.
By many measures, the situation government agencies face is dire. Headcounts and budgets are shrinking, cyber threats are increasing, and security alerts routinely threaten to overwhelm security operations center (SOC) team members, increasing toil and reducing effectiveness. The fiscal austerity facing government agencies is further exacerbated by not being able to fill open cybersecurity positions — nor replace departing experienced workers.
Fortunately, advances in AI models and tools provide a way forward.
Discussions around what AI is and what it can do are often sensationalized. For government agencies, a clear understanding of the different AI types is crucial. At its core, AI refers to the ability of machines to simulate human-like cognitive functions such as learning, problem-solving, and decision-making. This broad definition encompasses everything from rule-based systems to complex neural networks.
Scoping the threat: Unique risk profile for government agencies
Cybersecurity threats present a significant challenge for government agencies, one exacerbated by decades of patchwork defensive measures.
The lack of a clear strategy and standardization across agencies has led to a fragmented security posture and a limited common operational picture, hindering effective threat detection and coordinated response. This decentralized approach creates vulnerabilities and makes it difficult to share timely and actionable threat intelligence.
Many public sector entities operate smaller SOCs with limited teams. This resource constraint makes it challenging to effectively monitor complex networks, analyze the ever-increasing volume of alerts, and proactively hunt for threats. Alert fatigue and burnout are significant concerns in these environments.
Heightened risk from vendor lock-in
A crucial additional factor is that many government agencies operate in de facto vendor lock-in environments. A heavy reliance on one vendor for operating systems, productivity software, and mission-critical operations comes with greatly increased risk.
While these tools are familiar to the workforce, their ubiquity makes them an attractive vector for phishing campaigns and vulnerability exploitation. The Department of Homeland Security’s Cyber Safety Review Board highlighted this risk and provided recommendations focused on protecting digital identity standards. Agencies should be vigilant about securing these environments and mitigating the risks associated with vendor lock-in, which can limit flexibility and increase costs in the long run.
The prevalence of legacy on-premises databases and increasingly complex multicloud infrastructure adds another layer of difficulty. Securing outdated systems alongside diverse cloud environments requires specialized skills and tools, further straining resources and potentially introducing vulnerabilities.
Addressing these multifaceted challenges requires a strategic and coordinated effort focused on standardization, robust security practices, and resource optimization.
How AI can help: Automating the future (of threat detection)
AI-based threat detection models offer a promising path toward a more resilient cybersecurity posture. By combining AI’s advanced capabilities with real-time cybersecurity intelligence and tooling, key cybersecurity workflows can be greatly streamlined.
Workflows such as root cause analysis, threat analysis, and vulnerability impact assessment previously required heavy personnel investment. As we’ve seen, AI-driven automation can provide a crucial assist in scaling to the true scope of the threat landscape, while also accelerating time-to-completion. At Google Cloud, we are seeing the benefits of AI in security today, as these three examples demonstrate.
However, achieving optimal effectiveness for government agencies requires a tailored approach.
Public sector networks often have unique configurations, legacy systems, and security-focused workflows that differ from commercial enterprises. By ingesting agency-specific data — logs, network traffic patterns, and historical incident data — AI models can learn baseline behaviors, identify deviations more accurately, reduce false positives, and improve detection rates for threats specific to public sector networks.
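To make the baselining idea concrete, here is a minimal, illustrative sketch that learns "normal" behavior from log-derived features and flags deviations with an off-the-shelf anomaly detector (scikit-learn's IsolationForest). The feature set, sample values, and thresholds are assumptions for illustration only, not a description of any Google Cloud detection model.

```python
# Hypothetical sketch: learn a per-agency baseline from aggregated log features
# and flag deviations for analyst review. All feature names and numbers are
# illustrative placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [logins_per_hour, bytes_out_mb, failed_auth_count, distinct_dest_ips],
# aggregated from agency logs and network telemetry over a time window.
baseline_window = np.array([
    [12, 40.5, 1, 8],
    [15, 38.0, 0, 7],
    [11, 42.1, 2, 9],
    [14, 39.7, 1, 8],
])

model = IsolationForest(contamination=0.01, random_state=42)
model.fit(baseline_window)  # learn what routine activity looks like here

# Score new observations against the learned baseline.
incoming = np.array([
    [13, 41.0, 1, 8],      # consistent with routine activity
    [95, 900.0, 30, 240],  # large deviation: candidate for analyst review
])
flags = model.predict(incoming)  # 1 = fits baseline, -1 = anomaly
for row, flag in zip(incoming, flags):
    print("ANOMALY" if flag == -1 else "ok", row)
```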
Adding the automation inherent in agentic AI-driven threat detection leads to better security and more sustainable operations. By automating the initial triage and analysis of security alerts, agencies can better respond, predict resource allocation, and develop more accurate cybersecurity budgets. This automation can reduce the need for constant manual intervention in routine tasks, leading to more predictable operational costs and a more effective cybersecurity team.
Ultimately, automating threat detection will maximize the capabilities of SOC staff and reduce toil so that teams can focus on the most important alerts. By offloading repetitive tasks like initial alert analysis and basic threat correlation to agentic AI, human analysts can focus on more complex investigations, proactive threat hunting, and strategic security planning. This shift can improve job satisfaction and also enhance the overall effectiveness and efficiency of the SOC.
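As a rough illustration of what automated first-pass triage can look like, the sketch below scores alerts by sensor severity, asset criticality, and threat-intelligence matches, then routes only the highest-scoring ones to a human analyst. The fields, weights, and thresholds are hypothetical and would be tuned to an agency's own environment.

```python
# Hypothetical triage sketch: enrich, score, and route alerts so analysts only
# see what needs human judgment. Not a Google Cloud API; values are illustrative.
from dataclasses import dataclass

@dataclass
class Alert:
    source: str
    severity: int           # 1 (low) .. 5 (critical), as reported by the sensor
    asset_criticality: int  # 1 .. 5, from the agency's asset inventory
    matches_known_ioc: bool

def triage_score(alert: Alert) -> float:
    """Combine sensor severity, asset value, and threat-intel matches."""
    score = 0.4 * alert.severity + 0.4 * alert.asset_criticality
    if alert.matches_known_ioc:
        score += 2.0  # known indicator of compromise: escalate strongly
    return score

def route(alert: Alert) -> str:
    score = triage_score(alert)
    if score >= 4.0:
        return "escalate_to_analyst"
    if score >= 2.5:
        return "auto_enrich_and_queue"
    return "auto_close_with_audit_log"

alerts = [
    Alert("edr", severity=2, asset_criticality=1, matches_known_ioc=False),
    Alert("ids", severity=4, asset_criticality=5, matches_known_ioc=True),
]
for a in alerts:
    print(a.source, route(a))
```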
At Google Cloud’s Office of the CISO, we’re optimistic that embracing AI can help improve threat detection even as overall budgets are reduced. Sometimes, you really can do more with less.
In case you missed it
Here are the latest updates, products, services, and resources from our security teams so far this month:
10 actionable lessons for modernizing security operations: Google Cloud’s Office of the CISO shares lessons learned from the manufacturing sector on how to modernize security operations. Read more.
Tracking the cost of quantum factoring: Our latest research updates how we characterize the size and performance of a future quantum computer that could likely break current cryptography algorithms. Read more.
How Confidential Computing lays the foundation for trusted AI: Confidential Computing has redefined how organizations can securely process their most sensitive data in the cloud. Here’s what’s new. Read more.
Please visit the Google Cloud blog for more security stories published this month.
Threat Intelligence news
How cybercriminals weaponize fake AI-themed websites: Since November, Mandiant Threat Defense has been investigating a UNC6032 campaign that uses fake AI video generator websites to distribute malware. Here’s what we know. Read more.
Pwning calendars for command and control: Google Threat Intelligence Group (GTIG) has observed malware, hosted on an exploited government website, that used Google Calendar for command and control and was subsequently used to attack other government websites. The activity has been attributed to APT41. Read more.
Cybercrime hardening guidance from the frontlines: The U.S. retail sector is currently being targeted in ransomware operations that GTIG suspects are linked to UNC3944, also known as Scattered Spider. UNC3944 is a financially motivated threat actor characterized by its persistent use of social engineering and brazen communications with victims. Here are our latest proactive hardening recommendations to combat their threat activities. Read more.
Please visit the Google Cloud blog for more threat intelligence stories published this month.
Now hear this: Podcasts from Google Cloud
Betting on the future of security operations with AI-native MDR: What does AI-first managed detection and response get right? What does it miss? How does it compare to traditional security operations? Tenex.AI’s Eric Foster and Venkata Koppaka join hosts Anton Chuvakin and Tim Peacock for a lively discussion about the future of MDR. Listen here.
AI supply chain security: Old lessons, new poisons, and agentic dreams: How does the AI supply chain differ from other software supply chains? Can agentic AI secure itself? Christine Sizemore, Google Cloud security architect, connects the supply-chain links with Anton and Tim. Listen here.
What we learned at RSAC 2025: Anton and Tim discuss their RSA Conference experiences this year. How did the show floor hold up to the complicated reality of today’s information security landscape? Listen here.
How boards can address AI risk: Christian Karam, strategic advisor and investor, joins Office of the CISO’s Alicja Cade and David Homovich to chat about the important role that boards can play in addressing AI-driven risks. Listen here.
Defender’s Advantage: Confronting a North Korean IT worker incident: Mandiant Consulting’s J.P. Glab joins host Luke McNamara to walk through North Korean IT worker activity — and how Mandiant responds. Listen here.
To have our Cloud CISO Perspectives post delivered twice a month to your inbox, sign up for our newsletter. We’ll be back in a few weeks with more security-related updates from Google Cloud.
In today’s rapidly evolving technological landscape, artificial intelligence (AI) stands as a transformative force, reshaping industries and redefining possibilities. Recognizing AI’s potential and leveraging its data landscape on Google Cloud, Magyar Telekom, Deutsche Telekom’s Hungarian operator, embarked on a journey to empower its workforce with AI knowledge and tools. This endeavor led to the creation of Pluto AI — an internal AI platform that has grown into a comprehensive framework for diverse AI solutions.
As one of Hungary’s largest telecommunications operators, Magyar Telekom’s ultimate vision is to embed AI into every aspect of its operations, empowering every employee to leverage AI’s potential. Pluto AI is a significant step toward achieving this goal, fostering a culture of innovation and data-driven decision-making.
Magyar Telekom’s leadership recognized that AI proficiency is now essential for future success. However, the company faced challenges, including employees with varying levels of AI understanding and a lack of accessible tools for experimentation and practical application. As a result, Magyar Telekom aimed to democratize AI knowledge and foster a culture of experimentation by building a scalable solution that could adapt to its evolving AI needs and support a wide range of use cases.
To enable business teams across Magyar Telekom to utilize generative AI, the Pluto AI team developed a simple tool that provided a safe and compliant way to prompt large language models (LLMs). They also created educational content and training for business teams on how to use gen AI and what opportunities it brings. This approach provided other teams with the building blocks to quickly construct the AI solutions they needed.
With Pluto AI, Magyar Telekom spearheaded the successful adoption of gen AI across the company, quickly expanding the platform to support additional use cases without the need for the central platform team to have a deep understanding of them.
Developing Pluto AI
Magyar Telekom’s AI Team partnered with Google Cloud Consulting to accelerate the development of Pluto AI. This collaboration ensured that the platform was built on best practices, aligned with industry standards, and met security and compliance requirements of a regulated industry.
Here are some of the key features and functionality of Pluto AI:
1. Modular framework
Pluto AI’s modular architecture allows teams to seamlessly integrate, change, and update AI models, tools, and architectural patterns. This flexibility enables the platform to cater to a wide range of use cases and rapidly evolve alongside Magyar Telekom’s AI strategy.
The core modules of Pluto AI include:
Large language models: Pluto AI integrates with state-of-the-art LLMs, enabling natural language understanding, text and image generation, and conversational AI applications.
Code generation and assistance: The platform supports code generation, autocompletion, and debugging, boosting developer productivity and code quality. Pluto AI provides both a coding model, accessible via its user interface, for all development levels and IDE integration for experienced coders.
API: Pluto AI’s models can be called via API, enabling all parts of Magyar Telekom to utilize and integrate AI capabilities into their existing and new solutions.
Retrieval augmented generation (RAG) with grounding capabilities: RAG combines LLMs with internal knowledge sources, including multimodal content like images and videos. This enables teams to build AI assistants that can access and synthesize information from vast datasets and add evidence like extended citations from both corporate and public data to their responses.
Customizable AI assistants: Users can create tailored, personalized AI assistants by defining system prompts, uploading documents, and fine-tuning model behavior to meet their business needs (a minimal illustration follows after this list).
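Pluto AI’s own API is internal and not public, so the following is only a minimal sketch of the kind of call such an API module might wrap, given that the platform builds on Vertex AI’s Model Garden (described in the next section). The project, region, model name, and system prompt are placeholders, not Magyar Telekom’s actual configuration.

```python
# Minimal sketch using the Vertex AI Python SDK (google-cloud-aiplatform).
# Project, region, model name, and the system prompt are illustrative only.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="europe-west1")

# A "customizable assistant": behavior is fixed by a system instruction,
# then exposed to business teams as a single callable endpoint.
contract_assistant = GenerativeModel(
    "gemini-1.5-pro",
    system_instruction=(
        "You are a contract-review assistant. Summarize key obligations, "
        "flag unusual clauses, and cite the relevant section numbers."
    ),
)

response = contract_assistant.generate_content(
    "Review the attached supplier agreement excerpt: ..."
)
print(response.text)
```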
2. Technical implementation
Pluto AI runs on Compute Engine using virtual machines, providing scalability, reliability, and efficient resource management. The platform also utilizes foundation models from the Model Garden on Vertex AI, including Google’s Gemini, Imagen, and Veo models, Anthropic’s Claude 3.5 Sonnet, and more. Magyar Telekom also deployed Elasticsearch on Google Cloud to store the knowledge bases necessary for enabling RAG workflows.
In addition to these core components, Pluto AI also utilizes other Google Cloud services to help develop production-ready applications, such as Cloud Logging, Pub/Sub, Cloud Storage, Firestore, and Looker.
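As a rough sketch of how the RAG workflow described above might tie these pieces together, the snippet below retrieves passages from an Elasticsearch knowledge base and grounds a Gemini prompt on them. The index name, document fields, and model choice are assumptions for illustration, not Magyar Telekom’s actual setup.

```python
# Minimal RAG sketch: retrieve passages from an Elasticsearch knowledge base,
# then ground a Gemini prompt on them. Index and field names ("kb", "text",
# "source") and the model name are placeholders.
from elasticsearch import Elasticsearch
import vertexai
from vertexai.generative_models import GenerativeModel

es = Elasticsearch("http://localhost:9200")
vertexai.init(project="your-gcp-project", location="europe-west1")
model = GenerativeModel("gemini-1.5-flash")

def answer(question: str) -> str:
    # 1. Retrieve the most relevant passages for the question.
    hits = es.search(
        index="kb",
        query={"match": {"text": question}},
        size=3,
    )["hits"]["hits"]
    passages = [h["_source"]["text"] for h in hits]
    sources = [h["_source"].get("source", "unknown") for h in hits]

    # 2. Ground the prompt: answer only from retrieved context, with citations.
    context = "\n\n".join(
        f"[{i+1}] ({src}) {p}" for i, (p, src) in enumerate(zip(passages, sources))
    )
    prompt = (
        "Answer using only the context below and cite passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return model.generate_content(prompt).text

print(answer("What is our retention policy for customer call records?"))
```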
3. User interface and experience
Pluto AI’s intuitive interface makes AI tools accessible to users with varying technical expertise. A dropdown menu allows users to easily navigate between different modules and functionalities. The platform’s design prioritizes user experience, ensuring that employees can leverage AI capabilities without a steep learning curve.
Impact and adoption
Pluto AI has seen impressive adoption rates, with hundreds of daily active users across different departments. The platform’s user-friendly design and practical applications have garnered positive feedback from Magyar Telekom employees.
In addition, Pluto AI has enabled the development of various AI assistants, including legal and compliance assistants that accelerate contract review, identify compliance risks, and analyze legal documents. Knowledge management assistants have enhanced knowledge sharing and retrieval across the organization, while software development has benefited from code generation and assistance tools. Additionally, AI-powered chatbots that handle routine inquiries have significantly improved customer support experiences.
Magyar Telekom has seen quantifiable results since rolling out Pluto AI. These include hundreds of daily unique users, tens of thousands of API calls, an estimated 20% reduction in the time spent reviewing legal documents, and a 15% decrease in code defects.
Vision and future roadmap for Pluto AI
Magyar Telekom sees Pluto AI as a key part of its AI strategy going forward. To maximize its impact, the company intends to expand the platform to more markets, business units, and departments within the organization. Additionally, Magyar Telekom is looking into the possibility of offering Pluto AI as a service or a product to other Deutsche Telekom markets. The company is also planning to build a library of reusable AI modules and frameworks that can be easily adapted to different use cases.
Magyar Telekom is pursuing several key initiatives to enhance Pluto AI and expand its capabilities. These efforts include investigating the potential of agent-based AI systems to automate complex tasks and workflows, adding a language selector for multilingual support to cater to a diverse user base, and developing an enhanced interface for managing RAG solutions, monitoring usage, and tracking performance metrics. Magyar Telekom also plans to continue developing dashboards for monitoring and optimizing cloud resource usage and costs.
Pluto AI has transformed Magyar Telekom’s AI landscape, making AI accessible, practical, and impactful. By providing a user-friendly platform, fostering experimentation, and delivering tangible business value, Pluto AI has set a new standard for internal AI adoption.