Amazon Connect Customer Profiles now offers enhanced calculated attributes with timestamp controls, historical data backfill, and improved limits to help businesses transform customer data into actionable insights. Customers can now specify timestamps on their data, including future-dated events, and process historical data with increased limits.
Amazon Connect Customer Profiles offers calculated attributes that transform customer behavior data (e.g., contacts, orders, web visits) into actionable insights, such as a customer's preferred channel, to drive proactive outbound campaigns, dynamic routing, and IVR personalization without requiring engineering resources. With these enhancements, customers can now create more accurate and relevant calculated attributes by controlling which timestamps are used for calculations and ensuring proper chronological ordering regardless of ingestion sequence. The new historical calculation capability automatically includes previously ingested data when creating new attributes, eliminating the wait for meaningful insights. These enhancements enable sophisticated use cases like tracking upcoming appointments, analyzing long-term customer behavior patterns, evaluating customer lifetime value, and ensuring agents are prepared with relevant context before customer interactions.
Amazon Connect Customer Profiles is available in US East (N. Virginia), US West (Oregon), Africa (Cape Town), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Seoul), Canada (Central), Europe (Frankfurt), and Europe (London). To learn more, refer to our help documentation, visit our webpage, and view the API reference guide.
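For orientation, here is a minimal sketch of defining a calculated attribute with the AWS CLI. The create-calculated-attribute-definition command and the parameters shown are existing Customer Profiles APIs; the domain name, attribute details, and values are illustrative, and the new fields for timestamp controls and historical backfill are not shown here, so check the API reference for their exact names.
code_block
# Sketch: count a customer's orders over the last 30 days as a calculated attribute.
# Names and expressions are placeholders; adjust to your domain's object type mapping.
aws customer-profiles create-calculated-attribute-definition \
  --domain-name my-domain \
  --calculated-attribute-name orders_last_30_days \
  --statistic COUNT \
  --attribute-details '{"Attributes":[{"Name":"order_id"}],"Expression":"{order_id}"}' \
  --conditions '{"Range":{"Value":30,"Unit":"DAYS"}}'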
Amazon CloudWatch agent now supports collecting detailed performance statistics for Amazon Elastic Block Store (EBS) volumes attached to Amazon EC2 instances and Amazon EKS nodes. The CloudWatch agent can be configured to collect NVMe-based metrics including queue depth, number of operations, bytes sent and received, and time spent on read and write I/O operations, making them available as custom metrics in CloudWatch.
This enhancement provides customers more granular visibility into their volume’s I/O performance so they can quickly identify and proactively troubleshoot application performance bottlenecks. Customers can use these detailed metrics in CloudWatch to analyze I/O patterns, track performance trends, create custom dashboards, and set up automated alarms based on performance thresholds, helping them maintain optimal storage performance and improve resiliency for their workloads and applications.
EBS detailed performance statistics via Amazon CloudWatch agent are available for all EBS volumes attached to Nitro-based EC2 instances in all AWS Commercial and AWS GovCloud (US) Regions. See the Amazon CloudWatch pricing page for CloudWatch pricing details.
To get started with detailed performance statistics for Amazon Elastic Block Store (EBS) volumes in CloudWatch, see Configuring the CloudWatch agent in the Amazon CloudWatch User Guide. To learn more about detailed performance statistics for Amazon Elastic Block Store (EBS) volumes, see Amazon EBS detailed performance statistics in the Amazon EBS User Guide.
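As a small illustration of the alarm use case described above, the sketch below creates a CloudWatch alarm on one of the new EBS metrics once the agent publishes them as custom metrics. put-metric-alarm is the standard CloudWatch CLI; the CWAgent namespace is the agent's default for custom metrics, and the metric name and dimension shown are assumptions, so use the names that appear in your account after enabling EBS collection in the agent.
code_block
# Sketch: alarm when average read I/O time on a volume stays high for 15 minutes.
aws cloudwatch put-metric-alarm \
  --alarm-name ebs-read-latency-high \
  --namespace CWAgent \
  --metric-name diskio_ebs_total_read_time \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 500 \
  --comparison-operator GreaterThanThreshold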
Amazon Elastic File System (EFS) now supports Internet Protocol Version 6 (IPv6) for both EFS APIs and mount targets, enabling customers to manage and mount file systems using IPv4, IPv6, or dual-stack clients.
This launch helps customers meet IPv6 compliance requirements and eliminates the need for complex infrastructure to handle IPv6 to IPv4 address translation. Customers can now use IPv6 clients to access EFS APIs through new dual-stack endpoints. Additionally, customers can mount file systems using IPv6 by specifying an IP address type when creating mount targets.
EFS offers IPv6 support in all AWS Commercial and AWS GovCloud (US) Regions. To get started using IPv6 on EFS, refer to the EFS User Guide.
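The sketch below shows how creating an IPv6-capable mount target might look with the AWS CLI. create-mount-target is an existing EFS API; the --ip-address-type flag and the DUAL_STACK value reflect the capability described above but are assumptions to verify against the current CLI reference, and the resource IDs are placeholders.
code_block
# Sketch: create a dual-stack mount target so both IPv4 and IPv6 clients can mount the file system.
aws efs create-mount-target \
  --file-system-id fs-0123456789abcdef0 \
  --subnet-id subnet-0123456789abcdef0 \
  --ip-address-type DUAL_STACK \
  --security-groups sg-0123456789abcdef0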
Today, we’re introducing Pub/Sub Single Message Transforms (SMTs) to make it easy to perform simple data transformations right within Pub/Sub itself.
This comes at a time when businesses are increasingly reliant on streaming data to derive real-time insights, understand evolving customer trends, and ultimately make critical decisions that impact their bottom line and strategic direction. In this world, the sheer volume and velocity of streaming data present both opportunities and challenges. Whether you’re generating and analyzing data, ingesting data from another source, or syndicating your data for others to use, you often need to perform transforms on that data to match your use case. For example, if you’re providing data to other teams or customers, you may have the need to redact personally identifiable information (PII) from the messages before sharing data. And if you’re using data you generated or sourced from somewhere else – especially unstructured data – you may need to perform data format conversions or other types of data normalization.
Traditionally, the options for these simple transformations within a message involve either altering the source or destination of the data (which may not be an option) or using an additional component like Dataflow or Cloud Run, which incurs additional latency and operational overhead.
Pub/Sub SMTs
An overarching goal of Pub/Sub is to simplify streaming architectures. We already greatly simplified data movement with Import Topics and Export Subscriptions, which removed the need for additional services when ingesting raw streaming data through Pub/Sub into destinations like BigQuery. Pub/Sub Single Message Transforms (SMTs) are designed to be a suite of features that make it easy to validate, filter, enrich, and alter individual messages as they move through Pub/Sub in real time.
The first SMT is available now: JavaScript User-Defined Functions (UDFs), which allows you to perform simple, lightweight modifications to message attributes and/or the data directly within Pub/Sub via snippets of JavaScript code.
Key examples of such modifications include:
Simple transforms: Perform common single message transforms such as data format conversion, casting, adding a new composite field.
Enhanced filtering: Filter based on message data (not just attributes), including regular-expression-based filters.
Data masking and redaction: Safeguard sensitive information by employing masking or redaction techniques on fields containing PII.
In order to stay true to Pub/Sub’s objective of decoupling publishers and subscribers, UDF transforms can be applied independently to a topic, a subscription, or both based on your needs.
JavaScript UDFs in Pub/Sub provide three key benefits:
Flexibility: JavaScript UDFs give you complete control over your transformation logic, catering to a wide variety of use cases, helping deliver a diverse set of transforms.
Simplified pipelines: Transformations happen directly within Pub/Sub, eliminating the need to maintain extra services or infrastructure for data transformation.
Performance: End-to-end latencies are improved for streaming architectures, as you avoid the need for additional products for lightweight transformations.
Pub/Sub JavaScript UDF Single Message Transforms are easy to use. You can add up to five JavaScript transforms on a topic and/or subscription. If a topic SMT is configured, Pub/Sub transforms the message with the SMT logic and persists the transformed message. If a subscription SMT is configured, Pub/Sub transforms the message before sending it to the subscriber. In the case of an Export Subscription, the transformed message is written to the destination. Please see the Single Message Transform overview for more information.
Getting started with Single Message Transforms
JavaScript UDFs, the first Single Message Transform, are generally available starting today for all users. You'll find the new "Add Transform" option in the Google Cloud console when you create a topic or subscription in your Google Cloud project. You can also use the gcloud CLI to start using JavaScript Single Message Transforms today.
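As a rough sketch of the gcloud path, the example below attaches a JavaScript UDF that redacts a PII field when a topic is created. The transform file schema (javascriptUdf, functionName, code) and the --message-transforms-file flag are based on the SMT documentation at launch and should be verified against the current gcloud reference; the UDF assumes the message payload is a JSON string.
code_block
# Sketch: define a redaction UDF and attach it to a new topic.
cat > transforms.yaml <<'EOF'
- javascriptUdf:
    functionName: redactSsn
    code: |
      function redactSsn(message, metadata) {
        // Drop the PII field from the JSON payload and return the modified message.
        const payload = JSON.parse(message.data);
        delete payload.ssn;
        message.data = JSON.stringify(payload);
        return message;
      }
EOF

gcloud pubsub topics create my-topic --message-transforms-file=transforms.yaml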
We plan to launch additional Single Message Transforms in the coming months such as schema validation/encoding SMT, AI Inference SMT, and many more, so stay tuned for more updates on this front.
Today, we're excited to announce the general availability of Compute Engine M4, the latest addition to our memory-optimized machine series and our most performant memory-optimized VM with under 6TB of memory.
The M4 family is designed for workloads like SAP HANA, SQL Server, and in-memory analytics that benefit from a higher memory-to-core ratio. M4 is based on Intel's latest 5th generation Xeon processors (code-named Emerald Rapids), with instances scaling up to 224 vCPUs and 6TB of DDR5 memory. M4 is offered in predefined shapes with two memory-to-vCPU ratios, 13.3:1 and 26.6:1, ranging from 372GB to 6TB of memory, so you can choose how to upgrade your memory-optimized infrastructure, with complete SAP HANA certification across all shapes and sizes.
M4 VMs are also engineered and fine-tuned to deliver consistent performance, with up to 66% better price-performance compared to our previous memory-optimized M3 [1]. M4 outperforms M3 with up to 2.25x [2] more SAPS, a substantial improvement in overall performance. Additionally, M4 delivers up to 2.44x better price-performance compared to M2 [3]. To support customers' most business-critical workloads, M4 offers enterprise-grade reliability and granular controls for scheduled maintenance, and is backed by Compute Engine's Memory Optimized 99.95% Single Instance SLA, which is important for business-critical in-memory database workloads such as SAP.
“We are excited to announce our collaboration with Google Cloud to bring the power of the 5th Gen Intel Xeon processors to the first memory-optimized (M4) instance type among leading hyperscalers. This launch represents a significant milestone in delivering cutting-edge performance, scalability, and efficiency to cloud users for large-scale databases such as SAP Hana and memory-intensive workloads. The new M4 instance delivers advanced capabilities for today and future workloads, empowering businesses to innovate and grow in the digital era.” – Rakesh Mehrotra, VP & GM DCAI Strategy & Product Management, Intel
A full portfolio of memory-optimized machine instances
M4 is just the latest in a long line of Compute Engine’s memory-optimized VM family. We introduced the M1 in 2018 for SAP HANA. M2 followed in 2019, supporting larger workloads. In 2023, we introduced M3, with improved performance and new features. X4 launched in 2024, supporting the largest in-memory databases, with up to 32TB of memory, making Google Cloud the first hyperscaler with an SAP-certified instance of that size.
“For years, SAP and Google Cloud have had a powerful partnership, helping businesses transform with RISE with SAP on Google Cloud. Now, fueled by the enhanced performance, high reliability, and cost efficiency of M4 machines, we’re accelerating our mission to deliver even greater value to our shared customers.” – Lalit Patil, CTO for RISE with SAP, Enterprise Cloud Services, SAP SE
Today, both customers and internal Google teams are adopting the M4 to take advantage of increased performance, new shapes, and Compute Engine’s newest innovations.
Powered by Titanium
M4 is underpinned by Google's Titanium offload technology, enabling ultra-low latency with up to 200 Gb/s of networking bandwidth. By offloading storage and networking to the Titanium adapter, host resources are preserved for running your workloads. Titanium also provides M4 with enhanced lifecycle management, reliability, and security. With Titanium's hitless upgrades and live migration capabilities, most infrastructure maintenance can be performed with minimal to no disruption, helping to ensure predictable performance. Additionally, Titanium's custom-built security hardware root-of-trust further strengthens the security of customer workloads.
Next-level storage with Hyperdisk
M4 VMs come with the latest Hyperdisk storage technology, now available in both Hyperdisk Balanced and Hyperdisk Extreme options. With up to 320K IOPS per instance, Hyperdisk Balanced delivers a blend of performance and cost-efficiency for a wide range of workloads, handling typical transactional throughput and moderate query volumes effectively. Hyperdisk Extreme pushes the boundaries of storage performance, delivering up to 500K IOPS and up to 10,000 MiB/s of throughput per M4 instance for the most demanding applications, such as SAP HANA's in-memory database operations, which require low-latency access to large datasets. You can attach up to 64 Hyperdisk volumes per M4 VM, with up to 512 TiB of total capacity across a mix of Balanced and Extreme volumes.
Hyperdisk’s benefits go beyond raw performance. It allows you to dynamically tune IOPS and bandwidth in real time, so your workloads consistently have the resources they need. Hyperdisk storage pools, available for Hyperdisk Balanced volumes, support capacity pooling and flexible allocation of storage resources, optimizing both utilization and cost-efficiency. As a result, Hyperdisk delivers not only high performance and flexibility but also a significant reduction in total cost of ownership (TCO) compared to traditional storage solutions. The combination of Hyperdisk’s advanced features and Titanium’s storage acceleration offloads storage processing from the CPU, frees up compute resources, and enhances overall M4 performance.
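To make the dynamic tuning concrete, here is a minimal sketch of adjusting a Hyperdisk volume's provisioned performance in place. gcloud compute disks update supports changing provisioned IOPS and throughput for Hyperdisk; the disk name, zone, and values are placeholders to adapt to your workload.
code_block
# Sketch: raise provisioned IOPS and throughput on an existing Hyperdisk volume without downtime.
gcloud compute disks update my-hyperdisk-volume \
  --zone=us-central1-a \
  --provisioned-iops=100000 \
  --provisioned-throughput=2400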
For SAP applications, including SAP NetWeaver-based applications deployed on non-SAP HANA databases (SAP ASE, DB2, SQL Server), such as SAP Business Suite and SAP Business Warehouse (BW), SAP certifications are available for the following machine shapes: 372GB, 744GB, 1,488GB, 2,976GB and 5,952GB. You can find more information on supported SAP applications in SAP Note 2456432.
Get started today
Whether you’re running advanced analytics, complex algorithms, or real-time insights for critical workloads on databases like SAP HANA and SQL Server in the cloud, M4 VMs provide the performance, features, and stability to meet your business needs. With high-performance infrastructure designed to handle massive datasets, M4 VMs offer robust memory and compute capabilities that can meet the needs of your most demanding workloads.
M4 instances are currently available in us-east4, europe-west4, europe-west3, and us-central1, and will be coming to additional regions. Like other instances in the M machine family, you can purchase them on-demand or with committed use discounts (CUDs). For more, see M4's predefined compute resource pricing, or start using M4 in your next project today.
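For a quick start, the sketch below launches an M4 VM with a Hyperdisk Balanced boot disk. The m4-megamem-56 shape and us-central1 region come from this post; the image family and disk flags are illustrative, so adjust them for your OS and project.
code_block
# Sketch: create an M4 instance backed by Hyperdisk Balanced.
gcloud compute instances create m4-demo \
  --zone=us-central1-a \
  --machine-type=m4-megamem-56 \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-type=hyperdisk-balanced \
  --boot-disk-size=200GB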
1. M3-megamem-64 compared to M4-megamem-56. Performance based on the estimated SPECrate®2017_int_base benchmark score.
2. M4-megamem-224 compared to M3-megamem-128.
3. M4-ultramem-224 compared to M2-ultramem-208.
Today, we are announcing the availability of AWS Backup in the Asia Pacific (Taipei) Region. AWS Backup is a fully-managed, policy-driven service that allows you to centrally automate data protection across multiple AWS services spanning compute, storage, and databases. Using AWS Backup, you can centrally create and manage backups of your application data, protect your data from inadvertent or malicious actions with immutable recovery points and vaults, and restore your data in the event of a data loss incident.
You can get started with AWS Backup using the AWS Backup console, SDKs, or CLI by creating a data protection policy and then assigning AWS resources to it using tags or Resource IDs. For more information on the features available in the Asia Pacific (Taipei) Region, visit the AWS Backup product page and documentation. To learn about the Regional availability of AWS Backup, see the AWS Regional Services List.
Today, Amazon SageMaker AI announces the general availability of Amazon EC2 P6-B200 instances in Training Jobs, powered by NVIDIA B200 GPUs. Amazon EC2 P6-B200 instances offer up to 2x performance compared to P5en instances for AI training.
P6-B200 instances feature 8 Blackwell GPUs with 1440 GB of high-bandwidth GPU memory and a 60% increase in GPU memory bandwidth compared to P5en, 5th Generation Intel Xeon processors (Emerald Rapids), and up to 3.2 terabits per second of Elastic Fabric Adapter (EFAv4) networking. P6-B200 instances are powered by the AWS Nitro System, so you can reliably and securely scale AI workloads within Amazon EC2 UltraClusters to tens of thousands of GPUs.
The instances are available through SageMaker HyperPod Flexible Training Plans in the US West (Oregon) AWS Region. For on-demand reservations of B200 instances, please reach out to your account manager.
Amazon SageMaker Model Training lets you easily train machine learning models at scale using fully managed infrastructure optimized for performance and cost. To get started with Training Jobs, visit SageMaker Model Training.
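The sketch below shows how requesting a P6-B200-backed training job might look with the AWS CLI. create-training-job and the parameters shown are standard SageMaker APIs; the ml.p6-b200.48xlarge instance type string is an assumption based on the EC2 instance name, and the image, role, and S3 paths are placeholders.
code_block
# Sketch: launch a single-instance training job on a P6-B200 instance.
aws sagemaker create-training-job \
  --training-job-name llm-pretrain-demo \
  --role-arn arn:aws:iam::123456789012:role/SageMakerTrainingRole \
  --algorithm-specification TrainingImage=123456789012.dkr.ecr.us-west-2.amazonaws.com/my-training-image:latest,TrainingInputMode=File \
  --resource-config InstanceType=ml.p6-b200.48xlarge,InstanceCount=1,VolumeSizeInGB=500 \
  --output-data-config S3OutputPath=s3://my-bucket/output/ \
  --stopping-condition MaxRuntimeInSeconds=86400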
Amazon OpenSearch Ingestion now allows you to ingest data from Atlassian Jira and Confluence and seamlessly index it in Amazon OpenSearch Service managed clusters and serverless collections. With this integration, you can create a unified, searchable knowledge base of all your data in Atlassian Jira and Confluence to power your RAG applications.
This integration supports data ingestion with flexible filtering options, covering projects and issue types in Jira and spaces and pages in Confluence, ensuring that only the information you need is imported. Updates to your data in Jira and Confluence are continuously monitored and automatically synchronized with indices in Amazon OpenSearch Service. To ensure secure and reliable connectivity, multiple authentication methods are supported, including basic API key authentication and OAuth2, with the added security of managing credentials in a secret stored in AWS Secrets Manager.
This feature is available in all 16 AWS commercial Regions where Amazon OpenSearch Ingestion is currently available: US East (Ohio), US East (N. Virginia), US West (Oregon), US West (N. California), Europe (Ireland), Europe (London), Europe (Frankfurt), Europe (Spain), Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Singapore), Asia Pacific (Mumbai), Asia Pacific (Seoul), Canada (Central), South America (Sao Paulo), and Europe (Stockholm).
To get started, you can start ingesting data from Atlassian Jira and Confluence using the AWS Management Console, AWS SDK, or CLI. To learn more about this feature, see the Amazon OpenSearch Service Developer Guide.
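For context, the sketch below outlines creating an OpenSearch Ingestion pipeline that pulls from Jira. aws osis create-pipeline and its flags are existing APIs; the pipeline body, including the Jira source plugin name and its options, is illustrative and should be taken from the OpenSearch Ingestion blueprints for this integration.
code_block
# Sketch: define a Jira-to-OpenSearch pipeline and create it with the OSIS CLI.
cat > jira-pipeline.yaml <<'EOF'
version: "2"
jira-pipeline:
  source:
    jira:                      # assumed plugin name; confirm against the blueprint
      hosts: ["https://your-site.atlassian.net"]
      authentication:
        basic:
          username: ${{aws_secrets:jira-credentials:username}}
          password: ${{aws_secrets:jira-credentials:api_key}}
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "jira-issues"
EOF

aws osis create-pipeline \
  --pipeline-name jira-ingestion \
  --min-units 1 --max-units 4 \
  --pipeline-configuration-body file://jira-pipeline.yaml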
Amazon SageMaker now offers an upgrade experience that enables customers to transition from SageMaker Studio to SageMaker Unified Studio while preserving their existing resources and maintaining consistent access controls. This new capability allows customers to import their SageMaker AI domains, user profiles, and spaces into SageMaker Unified Studio without redeploying infrastructure. The upgrade tool ensures that identity, authentication, and authorization experiences remain consistent, with users retaining access to only the resources they were previously permitted to use.
With this upgrade experience, customers can continue to access their resources from both SageMaker Studio and SageMaker Unified Studio during the transition period, allowing teams to gradually adapt to the new experience. The tool preserves access to existing JupyterLab and Code Editor spaces, as well as other SageMaker AI resources, such as training jobs, ML pipelines, models, and inference endpoints, previously created from SageMaker Studio. Administrators maintain control over the upgrade process and can disable access to SageMaker Studio once users are comfortable with the SageMaker Unified Studio experience. The upgrade tool is available as an open-source solution that provides a guided, step-by-step process to ensure a smooth transition to SageMaker Unified Studio.
The upgrade experience is available in all AWS Commercial Regions where the next generation of Amazon SageMaker is available. See the supported regions list for more details. To learn more about upgrading from SageMaker Studio to SageMaker Unified Studio, visit the GitHub repository, and to learn more about the next generation of Amazon SageMaker, visit the product detail page.
Customers can now create file systems using Amazon Elastic File System (Amazon EFS) in the AWS Asia Pacific (Taipei) Region.
Amazon EFS is designed to provide serverless, fully elastic file storage that lets you share file data without provisioning or managing storage capacity and performance. It is built to scale on demand to petabytes without disrupting applications, growing and shrinking automatically as you add and remove files. Because Amazon EFS has a simple web services interface, you can create and configure file systems quickly and easily. The service is designed to manage file storage infrastructure for you, meaning that you can avoid the complexity of deploying, patching, and maintaining complex file system configurations.
For more information, visit the Amazon EFS product page, and see the AWS Region Table for complete regional availability information.
Amazon Connect now allows you to track durations of holds initiated by individual agents in multiparty calling scenarios through the new Agent Initiated Hold Duration field on the contact record. This new field allows contact center managers to gain insights into hold patterns at the individual agent level during customer interactions. This also provides other benefits including better agent performance management, allowing managers to identify areas for improvement in call handling. Additionally, it helps in optimizing your customers’ experience by providing insights into hold patterns and durations across different agents and scenarios. This level of granularity in data can lead to more informed decision-making in workforce management and training initiatives.
AWS Key Management Service (KMS) is announcing support for on-demand rotation of symmetric encryption KMS keys with imported key material. This new capability enables you to rotate the cryptographic key material of Bring Your Own Keys (BYOK) keys without changing the key identifier (key ARN). Rotating keys helps you meet compliance requirements and security best practices that mandate periodic key rotation.
Organizations can now better align key rotation with their internal security policies when using imported keys within AWS KMS. This new on-demand rotation capability supports both immediate rotation as well as scheduled rotation. Similar to flexible rotation for standard KMS keys, this new rotation capability offers seamless transition to new key material within an existing KMS key ARN and key alias, with zero downtime and complete backwards compatibility with existing data protected under this key.
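The sketch below outlines how an on-demand rotation of a BYOK key might look with the AWS CLI. rotate-key-on-demand and get-key-rotation-status are existing KMS APIs; the step that imports the new key material (including the --import-type NEW_KEY_MATERIAL option) is summarized from this feature description and should be confirmed in the KMS documentation, and the key ARN is a placeholder.
code_block
KEY_ID=arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555

# 1. Import the new key material for the same KMS key (wrapping and token steps omitted).
# aws kms import-key-material --key-id $KEY_ID --import-type NEW_KEY_MATERIAL ...

# 2. Trigger the on-demand rotation to the newly imported material.
aws kms rotate-key-on-demand --key-id $KEY_ID

# 3. Confirm the rotation status.
aws kms get-key-rotation-status --key-id $KEY_ID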
The pace of innovation in open-source AI is breathtaking, with models like Meta's Llama 4 and DeepSeek AI's DeepSeek-V3 and R1. However, deploying and optimizing large, powerful models can be complex and resource-intensive. Developers and machine learning (ML) engineers need reproducible, verified recipes that articulate the steps for trying out these models on available accelerators.
Today, we’re excited to announce enhanced support and new, optimized recipes for the latest Llama4 and DeepSeek models, leveraging our cutting-edge AI Hypercomputer platform. AI Hypercomputer helps build a strong AI infrastructure foundation using a set of purpose-built infrastructure components that are designed to work well together for AI workloads like training and inference. It is a systems-level approach that draws from our years of experience serving AI experiences to billions of users, and combines purpose-built hardware, optimized software and frameworks, and flexible consumption models. Our AI Hypercomputer resources repository on GitHub, your hub for these recipes, continues to grow.
In this blog, we’ll show you how to access Llama4 and DeepSeek models today on AI Hypercomputer.
Added support for new Llama4 models
Meta recently released the Scout and Maverick models in the Llama4 herd of models. Llama 4 Scout is a 17 billion active parameter model with 16 experts, and Llama 4 Maverick is a 17 billion active parameter model with 128 experts. These models deliver innovations and optimizations based on a Mixture of Experts (MoE) architecture. They support multimodal capability and long context length.
But serving these models can present challenges in terms of deployment and resource management. To help simplify this process, we’re releasing new recipes for serving Llama4 models on Google Cloud Trillium TPUs and A3 Mega and A3 Ultra GPUs.
JetStream, Google’s throughput and memory-optimized engine for LLM inference on XLA devices, now supports Llama-4-Scout-17B-16E and Llama-4-Maverick-17B-128E inference on Trillium, the sixth-generation TPU. New recipes now provide the steps to deploy these models using JetStream and MaxText on a Trillium TPU GKE cluster. vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. New recipes now demonstrate how to use vLLM to serve the Llama4 Scout and Maverick models on A3 Mega and A3 Ultra GPU GKE clusters.
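Outside of the full GKE recipes, a single-node GPU sketch of serving Llama 4 Scout with vLLM might look like the following. The Hugging Face model ID and flags are assumptions; the recipes linked above contain the verified commands and cluster manifests.
code_block
# Sketch: serve Llama 4 Scout with vLLM across 8 GPUs on one A3 node.
pip install vllm
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --tensor-parallel-size 8 \
  --max-model-len 131072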
For serving the Maverick model on TPUs, we utilize Pathways on Google Cloud. Pathways is a system which simplifies large-scale machine learning computations by enabling a single JAX client to orchestrate workloads across multiple large TPU slices. In the context of inference, Pathways enables multi-host serving across multiple TPU slices. Pathways is used internally at Google to train and serve large models like Gemini.
MaxText provides high-performance, highly scalable, open-source LLM reference implementations for OSS models, written in pure Python/JAX and targeting Google Cloud TPUs and GPUs for training and inference. MaxText now includes reference implementations for the Llama 4 Scout and Maverick models, along with information on how to perform checkpoint conversion, training, and decoding for Llama 4 models.
Added support for DeepSeek Models
Earlier this year, DeepSeek released two open-source models: the DeepSeek-V3 model, followed by the DeepSeek-R1 model. The V3 model provides model innovations and optimizations based on an MoE architecture. The R1 model provides reasoning capabilities through a chain-of-thought thinking process.
To help simplify deployment and resource management, we’re releasing new recipes for serving DeepSeek models on Google Cloud Trillium TPUs and A3 Mega and A3 Ultra GPUs.
JetStream now supports DeepSeek-R1-Distill-Llama70B inference on Trillium. A new recipe now provides the steps to deploy DeepSeek-R1-Distill-Llama-70B using JetStream and MaxText on a Trillium TPU VM. With the recent ability to work with Google Cloud TPUs, vLLM users can leverage the performance-cost benefits of TPUs with a few configuration changes. vLLM on TPU now supports all DeepSeek R1 Distilled models on Trillium. Here’s a recipe which demonstrates how to use vLLM, a high-throughput inference engine, to serve the DeepSeek distilled Llama model on Trillium TPUs.
You can also deploy DeepSeek models using the SGLang inference stack on our A3 Ultra VMs, powered by eight NVIDIA H200 GPUs, with this recipe. A recipe for A3 Mega VMs with SGLang is also available, which shows you how to deploy multihost inference utilizing two A3 Mega nodes. Cloud GPU users using the vLLM inference engine can also deploy DeepSeek models on A3 Mega (recipe) and A3 Ultra (recipe) VMs.
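For a feel of the SGLang path, here is a minimal single-node sketch launching a DeepSeek distilled model; the multihost A3 Mega deployment in the recipe above requires additional cluster configuration. The model ID and flags are assumptions to verify against the recipe.
code_block
# Sketch: serve DeepSeek-R1-Distill-Llama-70B with SGLang across 8 GPUs on one node.
pip install "sglang[all]"
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --tp 8 \
  --host 0.0.0.0 --port 30000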
MaxText now also includes support for architectural innovations from DeepSeek, such as Multi-Head Latent Attention (MLA), MoE shared and routed experts with loss-free load balancing, dropless expert parallelism, mixed decoder layers (dense and MoE), and YaRN RoPE embeddings. The reference implementations for the DeepSeek family of models allow you to rapidly experiment with your models by incorporating some of these newer architectural enhancements.
Recipe example
The reproducible recipes show the steps to deploy and benchmark inference with the new Llama4 and DeepSeek models. For example, this TPU recipe outlines the steps to deploy the Llama-4-Scout-17B-16E Model with JetStream MaxText Engine with Trillium TPU. The recipe shows steps to provision the TPU cluster, download the model weights and set up JetStream and MaxText. It then shows you how to convert the checkpoint to a compatible format for MaxText, deploy it on a JetStream server, and run your benchmarks.
You can deploy Llama4 Scout and Maverick models or DeepSeekV3/R1 models today using inference recipes from the AI Hypercomputer Github repository. These recipes provide a starting point for deploying and experimenting with Llama4 models on Google Cloud. Explore the recipes and resources linked below, and stay tuned for future updates. We hope you have fun building and share your feedback!
When you deploy open models like DeepSeek and Llama, you are responsible for their security and legal compliance. You should follow responsible AI best practices, adhere to each model's specific licensing terms, and ensure your deployment is secure and compliant with all regulations in your area.
Looking to fine-tune multimodal AI models for your specific domain but facing infrastructure and implementation challenges? This guide demonstrates how to overcome the multimodal implementation gap using Google Cloud and Axolotl, with a complete hands-on example fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset. Learn how to scale from concept to production while addressing the typical challenges of managing GPU resources, data preparation, and distributed training.
Filling in the Gap
Organizations across industries are rapidly adopting multimodal AI to transform their operations and customer experiences. Gartner analysts predict 40% of generative AI solutions will be multimodal (text, image, audio and video) by 2027, up from just 1% in 2023, highlighting the accelerating demand for solutions that can process and understand multiple types of data simultaneously.
Healthcare providers are already using these systems to analyze medical images alongside patient records, speeding up diagnosis. Retailers are building shopping experiences where customers can search with images and get personalized recommendations. Manufacturing teams are spotting quality issues by combining visual inspections with technical data. Customer service teams are deploying agents that process screenshots and photos alongside questions, reducing resolution times.
Multimodal AI applications powerfully mirror human thinking. We don’t experience the world in isolated data types – we combine visual cues, text, sound, and context to understand what’s happening. Training multimodal models on your specific business data helps bridge the gap between how your teams work and how your AI systems operate.
Key challenges organizations face in production deployment
Moving from prototype to production with multimodal AI isn’t easy. PwC survey data shows that while companies are actively experimenting, most expect fewer than 30% of their current experiments to reach full scale in the next six months. The adoption rate for customized models remains particularly low, with only 20-25% of organizations actively using custom models in production.
The following technical challenges consistently stand in the way of success:
Infrastructure complexity: Multimodal fine-tuning demands substantial GPU resources – often 4-8x more than text-only models. Many organizations lack access to the necessary hardware and struggle to configure distributed training environments efficiently.
Data preparation hurdles: Preparing multimodal training data is fundamentally different from text-only preparation. Organizations struggle with properly formatting image-text pairs, handling diverse file formats, and creating effective training examples that maintain the relationship between visual and textual elements.
Training workflow management: Configuring and monitoring distributed training across multiple GPUs requires specialized expertise most teams don’t have. Parameter tuning, checkpoint management, and optimization for multimodal models introduce additional layers of complexity.
These technical barriers create what we call “the multimodal implementation gap” – the difference between recognizing the potential business value and successfully delivering it in production.
How Google Cloud and Axolotl together solve these challenges
Our collaboration brings together complementary strengths to directly address these challenges. Google Cloud provides the enterprise-grade infrastructure foundation necessary for demanding multimodal workloads. Our specialized hardware accelerators such as NVIDIA B200 Tensor Core GPUs and Ironwood are optimized for these tasks, while our managed services like Google Cloud Batch, Vertex AI Training, and GKE Autopilot minimize the complexities of provisioning and orchestrating multi-GPU environments. This infrastructure seamlessly integrates with the broader ML ecosystem, creating smooth end-to-end workflows while maintaining the security and compliance controls required for production deployments.
Axolotl complements this foundation with a streamlined fine-tuning framework that simplifies implementation. Its configuration-driven approach abstracts away technical complexity, allowing teams to focus on outcomes rather than infrastructure details. Axolotl supports multiple open source and open weight foundation models and efficient fine-tuning methods like QLoRA. This framework includes optimized implementations of performance-enhancing techniques, backed by community-tested best practices that continuously evolve through real-world usage.
Together, we enable organizations to implement production-grade multimodal fine-tuning without reinventing complex infrastructure or developing custom training code. This combination accelerates time-to-value, turning what previously required months of specialized development into weeks of standardized implementation.
Solution Overview
Our multimodal fine-tuning pipeline consists of five essential components:
Foundational model: Choose a base model that meets your task requirements. Axolotl supports a variety of open source and open weight multimodal models including Llama 4, Pixtral, LLaVA-1.5, Mistral-Small-3.1, Qwen2-VL, and others. For this example, we’ll use Gemma 3, our latest open and multimodal model family.
Data preparation: Create properly formatted multimodal training data that maintains the relationship between images and text. This includes organizing image-text pairs, handling file formats, and splitting data into training/validation sets.
Training configuration: Define your fine-tuning parameters using Axolotl’s YAML-based approach, which simplifies settings for adapters like QLoRA, learning rates, and model-specific optimizations.
Infrastructure orchestration: Select the appropriate compute environment based on your scale and operational requirements. Options include Google Cloud Batch for simplicity, Google Kubernetes Engine for flexibility, or Vertex AI Custom Training for MLOps integration.
Production integration: Streamlined pathways from fine-tuning to deployment.
The pipeline structure above represents the conceptual components of a complete multimodal fine-tuning system. In our hands-on example later in this guide, we’ll demonstrate these concepts through a specific implementation tailored to the SIIM-ISIC Melanoma dataset, using GKE for orchestration. While the exact implementation details may vary based on your specific dataset characteristics and requirements, the core components remain consistent.
Selecting the Right Google Cloud Environment
Google Cloud offers multiple approaches to orchestrating multimodal fine-tuning workloads. Let’s explore three options with different tradeoffs in simplicity, flexibility, and integration:
Google Cloud Batch
Google Cloud Batch is best for teams seeking maximum simplicity for GPU-intensive training jobs with minimal infrastructure management. It handles all resource provisioning, scheduling, and dependencies automatically, eliminating the need for container orchestration or complex setup. This fully managed service balances performance and cost effectiveness, making it ideal for teams who need powerful computing capabilities without operational overhead.
Vertex AI Custom Training
Vertex AI Custom Training is best for teams prioritizing integration with Google Cloud’s MLOps ecosystem and managed experiment tracking. Vertex AI Custom Training jobs automatically integrate with Experiments for tracking metrics, the Model Registry for versioning, Pipelines for workflow orchestration, and Endpoints for deployment.
Google Kubernetes Engine (GKE)
GKE is best for teams seeking flexible integration with containerized workloads. It enables unified management of training jobs alongside other services in your container ecosystem while leveraging Kubernetes’ sophisticated scheduling capabilities. GKE offers fine-grained control over resource allocation, making it ideal for complex ML pipelines. For our hands-on example, we’ll use GKE in Autopilot mode, which maintains these integration benefits while Google Cloud automates infrastructure management including node provisioning and scaling. This lets you focus on your ML tasks rather than cluster administration, combining the flexibility of Kubernetes with the operational simplicity of a managed service.
Take a look at our code sample here for a complete implementation that demonstrates how to orchestrate a multimodal fine-tuning job on GKE:
This repository includes ready-to-use Kubernetes manifests for deploying Axolotl training jobs on GKE in Autopilot mode, covering automated cluster setup with GPUs, persistent storage configuration, job specifications, and monitoring integration.
Hands-on example: Fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset
This example uses the SIIM-ISIC Melanoma Classification dataset, which contains dermoscopic images of skin lesions with labels indicating whether they are malignant or benign. With melanoma accounting for 75% of skin cancer deaths despite its relative rarity, early and accurate detection is critical for patient survival. By applying multimodal AI to this challenge, we unlock the potential to help dermatologists improve diagnostic accuracy and potentially save lives through faster, more reliable identification of dangerous lesions. So, let's walk through a complete example of fine-tuning Gemma 3 on this dataset.
For this implementation, we’ll leverage GKE in Autopilot mode to orchestrate our training job and monitoring, allowing us to focus on the ML workflow while Google Cloud handles the infrastructure management.
Data Preparation
The SIIM-ISIC Melanoma Classification dataset requires specific formatting for multimodal fine-tuning with Axolotl. Our data preparation process involves two main steps: (1) efficiently transferring the dataset to Cloud Storage using Storage Transfer Service, and (2) processing the raw data into the format required by Axolotl. To start, transfer the dataset.
Create a TSV file that contains the URLs for the ISIC dataset files:
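A sketch of that URL list is shown below. The TsvHttpData-1.0 header is the format Storage Transfer Service expects for URL lists; the ISIC download URLs are illustrative, so copy the exact URLs from the ISIC Challenge site, and the file name matches the one referenced in the transfer job step.
code_block
# Sketch: build the URL list and upload it to your bucket for the transfer job.
cat > melanoma_dataset_urls.tsv <<'EOF'
TsvHttpData-1.0
https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Training_JPEG.zip
https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Training_GroundTruth.csv
https://isic-challenge-data.s3.amazonaws.com/2020/ISIC_2020_Test_JPEG.zip
EOF

gcloud storage cp melanoma_dataset_urls.tsv gs://${GCS_BUCKET_NAME}/melanoma_dataset_urls.tsv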
Set up appropriate IAM permissions for the Storage Transfer Service:
code_block
# Get your current project ID
export PROJECT_ID=$(gcloud config get-value project)

# Get your project number
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")

# Enable the Storage Transfer API
echo "Enabling Storage Transfer API..."
gcloud services enable storagetransfer.googleapis.com --project=${PROJECT_ID}

# Important: The Storage Transfer Service account is created only after you access the service.
# Access the Storage Transfer Service in the Google Cloud Console to trigger its creation:
# https://console.cloud.google.com/transfer/cloud
echo "IMPORTANT: Before continuing, please visit the Storage Transfer Service page in the Google Cloud Console"
echo "Go to: https://console.cloud.google.com/transfer/cloud"
echo "This ensures the Storage Transfer Service account is properly created."
echo "After visiting the page, wait approximately 60 seconds for account propagation, then continue."
echo ""
echo "Press Enter once you've completed this step..."
read -p ""

# Grant Storage Transfer Service the necessary permissions
export STS_SERVICE_ACCOUNT_EMAIL="project-${PROJECT_NUMBER}@storage-transfer-service.iam.gserviceaccount.com"
echo "Granting permissions to Storage Transfer Service account: ${STS_SERVICE_ACCOUNT_EMAIL}"

gcloud storage buckets add-iam-policy-binding gs://${GCS_BUCKET_NAME} \
  --member=serviceAccount:${STS_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.objectViewer \
  --condition=None

gcloud storage buckets add-iam-policy-binding gs://${GCS_BUCKET_NAME} \
  --member=serviceAccount:${STS_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.objectUser \
  --condition=None
Set up a storage transfer job using the URL list:
Navigate to Cloud Storage > Transfer
Click “Create Transfer Job”
Select “URL list” as Source type and “Google Cloud Storage” as Destination type
Enter the path to your TSV file: gs://<GCS_BUCKET_NAME>/melanoma_dataset_urls.tsv
Select your destination bucket
Use the default job settings and click Create
The transfer will download approximately 32GB of data from the ISIC Challenge repository directly to your Cloud Storage bucket. Once the transfer is complete, you'll need to extract the ZIP files before proceeding to the next step, where we'll format this data for Axolotl. See the notebook in the GitHub repository for a full walk-through of how to format the data for Axolotl.
Preparing Multimodal Training Data
For multimodal models like Gemma 3, we need to structure our data following the extended chat_template format, which defines conversations as a series of messages with both text and image content.
Below is an example of a single training input:
code_block
{
  "messages": [
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "You are a dermatology assistant that helps identify potential melanoma from skin lesion images."}
      ]
    },
    {
      "role": "user",
      "content": [
        {"type": "image", "path": "/path/to/image.jpg"},
        {"type": "text", "text": "Does this appear to be malignant melanoma?"}
      ]
    },
    {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "Yes, this appears to be malignant melanoma."}
      ]
    }
  ]
}
We split the data into training (80%), validation (10%), and test (10%) sets, while maintaining the class distribution in each split using stratified sampling.
This format allows Axolotl to properly process both the images and their corresponding labels, maintaining the relationship between visual and textual elements during training.
Creating the Axolotl Configuration File
Next, we’ll create a configuration file for Axolotl that defines how we’ll fine-tune Gemma 3. We’ll use QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization to efficiently fine-tune the model while keeping memory requirements manageable. While A100 40GB GPUs have substantial memory, the 4-bit quantization with QLoRA allows us to train with larger batch sizes or sequence lengths if needed, providing additional flexibility for our melanoma classification task. The slight reduction in precision is typically an acceptable tradeoff, especially for fine-tuning tasks where we’re adapting a pre-trained model rather than training from scratch.
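A rough sketch of what gemma3-melanoma.yaml (used by the ConfigMap step below) might contain is shown here. These are common Axolotl QLoRA options; the exact keys for Gemma 3 multimodal fine-tuning should be taken from the configuration in the companion GitHub repository, and values such as the model ID and dataset path are placeholders.
code_block
# Sketch: write an Axolotl QLoRA config for the melanoma fine-tuning job.
cat > gemma3-melanoma.yaml <<'EOF'
base_model: google/gemma-3-4b-it        # placeholder Gemma 3 checkpoint
load_in_4bit: true                      # 4-bit quantization for QLoRA
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05

datasets:
  - path: /data/melanoma_train.jsonl    # chat_template-formatted image/text pairs
    type: chat_template

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_torch
bf16: true
output_dir: /workspace/outputs/gemma3-melanoma
EOF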
This configuration sets up QLoRA fine-tuning with parameters optimized for our melanoma classification task. Next, we’ll set up our GKE Autopilot environment to run the training.
Setting up GKE Autopilot for GPU Training
Now that we have our configuration file ready, let’s set up the GKE Autopilot cluster we’ll use for training. As mentioned earlier, Autopilot mode lets us focus on our ML task while Google Cloud handles the infrastructure management.
Let’s create our GKE Autopilot cluster:
code_block
# Set up environment variables for cluster configuration
export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export CLUSTER_NAME=melanoma-training-cluster
export RELEASE_CHANNEL=regular

# Enable required Google APIs
echo "Enabling required Google APIs..."
gcloud services enable container.googleapis.com --project=${PROJECT_ID}
gcloud services enable compute.googleapis.com --project=${PROJECT_ID}

# Create a GKE Autopilot cluster in the same region as your data
echo "Creating GKE Autopilot cluster ${CLUSTER_NAME}..."
gcloud container clusters create-auto ${CLUSTER_NAME} \
  --location=${REGION} \
  --project=${PROJECT_ID} \
  --release-channel=${RELEASE_CHANNEL}

# Install kubectl if not already installed
if ! command -v kubectl &> /dev/null; then
  echo "Installing kubectl..."
  gcloud components install kubectl
fi

# Install the GKE auth plugin required for kubectl
echo "Installing GKE auth plugin..."
gcloud components install gke-gcloud-auth-plugin

# Configure kubectl to use the cluster
echo "Configuring kubectl to use the cluster..."
gcloud container clusters get-credentials ${CLUSTER_NAME} \
  --location=${REGION} \
  --project=${PROJECT_ID}

# Verify kubectl is working correctly
echo "Verifying kubectl connection to cluster..."
kubectl get nodes
Now set up Workload Identity Federation for GKE to securely authenticate with Google Cloud APIs without using service account keys:
code_block
# Set variables for Workload Identity Federation
export PROJECT_ID=$(gcloud config get-value project)
export NAMESPACE="axolotl-training"
export KSA_NAME="axolotl-training-sa"
export GSA_NAME="axolotl-training-sa"

# Create a Kubernetes namespace for the training job
kubectl create namespace ${NAMESPACE} || echo "Namespace ${NAMESPACE} already exists"

# Create a Kubernetes ServiceAccount
kubectl create serviceaccount ${KSA_NAME} \
  --namespace=${NAMESPACE} || echo "ServiceAccount ${KSA_NAME} already exists"

# Create an IAM service account
if ! gcloud iam service-accounts describe ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com &>/dev/null; then
  echo "Creating IAM service account ${GSA_NAME}..."
  gcloud iam service-accounts create ${GSA_NAME} \
    --display-name="Axolotl Training Service Account"

  # Wait for IAM propagation
  echo "Waiting for IAM service account creation to propagate..."
  sleep 15
else
  echo "IAM service account ${GSA_NAME} already exists"
fi

# Grant necessary permissions to the IAM service account
echo "Granting storage.objectAdmin role to IAM service account..."
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Wait for IAM propagation
echo "Waiting for IAM policy binding to propagate..."
sleep 10

# Allow the Kubernetes ServiceAccount to impersonate the IAM service account
echo "Binding Kubernetes ServiceAccount to IAM service account..."
gcloud iam service-accounts add-iam-policy-binding ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Annotate the Kubernetes ServiceAccount
echo "Annotating Kubernetes ServiceAccount..."
kubectl annotate serviceaccount ${KSA_NAME} \
  --namespace=${NAMESPACE} \
  iam.gke.io/gcp-service-account=${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com --overwrite

# Verify the configuration
echo "Verifying Workload Identity Federation setup..."
kubectl get serviceaccount ${KSA_NAME} -n ${NAMESPACE} -o yaml
Now create a PersistentVolumeClaim for our model outputs. In Autopilot mode, Google Cloud manages the underlying storage classes, so we don’t need to create our own:
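A sketch of what model-storage-pvc.yaml (applied in the next step) might look like is shown below. Autopilot provisions the backing disk automatically; the claim name matches the rest of this walkthrough, and the requested size is a placeholder to adjust for your model outputs.
code_block
# Sketch: define the PersistentVolumeClaim used to store training outputs.
cat > model-storage-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-storage
  namespace: axolotl-training
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF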
code_block
# Apply the PVC configuration
kubectl apply -f model-storage-pvc.yaml
Deploying the Training Job to GKE Autopilot
In Autopilot mode, we specify our GPU requirements using annotations and resource requests within the Pod template section of our Job definition. We’ll create a Kubernetes Job that requests a single A100 40GB GPU:
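A rough sketch of that Job manifest (axolotl-training-job.yaml, applied a few steps below) is shown here; the complete, tested manifest lives in the companion GitHub repository. The Autopilot GPU node selector and the nvidia.com/gpu resource request follow standard GKE conventions, while the container image, command, and volume names are placeholders aligned with the rest of this walkthrough.
code_block
# Sketch: a Kubernetes Job requesting one A100 40GB GPU for the Axolotl training run.
cat > axolotl-training-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: gemma3-melanoma-training
  namespace: axolotl-training
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: axolotl-training-sa
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-a100   # A100 40GB
      restartPolicy: Never
      containers:
        - name: axolotl
          image: axolotlai/axolotl:latest                     # placeholder image tag
          command: ["axolotl", "train", "/config/gemma3-melanoma.yaml"]
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: huggingface-credentials
                  key: token
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: config
              mountPath: /config
            - name: model-storage
              mountPath: /workspace/outputs
      volumes:
        - name: config
          configMap:
            name: axolotl-config
        - name: model-storage
          persistentVolumeClaim:
            claimName: model-storage
EOF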
Create a ConfigMap with our Axolotl configuration:
code_block
# Create the ConfigMap
kubectl create configmap axolotl-config --from-file=gemma3-melanoma.yaml -n ${NAMESPACE}
Create a Secret with Hugging Face credentials:
code_block
# Create a Secret with your Hugging Face token
# This token is required to access the Gemma 3 model from Hugging Face Hub
# Generate a Hugging Face token at https://huggingface.co/settings/tokens if you don't have one
kubectl create secret generic huggingface-credentials -n ${NAMESPACE} --from-literal=token=YOUR_HUGGING_FACE_TOKEN
Apply training job YAML to start the training process:
code_block
# Start training job
kubectl apply -f axolotl-training-job.yaml
Monitor the Training Process
Fetch the pod name to monitor progress:
code_block
# Get the pod name for the training job
POD_NAME=$(kubectl get pods -n ${NAMESPACE} --selector=job-name=gemma3-melanoma-training -o jsonpath='{.items[0].metadata.name}')

# Monitor logs in real time
kubectl describe pod $POD_NAME -n ${NAMESPACE}
kubectl logs -f $POD_NAME -n ${NAMESPACE}
To visualize training metrics, deploy TensorBoard and retrieve its external IP:
code_block
# Deploy TensorBoard
kubectl apply -f tensorboard.yaml

# Get the external IP to access TensorBoard
kubectl get service tensorboard -n ${NAMESPACE}
Model Export and Evaluation Setup
After training completes, we need to export our fine-tuned model and evaluate its performance against the base model. First, let’s export the model from our training environment to Cloud Storage:
After creating the model-export.yaml file, apply it:
code_block
# Export the model
kubectl apply -f model-export.yaml
This will start the export process, which copies the fine-tuned model from the Kubernetes PersistentVolumeClaim to your Cloud Storage bucket for easier access and evaluation.
Once exported, we have several options for evaluating our fine-tuned model. You can deploy both the base and fine-tuned models to their own respective Vertex AI Endpoints for systematic testing via API calls, which works well for high-volume automated testing and production-like evaluation. Alternatively, for exploratory analysis and visualization, a GPU-enabled notebook environment such as a Vertex Workbench Instance or Colab Enterprise offers significant advantages, allowing for real-time visualization of results, interactive debugging, and rapid iteration on evaluation metrics.
In this example, we use a notebook environment to leverage its visualization capabilities and interactive nature. Our evaluation approach involves:
Loading both the base and fine-tuned models
Running inference on a test set of dermatological images from the SIIM-ISIC dataset
Computing standard classification metrics (accuracy, precision, recall, etc.)
Analyzing the confusion matrices to understand error patterns
Generating visualizations to highlight performance differences
For the complete evaluation code and implementation details, check out our evaluation notebook in the GitHub repository.
Performance Results
Our evaluation demonstrated that domain-specific fine-tuning can transform a general-purpose multimodal model into a much more effective tool for specialized tasks like medical image classification. The improvements were significant across multiple dimensions of model performance.
The most notable finding was the base model’s tendency to over-diagnose melanoma. It showed perfect recall (1.000) but extremely poor specificity (0.011), essentially labeling almost every lesion as melanoma. This behavior is problematic in clinical settings where false positives lead to unnecessary procedures, patient anxiety, and increased healthcare costs.
Fine-tuning significantly improved the model’s ability to correctly identify benign lesions, reducing false positives from 3,219 to 1,438. While this came with a decrease in recall (from 1.000 to 0.603), the tradeoff resulted in much better overall diagnostic capability, with balanced accuracy improving substantially.
In our evaluation, we also included results from the newly announced MedGemma—a collection of Gemma 3 variants trained specifically for medical text and image comprehension recently released at Google I/O. These results further contribute to our understanding of how different model starting points affect performance on specialized healthcare tasks.
Below we can see the performance metrics across all three models:
Accuracy jumped from a mere 0.028 for base Gemma 3 to 0.559 for our tuned Gemma 3 model, representing an astounding 1870.2% improvement. MedGemma achieved 0.893 accuracy without any task-specific fine-tuning—a 3048.9% improvement over the base model and substantially better than our custom-tuned version.
While precision saw a significant 34.2% increase in our tuned model (from 0.018 to 0.024), MedGemma delivered a substantial 112.5% improvement (to 0.038). The most remarkable transformation occurred in specificity—the model’s ability to correctly identify non-melanoma cases. Our tuned model’s specificity increased from 0.011 to 0.558 (a 4947.2% improvement), while MedGemma reached 0.906 (an 8088.9% improvement over the base model).
These numbers highlight how fine-tuning helped our model develop a more nuanced understanding of skin lesion characteristics rather than simply defaulting to melanoma as a prediction. MedGemma’s results demonstrate that starting with a medically-trained foundation model provides considerable advantages for healthcare applications.
The confusion matrices further illustrate these differences:
Looking at the base Gemma 3 matrix (left), we can see it correctly identified all 58 actual positive cases (perfect recall) but also incorrectly classified 3,219 negative cases as positive (poor specificity). Our fine-tuned model (center) shows a more balanced distribution, correctly identifying 1,817 true negatives while still catching 35 of the 58 true positives. MedGemma (right) shows strong performance in correctly identifying 2,948 true negatives, though with more false negatives (46 missed melanoma cases) than the other models.
To illustrate the practical impact of these differences, let’s examine a real example, image ISIC_4908873, from our test set:
Disclaimer: Image for example case use only.
The base model incorrectly classified it as melanoma. Its rationale focused on general warning signs, citing its “significant variation in color,” “irregular, poorly defined border,” and “asymmetry” as definitive indicators of malignancy, without fully contextualizing these within broader benign patterns.
In contrast, our fine-tuned model correctly identified it as benign. While acknowledging a “heterogeneous mix of colors” and “irregular borders,” it astutely noted that such color mixes can be “common in benign nevi.” Crucially, it interpreted the lesion’s overall “mottled appearance with many small, distinct color variations” as being “more characteristic of a common mole rather than melanoma.”
Interestingly, MedGemma also misclassified this lesion as melanoma, stating, “The lesion shows a concerning appearance with irregular borders, uneven coloration, and a somewhat raised surface. These features are suggestive of melanoma. Yes, this appears to be malignant melanoma.” Despite MedGemma’s overall strong statistical performance, this example illustrates that even domain-specialized models can benefit from task-specific fine-tuning for particular diagnostic challenges.
These results underscore a critical insight for organizations building domain-specific AI systems: while foundation models provide powerful starting capabilities, targeted fine-tuning is often essential to achieve the precision and reliability required for specialized applications. The significant performance improvements we achieved—transforming a model that essentially labeled everything as melanoma into one that makes clinically useful distinctions—highlight the value of combining the right infrastructure, training methodology, and domain-specific data.
MedGemma’s strong statistical performance demonstrates that starting with a domain-focused foundation model significantly improves baseline capabilities and can reduce the data and computation needed for building effective medical AI applications. However, our example case also shows that even these specialized models would benefit from task-specific fine-tuning for optimal diagnostic accuracy in clinical contexts.
Next steps for your multimodal journey
By combining Google Cloud’s enterprise infrastructure with Axolotl’s configuration-driven approach, you can transform what previously required months of specialized development into weeks of standardized implementation, bringing custom multimodal AI capabilities from concept to production with greater efficiency and reliability.
For deeper exploration, check out these resources:
AWS WAF now supports matching incoming requests against Autonomous System Numbers (ASNs). By monitoring and restricting traffic from specific ASNs, you can mitigate risks associated with malicious actors, comply with regulatory requirements, and optimize the performance and availability of your web applications. This new ASN Match Statement integrates seamlessly with existing WAF rules, making it easy for you to incorporate ASN-based security controls into your overall web application defense strategy.
You can specify a list of ASNs to match against incoming requests and take an appropriate action, such as blocking or allowing the request. You can also use ASNs in your rate-based rule statements. These rules aggregate requests according to your criteria, then count and rate-limit the requests based on the rule’s evaluation window, request limit, and action settings.
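As a rough illustration of how such a rule might be expressed through the WAFv2 API, the sketch below defines a blocking rule around the new statement. The AsnMatchStatement field name and its shape are assumptions inferred from this announcement, and the ASNs shown come from the documentation-reserved range; check the current AWS WAF API reference before relying on it.

```python
import boto3

# Sketch only: the AsnMatchStatement shape is an assumption inferred from this
# announcement; consult the AWS WAFv2 API reference for the exact schema.
wafv2 = boto3.client("wafv2", region_name="us-east-1")

asn_block_rule = {
    "Name": "block-unwanted-asns",
    "Priority": 0,
    "Statement": {
        "AsnMatchStatement": {           # assumed field name for the new statement
            "AsnList": [64496, 64511],   # example ASNs (documentation-reserved range)
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "block-unwanted-asns",
    },
}

# The rule dictionary would then be passed in the Rules list of a
# create_web_acl or update_web_acl call on the wafv2 client.
```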
The ASN Match statement is available in all Regions where AWS WAF is available. Rate-based rule support for ASNs is available in Regions where enhanced rate-based rules are currently supported. There is no additional cost for using ASNs in match statements and rate-based rules; however, standard AWS WAF charges still apply. For more information about the service, visit the AWS WAF page. For more information about pricing, visit the AWS WAF Pricing page.
Today, AWS announces the general availability of the Invoice Summary API, which allows you to retrieve your AWS invoice summary details programmatically via the SDK. You can retrieve multiple invoice summaries with a single API call that accepts parameters such as AWS account ID, AWS invoice ID, billing period, or a date range.
The output of the Invoice Summary API includes data elements such as the invoice amount in base currency and tax currency, the purchase order number, and other metadata, which can be found at this link. You can integrate the API with your accounts payable systems to automate invoice processing and improve efficiency.
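As a sketch of how such a call might look with the AWS SDK for Python, the snippet below queries invoice summaries for one account and billing period. The operation and parameter names are assumptions drawn from the inputs described above; verify them against the Invoicing API reference before use.

```python
import boto3

# Sketch only: the operation and parameter names are assumptions based on the
# announcement's described inputs (account ID, invoice ID, billing period, date range).
invoicing = boto3.client("invoicing", region_name="us-east-1")

response = invoicing.list_invoice_summaries(                 # assumed operation name
    Selector={"ResourceType": "ACCOUNT_ID", "Value": "111122223333"},
    Filter={"BillingPeriod": {"Month": 5, "Year": 2025}},    # assumed filter shape
)

for summary in response.get("InvoiceSummaries", []):
    print(summary)
```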
Invoice Summary API is available in all AWS Regions, except the AWS GovCloud (US) Regions and the China Regions.
Today, Amazon Q Developer announces support for the agentic coding experience within the JetBrains and Visual Studio IDEs. This experience, already available in Visual Studio Code and the Amazon Q Developer CLI, redefines how you write, modify, and maintain code by leveraging natural language understanding to seamlessly run complex workflows.
Agentic coding provides intelligent task execution, enabling Q Developer to perform actions beyond code suggestions, such as reading files, generating code diffs, and running command-line tasks. To get started, simply type your prompt in your preferred spoken language. As Q Developer works through your tasks, it provides continuous status updates, instantly applying your changes and feedback along the way. This allows you to seamlessly complete tasks while improving and streamlining the development process.
The agentic coding experience is available in all AWS regions where Q Developer is supported. To learn more about agentic coding in Visual Studio and JetBrains, read our blog.
Amazon EC2 now enables you to automatically delete underlying Amazon EBS snapshots when deregistering Amazon Machine Images (AMIs), allowing you to better manage your storage costs and simplify your AMI cleanup workflow.
Previously, when deregistering an AMI, you had to separately delete its associated EBS snapshots, which required additional steps. This process could lead to abandoned snapshots, resulting in unnecessary storage costs and resource management overhead. Now you can automatically delete EBS snapshots at the time of AMI deregistration.
This capability is available to all customers at no additional cost, and is enabled in all AWS commercial Regions, the AWS GovCloud (US) Regions, the AWS China (Beijing) Region, operated by Sinnet, and the AWS China (Ningxia) Region, operated by NWCD.
You can deregister AMIs from the EC2 Console, CLI, API, or SDK, and learn more in the AMI documentation.
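As an illustrative sketch with the AWS SDK for Python, the call below deregisters an AMI and removes its backing snapshots in one step. The DeleteAssociatedSnapshots parameter name is an assumption based on this announcement, and the AMI ID is hypothetical; confirm both against the DeregisterImage documentation.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Deregister the AMI and delete its underlying EBS snapshots in a single call.
# DeleteAssociatedSnapshots is an assumed parameter name; the AMI ID is hypothetical.
ec2.deregister_image(
    ImageId="ami-0123456789abcdef0",
    DeleteAssociatedSnapshots=True,
)
```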
Amazon Relational Database Service (Amazon RDS) for MariaDB now supports new community MariaDB minor versions 10.11.13 and 11.4.7. We recommend that you upgrade to the latest minor versions to fix known security vulnerabilities in prior versions of MariaDB, and to benefit from the bug fixes, performance improvements, and new functionality added by the MariaDB community.
You can leverage automatic minor version upgrades to automatically upgrade your databases to more recent minor versions during scheduled maintenance windows. You can also leverage Amazon RDS Managed Blue/Green deployments for safer, simpler, and faster updates to your MariaDB instances. Learn more about upgrading your database instances, including automatic minor version upgrades and Blue/Green Deployments, in the Amazon RDS User Guide.
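For example, here is a minimal sketch with the AWS SDK for Python that opts an existing instance into automatic minor version upgrades; the instance identifier is hypothetical.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Enable automatic minor version upgrades so the instance moves to releases
# such as 10.11.13 or 11.4.7 during its scheduled maintenance window.
rds.modify_db_instance(
    DBInstanceIdentifier="my-mariadb-instance",   # hypothetical identifier
    AutoMinorVersionUpgrade=True,
    ApplyImmediately=False,                       # apply in the next maintenance window
)
```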
Amazon RDS for MariaDB makes it straightforward to set up, operate, and scale MariaDB deployments in the cloud. Learn more about pricing details and regional availability at Amazon RDS for MariaDB. Create or update a fully managed Amazon RDS database in the Amazon RDS Management Console.
Amazon Managed Service for Prometheus is now available in Africa (Cape Town), Asia Pacific (Thailand), Asia Pacific (Hong Kong), Asia Pacific (Malaysia), Europe (Milan), Europe (Zurich), and Middle East (UAE). Amazon Managed Service for Prometheus is a fully managed Prometheus-compatible monitoring service that makes it easy to monitor and alarm on operational metrics at scale.
The list of all supported Regions where Amazon Managed Service for Prometheus is generally available can be found in the user guide. Customers can send up to 1 billion active metrics to a single workspace and can create many workspaces per account, where a workspace is a logical space dedicated to the storage and querying of Prometheus metrics.
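For instance, creating a workspace in one of the newly supported Regions is a single call; the Region and alias below are illustrative.

```python
import boto3

# Create a workspace (the logical space that stores and serves Prometheus
# metrics) in a newly supported Region. Region and alias are illustrative.
amp = boto3.client("amp", region_name="af-south-1")

workspace = amp.create_workspace(alias="production-metrics")
print(workspace["workspaceId"], workspace["status"])
```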
To learn more about the Amazon Managed Service for Prometheus collector, visit the user guide or product page.