AWS Private Certificate Authority (AWS Private CA) now supports Online Certificate Status Protocol (OCSP) in China and AWS GovCloud (US) Regions. AWS Private CA is a fully managed certificate authority service that makes it easy to create and manage private certificates for your organization without the operational overhead of running your own CA infrastructure. OCSP enables real-time certificate validation, allowing applications to check the revocation status of individual certificates on-demand rather than downloading Certificate Revocation List (CRL) files.
With OCSP support, customers in these Regions can implement more efficient certificate validation with minimal bandwidth, typically requiring a few hundred bytes per query, versus downloading large Certificate Revocation Lists (CRLs) that can be hundreds of kilobytes or larger. This enables real-time revocation checks for use cases such as validating internal microservices communications, implementing zero trust security architectures, and authenticating IoT devices. AWS Private CA fully manages the OCSP responder infrastructure, providing high availability without requiring you to deploy or maintain OCSP servers.
OCSP is now also available in the following AWS Regions: China (Beijing), China (Ningxia), AWS GovCloud (US-East), and AWS GovCloud (US-West).
To enable OCSP for your certificate authorities, use the AWS Private CA console, AWS CLI, or API. To learn more about OCSP, see Certificate Revocation in the AWS Private CA User Guide. For pricing information, visit the AWS Private CA pricing page.
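As a rough illustration, enabling OCSP on an existing private CA through the API could look like the following boto3 sketch; the CA ARN and Region are placeholders, so confirm parameter details against the AWS Private CA API reference.

import boto3

# Placeholder ARN for an existing private CA in a GovCloud (US) Region
ca_arn = "arn:aws-us-gov:acm-pca:us-gov-west-1:111122223333:certificate-authority/example"

acmpca = boto3.client("acm-pca", region_name="us-gov-west-1")

# Turn on the managed OCSP responder for this CA; existing CRL settings (if any)
# can be kept alongside OCSP in the same RevocationConfiguration.
acmpca.update_certificate_authority(
    CertificateAuthorityArn=ca_arn,
    RevocationConfiguration={
        "OcspConfiguration": {"Enabled": True}
    },
)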
Amazon Application Recovery Controller (ARC) Region switch allows you to orchestrate the specific steps to switch your multi-Region applications to operate out of another AWS Region and achieve a bounded recovery time in the event of a Regional impairment to your applications. Region switch saves hours of engineering effort and eliminates the operational overhead previously required to complete failover steps, create custom dashboards, and manually gather evidence of a successful recovery for applications across your organization and hosted in multiple AWS accounts. Today, we are announcing three new Region switch capabilities:
AWS GovCloud (US) support: ARC Region switch is now generally available in AWS GovCloud (US-East and US-West) Regions.
Plan execution reports: Region switch now automatically generates a comprehensive report from each plan execution and saves it to an Amazon S3 bucket of your choice. Each report includes a detailed timeline of events for the recovery operation, resources in scope for the Region switch, alarm states for optional application status alarms, and recovery time objective (RTO) calculations. This eliminates the manual effort previously required to compile evidence and documentation for compliance officers and auditors.
DocumentDB global cluster execution blocks: Adding to the catalog of 9 execution blocks, Region switch now supports Amazon DocumentDB global cluster execution blocks for automated multi-Region database recovery. This feature allows you to orchestrate DocumentDB global cluster failover and switchover operations within your Region switch plans.
To get started, build a Region switch plan using the ARC console, API, or CLI. See the AWS Regional Services List for availability information. Visit our home page or read the documentation.
Today, AWS announces SOCI (Seekable Open Container Initiative) indexing support for Amazon SageMaker Studio, reducing container startup times by 30-50% when using custom images. Amazon SageMaker Studio is a fully integrated, browser-based environment for end-to-end machine learning development. SageMaker Studio provides pre-built container images for popular ML frameworks like TensorFlow, PyTorch, and Scikit-learn that enable quick environment setup. However, when data scientists need to tailor environments for specific use cases with additional libraries, dependencies, or configurations, they can build and register custom container images with pre-configured components to ensure consistency across projects. As ML workloads become increasingly complex, these custom container images have grown in size, leading to startup times of several minutes that create bottlenecks in iterative ML development, where quick experimentation and rapid prototyping are essential.
SOCI indexing addresses this challenge by enabling lazy loading of container images, downloading only the necessary components to start applications with additional files loaded on-demand as needed. Instead of waiting several minutes for complete custom image downloads, users can begin productive work in seconds while the environment completes initialization in the background. To use SOCI indexing, create a SOCI index for your custom container image using tools like Finch CLI, nerdctl, or Docker with SOCI CLI, push the indexed image to Amazon Elastic Container Registry (ECR), and reference the image index URI when creating SageMaker Image resources.
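For illustration only, registering a SOCI-indexed custom image as a SageMaker Image might look like the boto3 sketch below; the image name, ECR URI, and role ARN are placeholders, and the exact way the index is picked up is described in the guide referenced below.

import boto3

sm = boto3.client("sagemaker")

# Placeholder values: ECR URI of the SOCI-indexed custom image and an execution role
ecr_image_uri = "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-studio-image:latest"
role_arn = "arn:aws:iam::111122223333:role/SageMakerStudioImageRole"

# Register the custom image with SageMaker, then add a version pointing at the
# SOCI-indexed image pushed to ECR.
sm.create_image(ImageName="my-studio-image", RoleArn=role_arn)
sm.create_image_version(ImageName="my-studio-image", BaseImage=ecr_image_uri)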
SOCI indexing is available in all AWS Regions where Amazon SageMaker Studio is available. To learn more about implementing SOCI indexing for your SageMaker Studio custom images, see Bring your own SageMaker image in the Amazon SageMaker Developer Guide.
Amazon Relational Database Service (RDS) now offers enhanced observability for your snapshot exports to Amazon S3, providing detailed insights into export progress, failures, and performance for each task. These notifications enable you to monitor your exports with greater granularity and predictability.
With snapshot export to S3, you can export data from your RDS database snapshots to Apache Parquet format in your Amazon S3 bucket. This launch introduces four new event types, including current export progress and table-level notifications for long-running tables, providing more granular visibility into your snapshot export performance and recommendations for troubleshooting export operation issues. Additionally, you can view export progress, such as the number of tables exported and pending, along with exported data sizes, enabling you to better plan your operations and workflows. You can subscribe to these events through Amazon Simple Notification Service (SNS) to receive notifications and view the export events through the AWS Management Console, AWS CLI, or SDK.
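In addition to the SNS notifications, you can poll an export task's progress fields directly from the SDK. Here is a minimal boto3 sketch with a placeholder task identifier.

import boto3

rds = boto3.client("rds")

# Placeholder identifier of a snapshot export task started with start_export_task
task_id = "my-snapshot-export"

# Poll the export task; PercentProgress and TotalExtractedDataInGB complement the
# new progress events delivered through Amazon SNS.
resp = rds.describe_export_tasks(ExportTaskIdentifier=task_id)
for task in resp["ExportTasks"]:
    print(task["Status"], task.get("PercentProgress"), task.get("TotalExtractedDataInGB"))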
This feature is available for RDS PostgreSQL, RDS MySQL, and RDS MariaDB engines in all Commercial Regions where RDS is generally available.
In the latest episode of the Agent Factory, Mofi Rahman and I had the pleasure of hosting Brandon Royal, the PM working on agentic workloads on GKE. We dove deep into the critical questions around the nuances of choosing the right agent runtime, the power of GKE for agents, and the essential security measures needed for intelligent agents to run code.
This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.
We kicked off our discussion by tackling a fundamental question: why choose GKE as your agent runtime when serverless options like Cloud Run or fully managed solutions like Agent Engine exist?
Brandon explained that the decision often boils down to control versus convenience. While serverless options are perfectly adequate for basic agents, the flexibility and governance capabilities of Kubernetes and GKE become indispensable in high-scale scenarios involving hundreds or thousands of agents. GKE truly shines when you need granular control over your agent deployments.
We’ve discussed the Agent Development Kit (ADK) in previous episodes, and Mofi highlighted to us how seamlessly it integrates with GKE and even showed a demo with the agent he built. ADK provides the framework for building the agent’s logic, traces, and tools, while GKE provides the robust hosting environment. You can containerize your ADK agent, push it to Google Artifact Registry, and deploy it to GKE in minutes, transforming a local prototype into a globally accessible service.
As agents become more sophisticated and capable of writing and executing code, a critical security concern emerges: the risk of untrusted, LLM-generated code. Brandon emphasized that while code execution is vital for high-performance agents and deterministic behavior, it also introduces significant risks in multi-tenant systems. This led us to the concept of a “sandbox.”
For those less familiar with security engineering, Brandon clarified that a sandbox provides kernel and network isolation. Mofi further elaborated, explaining that agents often need to execute scripts (e.g., Python for data analysis). Without a sandbox, a hallucinating or prompt-injected model could potentially delete databases or steal secrets if allowed to run code directly on the main server. A sandbox creates a safe, isolated environment where such code can run without harming other systems.
So, how do we build this “high fence” on Kubernetes? Brandon introduced the Agent Sandbox on Kubernetes, which leverages technologies like gVisor, an application kernel sandbox. When an agent needs to execute code, GKE dynamically provisions a completely isolated pod. This pod operates with its own kernel, network, and file system, effectively trapping any malicious code within the gVisor bubble.
Mofi walked us through a compelling demo of the Agent Sandbox in action. We observed an ADK agent being given a task requiring code execution. As the agent initiated code execution, GKE dynamically provisioned a new pod, visibly labeled as “sandbox-executor,” demonstrating the real-time isolation. Brandon highlighted that this pod is configured with strict network policies, further enhancing security.
While the Agent Sandbox offers incredible security, the latency of spinning up a new pod for every task is a concern. Mofi demoed the game-changing solution: Pod Snapshots. This technology allows you to save the state of running sandboxes and then near-instantly restore them when an agent needs them. Brandon noted that this reduces startup times from minutes to seconds, revolutionizing real-time agentic workflows on GKE.
Conclusion
It’s incredible to see how GKE isn’t just hosting agents; it’s actively protecting them and making them faster.
Your turn to build
Ready to put these concepts into practice? Dive into the full episode to see the demos in action and explore how GKE can supercharge your agentic workloads.
Amazon WorkSpaces Applications now offers images powered by Microsoft Windows Server 2025, enabling customers to launch streaming instances with the latest features and enhancements from Microsoft’s newest server operating system. This update ensures your application streaming environment benefits from improved security, performance, and modern capabilities.
With Windows Server 2025 support, you can deliver the Microsoft Windows 11 desktop experience to your end users, giving you greater flexibility in choosing the right operating system for your specific application and desktop streaming needs. Whether you’re running business-critical applications or providing remote access to specialized software, you now have expanded options to align your infrastructure decisions with your unique workload requirements and organizational standards. You can select from AWS-provided public images or create custom images tailored to your requirements using Image Builder.
Support for Microsoft Windows Server 2025 is now generally available in all AWS Regions where Amazon WorkSpaces Applications is offered.
Amazon Bedrock Data Automation (BDA) now supports blueprint instruction optimization, enabling you to improve the accuracy of your custom field extraction using just a few example document assets with ground truth labels. BDA automates the generation of insights from unstructured multimodal content such as documents, images, audio, and videos for your GenAI-powered applications. Blueprint instruction optimization automatically refines the natural language instructions in your blueprints, helping you achieve production-ready accuracy in minutes without model training or fine-tuning.
With blueprint instruction optimization, you can now bring up to 10 representative document assets from your production workload and provide the correct, expected values for each field. Blueprint instruction optimization analyzes the differences between your expected results and the Data Automation inference results, and then refines the natural language instructions to improve extraction accuracy across your examples. For your intelligent document processing applications, you can now improve the accuracy of extracting insights such as invoice line items, contract terms, tax form fields, or medical billing codes. After optimization completes, you receive detailed evaluation metrics including exact match rates and F1 scores measured against your ground truth, giving you confidence that your blueprint is ready for production deployment.
Data Automation blueprint instruction optimization for documents is available in all AWS Regions where Amazon Bedrock Data Automation is supported.
Amazon Timestream for InfluxDB now offers a restart API for both InfluxDB versions 2 and 3. This new capability enables customers to trigger system restarts on their database instances directly through the AWS Management Console, API, or CLI, streamlining operational management of their time-series database environments.
With the restart API, customers can perform resilience testing to validate their application’s behavior during database restarts and address health-related issues without requiring support intervention. This feature enhances operational flexibility for DevOps teams managing mission-critical workloads, allowing them to implement more comprehensive testing strategies and respond faster to performance concerns by providing direct control over database instance lifecycle operations.
Amazon Timestream for InfluxDB restart capability is available in all Regions where Timestream for InfluxDB is offered.
AWS announces Cost Allocation tags support for account tags across AWS Cost Management products, enabling customers with multiple member accounts to utilize their existing AWS Organizations account tags directly in cost management tools. Account tags are applied at the account level in AWS Organizations and automatically apply to all metered usage within tagged accounts, eliminating the need to manually configure and maintain separate account groupings in AWS Cost Explorer, Cost and Usage Reports, AWS Budgets, and Cost Categories.
With account tag support, customers can analyze costs by account tag directly in Cost Explorer and Cost and Usage Reports (CUR 2.0 and FOCUS). Customers can set up AWS Budgets and AWS Cost Anomaly Detection alerts on groups of accounts without configuring lists of account IDs. Customers can also build complex cost categories on top of account tags for further categorization. Account tags enable cost allocation for untaggable resources including refunds, credits, and certain service charges that cannot be tagged at the resource level. When new accounts join the organization or existing accounts are removed, customers simply add or update relevant tags, and the changes automatically apply across all cost management products.

To get started, customers apply tags to accounts in the AWS Organizations console, then activate those account tags from the Cost Allocation Tags page in the Billing and Cost Management console. This feature is generally available in all AWS Regions, excluding GovCloud (US) Regions and China (Beijing) and China (Ningxia) Regions.
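For illustration, the two-step setup described above might look like the following boto3 sketch; the account ID and tag key are placeholders.

import boto3

# Step 1: tag a member account in AWS Organizations (run from the management account)
org = boto3.client("organizations")
org.tag_resource(
    ResourceId="111122223333",                       # placeholder member account ID
    Tags=[{"Key": "team", "Value": "analytics"}],
)

# Step 2: activate the account tag as a cost allocation tag in Billing and Cost Management
ce = boto3.client("ce", region_name="us-east-1")
ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[{"TagKey": "team", "Status": "Active"}]
)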
Amazon Elastic Container Registry (ECR) now supports automatic repository creation on image push. This new capability simplifies container workflows by having ECR automatically create repositories if they don’t exist when an image is pushed, without customers having to pre-create repositories before pushing container images. Now when customers push images, ECR will automatically create repositories according to defined repository creation template settings.
Create on push is available in all AWS commercial and AWS GovCloud (US) Regions. To learn more about repository creation templates, please visit our documentation. You can learn more about storing, managing and deploying container images and artifacts with Amazon ECR, including how to get started, from our product page and user guide.
Amazon Redshift ODBC 2.x driver now supports Apple macOS, expanding platform compatibility for developers and analysts. This enhancement allows Apple macOS users to connect to Amazon Redshift clusters using the latest Amazon Redshift ODBC 2.x driver version. You can use an ODBC connection to connect to your Amazon Redshift cluster from many third-party SQL client tools and applications.
The Amazon Redshift ODBC 2.x native driver support enables you to access Amazon Redshift features such as data sharing write capabilities and AWS IAM Identity Center integration – features that are only available through Amazon Redshift drivers. This native Apple macOS support enables seamless integration with Extract, Transform, Load (ETL) and Business Intelligence (BI) tools, allowing you to use Apple macOS while accessing the full suite of Amazon Redshift capabilities.
We recommend that you upgrade to the latest Amazon Redshift ODBC 2.x driver version to access new features. For installation instructions and system requirements, please see the Amazon Redshift ODBC 2.x driver documentation.
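As a quick sanity check after installing the driver, a connection from a Python tool via pyodbc might look like the sketch below; the driver name, endpoint, and credentials are placeholders and must match how the driver is registered on your macOS machine.

import pyodbc

# Placeholder connection details; the Driver value must match the entry in your
# odbcinst.ini after installing the Amazon Redshift ODBC 2.x driver on macOS.
conn_str = (
    "Driver={Amazon Redshift ODBC Driver};"
    "Server=my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com;"
    "Port=5439;Database=dev;UID=awsuser;PWD=example-password;"
)

with pyodbc.connect(conn_str) as conn:
    cur = conn.cursor()
    cur.execute("SELECT current_user, version();")
    print(cur.fetchone())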
AWS IoT now supports event-based logging, a new capability that helps developers reduce Amazon CloudWatch costs while improving log management efficiency. This feature enables targeted logging for individual events with customizable log levels and Amazon CloudWatch log group destinations.

With event-based logging, you can set different log levels for different types of IoT events based on their operational importance. For example, you can configure INFO-level logging for certificateProvider events while maintaining ERROR-level logging for less critical activities like connectivity events. This granularity allows you to maintain comprehensive visibility into your IoT operations without the overhead of logging every activity at the same verbosity level, improving log searchability and analysis efficiency while helping to reduce costs.

Event-based logging is now available for configuration through the AWS IoT console, CLI, and API in all AWS Regions where AWS IoT is supported. To learn more about configuring event-based logging, visit the AWS IoT Developer Guide.
In computing’s early days of the 1940s, mathematicians discovered a flawed assumption about the behavior of round-off errors. Instead of canceling out, fixed-point arithmetic accumulated errors, compromising the accuracy of calculations. A few years later, “random round-off” was proposed, which would round up or down based on a random probability proportional to the remainder.
In today’s age of generative AI, we face a new numerical challenge. To overcome memory bottlenecks, the industry is shifting to lower precision formats like FP8 and emerging 4-bit standards. However, training in low precision is fragile. Standard rounding destroys the tiny gradient updates driving learning, causing model training to stagnate. That same technique from the 1950s, now known as stochastic rounding, is allowing us to train massive models without losing the signal. In this article, you’ll learn how frameworks like JAX and Qwix apply this technique on modern Google Cloud hardware to make low-precision training possible.
When Gradients Vanish
The challenge in low-precision training is vanishing updates. This occurs when small gradient updates are systematically rounded to zero by “round to nearest” or RTN arithmetic. For example, if a large weight is 100.0 and the learning update is 0.001, a low-precision format may register 100.001 as identical to 100.0. The update effectively vanishes, causing learning to stall.
Let’s consider the analogy of a digital swimming pool that only records the water level in whole gallons. If you add a teaspoon of water, the system rounds the new total back down to the nearest gallon. This effectively deletes your addition. Even if you pour in a billion teaspoons one by one, the recorded water level never rises.
Precision through Probability
Stochastic rounding, or SR for short, solves this by replacing deterministic rounding rules with probability. For example, instead of always rounding 1.4 down to 1, SR rounds it to 1 with 60% probability and 2 with 40% probability.
Mathematically, for a value x in the interval [⌊x⌋, ⌊x⌋+1], the definition is:

SR(x) = ⌊x⌋ + 1 with probability p = x − ⌊x⌋, and SR(x) = ⌊x⌋ with probability 1 − p.
The defining property is that SR is unbiased in expectation:
Stochastic Rounding: E[SR(x)] = x
Round-to-Nearest: E[RTN(x)] ≠ x
To see the difference, look at our 1.4 example again. RTN is deterministic: it outputs 1 every single time. The variance is 0. It is stable, but consistently wrong. SR, however, produces a noisy stream like 1, 1, 2, 1, 2.... The average is correct (1.4), but the individual values fluctuate.
We can quantify the “cost” of zero bias with the variance formula:
Var(SR(x)) = p(1 − p), where p = x − ⌊x⌋
In contrast, RTN has zero variance, but suffers from fast error accumulation. In a sum of N operations, RTN’s systematic error can grow linearly (O(N)). If you consistently round down by a tiny amount, those errors stack up fast.
SR behaves differently. Because the errors are random and unbiased, they tend to cancel each other out. This “random walk” means the total error grows only as the square root of the number of operations O(√N).
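To make that intuition concrete, here is a small, self-contained Python simulation (independent of any training framework) that replays the swimming-pool analogy on a two-decimal grid: RTN never moves, while SR tracks the true total.

import numpy as np

rng = np.random.default_rng(0)
true_increment, steps = 0.001, 1_000_000   # a "teaspoon" added one million times

def round_to_nearest(x):
    return np.round(x, 2)                  # keep two decimals, like a low-precision format

def stochastic_round(x):
    scaled = x * 100
    frac = scaled - np.floor(scaled)
    return (np.floor(scaled) + (rng.random() < frac)) / 100   # round up with probability frac

rtn_total = sr_total = 100.0               # starting "water level"
for _ in range(steps):
    rtn_total = round_to_nearest(rtn_total + true_increment)
    sr_total = stochastic_round(sr_total + true_increment)

print(f"exact: {100.0 + true_increment * steps:.2f}")   # 1100.00
print(f"RTN:   {rtn_total:.2f}")                        # stuck at 100.00 -- every update vanished
print(f"SR:    {sr_total:.2f}")                         # close to 1100.00, within random-walk error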
While stochastic rounding introduces noise, the tradeoff can often be benign. In deep learning, this added variance often acts as a form of implicit regularization, similar to dropout or normalization, helping the model escape shallow local minima and generalize better.
Google’s TPU architecture includes native hardware support for stochastic rounding in the Matrix Multiply Unit (MXU). This allows you to train in lower-precision formats like INT4, INT8 and FP8 without meaningful degradation of model performance.
You can use Google’s Qwix library, a quantization toolkit for JAX that supports both training (QAT) and post-training quantization (PTQ). Here is how you might configure it to quantize a model in INT8, explicitly enabling stochastic rounding for the backward pass to prevent vanishing updates:
import qwix

# Define quantization rules selecting which layers to compress
rules = [
    qwix.QtRule(
        module_path='.*',
        weight_qtype='int8',
        act_qtype='int8',
        bwd_qtype='int8',  # Quantize gradients
        bwd_stochastic_rounding='uniform',  # Enable SR for gradients
    )
]

# Apply Quantization Aware Training (QAT) rules
model = qwix.quantize_model(model, qwix.QtProvider(rules))
Qwix abstracts the complexity of low-level hardware instructions, allowing you to inject quantization logic directly into your model’s graph with a simple configuration.
NVIDIA Blackwell & A4X VMs
The story is similar if you are using NVIDIA GPUs on Google Cloud. You can deploy A4X VMs, the industry’s first cloud instance powered by the NVIDIA GB200 NVL72 system. These VMs connect 72 Blackwell GPUs into a single supercomputing unit, the AI Hypercomputer.
Blackwell introduces native hardware support for NVFP4, a 4-bit floating-point format that utilizes a block scaling strategy. To preserve accuracy, the NVFP4BlockScaling recipe automatically applies stochastic rounding to gradients to avoid bias, along with other advanced scaling techniques.
When you wrap your layers in te.autocast with this recipe, the library engages these low-precision modes, including stochastic rounding on gradients, for the backward pass.
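A minimal sketch of what that wrapping might look like is shown below; it follows the te.autocast and NVFP4BlockScaling names mentioned above, but the exact module paths and argument names depend on your Transformer Engine release, so treat this as an assumption to verify against its documentation.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Assumed recipe/context-manager names per the description above; verify against
# your Transformer Engine version.
nvfp4_recipe = recipe.NVFP4BlockScaling()

layer = te.Linear(4096, 4096).cuda()
inp = torch.randn(8, 4096, device="cuda", requires_grad=True)

# Matrix multiplications inside the context run in NVFP4, with stochastic
# rounding applied to gradients in the backward pass.
with te.autocast(enabled=True, recipe=nvfp4_recipe):
    out = layer(inp)

loss = out.float().sum()
loss.backward()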
By simply entering this context manager, the A4X’s GB200 GPUs perform matrix multiplications in 4-bit precision while using stochastic rounding for the backward pass, delivering up to 4x higher training performance than previous generations without compromising convergence.
Best Practices for Production
To effectively implement SR in production, first remember that stochastic rounding is designed for training only. Because it is non-deterministic, you should stick to standard Round-to-Nearest for inference workloads where consistent outputs are required.
Second, use SR as a tool for debugging divergence. If your low-precision training is unstable, check your gradient norms. If they are vanishing, enabling SR may help, while exploding gradients suggest problems elsewhere.
Finally, manage reproducibility carefully. Since SR relies on random number generation, bit-wise reproducibility is more challenging. Always set a global random seed, for example, using jax.random.key(0), to ensure that your training runs exhibit “deterministic randomness,” producing the same results each time despite the internal probabilistic operations.
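For example, a minimal JAX pattern for that "deterministic randomness" looks like this: the same seed yields the same key, and the same key yields the same draws on every run.

import jax

key = jax.random.key(0)              # one global seed for the whole run
k1, k2 = jax.random.split(key)       # derive per-step keys deterministically

print(jax.random.uniform(k1, (3,)))  # identical values on every run
print(jax.random.uniform(k2, (3,)))  # a different, but equally reproducible, draw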
Stochastic rounding transforms the noise of low-precision arithmetic into the signal of learning. Whether you are pushing the boundaries with A4X VMs or Ironwood TPUs, this 1950s numerical method is the key to unlocking the next generation of AI performance.
Connect on LinkedIn, X, and Bluesky to continue the discussion about the past, present, and future of AI infrastructure.
You’ve built a powerful AI agent. It works on your local machine, it’s intelligent, and it’s ready to meet the world. Now, how do you take this agent from a script on your laptop to a secure, scalable, and reliable application in production? On Google Cloud, there are multiple paths to deployment, each offering a different developer experience.
For teams seeking the simplest path to production, Vertex AI Agent Engine removes the need to manage web servers or containers entirely. It provides an opinionated environment optimized for Python agents, where you define the agent’s logic, and the platform handles the execution, memory, and tool invocation.
The Serverless Experience: Cloud Run
For teams that want the flexibility of containers without the operational overhead, Cloud Run abstracts away the infrastructure, allowing you to deploy your agent as a container that automatically scales up when busy and down to zero when quiet.
This path is particularly powerful if you need to build in languages other than Python, use custom frameworks, or integrate your agent into existing declarative CI/CD pipelines.
The Orchestrated Experience: Google Kubernetes Engine (GKE)
For teams that need precise configuration over their environment, GKE is designed to manage that complexity. This path shows you how an AI agent functions not just as a script, but as a microservice within a broader orchestrated cluster.
Your Path to Production
Whether you are looking for serverless speed, orchestrated control, or a fully managed runtime, these labs provide the blueprint to get you there.
These labs are part of the Deploying Agents module in our official Production-Ready AI with Google Cloud program. Explore the full curriculum for more content that will help you bridge the gap from a promising prototype to a production-grade AI application.
Share your progress and connect with others on the journey using the hashtag #ProductionReadyAI. Happy learning!
Amazon EC2 now supports an Availability Zone ID (AZ ID) parameter, enabling you to create and manage resources such as instances, volumes, and subnets using consistent zone identifiers. AZ IDs are consistent and static identifiers that represent the same physical location across all AWS accounts, helping you optimize resource placement.
Prior to this launch, you had to use an AZ name while creating a resource, but these names could map to different physical locations in different accounts. This mapping made it difficult to ensure resources were always co-located, especially when operating with multiple accounts. Now, you can specify the AZ ID parameter directly in your EC2 APIs to guarantee consistent placement of resources. AZ IDs always refer to the same physical location across all accounts, which means you no longer need to manually map AZ names across your accounts or deal with the complexity of tracking and aligning zones. This capability is now available for resources including instances, launch templates, hosts, reserved instances, fleet, spot instances, volumes, capacity reservations, network insights, VPC endpoints and subnets, network interfaces, fast snapshot restore, and instance connect.
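For illustration, looking up AZ IDs and pinning a subnet to one could look like the following boto3 sketch (the VPC ID and CIDR are placeholders); this launch extends the same AZ ID parameter to instances, volumes, and the other resources listed above.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Map AZ names to their account-independent AZ IDs (e.g., us-east-1a -> use1-az4)
zones = ec2.describe_availability_zones()["AvailabilityZones"]
for z in zones:
    print(z["ZoneName"], "->", z["ZoneId"])

# Pin a subnet to a specific physical location by AZ ID rather than AZ name.
ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",
    CidrBlock="10.0.1.0/24",
    AvailabilityZoneId="use1-az1",
)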
This feature is available in all AWS Regions, including the China and AWS GovCloud (US) Regions. To learn more about Availability Zone IDs, visit the documentation.
The White House’s Genesis Mission has set a bold ambition for our nation: to double our scientific productivity within the decade and harness artificial intelligence (AI) to accelerate the pace of discovery. This requires a profound transformation in our national scientific enterprise, one that seamlessly integrates high-performance computing, world-class experimental facilities, and AI. The challenge is no longer generating exabytes of exquisite data from experiments and simulations, but rather curating and exploring it using AI to accelerate the discoveries hidden within.
Through our Genesis Mission partnership with the Department of Energy (DOE), Google is committed to powering this new era of federally-funded scientific discovery with the necessary tools and platforms.
State-of-the-art reasoning for science
The National Labs can take advantage of Gemini for Government—a secure platform with an accredited interface that provides scaled access to a new class of agentic tools designed to augment the scientific process. This includes access to the full capabilities of Gemini, our most powerful and general-purpose AI model. Its native multimodal reasoning operates across the diverse data types of modern science. This means researchers can ask questions in natural language to generate insights grounded in selected sources—from technical reports, code, and images, to a library of enterprise applications, and even organizational and scientific datasets.
In addition to the Gemini for Government platform, the National Labs will have access to several Google technologies that support their mission. Today, Google DeepMind announced an accelerated access program for all 17 National Labs, beginning with AI co-scientist—a multi-agent virtual collaborator built on Gemini that can accelerate hypothesis development from years to days—with plans to expand to other frontier AI tools in 2026.
Google Cloud provides the secure foundation to bring these innovations to the public sector. By making these capabilities commercially available through our cloud infrastructure, we are ensuring that the latest frontier AI models and tools from Google DeepMind are accessible for the mission-critical work of our National Labs.
Accelerating the research cycle with autonomous workflows
Gemini for Government brings together the best of Google accredited cloud services, industry-leading Gemini models, and agentic solutions. The platform is engineered to enable autonomous workflows that orchestrate complex tasks.
A prime example is Deep Research, which can traverse decades of scientific literature and experimental databases to identify previously unseen connections across different research initiatives or flag contradictory findings that warrant new investigation. By automating complex computational tasks, like managing large-scale simulation ensembles or orchestrating analysis pipelines across hybrid cloud resources, scientists can dramatically accelerate the ‘design-build-test-learn’ cycle, freeing up valuable time for the creative thinking that drives scientific breakthroughs.
To ensure agencies can easily leverage these advanced capabilities—including the DOE and its National Laboratories—Gemini for Government is available under the same standard terms and pricing already established for all federal agencies through the General Services Administration’s OneGov Strategy. This streamlined access enables National Labs to quickly deploy an AI-powered backbone for their most complex, multi-lab research initiatives.
A secure fabric for big team science
The future of AI-enabled research requires interconnected experimental facilities, data repositories, and computing infrastructure stewarded by the National Labs.
Gemini for Government provides a secure, federated foundation required to reimagine “Big Team Science,” creating a seamless fabric connecting the entire DOE complex. AI models and tools in this integrated environment empower researchers to weave together disparate datasets from the field to the benchtop, and combine observations with models, revealing more insights across vast temporal and spatial scales.
Ultimately, this transformation can change the nature of discovery, creating a frictionless environment where AI manages complex workflows, uncovers hidden insights, and acts as a true creative research partner to those at our National Labs.
Learn more about Gemini for Government by registering for Google Public Sector Summit On-Demand. Ready to discuss how Gemini for Government can address your organization’s needs? Please reach out to our Google Public Sector team at geminiforgov@google.com.
Today’s AI capabilities provide a great opportunity to enable natural language (NL) interactions with your enterprise data through applications using text and voice. In fact, in the world of agentic applications, natural language is rapidly becoming the interaction standard. That means agents need to be able to issue natural language questions to a database and receive accurate answers in return. At Google Cloud, this drove us to build Natural-Language-to-SQL (NL2SQL) technology in the AlloyDB database that can receive a question as input and return a NL result, or the SQL query that will help you retrieve it.
Currently in preview, the AlloyDB AI natural language API enables developers to build agentic applications that answer natural language questions from agents or end users about their database data in a secure, business-relevant, explainable manner, with accuracy approaching 100% — and we’re focused on bringing this capability to a broader set of Google Cloud databases.
When we first released the API in 2024, it already provided leading NL2SQL accuracy, albeit not close to 100%. But leading accuracy isn’t enough. In many industries, it’s not sufficient to translate text into SQL with accuracy of 80% or even 90%. Low-quality answers carry a real cost, often measurable in monetary terms: disappointed customers or poor business decisions. A real estate search application that fails to understand what the end user is asking for (their “intent”) risks becoming irrelevant. In retail product search, less relevant answers lead to lower conversions into sales. In other words, the accuracy of the text-to-SQL translation must almost always be extremely high.
In this blog we help you understand the value of the AlloyDB AI natural language API and techniques for maximizing the accuracy of its answers.
Getting to ~100% accurate and relevant results
Achieving highly accurate text-to-SQL takes more than just prompting Gemini with a question. Rather, when developing your app, you need to provide AlloyDB AI with descriptive context, including descriptions of the database tables and columns; this context can be autogenerated. Then, when the AlloyDB AI natural language API receives a question, it can intelligently retrieve the relevant pieces of descriptive context, enabling Gemini to see how the question relates to the database data.
Still, many of our customers asked us for explainable, certifiable and business-relevant answers that would enable them to reach even higher accuracy, approaching 100% (such as >95% or even higher than 99%), for their use cases.
The latest preview release of the AlloyDB AI natural language API provides capabilities for improving your answers in several ways:
Business relevance: Answers should contain and properly rank information in order to improve business metrics, such as conversions or end-user engagement.
Explainability: Results should include an explanation of intent that clarifies — in language that end users can understand — what the NL API understood the question to be. For example, when a real estate app interprets the question “Can you show me Del Mar homes for families?” as “Del Mar homes that are close to good schools”, it explains its interpretation to the end user.
Verified results: The result should always be consistent with the intent, as it was explained to the user or agent.
Accuracy: The result should correctly capture the intent of the question.
With this, the AlloyDB AI natural language API enables you to progressively improve accuracy for your use case, what’s sometimes referred to as “hill-climbing”. As you work your way towards 100% accuracy, AlloyDB AI’s intent explanations mitigate the effect of the occasional remaining inaccuracies, allowing the end user or agent to understand that the API answered a slightly different question than the one they intended to ask.
Get started with a 30-day AlloyDB free trial instance.
Hill-climbing to approximate 100% accuracy
Iteratively improving the accuracy of AlloyDB AI happens via a simple workflow.
First, you start with the NL2SQL API that AlloyDB AI provides out of the box. It’s highly (although not perfectly) accurate thanks to its built-in agent that translates natural language questions into SQL queries, as well as automatically generated descriptive context that is used by the included agent.
Next, you can quickly iterate to hill-climb to approximately 100% accuracy and business relevance by improving context. Crucially, in the AlloyDB AI natural language API, context comes in two forms:
Descriptive context, which includes table and column descriptions, and
Prescriptive context, which includes SQL templates and (condition) facets, allowing you to control how the NL request is translated to SQL.
Finally, a “value index” disambiguates terms (such as SKUs and employee names) that are private to your database, and thus that are not immediately clear to foundation models.
The ability to hill-climb to approximate 100% accuracy flexibly and securely relies on two types of context and the value index in AlloyDB.
Let’s take a deeper look at context and the value index.
1. Descriptive and prescriptive context
As mentioned above, the AlloyDB AI natural language API relies on descriptive and prescriptive context to improve the accuracy of the SQL code it generates.
By improving descriptive context, mostly table and column descriptions, you increase the chances that the SQL queries employ the right tables and columns in the right roles. However, prescriptive context resolves a harder problem: accurately interpreting difficult questions that matter for a given use case. For example, an agentic real-estate application may need to answer a question such as “Can you show me homes near good schools in <provided city>?” Notice the challenges:
What exactly is “near”?
How do you define a “good” school?
Assuming the database provides ratings, what is the cutoff for a good school rating?
What is the optimal tradeoff (for ranking purposes and thus for business relevance of the top results) between distance from the school and ranking of the school when the solutions are presented as a list?
To help, the AlloyDB natural language API lets you supply templates, which allow you to associate a type of question with a parameterized SQL query and a parameterized explanation. This enables the AlloyDB NL API to accurately interpret natural language questions that may be very nuanced; this makes templates a good option for frequently asked, nuanced questions.
A second type of prescriptive context, facets, allows you to provide individual SQL conditions along with their natural language counterparts. Facets enable you to combine the accuracy of templates with the flexibility of searching over a gigantic number of conditions. For example, “near good schools” is just one of many conditions. Others may be price, “good for a young family”, “ocean view” or others. Some are combinations of these conditions, such as “homes near good schools with ocean views”. But you can’t have a template for each combination of conditions. In the past, to accommodate all these conditions, you could have tried to create a dashboard with a search field for every conceivable condition, but it would have become very unwieldy, very fast. Instead, when you use a natural language interface, you can use facets to cover any number of conditions, even in a single search field. This is where the strength of a natural language interface really shines!
The AlloyDB AI natural language API facilitates the creation of descriptive and prescriptive context. For example, rather than providing parameterized questions, parameterized intent explanations, and parameterized SQL, just add a template via the add_template API, in which you provide an example question (“Del Mar homes close to good schools”) and the correct corresponding SQL. AlloyDB AI automatically generalizes this question to handle any city and automatically prepares an intent explanation.
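As a rough, hypothetical sketch of that flow from Python (using the psycopg driver), registering a template could look like the following; the function signature, parameter names, and schema below are illustrative assumptions rather than the documented API, so consult the AlloyDB AI natural language reference for the exact call.

import psycopg  # standard PostgreSQL driver; connection details are placeholders

conn = psycopg.connect("host=<alloydb-host> dbname=real_estate user=app")

# Hypothetical call: pair an example question with the SQL that answers it.
# Configuration name, parameter names, and the schema are illustrative only.
with conn.cursor() as cur:
    cur.execute("""
        SELECT alloydb_ai_nl.add_template(
            nl_config_id => 'real_estate_cfg',
            intent       => 'Del Mar homes close to good schools',
            sql          => $q$
                SELECT p.*
                FROM real_estate.properties p
                JOIN real_estate.schools s ON s.city = p.city
                WHERE p.city = 'Del Mar'
                  AND s.rating >= 8
                  AND p.distance_to_school_km <= 2
            $q$
        );
    """)
conn.commit()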
2. The value index
The second key enabler of approximate 100% accuracy is the AlloyDB AI value index, which disambiguates terms that are private to your database and, thus, not known to the underlying foundation model. Private terms in natural language questions pose many problems. For starters, users misspell words, and, indeed, misspellings increase with a voice interface. Second, natural language questions don’t always spell out a private term’s entity type. For instance, a university administrator may ask “How did John Smith perform in 2025?” without specifying whether John Smith is faculty or a student; each case requires a different SQL query to answer the question. The value index clarifies what kind of entity “John Smith” is, and can be automatically created by AlloyDB AI for your application.
Natural language search over structured, unstructured and multimodal data
When it comes to applications that provide search over structured data, the AlloyDB AI natural language API enables a clean and powerful search experience. Traditionally, applications present conditions as filters in the user interface that the end user can employ to narrow their search. In contrast, an NL-enabled application can provide a simple chat interface or even take voice commands that directly or indirectly pose any combination of search conditions, and still answer the question. Once search breaks free from the limitations of traditional apps, the possibilities for completely new user experiences really open up.
The combination of the NL2SQL technology with AI search features also makes it good for querying combinations of structured, unstructured and multimodal data. The AlloyDB AI natural language API can generate SQL queries that include vector search, text search and other AI search features such as the AI.IF condition, which enables checking semantic conditions on text and multimodal data. For example, our real estate app may be asked about “Del Mar move-in ready houses”. This would result in a SQL query with an AI.IF function that checks whether the text in the description column of the real_estate.properties table is similar to “move-in ready”.
Bringing the AlloyDB AI natural language API into your agentic application
Ready to integrate the AlloyDB AI natural language API into your agentic application? If you’re writing AI tools (functions) to retrieve data from AlloyDB, give MCP Toolbox for Databases a try. Or for no-code agentic programming, you can use Gemini Enterprise. For example, you can create a conversational agentic application that uses Gemini to answer questions from its knowledge of the web and the data it draws from your database — all without writing a single line of code! Either way, we look forward to seeing what you build.
At Google Cloud, we continue to make critical investments to Vertex AI Agent Builder, our comprehensive and open platform, enabling you to build faster, scale efficiently, and govern with enterprise-grade security.
Today, with the integration of the Cloud API Registry, we’re excited to bring enhanced tool governance capabilities to Vertex AI Agent Builder. With this latest update, administrators can now manage available tools for developers across your organization directly in the Vertex AI Agent Builder Console, and developers can leverage tools managed by the registry with a new ApiRegistry object.
With this, organizations can anchor agents in the embedded security and operational controls that they already use, enabling them to deploy and manage agents as a digital workforce.
Following last month’s expansion of our Agent Builder platform, we are also introducing new capabilities across the entire agent lifecycle to help developers build faster using new ADK capabilities and visual tools, and scale with high performance through the expansion of Agent Engine services, including the general availability of support for sessions and memory. Read more below.
1. Govern your tools with confidence
Building a useful agent requires the agent to have access to the necessary tools. However, developers today spend a significant amount of time building their tools for each agent, resulting in duplicate work. This approach also presents challenges for administrators who want to control what data and tools agents can access.
We are bringing enhanced tool governance with the integration of Cloud API Registry in the Vertex AI Agent Builder Console. This acts as a private registry that administrators can use to curate and govern a set of approved tools for developers to use across their organization, providing:
Pre-built tools for Google services: We recently announced MCP support for Google services like BigQuery and Google Maps, which will be available for use in Vertex AI Agent Builder.
Support for custom MCP servers: Unlock your entire API estate for the agentic age. Apigee now empowers you to transform your existing managed APIs into custom MCP servers, bridging your established digital assets with modern AI workflows. Additionally, by bringing these tools from multiple clouds into Apigee API hub, you help ensure your agent developers have instant and secure access to a curated catalog through the Cloud API Registry.
Enhanced tool management: Administrators using the new experience in Vertex AI Agent Builder to view, govern, and manage tools can now ensure the right tools are available to developers in their organization.
Simplified tool access: For developers, the Agent Development Kit (ADK) now supports the Cloud API Registry, introducing a new ApiRegistry object to easily leverage managed tools.
The demo above showcases the new user journey for managing and governing tools directly within the Vertex AI Agent Builder Console.
2. Build your AI agents faster
Last month, we released Gemini 3 Pro, our most intelligent model, to every developer and enterprise team. It’s the best model in the world for multimodal understanding, and our most powerful agentic model yet. With full compatibility with ADK, you can now build, test, and deploy powerful AI agents with greater reliability and confidence. We are introducing new capabilities to help you move from concept to interactive product:
Full ADK support of Gemini 3 Pro and Flash: ADK now fully supports Gemini 3 Pro and Flash, allowing you to build reliable, production-ready agents.
ADK for TypeScript: We are extending ADK support for TypeScript, ensuring you can leverage the latest capabilities in ADK directly in whatever language you choose.
State management in ADK: We’ve made significant improvements to our agentic state management within ADK, which is the system for an AI agent to maintain context and memory during and across conversations. New improvements include:
Recovery from failure: If a conversation crashes due to an error, ADK now restores the state natively, requiring no additional work from the developer.
Continue with human-in-the-loop: You can now pause for human input anywhere, even inside complex workflows. ADK automatically remembers exactly where the agent stopped and resumes immediately after approval, so you don’t have to write extra code to track progress.
Rewind state and context: Developers can now rewind to any point in the conversation and invalidate all interactions after that point so the user can remove the “polluted” context rather than send a new message. This allows users to try different approaches to solving a problem without having to open new sessions.
Interactions API integration: ADK and the Agent2Agent protocol (A2A) now support the new Interactions API, providing a consistent way to manage multimodal input/output (text, audio, visual) across your agents, simplifying integration with client applications.
A2UI: Built on top of A2A protocol, A2UI is an early-stage UI toolkit to facilitate LLM-generated UIs for remote agents. This allows you to enable agents to pass shared UI widgets and components directly to user-facing applications without the security risks and overhead of iframes or sending executable code, allowing you to build rich user experiences securely.
Above is a demo showcasing A2UI in action where the user uploads a photo, a remote agent uses Gemini to understand it, and dynamically generates a custom form using A2UI for the specific needs of the customer.
You can start building today with adk-samples on GitHub or on Vertex AI Agent Garden, a growing repository of curated agent samples, solutions, and tools designed to accelerate your development and support one-click deployment of your agents built with ADK. Access our Agent Starter Pack, a template collection that provides a production-ready foundation for building, testing, and deploying AI agents.
3. Scale your AI agents effectively
Once you’ve built your agent, the next challenge is going into a production environment. That’s why we continue to expand the managed services available in Agent Engine to provide the core capabilities needed to scale your agents.
Manage context with confidence: We are moving Agent Engine sessions and memory bank to General Availability (GA). You can now use Agent Engine to manage both short-term and long-term memory for your production workloads. This allows your agents to maintain context across different interactions, which is critical for delivering helpful, personalized responses at scale. This product is powered by Google Cloud AI Research’s novel research method (accepted by ACL 2025), using a topic-based approach that sets a new standard for how agents learn and recall information.
Expanded regional support for Agent Engine services: All Agent Engine services are now available in seven additional regions worldwide. To learn more, refer to the documentation.
Pricing updates for Agent Engine: We lowered pricing for the Agent Engine runtime and will begin billing for additional Agent Engine services starting on January 28, 2026. You can review the Agent Engine pricing documentation for additional detail and hypothetical agent cost scenarios.
| Product | Resource | SKU | Prior pricing | New pricing | Price change date |
| --- | --- | --- | --- | --- | --- |
| Runtime | vCPU / hour | 8A55-0B95-B7DC | $0.0994 | $0.0864 | December 16, 2025 |
| Runtime | Memory / GB-hr | 0B45-6103-6EC1 | $0.0105 | $0.0090 | December 16, 2025 |
| Code Execution | vCPU / hour | 448F-9419-C2EE | Free | $0.0864 | January 28, 2026 |
| Code Execution | Memory / GB-hr | AC0F-52B0-CE44 | Free | $0.0090 | January 28, 2026 |
| Sessions | Stored session events | 0D5A-FCD2-CB63 | Free | $0.25/1,000 events | January 28, 2026 |
| Memory Bank | Memories stored per month | E954-622B-C859 | Free | $0.25/1,000 memories (LLM costs billed separately) | January 28, 2026 |
| Memory Bank | Memories retrieved | 6DEC-3026-DDFF | Free | $0.50/1,000 memories | January 28, 2026 |
The table above shows updated pricing for Agent Engine services and when the changes take place.
How customers are achieving more with Agent Builder
“Burns & McDonnell uses Vertex AI Agent Builder to transform how organizational knowledge is applied across the enterprise. With Experience IQ, we are building an AI agent using ADK that turns decades of project data and employee experience into real-time, actionable intelligence. Vertex AI enables this innovation to scale responsibly by combining deterministic business rules with probabilistic reasoning, making AI a trusted operational capability — not just a productivity tool. This agent helps teams quickly identify the right experience, reduce manual effort in staffing and planning, and make higher-confidence decisions grounded in verified data. With Vertex AI, Burns & McDonnell isn’t just managing knowledge — we are activating experience to drive faster, more confident decisions.” – Matt Olson, Chief Innovation Officer, Burns & McDonnell
“Payhawk uses Vertex AI Agent Builder to transform agents into financial assistants that truly ‘know’ our customers. Leveraging Memory Bank, we moved from stateless interactions to long-term context retention, allowing agents to recall user constraints and historical patterns with continuity. For example, our Financial Controller Agent now remembers habits like expensing small meals and auto-submits them, reducing submission time by over 50%. Similarly, our Travel Agent proactively applies preferences like aisle seats. This significantly drops cognitive load, allowing agents to anticipate needs based on past behavior rather than just reacting to prompts.” – Diyan Bogdanov, Principal Applied AI Engineer, Payhawk
“Gurunavi uses Vertex AI Agent Builder to power ‘UMAME!’, an AI restaurant discovery app that leverages Agent Engine’s Memory Bank to overcome a significant challenge: achieving a deep understanding of user context. Unlike conventional prompt-based systems, our agent leverages memory bank to remember a user’s past actions, preferences, and temporal patterns to proactively present the best options. This eliminates the need for manual searches, creating a seamless experience. We project this context-aware capability will improve user experience by 30% or more. We view this memory function as a non-negotiable feature for helping everyone forge new culinary experiences together with AI.” – Toshiaki Iwamoto, CTO, Gurunavi
“SeaArt Entertainment uses Vertex AI Agent Builder to personalize the creative experience for digital artists. Before Memory Bank, our AI agents could not reliably remember users’ preferences. For example, when users worked on complex multimodal art projects, they had to repeatedly explain the same details — like their favorite character styles or model choices — across sessions. After integrating Memory Bank, our agents are now able to recall past conversations, actions, and user preferences. We especially like that the agent can seamlessly remember context across sessions, making interactions feel more natural and personal.” – Aleksei Savin, Lead of Multimodal AI Platform, SeaArt Entertainment
Get started
Vertex AI Agent Builder provides the unified platform to manage the entire agent lifecycle, helping you close the gap from prototype to a production-ready agent. To explore these new features, visit the updated Agent Builder documentation and release notes.
If you’re a startup and you’re interested in learning more about building and deploying agents, download the Startup Technical Guide: AI Agents. This guide provides the knowledge needed to go from an idea to prototype to scale, whether your goals are to automate tasks, enhance creativity, or launch entirely new user experiences for your startup.
Amazon WorkSpaces Applications now offers support for Ubuntu Pro 24.04 LTS on Elastic fleets, enabling Independent Software Vendors (ISVs) and central IT organizations to stream Ubuntu desktop applications to users while leveraging the flexibility, scalability, and cost-effectiveness of the AWS Cloud.
Amazon WorkSpaces Applications is a fully managed, secure desktop and application streaming service that provides users with instant access to their desktops and applications from anywhere. Within Amazon WorkSpaces Applications, Elastic fleet is a serverless fleet type that lets you stream desktop applications to your end users from an AWS-managed pool of streaming instances without needing to predict usage, create and manage scaling policies, or create an image. The Elastic fleet type is designed for customers who want to stream applications to users without managing any capacity or creating WorkSpaces Applications images.
To get started, sign in to the WorkSpaces Applications management console and select the AWS Region of your choice. For the full list of Regions where WorkSpaces Applications is available, see the AWS Region Table. Amazon WorkSpaces Applications offers pay-as-you-go pricing. For more information, see Amazon WorkSpaces Applications Pricing.
Amazon WorkSpaces now supports IPv6 for WorkSpaces domains and external endpoints, enabling users to connect through an IPv4/IPv6 dual-stack configuration from compatible clients (excluding SAML authentication). This helps customers meet IPv6 compliance requirements and eliminates the need for costly networking equipment to handle address translation between IPv4 and IPv6.
Dual-stack support for WorkSpaces addresses the Internet’s growing demand for IP addresses by offering a vastly larger address space than IPv4. This eliminates the need to manage overlapping address ranges within your Virtual Private Cloud (VPC). Customers can deploy WorkSpaces through dual-stack that supports both IPv4 and IPv6 protocols while maintaining backward compatibility with existing IPv4 systems. Customers can also connect to their WorkSpaces through PrivateLink VPC endpoints over IPv6, enabling them to access the service privately without routing traffic over the public internet.
Connecting to Amazon WorkSpaces over IPv4/IPv6 dual-stack configuration is supported in all AWS Regions where Amazon WorkSpaces is available, including the AWS GovCloud (US East & US West) Regions. There is no additional cost for this feature.