In June, Google introduced Gemini CLI, an open-source AI agent that brings the power of Gemini directly into your terminal. And today, we’re excited to announce open-source Gemini CLI extensions for Google Data Cloud services.
Building applications and analyzing trends with services like Cloud SQL, AlloyDB, and BigQuery has never been easier — all from your local development environment! Whether you’re just getting started or a seasoned developer, these extensions make common data interactions such as app development, deployment, operations, and data analytics easier and more productive. So, let’s jump right in!
Using a Data Cloud Gemini CLI extension
Before you get started, make sure you have enabled the APIs and configured the IAM permissions required to access specific services.
To retrieve the newest functionality, install the latest release of the Gemini CLI (v0.6.0), then add the extension you want with the gemini extensions install command. Replace <EXTENSION> with the name of the service you want to use, for example alloydb, cloud-sql-postgresql, or bigquery-data-analytics.
Before starting the Gemini CLI, you’ll need to configure the extension to connect with your Google Cloud project by adding the required environment variables. The table below provides more information on the configuration required.
| Extension Name | Description | Configuration |
| --- | --- | --- |
| alloydb | Create resources and interact with AlloyDB for PostgreSQL databases and data. | |
Now you can start the Gemini CLI with the gemini command. You can view the installed extensions with the /extensions command, and list the MCP servers and tools included in an extension with the /mcp list command.
Using the Gemini CLI for Cloud SQL for PostgreSQL extension
The Cloud SQL for PostgreSQL extension lets you perform a number of actions. Some of the main ones are included below:
Create instance: Creates a new Cloud SQL instance for PostgreSQL (as well as MySQL or SQL Server)
List instances: Lists all Cloud SQL instances in a given project
Get instance: Retrieves information about a specific Cloud SQL instance
Create user: Creates a new user account within a specified Cloud SQL instance, supporting both standard and Cloud IAM users
Curious about how to put it into action? Like any good project, start with a solid written plan of what you are trying to do. Then, you can provide that project plan to the CLI as a series of prompts, and the agent will start provisioning the database and other resources:
After configuring the extension to connect to the new database, the agent can generate the required tables based on the approved plan. For easy testing, you can prompt the agent to add test data.
Now the agent can use the context it has to generate an API to make the data accessible.
As you can see, these extensions make it incredibly easy to start building with Google Cloud databases!
Using the BigQuery Analytics extensions
For your analytical needs, we are thrilled to give you a first look at the Gemini CLI extension for BigQuery Data Analytics. We are also excited to give access to the Conversational Analytics API through the BigQuery Conversational Analytics extension. This is the first step in our journey to bring the full power of BigQuery directly into your local coding environment, creating an integrated and unified workflow.
With this extension, you can:
Explore data: Use natural language to search for your tables.
Analyze: Ask business questions on the data and generate intelligent insights.
Dive deeper: Use conversational analytics APIs to dive deeper into the insights.
And extend: Use other tools or extensions to extend into advanced workflows like charting, reporting, code management, etc.
This initial release provides a comprehensive suite of tools to Gemini CLI:
Metadata tools: Discover and understand the BigQuery data landscape.
Query execution tool: Run any BigQuery query and get the results back, summarized to your console.
AI-powered forecasting: Leverage BigQuery’s built-in AI.FORECAST function for powerful time-series predictions directly from the command line.
Deeper data insights: The “ask_data_insights” tool provides access to a server-side BigQuery agent for richer data insights.
And more …
[Note: To use the Conversational Analytics extension, you need to enable additional APIs. Refer to the documentation for additional information.]
Here is an example journey with analytics extensions:
Explore and analyze your data, e.g.:
> find tables related to PyPi downloads

✦ I found the following tables related to PyPi downloads:

 * file_downloads: projects/bigquery-public-data/datasets/pypi/tables/file_downloads
 * distribution_metadata: projects/bigquery-public-data/datasets/pypi/tables/distribution_metadata
> Using bigquery-public-data.pypi.file_downloads show me top 10 downloaded pypi packages this month

✦ Here are the top 10 most downloaded PyPI packages this month:

 1. boto3: 685,007,866 downloads
 2. botocore: 531,034,851 downloads
 3. urllib3: 512,611,825 downloads
 4. requests: 464,595,806 downloads
 5. typing-extensions: 459,505,780 downloads
 6. certifi: 451,929,759 downloads
 7. charset-normalizer: 428,716,731 downloads
 8. idna: 409,262,986 downloads
 9. grpcio-status: 402,535,938 downloads
 10. aiobotocore: 399,650,559 downloads
Run deeper insights
Use “ask_data_insights” to trigger a server-side agent (via the BigQuery Conversational Analytics API) to answer your questions. The server-side agent is smart enough to gather additional context about your data and offer deeper insights into your questions.
You can go further and generate charts and reports by mixing BigQuery data with your local tools. Here’s a prompt to try:
“using bigquery-public-data.pypi.file_downloads can you forecast downloads for the last four months of 2025 for package urllib3? Please plot a chart that includes actual downloads for the first 8 months, followed by the forecast for the last four months”
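Under the hood, the forecasting tool builds on BigQuery’s AI.FORECAST table-valued function. If you want to run a similar forecast yourself, here is a minimal sketch using the google-cloud-bigquery Python client; treat it as an illustration, and check the AI.FORECAST argument names (data_col, timestamp_col, horizon) against the current BigQuery documentation before relying on them.

# Minimal sketch: run a time-series forecast with BigQuery's AI.FORECAST
# directly from Python. Assumes google-cloud-bigquery is installed and
# application-default credentials are configured; verify the AI.FORECAST
# argument names against the current documentation.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT *
FROM AI.FORECAST(
  (SELECT DATE(timestamp) AS day, COUNT(*) AS downloads
   FROM `bigquery-public-data.pypi.file_downloads`
   WHERE file.project = 'urllib3'
     AND DATE(timestamp) >= '2025-01-01'
   GROUP BY day),
  data_col => 'downloads',
  timestamp_col => 'day',
  horizon => 122)  -- roughly the last four months of 2025, daily
"""

for row in client.query(query).result():
    print(dict(row))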
Get started today!
Ready to level up your Gemini CLI with extensions for our Data Cloud services? Read more in the extensions documentation. Check out our templates and start building your own extensions to share with the community!
Written by: Sarah Yoder, John Wolfram, Ashley Pearson, Doug Bienstock, Josh Madeley, Josh Murchie, Brad Slaybaugh, Matt Lin, Geoff Carstairs, Austin Larsen
Introduction
Google Threat Intelligence Group (GTIG) is tracking BRICKSTORM malware activity, which is being used to maintain persistent access to victim organizations in the United States. Since March 2025, Mandiant Consulting has responded to intrusions across a range of industry verticals, most notably legal services, Software as a Service (SaaS) providers, Business Process Outsourcers (BPOs), and Technology. The value of these targets extends beyond typical espionage missions, potentially providing data to feed development of zero-days and establishing pivot points for broader access to downstream victims.
[BRICKSTORM Scanner: get the tool at https://github.com/mandiant/brickstorm-scanner]
We attribute this activity to UNC5221 and closely related, suspected China-nexus threat clusters that employ sophisticated capabilities, including the exploitation of zero-day vulnerabilities targeting network appliances. While UNC5221 has been used synonymously with the actor publicly reported as Silk Typhoon, GTIG does not currently consider the two clusters to be the same.
These intrusions are conducted with a particular focus on maintaining long-term stealthy access by deploying backdoors on appliances that do not support traditional endpoint detection and response (EDR) tools. The actor employs methods for lateral movement and data theft that generate minimal to no security telemetry. This, coupled with modifications to the BRICKSTORM backdoor, has enabled them to remain undetected in victim environments for 393 days, on average. Mandiant strongly encourages organizations to reevaluate their threat model for appliances and conduct hunt exercises for this highly evasive actor. We are sharing an updated threat actor lifecycle for BRICKSTORM associated intrusions, along with specific and actionable steps organizations should take to hunt for and protect themselves from this activity.
Figure 1: BRICKSTORM targeting
Threat Actor Lifecycle
The actor behind BRICKSTORM employs sophisticated techniques to maintain persistence and minimize the visibility traditional security tools have into their activities. This section reviews techniques observed across multiple Mandiant investigations, with customer details sanitized.
Initial Access
A consistent challenge across Mandiant investigations into BRICKSTORM intrusions has been determining the initial intrusion vector. In many cases, the average dwell time of 393 days exceeded log retention periods and the artifacts of the initial intrusion were no longer available. Despite these challenges, a pattern in the available evidence points to the actor’s focus on compromising perimeter and remote access infrastructure.
In at least one case, the actor gained access by exploiting a zero-day vulnerability. Mandiant has identified evidence of this actor operating from several other edge appliances early in the lifecycle, but could not find definitive evidence of vulnerability exploitation. As noted in our previous blog post from April 2025, Mandiant has identified the use of post-exploitation scripts that have included a wide range of anti-forensics functions designed to obscure entry.
Establish Foothold
The primary backdoor used by this actor is BRICKSTORM, as previously discussed by Mandiant and others. BRICKSTORM includes SOCKS proxy functionality and is written in Go, which has wide cross-platform support. This is essential to support the actor’s preference to deploy backdoors on appliance platforms that do not support traditional Endpoint Detection and Response (EDR) tools. Mandiant has found evidence of BRICKSTORM on Linux and BSD-based appliances from multiple manufacturers. Although there is evidence of a BRICKSTORM variant for Windows, Mandiant has not observed it in any investigation. Appliances are often poorly inventoried, not monitored by security teams, and excluded from centralized security logging solutions. While BRICKSTORM has been found on many appliance types, UNC5221 consistently targets VMware vCenter and ESXi hosts. In multiple cases, the threat actor deployed BRICKSTORM to a network appliance prior to pivoting to VMware systems. The actor moved laterally to a vCenter server in the environment using valid credentials, which were likely captured by the malware running on the network appliances.
Our analysis of samples recovered from different victim organizations has found evidence of active development of BRICKSTORM. While the core functionality has remained, some samples are obfuscated using Garble and some carry a new version of the custom wssoft library. Mandiant recovered one sample of BRICKSTORM with a “delay” timer built-in that waited for a hard-coded date months in the future before beginning to beacon to the configured command and control domain. Notably, this backdoor was deployed on an internal vCenter server after the victim organization had begun their incident response investigation, demonstrating that the threat actor was actively monitoring and capable of rapidly adapting their tactics to maintain persistence.
As previously reported, BRICKSTORM deployments are often designed to blend in with the target appliance, with the naming convention and even the functionality of the sample being designed to masquerade as legitimate activity. Mandiant has identified samples using Cloudflare Workers and Heroku applications for C2, as well as sslip.io or nip.io to resolve directly to C2 IP addresses. From the set of samples we’ve recovered, there has been no reuse of C2 domains across victims.
Escalate Privileges
In one investigation, Mandiant analyzed a vCenter server and found the threat actor had installed a malicious Java Servlet filter for the Apache Tomcat server that runs the web interface for vCenter. A Servlet filter is code that runs every time the web server receives an HTTP request. Normally, installing a filter requires modifying a configuration file and restarting or reloading the application; however, the actor used a custom dropper that made the modifications entirely in memory, making it very stealthy and negating the need for a restart. The malicious filter, tracked by Mandiant as BRICKSTEAL, runs on HTTP requests to the vCenter web login Uniform Resource Identifiers (URIs) /web/saml2/sso/*. If present, it decodes the HTTP Basic authentication header, which may contain a username and password. Many organizations use Active Directory authentication for vCenter, which means BRICKSTEAL could capture those credentials. Often, users who log in to vCenter have a high level of privilege in the rest of the enterprise. Previously shared hardening guidance for vSphere includes steps that can mitigate the ability of BRICKSTEAL to capture usable credentials in this scenario, such as enforcement of multi-factor authentication (MFA).
VMware vCenter is an attractive target for threat actors because it acts as the management layer for the vSphere virtualization platform and can take actions on VMs such as creating, snapshotting, and cloning. In at least two cases, the threat actor used their access to vCenter to clone Windows Server VMs for key systems such as Domain Controllers, SSO Identity Providers, and secret vaults. This is a technique that other threat actors have used. With a clone of the virtual machine, the threat actor can mount the filesystem and extract files of interest, such as the Active Directory Domain Services database (ntds.dit). Although these Windows Servers likely have security tools installed on them, the threat actor never powers on the clone so the tools are not executed. The following example shows vCenter VPXD logs of the threat actor using the local vSphere Administrator account to clone a VM.
2025-04-01 03:37:40 [vim.event.TaskEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [Task: VirtualMachine.clone]
2025-04-01 03:37:49 [vim.event.VmBeingClonedEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<same unique identifier>] [Cloning DC01 on esxi01, in <vCenter inventory object> to DC01-clone on esxi02, in <vCenter inventory object>]
2025-04-01 03:42:07 [vim.event.VmClonedEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [DC01 cloned to DC01-clone on esxi02, in <vCenter inventory object>]
2025-04-01 04:05:40 [vim.event.TaskEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [Task: VirtualMachine.destroy]
2025-04-01 04:05:47 [vim.event.VmRemovedEvent] [info] [VSPHERE.LOCAL\Administrator] [<vCenter inventory object>] [<unique identifier>] [Removed DC01-Clone on esxi02 from <vCenter inventory object>]
In one instance, the threat actor used legitimate server administrator credentials to repeatedly move laterally to a system running Delinea (formerly Thycotic) Secret Server. The forensic artifacts recovered from the system were consistent with the execution of a tool, such as a secret stealer, to automatically extract and decrypt all credentials stored by the Secret Server application.
Move Laterally
Typically, at least one instance of BRICKSTORM would be the primary source of hands-on keyboard activity, with two or more compromised appliances serving as backups. To install BRICKSTORM, the actor used legitimate credentials to connect to the appliance, often with SSH. In one instance the actor used credentials known to be stored in a password vault they previously accessed. In another instance they used credentials known to be stored in a PowerShell script the threat actor previously viewed. In multiple cases the actor logged in to either the ESXi web-based UI or the vCenter Appliance Management Interface (VAMI) to enable the SSH service so they could connect and install BRICKSTORM. The following are example VAMI access events that show the threat actor connecting to VAMI and making changes to the SSH settings for vCenter.
To maintain access to victim environments, the threat actor modified the init.d, rc.local, or systemd files to ensure BRICKSTORM started on appliance reboot. In multiple cases, the actor used the sed command line utility to modify legitimate startup scripts to launch BRICKSTORM. The following are a few example sed commands executed by the actor on vCenter.
sed -i 's/export TEXTDOMAIN=vami-lighttp/export TEXTDOMAIN=vami-lighttp\n\/path\/to\/brickstorm/g' /opt/vmware/etc/init.d/vami-lighttp
sed -i '$aSETCOLOR_WARNING="echo -en `/path/to/brickstorm`\033[0;33m"' /etc/sysconfig/init
The threat actor has also created a web shell tracked by Mandiant as SLAYSTYLE on vCenter servers. SLAYSTYLE, tracked by MITRE as BEEFLUSH, is a JavaServer Pages (JSP) web shell that functions as a backdoor. It is designed to receive and execute arbitrary operating system commands passed through an HTTP request. The output from these commands is returned in the body of the HTTP response.
Complete Mission
A common theme across investigations is the threat actor’s interest in the emails of key individuals within the victim organization. To access the email mailboxes of target accounts, the threat actor made use of Microsoft Entra ID Enterprise Applications with mail.read or full_access_as_app scopes. Both scopes allow the application to access mail in any mailbox. In some cases, the threat actor targeted the mailboxes of developers and system administrators while in other cases, they targeted the mailboxes of individuals involved in matters that align with PRC economic and espionage interests.
When the threat actor exfiltrated files from the victim environment, they used the SOCKS proxy feature of BRICKSTORM to tunnel traffic from their workstation and directly access systems and web applications of interest. In multiple cases the threat actor used legitimate credentials to log in to the web interface for internal code stores and download repositories as ZIP archives. In other cases the threat actor browsed to specific directories and files on remote machines by specifying Windows Universal Naming Convention (UNC) paths.
In several cases, the BRICKSTORM samples deployed by the threat actor had been removed from compromised systems by the time of the investigation. In these cases, the presence of BRICKSTORM was identified through forensic analysis of backup images.
Hunting Guidance
Mandiant has previously discussed the diminishing usefulness of atomic IOCs and the need to adopt TTP-based hunting. Across BRICKSTORM investigations we have not observed the reuse of C2 domains or malware samples, which, coupled with high operational security, means these indicators quickly expire or are never observed at all. Therefore, a TTP-based hunting approach is not only an ideal practice, but a necessity to detect patterns of attack that are unlikely to be detected by traditional signature-based defenses. The following is a checklist of the minimal set of hunts Mandiant recommends organizations conduct to search for BRICKSTORM and related activities.
| Step | Hunt | Data Sources |
| --- | --- | --- |
| 0 | Create or update asset inventory that includes edge devices and other appliances | N/A |
| 1 | File and backup scan for BRICKSTORM | Appliance file system, backups |
| 2 | Internet traffic from edge devices and appliances | Firewall connection logs, DNS logs, IDS/IPS, netflow |
| 3 | Access to Windows servers and desktops from appliances | EDR telemetry, Security Event Logs, Terminal Service Logs, Windows UAL |
| 4 | Access to credentials and secrets | Windows Shellbags, EDR telemetry |
| 5 | Access to M365 mailboxes using Enterprise Application | M365 UAL |
| 6 | Cloning of sensitive virtual machines | vSphere VPXD logs |
| 7 | Creation of local vCenter and ESXi accounts | VMware audit events |
| 8 | SSH enablement on vSphere platform | VMware audit events, VAMI logs |
| 9 | Rogue VMs | VMware audit events, VM inventory reports |
Create or Update Asset Inventory
Foundational to the success of any threat hunt is an asset inventory that includes devices not covered by the standard security tool stack, such as edge devices and other appliances. Because these appliances lack support for traditional security tools, an inventory is critical for developing effective compensating controls and detections. It is especially important to track the management interface addresses of these appliances, as these are the interfaces from which malware beaconing and threat actor commands will egress.
Mandiant recommends organizations take a multi-step approach to building or updating this inventory:
Known knowns: Begin with the appliance classes that all organizations use: firewalls, VPN concentrators, virtualization platforms, conferencing systems, badging, and file storage.
Known unknowns: Work across teams to brainstorm appliance classes that may be more specialized to your organization, but the security organization likely lacks visibility into.
Unknown unknowns: These are the appliances that were supposed to be decommissioned but weren’t, sales proof-of-value (POV) units, and others. Consider using network visibility tools or your existing EDR to scan for “live” IP addresses that do not show up in your EDR reports. This has the added benefit of identifying unmanaged devices that should have EDR but don’t.
Figure 2: Asset inventory
File and Backup Scan for BRICKSTORM
YARA rules have proven to be the most effective method for detecting BRICKSTORM binaries on appliances. We are sharing relevant YARA rules in the appendix of this post. YARA can be difficult to run at scale, but some backup solutions provide the ability to run YARA across the backup data store. Mandiant is aware of multiple customers who have identified BRICKSTORM through this method.
To aid organizations in hunting for BRICKSTORM activity in their environments, Mandiant released a scanner script, which can run on appliances and other Linux or BSD-based systems.
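Where neither of those options fits, a simple offline sweep of a mounted appliance image or backup export can work. A minimal sketch using the yara-python bindings; the rule file path and mount point are placeholders:

# Minimal sketch: sweep a mounted appliance image or backup export with
# the BRICKSTORM YARA rules. Paths are placeholders; requires yara-python
# (pip install yara-python).
import os
import yara

RULES_PATH = "brickstorm.yar"        # the YARA rules from the appendix
SCAN_ROOT = "/mnt/appliance_image"   # mounted backup or filesystem image

rules = yara.compile(filepath=RULES_PATH)

for dirpath, _, filenames in os.walk(SCAN_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            matches = rules.match(path, timeout=60)
        except yara.Error:
            continue  # skip unreadable or special files
        if matches:
            print(f"{path}: {[m.rule for m in matches]}")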
Internet Traffic from Edge Devices and Appliances
Use the inventory of appliance management IP addresses to hunt for evidence of malware beaconing in network logs. In general, appliances should not communicate with the public Internet from management IP addresses except to download updates and send crash analytics to the manufacturer.
Established outbound traffic to domains or IP addresses not controlled by the appliance manufacturer should be regarded as very suspicious and warrants forensic review of the appliance. BRICKSTORM can use DNS over HTTPS (DoH), which should be similarly rare when sourced from appliance management IP addresses.
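One way to operationalize this hunt is to join firewall logs against the appliance inventory and strip out known vendor destinations. A minimal sketch with pandas; the file paths, column names, and allowlist entries are assumptions about your own exports:

# Minimal sketch: flag outbound connections from appliance management IPs
# that don't go to known vendor infrastructure. Column names and file
# paths are assumptions about your export format.
import pandas as pd

inventory = pd.read_csv("appliance_inventory.csv")  # expects a mgmt_ip column
fw_logs = pd.read_csv("firewall_connections.csv")   # expects src_ip, dst_domain columns

# Destinations the appliance legitimately needs (updates, telemetry); build per vendor.
vendor_allowlist = {"updates.vendor.example", "telemetry.vendor.example"}

appliance_traffic = fw_logs[fw_logs["src_ip"].isin(set(inventory["mgmt_ip"]))]
suspicious = appliance_traffic[~appliance_traffic["dst_domain"].isin(vendor_allowlist)]

# Rank by volume; anything here warrants a closer look at the appliance.
print(suspicious.groupby(["src_ip", "dst_domain"]).size().sort_values(ascending=False))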
Access to Windows Systems from Appliances
The threat actor primarily accessed Windows machines (both desktops and servers) using type 3 (network) logins, although in some cases the actor also established RDP sessions. Appliances should rarely log in to Windows desktops or servers and any connections should be treated as suspicious. Some examples of false positives could include VPN appliances using a known service account to connect to a domain controller in order to perform LDAP lookups and authenticated vulnerability scanners using a well-known service account.
In addition to EDR telemetry, Terminal Services logs, and Security event logs, defenders should obtain and parse the Windows User Access Log (UAL). The UAL is stored on Windows Servers inside the directory \Windows\System32\LogFiles\Sum and can be parsed using open-source tools such as SumECmd. This log source records attempted authenticated connections to Windows systems and often retains artifacts going back much longer than typical Windows event logs. Note that this log source includes successful and unsuccessful logins, but is still useful to identify suspicious activity sourced from appliances.
Access to Credentials and Secrets
Use the forensic capabilities of EDR tools to acquire Windows Shellbags artifacts from Windows workstations and servers. Shellbags record the folder paths that a user browses with the Windows Explorer application. Use an open-source parser to extract the relevant data and look for suspicious patterns of activity (a filtering sketch follows the list):
Access to folder paths where the initiating user is a service account, especially service accounts that are unfamiliar or rarely used
File browsing activity sourced from servers that includes a Windows Universal Naming Convention (UNC) path pointing to a workstation (e.g., \\bobwin7.corp.local\browsingpath)
File browsing activity to folder paths that contain credential data, such as:
Appdata locations used to store session tokens (e.g., \Users\<username>\.azure)
Windows credential vault (%localappdata%\Microsoft\Credentials)
Data Protection API (DPAPI) keys (%appdata%\Microsoft\Protect\<SID>)
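A minimal sketch of that triage with pandas, assuming the Shellbags parser output has been exported to CSV with an AbsolutePath column (the column name is an assumption about your parser):

# Minimal sketch: filter parsed Shellbags output for folder paths that
# commonly hold credential material. The CSV layout (an AbsolutePath
# column) is an assumption about your parser's export format.
import re
import pandas as pd

shellbags = pd.read_csv("shellbags_parsed.csv")

credential_patterns = [
    re.compile(r"\\\.azure(\\|$)", re.IGNORECASE),               # cached Azure tokens
    re.compile(r"\\Microsoft\\Credentials", re.IGNORECASE),      # Windows credential vault
    re.compile(r"\\Microsoft\\Protect\\S-1-5-", re.IGNORECASE),  # DPAPI keys
    re.compile(r"^\\\\[^\\]+\\", re.IGNORECASE),                 # UNC path to a remote host
]

mask = shellbags["AbsolutePath"].astype(str).apply(
    lambda path: any(rx.search(path) for rx in credential_patterns)
)
print(shellbags.loc[mask, ["AbsolutePath"]].drop_duplicates())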
Access to M365 Mailboxes using Enterprise Application
Mandiant has observed this actor use common techniques to conduct bulk email access and exfiltration from Microsoft 365 Exchange Online. Organizations should follow our guidance outlined in our APT29 whitepaper to hunt for these techniques. Although the white paper specifically references APT29, these techniques have become widely used by many groups. In multiple investigations the threat actor used a Microsoft Entra ID Enterprise Application with mail.read or full_access_as_app scopes to access mailboxes of key individuals in the victim organization.
To hunt for this activity, we recommend a phased approach:
Enumerate the Enterprise Applications and Application Registrations with graph permissions that can read all mail.
For each application, validate that there is at least one secret or certificate configured for it. Record the Application (client) ID.
Conduct a free-text search against the Unified Audit Log or the OfficeActivity table in Sentinel for the client IDs from step 2. This will return the MailItemsAccessed events that record the application accessing mail.
For each application analyze the source IP addresses and user-agent strings for discrepancies. Legitimate usage of the applications should occur from well-defined IP addresses. Additionally, look for focused interest in key personnel mailboxes across multiple days.
When accessing M365 and other internet-facing services, the actor has used multiple commercial VPN and proxy providers. Mandiant has found evidence of the threat actor using PIA, NordVPN, Surfshark, VPN Unlimited, and PrivadoVPN, although there is no reason for these to be the only solutions used. There is also evidence to support that this actor has access to a purpose-built obfuscation network built from compromised small office/home office routers. Mandiant has no knowledge of how these routers are being compromised. The exit nodes for commercial VPNs and obfuscation networks change rapidly, and sharing atomic indicators for hunting purposes is unlikely to yield results. Instead, identify the key individuals in the organization, with respect to the organization’s vertical and the likely goals of the threat actor. Fetch MailItemsAccessed logs for those mailboxes for the last year, or as long as retention allows. Analyze the SessionId values of the log events and look for IDs that span multiple IP addresses where the IP addresses are not in the user’s typical geographic location.
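That last analysis step lends itself to a simple aggregation. A minimal sketch over exported MailItemsAccessed events; the CSV layout (SessionId and ClientIPAddress columns) is an assumption about your export format:

# Minimal sketch: find MailItemsAccessed sessions that span multiple
# source IP addresses -- a possible sign of stolen session material.
import pandas as pd

events = pd.read_csv("mailitemsaccessed_events.csv")

# Count distinct source IPs per session; flag sessions seen from more than one.
ips_per_session = events.groupby("SessionId")["ClientIPAddress"].nunique()
for session_id in ips_per_session[ips_per_session > 1].index:
    ips = events.loc[events["SessionId"] == session_id, "ClientIPAddress"].unique()
    print(session_id, sorted(ips))  # compare against the user's usual geography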
Cloning of Sensitive Virtual Machines
On VMware vCenter servers, VPXD logs contain valuable information for VM management tasks such as clone events, powering a VM on and off, and creating snapshots. The threat actor often used the VSPHERE.LOCAL\Administrator account when cloning VMs and targeted VMs that would contain credentials, such as password vaults and domain controllers. The threat actor would delete the cloned VM shortly after cloning, and primarily operated between the hours of 01:00 and 10:00 UTC. Investigators should search vCenter VPXD logs for activity that matches these criteria and confirm whether the cloning activity was intended.
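A minimal sketch of that search over an exported VPXD log; the line format follows the excerpt shown earlier, and the regex will need adapting to your vCenter version:

# Minimal sketch: flag VM clone events performed by the local vSphere
# Administrator during the 01:00-10:00 UTC window. Line format follows
# the VPXD excerpt above; adapt the regex to your vCenter version.
import re
from datetime import datetime

CLONE_RE = re.compile(
    r"^(\S+ \S+) \[vim\.event\.VmBeingClonedEvent\].*VSPHERE\.LOCAL\\Administrator"
)

with open("vpxd.log") as log:
    for line in log:
        match = CLONE_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if 1 <= ts.hour < 10:  # the actor's observed operating window (UTC)
            print(line.rstrip())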
Creation of Local vCenter and ESXi Accounts
Mandiant identified evidence the threat actor created a new local account to install BRICKSTORM and then removed the account after they were done. The following logs show the threat actor using the local Administrator account to create a new local account and add it to the BashShellAdministrators group. The threat actor established an SSH connection from a compromised appliance to vCenter using the newly created account and installed the BRICKSTORM backdoor on vCenter. Shortly after, the threat actor deleted the account. Investigators should review audit logs in /var/log/audit/sso-events/audit_events.log for the creation and deletion of local accounts and validate their purpose. In one instance, the threat actor named the account with a similar naming convention as a local service account used for backups on vCenter.
2025-04-01T06:45:32 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:45:32 GMT","description":"Creating local person user '<account_name>' with details ('','','','','<account_name>@vsphere.local')","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:45:55 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:45:55 GMT","description":"Adding users '[{Name: <account_name>, Domain: vsphere.local}]' to local group 'Administrators'","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:46:23 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:46:23 GMT","description":"Updating local group 'SystemConfiguration.BashShellAdministrators' details ('Access bash shell and manage local users on nodes').","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:52:03 <vcenter_hostname> sshd[36952]: Postponed keyboard-interactive/pam for <account_name>@vsphere.local from <compromised_system>
2025-04-01T06:52:30 <vcenter_hostname> sudo: pam_unix(sudo:session): session opened for user root
2025-04-01T06:53:39 Creation of BRICKSTORM on vCenter
2025-04-01T06:56:18 <vcenter_hostname> sudo: pam_unix(sudo:session): session closed for user root
2025-04-01T06:56:25 <vcenter_hostname> sshd[36952]: pam_unix(sshd:session): session closed for user <account_name>@vsphere.local
2025-04-01T06:56:57 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:56:57 GMT","description":"Removing principals '[{Name: <account_name>, Domain: vsphere.local}]' from local group 'Administrators'","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
2025-04-01T06:58:12 {"user":"Administrator@VSPHERE.LOCAL","client":"","timestamp":"04/01/2025 06:58:12 GMT","description":"Deleting principal '<account_name>'","eventSeverity":"INFO","type":"com.vmware.sso.PrincipalManagement"}
SSH Enablement on ESXi and vCenter
For ESXi servers, monitoring should be set up for SSH logins using local accounts. In most organizations, legitimate direct access to ESXi hosts over SSH is relatively rare, and in many cases the SSH server is disabled by default. Write rules to alert on log events generated when SSH is enabled for a vSphere platform appliance.
Rogue VMs
Organizations should review VMware audit events that track the creation and deletion of new VMs, particularly those using non-standard ISO images and operating systems. Audit events may also record the threat actor downloading archived ISO images to the datastore volumes used by vSphere.
Hardening Guidance
It is crucial to maintain an up-to-date inventory of appliances and other devices in the network that do not support the standard security tool stack. Any device in that inventory, whether internal or internet-facing, should be configured to follow a principle of least access.
Internet access: Appliances should not have unrestricted access to the internet. Work with your vendors or monitor your firewall logs to lock down internet access to only those domains or IP addresses that the appliance requires to function properly.
Internal network access: Appliances exposed to the internet should not have unrestricted access to internal IP address space. The management interface of most appliances does not need to establish connections to internal IP addresses. Work with the vendor to understand specific needs, such as LDAP queries to verify user attributes for VPN logins.
Mandiant has previously published guidance to secure the vSphere platform from threat actors. We recommend you follow the guidance, especially the forwarding of logs to a central SIEM, enabling vSphere lockdown mode, enforcing MFA for web logins, and enforcing the execInstalledOnly policy.
Organizations should assess and improve the isolation of any credential vaulting systems. In many cases if a threat actor is able to gain access to the underlying Operating System, any protected secrets can be exposed. Servers hosting credential vaulting applications should be considered Tier 0 systems and have strict access controls applied to them. Mandiant recommends organizations work with their vendors to adopt secure software practices such as storing encryption keys in the Trusted Platform Module (TPM) of the server.
Outlook and Implications
Recent intrusion operations tied to BRICKSTORM likely represent an array of objectives, ranging from geopolitical espionage and access operations to intellectual property (IP) theft that enables exploit development. Based on evidence from recent investigations, the targeting of the US legal space is primarily to gather information related to US national security and international trade. Additionally, GTIG assesses with high confidence that the objective of BRICKSTORM targeting SaaS providers is to gain access to downstream customer environments or the data SaaS providers host on their customers’ behalf. The targeting of technology companies presents an opportunity to steal valuable IP to further the development of zero-day exploits.
Acknowledgements
This analysis would not have been possible without the assistance from across Google Threat Intelligence Group, Mandiant Consulting and FLARE. We would like to specifically thank Nick Simonian from GTIG Research and Discovery (RAD).
Indicators of Compromise
The following indicators of compromise are available in a Google Threat Intelligence (GTI) collection. Note that Mandiant has not observed instances where the threat actor reused a malware sample and hunting for the exact indicators is unlikely to yield results.
Public sector agencies are under increasing pressure to operate with greater speed and agility, yet are often hampered by decades of legacy data. Critical information, essential for meeting tight deadlines and fulfilling mandates, frequently lies buried within vast collections of unstructured documents. This challenge of transforming institutional knowledge into actionable insight is a common hurdle on the path to modernization.
The Indiana Department of Transportation (INDOT) recently faced this exact scenario. To comply with Governor Mike Braun’s Executive Order 25-13, all state agencies were given 30 days to complete a government efficiency report, mapping all statutory responsibilities to their core purpose. For INDOT, the critical information needed to complete this report was buried in a mix of editable and static documents – decades of policies, procedures, and manuals scattered across internal sites. A manual review was projected to take hundreds of hours, making the deadline nearly impossible. This tight deadline necessitated an innovative approach to data processing and report generation.
Recognizing a complex challenge as an opportunity for transformation, INDOT’s leadership envisioned an AI-powered solution. The agency chose to build its pilot program on its existing Google Cloud environment, which allowed it to deploy Gemini’s capabilities immediately. By taking this strategic approach, the team was able to turn a difficult compliance requirement into a powerful demonstration of government efficiency.
From manual analysis to an AI-powered pilot in one week
Operating in an agile week-long sprint, INDOT’s team built an innovative workflow centered on Retrieval-Augmented Generation (RAG). This technique enhances generative AI models by grounding them in specific, private data, allowing them to provide accurate, context-aware answers.
The technical workflow began with data ingestion and pre-processing. The team quickly developed Python scripts to perform “Extract, Transform, Load” (ETL) on the fly, scraping internal websites for statutes and parsing text from numerous internal files. This crucial step cleaned and structured the data for the next stage: indexing. Using Vertex AI Search, they created a robust, searchable vector index of the curated documents, which formed the definitive knowledge base for the generative model.
With the data indexed, the RAG engine in Vertex AI could efficiently retrieve the most relevant document snippets in response to a query. This contextual information was then passed to Gemini via Vertex AI. This two-step process was critical, as it ensured the model’s responses were based solely on INDOT’s official documents, not on public internet data.
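A minimal sketch of that two-step, retrieve-then-generate flow using the Vertex AI Search and Vertex AI SDKs; the project, location, data store ID, model name, and sample question are placeholders, and the exact client calls should be verified against the current SDK documentation:

# Minimal sketch of a retrieve-then-generate (RAG) flow on Google Cloud.
# Project, location, data store ID, model name, and the sample question
# are placeholders; verify the client calls against the current SDKs.
from google.cloud import discoveryengine_v1 as discoveryengine
import vertexai
from vertexai.generative_models import GenerativeModel

PROJECT, LOCATION, DATA_STORE = "my-project", "global", "agency-docs"

# 1. Retrieve the most relevant snippets from the Vertex AI Search index.
search_client = discoveryengine.SearchServiceClient()
serving_config = (
    f"projects/{PROJECT}/locations/{LOCATION}/collections/default_collection/"
    f"dataStores/{DATA_STORE}/servingConfigs/default_search"
)
question = "Which statutes assign responsibility for bridge inspection?"
response = search_client.search(
    discoveryengine.SearchRequest(
        serving_config=serving_config, query=question, page_size=5
    )
)
snippets = [str(result.document) for result in response.results]

# 2. Ground the model's answer in the retrieved snippets only.
vertexai.init(project=PROJECT, location="us-central1")
model = GenerativeModel("gemini-1.5-pro")
prompt = (
    "Answer using ONLY the context below.\n\n"
    "Context:\n" + "\n---\n".join(snippets) + f"\n\nQuestion: {question}"
)
print(model.generate_content(prompt).text)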
Setting a new standard for government efficiency
Within an intensive, week-long effort, the team delivered a functioning pilot that generated draft reports across nine INDOT divisions with an impressive 98% fidelity – a measure of how accurately the new reports reflected the information in the original source documents. This innovative approach saved an estimated 360 hours of manual effort, freeing agency staff from tedious data collection to focus on the high-value work of refining and validating the reports. The solution enabled INDOT to become the largest Indiana state agency to submit its government efficiency report on time.
The government efficiency report was a novel experience for many on our executive team, demonstrating firsthand the transformative potential of large language models like Gemini. This project didn’t just help us meet a critical deadline; it paved the way for broader executive support of AI initiatives that will ultimately enhance our ability to serve Indiana’s transportation needs.
Alison Grand
Deputy Commissioner and Chief Legal Counsel, Indiana Department of Transportation
The AI-generated report framework was so effective that it became the official template for 60 other state agencies, powerfully demonstrating a responsible use of AI and building significant trust in INDOT as a leader in statewide policy. By building a scalable, secure RAG system on Google Cloud, INDOT not only met its tight deadline but also created a reusable model for future innovation, accelerating its mission to better serve the people of Indiana.
Join us at Google Public Sector Summit
To see Google’s latest AI innovations in action, and learn more about how Google Cloud technology is empowering state and local government agencies, register to attend the Google Public Sector Summit taking place on October 29 in Washington, D.C.
Editor’s note: Today’s post is by Syed Mohammad Mujeeb, CIO, and Arsalan Mazhar, Head of Infrastructure, at JS Bank, a prominent and rapidly growing midsize commercial bank in Pakistan with a strong national presence of over 293 branches. JS Bank, always at the forefront of technology, deployed a Google stack to modernize operations while maintaining security and compliance.
Snapshot:
JS Bank’s IT department, strained across 293 branches, was hindered by endpoint instability, a complex security stack, and a lack of device standardization. This reactive environment limited their capacity for innovation.
Through a strategic migration to a unified Google ecosystem—including ChromeOS, Google Workspace, and Google Cloud—the bank transformed its operations. The deployment of 1,500 Chromebooks resulted in a more reliable, secure, and manageable IT infrastructure. This shift cut device management time by 40% and halved daily support tickets, empowering the IT team to pivot from routine maintenance to strategic initiatives like digitization and AI integration.
Reduced IT burden: device management time cut by 40%
Daily support tickets were halved, freeing up IT time for strategic, value-added projects
Nearly 90% endpoint standardization, creating a manageable and efficient IT architecture
A simplified, powerful security posture with the built-in protection of ChromeOS and Google Workspace
At JS Bank, we pride ourselves on being technology pioneers, always bringing new technology into banking. Our slogan, “Barhna Hai Aagey,” means we are always moving onward and upward. But a few years ago, our internal IT infrastructure was holding us back. We researched and evaluated different solutions, and found the combination of ChromeOS and Google Workspace a perfect fit for today’s technology landscape, which is surrounded by cyber threats. When we shifted to a unified Google stack, we paved the way for a future driven by AI, innovation, and operational excellence.
Before our transformation, our legacy solution was functional, but it was a constant struggle. Our IT team was spread thin across our 293 branches, dealing with a cumbersome setup that required numerous security tools, including antivirus and anti-malware, all layered on top of each other. Endpoints crashed frequently, and with a mixture of older devices and some devices running Ubuntu, we lacked the standardization needed for true efficiency and security. It was a reactive environment, and our team was spending too much time on basic fixes rather than driving innovation.
We decided to make a strategic change to align with our bank’s core mission of digitization, and that meant finding a partner with an end-to-end solution. We chose Google because we saw the value in their integrated ecosystem and anticipated the future convergence of public and private clouds. We deployed 1,500 Chromeboxes across branches and fully transitioned to Google Workspace.
Today, we have achieved nearly 90% standardization across our endpoints with Chromebooks and Chromeboxes, all deeply integrated with Google Workspace. This shift has led to significant improvements in security, IT management, and employee productivity. The built-in security features of the Google ecosystem provide peace of mind, especially during periods of heightened cybersecurity threats, as we feel that Google will inherently protect us from cyberattacks. It has also simplified security protocols in branches, eliminating the need for multiple antivirus and anti-malware tools. Moreover, the lightweight nature of the Google solutions ensures applications are available from anywhere, anytime, and simplifies deployments in branches.
To strengthen security across all corporate devices, we made Chrome our required browser. This provides foundational protections like Safe Browsing to block malicious sites, browser reporting, and password reuse alerts. For 1,500 users, we adopted Chrome Enterprise Premium. This adds zero-trust enterprise security, centralized management, data loss prevention (DLP) to protect against accidental data loss, secure access to applications with context-aware access restrictions, and scanning of high-risk files.
With Google, our IT architecture is now manageable. The team’s focus has fundamentally shifted from putting out fires to supporting our customers and building value. We’ve seen a change in our own employees, too; the teams who once managed our legacy systems are now eager to work within the Google ecosystem. From an IT perspective, the results are remarkable: the team required to manage the ChromeOS environment has shrunk to 40%. Daily support tickets have been halved, freeing IT staff from hardware troubleshooting to focus on more strategic application support, enhancing their job satisfaction and career development. Our IT staff now enjoy less taxing weekends due to reduced work hours and a lighter operational burden.
Our “One Platform” vision comes to life
We are simplifying our IT architecture using Google’s ecosystem to achieve our “One Platform” vision. As a Google shop, we’ve deployed Chromebooks enterprise-wide and unified user access with a “One Window” application and single sign-on. Our “One Data” platform uses an Elasticsearch data lake on Google Cloud, now being connected to Google’s LLMs. This integrated platform provides our complete AI toolkit—from Gemini and NotebookLM to upcoming Document and Vision AI. By exploring Vertex AI, we are on track to become the region’s most technologically advanced bank by 2026.
Our journey involved significant internal change, but by trusting the process and our partners, we have built a foundation that is not only simpler and more secure but is also ready for the next wave of innovation. We are truly living our mission of moving onward and upward.
Today, we are announcing the availability of Route 53 Resolver Query Logging in Asia Pacific (New Zealand), enabling you to log DNS queries that originate in your Amazon Virtual Private Cloud (Amazon VPC). With query logging enabled, you can see which domain names have been queried, the AWS resources from which the queries originated – including source IP and instance ID – and the responses that were received.
Route 53 Resolver is the Amazon provided DNS server that is available by default in all Amazon VPCs. Route 53 Resolver responds to DNS queries from AWS resources within a VPC for public DNS records, Amazon VPC-specific DNS names, and Amazon Route 53 private hosted zones. With Route 53 Resolver Query Logging, customers can log DNS queries and responses for queries originating from within their VPCs, whether those queries are answered locally by Route 53 Resolver, or are resolved over the public internet, or are forwarded to on-premises DNS servers via Resolver Endpoints. You can share your query logging configurations across multiple accounts using AWS Resource Access Manager (RAM). You can also choose to send your query logs to Amazon S3, Amazon CloudWatch Logs, or Amazon Data Firehose.
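A minimal sketch of turning this on with boto3: create a query log config pointed at a CloudWatch Logs group, then associate it with a VPC. The ARNs and IDs are placeholders, and the ap-southeast-6 region code shown for Asia Pacific (New Zealand) is an assumption to verify for your account:

# Minimal sketch: create a Resolver query log config that sends to
# CloudWatch Logs, then associate it with a VPC. ARNs/IDs are
# placeholders; verify the region code for Asia Pacific (New Zealand).
import boto3

resolver = boto3.client("route53resolver", region_name="ap-southeast-6")

config = resolver.create_resolver_query_log_config(
    Name="vpc-dns-query-logs",
    DestinationArn="arn:aws:logs:ap-southeast-6:123456789012:log-group:/dns/queries",
    CreatorRequestId="query-logging-2025-10-01",  # idempotency token
)["ResolverQueryLogConfig"]

resolver.associate_resolver_query_log_config(
    ResolverQueryLogConfigId=config["Id"],
    ResourceId="vpc-0123456789abcdef0",  # the VPC whose queries to log
)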
There is no additional charge to use Route 53 Resolver Query Logging, although you may incur usage charges from Amazon S3, Amazon CloudWatch, or Amazon Data Firehose. To learn more about Route 53 Resolver Query Logging or to get started, visit the Route 53 Resolver product page or the Route 53 documentation.
Amazon GameLift Servers now supports a new AWS Local Zone in Dallas, Texas (us-east-1-dfw-2). You can use this Local Zone to deploy GameLift Fleets with EC2 C6gn, C6i, C6in, M6g, M6i, M6in, M8g, and R6i instances. Local Zones place AWS services closer to major player population and IT centers where no AWS region exists. From the Amazon GameLift Servers Console, you can enable the Dallas Local Zone and add it to your fleets, just as you would with any other Region or Local Zone.
With this launch, game studios can run latency-sensitive workloads such as real-time multiplayer gaming, responsive AR/VR experiences, and competitive tournaments closer to players in the Dallas metro area. Local Zones help deliver single-digit millisecond latency, giving players a smoother, more responsive experience by reducing network distance between your servers and players.
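If you manage fleets programmatically instead of through the console, a minimal boto3 sketch of creating a fleet that includes the Dallas Local Zone as a remote location follows; the build ID, instance type, and launch path are placeholders, and the required CreateFleet fields should be checked against the current API reference:

# Minimal sketch: create a GameLift fleet whose locations include the
# Dallas Local Zone. Build ID and launch path are placeholders; check
# required CreateFleet fields against the current API reference.
import boto3

gamelift = boto3.client("gamelift", region_name="us-east-1")

fleet = gamelift.create_fleet(
    Name="dallas-latency-fleet",
    BuildId="build-12345678-aaaa-bbbb-cccc-123456789012",  # placeholder
    EC2InstanceType="c6i.xlarge",
    RuntimeConfiguration={
        "ServerProcesses": [
            {"LaunchPath": "/local/game/server", "ConcurrentExecutions": 1}
        ]
    },
    # The fleet's home Region is implicit; add the Local Zone as a location.
    Locations=[{"Location": "us-east-1-dfw-2"}],
)
print(fleet["FleetAttributes"]["FleetId"])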
Today, AWS eliminated the network bandwidth burst duration limitations for Amazon EC2 I7i and I8g instances on sizes larger than 4xlarge. This update doubles the network bandwidth available at all times for I7i and I8g instances on these sizes. Previously, these instance sizes had a baseline bandwidth and used a network I/O credit mechanism to burst beyond their baseline bandwidth on a best-effort basis. Now these instance sizes can sustain their maximum performance indefinitely. With this improvement, customers running memory- and network-intensive workloads on larger instance sizes can consistently maintain their maximum network bandwidth without interruption, delivering more predictable performance for applications that require sustained high-throughput network connectivity. This change applies only to instance sizes larger than 4xlarge; smaller instances will continue to operate with their existing baseline and burst bandwidth configurations.
Amazon EC2 I7i and I8g instances are designed for I/O-intensive workloads that require rapid data access and real-time latency from storage. These instances excel at handling transactional, real-time, distributed databases, including MySQL, PostgreSQL, HBase, and NoSQL solutions like Aerospike, MongoDB, ClickHouse, and Apache Druid. They’re also optimized for real-time analytics platforms such as Apache Spark, data lakehouses, and AI LLM pre-processing for training. These instances have up to 1.5 TiB of memory and 45 TB of local instance storage. They deliver up to 100 Gbps of network bandwidth and 60 Gbps of dedicated bandwidth for Amazon Elastic Block Store (EBS).
Amazon EC2 Auto Scaling now enables customers to force-cancel instance refreshes immediately, without waiting for in-progress instance launches or terminations to complete. This enhancement provides greater control over Auto Scaling group (ASG) updates, especially during emergencies, such as needing to rapidly roll forward to a new application deployment when the current deployment is causing service disruptions. Customers can now quickly abort ongoing deployments and immediately start new instance refreshes when needed.
Instance refreshes are used to update instances within an ASG, typically when configuration changes require instance replacement. To use this feature, set the WaitForTransitioningInstances parameter to false when calling the CancelInstanceRefresh API. This enables faster cancellation of the instance refresh by bypassing the wait for any pending instance activities, such as instance lifecycle hooks.
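A minimal boto3 sketch, assuming a refresh is already in progress on the named ASG:

# Minimal sketch: force-cancel an in-progress instance refresh without
# waiting on transitioning instances, then start a replacement refresh.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.cancel_instance_refresh(
    AutoScalingGroupName="my-asg",
    WaitForTransitioningInstances=False,  # skip waiting on pending launches/terminations
)

# Immediately roll forward with a new refresh.
autoscaling.start_instance_refresh(AutoScalingGroupName="my-asg")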
This feature is available in all AWS regions, including AWS GovCloud (US) Regions. To get started, please visit Amazon EC2 Auto Scaling user guide.
Amazon DataZone is now available in AWS Asia Pacific (Hong Kong), Asia Pacific (Malaysia) and Europe (Zurich) Regions.
Amazon DataZone is a fully managed data management service to catalog, discover, analyze, share, and govern data between data producers and consumers in your organization. With Amazon DataZone, data producers populate the business data catalog with structured data assets from AWS Glue Data Catalog and Amazon Redshift tables. Data consumers search and subscribe to data assets in the data catalog and share with other collaborators working on the same business use case. Consumers can analyze their subscribed data assets with tools—such as Amazon Redshift or Amazon Athena query editors—that are directly accessed from the Amazon DataZone portal. The integrated publishing and subscription workflow provides access to auditing capabilities across projects.
For more information on the AWS Regions where Amazon DataZone is available, see supported regions.
Additionally, Amazon DataZone powers governance in the next generation of Amazon SageMaker, which simplifies the discovery, governance, and collaboration of data and AI across your lakehouse, AI models, and GenAI applications. With Amazon SageMaker Catalog (built on Amazon DataZone), users can securely discover and access approved data and models using semantic search with generative AI–created metadata, or they could just ask Amazon Q Developer using natural language to find their data. For more information on AWS Regions where the next generation of SageMaker is available, see supported regions. To learn more about the next generation of SageMaker, visit the product webpage.
As a Python library for accelerator-oriented array computation and program transformation, JAX is widely recognized for its power in training large-scale AI models. But its core design as a system for composable function transformations unlocks its potential in a much broader scientific landscape. Following our recent post on solving high-order partial differential equations, or PDEs, we’re excited to highlight another frontier where JAX is making a significant impact: AI-driven protein engineering.
I recently spoke with April Schleck and Nick Boyd, two co-founders of Escalante, a startup using AI to train models that predict the impact of drugs on cellular protein expression levels. Their story is a powerful illustration of how JAX’s fundamental design choices — especially its functional and composable nature — are enabling researchers to tackle multi-faceted scientific challenges in ways that are difficult to achieve with other frameworks.
A new approach to protein design
April and Nick explained that Escalante’s long-term vision is to train machine learning (ML) models that can design drugs from the ground up. Unlike fields like natural language processing, which benefit from vast amounts of public data, biology currently lacks the specific datasets needed to train models that truly understand cellular systems. Thus, their immediate focus is to solve this data problem by using current AI tools to build new kinds of lab assays that can generate these massive, relevant biological datasets.
This short-term mission puts them squarely in the field of protein engineering, which they described as a complex, multi-objective optimization problem. When designing a new protein, they aren’t just optimizing one thing: the protein needs to bind to a specific target while also being soluble, thermostable, and expressible in bacteria. Each of these properties is predicted by a different ML model (see figure below), ranging from complex architectures like AlphaFold 2 (implemented in JAX) to simpler, custom-trained models. Their core challenge is to combine all these different objectives into a single optimization loop.
This is where, as April put it, “JAX became a game-changer for us.” She noted that while combining many AI models might be theoretically possible in other frameworks, JAX’s functional nature makes it incredibly natural to integrate a dozen different ones into a single loss function (see figure below).
Easily combine multiple objectives represented by different loss terms and models
In the code above, Nick explained, models are combined in at least two different ways: some loss terms are linearly combined (e.g., the AF loss plus the ESM pseudo-log-likelihood loss), while other terms compose models serially (e.g., in the first Boltz-1 term, the sequence is first folded with Boltz-1, and then the sequence likelihood is computed after inverse folding with another model, ProteinMPNN).
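As a rough, hypothetical sketch of those two patterns in JAX (the model functions below are trivial stand-ins, not Escalante’s actual APIs):

# Hypothetical sketch: linear combination of loss terms plus serial
# composition of models, expressed as one differentiable JAX function.
# af2_loss, esm_pll, boltz1_fold, and proteinmpnn_ll are toy stand-ins
# for the real pre-trained models.
import jax
import jax.numpy as jnp

def af2_loss(logits):
    return jnp.mean(logits ** 2)

def esm_pll(logits):
    return -jnp.mean(jax.nn.log_softmax(logits))

def boltz1_fold(logits):
    return jnp.tanh(logits)  # stands in for a predicted structure

def proteinmpnn_ll(structure, logits):
    return jnp.mean(structure * logits)

def total_loss(seq_logits):
    structure = boltz1_fold(seq_logits)                       # serial composition...
    inverse_fold_ll = proteinmpnn_ll(structure, seq_logits)   # ...then score the sequence
    return (
        1.0 * af2_loss(seq_logits)                            # linear combination of terms
        + 0.5 * esm_pll(seq_logits)
        + 0.25 * inverse_fold_ll
    )

# The whole graph of models compiles and differentiates as one function.
loss_and_grad = jax.jit(jax.value_and_grad(total_loss))
loss, grads = loss_and_grad(jnp.zeros((120, 20)))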
To make this work, they embraced the JAX ecosystem, even translating models from PyTorch themselves — a prime example being their JAX translation of the Boltz-2 structure prediction model.
This approach gives what April called an “expressive language for protein design,” where models can be composed, added, and transformed to define a final objective. April said that the most incredible part is that this entire, complex graph of models “can be wrapped in a single jax.jit call that gives great performance” — something they found very difficult to do in other frameworks.
Instead of a typical training run that optimizes a model’s weights, their workflow inverts the process to optimize the input itself, using a collection of fixed, pre-trained neural networks as a complex, multi-objective loss function. The approach is mechanically analogous to Google’s DeepDream. Just as DeepDream takes a fixed, pre-trained image classifier and uses gradient ascent to iteratively modify an input image’s pixels to maximize a chosen layer’s activation, Escalante’s method starts with a random protein sequence. This sequence is fed through a committee of “expert” models — each one a pre-trained scorer for a different desirable property, like binding affinity or stability. The outputs from all the models are combined into a single, differentiable objective function. They then calculate the gradient of this final score with respect to the input sequence via backpropagation. An optimizer then uses this gradient to update the sequence, nudging it in a direction that better satisfies the collective requirements of all the models. This cycle repeats, evolving the random initial input into a novel, optimized protein sequence that the entire ensemble of models “believes” is ideal.
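A minimal, self-contained sketch of that inverted loop with Optax; the ensemble loss here is a toy stand-in for the frozen committee of models, and only the input logits are updated:

# Minimal sketch of DeepDream-style input optimization: the model
# ensemble is frozen (here, a toy stand-in), and gradients update the
# input sequence logits themselves.
import jax
import jax.numpy as jnp
import optax

SEQ_LEN, N_AMINO_ACIDS = 120, 20

def ensemble_loss(seq_logits):
    # Stand-in for the combined, frozen multi-model objective.
    probs = jax.nn.softmax(seq_logits, axis=-1)
    return jnp.mean((probs - 1.0 / N_AMINO_ACIDS) ** 2)

optimizer = optax.adam(learning_rate=1e-2)
logits = jax.random.normal(jax.random.PRNGKey(0), (SEQ_LEN, N_AMINO_ACIDS))
opt_state = optimizer.init(logits)

@jax.jit
def step(logits, opt_state):
    loss, grads = jax.value_and_grad(ensemble_loss)(logits)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(logits, updates), opt_state, loss

for _ in range(200):
    logits, opt_state, loss = step(logits, opt_state)

designed_sequence = jnp.argmax(logits, axis=-1)  # decode to residue indices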
Nick said that the choice of JAX was critical for this process. Its ability to compile and automatically differentiate complex code makes it ideal for optimizing the sophisticated loss functions at the heart of Mosaic, Escalante’s library of protein design tools. Furthermore, the framework’s native integration with TPU hardware via the XLA compiler allowed them to scale these workloads easily.
Escalante samples many potential protein designs for each problem by optimizing the loss function. Each sampling job might generate 1K – 50K potential designs, which are then ranked and filtered; by the end of the process, they test only about 10 designs in the wet lab. This has led them to adopt a unique infrastructure pattern: using Google Kubernetes Engine (GKE), they instantly spin up 2,000 to 4,000 spot TPUs, run their optimization jobs for about half an hour, and then shut them all down.
Nick also shared the compelling economics driving this choice. Given current spot pricing, adopting Cloud TPU v6e (Trillium) over an H100 GPU translated to a 3.65x gain in performance per dollar for their large-scale jobs. He stressed that this cost-effectiveness is critical for their long-term goal of designing protein binders against the entire human proteome, a task that requires immense computational scale.
To build their system, they rely on key libraries within the JAX ecosystem like Equinox and Optax. Nick prefers Equinox because it feels like “vanilla JAX,” calling its concept of representing a model as a simple PyTree “beautiful and easy to reason about.” Optax, meanwhile, gives them the flexibility to easily swap in different optimization algorithms for their design loops.
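For a flavor of what this looks like, here is a minimal Equinox and Optax sketch (our illustration, not Escalante’s code). The model is an ordinary PyTree of arrays, and swapping optimization algorithms is a one-line change.

```python
import equinox as eqx
import jax
import optax

class TinyScorer(eqx.Module):
    """An Equinox model is just a PyTree of arrays, so it can be passed
    straight through jax.grad / jax.jit like any other data."""
    linear: eqx.nn.Linear

    def __init__(self, key):
        self.linear = eqx.nn.Linear(20, 1, key=key)

    def __call__(self, x):
        return self.linear(x).squeeze()

model = TinyScorer(jax.random.PRNGKey(0))
params = eqx.filter(model, eqx.is_array)  # the trainable leaves of the PyTree

# Optax keeps the optimizer swappable: change one line to change algorithms.
optimizer = optax.adamw(1e-3)   # or optax.sgd(1e-2), optax.lion(1e-4), ...
opt_state = optimizer.init(params)
```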
They emphasized that this entire stack — JAX’s functional core, its powerful ecosystem libraries, and the scalable TPU hardware — is what makes their research possible.
We are excited to see community contributions like Escalante’s Mosaic library, now available on GitHub. It’s a fantastic addition to the landscape of JAX-native scientific tools.
Stories like this highlight a growing trend: JAX is much more than a framework for deep learning. Its powerful system of program transformations, like grad and jit, makes it a foundational library for the paradigm of differentiable programming, empowering a new generation of scientific discovery. The JAX team at Google is committed to supporting and growing this vibrant ecosystem, and that starts with hearing directly from you.
Share your story: Are you using JAX to tackle a challenging problem?
Help guide our roadmap: Are there new features or capabilities that would unlock your next breakthrough?
Your feature requests are essential for guiding the evolution of JAX. Please reach out to the team to share your work or discuss what you need from JAX via GitHub.
Our sincere thanks to April and Nick for sharing their insightful journey with us. We’re excited to see how they and other researchers continue to leverage JAX to solve the world’s most complex scientific problems.
At Deutsche Bank Research, the core mission of our analysts is delivering original, independent economic and financial analysis. However, creating research reports and notes relies heavily on a foundation of painstaking manual work. Or at least that was the case until generative AI came along.
Historically, analysts would sift through and gather data from financial statements, regulatory filings, and industry reports. Then the true challenge began: synthesizing this vast amount of information to uncover insights and findings. To do this, they had to build financial models, identify patterns and trends, and draw connections between diverse sources, past research, and the broader global context.
As analysts need to work as quickly as possible to bring valuable insights to market, this time-consuming process can limit the depth of analysis and the range of topics they can cover.
Our goal was to enhance the research analyst experience and reduce the reliance on manual processes and outsourcing. We created DB Lumina — an AI-powered research agent that helps automate data analysis, streamline workflows, and deliver more accurate and timely insights – all while maintaining the stringent data privacy requirements of the highly regulated financial sector.
“The adoption of the DB Lumina digital assistant by hundreds of research analysts is the culmination of more than 12 months of intense collaboration between dbResearch, our internal development team, and many others. This is just the start of our journey, and we are looking forward to building on this foundation as we continue to push the boundaries of how we responsibly use AI in research production to unlock exciting new innovations across our expansive coverage areas.” – Pam Finelli, Global COO for Investment Research at Deutsche Bank
DB Lumina has three key features that transform the research experience for analysts and enhance productivity through advanced technologies.
1. Gen AI-powered chat
DB Lumina’s core conversational interface enables analysts to interact with Google’s state-of-the-art AI foundation models, including the multimodal Gemini models. They can ask questions, brainstorm ideas, refine writing, and even generate content in real time. Additionally, the chat capability supports uploading and querying documents conversationally, leveraging prior chat history to revisit and continue previous sessions. DB Lumina can help with tasks like summarization, proofreading, translation, and content drafting with precision and speed. In addition, we implemented guardrailing techniques to ensure the generation of compliant and reliable outputs.
2. Prompt templates
Prompt Templates offer pre-configured instructions tailored for document processing with consistent, high-quality outcomes. These templates help analysts summarize large documents, extract key data points, and create reusable workflows for repetitive tasks. They can be customized for specific roles or business needs, and standardized across teams. Analysts can also save and share templates, ensuring more streamlined operations and enhanced collaboration. This functionality is made possible by the long context window of Google’s models combined with advanced prompting techniques, which also provide citations for verification.
3. Knowledge
DB Lumina integrates a Retrieval-Augmented Generation (RAG) architecture that grounds responses in enterprise knowledge sources, such as internal research, external unstructured data (such as SEC filings), and other document repositories. The agent enhances transparency and accuracy by providing inline citations and source viewers for fact-checking. It also implements controlled access to confidential data with audit logging and explainability features, ensuring secure and trustworthy operations. Using advanced RAG architecture, supported by Google Cloud technologies, enables us to bring generative capabilities to enterprise knowledge resources to give analysts access to the latest, most relevant information when creating research reports and notes.
DB Lumina architecture
DB Lumina was designed to enhance Deutsche Bank Research’s productivity by enabling document ingestion, content summarization, Q&A, and editing.
Built on Google Cloud, the architecture leverages a range of managed services.
All of DB Lumina’s AI capabilities are implemented with guardrails to ensure safe and compliant interactions. We also handle logging and monitoring with Google Cloud’s Observability suite, with prompt interactions stored in Cloud Storage and queried through BigQuery. To manage authentication, we use Identity as a Service integrated with Azure AD, and centralize authorization through dbEntitlements.
RAG and document ingestion
When DB Lumina processes and indexes documents, it splits them into chunks and creates embeddings using APIs like the Gemini Embeddings API. It then stores these embeddings in a vector database such as Vertex AI Vector Search or the pgvector extension on Cloud SQL. Raw text chunks are stored separately, for example, in Datastore or Cloud Storage.
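The following toy sketch illustrates that ingestion pattern. It is an in-memory stand-in, not DB Lumina’s code: embed() substitutes for a real embeddings API call, and the two dictionaries substitute for the vector database and the raw-chunk store.

```python
# In-memory stand-ins for the real services named in the text
# (Gemini Embeddings API, pgvector / Vertex AI Vector Search, Datastore).
VECTOR_DB: dict[str, list[float]] = {}   # stand-in for the vector store
TEXT_DB: dict[str, str] = {}             # stand-in for raw-chunk storage

def embed(text: str) -> list[float]:
    # Placeholder: a real system would call an embeddings API here.
    return [float(ord(c)) for c in text[:8]]

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping chunks so context survives the cut."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]

def ingest(doc_id: str, text: str) -> None:
    for i, piece in enumerate(chunk(text)):
        key = f"{doc_id}:{i}"
        VECTOR_DB[key] = embed(piece)   # embedding goes to the vector store
        TEXT_DB[key] = piece            # raw chunk is stored separately
```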
The diagrams below show the typical RAG and ingestion patterns:
Overview of the agent.
When an analyst submits a query, the system routes it through a query engine. A Python application leverages an LLM API (Gemini 2.0 and 2.5) and retrieves relevant document snippets based on the query, providing context that the model then uses to generate a relevant response. We experimented with different retrievers, including one using the pgvector extension on Cloud SQL for PostgreSQL and one based on Vertex AI Search.
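Continuing the toy sketch above, the query path might look like the following; cosine similarity stands in for the retriever’s ranking, and call_llm() stands in for the Gemini API call.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Similarity used to rank stored chunks against the query embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def call_llm(prompt: str) -> str:
    # Placeholder: a real system would call a Gemini model here.
    return f"[model response to a {len(prompt)}-character grounded prompt]"

def retrieve(query: str, k: int = 4) -> list[str]:
    qv = embed(query)  # embed() and the stores come from the ingestion sketch
    ranked = sorted(VECTOR_DB, key=lambda key: cosine(qv, VECTOR_DB[key]), reverse=True)
    return [TEXT_DB[key] for key in ranked[:k]]

def answer(query: str) -> str:
    snippets = retrieve(query)
    prompt = ("Answer using only these sources:\n" + "\n---\n".join(snippets)
              + f"\n\nQuestion: {query}")
    return call_llm(prompt)  # the model generates a grounded response
```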
User interface
Using sliders in DB Lumina’s interface, users can easily adjust various parameters for summarization, including verbosity, data density, factuality, structure, reader perspective, flow, and individuality. The interface also includes functionality for providing feedback on summaries.
An evaluation framework for gen AI
Evaluating gen AI applications and agents like DB Lumina requires a custom framework due to the complexity and variability of model outputs. Traditional metrics and generic benchmarks often fail to capture the needs for gen AI features, the nuanced expectations of domain-specific users, and the operational constraints of enterprise environments. This necessitates a new set of gen AI metrics to accurately measure performance.
The DB Lumina evaluation framework employs a rich and extensible set of both industry-standard and custom-developed metrics, which are mapped to defined categories and documented in a central metric dictionary to ensure consistency across teams and features. Standard metrics like accuracy, completeness, and latency are foundational, but they are augmented with custom metrics, such as citation precision and recall, false rejection rates, and verbosity control — each tailored to the specific demands and regulatory requirements of financial research and document-grounded generation. Popular frameworks like Ragas also provide a solid foundation for assessing how well our RAG system grounds its responses in retrieved documents and avoids hallucinations.
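As a simple illustration of one such custom metric, a citation precision and recall check might look like the toy sketch below (our own version, not Deutsche Bank’s implementation).

```python
def citation_precision_recall(cited: set[str], relevant: set[str]) -> tuple[float, float]:
    """cited: source IDs the model actually cited in its answer.
    relevant: source IDs a reviewer judged to genuinely support the answer."""
    if not cited or not relevant:
        return 0.0, 0.0
    hits = len(cited & relevant)
    precision = hits / len(cited)      # share of citations that were correct
    recall = hits / len(relevant)      # share of correct sources that were cited
    return precision, recall

# Example: the model cited {a, b, c}; reviewers judged {b, c, d} relevant.
p, r = citation_precision_recall({"a", "b", "c"}, {"b", "c", "d"})
# p == r == 2/3
```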
In addition, test datasets are carefully curated to reflect a wide range of real-world scenarios, edge cases, and potential biases across DB Lumina’s core features like chat, document Q&A, templates, and RAG-based knowledge retrieval. These datasets are version-controlled and regularly updated to maintain relevance as the tool evolves. Their purpose is to provide a stable benchmark for evaluating model behavior under controlled conditions, enabling consistent comparisons across optimization cycles.
Evaluation is both quantitative and qualitative, combining automated scoring with human review for aspects like tone, structure, and content fidelity. Importantly, the framework ensures each feature is assessed for correctness, usability, efficiency, and compliance while enabling the rapid feedback and robust risk management needed to support iterative optimization and ongoing performance monitoring. We compare current metric outputs against historical baselines, leveraging stable test sets, Git hash tracking, and automated metric pipelines to support proactive interventions to ensure that performance deviations are caught early and addressed before they impact users or compliance standards.
This layered approach ensures that DB Lumina is not only accurate and efficient but also aligned with Deutsche Bank’s internal standards, achieving a balanced and rigorous evaluation strategy that supports both innovation and accountability.
Bringing new benefits to the business
We developed an initial pilot for DB Lumina with Google Cloud Consulting, creating a simple prototype early in the use case development that used only embeddings without prompts. Though later versions surpassed it, this pilot informed the subsequent development of DB Lumina’s RAG architecture.
The project then transitioned through our development and application testing environments to our production deployment, eventually going live in September 2024. DB Lumina is already in the hands of around 5,000 users across Deutsche Bank Research, specifically in divisions like Investment Bank Origination & Advisory and Fixed Income & Currencies. We plan to roll it out to more than 10,000 users across corporate banking and other functions by the end of the year.
DB Lumina is expected to deliver significant business benefits for Deutsche Bank:
Time savings: Analysts reported saving 30 to 45 minutes when preparing earnings note templates and up to two hours when writing research reports and roadshow updates.
Increased analysis depth: One analyst increased the analysis in an earnings report by 50%, adding additional sections by region and activity, as well as a summary section for forecast changes. This was achieved through summarization of earnings releases and investor transcripts and subsequent analysis through conversational prompts.
New analysis opportunities: DB Lumina has created new opportunities for teams to analyze new topics. For example, the U.S. and European Economics teams use DB Lumina to score central bank communications to assess hawkishness and dovishness over time. Another analyst was able to analyze and compare budget speeches from eight different ministries, tallying up keywords related to capacity constraints and growth orientation to identify shifts in priorities.
Increased accuracy: Analysts have also started using DB Lumina as part of their editing process. One supervisory analyst observed that since the rollout, there has been a marked improvement in editorial and grammatical accuracy across analyst notes, especially from non-native English speakers.
Building the future of gen AI and RAG in finance
We’ve seen the power of RAG transform how financial institutions interact with their data. DB Lumina has proved the value of combining retrieval, gen AI, and conversational AI, but this is just the start of our journey. We believe the future lies in embracing and refining the “agentic” capabilities that are inherent in our architecture. We envision building and orchestrating a system where various components act as agents — all working together to provide intelligent and informed responses to complex financial inquiries.
To support our vision moving forward, we plan to deepen agent specialization within our RAG framework, building agents designed to handle specific types of queries or tasks across compliance, investment strategies, and risk assessment. We also want to incorporate the ReAct (Reasoning and Acting) paradigm into our agents’ decision-making process to enable them to not only retrieve information but also actively reason, plan actions, and refine their searches to provide more accurate and nuanced answers.
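In skeleton form, a ReAct-style agent interleaves model reasoning with tool calls until the model decides it can answer. The sketch below is a generic illustration of the paradigm, not DB Lumina’s design; the Step shape and the tool interface are our assumptions.

```python
from typing import Callable, NamedTuple

class Step(NamedTuple):
    thought: str
    action: str      # a tool name, or "finish"
    argument: str

def react_agent(question: str,
                tools: dict[str, Callable[[str], str]],
                llm: Callable[[str], Step],
                max_steps: int = 5) -> str:
    """Minimal ReAct loop: reason, act, observe, repeat."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                       # model proposes its next move
        transcript += f"Thought: {step.thought}\n"
        if step.action == "finish":
            return step.argument                     # model is ready to answer
        observation = tools[step.action](step.argument)   # e.g., RAG retrieval
        transcript += f"Action: {step.action}({step.argument})\nObservation: {observation}\n"
    return "No answer within the step budget."
```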
In addition, we’ll be actively exploring and implementing more of the tools and services available within Vertex AI to further enhance our AI capabilities. This includes exploring other models for specific tasks or to achieve different performance characteristics, optimizing our vector search infrastructure, and utilizing AI pipelines for greater efficiency and scalability across our RAG system.
The ultimate goal is to empower DB Lumina to handle increasingly complex and multi-faceted queries through improved context understanding, ensuring it can accurately interpret context like previous interactions and underlying financial concepts. This includes moving beyond simple question answering to providing analysis and recommendations based on retrieved information. To enhance DB Lumina’s ability to provide real-time information and address queries requiring up-to-date external data, we are planning to integrate a feature for grounding responses with internet-based information.
By focusing on these areas, we aim to transform DB Lumina from a helpful information retriever into a powerful AI agent capable of tackling even the most challenging financial inquiries. This will unlock new opportunities for improved customer service, enhanced decision-making, and greater operational efficiency for financial institutions. The future of RAG and gen AI in finance is bright, and we’re excited to be at the forefront of this transformative technology.
Today, we are excited to announce the 2025 DORA Report: State of AI-assisted Software Development, which draws on insights from over 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals around the world.
The report reveals a key insight: AI doesn’t fix a team; it amplifies what’s already there. Strong teams use AI to become even better and more efficient. Struggling teams will find that AI only highlights and intensifies their existing problems. The greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.
AI, the great amplifier
As we established in the 2024 report, as well as in this year’s special report, “Impact of Generative AI in Software Development”, organizations continue to adopt AI heavily and are seeing substantial benefits across important outcomes. There is also evidence that we are learning to better integrate these tools into our workflows. Unlike last year, we observe a positive relationship between AI adoption and both software delivery throughput and product performance. It appears that people, teams, and tools are learning where, when, and how AI is most useful. However, AI adoption continues to have a negative relationship with software delivery stability.
This confirms our central theory – AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, like strong automated testing, mature version control practices, and fast feedback loops, an increase in change volume leads to instability. Teams working in loosely coupled architectures with fast feedback loops see gains, while those constrained by tightly coupled systems and slow processes see little or no benefit.
Key findings from the 2025 report
Beyond this central theme, this year’s research highlighted the following about modern software development:
AI adoption is near-universal: 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. However, skepticism remains as 30% report little or no trust in the code generated by AI, a slightly lower percentage than last year but a key trend to note.
User-centricity is a prerequisite for AI success: AI becomes most useful when it’s pointed at a clear problem, and a user-centric focus provides that essential direction. Our data shows this focus amplifies AI’s positive influence on team performance.
Platform engineering is the foundation: Our data shows that 90% of organizations have adopted at least one platform, and there is a direct correlation between a high-quality internal platform and an organization’s ability to unlock the value of AI, making it an essential foundation for success.
The seven team archetypes
Simple software delivery metrics alone aren’t sufficient. They tell you what is happening but not why it’s happening. To connect performance data to experience, we conducted a cluster analysis that reveals seven common team profiles or archetypes, each with a unique interplay of performance, stability, and well-being. This model provides leaders with a way to diagnose team health and apply the right interventions.
The ‘Foundational challenges’ group is trapped in survival mode, facing significant gaps in processes and environment that lead to low performance, high system instability, and high levels of burnout and friction. The ‘Harmonious high achievers’, by contrast, excel across multiple areas, showing positive metrics for team well-being, product outcomes, and software delivery.
Read more details of each archetype in the “Understanding your software delivery performance: A look at seven team profiles” chapter of the report.
Unlocking the value of AI with the ‘DORA AI Capabilities Model’
This year, we went beyond identifying AI’s impact to investigating the conditions under which AI-assisted technology professionals realize the best outcomes. The value of AI is unlocked not by the tools themselves, but by the surrounding technical practices and cultural environment.
Our research identified seven capabilities that are shown to magnify the positive impact of AI in organizations.
Where leaders should get started
One of the key insights derived from the research this year is that the value of AI will be unlocked by reimagining the system of work it inhabits. Technology leaders should treat AI adoption as an organizational transformation.
Here’s where we suggest you begin:
Clarify and socialize your AI policies
Connect AI to your internal context
Prioritize foundational practices
Fortify your safety nets
Invest in your internal platform
Focus on your end-users
The DORA research program is committed to serving as a compass for teams and organizations as we navigate this important and transformative period with AI. We hope the new team profiles and the DORA AI Capabilities Model provide a clear roadmap for you to move beyond simply adopting AI to unlocking its value by investing in teams and people. We look forward to learning how you put these insights into practice. To learn more:
AWS License Manager announces support for shared AWS Managed Active Directory across multiple AWS accounts, simplifying Microsoft license management on AWS. Customers can now centralize user subscriptions of Microsoft Office, Visual Studio, and Remote Desktop Service instances running in their AWS Organization while maintaining clear visibility across AWS accounts.
With this launch, customers are no longer required to set up a Managed Active Directory instance for each AWS account, reducing duplicate directories and IT overhead. Customers can now manage licenses through a single admin account where users subscribe once, and their subscriptions extend to directory consumer accounts. The new feature is available in all commercial regions where License Manager user subscriptions are supported.
To get started, customers can onboard their shared AWS Managed Active Directory through the AWS License Manager console. For more information and to begin using this feature, visit the AWS License Manager page or the AWS License Manager User Guide.
Today, AWS announces the general availability of the new Amazon Elastic Block Store (Amazon EBS) optimized Amazon Elastic Compute Cloud (Amazon EC2) R8gb instances. These instances are powered by AWS Graviton4 processors to deliver up to 30% better compute performance than AWS Graviton3 processors. At up to 150 Gbps of EBS bandwidth, these instances offer higher EBS performance than same-sized Graviton4-based instances. Take advantage of the higher block storage performance offered by these new EBS-optimized EC2 instances to scale the performance and throughput of workloads such as high-performance databases and NoSQL databases, while optimizing the cost of running your workloads.
For increased scalability, these instances offer instance sizes up to 24xlarge, including one metal size, up to 768 GiB of memory, up to 150 Gbps of EBS bandwidth, and up to 200 Gbps of networking bandwidth. These instances support Elastic Fabric Adapter (EFA) networking on the 16xlarge, 24xlarge, and metal-24xl sizes, which enables lower latency and improved cluster performance for workloads deployed on tightly coupled clusters.
The new R8gb instances are available in US East (N. Virginia) and US West (Oregon) regions. Metal sizes are only available in US East (N. Virginia) region.
Amazon Connect now lets you associate custom attributes with interaction segments, ensuring reporting and analytics always reflect the true customer journey. Attributes such as business unit name, account type, or contact reason can be centrally managed with predetermined values and applied to contact records through flows or the UpdateContact API. This approach preserves accurate business context throughout customer journeys, particularly during transfers and multi-party communications. For example, when a customer engagement originates in the Support business unit and transitions to Sales, each distinct interaction segment maintains its precise business unit name, creating an accurate and comprehensive record of the customer journey.
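For illustration, applying such attributes through the UpdateContact API might look like the boto3 sketch below. The instance ID, contact ID, and attribute names are placeholders; the exact keys depend on how your predefined attributes are configured.

```python
import boto3

connect = boto3.client("connect")

# Tag the current interaction segment with business context (placeholder
# IDs and attribute names; real keys come from your predefined attributes).
connect.update_contact(
    InstanceId="your-connect-instance-id",
    ContactId="current-contact-id",
    SegmentAttributes={
        "BusinessUnit": {"ValueString": "Sales"},
        "ContactReason": {"ValueString": "Upgrade inquiry"},
    },
)
```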
This feature is available in all AWS regions where Amazon Connect is available. To learn more about using predefined attributes as contact segment attributes, see the Amazon Connect Administrator Guide. To learn more about Amazon Connect, the AWS contact center as a service solution on the cloud, please visit the Amazon Connect website.
IAM Identity Center now supports customer-managed AWS Key Management Service (KMS) keys for encrypting workforce identity data, including user and group attributes. While AWS-owned keys are used by default, customer-managed keys (CMKs) provide granular control over identity data access, enhancing security and compliance capabilities. IAM Identity Center helps you securely create, or connect, your workforce identities and manage their access centrally across AWS applications and accounts.
You create a CMK and manage its lifecycle and usage permissions in AWS KMS. You can configure the CMK in your IAM Identity Center instance either while enabling a new organization instance or on an existing one. You can then use AWS CloudTrail to monitor and audit the usage of your CMK for access to identity data in IAM Identity Center.
Support for CMKs in organization instances of IAM Identity Center is now available for access to accounts and select AWS applications in all AWS Regions where IAM Identity Center is available. Standard AWS KMS charges apply to storing and using CMKs. IAM Identity Center is provided at no additional cost.
To learn more about IAM Identity Center, visit the product detail page. To get started with using CMKs, please refer to the IAM Identity Center User Guide.
We’re excited today to announce the Amazon Nova Act extension – a tool that transforms how you build with Nova Act by bringing the entire agent development experience directly into IDEs like Visual Studio Code, Kiro, and Cursor. The Nova Act extension consolidates natural language based script creation, granular scripting precision, and robust browser testing into a single, unified user interface, eliminating the need to switch between multiple tools across development, validation, and iteration.
The Nova Act extension is built on top of the Nova Act SDK, available in research preview since March 2025. The Nova Act extension addresses feedback we have received from developers and consolidates the agent development lifecycle, from ideation to production, into one unified user interface within your IDE.
The Nova Act extension is available today from your IDE’s extension marketplace. The Nova Act GitHub repository includes documentation and examples to get started.
Learn more about the Nova Act extension and see it in action in our blog post.
Amazon RDS now supports cross-Region and cross-account copying of Amazon RDS and Amazon Aurora snapshots. This launch allows you to copy snapshots across Regions and accounts directly, without performing two separate copies sequentially.
Customers use cross-Region and cross-account snapshot copies to manage the risk of incidents such as ransomware attacks and Region outages affecting their production accounts or primary Regions. Previously, customers that copied snapshots cross-Region and cross-account did so in a two-step process: first copying the snapshot to a different Region and then to a different account, or vice versa. Now, by performing this action in a single step, customers eliminate an intermediate snapshot copy, helping them meet tighter recovery point objectives (RPOs) while saving the costs associated with the intermediate copy. Additionally, customers that currently use custom scripts or services such as AWS Lambda to monitor the status of the intermediate copy and then trigger the second copy can simplify these workflows by eliminating this process.
Cross-Region and cross-account snapshot copy is available on all Amazon RDS and Amazon Aurora engines in all AWS Regions, including the AWS China Regions and AWS GovCloud (US) Regions. You can start using this feature today through the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDKs. To get started, refer to the Amazon RDS or Amazon Aurora documentation.
AWS announces the general availability of EC2 instance attestation to make it easier for customers to validate that only trusted software is running on their EC2 instances, including instances with AI chips and GPUs.
Before this, customers could configure their EC2 instances to remove operator access from their own administrators and users, but there was no way for customers to verify that a target EC2 instance had that configuration. With EC2 instance attestation, customers can cryptographically verify that their EC2 instances are running trusted configurations and software.
EC2 instance attestation is powered by Nitro Trusted Platform Module (NitroTPM) and Attestable Amazon Machine Images (AMIs). Customers can build an AMI that includes a cryptographic measurement representing all the contents of that AMI. Using NitroTPM, customers can then verify whether a target EC2 instance has the same measurement as the reference measurement generated by the AMI. EC2 instance attestation integrates with AWS Key Management Service (KMS), allowing customers to restrict key operations to instances that pass specific attestation conditions.
EC2 instance attestation is available in all AWS Commercial Regions and the AWS GovCloud (US) Regions.
To get started with EC2 instance attestation, see this user guide. To build an Amazon Linux 2023 Attested AMI, see this user guide.
Amazon Redshift Serverless, which allows you to run and scale analytics without having to provision and manage data warehouse clusters, is now generally available in the AWS Asia Pacific (Taipei) region. With Amazon Redshift Serverless, all users, including data analysts, developers, and data scientists, can use Amazon Redshift to get insights from data in seconds. Amazon Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver high performance for all your analytics. You only pay for the compute used for the duration of the workloads on a per-second basis. You can benefit from this simplicity without making any changes to your existing analytics and business intelligence applications.
With a few clicks in the AWS Management Console, you can get started with querying data using the Query Editor V2 or your tool of choice with Amazon Redshift Serverless. There is no need to choose node types, node count, workload management, scaling, and other manual configurations. You can create databases, schemas, and tables, and load your own data from Amazon S3, access data using Amazon Redshift data shares, or restore an existing Amazon Redshift provisioned cluster snapshot. With Amazon Redshift Serverless, you can directly query data in open formats, such as Apache Parquet, in Amazon S3 data lakes. Amazon Redshift Serverless provides unified billing for queries on any of these data sources, helping you efficiently monitor and manage costs.