AWS Elemental MediaConvert now integrates with Time-Addressable Media Store (TAMS), enabling customers to temporally reference and extract media asset segments. This capability allows MediaConvert customers to work more efficiently and meet quick-turnaround deadlines. With TAMS integration, customers can extract highlights from live events for near real-time social media publishing, repurpose archived broadcast content into fresh programming or documentaries, and streamline media operations by connecting directly to existing broadcast infrastructure and content management systems.
This integration is designed for customers who operate their own TAMS servers—MediaConvert does not host or manage a TAMS instance. By leveraging your own TAMS deployment, MediaConvert can ingest time-based media segments on demand and use them as inputs in your encoding workflows. Whether you’re modernizing a legacy archive, building automation around editorial workflows, or enabling UGC teams to clip and publish with precision, the combination of MediaConvert and TAMS provides a powerful foundation for flexible, high-performance media processing at scale.
Apache Spark is a fundamental part of most modern lakehouse architectures, and Google Cloud’s Dataproc provides a powerful, fully managed platform for running Spark applications. However, for data engineers and scientists, debugging failures and performance bottlenecks in distributed systems remains a universal challenge.
Manually troubleshooting a Spark job requires piecing together clues from disparate sources — driver and executor logs, Spark UI metrics, configuration files and infrastructure monitoring dashboards.
What if you had an expert assistant to perform this complex analysis for you in minutes?
Accessible directly in the Google Cloud console — either from the resource page (e.g., Serverless for Apache Spark Batch job list or Batch detail page) you are investigating or from the central Cloud Assist Investigations list — Gemini Cloud Assist offers several powerful capabilities:
For data engineers: Fix complex job failures faster. A prioritized list of intelligent summaries and cross-product root cause analyses helps in quickly narrowing down and resolving a problem.
For data scientists and ML engineers: Solve performance and environment issues without deep Spark knowledge. Gemini acts as your on-demand infrastructure and Spark expert so you can focus more on models.
For Site Reliability Engineers (SREs): Quickly determine if a failure is due to code or infrastructure. Gemini finds the root cause by correlating metrics and logs across different Google Cloud services, thereby reducing the time required to identify the problem.
For big data architects and technical managers: Boost team efficiency and platform reliability. Gemini helps new team members contribute faster, describe issues in natural language and easily create support cases.
Debugging Spark applications is inherently complex because failures can stem from anywhere in a highly distributed system. These issues generally fall into two categories. First are the outright job failures. Then, there are the more insidious, subtle performance bottlenecks. Additionally, cloud infrastructure issues can cause workload failures, complicating investigations.
Gemini Cloud Assist is designed to tackle all these challenges head-on:
Infrastructure problems: Gemini Cloud Assist analyzes and correlates a wide range of data, including metrics, configurations, and logs, across Google Cloud services, pinpoints the root cause of infrastructure issues, and provides a clear resolution.
Configuration problems: Gemini Cloud Assist automatically identifies incorrect or insufficient Spark and cluster configurations, and recommends the right settings for your workload.
Application problems (application-logic errors, inefficient code and algorithms): Gemini Cloud Assist analyzes application logs, Spark metrics, and performance data, diagnoses code errors and performance bottlenecks, and provides actionable recommendations to fix them.
Data problems (stage/task failures, data-related issues): Gemini Cloud Assist analyzes Spark metrics and logs, identifies data-related issues like data skew, and provides actionable recommendations to improve performance and stability.
Gemini Cloud Assist: Your AI-powered operational expert
Let’s explore how Gemini transforms the investigation process in common, real-world scenarios.
Example 1: The slow job with performance bottlenecks
Some of the most challenging issues are not outright failures but performance bottlenecks. A job that runs slowly can impact service-level objectives (SLOs) and increase costs, but without error logs, diagnosing the cause requires deep Spark expertise.
Say a critical batch job succeeds but takes much longer than expected. There are no failure messages, just poor performance.
Manual investigation requires a deep-dive analysis in the Spark UI. You would need to manually search for “straggler” tasks that are slowing down the job. The process also involves analyzing multiple task-level metrics to find signs of memory pressure or data skew.
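For context, here is a minimal sketch of that manual skew check in PySpark (the input path and the customer_id join key are hypothetical placeholders for your own job):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-check").getOrCreate()

# Hypothetical input; substitute the table and join key from your own job.
df = spark.read.parquet("gs://your-bucket/events/")

# Count rows per join key and inspect the heaviest keys: a few keys holding most
# of the rows is the classic signature of data skew behind straggler tasks.
key_counts = df.groupBy("customer_id").count().orderBy(F.desc("count"))
key_counts.show(20, truncate=False)

# Compare the largest key to the average to quantify the imbalance.
stats = key_counts.agg(F.max("count").alias("mx"), F.avg("count").alias("avg")).first()
print(f"max/avg rows per key: {stats['mx'] / stats['avg']:.1f}x")
```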
With Gemini assistance
By clicking Investigate, Gemini automatically performs this complex analysis of performance metrics, presenting a summary of the bottleneck.
Gemini acts as an on-demand performance expert, augmenting a developer’s workflow and empowering them to tune workloads without needing to be a Spark internals specialist.
Example 2: The silent infrastructure failure
Sometimes, a Spark job or cluster fails due to issues in the underlying cloud infrastructure or integrated services. These problems are difficult to debug because the root cause is often not in the application logs but in a single, obscure log line from the underlying platform.
Say a cluster configured to use GPUs fails unexpectedly.
The manual investigation begins by checking the cluster logs for application errors. If no errors are found, the next step is to investigate other Google Cloud services. This involves searching Cloud Audit Logs and monitoring dashboards for platform issues, like exceeded resource quotas.
With Gemini assistance
A single click on the Investigate button triggers a cross-product analysis that looks beyond the cluster’s logs. Gemini quickly pinpoints the true root cause, such as an exhausted resource quota, and provides mitigation steps.
Gemini bridges the gap between the application and the platform, saving hours of broad, multi-service investigation.
Get started today!
Spend less time debugging and more time building and innovating. Let Gemini Cloud Assist in Dataproc on Compute Engine and Google Cloud Serverless for Apache Spark be your expert assistant for big data operations.
We’re excited to announce an expansion to our Compute Flexible Committed Use Discounts (Flex CUDs), providing you with greater flexibility across your cloud environment. Your spend commitments now stretch further and cover a wider array of Google Cloud services and VM families, translating into greater savings for your workloads.
Flex CUDs are spend-based commitments that provide deep discounts on Google Cloud compute resources in exchange for a one or three-year term. This model offers maximum flexibility, automatically applying savings across a broad pool of eligible VM families and regions without being tied to a single resource.
More power, more savings with expanded coverage
We understand that modern applications are built on a diverse mix of services, from massive databases to nimble serverless functions. To better support the way you build, we’re expanding Flex CUDs to cover more of the specialized and serverless solutions you use every day:
Memory-optimized VM families: We’re bringing enhanced discounts to our memory-optimized M1, M2, M3 and the new M4 VM families. Now you can get more value from critical workloads like SAP HANA, in-memory analytics platforms and high-performance databases.
High-performance computing (HPC) VM families: For compute-intensive workloads, Flex CUDs now apply to our HPC-optimized H3 and the new H4D VM families, perfect for complex simulations and scientific research.
Cloud Run and Cloud Functions: For developers and organizations that use Cloud Run’s fully managed platform, we are extending Flex CUDs’ coverage to Cloud Run request-based billing and Cloud Run functions.
Why this matters
This expansion of Compute Flex CUDs is designed with your growth and efficiency in mind:
Maximize your spend commitments: Instead of being tied to a specific resource type or region, your committed spend can now be applied across a larger portion of your Google Cloud usage. This means less “wasted” commitment and more active savings.
Enhanced financial predictability and control: With greater coverage, you gain a clearer picture of your anticipated cloud spend, making budgeting and financial planning more predictable.
Simplified cost management: A single, flexible commitment can now cover a more diverse set of services, streamlining your financial operations and reducing the complexity of managing multiple, granular commitments.
Fuel innovation: By reducing the cost of core compute and serverless services, you free up budget that can be reinvested into innovation.
An updated Billing model
Compute Flex CUDs’ expanded coverage is made possible by the new and improved spend-based CUDs model, which streamlines how discounts are applied and provides greater flexibility. Enabling this feature triggers some experience changes to the Billing user interface, Cloud Billing export to BigQuery schema, and Cloud Commerce Consumer Procurement API. This new billing model is simpler: we directly charge the discounted rate for CUD-eligible usage, reflecting the applicable discount, instead of using credits to offset usage and reflect savings. It’s also more flexible: we apply discounts to a wider range of products within spend-based CUDs. For more, this follow-up resource details the updates, including information on a sample export to preview your monthly bill in the new format, key CUD KPIs, new SKUs added to CUDs, and CUD product information. You can learn more about these changes in the documentation.
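To make the presentation change concrete, here is a small illustrative calculation (the $100 usage figure and 28% discount rate are hypothetical; actual Flex CUD rates vary by product and term):

```python
usage = 100.00        # hypothetical CUD-eligible usage, in USD
discount_rate = 0.28  # hypothetical Flex CUD discount; actual rates vary

# Previous credit-based model: the full usage charge appears, offset by a CUD credit line.
charge_old = usage
credit_old = usage * discount_rate
net_old = charge_old - credit_old

# New model: the discounted rate is charged directly, with no offsetting credit line.
net_new = usage * (1 - discount_rate)

print(f"old model: ${charge_old:.2f} charge - ${credit_old:.2f} credit = ${net_old:.2f}")
print(f"new model: ${net_new:.2f} charge")
assert abs(net_old - net_new) < 1e-9  # net cost is unchanged; only the presentation differs
```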
Availability and next steps
At Google Cloud, we’re committed to providing you with the most flexible and cost-effective solutions for your evolving cloud needs. This expansion of Compute Flex CUDs is a testament to that commitment, enabling you to build, deploy, and scale your applications with even greater financial efficiency. Starting today, you can opt-in and begin enjoying Compute Flex CUDs’ expanded scope and improved billing model.
If you don’t opt in to multi-price CUDs sooner, you will be automatically transitioned to the new spend-based model on January 21, 2026, so you can take advantage of these expanded Flex CUDs. New customers who create a Billing Account on or after July 15, 2025 are automatically under the new billing model for Flex CUDs. Stay tuned for more updates as we continue to enhance our offerings to support your success on Google Cloud.
For ten years, Google Kubernetes Engine (GKE) has been at the forefront of innovation, powering everything from microservices to cloud-native AI and edge computing. To honor this special birthday, we’re challenging you to catapult your microservices into the future with cutting-edge agentic AI. Are you ready to celebrate?
Hands-on learning with GKE: This is your shot to build the next evolution of applications by integrating agentic AI capabilities on GKE. We have everything you need to get started: our microservice applications, example agents on GitHub, documentation, quickstarts, tutorials, and a webinar hosted by our experts.
Showcase your skills: You’ll have the opportunity to elevate a sample microservices application into a unique use case. Feel free to get creative with non-traditional use cases and utilize Agent Development Kit (ADK), Model Context Protocol (MCP), and the Agent2Agent (A2A) protocol for extra powerful functionality!
Think you have what it takes to win? Build an app to showcase your agents and you could win:
Overall grand prize: $15,000 in USD, $3,000 in Google Cloud Credits for use with a Cloud Billing Account, a chance to win a maximum of two (2) KubeCon North America conference passes in Atlanta, Georgia (November 10-13, 2025), a one-year, no-cost Google Developer Program Premium subscription, a guest interview on the Kubernetes Podcast, a video feature with the GKE team, virtual coffee with a Google team member, and social promo
Regional winners: $8,000 in USD, $1,000 in Google Cloud Credits for use with a Cloud Billing Account, a video feature with the GKE team on a Google Cloud social media channel, virtual coffee with a Google team member, and social promo
Honorable mentions: $1,000 in USD and $500 in Google Cloud Credits for use with a Cloud Billing Account
Unleash the power of agentic AI on GKE
GKE is built on open-source Kubernetes, but is also tightly integrated with the Google Cloud ecosystem. This makes it easy to get started with a simple application, while having the control you need for more complex application orchestration and management.
When you join the GKE Turns 10 Hackathon, your mission is to take a pre-existing microservices application (either Bank of Anthos or Online Boutique) and integrate cutting-edge agentic AI capabilities. The goal is not to modify the core application code directly, but instead to build new components that interact with its established APIs! Here is some inspiration:
Optimize important processes: Add a sophisticated AI chatbot to the Online Boutique that can query inventory, provide personalized product recommendations, or even check a user’s financial balance via an integrated Bank of Anthos API.
Streamline maintenance and mitigation: Develop an agent that intelligently monitors microservice performance on GKE, suggests troubleshooting steps, and even automates remediation.
Crucial note: Your project must be built using GKE and Google AI models such as Gemini, focusing on how the agents interact with your chosen microservice application. As long as GKE is the foundation, feel free to enhance your project by integrating other Google Cloud technologies!
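As a rough sketch of what such a component could look like, the snippet below wires a single tool into an agent using the Agent Development Kit’s Python Agent class. The catalog endpoint, model name, and the assumption of an HTTP wrapper around Online Boutique’s product catalog are all hypothetical; adapt them to the actual APIs of the application you choose:

```python
import requests
from google.adk.agents import Agent  # assumes the ADK Python package (google-adk)

# Hypothetical HTTP endpoint fronting Online Boutique's product catalog service.
CATALOG_URL = "http://productcatalogservice.default.svc.cluster.local/products"

def list_products() -> list[dict]:
    """Fetch the current product catalog so the agent can answer inventory questions."""
    resp = requests.get(CATALOG_URL, timeout=5)
    resp.raise_for_status()
    return resp.json()

# The agent is a new component that talks to the existing microservice API,
# leaving the core application code untouched.
boutique_assistant = Agent(
    name="boutique_assistant",
    model="gemini-2.0-flash",  # any supported Gemini model
    instruction="Help shoppers find items by querying the product catalog tool.",
    tools=[list_products],
)
```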
Ready to start building?
Head over to our hackathon website and watch our webinar to learn more, review the rules, and register.
Tata Steel is one of the world’s largest steel producers, with an annual crude steel capacity exceeding 35 million tons. With such a large and global output, we needed a way to improve asset availability, product quality, operational safety, and environmental monitoring. By centralizing data from diverse sources and implementing advanced analytics with Google Cloud, we’re driving a more proactive and comprehensive approach to worker safety and environmental stewardship.
To achieve these objectives, we designed and implemented a robust multi-cloud architecture. This setup unifies manufacturing data across various platforms, establishing the Tata Steel Data Lake on Google Cloud as the centralized repository for seamless data aggregation and analytics.
High-level IIoT data integration architecture
Building a unified data foundation on Google Cloud
Our comprehensive data acquisition framework spans multiple plant locations, including Jamshedpur, in the eastern Indian state of Jharkhand, where we leverage Litmus and ClearBlade — both available on Google Cloud Marketplace — to collect real-time telemetry data from programmable logic controllers (PLCs) via LAN, SIM cards, and process networks.
As alternatives, we employ an internal data staging setup using SAP BusinessObjects Data Services (BODS) and Web APIs. We have also developed in-house smart sensors that use LoRaWAN and Web APIs to stage and upload data. These diverse approaches ensure seamless integration of both Operational Technology (OT) data from PLCs and Information Technology (IT) data from SAP into Google Cloud BigQuery, enabling unified and efficient data consumption.
Initially, Google Cloud IoT Core was used for ingesting crane data. Following its deprecation, we redesigned the data pipeline to integrate ClearBlade IoT Services, ensuring seamless and secure data ingestion into Google Cloud.
Our OT Data Lake is architected on Manufacturing Data Engine (MDE) and BigQuery, which provides decoupled storage and compute capabilities for scalable, cost-efficient data processing. We developed a visualization layer with hourly and daily table partitioning to support both real-time insights and long-term trend analysis, strategically archiving older datasets in Google Cloud Storage for cost optimization.
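For illustration, a partitioned telemetry table of this kind can be created with the BigQuery Python client along the lines of the sketch below (the project, dataset, and schema are hypothetical, not Tata Steel’s actual design):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table for OT sensor telemetry.
table_id = "your-project.ot_datalake.sensor_readings"
schema = [
    bigquery.SchemaField("device_id", "STRING"),
    bigquery.SchemaField("metric", "STRING"),
    bigquery.SchemaField("value", "FLOAT64"),
    bigquery.SchemaField("event_time", "TIMESTAMP"),
]

table = bigquery.Table(table_id, schema=schema)
# Hourly time partitioning on the event timestamp supports near-real-time queries,
# while older partitions can be exported to Cloud Storage for cost optimization.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.HOUR,
    field="event_time",
)
table = client.create_table(table)
print(f"Created {table.full_table_id}")
```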
We also implemented a secure, multi-path data ingestion architecture to ingest OT data with minimal latency, utilizing Litmus and ClearBlade IoT Core. Finally, we developed custom solutions to extract OPC Data Access (DA) and OPC Unified Architecture (UA) data from remote OPC servers, staging it through on-premise databases before secure transfer to Google Cloud.
Together, this comprehensive architecture provides immediate access to real-time device data while facilitating batch processing of information from SAP and other on-premise databases. This integrated approach to OT and IT data delivers a holistic view of operations, enabling more informed decision-making for critical initiatives like Asset Health Monitoring, Environment Canvas, and the Central Quality Management System, across all Tata Steel locations.
Crane health monitoring with IoT data
Monitoring health parameters of crane sub devices
Overcoming legacy challenges for real-time operations
Before deploying Industrial IoT with Google Cloud, high-velocity data was not readily accessible in our central storage. Instead, the data resided in local systems, such as mediation servers and IBA, where limited storage capacity led to automatic purging after a defined retention period. This approach, combined with legacy infrastructure, significantly constrained data availability and hindered informed business decision-making. Furthermore, edge analytics and visualization capabilities were limited, and data latency remained high due to processing bottlenecks at the mediation layer.
Our Google Cloud implementation has since enabled the seamless acquisition of high-volume and high-velocity data for analyzing manufacturing assets and processes, all while ensuring compliance with security protocols across both the IT and OT layers. This initiative has enhanced operational efficiency and delivered cost savings.
Our collaboration with Google Cloud to evaluate and implement secure, more resilient manufacturing operations solutions marks a key milestone in Tata Steel’s digital transformation journey. The new unified data foundation has empowered data-driven decision-making through AI-enabled capabilities, including:
Asset health monitoring
Event-based alerting mechanisms
Real-time data monitoring
Advanced data analytics for enhanced user experience
The iMEC: Powering predictive maintenance and efficiency
Tata Steel’s Integrated Maintenance Excellence Centre (iMEC) utilizes MDE to build and deploy monitoring solutions. This involves leveraging data analytics, predictive maintenance strategies, and real-time monitoring to enhance equipment reliability and enable proactive asset management.
MDE, which provides a zero-code, pre-configured set of Google Cloud infrastructure, acts as a central hub for ingesting, processing, and analyzing data from various sensors and systems across the steel plant, enabling the development and implementation of solutions for improved operational efficiency and reduced downtime.
With monitoring solutions helping to deliver real-time advice, maintenance teams can reduce the physical human footprint at hazardous shop floor locations while providing more ergonomic and comfortable working environments to employees compared to near-location control rooms. These solutions also help us centralize asset management and maintenance expertise, employing real-time data to enable significant operational improvements and cost-effectiveness goals, including:
Reducing unplanned outages and increasing equipment availability.
Transitioning from Time-Based Maintenance (TBM) to predictive maintenance.
Optimizing resource use, reducing power costs, and minimizing delays.
Driving safety with video analytics and cloud storage
To strengthen worker safety, we have also deployed a safety violation monitoring system powered by on-premise, in-house video analytics. Detected violation images are automatically uploaded to a Cloud Storage bucket for further analysis and reporting.
We developed and trained a video analytics model in-house, using specific samples of violations and non-violations tailored to each use case. This innovative approach has enabled us to efficiently store a growing catalog of safety violation images on Cloud Storage, harnessing its elastic storage capabilities.
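Uploading a detected-violation frame to that bucket is a one-call operation with the Cloud Storage Python client; the bucket name and object layout below are hypothetical examples, not our production naming scheme:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("safety-violations-archive")  # hypothetical bucket name

def upload_violation_image(local_path: str, camera_id: str, timestamp: str) -> str:
    """Upload a detected-violation frame and return the object path for reporting."""
    object_name = f"violations/{camera_id}/{timestamp}.jpg"
    blob = bucket.blob(object_name)
    blob.upload_from_filename(local_path, content_type="image/jpeg")
    return object_name

# Example: called by the on-premises video analytics pipeline after a detection.
print(upload_violation_image("/tmp/frame_001.jpg", "cam-42", "2025-08-20T10-15-00"))
```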
Our Central Quality Management System — which ensures our data is complete, accurate, consistent, and reliable — is also built on Google Cloud, utilizing BigQuery for scalable data storage and analysis, and Looker Studio for intuitive data visualization and reporting.
Google Cloud for environmental monitoring
Tata Steel’s commitment to sustainability is evident in our comprehensive environment monitoring system, which operates entirely on Google Cloud. Our Environment Canvas system captures a wide array of environmental Key Performance Indicators (KPIs), including stack emissions and fugitive emissions.
Environment Canvas – Data office & visualization architecture
Environmental parameters
We capture the data for these KPIs through sensors, SAP, and manual entries. While some sensor data from certain plants is initially sent to a different cloud or on-premises systems, we eventually transfer it to Google Cloud for unified consumption and visualization.
By leveraging the power of Google Cloud’s data and AI technologies, we are advancing operational monitoring and safety through a unified data foundation, real-time monitoring, and predictive maintenance — all enabled by iMEC. At the same time, we are reinforcing our commitment to environmental responsibility with a Google Cloud-based system that enables comprehensive monitoring and real-time reporting of environmental KPIs, delivering actionable insights for responsible operations.
Amazon Relational Database Service (RDS) Proxy now offers customers the option to use Internet Protocol version 6 (IPv6) addresses to pool and share database connections coming from an application. The existing endpoints supporting Internet Protocol version 4 (IPv4) will remain available for backwards compatibility. Additionally, customers now have the option to specify RDS Proxy target connections using either IPv4 or IPv6.
The continued growth of the Internet, particularly in the areas of mobile applications, connected devices, and IoT, has spurred an industry-wide move to IPv6. IPv6 increases the number of available addresses by several orders of magnitude so customers no longer need to manage overlapping address spaces in their VPCs.
Many applications, including those built on modern serverless architectures, may need to have a high number of open connections to the database or may frequently open and close database connections, exhausting the database memory and compute resources. Amazon RDS Proxy allows applications to pool and share database connections, improving your database efficiency and application scalability.
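Application code connects to an RDS Proxy the same way whether the endpoint resolves to IPv4 or IPv6. As a hedged sketch using only documented APIs, the example below obtains an IAM authentication token with boto3 and opens a pooled PostgreSQL connection through a hypothetical proxy endpoint:

```python
import boto3
import psycopg2

# Hypothetical proxy endpoint and database details; with IPv6 enabled, the endpoint
# can resolve to an IPv6 address, but the connection code is unchanged.
PROXY_ENDPOINT = "my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com"
DB_USER = "app_user"
DB_NAME = "appdb"

rds = boto3.client("rds", region_name="us-east-1")
token = rds.generate_db_auth_token(
    DBHostname=PROXY_ENDPOINT, Port=5432, DBUsername=DB_USER
)

# The proxy pools this connection with others, so frequent connect/disconnect cycles
# from serverless functions do not exhaust database memory and compute.
conn = psycopg2.connect(
    host=PROXY_ENDPOINT, port=5432, dbname=DB_NAME,
    user=DB_USER, password=token, sslmode="require",
)
```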
For information on supported database engine versions and regional availability of RDS Proxy, refer to our RDS and Aurora documentation.
Today, AWS announced the general availability of Amazon GuardDuty custom threat detection using entity lists. This new feature enhances threat detection capabilities in GuardDuty by extending support to incorporate your own domain-based threat intelligence into the service, beyond the originally supported custom IP lists. You can now detect threats in GuardDuty using malicious domains or IP addresses defined in your custom threat list. As part of this update, GuardDuty introduces a new finding type, Impact:EC2/MaliciousDomainRequest.Custom, which is triggered when activity related to a domain in your custom threat list is detected. Additionally, you can use entity lists to suppress alerts from trusted sources, giving you greater control over your threat detection strategy.
Entity lists offer enhanced flexibility compared to the previous IP address lists. These new lists can include IP addresses, domains, or both, allowing for more comprehensive threat intelligence integration. Unlike the legacy IP list format, entity lists provide simplified permission management and avoid impacting IAM policy size limits across multiple AWS Regions, making it easier to implement and manage custom threat detection across your AWS environment.
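A typical workflow is to maintain the list as an object in Amazon S3 and point GuardDuty at it. The sketch below uploads a mixed list of domains and IP addresses; the bucket name, key, and one-entry-per-line format are assumptions to verify against the GuardDuty documentation:

```python
import boto3

# Hypothetical threat entries; confirm the exact entity list file format in the docs.
entries = [
    "malicious-example.com",
    "another-bad-domain.net",
    "203.0.113.57",
]

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-threat-intel-bucket",              # hypothetical bucket
    Key="guardduty/custom-entity-list.txt",
    Body="\n".join(entries).encode("utf-8"),
)
# Reference this S3 object when creating the entity list in GuardDuty so that matching
# activity raises findings such as Impact:EC2/MaliciousDomainRequest.Custom.
```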
GuardDuty custom entity list is available in all AWS Regions where GuardDuty is offered, excluding China Regions and GovCloud (US) Regions.
Amazon Aurora PostgreSQL Limitless Database is now available with PostgreSQL version 16.9 compatibility. This release contains product improvements and bug fixes made by the PostgreSQL community, along with Aurora Limitless-specific additions such as support for the hstore extension, the auto_explain extension, and various performance improvements. The hstore extension allows for storing sets of key/value pairs within a single PostgreSQL value, while the auto_explain extension logs execution plans of slow statements automatically.
Aurora PostgreSQL Limitless Database makes it easy for you to scale your relational database workloads by providing a serverless endpoint that automatically distributes data and queries across multiple Amazon Aurora Serverless instances while maintaining the transactional consistency of a single database. Aurora PostgreSQL Limitless Database offers capabilities such as distributed query planning and transaction management, removing the need for you to create custom solutions or manage multiple databases to scale. As your workloads increase, Aurora PostgreSQL Limitless Database adds compute resources while staying within your specified budget, so there is no need to provision for peak, and compute automatically scales down when demand is low.
Aurora PostgreSQL Limitless Database is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Hyderabad, Jakarta, Malaysia, Melbourne, Mumbai, Osaka, Seoul, Singapore, Sydney, Thailand, Tokyo), Canada (Central), Canada West (Calgary), Europe (Frankfurt, Ireland, London, Milan, Paris, Spain, Stockholm, Zurich), Israel (Tel Aviv), Mexico (Central), Middle East (Bahrain, UAE), and South America (Sao Paulo).
AWS Config now tracks resource tags for IAM policy resource types, enhancing the granularity of metadata you can capture to assess, audit, and evaluate configurations of your IAM policies.
With this enhancement, you can now track resource tags and their changes for IAM Policies directly in your Config recorder. This capability allows you to scope both Config-managed and custom rule evaluations based on resource tags, ensuring your IAM policies maintain desired configurations. Additionally, you can leverage Config aggregators to selectively aggregate IAM policies across multiple accounts using tags, streamlining your multi-account governance.
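For example, scoping a rule evaluation by tag is a one-call setup with boto3. The sketch below applies an AWS managed rule only to IAM policies tagged environment=production; the rule name and tag values are hypothetical:

```python
import boto3

config = boto3.client("config")

# Hypothetical rule: evaluate only resources tagged environment=production.
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "prod-iam-policies-no-admin-access",
        "Scope": {
            "TagKey": "environment",
            "TagValue": "production",
        },
        "Source": {
            "Owner": "AWS",
            # Managed rule that flags policies allowing "Action": "*" on "Resource": "*".
            "SourceIdentifier": "IAM_POLICY_NO_STATEMENTS_WITH_ADMIN_ACCESS",
        },
    }
)
```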
This feature is now available across all supported AWS Regions at no additional cost. Resource tags are automatically populated in Config when you record IAM policy resource types. To record the IAM policy resource type in your Config recorder, please refer to our documentation.
Today, AWS announces the general availability of Organizational Notification Configurations for AWS User Notifications. This launch allows AWS Organizations users to centrally configure and view notifications across their organization. You can use the Management Account or Delegated Administrators (DAs) to configure and view notifications about accounts included in specific organizational units (OUs) or all accounts rolling up to an organization. Once configured, events from any of the member accounts will generate a notification in the Management Account. User Notifications supports up to 5 DAs.
You can use this capability to set up notifications for any supported Amazon EventBridge event. For example, you can set up a notification configuration to send a push notification to the AWS Console Mobile Application anytime a user in any of the member accounts in your organization signs in to the console without MFA. Notifications will also be available in the Admin’s Console Notifications Center.
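For that example, the notification configuration would match an EventBridge event pattern along these lines. This is a hedged sketch of the standard console sign-in event emitted via CloudTrail; verify the field names against your own events before relying on it:

```python
import json

# Event pattern matching AWS Management Console sign-ins performed without MFA.
event_pattern = {
    "source": ["aws.signin"],
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {
        "eventName": ["ConsoleLogin"],
        "additionalEventData": {"MFAUsed": ["No"]},
    },
}

print(json.dumps(event_pattern, indent=2))
```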
This new capability is available in all AWS Regions where AWS User Notifications is available.
To learn more about managing notifications across your organization with AWS User Notifications, please refer to the user guide.
AWS Backup Audit Manager now supports cross-account, cross-Region reports in Asia Pacific (Hyderabad, Jakarta, Melbourne), Europe (Spain, Zurich), and Middle East (UAE) Regions.
Now, you can use your AWS Organizations’ management or delegated administrator account to generate aggregated cross-account and cross-Region reports on your data protection policies and retrieve operational data about your backup and recovery activities. AWS Backup enables you to centralize and automate data protection policies across AWS services based on organizational best practices and regulatory standards. AWS Backup Audit Manager, a feature within the AWS Backup service, allows you to audit and report on the compliance of your data protection policies to help you meet your business and regulatory needs.
The GDR updates address vulnerabilities described in CVE-2025-49758, CVE-2025-24999, CVE-2025-49759, CVE-2025-53727, and CVE-2025-47954. For additional information on the improvements and fixes included in these updates, please see Microsoft documentation for KB5063757 and KB5063814. We recommend that you upgrade your Amazon RDS Custom for SQL Server instances to apply these updates using the Amazon RDS Management Console, or by using the AWS SDK or CLI. You can learn more about upgrading your database instance in the Amazon RDS Custom User Guide.
In Episode #6 of the Agent Factory podcast, Vlad Kolesnikov and I were joined by Keith Ballinger, VP and General Manager at Google Cloud, for a deep dive into the transformative future of software development with AI. We explore how AI agents are reshaping the developer’s role and boosting team productivity.
This post guides you through the key ideas from our conversation. Use it to quickly recap topics or dive deeper into specific segments with links and timestamps.
Keith Ballinger kicked off the discussion by redefining a term from his personal blog: “Impossible Computing.” For him, it isn’t about solving intractable computer science problems, but rather about making difficult, time-consuming tasks feel seamless and even joyful for developers.
He described it as a way to “make things that were impossible or at least really, really hard for people, much more easy and almost seamless for them.”
The conversation explored how AI’s impact extends beyond the individual developer to the entire team. Keith shared a practical example of how his teams at Google Cloud use the Gemini CLI as a GitHub action to triage issues and conduct initial reviews on pull requests, showcasing Google Cloud’s commitment to AI-powered software development.
This approach delegates the more mundane tasks, freeing up human developers to focus on higher-level logic and quality control, ultimately breaking down bottlenecks and increasing the team’s overall velocity.
The Developer’s New Role: A Conductor of an Orchestra
A central theme of the conversation was the evolution of the developer’s role. Keith suggested that developers are shifting from being coders who write every line to becoming “conductors of an orchestra.”
In this view, the developer holds the high-level vision (the system architecture) and directs a symphony of AI agents to execute the specific tasks. This paradigm elevates the developer’s most critical skills to high-level design and “context engineering”—the craft of providing AI agents with the right information at the right time for efficient software development.
The Factory Floor
The Factory Floor is our segment for getting hands-on. Here, we moved from high-level concepts to practical code with live demos from both Keith and Vlad.
Keith shared two of his open-source projects as tangible “demonstration[s] of vibe coding intended to provide a trustworthy and verifiable example that developers and researchers can use.”
Terminus: A Go framework for building web applications with a terminal-style interface. Keith described it as a fun, exploratory project he built over a weekend.
Aether: An experimental programming language designed specifically for LLMs. He explained his thesis that a language built for machines—highly explicit and deterministic—could allow an AI to generate code more effectively than with languages designed for human readability.
Keith provided a live demonstration of his vibe coding workflow. Starting with a single plain-English sentence, he guided the Gemini CLI to generate a user guide, technical architecture, and a step-by-step plan. This resulted in a functional command-line markdown viewer in under 15 minutes.
Vlad showcased a different application of AI agents: creative, multi-modal content generation. He walked through a workflow that used Gemini 2.5 Flash Image (also known as Nano Banana) and other AI tools to generate a viral video of a capybara for a fictional ad campaign. This demonstrated how to go from a simple prompt to a final video.
Inspired by Vlad’s Demo?
If you’re interested in learning how to build and deploy creative AI projects like the one Vlad showcased, the Accelerate AI with Cloud Run program is designed to help you take your ideas from prototype to production with workshops, labs, and more.
Keith explained that he sees a role for both major cloud providers and a “healthy ecosystem of startups” in solving challenges like GPU utilization. He was especially excited about how serverless platforms are adapting, highlighting that Cloud Run now offers GPUs to provide the same fast, elastic experience for AI workloads that developers expect for other applications.
In response to a question about a high-level service for orchestrating AI across multi-cloud and edge deployment, Keith was candid that he hasn’t heard a lot of direct customer demand for it yet. However, he called the area “untapped” and invited the question-asker to email him, showing a clear interest in exploring its potential.
Calling it the “billion-dollar question,” Keith emphasized that as AI accelerates development, the need for a mature and robust compliance regime becomes even more critical. His key advice was that the human review piece is more important than ever. He suggested the best place to start is using AI to assist and validate human work. For example, brainstorm a legal brief with an AI rather than having the AI write the final brief for court submission.
Baseten is one of a growing number of AI infrastructure providers, helping other startups run their models and experiments at speed and scale. Given the importance of those two factors to its customers, Baseten has just passed a significant milestone.
By leveraging the latest Google Cloud A4 virtual machines (VMs) based on NVIDIA Blackwell and Google Cloud’s Dynamic Workload Scheduler (DWS), Baseten has achieved 225% better cost-performance for high-throughput inference and 25% better cost-performance for latency-sensitive inference.
Why it matters: This breakthrough in performance and efficiency enables companies to move powerful agentic AI and reasoning models out of the lab and into production affordably. For technical leaders, this provides a blueprint for building next-generation AI products — such as real-time voice AI, search, and agentic workflows — at a scale and cost-efficiency that has been previously unattainable.
The big picture: Inference is the cornerstone of enterprise AI. As models for multi-step reasoning and decision-making demand exponentially greater compute, the challenge of serving them efficiently has become the primary bottleneck. Enter Baseten, a six-year-old Series C company that partners with Google Cloud and NVIDIA to provide enterprise companies a scalable inference platform for their proprietary models as well as open models like Gemma, DeepSeek, and Llama, with an emphasis on performance and cost efficiency. Their success hinges on a dual strategy: maximizing the potential of cutting-edge hardware and orchestrating it with a highly optimized, open software stack.
We wanted to share more about how Baseten architected its stack — and what this new level of cost-efficiency can unlock for your inference applications.
Hardware optimization with the latest NVIDIA GPUs
Baseten delivers production-grade inference by leveraging a wide range of NVIDIA GPUs on Google Cloud, from NVIDIA T4s through the recent A4 VMs (NVIDIA HGX B200). This access to the latest hardware is critical for achieving new levels of performance.
With A4 VMs, Baseten now serves three of the most popular open-source models — DeepSeek V3, DeepSeek R1, and Llama 4 Maverick — directly on their Model APIs with over 225% better cost-performance for high-throughput inference, and 25% better cost-performance for latency-sensitive inference.
In addition to its production-ready model APIs, Baseten provides additional flexibility with NVIDIA B200-powered dedicated deployments for customers seeking to run their own custom AI models with the same reliability and efficiency.
Advanced software for peak performance
Baseten’s approach is rooted in coupling the latest accelerated hardware with leading and open-source software to extract the most value possible from every chip. This integration is made possible with Google Cloud’s AI Hypercomputer, which includes a broad suite of advanced inference frameworks, including NVIDIA’s open-source software stack — NVIDIA Dynamo and TensorRT-LLM — as well as SGLang and vLLM.
Using TensorRT-LLM, Baseten optimizes and compiles custom LLMs for one of its largest AI customers, Writer. This has boosted their throughput by more than 60% for Writer’s Palmyra LLMs. The flexibility of TensorRT-LLM also enabled Baseten to develop a custom model builder that speeds up model compilation.
To serve reasoning models like DeepSeek R1 and Llama 4 on NVIDIA Blackwell GPUs, Baseten uses NVIDIA Dynamo. The combination of NVIDIA’s HGX B200 and Dynamo dramatically lowered latency and increased throughput, propelling Baseten to the top GPU performance spot on OpenRouter’s LLM ranking leaderboard.
The team leverages techniques such as kernel fusion, memory hierarchy optimization, and custom attention kernels to increase tokens per second, reduce time to first token, and support longer context windows and larger batch sizes — all while maintaining low latency and high throughput.
Building a backbone for high availability and redundancy
For mission-critical AI services, resilience is non-negotiable. Baseten runs globally across multiple clouds and regions, requiring an infrastructure that can handle ad hoc demand and outages. Flexible consumption models, such as the Dynamic Workload Scheduler within the AI Hypercomputer, help Baseten manage capacity similar to on-demand with additional price benefits. This allows them to scale up on Google Cloud if there are outages across other clouds.
“Baseten runs globally across multi-clouds and Dynamic Workload Scheduler has saved us more than once when we encounter a failure,” said Colin McGrath, head of infrastructure at Baseten. “Our automated system moves affected workloads to other resources including Google Cloud Dynamic Workload scheduler and within minutes, everyone is up and running again. It is impressive — by the time we’re paged and check-in, everything is back and healthy. This is amazing and would not be possible without DWS. It has been the backbone for us to run our business.”
Baseten’s collaboration with Google Cloud and NVIDIA demonstrates how a powerful combination of cutting-edge hardware and flexible, scalable cloud infrastructure can solve the most pressing challenges in AI inference through Google Cloud’s AI Hypercomputer.
This unique combination enables end-users across industries to bring new applications to market, such as powering agentic workflows in financial services, generating real-time audio and video content in media, and accelerating document processing in healthcare. And it’s all happening at a scale and cost that was previously unattainable.
Amazon EC2 introduces AMI Usage, providing new capabilities that allow you to track AMI consumption across AWS accounts and identify resources in your account that are dependent on particular AMIs. This enhanced visibility helps you monitor AMI utilization patterns across your AWS infrastructure and safely manage AMI deregistrations.
Up until today, you had to write custom scripts to track the use of AMIs across accounts and resources, leading to operational overhead. Now, with AMI Usage, you can generate a report that lists the accounts that are using your AMIs in EC2 instances and launch templates. You can also check utilization of any AMI within your account across multiple resources, including instances, launch templates, Image Builder recipes, and SSM parameters. These new capabilities empower you to maintain clear oversight of AMI usage across your AWS ecosystem, better manage the lifecycle of your AMIs, and optimize costs.
AMI Usage is available to all customers at no additional cost in all AWS Regions, including the AWS China (Beijing) Region, operated by Sinnet, the AWS China (Ningxia) Region, operated by NWCD, and AWS GovCloud (US). To learn more, please visit our documentation.
Amazon Neptune Database, a fully managed graph database service, now supports Public Endpoints, allowing developers to connect directly to Neptune databases from their development desktops without complex networking configurations.
With Public Endpoints, developers can securely access their Neptune databases from outside the VPC, eliminating the need for VPN connections, bastion hosts, or other networking configurations. This feature streamlines the development process while maintaining security through existing controls like IAM authentication, VPC security groups, and encryption in transit.
Public Endpoints can be enabled for new or existing Neptune clusters, with engine version 1.4.6 or above, through the AWS Management Console, AWS CLI, or AWS SDK. When enabled, Neptune generates a publicly accessible endpoint that developers can use with standard Neptune connection methods from their development machines. This feature is available at no additional cost beyond standard Neptune pricing and is available today in all AWS Regions where Neptune Database is offered. To learn more, visit the Amazon Neptune documentation.
AWS Systems Manager Configuration Manager now supports SAP HANA, allowing you to automatically test your SAP HANA databases running on AWS against best practices defined in the AWS Well-Architected Framework SAP Lens.
Keeping SAP optimally configured requires SAP administrators to stay current with best practices from multiple sources, including AWS, SAP, and operating system vendors, and to manually check their configurations to validate adherence. AWS Systems Manager Configuration Manager automatically assesses SAP applications running on AWS against these standards, proactively identifying misconfigurations and recommending specific remediation steps, allowing you to make the necessary changes before potential impacts to business operations. Configuration checks can be scheduled or run on-demand.
SSM for SAP Configuration Manager is available in all commercial AWS Regions.
Amazon Managed Service for Prometheus, a fully managed Prometheus-compatible monitoring service, now allows you to view applied quota values and their utilization for your Amazon Managed Service for Prometheus workspaces using AWS Service Quotas and Amazon CloudWatch. This update gives you a comprehensive view of quota utilization across your workspaces.
AWS Service Quotas allows you to quickly understand your applied service quota values and request increases in a few clicks. With Amazon CloudWatch usage metrics, you can create alarms to be notified when your Amazon Managed Service for Prometheus workspaces approach applied limits and visualize usage in CloudWatch dashboards.
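As a hedged sketch of the alarm side, the snippet below uses boto3 to alarm on a usage metric in the AWS/Usage namespace. The dimension values for the Prometheus service and resource, the threshold, and the SNS topic are placeholders; check the metrics actually published for your workspace in CloudWatch before using it:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="amp-usage-near-quota",
    Namespace="AWS/Usage",
    MetricName="ResourceCount",
    Dimensions=[
        {"Name": "Service", "Value": "Prometheus"},     # placeholder value
        {"Name": "Resource", "Value": "ActiveSeries"},  # placeholder value
        {"Name": "Type", "Value": "Resource"},
        {"Name": "Class", "Value": "None"},
    ],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=9_000_000,  # e.g., alarm at 90% of a hypothetical 10M-series quota
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:quota-alerts"],  # placeholder
)
```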
Usage metrics for Amazon Managed Service for Prometheus service limits are available at no additional cost and are always enabled. You can access Service Quotas and usage metrics in CloudWatch through the AWS console, AWS APIs, and CLI. These features are available in all AWS regions where Amazon Managed Service for Prometheus is generally available.
Ever worry about your applications going down just when you need them most? The talk at Cloud Next 2025, Run high-availability multi-region services with Cloud Run, dives deep into building fault tolerant and reliable applications using Google Cloud’s serverless container platform: Cloud Run.
Google experts Shane Ouchi and Taylor Money, along with Seenuvasan Devasenan from Commerzbank, pull back the curtain on Cloud Run’s built-in resilience and walk you through a real-world scenario with the upcoming Cloud Run feature called Service Health.
For the Cloud Next 2025 presentation, Shane kicked things off by discussing the baseline resilience of Cloud Run through autoscaling, a decoupled data and control plane, and N+1 zonal redundancy. Let’s break that down, starting with autoscaling.
Autoscaling to Make Sure Capacity Meets Demand
Cloud Run automatically adds and removes instances based on the incoming request load, ensuring that the capacity of a Cloud Run service meets the demand. Shane calls this hyper-elasticity, referring to Cloud Run’s ability to rapidly add container instances. Rapid autoscaling prevents the failure mode where your application doesn’t have enough server instances to handle all requests.
Note: Cloud Run lets you prevent runaway scaling by limiting the maximum number of instances.
Decoupled Data and Control Planes Increase Resiliency
The control plane in Cloud Run is the part of the system responsible for management operations, such as deploying new revisions, configuring services, and managing infrastructure resources. It’s decoupled from the data plane. The data plane is responsible for receiving incoming user requests, routing them to container instances, and executing the application code. Because the data plane operates independently from the control plane, issues in the control plane typically don’t impact running services.
N+1 Redundancy for Both Control and Data Plane
Cloud Run is a regional service, and Cloud Run provides N+1 zonal redundancy by default. That means if any of the zones in a region experiences failures, the Cloud Run infrastructure has sufficient failover capacity (that’s the “+1”) in the same region to continue serving all workloads. This isolates your application from zone failures.
Container Probes Increase Availability
If you’re concerned with application availability, you should definitely configure liveness probes to make sure failing instances are shut down. You can configure two distinct types of container instance health checks on Cloud Run.
Startup probe: Confirms that a new instance has successfully started and is ready to receive requests
Liveness probe: Monitors if a running instance remains healthy and able to continue processing requests. This probe is optional, but enabling it allows Cloud Run to automatically remove faulty instances
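If your container exposes lightweight HTTP endpoints for these probes to target, they can be as simple as the following sketch (Flask, with hypothetical /startup and /healthz paths; the probes themselves are configured on the Cloud Run service, not in application code):

```python
import os
from flask import Flask

app = Flask(__name__)
ready = True  # flip to False while dependencies warm up or during graceful shutdown

@app.get("/startup")
def startup():
    # Startup probe target: return 200 once one-time initialization has finished.
    return "ok", 200

@app.get("/healthz")
def liveness():
    # Liveness probe target: a non-200 response lets Cloud Run replace the instance.
    return ("ok", 200) if ready else ("unhealthy", 503)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```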
100% Availability is Unrealistic
Some applications are so important that you want them to always be available. While 100% availability is unrealistic, you can make them as fault tolerant as possible. Getting that right depends on your application architecture and on the underlying platforms and services you use. Cloud Run has several features that increase its baseline resilience, but there’s more you can do to make your application more resilient.
Going Beyond Zonal Redundancy
Since Cloud Run is a regional service, providing zonal redundancy, developers have to actively architect their application to be resilient against regional outages. Fortunately, Cloud Run already supports multi-regional deployments. Here’s how that works:
Deploy a Cloud Run service to multiple regions, each using the same container image and configuration.
Create a global external application load balancer, with one backend and a Serverless Network Endpoint Group (NEG) per Cloud Run service.
Use a single entrypoint with one global external IP address.
Here’s how that looks in a diagram:
In case you’re not familiar, a Serverless Network Endpoint Group (NEG) is a load balancer backend configuration resource that points to a Cloud Run service or an App Engine app.
Architecting Applications for Regional Redundancy Can Be Challenging
While deploying in multiple regions is straightforward with Cloud Run, the challenge lies in architecting your application in such a way that individual regional services can fail without losing data or impacting services in other regions.
A Preview of Service Health for Automated Regional Failover
If you set up a multi-regional Cloud Run architecture today, requests are always routed to the region closest to them, but they are not automatically routed away if a Cloud Run service becomes unavailable, as shown in the following illustration:
The upcoming Service Health feature adds automatic failover of traffic from one region to another if a service in one region becomes unavailable:
Enabling Service Health
As of August 2025, Service Health is not yet publicly available (it’s in private preview), but I’m hopeful that’ll change soon. One thing to keep in mind is that the feature might still change until it’s generally available. You can sign up to get access by filling in this request form.
Once you have access, you can enable Service Health on a multi-regional service in two steps:
Add a container instance readiness probe to each Cloud Run service.
Set minimum instances to 1 on each Cloud Run service.
That’s really all there is to it. No additional load balancer configuration is required.
Readiness Probes Are Coming to Cloud Run
As part of Service Health, readiness probes are introduced to Cloud Run. A readiness probe periodically checks each container instance via HTTP. If a readiness probe fails, Cloud Run stops routing traffic to that instance until the probe succeeds again. In contrast, a failing liveness probe causes Cloud Run to shut down the unhealthy instance.
Service Health uses the aggregate readiness state of all container instances in a service to determine if the service itself is healthy or not. If a large percentage of the containers is failing, it marks the service as unhealthy and routes traffic to a different region.
A Live Demo at Cloud Next 2025
In a live demo, Taylor deployed the same service to two regions (one near, one far away). He then sent a request via a Global External Application Load Balancer (ALB). The ALB correctly routed the request to the service in the closest region.
After configuring the closest service to flip between failing and healthy every 30 seconds, he demonstrated that the traffic didn’t fail over. That’s the current behavior – so far nothing new.
The next step in his demo was enabling Service Health through enabling minimum instances and a readiness probe on each service. For deploying the config changes to the two services, Taylor used a new flag in the Cloud Run gcloud interface: the --regions flag in gcloud run deploy. It’s a great way to deploy the same container image and configuration to multiple regions at the same time.
With the readiness probes in place and minimum instances set, Service Health started detecting service failure and moved over the traffic to the healthy service in the other region. I thought that was a great demo!
Next Steps
In this post, you learned about Cloud Run’s built-in fault tolerance mechanisms, such as autoscaling and zonal redundancy, how to architect multi-region services for higher availability, and got a preview of the upcoming Service Health feature for automated regional failover.
While the Service Health feature is still in private preview, you can sign up to get access by filling in this request form.
AWS today launched three new condition keys that help administrators govern API keys for Amazon Bedrock. The new condition keys help you control the generation, expiration, and the type of API keys allowed. Amazon Bedrock supports two types of API keys: short-term API keys valid for up to 12 hours, or long-term API keys, which are IAM service-specific credentials for use with Bedrock only.
The new iam:ServiceSpecificCredentialServiceName condition key lets you control what target AWS services are allowed when creating IAM service-specific credentials. For example, you could allow the creation of Bedrock long-term API keys but not credentials for AWS CodeCommit or Amazon Keyspaces. The new iam:ServiceSpecificCredentialAgeDays condition key lets you control the maximum duration of Bedrock long-term API keys at creation. The new bedrock:BearerTokenType condition key lets you allow or deny Bedrock requests based on whether the API key is short-term or long-term.
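Put together, an identity-based policy using these keys might look like the hedged sketch below, created here with boto3. The service-name string, the "long-term" token-type value, and the 30-day cap are assumptions to confirm in the IAM and Bedrock user guides:

```python
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Only allow service-specific credentials targeted at Bedrock.
            "Effect": "Deny",
            "Action": "iam:CreateServiceSpecificCredential",
            "Resource": "*",
            "Condition": {"StringNotEquals": {
                "iam:ServiceSpecificCredentialServiceName": "bedrock.amazonaws.com"  # assumed value
            }},
        },
        {   # Cap long-term API keys at 30 days when they are created.
            "Effect": "Deny",
            "Action": "iam:CreateServiceSpecificCredential",
            "Resource": "*",
            "Condition": {"NumericGreaterThan": {"iam:ServiceSpecificCredentialAgeDays": "30"}},
        },
        {   # Block Bedrock requests authenticated with long-term API keys.
            "Effect": "Deny",
            "Action": "bedrock:*",
            "Resource": "*",
            "Condition": {"StringEquals": {"bedrock:BearerTokenType": "long-term"}},  # assumed value
        },
    ],
}

iam.create_policy(
    PolicyName="bedrock-api-key-guardrails",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
```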
These new condition keys are available in all AWS Regions. To learn more about using the new condition keys, visit the IAM User Guide or Amazon Bedrock User Guide.