Amazon ECS customers can configure automated failure detection and remediation for their ECS service rolling updates using the deployment circuit breaker and CloudWatch alarms. The deployment circuit breaker automatically detects task launch failures, while CloudWatch alarms let you detect degradation in infrastructure metrics (e.g., CPU utilization) or performance metrics (e.g., response latency). Previously, in scenarios where a failing deployment was not detected by either of these mechanisms, customers had to manually trigger a new deployment to roll back to a previous safe state. With today’s release, customers can simply use the new stopDeployment API action, and ECS automatically rolls back the service to the last service revision that reached steady state.
You can use the new stop-deployment API to roll back deployments for your ECS services using the AWS Management Console, API, SDK, and CLI in all AWS Regions. To learn more, visit our documentation.
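As a minimal sketch, here is what invoking the rollback might look like from the AWS SDK for Python (boto3). The operation and parameter names below are assumptions based on the announced stopDeployment action, so check the ECS API reference for the exact names:

```python
import boto3

ecs = boto3.client("ecs")

# Hypothetical ARN of the in-progress service deployment to stop.
deployment_arn = (
    "arn:aws:ecs:us-east-1:111122223333:"
    "service-deployment/my-cluster/my-service/EXAMPLE"
)

# Ask ECS to stop the deployment and roll back to the last service revision
# that reached steady state. Operation and parameter names are assumptions.
response = ecs.stop_service_deployment(
    serviceDeploymentArn=deployment_arn,
    stopType="ROLLBACK",
)
print(response)
```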
Amazon Bedrock Data Automation (BDA) now supports extraction of custom GenAI-powered insights from audio by specifying the desired output configuration through blueprints. BDA is a GenAI-powered capability of Bedrock that streamlines the development of generative AI applications and automates workflows involving documents, images, audio, and videos. Developers can now extract custom insights from audio using blueprints, which specify the desired output: a list of field names, the data format in which each field should be returned, and natural language instructions for each field. Developers can get started with blueprints by either using a catalog blueprint or creating a blueprint tailored to their needs.
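To make the idea concrete, here is an illustrative sketch of creating an audio blueprint with boto3. The schema layout, field names, and instructions are assumptions meant only to show the shape of a blueprint (field names, output format, and per-field natural language instructions); consult the BDA documentation for the exact schema the service expects.

```python
import json
import boto3

bda = boto3.client("bedrock-data-automation")

# Illustrative blueprint schema: field names, output types, and per-field
# natural language instructions are assumptions for demonstration only.
audio_blueprint_schema = {
    "class": "customer_call",
    "description": "Custom insights extracted from recorded customer calls",
    "properties": {
        "call_summary": {
            "type": "string",
            "inferenceType": "inferred",
            "instruction": "Summarize the call in 3-4 sentences.",
        },
        "customer_sentiment": {
            "type": "string",
            "inferenceType": "inferred",
            "instruction": "Overall customer sentiment: positive, neutral, or negative.",
        },
        "next_steps": {
            "type": "string",
            "inferenceType": "inferred",
            "instruction": "List any follow-up actions the agent committed to.",
        },
    },
}

response = bda.create_blueprint(
    blueprintName="customer-call-insights",
    type="AUDIO",
    schema=json.dumps(audio_blueprint_schema),
)
print(response["blueprint"]["blueprintArn"])
```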
With this launch, developers can extract custom insights such as summaries, key topics, intents, and sentiment from a variety of voice conversations such as customer calls, clinical discussions, and meetings. Insights from BDA can be used to improve employee productivity, reduce compliance costs, and enhance customer experience, among other benefits. For example, customers can improve productivity of sales agents by extracting insights such as summaries, key action items, and next steps from conversations between sales agents and customers.
Amazon Bedrock Data Automation is available in US West (Oregon) and US East (N. Virginia) AWS Regions.
AWS Billing and Cost Management Console’s Payments page now features a Payments Account Summary that helps you view your AWS account’s financial status more efficiently. Critical account balance information is now summarized in a single, easy-to-access location on your Payments page.
Payments Account Summary shows your total outstanding balance, including current and past due amounts, alongside your total unapplied funds from credit memos, unapplied cash, and Advance Pay balance. You can use these unapplied funds to pay outstanding invoices by sending remittance instructions via the email address on your invoice, or by contacting AWS Customer Service. Customers with Advance Pay will have their balances automatically applied to eligible future invoices.
To start reviewing your Payments Account Summary, visit the Payments page in the AWS Billing and Cost Management Console.
Amazon Connect now supports Outbound Campaign calling to Poland in the Europe (Frankfurt) and Europe (London) Regions, making it easier to proactively communicate across voice, SMS, and email for use cases such as delivery notifications, marketing promotions, appointment reminders, and debt collection. Outbound Campaigns offers real-time audience segmentation using unified customer data from Customer Profiles, along with an intuitive UI for campaign management, targeting, and analytics. It eliminates the need for complex integrations or direct AWS Console access. Outbound Campaigns can be enabled within the Amazon Connect Console.
With Outbound Campaigns, Amazon Connect becomes the only CCaaS platform offering native, seamless support for both inbound and outbound engagement across voice and digital channels in a single, business-friendly application. To learn more, visit our webpage.
AWS Marketplace now supports software as a service (SaaS) products deployed on AWS, on other cloud infrastructures, and on-premises. This will allow independent software vendors to list more SaaS products in AWS Marketplace, offering customers a broader selection of products.
By listing SaaS products in AWS Marketplace, sellers can streamline their sales processes and scale operations more efficiently. Customers can now identify products, including SaaS products, that are 100% deployed on AWS infrastructure with a new “Deployed on AWS” badge in AWS Marketplace. The badge is visible on product detail pages, and customers can also see whether products are “Deployed on AWS” on procurement pages. Products with the “Deployed on AWS” badge leverage the strong security posture and operational excellence of AWS infrastructure, can be deployed quickly, and may qualify for additional AWS customer benefits.
This feature is available in all AWS Regions where AWS Marketplace is available.
To learn more about the expansion of the SaaS product catalog and “Deployed on AWS” badge, read this blog. If you are a seller and want to learn more about the SaaS product listing guidelines, visit the AWS Marketplace Seller Guide.
Across industries, enterprises need efficient and proactive solutions. Imagine frontline professionals using voice commands and visual input to diagnose issues, access vital information, and initiate processes in real-time. The Gemini 2.0 Flash Live API empowers developers to create next-generation, agentic industry applications.
This API extends these capabilities to complex industrial operations. Unlike solutions relying on single data types, it leverages multimodal data – audio, visual, and text – in a continuous livestream. This enables intelligent assistants that truly understand and respond to the diverse needs of industry professionals across sectors like manufacturing, healthcare, energy, and logistics.
In this post, we’ll walk you through a use case focused on industrial condition monitoring, specifically motor maintenance, powered by the Gemini 2.0 Flash Live API. The Live API enables low-latency, bidirectional voice and video interactions with Gemini. With this API, we can give end users the experience of natural, human-like voice conversations, along with the ability to interrupt the model’s responses using voice commands. The model can process text, audio, and video input, and it can provide text and audio output. Our use case highlights the API’s advantages over conventional AI and its potential for strategic collaborations.
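As a rough illustration of the interaction model (not the demo’s actual code), the sketch below opens a live session with the google-genai Python SDK and streams back a text response; the model ID and the exact session method names may differ in your SDK version.

```python
import asyncio
from google import genai

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

MODEL = "gemini-2.0-flash-live-001"   # assumed Live API model ID
CONFIG = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Send one user turn; in the demo this would be live audio/video frames.
        await session.send_client_content(
            turns={"role": "user",
                   "parts": [{"text": "Inspect this motor for visual defects."}]}
        )
        # Stream the model's response as it is generated.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```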
Demonstrating multimodal intelligence: A condition monitoring use case
The demonstration features a live, bi-directional multimodal streaming backend driven by Gemini 2.0 Flash Live API, capable of real-time audio and visual processing, enabling advanced reasoning and life-like conversations. Utilizing the API’s agentic and function calling capabilities alongside Google Cloud services allows for building powerful live multimodal systems with a clean, mobile-optimized user interface for factory floor operators. The demonstration uses a motor with a visible defect as a real-world anchor.
Real-time visual identification: Pointing the camera at a motor, Gemini identifies the model and instantly summarizes relevant information from its manual, providing quick access to crucial equipment details.
Real-time visual defect identification: With a voice command like “Inspect this motor for visual defects,” Gemini analyzes the live video, identifies and localizes the defect, and explains its reasoning.
Streamlined repair initiation: Upon identifying defects, the system automatically prepares and sends an email with the highlighted defect image and part information, directly initiating the repair process.
Real-time audio defect identification: Analyzing pre-recorded audio of healthy and defective motors, Gemini accurately distinguishes the faulty one based on its sound profile and explains its analysis.
Multimodal QA on operations: Operators can ask complex questions about the motor while pointing the camera at specific components. Gemini intelligently combines visual context with information from the motor manual to provide accurate voice-based answers.
Under the hood: The technical architecture
The demonstration leverages the Gemini Multimodal Livestreaming API on Google Cloud Vertex AI. The API manages the core workflow and agentic function calling, while the regular Gemini API handles visual and audio feature extraction.
The workflow involves:
Agentic function calling: The API interprets user voice and visual input to determine the desired action.
Audio defect detection: Upon recognizing the user’s intent, the system records motor sounds, stores them in GCS, and triggers a function that uses a prompt with examples of healthy and defective sounds, analyzed by the Gemini 2.0 Flash API to diagnose the motor’s health.
Visual inspection: The API recognizes the intent to detect visual defects, captures images, and calls a function that uses zero-shot detection with a text prompt, leveraging the spatial understanding of the Gemini 2.0 Flash API to identify and highlight defects.
Multimodal QA: When users ask questions, the API identifies the intent for information retrieval, performs RAG on the motor manual, combines it with multimodal context, and uses the Gemini API to provide accurate answers.
Sending repair orders: Recognizing the intent to initiate a repair, the API extracts the part number and defect image, using a pre-defined template to automatically send a repair order via email.
Such a demo can be built with minimal custom integration by referring to the guide here and incorporating the features mentioned in the diagram above. The majority of the effort is in adding custom function calls for various use cases, along the lines of the sketch below.
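For instance, the agentic function calling step could be wired up by declaring tools for the Live API session; the tool names and parameters below are illustrative assumptions rather than the demo’s actual code.

```python
from google.genai import types

# Illustrative tool declarations for the condition-monitoring assistant.
motor_tools = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="diagnose_motor_audio",
        description="Analyze a recorded motor sound clip stored in GCS and "
                    "report whether the motor sounds healthy or defective.",
        parameters={
            "type": "OBJECT",
            "properties": {
                "gcs_uri": {"type": "STRING", "description": "GCS URI of the audio clip"},
            },
            "required": ["gcs_uri"],
        },
    ),
    types.FunctionDeclaration(
        name="send_repair_order",
        description="Email a repair order with the part number and the highlighted defect image.",
        parameters={
            "type": "OBJECT",
            "properties": {
                "part_number": {"type": "STRING"},
                "defect_image_uri": {"type": "STRING"},
            },
            "required": ["part_number"],
        },
    ),
])

# The tools are then passed in the Live API session config, for example:
# config = {"response_modalities": ["AUDIO"], "tools": [motor_tools]}
```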
Key capabilities and industrial benefits with cross-industry use cases
Real-time multimodal processing: The API’s ability to simultaneously process live audio and visual streams provides immediate insights in dynamic environments, crucial for preventing downtime and ensuring operational continuity.
Use case: In healthcare, a remote medical assistant could use live video and audio to guide a field paramedic, receiving real-time vital signs and visual information to provide expert support during emergencies.
Advanced audio & visual reasoning: Gemini’s sophisticated reasoning interprets complex visual scenes and subtle auditory cues for accurate diagnostics.
Use Case: In manufacturing, AI can analyze the sounds and visuals of machinery to predict failures before they occur, minimizing production disruptions.
Agentic function calling for automated workflows: The API’s agentic nature enables intelligent assistants to proactively trigger actions, like generating reports or initiating processes, streamlining workflows.
Use case: In logistics, a voice command and visual confirmation of a damaged package could automatically trigger a claim process and notify relevant parties.
Seamless integration and scalability: Built on Vertex AI, the API integrates with other Google Cloud services, ensuring scalability and reliability for large-scale deployments.
Use case: In agriculture, drones equipped with cameras and microphones could stream live data to the API for real-time analysis of crop health and pest detection across vast farmlands.
Mobile-optimized user experience: The mobile-first design ensures accessibility for frontline workers, allowing interaction with the AI assistant at the point of need using familiar devices.
Use case: In retail, store associates could use voice and image recognition to quickly check inventory, locate products, or access product information for customers directly on the store floor.
Proactive maintenance and efficiency gains: By enabling real-time condition monitoring, industries can shift from reactive to predictive maintenance, reducing downtime, optimizing asset utilization, and improving overall efficiency across sectors.
Use case: In the energy sector, field technicians can use the API to diagnose issues with remote equipment like wind turbines through live audio and visual streams, reducing the need for costly and time-consuming site visits.
Get started
Explore the cutting edge of AI interaction with the Gemini Live API, as showcased by this solution. Developers can leverage its codebase – featuring low-latency voice, webcam/screen integration, interruptible streaming audio, and a modular tool system via Cloud Functions – as a robust starting point. Clone the project, adapt the components, and begin creating transformative, multimodal AI solutions that feel truly conversational and aware. The future of the intelligent industry is live, multimodal, and within reach for all sectors.
For AI developers building cutting-edge applications with large model sizes, a reliable foundation is non-negotiable. You need your AI to perform consistently, delivering results without hiccups, even under pressure. This means having dedicated resources that won’t get bogged down by other users’ activity. While existing Vertex AI Prediction Endpoints – managed pools of resources to deploy AI models for online inference – provide a capable serving solution, developers need better ways to reach consistent performance and resource isolation in case of shared resource contention.
Today, we are pleased to announce Vertex AI Prediction Dedicated Endpoints, a new family of Vertex AI Prediction endpoints designed to address the needs of modern AI applications, including those related to large-scale generative AI models.
Dedicated endpoints architected for generative AI and large models
Serving generative AI and other large-scale models introduces unique challenges related to payload size, inference time, interactivity, and performance demands. The new Vertex AI Prediction Dedicated Endpoints have been specifically engineered to help you build more reliably with the following new integrated features:
Native support for streaming inference: Essential for interactive applications like chatbots or real-time content generation, Vertex AI Endpoints now provide native support for streaming, simplifying development and architecture, via the following APIs:
streamRawPredict: Utilize this dedicated API method for bidirectional streaming to send prompts and receive sequences of responses (e.g., tokens) as they become available.
OpenAI Chat Completion: To facilitate interoperability and ease migration, endpoints serving compatible models can optionally expose an interface conforming to the widely used OpenAI Chat Completion streaming API standard.
gRPC protocol support: For latency-sensitive applications or high-throughput scenarios often encountered with large models, endpoints now natively support gRPC. Leveraging HTTP/2 and Protocol Buffers, gRPC can offer performance advantages over standard REST/HTTP.
Customizable request timeouts: Large models can have significantly longer inference times. We now provide the flexibility, via API, to configure custom timeouts for prediction requests, accommodating a wider range of model processing durations beyond the default settings.
Optimized resource handling: The underlying infrastructure is designed to better handle the resource demands (CPU/GPU, memory, network bandwidth) of large models, contributing to the overall stability and performance, especially when paired with Private Endpoints.
The newly integrated capabilities of Vertex AI Prediction Dedicated Endpoints offer a unified and robust serving solution tailored for demanding modern AI workloads. From today, Vertex AI Model Garden will use Vertex AI Prediction Dedicated Endpoints as the standard serving method for self-deployed models.
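As one hedged example of consuming a Dedicated Endpoint through the OpenAI Chat Completions-compatible surface with streaming: the base URL and model value below are placeholders you would copy from your endpoint’s details, and a Google access token serves as the bearer credential.

```python
import openai
import google.auth
from google.auth.transport.requests import Request

# Obtain a Google Cloud access token to authenticate the request.
creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(Request())

client = openai.OpenAI(
    base_url="https://<your-dedicated-endpoint-host>/v1",  # placeholder
    api_key=creds.token,
)

# Stream tokens back as they are generated (native streaming support).
stream = client.chat.completions.create(
    model="<your-deployed-model>",  # placeholder; some deployments ignore this
    messages=[{"role": "user", "content": "Summarize the latest deployment logs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```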
Optimized networking via Private Service Connect (PSC)
While Dedicated Endpoints Public remain available for models accessible over the public internet, we are enhancing networking options on Dedicated Endpoints utilizing Google Cloud Private Service Connect (PSC). The new Dedicated Endpoints Private (via PSC) provide a secure and performance-optimized path for prediction requests. By leveraging PSC, traffic routes entirely within Google Cloud’s network, offering significant benefits:
Enhanced security: Requests originate from within your Virtual Private Cloud (VPC) network, eliminating public internet exposure for the endpoint.
Improved performance consistency: Bypassing the public internet reduces latency variability.
Reduced performance interference: PSC facilitates better network traffic isolation, mitigating potential “noisy neighbor” effects and leading to more predictable performance, especially for demanding workloads.
For production workloads with strict security requirements and predictable latency, Private Endpoints using Private Service Connect are the recommended configuration.
How Sojern is using the new Vertex AI Prediction Dedicated Endpoints to serve models at scale
Sojern is a marketing company focusing on the hospitality industry, matching potential customers to travel businesses around the globe. As part of their growth plans, Sojern turned to Vertex AI. Leaving their self-managed ML stack behind, Sojern can focus more on innovation, while scaling out far beyond their historical footprint.
Given the nature of Sojern’s business, their ML deployments follow a unique deployment model, requiring several high throughput endpoints to be available and agile at all times, allowing for constant model evolution. Using Public Endpoints would cause rate limiting and ultimately degrade user experience; moving to a Shared VPC model would have required a major design change for existing consumers of the models.
With Private Service Connect (PSC) and Dedicated Endpoint, Sojern avoided hitting the quotas / limits enforced on Public Endpoints, while also avoiding a network redesign to accommodate Shared VPC.
The ability to quickly promote tested models, take advantage of Dedicated Endpoint’s enhanced feature set, and improve latency for their customers strongly aligned with Sojern’s goals. The Sojern team continues to onboard new models, always improving accuracy and customer satisfaction, powered by Private Service Connect and Dedicated Endpoint.
Get started
Are you struggling to scale your prediction workloads on Vertex AI? Check out the resources below to start using the new Vertex AI Prediction Dedicated Endpoints:
Your experience and feedback are important as we continue to evolve Vertex AI. We encourage you to explore these new endpoint capabilities and share your insights through the Google Cloud community forum.
Today, AWS announces the preview of the Amazon Q Developer integration in GitHub. With this launch, developers can use the power of Amazon Q Developer agents for feature development, code review, and Java transformation within GitHub.com and GitHub Enterprise Cloud projects to streamline their developer experience.
After installing the Amazon Q Developer application from GitHub, developers can use labels to assign issues to Amazon Q Developer. Then, Amazon Q Developer agents automatically implement new features, generate bug fixes, run code reviews on new pull requests, and modernize legacy Java applications, all within the GitHub projects. While generating new code, the agents will automatically use any pull request workflows, refining the solution and ensuring all checks are passing. Developers can also collaborate with the agents by directly commenting on the pull request, and Amazon Q Developer will respond with improvements, allowing all teammates to stay in the loop. By bringing Amazon Q Developer into GitHub, development teams can confidently deliver high-quality software faster while maintaining their organization’s security and compliance standards.
The Amazon Q Developer integration is available on GitHub, and you can get started today for free—no AWS account needed. To learn more, check out the Amazon Q Developer Integrations page or read the blog.
When’s the last time you watched a race for the braking?
It’s the heart-pounding acceleration and death-defying maneuvers that keep most motorsport fans on the edge of their seats. Especially when it comes to Formula E — and really all EVs — the explosive, near-instantaneous acceleration of an electric motor is part of the appeal.
A less considered, yet no less important feature, is how EVs can regeneratively brake, turning friction into fuel. Part of Formula E’s mission is to make EVs a compelling automotive choice for consumers, not just world-class racers; highlighting this powerful aspect of the vehicles has become a priority. The question remained: How do you get others to feel the same exhilaration from deceleration?
The answer came from the mountains above Monaco, as well as some prompts in Gemini 2.5.
In the lead up to the Monaco E-Prix, Formula E and Google undertook a project dubbed Mountain Recharge. The challenge: Whether a Formula E GENBETA race car, starting with only 1% battery, could regenerate enough energy from braking during a descent through France’s coastal Alps to then complete a full lap of the iconic Monaco circuit.
More than just a stunt, this experiment is testing the boundaries of technology — and not just in EVs, but on the cloud, too. Without the live analytics and plenty of AI-powered planning, the Mountain Recharge might not have come to pass. In fact, AI even helped determine which mountain pass would be best suited for this effort. (Read on to find out which one, and see if we made it to the bottom.)
Mountain Recharge is exciting not only for the thrills on the course but also for the potential it shows for AI across industries. In addition to its role in helping to execute tasks, AI proved valuable in the brainstorming, experimentation, and rapid-fire simulations that helped get Mountain Recharge to the finish line.
Planning the charge up the mountain
Before even setting foot or wheel to the course, the team at Formula E and Google Cloud turned to Gemini to try and figure out if such an endeavor was possible.
To answer the fundamental question of feasibility, the team entered a straightforward prompt into Google’s AI Studio: “Starting with just 1% battery, could the GENBETA car potentially generate enough recharge by descending a high mountain pass to do a lap of the Circuit of Monaco?”
The AI Studio validator, running Gemini 2.5 Pro with its deep reasoning functionality, analyzed first-party data that had been uploaded by Formula E on the GENBETA’s capabilities; we then grounded the model with Google Search to further improve accuracy and reliability by connecting to the universe of information available online.
AI Studio shared its “thinking” in a detailed eight-step process, which included identifying the key information needed; consulting the provided documents; gathering external information through a simulated search; performing calculations and analysis; and finally synthesizing the answer based on the core question.
The final output: “theoretically feasible.” In other words, the perfect challenge.
Navigating the steep turns above Monaco helped generate plenty of power for Mountain Recharge.
Still working in AI Studio, we then used a new feature, the ability to build custom apps such as the Maps Explorer, to determine the best route, which turned out to be the Col de Braus. AI Studio then mapped out a route for the challenge. This rigorous, data-backed validation, facilitated by AI Studio and Gemini’s ability to incorporate technical specifications and estimations, transformed the project from a speculative what-if into something Formula E felt confident attempting.
AI played an important role away from the course, as well. To aid in coordination and planning, teams at Formula E and Google Cloud used NotebookLM to digest the technical regulations and battery specifications and locate relevant information within them, which, given the complexity of the challenge and the number of parties involved, helped ensure cross-functional teams were kept up to date and grounded with sourced data to help make informed decisions.
Smart cars, smart drivers, and a smartphone
During the mountain descent, real-time monitoring of the car’s progress and energy regeneration would be crucial. Firebase and BigQuery were instrumental in visualizing this real-time telemetry. Data from multiple sensors and Google Maps was streamed to BigQuery, Google Cloud’s data warehouse, from a high-performance mobile phone connected to the car (a Pixel 9 was well suited to the task).
This data stream proved to be yet another challenge to overcome, because of the patchy mobile signal in the mountainous terrain of the Maritime Alps. When data couldn’t be sent, it was cached locally on the phone until the signal was available again.
BigQuery’s capacity for real-time data ingestion and in-platform AI model creation enabled speedy analysis and the calculation of essential metrics. A web-based dashboard was developed using Firebase that connected to BigQuery to display both data and insights. AI Studio greatly facilitated the development of the application by translating a picture of a dashboard mockup into fully functional code.
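For a sense of what the phone-side relay could look like, here is a simplified sketch of streaming telemetry rows into BigQuery with local buffering for when the signal drops; the table name and row fields are made up for illustration.

```python
from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.mountain_recharge.telemetry"  # placeholder table

buffer = []  # rows cached locally while the mobile signal is unavailable

def send_telemetry(row: dict) -> None:
    buffer.append(row)
    try:
        errors = client.insert_rows_json(TABLE_ID, buffer)
        if not errors:
            buffer.clear()  # only drop rows once BigQuery has accepted them
    except Exception:
        pass  # keep rows buffered and retry with the next reading

send_telemetry({"timestamp": "2025-05-03T10:15:00Z",
                "soc_percent": 12.4,
                "speed_kmh": 58.0})
```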
“From figuring out if our crazy Mountain Recharge idea was even possible, to giving us live insights during the descent, AI was our guide,” said Alex Aidan, Formula E’s VP of Marketing. “It’s what turned an ambitious ‘what if’ into a reality we could track moment by moment.”
After completing its descent, the car stored up enough energy that it is expected to complete its lap of the Monaco circuit on Saturday, as part of the E-Prix’s pre-race festivities.
A different kind of push start.
Benefits beyond the finish line
Both the success and the development of the Mountain Recharge campaign offer valuable lessons to others pursuing ambitious projects. It shows that AI doesn’t have to be central to a project — it can be just as powerful at facilitating and optimizing something we’ve been doing for years, like racing cars. Our results in the Mountain Recharge only underscore the potential benefits of AI for a wide range of industries:
Enhanced planning and exploration: Just as Gemini helped Formula E explore unconventional ideas and identify the optimal route, businesses can leverage large language models for innovative problem-solving, market analysis, and strategic planning, uncovering unexpected angles and accelerating the journey from “what if” to “we can do that”.
Streamlined project management: NotebookLM’s ability to centralize and organize vast amounts of information demonstrates how AI can significantly improve efficiency in complex projects, from logistics and resource allocation to research and compliance. This reduces the risk of errors and ensures smoother coordination across teams.
Data-driven decision making: The real-time data analysis capabilities showcased in the Mountain Recharge underscore the power of cloud-based data platforms like BigQuery. Organizations can leverage these tools to gain immediate insights from their data, enabling them to make agile adjustments and optimize performance on the fly. This is invaluable in dynamic environments where rapid responses are critical.
Deeper understanding of complex systems: By applying AI to analyze intricate data streams, teams can gain a more profound understanding of the factors influencing performance.
Such capabilities certainly impressed James Rossiter, a former Formula E Team Principal, current test driver, and broadcaster for the series. “I was really surprised at the detail of the advice and things to consider,” Rossiter said. “We always talk about these things as a team, but as this is so different to racing, I had to totally rethink the drive.”
The Formula E Mountain Recharge campaign is more than just an exciting piece of content; it’s a testament to the power of human ingenuity amplified by intelligent technology. It’s also the latest collaboration between Formula E and Google Cloud and our shared commitment to use AI to push the boundaries of what’s possible in the sport and in the world.
We’ve already developed an AI-powered digital driving coach to help level the field for EV racing. Now, with the Mountain Recharge, we can inspire everyday drivers well beyond the track with the capabilities of electric vehicles.
It’s thinking big, even if it all starts with a simple prompt on a screen. You just have to ask the right questions, starting with the most important ones: Is this possible, and how can we make it so?
Today, AWS Organizations is making resource control policies (RCPs) available in both AWS GovCloud (US-West) and AWS GovCloud (US-East) Regions. RCPs help you centrally establish a data perimeter across your AWS environment. With RCPs, you can centrally restrict external access to your AWS resources at scale.
RCPs are a type of authorization policy in AWS Organizations that you can use to centrally enforce the maximum available permissions for resources in your organization. For example, an RCP can help enforce the requirement that “no principal outside my organization can access Amazon S3 buckets in my organization,” regardless of the permissions granted through individual S3 bucket policies.
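For illustration, an RCP along the lines of that S3 example might look like the following, shown here as a Python dict for readability; the exact condition keys to use depend on your requirements, so check the RCP documentation.

```python
# Illustrative resource control policy: deny S3 access to principals outside
# the organization, regardless of individual bucket policies.
example_rcp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyS3AccessOutsideMyOrg",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:PrincipalOrgID": "o-exampleorgid"
                }
            },
        }
    ],
}
```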
AWS Graviton4-based R8g database instances are now generally available for Amazon Aurora with PostgreSQL compatibility and Amazon Aurora with MySQL compatibility in the AWS GovCloud (US-West) Region. R8g instances offer larger instance sizes, up to 48xlarge, and feature an 8:1 ratio of memory to vCPU and the latest DDR5 memory. Graviton4-based instances provide up to a 40% performance improvement and up to a 29% price/performance improvement for on-demand pricing over Graviton3-based instances of equivalent sizes on Amazon Aurora databases, depending on database engine, version, and workload.
AWS Graviton4 processors are the latest generation of custom-designed AWS Graviton processors built on the AWS Nitro System. R8g DB instances are available with new 24xlarge and 48xlarge sizes. With these new sizes, R8g DB instances offer up to 192 vCPU, up to 50Gbps enhanced networking bandwidth, and up to 40Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS).
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Elastic Container Registry (ECR) announces IPv6 support for API and Docker/OCI endpoints for both ECR and ECR Public. This makes it easier to standardize on IPv6 and remove IP address scalability limitations for your container build, deployment, and orchestration infrastructure.
With today’s launch, you can pull your private or public ECR images via the AWS SDK or Docker/OCI CLI using ECR’s new dual-stack endpoints which support both IPv4 and IPv6. When you make a request to an ECR dual-stack endpoint, the endpoint resolves to an IPv4 or an IPv6 address, depending on the protocol used by your network and client. This helps you meet IPv6 compliance requirements, and modernize your applications without expensive network address translation between IPv4 and IPv6 addresses.
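A brief sketch of opting a boto3 ECR client into the dual-stack endpoints, assuming a recent boto3/botocore release:

```python
import boto3
from botocore.config import Config

# Request ECR's dual-stack endpoint so the connection can resolve over IPv6
# where the network and client support it.
ecr = boto3.client(
    "ecr",
    region_name="us-west-2",
    config=Config(use_dualstack_endpoint=True),
)

# Fetch a registry auth token as usual; docker login can then use it.
token = ecr.get_authorization_token()
print(token["authorizationData"][0]["proxyEndpoint"])
```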
ECR’s new dual-stack endpoints are generally available in all AWS commercial and AWS GovCloud (US) regions at no additional cost. Currently, ECR’s dual-stack endpoints do not serve AWS PrivateLink traffic originating from your Amazon Virtual Private Cloud (VPC). To get started with ECR IPv6, visit ECR documentation or ECR Public documentation.
AWS Graviton3-based R7g database instances are now generally available for Amazon Aurora with PostgreSQL compatibility and Amazon Aurora with MySQL compatibility in Middle East (Bahrain) and AWS GovCloud (US-West) Regions. Graviton3 instances provide up to 30% performance improvement over Graviton2 instances for Aurora depending on database engine, version, and workload.
Graviton3 processors offer several improvements over Graviton2 processors. Graviton3-based R7g are the first AWS database instances to feature the latest DDR5 memory, which provides 50% more memory bandwidth compared to DDR4, enabling high-speed access to data in memory. R7g database instances offer up to 30Gbps enhanced networking bandwidth and up to 20 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS).
You can launch Graviton3 R7g database instances in the Amazon RDS Management Console or using the AWS CLI. Graviton3 is supported by Aurora MySQL version 3.03.1 and higher, and Aurora PostgreSQL version 13.10 and higher, Aurora PostgreSQL 14.7 and higher, and Aurora PostgreSQL 15.2 and higher. Upgrading a database instance to Graviton3 requires a simple instance type modification. For more details, refer to the Aurora documentation.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Relational Database Service (Amazon RDS) for PostgreSQL, MySQL, and MariaDB now supports AWS Graviton2-based T4g database instances in the Asia Pacific (Malaysia) Region. T4g database instances provide a baseline level of CPU performance, with the ability to burst CPU usage at any time for as long as required. For complete information on pricing and regional availability, please refer to the Amazon RDS pricing page.
T4g database instances are available on Amazon RDS for all PostgreSQL 17, 16, 15, 14, and 13 versions, and 12.7 and higher 12 versions. T4g database instances are available on Amazon RDS for MySQL versions 8.4 and 8.0, and Amazon RDS for MariaDB versions 11.4, 10.11, 10.6, 10.5, and 10.4. You can upgrade to T4g by modifying the database instance type to T4g using the AWS Management Console or AWS CLI. For more details, refer to the Amazon RDS User Guide.
Amazon Relational Database Service (Amazon RDS) for PostgreSQL, MySQL, and MariaDB now supports M7i database (DB) instances in Asia Pacific (Jakarta), South America (Sao Paulo), AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. R7i DB instances are now supported in South America (Sao Paulo), AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. M7i and R7i are the latest Intel-based offerings and are available with a new maximum instance size of 48xlarge, which brings 50% more vCPU and memory than the maximum size of the M6i and R6i instance types.
M7i and R7i DB instances are available for Amazon RDS for PostgreSQL version 17.1 and higher, 16.1 and higher, 15.4 and higher, 14.9 and higher, and 13.11 and higher. M7i and R7i DB instances are also available for Amazon RDS for MySQL version 8.0.32 and higher, and Amazon RDS for MariaDB version 11.4, 10.11, 10.6, 10.5, and 10.4.
For complete information on pricing and regional availability, please refer to the Amazon RDS pricing page. Get started by creating any of these fully managed database instance using the Amazon RDS Management Console. For more details, refer to the Amazon RDS User Guide.
Amazon Aurora with MySQL compatibility and PostgreSQL compatibility now supports R7i database instances in AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. R7i database instances are powered by custom 4th Generation Intel Xeon Scalable processors. R7i instances offer larger instance sizes, up to 48xlarge, and feature an 8:1 ratio of memory to vCPU and the latest DDR5 memory.
You can launch R7i database instances in the Amazon RDS Management Console or using the AWS CLI. Upgrading a database instance to R7i instance family requires a simple instance type modification. For more details, refer to the Aurora documentation.
Amazon Aurora is designed for unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility. It provides built-in security, continuous backups, serverless compute, up to 15 read replicas, automated multi-Region replication, and integrations with other AWS services. To get started with Amazon Aurora, take a look at our getting started page.
Amazon Relational Database Service (RDS) for PostgreSQL, MySQL, and MariaDB now supports AWS Graviton3-based M7g database instances in Asia Pacific (Jakarta), Middle East (UAE), South America (Sao Paulo), Asia Pacific (Osaka), Asia Pacific (Melbourne), Israel (Tel Aviv), Europe (Zurich) and AWS GovCloud (US-East) Regions. R7g is now supported in Middle East (Bahrain), South America (Sao Paulo) and AWS GovCloud (US-West) Regions. Graviton3-based instances provide up to a 30% performance improvement over Graviton2-based instances on RDS for open-source databases depending on database engine, version, and workload.
Graviton3 processors offer several improvements over the second-generation Graviton2 processors. Graviton3-based M7g and R7g are the first AWS database instances to feature the latest DDR5 memory, which provides 50% more memory bandwidth compared to DDR4, enabling high-speed access to data in memory. M7g and R7g database instances offer up to 30Gbps enhanced networking bandwidth and up to 20 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS). M7g and R7g on Amazon RDS for MySQL and MariaDB will also support Optimized Writes. With Optimized Writes, you can improve write throughput by up to 2x at no additional cost.
M7g and R7g database instances are supported on RDS for MySQL versions 8.0 and 8.4, RDS for PostgreSQL versions 13.4 (and higher), 14.5 (and higher), 15, 16 and 17 and RDS for MariaDB versions 10.4, 10.5, 10.6, 10.11 and 11.4. For complete information on pricing and regional availability, please refer to the Amazon RDS pricing page. Get started using the Amazon RDS Management Console.
The Amazon Web Services (AWS) ODBC Driver for PostgreSQL is now generally available for use with Amazon RDS and Amazon Aurora PostgreSQL-compatible edition database clusters. This database driver provides support for faster switchover and failover times, Aurora Limitless, and authentication with AWS Secrets Manager, AWS Identity and Access Management (IAM), or Federated Identity.
The Amazon Web Services (AWS) ODBC Driver for PostgreSQL is a standalone driver that supports Amazon RDS for PostgreSQL, community PostgreSQL, and Amazon Aurora PostgreSQL. You can install the aws-pgsql-odbc package for Windows, macOS, or Linux by following the Getting Started instructions on GitHub. The driver relies on monitoring the database cluster status and being aware of the cluster topology to determine the new writer. This approach reduces switchover and failover times from tens of seconds to single-digit seconds compared to the open-source driver.
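A hedged sketch of connecting through the driver with pyodbc; the driver name string and connection keywords are assumptions, so confirm them against the aws-pgsql-odbc README on GitHub.

```python
import pyodbc

# Driver name and connection keywords are assumptions for illustration.
conn_str = (
    "DRIVER={AWS ODBC Driver for PostgreSQL};"
    "SERVER=mycluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com;"
    "PORT=5432;"
    "DATABASE=postgres;"
    "UID=myuser;"
    "PWD=mypassword;"
)

conn = pyodbc.connect(conn_str)
cur = conn.cursor()
cur.execute("SELECT version();")
print(cur.fetchone()[0])
conn.close()
```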
The AWS ODBC Driver for PostgreSQL is released as an open-source project under the GNU Lesser General Public License (LGPL).
At Google Cloud, we empower businesses to accelerate their generative AI innovation cycle by providing a path from prototype to production. Palo Alto Networks, a global cybersecurity leader, partnered with Google Cloud to develop an innovative security posture control solution that can answer complex “how-to” questions on demand, provide deep insights into risk with just a few clicks, and guide users through remediation steps.
Using advanced AI services, including Google’s Gemini models and managed Retrieval Augmented Generation (RAG) services such as Google Cloud’s Vertex AI Search, Palo Alto Networks had an ideal foundation for building and deploying gen AI-powered solutions.
The end result was Prisma Cloud Co-pilot, the Palo Alto Networks Prisma Cloud gen AI offering. It helps simplify cloud security management by providing an intuitive, AI-powered interface to help understand and mitigate risks.
Technical challenges and surprises
The Palo Alto Networks Prisma Cloud Co-pilot journey began in 2023 and launched in October 2024. During this time, Palo Alto Networks witnessed Google’s AI models evolve rapidly, from Text Bison (PaLM) to Gemini 1.5 Flash. That rapid pace of innovation meant that each iteration brought new capabilities, necessitating a development process that could quickly adapt to the evolving landscape.
To effectively navigate the dynamic landscape of evolving gen AI models, Palo Alto Networks established robust processes that proved invaluable to their success:
Prompt engineering and management: Palo Alto Networks used Vertex AI to help manage prompt templates and built a diverse prompt library to generate a wide range of responses. To rigorously test each new model’s capabilities, limitations, and performance across various tasks, Palo Alto Networks and Google Cloud team systematically created and updated prompts for each submodule. Additionally, Vertex AI’s Prompt Optimizer helped streamline the tedious trial-and-error process of prompt engineering.
Intent recognition: Palo Alto Networks used the Gemini 1.5 Flash model to develop an intent recognition module, which efficiently routed user queries to the relevant co-pilot component. This approach provided users with many capabilities through a unified and lightweight user experience.
Input guardrails: Palo Alto Networks created guardrails as a first line of defense against unexpected, malicious, or simply incorrect queries that could compromise the functionality and experience of the chatbot. These guardrails maintain the chatbot’s intended functionality by preventing known prompt injection attacks, such as attempts to circumvent system instructions, and by restricting chatbot usage to its intended scope. Guardrails were also created to detect whether user queries fall within the predefined domain of general cloud security, risks, and vulnerabilities to prevent unintended use; any topics outside this scope did not receive a response from the chatbot. Additionally, since the chatbot was designed to generate proprietary code for querying Palo Alto Networks internal systems, requests for general-purpose code generation similarly did not receive a response.
Evaluation dataset curation: A robust and representative evaluation dataset is the foundation for accurately and quickly assessing the performance of gen AI models. The Palo Alto Networks team took great care to choose high-quality evaluation data and keep it relevant by constantly refreshing it with representative questions and expert-validated answers. The evaluation dataset was sourced from and validated directly by Palo Alto Networks subject matter experts to ensure its accuracy and reliability.
Automated evaluation: In collaboration with Google Cloud, Palo Alto Networks developed an automated evaluation pipeline using Vertex AI’s gen AI evaluation service (a minimal sketch of such a pipeline follows this list). This pipeline allowed Palo Alto Networks to rigorously scale their assessment of different gen AI models and benchmark those models using custom evaluation metrics, while focusing on key performance indicators such as accuracy, latency, and consistency of responses.
Human evaluator training and red teaming: Palo Alto Networks invested in training their human evaluation team to identify and analyze specific loss patterns and provide detailed answers on a broad set of custom rubrics. This allowed them to pinpoint where a model’s response was inadequate and provide insightful feedback on model performance, which then guided model selection and refinement.
The team also conducted red teaming exercises focused on key areas, including:
Manipulating the co-pilot: Can the co-pilot be tricked into giving bad advice by feeding it false information?
Extracting sensitive data: Can the co-pilot be manipulated into revealing confidential information or system details?
Bypassing security controls: Can the co-pilot be used to craft attacks that circumvent existing security measures?
Load testing: To ensure the gen AI solutions met real-time demands, Palo Alto Networks actively load tested them, working within the pre-defined QPM (queries per minute) and latency parameters of Gemini models. They simulated user traffic scenarios to find the optimal balance between responsiveness and scalability using provisioned throughput, which helped ensure a smooth user experience even during peak usage.
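As a rough idea of what such an automated evaluation pipeline can look like, here is a minimal sketch using the Vertex AI SDK’s gen AI evaluation module; the metric names, dataset columns, and project values are placeholders and may differ from Palo Alto Networks’ actual pipeline.

```python
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="my-project", location="us-central1")  # placeholders

# Placeholder evaluation dataset: prompts, candidate responses, and
# expert-validated references.
eval_dataset = pd.DataFrame({
    "prompt": ["Which of my S3 buckets are publicly accessible?"],
    "response": ["Buckets 'logs-prod' and 'assets-dev' allow public read access."],
    "reference": ["Two buckets are public: logs-prod and assets-dev."],
})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "exact_match",
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
    ],
    experiment="copilot-eval",
)

result = eval_task.evaluate()
print(result.summary_metrics)
```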
Operational and business challenges
Operationalizing gen AI can introduce complex challenges across multiple functions, especially for compliance, legal, and information security. Evaluating ROI for gen AI solutions also requires new metrics. To address these challenges, Palo Alto Networks implemented the following techniques and processes:
Data residency and regional ML processing: Since many Palo Alto Networks customers need a regional approach for ML processing capabilities, we prioritized regional machine learning processing to help enable customer compliance with data residency needs and regional regulations, if applicable.
Where Google does not offer an AI data center that matches Prisma Cloud data center locations, customers were able to choose to have their data processed in the U.S. before gaining access to the Prisma Cloud Co-pilot. We implemented strict data governance policies and used Google Cloud’s secure infrastructure to help safeguard sensitive information and uphold user privacy.
Deciding KPIs and measuring success for gen AI apps: The dynamic and nuanced nature of gen AI applications demands a bespoke set of metrics tailored to capture their specific characteristics and comprehensively evaluate their efficacy. There are no standard metrics that work for all use cases. The Prisma Cloud AI Co-pilot team relied on technical and business metrics to measure how well the system was operating.
Technical metrics, such as recall, helped to measure how thoroughly the system fetches relevant URLs when answering questions from documents, and to help increase the accuracy of prompt responses and provide source information for users.
Customer experience metrics, such as measuring helpfulness, relied on explicit feedback and telemetry data analysis. This provided deeper insights into user experience that resulted in increased productivity and cost savings.
Collaborating with security and legal teams: Palo Alto Networks brought in legal, information security, and other critical stakeholders early in the process to identify risks and create guardrails for issues including, but not limited to: information security requirements, elimination of bias in the dataset, appropriate functionality of the tool, and data usage in compliance with applicable law and contractual obligations.
Given customer concerns, enterprises must prioritize clear communication around data usage, storage, and protection. By collaborating with legal and information security teams early on to create transparency in marketing and product communications, Palo Alto Networks was able to build customer trust and help ensure they have a clear understanding of how and when their data is being used.
Ready to get started with Vertex AI?
The future of generative AI is bright, and with careful planning and execution, enterprises can unlock its full potential. Explore your organization’s AI needs through practical pilots in Vertex AI, and rely on Google Cloud Consulting for expert guidance.
Your customers might not all speak the same language. If you operate internationally or serve a diverse customer base, you need your chatbot to meet them where they are – whether they’re searching for something in Spanish or Japanese. If you want to give your customers multilingual support with chatbots, you’ll need to orchestrate multiple AI models to handle diverse languages and technical complexities intelligently and efficiently. Customers expect quick, accurate answers in their language, from simple requests to complex troubleshooting.
To get there, developers need a modern architecture that can leverage specialized AI models – such as Gemma and Gemini – and a standardized communication layer so your LLM models can speak the same language, too. Model Context Protocol, or MCP, is a standardized way for AI systems to interact with external data sources and tools. It allows AI agents to access information and execute actions outside their own models, making them more capable and versatile. Let’s explore how we can build a powerful multilingual chatbot using Google’s Gemma, Translation LLM and Gemini models, orchestrated via MCP.
The challenge: Diverse needs, one interface
Building a truly effective support chatbot might be challenging for a few different reasons:
Language barriers: Support needs to be available in multiple languages, requiring high-quality, low-latency translation.
Query complexity: Questions range from simple FAQs (handled easily by a basic model) to intricate technical problems demanding advanced reasoning.
Efficiency: The chatbot needs to respond quickly without getting bogged down, especially when dealing with complex tasks or translations.
Maintainability: As AI models evolve and business needs change, the system must be easy to update without requiring a complete overhaul.
Trying to build a single, monolithic AI model to handle everything is often inefficient and complex. A better approach? Specialization and smart delegation.
MCP architecture for harnessing different LLMs
The key to making these specialized models work together effectively is MCP. MCP defines how an orchestrator (like our Gemma-powered client) can discover available tools, request specific actions (like translation or complex analysis) from other specialized services, pass necessary information (the “context”), and receive results back. It’s the essential plumbing that allows our “team” of AI models to collaborate. Here’s a framework for how it works with the LLMs:
Gemma: The chatbot uses a versatile LLM like Gemma to manage conversations, understand user requests, handle basic FAQs, and determine when to utilize specialized tools for complex tasks via MCP.
Translation LLM server: A dedicated, lightweight MCP server exposing Google Cloud’s Translation capabilities as a tool. Its sole focus is high-quality, fast translation between languages, callable via MCP.
Gemini: A specialized MCP server uses Gemini Pro or similar LLM for complex technical reasoning and problem-solving when invoked by the orchestrator.
Model Context Protocol: This protocol allows Gemma to discover and invoke the Translation and Gemini “tools” running on their respective servers (a minimal sketch of such a tool server follows).
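To ground this, here is a sketch of what the Translation LLM server could look like using the MCP Python SDK’s FastMCP helper. The Cloud Translation API client is used here as a stand-in for the Translation LLM call, and the tool name and project value are illustrative assumptions, not the repo’s actual code.

```python
from mcp.server.fastmcp import FastMCP
from google.cloud import translate_v3 as translate

mcp = FastMCP("translation-server")

client = translate.TranslationServiceClient()
PARENT = "projects/my-project/locations/global"  # placeholder project

@mcp.tool()
def translate_text(text: str, target_language: str = "en") -> str:
    """Translate user text into the target language."""
    response = client.translate_text(
        parent=PARENT,
        contents=[text],
        target_language_code=target_language,
    )
    return response.translations[0].translated_text

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP (stdio by default)
```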
How it works
Let’s walk through an example non-English language scenario:
A technical question arrives: A customer types a technical question into the chat window, but it’s in French.
Gemma receives the text: The Gemma-powered client receives the French text. It recognizes the language isn’t English and determines translation is needed.
Gemma calls on Translation LLM: Gemma uses the MCP connection to send the French text to the Translation LLM Server, requesting an English translation.
Text is translated: The Translation LLM Server performs the translation via its MCP-exposed tool and sends the English version back to the client.
This architecture offers broad applicability. For example, imagine a financial institution’s support chatbot where all user input, regardless of the original language, must be preserved in English in real time for fraud detection. Here, Gemma operates as the client, while Translation LLM, Gemini Flash, and Gemini Pro function on the server. In this configuration, the client-side Gemma manages multi-turn conversations for routine inquiries and intelligently directs complex requests to specialized tools. As depicted in the architectural diagram, Gemma manages all user interactions within a multi-turn chat. A tool leveraging Translation LLM can translate user queries and concurrently save them for immediate fraud analysis. Simultaneously, Gemini Flash and Pro models can generate responses based on the user’s requests. For intricate financial inquiries, Gemini Pro can be employed, while Gemini Flash can address less complex questions.
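On the other side of the connection, the orchestrator can discover and invoke that tool through an MCP client session. The sketch below assumes the stdio transport and the translate_text tool from the server sketch above; how Gemma decides when to make this call is elided.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the translation server as a subprocess over stdio (placeholder path).
server_params = StdioServerParameters(command="python", args=["translation_server.py"])

async def translate_via_mcp(text: str) -> str:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "translate_text",
                arguments={"text": text, "target_language": "en"},
            )
            return result.content[0].text

print(asyncio.run(translate_via_mcp("Mon routeur ne se connecte plus au réseau.")))
```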
Let’s look at this sample GitHub repo that illustrates how this architecture works.
Why this is a winning combination
This is a powerful combination because it’s designed for both efficiency and how easily you can adapt it.
The main idea is splitting up the work. The Gemma-based client that users interact with stays light, handling the conversation and sending requests where they need to go. Tougher jobs, like translating or complex reasoning, are sent to separate LLMs built specifically for those tasks. This way, each piece does what it’s best at, making the whole system perform better.
A big plus is how this makes things easier to manage and more flexible. Because the parts connect with a standard interface (the MCP), you can update or swap out one of the specialized LLMs – maybe to use a newer model for translation – without having to change the Gemma client. This makes updates simpler, reduces potential headaches, and lets you try new things more easily. You can use this kind of setup for things like creating highly personalized content, tackling complex data analysis, or automating workflows more intelligently.
Get started
Ready to build your own specialized, orchestrated AI solutions?
Explore the code: Clone the GitHub repository for this project and experiment with the client and server setup.