Amazon API Gateway (APIGW) now supports all features of HTTP APIs as well as Mutual TLS and multi-level base path mappings on REST APIs in the following additional Regions: Middle East (UAE), Asia Pacific (Jakarta), Asia Pacific (Osaka), Asia Pacific (Hyderabad), Asia Pacific (Melbourne), Europe (Zurich), Europe (Spain), Israel (Tel Aviv), and Canada West (Calgary). AWS Web Application Firewall (WAF) for REST APIs is now available in two additional regions: Asia Pacific (Kuala Lumpur) and Canada West (Calgary).
HTTP APIs simplify API development for serverless applications with a simpler user interface that includes support for OAuth 2.0 and automatic deployments. Mutual TLS enhances security by authenticating X.509 certificate-based identities at API Gateway. Multi-level base path mappings enable routing requests based on segments in custom domain paths, supporting path-based versioning and traffic redirection. Integration with AWS WAF offers APIs protection against common web exploits through configurable rules that allow, block, or monitor web requests.
Today, we are announcing the general availability of Amazon Bedrock Data Automation (BDA), a feature of Amazon Bedrock that enables developers to automate the generation of valuable insights from unstructured multimodal content such as documents, images, video, and audio to build GenAI-based applications. By leveraging BDA, developers can reduce development time and effort, making it easier to build intelligent document processing, media analysis, and other multimodal data-centric automation solutions. BDA can be used as a standalone feature or as a parser in Amazon Bedrock Knowledge Bases RAG workflows. Further, Amazon Q Business now uses BDA to process multimodal assets and deliver insights.
In this GA release, we improved document accuracy across a variety of document types, enhanced scene-level and full video summarization accuracy, added support for detection of 35,000+ company logos in images and videos, and added support for AWS cross-region inference to optimize routing across regions within your geography to maximize throughput. BDA also added a number of security, governance, and manageability capabilities such as AWS Key Management Service (KMS) Customer Managed Keys (CMKs) support for encryption, AWS PrivateLink to connect directly to the BDA APIs in your virtual private cloud (VPC) instead of connecting over the internet, and tagging of BDA resources and jobs to track costs and enforce tag-based access policies in AWS Identity and Access Management (IAM).
Amazon Bedrock Data Automation is now generally available in the US West (Oregon) and US East (N. Virginia) AWS Regions.
Amazon QuickSight is now available in the AWS GovCloud (US-East) Region. AWS GovCloud (US) Regions are isolated AWS Regions designed to host sensitive data and regulated workloads in the cloud, assisting customers who have United States federal, state, or local government compliance requirements.
Amazon QuickSight is a fast, scalable, and fully managed Business Intelligence service that lets you easily create and publish interactive dashboards across your organization. QuickSight dashboards can be authored in any modern web browser with no clients to install or manage; dashboards can be shared with tens of thousands of users without the need to provision or manage any infrastructure. QuickSight dashboards can also be seamlessly embedded into your applications, portals, and websites to provide rich, interactive analytics for end-users.
With this launch, QuickSight expands to 22 regions, including: US East (Ohio and N. Virginia), US West (Oregon), Europe (Stockholm, Paris, Frankfurt, Ireland, London, Milan and Zurich), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Beijing, Tokyo and Jakarta), Canada (Central), South America (São Paulo), Africa (Cape Town) and AWS GovCloud (US-East, US-West).
AWS Amplify now supports HttpOnly cookies for server-rendered Next.js applications when using Amazon Cognito's Managed Login. This enhancement builds upon existing cookie functionality in server-rendered sites; opting in to the HttpOnly attribute strengthens your application's security posture by blocking client-side JavaScript from accessing cookie contents.
With HttpOnly cookies, your applications gain an additional layer of protection against cross-site scripting (XSS) attacks. This ensures that sensitive information remains secure and is only transmitted between the browser and the server, which is particularly valuable when handling authentication tokens in your web applications. The contents of cookies with the HttpOnly attribute can only be read by the server, requiring your requests to flow through the server before reaching other services.
This feature is now available in all AWS regions where AWS Amplify and Amazon Cognito are supported.
Amazon Connect now supports outbound campaign calling to Brazil in the US East (N. Virginia) and US West (Oregon) regions, making it easier to proactively communicate across voice, SMS, and email for use cases such as delivery notifications, marketing promotions, appointment reminders, or debt collection. Communication capabilities include features such as point-of-dial checks, calling controls for time of day, time zone, number of attempts per contact, and predictive dialing with integrated voicemail detection. A list management capability provided by Amazon Pinpoint can also be used to build customer journeys and multi-channel user contact experiences. Outbound campaigns can be enabled within the Amazon Connect console.
With Amazon Connect outbound campaigns, you pay as you go for high-volume outbound service usage, associated telephony charges, and any monthly target audience charges via Amazon Pinpoint. To learn more, visit our webpage.
AWS announces the preview of new AWS Outposts racks designed specifically for on-premises high throughput, network-intensive workloads. With these new Outposts racks, telecom service providers (telcos) can extend AWS infrastructure and services to telco locations, enabling them to deploy on-premises network functions requiring low latency, high throughput, and real-time performance.
The new Outposts racks feature new Amazon Elastic Compute Cloud (Amazon EC2) 4th Generation Intel Xeon Scalable-based (Sapphire Rapids) bare metal instances along with a high-performance bare metal network fabric. This architecture delivers the low latency and high throughput required for demanding 5G workloads, such as User Plane Function (UPF) and Radio Access Network (RAN) Central Unit (CU) network functions.
Telcos can now use Amazon EKS (Elastic Kubernetes Service) and built-in EKS add-ons to automate deployment and scaling of microservices-based 5G network functions for high throughput and performance. Telcos can now use the same AWS infrastructure, AWS services, APIs, tools, and a common continuous integration and continuous delivery (CI/CD) pipeline wherever their workloads reside. This consistent cloud experience eases operational burden, reduces integration costs, and maximizes new feature development velocity for operators.
The new AWS Outposts racks are currently available in preview in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Singapore).
Today, AWS IoT Device Management announces the preview of managed integrations, a new feature that enables you to simplify control and management of a diverse set of devices across multiple manufacturers and connectivity protocols. The new feature helps you streamline cloud onboarding of Internet of Things (IoT) devices and enables you to control both self-managed and third-party devices, including cloud-based devices, from a single application.
Managed integrations provides cloud and device Software Development Kits (SDKs) for device connectivity and protocol support for ZigBee, Z-Wave, and Wi-Fi specifications, eliminating the need to handle dedicated connectivity protocols from different manufacturers separately. A unified API coupled with a catalog of cloud-to-cloud connectors and 80+ device data model templates enable you to control both proprietary and third-party devices from a single application. Additionally, you can easily process and integrate device data from those devices for building home security, energy management, and elderly care monitoring solutions. Managed integrations for AWS IoT Device Management also provides built-in capabilities for barcode scanning and direct pairing of devices, delivering additional mechanisms to simplify device onboarding and integration complexities.
The managed integrations feature is available in Canada (Central) and Europe (Ireland) AWS Regions. To learn more, see technical documentation and read this blog. To get started, log in to the AWS IoT console or use the AWS Command Line Interface (AWS CLI).
Customers can use regional processing profiles for Amazon Nova understanding models (Amazon Nova Lite, Amazon Nova Micro, and Amazon Nova Pro) in Europe (Stockholm).
Amazon Bedrock is a fully managed service that offers a choice of high-performing large language models (LLMs) and other FMs from leading AI companies via a single API. Amazon Bedrock also provides a broad set of capabilities customers need to build generative AI applications with security, privacy, and responsible AI built in. These capabilities help you build tailored applications for multiple use cases across different industries, helping organizations unlock sustained growth from generative AI while ensuring customer trust and data governance.
AWS CodeBuild managed images now support Node.js 22, Python 3.13, and Go 1.23. These new runtime versions are available on the Linux x86, Arm, Windows, and macOS platforms. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
For CodeBuild managed images based on Linux, you can specify a runtime of your choice in the runtime-versions section of your buildspec file. You can select specific major and minor versions supported by CodeBuild, or define a custom runtime version. Additionally, with this release, we added commonly used tools that are available in GitHub Actions environments to better support customers using CodeBuild as a self-hosted runner.
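As a quick illustration, here is a minimal buildspec that pins the new runtimes on a managed Linux image (a sketch; adjust runtime versions to what your selected image provides):

```yaml
version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 22     # Node.js 22
      python: 3.13   # Python 3.13
      golang: 1.23   # Go 1.23
  build:
    commands:
      - node --version
      - python --version
      - go version
```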
The updated images are available in all regions where CodeBuild is offered. For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
To learn more about Docker images and runtime versions provided by CodeBuild, please visit our documentation or our image repository. To learn more about how to get started with CodeBuild, visit the AWS CodeBuild product page.
CloudWatch RUM, which provides real-time monitoring of web application performance by tracking user interactions, now supports resource-based policies that simplify access for data ingestion to RUM. With resource-based policies, you can specify which AWS Identity and Access Management (IAM) principals have access to ingest data to your RUM app monitors, effectively controlling which clients can write data to RUM. This also allows you to ingest data at higher volumes and gives you greater control over data ingress in RUM.
Using resource-based policies allows you to manage ingestion access to your app monitor without using Amazon Cognito to assume an IAM role and AWS Security Token Service (STS) to obtain security credentials to write data to CloudWatch RUM. This is beneficial for high-throughput use cases where a high volume of requests may be subject to Cognito's quota limits, leading to throttling and potentially failed ingestion into RUM. With a public resource policy, no such limits apply: anyone can send data to CloudWatch RUM, including unauthenticated users and clients. In addition, you can use AWS global condition context keys in these policies to block certain IPs or stop specific clients from sending data to RUM. You can configure these policies in the AWS console or via code using AWS CloudFormation.
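As an illustrative sketch (the account ID, Region, app monitor name, and CIDR range below are placeholders, and the exact policy shape should be checked against the RUM documentation), a resource-based policy could allow public ingestion while blocking a known-bad IP range with the aws:SourceIp global condition key:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPublicRumIngestion",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "rum:PutRumEvents",
      "Resource": "arn:aws:rum:us-east-1:111122223333:appmonitor/my-app-monitor",
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": ["192.0.2.0/24"]
        }
      }
    }
  ]
}
```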
These enhancements are available in all regions where CloudWatch RUM is available at no additional cost to users.
See the documentation to learn more about the feature, or see the user guide to learn how to configure resource-based policies for CloudWatch RUM.
Amazon Cognito now allows customers to customize access tokens for machine-to-machine (M2M) flows, enabling you to implement fine-grained authorization in your applications, APIs, and workloads. M2M authorization is commonly used for automated processes such as scheduled data synchronization tasks, event-driven workflows, microservices communication, or real-time data streaming between systems. In M2M authorization flows, an app client can represent a software system or service that requests access tokens to interact with resources, such as a reporting system or a data processing service. With this launch, customers can customize their access tokens with custom claims (attributes about the app client) and scopes (the level of access that an app client can request to a resource), making it easier to control and manage how their automated systems interact with each other.
Customers can now add custom attributes directly in access tokens, reducing the complexity of authorization logic needed in their application code. For example, customers can customize access tokens with claims that allow an app client for a reporting system to only read data while allowing an app client for a data processing service to both read and modify data. This allows customers to streamline authentication by embedding custom authorization attributes directly into access tokens during the token issuance process.
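One way to implement this is with a pre token generation Lambda trigger. The sketch below assumes the trigger event version that supports access token customization (field names should be verified against the Cognito developer guide); the client IDs, claims, and scopes are hypothetical:

```python
# Hedged sketch of a pre token generation Lambda trigger that customizes
# access tokens issued to M2M (client credentials) app clients.
# Event field names assume the trigger event version documented for
# access token customization; verify against the developer guide.

def lambda_handler(event, context):
    client_id = event["callerContext"]["clientId"]

    # Hypothetical mapping: which claims and scopes each app client receives.
    if client_id == "reporting-system-client-id":       # placeholder app client ID
        claims = {"data_access": "read-only"}
        scopes = ["reports/read"]
    else:
        claims = {"data_access": "read-write"}
        scopes = ["data/read", "data/write"]

    event["response"]["claimsAndScopeOverrideDetails"] = {
        "accessTokenGeneration": {
            "claimsToAddOrOverride": claims,
            "scopesToAdd": scopes,
        }
    }
    return event
```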
Access token customization for M2M authorization is available to Amazon Cognito customers using Essentials or Plus tiers in all AWS Regions where Cognito is available, except the AWS GovCloud (US) Regions. To learn more, refer to the developer guide.
A few weeks ago, Google DeepMind released Gemini 2.0 for everyone, including Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, and Gemini 2.0 Pro (Experimental). All models support at least 1 million input tokens, which makes it easier to do a lot of things, from image generation to creative writing. It's also changed how we convert documents into structured data. Manual document processing is a slow and expensive problem, but Gemini 2.0 changed everything when it comes to chunking PDFs for RAG systems, and can even transform PDFs into insights.
Today, we'll take a deep dive into a multi-step approach to document extraction, showing how you can use Gemini 2.0 to improve your results by combining large language models (LLMs) with structured, externalized rules.
A multi-step approach to document extraction, made easy
A multi-step architecture, as opposed to relying on a single, monolithic prompt, offers significant advantages for robust extraction. This approach begins with modular extraction, where initial tasks are broken down into smaller, more focused prompts targeting specific content locations within a document. This modularity not only enhances accuracy but also reduces the cognitive load on the LLM.
Another benefit of a multi-step approach is externalized rule management. By managing post-processing rules externally, for instance, using Google Sheets or a BigQuery table, we gain the benefits of easy CRUD (Create, Read, Update, Delete) operations, improving both maintainability and version control of the rules. This decoupling also separates the logic of extraction from the logic of processing, allowing for independent modification and optimization of each.
Ultimately, this hybrid approach combines the strengths of LLM-powered extraction with a structured rules engine. LLMs handle the complexities of understanding and extracting information from unstructured data, while the rules engine provides a transparent and manageable system for enforcing business logic and decision-making. The following steps outline a practical implementation.
Step 1: Extraction
Let's test a sample prompt with a configurable set of rules. This hands-on example will demonstrate how easily you can define and apply business logic to extracted data, all powered by Gemini and Vertex AI.
First, we extract data from a document. Let's use Google's 2023 Environmental Report as the source document. We use Gemini with the initial prompt below to extract data. This is not a known schema, but a prompt we've created for the purposes of this story. To create specific response schemas, use controlled generation with Gemini.
<PERSONA>
You are a meticulous AI assistant specializing in extracting key sustainability metrics and performance data from corporate environmental reports. Your task is to accurately identify and extract specific data points from a provided document, ensuring precise values and contextual information are captured. Your analysis is crucial for tracking progress against sustainability goals and supporting informed decision-making.

<INSTRUCTIONS>

**Task:**
Analyze the provided Google Environmental Report 2023 (PDF) and extract the following `key_metrics`. For each metric:

1. **`metric_id`**: A short, unique identifier for the metric (provided below).
2. **`description`**: A brief description of the metric (provided below).
3. **`value`**: The numerical value of the metric as reported in the document. Be precise (e.g., "10.2 million", not "about 10 million"). If a range is given, and a single value is not clearly indicated, you must use the largest of the range.
4. **`unit`**: The unit of measurement for the metric (e.g., "tCO2e", "million gallons", "%"). Use the units exactly as they appear in the report.
5. **`year`**: The year to which the metric applies (2022, unless otherwise specified).
6. **`page_number`**: The page number(s) where the metric's value is found. If the information is spread across multiple pages, list all relevant pages, separated by commas. If the value requires calculations based on the page, list the final answer page.
7. **`context`**: One sentence to put the metric in context.

**Metrics to Extract:**

```json
[
  { "metric_id": "ghg_emissions_total", "description": "Total GHG Emissions (Scope 1, 2 market-based, and 3)" },
  { "metric_id": "ghg_emissions_scope1", "description": "Scope 1 GHG Emissions" },
  { "metric_id": "ghg_emissions_scope2_market", "description": "Scope 2 GHG Emissions (market-based)" },
  { "metric_id": "ghg_emissions_scope3_total", "description": "Total Scope 3 GHG Emissions" },
  { "metric_id": "renewable_energy_capacity", "description": "Clean energy generation capacity from signed agreements (2010-2022)" },
  { "metric_id": "water_replenishment", "description": "Water replenished" },
  { "metric_id": "water_consumption", "description": "Water consumption" },
  { "metric_id": "waste_diversion_landfill", "description": "Percentage of food waste diverted from landfill" },
  { "metric_id": "recycled_material_plastic", "description": "Percentage of plastic used in manufactured products that was recycled content" },
  { "metric_id": "packaging_plastic_free", "description": "Percentage of product packaging that is plastic-free" }
]
```
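For reference, a call along these lines (a minimal sketch using the Vertex AI Python SDK; the project, model ID, and Cloud Storage path are placeholders) sends the PDF and the extraction prompt to Gemini with controlled JSON output:

```python
# Minimal sketch: send the report PDF plus the extraction prompt to Gemini
# on Vertex AI and parse the JSON response. Project, model ID, and GCS path
# are placeholders; adjust to your environment.
import json
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig, Part

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

pdf = Part.from_uri("gs://your-bucket/google-environmental-report-2023.pdf",
                    mime_type="application/pdf")
extraction_prompt = "..."  # the <PERSONA>/<INSTRUCTIONS> prompt shown above

response = model.generate_content(
    [pdf, extraction_prompt],
    generation_config=GenerationConfig(
        temperature=0.0,
        response_mime_type="application/json",  # ask Gemini for JSON output
    ),
)
extracted_data = json.loads(response.text)
```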
The JSON output below, which we'll assign to the variable `extracted_data`, represents the results of the initial data extraction by Gemini. This structured data is now ready for the next critical phase: applying our predefined business rules.
```python
extracted_data = [
  { "metric_id": "ghg_emissions_total", "description": "Total GHG Emissions (Scope 1, 2 market-based, and 3)",
    "value": "14.3 million", "unit": "tCO2e", "year": 2022, "page_number": "23",
    "context": "In 2022 Google's total GHG emissions, including Scope 1, 2 (market-based), and 3, amounted to 14.3 million tCO2e." },
  { "metric_id": "ghg_emissions_scope1", "description": "Scope 1 GHG Emissions",
    "value": "0.23 million", "unit": "tCO2e", "year": 2022, "page_number": "23",
    "context": "In 2022, Google's Scope 1 GHG emissions were 0.23 million tCO2e." },
  { "metric_id": "ghg_emissions_scope2_market", "description": "Scope 2 GHG Emissions (market-based)",
    "value": "0.03 million", "unit": "tCO2e", "year": 2022, "page_number": "23",
    "context": "Google's Scope 2 GHG emissions (market-based) in 2022 totaled 0.03 million tCO2e." },
  { "metric_id": "ghg_emissions_scope3_total", "description": "Total Scope 3 GHG Emissions",
    "value": "14.0 million", "unit": "tCO2e", "year": 2022, "page_number": "23",
    "context": "Total Scope 3 GHG emissions for Google in 2022 reached 14.0 million tCO2e." },
  { "metric_id": "renewable_energy_capacity", "description": "Clean energy generation capacity from signed agreements (2010-2022)",
    "value": "7.5", "unit": "GW", "year": 2022, "page_number": "14",
    "context": "By the end of 2022, Google had signed agreements for a clean energy generation capacity of 7.5 GW since 2010." },
  { "metric_id": "water_replenishment", "description": "Water replenished",
    "value": "2.4 billion", "unit": "gallons", "year": 2022, "page_number": "30",
    "context": "Google replenished 2.4 billion gallons of water in 2022." },
  { "metric_id": "water_consumption", "description": "Water consumption",
    "value": "3.4 billion", "unit": "gallons", "year": 2022, "page_number": "30",
    "context": "In 2022 Google's water consumption totalled 3.4 billion gallons." },
  { "metric_id": "waste_diversion_landfill", "description": "Percentage of food waste diverted from landfill",
    "value": "70", "unit": "%", "year": 2022, "page_number": "34",
    "context": "Google diverted 70% of its food waste from landfills in 2022." },
  { "metric_id": "recycled_material_plastic", "description": "Percentage of plastic used in manufactured products that was recycled content",
    "value": "50", "unit": "%", "year": 2022, "page_number": "32",
    "context": "In 2022 50% of plastic used in manufactured products was recycled content." },
  { "metric_id": "packaging_plastic_free", "description": "Percentage of product packaging that is plastic-free",
    "value": "34", "unit": "%", "year": 2022, "page_number": "32",
    "context": "34% of Google's product packaging was plastic-free in 2022." }
]
```
Step 2: Feed the extracted data into a rules engine
Next, we'll feed this `extracted_data` into a rules engine, which, in our implementation, is another call to Gemini, acting as a powerful and flexible rules processor. Along with the extracted data, we'll provide a set of validation rules defined in the `analysis_rules` variable. This engine, powered by Gemini, will systematically check the extracted data for accuracy, consistency, and adherence to our predefined criteria. Below is the prompt we provide to Gemini to accomplish this, along with the rules themselves.
<PERSONA>
You are a sustainability data analyst responsible for verifying the accuracy and consistency of extracted data from corporate environmental reports. Your task is to apply a set of predefined rules to the extracted data to identify potential inconsistencies, highlight areas needing further investigation, and assess progress towards stated goals. You are detail-oriented and understand the nuances of sustainability reporting.

<INSTRUCTIONS>

**Input:**

1. `extracted_data`: (JSON) The `extracted_data` variable contains the values extracted from the Google Environmental Report 2023, as provided in the previous turn. This is the output from the first Gemini extraction.
2. `analysis_rules`: (JSON) The `analysis_rules` variable contains a JSON string defining a set of rules to apply to the extracted data. Each rule includes a `rule_id`, `description`, `condition`, `action`, and `alert_message`.

**Task:**

1. **Iterate through Rules:** Process each rule defined in the `analysis_rules`.
2. **Evaluate Conditions:** For each rule, evaluate the `condition` using the data in `extracted_data`. Conditions may involve:
   * Accessing specific `metric_id` values within the `extracted_data`.
   * Comparing values across different metrics.
   * Checking for data types (e.g., ensuring a value is a number).
   * Checking page numbers for consistency.
   * Using logical operators (AND, OR, NOT) and mathematical comparisons (>, <, >=, <=, ==, !=).
   * Checking for the existence of data.
3. **Execute Actions:** If a rule's condition evaluates to TRUE, execute the `action` specified in the rule. The action describes *what* the rule is checking.
4. **Trigger Alerts:** If the condition is TRUE, generate the `alert_message` associated with that rule. Include relevant `metric_id` values and page numbers in the alert message to provide context.

**Output:**

Return a JSON array containing the triggered alerts. Each alert should be a dictionary with the following keys:

* `rule_id`: The ID of the rule that triggered the alert.
* `alert_message`: The alert message, potentially including specific values from the `extracted_data`.
`analysis_rules` is a JSON object that contains the business rules we want to apply to the extracted report data. Each rule defines a specific condition to check, an action to take if the condition is met, and an optional alert message if a violation occurs. The power of this approach lies in the flexibility of these rules; you can easily add, modify, or remove them without altering the core extraction process. The beauty of using Gemini is that the rules can be written in human-readable language and can be maintained by non-coders.
```python
analysis_rules = {
  "rules": [
    {
      "rule_id": "AR001",
      "description": "Check if all required metrics were extracted.",
      "condition": "extracted_data contains all metric_ids from the original extraction prompt",
      "action": "Verify the presence of all expected metrics.",
      "alert_message": "Missing metrics in the extracted data. The following metric IDs are missing: {missing_metrics}"
    },
    {
      "rule_id": "AR002",
      "description": "Check if total GHG emissions equal the sum of Scope 1, 2, and 3.",
      "condition": "extracted_data['ghg_emissions_total']['value'] != (extracted_data['ghg_emissions_scope1']['value'] + extracted_data['ghg_emissions_scope2_market']['value'] + extracted_data['ghg_emissions_scope3_total']['value']) AND extracted_data['ghg_emissions_total']['page_number'] == extracted_data['ghg_emissions_scope1']['page_number'] == extracted_data['ghg_emissions_scope2_market']['page_number'] == extracted_data['ghg_emissions_scope3_total']['page_number']",
      "action": "Sum Scope 1, 2, and 3 emissions and compare to the reported total.",
      "alert_message": "Inconsistency detected: Total GHG emissions ({total_emissions} {total_unit}) on page {total_page} do not equal the sum of Scope 1 ({scope1_emissions} {scope1_unit}), Scope 2 ({scope2_emissions} {scope2_unit}), and Scope 3 ({scope3_emissions} {scope3_unit}) emissions on page {scope1_page}. Sum is {calculated_sum}"
    },
    {
      "rule_id": "AR003",
      "description": "Check for unusually high water consumption compared to replenishment.",
      "condition": "extracted_data['water_consumption']['value'] > (extracted_data['water_replenishment']['value'] * 5) AND extracted_data['water_consumption']['unit'] == extracted_data['water_replenishment']['unit']",
      "action": "Compare water consumption to water replenishment.",
      "alert_message": "High water consumption: Consumption ({consumption_value} {consumption_unit}) is more than five times replenishment ({replenishment_value} {replenishment_unit}) on page {consumption_page} and {replenishment_page}."
    }
  ]
}
```
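Putting the pieces together, the second Gemini call might look like the following sketch (same SDK assumptions as the extraction sketch above; `extracted_data` and `analysis_rules` are the structures shown earlier):

```python
# Sketch: use Gemini as the rules engine by passing the extracted data and the
# externalized rules in one request, asking for the triggered alerts as JSON.
import json
from vertexai.generative_models import GenerativeModel, GenerationConfig

model = GenerativeModel("gemini-2.0-flash-001")
rules_engine_prompt = "..."  # the analyst <PERSONA>/<INSTRUCTIONS> prompt shown above

response = model.generate_content(
    [
        rules_engine_prompt,
        "extracted_data = " + json.dumps(extracted_data, indent=2),
        "analysis_rules = " + json.dumps(analysis_rules, indent=2),
    ],
    generation_config=GenerationConfig(
        temperature=0.0,
        response_mime_type="application/json",
    ),
)
triggered_alerts = json.loads(response.text)  # list of {"rule_id": ..., "alert_message": ...}
```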
Step 3: Integrate your insights
Finally, and crucially, integrate the alerts and insights generated by the rules engine into existing data pipelines and workflows; this is where the real value of the multi-step process is unlocked. For our example, we can build robust APIs and systems using Google Cloud tools to automate downstream actions triggered by the rule-based analysis (a brief sketch of one such integration follows the list below). Some examples of downstream tasks are:
Automated task creation: Trigger Cloud Functions to create tasks in project management systems, assigning data verification to the appropriate teams.
Data quality pipelines: Integrate with Dataflow to flag potential data inconsistencies in BigQuery tables, triggering validation workflows.
Vertex AI integration: Leverage Vertex AI Model Registry for tracking data lineage and model performance related to extracted metrics and corrections made.
Dashboard integration: Use Looker, Google Sheets, or Data Studio to display alerts.
Human-in-the-loop trigger: Build a trigger system, using Cloud Tasks, that shows human reviewers which extractions to focus on and double-check.
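As one illustrative sketch (the function name and actions are hypothetical), a small Cloud Run function could receive the triggered alerts and fan them out to downstream systems:

```python
# Hypothetical Cloud Run function (Functions Framework) that receives the
# triggered alerts and routes them to downstream systems such as a task
# queue or a BigQuery data-quality table.
import functions_framework


@functions_framework.http
def handle_alerts(request):
    alerts = request.get_json(silent=True) or []
    for alert in alerts:
        rule_id = alert.get("rule_id")
        message = alert.get("alert_message")
        # Placeholder actions: create a review task, write to a data-quality
        # table, or notify a human reviewer via Cloud Tasks.
        print(f"[{rule_id}] {message}")
    return {"processed": len(alerts)}, 200
```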
Make document extraction easier today
This hands-on approach provides a solid foundation for building robust, rule-driven document extraction pipelines. To get started, explore these resources:
Gemini for document understanding: For a comprehensive, one-stop solution to your document processing needs, check out Gemini for document understanding. It simplifies many common extraction challenges.
Few-shot prompting: Begin your Gemini journey with few-shot prompting. This powerful technique can significantly improve the quality of your extractions with minimal effort, providing examples within the prompt itself.
Fine-tuning Gemini models: When you need highly specialized, domain-specific extraction results, consider fine-tuning Gemini models. This allows you to tailor the model's performance to your exact requirements.
Cloud SQL, Google Cloud's fully managed database service for PostgreSQL, MySQL, and SQL Server workloads, offers strong availability SLAs depending on which edition you choose: a 99.95% SLA, excluding maintenance, for Enterprise edition, and a 99.99% SLA, including maintenance, for Enterprise Plus edition. In addition, Cloud SQL offers numerous high availability and scalability features that are crucial for maintaining business continuity and minimizing downtime, especially for mission-critical databases.
These features can help address some common database deployment challenges:
Combined read/write instances: Using a single instance for both reads and writes creates a single point of failure. If the primary instance goes down, both read and write operations are impacted. In the event that your storage is full and auto-scaling is disabled, even a failover would not help.
Downtime during maintenance: Planned maintenance can disrupt business operations.
Time-consuming scaling: Manually scaling instance size for planned workload spikes is a lengthy process that requires significant planning.
Complex cross-region disaster recovery: Setting up and managing cross-region DR requires manual configuration and connection string updates after a failover.
In this blog, we show you how to maximize your business continuity efforts with Cloud SQL's high availability and scalability features, as well as how to use Cloud SQL Enterprise Plus features to build resilient database architectures that can handle workload spikes, unexpected outages, and read scaling needs.
Architecting a highly available and robust database
Using the Cloud SQL high availability feature, which automatically fails over to a standby instance, is a good starting point but not sufficient: scenarios such as storage full issues, regional outages, or failover problems can still cause disruptions. Separating read workloads from write workloads is essential for a more robust architecture.
A best-practice approach involves implementing Cloud SQL read replicas alongside high availability. Read traffic should be directed to dedicated read-replica instances, while write operations are handled by the primary instance. You can enable high availability either on the primary, the read replica(s), or both, depending on your specific requirements. This separation helps ensure that the primary can serve production traffic predictably, and that read operations can continue uninterrupted via the read replicas even when there is downtime.
Below is a sample regional architecture with high availability and read-replica enabled.
You can deploy this architecture regionally across multiple zones or extend it cross-regionally for disaster recovery and geographically-distributed read access. A regional deployment with a highly available primary and a highly available read replica that spans three availability zones provides resilience against zonal failures: Even if two zones fail, the database remains accessible for both read and write operations after failover. Cross-region read replicas enhance this further, providing regional DR capabilities.
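As a rough sketch of provisioning this architecture programmatically with the Cloud SQL Admin API (instance names, machine tier, and database version are placeholders; check the current API reference for exact fields and required permissions):

```python
# Sketch: create an HA (regional) Cloud SQL primary and a read replica
# using the Cloud SQL Admin API. Names, tier, and version are placeholders.
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1")
project = "your-project-id"

primary = {
    "name": "orders-primary",
    "region": "us-central1",
    "databaseVersion": "POSTGRES_15",
    "settings": {
        "tier": "db-custom-4-16384",
        "availabilityType": "REGIONAL",   # HA: automatic standby in another zone
    },
}
sqladmin.instances().insert(project=project, body=primary).execute()

replica = {
    "name": "orders-read-replica",
    "region": "us-central1",
    "masterInstanceName": "orders-primary",  # serve read traffic separately
    "settings": {"tier": "db-custom-4-16384"},
}
sqladmin.instances().insert(project=project, body=replica).execute()
```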
Cloud SQL Enterprise Plus features
Cloud SQL Enterprise Plus offers significant advantages for performance and availability:
Enhanced hardware: Run databases on high-performance hardware with up to 128 vCPUs and 824GB of RAM.
Data cache: Enable data caching for faster read performance.
Near-zero downtime operations: Experience near-zero downtime maintenance and sub-second (<1s) downtime for instance scaling.
Advanced disaster recovery: Streamline disaster recovery with failover to cross-region DR-Replica and automatic reinstatement of the old primary. The application can still connect using the same write endpoint, which is automatically assigned to the new primary after failover.
Enterprise Plus edition addresses the previously mentioned challenges:
Improved performance: Benefit from higher core-to-memory ratios for better database performance.
Faster reads: Data caching improves read performance for read-heavy workloads. Read-cache can be enabled in the primary, the read-replica, or both as needed.
Easy scaling: Scale instances quickly with minimal downtime (sub-second) to handle traffic spikes or planned events. Scale the instance down when traffic is low with sub-second downtime.
Minimized maintenance downtime: Reduce downtime during maintenance to less than a second and provide better business continuity.
Handle regional failures: Easily fail over to a cross-region DR replica, and Cloud SQL automatically rebuilds your architecture as the original region recovers. This lessens the hassle of DR drills and helps ensure application availability.
Automatic IP address re-pointing: Leverage the write endpoint to automatically connect to the current primary after a switchover or failover, so you don't need to make any IP address changes on the application end.
To test out these benefits quickly, there's an easy, near-zero downtime upgrade option from Cloud SQL Enterprise edition to Enterprise Plus edition. To further reduce the impact of maintenance, consider the following practices:
Staging environment testing: To identify potential issues, use the maintenance timing feature to deploy maintenance to test/staging environments at least a week before production.
Read-replica maintenance: Apply self-service maintenance to one of the read replicas before the primary instance to avoid simultaneous downtime for read and write operations. Make sure that the primary and other replicas are updated shortly afterwards, as we recommend maintaining the same maintenance version in the primary as well as all the other replicas.
Maintenance window: Always configure a maintenance window during off-peak hours to control when maintenance is performed.
Maintenance notifications: Opt in to maintenance notifications to make sure you receive an email at least one week before scheduled maintenance.
Reschedule maintenance: Use the reschedule maintenance feature if a maintenance activity conflicts with a critical business period.
Deny maintenance period: Use the deny maintenance period feature to postpone maintenance for up to 90 days during sensitive periods.
By combining these strategies, you can build highly available and scalable database solutions in Cloud SQL, helping to ensure your business continuity and minimize downtime. Refer to the maintenance FAQ for more detailed information.
As a technology leader and a steward of company resources, you need to understand these costs: doing so isn't just prudent, it's essential for sustainable AI adoption. To help, we'll unveil a comprehensive approach to understanding and managing your AI costs on Google Cloud, ensuring your organization captures maximum value from its AI investments.
Whether you're just beginning your AI journey or scaling existing solutions, this approach will equip you with the insights needed to make informed decisions about your AI strategy.
Why understanding AI costs matters now
Google Cloud offers a vast and ever-expanding array of AI services, each with its own pricing structure. Without a clear understanding of these costs, you risk budget overruns, stalled projects, and ultimately, a failure to realize the full potential of your AI investments. This isn't just about saving money; it's about responsible AI development, building solutions that are both innovative and financially sustainable.
Breaking down the Total Cost of Ownership (TCO) for AI on Google Cloud
Let's dissect the major cost components of running AI workloads on Google Cloud:
Model serving cost: The cost of running your trained AI model to make predictions (inference). This is often a per-request or per-unit-of-time cost. Example Google Cloud services: OOTB models available in Vertex AI, Vertex AI Prediction, GKE (if self-managing), Cloud Run functions (for serverless inference).
Training and tuning costs: The expense of training your AI model on your data and fine-tuning it for optimal performance. This includes compute resources (GPUs/TPUs) and potentially the cost of the training data itself. Example Google Cloud services: Vertex AI Training, Compute Engine (with GPUs/TPUs), GKE or Cloud Run (with GPUs/TPUs).
Cloud hosting costs: The fundamental infrastructure costs for running your AI application, including compute, networking, and storage. Example Google Cloud services: Compute Engine, GKE or Cloud Run, Cloud Storage, Cloud SQL (if your application uses a database).
Training data storage and adapter layers costs: The cost of storing your training data and any "adapter layers" (intermediate representations or fine-tuned model components) created during the training process. Example Google Cloud services: Cloud Storage, BigQuery.
Application layer and setup costs: The expenses associated with any additional cloud services needed to support your AI application, such as API gateways, load balancers, monitoring tools, etc. (the worked example below uses Cloud Run functions and Cloud Logging).
Operational support costs: The ongoing costs of maintaining and supporting your AI model, including monitoring performance, troubleshooting issues, and potentially retraining the model over time. Example Google Cloud services: Google Cloud Support, internal staff time, potential third-party monitoring tools.
Let's estimate costs with an example
Let's illustrate this with a hypothetical, yet realistic, generative AI use case: imagine you're a retail customer with an automated customer support chatbot.
Scenario: A medium-sized e-commerce company wants to deploy a chatbot on their website to handle common customer inquiries (order status, returns, product information and more). They plan to use a pre-trained language model (like one available through Vertex AI Model Garden) and fine-tune it on their own customer support data.
Assumptions:
Model: Fine-tuning a low latency language model (in this case we will use Gemini 1.5 Flash).
Training data: 1 million customer support conversations (text data).
Traffic: 100K chatbot interactions per day.
Hosting: Vertex AI Prediction for serving the model.
Fine-tuning frequency: Monthly.
Cost estimation
As the retail customer in this example, here’s how you might approach this.
1. First, discover your model serving cost:
Vertex AI Prediction (Gemini 1.5 Flash for chat) uses modality-based pricing, so in this case, since our input and output are text, the usage unit is characters. Let's assume an average of 1,000 input characters and 500 output characters per interaction.
Total model serving cost per month (~30 days): ~$337
Serving cost of the Gemini 1.5 Flash model
2. Second, identify your training and tuning costs:
In this scenario, we aim to enhance the model's accuracy and relevance to our specific use case through fine-tuning. This involves inputting a million past chat interactions, enabling the model to deliver more precise and customized interactions.
Cost per training tokens: $8 / M tokens
Cost per training characters: $2 / M characters (where each token approximately equates to 4 characters)
Tuning cost (subsequent month): 100,000 conversations (new training data) * 1,500 characters (input + output) * $2 / 1,000,000 characters = $300
3. Third, understand the cloud hosting costs:
Since we're using Vertex AI Prediction, the underlying infrastructure is managed by Google Cloud. The cost is included in the per-request pricing. However, if we were self-managing the model on GKE or Compute Engine, we'd need to factor in VM costs, GPU/TPU costs (if applicable), and networking costs. For this example, we assume this is $0, as it is part of the Vertex AI cost.
4. Fourth, define the training data storage and adapter layers costs:
The infrastructure costs for deploying machine learning models often raise concerns, but the data storage components can be economical at moderate scales. When implementing a conversational AI system, storing both the training data and the specialized model adapters represents a minor fraction of the overall costs. Let's break down these storage requirements and their associated expenses.
1M conversations, assuming an average size of 5KB per conversation, would be roughly 5GB of data.
Cloud Storage cost for 5GB is negligible: $0.1 per month.
Adapter layers (fine-tuned model weights) might add another 1GB of storage. This would still be very inexpensive: $0.02 per month.
Total storage cost per month: < $1/month
5. Fifth, consider the application layer and setup costs:
This depends heavily on the specific application. In this case, we are using Cloud Run functions and Cloud Logging: Cloud Run handles pre- and post-processing of chatbot requests (e.g., formatting, database lookups). Let's assume we use request-based billing, so we are only charged when a request is processed. In this example we are processing 3M requests per month (100K * 30) and assuming 1 second average execution time: $14.30
Cloud Run function cost for request-based billing
Cloud Logging and Monitoring for tracking chatbot performance and debugging issues. Let's estimate 100GB of logging volume (which is on the higher end) and retaining the logs for 3 months: $28
Cloud Logging costs for storage and retention
Total application layer cost per month: ~$40
6. Finally, incorporate the Operational support cost:
This is the hardest to estimate, as it depends on the internal team's size and responsibilities. Let's assume a conservative estimate of 5 hours per week of an engineer's time dedicated to monitoring and maintaining the chatbot, at an hourly rate of $100.
Total operational support cost per month: 5 hours/week * 4 weeks/month * $100/hour = $2000
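Pulling the pieces together, a quick back-of-the-envelope script using the per-component figures estimated above sums the monthly total:

```python
# Back-of-the-envelope monthly cost roll-up using the estimates above.
conversations_per_month = 100_000          # new training data per month
chars_per_conversation = 1_500             # input + output characters
tuning_rate_per_million_chars = 2.0        # $2 per 1M training characters

tuning = (conversations_per_month * chars_per_conversation
          / 1_000_000 * tuning_rate_per_million_chars)   # = $300

monthly_costs = {
    "model_serving": 337.0,                 # Vertex AI serving estimate
    "training_tuning": tuning,              # monthly fine-tuning
    "cloud_hosting": 0.0,                   # included in Vertex AI per-request pricing
    "storage": 1.0,                         # training data + adapter layers (< $1)
    "application_layer": 40.0,              # Cloud Run functions + Logging/Monitoring
    "operational_support": 5 * 4 * 100.0,   # 5 h/week * 4 weeks * $100/h
}
print(f"Tuning: ${tuning:.0f}")
print(f"Total:  ${sum(monthly_costs.values()):,.0f} per month")
```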
You can find the full cost estimate here. Note that this does not include tuning and operational costs, as they are not available in the pricing export yet.
Once you have a good understanding of your AI costs, it is important to develop an optimization strategy that encompasses infrastructure choices, resource utilization, and monitoring practices to maintain performance while controlling expenses. By understanding the various cost components and leveraging Google Cloud's tools and resources, you can confidently embark on your AI journey. Cost management isn't a barrier; it's an enabler. It allows you to experiment, innovate, and build transformative AI solutions in a financially responsible way.
Rosetta 2 is Apple’s translation technology for running x86-64 binaries on Apple Silicon (ARM64) macOS systems.
Rosetta 2 translation creates a cache of Ahead-Of-Time (AOT) files that can serve as valuable forensic artifacts.
Mandiant has observed sophisticated threat actors leveraging x86-64 compiled macOS malware, likely due to broader compatibility and relaxed execution policies compared to ARM64 binaries.
Analysis of AOT files, combined with FSEvents and Unified Logs (with a custom profile), can assist in investigating macOS intrusions.
Introduction
Rosetta 2 (internally known on macOS as OAH) was introduced in macOS 11 (Big Sur) in 2020 to enable binaries compiled for x86-64 architectures to run on Apple Silicon (ARM64) architectures. Rosetta 2 translates signed and unsigned x86-64 binaries just-in-time or ahead-of-time at the point of execution. Mandiant has identified several new highly sophisticated macOS malware variants over the past year, notably compiled for x86-64 architecture. Mandiant assessed that this choice of architecture was most likely due to increased chances of compatibility on victim systems and more relaxed execution policies. Notably, macOS enforces stricter code signing requirements for ARM64 binaries compared to x86-64 binaries running under Rosetta 2, making unsigned ARM64 binaries more difficult to execute. Despite this, in the newly identified APT malware families observed by Mandiant over the past year, all were self-signed, likely to avoid other compensating security controls in place on macOS.
The Rosetta 2 Cache
When an x86-64 binary is executed on a system with Rosetta 2 installed, the Rosetta 2 Daemon process (oahd) checks whether an ahead-of-time (AOT) file already exists for the binary within the Rosetta 2 cache directory on the Data volume at /var/db/oah/<UUID>/. The UUID value in this file path appears to be randomly generated on install or update. If an AOT file does not exist, one is created by writing translation code to a .in_progress file and then renaming it to a .aot file with the same name as the original binary. The Rosetta 2 Daemon process then runs the translated binary.
The /var/db/oah directory and its children are protected and owned by the OAH Daemon user account _oahd. Interaction with these files by other user accounts is only possible if System Integrity Protection (SIP) is disabled, which requires booting into recovery mode.
The directories under /var/db/oah/<UUID>/ are binary UUID values that correspond to translated binaries. Specifically, these binary UUID values are SHA-256 hashes generated from a combination of the binary file path, the Mach-O header, timestamps (created, modified, and changed), size, and ownership information. If the same binary is executed with any of these attributes changed, a new Rosetta AOT cache directory and file is created. While the content of the binaries is not part of this hashing function, changing the content of a file on an APFS file system will update the changed timestamp, which effectively means content changes can cause the creation of a new binary UUID and AOT file. Ultimately, the mechanism is designed to be extremely sensitive to any changes to x86-64 binaries at the byte and file system levels to reduce the risk of AOT poisoning.
Figure 1: Sample Rosetta 2 cache directory structure and contents
The Rosetta 2 cache binary UUID directories and the AOT files they contain appear to persist until a macOS system update occurs. System updates have been found to cause the deletion of the cache directory (the random UUID directory). After the upgrade, a directory with a different UUID value is created, and new binary UUID directories and AOT files are created upon the first launch of x86-64 binaries thereafter.
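For triage, a simple sketch like the following can inventory AOT files and their timestamps from a Rosetta 2 cache directory collected from a forensic image (on a live system, SIP restricts access to this path):

```python
# Sketch: inventory AOT files in a Rosetta 2 cache directory copied or
# mounted from a forensic image. Expected layout:
#   <cache_root>/<random UUID>/<binary UUID>/<name>.aot
import sys
from datetime import datetime, timezone
from pathlib import Path


def inventory_aot(cache_root: str) -> None:
    for aot in sorted(Path(cache_root).rglob("*.aot")):
        st = aot.stat()
        # st_birthtime (creation time) is available on macOS/APFS; fall back
        # to modification time if the attribute is missing.
        created = datetime.fromtimestamp(
            getattr(st, "st_birthtime", st.st_mtime), tz=timezone.utc)
        print(f"{created.isoformat()}  {st.st_size:>10}  {aot}")


if __name__ == "__main__":
    # e.g. python aot_inventory.py /Volumes/evidence/var/db/oah
    inventory_aot(sys.argv[1] if len(sys.argv) > 1 else "/var/db/oah")
```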
Translation and Universal Binaries
When universal binaries (containing both x86-64 and ARM64 code) are executed by a x86-64 process running through Rosetta 2 translation, the x86-64 version of these binaries is executed, resulting in the creation of AOT files.
Figure 2: Overview of execution of universal binaries with x86-64 processes translated through Rosetta 2 versus ARM64 processes
In a Democratic People’s Republic of Korea (DPRK) crypto heist investigation, Mandiant observed a x86-64 variant of the POOLRAT macOS backdoor being deployed and the attacker proceeding to execute universal system binaries including ping, chmod, sudo, id, and cat through the backdoor. This resulted in AOT files being created and provided evidence of attacker interaction on the system through the malware (Figure 5).
In some cases, the initial infection vector in macOS intrusions has involved legitimate x86-64 code that executes malware distributed as universal binaries. Because the initial x86-64 code runs under Rosetta 2, the x86-64 versions of malicious universal binaries are executed, leaving behind Rosetta 2 artifacts, including AOT files. In one case, a malicious Python 2 script led to the downloading and execution of a malicious universal binary. The Python 2 interpreter ran under Rosetta 2 since no ARM64 version was available, so the system executed the x86-64 version of the malicious universal binary, resulting in the creation of AOT files. Despite the attacker deleting the malicious binary later, we were able to analyze the AOT file to understand its functionality.
Unified Logs
The Rosetta 2 Daemon emits logs to the macOS Unified Log; however, the binary name values are marked as private. These values can be configured to be shown in the logs with a custom profile installed. Informational logs are recorded for AOT file lookups, when cached AOT files are available and utilized, and when translation occurs and completes. For binaries that are not configured to log to the Unified Log and are not launched interactively, in some cases this was found to be the only evidence of execution within the Unified Logs. Execution may be correlated with other supporting artifacts; however, this is not always possible.
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Aot lookup request for <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Translating image <private> -> <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Translation finished for <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Aot lookup request for <private>
0x21b1afc Info 0x0 1596 0 oahd: <private>(1880):
Using cached aot <private> -> <private>
Figure 3: macOS Unified Logs showing Rosetta lookups, using cached files, and translating with private data disabled (default)
0x2ec304 Info 0x0 668 0 oahd: my_binary (Re(34180): Aot lookup request for /Users/Mandiant/my_binary
0x2ec304 Info 0x0 668 0 oahd: my_binary (Re(34180): Translating image /Users/Mandiant/my_binary -> /var/db/oah/237823680d6bdb1e9663d60cca5851b63e79f6c8e884ebacc5f285253c3826b8/1c65adbef01f45a7a07379621b5800fc337fc9db90d8eb08baf84e5c533191d9/my_binary.in_progress
0x2ec304 Info 0x0 668 0 oahd: my_binary (Re(34180): Translation finished for /Users/Mandiant/my_binary
0x2ec304 Info 0x0 668 0 oahd: my_binary(34180): Aot lookup request for /Users/Mandiant/my_binary
0x2ec304 Info 0x0 668 0 oahd: my_binary(34180): Using cached aot /Users/Mandiant/my_binary -> /var/db/oah/237823680d6bdb1e9663d60cca5851b63e79f6c8e884ebacc5f285253c3826b8/1c65adbef01f45a7a07379621b5800fc337fc9db90d8eb08baf84e5c533191d9/my_binary.aot
Figure 4: macOS Unified Logs showing Rosetta lookups, using cached files, and translating with private data enabled (with custom profile installed)
FSEvents
FSEvents can be used to identify historical execution of x86-64 binaries even if Unified Logs or files in the Rosetta 2 Cache are not available or have been cleared. These records will show the creation of directories within the Rosetta 2 cache directory, the creation of .in_progress files, and then the renaming of the file to the AOT file, which will be named after the original binary.
Figure 5: Decoded FSEvents records showing the translation of a x86-64 POOLRAT variant on macOS, and subsequent universal system binaries executed by the malware as x86-64
AOT File Analysis
The AOT files within the Rosetta 2 cache can provide valuable insight into historical evidence of execution of x86-64 binaries. In multiple cases over the past year, Mandiant identified macOS systems as the initial entry vector used by APT groups targeting cryptocurrency organizations. In the majority of these cases, Mandiant identified evidence of the attackers deleting the malware on these systems within a few minutes of a cryptocurrency heist being perpetrated. However, the AOT files were left in place, likely due to the protection by SIP and the relative obscurity of this forensic artifact.
From a forensic perspective, the creation and modification timestamps on these AOT files provide evidence of the first time a specified binary was executed on the system with a unique combination of the attributes used to generate the SHA-256 hash. These timestamps can be corroborated with other artifacts related to binary execution where available (for example, Unified Logs or ExecPolicy, XProtect, and TCC Databases), and file system activity through FSEvents records, to build a more complete picture of infection and possible attacker activity if child processes were executed.
Where multiple AOT files exist for the same origin binary under different Binary UUID directories in the Rosetta 2 cache, and the content (file hashes) of those AOT files is the same, this is typically indicative of a change in file data sections, or more commonly, file system metadata only.
Mandiant has previously shown that AOT files can be analyzed and used for malware identification through correlation of symbols. AOT files are Mach-O binaries that contain ARM64 instructions translated from the original x86-64 code. They contain jump-backs into the original binary and contain no API calls to reference. Certain functionality can be determined through reverse engineering of AOT files; however, no static data, including network-based indicators or configuration data, is typically recoverable. In one macOS downloader observed in a notable DPRK cryptocurrency heist, Mandiant observed developer file path strings as part of the basic Mach-O information contained within the AOT file. The original binary was not recovered due to the attacker deleting it after the heist, so this provided useful data points to support threat actor attribution and malware family assessment.
Figure 6: Interesting strings from an AOT file related to a malicious DPRK downloader that was unrecoverable
In any case, determining malware functionality is more effective using the original complete binary instead of the AOT file, because the AOT file lacks much of the contextual information present in the original binary. This includes static data and complete Mach-O headers.
Poisoning AOT Files
Much has been written within the industry about the potential for the poisoning of the Rosetta 2 cache through modification or introduction of AOT files. Where SIP is disabled, this is a valid attack vector. Mandiant has not yet seen this technique in the wild; however, during hunting or investigation activities, it is advisable to be on the lookout for evidence of AOT poisoning. The best way to do this is by comparing the contents of the ARM64 AOT files with what would be expected based on the original x86-64 executable. This can be achieved by taking the original x86-64 executable and using it to generate a known-good AOT file, then comparing this to the AOT file in the cache. Discrepancies, particularly the presence of injected shellcode, could indicate AOT poisoning.
Conclusion
There are several forensic artifacts on macOS that may record historical evidence of binary execution. However, in advanced intrusions involving forensically aware attackers, deleted original binaries, and no additional security monitoring solutions, combining FSEvents, Unified Logs, and, crucially, residual AOT files on disk has provided the evidence needed to reconstruct intrusions on macOS systems.
Whilst signed macOS ARM64 binaries may be the future, for now AOT files and the artifacts surrounding them should be reviewed in analysis of any suspected macOS intrusion and leveraged for hunting opportunities wherever possible.
The behavior identified in the cases presented here was identified on various versions of macOS between 13.5 and 14.7.2. Future or previous versions of macOS and Rosetta 2 may behave differently.
Acknowledgements
Special thanks to Matt Holley, Mohamed El-Banna, Robert Wallace, and Adrian Hernandez.
Amazon Connect now allows agents to exchange shifts with each other, providing greater schedule flexibility without compromising service levels. With this launch, agents can initiate shift trades directly, allowing them to manage unexpected life events without using time off. Additionally, contact center managers can now automate some approvals while ensuring others are approved manually — reducing admin work without sacrificing controls when needed. For example, supervisors can automate approvals for agents handling non-critical tasks, such as routine customer inquiries, while manually approving requests from agents who handle sensitive customer segments, like healthcare or high-value enterprise accounts.
Amazon Data Lifecycle Manager now supports AWS PrivateLink to connect directly to the Amazon Data Lifecycle Manager APIs in your virtual private cloud (VPC) instead of connecting over the internet.
Customers create Amazon Data Lifecycle Manager policies to automate the creation, retention, and management of EBS Snapshots and EBS-backed Amazon Machine Images (AMIs). When you use AWS PrivateLink to access Amazon Data Lifecycle Manager APIs, communication between your VPC and Amazon Data Lifecycle Manager API is conducted privately within the AWS network, providing a secure pathway for your data. An AWS PrivateLink endpoint connects your VPC directly to the Amazon Data Lifecycle Manager API.
This feature is available in all AWS Regions where Amazon Data Lifecycle Manager is available. You can create an AWS PrivateLink to connect to Amazon Data Lifecycle Manager using the AWS Management Console or AWS Command Line Interface (AWS CLI) commands. To learn more about using AWS PrivateLink, please refer to our documentation.
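As an illustrative boto3 sketch (the VPC, subnet, and security group IDs are placeholders, and the endpoint service name should be verified for your Region in the PrivateLink documentation):

```python
# Sketch: create an interface VPC endpoint for the Data Lifecycle Manager API.
# IDs are placeholders; the service name string is an assumption to verify.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.dlm",   # assumed endpoint service name
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```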
Amazon Connect Contact Lens now provides generative AI-powered contact categorization in five additional regions, making it easy to identify top drivers, customer experience, and agent behavior for your contacts. With this launch, you can use natural language instructions to define criteria to automatically categorize customer contacts (e.g., “show me calls where customers attempted payment”). Contact Lens automatically labels interactions matching your criteria and extracts relevant conversation points. In addition, you can receive alerts and generate tasks on categorized contacts, and search for contacts using the automated labels. This feature helps managers easily categorize contacts for scenarios such as identifying customer interest in specific products, assessing customer satisfaction, monitoring whether agents exhibited professional behavior on calls, and more.
This feature is supported in English and is available in five additional AWS Regions: Europe (London), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and Canada (Central). To learn more, please visit our documentation and our webpage. This feature is included in the Contact Lens conversational analytics price at no additional cost. For information about Contact Lens pricing, please visit our pricing page.
Amazon Bedrock announces the preview launch of the Session Management APIs, a new capability that enables developers to simplify state and context management for generative AI (GenAI) applications built with popular open-source frameworks such as LangGraph and LlamaIndex. Session Management APIs provide an out-of-the-box solution that enables developers to securely manage state and conversation context across multi-step GenAI workflows, eliminating the need to build, maintain, or scale custom backend solutions.
By preserving session state between interactions, Session Management APIs enhance workflow continuity, enabling GenAI applications, such as virtual assistants and multi-agent research workflows, that require persistent context across extended interactions. Developers can use this capability to checkpoint workflow stages, save intermediate states, and resume tasks from points of failure or interruption. Additionally, they can pause and replay sessions and leverage detailed traces to debug and enhance their GenAI applications. By treating session as a first-class resource, this capability enables developers to enforce granular access control through AWS Identity and Access Management (IAM) and encrypt data using AWS Key Management Service (KMS), ensuring that data from different user sessions is securely isolated and supporting multi-tenant applications with strong privacy protections.
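As a rough sketch (operation names and parameters should be confirmed against the preview documentation for the Session Management APIs):

```python
# Rough sketch: create a session to carry state across a multi-step GenAI
# workflow, then end it when the interaction completes. Operation and
# parameter names are assumptions to verify against the preview docs.
import boto3

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

session = client.create_session()
session_id = session["sessionId"]

# ... run your LangGraph/LlamaIndex workflow, checkpointing intermediate
# state and traces against session_id ...

client.end_session(sessionIdentifier=session_id)
```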
Session Management APIs are now available in preview in US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Zurich), and South America (São Paulo). To learn more, visit the technical documentation.
AWS CodeBuild now supports macOS 15.2 as a new major version for macOS builds. This allows developers to build and test their applications in the latest macOS environment. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
The new major version includes the latest Xcode 16.2, Fastlane for iOS automation, and Finch for container management in the macOS environment. We have also updated our existing macOS AMI to version 14.7, ensuring customers have access to the latest security updates and improvements.
The new macOS 15.2 is available in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Sydney). For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
To learn more about the runtimes supported by macOS, please visit our documentation. To learn more about how to get started with CodeBuild, visit the AWS CodeBuild product page.