Today, we are announcing the availability of AWS Backup support for Amazon FSx for OpenZFS in 13 additional AWS Regions. AWS Backup is a policy-based, fully managed, and cost-effective solution that enables you to centralize and automate data protection of AWS services (spanning compute, storage, and databases) and third-party applications. With this launch, AWS Backup customers can better meet business continuity, disaster recovery, and compliance requirements by protecting their Amazon FSx for OpenZFS file systems in these additional Regions.
AWS Backup support for Amazon FSx for OpenZFS is added in the following Regions: Africa (Cape Town), Asia Pacific (Hyderabad, Jakarta, Osaka), Europe (Milan, Paris, Spain, Zurich), Israel (Tel Aviv), Middle East (Bahrain, UAE), South America (São Paulo), and US West (N. California).
Today, we are announcing the availability of AWS Backup logically air-gapped vault support for Amazon FSx for Lustre, Amazon FSx for Windows File Server, and Amazon FSx for OpenZFS. A logically air-gapped vault is a type of AWS Backup vault that allows secure sharing of backups across accounts and organizations and supports direct restore to reduce recovery time from a data loss event. A logically air-gapped vault stores immutable backup copies that are locked by default and isolated with encryption using AWS-owned keys.
You can now protect your Amazon FSx file system in logically air-gapped vaults in either the same account or across other accounts and Regions. This helps reduce the risk of downtime, ensure business continuity, and meet compliance and disaster recovery requirements.
You can get started using the AWS Backup console, API, or CLI. Target Amazon FSx backups to a logically air-gapped vault by specifying it as a copy destination in your backup plan. Share the vault for recovery or restore testing with other accounts using AWS Resource Access Manager (RAM). Once shared, you can initiate direct restore jobs from that account, eliminating the overhead of copying backups first.
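As an illustration, here is a minimal boto3 sketch of a backup plan that targets a logically air-gapped vault as a copy destination; the vault names, ARN, account ID, and schedule below are hypothetical placeholders.

```python
import boto3

backup = boto3.client("backup")

# Hypothetical plan: daily FSx backups land in a standard vault, with a copy
# action that targets a logically air-gapped vault (placeholder ARN).
backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "fsx-with-airgapped-copy",
        "Rules": [
            {
                "RuleName": "daily-fsx-backup",
                "TargetBackupVaultName": "primary-vault",
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {"DeleteAfterDays": 35},
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": (
                            "arn:aws:backup:us-west-1:111122223333:"
                            "backup-vault:my-logically-airgapped-vault"
                        ),
                        "Lifecycle": {"DeleteAfterDays": 365},
                    }
                ],
            }
        ],
    }
)
```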
AWS Backup support for the three Amazon FSx file systems is available in all the Regions where logically air-gapped vault and respective Amazon FSx file systems are supported. For more information, visit the AWS Backup product page, and documentation.
Today, Amazon Web Services (AWS) announces the availability of Amazon GuardDuty Malware Protection for Amazon S3 in the AWS GovCloud (US) Regions. This expansion of GuardDuty Malware Protection allows you to scan objects newly uploaded to Amazon S3 buckets for potential malware, viruses, and other suspicious content, and take action to isolate them before they are ingested into downstream processes.
GuardDuty helps customers protect millions of Amazon S3 buckets and AWS accounts. GuardDuty Malware Protection for Amazon S3 is fully managed by AWS, alleviating the operational complexity and overhead that normally comes with managing a data-scanning pipeline, with compute infrastructure operated on your behalf. This feature also gives application owners more control over the security of their organization’s S3 buckets; they can enable GuardDuty Malware Protection for S3 even if core GuardDuty is not enabled in the account. Scan results are automatically published to Amazon EventBridge, so application owners can build downstream workflows, such as isolating objects to a quarantine bucket, or define bucket policies using tags that prevent users or applications from accessing certain objects.
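For example, a minimal boto3 sketch of a tag-based bucket policy that blocks reads of objects GuardDuty has flagged; the bucket name is a placeholder, and the GuardDutyMalwareScanStatus tag key and THREATS_FOUND value follow the documented scan-result tagging behavior.

```python
import json

import boto3

s3 = boto3.client("s3")

bucket = "example-upload-bucket"  # placeholder bucket name

# Deny object reads when GuardDuty has tagged the object as containing a threat.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BlockObjectsFlaggedByGuardDuty",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringEquals": {
                    "s3:ExistingObjectTag/GuardDutyMalwareScanStatus": "THREATS_FOUND"
                }
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```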
GuardDuty Malware Protection for Amazon S3 is available in all AWS Regions where GuardDuty is available, excluding China Regions.
AWS Amplify Hosting is excited to offer Skew Protection, a powerful feature that guarantees version consistency across your deployments. This feature ensures frontend requests are always routed to the correct server backend version—eliminating version skew and making deployments more reliable.
You can enable this feature at the branch level in the Amplify Console under App Settings → Branch Settings. There is no additional cost associated with this feature and it is available to all customers.
This feature is available in all 20 AWS Amplify Hosting regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Canada (Central), Europe (Frankfurt), Europe (Stockholm), Europe (Milan), Europe (Ireland), Europe (London), Europe (Paris), Middle East (Bahrain), and South America (São Paulo).
AWS announces the general availability of one new larger size (48xlarge) on Amazon EC2 I8g instances in the US East (N. Virginia) and US West (Oregon) Regions. The new size expands the I8g portfolio to support up to 192 vCPUs, providing additional compute options to scale up existing workloads or run larger applications that need additional CPU and memory. I8g instances are powered by AWS Graviton4 processors that deliver up to 60% better compute performance compared to previous-generation I4g instances. I8g instances use the latest third-generation AWS Nitro SSDs, local NVMe storage that delivers up to 65% better real-time storage performance per TB while offering up to 50% lower storage I/O latency and up to 60% lower storage I/O latency variability. These instances are built on the AWS Nitro System, which offloads CPU virtualization, storage, and networking functions to dedicated hardware and software, enhancing the performance and security of your workloads.
I8g instances offer instance sizes up to 48xlarge, 1,536 GiB of memory, and 45 TB of instance storage. They are ideal for real-time applications such as relational databases, non-relational databases, streaming databases, search, and data analytics.
Amazon S3 Tables now seamlessly integrate with Amazon SageMaker Lakehouse, making it easy to query and join S3 Tables with data in S3 data lakes, Amazon Redshift data warehouses, and third-party data sources. S3 Tables deliver the first cloud object store with built-in Apache Iceberg support. SageMaker Lakehouse is a unified, open, and secure data lakehouse that simplifies your analytics and artificial intelligence (AI) workloads. All data in SageMaker Lakehouse can be queried from SageMaker Unified Studio and engines such as Amazon EMR, AWS Glue, Amazon Redshift, Amazon Athena, and Apache Iceberg-compatible engines like Apache Spark or PyIceberg.
SageMaker Lakehouse provides the flexibility to access and query data in-place across S3 Tables, S3 buckets, and Redshift warehouses using the Apache Iceberg open standard. You can secure and centrally manage your data in the lakehouse by defining fine-grained permissions that are consistently applied across all analytics and ML tools and engines. You can access SageMaker Lakehouse from Amazon SageMaker Unified Studio, a single data and AI development environment that brings together functionality and tools from AWS analytics and AI/ML services.
The integrated experience to access S3 Tables with SageMaker Lakehouse is generally available in all AWS Regions where S3 Tables are available. To get started, enable S3 Tables integration with Amazon SageMaker Lakehouse, which allows AWS analytics services to automatically discover and access your S3 Tables data. To learn more about S3 Tables integration, visit the documentation and product page. To learn more about SageMaker Lakehouse, visit the documentation, product page, and read the launch blog.
AWS announces the general availability of Amazon SageMaker Unified Studio, a single data and AI development environment that brings together functionality and tools from AWS analytics and AI/ML services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI. This launch includes simplified permissions management that makes it easier to bring existing AWS resources to the unified studio. SageMaker Unified Studio allows you to find, access, and query data and AI assets across your organization, then collaborate in projects to securely build and share analytics and AI artifacts, including data, models, and generative AI applications. Unified access to your data is provided by Amazon SageMaker Lakehouse and governance capabilities are built in via Amazon SageMaker Catalog.
Amazon Q Developer is now generally available in SageMaker Unified Studio, providing generative AI-powered assistance across the development lifecycle. Amazon Q Developer streamlines development by offering natural language, conversational interfaces that simplify tasks like writing SQL queries, building ETL jobs, troubleshooting, and generating real-time code suggestions. The Free Tier of Amazon Q Developer is available by default in SageMaker Unified Studio; customers with existing Amazon Q Developer Pro Tier subscriptions can access additional features.
Selected capabilities from Amazon Bedrock are also generally available in SageMaker Unified Studio. You can rapidly prototype, customize, and share generative AI applications using high-performing foundation models and advanced features such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and Amazon Bedrock Flows to create tailored solutions aligned to your requirements and responsible AI guidelines.
See Supported Regions for a list of AWS Regions where SageMaker Unified Studio is generally available. To learn more about SageMaker Unified Studio and how it can accelerate data and AI development, see the Amazon SageMaker Unified Studio webpage or documentation. You can start using SageMaker Unified Studio today by selecting “Amazon SageMaker” in the AWS Console.
Amazon S3 Tables now offer table management APIs that are compatible with the Apache Iceberg REST Catalog standard, enabling any Iceberg-compatible application to easily create, update, list, and delete tables in an S3 table bucket.
These new table management APIs, which map directly to S3 Tables operations, make it easier for you to get started with S3 Tables if you have a custom catalog implementation, need only basic read and write access to tabular data in a single S3 table bucket, or use an APN partner-provided catalog. For unified data management across all of your tabular data, data governance, and fine-grained access controls, you can use S3 Tables with SageMaker Lakehouse.
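As a sketch of what this enables, an Iceberg-compatible client such as PyIceberg can point at the S3 Tables service as a REST catalog. The endpoint URL, property names, and table bucket ARN below are assumptions to verify against the S3 Tables documentation.

```python
from pyiceberg.catalog import load_catalog

REGION = "us-east-1"
TABLE_BUCKET_ARN = "arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket"  # placeholder

# Assumed configuration: SigV4-signed Iceberg REST catalog backed by the S3 Tables endpoint.
catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": f"https://s3tables.{REGION}.amazonaws.com/iceberg",
        "warehouse": TABLE_BUCKET_ARN,
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": REGION,
    },
)

# List namespaces and tables through the standard Iceberg REST operations.
for namespace in catalog.list_namespaces():
    print(namespace, catalog.list_tables(namespace))
```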
The new table management APIs are available in all AWS Regions where S3 Tables are available, at no additional cost. To learn more about S3 Tables, visit the documentation and product page. To learn more about SageMaker Lakehouse, visit the product page.
At Definity, a leading Canadian P&C insurer with a history spanning over 150 years, we have a long tradition of innovating to help our customers and communities adapt and thrive. To stay ahead in our rapidly evolving industry, we knew a unified data foundation was key to realizing the business and customer experience opportunities offered by modern analytics and AI.
While our legacy on-premises Cloudera platform had served us well, it could no longer support our growing needs for scale, innovation, and harnessing the power of data and AI. So, we embarked on a critical mission: modernizing our data infrastructure.
Legacy limitations stifling innovation
We faced a combination of interconnected challenges, which impact many organizations today:
Limited scalability and AI/ML workload support: Our existing infrastructure, constantly running at 80% utilization, was stretched thin. Processing billions of daily events for real-time analytics and scaling AI and ML workflows was a constant battle, limiting our ability to gain timely insights and develop innovative, data-driven products and experiences.
Data silos, fragmented insights: Our data resided in various systems, creating a fragmented view of our business. This made it difficult to get a holistic understanding of our customers and hindered initiatives like building a comprehensive customer 360º view and delivering personalized recommendations at a moment of relevance.
Escalating costs: Maintaining and scaling our Cloudera platform, which hosted massive data volumes (200TB compressed, 1PB uncompressed), was increasingly expensive and diverting valuable fiscal and people resources away from strategic priorities.
Faced with these pressing issues, the timing of our next renewal presented a strategic window of opportunity. We had a critical decision to make — migrate both technology and business platforms within 10 months or invest in upgrading our legacy Cloudera environment.
Building a unified data and AI platform with Google Cloud
We chose Google Cloud and its powerful duo, BigQuery and Vertex AI, to build the Strategic Data Platform (SDP) — our new modern data analytics platform. BigQuery’s serverless architecture, unmatched scalability, built-in ML capabilities, and seamless integration with Vertex AI made it the ideal solution to power our data-driven transformation. Our migration was a remarkably fast-paced effort, carried out in close collaboration with Google Cloud and Quantiphi, a Google Cloud partner.
Like many enterprises, we adopted a hybrid approach. We retained Databricks on Google Cloud for specific ETL workloads, utilizing Quantiphi’s expertise in converting legacy systems. At the same time, we migrated the bulk of our data processing to BigQuery for optimal cost-efficiency and performance. We also used Cloud Composer to orchestrate our complex data pipelines and ensure secure, private connectivity within our Google Cloud environment, a crucial requirement for handling sensitive customer data.
As a result, our dedicated team of over 100 Definity employees completed the migration in just ten months — 50% faster than the industry average. This rapid transition was aided by innovative tools, such as the “nifi-migration” solution built by Google Cloud Consulting. This open-source tool provided a visual and highly configurable way to automate real-time data flow between different systems, minimizing disruption and helping us surpass our initial migration timeline expectations.
Our CTO, Tatjana Lalkovic, who championed this effort to consolidate our structured and multimodal data to accelerate our AI/ML use cases, shared her perspective on the impact of our decision, saying:
“As we reimagined where data and AI could take our business, industry, and customer experience, Google Cloud BigQuery and Vertex AI stood out as modern, enterprise-ready serverless solutions prepared to meet the AI moment — not just today but for the foreseen future. The speed and success of this migration has created a lot of trust in our partnership and has been a significant boost to our digital transformation to streamline operations, improve products, and better serve customers.”
Strategic Data Platform – High level design on Google Cloud
Transforming insurance with data and AI
The results of our migration to BigQuery and Vertex AI have been transformative for Definity. We’ve seen exceptional user satisfaction, with the SDP achieving a remarkable Net Promoter Score (NPS) of 9.9 out of 10. The move has also saved us millions on our annual spend on non-strategic technologies and delivered a roughly 75% reduction in planned downtime for our digital platforms.
Performance has also dramatically improved, with processes to gain insights that once took days now completing in an average of 4.5 hours. Moreover, migrating to Google Cloud has helped us increase agility and innovation. We’re now able to double our business releases per year — achieving a 30% increase in testing automation, a 63% improvement in deployment time, and a 10x faster infrastructure setup.
By combining structured and unstructured data in BigQuery, we’ve unlocked new analytical possibilities and improved price-performance. This unified data foundation has empowered our business intelligence tools with richer, more comprehensive data, leading to more informed business decisions. The seamless integration with Vertex AI has enabled us to develop, deploy, and scale AI models, driving innovation in areas like fraud detection, automated intake, and personalized call center experiences. At the same time, we benefit from Google Cloud’s strong commitment to data security and privacy, helping us to strengthen our security posture and keep our customers safe.
As our VP of Data, Ron Mills, said:
“BigQuery’s serverless architecture has been a game-changer. The ‘nothing to manage’ approach is a huge differentiator. For enterprises like us that are migrating from on-prem clusters constantly running at 80% capacity, it’s like night and day.”
Lessons learned from our migration journey
Migrating a core data platform is a significant undertaking, and we’ve learned a lot along the way. For other organizations considering the same journey, here are some key takeaways from our experience:
One team, one goal: Foster a collaborative environment where technology and business teams, vendors, contractors, and consultants work together seamlessly towards a shared objective.
Leadership trust and commitment: Executive leadership trust in the delivery team’s decision-making is crucial for maintaining momentum and navigating challenges.
Be bold: Don’t be afraid to think outside the box, make timely decisions, and be prepared to adapt quickly to unforeseen setbacks.
Plan for the unknown: Anticipate potential roadblocks and have a core team dedicated to developing alternative solutions and addressing unforeseen issues.
Strong business partnership: A trusted relationship with business teams is essential for smooth user acceptance testing, change management, and avoiding unnecessary disruptions during the migration.
Balanced governance: Independent governance should provide guidance and support calculated risk-taking, acting as a partner in problem-solving rather than a blocker.
Motivated team: Cultivate a team-oriented environment where ownership of the project extends beyond leadership to every team member.
Transparent communication: Maintain open and consistent communication among all stakeholders (in our case, over 250 people) to ensure everyone is aligned and informed.
Fast fail and incremental delivery: Avoid a “big bang” approach. Embrace incremental releases (we aimed for 2-5 daily releases) to learn quickly, adapt, and iterate.
Parallel run: Plan for a parallel run of your systems on both the legacy and target cloud platforms to ensure a smooth transition and validate the new environment.
A data-driven future with limitless potential
Our migration to BigQuery and Vertex AI is just the first step in Definity’s data transformation journey. With a modern, scalable, and AI-ready data foundation now in place, we are empowered to unlock even greater value from our data and continue to lead innovation in the insurance industry. We are excited about the possibilities that lie ahead and are already actively developing our next AI use cases, including several focused on legal summarization and IT functions. We are confident that our partnership with Google Cloud will be instrumental in helping us achieve our goals.
Get ready to dive deep and level up your cloud skills at Google Cloud Next ’25. Whether you’re a seasoned pro or just starting your cloud journey, you’ll have more learning opportunities at Next than ever before. From hands-on challenges to expert-led workshops on AI and ML, Next ’25 (April 9-11, 2025) is your chance to transform your knowledge into real-world expertise.
The first-ever Skills Challenge: your chance to win big
This year, we’re launching a new, on-the-ground game: The Skills Challenge. Think of it as your personal learning adventure at Google Cloud Next, complete with:
Hands-on labs: Master practical skills.
Certification kickstarters: Pave your way to certification.
AI Agent Builder Bar: Experience AI agent development.
Quizzathon at Makerspace: Test your Google Cloud knowledge and win.
Leaderboard competition: See how you stack up against your peers and compete for grand prizes.
Early access: Be the first to try new gamification features on Google Cloud Skills Boost.
Don’t miss out on limited-edition swag and bragging rights on the leaderboard displayed at the Learning and Certification booth. Top the charts by the third day of Next for a chance to win a grand prize.
Dive into expert-led workshops
Developers, these lab-style sessions are for you. They’re led by knowledgeable experts who can help you build better, faster, and smarter applications. Secure your spot today by adding training sessions to your agenda.
Gain insights from breakout sessions and lightning talks
From in-depth technical learning to strategic insights for leaders, explore topics like credentialing a modern workforce, and building a culture of continuous learning with expert-led presentations and panel discussions. Hear from customers about their technical use cases and gain valuable takeaways about their specific approaches and solutions.
Recharge at the Google Developer Experts and Certified Lounge
Connect with your peers, recharge, and refuel at an exclusive lounge reserved for Google Developer Experts and Google Cloud Certified individuals. Enjoy light refreshments, meetups, and a photowall to capture your Next ’25 experience. When you register for Next, remember to identify yourself as Google Cloud Certified on your Profile page for easy lounge entry!
Can’t wait? Start learning today with Google Cloud Skills Boost
Don’t wait until Next to start your learning journey. Jump-start your skills with courses, labs, learning paths, and more on Google Cloud Skills Boost. Join the Innovators program to get 35 credits at no cost and use them to keep learning on Skills Boost.
We’ll see you April 9-11 at Google Cloud Next ‘25! Register today.
Today’s insurance customers expect more: simple digital services, instant access to service representatives when they want to discuss personal matters, and quick feedback on submitted invoices. Meeting these demands has become increasingly difficult for insurers due to rising inquiry volumes, a shortage of skilled workers, and the loss of expertise as employees retire.
Recognizing the growing need for immediate and accurate responses, SIGNAL IDUNA, a leading German full-service insurer, particularly prominent in health insurance, introduced a cutting-edge AI knowledge assistant, powered by Google Cloud generative AI.
“We’ve pioneered to unlock the power of human-AI collaboration: To redefine process efficiency by bringing together technology and subject matter experts to deliver exceptional customer experiences,” said Johannes Rath, board member for Customer, Service, and Transformation at SIGNAL IDUNA.
SIGNAL IDUNA, in collaboration with Google Cloud, BCG and Deloitte, has developed an AI knowledge assistant that empowers service agents to quickly and accurately resolve complex customer inquiries. This innovative solution uses Google Cloud AI, including Google’s multimodal Gemini models, to help agents find relevant documents and provide comprehensive answers 30% faster — ultimately, enhancing customer satisfaction.
The Challenge: Meeting modern expectations
Like many organizations in the insurance sector, SIGNAL IDUNA faced significant operational burdens. The complexity of insurance products, along with the growing demand for immediate and accurate responses, often leads to bottlenecks that can impact service experiences.
For example, prior to introducing its AI knowledge assistant, service agents had to manually search thousands of internal documents for hundreds of different tariffs to find the information needed to answer questions or resolve customer issues — including insurance conditions, tariff information, guidelines, and standard operating procedures. As a result, 27% of inquiries required further escalation to other departments or specialists, resulting in delayed resolutions, increased costs, and potential damage to reputation.
Though the process is complex, SIGNAL IDUNA prioritized it as one of its top gen AI use cases, developing an AI assistant to help agents provide quick and accurate answers to customer inquiries, particularly those about health insurance. The AI knowledge assistant is grounded in more than 2,000 internal documents for more than 600 different tariffs, allowing agents to ask questions in natural language and receive accurate answers, significantly reducing the time spent searching for relevant information.
A deep dive into SIGNAL IDUNA’s gen AI system
Working with Google Cloud, BCG, and Deloitte, SIGNAL IDUNA built a sophisticated generative AI architecture using Google Cloud’s AI platform, Vertex AI, and utilized Gemini 1.5 Pro’s long-context capabilities to develop an AI knowledge assistant that can provide quick and accurate access to the right information within a vast collection of documents. The system employs multiple steps to aggregate and process extensive information from diverse sources, ensuring agents can access the complete context necessary to effectively address customer inquiries.
Here’s a breakdown of the key steps:
An end-to-end architecture diagram
1. Data pre-processing and extraction
The knowledge base is built from various document types, which are typically in PDF format, including policy documents, operating procedures, and general terms and conditions.
SIGNAL IDUNA utilizes a hybrid approach that combines the Layout Parser in Google Cloud Document AI with PDFPlumber to parse these PDFs and extract the text content. While the Layout Parser is responsible for extracting the text segments, SIGNAL IDUNA enhances the extraction of tables with PDFPlumber when the quality of the PDFs allows. The extracted texts are then cleaned, chunked, embedded with Google’s Gecko multilingual embedding model, and enhanced with additional metadata, so the information can be processed and analyzed effectively later.
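As an illustration of the PDFPlumber side of that hybrid pipeline, here is a minimal sketch; the file name and chunk handling are hypothetical.

```python
import pdfplumber

# Extract raw text and any well-formed tables from a policy PDF (placeholder path).
chunks = []
with pdfplumber.open("tariff_conditions.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text() or ""
        if text.strip():
            chunks.append({"type": "text", "page": page.page_number, "content": text})
        for table in page.extract_tables():
            # Keep tables as rows of cells; downstream steps can serialize them for embedding.
            chunks.append({"type": "table", "page": page.page_number, "content": table})

print(f"Extracted {len(chunks)} chunks")
```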
For storing the vectorized texts, Google Cloud SQL for PostgreSQL is used with the pgvector extension, which provides a highly effective vector database solution for this use case. By storing vectorized text chunks in Cloud SQL, SIGNAL IDUNA benefits from its scalability, reliability, and seamless integration with other Google Cloud services, while pgvector enables efficient similarity search.
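A minimal sketch of what such a pgvector-backed similarity search might look like, assuming a doc_chunks table with a vector column and an already-computed query embedding; the connection details, table name, and embedding dimension are illustrative.

```python
import psycopg2  # in production, Cloud SQL connections typically go through the Cloud SQL Python Connector

conn = psycopg2.connect(host="127.0.0.1", dbname="rag", user="app", password="change-me")
cur = conn.cursor()

# One-time setup: enable pgvector and store 768-dimensional chunk embeddings.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS doc_chunks (
        id BIGSERIAL PRIMARY KEY,
        content TEXT,
        metadata JSONB,
        embedding VECTOR(768)
    )
    """
)
conn.commit()

# Retrieve the five chunks closest to the query embedding (cosine distance operator <=>).
query_embedding = [0.01] * 768  # placeholder; produced by the embedding model in practice
cur.execute(
    "SELECT content, metadata FROM doc_chunks ORDER BY embedding <=> %s::vector LIMIT 5",
    (str(query_embedding),),
)
for content, metadata in cur.fetchall():
    print(metadata, content[:80])
```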
2. Query augmentation
Query augmentation generates multiple queries to improve the formulation of user questions for both document retrieval from the vector store and answer generation. The original question is reformulated into several variants, creating three versions in total: the original query, a rewritten query, and an imitation query. These are then used to retrieve relevant documents and generate the final answer.
For the rewritten query, the system uses Gemini 1.5 Pro to correct spelling errors in the original question. Additionally, the query is expanded by adding synonyms for predefined terms and tagging specific terms (e.g., “remedies,” “assistive devices,” “wahlleistung/selective benefits”) with categories. The system also uses information about selected tariffs to enrich the query. For example, tariff attributes, such as brand or contract type, are extracted from a database and appended to the query in a structured format. These specific adjustments make it possible to handle special tariff codes and add further context based on tariff prefixes.
The imitation query uses Gemini 1.5 Pro to rephrase the question to mimic the language of technical insurance documents, improving the semantic similarity with the source material. It considers conversation history and handles age formatting.
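A simplified sketch of this augmentation step using the Vertex AI SDK; the project, prompts, and tariff enrichment are illustrative stand-ins, not SIGNAL IDUNA’s actual templates.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="europe-west3")  # placeholder project/region
model = GenerativeModel("gemini-1.5-pro-002")

question = "Are hering aids covered under my tariff?"  # note the deliberate typo
tariff_context = "tariff=KX-Comfort, contract_type=supplementary"  # placeholder attributes from the tariff database

# Rewritten query: fix spelling, add synonyms/tags, and append structured tariff attributes.
rewritten = model.generate_content(
    "Correct spelling and expand insurance-specific synonyms in this question, "
    f"then append the tariff attributes [{tariff_context}]:\n{question}"
).text

# Imitation query: rephrase in the register of technical insurance documents.
imitation = model.generate_content(
    "Rephrase this question so it mimics the wording of health-insurance policy documents:\n"
    f"{question}"
).text

queries = [question, rewritten, imitation]
```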
3. Retrieval
First, the system checks the query cache, which stores previously answered questions and their corresponding correct answers. If the question, or one very similar to it, has already been successfully resolved, the cached answer is retrieved, helping to provide a rapid answer. This efficient approach ensures quick access to information and avoids redundant processing.
The accuracy of the cache is maintained through a user feedback loop, which identifies correctly answered questions to be stored in the cache through upvotes. A downvote on a cached answer triggers an immediate cache invalidation, ensuring only relevant and helpful responses are served. This dynamic approach improves the efficiency and accuracy of the system over time. If no matching questions are found in the query cache, the retrieval process falls back on the vector store, ensuring that the system can answer novel questions.
After retrieving any relevant information chunks from the query cache or vector store, the system uses the Vertex AI ranking API to rerank them. This crucial process analyzes various signals to refine the results, prioritizing relevance and ensuring the most accurate and helpful information is presented.
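Put together, the retrieval step might look like the following sketch, where the cache lookup, vector search, and reranking calls are hypothetical helpers standing in for the query cache, the pgvector store, and the Vertex AI ranking API.

```python
def retrieve(query: str, queries: list[str], top_k: int = 5) -> list[dict]:
    """Return ranked context chunks for answer generation (illustrative control flow only)."""
    # 1. Serve from the cache when a sufficiently similar, upvoted answer exists.
    cached = query_cache_lookup(query)  # hypothetical helper backed by the query cache
    if cached is not None:
        return [cached]

    # 2. Otherwise fall back to vector search over all augmented query variants.
    candidates = []
    for q in queries:
        candidates.extend(vector_store_search(q, limit=top_k))  # hypothetical pgvector search

    # 3. Rerank the merged candidates (e.g., via the Vertex AI ranking API) and keep the best.
    ranked = rerank_with_vertex_ranking_api(query, candidates)  # hypothetical wrapper
    return ranked[:top_k]
```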
Ensuring complete and accurate answers is paramount during retrieval, and SIGNAL IDUNA found that some queries required information beyond what was available in the source documents. To address this issue, the system uses keyword-based augmentations to supplement the final prompt, providing a more comprehensive context for generating responses.
4. Generation
The answer generation process involves three key components: the user’s question with multiple queries, retrieved chunks of relevant information, and augmentations that add further context. These elements are combined to create the final response using a complex prompt template.
Delivering a near real-time experience is crucial for service agents, so SIGNAL IDUNA also streams the generated response. During development, minimizing latency, which depends on the input, posed a significant technical hurdle. To address this issue, SIGNAL IDUNA reduced processing times using asynchronous APIs to help stream data and handle multiple requests. Currently, the system has achieved an average response time of approximately 6 seconds, and SIGNAL IDUNA is experimenting with newer, faster models to reduce this time even further.
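Streaming the final answer with the Vertex AI SDK can be as simple as the following sketch; the model version and prompt assembly helper are placeholders.

```python
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-pro-002")  # placeholder model version

final_prompt = build_final_prompt(queries, ranked_chunks, augmentations)  # hypothetical prompt template

# Stream tokens to the agent UI as they are generated instead of waiting for the full answer.
for chunk in model.generate_content(final_prompt, stream=True):
    print(chunk.text, end="", flush=True)
```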
5. Evaluation
Rigorous evaluation is essential for optimizing Retrieval Augmented Generation (RAG) systems. SIGNAL IDUNA uses the Gen AI evaluation service in Vertex AI to automate the assessment of both response quality and the performance of all process components, such as retrieval. A comprehensive question set, created with input from SIGNAL IDUNA’s service agents, forms the basis of these automated tests.
Here’s a closer look at how Looker helps evaluate the AI knowledge assistant:
Chunk retrieval: First, SIGNAL IDUNA evaluates the retrieval of relevant information chunks. Metrics at this stage help assess how effectively the model identifies and gathers the necessary information from the source data. This includes tracking retrieval metrics, such as recall, precision, and F1-score, to pinpoint areas for improvement in the retrieval process. This is crucial, as retrieving the correct information is the foundation of a good generated response.
Document reranking: Once the relevant chunks are retrieved, they’re reranked to prioritize the most pertinent information. The Looker dashboard allows monitoring the effectiveness of this reranking process.
Generated vs. expected response comparison: The final stage involves comparing the generated response with the expected response. SIGNAL IDUNA evaluates the quality, accuracy, and completeness of the generated output, utilizing large language models (LLMs) to score the similarity between the generated response and the expected response.
Explanation generation: To understand the reasoning behind an LLM’s evaluation, SIGNAL IDUNA generates explanations for its judgments. This provides valuable insights into the strengths and weaknesses of the generated responses, helping the developers identify specific areas for improvement.
This multi-stage evaluation approach provides SIGNAL IDUNA a holistic view of the model’s performance, enabling data-driven optimization at each stage. The Looker dashboard plays a vital role in visualizing these metrics, making it easier for the developers to identify areas where the model excels and where it needs improvement.
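A minimal sketch of an automated run with the Gen AI evaluation service in Vertex AI; the dataset columns, metric names, and experiment label are illustrative assumptions, not SIGNAL IDUNA’s actual test suite, and should be checked against the current SDK.

```python
import pandas as pd
from vertexai.evaluation import EvalTask

# Evaluation dataset built from the agent-curated question set (illustrative rows).
eval_df = pd.DataFrame(
    {
        "prompt": ["Which assistive devices does tariff KX-Comfort cover?"],
        "response": ["Tariff KX-Comfort covers hearing aids up to a fixed amount."],  # generated answer
        "reference": ["KX-Comfort reimburses hearing aids up to a fixed amount."],   # expected answer
    }
)

eval_task = EvalTask(
    dataset=eval_df,
    metrics=["groundedness", "fluency", "rouge_l_sum"],  # assumed metric names
    experiment="knowledge-assistant-eval",               # placeholder experiment name
)

result = eval_task.evaluate()
print(result.summary_metrics)
```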
Real-world impact: AI-powered efficiency and productivity
To determine whether the AI assistant provided measurable added value for its workforce, SIGNAL IDUNA conducted an experiment with a total of 20 employees (internal staff and staff from external providers). During the experiment, customer requests were processed with and without the AI knowledge assistant to assess its impact.
One of the key benefits observed was a reduction in processing time. Searching across numerous data sources used to be a time-consuming process. The experiment showed that using the AI knowledge assistant reduced the core processing time (information search and response formulation) by approximately 30% and increased the quality of the response based on expert evaluations. The time saved was particularly notable for employees with less than two years of experience in health insurance.
In addition, the AI knowledge assistant significantly increased the case closure rate. Health insurance is a very complex field, and the use of external service providers means that not every employee can always answer every customer question. With support from the AI knowledge assistant, SIGNAL IDUNA’s case closure rate increased by approximately 24 percentage points, rising from 73% to almost 98%.
Scaling for the Future
“Together with Google, we at SIGNAL IDUNA have successfully applied gen AI to one of our core business processes,” said Stefan Lemke, CIO at SIGNAL IDUNA. “Now, it’s time to scale this powerful technology across our entire organization. We’re not just scaling a tool, we’re scaling innovation, learning, and the possibilities of what we can achieve.”
Gen AI offers enormous potential for optimizing processes and developing innovative solutions. With its innovative approach — business teams experimenting with the technology in a decentralized manner and developing customized applications — SIGNAL IDUNA is primed to pioneer the next generation of insurance solutions and services.
At the same time, SIGNAL IDUNA is establishing central standards to scale insights gained across the company and tap into the combined power of its teams, resources, and lines of business. This strategic decision has helped create valuable resources like code libraries, infrastructure blueprints, and centrally offered services.
By combining agility with established standards and best practices, SIGNAL IDUNA can now react quickly to new requirements, setting a new standard for efficiency and customer satisfaction.
This project was delivered by the following core team members: Max Tschochohei, Anant Nawalgaria, and Corinna Ludwig from Google, and Christopher Masch and Michelle Mäding from SIGNAL IDUNA.
Today, we’re sharing that the new Gemma 3 model is available on Vertex AI Model Garden, giving you immediate access to it for fine-tuning and deployment. You can quickly adapt Gemma 3 to your use case using Vertex AI’s pre-built containers and deployment tools.
In this post, you’ll learn how to fine-tune Gemma 3 on Vertex AI and deploy it as a production-ready endpoint.
Gemma 3 on Vertex AI: PEFT and vLLM deployment
Tuning and deploying large language models can be computationally expensive and time-consuming. That’s why we’re excited to announce Gemma 3 support for Parameter-Efficient Fine-Tuning (PEFT) and optimized deployment using vLLM on Vertex AI Model Garden.
Fine-tuning Gemma 3 with PEFT allows you to achieve performance gains with significantly fewer computational resources than full fine-tuning requires. Our vLLM-based deployment is easy to use and fast: vLLM’s optimized inference engine maximizes throughput and minimizes latency, ensuring a responsive and scalable endpoint for your Gemma 3 applications on Vertex AI.
Let’s look at how you can fine-tune and deploy your Gemma 3 model on Vertex AI.
Fine-tuning Gemma 3 on Vertex AI
In Vertex AI Model Garden, you can fine-tune and deploy Gemma 3 using PEFT (LoRA) from Hugging Face in only a few steps. Before you run the notebook, make sure you complete all of the initial steps described in the notebook.
Fine-tuning Gemma 3 on Vertex AI for your use case requires a custom dataset. The recommended format is a JSONL file, where each line is a valid JSON string. Here’s an example inspired by the timdettmers/openassistant-guanaco dataset:
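As a minimal sketch of the expected shape (the conversational content below is invented for illustration, not the actual dataset), one such JSONL line can be produced from Python like this:

```python
import json

# One training example per line; the "text" key must match the train_column job argument.
example = {
    "text": (
        "### Human: What does parameter-efficient fine-tuning change in a model?"
        "### Assistant: It trains a small set of adapter weights while keeping the base model frozen."
    )
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```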
The JSON object has a text key, which should match train_column; its value should be one training data point, i.e., a string. You can upload your dataset to Google Cloud Storage (preferred) or to Hugging Face Datasets.
Choose the Gemma 3 variant that best suits your needs. For example, to use the 1B parameter model:
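A hypothetical reference might look like this; confirm the exact model ID on the model card.

```python
# Assumed Hugging Face ID for the instruction-tuned 1B variant; larger variants follow the same pattern.
base_model_id = "google/gemma-3-1b-it"
```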
You have the flexibility to customize model parameters and job arguments. Let’s explore some key settings. LoRA (Low-Rank Adaptation) is a PEFT technique that significantly reduces the number of trainable parameters. The following parameters control LoRA’s behavior: lora_rank controls the dimensionality of the update matrices (a smaller rank means fewer parameters), lora_alpha scales the LoRA updates, and lora_dropout adds regularization. The following settings are a reasonable starting point.
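In code, a reasonable starting configuration could look like the following; the values are illustrative, not tuned for any particular dataset.

```python
# LoRA hyperparameters: a smaller rank means fewer trainable parameters.
lora_rank = 16       # dimensionality of the low-rank update matrices
lora_alpha = 32      # scaling factor applied to the LoRA updates
lora_dropout = 0.05  # dropout on the adapted layers for regularization
```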
When fine-tuning large language models (LLMs), precision is a key consideration, impacting both memory usage and performance. Lower precision training, such as 4-bit quantization, reduces the memory footprint. However, this can come with a slight performance trade-off compared to higher precisions like 8-bit or float16. The train_precision parameter dictates the numerical precision used during the training process. Choosing the right precision involves balancing resource limitations with desired model accuracy.
Optimizing model performance involves tuning training parameters that affect speed, stability, and model capabilities. Essential parameters include:
per_device_train_batch_size: the batch size per GPU; larger sizes accelerate training but demand more memory.
gradient_accumulation_steps: simulates larger batch sizes by accumulating gradients over smaller batches, providing a memory-efficient alternative at the cost of increased training time.
learning_rate: dictates the optimization step size; a rate that is too high can lead to divergence, while a rate that is too low can slow down convergence.
lr_scheduler_type: dynamically adjusts the learning rate throughout training, such as through linear decay, fostering better convergence and accuracy.
max_steps / num_train_epochs: define the total training duration, with max_steps taking precedence if both are specified.
You can find the full training recipe in the official notebook.
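Purely as an illustration of how those knobs fit together (values and argument names are assumptions; the official notebook’s recipe is authoritative):

```python
# Illustrative training arguments; consult the official notebook for the full, supported recipe.
train_job_args = {
    "train_column": "text",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "max_steps": 200,              # takes precedence over num_train_epochs if both are set
    "train_precision": "bfloat16",
    "lora_rank": lora_rank,
    "lora_alpha": lora_alpha,
    "lora_dropout": lora_dropout,
}
```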
You can monitor the fine-tuning progress using Tensorboard. Once the job is complete, you can upload the tuned model to the Vertex AI Model Registry and deploy it as an endpoint for inference. Let’s dive into deployment next.
Deploying Gemma 3 on Vertex AI
Deploying Gemma 3 on Vertex AI requires only three steps as described in this notebook.
First, you need to provision a dedicated endpoint for your Gemma 3 model. This provides a scalable and managed environment for hosting your model. You use the create function to set the endpoint name (display_name), and ensure dedicated resources for your model (dedicated_endpoint_enabled).
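With the Vertex AI SDK, that provisioning step is a single call, as in the sketch below; the project, region, and display name are placeholders, and the dedicated_endpoint_enabled flag assumes a recent google-cloud-aiplatform version.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

# Dedicated endpoint for the Gemma 3 deployment.
endpoint = aiplatform.Endpoint.create(
    display_name="gemma-3-vllm-endpoint",
    dedicated_endpoint_enabled=True,
)
```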
Next, register the Gemma 3 model within the Vertex AI Model Registry. Think of the Model Registry as a central hub for managing your models. It keeps track of different versions of your Gemma 3 model (in case you make improvements later), and is the central place from which you’ll deploy.
This step involves a few important configurations, including the serving container used to deploy Gemma 3.
To serve Gemma 3 on Vertex AI, use the Vertex AI Model Garden pre-built vLLM Docker image for fast and efficient model serving. The vLLM recipe sets how vLLM will serve Gemma 3: --tensor-parallel-size lets you spread the model across multiple GPUs if you need extra compute resources, --gpu-memory-utilization controls how much of the GPU memory you want to use, and --max-model-len sets the maximum length of text the model can process at once. You also have some advanced settings, like --enable-chunked-prefill and --enable-prefix-caching, to optimize performance, especially when dealing with longer pieces of text.
There is also some deployment configuration Vertex AI requires to serve the model, including the port the serving container will listen on (8080 in our case), the URL path for making prediction requests (e.g., “/generate”), and the URL path for health checks (e.g., “/ping”), which allows Vertex AI to monitor the model’s status.
Finally, use upload() to take this configuration – the serving container, your model-specific settings, and instructions for how to run the model – and bundle them up into a single, manageable unit within the Vertex AI Model Registry. This makes deployment and version control much easier.
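A sketch of that registration step; the serving image URI and model artifact location are placeholders to take from the notebook, and the vLLM flags mirror the ones discussed above.

```python
VLLM_DOCKER_URI = "<vertex-model-garden-vllm-image-uri>"       # placeholder: use the image from the notebook
MODEL_ARTIFACTS = "gs://my-bucket/gemma-3-1b-it-finetuned"     # placeholder: tuned model location

vllm_args = [
    f"--model={MODEL_ARTIFACTS}",
    "--tensor-parallel-size=1",
    "--gpu-memory-utilization=0.9",
    "--max-model-len=8192",
    "--enable-chunked-prefill",
    "--enable-prefix-caching",
]

model = aiplatform.Model.upload(
    display_name="gemma-3-1b-vllm",
    serving_container_image_uri=VLLM_DOCKER_URI,
    serving_container_args=vllm_args,
    serving_container_ports=[8080],
    serving_container_predict_route="/generate",
    serving_container_health_route="/ping",
)
```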
Now you’re ready to deploy the model. To deploy the registered model to the endpoint, use the deploy method as shown below.
This is where you choose the computing power for your deployment, including the type of virtual machine (like “a3-highgpu-2g”, machine_type), the kind of accelerator (e.g., “NVIDIA_L4” GPUs, accelerator_type), and how many accelerators to use (accelerator_count).
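In SDK form, the deployment call could look like this; the machine/accelerator pairing below is an assumption and must be a combination supported in your region.

```python
endpoint = model.deploy(
    endpoint=endpoint,
    machine_type="g2-standard-12",  # assumption: an L4-backed machine type; pick one valid for your region
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    deploy_request_timeout=1800,    # large models can take a while to come up
)
```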
Deploying the model takes some time, and you can monitor the status of the deployment in Cloud Logging. Once the endpoint is running, you can use the ChatCompletion API to call the model and integrate it within your applications, as shown below.
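One plausible way to call the endpoint through an OpenAI-compatible client, using your Google credentials as the bearer token; the base URL shape, empty model field, and region are assumptions to confirm against the model card notebook.

```python
import google.auth
import google.auth.transport.requests
import openai

creds, project = google.auth.default()
creds.refresh(google.auth.transport.requests.Request())

REGION = "us-central1"       # placeholder region
ENDPOINT_ID = endpoint.name  # numeric endpoint ID from the deployment above

client = openai.OpenAI(
    base_url=(
        f"https://{REGION}-aiplatform.googleapis.com/v1beta1/projects/{project}"
        f"/locations/{REGION}/endpoints/{ENDPOINT_ID}"
    ),
    api_key=creds.token,
)

response = client.chat.completions.create(
    model="",  # the deployed model is addressed via the endpoint path
    messages=[{"role": "user", "content": "Give me three facts about Gemma 3."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```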
Depending on the Gemma model you deploy, you can use the ChatCompletion API to call the model with multimodal inputs (images). You can find more in the “Deploy Gemma 3 4B, 12B and 27B multimodal models with vLLM on GPU” section of the model card notebook.
What’s next?
Visit the Gemma 3 model card on Vertex AI Model Garden to get started today. For a deeper understanding of the model’s architecture and performance, check out this developer guide on Gemma 3.
AWS CloudFormation Hooks now supports three new invocation points for stacks, change sets, and AWS Cloud Control API (CCAPI) in the AWS GovCloud (US) Regions. You can now evaluate CloudFormation create/update/delete stack and change set operations, and CCAPI create/update operations. With this launch, you can standardize your proactive evaluations beyond CloudFormation resource properties by enabling safety checks that consider the entire context of a stack, a CloudFormation change set, and/or a CCAPI resource configuration.
CloudFormation Hooks has also extended two new managed hooks to the AWS GovCloud (US) Regions. The managed Lambda Hook and Guard Hook simplify your hook authoring experience by pointing to an AWS Lambda function or an S3 bucket containing AWS CloudFormation Guard domain-specific-language rules. Today’s launch allows GovCloud customers and partners to leverage the new invocation points and the new managed hooks to help enforce organizational best practices easily and minimize the risk of non-compliant resources being provisioned.
With this launch, all CloudFormation Hooks’ features are available in 32 AWS regions globally: US East (Ohio, N. Virginia), US West (N. California, Oregon), Canada (Central, Calgary), Asia Pacific (Singapore, Tokyo, Seoul, Mumbai, Hong Kong, Osaka, Jakarta, Hyderabad, Malaysia, Sydney, Melbourne), Europe (Ireland, Stockholm, Frankfurt, Milan, London, Zurich, Paris, Spain), Middle East (UAE, Bahrain, Tel Aviv), South America (São Paulo), Africa (Cape Town), and the AWS GovCloud (US-East, US-West) Regions.
To get started, you can use the new Hooks console workflow within the CloudFormation console, the AWS CLI, or new CloudFormation Hooks resources. To learn more, refer to the Hooks User Guide.
Today, Amazon announces the expansion of Amazon Nova understanding models (Amazon Nova Lite, Amazon Nova Micro, Amazon Nova Pro) to AWS GovCloud (US-West) – an isolated U.S. sovereign region for managing sensitive data and controlled unclassified information.
Government customers, technology partners, and entities with highly-regulated enterprise requirements now have access to Amazon Nova’s powerful AI capabilities including: Amazon Nova Micro, a text-only model that delivers the lowest latency responses at a very low cost; Amazon Nova Lite, a very low-cost multimodal model that is lightning fast for processing image, video, and text inputs to generate text outputs; and Amazon Nova Pro, a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. These models support over 200 languages, text and vision fine-tuning, and easy integration with proprietary data and applications through Amazon Bedrock features such as Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents.
AWS Glue, a serverless data integration service, is now available in the Asia Pacific (Thailand) and Mexico (Central) Regions, enabling customers to build and run their ETL workloads closer to their data sources in these regions.
AWS Glue is a serverless data integration service that makes it simple to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides both visual and code-based interfaces to make data integration simpler so you can analyze your data and put it to use in minutes instead of months.
AWS CodeBuild now supports registering self-hosted runners at GitHub organization or enterprise level. Additionally, you can assign your self-hosted runners to specific runner groups for enhanced security and access control. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
Organization and enterprise level runners provide centralized management across multiple repositories. Runner groups offer additional security control with granular repository access policies. You can also configure webhook filters on your CodeBuild projects to allow or deny workflow jobs from specific GitHub organizations or repositories.
This feature is available in all regions where CodeBuild is offered. For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
To get started, configure runner scope and group in your CodeBuild projects. CodeBuild will automatically register your runners to the correct destination. To learn more about using CodeBuild self-hosted runners, visit the CodeBuild runner tutorial.
Today, we are excited to announce that Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift now supports up to five integrations from the same Aurora cluster. This enhancement allows customers to create multiple zero-ETL integrations between a single Amazon Aurora PostgreSQL cluster and the same or different Amazon Redshift warehouses, providing greater flexibility and efficiency in data analytics workflows.
With this new capability, customers can now seamlessly replicate data from a single Aurora PostgreSQL cluster to multiple Redshift environments without the need for complex extract, transform, and load (ETL) processes. This feature is particularly beneficial for organizations that require different data views or aggregations for various analytical purposes, such as departmental reporting, regional analysis, or specific project requirements. By supporting multiple integrations, customers can maintain a single source of truth in Aurora while distributing relevant data subsets to different Redshift warehouses, optimizing both storage and query performance.
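For illustration, creating one such integration with boto3 might look like the following sketch (ARNs and names are placeholders); repeating the call with a different target warehouse is how you fan out to up to five integrations from the same cluster.

```python
import boto3

rds = boto3.client("rds")

# One zero-ETL integration from an Aurora PostgreSQL cluster to a Redshift Serverless namespace.
rds.create_integration(
    IntegrationName="aurora-pg-to-redshift-reporting",
    SourceArn="arn:aws:rds:us-east-1:111122223333:cluster:my-aurora-pg-cluster",
    TargetArn="arn:aws:redshift-serverless:us-east-1:111122223333:namespace/reporting-namespace",
)
# Call create_integration again with a different TargetArn (same SourceArn) for each
# additional warehouse, up to the five-integration limit per cluster.
```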
Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift is available for Aurora PostgreSQL version 16.4 and higher in the regions listed here.
Amazon DynamoDB zero-ETL integration with Amazon Redshift is now supported in 3 additional regions: Asia Pacific (Thailand), Asia Pacific (Malaysia), and Mexico (Central). This expansion enables customers to run high-performance analytics on their DynamoDB data in Amazon Redshift with no impact on production workloads running on DynamoDB. With this launch, DynamoDB zero-ETL integration with Amazon Redshift is now supported in all AWS commercial regions where Amazon Redshift is available.
Zero-ETL integrations help you derive holistic insights across many applications, break data silos in your organization, and gain significant cost savings and operational efficiencies. Now you can run enhanced analysis on your DynamoDB data with the rich capabilities of Amazon Redshift, such as high performance SQL, built-in ML and Spark integrations, materialized views with automatic and incremental refresh, and data sharing. Additionally, you can use history mode to easily run advanced analytics on historical data, build lookback reports, and build Type 2 Slowly Changing Dimension (SCD 2) tables on your historical data from DynamoDB, out-of-the-box in Amazon Redshift, without writing any code.
The Amazon DynamoDB zero-ETL integration with Amazon Redshift is now available in Asia Pacific (Thailand), Asia Pacific (Malaysia), and Mexico (Central), in addition to previously supported regions. For a complete list of supported regions, please refer to the AWS Region Table where Amazon Redshift is available.
To learn more, visit the getting started guides for DynamoDB and Amazon Redshift. For more information on using history mode, we encourage you to visit our recent blog post here.
Today, we are excited to announce support for scratch, distroless (Debian/Ubuntu based), and Chainguard image scanning with Amazon Inspector. With the expanded support for ECR images, Amazon Inspector extends its security coverage to minimal and security-focused container bases, enabling teams to maintain robust security practices even with highly optimized container environments.
For ECR scanning, Amazon Inspector expands scanning to additional ecosystems including Go toolchain, Oracle JDK & JRE, Amazon Corretto, Apache Tomcat, Apache httpd, WordPress (core, themes, plugins), Google Puppeteer (Chrome embedding), and Node.js runtime. This enhancement helps customers identify vulnerabilities in ecosystem components and gain visibility into third party software. The same functionality is also available via the Amazon Inspector SBOM Scan API.
Additionally, Amazon Inspector now supports identifying discontinued operating systems running on Amazon EC2 instances and Amazon ECR container images. Amazon Inspector will generate a finding on resources using a discontinued operating system solely for informational purposes, aiding in the prioritization of risk mitigation strategies.
Amazon Inspector is a vulnerability management service that continually scans AWS workloads including Amazon EC2 instances, container images, and AWS Lambda functions for software vulnerabilities, code vulnerabilities, and unintended network exposure across your entire AWS organization.
Enhanced detections, and support for additional operating systems for ECR scanning is available in all commercial and AWS GovCloud (US) Regions where Amazon Inspector is available.
Amazon ECR announces ECR to ECR pull through cache, a capability that allows customers to automatically sync container images between two ECR private registries that exist in different AWS Regions and/or accounts. This enables customers to benefit from the reduced latency of pulling cached images in-Region. With today’s release, Amazon ECR makes it easier for customers to optimize storage costs by providing a simple and reliable way to store local copies of only the images that are pulled across Regions and accounts.
As customers grow, they often have container deployments spread across multiple AWS Regions. Storing images within the Region of deployment improves application start-up times due to the lower latency of in-Region pulls. To achieve this, customers have had to maintain copies of all images in every Region, which is not cost-effective because many of those images are never deployed. ECR to ECR pull through cache allows customers to sync images between ECR registries in a cost-effective way by caching only the images that are pulled. Customers can now push images to their primary registry and configure pull through cache rules to cache images into downstream registries. On an image pull, ECR automatically fetches the image from the upstream registry and caches it in an automatically created repository in the downstream registry for future pulls. Additionally, this feature supports frequent syncs with the upstream registry, helping keep cached images up to date.
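A minimal boto3 sketch of the rule configuration; the account ID, Regions, and prefix are placeholders, and cross-account or cross-Region setups may require additional parameters described in the user guide.

```python
import boto3

# Downstream registry in eu-west-1 caches images pulled from an upstream ECR registry in us-east-1.
ecr = boto3.client("ecr", region_name="eu-west-1")

ecr.create_pull_through_cache_rule(
    ecrRepositoryPrefix="us-east-1-cache",  # local prefix under which cached repositories are created
    upstreamRegistryUrl="111122223333.dkr.ecr.us-east-1.amazonaws.com",  # upstream ECR private registry
)
# A pull of us-east-1-cache/my-app:latest in eu-west-1 then fetches from the upstream
# registry on first use and serves the cached copy on subsequent pulls.
```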
ECR to ECR pull through cache is available in all AWS Regions, excluding the AWS GovCloud (US) and China Regions. To learn more, please visit our user guide.