Anthropic’s Claude 3.7 Sonnet hybrid reasoning model, their most intelligent model to date, is now available in Amazon Bedrock. Claude 3.7 Sonnet represents a significant advancement in AI capabilities, offering both quick responses and extended, step-by-step thinking made visible to the user. This new model includes strong improvements in coding and brings enhanced performance across various tasks, like instruction following, math, and physics.
Claude 3.7 Sonnet introduces a unique approach to AI reasoning by integrating it seamlessly with other capabilities. Unlike traditional models that separate quick responses from those requiring deeper thought, Claude 3.7 Sonnet allows users to toggle between standard and extended thinking modes. In standard mode, it functions as an upgraded version of Claude 3.5 Sonnet, while in extended thinking mode it employs self-reflection to achieve improved results across a wide range of tasks. Amazon Bedrock users can adjust how long the model thinks, offering a flexible trade-off between speed and answer quality. Additionally, users can control the reasoning budget by specifying a token limit, enabling more precise management of cost.
Anthropic has optimized Claude 3.7 Sonnet for real-world applications that align closely with typical language model use cases, rather than focusing solely on math and computer science competition problems. This approach ensures that the model is well-suited to address the diverse needs of customers across various industries and use cases.
Claude 3.7 Sonnet is now available in Amazon Bedrock in the US East (N. Virginia), US East (Ohio), and US West (Oregon) regions. To get started, visit the Amazon Bedrock console, and integrate the model into your applications using the Amazon Bedrock API or SDK. To learn more, read the AWS News Blog and the Claude in Amazon Bedrock product detail page.
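As a rough illustration of the reasoning-budget control described above, here is a minimal sketch using the AWS SDK for Python (Boto3) and the Bedrock Converse API. The model ID and the thinking field passed through additionalModelRequestFields are assumptions based on Anthropic's extended-thinking parameters; check the Bedrock documentation for the exact values in your region.

import boto3

# Assumption: Claude 3.7 Sonnet model ID and the Anthropic "thinking" field
# passed through additionalModelRequestFields; verify against the Bedrock docs.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # assumed model ID
    messages=[
        {"role": "user", "content": [{"text": "Plan a migration of a 3-tier app to AWS."}]}
    ],
    # maxTokens should exceed the thinking budget so the final answer still fits.
    inferenceConfig={"maxTokens": 8192},
    additionalModelRequestFields={
        # Extended thinking with an explicit token budget (assumed parameter shape).
        "thinking": {"type": "enabled", "budget_tokens": 4096}
    },
)

# The response content can include reasoning blocks as well as the final text.
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])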
AWS WAF enhances Service Quotas capabilities, enabling organizations to proactively monitor and manage quotas for their cloud deployments.
AWS WAF is a web application firewall that helps protect your web applications or APIs against common web exploits and bots that may affect availability, compromise security, or consume excessive resources. By leveraging AWS Service Quotas, you can quickly understand your applied service quota values for these WAF resources and request increases when needed. This enhanced integration brings three key benefits. First, you can now monitor the current utilization of your account-level quotas for WAF resources such as web ACLs, rule groups, and IP sets in the Service Quotas console. Second, certain service quota increase requests will now be auto-approved, enabling customers to access higher quotas faster. For example, smaller increases are usually automatically approved while larger requests are submitted to AWS Support. Lastly, you can now create Amazon CloudWatch alarms to notify you when your utilization of a given quota exceeds a configurable threshold. This enables you to better adapt your utilization based on your applied quota values and automate your quota increase requests.
You can access AWS Service Quotas through the AWS console, AWS APIs, and the CLI. Integration with AWS Service Quotas is available in all AWS Regions where AWS WAF is offered. You can learn more about AWS WAF by visiting the AWS WAF Developer Guide.
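For example, the utilization alarm described above can be created with Boto3 and CloudWatch metric math. This is a sketch only; the AWS/Usage dimension values shown for web ACLs are assumptions, so confirm the exact dimensions in the Service Quotas console before using them.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Sketch: alarm when web ACL usage exceeds 80% of the applied quota.
# The dimension values below (Service/Resource/Type/Class) are assumptions;
# copy the exact ones from the quota's monitoring tab in the Service Quotas console.
cloudwatch.put_metric_alarm(
    AlarmName="waf-web-acl-quota-utilization",
    EvaluationPeriods=1,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    Metrics=[
        {
            "Id": "usage",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Usage",
                    "MetricName": "ResourceCount",
                    "Dimensions": [
                        {"Name": "Service", "Value": "WAFV2"},
                        {"Name": "Resource", "Value": "WebACL"},
                        {"Name": "Type", "Value": "Resource"},
                        {"Name": "Class", "Value": "None"},
                    ],
                },
                "Period": 300,
                "Stat": "Maximum",
            },
            "ReturnData": False,
        },
        {
            "Id": "utilization",
            # SERVICE_QUOTA() resolves the applied quota for the usage metric.
            "Expression": "(usage / SERVICE_QUOTA(usage)) * 100",
            "Label": "Web ACL quota utilization (%)",
            "ReturnData": True,
        },
    ],
)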
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C7gd instances with up to 3.8 TB of local NVMe-based SSD block-level storage are available in the AWS GovCloud (US-East) Region.
These Graviton3-based instances with DDR5 memory are built on the AWS Nitro System and are a great fit for applications that need access to high-speed, low-latency local storage, including those that need temporary storage of data for scratch space, temporary files, and caches. They deliver up to 45% better real-time NVMe storage performance than comparable Graviton2-based instances. Graviton3-based instances also use up to 60% less energy than comparable EC2 instances for the same performance, enabling you to reduce your carbon footprint in the cloud.
C7gd instances are now available in the following AWS regions: US East (N. Virginia, Ohio), US West (Oregon, N. California), Europe (Spain, Stockholm, Ireland, Frankfurt), Asia Pacific (Tokyo, Mumbai, Singapore, Sydney, Malaysia) and AWS GovCloud (US-East).
Distributed tracing is a critical part of an observability stack, letting you troubleshoot latency and errors in your applications. Cloud Trace, part of Google Cloud Observability, is Google Cloud’s native tracing product, and we’ve made numerous improvements to the Trace explorer UI on top of a new analytics backend.
The new Trace explorer page contains:
A filter bar with options for users to choose a Google Cloud project-based trace scope, all/root spans and a custom attribute filter.
A visualization of matching spans including an interactive span duration heatmap (default), a span rate line chart, and a span duration percentile chart.
A table of matching spans that can be narrowed down further by selecting a cell of interest on the heatmap.
A tour of the new Trace explorer
Let’s take a closer look at these new features and how you can use them to troubleshoot your applications. Imagine you’re a developer working on the checkoutservice of a retail webstore application and you’ve been paged because there’s an ongoing incident.
This application is instrumented using OpenTelemetry and sends trace data to Google Cloud Trace, so you navigate to the Trace explorer page on the Google Cloud console with the context set to the Google Cloud project that hosts the checkoutservice.
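For reference, an application like this one might export spans to Cloud Trace with the OpenTelemetry SDK and the Google Cloud Trace exporter. The minimal sketch below assumes the opentelemetry-exporter-gcp-trace package is installed and uses a hypothetical span name and attribute.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter

# Send spans to Cloud Trace in the project resolved from the environment.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(CloudTraceSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkoutservice")

# Hypothetical span around order placement; attributes like rpc.method
# become searchable in the Trace explorer filter bar.
with tracer.start_as_current_span("orders publish") as span:
    span.set_attribute("rpc.method", "PlaceOrder")
    # ... publish the order ...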
Before starting your investigation, you remember that your admin recommended using the webstore-prod trace scope when investigating webstore app-wide prod issues. By using this Trace scope, you’ll be able to see spans stored in other Google Cloud projects that are relevant to your investigation.
You set the trace scope to webstore-prod and your queries will now include spans from all the projects included in this trace scope.
You select checkoutservice in Span filters (1) and the following updates load on the page:
Other sections such as Span name in the span filter pane (2) are updated with counts and percentages that take into account the selection made under service name. This can help you narrow down your search criteria to be more specific.
The span Filter bar (3) is updated to display the active filter.
The heatmap visualization (4) is updated to only display spans from the checkoutservice in the last 1 hour (default). You can change the time-range using the time-picker (5). The heatmap’s x-axis is time and the y-axis is span duration. It uses color shades to denote the number of spans in each cell with a legend that indicates the corresponding range.
The Spans table (6) is updated with matching spans sorted by duration (default).
Other Chart views (7) that you can switch to are also updated with the applied filter.
From looking at the heatmap, you can see that there are some spans in the >100s range which is abnormal and concerning. But first, you’re curious about the traffic and corresponding latency of calls handled by the checkoutservice.
Switching to the Span rate line chart gives you an idea of the traffic handled by your service. The x-axis is time and the y-axis is spans/second. The traffic handled by your service looks normal as you know from past experience that 1.5-2 spans/second is quite typical.
Switching to the Span duration percentile chart gives you p50/p90/p95/p99 span duration trends. While p50 looks fine, the p9x durations are greater than you expect for your service.
You switch back to the heatmap chart and select one of the outlier cells to investigate further. This particular cell has two matching spans with a duration of over 2 minutes, which is concerning.
You investigate one of those spans by viewing the full trace and notice that the orders publish span is the one taking up the majority of the time when servicing this request. Given this, you form a hypothesis that the checkoutservice is having issues handling these types of calls. To validate your hypothesis, you note the rpc.method attribute being PlaceOrder and exit this trace using the X button.
You add an attribute filter for key: rpc.method value:PlaceOrder using the Filter bar, which shows you that there is a clear latency issue with PlaceOrder calls handled by your service. You’ve seen this issue before and know that there is a runbook that addresses it, so you alert the SRE team with the appropriate action that needs to be taken to mitigate the incident.
Behind the scenes
This new experience is powered by BigQuery, using the same platform that backs Log Analytics. We plan to launch new features that take full advantage of this platform: SQL queries, flexible sampling, export, and regional storage.
In summary, you can use the new Cloud Trace explorer to perform service-oriented investigations with advanced querying and visualization of trace data. This allows developers and SREs to effectively troubleshoot production incidents and identify mitigating measures to restore normal operations.
The new Cloud Trace explorer is generally available to all users — try it out and share your feedback with us via the Send feedback button.
From transcribing customer calls and meetings, to analyzing research interviews and creating accessible content, audio transcription plays a vital role in extracting insights from spoken data. Our partners are collaborating with clients across industries to implement transcription solutions that enhance efficiency, accessibility, and data-driven decision-making.
Traditional audio transcription methods, such as manual transcription or basic speech-to-text tools, can be time-consuming, error-prone, and expensive. In this blog post, we show how Gemini offers a cutting-edge solution for scalable audio transcription by automating the process and delivering results with high accuracy at a fast pace – all in a cost-effective way.
The challenges of scaling audio transcription
As organizations scale their transcription needs, they might encounter challenges such as increasing costs, latency in handling large volumes of audio, and maintaining accuracy across diverse audio conditions. In particular, legacy solutions struggle with:
Handling complex audio with multiple speakers, accents, or background noise.
Maintaining accuracy in industry-specific terminology across healthcare, legal, and customer service domains.
Adapting to multilingual needs, especially in global business environments.
Optimizing processing time and cost, ensuring fast turnaround without excessive resource consumption.
A scalable solution must address these challenges efficiently, without compromising speed, accuracy, or customization — this is where Gemini excels.
How our partners put Gemini to work
Google Cloud Partners leverage audio transcription to help clients across various industries improve efficiency, compliance, and accessibility. Here are some examples:
Media and entertainment: Transcribe interviews, podcasts, and webinars for content creation, and generate subtitles to enhance accessibility and engagement.
Legal and compliance: Transcribe legal proceedings, contracts, and compliance-related communications to improve accuracy, streamline case management, and ensure regulatory adherence.
Healthcare: Convert medical dictations and clinical notes into structured records for better documentation, electronic health record (EHR) integration, and regulatory compliance.
Business and corporate: Transcribe meetings, interviews, and presentations to improve collaboration, knowledge sharing, and record-keeping.
Gemini redefines the possibilities of scalable audio transcription, offering a potent combination of advanced AI and seamless integration with Google Cloud. Here’s what sets it apart:
Efficient processing of large datasets: Gemini can handle large volumes of audio data with ease, making it ideal for organizations with high-throughput transcription needs.
Exceptional accuracy and contextual understanding: Backed by decades of Google research and development in speech recognition and natural language understanding, Gemini delivers highly accurate transcriptions that capture the nuances of conversations. This minimizes the need for manual review and correction, especially in cases with multiple speakers, accents, or challenging audio conditions.
Speaker diarization: Gemini can accurately identify and differentiate between speakers in an audio file, making it easier to follow conversations and attribute dialogue correctly.
Multilingual support: Gemini supports transcription in multiple languages and dialects, expanding its utility for global businesses and diverse content.
Customizable formatting: Gemini offers flexible formatting options, allowing users to tailor transcripts to their specific needs, including timestamps, speaker labels, and punctuation.
Introducing a differentiated solution
The Google Cloud Partner Engineering team worked together with System Integrators (SIs) to build a differentiated solution that allows customers to implement audio transcription at scale using Google’s Gemini on Google Cloud.
Gemini’s advanced multi-modal and reasoning capabilities have unlocked new possibilities for audio transcription. This solution allows audio files to be sent directly to Gemini for transcription. The reference architecture below illustrates how this solution is built:
gen AI powered audio transcription reference architecture
This architecture demonstrates a robust and scalable approach to audio transcription using Gemini. It can be modified for any audio transcription use case. Here’s how it works:
1. File upload and sorting: The Upload Cloud Storage bucket is used to store source audio files such as .wav, .mp3, and .mp4. When these files are uploaded, Eventarc triggers the Sort Cloud Run function. This trigger event is passed using Cloud Pub/Sub.
The Sort Cloud Run function manages incoming files by sorting and filtering them based on their file types (e.g., .wav, .mp3). Depending on the file type, the files are then stored in either the Recordings Cloud Storage bucket or the Archive Cloud Storage bucket.
2. Transcription: When audio files are placed in the Recordings Cloud Storage bucket, Eventarc uses Cloud Pub/Sub to trigger the Recording Cloud Run function. The Recording function then sends the audio files to the Gemini 1.5 Flash model for audio transcription.
3. Gemini’s multi-faceted processing: Gemini performs three key tasks:
a. Analysis and formatting: It analyzes the audio file, extracting pertinent data and structuring it into JSON format based on the audio file schema.
b. Transcription and summarization: Gemini transcribes the audio content into text and generates a concise summary.
c. Output and evaluation: The summarized text is sent to a “TTS Output” Cloud Storage bucket, triggering the TTS Audio Generation function. This function executes a script from the “Golden Script” Cloud Storage bucket to generate sample audio, which is then used to evaluate the transcription quality against established metrics like Word Error Rate (WER), Character Error Rate (CER), Match Rate, etc.
This approach provides key benefits: dynamic scaling through a serverless, event-driven architecture (Cloud Run, Eventarc), simplified management via fully managed services (Cloud Storage), cost-effectiveness by consuming resources only when needed, and enhanced capabilities like advanced summarization and speaker diarization powered by Gemini.
Design considerations
When designing audio transcription applications and services on Google Cloud with Gemini, several factors are crucial for optimal performance and scalability:
1. Efficient audio file handling: Avoid loading large audio files directly into memory for serverless transcription on Google Cloud. Instead, use Google Cloud Storage URI to efficiently access and process audio without memory limitations.
2. Serverless function timeouts: To prevent premature termination when processing large audio files in Cloud Run, increase the function timeout up to 60 minutes. Also set the Pub/Sub subscription acknowledgement deadline to 300 seconds for Eventarc.
3. Model selection and context window: For gen AI audio transcription, audio file size and duration dictate the model selection. Larger files and longer audio require models with large context windows, such as Gemini 1.5 Flash (1M tokens) and Gemini 1.5 Pro (2M tokens), overcoming prior LLM input limitations on the market today. Gemini 1.5’s extended context window and near-perfect retrieval capabilities open up many new possibilities:
Context lengths of leading foundation models
For audio transcription use cases, this means Gemini 1.5 Pro and Gemini 1.5 Flash offer scalable transcription, processing up to 22 and 11 hours of audio respectively, based on customer needs.
4. Speaker diarization:
a. Use the latest Gemini SDK: Ensure your code utilizes the most up-to-date SDK for optimal diarization performance.
b. Design effective prompts: Craft prompts that clearly instruct Gemini on diarization and formatting requirements. The diagram below shows a code example of a diarization prompt.
Sample transcription & diarization prompt
This sample code prompts Gemini to transcribe an audio file from Cloud Storage URI and displays the transcription.
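The original screenshot isn't reproduced here; below is a minimal sketch of such a prompt, assuming the Vertex AI Python SDK and a hypothetical project, bucket path, and model version.

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Placeholders: replace with your own project, location, and audio object.
vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash-002")

audio = Part.from_uri("gs://your-bucket/recordings/interview.mp3", mime_type="audio/mpeg")

prompt = (
    "Transcribe this audio file. Label each speaker as Speaker 1, Speaker 2, etc., "
    "include timestamps in [MM:SS] format, and finish with a short summary."
)

response = model.generate_content([audio, prompt])
print(response.text)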
5. Advanced diarization techniques: For complex scenarios with multiple speakers, accents, or overlapping speech, design prompts carefully to improve Gemini’s diarization accuracy. Consider separating diarization and transcription into separate functions; the snippet below shows an example of this.
Separate transcription & diarization function
In the screenshot above, the function highlighted in the red box contains the prompt that instructs Gemini to transcribe, and it shows how we want the transcription to be formatted. This allows Gemini to focus first on transcribing the audio into text and summarizing it.
The transcription function is a straightforward function with a zero-shot prompt. For the diarization function, we recommend designing your prompt with a few short examples. The code block highlighted in blue shows the diarization function with some examples that help the model diarize effectively and efficiently when there are multiple speakers.
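The screenshot itself isn't reproduced here; the sketch below shows one way to keep transcription and diarization as separate calls, with a couple of inline few-shot examples in the diarization prompt. The model version, project settings, and example text are placeholders.

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholders
model = GenerativeModel("gemini-1.5-flash-002")  # placeholder model version

def transcribe(audio_part):
    # Zero-shot: focus only on converting speech to text plus a short summary.
    prompt = "Transcribe this audio verbatim, then add a one-paragraph summary."
    return model.generate_content([audio_part, prompt]).text

def diarize(transcript: str) -> str:
    # Few-shot: short examples show the expected speaker-labelled format.
    prompt = f"""Assign speaker labels to the transcript below.

Example 1:
Input: "Hi, thanks for calling. Sure, my order number is 123."
Output:
Speaker 1: Hi, thanks for calling.
Speaker 2: Sure, my order number is 123.

Example 2:
Input: "Can everyone see my screen? Yes, looks good."
Output:
Speaker 1: Can everyone see my screen?
Speaker 2: Yes, looks good.

Transcript:
{transcript}
"""
    return model.generate_content(prompt).text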
6. Evaluating transcription quality: When building gen AI-powered audio transcription systems on Google Cloud, we recommend implementing a mechanism to evaluate the transcribed responses to further ensure accuracy. Consider using tools like our Model Evaluation Service to assess and improve transcription quality.
Get started
Ready to unlock the power of scalable audio transcription with Gemini? Explore Gemini’s API documentation and discover how easy it is to integrate its advanced capabilities into your solutions. By implementing the best practices and design considerations discussed in this post, you can deliver exceptional transcription experiences to clients and drive innovation across various industries.
If you are an approved partner and require assistance, contact your Google Partner Engineer for deployment support.
Written by: Ashley Pearson, Ryan Rath, Gabriel Simches, Brian Timberlake, Ryan Magaw, Jessica Wilbur
Overview
Beginning in August 2024, Mandiant observed a notable increase in phishing attacks targeting the education industry, specifically U.S.-based universities. A separate investigation conducted by Google’s Workspace Trust and Safety team identified a long-term campaign spanning from at least October 2022, with a noticeable pattern of shared filenames, targeting thousands of educational institution users per month.
These attacks exploit trust within academic institutions to deceive students, faculty, and staff, and have been timed to coincide with key dates in the academic calendar. The beginning of the school year, with its influx of new and returning students combined with a barrage of administrative tasks, as well as financial aid deadlines, can create opportunities for attackers to carry out phishing attacks. In these investigations, three distinct campaigns have emerged, attempting to take advantage of these factors.
In one campaign, attackers leveraged compromised educational institutions to host phishing Google Forms. At this time, Mandiant has observed at least 15 universities targeted in these phishing campaigns. In this case, the malicious forms were reported and subsequently removed; as such, none of the phishing forms identified are currently active. Another campaign involved scraping university login pages and re-hosting them on attacker-controlled infrastructure. Both campaigns exhibited tactics to obfuscate malicious activity while increasing their perceived legitimacy, ultimately to perform payment redirection attacks. These phishing methods employ various tactics to trick victims into revealing login credentials and financial information, including requests for school portal login verification, financial aid disbursement, refund verification, account deactivation, and urgent responses to campus medical inquiries.
Google takes steps to protect users from misuse of its products and to create an overall positive experience. However, awareness and education play a big role in staying secure online. To better protect yourself and others, be sure to report abuse.
Case Study 1: Google Forms Phishing Campaign
The first observed campaign was a two-pronged phishing operation. Attackers distributed phishing emails that contained a link to a malicious Google Form. These emails and their respective forms were designed to mimic legitimate university communications, but requested sensitive information, including login credentials and financial details.
Figure 1: Example phishing email
Figure 2: Another example phishing email
The email is just the initial stage of the attack. While there are legitimate URLs contained within the phish, there is also a request to visit an external link to provide “urgent” information. This external link leads victims to a Google Form that has been tailored to the targeted university, including a color scheme in the school colors, a header with the logo or mascot, and references to the university name. Mandiant has observed the creation and staging of several different Google Forms, all with different methods employed to trick victims into providing sensitive information. In one instance, the social engineering pretext is that a student’s account is “associated with logins from two separate university portals”, a conflict which, if not resolved, will lead to interruption in service at both universities.
Figure 3: Example Google Form phish
These Google Forms phishing campaigns are not just limited to targeting login credentials. In several instances, Mandiant observed threat actors attempting to obtain financial institution details.
Your school has collaborated with <redacted> to streamline fund
distribution to students. <redacted> ensures the quickest, most
dependable, and secure method for disbursing Emergency Grants
to eligible students. Unfortunately, we've identified an outstanding
issue regarding the distribution of your financial aid through <redacted>.
We kindly request that you review and, if necessary, update your
<redacted> information within the net 24 hours. Failing to address
this promptly may result in delays in receiving your funds.
Figure 4: Example Google Form phish
After successfully compromising an environment and propagating additional phishes from it, the threat actor then uses the victim’s infrastructure to host a similar campaign targeting future victims. In some cases, the Google Form link was shut down and then repurposed to further the attacker’s objectives.
Case Study 2: Website Cloning and Redirection
This campaign involves a sophisticated phishing attack where threat actors cloned a university website, mimicking the legitimate login portal. However, this cloned website involved a series of redirects, specifically targeting mobile devices.
The embedded JavaScript performs a “mobile check” and user-agent string verification, and then performs the following hex-encoded redirect:
if (window.mobileCheck()) {
  window.location.href = "\x68\x74\x74\x70\x3a\x2f\x2f\x63\x75\x74\x6c\x79\x2e\x74\x6f\x64\x61\x79\x2f\x4a\x4e\x78\x30\x72\x37";
}
Figure 5: JavaScript Hex-encoded redirect
This JavaScript checks to determine if the user is on a mobile device. If they are, it redirects them to one of several possible follow-on URLs. These are two examples:
hxxp://cutly[.]today/JNx0r7
hxxp://kutly[.]win/Nyq0r4
Case Study 3: Two-Step Phishing Campaign Targeting Staff and Students
Google’s Workspace Trust and Safety team also observed a two-step phishing campaign targeting staff and students. First, attackers send a phishing email to faculty and staff. The emails are designed to entice faculty and staff to provide their login credentials in order to view a document about a raise or bonus.
Figure 6: Example of phishing email targeting faculty and staff
Next, attackers use login credentials provided by faculty and staff to hijack their account and email phishing forms to students. These forms are designed to look like job applications, and phish for personal and financial information.
Figure 7: Example of phishing form emailed to students
Understanding Payment Redirection Attacks
Payment redirection attacks via Business Email Compromise (BEC) are a sophisticated form of financial fraud. In these attacks, cyber threat actors gain unauthorized access to a business email account and exploit it to redirect payments meant for legitimate recipients into their own accounts. While these attacks often involve the diversion of large transfers, there have been instances where attackers divert small amounts (typically 5-10%) to lower the likelihood of detection. This outlier tactic allows them to steal funds gradually, making it more challenging to detect unauthorized transactions.
Figure 8: Payment redirection attacks
Initial Compromise: Attackers often begin by gaining access to a legitimate email account through phishing, social engineering, or exploiting vulnerabilities. A common phishing technique involves using online surveys or other similar platforms to create convincing but fraudulent login pages or forms. When unsuspecting employees enter their credentials, attackers capture them and gain unauthorized access.
Reconnaissance: Once they have access to the email account, attackers closely monitor communications to understand the organization’s financial processes, the relationships with vendors, and the typical language used in financial transactions. This reconnaissance phase is crucial for the attackers to craft convincing fraudulent emails that appear authentic to their victims.
Impersonation and Execution: Armed with the information gathered during reconnaissance, attackers impersonate the compromised user or create look-alike email addresses. The threat actor then sends emails to employees, vendors, or clients, instructing them to change payment details for an upcoming transaction. Believing these requests to be legitimate, recipients comply, and the funds are redirected to accounts controlled by the attackers.
Withdrawal and Laundering: After the funds are diverted, attackers quickly withdraw or move the money across multiple accounts to make recovery difficult. The types of funds being stolen can vary widely and include financial aid such as FAFSA, refunds, scholarships, payroll, and other large transactions like vendor payments or grants. This diversity in targeted funds complicates efforts by organizations and law enforcement to trace and recover the stolen money, as each category may involve different institutions and processes.
The Impact of Payment Redirection Attacks
The consequences of a successful payment redirection attack can be severe:
Financial Losses: Organizations may lose substantial amounts of money, potentially running into millions of dollars, depending on the size of the transactions.
Reputational Damage: Clients and partners affected by these attacks may lose trust in the organization, which can harm long-term business relationships and brand reputation.
Operational Disruption: The aftermath of an attack often involves extensive investigations, coordination with financial institutions and law enforcement, and implementing enhanced security measures, all of which can disrupt normal business operations.
Mitigating Payment Redirection Attacks
To protect against payment redirection attacks, Mandiant recommends a multi-layered approach focusing on prevention, detection, and response:
Implement Multi-Factor Authentication (MFA): Requiring MFA for accessing email accounts adds an additional layer of security. Even if an attacker obtains a user’s credentials, they would still need the second factor to gain access, significantly reducing the risk of account compromise. Mandiant has observed many universities that require MFA for current faculty, staff, and students, but not for alumni accounts. While alumni accounts aren’t necessarily at risk of payment redirection attacks, Mandiant has identified instances where alumni accounts have been leveraged to access other user accounts in the environment.
Conduct Employee Training: Regular training sessions can help employees recognize phishing attempts and suspicious emails. Training should emphasize vigilance against phishing forms hosted on platforms like Google Forms, and stress the importance of verifying unusual requests, especially those involving financial transactions or changes in payment details. If a Google Forms page seems suspicious, report it as phishing.
Establish Payment Verification Protocols: Organizations should have strict procedures for verifying changes in payment information. For example, a policy that requires confirmation of changes via a known phone number or a separate communication channel can help ensure that any alterations are legitimate.
Use Canary Tokens for Detection: Deploying canary tokens, which are unique identifiers embedded in web pages or documents, can serve as an early warning system. If attackers scrape legitimate web pages to host them maliciously on their infrastructure, these tokens trigger alerts, notifying security teams of potential compromise or unauthorized data access.
Use Advanced Email Security Solutions: Deploying advanced email filtering and monitoring solutions can help detect and block malicious emails. These tools can analyze email metadata, check for domain anomalies, and identify patterns indicative of BEC attempts.
Built-in Protections with Gmail: Employs AI, threat signals, and Safe Browsing to block 99.9% of spam, phishing, and malware, while also detecting more malware than traditional antivirus and preventing suspicious account sign-ins.
Develop a robust Incident Response Plan: A well-defined incident response plan specifically addressing BEC scenarios enables organizations to act swiftly when an attack is detected. This plan should include procedures for containing the breach, notifying affected parties, and collaborating with financial institutions and law enforcement to recover lost funds.
Limit the number of emails a standard user can send in a day: Implementing a policy that restricts the number of emails a standard user can send daily provides additional safeguards in preventing the mass dissemination of phishing emails or malicious content from compromised accounts. This limit can act as a safety net, reducing the potential impact of a compromised account and making it harder for attackers to carry out large-scale phishing campaigns.
Context-Aware Access Monitoring: Utilize context-aware access monitoring to enhance security by analyzing the context of each login attempt. This includes evaluating factors such as the user’s location, device, and behavior patterns. If an access attempt deviates from established norms, such as an unusual login location or device, additional verification steps can be triggered. This helps detect and prevent unauthorized access, particularly in cases where credentials may have been compromised.
Detection
To assist the wider community in hunting and identifying activity outlined in this blog post, we have included a subset of these indicators of compromise (IOCs) in this post, and in a GTI Collection for registered users.
Amazon AppStream 2.0 improves the end-user experience by adding support for certificate-based authentication (CBA) on multi-session fleets running the Microsoft Windows operating system and joined to an Active Directory. This functionality helps administrators to leverage the cost benefits of the multi-session model while providing an enhanced end-user experience. By combining these enhancements with the existing advantages of multi-session fleets, AppStream 2.0 offers a solution that helps balance cost-efficiency and user satisfaction.
By using certificate-based authentication, you can rely on the security and logon experience features of your SAML 2.0 identity provider, such as passwordless authentication, to access AppStream 2.0 resources. Certificate-based authentication with AppStream 2.0 enables a single sign-on logon experience to access domain-joined desktop and application streaming sessions without separate password prompts for Active Directory.
This feature is available at no additional cost in all the AWS Regions where Amazon AppStream 2.0 is available. AppStream 2.0 offers pay-as-you go pricing. To get started with AppStream 2.0, see Getting Started with Amazon AppStream 2.0.
To enable this feature for your users, you must use an AppStream 2.0 image that uses an AppStream 2.0 agent released on or after February 7, 2025, or an image that uses managed AppStream 2.0 image updates released on or after February 11, 2025.
AWS Database Migration Service (AWS DMS) now supports the Multi-ENI networking model and Credentials Vending System for DMS Homogeneous Migrations.
Customers can now choose the Multi-ENI connection type and use the Credentials Vending System, providing a simplified networking configuration experience for secure connectivity to their on-premises database instances.
Amazon Relational Database Service (RDS) for PostgreSQL now supports the latest minor versions 17.4, 16.8, 15.12, 14.17, and 13.20. Please note, this release supports the versions released by the PostgreSQL community on February 20, 2025 to address the regression that was part of the February 13, 2025 release. We recommend that you upgrade to the latest minor versions to fix known security vulnerabilities in prior versions of PostgreSQL, and to benefit from the bug fixes added by the PostgreSQL community.
You can use automatic minor version upgrades to automatically upgrade your databases to more recent minor versions during scheduled maintenance windows. You can also use Amazon RDS Blue/Green deployments for RDS for PostgreSQL using physical replication for your minor version upgrades. Learn more about upgrading your database instances, including automatic minor version upgrades and Blue/Green Deployments, in the Amazon RDS User Guide.
Amazon RDS for PostgreSQL makes it simple to set up, operate, and scale PostgreSQL deployments in the cloud. See Amazon RDS for PostgreSQL Pricing for pricing details and regional availability. Create or update a fully managed Amazon RDS database in the Amazon RDS Management Console.
AWS CodePipeline introduces a new action to deploy to Amazon Elastic Compute Cloud (EC2). This action enables you to easily deploy your application to a group of EC2 instances behind load balancers.
Previously, if you wanted to deploy to EC2 instances, you had to use CodeDeploy with an AppSpec file to configure the deployment. Now, you can simply use this new EC2 deploy action in your pipeline to deploy to EC2 instances, without the necessity of managing CodeDeploy resources. This streamlined approach reduces your operational overhead and simplifies your deployment process.
To learn more about using the EC2 deploy action in your pipeline, visit our tutorial and documentation. For more information about AWS CodePipeline, visit our product page. This new action is available in all regions where AWS CodePipeline is supported, except the AWS GovCloud (US) Regions and the China Regions.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 3.8. You can now create new clusters using version 3.8 with either KRaft or ZooKeeper mode for metadata management, or upgrade your existing ZooKeeper-based clusters to use version 3.8. Apache Kafka version 3.8 includes several bug fixes and new features that improve performance. Key new features include support for compression level configuration, which lets you further optimize performance when using compression types such as lz4, zstd, and gzip by changing the default compression level. For more details and a complete list of improvements and bug fixes, see the Apache Kafka release notes for version 3.8.
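For instance, compression levels can be set as topic-level configs. The sketch below uses the kafka-python admin client; the per-codec level keys (compression.zstd.level, etc.) come from KIP-390 as introduced in Kafka 3.8, and the broker address is a placeholder. Your MSK cluster may also require TLS or IAM authentication settings not shown here.

from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

# Placeholder bootstrap servers for your MSK cluster.
admin = KafkaAdminClient(bootstrap_servers="b-1.example.kafka.us-east-1.amazonaws.com:9092")

# KIP-390 (Kafka 3.8): choose a codec and tune its compression level per topic.
admin.alter_configs([
    ConfigResource(
        ConfigResourceType.TOPIC,
        "clickstream-events",
        configs={
            "compression.type": "zstd",
            "compression.zstd.level": "6",  # trade CPU for a better compression ratio
        },
    )
])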
Amazon MSK is a fully managed service for Apache Kafka and Kafka Connect that makes it easier for you to build and run applications that use Apache Kafka as a data store. Amazon MSK is compatible with Apache Kafka, which enables you to quickly migrate your existing Apache Kafka workloads to Amazon MSK with confidence or build new ones from scratch. With Amazon MSK, you can spend more time innovating on streaming applications and less time managing Apache Kafka clusters. To learn how to get started, see the Amazon MSK Developer Guide.
Support for Apache Kafka version 3.8 is offered in all AWS regions where Amazon MSK is available.
Amazon Web Services, Inc. now supports China UnionPay credit cards for creating new AWS accounts, eliminating the need for international credit cards for customers in China.
To use China UnionPay for creating your AWS account, enter your address and billing country in China, then provide your local China UnionPay credit card details and verify your personal identity or business license. All subsequent AWS charges will be billed in Chinese yuan, providing a convenient payment experience for customers in China.
To get started, select China UnionPay as your payment method when creating a new AWS account. For more information on using China UnionPay credit cards with AWS, visit Set up a Chinese yuan credit card.
We are excited to announce the general availability of fine-grained data access control (FGAC) via AWS Lake Formation for Apache Spark with Amazon EMR on EKS. This enables you to enforce full FGAC policies (database, table, column, row, and cell-level) defined in Lake Formation for your data lake tables from EMR on EKS Spark jobs. We are also sharing the general availability of Glue Data Catalog views with EMR on EKS for Spark workflows.
Lake Formation simplifies building, securing, and managing data lakes by allowing you to define fine-grained access controls through grant and revoke statements, similar to RDBMS. The same Lake Formation rules now apply to Spark jobs on EMR on EKS for Hudi, Delta Lake, and Iceberg table formats, further simplifying data lake security and governance.
AWS Glue Data Catalog views with EMR on EKS allow customers to create views from Spark jobs that can be queried from multiple engines without requiring access to referenced tables. Administrators can control underlying data access using the rich SQL dialect provided by EMR on EKS Spark jobs. Access is managed with AWS Lake Formation permissions, including named resource grants, data filters, and Lake Formation tags. All requests are logged in AWS CloudTrail.
Fine-grained access control for Apache Spark batch jobs on EMR on EKS is available with the EMR 7.7 release in all regions where EMR on EKS is available. To get started, see Using AWS Lake Formation with Amazon EMR on EKS.
As a developer, there’s a lot to think about when you’re getting ready to launch an application. There’s the availability of the underlying database, of course, which stores application state and determines how quickly you can recover if your application or web servers go down. Thankfully, if you’re running on Spanner, its 99.999% availability architecture means that application owners don’t have to worry about designing and operating active-active database deployments. While Spanner itself is designed to offer high availability, as with any other database, it’s important to have a checklist to ensure a successful launch of your workload as a whole and its subsequent ongoing operation. We’ve captured a comprehensive list of items to keep in mind when launching workloads on Spanner in the Spanner Launch Checklist. In this blog post, we go over the key steps in the checklist and provide additional color about each item. Once you’re done, you’ll be equipped to launch your Spanner-based application with confidence.
Let’s explore the key steps to focus on when gearing up for a Spanner launch.
1. Design, develop, test, optimize
Your Spanner schema and transactions are the blueprint for success. Designing for Spanner’s distributed architecture isn’t just a good idea — it’s the key to unlocking its full potential. Ensure your schema avoids bottlenecks and hotspots, and that transactions are tailored for minimal locking and maximum performance. But the real game-changer? Testing. Rigorous, real-world load testing, complete with traffic spikes and realistic concurrency, helps ensure your workload is ready for production scale and capable of handling unexpected spikes in traffic.
2. Choose your deployments well
Spanner deployments entail more than just provisioning nodes; they’re about aligning infrastructure with your business goals. Choose configurations that balance availability and performance while accounting for traffic peaks with features like auto-scaling. Geography matters as well — place leader regions near your users for low-latency performance. And don’t forget the small stuff, like setting up tags and labels for resource tracking or warming up your database for a smooth start.
3. Implement backup and disaster recovery
Downtime isn’t an option when critical data is on the line. A solid backup and recovery strategy helps your business weather unexpected failures. Define recovery objectives, automate backups, and test restore procedures regularly. Features like incremental backups and Point-in-Time Recovery (PITR) make Spanner a reliable ally in your data protection arsenal.
4. Stay secure
Protecting your data goes beyond locking it up. Spanner’s security features, like fine-grained access controls and least-privilege permissions, let you limit access to only what’s necessary and for the right principals. Add an extra layer of defense with deletion protection and carefully designed IAM policies. With these safeguards, your Spanner instance remains both robust and secure.
5. Deploy logging, monitoring, and observability
To keep Spanner running smoothly, you need visibility. Audit logs, distributed tracing with OpenTelemetry, and real-time dashboards give you a clear view of what’s happening under the hood. Proactive alerts ensure you’re ready to act before small issues turn into big problems. We also invite you to check out Database Center, which provides a centralized view across your entire database fleet, including Spanner instances.
6. Optimize your configuration with the client library
Spanner’s client library is more than a connection — it’s a tool for optimization. Add transaction tags for enhanced debugging, fine-tune session pooling to match workloads, and tweak retry policies to handle errors with grace. These configurations boost both performance and resilience.
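As a small, hedged example of the session-pooling knob mentioned above, the sketch below configures a fixed-size pool with the Spanner Python client; the project, instance, and database names are placeholders, and the right pool size depends on your workload.

from google.cloud import spanner
from google.cloud.spanner_v1.pool import FixedSizePool

client = spanner.Client(project="your-project-id")
instance = client.instance("your-instance")

# Size the pool to roughly match the number of concurrent requests you expect,
# so sessions are reused instead of being created on the hot path.
pool = FixedSizePool(size=100, default_timeout=5)
database = instance.database("your-database", pool=pool)

def read_account(transaction):
    rows = transaction.execute_sql(
        "SELECT balance FROM Accounts WHERE id = @id",
        params={"id": 42},
        param_types={"id": spanner.param_types.INT64},
    )
    return list(rows)

# run_in_transaction automatically retries on transient aborts.
result = database.run_in_transaction(read_account)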
7. Cut costs without compromising performance
Managing costs isn’t about cutting corners. Take advantage of committed use discounts for predictable savings and let auto-scaling handle spikes while minimizing idle resources. For non-production environments, scale down or shut off instances when they’re not needed.
8. Plan your migration
If you’re moving to Spanner, a well-planned migration makes all the difference. From orchestrating a seamless cutover to preparing a reliable fallback strategy, every step counts. Communicate with stakeholders, validate data, and monitor the process to ensure success.
Ready, set, scale!
Launching a successful workload or application is the result of careful planning, rigorous testing, and proactive management. And it’s no different for workloads running on Spanner.
Deploying mission-critical workloads on Spanner is about building a solid foundation for scalability, resilience and peak performance. A well-structured launch checklist ensures you’ve covered all the bases, from design and deployment to backup strategies and cost management. You can review the Spanner Launch Checklist in the Spanner documentation.
Generative AI diffusion models such as Stable Diffusion and Flux produce stunning visuals, empowering creators across various verticals with impressive image generation capabilities. However, generating high-quality images through sophisticated pipelines can be computationally demanding, even with powerful hardware like GPUs and TPUs, impacting both costs and time-to-result.
The key challenge lies in optimizing the entire pipeline to minimize cost and latency without compromising on image quality. This delicate balance is crucial for unlocking the full potential of image generation in real-world applications. For example, before reducing the model size to cut image generation costs, prioritize optimizing the underlying infrastructure and software to ensure peak model performance.
At Google Cloud Consulting, we’ve been assisting customers in navigating these complexities. We understand the importance of optimized image generation pipelines, and in this post, we’ll share three proven strategies to help you achieve both efficiency and cost-effectiveness, to deliver exceptional user experiences.
A comprehensive approach to optimization
We recommend having a comprehensive optimization strategy that addresses all aspects of the pipeline, from hardware to code to overall architecture. One way that we address this at Google Cloud is with AI Hypercomputer, a composable supercomputing architecture that brings together hardware like TPUs and GPUs, along with software and frameworks like PyTorch. Here’s a breakdown of the key areas we focus on:
1. Hardware optimization: Maximizing resource utilization
Image generation pipelines often require GPUs or TPUs for deployment, and optimizing hardware utilization can significantly reduce costs. Since GPUs cannot be allocated fractionally, underutilization is common, especially when scaling workloads, leading to inefficiency and increased cost of operation. To address this, Google Kubernetes Engine (GKE) offers several GPU sharing strategies to improve resource efficiency. Additionally, A3 High VMs with NVIDIA H100 80GB GPUs come in smaller sizes, helping you scale efficiently and control costs.
Some key GPU sharing strategies in GKE include:
Multi-instance GPUs: In this strategy, GKE divides a single GPU into up to seven slices, providing hardware isolation between the workloads. Each GPU slice has its own resources (compute, memory, and bandwidth) and can be independently assigned to a single container. You can leverage this strategy for inference workloads where resiliency and predictable performance are required. Please review the documented limitations of this approach before implementing it, and note that the currently supported GPU types for multi-instance GPUs on GKE are NVIDIA A100 GPUs (40GB and 80GB) and NVIDIA H100 GPUs (80GB).
GPU time-sharing: GPU time-sharing lets multiple containers access full GPU capacity using rapid context switching between processes; this is made possible by instruction-level preemption in NVIDIA GPUs. This approach is more suitable for bursty and interactive workloads, or for testing and prototyping purposes where full isolation is not required. With GPU time-sharing, you can optimize GPU cost and utilization by reducing GPU idling time. However, context switching may introduce some latency overhead for individual workloads.
NVIDIA Multi-Process Service (MPS): NVIDIA MPS is a version of the CUDA API that lets multiple processes/containers run at the same time on the same physical GPU without interference. In this approach, you can run multiple small-to-medium-scale batch-processing workloads on a single GPU and maximize the throughput and hardware utilization. While implementing MPS, you must ensure that workloads using MPS can tolerate memory protection and error containment limitations.
Example illustration of GPU sharing strategies
2. Inference code optimization: Fine-tuning for efficacy
When you have an existing pipeline written natively in PyTorch, you have several options to optimize and reduce the pipeline execution time.
One way is to use PyTorch’s compile method, which enables just-in-time (JIT) compilation of PyTorch code into optimized kernels for faster execution, especially for the forward pass of the decoder step. You can do this through various compiler backends such as NVIDIA TensorRT, OpenVINO, or IPEX, depending on the underlying hardware. You can also use certain compiler backends at training time. A similar JIT compilation capability is also available for other frameworks such as JAX.
Another way to improve code latency is to enable Flash Attention. By enabling the torch.backends.cuda.enable_flash_sdp setting, PyTorch code natively runs Flash Attention where it helps speed up a given computation, while automatically selecting another attention mechanism if Flash isn’t optimal based on the inputs.
Additionally, to reduce latency, you also need to minimize data transfers between the GPU and CPU. Operations such as tensor loading, or comparing tensors and Python floats, incur significant overhead due to the data movement. Each time a tensor is compared with a floating-point value, it must be transferred to the CPU, incurring latency. Ideally, you should only load and offload a tensor on and off the GPU once in the entire image generation pipeline; this is especially important for image generation pipelines that utilize several models, where latency cascades with each model that is run. Tools such as PyTorch Profiler help us observe the time and memory utilization of a model.
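To make this concrete, here is a small, generic sketch of the techniques above: a stand-in decoder module compiled with torch.compile, Flash Attention enabled, and the input tensor moved to the GPU once. It is not a production pipeline, just an illustration of the calls involved.

import torch
import torch.nn as nn

# Stand-in for the heavy decoder stage of an image-generation pipeline.
class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

    def forward(self, x):
        return self.net(x)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Prefer Flash Attention kernels where they apply; PyTorch falls back otherwise.
torch.backends.cuda.enable_flash_sdp(True)

# JIT-compile the module into optimized kernels (backend depends on the hardware).
decoder = torch.compile(Decoder().to(device))

# Move inputs to the GPU once and keep intermediate tensors there to avoid
# repeated host<->device transfers between pipeline stages.
latents = torch.randn(8, 512, device=device)
with torch.no_grad():
    images = decoder(latents)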
3. Inference pipeline optimization: Streamlining the workflow
While optimizing code can help speed up individual modules within a pipeline, you need to take a look at the big picture. Many multi-step image-generation pipelines cascade multiple models (e.g., samplers, decoders, and image and/or text embedding models) one after the other to generate the final image, often on a single container that has a single GPU attached.
For Diffusion-based pipelines, models such as decoders can have significantly higher computational complexity and hence take longer to execute, especially as compared with embedding models, which are generally faster. That means that certain models can cause a bottleneck in the generation pipeline. To optimize GPU utilization and mitigate this bottleneck, you may consider employing a multi-threaded queue-based approach for efficient task scheduling and execution. This approach enables parallel execution of different pipeline stages on the same GPU, allowing for concurrent processing of several requests. Efficiently distributing tasks among worker threads minimizes GPU idle time and maximizes resource utilization, ultimately leading to higher throughput.
Furthermore, by maintaining tensors on the same GPU throughout the process, you can reduce the overhead of CPU-to-GPU (and vice-versa) data transfers, further enhancing efficiency and reducing costs.
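A minimal sketch of that queue-based pattern, using Python's standard threading and queue modules with placeholder stage functions, might look like this:

import queue
import threading

requests_q = queue.Queue()   # incoming prompts
decode_q = queue.Queue()     # intermediate results waiting for the slow decoder stage

def embed_worker():
    # Fast stage: runs concurrently with decoding so the GPU is never idle.
    while True:
        prompt = requests_q.get()
        decode_q.put(("embedding-for:" + prompt, prompt))  # placeholder work
        requests_q.task_done()

def decode_worker():
    # Slow stage (e.g., the diffusion decoder); tensors would stay on the same GPU.
    while True:
        embedding, prompt = decode_q.get()
        print(f"generated image for {prompt!r} from {embedding!r}")  # placeholder
        decode_q.task_done()

for worker in (embed_worker, decode_worker):
    threading.Thread(target=worker, daemon=True).start()

for p in ("a red bicycle", "a snowy mountain"):
    requests_q.put(p)

requests_q.join()
decode_q.join()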
Processing time comparison between a stacked pipeline and a multithreaded pipeline for 2 concurrent requests
Final thoughts
Optimizing image-generation pipelines is a multifaceted process, but the rewards are significant. By adopting a comprehensive approach that includes hardware optimization for maximized resource utilization, code optimization for faster execution, and pipeline optimization for increased throughput, you can achieve substantial performance gains, reduce costs, and deliver exceptional user experiences. In our work with customers, we’ve consistently seen that implementing these optimization strategies can lead to significant cost savings without compromising image quality.
Ready to get started?
At Google Cloud Consulting, we’re dedicated to helping our customers build and deploy high-performing AI solutions. If you’re looking to optimize your image generation pipelines, connect with Google Cloud Consulting today, and we’ll work to help you unlock the full potential of your AI initiatives.
We extend our sincere gratitude to Akhil Sakarwal, Ashish Tendulkar, Abhijat Gupta, and Suraj Kanojia for their invaluable support and guidance throughout the crucial experimentation phase.
Organizations use multiple clouds to gain agility, use resources more efficiently, and leverage the strengths of different cloud providers. However, managing application traffic across these environments is challenging. To support predictable services, organizations need a system that intelligently selects the optimal backend for each request. The selection takes into account both the real-time health of those backends and the user’s location. This entire process of dynamic traffic routing depends on continuous monitoring of application endpoints across all clouds, which delivers the real-time insights needed for informed decisions.
Today, we’re announcing the general availability of Cloud DNS routing policies with public IP health checking, which provides the automated, health-aware traffic management that you need to build resilient applications, no matter where your workloads reside.
Running on multiple cloud providers often leads to fragmented traffic management strategies. Cloud DNS now lets you intelligently route traffic across multiple cloud providers based on application health from a single interface. Cloud DNS supports a variety of routing policies, including weighted round robin (WRR), geolocation, and failover, giving you the flexibility to tailor your traffic management strategy to your specific needs.
Cloud DNS uses routing policies and health checks to direct traffic to healthy backends. These health checks probe internet-based endpoints — any public IP address on other cloud providers, on-premises environments, or even other load balancers. To help improve outage detection in multicloud deployments, health checks are regionalized, originating from points of presence near Google Cloud regions. A backend is considered healthy when a majority of these regional probes report a successful connection. Based on these health checks, Cloud DNS routing policies automatically direct traffic away from failing backends. This automated process happens at the DNS level, providing a crucial layer of control and traffic steering across your infrastructure.
Here are the steps to building a resilient multicloud architecture with Cloud DNS routing policies and public IP health checking:
1. Set up health checks
Configure a HealthCheck resource in Compute Engine, specifying the application’s port on the public IP address. You must select three geographically diverse Google Cloud regions as the origin points for the health-check probes. A good practice is to select regions that are most representative of the user base. For example, if an application serves clients from North America and Europe, then a good choice is to include regions from those locations as origins for health checks.
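A hedged sketch of this step with the Compute Engine Python client is below; the project, port, and region names are placeholders, and the source_regions field assumes a recent google-cloud-compute release that exposes health-check source regions for public IP health checking.

from google.cloud import compute_v1

client = compute_v1.HealthChecksClient()

health_check = compute_v1.HealthCheck(
    name="multicloud-app-hc",
    type_="TCP",
    tcp_health_check=compute_v1.TCPHealthCheck(port=443),
    check_interval_sec=10,
    timeout_sec=5,
    healthy_threshold=2,
    unhealthy_threshold=3,
    # Three geographically diverse origin regions, close to the user base.
    # Assumes the library version exposes source_regions for public IP checks.
    source_regions=["us-east1", "europe-west1", "us-central1"],
)

operation = client.insert(project="your-project-id", health_check_resource=health_check)
operation.result()  # wait for the global operation to complete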
2. Configure a failover routing policy and link it to the health check
Create a routing policy in Cloud DNS. Define the primary and backup endpoints, specifying the public IP addresses of your applications in different cloud environments.
3. Fail over automatically
If an application instance becomes unhealthy (at least two of the three probe regions report a failure), Cloud DNS can switch traffic to the healthy instance in another cloud, depending on how the routing policy is configured and the health of the backup endpoint. The routing decision happens at the DNS level before traffic reaches the applications, helping support failover across your multicloud infrastructure.
Because health checks test internet-based endpoints, they can be located anywhere on the internet, letting you build cross-cloud and on-prem failover scenarios. Services can be located in other clouds, and traffic can be switched between providers or to on-prem locations during an outage. This lets you as a multicloud customer standardize on Cloud DNS for workloads, helping streamline traffic management and reduce the operational overhead of managing multiple DNS configurations. Furthermore, with health check logging, you can validate that your routing policies are performing as expected and identify any infrastructure issues with specific backends.
Multicloud deployments are increasingly common. This new Cloud DNS capability provides the automated, health-aware traffic management needed to navigate the complexities of multicloud deployments and strive for positive user experiences.
AWS CodePipeline introduces a new action to deploy to Amazon Elastic Kubernetes Service (Amazon EKS). This action enables you to easily deploy your container applications to your EKS clusters, including those in private VPCs.
Previously, if you wanted to deploy to an EKS cluster within a private network, you had to initialize and maintain a compute environment within the private network. Now, you can simply provide the name of the EKS cluster and add this action to your pipeline. The pipeline will automatically establish a connection into your private network to deploy your container application, without additional infrastructure needed. This streamlined approach reduces your operational overhead and simplifies your deployment process.
To learn more about using the EKS action in your pipeline, visit our tutorial and documentation. For more information about AWS CodePipeline, visit our product page. This new action is available in all regions where AWS CodePipeline is supported, except the AWS GovCloud (US) Regions and the China Regions.
Beginning today, customers can use Amazon Bedrock in the Asia Pacific (Hyderabad) and Asia Pacific (Osaka) regions to easily build and scale generative AI applications using a variety of foundation models (FMs) as well as powerful tools to build generative AI applications.
Amazon Bedrock is a fully managed service that offers a choice of high-performing large language models (LLMs) and other FMs from leading AI companies via a single API. Amazon Bedrock also provides a broad set of capabilities customers need to build generative AI applications with security, privacy, and responsible AI built in. These capabilities help you build tailored applications for multiple use cases across different industries, helping organizations unlock sustained growth from generative AI while ensuring customer trust and data governance.
AWS Elastic Beanstalk now enables customers to deploy applications on Windows Server 2025 and Windows Server Core 2025 environments. These environments come pre-configured with .NET Framework 4.8.1 and .NET 8.0, providing developers with the latest Long Term Support (LTS) version of .NET alongside the established .NET Framework.
Windows Server 2025 and Windows Server Core 2025 deliver enhanced security features and performance improvements. Developers can create Elastic Beanstalk environments on Windows Server 2025 using the Elastic Beanstalk Console, CLI, API, or AWS Toolkit for Visual Studio.
This platform is generally available in commercial regions where Elastic Beanstalk is available, including the AWS GovCloud (US) Regions. For a complete list of regions and service offerings, see AWS Regions.
For more information about .NET on Windows Server platforms, see the Elastic Beanstalk developer guide. To learn more about Elastic Beanstalk, visit the Elastic Beanstalk product page.
Starting today, Amazon EC2 G6e instances powered by NVIDIA L40S Tensor Core GPUs are available in the Europe (Stockholm) region. G6e instances can be used for a wide range of machine learning and spatial computing use cases.
Customers can use G6e instances to deploy large language models (LLMs) with up to 13B parameters and diffusion models for generating images, video, and audio. Additionally, the G6e instances will unlock customers’ ability to create larger, more immersive 3D simulations and digital twins for spatial computing workloads. G6e instances feature up to 8 NVIDIA L40S Tensor Core GPUs with 48 GB of memory per GPU and third generation AMD EPYC processors. They also support up to 192 vCPUs, up to 400 Gbps of network bandwidth, up to 1.536 TB of system memory, and up to 7.6 TB of local NVMe SSD storage. Developers can run AI inference workloads on G6e instances using AWS Deep Learning AMIs, AWS Deep Learning Containers, or managed services such as Amazon Elastic Kubernetes Service (Amazon EKS), AWS Batch, and Amazon SageMaker.
Amazon EC2 G6e instances are available today in the AWS US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Frankfurt, Spain, Stockholm) regions. Customers can purchase G6e instances as On-Demand Instances, Reserved Instances, Spot Instances, or as part of Savings Plans.