AWS CodeBuild now offers native support for self-hosted Buildkite runners, enabling you to execute Buildkite pipeline jobs within the CodeBuild environment. AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages ready for deployment.
Buildkite is a continuous integration and continuous delivery platform. With this feature, your Buildkite jobs can access CodeBuild’s comprehensive suite of instance types and managed images, and utilize native integrations with AWS services. You have control over the build environment, without the overhead of manually provisioning and scaling the compute resources.
The Buildkite runner feature is available in all regions where CodeBuild is offered. For more information about the AWS Regions where CodeBuild is available, see the AWS Regions page.
A new minor version of Microsoft SQL Server is now available on Amazon RDS for SQL Server, providing performance enhancements and security fixes. Amazon RDS for SQL Server now supports this latest minor version of SQL Server 2019 across the Express, Web, Standard, and Enterprise editions.
We encourage you to upgrade your Amazon RDS for SQL Server database instances at your convenience. You can upgrade with just a few clicks in the Amazon RDS Management Console or by using the AWS CLI. Learn more about upgrading your database instances from the Amazon RDS User Guide. The new minor version is SQL Server 2019 CU30 – 15.0.4415.2.
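For example, a minimal sketch of this upgrade using the AWS SDK for Python (boto3) might look like the following; the instance identifier is a placeholder, and the exact engine version string should be confirmed (for example, with describe_db_engine_versions) before applying it:

import boto3

rds = boto3.client("rds")

# Confirm the exact SQL Server 2019 minor version string before upgrading
# (the value below is an assumption based on the usual RDS naming convention).
versions = rds.describe_db_engine_versions(Engine="sqlserver-se")

rds.modify_db_instance(
    DBInstanceIdentifier="my-sqlserver-instance",  # placeholder instance name
    EngineVersion="15.00.4415.2.v1",               # assumed engine version string
    ApplyImmediately=True,                         # or defer to the maintenance window
)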
This minor version is available in all AWS commercial Regions where Amazon RDS for SQL Server databases are available, as well as the AWS GovCloud (US) Regions.
Amazon RDS for SQL Server makes it simple to set up, operate, and scale SQL Server deployments in the cloud. See Amazon RDS for SQL Server Pricing for pricing details and regional availability.
Amazon Connect now provides the ability to choose which states an agent can be in when adhering to their schedule, making it easier for you to customize adherence tracking to match your unique operational needs. With this launch, you can now define custom mappings between agent statuses and schedule activities. For example, schedule activity “Work” can be mapped to multiple agent statuses such as “Available” and “Back-office work.” An agent scheduled for “Work” from 8 AM to 10 AM will be considered adherent if they are either in “Available” or “Back-office work” status. Additionally, you can now view the actual name of the scheduled activity in the real-time adherence dashboard (as opposed to only Productive/Non-productive). With custom mappings and enhanced real-time dashboard, this launch provides more accurate and flexible agent adherence monitoring.
This feature is available in all AWS Regions where Amazon Connect agent scheduling is available. To learn more about Amazon Connect agent scheduling, click here.
Written By: Jacob Paullus, Daniel McNamara, Jake Rawlins, Steven Karschnia
Executive Summary
Mandiant exploited flaws in the Microsoft Software Installer (MSI) repair action of Lakeside Software’s SysTrack installer to obtain arbitrary code execution.
An attacker with low-privilege access to a system running the vulnerable version of SysTrack could escalate privileges locally.
Mandiant responsibly disclosed this vulnerability to Lakeside Software, and the issue has been addressed in version 11.0.
Introduction
Building upon the insights shared in a previous Mandiant blog post, Escalating Privileges via Third-Party Windows Installers, this case study explores the ongoing challenge of securing third-party Windows installers. These vulnerabilities are rooted in insecure coding practices when creating Microsoft Software Installer (MSI) Custom Actions and can be caused by references to missing files, broken shortcuts, or insecure folder permissions. These oversights create gaps that inadvertently allow attackers the ability to escalate privileges.
As covered in our previous blog post, after software is installed with an MSI file, Windows caches the MSI file in the C:\Windows\Installer folder for later use. This allows users on the system to access and use the "repair" feature, which is intended to address various issues that may be impacting the installed software. During execution of an MSI repair, several operations (such as file creation or execution) may be triggered from an NT AUTHORITY\SYSTEM context, even if initiated by a low-privilege user, thereby creating privilege escalation opportunities.
This blog post specifically focuses on the discovery and exploitation of CVE-2023-6080, a local privilege escalation vulnerability that Mandiant identified in Lakeside Software’s SysTrack Agent version 10.7.8.
Exploiting the SysTrack Installer
Mandiant began by using Microsoft’s Process Monitor (ProcMon) to analyze and review file operations executed during the repair process of SysTrack’s MSI. While running the repair process as a low-privileged user, Mandiant observed file creation and execution within the user’s %TEMP% folder from MSIExec.exe.
Figure 1: MSIExec.exe copying and executing .tmp file in user’s %TEMP% folder
Each time Mandiant ran the repair functionality, MSIExec.exe wrote a new .tmp file to the %TEMP% folder using a formulaic name, and then executed it. Through dynamic analysis of the installer, Mandiant discovered that the name generated by the repair function consisted of the string "wac" followed by four randomly chosen hex characters (0-9, A-F). With this naming scheme, there were 65,536 possible filenames.
Due to the %TEMP% folder being writable by a low-privilege user, Mandiant tested the behavior of the repair tool when all possible filenames already existed within the %TEMP% folder. Mandiant created a PowerShell script to copy an arbitrary test executable to each possible file name in the range of wac0000.tmp to wacFFFF.tmp.
# Path to the permutations file
$csvFilePath = '.\permutations.csv'
# Path to the executable
$exePath = '.\test.exe'
# Target directory (using the system's temp directory)
$targetDirectory = [System.IO.Path]::GetTempPath()
# Read the CSV file content
$csvContent = Get-Content -Path $csvFilePath
# Split the content into individual values
$values = $csvContent -split ","
# Loop through each value and copy the exe to the target directory with the new name
Foreach ($value in $values) {
    $newFilePath = Join-Path -Path $targetDirectory -ChildPath ($value + ".tmp")
    Copy-Item -Path $exePath -Destination $newFilePath
}
Write-Output "Copy operation completed to $targetDirectory"
Figure 2: Creating all possible .tmp files in %TEMP%
Figure 3: Excerpt of .tmp files created in %TEMP%
After filling the previously identified namespace, Mandiant reran the MSI repair function to observe its subsequent behavior. Upon review of the ProcMon output, Mandiant observed that when the namespace was exhausted, the application failed over to an incrementing filename pattern: it began with wac1.tmp and, if the previous file existed, incremented the number predictably. To confirm this behavior, Mandiant manually created wac1.tmp and wac2.tmp, then observed the MSI repair action in ProcMon. When running the MSI repair function, the resulting filename was wac3.tmp.
Figure 4: MSIExec.exe writing and executing a predicted .tmp file
Additionally, Mandiant observed that there was a small delay between the file write action and the file execution action, which could potentially result in a race condition vulnerability. Since Mandiant could now force the program to use a predetermined filename, Mandiant wrote another PowerShell script designed to attempt to win the race condition by copying a file (test.exe) to the %TEMP% folder, using the predicted filename, between the file write and execution in order to overwrite the file created by MSIExec.exe. In this test, test.exe was a simple proof-of-concept executable that would start notepad.exe.
while ($true) {
    if (Test-Path -Path "C:\Users\USER\AppData\Local\Temp\wac3.tmp") {
        Copy-Item -Path "C:\Users\USER\Desktop\test.exe" -Destination "C:\Users\USER\AppData\Local\Temp\wac3.tmp" -Force
    }
}
Figure 5: PowerShell race condition script to copy arbitrary file into %TEMP%
With wac1.tmp and wac2.tmp staged in the %TEMP% folder, Mandiant ran both the PowerShell script and the MSI repair action targeting wac3.tmp. With the race condition script running, execution of the repair action resulted in test.exe overwriting the intended binary and subsequently being executed by MSIExec.exe, opening cmd.exe as NT AUTHORITY\SYSTEM.
Figure 6: Obtaining an NT AUTHORITY\SYSTEM command prompt
Defensive Considerations
As discussed in Mandiant's previous blog post, misconfigured Custom Actions can be trivial to find and exploit, making them a significant security risk for organizations. It is essential for software developers to follow secure coding practices and review their implemented Custom Actions to prevent attackers from hijacking high-privilege operations triggered by the MSI repair functionality. Refer to the original blog post for general best practices when configuring Custom Actions. During the discovery of CVE-2023-6080, Mandiant identified several misconfigurations and oversights that allowed for privilege escalation to NT AUTHORITY\SYSTEM.
The SysTrack MSI performed file operations including creation and execution in the user's %TEMP% folder, which provides a low-privilege user the opportunity to alter files being actively used in a high-privilege context. Software developers should keep folder permissions in mind and ensure all privileged file operations are performed from folders that are appropriately secured. This can include altering the read/write permissions for the folder, or using built-in folders such as C:\Program Files or C:\Program Files (x86), which are inherently protected from low-privilege users.
Additionally, the software’s filename generation schema included a failover mechanism that allowed an attacker to force the application into using a predetermined filename. When using randomized filenames, developers should use a sufficiently large length to ensure that an attacker cannot exhaust all possible filenames and force the application into unexpected behavior. In this case, knowing the target filename before execution made it significantly easier to beat the race condition, as opposed to dynamically identifying and replacing the target file between the time of its creation by MSIExec.exe and the time of its execution.
Something security professionals must also consider is the safety of the programs running on corporate machines. Many approved applications may inadvertently contain security vulnerabilities that increase the risk in our environments. Mandiant recommends that companies consider auditing the security of their individual endpoints to ensure that defense in depth is maintained at an organizational level. Furthermore, where possible, companies should monitor the spawning of administrative shells such as cmd.exe and powershell.exe in an elevated context to alert on possible privilege escalation attempts.
A Final Word
Domain privilege escalation is often the focus of security vendors and penetration tests, but it is not the only avenue for privilege escalation or compromise of data integrity in a corporate environment. Compromise of integrity on a single system can allow an attacker to mount further attacks throughout the network; for example, the Network Access Account used by SCCM can be compromised through a single workstation and when misconfigured can be used to escalate privileges within the domain and pivot to additional systems within the network.
Mandiant offers dedicated endpoint security assessments, during which customer endpoints are tested from multiple contexts, including the perspective of an adversary with low-privilege access attempting to escalate privileges. For more information about Mandiant’s technical consulting services, including comprehensive endpoint security assessments, visit our website.
We would like to extend a special thanks to Andrew Oliveau, who was a member of the testing team that discovered this vulnerability during his time at Mandiant.
CVE-2023-6080 Disclosure Timeline
June 13, 2024 – Vulnerability reported to Lakeside Software
July 1, 2024 – Lakeside Software confirmed the vulnerability
August 7, 2024 – Confirmed vulnerability fixed in version 11.0
AWS Transfer Family web apps are now available in the following additional Regions: North America (N. California, Canada West, Canada Central), South America (São Paulo), Europe (London, Paris, Zurich, Milan, Spain), Africa (Cape Town), Israel (Tel Aviv), Middle East (Bahrain, UAE), and Asia Pacific (Osaka, Hong Kong, Hyderabad, Jakarta, Melbourne, Seoul, Mumbai). This expansion allows you to create Transfer Family web apps in additional commercial Regions where Transfer Family is available.
AWS Transfer Family web apps provide a simple interface for accessing your data in Amazon S3 through a web browser. With Transfer Family web apps, you can provide your workforce with a fully managed, branded, and secure portal for your end users to browse, upload, and download data in S3.
Dashboard Q&A by Amazon Q in QuickSight enables QuickSight authors to add data Q&A to their dashboards with one click. With dashboard Q&A, QuickSight users can ask and answer questions about their data using natural language.
Dashboard Q&A automatically extracts the semantic information presented in dashboards and uses it to enable Q&A over that data; the same semantics also improve answers in existing Topic-based Q&A experiences. With dashboard Q&A, authors can quickly deliver self-service access to customized data insights for the entire organization.
Dashboard Q&A is launching to all regions in which QuickSight’s generative data Q&A is available today, as documented here.
Amazon Elastic Block Store (Amazon EBS) now supports additional resource-level permissions for creating EBS volumes from snapshots. With this launch, you now have more granular controls to set resource-level permissions for the creation of a volume and selection of the source snapshot when calling the CreateVolume action in your IAM policy. This allows you to control the IAM identities that can create EBS volumes from source snapshots, and the conditions under which they can use these snapshots to create EBS volumes.
To meet your specific permission needs on the source snapshots, you can also specify any of 5 EC2-specific condition keys in your IAM policy: ec2:Encrypted, ec2:VolumeSize, ec2:Owner, ec2:ParentVolume, and ec2:SnapshotTime. Additionally, you can use global condition keys for the source snapshot.
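For illustration only, here is a hedged sketch of such a policy created with the AWS SDK for Python (boto3); the account ID, Region, and the particular conditions chosen are placeholder assumptions rather than a policy taken from the launch materials:

import json
import boto3

# Sketch: allow CreateVolume, but only from encrypted snapshots owned by this
# account (IDs, Region, and conditions are placeholders).
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:CreateVolume",
            "Resource": "arn:aws:ec2:us-east-1:111122223333:volume/*",
        },
        {
            "Effect": "Allow",
            "Action": "ec2:CreateVolume",
            "Resource": "arn:aws:ec2:us-east-1::snapshot/*",
            "Condition": {
                "Bool": {"ec2:Encrypted": "true"},
                "StringEquals": {"ec2:Owner": "111122223333"},
            },
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="CreateVolumeFromOwnedEncryptedSnapshots",
    PolicyDocument=json.dumps(policy_document),
)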
This new resource-level permission model is available in all AWS Regions where EBS volumes are available. To learn more about using resource-level permissions to create EBS volumes, or about transitioning from the previous permission model to the new one, please visit the launch blog. For more information about Amazon EBS, please visit the product page.
Today, Amazon Q Developer announces an improved software development agent capable of running build and test scripts on generated code to validate it before developers review it. This new capability detects errors, ensures generated code is in sync with the project's current state, and accelerates the development process by producing higher quality code on the first iteration.
With the developer’s natural language input request and project-specific context, the Amazon Q Developer agent is designed to assist in implementing complex multi-file features and bug fixes. The agent will analyze the existing codebase, make necessary code changes, and run the selected build and test commands to ensure the code is working as expected. Where errors are found, the agent will iterate on the code prior to requesting the developer’s review. Throughout the process, the agent maintains a real-time connection with the developer, providing updates as changes are made. With control over what commands Amazon Q runs through a Devfile, you can customize the development process for better accuracy.
The Amazon Q Developer agent for software development is available for JetBrains and Visual Studio Code IDEs in all AWS regions where Q Developer is available.
AWS Deadline Cloud now includes the ability to specify a limit for a specific resource, like a floating license, and also constrain the maximum number of workers that work on a job. AWS Deadline Cloud is a fully managed service that simplifies render management for teams creating computer-generated graphics and visual effects, for films, television and broadcasting, web content, and design.
By adding a limit to your Deadline Cloud farm, you can specify a maximum amount of concurrent usage of resources by workers in your farm. Capping resource usage ensures tasks don’t start until the resources needed to run are available. For example, if you have 50 floating licenses for a particular plugin required by your rendering workflow, a Deadline Cloud limit allows you to ensure no more than 50 tasks requiring that limit are started, preventing tasks from failing due to the license being unavailable. Additionally, setting a maximum number of workers on a job enables you to prevent any single job from consuming all the available workers so that you can efficiently run multiple jobs concurrently when there are a limited number of workers available.
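As a rough sketch of what this might look like with the AWS SDK for Python (boto3), assuming the Deadline Cloud CreateLimit operation and using placeholder identifiers (the farm ID and the amount requirement name are illustrative):

import boto3

deadline = boto3.client("deadline")

# Create a limit of 50 concurrent uses for a floating license. The farm ID and
# amountRequirementName are placeholders; the amount requirement name must
# match the value your job templates reference.
deadline.create_limit(
    farmId="farm-1234567890abcdef0",
    displayName="plugin-floating-licenses",
    amountRequirementName="amount.limit.plugin-licenses",
    maxCount=50,
)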
Limits are available in all AWS Regions where Deadline Cloud is available.
Amazon Connect now includes the ability for agents to schedule time off up to 24 months in the future, making it easier for managers and agents to plan ahead of time. With this launch, agents can now book time off in Connect up to 24 months ahead of time (an increase from 13 months). Additionally, you can now upload pre-approved time off windows for a scheduling group (group allowance) for up to 27 months at a time (an increase from 13 months). These increased limits provide agents more flexibility to plan their personal time and also provide managers better visibility into future staffing needs, thus enabling more efficient resource allocation.
This feature is available in all AWS Regions where Amazon Connect agent scheduling is available. To learn more about Amazon Connect agent scheduling, click here.
Amazon AppStream 2.0 now allows administrators to control whether admin consent is required when users link their OneDrive for Business accounts as a persistent storage option.
The new capability simplifies the management of AppStream 2.0 persistent storage and the admin consent process. After enabling OneDrive for Business for an AppStream 2.0 stack and specifying the OneDrive domains, administrators can now configure whether admin consent is needed for each OneDrive domain. If admin consent is required, administrators must approve users’ OneDrive connections within their Azure Active Directory environment when users attempt to link their account to AppStream 2.0.
This feature is available at no additional cost in all AWS Regions where AppStream 2.0 is offered. It is supported only on AppStream stacks using single-session Windows fleets.
To get started, open the AppStream 2.0 console and create a stack. In the Enable storage step, enable OneDrive for Business and configure the admin consent settings. For more details, refer to Administer OneDrive for Business. You can also programmatically manage the setting using AppStream 2.0 APIs. For API details, see the CreateStack API documentation.
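As an illustrative sketch with the AWS SDK for Python (boto3), assuming the OneDrive admin consent setting is exposed on the stack's storage connector (the DomainsRequireAdminConsent field name and the domain are assumptions to verify against the CreateStack API reference):

import boto3

appstream = boto3.client("appstream")

# Sketch: create a stack whose OneDrive for Business connector requires admin
# consent for one domain. The domain is a placeholder, and the
# DomainsRequireAdminConsent field name should be confirmed in the API docs.
appstream.create_stack(
    Name="my-stack",
    StorageConnectors=[
        {
            "ConnectorType": "ONE_DRIVE",
            "Domains": ["example.onmicrosoft.com"],
            "DomainsRequireAdminConsent": ["example.onmicrosoft.com"],
        }
    ],
)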
AWS Glue announces 14 new connectors for applications, expanding its connectivity portfolio. Customers can now use AWS Glue native connectors to ingest data from Blackbaud Raiser’s Edge NXT, CircleCI, Docusign Monitor, Domo, Dynatrace, Kustomer, Mailchimp, Microsoft Teams, Monday, Okta, Pendo, Pipedrive, Productboard and Salesforce Commerce Cloud.
As enterprises increasingly rely on data-driven decisions, they need to integrate with data from various applications. With 14 new connectors, customers have more options to easily establish a connection to their applications using the AWS Glue console or AWS Glue APIs without the need to learn application-specific APIs. Glue native connectors provide the scalability and performance of the AWS Glue Spark engine along with support for standard authorization and authentication methods like OAuth 2. With these connectors, customers can test connections, validate their connection credentials, preview data, and browse metadata.
AWS Glue native connectors to Blackbaud Raiser's Edge NXT, CircleCI, Docusign Monitor, Domo, Dynatrace, Kustomer, Mailchimp, Microsoft Teams, Monday, Okta, Pendo, Pipedrive, Productboard, and Salesforce Commerce Cloud are available in all AWS commercial Regions.
To get started, create new AWS Glue connections with these connectors and use them as a source in AWS Glue Studio. To learn more, visit the AWS Glue documentation for connectors.
Amazon RDS Custom for SQL Server now offers enhanced storage and performance capabilities, supporting up to 64 TiB of storage and 256,000 I/O operations per second (IOPS) with io2 Block Express volumes. This represents an improvement from the previous limit of 16 TiB and 64,000 IOPS with io2 Block Express. These enhancements enable transactional databases and data warehouses to handle larger workloads on a single Amazon RDS Custom for SQL Server database instance.
The support for 64 TiB and 256,000 IOPS with io2 Block Express for Amazon RDS Custom for SQL Server is now generally available in all AWS Regions where both Amazon RDS io2 Block Express volumes and Amazon RDS Custom for SQL Server are currently supported.
Amazon RDS Custom for SQL Server is a managed database service that allows customization of the underlying operating system and includes the ability to bring your own licensed SQL Server media or use SQL Server Developer Edition while providing the time-savings, durability, and scalability benefits of a managed database service. To get started, visit the Amazon RDS Custom for SQL Server User Guide. See Amazon RDS Custom Pricing for up-to-date pricing of instances, storage, data transfer, and regional availability.
Amazon Connect Cases now allows agents and supervisors to filter cases in the agent workspace by custom field values, making it easier to narrow down search results and find relevant cases. Users can also customize the case list view and search results layout by adding custom columns, hiding or rearranging existing columns, and adjusting the number of cases per page. These enhancements enable users to tailor the case list view to meet their needs and manage their case workloads more effectively.
The Amazon EventBridge console now displays the source and detail type of all available AWS service events when you create a rule in the EventBridge console. This makes it easier for customers to discover and utilize the full range of AWS service events when building event-driven architectures. Additionally, the EventBridge documentation now includes an automatically updated list of all AWS service events, facilitating access to the most current information.
Amazon EventBridge Event Bus is a serverless event router that enables you to create highly scalable event-driven applications by routing events between your own applications, third-party SaaS applications, and other AWS services. With this update, developers can quickly search and filter all available AWS service events, including their detail types, in the EventBridge console when configuring event patterns in the sandbox and in rules, as well as in the documentation, making it easier to create event-driven integrations and reduce misconfiguration.
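For example, once you have found a service event's source and detail type in the console, you can reference them in a rule; the sketch below uses the AWS SDK for Python (boto3) with the EC2 instance state-change event and a placeholder rule name:

import json
import boto3

events = boto3.client("events")

# Match EC2 instance state-change notifications on the default event bus.
event_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
}

events.put_rule(
    Name="ec2-state-change-rule",  # placeholder rule name
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)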
This feature in the EventBridge console is available in all commercial AWS Regions. To learn more about discovering and using AWS service events in Amazon EventBridge, see the updated list of AWS service events in the documentation here.
Amazon Managed Service for Prometheus collector, a fully-managed agentless collector for Prometheus metrics, adds support for cross-account ingestion. Starting today, you can agentlessly scrape metrics from Amazon Elastic Kubernetes Service clusters in different accounts than your Amazon Managed Service for Prometheus workspace.
While it was previously possible to apply AWS multi-account best practices for centralized observability with Amazon Managed Service for Prometheus workspaces, you had to use self-managed collection. This meant running, scaling, and patching telemetry agents yourself to scrape metrics from Amazon Elastic Kubernetes Service clusters in various accounts and ingest them into a central Amazon Managed Service for Prometheus workspace in a different account. With this launch, you can use the Amazon Managed Service for Prometheus collector to remove this heavy lifting and ingest metrics in a cross-account setup without running a collector yourself. In addition, you can now also use the collector to scrape metrics from Amazon Elastic Kubernetes Service clusters and ingest them into Amazon Managed Service for Prometheus workspaces created with customer managed keys.
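A hedged sketch of what a cross-account scraper might look like with the AWS SDK for Python (boto3) follows; the cluster and workspace ARNs, subnet IDs, scrape configuration file, and especially the roleConfiguration parameter and its field names are assumptions to verify against the CreateScraper API reference and the user guide:

import boto3

amp = boto3.client("amp")

# Sketch: the EKS cluster lives in one account, the workspace in another.
with open("scrape-config.yaml", "rb") as f:  # placeholder Prometheus scrape config
    scrape_config = f.read()

amp.create_scraper(
    alias="cross-account-scraper",
    source={
        "eksConfiguration": {
            "clusterArn": "arn:aws:eks:us-east-1:111111111111:cluster/source-cluster",
            "subnetIds": ["subnet-0abc", "subnet-0def"],
        }
    },
    destination={
        "ampConfiguration": {
            "workspaceArn": "arn:aws:aps:us-east-1:222222222222:workspace/ws-example"
        }
    },
    scrapeConfiguration={"configurationBlob": scrape_config},
    # Assumed parameter for the cross-account IAM role pairing; confirm the
    # exact name and shape in the CreateScraper documentation.
    roleConfiguration={
        "sourceRoleArn": "arn:aws:iam::111111111111:role/scraper-source-role",
        "targetRoleArn": "arn:aws:iam::222222222222:role/scraper-target-role",
    },
)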
Amazon Managed Service for Prometheus collector is available in all regions where Amazon Managed Service for Prometheus is available. To learn more about Amazon Managed Service for Prometheus collector, visit the user guide or product page.
For developers who want to use the PyTorch deep learning framework with Cloud TPUs, the PyTorch/XLA Python package is key, offering developers a way to run their PyTorch models on Cloud TPUs with only a few minor code changes. It does so by leveraging OpenXLA, developed by Google, which gives developers the ability to define their model once and run it on many different types of machine learning accelerators (e.g., GPUs, TPUs).
The latest release of PyTorch/XLA comes with several improvements that improve its performance for developers:
A new experimental scan operator to speed up compilation for repetitive blocks of code (e.g., for loops)
Host offloading to move TPU tensors to the host CPU’s memory to fit larger models on fewer TPUs
Improved goodput for tracing-bound models through a new base Docker image compiled with the C++ 2011 Standard application binary interface (C++11 ABI) flags
In addition to these improvements, we've also reorganized the documentation to make it easier to find what you're looking for!
Let’s take a look at each of these features in greater depth.
Experimental scan operator
Have you ever experienced long compilation times, for example when working with large language models and PyTorch/XLA — especially when dealing with models with numerous decoder layers? During graph tracing, where we traverse the graph of all the operations being performed by the model, these iterative loops are completely “unrolled” — i.e., each loop iteration is copied and pasted for every cycle — resulting in large computation graphs. These larger graphs lead directly to longer compilation times. But now there’s a new solution: the new experimental scan function, inspired by jax.lax.scan.
The scan operator works by changing how loops are handled during compilation. Instead of compiling each iteration of the loop independently, which creates redundant blocks, scan compiles only the first iteration. The resulting compiled high-level operation (HLO) is then reused for all subsequent iterations. This means that there is less HLO or intermediate code that is being generated for each subsequent loop. Compared to a for loop, scan compiles in a fraction of the time since it only compiles the first loop iteration. This improves the developer iteration time when working on models with many homogeneous layers, such as LLMs.
Building on top of torch_xla.experimental.scan, the torch_xla.experimental.scan_layers function offers a simplified interface for looping over sequences of nn.Modules. Think of it as a way to tell PyTorch/XLA “These modules are all the same, just compile them once and reuse them!” For example:
import torch
import torch.nn as nn
import torch_xla
from torch_xla.experimental.scan_layers import scan_layers

class DecoderLayer(nn.Module):
    def __init__(self, size):
        super().__init__()
        self.linear = nn.Linear(size, size)

    def forward(self, x):
        return self.linear(x)

with torch_xla.device():
    layers = [DecoderLayer(1024) for _ in range(64)]
    x = torch.randn(1, 1024)

# Instead of a for loop, we can scan_layers once:
# for layer in layers:
#     x = layer(x)
x = scan_layers(layers, x)
One thing to note is that custom Pallas kernels do not yet support scan. Here is a complete example of using scan_layers in an LLM for reference.
Host offloading
Another powerful tool for memory optimization in PyTorch/XLA is host offloading. This technique allows you to temporarily move tensors from the TPU to the host CPU’s memory, freeing up valuable device memory during training. This is especially helpful for large models where memory pressure is a concern. You can use torch_xla.experimental.stablehlo_custom_call.place_to_host to offload a tensor and torch_xla.experimental.stablehlo_custom_call.place_to_device to retrieve it later. A typical use case involves offloading intermediate activations during the forward pass and then bringing them back during the backward pass. Here’s an example of host offloading for reference.
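A minimal sketch of the idea follows; the tensor shapes and surrounding logic are illustrative, while the place_to_host and place_to_device entry points are the ones named above:

import torch
import torch_xla
from torch_xla.experimental.stablehlo_custom_call import (
    place_to_host,
    place_to_device,
)

with torch_xla.device():
    # An intermediate activation we would like to keep, but not in device HBM.
    activation = torch.randn(4096, 4096)

    # Move the tensor to host CPU memory, freeing device memory for other work.
    activation_on_host = place_to_host(activation)

    # ... forward pass continues on the TPU ...

    # Bring the tensor back to the device when it is needed again,
    # for example during the backward pass.
    activation = place_to_device(activation_on_host)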
Strategic use of host offloading, such as when you’re working with limited memory and are unable to use the accelerator continuously, may significantly improve your ability to train large and complex models within the memory constraints of your hardware.
Alternative base Docker image
Have you ever encountered a situation where your TPUs are sitting idle while your host CPU is heavily loaded tracing your model execution graph for just-in-time compilation? This suggests your model is “tracing bound,” meaning performance is limited by the speed of tracing operations.
The C++11 ABI image offers a solution. Starting with this release, PyTorch/XLA offers a choice of C++ ABI flavors for both Python wheels and Docker images. This gives you a choice for which version of C++ you’d like to use with PyTorch/XLA. You’ll now find builds with both the pre-C++11 ABI, which remains the default to match PyTorch upstream, and the more modern C++11 ABI.
Switching to the C++11 ABI wheels or Docker images can lead to noticeable improvements in the above-mentioned scenarios. For example, we observed a 20% relative improvement in goodput with the Mixtral 8x7B model on v5p-256 Cloud TPU (with a global batch size of 1024) when we switched from the pre-C++11 ABI to the C++11 ABI! ML Goodput gives us an understanding of how efficiently a given model utilizes the hardware. So if we have a higher goodput measurement for the same model on the same hardware, that indicates better performance of the model.
An example of using a C++11 ABI docker image in your Dockerfile might look something like:
# Use the C++11 ABI PyTorch/XLA image as the base
FROM us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm_cxx11

# Install any additional dependencies here
# RUN pip install my-other-package

# Copy your code into the container
COPY . /app
WORKDIR /app

# Run your training script
CMD ["python", "train.py"]
Alternatively, if you are not using Docker images (for instance, because you are testing locally), you can install the C++11 ABI wheels for version 2.6 with pip. Our documentation provides the install command for Python 3.10, along with instructions for other Python versions.
The flexibility to choose between C++ ABIs lets you choose the optimal build for your specific workload and hardware, ultimately leading to better performance and efficiency in your PyTorch/XLA projects!
So, what are you waiting for, go try out the latest version of PyTorch/XLA! For additional information check out the latest release notes.
A note on GPU support
We aren’t offering a PyTorch/XLA:GPU wheel in the PyTorch/XLA 2.6 release. We understand this is important and plan to reinstate GPU support by the 2.7 release. PyTorch/XLA remains an open-source project and we welcome contributions from the community to help maintain and improve the project. To contribute, please start with the contributors guide.
The latest stable version where a PyTorch/XLA:GPU wheel is available is torch_xla 2.5.
Modern AI workloads require powerful accelerators and high-speed interconnects to run sophisticated model architectures on an ever-growing diverse range of model sizes and modalities. In addition to large-scale training, these complex models need the latest high-performance computing solutions for fine-tuning and inference.
Today, we’re excited to bring the highly-anticipated NVIDIA Blackwell GPUs to Google Cloud with the preview of A4 VMs, powered by NVIDIA HGX B200. The A4 VM features eight Blackwell GPUs interconnected by fifth-generation NVIDIA NVLink, and offers a significant performance boost over the previous generation A3 High VM. Each GPU delivers 2.25 times the peak compute and 2.25 times the HBM capacity, making A4 VMs a versatile option for training and fine-tuning for a wide range of model architectures, while the increased compute and HBM capacity makes it well-suited for low-latency serving.
The A4 VM integrates Google’s infrastructure innovations with Blackwell GPUs to bring the best cloud experience for Google Cloud customers, from scale and performance, to ease-of-use and cost optimization. Some of these innovations include:
Enhanced networking: A4 VMs are built on servers with our Titanium ML network adapter, optimized to deliver a secure, high-performance cloud experience for AI workloads, building on NVIDIA ConnectX-7 network interface cards (NICs). Combined with our datacenter-wide 4-way rail-aligned network, A4 VMs deliver non-blocking 3.2 Tbps of GPU-to-GPU traffic with RDMA over Converged Ethernet (RoCE). Customers can scale to tens of thousands of GPUs with our Jupiter network fabric with 13 Petabits/sec of bi-sectional bandwidth.
Google Kubernetes Engine: With support for up to 65,000 nodes per cluster, GKE is the most scalable and fully automated Kubernetes service for customers to implement a robust, production-ready AI platform. Out of the box, A4 VMs are natively integrated with GKE. Integrating with other Google Cloud services, GKE facilitates a robust environment for the data processing and distributed computing that underpin AI workloads.
Vertex AI: A4 VMs will be accessible through Vertex AI, our fully managed, unified AI development platform for building and using generative AI, and which is powered by the AI Hypercomputer architecture under the hood.
Open software: In addition to PyTorch and CUDA, we work closely with NVIDIA to optimize JAX and XLA, enabling the overlap of collective communication and computation on GPUs. Additionally, we added optimized model configurations and example scripts for GPUs with XLA flags enabled.
Hypercompute Cluster: Our new highly scalable clustering system streamlines infrastructure and workload provisioning, and ongoing operations of AI supercomputers with tight GKE and Slurm integration.
Multiple consumption models: In addition to the On-demand, Committed use discount, and Spot consumption models, we reimagined cloud consumption for the unique needs of AI workloads with Dynamic Workload Scheduler, which offers two modes for different workloads: Flex Start mode for enhanced obtainability and better economics, and Calendar mode for predictable job start times and durations.
Hudson River Trading, a multi-asset-class quantitative trading firm, will leverage A4 VMs to train its next generation of capital market model research. The A4 VM, with its enhanced inter-GPU connectivity and high-bandwidth memory, is ideal for the demands of larger datasets and sophisticated algorithms, accelerating Hudson River Trading’s ability to react to the market.
“We’re excited to leverage A4, powered by NVIDIA’s Blackwell B200 GPUs. Running our workload on cutting edge AI Infrastructure is essential for enabling low-latency trading decisions and enhancing our models across markets. We’re looking forward to leveraging the innovations in Hypercompute Cluster to accelerate deployment of training our latest models that deliver quant-based algorithmic trading.” – Iain Dunning, Head of AI Lab, Hudson River Trading
“NVIDIA and Google Cloud have a long-standing partnership to bring our most advanced GPU-accelerated AI infrastructure to customers. The Blackwell architecture represents a giant step forward for the AI industry, so we’re excited that the B200 GPU is now available with the new A4 VM. We look forward to seeing how customers build on the new Google Cloud offering to accelerate their AI mission.” – Ian Buck, Vice-President and General Manager of Hyperscale and HPC, NVIDIA
Better together: A4 VMs and Hypercompute Cluster
Effectively scaling AI model training requires precise and scalable orchestration of infrastructure resources. These workloads often stretch across thousands of VMs, pushing the limits of compute, storage, and networking.
Hypercompute Cluster enables you to deploy and manage these large clusters of A4 VMs with compute, storage and networking as a single unit. This makes it easy to manage complexity while delivering exceptionally high performance and resilience for large distributed workloads. Hypercompute Cluster is engineered to:
Deliver high performance through co-location of A4 VMs densely packed to enable optimal workload placement
Optimize resource scheduling and workload performance with GKE and Slurm, packed with intelligent features like topology-aware scheduling
Increase reliability with built-in self-healing capabilities, proactive health checks, and automated recovery from failures
Enhance observability and monitoring for timely and customized insights
Automate provisioning, configuration, and scaling, integrated with GKE and Slurm
We’re excited to be the first hyperscaler to announce preview availability of an NVIDIA Blackwell B200-based offering. Together, A4 VMs and Hypercompute Cluster make it easier for organizations to create and deliver AI solutions across all industries. If you’re interested in learning more, please reach out to your Google Cloud representative.
Amazon S3 announces schema definition support for the CreateTable API to programmatically create tables with pre-defined columns. This enhancement simplifies table creation for data analytics applications, making it easier to get started and ingest data in S3 table buckets.
To use this feature, you can specify column names and their data types as new request headers in the CreateTable API to define a table’s schema in an S3 table bucket. You can also define a table’s schema when you create tables using the AWS CLI or the AWS SDK.
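For example, a minimal sketch using the AWS SDK for Python (boto3) might look like the following; the table bucket ARN, namespace, and columns are placeholders, and the exact metadata shape should be confirmed against the CreateTable API reference:

import boto3

s3tables = boto3.client("s3tables")

# Create an Apache Iceberg table with a pre-defined schema in an S3 table bucket.
s3tables.create_table(
    tableBucketARN="arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket",
    namespace="analytics",
    name="daily_sales",
    format="ICEBERG",
    metadata={
        "iceberg": {
            "schema": {
                "fields": [
                    {"name": "order_id", "type": "long", "required": True},
                    {"name": "order_date", "type": "date"},
                    {"name": "amount", "type": "double"},
                ]
            }
        }
    },
)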
Amazon S3 Tables now support creating up to 10,000 tables in each S3 table bucket. With this higher quota, you can scale up to 100,000 tables across 10 table buckets within an AWS Region per AWS Account. The higher table quota is available by default on all table buckets at no additional cost.
S3 Tables deliver the first cloud object store with built-in Apache Iceberg support, and the easiest way to store tabular data at scale. You can use S3 Tables with AWS Analytics services through the preview integration with Amazon SageMaker Lakehouse, as well as Apache Iceberg-compatible open source engines like Apache Spark and Apache Flink.