Today, Amazon Cognito announced support for passwordless authentication for low-friction and secure logins in the AWS GovCloud (US) Regions. Amazon Cognito now allows you to secure user access to your applications with passwordless authentication, including sign-in with passkeys, email, and text message. Passkeys are based on FIDO standards and use public key cryptography, which enables strong, phishing-resistant authentication. With passwordless authentication, you can reduce the friction associated with traditional password-based authentication and simplify the log-in experience for your users. For example, if your users choose passkeys to log in, they can do so using a built-in authenticator, such as Touch ID on Apple MacBooks or Windows Hello facial recognition on PCs.
Amazon Cognito provides millions of users with secure, scalable, and customizable sign-up and sign-in experiences within minutes. With this launch, AWS is now extending the support for passwordless authentication to the applications you build. This enables your end-users to log in to your applications with a low-friction and secure approach.
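For illustration, here is a minimal boto3 sketch of enabling passwordless first factors on an existing user pool. The SignInPolicy and AllowedFirstAuthFactors names follow the Cognito passwordless launch and should be verified against your SDK version; the pool ID is hypothetical.

```python
import boto3

cognito = boto3.client("cognito-idp", region_name="us-gov-west-1")

# Allow passkeys (WebAuthn), email OTP, and SMS OTP as first authentication
# factors alongside passwords. Requires the Cognito Essentials tier.
cognito.update_user_pool(
    UserPoolId="us-gov-west-1_EXAMPLE",  # hypothetical pool ID
    Policies={
        "SignInPolicy": {
            "AllowedFirstAuthFactors": ["PASSWORD", "WEB_AUTHN", "EMAIL_OTP", "SMS_OTP"]
        }
    },
)
```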
Passwordless authentication is offered as part of the Cognito Essentials tier and can be used in all AWS Regions where Amazon Cognito is available, including AWS GovCloud (US). To get started, see the following resources:
Amazon Cognito introduces Managed Login in the AWS GovCloud (US) Regions, a fully managed, hosted sign-in and sign-up experience that customers can personalize to align with their company or application branding. Amazon Cognito provides millions of users with secure, scalable, and customizable sign-up and sign-in experiences. With Managed Login, Cognito customers can now use its no-code visual editor to customize the look and feel of the user journey, from sign-up and login to password recovery and multi-factor authentication.
Managed Login helps customers offload the undifferentiated heavy lifting of designing and maintaining custom implementations such as passwordless authentication and localization. For example, Managed Login offers pre-built integrations for passwordless login, including sign-in with passkeys, email, or text message. This gives customers the flexibility to implement low-friction and secure authentication methods without authoring custom code. With Managed Login, customers now design and manage their end-user sign-up and sign-in experience through the AWS Management Console. Additionally, Cognito has revamped its getting-started experience with application-specific guidance (e.g., for web applications) that helps customers configure their user pools quickly. Together, Managed Login and the simplified getting-started experience help customers get their applications to end users faster than ever before with Amazon Cognito.
Managed Login is offered as part of the Cognito Essentials tier and can be used in all AWS Regions where Amazon Cognito is available, including the AWS GovCloud (US) Regions. To get started, refer to:
Amazon Cognito is now available in the AWS GovCloud (US-East) Region. This launch introduces all Amazon Cognito features and tiers: Essentials, Lite, and Plus, allowing customers to use comprehensive and flexible authentication and access control features to implement secure, scalable, and customized sign-up and sign-in experiences for their application within minutes. Cognito allows customers to scale authentication to millions of users and supports sign-in with social identity providers such as Apple, Facebook, Google, and Amazon, and enterprise identity providers via standards such as SAML 2.0 and OpenID Connect.
For a full list of regions where Amazon Cognito is available, refer to the AWS Region Table. To learn more about Amazon Cognito, refer to:
Today, we are excited to announce fully managed tiered storage for Spanner, a new capability that lets you use larger datasets with Spanner by striking the right balance between cost and performance, while minimizing operational overhead through a simple, easy-to-use interface.
Spanner powers mission-critical operational applications at organizations in financial services, retail, gaming, and many other industries. These workloads rely on Spanner’s elastic scalability and global consistency to deliver always-on experiences at any size. For example, a global trade ledger at a bank or a multi-channel order and inventory management system at a retailer depend on Spanner to provide a consistent view of real-time data to make trades and assess risk, fulfill orders, or dynamically optimize prices.
But over time, settled trade records or fulfilled orders become less important to running the business, and instead drive historical reporting or legal compliance. These datasets don’t require the same real-time performance as “hot,” active, transactional data, prompting customers to look for ways to move this “cold” data to lower-cost storage.
However, moving to alternative types of storage typically requires complicated data pipelines and can impact the performance of the operational system. Manually separating data across storage solutions can result in inconsistent reads that require application-level reconciliation. Furthermore, the separation imposes significant limits on how applications can query across current and historical data for things like responding to regulators; it also increases governance touchpoints that need to be audited.
Tiered storage with Spanner addresses these challenges with a new storage tier based on hard disk drives (HDD) that is 80% cheaper than the existing tier based on solid-state drives (SSD), which is optimized for low-latency and high-throughput queries.
Beyond the cost savings, benefits include:
Ease of management: Storage tiering with Spanner is entirely policy-driven, minimizing the toil and complexity of building and managing additional pipelines or splitting and duplicating data across solutions. Asynchronous processes automatically move data from SSD to HDD as part of background maintenance tasks.
Unified and consistent experience: In Spanner, the location of data storage is transparent to you. Queries on Spanner can access data across both SSD and HDD tiers without modification. Similarly, backup policies are applied consistently across the data, enabling consistent restores across both storage tiers.
Flexibility and control: Tiering policies can be applied to the database, table, column, or a secondary index, allowing you to choose what data to move to HDD. For example, data in a column that is rarely queried, e.g., JSON blobs for a long tail of product attributes, can easily be moved to HDD without having to split database tables. You can also choose to have some indexes on SSD, while the data resides in HDD.
“At Mercari, we use Spanner as the database for Merpay, our mobile payments platform that supports over 18.7 million users. With our ever-growing transaction volume, we were exploring options to store accumulated historic transaction data, but did not want to take on the overhead of constantly migrating data to another solution. The launch of Spanner tiered storage will allow us to store old data more cost-effectively, without requiring the use of another solution, while giving us the flexibility of querying it as needed.” – Shingo Ishimura, GAE Meister, Mercari
Let’s take a closer look
To get started, use GoogleSQL or PostgreSQL data definition language (DDL) to configure a locality group that defines the storage option ('ssd', the default, or 'hdd'). Locality groups are a mechanism to provide data locality and isolation along a dimension (e.g., table, column) to optimize performance. While configuring a locality group, you can also use 'ssd_to_hdd_spill_timespan' to specify how long data should be stored on SSD before it moves to HDD as part of a subsequent compaction cycle.
```sql
# An HDD-only locality group.
CREATE LOCALITY GROUP hdd_only OPTIONS (storage = 'hdd');

# An SSD-only locality group.
CREATE LOCALITY GROUP ssd_only OPTIONS (storage = 'ssd');

# An SSD to HDD spill policy.
CREATE LOCALITY GROUP recent_on_ssd OPTIONS (storage = 'ssd', ssd_to_hdd_spill_timespan = '15d');

# Update the tiering policy on the entire database.
ALTER LOCALITY GROUP `default` SET OPTIONS (storage = 'ssd', ssd_to_hdd_spill_timespan = '30d');

# Apply a locality group policy to a new table.
CREATE TABLE PaymentLedger (
  TxnId INT64 NOT NULL,
  Amount INT64 NOT NULL,
  Account INT64 NOT NULL,
  Description STRING(MAX)
) PRIMARY KEY (TxnId), OPTIONS (locality_group = 'recent_on_ssd');

# Apply a locality group policy to an existing column.
ALTER TABLE PaymentLedger ALTER COLUMN Description SET OPTIONS (locality_group = 'hdd_only');
```
Once the DDL has been configured, movement of data from SSD to HDD takes place asynchronously during weekly compaction cycles at the underlying storage layer without any user involvement.
HDD usage can be monitored from System Insights, which displays the amount of HDD storage used per locality group and the disk load at the instance level.
Spanner tiered storage supports both GoogleSQL and PostgreSQL-dialect databases and is available in all regions in which Spanner is available. This functionality is available with Enterprise/Enterprise Plus editions of Spanner for no additional cost beyond the cost of the HDD storage.
Get started with Spanner today
With tiered storage, customers can onboard larger datasets on Spanner while optimizing costs and minimizing operational overhead through a unified experience. Visit our documentation to learn more.
Want to learn more about what makes Spanner unique and how to use tiered storage? Try it yourself for free for 90 days or for as little as $88 USD/month (Enterprise edition) for a production-ready instance that grows with your business without downtime or disruptive re-architecture.
Today, Amazon EMR on EKS announces support for Amazon EKS Pod Identity, simplifying the setup of IAM permissions required by EMR on EKS jobs to access other AWS resources. With this launch, you can configure IAM permissions through a single API call, significantly reducing complexity and potential for errors. The new feature also allows you to leverage IAM roles across multiple clusters without the need to update IAM trust policies for use in new clusters, improving reusability and operational efficiency.
To run workloads on Amazon EMR on EKS, customers need to create a job execution IAM role that pods in an EKS cluster will use to interact with other AWS resources such as Amazon S3 buckets. Previously, customers had to perform multiple configuration steps, such as creating an OIDC identity provider and updating the IAM role's trust policy. Role trust policy size also limited the number of EKS clusters across which customers could reuse a job execution role. Now, customers can configure IAM permissions through a single API call and reuse an IAM role across multiple clusters without additional configuration updates.
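The single API call referenced above is the EKS Pod Identity association. A minimal boto3 sketch, with hypothetical cluster, namespace, service account, and role names (EMR on EKS documents the exact service accounts its jobs use):

```python
import boto3

eks = boto3.client("eks")

# Bind the job execution role to the service account that EMR on EKS pods
# run under; no OIDC provider or trust-policy edits are needed.
eks.create_pod_identity_association(
    clusterName="my-eks-cluster",              # hypothetical cluster name
    namespace="emr-jobs",                      # namespace registered with the virtual cluster
    serviceAccount="emr-containers-sa-spark",  # hypothetical service account
    roleArn="arn:aws:iam::111122223333:role/EMRJobExecutionRole",
)
```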
Amazon EMR on EKS support for EKS Pod Identity is available in all Regions where Amazon EMR on EKS is currently available, except the China Regions. To get started, visit the documentation.
Amazon EventBridge enhanced event source discovery, which displays the source and detail type of all AWS service events during rule creation in the AWS console, is now available in the AWS GovCloud (US) Regions. This makes it easier for customers to discover and utilize the full range of AWS service events when building event-driven architectures. Additionally, the EventBridge documentation now includes an automatically updated list of all AWS service events, providing a single source of truth and ensuring developers always have access to accurate, reliable information.
Amazon EventBridge Event Bus is a serverless event router that enables you to create highly scalable event-driven applications by routing events between your own applications, third-party SaaS applications, and other AWS services. With this update, developers can quickly search and filter all available AWS service events, including their detail types, in the EventBridge console when configuring event patterns in the sandbox and in rules. This enables customers to create event-driven integrations more efficiently while reducing the risk of misconfiguration.
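As a quick illustration, once the console's discovery surfaces the source and detail type you need, the corresponding rule can also be created programmatically. A boto3 sketch using a well-known EC2 event as the example pattern:

```python
import json

import boto3

events = boto3.client("events", region_name="us-gov-west-1")

# Match EC2 instance state-change events; the source and detail-type strings
# are exactly what the console's event discovery lists.
events.put_rule(
    Name="ec2-state-change-rule",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
    }),
)
```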
This feature is now available in the AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions. You can get started by navigating to the EventBridge console, where you can access the Sandbox or Create Rule page to see the list of all events when building the event pattern. You can also see the updated list of AWS service events in the documentation here.
Amazon SageMaker Inference now supports rolling updates for inference component (IC) endpoints. This allows customers to update running IC endpoints without traffic interruption while using minimal extra instances, rather than requiring doubled instances as in the past. SageMaker Inference makes it easy to deploy ML models, including foundation models (FMs). As a capability of SageMaker Inference, IC enables customers to deploy multiple FMs on the same endpoint and control accelerator allocation for each model.
Now, rolling updates enable customers to update ICs within an endpoint batch by batch, instead of all at once as with the previous blue/green update method. Blue/green updates required provisioning a new fleet of ICs with the updated model before shifting traffic from the old fleet to the new one, effectively doubling the number of required instances. With rolling updates, new ICs are created in smaller batches, significantly reducing the number of additional instances needed during updates. This helps customers minimize costs from extra capacity and maintain smaller buffer requirements in their capacity reservations.
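A boto3 sketch of what requesting a rolling update might look like; the DeploymentConfig field names follow this launch and should be checked against the current SageMaker API reference, and the component name, image, and resource numbers are hypothetical:

```python
import boto3

sm = boto3.client("sagemaker")

sm.update_inference_component(
    InferenceComponentName="my-ic",  # hypothetical inference component
    Specification={
        "Container": {
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:v2"
        },
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 4096,
        },
    },
    DeploymentConfig={
        "RollingUpdatePolicy": {
            # Update two IC copies at a time instead of doubling the fleet.
            "MaximumBatchSize": {"Type": "COPY_COUNT", "Value": 2},
            "WaitIntervalInSeconds": 120,
        }
    },
)
```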
Rolling update for IC is available in all regions where IC is supported: Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney, Jakarta), Canada (Central), Europe (Frankfurt, Stockholm, Ireland, London), Middle East (UAE), South America (São Paulo), US East (N. Virginia, Ohio), and US West (N. California, Oregon). To learn more, see the documentation.
Amazon Elastic Container Service (Amazon ECS) today introduced a GPU-optimized Amazon Machine Image (AMI) for Amazon Linux 2023 (AL2023). This new offering enables customers to run GPU-accelerated containerized workloads on Amazon ECS while leveraging the improved security features and newer kernel version available in AL2023.
The new ECS GPU-optimized AMI is built on the minimal AL2023 base AMI and includes NVIDIA drivers, NVIDIA Fabric Manager, NVIDIA Container Toolkit, and other essential packages needed to run GPU-accelerated container workloads. The new AMI supports a wide range of NVIDIA GPU architectures including Ampere, Turing, Volta, Maxwell, Hopper, and Ada Lovelace, and works out-of-the-box with no additional configuration required. The new AMI is designed for GPU-accelerated applications such as machine learning (ML) and artificial intelligence (AI) workloads running on Amazon ECS.
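The recommended AMI can be resolved at deploy time through SSM. A boto3 sketch, assuming the AL2023 GPU parameter path mirrors the documented pattern for earlier ECS-optimized AMIs; confirm the exact path in the ECS documentation:

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Look up the latest ECS GPU-optimized AL2023 AMI ID.
param = ssm.get_parameter(
    Name="/aws/service/ecs/optimized-ami/amazon-linux-2023/gpu/recommended/image_id"
)
print(param["Parameter"]["Value"])  # e.g. an ami-... ID to use in launch templates
```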
The ECS GPU-optimized AL2023 AMI is now available in all AWS regions. For additional information about running GPU-accelerated workloads with Amazon ECS, refer to the documentation and release notes.
Starting today, you can use AWS WAF Targeted Bot Control in the AWS GovCloud (US) Regions. AWS WAF Targeted Bot Control protects applications against sophisticated bots targeting critical enterprise applications like e-commerce and financial services websites.
AWS WAF is a web application firewall that helps you protect your web application resources against common web exploits and bots that can affect availability, compromise security, or consume excessive resources. You can protect the following resource types: Amazon CloudFront distributions, Amazon API Gateway REST APIs, Application Load Balancer, AWS AppSync GraphQL API, AWS App Runner, AWS Verified Access, and Amazon Cognito user pools.
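For reference, a sketch of the wafv2 rule statement that enables the Bot Control managed rule group at the TARGETED inspection level; the surrounding web ACL fields (scope, default action) are omitted, and the rule and metric names are arbitrary:

```python
# Rule entry for use inside a wafv2 create_web_acl / update_web_acl call.
bot_control_rule = {
    "Name": "targeted-bot-control",
    "Priority": 0,
    "Statement": {
        "ManagedRuleGroupStatement": {
            "VendorName": "AWS",
            "Name": "AWSManagedRulesBotControlRuleSet",
            "ManagedRuleGroupConfigs": [
                # TARGETED enables the sophisticated-bot protections.
                {"AWSManagedRulesBotControlRuleSet": {"InspectionLevel": "TARGETED"}}
            ],
        }
    },
    "OverrideAction": {"None": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "TargetedBotControl",
    },
}
```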
To see the full list of regions where AWS WAF is currently available, visit the AWS Region Table. For more information about the service, visit the AWS WAF page. AWS WAF pricing may vary between regions. For more information about pricing, visit the AWS WAF Pricing page.
Amazon Connect announces the expansion of access to industry-leading inbound number availability across 158 countries, national outbound numbers in 72 countries, and global international dialing capabilities from any supported AWS commercial Region. This expansion increases telephony coverage by an average of 125% across AWS Regions. Organizations can now focus on selecting the ideal location for their customer experience operations based on business considerations such as agent availability, language fluency, and regulatory needs without being constrained by telecommunications infrastructure. Agents and customers benefit from the reliability, quality, and cost-effectiveness enabled by the AWS global network and Amazon Connect’s direct connections to the 40+ tier-1 carriers closest to their customers.
With this launch, Amazon Connect reimagines the delivery of voice calls. Traditional telephony networks often introduce quality degradation through multiple interconnection points, variable routing paths, and aging infrastructure. By leveraging the AWS global network backbone, the same high-performance, low-latency private network that powers AWS, Amazon Connect optimizes call paths and routes calls directly to the carrier closest to your customer. This simplified routing enables consistently clear and natural conversations on every call.
Access expanded telephony coverage for Amazon Connect in all AWS Regions where Amazon Connect is available, except the AWS GovCloud (US) Regions and Africa (Cape Town). For information about our expanded telephony coverage, see Set Up Contact Center Phone Numbers for your Amazon Connect Instance in the Amazon Connect Administrator Guide.
This blog post presents an in-depth exploration of Microsoft’s Time Travel Debugging (TTD) framework, a powerful record-and-replay debugging framework for Windows user-mode applications. TTD relies heavily on accurate CPU instruction emulation to faithfully replay program executions. However, subtle inaccuracies in this emulation can lead to significant security and reliability issues, masking vulnerabilities or misleading critical investigations such as incident response and malware analysis, and causing analysts to overlook threats or draw incorrect conclusions. Furthermore, attackers can exploit these inaccuracies to intentionally evade detection or disrupt forensic analyses, severely compromising investigative outcomes.
The blog post examines specific challenges, provides historical context, and analyzes real-world emulation bugs, highlighting the critical importance of accuracy and ongoing improvement to ensure the effectiveness and reliability of investigative tooling. Ultimately, addressing these emulation issues directly benefits users by enhancing security analyses, improving reliability, and ensuring greater confidence in their debugging and investigative processes.
Overview
We begin with an introduction to TTD, detailing its use of a sophisticated CPU emulation layer powered by the Nirvana runtime engine. Nirvana translates guest instructions into host-level micro-operations, enabling detailed capture and precise replay of a program’s execution history.
The discussion transitions into exploring historical challenges in CPU emulation, particularly for the complex x86 architecture. Key challenges include issues with floating-point and SIMD operations, memory model intricacies, peripheral and device emulation, handling of self-modifying code, and the constant trade-offs between performance and accuracy. These foundational insights lay the groundwork for our deeper examination of specific instruction emulation bugs discovered within TTD.
These include:
A bug involving the emulation of the pop r16 instruction, resulting in critical discrepancies between native execution and TTD instrumentation.
An issue with the push segment instruction that demonstrates differences between Intel and AMD CPU implementations, highlighting the importance of emulation that accurately aligns with hardware behavior.
Errors in the implementation of the lodsb and lodsw instructions, where TTD incorrectly clears upper bits that should remain unchanged.
An issue within the WinDbg TTDAnalyze debugging extension, where a fixed output buffer resulted in truncated data during symbol queries, compromising debugging accuracy.
Each case is supported by detailed analyses, assembly code proof-of-concept samples, and debugging traces, clearly illustrating the subtle but significant pitfalls in modern CPU emulation as it pertains to TTD.
Additional bugs discovered beyond those detailed here are pending disclosure until addressed by Microsoft. All bugs discussed in this post have been resolved as of TTD version 1.11.410.
Intro to TTD
Time Travel Debugging (TTD) is a powerful user-mode record-and-replay framework developed by Microsoft, originally introduced in a 2006 whitepaper under a different name. It is a staple of our workflows in Windows environments.
TTD allows a user to capture a comprehensive recording of a process (and potential child processes) during the lifetime of the process’s execution. This is done by injecting a dynamic-link library (DLL) into the intended target process and capturing each state of the execution. This comprehensive historical view of the program’s runtime behavior is stored in a database-like trace file (.trace), which, much like a database, can be further indexed to produce a corresponding .idx file for efficient querying and analysis.
Once recorded, trace files can be consumed by a compatible client that supports replaying the entire execution history. In other words, TTD effectively functions as a record/replay debugger, enabling analysts to move backward and forward through execution states as if navigating a temporal snapshot of the program’s lifecycle.
TTD relies on a CPU emulation layer to accurately record and replay program executions. This layer is implemented by the Nirvana runtime engine, which simulates guest instructions by translating them into a sequence of simpler, host-level micro-operations. By doing so, Nirvana provides fine-grained control at the instruction and sub-instruction level, allowing instrumentation to be inserted at each stage of instruction processing (e.g., fetching, memory reads, writes). This approach not only ensures that TTD can capture the complete dynamic behavior of the original binary but also makes it possible to accurately re-simulate executions later.
Nirvana’s dynamic binary translation and code caching techniques improve performance by reusing translated sequences when possible. In cases where code behaves unpredictably—such as self-modifying code scenarios—Nirvana can switch to a pure interpretation mode or re-translate instructions as needed. These adaptive strategies ensure that TTD maintains fidelity and efficiency during the record and replay process, enabling it to store execution traces that can be fully re-simulated to reveal intricate details of the code’s behavior under analysis.
The TTD framework is composed of several core components:
TTD: The main TTD client executable, which takes a wide array of input arguments that dictate how the trace will be conducted.
TTDRecord: The main DLL responsible for the recording that runs within the TTD client executable. It initiates the injection sequence into the target binary by injecting TTDLoader.dll.
TTDLoader: DLL that gets injected into the guest process and initiates the recorder within the guest through the TTDRecordCPU DLL. It also establishes a process instrumentation callback within the guest process that allows Nirvana to monitor the egress of any system calls the guest makes.
TTDRecordCPU: The recorder responsible for capturing the execution states into the .trace file. This is injected as a DLL into the guest process and communicates the status of the trace with TTDRecord. The core logic works by emulating the respective CPU.
TTDReplay and TTDReplayClient: The replay components that read the captured state from the trace file and allow users to step through the recorded execution. WinDbg uses these to provide support for replaying trace files.
TTDAnalyze: A WinDbg extension that integrates with the replay client, providing exclusive TTD capabilities to WinDbg. Most notable of these are the Calls and Memory data model methods.
CPU Emulation
Historically, CPU emulation—particularly for architectures as intricate as x86—has been a persistent source of engineering challenges. Early attempts struggled with instruction coverage and correctness, as documentation gaps and hardware errata made it difficult to replicate every nuanced corner case. Over time, a number of recurring problem areas and bug classes emerged:
Floating-Point and SIMD Operations: Floating-point instructions, with their varying precision modes and extensive register states, have often been a source of subtle bugs. Miscalculating floating-point rounding, mishandling denormalized numbers, or incorrectly implementing special instructions like FSIN or FCOS can lead to silent data corruption or outright crashes. Similarly, SSE, AVX, and other vectorized instructions introduce complex states that must be tracked accurately.
Memory Model and Addressing Issues: The x86 architecture’s memory model, which includes segmentation, paging, alignment constraints, and potential misalignments in legacy code, can introduce complex bugs. Incorrectly emulating memory accesses, not enforcing proper page boundaries, or failing to handle “lazy” page faults and cache coherency can result in subtle errors that only appear under very specific conditions.
Peripheral and Device Emulation: Emulating the behavior of x86-specific peripherals—such as serial I/O ports, PCI devices, PS/2 keyboards, and legacy controllers—can be particularly troublesome. These components often rely on undocumented behavior or timing quirks. Misinterpreting device-specific registers or neglecting to reproduce timing-sensitive interactions can lead to erratic emulator behavior or device malfunctions.
Compatibility with Older or Unusual Processors: Emulating older generations of x86 processors, each with their own peculiarities and less standardized features, poses its own set of difficulties. Differences in default mode settings, instruction variants, and protected-mode versus real-mode semantics can cause unexpected breakages. A once-working emulator may fail after it encounters code written for a slightly different microarchitecture or an instruction that was deprecated or implemented differently in an older CPU.
Self-Modifying Code and Dynamic Translation: Code that modifies itself at runtime demands adaptive strategies, such as invalidating cached translations or re-checking original code bytes on the fly. Handling these scenarios incorrectly can lead to stale translations, misapplied optimizations, and difficult-to-trace logic errors.
Performance vs. Accuracy Trade-Offs: Historically, implementing CPU emulators often meant juggling accuracy with performance. Naïve instruction-by-instruction interpretation provided correctness but was slow. Introducing caching or just-in-time (JIT) optimizations risked subtle bugs if translations were not properly synchronized with memory updates or if instruction boundaries were not preserved.
Collectively, these historical challenges underscore that CPU emulation is not just about instruction decoding. It requires faithfully recreating intricate details of processor states, memory hierarchies, peripheral interactions, and timing characteristics. Even as documentation and tooling have improved, achieving both correctness and efficiency remains a delicate balancing act, and emulation projects continue to evolve to address these enduring complexities.
The Initial TTD Bug
Executing a heavily obfuscated 32-bit Windows Portable Executable (PE) file under TTD instrumentation resulted in a crash. The same sample did not crash when executed on a real computer or in a virtual machine. We suspected that either the sample was detecting TTD execution or TTD itself had a bug in emulating an instruction. A good thing about debugging TTD issues is that the trace file itself can usually be used to pinpoint the cause. Figure 1 points to the crash while in TTD emulation.
Figure 1: Crash while accessing an address pointed by register ESI
Back-tracing the ESI register value to 0xfb3e took stepping back hundreds of instructions and ended in the following sequence of instructions, as shown in Figure 2.
Figure 2: Register ESI getting populated by pop si and xchg si,bp
There are two instructions populating the ESI register, both writing the 16-bit SI subregister while completely ignoring the upper 16 bits of ESI. If we look closely at the result after the pop si instruction in Figure 2, the upper 16 bits of the ESI register appear to be nulled out. This looked like a bug in emulating pop r16 instructions, and we quickly wrote proof-of-concept code for verification (Figure 3).
Figure 3: Proof-of-concept for pop r16
Running the resulting binary natively and with TTD instrumentation, as shown in Figure 4, confirmed our suspicion that pop r16 instructions are emulated differently in TTD than on a real CPU.
Figure 4: Running the code natively and with TTD instrumentation
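Expressed as a Python sketch of the semantics (not TTD's actual implementation), the discrepancy looks like this:

```python
# `pop si` should replace only the low 16 bits of ESI; pre-fix TTD
# cleared the upper 16 bits as well.
esi = 0xDEADBEEF
popped = 0x1234

native  = (esi & 0xFFFF0000) | popped  # 0xDEAD1234: upper half preserved
ttd_bug = popped                       # 0x00001234: upper half wrongly zeroed
print(hex(native), hex(ttd_bug))
```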
We reported this issue and the fuzzing results to the TTD team at Microsoft.
Fuzzing TTD
Given one confirmed instruction emulation bug (an instruction sequence that produces different results under native versus TTD execution), we decided to fuzz TTD to find similar bugs. A rudimentary harness was created to execute a random sequence of instructions and record the resulting values. This harness was executed on a real CPU and under TTD instrumentation, providing two sets of results. Any difference in results, or partial lack of results, points to a likely instruction emulation bug.
The fuzzer soon surfaced a new bug, fairly similar to the original pop r16 bug but involving a push segment instruction. This bug also came with a twist: while our fuzzer was running on an Intel CPU-based machine and one of us verified the bug locally, the other person was not able to reproduce it. Interestingly, the failure happened on an AMD-based CPU, tipping us off to the possibility that the push segment instruction implementation varies between Intel and AMD CPUs.
Looking at both the Intel and AMD CPU specifications, the Intel specification goes into detail about how recent processors implement the push segment register instruction:
If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Intel Core and Intel Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified. (INTEL spec Vol.2B 4-517)
We reported the discrepancy to AMD PSIRT, who concluded that this is not a security vulnerability. It seems that sometime circa 2007, Intel and AMD CPUs started implementing the push segment instruction differently, and TTD emulation followed the old behavior.
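A Python sketch of the two behaviors (semantics only, not either vendor's microarchitecture): with a 32-bit operand size, a 16-bit move leaves the upper half of the stack slot untouched, while zero-extension overwrites the whole slot.

```python
stack_slot = 0xCAFEBABE  # stale data already present at the stack location
selector = 0x002B        # 16-bit segment selector being pushed

move_16bit    = (stack_slot & 0xFFFF0000) | selector  # 0xCAFE002B: recent Intel behavior
zero_extended = selector                              # 0x0000002B: zero-extended push
print(hex(move_16bit), hex(zero_extended))
```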
The lodsb and lodsw instructions were not correctly implemented in both their 32-bit and 64-bit forms. Both clear the upper bits of the register (rax/eax), whereas the real instructions only modify their respective granularities (i.e., lodsb overwrites only 1 byte, lodsw only 2 bytes).
Figure 6: Proof-of-concept for lodsb/lodsw
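The same style of Python sketch for lodsb (lodsw is analogous with a 16-bit mask):

```python
rax = 0x1122334455667788
loaded_byte = 0x41  # value read from [rsi]

native  = (rax & ~0xFF) | loaded_byte  # 0x1122334455667741: only the low byte changes
ttd_bug = loaded_byte                  # 0x41: upper bits wrongly cleared
print(hex(native), hex(ttd_bug))
```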
There are additional instruction emulation bugs pending fixes from Microsoft.
As we were pursuing our efforts in the CPU emulator, we accidentally stumbled on another bug, this time not in the emulator but in the WinDbg extension exposed by TTD: TTDAnalyze.dll.
This extension leverages the debugger’s data model to let a user explore the trace file interactively. This is done by exposing a TTD data model namespace under certain parts of the data model, such as the current process (@$curprocess), the current thread (@$curthread), and the current debugging session (@$cursession).
Figure 7: TTD query types
As an example, the @$cursession.TTD.Calls method allows a user to query all call locations captured within the trace. It takes as input either an address or a case-insensitive symbol name, with support for regex. The symbol name can be given either as a string (with quotes) or as a parsed symbol name (without quotes). The latter is only applicable when the symbols are fully resolved (e.g., private symbols), as the data model has support for converting private symbols into an ObjectTargetObject object, making them consumable by the dx evaluation expression parser.
The bug in question directly affects the Calls method exposed under @$cursession.TTD because it uses a fixed, static buffer to capture the results of the symbol query. Figure 8 illustrates this by passing two similar regex strings that produce inconsistent results.
Figure 8: TTD Calls query
When we query C* and Create*, the C* query does not return the other Create APIs that were clearly captured in the trace. Under the hood, TTDAnalyze executes the examine debugger command "x KERNELBASE!C*" with a custom output capture to process the results. This output capture truncates any captured data greater than 64 KB in size.
If we take the disassembly of the global buffer and output capture routine in TTDAnalyze (SHA-256: CC5655E29AFA87598E0733A1A65D1318C4D7D87C94B7EBDE89A372779FF60BAD) prior to the fix, we can see the following (Figure 9 and Figure 10):
Figure 9: TTD implementation disassembly
Figure 10: TTD implementation disassembly
The capture for the examine command is capped at 64 KB. When the returned data exceeds this limit, truncation is performed at address 0x180029960. Naturally, querying symbols starting with C* typically yields a large volume of results, not just those beginning with Create*, leading to the observed truncation of the data.
Final Thoughts
The analysis presented in this blog post highlights the critical nature of accuracy in instruction emulation—not just for debugging purposes, but also for ensuring robust security analysis. The observed discrepancies, while subtle, underscore a broader security concern: even minor deviations in emulation behavior can misrepresent the true execution of code, potentially masking vulnerabilities or misleading forensic investigations.
From a security perspective, the work emphasizes several key takeaways:
Reliability of Debugging Tools: TTD and similar frameworks are invaluable for reverse engineering and incident response. However, any inaccuracies in emulation, such as those revealed by the misinterpretation of pop r16, push segment, or lods* instructions, can compromise the fidelity of the analysis. This raises important questions about trust in our debugging tools when they are used to analyze potentially malicious or critical code.
Impact on Threat Analysis: The ability to replay a process’s execution with high fidelity is crucial for uncovering hidden behaviors in malware or understanding complex exploits. Instruction emulation bugs may inadvertently alter the execution path or state, leading to incomplete or skewed insights that could affect the outcome of a security investigation.
Collaboration and Continuous Improvement: The discovery of these bugs, followed by their detailed documentation and reporting to the relevant teams at Microsoft and AMD, highlights the importance of a collaborative approach to security research. Continuous testing, fuzzing, and cross-platform comparisons are essential in maintaining the integrity and security of our analysis tools.
In conclusion, this exploration not only sheds light on the nuanced challenges of CPU emulation within TTD, but also serves as a call to action for enhanced scrutiny and rigorous validation of debugging frameworks. By ensuring that these tools accurately mirror native execution, we bolster our security posture and improve our capacity to detect, analyze, and respond to sophisticated threats in an ever-evolving digital landscape.
Acknowledgments
We extend our gratitude to the Microsoft Time Travel Debugging team for their readiness and support in addressing the issues we reported. Their prompt and clear communication not only resolved the bugs but also underscored their commitment to keeping TTD robust and reliable. We further appreciate that they have made TTD publicly available—a resource invaluable for both troubleshooting and advancing Windows security research.
Amazon Athena Provisioned Capacity is now available in the Asia Pacific (Mumbai) Region. Provisioned Capacity allows you to run SQL queries on dedicated serverless resources for a fixed price, with no long-term commitment, and control workload performance characteristics such as query concurrency and cost.
Athena is a serverless, interactive query service that makes it possible to analyze petabyte-scale data with ease and flexibility. Provisioned Capacity provides workload management capabilities that help you prioritize, isolate, and scale your workloads. For example, use Provisioned Capacity when you need to run a high number of queries at the same time or isolate important queries from other queries that run in the same account. To get started, use the Athena console, AWS SDK, or CLI to request capacity and then select workgroups with queries you want to run on dedicated capacity.
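A boto3 sketch of that flow; the reservation and workgroup names are hypothetical, and the API shapes follow the Athena capacity-reservation launch and should be checked against the current SDK:

```python
import boto3

athena = boto3.client("athena", region_name="ap-south-1")

# Request dedicated capacity (measured in Data Processing Units).
athena.create_capacity_reservation(Name="etl-capacity", TargetDpus=24)

# Route a workgroup's queries onto that capacity.
athena.put_capacity_assignment_configuration(
    CapacityReservationName="etl-capacity",
    CapacityAssignments=[{"WorkGroupNames": ["etl-workgroup"]}],  # hypothetical workgroup
)
```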
Amazon WorkSpaces Pools now offers Federal Information Processing Standard 140-2 (FIPS) validated endpoints (FIPS endpoints) for user streaming sessions. FIPS 140-2 is a U.S. government standard that specifies the security requirements for cryptographic modules that protect sensitive information. WorkSpaces Pools FIPS endpoints use FIPS-validated cryptographic standards, which may be required for certain sensitive information or regulated workloads.
To enable FIPS endpoint encryption for end-user streaming via the AWS Console, navigate to Directories and verify that the Pools directory where you want to add FIPS is in a STOPPED state and that the preferred protocol is set to TCP. Once verified, select the directory, and on the Directory Details page, update the endpoint encryption to FIPS 140-2 Validated Mode and save.
We are excited to announce that Amazon OpenSearch Serverless is expanding to the Europe (Spain) Region. OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service that makes it simple to run search and analytics workloads without the complexities of infrastructure management. OpenSearch Serverless compute capacity used for data ingestion, search, and query is measured in OpenSearch Compute Units (OCUs). To control costs, customers can configure the maximum number of OCUs per account.
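A boto3 sketch of capping OCUs at the account level in the new Region; the limit values are arbitrary, and the parameter names should be verified against the current SDK:

```python
import boto3

aoss = boto3.client("opensearchserverless", region_name="eu-south-2")  # Europe (Spain)

# Cap indexing and search capacity for the account to control costs.
aoss.update_account_settings(
    capacityLimits={
        "maxIndexingCapacityInOCU": 8,
        "maxSearchCapacityInOCU": 8,
    }
)
```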
Today, AWS announces the general availability of GraphRAG, a capability in Amazon Bedrock Knowledge Bases that enhances Retrieval-Augmented Generation (RAG) by incorporating graph data. GraphRAG delivers more comprehensive, relevant, and explainable responses by leveraging relationships within your data, improving how Generative AI applications retrieve and synthesize information.
Since public preview, customers have leveraged the managed GraphRAG capability to get improved responses to queries from their end users. GraphRAG automatically generates and stores vector embeddings in Amazon Neptune Analytics, along with a graph representation of entities and their relationships. GraphRAG combines vector similarity search with graph traversal, enabling higher accuracy when retrieving information from disparate yet interconnected data sources.
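A boto3 sketch of creating such a knowledge base; the NEPTUNE_ANALYTICS storage shape is an assumption based on the preview documentation, and all ARNs and names are hypothetical:

```python
import boto3

agent = boto3.client("bedrock-agent")

agent.create_knowledge_base(
    name="graph-rag-kb",
    roleArn="arn:aws:iam::111122223333:role/BedrockKBRole",  # hypothetical role
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
        },
    },
    # Store embeddings plus the entity/relationship graph in Neptune Analytics.
    storageConfiguration={
        "type": "NEPTUNE_ANALYTICS",
        "neptuneAnalyticsConfiguration": {
            "graphArn": "arn:aws:neptune-graph:us-east-1:111122223333:graph/g-example",
            "fieldMapping": {"textField": "text", "metadataField": "metadata"},
        },
    },
)
```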
GraphRAG with Amazon Neptune is built right into Amazon Bedrock Knowledge Bases, offering an integrated experience with no additional setup or additional charges beyond the underlying services. GraphRAG is generally available in AWS Regions where Amazon Bedrock Knowledge Bases and Amazon Neptune Analytics are both available (see current list of supported regions). To learn more, visit the Amazon Bedrock User Guide.
Today, Amazon SES announces the availability of the Vade Add On for Mail Manager, a sophisticated content filter that enhances email security for both incoming and outgoing messages. This new Add On, developed in collaboration with Hornetsecurity, combines heuristics, behavioral analysis, and machine learning to provide robust protection against evolving communication threats such as spam, phishing attempts, and malware.
Now available as a rule property in Mail Manager, the Vade Add On empowers users with automated, real-time defense against email-based threats for safer communication. Its AI-powered technology employs a multi-layered approach, analyzing messages in real-time using advanced techniques like natural language processing. This integration allows customers to strengthen their email platforms by configuring ongoing protection against evolving cyber threats alongside existing Mail Manager rules, offering flexibility in managing their email security.
The Vade Add On for Amazon SES Mail Manager is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), and South America (São Paulo).
To learn more about this new feature and how it can enhance your email security, visit the documentation for Email Add Ons in Mail Manager. You can easily activate the Vade Advanced Email Security Add On directly from the Amazon SES console to start protecting your email communications today.
Contact Lens now enables you to create dynamic evaluation forms that automatically show or hide questions based on responses to previous questions, tailoring each evaluation to specific customer interaction scenarios. For example, when a manager answers “Yes” to the form question “Did the customer try to make a purchase on the call?”, the form automatically presents a follow-up question: “Did the agent read the sales disclosure?”. With this launch, you can consolidate evaluation forms that are applicable to different interaction scenarios into a single dynamic evaluation form which automatically hides irrelevant questions. This reduces manager effort in selecting the relevant evaluation form and determining which evaluation questions are applicable to the interaction, helping managers perform evaluations faster and more accurately.
This feature is available in all regions where Contact Lens performance evaluations are already available. To learn more, please visit our documentation and our webpage. For information about Contact Lens pricing, please visit our pricing page.
AWS Application Load Balancer (ALB) now allows customers to provide a pool of public IPv4 addresses for IP address assignment to load balancer nodes. Customers can configure a public IP Address Manager (IPAM) pool consisting of either customer-owned Bring Your Own IP (BYOIP) address space or a contiguous IPv4 address block provided by Amazon.
With this feature, customers can optimize public IPv4 cost by using BYOIP in public IPAM pools. Customers can also simplify their enterprise allowlisting and operations, by using Amazon-provided contiguous IPv4 blocks in public IPAM pools. The ALB’s IP addresses are sourced from the IPAM pool and automatically switch to AWS managed IP addresses when the public IPAM pool is depleted. This intelligent switching maximizes service availability during scaling events.
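A boto3 sketch of creating an ALB that draws its public addresses from an IPAM pool; the IpamPools parameter name follows this launch and should be verified against the current elbv2 API reference, and the subnet and pool IDs are hypothetical:

```python
import boto3

elbv2 = boto3.client("elbv2")

elbv2.create_load_balancer(
    Name="public-alb",
    Type="application",
    Scheme="internet-facing",
    Subnets=["subnet-0aaa1111", "subnet-0bbb2222"],  # hypothetical subnets
    # Source node addresses from the public IPAM pool (BYOIP or
    # Amazon-provided contiguous block).
    IpamPools={"Ipv4IpamPoolId": "ipam-pool-0123456789abcdef0"},
)
```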
Amazon Redshift Data API, which lets you connect to Amazon Redshift through a secure HTTPS endpoint, now supports single sign-on (SSO) through AWS IAM Identity Center. Amazon Redshift Data API removes the need to manage database drivers, connections, network configurations, and data buffering, simplifying how you access your data warehouses and data lakes.
AWS IAM Identity Center lets customers connect existing identity providers from a centrally managed location. You can now use AWS IAM Identity Center with your preferred identity provider, including Microsoft Entra ID, Okta, and Ping, to connect to Amazon Redshift clusters through Amazon Redshift Data API. This new SSO integration simplifies identity management, so that you don’t have to manage separate database credentials for your Amazon Redshift clusters. Once authenticated, your authorization rules are enforced using the permissions defined in Amazon Redshift or AWS Lake Formation.
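A boto3 sketch of a Data API call in this model; when the call is made from an IAM Identity Center trusted-identity-propagation session, the user's SSO identity is carried with it and no database credentials are passed (how that session is established is omitted here), and the cluster name is hypothetical:

```python
import boto3

rsd = boto3.client("redshift-data")

# Submit a statement; authorization is enforced from the caller's identity
# rather than a DbUser or secret.
resp = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    Sql="SELECT current_user;",
)
print(resp["Id"])  # statement ID to poll with describe_statement
```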
This feature is available in all AWS Regions where both AWS IAM Identity Center and Amazon Redshift are available. For more information, see our documentation and blog.
AWS HealthOmics now supports the latest NVIDIA L4 and L40S graphical processing units (GPUs) and larger compute options of up to 192 vCPUs for workflows. AWS HealthOmics is a HIPAA-eligible service that helps healthcare and life sciences customers accelerate scientific breakthroughs with fully managed biological data stores and workflows. This release expands workflow compute capabilities to support more demanding workloads for genomics research and analysis.
In addition to current support for NVIDIA A10G and T4 GPUs, this release adds support for NVIDIA L4 and L40S GPUs, which enables researchers to efficiently run complex machine learning workloads such as protein structure prediction and biological foundation models (bioFMs). The enhanced CPU configurations with up to 192 vCPUs and 1,536 GiB of memory allow for faster processing of large-scale genomics datasets. These improvements help research teams reduce time-to-insight for critical life sciences work.
NVIDIA L4 and L40S GPUs and 128 and 192 vCPU omics instance types are now available in: US East (N. Virginia) and US West (Oregon). To get started with AWS HealthOmics workflows, see the documentation.