AWS Health customers can now use Internet Protocol version 6 (IPv6) addresses via our new dual-stack endpoints to view operational issues or planned lifecycle events for all accounts and resources in your organization. The existing Health endpoints supporting IPv4 will remain available for backwards compatibility.
The urgency to transition to IPv6 is driven by the continued growth of the internet, which is exhausting the supply of available Internet Protocol version 4 (IPv4) addresses. With simultaneous support for both IPv4 and IPv6 clients on Health endpoints, you can gradually transition from IPv4 to IPv6 based systems and applications without needing to switch over all at once. This enables you to meet IPv6 compliance requirements and removes the need for expensive networking equipment to handle address translation between IPv4 and IPv6.
To learn more about best practices for configuring IPv6 in your environment, see the whitepaper on IPv6 in AWS. Support for IPv6 on AWS Health is available in all commercial regions. To learn more, please refer to the user guide.
Since 2022, Google Threat Intelligence Group (GTIG) has been tracking multiple cyber espionage operations conducted by China-nexus actors utilizing POISONPLUG.SHADOW. These operations employ a custom obfuscating compiler that we refer to as “ScatterBrain,” facilitating attacks against various entities across Europe and the Asia Pacific (APAC) region. ScatterBrain appears to be a substantial evolution of ScatterBee, an obfuscating compiler previously analyzed by PwC.
GTIG assesses that POISONPLUG is an advanced modular backdoor used by multiple distinct, but likely related, threat groups based in the People’s Republic of China (PRC); however, we assess that the use of POISONPLUG.SHADOW is further restricted to clusters associated with APT41.
GTIG currently tracks three known POISONPLUG variants:
POISONPLUG
POISONPLUG.DEED
POISONPLUG.SHADOW
POISONPLUG.SHADOW—often referred to as “Shadowpad,” a malware family name first introduced by Kaspersky—stands out due to its use of a custom obfuscating compiler specifically designed to evade detection and analysis. Its complexity is compounded not only by the extensive obfuscation mechanisms employed but also by the attackers’ highly sophisticated tradecraft. These elements collectively make analysis exceptionally challenging and complicate efforts to identify, understand, and mitigate the threats it poses.
In addressing these challenges, GTIG collaborates closely with the FLARE team to dissect and analyze POISONPLUG.SHADOW. This partnership utilizes state-of-the-art reverse engineering techniques and comprehensive threat intelligence capabilities required to mitigate the sophisticated threats posed by this threat actor. We remain dedicated to advancing methodologies and fostering innovation to adapt to and counteract the ever-evolving tactics of threat actors, ensuring the security of Google and our customers against sophisticated cyber espionage operations.
Overview
In this blog post, we present our in-depth analysis of the ScatterBrain obfuscator, which has led to the development of a complete stand-alone static deobfuscator library independent of any binary analysis frameworks. Our analysis is based solely on the obfuscated samples we have successfully recovered, as we do not possess the obfuscating compiler itself. Despite this limitation, we have been able to comprehensively infer every aspect of the obfuscator and the necessary requirements to break it. Our analysis further reveals that ScatterBrain is continuously evolving, with incremental changes identified over time, highlighting its ongoing development.
This publication begins by exploring the fundamental primitives of ScatterBrain, outlining all its components and the challenges they present for analysis. We then detail the steps required to subvert and remove each protection mechanism, culminating in our deobfuscator. Our library takes protected binaries generated by ScatterBrain as input and produces fully functional deobfuscated binaries as output.
By detailing the inner workings of ScatterBrain and sharing our deobfuscator, we hope to provide valuable insights into developing effective countermeasures. Our blog post is intentionally exhaustive, drawing from our experience in dealing with obfuscation for clients, where we observed a significant lack of clarity in understanding modern obfuscation techniques. Similarly, analysts often struggle with understanding even relatively simplistic obfuscation methods primarily because standard binary analysis tooling is not designed to account for them. Therefore, our goal is to alleviate this burden and help enhance the collective understanding against commonly seen protection mechanisms.
For general questions about obfuscating compilers, we refer to our previous work on the topic, which provides an introduction and overview.
ScatterBrain Obfuscator
Introduction
ScatterBrain is a sophisticated obfuscating compiler that integrates multiple operational modes and protection components to significantly complicate the analysis of the binaries it generates. Designed to render modern binary analysis frameworks and defender tools ineffective, ScatterBrain disrupts both static and dynamic analyses.
Protection Modes: ScatterBrain operates in three distinct modes, each determining the overall structure and intensity of the applied protections. These modes allow the compiler to adapt its obfuscation strategies based on the specific requirements of the attack.
Protection Components: The compiler employs key protection components that include the following:
Selective or Full Control Flow Graph (CFG) Obfuscation: This technique restructures the program’s control flow, making it very difficult to analyze and create detection rules for.
Instruction Mutations: ScatterBrain alters instructions to obscure their true functionality without changing the program’s behavior.
Complete Import Protection: ScatterBrain employs a complete protection of a binary’s import table, making it extremely difficult to understand how the binary interacts with the underlying operating system.
These protection mechanisms collectively make it extremely challenging for analysts to deconstruct and understand the functionality of the obfuscated binaries. As a result, ScatterBrain poses a formidable obstacle for cybersecurity professionals attempting to dissect and mitigate the threats it generates.
Modes of Operation
A mode refers to how ScatterBrain will transform a given binary into its obfuscated representation. It is distinct from the actual core obfuscation mechanisms themselves and is more about the overall strategy of applying protections. Our analysis further revealed a consistent pattern in applying various protection modes at specific stages of an attack chain:
Selective: A group of individually selected functions are protected, leaving the remainder of the binary in its original state. Any import references within the selected functions are also obfuscated. This mode was observed to be used strictly for dropper samples of an attack chain.
Complete: The entirety of the code section and all imports are protected. This mode was applied solely to the plugins embedded within the main backdoor payload.
Complete “headerless”: This is an extension of the Complete mode with added data protections and the removal of the PE header. This mode was exclusively reserved for the final backdoor payload.
Selective
The selective mode of protection allows users of the obfuscator to selectively target individual functions within the binary for protection. Protecting an individual function involves keeping the function at its original starting address (produced by the original compiler and linker) and substituting the first instruction with a jump to the obfuscated code. The generated obfuscations are stored linearly from this starting point up to a designated “end marker” that signifies the ending boundary of the applied protection. This entire range constitutes a protected function.
The disassembly of a call site to a protected function can take the following form:
Figure 1: Disassembly of a call to a protected function
The start of the protected function:
.text:180001039 PROTECTED_FUNCTION
.text:180001039 jmp loc_18000DF97 ; jmp into obfuscated code
.text:180001039 sub_180001039 endp
.text:000000018000103E db 48h ; H. ; garbage data
.text:000000018000103F db 0FFh
.text:0000000180001040 db 0C1h
Figure 2: Disassembly inside of a protected function
The “end marker” consists of two sets of padding instructions, an int3 instruction and a single multi-nop instruction:
END_MARKER:
.text:18001A95C CC CC CC CC CC CC CC CC CC CC 66
66 0F 1F 84 00 00 00 00 00
.text:18001A95C int 3
.text:18001A95D int 3
.text:18001A95E int 3
.text:18001A95F int 3
.text:18001A960 int 3
.text:18001A961 int 3
.text:18001A962 int 3
.text:18001A963 int 3
.text:18001A964 int 3
.text:18001A965 int 3
.text:18001A966 db 66h, 66h ; @NOTE: IDA doesn't disassemble properly
.text:18001A966 nop word ptr [rax+rax+00000000h]
; -------------------------------------------------------------------------
; next, original function
.text:18001A970 ; [0000001F BYTES: COLLAPSED FUNCTION
__security_check_cookie. PRESS CTRL-NUMPAD+ TO EXPAND]
Figure 3: Disassembly listing of an end marker
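The end marker shown above lends itself to a simple byte scan. The following is a minimal sketch, not part of the original tooling: it searches the code section for the 10-byte two-prefix nop from Figure 3 preceded by a run of int3 (0xCC) padding, and reports the offset where each marker begins. The minimum padding length is an assumption.

```python
# Sketch: locating the selective-mode "end marker" -- a run of int3
# (0xCC) padding followed by the multi-byte nop shown in Figure 3.
MULTI_NOP = bytes.fromhex("66660F1F840000000000")  # 66 66 0F 1F 84 00 00 00 00 00

def find_end_markers(code: bytes, min_int3: int = 2):
    hits = []
    idx = code.find(MULTI_NOP)
    while idx != -1:
        # Count the int3 bytes directly preceding the multi-nop.
        run = 0
        while idx - run - 1 >= 0 and code[idx - run - 1] == 0xCC:
            run += 1
        if run >= min_int3:  # assumed minimum padding length
            hits.append(idx - run)
        idx = code.find(MULTI_NOP, idx + 1)
    return hits
```

Each hit marks the ending boundary of one protected function in selective mode.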
Complete
The complete mode protects every function within the .text section of the binary, with all protections integrated directly into a single code section. There are no end markers to signify protected regions; instead, every function is uniformly protected, ensuring comprehensive coverage without additional sectioning.
This mode forces the need for some kind of deobfuscation tooling. Whereas selective mode only protects the selected functions and leaves everything else in its original state, this mode makes the output binary extremely difficult to analyze without accounting for the obfuscation.
Complete Headerless
This mode extends the Complete approach with further data obfuscations alongside the code protections. It is the most comprehensive mode of protection and was observed to be exclusively limited to the final payloads of an attack chain. It incorporates the following properties:
Full PE header of the protected binary is removed.
Custom loading logic (a loader) is introduced.
Becomes the entry point of the protected binary
Responsible for ensuring the protected binary is functional
Includes the option of mapping the final payload in a separate memory region distinct from the one it was initially loaded in
Metadata is protected via hash-like integrity checks.
The metadata is utilized by the loader as part of its initialization sequence.
Import protection will require relocation adjustments.
Done through an “import fixup table”
The loader’s entry routine crudely merges with the original entry of the binary by inserting multiple jmp instructions to bridge the two together. The following is what the entry point looks like after running our deobfuscator against a binary protected in headerless mode.
Figure 4: Deobfuscated loader entry
The loader’s metadata is stored in the .data section of the protected binary. It is found via a memory scan that applies bitwise XOR operations against predefined constants. The use of these not only locates the metadata but also serves a dual purpose of verifying its integrity. By checking that the data matches expected patterns when XORed with these constants, the loader ensures that the metadata has not been altered or tampered with.
Figure 5: Memory scan to identify the loader’s metadata inside the .data section
The metadata contains the following (in order):
Import fixup table (fully explained in the Import Protection section)
Integrity-hash constants
Relative virtual address (RVA) of the .data section
Offset to the import fixup table from the start of the .data section
Size, in bytes, of the fixup table
Global pointer to the memory address that the backdoor is at
Encrypted and compressed data specific to the backdoor
Backdoor config and plugins
Figure 6: Loader’s metadata
Core Protection Components
Instruction Dispatcher
The instruction dispatcher is the central protection component that transforms the natural control flow of a binary (or individual function) into scattered basic blocks that end with a unique dispatcher routine that dynamically guides the execution of the protected binary.
Figure 7: Illustration of the control flow instruction dispatchers induce
Each call to a dispatcher is immediately followed by a 32-bit encoded displacement positioned at what would normally be the return address for the call. The dispatcher decodes this displacement to calculate the destination target for the next group of instructions to execute. A protected binary can easily contain thousands or even tens of thousands of these dispatchers, making manual analysis of them practically infeasible. Additionally, the dynamic dispatching and decoding logic employed by each dispatcher effectively disrupts CFG reconstruction methods used by all binary analysis frameworks.
The decoding logic is unique for each dispatcher and is carried out using a combination of add, sub, xor, and, or, and lea instructions. The decoded offset value is then either subtracted from or added to the expected return address of the dispatcher call to determine the final destination address. This calculated address directs execution to the next block of instructions, which will similarly end with a dispatcher that uniquely decodes and jumps to subsequent instruction blocks, continuing this process iteratively to control the program flow.
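As a sketch of this mechanism, the following Python models one hypothetical dispatcher. The decoding constants and the xor/sub/add sequence are illustrative inventions, not values taken from a real sample; each real dispatcher uses its own unique sequence drawn from the instruction set described above.

```python
# Hypothetical model of a single instruction dispatcher's target
# calculation. Real dispatchers each use a unique decoding sequence;
# the constants below are made up for illustration.
MASK64 = (1 << 64) - 1

def decode_displacement(encoded: int) -> int:
    # One possible decoding sequence: xor, then sub, then add (mod 2^32).
    value = encoded ^ 0x1A2B3C4D
    value = (value - 0x1111) & 0xFFFFFFFF
    value = (value + 0x2222) & 0xFFFFFFFF
    return value

def dispatch_target(call_site: int, call_len: int, encoded: int) -> int:
    # The encoded displacement sits at the would-be return address of
    # the call; the decoded offset is applied to that address (here we
    # assume addition; real dispatchers may subtract instead).
    ret_addr = call_site + call_len
    return (ret_addr + decode_displacement(encoded)) & MASK64
```

Deobfuscation reverses this: once the target is recovered, the call-plus-displacement pair can be rewritten as a direct branch.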
The following screenshot illustrates what a dispatcher instance looks like when constructed in IDA Pro. Notice the scattered addresses present even within instruction dispatchers, which result from the obfuscator transforming fallthrough instructions—instructions that naturally follow the preceding instruction—into pairs of conditional branches that use opposite conditions. This ensures that one branch is always taken, effectively creating an unconditional jump. Additionally, a mov instruction that functions as a no-op is inserted to split these branches, further obscuring the control flow.
Figure 8: Example of an instruction dispatcher and all of its components
The core logic for any dispatcher can be categorized into the following four phases:
Preservation of Execution Context
Each dispatcher selects a single working register (e.g., RSI as depicted in the screenshot) during the obfuscation process. This register is used in conjunction with the stack to carry out the intended decoding operations and dispatch.
The RFLAGS register in turn is safeguarded by employing pushfq and popfq instructions before carrying out the decoding sequence.
Retrieval of Encoded Displacement
Each dispatcher retrieves a 32-bit encoded displacement located at the return address of its corresponding call instruction. This encoded displacement serves as the basis for determining the next destination address.
Decoding Sequence
Each dispatcher employs a unique decoding sequence composed of the following arithmetic and logical instructions: xor, sub, add, mul, imul, div, idiv, and, or, and not. This variability ensures that no two dispatchers operate identically, significantly increasing the complexity of the control flow.
Termination and Dispatch
The ret instruction is strategically used to simultaneously signal the end of the dispatcher function and redirect the program’s control flow to the previously calculated destination address.
It is reasonable to infer that the obfuscator utilizes a template similar to the one illustrated in Figure 9 when applying its transformations to the original binary:
Figure 9: Instruction dispatcher template
Opaque Predicates
ScatterBrain uses a series of seemingly trivial opaque predicates (OP) that appear straightforward to analysts but significantly challenge contemporary binary analysis frameworks, especially when used collectively. These opaque predicates effectively disrupt static CFG recovery techniques not specifically designed to counter their logic. Additionally, they complicate symbolic execution approaches as well by inducing path explosions and hindering path prioritization. In the following sections, we will showcase a few examples produced by ScatterBrain.
test OP
This opaque predicate is constructed around the behavior of the test instruction when paired with an immediate zero value. Given that the test instruction effectively performs a bitwise AND operation, the obfuscator exploits the fact that any value AND-ed with zero invariably results in zero.
Here are some abstracted examples found in a protected binary. They are abstracted in the sense that the instructions are not guaranteed to follow one another directly; other forms of mutations, as well as instruction dispatchers, can appear between them.
test bl, 0
jnp loc_56C96 ; we never satisfy these conditions
------------------------------
test r8, 0
jo near ptr loc_3CBC8
------------------------------
test r13, 0
jnp near ptr loc_1A834
------------------------------
test eax, 0
jnz near ptr loc_46806
Figure 10: Test opaque predicate examples
To grasp the implementation logic of this opaque predicate, an understanding of the semantics of the test instruction and its effects on the processor’s flags register is required. The instruction can affect six different flags in the following manner:
Overflow Flag (OF): Always cleared
Carry Flag (CF): Always cleared
Sign Flag (SF): Set if the most significant bit (MSB) of the result is set; otherwise cleared
Zero Flag (ZF): Set if the result is 0; otherwise cleared
Parity Flag (PF): Set if the number of set bits in the least significant byte (LSB) of the result is even; otherwise cleared
Auxiliary Carry Flag (AF): Undefined
Applying this understanding to the sequences produced by ScatterBrain, it is evident that the generated conditions can never be logically satisfied:
Sequence: Condition Description
test <reg>, 0; jo: OF is always cleared
test <reg>, 0; jnae/jc/jb: CF is always cleared
test <reg>, 0; js: The result is always zero; therefore, SF can never be set
test <reg>, 0; jnp/jpo: Zero has zero set bits, an even number; therefore, PF is always set and the branch is never taken
test <reg>, 0; jne/jnz: The result is always zero; therefore, ZF is always set and the branch is never taken
Table 1: Test opaque predicate understanding
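This pattern is easy to flag statically once instructions are decoded. Below is a minimal sketch, not from the original deobfuscator, that models instructions as plain (mnemonic, operands) tuples rather than a real disassembler's objects:

```python
# Sketch: recognizing the "test <reg>, 0" opaque predicate from a
# disassembly listing. Instructions are modeled as (mnemonic, operands)
# tuples for illustration.
NEVER_TAKEN_AFTER_TEST_ZERO = {
    "jo", "jc", "jb", "jnae",  # OF/CF always cleared
    "js",                      # SF can never be set
    "jnp", "jpo",              # PF is always set
    "jne", "jnz",              # ZF is always set
}

def is_test_zero_op(insn, next_insn) -> bool:
    mnem, ops = insn
    next_mnem, _ = next_insn
    return mnem == "test" and ops[1] == "0" and next_mnem in NEVER_TAKEN_AFTER_TEST_ZERO
```

When the predicate matches, the conditional branch is dead and its bogus target can be pruned from CFG recovery.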
jcc OP
This opaque predicate is designed to statically obscure the original immediate branch targets for conditional branch (jcc) instructions. Consider the following examples:
test eax, eax
ja loc_3BF9C
ja loc_2D154
test r13, r13
jns loc_3EA84
jns loc_53AD9
test eax, eax
jnz loc_99C5
jnz loc_121EC
cmp eax, FFFFFFFF
jz loc_273EE
jz loc_4C227
Figure 11: jcc opaque predicate examples
The implementation is straightforward: each original jcc instruction is duplicated with a bogus branch target. Since both jcc instructions are functionally identical except for their respective branch destinations, we can determine with certainty that the first jcc in each pair is the original instruction. This original jcc dictates the correct branch target to follow when the respective condition is met, while the duplicated jcc serves to confuse analysis tools by introducing misleading branch paths.
Stack-Based OP
The stack-based opaque predicate is designed to check whether the current stack pointer (rsp) is below a predetermined immediate threshold—a condition that can never be true. It is consistently implemented by pairing the cmp rsp instruction with a jb (jump if below) condition immediately afterward.
cmp rsp, 0x8d6e
jb near ptr unk_180009FDA
Figure 12: Stack-based opaque predicate example
This technique inserts conditions that are always false, causing CFG algorithms to follow both branches and thereby disrupt their ability to accurately reconstruct the control flow.
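Because this predicate is so uniform, detection reduces to a two-instruction pattern match. A minimal sketch (our own, using the same tuple model of instructions as assumed elsewhere):

```python
# Sketch: flagging the stack-based opaque predicate, which always pairs
# `cmp rsp, <imm>` with an immediately following `jb`. Instructions are
# modeled as (mnemonic, operands) tuples for illustration.
def is_stack_op(insn, next_insn) -> bool:
    mnem, ops = insn
    next_mnem, _ = next_insn
    return mnem == "cmp" and ops[0] == "rsp" and next_mnem == "jb"
```

A matched pair means the jb branch is never taken, so only the fallthrough path needs to be followed during CFG recovery.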
Import Protection
The obfuscator implements a sophisticated import protection layer. This mechanism conceals the binary’s dependencies by routing each original call or jmp instruction directed at an import through a unique stub dispatcher routine that knows how to dynamically resolve and invoke the import in question.
Figure 13: Illustration of all the components involved in the import protection
It consists of the following components:
Import-specific encrypted data: Each protected import is represented by a unique dispatcher stub and a scattered data structure that stores RVAs to both the encrypted dynamic-link library (DLL) and application programming interface (API) names. We refer to this structure as obf_imp_t. Each dispatcher stub is hardcoded with a reference to its respective obf_imp_t.
Dispatcher stub: This is an obfuscated stub that dynamically resolves and invokes the intended import. While every stub shares an identical template, each contains a unique hardcoded RVA that identifies and locates its corresponding obf_imp_t.
Resolver routine: Called from the dispatcher stub, this obfuscated routine resolves the import and returns it to the dispatcher, which facilitates the final call to the intended import. It begins by locating the encrypted DLL and API names based on the information in obf_imp_t. After decrypting these names, the routine uses them to resolve the memory address of the API.
Import decryption routine: Called from the resolver routine, this obfuscated routine is responsible for decrypting the DLL and API name blobs through a custom stream cipher implementation. It uses a hardcoded 32-bit salt that is unique per protected sample.
Fixup table: Present only in headerless mode, this is a relocation fixup table that the loader uses to correct all memory displacements to the following import protection components:
Encrypted DLL names
Encrypted API names
Import dispatcher references
Dispatcher Stub
The core of the import protection mechanism is the dispatcher stub. Each stub is tailored to an individual import and consistently employs a lea instruction to access its respective obf_imp_t, which it passes as the only input to the resolver routine.
push rcx ; save RCX
lea rcx, [rip+obf_imp_t] ; fetch import-specific obf_imp_t
push rdx ; save all other registers the stub uses
push r8
push r9
sub rsp, 28h
call ObfImportResolver ; resolve the import and return it in RAX
add rsp, 28h
pop r9 ; restore all saved registers
pop r8
pop rdx
pop rcx
jmp rax ; invoke resolved import
Figure 14: Deobfuscated import dispatcher stub
Each stub is obfuscated through the mutation mechanisms outlined earlier; this applies to the resolver and import decryption routines as well. The following is what the execution flow of a stub can look like. Note the scattered addresses, which, while presented sequentially, actually jump all around the code segment due to the instruction dispatchers.
obf_imp_t is the central data structure that contains the relevant information to resolve each import. It has the following form:
struct obf_imp_t { // sizeof=0x18
uint32_t CryptDllNameRVA; // NOTE: will be 64-bits, due to padding
uint32_t CryptAPINameRVA; // NOTE: will be 64-bits, due to padding
uint64_t ResolvedImportAPI; // Where the resolved address is stored
};
Figure 16: obf_imp_t in its original C struct source form
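Given the padded layout noted in the struct comments (each 32-bit RVA occupies a full 8 bytes, for sizeof == 0x18), the structure can be unpacked from raw bytes as follows. This is a sketch of ours, not code from the deobfuscator:

```python
import struct

# Sketch: parsing an obf_imp_t from raw bytes. Per the comments in
# Figure 16, each 32-bit RVA is padded out to 8 bytes, so the packed
# layout is I + 4 pad, I + 4 pad, Q (0x18 bytes total).
OBF_IMP_T = struct.Struct("<I4xI4xQ")

def parse_obf_imp_t(blob: bytes, offset: int = 0) -> dict:
    dll_rva, api_rva, resolved = OBF_IMP_T.unpack_from(blob, offset)
    return {
        "CryptDllNameRVA": dll_rva,
        "CryptAPINameRVA": api_rva,
        "ResolvedImportAPI": resolved,
    }
```

The two RVAs then point into the encrypted DLL and API name blobs that the resolver decrypts.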
It is processed by the resolver routine, which uses the embedded RVAs to locate the encrypted DLL and API names, decrypting each in turn. After decrypting each name blob, it uses LoadLibraryA to ensure the DLL dependency is loaded in memory and leverages GetProcAddress to retrieve the address of the import.
The import decryption logic is implemented using a Linear Congruential Generator (LCG) algorithm to generate a pseudo-random key stream, which is then used in a XOR-based stream cipher for decryption. It operates on the following formula:
Xn+1 = (a · Xn + c) mod 2^32
where:
a is always hardcoded to 17 and functions as the multiplier
c is a 32-bit constant determined by the encryption context and is unique per protected sample
We refer to it as the imp_decrypt_const
mod 2^32 confines the sequence values to a 32-bit range
The decryption logic initializes with a value from the encrypted data and iteratively generates new values using the outlined LCG formula. Each iteration produces a byte derived from the calculated value, which is then XOR’ed with the corresponding encrypted byte. This process continues byte-by-byte until it reaches a termination condition.
A fully recovered Python implementation for the decryption logic is provided in Figure 18.
Figure 18: Complete Python implementation of the import string decryption routine
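As a simplified stand-in for that routine, the following sketch implements the LCG stream cipher exactly as described in the formula above. The key-byte selection (low byte of the state) and the termination condition (a NUL terminator) are our assumptions for illustration; the recovered implementation in Figure 18 is authoritative.

```python
# Simplified sketch of the LCG-based stream decryption. The multiplier
# is fixed at 17 per the samples analyzed; imp_decrypt_const is the
# per-sample 32-bit constant. Key-byte selection and termination are
# assumed, not recovered details.
MULTIPLIER = 17

def lcg_decrypt(data: bytes, seed: int, imp_decrypt_const: int) -> bytes:
    state = seed & 0xFFFFFFFF
    out = bytearray()
    for b in data:
        # X_{n+1} = (a * X_n + c) mod 2^32
        state = (MULTIPLIER * state + imp_decrypt_const) & 0xFFFFFFFF
        plain = b ^ (state & 0xFF)  # assumed: low byte of the state
        if plain == 0:              # assumed: NUL-terminated name strings
            break
        out.append(plain)
    return bytes(out)
```

Because the cipher is a plain XOR stream, encryption and decryption are the same operation with the same key stream.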
Import Fixup Table
The import relocation fixup table is a fixed-size array of entries, each composed of two 32-bit RVAs. The first RVA is the memory displacement where the data is referenced from; the second RVA points to the actual data in question. The entries in the fixup table can be categorized into three distinct types, each corresponding to a specific import component:
Encrypted DLL names
Encrypted API names
Import dispatcher references
Figure 19: Illustration of the import fixup table
The location of the fixup table is determined by the loader’s metadata, which specifies an offset from the start of the .data section to the start of the table. During initialization, the loader is responsible for applying the relocation fixups for each entry in the table.
Figure 20: Loader metadata that shows the Import fixup table entries and metadata used to find it
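Under the two-RVA entry layout described above, walking the table is straightforward. The following is a small sketch of ours, assuming tightly packed 8-byte entries and a table size (in bytes) taken from the loader metadata:

```python
import struct

# Sketch: walking the import fixup table -- a packed array of
# (reference RVA, target RVA) pairs. `size` is the byte length of the
# table as recorded in the loader metadata.
def parse_fixup_table(blob: bytes, size: int):
    entries = []
    for off in range(0, size, 8):
        ref_rva, target_rva = struct.unpack_from("<II", blob, off)
        entries.append((ref_rva, target_rva))
    return entries
```

Each recovered pair tells the loader (or a deobfuscator) which displacement to patch and what it should point to.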
Recovery
Effective recovery from an obfuscated binary necessitates a thorough understanding of the protection mechanisms employed. While deobfuscation often benefits from working with an intermediate representation (IR) rather than the raw disassembly—an IR provides more granular control in undoing transformations—this obfuscator preserves the original compiled code, merely enveloping it with additional protection layers. Given this context, our deobfuscation strategy focuses on stripping away the obfuscator’s transformations from the disassembly to reveal the original instructions and data. This is achieved through a series of hierarchical phases, where each subsequent phase builds upon the previous one to ensure comprehensive deobfuscation.
We categorize this approach into three distinct categories that we eventually integrate:
CFG Recovery
Restoring the natural control flow by removing obfuscation artifacts at the instruction and basic block levels. This involves two phases:
Accounting for instruction dispatchers: Addressing the core of the control flow protection that obscures the execution flow
Function identification and recovery: Cataloging scattered instructions and reassembling them into their original function counterparts
Import Recovery
Original Import Table: The goal is to reconstruct the original import table, ensuring that all necessary library and function references are accurately restored.
Binary Rewriting
Generating Deobfuscated Executables: This process entails creating a new, deobfuscated executable that maintains the original functionality while removing ScatterBrain’s modifications.
Given the complexity of each category, we concentrate on the core aspects necessary to break the obfuscator by providing a guided walkthrough of our deobfuscator’s source code and highlighting the essential logic required to reverse these transformations. This step-by-step examination demonstrates how each obfuscation technique is methodically undone, ultimately restoring the binary’s original structure.
Our directory structure reflects this organized approach:
Figure 21: Directory structure of our deobfuscator library
This comprehensive recovery process not only restores the binaries to their original state but also equips analysts with the tools and knowledge necessary to combat similar obfuscation techniques in the future.
CFG Recovery
The primary obstacle disrupting the natural control flow graph is the use of instruction dispatchers. Eliminating these dispatchers is our first priority in obtaining the CFG. Afterward, we need to reorganize the scattered instructions back into their original function representations—a problem known as function identification, which is notoriously difficult to generalize. Therefore, we approach it using our specific knowledge about the obfuscator.
Linearizing the Scattered CFG
Our initial step in recovering the original CFG is to eliminate the scattering effect induced by instruction dispatchers. We will transform all dispatcher call instructions into direct branches to their resolved targets. This transformation linearizes the execution flow, making it straightforward to statically pursue the second phase of our CFG recovery. This will be implemented via brute-force scanning, static parsing, emulation, and instruction patching.
Function Identification and Recovery
We leverage a recursive descent algorithm that employs a depth-first search (DFS) strategy applied to known entry points of code, attempting to exhaust all code paths by “single-stepping” one instruction at a time. We add additional logic to the processing of each instruction in the form of “mutation rules” that stipulate how each individual instruction needs to be processed. These rules aid in stripping away the obfuscator’s code from the original.
Removing Instruction Dispatchers
Eliminating instruction dispatchers involves identifying each dispatcher location and its corresponding dispatch target. Recall that the target is a uniquely encoded 32-bit displacement located at the return address of the dispatcher call. To remove instruction dispatchers, it is essential to first understand how to accurately identify them. We begin by categorizing the defining properties of individual instruction dispatchers:
Target of a Near Call
Dispatchers are always the destination of a near call instruction, represented by the E8 opcode followed by a 32-bit displacement.
References Encoded 32-Bit Displacement at Return Address
Dispatchers reference the encoded 32-bit displacement located at the return address on the stack by performing a 32-bit read from the stack pointer. This displacement is essential for determining the next execution target.
Pairing of pushfq and popfq Instructions to Safeguard Decoding
Dispatchers use a pair of pushfq and popfq instructions to preserve the state of the RFLAGS register during the decoding process. This ensures that the dispatcher does not alter the original execution context, maintaining the integrity of register contents.
End with a ret Instruction
Each dispatcher concludes with a ret instruction, which not only ends the dispatcher function but also redirects control to the next set of instructions, effectively continuing the execution flow.
Leveraging the aforementioned categorizations, we implement the following approach to identify and remove instruction dispatchers:
Brute-Force Scanner for Near Call Locations
Develop a scanner that searches for all near call instructions within the code section of the protected binary. This scanner generates a huge array of potential call locations that may serve as dispatchers.
Implementation of a Fingerprint Routine
The brute-force scan yields a large number of false positives, requiring an efficient method to filter them. While emulation can filter out false positives, it is too computationally expensive to apply to every brute-force result.
Introduce a shallow fingerprinting routine that traverses the disassembly of each candidate to identify key dispatcher characteristics, such as the presence of pushfq and popfq sequences. This significantly improves performance by eliminating most false positives before concretely verifying them through emulation.
Emulation of Targets to Recover Destinations
Emulate execution starting from each verified call site to accurately recover the actual dispatch targets. Emulating from the call site ensures that the emulator processes the encoded offset data at the return address, abstracting away the specific decoding logic employed by each dispatcher.
A successful emulation also serves as the final verification step to confirm that we have identified a dispatcher.
Identification of Dispatch Targets via ret Instructions
Utilize the terminating ret instruction to accurately identify the dispatch target within the binary.
The ret instruction is a definitive marker indicating the end of a dispatcher function and the point at which control is redirected, making it a reliable indicator for target identification.
Brute-Force Scanner
The following Python code implements the brute-force scanner, which performs a comprehensive byte signature scan within the code segment of a protected binary. The scanner systematically identifies all potential call instruction locations by scanning for the 0xE8 opcode associated with near call instructions. The identified addresses are then stored for subsequent analysis and verification.
Figure 22: Python implementation of the brute-force scanner
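A scan of this kind can be sketched in a few lines of pure Python; the function name and the bytes-based interface below are illustrative assumptions, not the original tooling.

```python
CALL_NEAR_OPCODE = 0xE8  # opcode of a near call (E8 rel32)

def scan_call_candidates(code: bytes, base_va: int = 0) -> list[int]:
    """Return the virtual address of every byte that could begin a near call."""
    candidates = []
    # A near call is 5 bytes (opcode + rel32), so stop 4 bytes before the end.
    for off in range(len(code) - 4):
        if code[off] == CALL_NEAR_OPCODE:
            candidates.append(base_va + off)
    return candidates
```

Because 0xE8 can also appear inside operands of unrelated instructions, the result is deliberately over-inclusive; the fingerprinting pass that follows is what prunes it.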
Fingerprinting Dispatchers
The fingerprinting routine leverages the unique characteristics of instruction dispatchers, as detailed in the Instruction Dispatchers section, to statically identify potential dispatcher locations within a protected binary. This identification process utilizes the results from the prior brute-force scan. For each address in this array, the routine disassembles the code and examines the resulting disassembly listing to determine if it matches known dispatcher signatures.
This method is not intended to guarantee 100% accuracy, but rather serve as a cost-effective approach to identifying call locations with a high likelihood of being instruction dispatchers. Subsequent emulation will be employed to confirm these identifications.
Successful Decoding of a call Instruction
The identified location must successfully decode to a call instruction. Dispatchers are always invoked via a call instruction. Additionally, dispatchers utilize the return address from the call site to locate their encoded 32-bit displacement.
Absence of Subsequent call Instructions
Dispatchers must not contain any call instructions within their disassembly listing. The presence of any call instructions within a presumed dispatcher range immediately disqualifies the call location as a dispatcher candidate.
Absence of Privileged Instructions and Indirect Control Transfers
As with call instructions, the dispatcher cannot include privileged instructions or indirect unconditional jmps. The presence of any such instruction invalidates the call location.
Detection of pushfq and popfq Guard Sequences
The dispatcher must contain pushfq and popfq instructions to safeguard the RFLAGS register during decoding. These sequences are unique to dispatchers and suffice for generic identification, regardless of how the decoding logic differs between individual dispatchers.
Figure 23 shows the fingerprint verification routine that applies all the aforementioned characteristics and validation checks to a potential call location:
Figure 23: The dispatch fingerprint routine
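The four checks can be condensed into a sketch that operates on an already-produced disassembly listing, here modeled as a list of mnemonic strings where the first entry is the call site and the rest is the dispatcher body. A real implementation would drive a disassembler; the listing format, function name, and privileged-instruction set are illustrative assumptions.

```python
# A small, non-exhaustive set of privileged mnemonics for the sketch.
PRIVILEGED = {"hlt", "cli", "sti", "in", "out", "rdmsr", "wrmsr"}

def fingerprint_dispatcher(mnemonics: list[str]) -> bool:
    """Apply the four dispatcher fingerprint checks to a candidate listing."""
    if not mnemonics or mnemonics[0] != "call":
        return False                      # must decode to a call at the site
    body = mnemonics[1:]
    if "call" in body:                    # no nested call instructions allowed
        return False
    if any(m in PRIVILEGED for m in body):
        return False                      # privileged instructions disqualify
    # The pushfq/popfq guard pair must be present to protect RFLAGS.
    return "pushfq" in body and "popfq" in body
```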
Emulating Dispatchers to Resolve Destination Targets
After filtering potential dispatchers using the fingerprinting routine, the next step is to emulate them in order to recover their destination targets.
Figure 24: Emulation sequence used to recover dispatcher destination targets
The Python code in Figure 24 performs this logic and operates as follows:
Initialization of the Emulator
Creates the core engine for simulating execution (EmulateIntel64), maps the protected binary image (imgbuffer) into the emulator’s memory space, maps the Thread Environment Block (TEB) as well to simulate a realistic Windows execution environment, and creates an initial snapshot to facilitate fast resets before each emulation run without needing to reinitialize the entire emulator each time.
MAX_DISPATCHER_RANGE specifies the maximum number of instructions to emulate for each dispatcher. The value 45 is chosen arbitrarily, sufficient given the limited instruction count in dispatchers even with the added mutations.
A try/except block is used to handle any exceptions during emulation. It is assumed that exceptions result from false positives among the potential dispatchers identified earlier and can be safely ignored.
Emulating Each Potential Dispatcher
For each potential dispatcher address (call_dispatch_ea), the emulator’s context is restored to the initial snapshot. The program counter (emu.pc) is set to the address of each dispatcher. emu.stepi() executes one instruction at the current program counter, after which the instruction is analyzed to determine whether we have finished.
If the instruction is a ret, the emulation has reached the dispatch point.
The dispatch target address is read from the stack using emu.parse_u64(emu.rsp).
The results are captured by d.dispatchers_to_target, which maps the dispatcher address to the dispatch target. The dispatcher address is additionally stored in the d.dispatcher_locs lookup cache.
The break statement exits the inner loop, proceeding to the next dispatcher.
Patching and Linearization
After collecting and verifying every captured instruction dispatcher, the final step is to replace each call location with a direct branch to its respective destination target. Since both near call and jmp instructions occupy 5 bytes in size, this replacement can be seamlessly performed by merely patching the jmp instruction over the call.
Figure 25: Patching sequence to transform instruction dispatcher calls to unconditional jmps to their destination targets
We utilize the dispatchers_to_target map, established in the previous section, which associates each dispatcher call location with its corresponding destination target. By iterating through this map, we identify each dispatcher call location and replace the original call instruction with a jmp. This substitution redirects the execution flow directly to the intended target addresses.
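Under the assumption that dispatchers_to_target holds offsets into the loaded image buffer, the patch pass might look like the following sketch; the helper name and struct-based rel32 encoding are ours.

```python
import struct

JMP_NEAR_OPCODE = 0xE9  # near jmp (E9 rel32), same 5-byte size as E8 rel32

def patch_dispatcher_calls(img: bytearray, dispatchers_to_target: dict[int, int]) -> None:
    """Overwrite each dispatcher call with a direct jmp to its resolved target."""
    for call_off, target in dispatchers_to_target.items():
        # rel32 is relative to the end of the 5-byte instruction.
        rel32 = target - (call_off + 5)
        img[call_off] = JMP_NEAR_OPCODE
        img[call_off + 1:call_off + 5] = struct.pack("<i", rel32)
```

Because both encodings are exactly 5 bytes, no instructions need to be moved and no other displacements in the image are disturbed by this pass.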
This removal is pivotal to our deobfuscation strategy as it removes the intended dynamic dispatch element that instruction dispatchers were designed to provide. Although the code is still scattered throughout the code segment, the execution flow is now statically deterministic, making it immediately apparent which instruction leads to the next one.
When we compare these results to the initial screenshot from the Instruction Dispatcher section, the blocks still appear scattered. However, their execution flow has been linearized. This progress allows us to move forward to the second phase of our CFG recovery.
Figure 26: Linearized instruction dispatcher control flow
Function Identification and Recovery
By eliminating the effects of instruction dispatchers, we have linearized the execution flow. The next step involves assimilating the dispersed code and leveraging the linearized control flow to reconstruct the original functions that comprised the unprotected binary. This recovery phase involves several stages, including raw instruction recovery, normalization, and the construction of the final CFG.
Function identification and recovery is encapsulated in the following two abstractions:
Recovered instruction (RecoveredInstr): The fundamental unit for representing individual instructions recovered from an obfuscated binary. Each instance encapsulates not only the raw instruction data but also metadata essential for relocation, normalization, and analysis within the CFG recovery process.
Recovered function (RecoveredFunc): The end result of successfully recovering an individual function from an obfuscated binary. It aggregates multiple RecoveredInstr instances, representing the sequence of instructions that constitute the unprotected function. The complete CFG recovery process results in an array of RecoveredFunc instances, each corresponding to a distinct function within the binary. We will utilize these results in the final Building Relocations in Deobfuscated Binaries section to produce fully deobfuscated binaries.
We do not utilize a basic block abstraction in our recovery approach for the following reasons. Properly abstracting basic blocks presupposes complete CFG recovery, which introduces unnecessary complexity and overhead for our purposes. In this deobfuscation context, it is simpler and more efficient to conceptualize a function as an aggregation of individual instructions rather than a collection of basic blocks.
Figure 27: RecoveredInstr type definition
Figure 28: RecoveredFunc type definition
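While Figures 27 and 28 give the actual definitions, a plausible shape for the two abstractions, with field names guessed from the surrounding description, is:

```python
from dataclasses import dataclass, field

@dataclass
class RecoveredInstr:
    ea: int                      # original address inside the obfuscated binary
    raw: bytes                   # encoded instruction bytes
    mnemonic: str                # decoded mnemonic, used by later passes
    is_synthetic: bool = False   # True for boundary jmps we inject ourselves

@dataclass
class RecoveredFunc:
    entry_ea: int                          # entry point of the recovered function
    instrs: list[RecoveredInstr] = field(default_factory=list)
```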
DFS Rule-Guided Stepping Introduction
We opted for a recursive depth-first search (DFS) algorithm for the following reasons:
Natural fit for code traversal: DFS allows us to infer function boundaries based solely on the flow of execution. It mirrors the way functions call other functions, making it intuitive to implement and reason about when reconstructing function boundaries. It also simplifies following the flow of loops and conditional branches.
Guaranteed execution paths: We concentrate on code that is definitely executed. Given we have at least one known entry point into the obfuscated code, we know execution must pass through it in order to reach other parts of the code. While other parts of the code may be more indirectly invoked, this entry point serves as a foundational starting point.
By recursively exploring from this known entry, we will almost certainly encounter and identify virtually all code paths and functions during our traversal.
Adapts to instruction mutations: We tailor the logic of the traversal with callbacks or “rules” that stipulate how we process each individual instruction. This helps us account for known instruction mutations and aids in stripping away the obfuscator’s code.
The core data structures involved in this process are the following: CFGResult, CFGStepState, and RuleHandler:
CFGResult: Container for the results of the CFG recovery process. It aggregates all pertinent information required to represent the CFG of a function within the binary, which it primarily consumes from CFGStepState.
CFGStepState: Maintains the state throughout the CFG recovery process, particularly during the controlled-step traversal. It encapsulates all necessary information to manage the traversal state, track progress, and store intermediate results.
Recovered cache: Stores instructions that have been recovered for a protected function without any additional cleanup or verification. This initial collection is essential for preserving the raw state of the instructions as they exist within the obfuscated binary before any normalization or validation processes are applied after. It is always the first pass of recovery.
Normalized cache: The final pass in the CFG recovery process. It transforms the raw instructions stored in the recovered cache into a fully normalized CFG by removing all obfuscator-introduced instructions and ensuring the creation of valid, coherent functions.
Exploration stack: Manages the set of instruction addresses that are pending exploration during the DFS traversal for a protected function. It determines the order in which instructions are processed and utilizes a visited set to ensure that each instruction is processed only once.
Obfuscator backbone: A mapping to preserve essential control flow links introduced by the obfuscator
RuleHandler: Mutation rules are merely callbacks that adhere to a specific function signature and are invoked during each instruction step of the CFG recovery process. They take as input the current protected binary, CFGStepState, and the current step-in instruction. Each rule contains specific logic designed to detect particular types of instruction characteristics introduced by the obfuscator. Based on the detection of these characteristics, the rules determine how the traversal should proceed. For instance, a rule might decide to continue traversal, skip certain instructions, or halt the process based on the nature of the mutation.
Figure 29: CFGResult type definition
Figure 30: CFGStepState type definition
Figure 31: RuleHandler type definition
The following figure is an example of a rule used to detect the patched instruction dispatchers we introduced in the previous section and to differentiate them from standard jmp instructions:
Figure 32: RuleHandler example that identifies patched instruction dispatchers and differentiates them from standard jmp instructions
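A rule of this kind might look like the following sketch. The signature is simplified (the real RuleHandler receives the protected binary and the CFGStepState), and all names are illustrative.

```python
def rule_patched_dispatcher(dispatcher_locs, instr_ea, mnemonic, target):
    """Return the address traversal should continue from, or None if this
    rule does not apply to the current instruction."""
    if mnemonic != "jmp" or target is None:
        return None
    if instr_ea in dispatcher_locs:
        # A patched dispatcher: follow the link, but treat the jmp as part of
        # the obfuscator's backbone rather than the original function body.
        return target
    return None                  # a standard jmp; other rules may handle it
```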
DFS Rule-Guided Stepping Implementation
The remaining component is a routine that orchestrates the CFG recovery process for a given function address within the protected binary. It leverages the CFGStepState to manage the DFS traversal and applies mutation rules to decode and recover instructions systematically. The result is an aggregate of RecoveredInstr instances that constitute the first pass of raw recovery:
Figure 33: Flow chart of our DFS rule-guided stepping algorithm
The following Python code directly implements the algorithm outlined in Figure 33. It initializes the CFG stepping state and commences a DFS traversal starting from the function’s entry address. During each step of the traversal, the current instruction address is retrieved from the to_explore exploration stack and checked against the visited set to prevent redundant processing. The instruction at the current address is then decoded, and a series of mutation rules are applied to handle any obfuscator-induced instruction modifications. Based on the outcomes of these rules, the traversal may continue, skip certain instructions, or halt entirely.
Recovered instructions are appended to the recovered cache, and their corresponding mappings are updated within the CFGStepState. The to_explore stack is subsequently updated with the address of the next sequential instruction to ensure systematic traversal. This iterative process continues until all relevant instructions have been explored, culminating in a CFGResult that encapsulates the fully recovered CFG.
With the raw instructions successfully recovered, the next step is to normalize the control flow. While the raw recovery process ensures that all original instructions are captured, these instructions alone do not form a cohesive and orderly function. To achieve a streamlined control flow, we must filter and refine the recovered instructions—a process we refer to as normalization. This stage involves several key tasks:
Updating branch targets: Once all of the obfuscator-introduced code (instruction dispatchers and mutations) are fully removed, all branch instructions must be redirected to their correct destinations. The scattering effect introduced by obfuscation often leaves branches pointing to unrelated code segments.
Merging overlapping basic blocks: Contrary to the idea of a basic block as a strictly single-entry, single-exit structure, compilers can produce code in which one basic block begins within another. This overlapping of basic blocks commonly appears in loop structures. As a result, these overlaps must be resolved to ensure a coherent CFG.
Proper function boundary instructions: Each function must begin and end at well-defined boundaries within the binary’s memory space. Correctly identifying and enforcing these boundaries is essential for accurate CFG representation and subsequent analysis.
Simplifying with Synthetic Boundary Jumps
Rather than relying on traditional basic block abstractions—which can impose unnecessary overhead—we employ synthetic boundary jumps to simplify CFG normalization. These artificial jmp instructions link otherwise disjointed instructions, allowing us to avoid splitting overlapping blocks and ensuring that each function concludes at a proper boundary instruction. This approach also streamlines our binary rewriting process when reconstructing the recovered functions into the final deobfuscated output binary.
Merging overlapping basic blocks and ensuring functions have proper boundary instructions amount to the same problem—determining which scattered instructions should be linked together. To illustrate this, we will examine how synthetic jumps effectively resolve this issue by ensuring that functions conclude with the correct boundary instructions. The exact same approach applies to merging basic blocks together.
Synthetic Boundary Jumps to Ensure Function Boundaries
Consider an example where we have successfully recovered a function using our DFS-based rule-guided approach. Inspecting the recovered instructions in the CFGState reveals a mov instruction as the final operation. If we were to reconstruct this function in memory as-is, the absence of a subsequent fallthrough instruction would compromise the function’s logic.
Figure 35: Example of a raw recovery that does not end with a natural function boundary instruction
To address this, we introduce a synthetic jump whenever the last recovered instruction is not a natural function boundary (e.g., ret, jmp, int3).
Figure 36: Simple Python routine that identifies function boundary instructions
We determine the fallthrough address, and if it points to an obfuscator-introduced instruction, we continue forward until reaching the first regular instruction. We call this traversal “walking the obfuscator’s backbone”:
Figure 37: Python routine that implements walking the obfuscator’s backbone logic
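A minimal rendering of this traversal: assuming the backbone is kept as a mapping from each obfuscator-introduced instruction address to the address it forwards to, we simply follow the chain until we land on a regular instruction. The function name and cycle guard are our own.

```python
def walk_backbone(backbone: dict[int, int], ea: int) -> int:
    """Follow obfuscator-introduced links until a regular instruction is hit."""
    seen = set()
    while ea in backbone:
        if ea in seen:                     # defensive: malformed data could loop
            raise ValueError(f"backbone cycle at {ea:#x}")
        seen.add(ea)
        ea = backbone[ea]
    return ea
```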
We then link these points with a synthetic jump. The synthetic jump inherits the original address as metadata, effectively indicating which instruction it is logically connected to.
Figure 38: Example of adding a synthetic boundary jmp to create a natural function boundary
Updating Branch Targets
After normalizing the control flow, adjusting branch targets becomes a straightforward process. Each branch instruction in the recovered code may still point to obfuscator-introduced instructions rather than the intended destinations. By iterating through the normalized_flow cache (generated in the next section), we identify branching instructions and verify their targets using the walk_backbone routine.
This ensures that all branch targets are redirected away from the obfuscator’s artifacts and correctly aligned with the intended execution paths. Notice we can ignore call instructions given that any non-dispatcher call instruction is guaranteed to always be legitimate and never part of the obfuscator’s protection. These will, however, need to be updated during the final relocation phase outlined in the Building Relocations in Deobfuscated Binaries section.
Once recalculated, we reassemble and decode the instructions with updated displacements, preserving both correctness and consistency.
Figure 39: Python routine responsible for updating all branch targets
Putting It All Together
Putting it all together, we developed the following algorithm that builds upon the previously recovered instructions, ensuring that each instruction, branch, and block is properly connected, resulting in a completely recovered and deobfuscated CFG for an entire protected binary. We utilize the recovered cache to construct a new, normalized cache. The algorithm employs the following steps:
Iterate Over All Recovered Instructions
Traverse all recovered instructions produced from our DFS-based stepping approach.
Add Instruction to Normalized Cache
For each instruction, add it to the normalized cache, which captures the results of the normalization pass.
Identify Boundary Instructions
Determine whether the current instruction is a boundary instruction.
If it is a boundary instruction, skip further processing of this instruction and continue to the next one (return to Step 1).
Calculate Expected Fallthrough Instruction
Determine the expected fallthrough instruction by identifying the sequential instruction that follows the current one in memory.
Verify Fallthrough Instruction
Compare the calculated fallthrough instruction with the next instruction in the recovered cache.
If the fallthrough instruction is not the next sequential instruction in memory, check whether it’s a recovered instruction we already normalized:
If it is, add a synthetic jump to link the two together in the normalized cache.
If it is not, obtain the connecting fallthrough instruction from the recovery cache and append it to the normalized cache.
If the fallthrough instruction matches the next instruction in the recovered cache:
Do nothing, as the recovered instruction already correctly points to the fallthrough. Proceed to Step 6.
Handle Final Instruction
Check if the current instruction is the final instruction in the recovered cache.
If it is the final instruction:
Add a final synthetic boundary jump, because if we reach this stage, we failed the check in Step 3.
Continue iteration, which will cause the loop to exit.
If it is not the final instruction:
Continue iteration as normal (return to Step 1).
Figure 40: Flow chart of our normalization algorithm
The Python code in Figure 41 directly implements these normalization steps. It iterates over the recovered instructions and adds them to a normalized cache (normalized_flow), creates a linear mapping, and identifies where synthetic jumps are required. When a branch target points to obfuscator-injected code, it walks the backbone (walk_backbone) to find the next legitimate instruction. If the end of a function is reached without a natural boundary, a synthetic jump is created to maintain proper continuity. After the completion of the iteration, every branch target is updated (update_branch_targets), as illustrated in the previous section, to ensure that each instruction is correctly linked, resulting in a fully normalized CFG:
Figure 41: Python implementation of our normalization algorithm
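As a toy illustration of these steps, the following sketch runs the same loop over a simplified instruction model of (address, size, mnemonic) tuples. It collapses the already-normalized versus pull-from-cache distinction of Step 5 into a single synthetic-jump case; the tuple format and synthetic-jump marker are our own simplifications of the RecoveredInstr machinery.

```python
def normalize(recovered):
    """recovered: list of (ea, size, mnemonic) tuples in traversal order."""
    BOUNDARY = ("ret", "jmp", "int3")
    normalized = []
    for i, (ea, size, mn) in enumerate(recovered):
        normalized.append((ea, size, mn))
        if mn in BOUNDARY:
            continue                              # Step 3: natural boundary
        fallthrough = ea + size                   # Step 4: expected fallthrough
        is_last = i + 1 == len(recovered)
        if is_last or recovered[i + 1][0] != fallthrough:
            # Steps 5/6: flow breaks in memory; stitch with a synthetic jmp
            # (5 bytes, mirroring a near jmp) carrying its logical target.
            normalized.append((None, 5, "synthetic_jmp_to_%#x" % fallthrough))
    return normalized
```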
Observing the Results
After applying our two primary passes, we have nearly eliminated all of the protection mechanisms. Although import protection remains to be addressed, our approach effectively transforms an incomprehensible mess into a perfectly recovered CFG.
For example, Figure 42 and Figure 43 illustrate the before and after of a critical function within the backdoor payload, which is a component of its plugin manager system. Through additional analysis of the output, we can identify functionalities that would have been impossible to delineate, much less in such detail, without our deobfuscation process.
Figure 42: Original obfuscated shadow::PluginProtocolCreateAndConfigure routine
Figure 43: Completely deobfuscated and functional shadow::PluginProtocolCreateAndConfigure routine
Import Recovery
Recovering and restoring the original import table revolves around identifying which import location is associated with which import dispatcher stub. From the stub dispatcher, we can parse the respective obf_imp_t reference in order to determine the protected import that it represents.
We pursue the following logic:
Identify each valid call/jmp location associated to an import
The memory displacement for these will point to the respective dispatcher stub.
For HEADERLESS mode, we need to first resolve the fixup table to ensure the displacement points to a valid dispatcher stub.
For each valid location traverse the dispatcher stub to extract the obf_imp_t
The obf_imp_t contains the RVAs to the encrypted DLL and API names.
Implement the string decryption logic
We need to reimplement the decryption logic in order to recover the DLL and API names.
This was already done in the initial Import Protection section.
We encapsulate the recovery of imports with the following RecoveredImport data structure:
Figure 44: RecoveredImport type definition
RecoveredImport serves as the result produced for each import that we recover. It contains all the relevant data that we will use to rebuild the original import table when producing the deobfuscated image.
Locate Protected Import CALL and JMP Sites
Each protected import location will be reflected as either an indirect near call (FF/2) or an indirect near jmp (FF/4):
Figure 45: Disassembly of import calls and jmps representation
Indirect near calls and jmps fall under the FF group opcode where the Reg field within the ModR/M byte identifies the specific operation for the group:
/2: corresponds to CALL r/m64
/4: corresponds to JMP r/m64
Taking an indirect near call as an example and breaking it down looks like the following:
FF: group opcode.
15: ModR/M byte specifying CALL r/m64 with RIP-relative addressing.
15 is encoded in binary as 00010101
Mod (bits 6-7): 00
Indicates either a direct RIP-relative displacement or memory addressing with no displacement.
Reg (bits 3-5): 010
Identifies the call operation for the group
R/M (bits 0-2): 101
In 64-bit mode with Mod 00 and R/M 101, this indicates RIP-relative addressing.
<32-bit displacement>: added to RIP to compute the absolute address.
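This breakdown can be checked mechanically with a few lines of Python that split a ModR/M byte into its three fields:

```python
def modrm_fields(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into (Mod, Reg, R/M)."""
    mod = (modrm >> 6) & 0b11     # bits 6-7
    reg = (modrm >> 3) & 0b111    # bits 3-5: opcode extension for group FF
    rm = modrm & 0b111            # bits 0-2
    return mod, reg, rm

# 0x15 -> Mod=00, Reg=010 (/2: CALL r/m64), R/M=101 (RIP-relative)
mod, reg, rm = modrm_fields(0x15)
```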
To find each protected import location and its associated dispatcher stub, we implement a trivial brute-force scanner that locates all potential indirect near call/jmp instructions via their first two opcode bytes.
Figure 46: Brute-force scanner to locate all possible import locations
The provided code scans the code section of a protected binary to identify and record all locations with opcode patterns associated with indirect call and jmp instructions. This is the first step we take, upon which we apply additional verifications to guarantee it is a valid import site.
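A minimal version of such a scanner, restricted to the RIP-relative FF 15 / FF 25 forms (other ModR/M encodings of the FF group are ignored here), could look like the following; the function name and result format are illustrative.

```python
import struct

def scan_import_sites(code: bytes, base_va: int = 0):
    """Find candidate FF 15 (call [rip+disp32]) / FF 25 (jmp [rip+disp32]) sites."""
    sites = []
    # Each candidate is 6 bytes: 2 opcode bytes + 4-byte displacement.
    for off in range(len(code) - 5):
        if code[off] == 0xFF and code[off + 1] in (0x15, 0x25):
            disp = struct.unpack_from("<i", code, off + 2)[0]
            kind = "call" if code[off + 1] == 0x15 else "jmp"
            # RIP-relative: target = address of the next instruction + disp.
            target = base_va + off + 6 + disp
            sites.append((base_va + off, kind, target))
    return sites
```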
Resolving the Import Fixup Table
When recovering imports for the HEADERLESS protection, we must first resolve the fixup table in order to identify which import location is associated with which dispatcher. The memory displacement at each protected import site serves as a lookup key into the table, which pairs it with its resolved location.
Let’s take a jmp instruction to a particular import as an example.
Figure 47: Example of a jmp import instruction including its entry in the import fixup table and the associated dispatcher stub
The jmp instruction’s displacement references the memory location 0x63A88, which points to garbage data. When we inspect the entry for this import in the fixup table using the memory displacement, we can identify the location of the dispatcher stub associated with this import at 0x295E1. The loader will update the referenced data at 0x63A88 with 0x295E1, so that when the jmp instruction is invoked, execution is appropriately redirected to the dispatcher stub.
Figure 48 is the deobfuscated code in the loader responsible for resolving the fixup table. We need to mimic this behavior in order to associate which import location targets which dispatcher.
$_Loop_Resolve_ImpFixupTbl:
mov ecx, [rdx+4] ; fixup , either DLL, API, or ImpStub
mov eax, [rdx] ; target ref loc that needs to be "fixed up"
inc ebp ; update the counter
add rcx, r13 ; calculate fixup fully (r13 is imgbase)
add rdx, 8 ; next pair entry
mov [r13+rax+0], rcx ; update the target ref loc w/ full fixup
movsxd rax, dword ptr [rsi+18h] ; fetch imptbl total size, in bytes
shr rax, 3 ; account for size as a pair-entry
cmp ebp, eax ; check if done processing all entries
jl $_Loop_Resolve_ImpFixupTbl
Figure 48: Deobfuscated disassembly of the algorithm used to resolve the import fixup table
Resolving the import fixup table requires us to have first identified the data section within the protected binary and the metadata that identifies the import table (IMPTBL_OFFSET, IMPTBL_SIZE). The offset to the fixup table is from the start of the data section.
Figure 49: Python re-implementation of the algorithm used to resolve the import fixup table
Having the start of the fixup table, we simply iterate one entry at a time and identify which import displacement (location) is associated with which dispatcher stub (fixup).
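Mirroring the loader loop in Python: the table layout of little-endian u32 pairs (target reference location, fixup) is inferred from the disassembly above, and the function name is ours.

```python
import struct

def resolve_fixup_table(table: bytes) -> dict[int, int]:
    """Map each target reference location (RVA) to its resolved fixup RVA.

    Each 8-byte entry holds (target_rva, fixup_rva), matching [rdx] and
    [rdx+4] in the loader's loop. We only need the mapping; the loader
    additionally writes imgbase+fixup back into the image at the target."""
    fixups = {}
    for off in range(0, len(table) - len(table) % 8, 8):
        target_rva, fixup_rva = struct.unpack_from("<II", table, off)
        fixups[target_rva] = fixup_rva
    return fixups
```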
Recovering the Import
Having obtained all potential import locations from the brute-force scan and accounted for relocations in HEADERLESS mode, we can proceed with the final verifications to recover each protected import. The recovery process is conducted as follows:
Decode the location into a valid call or jmp instruction
Any failure in decoding indicates that the location does not contain a valid instruction and can be safely ignored.
Use the memory displacement to locate the stub for the import
In HEADERLESS mode, each displacement serves as a lookup key into the fixup table for the respective dispatcher.
Extract the obf_imp_t structure within the dispatcher
This is achieved by statically traversing a dispatcher’s disassembly listing.
The first lea instruction encountered will contain the reference to the obf_imp_t.
Process the obf_imp_t to decrypt both the DLL and API names
Utilize the two RVAs contained within the structure to locate the encrypted blobs for the DLL and API names.
Decrypt the blobs using the outlined import decryption routine.
Figure 50: Loop that recovers each protected import
The Python code iterates through every potential import location (potential_stubs) and attempts to decode each presumed call or jmp instruction to an import. A try/except block is employed to handle any failures, such as instruction decoding errors or other exceptions that may arise. The assumption is that any error invalidates our understanding of the recovery process and can be safely ignored. In the full code, these errors are logged and tracked for further analysis should they arise.
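The shape of that loop can be sketched with the mode-specific helpers abstracted into injectable callables; every name below is illustrative rather than the original code, and the per-import result is reduced to a (DLL, API) tuple instead of a full RecoveredImport.

```python
def recover_imports(potential_sites, get_stub_rva, extract_obf_imp, decrypt_names):
    """Drive the import recovery loop over all candidate call/jmp sites."""
    imports = {}            # site address -> (dll, api), cf. d.imports
    imp_dict_builder = {}   # dll -> set of api names, cf. d.imp_dict_builder
    for site in potential_sites:
        try:
            stub_rva = get_stub_rva(site)        # displacement / fixup lookup
            obf_imp = extract_obf_imp(stub_rva)  # lea reference in the stub
            dll, api = decrypt_names(obf_imp)    # decrypt the two name blobs
        except Exception:
            continue        # any failure invalidates the candidate; skip it
        imports[site] = (dll, api)
        imp_dict_builder.setdefault(dll, set()).add(api)
    return imports, imp_dict_builder
```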
Next, the code invokes a GET_STUB_DISPLACEMENT helper function that obtains the RVA to the dispatcher associated with the import. Depending on the mode of protection, one of the following routines is used:
Figure 51: Routines that retrieve the stub RVA based on the protection mode
The recover_import_stub function is utilized to reconstruct the control flow graph (CFG) of the import stub, while _extract_lea_ref examines the instructions in the CFG to locate the lea reference to the obf_imp_t. The GET_DLL_API_NAMES function operates similarly to GET_STUB_DISPLACEMENT, accounting for slight differences depending on the protection mode:
Figure 52: Routines that decrypt the DLL and API blobs based on the protection mode
After obtaining the decrypted DLL and API names, the code possesses all the necessary information to reveal the import that the protection conceals. The final individual output of each import entry is captured in a RecoveredImport object and two dictionaries:
d.imports
This dictionary maps the address of each protected import to its recovered state. It allows for the association of the complete recovery details with the specific location in the binary where the import occurs.
d.imp_dict_builder
This dictionary maps each DLL name to a set of its corresponding API names. It is used to reconstruct the import table, ensuring a unique set of DLLs and the APIs utilized by the binary.
This systematic collection and organization prepare the necessary data to facilitate the restoration of the original functionality in the deobfuscated output. In Figure 53 and Figure 54, we can observe these two containers to showcase their structure after a successful recovery:
Figure 53: Output of the d.imports dictionary after a successful recovery
Figure 54: Output of the d.imp_dict_builder dictionary after a successful recovery
Observing the Final Results
This final step—rebuilding the import table using this data—is performed by the build_import_table function in the pefile_utils.py source file. This part is omitted from the blog post due to its unavoidable length and the numerous tedious steps involved. However, the code is well-commented and structured to thoroughly address and showcase all aspects necessary for reconstructing the import table.
Nonetheless, the following figure illustrates how we generate a fully functional binary from a headerless-protected input. Recall that a headerless-protected input is a raw, headerless PE binary, almost analogous to a shellcode blob. From this blob we produce an entirely new, functioning binary with the entirety of its import protection completely restored. And we can do the same for all protection modes.
Figure 55: Display of completely restored import table for a binary protected in HEADERLESS mode
Building Relocations in Deobfuscated Binaries
Now that we can fully recover the CFG of protected binaries and provide complete restoration of the original import tables, the final phase of the deobfuscator involves merging these elements to produce a functional deobfuscated binary. The code responsible for this process is encapsulated within the recover_output64.py and pefile_utils.py Python files.
The rebuild process comprises two primary steps:
Building the Output Image Template
Building Relocations
1. Building the Output Image Template
Creating an output image template is essential for generating the deobfuscated binary. This involves two key tasks:
Template PE Image: A Portable Executable (PE) template that serves as the container for the output binary, incorporating the restoration of all obfuscated components. We also need to be cognizant of the differences between in-memory and on-file PE executables.
Handling Different Protection Modes: Different protection modes and inputs stipulate different requirements.
Headerless variants have their file headers stripped. We must account for these variations to accurately reconstruct a functioning binary.
Selective protection preserves the original imports to maintain functionality as well as includes a specific import protection for all the imports leveraged within the selected functions.
2. Building Relocations
Building relocations is a critical and intricate part of the deobfuscation process. This step ensures that all address references within the deobfuscated binary are correctly adjusted to maintain functionality. It generally revolves around the following two phases:
Calculating Relocatable Displacements: Identifying all memory references within the binary that require relocation. This involves calculating the new addresses where these references will point to. The technique we will use is generating a lookup table that maps original memory references to their new relocatable addresses.
Apply Fixups: Modifies the binary’s code to reflect the new relocatable addresses. This utilizes the aforementioned lookup table to apply necessary fixups to all instruction displacements that reference memory. This ensures that all memory references within the binary correctly point to their intended locations.
We intentionally omit the details of rebuilding the output binary image because, while essential to the deobfuscation process, it is straightforward and too tedious to be worth examining in depth. Instead, we focus exclusively on relocations, as they are more nuanced and reveal important characteristics that are less apparent but must be understood when rewriting binaries.
Overview of the Relocation Process
Rebuilding relocations is a critical step in restoring a deobfuscated binary to an executable state. This process involves adjusting memory references within the code so that all references point to the correct locations after the code has been moved or modified. On the x86-64 architecture, this primarily concerns instructions that use RIP-relative addressing, a mode where memory references are relative to the instruction pointer.
Relocation is necessary when the layout of a binary changes, such as when code is inserted, removed, or shifted during deobfuscation. Given our deobfuscation approach extracts the original instructions from the obfuscator, we are required to relocate each recovered instruction appropriately into a new code segment. This ensures that the deobfuscated state preserves the validity of all memory references and that the accuracy of the original control and data flow is sustained.
Understanding Instruction Relocation
Instruction relocation revolves around the following:
Instruction’s memory address: the location in memory where an instruction resides.
Instruction’s memory references: references to memory locations used by the instruction’s operands.
Consider the following two instructions as illustrations:
Figure 56: Illustration of two instructions that require relocation
Unconditional jmp instruction: This instruction is located at memory address 0x1000. It references its branch target at address 0x4E22. The displacement encoded within the instruction is 0x3E1D, which is used to calculate the branch target relative to the instruction’s position. Since it employs RIP-relative addressing, the destination is calculated by adding the displacement to the length of the instruction and its memory address.
lea instruction: This is the branch target of the jmp instruction, located at 0x4E22. It also contains a memory reference to the data segment, with an encoded displacement of 0x157.
When relocating these instructions, we must address both of the following aspects:
Changing the instruction’s address: When we move an instruction to a new memory location during the relocation process, we inherently change its memory address. For example, if we relocate this instruction from 0x1000 to 0x2000, the instruction’s address becomes 0x2000.
Adjusting memory displacements: The displacement within the instruction (0x3E1Dfor the jmp, 0x157for the lea) is calculated based on the instruction’s original location and the location of its reference. If the instruction moves, the displacement no longer points to the correct target address. Therefore, we must recalculate the displacement to reflect the instruction’s new position.
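The arithmetic above can be checked with a few lines of Python. The relocated addresses 0x2000 and 0x5E00 below are hypothetical, chosen only to illustrate the recalculation:

```python
# For a RIP-relative reference, dest = instr_addr + instr_len + disp,
# so after moving an instruction the displacement must be recomputed
# from its relocated address.
JMP_LEN = 5  # E9 rel32 near jump is 5 bytes

def rip_disp(instr_addr: int, instr_len: int, dest: int) -> int:
    """Displacement that makes an instruction at instr_addr reference dest."""
    return dest - (instr_addr + instr_len)

# Original layout from Figure 56: jmp at 0x1000 targeting the lea at 0x4E22.
assert rip_disp(0x1000, JMP_LEN, 0x4E22) == 0x3E1D

# Suppose relocation moves the jmp to 0x2000 and the lea to 0x5E00
# (hypothetical new addresses): the encoded displacement must change too.
new_disp = rip_disp(0x2000, JMP_LEN, 0x5E00)
print(hex(new_disp))  # 0x3dfb
```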
Figure 57: Updated illustration of what relocation would look like
When relocating instructions during the deobfuscation process, we must ensure accurate control flow and data access. This requires us to adjust both the instruction’s memory address and any displacements that reference other memory locations. Failing to update these values invalidates the recovered CFG.
What Is RIP-Relative Addressing?
RIP-relative addressing is a mode where the instruction references memory at an offset relative to the RIP (instruction pointer) register, which points to the next instruction to be executed. Instead of using absolute addresses, the instruction encapsulates the referenced address via a signed 32-bit displacement from the current instruction pointer.
Addressing relative to the instruction pointer exists on x86 as well, but only for control-transfer instructions that support a relative displacement (e.g., JCC conditional instructions, near CALLs, and near JMPs). The x64 ISA extended this to account for almost all memory references being RIP-relative. For example, most data references in x64 Windows binaries are RIP-relative.
An excellent tool to visualize the intricacies of a decoded Intel x64 instruction is ZydisInfo. Here we use it to illustrate how a LEA instruction (encoded as 48 8D 15 1B 51 06 00) references RIP-relative memory at 0x6511B.
Figure 58: ZydisInfo output for the lea instruction
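We can reproduce ZydisInfo's displacement result by hand: for this lea, the RIP-relative displacement occupies the final four bytes of the encoding, stored little-endian:

```python
import struct

encoding = bytes.fromhex("488D151B510600")  # lea rdx, [rip+0x6511B]
disp = struct.unpack("<i", encoding[-4:])[0]  # signed 32-bit, little-endian
print(hex(disp))  # 0x6511b
```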
For most instructions, the displacement is encoded in the final four bytes of the instruction. When an immediate value is stored at a memory location, the immediate follows the displacement. Immediate values are restricted to a maximum of 32 bits, meaning 64-bit immediates cannot be used following a displacement. However, 8-bit and 16-bit immediate values are supported within this encoding scheme.
Figure 59: ZydisInfo output for the mov instruction storing an immediate operand
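A small worked example of this layout, with hand-assembled instruction bytes for illustration: when a mov stores an immediate to a RIP-relative location, the 32-bit immediate follows the 32-bit displacement in the encoding:

```python
import struct

# mov dword ptr [rip+0x10], 0x2A  ->  C7 05 <disp32> <imm32>
encoding = bytes.fromhex("C70510000000" + "2A000000")
disp = struct.unpack("<i", encoding[2:6])[0]   # displacement comes first
imm = struct.unpack("<i", encoding[6:10])[0]   # immediate follows it
print(hex(disp), imm)  # 0x10 42
```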
Displacements for control-transfer instructions are encoded as immediate operands, with the RIP register implicitly acting as the base. This is evident when decoding a jnz instruction, where the displacement is directly embedded within the instruction and calculated relative to the current RIP.
Figure 60: ZydisInfo output for the jnz instruction with an immediate operand as the displacement
Steps in the Relocation Process
For rebuilding relocations we take the following approach:
Rebuilding the code section and creating a relocation map: With the recovered CFG and imports, we commit the changes to a new code section that contains the fully deobfuscated code. We do this by:
Function-by-function processing: rebuild each function one at a time. This allows us to manage the relocation of each instruction within its respective function.
Tracking instruction locations: As we rebuild each function, we track the new memory locations of each instruction. This involves maintaining a global relocation dictionary that maps original instruction addresses to their new addresses in the deobfuscated binary. This dictionary is crucial for accurately updating references during the fixup phase.
Applying fixups: After rebuilding the code section and establishing the relocation map, we proceed to modify the instructions so that their memory references point to the correct locations in the deobfuscated binary. This restores the binary’s complete functionality and is achieved by adjusting any memory references to code or data an instruction may have.
Rebuilding the Code Section and Creating a Relocation Map
To construct the new deobfuscated code segment, we iterate over each recovered function and copy all instructions sequentially, starting from a fixed offset—for example, 0x1000. During this process, we build a global relocation dictionary (global_relocs) that maps each instruction to its relocated address. This mapping is essential for adjusting memory references during the fixup phase.
The global_relocs dictionary uses a tuple as the key for lookups, and each key is associated with the relocated address of the instruction it represents. The tuple consists of the following three components:
Original starting address of the function: The address where the function begins in the protected binary. It identifies the function to which the instruction belongs.
Original instruction address within the function: The address of the instruction in the protected binary. For the first instruction in a function, this will be the function’s starting address.
Synthetic boundary JMP flag: A boolean value indicating whether the instruction is a synthetic boundary jump introduced during normalization. These synthetic instructions were not present in the original obfuscated binary, and we need to account for them specifically during relocation because they have no original address.
Figure 61: Illustration of how the new code segment and relocation map are generated
The following Python code implements the logic outlined in Figure 61. Error handling and logging code has been stripped for brevity.
Figure 62: Python logic that implements the building of the code segment and generation of the relocation map
Initialize current offset: Set the starting point in the new image buffer where the code section will be placed. The variable curr_off is initialized to starting_off, which is typically 0x1000. This represents the conventional start address of the .text section in PE files. For SELECTIVE mode, this will be the offset to the start of the protected function.
Iterate over recovered functions: Loop through each recovered function in the deobfuscated control flow graph (d.cfg). func_ea is the original function entry address, and rfn is a RecoveredFunc object encapsulating the recovered function’s instructions and metadata.
Handle the function start address first
Set function’s relocated start address: Assign the current offset to rfn.reloc_ea, marking where this function will begin in the new image buffer.
Update global relocation map: Add an entry to the global relocation map d.global_relocs to map the original function address to its new location.
Iterate over each recovered instruction: Loop through the normalized flow of instructions within the function. We use the normalized_flow as it allows us to iterate over each instruction linearly as we apply it to the new image.
Set instruction’s relocated address: Assign the current offset to r.reloc_ea, indicating where this instruction will reside in the new image buffer.
Update global relocation map: Add an entry to d.global_relocs for the instruction, mapping its original address to the relocated address.
Update the output image: Write the instruction bytes to the new image buffer d.newimgbuffer at the current offset. If the instruction was modified during deobfuscation (r.updated_bytes), use those bytes; otherwise, use the original bytes (r.instr.bytes).
Advance the offset: Increment curr_off by the size of the instruction to point to the next free position in the buffer and move on to the next instruction until the remainder are exhausted.
Align current offset to a 16-byte boundary: After processing all instructions in a function, align curr_off to the next 16-byte boundary. We pad with 8 bytes (an arbitrary pointer-sized value) after the last instruction so that the next function will not conflict with it. This further ensures proper memory alignment for the next function, which is essential for performance and correctness on x86-64 architectures. Then repeat the process from step 2 until all functions have been exhausted.
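The steps above can be sketched in simplified form. The RInstr/RFunc stand-ins below are assumptions for illustration, not the deobfuscator's exact types:

```python
from dataclasses import dataclass

@dataclass
class RInstr:
    ea: int                      # original address in the protected binary
    raw: bytes                   # original instruction bytes
    updated_bytes: bytes = b""   # bytes rewritten during deobfuscation, if any
    is_boundary_jmp: bool = False
    reloc_ea: int = 0            # relocated address, filled in below

@dataclass
class RFunc:
    func_ea: int                 # original function start address
    normalized_flow: list        # linear list of RInstr
    reloc_ea: int = 0

def rebuild_code_section(funcs, starting_off=0x1000):
    newimg = bytearray(starting_off)  # zero padding up to the .text start
    global_relocs = {}
    curr_off = starting_off
    for rfn in funcs:
        # relocate the function start and record it in the map
        rfn.reloc_ea = curr_off
        global_relocs[(rfn.func_ea, rfn.func_ea, False)] = curr_off
        # copy each instruction sequentially, tracking its new location
        for r in rfn.normalized_flow:
            r.reloc_ea = curr_off
            global_relocs[(rfn.func_ea, r.ea, r.is_boundary_jmp)] = curr_off
            newimg.extend(r.updated_bytes or r.raw)
            curr_off = len(newimg)
        # pad past the last instruction, then align to a 16-byte boundary
        curr_off = (curr_off + 8 + 15) & ~15
        newimg.extend(b"\x00" * (curr_off - len(newimg)))
    return bytes(newimg), global_relocs

# usage with one tiny two-instruction function (addresses are illustrative)
fn = RFunc(0x4000, [RInstr(0x4000, b"\x55"), RInstr(0x4001, b"\xC3")])
img, relocs = rebuild_code_section([fn])
print(hex(relocs[(0x4000, 0x4001, False)]))  # 0x1001
```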
This step-by-step process accurately rebuilds the deobfuscated binary’s executable code section. By relocating each instruction, the code prepares the output template for the subsequent fixup phase, where references are adjusted to point to their correct locations.
Applying Fixups
After building the deobfuscated code section and relocating each recovered function in full, we apply fixups to correct addresses within the recovered code. This process adjusts the instruction bytes in the new output image so that all references point to the correct locations. It is the final step in reconstructing a functional deobfuscated binary.
We categorize fixups into three distinct categories, based primarily on whether they apply to control flow or data flow instructions. We further distinguish between two types of control flow instructions: standard branching instructions and those introduced by the obfuscator through the import protection. Each type has specific nuances that require tailored handling, allowing us to apply precise logic to each category.
Import Relocations: These involve calls and jumps to recovered imports.
Control Flow Relocations: All standard control flow branching instructions.
Data Flow Relocations: Instructions that reference static memory locations.
Using these three categorizations, the core logic boils down to the following two phases:
Resolving displacement fixups
Differentiate between displacements encoded as immediate operands (branching instructions) and those in memory operands (data accesses and import calls).
Calculate the correct fixup values for these displacements using the d.global_relocs map generated prior.
Update the output image buffer
Once the displacements have been resolved, write the updated instruction bytes into the new code segment to reflect the changes permanently.
To achieve this, we utilize several helper functions and lambda expressions. The following is a step-by-step explanation of the code responsible for calculating the fixups and updating the instruction bytes.
Figure 63: Helper routines that aid in applying fixups
Define lambda helper expressions
PACK_FIXUP: packs a 32-bit fixup value into a little-endian byte array.
CALC_FIXUP: calculates the fixup value by computing the difference between the destination address (dest) and the end of the current instruction (r.reloc_ea + size), ensuring it fits within 32 bits.
IS_IN_DATA: checks if a given address is within the data section of the binary. We exclude relocating these addresses, as we preserve the data section at its original location.
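As a sketch, the three helper lambdas might look like the following; these definitions are assumptions for illustration, not the deobfuscator's exact source:

```python
import struct

# pack a 32-bit fixup value into a little-endian byte array
PACK_FIXUP = lambda v: bytearray(struct.pack("<i", v))

# fixup = destination minus the end of the relocated instruction
CALC_FIXUP = lambda dest, reloc_ea, size: dest - (reloc_ea + size)

# the data section is preserved in place, so addresses inside it are
# never relocated (bounds here are illustrative)
DATA_START, DATA_END = 0x90000, 0xA0000
IS_IN_DATA = lambda ea: DATA_START <= ea < DATA_END

# e.g., a 5-byte instruction relocated to 0x2000 referencing 0x5E00
print(PACK_FIXUP(CALC_FIXUP(0x5E00, 0x2000, 5)).hex())  # fb3d0000
```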
Resolve fixups for each instruction
Import and data flow relocations
Utilize the resolve_disp_fixup_and_apply helper function as both encode the displacement within a memory operand.
Control flow relocations
Use the resolve_imm_fixup_and_apply helper as the displacement is encoded in an immediate operand.
During our CFG recovery, we transformed each jmp and jcc instruction to its near equivalent (e.g., from a 2-byte short form to a 6-byte near form for conditional branches) to avoid the shortcomings of 1-byte short branches.
We force a 32-bit displacement for each branch to guarantee a sufficient range for every fixup.
Update the output image buffer
Decode the updated instruction bytes so that the change is reflected within the RecoveredInstr that represents it.
Write the updated bytes to the new image buffer
updated_bytes reflects the final opcodes for a fully relocated instruction.
With the helpers in place, the following Python code implements the final processing for each relocation type.
Figure 64: The three core loops that address each relocation category
Import Relocations: The first for loop handles fixups for import relocations, utilizing data generated during the Import Recovery phase. It iterates over every recovered instruction r within the rfn.relocs_imports cache and does the following:
Prepare updated instruction bytes: initialize r.updated_bytes with a mutable copy of the original instruction bytes to prepare it for modification.
Retrieve import entry and displacement: obtain the import entry from the imports dictionary d.imports and retrieve the new RVA from d.import_to_rva_map using the import’s API name.
Apply fixup: use the resolve_disp_fixup_and_apply helper to calculate and apply the fixup for the new RVA. This adjusts the instruction’s displacement to correctly reference the imported function.
Update image buffer: write r.updated_bytes back into the new image using update_reloc_in_img. This finalizes the fixup for the instruction in the output image.
Control Flow Relocations: The second for loop handles fixups for control flow branching relocations (call, jmp, jcc). Iterating over each entry in rfn.relocs_ctrlflow, it does the following:
Retrieve destination: extract the original branch destination from the immediate operand.
Get relocated address: reference the relocation dictionary d.global_relocs to obtain the branch target’s relocated address. If it’s a call target, then we specifically look up the relocated address for the start of the called function.
Apply fixup: use resolve_imm_fixup_and_apply to adjust the branch target to its relocated address.
Update buffer: finalize the fixup by writing r.updated_bytes back into the new image using update_reloc_in_img.
Data Flow Relocations: The final loop handles the resolution of all static memory references stored within rfn.relocs_dataflow. First, we establish a list of KNOWN instructions that require data reference relocations. Given the extensive variety of such instructions, this categorization simplifies our approach and ensures a comprehensive understanding of all possible instructions present in the protected binaries. Following this, the logic mirrors that of the import and control flow relocations, systematically processing each relevant instruction to accurately adjust their memory references.
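As a hedged sketch of the second category, the control flow fixup loop might look roughly like this; the object shapes and field names are simplified assumptions, not the library's exact API:

```python
import struct
from types import SimpleNamespace

def fix_ctrlflow_relocs(rfn, global_relocs, newimgbuffer):
    for r in rfn.relocs_ctrlflow:
        # prepare a mutable copy of the instruction bytes
        r.updated_bytes = bytearray(r.instr_bytes)
        # original branch destination, taken from the immediate operand
        dest = r.branch_target
        # look up where that destination now lives in the deobfuscated image
        new_dest = global_relocs[(r.target_func_ea, dest, False)]
        # near branches encode the target as a 32-bit immediate in the final
        # four bytes: fixup = new_dest minus end of the relocated instruction
        fixup = new_dest - (r.reloc_ea + len(r.updated_bytes))
        r.updated_bytes[-4:] = struct.pack("<i", fixup)
        # commit the finalized bytes into the output image
        end = r.reloc_ea + len(r.updated_bytes)
        newimgbuffer[r.reloc_ea:end] = r.updated_bytes

# usage with a minimal stand-in instruction (values are illustrative)
buf = bytearray(0x3000)
jmp = SimpleNamespace(instr_bytes=bytes.fromhex("E900000000"),  # jmp rel32
                      branch_target=0x4E22, target_func_ea=0x4E22,
                      reloc_ea=0x2000, updated_bytes=None)
fix_ctrlflow_relocs(SimpleNamespace(relocs_ctrlflow=[jmp]),
                    {(0x4E22, 0x4E22, False): 0x2B00}, buf)
print(buf[0x2000:0x2005].hex())  # e9fb0a0000
```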
After reconstructing the code section and establishing the relocation map, we adjust each instruction categorized for relocation within the deobfuscated binary. This final step restores the output binary’s full functionality, ensuring that each instruction accurately references the intended code or data segments.
Observing the Results
To demonstrate our deobfuscation library for ScatterBrain, we conduct a case study showcasing its functionality. For this case study, we select three samples: a POISONPLUG.SHADOW headerless backdoor and two embedded plugins.
We develop a Python script, example_deobfuscator.py, that consumes from our library and implements all of the recovery techniques outlined earlier. Figure 65 and Figure 66 showcase the code within our example deobfuscator:
Figure 65: The first half of the Python code in example_deobfuscator.py
Figure 66: The second half of the Python code in example_deobfuscator.py
Running example_deobfuscator.py, we can observe the following. Note that it takes some time, given that we have to emulate more than 16,000 instruction dispatchers found within the headerless backdoor.
Figure 67: Output from running example_deobfuscator.py
Focusing on the headerless backdoor, both for brevity and because it is the most involved to deobfuscate, we first observe its initial state inside the IDA Pro disassembler before inspecting the output of our deobfuscator. We can see that it is virtually impenetrable to analysis.
Figure 68: Observing the obfuscated headerless backdoor in IDA Pro
After running our example deobfuscator and producing a brand new deobfuscated binary, we can see the drastic difference in output. All the original control flow has been recovered, all of the protected imports have been restored, and all required relocations have been applied. We also account for the PE header that ScatterBrain deliberately strips from the headerless backdoor.
Figure 69: Observing the deobfuscated headerless backdoor in IDA Pro
Given that we produce functional binaries as part of the output, the deobfuscated binary can be either run directly or debugged within your debugger of choice.
Figure 70: Debugging the deobfuscated headerless backdoor in everyone’s favorite debugger
Conclusion
In this blog post, we delved into the sophisticated ScatterBrain obfuscator used by POISONPLUG.SHADOW, an advanced modular backdoor leveraged by specific China-nexus threat actors GTIG has been tracking since 2022. Our exploration of ScatterBrain highlighted the intricate challenges it poses for defenders. By systematically outlining and addressing each protection mechanism, we demonstrated the significant effort required to create an effective deobfuscation solution.
Ultimately, we hope that our work provides valuable insights and practical tools for analysts and cybersecurity professionals. Our dedication to advancing methodologies and fostering collaborative innovation ensures that we remain at the forefront of combating sophisticated threats like POISONPLUG.SHADOW. Through this exhaustive examination and the introduction of our deobfuscator, we contribute to the ongoing efforts to mitigate the risks posed by highly obfuscated malware, reinforcing the resilience of cybersecurity defenses against evolving adversarial tactics.
Special thanks to Conor Quigley and Luke Jenkins from the Google Threat Intelligence Group for their contributions to both Mandiant and Google’s efforts in understanding and combating the POISONPLUG threat. We also appreciate the ongoing support and dedication of the teams at Google, whose combined efforts have been crucial in enhancing our cybersecurity defenses against sophisticated adversaries.
AWS Deadline Cloud now supports Maxon Cinema 4D and Maxon Redshift in its Service-Managed Fleets and Customer-Managed Fleets. With this update, creative teams can seamlessly leverage the cloud to render complex projects and access flexible Usage-Based Licensing (UBL).
With AWS Deadline Cloud, you can submit Cinema 4D jobs to Deadline Cloud without having to manage your own render farm infrastructure. You can now scale Cinema 4D and Redshift rendering workloads effortlessly, eliminating bottlenecks caused by local resource limitations. UBL integration offers a pay-as-you-go licensing model, ideal for studios managing dynamic workloads. You can build pipelines for 3D graphics and visual effects using Cinema 4D without having to set up, configure, or manage the worker infrastructure yourself. Service-Managed Fleets can be configured in minutes so you can begin rendering immediately. Customers using Customer-Managed Fleets can also use Cinema 4D and Redshift UBL by integrating the license into their workflows, enabling access and pay-as-you-go usage.
Creative teams can get started today in all AWS Regions where Deadline Cloud is available.
AWS announces the general availability of Amazon S3 Metadata, the easiest and fastest way to discover and understand your Amazon S3 data. S3 Metadata provides automated and easily queried metadata that updates in near real time, simplifying business analytics, real-time inference applications, and more. S3 Metadata supports object metadata, which includes system-defined details like size and the source of the object, and custom metadata, which allows you to use tags to annotate your objects with information like product SKU, transaction ID, or content rating.
S3 Metadata automatically captures metadata from objects as they are uploaded into a bucket and makes that metadata queryable in a read-only table. As data in your bucket changes, S3 Metadata updates the table within minutes to reflect the latest changes. These metadata tables are stored in Amazon S3 Tables, storage optimized for tabular data. The S3 Tables integration with AWS Glue Data Catalog is in preview, allowing you to stream, query, and visualize data—including S3 Metadata tables—using AWS analytics services such as Amazon Athena, Amazon Data Firehose, Amazon EMR, Amazon QuickSight, and Amazon Redshift. Additionally, S3 Metadata integrates with Amazon Bedrock, allowing for the annotation of AI-generated videos with metadata that specifies its AI origin, creation timestamp, and the specific model used for its generation.
S3 Metadata is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), and US West (Oregon).
AWS Elemental MediaConnect now supports a set of diagnostic metrics designed to provide visibility into the quality of your video and audio streams. The new metrics detect black frames, frozen video, and audio silence, allowing you to quickly identify and address potential disruptions. This level of monitoring goes beyond traditional network performance indicators by analyzing the actual content, so you can maintain a high-quality experience for viewers.
With these metrics, you have the flexibility to configure custom thresholds tailored to your preferences. This enables you to swiftly identify and address any interruptions to content delivery.
To learn more about enabling the content quality metrics, visit the AWS Elemental MediaConnect monitoring documentation page.
AWS Elemental MediaConnect is a reliable, secure, and flexible transport service for live video that enables broadcasters and content owners to build live video workflows and securely share live content with partners and customers. MediaConnect helps customers transport high-value live video streams into, through, and out of the AWS Cloud. MediaConnect can function as a standalone service or as part of a larger video workflow with other AWS Elemental Media Services, a family of services that form the foundation of cloud-based workflows to transport, transcode, package, and deliver video.
Visit the AWS Region Table for a full list of AWS Regions where MediaConnect is available. To learn more about MediaConnect, please visit here.
Amazon Web Services (AWS) is announcing the general availability of AWS Managed Notifications, a new feature of AWS User Notifications that enhances how customers receive and manage AWS Health notifications. This feature allows you to view and modify default AWS Health notifications in the Console Notifications Center, alongside your custom notifications such as CloudWatch alarms.
A dedicated user interface is now available to manage notification subscriptions, including the ability to unsubscribe the primary or alternate contact emails from specific notification categories like ‘Operational events’. You can easily subscribe to Health Notifications through additional delivery channels. Supported channels include push notifications to the AWS Console Mobile App, AWS Chatbot (for Slack and Microsoft Teams integrations), and email.
Configuring and viewing notifications in the Console Notifications Center is offered at no additional cost.
Today, Amazon Web Services (AWS) is announcing the general availability of AWS Managed Notifications in the AWS Console Mobile Application. You can now get push notifications for default AWS Health notifications and view them in the AWS Console Mobile Application’s notification inbox, alongside your user-configured notifications such as CloudWatch alarms.
To get started, visit the AWS Console Notifications Center and select the AWS managed notifications subscriptions option in the navigation panel. Next, select Manage subscriptions for the specific notifications you’d like to receive and click Add delivery channels. Finally, in the Add delivery channels modal’s AWS Console Mobile App section, select your device and click Add delivery channels.
Configuring and viewing notifications in the AWS Console Notifications Center and the AWS Console Mobile Application are offered at no additional cost.
The AWS Console Mobile App lets you stay informed and connected with your AWS resources while on-the-go. Visit the AWS Console Mobile Application product page for more information. For more information about AWS User Notifications, visit the product page.
Amazon Elastic Kubernetes Service (Amazon EKS) now offers new update strategies for managed node groups, giving you control over how Amazon EC2 instances in your clusters are updated with new configurations or for new Kubernetes versions. This feature provides flexibility to make changes to your Amazon EKS cluster nodes in a way that best suits your use case, while reducing operational overhead and compute costs.
Amazon EKS managed node group update strategies let you choose between the current EKS managed node group update behavior, Default, and a new strategy, Minimal Capacity, that attempts to update the managed node group by launching fewer new EC2 instances compared to the Default strategy. Minimal Capacity is useful for managed node groups that have been scaled to zero, are using EC2 instances in high demand, or instances with limited availability. This is especially beneficial, for example, if your managed node groups have GPU-accelerated instances or instances purchased using a capacity reservation like Reserved Instances. By default, both existing and new EKS managed node groups use the “Default” update strategy, which updates managed node groups in the same way as before this launch.
EKS managed node group update strategies is available today at no additional cost in all AWS Regions, except AWS GovCloud (US) and China Regions.
AWS now supports Zone Groups for Availability Zones in all AWS Regions, making it easier for you to differentiate groups of Local Zones and Availability Zones.
Zone Groups were initially launched to help you identify related groups of Local Zones that reside in the same geography. For example, the two interconnected Local Zones in Los Angeles (us-west-2-lax-1a and us-west-2-lax-1b) make up the us-west-2-lax-1 Zone Group. These Zone Groups are used for opting in to the AWS Local Zones.
You can now find the Zone Group for Availability Zones in all Regions in the DescribeAvailabilityZones API. For example, the Availability Zones in the US West (Oregon) Region make up the us-west-2-zg-1 Zone Group (returned in the GroupName field), where us-west-2 indicates the Region and zg-1 indicates it is the group of AZs in the Region. This new identifier (such as us-west-2-zg-1) replaces the previous naming (such as us-west-2).
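As a sketch, the new field can be read from the API response like so. The response dict below is a trimmed, illustrative sample of the DescribeAvailabilityZones shape; in practice it would come from boto3's ec2.describe_availability_zones():

```python
sample_response = {
    "AvailabilityZones": [
        {"ZoneName": "us-west-2a", "ZoneType": "availability-zone",
         "GroupName": "us-west-2-zg-1", "RegionName": "us-west-2"},
        {"ZoneName": "us-west-2-lax-1a", "ZoneType": "local-zone",
         "GroupName": "us-west-2-lax-1", "RegionName": "us-west-2"},
    ]
}

def zone_groups(response):
    """Map each zone group name to the zones it contains."""
    groups = {}
    for az in response["AvailabilityZones"]:
        groups.setdefault(az["GroupName"], []).append(az["ZoneName"])
    return groups

print(zone_groups(sample_response))
```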
Google Kubernetes Engine (GKE) provides users with a lot of options when it comes to configuring their cluster networks. But with today’s highly dynamic environments, GKE platform operators tell us that they want more flexibility when it comes to changing up their configurations. To help, today we are excited to announce a set of features and capabilities designed to make GKE cluster and control-plane networking more flexible and easier to configure.
Specifically, we’ve decoupled GKE control-plane access from node-pool IP configuration, providing you with granular control over each aspect. Furthermore, we’ve introduced enhancements to each sub-component, including:
Cluster control-plane access
Added a DNS-based approach to accessing the control plane. In addition, you can now enable or disable IP-based or DNS-based access to control-plane endpoints at any time.
Each node-pool now has its own configuration, and you can now detach or attach a public IP for each node-pool independently at any time during the node-pool’s lifecycle.
You can now change a cluster’s default configuration for attaching a public IP to newly provisioned node pools at any time. This configuration change doesn’t require you to re-create your cluster.
Regardless of how you configure a cluster’s control-plane access, or attach and detach a public IP from a node pool, traffic between the nodes and the cluster’s control plane always remains private.
With these new changes, going forward:
GKE platform admins and operators can now easily switch between less restrictive networking configurations (e.g., control plane and/or nodes accessible from the internet) and the most restrictive configurations, where only authorized users can access the control plane, and nodes are not exposed to the internet. The decision to make a cluster public or private is no longer immutable, giving customers more flexibility without having to make upfront decisions.
There are more ways to connect to the GKE control plane. In addition to IP-based access, we now introduce DNS-based access to the control plane. You can use IAM and authentication-based policies to add policy-based, dynamic security to access the GKE control plane.
Previous challenges
Due to the complexity and variety of customers’ workloads and use cases, it is important to provide a simple and flexible way for customers to configure and operate connectivity to the GKE control plane and GKE nodes.
Control-plane connectivity and node-pool configuration are a key part of configuring GKE. We’ve continuously enhanced GKE’s networking capabilities to address customer concerns, providing more options for secure and flexible connectivity, including capabilities such as private clusters, VPC Peering-based connectivity, Private Service Connect-based connectivity, and private/public node pools.
While there have been a lot of improvements in configuration, usability and secure connectivity, there were still certain configuration challenges when it comes to complexity, usability and scale, such as:
Inflexible GKE control-plane and node access configuration: GKE customers had to make an upfront, one-way decision whether to create a private or public cluster during cluster creation. This configuration could not be changed unless the cluster was re-created.
The node-pool network IP type configuration could not be changed once a cluster was created.
Confusing terminology: terms such as “public” and “private” clusters made it unclear whether a configuration applied to control-plane access or to node-pool networking.
Benefits of the new features
With these changes to GKE networking, we hope you will see benefits in the following areas.
Flexibility:
Clusters now have unified and flexible configuration. Clusters with or without external endpoints all share the same architecture and support the same functionality. You can secure access to clusters based on controls and best practices that meet your needs. All communication between the nodes in your cluster and the control plane uses a private internal IP address.
You can change the control plane access and cluster node configuration settings at any time without having to re-create the cluster.
Security:
DNS-based endpoints with VPC Service Controls provide a multi-layer security model that protects your cluster against unauthorized networks as well as from unauthorized identities accessing the control plane. VPC Service Controls integrate with Cloud Audit Logs to monitor access to the control plane.
Private nodes and the workloads running on them are not directly accessible from the public internet, significantly reducing the potential for external attacks targeting your workloads.
You can block control plane access from Google Cloud external IP addresses or from external IP addresses to fully isolate the cluster control plane and reduce exposure to potential security threats.
Compliance: If you work in an industry with strict data-access and storage regulations, private nodes help ensure that sensitive data remains within your private network.
Control: Private nodes give you granular control over how traffic flows in and out of your cluster. You can configure firewall rules and network policies to allow only authorized communication. If you operate across a multi-cloud environment, private nodes can help you establish secure and controlled communication between different environments.
Getting started
Accessing the cluster control plane
There are now several ways to access a cluster’s control plane: via traditional public or private IP-based endpoints, and the new DNS-based endpoint. Whereas IP-based endpoints entail tedious IP address configuration (including static authorized-network configuration, allowing private access from any region, etc.), DNS-based endpoints offer a simplified, IAM-policy-based, dynamic, flexible, and more secure way to access a cluster’s control plane.
With these changes, you can now configure the cluster’s control plane to be reachable by all three endpoints (DNS-based, public, or private IP-based) at the same time, or lock the cluster down to a single endpoint, in any permutation you would like. You can apply your desired configuration at cluster creation time or adjust it later.
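As a sketch, toggling these endpoints on an existing cluster might look like the following. The flag names reflect the current gcloud release and are worth verifying against `gcloud container clusters update --help`:

```shell
# Enable DNS-based access to the control plane on an existing cluster
# (cluster name and location are placeholders).
gcloud container clusters update my-cluster \
  --location=us-central1 \
  --enable-dns-access

# Optionally disable the IP-based endpoints to lock the cluster
# down to the DNS endpoint only.
gcloud container clusters update my-cluster \
  --location=us-central1 \
  --no-enable-ip-access
```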
Here’s how to configure access for GKE node-pools.
GKE Standard mode: In GKE Standard mode of operation, a private IP is always attached to every node. This private IP is used for private connectivity to the cluster’s control plane.
You can attach or detach a public IP for all nodes in a node-pool at node-pool creation time. This configuration can be performed on each node-pool independently.
Each cluster has a default behavior setting that is used at node-pool creation time if the configuration is not explicitly set for that node-pool.
Note: Mutating a cluster’s default state does not change behavior of existing node pools. The new state is used only when a new node-pool is being created.
GKE Autopilot mode of operation: Whether workloads run on nodes with or without a public IP is determined by the cluster’s default behavior. You can override the cluster’s default behavior for each workload independently by adding the following nodeSelector to your Pod specification:
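The snippet below sketches the shape of such a selector. The label key shown (`cloud.google.com/private-node`) is the one documented for GKE private nodes, but treat it as an assumption and verify it against the GKE Autopilot documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-workload
spec:
  nodeSelector:
    # Assumed label key; verify against the GKE Autopilot documentation.
    cloud.google.com/private-node: "true"
  containers:
  - name: app
    image: nginx
```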
However, changing a cluster’s default behavior causes all workloads for which the behavior hasn’t been explicitly set to be rescheduled onto nodes that match the new default.
Conclusion
Given the complexity and variety of workloads that run on GKE, it’s important to have a simple and flexible way to configure and operate the connectivity to the GKE control plane and nodes. We hope these enhancements to GKE control-plane connectivity and node-pool configuration will bring new levels of flexibility and simplicity to GKE operations. For further details and documentation, please see:
More and more customers deploy their workloads on Google Cloud. But what if your workloads are sitting in another cloud? Planning, designing, and implementing a migration of your workloads, data, and processes is not an easy task. It gets even harder if you have to meet requirements that have an impact on the migration, such as avoiding downtime (also known as a zero-downtime migration). Moreover, some migrations require a certain amount of refactoring, for example, adapting your workloads to a new environment. This opens up a series of challenges, especially if you’re dealing with third-party or legacy software. You might also need to adapt your deployment and operational processes to work with your new environment.
And what if you don’t want to migrate all your workloads? Even if you’re not moving everything to Google Cloud, adopting a multicloud approach is still a migration. Many organizations choose to keep some workloads in their current cloud provider while moving others to Google Cloud.
Although managing workloads across multiple clouds has its own challenges, particularly when it comes to workload distribution and inter-cloud connectivity, a well-executed multicloud strategy lets you maintain flexibility, avoid vendor lock-in, and improve system resilience.
To help you in your migration journey, we published a series of reference guides about migrating from Amazon Web Services (AWS) to Google Cloud. This series aims to help you design, plan, and implement a migration process from AWS to Google Cloud. It can also help decision makers who are evaluating migration opportunities and want to explore what a migration might look like. The series includes guides that cover specific migration journeys.
These guides follow the phases of the Google Cloud migration framework (assess, plan, migrate, optimize) in the context of specific AWS to Google Cloud migration use cases.
This phased approach helps you avoid risky, big-bang migrations while working through each migration plan task. For details about completing each task of the migration plan, see the AWS to Google Cloud migration guides.
Ready to learn more? Learn more about migrating to Google Cloud and discover how Google Cloud Consulting can help you learn, build, operate and succeed.
Organizations are increasingly using Confidential Computing to help protect their sensitive data in use as part of their data protection efforts. Today, we are excited to highlight new Confidential Computing capabilities that make it easier for organizations of all sizes to adopt this important privacy-preserving technology.
1. Confidential GKE Nodes on the general-purpose C3D machine series for GKE Standard mode, generally available
Previously, Confidential GKE Nodes were only available on two machine series powered by the 2nd and 3rd Gen AMD EPYC™ processors: the general-purpose N2D machine series and the compute-optimized C2D machine series. Today, Confidential GKE Nodes are also generally available on the newer, more performant C3D machine series with AMD SEV in GKE Standard mode.
The general-purpose C3D machine series is powered by 4th Gen AMD EPYC™ (Genoa) processors to deliver optimal, reliable, and consistent performance. Customers often use Confidential GKE Nodes to address potential concerns about cloud provider risk, especially since no code changes are required to enable it.
2. Confidential GKE Nodes on GKE Autopilot mode, generally available
Google Kubernetes Engine (GKE) offers two modes of operation: Standard and Autopilot. In Standard mode, you manage the underlying infrastructure, including configuring the individual nodes. In Autopilot mode, GKE manages the underlying infrastructure such as node configuration, autoscaling, auto-upgrades, baseline security configurations, and baseline networking configuration.
Previously, Confidential GKE Nodes were only offered on GKE Standard mode. Today, Confidential GKE Nodes are generally available on GKE Autopilot mode with the general purpose N2D machine series running with AMD Secure Encryption Virtualization (AMD SEV). This means that you can now use Confidential GKE Nodes to protect your data in use without having to manage the underlying infrastructure.
Confidential GKE Nodes can be enabled on new GKE Autopilot clusters with no code changes. Simply add the --enable-confidential-nodes flag when creating a new cluster. Additional pricing applies, and this new offering is available in all regions that offer the N2D machine series. Go here to get started today.
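For example, enabling the flag at Autopilot cluster creation might look like this (cluster name and region are placeholders):

```shell
# Create a new Autopilot cluster with Confidential GKE Nodes enabled.
gcloud container clusters create-auto my-autopilot-cluster \
  --region=us-central1 \
  --enable-confidential-nodes
```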
3. Confidential Space with Intel TDX-based Confidential VMs, in preview
Confidential Space allows multiple parties to securely collaborate on computations using their combined data without revealing their individual datasets to each other or to the operator enabling this collaboration. This is achieved by isolating data within a Trusted Execution Environment (TEE).
We are seeing growing adoption of, and need for, these capabilities in financial services, Web3, and other industries that put sensitive data to use in a private and compliant manner.
Confidential Space is built on Confidential VMs. Previously, Confidential Space was only available on Confidential VMs with AMD Secure Encryption Virtualization (AMD SEV) enabled. Today, Confidential Space is also available on Confidential VMs with Intel Trust Domain Extensions (Intel TDX) enabled in preview.
Confidential Space with Intel TDX enabled offers data confidentiality, data integrity, and hardware-rooted attestation, further enhancing security. Confidential Space with Intel TDX runs on the general purpose C3 machine series, which are powered by 4th Gen Intel Xeon Scalable CPUs.
These performant C3 VMs also have Intel Advanced Matrix Extensions (Intel AMX), a built-in accelerator that improves the performance of deep-learning training and inference on the CPU, enabled by default. By supporting an additional Confidential Computing type, Confidential Space gives users greater flexibility in selecting the right CPU platform based on performance, cost, and security requirements. Learn more about Confidential Space or check out this new YouTube video about Intel TDX.
4. Confidential VMs with NVIDIA H100 GPUs, in preview
We expanded our capabilities for secure computation last year when we unveiled Confidential VMs on the accelerator-optimized A3 machine series with NVIDIA H100 GPUs. This offering extends hardware-based data protection from the CPU to GPUs, helping to ensure that the confidentiality and integrity of artificial intelligence (AI), machine learning (ML), and scientific simulation workloads that leverage GPUs are protected while data is in use.
Today, these confidential GPUs are available in preview. Confidential VMs on the A3 machine series protect data and code in use, meaning that sensitive training data or data labels, proprietary models or model weights, and top secret queries remain protected even during compute-intensive operations such as training, fine-tuning, or serving.
This groundbreaking technology combines the power of Confidential Computing and accelerated computing to enable customers to harness the potential of AI while helping to maintain high levels of data security and IP protection, which can open new possibilities for innovation in regulated industries and collaborative AI development.
You can sign up here to try Confidential VMs with NVIDIA H100 GPUs. To learn more, check out our previous announcements on this offering here and here.
What’s coming in 2025
Google Cloud is committed to expanding Confidential Computing to more products and services because we want customers to have easy access to the latest in security innovation. Whether that’s adding Confidential Computing support to newer hardware or on accelerators or to services like GKE Autopilot, we aim to provide our customers with a comprehensive set of Confidential Computing solutions.
Confidential Computing is an essential technology for protecting sensitive data in the cloud, and we look forward to innovating with you in this space. You can explore the Confidential Computing products here.
Amazon Web Services (AWS) announces expansion in the Kingdom of Saudi Arabia by launching a new Amazon CloudFront edge location in Jeddah. The new AWS edge location brings the full suite of benefits provided by Amazon CloudFront, a secure, highly distributed, and scalable content delivery network (CDN) that delivers static and dynamic content, APIs, and live and on-demand video with low latency and high performance.
All Amazon CloudFront edge locations are protected against infrastructure-level DDoS threats with AWS Shield Standard, which uses always-on network flow monitoring and in-line mitigation to minimize application latency and downtime. You can also add further layers of security for applications, protecting them against common web exploits and bot attacks, by enabling AWS Web Application Firewall (AWS WAF).
Traffic delivered from this edge location is included within the Middle East region pricing. To learn more about AWS edge locations, see CloudFront edge locations.
Amazon Bedrock now offers multimodal support for Cohere Embed 3 Multilingual and Embed 3 English, foundation models that generate embeddings from both text and images. This powerful addition to Amazon Bedrock can enable enterprises to unlock significant value from their vast amounts of data, including visual content. With this new capability, businesses can build systems that accurately and quickly search important multimodal assets such as complex reports, product catalogs, and design files.
According to Cohere, Embed 3 delivers exceptional performance on various retrieval tasks and is engineered to handle diverse data types. Supporting search functionality for both text and images, and in over 100 languages (Embed 3 Multilingual), it is well-suited for global applications. These models are designed to process and interpret varied datasets, effectively managing inconsistencies typical in real-world scenarios. This versatility makes Embed 3 particularly valuable for enterprises seeking to enhance their search and retrieval systems across different data formats. By leveraging this technology, businesses can develop more comprehensive search applications, leading to improved user experiences and increased efficiency across various use cases.
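As a minimal sketch of what such a request might look like, the helper below builds the JSON body for a text or image embedding call. The field names follow Cohere's Embed API, and the model ID is an assumption; check the Bedrock model catalog for the exact identifier available in your Region. The actual InvokeModel call (commented out) requires AWS credentials and the boto3 SDK:

```python
import base64
import json

# Assumed model identifier; verify against the Bedrock model catalog.
MODEL_ID = "cohere.embed-multilingual-v3"

def build_embed_request(texts=None, image_bytes=None):
    """Build a JSON body for a Cohere Embed 3 text or image embedding call."""
    body = {"input_type": "search_document"}
    if texts:
        body["texts"] = texts
    if image_bytes:
        # Images are passed as base64-encoded data URIs.
        encoded = base64.b64encode(image_bytes).decode("utf-8")
        body["images"] = [f"data:image/png;base64,{encoded}"]
        body["input_type"] = "image"
    return json.dumps(body)

# The actual call (requires AWS credentials):
# import boto3
# bedrock = boto3.client("bedrock-runtime")
# response = bedrock.invoke_model(
#     modelId=MODEL_ID, body=build_embed_request(texts=["hello"]))
```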
Cohere Embed 3 with multimodal support is now available in Amazon Bedrock and is supported in 12 AWS Regions. For more information on supported Regions, visit the Amazon Bedrock Model Support by Regions guide. For more details about Cohere Embed 3 and its capabilities, visit the Cohere product page. To get started with Cohere Embed 3 in Amazon Bedrock, visit the Amazon Bedrock console.
Kubernetes version 1.32 introduced several new features and bug fixes, and AWS is excited to announce that you can now use Amazon Elastic Kubernetes Service (EKS) and Amazon EKS Distro to run Kubernetes version 1.32. Starting today, you can create new EKS clusters using version 1.32 and upgrade existing clusters to version 1.32 using the EKS console, the eksctl command line interface, or through an infrastructure-as-code tool.
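For example, upgrading an existing cluster's control plane to version 1.32 might look like the following (cluster name is a placeholder):

```shell
# Upgrade the cluster control plane with the AWS CLI...
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.32

# ...or with eksctl.
eksctl upgrade cluster --name my-cluster --version 1.32 --approve
```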
Kubernetes version 1.32 introduces several improvements including stable support for custom resource field selectors and auto removal of persistent volume claims created by stateful sets. This release removes v1beta3 API version of FlowSchema and PriorityLevelConfiguration. To learn more about the changes in Kubernetes version 1.32, see our documentation and the Kubernetes project release notes.
EKS now supports Kubernetes version 1.32 in all the AWS Regions where EKS is available, including the AWS GovCloud (US) Regions.
You can learn more about the Kubernetes versions available on EKS and instructions to update your cluster to version 1.32 by visiting EKS documentation. You can use EKS cluster insights to check whether there are any issues that could impact your Kubernetes cluster upgrades. EKS Distro builds of Kubernetes version 1.32 are available through ECR Public Gallery and GitHub. Learn more about the EKS version lifecycle policies in the documentation.
Amazon Q Business, the most capable generative AI-powered assistant for finding information, gaining insight, and taking action at work, now offers capabilities to answer questions and extract insights from images uploaded in the chat.
This new feature allows users to upload images directly to the Amazon Q Business chat and ask questions related to the content of those images. Users can seamlessly interact with visual content, enabling them to use image files as a data source for a richer image analysis experience. For instance, a user can upload an invoice image and promptly ask Amazon Q Business to categorize the expenses. Similarly, a business user can share a technical architecture diagram to request an explanation of it or to ask other specific questions related to its components and design.
The new visual analysis feature is available in all AWS Regions where Amazon Q Business is available. To learn more, visit the Amazon Q Business product page.
To get started with this new feature, visit the Amazon Q Business console or refer to our documentation for integration guidelines.
Amazon Redshift is announcing the general availability of Multi-AZ deployments for RA3 clusters in the Asia Pacific (Thailand) and Mexico (Central) AWS regions. Redshift Multi-AZ deployments support running your data warehouse in multiple AWS Availability Zones (AZ) simultaneously and continue operating in unforeseen failure scenarios. A Multi-AZ deployment raises the Amazon Redshift Service Level Agreement (SLA) to 99.99% and delivers a highly available data warehouse for the most demanding mission-critical workloads.
Enterprise customers with mission critical workloads require a data warehouse with fast failover times and simplified operations that minimizes impact to applications. Redshift Multi-AZ deployment helps meet these demands by reducing recovery time and automatically recovering in another AZ during an unlikely event such as an AZ failure. A Redshift Multi-AZ data warehouse also maximizes query processing throughput by operating in multiple AZs and using compute resources from both AZs to process read and write queries.
Amazon Redshift Multi-AZ is now generally available for RA3 clusters through the Redshift Console, API and CLI. For all regions where Multi-AZ is available, see the supported AWS regions.
Amazon Aurora PostgreSQL Limitless Database is now available with PostgreSQL version 16.6 compatibility. This release contains product improvements and bug fixes made by the PostgreSQL community, along with Aurora Limitless-specific security and feature improvements such as support for GIN operator classes with B-tree behavior (btree_gin) and support for DISCARD.
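For example, btree_gin makes it possible to include scalar columns in a GIN index alongside types that GIN natively supports. A minimal sketch in standard PostgreSQL:

```sql
-- Enable the extension, which provides B-tree-style GIN operator
-- classes for common scalar types.
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE TABLE events (
    id        bigint,
    category  text,
    tags      jsonb
);

-- Without btree_gin, the scalar "category" column could not
-- participate in this composite GIN index.
CREATE INDEX events_tags_category_idx
    ON events USING gin (tags, category);
```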
Aurora PostgreSQL Limitless Database makes it easy for you to scale your relational database workloads by providing a serverless endpoint that automatically distributes data and queries across multiple Amazon Aurora Serverless instances while maintaining the transactional consistency of a single database. Aurora PostgreSQL Limitless Database offers capabilities such as distributed query planning and transaction management, removing the need for you to create custom solutions or manage multiple databases to scale. As your workloads increase, Aurora PostgreSQL Limitless Database adds additional compute resources while staying within your specified budget, so there is no need to provision for peak, and compute automatically scales down when demand is low.
Aurora PostgreSQL Limitless Database is available in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) I7ie instances are available in the AWS Europe (Frankfurt, London) and Asia Pacific (Tokyo) Regions. Designed for large, storage-I/O-intensive workloads, I7ie instances are powered by 5th-generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.2 GHz, offering up to 40% better compute performance and 20% better price performance over existing I3en instances. I7ie instances offer up to 120 TB of local NVMe storage, the highest density in the cloud for storage-optimized instances, and up to twice as many vCPUs and memory compared to prior-generation instances. Powered by 3rd-generation AWS Nitro SSDs, I7ie instances deliver up to 65% better real-time storage performance, up to 50% lower storage I/O latency, and 65% lower storage I/O latency variability compared to I3en instances.
I7ie are high-density storage-optimized instances, ideal for workloads that require fast local storage with high random read/write performance and very low, consistent latency when accessing large data sets. These instances are available in 9 different virtual sizes and deliver up to 100 Gbps of network bandwidth and 60 Gbps of bandwidth for Amazon Elastic Block Store (EBS).
AWS Transfer Family now allows you to customize the directories for your Applicability Statement 2 (AS2) files, including the inbound AS2 messages, message disposition notifications (MDN), and other metadata files. This enables you to separate your AS2 messages from the MDN files and status files generated by the service, and automate downstream processing of the messages received from your trading partners.
AS2 is a business-to-business messaging protocol used to transfer Electronic Data Interchange (EDI) documents across various industries, including healthcare, retail, supply chain and logistics. You can now specify separate directory locations to store your inbound AS2 messages, the associated MDN files, and the JSON status files generated by the service. This option overrides the service-default directory structure for storing these file types, enabling easier automation of downstream processing for your AS2 messages using other AWS services. For example, you can directly store inbound AS2 messages in the input directory used for AWS B2B Data Interchange, facilitating automatic conversion of X12 EDI contents into common data representations such as JSON or XML.
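As a sketch, the custom directories might be specified when creating an AS2 agreement. The parameter shape below is our reading of the Transfer Family API and should be checked against the CreateAgreement reference; all IDs and paths are placeholders:

```shell
aws transfer create-agreement \
  --server-id s-1234567890abcdef0 \
  --local-profile-id p-11112222333344445 \
  --partner-profile-id p-55556666777788889 \
  --access-role arn:aws:iam::111122223333:role/as2-access-role \
  --custom-directories '{
      "PayloadFilesDirectory": "/amzn-s3-demo-bucket/as2/payloads",
      "MdnFilesDirectory": "/amzn-s3-demo-bucket/as2/mdn",
      "StatusFilesDirectory": "/amzn-s3-demo-bucket/as2/status",
      "FailedFilesDirectory": "/amzn-s3-demo-bucket/as2/failed",
      "TemporaryFilesDirectory": "/amzn-s3-demo-bucket/as2/tmp"
  }'
```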