Microsoft Binary Technologies and Debugging

Midway upon the journey of our life I found myself within a forest dark, For the straightforward pathway had been lost.


In the world of debugging, it is easy to get lost without sufficient knowledge of the underlying mechanisms. Well-known examples include DLL (Dynamic-Link Library), FPO (Frame-Pointer Omission), LTCG (Link-Time Code Generation), PE/COFF and SEH (Structured Exception Handling), but Microsoft uses many other technologies as well:

  • BBT (Basic Block Tools) is a suite of optimization tools designed to help reduce the working-set requirements for a Win32 application by applying advanced static analysis and code layout heuristics, and integrating profile data gathered from monitoring the program execution flow. In addition, BBT rearranges static data and resources sections for additional paging reduction.
  • Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions. The Detours package also contains utilities to attach arbitrary DLLs and data segments (called payloads) to any Win32 binary.
  • Vulcan is a single infrastructure for building a wide range of custom tools for program analysis, optimization, and testing. Through the Vulcan API, developers and testers can build custom tools with very few lines of code for basic block counting, memory tracing, memory allocation, coverage, failure insertion, optimization, compiler auditing etc. Vulcan scales to large commercial applications and has been used to improve the performance and reliability of products across Microsoft.


The following disassembly is directly related to Detours. MOV EDI, EDI is a 2-byte placeholder that can be overwritten with a short JMP instruction, and the five NOP instructions in front of the function entry provide 5 bytes for a near JMP instruction (x86). In short, many Windows system DLLs are built with Detours-style patching in mind. The Visual C++ compiler has a command-line option called /hotpatch (Create Hotpatchable Image) that guarantees the 2-byte first instruction; the padding in front of the function comes from the linker option /FUNCTIONPADMIN.

7541b4c1 0400            add     al,0
7541b4c3 90              nop
7541b4c4 90              nop
7541b4c5 90              nop
7541b4c6 90              nop
7541b4c7 90              nop
7541b4c8 8bff            mov     edi,edi
7541b4ca 55              push    ebp
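To make the byte counts concrete, here is a minimal Python sketch that simulates such a patch on the bytes from the listing above. It writes a 5-byte `E9 rel32` jump into the padding and then a 2-byte `EB F9` jump over `mov edi,edi`. The function name `apply_hotpatch` and the target address `0x10001000` are hypothetical, and the code only edits a local buffer; a real patcher would also have to change page protection and write the 2-byte jump atomically.

```python
import struct

PAD = b"\x90" * 5          # five NOPs in front of the entry point
MOV_EDI_EDI = b"\x8b\xff"  # two-byte no-op at the entry point


def apply_hotpatch(image: bytearray, base: int, entry: int, target: int) -> None:
    """Redirect the function at `entry` to `target` inside a copy of the code."""
    off = entry - base
    if off < 5 or bytes(image[off - 5:off]) != PAD:
        raise ValueError("no 5-byte padding before the entry point")
    if bytes(image[off:off + 2]) != MOV_EDI_EDI:
        raise ValueError("entry point does not start with mov edi,edi")
    # Step 1: E9 rel32 in the padding. rel32 is relative to the end of the
    # jump, which is exactly the entry point.
    rel32 = (target - entry) & 0xFFFFFFFF
    image[off - 5:off] = b"\xe9" + struct.pack("<I", rel32)
    # Step 2: EB rel8 over mov edi,edi. A displacement of -7 (0xF9) from
    # entry+2 lands on the E9 jump at entry-5.
    image[off:off + 2] = b"\xeb\xf9"


# Bytes copied from the listing: add al,0 / 5 x nop / mov edi,edi / push ebp
code = bytearray.fromhex("0400" "9090909090" "8bff" "55")
apply_hotpatch(code, base=0x7541B4C1, entry=0x7541B4C8, target=0x10001000)
print(code.hex())
```

Writing the long jump first means the function stays callable at every step: until the 2-byte jump is in place, callers still execute the original body.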

NTDLL does not use the hot patch approach for its system call stubs; the NOP instructions there are just padding to keep each entry aligned.

77236278 b80d010000      mov     eax,10Dh
7723627d ba0003fe7f      mov     edx,offset SharedUserData!SystemCallStub (7ffe0300)
77236282 ff12            call    dword ptr [edx]
77236284 c21400          ret     14h
77236287 90              nop
77236288 b80e010000      mov     eax,10Eh
7723628d ba0003fe7f      mov     edx,offset SharedUserData!SystemCallStub (7ffe0300)
77236292 ff12            call    dword ptr [edx]
77236294 c21800          ret     18h
77236297 90              nop
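The fixed layout of these stubs can be checked with a few lines of Python. This is a sketch that assumes exactly the 16-byte shape shown above (`mov eax`, `mov edx`, `call [edx]`, `ret imm16`, `nop`); the helper name `decode_stub` is made up for illustration, and other Windows builds may use a different stub layout.

```python
import struct


def decode_stub(stub: bytes):
    """Return (service_number, argument_bytes) for one 32-bit stub."""
    # b8 imm32: mov eax, <system service number>
    if stub[0] != 0xB8:
        raise ValueError("stub does not start with mov eax, imm32")
    service = struct.unpack_from("<I", stub, 1)[0]
    # c2 imm16 at offset 12: ret <bytes of stack arguments to pop>
    if stub[12] != 0xC2:
        raise ValueError("no ret imm16 at the expected offset")
    arg_bytes = struct.unpack_from("<H", stub, 13)[0]
    return service, arg_bytes


# Bytes copied from the two stubs in the listing above.
stub_a = bytes.fromhex("b80d010000" "ba0003fe7f" "ff12" "c21400" "90")
stub_b = bytes.fromhex("b80e010000" "ba0003fe7f" "ff12" "c21800" "90")

for addr, stub in ((0x77236278, stub_a), (0x77236288, stub_b)):
    service, arg_bytes = decode_stub(stub)
    print(hex(addr), hex(service), arg_bytes, len(stub))
```

Each stub is 15 bytes of code plus one NOP, which matches the 16-byte distance between the two entry addresses.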

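With the introduction of KERNELBASE, many functions exported by kernel32 were forwarded to it. In the session below, kernel32!SetErrorMode is only a jump through an import table entry, which is why .call rejects it; following the pointer leads to the actual code.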

 0:000>  .call kernel32!SetErrorMode(1) 
                                 ^ Symbol not a function in '.call kernel32!SetErrorMode(1)'
 0:000> u kernel32!SetErrorMode L1
75ac016d ff25b41da775    jmp     dword ptr [kernel32!_imp__SetErrorMode (75a71db4)]
 0:001> u poi(75a71db4) 
75417991 8bff            mov     edi,edi
75417993 55              push    ebp
75417994 8bec            mov     ebp,esp
75417996 51              push    ecx
75417997 56              push    esi
75417998 e836000000      call    KERNELBASE!GetErrorMode (754179d3)
7541799d 8bf0            mov     esi,eax
7541799f 8b4508          mov     eax,dword ptr [ebp+8]
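What `u poi(75a71db4)` does by hand can be reproduced from the instruction bytes. The sketch below decodes the 6-byte `FF 25` indirect jump and returns the address of the import table slot it reads. The function name `import_thunk_slot` is hypothetical, and the decoding assumes 32-bit x86, where the operand is an absolute address (on x64 the same opcode is RIP-relative).

```python
import struct


def import_thunk_slot(code: bytes) -> int:
    """Return the address read by a 32-bit `jmp dword ptr [addr]`."""
    if code[:2] != b"\xff\x25":
        raise ValueError("not an FF 25 indirect jump")
    # The four bytes after the opcode are the little-endian slot address.
    return struct.unpack_from("<I", code, 2)[0]


# Bytes of kernel32!SetErrorMode from the listing above.
slot = import_thunk_slot(bytes.fromhex("ff25b41da775"))
print(hex(slot))  # 0x75a71db4
```

The debugger then has to read the pointer stored in that slot, which is what `poi()` does, to find the function that really runs.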

Basic Block Tools

BBT merges duplicated blocks, rearranges basic blocks and does a lot of crazy things to the symbol files (PDB). Your call stack will look weird because functions might get merged and overlap, especially if C++ templates are used heavily. You can tell whether optimization was performed at the basic block level by examining the function body.

Frame-Pointer Omission

FPO was introduced with Windows NT 3.51, thanks to the 80386 making ESP available for indexing and thus allowing EBP to be used as a general-purpose register. However, FPO makes stack unwinding unreliable, which in turn makes debugging painful. You can tell whether FPO was used by examining the function prologue and epilogue.

FPO disabled:

55              push ebp
8B EC           mov  ebp, esp
  return TRUE;
B8 01 00 00 00  mov  eax, 1
5D              pop  ebp
C3              ret

FPO enabled:

  return TRUE;
B8 01 00 00 00  mov  eax, 1
C3              ret
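The prologue check can be automated as a rough heuristic. The following Python sketch looks for the `push ebp` / `mov ebp, esp` pair at the start of a function, optionally preceded by the `mov edi, edi` hotpatch placeholder. The helper name `sets_up_frame_pointer` is an assumption, and the check only knows the `8B EC` encoding used in the listings here; other compilers may emit `89 E5` for the same instruction.

```python
FRAME_SETUP = b"\x55\x8b\xec"  # push ebp / mov ebp, esp
HOTPATCH_NOP = b"\x8b\xff"     # mov edi, edi


def sets_up_frame_pointer(code: bytes) -> bool:
    """Heuristic: does the function start with a standard EBP frame?"""
    if code.startswith(HOTPATCH_NOP):
        code = code[len(HOTPATCH_NOP):]
    return code.startswith(FRAME_SETUP)


# Bytes copied from the two listings above.
fpo_disabled = bytes.fromhex("55 8B EC B8 01 00 00 00 5D C3")
fpo_enabled = bytes.fromhex("B8 01 00 00 00 C3")

print(sets_up_frame_pointer(fpo_disabled))  # True
print(sets_up_frame_pointer(fpo_enabled))   # False
```

A missing frame setup is a hint rather than proof, since a function may also establish its frame later or be a leaf that never needs one.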

FPO information is available in both public and private PDB files. WinDbg has a command, kv, that displays this information:

 0:000> kv
ChildEBP RetAddr  Args to Child              
002bfdac 75d9339a 7efde000 002bfdf8 76f39ed2 notepad!WinMainCRTStartup (FPO: [0,0,0])
002bfdb8 76f39ed2 7efde000 7b449f70 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
002bfdf8 76f39ea5 005b3689 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
002bfe10 00000000 005b3689 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
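When a stack is long, it can help to pull the FPO annotations out of saved kv output. The sketch below assumes the output format shown above, with a trailing `(FPO: [...])` on each frame; the names `FPO_RE` and `fpo_frames` are made up for illustration, and the three numbers in the brackets are passed over rather than interpreted.

```python
import re

# Matches "<symbol> (FPO: [<info>])" at the end of a kv frame line.
FPO_RE = re.compile(r"(\S+) \(FPO: \[([^\]]*)\]\)")


def fpo_frames(kv_output: str):
    """Return (symbol, uses_fpo) for every frame that has an FPO annotation."""
    frames = []
    for line in kv_output.splitlines():
        match = FPO_RE.search(line)
        if match:
            symbol, info = match.groups()
            frames.append((symbol, info != "Non-Fpo"))
    return frames


kv_text = """\
002bfdac 75d9339a 7efde000 002bfdf8 76f39ed2 notepad!WinMainCRTStartup (FPO: [0,0,0])
002bfdb8 76f39ed2 7efde000 7b449f70 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
002bfdf8 76f39ea5 005b3689 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
002bfe10 00000000 005b3689 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
"""

for symbol, uses_fpo in fpo_frames(kv_text):
    print(symbol, "FPO" if uses_fpo else "frame pointer")
```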

Link-time Code Generation

LTCG was introduced with the first version of Visual Studio .NET. It can be used with or without PGO (Profile Guided Optimization). If you have debugged an optimized C++ application, you already know that local variables and inlined functions can look very different from the source. With LTCG, even cross-module inlining is possible; in addition, calling conventions and parameters can be optimized. As with BBT, functions might get merged.

Profile Guided Optimization

PGO (a.k.a. POGO) performs many optimizations, such as inlining, virtual call speculation and conditional branch optimization. What's more, POGO can perform optimizations at the extended basic block level.

Incremental Linking

The Microsoft Incremental Linker has an option, /INCREMENTAL, that affects debugging (don't confuse it with an incremental compiler, which makes use of precompiled headers). In fact, native EnC (Edit and Continue) is built on top of incremental linking. Sometimes we get symbols like module!ILT+0(_main). The ILT (Incremental Link Table) serves the incremental linker by adding a layer of indirection, which provides the flexibility needed for binary patching. The bad news is that the incremental linker also has to generate correct symbols and patch them into the PDB, and the patching process doesn't discard unused symbols reliably. This is challenging for debugger authors, since the integrity of the symbols is not guaranteed by the MSPDB layer.
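The indirection is easy to follow by hand. On x86 an ILT entry is typically a 5-byte `E9 rel32` jump to the real function body, so the linker can move the body and fix up only this one thunk. The Python sketch below resolves such a thunk; the function name `resolve_ilt_thunk` and both addresses are hypothetical values chosen for illustration.

```python
import struct


def resolve_ilt_thunk(thunk_addr: int, thunk: bytes) -> int:
    """Return the address an `E9 rel32` thunk at `thunk_addr` jumps to."""
    if thunk[0] != 0xE9:
        raise ValueError("not an E9 rel32 jump")
    # rel32 is signed and relative to the end of the 5-byte instruction.
    rel32 = struct.unpack_from("<i", thunk, 1)[0]
    return (thunk_addr + 5 + rel32) & 0xFFFFFFFF


# Hypothetical thunk for module!ILT+0(_main) located at 0x00411000.
target = resolve_ilt_thunk(0x00411000, bytes.fromhex("e92b0a0000"))
print(hex(target))  # 0x411a30
```

When a symbol resolves to `module!ILT+n(...)`, the address in hand is the thunk, so a breakpoint or disassembly request may need this one extra hop to reach the function itself.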

Function Inlining

Function inlining means there is no actual call. The stepper and symbol binding components of a debugger might get confused.

Intrinsic Function

Intrinsic functions are a special kind of function generated by the compiler toolchain (instead of coming from libraries or your code).