Case Study: Pack Mismatch

Packing controls how data is arranged in memory: padding is inserted so that each data item starts at a memory offset that is a multiple of a fixed size (usually the word size, but it could be a byte, a DWORD, or any other size). The purpose of packing is to match the memory layout of data to the microprocessor’s addressing model, which can improve performance (some microprocessor architectures do not allow misaligned data at all). Packing can also be used to eliminate false sharing of cache lines by forcing an individual data structure to occupy an entire cache line. This is very important in high-performance and parallel computing, and it is becoming more relevant to everybody else because of the industry trend towards multi-core processors.
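To make the effect of padding concrete, here is a minimal sketch that declares the same three members twice, once with the compiler's default alignment and once with a packing size of 1. The type names DefaultRecord and PackedRecord are made up for illustration, and the sizes in the static_asserts assume a mainstream compiler (MSVC, GCC or Clang) on a common desktop target.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // fixed-width integer types

// Hypothetical record laid out with the compiler's default alignment rules.
struct DefaultRecord {
    char          tag;    // offset 0, followed by 3 bytes of padding
    std::uint32_t value;  // offset 4, aligned to a multiple of 4
    char          flag;   // offset 8, followed by 3 bytes of tail padding
};

// Same members with a packing size of 1: no padding is inserted.
#pragma pack(push, 1)
struct PackedRecord {
    char          tag;    // offset 0
    std::uint32_t value;  // offset 1, misaligned
    char          flag;   // offset 5
};
#pragma pack(pop)

static_assert(offsetof(DefaultRecord, value) == 4, "value is aligned to 4");
static_assert(sizeof(DefaultRecord) == 12, "6 bytes of data plus 6 bytes of padding");
static_assert(offsetof(PackedRecord, value) == 1, "value directly follows tag");
static_assert(sizeof(PackedRecord) == 6, "no padding at all");
```

The two types hold the same data, yet any code that mixes them up will disagree about both the size of the object and the position of every member after the first.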

Note: For more information about packing and alignment, see [MSDN-PACK]. For more on false sharing, see [WVC07] Chapter 8.2 and [PPP09] Chapter 1.

In this case study, a heap bug is caused by incorrect use of packing. The debugging technique demonstrated here relies on the basic knowledge of the Windows heap described in the last post and on some understanding of data alignment in 64-bit Windows.
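The false sharing remedy mentioned above can be sketched in a few lines. This is only an illustration: the 64-byte cache line size is an assumption (typical for x86-64 but not guaranteed), and the names kCacheLineSize, PlainCounter, PaddedCounter and g_counters are hypothetical.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines. Check the target CPU before relying on this.
constexpr std::size_t kCacheLineSize = 64;

// Eight of these fit into one cache line, so counters that belong to
// different threads would end up sharing a line.
struct PlainCounter {
    long long value;
};

// alignas raises the alignment to a full cache line; the compiler pads the
// type so that each instance occupies a line of its own.
struct alignas(kCacheLineSize) PaddedCounter {
    long long value;
};

static_assert(sizeof(PlainCounter) == 8, "no padding in the plain version");
static_assert(alignof(PaddedCounter) == kCacheLineSize, "aligned to a cache line");
static_assert(sizeof(PaddedCounter) == kCacheLineSize, "padded to a cache line");

// One counter per worker thread; neighbouring elements never share a line.
PaddedCounter g_counters[4];
```

The cost is memory: each counter now takes 64 bytes instead of 8, which is usually an acceptable trade for data that several cores update frequently.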

The Bug:

MyProgram is a 64-bit native application running under Windows 7, and it crashes consistently when it is launched. From the Event Viewer (Windows Logs/Application), the following error is logged:

Faulting application name: MyProgram.exe, version: 0.0.0.0, time stamp: 0x4b9d52d7
Faulting module name: ntdll.dll, version: 6.1.7600.16385, time stamp: 0x4a5be02b
Exception code: 0xc0000374
Fault offset: 0x00000000000c6cd2
Faulting process id: 0x1074
Faulting application start time: 0x01cac3bc0628969b
Faulting application path: d:\MyProgram\x64\Release\MyProgram.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 4485473b-2faf-11df-8d9f-00151736c5a7

The exception code 0xc0000374 means STATUS_HEAP_CORRUPTION. At this point it is clear that this is a heap corruption bug, but more debugging is required to find out what actually corrupted the heap.

The Debugging:

As mentioned in the last post, a debugger can change the behavior of a heap-related bug. Therefore, the -hd option is used during debugging so that the heap layout remains exactly the same as when MyProgram.exe runs without a debugger. In addition, the symbol path is set to the Microsoft public symbol server.

Note: -hd prevents the debugger from using the special (debug) heap, which has a different layout from the default heap used by Windows.

The debugger outputs the following information when the Windows Heap Manager detects the heap problem:

Critical error detected c0000374
(1364.dd4): Break instruction exception - code 80000003 (first chance)
ntdll!RtlUnhandledExceptionFilter+0x29f:
00000000`77bc6c9f cc int 3

With symbols loaded, the !heap command shows more information (only the relevant output is shown):

0:003> !heap
Details:

Error address: 00000000048b8410
Heap handle: 00000000048b0000
Error type heap_failure_buffer_overrun (6)
Last known valid blocks: before - 00000000048b8320, after - 00000000048b8fc0
Stack trace:
0000000077bca4a8: ntdll!RtlpAnalyzeHeapFailure+0x00000000000003a8
0000000077b5ea71: ntdll!RtlpAllocateHeap+0x0000000000002176
0000000077b529ac: ntdll!RtlAllocateHeap+0x000000000000016c
0000000071a6c887: msvcr90!malloc+0x000000000000005b
0000000071a6c987: msvcr90!operator new+0x000000000000001f
000007fef2fdbf55: MyProvider!CGenericThreadPool::Initialize+0x0000000000000135
000007fef2fd74c5: MyProvider!SpecialThreadPool::SpecialThreadPool+0x0000000000000075
000007fef2fcb5cc: MyProvider!CPolicyProvider::CPolicyProvider+0x00000000000000ac
000007fef2fd3e84: MyProvider!MyProviderInitialize+0x0000000000000034
000007fef308e043: MyProgram!Initialize+0x00000000000002a3

As the output indicates, the heap issue is a buffer overrun, and it is reported at address 00000000048b8410. The stack trace shows that the Windows Heap Manager detected the problem when RtlAllocateHeap was called from within a chain of constructor calls.

To take a closer look at the error address and the heap block it belongs to:

0:003> !heap -i 00000000048b8410
Detailed information for block entry 00000000048b8410
Assumed heap : 0x00000000048b0000 (Use !heap -i NewHeapHandle to change)
Header content : 0x65010001 0x0000284A
Owning segment : 0x00000000048b0000 (offset 0)
Block flags : 0x0 (free )
Total block size : 0x1 units (0x10 bytes)
Previous block size: 0xf units (0xf0 bytes)
Block CRC : ERROR - current 65, expected 0
Free list entry : OK
Previous block : 0x00000000048b8320
Next block : 0x00000000048b8420

It reports that the checksum does not match the expected value, which means this heap block’s header has been damaged. As explained in the last post, the Windows Heap Manager cannot detect an overrun at the moment the damage happens; it can only tell that an overrun has happened when the damaged block is used later on. In other words, if an operation X overruns its heap block a and damages the next heap block a+1, the Windows Heap Manager cannot detect X’s misconduct. However, if heap block a+1 is later used by another operation Y, the Windows Heap Manager will find that a+1 has been damaged, even though operation Y has nothing to do with the damage.

Therefore, what happened in this case is that heap block 00000000048b8410 is the a+1 block, and the stack trace shown above belongs to the innocent operation Y. The real culprit is whoever operated on the previous heap block, 0x00000000048b8320. So, to take a better look at 0x00000000048b8320:
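The delayed detection described here can be imitated with a toy model. The sketch below is not the real Windows heap: the header layout, the XOR checksum and all names (ToyHeader, ToyHeap, operation_x, operation_y) are invented for illustration. It only shows why an unchecked write by X goes unnoticed until Y validates the neighbouring header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model only: NOT the real Windows heap header layout or checksum.
struct ToyHeader {
    std::uint8_t size_units;    // block size in 16-byte units
    std::uint8_t flags;         // 1 = busy, 0 = free
    std::uint8_t checksum;      // size_units XOR flags
    std::uint8_t reserved[13];  // pads the header to 16 bytes
};
static_assert(sizeof(ToyHeader) == 16, "toy header occupies one 16-byte unit");

constexpr std::size_t kHeaderSize = sizeof(ToyHeader);
constexpr std::size_t kBlockA  = 0;   // offset of block a (header + 16-byte payload)
constexpr std::size_t kBlockA1 = 32;  // offset of block a+1

struct ToyHeap {
    unsigned char bytes[64];

    ToyHeap() {
        std::memset(bytes, 0, sizeof(bytes));
        write_header(kBlockA, 2, 1);   // block a: busy
        write_header(kBlockA1, 2, 0);  // block a+1: free
    }

    void write_header(std::size_t offset, std::uint8_t units, std::uint8_t flags) {
        ToyHeader h{};
        h.size_units = units;
        h.flags = flags;
        h.checksum = static_cast<std::uint8_t>(units ^ flags);
        std::memcpy(bytes + offset, &h, sizeof(h));
    }

    bool header_ok(std::size_t offset) const {
        ToyHeader h;
        std::memcpy(&h, bytes + offset, sizeof(h));
        return h.checksum == static_cast<std::uint8_t>(h.size_units ^ h.flags);
    }

    // Operation X: fills `count` bytes of block a's payload with no size check.
    // Anything above 16 spills into the header of block a+1, silently.
    void operation_x(std::size_t count) {
        assert(count <= sizeof(bytes) - (kBlockA + kHeaderSize));  // stay inside the toy buffer
        std::memset(bytes + kBlockA + kHeaderSize, 0x65, count);
    }

    // Operation Y: touches block a+1, and only now is its header validated.
    bool operation_y() const { return header_ok(kBlockA1); }
};
```

Calling operation_x(19) on a fresh ToyHeap returns normally; the damage only surfaces when operation_y() later reports a bad header for block a+1, exactly the situation in which Y gets blamed for X's overrun.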

0:003> !heap -i 0x00000000048b8320
Detailed information for block entry 00000000048b8320
Assumed heap : 0x00000000048b0000 (Use !heap -i NewHeapHandle to change)
Header content : 0x7717D4D1 0x0B0028C5 (decoded : 0x0E01000F 0x0B000080)
Owning segment : 0x00000000048b0000 (offset 0)
Block flags : 0x1 (busy )
Total block size : 0xf units (0xf0 bytes)
Requested size : 0xe5 bytes (unused 0xb bytes)
Previous block size: 0x80 units (0x800 bytes)
Block CRC : OK - 0xe
Previous block : 0x00000000048b7b20
Next block : 0x00000000048b8410

0:003> !heap -x 0x00000000048b8320
HEAP 00000000048b0000 (Seg 00000000048b0000) At 00000000048b8410 Error: invalid block size

Entry User Heap Segment Size PrevSize Unused Flags
-------------------------------------------------------------------------------------------------------------
00000000048b8320 00000000048b8330 00000000048b0000 00000000048b0000 f0 800 b busy

It shows that this block was assigned to an allocation request for 0xe5 bytes of memory. Looking back at the stack trace from the first !heap command, there are quite a few constructor calls, so it is reasonable to guess that this heap block was allocated for an object. To show the content of this heap block, use the d command on the user portion of the block at address 00000000048b8330, i.e. heap block header address + header size (0x10):
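The numbers in the two !heap outputs are related by simple arithmetic, which the following sketch checks at compile time. All values are copied from the debugger output above; the helper names are made up for illustration, and the 0x10-byte unit is the one reported by the output for this 64-bit process.

```cpp
#include <cstdint>

// Values copied from the !heap -i output for block 00000000048b8320.
constexpr std::uint64_t kEntry         = 0x00000000048b8320ULL;  // block entry (header) address
constexpr std::uint64_t kSizeUnits     = 0xf;                    // "Total block size" in units
constexpr std::uint64_t kPrevSizeUnits = 0x80;                   // "Previous block size" in units
constexpr std::uint64_t kRequestedSize = 0xe5;                   // "Requested size" in bytes

// One heap unit is 0x10 bytes here, which is also the header size.
constexpr std::uint64_t kUnit = 0x10;

constexpr std::uint64_t total_bytes(std::uint64_t units) { return units * kUnit; }
constexpr std::uint64_t user_address(std::uint64_t entry) { return entry + kUnit; }
constexpr std::uint64_t next_entry(std::uint64_t entry, std::uint64_t units) {
    return entry + total_bytes(units);
}
constexpr std::uint64_t previous_entry(std::uint64_t entry, std::uint64_t prev_units) {
    return entry - total_bytes(prev_units);
}

static_assert(total_bytes(kSizeUnits) == 0xf0, "0xf units are 0xf0 bytes");
static_assert(user_address(kEntry) == 0x48b8330, "address passed to the d command");
static_assert(next_entry(kEntry, kSizeUnits) == 0x48b8410, "the block with the damaged header");
static_assert(previous_entry(kEntry, kPrevSizeUnits) == 0x48b7b20, "matches 'Previous block'");
static_assert(total_bytes(kSizeUnits) - kRequestedSize == 0xb, "matches 'unused 0xb bytes'");
```

Walking the heap by hand this way is a quick sanity check that the block being inspected really is the immediate neighbour of the damaged one.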

0:003> dq 00000000048b8330
00000000`048b8330 000007fe`f2fc1848 00000000`0478ded0
00000000`048b8340 00000000`00000000 00000000`00000000
00000000`048b8350 00000000`00000000 00000000`00000000
00000000`048b8360 00000000`00000000 00000000`00000000
00000000`048b8370 00000000`00000000 00000000`00000000
00000000`048b8380 00000000`00000000 00000000`00000000
00000000`048b8390 00000000`00000000 00000000`00000000
00000000`048b83a0 000007fe`f2fc1880 0000000c`00000000

If the assumption (this block is allocated for an object) is right, then the first 8 bytes should point to the virtual function table of that object. To verify this, run the ln command on the address stored in the first 8 bytes:

0:003> ln 000007fe`f2fc1848
(000007fe`f2fc1848) MyProvider!CPolicyProvider::`vftable' Exact matches:

It is indeed an object, and this object is an instance of type CPolicyProvider. It is now clear that heap block 0x00000000048b8320 was allocated for the construction of a CPolicyProvider object.
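The trick of reading the first 8 bytes works because common compilers store the vtable pointer at offset 0 of a polymorphic object. The sketch below illustrates this with a hypothetical Provider class standing in for CPolicyProvider. The offset-0 placement is an assumption about the implementation (it holds for MSVC, GCC and Clang with a class like this); the C++ standard does not promise it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical polymorphic type standing in for CPolicyProvider.
struct Provider {
    virtual ~Provider() {}
    virtual int id() const { return 1; }
    void* handle = nullptr;
};

// Reads the first pointer-sized word of the object's bytes, which is what
// the dq output shows in its first column.
// Assumption: the compiler places the vtable pointer at offset 0.
inline std::uintptr_t first_word(const Provider& p) {
    std::uintptr_t word = 0;
    std::memcpy(&word, reinterpret_cast<const unsigned char*>(&p), sizeof(word));
    return word;
}

// Two instances of the same dynamic type share one vtable, so their first
// words are equal. This is why a single address is enough to name the type.
inline bool same_vtable(const Provider& a, const Provider& b) {
    return first_word(a) == first_word(b);
}

// Checks (in builds with assertions enabled) that the assumption holds here.
inline void check_vtable_assumption() {
    Provider a;
    Provider b;
    assert(first_word(a) != 0);
    assert(same_vtable(a, b));
}
```

In the debugger the same idea runs in reverse: the first word is taken from memory, and the symbol lookup maps it back to the class whose vtable lives at that address.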

Note: For more information about C++ Language Memory Model, see [ICOM96]

Note: Another easy way to find out which functions have operated on a certain heap block is to use the stack trace. Specifying the +ust flag with the gflags.exe tool enables user-mode stack traces. Once enabled, the !heap -i command automatically includes the stack trace whenever it is available. This is a very useful feature.

To find the size of this type:

0:003> ?? sizeof (MyProvider!CPolicyProvider)
unsigned int64 0xe5

It returns 0xe5 bytes, which is consistent with the requested size in the heap block information. However, displaying the detailed type information with the dt command reveals an inconsistency (only the relevant output is shown below):

0:003> dt -r MyProvider!CPolicyProvider
+0x000 __VFN_table : Ptr64
=000007fe`f2fec5d0 CPolicyProvider::s_theInstance : Ptr64 CPolicyProvider
+0x000 __VFN_table : Ptr64
=000007fe`f2fec5d0 CPolicyProvider::s_theInstance : Ptr64 CPolicyProvider
+0x000 __VFN_table : Ptr64
=000007fe`f2fec5d0 CPolicyProvider::s_theInstance : Ptr64 CPolicyProvider
+0x008 m_hProvider : Ptr64 Void
+0x010 configPtr : shared_any<Config *,close_delete,null_t,enum Unique>
+0x020 m_bootCounter : shared_ptr<IPerformanceCounter>
+0x030 m_abortCounter : shared_ptr<IPerformanceCounter>
+0x040 m_passCounter : shared_ptr<IPerformanceCounter>
+0x050 m_requestTotalCounter : shared_ptr<IPerformanceCounter>
+0x060 m_requestTimedOutCounter : shared_ptr<IPerformanceCounter>
+0x070 m_threadPool : SpecialThreadPool
+0x008 m_hProvider : Ptr64 Void

      . . . Other fields are not shown here

   +0x070 m_threadPool : SpecialThreadPool
+0x000 __VFN_table : Ptr64
+0x008 m_bIsInitialized : Bool
+0x009 m_bInitializeCOM : Bool
+0x00c m_dwCurrentNumberOfThreads : Uint4B

       ... Other fields are not shown here

      +0x078 m_bIsCsThreadListInitialized : Bool
+0x079 m_bIsShuttingDown : Bool
+0x07a m_bAutoCreateNewThreads : Bool
=000007fe`f2fc0000 THREAD_COUNT : Uint4B

The last field of CPolicyProvider, m_threadPool, is of type SpecialThreadPool and starts at offset 0x70. The last member of SpecialThreadPool is a one-byte Bool at offset 0x7a, so SpecialThreadPool occupies at least 0x7b bytes (more, depending on padding). The entire size of CPolicyProvider should therefore be at least 0x70 + 0x7b = 0xeb. In other words, the instance memory layout needs more bytes than were actually allocated (0xeb > 0xe5). This definitely causes a heap buffer overrun if there is a write operation to the end of the last field of CPolicyProvider.

Looking into the source code, type SpecialThreadPool inherits from type CGenericThreadPool and only adds some static fields and functions. Hence, the memory layout of SpecialThreadPool should be exactly the same as that of CGenericThreadPool. However, the debugger reports different sizes for the two types:
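The size mismatch can be restated as a small compile-time calculation. The figures come from the debugger output above; the constant and function names are hypothetical, and tail padding is deliberately ignored, so the result is a lower bound on the overrun.

```cpp
#include <cstddef>

// Figures taken from the debugger output above.
constexpr std::size_t kAllocatedSize    = 0xe5;  // sizeof(CPolicyProvider) as reported, and the requested heap size
constexpr std::size_t kThreadPoolOffset = 0x70;  // offset of m_threadPool inside CPolicyProvider
constexpr std::size_t kLastMemberOffset = 0x7a;  // offset of m_bAutoCreateNewThreads inside the pool
constexpr std::size_t kLastMemberSize   = 1;     // a Bool is one byte

// Where the last member starts and ends, measured from the start of the object.
constexpr std::size_t kLastMemberStart = kThreadPoolOffset + kLastMemberOffset;
constexpr std::size_t kLayoutEnd       = kLastMemberStart + kLastMemberSize;

// Bytes the layout needs beyond what was allocated (0 if it fits).
constexpr std::size_t overrun_bytes(std::size_t allocated, std::size_t layout_end) {
    return layout_end > allocated ? layout_end - allocated : 0;
}

static_assert(kLastMemberStart == 0xea, "the last member starts past the allocation");
static_assert(kLastMemberStart > kAllocatedSize, "0xea > 0xe5");
static_assert(overrun_bytes(kAllocatedSize, kLayoutEnd) == 6, "at least 6 bytes fall outside");
```

Even without counting padding, the last few members of the thread pool lie entirely outside the memory that was requested for the object.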

0:003> ?? sizeof (CGenericThreadPool)
unsigned int64 0x80

0:003> ?? sizeof (SpecialThreadPool)
unsigned int64 0x75

To understand what has gone wrong, the memory layout of SpecialThreadPool needs to be examined more closely:

0:003> dt SpecialThreadPool
MyProvider!SpecialThreadPool
+0x000 __VFN_table : Ptr64
+0x008 m_bIsInitialized : Bool
+0x009 m_bInitializeCOM : Bool
+0x00c m_dwCurrentNumberOfThreads : Uint4B
+0x010 m_dwMaximumNumberOfThreads : Uint4B
+0x018 m_hCompletionPort : Ptr64 Void
+0x020 m_hMonitoringThread : Ptr64 Void
+0x028 m_lastActivationTime : Int8B
+0x030 m_lNumberOfThreadsWaiting : Int4B
+0x034 m_dwCompletionThreadTimeout : Uint4B
+0x038 m_spThreadContextArray : Ptr64 CThreadContext
+0x040 m_listActivationTime : _LIST_ENTRY
+0x050 m_csThreadList : _RTL_CRITICAL_SECTION
+0x078 m_bIsCsThreadListInitialized : Bool
+0x079 m_bIsShuttingDown : Bool
+0x07a m_bAutoCreateNewThreads : Bool

where the layouts of _LIST_ENTRY and _RTL_CRITICAL_SECTION are:

0:003> dt _LIST_ENTRY
msvcr90!_LIST_ENTRY
+0x000 Flink : Ptr64 _LIST_ENTRY
+0x008 Blink : Ptr64 _LIST_ENTRY
0:003> dt _RTL_CRITICAL_SECTION
msvcr90!_RTL_CRITICAL_SECTION
+0x000 DebugInfo : Ptr64 _RTL_CRITICAL_SECTION_DEBUG
+0x008 LockCount : Int4B
+0x00c RecursionCount : Int4B
+0x010 OwningThread : Ptr64 Void
+0x018 LockSemaphore : Ptr64 Void
+0x020 SpinCount : Uint8B

More intuitively, the memory layout of SpecialThreadPool can be represented as the following block diagram (the number in brackets is the size in bytes):

0x00 Ptr64 (8)
0x08 bool (1) bool (1) padding(2) Uint4B(4)
0x10 Uint4B (4) padding(4)
0x18 Ptr64 (8)
0x20 Ptr64 (8)
0x28 Int8B (8)
0x30 Int4B (4) Uint4B(4)
0x38 Ptr64 (8)
0x40 _LIST_ENTRY.Ptr64 (8) 
0x48 _LIST_ENTRY.Ptr64 (8)
0x50 _RTL_CRITICAL_SECTION.Ptr64 (8)
0x58 _RTL_CRITICAL_SECTION.Int4B (4) _RTL_CRITICAL_SECTION.Int4B (4)
0x60 _RTL_CRITICAL_SECTION.Ptr64 (8)
0x68 _RTL_CRITICAL_SECTION.Ptr64 (8)
0x70 _RTL_CRITICAL_SECTION.Uint8B (8)
0x78 bool (1) bool(1) bool(1) padding(5)

As the diagram shows, the entire size of SpecialThreadPool should be 0x80 (0x78 + 0x8) bytes, of which 0xb (11) bytes are just padding. Furthermore, the packing size in this layout is evidently 8: every field sits at a multiple of its own size, and every 8-byte field sits at a multiple of 8 bytes.

Compare this value with the size 0x75 reported by the debugger: the difference is exactly 0xb (0x80 - 0x75), the same as the number of padding bytes. So the size reported by the debugger corresponds to a packing size of 1 (no padding), not 8.

At this point, with all the information uncovered by the debugger, it is easy to locate the root cause with a simple code review. It turns out that type SpecialThreadPool and type CGenericThreadPool were using different packing sizes in spite of having an inheritance relationship. In one of the header files that SpecialThreadPool includes, there is a line “#pragma pack(1)” which tells the compiler to compile SpecialThreadPool using 1 as the pack size. All the other classes are compiled with the default pack size of 8. This mistake causes the compiler to generate inconsistent code regarding the size of the SpecialThreadPool type, which in turn results in the heap buffer overrun at run time.
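Both sizes can be reproduced outside the debugger. The sketch below rebuilds the member list from the dt output with stand-in types and compiles it twice, once with default packing and once with a packing size of 1. All type and member names are hypothetical, a plain pointer stands in for the vtable pointer, and the asserted sizes assume a 64-bit target.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void*) == 8, "this sketch assumes a 64-bit target");

// Stand-ins with the same member sizes as _LIST_ENTRY and _RTL_CRITICAL_SECTION.
struct ListEntry {
    void* flink;
    void* blink;
};
struct CriticalSection {
    void*         debug_info;
    std::int32_t  lock_count;
    std::int32_t  recursion_count;
    void*         owning_thread;
    void*         lock_semaphore;
    std::uint64_t spin_count;
};

// One member list, so both structs below declare exactly the same fields.
#define THREAD_POOL_MEMBERS                 \
    void*           vfn_table;              \
    bool            is_initialized;         \
    bool            initialize_com;         \
    std::uint32_t   current_threads;        \
    std::uint32_t   maximum_threads;        \
    void*           completion_port;        \
    void*           monitoring_thread;      \
    std::int64_t    last_activation_time;   \
    std::int32_t    threads_waiting;        \
    std::uint32_t   completion_timeout;     \
    void*           thread_context_array;   \
    ListEntry       activation_list;        \
    CriticalSection thread_list_lock;       \
    bool            cs_initialized;         \
    bool            shutting_down;          \
    bool            auto_create_threads;

// Default packing: the layout from the block diagram.
struct PoolDefaultPack { THREAD_POOL_MEMBERS };

// Packing size 1: the layout behind the smaller size.
#pragma pack(push, 1)
struct PoolPack1 { THREAD_POOL_MEMBERS };
#pragma pack(pop)

static_assert(sizeof(PoolDefaultPack) == 0x80, "with padding");
static_assert(sizeof(PoolPack1) == 0x75, "without padding");
static_assert(sizeof(PoolDefaultPack) - sizeof(PoolPack1) == 0xb, "11 padding bytes");
static_assert(offsetof(PoolDefaultPack, cs_initialized) == 0x78, "matches the dt output");
static_assert(offsetof(PoolPack1, cs_initialized) == 0x72, "shifted by the missing padding");
static_assert(0x70 + sizeof(PoolPack1) == 0xe5, "explains the reported object size");
```

The last assertion ties the pieces together: a thread pool measured with packing size 1, placed at offset 0x70, gives exactly the 0xe5 bytes that were requested from the heap.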

Take Away:

The techniques shown in this case study are based on a fundamental understanding of how memory is used in Windows. As a software developer, it is important to understand data structure memory layout and the language object model, even though the compiler abstracts this information away most of the time. This knowledge comes in handy when debugging memory-related bugs.

Besides packing, many other aspects of the compiler can be configured, and they affect the final shape of the generated code. When changing such a setting in the code, it is recommended to localize its effect by saving and restoring the original value. For example, instead of just throwing “#pragma pack(1)” into some header file, it is better to use the following code:

#pragma pack(push) //to save current packing size 

#pragma pack(1) //enable special packing, no padding

... //your data structure here

#pragma pack(pop) //restore the packing size to original value

This way, other files will not accidentally inherit the special packing setting when they include this header file. On the other hand, if the setting change is meant to be global, it is better to make it at the compiler level (for example with a compiler switch such as /Zp), not in the code itself.
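The difference between a leaking and a scoped packing change can be checked at compile time. In the sketch below, the first part imitates a header that sets the packing size and never restores it, so an unrelated type declared afterwards is packed by accident; the second part uses push and pop. All type names are hypothetical, and the outer push/pop pair around the first part exists only to keep the sketch itself from leaking.

```cpp
#include <cstdint>

// Part 1: imitation of a header that changes packing and never restores it.
#pragma pack(push)   // guard for this sketch only; the "bad header" lacks it
#pragma pack(1)
struct WireHeader {  // intended to be packed
    std::uint8_t  kind;
    std::uint32_t length;
};
// The imaginary header ends here, with packing still set to 1.

// An unrelated type declared later silently inherits the packing size.
struct VictimLeaky {
    std::uint8_t  flag;
    std::uint32_t count;
};
#pragma pack(pop)    // end of the guard for this sketch

// Part 2: the same header written with push and pop.
#pragma pack(push, 1)
struct WireHeaderScoped {
    std::uint8_t  kind;
    std::uint32_t length;
};
#pragma pack(pop)

// The unrelated type now keeps its default layout.
struct VictimSafe {
    std::uint8_t  flag;
    std::uint32_t count;
};

static_assert(sizeof(WireHeader) == 5, "packed on purpose");
static_assert(sizeof(VictimLeaky) == 5, "packed by accident");
static_assert(sizeof(WireHeaderScoped) == 5, "packed on purpose");
static_assert(sizeof(VictimSafe) == 8, "default layout is untouched");
```

Adding a static_assert on sizeof next to any type whose layout matters is a cheap guard: a packing change that leaks in from another header then fails the build instead of corrupting the heap.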

References:

1. [MSDN-PACK] Pack

2. [WVC07] Windows via C/C++, Fifth Edition Chapter 8.2

3. [PPP09] Principles of Parallel Programming Chapter 1

4. [ICOM96] Inside the C++ Object Model

 

Special thanks to Gabriel and Daniel for their help in finding this bug in the first place.