Thread, System.Threading.Thread, and !Threads (II)

With the knowledge from my previous blog entry, we can avoid some mistakes in .NET programming.

A C++ Thread is very resource heavy. It is associated with a lot of dynamically allocated memory and some OS handles, so it should be cleaned up as soon as possible after its corresponding OS thread dies. The C++ Thread class has a reference count, and for an object to be deleted, its ref count has to drop to 0 (Rotor: Thread::DecExternalCount in vm\threads.cpp). One interesting point is that the C# Thread object actually keeps a reference to its associated C++ Thread, so a live C# Thread object can keep its C++ Thread from being deleted even if the OS thread is already dead. (On the other hand, the C++ Thread also has a reference to the C# Thread, but it breaks the cycle when its own ref count drops to 1.) Because the C# Thread is a managed object, its lifetime is mostly determined by user code. In addition, the C# Thread class has a finalizer, so its lifetime is extended by at least one GC. So if user code caches C# Thread objects or has some ill-behaved finalizers (in another blog entry, I mentioned that a misbehaving finalizer on one object can prevent every other object's finalizer from running), "dead" C++ Thread objects may accumulate over time and a "memory leak" will be observed.
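The reference relationship described above can be modeled with a small C++ sketch. This is a toy model, not the Rotor code: NativeThread, ManagedThread, simulateCachedThread and the assumption that the running OS thread itself accounts for one count are all hypothetical names and simplifications chosen for illustration. A std::shared_ptr stands in for a strong GC handle, a std::weak_ptr for a weak one, and the ManagedThread destructor plays the role of the finalizer.

```cpp
#include <memory>

struct ManagedThread;  // stands in for the C# System.Threading.Thread object

// Stands in for the runtime's C++ Thread (hypothetical, heavily simplified).
struct NativeThread {
    int externalRefCount = 0;
    std::shared_ptr<ManagedThread> strongHandle;  // role of m_StrongHndToExposedObject
    std::weak_ptr<ManagedThread> weakHandle;      // role of m_ExposedObject
    bool deleted = false;                         // set instead of calling delete

    void DecExternalCount() {
        --externalRefCount;
        if (externalRefCount == 1) {
            strongHandle.reset();  // break the cycle: stop keeping the managed object alive
        }
        if (externalRefCount == 0) {
            deleted = true;        // the real runtime would free the object here
        }
    }
};

struct ManagedThread {
    NativeThread* internalThread;
    explicit ManagedThread(NativeThread* n) : internalThread(n) {
        ++n->externalRefCount;     // the managed object holds one reference
    }
    ~ManagedThread() {             // plays the role of the finalizer
        internalThread->DecExternalCount();
    }
};

struct Outcome {
    bool freedWhileCached;   // native object freed while user code still caches the managed one?
    bool freedAfterRelease;  // native object freed once the cache lets go?
};

Outcome simulateCachedThread() {
    NativeThread native;
    native.externalRefCount = 1;  // assumption: one count held for the running OS thread

    auto managed = std::make_shared<ManagedThread>(&native);  // count becomes 2
    native.strongHandle = managed;
    native.weakHandle = managed;

    std::shared_ptr<ManagedThread> userCache = managed;  // e.g. an array slot in user code
    managed.reset();

    native.DecExternalCount();            // OS thread dies: 2 -> 1, strong handle cleared
    bool freedWhileCached = native.deleted;

    userCache.reset();                    // user code drops it: "finalizer" runs, 1 -> 0
    return {freedWhileCached, native.deleted};
}
```

Calling simulateCachedThread() in this model reports that the native object is not freed while the user cache still holds the managed object, and is freed once the cache releases it, which is the accumulation pattern described above.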

I have an example here to demonstrate the problem and how to debug it using windbg + SOS. In this process there are 202 C++ Thread objects, of which 160 are "dead", meaning their associated OS threads are dead. The total number of threads in the Thread Store and the number of dead and unstarted threads are shown in the "!threads" output. For a "live" thread, the debugger thread ID and the OS thread ID are printed at the start of the entry; for a "dead" thread, "XXX" is shown at the beginning of the line:

0:043> !threads
ThreadCount: 202
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 160
                                PreEmptive GC Alloc                         Lock
    ID     ThreadOBJ  State     GC         Context               Domain     Count APT Exception
  0 0x1138 0x0015a298 0x20      Enabled    0x00000000:0x00000000 0x00149ac8     1 Ukn
  1 0x1148 0x00152530 0x1220    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn (Finalizer)
  3 0x114c 0x00177548 0x2001020 Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
  4 0x1150 0x00177878 0x2001020 Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
  5 0x1154 0x00177c08 0x2001020 Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn

 42 0x11e4 0x00180460 0x2001020 Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
 22 0x11e8 0x00180838 0x2001020 Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x00180c10 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x00180fe8 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x001813c0 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x00181750 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x00181b28 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x00181f00 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
XXX      0 0x001822d8 0x1820    Enabled    0x00000000:0x00000000 0x00149ac8     0 Ukn
… //continue with a huge list

Now I want to find out why all the "dead" C++ Thread objects are still around. First, I can check the ref count of one of them, provided I have symbols for mscorwks.

//0x00180c10 is a dead ThreadOBJ I picked from !Threads output
0:043> dt mscorwks!Thread 0x00180c10 m_ExternalRefCount
+0x0cc m_ExternalRefCount : 1

Since the ref count is 1, if this C++ Thread object has a C# Thread object associated with it, the C# object must hold the last reference. I can verify whether that is the case by checking the C++ object's m_ExposedObject field. It is a weak GC handle (a non-moving pointer to a GC reference that is not counted as a root of the GC object), so dereferencing it yields the managed object. As mentioned before, the C++ Thread object also has a strong handle (the m_StrongHndToExposedObject field) to the C# object, but it already cleared the strong handle when the ref count dropped to 1, to avoid a circular reference.

0:043> dt mscorwks!Thread 0x00180c10 m_ExposedObject
+0x0c0 m_ExposedObject : 0x00a71054

0:043> dp 0x00a71054 l1
00a71054 00c5c714
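The "dp ... l1" step above reads one pointer-sized value through the handle. The double indirection can be sketched in C++ as follows. This is an illustrative model only: ManagedObject, Handle, derefHandle and readThroughHandleAfterMove are hypothetical names, and the "relocation" is simulated by hand rather than by a real garbage collector.

```cpp
// A toy stand-in for an object living in the GC heap.
struct ManagedObject {
    int payload;
};

// In this model a handle is the address of a slot that holds the object's
// current address. The slot never moves, even if the object does.
using Handle = ManagedObject**;

// Equivalent in spirit to "dp <handle> l1": read one pointer through the handle.
ManagedObject* derefHandle(Handle h) {
    return *h;
}

// Simulates the collector moving the object and shows that the handle
// still leads to the live copy afterwards.
int readThroughHandleAfterMove() {
    ManagedObject oldLocation{42};
    ManagedObject newLocation{0};

    ManagedObject* slot = &oldLocation;  // the slot the handle points at
    Handle handle = &slot;               // the value a native field would store

    newLocation = *derefHandle(handle);  // "GC" copies the object to a new address
    slot = &newLocation;                 // and updates the slot in place
    oldLocation.payload = -1;            // the old memory is now garbage

    return derefHandle(handle)->payload; // still reads the live object
}
```

The point of the sketch is that native code can hold the handle value for a long time, while the object address it resolves to is only valid at the moment it is read.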

0:043> !do 00c5c714
Name: System.Threading.Thread
MethodTable 0x79bb8384
EEClass 0x79bb85b0
Size 60(0x3c) bytes
GC Generation: 0
mdToken: 0x020000eb (c:\windows\\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79bb8614
MT Field Offset Type Attr Value Name
0x79bb8384 0x4000330 0x4 CLASS instance 0x00000000 m_Context
0x79bb8384 0x4000331 0x8 CLASS instance 0x00000000 m_LogicalCallContext
0x79bb8384 0x4000332 0xc CLASS instance 0x00000000 m_IllogicalCallContext
0x79bb8384 0x4000333 0x10 CLASS instance 0x00000000 m_Name
0x79bb8384 0x4000334 0x14 CLASS instance 0x00000000 m_ExceptionStateInfo
0x79bb8384 0x4000335 0x18 CLASS instance 0x00000000 m_Delegate
0x79bb8384 0x4000336 0x1c CLASS instance 0x00000000 m_PrincipalSlot
0x79bb8384 0x4000337 0x20 CLASS instance 0x00000000 m_ThreadStatics
0x79bb8384 0x4000338 0x24 CLASS instance 0x00000000 m_ThreadStaticsBits
0x79bb8384 0x4000339 0x28 CLASS instance 0x00000000 m_CurrentCulture
0x79bb8384 0x400033a 0x2c CLASS instance 0x00000000 m_CurrentUICulture
0x79bb8384 0x400033b 0x30 System.Int32 instance 2 m_Priority
0x79bb8384 0x400033c 0x34 System.Int32 instance 1575952 DONT_USE_InternalThread
0x79bb8384 0x400033d 0 CLASS shared static m_LocalDataStoreMgr
>> Domain:Value 0x00149ac8:0x00c05338 <<
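One cross-check on this output: DONT_USE_InternalThread is printed in decimal (1575952), while !threads prints ThreadOBJ in hex. Converting the value shows that it equals 0x00180c10, the ThreadOBJ we started from, so this field appears to be the C# object's pointer back to its C++ Thread. A minimal helper for the conversion (toHexAddress is a hypothetical name, written for 32-bit addresses as in this dump):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 32-bit value the way the debugger prints addresses, e.g. 0x00180c10.
std::string toHexAddress(std::uint32_t value) {
    char buf[11];  // "0x" + 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%08x", static_cast<unsigned>(value));
    return std::string(buf);
}
```

For example, toHexAddress(1575952u) gives "0x00180c10". In the debugger itself, the "?" expression evaluator can do the same conversion.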

Next, I want to check the roots of the C# Thread object to see what keeps it alive:

0:043> !gcroot 00c5c714
Scan Thread 0 (0x1138)

So there is an array that keeps a reference to a "dead" C# Thread. This looks interesting. I can check all the other C# Thread objects in the process using the !DumpHeap command, which dumps the objects in the GC heap of a particular type specified by the "-type" option:

0:043> !DumpHeap -type System.Threading.Thread
Address MT Size Gen
0x00c054c8 0x79bb8384 60 1 System.Threading.Thread
0x00c5b730 0x79bc81d4 28 2 System.Threading.ThreadStart
0x00c5b74c 0x79bb8384 60 2 System.Threading.Thread
0x00c5b7bc 0x79bc81d4 28 2 System.Threading.ThreadStart
0x00c5b7d8 0x79bb8384 60 2 System.Threading.Thread
0x00c5b820 0x79bc81d4 28 2 System.Threading.ThreadStart
0x00c5b83c 0x79bb8384 60 2 System.Threading.Thread
0x00c5b884 0x79bc81d4 28 2 System.Threading.ThreadStart
0x00c5b8a0 0x79bb8384 60 2 System.Threading.Thread
0x00c5b8e8 0x79bc81d4 28 2 System.Threading.ThreadStart
0x00c5b904 0x79bb8384 60 2 System.Threading.Thread
0x00c5b94c 0x79bc81d4 28 2 System.Threading.ThreadStart
0x00c5b968 0x79bb8384 60 2 System.Threading.Thread
0x00c5b9b0 0x79bc81d4 28 2 System.Threading.ThreadStart
…//long list
total 401 objects

Because !DumpHeap matches types by string, it also dumps ThreadStart objects. Every C# Thread object created by user code has a ThreadStart object (a C# Thread created through System.Threading.Thread.CurrentThread may not have one), so they show up in pairs. Among the 401 objects, 200 are C# Thread objects, which roughly matches the number of C++ Thread objects (the numbers don't have to be the same because not every C++ Thread object has a C# counterpart created). The generation of most of the C# Thread objects is 2, meaning they have already survived at least 2 GCs. When I track the roots of those C# Thread objects, they all point to the array. In this case, we need to look closely at the source to see whether it is really necessary to cache all the C# Thread objects in an array.
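To see why the ThreadStart entries appear, here is a small sketch of the difference between matching a type name as a substring and matching it exactly. It assumes that "-type" behaves like a plain substring test, which is consistent with the output above but is not taken from the SOS source; countSubstringMatches and countExactMatches are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts names that contain the filter anywhere (assumed "-type" behavior).
std::size_t countSubstringMatches(const std::vector<std::string>& typeNames,
                                  const std::string& filter) {
    std::size_t count = 0;
    for (const auto& name : typeNames) {
        if (name.find(filter) != std::string::npos) {
            ++count;
        }
    }
    return count;
}

// Counts names that are exactly equal to the filter.
std::size_t countExactMatches(const std::vector<std::string>& typeNames,
                              const std::string& filter) {
    std::size_t count = 0;
    for (const auto& name : typeNames) {
        if (name == filter) {
            ++count;
        }
    }
    return count;
}
```

With the names "System.Threading.Thread" and "System.Threading.ThreadStart" in the list, the substring version counts both for the filter "System.Threading.Thread", while the exact version counts only the first. That is why the Thread objects have to be separated from the ThreadStart objects by their MT column when reading the list.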

Another related topic is that the CLR relies on the DLL_THREAD_DETACH notification to mscorwks.dll's DllMain (Rotor: EEDllMain in vm\ceemain.cpp) to learn that an OS thread is dead and then detach the related C++ Thread. Using the TerminateThread API is already notoriously bad in unmanaged programming; here we see another reason not to call it in managed code. If TerminateThread is called on a managed OS thread then, among other bad effects (e.g. back-out code not being executed), the CLR will not get the thread detach notification. Because the C++ Thread object has references to the OS thread's stack addresses, failing to detach it from the OS thread can cause crashes at random places.



This posting is provided "AS IS" with no warranties, and confers no rights.