Case Study: How to debug 64-bit managed windows service at startup

Sometimes you might need to debug a windows service which is written in managed code and it is running on 64-bit windows, and to make the matter worse, the bug happens during the service startup. Without adding instrument codes into your binary, this could be a difficult setup to do debugging, because:

  1. Services Control Manager must start the service with the debugger attached.
  2. Starting from Windows Vista/2008, all windows services are running under a separated session 0 (which has a different WindowStations/Desktop than the normal user interactive desktop).
  3. The visual studio debugger cannot be attached in mixed mode (both native and managed) if the code is written prior to .NET 4.0

One solution for this problem is to use WinDBG debugger with SOS extension, and configure the “Image File Execution” options to start the service under debugger server automatically. The following case study demonstrates this technique.

The Bug:

ManagedService is a winidows service, its binary managedservice.exe is implemented using C#. The following error occurs when ManagedService is trying to start:

C:\Users\administrator>net start ManagedService
The ManagedService service is starting...
The ManagedService service could not be started.

A system error has occurred.

System error 1067 has occurred.

The process terminated unexpectedly.

In Event Viewer (Windows Logs/Application), the actual CLR exception and call stack are recorded:

System.Reflection.ReflectionTypeLoadException: Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information.
at System.Reflection.Module._GetTypesInternal(StackCrawlMark& stackMark)
at System.Reflection.Assembly.GetTypes()
at MyCompany.MyProduct.IndigoSerializableObject.InitializeKnownTypesCache(List`1 assembliesToExamine)
at MyCompany.MyProduct.ManagedService.OnStart(String[] args)

This gives a better insight to the problem: This service was probably going to host some WCF services and maybe it was preloading some managed types to improve performance, and the CLR loader encountered an error while loading them. However, assuming there are quite few assemblies and each has a few types, this information alone is not enough to tell which exact type causes the error. What is really needed is the content of LoaderExceptions property so that we can pinpoint the guilty type.

Note: Fusion Log won’t help in this case, because it is a type loading exception not an assembly loading exception.

The Debugging:

First, We need to configure the service so that when it starts the debugger has already been attached. For that, we can use the “Image File Execution” option by specifying the “Debugger” entry. There are numerous blogs and technical articles detailed the usage of this. Basically what it does is to allow the debugger specified in the “Debugger” entry to launch first, and then the service process is in turn spawned by that debugger.

So which debugger we should use for “Image File Execution” option? Because all windows services are running under session 0 (Windows Vista/2008 above), whatever debugger we choose will also be spawned in session 0. If that debugger put out any GUIs, they will not be directly visible to the users (the “Interactive Services Detection” dialog box would pop up).  In other words, any interactive debugger won’t be a good choice. Thus, what we actually need is to start a window-less debugger server instead, and then from the current user session, we can use an interactive debugger to connect to that server.

To setup the debugger server on service startup, we can use either gflags.exe or regedit.exe or other tools to configure the “Debugger” entry of  “Image file Execution” option for managedservice.exe in the registry as:

%windbginstallpath%\cdb.exe –server tcp:port=1234 –c “g;”

%windbginstallpath% is where the installation location of Debugging Tools for Windows, and the cdb.exe is the debugger which doesn’t create any new windows. The –server switch activate the debugger as debugger server, and tcp:port=1234 tells the debugger server to use TCP protocol and listening at port 1234 (you can use other protocol or port too, see debugger documentation).  The last –c ”g;” switch tells the debugger to let managedservice.exe to continue to run upon attachment, and this is because we know the bug is unhandled exception and the exception will cause the debugger to break automatically.

Note: For investigating other kinds of bugs, different debugger commands can be appended after “g;” to enable debugger to act in various situations, for example, “g;sxe ld some.dll”, will break into the debugger automatically when some.dll is loaded into process space.   

Now, we restart the ManagedService again by typing

net start ManagedService

and then we start a new interactive debugger client ntsd.exe (cdb.exe or windbg.exe also works) to connect to the debugger server:

%windbginstallpath%\ntsd.exe –remote tcp:server=localhost,port=1234 

Here, the –remote switch activate the ntsd.exe as debug client, and tcp:server=localhost,port=1234 tells the ntsd.exe to connect to the debugger server locally at port 1234 using TCP protocol.

Note: the Service Control manager has a default 30 seconds timeout period within which the service must have to start, otherwise it would ends the process and the attached debugger server. The timeout value can be adjusted by adding new DWORD value ServicesPipeTimeout under the following registry subkey:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control

Alternative, you can save a crash dump using .dump -ma command, and then debug the dump file in post mortem instead . –ma flag gives a memory dump with everything and it is required by SOS extension.

Once the ntsd.exe attached, we can see that the debugger has already break in because of the unhandled System.Reflection.ReflectionTypeLoadException. To save space, I only show the relevant parts of output from ntsd.exe window:

ModLoad: 000007fe`fac00000 000007fe`fac2d000 C:\Windows\system32\ntmarta.dll
ModLoad: 000007fe`fef50000 000007fe`fefa0000 C:\Windows\system32\WLDAP32.dll
(11f8.f58): CLR exception - code e0434f4d (first chance)
CLR exception type: System.Reflection.ReflectionTypeLoadException
"Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for m
ore information."
00000000`7710017a c3 ret
DCMANAGER\administrator (tcp 127.0.0.1:50908) connected at Sat Feb 20 01:41:19 2010
0:007>

It’s time to take a closer look at the ReflectionTypeLoadException. To do that, we first need to load the SOS Debugger Extension. In the ntsd.exe window, type:

.load %windir%\microsoft.net\<architecture>\<version>\sos.dll

In my case, sos.dll is at c:\windows\Microsoft.NET\Framework64\v2.0.50727\. After SOS is loaded, we can use !dso (!dumpstackobject) to see all managed objects found within the bounds of the current stack, which will include our exception:

0:007>!dso

OS Thread Id: 0x14c4 (7)
RSP/REG Object Name
000000001b7382a0 00000000027cea38 System.Diagnostics.Process
000000001b738450 00000000027cea38 System.Diagnostics.Process
000000001b738458 00000000027ceb48 System.RuntimeType
000000001b738468 00000000027ce868 System.Object
000000001b7384e0 00000000027c9be0 System.Reflection.ReflectionTypeLoadException
000000001b738510 00000000027c9be0 System.Reflection.ReflectionTypeLoadException
000000001b738548 00000000027c9be0 System.Reflection.ReflectionTypeLoadException
000000001b738550 00000000027c9be0 System.Reflection.ReflectionTypeLoadException
000000001b738560 00000000027ca798 System.String
000000001b738580 00000000027ca680 System.Object[] (System.Object[])
000000001b7387c0 00000000027ca660 System.UnhandledExceptionEventArgs
000000001b738b70 0000000002900760 System.Int32
000000001b738b80 0000000002900760 System.Int32
000000001b738bc8 0000000002900760 System.Int32

We can easily see that 00000000027c9be0 is the object address of System.Reflection.ReflectionTypeLoadException. To see inside of that exception, we can use !do (!dumpobject) command:

0:007>!do 00000000027c9be0

Name: System.Reflection.ReflectionTypeLoadException
MethodTable: 000007fef9730fc0
EEClass: 000007fef8c3bd18
Size: 152(0x98) bytes
(C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT Field Offset Type VT Attr Value Name
000007fef8f3ec90 40000b5 8 System.String 0 instance 00000000027ca108 _className
000007fef8f3d038 40000b6 10 ...ection.MethodBase 0 instance 000000000281de00 _exceptionMethod
000007fef8f3ec90 40000b7 18 System.String 0 instance 0000000000000000 _exceptionMethodString
000007fef8f3ec90 40000b8 20 System.String 0 instance 00000000027c9cd0 _message
000007fef8f37b98 40000b9 28 ...tions.IDictionary 0 instance 0000000000000000 _data
000007fef8f3ef58 40000ba 30 System.Exception 0 instance 0000000000000000 _innerException

000007fef8f3ec90 40000bb 38 System.String 0 instance 0000000000000000 _helpURL
000007fef8f3e580 40000bc 40 System.Object 0 instance 00000000027ca4c8 _stackTrace
000007fef8f3ec90 40000bd 48 System.String 0 instance 0000000000000000 _stackTraceString
000007fef8f3ec90 40000be 50 System.String 0 instance 0000000000000000 _remoteStackTraceString
000007fef8f45f00 40000bf 70 System.Int32 1 instance 0 _remoteStackInex
000007fef8f3e580 40000c0 58 System.Object 0 instance 0000000000000000 _dynamicMethods

000007fef8f45f00 40000c1 74 System.Int32 1 instance -2146232830 _HResult
000007fef8f3ec90 40000c2 60 System.String 0 instance 0000000000000000 _source
000007fef8f41698 40000c3 68 System.IntPtr 1 instance 0 _xptrs
000007fef8f45f00 40000c4 78 System.Int32 1 instance -532459699 _xcode
000007fef8f34748 4000dad 80 System.Object[] 0 instance 00000000027c6980 _classes
000007fef8f34748 4000dae 88 System.Object[] 0 instance 00000000027c9bb8 _exceptions

We are looking for the value of LoaderExceptions Property, and it is an array of exceptions, so that will be the last field _exceptions. Its corresponding object address is 00000000027c9bb8. This field is an array, so we can see the actually memory layout of this array object by directly dump the memory content using dd command:

0:007>dd 00000000027c9bb8

00000000`027c9bb8 f8f34748 000007fe 00000001 00000000
00000000`027c9bc8 f8f3ef58 000007fe 027c8538 00000000
00000000`027c9bd8 00000000 08000004 f9730fc0 000007fe
00000000`027c9be8 027ca108 00000000 0281d720 00000000
00000000`027c9bf8 00000000 00000000 027c9cd0 00000000
00000000`027c9c08 00000000 00000000 00000000 00000000
00000000`027c9c18 00000000 00000000 027ca4c8 00000000
00000000`027c9c28 00000000 00000000 00000000 00000000

The first 2 DWORDs together 000007fef8f34748 is the address of method table of array type, and then the following 2 DWORDs together 0000000000000001 is number of elements in the array (which is 1 in this case), and then the next 2 DWORDs 000007fef8f3ef58 is the address of method table of the type of element inside the array (which is the method table of exception). And lastly, the final 2 DWORDs 00000000027c8538 is the object address of the first element in the array, which is what we want! So, we use !do on that address:

0:007>!do 00000000027c9bb8

Name: System.TypeLoadException
MethodTable: 000007fef9718088
EEClass: 000007fef8c308b0
Size: 160(0xa0) bytes
(C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
MT Field Offset Type VT Attr Value Name
000007fef8f3ec90 40000b5 8 System.String 0 instance 0000000000000000 _className
000007fef8f3d038 40000b6 10 ...ection.MethodBase 0 instance 0000000000000000 _exceptionMethod
000007fef8f3ec90 40000b7 18 System.String 0 instance 0000000000000000 _exceptionMethodString
000007fef8f3ec90 40000b8 20 System.String 0 instance 00000000027a8720 _message
000007fef8f37b98 40000b9 28 ...tions.IDictionary 0 instance 0000000000000000 _data
000007fef8f3ef58 40000ba 30 System.Exception 0 instance 0000000000000000 _innerException
000007fef8f3ec90 40000bb 38 System.String 0 instance 0000000000000000 _helpURL
000007fef8f3e580 40000bc 40 System.Object 0 instance 0000000000000000 _stackTrace
000007fef8f3ec90 40000bd 48 System.String 0 instance 0000000000000000 _stackTraceString
000007fef8f3ec90 40000be 50 System.String 0 instance 0000000000000000 _remoteStackTraceString
000007fef8f45f00 40000bf 70 System.Int32 1 instance 0 _remoteStackIndex
000007fef8f3e580 40000c0 58 System.Object 0 instance 0000000000000000 _dynamicMethods
000007fef8f45f00 40000c1 74 System.Int32 1 instance -2146233054 _HResult
000007fef8f3ec90 40000c2 60 System.String 0 instance 0000000000000000 _source
000007fef8f41698 40000c3 68 System.IntPtr 1 instance 0 _xptrs
000007fef8f45f00 40000c4 78 System.Int32 1 instance -532459699 _xcode
000007fef8f3ec90 40003b9 80 System.String 0 instance 00000000027a83e8 ClassName
000007fef8f3ec90 40003ba 88 System.String 0 instance 00000000027a8468 AssemblyName
000007fef8f3ec90 40003bb 90 System.String 0 instance 0000000000000000 MessageArg
000007fef8f45f00 40003bc 7c System.Int32 1 instance -2146233054 ResourceId

And then we want to see the actually message. And _message is just another String object, so we just need to use !do one more time:

0:007>!do 00000000027a8720

Name: System.String
MethodTable: 000007fef8f3ec90
EEClass: 000007fef8b4b038
Size: 586(0x24a) bytes
(C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: Could not load type MyCompany.MyProduct.TestImplementation.ITreeNode`1' from assembly 'TestImplementation, Version=1.0.523.0, Culture=neutral, PublicKeyToken=9396306c2be7fcc4'.
Fields:
MT Field Offset Type VT Attr Value Name
000007fef8f45f00 4000096 8 System.Int32 1 instance 281 m_arrayLength
000007fef8f45f00 4000097 c System.Int32 1 instance 179 m_stringLength
000007fef8f406d8 4000098 10 System.Char 1 instance 43 m_firstChar
000007fef8f3ec90 4000099 20 System.String 0 shared static Empty
>> Domain:Value 000000000009fba0:0000000002771308 <<
000007fef8f40588 400009a 28 System.Char[] 0 shared static WhitespaceChars
>> Domain:Value 000000000009fba0:0000000002771b28 <<

Mystery solved! it is the ITreeNode type from assembly TestImplementation giving us the trouble. Once we know about the exact type and assembly name, the rest of the debugging is just a piece of cake. Oh, btw, if you want to quit debugging, type single 'q' to quit debugger client, or double "qq" to stop both debugger client and server.

It turned out this bug was actually caused by a mismatched version of assembly, and you probably has guessed it at the beginning.

Takeaway:

All the techniques described above are very basic debugging tricks, including Image File Execution, Debugging Tools for Windows, debugger server and client, SOS extension and .NET object memory model. And you can find more details about each of them by searching their names on the Internet. In this blog, I have shown you how to use these techniques together to solve one particular bug, but of course, you can apply them together in many other creative ways to solve different bugs in various environment.

But I think the most important takeaway should be this:

Make your startup logic as simple as possible! Less codes at startup == Less bugs at startup