.NET Debugging Demos Lab 4: High CPU Hang - Review

Hope you have had a chance to go through the high CPU lab... Some people already commented with the solution in the lab instructions for this lab but here are my answers and commentary on the lab.

Reproducing the issue:

1. Recycle IIS (IISReset)

2. Browse to http://localhost/BuggyBits/AllProducts.aspx this should take about 5 seconds or more depending on what machine you're on

3. Watch the CPU usage in taskmanager while browsing to this page

Q: What is the CPU usage like for w3wp.exe during the processing of this request?

A: Very high, approaching 100% 

Set up a performance counter log:

1. Check out Lab 3: Memory and set up a performance counter log per the instructions in there and start the log

Prepare to gather a memory dump:

1. Open a command prompt, move to the debuggers directory and type the following command but don't hit enter

adplus -hang -pn w3wp.exe -quiet

Reproduce the issue again and gather the memory dump:

1. From a command prompt in the IIS Resources/Tinyget directory run the following tinyget command to spawn up 5 threads making one request each

tinyget -srv:localhost -uri:/BuggyBits/AllProducts.aspx -threads:5 -loop:1

2. When the CPU usage is high (in task manager), press enter in the adplus window to get the dump

3. Stop the performance monitor log

Open the dump to figure out what it is doing:

1. Open the dump (hang_mode with todays date and time in the debuggers directory) in windbg using file/open crash dump

2. Load up the symbols and sos

Verify that you took the dump at the right time:

1. Run !threadpool to see the CPU usage of the system to make sure that you got the dump at the right time

Q: What is the cpu usage?

A: The CPU usage for the system is 98% according to !threadpool.

 0:027> !threadpool
CPU utilization 98% 
Worker Thread: Total: 5 Running: 5 Idle: 0 MaxLimit: 200 MinLimit: 2
Work Request in Queue: 4
AsyncTimerCallbackCompletion TimerInfo@001c9990
AsyncTimerCallbackCompletion TimerInfo@001e0e40
AsyncTimerCallbackCompletion TimerInfo@001f2800
AsyncTimerCallbackCompletion TimerInfo@001e1640
Number of Timers: 7
Completion Port Thread:Total: 1 Free: 1 MaxFree: 4 CurrentLimit: 0 MaxLimit: 200 MinLimit: 2

Q: How does the CPU usage shown here correlate to the CPU usage of the process?

A: This is a bit of a trick question:) It doesn't. The CPU usage shown is the CPU usage for the system so it might be that this process is using a lot of CPU or that some other process is using a lot of CPU. Usually on a dedicated web server you can use this as a guideline but it is always good to have performance logs as well to make sure.

Q: If you ran tinyget you may see some requests queued up eg: Work Request in Queue: 4, why are requests queueing up? Why aren't they scheduled on threads? (Compare Running worker threads and Maxlimit for worker threads)

A: In this case I actually ran tinyget twice. I have 5 worker threads, all running requests and although that is far from the max setting (100 threads/proc, and 2 procs = 200), no new threads will be started when CPU usage is over 80% so any requests that came in after it started going towards 100% will be sitting in the queue waiting for new threads to be started or freed.   

Determine which threads have been consuming most of the CPU time

1. Run .time to see the uptime and CPU user time of the process

 0:027> .time
Debug session time: Fri Feb 22 12:26:55.000 2008 (GMT+1)
System Uptime: 8 days 9:17:00.157
Process Uptime: 0 days 0:01:07.000
  Kernel time: 0 days 0:00:06.000
  User time: 0 days 0:00:45.000

2. Run !runaway to see the usertime for all the threads

 0:027> !runaway
 User Mode Time
  Thread       Time
  18:704       0 days 0:00:17.843<br>  19:9f4       0 days 0:00:13.328<br>  16:1948      0 days 0:00:10.718
  26:a7c       0 days 0:00:01.375
  24:114       0 days 0:00:01.093
  27:d54       0 days 0:00:00.390
  28:1b70      0 days 0:00:00.328
   0:b7c       0 days 0:00:00.171
  25:3f8       0 days 0:00:00.000
  23:1968      0 days 0:00:00.000

Q: From !runaway which threads are consuming most of the CPU?

A: Looks like thread 18, 19 and 16 are interesting here. They've been consuming a total of ~42 sec together.

Q: How much usermode CPU time was consumed by the process in total?  (from .time)

A: From .time we can see that the process has been up for 1 minute and 7 seconds (elapsed time) and 45 seconds has been spent in usermode code (effective time over both CPUs)

Note: Looking at !runaway output can be a bit tricky for multiple reasons.

First off, on a multiproc machine you have to remember that the usermode time (i.e. clock cycles spent in usermode code) is CPU time spent on all processors, therefore the usermode time may add up to more than elapsed time (process uptime). 

Secondly, the !runaway output shows the total usermode time spent on that thread since it was started.  In asp.net for example a thread may be reused for multiple requests so a high usermode time on a thread does not neccesarily mean that the requests running on that thread is a big CPU consumer.  

And finally, some threads like the GC threads (in multiproc, serverGC processes) stay alive for the duration of the process, so they have a higher chance of accumulating a lot of usermode time than worker threads, therefore it is really more interesting to look at two consecutive dumps and compare the differences in usermode time.

3. Pick the threads that have the highest usermode time and look at their stacks

~#s                   (set thread context, note # should be replaced with the appropriate thread number from !runaway)
kb 2000             (look at the native stack) 
!clrstack             (look at the .net stack)

 0:018> kb
ChildEBP RetAddr  Args to Child              
0290fd90 7d4d8f0e ffffffff 0290fde8 00000000 ntdll!ZwAllocateVirtualMemory+0x12
0290fddc 7d4d8f51 ffffffff 0aec1000 0030d000 kernel32!VirtualAllocEx+0x41
0290fdf8 79e7495b 0aec1000 0030d000 00080000 kernel32!VirtualAlloc+0x18
0290fe38 79e74929 0aec1000 0030d000 00080000 mscorwks!EEVirtualAlloc+0x104
0290fe50 79e7499b 7a3b45f0 0aec1000 0030d000 mscorwks!CExecutionEngine::ClrVirtualAlloc+0x14
0290fe6c 79fbdce8 0aec1000 0030d000 00080000 mscorwks!ClrVirtualAlloc+0x1a
0290fe88 79fc2efa 0aec0b18 0030d6b0 0aec0b18 mscorwks!SVR::reset_memory+0x3e
0290fea4 79fc2eb6 0aec0b18 0030d6b0 00000001 mscorwks!SVR::gc_heap::make_unused_array+0x1b
0290fec4 79fc3d1d 0aec0b18 0030d6b0 001a8970 mscorwks!SVR::gc_heap::thread_gap+0x30
0290fef0 79fc3c15 00000000 001a8800 00000002 mscorwks!SVR::gc_heap::sweep_large_objects+0xd4
0290ff28 79efa9ba 00000002 00000000 001a8800 mscorwks!SVR::gc_heap::mark_phase+0x3c5
0290ff54 79efaf60 ffffffff 00000000 001a8800 mscorwks!SVR::gc_heap::gc1+0x46
0290ff74 79efa72f 00000000 00000000 001a8800 mscorwks!SVR::gc_heap::garbage_collect+0x246
0290ff98 79fc8583 79fc8559 79f95b5c 79fc857c mscorwks!SVR::gc_heap::gc_thread_function+0x6a
0290ffb8 7d4dfe21 001a8800 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x92
0290ffec 00000000 79fc8538 001a8800 00000000 kernel32!BaseThreadStart+0x34

0:016> kb
ChildEBP RetAddr  Args to Child              
0252f084 7d4d8c82 00000308 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
0252f0f4 79e789c6 00000308 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0252f138 79e7898f 00000308 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x1af
0252f188 79e78944 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
0252f19c 79fbc82a ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
0252f1b0 79fbc6da 00000000 58ec767e 00000000 mscorwks!SVR::GCHeap::WaitUntilGCComplete+0x34
0252f1ec 79fc2945 58ec75d2 00032cff 0000909f mscorwks!Thread::RareDisablePreemptiveGC+0x1a0
0252f240 79fc2b07 00000000 0b376f34 7936d6e7 mscorwks!Thread::RedirectedHandledJITCase+0x13d
0252f2cc 66f12980 06f00f70 06f53d34 6628efd2 mscorwks!Thread::RedirectedHandledJITCaseForGCThreadControl+0x7
0252f2d8 6628efd2 06f00f70 02ef3cec 06f53d24 System_Web_RegularExpressions_ni+0x12980
0252f2ec 6613cb04 06f00f70 02ef3cec 02eee62c System_Web_ni+0x36efd2
0252f300 6613cb50 0252f50c 02ef3cec 02ef3cec System_Web_ni+0x21cb04
0252f4e8 6614c717 02ef11ac 00000000 02ee35fc System_Web_ni+0x21cb50
0252f50c 6614d8c3 00000001 02eee62c 00000000 System_Web_ni+0x22c717
0252f544 6614d80f 00000001 06eb01e4 02eb446c System_Web_ni+0x22d8c3
0252f580 6614d72f 02ef3cec 6614d6c2 00000000 System_Web_ni+0x22d80f
0252f5a0 65fe6bfb 660ee554 06f0a36c 02eee050 System_Web_ni+0x22d72f
0252f5d4 65fe3f51 00000000 0000006e 00000000 System_Web_ni+0xc6bfb
0252f610 65fe7733 0252f638 06f00110 02eee35c System_Web_ni+0xc3f51
0252f664 65fccbfe 0252f6b4 02ef0f6c 06f00fe8 System_Web_ni+0xc7733

Q: What are they doing?  Can you make an assumption based on this as to what is causing the high CPU usage?

A: 18 and 19 are GC threads (gc_heap::gc_thread_stub) and 16 is a thread running a managed code, but currently waiting for the GC to finish so at this point 16 is idling... so it looks like most of the CPU time right now is used by the GC threads

4. This is only applicable on multi-proc machines since they have dedicated GC threads...  Sum up the usermode time for the GC threads in !runaway and divide this by total usermode time from .time. 

Q: How much of the total usermode time is spent by the GC threads?

A: Thread 18 and 19 have used up around 21 seconds of CPU time => ~50% of the total CPU time, that is quite a lot of time to spend in garbage collection.

Examine the performance counter log:

1. Open up the performance counter log and add the counters for .NET CLR Memory/% Time in GC, # Gen 0 Collections, #Gen 1 Collections, #Gen 2 Collections and Large Object Heap Size.

Q: What does the % Time in GC counter measure?

A: "% Time in GC is the percentage of elapsed time that was spent in performing a garbage collection (GC) since the last GC cycle. This counter is usually an indicator of the work done by the Garbage Collector on behalf of the application to collect and compact memory. This counter is updated only at the end of every GC and the counter value reflects the last observed value; its not an average." in other words this is bound to spike up a bit after a Gen 2 collection but the important part here is that the average time in GC doesn't end up too high.

Q: What is the average % Time in GC during the stress run?

A: Around 50-60 % in my case. This will vary greatly depending on if you are on a single proc or multiproc box since the server GC and workstation GC are optimized for different things. The server GC which runs in the ASP.NET service on multiproc boxes will do more full collections, relocation and compact memory more. The workstation version omits some of this at the expense of higher memory usage.

Q: What is an appropriate number for % Time in GC?

A: This is a tricky question to answer, some say 5, some say 30. I think 5% on average would be very hard to reach in most asp.net apps, especially those with high memory usage, however, based on experience I think that if more than 30% of the CPU usage in the process goes to garbage collection something starts smelling a bit fishy. Having said that though, lowering the % time in GC can be pretty tricky since it usually involves changing the allocation patterns for the app so. It's a bit like with any optimization, try to take it down to a comfortanble level but don't overdo it unless you need to.   

Q: What is the ratio between Gen 0, Gen 1, and Gen 2 collections?  What should it be in an optimal scenario and why?

A: Gen 0 collections are basically free. Gen 1 collections (which includes Gen 0 collections) are also pretty cheap since we are still dealing with a low amount of memory that needs to be collected, relocated etc... Gen 2 collections on the other hand can get really really expensive since it deals with the whole .net GC heap. An optimal ratio would be to see 100 Gen 0 collections, per 10 Gen 1 collections and 1 Gen 2 collection, so 100:10:1. In my case the numbers are

5853 Gen 0 Collections
5830 Gen 1 Collections
5430 Gen 2 Collections

That means that pretty much every single collection that was done was a Gen 2 collection (since this also triggers a 1 and 0 collection) which is really bad since it means that we take no advantage of having a generational GC, we basically scavenge through all memory all the time.

Q: What can cause a ratio like this?

A: Generally two things cause this type of ratio. Either the allocation pattern is such that pretty much all your allocations are for large objects so the large object heap segments get exhausted all the time or you constantly call GC.Collect directly or through something like GC.GetTotalMemory(true). 

If someone does induce a GC (i.e. call GC.Collect) you will see the .NET CLR Memory/# Induced GC counter go up but in this particular case it stays flat which means that the reason is likely that we constantly allocate large objects.

Look at the memory dump to figure out what caused the high CPU in GC:

1. Run ~* kb 2000 to get all native stacks and search for the thread that triggered the GC (mscorwks!SVR::GCHeap::GarbageCollectGeneration)


   27  Id: bcc.d54 Suspend: 1 Teb: fff06000 Unfrozen
ChildEBP RetAddr  Args to Child              
103eed84 7d4d8c82 00000314 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
103eedf4 79e789c6 00000314 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
103eee38 79e7898f 00000314 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x1af
103eee88 79e78944 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
103eee9c 79efafc5 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
103eeec0 79efad3d ffffffff 001ab0e0 001ab188 mscorwks!SVR::gc_heap::wait_for_gc_done+0x62
103eeee8 79efa339 00000000 00000000 001ab0e0 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x1b5
103eef78 79ef98cf 103eefb4 00047470 00000003 mscorwks!SVR::gc_heap::try_allocate_more_space+0x136
103eef94 79f20ee4 103eefb4 00047470 00000003 mscorwks!SVR::gc_heap::allocate_more_space+0x2e
103eefd8 79f3d20e 00047470 00000000 00000000 mscorwks!SVR::gc_heap::allocate_large_object+0x5a
103eeff4 79e7510e 0f468840 00047470 00000000 mscorwks!SVR::GCHeap::Alloc+0x8d
103ef010 79e86713 00047470 00000000 00000000 mscorwks!Alloc+0x60
103ef04c 79e865b9 00023a30 4a807762 0773cd58 mscorwks!SlowAllocateString+0x29
103ef0f0 79378b7d 0773cd58 00023a2f 00000008 mscorwks!FramedAllocateString+0xa0
103ef104 79378af4 103ef14c 0d36b268 0773cd1c mscorlib_ni+0x2b8b7d
103ef12c 7a4a88d6 00000000 031df118 031deef0 mscorlib_ni+0x2b8af4
103ef14c 66f12980 06f00098 031deaec 6628efd2 System_ni+0x688d6
103ef38c 6614d8c3 00000001 03198328 00000000 System_Web_RegularExpressions_ni+0x12980
103ef3c4 6614d80f 00000001 06eb01e4 02eb446c System_Web_ni+0x22d8c3
103ef400 6614d72f 031ddfd8 6614d6c2 00000000 System_Web_ni+0x22d80f

0:016> !clrstack
OS Thread Id: 0x1948 (16)
ESP       EIP     
0252f208 7d61c828 [RedirectedThreadFrame: 0252f208] 
0252f24c 79363058 System.String.wstrcpy(Char*, Char*, Int32) 
0252f258 7936d6e7 System.String.FillStringChecked(System.String, Int32, System.String)
0252f278 79378b9f System.String.ConcatArray(System.String[], Int32)
0252f28c 79378af4 System.String.Concat(System.Object[])
0252f2a0 0fe90a2a AllProducts.Page_Load(System.Object, System.EventArgs)
0252f2d8 66f12980 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0252f2e8 6628efd2 System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0252f2f8 6613cb04 System.Web.UI.Control.OnLoad(System.EventArgs)
0252f308 6613cb50 System.Web.UI.Control.LoadRecursive()
0252f31c 6614e12d System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0252f518 6614d8c3 System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0252f550 6614d80f System.Web.UI.Page.ProcessRequest()
0252f588 6614d72f System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0252f590 6614d6c2 System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0252f5a4 0fe90205 ASP.allproducts_aspx.ProcessRequest(System.Web.HttpContext)
0252f5a8 65fe6bfb System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0252f5dc 65fe3f51 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0252f61c 65fe7733 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0252f66c 65fccbfe System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0252f688 65fd19c5 System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0252f6bc 65fd16b2 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0252f6c8 65fcfa6d System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0252f8d8 79f047fd [ContextTransitionFrame: 0252f8d8] 
0252f90c 79f047fd [GCFrame: 0252f90c] 
0252fa68 79f047fd [ComMethodFrame: 0252fa68] 

Q: Why was a gc triggered?

A: We were trying to allocate a large object (gc_heap::allocate_large_object) and in doing so we needed more space and that triggered the garbage collection.
The object we are trying to allocate is 0x47470 = 291952 bytes which is why it ends up on the large object heap.

Q: What type of object was it allocating?

A: Based on the managed stack it looks like it could be either a char[] or a string since we are concatenating strings and this internally does a wstrcpy.

Q: What is the thread doing?

A: It's processing a request for AllProducts.aspx and specifically calling the Page_Load function which is concatenating strings, i.e. str1 = str2 + str3.

Q: Could this be causing the bad Gen 0, Gen 1, Gen 2 ratio? Why?

A: Concatentating large strings is a well known cause for high CPU in GC. The reason for this is explained in a lot more detail in this article so I won't go through the reasons in depth here, but basically what it comes down to is that when you do something like str = str + str2 in a loop you will generate new strings rather than expanding str because strings are immutable. If the loop goes on for a while and the strings are big you will constantly create new strings where one is just slightly bigger than the previous and this causes the LOH segments to be exhausted frequently, which in turn cause a lot of full GCs and high CPU in GC.

2. Find out what is on the large object heap (note: if you unfortunate you may have caught the GC in the plan or relocate phase in which case the !dumpheap output may not be correct.  If that is the case try to make do with what you get from the stacks and !dumpheap -min 85000)

!dumpheap -min 85000

 0:016> !dumpheap -min 85000
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to 
displaying, finding or traversing objects as well as gc heap segments may not 
work properly. !dumpheap and !verifyheap may incorrectly complain of heap 
consistency errors.
Heap 0
 Address       MT     Size
03481a14 001a86d8   136876 Free
0aec0b18 001a86d8  3200688 Free
0b1ce1c8 790fd8c4   246832     
0b20a608 790fd8c4   246992     
0b246ad8 790fd8c4   462032     
0b2b77a8 001a86d8   784256 Free
0b376f28 790fd8c4   416272     
0b3dc938 001a86d8   336960 Free
0b42ed78 7912dae8  1048592     
0b52ed88 790fd8c4   416432     
total 10 objects
Heap 1
 Address       MT     Size
object 06eb0038: does not have valid MT
curr_object : 06eb0038
0ceb0048 001a86d8  1180896 Free
0cfd0528 790fd8c4   590672     
0d060878 001a86d8  3189232 Free
0d36b268 790fd8c4   291792     
total 4 objects
total 14 objects
      MT    Count    TotalSize Class Name
7912dae8        1      1048592 System.Byte[]
790fd8c4        7      2671024 System.String
001a86d8        6      8828908      Free
Total 14 objects


Q: What is on the large object heap?

A: In this case since we are in the middle of a GC we can't really traverse the heap which means that !dumpheap -min 85000 is not showing us all large objects on the heap but the majority of them are Byte[] and Strings which is consistent with the theory that it is string concatenation causing the problem.

Q: Is there any pattern in the sizes of the strings?

A: It's not all that evident in this particular case but there are for example one pair of strings strings (246832 bytes and 246992 bytes) and another pair (416272 and 416432 bytes), in both cases the difference in sizes is 160 bytes. If you happen to take a dump when the heap is not in motion you will probably see a longer list of strings where the size consistently grows by around 160 bytes, and this is basically because each iteration concatenates another ~160 bytes to the string in the Page_Load method.

3. Dump out a few of the strings to see the contents

!do <address of string>

 0:016> !do 0b1ce1c8 
Name: System.String
MethodTable: 790fd8c4
EEClass: 790fd824
Size: 246832(0x3c430) bytes
String: <table><tr><td><B>Product ID</B></td><td><B>Product Name</B></td><td><B>Description</B></td></tr><tr><td>0</td><td>Product 0</td><td>Description for Product 0</td>
</tr><tr><td>1</td><td>Product 1</td><td>Description for Product 1</td></tr><tr><td>2</td><td>Product 2</td><td>Description for Product 2</td></tr><tr><td>3</td>
<td>Product 3</td><td>Description for Product 3</td></tr><tr><td>4</td><td>Product 4</td><td>Description for Product ... 

Q: Based on the stacks, the knowledge of the collection ratios and the string contents, can you determine what went wrong.  You might want to do some internet searches on some of the calls on the stacks if you don't know the answer immediately.

A: As mentioned before, it appears to be the string concatenation in Page_Load that is causing the high CPU. The method appears to be assembling a table of products.

Verify your assumption in the code and modify the code to resolve the issue 

1. Open AllProducts.aspx.cs and look at the code for Page_Load

    protected void Page_Load(object sender, EventArgs e)
        DataTable dt = ((DataLayer)Application["DataLayer"]).GetAllProducts();
        string ProductsTable = "<table><tr><td><B>Product ID</B></td><td><B>Product Name</B></td><td><B>Description</B></td></tr>";

        foreach (DataRow dr in dt.Rows)
            ProductsTable += "<tr><td>" + dr[0] + "</td><td>" + dr[1] + "</td><td>" + dr[2] + "</td></tr>" ; 
        ProductsTable += "</table>";
        tblProducts.Text = ProductsTable;

Q: What line of code is causing the problem?

A: The bolded line looks like the culprit here.   

Q: How do you resolve it?  Why does this resolve the problem?

A: Use a StringBuilder and append data rather than concatenating strings. The reason this resolves the issue is because the string builder has a buffer that it fills up when you append so it fills up the existing buffer and doesn't have to create a new one all the time. When it needs to expand the buffer it doubles it so the amount of times the app has to make new allocations on the LOH is decreased by a factor of 100 or 1000 in this case. If you change it to use a string builder instead you will see what I mean... the CPU usage will drop, the execution time is vastly different and if you look at the number of GCs that occurr you will notice that instead of making around 5000 or so garbage collections we are talking about less than 10 full GCs.