Reader email: Need help troubleshooting perf/memory issues

I get several emails every day (through the blog) with urgent requests to help troubleshoot memory issues or memory related performance issues.  Some just say “we have a memory issue, how do I troubleshoot it?” and others are far far more detailed with links to dumps and everything:)

I love getting emails, but since I have a day job too (troubleshooting such issues:)) I unfortunately have to ignore most such requests through the blog:( to have time to help the customers that contact me through the normal support channels, and therefore I would recommend that if you do have a pressing issue, that you create a support case on http://support.microsoft.com/ to make sure that you get a dedicated support engineer and that your issue gets the attention it deserves.

Having said all that, I wanted to point out one email I got today, both because I could see that the person writing it had put some effort into try troubleshooting the issue before contacting me, and because I think and hope that my answer might be able to give some starting points to other people in his situation.

   Hi Tess, 
We have a week to go for production and are stuck in one performance issue. 
Please guide me in resolving this. 
We are running an ASP .Net (2.0) App on Windows Server 2003 (SP2), dual-processor, 4GB RAM. 
During stess-testing our app with 60 users, we see a drop in throughput from 17th minute to 22nd minute. 
It rises and drops again around 40th minute. 
I took 4 memory dumps for 8 mins, ran the perfmon counters. 
Here are the details: 
                     Dump1     Dump2     Dump3     Dump4 
Size                 1.16G     1.12G     1.04G     1.13G 
GC                   802MB     750MB     648MB     738MB 
Loader               18MB      18MB      18MB      18MB 
Image                236MB     236MB     236MB     236MB 
MEM_RESV             220MB     236MB     347MB     255MB 
MEM_COMM             1.19GB    1.06GB    1021MB    1.08GB 
Perfmon Counters (8 mins) 
LOH:                                307MB, 
#Bytes in all heaps:     798MB, 
%Time in GC:               49%(avg), 
#Induced GC:               14 (avg). 
So I can deduce the following:  1. We are stuck in GC for the 8 minutes. 

2. Assembly leak is not the problem. 

3. Am not sure if Finalizer thread is being blocked.   The kb reads as below:   0:017> kb <br>ChildEBP RetAddr     Args to Child <br>0e1afc74 7c827d0b     77e61d1e 00000334 00000000     ntdll!KiFastSystemCallRet <br>0e1afc78 77e61d1e     00000334 00000000 0e1afcbc     ntdll!NtWaitForSingleObject+0xc <br>0e1afce8 79e8c5f9     00000334 000007d0 00000000     kernel32!WaitForSingleObjectEx+0xac <br>0e1afd2c 79e8c52f     00000334 000007d0 00000000     mscorwks!PEImage::LoadImage+0x1af <br>0e1afd7c 79e8c54e     000007d0 00000000 00000000     mscorwks!CLREvent::WaitEx+0x117 <br>0e1afd90 79fc2397     000007d0 00000000 00000000     mscorwks!CLREvent::Wait+0x17 <br>0e1afdac 79fc24b4     000f2b28 0e1afeb0 000f3328     mscorwks!SVR::WaitForFinalizerEvent+0x4a <br>0e1afdc0 79e9845f     0e1afeb0 00000000 00000000     mscorwks!SVR::GCHeap::FinalizerThreadWorker+0x49 <br>0e1afdd4 79e983fb     0e1afeb0 0e1afe5c 79f7759b     mscorwks!Thread::DoADCallBack+0x32a <br>0e1afe68 79e98321     0e1afeb0 9b91703f 00000000     mscorwks!Thread::ShouldChangeAbortToUnload+0xe3 <br>0e1afea4 79eef6cc     0e1afeb0 00000000 0010f770     mscorwks!Thread::ShouldChangeAbortToUnload+0x30a <br>0e1afecc 79eef6dd     79fc2469 00000008 0e1aff14     mscorwks!ManagedThreadBase_NoADTransition+0x32 <br>0e1afedc 79ef12ab     79fc2469 9b91718f 00000000     mscorwks!ManagedThreadBase::FinalizerBase+0xd <br>0e1aff14 79f92015     00000000 00000000 89244470     mscorwks!SVR::GCHeap::FinalizerThreadStart+0xbb <br>0e1affb8 77e64829     000f3328 00000000 00000000     mscorwks!Thread::intermediateThreadProc+0x49 <br>0e1affec 00000000     79f91fcf 000f3328 00000000     kernel32!BaseThreadStart+0x34  4. !finalizequeue results <br>Heap 0 <br>generation 0 has 158 finalizable objects (1d1d42c8->1d1d4540) <br>generation 1 has 269 finalizable objects (1d1d3e94->1d1d42c8) <br>generation 2 has 1961 finalizable objects (1d1d1ff0->1d1d3e94) <br>Ready for finalization 0 objects (1d1d4540->1d1d4540) <br>------------------------------ <br>Heap 1 <br>generation 0 has 830 finalizable objects (1d23d7c0->1d23e4b8) <br>generation 1 has 284 finalizable objects (1d23d350->1d23d7c0) <br>generation 2 has 1238 finalizable objects (1d23bff8->1d23d350) <br>Ready for finalization 0 objects (1d23e4b8->1d23e4b8)  
 How do I proceed from here to isolate the issue? Thanks, 

Looking at the above, these were the comments and suggestions for further troubleshooting that I sent him.

Hi,

I don’t agree that you are stuck in a GC for 8 minutes, you can check with # Gen 0/Gen 1/Gen 2 collections and will probably see them increasing.  The reason I say that you are not stuck is because the % time in GC counter will only update on completion of a GC so if you were stuck it would not update.

I do think however that your % Time in GC looks really high and I think there might be two reasons for this

  1. You are reaching the point where mem usage gets really close the OOM range (typically around 1 GB private bytes) so the process is probably working really hard to get rid of everything it can.  # Induced GC here is not 0 which means that someone called GC.Collect, and unless it is your app that calls GC.Collect it is probably ASP.NET doing this to clean up at an OOM.

    Each induced GC will be a full GC which can get pretty costly performance wise.

    Recommendation: Look at general memory usage, i.e. look at !dumpheap –stat etc. and see what is using all this .net memory and try to find places where you can reduce the memory usage, or alternatively, if you need to use all this memory, then you would need to move to a 64-bit environment.  (You might want to look at some of my posts on troubleshooting .net memory issues to narrow this down).  A potential also, if you are running multiple sites here, would be to separate them into different app pools.

  2. Your LOH looks pretty large (307 MB), if you are continuously allocating on the LOH you will also trigger a lot of full GC’s.

    Recommendation:   have a look at !dumpheap –min 85000 and see if there are any unexpected large objects, or if there are any that you can reduce in size or in number of objects. (See http://blogs.msdn.com/b/tess/archive/2006/06/22/asp-net-case-study-high-cpu-in-gc-large-objects-and-high-allocation-rates.aspx  for more details)

Regarding the finalizer, the finalizer is not stuck, it is sitting in mscorwks!SVR::WaitForFinalizerEvent+0x4a which means that it is just idling, waiting for the GC to tell it to finalize. 

I have a post with common asp.net threads and how they look in the “normal” idling state here http://blogs.msdn.com/b/tess/archive/2007/04/02/things-to-ignore-when-debugging-an-asp-net-hang-update-for-net-2-0.aspx that might be useful for further debugging.

You can further see that finalization is not a problem because !finalizequeue reports no objects to ready for finalization in either heap “Ready for finalization 0 objects”.  Here you can find an example of what it would look like if the finalizer was actually stuck.

Hope that helps,
Tess