A situation when performance of a website go down after enabling WebGarden



An Asp website hangs intermittently and we saw the following errors in event logs

Event Type: Warning Event Source: W3SVC-WP Event Category: None Event ID: 2262 Date: 24/03/2010 Time: 14:37:58 User: N/A Computer: compurtername Description: ISAPI 'C:\WINDOWS\system32\inetsrv\asp.dll' reported itself as unhealthy for the following reason: 'ASP unhealthy because 98% of executing requests are hung and 100% of the request queue is full.'.

Issue started after we enabled web gardening on the website.




We took some hang dumps of IIS and analyzed them using Windbg. Most of the threads were busy doing activities like below


0:136> kb 100
ChildEBP RetAddr Args to Child
2de2e984 7c826e09 77e649ff 2de2ea2c 40110080 ntdll!KiFastSystemCallRet
2de2e988 77e649ff 2de2ea2c 40110080 2de2e9c8 ntdll!ZwCreateFile+0xc
2de2ea24 77e41a8e 00000000 40000000 00000000 kernel32!CreateFileW+0x377
2de2ea48 0b6ee42b 2de2ea6c 40000000 00000000 kernel32!CreateFileA+0x30
WARNING: Stack unwind information not available. Following frames may be wrong.
2de2eaf4 77d05186 1332c040 066a9a6c 2de2eb58 AspUpload!DllUnregisterServer+0xcdfd
2de2eb14 77d14acf 1332c040 00000068 00000004 oleaut32!tPushValJmpTab+0x102
2de2eba4 0b6e1982 05ebd60c 1332c040 00000000 oleaut32!CTypeInfo2::Invoke+0x234
2de2ebcc 7347d36b 1332c040 00000017 734614f4 AspUpload!DllUnregisterServer+0x354
2de2ec20 73468ad1 01c0f918 1332c040 00000017 vbscript!CatchIDispatchInvoke+0x46
2de2ec60 73468a40 0a0c1ee0 1332c040 00000017 vbscript!IDispatchInvoke2+0xaf
2de2ec9c 734689f2 0a0c1ee0 1332c040 00000017 vbscript!IDispatchInvoke+0x59
2de2edb0 7346613b 0a0c1ee0 1332c040 00000017 vbscript!InvokeDispatch+0x13a
2de2edd4 73468b57 0a0c1ee0 1332c040 00000017 vbscript!InvokeByName+0x42
2de2f0b0 734633e0 2de2f434 00000001 0a0c1ee0 vbscript!CScriptRuntime::Run+0x2587
2de2f1a8 73463eb7 2de2f434 00000001 098f87e0 vbscript!CScriptEntryPoint::Call+0x5c
2de2f47c 734633e0 2de2f800 00000000 0a0c1ee0 vbscript!CScriptRuntime::Run+0x20d4
2de2f574 734665d7 2de2f800 00000000 098f8810 vbscript!CScriptEntryPoint::Call+0x5c
2de2f848 734633e0 00000000 00000000 0a0c1ee0 vbscript!CScriptRuntime::Run+0x1f08
2de2f940 734637d1 00000000 00000000 00000000 vbscript!CScriptEntryPoint::Call+0x5c
2de2f998 73463b9c 0edc47d0 00000000 00000000 vbscript!CSession::Execute+0xb4
2de2f9e8 73461849 00000000 00000000 709e19c0 vbscript!COleScript::ExecutePendingScripts+0x13e
2de2fa04 709e2bfa 02c02220 02c02220 02716e38 vbscript!COleScript::SetScriptState+0x150
2de2fa30 709e2bbc 00000000 709e19c0 0953f4e8 asp!CActiveScriptEngine::TryCall+0x19
2de2fa6c 709e2af2 00000000 6472474e 02a048b8 asp!CActiveScriptEngine::Call+0x31
2de2fa88 709e29f6 2de2fb0c 00000000 00000000 asp!CallScriptFunctionOfEngine+0x5b
2de2fadc 709e2921 02f02d58 00000000 2de2fb68 asp!ExecuteRequest+0x17e
2de2fb44 709e24e0 02f02d58 02a048b8 2de2fb68 asp!Execute+0x24c
2de2fb98 709e244a 00000000 00000000 07426770 asp!CHitObj::ViperAsyncCallback+0x3f0
2de2fbb4 4a77b5ea 02a94310 046be518 2de2fd74 asp!CViperAsyncRequest::OnCall+0x92
2de2fbd0 77720d30 07426770 000e2628 00000000 comsvcs!CSTAActivityWork::STAActivityWorkHelper+0x32
2de2fc1c 777217dc 00000000 000e2628 4a77b5b8 ole32!EnterForCallback+0xc4
2de2fd7c 776f03b4 2de2fc54 4a77b5b8 07426770 ole32!SwitchForCallback+0x1a3
2de2fda8 7769c194 000e2628 4a77b5b8 07426770 ole32!PerformCallback+0x54
2de2fe40 7772433a 046be518 4a77b5b8 07426770 ole32!CObjectContext::InternalContextCallback+0x159
2de2fe60 4a77b78c 046be518 4a77b5b8 07426770 ole32!CObjectContext::DoCallback+0x1c
2de2fecc 4a77bcf2 06e49950 06e49930 00132e64 comsvcs!CSTAActivityWork::DoWork+0x12d
2de2fee4 4a77c7de 07426770 00000001 06e49930 comsvcs!CSTAThread::DoWork+0x18
2de2ff04 4a77cabf 00000000 0f451ec0 097eba48 comsvcs!CSTAThread::ProcessQueueWork+0x37
2de2ff84 77bcb530 06e49930 00000000 00000000 comsvcs!CSTAThread::WorkerLoop+0x190
2de2ffb8 77e6482f 097eba48 00000000 00000000 msvcrt!_threadstartex+0x74
2de2ffec 00000000 77bcb4bc 097eba48 00000000 kernel32!BaseThreadStart+0x34


So the thread appears to be busy doing a CreateFile operation. Note that this may not necessarily be a normal create file operation we expect.

kernel32!CreateFileA is a generic function used for many activities.

However when we dump the argument to CreateFileA function


0:136> dc 2de2ea6c
2de2ea6c 35385c5c 2e32312e 372e3631 61645c35 \\\da
2de2ea7c 415c6174 79532057 6d657473 6d495c73 ta\XYZ Share\Ma
2de2ea8c 73656761 7564415c 6f57746c 745c6b72 inFolder\Subfolder\
2de2ea9c 3031375c 5f343537 706a2e31 7c820067 imagename.jpg..|
2de2eaac 2de2eadc 77e622ec 77e622f6 00000000 ...-.".w.".w....
2de2eabc 113f7ebc 00000000 2de2eab8 00000001 .~?........-....
2de2eacc 2de2ec10 77e61a60 77e62300 ffffffff ...-`..w.#.w....
2de2eadc 77e622f6 77e622f6 0b6ee400 00000000 .".w.".w..n.....

Note : The path has been modified for the sake of data privacy.


From above we see that the major part of the contention is when copying image files to a network path

\\\data\XYZ Share\MainFolder\Subfolder\imagename.jpg

All those requests are in ThirdPartyDll.dll creating image files on the share above . Some of them have been doing this for quite some time, so the reason for the contention that eventually leads to the health recycling is that it simply takes a long time to copy files over there, especially when many threads are copying files at the same time. Since this is a system wide resource (the network), the fact that web garden is turned on, only adds fuel to the fire by allowing even more concurrent requests that are all writing to the same share. In the other web garden instance there are once again lots of threads writing files to this same share so in total too many threads concurrently trying to write images to this share.

Then when we focussed on the the file share, we found it had followign kind of entries in event logs near the time the web server became unresponsive.


Event Type: Warning
Event Source: Srv
Event Category: None
Event ID: 2025
Date:  25/03/2010
Time:  13:14:45
User:  N/A
Computer: FileServer-
The server has detected an attempted Denial-Of-Service attack from client \\ClientIpAddress, and has disconnected the connection.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
0000: 00 00 00 00 02 00 48 00   ......H.
0008: 00 00 00 00 e9 07 00 80   ....é..€
0010: 00 00 00 00 22 00 00 c0   ...."..À
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........


This appeared in perfect sync with the dump analysis. The issue seems to be happening due to large number of simultaneous connections to the content server. Now this issue has been discussed in http://support.microsoft.com/kb/898468. The workaround 1 mentions how we can increase the connections (if it is a genuine load) and can help with this issue. The key is MaxMpxCt at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters which is set to a default value of 50.

We increased the value to 200 in this case and this fixed the issue.


So to summarize, by increasing the number of worker processes in an application pool, we increased teh number of such connections leading to a poor performance. The factors were therefore external but they got exposed after enabling web gardening. Enable WebGardening if we have ample of processor and hardware support. If you can go for multiple Application pools, prefer it over web gardening. But off course, in a situation like what we discussed, it may still cause issues.