Real Case: IIS and ASP.NET hanging or poor performance (intermediate)

This week I'm working on a case involving an IIS 6 server hosting an ASP.NET application (a C# app) whose performance is poor even when a very low number of users is hitting it. The architecture is the classic web layer accessing backend DB servers, and my goal is to isolate the problem, i.e. pinpoint where it resides and try to find out why.

The first step, after understanding the environment and identifying all the relevant servers and infrastructure devices, is to actually watch the problem happen so we have a better picture of what is going on. We had a single user access the application, and there are indeed some actions the user takes on the pages that take a long time to return, so the next step is to establish a baseline and agree on an SLA for the performance the application is supposed to deliver.

Another very relevant point in this scenario is that the environment relies on an NLB cluster composed of multiple IIS servers, so we first need to determine which IIS server the client will connect to when we start trying to reproduce the problem. After checking their NLB configuration and making sure affinity is set to Single, we can simply open the IIS logs of the servers and search for the client IP in the c-ip field. Since affinity is Single, the server that has been serving the client will continue to be the same.
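When there are many log files to search, that c-ip lookup is easy to script. Here is a minimal Python sketch; it assumes the standard W3C extended log format with a #Fields header line, and the file names and IPs are of course hypothetical:

```python
# Sketch: find which NLB node has been serving a client by scanning the
# IIS W3C logs for its c-ip. File names, paths, and IPs are hypothetical.
import glob

def servers_for_client(log_glob, client_ip):
    """Return the set of log files (one per server) that contain client_ip."""
    hits = set()
    for path in glob.glob(log_glob):
        fields = []
        with open(path) as f:
            for line in f:
                if line.startswith('#Fields:'):
                    fields = line.split()[1:]   # e.g. ['date', 'time', 'c-ip', ...]
                elif not line.startswith('#') and fields:
                    row = dict(zip(fields, line.split()))
                    if row.get('c-ip') == client_ip:
                        hits.add(path)
                        break                   # one match per file is enough
    return hits
```

With one log share per server, whichever file turns up a hit tells you which node the client has been stuck to.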

Once we know which server we will work with during the tests, my favorite tool to create the baseline is System Monitor (perfmon.exe), and we can also do some basic analysis of the IIS logs. So I set up perfmon with some specific counters that will help me establish a baseline of how the app is currently behaving, and also made sure the IIS logs include the fields I need to work with when establishing the baseline.

The counters I used in the perfmon were the following:

ASP.NET v1.1.4322\Applications Running
ASP.NET v1.1.4322\Request Execution Time
ASP.NET v1.1.4322\Request Wait Time
ASP.NET v1.1.4322\Requests Current
ASP.NET v1.1.4322\Requests Queued
ASP.NET v1.1.4322\State Server Sessions Active
ASP.NET v1.1.4322\Worker Process Restarts
ASP.NET v1.1.4322\Worker Processes Running


The fields I needed to make sure are included in the IIS logs are the following (the c-ip field used above, plus the fields referenced by the LogParser queries later on):

date
time
c-ip
cs-uri-stem
cs-uri-query
sc-status
sc-win32-status
time-taken
sc-bytes
cs-bytes

After that we reproduced the problem (which, by the way, was really happening with only one user accessing the application, already eliminating any hypothesis that the problem is related to the amount of load hitting the server) and started working on the baseline.

The counters in perfmon pretty much confirm what we see in the IIS logs: the main application page takes, on average, longer than 40 seconds to execute. The perfmon counter "ASP.NET v1.1.4322\Request Execution Time" reports that, and so do the IIS logs if you use the LogParser tool and run the following queries against them.

First we need to consolidate all the IIS logs in a single CSV file:

LogParser "SELECT date, time, cs-uri-stem, cs-uri-query, sc-status, sc-win32-status, time-taken, sc-bytes, cs-bytes INTO merged.csv FROM ex061024.log, ex061025.log" -o:CSV

Now we run the following query to get the average time-taken for the URLs in the merged log and send the output to a new file named AvgTimeTaken.txt (still CSV-formatted):

logparser "SELECT cs-uri-stem, COUNT(cs-uri-stem), AVG(time-taken) INTO AvgTimeTaken.txt FROM merged.csv WHERE time-taken >= 10000 GROUP BY cs-uri-stem ORDER BY AVG(time-taken) DESC" -o:CSV -i:CSV

The result is as follows, with /App/Mainpage.aspx at the top of the list, averaging more than 40,000 ms of time-taken:

cs-uri-stem    COUNT(ALL cs-uri-stem)    AVG(ALL time-taken)

<cut for brevity>
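If LogParser isn't at hand, the same aggregation can be sketched in a few lines of Python; this assumes the merged CSV carries a header row naming the fields:

```python
# Sketch: average time-taken per cs-uri-stem, mirroring the LogParser query
# (WHERE time-taken >= 10000, GROUP BY cs-uri-stem, ORDER BY avg DESC).
import csv
from collections import defaultdict

def slow_urls(csv_path, threshold_ms=10000):
    stats = defaultdict(lambda: [0, 0])        # url -> [request count, total ms]
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            taken = int(row['time-taken'])
            if taken >= threshold_ms:          # WHERE time-taken >= 10000
                stats[row['cs-uri-stem']][0] += 1
                stats[row['cs-uri-stem']][1] += taken
    # (url, count, average) tuples, slowest average first
    return sorted(((url, count, total / count)
                   for url, (count, total) in stats.items()),
                  key=lambda r: r[2], reverse=True)
```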

Ok, now that we have seen the problem happening and have our baseline, it's time to start working on isolating the problem.

We don't know exactly what the user does to make mainpage.aspx take so long to execute, so we try to reproduce the problem again while following the perfmon counter "ASP.NET v1.1.4322\Requests Current". Tracking "Requests Current" will not work for all scenarios: when the server is under some load, you will likely not be able to tell whether a specific request is taking long, because there will be many requests in flight at the same time and they will keep changing the value of "Requests Current". For those cases we would need the collaboration of the user (who would tell us when the page is taking long to return) or would have to look at other counters such as "Requests Queued". However, for this specific scenario, where we have the variables under control and there is a single user testing the system for us, it works just fine.

When "Requests Current" shows 1 request and doesn't get back to zero after, let's say, 10 seconds, we know that request is taking too long to execute (remember that our baseline is around 40 seconds). Now is the right time to generate a hang dump of the worker process. To do that, we first run the iisapp.vbs script to identify the PID of the worker process hosting our application pool, and then we can use any tool, such as the Debugging Tools for Windows or the Debug Diagnostics Tool (DebugDiag), to generate multiple hang dumps a few seconds apart from each other. In this case I used adplus.vbs from the Debugging Tools for Windows with the following syntax:

adplus.vbs -hang -p 2364 -quiet -o c:\dumps

This generates a memory dump of the w3wp.exe whose PID is 2364 and stores it in the c:\dumps folder.
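The decision of when to take those dumps follows the "Requests Current" rule described above, and it can be sketched as a small function. This is only an illustration of the trigger logic: the counter samples are passed in as (seconds, value) pairs, and how you collect them (perfmon, typeperf, a WMI query) is left out of the sketch.

```python
# Sketch of the trigger described above: take a hang dump whenever
# "Requests Current" stays above zero for longer than a threshold.
def dump_trigger_times(samples, threshold_s=10):
    """Yield timestamps at which a hang dump should be taken."""
    busy_since = None
    for t, requests_current in samples:
        if requests_current > 0:
            if busy_since is None:
                busy_since = t              # a request became outstanding
            elif t - busy_since >= threshold_s:
                yield t                     # stuck past the threshold: dump now
                busy_since = t              # rearm so multiple dumps can be taken
        else:
            busy_since = None               # counter went back to zero: reset
```

The rearm on each trigger is what gives you the "multiple hang dumps some seconds apart" that the comparison between dumps relies on later.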

Analyzing the dumps requires just some basic .NET debugging steps (I used WinDbg from the Debugging Tools for Windows to analyze them), as below.

This is a managed application (it’s aspx) so we want to make sure we have, besides the proper symbols, the sos.dll extension properly loaded:

0:000> .chain

Extension DLL chain:

    aspvbscriptext.dll: API 1.0.0, built Fri Feb 06 18:20:44 2004

        [path: C:\debuggers\aspvbscriptext.dll]

    mdacexts.dll: image 2004.05.11.1, API 1.0.0, built Tue May 11 17:46:12 2004

        [path: C:\debuggers\mdacexts.dll]

    webdbg: API 3.5.5, built Thu Nov 15 18:26:28 2001

        [path: C:\debuggers\webdbg.dll]

    sieext: API 1.0.0, built Fri Nov 05 16:07:40 2004

        [path: C:\debuggers\winext\sieext.dll]

    dbghelp: image 6.6.0005.0, API 6.0.6, built Fri Apr 14 17:03:13 2006

        [path: C:\debuggers\dbghelp.dll]

    ext: image 6.6.0005.0, API 1.0.0, built Fri Apr 14 17:02:35 2006

        [path: C:\debuggers\winext\ext.dll]

    exts: image 6.6.0005.0, API 1.0.0, built Fri Apr 14 17:02:30 2006

        [path: C:\debuggers\WINXP\exts.dll]

    uext: image 6.6.0005.0, API 1.0.0, built Fri Apr 14 17:02:37 2006

        [path: C:\debuggers\winext\uext.dll]

    ntsdexts: image 6.0.5357.0, API 1.0.0, built Fri Apr 14 17:21:31 2006

        [path: C:\debuggers\WINXP\ntsdexts.dll]

    clr10\sos: image 6.6.0005.0, API 1.0.0, built Fri Apr 14 17:02:47 2006

        [path: C:\debuggers\clr10\sos.dll]

Ok, now that we know we have the right extension loaded, let's find which threads are being used to process the ASP.NET mainpage.aspx. We expect to see only one thread here, since our user is simply accessing this single page. Looking at the first dump we see the following:


0:000> !aspxpages

Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll"

Going to print all threads with ASP.NET Pages running.

Thread 11: /App/Mainpage.aspx

Ok, as we suspected, there is only one thread processing the ASP.NET request, and it's thread 11. Now let's look at what this thread is currently doing:

0:011> !clrstack
Thread 11
ESP         EIP
0x0604f4dc  0x7ffe0304 [FRAME: NDirectMethodFrameStandalone] [DEFAULT] UI2 System.Data.Common.UnsafeNativeMethods/Dbnetlib.ConnectionRead(ValueClass System.Runtime.InteropServices.HandleRef,SZArray UI1,UI2,UI2,UI2,ByRef I)
0x0604f4f8  0x069cf5e4 [DEFAULT] [hasThis] Void System.Data.SqlClient.TdsParser.ReadNetlib(I4) <-- it seems we have some SQL stuff on the stack...
0x0604f558  0x069cf531 [DEFAULT] [hasThis] Void System.Data.SqlClient.TdsParser.ReadBuffer()
0x0604f560  0x069cf463 [DEFAULT] [hasThis] UI1 System.Data.SqlClient.TdsParser.ReadByte()
0x0604f568  0x069ced7a [DEFAULT] [hasThis] Boolean System.Data.SqlClient.TdsParser.Run(ValueClass System.Data.SqlClient.RunBehavior,Class System.Data.SqlClient.SqlCommand,Class System.Data.SqlClient.SqlDataReader)
0x0604f5b0  0x06aa3c79 [DEFAULT] [hasThis] Void System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
0x0604f5c4  0x06aa3c0e [DEFAULT] [hasThis] SZArray Class System.Data.SqlClient._SqlMetaData System.Data.SqlClient.SqlDataReader.get_MetaData()
0x0604f5cc  0x06aa1be7 [DEFAULT] [hasThis] Class System.Data.SqlClient.SqlDataReader System.Data.SqlClient.SqlCommand.ExecuteReader(ValueClass System.Data.CommandBehavior,ValueClass

<cut for brevity>

So the managed stack for thread 11 seems to be waiting on a SQL connection. The native stack would be sitting in ntdll!NtWaitForSingleObject(), which basically means this thread is doing nothing but waiting. Let's dig a little further to see if we can identify which results we are waiting for from the SQL Server:

0:011> !dso
Thread 11
ESP/REG    Object     Name
eax        0x10073b34 System.Collections.Hashtable
ecx        0x10073b34 System.Collections.Hashtable
0x604f4c8  0x1801dad4 System.Byte[]
0x604f4d0  0x28061840 System.Data.SqlClient.TdsParser
0x604f4d4  0x1801dad4 System.Byte[]
0x604f4f8  0x28061840 System.Data.SqlClient.TdsParser
0x604f500  0x28090704 System.Data.SqlClient.SqlDataReader
0x604f504  0x28061840 System.Data.SqlClient.TdsParser
0x604f51c  0x28061840 System.Data.SqlClient.TdsParser
0x604f54c  0x2808fd28 System.Data.SqlClient.SqlCommand <-- this is what we're looking for

<cut for brevity>

Let's take a closer look at this specific object:

0:011> !do 0x2808fd28
Name: System.Data.SqlClient.SqlCommand
MethodTable 0x06854dd8
EEClass 0x0657fb60
Size 64(0x40) bytes
GC Generation: 0
mdToken: 0x02000194 (c:\windows\assembly\gac\\1.0.5000.0__b77a5c561934e089\*)
FieldDesc*: 0x068545c4
MT         Field     Offset Type           Attr     Value      Name
0x79b80d44 0x40000b6 0x4    CLASS          instance 0x00000000 __identity
0x0649824c 0x4000502 0x8    CLASS          instance 0x00000000 site
0x0649824c 0x4000503 0xc    CLASS          instance 0x00000000 events
0x0649824c 0x4000501 0      CLASS          shared   static EventDisposed
  >> Domain:Value 0x00099230:NotInit 0x000dfad8:0x100144d8 <<
0x06854dd8 0x4000eac 0x10   CLASS          instance 0x10032b90 cmdText <-- this string seems promising
0x06854dd8 0x4000ead 0x24   System.Int32   instance 4          cmdType
0x06854dd8 0x4000eae 0x28   System.Int32   instance 3600       _timeout
0x06854dd8 0x4000eaf 0x2c   System.Int32   instance 3          _updatedRowSource
0x06854dd8 0x4000eb0 0x38   System.Boolean instance 0          _inPrepare
0x06854dd8 0x4000eb1 0x30   System.Int32   instance -1         _handle
0x06854dd8 0x4000eb2 0x14   CLASS          instance 0x28090450 _parameters
0x06854dd8 0x4000eb3 0x18   CLASS          instance 0x2808f624 _activeConnection
0x06854dd8 0x4000eb4 0x39   System.Boolean instance 0          _dirty
0x06854dd8 0x4000eb5 0x3a   System.Byte    instance 0          _execType
0x06854dd8 0x4000eb6 0x1c   CLASS          instance 0x00000000 _cachedMetaData
0x06854dd8 0x4000eb7 0x34   System.Int32   instance -1         _rowsAffected
0x06854dd8 0x4000eb8 0x3b   System.Boolean instance 0          designTimeVisible
0x06854dd8 0x4000eb9 0x20   CLASS          instance 0x00000000 _transaction

Looking at that cmdText string instance:

0:011> !do 0x10032b90
Name: System.String
MethodTable 0x79b7daf8
EEClass 0x79b7de44
Size 80(0x50) bytes
GC Generation: 2
mdToken: 0x0200000f (c:\windows\\framework\v1.1.4322\mscorlib.dll)
String: dbo.[spGenerateOptions] <-- Bingo! This is the stored procedure we've been waiting on to return
FieldDesc*: 0x79b7dea8
MT         Field     Offset Type         Attr     Value Name
0x79b7daf8 0x4000013 0x4    System.Int32 instance 31    m_arrayLength
0x79b7daf8 0x4000014 0x8    System.Int32 instance 30    m_stringLength
0x79b7daf8 0x4000015 0xc    System.Char  instance 0x64  m_firstChar
0x79b7daf8 0x4000016 0      CLASS        shared   static Empty
  >> Domain:Value 0x00099230:0x28000224 0x000dfad8:0x28000224 <<
0x79b7daf8 0x4000017 0x4    CLASS        shared   static WhitespaceChars
  >> Domain:Value 0x00099230:0x28000238 0x000dfad8:0x2000364c <<

So we know that, at the point this dump was taken, we were simply waiting for the stored procedure spGenerateOptions to complete its execution on the SQL Server.
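As an aside, when you have to repeat this !dso-then-!do dance across many dumps, the object lookup can be scripted against saved debugger output. A minimal sketch, assuming the address/address/type-name line format shown above:

```python
# Sketch: scrape a saved !dso listing for the first object of a given type,
# so the follow-up "!do <address>" command can be scripted.
import re

def find_object(dso_output, type_name):
    """Return the address of the first object of type_name, or None."""
    pattern = re.compile(r'(0x[0-9a-fA-F]+)\s+' + re.escape(type_name) + r'\b')
    for line in dso_output.splitlines():
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None
```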

We will now look at the subsequent dump (taken 30 seconds later) to verify whether that thread made any progress; however, let's take a look at one last thing in this dump before starting with the other one (it can save us a lot of time here):

0:011> !runaway

 User Mode Time

  Thread Time

  13:d5c 0 days 0:00:37.500

  14:c00 0 days 0:00:24.718

  16:9c8 0 days 0:00:16.406

  15:9e8 0 days 0:00:13.562

  11:bb8 0 days 0:00:12.015 <-- This is the CPU time spent by thread 11 at this point

So we know that thread 11 had consumed approximately 12.015 seconds of CPU time when this dump was generated. After opening the second dump, before jumping into any deeper analysis, let's just run the same command and check whether thread 11 made any progress:

0:011> !runaway

 User Mode Time

  Thread Time

  13:d5c 0 days 0:00:37.500

  14:c00 0 days 0:00:24.718

  16:9c8 0 days 0:00:16.406

  15:9e8 0 days 0:00:13.562

  11:bb8 0 days 0:00:12.015 <-- The time is exactly the same 30 seconds later

Ok, so not a single additional millisecond of CPU has been spent by thread 11 in the second dump. At this point we don't even need to look at the thread stack to know it's exactly the same. In other words, this thread has been doing nothing but waiting for that stored procedure to execute on the SQL Server.
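This !runaway comparison is mechanical enough to automate when you have more dumps or more threads to compare. A rough sketch that parses two saved !runaway listings and reports the threads whose user-mode time did not move:

```python
# Sketch: parse two saved "!runaway" listings and report threads whose
# user-mode CPU time did not advance between dumps -- those threads are
# waiting, not working.
import re

_LINE = re.compile(r'(\d+):[0-9a-f]+\s+(\d+) days (\d+):(\d+):(\d+)\.(\d+)')

def parse_runaway(text):
    """Map thread id -> user-mode CPU seconds from !runaway output."""
    times = {}
    for m in _LINE.finditer(text):
        tid, days, hours, mins, secs, ms = (int(g) for g in m.groups())
        times[tid] = ((days * 24 + hours) * 60 + mins) * 60 + secs + ms / 1000.0
    return times

def stalled_threads(first_dump, second_dump):
    """Thread ids present in both dumps with identical CPU time."""
    a, b = parse_runaway(first_dump), parse_runaway(second_dump)
    return sorted(t for t in a if t in b and b[t] == a[t])
```

Feeding it the two listings above would flag thread 11 immediately, since its time is byte-for-byte identical across the dumps.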

The problem has been isolated as a database performance issue (mission accomplished!). The next step is to work on the database server (running on another box) to understand why this stored procedure takes so long to execute and, of course, to properly address the issue and provide a definitive solution. IIS ended up being just the server where the symptom shows up, not the one where the root cause of the problem really lives.