Real case: Application Pool’s worker process (w3wp.exe) crashing during recycling
I was teaching a workshop in London last week and one of the students brought a very interesting issue he had been working on for some time and based on the dump analysis he was not able to point to anything different from our own code (Microsoft’s modules) as the ones causing the crash. He had done some research and found out that some crashes when recycling the app pool have been reported around the internet due to a PHP ISAPI problem. He had such ISAPI loaded in the crashing app pool so he suspected from that but the dumps still point to our code…
Direct to point:
By looking at the dumps generated the [DebugDiag] we could see that the crash is a typical Access Violation (0xC0000005) as below:
The AV details:
0:000> .exr -1
ExceptionAddress: 5a3b01b4 (w3core!W3_SERVER::Terminate+0x00000164)
ExceptionCode: c0000005 (Access violation)
Attempt to read from address 608fc030
The thread stack when the crash happened was something like this:
0006fcb4 5a3bc347 w3core!W3_SERVER::Terminate+0x164
0006ff0c 0100187c w3core!UlW3Start+0x280
0006ff44 01001a27 w3wp!wmain+0x22a
0006ffc0 77e6f23b w3wp!wmainCRTStartup+0x12f
0006fff0 00000000 kernel32!BaseProcessStart+0x23
So as we can see, the w3core.dll was in the top of the stack and this would lead us to conclude this is the culprit module crashing the process. But let’s hold our conclusions for a moment… All the w3core.dll is trying to do is to read from an address that is actually part of another module (the ADMPROX.DLL) that is currently unloaded and that’s why we’re crashing here.
So the W3_SERVER::Terminate is trying to access an invalid address. The address being referenced is somewhere in ADMWPROX.DLL as below:
0:000> ln 608fc030
This module is already unloaded as below and that's why we're crashing.
608f0000 60901000 ADMWPROX.DLL
78130000 781cb000 MSVCR80.dll
60060000 60066000 aspnet_filter.dll
10000000 1004c000 ISAPI_Rewrite.dll
5a320000 5a332000 w3isapi.dll
022e0000 022f7000 odbcint.dll
77530000 775c7000 COMCTL32.dll
7c8d0000 7d0ce000 SHELL32.dll
762b0000 762f9000 comdlg32.dll
48890000 488cd000 ODBC32.dll
01c90000 02167000 php5ts.dll
76cf0000 76d0a000 iphlpapi.dll
03670000 03726000 ZendOptimizer.dll
02180000 02185000 ZendExtensionManager.dll
02440000 02609000 php_mbstring.dll
02610000 02621000 php_bz2.dll
The ADMPROX.DLL is also our code. So the question is why are we holding a reference for something already unloaded? We noticed that indeed there were also some PHP modules loaded (at the time the dump was generated they were unloaded as well). One possible reason would be somebody COM initializing on a not owned thread (I've seen this before in a case where the IIS mmc got locked because a reference being hold by an unloaded module that called COM initialize in one of the IIS threads instead of its own) - More details about that can be obtained in the following article:
Next step is to live debug and set a bp in the CoInitializeEx (http://msdn2.microsoft.com/en-us/library/ms695279.aspx) and find out which module would be doing that (if this is the case – this is a theory and we are about to test it). We created a new app pool and website and pointed the web site to the same content folder as the one crashing and started working with this worker process instead. We had some hard time using file image execution options and tyring to get a debugger attached when the worker process gets spawned. We workaround the situation by automating the cdb to attach and open a socket so we would be able to connect to the socket via Windbg (I will post something specifically about this later on).
After attaching the windbg and setting the proper bp as below, we caught who else is COM intializing as below:
Setting the bp at the Ole32::CoInitializeEx
[116714-SERVER1\rack (tcp 192.168.1.202:1083)] 0:000> bp 776bc826 ".time;k100;g"
[116714-SERVER1\rack (tcp 192.168.1.202:1083)] 0:000> bl
0 e 776bc826 0001 (0001) 0:**** ole32!CoInitializeEx ".time;k100;g"
Bp got hit! This is php5ts.dll as below:
Debug session time: Thu Aug 2 19:50:03.357 2007 (GMT+1)
System Uptime: 8 days 14:47:04.585
Process Uptime: 0 days 0:01:08.125
Kernel time: 0 days 0:00:00.046
User time: 0 days 0:00:00.093
00fbe744 776a0d1f ole32!CoInitializeEx
00fbe754 01c186f8 ole32!CoInitialize+0xf
00fbe7b4 10001ec8 php5ts!php_request_startup+0x18
00fbecd4 5a3968ff w3isapi!ProcessIsapiRequest+0x100
00fbed08 5a3967e0 w3core+0x68ff
00fbed28 5a396764 w3core+0x67e0
00fbed48 5a3966f4 w3core+0x6764
This is not a PHP thread, so they were right when suspecting from PHP modules. You can’t call CoInitializeEx in a thread not owned by you. Next step is contact PHP vendor and let them know about the problem.
Later the customer informed us they were able to workaround the problem by simple downgrading the version of the PHP modules. This seems to happen only in the new PHP ISAPI versions.
Anyway it was a very good sample to ilustrate some debugging during the training.