Several Good Ways to Trigger a “Hang Dump” of an Unresponsive Process
Introduction and Summary
Imagine a traffic jam where you can see cars and trucks backed up for miles. You can see the symptoms and know there is a problem. You know things have slowed to a crawl or to a stop. But that's all you can see from your vantage point. You can't see the choke point. You can't see what the real cause is of the traffic jam. All you can see are the lines of backed up cars. And it's your job to fix it! What do you do?
You can make guesses. "A car probably ran into another car and emergency response crews are probably slowing traffic down." That sounds reasonable.
But you have no idea which cars collided if indeed there was a collision. You have no idea if the emergency response crews are really there or not and if they're going to take care of the problem or not.
If only you could get a good glimpse of the root cause, right? Even just a good momentary snapshot of a few traffic cameras in the area would help unravel the mystery.
If you could see the snapshot from a webcam you might be surprised at what you find. . .
A parade of elephants?
That one snapshot can greatly increase your understanding of the root cause of the problem, and it brings you far closer to finding a solution.
This is what I like about making "hang dumps" of user-mode processes when they're unhealthy. With a good memory dump made at the right time you can see what the various worker threads are doing. Some are doing work, some are waiting in a good way, one is causing a traffic jam, and several others are blocked by the jam. And with that snapshot of who is doing what, you get a chance to see what is causing the problem. You start to be able to see what is smoke and what is fire.
When a website hosted in IIS becomes inordinately slow or unresponsive, I am quick to recommend creating a dump of the hung w3wp.exe process.
When SharePoint workflows become unresponsive, I am quick to recommend creating a dump of the owstimer.exe process to begin to pinpoint the root cause.
When any process becomes "hung," "jammed," "blocked," or otherwise unresponsive, I'm quick to trip a dump while the hang is occurring.
This blogpost recommends several good ways to create a "hang dump" of a process. First we'll see how to dump out a process with Task Manager. Then we'll see three ways to get the hang dump with Debug Diag 2.0.
Perhaps the quickest and easiest way on a Windows Server 2008, Windows Server 2008 R2, Windows 7, Windows 8, or Windows Server 2012 computer is to use Task Manager. If Outlook were unresponsive, for example, I would open Task Manager, select the Processes tab, find the process in the list, right-click it, select "Create Dump File," wait for the dump file to be written, and then choose End Task to kill the hung process. The beauty of this method is that Task Manager is already there. You don't have to install any tools. The memory dump it creates should be just as viable as dumps made by other tools.
Open Task Manager (one good way to do this is to press "Ctrl" + "Shift" + "Esc" simultaneously) and select the Processes tab.
If you know which process is unhealthy, you can locate it in the list of processes, right-click it, and select CREATE DUMP FILE.
Yeah, it can be that easy! Just make sure you dump it while it is unhealthy/unresponsive.
One common dilemma is that if you have multiple processes with the same name (often the case with w3wp.exe), you may not know which one to dump. This is the main downside of using Task Manager to get the dump. If you have a web server or SharePoint Web Front End server that has multiple Application Pools, sometimes it takes a bit of detective work to figure out which w3wp.exe corresponds to which application pool.
If you're not sure which one needs to be dumped, you could just methodically right-click and dump every one of the processes with the same process name.
Or maybe the "user name" that each process runs as will tell you which w3wp.exe corresponds to which application pool. But that's not always going to work, because several App Pools might use the same user ID.
Or maybe you can tell by the size of the memory (private bytes) footprint. It's not uncommon for the unhealthy w3wp.exe to be the largest. But that's not always the case.
If the problem is due to high CPU, it may be quite obvious in Task Manager which process to dump. But often an unresponsive process has very low CPU usage, so CPU usage isn't always a good indicator.
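If you would rather list every same-named process from a script than eyeball them in Task Manager, here is a minimal sketch. It assumes the CSV layout that `tasklist /FI "IMAGENAME eq w3wp.exe" /FO CSV /NH` prints (image name, PID, session name, session number, memory usage); the sample text is made up for illustration. Note that the memory column tasklist reports is not necessarily the same figure as private bytes, so treat the ordering as a hint only.

```python
import csv
import io


def parse_tasklist_csv(text):
    """Parse `tasklist /FO CSV /NH` output into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # skip blank lines and informational messages
        image, pid, _session, _session_num, mem = fields[:5]
        # Memory looks like "512,340 K"; keep only the digits.
        digits = "".join(ch for ch in mem if ch.isdigit())
        rows.append({
            "image": image,
            "pid": int(pid),
            "mem_kb": int(digits) if digits else 0,
        })
    return rows


# Hypothetical sample output, for illustration only.
SAMPLE = '''"w3wp.exe","4312","Services","0","512,340 K"
"w3wp.exe","5120","Services","0","98,112 K"
'''

# Largest memory footprint first, since that is one of the hints above.
for proc in sorted(parse_tasklist_csv(SAMPLE), key=lambda r: r["mem_kb"], reverse=True):
    print(proc["pid"], proc["mem_kb"])
```

On a real server you would feed the function the captured output of the tasklist command instead of the sample string.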
One good way to figure out which w3wp.exe corresponds to which application pool is to run an appcmd command. Open a command prompt on the web server and run the following command to see which Application Pools line up with which PID numbers:
%windir%\system32\inetsrv\appcmd.exe list wp
Then of course I have to hunt around in Task Manager to find out how to view the PID (process identifier) of each process. (In the Windows 7 and Windows Server 2008 versions, use View > Select Columns on the Processes tab; in Windows 8 and Windows Server 2012, the Details tab shows a PID column.) But at least I can quickly find out which w3wp.exe is the one to dump. If you know which website is unresponsive, which application pool that website is assigned to, and the PID of that application pool's worker process, you should be able to figure out which process to dump in Task Manager.
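If you want to line up PIDs and application pools in a script, here is a small sketch that parses the output of the appcmd command above. It assumes each worker process is reported on a line shaped like `WP "3596" (applicationPool:DefaultAppPool)`; the PIDs and pool names in the sample are invented for illustration.

```python
import re

# Assumed line shape: WP "<pid>" (applicationPool:<pool name>)
WP_LINE = re.compile(r'^WP "(\d+)" \(applicationPool:(.*)\)\s*$')


def parse_appcmd_list_wp(text):
    """Return a {pid: application pool name} mapping from `appcmd list wp` output."""
    pools = {}
    for line in text.splitlines():
        match = WP_LINE.match(line.strip())
        if match:
            pools[int(match.group(1))] = match.group(2)
    return pools


# Hypothetical sample output, for illustration only.
SAMPLE = '''WP "3596" (applicationPool:DefaultAppPool)
WP "4120" (applicationPool:SharePoint - 80)
'''

for pid, pool in sorted(parse_appcmd_list_wp(SAMPLE).items()):
    print(pid, pool)
```

With the mapping in hand, the PID to look for in Task Manager is simply the key whose value is the app pool hosting the unresponsive site.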
Another possible downside of using TaskMan for creating your dumps is that by default it will write the dump to a path like C:\Users\UserName\AppData\Local\Temp\ProcessName.DMP. That's not the easiest place to find. And maybe you don't want large dump files on your server's C drive.
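If the dump location bothers you, one option is to move the file right after Task Manager finishes writing it. The sketch below is only an illustration: the destination folder `D:\Dumps` in the usage comment is hypothetical, and the file matching assumes the dump is named after the process with a .DMP extension, as in the path above.

```python
import shutil
from pathlib import Path


def move_latest_dump(src_dir, dest_dir, process_stem):
    """Move the newest <process_stem>*.dmp file from src_dir to dest_dir.

    Returns the new path, or None if no matching dump was found.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    candidates = [
        p for p in src.iterdir()
        if p.is_file()
        and p.suffix.lower() == ".dmp"
        and p.name.lower().startswith(process_stem.lower())
    ]
    if not candidates:
        return None
    # Pick the most recently written dump in case older ones are still around.
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / newest.name
    shutil.move(str(newest), str(target))
    return target


# Example call (hypothetical destination folder):
#   import os
#   move_latest_dump(os.environ["TEMP"], r"D:\Dumps", "w3wp")
```

Make sure the dump has finished writing before moving it; a full dump of a large process can take a while.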
Despite these two small drawbacks, all in all it is difficult to beat the convenience of the Taskman hang dump.
DebugDiag 2.0 – Installation and Launch
To have additional flexibility in making hang dumps, consider installing Debug Diagnostics 2.0 on the server suffering from an unresponsive process. Download it from http://debugdiag.com.
When you launch debugdiag.msi it will give you an option to change the default installation path. If you prefer to not install tools to the system partition, be sure to select the Browse button in the earliest phase of the wizard.
Launch Debug Diagnostics 2.0 Collection from the list of programs.
If UAC (User Account Control) is enabled on the server, you may need to use the "Run as Administrator" option to avoid errors during launch.
If you installed the tool to your system drive but would prefer that the memory dumps go to a different drive, expand the Tools menu and select Options and Settings.
Change the manual user dump save folder path and click OK.
DebugDiag – Option 1: Create Full User Dump from the Processes Tab
After DebugDiag launches, click the CANCEL button on the Select Rule Type window.
Switch to the Processes tab in DebugDiag.
Sort by process name or by whatever column makes the most sense.
For IIS web app pool problems, note especially the column for Web Application Pool name. With this cool feature there is no need to run appcmd list wp.
To trigger the hang dump, right-click the process you wish to dump and select "Create Full Userdump."
Be sure not to select MINI userdump.
DebugDiag – Option 2: Dump ALL the IIS processes from the Tools menu
If all you know is that one of your IIS websites has a problem and you don't have time to figure out which application pool to focus on, you might consider expanding the Tools menu and selecting CREATE IIS/COM+ HANG DUMP.
This will create hang dumps of all the IIS related processes.
DebugDiag – Option 3: Set up a performance rule to detect the problem and automagically dump the process
Sometimes the hangs happen at 4 AM and no one is around to respond to the hang.
In that case, you might want to create a performance rule to detect the hang and automatically make the dump.
From the rules tab, select Add Rule, place a bullet beside Performance, and click NEXT.
I'm not going to go through these scenarios here, but check out how cool these options are.
Your "performance rule" can predicate actions (such as triggering a hang dump) based on what perf counters say or based on checking a URL for responsiveness.
Definitely a cool option to have!
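To make the idea of "checking a URL for responsiveness" concrete, here is a rough sketch of that kind of check. This is not how DebugDiag implements its performance rules; it is only an illustration of the logic, and the function names, timeout, and attempt count are all my own assumptions.

```python
import time
import urllib.request


def url_is_responsive(url, timeout_s, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status within timeout_s."""
    try:
        with opener(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # Covers timeouts, refused connections, and HTTP error responses,
        # all of which urllib raises as OSError subclasses.
        return False


def consecutive_failures(probe, attempts, pause_s=0.0):
    """Run probe() up to `attempts` times; True only if every attempt fails."""
    for i in range(attempts):
        if probe():
            return False  # one good response means "not hung"
        if i < attempts - 1:
            time.sleep(pause_s)
    return True


# Example wiring (hypothetical URL and thresholds):
#   hung = consecutive_failures(
#       lambda: url_is_responsive("http://localhost/health.aspx", 30),
#       attempts=3, pause_s=10)
#   if hung:
#       ...trigger the dump with the tool of your choice...
```

Requiring several failures in a row is a deliberate choice: a single slow response is not the same thing as a hang, and you don't want a dump every time the server hiccups.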
Preliminary Analysis of the Hang Dump
This step can be done on the server itself if (1) the server has outbound HTTP access to the internet (specifically to Microsoft's public symbol servers) and (2) you don't mind running something that is fairly CPU intensive. It may make more sense to install Debug Diag 2.0 (or just the analysis piece) on a workstation that can access the internet and place the .dmp file on a share that it can reach.
Launch the Debug Diag Analysis program from the list of programs.
Place a checkmark beside CrashHangAnalysis. Don't select any other analysis rules.
Select "Add Data Files" and point it to the dump file.
Select START ANALYSIS and wait for results to display.
Zipping the Dump
If you'd like to zip the dump file up in preparation to upload to an engineer at Microsoft, here is a good way to do it.
Expand the debugdiag Tools menu, select Advanced Data Collection, select Create Full Cabinet file. This should both collect and compress the event logs, the .net config files, the dump files, and more into one convenient .cab file.
You can locate the .cab file by clicking the icon of the manila file folder.
Some people really like ProcDump. I haven't used it much myself, but I thought it was worth a mention.
By Mark Russinovich
ProcDump is a command-line utility whose primary purpose is monitoring an application for CPU spikes and generating crash dumps during a spike that an administrator or developer can use to determine the cause of the spike. ProcDump also includes hung window monitoring (using the same definition of a window hang that Windows and Task Manager use), unhandled exception monitoring and can generate dumps based on the values of system performance counters. It also can serve as a general process dump utility that you can embed in other scripts.
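Since ProcDump is meant to be embedded in scripts, here is a small sketch of how a script might build its command line. I'm assuming the commonly documented switches (`-ma` for a full dump, `-h` for hung window monitoring, `-c` for a CPU threshold, `-accepteula` to skip the license prompt) and that procdump.exe is on the PATH; check `procdump -?` on your version before relying on any of them.

```python
def procdump_args(target, full=True, hung_window=False,
                  cpu_threshold=None, dump_path=None):
    """Build an argument list for ProcDump.

    target is a PID or process name. The switches used here are assumptions
    based on the tool's documented usage; verify them with `procdump -?`.
    """
    args = ["procdump.exe", "-accepteula"]
    if full:
        args.append("-ma")  # full dump, comparable to a full userdump
    if hung_window:
        args.append("-h")   # wait for a hung window before dumping
    if cpu_threshold is not None:
        args += ["-c", str(cpu_threshold)]  # dump when CPU exceeds this percent
    args.append(str(target))
    if dump_path:
        args.append(dump_path)
    return args


# Immediate full dump of a hung worker process (hypothetical PID):
print(" ".join(procdump_args(4312)))

# To actually run it on the server:
#   import subprocess
#   subprocess.run(procdump_args(4312), check=True)
```

As with Task Manager, the dump is only useful if it is taken while the process is actually hung.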