HOWTO: Diagnose IIS6 failing to accept connections due to Connections_Refused
Recently, I have seen a bunch of questions asking: "IIS6 runs fine for X amount of time [where X varies from days to weeks] and then all of a sudden, it stops accepting all connections. If I restart/reboot the server, then it is fine again... until X amount of time later, when I have to repeat the same procedure."
Sometimes, the user even notices that HTTP Error logs in %SYSTEMROOT%\System32\LogFiles\HTTPERR\*.log mention Connections_Refused for all those requests, but what the heck does that mean?
Ok, ignoring the jokers in the back that are snickering "hmm, this is Microsoft software; what do you expect?" , this issue is actually very diagnosable and should be treatable, as I will shortly show...
Every four or five days the web site hoste on my machine will stop responding to HTTP requests on port 80.
IIS doesn't crash so I cannot get an IIS State log, just this:
Excerpt from the HTTPErr.Log:
(... working fine up until this ...)
2005-09-02 18:45:12 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:46:07 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:46:27 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:46:42 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:47:02 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:48:22 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:48:57 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:49:02 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:49:42 - - - - - - - - - 7_Connections_Refused -
2005-09-02 18:50:02 - - - - - - - - - 3_Connections_Refused -
2005-09-02 18:50:37 - - - - - - - - - 6_Connections_Refused -
2005-09-02 18:50:42 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:51:02 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:51:47 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:52:02 - - - - - - - - - 3_Connections_Refused -
2005-09-02 18:52:27 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:53:17 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:53:22 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:54:02 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:54:22 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:54:32 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:55:22 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:56:02 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:56:42 - - - - - - - - - 1_Connections_Refused -
2005-09-02 18:58:17 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:58:42 - - - - - - - - - 2_Connections_Refused -
2005-09-02 18:58:57 - - - - - - - - - 1_Connections_Refused -
I have to reboot the machine to get it to work again.
Does anyone have any ideas?
Connections_Refused is actually a very diagnosable failure pattern, and I am going to show you how to diagnose it and what is going on.
What does Connections_Refused Mean?
According to KB 820729, Connections_Refused means that the "Kernel Non Paged Pool memory has dropped below 20MB and HTTP.SYS has stopped receiving new connections". What all this means in English is this:
On Windows NT systems, there are many types of memory with different properties. Kernel Non Paged Pool (NPP) memory is one such type, and the important thing to know is that on x86 machines, the size of this memory is fixed and has a maximum size of 128MB. In other words, you may have an x86 machine with 4GB of RAM, but if you only use 128MB and that 128MB all comes from NPP memory, your system is still "out of memory" regardless of how much is actually in use.
[1/4/2006 - Correction - maximum size on x86 is "around 250MB" and 128MB with /3GB. For Windows Vista, it will be bounded by physical RAM and not arbitrary limit.]
HTTP.SYS, being a kernel-mode driver, uses NPP memory for every single connection that is active, and we made a conscious decision to have it stop accepting connections when NPP memory falls below a threshold, arbitrarily set at 20MB. This is reasonable because lots of bad things can occur if NPP memory is depleted.
For example: HTTP.SYS is likely the most active codebase in a web server, and if NPP memory gets depleted and HTTP.SYS needs NPP memory to accept a connection, the machine will blue screen. Now, regardless of the driver leaking NPP memory, HTTP.SYS looks like the culprit because it is most active. In any case, all of this is not good, so HTTP.SYS proactively refuses to get near the edge and plays it safe.
Hopefully, it is now clear what Connections_Refused means. HTTP.SYS is basically telling you "someone is using up a lot of NPP memory, and for protective reasons, I am going to stop accepting requests". We need to figure out what driver is using up all the NPP memory and address it, and the Connections_Refused should naturally go away because NPP memory will not be under pressure.
How to Diagnose this Issue
One of the nice properties of NPP memory is that a unique tag is associated with every piece of this memory, so we can always track down who is using what and how much of NPP memory. The user-mode tool to do this is called POOLMON.EXE (if you have a Kernel Debugger attached, you can use the !poolused command to get this info within the Kernel Debugger). Here are some additional bits of related information:
- KB 177415 - How to Use Memory Pool Monitor (Poolmon.exe) to Troubleshoot Kernel Mode Memory Leaks - since Windows Server 2003 has PoolTagging enabled by default, things just work.
- KB 298102 - How to Find Pool Tags that are used by Third-Party Drivers - this is basically searching for a NPP tag string within all drivers on the system hoping for a logical match...
I am now going to step through the POOLMON output of a user who had Connections_Refused and comment on what is going on. Here is a sample snippet of running POOLMON.EXE -b :
Memory: 1048016K Avail: 438396K PageFlts: 219 InRam Krnl: 3604K P:42372K Commit: 538908K Limit:11245092K Peak: 609576K Pool N:109056K P:43740K System pool information Tag Type Allocs Frees Diff Bytes Per Alloc ISil Nonp 1408366 ( 56) 1327427 ( 58) 80939 36678632 ( -872) 453<br> I100 Nonp 11048877 ( 217) 10967968 ( 219) 80909 14886928 ( -368) 183 PcNw Nonp 292427 ( 0) 88967 ( 0) 203460 12203296 ( 0) 59 SavE Nonp 1596066 ( 0) 1595659 ( 0) 407 11717016 ( 0) 28788 IAM Nonp 1946164 ( 49) 1904725 ( 50) 41439 8519576 ( -208) 205 tdLL Nonp 74748 ( 0) 69830 ( 4) 4918 2672832 ( -1776) 543 LSwi Nonp 1 ( 0) 0 ( 0) 1 2576384 ( 0) 2576384 R200 Nonp 22 ( 0) 1 ( 0) 21 2297816 ( 0) 109419 MmCm Nonp 302 ( 0) 48 ( 0) 254 2236960 ( 0) 8806 TCPt Nonp 190486 ( 4) 190462 ( 4) 24 1392264 ( 0) 58011 ULHP Nonp 3127 ( 0) 41 ( 0) 3086 1047216 ( 0) 339 PTrk Nonp 999750 ( 0) 996561 ( 0) 3189 956824 ( 0) 300 File Nonp 9628160 ( 65) 9623252 ( 65) 4908 750688 ( 0) 152 rg81 Nonp 6910 ( 0) 5294 ( 0) 1616 603864 ( 0) 373 Pool Nonp 6 ( 0) 3 ( 0) 3 602112 ( 0) 200704 Devi Nonp 680 ( 0) 262 ( 0) 418 558552 ( 0) 1336 Thre Nonp 417221 ( 6) 416457 ( 6) 764 476736 ( 0) 624 LSwr Nonp 128 ( 0) 0 ( 0) 128 416768 ( 0) 3256 Mm Nonp 26 ( 0) 2 ( 0) 24 379880 ( 0) 15828 AfdC Nonp 21989 ( 0) 19872 ( 1) 2117 338720 ( -160) 160 ....
- Notice that this system has 1GB RAM but still only 128MB NPP memory (109MB has been used, hence tripping the 20MB marker of HTTP.SYS).
- The NPP memory tags of HTTP.SYS all start with "UL" (for the trivia-inclined: as with all software projects, HTTP.SYS has gone through some evolution of its own. It used to be called something else - Universal Listener - hence its tag names all started with "UL").
- Notice that HTTP.SYS is not even in the top-10 NPP memory usage on the system, and its largest set of allocations is barely using 1MB total. As it turns out on this system, TOTAL NPP memory usage by HTTP.SYS was under 2MB, so clearly, it is not the aggressive user of NPP memory that is now causing issues.
- Meanwhile, focus your attention on the top four users of NPP memory, with tags "ISil", "I100", "PcNw", and "SavE". I looked up drivers for those tags, and I only find that "PcNw" comes from Microsoft and is associated with WDM audio. The others are all 3rd party drivers.
- Umm, this is a server, so is audio really that important? I checked on my system, and it was at 275KB, so clearly this user is/has done some heavy-duty audio stuff on this server. Thus, the memory may be legitimately used. You know, you might want to reconsider the effects of playing your MP3s on your Web Server's ability to accept connections...
- Meanwhile, Isil and I100 have suspiciously similar active allocations and ISIL is the stock ticker for a semiconductor company, so my guess is that I100 is related to a Fast Ethernet card... and since web servers totally rely on the network driver, having a memory-hog for a network driver cannot be a good thing and should be fixed. My guess for SavE is probably anti-virus.
- Now, I may be totally wrong with those arbitrary tag names, but one thing is for certain - IIS6 and HTTP.SYS is running just fine on this system. There are some other memory-hungry drivers running on the system, and if you get fixes for them such that your system will stay running longer, IIS6 will likely keep running as well
Note: Don't shoot the messenger (HTTP.SYS)! Always strive to figure out the root of your problem and address that. The rest of the problems should naturally fall away.