Server "Hangs" and Ephemeral Port Exhaustion issues
Recently I've run into several issues caused by ephemeral port exhaustion.
These issues reach us with several different symptoms and behaviors, some of which are listed here:
- The server is hung, frozen, or unresponsive.
- I'm unable to access the internet or a network file share.
- I can't log on to the domain.
Various other issues may occur, but these seem to be the most common complaints. A reboot will fix the problem.
A memory dump will confirm the cause, but it is a one-shot chance at data gathering. In the case of port exhaustion, we can use one or two tools to quickly pinpoint the problem without taking down the system (and then waiting for the issue to recur).
Let's dig into the details of "what works / what doesn't" and highlight the tools that confirm or rule out Ephemeral Port Exhaustion (EPE for short, since I cannot spell Ephemeral or Exhaustion).
Note: In performance troubleshooting, there are just a few things that should be banned from our vocabulary ;)
- Describing any issue using only the words Hang, Slow, or Froze
- Telling me something Fails without describing the steps to reproduce the failure and the exact results, including any error message
A typical drill-down looks like this:
- Description: The server will "Hang."
- Asking what that means, we find that one of the symptoms is that RDP connections "Fail."
- RDP connections "Fail."
- Digging into that, we find that on a 2003/XP system you will get "Bitmaps" (the logon screen draws) and can type a username and password into the CAD (Ctrl+Alt+Del) logon dialog, but...
  - After you enter the credentials, the desktop never populates, or you get an error message such as:
    - Unable to process the request
    - Access is denied
    - There are currently no logon servers available to service the logon request
    - The system could not log you on.
- Existing connected users may still be able to work unless they trigger any authentication to the DC.
- Ping to the system and out of the system will work. Nslookup may work over UDP but will fail when forced to use TCP (nslookup -vc).
Some of our tools monitor systems over time. Typically, if the issue is not happening right now and cannot be reproduced on demand, we have to gather data for a few hours, days, or weeks until the issue returns.
Perfmon will show a high number of handles in an application and/or in the System process that gradually increases (\Process(*)\Handle Count).
In Poolmon (in the APP_HandleCount file of the Poolmon3vbs version), you may also see a high number.
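When data is gathered over hours or days, the interesting signal is the steady climb rather than any single value. Here is a minimal sketch, assuming you have exported one process's Handle Count samples as (hours since start, count) pairs. The function names and the 100-handles-per-hour threshold are hypothetical and not part of Perfmon. It fits a least-squares slope and flags sustained growth.

```python
from statistics import mean

def handle_growth_per_hour(samples):
    # samples: list of (hours_since_start, handle_count) pairs for one process,
    # e.g. values exported from a Perfmon log of \Process(*)\Handle Count.
    # Returns the least-squares slope: average handles gained per hour.
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_bar = mean(t for t, _ in samples)
    h_bar = mean(h for _, h in samples)
    num = sum((t - t_bar) * (h - h_bar) for t, h in samples)
    den = sum((t - t_bar) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den

def looks_like_handle_leak(samples, min_growth_per_hour=100.0):
    # HYPOTHETICAL threshold: choose one that fits the process being watched.
    return handle_growth_per_hour(samples) >= min_growth_per_hour
```

A slope is used instead of comparing the first and last samples so that one noisy reading does not decide the result.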
Live Troubleshooting tools:
Without going into too much detail, there are three basic communication routes to the server (ICMP, UDP, and TCP) and three from the server. Each communication requires a socket connection. Just like the plug on any three-prong power strip, a socket is made of three things: IP address, protocol, and port.
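To see the three parts of a socket in practice, here is a minimal loopback sketch. It assumes only Python's standard `socket` module, and `show_socket_tuple` is a hypothetical helper name. The server listens on a port, and the client never picks its own port: the OS assigns it a local ephemeral port when it connects. That per-connection assignment is the resource EPE runs out of.

```python
import socket

def show_socket_tuple():
    # Stand-in server on loopback. Port 0 asks the OS for any free port so the
    # sketch runs anywhere; a real service would bind a fixed port such as 3389.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    # The client does not choose its port: connect() makes the OS assign a
    # local port from the dynamic (ephemeral) range.
    client = socket.create_connection(server.getsockname())
    conn, peer_seen_by_server = server.accept()
    local = client.getsockname()    # (IP, ephemeral port) on the client side
    remote = client.getpeername()   # (IP, port) of the listening server
    for s in (conn, client, server):
        s.close()
    return "TCP", local, remote, peer_seen_by_server

if __name__ == "__main__":
    proto, local, remote, _ = show_socket_tuple()
    print(f"{proto} {local[0]}:{local[1]} -> {remote[0]}:{remote[1]}")
```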
When we test the responsiveness of the computer, the first tests should check what works into the system and what works out from the system.
NSLookup is a simple command-line tool that queries DNS records and resolves names for us. It goes off the box to the domain controller or DNS server, makes a socket connection, and returns the requested information.
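| Tool | Protocol | Port | Target | Direction / notes |
|---|---|---|---|---|
| Ping | ICMP | n/a | Destination system | Both inbound/outbound; basically checks to see if the network is "alive" |
| NSLookup | UDP | 53 (DNS) | Destination system | Both inbound/outbound |
| NSLookup -vc | TCP | 53 (DNS) | Destination system | Both inbound/outbound |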
In this scenario:
- Ping will work to the system and from the system.
- NSLookup will check whether outbound UDP works.
- NSLookup -vc will force TCP and check whether outbound TCP connections work.
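The same UDP-versus-TCP comparison can be scripted. This is a minimal sketch, not the author's method, assuming you pass in the IP of a DNS server or DC and a name it can resolve. The helper names are hypothetical. It sends one A-record query over UDP and the same query over TCP (with the two-byte length prefix DNS uses on TCP) and reports which one fails.

```python
import socket
import struct

def build_query(name, qid=0x1234):
    # DNS header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN

def _recv_exact(sock, n):
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed early")
        data += chunk
    return data

def query_udp(server, name, timeout=3.0):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        return s.recv(4096)

def query_tcp(server, name, timeout=3.0):
    # connect() needs a new local TCP port, which is the step EPE breaks.
    q = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack(">H", len(q)) + q)
        (length,) = struct.unpack(">H", _recv_exact(s, 2))
        return _recv_exact(s, length)

def probe(server, name):
    results = {}
    for label, fn in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            fn(server, name)
            results[label] = "ok"
        except OSError as exc:  # timeouts and refusals are OSError subclasses
            results[label] = f"failed: {exc}"
    return results
```

Usage: `probe(dns_server_ip, "your.domain.name")` with your own values. A result of `udp: ok` alongside a failed `tcp` matches the pattern described here.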
If outbound TCP connections fail, then we move on to…
netstat -ano:
A warning about netstat -ano:
While it is helpful, it may not always show all the open ports. You can use it to look, but if it doesn't show a ton of connections, don't be fooled… dig deeper.
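Reading thousands of netstat lines by eye is slow, so a quick tally helps. This is a minimal sketch, assuming the standard `netstat -ano` column layout (Proto, Local Address, Foreign Address, State, PID for TCP rows). The function name is hypothetical. It counts TCP rows per state and per PID. As warned above, it can only summarize what netstat actually lists.

```python
from collections import Counter

def summarize_netstat(text):
    # Tally TCP rows from `netstat -ano` output by state and by owning PID.
    by_state, by_pid = Counter(), Counter()
    for line in text.splitlines():
        parts = line.split()
        # TCP rows have exactly 5 columns; UDP rows have no State column,
        # and header lines do not start with "TCP", so both are skipped.
        if len(parts) == 5 and parts[0] == "TCP":
            by_state[parts[3]] += 1
            by_pid[parts[4]] += 1
    return by_state, by_pid
```

On the affected server you could feed it `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout` and look at `by_pid.most_common(5)` to find the PID holding the most connections.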
If the issue is currently happening, we can check handle information with Process Explorer.
In the main Process Explorer window, we have to make a few changes to see the information we want:
1. Right-click the column header and add the Handle column.
2. From the View menu, enable Show Lower Pane and set the lower pane view to Handles.
Tip: Move the Handle column closer to the process name (for ease of use) and sort by handle count.
For this example, pretend svchost is the highest consumer. Once it is selected, you will see the types of handles listed, and if we have EPE you could see several thousand File handles named \Device\AFD or \Device\TCP.
Restarting that application will instantly resolve the issue.
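The check in the lower pane boils down to one rule: count File handles whose name is \Device\AFD or \Device\TCP. Here is a minimal sketch of that rule, assuming the handle list has been copied out as hypothetical (type, name) pairs. Process Explorer does not define this format; it is only an illustration.

```python
# Device-name prefixes for socket endpoints, matched case-insensitively.
SOCKET_DEVICES = ("\\device\\afd", "\\device\\tcp")

def socket_handle_count(handles):
    # handles: iterable of (handle_type, name) pairs (hypothetical export of
    # the lower pane). Counts File handles that point at AFD/TCP devices.
    return sum(
        1
        for handle_type, name in handles
        if handle_type.lower() == "file"
        and name.lower().startswith(SOCKET_DEVICES)
    )
```

If this count is in the thousands for one process, that process is the one to restart first.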
If the high handle count is in the System process, then there is probably another application that is telling the System process to do all its work. A reboot (and, if possible, a full memory dump) is the only way to clear System handles and get more data.
If the handles are not TCP or AFD handles, keep digging! It is still a valid test to restart the application, and if it's a third-party application, restarting it and confirming that the server returns to "normal" should be enough proof that the application is at fault.
| Operating system | Default dynamic (ephemeral) port range |
|---|---|
| Windows 2003/XP | Port numbers 1025 through 5000 |
| Windows 2008+ | Port numbers 49152 through 65535 |
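The size of each pool is simple arithmetic, and it shows why older systems run out sooner. This sketch assumes the default ranges of 1025 through 5000 (Windows 2003/XP) and 49152 through 65535 (Windows 2008+). Servers whose range has been changed will differ. `port_pool_size` is a hypothetical helper.

```python
# Default dynamic (ephemeral) port ranges; a server may be configured differently.
DEFAULT_RANGES = {
    "Windows 2003/XP": (1025, 5000),
    "Windows 2008+": (49152, 65535),
}

def port_pool_size(first, last):
    # Both ends of the range are usable port numbers, hence the +1.
    return last - first + 1

for os_name, (first, last) in DEFAULT_RANGES.items():
    print(f"{os_name}: {port_pool_size(first, last)} port numbers")
```

That works out to 3976 port numbers on 2003/XP versus 16384 on 2008 and later. On Windows 2008 and later, `netsh int ipv4 show dynamicport tcp` shows the range actually configured on a given server.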
For example, many communications start on fixed port numbers (3389, 145, 25, and 110 are all examples of well-known fixed ports), and if the application needs additional connections, it then spawns a conversation on one or more dynamic ports.
If the application does not close the conversation correctly, the port is left connected, using a handle and possibly other resources (nonpaged pool, paged pool, threads, etc.). Since there is a limited number of ephemeral ports, we can eventually run out.
Imagine someone in the office picking up every phone, making a call, and never hanging up. With every phone in use, no one else can call out. You can still work, as long as you don't need to make an outbound call.
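The phone analogy can be turned into a toy model. This is a simplified simulation, not how the Windows TCP/IP stack is implemented. `PortPool` is a hypothetical class, and the 10-port range is deliberately tiny so the exhaustion is easy to see. Every outbound connection takes a port, a leaky application never returns them, and the next outbound attempt fails.

```python
class PortPool:
    # Toy model of a dynamic port range: each outbound connection takes one port.
    def __init__(self, first, last):
        self.free = list(range(last, first - 1, -1))  # pop() yields the lowest port
        self.in_use = set()

    def connect_out(self):
        if not self.free:
            raise OSError("no ephemeral port available")  # outbound TCP fails
        port = self.free.pop()
        self.in_use.add(port)
        return port

    def close(self, port):
        # A well-behaved application hangs up and the port returns to the pool.
        self.in_use.remove(port)
        self.free.append(port)

pool = PortPool(49152, 49161)  # hypothetical 10-port range
leaked = [pool.connect_out() for _ in range(10)]  # the app never calls close()
try:
    pool.connect_out()
except OSError as exc:
    print("outbound connection failed:", exc)
# Inbound connections land on the server's fixed listening port (e.g. 3389),
# so they do not draw from this pool and keep working.
```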
In the case of this type of server "hang":
- The mouse works on the console.
- The keyboard works on the console.
- Local logon will likely work on the console and over RDP.
- Existing connections that need no new authentication (nothing has to go off the box for Kerberos verification) will keep working (file shares, currently connected RDP users).
- Ping will work (ICMP).
- UDP connections will work (NSLookup).
- TCP connections into the box will work.
- TCP connections from the box outward will fail (nslookup -vc).
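The checklist above can be read as a simple decision rule. Here is a minimal sketch of that rule, assuming you record each manual test as True (worked) or False (failed). The function and its key names are hypothetical. The rule only says whether the symptoms match the EPE pattern. Confirmation still comes from the handle counts.

```python
def classify(results):
    # results: dict of check name -> True (worked) / False (failed).
    # Keys (hypothetical): console, ping, udp_out, tcp_in, tcp_out.
    if not results["console"]:
        return "console unresponsive: not the EPE pattern; consider a memory dump"
    if not results["ping"]:
        return "basic network connectivity problem: not the EPE pattern"
    if results["udp_out"] and results["tcp_in"] and not results["tcp_out"]:
        return "matches the EPE pattern: check handle counts per process"
    return "does not match the EPE pattern: keep digging"
```

Usage: `classify({"console": True, "ping": True, "udp_out": True, "tcp_in": True, "tcp_out": False})` reports a match.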
Always dig into the exact behavior behind "hang," "fail," "frozen," and "unresponsive" by testing the mouse, the keyboard, and inbound and outbound network connectivity over various protocols.