We are currently facing an issue when opening and closing sessions with an IIS application hosted on AWS servers.
Environment:
IIS application using a COM+ component
AWS EC2 instance type: m5.2xlarge (AMI = ami-0ba02848e01cae475)
Kerberos Windows authentication
Windows Server 2016 Standard
Ports used by the website: 1029 and 1036
Network Load Balancer (NLB) in front of 2 AWS EC2 servers
Maximum number of simultaneous users: 400
Issue: after 3 to 4 hours of use, some users experience an infinite loading of the web page when trying to access the first login page through the application's HTTPS URL. The number of affected users increases over time.
Some TCP connections from clients (random port above 50000) to the server (port 1029 or 1036) get stuck in the “CLOSE_WAIT” state while the session is being closed.
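On the server side, CLOSE_WAIT means the client has already closed its end of the connection and the server-side process has not yet closed its socket. To track how these connections build up during the day, here is a minimal sketch in Python. It assumes it runs on the IIS server itself and parses the output format of the Windows `netstat -ano -p tcp` command; the function name `count_close_wait` is hypothetical.

```python
import subprocess
import sys
from collections import Counter

# Ports used by the website, as listed in the environment description.
WEBSITE_PORTS = {1029, 1036}


def count_close_wait(netstat_output, ports=WEBSITE_PORTS):
    """Count CLOSE_WAIT TCP connections per local (server-side) port.

    Assumes the text printed by Windows `netstat -ano -p tcp`, whose data rows
    look like: TCP  10.0.0.5:1029  10.0.0.20:51234  CLOSE_WAIT  4321
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows have 5 fields: Proto, Local Address, Foreign Address, State, PID.
        # Headers, blank lines and UDP rows (no State column) are skipped.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        local_address, state = fields[1], fields[3]
        if state != "CLOSE_WAIT":
            continue
        # rpartition keeps only the text after the last ":", which is the port.
        port = int(local_address.rpartition(":")[2])
        if port in ports:
            counts[port] += 1
    return counts


if __name__ == "__main__":
    # Only meaningful on the Windows server itself.
    if sys.platform == "win32":
        output = subprocess.run(
            ["netstat", "-ano", "-p", "tcp"],
            capture_output=True, text=True, check=True,
        ).stdout
        print(dict(count_close_wait(output)))
```

Running it periodically (for example every few minutes from a scheduled task) and logging the result would show whether the count grows steadily or in bursts. The PID column can also confirm which worker process owns the stuck sockets.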
Recycling the application pool closes the open TCP connections and fixes the problem for the users.
The number of TCP connections stuck in “CLOSE_WAIT” grows with the number of users facing the issue.
This number can reach 300 on a single server. At that point, all users are affected.
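Since recycling the application pool clears the stuck connections and the server saturates at around 300 of them, a stopgap while the root cause is investigated could be to recycle before that ceiling is reached. The sketch below is a hypothetical policy, not a fix: it assumes periodic samples of the CLOSE_WAIT count on ports 1029/1036 are available (how they are collected is left open), assumes the growth is roughly linear, and uses made-up names (`Sample`, `should_recycle`) and a made-up one-hour margin.

```python
from dataclasses import dataclass

# Observed ceiling: at about 300 stuck connections, all users are affected.
SATURATION = 300


@dataclass
class Sample:
    hours: float      # hours since the application pool last started
    close_wait: int   # CLOSE_WAIT connections on ports 1029/1036


def growth_rate(samples):
    """Least-squares slope of the CLOSE_WAIT count over time (connections/hour)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(s.hours for s in samples) / n
    mean_c = sum(s.close_wait for s in samples) / n
    num = sum((s.hours - mean_t) * (s.close_wait - mean_c) for s in samples)
    den = sum((s.hours - mean_t) ** 2 for s in samples)
    if den == 0:
        raise ValueError("samples must be taken at different times")
    return num / den


def hours_until_saturation(samples, saturation=SATURATION):
    """Projected hours after the last sample until `saturation` is reached.

    Returns None when the count is not growing (assumes linear growth).
    """
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    remaining = max(saturation - samples[-1].close_wait, 0)
    return remaining / rate


def should_recycle(samples, margin_hours=1.0, saturation=SATURATION):
    """Hypothetical rule: recycle when saturation is projected within margin_hours."""
    eta = hours_until_saturation(samples, saturation)
    return eta is not None and eta <= margin_hours
```

For example, counts of 0, 50, 100 and 150 taken one hour apart give a rate of 50 connections per hour and a projected 3 hours before reaching 300. Such a rule only postpones the symptom; it says nothing about why the sockets are never closed on the server side.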
RAM and CPU usage on the servers remain stable during the issue.
Note that this behaviour has never been observed on any other architecture (including other AWS architectures) where the same application is deployed.
Explored and discarded hypotheses:
Load balancer influence: we ran a one-day test without the load balancer (users accessed the application directly). The issue still occurred.
Application pool recycling parameter “Regular Time Interval (Minutes)”: set to “0”. Note that the server is restarted every night.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]: not present in regedit.
Port 1029: one-day control test using only port 1036.
Number of simultaneous TCP connections allowed: default value (49152).
McAfee antivirus: one-day control test with the antivirus disabled.