I have a website in production that has been operational for 4-5 years. About a month ago, its users started experiencing service disruptions: the site freezes when a user navigates from one page to another. The website is developed in ASP.NET MVC and is deployed under IIS on two identical web servers that sit behind a load balancer. In Chrome's DevTools, I can see that stylesheet or script files fail to transfer to the client. The HTTP status of the request is listed as 200, but the Timing tab shows this caution message:
After 3+ minutes, the Network tab in DevTools shows a "(failed) net::ERR_CONNECTION_RESET" error as the status of the request, like this:
These errors are intermittent. If I stop the page load and refresh, the page loads fine. I can then navigate to 2-3 more pages, and the 4th one freezes again. Basically, the website is unusable for the customers.
Since these are static resources that IIS serves through simple GET requests, I can take one of their URLs and load it directly in a browser. If I do this repeatedly, at some point I experience the same freeze.
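The repeated-request test can also be scripted, which takes the browser out of the picture and gives UTC timestamps to match against the server logs. This is only a minimal sketch in Python: the function names, the attempt count, and the 30-second timeout are my own choices, and the URL is passed on the command line. Note that `urllib` opens a new connection for every request, so it will not reproduce a problem that only happens on reused keep-alive connections.

```python
import sys
import time
import urllib.request


def fetch_once(url, timeout):
    """GET the URL once; return (bytes_read, error_text_or_None)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return len(resp.read()), None
    except Exception as exc:  # timeouts, connection resets, HTTP errors
        return 0, f"{type(exc).__name__}: {exc}"


def run_probe(url, attempts, timeout, fetch=fetch_once):
    """Request the same URL `attempts` times and record how each one went."""
    results = []
    for attempt in range(1, attempts + 1):
        started_utc = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        start = time.monotonic()
        size, error = fetch(url, timeout)
        results.append({
            "attempt": attempt,
            "utc": started_utc,  # UTC, to line up with server-side logs
            "seconds": time.monotonic() - start,
            "bytes": size,
            "error": error,
        })
    return results


def failures(results):
    """Keep only the attempts that ended in an error."""
    return [r for r in results if r["error"] is not None]


if __name__ == "__main__":
    # Hypothetical usage: python probe.py <url-of-a-static-file>
    target = sys.argv[1]
    for r in failures(run_probe(target, attempts=200, timeout=30)):
        print(r["utc"], r["attempt"], f"{r['seconds']:.1f}s", r["error"])
```

Running it once against the load balancer address and once against each web server directly should show whether the stalls also occur when the load balancer is bypassed.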
Before anyone asks: I checked, and caching on the web servers has not been disabled. I took turns stopping each web server and running the website with only one server up. The problem occurs in both cases. Both servers have plenty of free disk space, and CPU usage never goes above 50%. Both servers have the most recent Windows patches installed. The configuration is 64-bit Windows Server 2012 R2. Both servers have been rebooted multiple times.
Is there a way to trace, at the IIS level, what happens with these requests that are not served to the client? I can stop one of the web servers so that all traffic is routed through the other one, which would make tracing my requests easier. Our website has not been updated since October of 2018, so whatever created this issue must be an operating system update or a change in the network settings. Our client's IT department has reviewed all recent changes and updates and found none that should cause the problems we see. What else should we do to figure out what the issue is?
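One thing I can already do on the isolated server is scan the IIS W3C log for the stalled requests. Below is a rough Python sketch, assuming the log includes the `sc-win32-status` and `time-taken` fields (which fields are logged depends on the site's logging configuration). The function names and the 30-second threshold are my own. It keeps entries with a non-zero Win32 status or a long duration (`time-taken` is in milliseconds).

```python
import sys


def parse_w3c_log(lines):
    """Yield one dict per request, using the most recent '#Fields:' header."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):  # skip truncated lines
                yield dict(zip(fields, values))


def suspicious(entries, slow_ms=30000):
    """Keep entries with a non-zero Win32 status or a long time-taken."""
    for entry in entries:
        win32 = entry.get("sc-win32-status", "0")
        taken = entry.get("time-taken", "0")
        taken_ms = int(taken) if taken.isdigit() else 0
        if win32 != "0" or taken_ms >= slow_ms:
            yield entry


if __name__ == "__main__":
    # Hypothetical usage: python scan_log.py <path-to-an-IIS-log-file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for e in suspicious(parse_w3c_log(log)):
            print(e.get("date"), e.get("time"), e.get("cs-uri-stem"),
                  e.get("sc-status"), e.get("sc-win32-status"),
                  e.get("time-taken"))
```

If the frozen requests show up here with status 200 and a non-zero `sc-win32-status`, that would at least confirm that IIS started sending the response and the connection broke afterwards.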
Since this is a production environment and also a revenue collection system, we are under extreme pressure to figure out what the issue is.
Any help or suggestion will be highly appreciated.