question

AshifShaikh-9988 avatar image
0 Votes"
AshifShaikh-9988 asked Criszhan-msft edited

Why Restarting SQL Server fixed all the query performance issues and CPU decreased from 100% to 20%

Hello All,

Just intrigued to get some insights on why all the query performance issue with my SQL Server 2016 SP2 CU5 (Enterprise) was fixed by just a SQL Server restart.

Some background:

We had a network degradation starting from 7:30 AM however it was resolved by 10:30 AM but unfortunately the queries were still behaving badly and became worst once the load (throughput) increased on the SQL Server.

We have a 5 nodes Always on Cluster with 3 nodes in a Data Center and 2 nodes in AWS and the network degradation happened between the physical data center and AWS (Writer node is in the physical data center and 30% of the traffic comes from AWS)

I got the following error during the network issue at 10 AM:

DESCRIPTION: A fatal error occurred while reading the input stream from the network. The session will be terminated (input error: 10060, output error: 0).

Apart from this particular error there were no other errors specifically pointing to any degradation such as memory pressure\disk IO etc.

During the high CPU on the DB all I can see that even the smallest query (querying tables on few rows) is also taking time.

There were no memory pressure alerts\errors and disk was also performing optimally.

IMO it seems to be related to plan cache which eventually got cleared after the restart but how can I prove it??




sql-server-generalsql-server-transact-sqlsql-server-migration
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi,
Restarting the SQL Server service will clear all the contents of the buffer pool and all compiled execution plans in the plan cache, then read&compile again when needed.
The performance problem (high CPU usage) was fixed after restarting the SQL Server service, and it did not recur for a period of time. I also think this may be caused by the bad plan in the previous cache.
Next time this problem occurs, you can try to use the DBCC FREEPROCCACHE command as suggested by Erland to clear the plan cache, or use the command ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE(individual database level).


0 Votes 0 ·
ErlandSommarskog avatar image
1 Vote"
ErlandSommarskog answered

First, you are a bit behind on the CUs. The most recent CU is CU17, you can get it here: https://www.microsoft.com/en-us/download/details.aspx?id=56975.

Yes, it sounds like a plan-cache issue. That is, you have a bad plan in the cache, and by restarting SQL Server you clear everything out.

To prove/disprove this hypothesis, next time issue DBCC FREEPROCCACHE to clear the plan cache. If this fixes the issue, it was the plan cache. In such case, Query Store is your friend to find the culprit. (If you have not enabled Query Store, do so!)

If FREEPROCCACHE does not help, it is obviously something else. I would start with looking into the SQL Server errorlog to see what is reported there. And I would definitely slap on CU17, as it could be an issue that has been fixed.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

AshifShaikh-9988 avatar image
0 Votes"
AshifShaikh-9988 answered

84997-wait-type.png




Some more info on the waits during the issue.


wait-type.png (20.2 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.