Describe wait statistics
One holistic way of monitoring server performance is to evaluate what the server is waiting on. SQL Server is instrumented with a wait tracking system, which monitors each running thread and logs what resources the thread is waiting on. Wait statistics are broken down into three types of waits: resource waits, queue waits, and external waits.
- Resource waits occur when a worker thread in SQL Server requests access to a resource that is currently being used by a thread. Examples of resources waits are locks, latches, and disk I/O waits.
- Queue waits occur when a worker thread is idle and waiting for work to be assigned. Example queue waits are deadlock monitoring and deleted record cleanup.
- External waits occur when SQL Server is waiting on an external process like a linked server query to complete. An example of an external wait is a network wait related to returning a large result set to a client application.
This data is aggregated in the DMV sys.dm_os_wait_stats (and in sys.dm_db_wait_stats in Azure SQL Database) and is also tracked for active sessions in sys.dm_exec_session_wait_stats. These statistics (SQL Server tracks over 900 wait types) allow the DBA to get an overview of the performance of the server, and to readily identify configuration or hardware issues. This data is persisted from the time of instance startup, but the data can be cleared as needed to observe changes.
Wait statistics are evaluated as a percentage of the total waits on the server.
The results of this query from sys.dm_os_wait_stats shows the wait type, and the aggregation of Percent of time waiting (Wait Percentage) and the average wait time in seconds for each wait type. In this case, the server has Always On Availability Groups in place, as indicated by the REDO_THREAD_PENDING_WORK and PARALLEL_REDO_TRAN_TURN wait types. The relatively high percentage of CXPACKET and SOS_SCHEDULER_YIELD waits indicates that this server is under some CPU pressure.
There are many wait types that relate to specific and common SQL Server performance issues:
- RESOURCE_SEMAPHORE waits—this wait type is indicative of queries waiting on memory to become available, and may indicate excessive memory grants to some queries. This problem is typically observed by long query runtimes or even time outs. These wait types can be caused by out-of-date statistics, missing indexes, and excessive query concurrency.
- LCK_M_X waits—frequent occurrences of this wait type can indicate a blocking problem, that can be solved by either changing to the READ COMMITTED SNAPSHOT isolation level, or making changes in indexing to reduce transaction times, or possibly better transaction management within T-SQL code.
- PAGEIOLATCH_SH waits—this wait type can indicate a problem with indexes (or a lack of useful indexes), where SQL Server is scanning too much data. Alternatively, if the wait count is low, but the wait time is high, it can indicate storage performance problems. You can observe this behavior by analyzing the data in the waiting_tasks_count and the wait_time_ms in the sys.dm_os_wait_stats DMV, to calculate an average wait time for a given wait type.
- SOS_SCHEDULER_YIELD waits---this wait type can indicate high CPU utilization, which is correlated with either high number of large scans, or missing indexes, often in conjunction with high numbers of CXPACKET waits.
- CXPACKET waits—If this wait type is high it can indicate improper configuration. Prior to SQL Server 2019, the Max Degree of Parallelism (MaxDOP) default setting is to use all available CPUs for queries. Additionally, the cost threshold for parallelism (CTfP) setting defaults to 5, which can lead to small queries being executed in parallel, which can limit throughput. Lowering MaxDOP and increasing CTfP can reduce this wait type, but the CXPACKET wait type can also indicate high CPU utilization, which is typically resolved through index tuning.
- PAGEIOLATCH_UP—This wait type on data pages 2:1:1 can indicate TempDB contention on Page Free Space (PFS) data pages. Each data file has one PFS page per approximately 64MB of data. This wait is typically caused by only having one TempDB file, as prior to SQL Server 2016 the default behavior was to use one data file for TempDB. The best practice is to use one file per CPU core up to eight files. It is also important to ensure your TempDB data files are the same size and have the same autogrowth settings to ensure they are used evenly. SQL Server 2016 and higher control the growth of TempDB data files to ensure they grow in a consistent, simultaneous fashion.
In addition to the aforementioned DMVs, the Query Store tracks waits associated with a given query. This data is not tracked at the same granularity as the data in the DMVs but can provide a nice overview of what a query is waiting on.