SQL Server 2012 - True Black Box Recorder

Article
03/08/2012

This would be a perfect time to post a blog talking about the new SQL Server 2012 features. However, I am going to leave that activity to the marketing folks (AlwaysOn, T-SQL Enhancements, … ). I want to talk about something that might not appear on the top of a marketing or sales checklist but for any of us that have "carried the pager" (showing my age ad experience now) I truly feel SQL Server 2012 has provided the first, real Black Box Recorder to help us.

I have the unique opportunity to be involved with product planning/development cycles providing feedback and analysis on improving support/maintenance and other such aspects of SQL Server to the SQL Server development team. One of these efforts started several years ago. It started under an effort to improve cluster failover diagnostics and capabilities. A team of us gathered up dozens of scenarios that impacts the folks 'carrying the pager'. Why did the system stop responding? Why did it failover? Was it blocking? Was it I/O bottleneck? What is ######?

The team performed ranking exercises and finally settled on 32 scenarios we felt we could attack, eliminate or significantly change the impact and troubleshooting in a quality and repeatable fashion with very small or no impact on performance. During this exercise I was able to convince, with very little effort I might add, the team to extend the behavior to all SKUs and installations (except SQL Express). The 32 scenarios are not unique to clustering and provide a black box for support and administrators on any instance of SQL Server.

In the past we have added the default trace (MSSQL\LOG\*.TRC) but it was limited. We introduced MDW but you had to set that up on the instances you desired to monitor. We wanted default tracing, high speed, containing the information that is important to understanding the system state currently and historically. Many of these techniques relied on external processes attempting to query various DMVs and states for the SQL Server instance(s). They are all helpful but SQL Server knows about its own health state and we wanted to remove dependencies such as a live connection to the SQL Server because a simple networking issue caused loss of diagnostics.

The design landed on an extension of the internal monitoring behaviors in SQL Server. The same team (SQLOS) that added Scheduler Monitor, Resource Monitor, XEvent's and other facilities designed sp_server_diagnostics.

sp_server_diagnostics (Books Online Reference: https://msdn.microsoft.com/en-us/library/ff878233(v=sql.110).aspx)

This is a new, internal procedure, implemented on a preemptive thread at high priority. It accepts the interval parameter, which is uses to output a result set on the interval. It is designed to have its own reserved memory, allocated one time and it avoids locking and synchronization objects so it will provide output even when the SQL Server has encountered a problem.

I spent lots of time testing the stability and capabilities of sp_server_diagnostics. You can ask the SQL Server development team because they had to fix several issues they never dreamed I would even attempt - but I did .

I setup a USB drive on my laptop.

On the laptop I started multiple copies of SQLIOSim against the USD drive.

On my desktop I created a database, over SMB 2.0, on that USB drive.

On my desktop I started stress against the instance (Bulk Insert, sorts, hashes, index builds…).

On my desktop I used utilities such as Consume.exe to exhaust non/page-pool from Windows.

On my desktop I forced max server memory to the minimum limit.

SQL Server was returning out of memory errors, I/O delays, etc…

Even a command prompt on the desktop was not redrawing properly because I had the system low on desktop heap as well.

The default traces supported by XEvent and sp_server_diagnostics kept running properly, showing me things like memory pressure, blocking, I/O delays and other details.

The design is implemented on an internal based, preemptive, high priority thread. It actually uses XEvent sessions (internally) to help monitor various aspects of the SQL Server service. This allows us to avoid issues such as a client network connection and just write the data directly to the XEvent files in the LOG directory.

System - Collects data from a system perspective on spinlocks, severe processing conditions, non-yielding tasks, page faults, and CPU usage.
Resource - Collects data from a resource perspective on physical and virtual memory, buffer pools, pages, cache and other memory objects.
Query Processing - Collects data from a query-processing perspective on the worker threads, tasks, wait types, CPU intensive sessions, and blocking tasks.
IO Subsystem - Collects data on IO. In addition to diagnostic data.
Events - Collects data and surfaces through the stored procedure on the errors and events of interest recorded by the server, including details about ring buffer exceptions, ring buffer events about memory broker, out of memory, scheduler monitor, buffer pool, spinlocks, security, and connectivity .

Each of these categories is output as a row in the result set returned from sp_server_diagnostics on/at the designated interval. A key here is that once sp_server_diagnostics is started it will keep returning information on the indicated interval. A consumer of sp_server_diagnostics just keeps getting a new result set. Let me explain this in more detail.

Prior to SQL Server 2012 the cluster failover design was to execute a select @@SERVERNAME query ~30 seconds. If this failed we did a couple of retries and then returned an error condition to the cluster manager. Each time the query had to be parsed, complied and executed, etc… Lots of moving parts, memory allocations, execution behaviors and other things that could go wrong and it left very little information for why it made the failure decision.

For a clustered instance or availability group (AlwaysOn) AG resource, the SQL Server 2012 design is to establish a connection at the time the cluster online activity occurs (separate/dedicated thread in the cluster manager) and retrain the connection until the offline condition is met. This connection issues the sp_server_diagnostics request with a default of 5 second interval. (See the custom cluster properties to adjust the interval but anything less than 5 seconds is not recommended! )

The server returns a result set on/at every interval. This information is then used to determine the health state of the SQL Server instance, using the new failover health designation to determine the failover behavior. The failover designation (https://msdn.microsoft.com/en-us/library/ff878664(v=sql.110).aspx Failover Policy - HealthCheckTimeout, … ) allows you to determine a level of failover for a failover instance or a specific availability group. You can elect to failover only if System returns and error or perhaps you need a more sensitive failover so you elect to failover if any of the states returns an error.

This is a really great improvement as the administrator has more control over the failover but I mentioned the black box capability. All the result information is recorded in a set of XEvent, rollover files in your LOG directory. This includes the results from the sp_server_diagnostics as well as the cluster, resource DLL decisions. Furthermore, the SQL Server instance always starts a default XEvent trace, also stored in your LOG directory that records the same result set information along with additional XEvent's.

Any instance of SQL Server (except SQL Express) has the default trace in the LOG directory and those instances clustered or AlwaysOn enabled contain additional information about failover decisions made by the cluster resource DLL.

I feel like I am doing in infomercial now. Yet, that is not all! Any instance that has an availability group also starts a Availability Group black box trace, also placed in the LOG directory, which records AG specific activities (such as DDL changes).

Using SQL Server Management Studio you can open any of the .XEL files or better yet you can use the XEL Merge capabilities to open multiple files, sort by timestamp and see what happened. The targets for the collection will provide you with hours and in some cases days of historical information. You can now answer the 6pm question as to why the server was slow at 4pm because you can see memory, blocking basics, I/O, … or other health states.

Anyone Can Execute (Limit how many of these are active)

Any user, with appropriate permissions, can execute sp_server_diagnostics. I wanted this for support so we could collect and trigger additional diagnostics, for example triggering a mini-dump. While you can run this procedure you want to keep the number of these limited to avoid any impact on performance.

Alter Server Configuration

The log files can be controlled with ALTER SERVER CONFIGURATION … DIAGNOSTICS LOG T-SQL command.

Example Output

18:20.7	system	1	clean	<system spinlockBackoffs="0" sickSpinlockType="none" sickSpinlockTypeAfterAv="none" latchWarnings="0" isAccessViolationOccurred="0" writeAccessViolationCount="0" totalDumpRequests="0" intervalDumpRequests="0" nonYieldingTasksReported="0" pageFaults="0" systemCpuUtilization="12" sqlCpuUtilization="0"/>
18:20.7	resource	1	clean	<resource lastNotification="RESOURCE_NOINFO" outOfMemoryExceptions="0" isAnyPoolOutOfMemory="0" systemOutOfMemoryPeriod="0"><memoryReport name="Memory Manager" unit="KB"><entry description="VM Reserved" value="2204440"/><entry description="VM Committed" v
18:20.7	query_processing	1	clean	<queryProcessing maxWorkers="512" workersCreated="517" workersIdle="493" tasksCompletedWithinInterval="9" pendingTasks="0" oldestPendingTaskWaitingTime="0" hasUnresolvableDeadlockOccurred="0" hasDeadlockedSchedulersOccurred="0" trackingNonYieldingScheduler="0x0"><topWaits><nonPreemptive><byCount><wait waitType="ASYNC_NETWORK_IO" waits="1293727" averageWaitTime="24668922" maxWaitTime="583"/><wait waitType="THREADPOOL" waits="5286" averageWaitTime="4522428" maxWaitTime="7982"/><wait waitType="SNI_CRITICAL_SECTION" waits="498" averageWaitTime="311" maxWaitTime="60"/><wait waitType="CMEMTHREAD" waits="314" averageWaitTime="294" maxWaitTime="19"/><wait waitType="PAGEIOLATCH_SH" waits="160" averageWaitTime="613" maxWaitTime="44"/><wait waitType="IO_COMPLETION" waits="131" averageWaitTime="214" maxWaitTime="40"/><wait waitType="WRITE_COMPLETION" waits="60" averageWaitTime="274" maxWaitTime="13"/><wait waitType="PAGEIOLATCH_EX" waits="22" averageWaitTime="108" maxWaitTime="31"/><wait waitType="SLEEP_BPOOL_FLUSH" waits="12" averageWaitTime="64" maxWaitTime="23"/><wait waitType="WRITELOG" waits="9" averageWaitTime="41" maxWaitTime="10"/></byCount><byDuration><wait waitType="ASYNC_NETWORK_IO" waits="1293727" averageWaitTime="24668922" maxWaitTime="583"/><wait waitType="THREADPOOL" waits="5286" averageWaitTime="4522428" maxWaitTime="7982"/><wait waitType="LCK_M_S" waits="1" averageWaitTime="614" maxWaitTime="614"/><wait waitType="PAGEIOLATCH_SH" waits="160" averageWaitTime="613" maxWaitTime="44"/><wait waitType="SLEEP_TEMPDBSTARTUP" waits="1" averageWaitTime="451" maxWaitTime="451"/><wait waitType="SNI_CRITICAL_SECTION" waits="498" averageWaitTime="311" maxWaitTime="60"/><wait waitType="CMEMTHREAD" waits="314" averageWaitTime="294" maxWaitTime="19"/><wait waitType="WRITE_COMPLETION" waits="60" averageWaitTime="274" maxWaitTime="13"/><wait waitType="IO_COMPLETION" waits="131" averageWaitTime="214" maxWaitTime="40"/><wait waitType="ASYNC_IO_COMPLETION" waits="2" averageWaitTime="181" maxWaitTime="166"/></byDuration></nonPreemptive><preemptive><byCount><wait waitType="PREEMPTIVE_OS_ENCRYPTMESSAGE" waits="1303246" averageWaitTime="120319807" maxWaitTime="982"/><wait waitType="PREEMPTIVE_OS_DECRYPTMESSAGE" waits="200037" averageWaitTime="1613" maxWaitTime="9"/><wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="15430" averageWaitTime="125967" maxWaitTime="336"/><wait waitType="PREEMPTIVE_OS_DELETESECURITYCONTEXT" waits="3439" averageWaitTime="71074" maxWaitTime="287"/><wait waitType="PREEMPTIVE_OS_DISCONNECTNAMEDPIPE" waits="3439" averageWaitTime="84002" maxWaitTime="284"/><wait waitType="PREEMPTIVE_OS_AUTHORIZATIONOPS" waits="2541" averageWaitTime="27749" maxWaitTime="331"/><wait waitType="PREEMPTIVE_OS_LOOKUPACCOUNTSID" waits="2528" averageWaitTime="24563" maxWaitTime="158"/><wait waitType="PREEMPTIVE_OS_REVERTTOSELF" waits="2527" averageWaitTime="19438" maxWaitTime="176"/><wait waitType="PREEMPTIVE_OS_WAITFORSINGLEOBJECT" waits="1298" averageWaitTime="4055" maxWaitTime="191"/><wait waitType="PREEMPTIVE_XE_CALLBACKEXECUTE" waits="786" averageWaitTime="130" maxWaitTime="121"/></byCount><byDuration><wait waitType="PREEMPTIVE_OS_ENCRYPTMESSAGE" waits="1303246" averageWaitTime="120319807" maxWaitTime="982"/><wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="15430" averageWaitTime="125967" maxWaitTime="336"/><wait waitType="PREEMPTIVE_OS_DISCONNECTNAMEDPIPE" waits="3439" averageWaitTime="84002" maxWaitTime="284"/><wait waitType="PREEMPTIVE_OS_DELETESECURITYCONTEXT" waits="3439" averageWaitTime="71074" maxWaitTime="287"/><wait waitType="PREEMPTIVE_OS_AUTHORIZATIONOPS" waits="2541" averageWaitTime="27749" maxWaitTime="331"/><wait waitType="PREEMPTIVE_OS_LOOKUPACCOUNTSID" waits="2528" averageWaitTime="24563" maxWaitTime="158"/><wait waitType="PREEMPTIVE_OS_REVERTTOSELF" waits="2527" averageWaitTime="19438" maxWaitTime="176"/><wait waitType="PREEMPTIVE_OS_WAITFORSINGLEOBJECT" waits="1298" averageWaitTime="4055" maxWaitTime="191"/><wait waitType="PREEMPTIVE_OS_DECRYPTMESSAGE" waits="200037" averageWaitTime="1613" maxWaitTime="9"/><wait waitType="PREEMPTIVE_OS_DOMAINSERVICESOPS" waits="1" averageWaitTime="1029" maxWaitTime="1029"/></byDuration></preemptive></topWaits><cpuIntensiveRequests><request sessionId="53" requestId="0" command="EXECUTE" taskAddress="0xf6e98aa8" cpuUtilization="0" cpuTimeMs="734"/></cpuIntensiveRequests><pendingTasks></pendingTasks><blockingTasks></blockingTasks></queryProcessing>
18:20.7	io_subsystem	1	clean	<ioSubsystem ioLatchTimeouts="0" intervalLongIos="0" totalLongIos="0"><longestPendingRequests></longestPendingRequests></ioSubsystem>
18:20.7	events	0	unknown	<events><session startTime="2010-02-25T09:16:35.693" droppedEvents="0" largestDroppedEvent="0"><RingBufferTarget truncated="0" eventsPerSec="133214" processingTime="14" totalEventsProcessed="1865" eventCount="45" droppedCount="0" memoryUsed="32582"><event

RTM Issue

There is one issue with this in RTM that is scheduled to be corrected in SQL Server 2012 Service Pack 1 that Mike Wachal has outlined: https://blogs.msdn.com/b/extended_events/archive/2012/02/17/issues-with-the-system-health-session-in-sql-server-2012.aspx (Issues with the system_health session in SQL Server 2012)

Bob Dorr - Principal SQL Server Escalation Engineer

SQL Server 2012 - True Black Box Recorder

Additional resources