SQL Trace Architecture and Terminology
SQL Trace is a SQL Server database engine technology, and it is important to understand that the client-side Profiler tool is really nothing more than a wrapper over the server-side functionality. When tracing, we monitor for specific events, which are generated when various actions occur in the database engine. For example, a user login or the execution of a query are each actions that cause events to fire. Each event has an associated collection of columns, which are attributes that contain data collected when the event fires. For instance, in the case of a query we can collect data about when the query started, how long it took, and how much CPU time it used. Finally, each trace can specify filters, which limit the results returned based on a set of criteria. One could, for example, specify that only events that took longer than 50 milliseconds should be returned.
With over 170 events and 65 columns to choose from, the number of data points that can be collected is quite large. Not every column can be used with every event, but the complete set of allowed combinations is almost 4,000. Thinking about memory utilization to hold all of this data and the processor time needed to create it, you might be interested in how SQL Server manages to keep itself running efficiently while generating so much information. The answer is that SQL Server doesn’t actually collect any data at all until someone asks for it—the model instead is to selectively enable collection only as necessary.
Internal Trace Components
The central component of the SQL Trace architecture is the trace controller, which is a shared resource that manages all traces created by any consumer. Throughout the database engine are various event producers; for example, they are found in the query processor, lock manager, and cache manager. Each of these producers is responsible for generating events that pertain to certain categories of server activity, but each of the producers are disabled by default, and therefore generate no data. When a user requests that a trace be started for a certain event, a global bitmap in the trace controller is updated, letting the event producer know that at least one trace is listening, and causing the event to begin firing. Managed along with this bitmap is a secondary list of which traces are monitoring which events.
Once an event fires, its data is routed into a global event sink, which queues the event data for distribution to each trace that is actively listening. The trace controller routes the data to each listening trace based on its internal list of traces and watched events. In addition to the trace controller’s own lists, each individual trace also keeps track of which events it is monitoring, along with which columns are actually being used as well as what filters are in place. The event data returned by the trace controller to each trace is filtered, and the data columns are trimmed down as necessary, before the data is routed to an I/O provider.
Trace I/O Providers
The trace I/O providers are what actually send the data along to its final destination. The available output formats for trace data are either a file on the database server (or a network share) or a rowset to a client. Both providers use internal buffers to ensure that if the data is not consumed quickly enough (that is, written to disk or read from the rowset) that it will be queued. However, there is a big difference in how the providers handle a situation in which the queue grows beyond a manageable size.
The file provider is designed with a guarantee that no event data will be lost. To make this work even if an I/O slowdown or stall occurs, the internal buffers begin to fill if disk writes are not occurring quickly enough. Once the buffers fill up, threads sending event data to the trace begin waiting for buffer space to free up. In order to avoid threads waiting on trace buffers, it is imperative to ensure that tracing is done to a fast enough disk system. To monitor for these waits, watch the SQLTRACE_LOCK and IO_COMPLETION wait types.
The rowset provider, on the other hand, is not designed to make any data loss guarantees. If data is not being consumed quickly enough and its internal buffers fill, it waits up to 20 seconds before it begins jettisoning events in order to free buffers to get things moving. The SQL Server Profiler client tool will send a special error message if events are getting dropped, but you can also find out if you’re headed in that direction by monitoring SQL Server’s TRACEWRITE wait type, which is incremented as threads are waiting for buffers to free up.
A background trace management thread is also started whenever at least one trace is active on the server. This background thread is responsible for flushing file provider buffers (done every 4 seconds), in addition to closing rowset-based traces that are considered to be expired (this occurs if a trace has been dropping events for more than 10 minutes). By flushing the file provider buffers only occasionally, rather than writing the data to disk every time an event is collected, SQL Server can take advantage of large block writes, dramatically reducing the overhead of tracing, especially on extremely active servers.
A common question asked by DBAs new to SQL Server is why no provider exists that can write trace data directly to a table. The reason for this limitation comes down to the amount of overhead that would be required. Because a table does not support large block writes, SQL Server would have to write the event data row by row. The performance degradation caused by event consumption would require either dropping a lot of events or, if a lossless guarantee were enforced, causing a lot of blocking to occur. Neither scenario is especially palatable, so SQL Server simply does not provide this ability. However, as we will see later in the chapter, it is easy enough to load the data into a table either during or after tracing, so this is not much of a limitation.