This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
OpenSQLTrace: Automated Trace Processing and Analysis System
Andrew Zanevsky, Chad Littlewood, Michael Hayes, Raied Idan, and Bill Nicholson
Last month, you read Andrew Zanevsky's feature on scrubbing large trace files and aggregating summary information, and most of you probably also downloaded his stored procedures for generating a report of most expensive transactions. In this follow-up article, Andrew and his co-authors describe a system that they built to fully automate recurring execution of traces and generation of reports.
The value of data captured by traces increases significantly if you run traces on a regular basis and preserve all results for historical trend analysis. But storage space quickly becomes a constraint. Our main production server executes 4 million transactions per hour, and a trace lasting for 20 minutes creates a 0.5GB trace file. Our system aims to digest all this data and save only the essence. You can schedule an ad hoc trace of any server–or set up a recurring trace–and automatically load and process trace files. As you saw last month, our system scrubs unimportant details from T-SQL, thus reducing transaction types to a manageable number, and generates and saves a report of most expensive transactions. Such reports, accumulated over a period of several weeks, provide data for performance trend analysis of the whole server or of any particular transaction type.
You can install our system on any networked SQL Server that has linked server connections to all the SQL Servers on which you wish to run traces. To save the trace files, the central server's hard drives must be accessible over the network from the traced servers. The central trace server acts as a scheduler of all traces, a processor of data, a publisher of recurring reports, a repository for historical data, and an analysis server where DBAs can generate ad hoc reports and conduct performance investigations. This design minimizes the impact on traced servers and reduces the risk of disrupting their work by creating disk space shortages or imposing processing overhead. You can also install and use the system directly on each traced server, provided that it has enough disk space and processing capacity.
For purposes of this article, let's call our central trace server TRACESQL and our traced server PRODSQL. If you're planning to use the same server to trace itself, then substitute the same name for TRACESQL and PRODSQL. Here's how to install OpenSQLTrace:
1. If you intend to trace only the same server where you've installed OpenSQLTrace (in other words, TRACESQL and PRODSQL are the same server), then skip steps 2 and 3.
2. Configure a linked server connection from TRACESQL to PRODSQL. It must have permission to launch the system stored procedures that manage server-side traces. The easiest method is to use an account with the System Administrator role on PRODSQL, but you clearly need to consider the security requirements of your particular environment.
3. Find out which account runs the MSSQLServer service on PRODSQL. It must be a network account.
4. Choose a hard drive partition on TRACESQL that will be used to store trace files. It must have enough room to accommodate a trace file from PRODSQL: probably several gigabytes, but, as they say, "your mileage may vary." Trace file sizes depend on server activity, transaction mix, and trace duration.
5. If TRACESQL and PRODSQL are different servers, create a shared folder called TRACE on TRACESQL on the drive that you chose, and grant all permissions on this share to the network account that runs the MSSQLServer service on PRODSQL.
6. Create a database called Trace on TRACESQL. Allocate enough space to store several full trace files, plus a little extra to keep saved reports. The expiration time of trace files stored in the database is configurable. In our environment, we keep them for only a week, but we keep compiled summary reports indefinitely, allowing for historical trend analysis.
7. Download the OpenSQLTrace2.sql script that accompanies this article and execute it from Query Analyzer on the TRACESQL server. This creates all necessary objects in the Trace database and a daily scheduled job to purge expired trace tables. (If you've previously installed the system in the same database, note that the script drops and re-creates all objects, including saved data, with the exception of unexpired trace tables.)
8. If TRACESQL and PRODSQL are the same server, alter the user-defined function ufn_Trace_File_Name created in the previous step. Change the following line:

   return( '\\' + rtrim( @@servername ) + '\TRACE\' +

   to use a hard-coded path to the TRACE folder that you created in step 5. The exact path depends on your environment. For example, if you created the TRACE folder on drive D:, change the code as follows:

   return( 'D:\TRACE\' +
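Step 2 can be scripted with the standard linked-server system procedures. Here's a minimal sketch, assuming a trusted (Windows-authentication) login mapping; adjust it to the security requirements of your environment:

```sql
-- Run on TRACESQL. Registers PRODSQL as a linked server.
EXEC sp_addlinkedserver
    @server     = 'PRODSQL',
    @srvproduct = 'SQL Server'

-- Map the current login's own credentials to PRODSQL.
-- That account must be allowed to run the sp_trace_* procedures there.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = 'PRODSQL',
    @useself    = 'true'

-- Allow remote procedure calls through the linked server.
EXEC sp_serveroption 'PRODSQL', 'rpc out', 'true'
```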
Last month's article provided usage examples for the stored procedures that scrub trace files and generate hit-parade reports. Note that the new downloadable script renames the Calculate_Most_Expensive_Transactions procedure; the new name is Calculate_Hit_Parade.
This month's script adds new functionality, illustrated by the following examples.
Setting up a one-time unattended trace with hit-parade processing
To test the system, let's set up a one-time trace. Execute the following procedure from Query Analyzer on TRACESQL:
Schedule_Trace 'PRODSQL', default, 1
This will schedule a job on TRACESQL to run within two minutes, launch a trace on PRODSQL to run for one minute, and save the file to the TRACE share on TRACESQL. It will also schedule another job on TRACESQL to run 10 minutes after the estimated end time of the trace, load the file into a table in the Trace database, scrub recorded T-SQL statements (see last month's article for details), generate a hit parade of the most expensive transactions, and save it in a table in the Trace database. (Hint: You can use fn_trace_getinfo() to monitor trace progress.)
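As the hint suggests, you can watch the trace's progress from Query Analyzer on PRODSQL with the built-in fn_trace_getinfo() function; passing default returns information about all active traces:

```sql
-- Property 2 = trace file name, property 5 = status (1 = running, 0 = stopped).
SELECT traceid, property, value
FROM ::fn_trace_getinfo(default)
```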
Both jobs will automatically delete themselves upon successful completion. If you're impatient and want to run the test quicker, you can manually launch the first scheduled job, wait for one minute (trace duration), and then manually launch the second job.
Once the second job completes, you'll be able to find a saved report of most expensive transactions in the Hit_Parade_Archive table in the Trace database and retrieve it using stored procedure Retrieve_Report.
By default, the system records completion of T-SQL batches and remote procedure calls. If you want to record other trace events–or go into greater depth and separately record each query executed within stored procedures–you'll need to give Schedule_Trace the list of events through the @Event_Class_Filter parameter.
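The exact format of @Event_Class_Filter is defined in the downloadable script; assuming it accepts a comma-separated list of SQL Trace event class numbers, a call that also captures statement-level events inside stored procedures might look like this:

```sql
-- 10 = RPC:Completed and 12 = SQL:BatchCompleted (the defaults);
-- 45 = SP:StmtCompleted adds per-statement detail inside procedures.
-- The parameter format shown here is an assumption; check the script.
Schedule_Trace 'PRODSQL', default, 1, @Event_Class_Filter = '10,12,45'
```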
Scheduling a daily trace
If you need to run traces daily, you can schedule one trace as described previously (only specify the desired trace start time instead of default and the desired duration instead of one minute). Then manually change the properties of both scheduled jobs (run trace and process trace) to set the schedule for daily execution instead of one-time. Also, uncheck the "Automatically delete job" option on the Notifications tab of the scheduled job dialog in Enterprise Manager to prevent the jobs from deleting themselves upon completion (the default behavior set by Schedule_Trace).
Retrieving and analyzing a hit-parade report
To retrieve a hit-parade report for any trace, you need to know the trace table name that was used to load the data. Trace tables are automatically deleted from the Trace database when they expire (a configurable parameter), but reports extracted from them are always associated with the original table name. (The Trace_Directory table contains a directory of all processed trace tables.) You can find the trace table name by server name and the time of the trace.
Execute the following stored procedure to retrieve one hit-parade report:
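Assuming Retrieve_Report takes the trace table name as recorded in Trace_Directory, the call looks like this:

```sql
Retrieve_Report '<Trace_Table_Name>'
```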
You can view sample hit-parade reports in last month's article. We usually copy and paste these reports into Excel (one is included in this month's Download) where it's easy to sort and analyze the data.
In our environment, we also created a DTS package to automatically publish a daily report of most expensive transactions to a network share in a spreadsheet format. Developers can access this report to see how their stored procedures are performing and to identify bottlenecks. [I applaud the authors' provision for developer feedback–and accountability.–Ed.]
Getting actual source code of transactions by aggregated type
Once you've identified the most expensive transaction types, you may want to see the actual unscrubbed T-SQL code of all transactions that were aggregated under one type. For this "drill-down," execute this stored procedure:
Report_TSQL_by_ID '<Trace_Table_Name>', <SQL_Type_ID>
where <SQL_Type_ID> is a numeric ID of a transaction type from the hit-parade report derived from the trace table specified as <Trace_Table_Name>.
Comparing two reports
One of the most effective analysis methods is comparing two different hit-parade reports side-by-side. You may want to compare performance of the same server between two different traces or performance of two different servers with the same transaction mix. Stored procedure Compare_Reports takes the names of two trace tables (from the Trace_Directory table) as parameters and compares their saved reports. For each transaction type it shows statistics from the first trace, the second trace, and the absolute and relative difference.
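Following the calling pattern of the other report procedures, a comparison takes the two trace table names from Trace_Directory (the placeholder names are ours):

```sql
Compare_Reports '<Trace_Table_Name_1>', '<Trace_Table_Name_2>'
```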
Comparison of two reports only makes sense if you traced the same event types in both of them. In our environment, we run a trace at the same time of day on the same server, for the same number of minutes, which makes day-to-day comparison reasonable. But we can think of a number of analysis tasks that would require comparison of traces from two different servers, or executed at different times of day.
We copy and save trace comparison reports into Excel for further analysis. They help us answer questions like these:
- What has changed in the transaction mix that could lead to slower performance?
- Which transactions became more expensive to process?
- How has execution frequency or average duration of a particular stored procedure changed on the same server between two dates?
- Which new transaction types have appeared on the hit-parade report?
- How do the I/O and CPU costs of a particular transaction type compare between two traces?
Retrieving history of a particular transaction type from all saved reports
Sometimes you may want to see how the performance of a particular transaction changes over time: for example, when you investigate a bottleneck transaction and need to analyze, and possibly plot, response-time degradation or improvement. We also use this history to verify that modifications applied to stored procedures have indeed improved their performance.
We run traces for our main production server daily and save all reports. This information, collected for several months, allows us to generate historical charts for any particular transaction type that we want to investigate. Stored procedure SQL_Type_History takes transaction type as a parameter (exactly as shown in a hit-parade report) and selects all relevant records from past hit parades.
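Since the parameter is the transaction type exactly as shown in a hit-parade report, the call follows the familiar pattern (the placeholder is ours):

```sql
SQL_Type_History '<Transaction_Type>'
```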
We copy and paste the report into a spreadsheet so we can chart transaction duration over time, as in Figure 1. In this particular case, the chart illustrates how optimization efforts have improved transaction performance.
Administering the trace system
The system includes several stored procedures for administration of the trace system. When you install the downloadable script, it schedules stored procedure Purge_Trace_Tables to run daily. This procedure deletes expired trace tables, which contain all the trace data and use a lot of space. Note that it doesn't delete the hit-parade reports derived from those tables. You may execute this procedure manually as needed. Expiration dates are recorded for each trace table in the Trace_Directory table and may be modified manually if you want to preserve a particular table longer or eliminate it sooner.
Modify view v_Trace_Expiration to configure the default expiration period for trace tables. We use seven days, but you may choose a different duration.
Use stored procedure Integrity_Check to report any problems in the trace directory, such as missing or partially processed tables.
To delete a particular trace, execute stored procedure Delete_Trace.
Use procedure Delete_Hit_Parade to delete a previously saved hit-parade report.
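The administration procedures follow the same calling pattern as the report procedures; assuming each deletion procedure takes the trace table name from Trace_Directory, typical calls look like this:

```sql
-- Remove expired trace tables now instead of waiting for the daily job.
Purge_Trace_Tables

-- Delete one trace (table name as recorded in Trace_Directory).
Delete_Trace '<Trace_Table_Name>'

-- Delete a previously saved hit-parade report for a trace.
Delete_Hit_Parade '<Trace_Table_Name>'
```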
It's difficult to describe in detail all the functions and uses of the trace system in this limited space, and you're obviously welcome to read our source code for additional clues and usage information. If you have specific questions or enhancement suggestions, or if you experience problems, please send any of us an e-mail and we'll try to reply directly. We'll also share important answers with all readers in future issues. However, we don't make any warranties of functionality, safety, reliability, or support, so please use the system at your own risk.
To find out more about SQL Server Professional and Pinnacle Publishing, visit their Web site at http://www.pinpub.com/
Note: This is not a Microsoft Corporation Web site. Microsoft is not responsible for its content.
This article is reproduced from the November 2004 issue of SQL Server Professional. Copyright 2004, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.