Use SQL Server replay tools to reproduce and resolve customer issues
For many ISVs run that into issues at customer sites, it is sometimes difficult to isolate underlying problems, especially on a 24x7 production environment, where limitations apply to real time troubleshooting and live debugging. In situations like this, constructing a repro scenario in a separate environment would be ideal to minimize impact to live production system, and to speed up resolution process.
SQL Server Profiler
Allow me introduce SQL Profiler, which offers replay trace function. Well, it’s not something new. First shipped in SQL Server 7.0, the feature has gone through many improvements in later releases. You can use the tool to take a captured trace as input and replay it against test database(s). It helps identify issues that could be reproduced by replaying the events in the trace. Profiler itself uses ODBC only. In SQL Server 2005 and 2008, the replay function can be configured to use multiple threads (up to 64) to replay workloads.
1. SQL Server profiler is a built-in tool with full support of Microsoft product team. It works out of box.
2. Easy to set up and run. Capture a trace using predefined replay template with all required events, and replay it against original database(s) (target machine needs to meet certain requirements http://msdn2.microsoft.com/en-us/library/ms175525.aspx)
3. In addition to multi-threaded replay, it also provides option of step through to replay events in the order they were traced.
1. Certain events can’t be replayed including replication, events involving GUID, session binding events, operations on Blobs using bcp, full-text, READTEXT, WRITETEXT, and etc. See BOL for more details (http://msdn2.microsoft.com/en-us/library/ms188238.aspx)
2. The tool does not support multiple machine replay (running multiple instances of Profiler from multiple machines to replay the trace).
3. Profiler GUI tool is client side tracing and might be intrusive and generate significant performance overhead when capturing events for replay. Be careful of what events to capture and consider using server side tracing (sp_trace_create).
Starting in SQL Server 2000, SQL Server Customer Support Services team (CSS) started a project of similar tool, called Ostress, with higher flexibility and scalability to help troubleshoot some of the challenging SQL problems. The latest version is packaged in “Replay Markup Language(RML) Utilities” supporting both SQL 2005 and SQL 2000 (http://support.microsoft.com/kb/944837). The tool can replay multi-threaded events as profiler does but with multiple machine replay support. It can simulate up to 1000 concurrent threads. The tool has a component called Ostress (just like old version), which takes a TSQL batch file and “stress” test it by opening arbitrary number of connections and iterate the TSQL batch in each connection configurable number of loops. This is useful when workload can be characterized as same or similar batch from various number of users (connections).
1. The tool offers both replay and stress test options.
2. It supports multiple machine replay (multiple instances of OStress) with up to 1000 concurrent threads.
3. OStress supports 7.0, 2000, and 2005 trace formats.
1. The tool is provided as is, no technical support from Microsoft. But you can submit questions via contact in readme of the tool.
2. Requires extra steps to process trace file and convert to RML format before being replayed.
3. Does not support MARS replay.
When to use SQL profiler and when to use RML Utilities? If you have a workload that can be replayed/reproduced with no or low concurrency requirement (<64 concurrent threads), use profiler that offers flexibility of step through or multi-threaded replay options. If you need to replay a workload with high concurrency requirement (> 64 threads) or an isolated batch that can be “stress” tested for simulation, use RML Utilities. Keep in mind, for concurrency replay, full sync of ordered events is very hard to replay and no tools exist today to exactly duplicate the original trace. So the issues that happened on traced source server might not be reproduced consistently afterwards even on same environment.
Both tools above are for database replay. For a simulation test of multi-tier application environment, consider load-test tool of Visual Studio (Team edition) or 3rd-party vendor products.
Cross Posted from http://blogs.microsoft.com/mssqlisv