Storport ETW Logging to Measure Requests Made to a Disk Unit
Greetings NTDEBUGGING community, Bob here again, and today I would like to let everyone know about a new feature implemented in the STORPORT.SYS binaries released in the following KB articles:
· KB 979764 (Windows Server 2008)
· KB 978000 (Windows Server 2008 R2)
Beginning with these new versions, it is now possible to measure timing statistics for requests made to a system’s disk unit. These measurements are taken at the lowest possible level of OS interaction with the storage adapter hardware, making it much easier to diagnose storage performance issues. The measurements are taken from the port driver, STORPORT.SYS, via ETW (Event Tracing for Windows). Windows drivers are implemented in a layered architecture, so in our case the STORPORT driver interfaces directly with the adapter or miniport driver. The STORPORT driver is responsible for managing requests and queues, and providing other upper level functions. Each miniport driver is specific to a particular storage adapter card, and knows how to deliver requests to the disk unit over the transport medium, like fibre channel for instance.
Configuring and Starting STORPORT ETW Logging
To measure storage performance, the user enters a threshold value via the GUI interface. (This can be scripted as well; see KB 978000 for details.) When an I/O time is equal to or greater than the entered threshold, an event 13 is written to the ETW log.
Here is a walkthrough to enable this feature via the GUI:
Click Start, type “perfmon”, and press ENTER to start Performance Monitor.
Expand “Data Collector Sets” and select “Event Trace Sessions”.
Right-Click on “Event Trace Sessions” and select “New -> Data Collector Set”.
The following dialog will appear:
Give the new data collector set a name in the dialog box. In this example I called it “Storport Performance Logging”. Choose the “Create manually (Advanced)” optionthen click Next to see the following dialog.
After clicking “Add...” on the dialog above the following list of providers will appear. Select “Microsoft-Windows-StorPort” and click OK.
When the dialog below opens, select “Keywords (Any)” then click Edit.
Check the box for IOPerfNotification, and then click OK. Note: For Windows Server 2008 SP1 and SP2, some properties may not be present (e. g. Reset, PortNotification and PauseResume).
Next, select “Filter” and click Edit…
This is where we enter our time threshold. Remember that timing values greater than or equal to this filter value will be logged to the trace file.
Select “Binary” and check the “Filter enabled” box
In the “Filter data” field, enter the threshold value in the number of milliseconds. This must be entered in little endian format. Please refer to this table for the most common entries.
01 00 00 00 00 00 00 00
05 00 00 00 00 00 00 00
0A 00 00 00 00 00 00 00
0F 00 00 00 00 00 00 00
Note: The “Filter type” value will always remain 0 as in the example above showing an entry for 1 ms.
WARNING - The whole data line needs to be filled in when entering the threshold. For demonstration purposes here is the WRONG WAY to enter 256 ms. This will not work because of the missing 00 entries on the line.
Below is the correct way to enter the value (256ms) Note: The Filter value may have to be re-entered for next run because it could get reset.
After clicking OK, you will see a new Event Trace session (screenshot below). Right-Click the new Event Trace session and select “Properties”
Next we need to set a location for the trace file. Ensure you have sufficient disk space in the target folder because logs can be many Megabytes in size depending on the number of hits.
The target folder may be changed in the “Directory” tab
I recommend choosing a directory that is more easily located, like C:\PerfLogs, in which case you can click “Browse”
Note: It is not recommended to store the trace file on the same storage you are troubleshooting.
The name for the file is selected on the “File” tab.
When you are done setting the file and directory information click OK.
Now right-click the log and select “Start” to begin tracing.
The “Status” should indicate that it is running.
How long to run the log depends on you issue. You should let the log run until the issue or slow down occur then check the log for delays.
Viewing the Log File
First right-click and select “Stop” to flush any cached data and stop the logging.
Then use Event Viewer to examine the log.
Next choose "Open the saved log…"
After clicking OK a popup dialog will be displayed to ask if you want to convert the log. Select “No”. Here is some sample event data that you will likely see in the log.
Example: Event ID 13
Let's examine the “Details” tab which shows information about the I/O that exceeded or equaled the threshold we had set. If you do not set a threshold, then all I/O events are logged, but for performance reasons we recommend setting a "reasonable" threshold like 15ms.
Breaking down the Event Data we have …
Miniport Extension: (Kernel address of the miniport extension, not of use to performance logging, only debugging)
PortNumber: This is the adapter port number (i.e. RaidPort1, etc.)
PathID: Path number if multi-pathing is in place
TargetID: Target LUN exposed to the Operating System
LUN: The Logical Unit Number of the physical storage where the particular I/O is directed
*RequestDuration: How long the request took in milliseconds (the number of note here for storage latency diagnostics)
Command: the SCSI command sent to the unit. (Usually read or write) This is listed in decimal. Convert this value to hexadecimal with Calc.exe for referencing SCSI commands https://en.wikipedia.org/wiki/SCSI_command
ScsiStatus: the status that came back from the drive
Example: Command 42 equals 2A which is a Write of 10 bytes.
Here’s the respective SCSI command:
When viewing the log, the commands will be mostly reads and writes. If I/O is determined to be slow at this level, then the issue is most likely disk or SAN related.
Hope you enjoyed the walkthrough!