Storport ETW Logging to Measure Requests Made to a Disk Unit

Greetings NTDEBUGGING community, Bob here again, and today I would like to let everyone know about a new feature implemented in the STORPORT.SYS binaries released in the following KB articles:

· KB 979764 (Windows Server 2008)

· KB 978000 (Windows Server 2008 R2)

 

Beginning with these new versions, it is now possible to measure timing statistics for requests made to a system’s disk unit. These measurements are taken at the lowest possible level of OS interaction with the storage adapter hardware, making it much easier to diagnose storage performance issues. The measurements are taken from the port driver, STORPORT.SYS, via ETW (Event Tracing for Windows). Windows drivers are implemented in a layered architecture, so in our case the STORPORT driver interfaces directly with the adapter or miniport driver. The STORPORT driver is responsible for managing requests and queues, and providing other upper level functions. Each miniport driver is specific to a particular storage adapter card, and knows how to deliver requests to the disk unit over the transport medium, like fibre channel for instance.

Configuring and Starting STORPORT ETW Logging

To measure storage performance, the user enters a threshold value via the GUI interface. (This can be scripted as well; see KB 978000 for details.) When an I/O time is equal to or greater than the entered threshold, an event 13 is written to the ETW log.

Here is a walkthrough to enable this feature via the GUI:

Click Start, type “perfmon”, and press ENTER to start Performance Monitor.

image

 

Expand “Data Collector Sets” and select “Event Trace Sessions”.

image

 

Right-Click on “Event Trace Sessions” and select “New -> Data Collector Set”.  

image

 

The following dialog will appear:

image

 

Give the new data collector set a name in the dialog box. In this example I called it “Storport Performance Logging”. Choose the “Create manually (Advanced)” optionthen click Next to see the following dialog.

image

 

After clicking “Add...” on the dialog above the following list of providers will appear. Select “Microsoft-Windows-StorPort” and click OK.

image

 

When the dialog below opens, select “Keywords (Any)” then click Edit.

image

 

Check the box for IOPerfNotification, and then click OK. Note: For Windows Server 2008 SP1 and SP2, some properties may not be present (e. g. Reset, PortNotification and PauseResume). 

image

 

 

Next, select “Filter” and click Edit…

image

 

This is where we enter our time threshold. Remember that timing values greater than or equal to this filter value will be logged to the trace file.

 

Select “Binary” and check the “Filter enabled” box

image

In the “Filter data” field, enter the threshold value in the number of milliseconds. This must be entered in little endian format. Please refer to this table for the most common entries.

Decimal

Hexadecimal

Binary(little endian)

1ms

5ms

10ms

15ms

1

5

A

F

01 00 00 00 00 00 00 00

05 00 00 00 00 00 00 00

0A 00 00 00 00 00 00 00

0F 00 00 00 00 00 00 00

 

Note: The “Filter type” value will always remain 0 as in the example above showing an entry for 1 ms.

WARNING - The whole data line needs to be filled in when entering the threshold. For demonstration purposes here is the WRONG WAY to enter 256 ms. This will not work because of the missing 00 entries on the line.

image 

Below is the correct way to enter the value (256ms) Note: The Filter value may have to be re-entered for next run because it could get reset.

image

After clicking OK, you will see a new Event Trace session (screenshot below). Right-Click the new Event Trace session and select “Properties”

image

 

Next we need to set a location for the trace file. Ensure you have sufficient disk space in the target folder because logs can be many Megabytes in size depending on the number of hits.

image

 

The target folder may be changed in the “Directory” tab

image

 

I recommend choosing a directory that is more easily located, like C:\PerfLogs, in which case you can click “Browse”

image

 

Note: It is not recommended to store the trace file on the same storage you are troubleshooting.

image

The name for the file is selected on the “File” tab.

image

When you are done setting the file and directory information click OK.

Now right-click the log and select “Start” to begin tracing.

image

The “Status” should indicate that it is running.

image

How long to run the log depends on you issue. You should let the log run until the issue or slow down occur then check the log for delays.

Viewing the Log File

First right-click and select “Stop” to flush any cached data and stop the logging.

image

Then use Event Viewer to examine the log.

image

 

Next choose "Open the saved log…"

image

 

image

image

 

After clicking OK a popup dialog will be displayed to ask if you want to convert the log. Select “No”. Here is some sample event data that you will likely see in the log.

image

 

Example: Event ID 13

image

 

Let's examine the “Details” tab which shows information about the I/O that exceeded or equaled the threshold we had set. If you do not set a threshold, then all I/O events are logged, but for performance reasons we recommend setting a "reasonable" threshold like 15ms.

image

Breaking down the Event Data we have …

 

Miniport Extension: (Kernel address of the miniport extension, not of use to performance logging, only debugging)

PortNumber:   This is the adapter port number (i.e. RaidPort1, etc.)

PathID:   Path number if multi-pathing is in place

TargetID: Target LUN exposed to the Operating System

LUN:   The Logical Unit Number of the physical storage where the particular I/O is directed

*RequestDuration:   How long the request took in milliseconds (the number of note here for storage latency diagnostics)

Command:   the SCSI command sent to the unit. (Usually read or write) This is listed in decimal. Convert this value to hexadecimal with Calc.exe for referencing SCSI commands http://en.wikipedia.org/wiki/SCSI_command

ScsiStatus:   the status that came back from the drive

SRBStatus:   the status as returned from adapter (see srb.h and scsi.h in the Microsoft WDK or http://en.wikipedia.org/wiki/SCSI_Status_Code)

 

Example: Command 42 equals 2A which is a Write of 10 bytes.

image

image

Here’s the respective SCSI command:

image

When viewing the log, the commands will be mostly reads and writes. If I/O is determined to be slow at this level, then the issue is most likely disk or SAN related.

Hope you enjoyed the walkthrough!

-Bob