CPU and Windows Counters in Profiling Tools

The Visual Studio 2008 Profiler enables you to collect performance data generated by the operating system (Windows counters) and performance data generated by the processor unit (CPU counters). You can also select a CPU counter as the event used to generate sampling intervals when profiling with samples.

Windows Counters

Windows counters are part of the Windows diagnostic infrastructure that provides information about the performance of the operating system or an application, service, or driver. Windows counters depend on the configuration of the current computer and might not be available on other computers. For information about how to collect Windows counter data, see How to: Collect Windows Counter Data.
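Windows counters are usually identified by a counter path of the form \Object(Instance)\Counter. The following minimal sketch only formats such paths. The helper name counter_path is hypothetical, and the two counters shown are common examples that might not exist on every computer.

```python
from typing import Optional


def counter_path(obj: str, counter: str, instance: Optional[str] = None) -> str:
    """Format a Windows performance counter path (hypothetical helper).

    Produces \\Object(Instance)\\Counter, or \\Object\\Counter when the
    counter object has no instances.
    """
    if instance is None:
        return "\\{}\\{}".format(obj, counter)
    return "\\{}({})\\{}".format(obj, instance, counter)


if __name__ == "__main__":
    # Example paths only; availability depends on the current computer.
    print(counter_path("Processor", "% Processor Time", "_Total"))
    print(counter_path("Memory", "Pages/sec"))
```

A path built this way is only a string. Whether the counter can actually be collected still depends on the configuration of the computer.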

CPU Counters

CPU counters are a feature of the computer's CPU that store the count of hardware-related events. The Visual Studio 2008 Profiler enables you to use these event counts as the sampling interval or collect the event counts when you profile using instrumentation.

Note

The way you use the counters depends on the profiling method. When you use the instrumentation method, you can collect data from one or more CPU counters in addition to the standard timing data. When you use the sampling method, you can choose a CPU counter as the sampling interval. For more information, see How to: Collect CPU Counter Data.
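To illustrate what it means to use a CPU counter as the sampling interval, the following sketch models a sample that fires every time a chosen number of counter events has accumulated. It is a conceptual model with made-up event counts, not a description of how the profiler is implemented. The function name sample_points is hypothetical.

```python
from typing import Iterable, List


def sample_points(event_counts: Iterable[int], interval: int) -> List[int]:
    """Return the indices of the time slices in which a sample would fire.

    event_counts[i] is the number of counter events observed in slice i
    (made-up data). A sample fires each time `interval` more events have
    accumulated, so a busy slice can produce several samples.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    samples = []
    pending = 0  # events counted since the last sample
    for i, count in enumerate(event_counts):
        pending += count
        while pending >= interval:
            samples.append(i)
            pending -= interval
    return samples


if __name__ == "__main__":
    # Slices with more events (for example, cache misses) get more samples.
    print(sample_points([3, 0, 5, 2, 10], interval=5))
```

In this model, code that generates many events receives proportionally more samples, which is why a counter-based sampling interval helps locate the source of a particular kind of hardware event.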

Performance counters are CPU-specific. Different models and versions of a CPU can have significantly different configuration settings to enable the same performance counter. Visual Studio 2008 Profiler portable events decouple some of the common performance counters from specific processors and enable you to collect or sample generic performance events.

If you want to count a particular event when profiling, for example, L2 cache misses, you can build a performance session around that event source on any CPU that has an L2 cache. The performance session can then be moved from platform to platform without modification.

The Visual Studio 2008 Profiler continues to support events that are particular to a specific platform. For example, a developer on a Pentium 4 platform might want to count an event that is specific to the NetBurst architecture. Such an event is not portable, but it is still available to the developer for a specific performance session on a specific platform.

Portable and Platform Events

Portable events are a group of counters that are not tied to a particular CPU. All other counters are called platform events, and might not be supported on every platform.

Counters for both portable and platform events are defined in .XML files, where specific values related to the counters are provided. There are multiple files for different CPUs, because data for Intel and AMD CPUs, for example, are different. The Visual Studio 2008 Profiler uses this information to present appropriate counters, portable and platform, to the user for performance measurement.
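The following sketch illustrates the idea of per-CPU definition files: the same portable event name resolves to a different CPU-specific value on each processor, and a platform event resolves only on the processor that defines it. The XML element names, attribute names, CPU names, and codes are all invented for illustration. They are not the profiler's actual file format or real event codes.

```python
import xml.etree.ElementTree as ET

# Hypothetical counter definitions. Every name and code below is made up;
# this is NOT the profiler's real schema or real hardware event codes.
CPU_A_XML = """
<counters cpu="ExampleCpuA">
  <counter name="L2 Cache Read Misses" portable="true" code="0x01"/>
  <counter name="Vendor A Only Event" portable="false" code="0x02"/>
</counters>
"""
CPU_B_XML = """
<counters cpu="ExampleCpuB">
  <counter name="L2 Cache Read Misses" portable="true" code="0x7A"/>
</counters>
"""


def load_counters(xml_text):
    """Parse one definition file into {name: (is_portable, code)}."""
    root = ET.fromstring(xml_text)
    return {
        c.get("name"): (c.get("portable") == "true", int(c.get("code"), 16))
        for c in root.findall("counter")
    }


def resolve(name, xml_text):
    """Return the CPU-specific code for `name`, or None if unsupported."""
    entry = load_counters(xml_text).get(name)
    return None if entry is None else entry[1]


if __name__ == "__main__":
    for label, xml_text in (("A", CPU_A_XML), ("B", CPU_B_XML)):
        print(label, resolve("L2 Cache Read Misses", xml_text),
              resolve("Vendor A Only Event", xml_text))
```

A session that refers only to the portable name can move between the two example CPUs unchanged, while a session that uses the platform event works only where that event is defined.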

Portable Events

Portable events contain the following events:

General Events

  • Instructions Retired: Indicates the number of instructions that executed to completion.

  • Non Halted Cycles: Indicates the number of cycles in which the processor is not halted. Cycles in which the processor is stopped, for example, while waiting for I/O, are not counted.

Front End Events

  • ITLB Misses: Indicates the number of Instruction Translation Look-aside Buffer (ITLB) lookups that resulted in a miss.

Branch Events

  • Branches Retired: Indicates the number of branch instructions that executed to completion.

  • Mis-predicted Branches: Indicates mis-predicted branches that occur because the processor predicted an incorrect path. Mis-predicted branches affect performance because the processor must discard all the work done on the incorrect path and start again on the correct path.

Memory Events

  • L2 Cache Read Misses: Indicates the number of second-level cache read misses.

  • L2 Cache Read References: Indicates the number of second-level cache read references. It includes load misses and read for ownership (RFO) misses and hits.
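Raw event counts are often easier to interpret as ratios. The following sketch derives three common ratios from the portable events listed above. The counts are made up, the function names are hypothetical, and the sketch assumes that each event is reported as a simple total. Whether all of these events can be collected in a single run depends on the processor.

```python
def ratio(numerator, denominator):
    """Divide two counts, returning None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_metrics(counts):
    """Compute ratios from a {portable event name: count} dictionary."""
    return {
        "cycles_per_instruction": ratio(
            counts["Non Halted Cycles"], counts["Instructions Retired"]),
        "branch_mispredict_rate": ratio(
            counts["Mis-predicted Branches"], counts["Branches Retired"]),
        "l2_read_miss_rate": ratio(
            counts["L2 Cache Read Misses"], counts["L2 Cache Read References"]),
    }


if __name__ == "__main__":
    # Made-up totals for one profiled function.
    example = {
        "Non Halted Cycles": 2000,
        "Instructions Retired": 1000,
        "Mis-predicted Branches": 50,
        "Branches Retired": 200,
        "L2 Cache Read Misses": 30,
        "L2 Cache Read References": 120,
    }
    for name, value in derived_metrics(example).items():
        print(name, value)
```

Ratios make it possible to compare functions that execute very different numbers of instructions, which raw counts alone do not allow.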

Pentium 4 Events

The Pentium 4 events are the following:

Memory Events

  • 64K Alias Conflicts: Indicates the number of 64K-alias conflicts. A conflict occurs when a virtual memory address references a cache line that is a multiple of 64K bytes apart from another cache line that already resides in the L1 cache.

  • Page Walk DTLB Misses: Indicates the number of page walks requested because of a Data Translation Look-aside Buffer (DTLB) miss. On a DTLB miss, the processor walks the page tables to find the translation. If the page is not present, a page fault occurs so that the operating system can load the required page.

  • L3 Cache Read Misses: Indicates the number of third-level cache read misses. It includes misses that occur because of load and read for ownership (RFO).

  • L3 Cache Read References: Indicates the number of third-level cache read references. It includes load misses and read for ownership (RFO) misses and hits.

  • All MOB Load Replays: Indicates the number of load instructions that experienced memory order buffer (MOB) replays because store-to-load forwarding restrictions were not observed.

  • Load/Store Splits Completed: Indicates the number of load and store splits. Data splits decrease performance because they force the processor to read or write two cache lines separately and then combine the two parts of the data.
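The address arithmetic behind two of these events can be sketched as follows. The sketch assumes a 64-byte cache line, which is an assumption that should be checked for the processor being profiled, and the function names are hypothetical. It models only the conditions described above, not the processor's full behavior.

```python
CACHE_LINE_BYTES = 64      # assumed cache line size; varies by processor
ALIAS_STRIDE = 64 * 1024   # the 64K distance named by the event


def splits_cache_line(address, size, line_bytes=CACHE_LINE_BYTES):
    """True if an access of `size` bytes at `address` spans two cache lines."""
    first_line = address // line_bytes
    last_line = (address + size - 1) // line_bytes
    return first_line != last_line


def is_64k_alias(addr_a, addr_b, line_bytes=CACHE_LINE_BYTES):
    """True if the two addresses fall in different cache lines whose start
    addresses are a multiple of 64K bytes apart."""
    line_a = addr_a - addr_a % line_bytes
    line_b = addr_b - addr_b % line_bytes
    return line_a != line_b and (line_a - line_b) % ALIAS_STRIDE == 0


if __name__ == "__main__":
    print(splits_cache_line(60, 8))          # bytes 60..67 cross offset 64
    print(splits_cache_line(64, 8))          # bytes 64..71 stay in one line
    print(is_64k_alias(0x10000, 0x20000))    # exactly 64K apart
```

Checks like these can help explain a high count: aligning data to the line size avoids splits, and changing the spacing of large buffers avoids 64K aliasing.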

Front End Events

  • Page Walk ITLB Misses: Indicates the number of page walk requests because of Instruction Translation Look-aside Buffer (ITLB) misses.

  • ITLB References: Indicates the number of ITLB accesses.

Branch Events

  • Trace Cache Lookup Misses: Indicates delays that occurred in order to decode instructions and build a trace because of a trace cache lookup miss.

Floating Point Unit

  • 64-bit MMX Micro-Ops Retired: Indicates the number of retired 64-bit MMX micro-operations. An assembly instruction can break into one or more micro-operations.

  • X87 SIMD Micro-Ops Retired: Indicates the number of retired X87 single instruction multiple data (SIMD) micro-operations. An assembly instruction can break into one or more micro-operations.

  • X87 Floating Point Micro-Ops Retired: Indicates the number of retired X87 floating point micro-operations. An assembly instruction can break into one or more micro-operations.

  • Packed Single Precision Micro-Ops Retired: Indicates the number of retired packed single precision micro-operations. Extra instructions are required to unpack the data.

  • Scalar Single Precision Micro-Ops Retired: Indicates the number of retired scalar single precision micro-operations.

  • Packed Double Precision Micro-Ops Retired: Indicates the number of retired packed double precision micro-operations. Extra instructions are required to unpack the data.

  • Scalar Double Precision Micro-Ops Retired: Indicates the number of retired scalar double precision micro-operations.

  • 128-bit MMX Micro-Ops Retired: Indicates the number of retired 128-bit MMX micro-operations. An assembly instruction can break into one or more micro-operations.

  • SSE Input Assists: Indicates the number of assists necessary to handle an exception condition for SSE/SSE2 floating-point operations.

Viewing Available Counters

Visual Studio UI

To view a list of all CPU performance counters that are supported on the current platform, open the Performance Session Property Pages and do one of the following:

  • Select Sampling, and then select Performance counter from the Sample event list.

    - or -

  • Select CPU Counters, and then select Collect CPU Counters.

To view a list of the Windows performance counters that are supported on the current platform, open the Performance Session Property Pages, and then select Windows Counters.

Command Line

Use the /querycounters option of VSPerfCmd.exe to print a list of all CPU performance counters that are supported on the current platform.
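If you want to capture the list from a script, the command can be run as a child process. The following sketch assumes that VSPerfCmd.exe is on the PATH, or that you pass its full path. The function names are hypothetical. Because the output format is not described here, the sketch returns the text unparsed.

```python
import subprocess


def query_counters_command(vsperfcmd="VSPerfCmd.exe"):
    """Build the argument list for listing supported CPU counters."""
    return [vsperfcmd, "/querycounters"]


def list_cpu_counters(vsperfcmd="VSPerfCmd.exe"):
    """Run the command and return its raw text output (not parsed).

    Raises FileNotFoundError if the tool cannot be found, and
    subprocess.CalledProcessError if it exits with a nonzero code.
    """
    result = subprocess.run(
        query_counters_command(vsperfcmd),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Requires a computer where the profiling tools are installed.
    print(list_cpu_counters())
```

Saving the output for each test computer gives a quick way to confirm that an event used in a performance session is supported there.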

See Also

Tasks

How to: Choose Sampling Events

How to: Collect CPU Counter Data

How to: Collect Windows Counter Data

Other Resources

Overviews (Profiling Tools)