Walkthrough: Analyzing Performance Reports in Visual Studio Team System 2008

The goal of this walkthrough is to use the various report views provided by the Performance Wizard to diagnose performance issues in a sample application named "Rational Numbers Class, Operator Overloading Sample." The Rational Numbers Class sample can be found in the MSDN Code Gallery. This class has a function to create and factor several large rational numbers.

This article is based in part on the following blog entries by Ian Huff: Analyzing a performance report in Visual Studio Team System 2005 Part 1, Analyzing a performance report in Visual Studio Team System 2005 Part 2, Analyzing a performance report in Visual Studio Team System 2005 Part 3, and Analyzing a performance report in Visual Studio Team System 2005 Part 4.

Before You Begin

You must download and decompress the "Rational Numbers Class, Operator Overloading Sample" from MSDN Code Gallery, and then create a performance report before you use this walkthrough.

Note

This walkthrough requires Microsoft Visual Studio Team System 2008 Developer Edition or Microsoft Visual Studio Team System 2008 Team Suite.

To prepare for this walkthrough

  1. Download Rational Numbers Class, Operator Overloading Sample from MSDN Code Gallery.

  2. Right-click A1Rational.zip and then click Extract All.

  3. Open the A1Rational folder and then open the Rational folder.

  4. Double-click rational.sln.

    Visual Studio 2008 starts with the solution automatically loaded in Solution Explorer.

  5. Create a Sampling performance session for rational.sln by using the Performance Wizard. For detailed instructions, see "To create a performance session for Windows client application" in How to: Profile with the Performance Wizard and Profiler Defaults.

  6. In the Performance Explorer window, click Launch with Profiling to start the application.

  7. Click each button of the application and then click Close.

Using Summary View

After the performance session is completed, a new report, with the .vsp file name extension, appears in the Reports folder of the Performance Explorer window. The performance report automatically opens in the Summary view in the IDE. The Summary view provides a quick and easy way to see where the biggest performance issues are in an application. The information that the Summary view contains differs depending on whether the performance session was created in instrumentation mode, in sampling mode, or with allocation data included. When you initially profile an application, we recommend that you use sampling mode first. Sampling mode can help you identify CPU-bound performance problems, whereas instrumentation mode is needed when you are investigating an I/O or network issue, or when the number of calls is important to your investigation.

By default, the Summary view for sampling mode lists the top five functions in terms of inclusive samples and exclusive samples. Functions That Cause the Most Work lists the functions that have the greatest number of inclusive samples in the profiling run. An inclusive sample is a sample that was taken while this function, or one of its sub-functions, was executing. Functions That Do the Most Individual Work lists the functions that have the greatest number of exclusive samples in the profiling run. An exclusive sample is a sample that was taken only while the function itself was executing; it does not include any of its sub-functions. The inclusive numbers can be useful, especially when the work is caused by your code but actually performed in the .NET Framework. The fastest way to look for a performance issue is to look at the function that has the highest number of exclusive samples; that function is where most of your processing power is going. Looking at the top exclusive functions, we see that the Rational.reduce() function is the big offender: over 80 percent of the samples were taken while that function was executing.

Figure 1: Summary View, Sampling mode

Summary View, Sampling mode
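
To make the distinction between inclusive and exclusive samples concrete, here is a minimal, hypothetical sketch. The function names are invented for illustration; none of this code comes from the Rational sample.

// Hypothetical C# example illustrating inclusive versus exclusive samples.
static void ProcessAll()
{
    for (int i = 0; i < 1000; i++)
    {
        ProcessOne(i);              // samples taken while ProcessOne runs count
    }                               // as inclusive (but not exclusive) for ProcessAll
}

static void ProcessOne(int n)
{
    double d = n;
    for (int j = 0; j < 100000; j++)
    {
        d = Math.Sqrt(d + j);       // most samples land here, so ProcessOne gets
    }                               // high exclusive and inclusive counts
}

In a sampling report for code like this, ProcessAll would show a high inclusive count but a low exclusive count, while ProcessOne would rank high in both lists.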

In instrumentation mode, the Summary view contains three sections: Most Called Functions, Functions With Most Individual Work, and Functions Taking Longest. Most Called Functions lists the functions that were called most often, together with the number of times each was called. Functions With Most Individual Work lists the functions that have the most time spent inside the function itself and not in any sub-function (exclusive time). Functions Taking Longest shows the functions that took the most time, including all their sub-functions (inclusive time). Figure 2 shows the Summary view in instrumentation mode for the sample application. The advantage of instrumentation is that it captures every function entry and exit, so you get absolute data to analyze, not just samples.

Figure 2: Summary View, Instrumentation Mode

Summary View, Instrumentation mode

Looking at Functions With Most Individual Work and Functions Taking Longest, we can confirm what we learned from sampling: Rational.reduce() is the function that consumes most of the time. But we can also glean another useful bit of information from this page. We can see in Most Called Functions that System.Array.GetUpperBound() is being called far more than any other function.

By using the Summary views, we were able to isolate two possible performance issues in the code. First, we learned that Rational.reduce() uses most of the processing time. Second, we learned that System.Array.GetUpperBound() is called many times. The next task is to find more information about these possible performance hotspots by using some of the other views provided with the profiler. You can switch among the different performance report views by selecting the view name from the Current View drop-down list at the top of the report window.

Using Functions View

The next analysis view we will use is the Functions view. In instrumentation mode, the Functions view contains a list of all the functions that were run. In sampling mode, the Functions view lists all the functions that were executing or on the call stack when a sample was taken. Next to each function name are multiple columns of aggregate data for that specific function. In Figure 3, you can see an example of the Functions view in instrumentation mode, sorted descending by Elapsed Exclusive Time.

Figure 3: Functions View, Instrumentation mode

Function View

Elapsed Exclusive Time is the time spent in the function itself and not in any sub-functions. Elapsed time is measured by the clock on the computer, so it includes time when the application was blocked or switched out; application time counts only the time spent in the application, or in the kernel on behalf of the application.
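
The difference matters mostly for functions that wait rather than compute. As a rough, hypothetical illustration (none of this code is from the Rational sample):

// Hypothetical C# example: a function that waits accumulates elapsed time but
// almost no application (CPU) time, whereas a function that computes
// accumulates both.
static void WaitForData()
{
    System.Threading.Thread.Sleep(5000);   // about 5 seconds of elapsed time,
                                           // almost no application time
}

static void CrunchNumbers()
{
    double d = 1.0;
    for (int i = 0; i < 100000000; i++)
    {
        d = Math.Sqrt(d + i);              // elapsed and application time both
    }                                      // reflect the CPU work in this loop
}

In the Functions view, a function like WaitForData would show a large elapsed time but very little application time, whereas CrunchNumbers would show roughly the same value for both.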

The default columns in the report provide a good set of information for diagnosing performance issues. Use the tooltip for each column to view a brief description of what the column measures. You can also customize the columns that are displayed in the Functions view by right-clicking any column and then selecting Add/Remove Columns. The Add/Remove Columns dialog box contains the complete list of column options for this view. The additional columns include Process Name, Function Line Number, and several columns for maximum and minimum exclusive time values.

The Functions view can be used much like the Summary view, but it shows all functions, not just the top five in each category. Also, if you want to examine specific modules in a similar view, you can easily switch to Module view: right-click any function in the Functions view and then click Show in Module View.

Looking at the Functions view doesn’t tell us much more than we already knew from the Summary view in this example. The two main issues, Rational.reduce and System.Array.GetUpperBound, are still the two big performance villains. It would help our analysis to see where the two trouble functions fit into the overall structure of the program. For this information, we will turn to the Caller/Callee view and the Call Tree view.

Using Caller/Callee View

The Functions view provides a flat list of all the functions aggregated over the life of the application, whereas the Caller/Callee view focuses on the functions that call a specific function (its callers) and on the functions that the specific function calls (its callees). Because we want to focus on our hotspots, right-click the Rational.reduce function in the Functions view, and then click Show functions called by reduce. This shortcut menu command automatically switches the report to the Caller/Callee view, with Rational.reduce selected as the current function.

Figure 4: Caller/Callee View of Rational.reduce, Instrumentation mode

Caller/Callee View

The Caller/Callee view is divided into three panes: Functions that called <functionname>, Current function, and Functions that were called by <functionname>. In the Functions view, the columns contain a total of all the data for the functions, but in the Caller/Callee view, the columns contain only the data for a specific function. For example, for the function Prime.IsSmallPrime in Functions that were called by <functionname>, the value in the Number of Calls column is the number of times IsSmallPrime was called from Rational.reduce, not the total number of times it was called. Double-clicking a function in Functions that called <functionname> or Functions that were called by <functionname> selects that function as the new current function so that you can trace a specific execution path up or down the call tree. Looking at the Caller/Callee view for Rational.reduce, we can see that Array.GetUpperBound is always being called from Rational.reduce. This behavior could mean that the performance issue is caused by Array.GetUpperBound being called too many times from Rational.reduce.

Using Call Tree View

The Call Tree view provides an alternative to the Caller/Callee view for examining performance data together with the program structure. In the Call Tree view, functions are displayed in a tree structure that corresponds to how they were called. By default, Noise Reduction is applied to the Call Tree view to help reduce the number of times a function appears in the tree. For more information, see How to: Configure Noise Reduction in Performance Reports.

Figure 5 shows the call tree for the Rational class example, with the nodes expanded to show the Rational.reduce and Array.GetUpperBound functions. Just as in the Caller/Callee view, we can see that GetUpperBound is called many times, and always from Rational.reduce. The call tree provides an easy way to browse the structure of a program by using the common UI paradigm of expanding tree nodes.

Figure 5: Call Tree View, Instrumentation mode

Call Tree View

You can also use the Hot Path feature to drill into specific call trees and analyze which execution paths have the greatest performance impact, including bottlenecks. Just right-click the column in the Call Tree view that you want more information about and then click Expand Hot Path on Inclusive Samples.

Including Allocation Data

Before we jump into the code to fix the performance issues we've already identified, let's see whether the managed object data can reveal any other possible issues in our code.

To enable object allocation and object lifetime data, you must create another performance report in sampling mode with certain properties selected.

To create a sampling mode performance report with allocation enabled

  1. In Performance Explorer, right-click the session name and then click Properties.

  2. On the General page under .NET memory profiling collection, select Collect .NET object allocation information and Also collect .NET object lifetime information.

  3. Click OK to exit the dialog box.

  4. Make sure that Sampling appears in the Method drop-down list, and then click Launch with Profiling to create a new performance report.

In the Summary view, you'll notice that the headings differ from our earlier sampling mode report. In addition to these changes, two new views are available: Allocation and Objects Lifetime. First, we’ll look at the Allocation view.

Figure 6: Allocation View, Sampling mode

Allocation View

The Allocation view lists the .NET objects that were allocated by the application or component. When an object row is expanded, a call tree is displayed that shows the execution paths that resulted in the creation of the object, along with the number of allocations attributed to each function in the call tree. This view can help you identify the objects that are using the largest part of your memory. A good way to analyze this data is to sort by Inclusive Bytes and Exclusive Bytes to see which objects are using the most space.
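
For example, an allocation-heavy pattern like the following hypothetical routine (not taken from the Rational sample) would show up in the Allocation view as a large number of short-lived String instances attributed to the call tree that creates them:

// Hypothetical C# example of a pattern the Allocation view makes visible:
// each += on a string allocates a brand-new string object.
static string BuildReport(int[] values)
{
    string result = string.Empty;
    foreach (int v in values)
    {
        result += v.ToString() + ", ";   // allocates a new string every iteration
    }
    return result;
}

Sorting by Exclusive Bytes would quickly surface the String allocations, and expanding the String row would point back at the code that creates them.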

In addition to knowing how many bytes objects are using, it can be useful to know how long objects are kept alive in managed applications. In managed code, object deallocation is handled by the garbage collector, so the programmer does not know exactly how long an object stays in memory. The Objects Lifetime view can help you diagnose issues of this kind.

Figure 7: Objects Lifetime view, Sampling mode

Object Lifetime View

The columns titled Gen 0 Instances Collected, Gen 1 Instances Collected, Gen 2 Instances Collected, Large Object Heap Instances Collected, and Instances Alive At End detail how long objects of each type were kept before they were garbage collected. Generation 0 (zero) garbage collections occur most frequently and collect mainly objects that exist only for a short time. Generation 1 collections occur less frequently than generation 0 collections, and generation 2 collections are the least frequent of all. So if you have many large objects being kept until generation 2, they are using a large portion of memory that you might not have known you were using. Objects that are still alive when the profiling run ends, and are therefore deallocated only after it, are listed in the Instances Alive At End column.
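
As a hypothetical illustration (not code from the sample) of how lifetime shows up in these columns:

using System.Collections.Generic;

// Hypothetical C# example: the scratch buffers are short-lived and are normally
// reclaimed in generation 0, while buffers added to the long-lived cache survive
// collections, are promoted toward generation 2, or remain alive at the end of
// the profiling run.
static List<byte[]> cache = new List<byte[]>();

static void HandleRequest(int id)
{
    byte[] scratch = new byte[256];      // short-lived: typically collected in Gen 0
    scratch[0] = (byte)id;

    if (id % 100 == 0)
    {
        cache.Add(new byte[64 * 1024]);  // long-lived: shows up under Gen 2 or
    }                                    // Instances Alive At End
}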

In the example program, the profiling run was so short that most objects were either collected in generation 0 or kept alive until the end, which makes this view less useful for diagnosing performance issues in this case. For longer-running profiling sessions, however, this kind of data can be invaluable.

The Allocation and Objects Lifetime views are valuable tools for finding certain kinds of performance problems in managed applications, but in this example the performance scenario was not long enough to reveal any issues in these two views. Next, we'll fix the issues in the code and then run a new performance session to see whether performance improved.

Fixing Performance Issues

In the initial analysis, we found two possible performance issues. First, most of the time in the example application is spent in the Rational.reduce function. Second, Array.GetUpperBound is called many times. In addition, Array.GetUpperBound is always called from Rational.reduce, so the two issues may well be linked.

We know enough now to jump into the code and look around for the source of the issues. From any view in the performance report, right-click the Rational.reduce function and then click View Source to open the source file to the correct point in the code. Scroll down until you see the following:

for (i = 0; i < Prime.Numbers.GetUpperBound(0); i++)
{
    t = Prime.Numbers[i];
    while (x % t == 0 && y % t == 0)
    {

This is where Array.GetUpperBound is called in Rational.reduce. Because the call appears in the for loop's condition, Array.GetUpperBound is evaluated on every single iteration of the loop!

Let’s move the call to GetUpperBound outside the loop.

int max = Prime.Numbers.GetUpperBound(0);
for (i = 0; i < max; i++)
{
    t = Prime.Numbers[i];
    while (x % t == 0 && y % t == 0)
    {
After you make this change, rebuild the code and run another performance session in Instrumentation mode. Don't forget to change the performance session properties so that you are no longer gathering allocation data. Figure 8 shows the Summary view from the new report that uses the improved code.

Figure 8: Summary View, Instrumentation mode with New Code

New Code Summary View, Instrumentation Mode

Even from the Summary view, you can see significant performance improvements in Rational.reduce. In addition, GetUpperBound no longer appears in the top five functions for number of calls or for most individual work. If you use the Compare Performance Reports feature to compare the original instrumentation-mode performance report with the new one, you can see the performance changes in even more detail. For more information, see Comparing Profiler Report Files.

Figure 9: Performance Comparison Report

Performance Report Comparisons

Conclusion

In summary, we were able to download an application from the Internet, find a performance issue in it, change the code to fix the issue, and then see how much we improved its performance, all in about half an hour of work. One benefit of the profiler is that it makes it easier to find performance issues in code that you did not write yourself and are not familiar with.

See Also

Other Resources

Performance Session Overview