Chapter 16 – Performance Test Reporting Fundamentals

 


Performance Testing Guidance for Web Applications

J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation

September 2007

Objectives

  • Learn how to apply principles of effective reporting to performance test data.
  • Learn when to share technical results and when to produce stakeholder reports.
  • Learn what questions various team members expect performance reports to answer.

Overview

Managers and stakeholders need more than simply the results from various tests — they need conclusions based on those results, and consolidated data that supports those conclusions. Technical team members also need more than just results — they need analysis, comparisons, and details of how the results were obtained. Team members of all types get value from performance results being shared more frequently. In this chapter, you will learn how to satisfy the needs of all the consumers of performance test results and data by employing a variety of reporting and results-sharing techniques, and by learning exemplar scenarios where each technique tends to be well received.

How to Use This Chapter

Use this chapter to understand the principles of effective performance test results reporting, and as a reference for exemplars of effective data presentation. To get the most from this chapter:

  • Use the “Principles of Effective Reporting” section to understand the key concepts and principles behind effective reporting.
  • Use the “Frequently Reported Performance Data” section to learn about various ways that performance data can be presented and the types of results to which those methods are most effectively applied.
  • Use the “Questions to Be Answered by Reporting” section to understand how reports are designed for various audiences, and how to deliver the right information to the right audience in a format that they find intuitive.

Principles of Effective Reporting

The key to effective reporting is to present information of interest to the intended audience in a quick, simple, and intuitive manner. The following are some of the underlying principles of effective reporting:

  • Report early, report often
  • Report visually
  • Report intuitively
  • Use the right statistics
  • Consolidate data correctly
  • Summarize data effectively
  • Customize reports for the intended audience
  • Use concise verbal summaries
  • Make the data available

Report Early, Report Often

Continual sharing of information and data is critical to the efficiency and overall success of a performance-testing project. However, not all of the information and data to be shared needs to take the form of a formal or semiformal report. One effective approach is to send stakeholders summary charts and tables every day or two in an e-mail message that contains a concise statement of key points. Use the feedback and questions you receive from those stakeholders when deciding what to put in the next formal or semiformal report. In this way you can gauge the needs of your audience before writing what is intended to be a stand-alone or final document.

Sharing information and data with the technical team can be an even more straightforward process. It may be as simple as posting the location of the new results files to a team wiki before you begin analyzing them, and then posting links to any charts and graphs that derive from your analysis.

Report Visually

Most people find that data and statistics reported in a graphical format are easier to digest. This is especially true of performance results data, where the volume of data is frequently very large and most significant findings result from detecting patterns in the data. It is possible to find these patterns by scanning through tables or by using complex mathematical algorithms, but the human eye is far quicker and more accurate in the vast majority of cases.

Once a pattern or “point of interest” has been identified visually, you will typically want to isolate that pattern by removing the remaining “chart noise.” In this context, chart noise includes all of the data points representing activities and time slices that contain no points of interest (that is, the ones that look like you expect them to). Removing the chart noise enables you to more clearly evaluate the pattern you are interested in, and makes reports more clear.
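
If you script your analysis, isolating a point of interest can be as simple as filtering out the series that behave as expected before charting. The sketch below is a minimal illustration; the page names, timings, and the four-second “expected” threshold are invented for the example rather than taken from this guidance.

```python
# Minimal sketch: drop "chart noise" (series whose values stay inside the
# expected band) so only the series worth charting remain. Page names,
# timings, and the 4-second expectation are illustrative assumptions.
response_times = {
    "Home":     [1.1, 1.2, 1.3, 1.2, 1.4],
    "Login":    [0.9, 1.0, 1.1, 1.0, 1.2],
    "Checkout": [1.5, 1.8, 3.9, 7.2, 11.6],  # degrades under load
}

EXPECTED_MAX_SECONDS = 4.0  # "looks like you expect it to" threshold (assumed)

series_of_interest = {
    page: times
    for page, times in response_times.items()
    if max(times) > EXPECTED_MAX_SECONDS
}

print(series_of_interest)  # {'Checkout': [1.5, 1.8, 3.9, 7.2, 11.6]}
```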

Report Intuitively

Whether formal or informal, reports should be able to stand on their own. If a report leaves the reader with questions as to why the information is important, the report has failed. While reports do not need to provide the answers to issues to be effective, the issues should be quickly and intuitively clear from the presentation.

One method to validate the intuitiveness of a report is to remove all labels or identifiers from charts and graphs and all identifying information from narratives and then present the report to someone unfamiliar with the project. If that person is able to quickly and correctly point to the issue of concern in the chart or graph, or identify why the issue discussed in the narrative is relevant, then you have created an intuitive report.

Use the Right Statistics

Although reporting performance results well requires an understanding of many statistical concepts, many software developers, testers, and managers either do not have strong backgrounds in statistics or do not enjoy the subject. This can lead to significant misrepresentation of performance test results when reporting. If you are not sure which statistics to use to highlight a particular issue, do not hesitate to ask for assistance.
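
As a concrete illustration of why the choice of statistic matters, the following sketch computes the mean, median, and 90th/95th percentiles of a small, invented set of response times; a single outlier pulls the average well away from what most users actually experienced. The nearest-rank percentile helper is one of several reasonable definitions, not the only one.

```python
import statistics

def percentile(values, pct):
    """Nearest-rank percentile: the value below which roughly pct% of samples fall."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

# Illustrative response times in seconds (not from the guidance).
samples = [1.2, 1.3, 1.1, 1.4, 1.2, 6.8, 1.3, 1.5, 1.2, 1.6]

print("mean:  ", round(statistics.mean(samples), 2))  # skewed upward by the 6.8s outlier
print("median:", statistics.median(samples))
print("90th:  ", percentile(samples, 90))
print("95th:  ", percentile(samples, 95))
```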

Consolidate Data Correctly

While it is not strictly necessary to consolidate results, it tends to be much easier to demonstrate patterns in results when those results are consolidated into one or two graphs rather than distributed over dozens. That said, it is important to remember that only results from identical test executions that are statistically similar can be consolidated into performance report output tables and charts.

Additional Considerations

In order for results to be consolidated, both the test and the test environment must be identical, and the test results must be statistically equivalent. One approach to determining whether results are similar enough to be consolidated is to compare results from at least five test executions and apply the following rules (a rough code sketch follows the list):

  • If more than 20 percent (or one out of five) of the test execution results appear not to be similar to the rest, something is generally wrong with the test environment, the application, or the test itself.
  • If a 95th percentile value for any test execution is greater than the maximum or less than the minimum value for any of the other test executions, it is not statistically similar.
  • If every page/timer result in a test execution is noticeably higher or lower on the chart than the results of all the rest of the test executions, it is not statistically similar.
  • If a single page/timer result in a test execution is noticeably higher or lower on the chart than all the rest of the test execution results, but the results for all the rest of the pages/timers in that test execution are not, the test executions are probably statistically similar.
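
The following sketch is a rough illustration of the second rule above, flagging any test execution whose 95th percentile falls outside the minimum/maximum range of the other executions, combined with the 20 percent guideline. The data shapes and sample values are invented for the example.

```python
# Rough sketch of the similarity checks above, assuming each test execution
# is a list of response times for the same page/timer. Data is illustrative.
def p95(values):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, max(0, round(0.95 * len(ordered)) - 1))]

def flag_dissimilar_runs(runs):
    """Return indexes of runs whose 95th percentile falls outside the
    min/max range of every other run (second rule above)."""
    flagged = []
    for i, run in enumerate(runs):
        others = [r for j, r in enumerate(runs) if j != i]
        other_min = min(min(r) for r in others)
        other_max = max(max(r) for r in others)
        if p95(run) > other_max or p95(run) < other_min:
            flagged.append(i)
    return flagged

runs = [
    [1.1, 1.2, 1.3, 1.2, 1.4],
    [1.0, 1.2, 1.2, 1.3, 1.3],
    [1.2, 1.1, 1.4, 1.3, 1.2],
    [1.1, 1.3, 1.2, 1.4, 1.3],
    [3.0, 3.2, 3.5, 3.1, 3.4],   # this run stands apart from the rest
]

bad = flag_dissimilar_runs(runs)
if len(bad) / len(runs) > 0.20:
    print("More than 20% of runs look dissimilar -- investigate before consolidating.")
elif bad:
    print("Do not consolidate; dissimilar runs:", bad)
else:
    print("Runs appear statistically similar enough to consolidate.")
```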

Summarize Data Effectively

Summarizing results frequently makes it much easier to demonstrate meaningful patterns in the test results. Summary charts and tables present data from different test executions side by side so that trends and patterns are easy to identify. The overall point of these tables and charts is to show team members how the test results compare to the performance goals of the system so they can make important decisions about taking the system live, upgrading the system, or even, in some cases, completely reevaluating the project.
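
One simple way to produce such a side-by-side summary is sketched below: a small script that prints the 90th-percentile response time per page for each test execution next to its goal. The page names, build labels, numbers, and goals are illustrative only.

```python
# Illustrative side-by-side summary: 90th-percentile response time per page
# for each test execution, compared against a goal. All values are made up.
goals = {"Home": 3.0, "Login": 4.0, "Checkout": 5.0}

p90_by_run = {
    "Build 42": {"Home": 2.1, "Login": 3.2, "Checkout": 4.4},
    "Build 43": {"Home": 2.0, "Login": 3.1, "Checkout": 6.2},
}

header = ["Page", "Goal (s)"] + list(p90_by_run) + ["Meets goal?"]
print("  ".join(f"{h:<10}" for h in header))
for page, goal in goals.items():
    latest = list(p90_by_run.values())[-1][page]          # most recent execution
    row = [page, f"{goal:.1f}"] + [f"{run[page]:.1f}" for run in p90_by_run.values()]
    row.append("yes" if latest <= goal else "NO")
    print("  ".join(f"{c:<10}" for c in row))
```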

Additional Considerations

Keep the following key points in mind when summarizing test data:

  • Use charts and tables that make your findings clear.
  • Use text to supplement tables and charts, not the other way around.
  • If a chart or table is confusing to the reader, don’t use it.

Customize Reports for the Intended Audience

Performance test results are most commonly read by one of three audiences: technical team members, non-technical team members, and stakeholders outside of the core team. These three groups tend to look for very different things in a performance report and are inclined to prefer different presentation methods. When reporting, make sure that you identify which group or groups you are reporting to and what their expectations are before deciding on the best way to present the results you have collected.

Use Concise Verbal Summaries

Results should have at least a short verbal summary associated with them, and some results are best or most easily presented in writing alone. What you decide to include in that text depends entirely on your intended audience. Some audiences may require just one or two sentences capturing the key point(s) you are trying to make with the graphic. For example:

“From observing this graph, you can see that the system under test meets all stated performance goals up to 150 hourly users but at that point degrades quickly to an essentially unusable state.”

Other audiences may also require a detailed explanation of the graph being presented. For example:

“In this graph, you see the average response time in seconds, portrayed vertically on the left side of the graph, plotted against the total number of hourly users simulated during each test execution, portrayed horizontally along the bottom of the graph. The intersection points depict …”

Make the Data Available

There is a disturbingly popular belief that performance testing (or other testing) data should not be shared in its raw form out of fear that the consumers of that data will use or analyze it improperly. While this concern is not entirely unfounded, a much greater concern is that it is simply not reasonable to expect any one person or team to be able to extract all of the value from a set of data at one point in time. Data provides different value to different people at different times, and the only way to get the most out of the data is to make that data continually available to the team. Additionally, making the data available tends to minimize some people’s perception that the performance results are simply fabrications based on a set of tools and processes that they do not understand.

Frequently Reported Performance Data

The following are the most frequently reported types of results data. The sections that follow describe why each type of data is of interest, and to whom, as well as considerations for reporting that type of data.

  • End-user response times
  • Resource utilizations
  • Volumes, capacities, and rates
  • Component response times
  • Trends

End-user Response Times

End-user response time is by far the most commonly requested and reported metric in performance testing. If you have captured goals and requirements effectively, this is a measure of presumed user satisfaction with the performance characteristics of the system or application. Stakeholders are interested in end-user response times so they can judge the degree to which users will be satisfied with the application. Technical team members are interested because they want to know whether they are achieving the overall performance goals from a user’s perspective and, if not, in what areas those goals are not being met.

Exemplar 1

Figure 16.1 Response Time

Exemplar 2

Figure 16.2 Response Time Degradation

Considerations

Even though end-user response times are the most commonly reported performance-testing metric, there are still important points to consider. A short sketch after the following list illustrates several of them.

  • **Eliminate outliers before reporting.**  Even one legitimate outlier can dramatically skew your results.
  • **Ensure that the statistics are clearly communicated.**  The difference between an average and a 90th percentile, for example, can easily be the difference between “ship it” and “fix it.”
  • **Report abandonment separately.**  If you are accounting for user abandonment, the collected response times for abandoned pages may not represent the same activity as non-abandoned pages. To be safe, report response times for non-abandoned pages on an end-user response time graph, and report response times and abandonment percentages by page on a separate graph or table.
  • **Report every page or transaction separately.**  Even though some pages may appear to represent an equivalence class, there could be differences that you are unaware of.
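
The sketch below illustrates several of these considerations: it trims outliers, keeps abandoned requests out of the response-time figures, and reports abandonment percentages per page. The record layout (page, seconds, abandoned flag) and the outlier cutoff are assumptions made for the example.

```python
# Sketch: trim outliers, exclude abandoned requests from response times, and
# report abandonment per page. Data layout and cutoff are assumed for the example.
samples = [
    ("Home", 1.2, False), ("Home", 1.4, False), ("Home", 45.0, False),   # outlier
    ("Login", 2.1, False), ("Login", 2.3, False), ("Login", 30.0, True), # abandoned
]

OUTLIER_CUTOFF = 30.0  # assumed cutoff; base the real value on your own analysis

pages = sorted({page for page, _, _ in samples})
for page in pages:
    kept = [secs for p, secs, flag in samples
            if p == page and not flag and secs < OUTLIER_CUTOFF]
    total_count = sum(1 for p, _, _ in samples if p == page)
    abandoned_count = sum(1 for p, _, flag in samples if p == page and flag)
    avg = sum(kept) / len(kept) if kept else float("nan")
    print(f"{page}: avg {avg:.2f}s over {len(kept)} kept samples, "
          f"abandonment {100 * abandoned_count / total_count:.0f}%")
```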

Resource Utilizations

Resource utilizations are the second most requested and reported metrics in performance testing. Most frequently, resource utilization metrics are reported verbally or in a narrative fashion. For example, “The CPU utilization of the application server never exceeded 45 percent. The target is to stay below 70 percent.” It is generally valuable to report resource utilizations graphically when there is an issue to be communicated.

Exemplar for Stakeholders

Figure 16.3 Processor Time

Exemplar for Technical Team Members

Figure 16.4 Processor Time and Queue

Additional Considerations

Points to consider when reporting resource utilizations include:

  • **Know when to report all of the data and when to summarize.**  Very often, simply reporting the peak value for a monitored resource during the course of a test is adequate. Unless an issue is detected, the report only needs to demonstrate that the correct metrics were collected to detect the issue if it were present during the test.
  • **Overlay resource utilization metrics with other load and response data.**  Resource utilization metrics are most powerful when presented on the same graph as load and/or response time data. If there is a performance issue, this helps to identify relationships across various metrics (see the sketch after this list).
  • **If you decide to present more than one data point, present them all.**  Resource utilization rates will often change dramatically from one measurement to the next. The pattern of change across measurements is at least as important as the current value. Moving averages and trend lines obfuscate these patterns, which can lead to incorrect assumptions and regrettable decisions.
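
The following sketch shows one way to overlay resource utilization with response time on a single chart by using matplotlib’s secondary axis. All of the numbers are illustrative, and the charting library is simply one common option.

```python
# Sketch: overlay response time and application-server CPU on one chart using
# a secondary y-axis. All numbers are illustrative.
import matplotlib.pyplot as plt

users          = [25, 50, 75, 100, 125, 150]
response_time  = [1.1, 1.2, 1.4, 1.9, 3.4, 8.2]   # seconds
app_server_cpu = [18, 27, 39, 55, 78, 96]          # percent

fig, ax_rt = plt.subplots()
ax_cpu = ax_rt.twinx()

ax_rt.plot(users, response_time, color="tab:blue", marker="o", label="Response time (s)")
ax_cpu.plot(users, app_server_cpu, color="tab:red", marker="s", label="App server CPU (%)")

ax_rt.set_xlabel("Concurrent users")
ax_rt.set_ylabel("Response time (s)", color="tab:blue")
ax_cpu.set_ylabel("CPU utilization (%)", color="tab:red")
fig.legend(loc="upper left")
plt.title("Response time vs. application server CPU")
plt.show()
```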

Volumes, Capacities, and Rates

Volume, capacity, and rate metrics are also frequently requested by stakeholders, even though the implications of these metrics are often more challenging to interpret. For this reason, it is important to report these metrics in relation to specific performance criteria or a specific performance issue. Some examples of commonly requested volume, capacity, and rate metrics include the following (a short sketch after the list shows how rates like these can be derived from raw, timestamped results):

  • Bandwidth consumed
  • Throughput
  • Transactions per second
  • Hits per second
  • Number of supported registered users
  • Number of records/items able to be stored in the database
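
The sketch below shows how rate metrics such as hits per second, transactions per second, and throughput can be derived from raw, timestamped request records. The record layout (timestamp, bytes transferred, transaction flag) is assumed purely for illustration.

```python
# Sketch: derive rate metrics from timestamped request records.
# Record layout (timestamp in seconds, bytes transferred, is_transaction) is assumed.
requests = [
    (0.2, 12_000, False), (0.9, 48_000, True), (1.4, 7_500, False),
    (2.1, 52_000, True),  (2.8, 9_000, False), (3.6, 51_000, True),
]

duration_s = max(t for t, _, _ in requests) - min(t for t, _, _ in requests)

hits_per_second         = len(requests) / duration_s
transactions_per_second = sum(1 for _, _, tx in requests if tx) / duration_s
throughput_kb_per_sec   = sum(b for _, b, _ in requests) / 1024 / duration_s

print(f"hits/sec: {hits_per_second:.2f}")
print(f"transactions/sec: {transactions_per_second:.2f}")
print(f"throughput: {throughput_kb_per_sec:.1f} KB/s")
```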

Exemplar

Figure 16.5 Throughput

Additional Considerations

Points to consider when reporting volumes, capacities and rates include:

  • **Report metrics in context.**  Volume, capacity, and rate metrics typically have little stand-alone value.
  • **Have test conditions and supporting data available.**  While this is a good idea in general, it is particularly important with volume, capacity, and rate metrics.
  • **Include narrative summaries with implications.**  Again, while this is a good idea in general, it is virtually critical to ensure understanding of volume, capacity, and rate metrics.

Component Response Times

Even though component response times are not reported to stakeholders as commonly as end-user response times or resource utilization metrics, they are frequently collected and shared with the technical team. These response times help developers, architects, database administrators (DBAs), and administrators determine what sub-part or parts of the system are responsible for the majority of end-user response times.
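
A simple way to make component contributions visible is to express each component’s response time as a share of the end-user total, as in the sketch below. The component names and timings are invented for the example.

```python
# Sketch: attribute an end-user response time to its component parts so the
# team can see where the time goes. Names and timings are illustrative only.
component_ms = {
    "Web server":         120,
    "Application server": 340,
    "Database":           910,
    "Network":             80,
}

total_ms = sum(component_ms.values())
for component, ms in sorted(component_ms.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{component:<20}{ms:>6} ms  ({100 * ms / total_ms:.0f}% of end-user time)")
```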

Exemplar

Figure 16.6 Sequential Consecutive Database Updates

Additional Considerations

Points to consider when reporting component response times include:

  • **Relate component response times to end-user activities.**  Because it is not always obvious what end-user activities are impacted by a component’s response time, it is a good idea to include those relationships in your report.
  • **Explain the degree to which the component response time matters.**  Sometimes the concern is that a component might become a bottleneck under load because it is processing too slowly; at other times, the concern is that end-user response times are noticeably degraded as a result of the component. Knowing which of these conditions applies to your project enables you to make effective decisions.

Trends

Trends are one of the most powerful but least frequently used data-reporting methods. Trends can show whether performance is improving or degrading from build to build, or the rate of degradation as load increases. Trends can help technical team members quickly understand whether the changes they recently made achieved the desired performance impact.
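
A trend can be as simple as the build-to-build percentage change in a key metric. The sketch below computes that change for the 90th-percentile response time of a single key page; the build labels and values are illustrative.

```python
# Sketch: build-to-build trend as percent change in the 90th-percentile
# response time for one key page. Labels and values are illustrative.
p90_by_build = {
    "Build 40": 2.4,
    "Build 41": 2.5,
    "Build 42": 2.3,
    "Build 43": 3.1,   # regression worth calling out
}

builds = list(p90_by_build)
for previous, current in zip(builds, builds[1:]):
    delta = p90_by_build[current] - p90_by_build[previous]
    pct = 100 * delta / p90_by_build[previous]
    direction = "slower" if delta > 0 else "faster"
    print(f"{previous} -> {current}: {abs(pct):.0f}% {direction}")
```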

Exemplar

Figure 16.7 Response Time Trends for Key Pages

Additional Considerations

Points to consider when reporting trends include:

  • **Trends typically do not add value until there are at least three measurements.**  Sometimes trends cannot be effectively detected until there are more than three measurements. Start creating your trend charts with the first set of data, but be cautious about including them in formal reports until you have collected enough data for there to be an actual trend to report.
  • **Share trends with the technical team before including them in formal reports.**  This is another generally good practice, but it is particularly relevant to trends because developers, architects, administrators, and DBAs often will have already backed out a change that caused the trend to move in the wrong direction before you are able to compile your report. In this case, you can decide that the trend report is not worth including, or you can simply add an annotation describing the cause and stating that the issue has already been resolved.

Questions to Be Answered by Reporting

Almost every team member has unique wants, needs, and expectations when it comes to reporting data and results obtained through performance testing. While this makes sharing the information obtained through performance testing challenging, knowing in advance what various team members expect and value makes it much easier to provide valuable information to the right people, at the right level of detail, and at the right time.

All Roles

Some questions that are commonly posed by team members include:

  • Is performance getting better or worse?
  • Have we met the requirements/service level agreements (SLAs)?
  • What reports are available?
  • How frequently can I get reports?
  • Can I get a report with more/less detail?

Executive Stakeholders

Executive stakeholders tend to have very specific reporting needs and expectations that are often quite different from those of other team members. Stakeholders tend to prefer information in small, digestible chunks that clearly highlight the key points. Additionally, stakeholders like visual representations of data that are intuitive at a glance, as well as “sound bite”–size explanations of those visual representations. Finally, stakeholders tend to prefer consolidated and summarized information on a less frequent (though not significantly less frequent) basis than other team members. The following are common questions that executive stakeholders want performance test reports to answer:

  • Is this ready to ship?
  • How will these results relate to production?
  • How much confidence should I have in these results?
  • What needs to be done to get this ready to ship?
  • Is the performance testing proceeding as anticipated?
  • Is the performance testing adding value?

Project-Level Managers

Project-level managers — including the project manager, development lead or manager, and the test lead or manager — have all of the same needs and questions as the executive stakeholders, except that they want the answers more frequently and in more detail. Additionally, they commonly want to know the following:

  • Are performance issues being detected efficiently?
  • Are performance issues being resolved efficiently?
  • What performance testing should we be conducting that we currently are not?
  • What performance testing are we currently doing that is not adding value?
  • Are there currently any blockers? If so, what are they?

Technical Team Members

Although technical team members have some degree of interest in all of the questions posed by managers and stakeholders, they are more interested in receiving a continual flow of information related to test results, monitored data, observations, and opportunities for analysis and improvement. Technical team members tend to want to know the following:

  • What do these results mean to my specialty/focus area?
  • Where can I go to see the results for the last test?
  • Where can I go to get the raw data?
  • Can you capture metric X during the next test run?

Types of Results Sharing

In the most basic sense, there are three distinct types of results sharing: raw data display, technical reports, and stakeholder reports. While all are based on timely, accurate, and relevant communication of results, observations, concerns, and recommendations, each type targets a different audience, and the most effective methods of communicating data differ dramatically.

Raw Data Display

While not explicitly a reporting scenario, the sharing of raw data for collaboration purposes involves many of the same principles of data presentation that are applied to reports in order to improve the effectiveness of the collaboration.

In general, most people would rather view data and statistics in graphical form instead of in tables. In some cases, however, tables are the most efficient way to show calculated results for all of the data. It is recommended that you use tables sparingly in reports, while including the tabular form of the data used to create charts and graphs as an appendix or attachment to a report, so that interested stakeholders can refer to it.

Results from the following types of tests can be well represented in a tabular format:

  • Baseline
  • Benchmark
  • Scalability
  • Any other user-experience–based test

Tables are an excellent way to present volumes of data in a clean and orderly manner and to support the findings they ultimately lead to. However, you should be careful not to overuse tables in your reports. Many people quickly skip over tables and read only the surrounding text, or examine only the charts that go with them. Whatever types of tables you use, be certain that you present in your report only those tables that clearly make an important point. Huge tables containing all of the supporting data may be of interest to a few individuals, but not to most, and thus should be included only in an appendix to a report. Raw data is most commonly shared in the following formats (the sketch after this list shows one way to pull numbers out of plain-text results with a regular expression):

  • Spreadsheets
  • Text files (and regular expression searches)
  • Data collection tools (“canned” reports)
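
As an example of the second format, the following sketch uses a regular expression to pull response times out of plain-text results. The log line layout shown here is assumed for the example and is not the output format of any particular tool.

```python
# Sketch: extract response times from a plain-text results file with a regular
# expression. The log line format is assumed for illustration.
import re

log_lines = [
    "2007-09-14 10:01:02 GET /home    status=200 time_ms=812",
    "2007-09-14 10:01:03 GET /login   status=200 time_ms=1244",
    "2007-09-14 10:01:05 GET /checkout status=500 time_ms=4031",
]

pattern = re.compile(
    r"(?P<verb>\w+)\s+(?P<path>\S+)\s+status=(?P<status>\d+)\s+time_ms=(?P<ms>\d+)"
)

for line in log_lines:
    match = pattern.search(line)
    if match:
        print(match.group("path"), int(match.group("ms")), "ms")
```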

Technical Reports

Technical reports are generally more formal than raw data display, but not excessively so. Technical reports should stand on their own, but since they are intended for technical members of the team who are currently working on the project, they do not need to contain all of the supplemental detail that a stakeholder report normally does. In the simplest sense, technical reports are made up of the following:

  • Description of the test, including workload model and test environment
  • Easily digestible data with minimal pre-processing
  • Access to the complete data set and test conditions
  • Short statements of observations, concerns, questions, and requests for collaboration

Technical reports most commonly include data in the following formats (a sketch of the data behind a simple Pareto chart follows the list):

  • Scatter plots
  • Pareto charts
  • Trend charts
  • Summary spreadsheets
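
The following sketch shows the data preparation behind a simple Pareto view: pages ranked by total time consumed, with a cumulative-percentage column that makes the “vital few” obvious. Page names and totals are invented for the example.

```python
# Sketch: data behind a simple Pareto chart -- pages ranked by total time
# consumed, with a cumulative percentage. Values are illustrative.
total_seconds_by_page = {
    "Search": 1840, "Checkout": 960, "Home": 420, "Login": 310, "Help": 95,
}

ranked = sorted(total_seconds_by_page.items(), key=lambda kv: kv[1], reverse=True)
grand_total = sum(total_seconds_by_page.values())

running = 0
print(f"{'Page':<10}{'Total (s)':>10}{'Cumulative %':>15}")
for page, seconds in ranked:
    running += seconds
    print(f"{page:<10}{seconds:>10}{100 * running / grand_total:>14.1f}%")
```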

Stakeholder Reports

Stakeholder reports are the most formal of the performance data sharing formats. These reports must be able to stand alone while at the same time being intuitive to someone who is not working on the project in a day-to-day technical role. Typically, these reports center on acceptance criteria and risks. To be effective, stakeholder reports typically need to include:

  • The acceptance criteria to which the results relate
  • Intuitive, visual representations of the most relevant data
  • A brief verbal summary of the chart or graph in terms of criteria
  • Intuitive, visual representations of the workload model and test environment
  • Access to associated technical reports, complete data sets, and test conditions
  • A summary of observations, concerns, and recommendations

When preparing stakeholder reports, consider that most stakeholder reports meet with one (or more) of the following three reactions. All three are positive in their own way, even if they may not seem so at first. These reactions and some recommended responses follow:

  • “These are great, but where’s the supporting data?”  This is the most common response from a technical stakeholder. Many people and organizations want to have all of the data so that they can draw their own conclusions. Fortunately, this is an easy question to handle: simply include the entire spreadsheet with this supporting data as an appendix to the report.
  • “Very pretty, but what do they mean?”  This is where text explanations are useful. People who are not familiar with performance testing or performance results often need to have the implications of the results spelled out for them. Remember that more than 90 percent of the time, performance testers are the bearers of bad news that the stakeholder was not expecting. The tester has the responsibility to ensure that the stakeholder has confidence in the findings, as well as to present this news in a constructive manner.
  • “Terrific! This is exactly what I wanted! Don’t worry about the final report — these will do nicely.”  While this seems like a blessing, do not take it as one. Sooner or later, your tables and charts will be presented to someone who asks one of the two preceding questions or, worse, asks how the data was obtained. If there is not at least a final report that tells people where to find the rest of the data, people will question the results when you are not present to answer those questions.

Creating a Technical Report

Although six key components of a technical report are listed below, not all of them will be appropriate for every technical report. Similarly, there may be additional information that should be included, based on exactly what message you are trying to convey with the report. While these six components will result in successful technical reports most of the time, remember that sometimes creativity is needed to make your message clear and intuitive.

Consider including the following key components when preparing a technical report:

  • A results graph
  • A table for single-instance measurements (e.g., maximum throughput achieved)
  • Workload model (graphic)
  • Test environment (annotated graphic)
  • Short statements of observations, concerns, questions, and requests for collaboration
  • References section

Exemplar Results Graph

Figure 16.8 Consolidated Statistics

Exemplar Tables for Single-Instance Measurements

Figure 16.9 Single Instance Measurements

Exemplar Workload Model Graphic

Figure 16.10 Workload Model

Exemplar Test Environment Graphic

Figure 16.11 Test Environment

Exemplar Summary Statement

“The results graph shows both response times and resource utilization together. Close examination shows that Application Server CPU Usage and queue length coincide with significantly degraded response time. It appears as if the application server CPU usage was the catalyst to the degradation, but this has yet to be confirmed. The remaining charts and graphs are included as supplemental information for easy reference.”

Exemplar References Section

“Raw data and additional supporting information is checked into the version-control system with the build and tagged as PerfTest-{date}-{issue number}.”

Creating a Stakeholder Report

Although eight key components of a stakeholder report are listed below, not all of them will be appropriate for every stakeholder report. Similarly, there may be additional information that should be included, based on exactly what message you are trying to convey with the report. While these eight components will result in successful stakeholder reports most of the time, remember that sometimes creativity is needed to make your message clear and intuitive.

Consider including the following key components when preparing a stakeholder report:

  • Criteria to which the results relate
  • A results graph
  • A table for single-instance measurements (e.g., maximum throughput achieved)
  • A brief verbal summary of the chart or graph in terms of criteria
  • Workload model (graphic)
  • Test environment (annotated graphic)
  • Summary of observations, concerns, and recommendations
  • References section

Exemplar Criteria Statement

“This report relates to end-user response time compliances as documented in the requirements management system as requirements Perf### through Perf??? at one-half of the expected peak load with the most commonly expected usage scenario.”

Exemplar Results Graph

Figure 16.12 Response Time Compliance Summary

Exemplar Tables for Single-Instance Measurements

Figure 16.13 Single Instance Measurements

Exemplar Criteria-Based Results Summary

“All metrics collected achieved their required values except for the response times of pages 8 and 10.

  • Page 10 failed to achieve its required value by 2 percent.
  • Page 8 failed to achieve its required value by 38 percent.”

Exemplar Workload Model Graphic

Figure 16.14 Workload Model

Exemplar Test Environment Graphic

Figure 16.15 Test Environment

Exemplar Observations and Recommendations Statement

“Based on the test conditions and results, the performance testing and tuning team recommends the following.

  1. Continue performance testing with increasingly strenuous scenarios and loads.
  2. Priority should be given to determining the root cause of pages 8 and 10 not achieving their acceptance criteria, and subsequently tuning those root causes.
  3. At such time as additional pages demonstrate a failure to achieve their acceptance criteria, a dedicated root cause and tuning cycle should be undertaken.”

Exemplar References Section

“All of the data used to create this report and execute the tests that generated that data is checked into the version-control system as read-only with the release candidate and tagged as PerfTest-{date}-{RC number}-Validation.

“The same data has been temporarily copied to {\\shared-resource\location} for individuals without access to the version-control system.”

Summary

Performance test reporting is the process of presenting results data that will support key technological and business decisions. The key to creating effective reports is to consider the audience of the data before determining how best to present the data. The most effective performance-test results will present analysis, comparisons, and details behind how the results were obtained, and will influence critical business decision-making.
