Chapter 13 – Determining Individual User Data and Variances
Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
- Learn how to determine realistic durations and distribution patters for user delay times.
- Learn how to incorporate realistic user delays into test designs and test scripts.
- Learn about key variables to consider when defining workload characterization.
- Learn about the elements of user behavior that will aid with modeling the user experience when creating load tests.
This chapter describes the process of determining realistic individual user delays, user data, and abandonment. For performance testing to yield results that are directly applicable to understanding the performance characteristics of an application in production, the tested workloads must represent the real-world production environment. To create a reasonably accurate representation of reality, you must model users with a degree variability and randomness similar to that found in a representative cross-section of users.
How to Use This Chapter
Use this chapter to understand how to model variances such as user delays, user data, and user abandonment so that your workload characterization will create realistic usage patterns, thus improving the accuracy of production simulations. To get the most from this chapter:
- Use the “User Delay” section, along with the sections that follow, to understand the key concepts of user delay modeling and its impact on workload characterization.
- Use the “Determining Individual User Data” section to understand the key concepts of user data and its impact on workload characterization.
- Use the “User Abandonment” section to understand the key concepts of user abandonment and its impact on workload characterization.
The more accurately users are modeled, the more reliable performance test results will be. One frequently overlooked aspect of accurate user modeling is the modeling of user delays. This section explains how to determine user delay times to be incorporated into your workload model and subsequently into your performance scripts.
During a session, the user can be in a number of different states — browsing, logging onto the system, and so on. Customers will have different modes of interacting with the Web site; some users are familiar with the site and quickly go from one page to another, while others take longer to decide which action they will take. Therefore, characterizing user behavior must involve modeling the customer sessions based on page flow, frequency of hits, the amount of time users’ pause between viewing pages, and any other factor specific to how users interact with your Web site.
Consequences of Improperly Modeling User Delays
To ensure realistic load tests, any reasonable attempt at applying ranges and distributions is preferable to ignoring the concept of varying user delays. Creating a load test in which every user spends exactly the same amount of time on each page is simply not realistic and will generate misleading results. For example, you can very easily end up with results similar to the following.
Figure 13.1* *Results for Using Static User Delays
In case you are not familiar with response graphs, each dot represents a user activity (in this case, a page request); the horizontal axis shows the time, in seconds, from the start of the test run; and individual virtual testers are listed on the vertical axis. This particular response graph is an example of “banding” or “striping.” Banding should be avoided when doing load or performance testing, although it may be valuable as a stress test. From the server’s perspective, this test is the same as 10 users executing the identical actions synchronously: Home pageà wait x secondsà page1.
To put a finer point on it, hold a ruler vertically against your screen and move it slowly across the graph from left to right. This is what the server sees: no dots, no dots, no dots, lots of dots, no dots. This is a very poor representation of actual user communities.
The following figure is a much better representation of actual users, achieved by adding some small-range uniform and normally distributed delays to the same test.
Figure 13.2* *Results for Using Normally Distributed User Delays
If you perform the same activity with the ruler, you will see that the dots are more evenly distributed this time, which dramatically increases both the realism of the simulated load and the accuracy of the performance test results.
Step 1 – Determine User Delays
Delays that occur while users view content on Web pages — also commonly known as think times — represent the answers to questions such as “How long does it take a user to enter their login credentials?” and “How much time will users spend reading this page?” You can use several different methods to estimate think times associated with user activities on your Web site. The best method, of course, is to use real data collected about your production site. This is rarely possible, however, because testing generally occurs before the site is released to production. This necessitates making educated guesses or approximations regarding activity on the site.
The most commonly useful methods of determining this include the following:
- When testing a Web site that is already in production, you can determine the actual values and distribution by extracting the average and standard deviation for user viewing (or typing) time from the log file for each page. With this information, you can easily determine the think time for each page. Your production site may also have Web traffic–monitoring software that provides this type of information directly.
- If you have no log files, you can run simple in-house experiments using employees, customers, clients, friends, or family members to determine, for example, the page-viewing time differences between new and returning users. This type of simplified usability study tends to be a highly effective method of data collection for Web sites that have never been live, as well as validation of data collected by using other methods.
- Time yourself using the site, or by performing similar actions on a similar site. Obviously, this method is highly vulnerable to personal bias, but it is a reasonable place to start until you get a chance to time actual users during User Acceptance Testing (UAT) or conduct your own usability study.
- In the absence of any better source of information, you can leverage some of the metrics and statistics that have already been collected by research companies such as Nielsen//NetRatings, Keynote, or MediaMetrix. These statistics provide data on average page-viewing times and user session duration based on an impersonal sample of users and Web sites. Although these numbers are not from your specific Web site, they can work quite well as first approximations.
There is no need to spend a lot of time collecting statistically significant volumes of data, or to be excessively precise. All you really need to know is how long a typical user will spend performing an activity, give or take a second or two. However, depending on the nature of your site, you may want to determine user delay times separately for first-time and experienced users.
Step 2 – Apply Delay Ranges
Simply determining how much time one person spends visiting your pages, or what the variance in time between users is, is not enough in itself — you must vary delay times by user. It is extremely unlikely that each user will spend exactly the same amount of time on a page. It is also extremely likely that conducting a performance test in which all users spend the same amount of time on a page will lead to unrealistic or at least unreliable results.
To convert the delay times or delay ranges from step 1 into something that also represents the variability between users, the following three pieces of information are required:
- The minimum delay time
- The maximum delay time
- The distribution or pattern of user delays between those points
If you do not have a minimum and maximum value from your analysis in step 1, you can apply heuristics as follows to determine acceptable estimates:
The minimum value could be:
- An experienced user who intended to go to the page but will not remain there long (for example, a user who only needs the page to load in order to scan, find, and click the next link).
- A user who realized that they clicked to the wrong page.
- A user who clicked through a form that had all of its values pre-filled.
- The minimum length of time you think a user needs to type the required information into the form.
- Half of the value that you determined was “typical.”
The maximum value could be:
- Session time-out.
- Sufficient time for a user to look up information for a form.
- No longer than it takes a slow reader to read the entire page.
- The time it takes to read, three times out loud, the text that users are expected to read. (This is the heuristic used by the film industry for any onscreen text.)
- Double the value that you determined was “typical.”
Although you want your estimate to be relatively close to reality, any range that covers ~75 percent of the expected users is sufficient to ensure that you are not unintentionally skewing your results.
Step 3 – Apply Distributions
There are numerous mathematical models for these types of distributions. Four of these models cover the overwhelming majority of user delay scenarios:
- Linear or uniform distribution
- Normal distribution
- Negative exponential distribution
- Double hump normal distribution
Linear or Uniform Distribution
A uniform distribution between a minimum and a maximum value is the easiest to model. This distribution model simply selects random numbers that are evenly distributed between the upper and lower bounds of the range. This means that it is no more likely that the number generated will be closer to the middle or either end of the range. The figure below shows a uniform distribution of 1000 values generated between 0 and 25. Use a uniform distribution in situations where there is a reasonably clear minimum and maximum value, but either have or expect to have a distinguishable pattern between those end points.
Figure 13.3* *Uniform Distribution
A normal distribution, also known as a bell curve, is more difficult to model but is more accurate in almost all cases. This distribution model selects numbers randomly in such a way that the frequency of selection is weighted toward the center, or average value. The figure below shows a normal distribution of 1000 values generated between 0 and 25 (that is, a mean of 12.5 and a standard deviation of 3.2). Normal distribution is generally considered to be the most accurate mathematical model of quantifiable measures of large cross-sections of people when actual data is unavailable. Use a normal distribution in any situation where you expect the pattern to be shifted toward the center of the end points. The valid range of values for the standard deviation is from 0 (equivalent to a static delay of the midpoint between the maximum and minimum values) and the maximum value minus the minimum value (equivalent to a uniform distribution). If you have no way to determine the actual standard deviation, a reasonable approximation is 25 percent of (or .25 times the range) of the delay.
Figure 13.4* *Normal Distribution
Negative Exponential Distribution
Negative exponential distribution creates a distribution similar to that shown in the graph below. This model skews the frequency of delay times strongly toward one end of the range. This model is most useful for situations such as users clicking a “play again” link that only activates after multimedia content has completed playing. The following figure shows a negative exponential distribution of 1000 values generated between 0 and 25.
Figure 13.5* *Negative Exponential Distribution
Double Hump Normal Distribution
The double hump normal distribution creates a distribution similar to that shown in the graph below. To understand when this distribution would be used, consider the first time you visit a Web page that has a large amount of text. On that first visit, you will probably want to read the text, but the next time you might simply click through that page on the way to a page located deeper in the site. This is precisely the type of user behavior this distribution represents. The figure below shows that 60 percent of the users who view this page spend about 8 seconds on the page scanning for the next link to click, and the other 40 percent of the users actually read the entire page, which takes about 45 seconds. You can see that both humps are normal distributions with different minimum, maximum, and standard deviation values.
Figure 13.6* *Double Hump Normal Distribution
To implement this pattern, simply write a snippet of code to generate a number between 1 and 100 to represent a percentage of users. If that number is below a certain threshold (in the graph above, below 61), call the normal distribution function with the parameters to generate delays with the first distribution pattern. If that number is at or above that threshold, call the normal distribution function with the correct parameters to generate the second distribution pattern.
Determining Individual User Data
Once you have a list of key scenarios, you will need to determine how individual users actually accomplish the tasks or activities related to those scenarios, and the user-specific data associated with a user accomplishing that task or activity.
Unfortunately, navigation paths alone do not provide all of the information required to implement a workload simulation. To fully implement the workload model, you need several more pieces of information. This information includes:
- How long users may spend on a page?
- What data may need to be entered on each page?
- What conditions may cause a user to change navigation paths?
Consider the following key points when identifying unique data for navigation paths and/or simulated users:
- Performance tests frequently consume large amounts of test data. Ensure that you have enough data to conduct an effective test.
- Using the same data repeatedly will frequently lead to invalid performance test results.
- Especially when designing and debugging performance tests, test databases can become dramatically overloaded with data. Periodically check to see if the data base is storing unrealistic volumes of data for the situation you are trying to simulate.
- Consider including invalid data in your performance tests. For example, include some users who mistype their password on the first attempt but do it correctly on a second try.
- First-time users usually spend significantly longer on each page or activity than experienced users.
- The best possible test data is test data collected from a production database or log file.
- Consider client-side caching. First-time users will be downloading every object on the site, while frequent visitors are likely to have many static objects and/or cookies stored in their local cache. When capturing the uniqueness of the user’s behavior, consider whether that user represents a first-time user or a user with an established client-side cache.
User abandonment refers to situations where customers exit the Web site before completing a task, because of performance slowness. People have different rates of tolerance for performance, depending on their psychological profile and the type of page they request. Failing to account for user abandonment will cause loads that are highly unrealistic and improbable. Load tests should simulate user abandonment as realistically as possible or they may cause types of load that will never occur in real life — and create bottlenecks that might never happen with real users. Load tests should report the number of users that might abandon the Web site due to poor performance.
In a typical Web site traffic pattern, when the load gets too heavy for the system/application to handle, the site slows down, causing people to abandon it, thus decreasing the load until the system speeds back up to an acceptable rate. Abandonment creates a self-policing mechanism that recovers performance at previous levels (when the overload occurred), even at the cost of losing some customers. Therefore, one reason to accurately account for user abandonment is to see just how many users “some” is. Another reason is to determine the actual volume your application can maintain before you start losing customers. Yet another reason to account for user abandonment is to avoid simulating, and subsequently resolving, bottlenecks that realistically might not even be possible.
If you do not account for abandonment at all, the load test may wait indefinitely to receive the page or object it requested. When the test eventually receives that object, even if “eventually” takes hours longer than a real user would wait, the test will move on to the next object as if nothing were wrong. If the request for an object simply is not acknowledged, the test skips it and makes a note in the test execution log with no regard as to whether that object was critical to the user. Note that there are some cases where not accounting for abandonment is an accurate representation of reality; for instance, a Web-based application that has been exclusively created for an audience that has no choice but to wait because there are no alternative methods of completing a required task.
The following are generally useful guidelines related to user abandonment:
- Check the abandonment rate before evaluating response times. If the abandonment rate for a particular page is less than about 2 percent, consider the possibility of those response times being outliers.
- Check the abandonment rate before drawing conclusions about load. Remember, every user who abandons is one less user applying load. Although the response-time statistics may look good, if you have 75-percent abandonment, load is roughly 75 percent lighter than it was being tested for.
- If the abandonment rate is more than about 20 percent, consider disabling the abandonment routine and re-executing the test to help gain information about what is causing the problem.
The process of designing realistic user delays into tests and test scripts is critical for workload characterizations to generate accurate results. For performance testing to yield results that are directly applicable to understanding the performance characteristics of an application in production or a projected future business volume, the tested workloads must represent reality, replicating user delay patterns.
To create a reasonably accurate representation of reality, you must model user delays with variability and randomness by taking into account individual user data and user abandonment, similar to a representative cross-section of users.