Test Run - Diffusion Testing

By James McCaffrey | March 2011

In this month’s column, I introduce you to a software-testing technique I call diffusion testing. The key idea of diffusion testing is that it’s sometimes possible to automatically generate new test case data from existing test cases that yield a pass result.

Diffusion testing isn’t applicable in most software-testing scenarios, but when it is applicable, it can greatly improve the efficiency of your test effort. Perhaps in part because it’s a niche technique, diffusion testing is, in my experience, one of the least known of all major testing techniques.

Before I present examples of diffusion testing, let me explain the motivation behind the method. A test case typically consists of a test case ID, a set of one or more inputs and an expected result. For example, the vector {001, 2, 3, 5} could represent a test case for a Sum function, with ID = 001, inputs = 2 and 3 and an expected result = 5. The test case inputs are sent to the system under test, an actual result is produced and the actual result is compared to the expected result to determine a test case pass/fail result.
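
As a minimal sketch of that flow, assume the test case is stored as a colon-delimited string (CaseID:Input1:Input2:Expected) and that a trivial Sum function stands in for the system under test. The names SumHarnessSketch and RunTestCase are hypothetical and exist only for illustration.

```csharp
using System;

static class SumHarnessSketch
{
  // Hypothetical system under test.
  public static int Sum(int x, int y)
  {
    return x + y;
  }

  // Assumes input format is CaseID:Input1:Input2:Expected
  public static string RunTestCase(string testCase)
  {
    string[] tokens = testCase.Split(':');
    string caseId = tokens[0];
    int x = int.Parse(tokens[1]);
    int y = int.Parse(tokens[2]);
    int expected = int.Parse(tokens[3]);

    // Send the inputs to the system under test, then compare
    // the actual result to the expected result.
    int actual = Sum(x, y);
    return caseId + ":" + (actual == expected ? "Pass" : "FAIL");
  }
}
```

Calling RunTestCase("001:2:3:5") returns "001:Pass" because Sum(2, 3) matches the expected result of 5.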

In many software-testing situations, it’s difficult and time-consuming to determine the expected result part of a test case. For example, suppose you’re testing a basic math function that computes the harmonic mean of two inputs that are rates. The average of 30.0 kilometers per hour (kph) and 60.0 kph is not (30.0 + 60.0) / 2 = 45.0 kph, but rather the harmonic mean of 30.0 and 60.0, which is 1 / ((1/30.0 + 1/60.0) / 2) = 40.0 kph. Generating hundreds of expected results for this function would be tedious, take a lot of time and be prone to error.
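
For reference, the calculation itself is short. The sketch below is one plausible shape for such a function; the name HarmonicMean and the argument check are my assumptions, not part of any particular library. The code is easy to write. The slow and error-prone part is computing trustworthy expected results for hundreds of input pairs by some independent means.

```csharp
using System;

static class HarmonicMeanSketch
{
  // Harmonic mean of two positive rates: 1 / ((1/a + 1/b) / 2).
  public static double HarmonicMean(double a, double b)
  {
    if (a <= 0.0 || b <= 0.0)
      throw new ArgumentException("Rates must be positive");
    return 1.0 / ((1.0 / a + 1.0 / b) / 2.0);
  }
}
```

Because the result is a floating-point value, a test harness should compare actual and expected results within a small tolerance rather than for exact equality.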

The difficulty of determining test case expected results is a fundamental concept in software testing and is sometimes referred to as the test oracle problem. In fact, one of the holy grails of software testing is the search for techniques that can automatically determine expected results. So the motivation behind diffusion testing is that, if you can somehow automatically generate new test case data, you’ll sidestep a time-consuming part of the testing process and be able to test your system more thoroughly.

Automatically generating test case data with diffusion testing is great in principle, but how does it work? The best way to explain diffusion testing is by way of an example. Take a look at Figure 1.


Figure 1 Diffusion Testing Demo

Here, I’m testing a function, Choose(n,k), which returns the number of ways to select k items from n items where order doesn’t matter. In my simplified example, I have three existing test cases. The first test case has inputs n = 8 and k = 3 and an expected result of 56. After my test harness executed the first test case, which yielded a pass result, I used diffusion testing to automatically generate a new test case with inputs n = 9, k = 3 and an expected result of 84. Neat! Notice that because test case 002 yielded a fail result, I didn’t generate a new diffused test case.

But how are new test cases generated from an existing test case? For the Choose(n,k) function, it turns out that, mathematically, Choose(n+1,k) = Choose(n,k) * (n+1) / (n-k+1). In other words, there’s a known relationship between new inputs and old return values. The function I used to generate a diffused test case from an existing test case is shown in Figure 2. The entire program that generated the output shown in Figure 1 is available at msdn.microsoft.com/magazine/msdnmag0311.

Figure 2 Generating a New Test Case

static string CreateDiffusedTestCase(string existingTestCase)
{
  // Assumes input format is CaseID:N:K:Expected
  string[] tokens = existingTestCase.Split(':');
  string oldTestCase = tokens[0];  // the existing test case ID
  int oldN = int.Parse(tokens[1]);
  int oldK = int.Parse(tokens[2]);
  long oldExpected = long.Parse(tokens[3]);

  // The new case keeps K, increases N by one and marks the ID as diffused
  string newTestCase = oldTestCase + "-diffused";
  int newN = oldN + 1;
  int newK = oldK;

  // Choose(n+1,k) = Choose(n,k) * (n+1) / (n-k+1).
  // Multiply before dividing so the integer division is exact.
  long newExpected = (oldExpected * (oldN + 1)) / (oldN - oldK + 1);
  return newTestCase + ":" + newN + ":" + newK + ":" + newExpected;
}
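
To show how a harness might tie execution and generation together, here is a minimal sketch rather than the demo program itself. It runs each existing case and diffuses only the cases that pass. The Choose implementation is a hypothetical stand-in for the function under test, RunAndDiffuse is an invented name, and any test case data other than case 001 is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class DiffusionHarnessSketch
{
  // Hypothetical stand-in for the function under test.
  public static long Choose(int n, int k)
  {
    long result = 1;
    for (int i = 1; i <= k; ++i)
      result = result * (n - k + i) / i;  // exact at every step
    return result;
  }

  // Returns the diffused test cases produced by the passing cases.
  public static List<string> RunAndDiffuse(string[] testCases)
  {
    var diffused = new List<string>();
    foreach (string testCase in testCases)
    {
      // Assumes input format is CaseID:N:K:Expected
      string[] tokens = testCase.Split(':');
      int n = int.Parse(tokens[1]);
      int k = int.Parse(tokens[2]);
      long expected = long.Parse(tokens[3]);

      // A failing case is not a trustworthy basis for a new case.
      if (Choose(n, k) != expected)
        continue;

      // Choose(n+1,k) = Choose(n,k) * (n+1) / (n-k+1)
      long newExpected = expected * (n + 1) / (n - k + 1);
      diffused.Add(tokens[0] + "-diffused:" + (n + 1) + ":" + k + ":" + newExpected);
    }
    return diffused;
  }
}
```

Given the cases "001:8:3:56", "002:9:4:125" and "003:10:2:45", the second case fails (the correct value is 126), so only two diffused cases come back: "001-diffused:9:3:84" and "003-diffused:11:2:55".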

A couple of additional examples may help to make this idea clearer. Suppose you’re testing functions that compute the trigonometric sine and cosine. You may recall that sin 2t = 2 * sin t * cos t. So if you have test cases that yield pass results for the sine and cosine of some input, you could use diffusion testing to derive a new test case for the sine function.
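
A minimal sketch of the sine example follows, assuming you already hold passing expected values for sin t and cos t at the same input t. The name DiffuseSineCase and the two-element array return shape are hypothetical choices made for illustration.

```csharp
using System;

static class TrigDiffusionSketch
{
  // Given expected values for sin(t) and cos(t) taken from passing
  // test cases, returns { newInput, newExpected } for a sine test
  // case at 2t, using sin 2t = 2 * sin t * cos t.
  public static double[] DiffuseSineCase(double t, double sinT, double cosT)
  {
    return new double[] { 2.0 * t, 2.0 * sinT * cosT };
  }
}
```

The new expected value is computed in floating point, so the harness should compare it to the actual result within a tolerance.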

Diffusion testing isn’t magic. Suppose you’re testing a function that accepts a product ID of some sort, searches a SQL database and returns true if the product is in stock and false if the product isn’t in stock. Because there’s no relationship between different inputs and results, you couldn’t use diffusion testing in this scenario. In this respect, diffusion testing is similar to other forms of testing such as boundary condition testing and pairwise testing: It’s a technique that’s applicable only in certain situations.

Let me present another example of diffusion testing. Suppose you’ve written a function, Gauss(z), which accepts a standard normal z value and which returns the area under the standard normal (bell-shaped curve) distribution from negative infinity to z. For example, Gauss(-1.645) = 0.0500, Gauss(1.645) = 0.9500 and Gauss(0) = 0.5000. One way to use diffusion testing is to note the monotonic property of Gauss and that for any z value in the range negative infinity to 2.5, the result of Gauss(z + 0.1) must be greater than Gauss(z). Another way to use diffusion testing is to note the symmetric property of Gauss and that for any z value that’s less than 0.0, Gauss(-z) = 1.0 - Gauss(z).
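
Both properties can be sketched in a few lines. The Gauss function below is only a stand-in for the function under test: it uses the Abramowitz and Stegun 7.1.26 approximation to the error function, which is accurate to roughly seven decimal places. The names MonotonicHolds and DiffuseBySymmetry are hypothetical.

```csharp
using System;

static class GaussDiffusionSketch
{
  // Stand-in for the function under test (an assumption, not the
  // column's code): standard normal cumulative distribution via the
  // Abramowitz and Stegun 7.1.26 erf approximation.
  public static double Gauss(double z)
  {
    double x = Math.Abs(z) / Math.Sqrt(2.0);
    double t = 1.0 / (1.0 + 0.3275911 * x);
    double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
      + t * (-1.453152027 + t * 1.061405429))));
    double erf = 1.0 - poly * Math.Exp(-x * x);
    return z >= 0.0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
  }

  // Monotonic property: f(z + 0.1) must be greater than f(z).
  public static bool MonotonicHolds(Func<double, double> f, double z)
  {
    return f(z + 0.1) > f(z);
  }

  // Symmetric property: from a passing case (z, expected), returns
  // { newInput, newExpected } using Gauss(-z) = 1.0 - Gauss(z).
  public static double[] DiffuseBySymmetry(double z, double expected)
  {
    return new double[] { -z, 1.0 - expected };
  }
}
```

Note the difference between the two. The symmetric property produces a new test case with a specific expected value, for example turning the case (-1.645, 0.0500) into (1.645, 0.9500). The monotonic property produces only a pass/fail check between two actual results.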

The examples I’ve presented illustrate the three most common scenarios, though by no means the only ones, where diffusion testing is applicable. The first scenario is where you’re testing a mathematical function that can be defined by a recurrence relation. The second scenario is where you’re testing a function that has some monotonic relationship. And the third scenario is where you’re testing a function that has some symmetric relationship. A related form of testing, but one that isn’t diffusion testing, applies when switching the order of a function’s input values doesn’t change the return value, as with Sum(x,y).

Mathematical functions are the most common type of component under test that can benefit from diffusion testing, because such functions most often are recurrent, monotonic or symmetric—but you should be alert to other situations, too. Mathematical functions that involve recurrence relations are especially well-suited for diffusion testing because you can often generate multiple new test cases from an existing test case. In the demo in Figure 1, test case 001 with n = 8, k = 3 and expected = 56 generated a new diffused test case with n = 9, k = 3 and expected = 84. This new test case could be used to generate another test case with n = 10, k = 3 and expected = 120, and if that test case passed, it could be used to generate yet another new test case, and so on.
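
The chain can be sketched as a loop that applies the Choose recurrence repeatedly. This is a minimal illustration under my own assumptions: DiffuseChain is a hypothetical name, the cases are returned as N:K:Expected strings without IDs, and the checked keyword is added so that an overflow in the multiplication raises an exception instead of silently producing a wrong expected value.

```csharp
using System;
using System.Collections.Generic;

static class DiffusionChainSketch
{
  // Repeatedly applies Choose(n+1,k) = Choose(n,k) * (n+1) / (n-k+1),
  // returning count new cases formatted as N:K:Expected.
  public static List<string> DiffuseChain(int n, int k, long expected, int count)
  {
    var cases = new List<string>();
    for (int i = 0; i < count; ++i)
    {
      // Multiply first so the division is exact; checked() throws
      // an OverflowException if the product exceeds the long range.
      expected = checked(expected * (n + 1)) / (n - k + 1);
      n = n + 1;
      cases.Add(n + ":" + k + ":" + expected);
    }
    return cases;
  }
}
```

Starting from n = 8, k = 3 and expected = 56, three steps yield "9:3:84", "10:3:120" and "11:3:165". This sketch computes the whole chain up front; in a real harness you would stop diffusing from any generated case that fails.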

Before I wrap up, let me jump on my soapbox and address a pet peeve of mine related to naming different software-testing techniques and principles. I’ve labeled the technique described in this column as diffusion testing because existing test cases diffuse, or scatter, to create new cases. I could just as well have called the technique adaptive testing or auto-generation testing or any of a number of other things. It’s not the label that’s important, but rather the technique represented by the label that counts.

In many fields of study, including software testing, self-proclaimed experts apply some label to a common-sense technique and implicitly attempt to convince people new to the field that the label itself somehow carries some importance. This is typically motivated by the desire to directly sell training or indirectly sell consulting services by delivering a talk at a conference on the marvelous new label. Notable offenders are the terms “exploratory testing” and “context school of testing,” but there are many others. So take the term “diffusion testing” for what it is—simply a label to describe a software-testing technique, but one that can be a useful addition to your technical toolkit.

Dr. James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers working at the Microsoft Redmond, Wash., campus. He’s worked on several Microsoft products, including Internet Explorer and MSN Search. Dr. McCaffrey is the author of “.NET Test Automation Recipes” (Apress, 2006), and can be reached at jammc@microsoft.com.

Thanks to the following technical experts for reviewing this article: Bj Rollison and Alan Page
