Thinking About ROI for Test Automation

After we shipped Team Foundation Server 2010, I took on a new role on the TFS team as a Test Architect.  I’m excited about this role because it’s a little more hands-on than being a manager and because I get to help tackle some of the most challenging test problems we have across the product.

One of my first projects is to find ways to improve the return on investment (ROI) we’re getting with our test automation.  In general, there are 2 options to improve ROI: decrease investment or increase return (or both).

Left unchecked, test automation can be like a horde of Tribbles.  Sure, in the beginning it’s cute and fuzzy and makes you happy, but when your automated tests multiply and grow out of control until you’re up to your neck in them, you suddenly realize perhaps automation is not the silver bullet you may have initially thought it was.

I’ll attempt to cover in this post what we’re doing to improve test automation ROI, and depending on interest in the comments, I may go into more detail in follow-up posts.

Costs of Test Automation

Often, product teams are so focused on the product lifecycle that they forget about the lifecycle for test automation.  Like any other software project, writing code for automated tests has certain costs.  For example:

  • Initial coding and testing time (including time spent developing any “helper” libraries or layers)
  • Maintenance – keeping the test code up to date as the product under test changes over time
  • Investigation time when a test fails – is it a product bug or a bug in the test itself?
  • Test execution time – think of it as an opportunity cost; what other processes are blocked pending the outcome of the automation results?
  • Infrastructure, CPU cycles, electricity, lab maintenance costs, licenses & fees, etc.

There are many more costs, of course (see here for a good list including fixed and variable costs), but I think this short list represents some of the more impactful among them.

Taming the Automation Cost Beast

In our effort to improve automation ROI, we’re looking at several aspects of our automation.  I’ll walk through some of them and explain how we’re thinking about them in terms of ROI.  As we look at these metrics, we consider them all in concert and then decide on an appropriate action to take.  This could be the merging of 2 or more tests into a single case, making improvements to the test automation, or even deleting the automation altogether!

1. How long does the test take to run on average?

Tests that take a long time to run may be more susceptible to being broken by changes in the product because they’re touching a larger surface area of the product.  Consider whether these are truly worth running at their current frequency and whether it might be more prudent to factor them into multiple smaller tests that could be run more often.  This might not be possible if the test is an end-to-end scenario and each step depends on the outcome of the previous step.  This type of scenario is sometimes more beneficial to run by hand though, since testers acting as customers would get a true feel for the flow through the code under test.

Other times, tests that take a long time to run may be hung waiting for an unexpected modal dialog to be dismissed.  To give you an example, some of our integration tests specified a 10 minute default timeout before our test harness would kill them.  If the test hung 15 seconds into the scenario, it would sit there doing nothing for the remaining 9 minutes and 45 seconds until the harness stopped the test process and moved on to the next test.  Clearly this time could have been better spent!
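To put a number on that idle time, here is a minimal Python sketch of the arithmetic in the example above.  The function name idle_time is made up for illustration and is not part of any test harness.

```python
from datetime import timedelta


def idle_time(timeout: timedelta, hung_at: timedelta) -> timedelta:
    """Return how long a hung test sits idle before the harness kills it."""
    if hung_at > timeout:
        raise ValueError("a test cannot hang after its timeout has expired")
    return timeout - hung_at


# The example from the text: a 10 minute timeout, hung 15 seconds in.
wasted = idle_time(timedelta(minutes=10), timedelta(seconds=15))
print(wasted)  # 0:09:45
```

Multiplying that figure by the number of hung tests in a run gives a rough sense of how much lab time a shorter default timeout could win back.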

Another related metric is to look at the standard deviation of runtimes for the test.  If the test runtime varies greatly when you look across all runs but is relatively consistent among the runs that pass and among the runs that fail, you might have found a test that hangs on failure until it times out and otherwise finishes very quickly when it succeeds.
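One way this check could be sketched in Python is shown below.  The input shape (a list of passed/seconds pairs), the function name, and the spread_ratio cutoff of 5 are all assumptions chosen for illustration, not values from our tooling.

```python
from statistics import mean, pstdev


def looks_like_hang_on_failure(runs, spread_ratio=5.0):
    """Flag a test whose runtimes vary a lot overall but little per outcome.

    runs is a list of (passed, seconds) pairs.  spread_ratio is an
    arbitrary illustrative cutoff, not a recommended value.
    """
    pass_times = [seconds for passed, seconds in runs if passed]
    fail_times = [seconds for passed, seconds in runs if not passed]
    if len(pass_times) < 2 or len(fail_times) < 2:
        return False  # not enough data to compare the two groups

    overall = pstdev(pass_times + fail_times)
    # Largest spread within either group; the floor avoids comparing with zero.
    within = max(pstdev(pass_times), pstdev(fail_times), 1e-9)

    # Wide overall spread, tight groups, and failures slower than passes.
    return overall > spread_ratio * within and mean(fail_times) > mean(pass_times)


hanging = [(True, 12), (True, 14), (True, 13), (False, 600), (False, 601)]
steady = [(True, 12), (True, 14), (False, 13), (False, 11)]
print(looks_like_hang_on_failure(hanging))  # True
print(looks_like_hang_on_failure(steady))   # False
```

A test flagged this way is only a candidate for a closer look; a slow but legitimate failure path would trip the same check.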

2. Did the test find more test problems or product problems?

More test bugs means the test is either poorly written or it’s too dependent on product behaviors and interfaces that are changing over time.  For example, if the test is driving the product via UI automation and someone redesigns a particular dialog box or web page to use a checkbox control instead of a radio button, the test will likely break.

After analyzing each test failure, we use a custom field in our TFS Bug work items to indicate whether the test failure was due to an intended product change or a flaky test.  If analysis shows us a test failure is because of a test bug, we will often either fix it immediately or disable the test so it doesn’t get run again until it’s fixed.  Otherwise, the test will continue to fail in runs for the same issue until it is fixed, thus wasting time.

If the test breaks most often because of test issues, perhaps either it or the product code should be refactored to be more testable.  There is a wealth of information on the web about coding for testability.  For starters, have a look at the SOLID principles.
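As a rough illustration of this metric, the Python sketch below turns a list of failure causes for one test into fractions.  The labels "test" and "product" are hypothetical stand-ins for whatever the custom field on the bug actually records.

```python
from collections import Counter


def failure_breakdown(causes):
    """Return the fraction of failures attributed to each cause.

    causes is a list of labels, one per analyzed failure.  The labels
    used here ("test", "product") are assumptions for illustration.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cause: n / total for cause, n in counts.items()}


history = ["test", "test", "product", "test"]
print(failure_breakdown(history))  # {'test': 0.75, 'product': 0.25}
```

A test whose breakdown leans heavily toward test bugs is the kind of candidate for refactoring described above.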

3. How many unique test and product failures did the test find?

Closely related to the previous metric is the number of unique test bugs and product bugs found.  A test that’s finding a lot of unique product bugs might be too big a test.  That is, while it’s failing often for valid product problems, it might be failing before it finishes even setting up everything it needs to test a specific scenario.  Alternatively, if a test fails in 9 runs out of 10 because of test bugs, it matters a great deal whether those were 9 unique issues or the same issue over and over.
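Counting unique failures can be sketched in a few lines of Python.  The sketch assumes each analyzed failure is recorded as a (kind, bug_id) pair; both the record shape and the bug IDs below are invented for illustration.

```python
def unique_failures(failures):
    """Count distinct bug IDs per failure kind.

    failures is an iterable of (kind, bug_id) pairs, one per failed run.
    """
    ids_by_kind = {}
    for kind, bug_id in failures:
        ids_by_kind.setdefault(kind, set()).add(bug_id)
    return {kind: len(ids) for kind, ids in ids_by_kind.items()}


# Nine failed runs that all trace back to the same test bug.
same_issue = [("test", 101)] * 9
# Nine failed runs that each trace back to a different test bug.
distinct_issues = [("test", 200 + i) for i in range(9)]

print(unique_failures(same_issue))       # {'test': 1}
print(unique_failures(distinct_issues))  # {'test': 9}
```

Both histories show the same 9 failures out of 10 runs, but they point to very different problems with the test.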

4. What is the average pass % for the test?

The pass rate for a test is the number of times it passed divided by the number of times it was executed.

Now, this may seem counter-intuitive at first, but if you have a test that has always passed, that test may be providing little value.  Sure it’s telling you something is working, but it’s likely that’s because it has always worked and the feature hasn’t been changed.  The test impact analysis feature in Visual Studio 2010 is a great way to make sure these tests aren’t executed when the code they test hasn’t been modified.  Another possibility is that the test doesn’t actually verify anything!  You might be surprised at what you find when you start investigating these tests.

If a test is failing a large percentage of the time for test bugs, say > 50%, then you should probably remove it from your runs and figure out what needs to be fixed.  If it’s failing a high % of the time for product bugs, ask yourself if the test could be executed further “upstream” in the production pipeline (say, as part of a gated check-in?) so these bugs are caught before the code that causes them is committed.
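The rules of thumb in this section could be combined into a small triage helper like the Python sketch below.  The History record, the suggest function, and the wording of the suggestions are hypothetical; only the pass rate definition and the 50% example threshold come from the text above.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Hypothetical per-test summary; the field names are illustrative."""
    runs: int
    passes: int
    test_bug_failures: int
    product_bug_failures: int


def pass_rate(h: History) -> float:
    """Number of times the test passed divided by times it was executed."""
    if h.runs == 0:
        raise ValueError("test has never been executed")
    return h.passes / h.runs


def suggest(h: History, threshold: float = 0.5) -> str:
    """Map a test's history to one of the actions discussed above."""
    if pass_rate(h) == 1.0:
        return "always passes: check that it verifies something"
    if h.test_bug_failures / h.runs > threshold:
        return "remove from runs and fix the test"
    if h.product_bug_failures / h.runs > threshold:
        return "consider running it further upstream"
    return "keep"


print(suggest(History(runs=20, passes=20, test_bug_failures=0, product_bug_failures=0)))
print(suggest(History(runs=10, passes=1, test_bug_failures=9, product_bug_failures=0)))
print(suggest(History(runs=10, passes=3, test_bug_failures=1, product_bug_failures=6)))
```

The output is a prompt for investigation rather than a verdict; as noted earlier, the metrics are meant to be considered together before deciding what to do with a test.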

Wrapping Up

By looking at test metrics over time, you can find candidate tests to delete or merge into other tests.  I’ve covered just a few of the ways we’re combing through our test automation to determine which tests are earning their keep and which are not.  By keeping our automation arsenal lean and mean, we’re optimizing the automation to test what it’s good at testing.  When we delete a test, we keep the scenario around to cover through exploratory or strict manual testing, usually on a much less frequent basis than it would execute in automation runs.