Data-driven testing pitfalls

The concept of data-driven testing (DDT) has been around for quite a long time, and I think it's worthwhile to review its application, which can be extremely powerful in automated testing.  At the same time, it's also worth knowing that there are certain situations where trying to 'force feed' DDT into the tests does more harm than good.  The thing I like about DDT is that the basic concept is so simple, yet not many testers know when and how to apply it.  And when they do find success with it, they tend to over-use it, which eventually leads to an inflexible, broken design in their automation. 

Let’s go over the main advantages of DDT and when we should use it to improve our automation.  Data-driven tests provide their biggest bang for the buck when there are numerous tests that are permutations of one another, for example validating the inputs to an application or the parameters of an API.  Scenario-based tests can also be data-driven, as long as the general execution steps within the scenario remain unchanged.  Recently, I happened to come across some legacy automation on our team, and I noticed that roughly 60-70% of the test script content is simply data.  One major pitfall of coupling test data with the code is that it makes the code fragile.  Any time there’s a change in requirements, test data may need to be added or updated, and as a result the whole script needs to be edited, compiled, and linked (for non-interpreted languages, that is).  So why don’t we just save ourselves the trouble and abstract that data out of the test code?  It has no business being mixed in with the code in the first place. 
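
To make the idea concrete, here is a minimal sketch (in Python, purely for illustration) of what abstracting the data out might look like.  The file name login_cases.csv, its columns, and the attempt_login helper are all hypothetical stand-ins for whatever your application actually exposes:

```python
import csv

def attempt_login(username, password):
    # Hypothetical stand-in for driving the real UI or API;
    # here it just accepts one hardcoded credential pair.
    return (username, password) == ("admin", "s3cret")

def run_login_tests(data_file="login_cases.csv"):
    # The execution steps live here in code; the file holds only data.
    # Expected columns: username, password, expected ("pass" or "fail").
    with open(data_file, newline="") as f:
        for row in csv.DictReader(f):
            expected = row["expected"] == "pass"
            actual = attempt_login(row["username"], row["password"])
            result = "PASS" if actual == expected else "FAIL"
            print(f"{result}: username={row['username']!r} password={row['password']!r}")

if __name__ == "__main__":
    run_login_tests()
```

Adding a new username/password permutation is now just a new row in the data file; the script itself never needs to be touched, let alone recompiled.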

IMO, the most difficult aspect of DDT is recognizing from the get-go that the tests can be data-driven.  Testers can get so absorbed in coding up the automation and cranking out a bunch of tests in the shortest amount of time that they miss the pattern.  By simply taking a step back and looking at the test cases at hand, one can quickly see where DDT fits and how to correctly apply it.  This white paper by Keith Zambelich (note it's a Word doc) does an excellent job of detailing the different pieces of DDT and how to go about implementing them.

Let’s take a look at some common pitfalls of the DDT approach, which I must admit I have fallen into not once but a few times.  

  • Not planning ahead.  For example, abstracting the actual execution steps out into the test data is just asking for trouble.  Here's a crude example to illustrate what I'm talking about: let's say you are testing a username/password dialog box, and for argument's sake, most of your test cases simply validate different username/password combinations.  Then there is one test case which enters a username/password, clicks Cancel, and enters a different username/password.  As you can see, there is an extra step of hitting the Cancel button.  By nature, this ought to be in the code.  However, now your code will contain a one-off case simply to handle this single test.  Imagine a bunch of test cases with different execution steps: your code could end up not being generalized at all, with hack after hack put in place to handle all the different scenarios.  To avoid this, look at the overall picture of all the test cases and design both the test data file "schema" and the driver code accordingly (see the sketch after this list).  Keep in mind that sometimes you just can't apply DDT to every test case.
  • Over-using data-driven techniques.  This is somewhat an extension of #1.  I truly believe that using data as a mechanism for logic control is simply a bad idea.  I have seen automation which basically uses the data file to control test execution (in addition to supplying data).  The code becomes overly generalized and contains almost no logic; everything is transferred into the data file.  I call this "force feeding" DDT.  The actual test execution steps (i.e., the main program driver) should be the code; only the values passed into those steps should be abstracted out.
  • Not being consistent with your data abstraction or code generalization.  Some test data is left hardcoded while the rest lives in the test data file.  Inconsistency always leads to confusion and undermines code integrity.
  • Not thinking about proper reporting.  This probably applies to most test automation in general.  Make sure that the report is precise, clear, and easy to grasp -- especially when there are failures.  This way you know right away whether bugs have been found.
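
Continuing the hypothetical login example from the sketch above, here is roughly how I'd handle the Cancel-button scenario without force-feeding it into the data file: the one-off flow gets its own explicitly coded test instead of a step-control flag that the generic data-driven driver would have to interpret.  Again, attempt_login and click_cancel are made-up stand-ins:

```python
def attempt_login(username, password):
    # Same hypothetical stand-in as in the earlier sketch.
    return (username, password) == ("admin", "s3cret")

def click_cancel():
    # Hypothetical stand-in for pressing Cancel on the dialog.
    pass

def test_cancel_then_retry():
    # The extra Cancel step is execution logic, so it stays in code as its
    # own test instead of becoming a special column in the data file.
    assert not attempt_login("guest", "wrong")   # first attempt fails
    click_cancel()                               # the one-off step
    assert attempt_login("admin", "s3cret")      # retry succeeds

if __name__ == "__main__":
    test_cancel_then_retry()
    print("cancel-then-retry scenario passed")
```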

I'm sure there are more, but those are the most common ones off the top of my head at the moment.  Data-driven testing is very flexible and powerful when a tester knows how to use it appropriately.