A better approach to test data generation

Anytime you are doing any sort of serious database development, you’ll need realistic test data to verify your system. There are several commonly used approaches to test data generation:

 

§ Use production data for testing purposes. Typically you do this either by using old production data or by transforming production data so as to replace actual values with somewhat-equivalent values. The advantage of this approach is there is nothing more representative than your production data. Of course the big downside here is that, well, you are using production data. This clearly has large privacy implications, especially in today’s highly regulated environment.

 

§ Come up with test data from scratch. Oftentimes when you are working on a new development project, there will of course not be any production data to mimic. While this approach is free of all the privacy issues, it is very painful to actually come up with test data easily that actually mimics of your production environment.

 

Now, let me offer you a better approach to test data generation. That is to use the new test data generation tool that ships with Team Edition for Database Professionals.

 

The premise of this tool is centered around approach #2 – building test data from scratch. We went down this path because we wanted to completely avoid the privacy\regulatory issues of munging production data. The power of this tool is in its ability to efficiently generate test data from scratch that is representative of your production environment. We facilitate this in several ways.

 

§ Highly configurable value generators. Each table column can be assigned a value generator. We have value generators for each column data type. Each generator has a set of properties which let you fully control the parameters of the generated values.

 

§ Additional powerful generators. These include a regular expression generator, which generates values based on the specified regular expression (think phone numbers), as well as a data bound generator, which allows you to pull values for data generation from a known data source.

 

§ Smart default assignment of generators. We automatically assign by default the appropriate value generator to each column based on the column’s data type. This way as soon as you use this tool, you are generating values of the appropriate data type.

 

§ Recognition of database constraints. We understand some of your database constraints and automatically configure the assigned generator to adhere to these constraints. For example, if your column has defined on it a check constraint that specifies a column must be greater than some value, we will configure the minimum value for data generation to be greater than that value.

 

§ Support for data distributions. We provide the ability to apply a distribution on a value generator to generate values according to the distribution. For example, you can generate values according to a normal curve, or say an exponential curve.

§ Generator & distribution extensibility. We allow you to extend the library of generators and distributions to include any kind of value generation that would be appropriate for your test data needs.

 

§ Data generation is repeatable. The values generated are based on a seed value, ensuring that the data generated is repeatable. This makes it ideal for use with database unit testing.

 

§ Enforcement of table ratios. We provide the ability for you to model certain relationships between your tables and enforce specified ratios between them. For example, lets say you had an Auctions and Bids table in your database. You can setup a 1:25 ratio between the two tables such that for every auction generated, 25 associated bids will be generated.

 

Are you convinced yet that this is a better approach? :) If so, please try out the tool! If not, also check out the tool and hopefully then you’ll understand.

 

It’s available here:

https://msdn.microsoft.com/vstudio/teamsystem/products/dbpro/default.aspx

 

And of course, I’d love to hear your feedback. Any thoughts you have on the functionality would be greatly appreciated.

 

In future posts I’ll be drilling into specifics, even showing you how to create your own custom generator! So stay tuned...

 

Sachin Rekhi