Using Production Data to Test Can Make the Job Easier… But Be Careful

Using production data is a smart way to test. It is often the most cost-effective way to obtain test data, which lets testers spend their time writing good test cases rather than taxing their brains to invent meaningless information that only mimics actual data. Instead, the tester can write test cases that work with real-world customer data. This is especially important when a bug is found in production code and the team is scrambling to reproduce it and resolve it as quickly as possible. It also provides huge benefits when running performance and stress tests.

However, there are important factors to take into consideration when using production data.

· Allow enough lead time (if circumstances allow it) when requesting production data backups from Operations. The Operations team is always willing to help but it’s not like they sit around all day waiting for your request.

· Be compatible. For SQL data, Operations may use a utility such as LiteSpeed to create backups. Make sure the same utility is installed in your environment so you are ready to restore the backup into your database. Also make sure the version installed on your SQL box is the same as, or at least compatible with, the version your Operations team used to create the backup.

· Most important of all, remember that it is usually against company policy to test with data that contains Personally Identifiable Information (PII). Such data includes things like email addresses or any data that is collected directly from a customer.

This last bullet is critical. It is important to understand the data your organization collects, stores, or transfers. Protecting users' privacy is one of the most important concerns of every organization. Most companies have a standard for protecting customer, employee, partner, and proprietary information. Be aware of your organization's policy.


So what do you do if the production data you want to use for testing contains PII? There are ways to reduce the sensitivity of the data.


· Request that sensitive information such as phone numbers, IP addresses, or email addresses be removed from the production dataset before you receive it from your Operations team. If they are not willing to do that, remove it yourself as soon as you restore the data.

· Convert sensitive user data to another form. For example, if you are only interested in the domain name portion of users' email addresses, write a function that parses out the username portion and replaces it with a one-way hash value.

· Make the data anonymous. Anonymous data is non-personal data that has no intrinsic link to an individual user. Removing every unique identifier that ties the data to an individual makes it anonymous.
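As a rough illustration of the first two techniques, here is a minimal Python sketch. It assumes records are available as dictionaries, and the field names (`phone_number`, `ip_address`, `email`) and the helper names are hypothetical. It also uses a keyed hash (HMAC-SHA-256) rather than a plain hash, on the assumption that usernames are easy to guess and an unkeyed hash could be reversed by trying common values.

```python
import hashlib
import hmac

# Hypothetical field names; adjust them to the actual schema.
FIELDS_TO_DROP = {"phone_number", "ip_address"}


def pseudonymize_email(email: str, secret_key: bytes) -> str:
    """Replace the username portion with a keyed one-way hash, keeping the domain."""
    username, sep, domain = email.rpartition("@")
    if not sep:
        # Do not echo the value itself; it may be sensitive.
        raise ValueError("value does not look like an email address")
    digest = hmac.new(
        secret_key, username.lower().encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # A shortened digest keeps the value readable in test output.
    return f"{digest[:16]}@{domain}"


def scrub_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy with sensitive fields dropped and the email pseudonymized."""
    cleaned = {k: v for k, v in record.items() if k not in FIELDS_TO_DROP}
    if cleaned.get("email"):
        cleaned["email"] = pseudonymize_email(cleaned["email"], secret_key)
    return cleaned
```

Note that the same username always maps to the same hash, so records stay linkable to each other. That is useful for testing, but it means the data is only disguised, not anonymous. To make it anonymous in the sense of the last bullet, the identifier has to be removed altogether.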


Using production data can expedite your testing efforts but should be done with caution. Know and understand what the production dataset you are using contains to avoid violating privacy laws.
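One way to get a first look at what a restored dataset contains is to scan a sample of rows for values that look like PII. The sketch below is only a starting point: the function name and the two patterns are illustrative assumptions, and simple regular expressions like these will miss some values and flag some false positives.

```python
import re

# Rough, illustrative patterns; they are not a complete PII definition.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii_columns(rows):
    """Map each column name to the set of PII pattern names found in its values."""
    findings = {}
    for row in rows:
        for column, value in row.items():
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    findings.setdefault(column, set()).add(name)
    return findings
```

Any column this reports is a candidate for removal or conversion before the data is used for testing. A column that is not reported still needs a manual check.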

