Creating a custom data generator: PersonGenerator

One of the most exciting extensibility points in Team Edition for Database Professionals is the ability to create your own custom data generator. Team Data will ship with a comprehensive set of data generators to generate data for almost all data types as well as to generate more sophisticated data using, for example, regular expressions and external data sources. However, we know that our customers will want to tweak the generation of values to specifically generate data that adheres to their business rules, domain values, and other data constraints. I wanted to walk you through today how to go about building a custom generator leveraging the rich extensibility model.

 

Let me start by motivating the data generator that we are about to build. We often find that so many databases in our world contain person-related information. Just think of tables related to Contacts, Employees, Clients, Customers, etc. All such tables often have similar data fields – names, addresses, phone numbers, birthdates, etc. You could simply generate strings, dates, etc that adhere to the database constraints to successfully generate test data for these fields. However, it’s often more useful to generate meaningful data for these fields. This is especially true when you are doing database unit testing. It’s so much simpler to trouble-shoot an issue with, say, wrong data being returned, when you are dealing with “Jon” and “Mary” data, as opposed to simply a collection of random strings. This allows you to easily code tests that deal with this expected set of data. So what we would like is a data generator that can be assigned to these sorts of columns and generate meaningful information, say, from a list of such information in a data source.

 

And that is exactly the generator we are going to build. A PersonGenerator that generates data for the following 20 fields. It will pull this data from an excel file (say PersonDataGenerator.xls stored in the My Documents folder) that you can easily populate with your favorite names, phone-numbers, etc. to make it more meaningful to you.

 

Outputs

§ Title

§ Full Name

§ First Name

§ Middle Name

§ Last Name

§ Suffix

§ Sex

§ Age

§ Birth Date

§ Job Title

§ Company

§ Address 1

§ Address 2

§ City

§ State

§ Zip Code

§ E-Mail

§ Web page

§ IM

§ Phone Number

 

Let’s keep this generator simple, so the only input that we’ll allow the user of the generator to customize is the character casing. They will be able to decide if they want the data uppercase, lowercase, or normal casing (whatever casing is in the data file).

 

Inputs

§ CharacterCasing

 

So now that we have designed what we want the generator to do, let’s look at the steps it’s going to take to actually put this generator together.

 

Steps

1. Create generator data source file

2. Code the custom data generator

3. Deploy & register the data generator

4. Use the data generator

 

Create generator data source file

 

We decided that the data source for our PersonGenerator is going to be a “PersonGeneratorData.xls” (Excel 8.0) file in the current user’s “My Documents” folder.

 

We’ll name the first sheet in the Excel file “PersonDataGenerator”. This worksheet will simply contain in it on the first row the 20 output columns listed above. All subsequent rows will be actual data values.

 

So go ahead and create this file (attached) . Then add some meaningful data. For now you probably need about 10 good solid data records in there.

 

Code the custom data generator

 

1. Create a new C# Class Library project in Visual Studio 2005 named Generators (Of course you can use any .NET language, but here we walk you through a C# example)

2. Add the following reference to your project:

Microsoft.VisualStudio.TeamSystem.Data.dll

(C:\Program Files\Microsoft Visual Studio 8\DBPro\Microsoft.VisualStudio.TeamSystem.Data.dll)

3. Rename the Class1.cs file to PersonGenerator.cs

4. Copy the following code into PersonGenerator.cs

//PersonGenerator

//Sample custom data generator

//

//Team Edition for Database Professionals

//Author: Sachin Rekhi

//Date: 10.26.06

namespace Generators

{

    using System;

    using System.Collections.Generic;

    using System.Text;

    using System.Data;

    using System.Data.Common;

    using System.Data.OleDb;

    using System.Globalization;

    using Microsoft.VisualStudio.TeamSystem.Data.DataGenerator;

    /// <summary>

    /// CasingType enum. To support CharacterCasing generator input.

    /// </summary>

    public enum CasingType

    {

        NormalCase,

        UpperCase,

        LowerCase

    }

    /// <summary>

    /// PersonGenerator. Generates values for common Person-related information.

    /// Sources data from PersonGeneratorData.xls stored in current user's My Documents folder.

    /// </summary>

    public class PersonGenerator : Generator

    {

        private const string FileName = "PersonGeneratorData.xls";

    private const string ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"Excel 8.0;HDR=Yes;\";";

        private const string DataQuery = "SELECT * FROM [PersonGeneratorData$]";

       

        private CasingType _casing;

        private OleDbConnection _connection;

        private OleDbDataReader _reader;

        public PersonGenerator()

        {

        }

        #region Inputs

        [Input(Name="Character Casing", DefaultValue=CasingType.NormalCase)]

  public CasingType CharacterCasing

        {

            set { _casing = value; }

            get { return _casing; }

        }

        #endregion

        #region Outputs

        [Output]

        public string Title

        {

            get { return GetOutputValue("Title"); }

        }

        [Output]

        public string FullName

        {

            get { return GetOutputValue("FullName"); }

        }

        [Output]

        public string FirstName

        {

            get { return GetOutputValue("FirstName"); }

        }

       

        [Output]

        public string MiddleName

        {

            get { return GetOutputValue("MiddleName"); }

        }

        [Output]

        public string LastName

        {

            get { return GetOutputValue("LastName"); }

        }

        [Output]

        public string Suffix

        {

            get { return GetOutputValue("Suffix"); }

        }

        [Output]

        public string Sex

        {

            get { return GetOutputValue("Sex"); }

        }

        [Output]

        public string Age

        {

            get { return GetOutputValue("Age"); }

        }

        [Output]

        public string BirthDate

        {

            get { return GetOutputValue("BirthDate"); }

        }

        [Output]

        public string JobTitle

        {

            get { return GetOutputValue("JobTitle"); }

        }

        [Output]

        public string Company

        {

            get { return GetOutputValue("Company"); }

        }

        [Output]

        public string Address1

        {

            get { return GetOutputValue("Address1"); }

        }

        [Output]

        public string Address2

        {

            get { return GetOutputValue("Address2"); }

        }

        [Output]

  public string City

        {

            get { return GetOutputValue("City"); }

        }

        [Output]

        public string State

        {

            get { return GetOutputValue("State"); }

        }

        [Output]

        public string ZipCode

        {

            get { return GetOutputValue("ZipCode"); }

        }

        [Output]

        public string EMail

        {

            get { return GetOutputValue("EMail"); }

        }

        [Output]

        public string WebPage

        {

            get { return GetOutputValue("WebPage"); }

        }

        [Output]

        public string IM

        {

            get { return GetOutputValue("IM"); }

        }

        [Output]

        public string PhoneNumber

        {

            get { return GetOutputValue("PhoneNumber"); }

        }

        protected string GetOutputValue(string output)

        {

            string outputValue = _reader[output].ToString();

            return FormatOutputValue(outputValue);

        }

        protected string FormatOutputValue(string originalOutput)

        {

            string formattedValue = String.Empty;

            if (originalOutput != null)

            {

                //format character casing

                switch (_casing)

               {

                    case CasingType.LowerCase:

                        formattedValue = originalOutput.ToLower(CultureInfo.CurrentCulture);

                        break;

                    case CasingType.UpperCase:

                        formattedValue = originalOutput.ToUpper(CultureInfo.CurrentCulture);

                        break;

                    case CasingType.NormalCase:

                        formattedValue = originalOutput;

                        break;

                    default:

  break;

                }

            }

            return formattedValue;

        }

        #endregion

        /// <summary>

        /// OnInitialize. Open data source and initialize.

        /// </summary>

        /// <param name="initInfo"></param>

        protected override void OnInitialize(GeneratorInit initInfo)

        {

            base.OnInitialize(initInfo);

            //build connection string

            string filePath = String.Format(CultureInfo.InvariantCulture, "{0}\\{1}",

                System.Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), FileName);

            string fullConnectionString = String.Format(CultureInfo.InvariantCulture, ConnectionString, filePath);

            //open connection

  _connection = new OleDbConnection(fullConnectionString);

            _connection.Open();

            //start reading data

            ResetDataReader();

        }

        /// <summary>

        /// ResetDataReader. Initialize data reader.

     /// </summary>

        protected void ResetDataReader()

        {

            //close reader if still open

            if (_reader != null && _reader.IsClosed == false)

                _reader.Close();

            OleDbCommand command = _connection.CreateCommand();

            command.CommandText = DataQuery;

            try

            {

                _reader = command.ExecuteReader();

            }

            catch

            {

                throw new GeneratorUserException(String.Format(CultureInfo.CurrentCulture,

                    "Data cannot be retrieved from data file. Ensure that the {0} file exists in your My Documents folder with the appropriate column headings and data.", FileName));

            }

            if (!_reader.HasRows)

  {

                throw new GeneratorUserException(String.Format(CultureInfo.CurrentCulture,

                    "Data file is empty."));

            }

        }

        /// <summary>

        /// OnGenerateNextValues. Advance to the next Person data record.

        /// </summary>

        protected override void OnGenerateNextValues()

        {

            base.OnGenerateNextValues();

            if (_reader == null)

            {

                throw new GeneratorUserException(String.Format(CultureInfo.CurrentCulture,

                    "Data reader not appropriately initialized."));

            }

            //get next Person data record

            //if reach end of file, reset to beginning

            if (!_reader.Read())

            {

    ResetDataReader();

                _reader.Read();

            }

        }

        /// <summary>

        /// Dispose. Clean up any open reader or connection.

        /// </summary>

        /// <param name="disposing"></param>

        protected override void Dispose(bool disposing)

        {

            //should this be first or in finally block?

            base.Dispose(disposing);

           

            //dispose reader

            if (_reader != null)

            {

                _reader.Dispose();

            }

            //dispose connection

            if (_connection != null)

            {

                _connection.Dispose();

            }

        }

    }

}

5. You should now set your class library to be signed. You can do this by going to project properties, going to the Signing tab, and choosing Sign the assembly. Then specify a key or create a new one.

6. You can now go ahead and build the data generator

So let’s now take a moment to walk through what the code above actually does. We first created a PersonGenerator class that inherits from the base Generator class. This represents the simplest way to go about building a custom generator.

Then we override the OnInitialize method to perform any initialization that we want our generator to do prior to actually generating data values. In this case, our initialization is going to open a data connection to the excel file. We do this using the OleDb data provider with the appropriate connection string (found as a constant in the code). The initialization then opens up a datareader against the data source file.

We then override the OnGenerateNextValues. This method gets called by the infrastructure every time a new value is needed to generate a new row for the specific field. In this method, we merely advance our data reader to the next record in our excel file. If we ever reach the end of the file, we reset the data reader to the beginning of the file and resume.

We then need to define our inputs and outputs. You can see that we have one public property CharacterCasing which is our input value. It is marked as an input with the appropriate [Input] attribute. Similary, we define our 20 output values. As you can see, we have created 20 properties appropriately marked with the [Output] attribute. Each of these output properties call GetOutputValue to pull the appropriate field from the data reader. The output is appropriately formatted in the FormatOutputValue method to adhere to the user’s input on character casing.

Finally, we implement the Dispose method to ensure that the data reader and data connection are appropriately closed when the generator is no longer needed.

Deploy & register the data generator

1.      The first thing you are going to want to do is add the data generator assembly to the GAC (Global Assembly Cache). You can do this using the gacutil.exe command. If both gacutil and your data generator assembly are in your path, you can use a command like the following. This will install (or force a re-install if it already exists) of the assembly.

gacutil /if Generators.dll

2.      The next step is to register the data generator. You do this by adding a new extensions.xml file in the appropriate location. The default location is the following:

C:\Program Files\Microsoft Visual Studio 8\DBPro

Add the file Generators.extensions.xml to the directory with the following contents:

<?xml version="1.0" encoding="us-ascii"?>

<extensions assembly="Generators, Version=1.0.0.0, Culture=neutral, PublicKeyToken=<Public Key Token>" version="1" xmlns="urn:Microsoft.VisualStudio.TeamSystem.Data.Extensions" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:Microsoft.VisualStudio.TeamSystem.Data.Extensions

Microsoft.VisualStudio.TeamSystem.Data.Extensions.xsd">

  <extension type="Generators.PersonGenerator" enabled="true" />

</extensions>

Here we are registering the strongly named assembly and each of its contained data generators. In this example we have only our one generator. You must put the public key token for the signed assembly here as well. The easiest way to obtain the public key is to browse to the assembly directory in your Windows dir, find the binary that you just added to the GAC, and copy the public key token from there.

You can clearly automate this process and create a nice little setup tool if you wished to GAC the assembly and register it on the client machine. This way your entire organization can easily take advantage of your custom library of data generators. I’m not going to get into the details here, but you can use regular deployment technologies to accomplish this.

Use the data generator

§ Now we are ready to put the data generator to use. Make sure you close out of Visual Studio and re-open it. This is necessary because the data generators are only loaded on initial load of Visual Studio and are not refreshed.

§ Open an existing database project and create a new data generation plan. Go to a specific Person-related table (say the Employees table in Northwind) and change the generator associated with a FirstName, LastName, etc. field to use the PersonGenerator. Then choose which output you would like.

§ In the preview window, look at the values that are generated from your new generator! And that’s it!

        

I hope that gives you a sense of the potential of producing your own data generator as well as gets you started on your path to developing them. There are many aspects of the extensibility model that I did not cover here. This only describes the most basic type of data generator you can built. There will be documentation and extended whitepapers available detailing just how to take advantage of the entire extensibility model. I’ll add links to my blog as soon as they are available.

For now, if you have questions, the forums are a great place to ask them. So please do! And I’d love to hear about whatever data generators you are building.

Sachin Rekhi

PersonGeneratorData.xls