Volume 32 Number 11
Continuous Data Migration Using Visual Studio and TFS
Data migration is often a requirement in an application development effort, whether it’s a greenfield project or an application redesign. Despite this, very little attention gets paid to data migration during the design and development phase, with activity often pushed back to the latter part of the project. While this approach might allow an undivided focus on migration, it comes with a few risks, not the least of which is that teams lack the time and resources to properly test the migration and surrounding code.
The concept of continuous data migration is founded on the idea of developing migration scripts during the development of an application and maintaining them through versions just like the application code. A continuous migration approach allows you to test your migrations alongside code development, ensuring your data and code assets stay in sync.
In this article, I describe a solution that leverages Visual Studio and Team Foundation Server to achieve continuous data migration. Keep in mind that third-party tools such as Redgate ReadyRoll can do part of this, but they come at a significant cost and don't fully support continuous data migration.
Challenge and Solution
Visual Studio can perform an incremental publish of a database with the help of SqlPackage.exe, but the tool is lacking in many ways. For instance, SqlPackage.exe falls short when inserting a new column between the existing columns of a table, changing seed data, or normalizing and de-normalizing tables, among other scenarios.
Also, versioning changes is very important when you need to do targeted fixes and deployment. For example, you may need to increment the value of a column by 10 when you migrate from v1.2 to v1.3, but not in any other flow. This can only be achieved with the use of versioning; however, SQL lacks this capability.
I want to approach the challenge by designing a solution that takes full advantage of what Visual Studio and SqlPackage.exe can offer, while resolving the previously mentioned shortfalls.
A typical database project has two types of scripts—compiled and non-compiled. All of the objects such as schema, tables, views, stored procedures and the like are generally written as a compiled script. The seed script and any runtime queries will generally be placed into the post-deployment script, which is non-compiled.
Let’s start with an example. Figure 1 shows the AdventureWorks.Database sample database project (imported from the backup available at bit.ly/2vPwu4N). As you can see, all the objects are put in as compiled scripts.
Figure 1 Compiled Scripts in the AdventureWorks.Database
The seed data script (containing data needed for the application to function) is put in as a non-compiled script and referenced in the post-deployment script. Figure 2 shows this. If you are not aware of the significance of post-deployment scripts, I encourage you to read the MSDN Library documentation at bit.ly/2w12Iy4.
Figure 2 Post-Deployment Scripts
To ensure that the post-deployment script can handle incremental deployment, I’ve added a NOT EXISTS clause in front of all the INSERT statements. For example:
IF NOT EXISTS (SELECT 1 FROM [Person].[AddressType] WHERE [AddressTypeID] = 1)
  INSERT [Person].[AddressType] ([AddressTypeID], [Name], [ModifiedDate])
  VALUES (1, N'Billing', CAST (N'2008-04-30T00:00:00.000' AS DateTime))
For the sake of simplicity and ease of maintenance, I'll keep all the seed scripts in their respective files and reference them from the post-deployment script.
I now have a project that will deploy the latest schema and seed data, at any point in time. It is also capable of performing an incremental deployment on an existing database if the project is not introducing any breaking changes. However, the restrictions I mentioned at the start of this section come into play.
Finally, there is a bug that breaks the incremental deployment when a user-defined type (UDT) is changed. Unfortunately, the Visual Studio team has marked this bug as won’t fix, which means you’ll need to work around that. You can explore more details about the bug in the Visual Studio Developer Community entry at bit.ly/2w0zTBU.
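The workaround amounts to dropping the stored procedures that reference the UDT (and the type itself) before the incremental deployment runs, so that the dacpac deployment can re-create both from source. A minimal sketch, in which the procedure and type names are hypothetical:

```sql
-- Hypothetical example: drop a procedure that depends on a UDT, then the
-- UDT itself, so the incremental deployment can re-create both from the
-- project source without tripping over the dependency.
IF OBJECT_ID('[dbo].[GetOrdersByIdList]', 'P') IS NOT NULL
  DROP PROCEDURE [dbo].[GetOrdersByIdList];
GO
IF TYPE_ID('[dbo].[IdList]') IS NOT NULL
  DROP TYPE [dbo].[IdList];
GO
```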
Just as you version any application you ship, it’s important to version the database, as well. Versioning helps you keep track of the source code so you can easily keep tabs on the features, bugs and fixes made in every release of the software. If you’re not already familiar with versioning, take a moment and check out the article, “Semantic Versioning 2.0.0” at semver.org. It’s worth a read.
Before I get started, I have to address a challenge: SQL doesn’t actually come with a mechanism for versioning, so I’ll need to create one of my own. I’ll create a table called [internal].[DatabaseVersion] to store the version detail, where “internal” is the schema of the table. It’s good practice to have a separate schema for all the database objects used for internal purposes (that is, they don’t participate in the actual business).
Figure 3 shows the schema I would propose for the table. You can follow your own pattern if you’re not convinced. Just keep in mind that we create versions to keep track of builds and releases.
Figure 3 Table Schema
CREATE TABLE [internal].[DatabaseVersion]
(
  [DatabaseVersionId] INT IDENTITY(1,1) NOT NULL,
  [Major] INT NOT NULL,
  [Minor] INT NOT NULL,
  [Build] INT NOT NULL,
  [Revision] INT NOT NULL,
  [CreatedBy] NVARCHAR (256) CONSTRAINT [DFDatabaseVersionCreatedBy] DEFAULT ('') NOT NULL,
  [CreatedOn] DATETIME CONSTRAINT [DFDatabaseVersionCreatedOn] DEFAULT (GETUTCDATE()) NOT NULL,
  [ModifiedBy] NVARCHAR (256) CONSTRAINT [DFDatabaseVersionModifiedBy] DEFAULT ('') NOT NULL,
  [ModifiedOn] DATETIME CONSTRAINT [DFDatabaseVersionModifiedOn] DEFAULT (GETUTCDATE()) NOT NULL,
  CONSTRAINT [PKDatabaseVersion] PRIMARY KEY CLUSTERED ([DatabaseVersionId] ASC)
);
GO
Every time I make a change to the schema or check in a data migration script, I add a new version entry to the table, which serves as the label for the change. If the current version is 1.0, for example, and I introduce a migration that fixes the Gender flag issue by reversing the values, I add the appropriate scripts to make that change and insert a new entry into the table with the version 1.1.0128.212.
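The migration scripts shown later in the article record these entries through a helper procedure, [internal].[CreateDatabaseVersion]. Its body isn't shown in the article; a minimal sketch consistent with the table schema in Figure 3 might look like this (the guard on @id is my assumption, added so re-running a migration doesn't insert duplicate rows):

```sql
CREATE PROCEDURE [internal].[CreateDatabaseVersion]
  @id INT, @major INT, @minor INT, @build INT, @revision INT
AS
BEGIN
  -- Record the version only once, so re-running a migration is harmless.
  IF NOT EXISTS (SELECT 1 FROM [internal].[DatabaseVersion]
                 WHERE [DatabaseVersionId] = @id)
    INSERT [internal].[DatabaseVersion]
      ([Major], [Minor], [Build], [Revision], [CreatedBy], [ModifiedBy])
    VALUES (@major, @minor, @build, @revision, SUSER_SNAME(), SUSER_SNAME());
END
```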
As I already discussed, Visual Studio can do incremental deployments, but not with breaking changes. So as I design the migration, I need to keep that in mind and work around the limitation.
The first step is to create a separate project to target the migration. For the sample shown in Figure 1, I create a new database project called AdventureWorks.Database.Migration. This migration project targets two types of scripts. The first is the data migration script, which needs to be run if any data movement or update occurs. The second script takes care of breaking schema changes that Visual Studio and SqlPackage.exe cannot handle. Both these scripts go into the project as a post-deployment script. There are no compiled scripts in this project.
To better understand the scenario, let's discuss everything in relation to the AdventureWorks sample. I've uploaded the source code for this into my Git repository at github.com/Jebarson/ContinuousDataMigration. The master branch incorporates the base project that I've imported and created from the database, as mentioned earlier.
Before I venture into the scenario, I’d like to explain how the migration works. As I discussed in the Versioning section, I’m versioning every single change released by adding a new row into internal.DatabaseVersion. Inside the AdventureWorks.Database.Migration project, I write the logic to execute appropriate migration scripts based on the targeted database version. Check out the flowchart in Figure 4 for an understanding of the logic involved.
Figure 4 The Migration Logic
At the start of the AdventureWorks.Database.Migration project, I check the current version of the database and based on that run the migration scripts all the way to the latest version. Here’s the code snippet I use to decide the migration path, which I’ll refer to as Script 1:
DECLARE @currentDBVersion BIGINT = NULL;

-- Get the current version of the database.
SELECT TOP 1 @currentDBVersion = [DatabaseVersionId]
FROM [internal].[DatabaseVersion]
ORDER BY [DatabaseVersionId] DESC

-- Jump to the incremental migration scripts based on the current version.
IF @currentDBVersion = 1 GOTO Version11
ELSE IF @currentDBVersion = 2 GOTO Version12
ELSE RETURN
Now that I’ve shown how the migration scripts run, let’s drive the migration with a few fictional scenarios that help illustrate what’s going on. I’ll discuss two version changes from the base project that I created earlier.
Version 1.1: This is the first change on top of the base project I created. The changes are available in the v11 branch of the Continuous Data Migration project on GitHub. The changes I’ve committed in this version are as follows:
- Inserted a new IsEmployee column to [HumanResources].[Employee] after the JobTitle column.
- Changed the [Person].[AddressType] name from Main Office to Office.
- Changed SPs (no need to include in migration project).
- New SPs (no need to include in migration project).
All these changes are made in the regular project, AdventureWorks.Database, as is, along with a new version row in internal.DatabaseVersion. This ensures that any fresh deployment incorporates the latest changes. For any existing database at the base version to be upgraded to v1.1, I need to implement the same changes in the migration project. To do so, I segregate the work into two sections: the schema change and the data change. Inserting a new column called IsEmployee is a schema change, while changing AddressType from Main Office to Office is a data change.
The schema change is something Visual Studio can do. However, it can only append the column, and that’s something I don’t want. To overcome this limitation, I must generate a script to first drop all the dependencies (indexes, constraints, foreign keys and the like) of the table Employee, and then create a temporary table with the new column in the correct order with all the dependencies. Then, I can move the data from the Employee table to the temp table, drop the Employee table and finally rename the temp table to Employee. This script is available in the v11 branch of my Continuous Data Migration project on GitHub, in the file SchemaChangeScript.sql.
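The rebuild pattern described above can be sketched as follows. This is a trimmed illustration, not the full script: the temporary table name, the default constraint, and the column list shown here are assumptions, and the real script must also drop and re-create every index, constraint and foreign key on the table.

```sql
-- Sketch: insert IsEmployee after JobTitle by rebuilding the table.
-- Only a few columns are shown; the real table has many more.
CREATE TABLE [HumanResources].[Employee_Tmp]
(
  [BusinessEntityID] INT NOT NULL,
  [JobTitle] NVARCHAR (50) NOT NULL,
  [IsEmployee] BIT NOT NULL CONSTRAINT [DFEmployeeIsEmployee] DEFAULT (1)
  -- ...remaining columns in their original order...
);

-- Move the data across, letting the default populate the new column.
INSERT INTO [HumanResources].[Employee_Tmp] ([BusinessEntityID], [JobTitle])
SELECT [BusinessEntityID], [JobTitle]
FROM [HumanResources].[Employee];

-- Swap the tables.
DROP TABLE [HumanResources].[Employee];
EXEC sp_rename 'HumanResources.Employee_Tmp', 'Employee';
```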
The data change just alters a value of the record from Main Office to Office and, therefore, I can script an update query to accomplish this. Check out the DataChangeScript.sql file in the v11 branch of the Continuous Data Migration project on GitHub.
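The update in question is a one-liner; a sketch of what DataChangeScript.sql amounts to:

```sql
-- Rename the address type from 'Main Office' to 'Office'.
UPDATE [Person].[AddressType]
SET [Name] = N'Office'
WHERE [Name] = N'Main Office';
```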
When the migration project is run on top of the existing AdventureWorks.Database, the code from Script 1 will send the execution to a script that calls the schema and data change script, as shown in the following snippet, which I’ll call Script 2:
-- Script to migrate to v1.1
Version11:
:r .\Scripts\Migration\V11\SchemaChangeScript.sql
:r .\Scripts\Migration\V11\DataChangeScript.sql
EXEC [internal].[CreateDatabaseVersion] @id = 2, @major = 1, @minor = 1,
  @build = 0128, @revision = 212
Version 1.2: This is the latest change committed on top of v1.1. The same changes are available in the v12 branch of the project on GitHub. The changes in this version are as follows:
- Changed the IsEmployee column in [HumanResources].[Employee] to EmployeeType, which references a new table, [HumanResources].[EmployeeType].
- Changed SPs (no need to include in migration project).
- New table (no need to include in migration project).
Similar to v1.1, I've also made the changes in the regular project AdventureWorks.Database, along with a new entry in internal.DatabaseVersion. As you can see, IsEmployee is now changed to EmployeeType to accommodate more types of employees. To achieve this, I follow the same pattern I did in v1.1. However, I need to write the data migration for the new column based on the value of the earlier column. The schema change script is written in the file SchemaChangeScript.sql in the v12 branch of the Continuous Data Migration project on GitHub.
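The interesting part of this migration is deriving the new column's value from the old one during the table rebuild. A hedged sketch of the data portion, in which the temporary table name and the EmployeeType codes (1 for employee, 2 for non-employee) are my assumptions for illustration:

```sql
-- While copying rows into the rebuilt table, map the old bit flag
-- onto the new EmployeeType foreign key.
INSERT INTO [HumanResources].[Employee_Tmp]
  ([BusinessEntityID], [JobTitle], [EmployeeType])
SELECT [BusinessEntityID], [JobTitle],
       CASE WHEN [IsEmployee] = 1 THEN 1 ELSE 2 END
FROM [HumanResources].[Employee];
```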
Here’s the script I’ve included in the project to migrate to v1.2, which I’ll call Script 3:
-- Script to migrate to v1.2
Version12:
:r .\Scripts\Migration\V12\SchemaChangeScript.sql
EXEC [internal].[CreateDatabaseVersion] @id = 3, @major = 1, @minor = 2,
  @build = 0414, @revision = 096
As I mentioned earlier, Visual Studio is partially capable of driving an incremental deployment, but I'm targeting only the items that Visual Studio lacks with the scripts I've generated up to this point. You may have noticed that for certain items in both v1.1 and v1.2, I mentioned "no need to include in migration project." That's because Visual Studio is able to deploy them incrementally. This raises the question: What changes qualify for the migration project and what changes don't?
You can look at the handy cheat list in Figure 5, which will help you decide whether to script the migration or not. Note that there might be more items you come across that could be added to this list.
- New table/view/stored procedure/object: Leverage Visual Studio.
- Change in view/stored procedure/function: Leverage Visual Studio.
- Change in user-defined type: Drop all stored procedures that reference the UDT. This is a workaround for the bug described earlier.
- Addition of a new column to a table: Script a migration from the existing table to a new table with the correct column order (see github.com/Jebarson/ContinuousDataMigration). This isn't required if you're adding a nullable column and the order of the columns is immaterial.
- Normalization or de-normalization of a table: Script a migration to either split or merge tables based on the requirement. This is similar to the script created in v1.2.
- Change in data: Script out the data change.
Figure 5 Migration Project Cheat List
OK, enough about generating the migration scripts. Time to move on to the deployment.
When deploying a fresh instance of the latest version of an existing database, there’s no need for migration. In the example I’ve been presenting, all you need to deploy is the AdventureWorks.Database. You can do this from either Visual Studio (via publish) or using SqlPackage.exe. Here’s the command for deploying the database using SqlPackage.exe:
SqlPackage.exe /Action:Publish /SourceFile:"AdventureWorks.Database.dacpac" /tdn:<<DatabaseName>> /tsn:"<<SQL Instance>>"
When you perform an incremental deployment over an existing database, there’s a chance that the latest script will need a migration. That means I have to deploy the migration database, as well. I’ll do this by deploying the AdventureWorks.Database.Migration project first, followed by AdventureWorks.Database. Be sure the “Always re-create database” option is unchecked under the Advanced Deployment Options area of the Advanced Publish Settings dialog box, as shown in Figure 6.
Figure 6 The Advanced Publish Settings Dialog Box
SqlPackage.exe /Action:Publish /SourceFile:"AdventureWorks.Database.Migration.dacpac"
  /tdn:<<DatabaseName>> /tsn:"<<SQL Instance>>" /p:CreateNewDatabase=False

SqlPackage.exe /Action:Publish /SourceFile:"AdventureWorks.Database.dacpac"
  /tdn:<<DatabaseName>> /tsn:"<<SQL Instance>>" /p:CreateNewDatabase=False
3 Common Migration Problems and Fixes
Continuous data migration can produce a lot of benefits, but it’s not without its challenges. Here are some common issues you might expect to encounter while implementing this solution, and ways to address them.
Broken references to dropped objects:
You might encounter this error in the migration project when a recent version of a migration script has removed an object that an earlier version script still references. The resolution is to wrap those queries in sp_executesql as '<<your migration script here>>'. For example:
EXEC sp_executesql 'ALTER TABLE Employee ADD NewCol INT'
Out of control migration scripts and version overload:
It’s a good idea to always set a minimum target version for migration. Doing so limits the scope of your migration scripts and helps ensure that they don’t become too difficult to maintain.
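One way to enforce a minimum target version is to fail fast at the top of Script 1 whenever the database is older than the oldest migration you still maintain. This guard is my own illustration, not part of the sample project; the minimum version value shown is hypothetical:

```sql
-- Refuse to migrate a database older than the minimum supported version,
-- rather than carrying migration scripts for every historical release.
IF @currentDBVersion < 1
BEGIN
  RAISERROR('Database is older than the minimum supported version; upgrade to the supported baseline first.', 16, 1);
  RETURN;
END
```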
Implementing with a production database:
In case you want to implement this solution in a database that’s already in production, include the definition of internal.DatabaseVersion and its version entries. Change the “Script 1” to see if the table internal.DatabaseVersion exists and if it doesn’t, direct the execution to the newer version label, which will do the migration and also create the table. For example:
DECLARE @currentDBVersion BIGINT = NULL;

-- Get the current version of the database.
IF NOT EXISTS (SELECT 1 FROM [INFORMATION_SCHEMA].[TABLES]
  WHERE [TABLES].[TABLE_NAME] = 'DatabaseVersion'
    AND [TABLES].[TABLE_SCHEMA] = 'internal')
  SELECT @currentDBVersion = 1
ELSE
  SELECT TOP 1 @currentDBVersion = [DatabaseVersionId]
  FROM [internal].[DatabaseVersion]
  ORDER BY [DatabaseVersionId] DESC

-- Jump to the incremental migration scripts based on the current version.
IF @currentDBVersion = 1 GOTO Version11
ELSE IF @currentDBVersion = 2 GOTO Version12
ELSE RETURN
Configuring the TFS Build to Deploy Continuous Migration
The goal is to automate migrations just like continuous integration, so that the build server does the data migration and makes it available to developers and testers once the build is triggered. The next step is to configure the build’s release tasks.
To create tasks for the build, you should first be aware of creating a Continuous Integration build. If you aren’t, be sure to read through the tutorial posted on the Microsoft Visual Studio site at bit.ly/2xWqtUx.
Once you’ve created the build tasks, you need to create the deploy tasks for the database. In this example, you must add two deploy tasks: one for the AdventureWorks.Database.Migration project and the other for the AdventureWorks.Database project. The deploy task will look something like Figure 7.
Figure 7 The Deploy Task
Fill in the details and set up the trigger based on your requirements. Once the build is up and running, you will have set up continuous data migration for your application.
In this article, I explained the importance of continuous data migration on a project that involves multiple phases of release, and how to achieve it using Visual Studio and TFS. Continuous data migration helps you reduce both development effort and migration bugs. In my experience, it has saved as much as 40 percent of the development effort spent on migration, and it eliminated the dedicated migration phase at the end of the project.
Integration of the migration scripts into TFS is as important as the migration scripts themselves. The continuous data migration process is of no use if you aren't deploying it as part of the daily build. "Fail early, fail fast" is the mantra to remember in software development, and continuous data migration sets you up to do exactly that.
Jebarson Jebamony is a senior consultant with Microsoft Services and designs and builds solutions for Microsoft, its partners and customers. He has more than 14 years of technical experience and has worked on many Microsoft technologies in that time.
Thanks to the following Microsoft technical experts for reviewing this article: Sagar Dheram and Shrenik Jhaveri