Move data to and from SQL Server on-premises or on IaaS (Azure VM) using Azure Data Factory

Note

This article applies to version 1 of Data Factory, which is generally available (GA). If you are using version 2 of the Data Factory service, which is in preview, see SQL Server connector in V2.

This article explains how to use the Copy Activity in Azure Data Factory to move data to/from an on-premises SQL Server database. It builds on the Data Movement Activities article, which presents a general overview of data movement with the copy activity.

Supported scenarios

You can copy data from a SQL Server database to the following data stores:

Category Data store
Azure Azure Blob storage
Azure Data Lake Store
Azure Cosmos DB (DocumentDB API)
Azure SQL Database
Azure SQL Data Warehouse
Azure Search Index
Azure Table storage
Databases SQL Server
Oracle
File File system

You can copy data from the following data stores to a SQL Server database:

Category Data store
Azure Azure Blob storage
Azure Cosmos DB (DocumentDB API)
Azure Data Lake Store
Azure SQL Database
Azure SQL Data Warehouse
Azure Table storage
Databases Amazon Redshift
DB2
MySQL
Oracle
PostgreSQL
SAP Business Warehouse
SAP HANA
SQL Server
Sybase
Teradata
NoSQL Cassandra
MongoDB
File Amazon S3
File System
FTP
HDFS
SFTP
Others Generic HTTP
Generic OData
Generic ODBC
Salesforce
Web Table (table from HTML)
GE Historian

Supported SQL Server versions

This SQL Server connector support copying data from/to the following versions of instance hosted on-premises or in Azure IaaS using both SQL authentication and Windows authentication: SQL Server 2016, SQL Server 2014, SQL Server 2012, SQL Server 2008 R2, SQL Server 2008, SQL Server 2005

Enabling connectivity

The concepts and steps needed for connecting with SQL Server hosted on-premises or in Azure IaaS (Infrastructure-as-a-Service) VMs are the same. In both cases, you need to use Data Management Gateway for connectivity.

See moving data between on-premises locations and cloud article to learn about Data Management Gateway and step-by-step instructions on setting up the gateway. Setting up a gateway instance is a pre-requisite for connecting with SQL Server.

While you can install gateway on the same on-premises machine or cloud VM instance as the SQL Server for better performance, we recommended that you install them on separate machines. Having the gateway and SQL Server on separate machines reduces resource contention.

Getting started

You can create a pipeline with a copy activity that moves data to/from an on-premises SQL Server database by using different tools/APIs.

The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.

You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.

Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:

  1. Create a data factory. A data factory may contain one or more pipelines.
  2. Create linked services to link input and output data stores to your data factory. For example, if you are copying data from a SQL Server database to an Azure blob storage, you create two linked services to link your SQL Server database and Azure storage account to your data factory. For linked service properties that are specific to SQL Server database, see linked service properties section.
  3. Create datasets to represent input and output data for the copy operation. In the example mentioned in the last step, you create a dataset to specify the SQL table in your SQL Server database that contains the input data. And, you create another dataset to specify the blob container and the folder that holds the data copied from the SQL Server database. For dataset properties that are specific to SQL Server database, see dataset properties section.
  4. Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output. In the example mentioned earlier, you use SqlSource as a source and BlobSink as a sink for the copy activity. Similarly, if you are copying from Azure Blob Storage to SQL Server Database, you use BlobSource and SqlSink in the copy activity. For copy activity properties that are specific to SQL Server Database, see copy activity properties section. For details on how to use a data store as a source or a sink, click the link in the previous section for your data store.

When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For samples with JSON definitions for Data Factory entities that are used to copy data to/from an on-premises SQL Server database, see JSON examples section of this article.

The following sections provide details about JSON properties that are used to define Data Factory entities specific to SQL Server:

Linked service properties

You create a linked service of type OnPremisesSqlServer to link an on-premises SQL Server database to a data factory. The following table provides description for JSON elements specific to on-premises SQL Server linked service.

The following table provides description for JSON elements specific to SQL Server linked service.

Property Description Required
type The type property should be set to: OnPremisesSqlServer. Yes
connectionString Specify connectionString information needed to connect to the on-premises SQL Server database using either SQL authentication or Windows authentication. Yes
gatewayName Name of the gateway that the Data Factory service should use to connect to the on-premises SQL Server database. Yes
username Specify user name if you are using Windows Authentication. Example: domainname\username. No
password Specify password for the user account you specified for the username. No

You can encrypt credentials using the New-AzureRmDataFactoryEncryptValue cmdlet and use them in the connection string as shown in the following example (EncryptedCredential property):

"connectionString": "Data Source=<servername>;Initial Catalog=<databasename>;Integrated Security=True;EncryptedCredential=<encrypted credential>",

Samples

JSON for using SQL Authentication

{
    "name": "MyOnPremisesSQLDB",
    "properties":
    {
        "type": "OnPremisesSqlServer",
        "typeProperties": {
            "connectionString": "Data Source=<servername>;Initial Catalog=MarketingCampaigns;Integrated Security=False;User ID=<username>;Password=<password>;",
            "gatewayName": "<gateway name>"
        }
    }
}

JSON for using Windows Authentication

Data Management Gateway will impersonate the specified user account to connect to the on-premises SQL Server database.

{
     "Name": " MyOnPremisesSQLDB",
     "Properties":
     {
         "type": "OnPremisesSqlServer",
         "typeProperties": {
             "ConnectionString": "Data Source=<servername>;Initial Catalog=MarketingCampaigns;Integrated Security=True;",
             "username": "<domain\\username>",
             "password": "<password>",
             "gatewayName": "<gateway name>"
        }
     }
}

Dataset properties

In the samples, you have used a dataset of type SqlServerTable to represent a table in a SQL Server database.

For a full list of sections & properties available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (SQL Server, Azure blob, Azure table, etc.).

The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for the dataset of type SqlServerTable has the following properties:

Property Description Required
tableName Name of the table or view in the SQL Server Database instance that linked service refers to. Yes

Copy activity properties

If you are moving data from a SQL Server database, you set the source type in the copy activity to SqlSource. Similarly, if you are moving data to a SQL Server database, you set the sink type in the copy activity to SqlSink. This section provides a list of properties supported by SqlSource and SqlSink.

For a full list of sections & properties available for defining activities, see the Creating Pipelines article. Properties such as name, description, input and output tables, and policies are available for all types of activities.

Note

The Copy Activity takes only one input and produces only one output.

Whereas, properties available in the typeProperties section of the activity vary with each activity type. For Copy activity, they vary depending on the types of sources and sinks.

SqlSource

When source in a copy activity is of type SqlSource, the following properties are available in typeProperties section:

Property Description Allowed values Required
sqlReaderQuery Use the custom query to read data. SQL query string. For example: select * from MyTable. May reference multiple tables from the database referenced by the input dataset. If not specified, the SQL statement that is executed: select from MyTable. No
sqlReaderStoredProcedureName Name of the stored procedure that reads data from the source table. Name of the stored procedure. The last SQL statement must be a SELECT statement in the stored procedure. No
storedProcedureParameters Parameters for the stored procedure. Name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. No

If the sqlReaderQuery is specified for the SqlSource, the Copy Activity runs this query against the SQL Server Database source to get the data.

Alternatively, you can specify a stored procedure by specifying the sqlReaderStoredProcedureName and storedProcedureParameters (if the stored procedure takes parameters).

If you do not specify either sqlReaderQuery or sqlReaderStoredProcedureName, the columns defined in the structure section are used to build a select query to run against the SQL Server Database. If the dataset definition does not have the structure, all columns are selected from the table.

Note

When you use sqlReaderStoredProcedureName, you still need to specify a value for the tableName property in the dataset JSON. There are no validations performed against this table though.

SqlSink

SqlSink supports the following properties:

Property Description Allowed values Required
writeBatchTimeout Wait time for the batch insert operation to complete before it times out. timespan

Example: “00:30:00” (30 minutes).
No
writeBatchSize Inserts data into the SQL table when the buffer size reaches writeBatchSize. Integer (number of rows) No (default: 10000)
sqlWriterCleanupScript Specify query for Copy Activity to execute such that data of a specific slice is cleaned up. For more information, see repeatable copy section. A query statement. No
sliceIdentifierColumnName Specify column name for Copy Activity to fill with auto generated slice identifier, which is used to clean up data of a specific slice when rerun. For more information, see repeatable copy section. Column name of a column with data type of binary(32). No
sqlWriterStoredProcedureName Name of the stored procedure that upserts (updates/inserts) data into the target table. Name of the stored procedure. No
storedProcedureParameters Parameters for the stored procedure. Name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. No
sqlWriterTableType Specify table type name to be used in the stored procedure. Copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data being copied with existing data. A table type name. No

JSON examples for copying data from and to SQL Server

The following examples provide sample JSON definitions that you can use to create a pipeline by using Azure portal or Visual Studio or Azure PowerShell. The following samples show how to copy data to and from SQL Server and Azure Blob Storage. However, data can be copied directly from any of sources to any of the sinks stated here using the Copy Activity in Azure Data Factory.

Example: Copy data from SQL Server to Azure Blob

The following sample shows:

  1. A linked service of type OnPremisesSqlServer.
  2. A linked service of type AzureStorage.
  3. An input dataset of type SqlServerTable.
  4. An output dataset of type AzureBlob.
  5. The pipeline with Copy activity that uses SqlSource and BlobSink.

The sample copies time-series data from a SQL Server table to an Azure blob every hour. The JSON properties used in these samples are described in sections following the samples.

As a first step, setup the data management gateway. The instructions are in the moving data between on-premises locations and cloud article.

SQL Server linked service

{
  "Name": "SqlServerLinkedService",
  "properties": {
    "type": "OnPremisesSqlServer",
    "typeProperties": {
      "connectionString": "Data Source=<servername>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;",
      "gatewayName": "<gatewayname>"
    }
  }
}

Azure Blob storage linked service

{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}

SQL Server input dataset

The sample assumes you have created a table “MyTable” in SQL Server and it contains a column called “timestampcolumn” for time series data. You can query over multiple tables within the same database using a single dataset, but a single table must be used for the dataset's tableName typeProperty.

Setting “external”: ”true” informs Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.

{
  "name": "SqlServerInput",
  "properties": {
    "type": "SqlServerTable",
    "linkedServiceName": "SqlServerLinkedService",
    "typeProperties": {
      "tableName": "MyTable"
    },
    "external": true,
    "availability": {
      "frequency": "Hour",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}

Azure Blob output dataset

Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated based on the start time of the slice that is being processed. The folder path uses year, month, day, and hours parts of the start time.

{
  "name": "AzureBlobOutput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}",
      "partitionedBy": [
        {
          "name": "Year",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "yyyy"
          }
        },
        {
          "name": "Month",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "MM"
          }
        },
        {
          "name": "Day",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "dd"
          }
        },
        {
          "name": "Hour",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "HH"
          }
        }
      ],
      "format": {
        "type": "TextFormat",
        "columnDelimiter": "\t",
        "rowDelimiter": "\n"
      }
    },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}

Pipeline with Copy activity

The pipeline contains a Copy Activity that is configured to use these input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to SqlSource and sink type is set to BlobSink. The SQL query specified for the SqlReaderQuery property selects the data in the past hour to copy.

{  
    "name":"SamplePipeline",
    "properties":{  
    "start":"2016-06-01T18:00:00",
    "end":"2016-06-01T19:00:00",
    "description":"pipeline for copy activity",
    "activities":[  
      {
        "name": "SqlServertoBlob",
        "description": "copy activity",
        "type": "Copy",
        "inputs": [
          {
            "name": " SqlServerInput"
          }
        ],
        "outputs": [
          {
            "name": "AzureBlobOutput"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "SqlSource",
            "SqlReaderQuery": "$$Text.Format('select * from MyTable where timestampcolumn >= \\'{0:yyyy-MM-dd HH:mm}\\' AND timestampcolumn < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
          },
          "sink": {
            "type": "BlobSink"
          }
        },
       "scheduler": {
          "frequency": "Hour",
          "interval": 1
        },
        "policy": {
          "concurrency": 1,
          "executionPriorityOrder": "OldestFirst",
          "retry": 0,
          "timeout": "01:00:00"
        }
      }
     ]
   }
}

In this example, sqlReaderQuery is specified for the SqlSource. The Copy Activity runs this query against the SQL Server Database source to get the data. Alternatively, you can specify a stored procedure by specifying the sqlReaderStoredProcedureName and storedProcedureParameters (if the stored procedure takes parameters). The sqlReaderQuery can reference multiple tables within the database referenced by the input dataset. It is not limited to only the table set as the dataset's tableName typeProperty.

If you do not specify sqlReaderQuery or sqlReaderStoredProcedureName, the columns defined in the structure section are used to build a select query to run against the SQL Server Database. If the dataset definition does not have the structure, all columns are selected from the table.

See the Sql Source section and BlobSink for the list of properties supported by SqlSource and BlobSink.

Example: Copy data from Azure Blob to SQL Server

The following sample shows:

  1. The linked service of type OnPremisesSqlServer.
  2. The linked service of type AzureStorage.
  3. An input dataset of type AzureBlob.
  4. An output dataset of type SqlServerTable.
  5. The pipeline with Copy activity that uses BlobSource and SqlSink.

The sample copies time-series data from an Azure blob to a SQL Server table every hour. The JSON properties used in these samples are described in sections following the samples.

SQL Server linked service

{
  "Name": "SqlServerLinkedService",
  "properties": {
    "type": "OnPremisesSqlServer",
    "typeProperties": {
      "connectionString": "Data Source=<servername>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;",
      "gatewayName": "<gatewayname>"
    }
  }
}

Azure Blob storage linked service

{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}

Azure Blob input dataset

Data is picked up from a new blob every hour (frequency: hour, interval: 1). The folder path and file name for the blob are dynamically evaluated based on the start time of the slice that is being processed. The folder path uses year, month, and day part of the start time and file name uses the hour part of the start time. “external”: “true” setting informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.

{
  "name": "AzureBlobInput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}",
      "fileName": "{Hour}.csv",
      "partitionedBy": [
        {
          "name": "Year",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "yyyy"
          }
        },
        {
          "name": "Month",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "MM"
          }
        },
        {
          "name": "Day",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "dd"
          }
        },
        {
          "name": "Hour",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "HH"
          }
        }
      ],
      "format": {
        "type": "TextFormat",
        "columnDelimiter": ",",
        "rowDelimiter": "\n"
      }
    },
    "external": true,
    "availability": {
      "frequency": "Hour",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}

SQL Server output dataset

The sample copies data to a table named “MyTable” in SQL Server. Create the table in SQL Server with the same number of columns as you expect the Blob CSV file to contain. New rows are added to the table every hour.

{
  "name": "SqlServerOutput",
  "properties": {
    "type": "SqlServerTable",
    "linkedServiceName": "SqlServerLinkedService",
    "typeProperties": {
      "tableName": "MyOutputTable"
    },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}

Pipeline with Copy activity

The pipeline contains a Copy Activity that is configured to use these input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to BlobSource and sink type is set to SqlSink.

{  
    "name":"SamplePipeline",
    "properties":{  
    "start":"2014-06-01T18:00:00",
    "end":"2014-06-01T19:00:00",
    "description":"pipeline with copy activity",
    "activities":[  
      {
        "name": "AzureBlobtoSQL",
        "description": "Copy Activity",
        "type": "Copy",
        "inputs": [
          {
            "name": "AzureBlobInput"
          }
        ],
        "outputs": [
          {
            "name": " SqlServerOutput "
          }
        ],
        "typeProperties": {
          "source": {
            "type": "BlobSource",
            "blobColumnSeparators": ","
          },
          "sink": {
            "type": "SqlSink"
          }
        },
       "scheduler": {
          "frequency": "Hour",
          "interval": 1
        },
        "policy": {
          "concurrency": 1,
          "executionPriorityOrder": "OldestFirst",
          "retry": 0,
          "timeout": "01:00:00"
        }
      }
      ]
   }
}

Troubleshooting connection issues

  1. Configure your SQL Server to accept remote connections. Launch SQL Server Management Studio, right-click server, and click Properties. Select Connections from the list and check Allow remote connections to the server.

    Enable remote connections

    See Configure the remote access Server Configuration Option for detailed steps.

  2. Launch SQL Server Configuration Manager. Expand SQL Server Network Configuration for the instance you want, and select Protocols for MSSQLSERVER. You should see protocols in the right-pane. Enable TCP/IP by right-clicking TCP/IP and clicking Enable.

    Enable TCP/IP

    See Enable or Disable a Server Network Protocol for details and alternate ways of enabling TCP/IP protocol.

  3. In the same window, double-click TCP/IP to launch TCP/IP Properties window.
  4. Switch to the IP Addresses tab. Scroll down to see IPAll section. Note down the TCP Port **(default is **1433).
  5. Create a rule for the Windows Firewall on the machine to allow incoming traffic through this port.
  6. Verify connection: To connect to the SQL Server using fully qualified name, use SQL Server Management Studio from a different machine. For example: "..corp..com,1433."

    Important

    See Move data between on-premises sources and the cloud with Data Management Gateway for detailed information.

    See Troubleshoot gateway issues for tips on troubleshooting connection/gateway related issues.

Identity columns in the target database

This section provides an example that copies data from a source table with no identity column to a destination table with an identity column.

Source table:

create table dbo.SourceTbl
(
       name varchar(100),
       age int
)

Destination table:

create table dbo.TargetTbl
(
       identifier int identity(1,1),
       name varchar(100),
       age int
)

Notice that the target table has an identity column.

Source dataset JSON definition

{
    "name": "SampleSource",
    "properties": {
        "published": false,
        "type": " SqlServerTable",
        "linkedServiceName": "TestIdentitySQL",
        "typeProperties": {
            "tableName": "SourceTbl"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Destination dataset JSON definition

{
    "name": "SampleTarget",
    "properties": {
        "structure": [
            { "name": "name" },
            { "name": "age" }
        ],
        "published": false,
        "type": "AzureSqlTable",
        "linkedServiceName": "TestIdentitySQLSource",
        "typeProperties": {
            "tableName": "TargetTbl"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": false,
        "policy": {}
    }
}

Notice that as your source and target table have different schema (target has an additional column with identity). In this scenario, you need to specify structure property in the target dataset definition, which doesn’t include the identity column.

Invoke stored procedure from SQL sink

See Invoke stored procedure for SQL sink in copy activity article for an example of invoking a stored procedure from SQL sink in a copy activity of a pipeline.

Type mapping for SQL server

As mentioned in the data movement activities article, the Copy activity performs automatic type conversions from source types to sink types with the following 2-step approach:

  1. Convert from native source types to .NET type
  2. Convert from .NET type to native sink type

When moving data to & from SQL server, the following mappings are used from SQL type to .NET type and vice versa.

The mapping is same as the SQL Server Data Type Mapping for ADO.NET.

SQL Server Database Engine type .NET Framework type
bigint Int64
binary Byte[]
bit Boolean
char String, Char[]
date DateTime
Datetime DateTime
datetime2 DateTime
Datetimeoffset DateTimeOffset
Decimal Decimal
FILESTREAM attribute (varbinary(max)) Byte[]
Float Double
image Byte[]
int Int32
money Decimal
nchar String, Char[]
ntext String, Char[]
numeric Decimal
nvarchar String, Char[]
real Single
rowversion Byte[]
smalldatetime DateTime
smallint Int16
smallmoney Decimal
sql_variant Object *
text String, Char[]
time TimeSpan
timestamp Byte[]
tinyint Byte
uniqueidentifier Guid
varbinary Byte[]
varchar String, Char[]
xml Xml

Mapping source to sink columns

To map columns from source dataset to columns from sink dataset, see Mapping dataset columns in Azure Data Factory.

Repeatable copy

When copying data to SQL Server Database, the copy activity appends data to the sink table by default. To perform an UPSERT instead, See Repeatable write to SqlSink article.

When copying data from relational data stores, keep repeatability in mind to avoid unintended outcomes. In Azure Data Factory, you can rerun a slice manually. You can also configure retry policy for a dataset so that a slice is rerun when a failure occurs. When a slice is rerun in either way, you need to make sure that the same data is read no matter how many times a slice is run. See Repeatable read from relational sources.

Performance and Tuning

See Copy Activity Performance & Tuning Guide to learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it.