Copy data to or from Azure SQL Database by using Azure Data Factory

This article outlines how to use the Copy Activity in Azure Data Factory to copy data from and to Azure SQL Database. It builds on the copy activity overview article, which presents a general overview of the copy activity.

Note

This article applies to version 2 of Data Factory, which is currently in preview. If you are using version 1 of the Data Factory service, which is generally available (GA), see Azure SQL Database connector in V1.

Supported capabilities

You can copy data from Azure SQL Database to any supported sink data store, or copy data from any supported source data store to Azure SQL Database. For a list of data stores that are supported as sources or sinks by the copy activity, see the Supported data stores table.

Specifically, this Azure SQL Database connector supports:

  • Copying data by using SQL authentication, and Azure Active Directory (AAD) application token authentication with a service principal or Managed Service Identity (MSI).
  • As a source, retrieving data by using a SQL query or a stored procedure.
  • As a sink, appending data to a destination table or invoking a stored procedure with custom logic during the copy.

Important

If you copy data by using the Azure Integration Runtime, configure the Azure SQL server firewall to allow Azure services to access the server. If you copy data by using a self-hosted integration runtime, configure the firewall to allow the appropriate IP range, including the IP address of the machine that is used to connect to Azure SQL Database.
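
For example, a minimal Azure PowerShell (AzureRM) sketch for both cases might look like the following. The resource group, server name, and the self-hosted integration runtime machine's IP range are hypothetical placeholders; the 0.0.0.0 rule corresponds to the server's "Allow access to Azure services" setting.

# Sketch only: configure the Azure SQL server firewall for the two scenarios above.
Login-AzureRmAccount

# Allow Azure services (used by the Azure Integration Runtime) to reach the server.
New-AzureRmSqlServerFirewallRule -ResourceGroupName "<resource group>" `
    -ServerName "<servername>" `
    -FirewallRuleName "AllowAllWindowsAzureIps" `
    -StartIpAddress "0.0.0.0" -EndIpAddress "0.0.0.0"

# Allow the machine that hosts the self-hosted integration runtime.
New-AzureRmSqlServerFirewallRule -ResourceGroupName "<resource group>" `
    -ServerName "<servername>" `
    -FirewallRuleName "AllowSelfHostedIntegrationRuntimeMachine" `
    -StartIpAddress "<machine start IP>" -EndIpAddress "<machine end IP>"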

Getting started

You can create a pipeline with the copy activity by using one of the following tools or SDKs. Select a link to go to a tutorial with step-by-step instructions to create a pipeline with a copy activity.

The following sections provide details about properties that are used to define Data Factory entities specific to Azure SQL Database connector.

Linked service properties

The following properties are supported for Azure SQL Database linked service:

Property | Description | Required
type | The type property must be set to: AzureSqlDatabase | Yes
connectionString | Specify the information needed to connect to the Azure SQL Database instance for the connectionString property. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes
servicePrincipalId | Specify the application's client ID. | Yes, when using AAD authentication with a service principal
servicePrincipalKey | Specify the application's key. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes, when using AAD authentication with a service principal
tenant | Specify the tenant information (domain name or tenant ID) under which your application resides. You can retrieve it by hovering the mouse over the upper-right corner of the Azure portal. | Yes, when using AAD authentication with a service principal
connectVia | The integration runtime to be used to connect to the data store. You can use the Azure Integration Runtime or a self-hosted integration runtime (if your data store is located in a private network). If not specified, the default Azure Integration Runtime is used. | No

For different authentication types, refer to the following sections on prerequisites and JSON samples respectively:

Using SQL authentication

Linked service example using SQL authentication:

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
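
If you author Data Factory entities with Azure PowerShell rather than the UI, a minimal sketch (assuming the AzureRM.DataFactoryV2 module and a hypothetical resource group, factory name, and file path) for deploying the linked service definition above is:

# Sketch: deploy the linked service JSON above, saved locally as AzureSqlDbLinkedService.json.
Set-AzureRmDataFactoryV2LinkedService -ResourceGroupName "<resource group>" `
    -DataFactoryName "<data factory name>" `
    -Name "AzureSqlDbLinkedService" `
    -DefinitionFile ".\AzureSqlDbLinkedService.json"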

Using service principal authentication

To use service principal-based AAD application token authentication, follow these steps:

  1. Create an Azure Active Directory application from the Azure portal. Make note of the application name and the following values, which you use to define the linked service (see the PowerShell sketch after this list for one way to script this step):

    • Application ID
    • Application key
    • Tenant ID
  2. Provision an Azure Active Directory administrator for your Azure SQL server in the Azure portal, if you haven't already done so. The AAD administrator must be an AAD user or an AAD group, but cannot be a service principal. This step is done so that, in the subsequent step, you can use an AAD identity to create a contained database user for the service principal.

  3. Create a contained database user for the service principal. Connect to the database from or to which you want to copy data by using tools like SSMS, with an AAD identity that has at least ALTER ANY USER permission, and execute the following T-SQL. Learn more about contained database users here.

    CREATE USER [your application name] FROM EXTERNAL PROVIDER;
    
  4. Grant the service principal the needed permissions as you normally do for SQL users, for example by executing the following:

    EXEC sp_addrolemember [role name], [your application name];
    
  5. In ADF, configure an Azure SQL Database linked service.
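
If you prefer to script step 1, the following Azure AD PowerShell sketch creates the application, its service principal, and an application key, and retrieves the tenant ID. The display name is a hypothetical example; the values it prints map to the servicePrincipalId, servicePrincipalKey, and tenant properties of the linked service.

# Sketch of step 1 using the Azure AD PowerShell module; the display name is hypothetical.
Connect-AzureAD

# Create the AAD application and a service principal for it.
$app = New-AzureADApplication -DisplayName "ADFCopySqlDbApp"
New-AzureADServicePrincipal -AppId $app.AppId

# Create an application key; its Value is the servicePrincipalKey.
$key = New-AzureADApplicationPasswordCredential -ObjectId $app.ObjectId

$app.AppId                           # servicePrincipalId
$key.Value                           # servicePrincipalKey
(Get-AzureADTenantDetail).ObjectId   # tenant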

Linked service example using service principal authentication:

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;Connection Timeout=30"
            },
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Using managed service identity authentication

A data factory can be associated with a managed service identity (MSI), which represents this specific data factory. You can use this service identity for Azure SQL Database authentication, which allows this designated factory to access and copy data from/to your database.

To use MSI-based AAD application token authentication, follow these steps:

  1. Create a group in Azure AD and make the factory MSI a member of the group.

    a. Find the data factory service identity from Azure portal. Go to your data factory -> Properties -> copy the SERVICE IDENTITY ID.

    b. Install the Azure AD PowerShell module, sign in using Connect-AzureAD command, and run the following commands to create a group and add the data factory MSI as a member.

    $Group = New-AzureADGroup -DisplayName "<your group name>" -MailEnabled $false -SecurityEnabled $true -MailNickName "NotSet"
    Add-AzureAdGroupMember -ObjectId $Group.ObjectId -RefObjectId "<your data factory service identity ID>"
    
  2. Provision an Azure Active Directory administrator for your Azure SQL server in the Azure portal, if you haven't already done so. The AAD administrator can be an AAD user or an AAD group. If you grant the group that contains the MSI an admin role, skip steps 3 and 4 below, as the administrator has full access to the database.

  3. Create a contained database user for the AAD group. Connect to the database from or to which you want to copy data by using tools like SSMS, with an AAD identity that has at least ALTER ANY USER permission, and execute the following T-SQL. Learn more about contained database users here.

    CREATE USER [your AAD group name] FROM EXTERNAL PROVIDER;
    
  4. Grant the AAD group the needed permissions as you normally do for SQL users, for example by executing the following:

    EXEC sp_addrolemember [role name], [your AAD group name];
    
  5. In ADF, configure an Azure SQL Database linked service.

Linked service example using MSI authentication:

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;Connection Timeout=30"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the datasets article. This section provides a list of properties supported by Azure SQL Database dataset.

To copy data from/to Azure SQL Database, set the type property of the dataset to AzureSqlTable. The following properties are supported:

Property | Description | Required
type | The type property of the dataset must be set to: AzureSqlTable | Yes
tableName | Name of the table or view in the Azure SQL Database instance that the linked service refers to. | Yes

Example:

{
    "name": "AzureSQLDbDataset",
    "properties":
    {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "<Azure SQL Database linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "MyTable"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by Azure SQL Database source and sink.

Azure SQL Database as source

To copy data from Azure SQL Database, set the source type in the copy activity to SqlSource. The following properties are supported in the copy activity source section:

Property | Description | Required
type | The type property of the copy activity source must be set to: SqlSource | Yes
sqlReaderQuery | Use the custom SQL query to read data. Example: select * from MyTable. | No
sqlReaderStoredProcedureName | Name of the stored procedure that reads data from the source table. The last SQL statement must be a SELECT statement in the stored procedure. | No
storedProcedureParameters | Parameters for the stored procedure. Allowed values are: name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. | No

Points to note:

  • If the sqlReaderQuery is specified for the SqlSource, the Copy Activity runs this query against the Azure SQL Database source to get the data. Alternatively, you can specify a stored procedure by specifying the sqlReaderStoredProcedureName and storedProcedureParameters (if the stored procedure takes parameters).
  • If you do not specify either "sqlReaderQuery" or "sqlReaderStoredProcedureName", the columns defined in the "structure" section of the dataset JSON are used to construct a query (select column1, column2 from mytable) to run against the Azure SQL Database. If the dataset definition does not have the "structure", all columns are selected from the table.
  • When you use sqlReaderStoredProcedureName, you still need to specify a dummy tableName property in the dataset JSON.

Example: using SQL query

"activities":[
    {
        "name": "CopyFromAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure SQL Database input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                "sqlReaderQuery": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Example: using stored procedure

"activities":[
    {
        "name": "CopyFromAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure SQL Database input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                "sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
                "storedProcedureParameters": {
                    "stringData": { "value": "str3" },
                    "identifier": { "value": "$$Text.Format('{0:yyyy}', <datetime parameter>)", "type": "Int"}
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

The stored procedure definition:

CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
    @stringData varchar(20),
    @identifier int
)
AS
SET NOCOUNT ON;
BEGIN
     select *
     from dbo.UnitTestSrcTable
     where dbo.UnitTestSrcTable.stringData != @stringData
     and dbo.UnitTestSrcTable.identifier != @identifier
END
GO

Azure SQL Database as sink

To copy data to Azure SQL Database, set the sink type in the copy activity to SqlSink. The following properties are supported in the copy activity sink section:

Property | Description | Required
type | The type property of the copy activity sink must be set to: SqlSink | Yes
writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. Allowed values are: integer (number of rows). | No (default is 10000)
writeBatchTimeout | Wait time for the batch insert operation to complete before it times out. Allowed values are: timespan. Example: "00:30:00" (30 minutes). | No
preCopyScript | Specify a SQL query for the copy activity to execute before writing data into Azure SQL Database. It is invoked only once per copy run. You can use this property to clean up preloaded data. | No
sqlWriterStoredProcedureName | Name of the stored procedure that defines how to apply source data to the target table, for example to do upserts or to transform data with your own business logic. Note that this stored procedure is invoked per batch. For an operation that runs only once and has nothing to do with the source data, for example delete or truncate, use the preCopyScript property instead. | No
storedProcedureParameters | Parameters for the stored procedure. Allowed values are: name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. | No
sqlWriterTableType | Specify a table type name to be used in the stored procedure. The copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data being copied with existing data. | No

Tip

When copying data to Azure SQL Database, the copy activity appends data to the sink table by default. To perform an UPSERT or additional business logic, use the stored procedure in SqlSink. Learn more details from Invoking stored procedure for SQL Sink.

Example 1: appending data

"activities":[
    {
        "name": "CopyToAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure SQL Database output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "SqlSink",
                "writeBatchSize": 100000
            }
        }
    }
]

Example 2: invoking a stored procedure during copy for upsert

Learn more details from Invoking stored procedure for SQL Sink.

"activities":[
    {
        "name": "CopyToAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure SQL Database output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "SqlSink",
                "sqlWriterStoredProcedureName": "CopyTestStoredProcedureWithParameters",
                "sqlWriterTableType": "CopyTestTableType",
                "storedProcedureParameters": {
                    "identifier": { "value": "1", "type": "Int" },
                    "stringData": { "value": "str1" }
                }
            }
        }
    }
]

Identity columns in the target database

This section provides an example for copying data from a source table without an identity column to a destination table with an identity column.

Source table:

create table dbo.SourceTbl
(
       name varchar(100),
       age int
)

Destination table:

create table dbo.TargetTbl
(
       identifier int identity(1,1),
       name varchar(100),
       age int
)

Notice that the target table has an identity column.

Source dataset JSON definition

{
    "name": "SampleSource",
    "properties": {
        "type": " AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "TestIdentitySQL",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "SourceTbl"
        }
    }
}

Destination dataset JSON definition

{
    "name": "SampleTarget",
    "properties": {
        "structure": [
            { "name": "name" },
            { "name": "age" }
        ],
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "TestIdentitySQL",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "TargetTbl"
        }
    }
}

Notice that the source and target tables have different schemas: the target table has an additional identity column. In this scenario, you need to specify the structure property in the target dataset definition, which doesn't include the identity column.

Invoke stored procedure from SQL sink

When copying data into Azure SQL Database, a user-specified stored procedure can be configured and invoked with additional parameters.

A stored procedure can be used when built-in copy mechanisms don't serve the purpose. It is typically used when an upsert (insert plus update) or extra processing (merging columns, looking up additional values, inserting into multiple tables, and so on) must be done before the final insertion of source data into the destination table.

The following sample shows how to use a stored procedure to do an upsert into a table in Azure SQL Database. Assume that the input data and the sink Marketing table each have three columns: ProfileID, State, and Category. The upsert is performed based on the ProfileID column, and it applies only to a specific category.

Output dataset

{
    "name": "AzureSQLDbDataset",
    "properties":
    {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "<Azure SQL Database linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "Marketing"
        }
    }
}

Define the "SqlSink" section in copy activity as follows.

"sink": {
    "type": "SqlSink",
    "SqlWriterTableType": "MarketingType",
    "SqlWriterStoredProcedureName": "spOverwriteMarketing",
    "storedProcedureParameters": {
        "category": {
            "value": "ProductA"
        }
    }
}

In your database, define the stored procedure with the same name as sqlWriterStoredProcedureName. It handles input data from your specified source and merges it into the output table. Notice that the parameter name of the table type in the stored procedure should be the same as the tableName defined in the dataset.

CREATE PROCEDURE spOverwriteMarketing @Marketing [dbo].[MarketingType] READONLY, @category varchar(256)
AS
BEGIN
  MERGE [dbo].[Marketing] AS target
  USING @Marketing AS source
  ON (target.ProfileID = source.ProfileID and target.Category = @category)
  WHEN MATCHED THEN
      UPDATE SET State = source.State
  WHEN NOT MATCHED THEN
      INSERT (ProfileID, State, Category)
      VALUES (source.ProfileID, source.State, source.Category)
END

In your database, define the table type with the same name as sqlWriterTableType. Notice that the schema of the table type should be the same as the schema returned by your input data.

CREATE TYPE [dbo].[MarketingType] AS TABLE(
    [ProfileID] [varchar](256) NOT NULL,
    [State] [varchar](256) NOT NULL,
    [Category] [varchar](256) NOT NULL
)

The stored procedure feature takes advantage of Table-Valued Parameters.
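
As an illustration only, the following hypothetical T-SQL snippet (run manually against the database, not by the copy activity) shows how the table type and stored procedure above work together: a table variable of type MarketingType stands in for the data that the copy activity stages and passes as the table-valued parameter.

-- Hypothetical manual test of the upsert logic above.
DECLARE @staged dbo.MarketingType;

INSERT INTO @staged (ProfileID, State, Category)
VALUES ('profile1', 'WA', 'ProductA');

-- The table-valued parameter plays the role of the staged copy data.
EXEC dbo.spOverwriteMarketing @Marketing = @staged, @category = 'ProductA';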

Data type mapping for Azure SQL Database

When copying data from/to Azure SQL Database, the following mappings are used from Azure SQL Database data types to Azure Data Factory interim data types. See Schema and data type mappings to learn about how copy activity maps the source schema and data type to the sink.

Azure SQL Database data type | Data Factory interim data type
bigint | Int64
binary | Byte[]
bit | Boolean
char | String, Char[]
date | DateTime
datetime | DateTime
datetime2 | DateTime
datetimeoffset | DateTimeOffset
decimal | Decimal
FILESTREAM attribute (varbinary(max)) | Byte[]
float | Double
image | Byte[]
int | Int32
money | Decimal
nchar | String, Char[]
ntext | String, Char[]
numeric | Decimal
nvarchar | String, Char[]
real | Single
rowversion | Byte[]
smalldatetime | DateTime
smallint | Int16
smallmoney | Decimal
sql_variant | Object
text | String, Char[]
time | TimeSpan
timestamp | Byte[]
tinyint | Byte
uniqueidentifier | Guid
varbinary | Byte[]
varchar | String, Char[]
xml | Xml

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.