Copy data to or from Azure SQL Database by using Azure Data Factory

This article outlines how to copy data to and from Azure SQL Database. To learn about Azure Data Factory, read the introductory article.

Supported capabilities

This Azure SQL Database connector is supported for the following activities:

Specifically, this Azure SQL Database connector supports these functions:

  • Copy data by using SQL authentication and Azure Active Directory (Azure AD) Application token authentication with a service principal or managed identities for Azure resources.
  • As a source, retrieve data by using a SQL query or stored procedure.
  • As a sink, append data to a destination table or invoke a stored procedure with custom logic during the copy.

Azure SQL Database Always Encrypted is not supported now.

Important

If you copy data by using Azure Data Factory Integration Runtime, configure an Azure SQL server firewall so that Azure Services can access the server. If you copy data by using a self-hosted integration runtime, configure the Azure SQL server firewall to allow the appropriate IP range. This range includes the machine's IP that is used to connect to Azure SQL Database.

Get started

You can use one of the following tools or SDKs to use the copy activity with a pipeline. Select a link for step-by-step instructions:

The following sections provide details about properties that are used to define Data Factory entities specific to an Azure SQL Database connector.

Linked service properties

These properties are supported for an Azure SQL Database linked service:

Property Description Required
type The type property must be set to AzureSqlDatabase. Yes
connectionString Specify information needed to connect to the Azure SQL Database instance for the connectionString property.
Mark this field as a SecureString to store it securely in Data Factory. You can also put password/service principal key in Azure Key Vault,and if it's SQL authentication pull the password configuration out of the connection string. See the JSON example below the table and Store credentials in Azure Key Vault article with more details.
Yes
servicePrincipalId Specify the application's client ID. Yes, when you use Azure AD authentication with a service principal.
servicePrincipalKey Specify the application's key. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. Yes, when you use Azure AD authentication with a service principal.
tenant Specify the tenant information (domain name or tenant ID) under which your application resides. Retrieve it by hovering the mouse in the top-right corner of the Azure portal. Yes, when you use Azure AD authentication with a service principal.
connectVia The integration runtime to be used to connect to the data store. You can use Azure Integration Runtime or a self-hosted integration runtime if your data store is located in a private network. If not specified, it uses the default Azure Integration Runtime. No

For different authentication types, refer to the following sections on prerequisites and JSON samples, respectively:

Tip

If you hit error with error code as "UserErrorFailedToConnectToSqlServer" and message like "The session limit for the database is XXX and has been reached.", add Pooling=false to your connection string and try again.

SQL authentication

Linked service example that uses SQL authentication

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Password in Azure Key Vault:

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
            },
            "password": { 
                "type": "AzureKeyVaultSecret", 
                "store": { 
                    "referenceName": "<Azure Key Vault linked service name>", 
                    "type": "LinkedServiceReference" 
                }, 
                "secretName": "<secretName>" 
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Service principal authentication

To use a service principal-based Azure AD application token authentication, follow these steps:

  1. Create an Azure Active Directory application from the Azure portal. Make note of the application name and the following values that define the linked service:

    • Application ID
    • Application key
    • Tenant ID
  2. Provision an Azure Active Directory administrator for your Azure SQL server on the Azure portal if you haven't already done so. The Azure AD administrator must be an Azure AD user or Azure AD group, but it can't be a service principal. This step is done so that, in the next step, you can use an Azure AD identity to create a contained database user for the service principal.

  3. Create contained database users for the service principal. Connect to the database from or to which you want to copy data by using tools like SSMS, with an Azure AD identity that has at least ALTER ANY USER permission. Run the following T-SQL:

    CREATE USER [your application name] FROM EXTERNAL PROVIDER;
    
  4. Grant the service principal needed permissions as you normally do for SQL users or others. Run the following code, or refer to more options here.

    EXEC sp_addrolemember [role name], [your application name];
    
  5. Configure an Azure SQL Database linked service in Azure Data Factory.

Linked service example that uses service principal authentication

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;Connection Timeout=30"
            },
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Managed identities for Azure resources authentication

A data factory can be associated with a managed identity for Azure resources that represents the specific data factory. You can use this managed identity for Azure SQL Database authentication. The designated factory can access and copy data from or to your database by using this identity.

To use managed identity authentication, follow these steps:

  1. Provision an Azure Active Directory administrator for your Azure SQL server on the Azure portal if you haven't already done so. The Azure AD administrator can be an Azure AD user or Azure AD group. If you grant the group with managed identity an admin role, skip steps 3 and 4. The administrator will have full access to the database.

  2. Create contained database users for the Data Factory Managed Identity. Connect to the database from or to which you want to copy data by using tools like SSMS, with an Azure AD identity that has at least ALTER ANY USER permission. Run the following T-SQL:

    CREATE USER [your Data Factory name] FROM EXTERNAL PROVIDER;
    
  3. Grant the Data Factory Managed Identity needed permissions as you normally do for SQL users and others. Run the following code, or refer to more options here.

    EXEC sp_addrolemember [role name], [your Data Factory name];
    
  4. Configure an Azure SQL Database linked service in Azure Data Factory.

Example:

{
    "name": "AzureSqlDbLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;Connection Timeout=30"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Azure SQL Database dataset.

To copy data from or to Azure SQL Database, the following properties are supported:

Property Description Required
type The type property of the dataset must be set to AzureSqlTable. Yes
tableName The name of the table or view in the Azure SQL Database instance that the linked service refers to. No for source, Yes for sink

Dataset properties example

{
    "name": "AzureSQLDbDataset",
    "properties":
    {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "<Azure SQL Database linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, retrievable during authoring > ],
        "typeProperties": {
            "tableName": "MyTable"
        }
    }
}

Copy Activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Azure SQL Database source and sink.

Azure SQL Database as the source

To copy data from Azure SQL Database, set the type property in the Copy Activity source to SqlSource. The following properties are supported in the Copy Activity source section:

Property Description Required
type The type property of the Copy Activity source must be set to SqlSource. Yes
sqlReaderQuery Use the custom SQL query to read data. Example: select * from MyTable. No
sqlReaderStoredProcedureName The name of the stored procedure that reads data from the source table. The last SQL statement must be a SELECT statement in the stored procedure. No
storedProcedureParameters Parameters for the stored procedure.
Allowed values are name or value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters.
No

Points to note

  • If the sqlReaderQuery is specified for the SqlSource, Copy Activity runs this query against the Azure SQL Database source to get the data. Or you can specify a stored procedure. Specify sqlReaderStoredProcedureName and storedProcedureParameters if the stored procedure takes parameters.
  • If you don't specify either sqlReaderQuery or sqlReaderStoredProcedureName, the columns defined in the structure section of the dataset JSON are used to construct a query. select column1, column2 from mytable runs against Azure SQL Database. If the dataset definition doesn't have the structure, all columns are selected from the table.

SQL query example

"activities":[
    {
        "name": "CopyFromAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure SQL Database input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                "sqlReaderQuery": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Stored procedure example

"activities":[
    {
        "name": "CopyFromAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Azure SQL Database input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SqlSource",
                "sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
                "storedProcedureParameters": {
                    "stringData": { "value": "str3" },
                    "identifier": { "value": "$$Text.Format('{0:yyyy}', <datetime parameter>)", "type": "Int"}
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Stored procedure definition

CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
    @stringData varchar(20),
    @identifier int
)
AS
SET NOCOUNT ON;
BEGIN
     select *
     from dbo.UnitTestSrcTable
     where dbo.UnitTestSrcTable.stringData != stringData
    and dbo.UnitTestSrcTable.identifier != identifier
END
GO

Azure SQL Database as the sink

Tip

Learn more on the supported write behaviors, configurations and best practice from Best practice for loading data into Azure SQL Database.

To copy data to Azure SQL Database, set the type property in the Copy Activity sink to SqlSink. The following properties are supported in the Copy Activity sink section:

Property Description Required
type The type property of the Copy Activity sink must be set to SqlSink. Yes
writeBatchSize Number of rows to inserts into the SQL table per batch.
The allowed value is integer (number of rows). By default, Data Factory dynamically determine the appropriate batch size based on the row size.
No
writeBatchTimeout The wait time for the batch insert operation to finish before it times out.
The allowed value is timespan. Example: “00:30:00” (30 minutes).
No
preCopyScript Specify a SQL query for Copy Activity to run before writing data into Azure SQL Database. It's only invoked once per copy run. Use this property to clean up the preloaded data. No
sqlWriterStoredProcedureName The name of the stored procedure that defines how to apply source data into a target table.
This stored procedure is invoked per batch. For operations that only run once and have nothing to do with source data, for example, delete or truncate, use the preCopyScript property.
No
storedProcedureParameters Parameters for the stored procedure.
Allowed values are name and value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters.
No
sqlWriterTableType Specify a table type name to be used in the stored procedure. Copy Activity makes the data being moved available in a temporary table with this table type. Stored procedure code can then merge the data being copied with existing data. No

Example 1: append data

"activities":[
    {
        "name": "CopyToAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure SQL Database output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "SqlSink",
                "writeBatchSize": 100000
            }
        }
    }
]

Example 2: invoke a stored procedure during copy

Learn more details from Invoking stored procedure from SQL Sink.

"activities":[
    {
        "name": "CopyToAzureSQLDatabase",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<Azure SQL Database output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "<source type>"
            },
            "sink": {
                "type": "SqlSink",
                "sqlWriterStoredProcedureName": "CopyTestStoredProcedureWithParameters",
                "sqlWriterTableType": "CopyTestTableType",
                "storedProcedureParameters": {
                    "identifier": { "value": "1", "type": "Int" },
                    "stringData": { "value": "str1" }
                }
            }
        }
    }
]

Best practice for loading data into Azure SQL Database

When you copy data into Azure SQL Database, you may require different write behavior:

  • Append: my source data only has new records;
  • Upsert: my source data has both inserts and updates;
  • Overwrite: I want to reload entire dimension table each time;
  • Write with custom logic: I need extra processing before the final insertion into the destination table.

Refer to the respectively sections on how to configure in ADF and the best practices.

Append data

This is the default behavior of this Azure SQL Database sink connector, and ADF do bulk insert to write to your table efficiently. You can simply configure the source and sink accordingly in Copy activity.

Upsert data

Option I (suggested especially when you have large data to copy): the most performant approach to do upsert is the following:

  • Firstly, leverage a database scoped temporary table to bulk load all records using Copy activity. As operations against database scoped temporary tables are not logged, you can load millions of records in seconds.
  • Execute a Stored Procedure activity in ADF to apply a MERGE (or INSERT/UPDATE) statement, and use the temp table as source to perform all updates or inserts as a single transaction, reducing the amount of roundtrips and log operations. At the end of the Stored Procedure activity , temp table can be truncated to be ready for the next upsert cycle.

As an example, in Azure Data Factory, you can create a pipeline with a Copy activity chained with a Stored Procedure activity on success. The former copies data from your source store into an Azure SQL Database temporary table, say "##UpsertTempTable" as table name in dataset, then the latter invokes a Stored Procedure to merge source data from the temp table into target table, and clean up temp table.

Upsert

In your database, define a Stored Procedure with MERGE logic, like the following, which is pointed to from the above Stored Procedure activity. Assuming target Marketing table with three columns: ProfileID, State, and Category, and do the upsert based on the ProfileID column.

CREATE PROCEDURE [dbo].[spMergeData]
AS
BEGIN
	MERGE TargetTable AS target
	USING ##UpsertTempTable AS source
	ON (target.[ProfileID] = source.[ProfileID])
	WHEN MATCHED THEN
		UPDATE SET State = source.State
    WHEN NOT matched THEN
    	INSERT ([ProfileID], [State], [Category])
      VALUES (source.ProfileID, source.State, source.Category);
    
    TRUNCATE TABLE ##UpsertTempTable
END

Option II: alternatively, you can choose to Invoke stored procedure within Copy activity, while note this approach is executed for each row in the source table instead of leveraging bulk insert as the default approach in Copy activity, thus it doesn't fit for large scale upsert.

Overwrite entire table

You can configure preCopyScript property in Copy activity sink, in which case for each Copy activity run, ADF executes the script first, then run the copy to insert the data. For example, to overwrite the entire table with the latest data, you can specify a script to first delete all records before bulk-loading the new data from the source.

Write data with custom logic

Similar as described in Upsert data section, when you need to apply extra processing before the final insertion of source data into the destination table, you can a) for large scale, load to a database scoped temporary table then invoke a stored procedure, or b) invoking a stored procedure during copy.

Invoke stored procedure from SQL sink

When you copy data into Azure SQL Database, you can also configure and invoke a user-specified stored procedure with additional parameters.

Tip

Invoking stored procedure processes the data row-by-row instead of bulk operation, which is not suggested for large scale copy. Learn more from Best practice for loading data into Azure SQL Database.

You can use a stored procedure when built-in copy mechanisms don't serve the purpose, e.g. apply extra processing before the final insertion of source data into the destination table. Some extra processing examples are merge columns, look up additional values, and insertion into more than one table.

The following sample shows how to use a stored procedure to do an upsert into a table in Azure SQL Database. Assume that input data and the sink Marketing table each have three columns: ProfileID, State, and Category. Do the upsert based on the ProfileID column, and only apply it for a specific category.

Output dataset: the "tableName" should be the same table type parameter name in your stored procedure (see below stored procedure script).

{
    "name": "AzureSQLDbDataset",
    "properties":
    {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "<Azure SQL Database linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "tableName": "Marketing"
        }
    }
}

Define the SQL sink section in copy activity as follows.

"sink": {
    "type": "SqlSink",
    "SqlWriterTableType": "MarketingType",
    "SqlWriterStoredProcedureName": "spOverwriteMarketing",
    "storedProcedureParameters": {
        "category": {
            "value": "ProductA"
        }
    }
}

In your database, define the stored procedure with the same name as the SqlWriterStoredProcedureName. It handles input data from your specified source and merges into the output table. The parameter name of the table type in the stored procedure should be the same as the tableName defined in the dataset.

CREATE PROCEDURE spOverwriteMarketing @Marketing [dbo].[MarketingType] READONLY, @category varchar(256)
AS
BEGIN
  MERGE [dbo].[Marketing] AS target
  USING @Marketing AS source
  ON (target.ProfileID = source.ProfileID and target.Category = @category)
  WHEN MATCHED THEN
      UPDATE SET State = source.State
  WHEN NOT MATCHED THEN
      INSERT (ProfileID, State, Category)
      VALUES (source.ProfileID, source.State, source.Category);
END

In your database, define the table type with the same name as the sqlWriterTableType. The schema of the table type should be same as the schema returned by your input data.

CREATE TYPE [dbo].[MarketingType] AS TABLE(
    [ProfileID] [varchar](256) NOT NULL,
    [State] [varchar](256) NOT NULL,
    [Category] [varchar](256) NOT NULL
)

The stored procedure feature takes advantage of Table-Valued Parameters.

Mapping Data Flow properties

Learn details from source transformation and sink transformation in Mapping Data Flow.

Data type mapping for Azure SQL Database

When you copy data from or to Azure SQL Database, the following mappings are used from Azure SQL Database data types to Azure Data Factory interim data types. See Schema and data type mappings to learn how Copy Activity maps the source schema and data type to the sink.

Azure SQL Database data type Data Factory interim data type
bigint Int64
binary Byte[]
bit Boolean
char String, Char[]
date DateTime
Datetime DateTime
datetime2 DateTime
Datetimeoffset DateTimeOffset
Decimal Decimal
FILESTREAM attribute (varbinary(max)) Byte[]
Float Double
image Byte[]
int Int32
money Decimal
nchar String, Char[]
ntext String, Char[]
numeric Decimal
nvarchar String, Char[]
real Single
rowversion Byte[]
smalldatetime DateTime
smallint Int16
smallmoney Decimal
sql_variant Object
text String, Char[]
time TimeSpan
timestamp Byte[]
tinyint Byte
uniqueidentifier Guid
varbinary Byte[]
varchar String, Char[]
xml Xml

Note

For data types maps to Decimal interim type, currently ADF support precision up to 28. If you have data with precision larger than 28, consider to convert to string in SQL query.

Next steps

For a list of data stores supported as sources and sinks by Copy Activity in Azure Data Factory, see Supported data stores and formats.