Securely load data using Synapse SQL

This article highlights and provides examples on the secure authentication mechanisms for the COPY statement. The COPY statement is the most flexible and secure way of bulk loading data in Synapse SQL.

Supported authentication mechanisms

The following matrix describes the supported authentication methods for each file type and storage account. This applies to the source storage location and the error file location.

CSV Parquet ORC
Azure blob storage SAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS/KEY SAS/KEY
Azure Data Lake Gen2 SAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS (blob1)/MSI (dfs2)/SERVICE PRINCIPAL/KEY/AAD SAS (blob1)/MSI (dfs2)/SERVICE PRINCIPAL/KEY/AAD

1: The .blob endpoint (.blob.core.windows.net) in your external location path is required for this authentication method.

2: The .dfs endpoint (.dfs.core.windows.net) in your external location path is required for this authentication method.

A. Storage account key with LF as the row terminator (Unix-style new line)

--Note when specifying the column list, input field numbers start from 1
COPY INTO target_table (Col_one default 'myStringDefault' 1, Col_two default 1 3)
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/'
WITH (
	FILE_TYPE = 'CSV'
	,CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_Account_Key>')
	--CREDENTIAL should look something like this:
	--CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='x6RWv4It5F2msnjelv3H4DA80n0QW0daPdw43jM0nyetx4c6CpDkdj3986DX5AHFMIf/YN4y6kkCnU8lb+Wx0Pj+6MDw=='),
	,ROWTERMINATOR='0x0A' --0x0A specifies to use the Line Feed character (Unix based systems)
)

Important

  • Use the hexadecimal value (0x0A) to specify the Line Feed/Newline character. Note the COPY statement will interpret the '\n' string as '\r\n' (carriage return newline).

B. Shared Access Signatures (SAS) with CRLF as the row terminator (Windows style new line)

COPY INTO target_table
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    ,CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
	--CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='?sv=2018-03-28&ss=bfqt&srt=sco&sp=rl&st=2016-10-17T20%3A14%3A55Z&se=2021-10-18T20%3A19%3A00Z&sig=IEoOdmeYnE9%2FKiJDSFSYsz4AkNa%2F%2BTx61FuQ%2FfKHefqoBE%3D'),
    ,ROWTERMINATOR='\n'-- COPY command automatically prefixes the \r character when \n (newline) is specified. This results in carriage return newline (\r\n) for Windows based systems.
)

Important

  • Do not specify the ROWTERMINATOR as '\r\n' which will be interpreted as '\r\r\n' and can result in parsing issues

C. Managed Identity

Managed Identity authentication is required when your storage account is attached to a VNet.

Prerequisites

  1. Install Azure PowerShell using this guide.
  2. If you have a general-purpose v1 or blob storage account, you must first upgrade to general-purpose v2 using this guide.
  3. You must have Allow trusted Microsoft services to access this storage account turned on under Azure Storage account Firewalls and Virtual networks settings menu. Refer to this guide for more information.

Steps

  1. If you have a standalone dedicated SQL pool, register your SQL server with Azure Active Directory (AAD) using PowerShell:

    Connect-AzAccount
    Select-AzSubscription -SubscriptionId <subscriptionId>
    Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-SQL-servername -AssignIdentity
    

    This step is not required for dedicated SQL pools within a Synapse workspace.

  2. If you have a Synapse workspace, register your workspace's system-managed identity:

    1. Go to your Synapse workspace in the Azure portal
    2. Go to the Managed identities blade
    3. Make sure the "Allow Pipelines" option is enabled

    Register workspace system msi

  3. Create a general-purpose v2 Storage Account using this guide.

    Note

    • If you have a general-purpose v1 or blob storage account, you must first upgrade to v2 using this guide.
    • For known issues with Azure Data Lake Storage Gen2, please refer to this guide.
  4. Under your storage account, select Access control (IAM).

  5. Select Add > Add role assignment to open the Add role assignment page.

  6. Assign the following role. For detailed steps, see Assign Azure roles using the Azure portal.

    Setting Value
    Role Storage Blob Data Contributor
    Assign access to SERVICEPRINCIPAL
    Members server or workspace hosting your dedicated SQL pool which you've registered with Azure Active Directory (AAD)

    Add role assignment page in Azure portal.

    Note

    Only members with Owner privilege can perform this step. For various Azure built-in roles, refer to this guide.

    Important

    Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    Granting Azure RBAC permission to load

  7. You can now run the COPY statement specifying "Managed Identity":

     COPY INTO dbo.target_table
     FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt'
     WITH (
         FILE_TYPE = 'CSV',
         CREDENTIAL = (IDENTITY = 'Managed Identity'),
     )
    

D. Azure Active Directory Authentication

Steps

  1. Under your storage account, select Access control (IAM).

  2. Select Add > Add role assignment to open the Add role assignment page.

  3. Assign the following role. For detailed steps, see Assign Azure roles using the Azure portal.

    Setting Value
    Role Storage Blob Data Owner, Contributor, or Reader
    Assign access to USER
    Members Azure AD user

    Add role assignment page in Azure portal.

    Important

    Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    Granting Azure RBAC permission to load

  4. Configure Azure AD authentication by going through the following documentation.

  5. Connect to your SQL pool using Active Directory where you can now run the COPY statement without specifying any credentials:

     COPY INTO dbo.target_table
     FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt'
     WITH (
         FILE_TYPE = 'CSV'
     )
    

E. Service Principal Authentication

Steps

  1. Create an Azure Active Directory application

  2. Get application ID

  3. Get the authentication key

  4. Get the V1 OAuth 2.0 token endpoint

  5. Assign read, write, and execution permissions to your Azure AD application on your storage account

  6. You can now run the COPY statement:

     COPY INTO dbo.target_table
     FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt'
     WITH (
         FILE_TYPE = 'CSV'
         ,CREDENTIAL=(IDENTITY= '<application_ID>@<OAuth_2.0_Token_EndPoint>' , SECRET= '<authentication_key>')
         --CREDENTIAL should look something like this:
     	--,CREDENTIAL=(IDENTITY= '92761aac-12a9-4ec3-89b8-7149aef4c35b@https://login.microsoftonline.com/72f714bf-86f1-41af-91ab-2d7cd011db47/oauth2/token', SECRET='juXi12sZ6gse]woKQNgqwSywYv]7A.M')
     )
    

Important

  • Use the V1 version of the OAuth 2.0 token endpoint

Next steps