Securely load data using Synapse SQL

This article highlights and provides examples on the secure authentication mechanisms for the COPY statement. The COPY statement is the most flexible and secure way of bulk loading data in Synapse SQL.

Supported authentication mechanisms

The following matrix describes the supported authentication methods for each file type and storage account. This applies to the source storage location and the error file location.

CSV Parquet ORC
Azure blob storage SAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS/KEY SAS/KEY
Azure Data Lake Gen2 SAS/MSI/SERVICE PRINCIPAL/KEY/AAD SAS (blob endpoint)/MSI (dfs endpoint)/SERVICE PRINCIPAL/KEY/AAD SAS (blob endpoint)/MSI (dfs endpoint)/SERVICE PRINCIPAL/KEY/AAD

A. Storage account key with LF as the row terminator (Unix-style new line)

--Note when specifying the column list, input field numbers start from 1
COPY INTO target_table (Col_one default 'myStringDefault' 1, Col_two default 1 3)
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/'
WITH (
	FILE_TYPE = 'CSV'
	,CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='<Your_Account_Key>')
	--CREDENTIAL should look something like this:
	--CREDENTIAL=(IDENTITY= 'Storage Account Key', SECRET='x6RWv4It5F2msnjelv3H4DA80n0QW0daPdw43jM0nyetx4c6CpDkdj3986DX5AHFMIf/YN4y6kkCnU8lb+Wx0Pj+6MDw=='),
	,ROWTERMINATOR='0x0A' --0x0A specifies to use the Line Feed character (Unix based systems)
)

Important

  • Use the hexadecimal value (0x0A) to specify the Line Feed/Newline character. Note the COPY statement will interpret the '\n' string as '\r\n' (carriage return newline).

B. Shared Access Signatures (SAS) with CRLF as the row terminator (Windows style new line)

COPY INTO target_table
FROM 'https://adlsgen2account.dfs.core.windows.net/myblobcontainer/folder1/'
WITH (
    FILE_TYPE = 'CSV'
    ,CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='<Your_SAS_Token>')
	--CREDENTIAL should look something like this:
    --CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='?sv=2018-03-28&ss=bfqt&srt=sco&sp=rl&st=2016-10-17T20%3A14%3A55Z&se=2021-10-18T20%3A19%3A00Z&sig=IEoOdmeYnE9%2FKiJDSFSYsz4AkNa%2F%2BTx61FuQ%2FfKHefqoBE%3D'),
    ,ROWTERMINATOR='\n'-- COPY command automatically prefixes the \r character when \n (newline) is specified. This results in carriage return newline (\r\n) for Windows based systems.
)

Important

  • Do not specify the ROWTERMINATOR as '\r\n' which will be interpreted as '\r\r\n' and can result in parsing issues

C. Managed Identity

Managed Identity authentication is required when your storage account is attached to a VNet.

Prerequisites

  1. Install Azure PowerShell using this guide.
  2. If you have a general-purpose v1 or blob storage account, you must first upgrade to general-purpose v2 using this guide.
  3. You must have Allow trusted Microsoft services to access this storage account turned on under Azure Storage account Firewalls and Virtual networks settings menu. Refer to this guide for more information.

Steps

  1. In PowerShell, register your SQL server with Azure Active Directory (AAD):

    Connect-AzAccount
    Select-AzSubscription -SubscriptionId your-subscriptionId
    Set-AzSqlServer -ResourceGroupName your-database-server-resourceGroup -ServerName your-database-servername -AssignIdentity
    
  2. Create a general-purpose v2 Storage Account using this guide.

    Note

    If you have a general-purpose v1 or blob storage account, you must first upgrade to v2 using this guide.

  3. Under your storage account, navigate to Access Control (IAM), and select Add role assignment. Assign Storage Blob Data Owner, Contributor, or Reader Azure role to your SQL server.

    Note

    Only members with Owner privilege can perform this step. For various Azure built-in roles, refer to this guide.

    Important

    Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    Granting RBAC permission to load

  4. You can now run the COPY statement specifying "Managed Identity":

     COPY INTO dbo.target_table
     FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt'
     WITH (
         FILE_TYPE = 'CSV',
         CREDENTIAL = (IDENTITY = 'Managed Identity'),
     )
    

D. Azure Active Directory Authentication (AAD)

Steps

  1. Under your storage account, navigate to Access Control (IAM), and select Add role assignment. Assign Storage Blob Data Owner, Contributor, or Reader Azure role to your AAD user.

    Important

    Specify the Storage Blob Data Owner, Contributor, or Reader Azure role. These roles are different than the Azure built-in roles of Owner, Contributor, and Reader.

    Granting RBAC permission to load

  2. Configure Azure AD authentication by going through the following documentation.

  3. Connect to your SQL pool using Active Directory where you can now run the COPY statement without specifying any credentials:

     COPY INTO dbo.target_table
     FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder1/*.txt'
     WITH (
         FILE_TYPE = 'CSV'
     )
    

E. Service Principal Authentication

Steps

  1. Create an Azure Active Directory (AAD) application

  2. Get application ID

  3. Get the authentication key

  4. Get the V1 OAuth 2.0 token endpoint

  5. Assign read, write, and execution permissions to your AAD application on your storage account

  6. You can now run the COPY statement:

     COPY INTO dbo.target_table
     FROM 'https://myaccount.blob.core.windows.net/myblobcontainer/folder0/*.txt'
     WITH (
         FILE_TYPE = 'CSV'
         ,CREDENTIAL=(IDENTITY= '<application_ID>@<OAuth_2.0_Token_EndPoint>' , SECRET= '<authentication_key>')
         --CREDENTIAL should look something like this:
     	--,CREDENTIAL=(IDENTITY= '92761aac-12a9-4ec3-89b8-7149aef4c35b@https://login.microsoftonline.com/72f714bf-86f1-41af-91ab-2d7cd011db47/oauth2/token', SECRET='juXi12sZ6gse]woKQNgqwSywYv]7A.M')
     )
    

Important

  • Use the V1 version of the OAuth 2.0 token endpoint

Next steps