Load data into Azure Synapse Analytics by using Azure Data Factory

APPLIES TO: Azure Data Factory Azure Synapse Analytics

Azure Synapse Analytics is a cloud-based, scale-out database that's capable of processing massive volumes of data, both relational and non-relational. Azure Synapse Analytics is built on the massively parallel processing (MPP) architecture that's optimized for enterprise data warehouse workloads. It offers cloud elasticity with the flexibility to scale storage and compute independently.

Getting started with Azure Synapse Analytics is now easier than ever when you use Azure Data Factory. Azure Data Factory is a fully managed cloud-based data integration service. You can use the service to populate an Azure Synapse Analytics with data from your existing system and save time when building your analytics solutions.

Azure Data Factory offers the following benefits for loading data into Azure Synapse Analytics:

  • Easy to set up: An intuitive 5-step wizard with no scripting required.
  • Rich data store support: Built-in support for a rich set of on-premises and cloud-based data stores. For a detailed list, see the table of Supported data stores.
  • Secure and compliant: Data is transferred over HTTPS or ExpressRoute. The global service presence ensures that your data never leaves the geographical boundary.
  • Unparalleled performance by using PolyBase: Polybase is the most efficient way to move data into Azure Synapse Analytics. Use the staging blob feature to achieve high load speeds from all types of data stores, including Azure Blob storage and Data Lake Store. (Polybase supports Azure Blob storage and Azure Data Lake Store by default.) For details, see Copy activity performance.

This article shows you how to use the Data Factory Copy Data tool to load data from Azure SQL Database into Azure Synapse Analytics. You can follow similar steps to copy data from other types of data stores.

Prerequisites

  • Azure subscription: If you don't have an Azure subscription, create a free account before you begin.
  • Azure Synapse Analytics: The data warehouse holds the data that's copied over from the SQL database. If you don't have an Azure Synapse Analytics, see the instructions in Create an Azure Synapse Analytics.
  • Azure SQL Database: This tutorial copies data from the Adventure Works LT sample dataset in Azure SQL Database. You can create this sample database in SQL Database by following the instructions in Create a sample database in Azure SQL Database.
  • Azure storage account: Azure Storage is used as the staging blob in the bulk copy operation. If you don't have an Azure storage account, see the instructions in Create a storage account.

Create a data factory

  1. On the left menu, select Create a resource > Data + Analytics > Data Factory:

  2. On the New data factory page, provide values for following items:

    • Name: Enter LoadSQLDWDemo for name. The name for your data factory must be *globally unique. If you receive the error "Data factory name 'LoadSQLDWDemo' is not available", enter a different name for the data factory. For example, you could use the name yournameADFTutorialDataFactory. Try creating the data factory again. For the naming rules for Data Factory artifacts, see Data Factory naming rules.
    • Subscription: Select your Azure subscription in which to create the data factory.
    • Resource Group: Select an existing resource group from the drop-down list, or select the Create new option and enter the name of a resource group. To learn about resource groups, see Using resource groups to manage your Azure resources.
    • Version: Select V2.
    • Location: Select the location for the data factory. Only supported locations are displayed in the drop-down list. The data stores that are used by data factory can be in other locations and regions. These data stores include Azure Data Lake Store, Azure Storage, Azure SQL Database, and so on.
  3. Select Create.

  4. After creation is complete, go to your data factory. You see the Data Factory home page as shown in the following image:

    Data factory home page

    Select the Author & Monitor tile to launch the Data Integration Application in a separate tab.

Load data into Azure Synapse Analytics

  1. In the Get started page, select the Copy Data tile to launch the Copy Data tool.

  2. In the Properties page, specify CopyFromSQLToSQLDW for the Task name field, and select Next.

    Properties page

  3. In the Source data store page, complete the following steps:

    Tip

    In this tutorial, you use SQL authentication as the authentication type for your source data store, but you can choose other supported authentication methods:Service Principal and Managed Identity if needed. Refer to corresponding sections in this article for details. To store secrets for data stores securely, it's also recommended to use an Azure Key Vault. Refer to this article for detailed illustrations.

    a. click + Create new connection.

    b. Select Azure SQL Database from the gallery, and select Continue. You can type "SQL" in the search box to filter the connectors.

    Select Azure SQL DB

    c. In the New Linked Service page, select your server name and DB name from the dropdown list, and specify the username and password. Click Test connection to validate the settings, then select Create.

    Configure Azure SQL DB

    d. Select the newly created linked service as source, then click Next.

  4. In the Select tables from which to copy the data or use a custom query page, enter SalesLT to filter the tables. Choose the (Select all) box to use all of the tables for the copy, and then select Next.

    Select source tables

  5. In the Apply filter page, specify your settings or select Next.

  6. In the Destination data store page, complete the following steps:

    Tip

    In this tutorial, you use SQL authentication as the authentication type for your destination data store, but you can choose other supported authentication methods:Service Principal and Managed Identity if needed. Refer to corresponding sections in this article for details. To store secrets for data stores securely, it's also recommended to use an Azure Key Vault. Refer to this article for detailed illustrations.

    a. Click + Create new connection to add a connection

    b. Select Azure Synapse Analytics from the gallery, and select Continue.

    Select Azure Synapse Analytics

    c. In the New Linked Service page, select your server name and DB name from the dropdown list, and specify the username and password. Click Test connection to validate the settings, then select Create.

    Configure Azure Synapse Analytics

    d. Select the newly created linked service as sink, then click Next.

  7. In the Table mapping page, review the content, and select Next. An intelligent table mapping displays. The source tables are mapped to the destination tables based on the table names. If a source table doesn't exist in the destination, Azure Data Factory creates a destination table with the same name by default. You can also map a source table to an existing destination table.

    Table mapping page

  8. In the Column mapping page, review the content, and select Next. The intelligent table mapping is based on the column name. If you let Data Factory automatically create the tables, data type conversion can occur when there are incompatibilities between the source and destination stores. If there's an unsupported data type conversion between the source and destination column, you see an error message next to the corresponding table.

    Column mapping page

  9. In the Settings page, complete the following steps:

    a. In Staging settings section, click + New to new a staging storage. The storage is used for staging the data before it loads into Azure Synapse Analytics by using PolyBase. After the copy is complete, the interim data in Azure Blob Storage is automatically cleaned up.

    b. In the New Linked Service page, select your storage account, and select Create to deploy the linked service.

    c. Deselect the Use type default option, and then select Next.

    Configure PolyBase

  10. In the Summary page, review the settings, and select Next.

    Summary page

  11. On the Deployment page, select Monitor to monitor the pipeline (task).

  12. Notice that the Monitor tab on the left is automatically selected. When the pipeline run completes successfully, select the CopyFromSQLToSQLDW link under the PIPELINE NAME column to view activity run details or to rerun the pipeline.

    Monitor pipeline runs

  13. To switch back to the pipeline runs view, select the All pipeline runs link at the top. Select Refresh to refresh the list.

    Monitor activity runs

  14. To monitor the execution details for each copy activity, select the Details link (eyeglasses icon) under ACTIVITY NAME in the activity runs view. You can monitor details like the volume of data copied from the source to the sink, data throughput, execution steps with corresponding duration, and used configurations. Monitor activity run details first

    Monitor activity run details second

Next steps

Advance to the following article to learn about Azure Synapse Analytics support: