Copy data from an on-premises SQL Server database to Azure Blob storage by using the Copy Data tool

In this tutorial, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from an on-premises SQL Server database to Azure Blob storage.

In this tutorial, you perform the following steps:

  • Create a data factory.
  • Use the Copy Data tool to create a pipeline.
  • Monitor the pipeline and activity runs.

Prerequisites

Azure subscription

Before you begin, if you don't already have an Azure subscription, create a free account.

Azure roles

To create data factory instances, the user account that you use to sign in to Azure must be assigned the contributor or owner role, or must be an administrator of the Azure subscription.

To view the permissions you have in the subscription, go to the Azure portal. Select your user name in the upper-right corner, and then select Permissions. If you have access to multiple subscriptions, select the appropriate subscription. For sample instructions on how to add a user to a role, see Add roles.

SQL Server 2014, 2016, and 2017

In this tutorial, you use an on-premises SQL Server database as a source data store. The pipeline in the data factory you create in this tutorial copies data from this on-premises SQL Server database (source) to Blob storage (sink). As part of the prerequisites, you create a table named emp in your SQL Server database and insert a couple of sample entries into it.

  1. Start SQL Server Management Studio. If it's not already installed on your machine, go to Download SQL Server Management Studio.

  2. Connect to your SQL Server instance by using your credentials.

  3. Create a sample database. In the tree view, right-click Databases, and then select New Database.

  4. In the New Database window, enter a name for the database, and then select OK.

  5. To create the emp table and insert some sample data into it, run the following query script against the database. In the tree view, right-click the database that you created, and then select New Query.

    CREATE TABLE dbo.emp
    (
        ID int IDENTITY(1,1) NOT NULL,
        FirstName varchar(50),
        LastName varchar(50)
    )
    GO
    
    INSERT INTO emp (FirstName, LastName) VALUES ('John', 'Doe')
    INSERT INTO emp (FirstName, LastName) VALUES ('Jane', 'Doe')
    GO
    
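If you prefer to check the sample data from a script instead of SQL Server Management Studio, the following minimal sketch uses the pyodbc Python package (an assumption; the tutorial itself only requires SSMS). It assumes a SQL Server ODBC driver such as ODBC Driver 17 for SQL Server is installed, and the server, database, and credential values are placeholders.

    import pyodbc

    # Placeholder connection values; replace them with your own server, database, and credentials.
    conn_str = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=<your-server-name>;"
        "DATABASE=<your-database-name>;"
        "UID=<your-user-name>;"
        "PWD=<your-password>"
    )

    with pyodbc.connect(conn_str) as conn:
        cursor = conn.cursor()
        # Confirm that the two sample rows exist in the emp table.
        cursor.execute("SELECT ID, FirstName, LastName FROM dbo.emp")
        for row in cursor.fetchall():
            print(row.ID, row.FirstName, row.LastName)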

Azure storage account

In this tutorial, you use a general-purpose Azure storage account (specifically, Blob storage) as a destination/sink data store. If you don't have a general-purpose storage account, see Create a storage account for instructions to create one. The pipeline in the data factory that you create in this tutorial copies data from the on-premises SQL Server database (source) to this Blob storage (sink).

Get the storage account name and account key

You use the name and key of your storage account in this tutorial. To get the name and key of your storage account, take the following steps:

  1. Sign in to the Azure portal with your Azure user name and password.

  2. In the left pane, select More services. Filter by using the Storage keyword, and then select Storage accounts.

    Storage account search

  3. In the list of storage accounts, filter for your storage account, if needed. Then select your storage account.

  4. In the Storage account window, select Access keys.

    Access keys

  5. In the Storage account name and key1 boxes, copy the values, and then paste them into Notepad or another editor for later use in the tutorial.
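Data Factory uses this account name and key to authenticate to the storage account. If you want to confirm that the values you copied are valid, the following minimal sketch uses the azure-storage-blob Python package (an assumption; the tutorial doesn't require it) to build a client from the key and list the account's containers.

    from azure.storage.blob import BlobServiceClient

    # Placeholder values; paste in the Storage account name and key1 values you copied from the portal.
    account_name = "<your-storage-account-name>"
    account_key = "<your-account-key>"

    # The account key credential authenticates requests to the Blob service endpoint.
    service_client = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )

    # Listing containers succeeds only if the account name and key are correct.
    for container in service_client.list_containers():
        print(container.name)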

Create the adftutorial container

In this section, you create a blob container named adftutorial in your Blob storage.

  1. In the Storage account window, switch to Overview, and then select Blobs.

    Select Blobs option

  2. In the Blob service window, select Container.

    Container button

  3. In the New container window, in the Name box, enter adftutorial, and then select OK.

    New container

  4. In the list of containers, select adftutorial.

    Container selection

  5. Keep the Container window for adftutorial open. You use it to verify the output at the end of the tutorial. Data Factory automatically creates the output folder in this container, so you don't need to create one.

    Container window
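If you'd rather create the container from code than in the portal, a minimal sketch with the azure-storage-blob Python package (again an assumption, not a tutorial requirement) looks like this; the account name and key are the values from the previous section.

    from azure.storage.blob import BlobServiceClient

    # Placeholder values from the previous section.
    account_name = "<your-storage-account-name>"
    account_key = "<your-account-key>"

    service_client = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )

    # Create the adftutorial container; this raises ResourceExistsError if it already exists.
    service_client.create_container("adftutorial")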

Create a data factory

  1. On the menu on the left, select New > Data + Analytics > Data Factory.

    New data factory creation

  2. On the New data factory page, under Name, enter ADFTutorialDataFactory.

    New data factory

    The name of the data factory must be globally unique. If you see the following error message for the name field, change the name of the data factory (for example, yournameADFTutorialDataFactory). For naming rules for Data Factory artifacts, see Data Factory naming rules.

    New data factory name

  3. Select the Azure subscription in which you want to create the data factory.
  4. For Resource Group, take one of the following steps:

     • Select Use existing, and select an existing resource group from the drop-down list.
     • Select Create new, and enter the name of a resource group.

  5. Under Version, select V2 (Preview).
  6. Under Location, select the location for the data factory. Only locations that are supported are displayed in the drop-down list. The data stores (for example, Azure Storage and SQL Database) and computes (for example, Azure HDInsight) used by Data Factory can be in other locations/regions.
  7. Select Pin to dashboard.
  8. Select Create.
  9. On the dashboard, you see the following tile with the status Deploying Data Factory:

    Deploying data factory tile

  10. After the creation is finished, you see the Data Factory page as shown in the image.

    Data factory home page

  11. Select Author & Monitor to launch the Data Factory user interface in a separate tab.
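The portal is all this tutorial uses, but the same data factory can also be created from code. The following sketch is based on the azure-mgmt-datafactory and azure-identity Python packages (an assumption on my part); the subscription ID, resource group, and factory name are placeholders, and the factory name must still be globally unique.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import Factory

    # Placeholder values; replace them with your subscription, an existing resource group,
    # and a globally unique factory name.
    subscription_id = "<your-subscription-id>"
    resource_group = "<your-resource-group>"
    factory_name = "<yournameADFTutorialDataFactory>"

    # DefaultAzureCredential picks up Azure CLI, environment, or managed identity credentials.
    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

    # Create (or update) the data factory in a supported region.
    factory = adf_client.factories.create_or_update(
        resource_group,
        factory_name,
        Factory(location="eastus"),
    )
    print(factory.provisioning_state)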

Use the Copy Data tool to create a pipeline

  1. On the Let's get started page, select Copy Data to launch the Copy Data tool.

    Copy Data tool tile

  2. On the Properties page of the Copy Data tool, under Task name, enter CopyFromOnPremSqlToAzureBlobPipeline. Then select Next. The Copy Data tool creates a pipeline with the name you specify for this field.

    Task name

  3. On the Source data store page, select SQL Server, and then select Next. You might need to scroll down to see SQL Server in the list.

    SQL Server selection

  4. Under Connection name, enter SqlServerLinkedService. Select the Create Integration Runtime link. You must create a self-hosted integration runtime, download it to your machine, and register it with Data Factory. The self-hosted integration runtime copies data between your on-premises environment and the cloud.

    Create self-hosted integration runtime

  5. In the Create Integration Runtime dialog box, under Name, enter TutorialIntegrationRuntime. Then select Create.

    Integration runtime name

  6. Select Launch express setup on this computer. This action installs the integration runtime on your machine and registers it with Data Factory. Alternatively, you can use the manual setup option to download the installation file, run it, and use the key to register the integration runtime.

    Launch express setup on this computer link

  7. Run the downloaded application. You see the status of the express setup in the window.

    Express setup status

  8. Confirm that TutorialIntegrationRuntime is selected for the Integration Runtime field.

    Integration runtime selected

  9. In Specify the on-premises SQL Server database, take the following steps:

    a. Under Connection name, enter OnPremSqlLinkedService.

    b. Under Server name, enter the name of your on-premises SQL Server instance.

    c. Under Database name, enter the name of your on-premises database.

    d. Under Authentication type, select the appropriate authentication type.

    e. Under User name, enter the name of a user who has access to the on-premises SQL Server instance.

    f. Under Password, enter the password for the user.

  10. On the Select tables from which to copy the data or use a custom query page, select the [dbo].[emp] table in the list, and select Next.

    The emp table selection

  11. On the Destination data store page, select Azure Blob Storage, and then select Next.

    Blob storage selection

  12. On the Specify the Azure Blob storage account page, take the following steps:

    a. Under Connection name, enter AzureStorageLinkedService.

    b. Under Storage account name, select your storage account from the drop-down list.

    c. Select Next.

    Specify the storage account

  13. On the Choose the output file or folder page, under Folder path, enter adftutorial/fromonprem. You created the adftutorial container as part of the prerequisites. If the output folder doesn't exist, Data Factory automatically creates it. You can also use the Browse button to browse the blob storage and its containers/folders. Notice that the name of the output file is set to dbo.emp by default.

    Choose the output file or folder

  14. On the File format settings page, select Next.

    File format settings page

  15. On the Settings page, select Next.

    Settings page

  16. On the Summary page, review values for all the settings, and select Next.

    Summary page

  17. On the Deployment page, select Monitor to monitor the pipeline or task you created.

    Deployment page

  18. On the Monitor tab, you can view the status of the pipeline you created. You can use the links in the Actions column to view activity runs associated with the pipeline run and to rerun the pipeline.

    Monitor pipeline runs

  19. Select the View Activity Runs link in the Actions column to see activity runs associated with the pipeline run. To see details about the copy operation, select the Details link (eyeglasses icon) in the Actions column. To switch back to the Pipeline Runs view, select Pipelines at the top.

    Monitor activity runs

  20. Confirm that you see the output file in the fromonprem folder of the adftutorial container. (A programmatic way to check this is sketched after these steps.)

    Output blob

  21. Select the Edit tab on the left to switch to the editor mode. You can update the linked services, datasets, and pipelines created by the tool by using the editor. Select Code to view the JSON code associated with the entity opened in the editor. For details on how to edit these entities in the Data Factory UI, see the Azure portal version of this tutorial.

    Edit tab
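To check the result from step 20 without the portal, the following minimal sketch lists the blobs that the pipeline wrote under the fromonprem folder of the adftutorial container. It assumes the azure-storage-blob Python package and reuses the storage account name and key from the prerequisites.

    from azure.storage.blob import BlobServiceClient

    # Placeholder values; use the storage account name and key from the prerequisites.
    account_name = "<your-storage-account-name>"
    account_key = "<your-account-key>"

    service_client = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )

    container_client = service_client.get_container_client("adftutorial")

    # The Copy Data tool writes the output file (dbo.emp by default) under the fromonprem folder.
    for blob in container_client.list_blobs(name_starts_with="fromonprem/"):
        print(blob.name, blob.size)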

Next steps

The pipeline in this sample copies data from an on-premises SQL Server database to Blob storage. You learned how to:

  • Create a data factory.
  • Use the Copy Data tool to create a pipeline.
  • Monitor the pipeline and activity runs.

For a list of data stores that are supported by Data Factory, see Supported data stores.

To learn about how to copy data in bulk from a source to a destination, advance to the following tutorial: