Tutorial: Create a pipeline with Copy Activity using Data Factory Copy Wizard

This tutorial shows you how to use the Copy Wizard to copy data from an Azure blob storage to an Azure SQL database.

The Azure Data Factory Copy Wizard allows you to quickly create a data pipeline that copies data from a supported source data store to a supported destination data store. Therefore, we recommend that you use the wizard as a first step to create a sample pipeline for your data movement scenario. For a list of data stores supported as sources and as destinations, see supported data stores.

This tutorial shows you how to create an Azure data factory, launch the Copy Wizard, go through a series of steps to provide details about your data ingestion/movement scenario. When you finish steps in the wizard, the wizard automatically creates a pipeline with a Copy Activity to copy data from an Azure blob storage to an Azure SQL database. For more information about Copy Activity, see data movement activities.

Prerequisites

Complete prerequisites listed in the Tutorial Overview article before performing this tutorial.

Create data factory

In this step, you use the Azure portal to create an Azure data factory named ADFTutorialDataFactory.

  1. Log in to Azure portal.
  2. Click + NEW from the top-left corner, click Data + analytics, and click Data Factory.

    New->DataFactory

  3. In the New data factory blade:

    1. Enter ADFTutorialDataFactory for the name. The name of the Azure data factory must be globally unique. If you receive the error: Data factory name “ADFTutorialDataFactory” is not available, change the name of the data factory (for example, yournameADFTutorialDataFactoryYYYYMMDD) and try creating again. See Data Factory - Naming Rules topic for naming rules for Data Factory artifacts.

      Data Factory name not available

    2. Select your Azure subscription.
    3. For Resource Group, do one of the following steps:

      • Select Use existing to select an existing resource group.
      • Select Create new to enter a name for a resource group.

        Some of the steps in this tutorial assume that you use the name: ADFTutorialResourceGroup for the resource group. To learn about resource groups, see Using resource groups to manage your Azure resources.

    4. Select a location for the data factory.
    5. Select Pin to dashboard check box at the bottom of the blade.
    6. Click Create.

      New data factory blade

  4. After the creation is complete, you see the Data Factory blade as shown in the following image:

    Data factory home page

Launch Copy Wizard

  1. On the Data Factory blade, click Copy data [PREVIEW] to launch the Copy Wizard.

    Note

    If you see that the web browser is stuck at "Authorizing...", disable/uncheck Block third-party cookies and site data setting in the browser settings (or) keep it enabled and create an exception for login.microsoftonline.com and then try launching the wizard again.

  2. In the Properties page:

    1. Enter CopyFromBlobToAzureSql for Task name
    2. Enter description (optional).
    3. Change the Start date time and the End date time so that the end date is set to today and start date to five days earlier.
    4. Click Next.

      Copy Tool - Properties page

  3. On the Source data store page, click Azure Blob Storage tile. You use this page to specify the source data store for the copy task.

    Copy Tool - Source data store page

  4. On the Specify the Azure Blob storage account page:

    1. Enter AzureStorageLinkedService for Linked service name.
    2. Confirm that From Azure subscriptions option is selected for Account selection method.
    3. Select your Azure subscription.
    4. Select an Azure storage account from the list of Azure storage accounts available in the selected subscription. You can also choose to enter storage account settings manually by selecting Enter manually option for the Account selection method, and then click Next.

      Copy Tool - Specify the Azure Blob storage account

  5. On Choose the input file or folder page:

    1. Double-click adftutorial (folder).
    2. Select emp.txt, and click Choose

      Copy Tool - Choose the input file or folder

  6. On the Choose the input file or folder page, click Next. Do not select Binary copy.

    Copy Tool - Choose the input file or folder

  7. On the File format settings page, you see the delimiters and the schema that is auto-detected by the wizard by parsing the file. You can also enter the delimiters manually for the copy wizard to stop auto-detecting or to override. Click Next after you review the delimiters and preview data.

    Copy Tool - File format settings

  8. On the Destination data store page, select Azure SQL Database, and click Next.

    Copy Tool - Choose destination store

  9. On Specify the Azure SQL database page:

    1. Enter AzureSqlLinkedService for the Connection name field.
    2. Confirm that From Azure subscriptions option is selected for Server / database selection method.
    3. Select your Azure subscription.
    4. Select Server name and Database.
    5. Enter User name and Password.
    6. Click Next.

      Copy Tool - specify Azure SQL database

  10. On the Table mapping page, select emp for the Destination field from the drop-down list, click down arrow (optional) to see the schema and to preview the data.

    Copy Tool - Table mapping

  11. On the Schema mapping page, click Next.

    Copy Tool - schema mapping

  12. On the Performance settings page, click Next.

    Copy Tool - performance settings

  13. Review information in the Summary page, and click Finish. The wizard creates two linked services, two datasets (input and output), and one pipeline in the data factory (from where you launched the Copy Wizard).

    Copy Tool - performance settings

Launch Monitor and Manage application

  1. On the Deployment page, click the link: Click here to monitor copy pipeline.

    Copy Tool - Deployment succeeded

  2. The monitoring application is launched in a separate tab in your web browser.

    Monitoring App

  3. To see the latest status of hourly slices, click Refresh button in the ACTIVITY WINDOWS list at the bottom. You see five activity windows for five days between start and end times for the pipeline. The list is not automatically refreshed, so you may need to click Refresh a couple of times before you see all the activity windows in the Ready state.
  4. Select an activity window in the list. See the details about it in the Activity Window Explorer on the right.

    Activity window details

    Notice that the dates 11, 12, 13, 14, and 15 are in green color, which means that the daily output slices for these dates have already been produced. You also see this color coding on the pipeline and the output dataset in the diagram view. In the previous step, notice that two slices have already been produced, one slice is currently being processed, and the other two are waiting to be processed (based on the color coding).

    For more information on using this application, see Monitor and manage pipeline using Monitoring App article.

Next steps

In this tutorial, you used Azure blob storage as a source data store and an Azure SQL database as a destination data store in a copy operation. The following table provides a list of data stores supported as sources and destinations by the copy activity:

Category Data store Supported as a source Supported as a sink
Azure Azure Blob storage
  Azure Cosmos DB (DocumentDB API)
  Azure Data Lake Store
  Azure SQL Database
  Azure SQL Data Warehouse
  Azure Search Index
  Azure Table storage
Databases Amazon Redshift
  DB2*
  MySQL*
  Oracle*
  PostgreSQL*
  SAP Business Warehouse*
  SAP HANA*
  SQL Server*
  Sybase*
  Teradata*
NoSQL Cassandra*
  MongoDB*
File Amazon S3
  File System*
  FTP
  HDFS*
  SFTP
Others Generic HTTP
  Generic OData
  Generic ODBC*
  Salesforce
  Web Table (table from HTML)
  GE Historian*

For details about fields/properties that you see in the copy wizard for a data store, click the link for the data store in the table.