Exercise - Use the data factory copy activity

Once the Data Factory instance has been created, go to the resource and select the Author & Monitor button to begin creating your data pipelines. This opens the following screen:

Authoring in Azure Data Factory

The first step in your pipeline is to create a Copy Activity that copies data from a source to a destination, using the following steps. A scripted equivalent of these authoring steps is sketched after the list.

  1. Open the authoring canvas by clicking the pencil icon on the left sidebar or the Create pipeline button.

    Screenshot that shows the Create pipeline option highlighted.

  2. Create the pipeline. Click on the + button in the Factory Resources pane and select Pipeline.

    Screenshot that shows Factory Resources under the Data Factory tab. The plus symbol is selected, exposing Pipeline, both are highlighted.

  3. Add a copy activity. In the Activities pane, open the Move and Transform accordion and drag the Copy Data activity onto the pipeline canvas.

    Using the Copy Activity
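Everything in this unit can also be authored programmatically. The later sketches in this unit assume a Data Factory management client like the one below, built with the azure-mgmt-datafactory Python SDK; the subscription ID, resource group, and factory name are placeholders for your own values, and DefaultAzureCredential is just one possible way to authenticate.

```python
# Minimal sketch: connect to the Data Factory created earlier with the Python SDK.
# The subscription, resource group, and factory names are placeholder assumptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

# DefaultAzureCredential resolves CLI, environment, or managed identity credentials.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Sanity check: the factory should be retrievable before authoring anything in it.
factory = adf_client.factories.get(resource_group, factory_name)
print(factory.name, factory.provisioning_state)
```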

With the Copy Activity added, the next step is to define the source data by using the following steps. A scripted equivalent of the source configuration is sketched after the list.

  1. In the Source tab of the Copy Activity settings, click + New to select a data source.

    Creating a data source

  2. For example, in the data store list, select the Amazon S3 tile and click Continue.

    Select Amazon S3 as a data source

  3. In the file format list, select the DelimitedText format tile and click Continue.

    Screenshot that shows Delimited Text selected in the Select format list.

  4. In the Set Properties window, give your dataset an understandable name and click the Linked Service dropdown. If you have not created your S3 linked service, select New.

    Screenshot that shows the Set Properties window, with filter highlighted under Linked service.

  5. In the S3 linked service configuration pane, specify your S3 access key and secret key. The Data Factory service encrypts credentials with certificates managed by Microsoft. For more information, see Data Movement Security Considerations. To verify your credentials are valid, click Test Connection. Click Create when finished.

    Setting data source access with keys

  6. Once you have created and selected the linked service, specify the rest of your dataset settings. These settings determine how and from where in your connection the data is pulled. Click Finish when you're done.

    Finishing up data source settings

  7. To verify your dataset is configured correctly, click Preview Data in the Source tab of the Copy Activity to get a small snapshot of your data.

    Previewing data
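For reference, the source configuration above (an Amazon S3 linked service plus a DelimitedText dataset) could be scripted roughly as follows. The access keys, bucket, folder, and file names are placeholders, and `adf_client`, `resource_group`, and `factory_name` come from the earlier sketch.

```python
# Sketch: register an Amazon S3 linked service and a DelimitedText source dataset.
# Keys, bucket, and paths are placeholder assumptions.
from azure.mgmt.datafactory.models import (
    AmazonS3LinkedService, AmazonS3Location, DatasetResource,
    DelimitedTextDataset, LinkedServiceReference, LinkedServiceResource, SecureString,
)

# adf_client, resource_group, and factory_name as in the earlier sketch.
s3_ls = LinkedServiceResource(
    properties=AmazonS3LinkedService(
        access_key_id="<s3-access-key-id>",
        secret_access_key=SecureString(value="<s3-secret-access-key>"),
    )
)
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "S3LinkedService", s3_ls
)

# DelimitedText dataset pointing at a CSV file in the bucket.
source_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="S3LinkedService"
        ),
        location=AmazonS3Location(
            bucket_name="<bucket-name>", folder_path="input", file_name="data.csv"
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(
    resource_group, factory_name, "S3SourceDataset", source_ds
)
```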

With the source data defined, you can now define the sink into which the data will be loaded. In this example, the sink is Azure Data Lake Storage Gen2, configured with the following steps. A scripted equivalent of the sink configuration is sketched after the list.

  1. In the Sink tab, click + New.

    Defining a data sink in the Copy Activity

  2. Select the Azure Data Lake Storage Gen2 tile and click Continue.

    Defining the dataset

  3. In the Set Properties pane, give your dataset an understandable name and click the Linked Service dropdown. If you have not created your ADLS linked service, select New.

    Setting the dataset properties

  4. In the ADLS linked service configuration pane, select your authentication method and enter your credentials. In the example below, an account key is used and the storage account is selected from the dropdown.

    Finalizing the dataset properties

  5. Once you have configured your linked service, enter the ADLS dataset configuration. Click Finish when you're done.

    Finish the dataset properties
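The sink configuration has a similar scripted counterpart: an ADLS Gen2 linked service (here using account key authentication, matching the example above) and a DelimitedText dataset pointing at the destination filesystem. The account name, key, filesystem, and folder path are placeholders.

```python
# Sketch: register an ADLS Gen2 linked service (account key auth) and a
# DelimitedText sink dataset. Account, key, filesystem, and path are assumptions.
from azure.mgmt.datafactory.models import (
    AzureBlobFSLinkedService, AzureBlobFSLocation, DatasetResource,
    DelimitedTextDataset, LinkedServiceReference, LinkedServiceResource,
)

# adf_client, resource_group, and factory_name as in the earlier sketches.
adls_ls = LinkedServiceResource(
    properties=AzureBlobFSLinkedService(
        url="https://<storage-account>.dfs.core.windows.net",
        account_key="<storage-account-key>",
    )
)
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AdlsGen2LinkedService", adls_ls
)

# DelimitedText dataset pointing at the destination folder in the filesystem.
sink_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AdlsGen2LinkedService"
        ),
        location=AzureBlobFSLocation(
            file_system="<filesystem-name>", folder_path="output"
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(
    resource_group, factory_name, "AdlsSinkDataset", sink_ds
)
```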

At this point, you have fully configured your copy activity.
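If you're authoring by script instead of on the canvas, the fully configured Copy Activity corresponds to a pipeline definition along these lines, wiring together the source and sink datasets from the earlier sketches; the activity and pipeline names are placeholders.

```python
# Sketch: assemble the Copy Activity into a pipeline using the datasets defined above.
from azure.mgmt.datafactory.models import (
    CopyActivity, DatasetReference, DelimitedTextSink, DelimitedTextSource,
    DelimitedTextWriteSettings, PipelineResource,
)

copy_activity = CopyActivity(
    name="CopyFromS3ToAdls",
    inputs=[DatasetReference(type="DatasetReference", reference_name="S3SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="AdlsSinkDataset")],
    source=DelimitedTextSource(),
    # The write settings (file extension) mirror what the UI generates for a delimited sink.
    sink=DelimitedTextSink(format_settings=DelimitedTextWriteSettings(file_extension=".csv")),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    resource_group, factory_name, "CopyS3ToAdlsPipeline", pipeline
)
```

With the pipeline in place, whether built in the UI or by script, you can test it from the authoring canvas.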

  1. To test it out, click the Debug button at the top of the pipeline canvas. This starts a pipeline debug run.

    Testing the Copy Activity

  2. To monitor the progress of a pipeline debug run, click the Output tab of the pipeline.

    Monitoring the Copy Activity

  3. To view a more detailed description of the activity output, click the eyeglasses icon. This opens the copy monitoring screen, which provides useful metrics such as data read/written, throughput, and in-depth duration statistics. A scripted equivalent of running and monitoring the pipeline is sketched after these steps.

    Viewing the Copy Activity Results
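Debug runs are a feature of the authoring canvas, but running and monitoring a published pipeline has a programmatic counterpart. Below is a sketch that reuses `adf_client` and the pipeline name from the earlier sketches; note that `create_run` triggers a regular pipeline run rather than a canvas debug run.

```python
# Sketch: trigger a published pipeline run and inspect its activity output.
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

run = adf_client.pipelines.create_run(
    resource_group, factory_name, "CopyS3ToAdlsPipeline", parameters={}
)

# Poll the overall pipeline run status (Queued, InProgress, Succeeded, Failed, ...).
pipeline_run = adf_client.pipeline_runs.get(resource_group, factory_name, run.run_id)
print("Pipeline run status:", pipeline_run.status)

# Query the individual activity runs for metrics such as data read/written and duration.
filters = RunFilterParameters(
    last_updated_after=datetime.now() - timedelta(days=1),
    last_updated_before=datetime.now() + timedelta(days=1),
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    resource_group, factory_name, run.run_id, filters
)
for activity_run in activity_runs.value:
    print(activity_run.activity_name, activity_run.status, activity_run.output)
```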

To verify the copy worked, open your ADLS Gen2 storage account and check that the file was written as expected.
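If you'd rather check from code than from the portal, a small sketch with the azure-storage-file-datalake package lists what the copy wrote; the account name, key, filesystem, and folder are placeholders matching the sink sketch.

```python
# Sketch: list the files the copy wrote to the ADLS Gen2 filesystem.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<storage-account-key>",
)
file_system = service.get_file_system_client("<filesystem-name>")

# List everything under the output folder used as the sink path.
for path in file_system.get_paths(path="output"):
    print(path.name, path.content_length)
```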