Copy Activity in Azure Data Factory


In Azure Data Factory, you can use Copy Activity to copy data among data stores located on-premises and in the cloud. After the data is copied, it can be further transformed and analyzed. You can also use Copy Activity to publish transformation and analysis results for business intelligence (BI) and application consumption.

Role of Copy Activity

Copy Activity is executed on an Integration Runtime. For different data copy scenarios, different flavors of Integration Runtime can be used:

  • When copying data between data stores that are both publicly accessible, Copy Activity can be empowered by the Azure Integration Runtime, which is secure, reliable, scalable, and globally available.
  • When copying data from or to data stores located on-premises or in a network with access control (for example, an Azure virtual network), you need to set up a self-hosted Integration Runtime to empower the data copy.

An Integration Runtime needs to be associated with each source and sink data store. Learn details about how Copy Activity determines which IR to use.

Copy Activity goes through the following stages to copy data from a source to a sink. The service that powers Copy Activity:

  1. Reads data from a source data store.
  2. Performs serialization/deserialization, compression/decompression, column mapping, etc. It does these operations based on the configurations of the input dataset, output dataset, and Copy Activity.
  3. Writes data to the sink/destination data store.
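Conceptually, the three stages form a read-transform-write loop. The following Python sketch is purely illustrative (it is not how ADF is implemented); in-memory rows stand in for the source store, and GZip-compressed CSV bytes stand in for the sink:

```python
import csv
import gzip
import io

def copy_activity_sketch(source_rows, column_mapping):
    """Illustrative read -> transform -> write pipeline.

    source_rows: list of dicts read from the "source" store.
    column_mapping: dict renaming source columns to sink columns.
    Returns the compressed bytes "written" to the sink.
    """
    # Stage 1: read from the source data store.
    rows = list(source_rows)

    # Stage 2: column mapping, serialization (CSV), and compression (GZip).
    mapped = [{column_mapping.get(k, k): v for k, v in row.items()} for row in rows]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(mapped[0].keys()))
    writer.writeheader()
    writer.writerows(mapped)

    # Stage 3: write to the sink/destination data store.
    return gzip.compress(buf.getvalue().encode("utf-8"))

sink_bytes = copy_activity_sketch(
    [{"id": "1", "name": "contoso"}], {"name": "customer_name"})
```

In ADF itself, each of these stages is driven by the input dataset, output dataset, and Copy Activity configurations rather than by code you write.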

Copy Activity Overview

Supported data stores and formats

Whether each store is supported as a source, as a sink, and by the Azure or self-hosted Integration Runtime varies per connector; see the individual connector articles for details.

Category Data store
Azure Azure Blob Storage
  Azure Cosmos DB (SQL API)
  Azure Cosmos DB's API for MongoDB
  Azure Data Explorer
  Azure Data Lake Storage Gen1
  Azure Data Lake Storage Gen2
  Azure Database for MariaDB
  Azure Database for MySQL
  Azure Database for PostgreSQL
  Azure File Storage
  Azure SQL Database
  Azure SQL Database Managed Instance
  Azure SQL Data Warehouse
  Azure Search Index
  Azure Table Storage
Database Amazon Redshift
  Drill (Preview)
  Google BigQuery
  Apache Impala (Preview)
  Microsoft Access
  Presto (Preview)
  SAP Business Warehouse Open Hub
  SAP Business Warehouse via MDX
  SQL Server
NoSQL Cassandra
  Couchbase (Preview)
File Amazon S3
  File System
  Google Cloud Storage
Generic protocol Generic HTTP
  Generic OData
  Generic ODBC
  Generic REST
Services and apps Amazon Marketplace Web Service (Preview)
  Common Data Service for Apps
  Concur (Preview)
  Dynamics 365
  Dynamics AX (Preview)
  Dynamics CRM
  Google AdWords (Preview)
  HubSpot (Preview)
  Jira (Preview)
  Magento (Preview)
  Marketo (Preview)
  Office 365 (Preview)
  Oracle Eloqua (Preview)
  Oracle Responsys (Preview)
  Oracle Service Cloud (Preview)
  Paypal (Preview)
  QuickBooks (Preview)
  Salesforce Service Cloud
  Salesforce Marketing Cloud (Preview)
  SAP Cloud for Customer (C4C)
  Shopify (Preview)
  Square (Preview)
  Web Table (HTML table)
  Xero (Preview)
  Zoho (Preview)


Connectors marked as Preview are available for you to try out and give feedback on. If you want to take a dependency on a preview connector in your solution, contact Azure support.

Supported file formats

You can use Copy Activity to copy files as-is between two file-based data stores, in which case the data is copied efficiently without any serialization/deserialization.

Copy Activity also supports reading from and writing to files in specified formats: Text, JSON, Avro, ORC, and Parquet, and compressing and decompressing files with the following codecs: GZip, Deflate, BZip2, and ZipDeflate. See Supported file and compression formats for details.

For example, you can do the following copy activities:

  • Copy data from an on-premises SQL Server database and write it to Azure Data Lake Storage Gen2 in Parquet format.
  • Copy files in text (CSV) format from an on-premises file system and write them to Azure Blob storage in Avro format.
  • Copy zipped files from an on-premises file system, decompress them, and land the extracted files in Azure Data Lake Storage Gen2.
  • Copy data in GZip compressed text (CSV) format from Azure Blob storage and write it to Azure SQL Database.
  • Many more activities that require serialization/deserialization or compression/decompression.
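The format-conversion scenarios amount to deserializing with one reader and serializing with another. A minimal local sketch (not ADF itself) converting CSV text to JSON lines:

```python
import csv
import io
import json

def csv_to_json_lines(csv_text):
    """Deserialize CSV text, then serialize each record as a JSON line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)

print(csv_to_json_lines("id,name\n1,contoso"))
# → {"id": "1", "name": "contoso"}
```

In ADF, the equivalent conversion is expressed declaratively through the format settings of the source and sink datasets rather than in code.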

Supported regions

The service that powers Copy Activity is available globally in the regions and geographies listed in Azure Integration Runtime locations. The globally available topology ensures efficient data movement that usually avoids cross-region hops. See Services by region for availability of Data Factory and Data Movement in a region.


To use Copy Activity in Azure Data Factory, you need to:

  1. Create linked services for the source data store and the sink data store. Refer to the connector article's "Linked service properties" section for how to configure them and the supported properties. You can find the list of supported connectors in the Supported data stores and formats section.
  2. Create datasets for the source and sink. Refer to the "Dataset properties" section of the source and sink connector articles for how to configure them and the supported properties.
  3. Create a pipeline with a copy activity. The next section provides an example.
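The three artifacts reference each other by name. The following Python sketch assembles the JSON payloads as dicts to show how they fit together; all names here ("MyBlobLinkedService", "SourceDataset", and so on) are hypothetical examples, not fixed values:

```python
import json

# Hypothetical linked service; real properties come from the connector article.
linked_service = {
    "name": "MyBlobLinkedService",
    "properties": {"type": "AzureBlobStorage"},
}

# Dataset pointing at the linked service by name.
source_dataset = {
    "name": "SourceDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {"referenceName": "MyBlobLinkedService",
                              "type": "LinkedServiceReference"},
    },
}

# Pipeline with a copy activity referencing the datasets by name.
pipeline = {
    "name": "MyCopyPipeline",
    "properties": {
        "activities": [{
            "name": "CopyFromBlobToSql",
            "type": "Copy",
            "inputs": [{"referenceName": "SourceDataset", "type": "DatasetReference"}],
            "outputs": [{"referenceName": "SinkDataset", "type": "DatasetReference"}],
            "typeProperties": {"source": {"type": "BlobSource"},
                               "sink": {"type": "SqlSink"}},
        }],
    },
}

payload = json.dumps(pipeline)
```

The key point is the chain of references: the pipeline's dataset references must match the dataset names, and each dataset's linkedServiceName must match a linked service name.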


The following template of a copy activity contains an exhaustive list of supported properties. Specify the ones that fit your scenario.

        "name": "CopyActivityTemplate",
        "type": "Copy",
        "inputs": [
                "referenceName": "<source dataset name>",
                "type": "DatasetReference"
        "outputs": [
                "referenceName": "<sink dataset name>",
                "type": "DatasetReference"
        "typeProperties": {
            "source": {
                "type": "<source type>",
            "sink": {
                "type": "<sink type>"
                "type": "TabularTranslator",
                "columnMappings": "<column mapping>"
            "dataIntegrationUnits": <number>,
            "parallelCopies": <number>,
            "enableStaging": true/false,
            "stagingSettings": {
            "enableSkipIncompatibleRow": true/false,
            "redirectIncompatibleRowSettings": {

Syntax details

Property Description Required
type The type property of a copy activity must be set to Copy. Yes
inputs Specify the dataset you created that points to the source data. Copy Activity supports only a single input. Yes
outputs Specify the dataset you created that points to the sink data. Copy Activity supports only a single output. Yes
typeProperties A group of properties to configure Copy Activity. Yes
source Specify the copy source type and the corresponding properties on how to retrieve data. Learn details from the "Copy activity properties" section in the connector article listed in Supported data stores and formats. Yes
sink Specify the copy sink type and the corresponding properties on how to write data. Learn details from the "Copy activity properties" section in the connector article listed in Supported data stores and formats. Yes
translator Specify explicit column mappings from source to sink. Applies when the default copy behavior cannot fulfill your needs. Learn details from Schema and data type mapping. No
dataIntegrationUnits Specify a measure of the power of the Azure Integration Runtime used to empower the data copy. Formerly known as cloud Data Movement Units (DMUs). Learn details from Data Integration Units. No
parallelCopies Specify the parallelism that you want Copy Activity to use when reading data from the source and writing data to the sink. Learn details from Parallel copy. No
enableStaging, stagingSettings Choose to stage the interim data in Blob storage instead of copying data directly from source to sink. Learn the useful scenarios and configuration details from Staged copy. No
enableSkipIncompatibleRow, redirectIncompatibleRowSettings Choose how to handle incompatible rows when copying data from source to sink. Learn details from Fault tolerance. No


You can monitor the copy activity run in the Azure Data Factory "Author & Monitor" UI or programmatically. You can then compare the performance and configuration of your scenario to Copy Activity's performance reference from in-house testing.

Monitor visually

To visually monitor the copy activity run, go to your data factory -> Author & Monitor -> Monitor tab. You see a list of pipeline runs with a "View Activity Runs" link in the Actions column.

Monitor pipeline runs

Click the link to see the list of activities in this pipeline run. In the Actions column, you have links to the copy activity's input, output, errors (if the copy activity run fails), and details.

Monitor activity runs

Click the "Details" link under Actions to see copy activity's execution details and performance characteristics. It shows you information including volume/rows/files of data copied from source to sink, throughput, steps it goes through with corresponding duration and used configurations for your copy scenario.


For some scenarios, you will also see "Performance tuning tips" at the top of the copy monitoring page, which tell you the bottleneck identified and guide you on what to change to boost copy throughput. See the Performance and tuning section for an example with details.

Example: copy from Amazon S3 to Azure Data Lake Store

Example: copy from Azure SQL Database to Azure SQL Data Warehouse using staged copy

Monitor programmatically

Copy activity execution details and performance characteristics are also returned in the Copy Activity run result > Output section. Below is an exhaustive list; only the properties applicable to your copy scenario show up. Learn how to monitor an activity run from the quickstart's monitoring section.

Property name Description Unit
dataRead Data size read from source Int64 value in bytes
dataWritten Data size written to sink Int64 value in bytes
filesRead Number of files copied when copying data from file storage. Int64 value (no unit)
fileScanned Number of files scanned in the source file storage. Int64 value (no unit)
filesWritten Number of files copied when copying data to file storage. Int64 value (no unit)
rowsCopied Number of rows copied (not applicable for binary copy). Int64 value (no unit)
rowsSkipped Number of incompatible rows skipped. You can turn on this feature by setting "enableSkipIncompatibleRow" to true. Int64 value (no unit)
throughput Rate at which data is transferred. Floating point number in KB/s
copyDuration The duration of the copy Int32 value in seconds
sqlDwPolyBase If PolyBase is used when copying data into SQL Data Warehouse. Boolean
redshiftUnload If UNLOAD is used when copying data from Redshift. Boolean
hdfsDistcp If DistCp is used when copying data from HDFS. Boolean
effectiveIntegrationRuntime Show which Integration Runtime(s) is used to empower the activity run, in the format of <IR name> (<region if it's Azure IR>). Text (string)
usedDataIntegrationUnits The effective Data Integration Units during copy. Int32 value
usedParallelCopies The effective parallelCopies during copy. Int32 value
redirectRowPath Path to the log of skipped incompatible rows in the blob storage you configure under "redirectIncompatibleRowSettings". See below example. Text (string)
executionDetails More details on the stages copy activity goes through, and the corresponding steps, duration, used configurations, etc. It's not recommended to parse this section as it may change. Array
"output": {
    "dataRead": 107280845500,
    "dataWritten": 107280845500,
    "filesRead": 10,
    "filesWritten": 10,
    "copyDuration": 224,
    "throughput": 467707.344,
    "errors": [],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (East US 2)",
    "usedDataIntegrationUnits": 32,
    "usedParallelCopies": 8,
    "executionDetails": [
            "source": {
                "type": "AmazonS3"
            "sink": {
                "type": "AzureDataLakeStore"
            "status": "Succeeded",
            "start": "2018-01-17T15:13:00.3515165Z",
            "duration": 221,
            "usedDataIntegrationUnits": 32,
            "usedParallelCopies": 8,
            "detailedDurations": {
                "queuingDuration": 2,
                "transferDuration": 219

Schema and data type mapping

See Schema and data type mapping, which describes how Copy Activity maps your source data to the sink.

Fault tolerance

By default, Copy Activity stops copying data and returns a failure when it encounters data that is incompatible between source and sink. You can explicitly configure it to skip and log the incompatible rows and copy only the compatible data so that the copy succeeds. See Copy Activity fault tolerance for more details.
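The skip-and-log behavior can be pictured as a copy loop that diverts bad rows instead of aborting. A local sketch, assuming a per-row validator stands in for the sink-side compatibility checks that ADF performs:

```python
def copy_with_fault_tolerance(rows, is_compatible, skip_incompatible=True):
    """Copy compatible rows; either skip-and-log or fail fast on bad ones."""
    copied, skipped = [], []
    for row in rows:
        if is_compatible(row):
            copied.append(row)
        elif skip_incompatible:
            # In ADF, skipped rows can be redirected to a log in Blob storage
            # via "redirectIncompatibleRowSettings".
            skipped.append(row)
        else:
            # Default behavior: the first incompatible row fails the copy.
            raise ValueError(f"incompatible row: {row!r}")
    return copied, skipped

rows = [{"id": 1}, {"id": "oops"}, {"id": 3}]
copied, skipped = copy_with_fault_tolerance(
    rows, lambda r: isinstance(r["id"], int))
```

With skip_incompatible=False the loop reproduces the default fail-fast behavior; with it set to True, the run completes and the skipped count surfaces as rowsSkipped in the activity output.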

Performance and tuning

See the Copy Activity performance and tuning guide, which describes key factors that affect the performance of data movement (Copy Activity) in Azure Data Factory. It also lists the observed performance during internal testing and discusses various ways to optimize the performance of Copy Activity.

In some cases, when you execute a copy activity in ADF, you will see "Performance tuning tips" at the top of the copy activity monitoring page, as shown in the following example. The tips not only tell you the bottleneck identified for the given copy run, but also guide you on what to change to boost copy throughput. They currently provide suggestions such as using PolyBase when copying data into Azure SQL Data Warehouse, increasing Azure Cosmos DB RUs or Azure SQL DB DTUs when the resource on the data store side is the bottleneck, and removing unnecessary staged copies. The performance tuning rules will be gradually enriched over time.

Example: copy into Azure SQL DB with performance tuning tips

In this sample, ADF notices during the copy run that the sink Azure SQL Database has reached high DTU utilization, which slows down the write operations. The suggestion is to increase the Azure SQL Database tier to get more DTUs.

Copy monitoring with performance tuning tips

Incremental copy

Data Factory supports scenarios for incrementally copying delta data from a source data store to a destination data store. See Tutorial: incrementally copy data.
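The delta pattern in the tutorial rests on a high-water mark: each run copies only the rows changed since the last recorded watermark, then advances it. A minimal sketch under that assumption (field names here are hypothetical):

```python
def incremental_copy(source_rows, watermark):
    """Copy rows modified after `watermark`; return the delta and the new watermark."""
    delta = [r for r in source_rows if r["modified"] > watermark]
    # Advance the watermark to the latest change seen; keep it if nothing changed.
    new_watermark = max((r["modified"] for r in delta), default=watermark)
    return delta, new_watermark

rows = [{"id": 1, "modified": 5}, {"id": 2, "modified": 9}]
delta, wm = incremental_copy(rows, watermark=5)
# delta contains only id 2; the watermark advances to 9
```

In ADF, the watermark is typically persisted in a control table and passed into the copy source query via pipeline parameters, so each scheduled run picks up where the previous one left off.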

Read and write partitioned data

In version 1, Azure Data Factory supported reading and writing partitioned data by using the SliceStart/SliceEnd/WindowStart/WindowEnd system variables. In the current version, you can achieve this behavior by using a pipeline parameter and the trigger's start time or scheduled time as the value of that parameter. For more information, see How to read or write partitioned data.
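For example, the trigger's scheduled time can be passed into a pipeline parameter and used to compose a partition folder path. The year/month/day/hour layout below is a hypothetical convention, not something ADF mandates:

```python
from datetime import datetime

def partition_path(base, scheduled_time):
    """Build a year/month/day/hour folder path from a trigger's scheduled time."""
    t = scheduled_time
    return f"{base}/{t.year:04d}/{t.month:02d}/{t.day:02d}/{t.hour:02d}"

print(partition_path("raw/events", datetime(2018, 1, 17, 15)))
# → raw/events/2018/01/17/15
```

In an actual pipeline, the same composition would be written as a dataset folder-path expression over the parameter rather than in Python.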

Next steps

See the following quickstarts, tutorials, and samples: