Move data from DB2 using Azure Data Factory

This article outlines how you can use the Copy Activity in an Azure data factory to copy data from an on-premises DB2 database to any data store listed under the Sink column in the supported sources and sinks table. This article builds on the data movement activities article, which presents a general overview of data movement with the copy activity and the supported data store combinations.

Data Factory currently supports only moving data from a DB2 database to supported sink data stores; it does not support moving data from other data stores to a DB2 database.

Prerequisites

Data Factory supports connecting to an on-premises DB2 database by using the Data Management Gateway. See the Data Management Gateway article to learn about the gateway, and the Move data from on-premises to cloud article for step-by-step instructions on setting up the gateway and a data pipeline to move data.

A gateway is required even if the DB2 database is hosted in an Azure IaaS VM. You can install the gateway on the same IaaS VM as the data store or on a different VM, as long as the gateway can connect to the database.

The Data Management Gateway provides a built-in DB2 driver, so you don't need to manually install a driver when copying data from DB2.

Note

See Troubleshoot gateway issues for tips on troubleshooting connection/gateway related issues.

Supported versions

This DB2 connector supports the following IBM DB2 platforms and versions with Distributed Relational Database Architecture (DRDA) SQL Access Manager (SQLAM) versions 9, 10, and 11:

  • IBM DB2 for z/OS 11.1
  • IBM DB2 for z/OS 10.1
  • IBM DB2 for i 7.2
  • IBM DB2 for i 7.1
  • IBM DB2 for LUW 11
  • IBM DB2 for LUW 10.5
  • IBM DB2 for LUW 10.1

Tip

If you receive an error message that states "The package corresponding to an SQL statement execution request was not found. SQLSTATE=51002 SQLCODE=-805", the reason is that a required package has not been created for the normal user on that operating system. Follow the instructions below for your DB2 server type:

  • DB2 for i (AS400): have a power user create a collection for the login user before using the copy activity. Command: create collection <username>
  • DB2 for z/OS or LUW: use a high-privilege account (a power user or admin with package authorities and BIND, BINDADD, and GRANT EXECUTE TO PUBLIC permissions) to run the copy activity once; the required package is automatically created during the copy. Afterwards, you can switch back to the normal user for subsequent copy runs.

Getting started

You can create a pipeline with a copy activity that moves data from an on-premises DB2 data store by using different tools/APIs.

  • The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.
  • You can also use the following tools to create a pipeline: Azure portal, Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.

Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:

  1. Create linked services to link input and output data stores to your data factory.
  2. Create datasets to represent input and output data for the copy operation.
  3. Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.

When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For a sample with JSON definitions for Data Factory entities that are used to copy data from an on-premises DB2 data store, see JSON example: Copy data from DB2 to Azure Blob section of this article.

The following sections provide details about JSON properties that are used to define Data Factory entities specific to a DB2 data store:

Linked service properties

The following table describes the JSON elements specific to the DB2 linked service.

Property | Description | Required
--- | --- | ---
type | The type property must be set to: OnPremisesDb2 | Yes
server | Name of the DB2 server. | Yes
database | Name of the DB2 database. | Yes
schema | Name of the schema in the database. The schema name is case-sensitive. | No
authenticationType | Type of authentication used to connect to the DB2 database. Possible values are: Anonymous, Basic, and Windows. | Yes
username | Specify the user name if you are using Basic or Windows authentication. | No
password | Specify the password for the user account you specified for the username. | No
gatewayName | Name of the gateway that the Data Factory service should use to connect to the on-premises DB2 database. | Yes

Dataset properties

For a full list of sections & properties available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (Azure SQL, Azure blob, Azure table, etc.).

The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for a dataset of type RelationalTable (which includes the DB2 dataset) has the following properties.

Property | Description | Required
--- | --- | ---
tableName | Name of the table in the DB2 database instance that the linked service refers to. The tableName is case-sensitive. | No (if query of RelationalSource is specified)
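If you copy an entire table by specifying tableName instead of a query on the copy source, the typeProperties section names the table directly. The following is a minimal sketch; the table name MYTABLE is hypothetical, and its casing must match the table name in DB2.

{
    "name": "Db2DataSet",
    "properties": {
        "type": "RelationalTable",
        "linkedServiceName": "OnPremDb2LinkedService",
        "typeProperties": {
            "tableName": "MYTABLE"
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}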

Copy activity properties

For a full list of sections & properties available for defining activities, see the Creating Pipelines article. Properties such as name, description, input and output tables, and policies are available for all types of activities.

In contrast, the properties available in the typeProperties section of the activity vary with each activity type. For the Copy activity, they vary depending on the types of sources and sinks.

For the Copy activity, when the source is of type RelationalSource (which includes DB2), the following properties are available in the typeProperties section:

Property | Description | Allowed values | Required
--- | --- | --- | ---
query | Use the custom query to read data. | SQL query string. For example: "query": "select * from \"MySchema\".\"MyTable\"" | No (if tableName of dataset is specified)
Note

Schema and table names are case-sensitive. Enclose the names in "" (double quotes) in the query, and escape the quotes as \" within the JSON query string.

Example:

"query": "select * from "DB2ADMIN"."Customers""

JSON example: Copy data from DB2 to Azure Blob

This example provides sample JSON definitions that you can use to create a pipeline by using the Azure portal, Visual Studio, or Azure PowerShell. It shows how to copy data from a DB2 database to Azure Blob storage. However, data can be copied to any of the sinks stated here by using the Copy Activity in Azure Data Factory.

The sample has the following data factory entities:

  1. A linked service of type OnPremisesDb2.
  2. A linked service of type AzureStorage.
  3. An input dataset of type RelationalTable.
  4. An output dataset of type AzureBlob.
  5. A pipeline with Copy Activity that uses RelationalSource and BlobSink.

The sample copies data from a query result in a DB2 database to an Azure blob hourly. The JSON properties used in these samples are described in the sections preceding the samples.

As a first step, install and configure a Data Management Gateway. Instructions are in the Move data between on-premises locations and cloud article.

DB2 linked service:

{
    "name": "OnPremDb2LinkedService",
    "properties": {
        "type": "OnPremisesDb2",
        "typeProperties": {
            "server": "<server>",
            "database": "<database>",
            "schema": "<schema>",
            "authenticationType": "<authentication type>",
            "username": "<username>",
            "password": "<password>",
            "gatewayName": "<gatewayName>"
        }
    }
}

Azure Blob storage linked service:

{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorageLinkedService",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<AccountName>;AccountKey=<AccountKey>"
        }
    }
}

DB2 input dataset:

The sample assumes you have created a table “MyTable” in DB2 and it contains a column called “timestamp” for time series data.

Setting “external”: true informs the Data Factory service that this dataset is external to the data factory and is not produced by an activity in the data factory. Notice that the type is set to RelationalTable.

{
    "name": "Db2DataSet",
    "properties": {
        "type": "RelationalTable",
        "linkedServiceName": "OnPremDb2LinkedService",
        "typeProperties": {},
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {
            "externalData": {
                "retryInterval": "00:01:00",
                "retryTimeout": "00:10:00",
                "maximumRetry": 3
            }
        }
    }
}

Azure Blob output dataset:

Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated based on the start time of the slice that is being processed. The folder path uses year, month, day, and hours parts of the start time.

{
    "name": "AzureBlobDb2DataSet",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "mycontainer/db2/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "\t"
            },
            "partitionedBy": [
                {
                    "name": "Year",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "yyyy"
                    }
                },
                {
                    "name": "Month",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "MM"
                    }
                },
                {
                    "name": "Day",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "dd"
                    }
                },
                {
                    "name": "Hour",
                    "value": {
                        "type": "DateTime",
                        "date": "SliceStart",
                        "format": "HH"
                    }
                }
            ]
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

Pipeline with Copy activity:

The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to RelationalSource and sink type is set to BlobSink. The SQL query specified for the query property selects the data from the Orders table.

{
    "name": "CopyDb2ToBlob",
    "properties": {
        "description": "pipeline for copy activity",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "RelationalSource",
                        "query": "select * from \"Orders\""
                    },
                    "sink": {
                        "type": "BlobSink"
                    }
                },
                "inputs": [
                    {
                        "name": "Db2DataSet"
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureBlobDb2DataSet"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "Db2ToBlob"
            }
        ],
        "start": "2014-06-01T18:00:00Z",
        "end": "2014-06-01T19:00:00Z"
    }
}

Type mapping for DB2

As mentioned in the data movement activities article, the Copy activity performs automatic type conversions from source types to sink types with the following two-step approach:

  1. Convert from native source types to .NET type
  2. Convert from .NET type to native sink type

When moving data from DB2, the following mappings are used from the DB2 type to the .NET type.

DB2 Database type | .NET Framework type
--- | ---
SmallInt | Int16
Integer | Int32
BigInt | Int64
Real | Single
Double | Double
Float | Double
Decimal | Decimal
DecimalFloat | Decimal
Numeric | Decimal
Date | DateTime
Time | TimeSpan
Timestamp | DateTime
Xml | Byte[]
Char | String
VarChar | String
LongVarChar | String
DB2DynArray | String
Binary | Byte[]
VarBinary | Byte[]
LongVarBinary | Byte[]
Graphic | String
VarGraphic | String
LongVarGraphic | String
Clob | String
Blob | Byte[]
DbClob | String

Map source to sink columns

To learn about mapping columns in source dataset to columns in sink dataset, see Mapping dataset columns in Azure Data Factory.
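As a brief, hedged sketch of what such a mapping can look like in this scenario, a translator of type TabularTranslator on the Copy activity maps source columns to sink columns. The column names below (ORDERID, ORDERDATE, OrderId, OrderDate) are hypothetical; see the linked article for the authoritative syntax.

"typeProperties": {
    "source": {
        "type": "RelationalSource",
        "query": "select * from \"Orders\""
    },
    "sink": {
        "type": "BlobSink"
    },
    "translator": {
        "type": "TabularTranslator",
        "ColumnMappings": "ORDERID: OrderId, ORDERDATE: OrderDate"
    }
}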

Repeatable read from relational sources

When copying data from relational data stores, keep repeatability in mind to avoid unintended outcomes. In Azure Data Factory, you can rerun a slice manually. You can also configure retry policy for a dataset so that a slice is rerun when a failure occurs. When a slice is rerun in either way, you need to make sure that the same data is read no matter how many times a slice is run. See Repeatable read from relational sources.
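For example, one common approach described in that article is to bound the query to the slice window by using the WindowStart and WindowEnd system variables with $$Text.Format, so that each rerun reads the same rows. The sketch below is illustrative only; it assumes the Orders table has a timestamp column and that the formatted date-time literals are valid for your DB2 column type.

"source": {
    "type": "RelationalSource",
    "query": "$$Text.Format('select * from \"Orders\" where \"timestamp\" >= \\'{0:yyyy-MM-dd HH:mm}\\' AND \"timestamp\" < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
}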

Performance and Tuning

See Copy Activity Performance & Tuning Guide to learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it.