Copy data from Netezza by using Azure Data Factory

This article outlines how to use Copy Activity in Azure Data Factory to copy data from Netezza. The article builds on Copy Activity in Azure Data Factory, which presents a general overview of Copy Activity.

Tip

For data migration scenario from Netezza to Azure, learn more from Use Azure Data Factory to migrate data from on-premises Netezza server to Azure.

Supported capabilities

This Netezza connector is supported for the following activities:

You can copy data from Netezza to any supported sink data store. For a list of data stores that Copy Activity supports as sources and sinks, see Supported data stores and formats.

Netezza connector supports parallel copying from source. See the Parallel copy from Netezza section for details.

Azure Data Factory provides a built-in driver to enable connectivity. You don't need to manually install any driver to use this connector.

Prerequisites

If your data store is configured in one of the following ways, you need to set up a Self-hosted Integration Runtime in order to connect to this data store:

  • The data store is located inside an on-premises network, inside Azure Virtual Network, or inside Amazon Virtual Private Cloud.
  • The data store is a managed cloud data service where the access is restricted to IPs whitelisted in the firewall rules.

Get started

You can create a pipeline that uses a copy activity by using the .NET SDK, the Python SDK, Azure PowerShell, the REST API, or an Azure Resource Manager template. See the Copy Activity tutorial for step-by-step instructions on how to create a pipeline that has a copy activity.

The following sections provide details about properties you can use to define Data Factory entities that are specific to the Netezza connector.

Linked service properties

The following properties are supported for the Netezza linked service:

Property Description Required
type The type property must be set to Netezza. Yes
connectionString An ODBC connection string to connect to Netezza.
Mark this field as a SecureString to store it securely in Data Factory. You can also put password in Azure Key Vault and pull the pwd configuration out of the connection string. Refer to the following samples and Store credentials in Azure Key Vault article with more details.
Yes
connectVia The Integration Runtime to use to connect to the data store. Learn more from Prerequisites section. If not specified, the default Azure Integration Runtime is used. No

A typical connection string is Server=<server>;Port=<port>;Database=<database>;UID=<user name>;PWD=<password>. The following table describes more properties that you can set:

Property Description Required
SecurityLevel The level of security (SSL/TLS) that the driver uses for the connection to the data store. Example: SecurityLevel=preferredSecured. Supported values are:
- Only unsecured (onlyUnSecured): The driver doesn't use SSL.
- Preferred unsecured (preferredUnSecured) (default): If the server provides a choice, the driver doesn't use SSL.
- Preferred secured (preferredSecured): If the server provides a choice, the driver uses SSL.
- Only secured (onlySecured): The driver doesn't connect unless an SSL connection is available.
No
CaCertFile The full path to the SSL certificate that's used by the server. Example: CaCertFile=<cert path>; Yes, if SSL is enabled

Example

{
    "name": "NetezzaLinkedService",
    "properties": {
        "type": "Netezza",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "Server=<server>;Port=<port>;Database=<database>;UID=<user name>;PWD=<password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Example: store password in Azure Key Vault

{
    "name": "NetezzaLinkedService",
    "properties": {
        "type": "Netezza",
        "typeProperties": {
            "connectionString": {
                 "type": "SecureString",
                 "value": "Server=<server>;Port=<port>;Database=<database>;UID=<user name>;"
            },
            "pwd": { 
                "type": "AzureKeyVaultSecret", 
                "store": { 
                    "referenceName": "<Azure Key Vault linked service name>", 
                    "type": "LinkedServiceReference" 
                }, 
                "secretName": "<secretName>" 
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Dataset properties

This section provides a list of properties that the Netezza dataset supports.

For a full list of sections and properties that are available for defining datasets, see Datasets.

To copy data from Netezza, set the type property of the dataset to NetezzaTable. The following properties are supported:

Property Description Required
type The type property of the dataset must be set to: NetezzaTable Yes
schema Name of the schema. No (if "query" in activity source is specified)
table Name of the table. No (if "query" in activity source is specified)
tableName Name of the table with schema. This property is supported for backward compatibility. Use schema and table for new workload. No (if "query" in activity source is specified)

Example

{
    "name": "NetezzaDataset",
    "properties": {
        "type": "NetezzaTable",
        "linkedServiceName": {
            "referenceName": "<Netezza linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {}
    }
}

Copy Activity properties

This section provides a list of properties that the Netezza source supports.

For a full list of sections and properties that are available for defining activities, see Pipelines.

Netezza as source

Tip

To load data from Netezza efficiently by using data partitioning, learn more from Parallel copy from Netezza section.

To copy data from Netezza, set the source type in Copy Activity to NetezzaSource. The following properties are supported in the Copy Activity source section:

Property Description Required
type The type property of the Copy Activity source must be set to NetezzaSource. Yes
query Use the custom SQL query to read data. Example: "SELECT * FROM MyTable" No (if "tableName" in dataset is specified)
partitionOptions Specifies the data partitioning options used to load data from Netezza.
Allow values are: None (default), DataSlice, and DynamicRange.
When a partition option is enabled (that is, not None), the degree of parallelism to concurrently load data from a Netezza database is controlled by parallelCopies setting on the copy activity.
No
partitionSettings Specify the group of the settings for data partitioning.
Apply when partition option isn't None.
No
partitionColumnName Specify the name of the source column in integer type that will be used by range partitioning for parallel copy. If not specified, the primary key of the table is autodetected and used as the partition column.
Apply when the partition option is DynamicRange. If you use a query to retrieve the source data, hook ?AdfRangePartitionColumnName in WHERE clause. See example in Parallel copy from Netezza section.
No
partitionUpperBound The maximum value of the partition column to copy data out.
Apply when partition option is DynamicRange. If you use query to retrieve source data, hook ?AdfRangePartitionUpbound in the WHERE clause. For an example, see the Parallel copy from Netezza section.
No
partitionLowerBound The minimum value of the partition column to copy data out.
Apply when the partition option is DynamicRange. If you use a query to retrieve the source data, hook ?AdfRangePartitionLowbound in the WHERE clause. For an example, see the Parallel copy from Netezza section.
No

Example:

"activities":[
    {
        "name": "CopyFromNetezza",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<Netezza input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "NetezzaSource",
                "query": "SELECT * FROM MyTable"
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]

Parallel copy from Netezza

The Data Factory Netezza connector provides built-in data partitioning to copy data from Netezza in parallel. You can find data partitioning options on the Source table of the copy activity.

Screenshot of partition options

When you enable partitioned copy, Data Factory runs parallel queries against your Netezza source to load data by partitions. The parallel degree is controlled by the parallelCopies setting on the copy activity. For example, if you set parallelCopies to four, Data Factory concurrently generates and runs four queries based on your specified partition option and settings, and each query retrieves a portion of data from your Netezza database.

You are suggested to enable parallel copy with data partitioning especially when you load large amount of data from your Netezza database. The following are suggested configurations for different scenarios. When copying data into file-based data store, it's recommanded to write to a folder as multiple files (only specify folder name), in which case the performance is better than writing to a single file.

Scenario Suggested settings
Full load from large table. Partition option: Data Slice.

During execution, Data Factory automatically partitions the data based on Netezza's built-in data slices, and copies data by partitions.
Load large amount of data by using a custom query. Partition option: Data Slice.
Query: SELECT * FROM <TABLENAME> WHERE mod(datasliceid, ?AdfPartitionCount) = ?AdfDataSliceCondition AND <your_additional_where_clause>.
During execution, Data Factory replaces ?AdfPartitionCount (with parallel copy number set on copy activity) and ?AdfDataSliceCondition with the data slice partition logic, and sends to Netezza.
Load large amount of data by using a custom query, having an integer column with evenly distributed value for range partitioning. Partition options: Dynamic range partition.
Query: SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>.
Partition column: Specify the column used to partition data. You can partition against the column with integer data type.
Partition upper bound and partition lower bound: Specify if you want to filter against the partition column to retrieve data only between the lower and upper range.

During execution, Data Factory replaces ?AdfRangePartitionColumnName, ?AdfRangePartitionUpbound, and ?AdfRangePartitionLowbound with the actual column name and value ranges for each partition, and sends to Netezza.
For example, if your partition column "ID" set with the lower bound as 1 and the upper bound as 80, with parallel copy set as 4, Data Factory retrieves data by 4 partitions. Their IDs are between [1,20], [21, 40], [41, 60], and [61, 80], respectively.

Example: query with data slice partition

"source": {
    "type": "NetezzaSource",
    "query": "SELECT * FROM <TABLENAME> WHERE mod(datasliceid, ?AdfPartitionCount) = ?AdfDataSliceCondition AND <your_additional_where_clause>",
    "partitionOption": "DataSlice"
}

Example: query with dynamic range partition

"source": {
    "type": "NetezzaSource",
    "query": "SELECT * FROM <TABLENAME> WHERE ?AdfRangePartitionColumnName <= ?AdfRangePartitionUpbound AND ?AdfRangePartitionColumnName >= ?AdfRangePartitionLowbound AND <your_additional_where_clause>",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
        "partitionColumnName": "<dynamic_range_partition_column_name>",
        "partitionUpperBound": "<upper_value_of_partition_column>",
        "partitionLowerBound": "<lower_value_of_partition_column>"
    }
}

Lookup activity properties

To learn details about the properties, check Lookup activity.

Next steps

For a list of data stores that Copy Activity supports as sources and sinks in Azure Data Factory, see Supported data stores and formats.