Copy data from MongoDB using Azure Data Factory

This article outlines how to use the Copy Activity in Azure Data Factory to copy data from a MongoDB database. It builds on the copy activity overview article that presents a general overview of copy activity.

Important

ADF has released this new version of the MongoDB connector, which provides better native MongoDB support. If your solution uses the previous MongoDB connector, which remains supported as-is for backward compatibility, refer to the MongoDB connector (legacy) article.

Supported capabilities

You can copy data from a MongoDB database to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.

Specifically, this MongoDB connector supports versions up to 3.4.

Prerequisites

If your data store is configured in one of the following ways, you need to set up a Self-hosted Integration Runtime in order to connect to this data store:

  • The data store is located inside an on-premises network, inside Azure Virtual Network, or inside Amazon Virtual Private Cloud.
  • The data store is a managed cloud data service where the access is restricted to IPs whitelisted in the firewall rules.

Getting started

You can use the copy activity with a pipeline through any of the tools or SDKs that Azure Data Factory supports.

The following sections provide details about properties that are used to define Data Factory entities specific to the MongoDB connector.

Linked service properties

The following properties are supported for the MongoDB linked service:

| Property | Description | Required |
|---|---|---|
| type | The type property must be set to: MongoDbV2 | Yes |
| connectionString | Specify the MongoDB connection string, e.g. mongodb://[username:password@]host[:port][/[database][?options]]. Refer to the MongoDB manual on connection strings for more details. Mark this field as a SecureString type to store it securely in Data Factory. You can also reference a secret stored in Azure Key Vault. | Yes |
| database | Name of the database that you want to access. | Yes |
| connectVia | The Integration Runtime to be used to connect to the data store. Learn more from the Prerequisites section. If not specified, the default Azure Integration Runtime is used. | No |

Example:

{
    "name": "MongoDBLinkedService",
    "properties": {
        "type": "MongoDbV2",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "mongodb://[username:password@]host[:port][/[database][?options]]"
            },
            "database": "myDatabase"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
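The property description above notes that the connection string can also reference a secret stored in Azure Key Vault. A minimal sketch of that variant follows, assuming the usual Data Factory AzureKeyVaultSecret reference shape. The Key Vault linked service name and the secret name are placeholders, not values from this article.

```json
{
    "name": "MongoDBLinkedService",
    "properties": {
        "type": "MongoDbV2",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "<Azure Key Vault linked service name>",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<name of the secret holding the connection string>"
            },
            "database": "myDatabase"
        }
    }
}
```

Because connectVia is omitted here, the default Azure Integration Runtime would be used. Add the connectVia block if your data store requires a Self-hosted Integration Runtime.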

Dataset properties

For a full list of sections and properties that are available for defining datasets, see Datasets and linked services. The following properties are supported for the MongoDB dataset:

| Property | Description | Required |
|---|---|---|
| type | The type property of the dataset must be set to: MongoDbV2Collection | Yes |
| collectionName | Name of the collection in the MongoDB database. | Yes |

Example:

{
    "name": "MongoDbDataset",
    "properties": {
        "type": "MongoDbV2Collection",
        "typeProperties": {
            "collectionName": "<Collection name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<MongoDB linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}

Copy activity properties

For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the MongoDB source.

MongoDB as source

The following properties are supported in the copy activity source section:

| Property | Description | Required |
|---|---|---|
| type | The type property of the copy activity source must be set to: MongoDbV2Source | Yes |
| filter | Specifies the selection filter using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}). | No |
| cursorMethods.project | Specifies the fields to return in the documents for projection. To return all fields in the matching documents, omit this parameter. | No |
| cursorMethods.sort | Specifies the order in which the query returns matching documents. Refer to cursor.sort(). | No |
| cursorMethods.limit | Specifies the maximum number of documents the server returns. Refer to cursor.limit(). | No |
| cursorMethods.skip | Specifies the number of documents to skip and from where MongoDB begins to return results. Refer to cursor.skip(). | No |
| batchSize | Specifies the number of documents to return in each batch of the response from the MongoDB instance. In most cases, modifying the batch size will not affect the user or the application. Cosmos DB limits each batch to 40 MB, measured as the combined size of the batchSize documents, so decrease this value if your documents are large. | No (the default is 100) |
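As a rough illustration of sizing batchSize against the 40 MB batch limit mentioned above: assuming documents average about 1 MB each (a hypothetical figure), the default of 100 would produce batches of roughly 100 MB, so a value of 40 or lower would be needed. The source fragment below uses 30 to leave some headroom. The number is an assumption for this scenario, not a recommended setting.

```json
{
    "source": {
        "type": "MongoDbV2Source",
        "batchSize": 30
    }
}
```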

Tip

ADF supports consuming BSON documents in Strict mode. Make sure your filter query is in Strict mode instead of Shell mode. For more details, see the MongoDB manual.

Example:

"activities":[
    {
        "name": "CopyFromMongoDB",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<MongoDB input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "MongoDbV2Source",
                "filter": "{datetimeData: {$gte: ISODate(\"2018-12-11T00:00:00.000Z\"),$lt: ISODate(\"2018-12-12T00:00:00.000Z\")}, _id: ObjectId(\"5acd7c3d0000000000000000\") }",
                "cursorMethods": {
                    "project": "{ _id : 1, name : 1, age: 1, datetimeData: 1 }",
                    "sort": "{ age : 1 }",
                    "skip": 3,
                    "limit": 3
                }
            },
            "sink": {
                "type": "<sink type>"
            }
        }
    }
]
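The filter in the example above uses Shell mode helpers (ISODate, ObjectId). Following the earlier tip, the same filter can be sketched in Strict mode using the MongoDB Extended JSON $date and $oid wrappers. This is an illustrative rewrite rather than a tested pipeline. Note that the inner double quotes must be escaped because the filter is itself a JSON string.

```json
{
    "source": {
        "type": "MongoDbV2Source",
        "filter": "{\"datetimeData\": {\"$gte\": {\"$date\": \"2018-12-11T00:00:00.000Z\"}, \"$lt\": {\"$date\": \"2018-12-12T00:00:00.000Z\"}}, \"_id\": {\"$oid\": \"5acd7c3d0000000000000000\"}}"
    }
}
```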

Export JSON documents as-is

You can use this MongoDB connector to export JSON documents as-is from a MongoDB collection to various file-based stores or to Azure Cosmos DB. To achieve such a schema-agnostic copy, skip the "structure" (also called schema) section in the dataset and the schema mapping in the copy activity.

Schema mapping

To copy data from MongoDB to a tabular sink, refer to schema mapping.

Next steps

For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.