Quickstart: Azure Cosmos DB for MongoDB for Python with MongoDB driver

APPLIES TO: MongoDB

Get started with the PyMongo package to create databases, collections, and documents within your Azure Cosmos DB resource. Follow these steps to install the package and try out example code for basic tasks.

Note

The example code snippets are available on GitHub as a Python project.

In this quickstart, you'll communicate with the Azure Cosmos DB’s API for MongoDB by using one of the open-source MongoDB client drivers for Python, PyMongo. Also, you'll use the MongoDB extension commands, which are designed to help you create and obtain database resources that are specific to the Azure Cosmos DB capacity model.

Prerequisites

Prerequisite check

  • In a terminal or command window, run python --version to check that you have a recent version of Python.
  • Run az --version (Azure CLI) or Get-Module -ListAvailable Az* (Azure PowerShell) to check that you have the appropriate Azure command-line tools installed.

Setting up

This section walks you through creating an Azure Cosmos DB account and setting up a project that uses the PyMongo package.

Create an Azure Cosmos DB account

In this quickstart, you create a single Azure Cosmos DB account that uses the API for MongoDB.

  1. Create shell variables for accountName, resourceGroupName, and location.

    # Variable for resource group name
    resourceGroupName="msdocs-cosmos-quickstart-rg"
    location="westus"
    
    # Variable for account name with a randomly generated suffix
    let suffix=$RANDOM*$RANDOM
    accountName="msdocs-$suffix"
    
  2. If you haven't already, sign in to the Azure CLI using the az login command.

  3. Use the az group create command to create a new resource group in your subscription.

    az group create \
        --name $resourceGroupName \
        --location $location
    
  4. Use the az cosmosdb create command to create a new Azure Cosmos DB for MongoDB account with default settings.

    az cosmosdb create \
        --resource-group $resourceGroupName \
        --name $accountName \
        --locations regionName=$location \
        --kind MongoDB
    

Get MongoDB connection string

  1. Find the API for MongoDB connection string from the list of connection strings for the account with the az cosmosdb keys list command.

    az cosmosdb keys list --type connection-strings \
        --resource-group $resourceGroupName \
        --name $accountName 
    
  2. Record the primary MongoDB connection string value. You'll use it later to connect from your app.
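If you want to pull the value out programmatically instead of copying it by hand, the following minimal Python sketch parses the command's JSON output. It assumes the output contains a connectionStrings array whose entries have connectionString and description fields, and that the primary entry is described as "Primary MongoDB Connection String"; verify both against your own output. The pick_connection_string function and the sample data are hypothetical.

```python
import json


def pick_connection_string(
    keys_json: str, description: str = "Primary MongoDB Connection String"
) -> str:
    """Return the connection string whose description matches.

    Assumed JSON shape (check against your real CLI output):
    {"connectionStrings": [{"connectionString": "...", "description": "..."}, ...]}
    """
    data = json.loads(keys_json)
    for entry in data.get("connectionStrings", []):
        if entry.get("description") == description:
            return entry["connectionString"]
    raise KeyError(f"no connection string described as {description!r}")


# Made-up sample in the assumed shape; real output has more entries and real keys.
sample_output = json.dumps(
    {
        "connectionStrings": [
            {"connectionString": "mongodb://primary-example",
             "description": "Primary MongoDB Connection String"},
            {"connectionString": "mongodb://secondary-example",
             "description": "Secondary MongoDB Connection String"},
        ]
    }
)
print(pick_connection_string(sample_output))  # mongodb://primary-example
```

To try it on real output, save the command's output to a file (for example, by appending > keys.json to the az command) and pass the file's contents to the function. The CLI's global --query argument, which takes a JMESPath expression, is an alternative that avoids Python entirely.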

Create a new Python app

  1. Create a new empty folder using your preferred terminal and change directory to the folder.

    Note

    If you just want the finished code, download or fork and clone the example code snippets repo that has the full example. You can also git clone the repo in Azure Cloud Shell to walk through the steps shown in this quickstart.

  2. Create a requirements.txt file that lists the PyMongo and python-dotenv packages.

    # requirements.txt
    pymongo
    python-dotenv
    
  3. Create a virtual environment and install the packages.

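    # py -3 uses the global Python interpreter on Windows. You can also use python3 -m venv .venv.
    py -3 -m venv .venv
    # Activation path for Git Bash on Windows; on macOS or Linux, run: source .venv/bin/activate
    source .venv/Scripts/activate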
    pip install -r requirements.txt
    

Configure environment variables

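To use the connection string in your code, set it as an environment variable in the local environment that runs the application. Use your preferred terminal to run one of the following commands.

PowerShell:

$env:COSMOS_CONNECTION_STRING = "<cosmos-connection-string>"

Bash:

export COSMOS_CONNECTION_STRING="<cosmos-connection-string>"

Alternatively, because the app calls load_dotenv(), you can add the line COSMOS_CONNECTION_STRING="<cosmos-connection-string>" to a .env file in the project directory.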
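The app reads this variable with os.environ.get, which returns None when the variable is missing. pymongo.MongoClient(None) then falls back to its default host (localhost), so a missing variable tends to surface later as a server-selection timeout rather than a clear error. The following sketch shows a fail-fast check; get_connection_string is a hypothetical helper, not part of the sample project.

```python
import os
from typing import Mapping, Optional


def get_connection_string(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "COSMOS_CONNECTION_STRING",
) -> str:
    """Return the connection string from the environment, or fail with a clear error."""
    env = os.environ if env is None else env  # default to the real process environment
    value = (env.get(var_name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; set it in your shell or in a .env file before running run.py"
        )
    return value


# Usage with an explicit mapping; in run.py, call get_connection_string() to read os.environ.
print(get_connection_string({"COSMOS_CONNECTION_STRING": "mongodb://example"}))  # mongodb://example
```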

Object model

Let's look at the hierarchy of resources in the API for MongoDB and the object model that's used to create and access these resources. Azure Cosmos DB creates resources in a hierarchy that consists of accounts, databases, collections, and documents.

Diagram of the Azure Cosmos DB hierarchy: an account at the top contains two databases. One database contains two collections, and the other contains a single collection that holds three documents.

Each type of resource is represented by a Python class. Here are the most common classes:

  • MongoClient - The first step when working with PyMongo is to create a MongoClient to connect to Azure Cosmos DB's API for MongoDB. The client object is used to configure and execute requests against the service.

  • Database - Azure Cosmos DB's API for MongoDB can support one or more independent databases.

  • Collection - A database can contain one or more collections. A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database.

  • Document - A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection don't need to have the same set of fields or structure, and common fields in a collection's documents can hold different types of data.
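To make the dynamic-schema point concrete, the following sketch uses two hypothetical product documents that could live in the same collection: the second one lacks sale, adds color, and stores quantity as a string. The field_types helper is hypothetical plain Python (no database involved); it reports which types each field holds across the documents.

```python
from collections import defaultdict

# Two hypothetical documents from the same collection with different fields and types.
docs = [
    {"category": "gear-surf-surfboards", "name": "Yamba Surfboard", "quantity": 1, "sale": False},
    {"category": "gear-surf-surfboards", "name": "Kiama Surfboard", "quantity": "out of stock", "color": "blue"},
]


def field_types(documents):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


print(field_types(docs)["quantity"])  # ['int', 'str']
```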

To learn more about the hierarchy of entities, see the Azure Cosmos DB resource model article.

Code examples

The sample code described in this article creates a database named adventureworks with a collection named products. The products collection is designed to contain product details such as name, category, quantity, and a sale indicator. Each product also contains a unique identifier. The complete sample code is at https://github.com/Azure-Samples/azure-cosmos-db-mongodb-python-getting-started/tree/main/001-quickstart/.

In the following steps, the database doesn't use sharding, and the app is a synchronous application that uses the PyMongo driver. For asynchronous applications, use the Motor driver.

Authenticate the client

  1. In the project directory, create a run.py file. In your editor, add import statements to reference the packages you'll use, including the PyMongo and python-dotenv packages.

    import os
    import sys
    from random import randint
    
    import pymongo
    from dotenv import load_dotenv
    
  2. Get the connection string from the COSMOS_CONNECTION_STRING environment variable. The load_dotenv() call also loads variables from a .env file in the project directory, if one exists.

    load_dotenv()
    CONNECTION_STRING = os.environ.get("COSMOS_CONNECTION_STRING")
    
  3. Define constants you'll use in the code.

    DB_NAME = "adventureworks"
    COLLECTION_NAME = "products"
    

Connect to Azure Cosmos DB’s API for MongoDB

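Use the MongoClient object to connect to your Azure Cosmos DB for MongoDB resource. The MongoClient constructor returns a client object that you use to get references to databases. The client connects lazily, so connection errors surface on the first operation rather than when the client is created.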

client = pymongo.MongoClient(CONNECTION_STRING)
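The connection string embeds the account key, so avoid printing or logging it verbatim. The following minimal sketch masks the password before logging, assuming a standard mongodb://user:password@host URI. The example string is made up and only mimics the general shape, and redact_connection_string is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_connection_string(uri: str) -> str:
    """Return uri with the password (the account key) replaced by '***'."""
    parts = urlsplit(uri)
    # netloc looks like "user:password@host:port"; split on the last "@".
    userinfo, sep, hosts = parts.netloc.rpartition("@")
    if not sep or ":" not in userinfo:
        return uri  # no embedded password to hide
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:***@{hosts}"))


# Made-up connection string; not a real account or key.
example = (
    "mongodb://msdocs-123:FakeKey0123==@msdocs-123.mongo.cosmos.azure.com:10255/"
    "?ssl=true&replicaSet=globaldb&retrywrites=false"
)
print(redact_connection_string(example))
```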

Get database

Check whether the database exists with the list_database_names method. If it doesn't exist, use the create database extension command to create it with a specified provisioned throughput (400 RU/s in this example).

# Create database if it doesn't exist
db = client[DB_NAME]
if DB_NAME not in client.list_database_names():
    # Create a database with 400 RU throughput that can be shared across
    # the DB's collections
    db.command({"customAction": "CreateDatabase", "offerThroughput": 400})
    print("Created db '{}' with shared throughput.\n".format(DB_NAME))
else:
    print("Using database: '{}'.\n".format(DB_NAME))
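The check-then-create logic above can be wrapped in a reusable function that also tells the caller whether it created anything. This is a sketch: ensure_database is a hypothetical name, and it relies only on the list_database_names, item access, and command calls that the step above already uses, so it works with a pymongo.MongoClient or with a test double.

```python
def ensure_database(client, db_name, throughput=400):
    """Return (database, created), creating the database with shared throughput if missing.

    Uses the same CreateDatabase extension command as the step above.
    """
    db = client[db_name]
    if db_name in client.list_database_names():
        return db, False
    db.command({"customAction": "CreateDatabase", "offerThroughput": throughput})
    return db, True


# In run.py (requires a live connection):
# db, created = ensure_database(client, DB_NAME)
```

The check and the create aren't atomic, so if two processes run this at the same time, the second CreateDatabase command can fail; catching pymongo.errors.OperationFailure around the command is one way to tolerate that.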

Get collection

Check if the collection exists with the list_collection_names method. If the collection doesn't exist, use the create collection extension command to create it.

# Create collection if it doesn't exist
collection = db[COLLECTION_NAME]
if COLLECTION_NAME not in db.list_collection_names():
    # Create an unsharded collection that uses the database's shared throughput
    db.command(
        {"customAction": "CreateCollection", "collection": COLLECTION_NAME}
    )
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
else:
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))
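The CreateCollection command is just a dictionary, so a small builder makes the options explicit. In this sketch, build_create_collection_command is hypothetical, and the optional shardKey and offerThroughput fields are assumptions based on the Azure Cosmos DB extension-command reference (a shard key for a sharded collection, and dedicated throughput for the collection); confirm them there before relying on them. With neither option set, the command matches the one used above: an unsharded collection that shares the database's throughput.

```python
def build_create_collection_command(name, shard_key=None, throughput=None):
    """Build a CreateCollection extension-command document for db.command()."""
    command = {"customAction": "CreateCollection", "collection": name}
    if shard_key is not None:
        # Assumption: the shard key is given as a field path string, e.g. "category".
        command["shardKey"] = shard_key
    if throughput is not None:
        # Assumption: dedicated RU/s for this collection instead of the database's shared throughput.
        command["offerThroughput"] = throughput
    return command


print(build_create_collection_command("products"))
# {'customAction': 'CreateCollection', 'collection': 'products'}
```

You would pass the result to db.command(), as the step above does with a literal dictionary.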

Create an index

Create an index by using the update collection extension command. You can also set indexes in the create collection extension command. This example indexes the name property so that you can later sort by product name with the cursor's sort method.

indexes = [
    {"key": {"_id": 1}, "name": "_id_1"},
    # Ascending (1) single-field index on name, so queries can sort by name
    {"key": {"name": 1}, "name": "name_1"},
]
db.command(
    {
        "customAction": "UpdateCollection",
        "collection": COLLECTION_NAME,
        "indexes": indexes,
    }
)
print("Indexes are: {}\n".format(sorted(collection.index_information())))
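MongoDB's default index naming convention is <field>_<direction>, which is where the name_1 index in the sample output comes from. A small helper can build entries for the indexes list in that style and reject invalid directions: an ascending index uses 1 and a descending one uses -1, the same values as pymongo.ASCENDING and pymongo.DESCENDING. The build_index_spec function is a hypothetical plain-Python sketch.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def build_index_spec(field, direction=ASCENDING):
    """Build one entry for the "indexes" list of the UpdateCollection command."""
    if direction not in (ASCENDING, DESCENDING):
        raise ValueError(f"direction must be 1 or -1, got {direction!r}")
    return {"key": {field: direction}, "name": f"{field}_{direction}"}


indexes = [build_index_spec("_id"), build_index_spec("name")]
print(indexes)
# [{'key': {'_id': 1}, 'name': '_id_1'}, {'key': {'name': 1}, 'name': 'name_1'}]
```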

Create a document

Create a document with the product properties for the adventureworks database:

  • A category property. This property can be used as the logical partition key.
  • A name property.
  • An inventory quantity property.
  • A sale property, indicating whether the product is on sale.

# Create a new document and upsert (create or replace) it in the collection
product = {
    "category": "gear-surf-surfboards",
    "name": "Yamba Surfboard-{}".format(randint(50, 5000)),
    "quantity": 1,
    "sale": False,
}
result = collection.update_one(
    {"name": product["name"]}, {"$set": product}, upsert=True
)
print("Upserted document with _id {}\n".format(result.upserted_id))

Create a document in the collection by calling the collection-level update_one operation. In this example, you upsert the document instead of inserting it. An upsert isn't strictly necessary here because the product name is random, but it's good practice in case you run the code more than once and the same product name is generated.

When update_one inserts a new document, the result's upserted_id attribute contains the automatically generated _id value, which you can use in subsequent operations. If the filter matched an existing document instead, upserted_id is None.
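To make the upsert behavior concrete, here is a toy in-memory model (not PyMongo) of update_one with upsert=True and a filter on name. The upsert_by_name function is hypothetical, and uuid4 hex strings stand in for the ObjectId values that the service generates. Calling it twice with the same name shows why the second call updates instead of inserting, and why the returned id is None in that case.

```python
import uuid


def upsert_by_name(store, product):
    """Toy model of update_one({"name": ...}, {"$set": product}, upsert=True).

    Returns the generated _id when a document is inserted, or None when an
    existing document is updated, mirroring UpdateResult.upserted_id.
    """
    for doc in store:
        if doc["name"] == product["name"]:
            doc.update(product)  # like $set: overwrite only the given fields
            return None
    new_doc = {"_id": uuid.uuid4().hex, **product}  # stand-in for an ObjectId
    store.append(new_doc)
    return new_doc["_id"]


store = []
product = {"category": "gear-surf-surfboards", "name": "Yamba Surfboard-386", "quantity": 1, "sale": False}
first = upsert_by_name(store, product)
second = upsert_by_name(store, {**product, "quantity": 2})
print(first is not None, second, len(store))  # True None 1
```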

Get a document

Use the find_one method to get a document.

doc = collection.find_one({"_id": result.upserted_id})
print("Found a document with _id {}: {}\n".format(result.upserted_id, doc))

In Azure Cosmos DB, you can perform a less-expensive point read operation by using both the unique identifier (_id) and a partition key.
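A point read passes both values in the filter. The following sketch assumes the collection is sharded on category (the partition key suggested earlier); on the unsharded collection used in this quickstart, category is simply an extra equality condition. The point_read function is hypothetical; it only calls find_one, so it works with any PyMongo collection.

```python
def point_read(collection, doc_id, category):
    """Fetch one document by its _id plus its partition key value (category)."""
    return collection.find_one({"_id": doc_id, "category": category})


# In run.py (requires a live connection):
# doc = point_read(collection, result.upserted_id, "gear-surf-surfboards")
```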

Query documents

After you insert a document, you can run a query to get all documents that match a specific filter. This example finds all documents in the gear-surf-surfboards category. Call Collection.find with the filter to get a Cursor, and then call sort on the cursor to order the results by name.

# Query for documents in the collection
print("Products with category 'gear-surf-surfboards':\n")
allProductsQuery = {"category": "gear-surf-surfboards"}
for doc in collection.find(allProductsQuery).sort(
    "name", pymongo.ASCENDING
):
    print("Found a product with _id {}: {}\n".format(doc["_id"], doc))
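To see what the filter and sort do to a set of documents, here is a plain-Python equivalent of find({"category": ...}).sort("name", ASCENDING), assuming every document has a name field. The query_sorted function and the wetsuit document are hypothetical.

```python
def query_sorted(documents, category):
    """Plain-Python equivalent of a category filter followed by an ascending sort on name."""
    matches = [doc for doc in documents if doc.get("category") == category]
    return sorted(matches, key=lambda doc: doc["name"])


docs = [
    {"category": "gear-surf-surfboards", "name": "Yamba Surfboard-386"},
    {"category": "gear-surf-wetsuits", "name": "Hypothetical Wetsuit"},
    {"category": "gear-surf-surfboards", "name": "Yamba Surfboard-1200"},
]
print([doc["name"] for doc in query_sorted(docs, "gear-surf-surfboards")])
# ['Yamba Surfboard-1200', 'Yamba Surfboard-386']
```

String sorting is lexicographic, not numeric, so Yamba Surfboard-1200 sorts before Yamba Surfboard-386.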

Troubleshooting:

  • If you get an error such as "The index path corresponding to the specified order-by item is excluded.", make sure you created the index on the name property.
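Before sorting, you can inspect the output of collection.index_information(), which PyMongo returns as a dictionary that maps each index name to details, including a key list of (field, direction) pairs. This is a minimal sketch: has_single_field_index is a hypothetical helper, and the sample dictionary only mimics the shape of that output.

```python
def has_single_field_index(index_info: dict, field: str) -> bool:
    """Return True if index_info contains an index whose only key is `field`.

    index_info is expected in the shape returned by PyMongo's
    Collection.index_information(): {name: {"key": [(field, direction), ...], ...}}.
    """
    return any(
        len(spec["key"]) == 1 and spec["key"][0][0] == field
        for spec in index_info.values()
    )


# Made-up sample in the index_information() shape.
sample_info = {"_id_": {"key": [("_id", 1)]}, "name_1": {"key": [("name", 1)]}}
print(has_single_field_index(sample_info, "name"))      # True
print(has_single_field_index(sample_info, "category"))  # False
```

In run.py, you could call has_single_field_index(collection.index_information(), "name") before running the sorted query.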

Run the code

This app creates an API for MongoDB database and collection, creates a document, and then reads the same document back. Finally, it issues a query that returns the documents that match a specified product category. At each step, the app writes information about what it did to the console.

To run the app, use a terminal to navigate to the application directory and run the application.

python run.py

The output of the app should be similar to this example:


Created db 'adventureworks' with shared throughput.

Created collection 'products'.

Indexes are: ['_id_', 'name_1']

Upserted document with _id <ID>

Found a document with _id <ID>:
{'_id': ObjectId('<ID>'),
'category': 'gear-surf-surfboards',
'name': 'Yamba Surfboard-50',
'quantity': 1,
'sale': False}

Products with category 'gear-surf-surfboards':

Found a product with _id <ID>:
{'_id': ObjectId('<ID>'),
'name': 'Yamba Surfboard-386',
'category': 'gear-surf-surfboards',
'quantity': 1,
'sale': False}

Clean up resources

When you no longer need the Azure Cosmos DB for MongoDB account, you can delete the corresponding resource group.

Use the az group delete command to delete the resource group.

az group delete --name $resourceGroupName