Tutorial: Create a Jupyter Notebook to analyze data in your Azure Cosmos DB for NoSQL account using Visual Studio Code Jupyter notebooks

APPLIES TO: NoSQL

This tutorial walks through how to use the Visual Studio Code Jupyter notebooks to interact with your Azure Cosmos DB for NoSQL account. You'll see how to connect to your account, import data, and run queries.

Prerequisites

Create a new notebook

In this section, you'll create the Azure Cosmos database, container, and import the retail data to the container.

  1. Open Visual Studio Code.
  2. Run the Create: New Jupyter Notebook command from the Command Palette (Ctrl+Shift+P) or create a new .ipynb file in your workspace.

Tip

Now that the new notebook has been created, you can save it and name it something like AnalyzeRetailData.ipynb.

Create a database and container using the SDK

  1. Start in the default code cell.

  2. Install the Azure.cosmos package. Run this cell before continuing.

    %pip install azure.cosmos
    
  3. Import any packages you require for this tutorial.

    import azure.cosmos
    from azure.cosmos.partition_key import PartitionKey
    from azure.cosmos import CosmosClient
    
  4. Create a new instance of CosmosClient.

    endpoint = "<FILL ME>"
    key = "<FILL ME>"
    cosmos_client = CosmosClient(url=endpoint, credential=key)
    
  5. Create a database named RetailIngest using the built-in SDK.

    database = cosmos_client.create_database_if_not_exists('RetailIngest')
    
  6. Create a container named WebsiteMetrics with a partition key of /CartID.

    container = database.create_container_if_not_exists(id='WebsiteMetrics', partition_key=PartitionKey(path='/CartID'))
    
  7. Select Run to create the database and container resource.

    Screenshot of Execute cell in Visual Studio Code Jupyter notebook.

Import data into container

  1. Add a new code cell

  2. Within the code cell, add the following code to upload data from this url: https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json.

        import urllib.request
        import json
    
        with urllib.request.urlopen("https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json") as url:
            docs = json.loads(url.read().decode())
    
        for doc in docs:
            container.upsert_item(doc)
    
  3. Run the cell. This will take 45 seconds to 1 minute to run.

Analyze your data

  1. Create another new code cell.

  2. In the code cell, use a SQL query to populate a Pandas DataFrame. Run this cell.

    import pandas as pd
    from pandas import DataFrame
    
    QUERY = "SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c"
    results = container.query_items(
        query=QUERY, enable_cross_partition_query=True
    )
    
    df_cosmos = pd.DataFrame(results)
    
  3. Create another new code cell.

  4. In the code cell, output the top 10 items from the dataframe. Run this cell.

    df_cosmos.head(10)
    
  5. Observe the output of running the command.

    Action ItemRevenue Country Item
    0 Purchased 19.99 Macedonia Button-Up Shirt
    1 Viewed 12.00 Papua New Guinea Necklace
    2 Viewed 25.00 Slovakia (Slovak Republic) Cardigan Sweater
    3 Purchased 14.00 Senegal Flip Flop Shoes
    4 Viewed 50.00 Panama Denim Shorts
    5 Viewed 14.00 Senegal Flip Flop Shoes
    6 Added 14.00 Senegal Flip Flop Shoes
    7 Added 50.00 Panama Denim Shorts
    8 Purchased 33.00 Palestinian Territory Red Top
    9 Viewed 30.00 Malta Green Sweater
  6. Create another new code cell.

  7. In the code cell, import the pandas package to customize the output of the dataframe. Run this cell.

    import pandas as pd    
    df_cosmos.groupby("Item").size().reset_index()
    
  8. Observe the output of running the command.

    Item Test
    0 Flip Flop Shoes 66
    1 Necklace 55
    2 Athletic Shoes 111
    ... ... ...
    45 Windbreaker Jacket 56

Next steps