Tutorial: Create a Jupyter Notebook in Azure Cosmos DB for NoSQL to analyze and visualize data (preview)

APPLIES TO: NoSQL

Warning

The Jupyter Notebooks feature of Azure Cosmos DB will be retired on March 30, 2024. After that date, you will not be able to use built-in Jupyter notebooks from the Azure Cosmos DB account. We recommend using Visual Studio Code's support for Jupyter notebooks or your preferred notebooks client.

This tutorial walks through how to use the Jupyter Notebooks feature of Azure Cosmos DB to import sample retail data to an Azure Cosmos DB for NoSQL account. You'll see how to use the Azure Cosmos DB magic commands to run queries, analyze the data, and visualize the results.

Prerequisites

Create a new notebook

In this section, you'll create the Azure Cosmos DB database and container, and then import the retail data into the container.

  1. Navigate to your Azure Cosmos DB account and open the Data Explorer.

  2. Select New Notebook.

    Screenshot of the Data Explorer with the 'New Notebook' option highlighted.

  3. In the confirmation dialog that appears, select Create.

    Note

    A temporary workspace will be created to enable you to work with Jupyter Notebooks. When the session expires, any notebooks in the workspace will be removed.

  4. Select the kernel you wish to use for the notebook.

Tip

Now that the new notebook has been created, you can rename it to something like VisualizeRetailData.ipynb.

Create a database and container using the SDK

  1. Start in the default code cell.

  2. Import any packages you require for this tutorial.

    import azure.cosmos
    from azure.cosmos.partition_key import PartitionKey
    
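  3. Create a database named RetailIngest using the built-in SDK. The notebook environment provides a pre-initialized client in the cosmos_client variable, so you don't need to create one yourself.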

    database = cosmos_client.create_database_if_not_exists('RetailIngest')
    
  4. Create a container named WebsiteMetrics with a partition key of /CartID.

    container = database.create_container_if_not_exists(id='WebsiteMetrics', partition_key=PartitionKey(path='/CartID'))
    
  5. Select Run to create the database and container resource.

    Screenshot of the 'Run' option in the menu.
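
Outside the built-in notebooks (for example, in a Visual Studio Code notebook), no cosmos_client variable is predefined, so you create the client yourself. The following is a minimal sketch, assuming the azure-cosmos package is installed and that the account endpoint and key are stored in two environment variables. The variable names COSMOS_ENDPOINT and COSMOS_KEY and both helper function names are hypothetical, chosen only for this illustration.

```python
import os


def read_cosmos_settings(env=None):
    """Return (endpoint, key) from environment-style settings.

    COSMOS_ENDPOINT and COSMOS_KEY are hypothetical names used by this sketch.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("COSMOS_ENDPOINT", "COSMOS_KEY") if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return env["COSMOS_ENDPOINT"], env["COSMOS_KEY"]


def create_retail_container(endpoint, key):
    """Create (or reuse) the RetailIngest database and WebsiteMetrics container."""
    # Imported here so this module still loads where azure-cosmos is not installed.
    from azure.cosmos import CosmosClient, PartitionKey

    cosmos_client = CosmosClient(endpoint, credential=key)
    database = cosmos_client.create_database_if_not_exists(id="RetailIngest")
    return database.create_container_if_not_exists(
        id="WebsiteMetrics",
        partition_key=PartitionKey(path="/CartID"),
    )
```

With both settings in place, `container = create_retail_container(*read_cosmos_settings())` would return a container object comparable to the one created in the steps above.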

Import data using magic commands

  1. Add a new code cell.

  2. Within the code cell, add the following magic command to upload the JSON data from this URL to your existing container: https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json

    %%upload --databaseName RetailIngest --containerName WebsiteMetrics --url https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json
    
  3. Select Run Active Cell to only run the command in this specific cell.

    Screenshot of the 'Run Active Cell' option in the menu.

    Note

    The import command should take 5-10 seconds to complete.

  4. Observe the output from the run command. Ensure that 2,654 documents were imported.

    Documents successfully uploaded to WebsiteMetrics
    Total number of documents imported:
      Success: 2654
      Failure: 0
    Total time taken : 00:00:04 hours
    Total RUs consumed : 27309.660000001593
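
The %%upload magic command is specific to the built-in notebooks. A rough SDK-based equivalent is sketched below. It assumes the file at the URL is a JSON array of documents that each already carry an id (not verified here), and that container is an SDK container client, which exposes upsert_item. The function names are illustrative, and the returned tally only mimics the shape of the Success and Failure counts in the report above.

```python
import json
from urllib.request import urlopen

DATA_URL = "https://cosmosnotebooksdata.blob.core.windows.net/notebookdata/websiteData.json"


def fetch_documents(url=DATA_URL):
    """Download the sample file; assumes it holds a JSON array of objects."""
    with urlopen(url) as response:
        return json.load(response)


def upload_documents(container, documents):
    """Upsert each document and return (success_count, failure_count)."""
    success = failure = 0
    for document in documents:
        try:
            container.upsert_item(document)
            success += 1
        except Exception:
            # Sketch only: real code would catch the SDK's specific error types.
            failure += 1
    return success, failure
```

Calling `upload_documents(container, fetch_documents())` would then write the sample data one document at a time. This is simpler than a bulk import and likely slower.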
    

Visualize your data

  1. Create another new code cell.

  2. In the code cell, use a SQL query to populate a Pandas DataFrame.

    %%sql --database RetailIngest --container WebsiteMetrics --output df_cosmos
    SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c
    
  3. Select Run Active Cell to only run the command in this specific cell.

  4. Create another new code cell.

  5. In the code cell, output the first 10 rows of the DataFrame.

    df_cosmos.head(10)
    
  6. Select Run Active Cell to only run the command in this specific cell.

  7. Observe the output of running the command.

       Action     ItemRevenue  Country                     Item
    0  Purchased  19.99        Macedonia                   Button-Up Shirt
    1  Viewed     12.00        Papua New Guinea            Necklace
    2  Viewed     25.00        Slovakia (Slovak Republic)  Cardigan Sweater
    3  Purchased  14.00        Senegal                     Flip Flop Shoes
    4  Viewed     50.00        Panama                      Denim Shorts
    5  Viewed     14.00        Senegal                     Flip Flop Shoes
    6  Added      14.00        Senegal                     Flip Flop Shoes
    7  Added      50.00        Panama                      Denim Shorts
    8  Purchased  33.00        Palestinian Territory       Red Top
    9  Viewed     30.00        Malta                       Green Sweater

  8. Create another new code cell.

  9. In the code cell, import the pandas package, customize how DataFrames are displayed, and count the rows for each item.

    import pandas as pd
    pd.options.display.html.table_schema = True
    pd.options.display.max_rows = None
    
    df_cosmos.groupby("Item").size()
    
  10. Select Run Active Cell to only run the command in this specific cell.

  11. In the output, select the Line Chart option to view a different visualization of the data.

    Screenshot of the Pandas dataframe visualization for the data as a line chart.
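
Without the %%sql magic command, the same query can be run through the SDK and the rows counted per item. The sketch below assumes container is an SDK container client, whose query_items method accepts query and enable_cross_partition_query. The helper names are illustrative. The counting step uses collections.Counter to show in plain Python what groupby("Item").size() computes, here applied to the ten rows from the sample output in step 7.

```python
from collections import Counter

QUERY = "SELECT c.Action, c.Price as ItemRevenue, c.Country, c.Item FROM c"


def query_rows(container, query=QUERY):
    """Run the query across all partitions and return the rows as a list of dicts."""
    return list(container.query_items(query=query, enable_cross_partition_query=True))


def count_by_item(rows):
    """Count rows per Item, the plain-Python analogue of groupby("Item").size()."""
    return Counter(row["Item"] for row in rows)


# The ten rows from the sample output, reduced to the Item column.
sample_rows = [
    {"Item": "Button-Up Shirt"},
    {"Item": "Necklace"},
    {"Item": "Cardigan Sweater"},
    {"Item": "Flip Flop Shoes"},
    {"Item": "Denim Shorts"},
    {"Item": "Flip Flop Shoes"},
    {"Item": "Flip Flop Shoes"},
    {"Item": "Denim Shorts"},
    {"Item": "Red Top"},
    {"Item": "Green Sweater"},
]

print(count_by_item(sample_rows).most_common(2))
# prints [('Flip Flop Shoes', 3), ('Denim Shorts', 2)]
```

Passing the list returned by query_rows to pandas.DataFrame would give a DataFrame comparable to df_cosmos, on which the groupby call from step 9 works unchanged.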

Persist your notebook

  1. In the Notebooks section, open the context menu for the notebook you created for this tutorial and select Download.

    Screenshot of the notebook context menu with the 'Download' option.

    Tip

    To save your work permanently, save your notebooks to a GitHub repository or download the notebooks to your local machine before the session ends.

Next steps