Bulk import data to an Azure Cosmos DB SQL API account by using the .NET SDK

APPLIES TO: SQL API

This tutorial shows how to build a .NET console application that optimizes the provisioned throughput (RU/s) required to import data into Azure Cosmos DB. You will read data from a sample data source and import it into an Azure Cosmos container. This tutorial uses version 3.4.0 or later of the Azure Cosmos DB .NET SDK (bulk support was added in 3.4.0), which can target either .NET Framework or .NET Core.

This tutorial covers:

  • Creating an Azure Cosmos account
  • Configuring your project
  • Connecting to an Azure Cosmos account with bulk support enabled
  • Performing a data import through concurrent create operations

Prerequisites

Before following the instructions in this article, make sure that you have the following resources:

Step 1: Create an Azure Cosmos DB account

Create an Azure Cosmos DB SQL API account from the Azure portal, or use the Azure Cosmos DB Emulator for local development.

Step 2: Set up your .NET project

Open a Windows command prompt or a terminal window on your local computer. You will run all the commands in the following sections from this prompt. Run the following dotnet new command to create a new console app named bulk-import-demo. The --langVersion parameter sets the LangVersion property in the generated project file.

dotnet new console --langVersion 8 -n bulk-import-demo

Change to the newly created app folder, and then build the application:

cd bulk-import-demo
dotnet build

The expected output from the build should look something like this:

Restore completed in 100.37 ms for C:\Users\user1\Downloads\CosmosDB_Samples\bulk-import-demo\bulk-import-demo.csproj.
  bulk -> C:\Users\user1\Downloads\CosmosDB_Samples\bulk-import-demo\bin\Debug\netcoreapp2.2\bulk-import-demo.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:34.17

Step 3: Add the Azure Cosmos DB package

While still in the application directory, install the Azure Cosmos DB client library for .NET by using the dotnet add package command.

dotnet add package Microsoft.Azure.Cosmos

Step 4: Get your Azure Cosmos account credentials

The sample application needs to authenticate to your Azure Cosmos account. To authenticate, you should pass the Azure Cosmos account credentials to the application. Get your Azure Cosmos account credentials by following these steps:

  1. Sign in to the Azure portal.
  2. Navigate to your Azure Cosmos account.
  3. Open the Keys pane and copy the URI and PRIMARY KEY of your account.

If you are using the Azure Cosmos DB Emulator, obtain the emulator credentials from the Azure Cosmos DB Emulator documentation.
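Rather than hard-coding the URI and primary key in Program.cs, you may prefer to supply them through environment variables. The following is a minimal sketch, written as a standalone C# 9 top-level program (newer than the LangVersion 8 used by the tutorial project) so it can run on its own. The variable names COSMOS_ENDPOINT and COSMOS_KEY and the ReadSetting helper are hypothetical choices for this example; the SDK does not look for them.

```csharp
using System;

// Reads a setting from an environment variable and falls back to a placeholder
// when the variable is missing or empty.
static string ReadSetting(string name, string fallback)
{
    var value = Environment.GetEnvironmentVariable(name);
    return string.IsNullOrWhiteSpace(value) ? fallback : value;
}

// COSMOS_ENDPOINT and COSMOS_KEY are hypothetical names chosen for this sketch;
// the SDK does not read them on its own.
string endpointUrl = ReadSetting("COSMOS_ENDPOINT", "https://<your-account>.documents.azure.com:443/");
string authorizationKey = ReadSetting("COSMOS_KEY", "<your-account-key>");
bool usingPlaceholderKey = authorizationKey == "<your-account-key>";

Console.WriteLine($"Endpoint: {endpointUrl}");
Console.WriteLine($"Using placeholder key: {usingPlaceholderKey}");
```

To apply this in the tutorial's Program class, EndpointUrl and AuthorizationKey would have to become static readonly fields, because a const field can only be initialized with a compile-time constant.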

Step 5: Initialize the CosmosClient object with bulk execution support

Open the generated Program.cs file in a code editor. You will create a new instance of CosmosClient with bulk execution enabled and use it to perform operations against Azure Cosmos DB.

Start by replacing the default Main method and defining the class-level constants: the endpoint and authorization key, the names of the database and container that you will create, and the number of items to insert in bulk. Replace the EndpointUrl and AuthorizationKey values with the ones for your account.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class Program
{
    private const string EndpointUrl = "https://<your-account>.documents.azure.com:443/";
    private const string AuthorizationKey = "<your-account-key>";
    private const string DatabaseName = "bulk-tutorial";
    private const string ContainerName = "items";
    private const int ItemsToInsert = 300000;

    static async Task Main(string[] args)
    {

    }
}

Inside the Main method, add the following code to initialize the CosmosClient object:

CosmosClient cosmosClient = new CosmosClient(EndpointUrl, AuthorizationKey, new CosmosClientOptions() { AllowBulkExecution = true });

When bulk execution is enabled, the CosmosClient internally groups concurrent operations that target the same partition into single service calls. This reduces the number of requests and makes better use of the provisioned throughput; the SDK then returns each individual result to the caller that issued the original operation.
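The grouping idea can be illustrated with a toy model in plain C# that makes no SDK calls. This is not how the SDK is implemented: the hash-based mapping to four partitions, the 100-operation batch limit, and the PartitionFor helper are all hypothetical. The sketch, a standalone C# 9 program, only shows how many concurrent point operations can collapse into a handful of per-partition calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy values, not SDK settings: the real SDK uses the container's actual
// partition layout and its own batch limits.
const int PartitionCount = 4;
const int MaxOperationsPerBatch = 100;

// Assigns a partition key to one of the simulated partitions.
// string.GetHashCode is randomized per process but stable within one run,
// which is all this toy model needs.
static int PartitionFor(string partitionKey, int partitionCount) =>
    (partitionKey.GetHashCode() & int.MaxValue) % partitionCount;

// 1,000 concurrent "create" operations, each identified by its partition key.
List<string> operations = Enumerable.Range(0, 1000)
    .Select(_ => Guid.NewGuid().ToString())
    .ToList();

// Group operations by target partition, then split each group into batches of
// at most MaxOperationsPerBatch. Each batch stands for one service call.
List<List<string>> batches = operations
    .GroupBy(pk => PartitionFor(pk, PartitionCount))
    .SelectMany(partition => partition
        .Select((pk, index) => (pk, index))
        .GroupBy(entry => entry.index / MaxOperationsPerBatch, entry => entry.pk)
        .Select(batch => batch.ToList()))
    .ToList();

Console.WriteLine($"{operations.Count} operations grouped into {batches.Count} batched calls");
```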

Next, create a container to store the items. Use /pk as the partition key, 50,000 RU/s of provisioned throughput, and a custom indexing policy that excludes all paths from indexing; skipping index updates reduces the RU cost of each write. Add the following code after the CosmosClient initialization statement:

Database database = await cosmosClient.CreateDatabaseIfNotExistsAsync(Program.DatabaseName);

await database.DefineContainer(Program.ContainerName, "/pk")
        .WithIndexingPolicy()
            .WithIndexingMode(IndexingMode.Consistent)
            .WithIncludedPaths()
                .Attach()
            .WithExcludedPaths()
                .Path("/*")
                .Attach()
        .Attach()
    // CreateIfNotExistsAsync lets you rerun the sample without a 409 Conflict.
    .CreateIfNotExistsAsync(50000);

Step 6: Populate a list of concurrent tasks

To take advantage of bulk execution support, create a list of asynchronous tasks based on the data source and the operations you want to perform, and then use Task.WhenAll to wait for all of them to complete. Start by using the Bogus library to generate a list of items from the data model. In a real-world application, the items would come from your own data source.
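The concurrency pattern itself does not depend on Cosmos DB. As a minimal sketch, the following standalone C# 9 program uses a hypothetical SimulateCreateAsync method (a Task.Delay standing in for CreateItemStreamAsync), starts all operations first, collects the tasks in a list, and then awaits them together with Task.WhenAll.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for an SDK call such as CreateItemStreamAsync.
static async Task<string> SimulateCreateAsync(string id)
{
    await Task.Delay(50); // pretend network round trip
    return id;
}

const int OperationCount = 200;
var tasks = new List<Task<string>>(OperationCount);
Stopwatch stopwatch = Stopwatch.StartNew();

// Start every operation without awaiting it individually...
for (int i = 0; i < OperationCount; i++)
{
    tasks.Add(SimulateCreateAsync($"item-{i}"));
}

// ...then wait for all of them together. Results come back in the order
// the tasks were added to the list.
string[] results = await Task.WhenAll(tasks);
stopwatch.Stop();

Console.WriteLine($"{results.Length} operations completed in {stopwatch.ElapsedMilliseconds} ms");
```

Awaiting each call in turn would take roughly 200 times 50 ms, about 10 seconds; because the tasks are started together, their delays overlap and the whole set finishes in a small fraction of that time.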

First, add the Bogus package to the project by using the dotnet add package command.

dotnet add package Bogus

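Define the items that you want to save by adding the following Item class to the Program.cs file:

public class Item
{
    // Lowercase names are intentional: Azure Cosmos DB requires an "id" property,
    // and System.Text.Json writes property names exactly as declared.
    public string id { get; set; }
    public string pk { get; set; }

    public string username { get; set; }
}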

Next, create a helper method inside the Program class. It generates ItemsToInsert items filled with random data:

private static IReadOnlyCollection<Item> GetItemsToInsert()
{
    return new Bogus.Faker<Item>()
        .StrictMode(true) // fail if any Item property has no rule
        .RuleFor(o => o.id, f => Guid.NewGuid().ToString()) // unique id
        .RuleFor(o => o.username, f => f.Internet.UserName())
        .RuleFor(o => o.pk, (f, o) => o.id) // runs after the id rule, so pk = id
        .Generate(ItemsToInsert);
}
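If you'd rather not depend on Bogus, comparable data can be generated with the base class library alone. This is a swapped-in alternative, not what the tutorial uses: the GenerateItems helper, the tuple shape, and the user-NNNNNN username format are hypothetical, and the sketch is a standalone C# 9 program. As with the Bogus rules above, each item's pk is set to its id.

```csharp
using System;
using System.Collections.Generic;

// Generates `count` items whose partition key equals the id, mirroring the
// Bogus rules in GetItemsToInsert. The "user-NNNNNN" username format is a
// hypothetical stand-in for Faker's Internet.UserName().
static IReadOnlyCollection<(string id, string pk, string username)> GenerateItems(int count)
{
    var random = new Random();
    var items = new List<(string id, string pk, string username)>(count);
    for (int i = 0; i < count; i++)
    {
        string id = Guid.NewGuid().ToString();
        items.Add((id, id, $"user-{random.Next(100_000, 1_000_000)}"));
    }
    return items;
}

foreach (var item in GenerateItems(3))
{
    Console.WriteLine($"id={item.id} pk={item.pk} username={item.username}");
}
```

To feed the tutorial's code, map each tuple to an Item, for example new Item { id = t.id, pk = t.pk, username = t.username }.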

Next, serialize the items into streams by using the JsonSerializer class from the System.Text.Json namespace. You could instead pass the Item instances directly, as long as you know each item's partition key; converting them to streams lets you use the stream APIs of CosmosClient, such as CreateItemStreamAsync, which skip the SDK's own serialization step.

To convert the data to stream instances, within the Main method, add the following code right after creating the container:

Dictionary<PartitionKey, Stream> itemsToInsert = new Dictionary<PartitionKey, Stream>(ItemsToInsert);
foreach (Item item in Program.GetItemsToInsert())
{
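    MemoryStream stream = new MemoryStream();
    await JsonSerializer.SerializeAsync(stream, item);
    // Rewind so the payload is read from the start.
    stream.Position = 0;
    // pk equals the unique id, so each PartitionKey is a unique dictionary key.
    itemsToInsert.Add(new PartitionKey(item.pk), stream);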
}
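To see what each stream contains, the following standalone C# 9 sketch serializes an object with JsonSerializer.SerializeAsync into a MemoryStream, rewinds it, and reads the JSON back. An anonymous type with the same property names stands in for the Item class, and the values are placeholders. After serialization, the stream's Position is at the end of the payload, which is why it is reset to 0 before reading.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Anonymous type standing in for the tutorial's Item class (same property names).
var item = new { id = "1", pk = "1", username = "sample-user" };

var stream = new MemoryStream();
await JsonSerializer.SerializeAsync(stream, item);

// After serialization the position is at the end of the payload.
long positionAfterWrite = stream.Position;
Console.WriteLine($"Position after serializing: {positionAfterWrite} of {stream.Length} bytes");

// Rewind before reading the payload back.
stream.Position = 0;
using var reader = new StreamReader(stream);
string json = await reader.ReadToEndAsync();
Console.WriteLine(json);
```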

Next, use the data streams to create concurrent tasks that insert the items into the container, and collect them in a task list. Add the following code to the Main method, after the serialization loop:

Container container = database.GetContainer(ContainerName);
List<Task> tasks = new List<Task>(ItemsToInsert);
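foreach (KeyValuePair<PartitionKey, Stream> item in itemsToInsert)
{
    // Start each create without awaiting it; with AllowBulkExecution enabled,
    // the SDK groups these concurrent calls into batched requests.
    tasks.Add(container.CreateItemStreamAsync(item.Value, item.Key)
        .ContinueWith((Task<ResponseMessage> task) =>
        {
            using (ResponseMessage response = task.Result)
            {
                // Stream APIs report service errors through the status code rather than throwing.
                if (!response.IsSuccessStatusCode)
                {
                    Console.WriteLine($"Received {response.StatusCode} ({response.ErrorMessage}).");
                }
            }
        }));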
}

// Wait until all are done
await Task.WhenAll(tasks);
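The ContinueWith callback above reads task.Result once the create call completes. A common alternative is an async wrapper that awaits each call and records its outcome, which also makes it easy to count failures. The following standalone C# 9 sketch does not call Cosmos DB: SimulateCreateAsync and CreateAndTrackAsync are hypothetical, and every tenth simulated call returns status 429 (request rate too large) so that the failure path runs.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for CreateItemStreamAsync that reports a status code.
// Every 10th call "fails" with 429 to exercise the error handling.
static async Task<HttpStatusCode> SimulateCreateAsync(int index)
{
    await Task.Delay(10);
    return index % 10 == 9 ? (HttpStatusCode)429 : HttpStatusCode.Created;
}

int succeeded = 0;
int failed = 0;

// Awaits one operation and records its outcome; this replaces the
// ContinueWith callback and its task.Result access.
async Task CreateAndTrackAsync(int index)
{
    HttpStatusCode status = await SimulateCreateAsync(index);
    if (status == HttpStatusCode.Created)
    {
        Interlocked.Increment(ref succeeded);
    }
    else
    {
        Interlocked.Increment(ref failed);
        Console.WriteLine($"Item {index}: received {(int)status}.");
    }
}

var tasks = new List<Task>();
for (int i = 0; i < 100; i++)
{
    tasks.Add(CreateAndTrackAsync(i));
}

await Task.WhenAll(tasks);
Console.WriteLine($"{succeeded} succeeded, {failed} failed.");
```

Both versions start every operation before awaiting; the wrapper simply keeps the success and failure handling in ordinary async code. Interlocked.Increment is used because completions can run on different thread-pool threads at the same time.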

Because bulk execution is enabled on the client, these concurrent point operations are grouped and sent to the service in batches, as described in Step 5.

Step 7: Run the sample

To run the sample, use the dotnet run command:

dotnet run

Get the complete sample

If you didn't have time to complete the steps in this tutorial, or just want to download the code samples, you can get them from GitHub.

After cloning the project, update the endpoint and key in Program.cs with your account credentials.

To run the sample, change to the repository directory and use dotnet run:

cd cosmos-dotnet-bulk-import-throughput-optimizer
dotnet run

Next steps

In this tutorial, you completed the following steps:

  • Creating an Azure Cosmos account
  • Configuring your project
  • Connecting to an Azure Cosmos account with bulk support enabled
  • Performing a data import through concurrent create operations

You can now proceed to the next tutorial: