March 2019

Volume 34 Number 3

[Data Points]

A Peek at the EF Core Cosmos DB Provider Preview, Part 2

By Julie Lerman

In the January 2019 Data Points column, I presented a first look at the Cosmos DB Provider for EF Core. This provider is still in preview and is expected to go live with the EF Core 3.0 release, so now is a good time to prepare for it.

In that first part, you read about why there’s a NoSQL provider for an ORM and learned how to perform some basic read and write actions for individual objects and their related data as defined by a simple model. Writing code that leverages this provider isn’t much different than working with the more familiar relational database providers.

You also learned how EF Core can create a database and containers on the fly for pre-existing Azure Cosmos DB accounts and how to view the data in the cloud using the Cosmos DB extension for Visual Studio Code.

I took a sidetrack in my February 2019 column (msdn.com/magazine/mt833267) to take a look at the MongoDB API of Azure Cosmos DB, although this isn’t related to EF Core. Now I’ll return to my previous subject to share some of the other interesting discoveries I made when exploring the EF Core Cosmos DB provider.

In this column, you’ll learn about some of the provider’s more advanced features, such as configuring the DbContext to change how EF Core targets Cosmos DB database containers, realizing embedded documents with owned entities and using EF Core logging to see the SQL along with other interesting processing information generated by the provider.

More About Containers and EF Core Mappings

Containers, also known as collections in the Cosmos DB SQL and MongoDB APIs, are schema-agnostic groupings of items that are the “fundamental units of scalability” for Cosmos DB. That is, you can define throughput per container, and a container can scale and be replicated as a unit. As your data grows, how you design your models and align them with containers will impact performance and cost. If you’ve used EF or EF Core, you’ll be familiar with the default that one DbSet<TEntity> maps to one table in a relational database. Creating a separate Cosmos DB container for each DbSet could be an expensive default, however. Instead, as you learned in Part 1, the provider’s default is that all of the data for a DbContext maps to a single container, and the convention is that the container has the same name as the DbContext.

Let’s take a look at defaults and see which ones you can control with EF Core.

In the previous article, I let EF Core trigger a new database and container to be created on the fly. I already had an Azure Cosmos DB account, targeted to the SQL API (which is what EF Core uses). Geo-redundancy is enabled by default, but I’ve only configured my account to use a single datacenter in the Eastern United States. Therefore, by default, multi-region writes are disabled. So whatever databases I add to this account, and any containers I add to those databases, will follow those overarching specs controlled by the account.

In the first column, I had a DbContext named ExpanseDbContext. When configuring the ExpanseDbContext to use the Cosmos provider, I specified that the database name should be ExpanseCosmosDemo:

optionsBuilder.UseCosmos(endpointstring,accountkeystring, "ExpanseCosmosDemo")

The first time my code called Database.EnsureCreated on an instance of ExpanseDbContext, the ExpanseCosmosDemo database was created along with the default container, called ExpanseDbContext, following the convention to use the name of the DbContext class.
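To put the UseCosmos call in context, here is a minimal sketch of how the context from Part 1 might be wired up. The endpoint and key values are placeholders, the Ships DbSet and the DatabaseSetup helper are assumptions made for illustration (only Consortia appears in the queries in this column), and Consortium and Ship are the model classes from the example model:

```csharp
using Microsoft.EntityFrameworkCore;

public class ExpanseDbContext : DbContext
{
    // Placeholders: substitute the URI and key of your own Cosmos DB account.
    private const string EndpointString =
        "https://<your-account>.documents.azure.com:443/";
    private const string AccountKeyString = "<your-account-key>";

    public DbSet<Consortium> Consortia { get; set; }
    public DbSet<Ship> Ships { get; set; }   // assumed for illustration

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // The third argument is the database name, not the container name.
        optionsBuilder.UseCosmos(EndpointString, AccountKeyString,
                                 "ExpanseCosmosDemo");
    }
}

public static class DatabaseSetup
{
    // Hypothetical helper: asks EF Core to create whatever is missing.
    public static void EnsureExpanseDatabase()
    {
        using (var context = new ExpanseDbContext())
        {
            // Creates the ExpanseCosmosDemo database and the default
            // container if they don't exist yet.
            context.Database.EnsureCreated();
        }
    }
}
```

With nothing else configured, the container gets its name from the context class by convention.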

The container was created using the Azure Cosmos DB defaults shown in Figure 1. Not shown in the figure is the indexing policy configuration using the default, which is “consistent.”

Figure 1 Azure Cosmos DB Defaults for Creating a Container

These settings can’t be affected by EF Core. You can modify them in the portal, using the Azure CLI or an SDK. This makes sense because EF Core’s role is to read and write data. But there are two things you can control with EF Core: the container names and which entities are stored in which container.

You can override the default container name with the HasDefaultContainerName method in OnModelCreating. For example, the following will use ExpanseDocuments as the default name instead of ExpanseDbContext:

modelBuilder.HasDefaultContainerName("ExpanseDocuments");

If you’ve determined that you want to split data into different containers, you can map particular entities to a new container name. Here’s an example that maps the Ship entity from the previous article to a container called ExpanseShips:

modelBuilder.Entity<Ship>().ToContainer("ExpanseShips");

You can target as many entities to a single container as you want. The default container already demonstrates this. But you could use ToContainer("ExpanseShips") with other entities, as well, if you wanted.
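Putting those mappings together, a sketch of the relevant part of the context might look like the following. Both calls go in OnModelCreating, because that is where the modelBuilder instance is available. Mapping Planet to the ExpanseShips container is purely illustrative and not something the example model requires, and the method names are those of the preview builds used for this column:

```csharp
using Microsoft.EntityFrameworkCore;

public class ExpanseDbContext : DbContext
{
    public DbSet<Consortium> Consortia { get; set; }

    // OnConfiguring (with the UseCosmos call) is omitted for brevity.

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Entities without an explicit mapping land in this container
        // instead of one named after the context class.
        modelBuilder.HasDefaultContainerName("ExpanseDocuments");

        // Ship documents go to their own container.
        modelBuilder.Entity<Ship>().ToContainer("ExpanseShips");

        // Illustrative only: a second entity type sharing that container.
        modelBuilder.Entity<Planet>().ToContainer("ExpanseShips");
    }
}
```

Consortium has no explicit mapping here, so its documents would be stored in ExpanseDocuments.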

What happens when you add a new container to an existing database in this way? As I noted in Part 1, the only way to have EF Core create a database or container is by calling context.Database.EnsureCreated. EF Core will recognize what does and doesn’t already exist and create any new containers as needed.

If you change the default container name, EF will create the new container and will work with that container going forward. But any data in the original container will remain there.

Because Azure Cosmos DB doesn’t have the ability to rename an existing container, the official recommendation is to move the data into the new collection, perhaps with a bulk executor library, such as the one at bit.ly/2RbpTvp. The same holds true if you change the mapping for an entity to a different container. The original data won’t be moved and you’ll be responsible for ensuring that the old items are transferred. Again, it’s probably more reasonable to do that one-time move outside of EF Core.

I also tested out adding graphs of Consortium with Ships where the documents would end up in separate containers in the database. When reading that data, I was able to write a query for Consortia that eager-loaded their ship data, for example:

context.Consortia.Include(c=>c.Ships).FirstOrDefault()

EF Core was able to retrieve the data from the separate containers and reconstruct the object graph.

Owned Entities Get Embedded Within Parent Documents

In Part 1, you saw that related entities were stored in their own documents. I’ve listed the Expanse classes in Figure 2 as a reminder of the example model. When I built a graph of a Consortium with Ships, each object was stored as a separate document with foreign keys that allow EF Core or other code to connect them back up again. That’s a very relational concept, but because consortia and ships are unique entities that have their own identity keys, this is how EF Core will persist them. EF Core does, however, have an understanding of document databases and embedded documents, which you can witness when working with owned entities. Notice that the Origin type doesn’t have a key property and that it’s used as a property of both Ship and Consortium. It will be an owned entity in my model. You can read more about the EF Core Owned Entity feature in my April 2018 Data Points article at msdn.com/magazine/mt846463.

Figure 2 The Expanse Classes

public class Consortium
{
  public Consortium()
  {
    Ships=new List<Ship>();
    Stations=new List<Station>();
  }
  public Guid ConsortiumId { get; set; }
  public string Name { get; set; }
  public List<Ship> Ships{get;set;}
  public List<Station> Stations{get;set;}
  public Origin Origin{get;set;}
}
public class Planet
{
  public Guid PlanetId { get; set; }
  public string PlanetName { get; set; }
}
public class Ship
{
  public Guid ShipId {get;set;}
  public string ShipName {get;set;}
  public int PlanetId {get;set;}
  public Origin Origin{get;set;}
}
public class Origin
{
  public DateTime Date{get;set;}
  public String Location{get;set;}
}

In order for EF Core to comprehend an owned type so that it can map it to a database, you need to configure it either with a data annotation or (always my preference) with a fluent API configuration. The latter happens in the DbContext OnModelCreating method, as I’m doing here:

modelBuilder.Entity<Ship>().OwnsOne(s=>s.Origin);
modelBuilder.Entity<Consortium>().OwnsOne(c=>c.Origin);

Here’s some code for adding a new Ship, along with its origin, to a consortium object:

consortium.Ships.Add(new Ship{ShipId=Guid.NewGuid(),ShipName="Nathan Hale 3rd",
                              Origin= new Origin {Date=DateTime.Now,
                              Location="Earth"}});

When the consortium is saved via the ExpanseContext, the new ship is also saved into its own document.
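For comparison, the data annotation route mentioned earlier might look like the sketch below. It relies on the Owned attribute that ships with EF Core, and it assumes the Cosmos DB provider picks the attribute up through EF Core’s usual conventions, which I haven’t verified against the preview builds:

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// [Owned] marks Origin as an owned type wherever it's used as a
// navigation property, so the per-entity OwnsOne calls in the
// fluent configuration would no longer be needed.
[Owned]
public class Origin
{
    public DateTime Date { get; set; }
    public string Location { get; set; }
}
```

The trade-off is the usual one: the attribute keeps the mapping next to the class, while the fluent API keeps the domain classes free of EF Core references.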

Figure 3 displays the document for that Ship with its Origin represented as an embedded document. A document database doesn’t need a sub-document to have a foreign key back to its parent. However, the EF Core logic for persisting owned entities does require the foreign key (handled by EF Core shadow properties) in order to persist owned entities in relational databases. The provider therefore leverages that existing logic, which is why the ShipId property also appears within the Origin sub-document.

Figure 3 A Ship Document with an Origin Sub-Document Embedded

{
  "ShipId": "e5d48ffd-e52e-4d55-97c0-cee486a91629",
  "ConsortiumId": "60ccb22d-4422-45b2-a54a-71fa240435b3",
  "Discriminator": "Ship",
  "PlanetId": 0,
  "ShipName": "Nathan Hale 3rd",
  "id": "c2bdd90f-cb6a-4a3f-bacf-b0b3ac191662",
  "Origin": {
    "ShipId": "e5d48ffd-e52e-4d55-97c0-cee486a91629",
    "Date": "2019-01-22T11:40:29.117453-05:00",
    "Discriminator": "Origin",
    "Location": "Earth"
  },
  "_rid": "cgEVAKklUPgCAAAAAAAAAA==",
  "_self": "dbs/cgEVAA==/colls/cgEVAKklUPg=/docs/cgEVAKklUPgCAAAAAAAAAA==/",
  "_etag": "\"0000a43b-0000-0000-0000-5c47477d0000\"",
  "_attachments": "attachments/",
  "_ts": 1548175229
}

EF Core also has the ability to map owned collections with the OwnsMany mapping. In this case, you’d see multiple sub-documents within the parent document in the database.
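As a rough sketch of what an owned collection could look like, consider the following. The Inspection type, the Inspections property and the FleetContext class are all hypothetical and not part of the Expanse model, and the Ship class is a trimmed-down variant. Depending on the EF Core version, an owned collection may also need an explicitly configured key, which this sketch leaves out:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical owned type with no key property of its own.
public class Inspection
{
    public DateTime Date { get; set; }
    public string Inspector { get; set; }
}

// Trimmed-down Ship variant carrying a collection of owned objects.
public class Ship
{
    public Guid ShipId { get; set; }
    public string ShipName { get; set; }
    public List<Inspection> Inspections { get; set; } = new List<Inspection>();
}

public class FleetContext : DbContext
{
    public DbSet<Ship> Ships { get; set; }

    // OnConfiguring (with the UseCosmos call) is omitted for brevity.

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Each Inspection is meant to be embedded in the parent Ship
        // document rather than stored as a document of its own.
        modelBuilder.Entity<Ship>().OwnsMany(s => s.Inspections);
    }
}
```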

There’s a gotcha that will be fixed in EF Core 3.0.0 preview 2. EF Core currently doesn’t understand null owned entity properties. The other database providers will throw a runtime exception if you attempt to add an object with a null owned entity property, a behavior you can read about in the previously mentioned April 2018 column. Unfortunately, the Cosmos DB provider doesn’t prevent you from adding objects in this state, but it’s not able to materialize objects that don’t have the owned entity property populated. Here’s the exception that was raised when I encountered this problem:

"System.InvalidCastException: Unable to cast object of type
 'Newtonsoft.Json.Linq.JValue' to type 'Newtonsoft.Json.Linq.JObject'."

So if you see that error when trying to query entities that have owned type properties, I hope you’ll remember that it’s likely a null owned type property causing the exception.
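Until the fix is available, one way to sidestep the problem is to check for missing owned values before handing objects to the context. The following is a minimal sketch in plain C#. OriginGuard and its methods are hypothetical helpers, and the Ship and Origin classes are trimmed-down stand-ins for the ones in Figure 2:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-ins for the article's classes.
public class Origin
{
    public DateTime Date { get; set; }
    public string Location { get; set; }
}

public class Ship
{
    public Guid ShipId { get; set; }
    public string ShipName { get; set; }
    public Origin Origin { get; set; }
}

public static class OriginGuard
{
    // Returns the ships whose owned Origin has not been populated.
    public static List<Ship> FindMissingOrigins(IEnumerable<Ship> ships)
    {
        return ships.Where(s => s.Origin == null).ToList();
    }

    // Throws if any ship lacks an Origin, so the problem surfaces
    // before saving rather than later, when the documents are queried.
    public static void EnsureOrigins(IEnumerable<Ship> ships)
    {
        var missing = FindMissingOrigins(ships);
        if (missing.Count > 0)
        {
            var names = string.Join(", ", missing.Select(s => s.ShipName));
            throw new InvalidOperationException(
                "Ships without an Origin: " + names);
        }
    }
}
```

Calling OriginGuard.EnsureOrigins(consortium.Ships) just before SaveChanges would then fail fast instead of leaving behind documents that can’t be materialized.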

Logging the Provider Activity

EF Core plugs into the .NET Core logging framework, as I covered in my October 2018 column (msdn.com/magazine/mt830355). Shortly after that article was published, the syntax for instantiating the LoggerFactory was simplified, although the means of using categories and log levels to determine what should get output in the logs didn’t change. I reported the updated syntax in a blog post, “Logging in EF Core 2.2 Has a Simpler Syntax—More Like ASP.NET Core” (bit.ly/2UdSkuI).

When EF Core interacts with the Cosmos DB provider, it also shares details with the logger. This means you can see all of the same types of information in the logs that you can with other providers.

Keep in mind that Cosmos DB doesn’t use SQL for inserting, updating and deleting, as you’re used to doing with relational databases. SQL is used for queries only, so SaveChanges won’t show SQL in the logs. However, you can see how EF Core is fixing up the objects, creating any needed IDs, foreign keys and discriminators. I was able to see all of this information by logging every category at the Debug LogLevel, rather than filtering on only the database commands.

Here’s how I configured my GetLoggerFactory method to do that. Notice the AddFilter method. Rather than passing a category into the first parameter, I’m using an empty string, which gives me every category:

private ILoggerFactory GetLoggerFactory()
{
  IServiceCollection serviceCollection = new ServiceCollection();
  serviceCollection.AddLogging(builder =>
         builder.AddConsole()
                .AddFilter("" , LogLevel.Debug));
  return serviceCollection.BuildServiceProvider()
          .GetService<ILoggerFactory>();
}

If I’d wanted to filter on just the SQL commands, I’d have passed DbLoggerCategory.Database.Command.Name, which supplies the correct string for just those events, instead of an empty string. Logging every category relayed a lot of messages when I inserted a few graphs and then executed a single query to retrieve some of that inserted data. I’ll include the full output and my program in the download that accompanies this column.
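For reference, the command-only variation described above might look like the following sketch. The CommandLogging class and the GetCommandLoggerFactory method name are my own for this illustration. Everything else mirrors the GetLoggerFactory method shown earlier:

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

public static class CommandLogging
{
    // Same shape as GetLoggerFactory, but limited to the category
    // EF Core uses for database commands.
    public static ILoggerFactory GetCommandLoggerFactory()
    {
        IServiceCollection serviceCollection = new ServiceCollection();
        serviceCollection.AddLogging(builder =>
            builder.AddConsole()
                   .AddFilter(DbLoggerCategory.Database.Command.Name,
                              LogLevel.Debug));
        return serviceCollection.BuildServiceProvider()
            .GetService<ILoggerFactory>();
    }
}
```

Either factory is handed to the context in OnConfiguring, for example with optionsBuilder.UseLoggerFactory(CommandLogging.GetCommandLoggerFactory()). The filter stays at Debug because the provider writes its query messages at that level.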

Some interesting tidbits from those logs include this information about adding shadow properties where you can, in the case of this provider, see the special Discriminator property being populated:

dbug: Microsoft.EntityFrameworkCore.Model[10600]
      The property 'Discriminator' on entity type 'Station' was created in shadow state
      because there are no eligible CLR members with a matching name.

If you’re saving data, after all of that fix-up is performed, you’ll see a log message that SaveChanges is starting:

dbug: Microsoft.EntityFrameworkCore.Update[10004]
      SaveChanges starting for 'ExpanseContext'.

This is followed by messages about DetectChanges being called. The provider will use internal API logic to add, modify or remove the document in the relevant collection, but you won’t see any particular logs about that action. However, after these actions complete, the logs will relay typical post-save steps such as the context updating the state of the object that was just posted:

dbug: Microsoft.EntityFrameworkCore.ChangeTracking[10807]
      The 'Consortium' entity with key '{ConsortiumId: a4b0405e-a820-4806-8b60-159033184cf1}' 
      tracked by 'ExpanseContext' changed from 'Added' to 'Unchanged'.

If you’re executing a query, you’ll see a number of messages as EF Core works out the query. EF Core starts by compiling the query and then massages it until it arrives at the SQL that gets sent to the database. Here’s a log message showing the final SQL:

dbug: Microsoft.EntityFrameworkCore.Database.Command[30000]
      Executing Sql Query [Parameters=[]]
      SELECT c
      FROM root c
      WHERE (c["Discriminator"] = "Consortium")

Waiting for Release

The EF Core Cosmos DB provider preview is available for EF Core 2.2+. I worked with EF Core 2.2.1 and then, in order to see if I noticed any changes, switched to the unreleased EF Core packages in the latest preview of EF Core 3, version 3.0.0-preview.18572.1.

EF Core 3 is on the same release schedule as .NET Core 3.0, but the latest information about the timing only says “sometime in 2019.” The official release of Preview 2 was announced at the end of January 2019 in the blog post at bit.ly/2UsNBp6. If you’re interested in this support for Azure Cosmos DB, I recommend trying it out now and helping the EF team uncover any problems to make it a more viable provider for you when it does get released.


Julie Lerman is a Microsoft Regional Director, Microsoft MVP, software team coach and consultant who lives in the hills of Vermont. You can find her presenting on data access and other topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at bit.ly/PS-Julie.

Thanks to the following Microsoft technical expert for reviewing this article: Andriy Svyryd
Andriy Svyryd is a Microsoft developer who specializes in data modeling and API design.  He has been a developer on the Entity Framework team since 2010. His work and personal projects can be seen at https://github.com/AndriySvyryd. Full biography is available at https://www.linkedin.com/in/andriy-svyryd-51364719/

