May 2011

Volume 26 Number 05

Entity Framework - Code First in the ADO.NET Entity Framework 4.1

By Rowan Miller | May 2011

The ADO.NET Entity Framework 4.1 was released back in April and includes a series of new features that build on top of the existing Entity Framework 4 functionality that was released in the Microsoft .NET Framework 4 and Visual Studio 2010.

The Entity Framework 4.1 is available as a standalone installer (msdn.microsoft.com/data/ee712906), as the “EntityFramework” NuGet package and is also included when you install ASP.NET MVC 3.01.

The Entity Framework 4.1 includes two new main features: DbContext API and Code First. In this article, I’m going to cover how these two features can be used to develop applications. We’ll take a quick look at getting started with Code First and then delve into some of the more advanced capabilities.

The DbContext API is a simplified abstraction over the existing ObjectContext type and a number of other types that were included in previous releases of the Entity Framework. The DbContext API surface is optimized for common tasks and coding patterns. Common functionality is exposed at the root level and more advanced functionality is available as you drill down through the API.

Code First is a new development pattern for the Entity Framework that provides an alternative to the existing Database First and Model First patterns. Code First lets you define your model using CLR classes; you can then map these classes to an existing database or use them to generate a database schema. Additional configuration can be supplied using Data Annotations or via a fluent API.

Getting Started

Code First has been around for a while, so I’m not going to go into detail on getting started. You can complete the Code First Walkthrough (bit.ly/evXlOc) if you aren’t familiar with the basics. Figure 1 is a complete code listing to help get you up and running with a Code First application.

Figure 1 Getting Started with Code First

using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;
using System;

namespace Blogging
{
  class Program
  {
    static void Main(string[] args)
    {
      Database.SetInitializer<BlogContext>(new BlogInitializer());

      // TODO: Make this program do something!
    }
  }

  public class BlogContext : DbContext
  {
    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
      // TODO: Perform any fluent API configuration here!
    }
  }

  public class Blog
  {
    public int BlogId { get; set; }
    public string Name { get; set; }
    public string Abstract { get; set; }

    public virtual ICollection<Post> Posts { get; set; }
  }

  public class RssEnabledBlog : Blog
  {
    public string RssFeed { get; set; }
  }

  public class Post
  {
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public byte[] Photo { get; set; }

    public virtual Blog Blog { get; set; }
  }

  public class BlogInitializer : DropCreateDatabaseIfModelChanges<BlogContext>
  {
    protected override void Seed(BlogContext context)
    {
      context.Blogs.Add(new RssEnabledBlog
      {
        Name = "blogs.msdn.com/data",
        RssFeed = "http://blogs.msdn.com/b/data/rss.aspx",
        Posts = new List<Post>
        {
          new Post { Title = "Introducing EF4.1" },
          new Post { Title = "Code First with EF4.1" },
        }
      });

      context.Blogs.Add(new Blog { Name = "romiller.com" });
      context.SaveChanges();
    }
  }
}

For the sake of simplicity, I’m choosing to let Code First generate a database. The database will be created the first time I use BlogContext to persist and query data. The rest of this article will apply equally to cases where Code First is mapped to an existing database schema. You’ll notice I’m using a database initializer to drop and recreate the database as we change the model throughout this article.

Mapping with the Fluent API

Code First begins by examining your CLR classes to infer the shape of your model. A series of conventions are used to detect things such as primary keys. You can override or add to what was detected by convention using Data Annotations or a fluent API. There are a number of articles about achieving common tasks using the fluent API, so I’m going to look at some of the more advanced configuration that can be performed. In particular, I’m going to focus on the “mapping” sections of the API. A mapping configuration can be used to map to an existing database schema or to affect the shape of a generated schema. The fluent API is exposed via the DbModelBuilder type and is most easily accessed by overriding the OnModelCreating method on DbContext.

Entity Splitting Entity splitting allows the properties of an entity type to be spread across multiple tables. For example, say I want to split the photo data for posts out into a separate table so that it can be stored in a different file group. Entity splitting uses multiple Map calls to map a subset of properties to a specific table. In Figure 2, I’m mapping the Photo property to the “PostPhotos” table and the remaining properties to the “Posts” table. You’ll notice that I didn’t include the primary key in the list of properties. The primary key is always required in each table; I could have included it, but Code First will add it in for me automatically.

Figure 2 Entity Splitting

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
  modelBuilder.Entity<Post>()
    .Map(m =>
      {
        m.Properties(p => new { p.Title, p.Content });
        m.ToTable("Posts");
      })
    .Map(m =>
      {
        m.Properties(p => new { p.Photo });
        m.ToTable("PostPhotos");
      });
}

Table-per-Hierarchy (TPH) Inheritance TPH involves storing the data for an inheritance hierarchy in a single table and using a discriminator column to identify the type of each row. Code First will use TPH by default if no configuration is supplied. The discriminator column will be aptly named “Discriminator” and the CLR type name of each type will be used for the discriminator values.

You may, however, want to customize how TPH mapping is performed. To do this, you use the Map method to configure the discriminator column values for the base type and then Map<TEntityType> to configure each derived type. Here I’m using a “HasRssFeed” column to store a true/false value to distinguish between “Blog” and “RssEnabledBlog” instances:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
  modelBuilder.Entity<Blog>()
    .Map(m => m.Requires("HasRssFeed").HasValue(false))
    .Map<RssEnabledBlog>(m => m.Requires("HasRssFeed").HasValue(true));
}

In the preceeding example, I’m still using a standalone column to distinguish between types, but I know that RssEnabledBlogs can be identified by the fact that they have an RSS feed. I can rewrite the mapping to let the Entity Framework know that it should use the column that stores “Blog.RssFeed” to distinguish between types. If the column has a non-null value, it must be an RssEnabledBlog:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
  modelBuilder.Entity<Blog>()
    .Map<RssEnabledBlog>(m => m.Requires(b => b.RssFeed).HasValue());
}

Table-per-Type (TPT) Inheritance TPT involves storing all properties from the base type in a single table. Any additional properties for derived types are then stored in separate tables with a foreign key back to the base table. TPT mapping uses a Map call to specify the base table name and then Map<TEntityType> to configure the table for each derived type. In the following example, I’m storing data that’s common to all blogs in the “Blogs” table and data specific to RSS-enabled blogs in the “RssBlogs” table:

modelBuilder.Entity<Blog>()
  .Map(m => m.ToTable("Blogs"))
  .Map<RssEnabledBlog>(m => m.ToTable("RssBlogs"));

Table-per-Concrete Type (TPC) Inheritance TPC involves storing the data for each type in a completely separate table with no foreign key constraints between them. The configuration is similar to TPT mapping, except you include a “MapInheritedProperties” call when configuring each derived type. MapInheritedProperties lets Code First know to remap all properties that were inherited from the base class to new columns in the table for the derived class:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
  modelBuilder.Entity<Blog>()
    .Map(m => m.ToTable("Blogs"))
    .Map<RssEnabledBlog>(m =>
      {
        m.MapInheritedProperties();
        m.ToTable("RssBlogs");
      });
}

By convention, Code First will use identity columns for integer primary keys. However, with TPC there’s no longer a single table containing all blogs that can be used to generate primary keys. Because of this, Code First will switch off identity when you use TPC mapping. If you’re mapping to an existing database that has been set up to generate unique values across multiple tables, you can re-enable identity via the property configuration section of the fluent API.

Hybrid Mappings Of course, the shape of your schema isn’t always going to conform to one of the patterns that I’ve covered, especially if you’re mapping to an existing database. The good news is that the mapping API is composable and you can combine multiple mapping strategies. Figure 3 includes an example that shows combining Entity Splitting with TPT Inheritance Mapping. The data for Blogs is split between “Blogs” and “BlogAbstracts” tables, and the data specific to RSS-enabled blogs is stored in a separate “RssBlogs” table.

Figure 3 Combining Entity Splitting with TPT Inheritance Mapping

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
  modelBuilder.Entity<Blog>()
    .Map(m =>
      {
        m.Properties(b => new { b.Name });
        m.ToTable("Blogs");
      })
    .Map(m =>
      {
        m.Properties(b => new { b.Abstract });
        m.ToTable("BlogAbstracts");
      })
    .Map<RssEnabledBlog>(m =>
      {
         m.ToTable("RssBlogs");
      });
}

Change Tracker API

Now that I’ve looked at configuring database mappings, I want to spend some time working with data. I’m going to delve straight into some more advanced scenarios; if you aren’t familiar with basic data access, take a minute to read through the Code First Walkthrough mentioned earlier.

State Information for a Single Entity In many cases, such as logging, it’s useful to get access to the state information for an entity. This can include things such as the state of the entity and which properties are modified. DbContext provides access to this information for an individual entity via the “Entry” method. The code snippet in Figure 4 loads one “Blog” from the database, modifies a property and then prints out the current and original values for each property to the console.

Figure 4 Getting State Information for an Entity

static void Main(string[] args)
{
  Database.SetInitializer<BlogContext>(new BlogInitializer());

  using (var db = new BlogContext())
  {
    // Change the name of one blog
    var blog = db.Blogs.First();
    blog.Name = "ADO.NET Team Blog";

    // Print out original and current value for each property
    var propertyNames = db.Entry(blog).CurrentValues.PropertyNames;
    foreach (var property in propertyNames)
    {
      System.Console.WriteLine(
        "{0}\n Original Value: {1}\n Current Value: {2}", 
        property, 
        db.Entry(blog).OriginalValues[property],
        db.Entry(blog).CurrentValues[property]);
    }
  }

  Console.ReadKey();
}

When the code in Figure 4 is run, the console output is as follows:

BlogId
 Original Value: 1
 Current Value: 1
 
Name
 Original Value: blogs.msdn.com/data
 Current Value: ADO.NET Team Blog
 
Abstract
 Original Value:
 Current Value:
 
RssFeed
 Original Value: http://blogs.msdn.com/b/data/rss.aspx
 Current Value: http://blogs.msdn.com/b/data/rss.aspx

State Information for Multiple Entities DbContext allows you to access information about multiple entities via the “ChangeTracker.Entries” method. There’s both a generic overload that gives entities of a specific type and a non-generic overload that gives all entities. The generic parameter doesn’t need to be an entity type. For example, you could get entries for all loaded objects that implement a specific interface. The code in Figure 5 demonstrates loading all blogs into memory, modifying a property on one of them and then printing out the state of each tracked blog.

Figure 5 Accessing Information for Multiple Entities with DbContext

static void Main(string[] args)
{
  Database.SetInitializer<BlogContext>(new BlogInitializer());

  using (var db = new BlogContext())
  {
    // Load all blogs into memory
    db.Blogs.Load();

    // Change the name of one blog
    var blog = db.Blogs.First();
    blog.Name = "ADO.NET Team Blog";

    // Print out state for each blog that is in memory
    foreach (var entry in db.ChangeTracker.Entries<Blog>())
    {
      Console.WriteLine("BlogId: {0}\n State: {1}\n",
        entry.Entity.BlogId,
        entry.State);
    }
  }

When the code in Figure 5 is run, the console output is as follows:

BlogId: 1
  State: Modified
 
BlogId: 2
  State: Unchanged

Querying Local Instances Whenever you run a LINQ query against a DbSet, the query is sent to the database to be processed. This guarantees that you always get complete and up-to-date results, but if you know that all the data you need is already in memory, you can avoid a round-trip to the database by querying the local data. The code in Figure 6 loads all blogs into memory and then runs two LINQ queries for blogs that don’t hit the database.

Figure 6 Running LINQ Queries Against In-Memory Data

static void Main(string[] args)
{
  Database.SetInitializer<BlogContext>(new BlogInitializer());

  using (var db = new BlogContext())
  {
    // Load all blogs into memory
    db.Blogs.Load();

    // Query for blogs ordered by name
    var orderedBlogs = from b in db.Blogs.Local 
                       orderby b.Name
                       select b;

    Console.WriteLine("All Blogs:");
    foreach (var blog in orderedBlogs)
    {
      Console.WriteLine(" - {0}", blog.Name);
    }

    // Query for all RSS enabled blogs
    var rssBlogs = from b in db.Blogs.Local
                   where b is RssEnabledBlog
                   select b;

    Console.WriteLine("\n Rss Blog Count: {0}", rssBlogs.Count());
  }

  Console.ReadKey();
}

When the code in Figure 6 is run, the console output is as follows:

All Blogs:
 - blogs.msdn.com/data
 - romiller.com
 
Rss Blog Count: 1

Navigation Property as a Query DbContext allows you to get a query that represents the contents of a navigation property for a given entity instance. This allows you to shape or filter the items you want to bring into memory and can avoid bringing back unnecessary data.

For example, I have an instance of blog and want to know how many posts it has. I could write the code shown in Figure 7, but it’s relying on lazy loading to bring all the related posts back into memory just so that I can find the count.

Figure 7 Getting a Count of Database Items with Lazy Loading

static void Main(string[] args)
{
  Database.SetInitializer<BlogContext>(new BlogInitializer());

  using (var db = new BlogContext())
  {
    // Load a single blog
    var blog = db.Blogs.First();

    // Print out the number of posts
    Console.WriteLine("Blog {0} has {1} posts.",
      blog.BlogId,
      blog.Posts.Count());
  }

  Console.ReadKey();
}

That’s a lot of data being transferred from the database and taking up memory compared to the single integer result I really need.

Fortunately, I can optimize my code by using the Entry method on DbContext to get a query representing the collection of posts associated with the blog. Because LINQ is composable, I can chain on the “Count” operator and the entire query gets pushed to the database so that only the single integer result is returned (see Figure 8).

Figure 8 Using DbContext to Optimize Query Code and Save Resources

static void Main(string[] args)
{
  Database.SetInitializer<BlogContext>(new BlogInitializer());

  using (var db = new BlogContext())
  {
    // Load a single blog
    var blog = db.Blogs.First();

    // Query for count
    var postCount = db.Entry(blog)
      .Collection(b => b.Posts)
      .Query()
      .Count();

    // Print out the number of posts
    Console.WriteLine("Blog {0} has {1} posts.",
      blog.BlogId,
      postCount);
  }

  Console.ReadKey();
}

Deployment Considerations

So far I’ve looked at how to get up and running with data access. Now let’s look a little further ahead at some things to consider as your app matures and you approach a production release.

Connection Strings: So far I’ve just been letting Code First generate a database on localhost\SQLEXPRESS. When it comes time to deploy my application, I probably want to change the database that Code First is pointed at. The recommended approach for this is to add a connection string entry to the App.config file (or Web.config for Web applications). This is also the recommended approach for using Code First to map to an existing database. If the connection string name matches the fully qualified type name of the context, Code First will automatically pick it up at run time. However, the recommended approach is to use the DbContext constructor that accepts a connection name using the name=<connection string name> syntax. This ensures that Code First will always use the config file. An exception will be thrown if the connection string entry can’t be found. The following example shows the connection string section that could be used to affect the database that our sample application targets:

<connectionStrings>
  <add 
    name="Blogging" 
    providerName="System.Data.SqlClient"
    connectionString="Server=MyServer;Database=Blogging;
    Integrated Security=True;MultipleActiveResultSets=True;" />
</connectionStrings>

Here’s the updated context code:

public class BlogContext : DbContext
{
  public BlogContext() 
    : base("name=Blogging")
  {}

  public DbSet<Blog> Blogs { get; set; }
  public DbSet<Post> Posts { get; set; }
}

Note that enabling “Multiple Active Result Sets” is recommended. This allows two queries to be active at the same time. For example, this would be required to query for posts associated with a blog while enumerating all blogs.

Database Initializers By default, Code First will create a database automatically if the database it targets doesn’t exist. For some folks, this will be the desired functionality even when deploying, and just the production database will be created the first time the application launches. If you have a DBA taking care of your production environment, it’s far more likely that the DBA will create the production database for you, and once your application is deployed it should fail if the database it targets doesn’t exist. In this article, I’ve also overridden the default initializer logic and have configured the database to be dropped and recreated whenever my schema changes. This is definitely not something you want to leave in place once you deploy to production.

The recommended approach for changing or disabling initializer behavior when deploying is to use the App.config file (or Web.config for Web applications). In the appSettings section, add an entry whose key is DatabaseInitializerForType, followed by the context type name and the assembly in which it’s defined. The value can either be “Disabled” or the initializer type name followed by the assembly in which it’s defined.

The following example disables any initializer logic for the context I’ve been using in this article:

<appSettings>
  <add 
    key="DatabaseInitializerForType Blogging.BlogContext, Blogging" 
    value="Disabled" />
</appSettings>

The following example will change the initializer back to the default functionality that will create the database only if it doesn’t exist:

<appSettings>
  <add 
    key="DatabaseInitializerForType Blogging.BlogContext, Blogging" 
    value="System.Data.Entity.CreateDatabaseIfNotExists EntityFramework" />
</appSettings>

User Accounts If you decide to let your production application create the database, the application will need to initially execute using an account that has permissions to create the database and modify schema. If these permissions are left in place, the potential impact of a security compromise of your application is greatly increased. I highly recommend that an application is run with the minimal set of permissions required to query and persist data.

More to Learn

Summing up, in this article , I  took a quick look at getting started with Code First development and the new DbContext API, both of which are 
included in the ADO.NET Entity Framework 4.1. You saw how the fluent API can be used to map to an existing database or to affect the shape of a database schema that’s generated by Code First. I then looked at the change tracker API and how it can be used to query local entity instances and additional information about those instances. Finally, I covered some considerations for deploying an application that uses Code First for data access.

If you’d like to know more about any of the features included in the Entity Framework 4.1, visit msdn.com/data/ef. You can also use the Data Developer Center forum to get help with using the Entity Framework 4.1: bit.ly/166o1Z.


Rowan Miller is a program manager on the Entity Framework team at Microsoft. You can learn more about the Entity Framework on his blog at romiller.com.

Thanks to the following technical expert for reviewing this article: Arthur Vickers