EF Merge Options and Compiled Queries

Recently there have been some questions about compiled queries and how they relate to merge options.  As I looked into it I realized that I didn’t fully understand the details of how it all worked, so I walked down the hall and spent a little while chatting with one of the devs who has done the most work on this part of the code.  He was able to give me a more complete story, and I wanted to get it out to all of you, especially since this area of the EF just isn’t as clean as we would like.  This is definitely another thing on the list for improvement in a future release (*after* .NET 4), but for now I hope to help you understand how things work so you can get by until we can make it better.

Part 1: Merge Options

The MergeOption property of ObjectQuery<T> (its type is the MergeOption enum) has a significant effect on the way the EF processes a query.  The default is MergeOption.AppendOnly, which implements the identity resolution pattern that is the default for most ORMs: new entities retrieved by the query are attached to the ObjectContext, and if an entity with the same key as an incoming entity is already attached to the context, then that existing object is returned as is rather than the incoming entity.  Two of the other merge options are primarily used for conflict resolution.  MergeOption.PreserveChanges updates the original values that the context has for an entity with the current data from the database, so that the changes you have made in the context will be saved to the database and replace whatever is there.  MergeOption.OverwriteChanges updates the current values the context has, so that your entity is changed to match the current state of the database.  The most interesting merge option, though, is MergeOption.NoTracking, which essentially says “just give me whatever entities come from the database without trying to attach them or identity resolve them against the context.”  NoTracking is, understandably, the fastest option.  What’s a little less obvious is that in many cases NoTracking will also produce a different query than the others, because it is the simplest, most streamlined version that does the least magic.  For example, automatically retrieving EntityKey properties for entities related through independent associations does not happen with NoTracking queries, only with the other merge options.

For the purpose of today’s post, however, let’s home in on another aspect of MergeOptions which is that they are a property of a particular query instance NOT a property of the ObjectContext.  So, for instance, I can use LINQ with the EF and create two separate queries which retrieve data from the same entity set but have different merge options, and to make things clear I would do that by getting the query instance and setting the MergeOption property on that query.  Unfortunately, since IQueryable is a fixed interface for LINQ, and it’s what is returned when I use LINQ syntax, I typically have to cast my LINQ query to ObjectQuery before I can set the merge option.  The result is code that looks something like this:

// executing a query directly from the context will get the default AppendOnly option
var query1 = from c in ctx.Customers
             where c.Country == "UK"
             select c;
var customer1 = query1.First();
Debug.Assert(customer1.EntityState == EntityState.Unchanged);

// if I want a different merge option, then I set the MergeOption on the query instance
var query2 = from c in ctx.Customers
             where c.Country == "UK"
             select c;
((ObjectQuery)query2).MergeOption = MergeOption.NoTracking;
var customer2 = query2.First();
Debug.Assert(!Object.ReferenceEquals(customer1, customer2));
Debug.Assert(customer2.EntityState == EntityState.Detached);

One pattern you will sometimes see is someone setting a merge option on the ObjectQuery property of the context (an ObjectSet in the case of EF4) and then writing a query based on that property.  This actually works, to a point, because the getter for these properties caches the ObjectQuery instance (at least with the default codegen; if someone uses a custom template or writes the context by hand, then all bets are off).  Even so, it’s important to remember that the merge option is a property of the query instance, not the context.  Let me show you how you could get tripped up if you weren’t careful:

var query1 = from c in ctx.Customers
             where c.Country == "UK"
             select c;
ctx.Customers.MergeOption = MergeOption.NoTracking;
var customer1 = query1.First();
Debug.Assert(customer1.EntityState == EntityState.Unchanged);

In the code above, what do you expect?  Would the Assert fire or not?  What about with this code:

ctx.Customers.MergeOption = MergeOption.NoTracking;
var query2 = from c in ctx.Customers
             where c.Country == "UK"
             select c;
var customer2 = query2.First();
Debug.Assert(customer2.EntityState == EntityState.Detached);

As it turns out, neither assert will fire.  In the first case, the merge option wasn’t set until after the query was created, and since the merge option comes from the query rather than the context, query1 just uses the default merge option, so the entity that is retrieved is attached and ends up in the Unchanged state.  In the second case, the query is created after the merge option was set, so it is based on the Customers query returned from the property and inherits its merge option.  When the customer is retrieved from the database, NoTracking is used and the entity ends up Detached.

If you look at the generated code for the Customers property on the context you will see the ObjectQuery caching in action:

public ObjectSet<Customer> Customers
{
    get
    {
        if ((_Customers == null))
        {
            _Customers = base.CreateObjectSet<Customer>("Customers");
        }
        return _Customers;
    }
}
private ObjectSet<Customer> _Customers;

If this code had instead been written to always return the result of calling CreateObjectSet (a shortcut I sometimes take if I write an ObjectContext by hand for a simple POCO example or something), then both of the above queries would have used the default merge option.  The line that sets the MergeOption would set it on a newly created ObjectSet, which is then thrown away because no one holds onto the reference, and the line below it that creates query2 would use a different newly created ObjectSet.
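As a minimal sketch of that non-caching shortcut, a hand-written context might look something like the following.  The class name SimpleNorthwindContext and the "name=NorthwindEntities" connection string are placeholders made up for illustration, and Customer is assumed to be a POCO class mapped in the model:

```csharp
using System.Data.Objects;

// Hypothetical hand-written context (not generated code).
// The connection string name and container name are assumptions.
public class SimpleNorthwindContext : ObjectContext
{
    public SimpleNorthwindContext()
        : base("name=NorthwindEntities", "NorthwindEntities")
    {
    }

    // No backing field: every read of this property creates a brand new
    // ObjectSet<Customer>, which starts out with the default AppendOnly option.
    public ObjectSet<Customer> Customers
    {
        get { return CreateObjectSet<Customer>("Customers"); }
    }
}
```

With a context like this, ctx.Customers.MergeOption = MergeOption.NoTracking has no lasting effect, because the next read of ctx.Customers hands back a fresh ObjectSet that still has the default merge option.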

Are you with me so far?  Put that info on the back burner, and let’s look at how compiled queries work.

Part 2: Compiled Queries

The idea of compiled queries is to reduce the cost associated with executing a particular query the first time by making sure that you pay that cost only once if you want to execute that query multiple times.  The way it works is that you create your LINQ query in advance and call the Compile method to get back a special delegate that you can use later to execute that query.  There are three steps.  First, you declare a static field and initialize it to the delegate returned by calling the Compile method:

static Func<NorthwindEntities, string, IQueryable<Customer>> compiledQuery =
    CompiledQuery.Compile((NorthwindEntities ctx, string country) =>
        (from c in ctx.Customers
         where c.Country == country
         select c));

When you are ready to execute the query, you invoke the delegate and pass in the context and parameters in order to get back the ObjectQuery.

var query = compiledQuery(ctx, "UK");

This query can then be used like any other:

foreach (var customer in query)
{
    // do some stuff
}

As it turns out, though, there are a few very unexpected things about the way compiled queries work.  Again, I can only say that I’m sorry that this is such a tricky part of the EF, that we’ll work on it in a later release, and that in the meantime if you know how it works, you can at least figure out how to get your app to function:

  1. No real work happens until the first time the query is actually executed.   CompiledQuery.Compile is NOT like a prepare method where the heavy lifting happens when you call prepare.  The hard work doesn’t even happen when the delegate is invoked to return the ObjectQuery.  It only happens when the query’s Execute method is called or, more commonly, when the query is enumerated.  After the first time it is executed, though, all the hard work is cached so that future executions are faster.

  2. If you create a new query based on the compiled query, it will work, but you won’t get any benefit from the compilation.   ANYTHING that changes the query sent to the server will produce a new query which isn’t precompiled.  Calling .First() or .Count() or .Any(), for instance, will change the query.  You need to keep the query EXACTLY the same.  One way to accomplish this is to call the .AsEnumerable() method on the compiled query.  Any additional methods you call after that will be executed by LINQ to Objects, so you get the benefit of the compilation for the part that goes against the database and then do the other steps in memory once the database query is done.
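To illustrate the second point, here is a rough sketch of the two ways of counting customers on top of a compiled query.  The CustomerQueries class and its method names are hypothetical; NorthwindEntities and Customer are assumed to come from the same generated model used elsewhere in this post:

```csharp
using System;
using System.Data.Objects;
using System.Linq;

// Hypothetical helper class, for illustration only.
static class CustomerQueries
{
    // Same shape as the compiled query shown earlier in this post.
    static readonly Func<NorthwindEntities, string, IQueryable<Customer>> ByCountry =
        CompiledQuery.Compile((NorthwindEntities ctx, string country) =>
            from c in ctx.Customers
            where c.Country == country
            select c);

    // The compiled query is executed unchanged; Count() then runs in memory
    // via LINQ to Objects over the customers that come back.
    public static int CountReusingCompiledQuery(NorthwindEntities ctx, string country)
    {
        return ByCountry(ctx, country).AsEnumerable().Count();
    }

    // Count() is composed onto the IQueryable, which changes the query sent
    // to the server, so the benefit of the compilation is lost.
    public static int CountWithNewQuery(NorthwindEntities ctx, string country)
    {
        return ByCountry(ctx, country).Count();
    }
}
```

The trade-off is that AsEnumerable() pulls every matching customer back to the client just to count them, so this approach only pays off when the compiled query already returns roughly the data you need.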

The third unexpected behavior deserves a whole section of its own…

Part 3: Combining Merge Options and Compiled Queries

The really tricky thing here comes when you combine compiled queries and merge options.  Because the merge option is a property of the ObjectQuery instance, and with compiled queries you don’t actually get that instance until you invoke the delegate, you can’t specify the merge option when you initially call the Compile method or when you invoke the delegate.  The merge option does, however, affect the query that is actually generated, so the merge option is locked in at the time the query is first executed (that’s when all the interesting work happens, remember).  At that first execution, the EF examines the merge option on the ObjectQuery property of the context passed into the delegate.  After that, the merge option stays the same for every subsequent execution of the query, regardless of the merge option set on the ObjectQuery of the context used for that execution.

So if you take the following code (using the same compiled query delegate created in the example in part 2 above):

using (var ctx1 = new NorthwindEntities())
{
    ctx1.Customers.MergeOption = MergeOption.NoTracking;
    var query1 = compiledQuery(ctx1, "UK");
    var customer1 = query1.AsEnumerable().First();
    Debug.Assert(customer1.EntityState == EntityState.Detached);
}

Then the compiled query will use the NoTracking merge option, so the assert doesn’t fail.  If you follow that with this code…

using (var ctx2 = new NorthwindEntities())
{
    var query2 = compiledQuery(ctx2, "France");
    var customer2 = query2.AsEnumerable().First();
    Debug.Assert(customer2.EntityState == EntityState.Detached);
}

The second execution will also use the NoTracking merge option even though the ObjectQuery on its context has AppendOnly as its merge option.

To make matters a little more complicated, keep in mind unexpected behavior #2: if I had left out the AsEnumerable calls, the executions would use new queries rather than reusing the compiled query, so the second execution would NOT have the locked-in merge option and would instead pick up the merge option from the query on the context.  So if the second execution code looked like this:

using (var ctx2 = new NorthwindEntities())
{
    var query2 = compiledQuery(ctx2, "France");
    var customer2 = query2.First();
    Debug.Assert(customer2.EntityState == EntityState.Detached);
}

Then the assert would fail because the AppendOnly merge option would be used and the entity state would end up Unchanged rather than Detached.

Summary

What can I say?  Compiled queries are tricky, and when you combine them with merge options they get even trickier, but the performance benefit can be huge, so it’s worth learning about how they work.  Keep in mind these three potentially unexpected behaviors:

  1. No real work happens until the first time the query is actually executed. 
  2. If you create a new query based on the compiled query, it will work—you just won’t get any benefit from the compilation. 
  3. The merge option used with a compiled query is determined by the merge option specified on the ObjectQuery used as the basis for the compiled query at the time the query is first executed.

Now, back to trying to find time to complete the next phase of D3 and get that posted.  :-)

- Danny