Unity Container speed test – huh?

I was doing a little surfing and came across a post on DI container performance measurements by Philip Mateescu. Having just done an intro article on Unity, I was interested in seeing how it placed. I hadn’t thought much about the performance implications of the containers, but I knew it could be very useful information to have. I was a bit shocked when I got to the performance of the singleton_loaded and transient_loaded tests.

Unity posting a time 363X slower than the leader was a little surprising. That usually is an indication that something isn’t right. Since the source code for Unity is available, I decided it might be worth a look.

Creating a baseline

I didn’t have a lot of fancy equipment available to dig into the problem: just my somewhat dated laptop and the ability to create debug builds of Unity from source. I figured that should be sufficient to narrow down the issue, though. I ran the tests using the code and binaries I assumed were used to produce the numbers in the post:

 Doing 5 runs of 100000 DI loops each. Program: singleton_loaded.
 Autofac        :     253 ms.
 Castle Windsor :     420 ms.
 Ninject        :   1,277 ms.
 Spring         :     306 ms.
 StructureMap   :     489 ms.
 Unity          :  76,423 ms.

The numbers were a bit larger than those from the post. I expected that, because my machine is probably not as hefty as the one the original tests ran on. The numbers also may not have been collected the same way as for the post; mine seem to be averaged over the 5 runs.

The next step was to drop in the binary I built from source to see what the delta from the baseline was. My source was for building version 2.1 against .NET 4.0. Mine was a debug build, so some slowdown was inevitable:

 Doing 5 runs of 100000 DI loops each. Program: singleton_loaded.
 Autofac        :      225 ms.
 Castle Windsor :      423 ms.
 Ninject        :    1,294 ms.
 Spring         :      312 ms.
 StructureMap   :      488 ms.
 Unity          :  115,946 ms.

Pretty scary, but it’s a place to start.

Profiling the test

I next ran the test in the profiler and found that the IsRegistered method called from the LoadedUnityRunner.Run method was using 95% of the time:

 if (k.IsRegistered<IDummy>())
     k.Resolve<IDummy>().Do();

I’ve seen this pattern before – usually with a dictionary however:

 if (m_dictionary.Contains(key))
 {
     m_dictionary.Remove(key);
 }
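
For reference, here is a minimal sketch of the same operation done with a single lookup. It assumes m_dictionary is a generic Dictionary<TKey, TValue>, where the membership test is spelled ContainsKey; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SingleLookupDemo
{
    // Check-then-act: ContainsKey searches for the key, then Remove searches again.
    public static bool RemoveWithCheck(Dictionary<string, int> map, string key)
    {
        if (map.ContainsKey(key))
        {
            map.Remove(key);
            return true;
        }
        return false;
    }

    // Single lookup: Remove returns false when the key is absent.
    public static bool RemoveDirect(Dictionary<string, int> map, string key)
    {
        return map.Remove(key);
    }

    // Single lookup for reads: TryGetValue checks and fetches in one step.
    public static int GetOrDefault(Dictionary<string, int> map, string key, int fallback)
    {
        int value;
        return map.TryGetValue(key, out value) ? value : fallback;
    }

    public static void Main()
    {
        var map = new Dictionary<string, int> { { "a", 1 }, { "b", 2 } };
        Console.WriteLine(RemoveWithCheck(map, "a"));     // True
        Console.WriteLine(RemoveDirect(map, "b"));        // True
        Console.WriteLine(RemoveDirect(map, "missing"));  // False
        Console.WriteLine(GetOrDefault(map, "a", -1));    // -1, since "a" was removed above
    }
}
```

With a hash-based dictionary the second lookup is cheap, so the duplication usually goes unnoticed. It only hurts when the existence check itself is expensive.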

The problem is that this pattern duplicates work: it searches for the specified key twice. I was curious, though, why the IsRegistered method was so much more expensive. Looking at the implementation, I found the problem:

 public static bool IsRegistered(this IUnityContainer container, Type typeToCheck, string nameToCheck)
 {
     Guard.ArgumentNotNull(container, "container");
     Guard.ArgumentNotNull(typeToCheck, "typeToCheck");

     var registration = from r in container.Registrations
                        where r.RegisteredType == typeToCheck && r.Name == nameToCheck
                        select r;
     return registration.FirstOrDefault() != null;
 }

This algorithm is very likely different from the one Resolve uses. The query is a linear scan: every call walks container.Registrations until it finds a match. Also, since container.Registrations returns IEnumerable<> and not IQueryable<>, the query runs as plain LINQ to Objects, so there probably isn’t much opportunity for it to be optimized.
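
To get a feel for the difference between scanning a sequence and doing a keyed lookup, here is a rough sketch. It does not use Unity at all: Registration is a hypothetical stand-in type, the 1,000 registrations are made up, and the keyed index is just one possible alternative, not how Unity resolves types.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for a container registration; not Unity's type.
public sealed class Registration
{
    public Type RegisteredType { get; set; }
    public string Name { get; set; }
}

public static class LookupCostDemo
{
    // Same idea as the IsRegistered query: walk the sequence until a match is found.
    public static bool IsRegisteredByScan(IEnumerable<Registration> registrations, Type type, string name)
    {
        return registrations.Any(r => r.RegisteredType == type && r.Name == name);
    }

    // Keyed alternative: one hash lookup, however many registrations exist.
    public static bool IsRegisteredByKey(HashSet<Tuple<Type, string>> index, Type type, string name)
    {
        return index.Contains(Tuple.Create(type, name));
    }

    public static void Main()
    {
        // Build 1,000 dummy registrations: one type, distinct names.
        var registrations = Enumerable.Range(0, 1000)
            .Select(i => new Registration { RegisteredType = typeof(string), Name = "name" + i })
            .ToList();
        var index = new HashSet<Tuple<Type, string>>(
            registrations.Select(r => Tuple.Create(r.RegisteredType, r.Name)));

        const int loops = 100000;

        // Worst case for the scan: the match is the last element.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < loops; i++) IsRegisteredByScan(registrations, typeof(string), "name999");
        Console.WriteLine("scan:  " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < loops; i++) IsRegisteredByKey(index, typeof(string), "name999");
        Console.WriteLine("keyed: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

The absolute timings depend on the machine and build, but the scan's cost grows with the number of registrations while the keyed lookup stays roughly flat.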

We are lucky in this case that the IsRegistered call really isn’t relevant to the test and can be removed. Resolve should indicate if it can’t find the item. In any case, the test was written so that it should always find the item. I rewrote the code as follows:

 var r = k.Resolve<IDummy>();            
 if (r != null)
 {
     r.Do();     
 }

This is in fact similar to how the other runners were written. After inspecting all the other runners and applying similar fixes to the Spring and Windsor runners, I was ready to try again.

Rerunning the test

Here are the results from the next run:

 Doing 5 runs of 100000 DI loops each. Program: singleton_loaded.
 Autofac        :    238 ms.
 Castle Windsor :    381 ms.
 Ninject        :  1,328 ms.
 Spring         :    227 ms.
 StructureMap   :    509 ms.
 Unity          :    645 ms.

Not bad, huh? Compared with the previous run, Spring performance improved by 27%, Windsor by 9.9%, and Unity by, well, a lot. Although Unity is not the fastest, it is at least in the ballpark now.
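
For anyone who wants to check the arithmetic, here is a small sketch that computes the improvement as (before - after) / before. The "before" values are taken from the debug-build run and the "after" values from the rerun; the helper names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class ImprovementDemo
{
    // Fraction of the original time that was saved: (before - after) / before.
    public static double Improvement(double beforeMs, double afterMs)
    {
        return (beforeMs - afterMs) / beforeMs;
    }

    // Formats a fraction as a percentage with one decimal, independent of locale.
    private static string Percent(double fraction)
    {
        return (fraction * 100).ToString("F1", CultureInfo.InvariantCulture) + "%";
    }

    public static void Main()
    {
        Console.WriteLine("Spring:  " + Percent(Improvement(312, 227)));     // 27.2%
        Console.WriteLine("Windsor: " + Percent(Improvement(423, 381)));     // 9.9%
        Console.WriteLine("Unity:   " + Percent(Improvement(115946, 645)));  // 99.4%
        Console.WriteLine("Unity speedup: " +
            (115946.0 / 645).ToString("F0", CultureInfo.InvariantCulture) + "x"); // 180x
    }
}
```

So "a lot" works out to roughly a 180-fold reduction in Unity's time on my machine.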

Contacting the Author

At this point I was ready to contact Philip, share my results, and ask him to rerun the tests with the changes above. It turns out he had already posted an update with the changes and additional information on Unity performance. In short, IsRegistered is not meant to be used for anything other than debugging.

Summary

Although my numbers aren’t directly comparable to those from the original post, I think it is safe to say that the Unity numbers were off because the task asked of each container was not the same. Philip did a good follow-up to his original post explaining the difference in the original numbers. If you are using Unity, you are probably not going to suffer the scary performance hits unless you use container.IsRegistered in the inner loop of your code.

Philip's post was a good reminder that you should always be aware of the performance behavior of the code you are building on. It also gave me a good opportunity to get hands-on with the Unity container.