Performance Quiz #6 -- Analyzing the managed code
Raymond didn't post an update today, so I think this is a good time for me to post the first analysis of the managed code. There are some things to notice right away, and I've highlighted them in the table below. This table was generated in the same way as the others, and again I've pruned the tree so that no functions with inclusive time less than 5% are shown. That means some things won't add up, because small contributors are gone, but it helps focus attention on the relevant bits and keeps this table reasonably sized. I've also removed functions that were basically just zero or near-zero cost wrappers, so there are some internal "..." entries. That's what those are.
| Function | Exclusive % | Inclusive % |
|---|---|---|
| __CorExeMain (from mscoree.dll) | 0.00 | 92.31 |
| __CorExeMain (from mscorwks.dll) | 0.00 | 88.46 |
| ExecuteEXE(struct HINSTANCE__ *) | 0.00 | 69.23 |
| Assembly::ExecuteMainMethod(class PtrArray * *) | 0.00 | 59.62 |
First thing to notice is that the time attributed to the overall "main" of the program (that would be __CorExeMain) is only 92%. Why isn't that 100%? Well, there are a few reasons. The first is that the CLR has more than one thread, so some of the time went to spinning up the finalizer thread, for instance. A second reason is that some percentage of the run will be kernel time, and we won't be able to attribute those stacks to anything, so those are gone. The third, and most subtle, reason is that some places during the run of a managed program have a stack frame that is so crazy (usually because we're in the middle of an exotic transition) that the profiler can't walk it. That causes it to drop some samples. The result of these factors is that main is usually close to, but not quite, 100% of the time. In this case 8% of the samples were elsewhere or not traceable.
But now let's get to some meatier conclusions: NS.Test.Main(), which is the managed main function, is getting only 59% of the time. Even allowing for the fact that main was only 92.31% of the time, the managed main is only 64.6% of that. Why is that?
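That 64.6% is just the ratio of the two inclusive percentages from the table; a quick check of the arithmetic:

```python
# Percentages taken from the profile table above.
cor_exe_main = 92.31   # inclusive % for __CorExeMain (the process "main")
managed_main = 59.62   # inclusive % for the managed entry point

# The managed main's share of the traceable run.
share = managed_main / cor_exe_main
print(f"{share:.1%}")  # → 64.6%
```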
Well, there were some assorted small things in the set pruned from this view (each at less than 5%), but they added up: some of it was checking the security configuration only to discover that we are running at full trust, some of it was looking for possible administrative configurations, and some of it was setting up the performance counters for this instance of the application.
Another chunk of the time is still visible even in this pruned view: EnsureEEStarted(), for instance, at 18.27% of the startup, including 6.73% to get the profiling services working. That part of the cost wouldn't even be present if the program weren't run under the profiler. All these startup tasks tend to require other DLLs, typically already loaded, but we still have to get access to them in this process. It all adds up.
Am I happy with a number like 59.62%? Well no, it's my job to make that number bigger (i.e. reduce the overhead) but this is a fairly small application... the whole run will be just over 100ms so basically the (fairly) fixed cost of starting up the runtime is significant in this case. It would be less daunting if the program did more.
Note to reviewers of managed systems: If you want to make a framework -- any framework -- look bad, simply ask it to do nothing. It turns out pretty much all known frameworks are lousy at doing nothing. So "hello world" for instance is a great way to make a framework look like a pig.
Still there's room for improvement there which is why I have a job :)
Moving on, System.IO.StreamReader.ReadLine() is at 28%, which may not look too high; after all, this program is primarily a parser. However, looking closer we can see that the hefty work, decoding code page 950, is in DBCSCodePageEncoding.GetChars(), and that's only at 15.39%. It's not unhealthy to see a number like 15% spent doing one of the nice meaty jobs at hand, but how did we go from 15% to decode the text all the way up to 28% to decode the text and break it into lines? That seems excessive; maybe we could do something about that.
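The division of labor is easy to see outside the CLR too. Here's a minimal Python sketch (code page 950 is Big5, available in Python as `cp950`; the sample text is made up) separating the two jobs that ReadLine() bundles together: decoding the bytes, and then scanning for line breaks and allocating a fresh string per line.

```python
# Hypothetical Big5-encoded input: two lines of text (cp950 == code page 950).
raw = "第一行\n第二行\n".encode("cp950")

# Step 1: the meaty decode work, the analogue of DBCSCodePageEncoding.GetChars().
text = raw.decode("cp950")

# Step 2: what the ReadLine() layer adds on top of the decode: scanning
# for newline characters and allocating one new string per line.
lines = text.splitlines()
print(lines)  # → ['第一行', '第二行']
```

The gap the post is asking about is everything in step 2 plus the buffering glue around it, not the decode itself.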
Looking further down, NS.Test.DictionaryEntry.Parse() is up at 26%, compared to the 28% for System.IO.StreamReader.ReadLine(). Now what's up there... it's just some simple scanning and string allocation. The string allocation should be the main cost, yet only 14.42% of the time is in Substring, of which a mere 6.73% is the allocation itself. Plus the IndexOfChar worker, which should be doing most of the searching, is only at 5.77%. Well, there is a good bit of control flow and function call setup in the Parse function. That's probably a good chunk of it right there.
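The source of Parse isn't shown here, but a minimal sketch of that kind of work, one IndexOf-style scan plus a couple of Substring-style allocations, might look like this in Python (the tab separator and the word/definition line shape are assumptions, not the actual NS.Test format):

```python
def parse_entry(line: str):
    """Split a dictionary line into (word, definition).

    A guess at the shape of NS.Test.DictionaryEntry.Parse():
    one scan for the separator, then two substring allocations.
    """
    sep = line.find('\t')          # the IndexOfChar-style scan
    if sep < 0:
        return None                # malformed line: no separator found
    word = line[:sep]              # Substring-style allocation #1
    definition = line[sep + 1:]    # Substring-style allocation #2
    return word, definition

print(parse_entry("apple\ta round fruit"))  # → ('apple', 'a round fruit')
```

Even in a sketch this small you can see there's branching and call overhead around the scan and the allocations, which is where the "missing" percentage between 26% and the sum of Substring plus IndexOfChar plausibly goes.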
Still, the overall picture is pretty good. No exotic critical sections, no weird interlocked operations, very little time in the allocator. You can see why we posted a great out-of-the-gate number like 124ms. Even with the CLR's fixed overhead we got great value out of the allocator and the input mapping services.
But we can make it better still :)