Digging into interface calls in the .NET Framework: Stub-based dispatch.
In my last blog, I demonstrated how to use the .NET SOS.DLL extension DLL in the Visual Studio. In this entry I will use the capabilities of this extension DLL to dig into the way the .NET framework handles interface dispatch. As with all my blog entries, I hope that after reading this you will be inspired to try variations of the experiment I do here yourself.
Caveat: Interface dispatch has changed dramatically between version 1.1 and version 2.0 of the runtime. The example below is showing what happens for version 2.0. The techniques can be used to investigate how 1.1 interface dispatch works, however it is quite different.
Consider the follow program, that simply defines a do-nothing method ‘anIntefaceMethod’ and calls it 10 times in a loop. The question is: exactly what is the overhead of dispatching to an interface method?
class Test : AnInterface
public void anInterfaceMethod()
static public void Main()
AnInterface anInterface = new Test();
for(int i=0; i < 10; i++)
anInterface.anInterfaceMethod(); // breakpoint
Remember as always, you want to make certain that you have compiled this code with optimizations (The ‘Release’ build configuration, and that you have set the debugging options as explained in a previous blog.
By placing a breakpoint on the call to ‘anInterfaceMethod’ in Main, we can quickly switch the disassembly view (Debug->Windows->Disassembly) and see that the code for dispatching to this method is
call dword ptr ds:[00920018h]
Thus the dispatch is an indirect call through the memory location 0x0920018. You can use the Debug->Windows->Memory->Memory1 window to look at what is at 0x0920018 (Right click in the memory window to set the window to display as 4 byte integers, and be sure to 0x in front, as otherwise it is interpreted as decimal). In my case I find that the memory 0x0920018 has the value 0x0926012. (Note that you are likely to get different addresses if you try this yourself.)
Note that if I hit ‘go’ several times to let the ‘for’ loop run a few times, something rather surprising happens. The first time through the loop, value at 0x0920018 does not change, but the second time it changes value (in my case to 0x0092709e). After the first two times, the value at 0x0920018 no longer changes. Thus there is something special going on on the first couple of calls.
Most of the time, you are interested in the ‘steady-state’ performance characteristics, so it is not as interesting to understand the code on first couple of calls (in my case 0x0926012) then it is to understand the code on all other cases (in my case the code at 0x092709e). This is just as well, because as we will see, that case is more complicated. Let's look at what is going on on after the first two calls.
Unfortunately, If you try to do the ‘obvious' thing and look at the code at 0x092709e in Visual Studio’s disassembly window you will get a dialog box saying the code can not be displayed. Visual Studio is trying to ‘protect’ you against seeing internal details of the runtime (I find this rather annoying actually, the code really is there). This is where SOS comes to the rescue.
Following the instructions in my last blog, I set the ‘Enable Unmanged Code Debugging’ box on the project, and restart the debugger (hitting ‘go’ three times). Then I can open the Debug->Windows->Immediate window, and type
0092709E 813970309100 cmp dword ptr [ecx],913070h
009270A4 0F85F32F0000 jne 0092A011
009270AA E98D904B00 jmp 00DE00B0
It turns out that the 'jne' jump is almost never taken, so the interesting target is 0x0DE00B0. I do a !u to find out what code is at that address.
Normal JIT generated code
Begin 00de00b0, size 1
>>> 00DE00B0 C3 ret
Which is the telling me that 0x00DE00B0 the start of the method Test.anInterface(), and that this routine consists of a single return instruction.
The other intersting piece of data in the code at 0x092709e is the value used in the 'cmp' (compare) instruction, 0x913070. I happen to know that this is points at an internal data strcuture of the runtime called a 'MethodTable'. This structure represents a type of an object and the first field of every object in the GC heap points as this structure. SOS has a command called '!DumpMT' (Dump Method Table), that can display this. If
which is saying that 0x913070 is the internal type representation for the class ‘Test’.
Thus we have found the exact sequence of instructions that gets executed on most interface calls for my sample program. They are
mov ecx,esi // set up 'this'
call dword ptr ds:[00920018h] // call interface
cmp dword ptr [ecx],913070h // check if 'this' is an instance of 'Test'
The first two instructions are at the call site, and the last three are in a ‘stub’. Thus it takes 5 instructions to dispatch to a interface method in this case.
What is going on here?
OK, we have seen the code sequence used for interface dispatch for a particular example, now we need to understand it at an architectural level. The dispatch mechanism has the following attributes
- Every call site that dispatches through and interface has a pointer associated with it that points at code that actually performs the dispatch (in our example above this is 00920018). As execution proceeds, this pointer may change (that is the code that performs the dispatch might change).
- In the example above, this pointer quickly stabilizes to a 3 instruction sequence that checks to see if the 'this' object is of a particular type and if so dispatches to a particular address.
We can see that this technique works very well if at any particular dispatch site, you tend to call interfaces on object of the same type. This would happen for example if you are iterating over a collection which happens to have all the same type elements in it (a common case). In that case, the extra overhead is only 3 instructions (and these three instructions dispatch very well in modern processors).
Of course this is not always the case, in which case we care about what happens when the match does not happen. In this case, things get more complicated. Initially, it simply creates a new stub (with a new method table comparison), and makes the call site point at the new stub (hence the need for an indirect call). This works well if you tend to make many calls on the same object type for a while, then switch to another (for a while).... The common example here is library code on homogenous collections. The update logic is 'smart' however in that if it detects that you are bouncing back and forth between multiple objects types for a particular dispatch site, it replaces the stub with a more complicated one that does a hash-table style lookup (which will be slower, but does not need to be replaced on every change).
But I think we have gone far enough for this blog. We can go into details of polymorphic dispatch sites in another blog entry if there is interest.
Interface dispatch in the version 2.0 .NET Runtime is based on dispatch stubs.
- Each call site might have a different dispatch stub.
- Initially, these stubs point at a generic dispatcher, however quickly get updated to a 'monomorpic' stub that 'guesses' that the call site always dispatches to a particular destination. This works surprisingly well in most cases, and the overhead of dispatch for this case is very low.
- The stubs get updated if the stub's guess is wrong. First it simply tries a different monomorphic stub, however eventually it replaces it with a polymorphic stub that knows how to dispatch reasonably efficiently to one of several targets.
As always, I recommend you try these experiments yourself. For example, you can confirm that if a call site dispatches to two different targets, it will follow the heuristics that I describe above (Exercise for the reader).