64-bit calling convention and PInvoke bugs

Ok, so Word for the Mac is failing me right now. I've tried twice to start this entry there and both times Word has gone kaput on me. Back to my trusty text editor… As for the inevitable "why Mac?" question... Well, I still haven't found a laptop I like as much as my Titanium Powerbook. What can I say, I'm a hardware snob.

I originally intended to write my next entry about the GAC and its usefulness on 64-bit machines (for both the 32-bit and 64-bit CLRs that live there). I think this is an interesting topic, especially given this article on Chris Sells' site. Alas, in writing it I realized that I need to do a little research and talk to a couple of people before I feel completely confident in my facts.

So, to whet your appetite while we wait, how about an entry about the managed x64 calling convention, and a fun PInvoke bug that shows up on 64-bit platforms because of the hardware difference and this calling convention? I would highly recommend reading Raymond's treatment of the x64 calling convention.

One of the nice things about x64 is that we have narrowed ourselves down to one standard calling convention, unlike x86 (of course, if you're writing assembly you can do whatever you want). Both native code generated by the VC++ compiler and JITted managed code follow this convention. It goes something like this:

- Arguments 1-4 are passed in registers rcx, rdx, r8, r9 (or, if floating point, Xmm0-Xmm3)

- Spill space is allocated by the caller for the enregistered parameters

- Additional parameters are passed on the stack at higher addresses than this spill space (the stack grows down, so they are pushed before it) in SLOT-sized (read: 8-byte) chunks (i.e. even if you have a 1-byte bool, if you pass it on the stack it will take 8 bytes).

- The call instruction pushes an 8-byte return address onto the stack; this value sits immediately below the spill slot for rcx.

- The stack must always be aligned to 16 bytes by non-leaf functions (read: if you make a call then you have to align it in the prolog).

- Non-floating-point return values come back in rax (the exception is the "retbufarg" case, which is treated later).

- Floating point returns are in Xmm0.

Floating point note: enregistered floating point parameters are put into the floating point register corresponding to their correct position in the argument list (e.g. if parameter 3 is floating point then it will be in Xmm2 instead of r8, this is different from IA64 where the floating point registers are filled using a "next available" heuristic).
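To make the positional rule concrete, here is a minimal sketch that models where each argument lands. The `X64ArgSketch` class and `ArgKind` enum are hypothetical names made up for illustration (nothing like this exists in the CLR), and the stack offsets assume you are looking at the stack at callee entry, with the return address at [rsp] and the 32 bytes of spill space right above it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model for illustration only; not a CLR or Win32 API.
public enum ArgKind { Integer, FloatingPoint }

public static class X64ArgSketch
{
    static readonly string[] IntRegs = { "rcx", "rdx", "r8", "r9" };
    static readonly string[] FloatRegs = { "xmm0", "xmm1", "xmm2", "xmm3" };

    // Returns the location of each argument, in argument-list order.
    public static List<string> Assign(params ArgKind[] args)
    {
        var locations = new List<string>();
        for (int i = 0; i < args.Length; i++)
        {
            if (i < 4)
            {
                // The register is picked by position i, not by "next available".
                locations.Add(args[i] == ArgKind.FloatingPoint ? FloatRegs[i] : IntRegs[i]);
            }
            else
            {
                // Skip the 8-byte return address plus the four 8-byte spill
                // slots; every stack argument then takes one 8-byte slot.
                int offset = 8 + 8 * i;
                locations.Add("[rsp+0x" + offset.ToString("x") + "]");
            }
        }
        return locations;
    }
}
```

Calling `X64ArgSketch.Assign(ArgKind.Integer, ArgKind.Integer, ArgKind.FloatingPoint, ArgKind.Integer, ArgKind.Integer)` gives rcx, rdx, xmm2, r9 and then a stack slot: the third argument takes Xmm2 and r8 simply goes unused.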

That's the basics. Here are a couple of rules that build on that:

- If there is a "this" parameter (i.e. instance methods) it is put at the front of the argument list as arg1 and other args are moved by 1 slot.

- If there is a "retbufarg" then it will be treated as arg1, moving other arguments by 1 slot (including the “this” parameter). (e.g. arg1=retbufarg, arg2=this, arg3=declaration arg1, arg4=declaration arg2, etc...)
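The ordering of the hidden arguments can be sketched in a few lines. `HiddenArgSketch.BuildSlots` is a hypothetical helper invented for this illustration; it only reproduces the ordering described above and says nothing about how the JIT actually implements it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class HiddenArgSketch
{
    // Builds the effective argument list: retbufarg first (if any),
    // then "this" (if any), then the declared arguments in order.
    public static List<string> BuildSlots(bool hasRetBuf, bool hasThis, params string[] declared)
    {
        var slots = new List<string>();
        if (hasRetBuf) slots.Add("retbufarg");
        if (hasThis) slots.Add("this");
        slots.AddRange(declared);
        return slots;
    }
}
```

For an instance method with two declared arguments and a retbufarg, `BuildSlots(true, true, "a", "b")` yields retbufarg, this, a, b, which means all four argument registers are already taken before a third declared argument would show up.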

Most people (at least those reading this blog right?) know what a "this" parameter is, but what's a "retbufarg" parameter? It is a "secret" reference to space that is caller-allocated to receive the return value. This "retbufarg" parameter is passed when we can't put the return value in the return register rax. On x64 this happens when:

- the return value is larger than 64 bits (i.e. it won't fit in rax), excepting Doubles, which will be returned through Xmm0.

- the size of the return value is not a power of two. e.g. a 7-byte value class (struct) returned by value will be returned by reference in a retbufarg.
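Those two rules boil down to a small predicate. This is just a sketch of the rules as stated here: `RetBufSketch.NeedsRetBuf` is a hypothetical name, it only looks at the size in bytes of a by-value struct or integer return, and it deliberately does not model floating point returns, which go through Xmm0.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class RetBufSketch
{
    // True if a by-value return of this size would go through a retbufarg
    // instead of coming back in rax, per the two rules above.
    public static bool NeedsRetBuf(int sizeInBytes)
    {
        if (sizeInBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(sizeInBytes));

        bool tooBigForRax = sizeInBytes > 8;
        // A power of two has exactly one bit set, so n & (n - 1) is zero.
        bool notPowerOfTwo = (sizeInBytes & (sizeInBytes - 1)) != 0;
        return tooBigForRax || notPowerOfTwo;
    }
}
```

So sizes 1, 2, 4 and 8 come back in rax, while a 7-byte struct or a 16-byte struct gets a retbufarg.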

Ok, so that's all well and good, but you might be asking why you need to know any of it. Well, because it can affect lots of things. Let's take a PInvoke example that one of the devs on the 64-bit CLR team ran into on Thursday:

// definition that worked on 32-bit

[DllImport(ExternDll.User32, ExactSpelling=true)]
public static extern IntPtr MonitorFromPoint(int x, int y, int flags);

The actual Win32 API specifies that MonitorFromPoint() takes a POINT structure and an int argument named flags. Someone decided that it would be nice to not have to define a POINT structure (which is just an 8-byte structure consisting of two ints, x and y) and instead wrote their PInvoke using the two ints shown above.

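This works on x86, where those parameters are passed on the stack. In fact, because you get lucky with the calling convention, the two 4-byte ints land in adjacent stack slots and occupy exactly the same 8 bytes that a by-value POINT would, so they look to the Win32 API as if you had correctly declared the POINT structure and passed it instead.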

But!! On x64 this breaks in a rather interesting way... Let's go back to the calling convention discussion above. Using this scheme, the parameters will be set up as such:

rcx <- x

rdx <- y

r8 <- flags

Now, these register slots on x64 are 8 bytes wide, which means our 8-byte POINT structure, when passed by value, should actually be passed in a single register. What was the MonitorFromPoint() Win32 API expecting?

rcx <- POINT { LONG x, LONG y }

rdx <- flags

NOTE: keep in mind that the LONG as specified by MSDN here is the C++ LONG, which is still 32 bits on 64-bit platforms, not 64 bits like the C# long. It is the equivalent of the C# int.


[correction made here, x/y high/low were reversed]

MonitorFromPoint() expected that x was the low 4 bytes of rcx and y was the high 4 bytes. As can be imagined, this code failed horribly on x64 as such:

- Specifically, the call was in some code that tried to compensate for multiple monitors by putting a dialog on the monitor where your mouse is.

return new Screen(SafeNativeMethods.MonitorFromPoint(point.X, point.Y, MONITOR_DEFAULTTONEAREST));

- The calculation depends on the x and y that you pass it (remember that the monitor’s upper left hand corner actually starts at 2000, 2000 or something like that)

[correction made here, re:messing up x/y position within struct... wrote it too late at night]

- The calculation that we do ends up FUBAR because the x you give the method ends up being seen by the Win32 API as the whole POINT structure. Thus, it thinks that y==0, and the dialog ends up pretty much unusable up in the upper left hand corner of the screen (halfway off the screen) with its title bar inaccessible to grab it and move it.
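If you want to see the mismatch in numbers, here is a small simulation of the registers. It is only a sketch: `PointPackingSketch` and its methods are hypothetical names, nothing here calls into Win32, and it assumes the upper 32 bits of rcx and rdx happen to be zero when a 32-bit int is passed (which is what makes the API see y==0 in the story above; strictly speaking those bits are not guaranteed).

```csharp
using System;

// Hypothetical simulation for illustration only; no native call is made.
public static class PointPackingSketch
{
    // A by-value POINT { int x; int y; } in one 64-bit register:
    // x is the low 4 bytes, y is the high 4 bytes.
    public static ulong Pack(int x, int y)
    {
        return unchecked((ulong)(uint)x | ((ulong)(uint)y << 32));
    }

    // How the callee reads a POINT back out of that register.
    public static (int X, int Y) Unpack(ulong register)
    {
        return unchecked(((int)(uint)register, (int)(uint)(register >> 32)));
    }

    // What the native side ends up seeing with the (int x, int y, int flags)
    // declaration: rcx <- x, rdx <- y, r8 <- flags (r8 is never read).
    public static (int X, int Y, int Flags) SeenWithBrokenDeclaration(int x, int y, int flags)
    {
        ulong rcx = unchecked((uint)x); // assumption: upper 32 bits are zero
        ulong rdx = unchecked((uint)y); // assumption: upper 32 bits are zero
        var point = Unpack(rcx);        // the API treats rcx as the whole POINT
        int seenFlags = unchecked((int)(uint)rdx); // and rdx as the flags
        return (point.X, point.Y, seenFlags);
    }
}
```

With x=2100, y=2050 and flags=2, `SeenWithBrokenDeclaration` returns (2100, 0, 2050): the API sees the right x, a y of 0, and your y coordinate in place of the flags, while the real flags value sits unread in r8.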

So, the fix, if you haven’t already guessed, is to define a POINT structure containing two ints, “x” and “y”, and declare it as the first parameter to MonitorFromPoint(), so that the PInvoke declaration matches what the Win32 API actually expects.

public static extern IntPtr MonitorFromPoint(NativeMethods.POINT pt, int flags);
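For completeness, here is roughly what the full corrected declaration could look like. Treat it as a sketch under a few assumptions: the `NativeMethodsSketch` container and the layout of its POINT are my guess at what NativeMethods.POINT looks like, and I've used the literal "user32.dll" in place of the ExternDll.User32 constant from the original code. MONITOR_DEFAULTTONEAREST is the documented Win32 value 2.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: the class name and POINT definition are assumptions,
// standing in for the NativeMethods types used in the original code.
public static class NativeMethodsSketch
{
    // Two 32-bit ints laid out in declaration order: 8 bytes total,
    // matching the native POINT { LONG x; LONG y; }.
    [StructLayout(LayoutKind.Sequential)]
    public struct POINT
    {
        public int x;
        public int y;

        public POINT(int x, int y)
        {
            this.x = x;
            this.y = y;
        }
    }

    public const int MONITOR_DEFAULTTONEAREST = 0x00000002;

    // The struct is passed by value, so on x64 it travels in rcx as a
    // single 8-byte slot and flags travels in rdx.
    [DllImport("user32.dll", ExactSpelling = true)]
    public static extern IntPtr MonitorFromPoint(POINT pt, int flags);
}
```

The call site then becomes something like `NativeMethodsSketch.MonitorFromPoint(new NativeMethodsSketch.POINT(point.X, point.Y), NativeMethodsSketch.MONITOR_DEFAULTTONEAREST)`, which of course only runs on Windows. The same declaration keeps working on x86, since the by-value struct occupies the same 8 bytes of stack that the two ints did.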

NOTE: this will fail in the same way on IA64, but since this is an entry about the x64 calling convention, I thought I'd stick to talking about x64.

PInvoke errors are insidious because you tend to take for granted that the method you're calling is declared correctly. You can easily spend hours convincing yourself that your managed code is correct. Or, even worse, spend hours looking at your unmanaged code (or the disassembly of some unmanaged code in Win32, for instance), convinced it is broken. If there are PInvokes involved, I would usually take a look at those first. Hopefully some of the CDPs (customer debug probes) that are going into the CLR for V2.0 will help out a lot. I haven't really played with them at all, but Adam Nathan's blog would probably be a good place to start.

Additionally, Raymond discusses what can go wrong when you mismatch calling conventions. This is something you might think impossible on 64-bit as we only have the one... But, a PInvoke declaration can have calling convention assumptions built into it, as seen above... Yet another case of "old problem, new form"!