Stack Corruption: Calling convention mismatch
The stack is one of the most widely understood data structures in computer science. It is a general-purpose data structure, and most modern computer architectures also support it directly. In the context of a thread running in an executing process, “the stack” is the memory set aside specifically for that thread to store local variables, function parameters, return addresses, and other register values that need to be saved for later retrieval. Stack corruptions tend to be among the trickier memory corruptions to track down, so I decided to write a series of blog posts on them. Some of them, like buffer overruns, can even lead to security risks. The focus of this article, however, is calling convention mismatches.
So what is a calling convention? As the name suggests, a calling convention is a convention between the caller of a function and the function itself: a set of predefined rules that both sides agree upon to maintain the integrity of the stack. These rules essentially describe two things:
1) How parameters are passed to the called function.
2) Who removes the parameters from the stack once the function call is finished (the caller or the called function).
Even the name that the compiler emits for a function (its decorated name, produced by “name mangling”) is governed by the calling convention.
On x64 Windows there is essentially only one calling convention, a fastcall-like convention in which the first four arguments are passed in registers. On x86, however, multiple calling conventions are allowed, of which the most commonly used are:
1) __stdcall
2) __cdecl
3) __fastcall
4) __thiscall
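In __cdecl, the caller cleans the stack; in the other three, the callee does (on return, with a ret n instruction). One might ask why __cdecl is needed at all, since every call site carries its own cleanup code, which costs code size. The answer is that in C/C++ a function can take a variable number of arguments, and only the caller knows how many it pushed, so __cdecl is a necessity.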
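In __fastcall, the first two DWORD-or-smaller arguments found in the argument list from left to right are passed in the ECX and EDX registers; all remaining arguments are pushed on the stack from right to left. In __thiscall, the ECX register contains the this pointer.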
Now let us write a test program to discuss calling convention mismatch errors.
Say you have defined a function in a DLL with the standard calling convention (__stdcall):
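__declspec(dllexport) void __stdcall testFunc(void) { } // in the DLL: exported as __stdcall
Now we call this function from a client EXE that refers to this DLL as follows:
#include <tchar.h>
__declspec(dllimport) void testFunc(void); // no convention specified, so the default __cdecl is assumed
int _tmain(int argc, _TCHAR* argv[]) {
    int a = 1, b = 0;
    testFunc();
    return 0;
}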
We get a linker error:
error LNK2001: unresolved external symbol "__declspec(dllimport) void __cdecl testFunc(void)" (__imp_?testFunc@@YAXXZ)
The reason for this linker error is that the name mangling performed by the compiler (needed for features such as function overloading in C++) takes the calling convention into account. We can run dumpbin on the import library and compare the names:
> dumpbin /all "Stack Sample.lib" > StackSample.txt
Archive member name at 6DE: /0 Stack Sample.dll
DLL name : Stack Sample.dll
Symbol name : ?testFunc@@YGXXZ (void __stdcall testFunc(void))
Name : ?testFunc@@YGXXZ
We were lucky this time: we got a linker error. The client asked for ?testFunc@@YAXXZ (YA marks a global __cdecl function), while the library provides ?testFunc@@YGXXZ (YG marks a global __stdcall function), which shows that name mangling depends on the calling convention used. If the function call is resolved at compile time (or, more precisely, if type information for the function being called is available), the compiler and linker work together to make sure that you are calling the correct function, as happened here. In some cases this is not possible, so neither of them can detect a mismatch. Two common scenarios are:
1) Calling a managed function from native code, where a managed delegate is passed to the native code as a function pointer.
When doing so, make sure that the native function pointer is declared as __stdcall, because a delegate marshaled to native code uses the platform default calling convention (__stdcall on x86) unless that is overridden with the UnmanagedFunctionPointer attribute.
2) Using GetProcAddress to call a function. The type of the function pointer, including its calling convention, must match the type of the called function.
On x86, the call instruction pushes the return address (the address of the instruction following the call) onto the stack, and the ret instruction pops that value from the stack into EIP; ret n additionally removes n bytes of arguments. In theory, there are two kinds of mismatch:
1) If the caller assumes that the called function is __cdecl and cleans the stack, but the function is actually __stdcall, the arguments are removed twice: once by the callee's ret n and once more by the caller. ESP ends up higher than the caller expects, so the caller's later stack accesses read the wrong slots, and the value it eventually pops as its own return address can be wrong, loading EIP with a bad address. If you are lucky, this results in an access violation when execution reaches that address. Otherwise, the program jumps to whatever that address happens to be, and your application goes into undefined behavior.
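2) If, conversely, the caller assumes __stdcall but the function is actually __cdecl, no stack cleanup occurs at all. ESP ends up lower than expected, so EIP can again be loaded with a wrong value (typically one of the parameters you pushed), which again results in an access violation or unexpected behavior.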
If your application is crashing and you suspect stack corruption, the best way to proceed is to run it under a debugger, set a breakpoint in the function where the crash occurs, and single-step from there. If single-stepping produces an access violation or a STATUS_ILLEGAL_INSTRUCTION exception (these are not the only two you can get), and the faulting instruction comes just after a return from a function call, the first thing to check is whether there is a calling convention mismatch.