Call stack gymnastics: Why is function Foo missing from my call stack?
Sometimes you expect to see a specific function on the stacks collected by the Concurrency Visualizer, but to your surprise it is not there. To understand why this happens, we have to start with some background:
- Event Tracing for Windows (ETW) is the system-wide tracing mechanism shipped with Windows; the Concurrency Visualizer uses it to collect data for a profiling session.
- ETW can collect call stacks for both user and kernel events.
- The mechanism ETW uses to walk the stacks must be efficient, to minimize the performance disruption of the scenario being profiled.
On x86 machines, ETW relies on a chain of stack frame pointers to quickly reconstruct the stacks. ETW assumes that the frame base pointer register (EBP) points to the head of a linked list of stack frames, where each frame holds the caller's saved EBP followed by a return address. Those return addresses identify the functions currently on the application stack.
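The frame-pointer walk described above can be sketched as a toy simulation. This is not ETW's implementation: the stack is modeled as a Python list indexed by word "address", a frame at index `ebp` holds the caller's saved EBP in `stack[ebp]` and a return address in `stack[ebp + 1]` (real x86 frames use 4-byte slots at [EBP] and [EBP+4]), and the function names, code addresses, and the 0 end-of-chain marker are all hypothetical. The bounds check stands in for the kind of sanity check any real walker needs so that a bogus EBP cannot send it off the stack.

```python
# Toy model of x86 frame-pointer (EBP) stack walking; NOT ETW's real code.
# Assumptions: the stack is a list of words indexed by "address", a frame at
# index ebp holds the caller's saved EBP in stack[ebp] and a return address in
# stack[ebp + 1], and a saved EBP of 0 marks the outermost frame.

def walk_frame_chain(stack, ip, ebp, max_frames=64):
    """Return the current IP followed by the return address of each frame."""
    frames = [ip]
    lowest_valid = 0  # sanity check: every older frame must lie higher on the stack
    while len(frames) < max_frames and lowest_valid <= ebp < len(stack) - 1:
        frames.append(stack[ebp + 1])  # return address into the caller
        lowest_valid = ebp + 2         # the stack grows down, so callers sit above
        ebp = stack[ebp]               # follow the saved-EBP link to the caller's frame
    return frames

# Hypothetical code layout: (start, end, name) address ranges.
FUNCTIONS = [(0x100, 0x200, "start"), (0x200, 0x300, "main"),
             (0x300, 0x400, "Foo"), (0x400, 0x500, "Bar")]

def symbolize(addr):
    """Map a code address to the function containing it ("??" if unknown)."""
    return next((name for lo, hi, name in FUNCTIONS if lo <= addr < hi), "??")

# Call chain start -> main -> Foo -> Bar, every function with a frame pointer.
stack = [0] * 16
stack[14], stack[15] = 0, 0x120   # main's frame: end of chain, return into start
stack[12], stack[13] = 14, 0x230  # Foo's frame: main's EBP, return into main
stack[10], stack[11] = 12, 0x340  # Bar's frame: Foo's EBP, return into Foo
print([symbolize(a) for a in walk_frame_chain(stack, ip=0x410, ebp=10)])
# ['Bar', 'Foo', 'main', 'start']
```

In this model each frame costs only two reads, which illustrates why a frame-pointer walk is cheap enough to run while tracing.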
However, there is an old optimization, Frame Pointer Omission (FPO), that breaks the assumption that a clean chain of callers starts at the EBP register. When this optimization is turned on, the compiler does not generate the code that saves and updates EBP on function entry, and it may use EBP as a general-purpose register. The downside is that correctly walking the stack then requires access to the PDB (symbol) files of every function on the stack that uses FPO, or else manually disassembling the code of each of those functions. This significant inconvenience, plus the fact that the benefit of the optimization had become marginal as processors grew more powerful, led the Windows team to build the OS without FPO, and many other groups in Microsoft followed suit.
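To see how FPO hides a caller, the same kind of toy model can be rerun with one hypothetical function, Bar, compiled with FPO: it never pushes or updates EBP, so when it calls Baz, Baz's prologue saves Foo's EBP instead of a Bar frame. The walker then jumps straight from Baz's frame to Foo's frame, and the return address that identifies Foo as Bar's caller is never read. As before, the stack layout, names, and addresses are assumptions made for illustration, not ETW internals.

```python
# Toy EBP-chain walker (same assumptions as a simple frame-pointer model:
# stack[ebp] = saved EBP, stack[ebp + 1] = return address, 0 ends the chain).

def walk_frame_chain(stack, ip, ebp, max_frames=64):
    """Return the current IP followed by the return address of each frame."""
    frames = [ip]
    lowest_valid = 0
    while len(frames) < max_frames and lowest_valid <= ebp < len(stack) - 1:
        frames.append(stack[ebp + 1])  # return address stored just above the saved EBP
        lowest_valid = ebp + 2         # an older frame must sit higher on the stack
        ebp = stack[ebp]               # follow the saved-EBP link
    return frames

# Hypothetical code layout: (start, end, name) address ranges.
FUNCTIONS = [(0x100, 0x200, "start"), (0x200, 0x300, "main"),
             (0x300, 0x400, "Foo"), (0x400, 0x500, "Bar"), (0x500, 0x600, "Baz")]

def symbolize(addr):
    return next((name for lo, hi, name in FUNCTIONS if lo <= addr < hi), "??")

# Call chain start -> main -> Foo -> Bar -> Baz; only Bar is compiled with FPO.
stack = [0] * 16
stack[14], stack[15] = 0, 0x120   # main's frame: end of chain, return into start
stack[12], stack[13] = 14, 0x230  # Foo's frame: main's EBP, return into main
stack[11] = 0x340                 # return into Foo, pushed by Foo's call to Bar
                                  # Bar (FPO) pushes no EBP and leaves EBP == 12
stack[10] = 0x440                 # return into Bar, pushed by Bar's call to Baz
stack[9] = 12                     # Baz's prologue saves EBP, which still points at Foo's frame
print([symbolize(a) for a in walk_frame_chain(stack, ip=0x510, ebp=9)])
# ['Baz', 'Bar', 'main', 'start']  <- Foo is missing
```

The reconstructed stack goes from Bar straight to main: the function missing is the caller of the FPO function, not the FPO function itself.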
The Microsoft C/C++ compiler has the /Oy- switch to disable this optimization (and this is now the default in Visual Studio). However, it is hard to be 100% immune to FPO: whenever you link to a library, dynamically or statically, your code is subject to any FPO contained in that library. Moreover, even with /Oy- in place you may still see some function calls without a stack frame if other optimizations are enabled. This happens when the compiler does not need to create any local or temporary variables for a function (for example, because it keeps your locals in CPU registers); in this case /Oy- only ensures that EBP is not used as a general-purpose register. This can produce some unexpected effects on the stacks displayed in the Concurrency Visualizer reports:
- The stack event contains a single instruction pointer; in this case, a function with FPO was on top of the stack.
- The stack is incomplete: it stops before the expected caller, omits expected functions in the middle of the stack, or (rarely) even shows calls that are not supposed to be there. (This can happen when the FPO function uses EBP as a general-purpose register and its value happens to be a valid stack address.)
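Both the single-instruction-pointer case and the spurious calls can be reproduced in a toy model when an FPO function repurposes EBP. The sketch below uses a simple EBP-chain walker with a hypothetical FPO function Bar on top of the stack. In the first case EBP holds a non-address value (say, a loop counter), so the walk stops right after the instruction pointer. In the second, EBP happens to point into Bar's own locals, whose stale contents look like a frame with a return address into a hypothetical function Qux. All addresses and stack contents are invented for illustration.

```python
# Toy EBP-chain walker: stack[ebp] = saved EBP, stack[ebp + 1] = return
# address, 0 ends the chain (assumptions of this model, not ETW internals).

def walk_frame_chain(stack, ip, ebp, max_frames=64):
    """Return the current IP followed by the return address of each frame."""
    frames = [ip]
    lowest_valid = 0
    while len(frames) < max_frames and lowest_valid <= ebp < len(stack) - 1:
        frames.append(stack[ebp + 1])  # whatever word sits above "saved EBP"
        lowest_valid = ebp + 2         # older frames must be higher on the stack
        ebp = stack[ebp]               # follow the (possibly bogus) link
    return frames

# Hypothetical code layout: (start, end, name) address ranges.
FUNCTIONS = [(0x100, 0x200, "start"), (0x200, 0x300, "main"),
             (0x300, 0x400, "Foo"), (0x400, 0x500, "Bar"), (0x600, 0x700, "Qux")]

def names(frames):
    """Symbolize each code address ("??" if it is in no known function)."""
    return [next((n for lo, hi, n in FUNCTIONS if lo <= a < hi), "??") for a in frames]

# start -> main -> Foo -> Bar, where Bar (FPO) is executing and has repurposed EBP.
stack = [0] * 16
stack[14], stack[15] = 0, 0x120   # main's frame
stack[12], stack[13] = 14, 0x230  # Foo's frame
stack[11] = 0x340                 # return into Foo, pushed by Foo's call to Bar
stack[9], stack[10] = 14, 0x610   # Bar's locals: stale words that look like a frame

ip_in_bar = 0x420
only_ip = names(walk_frame_chain(stack, ip_in_bar, ebp=0x7FFF))  # EBP = loop counter
bogus = names(walk_frame_chain(stack, ip_in_bar, ebp=9))         # EBP = stack address
print(only_ip)  # ['Bar']
print(bogus)    # ['Bar', 'Qux', 'start']
```

In the second case Foo and main vanish and Qux, which is not on the call chain at all, shows up in their place.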
Windows can also hijack an application thread to execute Deferred Procedure Call (DPC) code, so FPO is not the only possible explanation for an odd stack. Still, whenever you see many lost, unexpected, or incomplete stacks, consider whether the application is being compiled with FPO or whether the code path in question uses a library built with FPO. For C/C++ projects in Visual Studio, the optimization settings can be controlled from the project properties under "Configuration Properties | C/C++ | Optimization", where the "Omit Frame Pointers" option corresponds to /Oy. So, choose your optimization settings carefully, understand how other libraries can affect your call stacks, and have a great time profiling your application.
Paulo Janotti – Parallel Computing Platform