From the December 2001 issue of MSDN Magazine.

MSDN Magazine

The .NET Profiling API and the DNProfiler Tool
Matt Pietrek
Browse the code for this article at Code Center: DNProfiler

T

he Microsoft® .NET Common Language Runtime (CLR) does many things under the hood to create an easy to use, object-oriented platform. The list includes garbage collection, standardized cross-language exception handling, a comprehensive class library, metadata, interoperability with existing native code, and remoting. It also includes a cross-CPU instruction format (Intermediate Language, or IL), and a JIT compiler to turn the IL into code executable by the target CPU.
      As systems evolve and grow more complex, the ability to peer into their inner workings becomes increasingly valuable. In Windows®, seeing into the operations of mechanisms such as the executable loader or the memory manager revealed a hodgepodge of various techniques without much consistency between them. In addition, something that worked on the Windows 9x platform may not have worked on Windows NT® and Windows 2000, or vice versa. The most evolved mechanism for seeing the inner workings of a process under Win32® was the debug API, and even that consisted of only eight notifications.
      With .NET, the story is radically different. Because the CLR runtime is so central to any .NET program, it provides a logical place to insert hooks to see the guts of .NET. Anticipating the needs of tools developers and system-level programmers, Microsoft used the clean slate of .NET to create a very detailed and consistent way to observe the inner workings of a process (and in some cases, change the way it operates). This month, I'll describe this mechanism, as well as present a program (DNProfiler) for logging the operations of the .NET runtime.

The .NET Profiling API

      One means by which you can observe the .NET runtime in action is by using the profiling API. The profiling API is badly named, since it's useful for all sorts of things beyond just profiling. Consider the list of .NET actions that can be seen via the profiling API, shown in Figure 1.
      As Keanu Reeves might say, Whoooa! After years of figuring out how to hook into Windows the hard way, Microsoft has gone and made it downright easy (relatively speaking) to observe the CLR as it goes about its business. Every action from the list is seen as a distinct callback. In theory, just write the desired code in the appropriate callback and you're good to go.
      Currently, the best documentation for using the profiling API is the Profiling.doc file in the .NET SDK, in the Tool Developers Guide\docs directory. For the Beta 2 time frame, this file is slightly out of sync with the actual implementation. In this column, I won't exhaustively cover every aspect of the profiling API, but will instead concentrate on the big picture of what it can do.
      One of the main selling points of .NET is that it moves away from the need to use COM. It's ironic, therefore, that the .NET profiling API is COM-based. Using the profiling API involves creating an in-process COM server that implements a single interface. Each interface method represents a single event from one of the categories shown in Figure 1. Inside these callback methods, the code can use another COM interface (provided by the CLR) to retrieve information about the observed events.
      Personally, I would have preferred a simpler API where you register callback functions for each event of interest. Implementing the API as a COM server forces you to include the near-boilerplate code to make the COM registry gods happy. Since the profiling interface is COM-based, the odds are high that users of the profiling API will be coding in C++.
      While the need to use classic COM rather than managed .NET code might seem strange at first, it makes sense after some thought. If the interface were to be implemented using managed code, it would have side effects on the very things that it's supposed to be monitoring. For example, what if a managed code implementation of the profiling API triggered an exception? You'd see exception events that wouldn't have occurred if the API wasn't active.
      Once you have a COM server that implements the required profiling interface, the next task is to coerce the CLR into loading the COM server and invoking the methods. This trick is accomplished by setting environment variables! Mucking around with environment variables does seem so old school compared to the registry or XML files.
      Why not use the registry (or configuration files) to indicate to the CLR that the profiling API should be invoked? One reason that I heard from a developer at Microsoft was that requiring each .NET process to check the registry at startup would be a performance hit. Given all the trillions of other things that a .NET process does at startup, this reason seems circumspect. Nonetheless, this solution has the advantage of being global to all programs (if you set it in the system environment), or specific to a process. In the latter case, a tool such as a debugger or profiler can specify the environment of the target process when it's launched. Later on I'll go over the specific environment variables that need to be set.

The .NET Profiling API Interfaces

      The primary interface I'll be examining is ICorProfilerCallback, which is defined in CorProf.IDL from the .NET Framework SDK's Include directory. An implementation of this interface is required to use the profiling API. Your job is to supply the implementation. While there are dozens of methods, never fear. For the vast majority of them, you can simply return S_OK or E_NOTIMPL. To take specific action for a specific event (such as a method being JITed), write the desired code in the appropriate callback method.
      Examining the callback methods, you'll see that most of them take parameters giving additional information about the event. For instance, the JITCompilationStarted method takes a UINT parameter named functionId. What can you do with this? The answer lies in another interface, ICorProfilerInfo.
      ICorProfilerInfo is practically one-stop shopping for any sort of information from the profiling API. Given the functionId just mentioned, you can call ICorProfilerInfo:: GetTokenAndMetaDataFromFunction, which gives you back a metadata interface instance and the metadata token of the function. Using the metadata interface and the token, you can look up the function's name, its class, and practically anything else you'd want to know.
      In short, the profiling API consists primarily of two COM interfaces. The incoming interface, which you implement, is ICorProfilerCallback. Each method in this interface is called when a CLR event occurs. The outgoing interface, ICorProfilerInfo, is provided for you by the common language runtime, and is for you to use while inside event callbacks.
      A common pattern in the ICorProfilerCallback parameters is the extensive use of IDs to represent functions, classes, modules, assemblies, and so on. ID values are opaque handles. Meaningful information about them is extracted via the ICorProfilerInfo interface. Also, certain ID values (such as a FunctionId) can change during the execution of the program if the function's code is unloaded and then re-JITted. There are callbacks to let you know when this happens, however.
      The ICorProfilerCallback methods break down into logical groupings. In most cases, events (such as assembly loads) have both a Started and Finished method, and are called in pairs. Let's go through them now and get a better idea of what you can observe within the CLR. Unless stated otherwise, all methods listed in the next paragraphs belong to the ICorProfilerCallback interface.

Initialize/Shutdown Methods

      Initialize is the first method called when a .NET process uses the profiling API. Your code gets its ICorProfilerInfo pointer from this method. The single parameter to Initialize is an LPUNKNOWN, on which you call QueryInterface to get the ICorProfilerInfo pointer. In addition, Initialize is where you tell the profiling API which events you're interested in.
      To indicate interest in events call ICorProfilerInfo::SetEventMask, passing in a DWORD with the appropriate bits set. The flag values are from the COR_PROF_MONITOR enum in CorProf.H. The lower value flags are named COR_PRF_MONITOR_XXX, and tell the CLR which ICorProfilerCallback methods to invoke. For example, if you want the ClassLoadStarted method to fire, you must set the COR_PRF_MONITOR_CLASS_LOADS flag.
      The remaining flags to ICorProfilerInfo::SetEventMask alter the behavior of the CLR in one way or another. For example, the COR_PRF_ENABLE_OBJECT_ALLOCATED flag must be set if you want to monitor object allocations at any time during execution. Similarly, the COR_PRF_DISABLE_INLINING tells the CLR not to inline any methods. If a method is inlined, you won't receive method enter and leave callbacks for it.
      You can modify which events are monitored later on by calling ICorProfilerInfo::SetEventMask again. However, certain events are immutable, meaning that once they're set in Initialize, they can't be changed later.
      The Shutdown method fires when the CLR is shutting down the process. In certain cases it may not fire, but for a normal .NET program with a normal lifetime, it should fire.

Application Domain Creation/Shutdown

      The methods in this category are AppDomainCreationStarted, AppDomainCreationFinished, AppDomainShutdownStarted, and AppDomainShutdownFinished. Their names are self-explanatory. The primary token for these methods is the AppDomainID. It's worth noting that the token passed to AppDomainCreationStarted can't be used during the callback because the AppDomain doesn't exist yet. However, upon receiving the AppDomainCreationFinished callback, you can pass the token to ICorProfilerInfo::GetAppDomainInfo to get information specific to the new AppDomain.

Assembly Loads/Unloads

      AssemblyLoadStarted, AssemblyLoadFinished, AssemblyUnloadStarted, and AssemblyUnloadFinished are methods that are involved in loading and unloading assemblies. The primary token for these methods is the AssemblyID. Note that the token passed to AssemblyLoadStarted can't be used during the callback. That's because the Assembly doesn't exist yet. However, upon receiving the AssemblyLoadFinished callback, you can pass the token to ICorProfilerInfo::GetAssemblyInfo to get information specific to the new assembly.

Module Loads/Unloads

      The methods specific to loading modules are ModuleLoadStarted, ModuleLoadFinished, ModuleUnloadStarted, ModuleUnloadFinished, and ModuleAttachedToAssembly. The first four methods are self-explanatory. The primary token for these methods is the ModuleID. Like the AssemblyLoadStarted token, the token passed to ModuleLoadStarted can't be used during the callback. That's because the module doesn't exist yet. But when you receive the ModuleLoadFinished callback, you can pass the token to ICorProfilerInfo::GetModuleInfo to get information specific to the new module.
      The last method, ModuleAttachedToAssembly, occurs when the CLR associates a module (a DLL in Win32 terminology) with a specific assembly. While a module and an assembly are often the same file, it's possible to have assemblies with multiple modules.

Class Loads/Unloads

      The methods concerned with loading classes are ClassLoadStarted, ClassLoadFinished, ClassUnloadStarted, and ClassUnloadFinished. The method names are self-explanatory. The primary token for these methods is the ClassID. The token passed to ClassLoadStarted can't be used during the callback because the class doesn't exist yet. However, upon receiving the ClassLoadFinished callback, you can pass the token to ICorProfilerInfo::GetClassIDInfo to get information specific to the new class.

JIT Compilation

      The primary token in use for the JIT compilation methods shown in Figure 2 is the FunctionID. Here, the terms function and method can be used interchangeably. The FunctionID can be passed to ICorProfilerInfo::GetFunctionInfo to retrieve information specific to the function.
      The JITCompilationStarted method is interesting, as it gives you the ability to view and modify the IL before it's JITed. See ICorProfilerInfo:: GetILFunctionBody for details, and don't expect it to be easy! The JITCachedFunctionSearchStarted method indicates when the CLR is searching for an existing copy of the method that's already been JITed to native code. You can force the runtime to use a newly JITed version of the method regardless of its pre-JIT status by setting the pbUseCachedFunction output parameter to FALSE.
      JITFunctionPitched indicates that a previously JITed method has been removed from memory. This doesn't normally happen except in low memory situations. Finally, the JITInlining method indicates that the JITer is about to inline a method inside another method. If you're counting on seeing the enter/leave notifications for that method, you can disable the inlining by setting the pfShouldInline output parameter to FALSE. It's also possible to disable inlining process-wide via a flag passed to ICorProfilerInfo::SetEventMask.

Threading

      The threading methods can be seen in Figure 3. The ThreadCreated/Destroyed methods bracket the lifetime of a CLR thread. A CLR thread isn't exactly the same as a Win32 thread. For example, conceptually a CLR thread can execute on top of more than one Win32 thread during its lifetime. The ThreadAssignedToOSThread indicates which Win32 thread is currently underpinning the CLR thread.
      As the CLR executes, it's necessary to suspend some or all of the threads to perform actions such as garbage collection. The RuntimeSuspend and RuntimeResume set of methods indicate that all CLR threads are being suspended. (It's actually more complicated than that, but I won't get into the details here.) A parameter to RuntimeSuspendStarted indicates why the suspension is occurring. Finally, the two RuntimeThread methods indicate that a specific thread is being suspended, and seem to always be bracketed by RuntimeSuspend events.

COM Interop

      When the CLR interfaces with classic COM, it does so via proxy interfaces. These two methods—COMClassicVTableCreated and COMClassicVTableDestroyed—indicate when the proxies are created and destroyed. The information conveyed for the creation method includes the .NET ClassID, the corresponding COM interface IID, a pointer to the proxy's vtable, and the number of vtable entries.

Managed/Unmanaged Code Transitions

      UnmanagedToManagedTransition and ManagedToUnmanagedTransition are called when managed code calls into unmanaged native code, and vice versa. Both methods are passed a FunctionID representing the callee, which can be passed to ICorProfilerInfo::GetFunctionInfo for more information.

Garbage Collection and Managed Heap Objects

      Although there is no actual garbage collection method, the firing of the events ObjectAllocated, MovedReferences, ObjectsAllocatedByClass, ObjectReferences, and RootReferences indicates that a garbage collection is in process. The information conveyed by these notifications is sufficiently complex that I won't attempt to explain it all in this space. Instead, I'll just highlight the key points.
      The ObjectAllocated method (which requires the COR_PRF_ENABLE_OBJECT_ALLOCATED flag to be set) is called whenever a new object is allocated from the managed heap. The object is identified by an ObjectID token and the ClassID is also given. The class ID can be passed to ICorProfilerInfo::GetClassIDInfo for additional details on the class.
      MovedReferences tells you that a particular object (specified by an ObjectID) has been moved in memory. ObjectsAllocatedByClass tells you which classes have had instances created since the last garbage collection. ObjectReferences provides a complete list of objects referenced by a given object. It's called for every outstanding object. Finally, RootReferences is a list of all the objects that are root references.

Remoting Activity

      The remoting methods shown in Figure 4 detail the various points of execution that occur during remoting calls. The RemotingClient methods fire when the program is acting as a client, while the RemotingServer methods fire when the program is acting as a remoting server. In both cases (client and server), a distinction is made between the invocation of the remote method, and the actual message transmission that carries the request/reply.

Exception Handling

      The profiling API's support for exception handling is rather complex and quite complete. I'll defer to the documentation for all the details. In a nutshell, however, the API gives you before and after notifications for each phase of the exception handling sequence. Figure 5 shows the exception methods.
      The ExceptionThrown method is the first indication you receive of an exception. The ObjectID parameter can be passed to m_pICorProfilerInfo::GetClassFromObject to get the type of the exception. For every managed method on the stack, these methods tell you when the CLR's exception code searches for a try block in the method, executes a filter function in the method, or finds a method to handle the exception. It also tells you when the exception code unwinds a method, executes a finally block while unwinding, or executes the handler code.

Receiving Method Entry and Exit Notifications

      One of the coolest features of the profiling API is the ability to be told when any managed method begins executions, and when it is about to return to its caller. This is possible because .NET methods need to be JITed before they're used. If you tell the profiling API that you're interested in method enters and leaves, it can communicate that fact to the JITer. The JITer, in turn, inserts calls at the beginning and end of the method to call special functions that you define. There's also a tailcall notification which is conceptually similar to an optimized function leave. I've never seen one generated in .NET, however.
      Unlike all the other events I've described, the method enter/leave callbacks don't belong to the ICorProfilerCallback interface. Rather, you use ICorProfilerCallback and ICorProfilerInfo to set up things so that the JITed methods will call the functions you supply. Just to make things interesting, the callbacks have a restriction that they cannot modify any registers. As such, it's highly likely that you'll write part of the code in assembly language, and use a __declspec naked function (an example is shown in Figure 6).
      A single parameter (a FunctionID) is passed to the enter/leave callbacks. The documentation warns of dire consequences if your callbacks block or spend a significant amount of time before returning. However, in my experience, I've not had problems in using ICorProfilerInfo to look up the name of the FunctionID and write it to a file.
      With a little effort, you can use the enter/leave capabilities to create a decent spy program that shows you every method as it executes. If you do this, you'll be astounded at how many methods are called just to start up a trivial .NET program. Because there's both enter and leave callbacks, you can see the nesting of method calls within other methods.
      It's important to note that the .NET runtime can make use of methods that have been pre-JITed. Unless you take special steps, these methods won't have the requisite callouts, so you won't see them with these callouts. In some cases, this can be fixed by monitoring JITCachedFunctionSearchStarted and returning FALSE in the pbUseCachedFunction parameter.

Caveats with the Profiling API

      There are a few important restrictions to know about if you intend to implement something using the profiling API. First, the COM server is assumed to be free-threaded. The .NET runtime doesn't guarantee any sort of synchronization in calling the ICorProfilerCallback methods. If you're using global data, you'll probably want to guard it with critical sections where necessary. In other cases, you might want to use thread local storage, as I did in the DNProfiler code that is shown in Figure 7.
      Another restriction is that inside the ICorProfilerCallback methods, you can't invoke any managed code, directly or indirectly. The profiling API code isn't intended to be reentrant, and calling managed code, while tempting, would definitely put you in that boat.
      The profiling API provides a way to get to the ICorDebug interface. This allows you to get at certain details such as the call stack. However, when you're executing in-process, not all of the ICorDebug methods are usable. Check the ICorDebug documentation for the list of what's usable in-process and what's not.
      Finally, the profiling API is so cool that I can imagine lots of tools wanting to use it simultaneously. Unfortunately, only a single profiling COM server can be active at one time. If you're truly after timing information, you'd only want one profiler to be active. However, for the sake of non-profiling tools, it would be nice if Microsoft could come up with a solution where well-behaved tools could play nicely together and chain along events to one another.

The DNProfiler Sample

      To show off the coolness of the profiling API, I wrote a simple implementation that logs every callback method. For some of the methods, the code performs extra work to make the event more meaningful. For instance, when ClassLoadFinished fires, DNProfiler takes the ClassID parameter and writes out the name of the class.
      DNProfiler doesn't exhaustively show every bit of information available from every callback, but it does a pretty respectable job of showing you what's going on. For this column, I haven't implemented the function enter/leave callbacks. However, the Microsoft profiling samples show how to do this.
      DNProfiler writes its results to a text file called DNProfiler.out in the same directory as the parent executable. A snippet of the startup sequence as seen by DNProfiler is shown in Figure 8. Notice in the figure how DNProfiler shows event nesting via indentation. In this specific case, between the AssemblyLoadStarted and AssemblyLoadFinished events, DNProfiler received three other events, which are indented appropriately.
      DNProfiler is just a COM server DLL, so there's no EXE to run. Instead, you'll have to either build it yourself, or run REGSVR32 on the supplied DNProfiler.DLL. Once registered, the simplest way to use DNProfiler is to use a console window and run the Profiling_on.bat file in Figure 9. Once run, all .NET programs started from that console will use DNProfiler. To set things back to normal, run the corresponding Profiling_off.bat file.
      Profiling_On.bat sets three environment variables, two of which are used by the CLR. At process startup time, when the Cor_Enable_Profiling environment variable is set to a non-zero value, the CLR attempts to load the COM server specified in the COR_PROFILER environment variable. The COR_PROFILER environment variable can be either a CLSID or a ProgID.
      The third environment variable, DN_PROFILER_MASK, is specific to DNProfiler. It allows you to dynamically configure which events you're interested in, rather than hardcoding the values into the source. In DNProfiler's Initialize method, it obtains the value of DN_PROFILER_MASK, and passes it to ICorProfilerInfo::SetEventMask. In the .bat file, I have the mask set to include just about everything, but I encourage you to experiment with other values.
      Figure 7 is a snippet of the main source file for DNProfiler. (You should download the full source from the MSDN® Magazine Web site to follow along at home.) Near the beginning is CCorProfiler::Initialize method, which is the focal point for setting things up. The Initialize method uses a private helper method, GetInitializationParameters, to read the environment variables and determine the complete file name of the output file.
      The majority of the functions in ProfilerCallback.CPP are simple implementations of ICorProfilerCallback methods. If a method is a Started method, the code usually calls the ChangeNestingLevel helper function. This updates a per-thread variable that keeps track of how many levels deep the callbacks are nested. When the corresponding Finished method is invoked, the code calls ChangeNestingLevel again to restore the original nesting value.
      Towards the end of the file are a couple of private methods that do commonly required things, like looking up a class name from a ClassID. The ProfilerPrintf function is essentially the same as fprintf, but it takes into account the per-thread nesting level, and emits the appropriate number of indentation spaces on each line before actual text.
      ProfilerCallback.H (available in the download) is the header file that describes the CProfilerCallback class. Note how the CProfilerCallback class derives from the ICorProfilerBack interface. At the bottom of the class definitions are the private methods and member variables needed for DNProfiler's implementation.
      The last source file is COMStuff.CPP (also in the download). In creating DNProfiler, I wanted to keep things as simple as possible, without dragging in ATL, MFC, or anything else not absolutely necessary. Yeah, I'm a framework Luddite, so shoot me. In any event, the Microsoft sample .NET profilers take a similar approach. I stole just the relevant portions of their code, stripped it down even further, and the result is COMStuff.CPP.

Interop Fun with DNProfiler

      It's easy to spend quite a bit of time studying the DNProfiler output, learning the sequence of events in a .NET program. I won't go through all the interesting things to be learned here. There is one situation that bears mentioning though. The .NET Windows Forms package is naturally built atop the windowing APIs in USER32.DLL. As such, you might expect to see a lot of transitions between managed code and unmanaged code when window messages are sent, for instance.
      This is indeed the case. What's surprising (to me at least) is just how many levels of recursion there can be between managed and unmanaged code. By that, I mean that a managed method will call an unmanaged method. The unmanaged method in turn calls back into managed code. Inside the managed code, it needs to call an unmanaged method, and so on. Eventually, everything unwinds back to the original level, of course.
      Check out Figure 10. It shows the abbreviated results of a window receiving the message to destroy itself. In managed code, the UnsafeNativeMethods::DispatchMessageW method is invoked. It transitions to the unmanaged WndProc::Invoke method. Inside WndProc::Invoke, the code needs to call the managed UnsafeNativeMethods::CallWindowProc. This routine in turn needs to call the unmanaged WndProc::Invoke. This ping-ponging between managed and unmanaged code drills deeper for several more layers before unwinding back to the point where UnsafeNativeMethods::DispatchMessageW returns.
      The profiling interface provided by Microsoft for .NET is one of the best things to come along for tool developers in a long time. It's very comprehensive, fairly flexible, and I think that a lot of .NET tools (besides profilers) will be based on this API. Here, I've given a reasonable overview of what it's capable of, but be sure to check out the documentation and Microsoft sample profilers for even more information.

Send questions and comments for Matt to hood@microsoft.com.
Matt Pietrek is an independent writer, consultant, and trainer. He was the lead architect for Compuware/NuMega's Bounds Checker product line for eight years and has authored three books on Windows system programming. His Web site, at http://www.wheaty.net, has code updates, a FAQ page, and information on previous columns and articles.