New stuff in Profiling API for upcoming CLR 4.0
Now that we've finally announced at PDC many of the new features coming up in the next major release of Visual Studio and CLR, I can elaborate some on what's coming up for the profiling API. Also see Rick Byers's blog entry, which covers debugging improvements.
What the CLR will do for you
Our upcoming profiling API-specific features are inspired by a vision to improve production troubleshooting, and happily such features will improve the developer desktop experience as well.
Attach / detach
We will now allow profilers to attach to and detach from managed processes that are already running. You no longer need to set environment variables and load the profiler when the managed app starts up. (However, if you would like your profiler to load when the process starts up, you would still use the existing activation mechanism with environment variables.) Attach / detach works with only a limited set of scenarios--see below.
You initiate the attach from a separate executable that we call the "trigger process". If you already have a shell for your profiler, then your shell will typically serve as the trigger process. We will provide you an API that your trigger process will call, which takes parameters that describe the target app to profile and details about your profiler. That API will cause the target app to load your profiler into the target app's process space using the same code that we currently use to load your profiler from startup.
When your profiler is ready to detach from the process, it calls a method on ICorProfilerInfo3 (the new Info interface for CLR 4.0). That will cause the CLR to stop issuing profiling callbacks, and then wait until the profiler is provably off of all the threads' call stacks. The profiler will then be unloaded from the process space, and another profiler may be attached if the end user wishes.
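To make the detach sequence a bit more concrete, here's a toy model of the handshake: stop handing out new callbacks, wait for in-flight callbacks to drain, then unload. This is only a sketch of the idea, not how the CLR implements it. Every name in it (ToyRuntime, BeginCallback, RequestDetach, TryUnload) is made up for illustration and is not part of the profiling API.

```cpp
#include <atomic>
#include <cassert>

// Toy model of the detach handshake. None of these names are CLR APIs.
class ToyRuntime {
public:
    // Called before the runtime invokes a profiler callback.
    // Returns true if the callback may be delivered.
    bool BeginCallback() {
        inFlight_.fetch_add(1);
        if (detachRequested_.load()) {
            // Detach already requested: back out, deliver nothing.
            inFlight_.fetch_sub(1);
            return false;
        }
        return true;
    }

    // Called when a delivered callback has returned from profiler code.
    void EndCallback() {
        assert(inFlight_.load() > 0);
        inFlight_.fetch_sub(1);
    }

    // The profiler asks to be detached; no new callbacks start after this.
    void RequestDetach() { detachRequested_.store(true); }

    // Polled by the runtime: unload only once the profiler is off every stack.
    bool TryUnload() {
        if (!detachRequested_.load() || inFlight_.load() != 0) {
            return false;
        }
        loaded_.store(false);
        return true;
    }

    bool IsLoaded() const { return loaded_.load(); }

private:
    std::atomic<int> inFlight_{0};
    std::atomic<bool> detachRequested_{false};
    std::atomic<bool> loaded_{true};
};
```

The point of the model is the ordering: a callback that was already running when detach was requested must finish before the unload can happen, which is why the unload is something the runtime retries rather than does immediately.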
Due to the nature of the profiling APIs, only a subset of the APIs will be available to profilers that attach to a live process, as opposed to those profilers that load on startup. Specifically, attaching profilers will be able to use the APIs that enable sampling and memory profiling. This includes such operations as walking the stack, mapping instruction pointers to managed methods and their metadata, receiving most GC callbacks, and inspecting statics and object instances on the heap, along with their type information.
APIs that will not be supported for attaching profilers include the ObjectAllocated callback and APIs that enable instrumentation (IL rewriting) of methods. Those scenarios require the ability to "rejit" a method that has already been JITted or loaded from an NGENd module, and unfortunately we are unable to provide that functionality in CLR 4.
"Still no rejit?! Are you kidding me?"
I wish I were. Given customer demand, I personally rate rejit as more important than most of the other profiling API features we're doing combined. However, rejit is very expensive in terms of development and testing. After doing the math, it became apparent that even if we cut most of the other new profiling API features for CLR 4.0, rejit would still not fit. But given the high demand, we're certainly still looking to deliver rejit in a future release of the CLR, just not 4.0.
Registry-free activation
An obstacle to getting profilers installed into production data centers is that operations managers distrust "impactful" installations that modify machine state. So we're providing a way for you to skip registering your profiler when it's installed or used. (I'm talking here about the COM registration that uses the Windows registry to map your profiler's CLSID to the full path to the profiler's DLL.) It also turns out that, even in developer desktop scenarios, relying on the registry is a common cause of failures. For example, perhaps your profiler tries to regsvr32 itself under HKLM, but the user does not have administrative privileges. Or maybe the user does have administrative privileges, but is running Vista non-elevated. Registry-free activation should help with all of those scenarios. Note that registry-free activation is optional. Your profiler may continue to use traditional COM registration if you like.
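Here's a rough sketch of the difference between the two activation styles. It assumes the profiler's DLL path can be handed to the runtime directly, with the registry used only as a fallback. How that path is actually supplied isn't covered in this post, so ActivationSettings, FakeRegistry and ResolveProfilerDll are all hypothetical stand-ins, and the std::map is just playing the role of the COM registration data.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical settings supplied by whoever launches or attaches the profiler.
struct ActivationSettings {
    std::string clsid;    // profiler CLSID in text form
    std::string dllPath;  // non-empty means "registry-free": use this path as-is
};

// Stand-in for COM registration: maps a CLSID to the profiler DLL's full path.
using FakeRegistry = std::map<std::string, std::string>;

// Returns true and fills *out with the DLL to load, or false if none is found.
bool ResolveProfilerDll(const ActivationSettings& settings,
                        const FakeRegistry& registry,
                        std::string* out) {
    assert(out != nullptr);
    if (!settings.dllPath.empty()) {
        *out = settings.dllPath;  // registry-free: no machine state consulted
        return true;
    }
    auto it = registry.find(settings.clsid);  // traditional COM-style lookup
    if (it == registry.end()) {
        return false;  // e.g. registration failed for lack of admin rights
    }
    *out = it->second;
    return true;
}
```

With an explicit path, the lookup succeeds even on a machine where nothing was ever registered, which is the property that matters for locked-down production boxes.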
Profiler Backward Compatibility
CLR 2.0 saw significant enough changes from CLR 1.1 that we refused to load 1.1 profilers into CLR 2.0. However, CLR 4.0 is compatible enough with CLR 2.x that we will allow 2.x profilers to load into CLR 4.0 applications. This behavior will not be the default, however. If end users try to load their 2.x profiler into a CLR 4.0 application, the load will fail, and they will see an event log entry telling them either to upgrade their profiler, or to set a special environment variable to explicitly allow the older profiler to load. By forcing the end users to opt in to this behavior, we set the expectation that it is not guaranteed or tested that 2.x profilers will still work, though we believe it is likely they will work in many scenarios.
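The load rule above boils down to a small decision table. Here it is as a sketch, with made-up names (ProfilerGeneration, LoadResult, TryLoadIntoClr4) and with the opt-in environment variable reduced to a plain bool.

```cpp
#include <cassert>

// Hypothetical labels for illustration; not CLR types.
enum class ProfilerGeneration { V2, V4 };
enum class LoadResult { Loaded, RejectedWithEventLogEntry };

// optInVariableSet stands for "the user set the special environment variable".
LoadResult TryLoadIntoClr4(ProfilerGeneration generation, bool optInVariableSet) {
    if (generation == ProfilerGeneration::V4) {
        return LoadResult::Loaded;  // updated profilers load as usual
    }
    // A 2.x profiler loads only when the user explicitly opted in.
    return optInVariableSet ? LoadResult::Loaded
                            : LoadResult::RejectedWithEventLogEntry;
}
```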
Enter / Leave / Tailcall
We have made some enhancements to the Enter/Leave/Tailcall interface to cut down on overhead when your profiler does not care about getting parameter or return value information.
Other random stuff
You will also find several minor enhancements and bug fixes to the profiling API. Not worth listing them all out here, but the forthcoming documentation on ICorProfilerCallback3 / ICorProfilerInfo3 will describe them.
What you must do for the CLR
Here are some of your responsibilities for playing nicely with CLR 4.0 applications.
In-process side-by-side CLR instances
Probably the biggest impact to your profiler as you upgrade it to CLR 4.0 will be supporting in-process side-by-side CLR instances. This is actually a CLR-wide feature for 4.0 (not specific to the profiling API), but it affects profiling API tools. Certain scenarios will now result in multiple instances of the CLR loaded into a single process, primarily to support backward-compatibility for managed components that load into a host. (Imagine one old (2.x) CLR instance alongside a new (4.0) CLR instance in the same process.) From the profiler’s point of view, it will be loaded multiple times, once per CLR instance. This means your DLL gets LoadLibrary’d multiple times and you’ll receive multiple “CreateInstance” calls to your class factory object, to generate multiple instances of your ICorProfilerCallback implementation. You can deal with this by:
- Returning failure from all but one of your CreateInstance() calls. This allows you to “pick” which CLR instance you wish to interact with. OR
- Succeeding many or all of your CreateInstance() calls. This allows you to examine multiple CLRs simultaneously.
Pick 1 / Pick first: With this first approach, your profiler will choose to return success from only one CreateInstance() call. "Pick 1" implies you allow your user to specify "which" CLR to profile, usually specified in terms of the version number of the CLR of interest. "Pick First" implies you don't even ask your user--you just simplistically succeed the first CreateInstance() call, and fail the rest. First CLR wins. The advantages of these approaches are they are fairly easy to implement, and either one qualifies your profiler as being "side-by-side aware".
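Here's roughly what "pick first" could look like. This is a sketch with stand-in types so it compiles without the Windows SDK: HResult, kOk and kFail play the roles of HRESULT, S_OK and E_FAIL, ProfilerCallback stands in for your ICorProfilerCallback implementation, and the free function CreateInstance is a heavily simplified version of your class factory's method. The flag is process-wide on purpose, since the same DLL serves every CLR in the process.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch builds without the Windows SDK.
using HResult = std::int32_t;
const HResult kOk = 0;                                    // role of S_OK
const HResult kFail = static_cast<HResult>(0x80004005u);  // role of E_FAIL

// Stand-in for your ICorProfilerCallback implementation.
struct ProfilerCallback {
    int instanceNumber;
};

namespace {
// Process-wide: set once the first CLR has been given a profiler instance.
std::atomic<bool> g_instanceClaimed{false};
}

// Simplified shape of a class factory's CreateInstance.
HResult CreateInstance(ProfilerCallback** out) {
    assert(out != nullptr);
    *out = nullptr;
    bool expected = false;
    // Only the first caller flips the flag; everyone after that is refused.
    if (!g_instanceClaimed.compare_exchange_strong(expected, true)) {
        return kFail;  // another CLR already won
    }
    *out = new ProfilerCallback{1};  // caller takes ownership
    return kOk;
}
```

The compare-and-exchange matters because two CLR instances could, in principle, call into the factory at about the same time. A plain bool check followed by a write would let both succeed.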
Pick many / Pick all: With this approach, your profiler collects data on multiple CLRs, with the intent of presenting that data to the user in some unified way. You will have to be careful to manage multiple instances of your ICorProfilerCallback implementation and probably eliminate much of your global state. For example, if you call into an ICorProfilerInfo from one CLR with IDs (e.g., AppDomainID) of the other CLR, that will likely cause an access violation (AV). Many profilers are implemented with a global pointer to the "one and only" instance of their ICorProfilerCallback implementation. This would no longer work, as you will now have multiple instances of your ICorProfilerCallback implementation, and each one must keep track of the corresponding ICorProfilerInfo interface to call into. There will be enhancements to the profiling API to make this management easier, most notably improvements to the Enter/Leave/Tailcall and FunctionIDMapper interfaces.
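To illustrate the "no more global state" point, here's a sketch where each profiler instance owns the info pointer for its own CLR. FakeRuntimeInfo is a stand-in for one CLR's ICorProfilerInfo, and it politely returns false for an ID it doesn't recognize, where a real runtime would more likely crash. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>

using AppDomainId = std::uintptr_t;

// Stand-in for one CLR's ICorProfilerInfo: it only understands its own IDs.
class FakeRuntimeInfo {
public:
    explicit FakeRuntimeInfo(std::set<AppDomainId> known)
        : known_(std::move(known)) {}

    bool IsKnownAppDomain(AppDomainId id) const {
        return known_.count(id) != 0;
    }

private:
    std::set<AppDomainId> known_;
};

// One of these per CLR instance. It keeps its own info pointer rather than
// reaching for a global "one and only" profiler object.
class ProfilerInstance {
public:
    explicit ProfilerInstance(const FakeRuntimeInfo* info) : info_(info) {}

    // Invoked from this CLR's callbacks; always asks the matching runtime.
    bool OnAppDomainCreated(AppDomainId id) {
        if (!info_->IsKnownAppDomain(id)) {
            return false;  // ID belongs to some other CLR
        }
        seen_.insert(id);
        return true;
    }

    std::size_t DomainCount() const { return seen_.size(); }

private:
    const FakeRuntimeInfo* info_;  // the info interface for *this* CLR only
    std::set<AppDomainId> seen_;   // per-instance data, not global
};
```

If you want a unified view across CLRs, merge the per-instance data when you present it, rather than sharing the collection structures while profiling.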
I cannot stress the following enough, so I will state it twice. When you update your profiler to work with CLR 4.0, you must update your profiler to become side-by-side aware. This means you must do some amount of work, even if it is the simple "pick first" approach. The CLR determines whether your profiler is "updated for CLR 4.0" by QI'ing for the new ICorProfilerCallback3 defined in the CLR 4.0 corprof.idl file. If your profiler successfully returns a pointer to your ICorProfilerCallback3 implementation, then your profiler is considered a 4.0 profiler. So to restate: If your profiler provides an ICorProfilerCallback3 implementation, then your profiler must be side-by-side aware. The reason for this rule is that the CLR puts certain safeguards in place to protect older (2.x) profilers when they might load into scenarios that involve in-process side-by-side CLR instances. If you claim your profiler is updated for 4.0, those safeguards are lifted, and you really don't want that to happen unless you're side-by-side aware.
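The contract can be pictured like this. It's a simplified sketch, not real COM: InterfaceId stands in for the interface IIDs, QueryInterface returns a bool instead of an HRESULT, and RuntimeTreatsAs40Profiler is my guess at the shape of the check, not the CLR's actual code.

```cpp
#include <cassert>

// Stand-ins for the IIDs of the two callback interface versions.
enum class InterfaceId { Callback2, Callback3 };

class Profiler {
public:
    explicit Profiler(bool sideBySideAware) : sideBySideAware_(sideBySideAware) {}

    // Simplified QueryInterface: true plus a pointer on success.
    bool QueryInterface(InterfaceId iid, void** out) {
        assert(out != nullptr);
        *out = nullptr;
        // Only claim the 4.0 interface once the side-by-side work is done.
        if (iid == InterfaceId::Callback3 && !sideBySideAware_) {
            return false;
        }
        *out = this;
        return true;
    }

private:
    bool sideBySideAware_;
};

// Hypothetical shape of the runtime's check: ask for the newest interface.
bool RuntimeTreatsAs40Profiler(Profiler& profiler) {
    void* itf = nullptr;
    return profiler.QueryInterface(InterfaceId::Callback3, &itf);
}
```

In other words, answering the QI for ICorProfilerCallback3 is the switch that turns the safeguards off, so don't answer it until your profiler can cope with more than one CLR in the process.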
If you're curious to learn more about this "in-process side-by-side CLR instances" feature, unfortunately the blogs and documentation are still pretty thin for the moment (though I imagine that will change in the coming months). You can take a look at the PDC talk on CLR futures, which discussed this feature at a high level. Go to the PDC 2008 site, and find the session called "PC49 Microsoft .NET Framework: CLR Futures".
Profiler Backward Compatibility
Not much extra to state here, but just to be explicit, you of course have a choice. If you have a profiler that works just fine against CLR 2.x, you may either update it to work with CLR 4.0, or not update it. If you choose to update it, that means you must implement ICorProfilerCallback3 and provide that implementation to the CLR when the CLR QI's for ICorProfilerCallback3. And, due to the contract stated above, you must also ensure your profiler is side-by-side aware (pick 1, pick first, pick many, pick all, it's up to you).
The alternative: don't update your profiler! You will miss out on the new profiling API features listed above. And your profiler may also not work well in scenarios that load in-process side-by-side CLR instances. But maybe you don't care, or maybe you just want a temporary stopgap for your users until you've had the time to update and test your profiler for CLR 4.0. Just remember that CLR 4.0 will not activate your 2.x profiler by default. You will need to tell your users about the special environment variable mentioned above to get your 2.x profiler to load into CLR 4.0. Since this is all new stuff, the environment variable has not yet been documented at the time of this blog entry, but you can expect more info on it in MSDN when we release, and possibly info on this blog sooner.
I hope you've found this overview useful. Since all of this is new and not yet documented, there's not much you can do to start preparing for CLR 4.0. However, I'd recommend you take a look through your code and see what it would take to become side-by-side aware.