May 2009

Volume 24 Number 05

CLR Inside Out - Understanding The CLR Binder

By Aarthi Ramamurthy | May 2009

Contents

Always Use Fully Specified Assembly Names
Avoid Partial Binds
Use Fusion Log Viewer
Understand the Context
When All Else Fails, AssemblyResolve!
Know When to Use the GAC

The CLR Binder is responsible for locating necessary assemblies at run time and binding to them, so it's an important piece of .NET code. To ensure that binding is working efficiently and correctly, there are a few best practices you should follow, and we'll present them here. Some practices are simple, yet crucial. Proper assembly naming is one such task, and we'll tackle it first.

Always Use Fully Specified Assembly Names

Since the distinction between assembly names and file names is often a source of confusion, let's take a closer look. A file name is the name of a file in the file system (such as System.dll). An assembly name, on the other hand, is the name given to an assembly to establish its unique identity. In managed code, assemblies provide identity to the code that resides in them. Two assemblies can share the same simple name and still differ in version, signature, and so forth. The file system is simply one of the locations from which assemblies can be loaded; assemblies can also, for example, be loaded from byte arrays. It's best to keep the file name the same as the assembly name. The most obvious reason is convenience, but assemblies are also mostly loaded by assembly name, and keeping the names consistent makes it easier for the loader to find the assembly. A fully qualified assembly identity consists of four fields: the simple name of the assembly, the version, the culture, and the public key token.
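As a minimal sketch of how those four fields appear in practice, the following C# snippet parses a display name with System.Reflection.AssemblyName and formats each part. The assembly name, version, and public key token used here are hypothetical placeholders rather than a real assembly, and the class and method names are only illustrative.

```csharp
using System;
using System.Reflection;

static class AssemblyIdentitySketch
{
    // Hypothetical display name; the version and token are placeholders.
    public const string SampleDisplayName =
        "MySampleAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef";

    // Formats the four identity fields of a display name, one per line.
    public static string Describe(string displayName)
    {
        AssemblyName name = new AssemblyName(displayName);

        // The token is exposed as raw bytes; render it as lowercase hex.
        byte[] token = name.GetPublicKeyToken();
        string tokenText = (token == null || token.Length == 0)
            ? "(none)"
            : BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        // A neutral culture is represented by a CultureInfo with an empty name.
        string cultureText = (name.CultureInfo == null)
            ? "(unspecified)"
            : (name.CultureInfo.Name.Length == 0 ? "neutral" : name.CultureInfo.Name);

        return "Simple name:      " + name.Name + Environment.NewLine
             + "Version:          " + name.Version + Environment.NewLine
             + "Culture:          " + cultureText + Environment.NewLine
             + "Public key token: " + tokenText;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(SampleDisplayName));
    }
}
```

Parsing a display name this way does not load anything; it only splits the string into the identity fields the Binder would later use.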

One way to make effective use of the Binder and its various features is to avoid partial binds (unless you really know what you're doing). A partial bind occurs when the user specifies only part of the assembly identity. For example, let's assume that the user tries to load an assembly whose simple name is MySampleAssembly as follows:

Assembly.Load("MySampleAssembly");

In this case, the user failed to specify the other three fields that are a part of the assembly's identity. This is a partial bind.

Another instance is a call to Assembly.LoadWithPartialName(), which also uses partial binding. This method exists for an extremely specific scenario and should not be used for normal binding, which is why it is marked obsolete in current versions of the framework.

Avoid Partial Binds

Partial binds are a problem because they can lead to nondeterministic Binder behavior, since the Binder does not have the complete information to load the correct assembly. In the case of LoadWithPartialName(), the Binder simply tries Load() and if that fails, it then picks up the highest version of the assembly in the GAC. This may not be the version that is compatible with the current application. If a servicing update of a different application installs a higher version of this assembly in the GAC, one with which the current application is not compatible, LoadWithPartialName() will choose this newer assembly and load it, potentially breaking the current application. Also, not specifying important attributes such as the public key token can lead to binding and loading the incorrect assembly since there is no guarantee that the assembly was provided by the expected publisher.

So, if partial binds are bad, you're probably wondering why they are supported in the CLR in the first place. Well, while LoadWithPartialName() is deprecated in CLR 2.0, loading an assembly with a partially specified reference is still supported. Partial binds can be advantageous if their purpose is well understood and they are used judiciously. At the very least, you should provide the public key token of the assembly to load. Otherwise, the bind may return an assembly from a completely different publisher.

Also, if you need to load a particular assembly multiple times or do not wish to hard-code assembly versions into your strings, you can specify the fully qualified assembly name in a <qualifyAssembly> element in the application configuration file and then specify only a partial reference for the assembly in Assembly.Load(). This keeps the code simple and at the same time ensures that the desired assembly gets loaded. Note that in such cases each application should have its own application configuration file containing the <qualifyAssembly> element.
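A configuration fragment along the following lines illustrates the idea. It is only a sketch: the assembly name, version, and public key token are hypothetical placeholders that you would replace with the identity of your own assembly.

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <!-- Hypothetical identity: maps the partial name to the full one -->
      <qualifyAssembly partialName="MySampleAssembly"
                       fullName="MySampleAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef" />
    </assemblyBinding>
  </runtime>
</configuration>
```

With an entry like this in place, a call such as Assembly.Load("MySampleAssembly") is treated as a request for the fully specified identity given in fullName.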

In general, it is always safer to provide a fully specified reference to the desired assembly to ensure that binds are predictable.
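One way to enforce this in your own code is to reject partial names before they ever reach the Binder. The following is a minimal sketch under that assumption; StrictLoader, IsFullySpecified, and LoadStrict are hypothetical helper names, not framework APIs, and the check relies only on what System.Reflection.AssemblyName reports after parsing the display name.

```csharp
using System;
using System.Reflection;

static class StrictLoader
{
    // True only when version, culture, and public key token were all supplied.
    public static bool IsFullySpecified(string displayName)
    {
        AssemblyName name = new AssemblyName(displayName);

        // An omitted field is reported as null by AssemblyName. An explicit
        // "PublicKeyToken=null" (an unsigned assembly) is expected to come back
        // as a non-null token array and therefore counts as specified here.
        return name.Version != null
            && name.CultureInfo != null
            && name.GetPublicKeyToken() != null;
    }

    // Loads the assembly only if all four identity fields are present.
    public static Assembly LoadStrict(string displayName)
    {
        if (!IsFullySpecified(displayName))
        {
            throw new ArgumentException(
                "Partial assembly name rejected: " + displayName, "displayName");
        }
        return Assembly.Load(displayName);
    }

    public static void Main()
    {
        // A simple name alone is a partial reference.
        Console.WriteLine(IsFullySpecified("MySampleAssembly"));

        // The runtime's own core library reports a fully specified name.
        string coreLibraryName = typeof(object).Assembly.FullName;
        Console.WriteLine(LoadStrict(coreLibraryName).FullName);
    }
}
```

Whether to throw, log, or fall back to a configured name is a design choice; throwing simply makes accidental partial binds visible early.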

Use Fusion Log Viewer

There are times when assembly binding fails, typically with an exception (usually, FileNotFoundException, FileLoadException, or BadImageFormatException). Often, a little insight into the internals of the Binder can help you debug the issue at hand.

The Microsoft .NET Framework SDK includes a tool called Assembly Binding Log Viewer (fuslogvw.exe), often referred to as Fusion Log Viewer. This tool logs specific Binding steps in .html files that can be viewed using the Fusion Log Viewer's user interface.

Figure 1 contains a code snippet that demonstrates loading an assembly that does not actually exist. When this code is executed, a FileNotFoundException is thrown. If the user has turned on the ability to log binding failures, the Fusion Log Viewer logs the failure.

Figure 1 Some Misbehaving Code

using System;
using System.Reflection;

class FusLogSample
{
    public static void Main()
    {
        try
        {
            Assembly.Load("TheNonExistentAssembly");
        }
        catch (Exception e)
        {
            Console.WriteLine(e.ToString());
        }
    }
}

By default, logging to disk is turned off, since logging is generally expensive and causes the running application to take a performance hit. To debug the bind failure, click on the Settings button and then turn on "Log bind failures to disk." To turn on logging for all binds, select "Log all binds to disk."

Clicking on the highlighted text in the resulting dialog box leads you to the actual log itself (as shown in Figure 2). The first two lines tell you that the attempt to load the assembly failed and provide the HRESULT. The next two lines indicate where the CLR was loaded from (the specific directory) and the name of the executable that caused the assembly load to be initiated.

Figure 2 Fusion Log Generated While Executing Code in Figure 1

*** Assembly Binder Log Entry (2/23/2009 @ 12:29:19 PM) ***
The operation failed.
Bind result: hr = 0x80070002. The system cannot find the file specified.
Assembly manager loaded from: C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
Running under executable \\TKZAW-PRO-15\MYDOCS5\aarthir\My Documents\Visual Studio 2008\Projects\Sample\Sample\bin\Debug\Sample.vshost.exe
--- A detailed error log follows.
=== Pre-bind state information ===
LOG: User = REDMOND\aarthir
LOG: DisplayName = TheNonExistantAssembly (Partial)
LOG: Appbase = file://TKZAW-PRO-15/MYDOCS5/aarthir/My Documents/Visual Studio 2008/Projects/Sample/Sample/bin/Debug/
LOG: Initial PrivatePath = NULL
LOG: Dynamic Base = NULL
LOG: Cache Base = NULL
LOG: AppName = NULL
Calling assembly : Sample, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null.
===
LOG: This bind starts in default load context.
LOG: No application configuration file found.
LOG: Using machine configuration file from C:\Windows\Microsoft.NET\Framework\v2.0.50727\config\machine.config.
LOG: Policy not being applied to reference at this time (private, custom, partial, or location-based assembly bind).
LOG: Attempting download of new URL file://TKZAW-PRO-15/MYDOCS5/aarthir/My Documents/Visual Studio 2008/Projects/Sample/Sample/bin/Debug/TheNonExistantAssembly.DLL.
LOG: Attempting download of new URL file://TKZAW-PRO-15/MYDOCS5/aarthir/My Documents/Visual Studio 2008/Projects/Sample/Sample/bin/Debug/TheNonExistantAssembly/TheNonExistantAssembly.DLL.
LOG: Attempting download of new URL file://TKZAW-PRO-15/MYDOCS5/aarthir/My Documents/Visual Studio 2008/Projects/Sample/Sample/bin/Debug/TheNonExistantAssembly.EXE.
LOG: Attempting download of new URL file://TKZAW-PRO-15/MYDOCS5/aarthir/My Documents/Visual Studio 2008/Projects/Sample/Sample/bin/Debug/TheNonExistantAssembly/TheNonExistantAssembly.EXE.
LOG: All probing URLs attempted and failed.

Under the "Pre-bind state information" section, the first line indicates the user name under which the code was executed. The next line provides the identity of the assembly. Note that the log indicates "partial" because the reference was only partially specified.

The next line shows the ApplicationBase directory (the directory under which the executable is present) for the application domain.

The next four fields (Initial PrivatePath, Dynamic Base, Cache Base, and AppName) are application domain-specific properties that affect binding in different ways. For example, the privatePath attribute specifies subdirectories of the ApplicationBase directory to search while loading assemblies.

The next line gives you the name of the parent assembly from which Assembly.Load() was initiated. The following line specifies the loader context where this bind began (you'll read more on loader contexts in the next section).

The next three lines deal with policy files. Application configuration files, publisher policy files, and machine configuration files are different means of configuring assembly binding behavior (you can read more about these configuration files on MSDN in "Step 1: Examining the Configuration Files").

The final lines show that the Binder looks for the desired assembly in different subdirectories (within the AppBase). In the end, the Binder declares that the assembly could not be found. Thus, the Fusion log provides information useful for debugging bind failures.
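Some of this information is also reachable from code. The three exception types mentioned earlier (FileNotFoundException, FileLoadException, and BadImageFormatException) each expose a FusionLog property. The sketch below, with the hypothetical helper names BindDiagnostics and TryLoad, collects what the exception carries; it assumes the property may be null or empty when bind-failure logging is not enabled or when the runtime in use does not produce Fusion logs.

```csharp
using System;
using System.IO;
using System.Reflection;

static class BindDiagnostics
{
    // Attempts the load. On a bind failure, returns false and fills
    // 'diagnostics' with whatever the exception exposes.
    public static bool TryLoad(string assemblyName, out string diagnostics)
    {
        try
        {
            Assembly.Load(assemblyName);
            diagnostics = null;
            return true;
        }
        catch (FileNotFoundException e)
        {
            diagnostics = Format(e, e.FileName, e.FusionLog);
        }
        catch (FileLoadException e)
        {
            diagnostics = Format(e, e.FileName, e.FusionLog);
        }
        catch (BadImageFormatException e)
        {
            diagnostics = Format(e, e.FileName, e.FusionLog);
        }
        return false;
    }

    private static string Format(Exception e, string fileName, string fusionLog)
    {
        // FusionLog may be null or empty; report that explicitly.
        string log = string.IsNullOrEmpty(fusionLog) ? "(no log captured)" : fusionLog;
        return e.GetType().Name + ": " + e.Message + Environment.NewLine
             + "Requested: " + fileName + Environment.NewLine
             + "Fusion log: " + log;
    }

    public static void Main()
    {
        string diagnostics;
        if (!TryLoad("TheNonExistentAssembly", out diagnostics))
        {
            Console.WriteLine(diagnostics);
        }
    }
}
```

This does not replace the Fusion Log Viewer, but it can be a convenient way to get the bind details into an application's own log.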

In the .NET Framework 4.0, the Binder explicitly warns about partial binds. Figure 3 shows a snippet from a Fusion log covering partial binds.

Figure 3 Partial Binds in the Log

*** Assembly Binder Log Entry (1/7/2009 @ 10:17:05 PM) ***
The operation was successful.
Bind result: hr = 0x0. The operation completed successfully.
Assembly manager loaded from: C:\Windows\Microsoft.NET\Framework64\v4.0.AMD64chk\clr.dll
Running under executable E:\tests\LoggingDCR\PartialNames\LoadPartial.exe
--- A detailed error log follows.
=== Pre-bind state information ===
LOG: User = SampleUser
LOG: DisplayName = Lib (Partial)
WRN: Partial binding information was supplied for an assembly.
WRN: A partial bind occurs when only part of the assembly display name is provided.
WRN: This might result in the binder loading an incorrect assembly.
WRN: It is recommended to provide a fully specified textual identity for the assembly,
WRN: that consists of the simple name, version, culture, and public key token.
WRN: See whitepaper https://go.microsoft.com/fwlink/?LinkId=109270 for more information and common solutions to this issue.
WRN: Detected case of partial bind:
WRN: Assembly Name: Lib | Domain ID: 1

The warnings here indicate that partial bind information was specified for the assembly. The Binder also warns about loading the same assembly into multiple contexts, as shown below. We'll cover loader contexts later.

WRN: The same assembly was loaded into multiple contexts of an application domain.
WRN: This might lead to runtime failures.
WRN: It is recommended to inspect your application on whether this is intentional or not.

Overall, fusion logs are a very valuable resource, both to debug bind failures and to understand how the runtime locates and loads assemblies.

Understand the Context

No article on the Binder is complete without addressing loader contexts and the reason for their existence. Loader contexts are often a source of confusion. Think of loader contexts as logical buckets within an application domain that hold assemblies. Depending on how the assemblies are loaded, they fall into one of three loader contexts.

Load context: Put simply, assemblies that are present in the GAC, in the ApplicationBase, or in the PrivateBinPath under the ApplicationBase, and that are loaded using Assembly.Load, are loaded into the Load context. Assemblies resolved using the AssemblyResolve event also fall into this category.

LoadFrom context: If you attempt to load an assembly by providing a specific path that is outside the ApplicationBase, and the assembly would not have been found in the Load context, then the assembly is loaded into the LoadFrom context.

Neither context: If you load an assembly using Assembly.LoadFile(), Assembly.Load(byte[]), or Reflection.Emit, the assembly is loaded into the Neither context.

In the case of assemblies loaded into the LoadFrom context, the Binder first checks to see if the exact assembly (same identity and location) is already present in the Load context. If it is, it discards the assembly information in the LoadFrom context and uses the assembly information from the Load context. In determining whether it is the same assembly, the location information is important, and we'll cover this shortly. In .NET Framework 1.1, this was known as LoadFrom's second bind, since the Binder used to perform two steps—first to place the assembly in the LoadFrom context, and then promote it over to the Load context if it found a matching assembly identity and location in the Load context.

Load assemblies into the Load context whenever possible. For this, the assembly should be locatable from the GAC, the ApplicationBase, or the PrivateBinPath of the AppDomain. Assemblies loaded into this context automatically get the benefits of NGen, and their dependencies that are present in this context are automatically picked up.

Loading assemblies into the LoadFrom context has its own advantages—it allows multiple assemblies outside the ApplicationBase to be loaded by specifying their paths.

Now, let's talk about how the location of an assembly determines whether an assembly loaded via LoadFrom() is the same as one loaded via Load(). Even if the types in two assemblies are identical, the assemblies are not considered identical as far as loader contexts are concerned if they are loaded from different paths. This can lead to the same assembly being loaded repeatedly into the same application domain, but into different contexts (Load and LoadFrom). A type from the copy in the Load context is then not treated as the same type as its counterpart in the LoadFrom context, even though the assembly identities match. This is one of the disadvantages of LoadFrom. Also, assemblies in the LoadFrom context do not automatically reap the benefits of NGen.

As for the Neither context, assemblies in this context cannot be bound to, unless the application subscribes to the AssemblyResolve event. This context should generally be avoided.
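To observe the effect of contexts on type identity, you can compare the Type objects that two Assembly instances hand back for the same type name. The following is a hedged sketch: LoaderContextSketch and SharesType are hypothetical names, and the path and type name passed on the command line are assumptions supplied by whoever runs it. The sketch makes no claim about what a particular runtime will print for a given assembly; it only shows how to check.

```csharp
using System;
using System.IO;
using System.Reflection;

static class LoaderContextSketch
{
    // True when both Assembly objects expose the very same Type object
    // for the given full type name.
    public static bool SharesType(Assembly first, Assembly second, string typeName)
    {
        Type a = first.GetType(typeName, false);
        Type b = second.GetType(typeName, false);
        return a != null && ReferenceEquals(a, b);
    }

    // Usage: LoaderContextSketch.exe <path-to-assembly> <full-type-name>
    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: <path-to-assembly> <full-type-name>");
            return;
        }

        // LoadFile requires an absolute path.
        string path = Path.GetFullPath(args[0]);
        string typeName = args[1];

        Assembly viaLoadFrom = Assembly.LoadFrom(path);  // LoadFrom context
        Assembly viaLoadFile = Assembly.LoadFile(path);  // Neither context

        Console.WriteLine("Same Assembly object: " + ReferenceEquals(viaLoadFrom, viaLoadFile));
        Console.WriteLine("Same Type object:     " + SharesType(viaLoadFrom, viaLoadFile, typeName));
    }
}
```

If the two calls report different Type objects, casts and assignments between instances created from each copy will fail even though the type names are identical, which is the failure mode the Binder's multiple-context warning is meant to flag.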

So why does the CLR have loader contexts in the first place? Loader contexts help ensure load-order independence while loading assemblies. In addition, they provide a measure of isolation to assemblies and their dependencies when they are loaded into different contexts.

When All Else Fails, AssemblyResolve!

The CLR follows a series of steps (described in the article "How the Runtime Locates Assemblies") to locate and bind to the desired assembly. When the assembly cannot be located at the end of all of these steps, the Binder raises an AssemblyResolve event. In order to load an assembly that could not be resolved earlier, it is possible to subscribe to the AssemblyResolve event.

In .NET Framework 4.0, the CLR extends the AssemblyResolve event to indicate which parent assembly (the RequestingAssembly) caused the load of the dependent assembly. This is very useful in cases where an assembly has a reference to another assembly and an AssemblyResolve event occurs for the referenced assembly. Up through .NET Framework 3.5, there was no means to determine the identity of the parent (or referencing) assembly.

This means that when the AssemblyResolve event is fired, the handler receives not only the name of the assembly that the Binder could not locate by normal probing, but also the parent assembly. The user therefore knows which assembly caused the load, and the event handler can make use of it. This makes it easier to leverage the Neither context to provide binding isolation when needed.

The RequestingAssembly property is now another member exposed by ResolveEventArgs.
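A handler that makes use of RequestingAssembly might look like the following minimal sketch. The ResolveSketch class, the handler name, and the "plugins" fallback directory are all hypothetical; a real application would substitute its own lookup policy.

```csharp
using System;
using System.IO;
using System.Reflection;

static class ResolveSketch
{
    // Hypothetical fallback location, chosen only for illustration.
    public static readonly string FallbackDirectory =
        Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "plugins");

    // Invoked by the runtime after normal probing has failed.
    public static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
    {
        string simpleName = new AssemblyName(args.Name).Name;

        // RequestingAssembly can be null, for example when the parent is unknown.
        string requester = (args.RequestingAssembly == null)
            ? "(unknown)"
            : args.RequestingAssembly.FullName;
        Console.WriteLine("Resolving '" + args.Name + "' requested by " + requester);

        // Returning null tells the runtime that this handler could not help.
        string candidate = Path.Combine(FallbackDirectory, simpleName + ".dll");
        return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
    }

    public static void Main()
    {
        AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
        try
        {
            Assembly.Load("TheNonExistentAssembly");
        }
        catch (FileNotFoundException)
        {
            Console.WriteLine("The handler could not resolve the assembly either.");
        }
    }
}
```

Keeping the handler small and returning null for anything it does not recognize avoids masking genuine bind failures.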

Know When to Use the GAC

The Global Assembly Cache (known as the GAC) is a machine-wide repository of managed assemblies. The primary goal of the GAC is to enable sharing of assemblies across several managed applications installed on a machine. For example, when writing add-ins you can place the common code of the add-ins into one assembly and place this assembly in the GAC. The add-in developer can then ensure that all of his or her add-ins share this assembly, instead of having to redeploy the shared component every time a new add-in is installed. This also provides the benefit of central servicing: the add-in developer can service the common assembly present in the GAC without having to redeploy servicing updates individually for all add-ins. This is not to say that only add-in developers can make use of the GAC. All managed developers can install assemblies to the GAC and share them across applications.

So when should an assembly be installed in the GAC as opposed to leaving the assembly as a part of the application (within the ApplicationBase)? If you have an assembly that must be shared across multiple applications and hence needs to be centrally serviced, you should consider placing it in the GAC. Shared frameworks and components typically fall under this category.

When shouldn't an assembly be placed in the GAC? If you need the application to be xcopy deployable (deployed using the xcopy command to copy the directory containing the application) on different machines, placing the assembly in the GAC is probably not the best idea. In such scenarios, assemblies in the GAC will also need to be moved across machines.

In any case, the Binder follows the "GAC always wins" policy. This might play a role in deciding whether or not an assembly needs to be placed in the GAC.
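If you want to confirm where the Binder actually found an assembly, one option is to inspect each loaded assembly's location and its GlobalAssemblyCache property. The sketch below assumes the .NET Framework, which is what this article covers (newer runtimes mark this property obsolete), and GacCheck and DescribeOrigin are hypothetical helper names.

```csharp
using System;
using System.Reflection;

static class GacCheck
{
    // Summarizes where an assembly came from and whether it was loaded from the GAC.
    public static string DescribeOrigin(Assembly assembly)
    {
        string name = assembly.GetName().Name;

        // Dynamic assemblies have no file location to report.
        if (assembly.IsDynamic)
        {
            return name + " | dynamic";
        }

        return name
             + " | from GAC: " + assembly.GlobalAssemblyCache
             + " | " + assembly.Location;
    }

    public static void Main()
    {
        // List every assembly currently loaded in this application domain.
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Console.WriteLine(DescribeOrigin(assembly));
        }
    }
}
```

Running a check like this after deploying a private copy of an assembly that is also installed in the GAC shows which copy was bound.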

A short digression: If you aren't already familiar with it, GACUtil is a tool used to install assemblies to the GAC, uninstall them from it, and enumerate its contents. This tool ships as a part of the .NET Framework SDK. You can read more about GACUtil online.

In .NET Framework 4.0, the GAC went through a few changes. The concept of placing assemblies into a global directory began in CLR v1.1. In the case of .NET Framework 1.1 (which had CLR v1.1) and .NET Framework 2.0 (which had CLR 2.0), the GAC was split into two, one for each CLR. This avoided the leaking of assemblies across CLR versions. For example, if both .NET 1.1 and .NET 2.0 shared the same GAC, then a .NET 1.1 application loading an assembly from this shared GAC could get .NET 2.0 assemblies, thereby breaking the .NET 1.1 application.

The CLR version used for both .NET Framework 2.0 and .NET Framework 3.5 is CLR 2.0. As a result, there was no need to split the GAC in the previous two framework releases. The problem of breaking older (in this case, .NET 2.0) applications resurfaces in .NET Framework 4.0, at which point CLR 4.0 was released. Hence, to avoid interference issues between CLR 2.0 and CLR 4.0, the GAC is now split into private GACs for each runtime.

Tools such as GACUtil and Shfusion will behave exactly the same as they did in pre–.NET Framework 4.0 scenarios. Also, the behavior of publisher policy (explicitly mentioned, since it is required to install these policy files to the GAC) will not change. The main change is that CLR v2.0 applications now cannot see CLR v4.0 assemblies in the GAC.

To conclude, we covered some of the ways in which you can get the best out of the CLR Binder, while also looking at some of the improvements made in the Binder for .NET Framework 4.0.

Send your questions and comments to clrinout@microsoft.com.

Aarthi Ramamurthy is a Program Manager for CLR at Microsoft and primarily works on the assembly binding and loading aspects of the runtime. She can be reached at aarthi@microsoft.com.

Mark Miller is a Software Development Engineer in Test for the CLR team and works on many of the unmanaged areas of the framework. He can be reached at markmil@microsoft.com.