Basic Instincts

Naming and Building Assemblies in Visual Basic .NET

Ted Pattison

Contents

What's in a Name?
Building a Strongly Named Assembly
Assembly Signing and Tamper Protection
Wrap-up

This month's installment of Basic Instincts will begin a multi-part series focused on working with assemblies. There is quite a bit you need to know about how assemblies are built, deployed, and versioned. This month I will address several essential details about how to build an assembly with the exact name you want. In subsequent months, I'll discuss your options for deploying assemblies on a production machine and revising assemblies that have already been put into production.

What's in a Name?

Let's start with the basics. Each assembly has a four-part name consisting of the following elements: friendly name, version number, culture setting, and public key (or public key token). When I discuss how these four parts are written out as a single format string, keep in mind that the string follows a fixed layout, so you should not try to improvise your own.

The first part of an assembly's name is its friendly name, which is simply the file name without the extension. For example, the friendly name of an assembly with a file name of MyLibrary.dll is MyLibrary. The friendly name for a system-supplied assembly file such as System.Data.dll is System.Data.

The second part of an assembly's name is its version number, which consists of a string with four dot-delimited numbers in the form "1.0.24.0". These four numbers represent the major version, minor version, build number, and revision number, respectively.
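As a small illustration of how the four numbers map to their roles, the following sketch parses the version string with the System.Version class. The module and function names are made up for this example.

```vb
Imports System

Module VersionParts
    ' Splits a four-part version string into its named components.
    Public Function Describe(ByVal text As String) As String
        Dim v As New Version(text)
        Return String.Format("major={0}, minor={1}, build={2}, revision={3}", _
                             v.Major, v.Minor, v.Build, v.Revision)
    End Function

    Sub Main()
        ' Prints: major=1, minor=0, build=24, revision=0
        Console.WriteLine(Describe("1.0.24.0"))
    End Sub
End Module
```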

You can build an assembly with a specific version number by adding this code to the top of one of your project source files:

Imports System.Reflection

<Assembly: AssemblyVersion("1.0.24.0")>

Keep in mind that this behavior is only guaranteed in Visual Basic® .NET. Other compilers may vary.

This example demonstrates how to apply an assembly-level attribute named AssemblyVersion. Note that this example also imports the System.Reflection namespace since that's the place where the AssemblyVersion attribute is defined. Remember that when you apply an assembly-level attribute, it must be placed in a source file after any Imports statements but before any type definitions. Also note that you should apply the AssemblyVersion attribute in one and only one source file per project.

It is important to note that every assembly has a version number. This is true even when you compile an assembly without explicitly assigning it a version number. For example, what happens when you compile an assembly without using the AssemblyVersion attribute? The Visual Basic .NET compiler automatically assigns the assembly a version number of 0.0.0.0.
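One way to check which version number actually ended up in a build is to ask the running assembly for it. This is only a sketch with hypothetical module and function names, and the value printed depends on how the project was compiled. Some newer project templates generate an AssemblyVersion attribute for you, in which case you may not see 0.0.0.0.

```vb
Imports System
Imports System.Reflection

Module ShowVersion
    ' Returns the version number recorded in this assembly's name.
    Public Function CurrentVersion() As Version
        Return Assembly.GetExecutingAssembly().GetName().Version
    End Function

    Sub Main()
        Console.WriteLine("Assembly version: " & CurrentVersion().ToString())
    End Sub
End Module
```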

The third part of an assembly's name is its culture setting, a two-letter code indicating that an assembly has been localized for a particular spoken language. A culture setting can also carry an optional country code. For example, a culture setting of en-US indicates that an assembly has been localized for English in the United States. Most of the assemblies that you will produce and consume will not have a culture setting because cultures are typically only used for resource-only assemblies, or satellite assemblies.

The last part of an assembly's name is defined by a public key, which is a unique value that maps to a particular company or developer. No two companies should ever use the same public key. Therefore, an assembly with a public key should be unique and distinguishable from any other assembly produced by another company. This is true even in the case in which an assembly produced by another company has an identical friendly name, version number, and culture setting.

For example, imagine that two companies produced an assembly named MyLibrary.dll and that both assemblies are culture-neutral and have the same version number. The common language runtime (CLR) is still able to tell each of these assemblies apart as long as each company has compiled their build of MyLibrary.dll with a different public key.

A public key occupies 128 bytes. The value for a public key is often stored in a compact binary format in a key file. It can also be stored or written using a text-based hexadecimal format that looks like Figure 1 (the listing there runs to 160 bytes because the 128-byte key is preceded by 32 bytes of header information). Because a public key value is so large, a smaller 8-byte value known as the public key token is usually used in its place. The public key token is usually represented by a 16-character hexadecimal string value that looks like this:

29989D7A39ACF230

Figure 1 Public Key Value in Hex Format

00 24 00 00 04 80 00 00 94 00 00 00 06 02 00 00
00 24 00 00 52 53 41 31 00 04 00 00 01 00 01 00
AD 6A 11 52 AA AD FE 79 3F 6C 54 6F D1 7F F8 CB
C8 D8 34 40 B0 4C E8 03 0A F0 B2 E8 39 52 4D E2
69 1D A6 B8 18 11 33 A9 68 EA A6 7B BB B1 BD 5C
7E 97 47 90 62 F3 9B 15 6E 17 05 79 F5 53 DB 16
E7 7F 4B E6 A9 C0 DB 21 A4 78 28 5D 77 1F 19 3C
7B E1 D7 89 30 12 E3 3A 33 4A 3E A3 1F 07 38 AB
60 5A D7 38 A2 59 5E 6F 96 CF 9E FF D4 AD AF 66
2A 0F 8F FC AE E0 26 D8 C1 EA 0F 0E 6E 99 6F A5

The CLR creates public key tokens from public key values using a special internal function. For all practical purposes, there is a one-to-one correspondence between a public key value and a public key token. In other words, a specific public key value always yields the same public key token, and it is extremely unlikely that two different public key values will yield the same token. This is very important because it allows a public key token to identify a public key value which, in turn, allows a public key token to identify the company that has produced an assembly containing a particular public key value.

While humans would be hard pressed to read and write 128-byte public key values, it's obviously much easier to read and write public key tokens. Public key tokens also conserve space. For this reason, the compiler uses the public key token instead of the public key value when adding a reference to track a dependent assembly.

You might find yourself writing a configuration file or writing code in which you must fully qualify an assembly by name. You can specify all four parts of an assembly name using an assembly format string. Here's an example of a format string with a fully qualified assembly name:

MyLibrary,
Version=1.0.24.0,
Culture=neutral,
PublicKeyToken=29989D7A39ACF230

I have used line breaks in this example to make the format string more readable. However, you should realize that there are times when you must write the entire format string in a single line. This is often the case when you are adding a format string to a configuration file or passing it as a parameter to a command-line utility.

As you can see, the first part of a format string holds the assembly's friendly name. The other three parts are stored as name-value pairs delimited by commas. This example of a format string involves an explicit setting for a friendly name, a version number, and a public key token. Also note that the culture setting in this assembly name has been explicitly qualified as neutral.

What should you do when you want to create a format string for an assembly that does not have a public key value? You create one that explicitly states that the public key token has a value of null. Then the fully qualified format string would look like this:

MyLibrary,
Version=1.0.24.0,
Culture=neutral,
PublicKeyToken=null
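The two format strings above can also be taken apart in code. The sketch below assumes a version of the .NET Framework in which the AssemblyName class accepts a format string in its constructor. The module and helper names are hypothetical.

```vb
Imports System
Imports System.Reflection

Module ParseName
    ' Returns the public key token as hex, or "null" when there is none.
    Public Function TokenText(ByVal name As AssemblyName) As String
        Dim token As Byte() = name.GetPublicKeyToken()
        If token Is Nothing OrElse token.Length = 0 Then Return "null"
        Return BitConverter.ToString(token).Replace("-", "")
    End Function

    Sub Main()
        ' Each format string is written on a single line here.
        Dim strongName As New AssemblyName( _
            "MyLibrary, Version=1.0.24.0, Culture=neutral, PublicKeyToken=29989D7A39ACF230")
        Dim simpleName As New AssemblyName( _
            "MyLibrary, Version=1.0.24.0, Culture=neutral, PublicKeyToken=null")

        Console.WriteLine(strongName.Name)               ' friendly name
        Console.WriteLine(strongName.Version.ToString()) ' version number
        Console.WriteLine(TokenText(strongName))         ' public key token
        Console.WriteLine(TokenText(simpleName))         ' no public key
    End Sub
End Module
```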

Building a Strongly Named Assembly

While a public key serves to give your assembly a name that is unique, its purpose goes much further than that. The CLR relies on public-key technology to verify the identity of the company that produced the assembly and to prevent assembly tampering. These two goals are achieved by complementing the public key with a digital signature. An assembly that has a public key and a digital signature is said to have a strong name.

In order to build an assembly with a strong name, you must first acquire a pair of keys. One is a public key whose value will be written into the physical image of assembly files. The other is a private key that is used to generate digital signatures. The process of using the private key in order to write a digital signature to the image of an assembly file is known as signing.
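To get a feel for what a key pair is, the following sketch generates a throwaway RSA key pair in code and measures the public part. It assumes a 1,024-bit key, which corresponds to the 128-byte public key size mentioned earlier. It is an illustration only and not a replacement for a key file; the module and function names are invented for the example.

```vb
Imports System
Imports System.Security.Cryptography

Module KeyPairDemo
    ' Returns the size in bytes of the public modulus of a new RSA key pair.
    Public Function ModulusBytes(ByVal keySizeInBits As Integer) As Integer
        Using rsaKey As New RSACryptoServiceProvider(keySizeInBits)
            ' Passing False exports only the public half of the pair.
            Dim publicOnly As RSAParameters = rsaKey.ExportParameters(False)
            Return publicOnly.Modulus.Length
        End Using
    End Function

    Sub Main()
        ' A 1,024-bit key has a 128-byte public modulus.
        Console.WriteLine(ModulusBytes(1024).ToString() & " bytes")
    End Sub
End Module
```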

Let's step through the process of generating a public/private key pair and building an assembly with a strong name. This will show you the actual steps that are required in the build process. After you have seen the build mechanics, I'll discuss how the strong name helps to identify the developer and to prevent tampering.

The first thing to note is that you cannot generate a key pair using Visual Studio® .NET. Instead, you must step outside of Visual Studio .NET and use SN.EXE (the .NET Framework Strong Name Utility), a command-line utility supplied by the .NET Framework SDK. To perform various tasks in SN.EXE, you pass command-line switches. Keep in mind that the switches used by SN.EXE are case sensitive. To see a list of the supported switches, simply run SN.EXE and pass it the question mark as a switch:

SN.EXE -?

When you want to create a new key file that contains a new public/private key pair, you should call SN.EXE from the command line, passing the -k switch and the name of the target key file. If you want to generate a key file named AcmeCorp.snk, you would issue the following instruction from the command line:

SN.EXE -k AcmeCorp.snk

While it's possible to generate a new public/private key pair for each assembly project, you can use a single public/private key pair for several different assembly projects. It's common for a company to use the same public/private key pair on a company-wide or department-wide basis. Just make sure that each assembly using the same public/private key pair has a unique friendly name.

Once you've generated (or acquired) a key file with a public/private key pair, you are ready to build an assembly with a strong name. First, make sure you have placed the key file in your project directory. Next, add the following code to the top of one of the source files in your project:

Imports System.Reflection

<Assembly: AssemblyVersion("1.0.24.0")>
<Assembly: AssemblyKeyFile("..\..\AcmeCorp.snk")>

The previous example demonstrates how to apply the assembly-level attribute named AssemblyKeyFile. This attribute is quite similar to the AssemblyVersion attribute in the sense that it is defined in the System.Reflection namespace. You might have noticed that the path to the key file starts with two dots and a backslash followed by two more dots and another backslash:

   ..\..\AcmeCorp.snk

This is the path you should typically use when you have placed a key file such as AcmeCorp.snk in the project directory that holds other project files such as the .vbproj file and your .vb source files. The dots and backslashes indicate that this path is going to be relative to the place where the Visual Basic .NET compiler is initially building the assembly.

By default, Visual Studio .NET configures a new Visual Basic .NET project to build in a subdirectory of the project directory such as \obj\Debug or \obj\Release. Using a path of ..\..\AcmeCorp.snk simply tells the Visual Basic .NET compiler that it should move up two directories from where it is now so it can locate the key file during compilation.
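The effect of the relative path can be checked with a few lines of code. In this sketch the project location C:\Projects\MyLibrary is purely hypothetical, as are the module and function names. On Windows it should print C:\Projects\MyLibrary\AcmeCorp.snk.

```vb
Imports System
Imports System.IO

Module KeyFilePath
    ' Resolves a relative key file path against the directory
    ' in which the compiler is building the assembly.
    Public Function Resolve(ByVal buildDir As String, ByVal relativePath As String) As String
        Return Path.GetFullPath(Path.Combine(buildDir, relativePath))
    End Function

    Sub Main()
        ' Hypothetical build directory, used only for illustration.
        Console.WriteLine(Resolve("C:\Projects\MyLibrary\obj\Debug", "..\..\AcmeCorp.snk"))
    End Sub
End Module
```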

Figure 2 Strongly Named Assembly

When you build a project that contains the AssemblyKeyFile attribute, the Visual Basic .NET compiler builds the output assembly with a strong name. In order to do this, the compiler must be able to read the values for the public key and the private key from the key file during compilation. The compiler records the full 128-byte public key value in the manifest of the assembly file it is outputting. The compiler uses the private key to generate a digital signature that is appended to the end of the output assembly file. The resulting layout of the assembly file is shown in Figure 2.

Assembly Signing and Tamper Protection

Let's take an in-depth look at the .NET Framework scheme to protect against tampering. First I'll discuss digital signatures, which are generated using cryptography. In particular, the .NET Framework relies on two industry-standard forms of cryptography known as one-way hash functions and asymmetric encryption.

A one-way hash function (also known as a hash function) allows you to create a digital fingerprint for a large digital object. You feed a large digital object such as a public key or a file image to a hash function and, in turn, the hash function calculates a much smaller piece of data known as a hash value. The cryptography involved provides two important guarantees. First, the hash function always produces an identical hash value for any given digital object. Second, it is unlikely that the hash function will generate an identical hash value for two different digital objects. This is true even when two really large digital objects differ by only a single bit.
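Both guarantees are easy to observe. The sketch below uses SHA-1 as the hash function, which is a choice made for this illustration, and hashes a short stand-in byte array rather than a real file image. The module and function names are hypothetical.

```vb
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashDemo
    ' Returns the SHA-1 hash value of the data as a hex string.
    Public Function HashHex(ByVal data As Byte()) As String
        Using sha As SHA1 = SHA1.Create()
            Return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "")
        End Using
    End Function

    Sub Main()
        Dim original As Byte() = Encoding.ASCII.GetBytes("stand-in for a file image")
        Dim copy As Byte() = CType(original.Clone(), Byte())
        Dim altered As Byte() = CType(original.Clone(), Byte())
        altered(0) = altered(0) Xor CByte(1)   ' flip a single bit

        Console.WriteLine(HashHex(original))
        Console.WriteLine(HashHex(copy))       ' same as the line above
        Console.WriteLine(HashHex(altered))    ' a different hash value
    End Sub
End Module
```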

Earlier I mentioned that the CLR uses a special internal function to generate an 8-byte public key token from a 128-byte public key value. It does this by calling a hash function. This is how the CLR and compilers are able to produce identical public key tokens any time they hash the same public key value. This is also why it is extremely unlikely that two different public key values will ever hash to the same public key token.
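The column does not spell out which hash function is involved. The sketch below assumes the commonly documented scheme: take the SHA-1 hash of the public key value and use its last 8 bytes in reverse order. It then compares the result with the token the runtime reports for its own core library. Treat the algorithm details and the hypothetical names as assumptions of this example.

```vb
Imports System
Imports System.Reflection
Imports System.Security.Cryptography

Module TokenDemo
    ' Assumed scheme: the last 8 bytes of the SHA-1 hash, reversed.
    Public Function ComputeToken(ByVal publicKey As Byte()) As Byte()
        Using sha As SHA1 = SHA1.Create()
            Dim hash As Byte() = sha.ComputeHash(publicKey)
            Dim token(7) As Byte
            For i As Integer = 0 To 7
                token(i) = hash(hash.Length - 1 - i)
            Next
            Return token
        End Using
    End Function

    Sub Main()
        ' Use the assembly that defines System.Object as a test subject.
        Dim name As AssemblyName = GetType(Object).Assembly.GetName()
        Dim computed As Byte() = ComputeToken(name.GetPublicKey())
        Dim reported As Byte() = name.GetPublicKeyToken()

        Console.WriteLine("computed: " & BitConverter.ToString(computed))
        Console.WriteLine("reported: " & BitConverter.ToString(reported))
    End Sub
End Module
```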

Apart from creating public key tokens, a hash function also plays an important role by generating a digital signature for an assembly with a strong name. However, in such cases the hash function does not generate a hash value from a public key value. Instead, it creates a hash value from the physical image of the assembly file itself.

The hash value of an assembly file serves as its digital fingerprint for a particular build. The idea is that you can send two different assembly files to the same hash function to determine whether their physical layouts are identical. If these two assembly files generate hash values that are equal, you know one assembly file is an exact copy of the other.

The final step in generating a digital signature from the hash value of an assembly file involves asymmetric encryption. Asymmetric encryption is a type of cryptography in which data is encrypted and decrypted using public/private key pairs. The public key and the private key complement each other because one key is used to encrypt data while the other is used to decrypt data. In the case of signing an assembly, the private key is used to generate the digital signature by encrypting the hash value of an assembly file. At some later time, the public key can be used to decrypt the digital signature and retrieve the hash value of that assembly file.

Now that you understand how all the pieces fit together, let's walk through the sequence of events that occur during compilation. The compiler must make two passes to build an assembly with a strong name. In the first pass, the compiler generates the image for the assembly file containing its manifest, its type information, and its executable code in the form of intermediate language (IL). However, the compiler cannot generate the digital signature during this first pass so it just leaves some blank space at the end of the assembly file to act as a placeholder.

In the second pass, the compiler generates a hash value from the physical image of the assembly file that was built during the first pass. Once the compiler has generated this hash value, it encrypts it with the private key to generate the digital signature. Finally, the compiler writes the digital signature into the placeholder at the end of the assembly file, as shown in Figure 2.
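The two passes can be mimicked on a toy scale. The sketch below is not the real assembly file format: it treats a byte array as the image, reserves a blank placeholder after it, then hashes the image, signs the hash with the private key, and fills in the placeholder. SHA-1, the 1,024-bit key size, and all names are assumptions of this example.

```vb
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module SigningDemo
    Public Function BuildSigned(ByVal image As Byte(), ByVal rsaKey As RSA) As Byte()
        Dim signatureSize As Integer = rsaKey.KeySize \ 8

        ' First pass: the image followed by a blank placeholder.
        Dim output(image.Length + signatureSize - 1) As Byte
        Array.Copy(image, output, image.Length)

        ' Second pass: hash the image, then sign the hash value.
        Dim hash As Byte()
        Using sha As SHA1 = SHA1.Create()
            hash = sha.ComputeHash(image)
        End Using
        Dim signature As Byte() = rsaKey.SignHash( _
            hash, HashAlgorithmName.SHA1, RSASignaturePadding.Pkcs1)

        ' Write the signature into the placeholder at the end.
        Array.Copy(signature, 0, output, image.Length, signature.Length)
        Return output
    End Function

    Sub Main()
        Dim image As Byte() = Encoding.ASCII.GetBytes("stand-in for the assembly image")
        Using rsaKey As New RSACryptoServiceProvider(1024)
            Dim signedImage As Byte() = BuildSigned(image, rsaKey)
            Console.WriteLine("image bytes:  " & image.Length.ToString())
            Console.WriteLine("signed bytes: " & signedImage.Length.ToString())
        End Using
    End Sub
End Module
```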

Now let's discuss how the CLR verifies the authenticity of an assembly's digital signature at a point in time after it has been distributed by the company that produced it. The CLR uses the public key value within the assembly manifest to decrypt the digital signature and retrieve the hash value of the assembly image that was present during compilation. Next, the CLR uses the same hash function to generate a hash value from the physical image of the current assembly file. The CLR now has two hash values with which it can conduct an important comparison.

The CLR compares the two hash values. If these two hash values are equal, the CLR considers the test a success because it is sure of two things: that the digital signature was generated by someone in possession of the private key, and that the physical image of the assembly file present during verification exactly matches the physical image of the assembly file originally signed. If the CLR determines that the two hash values are not equal, it considers the verification test a failure. The CLR knows that either the producer of the assembly did not possess the correct private key or that the assembly's image has been modified after it had been signed. The one exception is an assembly installed in the global assembly cache (GAC). Such an assembly is verified when it is placed in the GAC and is not reverified at run time.
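The same comparison can be sketched with the framework's RSA classes, which hash the data and check it against the signature in a single call. This is an analogy for the check described above and not the CLR's actual implementation. The names, the use of SHA-1, and the 1,024-bit key size are assumptions of this example.

```vb
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module VerifyDemo
    ' True when the signature matches the image under the given public key.
    Public Function IsIntact(ByVal image As Byte(), ByVal signature As Byte(), _
                             ByVal publicKey As RSA) As Boolean
        Return publicKey.VerifyData( _
            image, signature, HashAlgorithmName.SHA1, RSASignaturePadding.Pkcs1)
    End Function

    Sub Main()
        Dim image As Byte() = Encoding.ASCII.GetBytes("stand-in for the assembly image")

        Using signer As New RSACryptoServiceProvider(1024)
            ' The producer signs with the private key.
            Dim signature As Byte() = signer.SignData( _
                image, HashAlgorithmName.SHA1, RSASignaturePadding.Pkcs1)

            ' The verifier holds only the public key.
            Using verifier As New RSACryptoServiceProvider()
                verifier.ImportParameters(signer.ExportParameters(False))

                Console.WriteLine(IsIntact(image, signature, verifier))   ' True

                image(0) = image(0) Xor CByte(1)   ' tamper with one bit
                Console.WriteLine(IsIntact(image, signature, verifier))   ' False
            End Using
        End Using
    End Sub
End Module
```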

Let me walk through a typical scenario using MyApp.exe and MyLibrary.dll so you can get a better sense of how the CLR uses a verification check at run time to prevent tampering. Imagine you have built MyLibrary.dll with a strong name and, after that, you have compiled MyApp.exe with a reference to MyLibrary.dll. It is now impossible for someone without your private key to change the behavior of the application on a production machine by replacing the real version of MyLibrary.dll with a tampered version.

Remember that the assembly manifest for MyApp.exe contains a reference with the public key token associated with the same public key compiled into MyLibrary.dll. Therefore, you have the guarantee that the CLR will only load a version of MyLibrary.dll that meets two important criteria. The first is that the assembly file for MyLibrary.dll contains a public key value that matches the public key token in the assembly manifest of MyApp.exe. The second is that MyLibrary.dll has been signed by someone in possession of your private key.
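You can inspect the references recorded in a manifest from code. The following sketch lists the assemblies that the running program references, printing each one as a full format string that includes its public key token. The module and function names are hypothetical, and the output depends on what your project references.

```vb
Imports System
Imports System.Reflection

Module ListReferences
    ' Returns the names recorded in this assembly's manifest for its dependencies.
    Public Function References() As AssemblyName()
        Return Assembly.GetExecutingAssembly().GetReferencedAssemblies()
    End Function

    Sub Main()
        For Each reference As AssemblyName In References()
            ' FullName includes the version, culture, and public key token.
            Console.WriteLine(reference.FullName)
        Next
    End Sub
End Module
```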

Now consider what happens when a user runs MyApp.exe and it executes code that makes the first call to MyLibrary.dll. At this point, the CLR will attempt to load MyLibrary.dll. However, the CLR knows that MyLibrary.dll has a strong name, so it runs a verification check as the first part of the loading process. This verification check allows the CLR to detect if someone without access to the private key has made changes to the physical image of the assembly file.

Imagine that there is a bad guy who wants to tamper with MyLibrary.dll. Perhaps this bad guy wants to change the behavior of one or more methods by modifying the IL within the assembly file. Ultimately, this bad guy plans to replace your build of MyLibrary.dll on a production machine with a tampered version so that he can change the behavior of MyApp.exe when it runs.

Here's how the verification scheme thwarts the bad guy. For any specific physical layout of the assembly file MyLibrary.dll, there is only one possible digital signature that is valid. Therefore, any change to an assembly's physical image after it has been signed renders the existing digital signature invalid. If this bad guy changes so much as a single bit of the physical image of MyLibrary.dll after it has been signed, the CLR can detect the tampering. A change to the physical image of the assembly will require the assembly file to be signed again. However, the bad guy cannot sign the assembly because he does not have the private key.

Note that this verification scheme doesn't really prevent assembly tampering. It only prevents tampering from going undetected. When the CLR attempts to load a strongly named assembly with a digital signature that doesn't match the physical layout of the assembly file, it fails the load attempt by throwing a System.IO.FileLoadException. In most cases, this means that the hosting application is not going to be able to do its work. However, this scenario is preferable to running the application with evil code that's been altered by the bad guy.
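A host application that loads assemblies by name can catch this failure instead of crashing. The sketch below uses the MyLibrary format string from earlier in the column. On a machine where that assembly does not exist at all, you would expect to land in the not-found branch instead. The module and function names are invented for the example.

```vb
Imports System
Imports System.IO
Imports System.Reflection

Module LoadDemo
    ' Attempts to load an assembly by its full name and reports the outcome.
    Public Function TryLoad(ByVal displayName As String) As String
        Try
            Dim loaded As Assembly = Assembly.Load(displayName)
            Return "loaded " & loaded.FullName
        Catch ex As FileNotFoundException
            ' No assembly with this name could be located.
            Return "not found: " & ex.Message
        Catch ex As FileLoadException
            ' The assembly was located but the load attempt failed.
            Return "load failed: " & ex.Message
        End Try
    End Function

    Sub Main()
        Console.WriteLine(TryLoad( _
            "MyLibrary, Version=1.0.24.0, Culture=neutral, PublicKeyToken=29989D7A39ACF230"))
    End Sub
End Module
```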

A critical point about tamper detection is that the entire scheme is based on the premise that the private key doesn't fall into the wrong hands. Any individual who acquires your private key can build an assembly with your public key value and a valid digital signature. This makes it possible for the bad guys to defeat the verification checks used by the CLR. If your company plans to distribute strongly named assemblies, you must carefully plan how to manage your private keys so they are not compromised.

Now that you have seen how the CLR protects against tampering, I would like to mention an important restriction when building an assembly project that has a strong name. You can only reference other assemblies that also have strong names. If you attempt to build an assembly project with a strong name while referencing an assembly without a strong name, you will receive a compile-time error.

The reason for this is to ensure the reliability of tamper protection. When you reference a strongly named assembly, the CLR prevents tampering from going undetected by running verification checks against the dependent assembly's digital signature. The scheme would not be as reliable if your application depended on a strongly named assembly which, in turn, depended on another assembly that did not have a strong name. The bad guy could replace the assembly that did not have the strong name and the CLR would not be able to detect this. Thus, the restriction is enforced so you are guaranteed that a strongly named assembly depends only on other strongly named assemblies.

Wrap-up

This month's column has focused on what goes into building and naming an assembly. As you have seen, each assembly has a four-part name that consists of its friendly name, version number, culture setting, and public key token. You have also learned how to construct an assembly format string that will allow you to add a fully qualified assembly name to a configuration file.

It is important that you know how to build an assembly with a specific version number. You do this by adding the AssemblyVersion attribute to your source code. You also have seen how to build an assembly with a strong name by adding the AssemblyKeyFile attribute to your source code and supplying the compiler with a public/private key pair. I will resume this discussion in the next installment of this column.

Send your questions and comments for Ted to instinct@microsoft.com.

Ted Pattison is an instructor and course writer at DevelopMentor (https://www.develop.com). He has written several books about Visual Basic and COM. He is currently writing a book titled Building Applications and Components with Visual Basic .NET to be published by Addison-Wesley in the fall of 2003.