Displaying Metadata in .NET EXEs with MetaViewer
As I write this, the Microsoft® .NET initiative is still relatively new (Beta 1 is just out), and early adopters are still finding new nooks and crannies to explore. For myself, many facets of .NET metadata are particularly interesting, and metadata is a natural entry point to understanding many other areas of .NET. Metadata is the information used by the .NET common language runtime (CLR) to describe everything about classes, functions, properties, resources, and other items in an executable file. This month I'll describe my MetaViewer program, which displays the metadata contained within a .NET executable file.
Metadata Overview
If you've done much COM programming, you're probably somewhat familiar with IDL and type libraries. These related technologies describe an interface and its methods. For each method, information is stored about each parameter, including its type. All of this information is necessary for COM Automation and other duties, such as marshaling parameters between COM components in different processes.
You can think of .NET metadata as IDL and type libraries on steroids. Metadata is much more complete and accurate than IDL. More importantly, metadata isn't optional. The .NET CLR absolutely depends on metadata to know what assemblies your code uses, what your methods look like, how your classes are laid out in memory, what resources are available in your executable file, and much more. Before describing my MetaViewer code, I'll give a moderately high-level overview of metadata, as seen through the .NET reflection classes.
At the top of the hierarchy of information stored in metadata is the System.Reflection.Assembly class. An Assembly object corresponds to one or more DLLs that make up a .NET assembly. The Assembly class contains a lot of information including: the list of modules (DLLs) that make up the assembly, the assembly's version information, which other assemblies this assembly depends on, and what resources (bitmaps, and so on) can be found within the assembly.
Beneath the Assembly class is the System.Reflection.Module class. A Module represents a single DLL. Currently, most assemblies consist of a single module, but you shouldn't assume a one-to-one correlation between an Assembly and a Module. Besides containing information about a specific DLL, the Module class is the container for types within the module. For the moment, you can consider a type to correspond to a .NET class defined in whatever language you're using.
The System.Type class represents a .NET type. Each Type instance represents one of three possible items: a class definition, an interface definition, or a value class (usually a structure).
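The top of the hierarchy can be walked with a few reflection calls. The following is a minimal sketch rather than MetaViewer code: the HierarchyWalker class and DescribeKind helper are hypothetical names, and the assembly that defines System.Object is used only because it is always available.

```csharp
using System;
using System.Reflection;

public static class HierarchyWalker
{
    // Classify a Type as one of the three kinds described above.
    public static string DescribeKind(Type t)
    {
        if (t.IsInterface) return "interface";
        if (t.IsValueType) return "value type";
        return "class";
    }

    public static void Main()
    {
        // Any loaded assembly would do; this one is just convenient.
        Assembly asm = typeof(object).Assembly;
        Console.WriteLine("Assembly: " + asm.FullName);

        foreach (Module mod in asm.GetModules())
        {
            Console.WriteLine("  Module: " + mod.Name);
            Type[] types = mod.GetTypes();

            // Print only the first few types to keep the output short.
            for (int i = 0; i < types.Length && i < 5; i++)
                Console.WriteLine("    " + DescribeKind(types[i]) + ": " + types[i].FullName);
        }
    }
}
```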
Continuing down the metadata hierarchy, each Type class instance has a collection of members. A System.Reflection.MemberInfo class instance represents each member. A member can be one of the following:
You can access information about all members of a Type with the MemberInfo class. However, MemberInfo is really a base class, and each MemberInfo instance is actually an instance of one of the more derived types listed previously. Thus, if a particular MemberInfo instance represents a field, you can cast it to a FieldInfo to access information specific to fields.
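As an illustration of that cast, here is a small sketch that inspects the MemberType property and then casts to the matching derived class. MemberDescriber is a hypothetical name, and only the three member kinds named in this article are handled explicitly; any other kind just reports its MemberTypes name.

```csharp
using System;
using System.Reflection;

public static class MemberDescriber
{
    public static string Describe(MemberInfo member)
    {
        switch (member.MemberType)
        {
            case MemberTypes.Field:
                FieldInfo field = (FieldInfo)member;
                return "field of type " + field.FieldType.Name;

            case MemberTypes.Method:
                MethodInfo method = (MethodInfo)member;
                return "method returning " + method.ReturnType.Name;

            case MemberTypes.Constructor:
                ConstructorInfo ctor = (ConstructorInfo)member;
                return "constructor with " + ctor.GetParameters().Length + " parameter(s)";

            default:
                // Other member kinds: report the kind without casting.
                return member.MemberType.ToString().ToLower();
        }
    }

    public static void Main()
    {
        foreach (MemberInfo member in typeof(Version).GetMembers())
            Console.WriteLine(member.Name + ": " + Describe(member));
    }
}
```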
For the MethodInfo and ConstructorInfo classes, there's one more level to the metadata. Methods and constructors may have parameters, and these parameters are represented by the System.Reflection.ParameterInfo class. For a given parameter, you can get its type (as a System.Type), and in most cases, the parameter's name.
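A rough sketch of reading parameter metadata follows. SignatureFormatter and its Sample method are hypothetical and exist only to give the code something to describe; MethodBase is used because it is the common base class of MethodInfo and ConstructorInfo.

```csharp
using System;
using System.Reflection;
using System.Text;

public static class SignatureFormatter
{
    // A throwaway method whose signature is formatted below.
    public static void Sample(int count, string label) { }

    public static string Format(MethodBase method)
    {
        StringBuilder sb = new StringBuilder(method.Name + "(");
        ParameterInfo[] parameters = method.GetParameters();

        for (int i = 0; i < parameters.Length; i++)
        {
            if (i > 0) sb.Append(", ");
            sb.Append(parameters[i].ParameterType.Name);

            // The name is not always available, so append it only when present.
            if (!string.IsNullOrEmpty(parameters[i].Name))
                sb.Append(" " + parameters[i].Name);
        }
        return sb.Append(")").ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Format(typeof(SignatureFormatter).GetMethod("Sample")));
    }
}
```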
For the sake of completeness, let's now jump back to the System.Reflection.Module class. Besides containing all the System.Type instances, a Module object also contains all methods and fields declared at module level, meaning outside of any class definition. You'll most commonly see this in C++ with Managed Extensions, where functions and variables can be declared at file or global scope.
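Module-level items can be listed with the GetMethods and GetFields methods of the Module class. The sketch below (GlobalLister is a hypothetical name) simply counts and prints them; for a module compiled from C# the lists will normally be empty, since C# has no way to declare globals.

```csharp
using System;
using System.Reflection;

public static class GlobalLister
{
    // Prints module-level methods and fields and returns how many were found.
    public static int ListGlobals(Module mod)
    {
        MethodInfo[] methods = mod.GetMethods();   // global methods only
        FieldInfo[] fields = mod.GetFields();      // global fields only

        foreach (MethodInfo m in methods)
            Console.WriteLine("global method: " + m.Name);
        foreach (FieldInfo f in fields)
            Console.WriteLine("global field: " + f.Name);

        return methods.Length + fields.Length;
    }

    public static void Main()
    {
        int count = ListGlobals(typeof(GlobalLister).Module);
        Console.WriteLine(count + " module-level item(s)");
    }
}
```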
Figure 1 Metadata Hierarchy
Figure 1 shows the .NET reflection metadata hierarchy that I just described. If you'd like a more detailed overview of .NET metadata, be sure to read my article, "Avoiding DLL Hell: Introducing Application Metadata in the Microsoft .NET Framework" in the October 2000 issue of MSDN Magazine.
Diving into Windows Forms
My first experience with Windows Forms was the WinDes tool that comes with the .NET SDK. After electing to build a new Win32®-based form in C#, I added a treeview control, a textbox, an edit control, and a few buttons. Upon saving my form, I was surprised to see that WinDes created a single .CS source file, and no separate file describing the layout of my form.
When I examined the code that was generated for the form, I discovered that all of the control creation was done dynamically in C# code as the form was initializing. Each control is represented as a .NET object instance, and the form contains member variables that reference each control. For example, the following two lines of code create a textbox and a treeview, and initialize the form's member variables for them:
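A minimal sketch of what such control-creation code looks like is shown here. The member names text_details and treeView1 are taken from elsewhere in this article; the SketchForm class name is hypothetical, and the System.Windows.Forms namespace of the released Framework is assumed (Beta 1 naming may differ).

```csharp
using System.Windows.Forms;

public class SketchForm : Form
{
    // Member variables that reference each control.
    private TextBox text_details;
    private TreeView treeView1;

    public SketchForm()
    {
        // Create each control and store it in the form's member variable.
        this.text_details = new TextBox();
        this.treeView1 = new TreeView();

        // Controls must also be added to the form before they are displayed.
        this.Controls.Add(this.text_details);
        this.Controls.Add(this.treeView1);
    }
}
```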
What about the properties of the controls that I tweaked in WinDes? Each property is set by a single C# statement that follows the control creation. For instance, this line sets the text_details textbox to read only:
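Such a property assignment might look like the following sketch. The ReadOnlyExample wrapper is hypothetical and is there only to make the fragment compile on its own; ReadOnly is the standard TextBox property in the released System.Windows.Forms namespace.

```csharp
using System.Windows.Forms;

public static class ReadOnlyExample
{
    public static TextBox CreateDetailsBox()
    {
        TextBox text_details = new TextBox();

        // One property assignment per setting changed in the designer.
        text_details.ReadOnly = true;
        return text_details;
    }
}
```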
Certain aspects of programming with Windows Forms are quite different from traditional UI development in Win32 using resource (.RC) files or even Visual Basic® prior to .NET. Both .RC files and older versions of Visual Basic store the contents and properties of forms as structured data, separate from your code for handling events. When a form (also known as a dialog) is created in the old model, some system code reads the layout data and constructs the form for you. In contrast, Windows Forms doesn't separate the layout of forms from the code that creates and initializes a form. This offers you the flexibility to easily customize the contents and properties of a form as the form is created.
The nice thing about programming with Windows Forms is that if you're used to UI programming with Visual Basic, you're 95 percent ready to use Windows Forms. Instead of working with raw User32 windows, forms and their controls are encapsulated in .NET classes. Typically, there's no need to screw around with window messages and various other assorted low-level machinations.
The only thing I found noticeably different about programming with Windows Forms is how an event handler is set up. The presence of a handler for a specific action is added dynamically. To make an event handler, you create an instance of an event handler class. The event handler constructor takes an argument specifying the function that you want to handle the event. The event handler instance is then attached to the specific control. In C#, this is done with the += operator. For example, the following code shows how I set a method called treeView1_AfterSelect as the handler for the TreeView.AfterSelect event:
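A sketch of this hookup follows, assuming the TreeViewEventHandler delegate and TreeViewEventArgs class from the released System.Windows.Forms namespace. The HandlerForm class and its LastSelected field are hypothetical; the handler name treeView1_AfterSelect comes from the text above.

```csharp
using System.Windows.Forms;

public class HandlerForm : Form
{
    private TreeView treeView1 = new TreeView();

    // Hypothetical field that records what the handler saw.
    public string LastSelected = "";

    public HandlerForm()
    {
        Controls.Add(treeView1);

        // Create the event handler instance and attach it with +=.
        treeView1.AfterSelect += new TreeViewEventHandler(treeView1_AfterSelect);
    }

    // Called by the TreeView after a node has been selected.
    private void treeView1_AfterSelect(object sender, TreeViewEventArgs e)
    {
        LastSelected = e.Node.Text;
    }
}
```

Because handlers are attached at run time, the same += syntax can attach more than one handler to a single event.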
So far, everything I've described is fairly simple and fits within regular source code. However, what about things like bitmaps, icons, and so forth? I stumbled across the answer when I added an ImageList to my form so that my TreeView nodes would have cheesy images indicating what a specific node represented. Doing this caused WinDes to create a second file with the .resX extension. Incidentally, adding binary "resources" necessitated creating a System.Resources.ResourceManager class inside of my form's initialization code to get at the compiled resources.
Peeking inside the .resX file, I discovered that it was an XML file. In order to convert it into something usable, you have to run the ResGen program, which takes the XML file and spits out a binary file with the .resources extension. You can then embed the .resources file in the final executable, or leave it as a separate file in the assembly. With the C# compiler, the /res: option puts the .resources file into the executable, while /linkres: indicates that the .resources file should be separate from the executable file.
The MetaViewer Program
When I set out to write MetaViewer, I decided on the following features:
Figure 2 MetaViewer Form
Figure 2 shows the main MetaViewer form. In the figure, MetaViewer is displaying its own metadata. The code for the form is in MetaViewer.cs (see Figure 3). I've cleaned up the code originally generated by WinDes to make it more readable. If you examine the code, you'll see that it's a mixture of Windows Forms-based code and code for accessing metadata through the reflection classes.
The main part of the MetaViewer form contains a treeview of all the types. Each top-level treeview node can be expanded to see the type's members. On the right-hand side is a pane for showing details for the currently selected node. When a type is selected, the details pane shows the namespace that the type comes from, along with the complete derivation hierarchy.
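The namespace and derivation hierarchy for a type can be obtained from the Namespace and BaseType properties of System.Type. This is a sketch under that assumption, not the code from Figure 3, and DerivationChain is a hypothetical name.

```csharp
using System;
using System.Text;

public static class DerivationChain
{
    // Follows BaseType links from the given type up to System.Object.
    public static string Describe(Type type)
    {
        StringBuilder sb = new StringBuilder();
        for (Type t = type; t != null; t = t.BaseType)
        {
            if (sb.Length > 0) sb.Append(" -> ");
            sb.Append(t.FullName);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Type type = typeof(ArgumentNullException);
        Console.WriteLine("Namespace: " + type.Namespace);
        Console.WriteLine("Derivation: " + Describe(type));
    }
}
```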
When you expand a type node and highlight one of its members, the details pane continues to show pertinent information. For a method, the pane indicates if the method has the virtual, static, public, private, or PInvoke (Platform Invoke) attributes. It also shows the method's parameters and return type. For a field, the details pane shows the field's type and whether it's static, public, or private. The ShowMemberInfoDetails method in Figure 3 shows how I ascertain and display all this information.
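The method attributes mentioned above map onto properties of MethodInfo. The sketch below is an approximation of that kind of check rather than the ShowMemberInfoDetails source; MethodFlags is a hypothetical name, and testing the MethodAttributes.PinvokeImpl bit is one assumed way to detect PInvoke methods.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class MethodFlags
{
    public static string Describe(MethodInfo m)
    {
        List<string> flags = new List<string>();
        if (m.IsPublic) flags.Add("public");
        if (m.IsPrivate) flags.Add("private");
        if (m.IsStatic) flags.Add("static");
        if (m.IsVirtual) flags.Add("virtual");

        // PInvoke methods carry the PinvokeImpl bit in their attributes.
        if ((m.Attributes & MethodAttributes.PinvokeImpl) != 0) flags.Add("pinvoke");

        return string.Join(" ", flags.ToArray()) + " -> " + m.ReturnType.Name;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(object).GetMethod("ToString")));
        Console.WriteLine(Describe(typeof(MethodFlags).GetMethod("Describe")));
    }
}
```

Fields can be handled the same way with the IsStatic, IsPublic, IsPrivate, and FieldType members of FieldInfo.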
At the bottom of the main MetaViewer form are two buttons and an edit control. To search for a specific type by name, simply enter any part of its name (case-insensitive) into the edit control, and then press the Search button. If a match is found, the found type is highlighted. The search begins from the currently highlighted node, so you can continue to hit the Search button to locate additional types with similar names. I'll be the first to admit that the searching UI isn't as intuitive as you might find in a commercial program, but it's really not bad for the few lines of code that it took to implement.
Figure 4 Assembly Info
Finally, the Assembly Info button brings up a separate form, shown in Figure 4. The source for this form is in AssemblyInfo.cs (see Figure 5). Although metadata stores tons of information for an assembly, I selected just a few of the most important items: the module name, the assemblies that the current assembly imports, and, finally, the list of files (typically resources) that belong to the current assembly.
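Similar information can be gathered with a handful of Assembly methods. The following sketch is not the AssemblyInfo.cs source: AssemblySummary is a hypothetical name, and GetManifestResourceNames is used as a stand-in for the file list, which is an assumption about how that list could be obtained.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class AssemblySummary
{
    public static List<string> Summarize(Assembly asm)
    {
        List<string> lines = new List<string>();

        // The module that holds the assembly manifest.
        lines.Add("Module: " + asm.ManifestModule.Name);

        // Assemblies that this assembly depends on.
        foreach (AssemblyName reference in asm.GetReferencedAssemblies())
            lines.Add("Imports: " + reference.FullName);

        // Resources recorded in the assembly manifest.
        foreach (string resource in asm.GetManifestResourceNames())
            lines.Add("Resource: " + resource);

        return lines;
    }

    public static void Main()
    {
        foreach (string line in Summarize(typeof(AssemblySummary).Assembly))
            Console.WriteLine(line);
    }
}
```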
Moving back to the main form, one of the nice experiences I had with Windows Forms and .NET programming was the ease of connecting the metadata info to the Windows Forms representation of that data. I simply let the UI classes hold on to my data for me. The key to this is derivation.
When using a Windows Forms TreeView control, you add items by adding TreeNode instances. By declaring my own classes derived from TreeNode, I was able to store my metadata information in the same class as the TreeNode data. You can see an example of this in the MemberInfoNode class declared in MemberNode.cs (see Figure 6). The MemberInfoNode class derives from TreeNode, and adds a field to store a MemberInfo instance.
In addition to the MemberInfoNode, I created a TypeNode class that also derives from TreeNode. The TypeNode represents a metadata type, and has a field for storing the appropriate System.Type instance. When adding a member node to the TreeView, I pass either a TypeNode or a MemberInfoNode, rather than simply passing a TreeNode.
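The derivation technique can be sketched as follows. The class names TypeNode and MemberInfoNode come from the text, but the field names, constructors, and the TreeBuilder helper are hypothetical and need not match the code in Figure 6.

```csharp
using System;
using System.Reflection;
using System.Windows.Forms;

// A tree node that also carries the System.Type it represents.
public class TypeNode : TreeNode
{
    public readonly Type DescribedType;

    public TypeNode(Type type) : base(type.FullName)
    {
        DescribedType = type;
    }
}

// A tree node that also carries the MemberInfo it represents.
public class MemberInfoNode : TreeNode
{
    public readonly MemberInfo Member;

    public MemberInfoNode(MemberInfo member) : base(member.Name)
    {
        Member = member;
    }
}

public static class TreeBuilder
{
    // Builds a type node with one child node per public member.
    public static TypeNode BuildTypeNode(Type type)
    {
        TypeNode typeNode = new TypeNode(type);
        foreach (MemberInfo member in type.GetMembers())
            typeNode.Nodes.Add(new MemberInfoNode(member));
        return typeNode;
    }
}
```

A node built this way can be passed to TreeView.Nodes.Add like any other TreeNode, so the control ends up holding the metadata objects with no separate lookup table.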
When a TreeView event occurs (for instance, a node is selected), the event handler receives a reference to the selected TreeNode. My code takes the TreeNode and casts it back to a MemberInfoNode or TypeNode so that it can retrieve the appropriate information. How do I know what kind of derived class my event handler was passed? The C# is operator comes in very handy here. The following code shows this in action:
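A sketch of that pattern is shown here. The two node classes are cut-down, hypothetical stand-ins so that the example compiles on its own, and SelectionHandler is likewise an invented name.

```csharp
using System;
using System.Reflection;
using System.Windows.Forms;

// Hypothetical stand-ins for MetaViewer's TreeNode-derived classes.
public class TypeNode : TreeNode { public Type DescribedType; }
public class MemberInfoNode : TreeNode { public MemberInfo Member; }

public static class SelectionHandler
{
    // Uses the is operator to find out which derived class was passed in.
    public static string DescribeSelection(TreeNode node)
    {
        if (node is TypeNode)
            return "Type: " + ((TypeNode)node).DescribedType.FullName;
        if (node is MemberInfoNode)
            return "Member: " + ((MemberInfoNode)node).Member.Name;
        return "Plain node: " + node.Text;
    }

    // Shape of an AfterSelect handler that relies on the helper above.
    public static void OnAfterSelect(object sender, TreeViewEventArgs e)
    {
        Console.WriteLine(DescribeSelection(e.Node));
    }
}
```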
Interesting Problems Solved
One of the first bugs I encountered with MetaViewer was that I was only seeing public members of an assembly's types. Going back to my original ReflectMeta program from my October 2000 article, I saw that it had the exact same problem. I hunted around to find the reason I saw only public members, but nothing obvious sprang to mind. I was within minutes of filing a bug report, but the problem just seemed too glaring to be a bug in the .NET Framework.
Eventually, I looked again at the System.Type GetMembers documentation, and noticed that it was an overloaded method. I was calling the simple GetMembers method with no parameters. Careful reading of the documentation revealed that this overload returns only the public members of a type. If I wanted a complete list of members, I had to call a different GetMembers overload, one that takes a BindingFlags parameter.
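The difference between the two overloads is easy to demonstrate. In this sketch the Sample class and the MemberCounter helper are hypothetical, and the particular combination of BindingFlags values is one reasonable choice for asking for everything, not necessarily the one MetaViewer uses.

```csharp
using System;
using System.Reflection;

// A throwaway class with both public and private members.
public class Sample
{
    public int PublicField;
    private int privateField;
    private void Hidden() { privateField++; }
}

public static class MemberCounter
{
    // Public and non-public, instance and static members.
    public const BindingFlags AllMembers =
        BindingFlags.Public | BindingFlags.NonPublic |
        BindingFlags.Instance | BindingFlags.Static;

    public static bool HasMember(MemberInfo[] members, string name)
    {
        foreach (MemberInfo member in members)
            if (member.Name == name) return true;
        return false;
    }

    public static void Main()
    {
        MemberInfo[] publicOnly = typeof(Sample).GetMembers();
        MemberInfo[] everything = typeof(Sample).GetMembers(AllMembers);

        Console.WriteLine("Default overload sees Hidden: " + HasMember(publicOnly, "Hidden"));
        Console.WriteLine("BindingFlags overload sees Hidden: " + HasMember(everything, "Hidden"));
    }
}
```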
The lesson learned: the .NET class library is vast, and uses method overloading extensively. When you're not getting the desired behavior or aren't able to do what you want, consider going back and looking for an overload that does what you want. In this respect, MFC programmers will probably have a much easier time learning the .NET Framework than their Visual Basic counterparts.
Another interesting problem I encountered occurred when I was implementing the code to search for a type by name. In C++, the strstr function was my friend; in .NET, I couldn't find anything similar. I spent quite a bit of time checking and rechecking the String class methods to no avail. Finally, I came across the Regex (regular expression) class, which is in the System.Text.RegularExpressions namespace.
With a few minutes of studying, I had code iterating through each of the Type nodes, and using the Regex.IsMatch method to see if the name of the type matched the search string. Unfortunately, the comparison was case-sensitive. Remembering my previous hard-learned lesson about overloaded methods, I went back and found that the Regex class has a second constructor that lets you specify search options. You can see my final implementation of the search code in the button_search_Click method in Figure 3.
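A case-insensitive search along these lines can be sketched with the Regex constructor that takes a RegexOptions value. This is not the button_search_Click code from Figure 3: TypeSearch and FindNext are hypothetical names, a plain string array stands in for the tree nodes, and the call to Regex.Escape is an added assumption so that the text typed by the user is matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypeSearch
{
    // Returns the index of the first name at or after 'start' that contains
    // 'fragment' (ignoring case), or -1 if there is no such name.
    public static int FindNext(string[] names, string fragment, int start)
    {
        // Escape the fragment so characters such as '.' are matched literally.
        Regex regex = new Regex(Regex.Escape(fragment), RegexOptions.IgnoreCase);

        for (int i = start; i < names.Length; i++)
            if (regex.IsMatch(names[i])) return i;
        return -1;
    }

    public static void Main()
    {
        string[] names = { "System.String", "System.Text.StringBuilder", "System.Int32" };

        // Repeating the search from just past the last hit finds the next match.
        int first = FindNext(names, "string", 0);
        int second = FindNext(names, "string", first + 1);
        Console.WriteLine(names[first] + ", then " + names[second]);
    }
}
```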
A third problem I encountered was unmanaged value types. Typically, these are classic C++ structures that are used for interop, and aren't allocated from the managed heap. The .NET corhdr.h file says that unmanaged value types are deprecated. However, I still found plenty of cases where they were being used. I wasn't inspired enough to create a separate TreeView bitmap for unmanaged value types, so I used the same bitmap for regular and unmanaged value types.
I had a lot of fun writing MetaViewer, and learned quite a bit about C# and the .NET Framework in the process. Although I'm not abandoning C++ programming just yet, I was pretty impressed by how much I could do with a relatively small amount of code. Hopefully, you'll find it a good starting point for exploring metadata, and creating your own custom metadata browser.
Send questions and comments for Matt to email@example.com.
Matt Pietrek does advanced research for the NuMega Labs of Compuware Corporation, and is the author of several books. His Web site at http://www.wheaty.net has a FAQ page and information on previous columns and articles.
From the March 2001 issue of MSDN Magazine