CLR Inside Out

Reflections on Reflection

Mike Repass

Code download available at: CLR Inside Out 2007_06.exe (163 KB)

Contents

Reflection in .NET
Inspecting Types and Members
Invoking Code Dynamically
Putting It Together
Efficient Type Processing for Serialization
Finishing Up

Are your goals of clean componentization frustrated by the need to share too much type information across libraries? Perhaps you want the efficiency of strongly typed data storage, but it’s costly to update your database schema whenever your object model evolves and so you’d like to infer type schema at run time? Do you need to ship components that accept arbitrary user objects and process them in some type-intelligent way? Do you wish callers of your library could describe their types to you programmatically?

If you find yourself struggling to maintain strongly typed data structures, yet still maximize runtime flexibility, then you’ll likely want to consider reflection and how it can improve your software. In this column, I explore the System.Reflection namespace in the Microsoft® .NET Framework and how it can benefit your development experience. I walk through some simple examples and finally tackle the real-world scenario of serialization, in which I show how reflection and CodeDom can work together for efficient processing of runtime data.

Before diving into System.Reflection, I’d like to step back a bit and discuss programmatic reflection in general. To start, reflection can be defined as any functionality offered by a programming system that allows the programmer to inspect and manipulate code entities without knowing their identification or formal structure ahead of time. That’s a mouthful, so I’ll dissect it piece by piece.

First, what does reflection offer? What can you use it for? I prefer to organize typical reflection-centric tasks into two categories: inspection and manipulation. Inspection entails analyzing objects and types to gather structured information about their definition and behavior. Typically this is done with little or no prior knowledge about them, apart from some basic provisions. (For instance, in the .NET Framework, everything inherits from System.Object and an object-typed reference is your typical starting point for reflection.)

Manipulation uses the information gained through inspection to invoke code dynamically, create new instances of discovered types, or even restructure types and objects on the fly. It’s important to note that, for most systems, manipulating types and objects at run time will incur a performance penalty when compared to the equivalent operations done statically in the source code. This is a necessary trade-off given the dynamic nature of reflection, but there are many tricks and best practices available for optimizing reflection performance (see msdn.microsoft.com/msdnmag/issues/05/07/Reflection for more in-depth information).

Now, what’s the target of reflection? What does the programmer actually inspect or manipulate? In my definition of reflection, I used the novel term "code entities" to highlight the fact that, from a programmer’s perspective, reflection technology can sometimes blur the conventional distinction between objects and types. For instance, a typical reflection-centric task might be:

  1. Start with a handle to an object O and use reflection to acquire a handle to its associated definition, a type T.
  2. Inspect type T and acquire a handle to its method, M.
  3. Invoke method M on another object, O’ (also of type T).

Note that I’m traversing from an instance to its underlying type, from that type to a method, and then using a handle to that method to invoke it on a different instance, which is clearly not possible with conventional C# programming techniques in source code. I’ll revisit this scenario later with a concrete example, once I’ve explored System.Reflection in the .NET Framework.

Some programming languages offer reflection natively via their syntax whereas other platforms and frameworks (such as the .NET Framework) make it available as a system library. Regardless of how it’s exposed, the potential for using reflection technology in a given scenario is rather complex. A programming system’s ability to offer reflection depends on many factors: did the programmer express his concepts well with the features of the programming language? Does the compiler embed enough structural information (metadata) in the output so as to allow later interpretation? Is there a runtime subsystem or host interpreter available to consume that metadata? Does a platform library expose the results of this interpretation in a manner that’s useful for the programmer?

If you maintain a complex, object-oriented type system in your head, but express your code in simple C-style functions with no formal data structures, then it’s clearly impossible for your program to dynamically infer that the pointer in some variable v1 points to an object instance of some type T. Because, after all, type T is a concept in your head that is never captured in your explicit programming statements. But if you use a more flexible object-oriented language (such as C#) to express the abstract structure of your program and directly introduce the concept of type T, then the compiler will transform your ideas into a form that suitable logic could later understand, as might be provided by the common language runtime (CLR) or a dynamic language interpreter.

Is reflection exclusively a dynamic, runtime technology? Simply put, no it isn’t. There are many times throughout the development and execution cycle when reflection might be available and useful to the developer. Some programming languages are implemented by standalone compilers that transform high-level code directly into machine-consumable instructions. The output file consists solely of the translated input and there’s no support logic available at run time to take an opaque object and dynamically analyze its definition. This is exactly the scenario of many traditional C compilers. Since there’s little support logic available in the target executable, you can’t do much dynamic reflection, but the compiler will frequently offer static reflection—for example, the commonly used typeof operator allows the programmer to check type identity at compile time.

At the opposite end of the spectrum, there are interpreted programming languages that are always executed by a host process (scripting languages often fall into this category). Since the program’s full definition is available (as the input source code) and joined together with the full language implementation (as the interpreter itself), all the necessary technology is present to support self-analysis. Such dynamic languages frequently offer exhaustive reflection functionality, allowing you a rich set of tools for dynamically analyzing and manipulating your program.

The .NET Framework CLR and its hosted languages like C# fall somewhere in the middle. A compiler is used to transform source code into IL and metadata which, though lower level and perhaps less "logical" than the original source, still contain much of the abstract structural and type information. Once the program is launched and hosted by the CLR, the System.Reflection library of the base class library (BCL) can consume this information and return information about an object’s type, a type’s members, a member’s signature, and so on. Furthermore, it can support invocation as well, including late-bound invocation.
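To make the compile-time versus run-time distinction concrete in C# terms, here is a minimal sketch (the class and variable names are illustrative and not part of the column's download). The typeof operator names a type that is known when the code is compiled, whereas Object.GetType asks the CLR for the actual type of whatever object a reference points to at run time.

```csharp
using System;

class TypeofVersusGetType
{
    static void Main()
    {
        // The static type of 'o' is object, but the live instance is a string.
        object o = "hello";

        Type namedInSource = typeof(object);  // type named explicitly in the source
        Type discovered = o.GetType();        // type discovered from the instance

        Console.WriteLine(namedInSource.FullName);       // System.Object
        Console.WriteLine(discovered.FullName);          // System.String
        Console.WriteLine(discovered == typeof(string)); // True
    }
}
```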

Reflection in .NET

To take advantage of reflection when programming with the .NET Framework, you use the System.Reflection namespace. This namespace provides classes which encapsulate many runtime concepts—such as assemblies, modules, types, methods, constructors, fields, and properties. The table in Figure 1 shows how classes in System.Reflection correspond to their conceptual runtime counterparts.

Figure 1 System.Reflection Classes

Language Component    Corresponding .NET Class
Assembly              System.Reflection.Assembly
Module                System.Reflection.Module
Abstract Member       System.Reflection.MemberInfo (base class for everything below)
Type                  System.Type
Property              System.Reflection.PropertyInfo
Field                 System.Reflection.FieldInfo
Event                 System.Reflection.EventInfo
Abstract Method       System.Reflection.MethodBase (base class for everything below)
Method                System.Reflection.MethodInfo
Constructor           System.Reflection.ConstructorInfo

Though quite important, System.Reflection.Assembly and System.Reflection.Module are primarily useful for locating and loading new code into the runtime. In this column, I leave them behind and assume that all relevant code is already loaded.

For inspecting and manipulating loaded code, the typical pattern focuses on System.Type. Normally, you start by obtaining a System.Type instance for the runtime type of interest (via Object.GetType). Then you use the various methods of System.Type to explore the type’s definition and acquire instances of the other classes inside System.Reflection. For instance, if you’re interested in a particular method, you’ll want to acquire an instance of System.Reflection.MethodInfo for that method (possibly via Type.GetMethod). Likewise, if you’re interested in a particular field, you’ll want to acquire an instance of System.Reflection.FieldInfo for that field (possibly via Type.GetField).

Once all the necessary reflection instance objects are obtained, proceed along the path of inspection or manipulation, as appropriate. For inspection, you use the various descriptive properties on the reflection classes to gain the information you need (Is this a generic type? Is this an instance method?). For manipulation, you can dynamically invoke the execution of methods, create new objects by invoking constructors, and so on.
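As a rough illustration of the two paths, the sketch below first asks descriptive questions of a Type and a MethodInfo, and then creates an object by invoking a ConstructorInfo. The choice of List<int>, int.Parse, and System.Version is arbitrary; any framework types would serve.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

class InspectThenManipulate
{
    static void Main()
    {
        // Inspection: descriptive properties answer questions about a type or member.
        Type listType = typeof(List<int>);
        Console.WriteLine(listType.IsGenericType);   // True

        MethodInfo parse = typeof(int).GetMethod(
            "Parse", new Type[] { typeof(string) });
        Console.WriteLine(parse.IsStatic);           // True

        // Manipulation: locate a constructor and invoke it to create an instance.
        ConstructorInfo ctor = typeof(Version).GetConstructor(
            new Type[] { typeof(int), typeof(int) });
        object version = ctor.Invoke(new object[] { 2, 0 });
        Console.WriteLine(version);                  // 2.0
    }
}
```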

Inspecting Types and Members

Let’s jump into some code and explore the usage of basic reflection for inspection. I’ll focus on type analysis. Starting with an object, I’ll retrieve its type and then explore some interesting members (see Figure 2).

Figure 2 Retrieving Object Type and Members

Code

 
using System;
using System.Reflection;

// use Reflection to enumerate some basic properties of a type...

namespace Example1
{
    class MyClass
    {
        private int MyField = 0;
        public void MyMethod1() { return; }
        public int MyMethod2(int i) { return i; }
        public int MyProperty { get { return MyField; } }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Reflection Demo Example 1");

            MyClass mc = new MyClass();
            Type t = mc.GetType();
            Console.WriteLine("Type Name: {0}", t.Name);
    
            foreach(MethodInfo m in t.GetMethods())
                Console.WriteLine("Method Name: {0}", m.Name);

            foreach (PropertyInfo p in t.GetProperties())
                Console.WriteLine("Property Name: {0}", p.Name);
        }
    }
}

Output

 
Reflection Demo Example 1
Type Name: MyClass
Method Name: MyMethod1
Method Name: MyMethod2
Method Name: get_MyProperty
Method Name: GetType
Method Name: ToString
Method Name: Equals
Method Name: GetHashCode
Property Name: MyProperty 

The first thing to notice is that I got many more lines describing methods than you might expect upon first glance at the class definition. Where did these extra methods come from? Anyone well versed in the .NET Framework object hierarchy will recognize the methods inherited from the universal base class of Object itself. (In fact, I actually used Object.GetType to retrieve the type in the first place.) Additionally, you can see the getter function for the property. Now, what if you just wanted those functions that are explicitly defined on MyClass itself; in other words, how do you hide the inherited functions? Or perhaps you just want the explicitly defined instance functions?

A quick trip to MSDN® online reveals that you want to use the second overload of GetMethods, which accepts a BindingFlags parameter. By combining various values from the BindingFlags enumeration, you can instruct the function to only return the desired subset of methods. Replace the call to GetMethods with a call to:

GetMethods(BindingFlags.Instance | BindingFlags.DeclaredOnly | 
           BindingFlags.Public)

As a result, you get the following output (notice the absence of the functions inherited from System.Object; static methods, had MyClass declared any, would be filtered out as well):

Reflection Demo Example 1
Type Name: MyClass
Method Name: MyMethod1
Method Name: MyMethod2
Method Name: get_MyProperty
Property Name: MyProperty
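Other BindingFlags combinations select other subsets of members. The sketch below is a hypothetical variation (the Sample class is invented for illustration) that lists only the public static methods declared on a type, and then its private instance fields, which require BindingFlags.NonPublic.

```csharp
using System;
using System.Reflection;

class Sample
{
    private int hidden = 42;
    public static void StaticHelper() { }
    public int InstanceMethod() { return hidden; }
}

class BindingFlagsDemo
{
    static void Main()
    {
        Type t = typeof(Sample);

        // Public static methods declared on Sample itself.
        foreach (MethodInfo m in t.GetMethods(
            BindingFlags.Static | BindingFlags.Public | BindingFlags.DeclaredOnly))
            Console.WriteLine("Static method: {0}", m.Name);   // StaticHelper

        // Private instance fields are only returned when NonPublic is specified.
        foreach (FieldInfo f in t.GetFields(
            BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly))
            Console.WriteLine("Private field: {0}", f.Name);   // hidden
    }
}
```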

What if you know the names (fully qualified) of a type and member ahead of time? How do you shift gears from type enumeration to type retrieval? The example in Figure 3 shows how to use string literals that describe type information to retrieve their actual code counterparts via Type.GetType and Type.GetMethod. With the code in the first two examples, you have the basic components necessary to implement a primitive class browser. You can discover a runtime entity via name and then enumerate its various relevant properties.

Figure 3 Retrieve Type and MethodInfo via Strings

using System;
using System.Reflection;

// use Reflection to retrieve references to a type and method via name

namespace Example2
{
    class MyClass
    {
        public void MyMethod1() { return; }
        public void MyMethod2() { return; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Reflection Demo Example 2");

            // note that we must use the fully qualified name...
            Type t = Type.GetType("Example2.MyClass");
            MethodInfo m = t.GetMethod("MyMethod1");

            Console.WriteLine("Type Name: {0}", t.Name);
            Console.WriteLine("Method Name: {0}", m.Name);
        }
    }
}
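One detail worth keeping in mind with string-based lookup is that a misspelled name does not fail loudly. The single-argument Type.GetType returns null when it cannot resolve the name, and Type.GetMethod returns null when no matching public method exists. The following sketch (the LookupSketch namespace and its members are hypothetical) shows both cases, so that a real class browser can check for null before using the result.

```csharp
using System;
using System.Reflection;

namespace LookupSketch
{
    class MyClass
    {
        public void MyMethod1() { }
    }

    class Program
    {
        static void Main()
        {
            // An unresolvable type name yields null rather than an exception.
            Type missing = Type.GetType("LookupSketch.NoSuchClass");
            Console.WriteLine(missing == null);        // True

            // A method name that does not exist on the type also yields null.
            MethodInfo m = typeof(MyClass).GetMethod("NoSuchMethod");
            Console.WriteLine(m == null);              // True

            // A framework type resolves by its namespace-qualified name.
            Console.WriteLine(Type.GetType("System.String") == typeof(string)); // True
        }
    }
}
```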

Invoking Code Dynamically

So far, I’ve acquired handles to runtime objects (such as types and methods) purely for descriptive purposes, like outputting their names. But how can you do more? How can you actually invoke a method? Figure 4 shows how to acquire a MethodInfo for a member of a type and then use MethodInfo.Invoke to actually call the method dynamically.

Figure 4 Calling a Method Dynamically

using System;
using System.Reflection;

// use Reflection to retrieve a MethodInfo for an 
// instance method and invoke it upon many object instances

namespace Example3
{
    class MyClass
    {
        private int id = -1;

        public MyClass(int id) { this.id = id; }

        public void MyMethod2(object p)
        {
            Console.WriteLine(
                "MyMethod2 is being invoked on object with " +
                "id {0} with parameter {1}...", 
                    id.ToString(), p.ToString());
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Reflection Demo Example 3");

            MyClass mc1 = new MyClass(1);
            MyClass mc2 = new MyClass(2);
    
            Type t = mc1.GetType();
            MethodInfo method = t.GetMethod("MyMethod2");

            for(int i = 1; i <= 5; i++)
                method.Invoke(mc2, new object[]{i});
        }
    }
}

A few key points are in order for this example. First, you retrieve a System.Type instance from an instance of MyClass, mc1. Next, you retrieve a MethodInfo instance from that Type. Finally, when you invoke the MethodInfo, you bind it to a different instance of MyClass (mc2) by passing that instance as the first parameter of Invoke.

As mentioned previously, this example blurs the boundaries between type and object usage that you might expect from typical source code. Logically speaking, you retrieve a handle to a method and then invoke that method as if it belonged to a different object. This might be second nature to the programmer experienced with a functional programming language, but to the coder only familiar with C#, it might seem unintuitive to decouple object implementation from object instantiation.

Putting It Together

Now that I’ve explored the basics of inspection and invocation, I’ll put things together with a concrete example. Imagine that you want to ship a library with a static helper function that must process objects—but at design time, you don’t know anything about the types of these objects! It’s up to the caller of the function to instruct it on how to extract meaningful information from these objects. The function will accept a collection of objects and a string descriptor of a method. It will then iterate through the collection, calling the method on each object and aggregating the return values with some function (see Figure 5).

Figure 5 Extracting Information from Objects

using System;
using System.Collections.Generic;
using System.Reflection;

namespace Example4
{
    class Program
    {
        static void Main(string[] args)
        {
            // prepare some objects for our function to process
            object[] objs = new object[] {
                new IntReturner(1), new IntReturner(2), 
                new IntReturner(3), new SonOfIntReturner(4), 
                new SonOfIntReturner(5), new SonOfIntReturner(6),
            };

            Console.WriteLine(
                "Attempting to compute average, " + 
                "passing in array with {0} elements.", objs.Length);

            int average = ComputeAverage(objs, "GetInteger");

            Console.WriteLine("Found an average of {0}!", average);
        }

        public static int ComputeAverage( 
            IEnumerable<object> objs, string methodname)
        {
            int sum = 0, count = 0;

            Type firstType = null;
            MethodInfo firstMethod = null;

            foreach (object o in objs)
            {
                if (firstMethod == null)
                {
                    firstType = o.GetType();
                    firstMethod = firstType.GetMethod(methodname);
                }

                sum += (int)firstMethod.Invoke(o, null);
                count++;
            }

            // note that we use integer division here (not floating point)
            if (count == 0) return 0;
            return sum / count; 
        }
    }

    class IntReturner
    {
        protected int value = -1;
        public IntReturner(int i) { value = i; }
        public virtual int GetInteger()
        {
            Console.WriteLine(
                "GetInteger called on instance of IntReturner, " +
                "I’m returning {0}!", value);
            return value;
        }
    }

    class SonOfIntReturner : IntReturner
    {
        public SonOfIntReturner(int i) : base(i) { }
        public override int GetInteger()
        {
            Console.WriteLine(
                "GetInteger called on instance of SonOfIntReturner, " +
                "I’m returning {0}!", this.value);
            return value;
        }
    }

    class EnemyOfIntReturner
    {
        protected int value = -1;
        public EnemyOfIntReturner(int i) { value = i; }
        public virtual int GetInteger()
        {
            Console.WriteLine(
                "GetInteger called on instance of EnemyOfIntReturner, " +
                "I’m returning {0}!", value);
            return value;
        }
    }
}

For purposes of this example, I’ll document some constraints. First, the method described by the string parameter (and necessarily implemented by each object’s underlying type) will accept no parameters and will return an integer. The code will iterate through the collection of objects, calling the specified method and gradually computing the average of all the values. And finally, since this isn’t production code, I won’t worry about validating parameters or integer overflow when aggregating the sum.

As you look through the example code, observe that the protocol between the Main function and the static helper ComputeAverage does not depend on any type information apart from the universal base class of Object itself. In other words, you could completely alter the type and structure of the objects you’re passing around, but so long as you can always use a string to describe a method that returns an integer, your ComputeAverage function will work!

There’s a critical gotcha to be aware of that is related to MethodInfo (and reflection in general) hidden in this latest example. Note that, inside the foreach loop of ComputeAverage, the code only grabs a MethodInfo from the first object in the collection and then binds it for invocation on every subsequent object. As coded, this works great; it’s a simple example of MethodInfo caching. But there’s a fundamental limitation here. A MethodInfo instance can only be invoked on instances of the type that declares the method or of types derived from it. You see this in action because you pass in instances of both IntReturner (from which the MethodInfo is retrieved) and SonOfIntReturner (which inherits from IntReturner).

In the sample code, I’ve included a class called EnemyOfIntReturner that implements the same basic protocol as the other two classes, but does not share a common base type with them (other than Object). In other words, the interface is logically equivalent, but there’s no overlap in the type hierarchy. To explore the usage of MethodInfo in this scenario, try adding another object to the collection, an instance acquired with "new EnemyOfIntReturner(10)" and then run the example again. You’ll hit an exception explaining that the MethodInfo cannot be used to Invoke on the specified Object because it’s completely unrelated to the original type from which the MethodInfo was obtained (even though the method name and basic protocol are equivalent). To make your code production-quality, you must be prepared for this scenario.
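The failure can be reproduced in isolation. In this minimal sketch, Returner and Lookalike are hypothetical stand-ins for IntReturner and EnemyOfIntReturner: both expose a parameterless GetInteger method, yet a MethodInfo obtained from one cannot be invoked on an instance of the other.

```csharp
using System;
using System.Reflection;

class Returner { public int GetInteger() { return 1; } }
class Lookalike { public int GetInteger() { return 2; } }  // same shape, unrelated type

class MismatchDemo
{
    static void Main()
    {
        MethodInfo fromReturner = typeof(Returner).GetMethod("GetInteger");

        // Works: the target is an instance of the declaring type.
        Console.WriteLine(fromReturner.Invoke(new Returner(), null));   // 1

        try
        {
            // Fails: Lookalike neither declares nor inherits Returner.GetInteger.
            fromReturner.Invoke(new Lookalike(), null);
        }
        catch (TargetException ex)
        {
            Console.WriteLine("TargetException: {0}", ex.Message);
        }
    }
}
```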

A possible solution might include analyzing the type of all incoming objects on your own and maintaining an interpretation of their shared type hierarchy (if any). When the next object is of a type divergent from any known type hierarchy, you need to acquire and store a new MethodInfo. Another solution might be to catch the TargetException and simply reacquire a MethodInfo instance. Both of the solutions I’ve mentioned here have pros and cons. Joel Pobar wrote a great article for this magazine in the July 2005 issue on MethodInfo caching and Reflection performance which I highly recommend.
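One simple variation on the first idea is sketched below. Instead of tracking type hierarchies, it keys a cache on each object's exact runtime type, which costs one GetMethod call per distinct type but never binds a MethodInfo to an unrelated object. The Averager class, the Returner and Lookalike types, and the validation rules are assumptions made for illustration; they are not taken from the column's download.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

class Returner { public virtual int GetInteger() { return 2; } }
class Lookalike { public int GetInteger() { return 4; } }   // unrelated to Returner

static class Averager
{
    public static int ComputeAverage(IEnumerable<object> objs, string methodName)
    {
        // One MethodInfo per distinct runtime type encountered.
        Dictionary<Type, MethodInfo> cache = new Dictionary<Type, MethodInfo>();
        int sum = 0, count = 0;

        foreach (object o in objs)
        {
            Type t = o.GetType();
            MethodInfo method;
            if (!cache.TryGetValue(t, out method))
            {
                // Look for a public, parameterless method that returns int.
                method = t.GetMethod(methodName, Type.EmptyTypes);
                if (method == null || method.ReturnType != typeof(int))
                    throw new ArgumentException(
                        t.FullName + " has no suitable method " + methodName);
                cache[t] = method;
            }

            sum += (int)method.Invoke(o, null);
            count++;
        }

        // Integer division, as in Figure 5.
        return count == 0 ? 0 : sum / count;
    }
}

class Program
{
    static void Main()
    {
        object[] objs = { new Returner(), new Lookalike() };
        Console.WriteLine(Averager.ComputeAverage(objs, "GetInteger"));   // 3
    }
}
```

Keying on the exact runtime type means a base type and its derived type each get their own cache entry, which trades a little redundancy for much simpler logic.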

Hopefully, this example has demonstrated that adding some reflection to your application or framework can add a lot of flexibility for later customization or extensibility. Admittedly, using reflection can be somewhat cumbersome when compared to the equivalent logic in your native programming language. If you feel that adding reflection-based late binding to your code is too much grief for you or your customers (because, after all, they must describe their types and code to your framework somehow), it might be possible to strike a balance with just the right amount of flexibility.

Efficient Type Processing for Serialization

Now that we’ve walked through the basics of .NET Reflection with a few examples, let’s tackle a real-world scenario. If your software interacts with other systems via Web services or another out-of-process remoting technology, then it’s likely you’ve encountered serialization. Serialization is essentially the translation of a living, in-memory object into a data format suitable for transmission over the wire or storage on disk.

The System.Xml.Serialization namespace in the .NET Framework provides a powerful serialization engine with XmlSerializer, which can consume arbitrary managed objects and convert them into XML—with the option of translating the XML data back into typed object instances at a later date, a process called deserialization. The XmlSerializer class is a powerful, enterprise-ready piece of software and should be your first choice if you’re faced with serialization in your project. But for educational purposes, let’s explore how serialization (or another, similar instance of runtime-type processing) might be accomplished.

The scenario: you’re shipping a framework that needs to consume object instances of arbitrary user types and convert them into some intelligent data format. For instance, suppose you have an in-memory object of type Address like this:

(pseudocode)
class Address
{
    AddressID id;
    String Street, City;
    StateType State;
    ZipCodeType ZipCode;
}

How do you generate an appropriate data representation for later consumption? Perhaps just a simple textual rendering will do:

Address: 123
    Street: 1 Microsoft Way
    City: Redmond
    State: WA
    Zip: 98052

If you have full knowledge of the formal data types you’ll need to translate ahead of time (such as when you’re writing the code), then it becomes very easy:

foreach(Address a in AddressList)
{
    Console.WriteLine("Address:{0}", a.ID);
    Console.WriteLine("\tStreet:{0}", a.Street);
    ... // and so on
}

However, things get interesting when you have no prior knowledge of the types you’ll need to interact with at run time. How do you author general framework code like this?

MyFramework.TranslateObject(object input, MyOutputWriter output)

First, you’ll need to decide which type members are of interest for serialization. You could capture only members of certain types, such as primitive system types, or you could provide a mechanism by which the type author describes which members need to be serialized (possibly by using custom attributes as markup on type members).
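The attribute-based option might look something like the following sketch. The SerializeMeAttribute marker and the Customer class are hypothetical names invented here; the framework side simply asks each PropertyInfo whether the marker is defined on it.

```csharp
using System;
using System.Reflection;

// Hypothetical marker attribute that a type author applies to opt a property in.
[AttributeUsage(AttributeTargets.Property)]
class SerializeMeAttribute : Attribute { }

class Customer
{
    [SerializeMe] public string Name { get; set; }
    [SerializeMe] public int Age { get; set; }
    public string Password { get; set; }    // unmarked, so it is skipped
}

class AttributeFilterDemo
{
    static void Main()
    {
        foreach (PropertyInfo p in typeof(Customer).GetProperties())
        {
            // IsDefined checks for the marker without instantiating the attribute.
            if (p.IsDefined(typeof(SerializeMeAttribute), false))
                Console.WriteLine("Serialize: {0}", p.Name);
        }
        // Lists Name and Age; Password is not listed.
    }
}
```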

Once you’ve documented which data structure members will be translated, you need to code logic that can enumerate and retrieve them from an incoming object. Reflection does the heavy lifting here by allowing you to query both data structure and data value.

For sake of simplicity, let’s design a lightweight translation engine that takes an object, grabs the values of all its public properties, converts them to strings by directly calling ToString, and then proceeds to serialize these values. The algorithm will look something like this, for a given object named input:

  1. Call input.GetType to retrieve a System.Type instance describing the input’s underlying structure.
  2. Use Type.GetProperties with an appropriate BindingFlags parameter to retrieve the public properties as PropertyInfo instances.
  3. Use PropertyInfo.Name and PropertyInfo.GetValue to retrieve the properties as key-value pairs.
  4. Call Object.ToString on each value to convert it (primitively) to string format.
  5. Package the object type’s name along with the collection of property names and string values into the correct serialization format.
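The five steps above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the NaiveSerializer name, the simplified Address class, and the indented text format are invented here, and indexers and write-only properties are skipped.

```csharp
using System;
using System.Reflection;
using System.Text;

class Address
{
    public string Street { get; set; }
    public string City { get; set; }
}

static class NaiveSerializer
{
    public static string Translate(object input)
    {
        Type t = input.GetType();                                   // step 1
        PropertyInfo[] props = t.GetProperties(
            BindingFlags.Public | BindingFlags.Instance);           // step 2

        StringBuilder sb = new StringBuilder();
        sb.Append(t.Name).Append(":\n");                            // step 5 (header)

        foreach (PropertyInfo p in props)
        {
            // Skip write-only properties and indexers, which need arguments.
            if (!p.CanRead || p.GetIndexParameters().Length > 0)
                continue;

            object value = p.GetValue(input, null);                 // step 3
            string text = value == null ? "" : value.ToString();    // step 4
            sb.Append("    ").Append(p.Name).Append(": ")
              .Append(text).Append('\n');                           // step 5 (body)
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        Address a = new Address();
        a.Street = "1 Microsoft Way";
        a.City = "Redmond";
        Console.Write(NaiveSerializer.Translate(a));
    }
}
```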

This algorithm simplifies matters considerably, but captures the essence of what is necessary to take a runtime data structure and convert it into self-describing data. However, there’s a catch: performance. As mentioned earlier, reflection is costly for type processing and value retrieval. As written, this algorithm performs the full type analysis for every instance of the type that’s supplied.

What if you could somehow capture or persist your understanding of the type’s structure, so that later on you could trivially retrieve it and efficiently process new instances of that type—in other words, skip ahead to Step #3 in the example algorithm? The good news is that this is entirely possible using functionality from the .NET Framework. Once you understand the type’s data structure, you can use CodeDom to generate code on the fly that is bound to that data structure. You’ll generate a helper assembly containing a helper class and method which references the incoming type and accesses its properties directly (like any other reference in managed code) and thus the performance hit for type inspection is paid just once.

Now I’ll amend the algorithm. For a new type:

  1. Acquire a System.Type instance corresponding to that type.
  2. Use the various accessors of System.Type to retrieve the schema (or at least the subset of the schema that is of interest for serialization) such as property names, field names, and so on.
  3. Use the schema information to generate a helper assembly (via CodeDom), which is linked against the new type and processes instances efficiently.
  4. Use code in the helper assembly to extract the instance’s data.
  5. Serialize the data as appropriate.

For all incoming instances of a given type, you can skip ahead to Step #4 for a tremendous performance boost when compared to explicitly inspecting for every instance.
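To show the pay-once shape of this algorithm in a self-contained form, the sketch below substitutes a different technique for CodeDom: compiled expression trees from System.Linq.Expressions. It is not the CodeDom approach described in this column, and the CompiledAccessors class and simplified Address type are invented for illustration. The idea is the same, though: inspect the type once, produce code bound to its properties, and reuse that code for every later instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
}

public static class CompiledAccessors
{
    // Steps 1 through 3: inspect the type once and compile one getter per property.
    public static KeyValuePair<string, Func<object, object>>[] Build(Type t)
    {
        List<KeyValuePair<string, Func<object, object>>> result =
            new List<KeyValuePair<string, Func<object, object>>>();

        foreach (PropertyInfo p in t.GetProperties(
            BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip indexers and properties without a public getter.
            if (p.GetIndexParameters().Length > 0 || p.GetGetMethod() == null)
                continue;

            // Builds the lambda: (object input) => (object)((T)input).Property
            ParameterExpression input =
                Expression.Parameter(typeof(object), "input");
            Expression body = Expression.Convert(
                Expression.Property(Expression.Convert(input, t), p),
                typeof(object));
            Func<object, object> getter =
                Expression.Lambda<Func<object, object>>(body, input).Compile();

            result.Add(new KeyValuePair<string, Func<object, object>>(
                p.Name, getter));
        }
        return result.ToArray();
    }
}

class Program
{
    static void Main()
    {
        // Paid once per type.
        KeyValuePair<string, Func<object, object>>[] accessors =
            CompiledAccessors.Build(typeof(Address));

        // Step 4: each instance is processed with the compiled getters only.
        Address a = new Address { Street = "1 Microsoft Way", City = "Redmond" };
        foreach (KeyValuePair<string, Func<object, object>> pair in accessors)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value(a));
    }
}
```

A real serializer would cache the returned array per Type so that later calls for the same type skip the Build step entirely.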

I’ve developed a basic serialization library called SimpleSerialization that implements this algorithm with reflection and CodeDom (it’s available as part of the download for this column). The main component is a class called SimpleSerializer, which the user constructs with a System.Type instance. Within the constructor, the new instance of SimpleSerializer analyzes the provided Type and generates a temporary assembly with a helper class. This helper class is tightly bound to the provided data type and can process instances just as if you’d written the code yourself with full knowledge of the type ahead of time.

The SimpleSerializer class has the following layout:

class SimpleSerializer
{
    public SimpleSerializer(Type dataType);

    public void Serialize(object input, SimpleDataWriter writer);
}

Amazingly simple! The constructor does most of the heavy lifting: it uses reflection to analyze the type’s structure and then generates a helper assembly with CodeDom. The SimpleDataWriter class is just a data sink for illustrating the common serialization pattern.

For serializing instances of a simple Address class, the following pseudocode gets the job done:

SimpleSerializer mySerializer = new SimpleSerializer(typeof(Address));
SimpleDataWriter writer = new SimpleDataWriter();
mySerializer.Serialize(addressInstance, writer);

Finishing Up

I highly encourage you to play around with the sample code, especially the SimpleSerialization library. I’ve added comments throughout the interesting parts of SimpleSerializer and it’s my hope that it will prove informative. Of course, if you’re faced with the need to do serious serialization in production code, then definitely rely on the technologies provided in the .NET Framework (such as XmlSerializer). But if you find yourself needing to consume arbitrary types at run time and process them efficiently, I hope you’ll be able to adopt my SimpleSerialization library for your own scenario.

I’d like to thank CLR developers Weitao Su (Reflection) and Pete Sheill (CodeDom) for their guidance and feedback.

Send your questions and comments to clrinout@microsoft.com.

Mike Repass is a Program Manager on the .NET Framework CLR team. He works on Reflection, CodeDom, and various parts of the execution engine. You can reach him on his blog at blogs.msdn.com/mrepass.