Reflection vs. Metadata

Here are some old notes I had about Reflection vs. the raw IMetadata Import interfaces. They’re from a while ago (before CLR 4.0 was shipped!), but still relevant. Better to share late than never!

Quick reminder on the two APIs I’m comparing here:

  • Reflection is the managed API (System.Type and friends) for reading metadata. The CLR’s implementation is built on top of the CLR loader, and so it’s geared towards a “live” view of the metadata.  This is what everybody uses because it’s just so easy. The C# typeof() keyword gives you a System.Type and you’re already into the world of reflection.
  • IMetaDataImport is an unmanaged COM-classic API which is much lower level. ILDasm is built on IMetadataImport.

The executive summary is that the IMetadata APIs are a file format decoder and returns raw information. The Reflection APIs are a much higher abstraction level that include the metadata and other information in the PE file, fusion, CLR loader, and present a high-level type-system object model.

This difference means that while Reflection and Metadata are conceptually similar, there are things in Reflection that aren’t in the metadata and things in the metadata that aren’t exposed in reflection

This is not an exhaustive list.

Differences because Reflection can access the loader

Reflection explicitly does eager assembly loading

The only input to IMetaDataImport is the actual bits in a file. It is a purely static API.

In contrast, reflection is a veneer over the CLR loader and thus can use the CLR Loader, fusion, assembly resolution, and policy from current CLR bindings as input.

In my opinion, that’s the most fundamental difference between them, and the root cause of many other differences.

This means that reflection can use assembly resolution, and that causes many differences with raw IMetadataImport:

  1. auto resolving a TypeRef and TypeSpecs to a TypeDef. You can't even retrieve the original typeRef/TypeSpec tokens from the reflection APIs (the metadata tokens it gives back are for the TypeDefs). Same for MethodDef vs. MemberRef.
  2. Type.GetInterfaces() - it returns interfaces from the base type.
  3. Type.GetGenericArguments() - as noted here already.
  4. random APIs like: Type.IsClass, IsEnum, IsValueType - these check the base class, and thus force resolution.
  5. determining if a MemberRef token is a constructor or a method because they have different derived classes (see below).
  6. representing illegal types ("Foo&&"). Reflection is built on the CLR Loader, which eagerly fails when loading an illegal type. (See below.)
  7. Assembly.GetType(string name) will automatically follow TypeForwarders, which will cause assembly loading.

The practical consequence is that a tool like ILDasm can use IMetaDataImport to inspect a single assembly (eg, Winforms.dll) without needing to do assembly resolution. Whereas a reflection-based tool would need to resolve the assembly references.

Different inputs

While Reflection is mostly pure and has significant overlap with the metadata, there is no barrier to prevent runtime input sources from leaking through the system and popping up in the API.  Reflection exposes things not in the PE-file, such as:

  1. additional interfaces injected onto arrays by the CLR loader  (see here).
  2. Type.get_Guid - if the guid is not represented in the metadata via a Guid attribute, reflection gets the guid via a private algorithm buried within the CLR. 

Generics + Type variables + GetGenericArguments()

In Reflection, calling GetGenericArguments() on a open generic type returns the System.Type objects for type-variables. Whereas in metadata, this would be illegal. You could at best get the generic argument definitions from the type definition.

In reflection, if you pass in Type variables to Type.MakeGenericType(), you can get back a type-def. Whereas in metadata, you'd still have a generic type. Consider the following snippet:

 var t = typeof(Q2<>); // some generic type Q2<T>
 var a1 = t.GetGenericArguments(); // {"T"}
 var t2 = t.MakeGenericType(a1); 
 Debug.Assert(t.Equals(t2)); 
 Debug.Assert(t2.Equals(t)); 

In other words, metadata has 2 distinct concepts:

  1. The generic type arguments in the type definition (see IMDI2::EnumGenericParams)
  2. The generic type arguments from a type instantiation (as retrieved from a signature blob, see CorElementType.GenericInsantiation=0x15).

In reflection, these 2 concepts are unified together under the single Type.GetGenericArguments() API. Answering #1 requires type resolution whereas #2 can be done on a type-ref. This means that in reflection, you can't check for generic arguments without potentially doing resolution.

PE-files

Reflection exposes interesting data in the PE-file, regardless of whether it's stored in the metadata or rest of the PE-file.

Metadata is just the metadata blob within the PE file. It's somewhat arbitrary what's in metadata vs. not. Metadata can have RVAs to point to auxiliary information, but the metadata importer itself can't resolve those RVAs.

Interesting data in the PE but outside of the metadata:

  1. the entry point token is in CorHeaders outside of the metadata.
  2. Method bodies (including their exception information) are outside the metadata. 
  3. RVA-based fields (used for initializing constant arrays)
  4. embedded resources

File management

It is possible to set policy on AppDomain that will require Assembly.Load to make a shadow copy of an assembly before it's loaded and open that copy instead of assembly in the original location. This policy allows user to specify where shadow copies should be created.

Failure points

IMetadataImport only depends on the bits in the file, so it has few failure points after opening. In contrast, Reflection has many dependencies, each of which can fail. Furthermore, there is no clear mapping between a reflection API and the services it depends on, so many reflection APIs can randomly fail at random points.

Also, IMetadataImport allows representing invalid types, whereas Reflection will eagerly fail. For example, it is illegal to have a by-ref to a by-ref, (eg, "Foo&&"). Such a type can still be encoded in the metadata file format via ildasm, and IMetaDataImport will load it and provide the signature bytes. However, Reflection will eagerly fail importing because the CLR Loader won't load the type.

Detecting failures requires eagerly resolving types, so there is a tension between making Reflection a deferred API vs. keeping the eager-failure semantics.

COM-Interop

Reflection represents types at runtime, like COM-interop objects. Whereas metadata only provides a static typing.

Loader-added interfaces on arrays

In .Net 2.0, generic interfaces are added for arrays at runtime. So this expression "typeof(int[]).GetInterfaces()" returns a different result on .NET 2.0 vs. .NET 1.1; even if it's an identical binary. I mentioned this example in more detail here.

Differences in Object model

Reflection deliberately tries to be a higher level friendly managed API, and that leads to a bunch of differences

MemberInfo.ReflectedType property

Reflection has a ReflectedType property which is set based on how an item is queried. So the same item, queried from different sources, will have a different ReflectedType property, and thus compare differently. This property is entirely a fabrication of the reflection object model and does not correspond to the PE file or metadata.

Different object models for Type

The raw metadata format is very precise and represents types in a variety of distinct ways:

  1. TypeDef
  2. TypeRef
  3. TypeSpec, Signature blobs
  4. builtins ("I4")
  5. arrays,
  6. modifiers (pointer, byref)
  7. Type variables (!0, !!0)

These are all unique separate entities with distinct properties which in the metadata model, conceptually do not share a base class. In contrast, Reflection unifies these all into a single common System.Type object. So in reflection, you can't find out if your class’s basetype is a TypeDef or a TypeRef.

Psuedo-custom attributes

Reflection exposes certain random pieces of metadata as faked-up custom attribute instead of giving it a dedicated API the way IMetaDataImport does.

These "pseudo custom attributes" (PCAs) show up in the list of regular custom attributes with no special distinction. This means that requesting the custom attributes in reflection may return custom attributes not specified in the metadata. Since different CLR implementations add different PCAs, this list could change depending on the runtime you bind against.

Some examples are the Serialization and TypeForwardedTo attributes.

Custom Attributes

To get an attribute name in reflection, you must do CustomAttributeData.Constructor.DeclaringType.FullName.

This is a cumbersome route to get to the custom attribute name because it requires creating several intermediate objects (a ConstructorInfo and a Type), which may also require additional resolution. The raw IMetadataImport interfaces are much more streamlined.

ConstructorInfo vs. MethodInfo

Metadata exposes both Constructors and MethodInfos as tokens of the same type (mdMethodDef). Reflection exposes them as separate classes which both derive from MemberBase. This means that in order to create a reflection object over a MemberDef token, you must do some additional metadata resolution to determine whether to allocate a MemberInfo or a ConstructorInfo derived class. This has to be determine when you first allocate the reflection object and can't be deferred, so it forces eager resolution again.

Type Equivalence and Assignability

Reflection exposes a specific policy for type equivalence, which it inherits from the CLR loader. Metadata just exposes the raw properties on a type and requires the caller to determine if types are equivalent.

For example, Reflection has the Type.IsAssignableFrom API, which may invoke the CLR Loader and Fusion, as well as CLR-host version specific Type Unification policies (such as no-PIA support) to determine if types are considered equal. The CLR does not fully specify the behavior of Type.IsAssignableFrom,.

Case sensitivity matching

Metadata string APIs are case sensitive. Reflection string APIs often take a "ignoreCase" flag to facilitate usage with case insensitive languages, like VB.