Visual C++ 2005 Under the Hood

Article
02/03/2012

Kenny Kerr

April 2006

Applies To:
   Visual Studio 2005
   Visual C++ 2005
   Common Intermediate Language (CIL)

Summary: Understanding the relationship between the C++ you write and the CIL the compiler generates goes a long way towards helping you write faster and more correct code. Kenny Kerr shows the way. (15 printed pages)

Introduction
Behind Hello World
Statements and Expressions
The Type System
Conclusion

Introduction

The Common Intermediate Language (CIL) provides a great middle ground between high-level languages like C++ and the machine languages understood by computers. It is just high-level enough to abstract away the complexities of modern processors, while being low-level enough to provide the programmer with a simple and accurate view of the compiler's interpretation of the code and, ultimately, the form in which the common language runtime (CLR) will consume it. At that point the JIT compiler takes over and you're back to processor-specific instructions.

Understanding the relationship between the C++ you write and the CIL the compiler generates goes a long way towards helping you write faster and more accurate code. Although premature optimization is often referred to as evil, there is frequently more than one way to write a particular expression; having a natural instinct for what will run faster or consume fewer resources is goodness.

In this article we explore some commonly used C++ language constructs and how the compiler translates them into instructions for consumption by the CLR. One of the main barriers to understanding your compiler output is the amount of "noise" that is generated by it. Compilers don't do a good job of generating human readable code. After all, their priority is generating the fastest possible machine code. Human comprehension is not a priority at all, but it can certainly hurt to stare at endless lines of cryptic instructions with no obvious scope or purpose. To make it easier to comprehend, I have written the CIL for the various examples in this article by hand in an attempt to simplify the examples and add clarity.

If you'd like to play around with the CIL code examples, simply copy the code into a text file and compile it with the CIL assembler provided by the .NET Framework as follows:

ilasm.exe program.cil /debug

The /debug option instructs the assembler to generate a program database (PDB) file so that you can step through the code using the CLR debugger (dbgclr.exe).

This article assumes you have already been introduced to the new language features for supporting garbage collection and the .NET Framework in C++. For an introduction to the C++/CLI language design, check out my article titled C++/CLI: The Most Powerful Language for .NET Framework Programming.

One final note about terminology before we begin: CLI object references are referred to as handles in C++. CLI reference types are allocated on the managed heap and the handles that refer to those objects are on the stack. To avoid confusion when discussing stack usage in this article, I will simply speak in terms of objects and values being pushed onto the stack and popped off the stack. Just keep in mind that only values and handles physically reside on the stack whereas objects, in other words instances of CLI reference types, physically reside on the managed heap.

Behind Hello World

Before we dig into specific language constructs, let's take a look at a simple Hello World application to get a quick refresher on the basic structure and syntax of CIL code. Consider the following C++ program:

using namespace System;

int main()
{
    Console::WriteLine("Hello World");
}

Ah, Hello World at its finest. When you run it, the string "Hello World" is written to the standard output stream, and if there's a console window attached it will faithfully be displayed to the user. Now let's take a look at the CIL produced by the compiler.

.assembly Kerr.Sample {}
.assembly extern mscorlib {}

.method static int32 main()
{
    .entrypoint
    .maxstack 1
    
    ldstr "Hello World"
    call void [mscorlib]System.Console::WriteLine(string)
    
    ldc.i4.0
    ret
}

The first .assembly directive provides the name of the program's assembly. The braces following the name can include things like the assembly version and other assembly metadata. The .assembly extern directive indicates that the types and resources in the mscorlib assembly are required by this program. Again, the braces can include additional information to uniquely identify the assembly being referenced.

The main method is called a method definition since both the signature and the body of the method are provided. In contrast, when a method signature is provided without a body it is referred to as a method declaration. Method declarations are typically used as call targets when a method is being called, whereas a method definition provides the actual implementation for a method.

A method definition begins with the .method directive. Even though main is clearly not a member function, it must be marked static. Most things that you would expect a programming language to infer implicitly are usually explicit in CIL. Keep in mind that CIL was designed to allow compilers, not humans, to easily generate it. As you would expect, the method has an empty argument list and returns an int32 value, where int32 is a 32-bit signed integer and a synonym for the System::Int32 value type.

The .entrypoint directive indicates that this method is where execution begins for the application. Only one method in an assembly can have this directive.

The final directive in this example is .maxstack, which indicates how many stack slots the method expects to use. This really gives you a sense for the low-level nature of CIL. Even the amount of stack space reserved needs to be declared up front. A high-level language would not gain even a handful of followers if it demanded such bookkeeping from its users. We'll talk more about the stack in the next section.

And finally we get to the guts of Hello World. To write "Hello World" to the console we first push a string onto the stack using the ldstr instruction. The call instruction is used to call the Console class's WriteLine method. Notice that the declaration of the WriteLine method contains everything the runtime would need to uniquely identify it. The exact WriteLine overload must be specified to match the arguments already on the stack, otherwise the program will compile but the runtime will reject it. Fortunately, you don't have to worry about the C++ compiler calling the wrong overload for the arguments you provide.

Finally, main returns zero by pushing the value onto the stack using the ldc instruction. This instruction indicates a value of zero should be pushed onto the stack as a four byte integer. The ret instruction returns control to the caller, in this case the runtime, which can then pop the value off the stack and set the value as the exit code for the process.

Statements and Expressions

In this section we will explore a slightly less trivial snippet of C++ code with the aim of getting a feel for how the compiler makes use of local variables, the stack, and branching to implement the rich syntax provided by C++. The following C++ function performs a simple comparison of two arrays for the purpose of determining equivalence.

bool Equivalent(array<Byte>^ lhs,
                array<Byte>^ rhs)
{
    bool equivalent = true;

    if (lhs != rhs)
    {
        if (nullptr == lhs || 
            nullptr == rhs ||
            lhs->Length != rhs->Length)
        {
            equivalent = false;
        }
        else
        {
            for (int index = 0; equivalent && index != lhs->Length; ++index)
            {
                equivalent = lhs[index] == rhs[index];
            }
        }
    }

    return equivalent;
}

It is considered good programming practice to declare variables as locally as possible. Similarly, it is a good idea to initialize all variables at the point of declaration. The first piece of advice limits the amount of state that needs to be managed and the second eradicates the unpredictability that comes with the accidental use of uninitialized state. CIL is not a good language choice if you're concerned about good programming practices. Although the runtime can initialize all variables for you, you need to actually declare all the variables you will be using in the method up front.

There are two local variables in this function. Both equivalent and index are initialized at declaration time. The equivalent variable is used throughout the function and is declared before any other statements. The index variable, on the other hand, is only used in the for statement and is therefore declared in its initialization expression. The function is also written in such a way that the for loop is only used to compare the bytes as a last resort. Being careful of the scope and state of variables is a big part of defensive programming and is highly encouraged.

Now let's take a look at how this might look after the compiler is done with it. I provide the complete CIL source code for the Equivalent method here for completeness. The remainder of this section walks through the code one section at a time.

.method public static bool Equivalent(unsigned int8[] lhs,
                                      unsigned int8[] rhs)
{
    .maxstack 3
    
    .locals (bool equivalent,
             int32 index)
    
    ldc.i4.1 // true
    stloc.s equivalent
    
    // Branch to _RETURN if lhs is equal to rhs
    ldarg.s lhs
    ldarg.s rhs
    beq.s _RETURN
    
    // Branch to _RETURN_FALSE if lhs is null
    ldnull
    ldarg.s lhs
    beq.s _RETURN_FALSE
    
    // Branch to _RETURN_FALSE if rhs is null
    ldnull
    ldarg.s rhs
    beq.s _RETURN_FALSE
    
    // Branch to _RETURN_FALSE if the lhs length is not equal to the rhs length
    ldarg.s lhs
    ldlen
    ldarg.s rhs
    ldlen
    bne.un.s _RETURN_FALSE
    
    br.s _CONDITION
    
_LOOP:
    
    // Increment the index variable
    ldc.i4.1
    ldloc.s index
    add
    stloc.s index
    
_CONDITION:
    
    ldloc.s index
    ldarg.s lhs
    ldlen
    beq.s _RETURN
    
    // Compare array elements
    ldarg.s lhs
    ldloc.s index
    ldelem.u1
    ldarg.s rhs
    ldloc.s index
    ldelem.u1
    bne.un.s _RETURN_FALSE
    
    br.s _LOOP
    
_RETURN_FALSE:
    
    ldc.i4.0 // false
    stloc.s equivalent
    
_RETURN:
    
    ldloc.s equivalent
    ret
}

After declaring the number of stack slots required, the .locals directive is used to define the local variables used by the method.

.locals (bool equivalent,
         int32 index)
    
ldc.i4.1 // true
stloc.s equivalent

The runtime will initialize the variables with false and zero, respectively. Since equivalent is initialized to true in the C++ function, the ldc instruction is used to push 1 onto the stack and the stloc instruction pops the value off the stack and stores it in the equivalent variable.

The first if statement in the C++ function checks the handles for inequality. After all, if the array handles refer to the same object there is no point wasting processor cycles checking for equivalence. If this is the case, the method should simply return the value stored in the equivalent variable (currently true). The following instructions implement this logic.

ldarg.s lhs
ldarg.s rhs
beq.s _RETURN

The ldarg instruction is used to push the lhs and rhs arguments onto the stack. The beq instruction pops both handles off the stack and compares them for equality. If they are indeed equal it transfers control, otherwise known as branching, to the instruction following the _RETURN label. At that point the code merely pushes the value of the equivalent variable onto the stack and returns. If beq determines that the handles are not equal, control passes to the next instruction.

At this point, the method's second if statement does a few more checks to weed out arrays that would obviously not be equivalent. Since we've already determined that the arrays are not equal, both in the sense that they're not sharing an address in memory and that they're not both null, if either one is actually null or if they differ in length then we can again skip any further comparison and this time return false. Here are the relevant instructions.

ldnull
ldarg.s lhs
beq.s _RETURN_FALSE
    
ldnull
ldarg.s rhs
beq.s _RETURN_FALSE
    
ldarg.s lhs
ldlen
ldarg.s rhs
ldlen
bne.un.s _RETURN_FALSE

Each group of instructions corresponds to one of the expressions separated by the C++ logical-OR operator (||). These instructions clearly illustrate the short-circuit evaluation performed by C++: if the result of the left operand to the operator is sufficient to determine the outcome of the operation, the right operand is not evaluated.

The first two checks compare the arrays in turn to null by pushing null onto the stack using the ldnull instruction and pushing the respective array onto the stack using the ldarg instruction. The beq instruction pops the two values off the stack and, if found equal, transfers control to the instruction following the _RETURN_FALSE label. At this point the equivalent variable is set to false and the method ultimately returns this value to the caller.

If, on the other hand, neither array is null, a final check is done to ensure that they have equal lengths. Determining the length of an array involves pushing the array onto the stack and using the ldlen instruction, which pops the array off the stack and pushes a value representing the number of elements in the array onto the stack. This is done for both arrays and the results are compared using the bne.un instruction, which transfers control to the instruction following the _RETURN_FALSE label if it determines that the values are not equal.
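If you'd like to convince yourself of the short-circuit behavior, here is a minimal sketch in plain native C++ rather than C++/CLI. The names Operand, AnyTrue, AnyTrueLowered, CheckShortCircuit, and the evaluations counter are hypothetical and exist only for illustration; this is not compiler output. The second function spells out the one-branch-per-operand shape that the CIL above uses.

```cpp
#include <cassert>

// Number of operands evaluated so far; only here to make short-circuiting visible.
int evaluations = 0;

// Wraps an operand so that each evaluation is counted.
bool Operand(bool value)
{
    ++evaluations;
    return value;
}

// Source form: three operands joined by the logical-OR operator.
bool AnyTrue(bool a, bool b, bool c)
{
    return Operand(a) || Operand(b) || Operand(c);
}

// Lowered form: one conditional branch per operand, in the spirit of the
// beq.s, beq.s, bne.un.s sequence.
bool AnyTrueLowered(bool a, bool b, bool c)
{
    if (Operand(a)) goto taken;
    if (Operand(b)) goto taken;
    if (Operand(c)) goto taken;
    return false;

taken:
    return true;
}

// Usage: both forms stop at the first operand that is true.
void CheckShortCircuit()
{
    evaluations = 0;
    const bool source = AnyTrue(false, true, false);
    assert(source && evaluations == 2);   // the third operand was never evaluated

    evaluations = 0;
    const bool lowered = AnyTrueLowered(true, false, false);
    assert(lowered && evaluations == 1);  // only the first operand was evaluated

    (void)source; (void)lowered;          // silence warnings when NDEBUG is defined
}
```

In both forms, an operand to the right of the first true operand is never evaluated, which is exactly why a null handle never reaches the ldlen instruction.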

With all the preliminary checks out of the way we get to the for statement to do the final comparison of the array elements. Let's quickly recap the semantics of this statement. The for statement consists of three semi-colon delimited expressions followed by a scope block. You can use any expressions you wish as long as the condition expression results in a value that can be interpreted to mean either true or false. The initialization expression is executed first and exactly once. It is typically used to initialize loop indices for iteration or any other variables required by the for statement. The condition expression is executed next and is executed before each subsequent loop. If the expression evaluates to true, the scope block will be entered. If the expression evaluates to false, control is passed to the first statement following the for statement. The loop expression is executed after every iteration. This can be used to increment loop indices or move cursors or anything else that might be appropriate. Following the loop expression, the condition expression is evaluated again, and so on and so forth.

Considering that instruction-based languages don't know anything about high-level statements like the for statement described previously, let's break it down into something a bit more machine-friendly before we explore the CIL instructions for it.

    int index = 0;
    goto _CONDITION;

_LOOP:

    ++index;

_CONDITION:

    if (equivalent && index != lhs->Length)
    {
        equivalent = lhs[index] == rhs[index];
        goto _LOOP;
    }

Well, that doesn't look anything like the for statement, but it has the same effect. Here I use the infamous goto, more generally referred to as a branch instruction. Branching is inevitable in languages that don't support selection and iteration statements. It serves little purpose in languages like C++ and C# other than to obfuscate the code.

By following the branching in the previous code example, you should be able to see how we construct the for statement semantics by executing the condition and then jumping up to the loop expression to conditionally loop again. Now let's consider how we can implement this in CIL.

    br.s _CONDITION
    
_LOOP:
    
    ldc.i4.1
    ldloc.s index
    add
    stloc.s index
    
_CONDITION:
    
    ldloc.s index
    ldarg.s lhs
    ldlen
    beq.s _RETURN
    
    ldarg.s lhs
    ldloc.s index
    ldelem.u1
    ldarg.s rhs
    ldloc.s index
    ldelem.u1
    bne.un.s _RETURN_FALSE
    
    br.s _LOOP

As we've seen, all variables are declared up front so the for statement's implementation begins with the simple br instruction that transfers control to the instruction following the _CONDITION label. The condition expression is simpler in CIL since there's no need to check whether equivalent is true at this point. All we do is check that index has not yet reached the array length. If they are equal, the beq instruction transfers control to the instruction following the _RETURN label, thus ending the loop.

And finally we arrive at the instructions for comparing the arrays' elements. In C++ we expressed it as follows.

equivalent = lhs[index] == rhs[index];

In CIL we will avoid repeatedly writing the result to the equivalent variable and simply transfer control to the instruction following the _RETURN_FALSE label if the elements are not equal. The instructions for comparing the array elements deserve a little illustration. They provide some insight into how the compiler can employ the stack to implement expressions. The lhs array and the index are pushed onto the stack using the ldarg and ldloc instructions, and then the ldelem.u1 instruction pops both off and pushes the element at that index onto the stack. We could store this value in a local variable while fetching the rhs element, or we can simply leave the value on the stack and save ourselves a local variable. That is exactly what happens here. Once the rhs element is pushed onto the stack in the same way, the bne.un instruction compares the two elements and transfers control to the instruction following the _RETURN_FALSE label if they are not equal.

Table 1 illustrates the values on the stack after each of these instructions.

Table 1

Instruction sequence	Stack after instruction
ldarg.s lhs	lhs
ldloc.s index	lhs index
ldelem.u1	lhs[index]
ldarg.s rhs	lhs[index] rhs
ldloc.s index	lhs[index] rhs index
ldelem.u1	lhs[index] rhs[index]
bne.un.s _RETURN_FALSE

Now you should be able to see why we originally declared that the method requires three stack slots.
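The bookkeeping in Table 1 is easy to reproduce. The following native C++ sketch is only a model under stated assumptions: Instruction, MaxStackDepth, and ElementComparison are hypothetical names, the pop and push counts are read off Table 1, and the model handles straight-line sequences only (it does not follow branches).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One instruction together with its effect on the evaluation stack.
struct Instruction
{
    std::string name;
    int pops;    // slots removed from the stack
    int pushes;  // slots added to the stack
};

// Returns the deepest the stack gets while the sequence runs in order.
int MaxStackDepth(const std::vector<Instruction>& code)
{
    int depth = 0;
    int deepest = 0;

    for (std::size_t i = 0; i != code.size(); ++i)
    {
        assert(code[i].pops <= depth); // cannot pop more than is on the stack
        depth = depth - code[i].pops + code[i].pushes;
        deepest = std::max(deepest, depth);
    }

    return deepest;
}

// The element comparison traced in Table 1.
std::vector<Instruction> ElementComparison()
{
    return {
        { "ldarg.s lhs",            0, 1 },
        { "ldloc.s index",          0, 1 },
        { "ldelem.u1",              2, 1 },
        { "ldarg.s rhs",            0, 1 },
        { "ldloc.s index",          0, 1 },
        { "ldelem.u1",              2, 1 },
        { "bne.un.s _RETURN_FALSE", 2, 0 }
    };
}
```

Calling MaxStackDepth(ElementComparison()) yields 3, which agrees with the .maxstack 3 directive on the Equivalent method. Feeding it the four Hello World instructions (ldstr, call, ldc.i4.0, ret) yields 1, which agrees with the .maxstack 1 directive there.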

The final part of the for statement is the loop expression that involves incrementing the index variable. This is achieved by pushing a value of 1, as well as the index value, onto the stack and then using the add instruction to add them up. The resulting value is then copied back to the index variable. At this point the condition is evaluated again.

The method concludes with the _RETURN_FALSE and _RETURN labels and their respective instructions for preparing the return value.
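To check that the hand-written CIL really does the same job as the C++ function, it helps to transliterate its control flow back into C++. The sketch below is native C++, not C++/CLI: a pointer to std::vector<unsigned char> stands in for the array<Byte>^ handle, and Bytes, EquivalentLowered, CheckEquivalent, and the lowercase label names are hypothetical choices made for this illustration. Each goto corresponds to one branch instruction in the CIL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes; // stand-in for array<Byte> (assumption)

// The function as written in the article, with pointers in place of handles.
bool Equivalent(const Bytes* lhs, const Bytes* rhs)
{
    bool equivalent = true;

    if (lhs != rhs)
    {
        if (nullptr == lhs ||
            nullptr == rhs ||
            lhs->size() != rhs->size())
        {
            equivalent = false;
        }
        else
        {
            for (std::size_t index = 0; equivalent && index != lhs->size(); ++index)
            {
                equivalent = (*lhs)[index] == (*rhs)[index];
            }
        }
    }

    return equivalent;
}

// The control flow of the hand-written CIL, one goto per branch instruction.
bool EquivalentLowered(const Bytes* lhs, const Bytes* rhs)
{
    bool equivalent = true;   // ldc.i4.1, stloc.s equivalent
    std::size_t index = 0;    // locals start out zeroed

    if (lhs == rhs) goto done;                               // beq.s _RETURN
    if (nullptr == lhs) goto not_equivalent;                 // beq.s _RETURN_FALSE
    if (nullptr == rhs) goto not_equivalent;                 // beq.s _RETURN_FALSE
    if (lhs->size() != rhs->size()) goto not_equivalent;     // bne.un.s _RETURN_FALSE
    goto condition;                                          // br.s _CONDITION

loop:
    ++index;

condition:
    if (index == lhs->size()) goto done;                     // beq.s _RETURN
    if ((*lhs)[index] != (*rhs)[index]) goto not_equivalent; // bne.un.s _RETURN_FALSE
    goto loop;                                               // br.s _LOOP

not_equivalent:
    equivalent = false;

done:
    return equivalent;
}

// Usage: both versions agree on the interesting cases.
void CheckEquivalent()
{
    const Bytes a = { 1, 2, 3 };
    const Bytes b = { 1, 2, 3 };
    const Bytes c = { 1, 2, 4 };

    assert(Equivalent(&a, &b) && EquivalentLowered(&a, &b));             // same contents
    assert(!Equivalent(&a, &c) && !EquivalentLowered(&a, &c));           // last byte differs
    assert(!Equivalent(&a, nullptr) && !EquivalentLowered(&a, nullptr)); // one side is null
}
```

Notice that the lowered version writes to equivalent at most once after initialization, whereas the for statement assigns it on every iteration. The results are the same.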

The Type System

In this section we are going to explore the natural syntax and semantics provided by Visual C++ for CLI value and reference types and how the compiler implements these features in CIL.

To recap, the CLI defines two kinds of types: value types and reference types. Value types are typically allocated on the stack or inline within the object that contains them. Reference types are allocated on the managed heap. Consider the following example.

value class MyValueType
{
    // members
};

ref class MyRefType
{
public:
    MyRefType() {}
    ~MyRefType() {}

    // remaining members
};

By adding value before the class keyword, MyValueType is declared a value type. Value types cannot contain default constructors, copy constructors, or destructors. They truly are simple value types that the runtime takes care of initializing and copying, given the appropriate CIL directives and instructions. Adding ref before the class keyword declares MyRefType a reference type. MyRefType has a default constructor as well as a destructor.

Let's examine the resulting CIL for these types. Here is the definition of MyValueType in CIL.

.class sequential sealed beforefieldinit MyValueType
    extends [mscorlib]System.ValueType
{
    // members
}

Types are defined with a .class directive followed by a type header. The type header consists of a number of type attributes followed by the name of the type you are defining. Type members are defined within the braces following the type header.

The sequential attribute indicates that the runtime should lay out the type's fields sequentially in memory in the order that they appear in the metadata. In contrast, the auto attribute indicates that the runtime is free to reorder the fields for the optimal layout for the particular platform the code happens to be running on. Sequential layout is the default for value types while automatic layout is the default for reference types. This makes sense since you may often use value types for interoperability with native C or C++ code in which a predictable layout would be required. To override these defaults simply add the Runtime::InteropServices::StructLayout attribute to your type, indicating the kind of layout scheme desired.
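To see why a predictable layout matters for interoperability, consider the native side of the fence. The following is a small native C++ sketch, not C++/CLI; PacketHeader and CheckDeclarationOrder are hypothetical names invented for this illustration. Native code expects each field to sit where the declaration order puts it, and that expectation is what sequential layout preserves for a value type.

```cpp
#include <cassert>
#include <cstddef>

// A native struct of the kind you might share with C code (hypothetical).
struct PacketHeader
{
    unsigned char  tag;
    unsigned int   length;
    unsigned short flags;
};

// Fields appear in memory in declaration order. The exact offsets depend on
// the platform's padding rules, so only the ordering is checked here.
void CheckDeclarationOrder()
{
    assert(offsetof(PacketHeader, tag) == 0);
    assert(offsetof(PacketHeader, tag) < offsetof(PacketHeader, length));
    assert(offsetof(PacketHeader, length) < offsetof(PacketHeader, flags));
}
```

If the runtime were free to reorder these fields, as it is with automatic layout, native code reading the same memory would no longer find them where it expects.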

The sealed attribute indicates that other types cannot derive from MyValueType. The CLI defines that all value types shall be sealed. You can optionally seal reference types. Many C++ programs are filled with concrete classes that are used as standalone types, but that are not designed to be used as base classes. You can typically identify these classes by their lack of virtual methods, including a non-virtual destructor. Of course, nothing stops you from shamelessly inheriting from them. The sealed attribute effectively declares that it is prohibited to inherit from the particular type without having to resort to various language tricks.

The final type attribute used by MyValueType is beforefieldinit. This attribute is an optimization that the compiler may choose to provide for a type. The attribute indicates to the runtime that it need not call the type's static constructor before allowing calls to the type's static methods. Without the attribute, the runtime guarantees that a type's static constructor, if provided, will be called before any type member is called, and that the static constructor will be called only once and in a thread-safe manner. This can be quite taxing at runtime in some scenarios, so to allow the runtime to provide some performance boost to the use of the type, compilers can add this attribute if they determine that it is safe to do so. Basically, if the compiler can determine that, although there is a static constructor, it is not required by any static methods of the class, it should add this attribute to allow the runtime this optimization.

Finally, what actually makes MyValueType a CLI value type is that it extends, in other words derives from, the ValueType class defined in the mscorlib assembly.

Now let's look at the CIL definition for MyRefType.

.class auto beforefieldinit MyRefType
    extends [mscorlib]System.Object
    implements [mscorlib]System.IDisposable
{
    .method public void .ctor()
    {
        .maxstack 1
    
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        
        ret
    }
    
    .method public newslot virtual final void Dispose()
    {
        .override [mscorlib]System.IDisposable::Dispose
        
        .maxstack 1
        
        ldarg.0
        call void [mscorlib]System.GC::SuppressFinalize(object)
        
        ret
    }
    
    // remaining members
}

The primary difference between this definition and that of MyValueType is that MyRefType does not have ValueType as a superclass. Instead, MyRefType extends the Object type from the mscorlib assembly. Before we talk about why MyRefType implements the IDisposable type, let's quickly take a look at the constructor.

.ctor is an example of what the CLI specification refers to as a special name and the .ctor method implements the MyRefType constructor. Even though MyRefType's constructor is empty in C++, there is clearly some work to be done. The ldarg instruction is used to push the method's first argument onto the stack. As you've probably guessed, each instance method has an implicit first argument that is a handle to the current instance. The constructor pushes the handle onto the stack and then calls the base class constructor, which in this case is Object's default constructor, and then returns.
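Native C++ follows the same rule, which makes it easy to observe. In the following sketch, Object and MyRefType are hypothetical native stand-ins for the CLI types, and the constructed counter and CheckBaseConstructorRuns exist only for illustration. It shows that an empty constructor body still runs the base class constructor.

```cpp
#include <cassert>

// Hypothetical native stand-in for System::Object that counts constructions.
struct Object
{
    static int constructed;
    Object() { ++constructed; }
};

int Object::constructed = 0;

// Stand-in for MyRefType: the constructor body is empty in source.
struct MyRefType : Object
{
    MyRefType() {}
};

// Usage: constructing MyRefType still runs the base class constructor.
void CheckBaseConstructorRuns()
{
    const int before = Object::constructed;
    MyRefType instance;
    assert(Object::constructed == before + 1);

    (void)instance; (void)before; // silence warnings when NDEBUG is defined
}
```

The call that the compiler inserts here is the counterpart of the explicit call to Object's .ctor in the CIL above.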

The compiler implements MyRefType's destructor by implementing the IDisposable interface. The newslot method attribute on the Dispose method header indicates that the method will get a new slot in the type's method table. This basically means that calls targeting a base class's Dispose method explicitly will not invoke this method. This ensures that the method will only be called as a destructor, either explicitly or through the IDisposable interface. The virtual method attribute is required since Dispose is a virtual method, as all interface members are. The final method attribute ensures that derived classes cannot override this Dispose method. This is to promote the C++ approach of implicit destructor chaining, as opposed to virtual method invocation, for object destruction.

The .override directive indicates which virtual method is being implemented or overridden. This allows a virtual method with one name to be implemented or overridden by a virtual method with a different name while retaining the polymorphic behavior that is expected. In this case we are simply implementing the IDisposable::Dispose method. Since the MyRefType destructor does not include any statements, the compiler simply adds a call to the GC class's SuppressFinalize method to avoid the cost of non-deterministic finalization.

There is a lot more that can be said about destructor implementations, especially in the context of derived classes and stack semantics, but I hope this introduction has given you some insight into how the compiler implements destructor semantics for CLI reference types.

Conclusion

Visual C++ 2005 provides a powerful compiler for generating .NET assemblies from C++ code. Understanding the CIL instructions that the compiler generates for common language constructs can help the developer gain new insights into how the code they write is seen by the runtime.

For a general introduction to CIL, also commonly known as the Microsoft Intermediate Language (MSIL), check out my series of articles titled Introduction to MSIL.

About the author

Kenny Kerr spends most of his time designing and building distributed applications for the Microsoft Windows platform. He also has a particular passion for C++ and security programming. Reach Kenny at https://weblogs.asp.net/kennykerr/ or visit his Web site: https://www.kennyandkarin.com/Kenny/.