Pure C++

How Templates and Generics Work Together

Stanley B. Lippman

Contents

The CAI of the STL
Redesigning the STL for .NET
Templates are Non-Public Assembly Types

I'm closing out this series of columns on generic programming in the Microsoft® .NET Framework by showing how templates and generics do and do not work together, and pointing out a pitfall with the current implementation of templates under Visual C++® 2005. I've chosen to present this material through a discussion of the ongoing work in bringing the Standard Template Library (STL) to .NET. First, I'd like to get everyone on the same page by reviewing the basic elements of the STL.

The CAI of the STL

There are three primary elements to the design of the Standard Template Library: containers, algorithms, and iterators (CAI). The STL vector and list classes represent sequential containers. A sequential container maintains a first element, a second element, and so on through a last element. The program representation of a function parameter list, for example, is typically implemented as a vector holding elements of type string:

vector<string> paramlist; 

The map and set classes represent associative containers. An associative container supports fast lookup. A map, for example, represents a key/value pair: the key is used for lookup, and the value represents the data you store and retrieve. To represent a telephone directory, you would declare a map with a string key and an integer value:

map<string,int> phonedir;

A multimap would allow you to support multiple phone entries for a single key.

The STL also provides a set of algorithms, including those for find, sort, replace, and merge, designed to operate over these containers. These are called generic algorithms because they are independent of both the type of element they are operating on (for example, an int, double, or string) and the type of container within which the elements are held (for example, whether it is a vector, list, or built-in array).

The generic algorithms achieve container independence by operating indirectly on the container. Rather than being passed the container, they are passed an iterator pair (first, last] marking the range of elements over which to iterate—the last element acts as a sentinel, or marker, to indicate one past the set of element, and that halts the algorithm:

sort( paramlist.begin(), paramlist.end() );

Here, begin() and end() are methods provided by all STL containers that return an iterator to the first and one past the last elements. For example, take a look at the following sequence of declarations:

void f()
{
    int ia[4] = {21, 8, 5, 13 };
    vector<int> ivec( ia, ia+4 );  // initialize ivec to ia ...
    list<int>   ilist( ia, ia+4);  // initialize ilist to ia ... 

         // ...
}

Notice that ia+4 actually points to an address one past the last element, 13. (People using the STL at first trip over that distinction, and might, for example, pass in ia+3, which would cause the element 13 not to be included in the values.

An iterator provides a uniform and type-independent way to access, traverse, and compare elements of a container. Iterators are a class abstraction that provides uniform pointer operations (like ++, --, *, ==, !=) independent of whether the container is a built-in array, a list, or any other conforming STL container:

void f()
{
    // ... same as above ...

    // invoke the same generic algorithm each time ...
    sort( ia, ia+4 );                       
    sort( ivec.begin(), ivec.end() );       
    sort( ilist.begin(), ilist.end() );        
}

In each of the three invocations of sort, the resulting sequence is, of course, 5, 8, 13, 21, the fourth through seventh elements of the Fibonacci sequence.

Now, this is an idealized view of the STL that does not prove feasible in practice under both a formal and a practical constraint. The formal constraint is that not all container types support all the algorithmic operations. Neither a map nor a set, for example, support random_shuffle since any reordering of the elements would violate the container type, much as indexing into a stack would violate the semantic characteristics of a stack.

More practically, it is more expensive to use sort or find to get an element in a list container with the generic algorithms than to do the same operation on a vector. As a result, the list container type provides its own class methods, which are more efficient than the generic algorithms. Similarly, it is faster to find a map element using its lookup method rather than using the find algorithm by passing it the begin and end iterators of the map and its key.

One would wish for the algorithms to perform equally on all containers, but in fact, the algorithms are biased towards block containers and built-in arrays rather than the list container and the associative container types. In fact, when I worked with Alex Stepanov at Bell Laboratories, he referred to the generic algorithms as block algorithms.

Redesigning the STL for .NET

The first thing that must be done in order to move the STL over to .NET is reimplement the containers as common language runtime (CLR) types. For various reasons that I don't have the space to go into here, it's best to make the containers reference classes rather than value classes. For example:

// simplifying the declaration for now ...
template <class elemType>
ref class vector { ... };

template <class Key, class Value>
ref class map { ... };

In the native STL, the containers are all non-polymorphic. A declaration of a vector gives the actual vector object. The following shows an example:

// Native STL vector
//       ivec.empty() == true 
//       ivec.size() == 0
vector< int > ivec; 

//      ivec2.empty() == false 
//      ivec2.size() == 10
vector< int > ivec2( 10 ); 

Under C++/CLI, however, when you declare a reference type, you define a tracking handle—the vector itself sits in the managed heap. By default, the handle is set to nullptr. Take a look at Figure 1.

Figure 1 Reference Classes

public ref class sequence {
protected:
    vector<int> ^elems;
};

public ref class fibonacci : sequence { ... };

    // equivalent test
    if ( elems == nullptr ) ...
    if ( ! elems ) ...
}

// ivec == nullptr;
vector< int >^ ivec;

// ivec2 != nullptr ...
// ivec2->empty() == true
// ivec2->size == 0
vector< int > ^ivec2 = gcnew vector<int>; 

// ivec3 != nullptr ...
// ivec3->empty() != true
// ivec3->size() == 10 ...
vector< int > ^ivec3 = gcnew vector<int>( 10 );

The next design requirement is to allow the containers to be consumed by other languages, such as C# and Visual Basic®, that do not support templates. The simplest strategy is to have the template containers implement one or more of the System container interfaces, divided into the two namespaces, as shown in Figure 2.

Figure 2 System Container Interfaces

System::Collections:: System::Collections::Generic::
ICollection ICollection<T>
IList IList<T>
IDictionary IDictionary<K,V>

In general, you will want to support both the Collections and Generic interfaces so that clients currently using the Collections interfaces will be able to use your types. Here is how you might declare support for both:

template <class T>
ref class vector : 
    System::Collections::ICollection,
    System::Collections::Generic::ICollection<T>
{ ... };

In order to implement the container interfaces of the System collection namespaces, you must also implement instances of IEnumerator and IEnumerator<T>:

generic <class T>
ref class vector_enumerator : 
    System::Collections::IEnumerator,
    System::Collections::Generic::IEnumerator<T>
{ ... };

The downside of implementing a System container interface is that while it makes the elements available, it loses the operations of the STL/CLR type container. So, an additional design support is to also provide a generic shadow container type that allows the other language to consume an actual container type. There are two general strategies for doing this: the Plauger Way and the Tsao Way, named for the two primary architects, P. J. Plauger and Anson Tsao.

The Plauger Way can be thought of as providing a generic shadow type. That is, you create the shadow generic class (you could call it generic_vector). It holds a copy of the vector template. Here's an example:

generic <typename T>
public ref class vector_generic 
{
     vector<T>^ m_templ;    // oops ...

public:
     vector_generic( vector<T>^ );
};

The oops comment on the declaration line of m_templ represents a constraint on the use of templates under .NET. As it happens, you cannot store a template requiring instantiation in a generic type. The reason is the different instantiation times of the two parameterized type facilities. Generics are instantiated by the runtime; templates are instantiated by the compiler. Therefore, a template can hold a generic, but a generic cannot hold a template.

The solution under the Plauger Way is to create a common generic interface from which both a template and generic are derived. For an example, see Figure 3. The solution under the Tsao Way follows from the observation that the interfaces are always references to the instance of the template container (which is instantiated in a particular assembly). Therefore, you simply provide an interface and the template implementation. The generic shadow type is eliminated.

generic<typename T> 
interface class vector_interface : ICollection {...};

template<typename T> 
ref class vector : vector_interface, ICollection<T> {...}; 

Figure 3 Common Generic Interface

generic<typename Value> 
interface class vector_interface {...};

template<typename Value> 
ref class vector : vector_interface{...}; 

generic <typename T>
public ref class vector_generic : vector_interface 
{
     vector_interface<T>^ m_templ; 

public:
     vector_generic( vector_interface<T>^ );
     // ...
};

In all cases, everyone other than those within the assembly program go through the generic instance rather than the STL/CLR container. This is because templates under C++/CLR cannot be public assembly members. The next section discusses this.

Templates are Non-Public Assembly Types

In order for .NET to recognize a type, it requires two elements: the program code representing the type to be translated into the Common Intermediate Language (CIL), and metadata describing the details of the type. At the moment, unfortunately, templates are unable to be represented to .NET with either element. Templates, like destructors, don't exist for .NET.

.NET can only know of the concrete instances of a template; it can't know they are a kind of template. For example, .NET can see the CIL and metadata for vector<int> and vector<double>. What it cannot see is the shared template vector that each is an instance of. One side effect of this blindness is that you cannot reflect on templates. That is, you cannot ask of the vector<int> instance, "Are you a template? If so, please pass me your list of parameters and your class definition."

Generics, on the other hand, have direct CIL support in CLR 2.0, and an extended Reflection namespace that fully supports generic reflection. That should be one consideration in your design choice of whether to use a template or generic type in your application. A second consideration is whether it needs to be shared across assemblies. Templates cannot be shared across assemblies while generics can. Therefore, if that is of a central concern to your design, you should choose a generic class over that of a template. Otherwise, you will need to provide some form of common interface to accomplish a cross-assembly handshake such as I described for the STL/CLR design.

The reason templates are not recognized across assemblies is that .NET has an extended notion of type that includes its origin. That is, under .NET, a type has a location, unlike in native C++, where a name floats freely in a shared global space. This global name pollution makes it difficult to combine diverse components into a working application. Here's a further distinction: Namespaces provide a program-level solution, and adding a location to a type provides an assembly-level solution.

That is, globally public names are, in effect, partitioned by their assembly so that the assemblies can be combined without name conflict. Under .NET, a type has a location. This means that a vector<int> of one assembly is not recognized as the same vector<int> of a second assembly since the types are tagged by separate assembly names. Generics do not suffer this problem because of the runtime instantiation provided by the CLR.

So, given these constraints, why did we choose to provide both templates and STL/CLR? The working C++ programmer has built up an expertise with this library as well as an existing body of code. We wanted to not only provide a migration path for existing code, but also for existing expertise as well. If you have previously depended on the STL in your C++ programming, its absence under .NET was felt as a loss. Well, not with STL/CLR. Last I heard, the plan was to make it available for download when it's ready. Stay tuned! I know I will.

Send your questions and comments for Stanley to  purecpp@microsoft.com.

Stanley B. Lippman began working on C++ with its inventor, Bjarne Stroustrup, in 1984 at Bell Laboratories. Later, Stan worked in feature animation both at Disney and DreamWorks and served as a Software Technical Director on Fantasia 2000. He has since served as Distinguished Consultant with JPL, and an Architect with the Visual C++ team at Microsoft.