Using the Metadata API and Tokens

11/16/2012

The metadata APIs can be called from C++. The way the metadata APIs are used depends in part on the kind of client using them. Most metadata API clients fall into one of two categories:

Compilers, like the compiler in Visual C++ 2005, that build interim .obj files and then use a separate linker phase to merge the individual compilation units into a single target portable executable (PE) file.
Rapid application development (RAD) tools that manage all code and data structures in the tool environment until build time, when they build and emit a PE file in a single step.

Other clients might use the metadata APIs in a way that falls between these two styles. Some tools might let the metadata engine perform optimizations but have no interest in token-remapping information. Others might want remapping information for some token types and not for others. In fact, a compiler might choose to skip optimizations entirely when it emits an interim .obj file, because those optimizations are likely to be redone after the link and merge phase.

The Compile-and-Link Style

In the compile-and-link style of interaction, a compiler front-end uses the IMetaDataDispenserEx interface to establish an in-memory metadata scope, and then uses the IMetaDataEmit interface to declare types and members, working with the metadata abstractions described in Metadata Tokens Overview. However, the front-end cannot supply method implementation information (for example, whether the implementation is managed or unmanaged, MSIL or native code) or relative virtual address (RVA) information, because that information cannot be determined at compile time. Instead, the back-end, or linker, supplies this information later, as the code is compiled and emitted into the PE file.

A complication here is that the back-end tool must know the target save size of the metadata binary in order to leave room for it in the PE file. However, the tool is not ready to save the metadata binary into the file until the method RVAs and module-level static data member RVAs are known and emitted into metadata. To calculate the target save size correctly, the metadata engine must first perform any pre-save optimizations, because these optimizations, ideally, make the target binary smaller. Optimizations might include sorting data structures for faster searching, or optimizing away (early binding) mdTypeRef and mdMemberRef tokens when the reference is to a type or member that is declared in the current scope. Such optimizations can remap metadata tokens that the tool still needs in order to emit the implementation and RVA information. As a result, the tool and the metadata engine must work together to track token remappings.
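To make the early-binding case concrete, the following is a minimal sketch of how a tool might classify a remapping by inspecting the two tokens. It assumes the usual token layout, in which the high byte identifies the metadata table and the low three bytes hold the row ID. The constants are local copies of the values that corhdr.h defines as mdtTypeRef, mdtTypeDef, mdtFieldDef, mdtMethodDef, and mdtMemberRef; the helper names TableOf, RowOf, MakeToken, and IsEarlyBound are hypothetical and are not part of the metadata API.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the mdToken typedef that normally comes from the SDK headers.
using mdToken = std::uint32_t;

// Token-type values, copied locally so that the sketch is self-contained.
constexpr mdToken kTypeRef   = 0x01000000u;  // mdtTypeRef
constexpr mdToken kTypeDef   = 0x02000000u;  // mdtTypeDef
constexpr mdToken kFieldDef  = 0x04000000u;  // mdtFieldDef
constexpr mdToken kMethodDef = 0x06000000u;  // mdtMethodDef
constexpr mdToken kMemberRef = 0x0a000000u;  // mdtMemberRef

// The high byte selects the table; the low three bytes are the row ID.
constexpr mdToken TableOf(mdToken tk) { return tk & 0xff000000u; }
constexpr std::uint32_t RowOf(mdToken tk) { return tk & 0x00ffffffu; }

// Hypothetical helper: build a token from a table value and a row ID.
inline mdToken MakeToken(mdToken table, std::uint32_t row) {
    assert((table & 0x00ffffffu) == 0);  // table value must occupy only the high byte
    assert(row <= 0x00ffffffu);          // row ID must fit in three bytes
    return table | row;
}

// Hypothetical helper: true when a remapping replaced a reference token with a
// definition token, which is what early binding within one scope looks like.
inline bool IsEarlyBound(mdToken from, mdToken to) {
    const mdToken f = TableOf(from);
    const mdToken t = TableOf(to);
    if (f == kTypeRef)   return t == kTypeDef;
    if (f == kMemberRef) return t == kMethodDef || t == kFieldDef;
    return false;
}
```

A tool that wants remapping information only for some token types could apply the same TableOf test to each notification and ignore the rest.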

Therefore, the sequence of calls for persisting metadata during compilation is as follows:

IMetaDataEmit::SetHandler, to supply an IUnknown interface that the metadata engine can query for IID_IMapToken, which it uses to notify the client of token remappings. SetHandler can be called at any point after the metadata scope is created, but it must be called before IMetaDataEmit::GetSaveSize.
IMetaDataEmit::GetSaveSize, to obtain the save size of the metadata binary. GetSaveSize uses the IMapToken interface supplied in IMetaDataEmit::SetHandler to notify the client of any token remappings. If SetHandler was not used to supply an IMapToken interface, no optimizations are performed. This enables a compiler that is emitting an interim .obj file to skip unneeded optimizations that are likely to be redone after the link and merge phase.
IMetaDataEmit::Save, to persist the metadata binary, after IMetaDataEmit::SetRVA and other IMetaDataEmit methods are used as needed to emit the final implementation metadata.
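The client-side bookkeeping for this sequence can be sketched as follows. This is an illustration under stated assumptions, not the engine's behavior: TokenRemapTable is a hypothetical plain C++ stand-in for a client's IMapToken implementation (the real interface delivers each notification through IMapToken::Map), and PendingRva and ApplyRemap are invented names for the step in which the back-end rewrites the tokens it is holding before it passes them to IMetaDataEmit::SetRVA.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Local stand-in for the mdToken typedef that normally comes from the SDK headers.
using mdToken = std::uint32_t;

// Hypothetical stand-in for the client's token-remap handler.
class TokenRemapTable {
public:
    // Record one remap notification. A later notification for the same
    // source token replaces the earlier one.
    void Map(mdToken from, mdToken to) {
        assert(from != 0);  // sketch assumption: the nil token is never remapped
        remap_[from] = to;
    }

    // Return the token to use after optimization; tokens with no
    // notification are returned unchanged.
    mdToken Resolve(mdToken tk) const {
        const auto it = remap_.find(tk);
        return it == remap_.end() ? tk : it->second;
    }

private:
    std::unordered_map<mdToken, mdToken> remap_;
};

// A method whose RVA becomes known only when the back-end lays out the code.
struct PendingRva {
    mdToken method;     // token handed out when the method was declared
    std::uint32_t rva;  // address chosen by the back-end
};

// Rewrite every pending entry so that it carries the post-optimization token.
// Each result is then ready to be passed to a SetRVA-style call before Save.
std::vector<PendingRva> ApplyRemap(const TokenRemapTable& table,
                                   std::vector<PendingRva> pending) {
    for (PendingRva& entry : pending) {
        entry.method = table.Resolve(entry.method);
    }
    return pending;
}
```

In this sketch the table is filled while GetSaveSize runs, and ApplyRemap is called once afterward, so the back-end never emits an RVA against a token that the optimization pass has already retired.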

The next level of complication comes in the linker phase, when multiple compilation units are merged into an integrated PE file. In that case, not only do the metadata scopes need to be merged, but the RVAs will change again as the new PE file is emitted. In the merge phase, the IMetaDataEmit::Merge method, working with a single import and a single emit scope with each call, remaps metadata tokens from the import scope into the emit scope. In addition, the merge process might encounter continuable errors that it must be able to send to the client. After the merge is complete, emitting the final PE file involves a call to IMetaDataEmit::GetSaveSize and another round of token remapping.

The sequence of calls for emitting and persisting metadata by the linker is as follows:

IMetaDataEmit::SetHandler, to supply an IUnknown interface that the metadata engine can use to query for not only IID_IMapToken, as before, but also for IID_IMetaDataError. The latter interface is used to notify the client of any continuable errors that arise from the merge.
IMetaDataEmit::Merge, to merge a specified metadata scope into the current emit scope. Merge uses the IMapToken interface to notify the client of token remappings and it uses IMetaDataError to notify the client of continuable errors.
IMetaDataEmit::GetSaveSize, to obtain the target save size of the metadata binary. GetSaveSize uses the IMapToken interface supplied in IMetaDataEmit::SetHandler to notify the client of any token remappings. A tool must be prepared to handle token remappings in Merge and then again in GetSaveSize after format optimizations are performed. The last notification for a token represents the final mapping that the tool should rely on.
IMetaDataEmit::Save, to persist the metadata binary, after IMetaDataEmit::SetRVA and other IMetaDataEmit methods are used as needed to emit the final implementation metadata.
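The linker-side handler has two jobs: keep the most recent mapping for each token across both notification rounds, and collect continuable errors so that the merge can proceed. The sketch below shows one way to hold that state. LinkHandler is a hypothetical plain C++ class, not a COM implementation; its Map and OnError methods only mirror the roles of IMapToken::Map and IMetaDataError::OnError, and the error code is carried as a signed 32-bit value in place of HRESULT. How tokens from the import scope relate to tokens in the emit scope is simplified here to a single table keyed by the source token.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Local stand-in for the mdToken typedef that normally comes from the SDK headers.
using mdToken = std::uint32_t;

// One continuable error reported during the merge.
struct MergeError {
    std::int32_t code;  // stand-in for the HRESULT value
    mdToken token;      // token the error relates to
};

// Hypothetical handler that a linker might hand to the metadata engine.
class LinkHandler {
public:
    // Called for each remap notification, first during the merge and then
    // again during the save-size calculation. The last call for a given
    // source token wins.
    void Map(mdToken from, mdToken to) {
        assert(from != 0);  // sketch assumption: the nil token is never remapped
        final_[from] = to;
    }

    // Called for each continuable error; the error is recorded so that it
    // can be reported later, and processing continues.
    void OnError(std::int32_t code, mdToken token) {
        errors_.push_back(MergeError{code, token});
    }

    // Final mapping for a token, or the token itself if it was never remapped.
    mdToken Final(mdToken tk) const {
        const auto it = final_.find(tk);
        return it == final_.end() ? tk : it->second;
    }

    const std::vector<MergeError>& Errors() const { return errors_; }

private:
    std::unordered_map<mdToken, mdToken> final_;
    std::vector<MergeError> errors_;
};
```

A linker built this way would consult Final only after GetSaveSize returns, because a mapping seen during the merge might still be replaced by a later notification.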

The RAD Tool Style

As in the compile-and-link style of interaction, a RAD tool uses the IMetaDataDispenserEx interface to establish an in-memory metadata scope and then uses the IMetaDataEmit interface to declare types and members, working with the metadata abstractions described in Metadata Tokens Overview.

In contrast to the compile-and-link style, the RAD tool will typically emit the PE file in a single step. It will likely emit declaration and implementation information in a single pass, and it will probably never need to call IMetaDataEmit::Merge. Therefore, the only reason the RAD tool might need to handle the complexity of token remappings is to take advantage of the pre-save optimizations that are currently performed by IMetaDataEmit::GetSaveSize.

In general, a tool that can emit fully optimized metadata does not need the metadata engine in order to emit a reasonably optimized file. However, future implementations of the metadata engine and file format might make some optimization strategies obsolete, so there is a clear set of rules for how to emit optimized metadata.

After emitting the metadata declarations and implementation information, the sequence of calls is as follows:

IMetaDataEmit::SetRVA and other IMetaDataEmit methods, as needed, to emit the final implementation metadata.
IMetaDataEmit::Save, to persist the metadata binary.
