Programmatic Coverage Analysis in Visual Studio 2010

As hinted at in my last post, today’s entry covers how to programmatically analyze a Visual Studio coverage file in Visual Studio 2010.

The first step is to reference the coverage analysis assembly, Microsoft.VisualStudio.Coverage.Analysis.dll, in your project.  This managed assembly can be found in “%ProgramFiles%\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies” (use %ProgramFiles(x86)% on a 64-bit OS).  The project system should copy the assembly locally for your application to use.  This assembly has native x86 dependencies, so it can only be used from an x86 process, and it is therefore marked as 32-bit required.  If you plan on loading it on a 64-bit OS, ensure your entry point assembly is also marked as 32-bit required (i.e. built for the “x86” platform).
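If the entry point is accidentally built as “Any CPU” and runs on a 64-bit OS, the mismatch typically only shows up as a load failure (such as a BadImageFormatException) the first time an analysis type is touched.  A minimal sketch of an early check, assuming .NET Framework 4 (where Environment.Is64BitProcess is available); the class and helper names are made up for illustration:

```csharp
using System;

static class BitnessCheck
{
    // Hypothetical helper: x86-only native code can load only when the process is not 64-bit.
    public static bool CanHostX86NativeCode(bool is64BitProcess)
    {
        return !is64BitProcess;
    }

    static int Main()
    {
        // Environment.Is64BitProcess is available from .NET Framework 4 onward.
        if (!CanHostX86NativeCode(Environment.Is64BitProcess))
        {
            Console.Error.WriteLine(
                "Running as a 64-bit process; rebuild the entry point for the x86 platform.");
            return 1;
        }

        // From here on it should be safe to touch the coverage analysis types.
        Console.WriteLine("Running as a 32-bit process.");
        return 0;
    }
}
```

This does not replace setting the platform target in the project; it only turns a confusing load failure into a clear message.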

To run your application, you will also need to copy Microsoft.VisualStudio.Coverage.Symbols.dll and dbghelp.dll to your output directory.  These are both native modules, so you won’t be able to add them as references.  The easiest way I’ve found to do this is to “Add Existing Item” to the project, browse to the files (both are also in PrivateAssemblies), click the little down arrow next to the “Add” button, and select “Add As Link”.  Then select the files in Solution Explorer, bring up the Properties Window, and change “Copy to Output Directory” to “Copy if newer”.  Note that Microsoft.VisualStudio.Coverage.Symbols.dll depends on the Microsoft Debug Interface Access library (msdia100.dll), a native COM component that is registered when you install Visual Studio.  If you plan on using the coverage analysis API on a machine that doesn’t have Visual Studio installed, you will need to register msdia100.dll on that machine using regsvr32.exe.

Next, import the namespace containing the core implementation:

    using Microsoft.VisualStudio.Coverage.Analysis;

Now that we’ve set up a project to use the analysis API, let’s recall John Cunningham’s old post about using the Visual Studio 2005/2008 coverage analysis API.  There has been a minor breaking change to the old API: the CoverageInfoManager class no longer exists.  Instead, CoverageInfo objects are created via static methods on CoverageInfo itself.  Here’s a quick comparison:

The old way to analyze a file:

    CoverageInfoManager.ExePath = "<executable_search_paths>";
    CoverageInfoManager.SymPath = "<symbol_search_paths>";
    CoverageInfo info = CoverageInfoManager.CreateInfoFromFile("<path_to_coverage_file>");

    CoverageDS dataSet = info.BuildDataSet(null);

    CoverageInfoManager.Shutdown();

And the new way:

    using (CoverageInfo info = CoverageInfo.CreateFromFile(
            "<path_to_coverage_file>",
            new string[] { "<exe_path1>", "<exe_path2>" },
            new string[] { "<sym_path1>", "<sym_path2>" }))
    {
        CoverageDS dataSet = info.BuildDataSet();
    }

The executable search paths are locations where the instrumented modules can be found, and the symbol search paths are where the instrumented symbols can be found.  Both are optional: there is an overload of CoverageInfo.CreateFromFile that takes only the path to the coverage file.  In addition to any paths supplied, the analysis engine always checks the directory containing the coverage file when locating the instrumented modules and symbols.

The examples above use the CoverageDS type, which is unchanged from Visual Studio 2008.  This type is a typed data set containing the following tables: Module, Namespace, Class, Method, Lines, and SourceFileNames.  It also has two methods, ExportXml and ImportXml, for exporting and importing the dataset’s data.  The exported XML is the same format that Visual Studio uses when it exports a coverage file to XML in the coverage results tool window.

The downside to using CoverageDS is that it loads all of the coverage data into memory at the same time.  This is simply not scalable when dealing with large numbers (i.e. many millions) of basic blocks, which usually results in 500 MB or more of data.  Therefore, the Visual Studio 2010 coverage analysis API has an alternative method for enumerating the coverage data on demand.  This also allows for easier filtering before method statistics are rolled up into the statistics of their classes, namespaces, and modules.

Here’s how to dump out the coverage statistics for each method:

    // Assumes: using System; using System.Collections.Generic;
    //          using Microsoft.VisualStudio.Coverage.Analysis;
    using (CoverageInfo info = CoverageInfo.CreateFromFile("foo.coverage"))
    {
        // Reused for every method and cleared after each one.
        List<BlockLineRange> lines = new List<BlockLineRange>();

        foreach (ICoverageModule module in info.Modules)
        {
            // One byte per basic block in the module; zero means not covered.
            byte[] coverageBuffer = module.GetCoverageBuffer(null);

            using (ISymbolReader reader = module.Symbols.CreateReader())
            {
                uint methodId;
                string methodName;
                string undecoratedMethodName;
                string className;
                string namespaceName;

                lines.Clear();
                while (reader.GetNextMethod(
                    out methodId,
                    out methodName,
                    out undecoratedMethodName,
                    out className,
                    out namespaceName,
                    lines))
                {
                    CoverageStatistics stats = CoverageInfo.GetMethodStatistics(coverageBuffer, lines);

                    // Print "Namespace.Class.Method", skipping any empty parts.
                    Console.WriteLine("Method {0}{1}{2}{3}{4} has:",
                        namespaceName ?? "",
                        string.IsNullOrEmpty(namespaceName) ? "" : ".",
                        className ?? "",
                        string.IsNullOrEmpty(className) ? "" : ".",
                        methodName);
                    Console.WriteLine("    {0} blocks covered", stats.BlocksCovered);
                    Console.WriteLine("    {0} blocks not covered", stats.BlocksNotCovered);
                    Console.WriteLine("    {0} lines covered", stats.LinesCovered);
                    Console.WriteLine("    {0} lines partially covered", stats.LinesPartiallyCovered);
                    Console.WriteLine("    {0} lines not covered", stats.LinesNotCovered);

                    // Reset the list before reading the next method.
                    lines.Clear();
                }
            }
        }
    }

This example creates a CoverageInfo in the same manner as above, but instead of calling BuildDataSet, it enumerates each module in the coverage file and then enumerates each method in the module to dump out its statistics.  You’ll notice that we call GetCoverageBuffer on the module, which returns a byte[].  There is one byte in this array for each basic block in the module.  If the byte is zero, the basic block was not covered; if it is non-zero, the basic block was covered.  So a simple way to count the “raw” basic blocks covered/not covered is to count the zero vs. non-zero bytes in this array.  However, the total number of basic blocks reported in coverage statistics is usually less than the total number of basic blocks in the module, because certain basic blocks are discarded by GetNextMethod (usually these are basic blocks for compiler-generated code).  Keep that in mind if you want to analyze the coverage buffer directly.
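The raw count described above can be sketched in a few lines.  This is only an illustration: the RawBlockCounter class is a made-up name, and the hard-coded buffer stands in for what module.GetCoverageBuffer(null) would return.

```csharp
using System;

static class RawBlockCounter
{
    // Counts covered (non-zero) and not-covered (zero) entries in a coverage buffer.
    public static void Count(byte[] coverageBuffer, out int covered, out int notCovered)
    {
        covered = 0;
        notCovered = 0;
        foreach (byte block in coverageBuffer)
        {
            if (block != 0)
                covered++;
            else
                notCovered++;
        }
    }

    static void Main()
    {
        // Made-up buffer standing in for module.GetCoverageBuffer(null).
        byte[] buffer = { 1, 0, 0, 1, 1, 0, 1 };

        int covered, notCovered;
        Count(buffer, out covered, out notCovered);

        // Prints "4 raw blocks covered, 3 not covered".
        Console.WriteLine("{0} raw blocks covered, {1} not covered", covered, notCovered);
    }
}
```

Because these raw totals include the blocks that GetNextMethod discards, expect them to be somewhat higher than the totals reported in the coverage statistics.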

Note that each method identifier is unique only for a particular build of a module.  Each module has two properties that uniquely identify it: Signature (a Guid) and SignatureAge (a uint).  These properties correspond to the debug information’s signature in the module and change whenever a module is re-linked.  For incremental links (VC++), the signature may remain the same while the signature’s age counter is incremented, so these two values need to be taken together to version a particular build of a module.
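One way to treat the two values as a single version key is to pair them up before using them in a lookup.  A minimal sketch, assuming .NET Framework 4 for Tuple; the ModuleBuildKey class is a made-up name, and the Guid and ages are invented values standing in for module.Signature and module.SignatureAge:

```csharp
using System;
using System.Collections.Generic;

static class ModuleBuildKey
{
    // Combines the signature and its age into one value usable as a dictionary key.
    public static Tuple<Guid, uint> Create(Guid signature, uint signatureAge)
    {
        return Tuple.Create(signature, signatureAge);
    }

    static void Main()
    {
        // Made-up values standing in for module.Signature and module.SignatureAge.
        Guid signature = new Guid("11111111-2222-3333-4444-555555555555");

        var seenBuilds = new Dictionary<Tuple<Guid, uint>, string>();
        seenBuilds[Create(signature, 1)] = "full link";
        seenBuilds[Create(signature, 2)] = "incremental relink";

        // Same signature, different age: two distinct builds.
        Console.WriteLine(seenBuilds.Count);                            // prints 2
        Console.WriteLine(seenBuilds.ContainsKey(Create(signature, 1))); // prints True
    }
}
```

Keying on the Guid alone would merge the two builds in this example, which is exactly the case incremental linking produces.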

Also, with modules built by VC++ that were linked with COMDAT folding enabled (an optimization that usually comes into play when using templates), you may see multiple functions returned by GetNextMethod that map to the same basic blocks, because the functions contained duplicate code and were folded into a single copy.  To get around this, we roll up a method’s statistics into the module’s statistics only if we haven’t seen its starting basic block (lines[0].BasicBlockIndex) before.  That way, the module’s totals are always accurate, although it means that summing the namespace numbers in a CoverageDS may not always give you the module’s numbers.  I generally recommend disabling identical COMDAT folding when collecting native code coverage data, as it gives you a better idea of what was actually executed by your tests.
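The “haven’t seen the starting basic block before” rule can be sketched with a set of block indices.  This is an illustration of the idea rather than the analysis engine’s actual code: MethodSummary and FoldedMethodRollup are hypothetical types, and the method names and numbers are invented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the per-method data gathered while enumerating methods.
class MethodSummary
{
    public string Name;
    public uint StartBlockIndex;   // plays the role of the method's first basic block index
    public uint BlocksCovered;
    public uint BlocksNotCovered;
}

static class FoldedMethodRollup
{
    // Sums block counts, counting each starting basic block only once.
    public static void RollUp(IEnumerable<MethodSummary> methods,
                              out uint covered, out uint notCovered)
    {
        covered = 0;
        notCovered = 0;
        var seenStartBlocks = new HashSet<uint>();

        foreach (MethodSummary method in methods)
        {
            // HashSet.Add returns false for a start block that was already rolled up.
            if (!seenStartBlocks.Add(method.StartBlockIndex))
                continue;

            covered += method.BlocksCovered;
            notCovered += method.BlocksNotCovered;
        }
    }

    static void Main()
    {
        var methods = new List<MethodSummary>
        {
            // Two template instantiations folded into one copy share a start block.
            new MethodSummary { Name = "Swap<int*>",  StartBlockIndex = 10, BlocksCovered = 3, BlocksNotCovered = 1 },
            new MethodSummary { Name = "Swap<char*>", StartBlockIndex = 10, BlocksCovered = 3, BlocksNotCovered = 1 },
            new MethodSummary { Name = "Clamp<int>",  StartBlockIndex = 14, BlocksCovered = 0, BlocksNotCovered = 4 },
        };

        uint covered, notCovered;
        RollUp(methods, out covered, out notCovered);

        // Prints "3 covered, 5 not covered"; the folded duplicate is counted once.
        Console.WriteLine("{0} covered, {1} not covered", covered, notCovered);
    }
}
```

A naive sum over the same three entries would report 6 covered and 6 not covered, which is the kind of mismatch between per-method and module totals described above.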

I’ll leave filtering and more complicated analysis (e.g. rollup statistics) as an exercise for the reader.  Hopefully this will serve as a good starting point for those interested in programmatically analyzing code coverage data using Visual Studio 2010.