Converting a managed PDB into an XML file.

I wrote some C# sample code to get an ISymbolReader from a managed PDB (Program Database) file and then dump it back out as XML. 
The managed PDB stores all the source-level debugging information, such as:
- the mapping between IL offsets and source lines. (Normally a compiler builds this mapping automatically, though C#'s #line directive lets you specify it explicitly.)
- the entry point method, if present (e.g., "Main" in C#). DLLs do not have one.
- the names of local variables.
PDBs basically pick up where Reflection and Metadata leave off. That's because Reflection / Metadata only include the information needed to execute the program and perform some runtime services. We didn't want to bloat managed executables with additional information beyond that. (In retrospect, we've learned that there's a niche set of users who really wish all this information were available from Reflection.)

ISymbolReader is the entry-point interface for accessing information in a PDB. The interfaces are unfortunately defined in mscorlib (in the System.Diagnostics.SymbolStore namespace). They actually wrap the unmanaged symbol store interfaces declared in CorSym.idl.  MDbg beta 2 provides implementations of these interfaces using COM interop to access the underlying unmanaged interfaces.  It turns out the CLR redist includes an implementation too, but that implementation is lame. ILDasm uses the unmanaged interfaces to read PDBs (enable "View : Show Source Lines") so that it can print source-level information alongside the IL.

Sample code is here, and it requires a reference to MDbg beta 2 for the managed wrappers. It's also FxCop clean. [Update 1/26/06: the sample code is updated to handle post-beta2 MDbg breaking changes and to not crash on methods without symbols.]

Here's the demo!
Here's a test source file (test.cs) whose PDB we'll inspect. Compile it with: csc test.cs /debug+

using System;
class Foo
{
static void Main()
{
        int x =4;
        {
                int y  = 5;
        }
        Console.WriteLine("Boo!");
}

static void Second()
{
        for(int x = 5; x < 7; x++)
}
}

And after running the tool:
    Pdb2Xml.exe test.exe test.xml
And here's the XML output:

<!--This is an XML file representing the PDB for 'c:\temp\test.exe'-->
<symbols file="c:\temp\test.exe">
  <!--This is a list of all source files referred by the PDB.-->
    <file id="1" name="c:\temp\test.cs" />
  <!--This is the token for the 'entry point' method, which is the method that will be called when the assembly is loaded. This usually corresponds to 'Main'-->
  <!--This is a list of all methods in the assembly that matches this PDB.-->
  <methods>
    <!--For each method, we provide the sequence tables that map from IL offsets back to source.-->
    <method name="Foo.Main" token="0x6000001">
      <sequencepoints total="7">
        <entry il_offset="0" start_row="5" start_column="1" end_row="5" end_column="2" file_ref="1" />
        <entry il_offset="1" start_row="6" start_column="2" end_row="6" end_column="11" file_ref="1" />
        <entry il_offset="3" start_row="7" start_column="2" end_row="7" end_column="3" file_ref="1" />
        <entry il_offset="4" start_row="8" start_column="3" end_row="8" end_column="14" file_ref="1" />
        <entry il_offset="6" start_row="9" start_column="2" end_row="9" end_column="3" file_ref="1" />
        <entry il_offset="7" start_row="10" start_column="2" end_row="10" end_column="28" file_ref="1" />
        <entry il_offset="18" start_row="11" start_column="1" end_row="11" end_column="2" file_ref="1" />
      </sequencepoints>
      <locals>
        <local name="x" il_index="0" il_start="0" il_end="19" />
        <local name="y" il_index="1" il_start="3" il_end="7" />
      </locals>
    </method>
    <method name="Foo.Second" token="0x6000002">
      <sequencepoints total="9">
        <entry il_offset="0" start_row="14" start_column="1" end_row="14" end_column="2" file_ref="1" />
        <entry il_offset="1" start_row="15" start_column="2" end_row="15" end_column="30" file_ref="1" />
        <entry il_offset="12" start_row="16" start_column="6" end_row="16" end_column="16" file_ref="1" />
        <entry il_offset="14" hidden="true" />
        <entry il_offset="16" start_row="17" start_column="3" end_row="17" end_column="24" file_ref="1" />
        <entry il_offset="23" start_row="16" start_column="24" end_row="16" end_column="27" file_ref="1" />
        <entry il_offset="27" start_row="16" start_column="17" end_row="16" end_column="22" file_ref="1" />
        <entry il_offset="32" hidden="true" />
        <entry il_offset="35" start_row="18" start_column="1" end_row="18" end_column="2" file_ref="1" />
      </sequencepoints>
      <locals>
        <local name="CS$4$0000" il_index="1" il_start="0" il_end="36" />
        <local name="x" il_index="0" il_start="12" il_end="35" />
      </locals>
    </method>
  </methods>
</symbols>

And just to confirm, here's the ILDasm output for Foo.Main:

.method /*06000001*/ private hidebysig static 
        void  Main() cil managed
{
  // Code size       19 (0x13)
  .maxstack  1
  .locals /*11000001*/ init ([0] int32 x,
           [1] int32 y)
  .language '{3F5162F8-07C6-11D3-9053-00C04FA302A1}', '{994B45C4-E6E9-11D2-903F-00C04FA302A1}', '{5A869D0B-6611-11D3-BD2A-0000F80849BD}'
// Source File 'c:\temp\test.cs' 
//000005: {
  IL_0000:  nop
//000006:  int x =4;   
  IL_0001:  ldc.i4.4
  IL_0002:  stloc.0
//000007:    { 
  IL_0003:  nop
//000008:      int y  = 5;
  IL_0004:  ldc.i4.5
  IL_0005:  stloc.1
//000009:     }
  IL_0006:  nop
//000010:   Console.WriteLine("Boo!");
  IL_0007:  ldstr      "Boo!" /* 70000001 */
  IL_000c:  call       void [mscorlib/*23000001*/]System.Console/*01000005*/::WriteLine(string) /* 0A000003 */
  IL_0011:  nop
//000011: }
  IL_0012:  ret
} // end of method Foo::Main
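
The ILDasm listing shows how the sequence table gets used: an instruction belongs to the entry with the largest il_offset that is less than or equal to the instruction's own offset. Here is a minimal sketch of that lookup, using the Foo.Main offsets and rows copied from the XML output above. The class and method names are made up for illustration, and a real lookup would also have to skip hidden entries such as the ones in Foo.Second.

```csharp
using System;

static class SequencePointLookup
{
    // il_offset and start_row values copied from the Foo.Main table above.
    static readonly int[] IlOffsets = { 0, 1, 3, 4, 6, 7, 18 };
    static readonly int[] StartRows = { 5, 6, 7, 8, 9, 10, 11 };

    // Returns the start row of the entry with the largest il_offset that is
    // <= ilOffset, or -1 if ilOffset is before the first entry.
    public static int RowForOffset(int ilOffset)
    {
        int index = Array.BinarySearch(IlOffsets, ilOffset);
        if (index < 0)
        {
            // Not an exact hit: BinarySearch returns the bitwise complement
            // of the index of the first larger element, so step back one.
            index = ~index - 1;
        }
        return index >= 0 ? StartRows[index] : -1;
    }

    static void Main()
    {
        // IL_0005 (stloc.1) sits between entries 4 and 6, so it maps to row 8,
        // which matches "int y  = 5;" in the ILDasm listing.
        Console.WriteLine(RowForOffset(0x05));
        // IL_0011 (nop) is still part of the Console.WriteLine line, row 10.
        Console.WriteLine(RowForOffset(0x11));
    }
}
```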

Running XML queries:
XML has lots of neat associated technologies. For example, you can run XPath queries against the output to answer questions about the PDB. This is horribly inefficient, but very cool. Here are some sample queries:
Get all locals that are active at IL offset 2:
    /symbols/methods/method/locals/local[@il_start<="2" and @il_end>="2"]/@name

Get all filenames referenced from the PDB:

Find all methods that have code in a given filename (t.cs).

Get the name of all methods.

Find the name of the entry point token:

Name all locals in method Foo.Main

Get start row for IL offset 4 in method Foo.Main

The sample code includes these queries and others.
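
Queries like these can be run from C# with the System.Xml.XPath classes. The following is a minimal sketch, not the sample's actual code: the class name PdbXmlQueries and the embedded XML are made up for illustration. The XML is a cut-down copy of the Foo.Main output and assumes the methods and locals wrapper elements implied by the query path above. The second query is one possible formulation of the "start row for IL offset 4" question, not necessarily the one the sample uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class PdbXmlQueries
{
    // Cut-down copy of the tool's output (Foo.Main only, two sequence points).
    const string Xml = @"<symbols file='c:\temp\test.exe'>
  <methods>
    <method name='Foo.Main' token='0x6000001'>
      <sequencepoints total='2'>
        <entry il_offset='1' start_row='6' start_column='2' end_row='6' end_column='11' file_ref='1' />
        <entry il_offset='4' start_row='8' start_column='3' end_row='8' end_column='14' file_ref='1' />
      </sequencepoints>
      <locals>
        <local name='x' il_index='0' il_start='0' il_end='19' />
        <local name='y' il_index='1' il_start='3' il_end='7' />
      </locals>
    </method>
  </methods>
</symbols>";

    // Runs an XPath query and returns the string value of every selected node.
    public static string[] Query(string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        XPathNavigator nav = doc.CreateNavigator();
        List<string> results = new List<string>();
        foreach (XPathNavigator node in nav.Select(xpath))
        {
            results.Add(node.Value);
        }
        return results.ToArray();
    }

    static void Main()
    {
        // Locals whose IL range covers offset 2 (only x qualifies; y starts at 3).
        string active = "/symbols/methods/method/locals/local[@il_start<='2' and @il_end>='2']/@name";
        Console.WriteLine(string.Join(", ", Query(active)));

        // Start row for IL offset 4 in Foo.Main.
        string row = "/symbols/methods/method[@name='Foo.Main']/sequencepoints/entry[@il_offset='4']/@start_row";
        Console.WriteLine(string.Join(", ", Query(row)));
    }
}
```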

Other commentary:
At first I was hoping to create an XmlReader over a PDB store. It turns out this is very difficult. XmlReader has 25+ abstract methods, and creating an XmlTextReader over a text stream is no better: to produce that stream you'd have to emit raw XML text, and if you're writing XML, you really want a structured writer like XmlWriter. So I settled on just using an XmlWriter instead of a reader.
I also thought it would ultimately be better to expose the PDB via an XPathNavigator instead of an XmlReader/XmlWriter, due to the query-intensive purpose of a PDB. I went with XmlWriter because that was easiest and I needed to start somewhere. XML super gurus out there are certainly free to grab this sample and produce an XPathNavigator for it.
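
As a rough illustration of the XmlWriter approach, here is a sketch that writes one method's sequence points in the same shape as the output above. The SeqPoint struct and SequencePointXml class are hypothetical stand-ins (the real sample pulls these values from the symbol reader). Treating a start line of 0xFEEFEE as hidden is an assumption about how the hidden="true" entries are detected; that value is the conventional line number for hidden sequence points.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical in-memory record for one sequence point.
struct SeqPoint
{
    public int IlOffset, StartRow, StartColumn, EndRow, EndColumn;

    public SeqPoint(int ilOffset, int startRow, int startColumn, int endRow, int endColumn)
    {
        IlOffset = ilOffset;
        StartRow = startRow;
        StartColumn = startColumn;
        EndRow = endRow;
        EndColumn = endColumn;
    }
}

static class SequencePointXml
{
    // Assumed marker: line number reported for hidden sequence points.
    const int HiddenLine = 0xFEEFEE;

    // Writes a <method> element with its <sequencepoints> table and returns the XML text.
    public static string Write(string methodName, SeqPoint[] points)
    {
        StringWriter text = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("method");
            writer.WriteAttributeString("name", methodName);
            writer.WriteStartElement("sequencepoints");
            writer.WriteAttributeString("total", XmlConvert.ToString(points.Length));
            foreach (SeqPoint p in points)
            {
                writer.WriteStartElement("entry");
                writer.WriteAttributeString("il_offset", XmlConvert.ToString(p.IlOffset));
                if (p.StartRow == HiddenLine)
                {
                    writer.WriteAttributeString("hidden", "true");
                }
                else
                {
                    writer.WriteAttributeString("start_row", XmlConvert.ToString(p.StartRow));
                    writer.WriteAttributeString("start_column", XmlConvert.ToString(p.StartColumn));
                    writer.WriteAttributeString("end_row", XmlConvert.ToString(p.EndRow));
                    writer.WriteAttributeString("end_column", XmlConvert.ToString(p.EndColumn));
                }
                writer.WriteEndElement(); // entry
            }
            writer.WriteEndElement(); // sequencepoints
            writer.WriteEndElement(); // method
        }
        return text.ToString();
    }

    static void Main()
    {
        // Two entries taken from the Foo.Second table: one normal, one hidden.
        SeqPoint[] points =
        {
            new SeqPoint(12, 16, 6, 16, 16),
            new SeqPoint(14, HiddenLine, 0, HiddenLine, 0)
        };
        Console.WriteLine(Write("Foo.Second", points));
    }
}
```

The writer takes care of escaping and of closing elements in the right order, which is the main reason to prefer it over building the text by hand.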

I think it would also be a great project to write an XML2PDB converter. Then you could use XSLT to transform PDBs. That's a project for another day. Offhand, I think the managed wrappers are not complete enough to do this. But since you've got the source for MDbg, you can always fix that.

There's a long list of things I'd like to get into regarding PDBs. My list for future blog entries includes:
- Why do managed PDBs need a metadata reader?
- How does the code to get an ISymbolReader work?
- What about PDB readers for in-memory modules?
- Why does MDbg have its own implementation of ISymbolReader?
- More on the PDB vs. metadata / reflection split.
- Other XML-isms (such as XSD, XSLT, and serialization on the XML documents above).