base64 Decode a Picture in XML

I taught a class on ASMX and WSE 2.0 the other day, and someone asked about a project he was working on involving Word 2003.  I usually cringe at the Office automation questions that are inevitable at training classes like this, because they usually involve automating mail merges or some obscure feature of Office that I have honestly never heard of.  But this one caught my ear.

A guy asked me how to use a picture saved in a Word file.  What caught my ear was that he said he saved the file as XML, then opened a picture in Notepad, copied the text from Notepad, and pasted it into Word... and wondered why this didn't work.  I started to explain base64 encoding, the XML schema of a Word 2003 document, and why the approach wouldn't work, but I realized this wasn't what he was asking at all.  What he was asking was simply, "How do I use a picture stored in a Word 2003 XML document?"  This can be done without fully understanding base64 encoding or the Word 2003 schema, if you know where to look.

Create a Word 2003 document and paste a picture into it.  Save the document as XML.  Now open the saved XML file using notepad.exe.  You will see the following structure in the document:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="" xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:sl=""
xmlns:aml="" xmlns:wx=""
xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no" xml:space="preserve">
<w:rFonts w:ascii="Verdana" w:h-ansi="Verdana" />
<wx:font wx:val="Verdana" />
<w:color w:val="000000" />
<v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe"
filled="f" stroked="f"></v:shapetype>
<w:binData w:name="wordml://01000001.gif"> . . . base64 encoded data here</w:binData>
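
For context, the text inside binData is nothing more than the picture's raw bytes written out as base64 text, which is why pasting what Notepad shows for an image file does not work.  Here is a minimal sketch of the round trip using the framework's Convert class.  The six-byte "GIF89a" signature stands in for a whole picture, and the class name is just for illustration.

```csharp
using System;
using System.Text;

class Base64RoundTrip
{
    static void Main()
    {
        // The first six bytes of a GIF89a file, standing in for a whole picture.
        byte[] raw = Encoding.ASCII.GetBytes("GIF89a");

        // This is the kind of text Word writes inside the binData element.
        string encoded = Convert.ToBase64String(raw);
        Console.WriteLine(encoded);   // R0lGODlh

        // Decoding gives back exactly the original bytes.
        byte[] decoded = Convert.FromBase64String(encoded);
        Console.WriteLine(Encoding.ASCII.GetString(decoded));   // GIF89a
    }
}
```

If the binData content in your own file starts with R0lGOD, you are most likely looking at an embedded GIF.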

The key is that the base64 encoded data is stored within the binData element, and this is really the only bit of data that the guy asking the question was concerned with.  Now create a Windows Forms application in Visual Studio .NET.  In the form's code file, add a using directive for System.Xml.  Add a PictureBox control to the form.  Add a method to the code-behind and add the following code to the method:

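XmlTextReader reader = new XmlTextReader(@"C:\temp\xmlfile1.xml");
reader.WhitespaceHandling = WhitespaceHandling.None;

// Atomized names can be compared to reader.LocalName by reference.
object body = reader.NameTable.Add("body");
object pict = reader.NameTable.Add("pict");
object shapetype = reader.NameTable.Add("shapetype");

bool inBody = false;
while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Element) continue;
    // We now have the body element.
    if (body == reader.LocalName) inBody = true;
    // Stop on the first pict element inside the body.
    if (inBody && pict == reader.LocalName) break;
}

// Move to the first child of pict, normally the shapetype element.
reader.Read();
// Skip the shapetype element if it is present.
if (shapetype == reader.LocalName) reader.Skip();

// We are now on the binData element.  Decode it in chunks so that
// pictures larger than the buffer are not truncated.
System.IO.MemoryStream mem = new System.IO.MemoryStream();
byte[] b = new byte[5000];
int len;
while ((len = reader.ReadBase64(b, 0, b.Length)) > 0)
{
    mem.Write(b, 0, len);
}
reader.Close();
mem.Position = 0;

System.Drawing.Image pic = System.Drawing.Image.FromStream(mem);
this.pictureBox1.Image = pic;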

I put the code into a Button1_Click event handler.  When I click the button, the picture that was stored in the XML document is displayed in the PictureBox control.
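
If streaming is not a concern, the same result can probably be had with even less navigation code.  The sketch below is an alternative, not the approach used above.  It loads the whole file into an XmlDocument, finds the first binData element by local name (so the namespace URI is not needed), and decodes it in one call.  The class name, the ReadFirstPicture helper, and the output path C:\temp\picture.gif are hypothetical, and the sketch assumes the document is small enough to hold in memory.

```csharp
using System;
using System.IO;
using System.Xml;

public class BinDataExtractor
{
    // Returns the decoded bytes of the first binData element,
    // or null if the document does not contain one.
    public static byte[] ReadFirstPicture(string xmlPath)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);

        // local-name() matches w:binData without a namespace manager.
        XmlNode binData = doc.SelectSingleNode("//*[local-name()='binData']");
        if (binData == null)
        {
            return null;
        }

        // FromBase64String ignores line breaks and spaces in the encoded text.
        return Convert.FromBase64String(binData.InnerText);
    }

    static void Main()
    {
        byte[] picture = ReadFirstPicture(@"C:\temp\xmlfile1.xml");
        if (picture == null)
        {
            Console.WriteLine("No binData element found.");
            return;
        }

        // Hypothetical output path; the extension should match the
        // w:name attribute of the binData element.
        using (FileStream fs = new FileStream(@"C:\temp\picture.gif", FileMode.Create))
        {
            fs.Write(picture, 0, picture.Length);
        }
    }
}
```

The trade-off is memory: XmlDocument builds the entire tree before anything is decoded, while the XmlTextReader version only ever holds one buffer's worth of base64 data at a time.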

The more amazing thing is that I saw multiple libraries for sale on the internet that did this same thing, despite the fact that it takes only a couple of lines of code to achieve because the functionality is already included in the framework class library.  Maybe I can figure out how to sell a component that builds strings via an efficient buffer like StringBuilder... or writes XML to a stream like XmlWriter... or maybe I can sell a component that parses XML!  Yeah... nobody thinks to look in something as obvious as System.Xml, right?