Images in Open XML documents

Images are one of the basic elements of a document, and the use of images in documents continues to grow. Just a few years ago, it was relatively uncommon to have an image in a word-processing document, and downright rare to see one in a spreadsheet. Now images are commonplace in all types of documents, and they've become an expected component of professional-looking business documents.

As Brian Jones has explained on his blog, it's pretty easy to embed an image in a WordprocessingML document that you're generating from your own code. You just insert some markup in the document body where you want the image to appear, add a relationship to the image part, define a content type for that part, and you have an image in the document.

The relationship-based structure of Open XML documents allows for a lot of flexibility. In the case of an embedded image, you can have that relationship point to an image inside the document itself (as in Brian's example), an external image on your local hard drive, or an external image located on a web server.

Sample document: Images.docx

The attached sample document shows how these three approaches can be implemented in a programmatically generated WordprocessingML document. Unzip the attached Images.zip file into a folder, and you'll see two files: a WordprocessingML document (Images.docx) and an image (external-local.jpg). When you open the DOCX in Word 2007, you'll see something like the screenshot shown here. If you don't have internet connectivity, that third image won't appear; more on that in a minute. (Frankly, if you don't have internet connectivity, I'm not sure how you're reading this.)

If you give the Images.docx file a .zip extension and drill down into it, you'll see that its structure is pretty simple. There's a document.xml "start part," an embedded image (that first one, internal.jpg), a content-types item, and two relationship parts. Let's look at the key details in the content types, document body, and relationships.
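Renaming isn't strictly required, since any ZIP library can read the package directly. Here's a minimal sketch in Python (chosen only for illustration) that lists the items in a package. The helper name `list_parts` is hypothetical, not part of any Open XML library.

```python
import zipfile


def list_parts(docx):
    """Return the names of all items stored in an Open XML package.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()
```

Calling `list_parts("Images.docx")` from the folder where you unzipped the sample should show the items described above. The exact folder layout inside the package is not spelled out here, so check the output rather than assuming it.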

Content Types

The content types definition is very simple: all three images use the same definition, because they're all the same type of content regardless of where each one happens to be stored. Here's the content-type definition from [Content_Types].xml for the jpg extension:

 <Default Extension="jpg" ContentType="image/jpeg" />
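To see how a consumer might use that definition, here's a small Python sketch that maps file extensions to their default content types. The `SAMPLE` string is a trimmed-down stand-in for [Content_Types].xml, not the exact contents of the sample file, and `default_content_types` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in an OPC package.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def default_content_types(content_types_xml):
    """Map each file extension to the content type declared in a Default element."""
    root = ET.fromstring(content_types_xml)
    return {
        default.get("Extension").lower(): default.get("ContentType")
        for default in root.findall(f"{{{CT_NS}}}Default")
    }


# Illustrative stand-in for the real file (assumption, not copied from the sample).
SAMPLE = f"""<Types xmlns="{CT_NS}">
  <Default Extension="jpg" ContentType="image/jpeg" />
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml" />
</Types>"""

print(default_content_types(SAMPLE)["jpg"])  # prints image/jpeg
```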

Main Document Body

In the main document body, document.xml, you'll see that the manner in which each image is embedded doesn't vary. It's the same markup to embed an internal image, an external/local image, or an external/web image:

 <w:body>
  <w:p>
    <w:r>
      <w:t>Internal image stored inside Images.docx:</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:pict>
        <v:shape id="myShape1" type="#_x0000_t75" style="width:400; height:240">
          <v:imagedata r:id="rId1"/>
        </v:shape>
      </w:pict>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:t>External image stored in the local file system:</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:pict>
        <v:shape id="myShape2" type="#_x0000_t75" style="width:400; height:240">
          <v:imagedata r:id="rId2"/>
        </v:shape>
      </w:pict>
    </w:r>
  </w:p>

  <w:p>
    <w:r>
      <w:t>External image on a web server:</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:pict>
        <v:shape id="myShape3" type="#_x0000_t75" style="width:400; height:240">
          <v:imagedata r:id="rId3"/>
        </v:shape>
      </w:pict>
    </w:r>
  </w:p>
</w:body>
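Because the body only refers to images by relationship ID, a program can find every image reference without knowing where the images live. The following Python sketch collects the r:id values from the v:imagedata elements. The `SAMPLE` markup is a shortened stand-in that assumes the usual w:document root and namespace declarations, and `image_relationship_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "v": "urn:schemas-microsoft-com:vml",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}


def image_relationship_ids(document_xml):
    """Return the r:id of every v:imagedata element, in document order."""
    root = ET.fromstring(document_xml)
    rid_attr = f"{{{NS['r']}}}id"
    return [el.get(rid_attr) for el in root.iter(f"{{{NS['v']}}}imagedata")]


# Shortened stand-in for document.xml (assumption: two of the three images).
SAMPLE = f"""<w:document xmlns:w="{NS['w']}" xmlns:v="{NS['v']}" xmlns:r="{NS['r']}">
  <w:body>
    <w:p><w:r><w:pict>
      <v:shape id="myShape1" type="#_x0000_t75" style="width:400; height:240">
        <v:imagedata r:id="rId1"/>
      </v:shape>
    </w:pict></w:r></w:p>
    <w:p><w:r><w:pict>
      <v:shape id="myShape2" type="#_x0000_t75" style="width:400; height:240">
        <v:imagedata r:id="rId2"/>
      </v:shape>
    </w:pict></w:r></w:p>
  </w:body>
</w:document>"""

print(image_relationship_ids(SAMPLE))  # prints ['rId1', 'rId2']
```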

Document Relationships

There are only three document-level relationships defined (in document.xml.rels), one for each image. Note the TargetMode attribute, which specifies whether the image is stored inside or outside the document package (when it's omitted, as for rId1, the target is internal), and the Target attribute, which gives the path or URL of the image file:

 <Relationship Id="rId1"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
  Target="internal.jpg"/>
<Relationship Id="rId2"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
  Target="external-local.jpg"
  TargetMode="External"/>
<Relationship Id="rId3"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
  Target="http://www.mahugh.com/samples/external-http.jpg"
  TargetMode="External"/>
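Here's a short Python sketch of how a program might read these relationships and tell internal targets from external ones. It wraps the three relationships above in the standard Relationships root element and treats a missing TargetMode as Internal. The helper name `describe_relationships` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by .rels parts in an OPC package.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
IMAGE_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"


def describe_relationships(rels_xml):
    """Return (Id, Target, TargetMode) tuples; a missing TargetMode means Internal."""
    root = ET.fromstring(rels_xml)
    return [
        (rel.get("Id"), rel.get("Target"), rel.get("TargetMode", "Internal"))
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
    ]


SAMPLE = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{IMAGE_TYPE}" Target="internal.jpg"/>
  <Relationship Id="rId2" Type="{IMAGE_TYPE}" Target="external-local.jpg"
    TargetMode="External"/>
  <Relationship Id="rId3" Type="{IMAGE_TYPE}"
    Target="http://www.mahugh.com/samples/external-http.jpg"
    TargetMode="External"/>
</Relationships>"""

for rel_id, target, mode in describe_relationships(SAMPLE):
    print(rel_id, mode, target)
```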

The benefit of this approach: flexibility

This structure allows for many creative development scenarios:

  • You can embed images in the document when appropriate. For example, you might embed static images that should be tightly bound to the content of the document.
  • You can store images on the local hard drive when appropriate. For example, perhaps you're modifying or generating those images from another process that doesn't know how to put them in an OPC container. Or maybe you'd like to include an image in the document that you know is on users' hard drives already, without bloating the document by inserting it.
  • You can store images on a web server when appropriate. For example, the image may be highly dynamic, and you want to be able to update any number of distributed documents from a centralized location at any time. Or maybe you want to email a report to a colleague on Friday afternoon, and actually generate the embedded chart image over the weekend before they look at the document on Monday morning. (Hey, it's just an example!)

Another important benefit of the relationship-based approach is that the location of these images can be modified without making any changes to the document body itself. The main document body is typically the largest part in an Open XML document, and the relationships part is usually much smaller and simpler. The ability to modify the relationships part independently of the document body means you can even do this on non-.NET platforms, by simply opening the DOCX with any ZIP library and modifying the targets in the relationships part.
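As a rough illustration of that idea, here's a Python sketch that copies a package and changes the Target of one relationship along the way, leaving every other part untouched. The function name and its parameters are hypothetical, and the name of the relationships item inside the package (`rels_part`) is something you'd need to confirm by listing the package contents first.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def retarget_relationship(src, dst, rels_part, rel_id, new_target):
    """Copy package `src` to `dst`, pointing relationship `rel_id` at `new_target`.

    `src` and `dst` may be file paths or binary file-like objects.
    """
    # Serialize with the relationships namespace as the default namespace.
    ET.register_namespace("", REL_NS)
    with zipfile.ZipFile(src) as src_zip, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as dst_zip:
        for item in src_zip.infolist():
            data = src_zip.read(item.filename)
            if item.filename == rels_part:
                root = ET.fromstring(data)
                for rel in root.iter(f"{{{REL_NS}}}Relationship"):
                    if rel.get("Id") == rel_id:
                        rel.set("Target", new_target)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # All other items are copied byte for byte.
            dst_zip.writestr(item, data)
```

Writing to a new file rather than editing in place keeps the original document intact if something goes wrong. Note that this sketch only changes Target; switching an image between internal and external storage would also mean adding or removing the TargetMode attribute.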

Image handling is a very flexible aspect of the Open XML file formats. You can learn more by experimenting with the attached sample document. Here are a few things you can do to see how easy it is to put images in documents:

  • Replace internal.jpg in the package with another image of your choosing and then re-open Images.docx in Word.
  • Copy your own image over external-local.jpg and re-open Images.docx.
  • Modify the target of rId3 to point to a JPG image on your favorite web server.

P.S. The sample images in this document are from a series of pictures on my personal blog from various business trips this year. For those who are interested, here are links to more shots in each series:

  • Paris, France
  • Munich, Germany (Oktoberfest)
  • Sao Paulo, Brazil (Carnaval)

Images.zip