Example Word '12' document from Beta 1 with hyperlink and image

I just posted another example document, if anyone is interested:

http://jonesxml.com/resources/hyperlinkandimage.docx

For those of you who got a copy of Beta 1, the file will be compatible with your build, so you can open it and take a look. This is an extremely simple file that has a simple paragraph, another paragraph with a hyperlink, and an image. I posted this to show you guys a few things:

Open Packaging Conventions

As I've mentioned before, we use a simple set of conventions for structuring a document within a ZIP file. This file has some text, a hyperlink, and a picture, and the Open Packaging Conventions are used to tie that all together.

Go ahead and rename the file to have a ".zip" extension and open it up. You'll notice there is a file there called [Content_Types].xml. That file describes the content types of the other parts within the package. Now look at the _rels folder. The file _rels/.rels is the first place you go to start parsing the file. It's an XML file that describes all the root-level relationships, and if you open it you can see that the first part you need to parse in order to read the document is "document.xml".
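If you'd rather poke at the package from code than from a ZIP tool, a minimal Python sketch of that first step could look like the following. It only uses the standard library. The relationship namespace is the one Beta 1 relationship files use (it appears in the document.xml.rels listing later in this post). The sample package is built in memory, and its relationship Type (urn:example:main-document) and Target (word/document.xml) are made-up stand-ins, not values read from the real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Relationship namespace as used by the Beta 1 relationship files.
REL_NS = "http://schemas.microsoft.com/package/2005/06/relationships"


def root_relationships(package):
    """Return (Id, Type, Target) tuples from the package's _rels/.rels part."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
    ]


def build_sample_package():
    """Build a tiny in-memory stand-in for the real file (contents are made up)."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId1" Type="urn:example:main-document" '
        'Target="word/document.xml"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("_rels/.rels", rels)
        zf.writestr("word/document.xml", "<doc/>")
    buf.seek(0)
    return buf


sample_rels = root_relationships(build_sample_package())
for rel_id, rel_type, target in sample_rels:
    print(rel_id, rel_type, target)
```

Python's zipfile module goes by the file contents rather than the extension, so passing the path of the downloaded .docx to root_relationships should work without renaming it first.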

Use of relationships

Open the "document.xml" part and take a look:

<w:wordDocument
    xmlns:r="http://schemas.microsoft.com/office/2005/11/relationships"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:w="http://schemas.microsoft.com/office/word/2005/10/wordml">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:hyperlink r:id="rId2">
        <w:r>
          <w:rPr>
            <w:color w:val="0000FF"/>
            <w:u w:val="single"/>
          </w:rPr>
          <w:t>Click here for Brian Jones' blog.</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>
    <w:p>
      <w:r>
        <w:pict>
          <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
            <v:imagedata r:id="rId4"/>
          </v:shape>
        </w:pict>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>

The first paragraph isn't really all that interesting, but the next two definitely are. Look at the attributes called r:id. Those are relationship references. Any reference from one part in the file to another part has to be done via a relationship. The really cool part is that all relationships live out on their own, so you can quickly scan a package and figure out all the parts that make up that document and how they relate without having to actually go into the application XML.

Want to know where those relationships actually point? The name of this part is "document.xml", so the relationship file is going to be called "_rels/document.xml.rels". That's how you find the relationships for any part: go to the _rels folder that sits in the same folder as the part, and find the part with the same name plus ".rels" at the end. This is all described in the Open Packaging Conventions, but it's pretty straightforward.
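That naming rule is mechanical enough to write down as a tiny helper. This is just a sketch of the rule as described here, and the function name rels_part_name is made up for the example:

```python
import posixpath


def rels_part_name(part_name):
    """Return the name of the relationship part that belongs to part_name."""
    # Part names inside the package use forward slashes, hence posixpath.
    folder, file_name = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", file_name + ".rels")


print(rels_part_name("document.xml"))       # _rels/document.xml.rels
print(rels_part_name("word/document.xml"))  # word/_rels/document.xml.rels
```

As a side effect, passing an empty part name gives "_rels/.rels", which lines up with the root-level relationship file.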

Here's what the relationship file looks like:

<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">

  <Relationship
      Id="rId2"
      Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
      Target="http://blogs.msdn.com/brian_jones"
      TargetMode="External"/>

  <Relationship
      Id="rId4"
      Type="http://schemas.microsoft.com/office/2006/relationships/image"
      Target="image1.jpg"/>

</Relationships>

You can see that even external references are done via relationships. This means that if you want to do link fix-up, or even just quickly scan a document to see what it points at, you don't need to parse all the application XML. Instead, you just scan the relationship files. You can also modify the relationships just as easily if you want to change a server name or something. Every relationship has an Id, a Type, and a Target. If the relationship points to an external source, then it also has the TargetMode attribute set to "External".
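To make the link fix-up idea concrete, here's a rough Python sketch that works purely on the relationship XML shown above. The helper names (external_targets, retarget_server) and the replacement server http://example.com are invented for illustration.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.microsoft.com/package/2005/06/relationships"
REL_TAG = f"{{{REL_NS}}}Relationship"

# The relationship file from this post, inlined so the sketch is self-contained.
RELS_XML = """\
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId2"
      Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
      Target="http://blogs.msdn.com/brian_jones"
      TargetMode="External"/>
  <Relationship Id="rId4"
      Type="http://schemas.microsoft.com/office/2006/relationships/image"
      Target="image1.jpg"/>
</Relationships>
"""


def external_targets(rels_root):
    """Map relationship Id to Target for every external relationship."""
    return {
        rel.get("Id"): rel.get("Target")
        for rel in rels_root.iter(REL_TAG)
        if rel.get("TargetMode") == "External"
    }


def retarget_server(rels_root, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix on external targets; return the count."""
    changed = 0
    for rel in rels_root.iter(REL_TAG):
        target = rel.get("Target", "")
        if rel.get("TargetMode") == "External" and target.startswith(old_prefix):
            rel.set("Target", new_prefix + target[len(old_prefix):])
            changed += 1
    return changed


rels_root = ET.fromstring(RELS_XML)
links_before = external_targets(rels_root)
changed = retarget_server(rels_root, "http://blogs.msdn.com", "http://example.com")
links_after = external_targets(rels_root)
print(links_before, changed, links_after)
```

Notice that the application XML in document.xml never gets touched: the r:id="rId2" reference stays the same and only the relationship's Target changes.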

Another thing you may notice from this is that the actual part names don't really matter. We output all of our files in a fairly nice structure with folders, etc., but we don't require that structure. The only thing you need to worry about is the relationship structure. You could change that folder called "word" to be "myownfolder", and as long as the relationships were updated to account for this, everything would continue to work. That means if you want to replace the picture I put in there with another one, you could just drop it into the package, and then update the document.xml.rels file to point at your new picture instead of the old one.
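Here's a hedged sketch of that picture swap in Python. ZIP files can't really be edited in place with the standard library, so it copies the package and rewrites the one relationship on the way through. The function name swap_image, the part names (word/image1.jpg, word/photo2.jpg), and the in-memory sample package are all assumptions made for the example, not the layout of the real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.microsoft.com/package/2005/06/relationships"
REL_TAG = f"{{{REL_NS}}}Relationship"
ET.register_namespace("", REL_NS)  # keep the default namespace when re-serializing


def swap_image(src, dst, rels_name, rel_id, new_part_name, new_target, new_bytes):
    """Copy package src to dst, add a new part, and point rel_id at new_target."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == rels_name:
                root = ET.fromstring(data)
                for rel in root.iter(REL_TAG):
                    if rel.get("Id") == rel_id:
                        rel.set("Target", new_target)
                data = ET.tostring(root, encoding="utf-8")
            zout.writestr(item.filename, data)
        zout.writestr(new_part_name, new_bytes)


def build_sample():
    """Made-up miniature package with one image relationship."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId4" '
        'Type="http://schemas.microsoft.com/office/2006/relationships/image" '
        'Target="image1.jpg"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<doc/>")
        zf.writestr("word/_rels/document.xml.rels", rels)
        zf.writestr("word/image1.jpg", b"old image bytes")
    buf.seek(0)
    return buf


dst = io.BytesIO()
swap_image(build_sample(), dst, "word/_rels/document.xml.rels", "rId4",
           "word/photo2.jpg", "photo2.jpg", b"new image bytes")
dst.seek(0)
with zipfile.ZipFile(dst) as zf:
    names = zf.namelist()
    new_rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
targets = {rel.get("Id"): rel.get("Target") for rel in new_rels.iter(REL_TAG)}
print(names, targets)
```

The sketch leaves the old picture in the package as an unreferenced part. If the new picture used a different file extension, [Content_Types].xml would presumably need an entry for it as well, which this example doesn't handle.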

Formatting

Now take a look at the second paragraph with the hyperlink. When you open this in Word, the text will have a blue color and underline applied to make it look like a hyperlink. That isn't because it has a hyperlink applied, but because it has that formatting applied to it directly. If you were to create this file directly from Word, Word would have used a style instead of direct formatting, but I wanted to show the difference between styles and direct formatting.

In Word, there are a number of different ways you can apply formatting to a document. One way is with styles. There are all kinds of styles: paragraph styles, list styles, table styles, and character styles. If some text has a style applied to it, then the WordprocessingML for it would look something like this:

<w:r>
  <w:rPr>
    <w:rStyle w:val="Hyperlink"/>
  </w:rPr>
  <w:t>Click here for Brian Jones' blog.</w:t>
</w:r>

If that were the case, then there would also be a styles.xml part in the package that described the Hyperlink style. In that part, there would be a style definition that would look like this:

<w:style w:type="character" w:styleId="Hyperlink">
  <w:name w:val="Hyperlink"/>
  <w:rPr>
    <w:color w:val="0000FF"/>
    <w:u w:val="single"/>
  </w:rPr>
</w:style>

A character style (like this one) has an ID, which is what runs use to reference it, as well as a name, which is the friendly display name. The two are usually the same, but at times they need to be different (internationalization, etc.).
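If you wanted to follow that style reference from code, a minimal Python sketch might look like this. It assumes the style reference on the run is written as w:val, like the other attributes, and that the style definitions sit under a w:styles root element; that root name and the helper names are assumptions for the example, since the post only shows the w:style fragment.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace as it appears in the Beta 1 document.xml.
W = "http://schemas.microsoft.com/office/word/2005/10/wordml"

# Assumed wrapper: the w:styles root is a guess, the w:style body is from the post.
STYLES_XML = f"""\
<w:styles xmlns:w="{W}">
  <w:style w:type="character" w:styleId="Hyperlink">
    <w:name w:val="Hyperlink"/>
    <w:rPr>
      <w:color w:val="0000FF"/>
      <w:u w:val="single"/>
    </w:rPr>
  </w:style>
</w:styles>
"""

RUN_XML = f"""\
<w:r xmlns:w="{W}">
  <w:rPr>
    <w:rStyle w:val="Hyperlink"/>
  </w:rPr>
  <w:t>Click here for Brian Jones' blog.</w:t>
</w:r>
"""


def w(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def run_style_id(run):
    """Return the style ID referenced by the run, or None if it has none."""
    r_style = run.find(f"{w('rPr')}/{w('rStyle')}")
    return None if r_style is None else r_style.get(w("val"))


def style_properties(styles_root, style_id):
    """Return {property name: w:val} from the rPr of the style with this ID."""
    for style in styles_root.iter(w("style")):
        if style.get(w("styleId")) == style_id:
            r_pr = style.find(w("rPr"))
            if r_pr is None:
                return {}
            return {child.tag.split("}")[1]: child.get(w("val")) for child in r_pr}
    return {}


style_id = run_style_id(ET.fromstring(RUN_XML))
props = style_properties(ET.fromstring(STYLES_XML), style_id)
print(style_id, props)
```

The lookup goes through w:styleId rather than w:name, since the ID is what the run stores.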

As I already said though, in my example I used direct formatting rather than styles. That's really not a call we make in Word; it's up to the user and the template author. If people use styles, the formatting for that text isn't stored in the document.xml part. It lives in the styles.xml part instead, and the text only carries a reference to the style. If they use direct formatting, the formatting is stored right on the text in the document.xml part. If you aren't aware of the difference between direct formatting and styles, it's pretty straightforward. If you use the style picker to apply a style like "emphasis" or "heading", then we store that style name on the text, and the formatting information is stored with the style itself. If you instead press the "B" button to make the text bold, or choose a color to apply, then you haven't applied a style. Instead, you've specified that the selected text should have those specific properties stored directly on it.

That's what I did with this example. I applied the formatting properties directly on the text. Instead of putting a style reference on the run and storing the color and underline values on the style, I took the entire <w:rPr> tag from the style and moved it down to the text run, so it looks like this:

<w:r>
  <w:rPr>
    <w:color w:val="0000FF"/>
    <w:u w:val="single"/>
  </w:rPr>
  <w:t>Click here for Brian Jones' blog.</w:t>
</w:r>

The Word structure is actually really simple. You have a few core objects: p (paragraphs), r (text runs), tbl (tables), as well as other things like sections, table rows, table cells, text boxes, etc. Each of these core objects is represented by its own XML element, and any formatting or other properties applied to that object are stored in the object's property bag: pPr (paragraph properties), rPr (run properties), and tblPr (table properties). If you want to apply formatting to a run of text, all you do is edit the rPr tag for that run.
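As a rough illustration of "edit the rPr tag for that run", here's a small Python sketch that adds the same color and underline properties to the plain "Hello World!" run. The helper name set_run_property is made up, and the sketch assumes rPr belongs at the start of the run, ahead of the text, as in the examples in this post.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace as it appears in the Beta 1 document.xml.
W = "http://schemas.microsoft.com/office/word/2005/10/wordml"
ET.register_namespace("w", W)  # serialize with the familiar w: prefix


def w(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


def set_run_property(run, name, val=None):
    """Create or update <w:name> inside the run's rPr, creating rPr if needed."""
    r_pr = run.find(w("rPr"))
    if r_pr is None:
        r_pr = ET.Element(w("rPr"))
        run.insert(0, r_pr)  # property bag goes before the run's text
    prop = r_pr.find(w(name))
    if prop is None:
        prop = ET.SubElement(r_pr, w(name))
    if val is not None:
        prop.set(w("val"), val)
    return prop


run = ET.fromstring(f'<w:r xmlns:w="{W}"><w:t>Hello World!</w:t></w:r>')
set_run_property(run, "color", "0000FF")
set_run_property(run, "u", "single")
result = ET.tostring(run, encoding="unicode")
print(result)
```

The same pattern should carry over to pPr on a paragraph or tblPr on a table, since each one is just the property bag of its parent element.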

We'll cover much more of this over the coming months, but the main thing I want to make clear is this: as you look at the WordprocessingML format, focus on understanding the core structures. Everything else is just a property of one of those structures.

-Brian