MS Office Open XML Formats and OpenDocument XML format
I've had a number of questions and e-mails asking if the new Office Open XML formats are going to be the same as the OASIS OpenDocument format. Rather than reply to the various comments and e-mails separately, I figured I'd just attempt to summarize everything as a new post. Sorry it took so long to reply to this issue, but I have been distracted by TechEd for the past week. Scoble actually swung by and talked with Jean Paoli and me about this just before I left for TechEd. You can watch that video here: https://channel9.msdn.com/ShowPost.aspx?PostID=76169
The primary question I've been getting is whether or not the two formats are the same. They are very different: both use ZIP and XML, but they use different schemas. The basis for the OpenDocument format work was the OpenOffice.org XML file format (http://www.oasis-open.org/committees/office/faq.php), which I believe originated with the StarOffice product, and the goal of that group was to create an open and interoperable format. Similarly, our goal at Microsoft has also been to create an open and interoperable format. That’s why we made such a big push to use both ZIP and XML, because they are already so widely in use. A lot of other people in the industry also combine XML with ZIP to create XML-based formats, for example in the CAD industry. It’s a great combination because XML compresses so well with ZIP, and ZIP provides an easy-to-use container. That wide use makes it easier for people to take our formats and build on top of them. This is where the similarity between the two formats stops, though. Our primary goal at Microsoft was to create an open format that fully represented all of the features that our customers have used in their existing documents, documents that have been created using the existing Office products over the past couple of decades.
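To make the ZIP plus XML idea a bit more concrete, here is a minimal Python sketch of an XML part stored inside a ZIP container and read back out. The part name (content.xml) and the element names (document, para) are made up for illustration; they are not taken from the Office or OpenDocument schemas.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def write_package(parts):
    """Pack a dict of {part name: XML text} into an in-memory ZIP file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        for name, xml_text in parts.items():
            package.writestr(name, xml_text)
    return buffer.getvalue()


def read_part(package_bytes, name):
    """Open the ZIP container and parse one XML part from it."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return ET.fromstring(package.read(name))


# Hypothetical part and element names, chosen only for this example.
body = "<document>" + "<para>Hello, world.</para>" * 200 + "</document>"
package_bytes = write_package({"content.xml": body})
root = read_part(package_bytes, "content.xml")

print("raw XML size:", len(body))
print("ZIP package size:", len(package_bytes))
print("paragraphs read back:", len(root.findall("para")))
```

Because XML markup is so repetitive, the packaged file comes out much smaller than the raw XML, and any tool with a ZIP library and an XML parser can get at the contents.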
Office has over 400 million customers, and we have a responsibility to continue to support all existing documents and all the existing functionality. There are billions of documents that we are going to help move into our new XML formats, and so a key constraint on all of our efforts was that these new formats had to support all those existing files and features with absolutely no loss. To give you an idea of how big an undertaking that can be, we have more than 1600 XML elements and attributes that reflect the features in Word alone in Office 2003. This is why we had to design a new format instead of shoehorning our features into another existing format (Jean Paoli explains this in the video on Channel9).
Let's talk a bit about the interoperability of the two formats, since that’s an important topic to be clear on. Because both formats are open and documented, it is possible to create a transform (or filter) that goes between the two. The interoperability problems will start to come up if there are features that are present in one application but not in the other. You have to assume this will be the case, since every application out there has a different set of customers that request different features. From the Microsoft point of view, we have built so many features over the years that it would be extremely unlikely for those features to work exactly the same way in other applications. Believe me, there are *tons* of features in Word, Excel and PowerPoint, and we have a responsibility to our customers to continue to support them.
I’m hoping that over time, as we publish these new schemas and provide documentation, people will start to build tools for going from our formats into other formats (and vice versa). We already did this with Word 2003's XML when we built an XSLT to transform it into HTML, which you can find here: http://www.microsoft.com/downloads/details.aspx?familyid=19676b18-1bcd-4852-93ba-0b5a203ea731&displaylang=en. There is also an example up on the web of how to use XSL-FO with WordML. I'm also going to push hard for us to build more of these transforms that we can post on the web. I'll probably start posting some examples for the Word 2003 XML over the next few months, since that’s the format that is currently out there for people to play with. Let me know if there are any simple transforms you'd like to see. Also, please tell me about your experience if you have built converters for Office 2003 XML, and how we can make things easier to build.
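To give a feel for what a very simple transform might look like, here is a rough sketch in Python rather than XSLT. It is not the XSLT from the download above; it only pulls paragraph text (w:p, w:r, w:t) out of a Word 2003 XML document and wraps it in HTML paragraphs. The function name wordml_to_html and the sample document are made up for this example, and all formatting, tables, images and so on are ignored.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by the Word 2003 XML (WordML) schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W_NS}


def wordml_to_html(wordml_text):
    """Turn each w:p paragraph into an HTML <p>, keeping only its text runs."""
    root = ET.fromstring(wordml_text)
    paragraphs = []
    # iter() finds w:p at any depth, so paragraphs nested in sections are included.
    for p in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in p.findall("w:r/w:t", NS))
        paragraphs.append(f"<p>{escape(text)}</p>")
    return "<html><body>" + "".join(paragraphs) + "</body></html>"


# A tiny hand-written sample document, for illustration only.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>World &amp; friends</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

print(wordml_to_html(SAMPLE))
```

A real converter has to decide what to do with every feature the target format can't express, which is exactly where the interoperability questions above come in.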