I've started testing interoperability between various document-format implementations, and have found some interesting results. There are some very comprehensive tests of document format interoperability beginning to emerge (see the University of Illinois study mentioned below, for example), but the things I've been doing are pretty informal: create a document, type something into it, do a little formatting, and save it. Even though these documents are very simple, I've run into problems that illustrate the challenges faced by all implementers.
First, let's take a look at hidden text. Most word processing software allows users to mark text as hidden: the text remains in the document, but it's not displayed or printed. Both the ODF and Open XML formats include support for hidden text.
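For readers who want to look inside an ODF file themselves, here is a minimal sketch of how hidden text can be detected at the markup level. It assumes that a hidden style is recorded as a `text:display="none"` attribute on a `style:text-properties` element, which is my understanding of ODF 1.1; the sample XML and the `hidden_style_names` helper are made up for illustration and are not taken from any real document or tool.

```python
import xml.etree.ElementTree as ET

# ODF namespace URIs for the style: and text: prefixes.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def hidden_style_names(xml_text):
    """Return the names of styles whose text properties set text:display="none".

    Assumption: this attribute is how the hidden flag is stored; an
    implementation that records it differently would not be detected.
    """
    root = ET.fromstring(xml_text)
    names = []
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        for props in style.findall(f"{{{STYLE_NS}}}text-properties"):
            if props.get(f"{{{TEXT_NS}}}display") == "none":
                names.append(style.get(f"{{{STYLE_NS}}}name"))
                break
    return names


# Hypothetical, hand-written fragment in the shape of a styles.xml file.
SAMPLE = """<office:document-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:styles>
    <style:style style:name="Standard" style:family="paragraph"/>
    <style:style style:name="HiddenText" style:family="paragraph">
      <style:text-properties text:display="none"/>
    </style:style>
  </office:styles>
</office:document-styles>"""

print(hidden_style_names(SAMPLE))
```

A check like this only tells you what the file says; whether a given application honors the attribute when rendering is a separate question, and that is where interoperability problems show up.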
So I tried creating some hidden text in an ODT document. In Sun's OpenOffice (version 2.4.1), I entered a paragraph of plain text, and then I entered another paragraph of text containing the sentence "This paragraph is styled as hidden text." I then selected that paragraph, created a new style named HiddenText from the selection, and checked the Hidden checkbox in that style. The text disappeared from the screen, but when I saved my document and opened it in IBM's Symphony (version 1.2), here's what I saw:
Text that is hidden in one application and not hidden in another could be a problem in some scenarios, but in that example the content of the document doesn't actually change. Let's take a look at a more serious problem, where a chart created in one application shows up with what appears to be different data when opened in another application.
I went into OpenOffice and typed two pairs of columns of X and Y values, and then I created a scatter plot from this data. I rearranged the data series (in OpenOffice's chart UI), and this is how my chart looked in OpenOffice:
I then saved the chart as an ODF spreadsheet (.ods), and opened it up in Symphony. Here's what I saw:
In this case, the underlying issue is that the ODF standard doesn't specify exactly how multiple data series in a multi-line scatter plot should be interpreted, so different implementations make different assumptions in this area. This is the type of problem that can be resolved in maintenance, by tightening up the specification to ensure that all implementations handle these details in exactly the same way.
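To make the ambiguity concrete, here is a small sketch of two plausible ways to read the same four columns of numbers. Both readings are hypothetical: I am not claiming that either one is what OpenOffice or Symphony actually does, only that an underspecified format leaves room for this kind of disagreement. The function names and sample values are invented for the example.

```python
def as_xy_pairs(columns):
    """Read the columns as consecutive (X, Y) pairs, one series per pair."""
    return [
        list(zip(columns[i], columns[i + 1]))
        for i in range(0, len(columns) - 1, 2)
    ]


def as_shared_x(columns):
    """Read the first column as X and every other column as a Y series."""
    x = columns[0]
    return [list(zip(x, y)) for y in columns[1:]]


# Two pairs of X and Y columns, as in the spreadsheet described above
# (the values themselves are made up).
columns = [
    [1, 2, 3],     # first X column
    [10, 20, 30],  # first Y column
    [4, 5, 6],     # second X column
    [40, 50, 60],  # second Y column
]

print(len(as_xy_pairs(columns)), "series when read as X/Y pairs")
print(len(as_shared_x(columns)), "series when read with a shared X column")
```

Under the first reading the chart has two series; under the second it has three, and the second X column is plotted as if it were Y data. The cells in the file are identical in both cases, which is why the chart can appear to contain different data.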
These are just a couple of examples of the sorts of interoperability challenges that all implementers face when working with large, complex document format standards. We face the same types of challenges here at Microsoft, in our implementations of both the ODF and Open XML document formats. As a simple example, consider how PowerPoint handles the grid used for aligning objects: the standard states that the grid should be specified in EMUs (English/Metric Units), but PowerPoint actually treats each grid unit as 1/1024th of an EMU. If you didn't know that, it would be hard to write code that interoperates correctly with PowerPoint; this is why we'll soon be releasing implementation notes for ECMA-376 (as we have already done for ODF).
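Here is a rough sketch of what that difference means in practice for code that reads or writes grid spacing. It takes the 1/1024 factor from the description above and uses the standard EMU definitions (914,400 per inch, 360,000 per centimeter); the function names are hypothetical and do not correspond to any real API.

```python
EMU_PER_INCH = 914400      # standard definition of the EMU
EMU_PER_CM = 360000        # standard definition of the EMU
GRID_UNITS_PER_EMU = 1024  # PowerPoint behavior as described in this post


def grid_value_to_emu(raw_value):
    """Convert a grid spacing value, as PowerPoint stores it, to EMUs."""
    return raw_value / GRID_UNITS_PER_EMU


def emu_to_grid_value(emu):
    """Convert a spacing in EMUs to the value PowerPoint expects to read."""
    return round(emu * GRID_UNITS_PER_EMU)


# A quarter-inch grid expressed in true EMUs.
quarter_inch_emu = EMU_PER_INCH // 4

stored = emu_to_grid_value(quarter_inch_emu)
print("value to store for PowerPoint:", stored)
print("read back in inches:", grid_value_to_emu(stored) / EMU_PER_INCH)
```

An implementer who followed the text of the standard literally would write the plain EMU value, and the grid would come out 1,024 times too fine; this is exactly the kind of detail that implementation notes are meant to surface.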
On a related note, consider what happens if there's a typo in the spec -- implementers who conform to that erroneous information would have problems achieving interoperability with an implementation that doesn't include the typo. For this reason, we've carefully reviewed the details of our Open XML implementation against the text of the standard, and we'll be submitting errors we've found to the WG4 maintenance process.
We're also trying to help improve ODF interoperability through our work with the OASIS ODF TC. We have submitted 15 proposals for future changes to the ODF spec to address specific interoperability challenges we've found, ranging from additional numbering formats to table grid size to z-order semantics and many others. Working together with other implementers on IS 29500 and ODF maintenance is the only way to resolve some of these issues.
Rajiv Shah and Jay Kesan at the University of Illinois have done a study on interoperability issues for open standards that focuses on various implementations of ODF and Open XML. After noting that "there are significant issues with interoperability among various implementations," they summarize their findings like this:
"We consider several implications of these results including the lack of perfect compatibility between implementations, the lack of good implementations outside of Windows, and the surprisingly good overall performance of OOXML implementations. The interoperability issues are troubling and suggest the need for improved interoperability testing for document formats. The results also highlight the importance of interoperability for open standards in general. Without interoperability, governments will be locked-in to the dominant implementations for either standard and in the process lose many of the benefits that might accrue from adopting an open standard in the first instance."
That strikes me as a very pragmatic way of looking at the issues.
In summary, to deliver on the promise of standards-based interoperability, implementers need to work together in at least these three critical ways:
- shared stewardship of the standards, through active participation in the defined maintenance processes;
- transparency of implementation, through published implementation notes that describe how the major implementations are handling the myriad details found in modern document format standards; and
- collaboration between implementers in events like the DII workshops, interoperability working groups such as the OIC TC, and direct engagement between implementers to test interoperability.
We're working hard in each of these areas, and applying these three principles to Open XML as well as ODF. What else can we do? What do you think is needed to improve document format interoperability?