Open XML Converters for Mac Office
There’s been a bit of flak about the Office Open XML file format converters for Mac Office. Sheridan posted an update on MacMojo, and Schwieb weighed in regarding some of the comments that people have made. There’s quite a bit of speculation going on, and not a whole lot of information, so I’m going to try to dispel some of the fog.
This discussion centers on Word, because I’m a Word developer, but the general ideas hold for all three of the affected Office applications. The most significant difference between Word and the rest of the suite is that Word has a converter API. There’s a WinWord converter SDK that’s downloadable from the Microsoft support web site. While there are some subtle differences (FSRefs instead of file paths, for example), the overall API is the same for Mac Word. Of particular importance is the fact that the lingua franca for converting Word file formats is RTF.
So, in order to write a converter for Word, you need two things: 1) a component that reads and writes the external file format; and 2) a component that generates and parses RTF. Also, because of the hierarchical structure of XML, you need to have some form of intermediate representation of the file.
Let’s put our Win Word hat on for a second, go back in time about two years, and think about how we might do this. Well, by the time Win Office ships, we’ll have a software component that satisfies all of those needs: Word itself, or the new version of Word, to be precise. So one very efficient way to implement that converter is to refactor the UI out of Word 12, package the result with any other necessary components, and write a wrapper around all of it that exposes the API that the older version of Word expects converters to implement.
Do that, and you can ship the converters the same time you ship Office. The big upside of this idea is that you can really narrow down the scope of testing you do on the converter itself, because you’ve already tested both the RTF and the Open XML components by testing Word itself. So, you gain leverage from both a development and a testing perspective.
Now, let’s put our Mac Word hat back on, and think of what our options are given the reasoning I’ve stated above. You can’t really ask the Win Office team to toss their idea in the trash just so you can work on the converters in tandem. Well, you can, but one would have to be very optimistic to expect more than a polite, “Sorry.” I can’t think of a clearer example of the tail trying to wag the dog.
Instead of following in Win Word’s footsteps, how about we spin off a separate development team to work on the converters separately from Word itself? I've read suggestions made by some that writing converters from scratch could have been done in a relatively (in some cases ridiculously) short amount of time. So, let's test that idea by doing some back-of-the-envelope calculations. You can check these numbers for yourself by downloading the reference XML schemas and performing some searches through the .xsd files.
First, when could we have realistically started working on this? Well, not before Office 2004 shipped in April of 2004, so, ignoring the availability of specifications for the new format, let's assume that we began work on this roughly two years ago. The final draft of the spec wasn't submitted to the ECMA until this past October, so in terms of actually having a spec to write to, 24 months is extremely optimistic for the time period available.
How big is the task? Word, alone, has more than 1100 individual XML elements that need to be processed. We do this processing by writing something called a "handler", and each one of these elements needs a handler.
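To give a feel for what a "handler" is, here's a minimal sketch in Python. It is not the converter's actual code: the element names (document, p, t), the registry, and the dictionary-based intermediate representation are all invented for illustration, and real Open XML elements are namespaced and far richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: one handler function per XML element name.
HANDLERS = {}

def handler(tag):
    """Decorator that registers a function as the handler for one element."""
    def register(func):
        HANDLERS[tag] = func
        return func
    return register

def process(element):
    """Dispatch an element to its handler; an unhandled element is an error."""
    func = HANDLERS.get(element.tag)
    if func is None:
        raise NotImplementedError(f"no handler for <{element.tag}>")
    return func(element)

@handler("document")
def handle_document(element):
    # Container element: recurse into children to build the tree.
    return {"kind": "document", "children": [process(c) for c in element]}

@handler("p")
def handle_paragraph(element):
    return {"kind": "paragraph", "children": [process(c) for c in element]}

@handler("t")
def handle_text(element):
    # Leaf element: just capture the text.
    return {"kind": "text", "value": element.text or ""}

def convert(xml_text):
    """Parse toy XML and return a toy intermediate representation."""
    return process(ET.fromstring(xml_text))

SAMPLE = "<document><p><t>Hello</t><t> world</t></p></document>"
tree = convert(SAMPLE)
```

Three handlers cover this toy format. The point of the sketch is the shape of the work: every element in the schema needs its own entry in that registry before a document that uses it can be read.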
Now, some of these elements are more complex than others. A single, user-defined document property isn't very complex. A paragraph, or a document section, can be very complex. For some of these handlers, one developer can whip out two or three a day. Some of the other handlers will take a single developer up to an entire month to complete. Trying to get more than one developer working on the same handler at the same time ends up being very counter-productive. So, one handler per developer, and, on average, it's fair to assume productivity of one handler per dev per day.
At that rate, a team of 5 developers will implement 25 handlers a week, which means that we'd have all the XML handlers written in 44 weeks. Well, a little more than that, because I've rounded the number of elements down to the nearest 100. Nevertheless, we’ve taken a little less than a year to get the converters reading the new file format. We still aren't writing the new file format, and we still have the RTF side of things to worry about, which is actually more complex than the XML side. I’ve also completely left out all of the design and coding for the intermediate representation of the file. The intermediate representation, itself, is at least 6 to 8 months’ worth of work.
In other words, we're almost halfway through the schedule, with less than a quarter of the development work done. You want more developers? I don't have more developers. This is just for Word. We need additional teams for Excel and PowerPoint. Meanwhile, people want Universal Binaries of Mac Office in their hands, the Win Office team is adding new features to Win Office 12 that Mac Office 2004 won’t understand, and Apple has a new HIView architecture that requires some re-architecting of parts of Mac Office. And none of this converter work adds a single new feature to Mac Office.
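If you'd like to check the arithmetic, here it is as a few lines of Python. The element count, team size, and one-handler-per-dev-per-day rate come from the text; the five-day work week is an assumption implied by the 25-handlers-a-week figure.

```python
ELEMENTS = 1100                 # Word's element count, rounded down
DEVELOPERS = 5
HANDLERS_PER_DEV_PER_DAY = 1    # the average rate assumed above
WORKDAYS_PER_WEEK = 5           # assumption: a standard work week

handlers_per_week = DEVELOPERS * HANDLERS_PER_DEV_PER_DAY * WORKDAYS_PER_WEEK
weeks_to_read = ELEMENTS / handlers_per_week

print(handlers_per_week)  # 25
print(weeks_to_read)      # 44.0
```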
More importantly, we’ve also run out of time to test the converters. Had we started writing converters from scratch, by the time we had something fully tested and ready for public consumption, it would have taken us longer than it has taken us on the route we’ve chosen, in no small part due to the fact that the current route we’ve chosen allows us to leverage almost all of the development work of the Win Office team.
The only reasonable choice for Mac Word has been to follow in Win Word’s footsteps. For those of you who attended the last Mac BU customer council meeting in Redmond and were wondering what I was doing while sitting in the back corner, now you know. I was busy refactoring Mac Word so that Mac Word 12 could, eventually, become the converter for the new file formats.
The big win for this strategy is that we get to do all of the things that customers are asking us to do with the next version of Mac Office: Universal Binaries, support for most of the new data types in Win Office 12, re-architecting the UI to take advantage of composited HIViews, and adding some compelling new features.
Lastly, can we port the Win Word converter? Well, actually, in a way, porting the Win Word converter is exactly what we have been doing, but we’re still faced with having to wait until Win Word ships before we have the final source code to merge into what we’ve already ported. Once that merge is done, then we still have to go through several months’ worth of testing and bug fixing before they’re ready for public use.
And that is precisely why there’s a delta between Win Office 2007 shipping and the full availability of converters for Mac Office.
Update: I’d like to clear up some things about what I said earlier. My back-of-the-envelope estimates included a lot more work than just supporting Open XML in Mac Office. Open XML is the easy part. They also included the work required to generate RTF in both directions and to implement tools for developers.
If we had to add support for Open XML to Mac Word 12 without being able to port code from Win Word, the read/write estimate shrinks down to about 8.5 man-years (44 weeks x 5 devs x 2 for read+write). As I recall, this is about half of what it took to add HTML support to Word: 10 or so devs over a release cycle of 2 years. Doing the work for PPT and Excel isn’t strictly a multiple of Word, because about 30% of the XML elements are shared between the three apps. So, for all of Mac Office, I’d estimate it would take a total of about 5 devs over the release cycle to add full Open XML support starting from scratch, as part of the larger project.
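The same kind of sanity check works for the man-year figure. The inputs are the numbers quoted above; the 52-week year used for the conversion is an assumption.

```python
WEEKS_TO_READ = 44      # from the earlier estimate
DEVELOPERS = 5
DIRECTIONS = 2          # read + write
WEEKS_PER_YEAR = 52     # assumption: no allowance for holidays or vacation

dev_weeks = WEEKS_TO_READ * DEVELOPERS * DIRECTIONS
dev_years = dev_weeks / WEEKS_PER_YEAR

# HTML support, as recalled above: roughly 10 devs over a 2-year cycle.
html_dev_years = 10 * 2

print(dev_weeks)                             # 440
print(round(dev_years, 1))                   # 8.5
print(round(dev_years / html_dev_years, 2))  # 0.42
```

The ratio lands a bit under one half, which is what "about half" is resting on.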
Currently playing in iTunes: Time Loves a Hero by Little Feat