4000 pages of documentation

There has been a great overall reaction to the news last week of Ecma's first public draft for the Office Open XML formats. One thing that is now absolutely clear to everyone that we are talking about an extremely rich and powerful set of file formats.

I think many folks didn't realize the amount of work we've had to take on, which explains why some had the false assumption that we could just use ODF. We were pretty clear in our response that it just wouldn't work for our customers because at the end of the day an open format is useless if the majority of our customers won't use it. That's why we had to make our formats fully support all the existing Microsoft Office files out there. If the formats didn't support all those features, then the only people who would use them are those that fundamentally want an open format; and everyone else would have just stuck with the old binaries. We absolutely did not want that to happen, we wanted everyone using an open format. We've invested a ton of resources into XML file formats because we believe it's a good thing, and we need to make sure that our customers will be willing to use them.

Let me be clear on a couple key points:

  1. Rich format - Yes the format is extremely rich and deep and that's because it represents a very powerful set of applications that have evolved over many years and many documents. It would have been completely unacceptable for us to create a format that didn't fully represent the existing base of Microsoft Office functionality. If we had created some kind of subset format, many in the industry would have complained for very legitimate reasons. People would have complained that we were destroying fidelity with the key features they used, that we were hiding functionality, not enabling everyone to exploit the rich features, not encouraging the move to XML, etc. Bottom line – millions of organizations would have had a legitimate problem.
  2. Extremely detailed documentation - It's funny but I've actually seen people complaining that there is too much documentation. The documentation is essential, even if there are parts that are not used by everyone. I personally think we have to provide documentation on every aspect of our format, otherwise how do you know what something means? This is a lot of work, and I believe it's absolutely necessary. I can't imagine there being a benefit to anyone from not documenting something.
  3. Full implementation - I don't think it should come as a surprise that with the rich set of features in Office, it's going to be a lot of work to build an application that can support all of that functionality. In the past, people had said that the reason nobody could build an application that matched Office was that the formats were locked up. Well, the format information was available, but not for all the many purposes that we are enabling now. Now all those people should be happy because the format information is complete enough to enable a full understanding by everyone. It's up to those other applications though to decide what level of support they want to build. While I think interoperability is possible, the struggles that the applications supporting ODF are having show that it's really a lot of work even for a format that isn't as deep. This is often to be expected though because the different applications have different sets of features as well as different implementations of the same features. That is how things work.
  4. Partial implementation - Now, if you don't care about fully matching Office features, then anyone can choose to just support a subset of the format. You can implement as much or as little of the format as you want. You can build an application that does nothing more than adds comments to a WordprocessingML document; or that automatically generates a rich SpreadsheetML file based on business data. It's up to you. The information is all there to use in the way that best benefits your application.
  5. Room for innovation - Now that all the features we've stored in our formats are fully open and documented, people are free to build with them. In addition to the fact that you can implement as much or as little of the format as you want, you are also free to add more to the format. The formats are completely extensible. You can add your own extensions to the format, or you can even join Ecma and propose that those extensions get added to the official Ecma standard. The strong support for custom defined schema in Office gives you a lot more power than what a document format on it's own would give you, through integration of your own parts.
  6. Microsoft does not own the standard - We no longer own these formats, Ecma does. I know there is still concern out there that these formats could change out from under you, but that's not something that Microsoft can do. Ecma fully controls it, and once it goes through ISO, it will be even more solid and locked down.

I'd also like to reuse some information that I left as a comment in my post last week. Some people were a bit confused on how you could create a standard that was so rich and had all the backward compatibility with the existing base of Microsoft Office documents. It was even suggested that almost as a way of leveling the playing field we should choose just a subset of features that we think everyone can build applications for. This would be a great move for our competitors but a horrible move for our customers. Adam provided a lot of feedback and I really appreciated that he took the time to write all that up. Patrick and Biff had some really great replies that tried to explain why backward compatibility was so important. Here is the reply I left for Adam that I hope helps really clear up his questions around why we went the standardization route in the first place:

Hey Adam, thanks for taking the time to get all your thoughts down. It definitely has helped me understand where you are coming from.

It sounds like you understand that from our point of view, in order to use an XML format as the *default* format for Office it needs to be 100% compatible right? I think you're point is more that we should also have an optional format that is more basic and doesn't necessarily have 100% of the features covered. That smaller more basic format would then be the one that should be standardized. I think that's what you are saying.

Based on your description, the format you desire sound a lot like HTML. HTML is a great format for basic interchange. It doesn't support everything that is present in an Office document, but as you said, that isn't always desirable. We've supported HTML for quite awhile, although we took the approach of trying to have our cake and eat it too when we attempted to make our HTML output support 100% of our features. The result was an HTML format that had a ton of extra stuff in it that many of the people who just wanted HTML didn't really care about (and it just got in the way).

Our primary goal this release with the formats was not to try and re-implement HTML, but instead to move everyone over to using XML for all of their documents. Let's talk about the motivations for what we are doing with Open XML since that was the main point of your question:

  1. The reason we've spent the past 8 or so years moving out formats toward a default XML format is that we wanted to improve the value and significance of Office documents. We wanted Office documents to play an important role in business process where they couldn't before. We wanted to make it easier for developers to build solutions that produce and consume Office documents. There are other advantages too, but the main thing is that Office documents are much more valuable in just about every way when they are open and accessible.
  2. The reason we fully document them is the exact same. We need developers to understand how to program against them. Without the full documentation, then we don't achieve any of our goals I stated above. The only benefit would be that other Microsoft products could potentially interact with the documents better (like SQL or SharePoint), but that doesn't give us the broad exposure we want. That would be selling ourselves short. We want as many solutions/platforms/developers/products as possible to be able to work with our files.
  3. The reason we moved to the "Covenant not to sue" was that a number of people out there were concerned that our royalty free license approach wasn't compatible with open source licenses. Again, since the whole reason for opening the files was to broaden the scenarios and solutions where Office documents could play a role, we moved to the CNS so that we could integrate with that many more systems. Initially we'd thought the royalty free license just about covered it, but there was enough public concern out there that that we decided we needed to make it even more basic and straightforward. We committed to not enforce any of our IP in the formats against anyone, as long as they didn't try to enforce IP against us in the same area. No license needed, no attribution, we just made a legal commitment.
  4. The reason we've taken the formats to Ecma for standardization is that it appeared that a number potential solution builders were concerned that if we owned the formats and had full control, we could change them on a whim and break their solutions. We also had significant requests from governments who also wanted to make sure that the formats were standardized and no longer owned by Microsoft. Long term archive-ability was really important and they wanted to know that even if Microsoft went away, there would still be access to the formats. We were already planning on fully documenting them, but the Ecma standardization process gave us the advantage of going through a well established formal process for ensuring that the formats are fully interoperable and fully documented. It's drawn a lot more attention to the documentation as well so I'm sure we'll get much better input, even from folks who aren't participating directly in the process.

I hope that helps to clear it up a bit. It really is just as simple as that. Any application is free to implement as little or as much of the format as they wish. If you really want every application operating on a more limited set of features, that isn't as much of a format thing as an application thing. You would need to get every application to agree that it will not add any new features or functionality, and will disable any existing functionality that the other applications don't have. That wasn't our goal. Our goal was to open up all the existing documents out there, and then anyone who wants to build solutions around those formats is free to do so. In addition, anyone is free to innovate on top of the formats, as I believe there is still a lot of innovation to come. The formats are completely extensible, so if someone wants to use the formats (or parts of the formats) as a base and build on top of that, they can do so as well. They can even join Ecma if they want and propose to add those new extensions to the next version of the standard.