Table models in file formats

In my post last week about the lack of table support in ODF, some folks were curious as to why the Ecma Open XML formats have three different table models. I explained that when you are designing a file format, you need to examine closely the target user scenarios of the applications that will use those formats.

Obviously the use cases for a table in a spreadsheet are different from those around tables in presentations or wordprocessing documents. For instance, it's not much of a stretch to imagine a table of data in a spreadsheet with 50,000 rows and 200 columns. That would never happen in a presentation though. A table in a presentation is much more heavily focused on the layout and formatting of the data (same case with a wordprocessing document).

I think a great example of why you often need different table models would be the ODF spec itself. It was too difficult to map their existing table model to the presentation format so instead of working through that issue they instead left it out of the spec. Otherwise the spec would have just stated that the table model applied to all three formats. As it stands right now, the only way to get a table in a presentation is to embed a spreadsheet. The plan is that in V 1.2 (which is still over a year out) they will have support for spreadsheet formulas and presentation tables. One could argue over whether it would have been better to actually finish the spec before submitting it to ISO and creating organizations like the ODF Alliance who purpose is to push for policies that mandate ODF, but maybe that's just me ;-).

Looking at the table models, I do think the ODF guys made a big mistake in the design of their spreadsheet format. They chose to make the table model for wordprocessing documents and spreadsheet documents the exact same (but it looks like it's still different from HTML or CALS). Now I do understand that this level of commonality is the nirvana for most folks and I also had the goal of making the Open XML formats as close to this ideal as possible (it's something we actually looked really hard at doing when we first started working on the Open XML formats). The problem is that for the same reason you often need different user interfaces in the different applications, you also need a different file format at times. Sure there are plenty of concepts that can be reused (such as basic formatting), but a spreadsheet grid is different from a presentation table. Otherwise you're stuck with a format that sells everyone a bit short as it's the greatest common factor of all the applications, and isn't optimized for the unique customer scenarios.

If you've looked at the Ecma spec, you can see that we had to diverge in the table design of for the Office Open XML formats. The use of tables in wordprocessing documents and presentations is very similar, and as a result the table models in those formats are very similar. In a spreadsheet though, you have to account for much larger sets of data, and at that point the efficiency with which you write out that information can have a much more significant impact in the amount of time it takes to actually parse the files.

So, is the spreadsheetML format super easy? Well that depends on who you ask. For people that have developed against the old binary formats, things will be unbelievably easier and more reliable. But for folks who've primarily used table models like HTML, there will be a bit of a learning curve. That's why the file format documentation that we're doing in Ecma is so important. It will empower anyone to program against these files. We could have gone with a more verbose simple table model, but that would have been at the detriment of every user out there. Most people don't care about their file format, they just want things to work. As I said in an earlier post, we had to take the training wheels off, but we're going to be there with you as you learn to ride on your own.