Spreadsheet formula bugs
I saw this blog post from Rob Wier the other day, and I thought it would be worth drilling into a bit more. While Rob tends to have a flare for the dramatics he does point out a couple of good bugs in the formulas piece of the Open XML spec. Rob's actually been spending a ton of time lately pouring through the spec looking for issues and representing those issue on a number of different national standards bodies. I'd like to give Rob the benefit of the doubt and assume that he's doing this in order to help improve the spec, and not just to try and find a reason to block it. It would have been much more efficient if he would have worked with us on TC45 though when we were initially reviewing and working through the spec. I know everyone on TC45 would have appreciated having him aboard, and since IBM is already an Ecma member it would have been simple for him to join. Either way though I appreciate the time that Rob is spending on the spec, as it's something you all know I care deeply about.
Let's talk about the bugs he pointed out in the formulas spec. It's a rather long post, but can be filtered down into the following 7 issues:
- The type of units on the values value for trig functions are not specified (radians or degrees)
- AVEDEV function is using an incorrect formula
- CONFIDENCE function needs some more information
- CONVERT doesn't specify which "cup" or "tablespoon" version to use
- Day count basis settings such as "US (NASD) 30/360" don't actually point to a full definition
- NETWORKDAYS function only assumes one style of a workweek
- Some statistic functions say "x is the sample mean" rather than "x-bar is the sample mean"
I took a look at the spec and it looks like he's right, there should be a bit more information provided and errata corrected. That's something that should be fairly easily handled during the ballot resolution meeting. The way the fast track process works is that every country gets a chance to vote on the spec and provide comments. Some groups will vote "yes", but leave comments they'd like to see addressed. It doesn't make sense to say "no" if you just have minor things you'd like to see addressed, which is why the ability to leave comments with a yes vote can be useful. Other groups will vote "no", and provide a list comments that explain what led them to that no vote. At the ballot resolution meeting, we get a chance to go through all the comments and make any necessary changes to the spec.
Some comments will probably get dealt with at the ballot resolution meeting, while others may be taken into account for future versions of the spec (depending on what everyone decides). In ODF for example, accessibility was probably raised as an issue, but it still was approved by ISO. Then OASIS went off to work on version 1.1 of ODF and added improvements specific for accessibility. Now they can take the 1.1 version to ISO and seek to get approval for that. The continual improvement and evolution of the standard is just as important as the initial review and approval. It will be important for Ecma and the ISO to define how the Open XML spec will evolve in the future.
In regards to the comments that Rob made around spreadsheet functions, I would hope those could pretty easily get handled at the ballot resolution meeting by just making the following corrections:
#1 The type of units on the values value for trig functions are not specified (radians or degrees)
This is easy enough to fix. The measurements are all in radians, so that can be easily clarified in the spec.
#2 AVEDEV function is using an incorrect formula
Looks like the wrong formula was inserted. I almost wonder if this was just a copy/paste screwup… It should be this:
#3 CONFIDENCE function needs some more information
Looks like we should have made it clear that this assumed a Normal distribution.
#4 CONVERT doesn't specify which "cup" or "tablespoon" version to use
If this wasn't clear then we should probably provide a link to the NIST definitions: http://physics.nist.gov/Pubs/SP811/appenB9.html#VOLUME
#5 Day count basis settings such as "US (NASD) 30/360" don't actually point to a full definition
A number of other applications do the same thing (including the OpenFormula spec Rob is working on), so I thought this was fairly widely known by folks working in this area. Given Rob's feedback though it probably would have been better to have included a reference to this:
Standard Securities Calculation Methods: Fixed Income Securities Formulas for Analytic Measures, Vol. 2, Spring 1995.
#6 NETWORKDAYS function only assumes one style of a workweek
Yeah, this is another one of those issues left over from the Lotus 1-2-3 days. The function has always behaved like this, and we never changed it. It could break existing solutions if we changed it now, but it would also be a good thing to add a new function that mimicked the old but also had an additional parameter to specify the other work week types. Not sure if the creation of a more extensible function is better to handle in the ballot resolution meeting or in the next version of the spec. I guess we'll see.
#7 Some statistic functions say "x is the sample mean" rather than "x-bar is the sample mean"
Yeah, looks like another bug. Rob's right that it should say "x-bar" instead of "x"
So, as I'm sure anyone would expect with a specification this large there are some errata that need to be addressed. I can't imagine any spec could be bug free, and thankfully we have a lot of people interested in this space who will help us as we continue to improve on the spec going forward.
I did want to get one little jab in at Rob though around his implication that we rushed thing with Open XML and that ODF made the right choice in delaying the formula stuff. In reality the OpenFormula technical committee didn't even exist until a year after the ODF spec was finalized and a few months after Microsoft submitted Open XML to Ecma. The people working on the group are a completely different set of folks from the initial ODF creators. The main folks who worked on ODF (ie participated in at least half of the meetings) were Gary Edwards; Michael Brauer (Sun); David Faure (KDE); Doug Alberg (Boeing); Daniel Vogelheim (Sun); Patrick Durusau; and Paul Grosso (ArborText). The OpenFormula committee is primarily made up of Rob from IBM, Eike Rathke from Sun; and David Wheeler from the OpenDocument Foundation (Jody Goldberg who worked with us on OpenXML also participated for awhile). So I think it's a bit disingenuous to make it sound like they already had a formula spec underway but decided to hold it back because it wasn't yet ready for prime time. :-)