Getting Word 2007 Technical Files into Publisher Pipelines

Nature, Science, and other publishers have robust ways of converting Word 2003 documents with embedded Equation Editor and MathType objects into the XML representation they use for publication. Notably, MathType can export equations as MathML, and this capability is part of the methodology. In principle, a similar approach can be used with Word 2007 docx files that have math zones in OMML. Since these files already consist of XML, they should be compatible with the general approach. But it takes time to implement. For example, transformations must be performed to ensure validity according to the NLM DTD. Publishers add a lot of value, both in improving the writing itself and in including the information needed to render correctly and to satisfy archiving requirements. This work has to be integrated into the process. I’m kicking myself for not having contacted the publishers back when we offered Word 2007 beta versions over a year ago. Then we might have been able to work out a solution closer to the time that Office 2007 shipped.
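
To make the "already XML" point concrete, here is a minimal Python sketch of the very first step such a pipeline might take: opening a docx package and pulling out the OMML math zones. It is only an illustration, not what any publisher actually does. The function name extract_math_zones is made up for this example, and it assumes the main document part sits at the conventional word/document.xml location rather than looking it up through the package relationships.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OMML elements (m:oMath, m:oMathPara, ...) in a docx file.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def extract_math_zones(docx):
    """Return every <m:oMath> element of the main document part as an XML string.

    `docx` may be a file path or a binary file-like object.
    The name and behavior of this helper are hypothetical, for illustration only.
    """
    with zipfile.ZipFile(docx) as package:
        # Assumption: the main part is at the conventional path. A robust tool
        # would resolve it through the package relationships instead.
        root = ET.fromstring(package.read("word/document.xml"))

    # iter() walks the whole tree, so math zones nested inside
    # <m:oMathPara> (display math) are found as well as inline ones.
    return [
        ET.tostring(zone, encoding="unicode")
        for zone in root.iter(f"{{{OMML_NS}}}oMath")
    ]
```

Calling extract_math_zones("paper.docx") on a document with two equations would return a list of two XML fragments. Everything after that, such as mapping OMML to MathML and making the result valid against the publisher's DTD, is where the real implementation time goes.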

I’m very impressed with what the publishers are doing and how forward-looking they are. Back in the late 1980s and early 1990s I was chairman of the Optical Society of America’s Publication Technology Committee, and I served on a similar committee for the American Institute of Physics. Both of these groups were very advanced electronically at the time. But it’s amazing to see how much progress has been made since then.

The publisher infrastructure isn’t the only area that has trouble with Word 2007’s new equation capability. Perhaps you’ve noticed that PowerPoint 2007 doesn’t understand it either and uses images for math zones instead. Although the images generally look very good in Internet Explorer, they don’t work well in PowerPoint because the sizes and backgrounds used on slides are typically very different from those in Word. We’re working on this problem too…