More Word Feedback

Works - I am afraid I know next to nothing about Works, except that we ship a converter they give us in the Word box.

Chinese on the Mac. It's funny you mention that, because a friend of mine who is the PM in charge of MacWord is not only Chinese originally, but he is absolutely passionate (and I mean crazy-passionate) about supporting Unicode and Asian languages on the Mac, and is more or less single-handedly responsible for getting that into the 2004 product by browbeating those around him in MacBU. My understanding is that 2004 does support Chinese, BTW. If you find it surprising that it took a while to support Chinese in MacWord given the huge population of Chinese speakers, you need to factor in the tiny percentage of those people who have Macs (then multiply that by the percent that don’t pirate software, and you get the real market for Chinese in MacWord). Fundamentally, without a business case, things only get done out of passion (like my friend's).

Reveal codes in Word. Well, we get that request a lot of course. The internal architectures of WordPerfect and Word are essentially totally different. WordPerfect has a tagged format system, which most of us are familiar with now from HTML, although it predated HTML by a long shot (I don't know if it was influenced by SGML or not - maybe a WP person could tell us? My guess is not). So to "reveal codes", WP just shows its internal format. Word, however, is designed as a set of "objects" with properties. To make something Bold in WP, you (effectively) put Bold tags around the ends of it. In Word, that "run" of text is assigned the property "Bold". Actually, there is some indirection involved. Any run of text in Word with unique properties has a unique "property bag" assigned to it. The property bag is defined elsewhere in the document. If more runs of text are created that use the same format, the property bag is reused by reference - that is, the text is assigned the properties from bag #427, and somewhere else #427 is defined as bold, green, italic, etc. Many different runs of text can refer to bag #427. Same for paragraphs, sections, and so on. That's a lot of gobbledygook to say that there are no "codes" to reveal. If you use the "Format/Reveal Formatting" feature in Word2002 or 2003, what you see is the contents of the property bag for the text you had your insertion point in, and you can then change them. So, asking for reveal codes is sort of like asking a Mazda rotary engine owner if you can see the pistons in his engine. They don’t exist. Generating a tagged format like HTML or XML from Word is therefore an export/conversion process, where these object-property sets have to be converted into the serial form that the markup languages use. Likewise, import means converting these sets of tags into properties assigned to runs of text. I believe you can read more about the architecture of Word if you do a Google on Charles Simonyi. He was the architect for the original WinWord - it was his second go-round (at least) since he came from Xerox where he had worked on word processing tools of similar design (so I am told).
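
For readers who think better in code, here is a rough sketch of the "property bag by reference" idea. It is only an illustration of the concept, not Word's actual internal structures: the names PropertyBag, Run, Document, reveal_formatting and export_tagged are all made up for this example, and the three properties are an arbitrary subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyBag:
    """An immutable set of character properties (hypothetical subset)."""
    bold: bool = False
    italic: bool = False
    color: str = "auto"


@dataclass
class Run:
    """A run of text that points at a property bag by number."""
    text: str
    bag_id: int


class Document:
    def __init__(self):
        self.bags = []     # bag number -> PropertyBag
        self._index = {}   # PropertyBag -> bag number, used for reuse
        self.runs = []

    def _intern(self, bag):
        # Reuse an existing bag when the same formatting was seen before.
        if bag not in self._index:
            self._index[bag] = len(self.bags)
            self.bags.append(bag)
        return self._index[bag]

    def add_run(self, text, **props):
        self.runs.append(Run(text, self._intern(PropertyBag(**props))))

    def reveal_formatting(self, run_index):
        # There are no codes to show, only the bag the run refers to.
        return self.bags[self.runs[run_index].bag_id]

    def export_tagged(self):
        # Export is a conversion step: properties become serial tags.
        parts = []
        for run in self.runs:
            bag = self.bags[run.bag_id]
            text = run.text
            if bag.italic:
                text = f"<i>{text}</i>"
            if bag.bold:
                text = f"<b>{text}</b>"
            parts.append(text)
        return "".join(parts)


doc = Document()
doc.add_run("Hello ", bold=True)
doc.add_run("plain ")
doc.add_run("world", bold=True)
print(doc.export_tagged())
```

The first and third runs end up sharing one bag, and the tags only come into existence when export_tagged walks the runs, which is the sense in which there is nothing to "reveal" in the stored form.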

Some people pointed out that Open Office is not an *exact* clone of Office. That wasn't my point - all I was saying was that as a designer, I am interested in innovative, clever, usable designs to solve problems. When I looked at Open Office 1.1.1 the other day, nothing jumped out at me. If there are some neat designs in there, please share the details.

Creating and modifying Word binary docs outside of Word. Well, I think from a technical perspective that's a risky proposition. We wouldn't try it, that's for sure. It's the kind of thing you can get sort of working but it never leaves that stage due to the complexity involved. That's why our binary save converter for Word95 format was actually a version of Word95 hooked up to read RTF and spit out 95 *.doc. Creating a Word binary from scratch is tough. RTF is used for this purpose instead since it is easier to deal with than Word binary for apps other than Word (remember that is why we created it - it stands for Rich Text interchange Format). The new XML format is designed for exactly that purpose - and it is easier to work with than RTF. You can create the WordML doc (or even a minimal subset) on a server using XML tools, then send the XML to Word on the client and Word will load it up. If you're missing a lot of the Word-specific stuff, that's OK - Word will fill in the missing bits with defaults. In fact, you can skip generating the doc on the server if you want - just generate an XML data file in your own schema and provide an XSLT for Word to use when opening the file. That pushes a lot of the processing onto the client.
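
To give a feel for how small a "minimal subset" can be, here is a sketch of building a bare-bones WordML document with ordinary XML tools. It assumes the Word 2003 namespace and the wordDocument/body/p/r/t element names as I recall them from the published Office 2003 schemas, so check those against the schema reference before relying on it. The function name minimal_wordml is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the Word 2003 WordML namespace, as published with the
# Office 2003 XML reference schemas.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def minimal_wordml(paragraphs):
    """Return a minimal WordML document with one run per paragraph."""
    root = ET.Element(f"{{{W_NS}}}wordDocument")
    body = ET.SubElement(root, f"{{{W_NS}}}body")
    for text in paragraphs:
        p = ET.SubElement(body, f"{{{W_NS}}}p")   # paragraph
        r = ET.SubElement(p, f"{{{W_NS}}}r")      # run of text
        t = ET.SubElement(r, f"{{{W_NS}}}t")      # the text itself
        t.text = text
    # The processing instruction hints to the shell that Word should
    # open the file; no styles, fonts or sections are supplied here.
    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return header + ET.tostring(root, encoding="unicode")


xml_text = minimal_wordml(["Hello from the server.", "Word fills in the rest."])
print(xml_text)
```

Everything not mentioned (styles, page setup, fonts) is simply left out, on the expectation described above that Word supplies defaults for the missing bits.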

BTW, a lot of the confusion around XML in Word2003 was that people thought it was just a file format - probably because Open Office uses it that way, and because of the long tradition of SGML, which was so document-focused. To us, WordML is handy, but what is really cool is the support for schemas that customers or other developers define. WordML in this sense acts as the "envelope" for the "letter" that is the real customer data. The Pro version of Office allows you to tag up a document in Word (although we think that's a pretty unfriendly thing to ask a normal user to do, it is the first step for developers). More interestingly, as a developer, you can build structured templates using your own schema, and have users create docs using them that are pre-structured. You can hook the save event to get the document out as XML, and then you're off and running. There's a thing on the client called the "schema-library" that associates XML namespaces of your choice with XSL files, solutions, etc. This means once you're set up in the schema-library, you can dump blobs of XML to Word (via e-mail attachments, or code), and Word will check the XML you provide, find the associated files to deal with it locally, and transform that XML using a presentation that can also retain the XML markup you supplied. Note this important difference - this is not converting one schema into another like a file converter (although it can be used that way) - it is generating presentation to wrap around the actual customer data, which is retained in the resulting file.
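
The lookup-then-wrap flow can be sketched in a few lines. This is a toy model only: the real schema-library associates namespaces with XSL files and solutions, whereas here a plain Python function stands in for the transform, and the namespace urn:example:orders, the element names, and the functions present_order and open_blob are all invented for illustration.

```python
import xml.etree.ElementTree as ET

ORDER_NS = "urn:example:orders"  # hypothetical customer namespace


def present_order(data):
    """Wrap customer data in presentation, keeping the original markup."""
    envelope = ET.Element("document")
    heading = ET.SubElement(envelope, "heading")
    customer = data.findtext(f"{{{ORDER_NS}}}customerID", default="?")
    heading.text = "Order " + customer
    envelope.append(data)  # the customer's own elements are retained
    return envelope


# Toy stand-in for the schema-library: namespace -> presentation.
SCHEMA_LIBRARY = {ORDER_NS: present_order}


def namespace_of(element):
    """Extract the namespace URI from an ElementTree '{uri}name' tag."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def open_blob(xml_text):
    data = ET.fromstring(xml_text)
    transform = SCHEMA_LIBRARY.get(namespace_of(data))
    # Unknown namespaces pass through untouched in this sketch.
    return data if transform is None else transform(data)


blob = (
    f'<order xmlns="{ORDER_NS}">'
    "<customerID>C-17</customerID><quantity>3</quantity>"
    "</order>"
)
result = open_blob(blob)
print(ET.tostring(result, encoding="unicode"))
```

The point to notice is that the order element survives inside the result unchanged; the presentation is added around it rather than replacing it.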

To be clear, if you’re thinking only in terms of file format, then the XML you're imagining has things like "bold", "italic", "indent", etc. And then the conversions you imagine are sort of like converting "<b>" into "<bold>" or whatever. This is necessary stuff but not all that exciting. What I'm talking about are schemas of the form "customer ID", "quantity", "price", etc. These are database schemas with semantic markup of the data. Without support for this sort of thing in the application, XML really is just another file format. A handy one to parse outside the creating app - no question there - but the exciting bit is when you can hook business data into your documents, modify it in the context of the tools you are familiar with, and print or save or update the database - whatever. This can cut out a lot of steps in today's workflow, and not only be faster but also reduce error.
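
Here is a small sketch of why the semantic kind of markup is the interesting one. The document below mixes presentation with business data in a made-up schema (the urn:example:orders namespace and the element names are hypothetical, not anything Word defines), and a program outside the word processor pulls the numbers straight out while ignoring the formatting entirely.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {"o": "urn:example:orders"}  # hypothetical customer namespace

DOCUMENT = """<document>
  <para style="bold">Thank you for your order.</para>
  <o:order xmlns:o="urn:example:orders">
    <o:customerID>C-17</o:customerID>
    <o:line><o:quantity>3</o:quantity><o:price>19.99</o:price></o:line>
    <o:line><o:quantity>1</o:quantity><o:price>5.00</o:price></o:line>
  </o:order>
</document>"""


def order_total(xml_text):
    """Sum quantity * price over every order line in the document."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for line in root.iterfind(".//o:line", NS):
        quantity = int(line.findtext("o:quantity", namespaces=NS))
        price = Decimal(line.findtext("o:price", namespaces=NS))
        total += quantity * price
    return total


print(order_total(DOCUMENT))
```

Nothing in order_total knows or cares that the first paragraph is bold; it only follows the customer's own element names, which is what lets the same data feed a database update or a report without retyping.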

Working on XML in Word2003 was a blast - it seemed like every week we'd come up with a new amazing thing you could do with it. The last two or three years have been some of the most fun I've ever had at work - OneNote was of course a thrill ride, and the XML stuff in Word2003 really was breaking exciting new ground.