Follow up on questions around PDF support in Office "12" (part 1 of many)

Sorry for not posting any replies to the PDF questions sooner; it's been pretty busy. I have a list of all the questions and I'll try to write up responses to everything over the next week or two. Today I want to address some of the earlier questions.

Do we support tagged PDF (for accessibility)?

A number of folks asked this question. While we do not support all possible tags (I don't know of many applications that do), we do support a number of them. We support tags that assist with text flow, ALT text, and Unicode text in all applications except InfoPath, Access, and OneNote (at this time). This is an area we would love to get more feedback on. Are there other structures you would like to see tagged?

If we do support tagged PDF, what kind of semantics will Office 12 require the author to use (e.g. headings, styles)?

We are supporting basic tags that do not require special authoring; they are generated automatically by the application from the document data. Specifically, though, we are not supporting Heading tags at this time. However, we would be interested in any feedback people may have in this area.

There is a challenge here as we start trying to support tagging for other types of structures. Everything is full fidelity for the view, so what you see in Office is what you'll see in the PDF. On top of that there is additional functionality, like tagging, which is really useful in a number of cases. In Word, for instance, we have some structural features that would be nice to represent in the PDF with tags. The problem comes with more complex layouts, where text that reads continuously is actually disconnected in the PDF output. This is a challenge with any print-based format. Like I said, though, we're definitely open to suggestions and would love feedback on what type of support you'd like to see here.
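
To make the disconnected-text problem a bit more concrete, here is a toy sketch in Python. It is not how Office or any PDF library represents tags; the `Fragment` class and its `paint_order` and `reading_order` fields are made up purely for illustration. The idea is only that a print-oriented page records text in the order it is drawn, while a tag can carry the order it should be read in.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    paint_order: int    # hypothetical: position in the page's drawing stream
    reading_order: int  # hypothetical: position a structure tag would assign


# A page where a sidebar is drawn in between two halves of one sentence.
fragments = [
    Fragment("Tagged PDF helps", paint_order=0, reading_order=0),
    Fragment("Sidebar: see also", paint_order=1, reading_order=2),
    Fragment("screen readers.", paint_order=2, reading_order=1),
]


def text_by(frags, key):
    """Join fragment text after sorting by the named ordering field."""
    ordered = sorted(frags, key=lambda f: getattr(f, key))
    return " ".join(f.text for f in ordered)


painted = text_by(fragments, "paint_order")
tagged = text_by(fragments, "reading_order")
print(painted)  # sentence is interrupted by the sidebar
print(tagged)   # sentence reads continuously, sidebar follows
```

Without the extra ordering information, a reader of the file only has the drawing order to go on, which is why more complex layouts are harder to tag well.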

Why is this built directly into Office as opposed to a print driver in Vista?

I can't speak for what the Vista team plans to do, but I want to be clear that this is actually much more powerful than just going through a printer driver. Folks had mentioned that the Mac OS has built-in support for PDF, but that is through printer drivers [Correction: looks like I was wrong here, see first comment]. Because this is a “native” solution where we are working with application data that is upstream from where a printer driver solution begins, we are starting with much more complete information about the file. That allows us to do a better job tagging documents and implementing interactive features like internal and external hyperlinks. Additionally, transparency and gradient quality will be much better.

Is this publish only? Or will we support opening PDFs as well?

This is a publish-only, one-way operation. We are neither shipping a special viewer nor doing any work to make PDF files readable by the Office applications.

What choices were made with font embedding/subsetting/outlining?

As a rule, we embed and subset embeddable fonts (when permitted). We are not doing font outlining in the general case.
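
For anyone wondering what "when permitted" can look like in practice, here is a minimal sketch in Python. It is an assumption for illustration, not a description of the Office code: it reads the embedding permission flags that OpenType and TrueType fonts carry in the `fsType` field of their OS/2 table and picks one of three outcomes. The function name `embedding_plan` and the returned strings are hypothetical.

```python
# fsType flag values from the OpenType OS/2 table.
RESTRICTED = 0x0002     # Restricted License embedding
PREVIEW_PRINT = 0x0004  # Preview & Print embedding
EDITABLE = 0x0008       # Editable embedding
NO_SUBSETTING = 0x0100  # font may not be subsetted before embedding
BITMAP_ONLY = 0x0200    # only bitmaps may be embedded, no outlines

PERMISSION_MASK = RESTRICTED | PREVIEW_PRINT | EDITABLE


def embedding_plan(fs_type: int) -> str:
    """Return 'none', 'full', or 'subset' for a font's fsType value.

    Hypothetical policy: embed a subset when allowed, embed the whole
    font when subsetting is forbidden, and embed nothing otherwise.
    """
    # Restricted only applies when it is the sole permission bit set.
    if fs_type & PERMISSION_MASK == RESTRICTED:
        return "none"
    # Outline data may not be embedded at all.
    if fs_type & BITMAP_ONLY:
        return "none"
    if fs_type & NO_SUBSETTING:
        return "full"
    return "subset"


for value in (0x0000, 0x0002, 0x0008, 0x0108, 0x0200):
    print(f"fsType=0x{value:04X} -> {embedding_plan(value)}")
```

A real exporter would have more cases to think about (fonts without an OS/2 table, for example), so treat this as a sketch of the decision rather than a complete policy.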

Where's the innovation? :-)

Well, as I said in the comments, this particular feature is about responding to customer demand. We've had a ton of people ask us for this support, and so we're providing it. I don't really care who already has this functionality; that has nothing to do with why we did this. You're seeing all kinds of press around this because it's something that a lot of people have been asking for and a lot of people care about. There are tons of great blog entries out there from people who are excited that we are building this directly into the product. I don't think anyone has claimed that this particular feature is about innovation (unlike something like the new user interface). It was a lot of work to build, though. It's natively supported in Word, Excel, PowerPoint, Access, Publisher, OneNote, Visio, and InfoPath, which is a really big undertaking.

OK, sorry I haven't had a chance to answer all the other questions yet. It's really busy right now as we're trying to get stuff tightened down for the first Beta coming out in the next couple of months. I'm also trying to pull together a couple of posts dealing with the questions around OpenDocument, and that's taking some time as well. Not to mention, I have a lot more fun talking about scenarios and things you can do with the formats, so I really want to get back onto those topics. I'll try to get to the other questions around PDF in the next couple of days, though.