Article on Document Security

I just read this post on document security:

Obviously there are many different issues here, some easier to solve than others. Some of the problems are approachable with transparent file formats, as well as with support for customer-defined XML. One problem that can be solved with the new Office XML formats is that of unknown metadata being sent out with a document. Since the formats are now open, you can easily check that a file doesn't contain hidden edits, comments, or sensitive metadata. There is no longer any need to worry about things being hidden in your files. Everything is represented in a well-documented format that people can build solutions on top of. I had a post a couple of weeks ago showing how you could apply a simple XSLT to remove comments and tracked changes from a document. You can imagine this being an automated process applied to all documents going out via e-mail or being posted on external websites. Of course, this doesn't solve the problem of users posting documents that they shouldn't, but it does help in those cases where people unknowingly post data that was hidden from them in their editing environment.
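To make the validation idea a bit more concrete, here is a minimal sketch in Python of what such a check might look like. It rests on several assumptions that are not from this post: that the document is a ZIP package whose main part lives at word/document.xml, that comments live in word/comments.xml, and that the markup uses the WordprocessingML namespace and element names (w:ins, w:del, w:commentRangeStart and so on) as published in ECMA-376. The function name find_hidden_content is made up for illustration. Treat it as a starting point rather than a complete scanner.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: WordprocessingML namespace as published in ECMA-376.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: these elements mark tracked changes and comment anchors.
HIDDEN_TAGS = ("ins", "del", "commentRangeStart", "commentRangeEnd",
               "commentReference")


def find_hidden_content(package):
    """Return a sorted list of findings for a package path or file object.

    Hypothetical helper: it only looks at the conventional part names
    word/document.xml and word/comments.xml.
    """
    findings = set()
    with zipfile.ZipFile(package) as zf:
        names = set(zf.namelist())
        if "word/comments.xml" in names:
            findings.add("comments part present")
        if "word/document.xml" in names:
            root = ET.fromstring(zf.read("word/document.xml"))
            for el in root.iter():
                for tag in HIDDEN_TAGS:
                    if el.tag == "{%s}%s" % (W_NS, tag):
                        findings.add("element w:" + tag)
    return sorted(findings)


# Build a tiny in-memory package so the sketch runs without a real file.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr(
        "word/document.xml",
        '<w:document xmlns:w="%s"><w:body><w:p><w:ins><w:r>'
        '<w:t>added text</w:t></w:r></w:ins></w:p></w:body>'
        '</w:document>' % W_NS,
    )
print(find_hidden_content(sample))
```

An empty list would mean this particular check found nothing, not that the file is clean. Headers, footers, document properties and other parts would need the same treatment in a real outbound filter.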

Additionally, you could use the support for customer-defined XML to mark the documents up with your own XML, and use that data to decide whether certain content should be removed when a document is posted externally. I've seen people do this with Word 2003's XML support by creating XML elements to specify "security" levels. If you apply your XML to document templates, it becomes easier to identify the types of documents traveling out of your organization. This kind of metadata can be a double-edged sword: it helps you identify the type of document, but it is also something you don't want to expose outside your organization (though, as I said earlier, you can remove this data easily enough). It still doesn't stop the malicious user, though, since they can just remove the tags or copy the content and paste it as plain text into an e-mail. That's where the problem becomes more complex.
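As a rough illustration of the "security level" idea, the Python sketch below drops any element tagged above an allowed level before a document goes out. Everything specific here is hypothetical: the namespace urn:example:security, the element name block, the level attribute and its values are invented for the example, and a real Word 2003 solution would use whatever schema your organization defined.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer-defined namespace and level names.
SEC_NS = "urn:example:security"
RESTRICTED_LEVELS = {"internal", "confidential"}


def strip_restricted(xml_text):
    """Return the XML with restricted sec:block elements removed.

    Limitations of this sketch: a restricted root element is not handled,
    and text that directly follows a removed element (its tail) is
    dropped along with it.
    """
    root = ET.fromstring(xml_text)
    block_tag = "{%s}block" % SEC_NS
    # Take snapshots with list() because the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if (child.tag == block_tag
                    and child.get("level") in RESTRICTED_LEVELS):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


doc = (
    '<doc xmlns:sec="urn:example:security">'
    '<sec:block level="public"><p>Press release text</p></sec:block>'
    '<sec:block level="confidential"><p>Unannounced pricing</p></sec:block>'
    '</doc>'
)
print(strip_restricted(doc))
```

The public block's wrapper is left in place here. Since the markup itself is something you may not want to expose, a second pass could unwrap the remaining blocks so that only their content is sent.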

This is an interesting topic, and I'll try to get into it a bit more, particularly how it relates to the new Office file formats. There are also some relevant tools out there today that I'll try to dig up pointers to.