Custom Document Parsers


Managing the metadata associated with your documents is one of the most powerful advantages of storing your enterprise content in Windows SharePoint Services 3.0. However, keeping that information in sync between the document library and the document itself is a challenge. Windows SharePoint Services 3.0 provides the document parser infrastructure, which enables you to create and install custom document parsers. A custom parser can parse your custom file types, update a document with changes made at the document library level, and update the document library with changes made in the document. Using a document parser for your custom file types helps ensure that your document metadata is always current and synchronized between the document library and the document itself.

A document parser is a custom COM assembly that, by implementing the Windows SharePoint Services 3.0 document parser interface, does the following when invoked by Windows SharePoint Services 3.0:

  • Extracts document property values from a document of a certain file type, and passes those property values to Windows SharePoint Services for promotion to the document library property columns.

  • Receives document properties and then demotes those property values into the document itself.

This functionality enables users to edit document properties in the document itself, and have the property values on the document library automatically updated to reflect their changes. Likewise, users can update property values at the document library level, and have those changes written back into the document automatically.
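
The round trip described above can be illustrated with a minimal sketch. It is not the Windows SharePoint Services document parser interface, which is COM-based and described in Document Parser Interface Overview. The "key=value" header format, the function names promote and demote, and the sample property names are all hypothetical, chosen only to show the two directions of data flow.

```python
# Toy model of property promotion and demotion.
# Assumption: a made-up file format with "key=value" header lines,
# a "---" separator line, and then the document body.

HEADER_END = "---"


def promote(document_text: str) -> dict:
    """Extract property values from the document header (promotion)."""
    properties = {}
    for line in document_text.splitlines():
        if line == HEADER_END:
            break
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def demote(document_text: str, properties: dict) -> str:
    """Write property values back into the document header (demotion)."""
    _, _, body = document_text.partition(HEADER_END + "\n")
    header = "".join(f"{key}={value}\n" for key, value in properties.items())
    return header + HEADER_END + "\n" + body


document = "Title=Draft\nAuthor=Kim\n---\nBody text.\n"

# A user saves the document; promotion fills the library columns.
library_columns = promote(document)

# A user edits a column in the library; demotion updates the document.
library_columns["Title"] = "Final"
document = demote(document, library_columns)
```

In this sketch the dictionary stands in for the document library property columns. A real parser would receive and return property values through the parser interface rather than through a Python dictionary.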

For more information about how Windows SharePoint Services invokes document parsers, and how those parsers promote and demote document metadata, see Document Parser Processing.

Parser Requirements

For Windows SharePoint Services to use a custom document parser, the document parser must meet the following conditions:

  • The document parser must be a COM assembly that implements the document parser interface.

    For more information, see Document Parser Interface Overview.

  • The document parser assembly must be installed and registered on each front-end Web server in the Windows SharePoint Services installation.

  • You must add an entry for the document parser in DOCPARSE.XML, the file that contains the list of document parsers and the file types with which each is associated.

    For more information, see Document Parser Definition Schema Overview.

Parser Association

Windows SharePoint Services selects the document parser to invoke based on the file type of the document to be parsed. A given document parser can be associated with multiple file types, but you can associate a given file type with only one parser.

To specify the file type or types that a custom document parser can parse, you add a node to the DOCPARSE.XML file. Each node in this file identifies a document parser assembly and specifies the file type for which it is to be used. You can specify a file type by either file name extension or program ID.

If the list contains more than one document parser for the same file type, Windows SharePoint Services invokes only the first document parser listed for that file type; any later entries for that file type are never used.
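
The association rules above can be modeled with a short sketch. It does not read the actual parser definition file; the ParserEntry type, the select_parser function, and the Contoso names are hypothetical. The sketch also assumes exact string matching and simply walks the list in order, because the text above states only that the first listed parser for a file type is used.

```python
from typing import NamedTuple, Optional, Sequence


class ParserEntry(NamedTuple):
    match_by: str  # "extension" or "progid"
    key: str       # for example "abc" or "Contoso.Drawing"
    parser: str    # hypothetical parser identifier


def select_parser(
    entries: Sequence[ParserEntry],
    extension: str,
    prog_id: Optional[str] = None,
) -> Optional[str]:
    """Return the first listed parser associated with the file type."""
    for entry in entries:
        if entry.match_by == "extension" and entry.key == extension:
            return entry.parser
        if entry.match_by == "progid" and entry.key == prog_id:
            return entry.parser
    return None  # no custom parser is associated with this file type


entries = [
    ParserEntry("extension", "abc", "Contoso.AbcParser"),
    ParserEntry("extension", "xyz", "Contoso.AbcParser"),    # one parser, two file types
    ParserEntry("extension", "abc", "Contoso.OtherParser"),  # duplicate, never selected
    ParserEntry("progid", "Contoso.Drawing", "Contoso.DrawingParser"),
]

chosen_for_abc = select_parser(entries, "abc")
chosen_for_drawing = select_parser(entries, "dwg", prog_id="Contoso.Drawing")
```

The duplicate entry for "abc" shows why listing a second parser for the same file type has no effect in this model: the loop returns as soon as the first match is found.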

Windows SharePoint Services 3.0 includes built-in document parsers for the following file types:

  • OLE: includes DOC, XLS, PPT, MSG, and PUB file formats

  • Office 2007 XML formats: includes DOCX, DOCM, PPTX, PPTM, XLSX, and XLSM file formats

  • XML

  • HTM: includes HTM, HTML, MHT, MHTM, and ASPX file formats

You cannot create a custom document parser for these file types. For more information about how to use the built-in XML parser to promote and demote document properties for XML files, see XML Document Property Promotion and Demotion.
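
As a planning aid, the restriction above can be expressed as a simple pre-check. This is a sketch only: the function name can_register_custom_parser is hypothetical, the set contains just the extensions named in the list above, and the case-insensitive comparison and leading-dot handling are assumptions made for convenience.

```python
# Extensions named in the list of built-in document parsers above.
BUILT_IN_EXTENSIONS = frozenset({
    "doc", "xls", "ppt", "msg", "pub",                  # OLE
    "docx", "docm", "pptx", "pptm", "xlsx", "xlsm",     # Office 2007 XML formats
    "xml",                                              # XML
    "htm", "html", "mht", "mhtm", "aspx",               # HTM
})


def can_register_custom_parser(extension: str) -> bool:
    """Return False if the extension is handled by a built-in parser."""
    normalized = extension.lstrip(".").lower()  # assumption: ignore case and a leading dot
    return normalized not in BUILT_IN_EXTENSIONS


blocked = can_register_custom_parser(".DOCX")
allowed = can_register_custom_parser("abc")
```

Running a check like this before you edit the parser definition file can catch an attempt to associate a custom parser with a built-in file type early.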

For more information about defining a custom document parser, see Document Parser Definition Schema Overview.

Parser Deployment

To guarantee that Windows SharePoint Services is able to invoke a given parser whenever necessary, you must install each parser assembly on each front-end Web server in your Windows SharePoint Services installation. Because of this, you can specify only one parser for a given file type across a Windows SharePoint Services installation.

The document parser infrastructure does not include the ability to package and deploy a custom document parser as part of a Windows SharePoint Services Feature.

See Also

Concepts

Document Parser Processing

Mapping Document Properties to Columns

Document Parsing and Content Types

Document Parser Definition Schema Overview

Document Parser Interface Overview