Registering IFilters in SharePoint Portal Server

The SharePoint Portal Server Search service uses Microsoft Windows® 2000 Indexing Service filter technology to filter the contents of documents in the document library and external content sources. You can install a special dynamic-link library (DLL), referred to as an IFilter, to access documents in their native formats. The name 'IFilter' is derived from the name of the Component Object Model (COM) interface that an IFilter implements.

SharePoint Portal Server provides IFilters for common document formats. These include IFilters for text, HTML, Microsoft Office documents, and Tagged Image File Format (TIFF).

IFilters run in the Filter Daemon process and extract the content and properties from specific file types. For example, the HTML filter strips all HTML tags from a document and emits the body text, along with various HTML elements, such as Title, as properties. IFilters are registered to document types, as identified by file name extension, Multipurpose Internet Mail Extensions (MIME) content type, or class ID, and are independent of the protocol handler that uses them.
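The HTML example can be illustrated with a short conceptual sketch. It is written in Python with the standard html.parser module and does not implement the IFilter COM interface; the names HtmlTextExtractor and filter_html are hypothetical. It only shows the idea of separating body text from a property such as Title.

```python
from html.parser import HTMLParser


class HtmlTextExtractor(HTMLParser):
    """Collects body text and reports the <title> element as a property."""

    def __init__(self):
        super().__init__()
        self.properties = {}      # property name -> value, for example "Title"
        self.text_chunks = []     # text found outside the <title> element
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.properties["Title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)
        elif data.strip():
            self.text_chunks.append(data.strip())


def filter_html(document):
    """Return (body_text, properties) for an HTML string."""
    extractor = HtmlTextExtractor()
    extractor.feed(document)
    extractor.close()
    return " ".join(extractor.text_chunks), extractor.properties


sample = ("<html><head><title>Annual Report</title></head>"
          "<body><h1>Summary</h1><p>Sales rose.</p></body></html>")
text, properties = filter_html(sample)
print(text)
print(properties)
```

A real IFilter returns its output to the indexer in chunks rather than as a single string, so this sketch should be read as an illustration of the result, not of the interface.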

The Microsoft Platform Software Development Kit (SDK) provides complete information about designing and creating IFilters for the Windows 2000 Indexing Service. You can find this information in Microsoft MSDN® Library online in the Base Services section of the Platform SDK. For reference information about IFilters, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixrefint_9sfm.asp. For additional information about how to create and use IFilters, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_912d.asp.

Registering the IFilter

During self-registration (using regsvr32.exe), standard Indexing Service IFilters bind to the file name extensions that they can process. For more information about registering an IFilter through Indexing Service, see the section about Persistent Handlers in the Indexing Service 3.0 SDK.
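The binding created by standard self-registration can be pictured as a short chain of registry entries. The following Python sketch only builds that chain as data and writes nothing to the registry. It assumes the persistent handler layout described for Indexing Service: the extension points to a persistent handler, and the handler names the IFilter class under the IFilter interface ID. The function name, the .xyz extension, the DLL path, and both class IDs are hypothetical placeholders, so the exact key names should be verified against the SDK.

```python
# Interface ID of IFilter, used as the key name under PersistentAddinsRegistered.
IID_IFILTER = "{89BCB740-6119-101B-BCB3-00DD010CE0B7}"


def persistent_handler_entries(extension, handler_clsid, filter_clsid, dll_path):
    """Return (key, value name, data) tuples; "" stands for the default value."""
    root = "HKEY_CLASSES_ROOT"
    return [
        # 1. The file name extension points to its persistent handler.
        (f"{root}\\{extension}\\PersistentHandler", "", handler_clsid),
        # 2. The persistent handler names the class that implements IFilter.
        (f"{root}\\CLSID\\{handler_clsid}\\PersistentAddinsRegistered\\{IID_IFILTER}",
         "", filter_clsid),
        # 3. The IFilter class points to the DLL that hosts it.
        (f"{root}\\CLSID\\{filter_clsid}\\InprocServer32", "", dll_path),
    ]


# Placeholder values for illustration only.
for key, name, data in persistent_handler_entries(
        ".xyz",
        "{11111111-1111-1111-1111-111111111111}",
        "{22222222-2222-2222-2222-222222222222}",
        "C:\\Filters\\xyzfilt.dll"):
    print(key, "=", data)
```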

SharePoint Portal Server, however, provides additional flexibility and capabilities for IFilter registration through the FilterRegistration and FilterRegistration2 objects.

The methods of the FilterRegistration and FilterRegistration2 objects are called in the DllRegisterServer and DllUnregisterServer export functions of your custom IFilter. They support the standard mapping available for previous IFilter versions, such as mapping an IFilter to a file name extension. In addition, an IFilter can be mapped to a particular MIME content type and to particular file types for a specified indexing catalog. The FilterRegistration and FilterRegistration2 objects also support setting an IFilter as the default filter for a data source. For example, the Microsoft Word IFilter can be set as the default filter for all documents with a .doc file name extension.

If an IFilter registered for the document type is available, SharePoint Portal Server applies it to the documents that it is crawling. If a registered IFilter is not available for the document type, SharePoint Portal Server uses the default IFilter. Binding features specific to SharePoint Portal Server are available only to IFilters that register for these features. The IFilters registered for the SharePoint Portal Server application are examined first, followed by IFilters registered globally, and, finally, those registered to the Indexing Service.
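The lookup order described above can be sketched as a simple fall-through search. This Python sketch is conceptual: the function name, the dictionaries standing in for the three registration scopes, and the filter names are all hypothetical, and it does not reflect how SharePoint Portal Server stores registrations internally.

```python
def resolve_filter(doc_type, sps_filters, global_filters,
                   indexing_service_filters, default_filter):
    """Pick a filter for doc_type using the documented search order."""
    # Examine application-specific registrations first, then global ones,
    # then those registered to the Indexing Service.
    for registry in (sps_filters, global_filters, indexing_service_filters):
        if doc_type in registry:
            return registry[doc_type]
    # No registered IFilter was found, so fall back to the default IFilter.
    return default_filter


# Hypothetical registrations keyed by file name extension.
sps = {".doc": "SharePoint-specific Word filter"}
machine_wide = {".doc": "Global Word filter", ".htm": "Global HTML filter"}
indexing_service = {".txt": "Indexing Service text filter"}

for extension in (".doc", ".htm", ".txt", ".xyz"):
    print(extension, "->",
          resolve_filter(extension, sps, machine_wide, indexing_service,
                         "Default filter"))
```

In this example, .doc resolves to the application-specific registration even though a global registration also exists, and .xyz falls through to the default filter.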

The LoadFilter object provides additional methods that allow IFilters to load other IFilters. This functionality is useful in constructing IFilters that filter documents with one or more embedded or attached items.
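The idea of one IFilter loading others for embedded or attached items can be sketched as delegation. The following Python sketch is conceptual and does not use the actual methods of the LoadFilter object; the function filter_message, the message layout, and the per-extension callables are hypothetical stand-ins for a container IFilter and the IFilters it loads.

```python
def filter_message(message, filters_by_extension):
    """Return text chunks for a message and for each attachment it can filter.

    message: dict with "body" (str) and "attachments" (list of (name, content)).
    filters_by_extension: maps an extension such as ".txt" to a callable that
    returns the text of an attachment (a stand-in for a loaded IFilter).
    """
    chunks = [message["body"]]
    for name, content in message["attachments"]:
        # Use the attachment's extension to choose the nested filter.
        extension = name[name.rfind("."):].lower() if "." in name else ""
        nested_filter = filters_by_extension.get(extension)
        if nested_filter is not None:
            chunks.append(nested_filter(content))
        # Attachments without a matching filter are skipped in this sketch.
    return chunks


message = {
    "body": "See the attached notes.",
    "attachments": [("notes.TXT", "plain text notes"), ("photo.bin", b"\x00\x01")],
}
print(filter_message(message, {".txt": str.upper}))
```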

Introduction to Protocol Handlers