Developing Filters for Windows Search
Microsoft Windows Search uses filters to extract the content of files for inclusion in a full-text index. You can extend Windows Search to index new or proprietary file types by writing filters to extract the content and property handlers to extract the properties. Filters are associated with file types, as denoted by file extensions. While one filter can handle multiple file extensions, each extension works with only one filter.
How Filters Work
In general terms, filters work by reading file streams from protocol handlers, chunking file data into logical blocks of either property values or text content, and emitting those chunks to the filter host for inclusion in a full-text index. In Microsoft Windows Vista, property handlers are generally responsible for accessing properties and filters are generally responsible for accessing content.
During the indexing process, Windows Search calls the appropriate filter with the URL for each file or item. The filter breaks the item's content into chunks of text. Windows Search adds the text returned by the filter to the catalog. Windows Search can index any file type for which it has a registered filter.
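The chunking flow described above can be sketched as a small, portable C++ model. This is only an illustration under stated assumptions: `ToyFilter`, `Chunk`, `IndexedItem`, and `RunFilter` are hypothetical names invented for this sketch, not part of the Windows Search API. A real filter implements the COM `IFilter` interface, whose `GetChunk` and `GetText` methods return `HRESULT` values and describe each chunk with a `STAT_CHUNK` structure. The model treats each non-empty line of the body as one text chunk and emits the title as a single property-value chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model types; the real interface uses STAT_CHUNK and HRESULTs.
enum class ChunkKind { Text, Value };

struct Chunk {
    unsigned id;        // unique, nonzero identifier within one item
    ChunkKind kind;     // text content or a property value
    std::string name;   // property name for Value chunks, empty for Text
    std::string data;   // the chunk's text or the property's value
};

// Toy filter: emits the title as one Value chunk, then each non-empty
// line of the body as a Text chunk.
class ToyFilter {
public:
    ToyFilter(std::string title, std::string body)
        : title_(std::move(title)), body_(std::move(body)) {}

    // Fills 'out' with the next chunk; returns false when no chunks remain.
    bool NextChunk(Chunk& out) {
        if (!titleEmitted_) {
            titleEmitted_ = true;
            out = Chunk{nextId_++, ChunkKind::Value, "Title", title_};
            return true;
        }
        // Skip line breaks between logical blocks.
        while (pos_ < body_.size() && body_[pos_] == '\n') ++pos_;
        if (pos_ >= body_.size()) return false;

        std::size_t end = body_.find('\n', pos_);
        if (end == std::string::npos) end = body_.size();
        out = Chunk{nextId_++, ChunkKind::Text, "", body_.substr(pos_, end - pos_)};
        pos_ = end;
        return true;
    }

private:
    std::string title_;
    std::string body_;
    std::size_t pos_ = 0;
    unsigned nextId_ = 1;
    bool titleEmitted_ = false;
};

// What the hypothetical host collects for one item.
struct IndexedItem {
    std::vector<std::string> text;                                // full-text content
    std::vector<std::pair<std::string, std::string>> properties;  // name/value pairs
};

// Host-side loop: pull chunks until the filter reports that none remain.
IndexedItem RunFilter(ToyFilter& filter) {
    IndexedItem item;
    Chunk chunk{};
    while (filter.NextChunk(chunk)) {
        if (chunk.kind == ChunkKind::Text) {
            item.text.push_back(chunk.data);
        } else {
            item.properties.emplace_back(chunk.name, chunk.data);
        }
    }
    return item;
}
```

In this model, constructing `ToyFilter("Report", "first line\n\nsecond line\n")` and passing it to `RunFilter` yields two text entries and one `Title` property. The pull-style loop mirrors the general shape of the host repeatedly asking the filter for its next chunk, but the error handling, locale information, and break types of the real interface are omitted.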
When and What to Implement
In some circumstances, you do not need to write a new filter. Windows Search contains filters for over 200 types of items (including plaintext items such as HTML, XML, and source code files) and uses IFilter technology similar to that used by SharePoint Services. If you already have filters installed for your file type, Windows Search can very likely use those existing filters to index files of that type. Furthermore, Windows Search includes a general filter for file types that are plaintext-based. If you have a file type that can be processed by either an existing SharePoint Services filter or the plaintext filter, you can add the file extension and filter GUID to the registry so that Windows Search can locate and use that filter.
If, however, you have a proprietary, non-plaintext data or file format, writing a custom filter implementation is the only way to ensure that Windows Search can index the format in the catalog. Only one filter add-in can be registered for a file type, so be aware that your filter can override an existing filter, and that another filter can later override yours, for a specific file type.
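The one-filter-per-extension rule and its override behavior can be illustrated with a small in-memory model. This is a sketch under assumptions, not how Windows Search stores its associations (those live in the registry): `FilterTable`, `Register`, and `Lookup` are hypothetical names, filters are identified by plain strings rather than GUIDs, and extensions are compared without regard to case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for the extension-to-filter association.
class FilterTable {
public:
    // Associates 'extension' with 'filterId'. Returns the filter that was
    // previously registered for that extension, or an empty string if none.
    std::string Register(std::string extension, std::string filterId) {
        const std::string key = Normalize(std::move(extension));
        std::string previous;
        auto it = table_.find(key);
        if (it != table_.end()) previous = it->second;
        table_[key] = std::move(filterId);  // the latest registration wins
        return previous;
    }

    // Returns the filter registered for 'extension', or an empty string.
    std::string Lookup(std::string extension) const {
        auto it = table_.find(Normalize(std::move(extension)));
        return it == table_.end() ? std::string() : it->second;
    }

private:
    // Lowercases the extension so ".FOO" and ".foo" share one entry.
    static std::string Normalize(std::string ext) {
        std::transform(ext.begin(), ext.end(), ext.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return ext;
    }

    std::map<std::string, std::string> table_;  // one filter per extension
};
```

Because the map is keyed by extension, one filter can appear under many extensions while each extension resolves to exactly one filter. Registering a second filter for `.foo` replaces the first and returns its identifier, which is the situation an installer would want to detect before overriding another vendor's filter.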
If you deploy your filter only on Windows Vista, then you need a filter implementing the IFilter and IPersistStream interfaces, and a property handler implementing the IPropertyStore and IInitializeWithStream interfaces. Both implementations can live in the same COM server DLL.
There are, however, times when you need to implement a filter that handles both content and properties in order to:
- Support legacy MSSearch implementations
- Traverse links
- Preserve language information
- Recursively filter embedded items
In these situations, you need a full filter implementation, including the IFilter::GetValue method to access property values.
As noted earlier, Windows Vista and Windows Search include a new property system that separates an item's properties from its content. This property system does not exist in earlier versions, that is, Microsoft Windows Desktop Search (WDS) 2.x. If your filter must support other applications as described above, it may need to handle both content and properties. For more information on developing such a filter, refer to Developing IFilter Add-ins. The remainder of this SDK targets Windows Search development and may not thoroughly address legacy issues.