Filter Handlers that Ship with Windows
Microsoft supplies several standard filters with Windows Search. Clients call these filter handlers (which are implementations of the IFilter interface) to extract text and properties from a document.
This topic is organized as follows:
- Windows Search Implementation Notes
- Windows Search Filters
- Additional Resources
- Related topics
Windows Search Implementation Notes
In Windows 7 and later, filters written in managed code are explicitly blocked. Filters must be written in native code because of potential common language runtime (CLR) versioning issues in the process in which multiple add-ins run.
Windows 7 Implementation
In Windows 7 and later, registering a filter handler, property handler, or new file name extension triggers new behavior: when a new property handler, filter handler, or both are installed, files with the corresponding extensions are automatically re-indexed.
In Windows 7 and later, we recommend that you install a filter handler together with its corresponding property handlers, and that you register the filter handler before the property handler. Registering the property handler immediately initiates re-indexing of previously indexed files, without requiring a restart, and that re-indexing uses any filter handlers that are already registered to index content.
If a filter handler is installed without a corresponding property handler, automatic re-indexing occurs after the indexing service or the system restarts.
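To make the ordering rules concrete, the following C++ sketch models them as a small function over the sequence of registration steps an installer performs. It is a hypothetical model for reasoning about install order: `Step`, `ReindexPlan`, and `PlanReindex` are illustrative names, not Windows SDK APIs, and the model makes no claims about cases the text above does not cover.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the Windows 7 re-indexing rules described above.
// These names are illustrative only; they are not part of the Windows SDK.
enum class Step { RegisterFilterHandler, RegisterPropertyHandler };

struct ReindexPlan {
    bool immediate = false;        // re-indexing starts without a restart
    bool usesNewFilter = false;    // that immediate pass can use the new filter handler
    bool waitsForRestart = false;  // re-indexing deferred to a service or system restart
};

// Walks the registration steps in the order an installer performs them.
ReindexPlan PlanReindex(const std::vector<Step>& steps) {
    ReindexPlan plan;
    bool filterRegistered = false;
    bool propertyRegistered = false;
    for (Step step : steps) {
        if (step == Step::RegisterFilterHandler) {
            filterRegistered = true;
        } else if (!propertyRegistered) {
            // Registering the property handler triggers immediate re-indexing,
            // which can only use filter handlers that are already registered.
            propertyRegistered = true;
            plan.immediate = true;
            plan.usesNewFilter = filterRegistered;
        }
    }
    // A filter handler installed with no property handler waits for a restart.
    plan.waitsForRestart = filterRegistered && !propertyRegistered;
    return plan;
}
```

In this model, `PlanReindex({Step::RegisterFilterHandler, Step::RegisterPropertyHandler})` reports an immediate pass that can use the new filter handler, which is the recommended order; reversing the two steps still re-indexes immediately, but without the new filter.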
Windows Vista Implementation
In Windows Vista and earlier, installing an IFilter or property handler does not initiate re-indexing of existing items unless an independent software vendor (ISV) explicitly requests a rebuild of the index or re-indexing of matching URLs.
When implementing filters, be aware of two major differences between legacy applications, such as Indexing Service, and newer applications, such as Windows Search:
- Use of the IPersistStream interface.
- Use of property handlers.
First, Windows Vista and Windows Search 3.0 and later require that you use IPersistStream, for the following reasons:
- To ensure performance and future compatibility.
- To help increase security. Filters implemented with IPersistStream are more secure because the context in which the filter runs does not need the rights to open files on the disk or over the network.
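The security argument can be illustrated with a portable C++ sketch in which the host, not the filter, opens the data and hands over a stream. In a real filter, the indexer calls `IPersistStream::Load` with an `IStream`; here `std::istream` stands in for `IStream`, and the `StreamTextFilter` class is hypothetical, so the sketch compiles without Windows headers.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical, portable analogue of a stream-based filter. std::istream
// stands in for the COM IStream a real filter receives in IPersistStream::Load.
class StreamTextFilter {
public:
    // The host passes an already-open stream, so the filter never needs
    // the rights to open a file path on disk or over the network.
    bool Load(std::istream& stream) {
        std::ostringstream buffer;
        buffer << stream.rdbuf();  // read everything the host provides
        text_ = buffer.str();
        return !stream.bad();
    }

    const std::string& Text() const { return text_; }

private:
    std::string text_;
};
```

Because the filter only sees the stream object, the same code works whether the host read the bytes from a local file, a network share, or memory.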
The second major difference is that Windows Vista and Windows Search 3.0 and later have a new Property System that uses property handlers to enumerate properties of items.
However, there are times when you need to implement a filter that handles both content and properties in order to:
- Support legacy MSSearch implementations.
- Traverse links.
- Preserve language information.
- Recursively filter embedded items.
In these situations, you need a full filter implementation, including the IFilter::GetValue method to access property values.
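The chunk-by-chunk control flow of a full filter can be sketched as follows. This is a simplified, portable model, assuming a filter that emits a list of text and value chunks; `ModelFilter`, `Chunk`, and `Drain` are hypothetical names. The real COM methods (`IFilter::GetChunk`, `IFilter::GetText`, `IFilter::GetValue`) take `STAT_CHUNK`, buffer, and `PROPVARIANT` arguments and report status through HRESULTs instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, portable model of the IFilter chunk protocol; it mirrors
// only the control flow, not the real COM signatures or HRESULTs.
enum class ChunkKind { Text, Value };

struct Chunk {
    ChunkKind kind;
    std::string property;  // property name for Value chunks (illustrative)
    std::string payload;   // text for Text chunks, value for Value chunks
};

class ModelFilter {
public:
    explicit ModelFilter(std::vector<Chunk> chunks) : chunks_(std::move(chunks)) {}

    // Like GetChunk: returning false plays the role of FILTER_E_END_OF_CHUNKS.
    bool GetChunk(const Chunk** out) {
        assert(out != nullptr);
        if (next_ >= chunks_.size()) return false;
        *out = &chunks_[next_++];
        return true;
    }

private:
    std::vector<Chunk> chunks_;
    std::size_t next_ = 0;
};

// What an indexer collects: text for the full-text index and
// name/value pairs for the property store.
struct Extracted {
    std::string text;
    std::vector<std::pair<std::string, std::string>> properties;
};

Extracted Drain(ModelFilter& filter) {
    Extracted result;
    const Chunk* chunk = nullptr;
    while (filter.GetChunk(&chunk)) {
        if (chunk->kind == ChunkKind::Text) {
            // Real code would call GetText until FILTER_E_NO_MORE_TEXT.
            if (!result.text.empty()) result.text += ' ';
            result.text += chunk->payload;
        } else {
            // Real code would call GetValue to obtain a PROPVARIANT.
            result.properties.emplace_back(chunk->property, chunk->payload);
        }
    }
    return result;
}
```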
As noted earlier, Windows Vista and Windows Search include a new property system that stores an item's properties separately from its content. This property system does not exist in Microsoft Windows Desktop Search (WDS) 2.x and earlier. If your filter must support other applications as described above, it may need to handle both content and properties.
For more information on developing a compatible filter, see IFilter (for legacy applications) and Developing Filter Add-ins (for legacy applications).
Windows Search Filters
The following table summarizes the standard filter handlers that Microsoft supplies with Windows Search and the IFilter DLL that contains each one. Each filter handler is described in its own section after the table.
|Filter handler||Files filtered||IFilter DLL|
|MIME Filter Handler||Multipurpose Internet Mail Extension (MIME)||mimefilt.dll|
|HTML Filter Handler||HTML 3.0 or earlier||nlhtml.dll|
|Document Filter Handler||Microsoft Word, Excel, PowerPoint||offfilt.dll|
|Plain Text Filter Handler||Plain text files - Default IFilter||query.dll|
|Binary or Null Filter Handler||Binary files - Null IFilter||query.dll|
MIME Filter Handler
The MIME filter handler (in mimefilt.dll) extracts text and property information from files with the extensions .eml, .mht and .mhtml.
HTML Filter Handler
The HTML filter handler (in nlhtml.dll) extracts text and property information from files of the class "htmlfiles" so that Windows Search can index them. For a description of how an IFilter is associated with a file type, see "Finding the IFilter DLL for a File" in Registering Filter Handlers.
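As we understand the lookup described in "Finding the IFilter DLL for a File", it follows a chain of registry keys from the file name extension to the DLL: the extension's PersistentHandler, then the IFilter CLSID registered under that handler, then the CLSID's InprocServer32 entry. The C++ sketch below walks that chain over an in-memory map rather than the real registry. `FakeRegistry`, `ReadDefault`, and `FindFilterDll` are hypothetical, the GUIDs in the test data are placeholders, and any fallback lookups the linked topic describes are omitted; only `{89BCB740-6119-101A-BCB7-00DD010655AF}`, the IID of IFilter, is a real value.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// In-memory stand-in for HKEY_CLASSES_ROOT: each entry maps a key path to
// that key's default value. Hypothetical; real code would read the registry.
using FakeRegistry = std::map<std::string, std::string>;

// IID of IFilter, used as the subkey name under PersistentAddinsRegistered.
const std::string kIFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

std::optional<std::string> ReadDefault(const FakeRegistry& reg, const std::string& key) {
    auto it = reg.find(key);
    if (it == reg.end()) return std::nullopt;
    return it->second;
}

// Follows .ext -> PersistentHandler -> IFilter CLSID -> InprocServer32 DLL.
// Fallback lookups (for example through the extension's ProgID) are omitted.
std::optional<std::string> FindFilterDll(const FakeRegistry& reg, const std::string& ext) {
    std::optional<std::string> handler = ReadDefault(reg, ext + "\\PersistentHandler");
    if (!handler) return std::nullopt;
    std::optional<std::string> filterClsid = ReadDefault(
        reg, "CLSID\\" + *handler + "\\PersistentAddinsRegistered\\" + kIFilterIid);
    if (!filterClsid) return std::nullopt;
    return ReadDefault(reg, "CLSID\\" + *filterClsid + "\\InprocServer32");
}
```

Each missing link in the chain ends the lookup with no DLL, which matches the case where a file type has no IFilter association.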
You can use the META tag feature of HTML documents to convey special handling requests to the HTML IFilter.
META tags occur near the beginning of an HTML file, within the <head> ... </head> tags, as illustrated in the following example.
<head> <META NAME="DESCRIPTION" CONTENT="This text appears on the results page as the document's summary."> </head>
META tags are automatically mapped to well-known property set and property identifier (PID) values, so that queries on these properties search the mapped contents. Some examples are listed in the following table. For a list of system properties that you can use for your file formats, see System-Defined Properties for Custom File Formats.
|Property example||Mapped to|
|meta name="author" content="ruth"||The author property in the Summary Information property set.|
|meta name="subject" content="word processing"||The subject property in the Summary Information property set.|
|meta name="keywords" content="fonts, serif"||The keywords property in the Summary Information property set.|
|meta name="ms.category" content="fiction"||The category property in the Document Summary Information property set.|
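The mapping in the table above can be sketched in C++ as a lookup from META name to a property set (FMTID) and PID. The FMTIDs and PID values are the standard Summary Information and Document Summary Information constants (`PIDSI_SUBJECT`, `PIDSI_AUTHOR`, `PIDSI_KEYWORDS`, `PIDDSI_CATEGORY`); the regex-based parser and the names `PropertyKey`, `MappedValue`, and `MapMetaTags` are hypothetical simplifications and do not describe how nlhtml.dll parses HTML. The sketch only recognizes tags with a double-quoted `name` attribute before `content`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <regex>
#include <string>
#include <vector>

struct PropertyKey {
    std::string fmtid;  // property set identifier
    unsigned pid;       // property identifier within that set
};

const std::string kSummaryInfo = "{F29F85E0-4FF9-1068-AB91-08002B27B3D9}";
const std::string kDocSummaryInfo = "{D5CDD502-2E9C-101B-9397-08002B2CF9AE}";

const std::map<std::string, PropertyKey> kMetaMap = {
    {"subject", {kSummaryInfo, 3}},         // PIDSI_SUBJECT
    {"author", {kSummaryInfo, 4}},          // PIDSI_AUTHOR
    {"keywords", {kSummaryInfo, 5}},        // PIDSI_KEYWORDS
    {"ms.category", {kDocSummaryInfo, 2}},  // PIDDSI_CATEGORY
};

struct MappedValue {
    PropertyKey key;
    std::string value;
};

// Finds <meta name="..." content="..."> tags (case-insensitive) and keeps
// only those whose name has a known property mapping, in document order.
std::vector<MappedValue> MapMetaTags(const std::string& html) {
    static const std::regex meta(
        R"re(<meta\s+name\s*=\s*"([^"]*)"\s+content\s*=\s*"([^"]*)")re",
        std::regex::ECMAScript | std::regex::icase);
    std::vector<MappedValue> out;
    for (auto it = std::sregex_iterator(html.begin(), html.end(), meta);
         it != std::sregex_iterator(); ++it) {
        std::string name = (*it)[1].str();
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        auto found = kMetaMap.find(name);
        if (found != kMetaMap.end()) {
            out.push_back({found->second, (*it)[2].str()});
        }
    }
    return out;
}
```

Tags without a mapping, such as DESCRIPTION, are skipped by this sketch; the HTML IFilter handles such tags through its own features.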
Some features of the HTML IFilter are listed in the following table.
|Feature||How to use it|
|Creating special abstracts from files||Use the DESCRIPTION META tag, as shown in the preceding example.|
|Preventing individual files from being filtered||Add a|
|Setting the language code for a file (to ensure the system chooses the correct language word breakers and noise word files)||Add the following|
Document Filter Handler
The Document filter handler (in offfilt.dll) filters Microsoft Office documents with certain file name extensions, for example .doc, .mdb, .ppt, and .xlt.
Plain Text Filter Handler
For plain-text files, Windows Search uses the text filter handler, which filters both the system properties (such as file names) and the contents of a file. When a file type does not have an IFilter association in the registry, Windows Search indexes only the Shell properties for the file. However, the user can open Advanced Options in the Indexing Options control panel and choose either Index Properties or Index Properties and File Contents for that file type.
If the user chooses Index Properties and File Contents for a file type without an associated IFilter, the text filter handler is used to extract the content of the file. The text filter handler does not "understand" any document format; it treats the file contents as a sequence of characters. It does, however, check for a Unicode byte-order mark at the beginning of the file.
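A byte-order-mark check like the one described above might look like the following C++ sketch. It assumes the text filter distinguishes UTF-8 and UTF-16 byte-order marks; the exact set of encodings query.dll recognizes is not stated here, and `TextEncoding`, `DetectBom`, and `BomLength` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class TextEncoding { Utf8, Utf16LE, Utf16BE, NoBom };

// Inspects only the first bytes. A UTF-32LE BOM (FF FE 00 00) would be
// reported as UTF-16LE by this simplified check.
TextEncoding DetectBom(const std::string& bytes) {
    auto byteAt = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return TextEncoding::Utf8;
    if (bytes.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return TextEncoding::Utf16LE;
    if (bytes.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return TextEncoding::Utf16BE;
    return TextEncoding::NoBom;  // no BOM: the encoding must be inferred another way
}

// Number of leading bytes to skip before treating the rest as characters.
std::size_t BomLength(TextEncoding encoding) {
    switch (encoding) {
        case TextEncoding::Utf8: return 3;
        case TextEncoding::Utf16LE:
        case TextEncoding::Utf16BE: return 2;
        case TextEncoding::NoBom: break;
    }
    return 0;
}
```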
Binary or Null Filter Handler
When a registered binary file is encountered, the null filter handler is used. The null filter handler retrieves only the system properties. The contents of a binary file are not filtered. Examples of system properties are FileName, LastWriteTime, FileSize, and Attributes.
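The choice between a registered IFilter, the plain text filter, and the null filter can be summarized as a small decision function. This C++ sketch is a hypothetical model of the behavior described in the two preceding sections, assuming the user's Indexing Options choice applies only to file types with no IFilter association; `Registration`, `UserChoice`, `IndexingOutcome`, and `ChooseHandler` are illustrative names.

```cpp
#include <cassert>
#include <string>

enum class Registration { HasIFilter, RegisteredBinary, NoIFilter };
enum class UserChoice { Default, IndexProperties, IndexPropertiesAndContents };

struct IndexingOutcome {
    std::string handler;   // which handler processes the file
    bool contentsIndexed;  // whether file contents reach the index
};

IndexingOutcome ChooseHandler(Registration registration, UserChoice choice) {
    switch (registration) {
        case Registration::HasIFilter:
            // Assumes the registered filter emits content.
            return {"registered IFilter", true};
        case Registration::RegisteredBinary:
            // Null filter: system properties such as FileName, LastWriteTime,
            // FileSize, and Attributes only.
            return {"null filter", false};
        case Registration::NoIFilter:
            break;
    }
    if (choice == UserChoice::IndexPropertiesAndContents) {
        // Text filter: treats the file as a sequence of characters.
        return {"plain text filter", true};
    }
    // Default or Index Properties: properties only, no content extraction.
    return {"properties only", false};
}
```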
Additional Resources
- The IFilterSample code sample, available on Code Gallery and the Windows 7 SDK, demonstrates how to create an IFilter base class for implementing the IFilter interface.
- For an overview of the indexing process, see The Indexing Process.
- For an overview of file types, see File Types.
- To query file association attributes for a file type, see PerceivedTypes, SystemFileAssociations, and Application Registration.