How to Build an IFilter for SharePoint 2010 Search and Windows Search Using C++, ATL, and MFC

Summary: Learn, step-by-step, how to build an IFilter to index file contents by using C++, the Microsoft Foundation Class library (MFC), and the Active Template Library (ATL). This article is for developers using SharePoint 2010 and .NET Framework who have limited exposure to C++.

Applies to: Business Connectivity Services | Open XML | SharePoint Designer 2010 | SharePoint Foundation 2010 | SharePoint Online | SharePoint Server 2010 | Visual Studio

Provided by: Alex Culp, RBA Consulting

Contents

  • Introduction to IFilters for Windows Desktop Search and SharePoint Server 2010 Search

  • How an IFilter Works

  • Interfaces Your Code Must Implement to Build an IFilter

  • Initial Project Setup Before You Build an IFilter

  • Implementing the IFilter

  • Registering the IFilter

  • Testing the IFilter Using the IFiltTst Utility

  • Packaging and Deploying an IFilter

  • Conclusion

  • Additional Resources

  • About the Author

Download the sample code

An IFilter is an interface that enables Windows Desktop Search and Microsoft SharePoint Server 2010 search to index the contents of files. Although the default full-text search for documents works well in many situations, it is not always appropriate; for example, indexing a file that is in a binary format or indexing a text file in which you must locate specific information in the document. Windows has built-in IFilters for Microsoft Office 2010 products and filter packs that are available for download (see Microsoft Office 2010 Filter Packs), and Adobe has an IFilter for PDF files. Although an IFilter is technically an interface, implementations of that interface are also called IFilters, which can be confusing. For clarity, this article always uses the term IFilter interface to refer to the interface.

As of Windows 7, you can no longer use managed code to implement an IFilter. Only one version of the .NET Framework runtime can be loaded into a given process at a time, and the filter host process loads multiple add-ins. This means that if one IFilter developer uses the 2.0 version of the .NET Framework and another developer uses the 4.0 version, the two IFilters are incompatible. To avoid these common language runtime (CLR) versioning issues, Windows 7 and later versions explicitly block filters that are written in managed code, so filters must be written in native code. Although it might be possible to write an IFilter in Microsoft Visual Basic 6.0, it is likely a poor option considering the throughput that is required to index thousands, or possibly millions, of files (for example, in SharePoint). Therefore, the best option is to implement an IFilter by using C++.

To write an IFilter, you must implement several COM interfaces (IFilter, IPersistFile, IPersistStream, and IUnknown). Although you could write COM objects without relying on the Active Template Library (ATL), ATL makes development much easier because it provides the COM infrastructure (object creation, object destruction, mapping the interface to the concrete implementation, and so on).

Following that reasoning, this article uses ATL in its implementation of an IFilter. It also uses the Microsoft Foundation Class library (MFC) to implement string manipulation, which is a common feature of IFilters.

How an IFilter Works

An IFilter works by processing a file or stream and then breaking it up into chunks to be processed. In the case of SharePoint (both Office SharePoint Server 2007 and SharePoint 2010) the indexing service uses the IPersistStream interface to send a stream to a specific IFilter implementation. After that, the calling process (the SharePoint search process or Desktop Search) calls the Init method on the IFilter interface to allow the specific implementation of the IFilter to perform any preliminary setup actions. However, the GetChunk method does the real work. Before you can understand the GetChunk method, you have to first understand the concept of a chunk. A chunk is a single data element. The host process continues to call GetChunk to get the next piece of data until FILTER_E_END_OF_CHUNKS is returned.
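The calling pattern described above can be sketched in portable C++. This is a simplified illustration, not the real COM interface: `Status`, `MockFilter`, and `DrainFilter` are hypothetical names, and the enum values only stand in for `S_OK` and `FILTER_E_END_OF_CHUNKS`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status codes standing in for S_OK and FILTER_E_END_OF_CHUNKS.
enum class Status { Ok, EndOfChunks };

// Hypothetical, simplified stand-in for an IFilter implementation.
class MockFilter {
public:
    explicit MockFilter(std::vector<std::string> chunks)
        : chunks_(std::move(chunks)) {}

    // Mirrors Init: reset the filtering session to the start of the data.
    void Init() { next_ = 0; }

    // Mirrors GetChunk: advance to the next chunk or report the end.
    Status GetChunk(std::string& out) {
        if (next_ >= chunks_.size()) return Status::EndOfChunks;
        out = chunks_[next_++];
        return Status::Ok;
    }

private:
    std::vector<std::string> chunks_;
    std::size_t next_ = 0;
};

// What a host does: call Init once, then GetChunk until the end is reported.
std::vector<std::string> DrainFilter(MockFilter& filter) {
    std::vector<std::string> seen;
    std::string chunk;
    filter.Init();
    while (filter.GetChunk(chunk) == Status::Ok) {
        seen.push_back(chunk);
    }
    return seen;
}
```

Because `Init` resets the position, the same filter object can be drained again from the start, which matches the idea of Init starting a new filtering session.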

Figure 1 shows a typical sequence of calls from the ifilttst.exe test utility to an IFilter implementation. Note that the sequence assumes an implementation in which the Load method on IPersistFile calls the Load method on IPersistStream.

Figure 1. Sequence of calls from a test utility to an IFilter implementation

If your IFilter is called from SharePoint Server 2007 or SharePoint 2010, the IPersistFile interface is not used; instead the Load method provides an IStream parameter. Figure 2 shows the sequence of calls from SharePoint.

Figure 2. Sequence of calls from SharePoint

Interfaces Your Code Must Implement to Build an IFilter

To build an IFilter, your code must implement the IFilter and IUnknown interfaces, and either the IPersistStream or IPersistFile interface. You should, however, implement all of the following interfaces.

IFilter Interface

The IFilter interface is where you complete most of the work to implement your IFilter. You must implement the following methods on the IFilter interface:

  • GetChunk This method positions the filter at the beginning of the first chunk of data or at the next chunk, and returns a descriptor. This method contains the file-parsing logic.

  • GetText This method retrieves the text from the current chunk.

  • GetValue This method retrieves values from the current chunk for any value other than string. These values include datetime, integer, long, and bool.

  • Init This method initializes the filtering session. You should do any initialization logic in this method, such as setting up internal variables.

The BindRegion method is currently not used. Therefore, it can return E_NOTIMPL.
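Seen from inside the filter, these methods share per-session state: Init resets it, GetChunk advances it, and GetValue hands out the current chunk's value once. The following is a minimal portable sketch of that state machine. `SketchFilter`, `ChunkInfo`, and `Status` are hypothetical names; the enum values stand in for `S_OK`, `FILTER_E_END_OF_CHUNKS`, and `FILTER_E_NO_MORE_VALUES`, and the hard-coded data is for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes standing in for S_OK, FILTER_E_END_OF_CHUNKS,
// and FILTER_E_NO_MORE_VALUES.
enum class Status { Ok, EndOfChunks, NoMoreValues };

// Hypothetical, simplified stand-in for STAT_CHUNK.
struct ChunkInfo {
    unsigned long id = 0;   // chunk IDs start at 1 and increase
    std::string property;   // which property this chunk carries
};

class SketchFilter {
public:
    // Mirrors Init: reset all per-session state.
    void Init() {
        chunkId_ = 0;
        valueRead_ = false;
        current_.clear();
    }

    // Mirrors GetChunk: move to the next chunk and describe it.
    Status GetChunk(ChunkInfo& info) {
        static const char* const kProperties[] = {"Customer Name", "Favorite Sport"};
        static const char* const kValues[] = {"John Doe", "Football"};
        if (chunkId_ >= 2) return Status::EndOfChunks;
        current_ = kValues[chunkId_];
        info.property = kProperties[chunkId_];
        info.id = ++chunkId_;
        valueRead_ = false;
        return Status::Ok;
    }

    // Mirrors GetValue: hand out the current chunk's value exactly once.
    Status GetValue(std::string& value) {
        if (chunkId_ == 0 || valueRead_) return Status::NoMoreValues;
        value = current_;
        valueRead_ = true;
        return Status::Ok;
    }

private:
    unsigned long chunkId_ = 0;  // ID of the current chunk, 0 before the first
    bool valueRead_ = false;     // has the current chunk's value been returned?
    std::string current_;        // value of the current chunk
};
```

Keeping the chunk counter and the "already read" flag as members is what lets the host interleave GetChunk and GetValue calls in the expected order.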

IPersistStream Interface

To implement an IFilter for SharePoint, you must implement the IPersistStream interface. When you implement an IFilter for SharePoint, you have no information about the file other than its contents. This is because you are given the contents as a stream, via an IStream object as a parameter to the Load method instead of the file information. There is no way to gather additional information from the IFilter code, such as what directory or URL is hosting the document. The only method that is used by an IFilter is the Load method. The other methods on this interface can return E_NOTIMPL.

IPersistFile Interface

Desktop search and SharePoint do not use the IPersistFile interface because it breaks the sandbox of Windows search. However, it is still a good idea to implement this interface so that you can test your IFilter. The only method that you must implement is Load. The other methods on this interface can return E_NOTIMPL. The test tool ifilttst.exe uses this interface to pass information about a file into the IFilter. Testing your IFilter is discussed later in this article. You can implement the Load method by loading a stream that is based on the file information, and then calling the Load method on the IPersistStream interface.
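The delegation described above, in which the file-based Load opens a stream and forwards it to the stream-based Load, can be sketched with standard C++ streams. This is an analogy only: the sketch uses `std::istream` in place of `IStream`, and `SketchFilter`, `LoadFromStream`, and `LoadFromFile` are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

class SketchFilter {
public:
    // Mirrors IPersistStream::Load: all parsing starts from a stream.
    bool LoadFromStream(std::istream& stream) {
        if (!stream) return false;  // stream could not be opened or is in error
        contents_.assign(std::istreambuf_iterator<char>(stream),
                         std::istreambuf_iterator<char>());
        return true;
    }

    // Mirrors IPersistFile::Load: open the file, then delegate to the
    // stream-based overload so there is only one code path to test.
    bool LoadFromFile(const std::string& path) {
        std::ifstream file(path, std::ios::binary);
        return LoadFromStream(file);
    }

    const std::string& contents() const { return contents_; }

private:
    std::string contents_;
};
```

With this shape, a test tool that supplies a file name and a host that supplies a stream both exercise the same parsing logic.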

IUnknown Interface

The IUnknown interface enables clients to use the QueryInterface method to obtain pointers to other interfaces on a given object, and to manage the lifetime of the object through the AddRef and Release methods. All other COM interfaces inherit, directly or indirectly, from IUnknown. Therefore, the three methods on IUnknown are the first entries in the VTable for every interface. If you use ATL, all of the work for implementing IUnknown is performed for you. For more information about the IUnknown interface, see IUnknown Interface.
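To make the contract concrete, here is a rough portable sketch of what ATL handles on your behalf. It is not real COM: interfaces are identified by string name instead of by GUID, the reference count is not thread-safe, and `IUnknownLike`, `IFilterLike`, and `FilterObject` are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for IUnknown. Real COM identifies interfaces by IID.
struct IUnknownLike {
    virtual bool QueryInterface(const char* name, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;  // objects are destroyed only through Release
};

// Hypothetical stand-in for a derived interface such as IFilter.
struct IFilterLike : IUnknownLike {
    virtual int ChunkCount() = 0;
};

class FilterObject final : public IFilterLike {
public:
    // The creator receives the first reference.
    static FilterObject* Create() { return new FilterObject(); }

    bool QueryInterface(const char* name, void** out) override {
        if (out == nullptr || name == nullptr) return false;
        *out = nullptr;
        if (std::strcmp(name, "IUnknown") == 0 || std::strcmp(name, "IFilter") == 0) {
            *out = static_cast<IFilterLike*>(this);
            AddRef();  // every pointer handed out carries its own reference
            return true;
        }
        return false;
    }

    unsigned long AddRef() override { return ++refs_; }

    unsigned long Release() override {
        const unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // last reference released
        return remaining;
    }

    int ChunkCount() override { return 4; }

private:
    FilterObject() = default;
    ~FilterObject() = default;
    unsigned long refs_ = 1;  // not thread-safe; real COM uses interlocked operations
};
```

Every successful QueryInterface call must be balanced by a Release call, which is the bookkeeping that ATL's base classes and smart pointers take care of.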

Initial Project Setup Before You Build an IFilter

Before you build your IFilter, you must download and install the Windows SDK. Microsoft Windows SDK for Windows 7 and .NET Framework 4 has the header files that you must have to implement the IFilter. Also, you must run the Microsoft Visual Studio development system as an administrator to register your DLL.

To create the project

  1. Open Visual Studio 2010.

  2. Choose the correct project template. Under Visual C++, select ATL, and then specify a name for your project, as shown in Figure 3.

    Figure 3. Select ATL and name your project

  3. Click OK to create the solution, and then choose Next in the following dialog box, as shown in Figure 4.

    Figure 4. ATL Project Wizard welcome page

  4. In the Application Settings dialog box, select Support MFC. This option enables you to access the CString class, which is very useful because IFilters typically do a lot of string manipulation.

  5. Click Finish to create your project, as shown in Figure 5.

    Figure 5. Select Support MFC in the Application Settings dialog box

Use 64-bit Versions of IFilters, not 32-bit Versions of IFilters

Because of the memory limitations of 32-bit operating systems, most SharePoint instances are 64-bit. In fact, SharePoint 2010 only comes in a 64-bit edition. Consequently, your IFilter must be a 64-bit DLL if you want to support SharePoint 2010. This is also true if you want to implement your IFilter for desktop search on a 64-bit operating system. To build a 64-bit version of your IFilter, you must create a Win64 solution platform.

To create a Win64 solution platform

  1. In Visual Studio, on the Build menu, click Configuration Manager to open the Configuration Manager dialog box, as shown in Figure 6.

    Figure 6. Configuration Manager dialog box

  2. Under Active solution platform, select New from the drop-down list. Choose x64 or Itanium. If you are not sure about your target server processor, choose x64, as shown in Figure 7.

    Figure 7. Choose x64 or Itanium in the New Solution Platform dialog box

  3. Click OK, and then close the Configuration Manager dialog box.

Additional Include Directories to Access All Required Header Files and Libraries

To access all the header files and libraries that you must have, add the SDK include folder to your Additional Include Directories. If you are using a 64-bit operating system, the default path for the Windows SDK is C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include. On a 32-bit operating system, the SDK should be installed to C:\Program Files\Microsoft SDKs\Windows\v7.0A\Include.

To set the include path

  1. Right-click your project file, click Properties, and then click C/C++.

  2. In the Configuration drop-down list, select All Configurations.

  3. In the Platform drop-down list, select All Platforms. You must have these options because the include path applies to all configurations and platforms. If you set it for only one configuration, such as Win32, the IFilter does not compile under the other configuration, as shown in Figure 8.

    Figure 8. Select All Configurations and All Platforms

  4. In the Additional Include Directories drop-down list, browse to your SDK folder, as shown in Figure 9. If you do not have it, download the Microsoft Windows SDK for Windows 7 and .NET Framework 4 (ISO).

    Figure 9. Browse to your SDK Include folder

  5. Click OK to return to the Property Pages dialog box.

To set up debugging for the IFilter sample project

  1. Right-click your project file, and then click Properties to open the Property Pages dialog box.

  2. Under Configuration Properties, click Debugging.

    Leave the Platform drop-down list set to All Platforms.

  3. Click C/C++, and then click General.

    In the Debug Information Format drop-down list, select Program Database for Edit And Continue (/ZI).

To change the runtime library

  1. Continuing from the previous procedure, under C/C++, click Code Generation.

  2. In the Runtime Library drop-down list, select Multi-threaded DLL (/MD). Your code cannot compile unless you use this setting.

    Ensure that you apply this setting to All Configurations and to All Platforms, as shown in Figure 10.

    Figure 10. Select Multi-threaded DLL (/MD) from the Runtime Library list

    The compile error comes from the following check in the MFC header afxver_.h.

    #if defined(_AFXDLL) && !defined(_DLL)
    #error Please use the /MD switch for _AFXDLL builds
    #endif
    

Implementing the IFilter

The rest of this article provides step-by-step instructions to implement the IFilter. If you prefer, you can download the code for the sample IFilter (see Sample IFilter).

Note

Concerning memory management, all of the code that is used in this sample uses local variables that do not require cleanup. If possible, avoid using keywords such as new, malloc, or calloc when you implement your IFilter. Otherwise, you must clean up the memory after you are finished with it. Even a small memory leak that is compounded over thousands or millions of files can cause significant problems.

Concerning CString and String conversions, there are many types of strings in C++, such as char*, LPCSTR, and LPCWSTR. Each of these strings has a different purpose. To simplify the development of the areas in the code that vary by IFilter implementation, a developer who follows this strategy can use CString class for most of the work. The CString class is roughly analogous to the string class that is found in managed code. Many of the same routines found on the managed string class can be found on the CString class. Because the CString class is part of the MFC library, the implementation of an IFilter in this article includes the MFC.
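When a heap allocation cannot be avoided, one way to follow the memory-management advice in this note is to tie the buffer to an owning object so that it is released automatically. The following is a minimal portable sketch that uses `std::unique_ptr` with a custom deleter. `FreeDeleter`, `ScopedBuffer`, and `DuplicateString` are hypothetical names, and `std::malloc`/`std::free` stand in for `CoTaskMemAlloc`/`CoTaskMemFree`, which a real IFilter would more likely use.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that releases a buffer with std::free. In a real IFilter, the
// equivalent deleter would call CoTaskMemFree for CoTaskMemAlloc buffers.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using ScopedBuffer = std::unique_ptr<char[], FreeDeleter>;

// Hypothetical helper: duplicate a C string into an owned heap buffer.
// Returns an empty ScopedBuffer if the input is null or allocation fails.
ScopedBuffer DuplicateString(const char* text) {
    if (text == nullptr) return ScopedBuffer();
    const std::size_t bytes = std::strlen(text) + 1;  // include the terminator
    char* raw = static_cast<char*>(std::malloc(bytes));
    if (raw == nullptr) return ScopedBuffer();
    std::memcpy(raw, text, bytes);
    return ScopedBuffer(raw);  // freed automatically when it goes out of scope
}
```

The buffer is released on every exit path, including early returns, so a filter that processes millions of files does not accumulate leaks from forgotten cleanup calls.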

Introduction to the Sample Target File for the IFilter

Before you can examine the implementation details, you have to have a sample file to target. The following file is simple, but it shows how to use the string, integer, and datetime crawled properties.

Sample JohnDoe.my file

Customer Name: John Doe

DOB: 5/5/1973

Favorite Sport: Football

Height (Inches): 73
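Each line of this sample file is a name, a colon, and a value, so the parsing step comes down to splitting at the first colon and trimming white space. The following sketch shows one way to do that in portable C++. It uses `std::string` rather than the `CString` class that this article uses, and `Trim` and `ParseFields` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing spaces, tabs, and carriage returns.
std::string Trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split each "Name: Value" line at the first colon; skip lines without one.
std::map<std::string, std::string> ParseFields(const std::string& contents) {
    std::map<std::string, std::string> fields;
    std::istringstream lines(contents);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // blank or malformed line
        const std::string name = Trim(line.substr(0, colon));
        if (name.empty()) continue;
        fields[name] = Trim(line.substr(colon + 1));
    }
    return fields;
}
```

After parsing, the value for "Height (Inches)" can be converted with `std::stoi` before it is emitted as an integer property, and the "DOB" value would need a separate date conversion.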

Create the Implementation Class

In this section, you create the class that implements the IFilter interfaces.

To create the implementation class

  1. Right-click the project, and then select Add Class to open the Add Class dialog box.

  2. Select ATL Simple Object, and then click Next to open the ATL Simple Object Wizard, as shown in Figure 11.

    Figure 11. Select ATL Simple Object in the Add Class dialog box

  3. Specify the name of your IFilter in the Short name box, and then click Next, as shown in Figure 12.

    Figure 12. Specify the name of your IFilter in the Short Name box

    If your implementation class has the same name as the project, you get a warning that the .cpp file already exists. This is because the file existed when you created the project and contains some global functions for COM. Click Yes, as shown in Figure 13.

    Figure 13. Warning that the .cpp file already exists

  4. The File Type Handler Options dialog box opens. Click Next to open the Options dialog box, as shown in Figure 14.

    Figure 14. File Type Handler Options dialog box

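  5. Under Threading model, select Both. With this setting, the object can be created in either a single-threaded apartment or the multithreaded apartment; that is, it supports both the Apartment and Free threading models. Click Finish to finish creating the implementation class, as shown in Figure 15.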

    Figure 15. Select Both in the Threading Model options

Update the Interface Definition Language (IDL) file

The IDL file defines the COM interfaces and operations that your DLL supports. You must update the file to support the IFilter, IPersistFile, and IPersistStream interfaces.

To update the IDL file

  1. In Visual Studio, in Solution Explorer, open the MyIFilter.idl file, which is in the Source Files folder, as shown in Figure 16.

    Figure 16. MyIFilter.idl is in the Source Files folder

  2. You have to add two import statements. At the beginning of the IDL file add the following lines of code.

    import "filter.idl";
    import "propsys.idl";
    
  3. In your coclass declaration, add the following lines of code.

    interface IFilter;
    interface IPersistFile;
    interface IPersistStream;
    

The IDL file should now resemble the following example.

import "filter.idl";
import "propsys.idl";
import "oaidl.idl";
import "ocidl.idl";
 
[
    object,
    uuid(E18A08A4-0667-474F-B8B8-74523292BB75),
    dual,
    nonextensible,
    pointer_default(unique)
]
interface IMyIFilter : IDispatch{
};
[
    uuid(A622AC67-C33B-4886-8C2A-10BF8A7C016D),
    version(1.0),
]
library MyIFilterLib
{
    importlib("stdole2.tlb");
    [
        uuid(A204ECE7-61DD-4F9F-AC89-DD9B7EFB2076)
    ]
    coclass MyIFilter
    {
        [default] interface IMyIFilter;
        interface IFilter;
        interface IPersistFile;
        interface IPersistStream;
    };
};

Note

The GUIDs vary each time that a new project is created.

Inherit the necessary interfaces

You must include the necessary interfaces so that your implementation class can inherit from them. Open the MyIFilter.h header file and add the following lines of code to your class declaration.

public IFilter,
public IPersistStream,
public IPersistFile

Also, you must add the following interfaces into the COM map.

COM_INTERFACE_ENTRY(IFilter)

COM_INTERFACE_ENTRY(IPersistStream)

COM_INTERFACE_ENTRY(IPersistFile)

Your header file should now resemble the following example.

class ATL_NO_VTABLE CMyIFilter :
    public CComObjectRootEx<CComMultiThreadModel>,
    public CComCoClass<CMyIFilter, &CLSID_MyIFilter>,
    public IDispatchImpl<IMyIFilter, &IID_IMyIFilter, &LIBID_MyIFilterLib, /*wMajor =*/ 1, /*wMinor =*/ 0>,
    public IFilter,
    public IPersistStream,
    public IPersistFile
{
public:
    CMyIFilter()
    {
    }
 
DECLARE_REGISTRY_RESOURCEID(IDR_MYIFILTER1)
 
BEGIN_COM_MAP(CMyIFilter)
    COM_INTERFACE_ENTRY(IMyIFilter)
    COM_INTERFACE_ENTRY(IDispatch)
    COM_INTERFACE_ENTRY(IFilter)
    COM_INTERFACE_ENTRY(IPersistStream)
    COM_INTERFACE_ENTRY(IPersistFile)
END_COM_MAP()

Utility Classes Used to Set a Crawled Property for the IFilters

The actual code for setting a crawled property is fairly complex, so this article includes a very useful class that simplifies chunk and property logic.

Add the following ChunkValue.h and ChunkValue.cpp files to your project.

ChunkValue.h

#pragma once
 
#include <strsafe.h>
#include <shlwapi.h>
#include <propkey.h>
#include <propsys.h>
#include <filter.h>
#include <filterr.h>
 
// This class simplifies both chunk and property value pair logic.
// To use, create a ChunkValue class as follows.
// Example:
//      CChunkValue chunk;
//      hr = chunk.SetBoolValue(PKEY_IsAttachment, true);
//      or
//      hr = chunk.SetFileTimeValue(PKEY_ItemDate, ftLastModified);
class CChunkValue
{
public:
    CChunkValue();
 
    ~CChunkValue();
 
    // Clear the ChunkValue.
    void Clear();
 
    // Is this propvalue valid?
    BOOL IsValid();
 
 
    // Get the value as an allocated PROPVARIANT.
    HRESULT GetValue(PROPVARIANT **ppPropVariant);
 
    // Get the string value.
    PWSTR GetString();
 
    // Copy the chunk.
    HRESULT CopyChunk(STAT_CHUNK *pStatChunk);
 
    // Get the type of chunk.
    CHUNKSTATE GetChunkType();
 
    // Set the property by key to a unicode string.
    HRESULT SetTextValue(REFPROPERTYKEY pkey, PCWSTR pszValue, CHUNKSTATE chunkType = CHUNK_VALUE,
                         LCID locale = 0, DWORD cwcLenSource = 0, DWORD cwcStartSource = 0,
                         CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to a bool.
    HRESULT SetBoolValue(REFPROPERTYKEY pkey, BOOL bVal, CHUNKSTATE chunkType = CHUNK_VALUE, LCID locale = 0,
                         DWORD cwcLenSource = 0, DWORD cwcStartSource = 0, CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to a variant bool.
    HRESULT SetBoolValue(REFPROPERTYKEY pkey, VARIANT_BOOL bVal, CHUNKSTATE chunkType = CHUNK_VALUE, LCID locale = 0,
                         DWORD cwcLenSource = 0, DWORD cwcStartSource = 0, CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to an int.
    HRESULT SetIntValue(REFPROPERTYKEY pkey, int nVal, CHUNKSTATE chunkType = CHUNK_VALUE,
                        LCID locale = 0, DWORD cwcLenSource = 0, DWORD cwcStartSource = 0,
                        CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to a long.
    HRESULT SetLongValue(REFPROPERTYKEY pkey, long lVal, CHUNKSTATE chunkType = CHUNK_VALUE, LCID locale = 0,
                         DWORD cwcLenSource = 0, DWORD cwcStartSource = 0, CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to a dword.
    HRESULT SetDwordValue(REFPROPERTYKEY pkey, DWORD dwVal, CHUNKSTATE chunkType = CHUNK_VALUE, LCID locale = 0,
                          DWORD cwcLenSource = 0, DWORD cwcStartSource = 0, CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to an int64.
    HRESULT SetInt64Value(REFPROPERTYKEY pkey, __int64 nVal, CHUNKSTATE chunkType = CHUNK_VALUE, LCID locale = 0,
                          DWORD cwcLenSource = 0, DWORD cwcStartSource = 0, CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
    // Set the property by key to a filetime.
    HRESULT SetFileTimeValue(REFPROPERTYKEY pkey, FILETIME dtVal, CHUNKSTATE chunkType = CHUNK_VALUE,
                             LCID locale = 0, DWORD cwcLenSource = 0, DWORD cwcStartSource = 0,
                             CHUNK_BREAKTYPE chunkBreakType = CHUNK_NO_BREAK);
 
protected:
    // Initialize the STAT_CHUNK (property key, chunk type, locale, and source range).
    HRESULT SetChunk(REFPROPERTYKEY pkey, CHUNKSTATE chunkType=CHUNK_VALUE, LCID locale=0, DWORD cwcLenSource=0, DWORD cwcStartSource=0, CHUNK_BREAKTYPE chunkBreakType=CHUNK_NO_BREAK);
 
    // Member variables.
private:
    bool m_fIsValid;
    STAT_CHUNK  m_chunk;
    PROPVARIANT m_propVariant;
    PWSTR m_pszValue;
 
};

ChunkValue.cpp

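#include "stdafx.h"
#include "atlbase.h"
#include "atlconv.h"
#include <string>
#include <propvarutil.h>   // Declares the InitPropVariantFrom* helper functions.
#include "ChunkValue.h"    // Declares CChunkValue.

CChunkValue::CChunkValue() : m_fIsValid(false), m_pszValue(NULL)
{
    PropVariantInit(&m_propVariant);
    Clear();
}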
 
CChunkValue::~CChunkValue()
{
    Clear();
};
 
// Clear the ChunkValue.
void CChunkValue::Clear()
{
    m_fIsValid = false;
    ZeroMemory(&m_chunk, sizeof(m_chunk));
    PropVariantClear(&m_propVariant);
    CoTaskMemFree(m_pszValue);
    m_pszValue = NULL;
}
 
// Is this propvalue valid?
BOOL CChunkValue::IsValid()
{
    return m_fIsValid;
}
 
 
// Get the value as an allocated PROPVARIANT.
HRESULT CChunkValue::GetValue(PROPVARIANT **ppPropVariant)
{
    HRESULT hr = S_OK;
    if (ppPropVariant == NULL)
    {
        return E_INVALIDARG;
    }
 
    *ppPropVariant = NULL;
 
    PROPVARIANT *pPropVariant = static_cast<PROPVARIANT*>(CoTaskMemAlloc(sizeof(PROPVARIANT)));
 
    if (pPropVariant)
    {
        hr = PropVariantCopy(pPropVariant, &m_propVariant);
        if (SUCCEEDED(hr))
        {
            // Detach and return this as the value.
            *ppPropVariant = pPropVariant;
        }
        else
        {
            CoTaskMemFree(pPropVariant);
        }
    }
    else
    {
        hr = E_OUTOFMEMORY;
    }
 
    return hr;
}
 
// Get the string value.
PWSTR CChunkValue::GetString()
{
    return m_pszValue;
};
 
// Copy the chunk.
HRESULT CChunkValue::CopyChunk(STAT_CHUNK *pStatChunk)
{
    if (pStatChunk == NULL)
    {
        return E_INVALIDARG;
    }
 
    *pStatChunk = m_chunk;
    return S_OK;
}
 
// Get the type of chunk
CHUNKSTATE CChunkValue::GetChunkType()
{
    return m_chunk.flags;
}
 
// Set the property by key to a unicode string.
HRESULT CChunkValue::SetTextValue(REFPROPERTYKEY pkey, PCWSTR pszValue, CHUNKSTATE chunkType,
                     LCID locale, DWORD cwcLenSource, DWORD cwcStartSource,
                     CHUNK_BREAKTYPE chunkBreakType)
{
    if (pszValue == NULL)
    {
        return E_INVALIDARG;
    }
 
 
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        size_t cch = wcslen(pszValue) + 1;
        PWSTR pszCoTaskValue = static_cast<PWSTR>(CoTaskMemAlloc(cch * sizeof(WCHAR)));
        if (pszCoTaskValue)
        {
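            hr = StringCchCopyW(pszCoTaskValue, cch, pszValue);
            if (FAILED(hr))
            {
                CoTaskMemFree(pszCoTaskValue);
            }
            else if (chunkType == CHUNK_VALUE)
            {
                // Transfer ownership of the buffer to the PROPVARIANT.
                // PropVariantClear frees it when Clear() is called.
                m_propVariant.vt = VT_LPWSTR;
                m_propVariant.pwszVal = pszCoTaskValue;
                m_fIsValid = true;
            }
            else
            {
                // Clear() frees m_pszValue with CoTaskMemFree.
                m_pszValue = pszCoTaskValue;
                m_fIsValid = true;
            }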
        }
        else
        {
            hr = E_OUTOFMEMORY;
        }
    }
    return hr;
};
 
// Set the property by key to a bool.
HRESULT CChunkValue::SetBoolValue(REFPROPERTYKEY pkey, BOOL bVal, CHUNKSTATE chunkType, LCID locale,
                     DWORD cwcLenSource, DWORD cwcStartSource, CHUNK_BREAKTYPE chunkBreakType)
{
    return SetBoolValue(pkey, bVal ? VARIANT_TRUE : VARIANT_FALSE, chunkType, locale, cwcLenSource,
                          cwcStartSource, chunkBreakType);
};
 
// Set the property by key to a variant bool.
HRESULT CChunkValue::SetBoolValue(REFPROPERTYKEY pkey, VARIANT_BOOL bVal, CHUNKSTATE chunkType, LCID locale,
                     DWORD cwcLenSource, DWORD cwcStartSource, CHUNK_BREAKTYPE chunkBreakType)
{
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        hr = InitPropVariantFromBoolean(bVal,&m_propVariant);
        m_fIsValid = true;
    }
    return hr;
};
 
// Set the property by key to an int.
HRESULT CChunkValue::SetIntValue(REFPROPERTYKEY pkey, int nVal, CHUNKSTATE chunkType,
                    LCID locale, DWORD cwcLenSource, DWORD cwcStartSource,
                    CHUNK_BREAKTYPE chunkBreakType)
{
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        hr = InitPropVariantFromInt32(nVal,&m_propVariant);
        m_fIsValid = true;
    }
    return hr;
};
 
// Set the property by key to a long
HRESULT CChunkValue::SetLongValue(REFPROPERTYKEY pkey, long lVal, CHUNKSTATE chunkType, LCID locale,
                     DWORD cwcLenSource, DWORD cwcStartSource, CHUNK_BREAKTYPE chunkBreakType)
{
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        hr = InitPropVariantFromInt64(lVal,&m_propVariant);
        m_fIsValid = true;
    }
    return hr;
};
 
// Set the property by key to a dword
HRESULT CChunkValue::SetDwordValue(REFPROPERTYKEY pkey, DWORD dwVal, CHUNKSTATE chunkType, LCID locale,
                      DWORD cwcLenSource, DWORD cwcStartSource, CHUNK_BREAKTYPE chunkBreakType)
{
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        hr = InitPropVariantFromUInt64(dwVal,&m_propVariant);
        m_fIsValid = true;
    }
    return hr;
};
 
// Set property by key to an int64
HRESULT CChunkValue::SetInt64Value(REFPROPERTYKEY pkey, __int64 nVal, CHUNKSTATE chunkType, LCID locale,
                      DWORD cwcLenSource, DWORD cwcStartSource, CHUNK_BREAKTYPE chunkBreakType)
{
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        hr = InitPropVariantFromInt64(nVal,&m_propVariant);
        m_fIsValid = true;
    }
    return hr;
};
 
 
// Set Property by key to a filetime
HRESULT CChunkValue::SetFileTimeValue(REFPROPERTYKEY pkey, FILETIME dtVal, CHUNKSTATE chunkType,
                         LCID locale, DWORD cwcLenSource, DWORD cwcStartSource,
                         CHUNK_BREAKTYPE chunkBreakType)
{
    HRESULT hr = SetChunk(pkey, chunkType, locale, cwcLenSource, cwcStartSource, chunkBreakType);
    if (SUCCEEDED(hr))
    {
        hr = InitPropVariantFromFileTime(&dtVal,&m_propVariant);
        m_fIsValid = true;
    }
    return hr;
};
 
 
 
// Initialize the STAT_CHUNK.
HRESULT CChunkValue::SetChunk(REFPROPERTYKEY pkey,
                                     CHUNKSTATE chunkType/*=CHUNK_VALUE*/,
                                     LCID locale /*=0*/,
                                     DWORD cwcLenSource /*=0*/,
                                     DWORD cwcStartSource /*=0*/,
                                     CHUNK_BREAKTYPE chunkBreakType /*= CHUNK_NO_BREAK */)
{
    Clear();
 
    // Initialize the chunk.
    m_chunk.attribute.psProperty.ulKind = PRSPEC_PROPID;
    m_chunk.attribute.psProperty.propid = pkey.pid;
    m_chunk.attribute.guidPropSet = pkey.fmtid;
    m_chunk.flags = chunkType;
    m_chunk.locale = locale;
    m_chunk.cwcLenSource = cwcLenSource;
    m_chunk.cwcStartSource = cwcStartSource;
    m_chunk.breakType = chunkBreakType;
 
    return S_OK;
}

Define Crawled Properties for the IFilter

There are several options for defining crawled properties. One is to use properties that are already defined in the propkey.h header file, such as Title, Trademarks, and Size. This works well if your file format has no special properties that you want to capture. However, if you have unique properties, we do not recommend reusing the existing properties, because doing so can cause confusion. Instead, create your own properties by using DEFINE_PROPERTYKEY, the same macro that is used to define the predefined properties. To avoid collisions with another IFilter, consider using a larger number for the property ID (the last argument of the macro).

The following example shows how to create your own crawled properties. Add a new header file that is named CrawledProperties.h and add the following lines of code to it.

CrawledProperties.h

#pragma once

#include <propkeydef.h>
/* *** Customer Name ***
Property Name:  200
 
Category: Basic
Property Set ID: 0b63e343-9ccc-11d0-bcdb-00805fccce04
Variant Type: 31
Data Type: Text
Multi-valued: No 
*/
DEFINE_PROPERTYKEY(PKEY_BASIC_200_CUSTOMER_NAME, 0x0b63e343, 0x9ccc, 0x11d0, 0xbc, 0xdb, 0x00, 0x80, 0x5f, 0xcc, 0xce, 0x04, 200);
 
/* *** DOB ***
Property Name:  201
 
Category: Basic
Property Set ID: 0b63e343-9ccc-11d0-bcdb-00805fccce04
Variant Type: 64
Data Type: Date and Time
Multi-valued: No 
*/
DEFINE_PROPERTYKEY(PKEY_BASIC_201_DOB, 0x0b63e343, 0x9ccc, 0x11d0, 0xbc, 0xdb, 0x00, 0x80, 0x5f, 0xcc, 0xce, 0x04, 201);
 
/* *** Favorite Sport ***
Property Name:  202
 
Category: Basic
Property Set ID: 0b63e343-9ccc-11d0-bcdb-00805fccce04
Variant Type: 31
Data Type: Text
Multi-valued: No 
*/
DEFINE_PROPERTYKEY(PKEY_BASIC_202_FAV_SPORT, 0x0b63e343, 0x9ccc, 0x11d0, 0xbc, 0xdb, 0x00, 0x80, 0x5f, 0xcc, 0xce, 0x04, 202);
 
/* *** Height ***
Property Name:  203
 
Category: Basic
Property Set ID: 0b63e343-9ccc-11d0-bcdb-00805fccce04
Variant Type: 31
Data Type: Text
Multi-valued: No 
*/
DEFINE_PROPERTYKEY(PKEY_BASIC_203_HEIGHT, 0x0b63e343, 0x9ccc, 0x11d0, 0xbc, 0xdb, 0x00, 0x80, 0x5f, 0xcc, 0xce, 0x04, 203);
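Each key defined above pairs a property set ID (a GUID) with a numeric property ID, so the four keys share one property set and differ only in the final number. The following portable sketch models that structure. `GuidLike`, `PropertyKeyLike`, `SameSet`, and `SameKey` are hypothetical stand-ins for illustration and are not the Windows SDK types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for GUID.
struct GuidLike {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Hypothetical stand-in for PROPERTYKEY.
struct PropertyKeyLike {
    GuidLike fmtid;     // property set ID
    std::uint32_t pid;  // property ID within the set
};

// The property set ID used by the keys in CrawledProperties.h.
constexpr GuidLike kBasicSet = {0x0b63e343, 0x9ccc, 0x11d0,
                                {0xbc, 0xdb, 0x00, 0x80, 0x5f, 0xcc, 0xce, 0x04}};
constexpr PropertyKeyLike kCustomerName = {kBasicSet, 200};
constexpr PropertyKeyLike kHeight = {kBasicSet, 203};

// Two keys belong to the same property set when their GUIDs match.
bool SameSet(const PropertyKeyLike& a, const PropertyKeyLike& b) {
    return a.fmtid.data1 == b.fmtid.data1 &&
           a.fmtid.data2 == b.fmtid.data2 &&
           a.fmtid.data3 == b.fmtid.data3 &&
           std::memcmp(a.fmtid.data4, b.fmtid.data4, sizeof a.fmtid.data4) == 0;
}

// Two keys are identical only when both the set and the property ID match.
bool SameKey(const PropertyKeyLike& a, const PropertyKeyLike& b) {
    return SameSet(a, b) && a.pid == b.pid;
}
```

This is why choosing distinct property IDs matters: two filters that use the same property set and the same ID would describe the same crawled property.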

Implement the Abstract Methods in the MyIFilter Class

The next step is to add code that implements the abstract methods in the MyIFilter class.

Header File Updates

To implement all of the methods in the interfaces that the implementation class inherits from, first declare these methods as concrete. To find the abstract method declarations, right-click any of the interfaces (IFilter, IPersistFile, and IPersistStream), and then select Go To Declaration. Copy those methods from the respective header files, and then in the MyIFilter.h file, paste them into the class declaration for the CMyIFilter class. Remove the "= 0" at the end of each method declaration. You will provide a concrete implementation of all these methods, so you do not want them declared as abstract.

You also have to add a few member variables, a method that handles errors, and a helper method that reads a line from the stream to the MyIFilter.h file. The purposes of these members are explained later in this article.

Member Variables and Helper Method Declarations

private:
    IStream*                    m_pStream;         // Stream of this document
    long _uRefs;
    DWORD                       m_dwChunkId;        // Current chunk id
    DWORD                       m_iText;            // index into ChunkValue
    CChunkValue                 m_currentChunk;     // the current chunk value
    void HandleError(LPCTSTR message, HRESULT hr);
    CString ReadLine();                             // Reads one line from m_pStream

The complete header file should now resemble the following code.

MyIFilter.h

// MyIFilter.h : Declaration of the CMyIFilter
 
#pragma once
#include "resource.h"       // main symbols
#include "Filter.h"
#include "ObjIdl.h"
#include "Initguid.h"
#include "ChunkValue.h"

#include "MyIFilter_i.h"
#if defined(_WIN32_WCE) && !defined(_CE_DCOM) && !defined(_CE_ALLOW_SINGLE_THREADED_OBJECTS_IN_MTA)
#error "Single-threaded COM objects are not properly supported on Windows CE platform, such as the Windows Mobile platforms that do not include full DCOM support. Define _CE_ALLOW_SINGLE_THREADED_OBJECTS_IN_MTA to force ATL to support creating single-thread COM object's and allow use of it's single-threaded COM object implementations. The threading model in your rgs file was set to 'Free' as that is the only threading model supported in non DCOM Windows CE platforms."
#endif

using namespace ATL;
 
// CMyIFilter
 
class ATL_NO_VTABLE CMyIFilter :
      public CComObjectRootEx<CComMultiThreadModel>,
      public CComCoClass<CMyIFilter, &CLSID_MyIFilter>,
      public IDispatchImpl<IMyIFilter, &IID_IMyIFilter, &LIBID_MyIFilterLib, /*wMajor =*/ 1, /*wMinor =*/ 0>,
      public IFilter,
      public IPersistStream,
      public IPersistFile
{
public:
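      CMyIFilter() :
            m_pStream(NULL),    // Load tests this pointer, so it must start as NULL
            _uRefs(0),
            m_dwChunkId(0),
            m_iText(0)
      {
      }
 
DECLARE_REGISTRY_RESOURCEID(IDR_MYIFILTER1)
 
BEGIN_COM_MAP(CMyIFilter)
      COM_INTERFACE_ENTRY(IMyIFilter)
      COM_INTERFACE_ENTRY(IDispatch)
      COM_INTERFACE_ENTRY(IFilter)
      COM_INTERFACE_ENTRY(IPersistStream)
      COM_INTERFACE_ENTRY(IPersistFile)
END_COM_MAP()

      DECLARE_PROTECT_FINAL_CONSTRUCT()
 
      HRESULT FinalConstruct()
      {
            return S_OK;
      }
 
      void FinalRelease()
      {
            // Release the stream that Load stored.
            if (m_pStream)
            {
                  m_pStream->Release();
                  m_pStream = NULL;
            }
      }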
 
   virtual  SCODE STDMETHODCALLTYPE  Init( ULONG grfFlags,
                                            ULONG cAttributes,
                                            FULLPROPSPEC const * aAttributes,
                                            ULONG * pFlags );
 
    virtual  SCODE STDMETHODCALLTYPE  GetChunk( STAT_CHUNK * pStat);
 
    virtual  SCODE STDMETHODCALLTYPE  GetText( ULONG * pcwcBuffer,
                                               WCHAR * awcBuffer );
 
    virtual  SCODE STDMETHODCALLTYPE  GetValue( PROPVARIANT * * ppPropValue );
 
    virtual  SCODE STDMETHODCALLTYPE  BindRegion( FILTERREGION origPos,
                                                  REFIID riid,
                                                  void ** ppunk);
 
    virtual  SCODE STDMETHODCALLTYPE  GetClassID( CLSID * pClassID );
 
    virtual  SCODE STDMETHODCALLTYPE  IsDirty();
 
    virtual  SCODE STDMETHODCALLTYPE  Load( LPCWSTR pszFileName,
                                            DWORD dwMode);
 
    virtual  SCODE STDMETHODCALLTYPE  Save( LPCWSTR pszFileName,
                                            BOOL fRemember );
 
    virtual  SCODE STDMETHODCALLTYPE  SaveCompleted( LPCWSTR pszFileName );
 
    virtual  SCODE STDMETHODCALLTYPE  GetCurFile( LPWSTR  * ppszFileName );
    
    virtual HRESULT STDMETHODCALLTYPE Load( 
        /* [unique][in] */ __RPC__in_opt IStream *pStm);
    
    virtual HRESULT STDMETHODCALLTYPE Save( 
        /* [unique][in] */ __RPC__in_opt IStream *pStm,
        /* [in] */ BOOL fClearDirty);
    
    virtual HRESULT STDMETHODCALLTYPE GetSizeMax( 
        /* [out] */ __RPC__out ULARGE_INTEGER *pcbSize);
 
   virtual HRESULT GetNextChunkValue(CChunkValue &chunkValue);
 
 
private:
    IStream*                    m_pStream;         // Stream of this document
    long _uRefs;
    DWORD                       m_dwChunkId;        // Current chunk id
    DWORD                       m_iText;            // index into ChunkValue
    CChunkValue                 m_currentChunk;     // the current chunk value
    void HandleError(LPCTSTR message, HRESULT hr);
    CString ReadLine();                             // Reads one line from m_pStream
 
};
 
OBJECT_ENTRY_AUTO(__uuidof(MyIFilter), CMyIFilter)

Necessary Header Files to Include in the IFilter Project

To implement the remaining methods for the IFilter implementation, there are several header files that you must include. Open the MyIFilter.cpp file and ensure that the following header files are included.

#include "stdafx.h"
#include "resource.h"
#include "MyIFilter_i.h"
#include "dllmain.h"
#include "Filter.h"
#include "ChunkValue.h"
#include "ObjIdl.h"
#include "Initguid.h"
#include "XEventLog.h"
#include "CrawledProperties.h"
#include "MyIFilter.h"

Implement Unused Methods in the IFilter Project

Although some methods are never going to be called, you must provide an implementation for them because they are declared as abstract in the header files. Because these methods are never going to be called, you can implement them by returning E_NOTIMPL.

Note

C++ implements interfaces as pure abstract classes.

Open the MyIFilter.cpp file and add the following method implementations.

Unimplemented Methods

SCODE STDMETHODCALLTYPE CMyIFilter::BindRegion( FILTERREGION origPos,
  REFIID riid,
  void ** ppunk)
{ 
    return E_NOTIMPL;
}
 
 
SCODE STDMETHODCALLTYPE CMyIFilter::GetClassID( CLSID * pClassID )
{
    return E_NOTIMPL;
}
 
SCODE STDMETHODCALLTYPE CMyIFilter::IsDirty()
{
    return E_NOTIMPL;
}
 
SCODE STDMETHODCALLTYPE CMyIFilter::Save( LPCWSTR pszFileName,
  BOOL fRemember )
{
    return E_NOTIMPL;
}
 
SCODE STDMETHODCALLTYPE CMyIFilter::SaveCompleted( LPCWSTR pszFileName )
{
    return E_NOTIMPL;
}
 
SCODE STDMETHODCALLTYPE CMyIFilter::GetCurFile( LPWSTR  * ppszFileName )
{ 
    return E_NOTIMPL;
}
 
HRESULT STDMETHODCALLTYPE CMyIFilter::Save(__RPC__in_opt IStream *pStm, BOOL fClearDirty)
{ 
    return E_NOTIMPL;
}
 
HRESULT STDMETHODCALLTYPE CMyIFilter::GetSizeMax(__RPC__out ULARGE_INTEGER *pcbSize)
{
    return E_NOTIMPL;
}

Load Methods to Implement in the IFilter Project

There are two Load methods that have to be implemented; one on the IPersistFile interface and one on the IPersistStream interface.

To implement the Load method for IPersistFile, add the following lines of code.

IPersistFile Load Method

SCODE STDMETHODCALLTYPE CMyIFilter::Load( LPCWSTR pszFileName,
  DWORD dwMode)
{ 
    CString msg;
    IStream *stream = NULL;

    // Open a read-only stream on the file.
    HRESULT hResult = SHCreateStreamOnFile(pszFileName, STGM_READ, &stream);
    if (FAILED (hResult))
    {
        msg.FormatMessage(_T("SHCreateStreamOnFile failed for file %1."),pszFileName);
        HandleError(msg,hResult);
        return hResult;
    }

    // Use the load method for the stream, which takes its own reference.
    hResult = Load(stream);

    // Release the local reference so that the stream is not leaked.
    stream->Release();
    return hResult;
}

Note

The implementation for the IPersistFile.Load method loads a stream and calls the IPersistStream.Load method.

IPersistStream.Load

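HRESULT STDMETHODCALLTYPE CMyIFilter::Load( __RPC__in_opt IStream *pStm)
{ 
    if (pStm == NULL)
    {
        return E_INVALIDARG;
    }

    // Take a reference to the new stream before releasing the old one,
    // so that loading the same stream twice is safe.
    pStm->AddRef();
    if (m_pStream)
    {
        m_pStream->Release();
    }
    m_pStream = pStm;
    return S_OK;
}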

Note

The implementation of IPersistStream.Load keeps a member variable for the stream and ensures that the reference count is correct. Subsequent calls to GetChunk must have this stream to find the information. This method is the only way to retrieve a copy of the stream.
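
The reference-counting rule behind Load can be tried outside COM. The following is a minimal sketch under stated assumptions: FakeStream and StreamHolder are hypothetical stand-ins for IStream and for the m_pStream handling in CMyIFilter, and the counter is exposed only so that the effect is visible.

```cpp
// Stand-in for a COM object with AddRef/Release (hypothetical; a real
// IStream keeps its count private and deletes itself when it reaches 0).
struct FakeStream
{
    long refs;
    FakeStream() : refs(1) {}          // the creator holds the first reference
    long AddRef()  { return ++refs; }
    long Release() { return --refs; }
};

// Mirrors the Load pattern: take a reference to the new stream, then drop
// the reference to the stream that was held before.
class StreamHolder
{
public:
    StreamHolder() : m_pStream(0) {}
    ~StreamHolder()
    {
        if (m_pStream) m_pStream->Release();
    }

    bool Load(FakeStream* pStm)
    {
        if (!pStm) return false;       // reject a null stream
        pStm->AddRef();                // AddRef first, so reloading the same stream is safe
        if (m_pStream) m_pStream->Release();
        m_pStream = pStm;
        return true;
    }

private:
    FakeStream* m_pStream;
};
```

After each Load, the holder owns exactly one reference to the current stream and none to the previous one, which is the invariant that the note above describes as keeping the reference count correct.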

GetChunk, GetText, GetValue, and Init in the IFilter Sample Project

Three of these four methods are fairly complex because they involve a lot of type conversion, pointer logic, buffers, and string manipulation. Fortunately, if you are writing your own IFilter, you can use the implementation shown here without any modification. That way, you can focus on how you implement the GetNextChunkValue method, which is documented later in this article. Add the following methods to the MyIFilter.cpp file.

Implementation of GetText

SCODE STDMETHODCALLTYPE CMyIFilter::GetText( ULONG * pcwcBuffer,
    WCHAR * awcBuffer )
{ 
    HRESULT hr = S_OK;
 
    if ((pcwcBuffer == NULL) || (*pcwcBuffer == 0) || (awcBuffer == NULL))
    {
        return E_INVALIDARG;
    }
 
    if (!m_currentChunk.IsValid())
    {
        return FILTER_E_NO_MORE_TEXT;
    }
 
    if (m_currentChunk.GetChunkType() != CHUNK_TEXT)
    {
        return FILTER_E_NO_TEXT;
    }
 
    ULONG cchTotal = static_cast<ULONG>(wcslen(m_currentChunk.GetString()));
    ULONG cchLeft = cchTotal - m_iText;
    ULONG cchToCopy = min(*pcwcBuffer - 1, cchLeft);
 
    if (cchToCopy > 0)
    {
        PCWSTR psz = m_currentChunk.GetString() + m_iText;
 
        // Copy the chars.
        StringCchCopyNW(awcBuffer, *pcwcBuffer, psz, cchToCopy);
 
        // Null terminate it.
        awcBuffer[cchToCopy] = '\0';
 
        // Set how much data is copied.
        *pcwcBuffer = cchToCopy;
 
        // Remember we copied it.
        m_iText += cchToCopy;
        cchLeft -= cchToCopy;
 
        if (cchLeft == 0)
        {
            hr = FILTER_S_LAST_TEXT;
        }
    }
    else
    {
        hr = FILTER_E_NO_MORE_TEXT;
    }
 
    return hr;
 
}
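
The buffer arithmetic in GetText is easier to follow in isolation. The following portable sketch reproduces only that arithmetic; TextCursor and CopyNextPiece are hypothetical names, std::wstring stands in for the chunk string, and a bool stands in for the FILTER_S_LAST_TEXT return value.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the text-chunk state kept by the filter.
struct TextCursor
{
    std::wstring text;    // full text of the current chunk
    std::size_t offset;   // characters already handed out (like m_iText)
};

// Copies up to bufferSize - 1 characters plus a terminator into buffer.
// Returns the number of characters copied; 0 means nothing could be copied.
// *last is set to true when this call delivered the final piece.
std::size_t CopyNextPiece(TextCursor& cursor, wchar_t* buffer,
                          std::size_t bufferSize, bool* last)
{
    *last = false;
    if (buffer == 0 || bufferSize == 0)
    {
        return 0;
    }

    const std::size_t left = cursor.text.size() - cursor.offset;
    const std::size_t toCopy = std::min(bufferSize - 1, left);  // room for L'\0'
    if (toCopy == 0)
    {
        buffer[0] = L'\0';
        return 0;
    }

    cursor.text.copy(buffer, toCopy, cursor.offset);
    buffer[toCopy] = L'\0';
    cursor.offset += toCopy;
    *last = (cursor.offset == cursor.text.size());
    return toCopy;
}
```

A caller with a five-character buffer receives a ten-character value in three pieces (4, 4, and 2 characters), and the next call copies nothing. The indexer calls GetText in the same way, repeating the call until it is told that no more text is available.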

Implementation of GetChunk

SCODE STDMETHODCALLTYPE CMyIFilter::GetChunk( STAT_CHUNK * pStat)
{ 
    HRESULT hr = S_OK;
 
    // A return of S_FALSE indicates that the chunk should be skipped and that
    // we should try to get the next chunk.
 
    int cIterations = 0;
    hr = S_FALSE;
 
    while ((S_FALSE == hr) && (~cIterations & 0x0100))  // Limit to 256 iterations for safety
    {
        pStat->idChunk = m_dwChunkId;
        m_iText = 0;
        hr = GetNextChunkValue(m_currentChunk);
        ++cIterations;
    }
 
    if (hr == S_OK)
    {
        if (m_currentChunk.IsValid())
        {
            // Copy out the STAT_CHUNK
            m_currentChunk.CopyChunk(pStat);
 
            // and set the id to be the sequential chunk.
            pStat->idChunk = ++m_dwChunkId;
        }
        else
        {
            HandleError(_T("Current chunk is invalid"),E_INVALIDARG);
            hr = E_INVALIDARG;
        }
    }
 
    return hr;
}
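
The loop guard in GetChunk, (~cIterations & 0x0100), is terse. The following small sketch shows that it is equivalent to a plain comparison against 256 for the values the loop can reach. Both function names are hypothetical and exist only for this illustration.

```cpp
// Counts how many times a loop guarded by (~i & 0x0100) runs when nothing
// else stops it. Bit 8 of i is clear for 0..255, so the guard is nonzero
// for exactly those values and becomes zero when i reaches 256.
int CountGuardedIterations()
{
    int i = 0;
    while (~i & 0x0100)
    {
        ++i;
    }
    return i;
}

// The same limit written as a plain comparison.
bool WithinLimit(int iterations)
{
    return iterations < 256;
}
```

If you find the comparison easier to read, it can replace the bit test in the while condition without changing the behavior of the loop.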

Implementation of GetValue

SCODE STDMETHODCALLTYPE CMyIFilter::GetValue( PROPVARIANT * * ppPropValue )
{ 
    HRESULT hr = S_OK;
 
    // If this is not a value chunk they should not be calling this.
    if (m_currentChunk.GetChunkType() != CHUNK_VALUE)
    {
        return FILTER_E_NO_MORE_VALUES;
    }
 
    if (ppPropValue == NULL)
    {
        return E_INVALIDARG;
    }
 
    if (m_currentChunk.IsValid())
    {
        // Return the value of this chunk as a PROPVARIANT ( they own freeing it properly ).
        hr = m_currentChunk.GetValue(ppPropValue);
        m_currentChunk.Clear();
    }
    else
    {
        // We have already returned the value for this chunk, so go away.
        hr = FILTER_E_NO_MORE_VALUES;
    }
 
    return hr;
}

Implementation of Init

SCODE STDMETHODCALLTYPE CMyIFilter::Init( ULONG grfFlags,
  ULONG cAttributes,
  FULLPROPSPEC const * aAttributes,
  ULONG * pFlags )
{
    //This pointer is not set to any value. If you do not set it to 0
    //the IFilter will not work.
    *pFlags = 0;
 
    // Common initialization.
    m_dwChunkId = 0;
    m_iText = 0;
    m_currentChunk.Clear();
   
    return S_OK;
}

Implementing Error Handling (HandleError method) in the IFilter Sample Project

It is important to implement error handling in case something goes wrong when your code runs. Normally, ATL COM classes do not throw exceptions but instead return HRESULTs, so there is no risk of an unhandled exception when you use these classes. However, if you use any class that can throw an exception, each method must contain try…catch logic that handles the exception and returns an HRESULT. A thrown exception (such as out of memory) puts SearchFilterHost.exe and the Windows indexer into a bad state, so you must ensure that no exception escapes from the IFilter.

Most Windows API and COM calls return an HRESULT. An HRESULT, which is a handle to a result, is actually a 32-bit integer. A value of 0 or a positive number means that there were no errors, while any negative value indicates that there is a problem. Let's revisit the code we wrote to load a file into a stream.

if (FAILED (hResult))
{
   msg.FormatMessage(_T("SHCreateStreamOnFile failed for file %1."),pszFileName);
   HandleError(msg,hResult);
   return hResult;
}

Note

This code uses the FAILED macro. While a value of 0 or a positive number does indicate success, for documentation purposes it is better to use this macro or its counterpart SUCCEEDED because they make the code self-documenting.
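
One way to keep exceptions from escaping a method is to route the method body through a small wrapper. The following is a minimal sketch, not the approach used by the sample. HRESULT_T, the k-prefixed constants, Failed, and GuardedCall are hypothetical stand-ins so that the code compiles without the Windows headers; in the IFilter project you would use HRESULT, S_OK, E_FAIL, E_OUTOFMEMORY, and FAILED instead.

```cpp
#include <cstdint>
#include <new>

// Stand-ins for the Windows definitions (assumption: values match the
// documented HRESULT codes; on Windows, use the real types and macros).
typedef std::int32_t HRESULT_T;
const HRESULT_T kS_OK          = 0;
const HRESULT_T kE_FAIL        = static_cast<HRESULT_T>(0x80004005u);
const HRESULT_T kE_OUTOFMEMORY = static_cast<HRESULT_T>(0x8007000Eu);

// Same test that the FAILED macro performs: negative means failure.
inline bool Failed(HRESULT_T hr)
{
    return hr < 0;
}

// Runs work() and converts any C++ exception into a failure code, so that
// nothing propagates past the COM method boundary.
template <typename Work>
HRESULT_T GuardedCall(Work work)
{
    try
    {
        return work();
    }
    catch (const std::bad_alloc&)
    {
        return kE_OUTOFMEMORY;   // out of memory
    }
    catch (...)
    {
        return kE_FAIL;          // any other exception
    }
}
```

A method such as GetNextChunkValue could place its body in a lambda and return the result of GuardedCall, logging the failure through HandleError when Failed reports a negative value.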

One of the most common approaches to implementing error handling in Windows is to write the error to the event log. The example in this article includes the HandleError method, which writes to the Windows application log when an error occurs. If you want your IFilter to handle errors differently you must write your own implementation of this method.

Before implementing the HandleError method, download the sample code from Event Logging, Part I: XEventLog - Basic NT Event Logging and add the XEventLog.h and XEventLog.cpp files to the MyIFilter project. The XEventLog class wraps all of the complexities of writing to the event log. To use the XEventLog class, the MyIFilter.cpp file must include the XEventLog.h header file. Make sure that the following line appears with the rest of the include statements in MyIFilter.cpp:

#include "XEventLog.h"

Implement the HandleError method using the following code.

void CMyIFilter::HandleError(LPCTSTR message, HRESULT hr)
{
   //Change this to whatever app name you want
   const LPCTSTR AppName = _T("My IFilter");
 
   CString msg;
   msg.FormatMessage(_T("%1 HResult = %2!d!"),message,hr);
   CXEventLog eventLog(AppName);
   eventLog.Write(EVENTLOG_ERROR_TYPE,msg);
}

Implement ReadLine

Because the sample file is a text file, it is useful to have a routine that reads a single line from the file and converts it into a CString object for processing. A line is also the chunk size. Remember that a chunk is defined as whatever data has a single value in it. In this case, the sample file has one value per line.

Add the following code to the MyIFilter.cpp file to implement the ReadLine method.

Implementation of ReadLine

// Reads one line (up to and including '\n') from the stream.
CString CMyIFilter::ReadLine()
{
    const int MAX_LINE = 10000;
    char buffer[MAX_LINE + 1];
    int i = 0;

    while (m_pStream != NULL && i < MAX_LINE)
    {
        ULONG bytesRead = 0;
        HRESULT hr = m_pStream->Read(buffer + i, 1, &bytesRead);
        if (FAILED (hr) || 0 == bytesRead)
        {
            // End of stream or read error; keep what was read so far.
            break;
        }
        if (buffer[i++] == '\n')
        {
            // End of line; the newline stays in the buffer.
            break;
        }
    }

    // i counts only the bytes that were actually read.
    buffer[i] = 0;
    return CString(buffer);
}

Implement GetNextChunkValue in the IFilter Sample Project

The GetNextChunkValue method is not part of the IFilter interface, but was declared in the MyIFilter.h header file, and it is called from the GetChunk method. It is a simplified method that enables an IFilter developer to focus on how to implement the logic that is required to parse the values out of the file, and not have to worry about any of the other plumbing. A developer should be able to use all the code presented to this point without having to modify any of it.

The following code pulls crawled properties out of the sample file and completes the demonstration of how to build an IFilter.

Implementation of GetNextChunkValue

HRESULT CMyIFilter::GetNextChunkValue(CChunkValue &chunkValue)
{
    const int MAX_LINES = 128;
    chunkValue.Clear();
    CString line;
    CString propertyValue;
    int ndx = 0;
 

    for (int lines=0;lines < MAX_LINES;lines++)
    {
        line = ReadLine();

        if (line.Find(_T("Customer Name:")) >= 0)
        {
            //Found customer, lets save the value.
            ndx = line.Find(_T(":"));
            propertyValue = line.Mid(ndx+1,1000); //arbitrary long integer, it reads to the end of the line
            propertyValue = propertyValue.Trim();
            chunkValue.SetTextValue(PKEY_BASIC_200_CUSTOMER_NAME,propertyValue);
            return S_OK;
        }
        if (line.Find(_T("Favorite Sport:")) >= 0)
        {
            //Found the favorite sport.
            ndx = line.Find(_T(":"));
            propertyValue = line.Mid(ndx+1,1000); //arbitrary long integer, it reads to the end of the line
            propertyValue = propertyValue.Trim();
            chunkValue.SetTextValue(PKEY_BASIC_202_FAV_SPORT,propertyValue);
            return S_OK;
        }
        if (line.Find(_T("Height (Inches):")) >= 0)
        {
            //Found height.
            ndx = line.Find(_T(":"));
            propertyValue = line.Mid(ndx+1,1000); //arbitrary long integer, it reads to the end of the line
            propertyValue = propertyValue.Trim();
            int height = _ttoi(propertyValue);
            chunkValue.SetIntValue(PKEY_BASIC_203_HEIGHT,height);
            return S_OK;
        }
        if (line.Find(_T("DOB:")) >= 0)
        {
            //Found date of birth.
            ndx = line.Find(_T(":"));
            propertyValue = line.Mid(ndx+1,1000); //arbitrary long integer, it reads to the end of the line
            propertyValue = propertyValue.Trim();
 
            // Now, convert the date string to a FILETIME.
            FILETIME filetime;
            SYSTEMTIME systime;
            int month=0;
            int day=0;
            int year=0;
            ZeroMemory(&filetime,sizeof(filetime));
            ZeroMemory(&systime,sizeof(systime));
            CT2A aString(propertyValue);
            if (sscanf(aString,"%d/%d/%d",&day,&month,&year) != 3)
            {
                // Not a date that can be parsed, so skip this line.
                continue;
            }
            systime.wMonth = (WORD)month;
            systime.wDay = (WORD)day;
            systime.wYear = (WORD)year;
 
            if (!SystemTimeToFileTime(&systime,&filetime))
            {
                // Out-of-range date, so skip this line.
                continue;
            }
 
            // Add the date to the chunk value and return.
            chunkValue.SetFileTimeValue(PKEY_BASIC_201_DOB,filetime);
            return S_OK;
        }
        // No property on this line, so keep reading.
    }

    // Did not find anything interesting, so we must be out of chunks.
    return FILTER_E_END_OF_CHUNKS;
}
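
Each branch of GetNextChunkValue repeats the same find, Mid, and Trim steps. The following portable sketch shows one way to factor that step out. TryGetLabeledValue is a hypothetical helper, std::string stands in for CString, and the sample lines in the usage note are invented for the illustration.

```cpp
#include <string>

// Hypothetical helper: if line contains label (for example "Customer Name:"),
// stores the trimmed text that follows the label in value and returns true.
bool TryGetLabeledValue(const std::string& line,
                        const std::string& label,
                        std::string& value)
{
    const std::string::size_type pos = line.find(label);
    if (pos == std::string::npos)
    {
        return false;
    }

    // Everything after the label, to the end of the line.
    const std::string rest = line.substr(pos + label.size());

    // Trim spaces, tabs, and line endings from both ends.
    const char* whitespace = " \t\r\n";
    const std::string::size_type first = rest.find_first_not_of(whitespace);
    if (first == std::string::npos)
    {
        value.clear();   // the label was present but had no value
        return true;
    }
    const std::string::size_type last = rest.find_last_not_of(whitespace);
    value = rest.substr(first, last - first + 1);
    return true;
}
```

Given the line "Height (Inches): 70", the helper yields "70" for the label "Height (Inches):". Because it skips the whole label instead of searching for the first colon, it also works when a label contains a colon of its own.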

Registering the IFilter

If you are building a 64-bit IFilter (which you must do to use it with SharePoint 2010 or 64-bit SharePoint Server 2007), you must ensure that all of your registry settings are entered into the 64-bit registry. If you plan to automate the creation or update of these registry keys, you need to take this into account. To create the 64-bit keys, the process that sets up the registry entries needs to run as a 64-bit process. You cannot use a 32-bit installer to set up your registry keys or they will all end up in the wrong place. Unfortunately, creating a 64-bit installer can be difficult. One option is to create a 64-bit console application to set up the registry keys. If you plan to make your IFilter production quality, you will need to create a 64-bit installer. For more information about creating a 64-bit installer, see Using 64-bit Windows Installer Packages.

Register the IFilter sample COM DLL

Because this is a COM DLL, you can register it by typing regsvr32 [name of your DLL] on the command line. Or, if you create a setup package, you can have the setup package register the DLL.

Note

Visual Studio registers the DLL each time that it is compiled on your local computer.

Register the Persistent Handler

The next step is to create the persistent handler. This is the registry key that associates the file name extension to the IFilter implementation that crawls that extension. Use Visual Studio to create the new GUID. Click the Tools menu, and then click Create GUID.

The persistent handler GUID is an arbitrary GUID, as shown in Figure 17.

Figure 17. Click the Tools menu, and then click Create GUID


See Registering Filter Handlers for more information about the registry keys that are used to register the persistent handler. You can find the class ID of your class by opening the MyIFilter.idl file and looking directly above the coclass declaration. The following code is an example.

    importlib("stdole2.tlb");
    [
        uuid(A204ECE7-61DD-4F9F-AC89-DD9B7EFB2076)
    ]
    coclass MyIFilter
    {
        [default] interface IMyIFilter;
        interface IFilter;
        interface IPersistFile;
        interface IPersistStream;
    };

Table 1 shows the persistent filter handler registry keys.

Table 1. Persistent handler registry keys

| Registry Key | Value Name | Value Type | Value | Notes |
|---|---|---|---|---|
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.{Extension} | (default) | REG_SZ | {Extension} File Format | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.{Extension} | Content Type | REG_SZ | application/{Extension} | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.{Extension} | PerceivedType | REG_SZ | text | Depends on your IFilter implementation; the sample is a text file. |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\.{Extension}\PersistentHandler | (default) | REG_SZ | {Persistent Handler Guid} | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{Persistent Handler Guid} | (default) | REG_SZ | {Extension} File Persistent Handler | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{Persistent Handler Guid}\PersistentAddinsRegistered | (default) | REG_SZ | (value not set) | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{Persistent Handler Guid}\PersistentAddinsRegistered\{89BCB740-6119-101A-BCB7-00DD010655AF} | (default) | REG_SZ | {Class ID Guid} | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{Persistent Handler Guid}\PersistentHandler | (default) | REG_SZ | {Persistent Handler Guid} | |
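
If you script the registration, it can help to generate the Table 1 entries from the three inputs that vary: the extension, the persistent handler GUID, and the class ID. The following is a minimal sketch that only builds .reg file text. BuildPersistentHandlerReg is a hypothetical helper, and the extension and persistent handler GUID in the usage note are placeholders, not values from the sample. The resulting file still has to be imported from a 64-bit process for a 64-bit IFilter.

```cpp
#include <string>

// Hypothetical helper: builds .reg file text for the Table 1 keys.
// ext is the extension without the dot; both GUIDs include the braces.
std::string BuildPersistentHandlerReg(const std::string& ext,
                                      const std::string& handlerGuid,
                                      const std::string& classId)
{
    const std::string root = "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Classes\\";
    const std::string handlerKey = root + "CLSID\\" + handlerGuid;
    // The GUID under PersistentAddinsRegistered, as it appears in Table 1.
    const std::string filterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    std::string reg = "Windows Registry Editor Version 5.00\n\n";

    // File name extension key and its three values.
    reg += root + "." + ext + "]\n";
    reg += "@=\"" + ext + " File Format\"\n";
    reg += "\"Content Type\"=\"application/" + ext + "\"\n";
    reg += "\"PerceivedType\"=\"text\"\n\n";

    // The extension points to the persistent handler.
    reg += root + "." + ext + "\\PersistentHandler]\n";
    reg += "@=\"" + handlerGuid + "\"\n\n";

    // The persistent handler points to the IFilter class.
    reg += handlerKey + "]\n";
    reg += "@=\"" + ext + " File Persistent Handler\"\n\n";
    reg += handlerKey + "\\PersistentAddinsRegistered]\n\n";
    reg += handlerKey + "\\PersistentAddinsRegistered\\" + filterIid + "]\n";
    reg += "@=\"" + classId + "\"\n\n";
    reg += handlerKey + "\\PersistentHandler]\n";
    reg += "@=\"" + handlerGuid + "\"\n";
    return reg;
}
```

For example, BuildPersistentHandlerReg("myext", "{11111111-2222-3333-4444-555555555555}", "{A204ECE7-61DD-4F9F-AC89-DD9B7EFB2076}") returns text that you can save with a .reg extension. Here the class ID is the one from the coclass example above, and the other two arguments are made up.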

SharePoint Server 2007 Registry Keys

In addition to the registry keys for the persistent handler, Table 2 shows all the registry keys needed for an IFilter to work in SharePoint Server 2007.

Table 2. SharePoint Server 2007 IFilter registry keys

| Registry Key | Value Name | Value Type | Value | Notes |
|---|---|---|---|---|
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Applications\{Site Guid}\Gather\Portal_Content\Extensions\ExtensionList | ## | REG_SZ | {Extension} | ## is the next available number. You can create this registry key from SharePoint Search administration. |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Setup\Filters\{Extension} | (Default) | REG_SZ | (Value not set) | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Setup\Filters\{Extension} | Extension | REG_SZ | {Extension} | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Setup\Filters\{Extension} | FileTypeBucket | REG_DWORD | 5 | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Setup\Filters\{Extension} | MimeTypes | REG_SZ | application/{Extension} | Setting this as application/text causes SharePoint to use full-text search instead of your IFilter. |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Setup\Filters\{Extension} | ThreadingModel | REG_SZ | Both | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Setup\ContentIndexCommon\Filters\Extension\{Extension} | (Default) | REG_MULTI_SZ | {Class ID Guid} | |
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12\Search\Global\Gathering Manager | (Default) | REG_MULTI_SZ | {Class ID Guid} | This key affects the maximum size of the file to parse. If you design an IFilter that parses very large files, you must update this setting to accommodate a file's maximum size. Even if your IFilter only examines the first few kilobytes of the file, your IFilter is not called if the file is too large. |

SharePoint 2010 Registry Keys

In addition to setting up the persistent handler, you only need one other registry key for SharePoint 2010, as shown in Table 3.

Table 3. SharePoint 2010 IFilter registry key

| Registry Key | Value Name | Value Type | Value | Notes |
|---|---|---|---|---|
| HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14\Search\Setup\ContentIndexCommon\Filters\Extension\.{Extension} | (default) | REG_SZ | {Class ID Guid} | |

Testing the IFilter Using the IFiltTst Utility

The easiest way to test your IFilter implementation is to use the IFiltTst utility. For more information, see IFiltTst. You can configure this utility to run automatically when you start debugging so that you can step into your code.

To configure IFiltTst to run automatically

  1. Right-click the MyIFilter project, and then click Properties.

  2. Set the Command text line to C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\x64\ifilttst.exe for the x64 version. For 32-bit versions, use the ifilttst.exe executable in the path C:\Program Files\Microsoft SDKs\Windows\v7.0\Bin\.

  3. Add the command line -I "[Path to the Sample File]". Ensure that the path has quotation marks around it if there are any spaces in the path to the sample file, as shown in Figure 18.

    Figure 18. Setting the Command text line to the path

  4. Click OK.

You can now step into your code by pressing F5, or by selecting Start Debugging from the Debug menu.

Packaging and Deploying an IFilter

Even if you use a command line to create the registry entries, you should still create a setup package for the IFilter to install it on servers. By creating a setup package, you do not have to worry about making sure that you have the correct version of the ATL and MFC DLLs deployed. If you do decide to create a setup package for your IFilter and your IFilter is a 64-bit implementation, you must create a 64-bit installer to configure the registry keys. For more information about how to use 64-bit Windows Installer packages, see Using 64-bit Windows Installer Packages.

Conclusion

An IFilter is an interface that enables Windows Desktop Search and Microsoft SharePoint Server 2010 search to index the contents of files. The best option to develop an IFilter is to implement it by using C++.

To write an IFilter, you must implement several COM interfaces (IFilter, IPersistFile, IPersistStream, and IUnknown). Although you could write COM objects without relying on the Active Template Library (ATL), ATL makes development much easier because it provides the COM infrastructure.

Developers using SharePoint 2010 and .NET Framework who have limited exposure to C++ can use the example in this article to get started with custom IFilter development.

Additional Resources

For more information, see the following resources:

About the Author

Alex Culp has been designing and developing a variety of applications on the Microsoft Platform since 1997. He is a graduate of the University of Texas at Austin where he received a B.S. in Computer Science. When he is not working, he loves spending time with his wonderful wife Terah and son Nicholas. Other than his passion for technology, he loves running and any kind of high-adventure sports. He also loves working with the student ministry and volunteers as a chaplain at a local area hospital.