Implementing Protocol Handler Interfaces

Creating a protocol handler involves implementing ISearchProtocol to manage UrlAccessor objects, IUrlAccessor to generate metadata about and identify appropriate filters for items in the data store, and IFilter to filter proprietary files or to enumerate and filter hierarchically stored files. The protocol handler must be multi-threaded.

This section contains the following topics:

  • Note on URLs
  • Protocol Handler Interfaces
  • IFilters for Containers
  • Related Topics

Note on URLs

Windows Search uses URLs to uniquely identify items in the hierarchy of your data store. A URL that defines an entry node is called a search root; Windows Search begins at that search root, asking the protocol handler to enumerate child links for each URL. The typical URL structure is:

<protocol>:// [{user SID}/] <localhost>/<path>/[<ItemID>]

The <protocol> portion identifies which protocol handler to invoke for the URL. If there is a {user SID} portion, the protocol handler is invoked in the security context of that user. Otherwise, it is invoked in the security context of the system service. The <path> portion defines the hierarchy of the store, where each forward slash ('/') is a separator between folder names. The <ItemID> is a unique string identifying the child item (e.g., the filename).

Note  With WDS 2.6.5, protocol handlers are always invoked in the context of the user.

The Windows Search Indexer trims the final slash '/' from URLs, so you cannot rely on the existence of a final slash to identify a directory (as opposed to an item). Your protocol handler must be able to handle this URL syntax.
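As a rough illustration of the URL structure and the trimmed final slash, the following portable C++ sketch splits a URL into its parts. The ParsedUrl structure, the ParseStoreUrl function, and the rule that a first segment wrapped in braces is the user SID are assumptions made for this example; they are not part of the Windows Search API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the parts of
// <protocol>://[{user SID}/]<localhost>/<path>/[<ItemID>]
struct ParsedUrl
{
    std::wstring protocol;               // selects the protocol handler
    std::wstring userSid;                // empty when the URL has no {user SID} part
    std::wstring host;                   // the <localhost> part
    std::vector<std::wstring> segments;  // folder names, possibly followed by an item ID
};

// Splits a URL into its parts. Returns false when "://" or the host is missing.
bool ParseStoreUrl(const std::wstring &url, ParsedUrl &parsed)
{
    parsed = ParsedUrl();

    const std::wstring marker = L"://";
    const std::size_t schemeEnd = url.find(marker);
    if (schemeEnd == std::wstring::npos || schemeEnd == 0)
    {
        return false;
    }
    parsed.protocol = url.substr(0, schemeEnd);

    // Split the remainder on '/', skipping empty pieces so that a URL with a
    // final slash and the same URL with the slash trimmed give the same result.
    std::vector<std::wstring> pieces;
    std::size_t start = schemeEnd + marker.size();
    while (start <= url.size())
    {
        std::size_t end = url.find(L'/', start);
        if (end == std::wstring::npos)
        {
            end = url.size();
        }
        if (end > start)
        {
            pieces.push_back(url.substr(start, end - start));
        }
        start = end + 1;
    }

    std::size_t index = 0;

    // Assumption: a first piece wrapped in braces is the user SID.
    if (index < pieces.size() && pieces[index].size() >= 2 &&
        pieces[index].front() == L'{' && pieces[index].back() == L'}')
    {
        parsed.userSid = pieces[index].substr(1, pieces[index].size() - 2);
        ++index;
    }

    if (index >= pieces.size())
    {
        return false;  // no host part
    }
    parsed.host = pieces[index++];

    for (; index < pieces.size(); ++index)
    {
        parsed.segments.push_back(pieces[index]);
    }
    return true;
}
```

Because the parsed result is the same with or without the final slash, a handler built along these lines would need to consult its own store, rather than the URL text, to tell a folder from an item.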

Note    You need to select a protocol name to identify your data store that does not conflict with current ones. We recommend this naming convention: companyName.scheme.

Protocol Handler Interfaces

ISearchProtocol and ISearchProtocol2

The SearchProtocol interfaces initialize and manage your protocol handler's UrlAccessor objects. The ISearchProtocol2 interface is an extension of ISearchProtocol and includes an extra method to specify more information about the user and the item. You can implement whichever interface meets your needs.

For more information on implementing the SearchProtocol interfaces, see the ISearchProtocol Interface or ISearchProtocol2 Interface reference pages.

IUrlAccessor and IUrlAccessor2

For a specified URL, the UrlAccessor object provides access to metadata about the item that the URL identifies. It can also bind that item to a protocol handler-specific filter (i.e., a filter other than the one associated with the file extension). The IUrlAccessor2 interface is an extension of IUrlAccessor and includes methods to get a code page or display URL from an item and to identify whether the URL is for an item (rather than a directory). You can implement whichever interface meets your needs.

The UrlAccessor object is instantiated and initialized by a SearchProtocol object; however, you can also implement an internal initialization method so your UrlAccessor object can perform initialization tasks specific to your protocol handler, such as validating the URL for an item being accessed or checking the last modified time to determine if a file must be processed in the current crawl.

IUrlAccessor provides four important pieces of information with the following methods:

  1. GetLastModified() returns the time the URL was last modified. If this time is more recent than the last time the indexer processed this URL, filters are invoked to process the (possibly) changed data for that item. Modified times for directories are ignored.
  2. IsDirectory() identifies whether the URL represents a folder containing child URLs.
  3. BindToStream() binds to an IStream which represents the data of a file in a custom store.
  4. BindToFilter() binds to a protocol handler-specific filter, which can provide more complete metadata for that item.
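The way the first two pieces of information drive a crawl can be modeled with a short sketch. This is only an illustration of the behavior described in the list above, not the indexer's actual implementation; the ItemInfo structure, the CrawlAction enumeration, and the DecideCrawlAction function are hypothetical names, and times are represented as plain 64-bit tick counts rather than FILETIME values.

```cpp
#include <cstdint>

// Hypothetical snapshot of what a UrlAccessor object reports for one URL.
struct ItemInfo
{
    bool isDirectory;            // whether the URL is a folder containing child URLs
    std::uint64_t lastModified;  // last-modified time reported for the URL, in ticks
};

// Hypothetical outcomes for one URL during a crawl.
enum class CrawlAction
{
    EnumerateChildren,  // folder: walk its child URLs
    FilterItem,         // item changed since it was last processed: invoke filters
    Skip                // item unchanged: nothing to do
};

// Models the decision described above. lastProcessed is the last time the
// indexer processed this URL, in the same tick units as lastModified.
CrawlAction DecideCrawlAction(const ItemInfo &item, std::uint64_t lastProcessed)
{
    if (item.isDirectory)
    {
        // Modified times for directories are ignored.
        return CrawlAction::EnumerateChildren;
    }
    return item.lastModified > lastProcessed ? CrawlAction::FilterItem
                                             : CrawlAction::Skip;
}
```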

For further instructions on implementing the UrlAccessor interfaces, see the IUrlAccessor Interface or IUrlAccessor2 Interface reference pages.

IProtocolHandlerSite

This interface is used to instantiate filter handlers that are hosted in an isolated process. The appropriate filter is obtained for a specified persistent class identifier (CLSID), document storage class, or file extension. The benefit of asking the host process to bind to filters is that the host process can manage the complexity of looking up the appropriate filter for you and can control the security of invoking the filter. For more information, see the IProtocolHandlerSite reference.

IFilters for Containers

If you are implementing a hierarchical protocol handler, you must implement a container IFilter component that enumerates child URLs. The enumeration process is a loop through the GetChunk and GetValue methods of the IFilter interface; each child URL is emitted as the value of the property.

GetChunk returns a FULLPROPSPEC with the property set GATHER_PROPSET {0B63E343-9CCC-11D0-BCDB-00805FCCCE04} and a PropID of either:

  • PID_GTHR_DIRLINK, the URL to the item without the last modified time. GetValue() will return a PROPVARIANT containing the child URL.
  • PID_GTHR_DIRLINK_WITH_TIME, the URL along with the last modified time. GetValue() will return a PROPVARIANT containing a vector of the child URL and the last modified time.

Returning PID_GTHR_DIRLINK_WITH_TIME is more efficient because the indexer can immediately determine whether the item needs to be indexed without calling the ISearchProtocol::CreateAccessor() and IUrlAccessor::GetLastModified() methods.
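The shape of the enumeration loop can be sketched with a simplified, portable model. The types below are hypothetical stand-ins: FilterStatus replaces the HRESULT values a real IFilter returns, ChildLink replaces the PROPVARIANT payload, and the method signatures are deliberately reduced, so this is not the real IFilter interface. The sketch only shows the pattern of one chunk per child URL, with the value returned once per chunk.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the status codes of a real filter.
enum class FilterStatus { Ok, EndOfChunks, NoMoreValues };

// One child of the folder being enumerated (URL plus last modified time).
struct ChildLink
{
    std::wstring url;
    std::uint64_t lastModified;  // tick count standing in for a FILETIME
};

// Simplified model of a container filter: one chunk per child URL.
class ChildLinkEnumerator
{
public:
    explicit ChildLinkEnumerator(std::vector<ChildLink> children)
        : m_children(std::move(children)) {}

    // Models GetChunk: advances to the next child link, if any.
    FilterStatus GetChunk()
    {
        if (m_next >= m_children.size())
        {
            m_valuePending = false;
            return FilterStatus::EndOfChunks;
        }
        m_current = m_next++;
        m_valuePending = true;
        return FilterStatus::Ok;
    }

    // Models GetValue: hands out the current child exactly once per chunk.
    FilterStatus GetValue(ChildLink &value)
    {
        if (!m_valuePending)
        {
            return FilterStatus::NoMoreValues;
        }
        value = m_children[m_current];
        m_valuePending = false;
        return FilterStatus::Ok;
    }

private:
    std::vector<ChildLink> m_children;
    std::size_t m_next = 0;
    std::size_t m_current = 0;
    bool m_valuePending = false;
};

// Models the caller's side of the loop: pull chunks until none remain.
std::vector<ChildLink> DrainChildLinks(ChildLinkEnumerator &filter)
{
    std::vector<ChildLink> links;
    while (filter.GetChunk() == FilterStatus::Ok)
    {
        ChildLink link;
        while (filter.GetValue(link) == FilterStatus::Ok)
        {
            links.push_back(link);
        }
    }
    return links;
}
```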

The following sample code demonstrates how to build the proper PID_GTHR_DIRLINK_WITH_TIME.

Important   THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright (c) Microsoft Corporation. All rights reserved.

// params are assumed to be valid

HRESULT GetPropVariantForUrlAndTime(PCWSTR pszUrl, const FILETIME &ftLastModified, PROPVARIANT **ppPropValue)
{
    *ppPropValue = NULL;

    // allocate the propvariant pointer
    // (size of the PROPVARIANT itself, not of the pointer to it)
    *ppPropValue = (PROPVARIANT *)CoTaskMemAlloc(sizeof(**ppPropValue));
    HRESULT hr = *ppPropValue ? S_OK : E_OUTOFMEMORY;

    if (SUCCEEDED(hr))
    {
        PropVariantInit(*ppPropValue);  // zero init the value

        // now allocate enough memory for 2 nested PropVariants.
        // PID_GTHR_DIRLINK_WITH_TIME is an array of 2 PROPVARIANTs
        PROPVARIANT *pVector = (PROPVARIANT *)CoTaskMemAlloc(sizeof(*pVector) * 2);
        hr = pVector ? S_OK : E_OUTOFMEMORY;

        if (SUCCEEDED(hr))
        {
            // mark the container PROPVARIANT as a vector of 2 PROPVARIANTs
            (*ppPropValue)->vt = VT_VARIANT | VT_VECTOR;
            (*ppPropValue)->capropvar.cElems = 2;
            (*ppPropValue)->capropvar.pElems = pVector;
            PWSTR pszUrlAlloc;
            hr = SHStrDup(pszUrl, &pszUrlAlloc);

            if (SUCCEEDED(hr))
            {
                // now fill the array of PROPVARIANTS
                // put the pointer to the URL into the vector
                (*ppPropValue)->capropvar.pElems[0].vt = VT_LPWSTR; 
                (*ppPropValue)->capropvar.pElems[0].pwszVal = pszUrlAlloc;

                // put the FILETIME into the vector
                (*ppPropValue)->capropvar.pElems[1].vt = VT_FILETIME;
                (*ppPropValue)->capropvar.pElems[1].filetime = ftLastModified;
            }
            else
            {
                CoTaskMemFree(pVector);
            }
        }
 
        if (FAILED(hr))
        {
            CoTaskMemFree(*ppPropValue);
            *ppPropValue = NULL;
        }
    }
    return hr;  // propagate allocation failures instead of always returning S_OK
}

Note   A container IFilter component should always enumerate all child URLs, even if the child URLs have not changed, because the indexer detects deletions through the enumeration process. If the date output in a PID_GTHR_DIRLINK_WITH_TIME indicates that the data has not changed, the indexer does not update the data for that URL.
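To see why a complete enumeration matters, consider the following simplified model of how a crawl result could be compared against what is already indexed. It is a sketch of the behavior described in the note, not the indexer's real algorithm; CrawlDiff and DiffFolder are hypothetical names, and times are plain 64-bit tick counts.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical result of comparing one folder enumeration with the index.
struct CrawlDiff
{
    std::vector<std::wstring> toIndex;   // new or modified child URLs
    std::vector<std::wstring> toDelete;  // indexed URLs missing from the enumeration
};

// known:      child URL -> time the URL was last processed
// enumerated: child URL -> last modified time emitted by the container filter
CrawlDiff DiffFolder(const std::map<std::wstring, std::uint64_t> &known,
                     const std::map<std::wstring, std::uint64_t> &enumerated)
{
    CrawlDiff diff;

    // A child is (re)indexed when it is new or modified since it was processed.
    for (const auto &child : enumerated)
    {
        const auto previous = known.find(child.first);
        if (previous == known.end() || child.second > previous->second)
        {
            diff.toIndex.push_back(child.first);
        }
    }

    // A known URL that was not enumerated is treated as deleted. This is why
    // skipping unchanged children during enumeration would lose them.
    for (const auto &entry : known)
    {
        if (enumerated.find(entry.first) == enumerated.end())
        {
            diff.toDelete.push_back(entry.first);
        }
    }
    return diff;
}
```

In this model, an unchanged child that the container filter leaves out would land in toDelete, which is the failure the note warns about.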

Folder Metadata

Your protocol handler needs to emit metadata for the folders it enumerates. The following table lists the more important properties your protocol handler should emit.

System Property               Description                                                               Example
System.ItemFolderPathDisplay  The user-friendly name of the folder portion of the item.                 /Tom/Inbox/News
System.ItemPathDisplay        The user-friendly name of the path to the item.                           /Tom/Inbox/News/RE: Red Sox Tickets
System.ItemName               The user-friendly name of the item.                                       Red Sox Tickets
System.ItemNamePrefix         The prefix of the name of an item (not to be used for sorting purposes).  Re: Red Sox Tickets