Updating HTMLProp iFilter sample to work with MOSS 2007

What happened?
The customer was running MOSS 2007 SP2 and indexing a large amount of content from external sites. These sites carried date information in their HTML meta fields. The customer wanted those values indexed as Date/Time fields in MOSS so that they could be queried as dates rather than as strings.

The customer tried to use the HTMLProp iFilter sample from the Platform SDK to index the fields as Date/Time, but the values were still being indexed as strings.

Why did it happen?
By default, SharePoint's HTML iFilter (nlhtml.dll) converts all meta field information found during a crawl to the string type rather than to each field's own data type. Troubleshooting showed that the HTMLProp implementation available in the SDK did not appear to be compatible with the MOSS 2007 iFilter implementation.
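To illustrate why string-typed dates are a problem for queries, here is a small Python sketch. It is not part of the HTMLProp sample: the meta tag name `PublishDate`, the M/D/YYYY format, and the two pages are assumptions made up for the illustration.

```python
from datetime import datetime
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name] = content


def read_meta(html_text):
    """Return the meta name/content pairs of a page as a dict of strings."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


def as_date(text):
    """Parse the assumed M/D/YYYY format into a real datetime value."""
    return datetime.strptime(text, "%m/%d/%Y")


# Hypothetical pages; the tag name and date format are assumptions.
PAGE_A = '<html><head><meta name="PublishDate" content="9/15/2009"></head></html>'
PAGE_B = '<html><head><meta name="PublishDate" content="10/2/2009"></head></html>'

raw_a = read_meta(PAGE_A)["PublishDate"]
raw_b = read_meta(PAGE_B)["PublishDate"]

# Compared as strings, September sorts after October because "9" > "1".
string_order_ok = raw_a < raw_b
# Compared as dates, the order is the expected one.
date_order_ok = as_date(raw_a) < as_date(raw_b)
```

A range query over the string form would give the wrong answer for these two pages, which is why the customer needed a typed Date/Time property.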

HTMLProp implements IPersistFile, but MOSS 2007 expects IPersistStream to be available, so the sample does not work with MOSS. (This information came from iFilter Explorer, a third-party tool.)

How was it resolved?
We studied how nlhtml.dll works, modified the HTMLProp sample to implement IPersistStream alongside IPersistFile, and registered it with MOSS. You can use the modified sample provided with this post, together with the instructions at http://msdn.microsoft.com/en-us/library/dd582939(office.11).aspx, to get it working.
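The idea behind the fix can be pictured with a plain-Python analogy. This is not COM code and not taken from the sample: the class names, the method names `load_file` and `load_stream`, and the host's check are all hypothetical stand-ins for IPersistFile, IPersistStream, and the way a host might ask a filter for the interface it needs.

```python
import io


class FileOnlyFilter:
    """Analog of a filter exposing only IPersistFile: it loads from a path."""

    def load_file(self, path):
        with open(path, "rb") as handle:
            self.data = handle.read()


class FileAndStreamFilter(FileOnlyFilter):
    """Analog of the modified sample: path loading plus stream loading."""

    def load_stream(self, stream):
        self.data = stream.read()


def host_can_use(filter_obj):
    """Hypothetical host check: only stream-capable filters are accepted."""
    return callable(getattr(filter_obj, "load_stream", None))


def host_load(filter_obj, content):
    """Hand the content to the filter as an in-memory stream, as the host would."""
    if not host_can_use(filter_obj):
        raise TypeError("filter does not support loading from a stream")
    filter_obj.load_stream(io.BytesIO(content))
    return filter_obj.data
```

In this analogy the original sample corresponds to `FileOnlyFilter`, which the host rejects, and the updated sample corresponds to `FileAndStreamFilter`, which keeps the file-based path and adds the stream-based one.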

To register the HTMLProp iFilter with MOSS, a few registry entries need to be changed so that the iFilter is loaded when HTML and HTM files are indexed:

Add the full path of your DLL to the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\DLLsToRegister

For each of the following keys (two for MOSS, two for WSS v3), change the (Default) value to "{f4309e80-a1db-11d1-a8fb-00e098006ed3}". This is the GUID of our HTMLProp iFilter.

For MOSS:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.htm
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.html

For WSS v3:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.htm
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.html

Important note: Back up these registry keys before changing anything.
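If you prefer to apply the (Default) value changes from a file rather than by hand, a sketch like the following can generate a .reg file for the four extension keys listed above. The helper names (`build_reg_text`, `write_reg_file`) are made up for this illustration, and the script only produces text; it does not touch the registry or handle the DLLsToRegister entry.

```python
HTMLPROP_CLSID = "{f4309e80-a1db-11d1-a8fb-00e098006ed3}"

MOSS_BASE = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0"
    r"\Search\Setup\ContentIndexCommon\Filters\Extension"
)
WSS_BASE = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0"
    r"\Search\Setup\ContentIndexCommon\Filters\Extension"
)
EXTENSIONS = (".htm", ".html")


def build_reg_text(bases, clsid=HTMLPROP_CLSID, extensions=EXTENSIONS):
    """Build .reg text that sets the (Default) value of each extension key."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for base in bases:
        for ext in extensions:
            lines.append("[{}\\{}]".format(base, ext))  # key path
            lines.append('@="{}"'.format(clsid))        # '@' is the (Default) value
            lines.append("")
    return "\r\n".join(lines)


def write_reg_file(path, bases):
    """Write the generated text as UTF-16 so it can be reviewed and imported."""
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(build_reg_text(bases))
```

For example, `write_reg_file("htmlprop_moss.reg", [MOSS_BASE])` would cover only the MOSS keys, and passing `[MOSS_BASE, WSS_BASE]` covers all four. Review the generated file and export the existing keys first, so that the original values can be restored.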

References:
IPersistFile - http://msdn.microsoft.com/en-us/library/ms687223(VS.85).aspx
IPersistStream - http://msdn.microsoft.com/en-us/library/ms690091(VS.85).aspx
http://blogs.msdn.com/ifilter/archive/2006/12/25/chronicles-of-an-ifilter-development-inception-to-deployment.aspx

HTMLProp_Updated_Sample.zip