MOSS: How to prevent a single piece of text on a page from appearing in search results?
I was recently working on a requirement that was somewhat weird (as developers would say) but made a lot of sense (as end users would say).
The customer had a home page with their company logo. At some places on the home page (or rather, in the master page) there were texts like Search <Name of company>. So when I searched for the company name, the results showed the text Search <Company Name>. In themselves these results are harmless, but the customer's marketing team had a problem with them, which makes complete sense.
The customer therefore wanted to keep this text out of the search results.
The first idea for a solution was to create a custom search. That means a lot of work writing code, which was not feasible as the customer was close to the release date.
Another solution that came to mind was to somehow prevent the text from being crawled. But how do you prevent a single piece of text on a page from getting crawled? MOSS Search doesn't allow this. It allows you to prevent crawling at the page level, but not below the page level.
Below is a brief description of the options I considered before settling on the final solution. The possible solutions were:
1. Use of the SPRestricted control: This is a control provided by SharePoint that allows us to show or hide content based on permissions. We could have used it, but we wanted to hide the content from the System Account, and this control doesn't allow us to explicitly specify users. It only allows us to specify permission levels. Hence this was not a viable solution.
2. Use of a custom control and permissions: This involves creating a custom user control and encapsulating the label's contents in it. We would write some code that checks who the current user is. If it is the System Account, hide the text; otherwise show it. Alternatively, the code could check the user's permissions instead of the identity, but performance becomes a concern here, especially with permission checks. Not suggested.
3. Use of a custom control and a check for whether the page is executed by a crawler: This involves creating a custom user control and encapsulating the label's contents in it. We would write some code that checks whether the page is being executed by a crawler. The only additional change this solution needs is a registry change on the index server. This is the suggested solution as it has only a marginal performance hit.
The steps I used to implement this:
1. Encapsulate the label in a user control, built in Visual Studio.
2. Write custom code that prevents the user control from showing the text, based on who is accessing the page.
3. Restart Search and IIS.
I tried the steps and was able to get this to work. For step 2, I first tried getting the current user's identity and, if it matched the search service identity, preventing the text from showing. This works fine as long as the search service identity is not also used by the customer to view the site in a browser.
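The identity comparison from that first attempt can be sketched as a small helper. This is only an illustration under assumptions: the class name, method name, and the account names in the usage note are hypothetical and not taken from the attached sample.

```csharp
using System;

// Hypothetical helper; the names here are illustrative only.
public static class CrawlIdentityCheck
{
    // Returns true when the label should be rendered, i.e. when the
    // current login is not the (assumed) search service account.
    public static bool ShouldShowLabel(string currentLogin, string crawlAccountLogin)
    {
        if (string.IsNullOrEmpty(currentLogin))
        {
            return true; // no authenticated user: show the label
        }

        // Windows account names are case-insensitive, so compare accordingly.
        return !string.Equals(currentLogin, crawlAccountLogin,
                              StringComparison.OrdinalIgnoreCase);
    }
}
```

In the user control this might be called with HttpContext.Current.User.Identity.Name as the first argument and the search service account (for example a made-up CONTOSO\svc_crawl read from configuration) as the second.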
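So how do we work around the case where the search service identity is also used to browse the site? There is a way. The HttpContext object exposes the property Request.Browser.Crawler, which is true when the page is being requested by a crawler. The text is set only when that property is false:
HttpContext context = HttpContext.Current;
// Set the label text only when the request does not come from a crawler.
if (!context.Request.Browser.Crawler)
    txtBox1.Text = "Search <Name of company>";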
The only hitch in using this is that the following registry change has to be made on the index server:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager]
"UserAgent"="Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 5.0 Robot Crawler)"
Most other search engines identify themselves as crawlers through their user agent string, but MOSS does not do so out of the box, so the entry has to be set by hand. This is a known issue, and quite a number of KB articles describe this change.
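To see why the extra word in the user agent matters, here is a deliberately simplified stand-in for the crawler detection. It is an assumption-laden sketch: ASP.NET decides Request.Browser.Crawler from its browser definition files, not from this code, and the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper; a simplified stand-in, not ASP.NET's actual logic.
public static class CrawlerUserAgent
{
    // Treats a request as coming from a crawler when its user agent
    // string contains the word "crawler" (case-insensitive).
    public static bool LooksLikeCrawler(string userAgent)
    {
        return !string.IsNullOrEmpty(userAgent)
            && userAgent.IndexOf("crawler", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Under this simplification, the registry value shown above is recognized as a crawler, while the same string without the trailing word Crawler (used here purely for comparison) is not.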
Sample code for this is available as an attachment to this blog post.