How to optimize content for search (FAST Search Server 2010 for SharePoint)

 

Applies to: FAST Search Server 2010

In a new SharePoint Server installation, not all content is equally relevant for search. Content has different levels of importance, and some content should not be indexed at all. The search engine must be informed of these differences so that the most relevant items are returned when users search.

The search engine must contain the important and relevant information for relevancy tuning to have any effect. It is equally important to prevent indexing of information that has no value for search, because removing unwanted information from the search engine helps prevent unwanted and low-value results. The following sections describe guidelines that content owners and site administrators should follow when organizing data:

  • Structure your sites

  • Use short URLs

  • Use simple and descriptive link texts

  • Add metadata to your content

  • Prevent indexing of irrelevant content

  • Encourage archiving and deleting old content

  • Review your security settings

Structure your sites

Decide which sites are (or will be) the most important, and reserve them for premium content. These sites can also be given additional rank points through site promotions. For more information about site promotions, see Add, remove and display site promotions for a keyword by using Windows PowerShell (FAST Search Server 2010 for SharePoint).

Use short URLs

Pages that have short URLs can be given additional rank points. For example, the page http://myserver.com/ can receive more rank points than the page http://myserver.com/site. General and important information should therefore be placed on pages that have short URLs. In FAST Search Server 2010 for SharePoint, you can give additional rank points to sites that have short URLs by adjusting the static (quality) rank component Urldepthrank. For more information about how to adjust this component, see Change the weight, remove or add a static rank component by using Windows PowerShell (FAST Search Server 2010 for SharePoint).
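The Urldepthrank idea can be illustrated with a small Python sketch. It assumes a hypothetical scoring rule in which each additional path segment costs a fixed number of points; the function names and the `max_points` and `penalty` values are illustrative only and are not the formula that FAST Search Server 2010 for SharePoint uses.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count the non-empty path segments in a URL (0 for the site root)."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


def depth_rank_points(url: str, max_points: int = 100, penalty: int = 20) -> int:
    """Hypothetical rule: each path segment costs `penalty` points, with a floor of zero.

    This is an illustration only, not the actual Urldepthrank formula.
    """
    return max(max_points - penalty * url_depth(url), 0)


for url in (
    "http://myserver.com/",
    "http://myserver.com/site",
    "http://myserver.com/site/sub/page.aspx",
):
    print(url, depth_rank_points(url))
```

With these assumed values, http://myserver.com/ scores 100, http://myserver.com/site scores 80, and the page three segments deep scores 40.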

You can also nest less important sites, use natural hierarchies, and avoid flat structures.

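For example, nest sites under the parent sites http://sales and http://products:

  • Use http://sales/china instead of http://china.

  • Use http://sales/japan instead of http://japan.

  • Use http://products/proda instead of http://proda.

  • Use http://products/prodb instead of http://prodb.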

The search engine contains a module that contributes two kinds of rank points: dynamic rank points based on the link text (anchor text), and static rank points based on the links between items in the SharePoint Server installation and other content that you plan to index.

Dynamic rank points are added when your search words match the words used in the link text (anchor text). A match in the link text is considered high quality, because the link text contains words that other users have chosen to describe the target page. For more information, see Change the authority (anchor text) weight by using Windows PowerShell (FAST Search Server 2010 for SharePoint).
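As a rough illustration of anchor text matching, the following Python sketch collects the link texts that point at each page and awards one point for each query term found in each of those link texts. The link data, the simple tokenizer, and the `weight` parameter (loosely standing in for the authority weight) are hypothetical; the product's actual linguistic processing and scoring are not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical link data: (source page, target page, link text).
LINKS = [
    ("http://intranet/news", "http://sales/china", "China sales report"),
    ("http://intranet/home", "http://sales/china", "Sales in China"),
    ("http://intranet/home", "http://products/proda", "click here"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a stand-in for real linguistic processing."""
    return set(re.findall(r"\w+", text.lower()))


def build_anchor_index(links):
    """Collect the words of every link text that points at each target."""
    index = defaultdict(list)
    for _source, target, link_text in links:
        index[target].append(tokenize(link_text))
    return index


def anchor_points(query: str, target: str, index, weight: float = 1.0) -> float:
    """Hypothetical dynamic score: one point per (link, query term) match, scaled by weight."""
    terms = tokenize(query)
    return weight * sum(len(terms & words) for words in index.get(target, []))


index = build_anchor_index(LINKS)
print(anchor_points("china sales", "http://sales/china", index))     # 4.0
print(anchor_points("china sales", "http://products/proda", index))  # 0.0
```

The page described as "China sales report" and "Sales in China" by other pages scores for both query terms, while the page linked only as "click here" gains nothing from its link text.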

In addition, FAST Search Server 2010 for SharePoint adds static rank points based on the link structure, that is, the number of links that point to a site or an item. The static rank derived from links in the SharePoint Server installation is handled by the static (quality) rank components SiteRank and DocRank. The rank score is based on a link graph analysis of the indexed content: the more links that point to a page or an item, the higher its rank score. For more information about how to adjust these components, see Change the weight, remove or add a static rank component by using Windows PowerShell (FAST Search Server 2010 for SharePoint).
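The principle that more inbound links mean a higher static score can be sketched by counting in-links per item and, at site level, links that arrive from other sites. This Python sketch only counts links over a made-up link graph; `item_in_links` and `site_in_links` are hypothetical stand-ins for the kind of signals DocRank and SiteRank capture, not their actual algorithms.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical link graph: (source item, target item).
EDGES = [
    ("http://intranet/home", "http://sales/china"),
    ("http://intranet/news", "http://sales/china"),
    ("http://intranet/home", "http://products/proda"),
    ("http://sales/china", "http://products/proda"),
    ("http://sales/japan", "http://products/prodb"),
]


def item_in_links(edges) -> Counter:
    """Number of distinct sources linking to each item (a DocRank-like signal)."""
    return Counter(target for _source, target in set(edges))


def site_in_links(edges) -> Counter:
    """Links arriving from other sites, aggregated per host (a SiteRank-like signal)."""
    counts = Counter()
    for source, target in set(edges):
        source_host, target_host = urlparse(source).hostname, urlparse(target).hostname
        if source_host != target_host:  # ignore links within the same site
            counts[target_host] += 1
    return counts


print(dict(item_in_links(EDGES)))
print(dict(site_in_links(EDGES)))
```

In this graph, http://sales/china and http://products/proda each have two in-links, and the products site receives three links from other sites.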

Additional tips:

  • Remember that hyperlinks affect the importance of the pages they point to.

  • When using links, use link text (anchor text) that gives a good description of the target page, and avoid generic link text such as click, here, and see.

  • For image links, add relevant “alt” text to the image.
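Content owners who want to check pages against these tips could run a simple audit. The following Python sketch uses the standard library `html.parser` module to flag links whose entire text is a generic phrase, and images that have no alt text. The list of generic phrases is an assumption based on the examples above; extend it to suit your content.

```python
from html.parser import HTMLParser

# Generic link texts to avoid (assumed list, based on the examples above).
GENERIC_LINK_TEXT = {"click", "here", "see", "click here", "see here"}


class LinkTextAuditor(HTMLParser):
    """Collects warnings for generic link text and for images without alt text."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self._href = None
        self._link_text = None  # list of text chunks while inside <a>, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href, self._link_text = attrs.get("href"), []
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.warnings.append(f"image without alt text: {attrs.get('src')}")

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link_text is not None:
            # Normalize whitespace and case before comparing with the generic list.
            text = " ".join("".join(self._link_text).split()).lower()
            if text in GENERIC_LINK_TEXT:
                self.warnings.append(f"generic link text '{text}' for {self._href}")
            self._link_text = None


auditor = LinkTextAuditor()
auditor.feed('<p><a href="/q3.aspx">Click here</a> for the '
             '<a href="/sales">Q3 sales report</a>. <img src="chart.png"></p>')
print(auditor.warnings)
```

For this fragment, the audit reports the "Click here" link and the chart image that lacks alt text, but not the descriptive "Q3 sales report" link.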

Add metadata to your content

Encourage all content providers to annotate content with meaningful metadata. The search engine uses metadata both for initial recall when you search and for refining search results, by drilling down on provided or extracted metadata. Metadata is either added to the document properties or edited directly in SharePoint Server. The most important metadata are as follows:

  • Title: Used as high-quality information for search. A match in the title gives more rank points than a match in the document body.

  • Dates: Used for freshness relevance. Items that are otherwise equally relevant are ordered by document age.

  • Authors: Important for recall when users search for names, and also an important refiner.

  • Description/comments: Important for recall, and also for presentation of the search results, because this text is used as the short description on the search results page.

All other metadata is also important for relevance and search recall and for use as custom refiners.
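The tie-breaking role of dates can be shown with a short Python sketch that orders hits by relevance score and then by modification date, newest first. This is a simplification under assumed data: the `Hit` type and its scores are hypothetical, and the sketch is not how the product computes freshness relevance.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    title: str
    score: float    # hypothetical relevance score from the search engine
    modified: date  # document date metadata


def order_results(hits):
    """Highest score first; among equal scores, the newest document first."""
    return sorted(hits, key=lambda h: (-h.score, -h.modified.toordinal()))


hits = [
    Hit("Travel policy 2008", 0.8, date(2008, 3, 1)),
    Hit("Travel policy 2010", 0.8, date(2010, 5, 1)),
    Hit("Expense guide", 0.9, date(2007, 1, 15)),
]
print([h.title for h in order_results(hits)])
# ['Expense guide', 'Travel policy 2010', 'Travel policy 2008']
```

The two travel policies have the same score, so the newer one is listed first; without correct date metadata, this ordering cannot work.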

When you import a large document set into the SharePoint Server installation, verify that item metadata such as title and author is correct and is not a title or author inherited from a template. Remove such template values during the import operation; otherwise they cause false rank and show wrong values when the search results are presented.
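A cleanup step like this could run as part of the import. The Python sketch below drops title and author values that are empty or that match a list of known template placeholders. The placeholder lists and the dictionary-based item format are assumptions for illustration; build the real lists from the templates used in your organization.

```python
# Hypothetical placeholder values left behind by document templates.
TEMPLATE_VALUES = {
    "title": {"document1", "untitled", "presentation1", "template"},
    "author": {"administrator", "admin", "unknown", "author"},
}


def clean_metadata(item: dict) -> dict:
    """Return a copy of item without empty or template-like title/author values."""
    cleaned = dict(item)
    for field, placeholders in TEMPLATE_VALUES.items():
        value = (cleaned.get(field) or "").strip()
        if not value or value.lower() in placeholders:
            cleaned.pop(field, None)  # better no value than a misleading one
    return cleaned


print(clean_metadata({"title": "Untitled", "author": "Jane Doe", "path": "/docs/a.docx"}))
# {'author': 'Jane Doe', 'path': '/docs/a.docx'}
```

Removing the value, rather than keeping a placeholder, means the item is neither boosted nor presented with a misleading title or author.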

Prevent indexing of irrelevant content

By excluding content that is of no interest for search, you prevent irrelevant content from appearing in the search result.

To prevent indexing of content on a SharePoint site, go to Site Actions > Site Settings > Search and Offline Availability, and set Indexing Site Content to No.

If a Web Part contains content with different levels of security, for example references to items with both general and administrator permissions, the content of the Web Part is not available for search. You can change this behavior in Site Actions > Site Settings > Search and Offline Availability by changing the Indexing ASPX Page Content setting.

For web content outside SharePoint, you can use tags to prevent indexing of that content. Use the tag <meta name="robots" content="noindex"/> to prevent indexing of a whole page, and the tag <div class="noindex"> to exclude individual parts of a page. By using these tags, only relevant content is indexed and used for search recall. Typical candidates are menus, headers, and footers that are repeated on several pages and that users do not need to search. Common header and footer information is often set in the master page of SharePoint Server Web Content Management (WCM) or in the page layout templates.
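To illustrate how these tags affect what gets indexed, the following Python sketch uses the standard library `html.parser` module to return the indexable text of a page: nothing when a robots meta tag contains `noindex`, and otherwise all text except what appears inside `<div class="noindex">` elements, including nested ones. It is a simplified model of a crawler honoring these tags, not the FAST Search Server 2010 for SharePoint crawler implementation; for example, it does not skip script or style content.

```python
from html.parser import HTMLParser


class IndexableTextExtractor(HTMLParser):
    """Collects page text, honoring robots noindex and <div class="noindex">."""

    def __init__(self):
        super().__init__()
        self.page_noindex = False
        self.chunks = []
        self._div_stack = []  # one bool per open <div>: True if it is a noindex div
        self._skip_depth = 0  # number of currently open noindex divs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = [d.strip().lower() for d in (attrs.get("content") or "").split(",")]
            if "noindex" in directives:
                self.page_noindex = True
        elif tag == "div":
            is_noindex = "noindex" in (attrs.get("class") or "").split()
            self._div_stack.append(is_noindex)
            self._skip_depth += int(is_noindex)

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack:
            self._skip_depth -= int(self._div_stack.pop())

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html: str):
    """Return the text to index, or None when the whole page is marked noindex."""
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return None if parser.page_noindex else " ".join(parser.chunks)


page = ('<html><body><div class="noindex">Home | News | Contact</div>'
        '<p>Quarterly sales results</p><div class="noindex">Footer</div></body></html>')
print(indexable_text(page))  # Quarterly sales results
print(indexable_text('<meta name="robots" content="noindex"/><p>Draft</p>'))  # None
```

The repeated menu and footer are left out, so a search for "contact" or "footer" would not match every page that carries them.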

You can also exclude the content of crawled properties from searching. For more information, see Exclude crawled properties from searching by using Windows PowerShell (FAST Search Server 2010 for SharePoint).

Encourage archiving and deleting old content

Archive or delete data that is no longer of interest to the organization. Keeping the SharePoint Server installation free of historical data and junk benefits both the maintenance of SharePoint Server and the search solution.

Review your security settings

FAST Search Server 2010 for SharePoint can index intranet content that is protected by access control lists (ACLs), and helps ensure that users see only the results that they have permission to see. As long as the permission settings are correct, this causes no issues.

The main challenge when you index many content repositories on your intranet is that any incorrect permission settings become visible to users, because the content is suddenly available through a single query. Before the content was indexed, these permission inconsistencies may have gone unnoticed, because users may not have accessed these repositories directly.
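Security trimming can be pictured as filtering each result against its access control list. The Python sketch below uses a hypothetical ACL model (a set of users and groups with read access) and shows how a document whose ACL mistakenly grants access to Everyone surfaces for any user once it is searchable. The item URLs, group names, and the `trim_results` function are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class IndexedItem:
    url: str
    allowed: frozenset  # users or groups with read access (hypothetical ACL model)


def trim_results(results, user: str, groups: set):
    """Keep only results whose ACL grants read access to the user or one of their groups."""
    principals = {user} | groups
    return [item for item in results if item.allowed & principals]


results = [
    IndexedItem("http://hr/salaries.xlsx", frozenset({"Everyone"})),  # misconfigured ACL
    IndexedItem("http://sales/china/plan.docx", frozenset({"Sales"})),
    IndexedItem("http://hr/reviews.docx", frozenset({"HR"})),
]
visible = trim_results(results, "jdoe", {"Everyone", "Sales"})
print([item.url for item in visible])
# ['http://hr/salaries.xlsx', 'http://sales/china/plan.docx']
```

The trimming itself works as intended; the salary sheet appears because its ACL is wrong, which is why permissions should be reviewed before the repository is indexed.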