3.1.1.7 Crawl Mapping

The portal content project contains a collection of zero or more crawl mapping objects. Each crawled item has two URLs: an access URL and a display URL. The index server uses the access URL to obtain the item from the item repository, and it stores the display URL in the metadata index as the URL of the item. The display URL is returned to users when a search query requests the URL. During the crawl, the access URL and the display URL of every item are checked against the crawl mapping objects. A match occurs if any prefix of the URL that covers complete path segments, as described in [RFC2396] section 3.3, is equal to the Source or Target property of the mapping. If multiple mappings match the URL, the mapping that matches the longest prefix is chosen. For example, http://site/pathseg1/pathseg2/file.htm matches http://site, http://site/pathseg1, and http://site/pathseg1/pathseg2, but does not match http://site/pathse or http://saite/pathseg1/path.
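The segment-based prefix rule and the longest-prefix choice can be illustrated with a short Python sketch. It is not part of the protocol: the function names `prefix_matches` and `longest_match` are hypothetical, and the sketch assumes that the scheme and host compare case-insensitively and that empty path segments are ignored, neither of which is stated above.

```python
from urllib.parse import urlsplit


def _split(url):
    """Return (scheme, host, path segments) for a URL."""
    parts = urlsplit(url)
    # Assumption: scheme and host compare case-insensitively, and empty
    # segments (for example from a trailing "/") are ignored.
    segments = [s for s in parts.path.split("/") if s]
    return parts.scheme.lower(), parts.netloc.lower(), segments


def prefix_matches(prefix, url):
    """True if prefix equals a prefix of url made of complete path segments."""
    p_scheme, p_host, p_segs = _split(prefix)
    u_scheme, u_host, u_segs = _split(url)
    if (p_scheme, p_host) != (u_scheme, u_host):
        return False
    # Compare whole segments, so "pathse" never matches "pathseg1".
    return u_segs[:len(p_segs)] == p_segs


def longest_match(url, prefixes):
    """Return the matching prefix that covers the most segments, or None."""
    candidates = [p for p in prefixes if prefix_matches(p, url)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: len(_split(p)[2]))
```

Comparing whole segments rather than raw string prefixes is what excludes http://site/pathse in the example above.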

If the access URL matches the Source property of a mapping, the matching prefix is replaced by the Target property to construct the display URL. The remainder of the URL after the matching prefix is preserved.
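As a rough illustration of the replacement step, the following Python sketch builds a display URL from an access URL. The function name `to_display_url`, the representation of mappings as (Source, Target) pairs, and the sample URLs in the usage note are all hypothetical. The sketch also assumes that each Source is a full URL prefix, that trailing slashes on Source and Target are ignored, and that an access URL with no matching mapping is returned unchanged; the text above does not specify these cases.

```python
def to_display_url(access_url, mappings):
    """Replace the longest matching Source prefix with its Target.

    mappings is an iterable of (source, target) string pairs.
    """
    best = None
    for source, target in mappings:
        src = source.rstrip("/")
        # A match must end on a path segment boundary: either the whole
        # URL equals the prefix, or the prefix is followed by "/".
        if access_url == src or access_url.startswith(src + "/"):
            if best is None or len(src) > len(best[0]):
                best = (src, target.rstrip("/"))
    if best is None:
        # Assumption: with no matching mapping the URL is left as is.
        return access_url
    src, tgt = best
    # Keep everything after the matched prefix (the suffix).
    return tgt + access_url[len(src):]
```

For instance, with the hypothetical mappings `file://server/share` to `http://portal/docs` and `file://server/share/hr` to `http://hr`, the access URL `file://server/share/hr/policy.doc` becomes `http://hr/policy.doc`, because the second mapping matches the longer prefix.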

The crawl mappings collection does not allow mappings with duplicate Source or Target properties.

  • Source: source URL prefix for access URLs.

  • Target: target URL prefix for the display URLs.
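The uniqueness constraint on the collection might be modeled as in the following Python sketch. The class names `CrawlMapping` and `CrawlMappingCollection` are hypothetical, as is the choice to raise `ValueError`. The sketch reads the constraint as "no two mappings share a Source, and no two mappings share a Target" and compares the strings exactly; both readings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlMapping:
    source: str  # source URL prefix for access URLs
    target: str  # target URL prefix for display URLs


class CrawlMappingCollection:
    """Holds zero or more mappings and rejects duplicates."""

    def __init__(self):
        self._items = []

    def add(self, source, target):
        for existing in self._items:
            # Assumption: duplicates are detected by exact string equality.
            if existing.source == source:
                raise ValueError("duplicate Source: " + source)
            if existing.target == target:
                raise ValueError("duplicate Target: " + target)
        mapping = CrawlMapping(source, target)
        self._items.append(mapping)
        return mapping

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)
```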