2.14 Click Distance File

The click distance file uses the appropriate Content Index file format, as specified in section 2.3, to store data used in ranking.

A click distance file stores the click distance value for every document identifier present in the full-text index catalog. The click distance value is calculated from the minimum number of links that must be followed on the web graph to create a path from the list of authority pages to the item represented by the document identifier.
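As an illustration only, the minimum link count described above can be obtained with a breadth-first search that starts from all authority pages at once. The sketch below is not taken from this specification: the function name click_distances, the adjacency-dictionary representation of the web graph, and the sample page names are all assumptions made for the example. It also does not cover what value is stored for items that cannot be reached from any authority page.

```python
from collections import deque


def click_distances(links, authority_pages):
    """Return {page: minimum number of links from any authority page}.

    `links` maps a page to the pages it links to (hypothetical
    representation of the web graph, chosen for this example).
    """
    # Every authority page is at distance 0 from the authority list.
    distances = {page: 0 for page in authority_pages}
    queue = deque(authority_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            # The first visit in a breadth-first search is the shortest path.
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


# Hypothetical sample graph.
web_graph = {
    "home": ["news", "products"],
    "news": ["article"],
    "products": ["article", "manual"],
    "article": ["manual"],
}
print(click_distances(web_graph, ["home"]))
# {'home': 0, 'news': 1, 'products': 1, 'article': 2, 'manual': 2}
```

Pages that are unreachable from the authority list are simply absent from the returned dictionary in this sketch.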

The encoding of the file uses the same MaxDocID value as the master index of the same full-text index catalog, and the format version is always 0x52.

The click distance file contains two content index records. The first content index record has a content index key whose index key string is the same as the BOF key and whose property identifier is 96 (pidClickDistance). This record stores two values:

  • MaxClickDistance: The maximum click distance value stored in the file.

  • AverageClickDistance: The average of the click distance values stored in the file.

These two values are stored as the occurrence values for two document identifiers. The document identifier values MUST be ignored by the reader and SHOULD be set by the writer to 1 and 2, respectively. The MaxDocIDOccBucket field MUST be ignored.<31>
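The logical content of the first record can be sketched as two (document identifier, occurrence value) pairs. The sketch below models only that logical content, not the binary encoding from section 2.3. The function names, the in-memory pair representation, the integer (floor) rounding of the average, and the assumption that the values appear in the order listed above are all hypothetical choices made for the example.

```python
PID_CLICK_DISTANCE = 96  # pidClickDistance, as given in the text above


def summary_occurrences(click_distance_by_doc_id):
    """Writer side: build the two pairs held by the first record.

    Assumes a non-empty mapping of document identifier to click distance.
    """
    values = list(click_distance_by_doc_id.values())
    max_click_distance = max(values)
    # Assumption: the average is stored as an integer, rounded down.
    average_click_distance = sum(values) // len(values)
    # The writer SHOULD use document identifiers 1 and 2.
    return [(1, max_click_distance), (2, average_click_distance)]


def read_summary(occurrences):
    """Reader side: ignore the document identifiers, rely on position only."""
    (_, max_click_distance), (_, average_click_distance) = occurrences
    return max_click_distance, average_click_distance


# Hypothetical sample data.
sample = {10: 0, 11: 1, 12: 2, 13: 4}
print(summary_occurrences(sample))          # [(1, 4), (2, 1)]
print(read_summary([(7, 4), (9, 1)]))       # (4, 1)
```

The second call shows that a reader written this way returns the same values even when the writer used document identifiers other than 1 and 2.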

The second content index record has a content index key whose index key string is the same as the EOF key and whose property identifier is 96 (pidClickDistance). This record lists all of the document identifiers used in the current full-text index catalog. For each document identifier, there is one occurrence value, which is the click distance value for that document identifier.
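A reader that has already decoded the file could turn the second record into a lookup table, as in the following sketch. The DecodedRecord type, its field names, the "BOF" and "EOF" labels, and the function name are hypothetical; decoding the binary format from section 2.3 is outside the scope of the example. Only the format version 0x52, the property identifier 96, and the two-record layout come from the text above.

```python
from dataclasses import dataclass

FORMAT_VERSION = 0x52    # format version stated for this file
PID_CLICK_DISTANCE = 96  # pidClickDistance


@dataclass
class DecodedRecord:
    """Hypothetical decoded form of one content index record."""
    key_kind: str      # "BOF" or "EOF" (labels invented for this sketch)
    property_id: int
    occurrences: list  # (document identifier, occurrence value) pairs


def per_document_click_distances(format_version, records):
    """Return {document identifier: click distance} from the second record."""
    if format_version != FORMAT_VERSION:
        raise ValueError(f"unexpected format version {format_version:#x}")
    if len(records) != 2:
        raise ValueError("a click distance file contains two records")
    second = records[1]
    if second.key_kind != "EOF" or second.property_id != PID_CLICK_DISTANCE:
        raise ValueError("second record is not the per-document record")
    # One occurrence value per document identifier.
    return dict(second.occurrences)


# Hypothetical sample data.
records = [
    DecodedRecord("BOF", PID_CLICK_DISTANCE, [(1, 4), (2, 1)]),
    DecodedRecord("EOF", PID_CLICK_DISTANCE, [(10, 0), (11, 1), (12, 2), (13, 4)]),
]
table = per_document_click_distances(0x52, records)
print(table[12])  # 2
```

Rejecting an unexpected format version or record layout up front is a design choice of this sketch, not a requirement stated in this section.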