2.3 Content Index File Format

A content index file stores an inverted index that allows a fast search for all items that contain a given term in a specific property of an item. Each distinct property of an item, such as title, author, or main text, has a separate property identifier assigned to it. For each search query term, it is possible to define a content index key that is used to find information about this term in the content index file.

A content index file stores a set of content index records. Each content index record is associated with a unique content index key and stores the document identifiers of all items that contain the term used to create the content index key in the part of the item defined by the property identifier. See the following diagrams:
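The relationship between terms, property identifiers, and document identifiers can be illustrated with a minimal in-memory sketch. This is not the on-disk format described by this section. The property identifier values, the tokenization by whitespace, and the names build_index and lookup are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical property identifiers, chosen only for this illustration.
PID_TITLE = 1
PID_AUTHOR = 2


def build_index(items):
    """Map (property_id, term) to a sorted list of document identifiers.

    `items` maps a document identifier to a {property_id: text} dictionary.
    The (property_id, term) pair stands in for a content index key.
    """
    postings = defaultdict(set)
    for doc_id, properties in items.items():
        for property_id, text in properties.items():
            # Naive tokenization (assumption): lowercase and split on whitespace.
            for term in text.lower().split():
                postings[(property_id, term)].add(doc_id)
    return {key: sorted(doc_ids) for key, doc_ids in postings.items()}


def lookup(index, property_id, term):
    """Return the document identifiers whose given property contains the term."""
    return index.get((property_id, term.lower()), [])


items = {
    1: {PID_TITLE: "Index formats", PID_AUTHOR: "Smith"},
    2: {PID_TITLE: "Search index", PID_AUTHOR: "Jones"},
}
index = build_index(items)
```

In this sketch, lookup(index, PID_TITLE, "index") finds both items, while the same term looked up under PID_AUTHOR finds none, because each record is scoped to one property.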

Figure 1: Basic structure of a content index file (version 0x52, 0x53)

Figure 2: Basic structure of a content index file (version 0x54)

A content index file has two input parameters: DocIDMax and format version.

A content index file MUST contain: records with content index keys, one record with max key, records with EOF keys for all property identifiers that are used in at least one record with content index key, and one record with EOF key and property identifier equal to 0x7FFEFFFF. Content index records MUST be ordered by content index key in default index key sorted order.

A content index file that belongs to a master index component whose format version is equal to 0x53 or 0x54 MUST contain records with BOF keys for all property identifiers that are used in at least one record with content index key, and one record with BOF key and property identifier equal to 0x7FFEFFFF.<2>
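The presence requirements for EOF, BOF, and max key records can be sketched as a simple check. The sketch assumes a hypothetical model in which each record is reduced to a (kind, property_id) pair, with kind being one of "KEY", "BOF", "EOF", or "MAX"; the actual key encoding is not modeled here. The function name missing_records is also an assumption. The check covers presence only: it does not verify record ordering or that the max key record occurs exactly once.

```python
# Property identifier used by the additional EOF and BOF records.
SENTINEL_PID = 0x7FFEFFFF


def missing_records(records, format_version):
    """Return the set of required (kind, property_id) records that are absent.

    `records` is an iterable of (kind, property_id) pairs (hypothetical model).
    property_id is None for the "MAX" record.
    `format_version` is the format version of the owning index component.
    """
    present = set(records)
    # Property identifiers used in at least one record with content index key.
    used_pids = {pid for kind, pid in present if kind == "KEY"}

    required = {("EOF", pid) for pid in used_pids}
    required.add(("EOF", SENTINEL_PID))
    required.add(("MAX", None))

    # BOF records are required only for format versions 0x53 and 0x54.
    if format_version in (0x53, 0x54):
        required |= {("BOF", pid) for pid in used_pids}
        required.add(("BOF", SENTINEL_PID))

    return required - present


records = [("KEY", 5), ("EOF", 5), ("EOF", SENTINEL_PID), ("MAX", None)]
```

With these sample records, the check reports nothing missing for version 0x52, and reports the two BOF records as missing for version 0x53.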

A content index file that belongs to an index component whose format version is less than 0x54 MUST NOT contain content index records with property identifier equal to 0x7ffeFFC8 or 0x7ffeFFC9. A content index record with property identifier equal to 0x7ffeFFC8 contains a list of items that are more likely to be relevant for a query that contains the term used to create the content index key. For each item, the record also contains a value that represents the relative rank of the item for that term. A content index record with property identifier equal to 0x7ffeFFC9 MUST be present if a content index record with property identifier equal to 0x7ffeFFC8 is present with the same key. That record MUST contain a set of items that are less likely to be relevant for a query that contains the term used to create the content index key.
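The two rules for property identifiers 0x7ffeFFC8 and 0x7ffeFFC9 can be sketched as a check over the keys in a file. The sketch assumes each key is modeled as a hypothetical (term, property_id) pair, and it interprets "the same key" as "the same term"; both are assumptions for illustration, as are the names used.

```python
PID_MORE_RELEVANT = 0x7FFEFFC8  # items more likely to be relevant, with rank values
PID_LESS_RELEVANT = 0x7FFEFFC9  # items less likely to be relevant


def rank_record_violations(keys, format_version):
    """Return a list of (term, property_id, reason) tuples for rule violations.

    `keys` is an iterable of (term, property_id) pairs (hypothetical model).
    """
    keys = list(keys)
    present = set(keys)
    problems = []
    for term, pid in keys:
        if pid in (PID_MORE_RELEVANT, PID_LESS_RELEVANT) and format_version < 0x54:
            # Neither property identifier is allowed before version 0x54.
            problems.append((term, pid, "not allowed before version 0x54"))
        elif pid == PID_MORE_RELEVANT and (term, PID_LESS_RELEVANT) not in present:
            # A 0x7FFEFFC8 record requires a companion 0x7FFEFFC9 record.
            problems.append((term, pid, "missing companion 0x7FFEFFC9 record"))
    return problems
```

A file of version 0x54 that carries both records for a term yields no violations in this sketch; a 0x7ffeFFC8 record alone, or either record in an earlier version, is reported.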