2.18 Full-Text Index Catalog

A full-text index catalog is a collection of files placed in the same directory. These files contain the data necessary for resolving full-text queries against all documents crawled by the search application.

Each search application operates with three full-text index catalogs:

  • Main catalog, as specified in section 2.18.1.

  • Anchor text catalog, as specified in section 2.18.2.

  • Active anchor text catalog, as specified in section 2.18.3.

The following files MUST be present in any full-text index catalog.

Diacritic settings: The file SETTINGS.DIA has the diacritic settings file format, as specified in section 3.1.13, and stores the diacritic setting for the full-text index catalog.

QIR file<36>: A set of files that has the query-independent rank file format, as specified in section 3.1.10. These files contain query-independent values of a property for each document. Each set of files corresponds to one property. The file names are CiQR????.000, CiQR????.001, and CiQR????.002 for the header, first, and second data files respectively. The last 4 characters of each file name, before the extension, MUST be equal to the hexadecimal value of the property identifier for the property.

Example: For a property with a property identifier equal to 172 (hexadecimal 00AC), the file names are CiQR00AC.000, CiQR00AC.001, and CiQR00AC.002.
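The naming rule for QIR files can be sketched in a few lines of Python. This is a minimal illustration, not part of the format definition: the function name qir_filenames is hypothetical, and the uppercase hexadecimal digits and the rejection of identifiers that do not fit in 4 hexadecimal digits are assumptions drawn from the example above.

```python
def qir_filenames(property_id: int) -> list[str]:
    """Return the header, first, and second data file names of a QIR file set."""
    # Assumption: the name has exactly 4 hexadecimal slots, so identifiers
    # outside 0x0000..0xFFFF cannot be represented and are rejected here.
    if not 0 <= property_id <= 0xFFFF:
        raise ValueError("property identifier does not fit in 4 hex digits")
    # Assumption: uppercase digits, matching the CiQR00AC example.
    stem = f"CiQR{property_id:04X}"
    return [f"{stem}.000", f"{stem}.001", f"{stem}.002"]


print(qir_filenames(172))  # ['CiQR00AC.000', 'CiQR00AC.001', 'CiQR00AC.002']
```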

Detected languages file<37>: A set of files that has the detected languages file format, as specified in section 3.1.9. The filenames are CiDL0000.000, CiDL0000.001, and CiDL0000.002 for the header, first, and second data files respectively.

Index table: A set of files with the index table file format, as specified in section 3.1.11. The index table enumerates all remaining files in the full-text index catalog, unless specified otherwise. Filenames are INDEX.000, INDEX.001, and INDEX.002 for the header, first and second data files respectively.
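As a rough illustration, a reader implementation might verify the fixed-name mandatory files before opening a catalog. The sketch below is hypothetical: the names FIXED_REQUIRED_FILES and missing_required_files are not defined by this specification, and the case-insensitive comparison is an assumption. QIR file sets are left out of the check because their names depend on the property identifier.

```python
# Fixed-name files listed above as mandatory in every full-text index catalog.
FIXED_REQUIRED_FILES = (
    "SETTINGS.DIA",
    "CiDL0000.000", "CiDL0000.001", "CiDL0000.002",
    "INDEX.000", "INDEX.001", "INDEX.002",
)


def missing_required_files(present_names):
    """Return the mandatory fixed-name files absent from present_names."""
    # Assumption: file names are compared without regard to case.
    present = {name.upper() for name in present_names}
    return [name for name in FIXED_REQUIRED_FILES if name.upper() not in present]


# Example: a directory listing that lacks the detected languages file set.
listing = ["SETTINGS.DIA", "INDEX.000", "INDEX.001", "INDEX.002"]
print(missing_required_files(listing))
```

In practice the listing could come from os.listdir on the catalog directory; a pure function over names keeps the check easy to test.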

The following components MUST be included in the full-text index catalog if they are referenced by the catalog index table file. The file names of the shadow merge log file, master merge log file, AVDL file, and backup AVDL file are composed of the log prefix (given in the following list) and the two high-order bytes of the ComponentID, written in hexadecimal representation (4 digits). The extensions for these files are ".000" for the header, ".001" for the first data file, and ".002" for the second data file.

Master index component: A full-text index component referenced by an itMaster CIndexRecord, as specified in section 2.13.3. There MUST be no more than one master full-text index component in a full-text index catalog.

Shadow index component: Full-text index components referenced by itShadow CIndexRecords, as specified in section 2.13.3. There MUST be exactly one full-text index component for each itShadow CIndexRecord in the Index table.

Interrupted shadow merges: A set of files referenced by an itShadowMergeLog CIndexRecord, as specified in section 2.13.3, that includes an incomplete full-text index component and a shadow merge log file, whose log prefix is "CiMG".

Interrupted master merge: A set of files referenced by itMasterMergeLog and itNewMaster CIndexRecords, as specified in section 2.13.3. It includes an incomplete full-text index component and a master merge log file, whose log prefix is "CiMG".

AVDL file: An AVDL file referenced by an itAvdlLog CIndexRecord. The AVDL file log prefix is "CiAD".

AVDL backup files: AVDL files referenced by itAvdlLogBackup1 and itAvdlLogBackup2 CIndexRecords. The AVDL backup file log prefix is "CiAB".
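The composition of log file names from a log prefix and a ComponentID can be sketched as follows. This is an illustration under stated assumptions: ComponentID is treated as a 32-bit unsigned value, so that its two high-order bytes are its upper 16 bits, and the hexadecimal digits are written in uppercase. The function name log_filenames is hypothetical.

```python
def log_filenames(log_prefix: str, component_id: int) -> list[str]:
    """Return the header, first, and second data file names for a log file set."""
    # Assumption: ComponentID is a 32-bit unsigned integer.
    if not 0 <= component_id <= 0xFFFFFFFF:
        raise ValueError("ComponentID is assumed to be a 32-bit unsigned value")
    # The two high-order bytes of a 32-bit value are its upper 16 bits.
    high_bytes = component_id >> 16
    # Assumption: uppercase hexadecimal, padded to 4 digits.
    stem = f"{log_prefix}{high_bytes:04X}"
    return [f"{stem}.000", f"{stem}.001", f"{stem}.002"]


# Example: an AVDL file (prefix "CiAD") for a made-up ComponentID.
print(log_filenames("CiAD", 0x00120003))
```

Under these assumptions, the low-order bytes of the ComponentID do not affect the file name.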

The following components are included in the full-text index catalog but are not referenced by the index table file.

Lexicon file: The file NLGINDEXLEXICON.LEX has the index lexicon file format, as specified in section 2.15. This file MUST be present if the catalog contains a master index component or a master merge log file with a split key greater than the minimal content index key.

Click distance: The file 00CD00CD.ci has the click distance file format, as specified in section 2.14. This file MUST be present only in the anchor text catalog, as specified in section 2.18.2, and in the active anchor text catalog, as specified in section 2.18.3, when the active anchor text catalog is not empty.

If a content index file that belongs to a master index component whose format version is equal to 0x54 contains content index records with property identifiers equal to 0x7ffeFFC8 and 0x7ffeFFC9, then a QIR file with a property identifier equal to 0xAC MUST be present in the full-text index catalog. For each document identifier in the master index component, this QIR file MUST store an uncompressed float value. This value defines the importance of the item for any query for which it might be retrieved.