全文搜索Full-Text Search

适用于:是SQL Server是Azure SQL 数据库否Azure SQL 数据仓库否并行数据仓库APPLIES TO: yesSQL Server yesAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse

SQL ServerSQL ServerAzure SQL DatabaseAzure SQL Database 中的全文搜索为用户和应用程序提供了对 SQL ServerSQL Server 表中基于字符的数据运行全文查询的功能。Full-Text Search in SQL ServerSQL Server and Azure SQL DatabaseAzure SQL Database lets users and applications run full-text queries against character-based data in SQL ServerSQL Server tables.

基本任务Basic tasks

本主题概述了全文搜索并介绍了其组件和体系结构。This topic provides an overview of Full-Text Search and describes its components and its architecture. 如果你希望马上开始,可以使用下面的基本任务。If you prefer to get started right away, here are the basic tasks.

备注

全文搜索是 SQL ServerSQL Server 数据库引擎的一个可选组件。Full-Text Search is an optional component of the SQL ServerSQL Server Database Engine. 如果你在安装 SQL Server 时没有选择全文搜索,请再次运行 SQL Server 安装程序来添加它。If you didn't select Full-Text Search when you installed SQL Server, run SQL Server Setup again to add it.

概述Overview

全文索引包括表中一个或多个基于字符的列。A full-text index includes one or more character-based columns in a table. 这些列可以具有以下任何数据类型:charvarcharncharnvarchartextntextimagexmlvarbinary(max)FILESTREAMThese columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max) and FILESTREAM. 每个全文索引都对表中的一个或多个列创建索引,并且每个列都可以使用特定语言。Each full-text index indexes one or more columns from the table, and each column can use a specific language.

全文查询根据特定语言(例如,英语或日语)的规则对词和短语进行操作,从而依据全文索引中的文本数据执行语言搜索。Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. 全文查询可以包括简单的词和短语,或者词或短语的多种形式。Full-text queries can include simple words and phrases or multiple forms of a word or phrase. 全文查询返回包含至少一个匹配项(也称为“命中”)的所有文档。A full-text query returns any documents that contain at least one match (also known as a hit). 当目标文档包含在全文查询中指定的所有字词,并符合任何其他搜索条件(如匹配的字词之间的距离)时,即发生匹配。A match occurs when a target document contains all the terms specified in the full-text query, and meets any other search conditions, such as the distance between the matching terms.

全文搜索查询Full-Text Search queries

在将列添加到全文索引之后,用户和应用程序即可对列中的文本运行全文查询。After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns. 这些查询可以搜索以下项中的任一项:These queries can search for any of the following:

  • 一个或多个特定的词或短语(“简单词”)One or more specific words or phrases (simple term)

  • 以指定文本开头的词或短语(“前缀词”)A word or a phrase where the words begin with specified text (prefix term)

  • 特定词的变形(“派生词”)Inflectional forms of a specific word (generation term)

  • 与另一个词或短语邻近的词或短语(“邻近词”)A word or phrase close to another word or phrase (proximity term)

  • 特定词的同义词形式(“同义词库”)Synonymous forms of a specific word (thesaurus)

  • 使用加权值的词或短语(“加权词”)Words or phrases using weighted values (weighted term)

全文查询不区分大小写。Full-text queries are not case-sensitive. 例如对于 "Aluminum" 或 "aluminum",搜索将返回相同的结果。For example, searching for "Aluminum" or "aluminum" returns the same results.

全文查询使用一小组 Transact-SQLTransact-SQL 谓词(CONTAINS 和 FREETEXT)和函数(CONTAINSTABLE 和 FREETEXTTABLE)。Full-text queries use a small set of Transact-SQLTransact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). 然而,给定商业应用场景的搜索目标会对全文查询的结构产生影响。However, the search goals of a given business scenario influence the structure of the full-text queries. 例如:For example:

  • 电子商务(在网站上搜索产品):e-business-searching for a product on a website:

    SELECT product_id   
    FROM products   
    WHERE CONTAINS(product_description, "Snap Happy 100EZ" OR FORMSOF(THESAURUS,'Snap Happy') OR '100EZ')   
    AND product_cost < 200 ;  
    
  • 招聘员工(搜索具有 SQL ServerSQL Server 使用经验的职位候选人):Recruitment scenario-searching for job candidates that have experience working with SQL ServerSQL Server:

    SELECT candidate_name,SSN   
    FROM candidates   
    WHERE CONTAINS(candidate_resume,"SQL Server") AND candidate_division =DBA;  
    

有关详细信息,请参阅 使用全文搜索查询For more information, see Query with Full-Text Search.

全文搜索查询与 LIKE 谓词的对比Compare Full-Text Search queries to the LIKE predicate

与全文搜索不同, LIKE Transact-SQLTransact-SQL 谓词仅对字符模式有效。In contrast to full-text search, the LIKE Transact-SQLTransact-SQL predicate works on character patterns only. 另外,不能使用 LIKE 谓词来查询格式化的二进制数据。Also, you cannot use the LIKE predicate to query formatted binary data. 此外,对大量非结构化的文本数据执行 LIKE 查询要比对相同数据执行同样的全文查询慢得多。Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. 对数百万行文本数据进行的 LIKE 查询可能需要几分钟的时间才能返回结果;而对于同样的数据,全文查询只需要几秒甚至更少的时间,具体取决于返回的行数。A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.

全文搜索体系结构Full-Text Search architecture

全文搜索体系结构包括以下进程:Full-text search architecture consists of the following processes:

  • SQL ServerSQL Server 进程 (sqlservr.exe)。The SQL ServerSQL Server process (sqlservr.exe).

  • 筛选器后台程序宿主进程 (fdhost.exe)。The filter daemon host process (fdhost.exe).

    为了安全起见,筛选器由称为筛选器后台程序宿主的单独进程加载。For security reasons, filters are loaded by separate processes called the filter daemon hosts. fdhost.exe 进程是由 FDHOST 启动器服务 (MSSQLFDLauncher) 创建的,这些进程使用 FDHOST 启动器服务帐户的安全凭据运行。The fdhost.exe processes are created by an FDHOST launcher service (MSSQLFDLauncher), and they run under the security credentials of the FDHOST launcher service account. 因此,必须运行 FDHOST 启动器服务才能正常进行全文索引和全文查询。Therefore, the FDHOST launcher service must be running for full-text indexing and full-text querying to work. 有关设置此服务的服务帐户的信息,请参阅 设置用于全文筛选器后台程序启动器的服务帐户For information about setting the service account for this service, see Set the Service Account for the Full-text Filter Daemon Launcher.

这两个进程包含全文搜索体系结构的各组件。These two processes contain the components of the full-text search architecture. 下图概括了这些组件及其关系。These components and their relationships are summarized in the following illustration. 该图后面的内容介绍了这些组件。The components are described after the illustration.

全文搜索体系结构full-text search architecture

SQL Server 进程SQL Server process

SQL ServerSQL Server 进程使用全文搜索的以下组件:The SQL ServerSQL Server process uses the following components for full-text search:

  • 用户表。User tables. 这些表包含要进行全文索引的数据。These tables contain the data to be full-text indexed.

  • 全文收集器。Full-text gatherer. 全文收集器使用全文爬网线程。The full-text gatherer works with the full-text crawl threads. 它负责计划和驱动对全文索引的填充,并负责监视全文目录。It is responsible for scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs.

  • 同义词库文件。Thesaurus files. 这些文件包含搜索项的同义词。These files contain synonyms of search terms. 有关详细信息,请参阅 为全文搜索配置和管理同义词库文件For more information, see Configure and Manage Thesaurus Files for Full-Text Search.

  • 非索引字表对象。Stoplist objects. 非索引字表对象包含对搜索无用的常见词列表。Stoplist objects contain a list of common words that are not useful for the search. 有关详细信息,请参阅 为全文搜索配置和管理非索引字和非索引字表For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

  • SQL ServerSQL Server 查询处理器。SQL ServerSQL Server query processor. 查询处理器编译并执行 SQL 查询。The query processor compiles and executes SQL queries. 如果 SQL 查询包含全文搜索查询,则在编译和执行期间该查询都会发送到全文引擎。If a SQL query includes a full-text search query, the query is sent to the Full-Text Engine, both during compilation and during execution. 查询结果将与全文索引相匹配。The query result is matched against the full-text index.

  • 全文引擎。Full-Text Engine. SQL ServerSQL Server 中的全文引擎现已与查询处理器完全集成。The Full-Text Engine in SQL ServerSQL Server is fully integrated with the query processor. 全文引擎编译和执行全文查询。The Full-Text Engine compiles and executes full-text queries. 作为查询执行的一部分,全文引擎可能会接收来自同义词库和非索引字表的输入。As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist.

    备注

    SQL Server 2008SQL Server 2008 和更高版本中,全文引擎位于 SQL ServerSQL Server 进程中,而不是位于单独的服务中。In SQL Server 2008SQL Server 2008 and later versions, the Full-Text Engine resides in the SQL ServerSQL Server process, rather than in a separate service. 通过将全文引擎集成到数据库引擎中,提高了全文可管理性和总体性能,并进一步优化了混合查询。Integrating the Full-Text Engine into the Database Engine improved full-text manageability, optimization of mixed query, and overall performance.

  • 索引编写器(索引器)。Index writer (indexer). 索引编写器生成用于存储索引标记的结构。The index writer builds the structure that is used to store the indexed tokens.

  • 筛选器后台程序管理器。Filter daemon manager. 筛选器后台程序管理器负责监视全文引擎筛选器后台程序宿主的状态。The filter daemon manager is responsible for monitoring the status of the Full-Text Engine filter daemon host.

Filter Daemon Host processFilter Daemon Host process

筛选器后台程序宿主是一个由全文引擎启动的进程。The filter daemon host is a process that is started by the Full-Text Engine. 它运行下列全文搜索组件,这些组件负责对表中的数据进行访问、筛选和断字,同时还负责对查询输入进行断字和提取词干。It runs the following full-text search components, which are responsible for accessing, filtering, and word breaking data from tables, as well as for word breaking and stemming the query input.

筛选器后台程序宿主的组件如下:The components of the filter daemon host are as follows:

  • 协议处理程序。Protocol handler. 此组件从内存中取出数据,以进行进一步的处理,并访问指定数据库的用户表中的数据。This component pulls the data from memory for further processing and accesses data from a user table in a specified database. 其职责之一是从全文索引列中收集数据,并将所收集的数据传递给筛选器后台程序宿主,从而由该宿主根据需要应用筛选和断字符。One of its responsibilities is to gather data from the columns being full-text indexed and pass it to the filter daemon host, which will apply filtering and word breaker as required.

  • 筛选器。Filters. 某些数据类型需要筛选,然后才能为文档中的数据(包括 varbinaryvarbinary(max)imagexml 列中的数据)创建全文索引。Some data types require filtering before the data in a document can be full-text indexed, including data in varbinary, varbinary(max), image, or xml columns. 给定文档采用何种筛选器取决于文档类型。The filter used for a given document depends on its document type. 例如,Microsoft Word (.doc) 文档、Microsoft Excel (.xls) 文档和 XML (.xml) 文档分别使用不同的筛选器。For example, different filters are used for Microsoft Word (.doc) documents, Microsoft Excel (.xls) documents, and XML (.xml) documents. 然后,筛选器从文档中提取文本块区,删除嵌入的格式并保留文本,如有可能的话也会保留有关文本位置的信息。Then the filter extracts chunks of text from the document, removing embedded formatting and retaining the text and, potentially, information about the position of the text. 结果将以文本化信息流的形式出现。The result is a stream of textual information. 有关详细信息,请参阅 配置和管理搜索筛选器For more information, see Configure and Manage Filters for Search.

  • 断字符和词干分析器。Word breakers and stemmers. 断字符是特定于语言的组件,它根据给定语言的词汇规则查找词边界(“断字”)。A word breaker is a language-specific component that finds word boundaries based on the lexical rules of a given language (word breaking). 每个断字符都与用于组合动词及执行变形扩展的特定于语言的词干分析器组件相关联。Each word breaker is associated with a language-specific stemmer component that conjugates verbs and performs inflectional expansions. 在创建索引时,筛选器后台程序宿主使用断字符和词干分析器来对给定表列中的文本数据执行语言分析。At indexing time, the filter daemon host uses a word breaker and stemmer to perform linguistic analysis on the textual data from a given table column. 与全文索引中的表列相关的语言将决定为列创建索引时要使用的断字符和词干分析器。The language that is associated with a table column in the full-text index determines which word breaker and stemmer are used for indexing the column. 有关详细信息,请参阅 配置和管理断字符和词干分析器以便搜索For more information, see Configure and Manage Word Breakers and Stemmers for Search.

全文搜索处理Full-Text Search processing

全文搜索由全文引擎提供支持。Full-text search is powered by the Full-Text Engine. 全文引擎有两个角色:索引支持和查询支持。The Full-Text Engine has two roles: indexing support and querying support.

全文索引过程Full-Text indexing process

全文填充(也称为爬网)开始后,全文引擎会将大批数据存入内存并通知筛选器后台程序宿主。When a full-text population (also known as a crawl) is initiated, the Full-Text Engine pushes large batches of data into memory and notifies the filter daemon host. 宿主对数据进行筛选和断字,并将转换的数据转换为倒排词列表。The host filters and word breaks the data and converts the converted data into inverted word lists. 然后,全文搜索从词列表中提取转换的数据,对其进行处理以删除非索引字,然后将某一批次的词列表永久保存到一个或多个倒排索引中。The full-text search then pulls the converted data from the word lists, processes the data to remove stopwords, and persists the word lists for a batch into one or more inverted indexes.

对存储在 varbinary(max)image 列中的数据编制索引时,筛选器(实现 IFilter 接口)将基于为该数据指定的文件格式(例如, MicrosoftMicrosoft Word)来提取文本。When indexing data stored in a varbinary(max) or image column, the filter, which implements the IFilter interface, extracts text based on the specified file format for that data (for example, MicrosoftMicrosoft Word). 在某些情况下,筛选器组件要求将 varbinary(max)image 数据写入 filterdata 文件夹中,而不是将其存入内存。In some cases, the filter components require the varbinary(max), or image data to be written out to the filterdata folder, instead of being pushed into memory.

在处理过程中,通过断字符将收集到的文本数据分隔成各个单独的标记或关键字。As part of processing, the gathered text data is passed through a word breaker to separate the text into individual tokens, or keywords. 用于词汇切分的语言将在列级指定,或者也可以通过筛选器组件在 varbinary(max)imagexml 数据内标识。The language used for tokenization is specified at the column level, or can be identified within varbinary(max), image, or xml data by the filter component.

还可能会进行其他处理以删除非索引字,并在将标记存储到全文索引或索引片断之前对其进行规范化。Additional processing may be performed to remove stopwords, and to normalize tokens before they are stored in the full-text index or an index fragment.

填充完成后,将触发最终的合并过程,以便将索引碎片合并为一个主全文索引。When a population has completed, a final merge process is triggered that merges the index fragments together into one master full-text index. 由于只需要查询主索引而不需要查询大量索引碎片,因此这将提高查询性能,并且可以使用更好的计分统计信息来得出相关性排名。This results in improved query performance since only the master index needs to be queried rather than a number of index fragments, and better scoring statistics may be used for relevance ranking.

全文查询过程Full-Text querying process

查询处理器将查询的全文部分传递到全文引擎以进行处理。The query processor passes the full-text portions of a query to the Full-Text Engine for processing. 全文引擎执行断字,此外,它还可以执行同义词库扩展、词干分析以及非索引字(干扰词)处理。The Full-Text Engine performs word breaking and, optionally, thesaurus expansions, stemming, and stopword (noise-word) processing. 然后,查询的全文部分以 SQL 运算符的形式表示,主要作为流式表值函数 (STVF)。Then the full-text portions of the query are represented in the form of SQL operators, primarily as streaming table-valued functions (STVFs). 在查询执行过程中,这些 STVF 访问倒排索引以检索正确结果。During query execution, these STVFs access the inverted index to retrieve the correct results. 此时会将结果返回给客户端,或者先将它们进一步处理,再将它们返回给客户端。The results are either returned to the client at this point, or they are further processed before being returned to the client.

全文索引体系结构Full-text index architecture

全文引擎使用全文索引中的信息来编译可快速搜索表中的特定词或词组的全文查询。The information in full-text indexes is used by the Full-Text Engine to compile full-text queries that can quickly search a table for particular words or combinations of words. 全文索引将有关重要的词及其位置的信息存储在数据库表的一列或多列中。A full-text index stores information about significant words and their location within one or more columns of a database table. 全文索引是一种特殊类型的基于标记的功能性索引,它是由 SQL ServerSQL Server全文引擎生成和维护的。A full-text index is a special type of token-based functional index that is built and maintained by the Full-Text Engine for SQL ServerSQL Server. 生成全文索引的过程不同于生成其他类型的索引。The process of building a full-text index differs from building other types of indexes. 全文引擎并非基于特定行中存储的值来构造 B 树结构,而是基于要编制索引的文本中的各个标记来生成倒排、堆积且压缩的索引结构。Instead of constructing a B-tree structure based on a value stored in a particular row, the Full-Text Engine builds an inverted, stacked, compressed index structure based on individual tokens from the text being indexed. 全文索引大小仅受运行 SQL ServerSQL Server 实例的计算机的可用内存资源限制。The size of a full-text index is limited only by the available memory resources of the computer on which the instance of SQL ServerSQL Server is running.

SQL Server 2008SQL Server 2008开始,全文索引与数据库引擎集成在一起,而不是像 SQL ServerSQL Server早期版本那样位于文件系统中。Beginning in SQL Server 2008SQL Server 2008, the full-text indexes are integrated with the Database Engine, instead of residing in the file system as in previous versions of SQL ServerSQL Server. 对于新数据库,全文目录现在为不属于任何文件组的虚拟对象;它仅是一个表示一组全文索引的逻辑概念。For a new database, the full-text catalog is now a virtual object that does not belong to any filegroup; it is merely a logical concept that refers to a group of the full-text indexes. 然而,请注意,在升级 SQL Server 2005 (9.x)SQL Server 2005 (9.x) 数据库(即包含数据文件的任意全文目录)的过程中,将创建一个新文件组。有关详细信息,请参阅 升级全文搜索Note, however, that during upgrade of a SQL Server 2005 (9.x)SQL Server 2005 (9.x) database, any full-text catalog that contains data files, a new filegroup is created; for more information, see Upgrade Full-Text Search.

每个表只允许有一个全文索引。Only one full-text index is allowed per table. 若要对某个表创建全文索引,该表必须具有一个唯一且非 Null 的列。For a full-text index to be created on a table, the table must have a single, unique nonnull column. 你可以在 charvarcharncharnvarchartextntextimagexmlvarbinary类型的列上生成全文索引,并且可对 varbinary(max) 索引以进行全文搜索。You can build a full-text index on columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. 在数据类型为 varbinaryvarbinary(max)imagexml 的列上创建全文索引需要你指定类型列。Creating a full-text index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type column. 类型列是用来存储每行中文档的文件扩展名(.doc、.pdf、xls 等)的表列。A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the document in each row.

全文索引结构Full-text index structure

对全文索引的结构的良好了解将帮助您了解全文引擎的工作方式。A good understanding of the structure of a full-text index will help you understand how the Full-Text Engine works. 本主题使用 中的 Document Adventure WorksAdventure Works 表的以下摘录部分作为示例表。This topic uses the following excerpt of the Document table in Adventure WorksAdventure Works as an example table. 此摘录部分仅显示该表的两个列( DocumentID 列和 Title 列)和三行。This excerpt shows only two columns, the DocumentID column and the Title column, and three rows from the table.

对于本示例,我们假定已对“标题”列创建全文索引。For this example, we will assume that a full-text index has been created on the Title column.

DocumentIDDocumentID TitleTitle
11 Crank Arm and Tire MaintenanceCrank Arm and Tire Maintenance
22 Front Reflector Bracket and Reflector Assembly 3Front Reflector Bracket and Reflector Assembly 3
33 Front Reflector Bracket InstallationFront Reflector Bracket Installation

例如,下表(显示 Fragment 1)描述对“Document”表的“Title”列创建的全文索引的内容。For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the Title column of the Document table. 与该表呈现的信息相比,全文索引包含更多信息。Full-text indexes contain more information than is presented in this table. 该表是全文索引的逻辑表示形式,并且仅为演示目的提供。The table is a logical representation of a full-text index and is provided for demonstration purposes only. 行以压缩格式存储,以优化磁盘的使用。The rows are stored in a compressed format to optimize disk usage.

注意,数据已从原始文档倒排。Notice that the data has been inverted from the original documents. 发生数据倒排是因为关键字映射到文档 ID。Inversion occurs because the keywords are mapped to the document IDs. 因此,全文索引通常称为倒排索引。For this reason, a full-text index is often referred to as an inverted index.

还要注意,已从全文索引中删除关键字“and”。Also notice that the keyword "and" has been removed from the full-text index. 这样做是因为“and”是非索引字,从全文索引中删除非索引字可以大幅节省磁盘空间,并因此提高查询性能。This is done because "and" is a stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space thereby improving query performance. 有关非索引字的详细信息,请参阅 为全文搜索配置和管理非索引字和非索引字表For more information about stopwords, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

碎片 1Fragment 1

关键字Keyword ColIdColId DocIdDocId 出现次数Occurrence
CrankCrank 11 11 11
ArmArm 11 11 22
TireTire 11 11 44
维护Maintenance 11 11 55
FrontFront 11 22 11
FrontFront 11 33 11
ReflectorReflector 11 22 22
ReflectorReflector 11 22 55
ReflectorReflector 11 33 22
BracketBracket 11 22 33
BracketBracket 11 33 33
AssemblyAssembly 11 22 66
33 11 22 77
安装Installation 11 33 44

Keyword 列包含在创建索引时提取的单个标记的表示形式。The Keyword column contains a representation of a single token extracted at indexing time. 断字符可确定组成标记的词。Word breakers determine what makes up a token.

“ColId” 列包含的值对应于已建立全文索引的特定列。The ColId column contains a value that corresponds to a particular column that is full-text indexed.

“DocId” 列包含八字节整数的值,该值映射到全文索引表中特定的全文键值。The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a full-text indexed table. 如果全文键不是整数数据类型,则需要此映射。This mapping is necessary when the full-text key is not an integer data type. 在这样的情况下,全文键值与 DocId 值之间的映射将保留在名为 DocId Mapping 的单独表中。In such cases, mappings between full-text key values and DocId values are maintained in a separate table called the DocId Mapping table. 若要查询这些映射,请使用 sp_fulltext_keymappings 系统存储过程。To query for these mappings use the sp_fulltext_keymappings system stored procedure. 若要满足搜索条件,则上述表中的 DocId 值需要与 DocId Mapping 表联接,以便从所查询的基表中检索行。To satisfy a search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve rows from the base table being queried. 如果基表的全文键值是整数类型,则该值直接充当 DocId,而不需要映射。If the full-text key value of the base table is an integer type, the value directly serves as the DocId and no mapping is necessary. 因此,使用整数全文键值有助于优化全文查询。Therefore, using integer full-text key values can help optimize full-text queries.

Occurrence 列包含整数值。The Occurrence column contains an integer value. 对于每个 DocId 值,均有一个 Occurrence 值的列表对应于该 DocId 值中特定关键字的相对字偏移量。For each DocId value, there is a list of occurrence values that correspond to the relative word offsets of the particular keyword within that DocId. 位置值用于确定短语或邻近匹配项,例如具有相邻位置值的短语。Occurrence values are useful in determining phrase or proximity matches, for example, phrases have numerically adjacent occurrence values. 它们还用于计算相关性分数,例如记分时可能会用某个关键字在 DocId 中的出现次数。They are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may be used in scoring.

全文索引碎片Full-text index fragments

逻辑全文索引通常拆分到多个内部表中。The logical full-text index is usually split across multiple internal tables. 每个内部表称为一个全文索引碎片。Each internal table is called a full-text index fragment. 这些碎片中的某一些可能包含比其他碎片更新的数据。Some of these fragments might contain newer data than others. 例如,如果用户更新其 DocId 是 3 的以下行,并且该表可自动跟踪更改,则会创建新的碎片。For example, if a user updates the following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.

DocumentIDDocumentID TitleTitle
33 Rear ReflectorRear Reflector

在下面的示例(显示 Fragment 2)中,碎片中包含比 Fragment 1 中更新的关于 DocId 3 的数据。In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to Fragment 1. 因此,当用户查询“Rear Reflector”时,会将 Fragment 2 中的数据用于 DocId 3。Therefore, when the user queries for "Rear Reflector" the data from Fragment 2 is used for DocId 3. 每个碎片都用一个创建时间戳来标记,可以使用 sys.fulltext_index_fragments 目录视图查询该时间戳。Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments catalog view.

碎片 2Fragment 2

关键字Keyword ColIdColId DocIdDocId OccOcc
RearRear 11 33 11
ReflectorReflector 11 33 22

从 Fragment 2 可以看到,全文查询需要在内部查询每个碎片,并放弃更旧的条目。As can be seen from Fragment 2, full-text queries need to query each fragment internally and discard older entries. 因此,全文索引中太多的全文索引碎片会导致查询性能大幅下降。Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query performance. 若要减少碎片数,请使用 ALTER FULLTEXT CATALOGTransact-SQLTransact-SQL 语句的 REORGANIZE 选项来重新组织全文目录。To reduce the number of fragments, reorganize the fulltext catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOGTransact-SQLTransact-SQL statement. 此语句将执行一次 主合并,主合并将碎片合并成一个更大的碎片,并从全文索引中删除所有过时的条目。This statement performs a master merge, which merges the fragments into a single larger fragment and removes all obsolete entries from the full-text index.

经过重新组织后,示例索引会包含以下行:After being reorganized, the example index would contain the following rows:

关键字Keyword ColIdColId DocIdDocId OccOcc
CrankCrank 11 11 11
ArmArm 11 11 22
TireTire 11 11 44
维护Maintenance 11 11 55
FrontFront 11 22 11
RearRear 11 33 11
ReflectorReflector 11 22 22
ReflectorReflector 11 22 55
ReflectorReflector 11 33 22
BracketBracket 11 22 33
AssemblyAssembly 11 22 66
33 11 22 77

全文索引和普通 SQL Server 索引之间的区别:Differences between full-text indexes and regular SQL Server indexes:.

全文索引Full-text indexes 普通 SQL Server 索引Regular SQL Server indexes
每个表只允许有一个全文索引。Only one full-text index allowed per table. 每个表允许有多个普通索引。Several regular indexes allowed per table.
将数据添加到全文索引的操作称为“填充”,可以通过计划或特定请求来请求填充,也可以在添加新数据时自动填充。The addition of data to full-text indexes, called a population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data. 当插入、更新或删除作为其基础的数据时自动更新。Updated automatically when the data upon which they are based is inserted, updated, or deleted.
在同一个数据库内分组为一个或多个全文目录。Grouped within the same database into one or more full-text catalogs. 不分组。Not grouped.

全文搜索中的语言组件和语言支持Full-Text search linguistic components and language support

全文搜索支持大约 50 种不同语言,例如英语、西班牙语、中文、日语、阿拉伯语、孟加拉语和印地语。Full-text search supports almost 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi. 有关支持的全文语言的完整列表,请参阅 sys.fulltext_languages (Transact-SQL)For a complete list of the supported full-text languages, see sys.fulltext_languages (Transact-SQL). 全文索引中包含的每一列与一个 Microsoft Windows 区域设置标识符 (LCID) 相关联,每个区域设置标识符等同于全文搜索支持的一种语言。Each of the columns contained in the full-text index is associated with a Microsoft Windows locale identifier (LCID) that equates to a language that is supported by full-text search. 例如,LCID 1033 等于美国英语,LCID 2057 等于英国英语。For example, LCID 1033 equates to U.S English, and LCID 2057 equates to British English. 对于每种支持的全文语言, SQL ServerSQL Server 提供语言组件以支持对以该语言存储的全文数据进行索引和查询。For each supported full-text language, SQL ServerSQL Server provides linguistic components that support indexing and querying full-text data that is stored in that language.

语言特有组件包括:Language-specific components include the following:

  • 断字符和词干分析器。Word breakers and stemmers. 断字符根据给定语言的词汇规则查找词边界(“断字”)。A word breaker finds word boundaries based on the lexical rules of a given language (word breaking). 每个断字符与一个词干分析器相关联,该词干分析器组合了同一种语言的动词。Each word breaker is associated with a stemmer that conjugates verbs for the same language. 有关详细信息,请参阅 配置和管理断字符和词干分析器以便搜索For more information, see Configure and Manage Word Breakers and Stemmers for Search.

  • 非索引字表。Stoplists. 提供系统非索引字表,其中包含一组基本非索引字(也称为干扰词)。A system stoplist is provided that contains a basic set stopwords (also known as noise words). “非索引字”是对搜索没有任何帮助并且被全文查询忽略的词。A stopword is a word that does not help the search and is ignored by full-text queries. 例如,在英语区域设置中,诸如“a”、“and”、“is”和“the”之类的词都被视为非索引字。For example, for the English locale words such as "a", "and", "is", and "the" are considered stopwords. 通常情况下,需要配置一个或多个同义词库文件和非索引字表。Typically, you will need to configure one or more thesaurus files and stoplists. 有关详细信息,请参阅 为全文搜索配置和管理非索引字和非索引字表For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

  • 同义词库文件。Thesaurus files. SQL ServerSQL Server 还会安装一个全局同义词库文件,并且还为每种全文语言安装一个同义词库文件。also installs a thesaurus file for each full-text language, as well as a global thesaurus file. 安装的同义词库文件实际上是空的,不过可以编辑它们以便为特定语言或商业应用场景定义同义词。The installed thesaurus files are essentially empty, but you can edit them to define synonyms for a specific language or business scenario. 通过开发针对全文数据定制的同义词库,您可以有效地扩大对这些数据的全文查询的范围。By developing a thesaurus tailored to your full-text data, you can effectively broaden the scope of full-text queries on that data. 有关详细信息,请参阅 为全文搜索配置和管理同义词库文件For more information, see Configure and Manage Thesaurus Files for Full-Text Search.

  • 筛选器 (iFilter)。Filters (iFilters). varbinary(max)imagexml 数据类型列中的文档进行索引时,需要使用筛选器来执行额外的处理工作。Indexing a document in a varbinary(max), image, or xml data type column requires a filter to perform extra processing. 此筛选器必须特定于文档类型(.doc、.pdf、.xls、.xml 等)。The filter must be specific to the document type (.doc, .pdf, .xls, .xml, and so forth). 有关详细信息,请参阅 配置和管理搜索筛选器For more information, see Configure and Manage Filters for Search.

断字符(和词干分析器)以及筛选器在筛选器后台程序宿主进程 (fdhost.exe) 中运行。Word breakers (and stemmers) and filters run in the filter daemon host process (fdhost.exe).

适用对象:yesSQL Server(从 2008 版开始)yesAzure SQL 数据库noAzure SQL 数据仓库no并行数据仓库APPLIES TO: yesSQL Server (starting with 2008) yesAzure SQL Database noAzure SQL Data Warehouse noParallel Data Warehouse