全文檢索搜尋Full-Text Search

SQL ServerSQL ServerAzure SQL DatabaseAzure SQL Database 中的全文檢索搜尋可讓使用者和應用程式針對 SQL ServerSQL Server 資料表中以字元為主的資料,執行全文檢索查詢。Full-Text Search in SQL ServerSQL Server and Azure SQL DatabaseAzure SQL Database lets users and applications run full-text queries against character-based data in SQL ServerSQL Server tables.

基本工作Basic tasks

本主題提供全文檢索搜尋的概觀,並描述其元件和其架構。This topic provides an overview of Full-Text Search and describes its components and its architecture. 如果您想要立即開始,則以下是基本工作。If you prefer to get started right away, here are the basic tasks.

注意

Full-Text Search is an optional component of the SQL ServerSQL Server Database Engine 的選擇性元件。Full-Text Search is an optional component of the SQL ServerSQL Server Database Engine. 如果您在安裝 SQL Server 時未選取全文檢索搜尋,請重新執行 SQL Server 安裝程式予以新增。If you didn't select Full-Text Search when you installed SQL Server, run SQL Server Setup again to add it.

概觀Overview

全文檢索索引包括資料表中一或多個以字元為基礎的資料行。A full-text index includes one or more character-based columns in a table. 這些資料行可以具有下列任何資料類型:charvarcharncharnvarchartextntextimagexmlvarbinary(max)FILESTREAMThese columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max) and FILESTREAM. 每個全文檢索索引都會為資料表中的一個或多個資料行建立索引,而且每個資料行都可以使用特定的語言。Each full-text index indexes one or more columns from the table, and each column can use a specific language.

全文檢索查詢會根據特定語言的規則 (例如英文或日文) 在單字與片語上運作,藉以針對全文檢索索引中的文字資料執行語言搜尋。Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. 全文檢索查詢可以包含簡單的單字和片語,或者單字或片語的多種形式。Full-text queries can include simple words and phrases or multiple forms of a word or phrase. 全文檢索查詢會傳回至少包含一個符合項目 (也稱為 「叫用」(Hit)) 的任何文件。A full-text query returns any documents that contain at least one match (also known as a hit). 如果目標文件包含全文檢索查詢中指定的所有詞彙,而且符合其他搜尋條件 (例如相符詞彙之間的距離),就會出現符合項目。A match occurs when a target document contains all the terms specified in the full-text query, and meets any other search conditions, such as the distance between the matching terms.

全文檢索搜尋查詢 Full-Text Search queries

將資料行加入至全文檢索索引之後,使用者和應用程式即可針對資料行中的文字執行全文檢索查詢。After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns. 這些查詢可以搜尋下列任何項目:These queries can search for any of the following:

  • 一或多個特定的單字或片語 (「不可分割的詞彙」(Simple Term))One or more specific words or phrases (simple term)

  • 以指定之文字開頭的單字或片語 (「前置詞彙」(Prefix Term))A word or a phrase where the words begin with specified text (prefix term)

  • 特定單字的字形變化 (「衍生詞彙」(Generation Term))Inflectional forms of a specific word (generation term)

  • 靠近另一個單字或片語的單字或片語 (「相近詞彙」(Proximity Term))A word or phrase close to another word or phrase (proximity term)

  • 特定單字的同義字變化 (「同義字」(Thesaurus))Synonymous forms of a specific word (thesaurus)

  • 使用加權值的單字或片語 (「加權詞彙」(Weighted Term))Words or phrases using weighted values (weighted term)

    全文檢索查詢不區分大小寫。Full-text queries are not case-sensitive. 例如,搜尋 "Aluminum" 或 "aluminum" 都會傳回相同的結果。For example, searching for "Aluminum" or "aluminum" returns the same results.

    全文檢索查詢會使用一小組 Transact-SQLTransact-SQL 述詞 (CONTAINS 和 FREETEXT) 與函數 (CONTAINSTABLE 和 FREETEXTTABLE)。Full-text queries use a small set of Transact-SQLTransact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). 不過,給定商務案例的搜尋目標會影響全文檢索查詢的結構。However, the search goals of a given business scenario influence the structure of the full-text queries. 例如:For example:

  • 電子商務 - 搜尋網站上的產品:e-business—searching for a product on a website:

    SELECT product_id   
    FROM products   
    WHERE CONTAINS(product_description, ”Snap Happy 100EZ” OR FORMSOF(THESAURUS,’Snap Happy’) OR ‘100EZ’)   
    AND product_cost < 200 ;  
    
  • 人員招募案例 - 搜尋具有 SQL ServerSQL Server使用經驗的工作應徵者:Recruitment scenario—searching for job candidates that have experience working with SQL ServerSQL Server:

    SELECT candidate_name,SSN   
    FROM candidates   
    WHERE CONTAINS(candidate_resume,”SQL Server”) AND candidate_division =DBA;  
    

    如需詳細資訊,請參閱 使用全文檢索搜尋查詢For more information, see Query with Full-Text Search.

比較全文檢索搜尋查詢與 LIKE 述詞 Compare Full-Text Search queries to the LIKE predicate

相較於全文檢索搜尋, LIKE Transact-SQLTransact-SQL 述詞只能針對字元模式運作。In contrast to full-text search, the LIKE Transact-SQLTransact-SQL predicate works on character patterns only. 您也無法使用 LIKE 述詞來查詢格式化的二進位資料。Also, you cannot use the LIKE predicate to query formatted binary data. 此外,針對大量非結構化文字資料執行 LIKE 查詢的速度會比針對相同資料執行對等全文檢索查詢的速度要慢很多。Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. 對於數百萬列的資料,使用 LIKE 查詢時可能要好幾分鐘才能傳回搜尋結果,但是使用全文檢索查詢時可能只要幾秒鐘的時間 (視傳回的資料列數目而定)。A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.

全文檢索搜尋架構 Full-Text Search architecture

全文檢索搜尋架構是由下列處理序所組成:Full-text search architecture consists of the following processes:

  • SQL ServerSQL Server 處理序 (sqlservr.exe)。The SQL ServerSQL Server process (sqlservr.exe).

  • 篩選背景程式主機處理序 (fdhost.exe)。The filter daemon host process (fdhost.exe).

    基於安全性理由,篩選是由稱為篩選背景程式主機的個別處理序載入。For security reasons, filters are loaded by separate processes called the filter daemon hosts. FDHOST 啟動器服務 (MSSQLFDLauncher) 會建立 fdhost.exe 處理序,而且這些處理序會在 FDHOST 啟動器服務帳戶的安全性認證底下執行。The fdhost.exe processes are created by an FDHOST launcher service (MSSQLFDLauncher), and they run under the security credentials of the FDHOST launcher service account. 因此,您必須執行 FDHOST 啟動器服務,才能讓全文檢索索引和全文檢索查詢運作。Therefore, the FDHOST launcher service must be running for full-text indexing and full-text querying to work. 如需設定此服務之服務帳戶的相關資訊,請參閱 設定全文檢索篩選背景程式啟動器的服務帳戶For information about setting the service account for this service, see Set the Service Account for the Full-text Filter Daemon Launcher.

    這兩個處理序包含全文檢索搜尋架構的元件。These two processes contain the components of the full-text search architecture. 下圖將摘要列出這些元件及其關聯性。These components and their relationships are summarized in the following illustration. 這些元件將在該圖之後描述。The components are described after the illustration.

    全文檢索搜尋架構full-text search architecture

SQL Server 處理序 SQL Server process

SQL ServerSQL Server 處理序會使用全文檢索搜尋的下列元件:The SQL ServerSQL Server process uses the following components for full-text search:

  • 使用者資料表。User tables. 這些資料表包含要進行全文檢索索引的資料。These tables contain the data to be full-text indexed.

  • 全文檢索收集程式。Full-text gatherer. 全文檢索收集程式會使用全文檢索搜耙執行緒。The full-text gatherer works with the full-text crawl threads. 此元件負責排程和驅動全文檢索索引母體擴展,以及監視全文檢索目錄。It is responsible for scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs.

  • 同義字檔案。Thesaurus files. 這些檔案包含搜尋詞彙的同義字。These files contain synonyms of search terms. 如需詳細資訊,請參閱 設定及管理全文檢索搜尋的同義字檔案For more information, see Configure and Manage Thesaurus Files for Full-Text Search.

  • 停用字詞表物件。Stoplist objects. 停用字詞表物件包含對搜尋沒有任何協助之常見單字的清單。Stoplist objects contain a list of common words that are not useful for the search. 如需詳細資訊,請參閱 設定及管理全文檢索搜尋的停用字詞與停用字詞表For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

  • SQL ServerSQL Server 查詢處理器。 SQL ServerSQL Server query processor. 查詢處理器會編譯並執行 SQL 查詢。The query processor compiles and executes SQL queries. 如果某個 SQL 查詢包含全文檢索搜尋查詢,該查詢就會在編譯和執行期間傳送至全文檢索引擎。If a SQL query includes a full-text search query, the query is sent to the Full-Text Engine, both during compilation and during execution. 系統會針對全文檢索索引比對查詢結果。The query result is matched against the full-text index.

  • 全文檢索引擎。Full-Text Engine. SQL ServerSQL Server 中的全文檢索引擎現在已經與查詢處理器完全整合了。The Full-Text Engine in SQL ServerSQL Server is fully integrated with the query processor. 全文檢索引擎會編譯並執行全文檢索查詢。The Full-Text Engine compiles and executes full-text queries. 在查詢執行期間,全文檢索引擎可能會收到來自同義字和停用字詞表的輸入。As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist.

    注意

    SQL Server 2008SQL Server 2008 及更新的版本中,全文檢索引擎位於 SQL ServerSQL Server 處理序中,而非個別的服務中。In SQL Server 2008SQL Server 2008 and later versions, the Full-Text Engine resides in the SQL ServerSQL Server process, rather than in a separate service. 將全文檢索引擎整合到 Database Engine,已改善全文檢索管理能力、混合式查詢的最佳化,以及整體效能。Integrating the Full-Text Engine into the Database Engine improved full-text manageability, optimization of mixed query, and overall performance.

  • 索引寫入器 (索引子)。Index writer (indexer). 索引寫入器會建立用來儲存索引 Token 的結構。The index writer builds the structure that is used to store the indexed tokens.

  • 篩選背景程式管理員。Filter daemon manager. 篩選背景程式管理員會負責監視全文檢索引擎篩選背景程式主機的狀態。The filter daemon manager is responsible for monitoring the status of the Full-Text Engine filter daemon host.

Filter Daemon Host process Filter Daemon Host process

篩選背景程式主機是全文檢索引擎所啟動的處理序。The filter daemon host is a process that is started by the Full-Text Engine. 它會執行下列全文檢索搜尋元件,而這些元件會負責存取、篩選和斷詞處理資料表的資料,以及斷詞處理和詞幹分析查詢輸入。It runs the following full-text search components, which are responsible for accessing, filtering, and word breaking data from tables, as well as for word breaking and stemming the query input.

篩選背景程式主機的元件如下所示:The components of the filter daemon host are as follows:

  • 通訊協定處理常式。Protocol handler. 這個元件會從記憶體中提取資料以便進一步處理,而且會從指定之資料庫中的使用者資料表中存取資料。This component pulls the data from memory for further processing and accesses data from a user table in a specified database. 其中一項責任就是從建立全文檢索索引的資料行中蒐集資料,並將資料傳遞給篩選背景程式主機,然後此處理序將會視需要套用篩選和斷詞工具。One of its responsibilities is to gather data from the columns being full-text indexed and pass it to the filter daemon host, which will apply filtering and word breaker as required.

  • 篩選器。Filters. 某些資料類型需要先篩選,然後才能針對文件中的資料建立全文檢索索引,包括 varbinaryvarbinary(max)imagexml 資料行中的資料。Some data types require filtering before the data in a document can be full-text indexed, including data in varbinary, varbinary(max), image, or xml columns. 用於給定文件的篩選會因其文件類型而不同。The filter used for a given document depends on its document type. 例如,Microsoft Word (.doc) 文件、Microsoft Excel (.xls) 文件和 XML (.xml) 文件會使用不同的篩選。For example, different filters are used for Microsoft Word (.doc) documents, Microsoft Excel (.xls) documents, and XML (.xml) documents. 然後,篩選會從文件中擷取文字區塊,並且移除內嵌的格式,並保留文字和文字位置的相關資訊。Then the filter extracts chunks of text from the document, removing embedded formatting and retaining the text and, potentially, information about the position of the text. 其結果就是文字資訊的資料流。The result is a stream of textual information. 如需詳細資訊,請參閱 設定及管理搜尋的篩選For more information, see Configure and Manage Filters for Search.

  • 斷詞工具和字幹分析器。Word breakers and stemmers. 斷詞工具是一項語言特有的元件,它會根據給定語言的語彙規則來尋找文字分界 (「斷詞」(Word Breaking))。A word breaker is a language-specific component that finds word boundaries based on the lexical rules of a given language (word breaking). 每個斷詞工具都與語言特有的字幹分析器元件相關聯,而且此元件會進行動詞變化和執行字形擴展。Each word breaker is associated with a language-specific stemmer component that conjugates verbs and performs inflectional expansions. 建立索引時,篩選背景程式主機會使用斷詞工具和字幹分析器,針對來自給定資料表資料行的文字資料執行語言分析。At indexing time, the filter daemon host uses a word breaker and stemmer to perform linguistic analysis on the textual data from a given table column. 與全文檢索索引中資料表資料行相關聯的語言會決定哪些斷詞工具和字幹分析器要用於建立該資料行的索引。The language that is associated with a table column in the full-text index determines which word breaker and stemmer are used for indexing the column. 如需詳細資訊,請參閱 設定及管理搜尋的斷詞工具與字幹分析器For more information, see Configure and Manage Word Breakers and Stemmers for Search.

全文檢索搜尋處理 Full-Text Search processing

全文檢索搜尋是由全文檢索引擎所提供。Full-text search is powered by the Full-Text Engine. 此全文檢索引擎扮演兩個角色:索引支援和查詢支援。The Full-Text Engine has two roles: indexing support and querying support.

全文檢索索引處理序 Full-Text indexing process

全文檢索擴展 (也就是搜耙) 起始時,全文檢索引擎會將大批的資料發送至記憶體中,並通知篩選背景程式主機。When a full-text population (also known as a crawl) is initiated, the Full-Text Engine pushes large batches of data into memory and notifies the filter daemon host. 此主機會針對資料進行篩選並斷詞,並且將轉換的資料轉換成反向字詞清單。The host filters and word breaks the data and converts the converted data into inverted word lists. 然後,全文檢索搜尋會從這些字詞清單中提取轉換的資料、處理資料以便移除停用字詞,並且將批次的字詞清單保存在一或多個反向索引中。The full-text search then pulls the converted data from the word lists, processes the data to remove stopwords, and persists the word lists for a batch into one or more inverted indexes.

varbinary(max)image 資料行中儲存的資料編製索引時,實作 IFilter 介面的篩選會依據為該資料指定的檔案格式 (例如 MicrosoftMicrosoft Word 格式) 擷取文字。When indexing data stored in a varbinary(max) or image column, the filter, which implements the IFilter interface, extracts text based on the specified file format for that data (for example, MicrosoftMicrosoft Word). 在某些情況下,篩選元件必須將 varbinary(max)image 資料寫出至 filterdata 資料夾,而不是推送至記憶體中。In some cases, the filter components require the varbinary(max), or image data to be written out to the filterdata folder, instead of being pushed into memory.

做為處理程序的一部分,收集的文字資料在經由文字分隔的處理之後,會分隔成 Token 或關鍵字。As part of processing, the gathered text data is passed through a word breaker to separate the text into individual tokens, or keywords. 用於 Token 化的語言是在資料行層級指定,也可由篩選元件在 varbinary(max)imagexml 資料中識別。The language used for tokenization is specified at the column level, or can be identified within varbinary(max), image, or xml data by the filter component.

您可以在停用字詞和 Token 儲存至全文檢索索引或索引片段前,進行額外的處理以移除停用字詞並將 Token 正規化。Additional processing may be performed to remove stopwords, and to normalize tokens before they are stored in the full-text index or an index fragment.

完成母體擴展後會觸發最後的合併程序,將索引片段合併成一個主要的全文檢索索引。When a population has completed, a final merge process is triggered that merges the index fragments together into one master full-text index. 如此可提升查詢的效能,因為只需要查詢一個主索引而不需查詢數個索引片段,而且可使用較佳的計分系統來排定關聯順序。This results in improved query performance since only the master index needs to be queried rather than a number of index fragments, and better scoring statistics may be used for relevance ranking.

全文檢索查詢處理序 Full-Text querying process

查詢處理器會將查詢的全文檢索部分傳遞至全文檢索引擎,以便進行處理。The query processor passes the full-text portions of a query to the Full-Text Engine for processing. 全文檢索引擎會執行斷詞並選擇性地執行同義字展開、詞幹分析和停用字詞 (非搜尋字) 處理。The Full-Text Engine performs word breaking and, optionally, thesaurus expansions, stemming, and stopword (noise-word) processing. 然後,查詢的全文檢索部分會以 SQL 運算子的形式表示,主要表示成資料流資料表值函式 (STVF)。Then the full-text portions of the query are represented in the form of SQL operators, primarily as streaming table-valued functions (STVFs). 在查詢執行期間,這些 STVF 會存取反向索引來擷取正確的結果。During query execution, these STVFs access the inverted index to retrieve the correct results. 接著,這些結果會在此時傳回用戶端,或在傳回用戶端之前進一步處理。The results are either returned to the client at this point, or they are further processed before being returned to the client.

全文檢索索引架構Full-text index architecture

全文檢索引擎會使用全文檢索索引中的資訊來編譯全文檢索查詢,以便快速地在資料表中搜尋特定字詞或字詞組合。The information in full-text indexes is used by the Full-Text Engine to compile full-text queries that can quickly search a table for particular words or combinations of words. 全文檢索索引會儲存重要單字及這些單字在資料庫資料表之一或多個資料行內位置的相關資訊。A full-text index stores information about significant words and their location within one or more columns of a database table. 全文檢索索引是一種特殊類型的 Token 式功能索引,由 Full-Text Engine for SQL ServerSQL Server所建立與維護。A full-text index is a special type of token-based functional index that is built and maintained by the Full-Text Engine for SQL ServerSQL Server. 建立全文檢索索引的程序與建立其他索引類型的程序大不相同。The process of building a full-text index differs from building other types of indexes. 全文檢索引擎會根據個別 Token 從索引中的文字建立反向、堆疊以及壓縮的索引結構,而不是根據特定資料列中所儲存的值來建構 B 型樹狀結構。Instead of constructing a B-tree structure based on a value stored in a particular row, the Full-Text Engine builds an inverted, stacked, compressed index structure based on individual tokens from the text being indexed. 全文檢索索引的大小只受限於執行 SQL ServerSQL Server 執行個體之電腦的可用記憶體資源。The size of a full-text index is limited only by the available memory resources of the computer on which the instance of SQL ServerSQL Server is running.

SQL Server 2008SQL Server 2008開始,全文檢索索引會與 Database Engine 整合在一起,而非位於檔案系統中,如同舊版 SQL ServerSQL ServerBeginning in SQL Server 2008SQL Server 2008, the full-text indexes are integrated with the Database Engine, instead of residing in the file system as in previous versions of SQL ServerSQL Server. 在新的資料庫中,全文檢索目錄現在是不屬於任何檔案群組的虛擬物件。它只是參考一組全文檢索索引的邏輯概念。For a new database, the full-text catalog is now a virtual object that does not belong to any filegroup; it is merely a logical concept that refers to a group of the full-text indexes. 不過,請注意,在 SQL Server 2005SQL Server 2005 資料庫 (含有資料檔案的任何全文檢索目錄) 的升級期間,系統會建立新的檔案群組。如需詳細資訊,請參閱 升級全文檢索搜尋Note, however, that during upgrade of a SQL Server 2005SQL Server 2005 database, any full-text catalog that contains data files, a new filegroup is created; for more information, see Upgrade Full-Text Search.

每個資料表只允許有一個全文檢索索引。Only one full-text index is allowed per table. 若要對資料表建立全文檢索索引,該資料表必須有單一的非 Null 唯一資料行。For a full-text index to be created on a table, the table must have a single, unique nonnull column. 您可以針對 charvarcharncharnvarchartextntextimagexmlvarbinaryvarbinary(max) 類型的資料行建立全文檢索索引,並且建立全文檢索搜尋的索引。You can build a full-text index on columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. 針對資料類型為 varbinaryvarbinary(max)imagexml 的資料行建立全文檢索索引會要求您指定類型資料行。Creating a full-text index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type column. 「類型資料行」是一個資料表資料行,您可以在每個資料列中儲存文件的副檔名 (.doc、.pdf 與 .xls 等)。A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the document in each row.

全文檢索索引結構 Full-text index structure

若能充分了解全文檢索索引的結構,將有助於了解全文檢索引擎的運作方式。A good understanding of the structure of a full-text index will help you understand how the Full-Text Engine works. 本主題會使用下列 Document Adventure WorksAdventure Works 資料表的摘錄當做範例資料表。This topic uses the following excerpt of the Document table in Adventure WorksAdventure Works as an example table. 這個摘錄只會顯示該資料表中的兩個資料行 ( DocumentID 資料行和 Title 資料行) 和三個資料列。This excerpt shows only two columns, the DocumentID column and the Title column, and three rows from the table.

就本例而言,我們會假設已經在 Title 資料行中建立了全文檢索索引。For this example, we will assume that a full-text index has been created on the Title column.

DocumentIDDocumentID TitleTitle
11 Crank Arm and Tire MaintenanceCrank Arm and Tire Maintenance
22 Front Reflector Bracket and Reflector Assembly 3Front Reflector Bracket and Reflector Assembly 3
33 Front Reflector Bracket InstallationFront Reflector Bracket Installation

例如,下表 (顯示片段 1) 會描述針對 Document 資料表之 Title 資料行所建立的全文檢索索引內容。For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the Title column of the Document table. 全文檢索索引所包含的資訊會比顯示在此資料表中的資訊還要多。Full-text indexes contain more information than is presented in this table. 此資料表是全文檢索索引的邏輯表示法,僅針對示範目的提供。The table is a logical representation of a full-text index and is provided for demonstration purposes only. 這些資料列會以壓縮的格式儲存,以便最佳化磁碟使用量。The rows are stored in a compressed format to optimize disk usage.

請注意,資料已經與原始文件相反。Notice that the data has been inverted from the original documents. 因為關鍵字會對應至文件識別碼,所以會發生相反的情況。Inversion occurs because the keywords are mapped to the document IDs. 因此,全文檢索索引通常稱為反向索引。For this reason, a full-text index is often referred to as an inverted index.

此外,請注意,關鍵字 "and" 已經從全文檢索索引中移除了。Also notice that the keyword "and" has been removed from the full-text index. 進行此作業的原因是 "and" 是停用字詞,而且從全文檢索索引中移除停用字詞可能會大幅節省磁碟空間,進而改善查詢效能。This is done because "and" is a stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space thereby improving query performance. 如需停用字詞的詳細資訊,請參閱 設定及管理全文檢索搜尋的停用字詞與停用字詞表For more information about stopwords, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

片段 1Fragment 1

關鍵字Keyword ColIdColId DocIdDocId 出現次數Occurrence
CrankCrank 11 11 11
ArmArm 11 11 22
TireTire 11 11 44
維護Maintenance 11 11 55
FrontFront 11 22 11
FrontFront 11 33 11
ReflectorReflector 11 22 22
ReflectorReflector 11 22 55
ReflectorReflector 11 33 22
BracketBracket 11 22 33
BracketBracket 11 33 33
組件Assembly 11 22 66
33 11 22 77
安裝Installation 11 33 44

Keyword 資料行包含編列索引時所擷取的單一 Token 表示法。The Keyword column contains a representation of a single token extracted at indexing time. 文字分隔會決定 Token 的組成項目。Word breakers determine what makes up a token.

ColId 資料行所包含的值會對應到已建立全文檢索索引的特定資料行。The ColId column contains a value that corresponds to a particular column that is full-text indexed.

DocId 資料行含有八位元組整數的值,此整數會對應到全文檢索索引資料表中的特定全文檢索索引鍵值。The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a full-text indexed table. 當全文檢索索引鍵不是整數資料類型時,這項對應就是必要的。This mapping is necessary when the full-text key is not an integer data type. 在這類情況下,全文檢索索引鍵值與 DocId 值之間的對應會保存在稱為 DocId Mapping 資料表的個別資料表中。In such cases, mappings between full-text key values and DocId values are maintained in a separate table called the DocId Mapping table. 若要查詢這些對應,請使用 sp_fulltext_keymappings 系統預存程序。To query for these mappings use the sp_fulltext_keymappings system stored procedure. 為了滿足搜尋條件,上述資料表中的 DocId 值必須與 DocId Mapping 資料表聯結,以便從查詢的基底資料表中擷取資料列。To satisfy a search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve rows from the base table being queried. 如果基底資料表的全文檢索索引鍵值是整數類型,此值就會直接當做 DocId 而且不需要任何對應。If the full-text key value of the base table is an integer type, the value directly serves as the DocId and no mapping is necessary. 因此,使用整數全文檢索索引鍵值有助於最佳化全文檢索查詢。Therefore, using integer full-text key values can help optimize full-text queries.

Occurrence 資料行包含整數值。The Occurrence column contains an integer value. 針對每個 DocId 值,都會有一個對應到該 DocId 內特定關鍵字之相對單字位移的出現次數值清單。For each DocId value, there is a list of occurrence values that correspond to the relative word offsets of the particular keyword within that DocId. 出現次數值有助於決定詞句或相似的相符項目,例如,具有鄰近發生次數值的片語。Occurrence values are useful in determining phrase or proximity matches, for example, phrases have numerically adjacent occurrence values. 它們也有助於計算相關分數。例如,在 DocId 中的關鍵字出現次數可用來計分。They are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may be used in scoring.

全文檢索索引片段 Full-text index fragments

邏輯全文檢索索引通常會在多份內部資料表之間分割。The logical full-text index is usually split across multiple internal tables. 每份內部資料表會稱為全文檢索索引片段。Each internal table is called a full-text index fragment. 其中某些片段可能包含比其他片段更新的資料。Some of these fragments might contain newer data than others. 例如,如果使用者更新 DocId 為 3 的下列資料列,而且資料表已進行自動變更追蹤,就會建立新的片段。For example, if a user updates the following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.

DocumentIDDocumentID TitleTitle
33 Rear ReflectorRear Reflector

在下列範例 (顯示片段 2) 中,此片段包含的 DocId 3 相關資料比片段 1 更新。In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to Fragment 1. 因此,當使用者查詢 "Rear Reflector" 時,片段 2 的資料就會用於 DocId 3。Therefore, when the user queries for "Rear Reflector" the data from Fragment 2 is used for DocId 3. 每個片段都會以建立時間戳記標示,而且您可以使用 sys.fulltext_index_fragments 目錄檢視,查詢此時間戳記。Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments catalog view.

片段 2Fragment 2

關鍵字Keyword ColIdColId DocIdDocId OccOcc
RearRear 11 33 11
ReflectorReflector 11 33 22

如片段 2 所示,全文檢索查詢必須在內部查詢每個片段並捨棄較舊的項目。As can be seen from Fragment 2, full-text queries need to query each fragment internally and discard older entries. 因此,如果全文檢索索引包含過多全文檢索索引片段,可能會導致查詢效能大幅降低。Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query performance. 若要減少片段的數目,請使用 ALTER FULLTEXT CATALOG Transact-SQLTransact-SQL 陳述式的 REORGANIZE 選項來重新組織全文檢索目錄。To reduce the number of fragments, reorganize the fulltext catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOG Transact-SQLTransact-SQL statement. 這個陳述式會執行「主要合併」,將片段合併成較大的單一片段,然後從全文檢索索引中移除所有已過時的項目。This statement performs a master merge, which merges the fragments into a single larger fragment and removes all obsolete entries from the full-text index.

重新組織之後,範例索引就會包含下列資料列:After being reorganized, the example index would contain the following rows:

關鍵字Keyword ColIdColId DocIdDocId OccOcc
CrankCrank 11 11 11
ArmArm 11 11 22
TireTire 11 11 44
維護Maintenance 11 11 55
FrontFront 11 22 11
RearRear 11 33 11
ReflectorReflector 11 22 22
ReflectorReflector 11 22 55
ReflectorReflector 11 33 22
BracketBracket 11 22 33
組件Assembly 11 22 66
33 11 22 77

全文檢索索引與一般 SQL Server 索引之間的差異:Differences between full-text indexes and regular SQL Server indexes:.

全文檢索索引Full-text indexes 一般 SQL Server 索引Regular SQL Server indexes
每個資料表只允許有一個全文檢索索引。Only one full-text index allowed per table. 每個資料表允許有多個一般索引。Several regular indexes allowed per table.
將資料加入至全文檢索索引的作業稱為「母體擴展」(Population),可透過排程或特定的要求來要求執行,也可在加入新的資料時自動執行。The addition of data to full-text indexes, called a population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data. 當依據的資料有插入、更新或刪除時,會自動更新索引內容。Updated automatically when the data upon which they are based is inserted, updated, or deleted.
在相同的資料庫中分組為一個或多個全文檢索目錄。Grouped within the same database into one or more full-text catalogs. 沒有分組。Not grouped.

全文檢索搜尋語言元件和語言支援 Full-Text search linguistic components and language support

全文檢索搜尋幾乎支援 50 種不同的語言,例如英文、西班牙文、中文、日文、阿拉伯文、孟加拉文和印度文。Full-text search supports almost 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi. 如需支援之全文檢索語言的完整清單,請參閱 sys.fulltext_languages (Transact-SQL)For a complete list of the supported full-text languages, see sys.fulltext_languages (Transact-SQL). 全文檢索索引所包含的每個資料行都與 Microsoft Windows 地區設定識別碼 (LCID) 相關聯,而這個識別碼就等於全文檢索搜尋所支援的語言。Each of the columns contained in the full-text index is associated with a Microsoft Windows locale identifier (LCID) that equates to a language that is supported by full-text search. 例如,LCID 1033 等於美式英文,而 LCID 2057 等於英式英文。For example, LCID 1033 equates to U.S English, and LCID 2057 equates to British English. SQL ServerSQL Server 針對每個支援的全文檢索語言提供了一些語言元件,可支援索引和查詢使用該語言所儲存的全文檢索資料。For each supported full-text language, SQL ServerSQL Server provides linguistic components that support indexing and querying full-text data that is stored in that language.

語言特有的元件包含下列:Language-specific components include the following:

  • 斷詞工具和字幹分析器。Word breakers and stemmers. 斷詞工具會根據給定語言的語彙規則來尋找文字分界 (「斷詞」(Word Breaking))。A word breaker finds word boundaries based on the lexical rules of a given language (word breaking). 每個斷詞工具都與針對相同語言進行動詞變化的字幹分析器相關聯。Each word breaker is associated with a stemmer that conjugates verbs for the same language. 如需詳細資訊,請參閱 設定及管理搜尋的斷詞工具與字幹分析器For more information, see Configure and Manage Word Breakers and Stemmers for Search.

  • 停用字詞表。Stoplists. 提供包含基本停用字詞 (也稱為非搜尋字) 集合的系統停用字詞表。A system stoplist is provided that contains a basic set stopwords (also known as noise words). 「停用字詞」 (Stopword) 是指無助於搜尋而且全文檢索查詢會忽略的單字。A stopword is a word that does not help the search and is ignored by full-text queries. 以英文地區設定為例,"a"、"and"、"is" 和 "the" 都會被視為停用字詞。For example, for the English locale words such as "a", "and", "is", and "the" are considered stopwords. 一般而言,您必須設定一個或多個同義字檔案和停用字詞表。Typically, you will need to configure one or more thesaurus files and stoplists. 如需詳細資訊,請參閱 設定及管理全文檢索搜尋的停用字詞與停用字詞表For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

  • 同義字檔案。Thesaurus files. SQL ServerSQL Server 也會針對每個全文檢索語言安裝同義字檔案,以及全域同義字檔案。 also installs a thesaurus file for each full-text language, as well as a global thesaurus file. 已安裝的同義字 (Thesaurus) 檔案基本上是空白的,但是您可以編輯它們,以便定義特定語言或商務狀況的同義字 (Synonym)。The installed thesaurus files are essentially empty, but you can edit them to define synonyms for a specific language or business scenario. 透過開發符合全文檢索資料的同義字,您可以有效地擴大針對該資料進行全文檢索查詢的範圍。By developing a thesaurus tailored to your full-text data, you can effectively broaden the scope of full-text queries on that data. 如需詳細資訊,請參閱 設定及管理全文檢索搜尋的同義字檔案For more information, see Configure and Manage Thesaurus Files for Full-Text Search.

  • 篩選 (iFilters)。Filters (iFilters). varbinary(max)imagexml 資料類型資料行中索引文件需要執行額外處理的篩選。Indexing a document in a varbinary(max), image, or xml data type column requires a filter to perform extra processing. 此篩選必須是文件類型 (.doc、.pdf、.xls 和 .xml 等等) 特有的。The filter must be specific to the document type (.doc, .pdf, .xls, .xml, and so forth). 如需詳細資訊,請參閱 設定及管理搜尋的篩選For more information, see Configure and Manage Filters for Search.

    斷詞工具 (和字幹分析器) 與篩選會在篩選背景程式主機處理序 (fdhost.exe) 中執行。Word breakers (and stemmers) and filters run in the filter daemon host process (fdhost.exe).

本主題適用於:是SQL Server (從 2008 開始)是Azure SQL Database否Azure SQL 資料倉儲 否平行處理資料倉儲 THIS TOPIC APPLIES TO:yesSQL Server (starting with 2008)yesAzure SQL DatabasenoAzure SQL Data Warehouse noParallel Data Warehouse