Plan crawling and federation in SharePoint Server

Summary: Plan to crawl or federate for search in SharePoint Server 2016 and SharePoint Server 2013.

Before users can perform searches in SharePoint Server, you must crawl or federate the content that you want them to be able to search. When you crawl content, the Search service builds a search index that users can run queries (search requests) against. You can also configure the Search system to display search results from an external provider (such as Bing) alongside the results from the local search index. The process of getting search results from an external provider and displaying the results locally is called federation.

Plan content sources

A content source is a definition of a group of crawl settings, such as which hosts to crawl, the type of content to crawl (for example, SharePoint content or file shares), a crawl schedule, and how deep to crawl.

When you create a Search service application, the service application automatically provides the preconfigured content source Local SharePoint sites. You can use this content source to specify how to crawl all SharePoint content in web applications that are associated with the Search service application.

If you have only one type of content (for example, all content is of type SharePoint sites or type file shares), you might need only one content source. However, if you have different types of content or unique requirements per host, you might want to define multiple content sources. Plan to create additional content sources when you have to do the following:

  • Crawl different types of content, for example, file shares and data in a line-of-business application

  • Crawl some content on different schedules than other content

  • Limit or increase the quantity of content that is crawled

  • Set different priorities for crawling different sites

  • Keep some types of content fresher than others

You can create a large number of content sources in each Search service application, but each content source carries overhead. Therefore, we recommend that you create the smallest number of content sources that satisfies your other operational requirements, such as differences in crawl priority and crawl scheduling. Each content source can contain up to 100 start addresses.

Plan to crawl different kinds of content

You can crawl only one kind of content per content source. For example, you can create a content source that contains start addresses for SharePoint sites and another content source that contains start addresses for file shares, but you cannot create a single content source that contains start addresses for both SharePoint sites and file shares. The following table lists the kinds of content sources that you can configure.

| Use this kind of content source | For this content |
| --- | --- |
| SharePoint sites | SharePoint sites from the same farm or different SharePoint Server farms. SharePoint sites from the same farm or different SharePoint Server 2013, SharePoint Server 2010, SharePoint Foundation 2010, or Microsoft Search Server 2010 farms. SharePoint sites from the same farm or different Office SharePoint Server 2007, Windows SharePoint Services 3.0, or Search Server 2008 farms. |
| Web sites | Other web content in your organization that is not located in SharePoint sites. Content on web sites on the Internet. |
| File shares | Content on file shares in your organization. (See the security note after this table.) |
| Exchange public folders | Exchange 2007 and Exchange Server 2010 public folders. |
| Lotus Notes | E-mail messages stored in Lotus Notes databases. (See the Lotus Notes note after this table.) |
| Documentum | Content from the EMC Documentum system. (See the Documentum note after this table.) |
| Line-of-business data | Business data that is stored in line-of-business applications. |
| Custom repository | Content sources that can be crawled only after a custom connector is installed and registered. |

Security note (file shares): When the Search service crawls a file share, if the permissions on a file on the share differ from the permissions on the folders that contain the file, the permissions on the file take precedence and are used for security trimming of search results. Therefore, to ensure that only appropriate items appear in search results, make sure that the permissions for files on file shares are appropriate. Where file permissions are not appropriate, you can delete particular items from the search index or from search results. For more information, see Delete items from the search index or from search results in SharePoint Server.

Note (Lotus Notes): Unlike all other kinds of content sources, the Lotus Notes content source option does not appear in the user interface until you have installed and configured the appropriate prerequisite software. For more information, see Configure and use the Lotus Notes connector for SharePoint Server.

Note (Documentum): You can't crawl EMC Documentum content until you have installed and configured the appropriate prerequisite software and the Microsoft SharePoint 2016 Indexing Connector for Documentum. For more information, see Configure and use the Documentum connector in SharePoint Server.
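The two planning constraints described so far (one kind of content per content source, and at most 100 start addresses per content source) can be checked on paper before anything is configured. The following is a minimal sketch, not a SharePoint API: the `ContentSourcePlan` class, the `validate` function, and the rule that a start address beginning with `\\` or `file://` is a file share are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MAX_START_ADDRESSES = 100  # limit stated in this article

# Kinds of content source listed in the table above.
KINDS = {
    "SharePoint sites", "Web sites", "File shares", "Exchange public folders",
    "Lotus Notes", "Documentum", "Line-of-business data", "Custom repository",
}


@dataclass
class ContentSourcePlan:
    """Hypothetical planning record; not a SharePoint object."""
    name: str
    kind: str
    start_addresses: list = field(default_factory=list)


def looks_like_file_share(address):
    # Assumption: UNC paths and file:// URLs denote file shares.
    return address.startswith("\\\\") or address.lower().startswith("file://")


def validate(plan):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if plan.kind not in KINDS:
        problems.append(f"{plan.name}: unknown kind {plan.kind!r}")
    if len(plan.start_addresses) > MAX_START_ADDRESSES:
        problems.append(f"{plan.name}: more than {MAX_START_ADDRESSES} start addresses")
    # One kind of content per content source: flag addresses of the wrong kind.
    if plan.kind == "File shares":
        wrong = [a for a in plan.start_addresses if not looks_like_file_share(a)]
    elif plan.kind in ("SharePoint sites", "Web sites"):
        wrong = [a for a in plan.start_addresses if looks_like_file_share(a)]
    else:
        wrong = []
    for address in wrong:
        problems.append(f"{plan.name}: {address} does not fit kind {plan.kind!r}")
    return problems
```

A plan that mixes a SharePoint URL and a UNC path in one content source would produce one problem entry, which signals that the file share belongs in its own content source.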

Content sources for line-of-business data

Business data content sources require that the applications hosting the data are specified in an Application Model in a Business Data Connectivity service application. You can create one content source to crawl all applications that are registered in the Business Data Connectivity service, or you can create separate content sources to crawl individual applications. For more information, see Search connector framework in SharePoint 2013 (this MSDN article also applies to SharePoint Server).

Often, the people who plan the integration of business data into site collections are not the same people who are involved in the overall content planning process. Therefore, include business application administrators in content planning teams so that they can advise you on how to integrate the business application data into content and present it effectively in the site collections.

Crawl content on different schedules

Consider defining content sources with different schedules for the following reasons:

  • To accommodate downtime and periods of peak usage.

  • To crawl frequently updated content more often.

  • To crawl content that is located on slower servers separately from content that is located on faster servers.

  • To continuously crawl a SharePoint content source because of high freshness demands. For more information, see Manage continuous crawls in SharePoint Server.

Reasons to do a full crawl

Reasons for a Search service application administrator to do a full crawl for one or more content sources include the following:

  • A Search service application has just been created and the preconfigured content source Local SharePoint sites has not been crawled yet.

  • Some other content source is new and has not been crawled yet.

  • The Search service application administrator has changed a content source.

  • A software update or service pack was installed on servers in the farm. See the instructions for the software update or service pack for more information.

  • A Search service application administrator or site collection administrator added or changed a managed property. A full crawl of all affected content sources is required for the new or changed managed property to take effect.

  • You want to detect security changes that were made to local groups on a file share after the last full crawl of the file share.

  • You want to resolve consecutive incremental crawl failures. If an incremental crawl fails a large number of consecutive times for any particular content, the system removes the affected content from the search index.

  • Crawl rules have been added, deleted, or modified.

  • You want to replace a corrupted search index.

  • The permissions for the user account that is assigned to the default content access account have changed.

Under the following circumstances, the system does a full crawl even when an incremental crawl or continuous crawl is scheduled:

  • A search administrator stopped the previous crawl.

  • A content database was restored, or a farm administrator has detached and reattached a content database.

  • A full crawl of the content source has never been done from this Search service application.

  • The crawl database does not contain entries for the addresses that are being crawled. Without entries in the crawl database for the items being crawled, incremental crawls cannot occur.
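The four circumstances in which the system overrides a scheduled incremental or continuous crawl can be read as a simple decision rule. The sketch below models that rule only; `CrawlState` and `effective_crawl_type` are hypothetical names invented here, and the real crawler does not expose its state in this form.

```python
from dataclasses import dataclass


@dataclass
class CrawlState:
    """Hypothetical snapshot of the conditions listed above."""
    previous_crawl_stopped_by_admin: bool = False
    content_database_restored_or_reattached: bool = False
    full_crawl_ever_completed: bool = True
    crawl_database_has_entries: bool = True


def effective_crawl_type(scheduled, state):
    """Return "full" when any listed condition forces it, else the scheduled type."""
    forced = (
        state.previous_crawl_stopped_by_admin
        or state.content_database_restored_or_reattached
        or not state.full_crawl_ever_completed
        or not state.crawl_database_has_entries
    )
    return "full" if forced else scheduled
```

For planning purposes, the point is that a scheduled incremental crawl can turn into a much longer full crawl, so crawl windows should leave room for that case after events such as a content database restore.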

Limit or increase the quantity of content that is crawled

The options available in the properties for each content source vary depending on the content source type that you select. You can use crawl setting options to limit or increase the quantity of content that is crawled. For each content source, you can specify how extensively to crawl the start addresses. Most content source types let you specify how many levels deep in the hierarchy to crawl from each start address. This setting applies to all start addresses in a particular content source. If you have to crawl some sites at deeper levels, you can create additional content sources that include those sites. The following table describes best practices when you configure crawl setting options.

| For this kind of content source | If this applies | Use this crawl setting option |
| --- | --- | --- |
| SharePoint sites | You want to include the content that is on the site itself and you do not want to include the content that is on subsites, or you want to crawl the content that is on subsites on a different schedule. | Crawl only the SharePoint site of each start address. |
| SharePoint sites | You want to include the content on the site itself, or you want to crawl all content under the start address on the same schedule. | Crawl everything under the host name of each start address. |
| Web sites | Content available on linked sites is unlikely to be relevant. | Crawl only within the server of each start address. |
| Web sites | Relevant content is located on only the first page. | Crawl only the first page of each start address. |
| Web sites | You want to limit how deep to crawl the links on the start addresses. | Custom: specify the number of pages deep and the number of server hops to crawl. (See the note after this table.) |
| File shares, Exchange public folders | Content available in the subfolders is unlikely to be relevant. | Crawl only the folder of each start address. |
| File shares, Exchange public folders | Content in the subfolders is likely to be relevant. | Crawl the folder and subfolders of each start address. |
| Business data | All applications that are registered in the Business Data Catalog metadata store contain relevant content. | Crawl the whole Business Data Catalog metadata store. |
| Business data | Not all applications that are registered in the Business Data Catalog metadata store contain relevant content, or you want to crawl some applications on a different schedule. | Crawl selected applications. |

Note: For a highly connected site, we recommend that you start with a small number, because specifying more than three pages deep or more than three server hops can result in crawling the entire Internet.
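To see why small values for page depth and server hops matter, it helps to simulate the custom option on a toy link graph. The following sketch is an illustration under stated assumptions, not the crawler's actual algorithm: it assumes that every followed link adds one to the page depth and that a link to a different host also adds one to the server hop count. The function name `crawl_scope` and the `links` dictionary (a stand-in for fetching pages) are invented for this example.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_scope(start, links, max_depth, max_hops):
    """Return the set of URLs reachable from start within both limits.

    links maps a URL to the URLs it links to.
    """
    # Track (url, hops) states: breadth-first order guarantees that each
    # state is first reached at its smallest possible depth.
    seen = {(start, 0)}
    queue = deque([(start, 0, 0)])
    while queue:
        url, depth, hops = queue.popleft()
        for target in links.get(url, []):
            new_depth = depth + 1
            crossed_host = urlparse(target).netloc != urlparse(url).netloc
            new_hops = hops + (1 if crossed_host else 0)
            state = (target, new_hops)
            if state in seen or new_depth > max_depth or new_hops > max_hops:
                continue
            seen.add(state)
            queue.append((target, new_depth, new_hops))
    return {url for url, _ in seen}
```

Running the function with increasing limits on a sample of real links from a start page gives a rough feel for how quickly the crawl scope grows before the setting is applied to a content source.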

Plan connectors

The crawler uses connectors (known as "protocol handlers" in earlier versions of SharePoint Server) to acquire and index content. For the most commonly used protocols, SharePoint Server provides and automatically uses the appropriate connectors. To crawl content that requires a connector that is not provided by default, you must first install a third-party connector or build a custom connector. For a list of connectors that are installed by default, see Default connectors in SharePoint Server.

Other considerations when planning content sources

For content repositories that are of the same type, such as SharePoint sites, your decision about whether to use one or more content sources depends largely on administrative considerations. To make administration easier, organize content sources in such a way that updating content sources, crawl rules, and crawl schedules is convenient for administrators.

  • You can't crawl the same start addresses by using multiple content sources in the same Search service application. For example, if you use a particular content source to crawl a site collection and all its subsites, you cannot use a different content source to crawl one of those subsites separately on a different schedule.

  • Administrators often update content sources. Changing a content source requires a full crawl for that content source. Therefore, consider creating separate content sources so that you can run multiple full crawls at the same time if necessary, and so that a full crawl for any particular content source is less time-consuming.
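The restriction on crawling the same start address from more than one content source can be checked against a draft plan. This is a minimal sketch with invented names (`find_overlaps`, and a plain dictionary as the plan format). It assumes that every content source crawls everything under each of its start addresses, so it may flag pairs that are fine when a source is set to crawl only the site of each start address.

```python
def find_overlaps(content_sources):
    """Find start addresses that equal, or sit under, an address in another source.

    content_sources maps a content source name to its list of start addresses.
    Returns (source_a, address_a, source_b, address_b) tuples where address_b
    is covered by address_a.
    """
    def normalize(address):
        # Compare case-insensitively and ignore a trailing slash.
        return address.lower().rstrip("/")

    entries = [
        (name, normalize(address))
        for name, addresses in content_sources.items()
        for address in addresses
    ]
    overlaps = []
    for name_a, a in entries:
        for name_b, b in entries:
            if name_a != name_b and (b == a or b.startswith(a + "/")):
                overlaps.append((name_a, a, name_b, b))
    return overlaps
```

An empty result means that no start address in the draft is covered twice under this assumption; a non-empty result points at the pairs to merge or to move into a single content source.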

Plan crawl rules to optimize crawls

Crawl rules apply to all content sources in the Search service application. You can apply crawl rules to a particular URL or set of URLs to do the following things:

  • Avoid crawling irrelevant content by excluding one or more URLs. This also helps reduce the use of server resources and network traffic.

  • Crawl links on the URL without crawling the URL itself. This option is useful for sites that link to relevant content when the page that contains the links does not itself contain relevant information.

  • Enable complex URLs to be crawled. This option directs the system to crawl URLs that contain a query parameter specified with a question mark. Depending on the site, these URLs might not include relevant content. Because complex URLs can often redirect to irrelevant sites, it is a good idea to enable this option only on sites where you know that the content available from complex URLs is relevant.

  • Enable content on SharePoint sites to be crawled as HTTP pages. This option enables the Search system to crawl SharePoint sites that are behind a firewall, or in scenarios in which the site being crawled restricts access to the Web service that is used by the crawler (a crawl component in the search topology).

  • Specify whether to use the default content access account, a different content access account, or a client certificate for crawling the specified URL.
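A draft set of crawl rules can be tried against sample URLs before it is entered in the Search service application. The sketch below models only two of the options above (exclusion and complex URLs). It rests on assumptions that this article does not state: that rules are evaluated in order with the first match winning, that `*` is the wildcard, and that a URL containing a question mark is skipped unless a matching rule enables complex URLs. `CrawlRule` and `should_crawl` are hypothetical names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class CrawlRule:
    """Hypothetical planning record for one crawl rule."""
    pattern: str                      # for example "https://intranet.example.com/archive/*"
    include: bool                     # True = include rule, False = exclude rule
    crawl_complex_urls: bool = False  # allow URLs that contain "?"


def should_crawl(url, rules):
    """Apply the first matching rule; with no match, crawl simple URLs only."""
    is_complex = "?" in url
    for rule in rules:
        # Patterns in this sketch should use only "*" as a wildcard.
        if fnmatchcase(url.lower(), rule.pattern.lower()):
            if not rule.include:
                return False
            return rule.crawl_complex_urls or not is_complex
    return not is_complex
```

Because the first match wins in this model, a narrow exclusion such as an archive folder has to be listed before a broad inclusion for the whole host.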

Because crawling content consumes resources and bandwidth, it is better to include a smaller amount of content that you know is relevant than a larger amount of content that might be irrelevant. After the initial deployment, you can review the query and crawl logs, adjust content sources and crawl rules to be more relevant, and then include more content.

Plan crawler authentication

When the crawler accesses the start addresses that are listed in content sources, the crawler must be authenticated by, and granted access to, the servers that host that content. By default, the system uses the default content access account. Alternatively, you can use crawl rules to specify a different content access account to use when crawling particular content. Whether you use the default content access account or a different content access account specified by a crawl rule, the content access account that you use must have at least read permissions on all content that is crawled. If the content access account does not have read permissions, the content is not crawled, is not indexed, and therefore is not available to queries.

We recommend that the account that you specify as the default content access account has access to most of your crawled content. Use other content access accounts only when security considerations require separate content access accounts.

For each content source that you plan, determine the start addresses that cannot be accessed by the default content access account, and then plan to add crawl rules for those start addresses.

Important

Ensure that the domain account that is used for the default content access account or any other content access account is not the same domain account that is used by an application pool associated with any Web application that you crawl. Using the same account can cause unpublished content in SharePoint sites and minor versions of files (that is, history) in SharePoint sites to be crawled and indexed.
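The account planning described above amounts to two checks: every start address needs an account with read permission, and no content access account may double as an application pool account. The following sketch works on planning data only and does not query Active Directory or SharePoint; the function `plan_access`, its parameters, and the sample account names are assumptions made for illustration.

```python
def plan_access(start_addresses, readable_by_default, rule_accounts,
                default_account, app_pool_accounts):
    """Assign a content access account to each start address and collect warnings.

    readable_by_default: set of addresses the default account can read.
    rule_accounts: address -> account that a planned crawl rule would specify.
    """
    assignments, warnings = {}, []
    for address in start_addresses:
        if address in readable_by_default:
            assignments[address] = default_account
        elif address in rule_accounts:
            assignments[address] = rule_accounts[address]
        else:
            assignments[address] = None
            warnings.append(f"{address}: no account with read permission; plan a crawl rule")
    # Account names are compared case-insensitively.
    pool = {account.lower() for account in app_pool_accounts}
    for account in sorted({a for a in assignments.values() if a}):
        if account.lower() in pool:
            warnings.append(f"{account}: also used by an application pool")
    return assignments, warnings
```

An address that ends up with no account in the result is one that would be neither crawled nor indexed, so each such warning corresponds to a crawl rule that still has to be planned.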

Another important consideration is that the crawler must use the same authentication protocol as the host server. By default, the crawler authenticates by using NTLM. If necessary, you can configure the crawler to use a different authentication protocol.

If you are using claims-based authentication, make sure that Windows authentication is enabled on any Web applications to be crawled.

Plan content processing

The crawler crawls content repositories specified by content sources and then feeds the contents and metadata of crawled items to the content processing component. The content processing component reads and parses the crawled properties and then reports the properties to the Search Administration database.

You can map crawled properties to managed properties and configure property settings by editing the search schema. The content processing component reads the search schema and uses it to carry out the mapping. Only managed properties are included in the search index. Managed properties can be used, for example, to create refiners. For more information, see Overview of the search schema in SharePoint Server.

Include or exclude file types

Content from any file type can be included in the search index. For content to be indexed, it must first be crawled by a crawl component and then parsed by a content processing component. A crawl component can crawl a file only if the file extension is included in the list of file name extensions on the Manage File Types page. A content processing component can parse the contents of a crawled file only under the following conditions:

  • The content processing component has a format handler that can parse the file format.

  • The content processing component is enabled to parse files that have the file format and file name extension.

If the content processing component is unable to parse a file, the search index includes only file properties, such as the file name.
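The crawl and parse conditions above combine into three possible outcomes for a file. The sketch below expresses that decision as a small function; `indexing_outcome` and its three set parameters are invented for this example and stand in for the Manage File Types list, the installed format handlers, and the formats for which parsing is enabled.

```python
def indexing_outcome(extension, crawled_extensions, format_handlers, parsing_enabled):
    """Classify what the search index would hold for a file with this extension."""
    ext = extension.lower().lstrip(".")
    if ext not in crawled_extensions:
        # Not on the Manage File Types list: the crawl component skips the file.
        return "not crawled"
    if ext in format_handlers and ext in parsing_enabled:
        return "crawled and parsed"
    # Crawled, but no usable format handler: only properties such as the file name.
    return "crawled, file properties only"
```

Running a list of the extensions found in a repository through such a check shows which file types need an entry on the Manage File Types page and which need an additional format handler.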

By default, SharePoint Server satisfies these requirements for many file types, and it can crawl and parse these file types without requiring you to install additional format handlers. For an overview of the file types, see Default crawled file name extensions and parsed file types in SharePoint Server.

Note

You can extend the initial collection of file formats that SharePoint Server can parse by adding third-party filter-based format handlers, known as iFilters. A third-party iFilter can override a built-in format handler.

When you plan to include content in the search index from content repositories that have file types that are not on the Manage File Types page, review the following:

  • To crawl the file type, add the file type to the Manage File Types page.

  • To parse the file type:

    • If SharePoint Server does not have a format handler for the format, install a third-party filter-based format handler for the file format on each server that hosts a content processing component in the Search service application.

    • Enable parsing of the file format and file name extension on each server that hosts a content processing component in the Search service application.

For more information, see Add or remove a file type from the search index in SharePoint Server.

Plan to use (custom) entity extractors

You can configure the search system to look for "entities" in unstructured content, such as in the body text or the title of a document. These entities can be words or phrases, such as product names. To specify which entities to look for, you can create and deploy your own dictionaries.

The extracted entities are stored in the search index as separate managed properties, which are automatically configured to be searchable, queryable, retrievable, sortable, and refinable. You can use those properties in search refiners, for example, to help users filter their search results.

For companies, you can use the pre-populated company extraction dictionary that SharePoint Server provides.

In addition, you can deploy several types of custom entity extractors in the form of custom entity extraction dictionaries. You deploy these dictionaries by using Microsoft PowerShell. The entries in these dictionaries (single or multiple words) are matched against words or parts of words in the content in a case-sensitive or case-insensitive way. For more information, see Create and deploy custom entity extractors in SharePoint Server.


| Custom entity extractor / dictionary | Description |
| --- | --- |
| Word Extraction | Case-insensitive, maximum 5 dictionaries. For example, the entry "anchor" would match "anchor" and "Anchor", but not "anchorage". |
| Word Part Extraction | Case-insensitive, maximum 5 dictionaries. For example, the entry "anchor" would match "anchor", "Anchor", and within "anchorage". |
| Word Exact Extraction | Case-sensitive, maximum 1 dictionary. For example, the entry "anchor" would match "anchor", but not "Anchor" or "anchorage". |
| Word Part Exact Extraction | Case-sensitive, maximum 1 dictionary. For example, the entry "anchor" would match "anchor" and within "anchorage", but not "Anchor". |
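The four dictionary types differ along two axes: whole word versus word part, and case-sensitive versus case-insensitive. The sketch below reproduces the "anchor" examples from the table with regular expressions. It is an approximation for planning dictionary entries, not the tokenization that the content processing component actually performs; `matches` and `EXTRACTORS` are names invented here.

```python
import re

# The two matching axes for each custom entity extractor type.
EXTRACTORS = {
    "Word Extraction": dict(whole_word=True, case_sensitive=False),
    "Word Part Extraction": dict(whole_word=False, case_sensitive=False),
    "Word Exact Extraction": dict(whole_word=True, case_sensitive=True),
    "Word Part Exact Extraction": dict(whole_word=False, case_sensitive=True),
}


def matches(entry, text, whole_word, case_sensitive):
    """Return True if the dictionary entry is found in text under the given mode."""
    pattern = re.escape(entry)
    if whole_word:
        # Assumption: a regex word boundary approximates a whole-word match.
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None
```

For example, `matches("anchor", "anchorage", **EXTRACTORS["Word Part Extraction"])` is true, while the same call with `EXTRACTORS["Word Extraction"]` is false, which mirrors the first two table rows.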

About result sources and federation

In SharePoint Server, you use a result source to specify the URL of a provider to get search results from, a protocol to use to get those results, and other related settings. For example, the preconfigured default result source is Local SharePoint Results.

You can add result sources that specify external search providers (such as remote search engines or feeds) from which to get search results. This is called federation.

About federation

When you use federation, users can search for and retrieve content that has not been crawled by servers in the local farm. For example, federation can provide search results from a web search provider such as Bing, or from a private data set that you do not have access to crawl.

Federation can also be a good solution for a geographically distributed organization that wants to provide search access to content at its various locations when each location has its own search index. Because each location provides search results from its own index, it is not necessary to deploy a centralized search service that builds and accesses a single, unified index. In this context, federation can provide advantages such as the following:

  • Low bandwidth requirements: An organization that is geographically dispersed might not have the high network bandwidth that is required to crawl and index large amounts of remote content. When an organization uses federation, the main data that is transmitted for search across the wide-area network is only a set of search results from each federated content repository.

  • Freshness of search results: Each division within an organization can crawl its local content more quickly than a centralized search deployment would be able to crawl all of the content in the entire organization.

  • Divisional search variability: When an organization uses federation, each division within the organization can provide and control its own search environment. For example, each division can tailor search to its own requirements and preferences, with its own user experience and its own search connectors. A centralized search portal would not allow for such differences.

  • Limited size of search indexes: A large, geographically distributed organization might have millions of documents. It might not be practical for the organization to have a single, unified search index because of the infrastructure that would be required to support such a large index. Federation enables users in each division to perform a single search to find relevant content that is distributed across multiple smaller search indexes in the organization.

Using result sources for federation

To use federation in SharePoint Server, you select one of the following protocols in the Protocol section on the Add/Edit Result Source page:

| You select this protocol | To get federated search results from this kind of provider |
| --- | --- |
| Remote SharePoint | The index of a search service in another SharePoint Server farm |
| OpenSearch 1.0/1.1 | An external search engine or feed that uses the OpenSearch protocol, such as Bing |
| Exchange | Exchange Server 2013 |

Note

On the Add/Edit Result Source page, when you select one of the protocols shown in the preceding table, you must also fill in other related fields on the page to fully specify the result source.
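For the OpenSearch protocol, one of the related fields is the provider's source URL, which in OpenSearch is a URL template that contains the `{searchTerms}` parameter. The sketch below shows how such a template is filled in at query time. The function `build_query_url` and the `search.example.com` address are hypothetical; check the provider's own OpenSearch description for its real template.

```python
from urllib.parse import quote_plus


def build_query_url(template, search_terms):
    """Substitute the OpenSearch {searchTerms} parameter with the encoded query."""
    if "{searchTerms}" not in template:
        raise ValueError("template has no {searchTerms} parameter")
    return template.replace("{searchTerms}", quote_plus(search_terms))


# Hypothetical provider template used only for illustration.
TEMPLATE = "https://search.example.com/results?q={searchTerms}&format=rss"
```

Calling `build_query_url(TEMPLATE, "crawl rules")` yields a URL in which the space is encoded, which is the request the federated provider would receive for that query.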

See also

Understanding result sources for search in SharePoint Server

Configure result sources for search in SharePoint Server

Manage crawling in SharePoint Server

Default connectors in SharePoint Server

Default crawled file name extensions and parsed file types in SharePoint Server

Search connector framework in SharePoint 2013