Best practices for crawling in SharePoint Server

Learn about best practices for crawling in SharePoint Server 2016 and SharePoint Server 2013.

The Search system crawls content to build a search index that users can run search queries against. This article contains suggestions for how to manage crawls most effectively.

Use the default content access account to crawl most content

The default content access account is a domain account that you specify for the SharePoint Server Search service to use by default for crawling. For simplicity, it is best to use this account to crawl as much as possible of the content that is specified by your content sources. To change the default content access account, see Change the default account for crawling in SharePoint Server.

When you cannot use the default content access account for crawling a particular URL (for example, for security reasons), you can create a crawl rule to specify one of the following alternatives for authenticating the crawler:

  • A different content access account

  • A client certificate

  • Form credentials

  • A cookie for crawling

  • Anonymous access

For more information, see Manage crawl rules in SharePoint Server.
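Crawl rules like these can also be created from the SharePoint Management Shell. The following is only a sketch: the path, domain account, and the assumption that NTLM is appropriate for that host are all placeholders, not values from this article.

```powershell
# Hypothetical example: make the crawler authenticate to one specific
# host with a dedicated account instead of the default content access
# account. Requires the SharePoint Management Shell on a farm server.
$ssa = Get-SPEnterpriseSearchServiceApplication

New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "https://secure.contoso.com/*" `
    -Type InclusionRule `
    -AuthenticationType NTLMAccountRuleAccess `
    -AccountName "CONTOSO\svc-crawl-secure" `
    -AccountPassword (Read-Host -AsSecureString "Crawl account password")
```

Other -AuthenticationType values cover the remaining alternatives in the list above (for example, certificate, form, cookie, and anonymous access).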

Use content sources effectively

A content source is a set of options in a Search service application that you use to specify each of the following:

  • One or more start addresses to crawl.

  • The type of content in the start addresses (such as SharePoint Server sites, file shares, or line-of-business data). You can specify only one type of content to crawl in a content source. For example, you would use one content source to crawl SharePoint Server sites, and a different content source to crawl file shares.

  • A crawl schedule and a crawl priority for full or incremental crawls that will apply to all of the content repositories that the content source specifies.

When you create a Search service application, the search system automatically creates and configures one content source, which is named Local SharePoint sites. This preconfigured content source is for crawling user profiles, and for crawling all SharePoint Server sites in the web applications with which the Search service application is associated. You can also use this content source for crawling content in other SharePoint Server farms, including SharePoint Server 2007 farms, SharePoint Server 2010 farms, and SharePoint Server 2013 farms.

Create additional content sources when you want to do any of the following:

  • Crawl other types of content

  • Limit or increase how much content to crawl

  • Crawl certain content more or less frequently

  • Set different priorities for crawling certain content (this applies to full and incremental crawls, but not to continuous crawls)

  • Crawl certain content on different schedules (this applies to full and incremental crawls, but not to continuous crawls)

However, to keep administration as easy as possible, we recommend that you limit the number of content sources that you create and use.

Using content sources to schedule crawls

You can edit the preconfigured content source Local SharePoint sites to specify a crawl schedule; it does not specify a crawl schedule by default. For any content source, you can start crawls manually, but we recommend that you schedule incremental crawls or enable continuous crawls to make sure that content is crawled regularly.
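A crawl schedule can be set from the SharePoint Management Shell as well as from the Manage Content Sources page. The schedule values below are examples only, not a recommendation from this article:

```powershell
# Sketch: schedule incremental crawls for "Local SharePoint sites" to
# start daily at 01:00 and repeat every 60 minutes for 23 hours.
$ssa = Get-SPEnterpriseSearchServiceApplication

Set-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Identity "Local SharePoint sites" `
    -ScheduleType Incremental `
    -DailyCrawlSchedule `
    -CrawlScheduleStartDateTime "01:00" `
    -CrawlScheduleRepeatInterval 60 `
    -CrawlScheduleRepeatDuration 1380
```

Use -ScheduleType Full with a separate invocation to define the full-crawl schedule for the same content source.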

Consider using different content sources to crawl content on different schedules for the following reasons:

  • To accommodate server downtimes and periods of peak server usage.

  • To crawl content that is hosted on slower servers separately from content that is hosted on faster servers.

  • To frequently crawl content that is updated more often.

Crawling content can significantly decrease the performance of the servers that host the content. The effect depends on whether the host servers have sufficient resources (especially CPU and RAM) to handle the load. Therefore, when you plan crawl schedules, consider the following best practices:

  • Schedule crawls for each content source during times when the servers that host the content are available and when there is low demand on the server resources.

  • Stagger crawl schedules so that the load on crawl servers and host servers is distributed over time. You can optimize crawl schedules in this manner as you become familiar with the typical crawl durations for each content source by checking the crawl log. For more information, see Crawl log in View search diagnostics in SharePoint Server.

  • Run full crawls only when necessary. For more information, see Reasons to do a full crawl in Plan crawling and federation in SharePoint Server. For any administrative change that requires a full crawl to take effect, such as the creation of a crawl rule, make the change shortly before the next scheduled full crawl so that an additional full crawl is not necessary. For more information, see Manage crawl rules in SharePoint Server.

Crawl user profiles before you crawl SharePoint Server sites

By default, in the first Search service application in a farm, the preconfigured content source Local SharePoint sites contains at least the following two start addresses:

However, if you are deploying "people search", we recommend that you create a separate content source for the start address sps3://My_Site_Host_URL and run a crawl for that content source first. The reason for doing this is that after the crawl finishes, the search system generates a list to standardize people's names. This is so that when a person's name has different forms in one set of search results, all results for that person are displayed in a single group (known as a result block). For example, for the search query "Anne Weiler", all documents authored by Anne Weiler or A. Weiler or alias AnneW can be displayed in a result block that is labeled "Documents by Anne Weiler". Similarly, all documents authored by any of those identities can be displayed under the heading "Anne Weiler" in the refinement panel if "Author" is one of the categories there.

To crawl user profiles and then crawl SharePoint Server sites

  1. Verify that the user account that performs this procedure is an administrator for the Search service application that you want to configure.

  2. Follow the instructions in Deploy people search in SharePoint Server. As part of those instructions, you do the following:

    • Create a content source that is only for crawling user profiles (the profile store). You might give that content source a name such as People. In the new content source, in the Start Addresses section, type sps3://My_Site_Host_URL, where My_Site_Host_URL is the URL of the My Site host.

    • Start a crawl for the People content source that you just created.

    • Delete the start address sps3://My_Site_Host_URL from the preconfigured content source Local SharePoint sites.

  3. Wait about two hours after the crawl for the People content source finishes.

  4. Start the first full crawl for the content source Local SharePoint sites.
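The crawl-ordering steps above can also be scripted. This sketch assumes a My Site host at mysites.contoso.com and a content source named People; both are placeholders:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# 1. Create a content source just for user profiles (the profile store).
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name "People" -Type SharePoint `
    -StartAddresses "sps3://mysites.contoso.com"

# 2. Crawl the People content source first.
$people = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "People"
$people.StartFullCrawl()

# 3. After the People crawl finishes (wait about two hours), start the
#    first full crawl of "Local SharePoint sites".
$local = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"
$local.StartFullCrawl()
```

Remember to remove the sps3:// start address from Local SharePoint sites (for example, with Set-SPEnterpriseSearchCrawlContentSource -StartAddresses) so that user profiles are not crawled twice.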

Use continuous crawls to help ensure that search results are fresh

Enable continuous crawls is a crawl-schedule option that you can select when you add or edit a content source of type SharePoint Sites. A continuous crawl crawls content that was added, changed, or deleted since the last crawl. A continuous crawl starts at predefined time intervals. The default interval is every 15 minutes, but you can set continuous crawls to occur at shorter intervals by using Microsoft PowerShell. Because continuous crawls occur so often, they help ensure search-index freshness, even for SharePoint Server content that is frequently updated. Also, while an incremental or full crawl is delayed by multiple crawl attempts that are returning an error for a particular item, a continuous crawl can be crawling other content and contributing to index freshness, because a continuous crawl does not process or retry items that return errors more than three times. (For content sources that have continuous crawls enabled, a "clean-up" incremental crawl automatically runs every four hours to re-crawl any items that repeatedly return errors.)
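For example, the interval can be shortened from the SharePoint Management Shell; the 5-minute value below is only an illustration, and the content source name is a placeholder:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Change the continuous crawl interval from the default 15 minutes to 5.
# The interval applies to every content source in this Search service
# application that has continuous crawls enabled.
$ssa.SetProperty("ContinuousCrawlInterval", 5)

# Enable continuous crawls on a SharePoint Sites content source.
Set-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Identity "Local SharePoint sites" -EnableContinuousCrawls $true
```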

A single continuous crawl includes all content sources in a Search service application for which continuous crawls are enabled. Similarly, the continuous crawl interval applies to all content sources in the Search service application for which continuous crawls are enabled. For more information, see Manage continuous crawls in SharePoint Server.

Continuous crawls increase the load on the crawler and on crawl targets. Make sure that you plan and scale out accordingly for this increased consumption of resources. For each large content source for which you enable continuous crawls, we recommend that you configure one or more front-end web servers as dedicated targets for crawling. For more information, see Manage crawl load (SharePoint Server 2010).

Use crawl rules to exclude irrelevant content from being crawled

Because crawling consumes resources and bandwidth, during initial deployment it might be better to crawl a small amount of content that you know is relevant, instead of crawling a larger amount of content, some of which might not be relevant. To limit how much content you crawl, you can create crawl rules for the following reasons:

  • To avoid crawling irrelevant content by excluding one or more URLs.

  • To crawl links on a URL without crawling the URL itself. This is useful for sites that do not contain relevant content but have links to relevant content.

By default, the crawler does not follow complex URLs, which are URLs that contain a question mark followed by additional parameters (for example, http://contoso/page.aspx?x=y). If you enable the crawler to follow complex URLs, it can gather many more URLs than is expected or appropriate. This can cause the crawler to gather unnecessary links, fill the crawl database with redundant links, and result in an index that is unnecessarily large.

These measures can help reduce the use of server resources and network traffic, and can increase the relevance of search results. After the initial deployment, you can review the query and crawl logs and adjust content sources and crawl rules to include more content if it is necessary. For more information, see Manage crawl rules in SharePoint Server.
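As an illustration of the exclusion case above, a crawl rule that keeps an archive path out of the index might look like the following sketch (the path is a placeholder, not a value from this article):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Exclude an archive area whose content is not relevant to search results.
# The trailing * makes the rule match everything under that path.
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
    -Path "http://contoso/archive/*" `
    -Type ExclusionRule
```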

Crawl the default zone of SharePoint Server web applications

When you crawl the default zone of a SharePoint Server web application, the query processor automatically maps and returns search-result URLs so that they are relative to the alternate access mapping (AAM) zone from which queries are performed. This makes it possible for users to readily view and open search results.

However, if you crawl a zone of a web application other than the default zone, the query processor does not map search-result URLs so that they are relative to the AAM zone from which queries are performed. Instead, search-result URLs will be relative to the non-default zone that was crawled. Because of this, users might not readily be able to view or open search results.

For example, assume that you have the following AAMs for a web application named WebApp1:

Zone        Public URL          Authentication provider
Default     https://contoso     Windows authentication: NTLM
Extranet    https://fabrikam    Forms-based authentication
Intranet    http://fabrikam     Windows authentication: NTLM

Now, say that you crawl the default zone, https://contoso. When users perform queries from https://contoso/searchresults.aspx, URLs of results from WebApp1 will all be relative to https://contoso, and therefore will be of the form https://contoso/path/result.aspx.

Similarly, when queries originate from the Extranet zone (in this case, https://fabrikam/searchresults.aspx), results from WebApp1 will all be relative to https://fabrikam, and therefore will be of the form https://fabrikam/path/result.aspx.

In both of the previous cases, because of the zone consistency between the query location and the search-result URLs, users will readily be able to view and open search results, without having to change to the different security context of a different zone.

However, now say instead that you crawl a non-default zone such as the Intranet zone, http://fabrikam. In this case, for queries from any zone, URLs of results from WebApp1 will always be relative to the non-default zone that was crawled. That is, a query from https://contoso/searchresults.aspx, https://fabrikam/searchresults.aspx, or http://fabrikam/searchresults.aspx will yield search-result URLs that begin with the non-default zone that was crawled, and therefore will be of the form http://fabrikam/path/result.aspx. This can cause unexpected or problematic behavior such as the following:

  • When users try to open search results, they might be prompted for credentials that they don't have. For example, forms-based authenticated users in the Extranet zone might not have Windows authentication credentials.

  • The results from WebApp1 will use HTTP, but users might be searching from the Extranet zone at https://fabrikam/searchresults.aspx. This might have security implications because the results will not use Secure Sockets Layer (SSL) encryption.

  • Refinements might not filter correctly, because they filter on the public URL for the default zone instead of the URL that was crawled. This is because URL-based properties in the index will be relative to the non-default URL that was crawled.

Reduce the effect of crawling on SharePoint Server crawl targets

You can reduce the effect of crawling on SharePoint Server crawl targets (that is, SharePoint Server front-end web servers) by doing the following:

  • For a small SharePoint Server environment, redirect all crawl traffic to a single SharePoint Server front-end web server. For a large environment, redirect all crawl traffic to a specific group of front-end web servers. This prevents the crawler from using the same resources that are being used to render and serve web pages and content to active users.

  • Limit search database usage in Microsoft SQL Server to prevent the crawler from using shared SQL Server disk and processor resources during a crawl.

For more information, see Manage crawl load (SharePoint Server 2010).

Using crawler impact rules to limit the effect of crawling

To limit crawler impact, you can also create crawler impact rules, which are available from the Search_service_application_name: Search Administration page. A crawler impact rule specifies the rate at which the crawler requests content from a start address or range of start addresses. Specifically, a crawler impact rule either requests a specified number of documents at a time from a URL without waiting between requests, or it requests one document at a time from the URL and waits a specified time between requests. Each crawler impact rule applies to all crawl components.

For servers in your organization, you can set crawler impact rules based on known server performance and capacity. However, this might not be possible for external sites. Therefore, you might unintentionally use too many resources on external servers by requesting too much content or requesting content too frequently. This could cause administrators of those external servers to limit server access so that it becomes difficult or impossible for you to crawl those repositories. Therefore, set crawler impact rules to have as little effect on external servers as possible while you still crawl enough content frequently enough to make sure that the freshness of the index meets your requirements.
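Crawler impact rules can also be managed from the SharePoint Management Shell, where they are exposed as "site hit rules". This is a sketch; the host name and rate are placeholders, and you should verify the cmdlet and its parameters in your SharePoint version before relying on it:

```powershell
# Throttle the crawler against an external host: request one document
# at a time and wait 5 seconds between requests.
$searchService = Get-SPEnterpriseSearchService

New-SPEnterpriseSearchSiteHitRule -SearchService $searchService `
    -Name "fabrikam.com" `
    -Behavior "DelayBetweenRequests" `
    -HitRate 5
```

The alternative Behavior value, "SimultaneousRequests", corresponds to the other mode described above: a fixed number of documents requested at a time with no wait between requests.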

Use Active Directory groups instead of individual users for permissions

The ability of a user or group to perform various activities on a site is determined by the permission level that you assign. If you add or remove users individually for site permissions, or if you use a SharePoint Server group to specify site permissions and you change the membership of the group, the crawler must perform a "security-only crawl", which updates all affected items in the search index to reflect the change. Similarly, adding or updating a web application policy with different users or SharePoint Server groups triggers a crawl of all content covered by that policy. This increases crawl load and can reduce search-results freshness. Therefore, to specify site permissions, it is best to use Active Directory Domain Services (AD DS) groups, because this does not require the crawler to update the affected items in the search index.

Add a second crawl component to provide fault tolerance

When you create a Search service application, the default search topology includes one crawl component. A crawl component retrieves items from content repositories, downloads the items to the server that hosts the crawl component, passes the items and associated metadata to a content processing component, and adds crawl-related information to associated crawl databases. You can add a second crawl component to provide fault tolerance. If one crawl component becomes unavailable, the remaining crawl component takes over all of the crawling. For most SharePoint Server farms, a total of two crawl components is sufficient.
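Adding the second crawl component is done by cloning the active search topology, adding the component to the clone, and activating it, roughly as in this sketch (the server name "SearchServer2" is a placeholder):

```powershell
$ssa    = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone  = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active

# Host the new crawl component on a second search server for fault tolerance.
$instance = Get-SPEnterpriseSearchServiceInstance -Identity "SearchServer2"
New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $instance

# Activate the modified topology.
Set-SPEnterpriseSearchTopology -Identity $clone
```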

For more information, see the following articles:

Manage environment resources to improve crawl performance

As the crawler crawls content, downloads the content to the crawl server (the server that hosts the crawl component), and feeds the content to content processing components, several factors can adversely affect performance. To improve crawl performance, you can do the following:

To address this potential performance bottleneck    Implement this solution
Slow response time from crawled servers            Provide more CPU and RAM and faster disk I/O
Low network bandwidth                              Install one or two one-gigabit-per-second network adapters on each crawl server
Content processing                                 Provide more content processing components, and more CPU resources for each content processing component
Slow processing by the index components            Add I/O resources for servers that host index components

For more information, see the following resources:

Make sure that no crawls are active before you change the search topology

We recommend that you confirm that no crawls are in progress before you initiate a change to the search topology. Otherwise, it is possible that the topology change will not occur smoothly.

If necessary, you can manually pause or stop full or incremental crawls, and you can disable continuous crawls. For more information, see the following articles:

Note

Pausing a crawl has the disadvantage that references to crawl components can remain in the MSSCrawlComponentsState table in the search administration database. This can cause a problem if you want to remove any crawl components (say, because you want to remove a server that hosts those components from the farm). However, when you stop a crawl, references to crawl components in the MSSCrawlComponentsState table are deleted. Therefore, if you want to remove crawl components, it is better to stop crawls than to pause them.

To confirm that no crawls are in progress, on the Search_service_application_name: Manage Content Sources page, make sure that the value in the Status field for each content source is either Idle or Paused. (When a crawl is completed, or when you stop a crawl, the value in the Status field for the content source changes to Idle.)
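The same check can be scripted. This sketch assumes the content source objects expose a CrawlState property, as they do in the SharePoint search object model:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# List each content source and its current crawl state. Every content
# source should report Idle (or Paused) before you change the topology.
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Select-Object Name, CrawlState
```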

Remove crawl components from a crawl host before you remove the host from a farm

When a server hosts a crawl component, removing the server from the farm can make it impossible for the Search system to crawl content. Therefore, before you remove a crawl host from a farm, we strongly recommend that you do the following:

  1. Make sure that no crawls are active.

    For more information, see the previous section, Make sure that no crawls are active before you change the search topology.

  2. Remove or relocate the crawl components that are on that host.

For more information, see the following resources:

Test crawl and query functionality after you change the crawl configuration or apply updates

We recommend that you test the crawl and query functionality in the server farm after you make configuration changes or apply updates. The following procedure is an example of an easy way to perform such a test.

To test crawl and query functionality

  1. Verify that the user account that performs this procedure is an administrator for the Search service application that you want to configure.

  2. Create a content source that you will use temporarily just for this test.

    In the new content source, in the Start Addresses section, in the Type start addresses below (one per line) box, specify a start address that contains several items that are not already in the index (for example, several TXT files that are on a file share). For more information, see Add, edit, or delete a content source in SharePoint Server.

  3. Start a full crawl for that content source.

    For more information, see Start, pause, resume, or stop a crawl in SharePoint Server. When the crawl is complete, on the Search_service_application_name: Manage Content Sources page, the value in the Status column for the content source will be Idle. (To update the Status column, refresh the Manage Content Sources page by clicking Refresh.)

  4. When the crawl is complete, go to the Search Center and perform search queries to find those files.

    If your deployment does not already have a Search Center, see Create a Search Center site in SharePoint Server.

  5. After you finish testing, delete the temporary content source.

    This removes the items specified by that content source from the search index so that they do not appear in search results after you finish testing.
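The manual test above can be sketched in the SharePoint Management Shell as follows; the file-share path and the CrawlTest name are placeholders, and the query step in the Search Center remains manual:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Create a temporary content source that points at a few test files.
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name "CrawlTest" -Type File `
    -StartAddresses "file://fileserver/share/crawltest"

# Start a full crawl and poll until it returns to Idle.
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "CrawlTest"
$cs.StartFullCrawl()
while ((Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
        -Identity "CrawlTest").CrawlState -ne "Idle") {
    Start-Sleep -Seconds 30
}

# After you verify the files appear in search results, delete the
# temporary content source to remove its items from the index.
Remove-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "CrawlTest"
```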

Use the crawl log and crawl-health reports to diagnose problems

The crawl log tracks information about the status of crawled content. The log includes views for content sources, hosts, errors, databases, URLs, and history. For example, you can use this log to determine the time of the last successful crawl for a content source, whether crawled content was successfully added to the index, whether it was excluded because of a crawl rule, or whether crawling failed because of an error.

Crawl-health reports provide detailed information about crawl rate, crawl latency, crawl freshness, content processing, CPU and memory load, continuous crawls, and the crawl queue.

You can use the crawl log and crawl-health reports to diagnose problems with the search experience. The diagnostic information can help you determine whether it would be helpful to adjust elements such as content sources, crawl rules, crawler impact rules, crawl components, and crawl databases.

For more information, see View search diagnostics in SharePoint Server.