PowerShell to Rebalance Crawl Store DBs in SP2013

In SharePoint 2013, simply adding a new Crawl Store DB doesn't cause the SSA to rebalance links among stores, and admins cannot manually trigger rebalancing until the standard deviation of link counts across all existing Crawl Stores exceeds the threshold defined by the SSA property CrawlStoreImbalanceThreshold.

Once this threshold is exceeded, the Search Admin UI displays a control that allows the administrator to initiate the rebalancing process. Specifically, the CrawlStoresAreUnbalanced() method checks whether the standard deviation of link counts across all crawl stores is higher than the value defined by the SSA property CrawlStoreImbalanceThreshold. That said, you may have to set the threshold much lower than expected before CrawlStoresAreUnbalanced() evaluates to TRUE. Another SSA property, CrawlPartitionSplitThreshold, determines the size above which a host's links can be split across multiple Crawl Store DBs during the rebalancing process.

The following example walks through the full process. Most of the calls it uses are methods of the CrawlStorePartitionManager class ( https://msdn.microsoft.com/en-us/library/microsoft.office.server.search.administration.crawlstorepartitionmanager ) rather than cmdlets.

Prior to the rebalancing process, we can see that all links currently exist in a single Crawl Store DB:

Crawl Store DB Name   ContentSourceID   HostID   linkCount
V5_SSA_CrawlStore     1                 4        20,558
V5_SSA_CrawlStore     4                 1        157,671
V5_SSA_CrawlStore     6                 2        14,813
V5_SSA_CrawlStore     6                 3        10,818

 

$SSA = Get-SPEnterpriseSearchServiceApplication

New-SPEnterpriseSearchCrawlDatabase -SearchApplication $SSA -DatabaseName V5_SSA_CrawlStore2

New-SPEnterpriseSearchCrawlDatabase -SearchApplication $SSA -DatabaseName V5_SSA_CrawlStore3 

$foo = new-Object Microsoft.Office.Server.Search.Administration.CrawlStorePartitionManager($SSA)

$foo.CrawlStoresAreUnbalanced()

False 

$ssa.GetProperty("CrawlStoreImbalanceThreshold")

10000000 # 10 million (this is the default value)

$ssa.SetProperty("CrawlStoreImbalanceThreshold",10000)

# Optionally, verify in the registry on the crawl component server that the value has changed

# ex: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Applications\1d330903-aad9-47e2-9373-f30e945c933c-crawl-0\CatalogNames

$foo.CrawlStoresAreUnbalanced()

True # After lowering the threshold, it's no longer "balanced"
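A quick back-of-the-envelope check shows why lowering the threshold flips the result. The sketch below plugs in the totals from the "before" table (all 203,860 links in V5_SSA_CrawlStore, with the two new stores still empty). It assumes the method compares a population standard deviation of per-store counts; the exact formula SharePoint uses internally isn't documented here, and the update at the end of this post notes that it works from per-database document counts, which may differ from link counts.

```powershell
# Link counts per crawl store right after adding the two new, empty stores.
# 203,860 = 20,558 + 157,671 + 14,813 + 10,818 from the "before" table.
$counts = @(203860, 0, 0)

# Population standard deviation (assumption: SharePoint's exact formula is not documented here)
$mean  = ($counts | Measure-Object -Average).Average
$sumSq = 0.0
foreach ($c in $counts) { $sumSq += [math]::Pow($c - $mean, 2) }
$stdDev = [math]::Sqrt($sumSq / $counts.Count)   # about 96,100

$stdDev -gt 10000000   # False: below the default CrawlStoreImbalanceThreshold
$stdDev -gt 10000      # True: above the lowered threshold
```

Using a sample standard deviation instead (about 117,700) would not change either comparison.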

$ssa.GetProperty("CrawlPartitionSplitThreshold")

10000000 # 10 million (this is the default value)

$ssa.SetProperty("CrawlPartitionSplitThreshold",50000)

# This allows any partition with more than 50,000 items to be split across Crawl Stores during rebalancing
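To see which partitions the new CrawlPartitionSplitThreshold makes eligible for splitting, the sketch below filters the rows from the "before" table. It assumes a "partition" corresponds to one ContentSourceID/HostID row from the SQL output, which is an interpretation rather than documented behavior. With a 50,000 threshold only HostID 1 qualifies, which is consistent with the split shown after rebalancing.

```powershell
# Rows from the "before" table (ContentSourceID, HostID, link count)
$partitions = @(
    [pscustomobject]@{ ContentSourceID = 1; HostID = 4; LinkCount = 20558 }
    [pscustomobject]@{ ContentSourceID = 4; HostID = 1; LinkCount = 157671 }
    [pscustomobject]@{ ContentSourceID = 6; HostID = 2; LinkCount = 14813 }
    [pscustomobject]@{ ContentSourceID = 6; HostID = 3; LinkCount = 10818 }
)
$splitThreshold = 50000   # value set via CrawlPartitionSplitThreshold above

# Partitions larger than the threshold are candidates for splitting across stores
$splitCandidates = @($partitions | Where-Object { $_.LinkCount -gt $splitThreshold })
$splitCandidates | Format-Table ContentSourceID, HostID, LinkCount
```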

$foo.BeginCrawlStoreRebalancing()

Guid
----
f9923696-76f1-482d-96cd-c10aedd92fa2

$foo.TimeToCompletion("f9923696-76f1-482d-96cd-c10aedd92fa2")

# Repeat as needed using GUID from above...

$foo.Completed("f9923696-76f1-482d-96cd-c10aedd92fa2")

True
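Rather than re-running Completed() by hand, a small polling loop can wait for the rebalancing job to finish. This is a minimal sketch: Wait-Condition is a hypothetical helper, and the 30-second interval and 2-hour timeout are arbitrary defaults you may want to adjust.

```powershell
# Hypothetical helper: polls a condition until it returns $true or the timeout expires
function Wait-Condition {
    param(
        [scriptblock]$Condition,
        [int]$IntervalSeconds = 30,
        [int]$TimeoutSeconds = 7200
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }   # job finished
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $false                            # gave up after the timeout
}

# Usage on the SharePoint server, with $foo and the GUID from above:
# Wait-Condition -Condition { $foo.Completed("f9923696-76f1-482d-96cd-c10aedd92fa2") }
```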

 

After the rebalance completes, use SQL queries such as the following to confirm the new distribution:

SELECT ContentSourceID, HostID, COUNT(*) AS linkCount FROM [V5_SSA_CrawlStore].[dbo].[MSSCrawlURL] with (nolock) group by ContentSourceID, HostID order by ContentSourceID, HostID

SELECT ContentSourceID, HostID, COUNT(*) AS linkCount FROM [V5_SSA_CrawlStore2].[dbo].[MSSCrawlURL] with (nolock) group by ContentSourceID, HostID order by ContentSourceID, HostID

SELECT ContentSourceID, HostID, COUNT(*) AS linkCount FROM [V5_SSA_CrawlStore3].[dbo].[MSSCrawlURL] with (nolock) group by ContentSourceID, HostID order by ContentSourceID, HostID
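The three per-database queries can also be combined into a single result set. The sketch below only builds that T-SQL text in PowerShell for the crawl store names used in this walkthrough; the CrawlStore column alias is an illustrative addition, and you would run the resulting query in SQL Server Management Studio or a similar tool.

```powershell
# Crawl store databases from this walkthrough
$crawlStores = @('V5_SSA_CrawlStore', 'V5_SSA_CrawlStore2', 'V5_SSA_CrawlStore3')

# One SELECT per crawl store, tagged with the database name
$selects = foreach ($db in $crawlStores) {
    "SELECT '$db' AS CrawlStore, ContentSourceID, HostID, COUNT(*) AS linkCount " +
    "FROM [$db].[dbo].[MSSCrawlURL] WITH (NOLOCK) " +
    "GROUP BY ContentSourceID, HostID"
}

# Combine into one statement; ORDER BY applies to the whole UNION ALL result
$query = ($selects -join "`nUNION ALL`n") + "`nORDER BY CrawlStore, ContentSourceID, HostID"
$query
```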

Crawl Store DB Name   ContentSourceID   HostID   linkCount
V5_SSA_CrawlStore     1                 4        20,558
V5_SSA_CrawlStore     6                 2        14,813
V5_SSA_CrawlStore     6                 3        10,818
V5_SSA_CrawlStore2    4                 1        77,836
V5_SSA_CrawlStore3    4                 1        79,835

 

These results confirm that the Crawl Store DBs were rebalanced, and they illustrate how a single HostID can be split across crawl stores (in this case, HostID 1 was split across CrawlStore2 and CrawlStore3).
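To double-check that rebalancing moved links rather than dropping or duplicating them, per-host totals before and after can be compared. The sketch below hard-codes the numbers from the two tables above; in practice you would feed it the output of the SQL queries. For HostID 1, 77,836 + 79,835 adds back up to the original 157,671.

```powershell
# Link counts per HostID before rebalancing (all in V5_SSA_CrawlStore)
$before = @{ 1 = 157671; 2 = 14813; 3 = 10818; 4 = 20558 }

# Rows after rebalancing, one per crawl store and HostID
$after = @(
    [pscustomobject]@{ Store = 'V5_SSA_CrawlStore';  HostID = 4; LinkCount = 20558 }
    [pscustomobject]@{ Store = 'V5_SSA_CrawlStore';  HostID = 2; LinkCount = 14813 }
    [pscustomobject]@{ Store = 'V5_SSA_CrawlStore';  HostID = 3; LinkCount = 10818 }
    [pscustomobject]@{ Store = 'V5_SSA_CrawlStore2'; HostID = 1; LinkCount = 77836 }
    [pscustomobject]@{ Store = 'V5_SSA_CrawlStore3'; HostID = 1; LinkCount = 79835 }
)

# Sum post-rebalance counts per HostID across all stores
$afterTotals = @{}
foreach ($row in $after) { $afterTotals[$row.HostID] += $row.LinkCount }

# Compare per-host totals; Match should be True for every host
$check = foreach ($hostId in ($before.Keys | Sort-Object)) {
    [pscustomobject]@{
        HostID = $hostId
        Before = $before[$hostId]
        After  = $afterTotals[$hostId]
        Match  = ($before[$hostId] -eq $afterTotals[$hostId])
    }
}
$check | Format-Table
```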

 

Update: I've recently had several people reach out to me after reading this TechNet article, which states:

“In SharePoint Server 2010, host distribution rules are used to associate a host with a specific crawl database. Because of changes in the search system architecture, SharePoint Server 2013 does not use host distribution rules. Instead, Search service application administrators can determine whether the crawl database should be rebalanced by monitoring the Databases view in the crawl log”

In response, I've just published Why Host Distribution Rules Don't Apply to SharePoint 2013.

 

Update: For reference, use the following PowerShell to determine the document counts being used by CrawlStoresAreUnbalanced() to calculate the standard deviation among all crawl stores:

$crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $SSA

$dbHashtable = $crawlLog.GetCrawlDatabaseInfo()

$dbHashtable.Keys
    Guid
    ----
    5bf0290a-ad4c-4462-a7b2-6892be9431c1
    9e95a69e-7129-4d96-aaff-d577c4663cb3

$dbHashtable["5bf0290a-ad4c-4462-a7b2-6892be9431c1"]
    DocumentCount : 5094767
    Partitions : {msdn.microsoft.com, technet.microsoft....
    ID : 5bf0290a-ad4c-4462-a7b2-6892be9431c1
    Name : V5_SSA_CrawlStoreToo

$dbHashtable["9e95a69e-7129-4d96-aaff-d577c4663cb3"]
    DocumentCount : 188343
    Partitions : {{853da760-f456-4375-a77b-8e41bc218770}...
    ID : 9e95a69e-7129-4d96-aaff-d577c4663cb3
    Name : V5_SSA_CrawlStore
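To reproduce the calculation from these values, the sketch below takes the DocumentCount property of each entry in $dbHashtable.Values and computes a standard deviation. The function name is hypothetical and the population (rather than sample) formula is an assumption. With just the two counts shown above, the population formula gives half their difference, 2,453,212.

```powershell
# Hypothetical helper: population standard deviation of DocumentCount across crawl databases
function Get-CrawlStoreDocumentCountStdDev {
    param([Parameter(Mandatory)][object[]]$DatabaseInfo)
    # Pull the DocumentCount from each crawl database entry
    $counts = @($DatabaseInfo | ForEach-Object { [double]$_.DocumentCount })
    $mean = ($counts | Measure-Object -Average).Average
    $sumSq = 0.0
    foreach ($c in $counts) { $sumSq += [math]::Pow($c - $mean, 2) }
    [math]::Sqrt($sumSq / $counts.Count)
}

# Usage on the SharePoint server, after running the commands above:
# Get-CrawlStoreDocumentCountStdDev -DatabaseInfo @($dbHashtable.Values)
```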