Configure the scoring algorithm in Azure Cognitive Search
Depending on the age of your search service, Azure Cognitive Search supports two scoring algorithms for assigning relevance to results in a full text search query:
- An Okapi BM25 algorithm, used in all search services created after July 15, 2020
- A classic similarity algorithm, used by all search services created before July 15, 2020
BM25 ranking is the default because it tends to produce search rankings that align better with user expectations. It includes parameters for tuning results based on factors such as document size. For search services created after July 2020, BM25 is the sole scoring algorithm. If you try to set "similarity" to ClassicSimilarity on a new service, an HTTP 400 error will be returned because that algorithm is not supported by the service.
For older services, classic similarity remains the default algorithm. Older services can upgrade to BM25 on a per-index basis. When switching from classic to BM25, you can expect to see some differences in how search results are ordered.
Set BM25 parameters
BM25 similarity adds two parameters to control the relevance score calculation. To set "similarity" parameters, issue a Create or Update Index request as illustrated by the following example.
PUT https://[service-name].search.windows.net/indexes/[index-name]?api-version=2020-06-30&allowIndexDowntime=true
{
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
    "b": 0.5,
    "k1": 1.3
  }
}
Because Cognitive Search won't allow updates to a live index, you'll need to take the index offline so that the parameters can be added. Indexing and query requests will fail while the index is offline. The duration of the outage is the amount of time it takes to update the index, usually no more than several seconds. When the update is complete, the index comes back automatically. To take the index offline, append the "allowIndexDowntime=true" URI parameter on the request that sets the "similarity" property.
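As a rough illustration of the request described above, the following Python sketch builds (but does not send) the PUT request using only the standard library. The function name `build_bm25_update_request`, the `index_definition` dictionary, and the placeholder service name and key are hypothetical. The sketch also assumes that the request body carries the full index definition with the "similarity" section merged in, and that authentication uses an `api-key` header; check both against the REST API reference before relying on them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_bm25_update_request(service_name, index_definition, api_key, b=0.75, k1=1.2):
    """Build (without sending) a PUT request that sets BM25 parameters on an index."""
    # The property reference in this article gives 0 to 1 as the range for b.
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be between 0 and 1")

    # Shallow-copy the definition so the caller's dictionary is left untouched.
    body = dict(index_definition)
    body["similarity"] = {
        "@odata.type": "#Microsoft.Azure.Search.BM25Similarity",
        "b": b,
        "k1": k1,
    }

    # allowIndexDowntime=true takes the index offline for the duration of the update.
    query = urlencode({"api-version": "2020-06-30", "allowIndexDowntime": "true"})
    index_name = quote(body["name"], safe="")
    url = f"https://{service_name}.search.windows.net/indexes/{index_name}?{query}"

    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


# Hypothetical values for illustration only.
request = build_bm25_update_request(
    "my-service", {"name": "hotels", "fields": []}, "<admin-api-key>", b=0.5, k1=1.3
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would issue the call. Because the index is offline while the update runs, it is probably best to schedule the request for a period of low query traffic.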
BM25 property reference
| Property | Type | Description |
|---|---|---|
| k1 | number | Controls the scaling function between the term frequency of each matching term and the final relevance score of a document-query pair. Values are usually 0.0 to 3.0, with 1.2 as the default. A value of 0.0 represents a "binary model", where the contribution of a single matching term is the same for all matching documents, regardless of how many times that term appears in the text, while a larger k1 value allows the score to continue to increase as more instances of the same term are found in the document. Using a higher k1 value can be important in cases where we expect multiple terms to be part of a search query. In those cases, we might want to favor documents that match many of the different query terms being searched over documents that only match a single one, multiple times. For example, when querying the index for documents containing the terms "Apollo Spaceflight", we might want to lower the score of an article about Greek Mythology that contains the term "Apollo" a few dozen times, without mentions of "Spaceflight", compared to another article that explicitly mentions both "Apollo" and "Spaceflight" only a handful of times. |
| b | number | Controls how the length of a document affects the relevance score. Values are between 0 and 1, with 0.75 as the default. A value of 0.0 means the length of the document will not influence the score, while a value of 1.0 means the impact of term frequency on relevance score will be fully normalized by the document's length. Normalizing the term frequency by the document's length is useful in cases where we want to penalize longer documents. In some cases, longer documents (such as a complete novel) are more likely to contain many irrelevant terms, compared to much shorter documents. |
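To make the effect of the two parameters more concrete, the following Python sketch computes the term-frequency component of the textbook BM25 formula. It is an illustration only: the function name `bm25_term_weight` is hypothetical, the inverse document frequency factor is omitted, and the exact computation inside the search service may differ from this simplified form.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of the textbook BM25 formula (IDF factor omitted).

    tf          -- number of times the term occurs in the document
    doc_len     -- length of the document, in terms
    avg_doc_len -- average document length across the index, in terms
    """
    if tf == 0:
        return 0.0
    # b blends between no length normalization (b = 0) and full normalization (b = 1).
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding to the score.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# With k1 = 0 every matching document gets the same weight ("binary model").
print(bm25_term_weight(1, 100, 100, k1=0.0), bm25_term_weight(50, 100, 100, k1=0.0))

# With b = 0 document length has no influence on the weight.
print(bm25_term_weight(5, 100, 100, b=0.0), bm25_term_weight(5, 10_000, 100, b=0.0))

# With b = 1 a document ten times longer than average is penalized.
print(bm25_term_weight(5, 100, 100, b=1.0), bm25_term_weight(5, 1_000, 100, b=1.0))
```

In this form the weight never exceeds k1 + 1, which is why a larger k1 leaves more room for additional occurrences of the same term to raise the score.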
Enable BM25 scoring on older services
If you're running a search service that was created from March 2014 through July 15, 2020, you can enable BM25 by setting a "similarity" property on new indexes. The property is only exposed on new indexes, so if you want BM25 on an existing index, you must drop and rebuild the index with a "similarity" property set to "Microsoft.Azure.Search.BM25Similarity".
Once an index exists with a "similarity" property, you can switch between BM25Similarity and ClassicSimilarity.
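The switch can be sketched in Python as a small helper that rewrites the "similarity" section of an index definition held as a dictionary. The helper name `with_similarity` and the sample definition are hypothetical, and the `#Microsoft.Azure.Search.ClassicSimilarity` type string is assumed by analogy with the BM25 type string used in this article. The result would still need to be sent to the service in an update request.

```python
import copy

# Type strings for the "@odata.type" field. The classic one is an assumption,
# formed by analogy with the BM25 string shown in this article.
SIMILARITY_TYPES = {
    "bm25": "#Microsoft.Azure.Search.BM25Similarity",
    "classic": "#Microsoft.Azure.Search.ClassicSimilarity",
}


def with_similarity(index_definition, algorithm):
    """Return a copy of the index definition that uses the given algorithm."""
    if "similarity" not in index_definition:
        # Switching is only possible once the index has a "similarity" property.
        raise ValueError("index was created without a 'similarity' property")
    if algorithm not in SIMILARITY_TYPES:
        raise ValueError(f"unknown algorithm: {algorithm!r}")

    updated = copy.deepcopy(index_definition)
    # Start from a fresh section so BM25-only parameters (b, k1) are not carried over.
    updated["similarity"] = {"@odata.type": SIMILARITY_TYPES[algorithm]}
    return updated


# Hypothetical index definition for illustration only.
index = {
    "name": "hotels",
    "fields": [{"name": "id", "type": "Edm.String", "key": True}],
    "similarity": {"@odata.type": SIMILARITY_TYPES["bm25"], "b": 0.5, "k1": 1.3},
}
print(with_similarity(index, "classic")["similarity"])
```

Returning a modified copy, rather than editing the dictionary in place, keeps the original definition available in case the switch needs to be reverted.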
The following table lists the Similarity property in each of the Azure SDK client libraries.
| Client library | Similarity property |
|---|---|
| .NET | SearchIndex.Similarity |
| Java | SearchIndex.setSimilarity |
| JavaScript | SearchIndex.Similarity |
| Python | similarity property on SearchIndex |
REST example
You can also use the REST API. The following example creates a new index with the "similarity" property set to BM25:
PUT https://[service-name].search.windows.net/indexes/[index-name]?api-version=2020-06-30
{
  "name": "indexName",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true
    },
    {
      "name": "name",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "en.lucene"
    },
    ...
  ],
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
  }
}