SharePoint 2013 Search Ranking and Relevancy Part 1: Let’s compare to FS14

I’m very happy to do some “guest” blogging for my good friend Leo and continue diving into various search-related topics. In this and upcoming posts, I’d like to jump right into something that interests me very much: what makes some documents more relevant than others, and which factors influence rank score calculations.

Since SharePoint 2013 is already out, I’d like to touch upon a question that comes up often when someone is considering moving from FAST ESP or FAST for SharePoint 2010 to SharePoint 2013: “So how are rank scores calculated in SharePoint 2013 Search as opposed to previous FAST versions?”

In upcoming posts, I will go deeper into the “internals” of the current SharePoint 2013 ranking model and introduce the basic relevancy calculation concepts that apply across many search engines and are not specific to FAST or SharePoint Search.

There are some excellent blog posts out there that go in-depth on how SharePoint 2013 Search rank models work, including the ones below from Alexey Kozhemiakin and Mikael Svenson.

https://powersearching.wordpress.com/2013/03/29/how-sharepoint-2013-ranking-models-work/

https://techmikael.blogspot.com/2013/04/rank-models-in-2013main-differences.html

To avoid being repetitive, I’ve tried to create an easy-to-read chart comparing the factors that influence rank calculations in FS14 and in SharePoint 2013 Search. I may update this chart in the future to include FAST ESP, although the main factors in ESP and FS14 are fairly similar to each other, whereas SharePoint 2013 Search is more closely related to the SharePoint 2010 Search model.

One of the main differences is that SharePoint 2013 Search uses a two-stage process for rank calculations: a linear ranking model as the 1st stage and a Neural Network as the 2nd stage. Stage 1 is “light”, so we can afford to apply it, with its specific set of rank features, to all documents in a result set. The top 1000 documents (candidates) by Stage 1 rank are the input to Stage 2. Stage 2 is more performance-intensive and re-computes the rank score for its input documents, which is why it is applied only to a limited set. It consists of all the same rank features as Stage 1 plus 4 additional Proximity features.
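To make the two-stage idea concrete, here is a minimal Python sketch. It is not SharePoint's implementation: the feature names and weights are made up for illustration, the second stage is linear rather than a Neural Network, a single "proximity" feature stands in for the 4 Proximity features, and I'm assuming that documents which miss the Stage 1 cut simply keep their Stage 1 order below the re-ranked candidates.

```python
from typing import Dict, List

Features = Dict[str, float]  # hypothetical: rank feature name -> value

# Hypothetical weights, chosen only for illustration.
STAGE1_WEIGHTS = {"title_match": 2.0, "body_match": 1.0, "click_score": 0.5}
# Stage 2 reuses the Stage 1 features and adds proximity on top.
PROXIMITY_WEIGHTS = {"proximity": 1.5}


def linear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of the features; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def two_stage_rank(docs: List[Features], top_n: int = 1000) -> List[int]:
    """Return document indices ordered best-first."""
    # Stage 1: cheap score, applied to every document in the result set.
    by_stage1 = sorted(
        range(len(docs)),
        key=lambda i: linear_score(docs[i], STAGE1_WEIGHTS),
        reverse=True,
    )
    candidates, rest = by_stage1[:top_n], by_stage1[top_n:]

    # Stage 2: more expensive score, re-computed only for the candidates.
    stage2_weights = {**STAGE1_WEIGHTS, **PROXIMITY_WEIGHTS}
    reranked = sorted(
        candidates,
        key=lambda i: linear_score(docs[i], stage2_weights),
        reverse=True,
    )
    # Assumption: non-candidates keep their Stage 1 order after the candidates.
    return reranked + rest


docs = [
    {"title_match": 1.0, "proximity": 0.0},
    {"title_match": 0.9, "proximity": 1.0},
    {"body_match": 0.1, "proximity": 5.0},
]
order = two_stage_rank(docs, top_n=2)
```

With `top_n=2`, the third document never reaches Stage 2, so its high proximity value cannot lift it above the two candidates. That is the trade-off of limiting the expensive stage to a candidate set.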

For my comparison below, I mainly used a model called “Search Ranking Model with Two Linear Stages”, which has been available as of the August 2013 CU. This model is the recommended template for creating custom rank models, as it provides proximity without a Neural Network.

Rank Factor comparison: FS14 vs. SP2013 Search

Rank Models
FS14: 1 OOTB rank model
SP2013 Search: 16 rank models

Freshness
FS14: Available OOTB and customizable
SP2013 Search: N/A OOTB, but can be configured

Dynamic Ranking (field weighting/managed properties)
FS14: Context Boost: Title, DocSubject, Keywords, DocKeywords, urlkeywords, Description, Author, CreatedBy, ModifiedBy, MetadataAuthor, WorkEmail, Body, crawledpropertiescontent
SP2013 Search: Document MPs + Usage/Social data: Title, QLogClickedText, SocialTag, Filename, Author, AnchorText, body

FileType
FS14: Field-Boost weight/Managed Property Boost (OOTB -4000 points):
    Format: Unknown Format, XML, XLS
    FileExtension: CSV, TXT, MSG, OFT, ZIP, VSD, RTF
    IsEmptyList, IsListItem
SP2013 Search: FileType rank feature: PPT, SharePoint site, DOC, HTML, ListItems, Image, Message, XLS, TXT

Language
FS14: N/A
SP2013 Search: Dynamic Rank (query-based). LCID, i.e. locale ID, is used.

Social Distance
FS14: N/A
SP2013 Search: Static Rank (colleague relationship to the person issuing the query):
    0 bucket: no colleague relationship
    1 bucket: first-level (direct) relationship
    2 bucket: second-level (indirect) relationship

Static Rank Boost (Query-Independent)
FS14: Quality Weight components: hwboost, docrank, siterank, urldepthrank
    Authority Weight: Partial and Complete
SP2013 Search: Now part of the Analytics Processing Component. Static Rank features calculated with Search and Usage Analytics: QLogClicks, QLogSkips, QLogLastClicks, EventRate

Proximity
FS14: Enabled by default
SP2013 Search: MinSpan (Neural Network 2nd stage, parameters for proximity minimal span)

Anchortext (Query-Dependent)
FS14: Extnumocc = part of Dynamic Rank calculations, query-time hits in anchortext
SP2013 Search: AnchortextComplete

URLDepth (Query-Dependent)
FS14: N/A. In FS14, this was a static rank feature.
SP2013 Search: UrlDepth: depth of the document URL (number of slashes)

Click-Through Weight (Query-Dependent)
FS14: Query-Authority weight: click-through weight, dynamic rank
SP2013 Search: N/A. Now part of the static rank features used in the Analytics Processing Component (QLogClicks, etc.)
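Two of the SP2013 features in the chart are simple enough to sketch in a few lines of Python. This is only an illustration of the concepts, not how SharePoint computes them: I'm assuming that URL depth counts the slashes in the path portion of the URL, and that a minimal span is the shortest run of consecutive tokens containing every query term. The function names are my own.

```python
from collections import Counter
from typing import List, Optional
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Number of slashes in the URL path (assumed definition of depth)."""
    return urlparse(url).path.count("/")


def min_span(tokens: List[str], query_terms: List[str]) -> Optional[int]:
    """Length in tokens of the shortest window containing every query term.

    Returns None if some term never occurs or the query is empty.
    """
    needed = {t.lower() for t in query_terms}
    if not needed:
        return None
    counts: Counter = Counter()
    covered = 0  # how many distinct query terms the current window contains
    best: Optional[int] = None
    left = 0
    for right, tok in enumerate(tokens):
        tok = tok.lower()
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            left_tok = tokens[left].lower()
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


text = "sharepoint search ranking model for search relevancy ranking".split()
depth = url_depth("http://server/sites/a/doc.docx")
span_close = min_span(text, ["search", "ranking"])
span_far = min_span(text, ["sharepoint", "relevancy"])
```

A smaller span means the query terms sit closer together in the document, which is the intuition behind using proximity as a rank signal.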

 

Rank Tuning comparison: FS14 vs. SP2013 Search

GUI-based applications: ease of tuning rank calculations and user-friendliness
FS14: N/A. Rank calculations and scores can be seen either via ranklog output or via Codeplex tools such as FS4SP Query Logger. However, there isn’t a user-friendly tool to help you make the changes and push them live, or preferably see them in “Preview” mode offline. A separate ‘spreladmin’ tool is needed for click analysis.
SP2013 Search: Rank Tuning App (coming soon). A GUI-based and user-friendly way to tune/customize ranking and impact relevancy. Includes a “preview”, i.e. offline, mode.

Rank logging availability
FS14:
    Server-side: Ranklog is available via QRServer output. However, it is server-side and only available to Admins with local access to QRServer port 13280.
    Client-side: N/A
SP2013 Search:
    Server-side: Rank tuning app/ULS logs
    Client-side: ExplainRank template available to clients.
    https://powersearching.wordpress.com/2013/01/25/explain-rank-in-sharepoint-2013-search/