Beyond Relational– Semantic Search with SQL Server FileTable

Semantic Search new for SQL Server Denali adds extra intelligence to the FullText search capability that has existed in SQL Server for several releases, by looking at the relationships between words and in and across each document.  For example you can find which documents are similar to a specified document, and this could help with matching candidates’ CVs to job adverts, or you can find the key words and phrases in a given set of documents. 

This statistical analysis information  is stored in a special database (which is supplied but not installed in SQL Server Denali) and is refreshed as part of the process that update the FullText indexes it depends on.

Once it is setup you have access to three tabular functions:

Note that in CTP3 only words not phrases work in these functions and that currently doesn’t understand word stems e.g. that cloudy and clouds are derivations of the term cloud and treats each as separate words.  This could be fixed as Denali gets nearer to release but there is no guarantee of this.

There are examples of this in the adventure works database which you can try, but I wanted to see what it would do with my FileTable of PowerPoint presentations I created in my last three blog posts, and have a short screen cast on this ….

and the 4 FileTable screencasts in this series are now linked (hyperlinks at the end of each screencast) so you can hop through the series.

Here are some notes if you want to install semantic search:

  • You’ll need to have installed SQL Server Denali ctp3 with the FullText option. 
  • Having done that you’ll need to install the SemanticLanguageDatabase.msi in the SQL install media (there are x86 and x64 versions). 
  • This database then needs to be attached to SQL Server 
  • It is then registered with this statement:
 EXEC sp_fulltext_semantic_language_statistics_db @dbname = N'Semanticsdb'
  •   You’ll need to add FullText indexing on the FileTables (or any other table)  the extra semantic_statistics clause:
        LANGUAGE 2057 
        TYPE COLUMN file_type 
        LANGUAGE 2057
    KEY INDEX PK_MyDecks_5A5B77D56E0F5D8F
    ON DeepFat_Catalog