mdq.NGrams (Transact-SQL)

Outputs a stream of tokens based on an input string. This function is in the mdq schema and is available only in the Master Data Services database.

Transact-SQL Syntax Conventions

Syntax

mdq.NGrams(input,n,padSpace)

Arguments

  • input
    Is the input string to create tokens from. input is nvarchar(4000) with no default.

  • n
    Specifies the length of each token. n is tinyint with a default value of 3. Valid values are 1 through 255.

  • padSpace
    Specifies whether to left-pad and right-pad the input. padSpace is bit with a default value of 0. A value of 0 pads the beginning and end of the input with characters. A value of 1 pads the beginning and end of the input with space characters.
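The following sketch runs against the sample database used in the Examples section. It makes two calls. The first uses the documented defaults (n = 3, padSpace = 0). The second requests bigrams padded with spaces.

It assumes that the DEFAULT keyword is accepted for these parameters. Transact-SQL requires that keyword when you call a user-defined function and want a parameter's default value. This topic does not say which characters are used for padding when padSpace is 0, so compare the two outputs yourself rather than relying on a particular token list.

```sql
USE MDM_Sample;
GO
-- Documented defaults: n = 3 (trigrams), padSpace = 0.
-- Assumes DEFAULT is accepted for these parameters, as for other Transact-SQL functions.
SELECT Sequence, Token
FROM mdq.NGrams(N'Northwind', DEFAULT, DEFAULT)
ORDER BY Sequence;

-- Bigrams (n = 2) with the input padded by space characters (padSpace = 1).
SELECT Sequence, Token
FROM mdq.NGrams(N'Northwind', 2, 1)
ORDER BY Sequence;
```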

Table Returned

Column name    Column type      Description

Sequence       int              Is the sequence of the tokens in the result stream.

Token          nvarchar(255)    Is a single token of the specified length.

Remarks

The result is a stream of tokens, also known as a set of n-grams, each of the length specified by n. You can use n-grams to compare strings and to determine approximate matches between them.
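As an illustration of approximate matching, the following sketch compares the distinct trigrams of two strings and computes their Jaccard index: shared tokens divided by all distinct tokens. This is one common way to score n-gram overlap. It is not necessarily how Master Data Services matches strings internally.

The variables @a and @b and the sample value N'Northwynd' are hypothetical. Token equality follows the database collation, which may be case-insensitive.

```sql
USE MDM_Sample;
GO
-- Hypothetical pair of strings to compare.
DECLARE @a nvarchar(4000) = N'Northwind';
DECLARE @b nvarchar(4000) = N'Northwynd';

-- Distinct trigrams of each string (n = 3, padSpace = 0).
WITH A AS (SELECT DISTINCT Token FROM mdq.NGrams(@a, 3, 0)),
     B AS (SELECT DISTINCT Token FROM mdq.NGrams(@b, 3, 0))
SELECT
    -- Jaccard index: |A intersect B| / |A union B|; NULL if both sets are empty.
    CAST((SELECT COUNT(*) FROM A INNER JOIN B ON A.Token = B.Token) AS float)
    / NULLIF((SELECT COUNT(*)
              FROM (SELECT Token FROM A UNION SELECT Token FROM B) AS U), 0)
        AS JaccardSimilarity;
```

A value closer to 1 indicates that the two strings share more of their trigrams.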

Permissions

This function is available to the public role.

Examples

The following example splits the input string into a stream of trigrams (tokens that are three characters in length).

USE MDM_Sample;
GO
SELECT * FROM mdq.NGrams(N'Northwind', 3, 0);
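To tokenize every row of a table rather than a single literal, you can call the function once per row with CROSS APPLY. In the following sketch, the table variable @Names and its contents are hypothetical stand-ins for your own data. ORDER BY Sequence is included because SQL Server does not guarantee row order without an ORDER BY clause.

```sql
USE MDM_Sample;
GO
-- Hypothetical input rows; replace with your own table and column.
DECLARE @Names TABLE (Name nvarchar(100));
INSERT INTO @Names (Name) VALUES (N'Northwind'), (N'Contoso');

-- One set of trigrams per input row, returned in token order.
SELECT n.Name, g.Sequence, g.Token
FROM @Names AS n
CROSS APPLY mdq.NGrams(n.Name, 3, 0) AS g
ORDER BY n.Name, g.Sequence;
```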

See Also

Reference

Master Data Services Functions (Transact-SQL)