TokenFilterName Class

public final class TokenFilterName
extends ExpandableStringEnum<TokenFilterName>

Defines the names of all token filters supported by the search engine.

Field Summary

Modifier and Type Field and Description
static final TokenFilterName APOSTROPHE

Strips all characters after an apostrophe (including the apostrophe itself).

static final TokenFilterName ARABIC_NORMALIZATION

A token filter that applies the Arabic normalizer to normalize the orthography.

static final TokenFilterName ASCII_FOLDING

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist.

static final TokenFilterName CJK_BIGRAM

Forms bigrams of CJK terms that are generated from the standard tokenizer.

static final TokenFilterName CJK_WIDTH

Normalizes CJK width differences.

static final TokenFilterName CLASSIC

Removes English possessives, and dots from acronyms.

static final TokenFilterName COMMON_GRAM

Constructs bigrams for frequently occurring terms while indexing.

static final TokenFilterName EDGE_NGRAM

Generates n-grams of the given size(s) starting from the front or the back of an input token.

static final TokenFilterName ELISION

Removes elisions.

static final TokenFilterName GERMAN_NORMALIZATION

Normalizes German characters according to the heuristics of the German2 snowball algorithm.

static final TokenFilterName HINDI_NORMALIZATION

Normalizes text in Hindi to remove some differences in spelling variations.

static final TokenFilterName INDIC_NORMALIZATION

Normalizes the Unicode representation of text in Indian languages.

static final TokenFilterName KEYWORD_REPEAT

Emits each incoming token twice, once as keyword and once as non-keyword.

static final TokenFilterName KSTEM

A high-performance kstem filter for English.

static final TokenFilterName LENGTH

Removes words that are too long or too short.

static final TokenFilterName LIMIT

Limits the number of tokens while indexing.

static final TokenFilterName LOWERCASE

Normalizes token text to lower case.

static final TokenFilterName NGRAM

Generates n-grams of the given size(s).

static final TokenFilterName PERSIAN_NORMALIZATION

Applies normalization for Persian.

static final TokenFilterName PHONETIC

Creates tokens for phonetic matches.

static final TokenFilterName PORTER_STEM

Uses the Porter stemming algorithm to transform the token stream.

static final TokenFilterName REVERSE

Reverses the token string.

static final TokenFilterName SCANDINAVIAN_FOLDING_NORMALIZATION

Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.

static final TokenFilterName SCANDINAVIAN_NORMALIZATION

Normalizes use of the interchangeable Scandinavian characters.

static final TokenFilterName SHINGLE

Creates combinations of tokens as a single token.

static final TokenFilterName SNOWBALL

A filter that stems words using a Snowball-generated stemmer.

static final TokenFilterName SORANI_NORMALIZATION

Normalizes the Unicode representation of Sorani text.

static final TokenFilterName STEMMER

Language-specific stemming filter.

static final TokenFilterName STOPWORDS

Removes stop words from a token stream.

static final TokenFilterName TRIM

Trims leading and trailing whitespace from tokens.

static final TokenFilterName TRUNCATE

Truncates the terms to a specific length.

static final TokenFilterName UNIQUE

Filters out tokens with the same text as the previous token.

static final TokenFilterName UPPERCASE

Normalizes token text to upper case.

static final TokenFilterName WORD_DELIMITER

Splits words into subwords and performs optional transformations on subword groups.

Constructor Summary

Constructor Description
TokenFilterName()

Deprecated

Use the fromString(String name) factory method.

Creates a new instance of TokenFilterName value.

Method Summary

Modifier and Type Method and Description
static TokenFilterName fromString(String name)

Creates or finds a TokenFilterName from its string representation.

static Collection<TokenFilterName> values()

Gets known TokenFilterName values.

Methods inherited from ExpandableStringEnum

Methods inherited from java.lang.Object

Field Details

APOSTROPHE

public static final TokenFilterName APOSTROPHE

Strips all characters after an apostrophe (including the apostrophe itself). See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html.

ARABIC_NORMALIZATION

public static final TokenFilterName ARABIC_NORMALIZATION

A token filter that applies the Arabic normalizer to normalize the orthography. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html.

ASCII_FOLDING

public static final TokenFilterName ASCII_FOLDING

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html.

CJK_BIGRAM

public static final TokenFilterName CJK_BIGRAM

Forms bigrams of CJK terms that are generated from the standard tokenizer. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html.

CJK_WIDTH

public static final TokenFilterName CJK_WIDTH

Normalizes CJK width differences. Folds fullwidth ASCII variants into the equivalent basic Latin, and half-width Katakana variants into the equivalent Kana. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html.

CLASSIC

public static final TokenFilterName CLASSIC

Removes English possessives, and dots from acronyms. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html.

COMMON_GRAM

public static final TokenFilterName COMMON_GRAM

Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html.

EDGE_NGRAM

public static final TokenFilterName EDGE_NGRAM

Generates n-grams of the given size(s) starting from the front or the back of an input token. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html.
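To make the description above concrete, here is a minimal plain-Java sketch of edge n-gram generation. It is not the Lucene or service implementation; the class name `EdgeNGramSketch`, its method names, and the `minGram`/`maxGram` parameters are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: shows what "edge n-grams" means,
// not how the search service produces them.
final class EdgeNGramSketch {

    // N-grams anchored at the front of the token, sizes minGram..maxGram.
    static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int size = minGram; size <= upper; size++) {
            grams.add(token.substring(0, size));
        }
        return grams;
    }

    // N-grams anchored at the back of the token, sizes minGram..maxGram.
    static List<String> backEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int size = minGram; size <= upper; size++) {
            grams.add(token.substring(token.length() - size));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(frontEdgeNGrams("search", 1, 3)); // prints [s, se, sea]
        System.out.println(backEdgeNGrams("search", 1, 3));  // prints [h, ch, rch]
    }
}
```

The sketch counts UTF-16 code units, so it is only suitable for illustrating the idea on simple text.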

ELISION

public static final TokenFilterName ELISION

Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html.
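The "l'avion" example above can be reproduced with a small plain-Java sketch. This is an assumption-laden illustration rather than the actual filter: the class `ElisionSketch`, the method `removeElision`, and the article list are hypothetical, and only the straight apostrophe (') is handled.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: strips a leading elided article from a token.
final class ElisionSketch {

    // Assumed sample of French elided articles; a real filter takes a configurable list.
    static final Set<String> ARTICLES = Set.of("l", "d", "j", "m", "t", "s", "n", "c", "qu");

    static String removeElision(String token) {
        int apostrophe = token.indexOf('\'');
        if (apostrophe > 0) {
            String prefix = token.substring(0, apostrophe).toLowerCase(Locale.ROOT);
            if (ARTICLES.contains(prefix)) {
                // Drop the article and the apostrophe, keep the rest of the token.
                return token.substring(apostrophe + 1);
            }
        }
        // No known article before the apostrophe: leave the token unchanged.
        return token;
    }

    public static void main(String[] args) {
        System.out.println(removeElision("l'avion"));     // prints avion
        System.out.println(removeElision("aujourd'hui")); // prints aujourd'hui
    }
}
```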

GERMAN_NORMALIZATION

public static final TokenFilterName GERMAN_NORMALIZATION

Normalizes German characters according to the heuristics of the German2 snowball algorithm. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html.

HINDI_NORMALIZATION

public static final TokenFilterName HINDI_NORMALIZATION

Normalizes text in Hindi to remove some differences in spelling variations. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html.

INDIC_NORMALIZATION

public static final TokenFilterName INDIC_NORMALIZATION

Normalizes the Unicode representation of text in Indian languages. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html.

KEYWORD_REPEAT

public static final TokenFilterName KEYWORD_REPEAT

Emits each incoming token twice, once as keyword and once as non-keyword. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html.

KSTEM

public static final TokenFilterName KSTEM

A high-performance kstem filter for English. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html.

LENGTH

public static final TokenFilterName LENGTH

Removes words that are too long or too short. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html.

LIMIT

public static final TokenFilterName LIMIT

Limits the number of tokens while indexing. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html.

LOWERCASE

public static final TokenFilterName LOWERCASE

Normalizes token text to lower case. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html.

NGRAM

public static final TokenFilterName NGRAM

Generates n-grams of the given size(s). See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html.

PERSIAN_NORMALIZATION

public static final TokenFilterName PERSIAN_NORMALIZATION

Applies normalization for Persian. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html.

PHONETIC

public static final TokenFilterName PHONETIC

Creates tokens for phonetic matches. See https://lucene.apache.org/core/4\_10\_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html.

PORTER_STEM

public static final TokenFilterName PORTER_STEM

Uses the Porter stemming algorithm to transform the token stream. See http://tartarus.org/~martin/PorterStemmer.

REVERSE

public static final TokenFilterName REVERSE

Reverses the token string. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html.

SCANDINAVIAN_FOLDING_NORMALIZATION

public static final TokenFilterName SCANDINAVIAN_FOLDING_NORMALIZATION

Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against use of double vowels aa, ae, ao, oe and oo, leaving just the first one. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html.

SCANDINAVIAN_NORMALIZATION

public static final TokenFilterName SCANDINAVIAN_NORMALIZATION

Normalizes use of the interchangeable Scandinavian characters. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html.

SHINGLE

public static final TokenFilterName SHINGLE

Creates combinations of tokens as a single token. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html.
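As a rough illustration of "combinations of tokens as a single token", the plain-Java sketch below joins each run of adjacent tokens into one shingle. It is not the Lucene ShingleFilter and makes simplifying assumptions: the class `ShingleSketch`, the fixed `size` parameter, and the single-space separator are hypothetical, and the original single tokens are not emitted alongside the shingles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: builds fixed-size word shingles from a token list.
final class ShingleSketch {

    static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> out = new ArrayList<>();
        // Slide a window of `size` tokens across the list and join each window.
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(List.of("please", "divide", "this", "sentence"), 2));
        // prints [please divide, divide this, this sentence]
    }
}
```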

SNOWBALL

public static final TokenFilterName SNOWBALL

A filter that stems words using a Snowball-generated stemmer. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html.

SORANI_NORMALIZATION

public static final TokenFilterName SORANI_NORMALIZATION

Normalizes the Unicode representation of Sorani text. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html.

STEMMER

public static final TokenFilterName STEMMER

Language-specific stemming filter. See https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search\#TokenFilters.

STOPWORDS

public static final TokenFilterName STOPWORDS

Removes stop words from a token stream. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html.

TRIM

public static final TokenFilterName TRIM

Trims leading and trailing whitespace from tokens. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html.

TRUNCATE

public static final TokenFilterName TRUNCATE

Truncates the terms to a specific length. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html.

UNIQUE

public static final TokenFilterName UNIQUE

Filters out tokens with the same text as the previous token. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html.

UPPERCASE

public static final TokenFilterName UPPERCASE

Normalizes token text to upper case. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html.

WORD_DELIMITER

public static final TokenFilterName WORD_DELIMITER

Splits words into subwords and performs optional transformations on subword groups.

Constructor Details

TokenFilterName

@Deprecated
public TokenFilterName()

Deprecated

Use the fromString(String name) factory method.

Creates a new instance of TokenFilterName value.

Method Details

fromString

public static TokenFilterName fromString(String name)

Creates or finds a TokenFilterName from its string representation.

Parameters:

name - a name to look for.

Returns:

the corresponding TokenFilterName.

values

public static Collection<TokenFilterName> values()

Gets known TokenFilterName values.

Returns:

known TokenFilterName values.
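The "creates or finds" behavior of fromString and its relationship to values() can be sketched with a small self-contained stand-in. This is not the SDK class: `FilterNameSketch` is a hypothetical type that mimics the expandable string enum pattern so the example runs without any dependency, and the strings "lowercase", "trim", and "my_custom_filter" are used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an expandable string enum; not the SDK implementation.
final class FilterNameSketch {

    // Registry of every name seen so far, keyed by its string representation.
    private static final Map<String, FilterNameSketch> VALUES = new ConcurrentHashMap<>();

    // Predefined constants are registered through the same factory method.
    static final FilterNameSketch LOWERCASE = fromString("lowercase");
    static final FilterNameSketch TRIM = fromString("trim");

    private final String name;

    private FilterNameSketch(String name) {
        this.name = name;
    }

    // Returns the existing instance for this name, or creates and registers a new one.
    static FilterNameSketch fromString(String name) {
        if (name == null) {
            return null; // sketch choice: no instance for a null name
        }
        return VALUES.computeIfAbsent(name, FilterNameSketch::new);
    }

    // Returns a snapshot of all names known at the time of the call.
    static Collection<FilterNameSketch> values() {
        return new ArrayList<>(VALUES.values());
    }

    @Override
    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        // Looking up a known name yields the predefined constant itself.
        System.out.println(fromString("lowercase") == LOWERCASE); // prints true

        // An unknown name is created on demand and then appears in values().
        int before = values().size();
        fromString("my_custom_filter");
        System.out.println(values().size() - before); // prints 1
    }
}
```

In this pattern a factory method replaces the public constructor so that equal names map to one shared instance, which fits the deprecation note that points callers to fromString(String name).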
