TokenFilterName Class
- java.lang.Object
- com.azure.core.util.ExpandableStringEnum<T>
- com.azure.search.documents.indexes.models.TokenFilterName
public final class TokenFilterName
extends ExpandableStringEnum<TokenFilterName>
Defines the names of all token filters supported by the search engine.
Field Summary
Modifier and Type | Field and Description |
---|---|
static final TokenFilterName | APOSTROPHE: Strips all characters after an apostrophe (including the apostrophe itself). |
static final TokenFilterName | ARABIC_NORMALIZATION: A token filter that applies the Arabic normalizer to normalize the orthography. |
static final TokenFilterName | ASCII_FOLDING: Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. |
static final TokenFilterName | CJK_BIGRAM: Forms bigrams of CJK terms that are generated from the standard tokenizer. |
static final TokenFilterName | CJK_WIDTH: Normalizes CJK width differences. |
static final TokenFilterName | CLASSIC: Removes English possessives, and dots from acronyms. |
static final TokenFilterName | COMMON_GRAM: Constructs bigrams for frequently occurring terms while indexing. |
static final TokenFilterName | EDGE_NGRAM: Generates n-grams of the given size(s) starting from the front or the back of an input token. |
static final TokenFilterName | ELISION: Removes elisions. |
static final TokenFilterName | GERMAN_NORMALIZATION: Normalizes German characters according to the heuristics of the German2 snowball algorithm. |
static final TokenFilterName | HINDI_NORMALIZATION: Normalizes text in Hindi to remove some differences in spelling variations. |
static final TokenFilterName | INDIC_NORMALIZATION: Normalizes the Unicode representation of text in Indian languages. |
static final TokenFilterName | KEYWORD_REPEAT: Emits each incoming token twice, once as keyword and once as non-keyword. |
static final TokenFilterName | KSTEM: A high-performance kstem filter for English. |
static final TokenFilterName | LENGTH: Removes words that are too long or too short. |
static final TokenFilterName | LIMIT: Limits the number of tokens while indexing. |
static final TokenFilterName | LOWERCASE: Normalizes token text to lower case. |
static final TokenFilterName | NGRAM: Generates n-grams of the given size(s). |
static final TokenFilterName | PERSIAN_NORMALIZATION: Applies normalization for Persian. |
static final TokenFilterName | PHONETIC: Creates tokens for phonetic matches. |
static final TokenFilterName | PORTER_STEM: Uses the Porter stemming algorithm to transform the token stream. |
static final TokenFilterName | REVERSE: Reverses the token string. |
static final TokenFilterName | SCANDINAVIAN_FOLDING_NORMALIZATION: Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. |
static final TokenFilterName | SCANDINAVIAN_NORMALIZATION: Normalizes use of the interchangeable Scandinavian characters. |
static final TokenFilterName | SHINGLE: Creates combinations of tokens as a single token. |
static final TokenFilterName | SNOWBALL: A filter that stems words using a Snowball-generated stemmer. |
static final TokenFilterName | SORANI_NORMALIZATION: Normalizes the Unicode representation of Sorani text. |
static final TokenFilterName | STEMMER: Language-specific stemming filter. |
static final TokenFilterName | STOPWORDS: Removes stop words from a token stream. |
static final TokenFilterName | TRIM: Trims leading and trailing whitespace from tokens. |
static final TokenFilterName | TRUNCATE: Truncates the terms to a specific length. |
static final TokenFilterName | UNIQUE: Filters out tokens with the same text as the previous token. |
static final TokenFilterName | UPPERCASE: Normalizes token text to upper case. |
static final TokenFilterName | WORD_DELIMITER: Splits words into subwords and performs optional transformations on subword groups. |
Constructor Summary
Constructor | Description |
---|---|
TokenFilterName() | Deprecated. Use the fromString(String name) factory method. Creates a new instance of TokenFilterName value. |
Method Summary
Modifier and Type | Method and Description |
---|---|
static TokenFilterName | fromString(String name): Creates or finds a TokenFilterName from its string representation. |
static Collection<TokenFilterName> | values(): Gets known TokenFilterName values. |
Methods inherited from ExpandableStringEnum
Methods inherited from java.lang.Object
Field Details
APOSTROPHE
public static final TokenFilterName APOSTROPHE
Strips all characters after an apostrophe (including the apostrophe itself). See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html.
ARABIC_NORMALIZATION
public static final TokenFilterName ARABIC_NORMALIZATION
A token filter that applies the Arabic normalizer to normalize the orthography. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html.
ASCII_FOLDING
public static final TokenFilterName ASCII_FOLDING
Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html.
CJK_BIGRAM
public static final TokenFilterName CJK_BIGRAM
Forms bigrams of CJK terms that are generated from the standard tokenizer. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html.
CJK_WIDTH
public static final TokenFilterName CJK_WIDTH
Normalizes CJK width differences. Folds fullwidth ASCII variants into the equivalent basic Latin, and half-width Katakana variants into the equivalent Kana. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html.
CLASSIC
public static final TokenFilterName CLASSIC
Removes English possessives, and dots from acronyms. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html.
COMMON_GRAM
public static final TokenFilterName COMMON_GRAM
Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html.
EDGE_NGRAM
public static final TokenFilterName EDGE_NGRAM
Generates n-grams of the given size(s) starting from the front or the back of an input token. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html.
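As an illustration of what "n-grams starting from the front or the back of an input token" means, here is a minimal plain-Java sketch. It is not the Lucene or Azure implementation; the class `EdgeNGramSketch`, its method names, and the `minGram`/`maxGram` parameters are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Azure SDK or Lucene.
final class EdgeNGramSketch {
    // Returns prefixes of the token with lengths minGram..maxGram (front edge).
    static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int size = minGram; size <= longest; size++) {
            grams.add(token.substring(0, size));
        }
        return grams;
    }

    // Returns suffixes of the token with lengths minGram..maxGram (back edge).
    static List<String> backEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int size = minGram; size <= longest; size++) {
            grams.add(token.substring(token.length() - size));
        }
        return grams;
    }
}
```

With these assumptions, `frontEdgeNGrams("search", 1, 3)` yields `[s, se, sea]` and `backEdgeNGrams("search", 1, 3)` yields `[h, ch, rch]`. The sketch counts Java `char` units, so it does not handle supplementary Unicode characters specially.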
ELISION
public static final TokenFilterName ELISION
Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html.
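The "l'avion" example can be reproduced with a small plain-Java sketch. This is only an approximation of the idea, not the Lucene filter itself; the class `ElisionSketch` and its `ARTICLES` set are assumptions made for illustration, and the real filter's article list may differ.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Azure SDK or Lucene.
final class ElisionSketch {
    // Assumed set of elided articles; the actual filter's list is configurable.
    static final Set<String> ARTICLES = Set.of("l", "d", "j", "m", "t", "s", "n", "c", "qu");

    // Drops a leading elided article and its apostrophe, if present.
    static String removeElision(String token) {
        int apostrophe = token.indexOf('\'');
        if (apostrophe > 0) {
            String prefix = token.substring(0, apostrophe).toLowerCase(Locale.ROOT);
            if (ARTICLES.contains(prefix)) {
                return token.substring(apostrophe + 1);
            }
        }
        return token;
    }
}
```

Under these assumptions, `removeElision("l'avion")` returns `avion`, while a token such as `aujourd'hui` is left unchanged because its prefix is not in the article set.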
GERMAN_NORMALIZATION
public static final TokenFilterName GERMAN_NORMALIZATION
Normalizes German characters according to the heuristics of the German2 snowball algorithm. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html.
HINDI_NORMALIZATION
public static final TokenFilterName HINDI_NORMALIZATION
Normalizes text in Hindi to remove some differences in spelling variations. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html.
INDIC_NORMALIZATION
public static final TokenFilterName INDIC_NORMALIZATION
Normalizes the Unicode representation of text in Indian languages. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html.
KEYWORD_REPEAT
public static final TokenFilterName KEYWORD_REPEAT
Emits each incoming token twice, once as keyword and once as non-keyword. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html.
KSTEM
public static final TokenFilterName KSTEM
A high-performance kstem filter for English. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html.
LENGTH
public static final TokenFilterName LENGTH
Removes words that are too long or too short. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html.
LIMIT
public static final TokenFilterName LIMIT
Limits the number of tokens while indexing. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html.
LOWERCASE
public static final TokenFilterName LOWERCASE
Normalizes token text to lower case. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html.
NGRAM
public static final TokenFilterName NGRAM
Generates n-grams of the given size(s). See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html.
PERSIAN_NORMALIZATION
public static final TokenFilterName PERSIAN_NORMALIZATION
Applies normalization for Persian. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html.
PHONETIC
public static final TokenFilterName PHONETIC
Creates tokens for phonetic matches. See https://lucene.apache.org/core/4\_10\_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html.
PORTER_STEM
public static final TokenFilterName PORTER_STEM
Uses the Porter stemming algorithm to transform the token stream. See http://tartarus.org/~martin/PorterStemmer.
REVERSE
public static final TokenFilterName REVERSE
Reverses the token string. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html.
SCANDINAVIAN_FOLDING_NORMALIZATION
public static final TokenFilterName SCANDINAVIAN_FOLDING_NORMALIZATION
Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against use of double vowels aa, ae, ao, oe and oo, leaving just the first one. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html.
SCANDINAVIAN_NORMALIZATION
public static final TokenFilterName SCANDINAVIAN_NORMALIZATION
Normalizes use of the interchangeable Scandinavian characters. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html.
SHINGLE
public static final TokenFilterName SHINGLE
Creates combinations of tokens as a single token. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html.
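To make "combinations of tokens as a single token" concrete, the following plain-Java sketch joins each run of adjacent tokens into one string. It is a simplified illustration, not the Lucene ShingleFilter: the class `ShingleSketch`, the `size` parameter, and the single-space separator are assumptions, and the real filter has further options (for example, whether the original single tokens are also emitted).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Azure SDK or Lucene.
final class ShingleSketch {
    // Joins every window of `size` adjacent tokens into one space-separated token.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return out;
    }
}
```

For the input tokens `please`, `divide`, `this` and a size of 2, the sketch produces `please divide` and `divide this`.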
SNOWBALL
public static final TokenFilterName SNOWBALL
A filter that stems words using a Snowball-generated stemmer. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html.
SORANI_NORMALIZATION
public static final TokenFilterName SORANI_NORMALIZATION
Normalizes the Unicode representation of Sorani text. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html.
STEMMER
public static final TokenFilterName STEMMER
Language-specific stemming filter. See https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search\#TokenFilters.
STOPWORDS
public static final TokenFilterName STOPWORDS
Removes stop words from a token stream. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html.
TRIM
public static final TokenFilterName TRIM
Trims leading and trailing whitespace from tokens. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html.
TRUNCATE
public static final TokenFilterName TRUNCATE
Truncates the terms to a specific length. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html.
UNIQUE
public static final TokenFilterName UNIQUE
Filters out tokens with the same text as the previous token. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html.
UPPERCASE
public static final TokenFilterName UPPERCASE
Normalizes token text to upper case. See http://lucene.apache.org/core/4\_10\_3/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html.
WORD_DELIMITER
public static final TokenFilterName WORD_DELIMITER
Splits words into subwords and performs optional transformations on subword groups.
Constructor Details
TokenFilterName
@Deprecated
public TokenFilterName()
Deprecated
Use the fromString(String name) factory method.
Creates a new instance of TokenFilterName value.
Method Details
fromString
public static TokenFilterName fromString(String name)
Creates or finds a TokenFilterName from its string representation.
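Parameters:
name - the string representation of the TokenFilterName to create or find.
Returns:
the corresponding TokenFilterName.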
values
public static Collection<TokenFilterName> values()
Gets known TokenFilterName values.
Returns:
the known TokenFilterName values.
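The "creates or finds" behavior of fromString and the "known values" returned by values() can be illustrated with a small stand-alone sketch of the expandable string enum pattern. This is not the SDK's source code: the class `FilterNameSketch`, its registry map, and the string values `"lowercase"` and `"trim"` are assumptions made for this example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an expandable string enum; NOT the SDK's implementation.
final class FilterNameSketch {
    // Registry of every name seen so far, keyed by its string value.
    private static final Map<String, FilterNameSketch> VALUES = new ConcurrentHashMap<>();

    // Predefined constants register themselves through fromString (values assumed).
    static final FilterNameSketch LOWERCASE = fromString("lowercase");
    static final FilterNameSketch TRIM = fromString("trim");

    private final String name;

    private FilterNameSketch(String name) {
        this.name = name;
    }

    // Returns the existing instance for this name, or creates and registers a new one.
    static FilterNameSketch fromString(String name) {
        if (name == null) {
            return null;
        }
        return VALUES.computeIfAbsent(name, FilterNameSketch::new);
    }

    // Returns a read-only view of every instance registered so far.
    static Collection<FilterNameSketch> values() {
        return Collections.unmodifiableCollection(VALUES.values());
    }

    @Override
    public String toString() {
        return name;
    }
}
```

In this sketch, `fromString("lowercase")` returns the same instance as the `LOWERCASE` constant, while `fromString("my_filter")` creates a new instance that subsequently appears in `values()`. This is what lets such a type accept names that were not known when the class was compiled.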
Applies to
Azure SDK for Java