LexicalTokenizerName Class
- java.lang.Object
- com.azure.core.util.ExpandableStringEnum<T>
- com.azure.search.documents.indexes.models.LexicalTokenizerName

public final class LexicalTokenizerName
extends ExpandableStringEnum<LexicalTokenizerName>
Defines the names of all tokenizers supported by the search engine.
Field Summary
Modifier and Type | Field and Description
---|---
static final LexicalTokenizerName | CLASSIC: Grammar-based tokenizer that is suitable for processing most European-language documents.
static final LexicalTokenizerName | EDGE_NGRAM: Tokenizes the input from an edge into n-grams of the given size(s).
static final LexicalTokenizerName | KEYWORD: Emits the entire input as a single token.
static final LexicalTokenizerName | LETTER: Divides text at non-letters.
static final LexicalTokenizerName | LOWERCASE: Divides text at non-letters and converts them to lower case.
static final LexicalTokenizerName | MICROSOFT_LANGUAGE_STEMMING_TOKENIZER: Divides text using language-specific rules and reduces words to their base forms.
static final LexicalTokenizerName | MICROSOFT_LANGUAGE_TOKENIZER: Divides text using language-specific rules.
static final LexicalTokenizerName | NGRAM: Tokenizes the input into n-grams of the given size(s).
static final LexicalTokenizerName | PATH_HIERARCHY: Tokenizer for path-like hierarchies.
static final LexicalTokenizerName | PATTERN: Tokenizer that uses regex pattern matching to construct distinct tokens.
static final LexicalTokenizerName | STANDARD: Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter and stop filter.
static final LexicalTokenizerName | UAX_URL_EMAIL: Tokenizes URLs and emails as one token.
static final LexicalTokenizerName | WHITESPACE: Divides text at whitespace.
Constructor Summary
Constructor | Description
---|---
LexicalTokenizerName() | Deprecated. Use the fromString(String name) factory method. Creates a new instance of LexicalTokenizerName.
Method Summary
Modifier and Type | Method and Description
---|---
static LexicalTokenizerName | fromString(String name): Creates or finds a LexicalTokenizerName from its string representation.
static Collection<LexicalTokenizerName> | values(): Gets known LexicalTokenizerName values.
Methods inherited from ExpandableStringEnum
Methods inherited from java.lang.Object
Field Details
CLASSIC
public static final LexicalTokenizerName CLASSIC
Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html.
EDGE_NGRAM
public static final LexicalTokenizerName EDGE_NGRAM
Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html.
KEYWORD
public static final LexicalTokenizerName KEYWORD
Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html.
LETTER
public static final LexicalTokenizerName LETTER
Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html.
LOWERCASE
public static final LexicalTokenizerName LOWERCASE
Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html.
MICROSOFT_LANGUAGE_STEMMING_TOKENIZER
public static final LexicalTokenizerName MICROSOFT_LANGUAGE_STEMMING_TOKENIZER
Divides text using language-specific rules and reduces words to their base forms.
MICROSOFT_LANGUAGE_TOKENIZER
public static final LexicalTokenizerName MICROSOFT_LANGUAGE_TOKENIZER
Divides text using language-specific rules.
NGRAM
public static final LexicalTokenizerName NGRAM
Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html.
PATH_HIERARCHY
public static final LexicalTokenizerName PATH_HIERARCHY
Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html.
PATTERN
public static final LexicalTokenizerName PATTERN
Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html.
STANDARD
public static final LexicalTokenizerName STANDARD
Standard Lucene analyzer; composed of the standard tokenizer, lowercase filter and stop filter. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html.
UAX_URL_EMAIL
public static final LexicalTokenizerName UAX_URL_EMAIL
Tokenizes URLs and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html.
WHITESPACE
public static final LexicalTokenizerName WHITESPACE
Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html.
Constructor Details
LexicalTokenizerName
@Deprecated
public LexicalTokenizerName()
Deprecated. Use the fromString(String name) factory method.
Creates a new instance of LexicalTokenizerName.
Method Details
fromString
public static LexicalTokenizerName fromString(String name)
Creates or finds a LexicalTokenizerName from its string representation.
Parameters:
name - a name to look for.
Returns:
the corresponding LexicalTokenizerName.
values
public static Collection<LexicalTokenizerName> values()
Gets known LexicalTokenizerName values.
Returns:
known LexicalTokenizerName values.
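Because the class extends ExpandableStringEnum, fromString never rejects an unknown name: the service can introduce new tokenizer names without breaking older client versions. The sketch below is not the SDK source; it is a minimal, self-contained illustration (class and constant names are chosen for the example) of the create-or-find behavior that fromString and values provide:

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of an expandable string "enum" in the style of
// LexicalTokenizerName: well-known constants plus an open-ended
// fromString lookup that also accepts names this client has never seen.
public final class TokenizerNameSketch {
    // Interning map: one instance per distinct string.
    private static final Map<String, TokenizerNameSketch> VALUES = new ConcurrentHashMap<>();

    // A couple of well-known constants, registered through fromString.
    public static final TokenizerNameSketch CLASSIC = fromString("classic");
    public static final TokenizerNameSketch WHITESPACE = fromString("whitespace");

    private final String name;

    private TokenizerNameSketch(String name) {
        this.name = name;
    }

    // Creates or finds an instance for the given string. Unknown names are
    // accepted rather than rejected, so newer service-defined tokenizers
    // still round-trip through an older client.
    public static TokenizerNameSketch fromString(String name) {
        return VALUES.computeIfAbsent(name, TokenizerNameSketch::new);
    }

    // Returns every instance seen so far (constants plus dynamic names).
    public static Collection<TokenizerNameSketch> values() {
        return VALUES.values();
    }

    @Override
    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        // The same string always resolves to the same interned instance.
        System.out.println(fromString("classic") == CLASSIC);        // true
        // A name unknown to this client version is still usable.
        System.out.println(fromString("some_future_tokenizer"));     // some_future_tokenizer
        System.out.println(values().size() >= 3);                    // true
    }
}
```

The design choice to intern instances in a map, rather than a closed enum, is what lets equality checks like `name == LexicalTokenizerName.CLASSIC`-style comparisons coexist with service-side extensibility.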
Applies to
Azure SDK for Java