LuceneStandardTokenizer Class
Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.
All required parameters must be populated in order to send to Azure.
- Inheritance
  - azure.search.documents.indexes._generated.models._models_py3.LexicalTokenizer
    - LuceneStandardTokenizer
Constructor
LuceneStandardTokenizer(*, name: str, max_token_length: Optional[int] = 255, **kwargs)
Parameters
- odata_type
- str
Required
Required. Identifies the concrete type of the tokenizer. Constant filled by server.
- name
- str
Required
Required. The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
- max_token_length
- int
Default value: 255
The maximum token length. Default is 255. Tokens longer than the maximum length are split.
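The splitting behavior described for max_token_length can be sketched in plain Python. This is a rough illustration of the documented rule (tokens longer than the maximum are split into maximum-length chunks), not the Lucene implementation, and the helper name `split_long_tokens` is hypothetical:

```python
def split_long_tokens(tokens, max_token_length=255):
    """Illustrative sketch: emit each token, splitting any token longer
    than max_token_length into chunks of at most max_token_length
    characters. Approximates the documented behavior; not the actual
    Apache Lucene tokenizer."""
    out = []
    for token in tokens:
        for start in range(0, len(token), max_token_length):
            out.append(token[start:start + max_token_length])
    return out

# A 12-character token with max_token_length=5 splits into 5 + 5 + 2.
print(split_long_tokens(["internationa"], max_token_length=5))
# → ['inter', 'natio', 'na']
```

In the SDK itself, the tokenizer is constructed with the keyword-only parameters shown in the signature above, e.g. `LuceneStandardTokenizer(name="my-tokenizer", max_token_length=100)`, and then referenced by name from an index definition.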