WordDelimiterTokenFilter Class

Splits words into subwords and performs optional transformations on subword groups. This token filter is implemented using Apache Lucene.

All required parameters must be populated in order to send to Azure.

Inheritance
azure.search.documents.indexes._generated.models._models_py3.TokenFilter
WordDelimiterTokenFilter

Constructor

WordDelimiterTokenFilter(*, name: str, generate_word_parts: Optional[bool] = True, generate_number_parts: Optional[bool] = True, catenate_words: Optional[bool] = False, catenate_numbers: Optional[bool] = False, catenate_all: Optional[bool] = False, split_on_case_change: Optional[bool] = True, preserve_original: Optional[bool] = False, split_on_numerics: Optional[bool] = True, stem_english_possessive: Optional[bool] = True, protected_words: Optional[List[str]] = None, **kwargs)

Parameters

odata_type
str
Required

Required. Identifies the concrete type of the token filter. This is a constant that is filled in automatically; it is not a constructor argument.

name
str
Required

Required. The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
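The naming rules above can be checked client-side before sending an index definition. This is a hypothetical helper, not part of the SDK, and it assumes "letters" means ASCII letters.

```python
import re

# Hypothetical check: alphanumeric first and last character, letters/digits/spaces/
# dashes/underscores in between, 128 characters at most (ASCII letters assumed).
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9 _-]{0,126}[A-Za-z0-9])?")


def is_valid_filter_name(name: str) -> bool:
    """Return True if name satisfies the documented token filter naming rules."""
    return _NAME_RE.fullmatch(name) is not None
```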

generate_word_parts
bool
Optional

A value indicating whether to generate word parts. If set, causes parts of words to be generated; for example, "AzureSearch" becomes "Azure" "Search". Default is true.

generate_number_parts
bool
Optional

A value indicating whether to generate number subwords. Default is true.
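The two `generate_*` flags decide which kinds of subwords are emitted. The following is a simplified illustration, not Lucene's implementation: the hypothetical `emit_subwords` helper treats runs of letters as word parts, runs of digits as number parts, and everything else as a delimiter (it ignores case changes).

```python
import re
from typing import List


def emit_subwords(token: str, generate_word_parts: bool = True,
                  generate_number_parts: bool = True) -> List[str]:
    """Hypothetical sketch: keep or drop word/number parts according to the flags."""
    # Runs of letters or runs of digits; any other character acts as a delimiter.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    for part in parts:
        if part.isdigit():
            if generate_number_parts:
                out.append(part)
        elif generate_word_parts:
            out.append(part)
    return out
```

For example, `emit_subwords("Azure-Search-1", generate_number_parts=False)` keeps only the word parts "Azure" and "Search".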

catenate_words
bool
Optional

A value indicating whether maximal runs of word parts are catenated (joined). For example, if this is set to true, "Azure-Search" becomes "AzureSearch". Default is false.

catenate_numbers
bool
Optional

A value indicating whether maximal runs of number parts are catenated (joined). For example, if this is set to true, "1-2" becomes "12". Default is false.

catenate_all
bool
Optional

A value indicating whether all subword parts are catenated. For example, if this is set to true, "Azure-Search-1" becomes "AzureSearch1". Default is false.
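The three catenation options differ only in which parts they join. The hypothetical `catenations` helper below is a simplified sketch, not Lucene's algorithm: it groups consecutive parts of the same kind into maximal runs and, as a simplifying assumption, emits a catenation only when there is more than one part to join.

```python
import re
from itertools import groupby
from typing import List


def catenations(token: str, catenate_words: bool = False,
                catenate_numbers: bool = False, catenate_all: bool = False) -> List[str]:
    """Hypothetical sketch of catenate_words / catenate_numbers / catenate_all."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = []
    # groupby yields maximal runs of consecutive parts that are all digits or all letters.
    for is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        wanted = catenate_numbers if is_number else catenate_words
        if wanted and len(run) > 1:
            out.append("".join(run))
    # catenate_all joins every part regardless of kind.
    if catenate_all and len(parts) > 1:
        out.append("".join(parts))
    return out
```

With the examples above, `catenations("Azure-Search-1", catenate_words=True)` yields "AzureSearch" (the single number part is left alone), while `catenate_all=True` yields "AzureSearch1".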

split_on_case_change
bool
Optional

A value indicating whether to split words on case changes. For example, if this is set to true, "AzureSearch" becomes "Azure" "Search". Default is true.
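A simplified sketch of case-change splitting, assuming a break is inserted only where a lowercase letter is followed by an uppercase letter. The `split_case` helper is hypothetical and handles ASCII letters only.

```python
import re
from typing import List


def split_case(token: str, split_on_case_change: bool = True) -> List[str]:
    """Hypothetical sketch: split at lowercase-to-uppercase transitions."""
    if not split_on_case_change:
        return [token]
    # Zero-width split between a lowercase letter and a following uppercase letter.
    return re.split(r"(?<=[a-z])(?=[A-Z])", token)
```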

preserve_original
bool
Optional

A value indicating whether original words are preserved and added to the subword list. Default is false.
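A small sketch of the effect of `preserve_original`, using a hypothetical `with_original` helper that splits only on non-alphanumeric delimiters. Placing the original token first is an illustrative choice; the sketch does not model token positions.

```python
import re
from typing import List


def with_original(token: str, preserve_original: bool = False) -> List[str]:
    """Hypothetical sketch: optionally keep the unsplit token next to its parts."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    # Only add the original when splitting actually changed something.
    if preserve_original and parts != [token]:
        return [token] + parts
    return parts
```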

split_on_numerics
bool
Optional

A value indicating whether to split at transitions between letters and digits. For example, if this is set to true, "Azure1Search" becomes "Azure" "1" "Search". Default is true.
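A simplified sketch of numeric splitting: the hypothetical `split_numerics` helper breaks the token wherever a letter meets a digit or a digit meets a letter (ASCII only).

```python
import re
from typing import List


def split_numerics(token: str, split_on_numerics: bool = True) -> List[str]:
    """Hypothetical sketch: split at letter/digit and digit/letter boundaries."""
    if not split_on_numerics:
        return [token]
    return re.split(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", token)
```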

stem_english_possessive
bool
Optional

A value indicating whether to remove trailing "'s" for each subword. Default is true.
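A simplified sketch of possessive stemming. The hypothetical `possessive_parts` helper strips a trailing "'s" (in either case) from the token before splitting on non-alphanumeric delimiters; without stemming, the apostrophe acts as an ordinary delimiter and leaves a stray "s" part. This is an approximation for illustration, not Lucene's exact behavior.

```python
import re
from typing import List


def possessive_parts(token: str, stem_english_possessive: bool = True) -> List[str]:
    """Hypothetical sketch: drop a trailing "'s" before splitting."""
    if stem_english_possessive and token.lower().endswith("'s"):
        token = token[:-2]
    return [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
```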

protected_words
list[str]
Optional

A list of tokens to protect from being delimited. Default is None.
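A sketch of how protected words bypass splitting. The hypothetical `delimit` helper passes any token found in `protected_words` through unchanged and splits the rest on non-alphanumeric delimiters. The exact, case-sensitive match used here is an assumption of the sketch.

```python
import re
from typing import Iterable, List, Optional


def delimit(tokens: Iterable[str], protected_words: Optional[List[str]] = None) -> List[str]:
    """Hypothetical sketch: split tokens unless they are protected."""
    protected = set(protected_words or [])
    out: List[str] = []
    for token in tokens:
        if token in protected:
            out.append(token)  # protected tokens are emitted untouched
        else:
            out.extend(p for p in re.split(r"[^A-Za-z0-9]+", token) if p)
    return out
```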