DictionaryDecompounderTokenFilter Class
Decomposes compound words found in many Germanic languages. This token filter is implemented using Apache Lucene.
All required parameters must be populated in order to send to Azure.
- Inheritance
- azure.search.documents.indexes._generated.models._models_py3.TokenFilter
- DictionaryDecompounderTokenFilter
Constructor
DictionaryDecompounderTokenFilter(*, name: str, word_list: List[str], min_word_size: Optional[int] = 5, min_subword_size: Optional[int] = 2, max_subword_size: Optional[int] = 15, only_longest_match: Optional[bool] = False, **kwargs)
Parameters
- odata_type
- str
Required. Identifies the concrete type of the token filter. Constant filled by server.
- name
- str
Required. The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
- word_list
- list[str]
Required. The list of words to match against.
- min_word_size
- int
The minimum word size. Only words longer than this get processed. Default is 5. Maximum is 300.
- min_subword_size
- int
The minimum subword size. Only subwords longer than this are outputted. Default is 2. Maximum is 300.
- max_subword_size
- int
The maximum subword size. Only subwords shorter than this are outputted. Default is 15. Maximum is 300.
- only_longest_match
- bool
A value indicating whether to add only the longest matching subword to the output. Default is false.
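To make the parameters above concrete, here is a minimal pure-Python sketch of the decompounding behavior this filter performs. It is not the Lucene implementation, and the exact boundary semantics (inclusive vs. exclusive size checks) in Lucene may differ slightly from this sketch; the `decompound` function and its inclusive size bounds are illustrative assumptions.

```python
from typing import List

def decompound(token: str,
               word_list: List[str],
               min_word_size: int = 5,
               min_subword_size: int = 2,
               max_subword_size: int = 15,
               only_longest_match: bool = False) -> List[str]:
    """Illustrative sketch: emit the original token, then any
    dictionary subwords found inside it (case-insensitive)."""
    dictionary = {w.lower() for w in word_list}
    out = [token]  # the original compound token is always kept
    # Skip short tokens; inclusive bound chosen here for illustration.
    if len(token) < min_word_size:
        return out
    lowered = token.lower()
    for start in range(len(lowered)):
        longest = None
        max_size = min(max_subword_size, len(lowered) - start)
        for size in range(min_subword_size, max_size + 1):
            candidate = lowered[start:start + size]
            if candidate in dictionary:
                if only_longest_match:
                    longest = candidate  # keep only the longest match here
                else:
                    out.append(candidate)
        if only_longest_match and longest is not None:
            out.append(longest)
    return out

# A German compound split against a small dictionary:
print(decompound("Donaudampfschiff", ["donau", "dampf", "schiff"]))
# -> ['Donaudampfschiff', 'donau', 'dampf', 'schiff']
```

Note that, as in the real filter, the original token is always emitted alongside any subwords, so downstream search can match either the compound or its parts.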