WordTokenizer Class
Description
This transform takes text as input and outputs a vector of text containing the words (tokens) of the original text. The default separator is the space character, but any other character (or several characters) can be specified instead.
- Inheritance
nimbusml.internal.core.preprocessing.text._wordtokenizer.WordTokenizer -> WordTokenizer
nimbusml.base_transform.BaseTransform -> WordTokenizer
sklearn.base.TransformerMixin -> WordTokenizer
Constructor
WordTokenizer(char_array_term_separators=None, columns=None, **params)
Parameters
- columns
see Columns.
- char_array_term_separators
Array of single-character term separators. Defaults to the space character.
- params
Additional arguments sent to compute engine.
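The splitting behaviour described above can be illustrated with a small plain-Python sketch. This is not nimbusml's implementation and does not call the library: the function name `tokenize_words` is hypothetical, and dropping empty tokens (for example, between two consecutive separators) is an assumption made for this illustration rather than documented behaviour.

```python
import re
from typing import Iterable, List, Optional


def tokenize_words(text: str,
                   separators: Optional[Iterable[str]] = None) -> List[str]:
    """Split text into tokens on any of the given single-character separators.

    Hypothetical helper mirroring the idea behind char_array_term_separators;
    it is an illustration only, not the nimbusml code path.
    """
    seps = list(separators) if separators is not None else []
    if not seps:
        seps = [" "]  # default separator is the space character
    for sep in seps:
        if len(sep) != 1:
            raise ValueError("each separator must be a single character")
    # Build a character class that matches any one of the separators.
    pattern = "[" + "".join(re.escape(s) for s in seps) + "]"
    # Assumption: empty tokens are discarded.
    return [token for token in re.split(pattern, text) if token]


print(tokenize_words("the quick  brown fox"))
# ['the', 'quick', 'brown', 'fox']
print(tokenize_words("a,b;c d", [",", ";"]))
# ['a', 'b', 'c d']
```

In the second call the space is no longer a separator, because passing a separator list replaces the default rather than extending it in this sketch.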
Methods
get_params | Get the parameters for this operator.
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
default value: False