WordTokenizer Class

Description

This transform takes text as input and outputs a vector of text containing the words (tokens) of the original text. The default separator is the space character, but one or more other single characters can be specified instead.

Inheritance
nimbusml.internal.core.preprocessing.text._wordtokenizer.WordTokenizer
WordTokenizer
nimbusml.base_transform.BaseTransform
WordTokenizer
sklearn.base.TransformerMixin
WordTokenizer

Constructor

WordTokenizer(char_array_term_separators=None, columns=None, **params)

Parameters

columns

See Columns.

char_array_term_separators

Array of single-character term separators. Defaults to the space character.

params

Additional arguments sent to the compute engine.
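
The effect of char_array_term_separators can be illustrated with a small pure-Python sketch. This is not the nimbusml implementation; the function name tokenize_words is hypothetical, and dropping empty tokens between consecutive separators is an assumption made for the illustration.

```python
from typing import Iterable, List, Optional


def tokenize_words(text: str, separators: Optional[Iterable[str]] = None) -> List[str]:
    """Split text on any of the given single-character separators.

    Hypothetical helper for illustration only; it mimics the documented
    behaviour (space by default, other single characters on request).
    """
    seps = set(separators) if separators is not None else {" "}
    if any(len(s) != 1 for s in seps):
        raise ValueError("each separator must be a single character")

    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in seps:
            # Assumption: consecutive separators do not yield empty tokens.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize_words("the quick  brown fox"))
# ['the', 'quick', 'brown', 'fox']
print(tokenize_words("a,b;c d", separators=[",", ";"]))
# ['a', 'b', 'c d']
```

In the second call the space is no longer a separator, because passing a separator array replaces the default rather than extending it in this sketch.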

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False
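
As a rough illustration of what a scikit-learn-style get_params call usually returns, the sketch below uses a hypothetical stand-in class, SketchTokenizer, rather than the real WordTokenizer. It assumes the common convention of returning constructor arguments as a dictionary; the actual nimbusml return value may differ.

```python
import inspect


class SketchTokenizer:
    """Hypothetical stand-in for illustration; not the nimbusml class."""

    def __init__(self, char_array_term_separators=None, columns=None):
        self.char_array_term_separators = char_array_term_separators
        self.columns = columns

    def get_params(self, deep=False):
        # Constructor argument names; the bound method's signature omits `self`.
        names = list(inspect.signature(self.__init__).parameters)
        # `deep` is ignored here because this sketch has no nested operators.
        return {name: getattr(self, name) for name in names}


params = SketchTokenizer(char_array_term_separators=[" ", ","]).get_params()
print(params)
# {'char_array_term_separators': [' ', ','], 'columns': None}
```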