ToKey Class

Converts input values (words, numbers, etc.) to indices in a dictionary.

Inheritance
ToKey derives from:

  • nimbusml.internal.core.preprocessing._tokey.ToKey

  • nimbusml.base_transform.BaseTransform

  • sklearn.base.TransformerMixin

Constructor

ToKey(max_num_terms=1000000, term=None, sort='ByOccurrence', text_key_values=False, columns=None, **params)

Parameters

columns

A dictionary of key-value pairs, where each key is an output column name and each value is the corresponding input column name.

  • Multiple key-value pairs are allowed.

  • Input column type: numeric or string.

  • Output column type: Key Type.

  • If the output column names are the same as the input column names, simply specify columns as a list of strings.

The << operator can also be used to set this value (see Column Operator).

For example:

  • ToKey(columns={'out1':'input1', 'out2':'input2'})

  • ToKey() << {'out1':'input1', 'out2':'input2'}

For more details see Columns.
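As an illustration of the two accepted forms, the hypothetical helper below (normalize_columns is not part of nimbusml) shows, in plain Python, how a list of names can be read as a dictionary that maps each name to itself. It is a sketch of the documented convention, not nimbusml's own parsing code.

```python
def normalize_columns(columns):
    """Return an {output_name: input_name} dict for a ToKey-style column spec.

    Hypothetical helper: a dict is used as-is, and a list of names means each
    output column keeps the name of its input column.
    """
    if isinstance(columns, dict):
        return dict(columns)
    if isinstance(columns, list):
        return {name: name for name in columns}
    raise TypeError("columns must be a dict or a list of column names")


print(normalize_columns({'out1': 'input1', 'out2': 'input2'}))
# {'out1': 'input1', 'out2': 'input2'}
print(normalize_columns(['education', 'id']))
# {'education': 'education', 'id': 'id'}
```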

max_num_terms

Maximum number of keys to keep per column when auto-training.
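A minimal plain-Python sketch of what a cap on the dictionary size means, assuming that the first max_num_terms distinct values are kept in the order they are encountered and that any other value maps to a missing key (represented here by None). The helper names build_dictionary and to_keys are hypothetical; the compute engine's actual selection rule and missing-value representation may differ.

```python
def build_dictionary(values, max_num_terms):
    """Map each distinct value to a 0-based index, keeping at most max_num_terms."""
    mapping = {}
    for value in values:
        # Assumption: once the cap is reached, new distinct values are ignored.
        if value not in mapping and len(mapping) < max_num_terms:
            mapping[value] = len(mapping)
    return mapping


def to_keys(values, mapping):
    # Values that did not make it into the dictionary map to None (missing).
    return [mapping.get(value) for value in values]


values = ['red', 'green', 'red', 'blue', 'green']
mapping = build_dictionary(values, max_num_terms=2)
print(mapping)                   # {'red': 0, 'green': 1}
print(to_keys(values, mapping))  # [0, 1, 0, None, 1]
```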

term

List of terms.

sort

How items should be ordered when vectorized. By default ('ByOccurrence'), items are kept in the order they are encountered. If 'ByValue', items are sorted according to their default comparison; for example, text sorting is case sensitive ('A', then 'Z', then 'a').
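The two orderings can be sketched in plain Python as follows. The helper ordered_terms is hypothetical, and it assumes the default comparison for text is ordinal (code point) ordering, which is what Python's sorted uses for str and which reproduces the 'A', 'Z', 'a' order described above.

```python
def ordered_terms(values, sort='ByOccurrence'):
    """Return the distinct values in the order keys would be assigned (sketch)."""
    # dict.fromkeys keeps only the first occurrence of each value, in order.
    distinct = list(dict.fromkeys(values))
    if sort == 'ByOccurrence':
        return distinct
    if sort == 'ByValue':
        # Python compares str by code point, so uppercase sorts before lowercase.
        return sorted(distinct)
    raise ValueError("sort must be 'ByOccurrence' or 'ByValue'")


values = ['a', 'Z', 'A', 'a']
for order in ('ByOccurrence', 'ByValue'):
    terms = ordered_terms(values, order)
    keys = {term: index for index, term in enumerate(terms)}
    print(order, terms, keys)
# ByOccurrence ['a', 'Z', 'A'] {'a': 0, 'Z': 1, 'A': 2}
# ByValue ['A', 'Z', 'a'] {'A': 0, 'Z': 1, 'a': 2}
```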

text_key_values

Whether key value metadata should be text, regardless of the actual input type.

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # ToKey
   import numpy
   from nimbusml import FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.preprocessing import ToKey

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32,
                                  names={0: 'id'})
   print(data.head())
   #    age  case education   id  induced  parity  pooled.stratum  spontaneous ...
   # 0  26.0   1.0    0-5yrs  1.0      1.0     6.0             3.0         2.0 ...
   # 1  42.0   1.0    0-5yrs  2.0      1.0     1.0             1.0         0.0 ...
   # 2  39.0   1.0    0-5yrs  3.0      2.0     6.0             4.0         0.0 ...
   # 3  34.0   1.0    0-5yrs  4.0      2.0     4.0             2.0         0.0 ...
   # 4  35.0   1.0   6-11yrs  5.0      1.0     3.0            32.0         1.0 ...

   # transform usage
   xf = ToKey(columns={'id_1': 'id', 'edu_1': 'education'})

   # fit and transform
   features = xf.fit_transform(data)
   print(features.head())
   #    age  case    edu_1 education   id  id_1  induced  parity  ...
   # 0  26.0   1.0   0-5yrs    0-5yrs  1.0     0      1.0     6.0 ...
   # 1  42.0   1.0   0-5yrs    0-5yrs  2.0     1      1.0     1.0 ...
   # 2  39.0   1.0   0-5yrs    0-5yrs  3.0     2      2.0     6.0 ...
   # 3  34.0   1.0   0-5yrs    0-5yrs  4.0     3      2.0     4.0 ...
   # 4  35.0   1.0  6-11yrs   6-11yrs  5.0     4      1.0     3.0 ...

Remarks

The ToKey transform converts a column of values (text or numeric) to key values using a dictionary. This operation can be reversed with FromKey to recover the original values.
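The round trip can be sketched with two plain Python dictionaries, one for the forward mapping and its inverse for the reverse. This only mirrors the idea of ToKey followed by FromKey; fit_keys, to_key and from_key are hypothetical helpers, and the sketch assumes every value is present in the dictionary (a value dropped by max_num_terms, for example, could not be recovered). The sample values are the education entries from the first five rows shown in the example above.

```python
def fit_keys(values):
    """Assign a 0-based key to each distinct value, in order of first occurrence."""
    return {term: index for index, term in enumerate(dict.fromkeys(values))}


def to_key(values, mapping):
    # Forward direction: value -> key.
    return [mapping[value] for value in values]


def from_key(keys, mapping):
    # Reverse direction: invert the dictionary, then key -> value.
    inverse = {index: term for term, index in mapping.items()}
    return [inverse[key] for key in keys]


education = ['0-5yrs', '0-5yrs', '0-5yrs', '0-5yrs', '6-11yrs']
mapping = fit_keys(education)
keys = to_key(education, mapping)
print(keys)                                   # [0, 0, 0, 0, 1]
print(from_key(keys, mapping) == education)   # True
```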

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False