ToKey Class

Converts input values (words, numbers, etc.) to an index in a dictionary.

Inheritance: ToKey derives from nimbusml.internal.core.preprocessing._tokey.ToKey, nimbusml.base_transform.BaseTransform, and sklearn.base.TransformerMixin.

Constructor

ToKey(max_num_terms=1000000, term=None, sort='ByOccurrence', text_key_values=False, columns=None, **params)

Parameters

columns

A dictionary of key-value pairs, where the key is the output column name and the value is the input column name.

Multiple key-value pairs are allowed.
Input column type: numeric or string.
Output column type: Key Type.

If the output column names are the same as the input column names, simply specify columns as a list of strings.

The << operator can be used to set this value (see Column Operator).

For example:

ToKey(columns={'out1':'input1', 'out2':'input2'})
ToKey() << {'out1':'input1', 'out2':'input2'}

For more details see Columns.
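
The two accepted shapes of columns can be pictured with a small, library-independent sketch. The helper normalize_columns below is hypothetical (it is not part of nimbusml); it only illustrates the statement above that a list of names is shorthand for a mapping whose output and input names coincide.

```python
def normalize_columns(columns):
    """Hypothetical helper: expand `columns` into an output -> input mapping."""
    if isinstance(columns, dict):
        # Already in {output_name: input_name} form.
        return dict(columns)
    # A list means each output column keeps the name of its input column.
    return {name: name for name in columns}


renamed = normalize_columns({'out1': 'input1', 'out2': 'input2'})
in_place = normalize_columns(['input1', 'input2'])
print(renamed)
print(in_place)
```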

max_num_terms

Maximum number of keys to keep per column when auto-training.
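
As a rough illustration of a per-column cap, the pure-Python sketch below builds a term list and stops adding new terms once the limit is reached. The function fit_terms is hypothetical, and the assumption that later terms are simply left out of the dictionary is made for this sketch only; how the compute engine treats such values is not stated here.

```python
def fit_terms(values, max_num_terms=1000000):
    """Hypothetical sketch: collect distinct terms, up to max_num_terms."""
    terms = []
    for value in values:
        # Assumption: once the cap is reached, unseen terms are not added.
        if value not in terms and len(terms) < max_num_terms:
            terms.append(value)
    return terms


capped = fit_terms(['0-5yrs', '6-11yrs', '0-5yrs', '12+ yrs'], max_num_terms=2)
print(capped)
```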

term

List of terms.

sort

How items should be ordered when vectorized. By default, they are kept in the order encountered. When sorting by value, items are ordered according to their default comparison; for example, text sorting is case sensitive ('A', then 'Z', then 'a').
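
The difference between the two orderings can be sketched in plain Python. The function order_terms and its by_value flag are hypothetical names chosen for this illustration, not nimbusml API; Python's default string comparison happens to be case sensitive in the same way as the 'A', 'Z', 'a' example.

```python
def order_terms(values, by_value=False):
    """Hypothetical sketch: return distinct terms in the order keys are assigned."""
    # dict.fromkeys keeps the first occurrence of each value, in encounter order.
    distinct = list(dict.fromkeys(values))
    if by_value:
        # Default comparison: uppercase letters sort before lowercase ones.
        return sorted(distinct)
    return distinct


terms = ['a', 'Z', 'A', 'Z']
by_occurrence = order_terms(terms)
by_value = order_terms(terms, by_value=True)
print(by_occurrence)  # ['a', 'Z', 'A']
print(by_value)       # ['A', 'Z', 'a']
```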

text_key_values

Whether key value metadata should be text, regardless of the actual input type.

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # ToKey
   import numpy
   from nimbusml import FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.preprocessing import ToKey

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32,
                                  names={0: 'id'})
   print(data.head())
   #    age  case education   id  induced  parity  pooled.stratum  spontaneous ...
   # 0  26.0   1.0    0-5yrs  1.0      1.0     6.0             3.0         2.0 ...
   # 1  42.0   1.0    0-5yrs  2.0      1.0     1.0             1.0         0.0 ...
   # 2  39.0   1.0    0-5yrs  3.0      2.0     6.0             4.0         0.0 ...
   # 3  34.0   1.0    0-5yrs  4.0      2.0     4.0             2.0         0.0  ..
   # 4  35.0   1.0   6-11yrs  5.0      1.0     3.0            32.0         1.0  ..

   # transform usage
   xf = ToKey(columns={'id_1': 'id', 'edu_1': 'education'})

   # fit and transform
   features = xf.fit_transform(data)
   print(features.head())
   #    age  case    edu_1 education   id  id_1  induced  parity  ...
   # 0  26.0   1.0   0-5yrs    0-5yrs  1.0     0      1.0     6.0 ...
   # 1  42.0   1.0   0-5yrs    0-5yrs  2.0     1      1.0     1.0 ...
   # 2  39.0   1.0   0-5yrs    0-5yrs  3.0     2      2.0     6.0 ...
   # 3  34.0   1.0   0-5yrs    0-5yrs  4.0     3      2.0     4.0 ...
   # 4  35.0   1.0  6-11yrs   6-11yrs  5.0     4      1.0     3.0 ...

Remarks

The ToKey transform converts a column of text to key values using a dictionary. This operation can be reversed by using FromKey to obtain the original values.
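
The round trip described above can be sketched without the library. The functions fit_dictionary, to_key, and from_key are hypothetical stand-ins for what ToKey and FromKey do conceptually; the 0-based indices are an assumption that mirrors the id_1 column in the example output, and the engine's internal key representation may differ.

```python
def fit_dictionary(values):
    """Hypothetical sketch: map each distinct value to an index, in encounter order."""
    dictionary = {}
    for value in values:
        # setdefault only assigns an index the first time a value is seen.
        dictionary.setdefault(value, len(dictionary))
    return dictionary


def to_key(values, dictionary):
    """Replace each value by its index in the dictionary."""
    return [dictionary[value] for value in values]


def from_key(keys, dictionary):
    """Reverse the mapping: position i in `terms` holds the term with index i."""
    terms = list(dictionary)
    return [terms[key] for key in keys]


education = ['0-5yrs', '0-5yrs', '6-11yrs', '12+ yrs', '6-11yrs']
dictionary = fit_dictionary(education)
keys = to_key(education, dictionary)
restored = from_key(keys, dictionary)
print(keys)                   # [0, 0, 1, 2, 1]
print(restored == education)  # True
```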

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep

default value: False
