Indicator Class

Create a new column indicating if the input has missing values.

Inheritance
nimbusml.internal.core.preprocessing.missing_values._indicator.Indicator → Indicator
nimbusml.base_transform.BaseTransform → Indicator
sklearn.base.TransformerMixin → Indicator

Constructor

Indicator(columns=None, **params)

Parameters

columns

A dictionary of key-value pairs, where each key is an output column name and each value is the corresponding input column name.

  • Multiple key-value pairs are allowed.

  • Input column type:

    Vector Type.

  • Output column type:

    Vector Type.

  • If the output column names are the same as the input column names, simply specify columns as a list of strings.

The << operator can also be used to set this value (see Column Operator).

For example:

  • Indicator(columns={'out1':'input1', 'out2':'input2'})

  • Indicator() << {'out1':'input1', 'out2':'input2'}

For more details see Columns.
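
The two accepted shapes of columns described above (a dictionary mapping output names to input names, or a list of names when output and input names coincide) can be illustrated with a small standalone sketch. The helper normalize_columns is hypothetical and is not part of nimbusml; it only shows how a list can be read as shorthand for a dictionary whose keys equal its values.

```python
def normalize_columns(columns):
    """Return a dict mapping output column name -> input column name.

    Hypothetical helper for illustration only; not a nimbusml function.
    """
    if isinstance(columns, dict):
        # Already in {output_name: input_name} form.
        return dict(columns)
    # A list of names means each output column reuses its input name.
    return {name: name for name in columns}


explicit = normalize_columns({'out1': 'input1', 'out2': 'input2'})
shorthand = normalize_columns(['input1', 'input2'])

print(explicit)
print(shorthand)
```

Under this reading, passing a list is equivalent to passing a dictionary in which every output column carries the same name as its input column.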

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # Indicator
   import numpy as np
   import pandas as pd
   from nimbusml import FileDataStream
   from nimbusml.preprocessing.missing_values import Indicator

   with_nans = pd.DataFrame(
       data=dict(
           Sepal_Length=[2.5, np.nan, 2.1, 1.0],
           Sepal_Width=[.75, .9, .8, .76],
           Petal_Length=[np.nan, 2.5, 2.6, 2.4],
           Petal_Width=[.8, .7, .9, 0.7],
           Species=["setosa", "viginica", "", 'versicolor']))

   # write NaNs to file to show how this transform works
   tmpfile = 'tmpfile_with_nans.csv'
   with_nans.to_csv(tmpfile, index=False)

   data = FileDataStream.read_csv(tmpfile, sep=',', numeric_dtype=np.float32)

   # transform usage
   xf = Indicator(columns={'PL': 'Petal_Length', 'SL': 'Sepal_Length'})

   # fit and transform
   features = xf.fit_transform(data)

   # print features
   print(features.head())

   #      PL  Petal_Length  Petal_Width     SL  ... Sepal_Width       Species
   # 0   True           NaN          0.8  False ...        0.75        setosa
   # 1  False           2.5          0.7   True ...        0.90      viginica
   # 2  False           2.6          0.9  False ...        0.80          None
   # 3  False           2.4          0.7  False ...        0.76    versicolor

Remarks

Indicator creates a new column of Boolean values ("True" or "False") that mark which rows of the input column have missing values.
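
As a rough illustration of these semantics, independent of nimbusml, the following pure-Python sketch computes the same kind of Boolean flags for the two columns used in the example above. The function missing_indicator is a hypothetical name, and treating only NaN floats as missing is an assumption made for this sketch; it is not the library's implementation.

```python
import math


def missing_indicator(values):
    """Return True for each NaN entry and False otherwise.

    Hypothetical helper; assumes missing values are encoded as float NaN.
    """
    return [isinstance(v, float) and math.isnan(v) for v in values]


# Same numeric data as the Petal_Length and Sepal_Length columns above.
petal_length = [float('nan'), 2.5, 2.6, 2.4]
sepal_length = [2.5, float('nan'), 2.1, 1.0]

pl = missing_indicator(petal_length)
sl = missing_indicator(sepal_length)

print(pl)  # [True, False, False, False]
print(sl)  # [False, True, False, False]
```

The resulting flags line up with the PL and SL columns in the printed output of the example: only the first row of Petal_Length and the second row of Sepal_Length are marked as missing.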

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False