Filter Class

Filters out all rows in which any of the input columns contains NaN.

Inheritance
nimbusml.internal.core.preprocessing.missing_values._filter.Filter
Filter
nimbusml.base_transform.BaseTransform
Filter
sklearn.base.TransformerMixin
Filter

Constructor

Filter(complement=False, columns=None, **params)

Parameters

columns

A list of strings representing the names of the columns to perform the transformation on.

The << operator can be used to set this value (see Column Operator).

For example:

  • Filter(columns=['education', 'age'])

  • Filter() << ['education', 'age']

For more details see Columns.

complement

If True, keep only the rows that contain NA values and filter out the rest.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # Filter
   import numpy as np
   import pandas as pd
   from nimbusml import FileDataStream
   from nimbusml.preprocessing.missing_values import Filter

   with_nans = pd.DataFrame(
       data=dict(
           Sepal_Length=[2.5, np.nan, 2.1, 1.0],
           Sepal_Width=[.75, .9, .8, .76],
           Petal_Length=[np.nan, 2.5, 2.6, 2.4],
           Petal_Width=[.8, .7, .9, 0.7]))

   # write NaNs to file to show how this transform works
   tmpfile = 'tmpfile_with_nans.csv'
   with_nans.to_csv(tmpfile, index=False)

   data = FileDataStream.read_csv(tmpfile, sep=',', numeric_dtype=np.float32)

   # transform usage
   xf = Filter(
       columns=[
           'Petal_Length',
           'Petal_Width',
           'Sepal_Length',
           'Sepal_Width'])

   # fit and transform
   features = xf.fit_transform(data)

   # print features
   print(features.head())
   #    Petal_Length  Petal_Width  Sepal_Length  Sepal_Width
   # 0           2.4          0.7           1.0         0.76

Remarks

Filter removes an entire row if any of the input columns contains NaN in that row. Many ML algorithms cannot work with NaNs and therefore require this preprocessing. It is useful when a single NaN entry invalidates the entire row.
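
The row-level rule described here, together with the complement option, can be illustrated with a small pure-Python sketch. This is not nimbusml's implementation: filter_rows is a hypothetical helper, and the age and income columns are made-up sample data. It only mirrors the documented semantics, under the assumption that a row counts as missing when any of the selected columns holds NaN.

```python
import math


def filter_rows(rows, columns, complement=False):
    """Hypothetical helper that mimics the documented row-level behaviour.

    rows is a list of dicts, columns the keys to inspect.
    """
    def has_nan(row):
        # A row is flagged if ANY of the selected columns is NaN.
        return any(
            isinstance(row[c], float) and math.isnan(row[c]) for c in columns
        )

    if complement:
        # complement=True: keep only the rows that contain a NaN.
        return [row for row in rows if has_nan(row)]
    # Default: drop every row that contains a NaN.
    return [row for row in rows if not has_nan(row)]


nan = float("nan")
rows = [
    {"age": 31.0, "income": nan},   # NaN in income
    {"age": nan, "income": 52.0},   # NaN in age
    {"age": 45.0, "income": 48.5},  # complete row
]

kept = filter_rows(rows, columns=["age", "income"])
only_missing = filter_rows(rows, columns=["age", "income"], complement=True)
age_only = filter_rows(rows, columns=["age"])

print(kept)               # the single complete row
print(len(only_missing))  # the two rows that contain a NaN
print(len(age_only))      # rows whose age is present, regardless of income
```

Restricting columns narrows the check: in the last call a NaN in income no longer causes a row to be dropped, because only age is inspected.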

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False