TakeFilter Class

Keeps the first N rows of the dataset, limiting the input to a subset of rows.

Inheritance
nimbusml.internal.core.preprocessing.filter._takefilter.TakeFilter
nimbusml.base_transform.BaseTransform
sklearn.base.TransformerMixin
TakeFilter

Constructor

TakeFilter(count, columns=None, **params)

Parameters

columns

A string representing the name of the column to perform the transformation on.

  • Input column type: numeric.

  • Output column type: numeric.

The << operator can be used to set this value (see Column Operator).

For example (count is a required argument, so it is passed explicitly here):

  • TakeFilter(count=100, columns='age')

  • TakeFilter(count=100) << {'age'}

For more details see Columns.

count

The number of rows to keep, counted from the beginning of the dataset.

params

Additional arguments sent to the compute engine.

Examples


   import numpy as np
   from nimbusml import FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.preprocessing.filter import SkipFilter, TakeFilter

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()
   data = FileDataStream.read_csv(
       path, sep=',', names={
           0: 'id'}, dtype={
           'id': str, 'age': np.float32})
   print(data.head())
   #    age  case education id  induced  parity  pooled.stratum  spontaneous  ...
   # 0  26.0     1    0-5yrs  1        1       6               3            2  ...
   # 1  42.0     1    0-5yrs  2        1       1               1            0  ...
   # 2  39.0     1    0-5yrs  3        2       6               4            0  ...
   # 3  34.0     1    0-5yrs  4        2       4               2            0  ...
   # 4  35.0     1   6-11yrs  5        1       3              32            1  ...

   # fit and transform
   print(TakeFilter(count=100).fit_transform(data).shape)
   # (100, 9), first 100 rows are preserved

   print(SkipFilter(count=100).fit_transform(data).shape)
   # (148, 9), first 100 rows are deleted
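
The row counts reported above (100 kept by TakeFilter, 148 left by SkipFilter) follow directly from the take/skip semantics. The sketch below illustrates those semantics in plain Python, without nimbusml. It is not how nimbusml implements the filters: take_rows, skip_rows, and the placeholder rows are hypothetical names introduced only for this illustration, and the total of 248 rows is inferred from the shapes printed in the example.

```python
from itertools import islice


def take_rows(rows, count):
    """Return an iterator over the first `count` rows (stand-in for TakeFilter)."""
    return islice(rows, count)


def skip_rows(rows, count):
    """Return an iterator over the rows after the first `count` (stand-in for SkipFilter)."""
    return islice(rows, count, None)


# Hypothetical placeholder data: 248 rows, the total implied by 100 + 148 above.
rows = [{"id": str(i)} for i in range(1, 249)]

taken = list(take_rows(rows, 100))
skipped = list(skip_rows(rows, 100))

print(len(taken), len(skipped))           # 100 148
print(taken[-1]["id"], skipped[0]["id"])  # 100 101
```

With the same count, the two stand-ins partition the input: every row lands in exactly one of the two results, and the original order is preserved. In this sketch, a count larger than the number of rows simply returns all rows.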

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False