PrefixColumnConcatenator Class

Combines several columns into a single vector-valued column by prefix.

Inheritance
nimbusml.internal.core.preprocessing.schema._prefixcolumnconcatenator.PrefixColumnConcatenator
nimbusml.base_transform.BaseTransform
sklearn.base.TransformerMixin
PrefixColumnConcatenator

Constructor

PrefixColumnConcatenator(columns=None, **params)

Parameters

columns

A dictionary with one key-value pair, where the key is the output column name and the value is the prefix shared by the names of the input columns to combine.

  • Only one key-value pair is allowed.

  • Input column type: numeric or string.

  • Output column type: Vector Type.

The << operator can be used to set this value (see Column Operator).

For example

  • PrefixColumnConcatenator(columns={'combined': 'Prefix'})

  • PrefixColumnConcatenator() << {'combined': 'Prefix'}

For more details see Columns.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # PrefixColumnConcatenator
   import numpy as np
   import pandas as pd
   from nimbusml.preprocessing.schema import PrefixColumnConcatenator

   data = pd.DataFrame(
       data=dict(
           PrefixA=[2.5, np.nan, 2.1, 1.0],
           PrefixB=[.75, .9, .8, .76],
           AnotherColumn=[np.nan, 2.5, 2.6, 2.4]))

   # transform usage
   xf = PrefixColumnConcatenator(columns={'combined': 'Prefix'})

   # fit and transform
   features = xf.fit_transform(data)

   # print features
   print(features.head())
   #   PrefixA  PrefixB  AnotherColumn  combined.PrefixA  combined.PrefixB
   #0      2.5     0.75            NaN               2.5              0.75
   #1      NaN     0.90            2.5               NaN              0.90
   #2      2.1     0.80            2.6               2.1              0.80
   #3      1.0     0.76            2.4               1.0              0.76

Remarks

PrefixColumnConcatenator creates a single vector-valued column from multiple columns whose names share a prefix. It can be applied to data before training a model. Concatenation can significantly speed up data processing when the number of columns reaches the hundreds or thousands.
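
The selection-by-prefix idea can be illustrated without nimbusml. The following is a minimal conceptual sketch in plain Python, not the library's implementation: concat_by_prefix is a hypothetical helper, rows are modelled as dictionaries, and the combined column is held as a Python list rather than the combined.PrefixA / combined.PrefixB slots shown in the example output above.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def concat_by_prefix(rows: List[Row], output: str, prefix: str) -> List[dict]:
    """Hypothetical helper (not part of nimbusml): gather every column whose
    name starts with `prefix` into one list-valued column named `output`."""
    result = []
    for row in rows:
        # Keep the original column order so vector slots line up across rows.
        selected = [name for name in row if name.startswith(prefix)]
        new_row = dict(row)  # the input columns are kept, as in the example
        new_row[output] = [row[name] for name in selected]
        result.append(new_row)
    return result


rows = [
    {"PrefixA": 2.5, "PrefixB": 0.75, "AnotherColumn": None},
    {"PrefixA": None, "PrefixB": 0.90, "AnotherColumn": 2.5},
]

combined = concat_by_prefix(rows, output="combined", prefix="Prefix")
print(combined[0]["combined"])  # [2.5, 0.75]
print(combined[1]["combined"])  # [None, 0.9]
```

AnotherColumn does not start with the prefix, so it stays out of the combined vector, which matches the example output above.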

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False