Binner Class
Normalizes columns as specified below.
Inheritance
- nimbusml.internal.core.preprocessing.normalization._binner.Binner
- nimbusml.base_transform.BaseTransform
- sklearn.base.TransformerMixin
Constructor
Binner(num_bins=1024, fix_zero=True, max_training_examples=1000000000, columns=None, **params)
Parameters
- columns
a dictionary of key-value pairs, where key is the output column name and value is the input column name.
Multiple key-value pairs are allowed.
Input column type: numeric or
Output column type: numeric or
If the output column names are the same as the input column names, simply specify columns as a list of strings.
The << operator can be used to set this value (see Column Operator).
For example:
Binner(columns={'out1':'input1', 'out2':'input2'})
Binner() << {'out1':'input1', 'out2':'input2'}
For more details see Columns.
- num_bins
Max number of bins, power of 2 recommended.
- fix_zero
Whether to map zero to zero, preserving sparsity.
- max_training_examples
Max number of examples used to train the normalizer.
- params
Additional arguments sent to compute engine.
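The effect of fix_zero can be illustrated with a small NumPy sketch. This is not the nimbusml implementation: the function name bin_and_scale, the hand-picked bin edges, and the choice to shift by the scaled value of zero are all assumptions made for illustration. It shows why mapping zero to zero keeps a mostly-zero column sparse.

```python
import numpy as np

def bin_and_scale(values, edges, fix_zero=True):
    """Hypothetical helper: map values to scaled bin indices.

    `edges` are the interior bin boundaries (assumed given here).
    """
    edges = np.asarray(edges, dtype=np.float64)
    denom = max(len(edges), 1)
    # Bin index of each value: number of boundaries at or below it.
    scaled = np.searchsorted(edges, values, side="right") / denom
    if fix_zero:
        # Assumption: shift so that the bin containing 0 maps to exactly 0.
        zero_scaled = np.searchsorted(edges, 0.0, side="right") / denom
        scaled = scaled - zero_scaled
    return scaled

values = np.array([-2.0, 0.0, 0.0, 3.0, 0.0])
edges = [-1.0, 1.0]

with_fix = bin_and_scale(values, edges, fix_zero=True)
without_fix = bin_and_scale(values, edges, fix_zero=False)
print(with_fix)     # zeros stay at 0, so the column remains sparse
print(without_fix)  # zeros move to a non-zero value, so the column becomes dense
```

With fix_zero enabled in this sketch, the three zero entries remain zero; with it disabled, they all become non-zero and a sparse representation would no longer save space.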
Examples
###############################################################################
# Binner
import numpy
from nimbusml import FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing.normalization import Binner
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(
    path,
    sep=',',
    numeric_dtype=numpy.float32)  # Error with integer input
print(data.head())
# age case education induced parity pooled.stratum row_num ...
# 0 26.0 1.0 0-5yrs 1.0 6.0 3.0 1.0 ...
# 1 42.0 1.0 0-5yrs 1.0 1.0 1.0 2.0 ...
# 2 39.0 1.0 0-5yrs 2.0 6.0 4.0 3.0 ...
# 3 34.0 1.0 0-5yrs 2.0 4.0 2.0 4.0 ...
# 4 35.0 1.0 6-11yrs 1.0 3.0 32.0 5.0 ...
xf = Binner(columns={'in': 'induced', 'sp': 'spontaneous'})
# fit and transform
features = xf.fit_transform(data)
# print features
print(features.head())
# age case education in induced parity ... row_num sp ...
# 0 26.0 1.0 0-5yrs 0.5 1.0 6.0 ... 1.0 1.0 ...
# 1 42.0 1.0 0-5yrs 0.5 1.0 1.0 ... 2.0 0.0 ...
# 2 39.0 1.0 0-5yrs 1.0 2.0 6.0 ... 3.0 0.0 ...
# 3 34.0 1.0 0-5yrs 1.0 2.0 4.0 ... 4.0 0.0 ...
# 4 35.0 1.0 6-11yrs 0.5 1.0 3.0 ... 5.0 0.5 ...
Remarks
In linear classification algorithms, instances are viewed as vectors in a multi-dimensional space. Since the range of values of raw data varies widely, some objective functions do not work properly without normalization. For example, if one of the features has a broad range of values, the distance between points is governed by that feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance. This can provide significant speedup and accuracy benefits. In all the linear algorithms in nimbusml (LogisticRegressionClassifier, AveragedPerceptronBinaryClassifier, etc.), the default is to normalize features before training.
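The point about one wide-ranging feature governing distances can be checked with a few lines of NumPy. This sketch uses plain min-max scaling rather than Binner, and the three sample points are made up for illustration.

```python
import numpy as np

# Two features: the first lies in [0, 1], the second in the thousands.
X = np.array([
    [0.1, 1000.0],
    [0.9, 1010.0],
    [0.2, 5000.0],
])

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

# Raw distances: almost entirely determined by the second feature.
raw_01 = euclidean(X[0], X[1])
raw_02 = euclidean(X[0], X[2])

# Min-max scale each column to [0, 1] (a stand-in normalizer, not Binner).
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Scaled distances: both features now contribute on a comparable scale.
scaled_01 = euclidean(X_scaled[0], X_scaled[1])
scaled_02 = euclidean(X_scaled[0], X_scaled[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

Before scaling, points 0 and 1 look close only because they agree on the second feature, even though they sit at opposite ends of the first. After scaling, that difference in the first feature counts as much as the difference in the second.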
Binner creates equi-density bins and then normalizes every value in a bin by dividing by the total number of bins. The number of bins the normalizer uses can be set by the user through num_bins; the default is 1024, as shown in the constructor signature.
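The equi-density binning described above can be sketched in NumPy as follows. This is an illustrative approximation, not the nimbusml implementation: the function name equi_density_normalize is hypothetical, bin boundaries are taken from evenly spaced quantiles, and the output is scaled by the largest bin index so that results span [0, 1], an assumption chosen to resemble the example output shown earlier.

```python
import numpy as np

def equi_density_normalize(values, num_bins=4):
    """Hypothetical sketch of equi-density binning followed by scaling."""
    values = np.asarray(values, dtype=np.float64)
    # Interior boundaries at evenly spaced quantiles, so each bin
    # holds roughly the same number of training values.
    quantiles = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    edges = np.unique(np.quantile(values, quantiles))
    # Bin index of each value: number of boundaries at or below it.
    idx = np.searchsorted(edges, values, side="right")
    # Assumption: divide by the largest bin index so outputs lie in [0, 1].
    return idx / max(len(edges), 1)

scaled = equi_density_normalize([1, 2, 3, 4, 5, 6, 7, 8], num_bins=4)
print(scaled)  # each of the four bins receives two of the eight values
```

Because boundaries follow the data's quantiles, densely populated value ranges get narrower bins than sparse ones, which is what distinguishes equi-density binning from equal-width binning.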
Methods
get_params | Get the parameters for this operator.
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep