AveragedPerceptronBinaryClassifier Class

Machine Learning Averaged Perceptron Binary Classifier

Inheritance
nimbusml.internal.core.linear_model._averagedperceptronbinaryclassifier.AveragedPerceptronBinaryClassifier
nimbusml.base_predictor.BasePredictor
sklearn.base.ClassifierMixin
AveragedPerceptronBinaryClassifier

Constructor

AveragedPerceptronBinaryClassifier(normalize='Auto', caching='Auto', loss='hinge', learning_rate=1.0, decrease_learning_rate=False, l2_regularization=0.0, number_of_iterations=1, initial_weights_diameter=0.0, reset_weights_after_x_examples=None, lazy_update=True, recency_gain=0.0, recency_gain_multiplicative=False, averaged=True, averaged_tolerance=0.01, initial_weights=None, shuffle=True, feature=None, label=None, **params)

Parameters

feature

see Columns.

label

see Columns.

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures that the distances between data points are proportional and enables optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It maps values into an interval [a, b] where -1 <= a <= 0 and 0 <= b <= 1 and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
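The interval described above can be illustrated with a minimal sketch. It assumes that the normalizer divides each value by the width of the range that spans both the data and zero, which satisfies the stated properties (b - a = 1 and zero maps to zero). The function name scale_preserving_zero is hypothetical and is not part of nimbusml.

```python
def scale_preserving_zero(values):
    """Scale values so the range that includes zero has width 1 and 0 maps to 0."""
    lo = min(min(values), 0.0)  # extend the range down to zero if all values are positive
    hi = max(max(values), 0.0)  # extend the range up to zero if all values are negative
    width = hi - lo
    if width == 0.0:            # every value is zero, so there is nothing to scale
        return [0.0 for _ in values]
    # Dividing without shifting keeps zero at zero, which preserves sparsity.
    return [v / width for v in values]


mixed = [-2.0, 0.0, 6.0]
ages = [26.0, 42.0, 39.0, 34.0, 35.0]

scaled_mixed = scale_preserving_zero(mixed)  # a = -0.25, b = 0.75
scaled_ages = scale_preserving_zero(ages)    # a = 0, b = 1
print(scaled_mixed)
print(scaled_ages)
```

Because the values are only divided and never shifted, a sparse feature column stays sparse after scaling.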

caching

Whether the trainer should cache the input training data.

loss

The loss function used for training. The default is Hinge. Other choices are Exp, Log, and SmoothedHinge. For more information, see the documentation page about losses, Loss.
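For orientation, the textbook forms of three of these losses can be sketched as follows, for a label y in {-1, +1} and a raw score s. These are the standard definitions and are assumed here, not taken from the nimbusml implementation. The function names and the beta parameter are illustrative, and SmoothedHinge is left out because its exact form depends on a smoothing constant.

```python
import math


def hinge_loss(label, score):
    # Zero once the example is classified correctly with a margin of at least 1.
    return max(0.0, 1.0 - label * score)


def log_loss(label, score):
    # Logistic loss; log1p keeps the result accurate for small arguments.
    return math.log1p(math.exp(-label * score))


def exp_loss(label, score, beta=1.0):
    # Exponential loss; beta is an assumed scaling parameter.
    return math.exp(-beta * label * score)


for s in (-1.0, 0.0, 2.0):
    print(s, hinge_loss(1, s), log_loss(1, s), exp_loss(1, s))
```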

learning_rate

Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
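The effect of the step size can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2, which is not the perceptron objective; it is only an assumed stand-in chosen because its behavior is easy to verify by hand. The function name gradient_descent is hypothetical.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 (minimum at w = 0) with a fixed step size."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w -= learning_rate * gradient
    return w


too_big = gradient_descent(1.1)     # each step overshoots, so |w| grows
too_small = gradient_descent(0.01)  # moves toward 0, but slowly
moderate = gradient_descent(0.4)    # shrinks w by a factor of 5 per step

print(too_big, too_small, moderate)
```

With a step size of 1.1 each update jumps past the minimum and lands farther away than it started, while 0.01 is still far from the minimum after 20 steps.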

decrease_learning_rate

Whether to decrease the learning rate as iterations progress.

l2_regularization

The weight of the L2 regularization term.

number_of_iterations

Number of passes over the training data.

initial_weights_diameter

Sets the initial weights diameter that specifies the range from which values are drawn for the initial weights. These weights are initialized randomly from within this range. For example, if the diameter is specified to be d, then the weights are uniformly distributed between -d/2 and d/2. The default value is 0, which specifies that all the weights are set to zero.
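As a minimal sketch of this initialization, the following draws each weight uniformly from [-d/2, d/2]. The helper name init_weights and its seed argument are illustrative assumptions and do not reflect how nimbusml seeds its random number generator.

```python
import random


def init_weights(n_features, diameter=0.0, seed=0):
    """Draw each initial weight uniformly from [-diameter/2, diameter/2]."""
    rng = random.Random(seed)  # local generator so the example is repeatable
    half = diameter / 2.0
    return [rng.uniform(-half, half) for _ in range(n_features)]


zero_weights = init_weights(3)                   # default diameter 0: all zeros
random_weights = init_weights(3, diameter=0.5)   # each weight lies in [-0.25, 0.25]
print(zero_weights)
print(random_weights)
```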

reset_weights_after_x_examples

Number of examples after which weights will be reset to the current average.

lazy_update

Instead of updating averaged weights on every example, only update when loss is nonzero.

recency_gain

Extra weight given to more recent updates.

recency_gain_multiplicative

Whether the recency gain is multiplicative (vs. additive).

averaged

Whether to average the weights.

averaged_tolerance

The inexactness tolerance for averaging.

initial_weights

Initial weights and bias, given as comma-separated values.

shuffle

Whether to shuffle for each training iteration.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # AveragedPerceptronBinaryClassifier
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.linear_model import AveragedPerceptronBinaryClassifier

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path)
   print(data.head())
   #   age  case education  induced  parity   ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6  ...       1            2  ...
   # 1   42     1    0-5yrs        1       1  ...       2            0  ...
   # 2   39     1    0-5yrs        2       6  ...       3            0  ...
   # 3   34     1    0-5yrs        2       4  ...       4            0  ...
   # 4   35     1   6-11yrs        1       3  ...       5            1  ...
   # define the training pipeline
   pipeline = Pipeline([AveragedPerceptronBinaryClassifier(
       feature=['age', 'parity', 'spontaneous'], label='case')])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel     Score
   # 0               0 -0.285667
   # 1               0 -1.304729
   # 2               0 -2.651955
   # 3               0 -2.111450
   # 4               0 -0.660658
   # print evaluation metrics
   print(metrics)
   #        AUC  Accuracy  Positive precision  Positive recall  ...
   # 0  0.705038   0.71371                 0.7         0.253012  ...

Remarks

Perceptron is a classification algorithm that makes its predictions based on a linear function. That is, for an instance with feature values f_0, f_1, ..., f_D-1, the prediction is given by the sign of sigma[0, D-1] (w_i * f_i), where w_0, w_1, ..., w_D-1 are the weights computed by the algorithm.

Perceptron is an online algorithm, i.e., it processes the instances in the training set one at a time. The weights are initialized to 0 or to some random values. Then, for each example in the training set, the value of sigma[0, D-1] (w_i * f_i) is computed. If this value has the same sign as the label of the current example, the weights remain the same. If they have opposite signs, the weight vector is updated by either subtracting or adding (if the label is negative or positive, respectively) the feature vector of the current example, multiplied by a factor 0 < a <= 1, called the learning rate. In a generalization of this algorithm, the weights are updated by adding the feature vector multiplied by the learning rate and by the gradient of some loss function (in the specific case described above, the loss is hinge loss, whose gradient is 1 when it is non-zero).
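The basic update rule described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the nimbusml implementation: labels are encoded as -1 and +1, a constant first feature stands in for the bias, and a score of exactly 0 is treated as a mistake so that training can start from all-zero weights. The names train_perceptron and predict are hypothetical.

```python
def predict(weights, features):
    """Return +1 or -1 from the sign of sigma[0, D-1] (w_i * f_i)."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else -1


def train_perceptron(examples, learning_rate=1.0, iterations=10):
    """examples is a list of (features, label) pairs with label in {-1, +1}."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(iterations):
        for features, label in examples:
            score = sum(w * f for w, f in zip(weights, features))
            if score * label <= 0:  # wrong sign (or zero score): update
                # Adds the feature vector for a positive label and
                # subtracts it for a negative one, scaled by the rate.
                weights = [w + learning_rate * label * f
                           for w, f in zip(weights, features)]
    return weights


# Toy data: features are [bias, x]; the label is the sign of x.
examples = [([1.0, 1.0], 1), ([1.0, -1.0], -1),
            ([1.0, 2.0], 1), ([1.0, -2.0], -1)]
weights = train_perceptron(examples)
print(weights)
print([predict(weights, f) for f, _ in examples])
```

On this toy set the weights change only on the first two examples and then stay fixed, because every later example is already on the correct side.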

In Averaged Perceptron (AKA voted-perceptron), each weight vector is stored together with a weight that counts the number of iterations it survived (this is equivalent to storing the weight vector after every iteration, regardless of whether it was updated or not). The prediction is then calculated by taking the weighted average of all the sums sigma[0, D-1] (w_i * f_i) for the different weight vectors.
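The averaging step can be sketched as follows. Because the score is linear in the weights, averaging the scores of all stored weight vectors gives the same result as scoring once with the averaged weight vector, so the sketch keeps a running sum of the weights after every example. It assumes labels in {-1, +1}, a constant first feature as the bias, and a zero score counted as a mistake. The function name is hypothetical, and parameters such as lazy_update and recency_gain are not modeled.

```python
def train_averaged_perceptron(examples, learning_rate=1.0, iterations=2):
    """Return the average of the weight vector over all training steps."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    weight_sum = [0.0] * n_features
    steps = 0
    for _ in range(iterations):
        for features, label in examples:
            score = sum(w * f for w, f in zip(weights, features))
            if score * label <= 0:  # mistake: standard perceptron update
                weights = [w + learning_rate * label * f
                           for w, f in zip(weights, features)]
            # Accumulate after every example, updated or not, so vectors
            # that survive longer contribute more to the average.
            weight_sum = [s + w for s, w in zip(weight_sum, weights)]
            steps += 1
    return [s / steps for s in weight_sum]


# Toy data: features are [bias, x]; the label is the sign of x.
examples = [([1.0, 1.0], 1), ([1.0, -1.0], -1),
            ([1.0, 2.0], 1), ([1.0, -2.0], -1)]
averaged = train_averaged_perceptron(examples)
scores = [sum(w * f for w, f in zip(averaged, feats)) for feats, _ in examples]
print(averaged)
print(scores)
```

Here the first weight vector survives one step and the second survives the remaining seven, so the average is dominated by the vector that made no further mistakes.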

Reference

Wikipedia entry for Perceptron

Large Margin Classification Using the Perceptron Algorithm

Discriminative Training Methods for Hidden Markov Models

Methods

decision_function

Returns score values

get_params

Get the parameters for this operator.

predict_proba

Returns probabilities

decision_function

Returns score values

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False

predict_proba

Returns probabilities

predict_proba(X, **params)