LogisticRegressionBinaryClassifier Class

Machine Learning Logistic Regression

Inheritance

LogisticRegressionBinaryClassifier inherits from:
nimbusml.internal.core.linear_model._logisticregressionbinaryclassifier.LogisticRegressionBinaryClassifier
nimbusml.base_predictor.BasePredictor
sklearn.base.ClassifierMixin

Constructor

LogisticRegressionBinaryClassifier(normalize='Auto', caching='Auto', show_training_statistics=False, l2_regularization=1.0, l1_regularization=1.0, optimization_tolerance=1e-07, history_size=20, enforce_non_negativity=False, initial_weights_diameter=0.0, maximum_number_of_iterations=2147483647, stochastic_gradient_descent_initilaization_tolerance=0.0, quiet=False, use_threads=True, number_of_threads=None, dense_optimizer=False, feature=None, label=None, weight=None, **params)

Parameters

feature

see Columns.

label

see Columns.

weight

see Columns.

normalize

If Auto, the choice to normalize depends on the preference declared by the algorithm. This is the default choice. If No, no normalization is performed. If Yes, normalization is always performed. If Warn, a warning message is displayed when the algorithm needs normalization, but normalization is not performed. If normalization is performed, a MaxMin normalizer is used. This normalizer preserves sparsity by mapping zero to zero.
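
For example, normalization can be forced on regardless of the algorithm's preference. A minimal sketch; the feature and label column names are taken from the Examples section below:

   from nimbusml.linear_model import LogisticRegressionBinaryClassifier

   # Always apply MaxMin normalization to the features before training.
   clf = LogisticRegressionBinaryClassifier(
       normalize='Yes',
       feature=['parity', 'edu'],
       label='case')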

caching

Whether the trainer should cache the input training data.

show_training_statistics

Show statistics of training examples.

l2_regularization

L2 regularization weight.

l1_regularization

L1 regularization weight.

optimization_tolerance

Tolerance parameter for optimization convergence. Lower values make the optimization slower but more accurate.

history_size

Memory size for L-BFGS. Lower values are faster but less accurate. The technique used for optimization here is L-BFGS, which uses only a limited amount of memory to compute the next step direction. This parameter indicates the number of past positions and gradients to store for the computation of the next step. Must be greater than or equal to 1.

enforce_non_negativity

Enforce non-negative weights. This flag, however, does not put any constraint on the bias term; that is, the bias term can still be a negative number.

initial_weights_diameter

Sets the initial weights diameter that specifies the range from which values are drawn for the initial weights. These weights are initialized randomly from within this range. For example, if the diameter is specified to be d, then the weights are uniformly distributed between -d/2 and d/2. The default value is 0, which specifies that all the weights are set to zero.
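
The sampling described above can be illustrated with a short NumPy sketch; this mirrors the documented behaviour and is not the library's internal code:

   import numpy as np

   d = 1.0                        # initial_weights_diameter
   n_features = 5
   rng = np.random.default_rng(0)
   # Weights are drawn uniformly from [-d/2, d/2]; with d = 0 they are all zero.
   initial_weights = rng.uniform(-d / 2, d / 2, size=n_features)
   print(initial_weights)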

maximum_number_of_iterations

Maximum number of iterations.

stochastic_gradient_descent_initilaization_tolerance

Run SGD to initialize LR weights, converging to this tolerance.

quiet

If set to True, no output is produced during training.

use_threads

Whether or not to use threads. Default is True.

number_of_threads

Number of threads.

dense_optimizer

If True, forces densification of the internal optimization vectors. If False, allows the logistic regression optimizer to use sparse or dense internal states as it finds appropriate. Setting dense_optimizer to True requires the internal optimizer to use a dense internal state, which may help alleviate load on the garbage collector for some varieties of larger problems.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # LogisticRegressionBinaryClassifier
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.categorical import OneHotVectorizer
   from nimbusml.linear_model import LogisticRegressionBinaryClassifier

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path)
   print(data.head())
   #    age  case education  induced  parity ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       LogisticRegressionBinaryClassifier(feature=['parity', 'edu'], label='case')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel  Probability     Score
   # 0               0     0.334679 -0.687098
   # 1               0     0.334679 -0.687098
   # 2               0     0.334679 -0.687098
   # 3               0     0.334679 -0.687098
   # 4               0     0.334679 -0.687098
   # print evaluation metrics
   print(metrics)
   #   AUC  Accuracy  Positive precision  Positive recall  Negative precision  ...
   # 0  0.5  0.665323                   0                0           0.665323  ...

Remarks

Logistic Regression is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (for example, blood type given diagnostic test results), then the logistic regression is multinomial.

The optimization technique used for LogisticRegressionBinaryClassifier is the limited memory Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS). Both the L-BFGS and regular BFGS algorithms use quasi-Newton methods to estimate the computationally intensive Hessian matrix in the equation used by Newton's method to calculate steps. But the L-BFGS approximation uses only a limited amount of memory to compute the next step direction, which makes it especially suited for problems with a large number of variables. The history_size parameter specifies the number of past positions and gradients to store for use in the computation of the next step.
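
As a sketch of how these knobs are exposed on this class (the parameter values are illustrative only; the column names are taken from the Examples section above):

   from nimbusml.linear_model import LogisticRegressionBinaryClassifier

   # Store more past positions/gradients for the L-BFGS step computation and
   # tighten the convergence tolerance; both trade speed for accuracy.
   clf = LogisticRegressionBinaryClassifier(
       history_size=50,
       optimization_tolerance=1e-08,
       feature=['parity', 'edu'],
       label='case')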

This learner can use elastic net regularization: a linear combination of L1 (lasso) and L2 (ridge) regularization. Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevent overfitting by penalizing models with extreme coefficient values. This can improve the generalization of the model learned by selecting the optimal complexity in the bias-variance tradeoff. Regularization works by adding the penalty that is associated with coefficient values to the error of the hypothesis. An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less. L1 and L2 regularization have different effects and uses that are complementary in certain respects.

  • l1_regularization: can be applied to sparse models, when working with high-dimensional data. It pulls small weights associated with features that are relatively unimportant towards 0.

  • l2_regularization: is preferable for data that is not sparse. It pulls large weights towards zero.

Adding the ridge penalty to the regularization overcomes some of lasso's limitations. It can improve its predictive accuracy, for example, when the number of predictors is greater than the sample size. If x = l1_regularization and y = l2_regularization, ax + by = c defines the linear span of the regularization terms. The default values of x and y are both 1. Aggressive regularization can harm predictive capacity by excluding important variables from the model, so choosing optimal values for the regularization parameters is important for the performance of the logistic regression model.
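
For instance, an elastic-net-style penalty can be configured through the two regularization weights; the values below are illustrative, not recommendations:

   from nimbusml.linear_model import LogisticRegressionBinaryClassifier

   # Heavier L1 term to drive unimportant weights to zero, plus a small L2
   # term to keep the remaining weights from growing too large.
   clf = LogisticRegressionBinaryClassifier(
       l1_regularization=2.0,
       l2_regularization=0.5,
       feature=['parity', 'edu'],
       label='case')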

Reference

Wikipedia: L-BFGS

Wikipedia: Logistic regression

Scalable Training of L1-Regularized Log-Linear Models

Test Run - L1 and L2 Regularization for Machine Learning

Methods

decision_function

Returns score values

get_params

Get the parameters for this operator.

predict_proba

Returns probabilities

decision_function

Returns score values

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False

predict_proba

Returns probabilities

predict_proba(X, **params)
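
A minimal usage sketch, assuming the pipeline from the Examples section above has already been fitted:

   # Per-row class probabilities and raw (pre-sigmoid) scores.
   probabilities = pipeline.predict_proba(data)
   scores = pipeline.decision_function(data)
   print(probabilities[:5])
   print(scores[:5])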