SdcaLogisticRegressionBinaryTrainer SdcaLogisticRegressionBinaryTrainer SdcaLogisticRegressionBinaryTrainer Class

Definition

The IEstimator<TTransformer> for training a binary logistic regression classification model using the stochastic dual coordinate ascent method. The trained model is calibrated and can produce probability by feeding the output value of the linear function to a PlattCalibrator.

public sealed class SdcaLogisticRegressionBinaryTrainer : Microsoft.ML.Trainers.SdcaBinaryTrainerBase<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LinearBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>
type SdcaLogisticRegressionBinaryTrainer = class
    inherit SdcaBinaryTrainerBase<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>
Public NotInheritable Class SdcaLogisticRegressionBinaryTrainer
Inherits SdcaBinaryTrainerBase(Of CalibratedModelParametersBase(Of LinearBinaryModelParameters, PlattCalibrator))
Inheritance

Remarks

To create this trainer, use SdcaLogisticRegression or SdcaLogisticRegression(Options).

Input and Output Columns

The input label column data must be Boolean. The input features column data must be a known-sized vector of Single.

This trainer outputs the following columns:

Output Column Name Column Type Description
Score Single The unbounded score that was calculated by the model.
PredictedLabel Boolean The predicted label, based on the sign of the score. A negative score maps to false and a positive score maps to true.
Probability Single The probability calculated by calibrating the score of having true as the label. Probability value is in range [0, 1].

Trainer Characteristics

Machine learning task Binary classification
Is normalization required? Yes
Is caching required? No
Required NuGet in addition to Microsoft.ML None

Training Algorithm Details

This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions. The algorithm can be scaled because it's a streaming training algorithm as described in a KDD best paper.

Convergence is underwritten by periodically enforcing synchronization between primal and dual variables in a separate thread. Several choices of loss functions are also provided such as hinge-loss and logistic loss. Depending on the loss used, the trained model can be, for example, support vector machine or logistic regression. The SDCA method combines several of the best properties such the ability to do streaming learning (without fitting the entire data set into your memory), reaching a reasonable result with a few scans of the whole data set (for example, see experiments in this paper), and spending no computation on zeros in sparse data sets.

Note that SDCA is a stochastic and streaming optimization algorithm. The result depends on the order of training data because the stopping tolerance is not tight enough. In strongly-convex optimization, the optimal solution is unique and therefore everyone eventually reaches the same place. Even in non-strongly-convex cases, you will get equally-good solutions from run to run. For reproducible results, it is recommended that one sets 'Shuffle' to False and 'NumThreads' to 1.

Properties

Info Info Info

Fields

FeatureColumn FeatureColumn FeatureColumn

The feature column that the trainer expects.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

LabelColumn LabelColumn LabelColumn

The label column that the trainer expects. Can be null, which indicates that label is not used for training.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

WeightColumn WeightColumn WeightColumn

The weight column that the trainer expects. Can be null, which indicates that weight is not used for training.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Methods

Fit(IDataView) Fit(IDataView) Fit(IDataView)

Trains and returns a ITransformer.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

GetOutputSchema(SchemaShape) GetOutputSchema(SchemaShape) GetOutputSchema(SchemaShape)

Extension Methods

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>) WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>) WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object, rather than just a general ITransformer. However, at the same time, IEstimator<TTransformer> are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer> where the estimator for which we want to get the transformer is buried somewhere in this chain. For that scenario, we can through this method attach a delegate that will be called once fit is called.

Applies to

See also