SdcaRegressionTrainer
Class
Definition
The IEstimator<TTransformer> for training a regression model using the stochastic dual coordinate ascent method.
public sealed class SdcaRegressionTrainer : Microsoft.ML.Trainers.SdcaTrainerBase<Microsoft.ML.Trainers.SdcaRegressionTrainer.Options,Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.Trainers.LinearRegressionModelParameters>,Microsoft.ML.Trainers.LinearRegressionModelParameters>
type SdcaRegressionTrainer = class
inherit SdcaTrainerBase<SdcaRegressionTrainer.Options, RegressionPredictionTransformer<LinearRegressionModelParameters>, LinearRegressionModelParameters>
Public NotInheritable Class SdcaRegressionTrainer
Inherits SdcaTrainerBase(Of SdcaRegressionTrainer.Options, RegressionPredictionTransformer(Of LinearRegressionModelParameters), LinearRegressionModelParameters)
Inheritance
TrainerEstimatorBase<TTransformer,TModel>
StochasticTrainerBase<TTransformer,TModel>
SdcaTrainerBase<TOptions,TTransformer,TModel>
SdcaRegressionTrainer
Remarks
To create this trainer, use Sdca or Sdca(Options).
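As a minimal sketch of the two creation paths, the snippet below builds the trainer once from column names and once from an Options object. It assumes the ML.NET 1.x property names (`LabelColumnName`, `FeatureColumnName`, `MaximumNumberOfIterations`, `ConvergenceTolerance`); the column names and hyperparameter values are illustrative only.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class CreateSdcaTrainerSketch
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Path 1: Sdca with column names; remaining hyperparameters keep their defaults.
        SdcaRegressionTrainer simple = mlContext.Regression.Trainers.Sdca(
            labelColumnName: "Label",
            featureColumnName: "Features");

        // Path 2: Sdca(Options) for finer control.
        // Property names assume ML.NET 1.x; the values are illustrative, not recommendations.
        var options = new SdcaRegressionTrainer.Options
        {
            LabelColumnName = "Label",
            FeatureColumnName = "Features",
            MaximumNumberOfIterations = 30,
            ConvergenceTolerance = 0.02f
        };
        SdcaRegressionTrainer advanced = mlContext.Regression.Trainers.Sdca(options);
    }
}
```

Both calls only construct the estimator; no training happens until Fit(IDataView) is called.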
Input and Output Columns
The input label column data must be Single. The input features column data must be a known-sized vector of Single.
This trainer outputs the following columns:
Output Column Name  Column Type  Description
Score  Single  The unbounded score that was predicted by the model.
Trainer Characteristics
Machine learning task  Regression 
Is normalization required?  Yes 
Is caching required?  No 
Required NuGet in addition to Microsoft.ML  None 
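Because normalization is required, the trainer is usually preceded by a normalizing transform. The following sketch assumes ML.NET 1.x and uses hypothetical `DataPoint` and `Prediction` classes with synthetic data whose two features are on very different scales; it min-max normalizes the features, trains, and reads back the Score column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SdcaWithNormalizationSketch
{
    // Hypothetical input row: a Single label and a known-sized vector of Single.
    private class DataPoint
    {
        public float Label { get; set; }

        [VectorType(2)]
        public float[] Features { get; set; }
    }

    // Hypothetical output row: only the Score column is read back.
    private class Prediction
    {
        public float Score { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic data: the second feature is roughly 1000 times larger than the first.
        var samples = new List<DataPoint>();
        for (int i = 0; i < 100; i++)
        {
            float x1 = i;
            float x2 = 1000f * (i % 7);
            samples.Add(new DataPoint
            {
                Label = 2f * x1 + 0.001f * x2,
                Features = new[] { x1, x2 }
            });
        }
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Normalize the features before they reach the SDCA trainer.
        var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
            .Append(mlContext.Regression.Trainers.Sdca(
                labelColumnName: "Label",
                featureColumnName: "Features"));

        var model = pipeline.Fit(data);
        IDataView scored = model.Transform(data);

        // Print the first few predicted scores.
        var predictions = mlContext.Data
            .CreateEnumerable<Prediction>(scored, reuseRowObject: false)
            .Take(3);
        foreach (var prediction in predictions)
        {
            Console.WriteLine($"Score: {prediction.Score}");
        }
    }
}
```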
Training Algorithm Details
This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions. The algorithm can be scaled because it's a streaming training algorithm as described in a KDD best paper.
Convergence is underwritten by periodically enforcing synchronization between primal and dual variables in a separate thread. Several choices of loss functions are also provided, such as hinge loss and logistic loss. Depending on the loss used, the trained model can be, for example, a support vector machine or logistic regression. The SDCA method combines several desirable properties, such as the ability to do streaming learning (without fitting the entire data set into memory), reaching a reasonable result with a few scans of the whole data set (for example, see experiments in this paper), and spending no computation on zeros in sparse data sets.
Note that SDCA is a stochastic and streaming optimization algorithm. The result depends on the order of training data because the stopping tolerance is not tight enough. In strongly convex optimization, the optimal solution is unique, so every run eventually reaches the same solution. Even in non-strongly-convex cases, you will get equally good solutions from run to run. For reproducible results, set 'Shuffle' to False and 'NumThreads' to 1.
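A sketch of the reproducibility settings is shown below. It assumes that, in ML.NET 1.x, the 'Shuffle' and 'NumThreads' settings mentioned above are exposed on the Options object as `Shuffle` and `NumberOfThreads`; if your version uses different names, adjust accordingly.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class ReproducibleSdcaSketch
{
    public static void Main()
    {
        // A fixed seed makes the other randomized components of the context deterministic.
        var mlContext = new MLContext(seed: 0);

        var options = new SdcaRegressionTrainer.Options
        {
            LabelColumnName = "Label",
            FeatureColumnName = "Features",
            // Keep the training examples in their original order.
            Shuffle = false,
            // Assumed ML.NET 1.x name for the 'NumThreads' setting: train on a single thread.
            NumberOfThreads = 1
        };

        SdcaRegressionTrainer trainer = mlContext.Regression.Trainers.Sdca(options);
    }
}
```

Single-threaded, unshuffled training trades speed for run-to-run stability.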
This class uses empirical risk minimization (ERM) to formulate the optimization problem built upon collected data. Note that empirical risk is usually measured by applying a loss function to the model's predictions on collected data points. If the training data does not contain enough data points (for example, to train a linear model in $n$-dimensional space, we need at least $n$ data points), overfitting may happen, so that the model produced by ERM is good at describing training data but may fail to predict correct results in unseen events. Regularization is a common technique to alleviate such a phenomenon by penalizing the magnitude (usually measured by the norm function) of model parameters. This trainer supports elastic net regularization, which penalizes a linear combination of L1-norm (LASSO), $|| \textbf{w}_c ||_1$, and L2-norm (ridge), $|| \textbf{w}_c ||_2^2$ regularizations for $c=1,\dots,m$. L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
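One common way to write an elastic-net-regularized ERM objective is sketched below. The symbols $n$ (number of training examples), $L$ (the loss on one example), $\lambda_1$ and $\lambda_2$ (the L1-norm and L2-norm coefficients), and the factor $\frac{1}{2}$ are notational assumptions for illustration; the exact scaling used by the implementation may differ. For a single-output regression model one would expect $m=1$.

$$
\min_{\textbf{w}_1,\dots,\textbf{w}_m} \; \frac{1}{n} \sum_{i=1}^{n} L(\textbf{x}_i, y_i; \textbf{w}_1,\dots,\textbf{w}_m) + \sum_{c=1}^{m} \left( \lambda_1 || \textbf{w}_c ||_1 + \frac{\lambda_2}{2} || \textbf{w}_c ||_2^2 \right)
$$

The first term is the empirical risk; the second is the elastic net penalty, which reduces to LASSO when $\lambda_2 = 0$ and to ridge when $\lambda_1 = 0$.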
Together with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the model weights, $\textbf{w}_1,\dots,\textbf{w}_m$. For high-dimensional and sparse data sets, if users carefully select the coefficient of L1-norm, it is possible to achieve good prediction quality with a model that has only a few nonzero weights (e.g., 1% of total model weights). In contrast, L2-norm cannot increase the sparsity of the trained model but can still prevent overfitting by avoiding large parameter values. Sometimes, using L2-norm leads to better prediction quality, so users may still want to try it and fine-tune the coefficients of L1-norm and L2-norm. Note that conceptually, using L1-norm implies that the distribution of all model parameters is a Laplace distribution, while L2-norm implies a Gaussian distribution for them.
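The sparsity effect of L1-norm is often explained through the soft-thresholding operator, the proximal operator of a scaled absolute value. The sketch below uses a hypothetical scalar input $v$ and an assumed L1-norm coefficient $\lambda_1$; whether the implementation applies exactly this form is an assumption.

$$
S_{\lambda_1}(v) = \operatorname{sign}(v) \, \max(|v| - \lambda_1, 0)
$$

Any coordinate with $|v| \le \lambda_1$ is mapped to exactly zero, which is how sparse weights arise. The distributional remark can be sketched the same way: with a Laplace prior $p(w) \propto \exp(-\lambda_1 |w|)$ or a Gaussian prior $p(w) \propto \exp(-\frac{\lambda_2}{2} w^2)$ on each weight $w$, the negative log-prior is, up to a constant,

$$
-\log p(w) = \lambda_1 |w| \quad \text{(Laplace)}, \qquad -\log p(w) = \frac{\lambda_2}{2} w^2 \quad \text{(Gaussian)},
$$

which match the L1-norm and L2-norm penalty terms respectively.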
An aggressive regularization (that is, assigning large coefficients to L1-norm or L2-norm regularization terms) can harm predictive capacity by excluding important variables from the model. For example, a very large L1-norm coefficient may force all parameters to be zeros and lead to a trivial model. Therefore, choosing the right regularization coefficients is important in practice.
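One way to explore this trade-off is to train with several L1-norm coefficients and count the nonzero weights. The sketch below assumes the ML.NET 1.x property names `L1Regularization`, `L2Regularization`, `Shuffle`, and `NumberOfThreads`, and uses a hypothetical `DataPoint` class with synthetic data in which only the first two of ten features matter; the coefficient values are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static class SdcaRegularizationSketch
{
    // Hypothetical input row with ten features already in the range [0, 1).
    private class DataPoint
    {
        public float Label { get; set; }

        [VectorType(10)]
        public float[] Features { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        var random = new Random(0);

        // Synthetic data: the label depends only on the first two features.
        var samples = new List<DataPoint>();
        for (int i = 0; i < 500; i++)
        {
            float[] features = Enumerable.Range(0, 10)
                .Select(_ => (float)random.NextDouble())
                .ToArray();
            samples.Add(new DataPoint
            {
                Label = 3f * features[0] - 2f * features[1],
                Features = features
            });
        }
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Illustrative L1-norm coefficients, from none to very aggressive.
        foreach (float l1 in new[] { 0f, 0.1f, 100f })
        {
            var options = new SdcaRegressionTrainer.Options
            {
                LabelColumnName = "Label",
                FeatureColumnName = "Features",
                L1Regularization = l1,
                L2Regularization = 1e-4f,
                Shuffle = false,
                NumberOfThreads = 1
            };

            var model = mlContext.Regression.Trainers.Sdca(options).Fit(data);

            // Count the weights that were not driven to exactly zero.
            IReadOnlyList<float> weights = model.Model.Weights;
            int nonZero = weights.Count(w => w != 0f);
            Console.WriteLine($"L1 = {l1}: {nonZero} of {weights.Count} weights are nonzero");
        }
    }
}
```

In practice the coefficients would be chosen by comparing a validation metric across runs rather than by the nonzero count alone.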
For more information, see:
 Scaling Up Stochastic Dual Coordinate Ascent.
 Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization.
Check the See Also section for links to examples of the usage.
Properties
Info
Fields
FeatureColumn
The feature column that the trainer expects.
(Inherited from TrainerEstimatorBase<TTransformer,TModel>)
LabelColumn
The label column that the trainer expects. Can be null, which indicates that label is not used for training.
WeightColumn
The weight column that the trainer expects. Can be null, which indicates that weight is not used for training.
Methods
Fit(IDataView)
Trains and returns an ITransformer.
(Inherited from TrainerEstimatorBase<TTransformer,TModel>)
GetOutputSchema(SchemaShape)
Extension Methods
WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)
Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. At the same time, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer> where the estimator for which we want to get the transformer is buried somewhere in the chain. For that scenario, this method lets us attach a delegate that is called once Fit is called.
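The scenario described above can be sketched as follows: the SDCA trainer sits at the end of a pipeline, and a delegate captures its strongly typed transformer when the pipeline is fit. The `DataPoint` class and the synthetic data are hypothetical; the code assumes ML.NET 1.x.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static class OnFitDelegateSketch
{
    // Hypothetical input row.
    private class DataPoint
    {
        public float Label { get; set; }

        [VectorType(2)]
        public float[] Features { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        var samples = new List<DataPoint>();
        for (int i = 0; i < 100; i++)
        {
            float x1 = i / 100f;
            float x2 = (i % 10) / 10f;
            samples.Add(new DataPoint { Label = x1 + 2f * x2, Features = new[] { x1, x2 } });
        }
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Filled in by the delegate when the pipeline is fit.
        RegressionPredictionTransformer<LinearRegressionModelParameters> trainedSdca = null;

        var pipeline = mlContext.Transforms.NormalizeMinMax("Features")
            .Append(mlContext.Regression.Trainers.Sdca()
                .WithOnFitDelegate(transformer => trainedSdca = transformer));

        // Fitting the chain returns a general chain transformer,
        // but the delegate has captured the typed SDCA transformer.
        var model = pipeline.Fit(data);

        Console.WriteLine($"Bias: {trainedSdca.Model.Bias}");
        Console.WriteLine($"Weights: {string.Join(", ", trainedSdca.Model.Weights)}");
    }
}
```

Without the delegate, the typed transformer would have to be dug out of the fitted chain by position and cast.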
See also
 Sdca(RegressionCatalog+RegressionTrainers, String, String, String, ISupportSdcaRegressionLoss, Nullable<Single>, Nullable<Single>, Nullable<Int32>)
 Sdca(RegressionCatalog+RegressionTrainers, SdcaRegressionTrainer+Options)
 SdcaRegressionTrainer.Options