SdcaMulticlassTrainerBase<TModel> Class
Definition
The IEstimator<TTransformer> to predict a target using a linear multiclass classifier model trained with a coordinate descent method. Depending on the loss function used, the trained model can be, for example, a maximum entropy classifier or a multiclass support vector machine.
public abstract class SdcaMulticlassTrainerBase<TModel> : Microsoft.ML.Trainers.SdcaTrainerBase<Microsoft.ML.Trainers.SdcaMulticlassTrainerBase<TModel>.MulticlassOptions,Microsoft.ML.Data.MulticlassPredictionTransformer<TModel>,TModel> where TModel : class
type SdcaMulticlassTrainerBase<'Model (requires 'Model : null)> = class
inherit SdcaTrainerBase<SdcaMulticlassTrainerBase<'Model>.MulticlassOptions, MulticlassPredictionTransformer<'Model>, 'Model (requires 'Model : null)>
Public MustInherit Class SdcaMulticlassTrainerBase(Of TModel)
Inherits SdcaTrainerBase(Of SdcaMulticlassTrainerBase(Of TModel).MulticlassOptions, MulticlassPredictionTransformer(Of TModel), TModel)
Type Parameters
 TModel
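Inheritance
Object → TrainerEstimatorBase<MulticlassPredictionTransformer<TModel>,TModel> → StochasticTrainerBase<MulticlassPredictionTransformer<TModel>,TModel> → SdcaTrainerBase<SdcaMulticlassTrainerBase<TModel>.MulticlassOptions,MulticlassPredictionTransformer<TModel>,TModel> → SdcaMulticlassTrainerBase<TModel>
Derived
SdcaMaximumEntropyMulticlassTrainer
SdcaNonCalibratedMulticlassTrainer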
Remarks
To create this trainer for a maximum entropy classifier, use SdcaMaximumEntropy or SdcaMaximumEntropy(Options). To create this trainer with a loss function of your choice (such as the hinge loss of a support vector machine), use SdcaNonCalibrated or SdcaNonCalibrated(Options).
Input and Output Columns
The input label column data must be of key type, and the feature column must be a known-sized vector of Single.
This trainer outputs the following columns:
Output Column Name  Column Type  Description
Score  Vector of Single  The scores of all classes. A higher value means a higher probability of belonging to the associated class. If the i-th element has the largest value, the predicted label index is i. Note that i is a zero-based index.
PredictedLabel  Key type  The predicted label's index. If its value is i, the actual label is the i-th category in the key-valued input label type.
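The relationship between the Score and PredictedLabel columns can be sketched in plain C# as follows. The score values are made up and `ArgMax` is a hypothetical helper, not an ML.NET API. The sketch also assumes ML.NET's raw key encoding, in which 0 is reserved for missing values, so a zero-based Score index i corresponds to key value i + 1 (the i-th category counted from 1).

```csharp
using System;

// Hypothetical Score column for one example in a 4-class problem.
float[] score = { -1.2f, 3.4f, 0.5f, 2.9f };

int predictedIndex = ArgMax(score);            // zero-based position in Score
uint predictedKey = (uint)predictedIndex + 1;  // assumed 1-based raw key value (0 = missing)

Console.WriteLine($"Score index: {predictedIndex}, PredictedLabel key: {predictedKey}");

// Returns the zero-based index of the largest score (the first one on ties).
static int ArgMax(float[] values)
{
    int best = 0;
    for (int i = 1; i < values.Length; i++)
    {
        if (values[i] > values[best]) best = i;
    }
    return best;
}
```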
Trainer Characteristics
Machine learning task  Multiclass classification 
Is normalization required?  Yes 
Is caching required?  No 
Required NuGet in addition to Microsoft.ML  None 
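Because normalization is required, features are usually rescaled before they reach this trainer; ML.NET offers normalizing transforms such as NormalizeMinMax for this. The following is only a minimal sketch of plain min-max scaling to [0, 1], assuming a dense in-memory feature matrix. `MinMaxNormalizeInPlace` is a hypothetical helper, and ML.NET's own normalizers may map values to a different range.

```csharp
using System;

// Hypothetical raw features: 3 examples x 2 features on very different scales.
float[][] rows =
{
    new[] { 1f, 1000f },
    new[] { 2f, 3000f },
    new[] { 3f, 2000f },
};

MinMaxNormalizeInPlace(rows);
foreach (var r in rows)
    Console.WriteLine(string.Join(", ", r));

// Rescales each feature column to [0, 1] using the column's observed min and max.
// A constant column is mapped to 0 to avoid dividing by zero.
static void MinMaxNormalizeInPlace(float[][] data)
{
    int featureCount = data[0].Length;
    for (int j = 0; j < featureCount; j++)
    {
        float min = float.MaxValue, max = float.MinValue;
        foreach (var row in data)
        {
            min = Math.Min(min, row[j]);
            max = Math.Max(max, row[j]);
        }
        float range = max - min;
        foreach (var row in data)
            row[j] = range > 0 ? (row[j] - min) / range : 0f;
    }
}
```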
Scoring Function
This trainer trains a linear model to solve multiclass classification problems. Assume that the number of classes is $m$ and the number of features is $n$. It assigns the $c$-th class a coefficient vector $\textbf{w}_c \in {\mathbb R}^n$ and a bias $b_c \in {\mathbb R}$, for $c=1,\dots,m$. Given a feature vector $\textbf{x} \in {\mathbb R}^n$, the $c$-th class's score is $\hat{y}^c = \textbf{w}_c^T \textbf{x} + b_c$. If $\textbf{x}$ belongs to class $c$, then $\hat{y}^c$ should be much larger than 0. In contrast, a $\hat{y}^c$ much smaller than 0 means the desired label should not be $c$.
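The scoring function $\hat{y}^c = \textbf{w}_c^T \textbf{x} + b_c$ can be sketched as below. The weights, biases, and features are hypothetical values for $m = 3$ classes and $n = 2$ features, and `LinearScores` is an illustrative helper rather than an ML.NET API.

```csharp
using System;

// Hypothetical model: m = 3 classes, n = 2 features.
float[][] w =
{
    new[] { 1.0f, -1.0f },   // w_1
    new[] { 0.5f,  0.5f },   // w_2
    new[] { -1.0f, 2.0f },   // w_3
};
float[] b = { 0.0f, -0.75f, 0.25f };   // b_1, b_2, b_3
float[] x = { 2.0f, 1.0f };            // feature vector

float[] scores = LinearScores(w, b, x);
Console.WriteLine(string.Join(", ", scores));

// Computes yhat^c = w_c^T x + b_c for every class c.
static float[] LinearScores(float[][] weights, float[] biases, float[] features)
{
    var result = new float[weights.Length];
    for (int c = 0; c < weights.Length; c++)
    {
        float dot = 0f;
        for (int j = 0; j < features.Length; j++)
            dot += weights[c][j] * features[j];
        result[c] = dot + biases[c];
    }
    return result;
}
```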
If and only if the trained model is a maximum entropy classifier, you can interpret the output score vector as the predicted class probabilities, because the softmax function is applied to post-process all classes' scores. More specifically, the probability of $\textbf{x}$ belonging to class $c$ is computed by $\tilde{P}( c \mid \textbf{x} ) = \frac{ e^{\hat{y}^c} }{ \sum_{c' = 1}^m e^{\hat{y}^{c'}} }$ and stored at the $c$-th element of the score vector. In other cases, the output score vector is just $[\hat{y}^1, \dots, \hat{y}^m]$.
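A minimal sketch of the softmax post-processing, assuming the raw scores are already available as a `double[]`. Subtracting the largest score before exponentiating is a standard numerical-stability trick that leaves the probabilities unchanged; whether ML.NET uses exactly this trick internally is an assumption, not something stated here.

```csharp
using System;
using System.Linq;

// Hypothetical raw scores yhat^1..yhat^m for one example (m = 3).
double[] scores = { 2.0, 1.0, 0.1 };
double[] probs = Softmax(scores);
Console.WriteLine(string.Join(", ", probs.Select(p => p.ToString("F4"))));

// P(c | x) = exp(yhat^c) / sum_{c'} exp(yhat^{c'}).
// Subtracting the maximum score first cancels out in the ratio,
// so the probabilities are unchanged, but it keeps Math.Exp from overflowing.
static double[] Softmax(double[] s)
{
    double max = s.Max();
    double[] e = s.Select(v => Math.Exp(v - max)).ToArray();
    double sum = e.Sum();
    return e.Select(v => v / sum).ToArray();
}
```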
Training Algorithm Details
The optimization algorithm is an extension of a coordinate descent method, following a path similar to the one proposed in an earlier paper. It is usually much faster than L-BFGS and truncated Newton methods on large-scale and sparse data sets.
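SDCA stands for stochastic dual coordinate ascent. The sketch below is the textbook SDCA update for a much simpler problem than the one this trainer solves: a single-output linear model with squared loss and only L2 regularization, on a hypothetical toy data set. It is not ML.NET's multiclass implementation; it only shows the core idea of maximizing the dual in one coordinate per step, in closed form, while keeping the primal weights in sync. `SdcaRidge` is a hypothetical name.

```csharp
using System;

// Hypothetical toy data set with one feature, where y = 2x exactly.
double[][] xs = { new[] { 1.0 }, new[] { 2.0 }, new[] { 3.0 } };
double[] ys = { 2.0, 4.0, 6.0 };
double reg = 0.1;   // L2 coefficient (lambda)

double[] weights = SdcaRidge(xs, ys, reg, epochs: 5000);
Console.WriteLine($"w = {weights[0]:F6}");

// Stochastic dual coordinate ascent for
//   min_w (1/N) * sum_i 0.5 * (w.x_i - y_i)^2 + (lambda/2) * ||w||^2.
// Each step maximizes the dual objective exactly along one coordinate alpha_i
// and keeps w = (1/(lambda*N)) * sum_i alpha_i * x_i in sync.
static double[] SdcaRidge(double[][] x, double[] y, double lambda, int epochs)
{
    int n = x.Length, d = x[0].Length;
    var alpha = new double[n];
    var w = new double[d];
    var rng = new Random(0);   // fixed seed for reproducibility
    for (int epoch = 0; epoch < epochs; epoch++)
    {
        for (int step = 0; step < n; step++)
        {
            int i = rng.Next(n);   // sample one example uniformly
            double dot = 0, sq = 0;
            for (int j = 0; j < d; j++)
            {
                dot += w[j] * x[i][j];
                sq += x[i][j] * x[i][j];
            }
            // Closed-form maximizer of the dual along alpha_i for the squared loss.
            double delta = (y[i] - dot - alpha[i]) / (1.0 + sq / (lambda * n));
            alpha[i] += delta;
            for (int j = 0; j < d; j++)
                w[j] += delta * x[i][j] / (lambda * n);
        }
    }
    return w;
}
```

For these inputs the result should agree with the closed-form ridge solution $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda N) = 28 / 14.3 \approx 1.958042$.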
This class uses empirical risk minimization (ERM) to formulate the optimization problem built upon collected data. Empirical risk is usually measured by applying a loss function to the model's predictions on the collected data points. If the training data does not contain enough data points (for example, to train a linear model in $n$-dimensional space, at least $n$ data points are needed), overfitting may occur: the model produced by ERM describes the training data well but may fail to predict correct results on unseen data. Regularization is a common technique to alleviate this by penalizing the magnitude (usually measured by a norm) of the model parameters. This trainer supports elastic net regularization, which penalizes a linear combination of the L1-norm (LASSO), $\| \textbf{w}_c \|_1$, and the squared L2-norm (ridge), $\| \textbf{w}_c \|_2^2$, for $c=1,\dots,m$. L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
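The regularized ERM objective described above can be sketched for the maximum entropy case (mean multiclass log loss) as follows. The model, data, and coefficients are hypothetical, `Objective` is an illustrative helper, and the factor of 1/2 on the L2 term is a common convention assumed here; ML.NET's exact scaling of each term may differ.

```csharp
using System;
using System.Linq;

// Hypothetical model with m = 2 classes and n = 2 features, plus a tiny data set.
double[][] W = { new[] { 0.5, -0.25 }, new[] { -0.5, 0.25 } };
double[] B = { 0.0, 0.0 };
double[][] X = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 } };
int[] labels = { 0, 1 };            // zero-based class index of each example
double l1Coef = 0.01, l2Coef = 0.1; // hypothetical regularization coefficients

double objective = Objective(W, B, X, labels, l1Coef, l2Coef);
Console.WriteLine($"Regularized empirical risk: {objective:F6}");

// Mean multiclass log loss over the data (the empirical risk) plus
//   l1 * sum_c ||w_c||_1 + (l2 / 2) * sum_c ||w_c||_2^2.
static double Objective(double[][] w, double[] b, double[][] x, int[] y, double l1, double l2)
{
    double risk = 0;
    for (int i = 0; i < x.Length; i++)
    {
        // Linear scores yhat^c = w_c^T x_i + b_c.
        var s = new double[w.Length];
        for (int c = 0; c < w.Length; c++)
            s[c] = w[c].Zip(x[i], (a, v) => a * v).Sum() + b[c];
        // -log softmax(s)[y_i], computed with a stable log-sum-exp.
        double max = s.Max();
        double logSumExp = max + Math.Log(s.Sum(v => Math.Exp(v - max)));
        risk += logSumExp - s[y[i]];
    }
    risk /= x.Length;
    double penalty = w.Sum(wc => l1 * wc.Sum(v => Math.Abs(v)) + 0.5 * l2 * wc.Sum(v => v * v));
    return risk + penalty;
}
```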
Together with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the model weights $\textbf{w}_1,\dots,\textbf{w}_m$. For high-dimensional and sparse data sets, if users carefully select the coefficient of the L1-norm, it is possible to achieve good prediction quality with a model that has only a few non-zero weights (e.g., 1% of total model weights) without affecting its predictive power. In contrast, the L2-norm cannot increase the sparsity of the trained model but can still prevent overfitting by avoiding large parameter values. Sometimes using the L2-norm leads to better prediction quality, so users may still want to try it and fine-tune the coefficients of the L1-norm and L2-norm. Conceptually, using the L1-norm corresponds to assuming a Laplace prior distribution over the model parameters, while the L2-norm corresponds to a Gaussian prior.
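The sparsity effect of L1-norm regularization is easiest to see through the elastic-net proximal (soft-thresholding) operator, sketched below. This is a generic illustration with hypothetical weights and coefficients, not the exact update used inside this trainer, and `ElasticNetProx` is a hypothetical helper. With an L1 coefficient, small weights become exactly zero; the L2 term alone only shrinks them.

```csharp
using System;
using System.Linq;

// Hypothetical weights before the regularization step.
double[] raw = { 0.8, -0.05, 0.02, -1.5, 0.1 };

// L1 only: entries with |value| <= 0.15 become exactly zero.
double[] sparse = ElasticNetProx(raw, l1: 0.15, l2: 0.0);
// L2 only: every entry shrinks by the same factor, none becomes zero.
double[] ridgeOnly = ElasticNetProx(raw, l1: 0.0, l2: 0.5);

Console.WriteLine($"L1 non-zeros: {sparse.Count(w => w != 0)} of {raw.Length}");
Console.WriteLine($"L2 non-zeros: {ridgeOnly.Count(w => w != 0)} of {raw.Length}");

// Proximal operator of l1*|w| + (l2/2)*w^2, applied coordinate-wise:
//   w = sign(v) * max(|v| - l1, 0) / (1 + l2)
static double[] ElasticNetProx(double[] v, double l1, double l2) =>
    v.Select(x =>
    {
        double magnitude = Math.Max(Math.Abs(x) - l1, 0.0) / (1.0 + l2);
        return x < 0 ? -magnitude : magnitude;
    }).ToArray();
```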
Aggressive regularization (that is, assigning large coefficients to the L1-norm or L2-norm regularization terms) can harm predictive capacity by excluding important variables from the model. For example, a very large L1-norm coefficient may force all parameters to be zero and lead to a trivial model. Therefore, choosing the right regularization coefficients is important in practice.
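The effect of an overly large L1 coefficient can be illustrated with the least-squares lasso, a simpler relative of the objective this trainer optimizes. For $\frac{1}{2N}\|\textbf{y} - X\textbf{w}\|_2^2 + \lambda \|\textbf{w}\|_1$, the all-zero solution is optimal whenever $\lambda \ge \|X^T \textbf{y}\|_\infty / N$. The sketch below, using hypothetical data and the hypothetical helpers `LambdaMax` and `LassoCd`, computes that threshold and runs cyclic coordinate descent below and just above it.

```csharp
using System;
using System.Linq;

// Hypothetical data: 4 examples, 2 features, no intercept.
double[][] X =
{
    new[] { 1.0, 0.0 },
    new[] { 0.0, 1.0 },
    new[] { 1.0, 1.0 },
    new[] { 2.0, 0.5 },
};
double[] Y = { 1.0, 0.5, 1.5, 2.0 };

double lambdaMax = LambdaMax(X, Y);
double[] wBelow = LassoCd(X, Y, 0.5 * lambdaMax, sweeps: 500);
double[] wAbove = LassoCd(X, Y, 1.01 * lambdaMax, sweeps: 500);

string below = string.Join(", ", wBelow.Select(v => v.ToString("F4")));
string above = string.Join(", ", wAbove.Select(v => v.ToString("F4")));
Console.WriteLine("lambda_max = " + lambdaMax);
Console.WriteLine("below: [" + below + "]");
Console.WriteLine("above: [" + above + "]");

// Smallest L1 coefficient for which w = 0 solves
//   min_w (1/(2N)) * ||y - Xw||^2 + lambda * ||w||_1.
static double LambdaMax(double[][] x, double[] y)
{
    int n = x.Length, d = x[0].Length;
    double best = 0;
    for (int j = 0; j < d; j++)
    {
        double g = 0;
        for (int i = 0; i < n; i++) g += x[i][j] * y[i];
        best = Math.Max(best, Math.Abs(g) / n);
    }
    return best;
}

// Cyclic coordinate descent for the lasso objective above.
static double[] LassoCd(double[][] x, double[] y, double lambda, int sweeps)
{
    int n = x.Length, d = x[0].Length;
    var w = new double[d];
    for (int s = 0; s < sweeps; s++)
    {
        for (int j = 0; j < d; j++)
        {
            double rho = 0, z = 0;
            for (int i = 0; i < n; i++)
            {
                double pred = 0;
                for (int k = 0; k < d; k++) pred += x[i][k] * w[k];
                // Residual with feature j's current contribution added back.
                rho += x[i][j] * (y[i] - pred + x[i][j] * w[j]);
                z += x[i][j] * x[i][j];
            }
            rho /= n;
            z /= n;
            // Soft-thresholding: exactly zero when |rho| <= lambda.
            w[j] = Math.Abs(rho) <= lambda ? 0.0 : (rho - Math.Sign(rho) * lambda) / z;
        }
    }
    return w;
}
```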
Check the See Also section for links to usage examples.
Fields
FeatureColumn 
The feature column that the trainer expects. (Inherited from TrainerEstimatorBase<TTransformer,TModel>) 
LabelColumn 
The label column that the trainer expects. Can be 
WeightColumn 
The weight column that the trainer expects. Can be 
Properties
Info  (Inherited from StochasticTrainerBase<TTransformer,TModel>) 
Methods
Fit(IDataView) 
Trains and returns an ITransformer. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
GetOutputSchema(SchemaShape)  (Inherited from TrainerEstimatorBase<TTransformer,TModel>) 
Extension Methods
WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>) 
Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> objects are often composed into pipelines of many objects, so a chain of estimators may need to be built via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want is buried somewhere in the chain. In that scenario, this method can attach a delegate that is called once Fit(IDataView) is called.
Applies to
See also
 SdcaMaximumEntropy(MulticlassClassificationCatalog+MulticlassClassificationTrainers, SdcaMaximumEntropyMulticlassTrainer+Options)
 SdcaMaximumEntropy(MulticlassClassificationCatalog+MulticlassClassificationTrainers, String, String, String, Nullable<Single>, Nullable<Single>, Nullable<Int32>)
 SdcaMaximumEntropyMulticlassTrainer.Options
 SdcaNonCalibrated(MulticlassClassificationCatalog+MulticlassClassificationTrainers, SdcaNonCalibratedMulticlassTrainer+Options)
 SdcaNonCalibrated(MulticlassClassificationCatalog+MulticlassClassificationTrainers, String, String, String, ISupportSdcaClassificationLoss, Nullable<Single>, Nullable<Single>, Nullable<Int32>)
 SdcaNonCalibratedMulticlassTrainer.Options