SdcaNonCalibratedBinaryTrainer Class

Definition

The IEstimator<TTransformer> for training a binary logistic regression classification model using the stochastic dual coordinate ascent method.

public sealed class SdcaNonCalibratedBinaryTrainer : Microsoft.ML.Trainers.SdcaBinaryTrainerBase<Microsoft.ML.Trainers.LinearBinaryModelParameters>
type SdcaNonCalibratedBinaryTrainer = class
    inherit SdcaBinaryTrainerBase<LinearBinaryModelParameters>
Public NotInheritable Class SdcaNonCalibratedBinaryTrainer
Inherits SdcaBinaryTrainerBase(Of LinearBinaryModelParameters)
Inheritance

Remarks

To create this trainer, use SdcaNonCalibrated or SdcaNonCalibrated(Options).
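As a minimal sketch of constructing the trainer inside a pipeline (the input column names "Feature1", "Feature2", and "Label" are placeholders for illustration, not part of the API contract):

using Microsoft.ML;

var mlContext = new MLContext();

// Placeholder input columns; adjust to your schema.
var pipeline = mlContext.Transforms
    .Concatenate("Features", "Feature1", "Feature2")
    .Append(mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(
        labelColumnName: "Label",
        featureColumnName: "Features"));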

Input and Output Columns

The input label column data must be Boolean. The input features column data must be a known-sized vector of Single. This trainer outputs the following columns:

Output Column Name | Column Type | Description
Score | Single | The unbounded score that was calculated by the model.
PredictedLabel | Boolean | The predicted label, based on the sign of the score. A negative score maps to false and a positive score maps to true.
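As an illustration of these column requirements, the following sketch defines hypothetical input and prediction types; the class names and the vector size of 3 are assumptions.

using Microsoft.ML.Data;

// Input rows: a Boolean label and a known-sized vector of Single.
public class InputData
{
    public bool Label { get; set; }

    [VectorType(3)] // size 3 is a placeholder; the size must be known at training time
    public float[] Features { get; set; }
}

// Columns produced by the trained model.
public class Prediction
{
    public float Score { get; set; }          // unbounded score
    public bool PredictedLabel { get; set; }  // true when Score is positive
}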

Trainer Characteristics

Machine learning task | Binary classification
Is normalization required? | Yes
Is caching required? | No
Required NuGet in addition to Microsoft.ML | None
Exportable to ONNX | Yes

Training Algorithm Details

This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions. The algorithm scales well because it is a streaming training algorithm, as described in a KDD best paper.

Convergence is underwritten by periodically enforcing synchronization between primal and dual variables in a separate thread. Several choices of loss functions are provided, such as hinge loss and logistic loss. Depending on the loss used, the trained model can be, for example, a support vector machine or a logistic regression model. The SDCA method combines several of the best properties, such as the ability to do streaming learning (without fitting the entire data set into memory), reaching a reasonable result with a few scans of the whole data set (for example, see the experiments in this paper), and spending no computation on zeros in sparse data sets.
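For example, passing a hinge loss through the trainer's options yields an SVM-style model. This is a sketch rather than an official sample; the column names are placeholders.

using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// HingeLoss trains an SVM-style model; LogLoss would train a
// logistic-regression-style model instead.
var trainer = mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(
    new SdcaNonCalibratedBinaryTrainer.Options
    {
        LabelColumnName = "Label",
        FeatureColumnName = "Features",
        LossFunction = new HingeLoss()
    });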

Note that SDCA is a stochastic and streaming optimization algorithm. The result depends on the order of the training data because the stopping tolerance is not tight enough. In strongly convex optimization, the optimal solution is unique, so every run eventually reaches the same solution. Even in non-strongly-convex cases, you will get equally good solutions from run to run. For reproducible results, it is recommended that one set 'Shuffle' to false and 'NumThreads' to 1.
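In the current options type, those settings are assumed to correspond to the Shuffle and NumberOfThreads properties; the sketch below relies on those names.

using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext(seed: 0);

// Fix the data order and use one thread so repeated runs match.
var trainer = mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(
    new SdcaNonCalibratedBinaryTrainer.Options
    {
        Shuffle = false,
        NumberOfThreads = 1
    });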

This class uses empirical risk minimization (i.e., ERM) to formulate the optimization problem built upon collected data. Note that empirical risk is usually measured by applying a loss function to the model's predictions on collected data points. If the training data does not contain enough data points (for example, to train a linear model in $n$-dimensional space, we need at least $n$ data points), overfitting may occur, so that the model produced by ERM is good at describing training data but may fail to predict correct results in unseen events. Regularization is a common technique to alleviate such a phenomenon by penalizing the magnitude (usually measured by the norm function) of model parameters. This trainer supports elastic net regularization, which penalizes a linear combination of the L1-norm (LASSO), $|| \textbf{w} ||_1$, and the L2-norm (ridge), $|| \textbf{w} ||_2^2$, of the model weights $\textbf{w}$. L1-norm and L2-norm regularizations have different effects and uses that are complementary in certain respects.
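Written out under the notation above (with assumed symbols: loss function $\ell$, training pairs $(\textbf{x}_i, y_i)$ for $i=1,\dots,N$, and regularization coefficients $\lambda_1, \lambda_2 \geq 0$), the objective this trainer minimizes has the form

$$\min_{\textbf{w}} \ \frac{1}{N} \sum_{i=1}^{N} \ell\left(\textbf{w}^T \textbf{x}_i, y_i\right) + \lambda_1 || \textbf{w} ||_1 + \lambda_2 || \textbf{w} ||_2^2$$

where the first term is the empirical risk and the last two terms form the elastic net penalty.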

Together with the implemented optimization algorithm, L1-norm regularization can increase the sparsity of the weight vector $\textbf{w}$. For high-dimensional and sparse data sets, if users carefully select the coefficient of the L1-norm, it is possible to achieve good prediction quality with a model that has only a few non-zero weights (e.g., 1% of the total model weights) without affecting its predictive power. In contrast, the L2-norm cannot increase the sparsity of the trained model but can still prevent overfitting by avoiding large parameter values. Sometimes, using the L2-norm leads to better prediction quality, so users may still want to try it and fine-tune the coefficients of the L1-norm and L2-norm. Note that conceptually, using the L1-norm implies that the distribution of all model parameters is a Laplace distribution, while the L2-norm implies a Gaussian distribution for them.

An aggressive regularization (that is, assigning large coefficients to L1-norm or L2-norm regularization terms) can harm predictive capacity by excluding important variables from the model. For example, a very large L1-norm coefficient may force all parameters to be zeros and lead to a trivial model. Therefore, choosing the right regularization coefficients is important in practice.
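The two coefficients correspond to the L1Regularization and L2Regularization options; the values in this sketch are placeholders to tune, not recommendations.

using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Placeholder coefficients; tune them, for example by cross-validation.
var trainer = mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(
    new SdcaNonCalibratedBinaryTrainer.Options
    {
        L1Regularization = 0.01f, // encourages sparse weights
        L2Regularization = 0.1f   // discourages large weights
    });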

For more information, see:

Scaling Up Stochastic Dual Coordinate Ascent
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Check the See Also section for links to examples of the usage.

Fields

FeatureColumn

The feature column that the trainer expects.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)
LabelColumn

The label column that the trainer expects. Can be null, which indicates that label is not used for training.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)
WeightColumn

The weight column that the trainer expects. Can be null, which indicates that weight is not used for training.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Properties

Info (Inherited from SdcaBinaryTrainerBase<TModelParameters>)

Methods

Fit(IDataView)

Trains and returns an ITransformer.

(Inherited from TrainerEstimatorBase<TTransformer,TModel>)
GetOutputSchema(SchemaShape) (Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

"キャッシュチェックポイント" を推定チェーンに追加します。 これにより、ダウンストリームの estimators がキャッシュされたデータに対してトレーニングされます。 複数のデータパスを使用する場合は、トレーナーの前にキャッシュチェックポイントを用意することをお勧めします。

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. At the same time, however, IEstimator<TTransformer> objects are often formed into pipelines with many estimators chained via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want to get is buried somewhere in this chain. For that scenario, this method lets us attach a delegate that will be called once Fit is called.
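For example, the delegate can capture the trained model parameters when Fit is eventually called on the chain; this sketch assumes the trainer from this page.

using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Capture the linear model produced when the pipeline is fit.
LinearBinaryModelParameters modelParameters = null;
var trainer = mlContext.BinaryClassification.Trainers.SdcaNonCalibrated()
    .WithOnFitDelegate(transformer => modelParameters = transformer.Model);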

Applies to

See also