KMeansTrainer
Class
Definition
The IEstimator<TTransformer> for training a KMeans clusterer.
public class KMeansTrainer : Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.ClusteringPredictionTransformer<Microsoft.ML.Trainers.KMeansModelParameters>,Microsoft.ML.Trainers.KMeansModelParameters>
type KMeansTrainer = class
inherit TrainerEstimatorBase<ClusteringPredictionTransformer<KMeansModelParameters>, KMeansModelParameters>
Public Class KMeansTrainer
Inherits TrainerEstimatorBase(Of ClusteringPredictionTransformer(Of KMeansModelParameters), KMeansModelParameters)
Inheritance
TrainerEstimatorBase<TTransformer,TModel>
KMeansTrainer
Remarks
To create this trainer, use KMeans or KMeans(Options).
Input and Output Columns
The input features column data must be a vector of Single. No label column is needed. This trainer outputs the following columns:
Output Column Name | Column Type | Description
Score | vector of Single | The distances of the given data point to all clusters' centroids.
PredictedLabel | key type | The index of the closest cluster, as predicted by the model.
Trainer Characteristics
Machine learning task | Clustering
Is normalization required? | Yes
Is caching required? | Yes
Required NuGet in addition to Microsoft.ML | None
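Normalization matters because the trainer compares points by Euclidean distance, so a feature measured on a large scale can dominate every distance. In a real ML.NET pipeline this is done with one of the library's normalization transforms (for example NormalizeMinMax). The C# sketch below only illustrates the idea under that assumption: MinMaxScaling and Normalize are hypothetical names, not ML.NET APIs, and ML.NET's own transforms may produce a different output range than this per-feature [0, 1] rescaling.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET: rescales each feature (column) to [0, 1]
// so that no single feature dominates the Euclidean distances used by K-means.
public static class MinMaxScaling
{
    public static float[][] Normalize(float[][] rows)
    {
        int dims = rows[0].Length;
        var min = new float[dims];
        var max = new float[dims];
        for (int j = 0; j < dims; j++)
        {
            min[j] = rows.Min(r => r[j]);
            max[j] = rows.Max(r => r[j]);
        }

        return rows.Select(r =>
        {
            var scaled = new float[dims];
            for (int j = 0; j < dims; j++)
            {
                float range = max[j] - min[j];
                // A constant feature carries no distance information; map it to 0.
                scaled[j] = range == 0f ? 0f : (r[j] - min[j]) / range;
            }
            return scaled;
        }).ToArray();
    }
}
```

For the rows (0, 10), (5, 20), (10, 30), the sketch returns (0, 0), (0.5, 0.5), (1, 1): both features now span the same range, so each contributes comparably to the distances.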
Training Algorithm Details
K-means is a popular clustering algorithm. With K-means, the data is clustered into a specified number of clusters in order to minimize the within-cluster sum of squared distances. This implementation follows the Yinyang K-means method. For choosing the initial cluster centroids, one of three options can be used:
 Random initialization. This can lead to poor approximations of the optimal clustering.
 The K-means++ method. This is an improved initialization algorithm, introduced by Arthur and Vassilvitskii, that is guaranteed to find a solution that is $O(\log K)$-competitive with the optimal K-means solution in expectation.
 The K-means|| method. This method was introduced by Bahmani et al. and uses a parallel procedure that drastically reduces the number of passes over the data needed to obtain a good initialization.
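The K-means++ seeding rule can be sketched as follows: the first centroid is drawn uniformly at random, and each subsequent centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. KMeansPlusPlus and ChooseCentroids are hypothetical names; this is a textbook illustration of the seeding rule, not ML.NET's internal initialization code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of K-means++ (D^2) seeding; not ML.NET's implementation.
public static class KMeansPlusPlus
{
    static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int j = 0; j < a.Length; j++)
        {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    public static double[][] ChooseCentroids(double[][] points, int k, Random rng)
    {
        // First centroid: a uniformly random input point.
        var centroids = new List<double[]> { points[rng.Next(points.Length)] };

        // minDist[i] = squared distance from points[i] to its nearest chosen centroid.
        var minDist = new double[points.Length];
        for (int i = 0; i < points.Length; i++)
            minDist[i] = SquaredDistance(points[i], centroids[0]);

        while (centroids.Count < k)
        {
            double total = 0;
            foreach (double d in minDist) total += d;

            // Sample the next centroid with probability proportional to minDist.
            double target = rng.NextDouble() * total;
            int next = points.Length - 1;
            double cumulative = 0;
            for (int i = 0; i < points.Length; i++)
            {
                cumulative += minDist[i];
                if (target < cumulative) { next = i; break; }
            }
            centroids.Add(points[next]);

            // Refresh each point's distance to its nearest centroid.
            for (int i = 0; i < points.Length; i++)
                minDist[i] = Math.Min(minDist[i], SquaredDistance(points[i], points[next]));
        }
        return centroids.ToArray();
    }
}
```

With two well-separated groups of points, the second centroid lands in the other group with overwhelming probability, because points close to the first centroid receive weights near zero. Random initialization has no such bias, which is why it can produce poor starting points.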
K-means|| is the default initialization method. The other methods can be specified in the Options when creating the trainer using KMeans(Options).
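The within-cluster sum of squared distances described above is classically minimized by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below shows those plain Lloyd iterations, assuming the initial centroids have already been chosen (for example by one of the methods listed above). LloydSketch and its methods are hypothetical; Yinyang K-means, which this trainer follows, is designed to reach the same result while skipping many of the distance computations this sketch performs.

```csharp
using System;

// Hypothetical helper, not part of ML.NET: plain (unaccelerated) Lloyd iterations.
public static class LloydSketch
{
    public static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int j = 0; j < a.Length; j++)
        {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    // Refines `centroids` in place and returns each point's cluster index.
    public static int[] Run(double[][] points, double[][] centroids, int maxIterations)
    {
        int k = centroids.Length;
        int dims = points[0].Length;
        var assignment = new int[points.Length];

        for (int iter = 0; iter < maxIterations; iter++)
        {
            // Force an update on the first pass, since assignments start at 0.
            bool changed = iter == 0;

            // Assignment step: attach every point to its nearest centroid.
            for (int i = 0; i < points.Length; i++)
            {
                int best = 0;
                double bestDist = SquaredDistance(points[i], centroids[0]);
                for (int c = 1; c < k; c++)
                {
                    double d = SquaredDistance(points[i], centroids[c]);
                    if (d < bestDist) { best = c; bestDist = d; }
                }
                if (best != assignment[i]) { assignment[i] = best; changed = true; }
            }
            // No reassignment: the centroids already equal the means of their clusters.
            if (!changed) break;

            // Update step: move each centroid to the mean of its assigned points.
            var sums = new double[k, dims];
            var counts = new int[k];
            for (int i = 0; i < points.Length; i++)
            {
                counts[assignment[i]]++;
                for (int j = 0; j < dims; j++) sums[assignment[i], j] += points[i][j];
            }
            for (int c = 0; c < k; c++)
            {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid unchanged
                for (int j = 0; j < dims; j++) centroids[c][j] = sums[c, j] / counts[c];
            }
        }
        return assignment;
    }

    // The objective from the text: sum of squared distances to the assigned centroids.
    public static double WithinClusterSse(double[][] points, double[][] centroids, int[] assignment)
    {
        double total = 0;
        for (int i = 0; i < points.Length; i++)
            total += SquaredDistance(points[i], centroids[assignment[i]]);
        return total;
    }
}
```

For the four points (0, 0), (0, 2), (10, 0), (10, 2), started from centroids (0, 0) and (10, 0), the sketch converges after one update to centroids (0, 1) and (10, 1), with a within-cluster sum of squared distances of 4.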
Scoring Function
The output Score column contains the squared $L_2$-norm distance (i.e., squared Euclidean distance) of the given input vector $\textbf{x}\in \mathbb{R}^n$ to each cluster's centroid. Assume that the centroid of the $c$-th cluster is $\textbf{m}_c \in \mathbb{R}^n$. The $c$-th value in the Score column is $d_c = \| \textbf{x} - \textbf{m}_c \|_2^2$. The predicted label is the index of the smallest value in the $K$-dimensional vector $[d_{0}, \dots, d_{K-1}]$, where $K$ is the number of clusters.
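The scoring formulas above translate directly into code. In this sketch, KMeansScoring, Scores, and PredictedIndex are hypothetical names, and double is used instead of Single for brevity. Scores returns the vector $[d_{0}, \dots, d_{K-1}]$ and PredictedIndex returns its 0-based argmin. Because ML.NET key types are 1-based when read back as uint, the PredictedLabel value you observe may be this index plus one; check this against your ML.NET version.

```csharp
using System;

// Hypothetical sketch of the scoring function: d_c = ||x - m_c||_2^2, label = argmin_c d_c.
public static class KMeansScoring
{
    // Squared Euclidean distance from x to every centroid.
    public static double[] Scores(double[] x, double[][] centroids)
    {
        var d = new double[centroids.Length];
        for (int c = 0; c < centroids.Length; c++)
        {
            double sum = 0;
            for (int j = 0; j < x.Length; j++)
            {
                double diff = x[j] - centroids[c][j];
                sum += diff * diff;
            }
            d[c] = sum;
        }
        return d;
    }

    // 0-based index of the smallest score (ties go to the lower index).
    public static int PredictedIndex(double[] scores)
    {
        int best = 0;
        for (int c = 1; c < scores.Length; c++)
            if (scores[c] < scores[best]) best = c;
        return best;
    }
}
```

For $\textbf{x} = (1, 1)$ and centroids (0, 0), (3, 1), (1, 2), Scores returns [2, 4, 1] and PredictedIndex returns 2.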
For more information on K-means and K-means++, see: K-means, K-means++.
Check the See Also section for links to usage examples.
Properties
Info
Fields
FeatureColumn
The feature column that the trainer expects. (Inherited from TrainerEstimatorBase<TTransformer,TModel>) 
LabelColumn
The label column that the trainer expects. Can be 
WeightColumn
The weight column that the trainer expects. Can be 
Methods
Fit(IDataView)
Trains and returns an ITransformer. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
GetOutputSchema(SchemaShape)
(Inherited from TrainerEstimatorBase<TTransformer,TModel>)
Extension Methods
WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)
Given an estimator, returns a wrapping object that calls a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> objects are often combined into pipelines with many stages, built as a chain of estimators via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want may be buried somewhere in the chain. For that scenario, this method lets us attach a delegate that will be called once Fit is called.
Applies to
See also