Train Clustering Model

05/06/2019

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Trains a clustering model and assigns data from the training set to clusters.

Category: Machine Learning / Train

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Module overview

This article describes how to use the Train Clustering Model module in Machine Learning Studio (classic) to train a clustering model.

The module takes an untrained clustering model that you have already configured by using the K-Means Clustering module, and trains the model on a labeled or unlabeled dataset. The module creates both a trained model that you can use for prediction and a set of cluster assignments for each case in the training data.

Note

A clustering model cannot be trained by using the Train Model module, which is the generic module for creating machine learning models. That is because Train Model works only with supervised learning algorithms, which require labeled data. K-means and other clustering algorithms support unsupervised learning, meaning that the algorithm can learn from unlabeled data.
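To show why no label column is needed, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the implementation behind the K-Means Clustering module: it assumes Euclidean distance and initializes centroids by sampling rows at random, whereas the module exposes its own initialization and metric options. The function name `kmeans` is hypothetical. The returned centroids play the role of the trained model, and the assignments correspond to the per-row cluster labels.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster unlabeled points with Lloyd's algorithm (illustrative sketch).

    Returns (centroids, assignments). Assumes Euclidean distance and
    random-row initialization; no label column is used at any step.
    """
    if not points or k <= 0 or k > len(points):
        raise ValueError("need 1 <= k <= number of points")
    rng = random.Random(seed)
    # Initialize centroids by sampling k distinct rows from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignments == assignments:
            break  # assignments stopped changing: converged
        assignments = new_assignments
    return centroids, assignments
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` places the first two points in one cluster and the last two in the other, using only the coordinates.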

How to use Train Clustering Model

1. Add the Train Clustering Model module to your experiment in Studio (classic). You can find the module under Machine Learning Modules, in the Train category.
2. Add the K-Means Clustering module, or another custom module that creates a compatible clustering model, and set the parameters of the clustering model. Connect the untrained model to the left-hand input of Train Clustering Model.
3. Attach a training dataset to the right-hand input of Train Clustering Model.
4. In Column Set, select the columns from the dataset to use in building clusters. Be sure to select columns that make good features: for example, avoid using IDs or other columns that have unique values, or columns that have all the same values.

   If a label is available, you can either use it as a feature or leave it out.
5. Select the option Check for Append or Uncheck for Result Only if you want to output the training data together with the new cluster label.

   If you deselect this option, only the cluster assignments are output.
6. Run the experiment, or click the Train Clustering Model module and select Run Selected.
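The Column Set advice about IDs and constant columns can be checked before you pick columns. The following is a hypothetical helper (`flag_columns` is not part of Studio (classic)) that assumes the data is available as a list of dictionaries, one per row. It flags columns whose values are all identical or all distinct so that you can review them.

```python
def flag_columns(rows):
    """Flag columns that are likely poor clustering features (hypothetical helper).

    'constant'   -> every row has the same value, so it cannot separate clusters.
    'all unique' -> every row has a different value, which is typical of IDs.
    """
    flags = {}
    if not rows:
        return flags
    for name in rows[0]:
        values = [row[name] for row in rows]
        distinct = len(set(values))
        if distinct == 1:
            flags[name] = "constant"
        elif distinct == len(values):
            flags[name] = "all unique"
    return flags
```

Treat the result as a prompt for review rather than a rule: a continuous measurement can have all-distinct values and still be a useful feature, whereas an ID column carries no similarity information.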

Results

After training has completed:

To view the clusters and their separation in a graph, right-click the Results dataset output and select Visualize.

The graph plots the data on its principal components rather than on the actual column values. See Principal Component Analysis for more information.
To view the values in the dataset, add an instance of the Convert to Dataset module, and connect it to the Results dataset output. Run the Convert to Dataset module to get a copy of the data that you can view or download.
To save the trained model for later reuse, right-click the module, select Trained model, and click Save As Trained Model.
To generate scores from the model, that is, cluster assignments for new data, use the Assign Data to Clusters module.
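Conceptually, scoring a K-means model means giving each new row the index of its nearest centroid. The sketch below assumes the trained model is represented as a plain list of centroid coordinates and that distance is Euclidean; `assign_to_clusters` is a hypothetical name and does not reproduce the internals of Assign Data to Clusters.

```python
import math


def assign_to_clusters(centroids, new_rows):
    """Return, for each new row, the index of the nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(row, centroids[c]))
        for row in new_rows
    ]
```

For example, with centroids `[[0, 0], [10, 10]]`, the rows `(1, 1)` and `(9, 8)` are assigned to clusters 0 and 1.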

Examples

For examples of how clustering is used in machine learning, see the Azure AI Gallery:

Clustering: Find similar Companies: Demonstrates how to use clustering on attributes derived from unstructured text.
Clustering: Color quantization: Demonstrates how to use clustering to find related colors and reduce the number of bits used in images.
Clustering: Group iris data: Provides a simple example of clustering based on the iris dataset.

Expected inputs

Name	Type	Description
Untrained model	ICluster interface	Untrained clustering model
Dataset	Data Table	Input data source

Module parameters

Name	Range	Type	Default	Description
Column Set	any	ColumnSelection		Column selection pattern
Check for Append or Uncheck for Result Only	any	Boolean	true	Whether the output dataset contains the input dataset with the assignments column appended (checked) or only the assignments column (unchecked)
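The effect of Check for Append or Uncheck for Result Only on the Results dataset can be sketched as follows. This assumes rows are dictionaries and uses `Assignments` as the name of the new column; both the helper `build_results` and that column name are hypothetical choices for illustration.

```python
def build_results(rows, assignments, append=True, column="Assignments"):
    """Shape a Results dataset from input rows and their cluster assignments.

    append=True mirrors the checked option: each input row plus its cluster.
    append=False mirrors the unchecked option: the assignments column only.
    """
    if len(rows) != len(assignments):
        raise ValueError("one assignment per row is required")
    if append:
        return [{**row, column: a} for row, a in zip(rows, assignments)]
    return [{column: a} for a in assignments]
```

Appending keeps the input columns next to each cluster label, which is convenient for inspection; the result-only form is smaller when you only need the labels.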

Outputs

Name	Type	Description
Trained model	ICluster interface	Trained clustering model
Results dataset	Data Table	The input dataset with the assignments column appended, or the assignments column only

Exceptions

Exception	Description
Error 0003	Exception occurs if one or more of the inputs are null or empty.
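The condition behind Error 0003 amounts to a precondition check on both inputs. Here is a hypothetical sketch: `InputError` and `check_inputs` are illustrative names, and treating any zero-length object as empty is an assumption, not a documented detail of the module.

```python
class InputError(ValueError):
    """Hypothetical stand-in for the module's Error 0003."""


def _is_missing(value):
    # None counts as null; a sized object with length 0 counts as empty.
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def check_inputs(untrained_model, dataset):
    """Raise InputError if either input is null or empty."""
    missing = [
        name
        for name, value in (("Untrained model", untrained_model), ("Dataset", dataset))
        if _is_missing(value)
    ]
    if missing:
        raise InputError("Error 0003: null or empty input(s): " + ", ".join(missing))
```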

For a list of errors specific to Studio (classic) modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.