Machine Learning - Initialize Model - Clustering

This article describes the modules in Azure Machine Learning Studio that support creation of clustering models.

Note

Applies to: Machine Learning Studio

This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.

What is clustering?

Clustering, in machine learning, is a method of grouping similar data points into clusters. It is also called segmentation.

Over the years, many clustering algorithms have been developed. Almost all clustering algorithms use the features of individual items to find similar items. For example, you might apply clustering to find similar people by demographics. You might use clustering with text analysis to group sentences with similar topics or sentiment.

Clustering is called an unsupervised learning technique because it can be used on unlabeled data. Indeed, clustering is a useful first step for discovering new patterns, and requires little prior knowledge about how the data might be structured or how items are related. Clustering is often used to explore data before analysis with other, more predictive algorithms.

How to create a clustering model

In Machine Learning Studio, you can use clustering with either labeled or unlabeled data.

  • With unlabeled data, the clustering algorithm determines which data points are closest together, and creates clusters around a central point, or centroid. You can then use the cluster ID as a temporary label for the group of data.

  • If the data has labels, you can use the label to drive the number of clusters, or use the label as just another feature.
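The centroid idea described above can be illustrated outside Studio with a minimal sketch of a k-means style loop in plain Python. This is not how the Studio modules are implemented. The function name `kmeans`, the sample points, and the choice to seed the centroids with the first k points are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Sketch of Lloyd's algorithm: returns (centroids, cluster_ids)."""
    # Assumption: seed centroids with the first k points (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the ID of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical unlabeled data: two visually obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centroids, cluster_ids = kmeans(points, k=2)
```

The returned `cluster_ids` list plays the role of the temporary label mentioned above: each entry is the ID of the cluster that the corresponding data point was assigned to.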

After you have configured the clustering algorithm, you train it on data by using either the Train Clustering Model or Sweep Clustering modules.

When the model is trained, use it to predict cluster membership for new data points. For example, if you have used clustering to group customers by purchasing behavior, you can use the model to predict the purchasing behavior of new customers.
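As a rough illustration of predicting cluster membership, the sketch below assigns each new data point to its nearest centroid. It assumes a centroid-based model that has already been trained. The centroid values, the two customer features, and the helper `predict_cluster` are hypothetical and are not part of any Studio API.

```python
import math

def predict_cluster(point, centroids):
    """Return the index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

# Hypothetical centroids from a trained model, using two made-up features:
# (orders per month, average basket value).
centroids = [(1.0, 20.0), (8.0, 75.0)]

# New customers that were not part of the training data.
new_customers = [(2.0, 25.0), (7.0, 80.0)]
assignments = [predict_cluster(c, centroids) for c in new_customers]
```

Each entry in `assignments` is the cluster ID for the corresponding new customer. In practice the features would usually be scaled first so that no single column dominates the distance.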

List of modules

The clustering category includes this module:

To use a different clustering algorithm, or create a custom clustering model by using R, see these topics:

Examples

For examples of clustering in action, see the Azure AI Gallery.

See these articles for help choosing an algorithm:

See also