Assign to Clusters (deprecated)

Assigns data to clusters using an existing trained clustering model

Category: Deprecated Modules and Features

Note

Applies to: Machine Learning Studio (classic)

This content pertains only to Studio (classic). Similar drag-and-drop modules have been added to Azure Machine Learning designer (preview). Learn more in the article comparing the two versions.

Module overview

This article describes how to use the Assign to Clusters module in Azure Machine Learning Studio (classic) to generate predictions from a clustering model that was trained with the K-Means clustering algorithm included in Studio.

The module returns the probable assignment for each new data point, based on the trained model.
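Conceptually, a trained K-Means model assigns a new data point to the cluster whose centroid is nearest to it. The following Python sketch illustrates that idea only; it is not Studio's implementation. It assumes numeric features, Euclidean distance, and centroids that are already known. The function names `nearest_centroid` and `assign_to_clusters` and the centroid values are hypothetical.

```python
from typing import List, Sequence


def nearest_centroid(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the 0-based index of the centroid closest to `point`."""
    if not centroids:
        raise ValueError("the model has no centroids")
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        if len(centroid) != len(point):
            raise ValueError("point and centroid have different numbers of features")
        # Squared Euclidean distance ranks centroids the same way as true distance.
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def assign_to_clusters(points: Sequence[Sequence[float]],
                       centroids: Sequence[Sequence[float]]) -> List[int]:
    """Assign each new point to its nearest centroid."""
    return [nearest_centroid(p, centroids) for p in points]


# Hypothetical centroids of a trained 2-cluster model with two numeric features.
centroids = [[0.0, 0.0], [10.0, 10.0]]
new_points = [[1.0, 2.0], [9.0, 8.5], [4.0, 4.0]]
print(assign_to_clusters(new_points, centroids))  # [0, 1, 0]
```

The labels are 0-based, and a point whose feature count does not match the centroids is rejected rather than silently truncated.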

Note

This module has been deprecated, and is available solely for compatibility with existing experiments. For new and updated experiments, we recommend that you use Assign Data to Clusters.

How to use Assign to Clusters

To use this module, you must already have a clustering model that was configured with the K-Means Clustering module and then trained in Studio.

  1. Add the Assign to Clusters module to your experiment, and attach the trained model to the left input port.

  2. Provide an unlabeled dataset as input.

    By default, all columns in the input dataset are returned in the results. If you want to use fewer columns when creating cluster predictions, use Select Columns in Dataset to select a subset of the columns.

    However, an error occurs if the dataset provided as input to Assign to Clusters doesn't contain all the columns that were used to train the clustering model.

  3. For Column Set, open the column selector and choose the columns that should be used as input to the trained clustering model.

    If the original model used eight feature columns to cluster cases, you must provide those same eight columns as input to Assign to Clusters.

  4. Leave the Check for Append or Uncheck for Result Only option selected if you want to add the cluster assignments to the input dataset.

    Deselect this option if you want only the results (cluster assignments). This is the typical choice when the experiment runs as a web service.

  5. Run the experiment.
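The steps above can be approximated in a short Python sketch that operates on a list of dictionary rows. It is an illustration under stated assumptions, not the module's code: features are numeric, distance is Euclidean, and `assign_rows`, its `append` flag (standing in for Check for Append or Uncheck for Result Only), and the sample data are all hypothetical.

```python
def nearest_centroid(vector, centroids):
    # 0-based index of the closest centroid by squared Euclidean distance;
    # min() keeps the first index when two centroids are equally close.
    return min(range(len(centroids)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, centroids[i])))


def assign_rows(rows, feature_columns, centroids, append=True):
    """Hypothetical sketch of the Assign to Clusters steps on a list of dict rows."""
    if not rows:
        raise ValueError("the input dataset is null or empty")
    # Every column used to train the model must be present in the input.
    missing = [c for c in feature_columns if c not in rows[0]]
    if missing:
        raise ValueError(f"input dataset is missing training columns: {missing}")
    results = []
    for row in rows:
        # Only the selected feature columns are passed to the model.
        label = nearest_centroid([row[c] for c in feature_columns], centroids)
        # append=True keeps the input columns; False returns only the assignment.
        results.append({**row, "Assignments": label} if append else {"Assignments": label})
    return results


# Hypothetical trained centroids and input rows; "id" is a non-feature column.
centroids = [[0.0, 0.0], [10.0, 10.0]]
rows = [{"id": 1, "x": 1.0, "y": 2.0}, {"id": 2, "x": 9.0, "y": 8.5}]
print(assign_rows(rows, ["x", "y"], centroids))
print(assign_rows(rows, ["x", "y"], centroids, append=False))
```

With `append=False` each output row holds only the Assignments value, which corresponds to the result-only output described in step 4.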

Results

When the Check for Append or Uncheck for Result Only option is selected, the Assign to Clusters module returns the cluster assignment for each case in a column named Assignments, appended at the right-hand side of the dataset.

If you do not want all the columns in the results, deselect that option so that only the cluster assignments are returned. For example, when making predictions as part of a web service, you might want to return only the predicted assignment.

The following table shows the typical results of Assign to Clusters.

In this example, K-Means clustering was used to group people in the Adult Census dataset, using these columns: workclass, education, occupation, and sex. The Assign to Clusters module was then used to predict the census group for new people, based on these attributes.

Age  Workclass         Education  Assignments
54   Federal-gov       HS-grad    0
43   State-gov         Assoc-voc  1
21   (missing)         HS-grad    (none)
28   Self-emp-not-inc  Bachelors  0

Important

If a case has a missing value in any attribute that was used to train the model (as in the preceding example, where the workclass value is missing), no cluster assignment is returned for that case.
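The following Python sketch models this behavior: a row with a missing value in any feature column gets `None` instead of a cluster label. It assumes numeric features, Euclidean distance, and that "missing" means `None` or NaN. How Studio itself represents the absent assignment in its output is not specified here, and the function `assign_or_none` and the data are hypothetical.

```python
import math


def assign_or_none(row, feature_columns, centroids):
    """Return the 0-based cluster for `row`, or None if any feature value is missing."""
    values = [row.get(c) for c in feature_columns]
    # Treat None and NaN as missing; no assignment is produced for such rows.
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
        return None
    # Otherwise pick the nearest centroid by squared Euclidean distance.
    return min(range(len(centroids)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(values, centroids[i])))


centroids = [[0.0, 0.0], [10.0, 10.0]]
rows = [{"x": 1.0, "y": 2.0}, {"x": None, "y": 3.0}, {"x": 9.0, "y": float("nan")}]
print([assign_or_none(r, ["x", "y"], centroids) for r in rows])  # [0, None, None]
```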

Examples

Because this module has been deprecated, there are no examples of this module in the Azure AI Gallery.

To see examples of updated clustering models, see these experiments:

  • Color quantization: Uses clustering to group the pixels of an image by color, reducing the number of bits needed to represent each image.

  • Clustering: Similar Companies: Uses clustering with text extracted from Wikipedia descriptions to find companies in predefined categories.

Technical notes

  • If the name of the appended column duplicates an existing column name in the dataset, a numeric suffix is added to the name of the new column.

  • The clusters created by the model are identified by 0-based numeric labels. These labels cannot be edited in Azure Machine Learning Studio (classic).
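The duplicate-name rule can be pictured with a small Python helper that adds a numeric suffix only when the name is already taken. The exact suffix format Studio uses is not given in this article; the `_<n>` format and the function `unique_column_name` are assumptions for illustration.

```python
def unique_column_name(name, existing):
    """Return `name`, or `name` with a numeric suffix if it is already taken.

    The "_<n>" suffix format is an assumption; Studio's exact format may differ.
    """
    if name not in existing:
        return name
    n = 1
    # Increase the suffix until the candidate name is free.
    while f"{name}_{n}" in existing:
        n += 1
    return f"{name}_{n}"


print(unique_column_name("Assignments", {"Age", "Assignments"}))  # Assignments_1
```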

Expected inputs

Name           Type                Description
Trained model  ICluster interface  Trained clustering model
Dataset        Data Table          Input data source

Module parameters

Name                                         Range  Type             Default  Description
Column Set                                   Any    ColumnSelection           Select the columns from the input dataset to map to the clustering model.
Check for Append or Uncheck for Result Only  Any    Boolean          true     Deselect this option if you want to output only the results (cluster assignments).

By default, the column containing the clustering results is appended to the columns of the input dataset.

Outputs

Name             Type        Description
Results dataset  Data Table  A dataset containing either the clustering results only, or the input dataset with the Assignments column appended

Exceptions

Exception   Description
Error 0003  Exception occurs if one or more of the inputs are null or empty.

For a list of errors specific to Studio modules, see Machine Learning Error codes.

For a list of API exceptions, see Machine Learning REST API Error Codes.

See also

K-Means Clustering
Score Model
Score
A-Z Module List
Assign Data to Clusters