Thank you for using the Microsoft Q&A forum.
Based on your description, you have a K-Means clustering model that is trained on 4,000 points and then used with the "Assign Data to Clusters" component to assign another 4,000 items. The assignments look satisfactory, but you would like to know how confident each assignment is by accessing the scores associated with each item.
To achieve this, follow these steps:
- Use the "Assign Data to Clusters" component:
  - Train a clustering model in Azure Machine Learning designer by connecting the "K-Means Clustering" component to the "Train Clustering Model" component, or reuse a model you have already trained.
- Configure the component:
  - Connect the trained model to the left input port of the "Assign Data to Clusters" component.
  - Connect the new dataset to the right input port, making sure its columns match those used to train the clustering model.
- Retrieve the results:
  - The "Assign Data to Clusters" component returns a dataset containing the cluster assignment for each new data point.
  - This dataset includes the Assignments column, which indicates the cluster each row is assigned to, along with one distance column per cluster (typically named "DistancesToClusterCenter no.0", "DistancesToClusterCenter no.1", and so on) giving the distance from the row to that cluster's center.
- Select the top 5 scores:
  - The distance from a row to the center of its assigned cluster can serve as a confidence score: the smaller the distance, the more confident the assignment. Because K-means assigns each row to its nearest center, this is simply the smallest of the distance columns.
  - To obtain the 5 most confident assignments, use the "Execute Python Script" component to sort the rows by this distance in ascending order and keep the first 5.
  - Here is an example Python script for your reference. Please modify it as needed for your use case.

import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Distance columns are assumed to be named "DistancesToClusterCenter no.<k>"
    dist_cols = [c for c in dataframe1.columns if c.startswith("DistancesToClusterCenter")]
    # K-means assigns each row to its nearest center, so the minimum
    # distance is the distance to the assigned cluster's center
    dataframe1["DistanceToAssignedCenter"] = dataframe1[dist_cols].min(axis=1)
    # Smaller distance = more confident assignment, so sort ascending
    top_5 = dataframe1.sort_values("DistanceToAssignedCenter", ascending=True).head(5)
    # The designer expects a tuple of DataFrames as the return value
    return top_5,
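If you want to experiment with the distance columns outside the designer, here is a minimal local sketch using NumPy and pandas. It assumes the distance columns are named "DistancesToClusterCenter no.<k>" and uses made-up toy centers and points. The "Margin" column (the gap between the nearest and second-nearest center) is a hypothetical heuristic added for illustration; it is not something the "Assign Data to Clusters" component outputs.

```python
import numpy as np
import pandas as pd


def assign_to_clusters(points: np.ndarray, centers: np.ndarray) -> pd.DataFrame:
    """Mimic the layout of the Assign Data to Clusters output (column names assumed)."""
    # Euclidean distance from every point to every center: shape (n_points, n_centers)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    df = pd.DataFrame(
        dists, columns=[f"DistancesToClusterCenter no.{k}" for k in range(len(centers))]
    )
    # K-means assigns each point to its nearest center (ties go to the lower index)
    df.insert(0, "Assignments", dists.argmin(axis=1))
    return df


def add_confidence(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical confidence: gap between the nearest and second-nearest center."""
    dist_cols = [c for c in df.columns if c.startswith("DistancesToClusterCenter")]
    # Sort each row's distances so column 0 is the nearest, column 1 the runner-up
    sorted_d = np.sort(df[dist_cols].to_numpy(), axis=1)
    out = df.copy()
    out["DistanceToAssignedCenter"] = sorted_d[:, 0]
    # A larger margin means the point sits clearly closer to one center than any other
    out["Margin"] = sorted_d[:, 1] - sorted_d[:, 0]
    return out


# Toy data: two cluster centers on the x-axis and four points between them
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
points = np.array([[0.5, 0.0], [5.0, 0.0], [9.0, 0.0], [4.0, 0.0]])

scored = add_confidence(assign_to_clusters(points, centers))
top = scored.sort_values("Margin", ascending=False).head(2)
print(top[["Assignments", "DistanceToAssignedCenter", "Margin"]])
```

A margin near zero, such as for the point at x = 5 in the toy data, means the point lies on the boundary between two clusters, so its assignment is the least trustworthy. Inside the designer, the add_confidence logic could be placed in the "Execute Python Script" component and applied to dataframe1.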
For more details on this approach, see the "Component: K-Means Clustering" documentation here. You can also explore the step-by-step Microsoft documentation on clustering here.
I hope this information helps you improve your solution. Thank you.