Sample 5 - Classification: Predict churn, appetency, and up-selling
Learn how to build a complex machine learning experiment without writing a single line of code using the visual interface.
This experiment trains three two-class boosted decision tree classifiers, one for each of three common prediction tasks in customer relationship management (CRM) systems: churn, appetency, and up-selling. The data values and labels are split across multiple data sources and scrambled to anonymize customer information. Even so, we can use the visual interface to combine the datasets and train a model on the scrambled values.
Because we're trying to answer the question "Which one?", this is called a classification problem. However, you can apply the same steps in this experiment to other types of machine learning problems, such as regression or clustering.
Here's the completed graph for this experiment:
Create an Azure Machine Learning service workspace if you don't have one.
In your workspace, select Visual interface. Then select Launch visual interface.
The interface webpage opens in a new browser page.
You can also access the visual interface from your workspace landing page (preview).
Select the Open button for the Sample 5 experiment.
The data we use for this experiment is from KDD Cup 2009. The dataset has 50,000 rows and 230 feature columns. The task is to predict churn, appetency, and up-selling for each customer based on these features. For more information about the data and the task, see the KDD website.
This visual interface sample experiment shows binary classifier prediction of churn, appetency, and up-selling, a common task for customer relationship management (CRM).
First, we do some simple data processing.
The raw dataset contains many missing values. We use the Clean Missing Data module to replace the missing values with 0.
The features and the corresponding churn, appetency, and up-selling labels are in different datasets. We use the Add Columns module to append the label columns to the feature columns. The first column, Col1, is the label column. The rest of the columns, Var1, Var2, and so on, are the feature columns.
We use the Split Data module to split the dataset into train and test sets.
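For readers who want to see what these three steps amount to, here is a minimal plain-Python sketch. It is not how the visual interface modules are implemented. The helper names, the toy values, and the 75/25 split fraction are all assumptions chosen for illustration; the sample itself doesn't state a split ratio here.

```python
import random

def clean_missing(rows, fill=0.0):
    """Replace missing values (None) with a constant, in the spirit of Clean Missing Data."""
    return [[fill if value is None else value for value in row] for row in rows]

def add_columns(labels, features):
    """Put the label first and the features after it, row by row, in the spirit of Add Columns."""
    if len(labels) != len(features):
        raise ValueError("both datasets must have the same number of rows")
    return [[label] + row for label, row in zip(labels, features)]

def split_data(rows, train_fraction, seed=0):
    """Shuffle the rows and cut them into train and test sets, in the spirit of Split Data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in data (hypothetical values): three feature columns with gaps,
# and a separate list of labels for one task.
features = [
    [1.5, None, 3.0],
    [None, 2.0, None],
    [0.5, 0.1, 4.2],
    [None, None, 1.0],
]
churn_labels = [1, 0, 0, 1]

dataset = add_columns(churn_labels, clean_missing(features))
train, test = split_data(dataset, train_fraction=0.75)  # fraction is an assumption
```

Each row of `dataset` now starts with the label, followed by the feature values, which mirrors the Col1, Var1, Var2, ... layout described above.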
We then use the Boosted Decision Tree binary classifier with the default parameters to build the prediction models. We build one model per task, that is, one model each to predict up-selling, appetency, and churn.
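The "one model per task" idea can be sketched in code as follows. This example uses scikit-learn's `GradientBoostingClassifier` as a stand-in; it is a different implementation from the visual interface's Boosted Decision Tree module, and its defaults are not the same. The data is synthetic, and the way each label depends on a feature is purely hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_rows = 600

# Synthetic stand-in for the cleaned feature table (not the KDD Cup 2009 data).
X = rng.normal(size=(n_rows, 10))

# Hypothetical labels: each task depends on a different feature plus noise.
tasks = {
    "churn": (X[:, 0] + rng.normal(scale=0.5, size=n_rows) > 0).astype(int),
    "appetency": (X[:, 1] + rng.normal(scale=0.5, size=n_rows) > 0).astype(int),
    "up-selling": (X[:, 2] + rng.normal(scale=0.5, size=n_rows) > 0).astype(int),
}

models = {}
scores = {}
for name, y in tasks.items():
    # Same features for every task; only the label column changes.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    model = GradientBoostingClassifier(random_state=0)  # scikit-learn defaults
    model.fit(X_train, y_train)
    models[name] = model
    # Scored probability of the positive class on the held-out rows.
    scores[name] = model.predict_proba(X_test)[:, 1]
```

Training a separate binary classifier per label keeps each task independent, so a change to one label column doesn't affect the other two models.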
Visualize the output of the Evaluate Model module to see the performance of the model on the test set. For the up-selling task, the ROC curve shows that the model does better than a random model. The area under the curve (AUC) is 0.857. At threshold 0.5, the precision is 0.7, the recall is 0.463, and the F1 score is 0.545.
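To make the AUC figure more concrete, here is a small sketch of one standard way to compute it: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name and the toy labels and scores are assumptions for illustration; this is not the Evaluate Model module's code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy labels and scored probabilities (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(y_true, scores)
print(auc)  # 0.8125
```

A random model scores about 0.5 on this measure and a perfect ranking scores 1.0, which is why an AUC well above 0.5 indicates the model does better than chance.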
You can move the Threshold slider and see the metrics change for the binary classification task.
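The effect of moving the threshold can be sketched in a few lines of Python. The example below recomputes precision, recall, and F1 at several thresholds on toy data. The function name and values are assumptions, and so is the choice to treat a score equal to the threshold as a positive prediction.

```python
def precision_recall_f1(y_true, scores, threshold=0.5):
    """Precision, recall, and F1 when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold  # assumption: ties count as positive
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1

# Toy labels and scored probabilities (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.75):
    p, r, f1 = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 raises precision and lowers recall, which is the trade-off the slider lets you explore.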
Clean up resources
You can use the resources that you created as prerequisites for other Azure Machine Learning service tutorials and how-to articles.
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges:
In the Azure portal, select Resource groups on the left side of the window.
In the list, select the resource group that you created.
On the right side of the window, select the ellipsis button (...).
Select Delete resource group.
Deleting the resource group also deletes all resources that you created in the visual interface.
Delete only the compute target
To minimize charges, the compute target that you created here scales down to zero nodes automatically when it's not being used. If you want to delete the compute target, take these steps:
In the Azure portal, open your workspace.
In the Compute section of your workspace, select the resource.
Delete individual assets
In the visual interface where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.
Explore the other samples available for the visual interface: