Use boosted decision tree to predict churn with Azure Machine Learning designer
Designer (preview) sample 5
APPLIES TO: Basic edition, Enterprise edition
Learn how to build a complex machine learning pipeline without writing a single line of code using the designer (preview).
This pipeline trains two two-class boosted decision tree classifiers to predict customer churn, a common task for customer relationship management (CRM) systems. The data values and labels are split across multiple data sources and scrambled to anonymize customer information. However, you can still use the designer to combine the datasets and train a model on the obscured values.
Because you're trying to answer the question "Which one?", this is called a classification problem. You can apply the same logic shown in this sample to any type of machine learning problem, whether it's regression, classification, clustering, or something else.
Here's the completed graph for this pipeline:
Create an Azure Machine Learning workspace if you don't have one.
Sign in to ml.azure.com, and select the workspace you want to work with.
- Click sample 5 to open it.
The data for this pipeline is from KDD Cup 2009. It has 50,000 rows and 230 feature columns. The task is to predict churn, appetency, and up-selling for customers who use these features. For more information about the data and the task, see the KDD website.
This sample pipeline in the designer shows binary classifier prediction of churn, appetency, and up-selling, a common task for customer relationship management (CRM).
First, some simple data processing.
The raw dataset has many missing values. Use the Clean Missing Data module to replace the missing values with 0.
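As a rough code analogue of this step, the following sketch replaces missing values with 0. It is plain Python rather than anything the designer runs, and it assumes missing values are represented as None. The function name clean_missing is hypothetical.

```python
def clean_missing(rows, fill_value=0):
    """Return a copy of rows with every missing value (None) replaced.

    Assumption: missing cells are encoded as None. The original rows
    are left unchanged.
    """
    return [[fill_value if value is None else value for value in row]
            for row in rows]


raw = [[1.5, None, 3], [None, 2.0, None]]
cleaned = clean_missing(raw)  # every None becomes 0
```

Filling with a constant keeps every row, which matters here because the dataset has many missing values and dropping incomplete rows would discard much of it.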
The features and the corresponding churn labels are in different datasets. Use the Add Columns module to append the label columns to the feature columns. The first column, Col1, is the label column. The visualization result shows that the dataset is unbalanced: there are many more negative examples (-1) than positive examples (+1). The SMOTE module is used later in the pipeline to increase the underrepresented cases.
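To illustrate what appending columns and checking class balance look like outside the designer, here is a minimal plain-Python sketch. The function name add_columns and the tiny sample data are hypothetical. It assumes both datasets list the same customers in the same row order.

```python
from collections import Counter


def add_columns(left_rows, right_rows):
    """Join two datasets column-wise, row by row.

    Assumption: both datasets have the same number of rows and the
    rows are in the same order.
    """
    if len(left_rows) != len(right_rows):
        raise ValueError("datasets must have the same number of rows")
    return [list(left) + list(right)
            for left, right in zip(left_rows, right_rows)]


# Made-up data: the label column comes first, followed by the features.
labels = [[-1], [-1], [1], [-1]]
features = [[0.2, 3], [1.1, 0], [0.7, 5], [0.0, 2]]

combined = add_columns(labels, features)
class_counts = Counter(row[0] for row in combined)  # examples per label
```

Counting the values in the label column is a quick way to see the same imbalance that the visualization shows.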
Use the Split Data module to split the dataset into train and test sets.
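A train/test split can be sketched in a few lines of plain Python. The sketch assumes a random row split. The 0.7 fraction and the seed are illustrative values, not settings taken from this sample, and split_rows is a hypothetical name.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into train and test sets."""
    shuffled = list(rows)                  # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_rows(range(10), train_fraction=0.7)
```

Every row lands in exactly one of the two sets, so the model is evaluated only on rows it did not see during training.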
Then use the Boosted Decision Tree binary classifier with the default parameters to build the prediction models. Build one model per task, that is, one model each to predict up-selling, appetency, and churn.
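To give a feel for how boosting combines many weak trees into one classifier, the sketch below implements AdaBoost with one-split trees (decision stumps) in plain Python. This is a deliberately simple stand-in and not the algorithm behind the designer module. All function names are hypothetical, and labels are assumed to be -1 or +1, as in this dataset.

```python
import math


def stump_predict(stump, row):
    """A stump tests one feature against one threshold."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, weights):
    """Find the stump with the lowest weighted error by exhaustive search."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def fit_boosted_stumps(X, y, rounds=10):
    """AdaBoost: each round focuses on the examples misclassified so far."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified rows, decrease the rest.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

To build one model per task, you would call fit_boosted_stumps once per label column with the same feature rows.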
In the right part of the pipeline, the SMOTE module increases the percentage of positive examples. The SMOTE percentage is set to 100 to double the number of positive examples. To learn more about how the SMOTE module works, see the SMOTE module reference.
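The core idea of SMOTE can be sketched in plain Python: each synthetic example is a random point on the line between a minority example and one of its nearest minority neighbors. This is a simplified illustration and not the designer module's implementation. The function name, the default of k=5 neighbors, and the seed are assumptions. It requires Python 3.8 or later for math.dist.

```python
import math
import random


def smote(minority, percentage=100, k=5, seed=0):
    """Create synthetic minority rows by interpolating between neighbors.

    percentage=100 produces one synthetic row per original row, which
    doubles the minority class once the new rows are added to the data.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    per_row = percentage // 100
    synthetic = []
    for i, row in enumerate(minority):
        # k nearest minority neighbors by Euclidean distance, excluding itself
        others = [other for j, other in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda other: math.dist(row, other))[:k]
        for _ in range(per_row):
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append([a + gap * (b - a)
                              for a, b in zip(row, neighbor)])
    return synthetic


positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_rows = smote(positives, percentage=100, k=2)
balanced_positives = positives + new_rows  # twice as many positive rows
```

Because the new rows are interpolated rather than copied, the classifier sees slightly varied positive examples instead of exact duplicates.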
Visualize the output of the Evaluate Model module to see the performance of the model on the test set.
You can move the Threshold slider and see the metrics change for the binary classification task.
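The effect of the threshold on the metrics can be reproduced with a short plain-Python sketch. It assumes the model outputs a score between 0 and 1 for the positive class and that labels are -1 or +1. The function name and the sample scores are made up for illustration.

```python
def metrics_at_threshold(scores, labels, threshold=0.5):
    """Compute accuracy, precision, and recall for a given threshold.

    A row is predicted positive when its score is at or above the threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


scores = [0.9, 0.6, 0.4, 0.2]
labels = [1, -1, 1, -1]
at_half = metrics_at_threshold(scores, labels, threshold=0.5)
at_low = metrics_at_threshold(scores, labels, threshold=0.3)
```

Lowering the threshold marks more rows as positive, which tends to raise recall and can lower precision. With an unbalanced dataset like this one, precision and recall are usually more informative than accuracy alone.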
Clean up resources
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to articles.
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges.
In the Azure portal, select Resource groups on the left side of the window.
In the list, select the resource group that you created.
Select Delete resource group.
Deleting the resource group also deletes all resources that you created in the designer.
Delete individual assets
In the designer where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.
The compute target that you created here automatically autoscales to zero nodes when it's not being used. This action is taken to minimize charges. If you want to delete the compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually delete those assets.
Explore the other samples available for the designer:
- Sample 1 - Regression: Predict an automobile's price
- Sample 2 - Regression: Compare algorithms for automobile price prediction
- Sample 3 - Classification with feature selection: Income Prediction
- Sample 4 - Classification: Predict credit risk (cost sensitive)
- Sample 6 - Classification: Predict flight delays
- Sample 7 - Text Classification: Wikipedia SP 500 Dataset