Build a classifier & use feature selection to predict income with Azure Machine Learning designer
Designer (preview) sample 3
APPLIES TO: Basic edition Enterprise edition (Upgrade to Enterprise)
Learn how to build a machine learning classifier without writing a single line of code using the designer (preview). This sample trains a two-class boosted decision tree to predict adult census income (>50K or <=50K).
Because the sample answers the question "Which one?", this is called a classification problem. However, you can apply the same fundamental process to any type of machine learning problem, whether it's regression, classification, clustering, and so on.
Here's the final pipeline graph for this sample:
Create an Azure Machine Learning workspace if you don't have one.
Sign in to ml.azure.com and select the workspace you want to work with.
- Select sample 3 to open it.
The dataset contains 14 features and one label column. There are multiple types of features, including numerical and categorical. The following diagram shows an excerpt from the dataset:
Follow these steps to create the pipeline:
Drag the Adult Census Income Binary dataset module into the pipeline canvas.
Add a Split Data module to create the training and test sets. Set the fraction of rows in the first output dataset to 0.7. This setting specifies that 70% of the data will be output to the left port of the module and the rest to the right port. We use the left dataset for training and the right one for testing.
Add the Filter Based Feature Selection module to select the 5 features that rank highest by Pearson correlation.
Add a Two-Class Boosted Decision Tree module to initialize a boosted decision tree classifier.
Add a Train Model module. Connect the classifier from the previous step to the left input port of the Train Model module. Connect the filtered dataset from the Filter Based Feature Selection module as the training dataset. The Train Model module trains the classifier.
Add the Select Columns Transformation and Apply Transformation modules to apply the same transformation (filter-based feature selection) to the test dataset.
Add a Score Model module and connect the Train Model module to it. Then connect the test set (the output of the Apply Transformation module, which applies the same feature selection to the test data) to the Score Model module. The Score Model module makes the predictions. You can select its output port to see the predictions and the positive class probabilities.
This pipeline has two Score Model modules. The one on the right excludes the label column before making predictions. It is set up this way to prepare for deploying a real-time endpoint, because the web service input is expected to contain only features, not the label.
Add an Evaluate Model module and connect the scored dataset to its left input port. To see the evaluation results, select the output port of the Evaluate Model module and select Visualize.
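The designer handles all of this without code, but it can help to see what the split and feature selection steps above do conceptually. The following is a minimal Python sketch, not the designer's implementation: the function names (`split_rows`, `select_features`, `apply_selection`) and the tiny dataset are hypothetical and made up for illustration, and it assumes every feature is numeric. It splits rows 70/30, ranks features by absolute Pearson correlation with the label using the training rows only, and then applies the same column selection to the test rows.

```python
import math
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle rows and split them: `fraction` goes left, the rest goes right."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, label, k):
    """Rank feature columns by |Pearson correlation| with the label; keep the top k."""
    features = [name for name in rows[0] if name != label]
    labels = [row[label] for row in rows]
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in features
    }
    return sorted(features, key=lambda name: scores[name], reverse=True)[:k]


def apply_selection(rows, selected, label):
    """Keep only the selected feature columns plus the label column."""
    keep = selected + [label]
    return [{name: row[name] for name in keep} for row in rows]


# Tiny made-up dataset for illustration (not the Adult Census Income data).
rows = [
    {"age": 20 + i, "hours": 40 - (i % 3), "noise": (i * 7) % 5, "income": int(i >= 5)}
    for i in range(10)
]

train, test = split_rows(rows, fraction=0.7)
selected = select_features(train, label="income", k=2)  # fit on training rows only
train_filtered = apply_selection(train, selected, "income")
test_filtered = apply_selection(test, selected, "income")  # same columns for test
```

The selection is computed from the training rows and then reused for the test rows, which mirrors the reason the pipeline applies the saved transformation to the test dataset instead of running feature selection on it again.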
In the evaluation results, you can see charts such as the ROC curve and the precision-recall curve, along with the confusion matrix.
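To make these evaluation outputs more concrete, here is a small Python sketch of how a confusion matrix, precision, recall, and the points of an ROC curve can be derived from true labels and positive class probabilities. It is an illustration under stated assumptions, not the Evaluate Model module's implementation: the function names, the 0.5 default threshold, and the sample labels and probabilities are all hypothetical.

```python
def confusion_matrix(labels, probabilities, threshold=0.5):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, prob in zip(labels, probabilities):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def precision_recall(cm):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else 0.0
    recall = cm["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


def roc_points(labels, probabilities):
    """(false positive rate, true positive rate) pairs, strictest threshold first."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(probabilities), reverse=True):
        cm = confusion_matrix(labels, probabilities, threshold)
        negatives = cm["fp"] + cm["tn"]
        positives = cm["tp"] + cm["fn"]
        fpr = cm["fp"] / negatives if negatives else 0.0
        tpr = cm["tp"] / positives if positives else 0.0
        points.append((fpr, tpr))
    return points


# Made-up scored results: 1 stands for the positive class.
labels = [1, 0, 1, 1, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

cm = confusion_matrix(labels, probabilities)
precision, recall = precision_recall(cm)
curve = roc_points(labels, probabilities)
```

Lowering the threshold moves along the ROC curve toward the upper right: more true positives are caught, at the cost of more false positives.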
Clean up resources
You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to articles.
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges.
In the Azure portal, select Resource groups on the left side of the window.
In the list, select the resource group that you created.
Select Delete resource group.
Deleting the resource group also deletes all resources that you created in the designer.
Delete individual assets
In the designer where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.
The compute target that you created here automatically autoscales to zero nodes when it's not being used. This action is taken to minimize charges. If you want to delete the compute target, take these steps:
You can unregister datasets from your workspace by selecting each dataset and selecting Unregister.
To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually delete those assets.
Explore the other samples available for the designer:
- Sample 1 - Regression: Predict an automobile's price
- Sample 2 - Regression: Compare algorithms for automobile price prediction
- Sample 4 - Classification: Predict credit risk (cost sensitive)
- Sample 5 - Classification: Predict churn
- Sample 6 - Classification: Predict flight delays
- Sample 7 - Text Classification: Wikipedia SP 500 Dataset