Tutorial: Create your first classification model with automated machine learning
In this tutorial, you learn how to create your first automated machine learning experiment in the Azure portal (preview) without writing a single line of code. This example creates a classification model to predict if a client will subscribe to a fixed term deposit with a financial institution.
By using the automated machine learning capabilities of the Azure Machine Learning service and the Azure portal, you begin the automated machine learning process. Algorithm selection and hyperparameter tuning are done for you. The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.
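The tutorial itself needs no code, but the idea of iterating over algorithm and hyperparameter combinations can be pictured with a toy sketch. Everything here is hypothetical: the two "algorithms", the threshold values, the data, and the use of accuracy as the criterion are illustrative assumptions, and the exhaustive grid is a simplification rather than a description of how the service actually searches.

```python
from itertools import product

# Toy labelled data: (feature value, label). Purely illustrative.
DATA = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]

def threshold_model(threshold):
    """Predict 1 when the feature is at or above the threshold."""
    return lambda x: 1 if x >= threshold else 0

def inverted_threshold_model(threshold):
    """Predict 1 when the feature is below the threshold."""
    return lambda x: 1 if x < threshold else 0

# Hypothetical search space: two "algorithms", one hyperparameter each.
ALGORITHMS = {"threshold": threshold_model, "inverted": inverted_threshold_model}
THRESHOLDS = [0.2, 0.5, 0.7]

def accuracy(model, data):
    """Fraction of rows the model labels correctly (the chosen criterion)."""
    return sum(model(x) == y for x, y in data) / len(data)

def search(data):
    """Try every algorithm/hyperparameter pair and keep the best scorer."""
    best = None
    for (name, build), threshold in product(ALGORITHMS.items(), THRESHOLDS):
        score = accuracy(build(threshold), data)
        if best is None or score > best[0]:
            best = (score, name, threshold)
    return best

best_score, best_name, best_threshold = search(DATA)
print(best_name, best_threshold, best_score)
```

The loop keeps only the best candidate seen so far, which mirrors the "until it finds the best model based on your criterion" idea at a very small scale.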
In this tutorial, you learn how to do the following tasks:
- Configure an Azure Machine Learning service workspace.
- Create an experiment.
- Auto-train a classification model.
- View training run details.
- Deploy the model.
Prerequisites
- An Azure subscription. If you don’t have an Azure subscription, create a free account.
- Download the bankmarketing_train.csv data file. The y column indicates if a customer subscribed to a fixed term deposit, which is later identified as the target column for predictions in this tutorial.
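If you want to look at the target column before uploading the file, a short script like the following can help. It's a minimal sketch that uses an inline, made-up excerpt: only the y and day_of_week column names come from this tutorial, while the age column and all of the values are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row excerpt; only the y and day_of_week column names
# come from the tutorial. The age column and every value are made up.
SAMPLE_CSV = """age,day_of_week,y
41,mon,no
35,tue,yes
52,fri,no
"""

def count_target(csv_file, target="y"):
    """Count how often each label appears in the target column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[target] for row in reader)

counts = count_target(io.StringIO(SAMPLE_CSV))
print(dict(counts))
```

To run it against the downloaded file instead, pass `open("bankmarketing_train.csv", newline="")` in place of the `io.StringIO` object.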
Create a workspace
Sign in to the Azure portal by using the credentials for the Azure subscription you use.
In the upper-left corner of Azure portal, select + Create a resource.
Use the search bar to find Machine Learning service workspace.
Select Machine Learning service workspace.
In the Machine Learning service workspace pane, select Create to begin.
Configure your new workspace by providing the workspace name, subscription, resource group, and location.
| Field | Description |
| --- | --- |
| Workspace name | Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others. |
| Subscription | Select the Azure subscription that you want to use. |
| Resource group | Use an existing resource group in your subscription or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml. |
| Location | Select the location closest to your users and the data resources to create your workspace. |
After you are finished configuring the workspace, select Create.
It can take a few moments to create the workspace.
When the process is finished, a deployment success message appears. To view the new workspace, select Go to resource.
Create an experiment
These steps walk you through experiment setup, from data selection to choosing your primary metric and model type.
Go to the left pane of your workspace. Select Automated machine learning under the Authoring (Preview) section. You'll see the Welcome to Automated Machine Learning screen, since this is your first experiment with Automated Machine Learning.
Select Create experiment. Then enter my-1st-automl-experiment as the experiment name.
Select Create a new compute and configure your compute context for this experiment.
| Field | Value |
| --- | --- |
| Compute name | Enter a unique name that identifies your compute context. For this example, we use automl-compute. |
| Virtual machine size | Select the virtual machine size for your compute. We use Standard_DS12_V2. |
| Additional settings | Min node: 1. To enable data profiling, you must have one or more nodes. Max node: 6. |
To create your new compute, select Create. This takes a few moments.
When creation is complete, select your new compute from the drop-down list, and then select Next.
For this tutorial, we use the default storage account and container created with your new compute. They automatically populate in the form.
Select Upload and choose the bankmarketing_train.csv file from your local computer to upload it to the default container. Public preview supports only local file uploads and Azure Blob storage accounts. When the upload is complete, select the file from the list.
The Preview tab allows us to further configure our data for this experiment.
On the Preview tab, indicate that the data includes headers. The service defaults to include all of the features (columns) for training. For this example, scroll to the right and Ignore the day_of_week feature.
Data profiling is not available with computes that have zero minimum nodes.
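The effect of marking the file as having headers and ignoring a feature can be pictured outside the portal as well. The following is a minimal sketch under stated assumptions: the inline rows are made up, the age column is hypothetical, and `drop_features` is an illustrative helper rather than anything the service exposes.

```python
import csv
import io

# Hypothetical excerpt; only the y and day_of_week column names come from
# the tutorial. The age column and every value are made up.
SAMPLE_CSV = """age,day_of_week,y
41,mon,no
35,tue,yes
"""

def drop_features(csv_file, ignored):
    """Read rows using the header line and drop the ignored columns."""
    reader = csv.DictReader(csv_file)
    return [
        {name: value for name, value in row.items() if name not in ignored}
        for row in reader
    ]

rows = drop_features(io.StringIO(SAMPLE_CSV), ignored={"day_of_week"})
print(rows[0])
```

Every remaining column, including the target y, stays available for the training step.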
Select Classification as the prediction task.
Select y as the target column, which is the column we want to predict. This column indicates whether the client subscribed to a term deposit or not.
Expand Advanced Settings and populate the fields as follows.
| Advanced settings | Value |
| --- | --- |
| Primary metric | AUC_weighted |
| Exit criteria | When any of these criteria are met, the training job ends before full completion: Training job time (minutes): 5; Max number of iterations: 10 |
| Preprocessing | Enables preprocessing done by automated machine learning. This includes automatic data cleansing, preparing, and transformation to generate synthetic features. |
| Validation | Select K-fold cross-validation and 2 for the number of cross-validations. |
| Concurrency | Select 5 for the number of max concurrent iterations. |
For this experiment, we don't set a metric threshold or a max cores per iteration threshold. We also don't block algorithms from being tested.
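The exit criteria above follow an "any of these" rule: training stops as soon as either the time limit or the iteration limit is reached. A minimal sketch of that rule follows. The function names, the fake training step, and the choice to check the clock only between iterations are assumptions made for illustration, not a description of the service's internals.

```python
import time

def run_iterations(train_one, max_minutes=5, max_iterations=10,
                   clock=time.monotonic):
    """Call train_one repeatedly until any exit criterion is met."""
    deadline = clock() + max_minutes * 60
    scores = []
    # Stop when either the iteration cap or the time limit is reached.
    while len(scores) < max_iterations and clock() < deadline:
        scores.append(train_one(len(scores)))
    return scores

def fake_train(iteration):
    """Hypothetical stand-in for training one model; returns a fake score."""
    return 0.80 + 0.01 * iteration

scores = run_iterations(fake_train)
print(len(scores))
```

Because the fake training step returns immediately, this run ends on the iteration limit. A slow training step would instead end on the time limit.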
Select Start to run the experiment.
When the experiment starts, you see a blank Run Detail screen with a status message at the top.
The experiment preparation process takes a couple of minutes. When the process finishes, the status message changes to Run is Running.
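While the run prepares, it can help to picture what the K-fold cross-validation setting chosen earlier means. With 2 folds, each half of the data takes one turn as the validation set while the other half is used for training. The sketch below is a plain, unshuffled split written for illustration. How the service actually partitions rows is not covered in this tutorial, so treat the details as assumptions.

```python
def k_fold_splits(n_rows, k=2):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

splits = list(k_fold_splits(6, k=2))
for train, validation in splits:
    print("train", train, "validate", validation)
```

Each row appears in exactly one validation fold, so every row contributes to the validation score once.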
View experiment details
As the experiment progresses, the Run Detail screen updates the iteration chart and list with the different iterations (models) that are run. The iterations list is ordered by metric score. By default, the model that scores the highest based on our AUC_weighted metric is at the top of the list.
Each pipeline takes several minutes to finish training.
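To make the ordering concrete, the following sketch scores three hypothetical iterations with a plain binary AUC and sorts them from best to worst. The model names, labels, and scores are made up. Using plain binary AUC in place of AUC_weighted is a simplifying assumption for this two-class example.

```python
def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical true labels and per-iteration predicted scores.
LABELS = [0, 0, 1, 1]
ITERATIONS = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.3, 0.6, 0.9],
    "model_c": [0.9, 0.8, 0.2, 0.1],
}

# Highest score first, like the iterations list on the Run Detail screen.
ranked = sorted(
    ((binary_auc(LABELS, s), name) for name, s in ITERATIONS.items()),
    reverse=True,
)
for score, name in ranked:
    print(name, score)
```

The pairwise count is quadratic in the number of rows, which is fine for a small illustration but not for a full dataset.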
Deploy the model
By using automated machine learning in the Azure portal, we can deploy the best model as a web service to predict on new data and identify potential areas of opportunity. For this experiment, deployment means that the financial institution now has an iterative and scalable solution for identifying potential fixed term deposit customers.
In this experiment context, VotingEnsemble is considered the best model, based on the AUC_weighted metric. We deploy this model. Note that deployment takes about 20 minutes to complete.
On the Run Detail page, select the Deploy Best Model button.
Populate the Deploy Best Model pane as follows:
| Field | Value |
| --- | --- |
| Deployment name | my-automl-deploy |
| Deployment description | My first automated machine learning experiment deployment |
| Scoring script | Autogenerate |
| Environment script | Autogenerate |
A success message appears when the deployment finishes.
Now you have an operational web service to generate predictions.
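Calling the web service is outside the scope of this tutorial, but a request to it might be assembled along these lines. This is only a sketch under loud assumptions: the scoring URI is a placeholder, the `{"data": [...]}` payload shape is hypothetical and depends on the autogenerated scoring script, and the age field is made up. The code builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical placeholder: the real scoring URI comes from your
# deployment's details in the portal.
SCORING_URI = "http://example.invalid/score"

def build_scoring_request(rows, uri=SCORING_URI):
    """Build (but do not send) a JSON POST request carrying rows to score."""
    # Assumed payload shape; check the generated scoring script for the
    # format your service actually expects.
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_scoring_request([{"age": 41}])
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` once the placeholder values are replaced with those of a real deployment.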
Clean up resources
Deployment files are larger than data and experiment files, so they cost more to store. To minimize costs while keeping your workspace and experiment files, delete only the deployment files. If you don't plan to use any of the files, delete the entire resource group.
Delete the deployment instance
Delete just the deployment instance from the Azure portal, if you want to keep the resource group and workspace for other tutorials and exploration.
Go to the Assets pane on the left and select Deployments.
Select the deployment you want to delete and select Delete.
Delete the resource group
The resources you created can be used as prerequisites to other Azure Machine Learning service tutorials and how-to articles.
If you don't plan to use the resources you created, delete them, so you don't incur any charges:
In the Azure portal, select Resource groups on the far left.
From the list, select the resource group you created.
Select Delete resource group.
Enter the resource group name. Then select Delete.
Next steps
In this automated machine learning tutorial, you used the Azure portal to create and deploy a classification model. See these articles for more information and next steps:
- Learn more about preprocessing.
- Learn more about data profiling.
- Learn more about automated machine learning.
This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License. Any rights in individual contents of the database are licensed under the Database Contents License and available on Kaggle. This dataset was originally available within the UCI Machine Learning Database.
Please cite the following work:
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.