Tutorial: Train a classification model with no-code AutoML in the Azure Machine Learning studio

Learn how to train a classification model with no-code AutoML using Azure Machine Learning automated ML in the Azure Machine Learning studio. This classification model predicts if a client will subscribe to a fixed term deposit with a financial institution.

With automated ML, you can automate away time-intensive tasks. Automated machine learning rapidly iterates over many combinations of algorithms and hyperparameters to help you find the best model based on a success metric of your choosing.
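
As a rough illustration of that search loop, the sketch below scores every combination of a few candidate algorithms and hyperparameter values and keeps the best one. The algorithm names, the grid, and the `toy_score` function are hypothetical stand-ins; automated ML uses its own model selection logic and real training runs, not this exhaustive toy grid.

```python
from itertools import product

# Hypothetical search space: these names and values are illustrative only,
# not the set that automated ML actually explores.
ALGORITHMS = ["logistic_regression", "random_forest", "gradient_boosting"]
LEARNING_RATES = [0.01, 0.1, 1.0]


def toy_score(algorithm: str, learning_rate: float) -> float:
    """Stand-in for 'train a model and compute the success metric'."""
    base = {"logistic_regression": 0.80, "random_forest": 0.85, "gradient_boosting": 0.90}
    # Penalize learning rates far from 0.1 so the toy metric has a clear optimum.
    return base[algorithm] - abs(learning_rate - 0.1) * 0.05


def find_best(algorithms, learning_rates):
    """Score every combination and return (best_score, algorithm, learning_rate)."""
    results = [
        (toy_score(algo, lr), algo, lr)
        for algo, lr in product(algorithms, learning_rates)
    ]
    return max(results)


best_score, best_algo, best_lr = find_best(ALGORITHMS, LEARNING_RATES)
print(best_algo, best_lr, round(best_score, 3))
```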

You won't write any code in this tutorial; instead, you'll use the studio interface to perform training. You'll learn how to do the following tasks:

  • Create an Azure Machine Learning workspace.
  • Run an automated machine learning experiment.
  • Explore model details.
  • Deploy the recommended model.

Prerequisites

  • An Azure subscription. If you don't have an Azure subscription, create a free account.

  • Download the bankmarketing_train.csv data file. The y column indicates if a customer subscribed to a fixed term deposit, which is later identified as the target column for predictions in this tutorial.
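
If you want to check the class balance of the target column before uploading the file, a few lines of Python are enough. The sketch below uses a tiny inline sample rather than the real file: the rows, the `age` and `job` columns, and the `yes`/`no` values are assumptions made for illustration, and only the `y` column name comes from this tutorial.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. To inspect the real file, replace io.StringIO(SAMPLE)
# with open("bankmarketing_train.csv", newline="", encoding="utf-8").
SAMPLE = """age,job,y
57,technician,no
35,admin.,yes
41,blue-collar,no
29,services,no
"""


def target_counts(handle, target="y"):
    """Count how often each value appears in the target column."""
    reader = csv.DictReader(handle)
    return Counter(row[target] for row in reader)


counts = target_counts(io.StringIO(SAMPLE))
print(counts)
```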

Create a workspace

An Azure Machine Learning workspace is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models. It ties your Azure subscription and resource group to an easily consumed object in the service.

There are many ways to create a workspace. In this tutorial, you create a workspace via the Azure portal, a web-based console for managing your Azure resources.

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

  2. In the upper-left corner of the Azure portal, select the three bars, then + Create a resource.

  3. Use the search bar to find Machine Learning.

  4. Select Machine Learning.

  5. In the Machine Learning pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field | Description
    --- | ---
    Workspace name | Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription | Select the Azure subscription that you want to use.
    Resource group | Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location | Select the location closest to your users and the data resources to create your workspace.

  7. After you're finished configuring the workspace, select Review + Create.

  8. Select Create to create the workspace.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  9. To view the new workspace, select Go to resource.

  10. From the portal view of your workspace, select Launch studio to go to the Azure Machine Learning studio.

Important

Take note of your workspace and subscription. You'll need these to ensure you create your experiment in the right place.
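
One way to keep these details at hand is to record them in a small JSON file. The sketch below uses the key names `subscription_id`, `resource_group`, and `workspace_name`, which follow the `config.json` layout read by the Azure Machine Learning Python SDK. The subscription ID is a placeholder, and nothing in this no-code tutorial requires the file.

```python
import json

# docs-ws and docs-aml are the example names used in this tutorial.
# The subscription ID is a placeholder; substitute your own.
workspace_details = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "docs-aml",
    "workspace_name": "docs-ws",
}

config_text = json.dumps(workspace_details, indent=2)
print(config_text)
```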

Sign in to the studio

You complete the following experiment setup and run steps in the Azure Machine Learning studio at https://ml.azure.com, a consolidated web interface that provides machine learning tools for data science practitioners of all skill levels. The studio is not supported on Internet Explorer browsers.

  1. Sign in to Azure Machine Learning studio.

  2. Select your subscription and the workspace you created.

  3. Select Get started.

  4. In the left pane, select Automated ML under the Author section.

    Since this is your first automated ML experiment, you'll see an empty list and links to documentation.

  5. Select +New automated ML run.

Create and load dataset

Before you configure your experiment, upload your data file to your workspace in the form of an Azure Machine Learning dataset. Doing so lets you ensure that your data is formatted appropriately for your experiment.

  1. Create a new dataset by selecting From local files from the +Create dataset drop-down.

    1. On the Basic info form, give your dataset a name and provide an optional description. The automated ML interface currently only supports TabularDatasets, so the dataset type should default to Tabular.

    2. Select Next on the bottom left.

    3. On the Datastore and file selection form, select the default datastore that was automatically set up during your workspace creation, workspaceblobstore (Azure Blob Storage). This is where you'll upload your data file to make it available to your workspace.

    4. Select Browse.

    5. Choose the bankmarketing_train.csv file on your local computer. This is the file you downloaded as a prerequisite.

    6. Give your dataset a unique name and provide an optional description.

    7. Select Next on the bottom left to upload it to the default container that was automatically set up during your workspace creation.

      When the upload is complete, the Settings and preview form is pre-populated based on the file type.

    8. Verify that the Settings and preview form is populated as follows and select Next.

      Field | Description | Value for tutorial
      --- | --- | ---
      File format | Defines the layout and type of data stored in a file. | Delimited
      Delimiter | One or more characters for specifying the boundary between separate, independent regions in plain text or other data streams. | Comma
      Encoding | Identifies which bit-to-character schema table to use to read your dataset. | UTF-8
      Column headers | Indicates how the headers of the dataset, if any, will be treated. | All files have same headers
      Skip rows | Indicates how many, if any, rows are skipped in the dataset. | None

    9. The Schema form allows for further configuration of your data for this experiment. For this example, turn off the toggle switch for the day_of_week column so that it isn't included. Select Next.

    10. On the Confirm details form, verify that the information matches what was previously populated on the Basic info, Datastore and file selection, and Settings and preview forms.

    11. Select Create to complete the creation of your dataset.

    12. Select your dataset once it appears in the list.

    13. Review the Data preview to ensure you didn't include day_of_week, then select Close.

    14. Select Next.
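
The Settings and preview and Schema choices above correspond to ordinary CSV parsing options. As a rough local equivalent (not what the studio runs internally), the sketch below decodes UTF-8 bytes, splits on commas, treats the first row as headers, skips no rows, and drops `day_of_week`. The sample rows and the `load_rows` helper are invented for illustration.

```python
import csv
import io

# Invented sample standing in for the uploaded file's raw bytes.
RAW_BYTES = b"age,day_of_week,duration,y\n57,mon,371,no\n35,thu,1042,yes\n"


def load_rows(raw, encoding="utf-8", delimiter=",", skip_rows=0, exclude=("day_of_week",)):
    """Parse delimited text and return a list of dicts without the excluded columns."""
    # Skipped rows are dropped before the header row is read.
    lines = raw.decode(encoding).splitlines()[skip_rows:]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    return [
        {name: value for name, value in row.items() if name not in exclude}
        for row in reader
    ]


rows = load_rows(RAW_BYTES)
print(rows[0])
```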

Configure run

After you load and configure your data, you can set up your experiment. This setup includes experiment design tasks such as selecting the size of your compute environment and specifying what column you want to predict.

  1. Select the Create new radio button.

  2. Populate the Configure Run form as follows:

    1. Enter this experiment name: my-1st-automl-experiment

    2. Select y as the target column, which is what you want to predict. This column indicates whether the client subscribed to a term deposit or not.

    3. Select +Create a new compute and configure your compute target. A compute target is a local or cloud-based resource environment used to run your training script or host your service deployment. For this experiment, we use a cloud-based compute.

      1. Populate the Virtual Machine form to set up your compute.

        Field | Description | Value for tutorial
        --- | --- | ---
        Virtual machine priority | Select what priority your experiment should have. | Dedicated
        Virtual machine type | Select the virtual machine type for your compute. | CPU (Central Processing Unit)
        Virtual machine size | Select the virtual machine size for your compute. A list of recommended sizes is provided based on your data and experiment type. | Standard_DS12_V2

      2. Select Next to populate the Configure settings form.

        Field | Description | Value for tutorial
        --- | --- | ---
        Compute name | A unique name that identifies your compute context. | automl-compute
        Min / Max nodes | To profile data, you must specify 1 or more nodes. | Min nodes: 1; Max nodes: 6
        Idle seconds before scale down | Idle time before the cluster is automatically scaled down to the minimum node count. | 1800 (default)
        Advanced settings | Settings to configure and authorize a virtual network for your experiment. | None

      3. Select Create to create your compute target.

        This takes a couple of minutes to complete.

      4. After creation, select your new compute target from the drop-down list.

    4. Select Next.

  3. On the Select task and settings form, complete the setup for your automated ML experiment by specifying the machine learning task type and configuration settings.

    1. Select Classification as the machine learning task type.

    2. Select View additional configuration settings and populate the fields as follows. These settings give you finer control over the training job. Otherwise, defaults are applied based on experiment selection and data.

      Additional configurations | Description | Value for tutorial
      --- | --- | ---
      Primary metric | Evaluation metric that the machine learning algorithm will be measured by. | AUC_weighted
      Explain best model | Automatically shows explainability on the best model created by automated ML. | Enable
      Blocked algorithms | Algorithms you want to exclude from the training job. | None
      Exit criterion | If a criterion is met, the training job is stopped. | Training job time (hours): 1; Metric score threshold: None
      Validation | Choose a cross-validation type and number of tests. | Validation type: k-fold cross-validation; Number of validations: 2
      Concurrency | The maximum number of iterations executed in parallel. | Max concurrent iterations: 5

      Select Save.

  4. Select Finish to run the experiment. The Run Detail screen opens with the Run status at the top as the experiment preparation begins. This status updates as the experiment progresses. Notifications also appear in the top right corner of the studio to inform you of the status of your experiment.
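
The Min / Max nodes and Idle seconds before scale down settings chosen for the compute target describe an autoscaling cluster. The function below is a simplified model of that behavior, written for illustration only: the name `target_node_count` and its exact rules are assumptions, not the service's actual scheduler. The defaults mirror the tutorial values (1, 6, and 1800).

```python
def target_node_count(current_nodes, active_runs, idle_seconds,
                      min_nodes=1, max_nodes=6, idle_before_scale_down=1800):
    """Return how many nodes a simplified autoscaler would keep allocated."""
    if active_runs > 0:
        # Aim for one node per active run, within the configured bounds.
        return max(min_nodes, min(max_nodes, active_runs))
    if idle_seconds >= idle_before_scale_down:
        # Idle long enough: fall back to the minimum node count.
        return min_nodes
    # Idle, but not for long enough yet: keep what is already allocated.
    return max(min_nodes, min(max_nodes, current_nodes))


print(target_node_count(current_nodes=1, active_runs=5, idle_seconds=0))
print(target_node_count(current_nodes=5, active_runs=0, idle_seconds=1800))
```

In this model, five concurrent iterations fit within the six-node maximum, and the cluster returns to one node after 30 idle minutes.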

Important

Preparing the experiment run takes 10-15 minutes. Once running, each iteration takes 2-3 minutes more.

In production, you'd likely walk away for a bit. But for this tutorial, we suggest you start exploring the tested algorithms on the Models tab as they complete while the others are still running.
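
The validation setting chosen earlier, k-fold cross-validation with 2 validations, means that each candidate model is trained and scored twice, each time holding out a different half of the data. A minimal sketch of how such folds can be formed follows. The contiguous, unshuffled split and the `k_fold_indices` helper are assumptions made for readability; automated ML's own splitting may differ.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_rows=6, k=2))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```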

Explore models

Navigate to the Models tab to see the algorithms (models) tested. By default, the models are ordered by metric score as they complete. For this tutorial, the model that scores the highest based on the chosen AUC_weighted metric is at the top of the list.
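
For intuition about the ranking metric, AUC for a binary target can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by brute-force pair counting on made-up labels and scores. AUC_weighted averages the per-class AUC values weighted by class frequency; the assumption here is that, for a two-class problem with complementary class scores, it reduces to the single value computed below.

```python
def binary_auc(labels, scores):
    """Area under the ROC curve via pair counting (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = subscribed) and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = binary_auc(labels, scores)
print(round(auc, 4))
```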

While you wait for all of the experiment models to finish, select the Algorithm name of a completed model to explore its performance details.

On the Details and Metrics tabs, you can view the selected model's properties, metrics, and performance charts.

Model explanations

While you wait for the models to complete, you can also take a look at model explanations and see which data features (raw or engineered) influenced a particular model's predictions.

These model explanations can be generated on demand, and are summarized in the model explanations dashboard that's part of the Explanations (preview) tab.

To generate model explanations:

  1. Select Run 1 at the top to navigate back to the Models screen.

  2. Select the Models tab.

  3. For this tutorial, select the first MaxAbsScaler, LightGBM model.

  4. Select the Explain model button at the top. On the right, the Explain model pane appears.

  5. Select the automl-compute that you created previously. This compute cluster initiates a child run to generate the model explanations.

  6. Select Create at the bottom. A green success message appears towards the top of your screen.

    Note

    The explainability run takes about 2-5 minutes to complete.

  7. Select the Explanations (preview) button. This tab populates once the explainability run completes.

  8. On the left-hand side, expand the pane and select the row that says raw under Features.

  9. Select the Aggregate feature importance tab on the right. This chart shows which data features influenced the predictions of the selected model.

    In this example, the duration feature appears to have the most influence on the predictions of this model.
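
To build intuition for what a feature importance chart conveys, the sketch below uses permutation importance: shuffle one feature's values and measure how much the model's accuracy drops on average. This is a different, simpler technique than the one behind the studio's explanations dashboard, and the toy model, data, and 500-second threshold are invented for illustration.

```python
import random
from statistics import mean


def toy_model(row):
    """Invented rule: predict 'yes' (1) when the call lasted over 500 seconds."""
    return 1 if row["duration"] > 500 else 0


def accuracy(rows, labels):
    """Fraction of rows for which the toy model matches the label."""
    return sum(toy_model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(rows, labels, feature, repeats=20, seed=0):
    """Average accuracy drop after shuffling one feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    shuffled_scores = []
    for _ in range(repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        shuffled_rows = [{**r, feature: v} for r, v in zip(rows, values)]
        shuffled_scores.append(accuracy(shuffled_rows, labels))
    return baseline - mean(shuffled_scores)


# Invented data in which the label depends only on duration.
rows = [{"duration": d, "age": a} for d, a in
        [(100, 30), (900, 45), (200, 52), (800, 38), (300, 61), (700, 27)]]
labels = [toy_model(r) for r in rows]

importances = {f: permutation_importance(rows, labels, f) for f in ("duration", "age")}
print(importances)
```

Because the toy model ignores `age`, shuffling that column changes nothing, while shuffling `duration` degrades accuracy.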

Deploy the best model

The automated machine learning interface allows you to deploy the best model as a web service in a few steps. Deployment is the integration of the model so it can predict on new data and identify potential areas of opportunity.

For this experiment, deployment to a web service means that the financial institution now has an iterative and scalable web solution for identifying potential fixed term deposit customers.

Check to see if your experiment run is complete. To do so, navigate back to the parent run page by selecting Run 1 at the top of your screen. A Completed status is shown on the top left of the screen.

Once the experiment run is complete, the Details page is populated with a Best model summary section. In this experiment context, VotingEnsemble is considered the best model, based on the AUC_weighted metric.
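
As a rough picture of how a voting ensemble combines its members, the sketch below applies soft voting: it takes a weighted average of each member's predicted class probabilities and picks the class with the highest average. The member probabilities and weights are invented, and that the ensemble in this experiment uses exactly this scheme is an assumption rather than something this tutorial confirms.

```python
def soft_vote(member_probabilities, weights):
    """Weighted average of per-class probabilities across ensemble members."""
    if len(member_probabilities) != len(weights):
        raise ValueError("need one weight per ensemble member")
    total = sum(weights)
    classes = member_probabilities[0].keys()
    return {
        c: sum(w * probs[c] for probs, w in zip(member_probabilities, weights)) / total
        for c in classes
    }


# Invented predictions from three hypothetical ensemble members for one client.
members = [
    {"yes": 0.70, "no": 0.30},
    {"yes": 0.40, "no": 0.60},
    {"yes": 0.55, "no": 0.45},
]
weights = [0.5, 0.25, 0.25]

averaged = soft_vote(members, weights)
prediction = max(averaged, key=averaged.get)
print(averaged, prediction)
```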

We deploy this model, but note that deployment takes about 20 minutes to complete. The deployment process entails several steps, including registering the model, generating resources, and configuring them for the web service.

  1. Select VotingEnsemble to open the model-specific page.

  2. Select the Deploy button in the top-left.

  3. Populate the Deploy a model pane as follows:

    Field | Value
    --- | ---
    Deployment name | my-automl-deploy
    Deployment description | My first automated machine learning experiment deployment
    Compute type | Select Azure Container Instance (ACI)
    Enable authentication | Disable.
    Use custom deployments | Disable. Allows for the default driver file (scoring script) and environment file to be auto-generated.

    For this example, we use the defaults provided in the Advanced menu.

  4. Select Deploy.

    A green success message appears at the top of the Run screen, and in the Model summary pane, a status message appears under Deploy status. Select Refresh periodically to check the deployment status.

Now you have an operational web service to generate predictions.
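
A web service like this is typically called with an HTTP POST that carries the input rows as JSON. The sketch below only builds such a request and does not send it. The scoring URI is a placeholder, the `{"data": [...]}` payload shape is an assumption about the auto-generated scoring script, and the feature values are invented; check the deployed endpoint's details for the actual URI and input schema. No authentication header is set because authentication was disabled in the deployment settings.

```python
import json
import urllib.request

# Placeholder: copy the real scoring URI from the endpoint's details page.
SCORING_URI = "http://example.invalid/score"


def build_scoring_request(rows, uri=SCORING_URI):
    """Build (but do not send) a JSON POST request for the web service."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Invented feature values; a real request needs every column the model was trained on.
request = build_scoring_request([{"age": 35, "job": "admin.", "duration": 1042}])
print(request.get_method(), request.data.decode("utf-8"))
# To send it against a real endpoint: urllib.request.urlopen(request)
```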

Proceed to the Next steps section to learn more about how to consume your new web service and test your predictions using Power BI's built-in Azure Machine Learning support.

Clean up resources

Deployment files are larger than data and experiment files, so they cost more to store. To minimize costs while keeping your workspace and experiment files, delete only the deployment files. If you don't plan to use any of the files, delete the entire resource group instead.

Delete the deployment instance

Delete just the deployment instance from Azure Machine Learning at https://ml.azure.com/ if you want to keep the resource group and workspace for other tutorials and exploration.

  1. Go to Azure Machine Learning. Navigate to your workspace and on the left under the Assets pane, select Endpoints.

  2. Select the deployment you want to delete and select Delete.

  3. Select Proceed.

Delete the resource group

Important

The resources that you created can be used as prerequisites to other Azure Machine Learning tutorials and how-to articles.

If you don't plan to use any of the resources that you created, delete them so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

  2. From the list, select the resource group that you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Next steps

In this automated machine learning tutorial, you used Azure Machine Learning's automated ML interface to create and deploy a classification model.

Note

This Bank Marketing dataset is made available under the Creative Commons (CC0: Public Domain) License. Any rights in individual contents of the database are licensed under the Database Contents License and available on Kaggle. This dataset was originally available within the UCI Machine Learning Repository.

[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.