Tutorial: Predict automobile price with the visual interface

In this two-part tutorial, you learn how to use the Azure Machine Learning service visual interface to develop and deploy a predictive analytics solution that predicts the price of any car.

In part one, you'll set up your environment, drag and drop datasets and analysis modules onto an interactive canvas, and connect them together to create an experiment.

In part one of the tutorial you learn how to:

  • Create a new experiment
  • Import data
  • Prepare data
  • Train a machine learning model
  • Evaluate a machine learning model

In part two of the tutorial, you'll learn how to deploy your predictive model as an Azure web service so you can use it to predict the price of any car based on technical specifications you send it.

A completed version of this tutorial is available as a sample experiment.

To find it, from the Experiments page, select Add New, and then select the Sample 1 - Regression: Automobile Price Prediction (Basic) experiment.

Create a new experiment

To create a visual interface experiment, you first need an Azure Machine Learning service workspace. In this section, you learn how to create both of these resources.

Create a new workspace

If you have an Azure Machine Learning service workspace, skip to the next section.

  1. Sign in to the Azure portal by using the credentials for the Azure subscription you use.

  2. In the upper-left corner of Azure portal, select + Create a resource.

  3. Use the search bar to find Machine Learning service workspace.

  4. Select Machine Learning service workspace.

  5. In the Machine Learning service workspace pane, select Create to begin.

  6. Configure your new workspace by providing the workspace name, subscription, resource group, and location.

    Workspace name: Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription: Select the Azure subscription that you want to use.
    Resource group: Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location: Select the location closest to your users and the data resources to create your workspace.
  7. After you are finished configuring the workspace, select Create.

    It can take a few moments to create the workspace.

    When the process is finished, a deployment success message appears. To view the new workspace, select Go to resource.

Create an experiment

  1. Open your workspace in the Azure portal.

  2. In your workspace, select Visual interface. Then select Launch visual interface.

    Screenshot of the Azure portal showing how to access the Visual interface from a Machine Learning service workspace

  3. Create a new experiment by selecting +New at the bottom of the visual interface window.

  4. Select Blank Experiment.

  5. Select the default experiment name "Experiment created on ..." at the top of the canvas and rename it to something meaningful. For example, "Automobile price prediction". The name doesn't need to be unique.

Import data

Machine learning depends on data. Luckily, several sample datasets are included in this interface for you to experiment with. For this tutorial, use the sample dataset Automobile price data (Raw).

  1. To the left of the experiment canvas is a palette of datasets and modules. Select Saved Datasets then select Samples to view the available sample datasets.

  2. Select the dataset, Automobile price data (Raw), and drag it onto the canvas.

    Drag data to canvas

  3. Select which columns of data to work with. Type Select in the Search box at the top of the palette to find the Select Columns in Dataset module.

  4. Click and drag the Select Columns in Dataset module onto the canvas. Drop the module below the dataset module.

  5. Connect the dataset you added earlier to the Select Columns in Dataset module by clicking and dragging. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, all the way to the input port of Select Columns in Dataset, which is the small circle at the top of the module.

    Tip

    You create a flow of data through your experiment when you connect the output port of one module to an input port of another.

    Connect modules

    The red exclamation mark indicates that you haven't set the properties for the module yet.

  6. Select the Select Columns in Dataset module.

  7. In the Properties pane to the right of the canvas, select Edit columns.

    In the Select columns dialog, select ALL COLUMNS and include all features. The dialog should look like this:

    Screenshot of the column selector dialog

  8. On the lower right, select OK to close the column selector.

Run the experiment

At any time, click the output port of a dataset or module to see what the data looks like at that point in the data flow. If the Visualize option is disabled, you first need to run the experiment.

An experiment runs on a compute target, a compute resource that is attached to your workspace. Once you create a compute target, you can reuse it for future runs.

  1. Select Run at the bottom to run the experiment.

  2. When the Setup Compute Targets dialog appears, if your workspace already has a compute resource, you can select it now. Otherwise, select Create new.

    Note

    The visual interface can only run experiments on Machine Learning Compute targets. Other compute targets will not be shown.

  3. Provide a name for the compute resource.

  4. Select Run.

    Setup compute target

    The compute resource will now be created. View the status in the top-right corner of the experiment.

    Note

    It takes approximately 5 minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.

    The compute resource will autoscale to 0 nodes when it is idle to save cost. When you use it again after a delay, you may again experience approximately 5 minutes of wait time while it scales back up.

After the compute target is available, the experiment runs. When the run is complete, a green check mark appears on each module.
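If you prefer to manage compute from code, the same kind of Machine Learning Compute target can also be created with the Azure Machine Learning Python SDK. This is an optional sketch, not part of the visual-interface steps; the VM size, node counts, and target name are placeholder values, and it assumes a config.json file for your workspace has been downloaded from the portal.

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

# Load the workspace from a local config.json (downloaded from the portal).
ws = Workspace.from_config()

# Configure a Machine Learning Compute cluster that scales down to zero nodes
# when idle, which is the same cost-saving behavior described above.
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_D2_V2",  # placeholder VM size
    min_nodes=0,               # autoscale to zero when idle
    max_nodes=2)               # placeholder upper bound

# Create the compute target; this typically takes a few minutes.
target = ComputeTarget.create(ws, "aml-compute", compute_config)
target.wait_for_completion(show_output=True)
```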

Visualize the data

Now that you have run your initial experiment, you can visualize the data to understand more about the dataset you have.

  1. Select the output port at the bottom of the Select Columns in Dataset module, and then select Visualize.

  2. Click on different columns in the data window to view information about that column.

    In this dataset, each row represents an automobile, and the variables associated with each automobile appear as columns. There are 205 rows and 26 columns in this dataset.

    Each time you click a column of data, the Statistics information and Visualization image of that column appear on the left.

    Preview the data

  3. Click each column to understand more about your dataset, and think about whether these columns will be useful to predict the price of an automobile.
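If you'd rather explore the same dataset in code, the quick inspection above maps to a few lines of pandas. This is only an illustrative sketch; it assumes the sample data has been exported to a local CSV file named automobile-price-data.csv, which is a hypothetical file name.

```python
import pandas as pd

# Load a local copy of the automobile price data (hypothetical file name).
df = pd.read_csv("automobile-price-data.csv")

# The sample dataset has 205 rows and 26 columns.
print(df.shape)

# Per-column summary statistics, similar to the Statistics pane in the visual interface.
print(df.describe(include="all"))

# A closer look at the column you want to predict.
print(df["price"].describe())
```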

Prepare data

Typically, a dataset requires some preprocessing before it can be analyzed. You might have noticed some missing values when visualizing the dataset. These missing values need to be cleaned so the model can analyze the data correctly. You'll remove any rows that have missing values. Also, the normalized-losses column has a large proportion of missing values, so you'll exclude that column from the model altogether.

Tip

Cleaning the missing values from input data is a prerequisite for using most of the modules.

Remove column

First, remove the normalized-losses column completely.

  1. Select the Select Columns in Dataset module.

  2. In the Properties pane to the right of the canvas, select Edit columns.

    • Leave With rules and ALL COLUMNS selected.

    • From the drop-downs, select Exclude and column names, and then click inside the text box. Type normalized-losses.

    • On the lower right, select OK to close the column selector.

    Exclude a column

    Now the properties pane for Select Columns in Dataset indicates that it will pass through all columns from the dataset except normalized-losses.

    The properties pane shows that the normalized-losses column is excluded.

  3. Double-click the Select Columns in Dataset module and type the comment "Exclude normalized losses."

    After you type the comment, click outside the module. A down-arrow appears to show that the module contains a comment.

  4. Click on the down-arrow to display the comment.

    The module now shows an up-arrow to hide the comment.

    Comments

Clean missing data

When you train a model, you have to do something about the data that is missing. In this case, you'll add a module to remove any remaining row that has missing data.

  1. Type Clean in the Search box to find the Clean Missing Data module.

  2. Drag the Clean Missing Data module to the experiment canvas and connect it to the Select Columns in Dataset module.

  3. In the Properties pane, select Remove entire row under Cleaning mode.

  4. Double-click the module and type the comment "Remove missing value rows."

    Your experiment should now look something like this:

    Screenshot showing the experiment after adding the Clean Missing Data module
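For reference, the two preparation steps above, excluding normalized-losses and removing rows with missing values, correspond to a couple of pandas operations. Again, a minimal sketch using the hypothetical automobile-price-data.csv file:

```python
import pandas as pd

df = pd.read_csv("automobile-price-data.csv")

# Equivalent of Select Columns in Dataset: exclude the normalized-losses column.
df = df.drop(columns=["normalized-losses"])

# Equivalent of Clean Missing Data with the "Remove entire row" cleaning mode.
df = df.dropna()

print(df.shape)  # fewer rows and one fewer column than the raw dataset
```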

Train a machine learning model

Now that the data is ready, you can construct a predictive model. You'll use your data to train the model. Then you'll test the model to see how closely it's able to predict prices.

Select an algorithm

Classification and regression are two types of supervised machine learning algorithms. Classification predicts an answer from a defined set of categories, such as a color (red, blue, or green). Regression is used to predict a number.

Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you'll use a linear regression model.
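As a point of reference, a linear regression model represents the predicted price as a weighted sum of the input features plus an intercept, and training finds the weights that minimize the prediction error on the training data:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

Here $\hat{y}$ is the predicted price, $x_1, \dots, x_n$ are the feature values for a car, and $w_0, \dots, w_n$ are the weights the Train Model module learns.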

Split the data

Use your data for both training the model and testing it by splitting the data into separate training and testing datasets.

  1. Type split data in the search box to find the Split Data module. Drag it onto the canvas, and connect it to the left output port of the Clean Missing Data module.

  2. Select the Split Data module. In the Properties pane, set the Fraction of rows in the first output dataset to 0.7. This way, we'll use 70 percent of the data to train the model, and hold back 30 percent for testing.

  3. Double-click the Split Data module and type the comment "Split the dataset into training set (0.7) and test set (0.3)."

Train the model

Train the model by giving it a set of data that includes the price. The model scans the data and looks for correlations between a car's features and its price.

  1. To select the learning algorithm, clear your module palette search box.

  2. Expand Machine Learning, and then expand Initialize Model. This displays several categories of modules that you can use to initialize machine learning algorithms.

  3. For this experiment, select Regression > Linear Regression and drag it to the experiment canvas.

  4. Find and drag the Train Model module to the experiment canvas. Connect the output of the Linear Regression module to the left input of the Train Model module, and connect the training data output (left port) of the Split Data module to the right input of the Train Model module.

    Screenshot showing the correct configuration of the Train Model module. The Linear Regression module connects to left port of Train Model module and the Split Data module connects to right port of Train Model

  5. Select the Train Model module. In the Properties pane, select Launch column selector, and then type price next to Include column names. Price is the value that your model is going to predict.

    Screenshot showing the correct configuration for the column selector module. With rules > Include column names > "price"

    Your experiment should look like this:

    Screenshot showing the correct configuration of the experiment after adding the Train Model module.
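Outside the visual interface, the Split Data, Linear Regression, and Train Model modules above correspond roughly to the following scikit-learn sketch. It isn't part of the tutorial steps; it reuses the hypothetical automobile-price-data.csv file and, for simplicity, keeps only numeric feature columns, whereas the tutorial experiment passes all columns through.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load and prepare the data as in the earlier steps (hypothetical file name).
df = pd.read_csv("automobile-price-data.csv")
df = df.drop(columns=["normalized-losses"]).dropna()

# Simplification for this sketch: keep numeric columns only.
df = df.select_dtypes(include="number")

X = df.drop(columns=["price"])  # features
y = df["price"]                 # the label the model learns to predict

# 70 percent of the rows for training, 30 percent held back for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

# Equivalent of the Linear Regression + Train Model modules.
model = LinearRegression().fit(X_train, y_train)

# Equivalent of the Score Model module: predict prices for the held-back rows.
predictions = model.predict(X_test)
```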

Evaluate a machine learning model

Now that you've trained the model using 70 percent of your data, you can use it to score the other 30 percent of the data to see how well your model functions.

  1. Type score model in the search box to find the Score Model module and drag the module to the experiment canvas. Connect the output of the Train Model module to the left input port of Score Model. Connect the test data output (right port) of the Split Data module to the right input port of Score Model.

  2. Type evaluate in the search box to find the Evaluate Model module and drag it to the experiment canvas. Connect the output of the Score Model module to the left input of Evaluate Model. The final experiment should look something like this:

    Screenshot showing the final correct configuration of the experiment.

  3. Run the experiment using the compute resource you created earlier.

  4. View the output from the Score Model module by selecting the output port of Score Model and then selecting Visualize. The output shows the predicted values for price and the known values from the test data.

    Screenshot of the output visualization highlighting the "Scored Label" column

  5. To view the output from the Evaluate Model module, select the output port, and then select Visualize.

    Screenshot showing the evaluation results for the final experiment.

The following statistics are shown for your model:

  • Mean Absolute Error (MAE): The average of absolute errors (an error is the difference between the predicted value and the actual value).
  • Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.
  • Relative Absolute Error: The sum of absolute errors relative to the sum of absolute differences between the actual values and the average of all actual values.
  • Relative Squared Error: The sum of squared errors relative to the sum of squared differences between the actual values and the average of all actual values.
  • Coefficient of Determination: Also known as the R squared value, this is a statistical metric indicating how well a model fits the data.

For each of the error statistics, smaller is better. A smaller value indicates that the predictions more closely match the actual values. For Coefficient of Determination, the closer its value is to one (1.0), the better the predictions.
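In standard notation, with $y_i$ the actual prices, $\hat{y}_i$ the predicted prices, $\bar{y}$ the mean of the actual prices, and $n$ the number of test rows, these statistics correspond to the following formulas:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$

$$\text{Relative Absolute Error} = \frac{\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|}{\sum_{i=1}^{n} \left|y_i - \bar{y}\right|}$$

$$\text{Relative Squared Error} = \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$

$$R^2 = 1 - \text{Relative Squared Error}$$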

Clean up resources

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning service tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the left side of the window.

    Delete resource group in the Azure portal

  2. In the list, select the resource group that you created.

  3. On the right side of the window, select the ellipsis button (...).

  4. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the visual interface.

Delete only the compute target

The compute target that you created automatically scales down to zero nodes when it's not being used, to minimize charges. If you want to delete the compute target, take these steps:

  1. In the Azure portal, open your workspace.

    Delete the compute target

  2. In the Compute section of your workspace, select the resource.

  3. Select Delete.
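If you'd rather do this from code, the compute target can also be removed with the Python SDK. A minimal sketch, assuming the placeholder target name aml-compute from the earlier sketch and a local config.json for your workspace:

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget

ws = Workspace.from_config()

# Attach to the existing compute target by name, then delete it.
target = ComputeTarget(workspace=ws, name="aml-compute")
target.delete()
```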

Delete individual assets

In the visual interface where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.


Next steps

In part one of this tutorial, you completed these steps:

  • Created an experiment
  • Prepared the data
  • Trained the model
  • Scored and evaluated the model

In part two, you'll learn how to deploy your model as an Azure web service.