Sample 1 - Regression: Predict price
Learn how to use the visual interface to build a machine learning regression model without writing a single line of code.
This experiment trains a decision forest regressor to predict a car's price based on technical features such as make, model, horsepower, and size. Because we're trying to answer the question "How much?", this is called a regression problem. However, you can apply the same fundamental steps to any type of machine learning problem, whether it's regression, classification, clustering, or another task.
The fundamental steps of training a machine learning model are:
- Get the data
- Pre-process the data
- Train the model
- Evaluate the model
Here's the final, completed graph of the experiment we'll be working on. We'll provide the rationale for all the modules so you can make similar decisions on your own.
1. Create an Azure Machine Learning service workspace if you don't have one.
2. In your workspace, select Visual interface. Then select Launch visual interface. The interface webpage opens in a new browser page.
3. Select the Open button for the Sample 1 experiment.
Get the data
In this experiment, we use the Automobile price data (Raw) dataset, which is from the UCI Machine Learning Repository. The dataset has 26 columns of information about automobiles, including make, model, price, vehicle features (like the number of cylinders), MPG, and an insurance risk score. The goal of this experiment is to predict the price of the car.
Pre-process the data
The main data preparation tasks include data cleaning, integration, transformation, reduction, and discretization or quantization. In the visual interface, you can find modules to perform these operations and other data pre-processing tasks in the Data Transformation group in the left panel.
We use the Select Columns in Dataset module to exclude the normalized-losses column, which has many missing values. We then use Clean Missing Data to remove the remaining rows that have missing values. Together, these steps produce a clean set of training data.
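For readers who want to see what these two modules do in code terms, here is a minimal sketch in plain Python. It is not how the visual interface implements them. The rows and their values are made up for illustration, and the helper names `select_columns` and `drop_rows_with_missing` are hypothetical.

```python
# Hypothetical rows mirroring a few columns of the automobile dataset.
# The values are invented for illustration; None marks a missing value.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
    {"make": "mazda", "normalized-losses": None, "horsepower": None, "price": 6795},
    {"make": "volvo", "normalized-losses": 95, "horsepower": 114, "price": None},
]


def select_columns(rows, exclude):
    """Return copies of the rows without the excluded columns."""
    return [{k: v for k, v in row.items() if k not in exclude} for row in rows]


def drop_rows_with_missing(rows):
    """Keep only rows in which every remaining value is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Order matters: dropping the sparse column first preserves rows that were
# incomplete only because of that column.
selected = select_columns(rows, exclude={"normalized-losses"})
cleaned = drop_rows_with_missing(selected)
```

In this sketch, removing rows before removing the column would discard the bmw row as well, which is why the column is excluded first.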
Train the model
Machine learning problems vary. Common machine learning tasks include classification, clustering, regression, and recommender systems, each of which might require a different algorithm. Your choice of algorithm often depends on the requirements of the use case. After you pick an algorithm, you need to tune its parameters to train a more accurate model. You then need to evaluate all models based on metrics like accuracy, intelligibility, and efficiency.
Because the goal of this experiment is to predict automobile prices, and because the label column (price) contains real numbers, a regression model is a good choice. Considering that the number of features is relatively small (fewer than 100) and that these features aren't sparse, the decision boundary is likely to be nonlinear. So we use Decision Forest Regression for this experiment.
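To give a rough feel for the idea behind a forest of trees, the sketch below bags one-split regression trees (stumps) on a single numeric feature and averages their predictions. This is a drastically simplified stand-in, not the algorithm behind the Decision Forest Regression module. The function names and the horsepower and price values are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature."""
    best = None
    # Skipping the smallest value guarantees both sides are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so predict the overall mean
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]  # (threshold, left_mean, right_mean)


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in picks], [ys[i] for i in picks]))
    return forest


def predict(forest, x):
    """Average the predictions of all trees in the forest."""
    return mean(
        left if x < threshold else right for threshold, left, right in forest
    )


# Hypothetical training data: horsepower as the only feature, price as label.
horsepower = [60, 70, 85, 100, 120, 150, 180, 200]
price = [6000, 7000, 8500, 10000, 16000, 20000, 30000, 35000]

forest = fit_forest(horsepower, price)
estimate = predict(forest, 110)
```

Because each tree sees a different resampling of the data and the forest averages their outputs, the combined prediction is a piecewise-constant, nonlinear function of the feature, which is the property the paragraph above relies on.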
We use the Split Data module to randomly divide the input data so that the training dataset contains 70% of the original data and the testing dataset contains 30% of the original data.
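As an illustration of what a random 70/30 split amounts to, here is a minimal sketch using only Python's standard library. The `split_data` helper and its `seed` parameter are hypothetical and don't correspond to settings of the Split Data module.

```python
import random


def split_data(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(200))  # placeholder standing in for 200 cleaned rows
train, test = split_data(rows)
```

Every row lands in exactly one of the two sets, so the model is never tested on data it was trained on.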
Test, evaluate, and compare
We train and test the model on different portions of the dataset so that the evaluation of the model is more objective.
After the model is trained, we use the Score Model and Evaluate Model modules to generate predicted results and evaluate the model.
Score Model generates predictions for the test dataset by using the trained model. To check the result, select the output port of Score Model and then select Visualize.
We then pass the scores to the Evaluate Model module to generate evaluation metrics. To check the result, select the output port of Evaluate Model and then select Visualize.
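The text above doesn't list which metrics Evaluate Model reports. As an assumption, the sketch below computes three metrics that are commonly used for regression: mean absolute error, root mean squared error, and the coefficient of determination. The `regression_metrics` helper and the sample prices are hypothetical.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, and R^2 for paired actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical test-set prices and the model's scored predictions.
actual = [10000, 20000, 30000]
predicted = [11000, 19000, 32000]
metrics = regression_metrics(actual, predicted)
```

Lower MAE and RMSE mean smaller errors in price units, while an R^2 closer to 1 means the predictions explain more of the variation in the actual prices.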
Clean up resources
You can use the resources that you created as prerequisites for other Azure Machine Learning service tutorials and how-to articles.
If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges:
1. In the Azure portal, select Resource groups on the left side of the window.
2. In the list, select the resource group that you created.
3. On the right side of the window, select the ellipsis button (...).
4. Select Delete resource group.
Deleting the resource group also deletes all resources that you created in the visual interface.
Delete only the compute target
The compute target that you created here automatically scales down to zero nodes when it's not being used, which minimizes charges. If you want to delete the compute target, take these steps:
1. In the Azure portal, open your workspace.
2. In the Compute section of your workspace, select the resource.
3. Select Delete.
Delete individual assets
In the visual interface where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.
Explore the other samples available for the visual interface.