Tutorial: Predict automobile price with the designer (preview)

APPLIES TO: Basic edition: no; Enterprise edition: yes (Upgrade to Enterprise)

In this two-part tutorial, you learn how to use the Azure Machine Learning designer to develop and deploy a predictive analytics solution that predicts the price of any car.

In part one, you set up your environment, drag modules onto an interactive canvas, and connect them together to create an Azure Machine Learning pipeline.

In part one of the tutorial, you'll learn how to:

  • Create a new pipeline.
  • Import data.
  • Prepare data.
  • Train a machine learning model.
  • Evaluate a machine learning model.

In part two of the tutorial, you'll learn how to deploy your predictive model as a real-time inferencing endpoint to predict the price of any car based on technical specifications you send it.

Note

A completed version of this tutorial is available as a sample pipeline.

To find it, go to the designer in your workspace. In the New pipeline section, select Sample 1 - Regression: Automobile Price Prediction(Basic).

Create a new pipeline

Azure Machine Learning pipelines organize multiple, dependent machine learning and data processing steps into a single resource. Pipelines help you organize, manage, and reuse complex machine learning workflows across projects and users. To create an Azure Machine Learning pipeline, you need an Azure Machine Learning workspace. In this section, you learn how to create both these resources.

Create a new workspace

If you have an Azure Machine Learning workspace with an Enterprise edition, skip to the next section.

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

  2. In the upper-left corner of the Azure portal, select + Create a resource.

    Create a new resource

  3. Use the search bar to find Machine Learning.

  4. Select Machine Learning.

  5. In the Machine Learning pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field: Description
    Workspace name: Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription: Select the Azure subscription that you want to use.
    Resource group: Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location: Select the location closest to your users and the data resources to create your workspace.
    Workspace edition: Select Enterprise. This tutorial requires the use of the Enterprise edition. The Enterprise edition is in preview and doesn't currently add any extra costs.
  7. After you're finished configuring the workspace, select Create.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  8. To view the new workspace, select Go to resource.

Create the pipeline

  1. Sign in to ml.azure.com, and select the workspace you want to work with.

  2. Select Designer.

    Screenshot of the visual workspace showing how to access the designer

  3. Select Easy-to-use prebuilt modules.

  4. Select the default pipeline name Pipeline-Created-on at the top of the canvas. Rename it to something meaningful. An example is Automobile price prediction. The name doesn't need to be unique.

Import data

There are several sample datasets included in the designer for you to experiment with. For this tutorial, use Automobile price data (Raw).

  1. To the left of the pipeline canvas is a palette of datasets and modules. Select Datasets, and then view the Samples section to view the available sample datasets.

  2. Select the dataset Automobile price data (Raw), and drag it onto the canvas.

    Drag data to canvas

Visualize the data

You can visualize the data to understand the dataset that you'll use.

  1. Select the Automobile price data (Raw) module.

  2. In the properties pane to the right of the canvas, select Outputs.

  3. Select the graph icon to visualize the data.

    Visualize the data

  4. Select the different columns in the data window to view information about each one.

    Each row represents an automobile, and the variables associated with each automobile appear as columns. There are 205 rows and 26 columns in this dataset.
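
The kind of per-column summary you see in the data window can be approximated outside the designer. The following is a minimal sketch in plain Python, not the designer's implementation. The three rows, the column values, and the `summarize` helper are hypothetical and only illustrate counting rows, columns, and missing values (represented here as `None`).

```python
# Hypothetical miniature of the dataset; the values are illustrative only.
rows = [
    {"make": "audi", "horsepower": 102, "normalized-losses": 164, "price": 13950},
    {"make": "bmw", "horsepower": 101, "normalized-losses": None, "price": 16430},
    {"make": "mazda", "horsepower": None, "normalized-losses": None, "price": 6795},
]


def summarize(rows):
    """Return (row_count, column_count, missing_values_per_column)."""
    columns = list(rows[0].keys()) if rows else []
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}
    return len(rows), len(columns), missing


n_rows, n_cols, missing = summarize(rows)
print(n_rows, n_cols, missing)
```

A column with many missing values, such as normalized-losses in this toy data, stands out immediately in such a summary.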

Prepare data

Datasets typically require some preprocessing before analysis. You might have noticed some missing values when you inspected the dataset. These missing values must be cleaned so that the model can analyze the data correctly.

Remove a column

When you train a model, you have to do something about the data that's missing. In this dataset, the normalized-losses column is missing many values, so you exclude that column from the model altogether.

  1. Enter Select in the search box at the top of the palette to find the Select Columns in Dataset module.

  2. Drag the Select Columns in Dataset module onto the canvas. Drop the module below the dataset module.

  3. Connect the Automobile price data (Raw) dataset to the Select Columns in Dataset module. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, to the input port of Select Columns in Dataset, which is the small circle at the top of the module.

    Tip

    You create a flow of data through your pipeline when you connect the output port of one module to an input port of another.

    Connect modules

  4. Select the Select Columns in Dataset module.

  5. In the properties pane to the right of the canvas, select Parameters > Edit column.

  6. Select the + to add a new rule.

  7. From the drop-down menu, select Exclude and Column names.

  8. Enter normalized-losses in the text box.

  9. In the lower right, select Save to close the column selector.

    Exclude a column

    The properties pane shows that the normalized-losses column is excluded.

  10. Select the Select Columns in Dataset module.

  11. In the properties pane, select Parameters > Comment and enter Exclude normalized losses.
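
For readers who think in code, the Exclude rule configured above amounts to dropping one named column from every row. This is a rough sketch under stated assumptions, not the module's actual implementation: `exclude_columns` is a hypothetical helper and the two rows are made-up values.

```python
def exclude_columns(rows, excluded):
    """Return copies of the rows without the named columns."""
    excluded = set(excluded)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Hypothetical rows; only the column names follow the tutorial.
rows = [
    {"make": "audi", "normalized-losses": 164, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "price": 16430},
]

selected = exclude_columns(rows, ["normalized-losses"])
print(selected)
```

The helper returns new dictionaries, so the input rows stay unchanged, much as the dataset module on the canvas is unaffected by the modules connected after it.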

Clean missing data

Your dataset still has missing values after you remove the normalized-losses column. You can remove the remaining missing data by using the Clean Missing Data module.

Tip

Cleaning the missing values from input data is a prerequisite for using most of the modules in the designer.

  1. Enter Clean in the search box to find the Clean Missing Data module.

  2. Drag the Clean Missing Data module to the pipeline canvas. Connect it to the Select Columns in Dataset module.

  3. In the properties pane, select Remove entire row under Cleaning mode.

  4. In the properties pane Comment box, enter Remove missing value rows.

    Your pipeline should now look something like this:

    Select-column
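
The Remove entire row cleaning mode chosen above can be pictured as a simple filter. The sketch below assumes that a missing value is represented as `None`, and that a row is dropped if any of its values is missing. Both the `remove_rows_with_missing` helper and the sample rows are hypothetical.

```python
def remove_rows_with_missing(rows):
    """Keep only the rows in which every value is present (not None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical rows with one missing horsepower value.
rows = [
    {"make": "audi", "horsepower": 102, "price": 13950},
    {"make": "mazda", "horsepower": None, "price": 6795},
    {"make": "bmw", "horsepower": 101, "price": 16430},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "->", len(cleaned))
```

Dropping rows is the simplest option but shrinks the dataset, which is one reason the tutorial first excludes the column with the most missing values.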

Train a machine learning model

Now that the data is processed, you can train a predictive model.

Select an algorithm

Classification and regression are two types of supervised machine learning algorithms. Classification predicts an answer from a defined set of categories, such as a color like red, blue, or green. Regression is used to predict a number.

Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you use a linear regression model.

Split the data

Split your data into two separate datasets for training the model and testing it.

  1. Enter split data in the search box to find the Split Data module. Connect it to the left port of the Clean Missing Data module.

  2. Select the Split Data module.

  3. In the properties pane, set the Fraction of rows in the first output dataset to 0.7.

    This option allocates 70 percent of the data for training the model and 30 percent for testing it.

  4. In the properties pane Comment box, enter Split the dataset into training set (0.7) and test set (0.3).
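
To make the 0.7 fraction concrete, here is a minimal sketch of a 70/30 split in plain Python. It assumes a randomized split with a fixed seed so the result is repeatable. The `split_rows` helper, the seed value, and the ten placeholder rows are illustrative assumptions and say nothing about how the Split Data module is implemented.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into (first, second) parts.

    `fraction` is the share of rows that goes to the first output.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # placeholder rows
train, test = split_rows(rows, fraction=0.7, seed=42)
print(len(train), len(test))
```

The first part plays the role of the left output port (training data) and the second part the right output port (test data).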

Train the model

Train the model by giving it a set of data that includes the price. The training process scans the data and looks for correlations between a car's features and its price to construct the model.

  1. To select the learning algorithm, clear your module palette search box.

  2. Expand Machine Learning Algorithms.

    This option displays several categories of modules that you can use to initialize learning algorithms.

  3. Select Regression > Linear Regression, and drag it to the pipeline canvas.

  4. Find and drag the Train Model module to the pipeline canvas.

  5. Connect the output of the Linear Regression module to the left input of the Train Model module.

  6. Connect the training data output (left port) of the Split Data module to the right input of the Train Model module.

    Screenshot showing the correct configuration of the Train Model module.

  7. Select the Train Model module.

  8. In the properties pane, select Edit column selector.

  9. In the Label column dialog box, expand the drop-down menu and select Column names.

  10. In the text box, enter price. Price is the value that your model is going to predict.

    Your pipeline should look like this:

    Screenshot showing the correct configuration of the pipeline after adding the Train Model module.
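
To give a feel for what training with price as the label column means, the sketch below fits an ordinary least squares line to a single numeric feature. This is a deliberate simplification: the designer's Linear Regression module works with many features, and its internals are not described in this tutorial. The `fit_simple_linear_regression` helper, the choice of horsepower as the feature, and the training values are all hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training rows: one feature and the label column "price".
train = [
    {"horsepower": 100, "price": 10000},
    {"horsepower": 120, "price": 12000},
    {"horsepower": 150, "price": 15000},
]

slope, intercept = fit_simple_linear_regression(
    [r["horsepower"] for r in train],
    [r["price"] for r in train],
)
print(slope, intercept)
```

The fitted slope and intercept are the "model" in this toy setting: they capture the relationship between the feature and the label that was found in the training rows.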

Evaluate a machine learning model

After you train your model by using 70 percent of the data, you can use it to score the other 30 percent to see how well your model functions.

  1. Enter score model in the search box to find the Score Model module. Drag the module to the pipeline canvas.

  2. Connect the output of the Train Model module to the left input port of Score Model. Connect the test data output (right port) of the Split Data module to the right input port of Score Model.

  3. Enter evaluate in the search box to find the Evaluate Model module. Drag the module to the pipeline canvas.

  4. Connect the output of the Score Model module to the left input of Evaluate Model.

    The final pipeline should look something like this:

    Screenshot showing the correct configuration of the pipeline.
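
Scoring, as wired up above, means applying the trained model to the held-out test rows and recording a prediction next to each actual price. The following sketch assumes a one-feature linear model with already-known parameters. The `score_rows` helper, the output column name "Scored Labels", the parameter values, and the test rows are hypothetical choices for illustration.

```python
def score_rows(rows, feature, slope, intercept, output_column="Scored Labels"):
    """Return copies of the rows with a predicted value added to each one."""
    scored = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[output_column] = slope * row[feature] + intercept
        scored.append(new_row)
    return scored


# Hypothetical test rows and hypothetical model parameters.
test_rows = [
    {"horsepower": 110, "price": 11500},
    {"horsepower": 90, "price": 8800},
]

scored = score_rows(test_rows, "horsepower", slope=100.0, intercept=0.0)
for row in scored:
    print(row["price"], row["Scored Labels"])
```

Each scored row now holds both the actual price and the prediction, which is the pairing an evaluation step needs in order to compute error statistics.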

Run the pipeline

A pipeline runs on a compute target, which is a compute resource that's attached to your workspace. After you create a compute target, you can reuse it for future runs.

  1. Select Run at the top of the canvas to run the pipeline.

  2. When the Settings pane appears, select Select compute target.

    If you already have an available compute target, you can select it to run this pipeline.

    Note

    The designer can run experiments only on Azure Machine Learning Compute targets. Other compute targets won't be shown.

  3. Enter a name for the compute resource.

  4. Select Save.

    Set up compute target

  5. Select Run.

  6. In the Set up pipeline run dialog box, select + New experiment for the Experiment.

    Note

    Experiments group similar pipeline runs together. If you run a pipeline multiple times, you can select the same experiment for successive runs.

    1. Enter a descriptive name for Experiment Name.

    2. Select Run.

    You can view run status and details at the top right of the canvas.

    Note

    It takes approximately five minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.

    The compute resource autoscales to zero nodes when it's idle to save cost. When you use it again after a delay, you might experience approximately five minutes of wait time while it scales back up.

View results

After the run completes, you can view the results of the pipeline run.

  1. Select the Score Model module to view its output.

  2. In the properties pane, select Outputs > Visualize.

    Here you can see the predicted prices and the actual prices from the testing data.

    Screenshot of the output visualization highlighting the Scored Label column

  3. Select the Evaluate Model module to view its output.

  4. In the properties pane, select Outputs > Visualize.

The following statistics are shown for your model:

  • Mean Absolute Error (MAE): The average of absolute errors. An error is the difference between the predicted value and the actual value.
  • Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.
  • Relative Absolute Error: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.
  • Relative Squared Error: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
  • Coefficient of Determination: Also known as the R squared value, this statistical metric indicates how well a model fits the data.

For each of the error statistics, smaller is better. A smaller value indicates that the predictions are closer to the actual values. For the coefficient of determination, the closer its value is to one (1.0), the better the predictions.
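
The statistics listed above follow standard textbook definitions, which the sketch below computes in plain Python. It assumes the usual formulas, including the coefficient of determination taken as one minus the relative squared error. The designer's exact computation is not shown in this tutorial, and the `regression_metrics` helper and the sample numbers are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression error statistics from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Deviations of the actual values from their own mean (the baseline).
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_errors) / sq_dev
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_dev,
        "rse": rse,
        "r2": 1.0 - rse,
    }


# Hypothetical actual and predicted values.
metrics = regression_metrics(actual=[10, 20, 30], predicted=[12, 18, 33])
for name, value in metrics.items():
    print(name, round(value, 4))
```

The two relative errors compare the model against a baseline that always predicts the mean of the actual values, so a value below 1.0 means the model does better than that baseline.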

Clean up resources

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges.

  1. In the Azure portal, select Resource groups on the left side of the window.

    Delete resource group in the Azure portal

  2. In the list, select the resource group that you created.

  3. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the designer.

Delete individual assets

In the designer where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.

The compute target that you created here automatically scales to zero nodes when it's not being used. This action is taken to minimize charges. If you want to delete the compute target, take these steps:

Delete assets

You can unregister datasets from your workspace by selecting each dataset and selecting Unregister.

Unregister dataset

To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually delete those assets.

Next steps

In part one of this tutorial, you completed the following tasks:

  • Create a pipeline
  • Prepare the data
  • Train the model
  • Score and evaluate the model

In part two, you'll learn how to deploy your model as a real-time endpoint.