Tutorial: Predict automobile price with the visual interface

In this two-part tutorial, you learn how to use the Azure Machine Learning service visual interface to develop and deploy a predictive analytics solution that predicts the price of any car.

In part one, you'll set up your environment, drag and drop datasets and analysis modules onto an interactive canvas, and connect them to create an experiment.

In part one of the tutorial, you learn how to:

  • Create a new experiment
  • Import data
  • Prepare data
  • Train a machine learning model
  • Evaluate a machine learning model

In part two of the tutorial, you'll learn how to deploy your predictive model as an Azure web service so you can use it to predict the price of any car based on the technical specifications you send it.

A completed version of this tutorial is available as a sample experiment.

To find it, from the Experiments page, select Add New, then select the Sample 1 - Regression: Automobile Price Prediction (Basic) experiment.

Create a new experiment

To create a visual interface experiment, you first need an Azure Machine Learning service workspace. In this section, you learn how to create both of these resources.

Create a new workspace

If you have an Azure Machine Learning service workspace, skip to the next section.

  1. Sign in to the Azure portal by using the credentials for the Azure subscription you use.

  2. In the upper-left corner of the Azure portal, select + Create a resource.

    Screenshot: Create a new resource

  3. Use the search bar to find Machine Learning service workspace.

  4. Select Machine Learning service workspace.

  5. In the Machine Learning service workspace pane, select Create to begin.

  6. Provide the following information to configure your new workspace:

    Field: Description
    Workspace name: Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription: Select the Azure subscription that you want to use.
    Resource group: Use an existing resource group in your subscription or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location: Select the location closest to your users and the data resources to create your workspace.

  7. After you finish configuring the workspace, select Create.

    Warning

    It can take several minutes to create your workspace in the cloud.

    When the process is finished, a deployment success message appears.

  8. To view the new workspace, select Go to resource.

Create an experiment

  1. Open your workspace in the Azure portal.

  2. In your workspace, select Visual interface. Then select Launch visual interface.

    Screenshot of the Azure portal showing how to access the visual interface from a Machine Learning service workspace

  3. Create a new experiment by selecting +New at the bottom of the visual interface window.

  4. Select Blank Experiment.

  5. Select the default experiment name "Experiment created on ..." at the top of the canvas and rename it to something meaningful, for example, "Automobile price prediction". The name doesn't need to be unique.

Import data

Machine learning depends on data. Luckily, this interface includes several sample datasets that you can experiment with. For this tutorial, use the sample dataset Automobile price data (Raw).

  1. To the left of the experiment canvas is a palette of datasets and modules. Select Saved Datasets, then select Samples to view the available sample datasets.

  2. Select the dataset Automobile price data (Raw) and drag it onto the canvas.

    Screenshot: Drag data to the canvas

  3. Select which columns of data to work with. Type Select in the Search box at the top of the palette to find the Select Columns in Dataset module.

  4. Click and drag the Select Columns in Dataset module onto the canvas. Drop the module below the dataset module.

  5. Connect the dataset you added earlier to the Select Columns in Dataset module by clicking and dragging. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, all the way to the input port of Select Columns in Dataset, which is the small circle at the top of the module.

    Tip

    You create a flow of data through your experiment when you connect the output port of one module to an input port of another.

    Screenshot: Connect modules

    The red exclamation mark indicates that you haven't set the properties for the module yet.

  6. Select the Select Columns in Dataset module.

  7. In the Properties pane to the right of the canvas, select Edit columns.

    In the Select columns dialog, select ALL COLUMNS and include all features. The dialog should look like this:

    Screenshot: Column selector

  8. On the lower right, select OK to close the column selector.

Run the experiment

At any time, click the output port of a dataset or module to see what the data looks like at that point in the data flow. If the Visualize option is disabled, you first need to run the experiment.

An experiment runs on a compute target, which is a compute resource attached to your workspace. Once you create a compute target, you can reuse it for future runs.

  1. Select Run at the bottom to run the experiment.

  2. When the Setup Compute Targets dialog appears, select an existing compute resource if your workspace already has one. Otherwise, select Create new.

    Note

    The visual interface can only run experiments on Machine Learning Compute targets. Other compute targets will not be shown.

  3. Provide a name for the compute resource.

  4. Select Run.

    Screenshot: Set up compute target

    The compute resource will now be created. View the status in the top-right corner of the experiment.

    Note

    It takes approximately 5 minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.

    The compute resource autoscales to 0 nodes when it is idle to save cost. When you use it again after a delay, you may again experience approximately 5 minutes of wait time while it scales back up.

After the compute target is available, the experiment runs. When the run is complete, a green check mark appears on each module.

Visualize the data

Now that you have run your initial experiment, you can visualize the data to understand more about the dataset you have.

  1. Select the output port at the bottom of the Select Columns in Dataset module, then select Visualize.

  2. Click different columns in the data window to view information about each column.

    In this dataset, each row represents an automobile, and the variables associated with each automobile appear as columns. There are 205 rows and 26 columns in this dataset.

    Each time you click a column of data, the Statistics information and Visualization image of that column appear on the left.

    Screenshot: Preview the data

  3. Click each column to understand more about your dataset, and think about whether these columns will be useful to predict the price of an automobile.

Prepare data

Typically, a dataset requires some preprocessing before it can be analyzed. You might have noticed some missing values when visualizing the dataset. These missing values need to be cleaned so the model can analyze the data correctly. You'll remove any rows that have missing values. Also, the normalized-losses column has a large proportion of missing values, so you'll exclude that column from the model altogether.

Tip

Cleaning the missing values from input data is a prerequisite for using most of the modules.

Remove column

First, remove the normalized-losses column completely.

  1. Select the Select Columns in Dataset module.

  2. In the Properties pane to the right of the canvas, select Edit columns.

    • Leave With rules and ALL COLUMNS selected.

    • From the drop-downs, select Exclude and column names, and then click inside the text box. Type normalized-losses.

    • On the lower right, select OK to close the column selector.

    Screenshot: Exclude a column

    Now the properties pane for Select Columns in Dataset indicates that it will pass through all columns from the dataset except normalized-losses.

    Screenshot: The properties pane shows that the normalized-losses column is excluded.

  3. Double-click the Select Columns in Dataset module and type the comment "Exclude normalized losses."

    After you type the comment, click outside the module. A down-arrow appears to show that the module contains a comment.

  4. Click the down-arrow to display the comment.

    The module now shows an up-arrow to hide the comment.

    Screenshot: Comment
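The effect of excluding a column can be sketched outside the visual interface. The following is a minimal Python illustration, not the module's actual implementation: the function name `exclude_columns` is hypothetical, the row values are made up, and the `make` and `horsepower` columns are included only for illustration. Only the `normalized-losses` and `price` column names come from this tutorial.

```python
# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
]


def exclude_columns(records, excluded):
    """Return copies of the records without the excluded column names."""
    excluded = set(excluded)
    return [
        {name: value for name, value in record.items() if name not in excluded}
        for record in records
    ]


# Pass through every column except normalized-losses.
selected = exclude_columns(rows, ["normalized-losses"])
print(sorted(selected[0]))
```

The original rows are left unchanged, which loosely mirrors how a module passes a new dataset downstream rather than editing its input.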

Clean missing data

When you train a model, you have to do something about the data that is missing. In this case, you'll add a module to remove any remaining rows that have missing data.

  1. Type Clean in the Search box to find the Clean Missing Data module.

  2. Drag the Clean Missing Data module to the experiment canvas and connect it to the Select Columns in Dataset module.

  3. In the Properties pane, select Remove entire row under Cleaning mode.

  4. Double-click the module and type the comment "Remove missing value rows."

    Your experiment should now look something like this:

    Screenshot: select-column
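As a rough picture of what the Remove entire row cleaning mode does, the sketch below drops every row that contains at least one missing value. It is an illustration under stated assumptions: the function name `drop_rows_with_missing` is hypothetical, the sample rows are invented, and treating `None`, the empty string, and `"?"` as missing markers is an assumption rather than documented module behavior.

```python
# Assumed markers for a missing value; adjust to match the actual data.
MISSING_MARKERS = (None, "", "?")


def drop_rows_with_missing(records, missing=MISSING_MARKERS):
    """Keep only the records in which no value is a missing marker."""
    return [
        record
        for record in records
        if not any(value in missing for value in record.values())
    ]


# Hypothetical rows: the second and third each contain one missing value.
rows = [
    {"horsepower": 102, "price": 13950},
    {"horsepower": None, "price": 16430},
    {"horsepower": 110, "price": "?"},
]

clean_rows = drop_rows_with_missing(rows)
print(len(rows), "rows before,", len(clean_rows), "rows after")
```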

Train a machine learning model

Now that the data is ready, you can construct a predictive model. You'll use your data to train the model. Then you'll test the model to see how closely it's able to predict prices.

Select an algorithm

Classification and regression are two types of supervised machine learning algorithms. Classification predicts an answer from a defined set of categories, such as a color (red, blue, or green). Regression is used to predict a number.

Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you'll use a linear regression model.

Split the data

Use your data for both training the model and testing it by splitting the data into separate training and testing datasets.

  1. Type split data in the search box to find the Split Data module and connect it to the left port of the Clean Missing Data module.

  2. Select the Split Data module. In the Properties pane, set the Fraction of rows in the first output dataset to 0.7. This way, you'll use 70 percent of the data to train the model and hold back 30 percent for testing.

  3. Double-click the Split Data module and type the comment "Split the dataset into training set (0.7) and test set (0.3)."
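The 0.7 fraction can be illustrated with a short Python sketch. It assumes a randomized split with a fixed seed; the function name `split_rows` and the `seed` parameter are hypothetical, and the sketch is not a description of how the Split Data module is implemented.

```python
import random


def split_rows(records, fraction=0.7, seed=0):
    """Shuffle the records, then cut them into two parts at the given fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder rows stand in for the cleaned dataset.
train_rows, test_rows = split_rows(range(10), fraction=0.7)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Every row lands in exactly one of the two outputs, so the test set stays unseen during training.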

Train the model

Train the model by giving it a set of data that includes the price. The model scans the data and looks for correlations between a car's features and its price.

  1. To select the learning algorithm, clear your module palette search box.

  2. Expand Machine Learning, then expand Initialize Model. This displays several categories of modules that can be used to initialize machine learning algorithms.

  3. For this experiment, select Regression > Linear Regression and drag it to the experiment canvas.

  4. Find and drag the Train Model module to the experiment canvas. Connect the output of the Linear Regression module to the left input of the Train Model module, and connect the training data output (left port) of the Split Data module to the right input of the Train Model module.

    Screenshot showing the correct configuration of the Train Model module.

  5. Select the Train Model module. In the Properties pane, select Launch column selector and then type price next to Include column names. Price is the value that your model is going to predict.

    Screenshot showing the correct configuration of the column selector.

    Your experiment should look like this:

    Screenshot showing the correct configuration of the experiment after adding the Train Model module.
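To make the idea of linear regression concrete, here is a minimal ordinary least squares fit with a single feature. This is a simplification under stated assumptions: the Linear Regression module works with many feature columns and its own solver, whereas this sketch uses one hypothetical feature (`horsepower`), made-up values, and a hypothetical helper named `fit_line`.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data in which price is exactly 100 * horsepower.
horsepower = [70, 90, 110, 130]
price = [7000, 9000, 11000, 13000]

slope, intercept = fit_line(horsepower, price)
predicted_price = slope * 100 + intercept  # prediction for a 100-horsepower car
print(slope, intercept, predicted_price)
```

Because the label column (price) is supplied during fitting, this is supervised learning in the same sense as the Train Model step.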

Evaluate a machine learning model

Now that you've trained the model using 70 percent of your data, you can use it to score the other 30 percent of the data to see how well your model functions.

  1. Type score model in the search box to find the Score Model module and drag the module to the experiment canvas. Connect the output of the Train Model module to the left input port of Score Model. Connect the test data output (right port) of the Split Data module to the right input port of Score Model.

  2. Type evaluate in the search box to find the Evaluate Model module and drag it to the experiment canvas. Connect the output of the Score Model module to the left input of Evaluate Model. The final experiment should look something like this:

    Screenshot showing the correct configuration of the experiment.

  3. Run the experiment using the compute resource you created earlier.

  4. View the output from the Score Model module by selecting the output port of Score Model and then selecting Visualize. The output shows the predicted values for price and the known values from the test data.

    Screenshot of the output visualization with the Scored Label column highlighted.

  5. To view the output from the Evaluate Model module, select the output port, and then select Visualize.

    Screenshot showing the evaluation results for the final experiment.

The following statistics are shown for your model:

  • Mean Absolute Error (MAE): The average of absolute errors (an error is the difference between the predicted value and the actual value).
  • Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.
  • Relative Absolute Error: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.
  • Relative Squared Error: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
  • Coefficient of Determination: Also known as the R squared value, this is a statistical metric indicating how well a model fits the data.

For each of the error statistics, smaller is better. A smaller value indicates that the predictions more closely match the actual values. For Coefficient of Determination, the closer its value is to one (1.0), the better the predictions.
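The statistics above follow standard textbook definitions, which can be sketched in a few lines of Python. The function name `regression_metrics` and the sample values are hypothetical, and the code reflects the usual formulas rather than the Evaluate Model module's internals.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression error statistics from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline: how far the actual values sit from their own mean.
    abs_deviation = sum(abs(a - mean_actual) for a in actual)
    sq_deviation = sum((a - mean_actual) ** 2 for a in actual)
    relative_squared_error = sum(sq_errors) / sq_deviation
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "relative_absolute_error": sum(abs_errors) / abs_deviation,
        "relative_squared_error": relative_squared_error,
        "r_squared": 1 - relative_squared_error,
    }


# Hypothetical known prices and model predictions for four test rows.
actual_prices = [10000, 12000, 14000, 16000]
predicted_prices = [11000, 12000, 13000, 16000]

metrics = regression_metrics(actual_prices, predicted_prices)
for name, value in metrics.items():
    print(name, round(value, 4))
```

The two relative errors compare the model against a baseline that always predicts the mean of the actual values, which is why the Coefficient of Determination equals one minus the Relative Squared Error under these definitions.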

Clean up resources

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning service tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the left side of the window.

    Screenshot: Delete a resource group in the Azure portal

  2. In the list, select the resource group that you created.

  3. On the right side of the window, select the ellipsis button (...).

  4. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the visual interface.

Delete only the compute target

The compute target that you created here automatically scales to zero nodes when it's not being used, which minimizes charges. If you want to delete the compute target, take these steps:

  1. In the Azure portal, open your workspace.

    Screenshot: Delete the compute target

  2. In the Compute section of your workspace, select the resource.

  3. Select Delete.

Delete individual assets

In the visual interface where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.

Screenshot: Delete an experiment

Next steps

In part one of this tutorial, you completed these steps:

  • Created an experiment
  • Prepared the data
  • Trained the model
  • Scored and evaluated the model

In part two, you'll learn how to deploy your model as an Azure web service.