Tutorial: Predict automobile price with the visual interface

In this tutorial, you take an extended look at developing a predictive solution in the Azure Machine Learning service visual interface. By the end of this tutorial, you'll have a solution that can predict the price of any car based on the technical specifications you send it.

This tutorial continues from the quickstart and is part one of a two-part tutorial series. However, you don't have to complete the quickstart before starting.

In part one of the tutorial series, you learn how to:

  • Import and clean data (the same steps as the quickstart)
  • Train a machine learning model
  • Score and evaluate a model

In part two of the tutorial series, you'll learn how to deploy your predictive model as an Azure web service.

Note

A completed version of this tutorial is available as a sample experiment. From the Experiments page, go to Add New > Sample 1 - Regression: Automobile Price Prediction (Basic).

Create a workspace

If you have an Azure Machine Learning service workspace, skip to the next section. Otherwise, create one now.

  1. Sign in to the Azure portal by using the credentials for the Azure subscription you use.

  2. In the upper-left corner of the portal, select Create a resource.

  3. In the search bar, enter Machine Learning. Select the Machine Learning service workspace search result.

  4. In the ML service workspace pane, select Create to begin.

  5. In the ML service workspace pane, configure your workspace.

    Field: Description
    Workspace name: Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to differentiate from workspaces created by others.
    Subscription: Select the Azure subscription that you want to use.
    Resource group: Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location: Select the location closest to your users and the data resources. This location is where the workspace is created.

  6. To start the creation process, select Review + Create.

  7. Review your workspace configuration. If it's correct, select Create. It can take a few moments to create the workspace.

  8. To check on the status of the deployment, select the Notifications (bell) icon on the toolbar.

  9. When the process is finished, a deployment success message appears. It's also present in the notifications section. To view the new workspace, select Go to resource.

Open the visual interface webpage

  1. Open your workspace in the Azure portal.

  2. In your workspace, select Visual interface. Then select Launch visual interface.

    Screenshot of the Azure portal showing how to access the visual interface from a Machine Learning service workspace.

    The interface webpage opens in a new browser page.

Import and clean your data

The first thing you need is clean data. If you completed the quickstart, you can reuse your data prep experiment here. If you haven't completed the quickstart, skip the next section and start from a new experiment.

Reuse the quickstart experiment

  1. Open your quickstart experiment.

  2. Select Save As at the bottom of the window.

  3. Give it a new name in the pop-up dialog that appears.

    Screenshot showing how to rename an experiment to "Tutorial - Predict Automobile Price".

  4. The experiment should now look something like this:

    Screenshot showing the expected state of the experiment.

If you successfully reused your quickstart experiment, skip the next section and begin training your model.

Start from a new experiment

If you didn't complete the quickstart, follow these steps to quickly create a new experiment that imports and cleans the automobile data set.

  1. Create a new experiment by selecting +New at the bottom of the visual interface window.

  2. Select Experiments > Blank Experiment.

  3. Select the default experiment name "Experiment created on ..." at the top of the canvas and rename it to something meaningful. For example, Automobile price prediction. The name doesn't need to be unique.

  4. To the left of the experiment canvas is a palette of datasets and modules. To find modules, use the search box at the top of the module palette. Type automobile in the search box to find the dataset labeled Automobile price data (Raw). Drag this dataset to the experiment canvas.

    Screenshot showing how to find the automobile price dataset.

    Now that you have your data, you can add a module that removes the normalized-losses column completely. Then, add another module that removes any row that has missing data.

  5. Type select columns in the search box to find the Select Columns in Dataset module. Then drag it to the experiment canvas. This module lets you choose which columns of data to include in the model and which to exclude from it.

  6. Connect the output port of the Automobile price data (Raw) dataset to the input port of the Select Columns in Dataset module.

    Animated gif showing how to connect the Automobile price data module to the Select Columns module.

  7. Select the Select Columns in Dataset module and select Launch column selector in the Properties pane.

    1. On the left, select With rules.

    2. Next to Begin With, select All columns. These rules direct Select Columns in Dataset to pass through all the columns (except those columns we're about to exclude).

    3. From the drop-downs, select Exclude and column names, and then type normalized-losses inside the text box.

    4. Select the OK button (on the lower right) to close the column selector.

    Now the properties pane for Select Columns in Dataset indicates that it will pass through all columns from the dataset except normalized-losses.

  8. Add a comment to the Select Columns in Dataset module by double-clicking the module and entering "Exclude normalized losses." This can help you see, at a glance, what the module is doing in your experiment.

    Screenshot showing the correct configuration of the Select Columns module.

  9. Type Clean in the search box to find the Clean Missing Data module. Drag the Clean Missing Data module to the experiment canvas and connect it to the Select Columns in Dataset module.

  10. In the Properties pane, select Remove entire row under Cleaning mode. This option directs Clean Missing Data to clean the data by removing rows that have any missing values. Double-click the module and type the comment "Remove missing value rows."

Screenshot showing the correct configuration of the Clean Missing Data module.
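
For readers who think in code, the two cleaning steps above amount to dropping one column and then dropping every row that still contains a missing value. The following is a minimal Python sketch of that idea, not the visual interface's implementation. It assumes the rows are plain dictionaries and that a missing value appears as None, an empty string, or "?"; the function names and the sample rows are hypothetical.

```python
# Assumed markers for a missing value (an assumption for this sketch).
MISSING = (None, "", "?")

def exclude_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]

def remove_rows_with_missing(rows):
    """Keep only the rows in which every remaining value is present."""
    return [row for row in rows if all(v not in MISSING for v in row.values())]

# Hypothetical sample rows, not taken from the real dataset.
raw_rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
    {"make": "dodge", "normalized-losses": 118, "horsepower": None, "price": 6377},
]

# Same order as the experiment: exclude the column first, then clean rows.
selected = exclude_columns(raw_rows, {"normalized-losses"})
cleaned = remove_rows_with_missing(selected)
print([row["make"] for row in cleaned])
```

The order matters: because normalized-losses is excluded first, a row whose only gap was in that column survives the row-removal step.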

Train the model

Now that the data is ready, you can construct a predictive model. You'll use your data to train the model. Then you'll test the model to see how closely it's able to predict prices.

Classification and regression are two types of supervised machine learning algorithms. Classification predicts an answer from a defined set of categories, such as a color (red, blue, or green). Regression is used to predict a number.

Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you'll use a linear regression model.

Train the model by giving it a set of data that includes the price. The model scans the data and looks for correlations between a car's features and its price. Then test the model by giving it the features of automobiles whose prices are known but that were held back from training, and see how close the model comes to predicting the known price.

Use your data for both training the model and testing it by splitting the data into separate training and testing datasets.
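
As a rough illustration of what such a split does, here is a minimal Python sketch. It assumes the rows are shuffled before being divided and uses a hypothetical helper named split_rows; it is not the Split Data module's actual implementation, and the seed value is arbitrary.

```python
import random

def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]

# Ten placeholder rows stand in for the cleaned automobile data.
rows = list(range(10))
train_rows, test_rows = split_rows(rows, fraction=0.7, seed=42)
print(len(train_rows), len(test_rows))
```

Every row lands in exactly one of the two lists, so the model is never tested on a row it was trained on.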

  1. Type split data in the search box to find the Split Data module and connect it to the left port of the Clean Missing Data module.

  2. Select the Split Data module that you just connected. In the Properties pane, set the Fraction of rows in the first output dataset to 0.7. This way, we'll use 70 percent of the data to train the model, and hold back 30 percent for testing.

    Screenshot showing the correct configuration of the properties pane.

  3. Double-click the Split Data module and type the comment "Split the dataset into training set(0.7) and test set(0.3)".

  4. To select the learning algorithm, clear your module palette search box.

  5. Expand Machine Learning, and then expand Initialize Model. This displays several categories of modules that can be used to initialize machine learning algorithms.

  6. For this experiment, select Regression > Linear Regression and drag it to the experiment canvas.

    Screenshot showing the correct configuration of the properties pane.

  7. Find and drag the Train Model module to the experiment canvas. Connect the output of the Linear Regression module to the left input of the Train Model module, and connect the training data output (left port) of the Split Data module to the right input of the Train Model module.

    Screenshot showing the correct configuration of the Train Model module.

  8. Select the Train Model module. In the Properties pane, select Launch column selector and then type price next to Include column names. Price is the value that your model is going to predict.

    Screenshot showing the correct configuration of the column selector.

    The experiment should now look like this:

    Screenshot showing the correct configuration of the experiment after adding the Train Model module.
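
To make the idea of training a linear regression concrete, the sketch below fits a straight line to one feature with ordinary least squares. This is a simplified stand-in, not the Linear Regression or Train Model module: the real experiment uses many columns, and the source doesn't describe how the module fits them. The helper names fit_line and predict and the horsepower and price values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the feature, and how the feature and label vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data built from price = 100 * horsepower + 2000.
horsepower = [70, 90, 110, 150]
price = [9000, 11000, 13000, 17000]

model = fit_line(horsepower, price)
print(model, predict(model, 100))
```

Here price plays the role of the label column chosen in the column selector, and horsepower is the single feature the model learns from.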

Run the training experiment

An experiment runs on a compute target, a compute resource that is attached to your workspace. Once you create a compute target, you can reuse it for future runs.

  1. Select Run at the bottom to run the experiment.

  2. When the Setup Compute Targets dialog appears, if your workspace already has a compute resource, you can select it now. Otherwise, select Create new.

    Note

    The visual interface can only run experiments on Machine Learning Compute targets. Other compute targets will not be shown.

  3. Provide a name for the compute resource.

  4. Select Run.

    The compute resource will now be created. View the status in the top-right corner of the experiment.

    Note

    It takes approximately 5 minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.

    To save cost, the compute resource autoscales to 0 nodes when it is idle. When you use it again after a delay, you may again experience approximately 5 minutes of wait time while it scales back up.

Score and evaluate the model

Now that you've trained the model using 70 percent of your data, you can use it to score the other 30 percent of the data to see how well your model functions.

  1. Type score model in the search box to find the Score Model module and drag the module to the experiment canvas. Connect the output of the Train Model module to the left input port of Score Model. Connect the test data output (right port) of the Split Data module to the right input port of Score Model.

  2. Type evaluate in the search box to find the Evaluate Model module and drag it to the experiment canvas. Connect the output of the Score Model module to the left input of Evaluate Model. The final experiment should look something like this:

    Screenshot showing the correct configuration of the experiment.

  3. Run the experiment using the same compute target used previously.

  4. View the output from the Score Model module by selecting the output port of Score Model and then selecting Visualize. The output shows the predicted values for price and the known values from the test data.

    Screenshot of the output visualization highlighting the Scored Label column.

  5. To view the output from the Evaluate Model module, select the output port, and then select Visualize.

    Screenshot showing the evaluation results for the final experiment.

The following statistics are shown for your model:

  • Mean Absolute Error (MAE): The average of absolute errors (an error is the difference between the predicted value and the actual value).
  • Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.
  • Relative Absolute Error: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.
  • Relative Squared Error: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
  • Coefficient of Determination: Also known as the R squared value, this is a statistical metric indicating how well a model fits the data.
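
The verbal definitions above can be written out as formulas. The notation is an assumption introduced here, not part of the source: $y_i$ is the actual price of test row $i$, $\hat{y}_i$ is the predicted price, $\bar{y}$ is the mean of the actual prices, and $n$ is the number of test rows. These are the standard textbook forms, which appear to match the descriptions in the list; the source itself gives no formulas.

```latex
\begin{align*}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \\
\mathrm{RAE}  &= \frac{\sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert}{\sum_{i=1}^{n} \lvert y_i - \bar{y} \rvert} \\
\mathrm{RSE}  &= \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2           &= 1 - \mathrm{RSE}
\end{align*}
```

The two relative errors compare the model against a baseline that always predicts the mean price $\bar{y}$, so a value below 1 means the model beats that baseline.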

For each of the error statistics, smaller is better. A smaller value indicates that the predictions more closely match the actual values. For Coefficient of Determination, the closer its value is to one (1.0), the better the predictions.
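
To get a feel for how these statistics behave, the following Python sketch computes all five from a list of actual prices and a list of predicted prices. It uses the standard textbook definitions and assumes the Coefficient of Determination is one minus the relative squared error; the function name regression_metrics and the sample values are hypothetical, and the Evaluate Model module may differ in detail.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE, and R squared for paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline spread: how far the actual values sit from their own mean.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_err) / sq_dev
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae": sum(abs_err) / abs_dev,
        "rse": rse,
        "r2": 1.0 - rse,
    }

# Hypothetical known prices and model predictions for four test rows.
actual = [10000, 12000, 14000, 16000]
predicted = [11000, 12000, 13000, 16000]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

In this made-up example, two of the four predictions are off by 1,000, so the error statistics are small but nonzero and the R squared value sits a little below 1.0.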

Manage experiments in Azure Machine Learning service workspace

The experiments you create in the visual interface can be managed from the Azure Machine Learning service workspace. Use the workspace to see more detailed information such as individual experiment runs, diagnostic logs, execution graphs, and more.

  1. Open your workspace in the Azure portal.

  2. In your workspace, select Experiments. Then select the experiment you created.

    Screenshot showing how to navigate to experiments in the Azure portal.

    On this page, you'll see an overview of the experiment and its latest runs.

    Screenshot showing an overview of experiment statistics in the Azure portal.

  3. Select a run number to see more details about a specific execution.

    Screenshot of a detailed run report.

    The run report is updated in real time. If you used an Execute Python Script module in your experiment, you can specify script logs to output in the Logs tab.

Clean up resources

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning service tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the left side of the window.

  2. In the list, select the resource group that you created.

  3. On the right side of the window, select the ellipsis button (...).

  4. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the visual interface.

Delete only the compute target

The compute target that you created here automatically scales to zero nodes when it's not being used. This is to minimize charges. If you want to delete the compute target, take these steps:

  1. In the Azure portal, open your workspace.

  2. In the Compute section of your workspace, select the resource.

  3. Select Delete.

Delete individual assets

In the visual interface where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.

Next steps

In part one of this tutorial, you completed these steps:

  • Reused the experiment created in the quickstart
  • Prepared the data
  • Trained the model
  • Scored and evaluated the model

In part two, you'll learn how to deploy your model as an Azure web service.