Tutorial: Predict automobile price with the designer

In this two-part tutorial, you learn how to use the Azure Machine Learning designer to train and deploy a machine learning model that predicts the price of any car. The designer is a drag-and-drop tool that lets you create machine learning models without writing a single line of code.

In part one of the tutorial, you learn how to:

  • Create a new pipeline.
  • Import data.
  • Prepare data.
  • Train a machine learning model.
  • Evaluate a machine learning model.

In part two of the tutorial, you deploy your model as a real-time inferencing endpoint to predict the price of any car based on the technical specifications you send it.

Note

A completed version of this tutorial is available as a sample pipeline.

To find it, go to the designer in your workspace. In the New pipeline section, select Sample 1 - Regression: Automobile Price Prediction (Basic).

Important

If you don't see graphical elements mentioned in this document, such as buttons in studio or designer, you might not have the right level of permissions to the workspace. Contact your Azure subscription administrator to verify that you have been granted the correct level of access. For more information, see Manage users and roles.

Create a new pipeline

Azure Machine Learning pipelines organize multiple machine learning and data processing steps into a single resource. Pipelines let you organize, manage, and reuse complex machine learning workflows across projects and users.

To create an Azure Machine Learning pipeline, you need an Azure Machine Learning workspace. In this section, you learn how to create both of these resources.

Create a new workspace

To use the designer, you first need an Azure Machine Learning workspace. The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create in Azure Machine Learning.

Create the pipeline

  1. Sign in to ml.azure.com, and select the workspace you want to work with.

  2. Select Designer.

    Screenshot of the visual workspace showing how to access the designer

  3. Select Easy-to-use prebuilt modules.

  4. At the top of the canvas, select the default pipeline name Pipeline-Created-on. Rename it to Automobile price prediction. The name doesn't need to be unique.

Set the default compute target

A pipeline runs on a compute target, which is a compute resource that's attached to your workspace. After you create a compute target, you can reuse it for future runs.

You can set a Default compute target for the entire pipeline, which tells every module to use the same compute target by default. However, you can also specify compute targets on a per-module basis.

  1. Next to the pipeline name, select the Gear icon at the top of the canvas to open the Settings pane.

  2. In the Settings pane to the right of the canvas, select Select compute target.

    If you already have an available compute target, you can select it to run this pipeline.

    Note

    The designer can run training experiments only on Azure Machine Learning Compute, so other compute targets aren't shown.

  3. Enter a name for the compute resource.

  4. Select Save.

    Note

    It takes approximately five minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.

    The compute resource autoscales to zero nodes when it's idle to save cost. When you use it again after a delay, you might experience approximately five minutes of wait time while it scales back up.

Import data

The designer includes several sample datasets for you to experiment with. For this tutorial, use Automobile price data (Raw).

  1. To the left of the pipeline canvas is a palette of datasets and modules. Select Sample datasets to view the available sample datasets.

  2. Select the dataset Automobile price data (Raw), and drag it onto the canvas.

    Drag data to the canvas

Visualize the data

You can visualize the data to understand the dataset that you'll use.

  1. Right-click Automobile price data (Raw) and select Visualize.

  2. Select the different columns in the data window to view information about each one.

    Each row represents an automobile, and the variables associated with each automobile appear as columns. There are 205 rows and 26 columns in this dataset.

Prepare data

Datasets typically require some preprocessing before analysis. You might have noticed some missing values when you inspected the dataset. These missing values must be cleaned so that the model can analyze the data correctly.

Remove a column

When you train a model, you have to do something about the data that's missing. In this dataset, the normalized-losses column is missing many values, so you exclude that column from the model altogether.

  1. In the module palette to the left of the canvas, expand the Data Transformation section and find the Select Columns in Dataset module.

  2. Drag the Select Columns in Dataset module onto the canvas. Drop the module below the dataset module.

  3. Connect the Automobile price data (Raw) dataset to the Select Columns in Dataset module. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, to the input port of Select Columns in Dataset, which is the small circle at the top of the module.

    Tip

    You create a flow of data through your pipeline when you connect the output port of one module to an input port of another.

    Connect modules

  4. Select the Select Columns in Dataset module.

  5. In the module details pane to the right of the canvas, select Edit column.

  6. Expand the Column names drop-down list next to Include, and select All columns.

  7. Select the + to add a new rule.

  8. From the drop-down menus, select Exclude and Column names.

  9. Enter normalized-losses in the text box.

  10. In the lower right, select Save to close the column selector.

    Exclude a column

  11. Select the Select Columns in Dataset module.

  12. In the module details pane to the right of the canvas, select the Comment text box and enter Exclude normalized losses.

    Comments appear on the graph to help you organize your pipeline.
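
The designer performs this step without code. For readers who want to see the underlying idea, the following is a minimal Python sketch of an "include all columns, then exclude one by name" rule. The sample rows and the helper name exclude_columns are hypothetical and are not part of the designer or of the real dataset.

```python
# Hypothetical rows that mimic a few columns of the automobile dataset.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
]


def exclude_columns(records, excluded):
    """Return copies of the records without the excluded column names."""
    return [
        {name: value for name, value in record.items() if name not in excluded}
        for record in records
    ]


# Keep every column except normalized-losses; the input rows are not modified.
selected = exclude_columns(rows, {"normalized-losses"})
print(sorted(selected[0].keys()))
```

The sketch returns new records rather than editing the input, which mirrors how each module on the canvas passes a new dataset to the next module.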

Clean missing data

Your dataset still has missing values after you remove the normalized-losses column. You can remove the remaining missing data by using the Clean Missing Data module.

Tip

Cleaning the missing values from input data is a prerequisite for using most of the modules in the designer.

  1. In the module palette to the left of the canvas, expand the Data Transformation section, and find the Clean Missing Data module.

  2. Drag the Clean Missing Data module to the pipeline canvas. Connect it to the Select Columns in Dataset module.

  3. Select the Clean Missing Data module.

  4. In the module details pane to the right of the canvas, select Edit column.

  5. In the Columns to be cleaned window that appears, expand the drop-down menu next to Include. Select All columns.

  6. Select Save.

  7. In the module details pane to the right of the canvas, select Remove entire row under Cleaning mode.

  8. In the module details pane to the right of the canvas, select the Comment box, and enter Remove missing value rows.

    Your pipeline should now look something like this:

    Select-column
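
As a rough illustration of what the Remove entire row cleaning mode does, the following Python sketch drops every record that has at least one missing value. It assumes missing values are represented as None; the sample rows and the function name remove_rows_with_missing are hypothetical.

```python
def remove_rows_with_missing(records):
    """Keep only the records in which every value is present (not None)."""
    return [
        record
        for record in records
        if all(value is not None for value in record.values())
    ]


# Hypothetical rows: the second and third each have one missing value.
rows = [
    {"make": "audi", "horsepower": 102, "price": 13950},
    {"make": "bmw", "horsepower": None, "price": 16430},
    {"make": "mazda", "horsepower": 68, "price": None},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "rows before cleaning,", len(cleaned), "after")
```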

Train a machine learning model

Now that you have the modules in place to process the data, you can set up the training modules.

Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you use a linear regression model.

Split the data

Splitting data is a common task in machine learning. You split your data into two separate datasets. One dataset trains the model, and the other tests how well the model performed.

  1. In the module palette, expand the Data Transformation section and find the Split Data module.

  2. Drag the Split Data module to the pipeline canvas.

  3. Connect the left port of the Clean Missing Data module to the Split Data module.

    Important

    Be sure that the left output port of Clean Missing Data connects to Split Data. The left port contains the cleaned data. The right port contains the discarded data.

  4. Select the Split Data module.

  5. In the module details pane to the right of the canvas, set the Fraction of rows in the first output dataset to 0.7.

    This option uses 70 percent of the data to train the model and 30 percent to test it. The 70 percent dataset is accessible through the left output port. The remaining data is available through the right output port.

  6. In the module details pane to the right of the canvas, select the Comment box, and enter Split the dataset into training set (0.7) and test set (0.3).
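
The 0.7 fraction can be pictured with a short Python sketch. It assumes the rows are shuffled before they are divided, and it uses a fixed seed so the result is repeatable; the function name split_rows and the placeholder records are hypothetical.

```python
import random


def split_rows(records, fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into two parts.

    The first part holds roughly `fraction` of the rows (training set),
    and the second part holds the remainder (test set).
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder records stand in for the cleaned dataset rows.
records = list(range(10))
train, test = split_rows(records, fraction=0.7)
print(len(train), "training rows and", len(test), "test rows")
```

Every record lands in exactly one of the two outputs, which is why the test set can serve as data the model has not seen during training.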

Train the model

Train the model by giving it a dataset that includes the price. The algorithm constructs a model that explains the relationship between the features and the price as presented by the training data.

  1. In the module palette, expand Machine Learning Algorithms.

    This option displays several categories of modules that you can use to initialize learning algorithms.

  2. Select Regression > Linear Regression, and drag it to the pipeline canvas.

  3. In the module palette, expand the Model Training section, and drag the Train Model module to the canvas.

  4. Connect the output of the Linear Regression module to the left input of the Train Model module.

  5. Connect the training data output (left port) of the Split Data module to the right input of the Train Model module.

    Important

    Be sure that the left output port of Split Data connects to Train Model. The left port contains the training set. The right port contains the test set.

    Screenshot showing the correct configuration of the Train Model module. The Linear Regression module connects to the left port of the Train Model module, and the Split Data module connects to the right port of Train Model.

  6. Select the Train Model module.

  7. In the module details pane to the right of the canvas, select Edit column selector.

  8. In the Label column dialog box, expand the drop-down menu and select Column names.

  9. In the text box, enter price to specify the value that your model is going to predict.

    Important

    Make sure you enter the column name exactly. Do not capitalize price.

    Your pipeline should look like this:

    Screenshot showing the correct configuration of the pipeline after adding the Train Model module.
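
To make "a model that explains the relationship between the features and the price" more concrete, here is a minimal sketch of ordinary least squares with a single feature. It is only an illustration: the Linear Regression module works with many features and its own solver, and the data values and the names fit_line and predict below are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: price is the label column, horsepower the feature.
horsepower = [60, 80, 100, 120]
price = [7000, 9000, 11000, 13000]

model = fit_line(horsepower, price)
print(predict(model, 110))
```

The label column (price here) plays the same role as the column name you enter in the Label column dialog box: it is the value the fitted model learns to predict.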

Add the Score Model module

After you train your model by using 70 percent of the data, you can use it to score the other 30 percent to see how well your model functions.

  1. Enter score model in the search box to find the Score Model module. Drag the module to the pipeline canvas.

  2. Connect the output of the Train Model module to the left input port of Score Model. Connect the test data output (right port) of the Split Data module to the right input port of Score Model.

Add the Evaluate Model module

Use the Evaluate Model module to evaluate how well your model scored the test dataset.

  1. Enter evaluate in the search box to find the Evaluate Model module. Drag the module to the pipeline canvas.

  2. Connect the output of the Score Model module to the left input of Evaluate Model.

    The final pipeline should look something like this:

    Screenshot showing the correct configuration of the pipeline.

Submit the pipeline

Now that your pipeline is all set up, you can submit a pipeline run to train your machine learning model. You can submit a valid pipeline run at any point, which lets you review changes to your pipeline during development.

  1. At the top of the canvas, select Submit.

  2. In the Set up pipeline run dialog box, select Create new.

    Note

    Experiments group similar pipeline runs together. If you run a pipeline multiple times, you can select the same experiment for successive runs.

    1. Enter a descriptive name for New experiment Name.

    2. Select Submit.

    You can view run status and details at the top right of the canvas.

    If this is the first run, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline runs take less time because the compute resources are already allocated. Additionally, the designer uses cached results for each module to further improve efficiency.

View scored labels

After the run completes, you can view the results of the pipeline run. First, look at the predictions generated by the regression model.

  1. Right-click the Score Model module, and select Visualize to view its output.

    Here you can see the predicted prices and the actual prices from the testing data.

    Screenshot of the output visualization with the Scored Labels column highlighted

Evaluate models

Use the Evaluate Model module to see how well the trained model performed on the test dataset.

  1. Right-click the Evaluate Model module and select Visualize to view its output.

The following statistics are shown for your model:

  • Mean Absolute Error (MAE): The average of absolute errors. An error is the difference between the predicted value and the actual value.
  • Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.
  • Relative Absolute Error: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.
  • Relative Squared Error: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
  • Coefficient of Determination: Also known as the R squared value, this statistical metric indicates how well a model fits the data.

For each of the error statistics, smaller is better. A smaller value indicates that the predictions are closer to the actual values. For the coefficient of determination, the closer its value is to one (1.0), the better the predictions.
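
The definitions above can be written out as a short Python sketch. It follows the standard textbook formulas, with the coefficient of determination computed as one minus the relative squared error; the designer's own implementation may differ in detail. The function name regression_metrics and the sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute the five regression statistics from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n

    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]

    # Baseline: how far the actual values are from their own mean.
    abs_baseline = sum(abs(a - mean_actual) for a in actual)
    sq_baseline = sum((a - mean_actual) ** 2 for a in actual)

    rse = sum(sq_errors) / sq_baseline
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_baseline,
        "rse": rse,
        "r2": 1.0 - rse,
    }


# Made-up actual and predicted prices for three cars.
metrics = regression_metrics([10, 20, 30], [12, 18, 33])
for name, value in metrics.items():
    print(name, round(value, 4))
```

The two relative errors compare the model against a baseline that always predicts the mean of the actual values, so a value below 1 means the model does better than that baseline.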

Clean up resources

Skip this section if you want to continue with part 2 of the tutorial, deploying models.

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges.

  1. In the Azure portal, select Resource groups on the left side of the window.

    Delete resource group in the Azure portal

  2. In the list, select the resource group that you created.

  3. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the designer.

Delete individual assets

In the designer where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.

The compute target that you created here automatically scales to zero nodes when it's not being used. This action is taken to minimize charges. If you want to delete the compute target, take these steps:

Delete assets

You can unregister datasets from your workspace by selecting each dataset and selecting Unregister.

Unregister dataset

To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually delete those assets.

Next steps

In part two, you'll learn how to deploy your model as a real-time endpoint.