Quickstart: Prepare and visualize data without writing code in Azure Machine Learning

Prepare and visualize your data in the drag-and-drop visual interface (preview) for Azure Machine Learning. The data you'll use includes entries for individual automobiles, with information such as make, model, technical specifications, and price. After you complete this quickstart, you'll be ready to use this data to predict an automobile's price.

Before you train a machine learning model, you need to understand and prepare your data. In this quickstart, you'll:

  • Create your first experiment to add and preview data
  • Prepare the data by removing missing values
  • Run the experiment
  • Visualize the resulting data

If you're brand new to machine learning, the video series Data Science for Beginners is a great introduction.

Prerequisites

If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning service today.

Create a workspace

If you have an Azure Machine Learning service workspace, skip to the next section. Otherwise, create one now.

  1. Sign in to the Azure portal by using the credentials for your Azure subscription.

    Azure portal

  2. In the upper-left corner of the portal, select Create a resource.

    Create a resource in the Azure portal

  3. In the search bar, enter Machine Learning. Select the Machine Learning service workspace search result.

    Search for a workspace

  4. In the ML service workspace pane, select Create to begin.

    Create button

  5. In the ML service workspace pane, configure your workspace.

    Field: Description
    Workspace name: Enter a unique name that identifies your workspace. In this example, we use docs-ws. Names must be unique across the resource group. Use a name that's easy to recall and to tell apart from workspaces created by others.
    Subscription: Select the Azure subscription that you want to use.
    Resource group: Use an existing resource group in your subscription, or enter a name to create a new resource group. A resource group holds related resources for an Azure solution. In this example, we use docs-aml.
    Location: Select the location closest to your users and the data resources. This location is where the workspace is created.

    Create workspace

  6. To start the creation process, select Review + Create.

    Create

  7. Review your workspace configuration. If it is correct, select Create. It can take a few moments to create the workspace.

    Create

  8. To check on the status of the deployment, select the Notifications (bell) icon on the toolbar.

  9. When the process is finished, a deployment success message appears. It's also present in the notifications section. To view the new workspace, select Go to resource.

    Workspace creation status

Open the visual interface webpage

  1. Open your workspace in the Azure portal.

  2. In your workspace, select Visual interface. Then select Launch visual interface.

    Launch visual interface

    The interface webpage opens in a new browser page.

Create your first experiment

The visual interface tool provides an interactive, visual place to build, test, and iterate on a predictive analysis model. You drag and drop datasets and analysis modules onto an interactive canvas and connect them to form an experiment. Create your first experiment now.

  1. In the bottom-left corner, select Add New.

    Add new experiment

  2. Select Blank Experiment.

  3. Your experiment is given a default name. Select this text and rename it to "Quickstart-explore data." This name doesn't need to be unique.

  4. The Mini Map at the bottom of the screen is useful for viewing large experiments. You won't need it in this quickstart, so click the arrow at the top to minimize it.

    Rename the experiment

Add data

The first thing you need for machine learning is data. This interface includes several sample datasets that you can use, or you can import data from many sources. For this example, you'll use the sample dataset Automobile price data (Raw).

  1. To the left of the experiment canvas is a palette of datasets and modules. Select Saved Datasets, then select Samples to view the available sample datasets.

  2. Select the dataset Automobile price data (Raw) and drag it onto the canvas.

    Drag data onto the canvas

Select columns

Select which columns of data to work with. To start with, configure the module to show all available columns.

Tip

If you know the name of the data or module you want, use the search bar at the top of the palette to find it quickly. The rest of the quickstart will use this shortcut.

  1. Type Select in the Search box to find the Select Columns in Dataset module.

  2. Click and drag the Select Columns in Dataset module onto the canvas. Drop the module below the dataset you added earlier.

  3. Connect the dataset to the Select Columns in Dataset module: click the output port of the dataset, drag to the input port of Select Columns in Dataset, then release the mouse button. The dataset and module remain connected even if you move either around on the canvas.

    Tip

    Datasets and modules have input and output ports represented by small circles: input ports at the top, output ports at the bottom. You create a flow of data through your experiment when you connect the output port of one module to an input port of another.

    If you have trouble connecting modules, try dragging all the way into the node you are connecting.

    Connect modules

    The red exclamation mark indicates that you haven't set the properties for the module yet. You'll do that next.

  4. Select the Select Columns in Dataset module.

  5. In the Properties pane to the right of the canvas, select Edit columns.

    In the Select columns dialog, select ALL COLUMNS and include all features. The dialog should look like this:

    Column selector

  6. On the lower right, select OK to close the column selector.
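
The flow built so far (a dataset whose output port feeds the input port of Select Columns in Dataset) can be pictured as passing a table from one function to the next. The following is a minimal Python sketch of that idea, not how the visual interface is implemented; the function names and the two sample rows are hypothetical.

```python
# Hypothetical stand-in for the sample dataset: a list of rows (dicts).
def automobile_dataset():
    return [
        {"make": "audi", "num-of-doors": "four", "price": 13950},
        {"make": "bmw", "num-of-doors": "two", "price": 16430},
    ]


def select_columns(rows, include=None):
    """Pass through the named columns; include=None plays the role of ALL COLUMNS."""
    if include is None:
        return [dict(row) for row in rows]
    return [{name: row[name] for name in include} for row in rows]


# Connecting an output port to an input port is like feeding one
# function's return value into the next function.
selected = select_columns(automobile_dataset())
print(list(selected[0].keys()))
```

With no column rule set, every column passes through unchanged, which mirrors the ALL COLUMNS setting chosen in the dialog.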

Run the experiment

At any time, click the output port of a dataset or module to see what the data looks like at that point in the data flow. If the Visualize option is disabled, you first need to run the experiment. You'll do that next.

An experiment runs on a compute target, a compute resource that is attached to your workspace. Once you create a compute target, you can reuse it for future runs.

  1. Select Run at the bottom to run the experiment.

    Run experiment

  2. When the Setup Compute Targets dialog appears, if your workspace already has a compute resource, you can select it now. Otherwise, select Create new.

    Note

    The visual interface can only run experiments on Machine Learning Compute targets. Other compute targets will not be shown.

  3. Provide a name for the compute resource.

  4. Select Run.

    Set up compute target

    The compute resource will now be created. View the status in the top-right corner of the experiment.

    Note

    It takes approximately 5 minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.

    The compute resource scales down to 0 nodes when it is idle to save cost. When you use it again after a delay, you may again wait approximately 5 minutes while it scales back up.

After the compute target is available, the experiment runs. When the run is complete, a green checkmark appears on each module.

View status

Preview the data

Now that you have run your initial experiment, you can visualize the data to understand more about the information you have to work with.

  1. Select the output port at the bottom of the Select Columns in Dataset module, then select Visualize.

  2. Click different columns in the data window to view information about each column.

    In this dataset, each row represents an automobile, and the variables associated with each automobile appear as columns. There are 205 rows and 26 columns in this dataset.

    Each time you click a column of data, the Statistics information and Visualization image of that column appear on the left. For example, when you click num-of-doors, you see it has 2 unique values and 2 missing values. Scroll down to see the values: two and four doors.

    Preview the data

  3. Click each column to understand more about your dataset, and think about whether these columns will be useful to predict the price of an automobile.
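
The per-column statistics described above (unique values and missing values) can be approximated in a few lines of code. This is a minimal sketch using only the Python standard library; the `column_stats` helper and the four sample rows are hypothetical, and `None` is assumed to mark a missing value.

```python
def column_stats(rows, column):
    """Count unique non-missing values and missing values in one column."""
    values = [row.get(column) for row in rows]
    present = [value for value in values if value is not None]
    return {
        "unique": len(set(present)),
        "missing": len(values) - len(present),
    }


# Hypothetical sample rows; None marks a missing value (an assumption
# of this sketch, not how the interface stores data).
rows = [
    {"make": "audi", "num-of-doors": "four"},
    {"make": "bmw", "num-of-doors": "two"},
    {"make": "dodge", "num-of-doors": None},
    {"make": "mazda", "num-of-doors": "four"},
]

stats = column_stats(rows, "num-of-doors")
print(stats)
```

On the full dataset, the same counts for num-of-doors would correspond to the 2 unique values and 2 missing values shown in the Statistics pane.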

Prepare data

A dataset usually requires some preprocessing before it can be analyzed. You might have noticed missing values in the columns of various rows. These missing values need to be cleaned so that the model can analyze the data correctly. You'll remove any rows that have missing values. Also, the normalized-losses column has a large proportion of missing values, so you'll exclude that column from the model altogether.

Tip

Cleaning the missing values from input data is a prerequisite for using most of the modules.

Remove column

First, remove the normalized-losses column completely.

  1. Select the Select Columns in Dataset module.

  2. In the Properties pane to the right of the canvas, select Edit columns.

    • Leave With rules and ALL COLUMNS selected.

    • From the drop-downs, select Exclude and column names, and then click inside the text box. Type normalized-losses.

    • On the lower right, select OK to close the column selector.

    Exclude a column

    Now the properties pane for Select Columns in Dataset indicates that it will pass through all columns from the dataset except normalized-losses.

    The properties pane shows that the normalized-losses column is excluded.

    Properties pane

    You can add a comment to a module by double-clicking the module and entering text. This can help you see at a glance what the module is doing in your experiment.

  3. Double-click the Select Columns in Dataset module and type the comment "Exclude normalized losses."

    After you type the comment, click outside the module. A down-arrow appears to show that the module contains a comment.

  4. Click the down-arrow to display the comment.

    The module now shows an up-arrow to hide the comment.

    Comments
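
The Exclude rule configured above amounts to dropping one named column from every row. The following is a rough standard-library Python sketch of that operation, not the module's actual implementation; the `exclude_columns` helper and the sample rows are hypothetical.

```python
def exclude_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    excluded = set(excluded)
    return [
        {name: value for name, value in row.items() if name not in excluded}
        for row in rows
    ]


# Hypothetical sample rows; None stands for a missing value in this sketch.
rows = [
    {"normalized-losses": None, "make": "alfa-romero", "price": 13495},
    {"normalized-losses": 164, "make": "audi", "price": 13950},
]

kept = exclude_columns(rows, ["normalized-losses"])
print(sorted(kept[0]))
```

Every other column passes through untouched, so the number of rows stays the same and only the column count drops by one.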

Clean missing data

When you train a model, you have to do something about the data that is missing. In this case, you'll add a module to remove any remaining row that has missing data.

  1. Type Clean in the Search box to find the Clean Missing Data module.

  2. Drag the Clean Missing Data module to the experiment canvas and connect it to the Select Columns in Dataset module.

  3. In the Properties pane, select Remove entire row under Cleaning mode.

    This option directs Clean Missing Data to clean the data by removing rows that have any missing values.

  4. Double-click the module and type the comment "Remove missing value rows."

    Remove rows

    Your experiment should now look something like this:

    select-column
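
The Remove entire row cleaning mode keeps only the rows in which every column has a value. A minimal Python sketch of that rule follows, assuming `None` marks a missing value; the `remove_rows_with_missing` helper and the sample rows are hypothetical and do not reflect the module's internals.

```python
def remove_rows_with_missing(rows):
    """Keep only the rows in which no column value is missing (None)."""
    return [
        row for row in rows
        if all(value is not None for value in row.values())
    ]


# Hypothetical sample rows with missing values in different columns.
rows = [
    {"make": "audi", "num-of-doors": "four", "price": 13950},
    {"make": "dodge", "num-of-doors": None, "price": 8558},
    {"make": "audi", "num-of-doors": "two", "price": None},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "->", len(cleaned))
```

A single missing value anywhere in a row is enough to drop it, which is why the row count of the cleaned dataset is lower than that of the original.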

Visualize the results

Because you made changes to the modules in your experiment, the status has changed to "In draft." To visualize the new clean data, you first have to run the experiment again.

  1. Select Run at the bottom to run the experiment.

    This time you can reuse the compute target you created earlier.

  2. Select Run in the dialog.

    Run experiment

  3. When the run completes, right-click the Clean Missing Data module to visualize the new clean data.

    Visualize clean data

  4. Click different columns in the cleaned data window to see how the data has changed.

    Visualize clean data

    There are now 193 rows and 25 columns.

    When you click num-of-doors, you see it still has 2 unique values but now has 0 missing values. Click through the rest of the columns to confirm that no missing values are left in the dataset.

Clean up resources

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning service tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges:

  1. In the Azure portal, select Resource groups on the left side of the window.

    Delete resource group in the Azure portal

  2. In the list, select the resource group that you created.

  3. On the right side of the window, select the ellipsis button (...).

  4. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the visual interface.

Delete only the compute target

The compute target that you created here automatically scales down to zero nodes when it's not being used, which minimizes charges. If you want to delete the compute target, take these steps:

  1. In the Azure portal, open your workspace.

    Delete compute target

  2. In the Compute section of your workspace, select the resource.

  3. Select Delete.

Delete individual assets

In the visual interface where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.

Delete experiment

Next steps

In this quickstart, you learned how to:

  • Create your first experiment to add and preview data
  • Prepare the data by removing missing values
  • Visualize the prepared data

Continue to the tutorial to use this data to predict the price of an automobile.