Tutorial: Build a Machine Learning model in Power BI

In this tutorial, you use Automated Machine Learning to create and apply a binary prediction model in Power BI. The tutorial includes guidance for creating a Power BI dataflow and using the entities defined in the dataflow to train and validate a machine learning model directly in Power BI. You then use that model to score new data and generate predictions.

First, you'll create a Binary Prediction machine learning model to predict the purchase intent of online shoppers based on a set of their online session attributes. A benchmark machine learning dataset is used for this exercise. Once the model is trained, Power BI automatically generates a validation report explaining the model results. You can then review the validation report and apply the model to your data for scoring.

This tutorial consists of the following steps:

  • Create a dataflow with the input data
  • Create and train a machine learning model
  • Review the model validation report
  • Apply the model to a dataflow entity
  • Use the scored output from the model in a Power BI report

Create a dataflow with the input data

The first part of this tutorial is to create a dataflow with input data. That process takes a few steps, as shown in the following sections, beginning with getting data.

Get data

The first step in creating a dataflow is to have your data sources ready. In this case, we use a machine learning dataset from a set of online sessions, some of which culminated in a purchase. The dataset contains a set of attributes about these sessions, which we'll use to train the model.

You can download the dataset from the UC Irvine website. For the purpose of this tutorial, the dataset is also available from the following link: online_shoppers_intention.csv.

Create the entities

To create the entities in your dataflow, sign in to the Power BI service and navigate to a workspace on your dedicated capacity that has AI enabled.

If you don't already have a workspace, you can create one by selecting Workspaces in the nav pane menu in the Power BI service, and then selecting Create workspace at the bottom of the panel that appears. This opens a panel on the right where you enter the workspace details. Enter a workspace name and select Advanced. Use the radio button to confirm that the workspace uses Dedicated Capacity, and make sure it's assigned to a dedicated capacity instance that has the AI preview turned on. Then select Save.

Create a workspace

Once the workspace is created, you can select Skip in the bottom right of the Welcome screen, as shown in the following image.

Skip if you already have a workspace

Select the Create button at the top right of the workspace, and then select Dataflow.

Create a dataflow

Select Add new entities. This launches a Power Query editor in the browser.

Add new entities

Select Text/CSV File as the data source, as shown in the following image.

Text/CSV File selected

In the Connect to a data source page that appears next, paste the following link to online_shoppers_intention.csv into the File path or URL box, and then select Next.

https://raw.githubusercontent.com/santoshc1/PowerBI-AI-samples/master/Tutorial_AutomatedML/online_shoppers_intention.csv

File path

The Power Query Editor shows a preview of the data from the CSV file. You can rename the query to a friendlier name by changing the value in the Name box in the right pane. For example, you could change the query name to Online Visitors.

Change to a friendly name

Power Query automatically infers the type of each column. You can change a column's type by clicking the attribute type icon at the top of the column header. In this example, we change the type of the Revenue column to True/False.

Change the data type
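The effect of changing Revenue to True/False can be pictured outside Power Query with a short Python sketch. This is only an illustration, not how Power Query converts types: the sample rows are made up, the PageValues and Weekend columns are placeholders, and the helper names `parse_bool` and `load_sessions` are hypothetical. Only the Revenue column name comes from the tutorial.

```python
import csv
import io

# Made-up rows for illustration only; they are not taken from the real dataset.
SAMPLE_CSV = """PageValues,Weekend,Revenue
0.0,FALSE,FALSE
12.5,TRUE,TRUE
3.2,FALSE,FALSE
"""


def parse_bool(text):
    """Convert a textual TRUE/FALSE value into a Python bool."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a True/False value: {text!r}")


def load_sessions(csv_text):
    """Read CSV text and store the Revenue column as a bool in each row."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        record["Revenue"] = parse_bool(record["Revenue"])
        rows.append(record)
    return rows


sessions = load_sessions(SAMPLE_CSV)
print([row["Revenue"] for row in sessions])
```

Treating the label as a true Boolean rather than text is what lets a later step interpret it as a two-valued outcome.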

Select the Save & close button to close Power Query Editor. Provide a name for the dataflow, and then select Save in the dialog, as shown in the following image.

Save the dataflow

Create and train a machine learning model

To add a machine learning model, select the Apply ML model button in the Actions list for the base entity that contains your training data and label information, and then select Add a machine learning model.

Add a machine learning model

The first step in creating the machine learning model is to identify the historical data, including the outcome field that you want to predict. The model is created by learning from this data.

For the dataset we're using, this is the Revenue field. Select Revenue as the 'Outcome field' value and then select Next.

Select historical data

Next, we must select the type of machine learning model to create. Power BI analyzes the values in the outcome field that you've identified and suggests the types of machine learning models that can be created to predict that field.

In this case, because we're predicting a binary outcome of whether a user will make a purchase or not, Binary Prediction is recommended. Because we're interested in predicting users who will make a purchase, select True as the Revenue outcome that you're most interested in. Additionally, you can provide friendly labels for the outcomes, which are used in the automatically generated report that summarizes the results of the model validation. Then select Next.

Binary Prediction selected

Next, Power BI does a preliminary scan of a sample of your data and suggests the inputs that may produce more accurate predictions. If Power BI doesn't recommend a field, an explanation is provided next to it. You can change the selections to include only the fields you want the model to study, or you can select all the fields by selecting the checkbox next to the entity name. Select Next to accept the inputs.

Select Next

In the final step, we must provide a name for the model. Name the model Purchase Intent Prediction. You can choose to reduce the training time to see quick results, or increase the amount of time spent in training to get the best model. Then select Save and train to start training the model.

Save the model

The training process begins by sampling and normalizing your historical data and splitting your dataset into two new entities: Purchase Intent Prediction Training Data and Purchase Intent Prediction Testing Data.
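The idea of splitting historical data into training and testing sets can be sketched in a few lines of Python. This is a generic illustration, not Power BI's actual procedure: the tutorial does not state the split ratio or sampling method, so the 80/20 ratio, the fixed seed, and the function name `split_train_test` are all assumptions.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (training, testing) lists.

    The 80/20 default is an assumption for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    test_size = int(round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(10))  # stand-in for ten session records
training, testing = split_train_test(rows)
print(len(training), len(testing))
```

Holding the testing rows out of training is what allows a validation report to estimate how the model behaves on data it has not seen.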

Depending on the size of the dataset, the training process can take anywhere from a few minutes to the training time selected on the previous screen. At this point, you can see the model in the Machine learning models tab of the dataflow. The Ready status indicates that the model has been queued for training or is being trained.

You can confirm that the model is being trained and validated through the status of the dataflow. This appears as a data refresh in progress in the Dataflows tab of the workspace.

Ready for training

Once the model training is completed, the dataflow displays an updated refresh time. You can confirm that the model is trained by navigating to the Machine learning models tab in the dataflow. The model you created should show its status as Trained, and the Last Trained time should now be updated.

Last trained time

Review the model validation report

To review the model validation report, in the Machine learning models tab, select the View training report button in the Actions column for the model. This report describes how your machine learning model is likely to perform.

In the Model Performance page of the report, select See top predictors to view the top predictors for your model. You can select one of the predictors to see how the outcome distribution is associated with that predictor.

Model performance

You can use the Probability Threshold slicer on the Model Performance page to examine its influence on the Precision and Recall for the model.

Probability threshold
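The trade-off that the Probability Threshold slicer exposes can be reproduced with a minimal Python sketch. The scores and labels below are made up, the function name `precision_recall` is hypothetical, and treating a score at or above the threshold as a positive prediction is an assumption rather than a documented Power BI rule.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    true_pos = false_pos = false_neg = 0
    for score, actual in zip(scores, labels):
        predicted = score >= threshold
        if predicted and actual:
            true_pos += 1
        elif predicted and not actual:
            false_pos += 1
        elif actual:
            false_neg += 1
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up probability scores and true outcomes for six sessions.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

for threshold in (0.25, 0.5, 0.75):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 lifts precision from 0.60 to 1.00 while recall falls from 1.00 to about 0.67, which is the kind of movement to look for when adjusting the slicer.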

The other pages of the report describe the statistical performance metrics for the model.

The report also includes a Training Details page that describes the different iterations that were run, how features were extracted from the inputs, and the hyperparameters of the final model.

Apply the model to a dataflow entity

Select the Apply model button at the top of the report to invoke this model. In the Apply dialog, you can specify the target entity that has the source data to which the model should be applied.

Apply the model

When prompted, you must refresh the dataflow to preview the results of your model.

Applying the model creates two new entities, with the suffixes enriched <model_name> and enriched <model_name> explanations. In this case, applying the model to the Online Visitors entity creates Online Visitors enriched Purchase Intent Prediction, which includes the predicted output from the model, and Online Visitors enriched Purchase Intent Prediction explanations, which contains the top record-specific influencers for the prediction.

Applying a Binary Prediction model adds four columns: the predicted outcome, the probability score, the top record-specific influencers for the prediction, and the explanation index. Each is prefixed with the column name specified.

Result columns

Once the dataflow refresh is completed, you can select the Online Visitors enriched Purchase Intent Prediction entity to view the results.

View the results

Use the scored output from the model in a Power BI report

To use the scored output from your machine learning model, you can connect to your dataflow from Power BI Desktop using the Dataflows connector. The Online Visitors enriched Purchase Intent Prediction entity can now be used to incorporate the predictions from your model in Power BI reports.
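As a rough picture of what consuming the scored output might look like once it is exported from a report, the Python sketch below ranks sessions by predicted probability. Everything in it is illustrative: the rows are invented, and the column names SessionId, Outcome, and Score, the 0 to 1 score scale, and the function `likely_purchasers` are assumptions, not the names Power BI actually produces.

```python
# Hypothetical scored rows; column names and the 0-1 score scale are assumptions.
scored_rows = [
    {"SessionId": 1, "Outcome": True, "Score": 0.91},
    {"SessionId": 2, "Outcome": False, "Score": 0.12},
    {"SessionId": 3, "Outcome": True, "Score": 0.67},
    {"SessionId": 4, "Outcome": False, "Score": 0.45},
]


def likely_purchasers(rows, min_score=0.5):
    """Keep rows scoring at least min_score, highest score first."""
    selected = [row for row in rows if row["Score"] >= min_score]
    return sorted(selected, key=lambda row: row["Score"], reverse=True)


top = likely_purchasers(scored_rows)
print([row["SessionId"] for row in top])
```

In a real report, the same filtering and ordering would more naturally be done with visual-level filters or sorting on the enriched entity's columns.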

Next steps

In this tutorial, you created and applied a binary prediction model in Power BI using these steps:

  • Create a dataflow with the input data
  • Create and train a machine learning model
  • Review the model validation report
  • Apply the model to a dataflow entity
  • Use the scored output from the model in a Power BI report

For more information about Machine Learning automation in Power BI, see Automated Machine Learning in Power BI.