在查询编辑器中使用 RUsing R in Query Editor

你可以在 Power BI Desktop 查询编辑器中使用 R,R 是统计学家、数据科学家和数据分析师使用最广泛的一种编程语言。You can use R, a programming language widely used by statisticians, data scientists, and data analysts, in the Power BI Desktop Query Editor. 查询编辑器中集成的 R 可使你用 R 来执行数据清理,并在数据集中执行高级数据调整和分析,包括丢失数据补全、预测和聚类分析,此处仅举几例。This integration of R into Query Editor lets you perform data cleansing using R, and perform advanced data shaping and analytics in datasets, including completion of missing data, predictions, and clustering, just to name a few. R 是功能强大的语言,可用于在“查询编辑器”中准备你的数据模型并创建报表。R is a powerful language, and can be used in Query Editor to prepare your data model and create reports.

安装 RInstalling R

若要在 Power BI Desktop 的查询编辑器中使用 R,需要在本地计算机上安装 RTo use R in Power BI Desktop's Query Editor, you need to install R on your local machine. 可以从很多位置免费下载并安装 R,其中包括 Revolution Open download page(Revolution Open 下载页),以及 CRAN 存储库You can download and install R for free from many locations, including the Revolution Open download page, and the CRAN Repository.

在查询编辑器中使用 RUsing R in Query Editor

为了演示如何在查询编辑器中使用 R,我们将使用股票市场数据集中的示例,该示例基于 .CSV 文件,你可以从此处下载并按照示例进行操作。To show how to use R in Query Editor, we'll use an example from a stock market dataset, based on a .CSV file that you can download from here and follow along. 此示例中的步骤如下所示:The steps for this example are the following:

  1. 首先,将数据加载到 Power BI Desktop中。First, load your data into Power BI Desktop. 在此示例中,我们将加载 EuStockMarkets_NA.csv 文件。In this example we'll load the EuStockMarkets_NA.csv file. Power BI Desktop 的“主页”功能区中选择“获取数据”>“CSV”。Select Get Data > CSV from the Home ribbon in Power BI Desktop.

  2. 选择该文件,并选择“打开”,然后该 CSV 将显示在“CSV 文件”对话框中。Select the file and select Open, and the CSV is displayed in the CSV file dialog.

  3. 加载数据后,你会在 Power BI Desktop 中的“字段”窗格中看到它。Once the data is loaded, you'll see it in the Fields pane in Power BI Desktop.

  4. 通过从 Power BI Desktop 中的“主页”选项卡中选择“查询编辑器”来打开“查询编辑器”。Open Query Editor by selecting Edit Queries from the Home tab in Power BI Desktop.

  5. 在“转换”选项卡中,选择“运行 R 脚本”,然后“运行 R 脚本”编辑器随即出现(下一步中所示)。In the Transform tab, select Run R Script and the Run R Script editor appears (shown in the next step). 注意,第 15 和 20 行受数据丢失影响。下图中无法看见的其他行也是如此。Notice that rows 15 and 20 suffer from missing data, as do other rows you can't see in the following image. 以下步骤演示 R(将)如何为你补全这些行。The steps below show how R can (and will) complete those rows for you.

  6. 此示例中,我们将输入以下脚本代码:For this example, we'll enter the following script code:

    library(mice)
    tempData <- mice(dataset,m=1,maxit=50,meth='pmm',seed=100)
    completedData <- complete(tempData,1)
    output <- dataset
    output$completedValues <- completedData$"SMI missing values"
    

    备注

    需要在 R 环境中安装 mice 库才能使之前的脚本代码正常运行。You'll need to have the mice library installed in your R environment for the previous script code to work properly. 若要安装 mice,请在 R 安装中运行以下命令:| > install.packages('mice')To install mice, run the following in your R installation: | > install.packages('mice')

    当放入“运行 R 脚本”对话框时,代码如下所示:When put into the Run R Script dialog, the code looks like the following:

  7. 选择“确定”时,“查询编辑器”将显示与数据隐私相关的警告。When we select OK, Query Editor displays a warning about data privacy.

  8. 为使 R 脚本在 Power BI 服务中正常工作,所有的数据源都需要设置为“公用”。For the R scripts to work properly in the Power BI service, all data sources need to be set to public. 有关隐私设置及其含义的详细信息,请参阅隐私级别For more information about privacy settings and their implications, see Privacy Levels.

    执行这些操作后,我们会在“字段”框中看到一个名为 completedValues 的新列。Once we do so, we see a new column in the Fields well called completedValues. 注意,有一些行缺少数据元素,如第 15 和 18 行。Notice there are a few missing data elements, such as on row 15 and 18. 我们将在下一节中了解 R 如何处理该问题。We'll see how R handles that in the next section.

只需要五行 R 脚本,查询编辑器就能用预测模型填写丢失的值。With just five lines of R script, Query Editor filled in the missing values with a predictive model.

从 R 脚本数据创建视觉效果Creating visuals from R script data

现在,我们可以创建视觉对象以查看 R 脚本代码如何使用 mice 库补全缺少的值,如下图所示。Now we can create a visual to see how the R script code using the mice library completed the missing values, as shown in the following image.

完成视觉对象后,在有其它任何我们可能想使用 Power BI Desktop 创建的视觉对象时,可以保存 Power BI Desktop 文件(保存为 .pbix 文件),然后在 Power BI 服务中使用该数据模型以及作为其中一部分的 R 脚本。Once that visual is complete, and any other visuals we might want to create using Power BI Desktop, we can save the Power BI Desktop file (which saves as a .pbix file) and then use the data model, including the R scripts that are part of it, in the Power BI service.

备注

想要查看完成了这些步骤的完整 .pbix 文件吗?Want to see a completed .pbix file with these steps completed? 真幸运 - 你可以在此处下载示例中使用的完整 Power BI Desktop 文件。You're in luck - you can download the completed Power BI Desktop file used in these examples right here.

将 .pbix 文件上传到 Power BI 服务后,还需要几个步骤来启用数据刷新(在服务中),以及启用服务中待更新的视觉对象(为了更新视觉对象,数据需要访问 R)。Once you've uploaded the .pbix file to the Power BI service, a couple more steps are necessary to enable data refresh (in the service) and to enable visuals to be updated in the service (the data needs access to R for visuals to be updated). 其它步骤如下所示:The additional steps are the following:

  • 启用数据集的计划刷新 - 若要为包含 R 脚本数据集的工作簿启用计划刷新,请参阅配置计划刷新,其中也包含有关 个人网关 的信息。Enable scheduled refresh for the dataset - to enable scheduled refresh for the workbook that contains your dataset with R scripts, see Configuring scheduled refresh, which also includes information about Personal Gateway.
  • 安装个人网关 - 需要在计算机上与文件和 R 安装位置相同的位置安装个人网关;Power BI 服务必须访问该工作簿并重新呈现任何已更新的视觉对象。Install the Personal Gateway - you need a Personal Gateway installed on the machine where the file is located, and where R is installed; the Power BI service must access that workbook and re-render any updated visuals. 你可以获取有关如何安装和配置个人网关的详细信息。You can get more information on how to install and configure Personal Gateway.

限制Limitations

对包括 R 脚本,在查询编辑器中创建的查询有一些限制:There are some limitations to queries that include R scripts created in Query Editor:

  • 所有 R 数据源设置都必须设置为“公用”,并且查询编辑器中创建的查询中的所有其它步骤也必须设为“公用”。All R data source settings must be set to Public, and all other steps in a query created in Query Editor must also be public. 若要获取数据源设置,请在 Power BI Desktop 中,选择“文件”>“选项和设置”>“数据源设置”。To get to data source settings, in Power BI Desktop select File > Options and settings > Data source settings.

    从“数据源设置”对话框中,选择“数据源”,然后选择“编辑权限...”并确保“隐私级别”设置为“公用”。From the Data Source Settings dialog, select the the data source(s) and then select Edit Permissions... and ensure that the Privacy Level is set to Public.

  • 若要启用 R 视觉对象或数据集的计划更新,你需要启用“计划更新”,并且拥有安装在存储工作簿和 R 安装的计算机上的个人网关To enable scheduled refresh of your R visuals or dataset, you need to enable Scheduled refresh and have a Personal Gateway installed on the computer that houses the workbook and the R installation. 有关这两方面的详细信息,请参阅本文中的之前章节,其中提供了链接可了解每个方面的详细信息。For more information on both, see the previous section in this article, which provides links to learn more about each.

通过 R 和自定义查询你能进行各种各样的操作,因此你可以按照你想要显示的方式来探索和分析你的数据。There are all sorts of things you can do with R and custom queries, so explore and shape your data just the way you want it to appear.