使用 Power BI Desktop 分析网页数据(教程)Analyzing Web page data using Power BI Desktop (tutorial)

在本教程中,您将学习如何从网页导入数据并创建报表以可视化数据。In this tutorial, you learn how to import a table of data from a Web page and create a report to visualize this data. 在此过程中,您将在网页中各个表之间导航,并应用数据转换步骤以调整表。As part of this process, you navigate across tables available on a web page, and apply data transformation steps to bring the table into a new shape.

本文内容:In this article:

  • 任务 1︰连接到 Web 数据源Task 1: Connect to a web data source
  • 任务 2︰在“查询”视图中调整数据Task 2: Shape data in the Query view
    • 步骤 1:删除其他列,只显示所需的列Step 1: Remove Other Columns to only display columns of interest
    • 步骤 2︰替换值以清理所选列中的值Step 2: Replace Values to clean up values in a selected column
    • 步骤 3 ︰筛选列中的值Step 3: Filter values in a column
    • 步骤 4 ︰重命名列Step 4: Rename a column
    • 步骤 5︰筛选列中的 null 值Step 5: Filter null values in a column
    • 步骤 6 ︰重命名列Step 6: Rename a query
    • 已创建查询步骤Query Steps created
  • 任务 3︰使用“报表”视图创建可视化效果Task 3: Create visualizations using the Report view
    • 步骤 1 ︰将查询加载到您的报表Step 1: Load the query to your report
    • 步骤 2 ︰创建地图可视化效果Step 2: Create a Map visualization

任务 1 ︰连接到 Web 数据源Task 1: Connect to a web data source

在任务 1 中,您从 UEFA European Football Championship Wikipedia 页面上的以下位置导入 Tournament Summary 表︰http://en.wikipedia.org/wiki/UEFA_European_Football_ChampionshipIn task 1, you import a Tournament Summary table from the UEFA European Football Championship Wikipedia page at the following location: http://en.wikipedia.org/wiki/UEFA_European_Football_Championship

添加 Wikipedia 页面数据源Add a Wikipedia page data source

  1. 入门对话框中或主页功能区选项卡中,选择获取数据In the Getting Started dialog or in the Home ribbon tab, select Get Data.
  2. 将显示获取数据对话框,您可以在其中选择各种数据源以便向 Power BI Desktop 导入数据。This brings up the Get Data dialog, where you can pick from a wide range of data sources to import data into Power BI Desktop. 我们将选择位于所有其他组下的 WebWe will select Web which is available under the All or Other group.
  3. Web 内容对话框中的 URL 文字框中,粘贴 Wikipedia URL (http://en.wikipedia.org/wiki/UEFA_European_Football_Championship)。In the Web Content dialog box, in the URL text box, paste the Wikipedia URL (http://en.wikipedia.org/wiki/UEFA_European_Football_Championship).
  4. 单击确定Click OK.

建立与网页的连接后,将在导航器对话框中显示此 Wikipedia 页面上的可用表的列表。After establishing a connection to the web page, you see a list of tables available on this Wikipedia page in the Navigator dialog. 可以单击其中每个表,以预览数据。You can single-click on each of these tables to preview the data.

导航器左窗格中,选择结果 [编辑] 表了解 Tournament Summary 结果,或选择结果 [编辑] 表,然后选择编辑In the Navigator left-pane, select the Results[edit] table for the Tournament Summary results, or select the Results[edit] table and select Edit. 这将使我们能够在将此表加载到报表前调整此表,因为数据无法满足我们要执行的分析的要求。This will allow us to reshape this table before loading it to the Report, since the data is not in the shape that we need for our analysis.

将在“查询”视图中显示表的预览,并且我们可以在此视图中应用一组转换步骤来清理数据。This will land a preview of the table in the Query view, where we can apply a set of transformation steps to clean up the data.

任务 2︰在主题表中调整数据Task 2: Shape data in the subject table

现在,已为数据查询中选择主题表,您将学习如何执行各种数据调整和清理步骤。Now that you have the subject table selected for your data query, you learn how to perform various data shaping and cleansing steps.

步骤 1:删除其他列,只显示所需的列Step 1: Remove Other Columns to only display columns of interest

在此步骤中,除 YearFinal Winners 外的所有列都将删除。In this step, you remove all columns except Year and Final Winners.

  1. 查询预览网格中,选择 YearFinal Winners 列(使用 CTRL + 单击操作执行)。In the Query Preview grid, select the Year and Final Winners columns (use CTRL + Click).
  2. 右键单击查询预览网格中的列标题,然后单击删除其他列以删除未选定的列。Right-click a column header in the Query Preview grid, and click Remove Other Columns to remove the unselected columns. 请注意,也可在主页功能区选项卡的管理列组中执行此操作。Note that this operation is also available in the Home ribbon tab, in the Manage Columns group.

步骤 2︰替换值以清理所选列中的值Step 2: Replace Values to clean up values in a selected column

在此步骤中,替换 Year 列中的“详细信息”后缀。In this step, you replace the Details suffix in the Year column. 请注意,此后缀置于新行中,因此在表预览中不可见。Note that this suffix is on a new line so it is not visible in the table preview. 但是,如果单击 Year 列中含数值的单元格,您将在详细视图中看到完整值。However, if you click in one of the cells with a numeric value in the Year column, you will see the full value in the detailed view.

  1. 选择 Year 列。Select the Year column.
  2. 查询视图功能区上,单击主页选项卡下的替换值,或右键单击 Year 列并单击替换值,将“详细信息”替换为空文本。In the Query view ribbon, click Replace Values under the Home tab or right-click the Year column, and click Replace Values to replace Details with empty text.
  3. 替换值对话框框中,在要查找的值文本框中键入详细信息,将替换为文本框留空。In the Replace Values dialog box, type Details in the Value to Find text box and leave the Replace With text box empty.
  4. 单击确定Click OK.

步骤 3Step 3: Filter values in a column

在此步骤中,将筛选 Year 列以显示不包含“Year”行。In this step, you filter the Year column to display rows that do not contain “Year”.

  1. 单击 Year 列中的筛选器下拉箭头。Click the filter drop down arrow on the Year column.
  2. 筛选器下拉列表中,清除 Year 选项。In the Filter drop-down, clear the Year option.
  3. 单击确定Click OK.

步骤 4︰重命名列Step 4: Rename a column

现在,已清理 Year 列中的数据,然后我们将处理 Final Winner 列。Now that we have cleaned up the data in the Year column, we are going to work on the Final Winner column.

由于我们只搜索获胜者的列表,因此可以将该列重命名为 CountrySince we are only looking at the list of winners, we can rename this column to Country.

  1. 在查询预览中选择 Final Winner 列。Select the Final Winner column in the Query preview.
  2. 查询视图功能区的转换选项卡和任何列组下,会显示重命名In the Query view ribbon, under the Transform tab and Any Column group, you will find Rename.
  3. 这将使列名称可编辑。This will make the column name editable. 我们将该列重命名为 CountryWe will rename this column to Country.

步骤 5:筛选出列中的 null 值Step 5: Filter out null values in a column

我们还需要筛选出 Country 列中的 null 值。We also need to filter out null values in the Country column. 为此,可使用步骤 3 中所示的筛选器菜单,或者可以执行以下操作︰In order to do this, we could use the filter menu as we saw in Step 3, or alternatively we can:

  1. 右键单击 Country 列中包含 null 值的一个单元格。Right-click on one of the cells in the Country column that contain a null value.
  2. 在上下文菜单中选择文本筛选器 -> 不等于Select Text Filters -> Does not Equal in the context menu.
  3. 将创建一个新筛选器步骤以删除 Country 列中含 null 值的行。This creates a new filter step to remove rows with null values in the Country column.

步骤 6︰ 命名查询Step 6: Name a query

在此步骤中,将最终查询命名为 Euro Cup WinnersIn this step, you name your final query Euro Cup Winners.

  1. 查询设置窗格中的名称文字框中,输入 Euro Cup WinnersIn the Query Settings pane, in the Name text box, enter Euro Cup Winners.

任务 3︰使用“报表”视图创建可视化效果Task 3: Create visualizations using the Report view

现在我们已将数据转换成所需形式以用于执行分析,我们可以将所得表加载到报告,并创建几种可视化。Now that we have converted the data into the shape that we need for our analysis, we can load the resulting table into our Report and create a few visualizations.

第 1 步︰将查询加载到您的报表Step 1: Load the query to your report

为了将查询结果加载到 Power BI Desktop 以及创建报表,我们从主页功能区中选择关闭并加载In order to load the query results to Power BI Desktop and create a report, we select Close & Load from the Home ribbon.

这将导致评估查询以及将表输出内容加载到报表。This will trigger evaluation of the query and load of the table output to the Report. 在 Power BI Desktop 中,选择报表图标可在“报表”视图中查看 Power BI Desktop。In Power BI Desktop, select the Report icon to see Power BI Desktop in Report view.

您可以在报表视图右侧的字段窗格中看到所生成的表字段。You can see the resulting table fields in the Fields pane at the right of the Report view.

步骤 2︰创建地图可视化Step 2: Create a Map visualization

要创建可视化,我们可以将字段从字段列表拖放到报表画布中。In order to create a visualization, we can drag fields from the Field list and drop them in the Report canvas.

  1. Country 字段拖放到报表画布Drag the Country field and drop it in the Report canvas. 将在报表画布中创建新的可视化。This will create a new visualization in the Report canvas. 在此示例中,由于我们具有 Country 列表,因此它将创建地图可视化In this case, since we have a list of countries, it will create a Map visualization.

  2. 可以方便地通过在可视化效果窗格中选择不同图标来更改可视化效果类型。We can easily change the type of visualization by clicking on a different icon in the Visualization pane.

  3. 我们将继续使用地图可视化类型。对于地图,还可以调整可视化效果,方法是将可视化效果的一个角拖到所需大小。We are going to stay with the Map visualization type to Map, We can also resize the visualization by dragging from one of the corners of the visualization up to the desired size.

  4. 请注意,地图中的所有当前点具有相同的大小。Note that currently all the points in the map have the same size. 我们需要对此进行更改,以便使拥有多个 Euro Cup 大赛冠军的国家/地区由地图中较大的点表示。We want to change this so that countries with more Euro Cup tournaments won are represented with a larger point in the map. 为此,我们可以将字段列表中的 Year 字段拖动到字段窗格下半部分的框中。In order to do thiso, we can drag the Year field in the Fields list to the Values box in the lower half of the Fields pane.

如您所见,可以非常方便地在报表中自定义可视化效果以便你按所需方式呈现数据。As you can see, it is very easy to customize visualizations in your report, in order to present the data in the way that you want. Power BI Desktop 提供无缝的端到端体验(从各种数据源获取数据到拆分以满足您的分析需求再到以丰富的交互方式可视化这些数据)。Power BI Desktop provides a seamless end-to-end experience from getting data from a wide range of data sources and shaping it to meet your analysis needs to visualizing this data in rich and interactive ways. 在报表准备就绪后,您可以将其上载到 Power BI 并基于它创建仪表板与其他 Power BI 用户共享。Once your report is ready, you can upload it to Power BI and create dashboards based on it, which you can share with other Power BI users.

我们的从 Web 导入数据教程到此为止。This concludes the Importing Data from the Web tutorial. 你可以在此处下载完整的 Power BI Desktop 文件。You can download the completed Power BI Desktop file here.

我还可以在哪些位置获取详细信息?Where else can I get more information?