High density sampling in Power BI scatter charts

Beginning with the September 2017 release of Power BI Desktop and updates to the Power BI service, a new sampling algorithm is available that improves how scatter charts represent high-density data.

For example, you might create a scatter chart from your organization's sales activity, with each store having tens of thousands of data points each year. A scatter chart of such information samples the available data (selects a meaningful representation of it, to illustrate how sales occurred over time) and plots a chart that represents the underlying data. This is common practice in high-density scatter charts. Power BI has improved its sampling of high-density data, and this article describes the details.

Note

The high density sampling algorithm described in this article is available in scatter charts in both Power BI Desktop and the Power BI service.

How high density scatter charts work

Previously, Power BI created a scatter chart by selecting a collection of sample data points across the full range of underlying data in a deterministic fashion. Specifically, Power BI selected the first and last rows of data in the scatter chart series, then divided the remaining rows evenly so that a total of 3,500 data points were plotted. For example, if the sample had 35,000 rows, the first and last rows were plotted, along with every tenth row (35,000 / 10 = 3,500 data points). Also, null values or points that could not be plotted (such as text values) in the data series weren't shown, and so weren't considered when the visual was generated. With such sampling, the perceived density of the scatter chart was also based on the representative data points. The implied visual density therefore reflected the sampled points rather than the full collection of underlying data.
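The arithmetic of the original deterministic approach can be sketched in Python. This is a minimal approximation. It assumes the rows arrive in series order and that the step is simply the row count divided by the 3,500-point budget. The function name `deterministic_sample` is hypothetical, and this article doesn't document Power BI's exact row selection.

```python
def deterministic_sample(rows, target=3500):
    """Approximate the original deterministic sampling (hypothetical helper).

    Keeps the first row, every `step`-th row after it, and the last row,
    where `step` is chosen so that roughly `target` points are plotted.
    """
    n = len(rows)
    if n <= target:
        return list(rows)            # small series: plot everything
    step = n // target               # e.g. 35,000 // 3,500 = 10
    picked = set(range(0, n, step))  # row 0, row 10, row 20, ...
    picked.add(n - 1)                # always include the last row
    return [rows[i] for i in sorted(picked)]


sample = deterministic_sample(list(range(35_000)))
print(len(sample), sample[:3], sample[-2:])  # 3501 [0, 10, 20] [34990, 34999]
```

With 35,000 rows this plots 3,501 points: the 3,500 rows at every tenth position plus the final row. That matches the approximate count given above.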

When you enable High Density Sampling, Power BI uses an algorithm that eliminates overlapping points and ensures that the points on the visual can be reached when you interact with it. It also ensures that all points in the data set are represented in the visual. This provides context for the meaning of selected points, rather than just plotting a representative sample.

By definition, high-density data is sampled so that visualizations can be created reasonably quickly and stay responsive to interaction. Too many data points on a visual can bog it down and obscure trends. The sampling algorithm is designed around one question: how to sample such data so that it gives the best visualization experience while still representing all of the data. In Power BI, the algorithm has been improved to provide the best combination of responsiveness, representation, and clear preservation of important points in the overall data set.

Note

As with all scatter charts, scatter charts that use the high density sampling algorithm are best plotted on square visuals.

How the new scatter chart sampling algorithm works

The new High Density Sampling algorithm for scatter charts uses methods that capture and represent the underlying data more effectively and that eliminate overlapping points. It starts with a small radius for each data point, which is the size of the circle drawn for that point on the visualization. It then increases the radius of all data points. When two or more data points overlap, a single circle (of the increased radius) represents them. The algorithm keeps increasing the radius until the number of points displayed in the scatter chart falls to a reasonable number: 3,500.
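The radius-growing idea can be illustrated with a small Python sketch. This is not Power BI's implementation, and it rests on several assumptions. Points are assumed to be mapped to plot coordinates in a unit square already. A greedy pass in index order decides which circle represents a group of overlapping points. The radius grows by a hypothetical factor (`growth`) from a hypothetical `start_radius`. The helper names are illustrative only.

```python
import math
import random


def merge_overlapping(points, radius):
    """Greedy overlap elimination at a fixed radius (hypothetical helper).

    Points are visited in index order. A point is kept only if its circle does
    not overlap the circle of a point already kept (centre distance < 2 * radius);
    otherwise the kept circle it overlaps represents it.
    """
    cell = 2 * radius
    grid = {}  # grid cell -> kept points in that cell, for fast neighbour lookup
    kept = []
    for x, y in points:
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        # Any overlapping kept point must lie in this cell or one of its 8 neighbours.
        overlaps = any(
            math.hypot(x - kx, y - ky) < cell
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for kx, ky in grid.get((cx + dx, cy + dy), ())
        )
        if not overlaps:
            grid.setdefault((cx, cy), []).append((x, y))
            kept.append((x, y))
    return kept


def high_density_sample(points, target=3500, start_radius=1e-4, growth=1.5):
    """Grow the radius until at most `target` circles remain."""
    radius = start_radius
    kept = merge_overlapping(points, radius)
    while len(kept) > target:
        radius *= growth
        kept = merge_overlapping(points, radius)
    return kept, radius


random.seed(0)
cluster = [(random.gauss(0.5, 0.05), random.gauss(0.5, 0.05)) for _ in range(20_000)]
outlier = (0.99, 0.01)
kept, r = high_density_sample(cluster + [outlier], target=500)
print(len(kept) <= 500, outlier in kept)  # True True
```

A point far from every other point never overlaps anything, so it keeps its own circle at every radius. That is why the outlier in the example survives while the dense cluster collapses into a few hundred circles.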

The methods in this algorithm ensure that outliers are represented in the resulting visual. The algorithm also respects scale when determining overlap, so exponential scales are visualized with fidelity to the underlying plotted points.
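Respecting scale means measuring overlap on the positions as the axes draw them, not on the raw data values. The following Python sketch assumes that, and uses a base-10 log axis as a stand-in for the "exponential scales" mentioned above. The function names and axis ranges are hypothetical.

```python
import math


def to_plot_space(value, lo, hi, log_scale=False):
    """Map a data value onto [0, 1] along one axis, the way the axis draws it."""
    if log_scale:
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)


def plot_distance(p, q, x_range, y_range, log_x=False, log_y=False):
    """Distance between two data points measured in plot (screen) space."""
    px = to_plot_space(p[0], *x_range, log_scale=log_x)
    qx = to_plot_space(q[0], *x_range, log_scale=log_x)
    py = to_plot_space(p[1], *y_range, log_scale=log_y)
    qy = to_plot_space(q[1], *y_range, log_scale=log_y)
    return math.hypot(px - qx, py - qy)


x_range, y_range = (1, 10_000), (0, 1)
for log_x in (False, True):
    near_origin = plot_distance((1, 0.5), (2, 0.5), x_range, y_range, log_x=log_x)
    far_out = plot_distance((1000, 0.5), (1001, 0.5), x_range, y_range, log_x=log_x)
    print(f"log_x={log_x}: {near_origin:.5f} vs {far_out:.5f}")
```

On the linear axis both pairs are the same distance apart (about 0.0001 of the axis width). On the log axis, the pair near 1 is roughly 700 times farther apart than the pair near 1,000. A given radius can therefore merge the second pair while keeping the first pair distinct.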

The algorithm also preserves the overall shape of the scatter chart.

Note

When you use the High Density Sampling algorithm for scatter charts, the goal is an accurate distribution of the data, not implied visual density. For example, you might see a scatter chart with many overlapping circles (density) in a certain area and assume that many data points must be clustered there. Because the High Density Sampling algorithm can use one circle to represent many data points, such implied visual density (or "clustering") won't show up. To get more detail in a given area, use slicers to zoom in.

In addition, data points that cannot be plotted (such as nulls or text values) are ignored, and another value that can be plotted is selected instead. This further ensures that the true shape of the scatter chart is maintained.
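Filtering out values that cannot be drawn before sampling means the point budget is spent only on plottable data. The following is a minimal Python sketch. It assumes that "plottable" means both coordinates are finite real numbers, because this article doesn't state the exact rule Power BI applies. The function names are hypothetical.

```python
import math
from numbers import Real


def is_plottable(value):
    """True for finite real numbers; False for None, text, NaN, or infinity."""
    return isinstance(value, Real) and math.isfinite(value)


def plottable_points(points):
    """Drop (x, y) pairs that cannot be drawn before any sampling happens."""
    return [(x, y) for x, y in points if is_plottable(x) and is_plottable(y)]


raw = [(1, 2), (None, 3), ("n/a", 4), (5.0, float("nan")), (6, 7)]
print(plottable_points(raw))  # [(1, 2), (6, 7)]
```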

When the standard algorithm for scatter charts is used

In some circumstances, High Density Sampling cannot be applied to a scatter chart, and the original algorithm is used instead. Those circumstances are the following:

  • If you right-click Details and then select Show items with no data from the menu that appears, the scatter chart reverts to the original algorithm.
  • Any value in the Play axis causes the scatter chart to revert to the original algorithm.
  • If both the X and Y axes are missing from a scatter chart, the chart reverts to the original algorithm.
  • Using a Ratio line in the Analytics pane causes the chart to revert to the original algorithm.
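The fallback rules in the list above can be summarized as a single predicate. This Python sketch only restates that list. The `ScatterChartState` fields and the `sampling_algorithm` function are hypothetical and do not correspond to any Power BI API.

```python
from dataclasses import dataclass


@dataclass
class ScatterChartState:
    """Hypothetical snapshot of the chart settings that affect sampling."""
    high_density_toggle: bool = True
    show_items_with_no_data: bool = False
    play_axis_has_values: bool = False
    has_x_axis: bool = True
    has_y_axis: bool = True
    has_ratio_line: bool = False


def sampling_algorithm(state: ScatterChartState) -> str:
    """Return which algorithm the chart would use under the listed conditions."""
    falls_back = (
        state.show_items_with_no_data
        or state.play_axis_has_values
        or not (state.has_x_axis or state.has_y_axis)  # only when BOTH axes are missing
        or state.has_ratio_line
    )
    if state.high_density_toggle and not falls_back:
        return "high density"
    return "standard"


print(sampling_algorithm(ScatterChartState()))                           # high density
print(sampling_algorithm(ScatterChartState(play_axis_has_values=True)))  # standard
```

The result is derived from the chart's current state each time it is evaluated. Clearing the Play axis therefore makes the same function return "high density" again without changing the toggle.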

How to turn on high density sampling for a scatter chart

To turn on High Density Sampling, select a scatter chart, go to the Formatting pane, and expand the General card. Near the bottom of that card is a toggle slider called High Density Sampling. To turn it on, slide it to On.

Note

Once the slider is on, Power BI attempts to use the High Density Sampling algorithm whenever possible. When the algorithm cannot be used (for example, when you place a value in the Play axis), the slider stays in the On position even though the chart has reverted to the standard algorithm. Suppose you then remove the value from the Play axis, or conditions otherwise change so that high density sampling is allowed. Because the slider is still on, the chart automatically uses high density sampling again.

Note

Data points are grouped and/or selected by index. Having a legend does not affect sampling for the algorithm; it only affects the ordering of the visual.

Considerations and limitations

The high density sampling algorithm is an important improvement to Power BI, but there are a few considerations to keep in mind when you work with high density values and scatter charts.

  • The High Density Sampling algorithm works only with live connections to Power BI service-based models, imported models, or DirectQuery.

Next steps

For more information about high density sampling in other charts, see the following article.