High density sampling in Power BI scatter charts

Beginning with the September 2017 release of Power BI Desktop and the accompanying updates to the Power BI service, a new sampling algorithm improves how scatter charts represent high density data.

For example, you might create a scatter chart from your organization's sales activity, where each store has tens of thousands of data points each year. A scatter chart of such information samples the available data (selecting a meaningful representation of it, to illustrate how sales occurred over time) and plots a chart that represents the underlying data. This is common practice in high density scatter charts. Power BI has improved its sampling of high density data, and this article describes the details.

Note

The high density sampling algorithm described in this article applies to, and is available in, scatter charts in both Power BI Desktop and the Power BI service.

How high density scatter charts work

Previously, Power BI selected a collection of sample data points across the full range of underlying data in a deterministic fashion to create a scatter chart. Specifically, Power BI selected the first and last rows of data in the scatter chart series, then divided the remaining rows evenly so that a total of 3,500 data points were plotted on the scatter chart. For example, if the sample had 35,000 rows, the first and last rows were selected for plotting, and then every tenth row was also plotted (35,000 / 10 = 3,500 data points). Also, null values and points that could not be plotted (such as text values) in a data series weren't shown, and thus were not considered when generating the visual. With such sampling, the perceived density of the scatter chart was based on the representative data points, so the implied visual density reflected the sampled points rather than the full collection of underlying data.
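The original, deterministic approach can be sketched roughly as follows. This is an illustration of the behavior described above, not Power BI's actual code: the function name `sample_evenly` and its `max_points` parameter are assumptions made for the example.

```python
def sample_evenly(rows, max_points=3500):
    """Keep the first and last rows plus evenly spaced rows in between.

    Hypothetical helper; assumes max_points >= 2.
    """
    n = len(rows)
    if n <= max_points:
        # Few enough rows: plot everything, no sampling needed.
        return list(rows)
    # Spread max_points indices evenly from 0 to n - 1 inclusive,
    # so both the first and the last row are always kept.
    step = (n - 1) / (max_points - 1)
    indices = [round(i * step) for i in range(max_points)]
    return [rows[i] for i in indices]


if __name__ == "__main__":
    rows = list(range(35_000))
    sample = sample_evenly(rows)
    print(len(sample), sample[0], sample[-1])
```

With 35,000 input rows the sketch keeps 3,500 of them, roughly every tenth row, including the first and last. Because rows are chosen purely by position, a dense cluster and a sparse region are thinned at the same rate, which is why the perceived density reflects the sample rather than the data.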

When you enable High Density Sampling, Power BI uses an algorithm that eliminates overlapping points and ensures that the points on the visual can be reached when you interact with it. It also ensures that all points in the data set are represented in the visual, which gives context to the meaning of selected points, rather than just plotting a representative sample.

By definition, high density data is sampled so that visualizations can be created reasonably quickly and remain responsive to interaction (too many data points on a visual can bog it down, and can detract from the visibility of trends). How such data is sampled, to provide the best visualization experience and ensure all data is represented, is what drove the creation of the sampling algorithm. In Power BI, the algorithm has been improved to provide the best combination of responsiveness, representation, and clear preservation of important points in the overall data set.

Note

Scatter charts that use the high density sampling algorithm are best plotted on square visuals, as with all scatter charts.

How the new scatter chart sampling algorithm works

The new algorithm for High Density Sampling for scatter charts employs methods that capture and represent the underlying data more effectively, and eliminate overlapping points. It does this by starting with a small radius for each data point (the visual circle size for a given point on the visualization). It then increases the radius of all data points; when two (or more) data points overlap, a single circle (of the increased radius size) represents those overlapped data points. The algorithm continues to increase the radius of the data points until that radius value results in a reasonable number of data points (3,500) being displayed in the scatter chart.
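The radius-growing idea can be sketched as below. This is a minimal sketch under stated assumptions, not Power BI's implementation, which the article does not spell out: the greedy merge order, the function names, and the `start_radius` and `growth` parameters are all hypothetical, and the sketch ignores axis scale and other refinements.

```python
import math


def merge_overlapping(points, radius):
    """Greedily keep one circle for each group of overlapping points."""
    kept = []
    for x, y in points:
        # Two circles of equal radius overlap when their centres are
        # closer together than twice the radius.
        overlaps = any(
            math.hypot(x - kx, y - ky) < 2 * radius for kx, ky in kept
        )
        if not overlaps:
            kept.append((x, y))
    return kept


def high_density_sample(points, max_points=3500, start_radius=0.001, growth=1.5):
    """Grow the radius until at most max_points circles remain.

    Assumes start_radius > 0, growth > 1 and max_points >= 1.
    """
    radius = start_radius
    circles = merge_overlapping(points, radius)
    while len(circles) > max_points:
        radius *= growth
        circles = merge_overlapping(points, radius)
    return circles, radius


if __name__ == "__main__":
    # A tight cluster near the origin plus one distant outlier.
    cluster = [(i * 0.001, 0.0) for i in range(50)]
    points = cluster + [(10.0, 10.0)]
    circles, radius = high_density_sample(points, max_points=5)
    print(len(circles), radius, (10.0, 10.0) in circles)
```

In this toy run the cluster collapses into a handful of circles while the isolated point keeps its own circle, because nothing else lies within reach of it. The pairwise distance check is quadratic in the worst case, so a real implementation would likely need a spatial index; that is left out to keep the sketch short.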

The methods in this algorithm ensure that outliers are represented in the resulting visual. The algorithm also respects scale when determining overlap, so that exponential scales are visualized with fidelity to the underlying points.

The algorithm also preserves the overall shape of the scatter chart.

Note

When using the High Density Sampling algorithm for scatter charts, accurate distribution of the data is the goal; implied visual density is not. For example, you might see a scatter chart with many circles that overlap (density) in a certain area, and imagine that many data points must be clustered there. Because the High Density Sampling algorithm can use one circle to represent many data points, such implied visual density (or "clustering") will not show up. To get more detail in a given area, you can use slicers to zoom in.

In addition, data points that cannot be plotted (such as nulls or text values) are ignored, so another value that can be plotted is selected instead, which further ensures that the true shape of the scatter chart is maintained.

When the standard algorithm for scatter charts is used

There are circumstances under which High Density Sampling cannot be applied to a scatter chart, and the original algorithm is used instead. Those circumstances are the following:

  • If you right-click Details and then select Show items with no data from the menu that appears, the scatter chart reverts to the original algorithm.
  • Any values in the Play axis cause the scatter chart to revert to the original algorithm.
  • If both the X and Y axes are missing on a scatter chart, the chart reverts to the original algorithm.
  • Using a Ratio line in the Analytics pane causes the chart to revert to the original algorithm.

How to turn on high density sampling for a scatter chart

To turn on High Density Sampling, select a scatter chart, go to the Formatting pane, and expand the General card. Near the bottom of that card is a toggle slider called High Density Sampling. To turn it on, slide it to On.

Note

Once the slider is turned on, Power BI attempts to use the High Density Sampling algorithm whenever possible. When the algorithm cannot be used (for example, you place a value in the Play axis), the slider stays in the On position even though the chart has reverted to the standard algorithm. If you then remove the value from the Play axis (or conditions otherwise change to allow the high density sampling algorithm), the chart automatically uses high density sampling again because the slider is still on.
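The interaction between the slider and the fallback conditions listed earlier can be modeled with a small decision function. This is purely illustrative: `ScatterConfig`, its field names, and `effective_algorithm` are hypothetical names invented for this sketch and do not correspond to any Power BI API.

```python
from dataclasses import dataclass


@dataclass
class ScatterConfig:
    # Hypothetical fields that mirror the conditions described in this article.
    high_density_toggle: bool = False
    show_items_with_no_data: bool = False
    play_axis_has_values: bool = False
    has_x_axis: bool = True
    has_y_axis: bool = True
    has_ratio_line: bool = False


def effective_algorithm(cfg: ScatterConfig) -> str:
    """Return which algorithm the chart would use for this configuration."""
    if not cfg.high_density_toggle:
        return "standard"
    # Any one of these conditions forces a fallback to the original algorithm.
    blocked = (
        cfg.show_items_with_no_data
        or cfg.play_axis_has_values
        or (not cfg.has_x_axis and not cfg.has_y_axis)
        or cfg.has_ratio_line
    )
    return "standard" if blocked else "high_density"


if __name__ == "__main__":
    cfg = ScatterConfig(high_density_toggle=True)
    print(effective_algorithm(cfg))
    cfg.play_axis_has_values = True   # the slider itself stays on
    print(effective_algorithm(cfg))
    cfg.play_axis_has_values = False  # high density sampling resumes
    print(effective_algorithm(cfg))
```

The point of separating the toggle from the effective algorithm is that the slider records your intent, while the chart's current configuration decides whether that intent can be honored.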

Note

Data points are grouped and/or selected by the index. Having a legend does not affect sampling for the algorithm; it only affects the ordering of the visual.

Considerations and limitations

The high density sampling algorithm is an important improvement to Power BI, but there are a few considerations you need to know when working with high density values and scatter charts.

  • The High Density Sampling algorithm only works with live connections to Power BI service-based models, imported models, or DirectQuery.

Next steps

For more information about high density sampling in other charts, see the following article.