Key influencers visualization

Note

These visuals can be created and viewed in both Power BI Desktop and the Power BI service. The steps and illustrations in this article are from Power BI Desktop.

The key influencers visual helps you understand the factors that drive a metric you're interested in. It analyzes your data, ranks the factors that matter, and displays them as key influencers. For example, suppose you want to figure out what influences employee turnover, which is also known as churn. One factor might be employment contract length, and another factor might be employee age.

When to use key influencers

The key influencers visual is a great choice if you want to:

  • See which factors affect the metric being analyzed.
  • Contrast the relative importance of these factors. For example, do short-term contracts have more impact on churn than long-term contracts?

Features of the key influencers visual

Features numbered

  1. Tabs: Select a tab to switch between views. Key influencers shows you the top contributors to the selected metric value. Top segments shows you the top segments that contribute to the selected metric value. A segment is made up of a combination of values. For example, one segment might be consumers who have been customers for at least 20 years and live in the west region.

  2. Drop-down box: The value of the metric under investigation. In this example, look at the metric Rating. The selected value is Low.

  3. Restatement: It helps you interpret the visual in the left pane.

  4. Left pane: The left pane contains one visual. In this case, the left pane shows a list of the top key influencers.

  5. Restatement: It helps you interpret the visual in the right pane.

  6. Right pane: The right pane contains one visual. In this case, the column chart displays all the values for the key influencer Theme that was selected in the left pane. The specific value of usability from the left pane is shown in green. All the other values for Theme are shown in black.

  7. Average line: The average is calculated for all possible values for Theme except usability (which is the selected influencer). So the calculation applies to all the values in black. It tells you what percentage of the other Themes had a low rating. In this case, 11.35% had a low rating (shown by the dotted line).

  8. Check box: Filters the visual in the right pane to show only the values that are influencers for that field. In this example, this would filter the visual to usability, security, and navigation.

Analyze a metric that is categorical

Watch this video to learn how to create a key influencers visual with a categorical metric. Then follow these steps to create one.

Note

This video uses an earlier version of Power BI Desktop.

Your Product Manager wants you to figure out which factors lead customers to leave negative reviews about your cloud service. To follow along, open the Customer Feedback PBIX file in Power BI Desktop. You can also download the Customer Feedback Excel file for Power BI service or Power BI Desktop. Select either link and then select Download from the GitHub page that opens.

Note

The Customer Feedback data set is based on [Moro et al., 2014] S. Moro, P. Cortez, and P. Rita. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems, Elsevier, 62:22-31, June 2014.

  1. Open the report, and select the Key influencers icon.

    From the Visualizations pane, select the Key influencers template

  2. Move the metric you want to investigate into the Analyze field. To see what drives a customer rating of the service to be low, select Customer Table > Rating.

  3. Move fields that you think might influence Rating into the Explain by field. You can move as many fields as you want. In this case, start with:

    • Country-Region
    • Role in Org
    • Subscription Type
    • Company Size
    • Theme

  4. Leave the Expand by field empty. This field is only used when analyzing a measure or summarized field.

  5. To focus on the negative ratings, select Low in the What influences Rating to be drop-down box.

    Select Low from the drop-down box

The analysis runs on the table level of the field that's being analyzed. In this case, it's the Rating metric. This metric is defined at a customer level. Each customer has given either a high score or a low score. All the explanatory factors must be defined at the customer level for the visual to make use of them.

In the previous example, all of the explanatory factors have either a one-to-one or a many-to-one relationship with the metric. In this case, each customer assigned a single theme to their rating. Similarly, customers come from one country, have one membership type, and perform one role in their organization. The explanatory factors are already attributes of a customer, and no transformations are needed. The visual can make immediate use of them.

Later in the tutorial, you look at more complex examples that have one-to-many relationships. In those cases, the columns have to first be aggregated down to the customer level before you can run the analysis.

Measures and aggregates used as explanatory factors are also evaluated at the table level of the Analyze metric. Some examples are shown later in this article.

Interpret categorical key influencers

Let's take a look at the key influencers for low ratings.

Top single factor that influences the likelihood of a low rating

The customer in this example can have three roles: consumer, administrator, and publisher. Being a consumer is the top factor that contributes to a low rating.

Select Role in Org is consumer

More precisely, your consumers are 2.57 times more likely to give your service a negative score. The key influencers chart lists Role in Org is consumer first in the list on the left. When you select Role in Org is consumer, Power BI shows additional details in the right pane. The comparative effect of each role on the likelihood of a low rating is shown.

  • 14.93% of consumers give a low score.
  • On average, all other roles give a low score 5.78% of the time.
  • Consumers are 2.57 times more likely to give a low score compared to all other roles. You can determine this by dividing the green bar by the red dotted line.
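
The division described in the last bullet can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Power BI's implementation; the function name `likelihood_ratio` is made up for this example. Because the two percentages shown in the visual are rounded for display, dividing them gives roughly 2.58, close to the 2.57 that the visual reports.

```python
# Percentages as displayed in the example (rounded by the visual).
consumer_low_pct = 14.93   # green bar: consumers who gave a low rating
others_low_pct = 5.78      # red dotted line: average for all other roles


def likelihood_ratio(selected_pct: float, baseline_pct: float) -> float:
    """Return how many times more likely the selected group is to show the outcome.

    Hypothetical helper for illustration; it simply divides bar by baseline.
    """
    return selected_pct / baseline_pct


ratio = likelihood_ratio(consumer_low_pct, others_low_pct)
print(f"Consumers are about {ratio:.2f} times more likely to give a low score")
```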

Second single factor that influences the likelihood of a low rating

The key influencers visual compares and ranks factors from many different variables. The second influencer has nothing to do with Role in Org. Select the second influencer in the list, which is Theme is usability.

Select Theme is usability

The second most important factor is related to the theme of the customer's review. Customers who commented about the usability of the product were 2.55 times more likely to give a low score compared to customers who commented on other themes, such as reliability, design, or speed.

Between the visuals, the average, which is shown by the red dotted line, changed from 5.78% to 11.34%. The average is dynamic because it's based on the average of all other values. For the first influencer, the average excluded the consumer role. For the second influencer, it excluded the usability theme.
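
To see why the dotted line moves when a different influencer is selected, here is a minimal sketch of a baseline that leaves out the selected value. The counts are invented, the helper `baseline_excluding` is hypothetical, and the sketch assumes the baseline is the pooled percentage of low ratings across all other values; the article does not spell out the exact formula Power BI uses.

```python
# Hypothetical (low ratings, total ratings) per theme. Not the Customer Feedback data.
ratings_by_theme = {
    "usability": (30, 100),
    "security": (20, 100),
    "design": (5, 100),
    "speed": (5, 100),
}


def baseline_excluding(selected: str, counts: dict) -> float:
    """Percentage of low ratings across every value except the selected one."""
    low = sum(l for name, (l, _) in counts.items() if name != selected)
    total = sum(t for name, (_, t) in counts.items() if name != selected)
    return 100.0 * low / total


# Selecting a different influencer changes which rows feed the baseline.
baseline_usability = baseline_excluding("usability", ratings_by_theme)
baseline_security = baseline_excluding("security", ratings_by_theme)
print(baseline_usability, round(baseline_security, 2))
```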

Select the Only show values that are influencers check box to filter by using only the influential values. In this case, they're the themes that drive a low score. Twelve themes are reduced to the four that Power BI identified as the themes that drive low ratings.

Select check box

Interact with other visuals

Every time you select a slicer, filter, or other visual on the canvas, the key influencers visual reruns its analysis on the new portion of data. For example, you can move Company Size into the report and use it as a slicer. Use it to see if the key influencers for your enterprise customers are different from those for the general population. An enterprise company has more than 50,000 employees.

Selecting >50,000 reruns the analysis, and you can see that the influencers changed. For large enterprise customers, the top influencer for low ratings has a theme related to security. You might want to investigate further to see if there are specific security features your large customers are unhappy about.

Slice by Company Size

Interpret continuous key influencers

So far, you've seen how to use the visual to explore how different categorical fields influence low ratings. It's also possible to have continuous factors such as age, height, and price in the Explain by field. Let's look at what happens when Tenure is moved from the customer table into Explain by. Tenure represents how long a customer has used the service.

As tenure increases, the likelihood of receiving a lower rating also increases. This trend suggests that the longer-term customers are more likely to give a negative score. This insight is interesting, and one that you might want to follow up on later.

The visualization shows that every time tenure goes up by 13.44 months, on average the likelihood of a low rating increases by 1.23 times. In this case, 13.44 months is the standard deviation of tenure. So the insight you receive looks at how increasing tenure by a standard amount, which is the standard deviation of tenure, affects the likelihood of receiving a low rating.

The scatter plot in the right pane plots the average percentage of low ratings for each value of tenure. It highlights the slope with a trend line.

Scatter plot for Tenure
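
The two quantities in this section, the points of the scatter plot and the standard-deviation step, can be illustrated with a small Python sketch. The rows are invented and the helper name is hypothetical. The sketch uses the population standard deviation; the article does not say whether Power BI uses the population or the sample form.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical (tenure in months, rating) rows, one per customer.
customers = [(1, "High"), (1, "High"), (12, "Low"), (12, "High"), (24, "Low"), (24, "Low")]


def low_rating_share_by_tenure(rows):
    """Average percentage of low ratings for each distinct tenure value."""
    totals = defaultdict(int)
    lows = defaultdict(int)
    for tenure, rating in rows:
        totals[tenure] += 1
        if rating == "Low":
            lows[tenure] += 1
    return {t: 100.0 * lows[t] / totals[t] for t in sorted(totals)}


# One point per tenure value, as plotted in the right pane.
points = low_rating_share_by_tenure(customers)

# The "standard amount" by which tenure is increased (population form assumed).
step = pstdev(t for t, _ in customers)
print(points, round(step, 2))
```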

Binned continuous key influencers

In some cases, you may find that your continuous factors were automatically turned into categorical ones. This happens when the relationship between the variables is not linear, so the relationship can't be described as simply increasing or decreasing (as in the example above).

We run correlation tests to determine how linear the influencer is with regard to the target. If the target is continuous, we run Pearson correlation, and if the target is categorical, we run Point Biserial correlation tests. If we detect that the relationship is not sufficiently linear, we conduct supervised binning and generate a maximum of 5 bins. To figure out which bins make the most sense, we use a supervised binning method that looks at the relationship between the explanatory factor and the target being analyzed.
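
As a rough illustration of the two correlation tests named above, the following Python sketch computes a Pearson coefficient and a point-biserial coefficient (which is the Pearson coefficient with the binary target coded as 0 and 1). The data is invented, and `LINEARITY_THRESHOLD` is a hypothetical cutoff: the article does not state what counts as "sufficiently linear" or how the supervised binning itself works.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def point_biserial(xs, flags):
    """Point-biserial correlation: Pearson with the binary target coded 0/1."""
    return pearson(xs, [1.0 if flag else 0.0 for flag in flags])


LINEARITY_THRESHOLD = 0.5  # hypothetical cutoff, not a documented Power BI value

tenure = [1, 2, 3, 4, 5, 6]
price = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]          # continuous target
is_low = [False, False, False, True, True, True]      # categorical target

r_continuous = pearson(tenure, price)
r_categorical = point_biserial(tenure, is_low)
needs_binning = abs(r_categorical) < LINEARITY_THRESHOLD
print(round(r_continuous, 3), round(r_categorical, 3), needs_binning)
```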

Interpret measures and aggregates as key influencers

You can use measures and aggregates as explanatory factors inside your analysis. For example, you might want to see what effect the count of customer support tickets or the average duration of an open ticket has on the score you receive.

In this case, you want to see if the number of support tickets that a customer has influences the score they give. Now you bring in Support Ticket ID from the support ticket table. Because a customer can have multiple support tickets, you aggregate the ID to the customer level. Aggregation is important because the analysis runs on the customer level, so all drivers must be defined at that level of granularity.

Let's look at the count of IDs. Each customer row has a count of support tickets associated with it. In this case, as the count of support tickets increases, the likelihood of the rating being low goes up 5.51 times. The visual on the right shows the average number of support tickets by different Rating values evaluated at the customer level.

Influence of Support Ticket ID
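
Aggregating a one-to-many table to the customer level can be pictured as counting ticket rows per customer. The sketch below uses invented IDs; treating customers without tickets as a count of 0 is an assumption made here for illustration, not documented Power BI behavior.

```python
from collections import Counter

# Hypothetical support-ticket rows: (ticket_id, customer_id).
tickets = [("T1", "C1"), ("T2", "C1"), ("T3", "C2"), ("T4", "C1")]
customer_ids = ["C1", "C2", "C3"]

# Count of ticket IDs per customer: one value per customer row.
ticket_counts = Counter(customer_id for _, customer_id in tickets)

# Assumption: customers with no tickets get a count of 0.
tickets_per_customer = {cid: ticket_counts.get(cid, 0) for cid in customer_ids}
print(tickets_per_customer)
```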

Interpret the results: Top segments

You can use the Key influencers tab to assess each factor individually. You can also use the Top segments tab to see how a combination of factors affects the metric that you're analyzing.

The Top segments tab initially shows an overview of all the segments that Power BI discovered. The following example shows that six segments were found. These segments are ranked by the percentage of low ratings within the segment. In Segment 1, for example, 74.3% of customer ratings are low. The higher the bubble, the higher the proportion of low ratings. The size of the bubble represents how many customers are within the segment.

Select the Top segments tab

Selecting a bubble drills into the details of that segment. If you select Segment 1, for example, you find that it's made up of relatively established customers. They've been customers for over 29 months and have more than four support tickets. Finally, they're not publishers, so they're either consumers or administrators.

In this group, 74.3% of the customers gave a low rating. The average customer gave a low rating 11.7% of the time, so this segment has a larger proportion of low ratings. It's 63 percentage points higher. Segment 1 also contains approximately 2.2% of the data, so it represents an addressable portion of the population.

Select the first top segment
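
The "63 percentage points" figure is a difference between two percentages, not a ratio. A short Python check using the numbers quoted above (variable names are illustrative only):

```python
segment_low_pct = 74.3   # low ratings within Segment 1
overall_low_pct = 11.7   # low ratings across all customers

# Difference in percentage points (what the article reports, rounded to 63).
difference_points = segment_low_pct - overall_low_pct

# For comparison only: the same gap expressed as a ratio.
relative_factor = segment_low_pct / overall_low_pct

print(round(difference_points), round(relative_factor, 2))
```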

Adding counts

Sometimes an influencer can have a big impact but represent very little of the data. For example, Theme is usability is the second biggest influencer for low ratings. However, there might have been only a handful of customers who complained about usability. Counts can help you prioritize which influencers you want to focus on.

You can turn counts on through the Analysis card of the formatting pane.

Add counts

Once counts are turned on, you'll see a ring around each influencer's bubble, which represents the approximate percentage of data that influencer contains. The more of the bubble the ring circles, the more data it contains. You can see that Theme is usability contains a very small proportion of data.

Show counts

You can also use the Sort by toggle in the bottom left of the visual to sort the bubbles by count first instead of impact. Subscription Type is Premier is the top influencer based on count.

Sort by counts

Having a full ring around the circle means the influencer contains 100% of the data. You can change the count type to be relative to the maximum influencer using the Count type dropdown in the Analysis card of the formatting pane. Now the influencer with the most data is represented by a full ring, and all other counts are relative to it.

Show relative counts
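
The difference between the two count types comes down to the denominator. The following sketch uses invented row counts and a hypothetical helper, `ring_fill`, to show how the same counts fill the ring differently when measured against all rows versus against the largest influencer.

```python
total_rows = 1000  # hypothetical size of the analyzed data

# Hypothetical number of rows covered by each influencer.
rows_per_influencer = {
    "Subscription Type is Premier": 400,
    "Role in Org is consumer": 250,
    "Theme is usability": 20,
}


def ring_fill(counts, total, relative_to_max=False):
    """Fraction of the ring to draw for each influencer (1.0 is a full ring)."""
    denominator = max(counts.values()) if relative_to_max else total
    return {name: count / denominator for name, count in counts.items()}


absolute = ring_fill(rows_per_influencer, total_rows)
relative = ring_fill(rows_per_influencer, total_rows, relative_to_max=True)
print(absolute["Theme is usability"], relative["Theme is usability"])
```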

Analyze a metric that is numeric

If you move an unsummarized numerical field into the Analyze field, you have a choice of how to handle that scenario. You can change the behavior of the visual by going into the Formatting Pane and switching between Categorical Analysis Type and Continuous Analysis Type.

Change from categorical to continuous

A Categorical Analysis Type behaves as described above. For instance, if you were looking at survey scores ranging from 1 to 10, you could ask 'What influences Survey Scores to be 1?'

A Continuous Analysis Type changes the question to a continuous one. In the example above, the new question would be 'What influences Survey Scores to increase/decrease?'

This distinction is very helpful when you have lots of unique values in the field you are analyzing. The example below looks at house prices. It is not very meaningful to ask 'What influences House Price to be 156,214?' because that question is very specific, and there is unlikely to be enough data to infer a pattern.

Instead, you may want to ask 'What influences House Price to increase?', which treats house prices as a range rather than as distinct values.

Numeric question

Interpret the results: Key influencers

In this scenario, we look at 'What influences House Price to increase'. We are looking at a number of explanatory factors that could impact a house price, like Year Built (year the house was built), KitchenQual (kitchen quality), and YearRemodAdd (year the house was remodeled).

In the example below, we look at our top influencer, which is kitchen quality being Excellent. The results are very similar to the ones we saw when we were analyzing categorical metrics, with a few important differences:

  • The column chart on the right shows averages rather than percentages. It therefore shows the average price of a house with an excellent kitchen (green bar) compared to the average price of a house without an excellent kitchen (dotted line).
  • The number in the bubble is still the difference between the red dotted line and green bar, but it's expressed as a number ($158.49K) rather than a likelihood (1.93x). So on average, houses with excellent kitchens are almost $160K more expensive than houses without excellent kitchens.

Numeric target categorical influencers
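
For a numeric target, the bubble value is a difference of averages instead of a ratio of percentages. A minimal sketch with invented house prices (the helper `average_price_gap` is hypothetical and only mirrors the comparison described above):

```python
from statistics import mean

# Hypothetical (kitchen quality, sale price) rows. Not the article's data set.
houses = [
    ("Excellent", 330_000),
    ("Excellent", 350_000),
    ("Good", 200_000),
    ("Typical", 160_000),
    ("Typical", 180_000),
]


def average_price_gap(rows, selected):
    """Average price with the selected value, without it, and the difference."""
    with_value = mean(price for quality, price in rows if quality == selected)
    without_value = mean(price for quality, price in rows if quality != selected)
    return with_value, without_value, with_value - without_value


green_bar, dotted_line, bubble = average_price_gap(houses, "Excellent")
print(green_bar, dotted_line, bubble)
```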

In the example below, we are looking at the impact a continuous factor (year the house was remodeled) has on house price. The differences compared to how we analyze continuous influencers for categorical metrics are as follows:

  • The scatter plot in the right pane plots the average house price for each distinct value of year remodeled.
  • The value in the bubble shows by how much the average house price increases (in this case $2.87K) when the year the house was remodeled increases by its standard deviation (in this case 20 years).

Numeric target continuous influencers

Finally, in the case of measures, we are looking at the average year a house was built. The analysis here is as follows:

  • The scatter plot in the right pane plots the average house price for each distinct value in the table.
  • The value in the bubble shows by how much the average house price increases (in this case $1.35K) when the average year increases by its standard deviation (in this case 30 years).

Numeric target measures influencers

Interpret the results: Top Segments

Top segments for numerical targets show groups where the house prices on average are higher than in the overall dataset. For example, below we can see that Segment 1 is made up of houses where GarageCars (number of cars the garage can fit) is greater than 2 and the RoofStyle is Hip. Houses with those characteristics have an average price of $355K compared to the overall average in the data, which is $180K.

Numeric target measures influencers

Analyze a metric that is a measure or a summarized column

In the case of a measure or summarized column, the analysis defaults to the Continuous Analysis Type described above. This cannot be changed. The biggest difference between analyzing a measure/summarized column and an unsummarized numeric column is the level at which the analysis runs.

In the case of unsummarized columns, the analysis always runs at the table level. In the house price example above, we analyzed the House Price metric to see what influences a house price to increase/decrease. The analysis automatically runs on the table level. Our table has a unique ID for each house, so the analysis runs at a house level.

Measures table

For measures and summarized columns, we don't immediately know what level to analyze them at. If House Price was summarized as an Average, we would need to consider at what level we would like this average house price calculated. Is it the average house price at a neighborhood level? Or perhaps a regional level?

Measures and summarized columns are automatically analyzed at the level of the Explain by fields used. Imagine we have three fields in Explain By we are interested in: Kitchen Quality, Building Type, and Air Conditioning. Average House Price would be calculated for each unique combination of those three fields. It is often helpful to switch to a table view to take a look at what the data being evaluated looks like.

Measures table
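
The table view described above can be imitated by grouping rows on the Explain by fields and averaging the price within each group. The rows below are invented, and the sketch only illustrates the grouping idea, not how Power BI evaluates measures internally. Note how four houses collapse into three analyzed rows, which is why the result is so summarized.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (Kitchen Quality, Building Type, Air Conditioning, House Price).
rows = [
    ("Excellent", "1Fam", "Y", 300_000),
    ("Excellent", "1Fam", "Y", 340_000),
    ("Good", "1Fam", "Y", 200_000),
    ("Good", "Duplex", "N", 150_000),
]

# Group prices by each unique combination of the three Explain by fields.
groups = defaultdict(list)
for kitchen, building, air, price in rows:
    groups[(kitchen, building, air)].append(price)

# One Average House Price per combination: these are the rows the analysis sees.
average_by_combination = {key: mean(prices) for key, prices in groups.items()}
for key, value in average_by_combination.items():
    print(key, value)
```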

This analysis is very summarized, so it will be hard for the regression model to find any patterns in the data it can learn from. We should run the analysis at a more detailed level to get better results. If we wanted to analyze the house price at the house level, we would need to explicitly add the ID field to the analysis. Nevertheless, we don't want the house ID to be considered an influencer. It is not helpful to learn that as house ID increases, the price of a house increases. This is where the Expand By field well option comes in handy. You can use Expand By to add fields you want to use for setting the level of the analysis without looking for new influencers.

Take a look at what the visualization looks like once we add ID to Expand By. Once you have defined the level at which you want your measure evaluated, interpreting influencers is exactly the same as for unsummarized numeric columns.

Measures table

If you would like to learn more about how you can analyze measures with the key influencers visualization, watch the following tutorial.

Considerations and troubleshooting

What are the limitations for the visual?

The key influencers visual has some limitations:

  • Direct Query is not supported
  • Live Connection to Azure Analysis Services and SQL Server Analysis Services is not supported
  • Publish to web is not supported
  • .NET Framework 4.6 or higher is required

Numeric question

I see an error that no influencers or segments were found. Why is that?

"No influencers found" error

This error occurs when you included fields in Explain by but no influencers were found. Possible reasons:

  • You included the metric you were analyzing in both Analyze and Explain by. Remove it from Explain by.
  • Your explanatory fields have too many categories with few observations. This situation makes it hard for the visualization to determine which factors are influencers. It's hard to generalize based on only a few observations. If you are analyzing a numeric field, you may want to switch from Categorical Analysis to Continuous Analysis in the Formatting Pane under the Analysis card.
  • Your explanatory factors have enough observations to generalize, but the visualization didn't find any meaningful correlations to report.

I see an error that the metric I'm analyzing doesn't have enough data to run the analysis on. Why is that?

"Not enough data" error

The visualization works by looking at patterns in the data for one group compared to other groups. For example, it looks for customers who gave low ratings compared to customers who gave high ratings. If the data in your model has only a few observations, patterns are hard to find. If the visualization doesn't have enough data to find meaningful influencers, it indicates that more data is needed to run the analysis.

We recommend that you have at least 100 observations for the selected state. In this case, the state is customers who churn. You also need at least 10 observations for the states you use for comparison. In this case, the comparison state is customers who don't churn.

If you are analyzing a numeric field, you may want to switch from Categorical Analysis to Continuous Analysis in the Formatting Pane under the Analysis card.
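
Before building the visual, you can check your own data against the recommended minimums with a quick count. The sketch below is a hypothetical pre-check written in Python, not something Power BI exposes; it assumes the minimum of 10 applies to each comparison state separately.

```python
from collections import Counter

MIN_SELECTED = 100    # recommended minimum for the selected state
MIN_COMPARISON = 10   # recommended minimum, assumed here to apply per comparison state


def has_enough_data(values, selected_state):
    """Return True if the metric column meets the recommended observation counts."""
    counts = Counter(values)
    others = [count for state, count in counts.items() if state != selected_state]
    return (
        counts[selected_state] >= MIN_SELECTED
        and bool(others)  # at least one state to compare against
        and all(count >= MIN_COMPARISON for count in others)
    )


# Hypothetical churn column: 120 customers churned, 15 stayed.
churn_column = ["churn"] * 120 + ["stay"] * 15
print(has_enough_data(churn_column, "churn"))
```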

I see an error that when 'Analyze' is not summarized, the analysis always runs at the row level of its parent table. Changing this level via 'Expand by' fields is not allowed. Why is that?

When analyzing a numeric or categorical column, the analysis always runs at the table level. For example, if you are analyzing house prices and your table contains an ID column, the analysis will automatically run at the house ID level.

When you are analyzing a measure or summarized column, you need to explicitly state the level at which you would like the analysis to run. You can use Expand by to change the level of the analysis for measures and summarized columns without adding new influencers. If House price was defined as a measure, you could add the house ID column to Expand by to change the level of the analysis.

出现了错误:“解释依据”中的字段与包含所分析指标的表并不唯一相关。 这是为什么?I see an error that a field in Explain by isn't uniquely related to the table that contains the metric I'm analyzing. Why is that?

分析在所分析字段的表级别上运行。The analysis runs on the table level of the field that's being analyzed. 例如,如果分析的是客户对服务的反馈,则可能具有一个表,该表可告知客户给出了高评级或低评级。For example, if you analyze customer feedback for your service, you might have a table that tells you whether a customer gave a high rating or a low rating. 在此例中,分析会在客户表级别运行。In this case, your analysis is running at the customer table level.

如果具有相关表,同时该表是以比包含指标的表更精细的级别进行定义的,则会出现此错误。If you have a related table that's defined at a more granular level than the table that contains your metric, you see this error. 下面的示例说明:Here's an example:

  • 你在分析导致客户对服务给出低评级的原因。You analyze what drives customers to give low ratings of your service.
  • 你想要查看客户使用服务的设备是否会影响给出的评价。You want to see if the device on which the customer is consuming your service influences the reviews they give.
  • 客户可以通过多种不同方式使用服务。A customer can consume the service in multiple different ways.
  • 在下面的示例中,客户 10000000 同时使用浏览器和平板电脑与服务交互。In the following example, customer 10000000 uses both a browser and a tablet to interact with the service.

一个相关表,该表以比包含指标的表更精细的级别进行定义

如果尝试使用设备列作为解释因素,则会出现以下错误:If you try to use the device column as an explanatory factor, you see the following error:

“错误的列”错误

出现此错误是因为未在客户级别定义设备。This error appears because the device isn't defined at the customer level. 客户可以在多个设备上使用该服务。One customer can consume the service on multiple devices. 若要使可视化效果可查找模式,设备必须是客户属性。For the visualization to find patterns, the device must be an attribute of the customer. 有几种解决方案,取决于你对业务的理解:There are several solutions that depend on your understanding of the business:

  • 可以更改要计数的设备的摘要。You can change the summarization of devices to count. 例如,如果设备数量可能影响客户给出的分数,请使用计数。For example, use count if the number of devices might affect the score that a customer gives.
  • 你可以透视设备列,查看在特定设备上使用服务是否会影响客户评级。You can pivot the device column to see if consuming the service on a specific device influences a customer’s rating.

In this example, the data was pivoted to create new columns for browser, mobile, and tablet. Make sure you delete and re-create your relationships in the modeling view after you pivot your data. You can now use these specific devices in Explain by. All devices turn out to be influencers, and the browser has the largest effect on customer score.

More precisely, customers who don't use the browser to consume the service are 3.79 times more likely to give a low score than the customers who do. Lower down in the list, for mobile the inverse is true. Customers who use the mobile app are more likely to give a low score than the customers who don't.

Solved
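
The two fixes described above, summarizing devices to a count and pivoting the device column, can be sketched in plain Python. This is only an illustration of the reshaping idea, not how Power BI or Power Query performs it; the `usage_rows` data and all variable names are hypothetical, except that customer 10000000 mirrors the example in the text.

```python
from collections import defaultdict

# Hypothetical usage rows: one row per (customer, device), so a customer can
# appear more than once. This is the "more granular" related table.
usage_rows = [
    (10000000, "Browser"),
    (10000000, "Tablet"),
    (10000001, "Mobile"),
    (10000002, "Browser"),
]

# Collect the set of devices each customer uses.
devices_by_customer = defaultdict(set)
for customer_id, device in usage_rows:
    devices_by_customer[customer_id].add(device)

all_devices = sorted({device for _, device in usage_rows})

# Option 1: summarize to a count, which gives one number per customer.
device_count = {cid: len(devs) for cid, devs in devices_by_customer.items()}

# Option 2: pivot, which gives one yes/no column per device for each customer.
pivoted = {
    cid: {device: device in devs for device in all_devices}
    for cid, devs in devices_by_customer.items()
}

print(device_count[10000000])
print(pivoted[10000000])
```

Either result has exactly one row per customer, so each value is now an attribute of the customer rather than of a single usage record.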

I see a warning that measures weren't included in my analysis. Why is that?

"Measures not included" error

The analysis runs on the table level of the field that's being analyzed. If you analyze customer churn, you might have a table that tells you whether a customer churned or not. In this case, your analysis runs at the customer table level.

By default, measures and aggregates are analyzed at that table level. If there were a measure for average monthly spending, it would be analyzed at the customer table level.

If the customer table doesn't have a unique identifier, the measure can't be evaluated and the analysis ignores it. To avoid this situation, make sure the table with your metric has a unique identifier. In this case, it's the customer table and the unique identifier is customer ID. If the table has no such column, you can add an index column by using Power Query.
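
As a rough illustration of what "unique identifier" means here, the following sketch checks whether a column identifies each row exactly once and, if it doesn't, adds a running index. It's plain Python rather than Power Query, and the `customers` rows and the helper names `is_unique_key` and `add_index` are hypothetical.

```python
# Hypothetical customer table in which customer_id is accidentally duplicated.
customers = [
    {"customer_id": 1, "churned": "Yes"},
    {"customer_id": 2, "churned": "No"},
    {"customer_id": 2, "churned": "No"},
]


def is_unique_key(rows, column):
    """Return True if every row has a different value in the given column."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def add_index(rows, name="index"):
    """Return copies of the rows with a running index column starting at 1."""
    return [{name: i, **row} for i, row in enumerate(rows, start=1)]


print(is_unique_key(customers, "customer_id"))
indexed = add_index(customers)
print(is_unique_key(indexed, "index"))
```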

I see a warning that the metric I'm analyzing has more than 10 unique values and that this amount might affect the quality of my analysis. Why is that?

The AI visualization can analyze categorical fields and numeric fields. Examples of categorical fields are Churn, with the values Yes or No, and Customer Satisfaction, with the values High, Medium, or Low. Increasing the number of categories to analyze means there are fewer observations per category. This situation makes it harder for the visualization to find patterns in the data.

When you analyze numeric fields, you can treat them like text, in which case the visual runs the same analysis as it does for categorical data (Categorical Analysis). If a field has many distinct values, we recommend that you switch to Continuous Analysis, which infers patterns from how the numbers increase or decrease rather than treating each number as a distinct value. You can switch from Categorical Analysis to Continuous Analysis in the Formatting pane under the Analysis card.

To find stronger influencers, we recommend that you group similar values into a single unit. For example, if you have a metric for price, you're likely to obtain better results by grouping similar prices into High, Medium, and Low categories rather than using individual price points.

"More than 10 unique factors" warning
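
Grouping similar values can be sketched as a simple binning function. The cutoffs (50 and 150), the `prices` list, and the `price_band` name are assumptions chosen only for illustration; sensible cutoffs depend on your own data.

```python
from collections import Counter


def price_band(price, low_cutoff=50.0, high_cutoff=150.0):
    """Map an individual price to a coarse band (cutoffs are hypothetical)."""
    if price < low_cutoff:
        return "Low"
    if price < high_cutoff:
        return "Medium"
    return "High"


# Hypothetical price points: 12 distinct values.
prices = [12.5, 19.99, 49.0, 75.0, 120.0, 149.99,
          150.0, 300.0, 899.0, 45.0, 60.0, 210.0]
bands = [price_band(p) for p in prices]

print(len(set(prices)), "distinct prices become", len(set(bands)), "categories")
print(Counter(bands))
```

With fewer categories, each one holds more observations, which is what gives the visual a better chance of finding a pattern.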

There are factors in my data that look like they should be key influencers, but they aren't. How can that happen?

In the following example, customers who are consumers drive low ratings, with 14.93% of ratings that are low. The administrator role also has a high proportion of low ratings, at 13.42%, but it isn't considered an influencer.

The reason for this determination is that the visualization also considers the number of data points when it finds influencers. The following example has more than 29,000 consumers and about one-tenth as many administrators, about 2,900. Only 390 of the administrators gave a low rating. The visual doesn't have enough data to determine whether it found a pattern with administrator ratings or if it's just a chance finding.

How influencers are determined
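
To see why the smaller group is harder to judge, it can help to compare how precisely each proportion is estimated. The sketch below uses the textbook standard error of a proportion, sqrt(p(1 - p)/n), with the rounded figures from the example above. It's an illustration of the sample-size effect, not the calculation the visual itself performs.

```python
import math


def standard_error(proportion, n):
    """Standard error of an observed proportion under a binomial model."""
    return math.sqrt(proportion * (1 - proportion) / n)


# Rounded figures from the example: about 29,000 consumers with 14.93% low
# ratings, and about 2,900 administrators with 13.42% low ratings.
consumer_se = standard_error(0.1493, 29_000)
admin_se = standard_error(0.1342, 2_900)

print(f"consumer standard error:      {consumer_se:.4f}")
print(f"administrator standard error: {admin_se:.4f}")
print(f"ratio: {admin_se / consumer_se:.1f}")
```

The administrator estimate is roughly three times less precise than the consumer estimate, so a similar-looking percentage carries much weaker evidence.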

How do you calculate key influencers for categorical analysis?

Behind the scenes, the AI visualization uses ML.NET to run a logistic regression to calculate the key influencers. A logistic regression is a statistical model that compares different groups to each other.

If you want to see what drives low ratings, the logistic regression looks at how customers who gave a low score differ from the customers who gave a high score. If you have multiple categories, such as high, neutral, and low scores, it looks at how the customers who gave a low rating differ from the customers who didn't give a low rating. In this case, how do the customers who gave a low score differ from the customers who gave a high rating or a neutral rating?

The logistic regression searches for patterns in the data and looks for how customers who gave a low rating might differ from the customers who gave a high rating. It might find, for example, that customers with more support tickets give a higher percentage of low ratings than customers with few or no support tickets.

The logistic regression also considers how many data points are present. For example, if customers who play an admin role give proportionally more negative scores but there are only a few administrators, this factor isn't considered influential. This determination is made because there aren't enough data points available to infer a pattern. A statistical test, known as a Wald test, is used to determine whether a factor is considered an influencer. The visual uses 0.05 as the p-value threshold.
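
The following sketch shows how a Wald test reacts to sample size for the simplest possible case: a logistic regression with a single yes/no factor. For that case the fitted coefficient equals the log odds ratio of the 2x2 table of counts, and its standard error is the square root of the summed reciprocal counts, so no model fitting library is needed. This is not the ML.NET implementation the visual uses; the counts and the function name `wald_test_binary_factor` are hypothetical.

```python
import math


def wald_test_binary_factor(low_with, other_with, low_without, other_without):
    """Wald test for one yes/no factor in a logistic regression.

    Arguments are counts: customers with the factor who gave a low rating,
    with the factor who didn't, without the factor who gave a low rating,
    and without the factor who didn't.
    """
    coefficient = math.log((low_with * other_without) / (other_with * low_without))
    std_error = math.sqrt(
        1 / low_with + 1 / other_with + 1 / low_without + 1 / other_without
    )
    z = coefficient / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return coefficient, z, p_value


# Hypothetical counts. Both groups with the factor have a 30% low-rating
# share against 20% without it; only the group size differs.
coef_large, z_large, p_large = wald_test_binary_factor(300, 700, 200, 800)
coef_small, z_small, p_small = wald_test_binary_factor(3, 7, 200, 800)

print(f"large group: coefficient={coef_large:.3f}, p={p_large:.2g}")
print(f"small group: coefficient={coef_small:.3f}, p={p_small:.2g}")
```

Both groups produce the same coefficient, but only the larger one falls below the 0.05 threshold. The smaller group shows the same tendency without enough data points to rule out chance.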

How do you calculate key influencers for numeric analysis?

Behind the scenes, the AI visualization uses ML.NET to run a linear regression to calculate the key influencers. A linear regression is a statistical model that looks at how the outcome of the field you're analyzing changes based on your explanatory factors.

For example, if you analyze house prices, a linear regression looks at the effect that an excellent kitchen has on the house price. Do houses with excellent kitchens generally have lower or higher prices than houses without excellent kitchens?

The linear regression also considers the number of data points. For example, if houses with tennis courts have higher prices but very few houses have a tennis court, this factor isn't considered influential. This determination is made because there aren't enough data points available to infer a pattern. A statistical test, known as a Wald test, is used to determine whether a factor is considered an influencer. The visual uses 0.05 as the p-value threshold.

How do you calculate segments?

Behind the scenes, the AI visualization uses ML.NET to run a decision tree to find interesting subgroups. The objective of the decision tree is to end up with a subgroup of data points that's relatively high in the metric you're interested in. This subgroup could be customers with low ratings or houses with high prices.

The decision tree takes each explanatory factor and tries to determine which factor gives it the best split. For example, if you filter the data to include only large enterprise customers, does that separate customers who gave a high rating from customers who gave a low rating? Or is it better to filter the data to include only customers who commented about security?

After the decision tree does a split, it takes the subgroup of data and determines the next best split for that data. In this case, the subgroup is customers who commented on security. After each split, it also considers whether the group has enough data points to be representative enough to infer a pattern from, or whether it's an anomaly in the data and not a real segment. Another statistical test is applied to check the statistical significance of the split condition, with a p-value threshold of 0.05.

After the decision tree finishes running, it takes all the splits, such as security comments and large enterprise, and creates Power BI filters. This combination of filters is packaged up as a segment in the visual.
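
The split-then-filter idea can be sketched with a tiny, hand-rolled example. The text doesn't say which criterion the decision tree uses to rank splits, so the Gini impurity used here is an assumption, as are the data, the column names, and the helper functions. The significance check on each split is omitted to keep the sketch short.

```python
def gini(labels):
    """Gini impurity of a list of True/False labels (True = low rating)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(rows, factor, target="low_rating"):
    """Impurity reduction from splitting the rows on a yes/no factor."""
    inside = [r[target] for r in rows if r[factor]]
    outside = [r[target] for r in rows if not r[factor]]
    weighted = (len(inside) * gini(inside) + len(outside) * gini(outside)) / len(rows)
    return gini([r[target] for r in rows]) - weighted


def best_split(rows, factors):
    """Pick the factor whose split separates low ratings best."""
    return max(factors, key=lambda f: split_gain(rows, f))


def make_rows(count, security, enterprise, low):
    return [{"commented_security": security, "large_enterprise": enterprise,
             "low_rating": low} for _ in range(count)]


# Hypothetical customers: (count, commented on security, large enterprise, low rating).
rows = (make_rows(4, True, True, True) + make_rows(2, True, False, True)
        + make_rows(1, True, False, False) + make_rows(3, False, True, False)
        + make_rows(1, False, True, True) + make_rows(5, False, False, False))

first = best_split(rows, ["commented_security", "large_enterprise"])
subgroup = [r for r in rows if r[first]]
remaining = [f for f in ["commented_security", "large_enterprise"] if f != first]
second = best_split(subgroup, remaining)

# The chosen splits together act like a combination of filters: a segment.
segment = [r for r in subgroup if r[second]]
low_share = sum(r["low_rating"] for r in segment) / len(segment)
print(first, "then", second, "->", len(segment), "customers,", low_share, "low")
```

In this made-up data, the security comment separates low ratings better than company size, so it's chosen first. Adding the large enterprise filter then isolates a small group in which every customer gave a low rating.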

Why do certain factors become influencers or stop being influencers as I move more fields into the Explain by field?

The visualization evaluates all explanatory factors together. A factor might be an influencer by itself, but when it's considered with other factors it might not be. Suppose you want to analyze what drives a house price to be high, with bedrooms and house size as explanatory factors:

  • By itself, more bedrooms might be a driver for house prices to be high.
  • Including house size in the analysis means you now look at what happens to bedrooms while house size remains constant.
  • If house size is fixed at 1,500 square feet, it's unlikely that a continuous increase in the number of bedrooms will dramatically increase the house price.
  • Bedrooms might not be as important a factor as it was before house size was considered.
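
The bedroom example can be reproduced with ordinary least squares on a few made-up houses. To make the effect stark, the hypothetical prices below depend only on house size (150 per square foot), while bedroom count merely tends to rise with size. The data and function names are assumptions for illustration, and this isn't the visual's own implementation.

```python
def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """Least-squares slope of y on a single factor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_factor_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 fitted together."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [b - my for b in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical houses.
size_sqft = [1000, 1200, 1500, 1500, 1800, 2000, 2400, 2500]
bedrooms = [2, 2, 3, 4, 3, 4, 5, 4]
price = [150 * s for s in size_sqft]

alone = simple_slope(bedrooms, price)
bedroom_effect, size_effect = two_factor_slopes(bedrooms, size_sqft, price)

print(f"bedrooms alone:             {alone:.0f} per bedroom")
print(f"bedrooms, size held fixed:  {bedroom_effect:.0f} per bedroom")
print(f"size, bedrooms held fixed:  {size_effect:.0f} per square foot")
```

On its own, each extra bedroom appears to add tens of thousands to the price. Once house size is in the model, the bedroom effect drops to zero because size already explains the price.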

Next steps