Spark on HDInsight with DirectQuery

Spark on Azure HDInsight with DirectQuery allows you to create dynamic reports based on data and metrics you already have in your Spark cluster. With DirectQuery, queries are sent back to your Azure HDInsight Spark cluster as you explore the data in the report view. This experience is suggested for users who are familiar with the entities they connect to.

Warning

Automatic tile refresh has been disabled for dashboard tiles built on Spark-based datasets. You can select Refresh Dashboard Tiles to refresh manually. Reports are not impacted and should remain up to date.

Use the following steps to connect to your Spark on Azure HDInsight data source using DirectQuery within the Power BI service.

  1. Select Get Data at the bottom of the left navigation pane.

  2. Select Databases & More.

  3. Select the Spark on HDInsight connector and choose Connect.

  4. Enter the name of the server you want to connect to, as well as your username and password. The server is always in the form <clustername>.azurehdinsight.net; see "Finding your Spark on HDInsight parameters" below for details on where to find these values.

  5. Once connected, you'll see a new dataset named "SparkDataset". You can also access the dataset through the placeholder tile that is created.

  6. Drilling into the dataset, you can explore all of the tables and columns in your database. Selecting a column sends a query back to the source, dynamically creating your visual. These visuals can be saved in a new report and pinned back to your dashboard.

Finding your Spark on HDInsight parameters

The server is always in the form <clustername>.azurehdinsight.net, and can be found in the Azure portal.

The username and password can also be found in the Azure portal.
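As a small illustration of the server name format described above, the following Python sketch builds the `<clustername>.azurehdinsight.net` value from a cluster name. The function name `build_server_name`, the example cluster name `mycluster`, and the character check (letters, digits, and hyphens only) are assumptions made for this example; they are not part of Power BI or HDInsight.

```python
import re

# Domain suffix taken from the server name format described in this article.
HDINSIGHT_DOMAIN = "azurehdinsight.net"


def build_server_name(cluster_name: str) -> str:
    """Return '<clustername>.azurehdinsight.net' for the given cluster name.

    Hypothetical helper for illustration only.
    """
    name = cluster_name.strip()
    # Assumption: a cluster name is a non-empty run of letters, digits, and hyphens.
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    return f"{name}.{HDINSIGHT_DOMAIN}"


if __name__ == "__main__":
    # 'mycluster' is a placeholder; use the cluster name shown in the Azure portal.
    print(build_server_name("mycluster"))
```

The resulting string is the value to enter in the server field when connecting.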

Limitations

These restrictions and notes may change as we continue to improve the experience. Additional documentation can be found in Use BI tools with Apache Spark on Azure HDInsight.

  • The Power BI service only supports a configuration of Spark 2.0 and HDInsight 3.5.
  • Every action, such as selecting a column or adding a filter, sends a query back to the database. Before selecting very large fields, consider choosing an appropriate visual type.
  • Q&A is not available for DirectQuery datasets.
  • Schema changes are not picked up automatically.
  • Power BI supports 16,000 columns across all tables within a dataset. Power BI also includes an internal row number column per table. This means that if you have 100 tables in the dataset, the available number of columns would be 15,900. Depending on the amount of data you are working with from your Spark data source, you may encounter this limitation.
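The column limit arithmetic in the last item can be sketched in a few lines of Python. This is only an illustration of the calculation stated above (16,000 columns minus one internal row number column per table); the function names `available_columns` and `fits_within_limit` are hypothetical and do not correspond to any Power BI API.

```python
# Total columns Power BI supports across all tables in a dataset (from this article).
COLUMN_LIMIT = 16_000


def available_columns(table_count: int, column_limit: int = COLUMN_LIMIT) -> int:
    """Columns left for your data after one internal row number column per table."""
    if table_count < 0:
        raise ValueError("table_count must be non-negative")
    return column_limit - table_count


def fits_within_limit(columns_per_table: list[int]) -> bool:
    """Check whether the given per-table column counts stay within the limit."""
    return sum(columns_per_table) <= available_columns(len(columns_per_table))


if __name__ == "__main__":
    # The example from the article: 100 tables leave 15,900 usable columns.
    print(available_columns(100))
    # 100 tables with 160 columns each need 16,000 columns, which exceeds 15,900.
    print(fits_within_limit([160] * 100))
```

A check like this can be run against a list of table widths from your Spark database before connecting, to see whether the dataset is likely to hit the limit.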

Troubleshooting

If you run into issues executing queries against your cluster, verify that the application is still running and restart it if necessary.

You can also allocate additional resources within the Azure portal under Configuration > Scale Cluster.

Next steps

Get started: Create Apache Spark cluster on HDInsight Linux and run interactive queries using Spark SQL
Get started with Power BI
Get Data for Power BI
More questions? Try the Power BI Community