Microsoft Power BI is a business analytics service that provides interactive visualizations with self-service business intelligence capabilities, enabling end users to create reports and dashboards by themselves without having to depend on information technology staff or database administrators.
When you use Azure Databricks as a data source with Power BI, you can extend the performance and capabilities of Azure Databricks beyond data scientists and data engineers to all business users.
You can connect Power BI Desktop to your Azure Databricks clusters using the built-in Spark connector. This connector also lets you use DirectQuery to offload processing to Azure Databricks, which is useful when you have a large volume of data that you don’t want to load into Power BI or when you want to perform near real-time analysis.
Requirements
- Download and install the Power BI Desktop client. See Get Power BI Desktop in the Microsoft documentation.
- Get a personal access token for Databricks API access. See Authentication.
Connect Power BI Desktop to a Databricks cluster
You must first get the JDBC connection information for your cluster and then provide that information as a server address when you configure the connection in Power BI Desktop.
Step 1: Get the JDBC server address
1. In Azure Databricks, go to Clusters and select the cluster you want to connect to.
2. On the cluster edit page, scroll down and select the JDBC/ODBC tab.
3. On the JDBC/ODBC tab, copy and save the JDBC URL.
4. Construct the JDBC server address that you will use when you set up your Spark cluster connection in Power BI Desktop. Take the JDBC URL that you copied and saved in step 3 and do the following:
   - Remove everything in the path between the port number and sql, retaining the components indicated by the boxes in the image below.
In our example, the server address would be:
or, if you choose the aliased version:
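The address construction described above can be sketched in Python. This is a hypothetical helper, not part of Power BI or Azure Databricks. It assumes that the JDBC URL has the form jdbc:spark://<host>:<port>/<database>;key=value;... with an httpPath property, and that the server address the Spark connector expects is https://<host>:<port>/<httpPath>. The https:// scheme in the output is an assumption of this sketch, and the example URL in the test is made up.

```python
def jdbc_to_server_address(jdbc_url: str) -> str:
    """Hypothetical helper: build a Spark connector server address from a cluster JDBC URL.

    Assumes: jdbc:spark://<host>:<port>/<database>;key=value;...;httpPath=<path>;...
    Returns (assumed target form): https://<host>:<port>/<path>
    """
    prefix = "jdbc:spark://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with jdbc:spark://")

    # The first ';'-separated segment is <host>:<port>/<database>; the rest are properties.
    location, *props = jdbc_url[len(prefix):].split(";")

    # Keep only <host>:<port>, dropping the database path after the first '/'.
    host_port = location.split("/", 1)[0]

    # Parse key=value properties, matching keys case-insensitively.
    params = {
        key.lower(): value
        for key, value in (p.split("=", 1) for p in props if "=" in p)
    }
    http_path = params.get("httppath")
    if http_path is None:
        raise ValueError("JDBC URL has no httpPath property")

    # Everything between the port and the httpPath value (database, transportMode,
    # ssl, and so on) is discarded, mirroring the manual instruction.
    return f"https://{host_port}/{http_path.lstrip('/')}"
```

The helper only handles the cluster-ID form of the path. It does not attempt the aliased version, whose exact format is not shown in this section.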
Step 2: Configure and make the connection in Power BI Desktop
1. Launch Power BI Desktop, click Get Data in the toolbar, and click More….
2. In the Get Data dialog, search for and select the Spark connector.
3. On the Spark dialog, configure your cluster connection:
   - Server: Enter the server address that you constructed from the JDBC URL in Step 1.
   - Protocol: Select HTTP.
   - Data Connectivity mode: Select DirectQuery, which lets you offload processing to Spark. This is ideal when you have a large volume of data or when you want near real-time analysis.
4. On the next dialog, enter the word token in the User name field and a personal access token in the Password field.
The Power BI Navigator should display the data available for query in your Databricks cluster.
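The user name and password entered in the credentials dialog can be pictured as an HTTP Basic credential. The sketch below assumes, beyond what this section states, that over the HTTP protocol the connector sends the literal user name token together with the personal access token as HTTP Basic authentication. basic_auth_header is a hypothetical helper for illustration only; Power BI Desktop builds the request for you.

```python
import base64


def basic_auth_header(personal_access_token: str, username: str = "token") -> dict[str, str]:
    """Hypothetical helper: encode "<username>:<token>" as an HTTP Basic Authorization header.

    The default user name is the literal word "token", as entered in the User name field.
    """
    credentials = f"{username}:{personal_access_token}".encode("utf-8")
    # Basic authentication is the base64 encoding of "user:password".
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```

Because the token travels as a password, treat it like one: the HTTP protocol setting is only appropriate when the server address uses TLS, as the assumed https:// form does.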