Analyze data in Azure Data Lake Storage Gen2 by using Power BI
In this article you'll learn how to use Power BI Desktop to analyze and visualize data that is stored in a storage account that has a hierarchical namespace (Azure Data Lake Storage Gen2).
Before you begin this tutorial, you must have the following:
- An Azure subscription. See Get Azure free trial.
- A storage account that has a hierarchical namespace. Follow these instructions to create one.
This article assumes that you've created an account named
- A sample data file named
Drivers.txtlocated in your storage account. You can download this sample from Azure Data Lake Git Repository, and then upload that file to your storage account.
- Power BI Desktop. You can download this from Microsoft Download Center.
Create a report in Power BI Desktop
Launch Power BI Desktop on your computer.
From the Home tab of the Ribbon, click Get Data, and then click More.
In the Get Data dialog box, click Azure, click Azure Data Lake Store Gen2 (Beta), and then click Connect.
In the Azure Data Lake Storage Gen2 dialog box, you can provide the URL to your Azure Data Lake Storage Gen2 account, filesystem or subfolder using the container endpoint format. URLs for Data Lake Storage Gen2 have the following pattern
https://<accountname>.dfs.core.windows.net/<filesystemname>/<subfolder>and then click OK.
In the next dialog box, click Sign in to sign into your storage account. You'll be redirected to your organization's sign in page. Follow the prompts to sign into the account.
After you've successfully signed in, click Connect.
The next dialog box shows all files under the URL you provided in step 4 above including the file that you uploaded to your storage account. Verify the information, and then click Load.
After the data has been successfully loaded into Power BI, you'll see the following fields in the Fields tab.
However, to visualize and analyze the data, we prefer the data to be available per the following fields.
In the next steps, we will update the query to convert the imported data in the desired format.
From the Home tab on the ribbon, click Edit Queries.
In the Query Editor, under the Content column, click Binary. The file will automatically be detected as CSV and you should see an output as shown below. Your data is now available in a format that you can use to create visualizations.
From the Home tab on the ribbon, click Close and Apply, and then click Close and Apply.
Once the query is updated, the Fields tab will show the new fields available for visualization.
Let us create a pie chart to represent the drivers in each city for a given country. To do so, make the following selections.
From the Visualizations tab, click the symbol for a pie chart.
The columns that we are going to use are Column 4 (name of the city) and Column 7 (name of the country). Drag these columns from Fields tab to Visualizations tab as shown below.
The pie chart should now resemble like the one shown below.
By selecting a specific country from the page level filters, you can now see the number of drivers in each city of the selected country. For example, under the Visualizations tab, under Page level filters, select Brazil.
The pie chart is automatically updated to display the drivers in the cities of Brazil.
From the File menu, click Save to save the visualization as a Power BI Desktop file.
Publish report to Power BI service
After you've created the visualizations in Power BI Desktop, you can share it with others by publishing it to the Power BI service. For instructions on how to do that, see Publish from Power BI Desktop.