Use DirectQuery in Power BI Desktop
With Power BI Desktop, when you connect to your data source, it is always possible to import a copy of the data into the Power BI Desktop. For some data sources, an alternative approach is available: connect directly to the data source using DirectQuery.
Supported Data Sources
For a full listing of data sources that support DirectQuery, see Data sources supported by DirectQuery.
How to Connect using DirectQuery
When you use Get Data to connect to a data source supported by DirectQuery, the connection window lets you select how you want to connect.
The differences between selecting Import and DirectQuery are the following:
Import – the selected tables and columns are imported into Power BI Desktop. As you create or interact with a visualization, Power BI Desktop uses the imported data. You must refresh the data, which imports the full data set again, to see any changes that occurred to the underlying data since the initial import or the most recent refresh.
DirectQuery – no data is imported or copied into Power BI Desktop. For relational sources, the selected tables and columns appear in the Fields list. For multi-dimensional sources like SAP Business Warehouse, the dimensions and measures of the selected cube appear in the Fields list. As you create or interact with a visualization, Power BI Desktop queries the underlying data source, which means you’re always viewing current data.
Many data modeling and data transformations are available when using DirectQuery, though with some limitations. When creating or interacting with a visualization, the underlying source must be queried and the time necessary to refresh the visualization is dependent on the performance of the underlying data source. When the data necessary to service the request has recently been requested, Power BI Desktop uses recent data to reduce the time required to display the visualization. Selecting Refresh from the Home ribbon will ensure all visualizations are refreshed with current data.
The Power BI and DirectQuery article describes DirectQuery in detail. Also, see the following sections for more information about benefits, limitations, and important considerations when using DirectQuery.
Benefits of using DirectQuery
There are a few benefits to using DirectQuery:
- DirectQuery lets you build visualizations over very large datasets, where it would otherwise be unfeasible to first import all of the data with pre-aggregation
- Underlying data changes can require a refresh of data, and for some reports, the need to display current data can require large data transfers, making re-importing data unfeasible. By contrast, DirectQuery reports always use current data
- The 1-GB dataset limitation does not apply to DirectQuery
Limitations of DirectQuery
There are currently a few limitations to using DirectQuery:
All tables must come from a single database, unless using composite models
If the Query Editor query is overly complex, an error will occur. To remedy the error you must either delete the problematic step in Query Editor, or Import the data instead of using DirectQuery. For multi-dimensional sources like SAP Business Warehouse, there is no Query Editor
Time intelligence capabilities are not available in DirectQuery. For example, special treatment of date columns (year, quarter, month, day, so on) is not supported in DirectQuery mode.
Limitations are placed on DAX expressions allowed in measures to ensure that queries sent to the underlying data source have acceptable performance.
There is a one-million-row limit for returning data when using DirectQuery. The limit does not affect aggregations or calculations used to create the dataset returned using DirectQuery, only the rows returned. For example, you can aggregate 10 million rows with your query that runs on the data source, and accurately return the results of that aggregation to Power BI using DirectQuery as long as the data returned to Power BI is less than 1 million rows. If more than 1 million rows would be returned from DirectQuery, Power BI returns an error.
Important considerations when using DirectQuery
The following three points should be taken into consideration when using DirectQuery:
Performance and load - All DirectQuery requests are sent to the source database, so the time required to refresh a visual is dependent on how long that back-end source takes to respond with the results from the query (or queries). The recommended response time (with requested data being returned) for using DirectQuery for visuals is five seconds or less, with a maximum recommended results response time of 30 seconds. Any longer, and the experience of a user consuming the report becomes unacceptably poor. In addition, once a report is published to the Power BI service, any query that takes longer than a few minutes will time out, and the user will receive an error.
Load on the source database should also be considered, based on the number of Power BI users who will consume the published report. Using Row Level Security (RLS) can have a significant impact as well; a non-RLS dashboard tile shared by multiple users results in a single query to the database, but using RLS on a dashboard tile usually means the refresh of a tile requires one query per user, thus significantly increasing load on the source database and potentially impacting performance.
Power BI creates queries that are as efficient as possible. Under certain situations however, the generated query may not be efficient enough to avoid refresh that would fail. One example of this situation is when a generated query would retrieve an excessively large number of rows from the back-end data source, in which case the following error occurs:
The resultset of a query to external data source has exceeded
This situation can occur with a simple chart that includes a very high cardinality column, with the aggregation option set to Don’t Summarize. The visual needs to only have columns with a cardinality below 1 million, or must have appropriate filters applied.
Security - By default all users who consume a published report connect to the back-end data source using the credentials entered after publication to the Power BI service. This is the same situation as data that is imported: all users see the same data, irrespective of any security rules defined in the backend source. Customers who want per-user security implemented with DirectQuery sources should use either RLS or configure Kerberos constrained authentication against source. Kerberos is not available for all sources. Learn more about RLS. Learn more about Kerberos in DirectQuery.
Supported features - Not all features in Power BI Desktop are supported in DirectQuery mode, or have some limitations. In addition, there are some capabilities in the Power BI service (such as Quick Insights) that are not available for datasets using DirectQuery. As such, the limitation of such features when using DirectQuery should be taken into consideration when determining whether to use DirectQuery.
Publish to the Power BI service
Reports created using DirectQuery can be published to the Power BI Service.
If the data source used does not need the On-premises data gateway (Azure SQL Database, Azure SQL Data Warehouse, or Redshift), credentials must be provided before the published report will be displayed in the Power BI Service.
You can provide credentials by selecting the Settings gear icon in Power BI, then select Settings.
Power BI displays the Settings window. From there, select the Datasets tab and choose the dataset that uses DirectQuery, and select Edit credentials.
Until credentials are supplied, opening a published report or exploring a dataset created with a DirectQuery connection to such data sources results in an error.
For data sources other than Azure SQL Database, Azure SQL Data Warehouse and Redshift that use DirectQuery, an On-premises data gateway must be installed and the data source must be registered to establish a data connection. You can learn more about On-premises data gateway.
For more information about DirectQuery, check out the following resources: