Configure and consume a dataflow

With dataflows, you can unify data from multiple sources and prepare that unified data for modeling. Whenever you create a dataflow, you're prompted to refresh the data for the dataflow. Refreshing a dataflow is required before it can be consumed in a dataset inside Power BI Desktop, or referenced as a linked or computed entity.

Configuring a dataflow

To configure refresh settings for a dataflow, select the More menu (the ellipsis) and then select Settings.

Screenshot of the dataflow Settings option in the portal.

The Settings page provides many options for your dataflow, as the following sections describe.

Screenshot of the dataflow settings page.

  • Take ownership: If you're not the owner of the dataflow, many of these settings are disabled. To take ownership of the dataflow, select Take over. You're prompted to provide credentials to confirm you have the necessary access level.

  • Gateway Connection: In this section, you can choose whether the dataflow uses a gateway, and select which gateway is used.

  • Data Source Credentials: In this section, you can choose which credentials are used and change how you authenticate to the data source.

  • Sensitivity Label: Here you can define the sensitivity of the data in the dataflow. To learn more about sensitivity labels, see how to apply sensitivity labels in Power BI.

  • Scheduled Refresh: Here you can define the times of day the selected dataflow refreshes. A dataflow can be refreshed at the same frequency as a dataset.

  • Enhanced Compute Engine settings: Here you can define whether the dataflow is stored inside the compute engine. The compute engine allows subsequent dataflows that reference this dataflow to perform merges, joins, and other transformations much faster than they otherwise would. It also allows DirectQuery to be performed over the dataflow. Selecting On ensures the dataflow is always supported in DirectQuery mode, and any references benefit from the engine. Selecting Optimized means the engine is used only if there is a reference to this dataflow. Selecting Off disables the compute engine and DirectQuery capability for this dataflow.

  • Endorsements: You can define whether the dataflow is certified or promoted.

Refreshing a dataflow

Dataflows act as building blocks on top of one another. Suppose you have a dataflow called Raw Data and a second dataflow called Transformed Data that contains a linked entity referencing the Raw Data dataflow. When the scheduled refresh for Raw Data triggers, it triggers any dataflow that references it upon completion. This functionality creates a chain effect of refreshes, allowing you to avoid having to schedule dataflows manually (a sketch of the underlying queries follows the list below). There are a few nuances to be aware of when dealing with linked entity refreshes:

  • A linked entity will be triggered by a refresh only if it exists in the same workspace.

  • A linked entity will be locked for editing if a source entity is being refreshed. If any of the dataflows in a reference chain fail to refresh, all the dataflows will roll back to the old data (dataflow refreshes are transactional within a workspace).

  • Only referenced entities are refreshed when triggered by a source refresh completion. To refresh all the entities, you should set a scheduled refresh on the downstream dataflow as well, but avoid setting a refresh schedule on linked dataflows to prevent a double refresh.
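
For illustration, here is a minimal sketch of what a computed entity in the Transformed Data dataflow might look like in Power Query M. It assumes the linked entity created by the editor is named Raw Sales and exposes hypothetical CustomerId and Amount columns; the editor generates the linked entity query for you when you reference the Raw Data dataflow.

    // Hypothetical computed entity "Transformed Sales" in the Transformed Data dataflow.
    // "Raw Sales" is the linked entity that points at the Raw Data dataflow; referencing
    // it by name is what makes this entity a computed entity.
    let
        Source = #"Raw Sales",
        // Keep only rows with a positive amount
        FilteredRows = Table.SelectRows(Source, each [Amount] > 0),
        // Aggregate to one row per customer
        Grouped = Table.Group(
            FilteredRows,
            {"CustomerId"},
            {{"TotalAmount", each List.Sum([Amount]), type number}})
    in
        Grouped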

Cancel refresh

Dataflows support the ability to cancel a refresh, unlike datasets. If a refresh is running for a long time, you can select the dataflow options (the ellipsis next to the dataflow) and then select Cancel refresh.

Incremental refresh (Premium only)

Dataflows can also be set to refresh incrementally. To do so, select the dataflow you want to set up for incremental refresh, and then select the incremental refresh icon.

Screenshot of the incremental refresh icon for a dataflow.

Setting incremental refresh adds parameters to the dataflow to specify the date range. For detailed information on how to set up incremental refresh, see the incremental refresh in Power Query article.
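
As a rough illustration, the parameters added are the RangeStart and RangeEnd DateTime parameters used throughout Power Query incremental refresh; the entity filters a date column against them, and the service substitutes the window boundaries at refresh time. The source, table, and column names below are hypothetical:

    // Hypothetical entity filtered on the RangeStart/RangeEnd parameters that
    // incremental refresh adds to the dataflow. Only rows whose OrderDate falls
    // inside the current refresh window are loaded.
    let
        Source = Sql.Database("contoso.database.windows.net", "Sales"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        IncrementalWindow = Table.SelectRows(
            Orders,
            each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
    in
        IncrementalWindow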

There are some circumstances under which you should not set incremental refresh:

  • Linked entities should not use incremental refresh if they reference a dataflow. Dataflows do not support query folding (even if the entity is DirectQuery enabled).

  • Datasets referencing dataflows should not use incremental refresh. Refreshes to dataflows are generally performant, so incremental refreshes shouldn't be necessary. If refreshes take too long, consider using the compute engine, or DirectQuery mode.

Consuming a dataflow

A dataflow can be consumed in the following three ways:

  • Create a linked entity from the dataflow to allow another dataflow author to use the data

  • Create a dataset from the dataflow to allow a user to utilize the data to create reports

  • Create a connection from external tools that can read from the CDM (Common Data Model) format, as sketched after this list
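
As an example of the third option, the following is a minimal sketch, assuming the workspace is configured to store dataflows in your organization's own Azure Data Lake Storage Gen2 account. The storage URL, container, and folder names are hypothetical; the query simply lists the entity names declared in the dataflow's model.json file.

    // Read a dataflow's CDM folder from a bring-your-own-storage ADLS Gen2 account
    // and list the entities declared in model.json. All names below are placeholders.
    let
        Files = AzureStorage.DataLake("https://contosolake.dfs.core.windows.net/powerbi/Sales Workspace/Raw Data"),
        ModelFile = Table.SelectRows(Files, each [Name] = "model.json"){0},
        Model = Json.Document(ModelFile[Content]),
        EntityNames = List.Transform(Model[entities], each [name])
    in
        EntityNames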

Consuming from Power BI Desktop

To consume a dataflow, open Power BI Desktop and select the Power BI dataflows connector in the Get Data dialog.

Note

The Power BI dataflows connector uses a different set of credentials than the currently signed-in user. This is by design, to support multi-tenant users.

Screenshot of the Power BI dataflows connector in the Get Data dialog.

Select the dataflow and the entities that you want to connect to; the query that the connector builds for you is sketched below.

Note

You can connect to any dataflow or entity regardless of which workspace it resides in, and whether it was defined in a Premium or non-Premium workspace.

Screenshot of selecting a dataflow and its entities in the connector's Navigator.
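
After you choose an entity, the connector generates Power Query M along these lines; the identifiers below are placeholders, and the exact navigation fields can vary by connector version:

    // Query that the Power BI dataflows connector typically generates after you pick
    // a workspace, a dataflow, and an entity in the Navigator. The GUIDs and the
    // entity name are placeholders for whatever you select.
    let
        Source = PowerBI.Dataflows(null),
        Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
        Dataflow = Workspace{[dataflowId = "11111111-1111-1111-1111-111111111111"]}[Data],
        Orders = Dataflow{[entity = "Orders", version = ""]}[Data]
    in
        Orders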

If DirectQuery is available, you're prompted to choose whether you want to connect to the entities through DirectQuery or Import.

In DirectQuery mode, you can quickly interrogate large-scale datasets locally. However, you cannot perform any additional transformations.

Using Import brings the data into Power BI, and requires the dataset to be refreshed independently of the dataflow.

Next steps

The following articles provide more information about dataflows and Power BI: