Composite model guidance in Power BI Desktop

This article targets data modelers developing Power BI Composite models. It describes Composite model use cases, and provides you with design guidance. Specifically, the guidance is to help you determine whether a Composite model is appropriate for your solution. If it is, then this article will also help you design an optimal model.

Note

An introduction to Composite models is not covered in this article. If you're not completely familiar with Composite models, we recommend you first read the Use composite models in Power BI Desktop article.

Because Composite models consist of at least one DirectQuery source, it's also important that you have a thorough understanding of model relationships, DirectQuery models, and DirectQuery model design guidance.

Composite model use cases

Whenever possible, it's best to develop a model in Import mode. This mode provides the greatest design flexibility, and best performance.

However, challenges related to large data volumes, or reporting on near real-time data, cannot be solved by Import models. In either of these cases, you can consider a DirectQuery model, providing your data is stored in a single data source that's supported by DirectQuery mode.

Further, you can consider developing a Composite model in the following situations.

  • Your model could be a DirectQuery model, but you want to boost performance. In a Composite model, performance can be improved by configuring appropriate storage for each table. You can also add aggregations. Both of these optimizations are discussed later in this article.
  • You want to combine a DirectQuery model with additional data, which must be imported into the model. Imported data can be loaded from a different data source, or from calculated tables.
  • You want to combine two or more DirectQuery data sources into a single model.

Note

Composite models cannot combine connections to external analytic databases. These include live connections to external-hosted models, Power BI datasets, SAP Business Warehouse, and SAP HANA when treating SAP HANA as a multidimensional source.

Optimize model design

A Composite model can be optimized by configuring table storage modes, and by adding aggregations.

Table storage mode

In a Composite model, you can configure the storage mode for each table (except calculated tables):

  • DirectQuery: We recommend you set this mode for tables that represent large data volumes, or which need to deliver near real-time results. Data will never be imported into these tables. Usually, these tables will be fact-type tables—tables used for summarization.
  • Import: We recommend you set this mode for tables that are not used for filtering and grouping of fact tables in DirectQuery or Hybrid mode. It's also the only option for tables based on sources not supported by DirectQuery mode. Calculated tables are always Import tables.
  • Dual: We recommend you set this mode for dimension-type tables, when there's a possibility they'll be queried together with DirectQuery fact-type tables from the same source.
  • Hybrid: We recommend you configure this mode by adding Import partitions, as well as one DirectQuery partition to a fact table if you want to include the latest data changes in real time, or if you want to provide fast access to the most frequently used data through Import partitions while leaving the bulk of more infrequently used data in the data warehouse.

There are several possible scenarios when Power BI queries a Composite model:

  • Queries only Import or Dual table(s): All data is retrieved from the model cache. It will deliver the fastest possible performance. This scenario is common for dimension-type tables queried by filters or slicer visuals.
  • Queries Dual table(s) or DirectQuery table(s) from the same source: All data is retrieved by sending one or more native queries to the DirectQuery source. It will deliver good performance, especially when appropriate indexes exist on the source tables. This scenario is common for queries that relate Dual dimension-type tables and DirectQuery fact-type tables. These queries are intra source group, and so all one-to-one or one-to-many relationships are evaluated as regular relationships.
  • Queries Dual table(s) or Hybrid table(s) from the same source: This is a combination of the previous two scenarios. The data is retrieved from the model cache if it is available in Import partitions, otherwise by sending one or more native queries to the DirectQuery source. It will deliver the fastest possible performance because only a slice of the data is queried in the data warehouse, especially when appropriate indexes exist on the source tables. As for the Dual dimension-type tables and DirectQuery fact-type tables, these queries are intra source group, and so all one-to-one or one-to-many relationships are evaluated as regular relationships.
  • All other queries: These queries involve cross source group relationships. It's either because an Import table relates to a DirectQuery table, or a Dual table relates to a DirectQuery table from a different source—in which case it behaves as an Import table. All relationships are evaluated as limited relationships. It also means that groupings applied to non-DirectQuery tables must be sent to the DirectQuery source as a virtual table. In this case, the native query can be inefficient, especially for large grouping sets. And, it has the potential to expose sensitive data in the native query.

In summary, we recommend that you:

  • Consider carefully that a Composite model is the right solution—while it allows model-level integration of different data sources, it also introduces design complexities with possible consequences
  • Set the storage mode to DirectQuery when a table is a fact-type table storing large data volumes, or it needs to deliver near real-time results
  • Consider using Hybrid mode by defining an incremental refresh policy and real-time data or by partitioning the fact table through TOM, TMSL, or a third-party tool. For more information, see Incremental refresh and real-time data for datasets.
  • Set the storage mode to Dual when a table is a dimension-type table, and it will be queried together with DirectQuery or Hybrid fact-type tables based on the same source
  • Configure appropriate refresh frequencies to keep the model cache for Dual and Hybrid tables (and any dependent calculated tables) in sync with the source database(s)
  • Strive to ensure data integrity across data sources (including the model cache)—limited relationships will eliminate rows when related column values don't match
  • Optimize DirectQuery data sources with appropriate indexes for efficient joins, filtering and grouping
  • Don't load sensitive data into Import, Dual or Hybrid tables if there's risk of a native query being intercepted—for more information, see Use composite models in Power BI Desktop (Security implications)

Aggregations

You can add aggregations to DirectQuery tables in your Composite model. Aggregations are cached in the model, and so they behave as Import tables (although they can't be used like a model table). Their purpose is to improve performance for "higher grain" queries. For more information, see Aggregations in Power BI Desktop.

Note

Hybrid tables do not support aggregations because some of the partitions operate in Import mode. It is not possible to add aggregations at the level of an individual DirectQuery partition.

We recommend that an aggregation table follows a basic rule: Its row count should be at least a factor of 10 smaller than the underlying table. For example, if the underlying table stores 1 billion rows, then the aggregation table shouldn't exceed 100 million rows. This rule ensures there's an adequate performance gain relative to the cost of creating and maintaining the aggregation table.

Next steps

For more information related to this article, check out the following resources: