Large datasets in Power BI Premium
Power BI datasets can store data in a highly compressed in-memory cache for optimized query performance, enabling fast user interactivity. With Premium capacities, large datasets beyond the default limit can be enabled with the Large dataset storage format setting. When enabled, dataset size is limited by the Premium capacity size or the maximum size set by the administrator.
Large datasets can be enabled for all Premium P SKUs, for Embedded A SKUs, and with Premium Per User (PPU). In terms of data model size limits, the large dataset size limit in Premium is comparable to that of Azure Analysis Services.
While the Large dataset storage format setting is required for datasets to grow beyond 10 GB, enabling it has other benefits. If you're planning to use XMLA endpoint-based tools for dataset write operations, be sure to enable the setting, even for datasets that you wouldn't necessarily characterize as large. When enabled, the large dataset storage format can improve the performance of XMLA write operations.
Large datasets in the service do not affect the Power BI Desktop model upload size, which is still limited to 10 GB. Instead, datasets can grow beyond that limit in the service on refresh.
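The size rules above can be summarized in a short Python sketch. The 10 GB figures and the 0.1 GB lower bound on an administrator-set maximum come from this article. The function names, treating the capacity size as a single number of gigabytes, and capping the non-large limit at capacity memory (because a single dataset must fit into memory) are assumptions made for illustration.

```python
from typing import Optional

# Figures taken from this article; all names are hypothetical.
DEFAULT_LIMIT_GB = 10.0          # largest dataset without the large storage format
DESKTOP_UPLOAD_LIMIT_GB = 10.0   # .pbix upload limit, unaffected by the setting
MIN_ADMIN_MAX_GB = 0.1           # smallest maximum an administrator can set


def effective_limit_gb(large_format: bool, capacity_gb: float,
                       admin_max_gb: Optional[float] = None) -> float:
    """Largest size a dataset may reach in the service, for example on refresh."""
    if not large_format:
        # Assumption: a dataset must also fit into capacity memory.
        return min(DEFAULT_LIMIT_GB, capacity_gb)
    if admin_max_gb is None:
        # No administrator maximum: the capacity size is the limit.
        return capacity_gb
    if not MIN_ADMIN_MAX_GB <= admin_max_gb <= capacity_gb:
        raise ValueError("maximum must be between 0.1 GB and the capacity size")
    return admin_max_gb


def can_publish_from_desktop(pbix_size_gb: float) -> bool:
    """Uploads from Power BI Desktop stay capped whatever the storage format."""
    return pbix_size_gb <= DESKTOP_UPLOAD_LIMIT_GB
```

For example, with an illustrative 50 GB capacity and an administrator maximum of 30 GB, `effective_limit_gb(True, 50.0, 30.0)` returns 30.0, while a 12 GB .pbix file still fails `can_publish_from_desktop`: the dataset can only grow past 10 GB in the service, on refresh.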
Power BI Premium supports large datasets. To use datasets in Power BI Premium that are larger than the default limit, enable the Large dataset storage format setting.
Enable large datasets
The following steps describe enabling large datasets for a new model published to the service. For existing datasets, only step 3 is necessary.
1. Create a model in Power BI Desktop. If your dataset will grow and progressively consume more memory, be sure to configure incremental refresh.
2. Publish the model as a dataset to the service.
3. In the service > dataset > Settings, expand Large dataset storage format, set the slider to On, and then click Apply.
4. Invoke a refresh to load historical data based on the incremental refresh policy. The first refresh could take a while to load the history. Subsequent refreshes should be faster, depending on your incremental refresh policy.
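Why the first refresh is slow and later ones are faster can be illustrated with a toy model of an incremental refresh policy. This Python sketch is not how Power BI stores partitions. It assumes a hypothetical policy expressed in whole months, where the first refresh loads every stored month and each later refresh reloads only a trailing window. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RefreshPolicy:
    """Hypothetical incremental refresh policy expressed in whole months."""
    store_months: int    # history kept in the dataset
    refresh_months: int  # trailing window reloaded by each later refresh


def months_ending(year: int, month: int, count: int) -> List[Tuple[int, int]]:
    """Return `count` (year, month) pairs ending at year/month, oldest first."""
    last = year * 12 + (month - 1)
    return [(i // 12, i % 12 + 1) for i in range(last - count + 1, last + 1)]


def partitions_to_load(policy: RefreshPolicy, year: int, month: int,
                       first_refresh: bool) -> List[Tuple[int, int]]:
    """The first refresh loads all history; later ones reload only recent months."""
    count = policy.store_months if first_refresh else policy.refresh_months
    return months_ending(year, month, count)
```

With `RefreshPolicy(store_months=36, refresh_months=3)`, a first refresh in June 2021 covers the 36 months from July 2018 through June 2021. Each later refresh reloads only April through June 2021, which is why it finishes sooner.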
Set default storage format
All new datasets created in a workspace assigned to Premium capacity can have the large dataset storage format enabled by default.
1. In the workspace, click Settings > Premium.
2. In Default storage format, select Large dataset storage format, and then click Save.
Enable with PowerShell
You can also enable large dataset storage format by using PowerShell. You must have capacity admin and workspace admin privileges to run the PowerShell cmdlets.
Find the dataset ID (GUID). On the Datasets tab for the workspace, under the dataset settings, you can see the ID in the URL.
From a PowerShell admin prompt, install the MicrosoftPowerBIMgmt module.
Install-Module -Name MicrosoftPowerBIMgmt
Run the following cmdlets to sign in and check the dataset storage mode.
Login-PowerBIServiceAccount
(Get-PowerBIDataset -Scope Organization -Id <Dataset ID> -Include actualStorage).ActualStorage
The response should look like the following. The storage mode is ABF (Analysis Services backup file), which is the default.
Id           StorageMode
--           -----------
<Dataset ID> Abf
Run the following cmdlets to set the storage mode. It can take a few seconds to convert to Premium Files.
Set-PowerBIDataset -Id <Dataset ID> -TargetStorageMode PremiumFiles
(Get-PowerBIDataset -Scope Organization -Id <Dataset ID> -Include actualStorage).ActualStorage
The response should look like the following. The storage mode is now set to Premium Files.
Id           StorageMode
--           -----------
<Dataset ID> PremiumFiles
You can check the status of dataset conversions to and from Premium Files by using the Get-PowerBIWorkspaceMigrationStatus cmdlet.
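The first step of the PowerShell procedure reads the dataset ID from the URL. The following Python sketch pulls that GUID out of a copied URL and builds the Set-PowerBIDataset command line used above. The URL shape is an assumption, so check the URL your browser actually shows. The helper names are hypothetical.

```python
import re
from typing import Optional

# Assumption: the dataset settings URL contains ".../datasets/<dataset GUID>".
# Anchoring on "/datasets/" skips the workspace GUID that follows "/groups/".
_DATASET_ID = re.compile(
    r"/datasets/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def dataset_id_from_url(url: str) -> Optional[str]:
    """Return the dataset GUID found in the URL in lowercase, or None."""
    match = _DATASET_ID.search(url)
    return match.group(1).lower() if match else None


def set_storage_command(dataset_id: str) -> str:
    """Build the conversion cmdlet from the steps above for one dataset."""
    return f"Set-PowerBIDataset -Id {dataset_id} -TargetStorageMode PremiumFiles"
```

Matching on the `/datasets/` prefix rather than on the first GUID in the URL avoids returning the workspace ID by mistake.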
Power BI uses dynamic memory management to evict inactive datasets from memory. Power BI evicts datasets so it can load other datasets to address user queries. Dynamic memory management allows the sum of dataset sizes to be significantly greater than the memory available on the capacity, but a single dataset must fit into memory. For more info on dynamic memory management, see How capacities function.
You should consider the impact of eviction on large models. Despite relatively fast dataset load times, users could still notice a delay if they have to wait for a large evicted dataset to be reloaded. For this reason, in its current form, the large models feature is recommended primarily for capacities dedicated to enterprise BI requirements rather than capacities mixed with self-service BI requirements. Capacities dedicated to enterprise BI requirements are less likely to trigger frequent eviction and dataset reloads. Capacities for self-service BI, on the other hand, can have many small datasets that are loaded into and out of memory more frequently.
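The following toy Python model illustrates the two rules above. The sum of dataset sizes may exceed capacity memory, but a single dataset must fit, and evicted datasets must be reloaded on their next query. Least-recently-used eviction is an assumption made for illustration, because this article does not say in which order Power BI evicts datasets. The class name and sizes are hypothetical.

```python
from collections import OrderedDict


class CapacityMemory:
    """Toy capacity that evicts inactive datasets to make room (LRU assumed)."""

    def __init__(self, memory_gb: float):
        self.memory_gb = memory_gb
        self.loaded = OrderedDict()  # dataset name -> size in GB, oldest first
        self.reloads = 0             # loads of datasets that were evicted earlier
        self._seen = set()

    def used_gb(self) -> float:
        return sum(self.loaded.values())

    def query(self, name: str, size_gb: float) -> None:
        """Serve a query, loading the dataset into memory if needed."""
        if size_gb > self.memory_gb:
            raise MemoryError(f"{name} cannot fit into capacity memory")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return
        while self.used_gb() + size_gb > self.memory_gb:
            self.loaded.popitem(last=False)  # evict the least recently used
        if name in self._seen:
            self.reloads += 1  # a user waits for this reload
        self._seen.add(name)
        self.loaded[name] = size_gb
```

On a hypothetical 25 GB capacity, querying datasets of 15, 8, and 6 GB (29 GB in total) evicts the 15 GB dataset. Querying it again evicts the 8 GB dataset and counts one reload. With a large dataset in a busy mixed capacity, each such reload is the delay users would notice.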
Checking dataset size
You can also check the dataset size by running the following DMV queries from SQL Server Management Studio (SSMS). Add the sum of the DICTIONARY_SIZE column from the first query to the sum of the USED_SIZE column from the second query to get the dataset size in bytes.
SELECT * FROM SYSTEMRESTRICTSCHEMA
    ($System.DISCOVER_STORAGE_TABLE_COLUMNS,
     [DATABASE_NAME] = '<Dataset Name>') //Sum DICTIONARY_SIZE (bytes)

SELECT * FROM SYSTEMRESTRICTSCHEMA
    ($System.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS,
     [DATABASE_NAME] = '<Dataset Name>') //Sum USED_SIZE (bytes)
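If you save each query's result as CSV (for example, by copying the result grid out of SSMS), the arithmetic can be scripted. This Python sketch is an illustration only. The sample rows, their values, and the trimmed column sets are made up, and real DMV output contains many more columns.

```python
import csv
import io
from typing import Iterable, Mapping


def sum_column(rows: Iterable[Mapping[str, str]], column: str) -> int:
    """Sum one numeric column, skipping blank cells."""
    return sum(int(row[column]) for row in rows if row.get(column))


def dataset_size_bytes(columns_csv: str, segments_csv: str) -> int:
    """DICTIONARY_SIZE from the first query plus USED_SIZE from the second."""
    columns = csv.DictReader(io.StringIO(columns_csv))
    segments = csv.DictReader(io.StringIO(segments_csv))
    return sum_column(columns, "DICTIONARY_SIZE") + sum_column(segments, "USED_SIZE")


# Hypothetical sample output, trimmed to a few columns.
COLUMNS_CSV = """TABLE_ID,COLUMN_ID,DICTIONARY_SIZE
Sales,Amount,1048576
Sales,OrderDate,2048
"""
SEGMENTS_CSV = """TABLE_ID,SEGMENT_NUMBER,USED_SIZE
Sales,0,4194304
Sales,1,1048576
"""

size = dataset_size_bytes(COLUMNS_CSV, SEGMENTS_CSV)
print(f"{size} bytes ({size / 1024**3:.4f} GiB)")
```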
Default segment size
For datasets using the large dataset storage format, Power BI automatically sets the default segment size to 8 million rows to strike a good balance between memory requirements and query performance for large tables. This is the same segment size as in Azure Analysis Services. Keeping the segment sizes aligned helps ensure comparable performance characteristics when migrating a large data model from Azure Analysis Services to Power BI.
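As a worked example of the 8-million-row default, this Python sketch estimates how many segments a table occupies. It assumes that each partition is split independently into full segments plus one partial segment for any remainder, and it ignores any adjustments the engine may make. Treat the result as an approximation. The function name is hypothetical.

```python
from typing import Iterable

SEGMENT_ROWS = 8_000_000  # default for the large dataset storage format


def segments_for_table(partition_rows: Iterable[int],
                       segment_rows: int = SEGMENT_ROWS) -> int:
    """Approximate segment count: ceiling of rows / segment_rows per partition."""
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return sum(-(-rows // segment_rows) for rows in partition_rows if rows > 0)
```

A single 1-billion-row partition gives 125 segments. Partitions of 20 million and 5 million rows give 3 + 1 = 4 segments, because each remainder occupies its own partial segment.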
Considerations and limitations
Keep in mind the following restrictions when using large datasets:
New workspaces are required: Large datasets work only with new workspaces.
Download to Power BI Desktop: If a dataset is stored on Premium Files, downloading as a .pbix file will fail.
Supported regions: Large datasets are supported in all Azure regions that support Premium Files Storage. To learn more, see Products available by region, and consult the table in the following section.
Setting maximum dataset size: Administrators can set a maximum dataset size anywhere from 0.1 GB up to the maximum capacity of the SKU.
Push datasets: Push datasets do not support the large dataset storage format.
REST API: You can't enable large datasets by using the REST API.
Large datasets in Power BI are only available in certain Azure regions that support Azure Premium Files Storage.
The following table lists the regions where large datasets in Power BI are available. Regions not in the table are not supported for large models.
Once a large dataset is created in a workspace, it must stay in that region. You cannot reassign a workspace with a large dataset to a Premium capacity in another region.
| Azure region | Azure region abbreviation |
|---|---|
| East US 2 | eastus2 |
| North Central US | northcentralus |
| South Central US | southcentralus |
| West US 2 | westus2 |
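The region rules can be checked in a script before you create a large dataset or reassign a workspace. This Python sketch copies the abbreviations from the table above, so extend the set if the table you work from lists more regions. It models only the rule stated here, that a workspace holding a large dataset can't move to a capacity in another region. Other constraints on reassigning workspaces are not modeled. The function names are hypothetical.

```python
# Abbreviations copied from the region table above.
LARGE_DATASET_REGIONS = frozenset(
    {"eastus2", "northcentralus", "southcentralus", "westus2"}
)


def supports_large_datasets(region: str) -> bool:
    """True if the region abbreviation appears in the table."""
    return region.strip().lower() in LARGE_DATASET_REGIONS


def region_rule_allows_move(has_large_dataset: bool, current_region: str,
                            target_region: str) -> bool:
    """Check only the large-dataset region rule for a workspace reassignment."""
    if not has_large_dataset:
        return True  # this rule does not apply; other rules are not modeled
    return current_region.strip().lower() == target_region.strip().lower()
```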
The following links provide information that can be useful for working with large models:
- Azure Premium Files Storage
- Configure Multi-Geo support for Power BI Premium
- Bring your own encryption keys for Power BI
- How capacities function
- Incremental refresh for datasets
Power BI has introduced Power BI Premium Gen2 as a preview offering, which improves the Power BI Premium experience in the following areas:
- Per-user licensing
- Greater scale
- Improved metrics
- Reduced management overhead
For more information about Power BI Premium Gen2, see Power BI Premium Generation 2.