Export table data to Azure Data Lake Storage Gen2
Effective November 2020:
- Common Data Service has been renamed to Microsoft Dataverse. Learn more
- Some terminology in Microsoft Dataverse has been updated. For example, entity is now table and field is now column. Learn more
This article will be updated soon to reflect the latest terminology.
The Export to Data Lake service is a pipeline to continuously export data from Microsoft Dataverse to Azure Data Lake Storage Gen2. The Export to Data Lake service is designed for enterprise big data analytics by delivering scalable high availability with disaster recovery capabilities. Data is stored in the Common Data Model format, which provides semantic consistency across apps and deployments.
Export to Data Lake provides these features:
- Linking or unlinking the Dataverse environment to Data Lake Storage Gen2 in your Azure subscription.
- Continuous replication of tables to Data Lake Storage Gen2.
- Initial write, followed by incremental writes for data and metadata.
- Replication of both standard and custom tables.
- Replication of create, update, and delete (CrUD) transactions.
- Continuous snapshot updates for large analytics scenarios.
- Facilitated metadata discovery and interoperability between data producers and consumers such as Power BI, Azure Data Factory, Azure Databricks, and Azure Machine Learning.
How data and metadata are exported
The Export to Data Lake service supports initial and incremental writes for table data and metadata. Any data or metadata changes in Dataverse are automatically pushed to the data lake without any additional action. This is a push, rather than pull, operation. Changes are pushed to the destination without you needing to set up refresh intervals.
Both standard and custom tables can be exported. Notice that the change tracking table attribute in Dataverse is used to keep the data synchronized in an efficient manner by detecting what data has changed since it was initially extracted or last synchronized.
All create, update, and delete operations are exported from Dataverse to the data lake. For example, when a user deletes an Account table row in Dataverse, the transaction is replicated to the destination data lake.
Before you can export Dataverse data to a data lake, you must create and configure an Azure Storage v2 (general-purpose v2) storage account.
Follow the steps in the Create an Azure Storage account article, and note these requirements:
- You must be granted an owner role on the storage account.
- Set your storage type as Storagev2 (general purpose v2).
- The storage account must have the Hierarchical namespace feature enabled.
We recommend that you set replication to read-access geo-redundant storage (RA-GRS). More information: Read-access geo-redundant storage
- The storage account must be created in the same Azure Active Directory (Azure AD) tenant as your Power Apps tenant.
- The storage account must be created in the same region as the Power Apps environment you'll use the feature in.
- To link the Dataverse environment to Azure Data Lake Storage Gen2, you must be a Dataverse administrator.
- Only tables that have change tracking enabled can be exported.
Select and export Dataverse table data to Azure Data Lake Storage Gen2
Sign in to Power Apps, expand Data, and then select Tables.
On the command bar, select Export to data lake, and then on the Export to data lake page, select New link to data lake.
Select each of the following settings, and then select Next:
- Subscription. Select your Azure subscription.
- Resource group. Select the resource group that contains the Storage v2 (general-purpose v2) storage account.
- Storage account. Select the Storage v2 (general-purpose v2) storage account to use for the export.
As part of linking the Dataverse environment to a data lake, you grant the Export to Data Lake service access to your storage account. Ensure that you followed the prerequisites of creating and configuring the Azure data lake storage account, and granting yourself an owner role on the storage account. Additionally, you grant the Power Platform Dataflows service access to your storage account. More information: Self-service data prep with dataflows.
Select the tables that you want to export to the data lake, and then select Save. Only tables with change tracking enabled can be exported. More information: Enable change tracking
Your Dataverse environment is linked to the Azure Data Lake Storage Gen2 account. The file system in the Azure storage account is created with a folder for each table selected to be replicated to the data lake.
You can follow the steps above to create a link from one Dataverse environment to multiple Azure data lakes in your Azure subscription. Similarly, you could create a link from multiple Dataverse envrionments to the same Azure Data Lake, all within the same tenant.
The data exported by Export to data lake service is encrypted at rest in Azure Data Lake Storage Gen2. Additionally, transient data in the blob storage is also encrypted at rest. Encryption in Azure Data Lake Storage Gen2 helps you protect your data, implement enterprise security policies, and meet regulatory compliance requirements. More information: Azure Data Encryption-at-Rest
Currently, you can't provide public IPs for Export to data lake service that can be used in Azure Data Lake firewall settings. Public IP network rules have no effect on requests originating from the same Azure region as the storage account. Services deployed in the same region as the storage account use private Azure IP addresses for communication. Thus, you can't restrict access to specific Azure services based on their public outbound IP address range. More information: Configure Azure Storage firewalls and virtual networks
Manage table data to the data lake
After you've set up data export to Azure Data Lake Storage Gen2 in your subscription, you can manage the export of table data to the data lake in one of two ways:
On the Power Apps maker portal Export to data lake area, select Manage tables on the command bar to add or remove one or more linked tables.
On the Power Apps maker portal Tables area, select … next to a table, and then select the linked data lake where you want to export table data.
To unlink all linked tables, on the Power Apps maker portal Export to data lake area, select Unlink data lake.
View your data in Azure Data Lake Storage Gen2
- Sign in to Azure, select the storage account, and then in the leftmost navigation pane, select Storage Explorer.
- Expand File Systems, and then select commondataservice-environmentName-org-Id.
The model.json file, along with its name and version, provides a list of tables that have been exported to the data lake. The model.json file also includes the initial sync status and sync completion time.
A folder that includes snapshot comma-delimited (CSV format) files is displayed for each table exported to the data lake.
Continuous snapshot updates
Dataverse data can continuously change through create, update, and delete transactions. Snapshots provide a read-only copy of data that's updated at regular intervals, in this case every hour. This ensures that at any given point, a data analytics consumer can reliably consume data in the lake.
When tables are added as part of the initial export, the table data is written to the table.csv files under the corresponding folders in the data lake. This is the T1 interval, where a snapshot read-only file named table-T1.csv—for example, Account-T1.csv or Contacts-T1.csv—is created. Additionally, the model.json file is updated to point to these snapshot files. Opening model.json, you can view the snapshot details.
Here's an example of an Account.csv partitioned file and snapshot folder in the data lake.
Changes in Dataverse are continuously pushed to the corresponding CSV files by using the trickle feed engine. This is the T2 interval, where another snapshot is taken. table-T2.csv—for example, Accounts-T2.csv or Contacts-T2.csv (assuming there are changes for the table) —and model.json are updated to the new snapshot files. Any new person who views snapshot data from T2 onward is directed to the newer snapshot files. This way, the original snapshot viewer can continue to work on the older snapshot T1 files while newer viewers can read the latest updates. This is useful in scenarios that have longer-running downstream processes.
Here's an example of the model.json file, which always points to the latest time-stamped account snapshot file.
Transporting an Export to Data Lake configuration across environments
In Power Apps, solutions are used to transport apps and components from one environment to another, or to apply a set of customizations to existing apps. To make the Export to Data Lake configurations solution-aware, import the Export to Data Lake Core solution into the environment. This enables basic application lifecycle management (ALM) abilities such as distribution, and backup and restore of the Export to Data Lake configuration.
Import the Export to Data Lake Core solution
- From the Power Apps maker portal, select the environment where you want to distribute the Export to Data Lake configuration.
- On the leftmost navigation pane, select Solutions, select Open AppSource, search for the solution named Export to Data Lake Core, and then import the solution.
- Repeat above steps in the destination environment. You need the Export to Data Lake Core solution in both, the source and destination environments.
Add an Export to Data Lake configuration to a solution
Before you can add an Export to Data Lake configuration, you must install the Export to Data Lake Core solution described earlier.
- From the Power Apps maker portal, select the environment where you want to distribute the Export to Data Lake configuration, and then on the leftmost navigation pane, select Solutions.
- Select New solution, provide a name, select a publisher, and then specify a version number.
- Open the solution you created in the previous step, select Add existing > Other > Export to data lake config.
- Select the linked data lake configurations that you want, and then select Add.
- In the Solutions area, select the solution, and then on the command bar, select Export.
- In the Before you export pane, select Publish to publish all changes before you export, and then select Next.
Import the solution that contains the Export to Data Lake configuration
In the environment where you want to import your solution, in the Power Apps maker portal Solutions area, import the solution.
Verify the Export to Data Lake configuration
From the Power Apps maker portal in the environment where you imported the Export to Data Lake configuration, verify that you can see your linked data lake in addition to the tables that you transported from your other environment.
After successfully using the Export to Data Lake service, discover how you can analyze and consume your data with Discover Hub. To access Discover Hub, go to Power Apps > Export to data lake. Select your linked service and then select the Discover Hub tab. Here you can find recommended tools and curated documentation to help you get the most value out of your data.