Configure and run data profiling for a data asset (Preview)
Data profiling is the process of examining the data available in different data sources and collecting statistics and information about that data. Data profiling helps you assess the quality of the data against a defined set of goals. If data is of poor quality, or is managed in structures that can't be integrated to meet the needs of the enterprise, business processes and decision-making suffer. Data profiling lets you understand the trustworthiness and quality of your data, which is a prerequisite for making data-driven decisions that boost revenue and foster growth.
Prerequisites
- To run and schedule data quality assessment scans, your users must be in the data quality steward role.
- Currently, the Microsoft Purview account must be set to allow public access so data quality scans can run.
Data quality life cycle
Data profiling is the fifth step in the data quality life cycle for a data asset. The previous steps are:
- Assign user(s) data quality steward permissions in your data catalog to use all data quality features.
- Register and scan a data source in your Microsoft Purview Data Map.
- Add your data asset to a data product.
- Set up a data source connection to prepare your source for data quality assessment.
Supported Azure storage asset types
- Azure Data Lake Storage Gen2
  - File types: Delta format
- Azure SQL Database
- Microsoft Fabric lakehouse (delta table)
Supported authentication methods
Currently, Microsoft Purview can only run data profiling and data quality assessments using managed identity as the authentication option. For more information about supported regions, see data quality overview.
Important
If the schema is updated on the data source, you must rerun the data map scan before running data profiling.
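As a minimal sketch of the check behind this note, the following Python compares the schema recorded at the last data map scan with the current schema of the source. The function names and the `{column_name: type_name}` representation are hypothetical and chosen for illustration only; they aren't part of any Microsoft Purview API.

```python
def schema_changes(scanned, current):
    """Compare two {column_name: type_name} mappings and list the differences."""
    added = sorted(set(current) - set(scanned))
    removed = sorted(set(scanned) - set(current))
    # Columns present in both schemas whose type name differs.
    retyped = sorted(c for c in set(scanned) & set(current) if scanned[c] != current[c])
    return {"added": added, "removed": removed, "retyped": retyped}


def needs_rescan(scanned, current):
    """Return True if any column was added, removed, or changed type."""
    return any(schema_changes(scanned, current).values())


# Hypothetical schemas, for illustration only.
scanned_schema = {"id": "int", "name": "string"}
current_schema = {"id": "bigint", "name": "string", "email": "string"}

changes = schema_changes(scanned_schema, current_schema)
rescan = needs_rescan(scanned_schema, current_schema)
```

If `rescan` is true, rerun the data map scan before you start the profiling job.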
Select the business domain
Configure a data source connection to the asset if you haven't already created one.
From Microsoft Purview Data Catalog, select the Data Management menu, and then select the Data quality submenu.
In the data quality submenu, select the Business domain for data profiling.
Select a data product to profile a data asset linked to that product.
Select a data asset to go to its data quality Overview page for profiling.
Select the Profile data button to run a profiling job for the selected data asset.
The AI recommendation engine suggests potentially important columns to run data profiling against. You can deselect recommended columns and/or select more columns to be profiled.
Once you've selected the relevant columns, select Run Profile.
While the job is running, you can track its progress from the data quality monitoring page in the business domain.
When the job is complete, select the Profile tab from the left menu of the asset's data quality page to browse the profiling results and statistical snapshot. There can be several profile result pages, depending on how many columns your data asset has.
Browse the profiling results and statistical measures for each column.
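To give a sense of what statistical measures for a column can look like, here's a minimal Python sketch that computes a few common profiling statistics for in-memory sample data. It's an illustration only: the `profile_column` function, the chosen measures, and the sample table are assumptions, and they don't reflect how Microsoft Purview computes or names its profiling results.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return basic profiling statistics for one column (a list of values)."""
    non_null = [v for v in values if v is not None]
    stats = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }
    # Min, max, and mean are only computed for purely numeric columns.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if non_null and numeric:
        stats["min"] = min(non_null)
        stats["max"] = max(non_null)
        stats["mean"] = mean(non_null)
    return stats


# Hypothetical sample data: column name -> list of values.
table = {
    "customer_id": [1, 2, 3, 4],
    "country": ["DE", "US", None, "US"],
}

profile = {name: profile_column(col) for name, col in table.items()}
```

Measures such as the null count and distinct count are typical starting points for deciding which data quality rules to define for a column.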
Next steps
- Set up data quality rules based on the profiling results, and apply them to your data asset.
- Configure and run a data quality scan on a data product to assess the quality of all supported assets in the data product.
- Review your scan results to evaluate your data product's current data quality.