Configure and run data profiling for a data asset (Preview)

Data profiling is the process of examining the data available in different data sources and collecting statistics and information about that data. It helps you assess the quality of the data against a defined set of goals. If data is of poor quality, or is managed in structures that can't be integrated to meet the needs of the enterprise, business processes and decision-making suffer. Data profiling lets you understand the trustworthiness and quality of your data, which is a prerequisite for making data-driven decisions that boost revenue and foster growth.
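To make "collecting statistics about the data" concrete, the following is a minimal Python sketch of the kind of per-column summary a profiler might compute. It is an illustration only, not how Microsoft Purview implements profiling; the function name profile_column, the dictionary keys, and the sample values are all assumptions made for this example.

```python
from statistics import mean


def profile_column(values):
    """Return simple profiling statistics for one column (a list of values).

    Hypothetical helper for illustration; assumes the non-null values in
    the column are all of one comparable type.
    """
    non_null = [v for v in values if v is not None]
    stats = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
    # Compute a mean only when every non-null value is numeric.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(non_null):
        stats["mean"] = mean(numeric)
    else:
        stats["mean"] = None
    return stats


# Made-up sample column with one missing value.
ages = [34, 41, None, 29, 41]
age_profile = profile_column(ages)
print(age_profile)
```

A summary like this shows at a glance how complete a column is (null count), how varied it is (distinct count), and whether its range looks plausible (minimum and maximum).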

Prerequisites

  • To run and schedule data quality assessment scans, your users must be in the data quality steward role.
  • Currently, the Microsoft Purview account must be set to allow public access so data quality scans can run.

Data quality life cycle

Data profiling is the fifth step in the data quality life cycle for a data asset. The previous steps are:

  1. Assign user(s) data quality steward permissions in your data catalog to use all data quality features.
  2. Register and scan a data source in your Microsoft Purview Data Map.
  3. Add your data asset to a data product.
  4. Set up a data source connection to prepare your source for data quality assessment.

Supported Azure storage asset types

  • Azure Data Lake Storage Gen2
    • File Types: Delta format
  • Azure SQL Database
  • Microsoft Fabric lakehouse (delta table)

Supported authentication methods

Currently, Microsoft Purview can only run data profiling and data quality assessments using managed identity as the authentication option. For more information about supported regions, see data quality overview.

Important

If the schema is updated on the data source, you must rerun the Data Map scan before running data profiling.
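As a rough illustration of the note above, the sketch below compares the schema recorded at the last scan with the current schema of the source to decide whether a rescan is needed. It is a hedged, standalone example and does not call any Microsoft Purview API; the function name schema_diff, the column-to-type dictionaries, and the sample schemas are assumptions made for this example.

```python
def schema_diff(scanned_schema, current_schema):
    """Compare two {column_name: data_type} mappings and report differences.

    Hypothetical helper for illustration only.
    """
    scanned_cols = set(scanned_schema)
    current_cols = set(current_schema)
    added = sorted(current_cols - scanned_cols)
    removed = sorted(scanned_cols - current_cols)
    # Columns present in both schemas whose data type has changed.
    retyped = sorted(
        name for name in scanned_cols & current_cols
        if scanned_schema[name] != current_schema[name]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Made-up schemas: one as last scanned, one as it exists now.
scanned = {"id": "int", "email": "string", "signup_date": "date"}
current = {"id": "bigint", "email": "string", "country": "string"}

diff = schema_diff(scanned, current)
# Any added, removed, or retyped column means the scan is out of date.
needs_rescan = any(diff.values())
print(diff, needs_rescan)
```

If any of the three lists is non-empty, the schema has drifted since the last scan, which is the situation in which the scan should be rerun before profiling.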

Select the business domain

  1. Configure a data source connection to the asset if you haven't already created one.

  2. From Microsoft Purview Data Catalog, select the Data Management menu, and then select the Data quality submenu.

  3. In the data quality submenu, select the Business domain for data profiling.

  4. Select a data product to profile a data asset linked to that product.

    Screenshot of the data quality menu, showing how to select a data product.

  5. Select a data asset to open its data quality Overview page for profiling.

    Screenshot of a data product with a data asset highlighted.

    Screenshot of the data asset overview tab, with the profile tab highlighted.

  6. Select the Profile data button to run a profiling job for the selected data asset.

  7. The AI recommendation engine suggests potentially important columns to profile. You can deselect recommended columns, select more columns to be profiled, or both.

    Screenshot of the profiling column suggestions.

  8. Once you've selected the relevant columns, select Run Profile.

  9. While the job is running, you can track its progress from the data quality monitoring page in the business domain.

  10. When the job is complete, select the Profile tab from the left menu of the asset's data quality page to browse the profiling results and statistical snapshot. There can be several profile result pages, depending on how many columns your data asset has.

    Screenshot of the profiling page with one column highlighted.

  11. Browse the profiling results and statistical measures for each column.

    Screenshot of the statistical snapshot for a single column.

Next steps

  1. Set up data quality rules based on the profiling results, and apply them to your data asset.
  2. Configure and run a data quality scan on a data product to assess the quality of all supported assets in the data product.
  3. Review your scan results to evaluate your data product's current data quality.
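As an illustration of the first step above, the following sketch shows how profiling statistics could hint at candidate data quality checks. It is a standalone example under stated assumptions, not a Microsoft Purview feature or API: the function name suggest_checks, the profile dictionary keys, and the wording of the suggestions are all hypothetical.

```python
def suggest_checks(column_name, profile):
    """Suggest candidate checks from a column profile dictionary.

    Hypothetical helper; expects the keys row_count, null_count,
    and distinct_count.
    """
    suggestions = []
    rows = profile["row_count"]
    non_null = rows - profile["null_count"]
    # A column with no missing values is a candidate for a completeness check.
    if rows and profile["null_count"] == 0:
        suggestions.append(
            f"{column_name}: no empty values observed; consider a completeness check"
        )
    # A column whose non-null values are all different may be a key.
    if non_null and profile["distinct_count"] == non_null:
        suggestions.append(
            f"{column_name}: all values distinct; consider a uniqueness check"
        )
    return suggestions


# Made-up profile for an identifier-like column.
profile = {"row_count": 1000, "null_count": 0, "distinct_count": 1000}
checks = suggest_checks("customer_id", profile)
for line in checks:
    print(line)
```

Suggestions like these are only starting points; whether a check is appropriate depends on what the column means to the business.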