Section 3: Publish your data products

Creating data products is essential to ensuring that the right data is discoverable across your organization. Data products help prevent over-governing data in your data estate that has little or no value. Enabling your data experts to publish data products activates your most valuable data and builds the right level of governance based on that value. Curating assets whose business purpose the technical teams don't know, or trying to govern everything in a complex and growing data estate, costs time and productivity: you end up chasing down the details of data that might never be used or that could simply be removed from the estate. Instead, focus on the data that has value and that people need to discover in order to build even more value. As teams use more data and gain a better understanding of what is needed or most useful, more data products can be created to meet those demands, and governance can adapt so that it always stays the right size for the value and sensitivity of the data.

Prerequisites

Create and publish a data product

  1. Open the Microsoft Purview portal.

  2. Select the Data Catalog.

  3. Select Data management and then Business domains.

  4. From the Business domains page, select the Personal Health domain.

  5. Select the Go to data products link under Business concepts.

  6. This is where data experts, called data product owners, identify the data assets intended for use by others in your organization and provide the information needed to make them usable.

  7. Select + New data product.

  8. Provide details about the data product:

    1. Name: 'Covid-19 Vaccination and Case Trending by Age'
    2. Description: 'This data comes from the CDC as a part of the U.S. Department of Health & Human Services. The data contains trends in vaccinations and cases by age group, at the US national level. Data is stratified by at least one dose and fully vaccinated. Data also represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.'
    3. Type: Dataset
    4. Select Next.
    5. Use cases: 'This data is provided for public use and is intended to help understand the trends of vaccination uptake and new cases by different age groups. The ages are banded into two groups ranging from <2 years to 65+ years. Similarly, the trends are provided in daily numbers that provide a seven-day average of new cases by age group.'
    6. Select the Mark as Endorsed checkbox.
    7. Select Save.
  9. Now you have the base metadata of the data product built out. Next, add some properties and map the asset from the data map.

    Screenshot of selecting assets to add to a data product.

  10. Select the + Add data assets button.

  11. You'll see the assets you have scanned into the data map, including all folders and layers of the data source.

  12. Search for the Covid19 Vaccine and Case Trends asset you added to the gold container of your data lake and select this resource set.

  13. Select Add. You can select as many assets as a data product needs, but only one is needed here.

    Tip

    Try the Get suggestions button to have GenAI help pick from the assets in your data map, and then select the Covid19 Vaccine and Case Trends asset from the reduced list of results.

  14. You can now see the asset added to your data product.

  15. Select + Add term next to the Glossary terms title.

    Screenshot of adding a glossary term to the data product.

  16. Select the Outbreak term created earlier, and then select Add.

    Screenshot of selecting a glossary term.

  17. You should see the critical data element for age group from the asset mapped to the data product now.

  18. Select + Add OKR next to the OKR title.

  19. Select the Reduce pandemic risk by enabling effective patient vaccine uptake objective, which we created in the first section.
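Before entering these values in the portal, it can help to draft the metadata and check that nothing is missing. The following is a minimal Python sketch of that idea, not a Microsoft Purview API: the `DataProductDraft` class, its field names, and the `missing_fields` helper are hypothetical names chosen for illustration. The values come from the steps above, with the description and use cases shortened. The sketch treats every field filled in during this walkthrough as expected, although the portal itself may not require all of them.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDraft:
    """Hypothetical local record of the fields entered in the portal (not a Purview API)."""

    name: str
    description: str
    product_type: str
    use_cases: str
    endorsed: bool = False
    assets: list[str] = field(default_factory=list)
    glossary_terms: list[str] = field(default_factory=list)
    okrs: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty, in a fixed order."""
        checks = {
            "name": self.name,
            "description": self.description,
            "product_type": self.product_type,
            "use_cases": self.use_cases,
            "assets": self.assets,
            "glossary_terms": self.glossary_terms,
            "okrs": self.okrs,
        }
        # Empty strings and empty lists are both falsy.
        return [key for key, value in checks.items() if not value]


draft = DataProductDraft(
    name="Covid-19 Vaccination and Case Trending by Age",
    description="Trends in vaccinations and cases by age group, at the US national level.",
    product_type="Dataset",
    use_cases="Understand trends in vaccination uptake and new cases by age group.",
    endorsed=True,
    assets=["Covid19 Vaccine and Case Trends"],
    glossary_terms=["Outbreak"],
    okrs=["Reduce pandemic risk by enabling effective patient vaccine uptake"],
)

print(draft.missing_fields())  # prints [] because every field is filled in
```

A draft that skips the asset, glossary term, and OKR steps would report those three field names, which is a quick way to spot an incomplete data product before you publish it.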

Manage data product access request policies

The last step before publishing the data product is to select the Manage policies button at the top of the page. Here you configure the access policies and the access request workflow by making selections and naming the approvers. You can also use the Inherited policies tab to see the data copies attestation policy that we applied earlier on the business domain, as well as the Manager approval required policy that comes from the Outbreak glossary term.

  1. Select the Manage policies tab.

    Screenshot of managing data product policies.

  2. Under Access time limit, specify how long granted access lasts before it needs to be renewed. We'll set this to grant access for up to one year.

  3. In the box, put 1.

  4. Select years in the drop-down.

  5. Under Approval requirements, enter your name in the approvers box. (Use the name registered in Microsoft Entra ID.)

    Note

    We don't need to check manager approval because we inherit that policy from the Outbreak glossary term.

  6. Select the Preview request form button to see what the catalog consumers will view when requesting access. You'll see the data copy attestation and manager approval required because they were set by the business domain and glossary term.

    Screenshot of the preview of an access request form.

  7. Select Save changes.
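To make the inheritance described above concrete, here is a small Python sketch that treats the request form's requirements as the union of the policies set at each level, and computes the renewal date that a one-year access time limit implies. It's an illustration under a simplified model, not a description of how Microsoft Purview evaluates policies internally. The function names `effective_requirements` and `access_expiry`, the requirement labels, and the example grant date are all hypothetical.

```python
from datetime import date


def effective_requirements(*policy_levels: set[str]) -> set[str]:
    """Union the requirements set at each level (an assumed model of inheritance)."""
    combined: set[str] = set()
    for level in policy_levels:
        combined |= level
    return combined


def access_expiry(granted_on: date, years: int) -> date:
    """Return the date on which access lapses, `years` after it was granted."""
    try:
        return granted_on.replace(year=granted_on.year + years)
    except ValueError:
        # February 29 has no counterpart in a non-leap year, so fall back to February 28.
        return granted_on.replace(year=granted_on.year + years, day=28)


# Hypothetical labels for the three policy levels used in this tutorial.
business_domain = {"data copy attestation"}  # applied earlier on the business domain
glossary_term = {"manager approval"}         # inherited from the Outbreak term
data_product = {"named approver"}            # set in the steps above

print(sorted(effective_requirements(business_domain, glossary_term, data_product)))
# prints ['data copy attestation', 'manager approval', 'named approver']

# The grant date is arbitrary and used only for illustration.
print(access_expiry(date(2024, 3, 15), years=1))  # prints 2025-03-15
```

In this model, the data product only needs to add the named approver, because the attestation and manager approval arrive from the business domain and the glossary term.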

Once you have the data assets mapped and the access policies configured, you're ready to publish your data product to the catalog.

  1. Select Publish on the data product.

  2. Try creating more data products in the other domains you created earlier:

    1. Profit Report, Type: Dashboards/reports.
    2. Product Master, Type: Master data and reference data.

Note

You can add many assets to these data products to see how a data product with many assets looks. You can also map the data products to terms from any domain to see how the glossary describes the data using a consistent set of terms.

Next steps

Section 4 - Run data quality