Data Products in Microsoft Purview (Preview)

A data product is a group of data assets (tables, files, Power BI reports, and so on) packaged for the enterprise around a use case and shared with data consumers.

In Microsoft Purview, data governance isn't only a way to keep your data secure and compliant; it's also a tool to accelerate your data's business value. Cataloging the data in your estate makes it possible to manage that data for appropriate use, and it also provides a complete picture of your data landscape. Once every available data asset is listed, users no longer have to rely on networking or team knowledge to find what they need; they can search the catalog themselves. But a raw list of all available data is overwhelming and not inherently useful. Even with good descriptions, tagging, and glossary terms, it can be hard to know what you're looking for. And a complete data visualization probably needs several data assets, not just one. As your data catalog grows, context needs to grow alongside it so that users can easily find and request access to the data they need.

To provide scalable data context and access management, Microsoft Purview is introducing the data product.

What's a data product?

A data product is a business concept with a name, description, owners, and most importantly a list of associated data assets. The data product provides context for these assets, grouping them under a use case for data consumers. A business domain can house many data products, but each data product is managed by a single business domain and can be discovered across many domains.
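The relationships described above can be sketched as a small data model. This is an illustrative sketch only, not the Microsoft Purview API: the class names (`DataAsset`, `DataProduct`, `BusinessDomain`) and the `find_products` helper are hypothetical, chosen to mirror the concepts in this article.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    kind: str  # for example "table", "file", or "report"


@dataclass
class DataProduct:
    name: str
    description: str
    owners: list[str]
    domain: str  # a data product is managed by exactly one business domain
    assets: list[DataAsset] = field(default_factory=list)


@dataclass
class BusinessDomain:
    name: str
    products: list[DataProduct] = field(default_factory=list)

    def add_product(self, product: DataProduct) -> None:
        # A domain can house many products, but only ones it manages.
        if product.domain != self.name:
            raise ValueError(f"{product.name} is managed by {product.domain}")
        self.products.append(product)


def find_products(domains: list[BusinessDomain], keyword: str) -> list[DataProduct]:
    """Search across all domains: products are discoverable outside their own domain."""
    needle = keyword.lower()
    return [
        product
        for domain in domains
        for product in domain.products
        if needle in product.name.lower() or needle in product.description.lower()
    ]
```

In this sketch, single-domain management is enforced when a product is added, while discovery deliberately ignores domain boundaries.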

A successful data product makes it easy for data consumers to recognize valuable data using their day-to-day language, and at the same time streamlines ownership responsibilities for those data assets. Let's explore what that looks like.

Scalable data context

As an example, a data scientist creates a set of data assets for their data model to consume and they want others to be able to use the same dataset.

Without data products, the data scientist can use the data catalog to add a glossary term to all the relevant data assets. A user might not know which glossary term to search for, so it might also help to add a description to each data asset to make it more relevant in searches for similar information. But neither addition guarantees that other users will see all the associated data assets. They might include other assets that aren't as relevant, or miss a critical piece of data, and spend time repeating research the original data scientist has already performed.

With data products, the data scientist can create a data product that lists all the assets used to create their data model. The description provides a full use case, with examples or suggestions on how to use the data. The data scientist is now a data product owner, and they've improved their data consumers' search experience by helping them get everything they need in this one data product.

Scalable data governance

Data products streamline governance for data assets as well. Using the same example of a data scientist who creates a set of data assets:

Without data products, if a user wants access to the data assets for the data set, they must request access to each data asset individually. A data owner might know that these assets are being used for machine learning models, but if any changes are made to policies around their security and use cases, the data owner must go to each asset individually to make those updates.

With data products, a user who finds the data product can request access to the data product itself, which gives them access (after approval) to all the associated data assets. If more approval or data use policies are put in place around datasets for machine learning, a data owner only needs to apply the new policies to the data product, and they'll trickle down automatically to the assets.
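The difference between per-asset and per-product governance can be illustrated with a minimal sketch. It is a conceptual model only and makes no claim about how Microsoft Purview implements access or policy propagation; `GovernedProduct` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedProduct:
    name: str
    assets: list[str]
    policies: set[str] = field(default_factory=set)
    approved_users: set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        # One approval on the product, instead of one per asset.
        self.approved_users.add(user)

    def can_access(self, user: str, asset: str) -> bool:
        # Access to an asset is derived from access to the product.
        return asset in self.assets and user in self.approved_users

    def effective_policies(self, asset: str) -> set[str]:
        # Policies set on the product apply to every asset it lists.
        return set(self.policies) if asset in self.assets else set()
```

Because asset-level access and policies are computed from the product, adding a policy or approving a user is a single update regardless of how many assets the product lists.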

Data products are also associated with business health controls and OKRs. These controls allow data owners to assess data health, prioritize assets that need attention, and determine which data assets are providing business value. This not only supports progress toward complete data governance in your estate, but also encourages developing business value from your data. Assets are no longer abstract, but tied to real use cases and business objectives that your team can focus on.

Data access policies

Data security and access are the core tenet of successful data governance. But to implement data governance and successfully drive data use (and therefore value), the data access process needs to be secure, convenient, and customizable to all scenarios across your data estate. Some data should be widely usable and accessible, and some needs to be under rigorous approval and monitoring to ensure appropriate use.

Each data product has an access policy that determines how users request access, the terms of use for the data, and who should approve access to the data. Each of these access policies is customizable for appropriate use, and will evolve to cover more use cases in the future. All users need to do is select Request Access inside a data product, and they'll automatically be taken through the process to agree to the terms of use and get approval from the correct parties.
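The request flow described above (agree to the terms of use, then get approval) can be sketched as follows. This is a hypothetical model rather than the Purview access policy schema: `AccessPolicy`, `AccessRequest`, and `is_granted` are illustrative names, and requiring every listed approver to sign off is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    terms_of_use: str
    approvers: list[str]  # who should approve access to the data


@dataclass
class AccessRequest:
    user: str
    accepted_terms: bool = False
    approvals: set[str] = field(default_factory=set)


def is_granted(policy: AccessPolicy, request: AccessRequest) -> bool:
    # Assumption: access needs accepted terms plus sign-off from every approver.
    if not request.accepted_terms:
        return False
    return all(approver in request.approvals for approver in policy.approvers)
```

A policy with an empty `approvers` list models widely accessible data in this sketch: accepting the terms is enough. A longer list models data under more rigorous approval.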

For more information about access for data products, see the article on managing data catalog access policies in Microsoft Purview.

Next steps