Data catalog development best practices (Preview)

In this article, we review the data governance strategies you can implement to maintain healthy, valuable, discoverable data. For a list of technical steps to set up your data catalog, see our guide to get started with the data catalog.

  1. Know your data with business concepts
    1. Create business domains
    2. Create data products
    3. Define glossary terms
  2. Unlock business value
    1. Let users search and browse your business domains and data products
    2. Create OKRs
    3. Compliant data access
    4. Build logical data models with critical data elements
  3. Enhance data maturity
    1. Improve data products with governance-focused actions
    2. Improve the trustworthiness of your data with data quality
    3. Create source of truth data products with master data management
    4. Measure governance maturity with data health controls
    5. Build domain-specific standards

Know your data with business concepts

Business concepts like business domains, data products, and glossary terms unite your data with your day-to-day business practices. This not only makes it easier for your data consumers to understand the data they're using, but it allows you to democratize the data governance of those resources. Use your existing experts and data champions to build your data catalog into a rich resource.

Create business domains

Business domains are used both to distribute ownership and maintenance tasks and to make it easier for users to find the data that they need. Distributing information by business domains allows your users to reach the right level of information they need, without needing to traverse the entirety of your data estate.

When creating business domains, or reviewing your business domain structure, here are some things to consider:

  1. Business domain structure model
  2. Development planning

Business domain structure models

  • Central domain (good) - using a single domain can be efficient for small organizations, but might not scale well and is prone to bottlenecks during growth.
  • Department-based domains (good) - departments don't make decisions consistently, and if departments regularly shift, you might need to shift your data catalog structure.
  • Functional/line-of-business domains (better) - grants flexibility to teams, and aligns with the existing business model. This can be difficult to manage at scale, and might need many sub domains to empower data decision makers. It can also create data use silos, which is the antithesis of the data catalog's governance approach.
  • Domain mix (best) - having a combination of domains across subject areas/data domains, functional domains, regulatory domains, and project domains aligns your data to its experts. In the data catalog, your data experts are your most powerful resource; they know what policies need to be applied and what others need to know to make the most use of the data. This structure will also be the most durable to organizational updates, since it's based on how the data is used in the day-to-day, instead of on business structures.

Business domain development planning

  1. When you begin creating your business domains, start with a few domains aligned to teams that already have strong data stewardship:
    1. Assign data stewards and data product owners to your business domains, and have them begin development on a glossary and data products that align with their current practices.
    2. If needed, scan data into the data map in parallel to supplement your data products.
    3. Leave your business domain in a draft state until a few data products have been developed and are ready for users.
  2. Publish your business domain, and assign data catalog reader permissions to your first users to let them start exploring.
  3. With the feedback from your first batch of users, iterate on existing data products, or expand to your next data products or business domains.
  4. Starting with a few business domains that have mostly complete coverage with data products assures data consumers that the data catalog has what they need and that they can continue to come back.
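The draft-then-publish sequence above can be sketched as a simple readiness check. This is a minimal illustration only: the class names, the status values, and the reading of "a few data products" as three are all assumptions made for the example, not part of any data catalog API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    ready_for_users: bool = False


@dataclass
class BusinessDomain:
    name: str
    stewards: list[str] = field(default_factory=list)
    product_owners: list[str] = field(default_factory=list)
    data_products: list[DataProduct] = field(default_factory=list)
    status: str = "draft"  # hypothetical states: "draft" or "published"


# Assumption: "a few" data products is read as three; tune this for your rollout.
MIN_READY_PRODUCTS = 3


def publish_blockers(domain: BusinessDomain) -> list[str]:
    """Return the reasons a draft domain should stay in draft (empty if none)."""
    blockers = []
    if not domain.stewards:
        blockers.append("no data steward assigned")
    if not domain.product_owners:
        blockers.append("no data product owner assigned")
    ready = sum(1 for p in domain.data_products if p.ready_for_users)
    if ready < MIN_READY_PRODUCTS:
        blockers.append(f"only {ready} of {MIN_READY_PRODUCTS} data products ready")
    return blockers


# Hypothetical domain used purely for illustration.
finance = BusinessDomain(
    name="Finance",
    stewards=["steward@example.com"],
    product_owners=["owner@example.com"],
    data_products=[
        DataProduct("Monthly revenue report", ready_for_users=True),
        DataProduct("Curated ledger tables", ready_for_users=True),
        DataProduct("Currency reference data"),
    ],
)

blockers = publish_blockers(finance)
if not blockers:
    finance.status = "published"
print(finance.status, blockers)
```

A checklist like this is most useful as a conversation starter with data stewards; the threshold should follow what your first users actually need, not a fixed number.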

Tip

It's not recommended to align your business domains with your platform domains. IT is typically aligned with a technology structure or service/application, and isn't aligned with how data is used by business teams. Platform domains in the data map likely align to these technology teams, instead of your business teams. The goal of business domains is to align business users with the information that's most useful to them. Focus on data use, instead of data structure, to develop your business domains.

Create data products

Much of the data that is stored today has little to no known value and can take time and manual effort to evaluate and understand before it can be removed or improved. Focusing on the data with known value and use will enable more teams to build consistent value and show the benefits of having well-understood and highly utilized data. This drives further adoption of data governance practices and makes the effort to clean up data estates easier as the value of each data asset becomes clearer.

Focus on data resources that already exist in your organization. Adding these as data products in your data catalog will make it easier for your users to discover them. It will also make access more scalable, and improve trustworthiness with lineage, data quality, and accountability. Some examples of existing data resources are:

  1. Gold zone data lakes, highly curated SQL stores, curated data warehouses/data lakehouses that teams use to support their day-to-day practices.
  2. Reports that are used to make decisions.
  3. Data tables that are used in reporting environments.
  4. Master and reference data.

Data product development planning

  1. Planning data products should be part of your intake process when data sources are added to the Microsoft Purview Data Map. Data Product Owners should know which data stores are being registered and scanned, and which have data assets ready to be added to the data catalog.
  2. Build your first data products from core data assets that have been scanned into the data map.
  3. Publish your first data products when your users are ready to consume data within that domain.

Define glossary terms

When you're building terms, start with what you already know and continue to build value from your data to show where effort is the most impactful. Here are some tips you can follow when creating and managing glossary terms to create the most value.

  1. Giving data to the most passionate users demonstrates the ability to continue growing value and provides prioritization for more governance.
  2. Many business teams already have a glossary to help new employees to orient themselves to the business. Use these as some of your first term candidates to describe a business domain and its data.
  3. If you aren't sure if a term represents another concept (like an entity or business process), adding a term is a good place to start so the most basic metadata is collected. If needed, the term can be expired, and a new concept can be used to collect more metadata and drive the intended end-to-end experience.
  4. Once glossary terms are added, linking these terms to data products will improve the discoverability of data products and enhance the consumers' knowledge of the data.
  5. Periodically check the data products that are mapped to a term so that data stewards can better understand their use across the data estate.
  6. Term definitions can always be improved and edited. Waiting to publish a term until it's fully aligned will delay teams' use of the term and prevent new value creation or escalation of potential improvements.
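The periodic check of term-to-product mappings described above can be sketched as a small inversion of the links. The sketch assumes you have some export of which glossary terms are linked to each data product; the dictionary shape and all names below are hypothetical, not a catalog export format.

```python
from collections import defaultdict

# Hypothetical export: each data product with the glossary terms linked to it.
product_terms = {
    "Customer 360": ["Customer", "Active account"],
    "Churn dashboard": ["Customer", "Churn rate"],
    "Billing ledger": ["Invoice"],
}

# Hypothetical list of published glossary terms in the business domain.
glossary_terms = ["Customer", "Active account", "Churn rate", "Invoice", "Net revenue"]


def products_by_term(links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert product -> terms into term -> sorted list of products."""
    index = defaultdict(list)
    for product, terms in links.items():
        for term in terms:
            index[term].append(product)
    return {term: sorted(products) for term, products in index.items()}


usage = products_by_term(product_terms)

# Terms with no linked data products are candidates for linking or review.
unlinked = [t for t in glossary_terms if t not in usage]

for term in glossary_terms:
    print(f"{term}: {usage.get(term, [])}")
print("Terms with no linked data products:", unlinked)
```

Terms that map to many data products show where a shared definition matters most, while unlinked terms show where discoverability can still be improved.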

Term development planning

  1. Data stewards should learn the framework of the business domain, and then begin to add known terms and start to develop new ones.
  2. Term definitions should be developed and contain valuable information for consumers to understand their context and use.
  3. The first set of terms and data products should be published together for consumers to start their data use cases and discovery of data in the catalog.
  4. Building semantic knowledge never stops, so make a plan on how you can enable your team to continue to contribute terms throughout your governance lifecycle.

Unlock business value

Now that your basic data catalog structures are in place, it's time to start unlocking the value of your data by making it accessible to your users, and tying it directly to your business goals. Creating value from data comes from using that data, but using data means every person in the company needs to find the right data at the right time and in the right format to provide the needed insights or functionality. Data consumers are the key to making new business value from data.

Let users search and browse your business domains and data products

You've taken the time to build out business domains and data products, so give your data consumers access to use them and see how they do. Business users might be looking for strategic reports that are already available with the insights they need to make business decisions in a timely and well informed manner.

Here's how you can think about granting access to your users strategically:

  • Do not start by granting access to the data catalog to everyone at your company. Enable the teams that need the data you have in your catalog first. If your data products aren't available in the format data scientists need or the data isn't in predefined reports for business users, they'll lose trust in your catalog. Enable the right roles to use the catalog first to build the pathway to success.
  • Do start with the teams that need the data you have in your catalog first. Who did you build your data products for? Which teams helped develop your glossary terms? Those are good initial candidates.
  • Do start with analysts and data experts that can tell you where gaps exist in the catalog. They can help point to experts and business owners that can contribute to the data catalog. Over time, the completeness of the data catalog will be great enough that everyone in the company will be able to find most of their data needs.

Create OKRs

Demonstrate the business value of your data by building Objectives and Key Results and tying them to the data products that help to drive or measure that value. Ensuring business leaders appreciate the value of their data and the importance of governance will drive prioritization and new synergies in how teams build, maintain and govern their data to create insights.

Building out an objective provides immediate recognition of the importance of the data to the users and the business it drives. This greatly enhances the understanding of the role certain data plays in business processes or in the ability to achieve their goals.

  1. Consider OKRs for process improvements, quality issues, major strategic goals, and anything else that you would measure with data to demonstrate business value and change.
  2. Make sure to create key results for each objective to show how the objective is being measured and evaluated, and create accountability to meet that goal.
  3. There can be complex objectives that require many key results to accomplish. Key results might progress independently of one another, so measuring each one can show the areas that need prioritization or help to get back on track.
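To make the relationship between an objective and its independently progressing key results concrete, here is a minimal sketch. The classes, the unweighted-mean scoring rule, and the 0.5 attention threshold are assumptions for illustration; the data catalog's own OKR feature might track progress differently.

```python
from dataclasses import dataclass, field


@dataclass
class KeyResult:
    description: str
    start: float
    target: float
    current: float

    def progress(self) -> float:
        """Fraction of the way from start to target, clamped to [0, 1]."""
        span = self.target - self.start
        if span == 0:
            return 1.0
        return max(0.0, min(1.0, (self.current - self.start) / span))


@dataclass
class Objective:
    name: str
    key_results: list[KeyResult] = field(default_factory=list)

    def progress(self) -> float:
        """Unweighted mean of key result progress (an assumed scoring rule)."""
        if not self.key_results:
            return 0.0
        return sum(kr.progress() for kr in self.key_results) / len(self.key_results)

    def needs_attention(self, threshold: float = 0.5) -> list[str]:
        """Key results whose progress is below the (assumed) threshold."""
        return [kr.description for kr in self.key_results if kr.progress() < threshold]


# Hypothetical objective; a key result can also count downward (error rate).
objective = Objective(
    name="Reduce invoice processing errors",
    key_results=[
        KeyResult("Invoice error rate (%)", start=8.0, target=2.0, current=5.0),
        KeyResult("Invoices auto-matched (%)", start=60.0, target=90.0, current=66.0),
        KeyResult("Billing data quality score", start=70.0, target=95.0, current=95.0),
    ],
)

print(f"{objective.name}: {objective.progress():.0%} overall")
print("Needs attention:", objective.needs_attention())
```

Because each key result is measured on its own, the lagging one stands out even when the overall objective looks healthy.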

Compliant data access

Providing access to data can introduce risk to your company, and following known standards and policies is a must to ensure access is granted appropriately and that there's responsible use of data. Users in the data catalog can complete a form for data access at the time of discovery or data use. Keeping this form and process as a part of the catalog makes access secure, quick, and consistent for a highly variable and technical data estate. Here are some ways you can successfully set up access in your catalog:

  • Ensure the appropriate approvers are in place on data products and that they understand the processing needs of the data products.
  • Some data products might have hundreds or thousands of access requests, so a team distributed across time zones could be required to ensure timely access approval and provisioning.
  • Prepare groups or backup approvers in case there are vacations or unplanned time off.
  • Business domain owners should check on the access requests summary periodically to validate expectations and see if changes to controls monitoring the access request process are driving the desired response times.

Build logical data models with critical data elements

Improving deep technical understanding and expectations of data entities and elements will include new controls to assert whether the data is meeting those expectations. Creating data dictionaries and logical models of data provides the structure and deep business expectations of the data that ensure it's fit for its purpose. By incorporating this knowledge into the data catalog, teams will immediately gain an understanding of how data is structured and why, and how what is actually available in the physical data estate might differ.

  • Focus on the data elements that are most important for your domain. Critical data elements will show the deep expertise and importance that data has on your business.
  • Don't focus on the completeness of the elements across an entire domain. Not every column needs this level of control and many data elements might be self-explanatory for users.
  • Evaluating critical data elements across different teams ensures that business teams have a common understanding of their data and how what one team creates is impacting many other areas of the business.
  • Aligning access policies with critical data elements ensures proper access controls are in place for critical data across your whole data estate.
  • Building data quality rules for critical data elements ensures that data meets expectations no matter where or how it's being used.

Enhance data maturity

Improve your data estate and governance to fill gaps and remove bottlenecks to value creation:

  • Monitor your health actions to improve governance incrementally across your entire data catalog.
  • Optimize for new uses of the data and eliminate data issues by improving data quality.
  • Create best-in-class data products for single sources of truth with master data management.
  • Evaluate your data health and prioritize for the greatest value impact.

Investing deeply in the core data that runs your company ensures that this data is usable consistently across the entire business, eliminating data issues and providing a stable base for creating insights. Having evidence of data issues helps make data governance actionable, and it drives improvements that immediately unlock new value without investing in data areas that have low value or aren't fully understood yet. Continuously improving data maturity will help teams share learnings with each other and show the proof of the improvement as changes take place.

Improve data products with governance-focused actions

Building trust in data requires continuous improvement and support. While consumers will take time to find and apply data and to bring attention to issues or support needs, there are easy actions that can be taken ahead of time based on best practices. Health actions in data estate health provide a complete list of these useful actions for your data catalog, to help you focus on what you can do next to improve your governance. Here are some best practices for using health actions to get the most value:

  • Check the actions of your data products while they're still in a draft state. This ensures that when you do publish, the data product has the basics covered, and it provides comfort to consumers that this data was published with care.
  • Not all actions have to be taken at the same time. Some actions could take time to resolve as you learn more about the data or work with stewards to create more clarity. Keep checking on actions to see where new improvements are ready to be made.
  • If actions begin to seem overwhelming, unnecessary or like they're low value noise, consider making changes to your health controls. Optimizing the number of actions any person takes ensures that the right level of governance is being applied to data.

Improve the trustworthiness of your data with data quality

Too often, data quality is a one-off project to fix a particular problem in the data. These improvements don't last. Good data quality requires continuous evaluation and improvement to ensure problems won't return and new problems aren't created.

  • Once a baseline of data quality expectations has been defined, building a plan to remediate issues in a timely manner is essential to keeping the business functioning with data fit for use.
  • Scheduling your data quality scans to run regularly will help to assure consumers that data is continuously being improved and is highly supported.
  • Setting alerts on the critical rules and score changes will enable data providers to correct issues before a consumer finds or experiences an issue. Alerting can also be used to share issues transparently with consumers before they find them in an experience or make a decision based on poor quality data.
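The alerting idea above (critical rules plus score changes) can be sketched as a comparison between two consecutive scan results. Everything here is an assumption for illustration: the result shape, the 0-100 score scale, and the five-point drop threshold are hypothetical and don't describe the data quality feature's actual alert configuration.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    rule: str
    critical: bool
    passed: bool


@dataclass
class ScanResult:
    data_product: str
    score: float  # assumed 0-100 quality score
    rules: list[RuleResult]


# Assumption: a drop of 5 or more points between scans is worth an alert.
SCORE_DROP_THRESHOLD = 5.0


def alerts_for(previous: ScanResult, latest: ScanResult) -> list[str]:
    """Alert on failed critical rules and on large score drops between scans."""
    alerts = []
    for r in latest.rules:
        if r.critical and not r.passed:
            alerts.append(f"critical rule failed: {r.rule}")
    drop = previous.score - latest.score
    if drop >= SCORE_DROP_THRESHOLD:
        alerts.append(
            f"score dropped {drop:.1f} points ({previous.score:.1f} -> {latest.score:.1f})"
        )
    return alerts


# Hypothetical scan results for two consecutive scheduled scans.
last_week = ScanResult("Customer 360", 92.0, [])
this_week = ScanResult(
    "Customer 360",
    84.5,
    [
        RuleResult("customer_id is unique", critical=True, passed=False),
        RuleResult("email format is valid", critical=False, passed=False),
        RuleResult("country code is in reference list", critical=True, passed=True),
    ],
)

alerts = alerts_for(last_week, this_week)
for alert in alerts:
    print(f"[{this_week.data_product}] {alert}")
```

Limiting alerts to critical rules and meaningful score drops keeps the signal high, so data providers can act before consumers run into the issue.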

Create source of truth data products with master data management

Some data is so critical to nearly every process and the entire business that it deserves exceptional levels of management and governance. These data entities are usually cross-cutting entities like customer lists or employee profiles, and they can require deep business expertise and experience in many business processes. Some data is highly usable but low scale, and still benefits from the deeper level of control and management, like reference data attributes of country/region, currency, or industry segments. Each of these data types would benefit from master data management solutions to build a source of truth that is fit for use across your entire business.

  • Practicing master data management with data quality is crucial to ensure that this vital data is clean and consistent.
  • This level of data management is high effort, so choose valuable data elements or high-risk data elements to ensure your effort produces high value.
  • Create a critical data element and a data product for mastered data. These partner objects will help to elevate your mastered data in the data catalog and increase its use and understanding.
  • Build new health controls for master data to continuously evaluate its use at scale and prevent new unmastered data from gaining use and causing confusion in a quickly evolving data estate.

Measure governance maturity with data health controls

Evaluating the maturity of data governance at scale across the entire business is required to ensure governance is effective and creating business value. By applying the built-in measurement of controls, data estate health enables the central data office or an individual business domain to see where there's more that can be done. Collecting this evidence at scale quickly elevates the most critical data issues impacting the business and shows where one issue may be impacting many areas of the business. This evidence helps to resolve prioritization issues when making data management changes and quickly demonstrates the value of having the right level of governance in place.

  1. Establish a rhythm of business to review data estate health practices:
    1. Have a monthly review with business domain leaders and the central data office to discuss priorities and needs for new governance or technical solutions.
    2. Empower teams to dig deep into their data estate health reports to make sure they can make the best decisions to create the value they need in their business.
    3. Bring the data estate health to all levels of the business from the SLT to the individual steward to ensure that governance is right leveled and consistently actionable.
    4. Where data has larger issues requiring cross-business collaboration or deeper governance, consider creating a new business domain and defining ownership for driving the governance of that data.
  2. Don't expect all business domains to have the same level of maturity or be focused on the same aspects of governance:
    1. Enabling governance at the right level empowers business owners to make the most valuable decisions about what to do with their data.
    2. Not all parts of the business have the same needs of their data and forcing deeper levels of governance might not help to create business value when the focus is elsewhere.
    3. Some data is less valuable or emergent in the data estate and the value isn't yet fully known. When teams are enabled to move fast and adapt to their needs, they can mature their governance along with the value of the data.
  3. Consistently evaluate the data estate health to look for large changes that can indicate large issues or new learnings that need attention.
  4. Share your data estate health scores. Sharing can bring teams together to learn what works for them or how they're finding new controls to build new value within a domain. Seeing what 'good' health looks like can motivate other teams to improve and ensure they're also delivering valuable data to their consumers.

Build domain-specific standards

Ensuring data governance is right sized for the level of value and control required is best handled by the business owners of the data. These business teams already have dependencies on the data and are in the best position to define their expectations and needs to make sure data is valuable.

  • Empower business domains to create new controls for their data regardless of where the data is used.
  • Don't expect all business domains to need the same level of controls or to adopt all controls. Data that is confined for use to a single part of the business by design might not benefit from a high level of control. Creating more control over data that doesn't have the appropriate value might prevent teams from collecting or keeping data that isn't fully utilized.
  • Use the right level of control to help prioritize where low value data can be removed from the business domain to eliminate risk and increase the value of the data estate.