Architect a classification schema for personal data

Previous articles in this series focus on using sensitive information types to identify personal data that is subject to GDPR. Sensitive information types are a form of classification. This might be all the classification you need. However, many organizations implement a broader data governance strategy using labels. Use this topic to decide if you also want to implement labels as part of your GDPR plan. If you do, this topic provides some guidance and examples.

Note: Defining a classification schema for an organization and configuring policies, labels, and conditions requires careful planning and preparation. This is not an IT-driven process. Be sure to work with your legal and compliance team to develop an appropriate classification and labeling schema for your organization’s data.

Decide if you are using labels in addition to sensitive information types

You can take one of two approaches to classifying personal information in Office 365. Either approach can be used for GDPR protection. If you decide to use only sensitive information types for classification, you can skip the rest of this topic.

Choose one of the following options.

Option 1: Use only Office 365 sensitive information types

  • Sensitive information types work well to identify and protect personal data subject to GDPR and other types of regulations.

  • These are simpler to use if your organization doesn’t already have or plan to implement a broader data governance plan using labels.

  • These work with DLP rules (so do Office labels).

  • In the future these will work with Cloud App Security so you can detect sensitive information in other SaaS apps.

Option 2: Use sensitive information types + Office labels

  • You’ll need sensitive information types to automatically apply labels to personal data that is subject to GDPR, so these are a prerequisite.

  • Using Office labels allows you to include personal data that is subject to GDPR in a broader data governance plan for your organization.

  • Later, Office labels will converge with Azure Information Protection labels into a unified classification and labeling engine.

Develop a label schema that includes personal data

Before using technical capabilities to apply labels and protection, first work across your organization to define a classification schema. Your organization might already have a classification schema, which makes it easier to add personal data. This topic includes an example classification schema. You can use this as a starting point, if needed.

Getting started

Begin by deciding on the number and names of labels to implement. Do this activity without worrying about which technology to use and how labels will be applied. Apply this schema universally throughout your organization, including data that resides on premises and in other cloud services.

Recommendations

When designing and implementing policies, labels, and conditions, consider the following recommendations:

  • Use the existing classification schema (if any). Many organizations already use data classification in some form. Carefully evaluate the existing label schema and, if possible, use it as is. Familiar labels that end users recognize will drive adoption.

  • Start with default policies and labels. All solutions come with a set of predefined policies and labels. Carefully evaluate these against the organization’s legal and business requirements and consider using them instead of creating new ones.

  • Start small. There is virtually no limit to the number of labels that can be created. However, large numbers of labels and sub-labels negatively impact adoption. Too many choices often means no choice at all.

  • Use scenarios and use cases. Identify common use cases within the organization and use scenarios derived from the GDPR to verify whether the envisioned label and classification configuration will work in practice.

  • Question every request for a new label. Does every scenario or use case really need a new label, or can you use what you already have? Keeping the number of labels to a minimum improves adoption.

  • Use sub-labels for key departments. Some departments have specific needs that require specific labels. Define these labels as sub-labels of an existing label and consider using scoped policies that are assigned to user groups instead of globally.

  • Consider scoped policies. Policies targeted at subsets of users prevent "label overload". A scoped policy assigns role-specific or department-specific (sub-)labels only to the employees who work in that role or department.

  • Use meaningful label names. Avoid jargon, standards, and acronyms as label names. Use names that resonate with the end user to improve adoption. Instead of labels like PII, PCI, HIPAA, LBI, MBI, and HBI, consider names like Non-Business, Public, General, Confidential, and Highly Confidential.
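The effect of scoped policies can be illustrated with a minimal Python sketch. It is not an Office 365 API: the LabelPolicy class, the visible_labels function, the group names, and the sub-label name "Confidential/HR data" are all hypothetical and exist only to show how a global policy and a department-scoped policy might combine into the list of labels a given user sees.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LabelPolicy:
    """Hypothetical policy: a set of labels published to an audience."""
    name: str
    labels: tuple[str, ...]
    # An empty set means the policy is global (published to everyone).
    scoped_to_groups: frozenset[str] = frozenset()


def visible_labels(user_groups: set[str], policies: list[LabelPolicy]) -> list[str]:
    """Return the labels a user sees, in policy order, without duplicates."""
    visible: list[str] = []
    for policy in policies:
        is_global = not policy.scoped_to_groups
        if is_global or policy.scoped_to_groups & user_groups:
            for label in policy.labels:
                if label not in visible:
                    visible.append(label)
    return visible


# Assumed example data: one global policy and one policy scoped to the HR group.
POLICIES = [
    LabelPolicy("Global", ("Personal", "Public", "Confidential", "Highly confidential")),
    LabelPolicy("HR scoped", ("Confidential/HR data",), frozenset({"HR"})),
]

print(visible_labels({"Sales"}, POLICIES))  # global labels only
print(visible_labels({"HR"}, POLICIES))     # global labels plus the HR sub-label
```

In this sketch, only members of the HR group see the extra sub-label, so other users keep a short list of choices.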

Example classification schema

Label name           Description
Personal             Non-business data, for personal use only.
Public               Business data that is specifically prepared and approved for public consumption.
Customer data        Business data that contains personally identifiable information. Examples are credit card numbers, bank account numbers, and social security numbers.
HR data              Human resources data about Contoso employees, such as employee number and salary data.
Confidential         Sensitive business data that could cause damage to the business if shared with unauthorized people. Examples include contracts, security reports, forecast summaries, and sales account data.
Highly confidential  Very sensitive business data that would cause damage to the business if it was shared with unauthorized people. Examples include employee and customer information, passwords, source code, and pre-announced financial reports.

Define a taxonomy and search criteria for each label

After developing a classification schema for your organization, the next step is to develop the taxonomy and search criteria for finding this data. For personal data, you’ve already completed this work by identifying sensitive information types and also by customizing or creating new sensitive information types for your environment.

The following table provides an example schema, taxonomy, and search criteria for an organization. The labels are ordered by sensitivity level from least sensitive to most sensitive to ensure that data that matches multiple label conditions is assigned the appropriate label.

Note: The configuration example is provided for illustration only and is not intended as deployment guidance or reference.

The important takeaway is to ensure that the work you invest to classify personal data for GDPR compliance fits together with the objectives for your entire organization.

Example schema, taxonomy, and search criteria

Label: Personal
  Taxonomy: Documents manually labeled Personal by the end user.
  Method: Manual
  Search syntax: Not applicable (documents are labeled manually by the end user).

Label: Public
  Taxonomy: Documents containing the case-insensitive phrase "Approved for Public Release ##/####", where # represents any digit.
  Method: KQL, RegEx
  Search syntax:
    KQL: Approved for Public Release*
    RegEx: (?i)(\bApproved for Public Release \d{2}\/\d{4}\b)

Label: Customer data
  Taxonomy: Sensitive information types for EU citizen data.
  Method: Sensitive information types

Label: Human Resources - Employee Data
  Taxonomy: Documents that include the case-sensitive employee ID in the format CONTOSO-9#####, where # represents any digit.
  Method: KQL, RegEx
  Search syntax:
    KQL: CONTOSO-9*
    RegEx: (\bCONTOSO-9\d{5}\b)

Label: Human Resources - Salary Data
  Taxonomy: Documents that include the keyword (not case sensitive) Contoso AND either keyword (not case sensitive) Salary OR Compensation.
  Method: KQL, RegEx
  Search syntax:
    KQL: Contoso AND (Salary OR Compensation)
    RegEx: (?is)(?=.*\bContoso\b)(?=.*\b(Salary|Compensation)\b)

Label: Confidential
  Taxonomy: Documents containing the phrase (not case sensitive) Contoso Confidential.
  Method: KQL, RegEx
  Search syntax:
    KQL: Contoso Confidential
    RegEx: (?i)(\bContoso Confidential\b)

Label: Highly confidential
  Taxonomy: Documents that include either phrase (case sensitive) Contoso Secret or Secret-C####, where # represents any digit.
  Method: KQL, RegEx
  Search syntax:
    KQL: Contoso Secret OR Secret-C*
    RegEx: (\bContoso Secret\b)|(\bSecret-C\d{4}\b)
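To show why the ordering by sensitivity matters, here is a minimal Python sketch that evaluates the example conditions from least to most sensitive and keeps the last label that matches. It is an illustration only, not how Office 365 evaluates label conditions. The RULES list and the classify function are hypothetical, and the patterns are written from the taxonomy descriptions above: the salary rule is expressed with lookaheads as one possible reading of "Contoso AND (Salary OR Compensation)", and the Highly confidential rule is treated as case sensitive, as its description states. Python's re module stands in for the service's own regular expression engine, which may behave differently.

```python
import re

# Rules ordered from least to most sensitive (an assumption mirroring the example).
RULES = [
    ("Public",
     re.compile(r"(?i)\bApproved for Public Release \d{2}/\d{4}\b")),
    ("Human Resources - Employee Data",
     re.compile(r"\bCONTOSO-9\d{5}\b")),
    ("Human Resources - Salary Data",
     re.compile(r"(?is)(?=.*\bContoso\b)(?=.*\b(?:Salary|Compensation)\b)")),
    ("Confidential",
     re.compile(r"(?i)\bContoso Confidential\b")),
    ("Highly confidential",
     re.compile(r"\bContoso Secret\b|\bSecret-C\d{4}\b")),
]


def classify(text):
    """Return the most sensitive matching label, or None if no rule matches."""
    label = None
    for name, pattern in RULES:
        if pattern.search(text):
            # Later rules are more sensitive, so they override earlier matches.
            label = name
    return label


print(classify("Approved for public release 03/2018"))
print(classify("Contoso Confidential. Reference: Secret-C1234"))
print(classify("Lunch menu for Friday"))
```

Because the rules are walked in order and later matches overwrite earlier ones, a document that satisfies both the Confidential and the Highly confidential conditions ends up with the more sensitive label.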