Protect Your IP Using Document Fingerprints

In Exchange Online you can protect your intellectual property (IP) from being sent (leaked) in emails - commonly referred to as Data Loss Prevention (DLP).

Defining what you want to block from being sent usually involves a bit of work, for example writing patterns that match credit card numbers.

But DLP just got easier with what we call "Document Fingerprinting". In the same way that a person’s fingerprints have unique patterns, documents have unique word patterns.

When you upload a file, the DLP agent identifies the unique word pattern in the document, creates a document fingerprint based on that pattern, and uses that document fingerprint to detect outbound documents containing the same pattern.


To understand how this works, let’s take a look at an example scenario.

Contoso Pharma is a pharmaceutical company with a research division. Employees in the research division collaborate with their peers across the company to create new products and services, and file patents to protect their intellectual property. The law firm used by the company for patent filing uses a standard template for patent applications as shown below.

The patent template shown above contains the blank fields “Patent title,” “Inventors,” and “Description” and descriptions for each of those fields—that’s the word pattern.

When you upload the template, the DLP agent uses an algorithm to convert this word pattern into a document fingerprint: a small Unicode XML file containing a unique hash value that represents the original text. The fingerprint is saved as a data classification in Active Directory. As a security measure, the original document itself isn’t stored on the service; only the hash value is stored, and the original document can’t be reconstructed from the hash value.
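To make the idea of a hash-based fingerprint more concrete, here is a minimal Python sketch. It is not the algorithm Exchange Online uses (that isn't described here); it simply assumes a fingerprint is a set of SHA-256 hashes of overlapping three-word sequences. The function names and the sample template text are made up for illustration.

```python
import hashlib
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def fingerprint(text, size=3):
    """Hash every word sequence; only the hashes are kept, never the words."""
    return {
        hashlib.sha256(s.encode("utf-8")).hexdigest()
        for s in shingles(text, size)
    }


def match_score(template_fp, document_text, size=3):
    """Fraction of the template's hashes that also occur in the document."""
    if not template_fp:
        return 0.0
    shared = template_fp & fingerprint(document_text, size)
    return len(shared) / len(template_fp)


# Hypothetical template text, loosely modelled on the patent form fields.
template = (
    "Patent title: enter the title of the invention. "
    "Inventors: list the full name of each inventor. "
    "Description: describe the invention in detail."
)

# A filled-in copy of the template and an unrelated message.
filled = (
    "Patent title: enter the title of the invention. Faster tablet coating. "
    "Inventors: list the full name of each inventor. A. Researcher. "
    "Description: describe the invention in detail. A new coating process."
)
unrelated = "Lunch menu for Friday: soup, salad, and sandwiches."

template_fp = fingerprint(template)  # this set is all that would be stored

print(match_score(template_fp, filled))     # high: most of the pattern survives
print(match_score(template_fp, unrelated))  # no shared word sequences
```

The filled-in copy still scores high because most of the template's word sequences survive once the blanks are completed, while the stored set contains only hash values and none of the original wording.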

The patent fingerprint then becomes a sensitive information type that you can associate with a DLP policy.

After you associate the fingerprint with a DLP policy, the DLP agent detects any outbound emails containing documents that match the patent fingerprint and deals with them according to your organization’s policy (transport rules).

For example, you might want to set up a DLP policy that prevents regular employees from sending outgoing messages containing patents. The DLP agent will use the patent fingerprint to detect patents and block those emails. Alternatively, you might want to let your legal department send patents to other organizations because it has a business need for doing so. You can allow specific departments to send sensitive information by creating exceptions for those departments in your DLP policy, or you can allow them to override a policy tip with a business justification.
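The decision logic in that example can be sketched in a few lines of Python. This is only an illustration of the block / exception / override idea, not how Exchange evaluates transport rules; the class names, field names, and return values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender_department: str
    matches_fingerprint: bool          # result of the fingerprint check
    business_justification: str = ""   # text the sender typed to override


@dataclass
class Policy:
    exempt_departments: set = field(default_factory=set)
    allow_override: bool = False


def evaluate(policy, message):
    """Return the (hypothetical) action for an outbound message."""
    if not message.matches_fingerprint:
        return "deliver"
    if message.sender_department in policy.exempt_departments:
        return "deliver"  # exception, e.g. the legal department
    if policy.allow_override and message.business_justification.strip():
        return "deliver (override)"
    return "block"


policy = Policy(exempt_departments={"Legal"}, allow_override=True)

print(evaluate(policy, Message("Research", True)))
print(evaluate(policy, Message("Legal", True)))
print(evaluate(policy, Message("Research", True, "Filing with outside counsel")))
print(evaluate(policy, Message("Research", False)))
```

Checking the exception before the override mirrors the description above: an exempt department never sees the prompt, while everyone else has to supply a justification before the message goes out.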

How to create a Document Fingerprint

Say you’re an administrator at Contoso Pharma. You can use Document Fingerprinting to define a customized sensitive information type called “Sensitive Information” (or whatever you prefer).

To do so, you use the administrative interface in the Exchange Admin Center (EAC) to create a new document fingerprint.

1. Open EAC from your Office 365 portal

2. Click Compliance Management -> Data Loss Prevention -> Manage Document Fingerprints to open the "Document Fingerprints" dialog

3. Click "+" to open the "New Document Fingerprint" dialog

4. Type a name for the new fingerprint (e.g. "Sensitive Information") and a description

5. Click "+" to upload a document template

6. In the Explorer navigate to and select the file you want to fingerprint and click Open

7. Verify that the file is fingerprinted (uploaded) and click Save

8. Click Close to return to EAC

How to create a Transport Rule to take action on the fingerprinted document

Now that I have the Document Fingerprint "Sensitive Information" in my service, all I need to do next is create a Transport Rule to define what action I want to take if one of my users accidentally tries to send a document matching that template.

1. In EAC click Mail Flow -> Rules

2. Click "+" -> "Apply Rights Protection to Messages..." to open the "New Rule" dialog

3. Type a name for the new rule, and then click the "Apply This Rule If..." drop down to define the condition

4. Click "The Message..." -> "Contains Sensitive Information" to open the "Sensitive Information Types" dialog

5. Click "+" to open the list of sensitive information types on record

6. Scroll down in the list and select your recently created type (in this example the "Sensitive Information" type) 

7. Click Add

8. Verify the new sensitive type is now listed

9. Click the "Do the Following" drop down -> "Notify the Sender with a Policy Tip" (or any other action that suits your needs)

10. Complete the actions in the "New Rule" dialog and click Save

From now on, if any of your users tries to attach a patent document to an outgoing email, they get a warning.

Although the scenario above refers to patents, you can easily imagine document fingerprinting being used to detect sensitive information in many other circumstances, such as a hospital fingerprinting custom forms that contain personal health information.

See also

  • Integrating Sensitive Information Rules with Transport Rules - link
  • Data loss prevention in Exchange just got better - link
  • Document Fingerprinting - link