Customize a built-in sensitive information type
When looking for sensitive information in content, you need to describe that information in what's called a rule. Data loss prevention (DLP) includes rules for the most common sensitive information types that you can use right away. To use these rules, you have to include them in a policy. You might find that you want to adjust these built-in rules to meet your organization's specific needs, and you can do that by creating a custom sensitive information type. This topic shows you how to customize the XML file that contains the existing rule collection to detect a wider range of potential credit-card information.
You can take this example and apply it to other built-in sensitive information types. For a list of default sensitive information types and XML definitions, see What the sensitive information types look for.
Export the XML file of the current rules
To export the XML, you need to connect to the Security and Compliance Center via Remote PowerShell.
- In PowerShell, type the following to display your organization's rules on screen. If you haven't created your own, you'll only see the default, built-in rules, labeled "Microsoft Rule Package."
Get-DlpSensitiveInformationTypeRulePackage
- Store your organization's rules in a variable by typing the following. Storing something in a variable makes it easily available later in a format that works for remote PowerShell commands.
$ruleCollections = Get-DlpSensitiveInformationTypeRulePackage
- Make a formatted XML file with all that data by typing the following. (Set-Content is the part of the cmdlet that writes the XML to the file.)
Set-Content -path C:\custompath\exportedRules.xml -Encoding Byte -Value $ruleCollections.SerializedClassificationRuleCollection
Make sure that you use the file location where your rule pack is actually stored.
C:\custompath\ is a placeholder.
Find the rule that you want to modify in the XML
The cmdlets above exported the entire rule collection, which includes the default rules we provide. Next you'll need to look specifically for the Credit Card Number rule that you want to modify.
Use a text editor to open the XML file that you exported in the previous section.
Scroll down to the <Rules> tag, which is the start of the section that contains the DLP rules. Because this XML file contains the information for the entire rule collection, it contains other information at the top that you need to scroll past to get to the rules.
Look for Func_credit_card to find the Credit Card Number rule definition. In the XML, rule names can't contain spaces, so the spaces are usually replaced with underscores, and rule names are sometimes abbreviated. An example of this is the U.S. Social Security number rule, which is abbreviated "SSN." The Credit Card Number rule XML should look like the following code sample.
<Entity id="50842eb7-edc8-4019-85dd-5a5c1f2bb085" patternsProximity="300" recommendedConfidence="85">
  <Pattern confidenceLevel="85">
    <IdMatch idRef="Func_credit_card" />
    <Any minMatches="1">
      <Match idRef="Keyword_cc_verification" />
      <Match idRef="Keyword_cc_name" />
      <Match idRef="Func_expiration_date" />
    </Any>
  </Pattern>
</Entity>
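If scrolling through a large export is tedious, the lookup can also be scripted. The following is a minimal Python sketch, not part of the documented procedure: it searches an XML tree for every Entity that contains an IdMatch with a given idRef. The helper names `local_name` and `find_entities` are hypothetical, and the sample string simply repeats the rule definition shown above. Tags are compared by local name so that the search works whether or not the file declares a default namespace.

```python
import xml.etree.ElementTree as ET

# The Credit Card Number rule definition from the sample above.
SAMPLE = """<Rules>
  <Entity id="50842eb7-edc8-4019-85dd-5a5c1f2bb085" patternsProximity="300" recommendedConfidence="85">
    <Pattern confidenceLevel="85">
      <IdMatch idRef="Func_credit_card" />
      <Any minMatches="1">
        <Match idRef="Keyword_cc_verification" />
        <Match idRef="Keyword_cc_name" />
        <Match idRef="Func_expiration_date" />
      </Any>
    </Pattern>
  </Entity>
</Rules>"""


def local_name(element):
    """Return the tag name without any '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def find_entities(root, id_ref):
    """Return every Entity that contains an IdMatch with the given idRef."""
    matches = []
    for element in root.iter():
        if local_name(element) != "Entity":
            continue
        for child in element.iter():
            if local_name(child) == "IdMatch" and child.get("idRef") == id_ref:
                matches.append(element)
                break
    return matches


root = ET.fromstring(SAMPLE)
entities = find_entities(root, "Func_credit_card")
for entity in entities:
    print(entity.get("id"), entity.get("patternsProximity"))
```

To run the same search against the exported file, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`, where `path` points to your own copy of exportedRules.xml.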
Now that you have located the Credit Card Number rule definition in the XML, you can customize the rule's XML to meet your needs. For a refresher on the XML definitions, see the Term glossary at the end of this topic.
Modify the XML and create a new sensitive information type
First, you need to create a new sensitive information type because you can't directly modify the default rules. You can do a wide variety of things with custom sensitive information types, which are outlined in Create a custom sensitive information type in Security & Compliance Center PowerShell. For this example, we'll keep it simple and only remove corroborative evidence and add keywords to the Credit Card Number rule.
All XML rule definitions are built on the following general template. You need to copy the Credit Card Number definition XML and paste it into the template, modify some values (notice the ". . ." placeholders in the following example), and then upload the modified XML as a new rule that can be used in policies.
<?xml version="1.0" encoding="utf-16"?>
<RulePackage xmlns="https://schemas.microsoft.com/office/2011/mce">
  <RulePack id=". . .">
    <Version major="1" minor="0" build="0" revision="0" />
    <Publisher id=". . ." />
    <Details defaultLangCode=". . .">
      <LocalizedDetails langcode=" . . . ">
        <PublisherName>. . .</PublisherName>
        <Name>. . .</Name>
        <Description>. . .</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <!-- Paste the Credit Card Number rule definition here.-->
    <LocalizedStrings>
      <Resource idRef=". . .">
        <Name default="true" langcode=" . . . ">. . .</Name>
        <Description default="true" langcode=". . ."> . . .</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
Now, you have something that looks similar to the following XML. Because rule packages and rules are identified by their unique GUIDs, you need to generate two GUIDs: one for the rule package and one to replace the GUID for the Credit Card Number rule. The GUID for the entity ID in the following code sample is the one for our built-in rule definition, which you need to replace with a new one. There are several ways to generate GUIDs, but you can do it easily in PowerShell by typing [guid]::NewGuid().
<?xml version="1.0" encoding="utf-16"?>
<RulePackage xmlns="https://schemas.microsoft.com/office/2011/mce">
  <RulePack id="8aac8390-e99f-4487-8d16-7f0cdee8defc">
    <Version major="1" minor="0" build="0" revision="0" />
    <Publisher id="8d34806e-cd65-4178-ba0e-5d7d712e5b66" />
    <Details defaultLangCode="en">
      <LocalizedDetails langcode="en">
        <PublisherName>Contoso Ltd.</PublisherName>
        <Name>Financial Information</Name>
        <Description>Modified versions of the Microsoft rule package</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <Entity id="db80b3da-0056-436e-b0ca-1f4cf7080d1f" patternsProximity="300" recommendedConfidence="85">
      <Pattern confidenceLevel="85">
        <IdMatch idRef="Func_credit_card" />
        <Any minMatches="1">
          <Match idRef="Keyword_cc_verification" />
          <Match idRef="Keyword_cc_name" />
          <Match idRef="Func_expiration_date" />
        </Any>
      </Pattern>
    </Entity>
    <LocalizedStrings>
      <Resource idRef="db80b3da-0056-436e-b0ca-1f4cf7080d1f">
        <!-- This is the GUID for the preceding Credit Card Number entity because the following text is for that Entity. -->
        <Name default="true" langcode="en-us">Modified Credit Card Number</Name>
        <Description default="true" langcode="en-us">Credit Card Number that looks for additional keywords, and another version of Credit Card Number that doesn't require keywords (but has a lower confidence level)</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
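The two GUIDs mentioned above can come from any generator. If you are scripting the edit outside PowerShell, the following Python sketch is one possible equivalent of [guid]::NewGuid(), using the standard library's `uuid` module. The variable names are illustrative only.

```python
import uuid

# One GUID for the rule package and one for the copied entity.
# uuid4() returns a random (version 4) UUID, which serves as a GUID here.
rule_pack_id = uuid.uuid4()
entity_id = uuid.uuid4()

print(f"RulePack id: {rule_pack_id}")
print(f"Entity id:   {entity_id}")
```

As the comment in the sample indicates, the entity GUID appears twice: once as the Entity id and once as the idRef of the matching Resource under LocalizedStrings, so the same new value goes in both places.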
Remove the corroborative evidence requirement from a sensitive information type
Now that you have a new sensitive information type that you're able to upload to the Security & Compliance Center, the next step is to change what the rule matches. Modify the rule so that it looks only for a 16-digit number that passes the checksum and doesn't require additional (corroborative) evidence, like keywords. To do this, remove the part of the XML that looks for corroborative evidence. Corroborative evidence is very helpful in reducing false positives: in this case, there are usually certain keywords or an expiration date near the credit card number. If you remove that evidence, you should also lower the confidenceLevel, which is 85 in the example, to reflect that a match is now less certain.
<Entity id="db80b3da-0056-436e-b0ca-1f4cf7080d1f" patternsProximity="300" recommendedConfidence="85">
  <Pattern confidenceLevel="85">
    <IdMatch idRef="Func_credit_card" />
  </Pattern>
</Entity>
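The same edit can be made programmatically. The Python sketch below is one way to do it under stated assumptions: it removes everything except the IdMatch from each Pattern and sets a lower confidenceLevel. The function name `remove_corroborative_evidence` and the value 65 are hypothetical; choose a confidence level that suits your own policies.

```python
import xml.etree.ElementTree as ET

# The entity as it looks before the corroborative evidence is removed.
ENTITY_XML = """<Entity id="db80b3da-0056-436e-b0ca-1f4cf7080d1f" patternsProximity="300" recommendedConfidence="85">
  <Pattern confidenceLevel="85">
    <IdMatch idRef="Func_credit_card" />
    <Any minMatches="1">
      <Match idRef="Keyword_cc_verification" />
      <Match idRef="Keyword_cc_name" />
      <Match idRef="Func_expiration_date" />
    </Any>
  </Pattern>
</Entity>"""

# Hypothetical value; the article only says the level should be lowered.
LOWERED_CONFIDENCE = "65"


def remove_corroborative_evidence(entity, confidence):
    """Keep only the IdMatch in each Pattern and lower its confidenceLevel.

    This fragment has no XML namespace. In a full rule package the tags are
    namespace-qualified, so the tag comparisons would need adjusting.
    """
    for pattern in entity.findall("Pattern"):
        for child in list(pattern):
            if child.tag != "IdMatch":
                pattern.remove(child)
        pattern.set("confidenceLevel", confidence)
    return entity


entity = remove_corroborative_evidence(ET.fromstring(ENTITY_XML), LOWERED_CONFIDENCE)
print(ET.tostring(entity, encoding="unicode"))
```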
Look for keywords that are specific to your organization
You might want to require corroborative evidence but with different or additional keywords, and perhaps you want to change where to look for that evidence. You can adjust the patternsProximity to expand or shrink the window for corroborative evidence around the 16-digit number. To add your own keywords, you need to define a keyword list and reference it within your rule. The following XML adds the keywords "company card" and "Contoso card" so that any message that contains those phrases within 150 characters of a credit card number will be identified as containing a credit card number.
<Rules>
  <!-- Modify the patternsProximity to be "150" rather than "300." -->
  <Entity id="db80b3da-0056-436e-b0ca-1f4cf7080d1f" patternsProximity="150" recommendedConfidence="85">
    <Pattern confidenceLevel="85">
      <IdMatch idRef="Func_credit_card" />
      <Any minMatches="1">
        <Match idRef="Keyword_cc_verification" />
        <Match idRef="Keyword_cc_name" />
        <!-- Add the following XML, which references the keywords at the end of the XML sample. -->
        <Match idRef="My_Additional_Keywords" />
        <Match idRef="Func_expiration_date" />
      </Any>
    </Pattern>
  </Entity>
  <!-- Add the following XML, and update the information inside the <Term> tags with the keywords that you want to detect. -->
  <Keyword id="My_Additional_Keywords">
    <Group matchStyle="word">
      <Term caseSensitive="false">company card</Term>
      <Term caseSensitive="false">Contoso card</Term>
    </Group>
  </Keyword>
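To build intuition for how a checksum match and a proximity window interact, here is a rough Python sketch. It is an illustration only, not the DLP engine's implementation. It assumes that the checksum is the Luhn algorithm commonly used for payment card numbers, that card numbers appear as 16 unbroken digits, and that proximity is counted in characters on either side of the number. The keyword check is a plain case-insensitive substring test, which is simpler than the matchStyle="word" behavior in the XML. The function names are hypothetical.

```python
import re

KEYWORDS = ["company card", "contoso card"]  # the two terms from the example
PROXIMITY = 150  # characters, mirroring patternsProximity="150"


def passes_luhn(number: str) -> bool:
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    # Walk the digits from the right, doubling every second one.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_matches(text, keywords=KEYWORDS, proximity=PROXIMITY):
    """Return 16-digit Luhn-valid numbers that have a keyword nearby."""
    lowered = text.lower()
    hits = []
    for match in re.finditer(r"\b\d{16}\b", text):
        if not passes_luhn(match.group()):
            continue
        start = max(0, match.start() - proximity)
        window = lowered[start:match.end() + proximity]
        if any(keyword in window for keyword in keywords):
            hits.append(match.group())
    return hits


near = "Please charge the company card 4111111111111111 for this order."
far = "company card" + " " * 200 + "4111111111111111"
print(find_matches(near))
print(find_matches(far))
```

In this sketch, shrinking `PROXIMITY` narrows the window in which a keyword counts as evidence, which is the trade-off the patternsProximity attribute controls in the rule.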
Upload your rule
To upload your rule, you need to do the following.
- Save it as an .xml file with Unicode encoding. This is important because the rule won't work if the file is saved with a different encoding.
- In PowerShell, type the following.
New-DlpSensitiveInformationTypeRulePackage -FileData (Get-Content -Path "C:\custompath\MyNewRulePack.xml" -Encoding Byte)
Make sure that you use the file location where your rule pack is actually stored. C:\custompath\ is a placeholder.
- To confirm, type Y, and then press Enter.
- Verify that your new rule was uploaded, and check its display name, by typing:
Get-DlpSensitiveInformationType
To start using the new rule to detect sensitive information, you need to add the rule to a DLP policy. To learn how to add the rule to a policy, see Create a DLP policy from a template.
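Because the upload depends on the file's encoding, it can help to check the saved file before you upload it. The Python sketch below assumes that "Unicode encoding" means UTF-16, which is consistent with the encoding="utf-16" declaration in the template. It writes a placeholder document to a temporary folder and confirms that the file starts with a UTF-16 byte order mark. The XML string is a stand-in, not a valid rule package.

```python
import codecs
import os
import tempfile

# Placeholder content standing in for your finished rule package XML.
RULE_PACK_XML = '<?xml version="1.0" encoding="utf-16"?>\n<RulePackage />\n'

# Write to a temporary folder so the sketch doesn't touch real files.
path = os.path.join(tempfile.mkdtemp(), "MyNewRulePack.xml")
with open(path, "w", encoding="utf-16", newline="") as handle:
    handle.write(RULE_PACK_XML)

# Read the raw bytes back and look for a UTF-16 byte order mark.
with open(path, "rb") as handle:
    raw = handle.read()

has_bom = raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print(f"{path}: UTF-16 byte order mark present: {has_bom}")
```

To check an existing file instead, skip the write step and point `path` at your own rule pack.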
Term glossary
These are the definitions for the terms you encountered during this procedure.
|Entity||Entities are what we call sensitive information types, such as credit card numbers. Each entity has a unique GUID as its ID. If you copy a GUID and search for it in the XML, you'll find the XML rule definition and all the localized translations of that XML rule. You can also find this definition by locating the GUID for the translation and then searching for that GUID.|
|Functions||The XML file references
|IdMatch||This is the identifier that the pattern is trying to match, for example, a credit card number.|
|Keyword lists||The XML file also references
|Pattern||The pattern contains the list of what the sensitive type is looking for. This includes keywords, regexes, and internal functions, which perform tasks like verifying checksums. Sensitive information types can have multiple patterns with unique confidences. This is useful when creating a sensitive information type that returns a high confidence if corroborative evidence is found and a lower confidence if little or no corroborative evidence is found.|
|Pattern confidenceLevel||This is the level of confidence that the DLP engine found a match. This level of confidence is associated with a match for the pattern if the pattern's requirements are met. This is the confidence measure you should consider when using Exchange mail flow rules (also known as transport rules).|
|patternsProximity||When we find what looks like a credit card number pattern,
|recommendedConfidence||This is the confidence level we recommend for this rule. The recommended confidence applies to entities and affinities. For entities, this number is never evaluated against the
For more information