Modify Exact Data Match schema to use configurable match
Exact Data Match (EDM) based classification enables you to create custom sensitive information types that refer to exact values in a database of sensitive information. When you need to allow for variants of an exact string, you can use configurable match to tell Microsoft 365 to ignore case and some delimiters.
Use this procedure to modify an existing EDM schema and data file.
Uninstall the EdmUploadAgent.exe from the computer that you use to connect to Microsoft 365 for EDM schema and data file upload purposes.
Download the appropriate EdmUploadAgent.exe file for your subscription using the links below:
To authorize the EDM Upload Agent, open a Command Prompt window (as an administrator) and run the following command:
If you don't have a current copy of the existing schema, download one by running this command:
EdmUploadAgent.exe /SaveSchema /DataStoreName <dataStoreName> [/OutputDir [Output dir location]]
Customize the schema so that each column uses “caseInsensitive” and/or “ignoredDelimiters” as needed. The default value for “caseInsensitive” is “false”, and the default for “ignoredDelimiters” is an empty string.
The underlying custom or built-in sensitive information type used to detect the general regex pattern must support detection of the input variations listed in ignoredDelimiters. For example, the built-in U.S. social security number (SSN) sensitive information type can detect variations in the data that include dashes, spaces, or no spaces between the grouped numbers that make up the SSN. As a result, the only delimiters that are relevant to include in EDM’s ignoredDelimiters for SSN data are dash and space.
Here is a sample schema that simulates case insensitive match by creating the extra columns needed to recognize case variations in the sensitive data.
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
    <Field name="PolicyNumber" searchable="true" />
    <Field name="PolicyNumberLowerCase" searchable="true" />
    <Field name="PolicyNumberUpperCase" searchable="true" />
    <Field name="PolicyNumberCapitalLetters" searchable="true" />
  </DataStore>
</EdmSchema>
In the above example, the variations of the original PolicyNumber column will no longer be needed if both caseInsensitive and ignoredDelimiters are added.
To update this schema so that EDM uses configurable match, use the caseInsensitive and ignoredDelimiters flags. Here's how that looks:
<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
    <Field name="PolicyNumber" searchable="true" caseInsensitive="true" ignoredDelimiters="-,/,*,#,^" />
  </DataStore>
</EdmSchema>
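If you have several fields to update, the schema edit can be scripted rather than done by hand. The following is a minimal sketch in Python, assuming the schema was saved locally as XML in the namespace shown in the samples. The function name `enable_configurable_match` and its parameters are hypothetical and are not part of any Microsoft tooling.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample schemas in this article.
EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def enable_configurable_match(schema_xml, field_name, delimiters):
    """Return schema XML with caseInsensitive and ignoredDelimiters set on one Field.

    Hypothetical helper: it only edits the XML text and does not upload anything.
    """
    # Keep the EDM namespace as the default namespace when serializing.
    ET.register_namespace("", EDM_NS)
    root = ET.fromstring(schema_xml)

    found = False
    for field in root.iter("{%s}Field" % EDM_NS):
        if field.get("name") == field_name:
            field.set("caseInsensitive", "true")
            # The sample schema lists delimiters separated by commas.
            field.set("ignoredDelimiters", ",".join(delimiters))
            found = True

    if not found:
        raise KeyError("No Field named %r in schema" % field_name)
    return ET.tostring(root, encoding="unicode")
```

Calling `enable_configurable_match(xml_text, "PolicyNumber", ["-", "/", "*", "#", "^"])` would set the same two attributes that appear in the updated sample. Review the result before uploading it with Set-DlpEdmSchema.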
The ignoredDelimiters flag supports any non-alphanumeric character; the sample above, for instance, ignores -, /, *, #, and ^.
The ignoredDelimiters flag doesn't support:
- characters 0-9
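To make the effect of the two flags concrete, here is a conceptual sketch in Python of comparing two values after dropping ignored delimiters and, optionally, case. It is an illustration only and does not reflect how Microsoft 365 implements matching internally. The function names `normalize` and `matches` are hypothetical, and the comma-separated delimiter format is assumed from the sample schema.

```python
def normalize(value, case_insensitive=False, ignored_delimiters=""):
    """Strip ignored delimiters and optionally fold case (conceptual model only)."""
    # Delimiters arrive in the comma-separated form used by the schema attribute.
    delimiters = [d for d in ignored_delimiters.split(",") if d]
    for delimiter in delimiters:
        # Alphanumeric characters are not valid delimiters.
        if any(ch.isalnum() for ch in delimiter):
            raise ValueError("Delimiter %r must be non-alphanumeric" % delimiter)
        value = value.replace(delimiter, "")
    if case_insensitive:
        value = value.lower()
    return value


def matches(candidate, stored, case_insensitive=False, ignored_delimiters=""):
    """Compare two values after applying the same normalization to both."""
    return (
        normalize(candidate, case_insensitive, ignored_delimiters)
        == normalize(stored, case_insensitive, ignored_delimiters)
    )
```

With both flags set as in the updated sample schema, a value such as `abc-123/45` would compare equal to `ABC12345` under this model. With the defaults (case sensitive, no ignored delimiters) it would not.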
Connect to the Security & Compliance center using the procedures in Connect to Security & Compliance Center PowerShell.
If your organization has set up Customer Key for Microsoft 365 at the tenant level (public preview), Exact data match will make use of its encryption functionality automatically. This is available only to E5 licensed tenants in the Commercial cloud.
Update your schema by running these cmdlets one at a time:
$edmSchemaXml=Get-Content .\edm.xml -Encoding Byte -ReadCount 0
Set-DlpEdmSchema -FileData $edmSchemaXml -Confirm:$true
If necessary, update the data file to match the new schema version.
Optionally, you can validate your CSV file before uploading it by running:
EdmUploadAgent.exe /ValidateData /DataFile [data file] /Schema [schema file]
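Before running the agent's validation, a quick local check can catch a common problem after a schema change: column headers left over from the old schema, such as the extra case-variation columns. The sketch below is in Python and assumes that the first row of the CSV holds column names that should match the Field names in the schema. The function name `header_mismatches` is hypothetical, and this check does not replace /ValidateData.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace taken from the sample schemas in this article.
EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def header_mismatches(schema_xml, csv_text):
    """Return (fields missing from the CSV header, CSV columns not in the schema)."""
    root = ET.fromstring(schema_xml)
    schema_fields = {f.get("name") for f in root.iter("{%s}Field" % EDM_NS)}

    # Assumption: the first CSV row is the header row.
    reader = csv.reader(io.StringIO(csv_text))
    header = set(next(reader, []))

    return sorted(schema_fields - header), sorted(header - schema_fields)
```

Two empty lists mean the header and the schema agree on column names. Anything else points to a column to add, rename, or remove before uploading.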
For more information on all the parameters that EdmUploadAgent.exe supports, consult the tool's help output.
Open a Command Prompt window (as an administrator) and run the following command to hash and upload your sensitive data:
EdmUploadAgent.exe /UploadData /DataStoreName [DS Name] /DataFile [data file] /HashLocation [hash file location] /Salt [custom salt] /Schema [Schema file]