Modify Exact Data Match schema to use configurable match

Exact Data Match (EDM) based classification enables you to create custom sensitive information types that refer to exact values in a database of sensitive information. When you need to allow for variants of an exact string, you can use configurable match to tell Microsoft 365 to ignore case and some delimiters.

Important

Use this procedure to modify an existing EDM schema and data file.

  1. Uninstall EdmUploadAgent.exe from the computer that you use to connect to Microsoft 365 when uploading the EDM schema and data file.

  2. Download the appropriate EdmUploadAgent.exe file for your subscription using the links below:

    • Commercial + GCC - most commercial customers should use this
    • GCC-High - this is specifically for high-security government cloud subscribers
    • DoD - this is specifically for United States Department of Defense cloud customers
  3. Authorize the EDM Upload Agent: open a Command Prompt window (as an administrator) and run the following command:

    EdmUploadAgent.exe /Authorize

  4. If you don't have a current copy of the existing schema, download one by running this command:

    EdmUploadAgent.exe /SaveSchema /DataStoreName <dataStoreName> [/OutputDir [Output dir location]]

  5. Customize the schema so that each column uses “caseInsensitive” and/or “ignoredDelimiters” as needed. The default value for “caseInsensitive” is “false”, and the default value for “ignoredDelimiters” is an empty string.

Note

The underlying custom or built-in sensitive information type used to detect the general regex pattern must support detection of the input variations listed in ignoredDelimiters. For example, the built-in U.S. social security number (SSN) sensitive information type can detect variations in the data that include dashes, spaces, or no spaces between the grouped numbers that make up the SSN. As a result, the only delimiters that are relevant to include in EDM's ignoredDelimiters for SSN data are the dash and the space.
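One way to picture what the two settings do is to imagine both the stored value and the candidate text being normalized before they are compared. The Python sketch below illustrates that idea only. The function names `normalize` and `matches` are hypothetical, and nothing here describes how Microsoft 365 actually implements EDM matching.

```python
def normalize(value: str, case_insensitive: bool = False,
              ignored_delimiters: str = "") -> str:
    """Drop every character listed in ignored_delimiters, then optionally fold case.

    The defaults mirror the schema defaults described above:
    case-sensitive comparison and no ignored delimiters.
    """
    stripped = "".join(ch for ch in value if ch not in ignored_delimiters)
    return stripped.casefold() if case_insensitive else stripped


def matches(candidate: str, stored: str, case_insensitive: bool = False,
            ignored_delimiters: str = "") -> bool:
    """Compare two values after applying the same normalization to both."""
    return (normalize(candidate, case_insensitive, ignored_delimiters)
            == normalize(stored, case_insensitive, ignored_delimiters))
```

Under this model, `matches("123-45-6789", "123 45 6789", ignored_delimiters="- ")` is true because dashes and spaces are removed from both sides, while the same comparison with the default settings is false.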

Here is a sample schema that simulates case-insensitive matching by creating the extra columns needed to recognize case variations in the sensitive data.

<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
           <Field name="PolicyNumber" searchable="true" />
           <Field name="PolicyNumberLowerCase" searchable="true" />
           <Field name="PolicyNumberUpperCase" searchable="true" />
           <Field name="PolicyNumberCapitalLetters" searchable="true" />
  </DataStore>
</EdmSchema>

In the above example, the variations of the original PolicyNumber column are no longer needed once both caseInsensitive and ignoredDelimiters are added.

To update this schema so that EDM uses configurable match, use the caseInsensitive and ignoredDelimiters flags. Here's how that looks:

<EdmSchema xmlns="http://schemas.microsoft.com/office/2018/edm">
  <DataStore name="PatientRecords" description="Schema for patient records policy" version="1">
         <Field name="PolicyNumber" searchable="true" caseInsensitive="true" ignoredDelimiters="-,/,*,#,^" />
  </DataStore>
</EdmSchema>
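If you would rather script this change than edit the XML by hand, the following Python sketch shows one possible approach using only the standard library. The function name `enable_configurable_match` is hypothetical, and the sketch assumes that delimiters are written as a comma-separated list, as in the example above. Review the resulting file before uploading it.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample schema above.
EDM_NS = "http://schemas.microsoft.com/office/2018/edm"


def enable_configurable_match(schema_xml: str, field_name: str,
                              delimiters: list[str]) -> str:
    """Return schema_xml with caseInsensitive and ignoredDelimiters set on one field."""
    # Keep the EDM namespace as the default namespace when serializing.
    ET.register_namespace("", EDM_NS)
    root = ET.fromstring(schema_xml)

    found = False
    for field in root.iter(f"{{{EDM_NS}}}Field"):
        if field.get("name") == field_name:
            field.set("caseInsensitive", "true")
            # Assumption: comma-separated list, as in the sample schema.
            field.set("ignoredDelimiters", ",".join(delimiters))
            found = True

    if not found:
        raise ValueError(f"No field named {field_name!r} in the schema")
    return ET.tostring(root, encoding="unicode")
```

Calling `enable_configurable_match(xml_text, "PolicyNumber", ["-", "/", "*", "#", "^"])` on a schema that contains a PolicyNumber field sets the same two attributes shown in the updated schema above.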

The ignoredDelimiters flag supports any non-alphanumeric character. Here are some examples:

  • .
  • -
  • /
  • _
  • *
  • ^
  • #
  • !
  • ?
  • [
  • ]
  • {
  • }
  • \
  • ~
  • ;

The ignoredDelimiters flag doesn't support:

  • characters 0-9
  • A-Z
  • a-z
  • "
  • ,
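The rules in the two lists above can be checked before you upload a schema. The following Python sketch is one way to do that. The function name `invalid_delimiters` is hypothetical, and the sketch assumes the attribute value is a comma-separated list of single characters, as in the sample schema.

```python
def invalid_delimiters(ignored_delimiters: str) -> list[str]:
    """Return the entries of an ignoredDelimiters value that break the rules above.

    An entry is reported when it is not exactly one character, when it is
    alphanumeric (0-9, A-Z, a-z), or when it is a double quotation mark.
    A comma cannot be used as a delimiter because it separates the entries,
    so an attempt to list one shows up as an empty entry.
    """
    if not ignored_delimiters:
        # The empty string is the documented default and is always valid.
        return []

    problems = []
    for entry in ignored_delimiters.split(","):
        if len(entry) != 1 or entry.isalnum() or entry == '"':
            problems.append(entry)
    return problems
```

An empty result means no rule was broken. For instance, `invalid_delimiters("-,/,*,#,^")` returns an empty list, whereas a value that lists a letter or a digit returns those entries.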
  6. Connect to the Security & Compliance Center using the procedures in Connect to Security & Compliance Center PowerShell.

Note

If your organization has set up Customer Key for Microsoft 365 at the tenant level (public preview), Exact Data Match will make use of its encryption functionality automatically. This is available only to E5 licensed tenants in the Commercial cloud.

  7. Update your schema by running these cmdlets one at a time:

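# Read the schema file as a byte array (Windows PowerShell 5.1 syntax; PowerShell 7 uses -AsByteStream instead of -Encoding Byte).
$edmSchemaXml = Get-Content .\edm.xml -Encoding Byte -ReadCount 0

# Upload the updated schema; -Confirm:$true prompts before the change is applied.
Set-DlpEdmSchema -FileData $edmSchemaXml -Confirm:$true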

  8. If necessary, update the data file to match the new schema version.

Tip

Optionally, you can validate your CSV file before uploading by running:

EdmUploadAgent.exe /ValidateData /DataFile [data file] /Schema [schema file]

For more information on all the parameters that EdmUploadAgent.exe supports, run:

EdmUploadAgent.exe /?

  9. Open a Command Prompt window (as an administrator) and run the following command to hash and upload your sensitive data:

    EdmUploadAgent.exe /UploadData /DataStoreName [DS Name] /DataFile [data file] /HashLocation [hash file location] /Salt [custom salt] /Schema [Schema file]