Configure search and analytics settings in Advanced eDiscovery

You can configure settings for each Advanced eDiscovery case to control the following functionality:

  • Near duplicates and email threading

  • Themes

  • Autogenerated review set query

  • Ignore text

  • Optical character recognition

To configure search and analytics settings for a case:

  1. On the Advanced eDiscovery page, select the case.

  2. On the Settings tab, under Search & analytics, click Select.

    The case settings page is displayed. These settings are applied to all review sets in the case.


Near duplicates and email threading

In this section, you can set parameters for duplicate detection, near duplicate detection, and email threading. For more information, see Near duplicate detection and Email threading.

  • Near duplicates/email threading: When turned on, duplicate detection, near duplicate detection, and email threading are included in the workflow when you run analytics on the data in a review set.

  • Document and email similarity threshold: If the similarity level between two documents is above this threshold, both documents are put in the same near duplicate set.

  • Minimum/maximum number of words: Near duplicate and email threading analysis is performed only on documents that contain at least the minimum number of words and no more than the maximum number of words.
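The interaction between the similarity threshold and the word-count limits can be illustrated with a small sketch. This article does not describe the similarity measure Advanced eDiscovery uses, so the example assumes word-shingle Jaccard similarity as a stand-in metric and a simple greedy grouping in which each document is compared with the first member (pivot) of each existing set. The function and parameter names (`near_duplicate_sets`, `threshold`, `min_words`, `max_words`) are hypothetical and only mirror the settings described above.

```python
import re


def words(text):
    # Lowercase word tokens; a crude stand-in for real text normalization.
    return re.findall(r"\w+", text.lower())


def shingles(tokens, k=3):
    # Overlapping k-word sequences ("shingles") used to compare documents.
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    # Share of shingles the two documents have in common (0.0 to 1.0).
    return len(a & b) / len(a | b)


def near_duplicate_sets(docs, *, threshold, min_words, max_words):
    """Greedily group docs (id -> text) into near duplicate sets.

    ASSUMPTION: Jaccard similarity over 3-word shingles stands in for the
    product's own similarity measure, which is not described here.
    """
    sets = []     # list of (pivot shingles, [member ids])
    skipped = []  # ids outside the word-count range
    for doc_id, text in docs.items():
        tokens = words(text)
        if not (min_words <= len(tokens) <= max_words):
            skipped.append(doc_id)  # mirrors Minimum/maximum number of words
            continue
        sh = shingles(tokens)
        for pivot, members in sets:
            if jaccard(sh, pivot) > threshold:  # "above the threshold"
                members.append(doc_id)
                break
        else:
            sets.append((sh, [doc_id]))  # start a new set with this pivot
    return [members for _, members in sets], skipped


docs = {
    "a": "The quarterly report shows revenue grew in every region this year and costs fell",
    "b": "The quarterly report shows revenue grew in every region this year and costs fell sharply",
    "c": "Please find attached the signed contract for the new office lease agreement today",
    "d": "Short note",
}
groups, skipped = near_duplicate_sets(docs, threshold=0.8, min_words=10, max_words=1000)
print(groups, skipped)  # [['a', 'b'], ['c']] ['d']
```

With this stand-in metric, raising `threshold` produces more, smaller near duplicate sets, and documents outside the word-count range are never compared at all.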

Themes

In this section, you can set parameters for themes. For more information, see Themes.

  • Themes: When turned on, themes clustering is performed as part of the workflow when you run analytics on the data in a review set.

  • Maximum number of themes: Specifies the maximum number of themes that can be generated when you run analytics on the data in a review set.

  • Include numbers in themes: When turned on, numbers that identify a theme are included when themes are generated.

  • Adjust maximum number of themes dynamically: In some situations, a review set may not contain enough documents to produce the desired number of themes. When this setting is turned on, Advanced eDiscovery adjusts the maximum number of themes dynamically instead of attempting to enforce the configured maximum.
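How the Include numbers and dynamic adjustment settings might shape a themes run can be sketched as follows. This is not the product's clustering algorithm: it only shows numeric tokens being dropped from candidate theme terms when numbers are excluded, and the theme count being capped by what the data can support. The `min_docs_per_theme` heuristic and both function names are hypothetical assumptions.

```python
import re
from collections import Counter


def candidate_theme_terms(docs, include_numbers):
    # Count word tokens across documents. Purely numeric tokens are kept only
    # when include_numbers is True (the "Include numbers in themes" switch).
    counts = Counter()
    for text in docs:
        for tok in re.findall(r"\w+", text.lower()):
            if tok.isdigit() and not include_numbers:
                continue
            counts[tok] += 1
    return counts


def effective_theme_count(num_docs, max_themes, adjust_dynamically,
                          min_docs_per_theme=5):
    # HYPOTHETICAL heuristic: assume each theme needs at least
    # min_docs_per_theme documents. This is not a documented product value.
    if not adjust_dynamically:
        return max_themes  # attempt to honor the configured maximum as-is
    supported = max(1, num_docs // min_docs_per_theme)
    return min(max_themes, supported)


print(candidate_theme_terms(["Invoice 2021 paid", "invoice 2022"], include_numbers=False))
# Counter({'invoice': 2, 'paid': 1})
print(effective_theme_count(12, 100, adjust_dynamically=True))   # 2
print(effective_theme_count(12, 100, adjust_dynamically=False))  # 100
```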

Review set query

If you select the Automatically create a For Review saved search after analytics checkbox, Advanced eDiscovery autogenerates a review set query named For Review.


This query filters out duplicate items from the review set, so you can review only the unique items. The query is created only when you run analytics for a review set in the case. For more information about review set queries, see Query the data in a review set.
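The effect of the For Review query can be sketched as keeping one item from each set of exact duplicates. This sketch assumes exact duplicates are identified by a SHA-256 hash of the item text and that the first item seen is the one kept; the product uses its own duplicate detection, and the `for_review` function is hypothetical.

```python
import hashlib


def for_review(items):
    """Return the ids of one item per set of exact duplicates.

    items: iterable of (item_id, text). ASSUMPTION: exact duplicates are
    detected by a SHA-256 hash of the text, and the first item seen in each
    set is kept; the product uses its own duplicate detection.
    """
    seen = set()
    unique = []
    for item_id, text in items:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:  # first time this content appears
            seen.add(digest)
            unique.append(item_id)
    return unique


print(for_review([("m1", "Budget approved"), ("m2", "Budget approved"), ("m3", "Meeting moved")]))
# ['m1', 'm3']
```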

Ignore text

In some situations, certain text diminishes the quality of analytics; for example, lengthy disclaimers that are added to email messages regardless of their content. If you know of text that should be ignored, you can exclude it from analytics by specifying the text string and the analytics features it should be excluded from (Near-duplicates, Email threading, Themes, and Relevance). Regular expressions (RegEx) are also supported as ignored text.
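A minimal sketch of per-feature ignore text, assuming each rule holds a literal string or a RegEx pattern plus the set of analytics features it applies to. The `IgnoreRule` type, the `strip_ignored_text` function, and the feature identifiers (`near_duplicates`, `email_threading`, `themes`, `relevance`) are hypothetical names for illustration, and Python's `re` syntax stands in for whatever RegEx dialect the product accepts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IgnoreRule:
    text: str             # literal string or RegEx pattern to ignore
    is_regex: bool        # True if `text` is a regular expression
    features: frozenset   # hypothetical feature ids the rule applies to


def strip_ignored_text(text, rules, feature):
    """Remove ignored text before running one analytics feature."""
    for rule in rules:
        if feature not in rule.features:
            continue  # rule not selected for this feature
        pattern = rule.text if rule.is_regex else re.escape(rule.text)
        text = re.sub(pattern, "", text)
    return text


rules = [
    IgnoreRule(r"CONFIDENTIALITY NOTICE:.*", True,
               frozenset({"near_duplicates", "themes"})),
    IgnoreRule("Sent from my phone", False, frozenset({"email_threading"})),
]
body = ("Please review the draft.\n"
        "CONFIDENTIALITY NOTICE: This message is intended only for the recipient.")
print(repr(strip_ignored_text(body, rules, "themes")))
# 'Please review the draft.\n'
```

Literal strings are passed through `re.escape` so that characters such as `.` or `(` in a disclaimer are matched literally rather than interpreted as RegEx syntax.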

Optical character recognition (OCR)

When this setting is turned on, OCR processing is run on image files. OCR processing runs in the following situations:

  • When custodians and non-custodial data sources are added to a case. OCR processing is performed during the Advanced indexing process, which means that text in image files that matches the search criteria is returned in a collection search.

  • When content from other data sources (content that isn't associated with a custodian and isn't added to the case as a non-custodial data source) is added to a review set.
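The two situations above can be summarized as a small decision function. It is only a sketch: the `ocr_stage` function, the `source` values, and the set of image extensions are hypothetical assumptions, not the product's list of supported file types.

```python
from pathlib import PurePath

# ASSUMPTION: illustrative subset only, not the product's supported image types.
IMAGE_EXTENSIONS = {".bmp", ".gif", ".jpeg", ".jpg", ".png", ".tif", ".tiff"}


def ocr_stage(file_name, ocr_enabled, source):
    """Return when OCR would run for a file, or None if it would not.

    source (hypothetical values): "custodian" or "non_custodial" for data
    added to the case, "other" for content added directly to a review set.
    """
    if not ocr_enabled:
        return None
    if PurePath(file_name).suffix.lower() not in IMAGE_EXTENSIONS:
        return None  # OCR applies to image files only
    if source in ("custodian", "non_custodial"):
        return "advanced_indexing"
    if source == "other":
        return "add_to_review_set"
    raise ValueError(f"unknown source: {source!r}")


print(ocr_stage("scan.TIF", True, "custodian"))  # advanced_indexing
print(ocr_stage("memo.docx", True, "other"))     # None
```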

After data is added to a review set, image text can be reviewed, searched, tagged, and analyzed. You can view the extracted text in the Text viewer for the selected image file in the review set. For more information, see: