Create a predictive coding model (preview)

The first step in using the machine learning capabilities of predictive coding in Advanced eDiscovery is to create a predictive coding model. After you create a model, you can train it to identify the relevant and non-relevant content in a review set.

To review the predictive coding workflow, see Learn about predictive coding in Advanced eDiscovery.

Before you create a model

  • A review set must contain at least 2,000 items before you can create a predictive coding model for it.

  • Be sure to commit all collections to the review set before you create a model. Items added to a review set after the model is created will not be processed by the model or assigned a prediction score.

  • Any item in the review set that doesn't contain text will not be processed by the model or assigned a prediction score. Items with text will be included in the control set or a training set.
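The prerequisites above can be checked before you start, for example against an exported item list. The following is a minimal sketch, not part of the product: the `precheck` function and the `text` key on each item are hypothetical names chosen for illustration, and only the 2,000-item minimum comes from this article.

```python
# Illustrative pre-check for the prerequisites listed above.
# Assumption: each item is a dict with a hypothetical "text" key holding
# its extracted text. This is not an Advanced eDiscovery API.

MIN_ITEMS = 2000  # minimum review set size stated in this article


def precheck(items):
    """Summarize whether a review set looks ready for model creation."""
    total = len(items)
    # Items without text are not processed or scored by the model.
    with_text = sum(1 for item in items if (item.get("text") or "").strip())
    return {
        "total": total,
        "with_text": with_text,
        "without_text": total - with_text,
        "meets_minimum": total >= MIN_ITEMS,
    }
```

The `without_text` count indicates how many items would receive no prediction score.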

Create a model

  1. In the Microsoft 365 compliance center, open an Advanced eDiscovery case and then select the Review sets tab.

  2. Open a review set and then click Analytics > Manage predictive coding (preview).

    Click the Analytics dropdown menu in the review set to go to the Predictive coding page.

  3. On the Predictive coding models (preview) page, click New model.

  4. On the flyout page, type a name for the model and an optional description.

  5. Optionally, click Advanced options on the flyout page to configure the confidence level and margin of error. These settings affect the number of items included in the control set. The control set is used during the training process to evaluate the prediction scores that the model assigns to items against the labels that you apply during the training rounds. If your organization has guidelines about confidence level and margin of error for document review, specify them in the appropriate boxes. Otherwise, use the default settings.

  6. Click Save to create the model.

    It will take a couple of minutes for the system to prepare your model. After it's ready, you can perform the first round of training.

What happens after you create a model

After you create a model, the following things occur in the background while the model is created and prepared:

  • The system calculates the number of items for the control set. This size is based on the number of items in the review set and the settings for the confidence level and the margin of error. Items for the control set are randomly selected and designated as control set items. The system includes 10 items from the control set in the first round of training.

  • The system randomly selects 40 items from the review set to be included in the training set for the first round of training. Therefore, the first round of training includes 50 items for labeling: 40 items from the training set and 10 items from the control set.
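To make the background steps concrete, here is a minimal sketch of how a control set size and a first training round could be derived. This article does not state the formula the system uses. As an assumption, the sketch uses the textbook sample-size formula for estimating a proportion, with a finite population correction. It also assumes that training items are drawn from outside the control set. The function names are hypothetical. Only the 40 and 10 item counts come from this article.

```python
import math
import random
from statistics import NormalDist


def control_set_size(population, confidence, margin_of_error):
    """Textbook sample size for a proportion, with finite population
    correction. Assumption: illustrative only, not the product's formula."""
    # Two-sided z-score for the requested confidence level (e.g. 0.95).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Worst-case variance p * (1 - p) = 0.25 at p = 0.5.
    n0 = (z ** 2) * 0.25 / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def first_round(item_ids, control_size, seed=None):
    """Pick a random control set, then the 10 + 40 items for round one."""
    rng = random.Random(seed)
    control = rng.sample(item_ids, control_size)
    control_lookup = set(control)
    # Assumption: training items come from outside the control set.
    remaining = [i for i in item_ids if i not in control_lookup]
    return {
        "control_set": control,
        "round_control": rng.sample(control, 10),     # 10 control set items
        "round_training": rng.sample(remaining, 40),  # 40 training set items
    }
```

In this sketch, a higher confidence level or a smaller margin of error yields a larger control set, which matches the description above. `random.sample` raises `ValueError` if the control set has fewer than 10 items or fewer than 40 items remain outside it.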

Next steps

After you create a model for a review set, the next step is to perform training rounds that "teach" the model to identify content that is relevant to your investigation. For more information, see Train a predictive coding model.