Learn about predictive coding in Advanced eDiscovery (preview)

The predictive coding module in Advanced eDiscovery uses intelligent machine learning capabilities to help you reduce the amount of content to review. Predictive coding helps you reduce and cull large volumes of case content to a relevant set of items that you can prioritize for review. You do this by creating and training your own predictive coding models, which help you prioritize the review of the most relevant items in a review set.

The predictive coding module is designed to streamline the complexity of managing a model within a review set. It provides an iterative approach to training your model so you can get started faster with the machine learning capabilities in Advanced eDiscovery. To get started, you can create a model and label as few as 50 items as relevant or not relevant. The system uses this training to apply prediction scores to every item in the review set. You can then filter items based on the prediction score, which lets you review the most relevant (or non-relevant) items first. If you want to train models with higher accuracy and recall rates, you can continue labeling items in subsequent training rounds until the model stabilizes.
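The following minimal sketch shows what filtering on prediction scores amounts to once every item has a score. It is not the Advanced eDiscovery API. `ReviewItem` and `filter_by_score` are hypothetical names, and the scores are made-up sample values. It keeps the items whose score falls in a chosen range and sorts them so the highest-scoring items are reviewed first.

```python
from dataclasses import dataclass


@dataclass
class ReviewItem:
    """Hypothetical stand-in for an item in a review set."""
    item_id: str
    prediction_score: float  # 0.0 = not relevant, 1.0 = relevant


def filter_by_score(items, low, high):
    """Return items whose score lies in [low, high], highest score first."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("score range must satisfy 0 <= low <= high <= 1")
    selected = [it for it in items if low <= it.prediction_score <= high]
    # Sorting descending puts the items the model is most confident about at the top.
    return sorted(selected, key=lambda it: it.prediction_score, reverse=True)


items = [
    ReviewItem("doc-1", 0.92),
    ReviewItem("doc-2", 0.15),
    ReviewItem("doc-3", 0.67),
    ReviewItem("doc-4", 0.48),
]
likely_relevant = filter_by_score(items, 0.5, 1.0)
print([it.item_id for it in likely_relevant])  # ['doc-1', 'doc-3']
```

Choosing a range such as 0 to 0.5 instead surfaces the items the model considers not relevant.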

The predictive coding workflow

Here's an overview and description of each step in the predictive coding workflow. For a more detailed description of the concepts and terminology of the predictive coding process, see Predictive coding reference.

[Figure: Predictive coding workflow]

  1. Create a new predictive coding model in the review set. The first step is to create a new predictive coding model in the review set. You must have at least 2,000 items in the review set to create a model. After you create a model, the system determines the number of items to use as a control set. During training rounds, the control set is used to evaluate the prediction scores that the model assigns to items against the labels you apply. The size of the control set is based on the number of items in the review set and on the confidence level and margin of error values that are set when you create the model. Items in the control set never change and aren't identifiable to users.

    For more information, see Create a predictive coding model.

  2. Complete the first training round by labeling items as relevant or not relevant. The next step is to train the model by starting the first round of training. When you start a training round, the model randomly selects additional items from the review set, which are called the training set. These items (from both the control set and the training set) are presented to you so that you can label each one as either "relevant" or "not relevant". Relevancy is based on the content in the item, not on any of the document metadata. After you complete the labeling process in the training round, the model "learns" based on how you labeled the items in the training set. Based on this training, the model processes the items in the review set and applies a prediction score to each one.

    For more information, see Train a predictive coding model.

  3. Apply the prediction score filter to items in the review set. After the previous training step is completed, the next step is to apply the prediction score filter to the items in the review set. This displays the items that the model has determined are "most relevant". Alternatively, you can use a prediction filter to display items that are "not relevant". When you apply the prediction filter, you specify a range of prediction scores to filter. Prediction scores range from 0 to 1, with 0 being "not relevant" and 1 being "relevant". In general, items with prediction scores between 0 and 0.5 are considered "not relevant", and items with prediction scores between 0.5 and 1 are considered "relevant".

    For more information, see Apply a prediction filter to a review set.

  4. Perform more training rounds until the model stabilizes. You can perform additional rounds of training if you want to create a model with higher prediction accuracy and increased recall rates. Recall rate measures the proportion of items the model predicted were relevant among the items that are actually relevant (the ones you marked as relevant during training). The recall rate score ranges from 0 to 1, and a score closer to 1 indicates that the model identifies more of the relevant items. In a new training round, you label additional items in a new training set. After you complete that training round, the model is updated based on what it learned from your most recent round of labeling. The model then processes the items in the review set again and applies new prediction scores. You can continue performing training rounds until your model stabilizes. A model is considered stabilized when the churn rate after the latest round of training is less than 5%. Churn rate is defined as the percentage of items in a review set whose prediction score changed between training rounds. The predictive coding dashboard displays information and statistics that help you assess the stability of a model.

  5. Apply the "final" prediction score filter to review set items to prioritize review. After you complete all the training rounds and the model has stabilized, the last step is to apply the final prediction score filter to the review set. This prioritizes the review of relevant and non-relevant items. It is the same task that you performed in step 3, but at this point the model is stable and you don't plan to run any more training rounds.
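Step 1 says the control set size depends on the number of items in the review set, the confidence level, and the margin of error, but the exact formula isn't documented here. The sketch below assumes the textbook approach: Cochran's sample-size formula with a worst-case proportion of p = 0.5, followed by a finite population correction. Advanced eDiscovery may compute the size differently. `control_set_size` is a hypothetical helper; only the 2,000-item minimum comes from the workflow above.

```python
import math
from statistics import NormalDist

MIN_REVIEW_SET_SIZE = 2000  # from step 1: a model needs at least 2,000 items


def control_set_size(review_set_size: int, confidence_level: float,
                     margin_of_error: float) -> int:
    """Hypothetical control set size estimate (assumed formula, not the product's)."""
    if review_set_size < MIN_REVIEW_SET_SIZE:
        raise ValueError("a review set needs at least 2,000 items to create a model")
    if not (0 < confidence_level < 1 and 0 < margin_of_error < 1):
        raise ValueError("confidence_level and margin_of_error must be in (0, 1)")
    # Two-sided z-score, for example about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    p = 0.5  # assumed worst-case share of relevant items (largest sample)
    n0 = z * z * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: smaller review sets need fewer control items.
    n = n0 / (1 + (n0 - 1) / review_set_size)
    return math.ceil(n)


print(control_set_size(2000, 0.95, 0.025))
print(control_set_size(100_000, 0.95, 0.025))
```

Under these assumptions, a tighter margin of error or a higher confidence level increases the control set size. The size grows with the review set but levels off as the review set becomes very large.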
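To make the recall and churn definitions in step 4 concrete, here is a minimal sketch that computes both from plain dictionaries of scores and labels. The function names and data shapes are hypothetical, not part of Advanced eDiscovery. The sketch assumes that "changed" means an item's score crossed the 0.5 threshold between rounds; the product may count changes differently. It treats a churn rate below 5% as stable, as the workflow describes.

```python
THRESHOLD = 0.5      # assumed cut-off between "not relevant" and "relevant"
STABLE_CHURN = 0.05  # from step 4: stable when churn is below 5%


def is_relevant(score):
    return score >= THRESHOLD


def recall(scores, labels):
    """Share of items you labeled relevant that the model also predicts as relevant.

    scores: item_id -> prediction score; labels: item_id -> True if labeled relevant.
    """
    actually_relevant = [item for item, rel in labels.items() if rel]
    if not actually_relevant:
        raise ValueError("recall is undefined when no items are labeled relevant")
    found = sum(1 for item in actually_relevant if is_relevant(scores[item]))
    return found / len(actually_relevant)


def churn_rate(prev_scores, new_scores):
    """Fraction of items whose predicted relevance flipped between two rounds."""
    common = prev_scores.keys() & new_scores.keys()
    if not common:
        raise ValueError("no items in common between the two rounds")
    flipped = sum(1 for item in common
                  if is_relevant(prev_scores[item]) != is_relevant(new_scores[item]))
    return flipped / len(common)


def is_stable(prev_scores, new_scores):
    return churn_rate(prev_scores, new_scores) < STABLE_CHURN


labels = {"a": True, "b": True, "c": False, "d": True}
round1 = {"a": 0.90, "b": 0.30, "c": 0.20, "d": 0.70}
round2 = {"a": 0.95, "b": 0.60, "c": 0.10, "d": 0.72}

print(round(recall(round1, labels), 3))  # 0.667
print(churn_rate(round1, round2))        # 0.25
print(is_stable(round1, round2))         # False
```

In this toy example, one of four items changed sides, so the churn rate is 25%. Under the 5% rule, you would run another training round.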