開始使用可訓練分類器(預覽版)Get started with trainable classifiers (preview)

Microsoft 365 trainable 分類器是一種工具,可讓您訓練以辨識各種類型的內容,其範例可供您查看。A Microsoft 365 trainable classifier is a tool you can train to recognize various types of content by giving it samples to look at. 經過訓練之後,您可以使用它來識別應用 Office 敏感度標籤、通訊合規性原則及保留標籤原則的專案。Once trained, you can use it to identify item for application of Office sensitivity labels, Communications compliance policies, and retention label policies.

建立自訂的 trainable 分類器優先于提供以人工方式挑選並正確符合類別的範例。Creating a custom trainable classifier first involves giving it samples that are human picked and positively match the category. 然後,在處理完這類功能之後,您可以透過提供正負樣本的混合來測試分類器的預測能力。Then, after it has processed those, you test the classifiers ability to predict by giving it a mix of positive and negative samples. 本文說明如何建立及訓練自訂的分類器,以及如何透過重新培訓,改善自訂 trainable 分類器和預先訓練的分類器的效能。This article shows you how to create and train a custom classifier and how to improve the performance of custom trainable classifiers and pre-trained classifiers over their lifetime through retraining.

若要深入瞭解不同類型的分類器,請參閱 瞭解如何 trainable 的分類器 (預覽) To learn more about the different types of classifiers, see Learn about trainable classifiers (preview).

必要條件Prerequisites

授權需求Licensing requirements

分類器是 Microsoft 365 E5 或 E5 規範功能。Classifiers are a Microsoft 365 E5, or E5 Compliance feature. 您必須具有這些訂閱中的其中一項,才能使用這些訂閱。You must have one of these subscriptions to make use of them.

權限Permissions

在 UI 中存取分類器:To access classifiers in the UI:

  • 全域管理員必須加入宣告租使用者,才能建立自訂分類符。the Global admin needs to opt in for the tenant to create custom classifiers.
  • 合規性管理員角色或符合性資料管理員是訓練分類器所需的。the Compliance admin role or Compliance Data Administrator is required to train a classifier.

在下列情況下,您將需要具有這些許可權的帳戶才能使用分類器:You'll need accounts with these permissions to use classifiers in these scenarios:

  • 保留標籤原則案例:記錄管理和保留管理角色Retention label policy scenario: Record Management and Retention Management roles
  • 敏感度標籤原則案例:安全性管理員、合規性管理員、規範資料管理員Sensitivity label policy scenario: Security Administrator, Compliance Administrator, Compliance Data Administrator
  • 通訊相容性原則案例:內幕人士風險管理管理員、主管審查管理員Communication compliance policy scenario: Insider Risk Management Admin, Supervisory Review Administrator

重要

根據預設,只有建立自訂分類器的使用者可以訓練及審核該分類器所進行的預測。By default, only the user who creates a custom classifier can train and review predictions made by that classifier. 如果您想要讓其他人訓練及審閱分類器預測,請參閱 授與其他人訓練及審查許可權If you want others to be able to train and review classifier predictions, see Give others train and review rights.

準備自訂 trainable 分類器Prepare for a custom trainable classifier

深入瞭解建立自訂 trainable 分類器之前,請先瞭解相關專案。It's helpful to understand what's involved in creating a custom trainable classifier before you dive in.

時間表Timeline

此時程表會反映 trainable 分類器的範例部署。This timeline reflects a sample deployment of trainable classifiers.

trainable-分類器-時程表

提示

Trainable 分類器第一次需要自願加入。Opt-in is required the first time for trainable classifiers. Microsoft 365 需要12天的時間來完成組織內容的基準評估。It takes twelve days for Microsoft 365 to complete a baseline evaluation of your organizations content. 請與您的全域系統管理員聯繫,以啟動自願加入處理常式。Contact your global administrator to kick off the opt-in process.

整體工作流程Overall workflow

若要深入瞭解建立自訂 trainable 分類程式的整體工作流程,請參閱 建立客戶 trainable 分類器的處理流程。To understand more about the overall workflow of creating custom trainable classifiers, see Process flow for creating customer trainable classifiers.

Seed 內容Seed content

當您想要 trainable 的分類器獨立且準確地將專案識別為屬於特定類別的內容時,您必須先呈現該類別中內容類型的許多範例。When you want a trainable classifier to independently and accurately identify an item as being in particular category of content, you first have to present it with many samples of the type of content that are in the category. Trainable 分類器的此樣本饋送稱為「 入」。This feeding of samples to the trainable classifier is known as seeding. Seed content 是由人類選取,並會判斷代表內容的類別。Seed content is selected by a human and is judged to represent the category of content.

提示

您必須具有至少50的肯定範例,以及多達500。You need to have at least 50 positive samples and as many as 500. Trainable 分類程式會處理最多500最近建立的範例 (按檔建立的日期/時間戳記) 。The trainable classifier will process up to the 500 most recent created samples (by file created date/time stamp). 您提供的範例越多,分類器所做的預測就越精確。The more samples you provide, the more accurate the predictions the classifier will make.

測試內容Testing content

Trainable 分類器處理足夠的肯定樣本來建立預測模型之後,您必須測試所做的預測,以查看分類器是否可以正確辨別與類別及專案不符的專案。Once the trainable classifier has processed enough positive samples to build a prediction model, you need to test the predictions it makes to see if the classifier can correctly distinguish between items that match the category and items that don't. 若要這麼做,您可以選取另一個,但願會包含應屬於類別的範例,也可以是更大的人工挑選內容集合,也就是不是的範例。You do this by selecting another, hopefully larger, set of human picked content that consists of samples that should fall into the category and samples that won't. 您應測試與您第一次提供之原始 seed 資料不同的資料。You should test with different data than the initial seed data you first provided. 一旦處理好,您就可以手動流覽結果,並確認每個預測是否正確、不正確或不確定。Once it processes those, you manually go through the results and verify whether each prediction is correct, incorrect, or you aren't sure. Trainable 分類器使用這種意見反應來改善其預測模型。The trainable classifier uses this feedback to improve its prediction model.

提示

為了獲得最佳結果,您的測試範例集合中至少有200個專案,且具有正值和負比對的均勻分佈。For best results, have at least 200 items in your test sample set with an even distribution of positive and negative matches.

如何建立 trainable 的分類器How to create a trainable classifier

  1. 在 50-500 seed content 專案之間收集。Collect between 50-500 seed content items. 這兩個範例都必須是強烈代表您想要讓 trainable 分類器正確識別為分類類別中的內容類型的範例。These must be only samples that strongly represent the type of content you want the trainable classifier to positively identify as being in the classification category. 如需支援的檔案類型,請參閱 SharePoint Server 中的預設編目副檔名和分析檔案類型See, Default crawled file name extensions and parsed file types in SharePoint Server for the supported file types.

    重要

    Seed 及 test 範例專案不能加密,必須是英文。The seed and test sample items must not be encrypted and they must be in English.

    重要

    請確定 seed 集合中的專案是類別的 範例。Make sure the items in your seed set are strong examples of the category. Trainable 分類器最初會根據您植入的模型來建立模型。The trainable classifier initially builds its model based on what you seed it with. 分類器假設所有種子範例都是強陽性的,而且沒有任何方式知道樣本是否與類別弱或消極。The classifier assumes all seed samples are strong positives and has no way of knowing if a sample is a weak or negative match to the category.

  2. 將 seed 內容放在專為 只保留 seed 內容的 SharePoint Online 資料夾中。Place the seed content in a SharePoint Online folder that is dedicated to holding the seed content only. 請記下網站、文件庫和資料夾 URL。Make note of the site, library, and folder URL.

    提示

    如果您為種子資料建立新的網站和資料夾,至少要有一個小時的時間進行索引,才可建立會使用該植入資料的 trainable 分類器。If you create a new site and folder for your seed data, allow at least an hour for that location to be indexed before creating the trainable classifier that will use that seed data.

  3. 使用合規性管理員或安全性系統管理員角色存取,登入 microsoft 365 合規性中心,並開啟microsoft 365 規範中心microsoft 365 的安全性中心 > 資料分類Sign in to Microsoft 365 compliance center with compliance admin or security admin role access and open Microsoft 365 compliance center or Microsoft 365 security center > Data classification.

  4. 選擇 [ Trainable 類元 ] 索引標籤。Choose the Trainable classifiers tab.

  5. 選擇 [ 建立 trainable 的分類器]。Choose Create trainable classifier.

  6. 針對 Name Description 您想要此 trainable 的分類器識別之專案類別的 [和] 欄位填入適當的值。Fill in appropriate values for the Name and Description fields of the category of items you want this trainable classifier to identify.

  7. 從步驟2挑選 seed 內容網站的 SharePoint 線上網站、程式庫和資料夾 URL。Pick the SharePoint Online site, library, and folder URL for the seed content site from step 2. 選擇 AddChoose Add.

  8. 複查設定並選擇 Create trainable classifierReview the settings and choose Create trainable classifier.

  9. 在24小時內,trainable 分類程式會處理 seed 資料,並建立預測模型。Within 24 hours the trainable classifier will process the seed data and build a prediction model. 分類器狀態是在 In progress 處理 seed 資料時。The classifier status is In progress while it processes the seed data. 當分類器完成處理 seed 資料時,狀態變更為 Need test itemsWhen the classifier is finished processing the seed data, the status changes to Need test items.

  10. 您現在可以選擇分類器來查看 [詳細資料] 頁面。You can now view the details page by choosing the classifier.

    trainable 分類器準備好用於測試trainable classifier ready for testing

  11. 請至少收集200測試內容專案 (10000 最大) 以取得最佳結果。Collect at least 200 test content items (10,000 max) for best results. 這些專案應該混合使用強陽性、強負片和有些專案,但其性質卻不明顯。These should be a mix of items that are strong positives, strong negatives and some that are a little less obvious in their nature. 如需支援的檔案類型,請參閱 SharePoint Server 中的預設編目副檔名和分析檔案類型See, Default crawled file name extensions and parsed file types in SharePoint Server for the supported file types.

    重要

    範例專案不能加密,必須是英文。The sample items must not be encrypted and they must be in English.

  12. 將測試內容放入專用於 只保留測試內容的 SharePoint Online 資料夾中。Place the test content in a SharePoint Online folder that is dedicated to holding the test content only. 請記下 SharePoint 線上網站、文件庫和資料夾 URL。Make note of the SharePoint Online site, library, and folder URL.

    提示

    如果您為測試資料建立新的網站和資料夾,至少要有一個小時的時間,才能建立會使用該植入資料的 trainable 分類程式。If you create a new site and folder for your test data, allow at least an hour for that location to be indexed before creating the trainable classifier that will use that seed data.

  13. 選擇 Add items to testChoose Add items to test.

  14. 從步驟12的測試內容網站,挑選 SharePoint 線上網站、程式庫和資料夾 URL。Pick the SharePoint Online site, library, and folder URL for the test content site from step 12. 選擇 AddChoose Add.

  15. 選擇 [完成] DoneFinish the wizard by choosing Done. 您的 trainable 分類程式會需要長達一小時才能處理測試檔案。Your trainable classifier will take up to an hour to process the test files.

  16. 當 trainable 分類器完成處理測試檔案時,[詳細資料] 頁面上的狀態會變更為 Ready to reviewWhen the trainable classifier is done processing your test files, the status on the details page will change to Ready to review. 如果您需要增加測試樣本大小,請選擇 Add items to test 並允許 trainable 的分類器處理其他專案。If you need to increase the test sample size, choose Add items to test and allow the trainable classifier to process the additional items.

    準備好回顧螢幕擷取畫面ready to review screenshot

  17. 選擇 [索引標籤 Tested items to review ] 以查看專案。Choose Tested items to review tab to review items.

  18. Microsoft 365 一次會出現30個專案。Microsoft 365 will present 30 items at a time. 請加以檢查,然後在方塊中 We predict this item is "Relevant". Do you agree? 選擇 [ Yes 或] 或] No Not sure, skip to next itemReview them and in the We predict this item is "Relevant". Do you agree? box choose either Yes or No or Not sure, skip to next item. 模型準確性會在每30個專案之後自動更新。Model accuracy is automatically updated after every 30 items.

    [複查專案] 對話方塊review items box

  19. 回顧 至少 200 個專案。Review at least 200 items. 當精確度分數穩定之後,[ 發行 ] 選項就會變成可用,分類程式的狀態將會說 Ready to useOnce the accuracy score has stabilized, the publish option will become available and the classifier status will say Ready to use.

    精確度分數並準備發佈accuracy score and ready to publish

  20. 發佈分類器。Publish the classifier.

  21. 一旦發佈您的分類符,就會在使用靈敏度標籤的 Office 自動標籤中提供條件,根據狀況和通訊規範自動套用保留標籤原則Once published your classifier will be available as a condition in Office auto-labeling with sensitivity labels, auto-apply retention label policy based on a condition and in Communication compliance.

讓其他人訓練和審閱許可權Give others train and review rights

使用此程式可提供其他人訓練、審閱及調整自訂 trainable 分類器的許可權。Use this procedure to give others permissions to train, review and tune your custom trainable classifier.

  1. 就像是分類器的建立者,全域系統管理員或 eDiscovery 系統管理員會使用 PowerShell,使用 connect To Security & 規範中心 PowerShell中的程式來連接至規範中心。As the creator of the classifier, a global admin or eDiscovery admin connect to the Compliance center using PowerShell using the procedures in Connect to Security & Compliance Center PowerShell.

  2. 執行此命令︰Run this command:

    Add-ComplianceCaseMember -Case "<classifier name>" -Member "<user or role group>"
    

    例如:For example:

    Add-ComplianceCaseMember -Case "Financial Contract Classifier" -Member johnevans@contoso.com

    您可以多次執行此命令來新增多個使用者。You can run this command multiple times to add multiple users. 請注意,您只能將 Exchange Online Protection (EOP) 角色群組和 Azure 角色群組。Note that you can only add Exchange Online Protection (EOP) Role Groups and not Azure Role Groups.