Using a built-in classifier (preview)
Microsoft has trained and tested a number of classifiers using very large sample data sets, which can help to identify certain categories of content. See Getting started with trainable classifiers (preview). These classifiers show up in the Ready to use group by default.
- Offensive Language: detects text items that contain profanities, slurs, taunts, and disguised expressions (which are expressions that have the same meaning as a more offensive term).
- Resumes: detects items that are textual accounts of an applicant's personal, educational, and professional qualifications, work experience, and other personally identifying information.
- Source Code: detects items that contain a set of instructions and statements written in the 25 most used computer programming languages on GitHub.
- Harassment: detects a specific category of offensive language text items related to offensive conduct targeting one or multiple individuals based on the following traits: race, ethnicity, religion, national origin, gender, sexual orientation, age, disability.
- Profanity: detects a specific category of offensive language text items that contain expressions that embarrass most people.
- Threat: detects a specific category of offensive language text items related to threats to commit violence or do physical harm or damage to a person or property.
Before using a built-in classifier in your classification and labeling workflow, you should test it against a sample of your organization's content that you feel fits the category, to verify that its classification predictions meet your expectations.
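One way to make "meets your expectations" concrete is to count how the classifier treated your positive and negative samples and compute precision and recall from those counts. The following is a minimal sketch in Python, not part of the product: the `SampleCounts` type, the example counts, and the `min_precision` and `min_recall` thresholds are all hypothetical values that you would replace with your own.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Hand-tallied results of a classifier test (all fields are illustrative)."""
    true_positives: int   # positive samples that received the label
    false_negatives: int  # positive samples that did not receive the label
    false_positives: int  # negative samples that received the label
    true_negatives: int   # negative samples that did not receive the label


def precision(c: SampleCounts) -> float:
    """Share of labeled items that really belong in the category."""
    matched = c.true_positives + c.false_positives
    return c.true_positives / matched if matched else 0.0


def recall(c: SampleCounts) -> float:
    """Share of in-category items that the classifier actually labeled."""
    positives = c.true_positives + c.false_negatives
    return c.true_positives / positives if positives else 0.0


def meets_expectations(c: SampleCounts,
                       min_precision: float = 0.9,
                       min_recall: float = 0.8) -> bool:
    """Compare both metrics with thresholds chosen by your organization."""
    return precision(c) >= min_precision and recall(c) >= min_recall


# Hypothetical tally from a test with 20 positive and 20 negative samples.
counts = SampleCounts(true_positives=18, false_negatives=2,
                      false_positives=1, true_negatives=19)
print(f"precision={precision(counts):.2f} recall={recall(counts):.2f}")
print("meets expectations:", meets_expectations(counts))
```

Which threshold matters more depends on how you plan to use the classifier: a low precision means unrelated content gets labeled, while a low recall means relevant content is missed.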
Please note that the offensive language, harassment, profanity, and threat classifiers only work with searchable text and are not exhaustive or complete. Further, language and cultural standards continually change, and in light of these realities, Microsoft reserves the right to update these classifiers at its discretion. While the classifiers may assist your organization in monitoring offensive and other language used, the classifiers do not address consequences of such language and are not intended to provide your organization's sole means of monitoring or responding to the use of such language. Your organization, and not Microsoft or its subsidiaries, remains responsible for all decisions related to monitoring, enforcement, blocking, removal, and retention of any content identified by a pre-trained classifier.
How to prepare for and use a built-in classifier
Collect disposable test content items that you feel belong in the category of the built-in classifier you're testing (positive matches) and items that shouldn't be included in that category (negative matches).
The sample items must not be encrypted, and they must be in English.
Create a dedicated SharePoint Online folder; wait at least an hour for the folder to be added to the search index. Make note of the folder URL.
Sign in to the Microsoft 365 compliance center with compliance admin or security admin role access and open Microsoft 365 compliance center > Records management (preview) > Label policies tab.
Choose the Auto-apply a label option.
Choose the Choose a label to auto-apply option.
Choose Create new labels and create a label for use just with this test. When you do this, leave Retention set to off. You don't want to turn on any retention or other actions. In this case, you'll be using the retention label simply as a text label, without enforcing any actions. For example, you can create a retention label named "SourceCode classifier test" with no actions, and then auto-apply that retention label to content, with the Source Code classifier as the condition. To learn more about creating retention labels, see Overview of retention labels.
Choose Auto-apply a label, and then the Choose a label to auto-apply option. To learn more about auto-applying a label based on a condition, see Auto-apply retention label policy based on a condition.
Choose your test label from the list.
Choose Apply label to content that matches a trainable classifier.
Choose your classifier from the list, in this case Source Code.
Name the policy, for example "Source code built-in classifier test".
Choose the Let me choose specific locations option.
Turn off all locations except SharePoint sites.
Enter the URL of the dedicated SharePoint Online folder that you noted earlier.
Finish the wizard.
Place the test items into the dedicated SharePoint Online folder.
Allow an hour for the label to be applied.
Check the properties of the documents for the label to see if the classifier included and excluded the test content as you expected.
Review the items that were labeled.
Delete the content and the label policy if you're done with your testing.
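When you check the document properties, it can help to compare what you observed against a written list of what you expected. The following Python sketch assumes you keep a small manifest of test file names with the expected outcome and then type in the names of the files on which you found the test label. The file names, the `expected` manifest, and the `compare` helper are hypothetical and exist only for illustration; nothing here calls a Microsoft 365 API.

```python
# Hypothetical manifest: file name -> True if you expect the classifier to match it.
expected = {
    "build_script.py": True,
    "parser.c": True,
    "meeting_notes.docx": False,
    "budget.xlsx": False,
}

# File names on which you saw the test label in the document properties
# (entered by hand after the label had time to be applied).
labeled = {"build_script.py", "meeting_notes.docx"}


def compare(expected: dict[str, bool], labeled: set[str]) -> dict[str, list[str]]:
    """Return the items where the classifier disagreed with the manifest."""
    # Positive matches that did not receive the label.
    missed = sorted(name for name, want in expected.items()
                    if want and name not in labeled)
    # Negative matches that received the label anyway.
    unexpected = sorted(name for name, want in expected.items()
                        if not want and name in labeled)
    # Labeled items that are not in the manifest at all.
    unknown = sorted(labeled - set(expected))
    return {"missed": missed, "unexpected": unexpected, "unknown": unknown}


for kind, names in compare(expected, labeled).items():
    print(f"{kind}: {', '.join(names) or 'none'}")
```

Items listed as missed or unexpected are the ones worth opening and reading before you decide whether the classifier fits your content.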