Learn text moderation concepts

Use Content Moderator's machine-assisted text moderation and human review capabilities to moderate text content.

You can block, approve, or review content based on your policies and thresholds. Use it to augment human moderation in environments where partners, employees, and consumers generate text content. These include chat rooms, discussion boards, chatbots, e-commerce catalogs, and documents.

The service response includes the following information:

  • Profanity: term-based matching against a built-in list of profane terms in various languages
  • Classification: machine-assisted classification into three categories
  • Personal data
  • Auto-corrected text
  • Original text
  • Language

Profanity

If the API detects any profane terms in any of the supported languages, those terms are included in the response. The response also contains their location (Index) in the original text. The ListId in the following sample JSON refers to terms found in custom term lists, if available.

"Terms": [
{
    "Index": 118,
    "OriginalIndex": 118,
    "ListId": 0,
    "Term": "crap"
}
]

Note

For the language parameter, assign eng or leave it empty to see the machine-assisted classification response (preview feature). This feature supports English only.

For profanity term detection, use the ISO 639-3 code of one of the supported languages listed in this article, or leave it empty.
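
As a minimal sketch of how a client might read the Terms array from a response, the Python snippet below parses a JSON string and returns the index and term of each match. The function name extract_terms is hypothetical, and treating a missing or null Terms field as "no matches" is an assumption rather than documented behavior.

```python
import json


def extract_terms(response_json: str) -> list[tuple[int, str]]:
    """Return (index, term) pairs from the "Terms" array of a response."""
    response = json.loads(response_json)
    # Assumption: "Terms" may be absent or null when nothing matched.
    terms = response.get("Terms") or []
    return [(t["Index"], t["Term"]) for t in terms]


# Sample taken from the profanity example in this article,
# wrapped in braces so that it is a complete JSON object.
sample = (
    '{"Terms": [{"Index": 118, "OriginalIndex": 118, '
    '"ListId": 0, "Term": "crap"}]}'
)
found = extract_terms(sample)
print(found)
```

Each pair can then be used to highlight or mask the term in the original text.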

Classification

Content Moderator's machine-assisted text classification feature supports English only, and helps detect potentially undesired content. The flagged content may be assessed as inappropriate depending on context. The feature conveys the likelihood of each category and may recommend a human review. It uses a trained model to identify possible abusive, derogatory, or discriminatory language. Content flagged for review includes slang, abbreviated words, offensive words, and intentionally misspelled words.

The following JSON extract shows an example output:

"Classification": {
    "ReviewRecommended": true,
    "Category1": {
        "Score": 1.5113095059859916E-06
    },
    "Category2": {
        "Score": 0.12747249007225037
    },
    "Category3": {
        "Score": 0.98799997568130493
    }
}

Explanation

  • Category1 refers to the potential presence of language that may be considered sexually explicit or adult in certain situations.
  • Category2 refers to the potential presence of language that may be considered sexually suggestive or mature in certain situations.
  • Category3 refers to the potential presence of language that may be considered offensive in certain situations.
  • Score is between 0 and 1. The higher the score, the more likely the model considers the category to be applicable. This feature relies on a statistical model rather than manually coded outcomes. We recommend testing with your own content to determine how each category aligns with your requirements.
  • ReviewRecommended is either true or false, depending on the internal score thresholds. Customers should assess whether to use this value or decide on custom thresholds based on their content policies.
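
The choice between ReviewRecommended and custom thresholds can be sketched as follows. This is an illustration only: the function name needs_review and the threshold values are hypothetical, and suitable thresholds depend on your own content policy and testing.

```python
def needs_review(classification: dict, thresholds: dict[str, float]) -> bool:
    """Return True if any category score reaches its custom threshold."""
    for category, limit in thresholds.items():
        # A category missing from the response is treated as score 0.0 (assumption).
        score = classification.get(category, {}).get("Score", 0.0)
        if score >= limit:
            return True
    return False


# Scores copied from the example output in this article.
classification = {
    "ReviewRecommended": True,
    "Category1": {"Score": 1.5113095059859916e-06},
    "Category2": {"Score": 0.12747249007225037},
    "Category3": {"Score": 0.98799997568130493},
}

# Hypothetical thresholds; tune them against your own content.
thresholds = {"Category1": 0.5, "Category2": 0.5, "Category3": 0.9}

flag = needs_review(classification, thresholds)
print(flag)
```

Keeping the thresholds in a plain dictionary makes it easy to use a different policy per environment, for example a stricter one for a children's chat room than for an internal discussion board.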

Personal data

The PII feature detects the potential presence of the following information:

  • Email address
  • US mailing address
  • IP address
  • US phone number
  • UK phone number
  • Social Security Number (SSN)

The following example shows a sample response:

"PII": {
    "Email": [{
        "Detected": "abcdef@abcd.com",
        "SubType": "Regular",
        "Text": "abcdef@abcd.com",
        "Index": 32
        }],
    "IPA": [{
        "SubType": "IPV4",
        "Text": "255.255.255.255",
        "Index": 72
        }],
    "Phone": [{
        "CountryCode": "US",
        "Text": "6657789887",
        "Index": 56
        }, {
        "CountryCode": "US",
        "Text": "870 608 4000",
        "Index": 212
        }, {
        "CountryCode": "UK",
        "Text": "+44 870 608 4000",
        "Index": 208
        }, {
        "CountryCode": "UK",
        "Text": "0344 800 2400",
        "Index": 228
        }, {
        "CountryCode": "UK",
        "Text": "0800 820 3300",
        "Index": 245
        }],
    "Address": [{
        "Text": "1 Microsoft Way, Redmond, WA 98052",
        "Index": 89
        }],
    "SSN": [{
        "Text": "999999999",
        "Index": 56
        }, {
        "Text": "999-99-9999",
        "Index": 267
        }]
    }
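
One possible use of the PII section is to mask the detected values before the text is stored or shown to reviewers. The sketch below assumes that Index is a zero-based character offset into the original text and that Text is the exact matched substring. The function name redact and the input text are hypothetical, and the indices are computed from that input rather than taken from the sample response.

```python
def redact(text: str, pii: dict) -> str:
    """Mask every PII match with asterisks of the same length."""
    chars = list(text)
    for matches in pii.values():
        for match in matches:
            start, value = match["Index"], match["Text"]
            end = start + len(value)
            # Only mask when the offset really points at the reported text.
            if text[start:end] == value:
                chars[start:end] = "*" * len(value)
    return "".join(chars)


# Hypothetical input; indices are derived from it so that they are consistent.
text = "Mail abcdef@abcd.com from 255.255.255.255"
pii = {
    "Email": [{"Text": "abcdef@abcd.com",
               "Index": text.index("abcdef@abcd.com")}],
    "IPA": [{"SubType": "IPV4", "Text": "255.255.255.255",
             "Index": text.index("255.255.255.255")}],
}

redacted = redact(text, pii)
print(redacted)
```

Because each replacement has the same length as the matched value, the offsets of later matches stay valid and the matches can be processed in any order.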

Auto-correction

Suppose the input text is the following (the misspellings 'lzay' and 'f0x' are intentional):

The qu!ck brown f0x jumps over the lzay dog.

If you ask for auto-correction, the response contains the corrected version of the text:

The quick brown fox jumps over the lazy dog.

Creating and managing your custom lists of terms

While the default, global list of terms works well for most cases, you may want to screen against terms that are specific to your business needs. For example, you may want to filter out competitors' brand names from posts by users.

Note

You can have a maximum of 5 term lists, and each list can contain no more than 10,000 terms.

The following example shows the matching List ID:

"Terms": [
{
    "Index": 118,
    "OriginalIndex": 118,
    "ListId": 231,
    "Term": "crap"
}
]

Content Moderator provides a Term List API with operations for managing custom term lists. Start with the Term Lists API Console and use the REST API code samples. If you are familiar with Visual Studio and C#, also check out the Term Lists .NET quickstart.
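
When several term lists are in use, it can help to group matched terms by ListId so that each list's matches are handled separately. The sketch below is illustrative: the function name group_by_list is hypothetical, and the two entries are copied from the sample responses in this article and combined into one list only for demonstration.

```python
from collections import defaultdict


def group_by_list(terms: list[dict]) -> dict[int, list[str]]:
    """Group matched terms by the ID of the list that produced them."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for term in terms:
        grouped[term["ListId"]].append(term["Term"])
    return dict(grouped)


# Entries copied from the two "Terms" samples in this article.
terms = [
    {"Index": 118, "OriginalIndex": 118, "ListId": 231, "Term": "crap"},
    {"Index": 118, "OriginalIndex": 118, "ListId": 0, "Term": "crap"},
]

grouped = group_by_list(terms)
print(grouped)
```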

Next steps

Test drive the Text moderation API console and use the REST API code samples. If you're familiar with Visual Studio and C#, also check out the Text moderation .NET quickstart.