How to use Named Entity Recognition in Text Analytics

The Text Analytics API takes unstructured text and returns a list of disambiguated entities, with links to more information on the web. The API supports both named entity recognition (NER) and entity linking.

Entity Linking

Entity linking is the ability to identify and disambiguate the identity of an entity found in text (for example, determining whether an occurrence of the word "Mars" refers to the planet or to the Roman god of war). This process requires a knowledge base in an appropriate language, to which entities recognized in the text can be linked. Entity Linking uses Wikipedia as this knowledge base.

Named Entity Recognition (NER)

Named Entity Recognition (NER) is the ability to identify different entities in text and categorize them into pre-defined classes or types such as person, location, event, product, and organization.

Named Entity Recognition versions and features

Important

Text Analytics API v3 is not available in the following regions: Central India, UAE North, China North 2, China East.

| Feature | NER v3.0 | NER v3.1-preview.2 |
| Methods for single and batch requests | X | X |
| Expanded entity recognition across several categories | X | X |
| Separate endpoints for sending entity linking and NER requests | X | X |
| Recognition of personal (PII) and health (PHI) information entities | | X |

See language support for information.

Entity types

Named Entity Recognition v3 provides expanded detection across multiple types. Currently, NER v3.0 can recognize entities in the general entity category.

Named Entity Recognition v3.1-preview.2 includes the detection capabilities of v3.0, and the ability to detect personal information (PII) using the v3.1-preview.2/entities/recognition/pii endpoint. You can use the optional domain=phi parameter to detect confidential health information (PHI). See the entity categories article and the request endpoints section below for more information.

Sending a REST API request

Preparation

You must have JSON documents in this format: ID, text, language.

Each document must be under 5,120 characters, and you can have up to 1,000 items (IDs) per collection. The collection is submitted in the body of the request.
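The limits above can be checked before a collection is submitted. The following is a minimal sketch, not part of the API: the function name validate_collection and the constant names are hypothetical, and Python's len() counts Unicode code points, which may differ from how the service counts characters for text containing emoji or combining marks.

```python
MAX_CHARS = 5120       # each document must be under this many characters
MAX_DOCUMENTS = 1000   # maximum number of items (IDs) per collection


def validate_collection(documents):
    """Return a list of problems found in a list of document dicts.

    Each document is expected to look like
    {"id": "...", "language": "...", "text": "..."}.
    An empty list means no problems were found.
    """
    problems = []
    if len(documents) > MAX_DOCUMENTS:
        problems.append(
            f"collection has {len(documents)} items; limit is {MAX_DOCUMENTS}"
        )
    for doc in documents:
        doc_id = doc.get("id")
        if not doc_id:
            problems.append("document is missing an id")
        # Assumption: len() is used as an approximation of the service's count.
        if len(doc.get("text", "")) >= MAX_CHARS:
            problems.append(
                f"document {doc_id}: text is not under {MAX_CHARS} characters"
            )
    return problems
```

A caller could refuse to send the request, or split the collection into smaller ones, whenever the returned list is not empty.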

Structure the request

Create a POST request. You can use Postman or the API testing console in the following links to quickly structure and send one.

Note

You can find the key and endpoint for your Text Analytics resource on the Azure portal. They are located on the resource's Quick start page, under Resource Management.

Request endpoints

Named Entity Recognition v3.1-preview.2 uses separate endpoints for NER and entity linking requests. Use one of the URL formats below, depending on your request:

Entity linking

  • https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.1-preview.2/entities/linking

Named Entity Recognition version 3.1-preview reference for Linking

NER

  • General entities - https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.1-preview.2/entities/recognition/general

Named Entity Recognition version 3.1-preview reference for General

  • Personal (PII) information - https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.1-preview.2/entities/recognition/pii

You can also use the optional domain=phi parameter to detect health (PHI) information in text.

https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.1-preview.2/entities/recognition/pii?domain=phi

Note that the response JSON includes a redactedText property. It contains the input text with each character of every detected PII entity replaced by an *.

Named Entity Recognition version 3.1-preview reference for PII
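The URL formats above differ only in their final path segment and the optional domain=phi query parameter, so they can be assembled by a small helper. This is a sketch under stated assumptions: build_endpoint and the "linking", "general", and "pii" keys are hypothetical names chosen for illustration, and the subdomain is whatever your own resource uses.

```python
from urllib.parse import urlencode

API_VERSION = "v3.1-preview.2"

# Path suffixes copied from the endpoint list in this section.
# The dictionary keys are illustrative labels, not API values.
PATHS = {
    "linking": "entities/linking",
    "general": "entities/recognition/general",
    "pii": "entities/recognition/pii",
}


def build_endpoint(subdomain, kind, phi=False):
    """Build the request URL for one of the three v3.1-preview.2 endpoints."""
    if kind not in PATHS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    if phi and kind != "pii":
        # domain=phi is documented here only for the PII endpoint.
        raise ValueError("domain=phi applies only to the PII endpoint")
    url = (
        f"https://{subdomain}.cognitiveservices.azure.com"
        f"/text/analytics/{API_VERSION}/{PATHS[kind]}"
    )
    if phi:
        url += "?" + urlencode({"domain": "phi"})
    return url
```

For example, build_endpoint("my-subdomain", "pii", phi=True) yields the PHI variant of the PII URL, with "my-subdomain" standing in for your custom subdomain.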

Set a request header to include your Text Analytics API key. In the request body, provide the JSON documents you prepared.
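As an alternative to Postman, the request can be assembled with Python's standard library. This is a minimal sketch: the header name Ocp-Apim-Subscription-Key is not stated in this article and is an assumption based on the name Cognitive Services commonly uses for the key header, and build_request and send are hypothetical helper names.

```python
import json
import urllib.request


def build_request(endpoint_url, api_key, documents):
    """Create a POST request carrying the documents as a JSON body."""
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            # Assumption: header name commonly used by Cognitive Services.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body before any network call is made.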

Example NER request

The following is an example of content you might send to the API. The request format is the same for both versions of the API.

{
  "documents": [
    {
        "id": "1",
        "language": "en",
        "text": "Our tour guide took us up the Space Needle during our trip to Seattle last week."
    }
  ]
}
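A request body of this shape can be generated from a plain list of strings. The sketch below assumes sequential numeric IDs starting at "1" and a single language for all documents; both are illustrative choices, and make_body is a hypothetical helper name.

```python
import json


def make_body(texts, language="en"):
    """Wrap a list of strings in the documents structure shown above."""
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    return {"documents": documents}


body = make_body([
    "Our tour guide took us up the Space Needle during our trip to Seattle last week."
])
print(json.dumps(body, indent=2))
```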

Post the request

Analysis is performed upon receipt of the request. See the data limits section in the overview for information on the size and number of requests you can send per minute and second.

The Text Analytics API is stateless. No data is stored in your account, and results are returned immediately in the response.

View results

All POST requests return a JSON formatted response with the IDs and detected entity properties.

Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See how to process text offsets for more information.

Example responses

Version 3 provides separate endpoints for general NER, PII, and entity linking. Example responses for the PII and entity linking operations are shown below.

Example of a PII response:

{
  "documents": [
    {
    "redactedText": "You can even pre-order from their online menu at *************************, call ************ or send email to ***************************!",
    "id": "0",
    "entities": [
        {
        "text": "www.contososteakhouse.com",
        "category": "URL",
        "offset": 49,
        "length": 25,
        "confidenceScore": 0.8
        }, 
        {
        "text": "312-555-0176",
        "category": "Phone Number",
        "offset": 81,
        "length": 12,
        "confidenceScore": 0.8
        }, 
        {
        "text": "order@contososteakhouse.com",
        "category": "Email",
        "offset": 111,
        "length": 27,
        "confidenceScore": 0.8
        }
      ],
    "warnings": []
    }
  ],
  "errors": [],
  "modelVersion": "2020-07-01"
}
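The relationship between the entities array and redactedText in this response can be reproduced locally. The sketch below is illustrative only: the input text is reconstructed from the response above rather than given in this article, redact is a hypothetical helper, and it assumes offsets index directly into a Python string, which holds for plain ASCII text like this sample but may not hold for emoji or other multi-unit characters.

```python
def redact(text, entities):
    """Replace each character of every detected entity with '*'."""
    chars = list(text)
    for entity in entities:
        start = entity["offset"]
        end = start + entity["length"]
        chars[start:end] = "*" * (end - start)
    return "".join(chars)


# Input text reconstructed from the sample response (an assumption).
original = (
    "You can even pre-order from their online menu at "
    "www.contososteakhouse.com, call 312-555-0176 or send email to "
    "order@contososteakhouse.com!"
)

# Offsets and lengths copied from the sample response.
entities = [
    {"offset": 49, "length": 25},
    {"offset": 81, "length": 12},
    {"offset": 111, "length": 27},
]

redacted = redact(original, entities)
print(redacted)
```

In practice the service already returns redactedText, so a helper like this is mainly useful for checking how offsets and lengths map onto your own input.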

Example of an entity linking response:

{
  "documents": [
    {
      "id": "1",
      "entities": [
        {
          "bingId": "f8dd5b08-206d-2554-6e4a-893f51f4de7e", 
          "name": "Space Needle",
          "matches": [
            {
              "text": "Space Needle",
              "offset": 30,
              "length": 12,
              "confidenceScore": 0.4
            }
          ],
          "language": "en",
          "id": "Space Needle",
          "url": "https://en.wikipedia.org/wiki/Space_Needle",
          "dataSource": "Wikipedia"
        },
        {
          "bingId": "5fbba6b8-85e1-4d41-9444-d9055436e473",
          "name": "Seattle",
          "matches": [
            {
              "text": "Seattle",
              "offset": 62,
              "length": 7,
              "confidenceScore": 0.25
            }
          ],
          "language": "en",
          "id": "Seattle",
          "url": "https://en.wikipedia.org/wiki/Seattle",
          "dataSource": "Wikipedia"
        }
      ],
      "warnings": []
    }
  ],
  "errors": [],
  "modelVersion": "2020-02-01"
}
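A response like this can be flattened into one row per match, which makes it easier to sort or search. The following is a sketch: linked_entities is a hypothetical helper, the field names it reads are the ones in the sample above, and the sample is abbreviated to those fields.

```python
def linked_entities(response):
    """Flatten an entity linking response into one dict per match."""
    rows = []
    for document in response.get("documents", []):
        for entity in document.get("entities", []):
            for match in entity.get("matches", []):
                rows.append({
                    "document_id": document["id"],
                    "name": entity["name"],
                    "url": entity["url"],
                    "text": match["text"],
                    "offset": match["offset"],
                    "confidence": match["confidenceScore"],
                })
    return rows


# Abbreviated copy of the sample response above.
sample = {
    "documents": [{
        "id": "1",
        "entities": [
            {"name": "Space Needle",
             "url": "https://en.wikipedia.org/wiki/Space_Needle",
             "matches": [{"text": "Space Needle", "offset": 30,
                          "length": 12, "confidenceScore": 0.4}]},
            {"name": "Seattle",
             "url": "https://en.wikipedia.org/wiki/Seattle",
             "matches": [{"text": "Seattle", "offset": 62,
                          "length": 7, "confidenceScore": 0.25}]},
        ],
    }],
    "errors": [],
}

# Sort matches from most to least confident.
rows = sorted(linked_entities(sample), key=lambda r: r["confidence"], reverse=True)
for row in rows:
    print(row["document_id"], row["name"], row["confidence"], row["url"])
```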

Summary

In this article, you learned concepts and workflow for entity linking using Text Analytics in Cognitive Services. In summary:

  • JSON documents in the request body include an ID, text, and language code.
  • POST requests are sent to one or more endpoints, using a personalized access key and an endpoint that is valid for your subscription.
  • Response output, which consists of linked entities (including confidence scores, offsets, and web links for each document ID), can be used in any application.

Next steps