How to use Named Entity Recognition in Text Analytics

The Text Analytics API takes unstructured text and returns a list of disambiguated entities, with links to more information on the web. The API supports both named entity recognition (NER) and entity linking.

Entity Linking

Entity linking is the ability to identify and disambiguate the identity of an entity found in text (for example, determining whether an occurrence of the word "Mars" refers to the planet, or to the Roman god of war). This process requires the presence of a knowledge base in an appropriate language, to link recognized entities in text. Entity Linking uses Wikipedia as this knowledge base.

Named Entity Recognition (NER)

Named Entity Recognition (NER) is the ability to identify different entities in text and categorize them into pre-defined classes or types, such as person, location, event, product, and organization.

Named Entity Recognition versions and features

Important

Text Analytics API v3 is not available in the following regions: Central India, Fairfax, UAE North, South Africa North, China North 2, China East.

Feature                                                              | NER v3.0 | NER v3.1-preview.1
Methods for single and batch requests                                | X        | X
Expanded entity recognition across several categories                | X        | X
Separate endpoints for sending entity linking and NER requests       | X        | X
Recognition of personal (PII) and health (PHI) information entities  |          | X

See language support for information.

Entity types

Named Entity Recognition v3 provides expanded detection across multiple types. Currently, NER v3.0 can recognize entities in the general entity category.

Named Entity Recognition v3.1-preview.1 includes the detection capabilities of v3.0, and the ability to detect personal information (PII) using the v3.1-preview.1/entities/recognition/pii endpoint. You can use the optional domain=phi parameter to detect confidential health information (PHI). See the entity categories article, and request endpoints section below for more information.

Sending a REST API request

Preparation

You must have JSON documents that each contain these fields: an ID, the text, and a language code.

Each document must be under 5,120 characters, and you can have up to 1,000 items (IDs) per collection. The collection is submitted in the body of the request.
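As a minimal sketch of the preparation step, the following Python helper wraps plain strings in the request body shape and checks the two limits stated above. The function name `build_request_body` and the choice to number IDs from 1 are assumptions made for illustration. Python's `len` counts code points, which may not match exactly how the service counts characters.

```python
MAX_CHARS_PER_DOCUMENT = 5120      # each document must be under this many characters
MAX_DOCUMENTS_PER_REQUEST = 1000   # maximum items (IDs) per collection


def build_request_body(texts, language="en"):
    """Wrap strings in the {"documents": [...]} shape (hypothetical helper)."""
    if len(texts) > MAX_DOCUMENTS_PER_REQUEST:
        raise ValueError(
            f"{len(texts)} documents exceeds the limit of {MAX_DOCUMENTS_PER_REQUEST}"
        )
    documents = []
    for index, text in enumerate(texts, start=1):
        # Assumption: len() approximates the service's character count.
        if len(text) >= MAX_CHARS_PER_DOCUMENT:
            raise ValueError(f"document {index} is not under {MAX_CHARS_PER_DOCUMENT} characters")
        documents.append({"id": str(index), "language": language, "text": text})
    return {"documents": documents}
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the request.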

Structure the request

Create a POST request. You can use Postman or the API testing console in the following links to quickly structure and send one.

Note

You can find the key and endpoint for your Text Analytics resource in the Azure portal. They are located on the resource's Quick start page, under resource management.

Request endpoints

Named Entity Recognition v3 uses separate endpoints for NER and entity linking requests. Use the URL format below that matches your request:

Entity linking

  • https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0/entities/linking

NER

  • https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0/entities/recognition/general
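The URL formats above can be assembled programmatically. The sketch below is one possible helper, not part of the service: `build_endpoint` and the subdomain `contoso` are hypothetical. The `pii` path and the `domain` query parameter come from the v3.1-preview.1 description earlier in this article. Placing them under the same host and path prefix as the v3.0 URLs is an assumption.

```python
from urllib.parse import urlencode

# Path suffixes taken from this article; "pii" applies to v3.1-preview.1 only.
OPERATION_PATHS = {
    "linking": "entities/linking",
    "general": "entities/recognition/general",
    "pii": "entities/recognition/pii",
}


def build_endpoint(subdomain, operation, version="v3.0", domain=None):
    """Return the request URL for an operation (hypothetical helper)."""
    path = OPERATION_PATHS[operation]
    url = (
        f"https://{subdomain}.cognitiveservices.azure.com"
        f"/text/analytics/{version}/{path}"
    )
    if domain is not None:
        # Assumption: the optional domain=phi parameter is passed as a query string.
        url += "?" + urlencode({"domain": domain})
    return url


ner_url = build_endpoint("contoso", "general")
linking_url = build_endpoint("contoso", "linking")
phi_url = build_endpoint("contoso", "pii", version="v3.1-preview.1", domain="phi")
```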

Set a request header to include your Text Analytics API key. In the request body, provide the JSON documents you prepared.

Example NER request

The following is an example of content you might send to the API. The request format is the same for both versions of the API.

{
  "documents": [
    {
        "id": "1",
        "language": "en",
        "text": "Our tour guide took us up the Space Needle during our trip to Seattle last week."
    }
  ]
}
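A request like the one above could be built with Python's standard library, as in the following sketch. The function names are hypothetical. The header name `Ocp-Apim-Subscription-Key` is not stated in this article; it is the key header generally used by Cognitive Services resources, so confirm it against your resource before relying on it. Calling `send` performs a real network call and needs a valid key and endpoint.

```python
import json
import urllib.request


def make_ner_request(endpoint_url, api_key, body):
    """Build (but do not send) a POST request carrying the JSON documents."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=data,
        method="POST",
        headers={
            # Assumption: key header name used by Cognitive Services resources.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = {
    "documents": [
        {
            "id": "1",
            "language": "en",
            "text": "Our tour guide took us up the Space Needle during our trip to Seattle last week.",
        }
    ]
}
```

With a real endpoint URL and key, `send(make_ner_request(url, key, body))` would return the parsed response.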

Post the request

Analysis is performed upon receipt of the request. See the data limits section in the overview for information on the size and number of requests you can send per minute and second.

The Text Analytics API is stateless. No data is stored in your account, and results are returned immediately in the response.

View results

All POST requests return a JSON-formatted response with the IDs and detected entity properties.

Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See how to process text offsets for more information.
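To illustrate how an offset and length relate to the submitted text, the sketch below slices one entity out of the example sentence used in this article. The helper `entity_span` is hypothetical. A direct slice like this assumes the service's offsets line up with Python string indices, which holds for plain text such as this sentence but may not hold for text containing emoji or combining characters.

```python
def entity_span(document_text, entity):
    """Return the substring that an entity's offset and length point to."""
    start = entity["offset"]
    end = start + entity["length"]
    return document_text[start:end]


text = "Our tour guide took us up the Space Needle during our trip to Seattle last week."
entity = {"text": "Space Needle", "offset": 30, "length": 12}

# The slice should reproduce the entity text reported by the service.
span = entity_span(text, entity)
```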

Example v3 responses

Version 3 provides separate endpoints for NER and entity linking. The responses for both operations are below.

Example NER response

{
  "documents": [
    {
      "id": "1",
      "entities": [
        {
          "text": "tour guide",
          "category": "PersonType",
          "offset": 4,
          "length": 10,
          "confidenceScore": 0.45
        },
        {
          "text": "Space Needle",
          "category": "Location",
          "offset": 30,
          "length": 12,
          "confidenceScore": 0.38
        },
        {
          "text": "trip",
          "category": "Event",
          "offset": 54,
          "length": 4,
          "confidenceScore": 0.78
        },
        {
          "text": "Seattle",
          "category": "Location",
          "subcategory": "GPE",
          "offset": 62,
          "length": 7,
          "confidenceScore": 0.78
        },
        {
          "text": "last week",
          "category": "DateTime",
          "subcategory": "DateRange",
          "offset": 70,
          "length": 9,
          "confidenceScore": 0.8
        }
      ],
      "warnings": []
    }
  ],
  "errors": [],
  "modelVersion": "2020-04-01"
}
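One way to work with a response like this is to keep only entities above a confidence threshold and group them by category. The sketch below is illustrative: `entities_by_category` and the 0.5 default threshold are assumptions, not service recommendations, and the sample data is an abbreviated copy of the response above.

```python
from collections import defaultdict


def entities_by_category(response, min_confidence=0.5):
    """Group entity texts by category, dropping low-confidence matches."""
    grouped = defaultdict(list)
    for document in response.get("documents", []):
        for entity in document.get("entities", []):
            if entity["confidenceScore"] >= min_confidence:
                grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


# Abbreviated copy of the example NER response (only the fields used here).
ner_response = {
    "documents": [
        {
            "id": "1",
            "entities": [
                {"text": "tour guide", "category": "PersonType", "confidenceScore": 0.45},
                {"text": "Space Needle", "category": "Location", "confidenceScore": 0.38},
                {"text": "trip", "category": "Event", "confidenceScore": 0.78},
                {"text": "Seattle", "category": "Location", "confidenceScore": 0.78},
                {"text": "last week", "category": "DateTime", "confidenceScore": 0.8},
            ],
        }
    ],
    "errors": [],
}

confident = entities_by_category(ner_response)
```

With the default threshold, "tour guide" and "Space Needle" are dropped because their scores are below 0.5.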

Example entity linking response

{
  "documents": [
    {
      "id": "1",
      "entities": [
        {
          "name": "Space Needle",
          "matches": [
            {
              "text": "Space Needle",
              "offset": 30,
              "length": 12,
              "confidenceScore": 0.4
            }
          ],
          "language": "en",
          "id": "Space Needle",
          "url": "https://en.wikipedia.org/wiki/Space_Needle",
          "dataSource": "Wikipedia"
        },
        {
          "name": "Seattle",
          "matches": [
            {
              "text": "Seattle",
              "offset": 62,
              "length": 7,
              "confidenceScore": 0.25
            }
          ],
          "language": "en",
          "id": "Seattle",
          "url": "https://en.wikipedia.org/wiki/Seattle",
          "dataSource": "Wikipedia"
        }
      ],
      "warnings": []
    }
  ],
  "errors": [],
  "modelVersion": "2020-02-01"
}
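For the entity linking response, a common need is a lookup from each linked entity to its knowledge base URL. The following sketch builds that mapping per document ID; `linked_entity_urls` is a hypothetical helper, and the sample data is an abbreviated copy of the response above.

```python
def linked_entity_urls(response):
    """Map each document ID to a {entity name: URL} dictionary."""
    links = {}
    for document in response.get("documents", []):
        links[document["id"]] = {
            entity["name"]: entity["url"]
            for entity in document.get("entities", [])
        }
    return links


# Abbreviated copy of the example entity linking response.
linking_response = {
    "documents": [
        {
            "id": "1",
            "entities": [
                {
                    "name": "Space Needle",
                    "url": "https://en.wikipedia.org/wiki/Space_Needle",
                    "dataSource": "Wikipedia",
                },
                {
                    "name": "Seattle",
                    "url": "https://en.wikipedia.org/wiki/Seattle",
                    "dataSource": "Wikipedia",
                },
            ],
        }
    ],
    "errors": [],
}

urls = linked_entity_urls(linking_response)
```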

Summary

In this article, you learned the concepts and workflow for named entity recognition and entity linking using Text Analytics in Cognitive Services. In summary:

  • JSON documents in the request body include an ID, text, and language code.
  • POST requests are sent to one or more endpoints, using a personalized access key and an endpoint that is valid for your subscription.
  • Response output, which consists of linked entities (including confidence scores, offsets, and web links) for each document ID, can be used in any application.

Next steps