How to use Named Entity Recognition in Text Analytics

The Text Analytics API takes unstructured text and returns a list of disambiguated entities, with links to more information on the web. The API supports both named entity recognition (NER) and entity linking.

Entity Linking

Entity linking is the ability to identify and disambiguate the identity of an entity found in text (for example, determining whether an occurrence of the word "Mars" refers to the planet or to the Roman god of war). Linking requires a knowledge base in the appropriate language, against which entities recognized in the text are matched. Entity Linking uses Wikipedia as this knowledge base.

Named Entity Recognition (NER)

Named Entity Recognition (NER) is the ability to identify different entities in text and categorize them into pre-defined classes or types, such as person, location, event, product, and organization.

Starting in version 3, this feature of the Text Analytics API can also identify personal and sensitive information types, such as phone number, Social Security Number, email address, and bank account number. Identifying these entities can help you classify sensitive documents and redact personal information.

Named Entity Recognition versions and features

The Text Analytics API offers two versions of Named Entity Recognition: v2 and v3. Version 3 (public preview) provides increased detail in the entities that can be detected and categorized.

| Feature | NER v2 | NER v3 |
| --- | --- | --- |
| Methods for single and batch requests | X | X |
| Basic entity recognition across several categories | X | X |
| Expanded classification for recognized entities | | X |
| Separate endpoints for sending entity linking and NER requests | | X |
| Model versioning | | X |

See language support for information.

Entity types

Named Entity Recognition v3 provides expanded detection across multiple types. Currently, NER v3 can recognize the following categories of entities:

  • General
  • Personal Information

For a detailed list of supported entities and languages, see the NER v3 supported entity types article.

Request endpoints

Named Entity Recognition v3 uses separate endpoints for NER and entity linking requests. Use a URL format below based on your request:

NER

  • General entities - https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0-preview.1/entities/recognition/general

  • Personal information - https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0-preview.1/entities/recognition/pii

Entity linking

  • https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0-preview.1/entities/linking
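The URL formats above differ only in the custom subdomain and the final path segment, so they can be assembled by a small helper. The following is a minimal sketch, not part of the API: the function name `build_endpoint` and the short operation keys (`general`, `pii`, `linking`) are hypothetical names chosen for illustration.

```python
# Common path prefix for the v3 preview endpoints listed above.
BASE_PATH = "/text/analytics/v3.0-preview.1"

# Operation keys are illustrative; the paths come from the URL formats above.
OPERATION_PATHS = {
    "general": "/entities/recognition/general",
    "pii": "/entities/recognition/pii",
    "linking": "/entities/linking",
}


def build_endpoint(subdomain: str, operation: str) -> str:
    """Return the v3 preview URL for one of the three operations."""
    if operation not in OPERATION_PATHS:
        raise ValueError(f"unknown operation: {operation!r}")
    host = f"https://{subdomain}.cognitiveservices.azure.com"
    return host + BASE_PATH + OPERATION_PATHS[operation]
```

For example, `build_endpoint("contoso", "linking")` would produce the entity linking URL for a resource whose custom subdomain is `contoso`.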

Model versioning

Version 3 of the Text Analytics API lets you choose which model version to use for your requests. Use the optional model-version parameter to select a version. If you don't specify this parameter, the API defaults to latest, the latest stable model version. You can pass the newest model-version in any request, but only some features are updated in each version. The table below describes which features have been updated in each model version:

| Model version | Features updated | Latest version for |
| --- | --- | --- |
| 2020-02-01 | Entity recognition | Entity recognition |
| 2019-10-01 | Entity recognition, Sentiment analysis | Language detection, Key phrase extraction, Sentiment analysis |

Each response from the v3 endpoints includes a modelVersion field specifying the model version that was used.

{
    "documents": […],
    "errors": [],
    "modelVersion": "2019-10-01"
}

See What's new for details on the updates for these model versions.
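As an illustration of selecting a model version, the sketch below appends the model-version parameter to an endpoint URL. It assumes the parameter is passed in the query string, which this article does not state explicitly; the helper name `with_model_version` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def with_model_version(endpoint: str, model_version: Optional[str] = None) -> str:
    """Append the optional model-version parameter to an endpoint URL.

    Assumption: model-version is sent as a query-string parameter.
    When no version is given, the URL is returned unchanged so the
    service falls back to its default ("latest").
    """
    if model_version is None:
        return endpoint
    return endpoint + "?" + urlencode({"model-version": model_version})
```

Leaving the parameter off is the simplest way to track the latest stable model; pinning a version such as 2020-02-01 may be preferable when results need to stay comparable over time.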

Sending a REST API request

Preparation

Each JSON document in the request must contain an ID, the text to analyze, and a language code.

Each document must be under 5,120 characters, and you can have up to 1,000 items (IDs) per collection. The collection is submitted in the body of the request.
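The limits above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: the function name `build_request_body` is hypothetical, IDs are generated as sequential strings, and Python's `len()` is assumed to count characters the same way the service does, which may not hold for all text.

```python
MAX_DOCUMENT_CHARS = 5120  # each document must be under this many characters
MAX_DOCUMENTS = 1000       # maximum items (IDs) per collection


def build_request_body(texts, language="en"):
    """Build the JSON-serializable request body for a list of strings.

    Raises ValueError if the collection or any single document exceeds
    the limits described above.
    """
    if len(texts) > MAX_DOCUMENTS:
        raise ValueError(f"too many documents: {len(texts)} > {MAX_DOCUMENTS}")
    documents = []
    for index, text in enumerate(texts, start=1):
        # Assumption: len() approximates the service's character count.
        if len(text) >= MAX_DOCUMENT_CHARS:
            raise ValueError(f"document {index} is not under {MAX_DOCUMENT_CHARS} characters")
        documents.append({"language": language, "id": str(index), "text": text})
    return {"documents": documents}
```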

Structure the request

Create a POST request. You can use Postman or the API testing console in the following links to quickly structure and send one.

Note

You can find the key and endpoint for your Text Analytics resource in the Azure portal. They are located on the resource's Quick start page, under resource management.

Named Entity Recognition v3 reference

Version 3 uses separate endpoints for NER and entity linking requests. Use a URL format below based on your request:

NER

  • General entities - https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0-preview.1/entities/recognition/general

  • Personal information entities - https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0-preview.1/entities/recognition/pii

Entity linking

  • https://<your-custom-subdomain>.cognitiveservices.azure.com/text/analytics/v3.0-preview.1/entities/linking

Set a request header to include your Text Analytics API key. In the request body, provide the JSON documents you prepared.

Example NER request

The following is an example of content you might send to the API. The request format is the same for both versions of the API.

{
  "documents": [
    {
      "language": "en",
      "id": "1",
      "text": "I had a wonderful trip to Seattle last week."
    }
  ]
}

Post the request

Analysis is performed upon receipt of the request. See the data limits section in the overview for information on the size and number of requests you can send per minute and second.

The Text Analytics API is stateless. No data is stored in your account, and results are returned immediately in the response.
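The request described above can also be assembled in code rather than in Postman. The sketch below uses only the Python standard library and builds the request without sending it. The header name `Ocp-Apim-Subscription-Key` is not given in this article; it is assumed here as the usual key header for Cognitive Services resources, and `build_post_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_post_request(endpoint: str, api_key: str, body: dict) -> urllib.request.Request:
    """Create (but do not send) a POST request carrying the JSON documents."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            # Assumption: standard Cognitive Services key header.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body could then be decoded with `json.load`.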

View results

All POST requests return a JSON-formatted response with the IDs and detected entity properties.

Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data.
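One way to save the output for sorting and searching is to flatten it to CSV. The following is a minimal sketch, assuming the NER response shape shown in this article's example (a `documents` list whose items carry an `id` and an `entities` list); the function name `entities_to_csv` and the column selection are illustrative choices.

```python
import csv
import io


def entities_to_csv(response: dict) -> str:
    """Flatten an NER response into CSV text, one row per entity."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "text", "type", "offset", "length", "score"])
    for document in response.get("documents", []):
        for entity in document.get("entities", []):
            writer.writerow([
                document["id"],
                entity["text"],
                entity["type"],
                entity["offset"],
                entity["length"],
                entity["score"],
            ])
    return buffer.getvalue()
```

The returned string can be written to a file and opened in a spreadsheet or loaded into any tool that reads CSV.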

Example v3 responses

Version 3 provides separate endpoints for NER and entity linking. The responses for both operations are below.

Example NER response

{
  "documents": [{
    "id": "1",
    "entities": [{
      "text": "Seattle",
      "type": "Location",
      "offset": 26,
      "length": 7,
      "score": 0.80624294281005859
    }, {
      "text": "last week",
      "type": "DateTime",
      "subtype": "DateRange",
      "offset": 34,
      "length": 9,
      "score": 0.8
    }]
  }],
  "errors": [],
  "modelVersion": "2019-10-01"
}
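The offset and length values in this response locate each entity in the submitted text. The sketch below reads them back and checks each span against the original document. It assumes offsets index directly into a Python string, which holds for the ASCII example here but may not for all text; `locate_entities` and the `in_source` field are hypothetical names.

```python
def locate_entities(response: dict, texts_by_id: dict) -> list:
    """Pair each recognized entity with the span it points to in the source text."""
    results = []
    for document in response.get("documents", []):
        source = texts_by_id[document["id"]]
        for entity in document.get("entities", []):
            start = entity["offset"]
            end = start + entity["length"]
            results.append({
                "id": document["id"],
                "text": entity["text"],
                "type": entity["type"],
                # subtype is only present on some entities, such as DateTime.
                "subtype": entity.get("subtype"),
                # True when the offset/length span reproduces the entity text.
                "in_source": source[start:end] == entity["text"],
            })
    return results
```

With the example request text, offset 26 and length 7 select "Seattle", and offset 34 and length 9 select "last week".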

Example entity linking response

{
  "documents": [{
    "id": "1",
    "entities": [{
      "name": "Seattle",
      "matches": [{
        "text": "Seattle",
        "offset": 26,
        "length": 7,
        "score": 0.15046201222847677
      }],
      "language": "en",
      "id": "Seattle",
      "url": "https://en.wikipedia.org/wiki/Seattle",
      "dataSource": "Wikipedia"
    }]
  }],
  "errors": [],
  "modelVersion": "2019-10-01"
}
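A linked entity groups one or more matches in the text under a single knowledge base entry. The following is a minimal sketch of pulling out the name, URL, and highest match score for each entity, assuming the response shape shown above; `extract_links` is a hypothetical helper name.

```python
def extract_links(response: dict) -> list:
    """Return (document id, entity name, url, best match score) tuples."""
    links = []
    for document in response.get("documents", []):
        for entity in document.get("entities", []):
            scores = [match["score"] for match in entity.get("matches", [])]
            # An entity with no matches gets None instead of a score.
            best = max(scores, default=None)
            links.append((document["id"], entity["name"], entity["url"], best))
    return links
```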

Summary

In this article, you learned the concepts and workflow for named entity recognition and entity linking using Text Analytics in Cognitive Services. In summary:

  • Named Entity Recognition is available for selected languages in two versions.
  • JSON documents in the request body include an ID, text, and language code.
  • POST requests are sent to one or more endpoints, using a personalized access key and an endpoint that is valid for your subscription.
  • Response output, which consists of recognized or linked entities for each document ID (including confidence scores, offsets, and web links), can be used in any application.

Next steps