Entity Recognition cognitive skill

The Entity Recognition skill extracts entities of different types from text. This skill uses the machine learning models provided by Text Analytics in Cognitive Services.

Note

As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Cognitive Services resource. Charges accrue when calling APIs in Cognitive Services, and for image extraction as part of the document-cracking stage in Azure Cognitive Search. There are no charges for text extraction from documents.

Execution of built-in skills is charged at the existing Cognitive Services pay-as-you-go price. Image extraction pricing is described on the Azure Cognitive Search pricing page.

@odata.type

Microsoft.Skills.Text.EntityRecognitionSkill

Data limits

The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the Entity Recognition skill, consider using the Text Split skill.
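The limit above is expressed in terms of String.Length, which in .NET counts UTF-16 code units rather than Unicode code points. The following is a minimal sketch of a client-side pre-check under that reading; the helper names utf16_length and fits_record_limit are hypothetical and are not part of any Azure SDK.

```python
MAX_RECORD_LENGTH = 50_000  # limit stated in this section


def utf16_length(text: str) -> int:
    """Return the number of UTF-16 code units in text.

    This mirrors what .NET String.Length reports: characters outside the
    Basic Multilingual Plane (for example, most emoji) count as two units.
    """
    return len(text.encode("utf-16-le")) // 2


def fits_record_limit(text: str, limit: int = MAX_RECORD_LENGTH) -> bool:
    """Return True if text is within the record size limit (hypothetical helper)."""
    return utf16_length(text) <= limit
```

A record that fails this check is a candidate for the Text Split skill before it reaches entity recognition.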

Skill parameters

Parameters are case-sensitive and are all optional.

Parameter name Description
categories Array of categories that should be extracted. Possible category types: "Person", "Location", "Organization", "Quantity", "Datetime", "URL", "Email". If no category is provided, all types are returned.
defaultLanguageCode Language code of the input text. Supported languages: de, en, es, fr, it.
minimumPrecision A value between 0 and 1. If the confidence score (in the namedEntities output) is lower than this value, the entity is not returned. The default is 0.
includeTypelessEntities Set to true to recognize well-known entities that don't fit the current categories. Recognized entities are returned in the entities complex output field. For example, "Windows 10" is a well-known entity (a product), but because "Products" is not a supported category, this entity is returned in the entities output field. The default is false.
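Because parameter names and category values are case-sensitive, it can help to validate a skill definition before adding it to a skillset. The sketch below builds the JSON body as a Python dictionary using only the parameters listed in the table; build_entity_skill is a hypothetical helper, not an SDK function, and the input source path /document/content is an assumption that may differ in your enrichment tree.

```python
from typing import Any, Dict, List, Optional

SKILL_TYPE = "#Microsoft.Skills.Text.EntityRecognitionSkill"
CATEGORIES = {"Person", "Location", "Organization", "Quantity",
              "Datetime", "URL", "Email"}
LANGUAGES = {"de", "en", "es", "fr", "it"}


def build_entity_skill(
    categories: Optional[List[str]] = None,
    default_language_code: Optional[str] = None,
    minimum_precision: Optional[float] = None,
    include_typeless_entities: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a skill definition dict, omitting parameters left as None."""
    skill: Dict[str, Any] = {
        "@odata.type": SKILL_TYPE,
        # Assumed source path; adjust to your own enrichment tree.
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "namedEntities"}],
    }
    if categories is not None:
        # Comparison is case-sensitive, so "person" is rejected.
        unknown = [c for c in categories if c not in CATEGORIES]
        if unknown:
            raise ValueError(f"Unsupported categories: {unknown}")
        skill["categories"] = list(categories)
    if default_language_code is not None:
        if default_language_code not in LANGUAGES:
            raise ValueError(f"Unsupported language: {default_language_code}")
        skill["defaultLanguageCode"] = default_language_code
    if minimum_precision is not None:
        if not 0 <= minimum_precision <= 1:
            raise ValueError("minimumPrecision must be between 0 and 1")
        skill["minimumPrecision"] = minimum_precision
    if include_typeless_entities is not None:
        skill["includeTypelessEntities"] = bool(include_typeless_entities)
    return skill
```

Calling build_entity_skill(categories=["Person", "Email"], minimum_precision=0.5) returns a dictionary that can be serialized with json.dumps and placed in a skillset's skills array.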

Skill inputs

Input name Description
languageCode Optional. Default is "en".
text The text to analyze.

Skill outputs

Note

Not all entity categories are supported for all languages. Only en and es support extraction of the "Quantity", "Datetime", "URL", and "Email" types.
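The language restriction in this note can be encoded as a small lookup. This is a sketch only: it assumes that "Person", "Location", and "Organization" are available for every supported language, which the note implies but does not state, and categories_for is a hypothetical helper name.

```python
BASE_CATEGORIES = ["Person", "Location", "Organization"]
EXTENDED_CATEGORIES = ["Quantity", "Datetime", "URL", "Email"]
SUPPORTED_LANGUAGES = {"de", "en", "es", "fr", "it"}
EXTENDED_LANGUAGES = {"en", "es"}  # per the note above


def categories_for(language_code: str) -> list:
    """Return the categories expected to be extractable for a language."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language_code}")
    if language_code in EXTENDED_LANGUAGES:
        return BASE_CATEGORIES + EXTENDED_CATEGORIES
    # Assumption: the base categories apply to all supported languages.
    return list(BASE_CATEGORIES)
```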

Output name Description
persons An array of strings where each string represents the name of a person.
locations An array of strings where each string represents a location.
organizations An array of strings where each string represents an organization.
quantities An array of strings where each string represents a quantity.
dateTimes An array of strings where each string represents a DateTime value as it appears in the text.
urls An array of strings where each string represents a URL.
emails An array of strings where each string represents an email address.
namedEntities An array of complex types that contains the following fields:
  • category
  • value (The actual entity name)
  • offset (The location where it was found in the text)
  • confidence (A higher value means the result is more likely to be a real entity)
entities An array of complex types that contains rich information about the entities extracted from text, with the following fields:
  • name (the actual entity name. This represents a "normalized" form)
  • wikipediaId
  • wikipediaLanguage
  • wikipediaUrl (a link to Wikipedia page for the entity)
  • bingId
  • type (the category of the entity recognized)
  • subType (available only for certain categories, this gives a more granular view of the entity type)
  • matches (a complex collection in which each item contains the following fields)
    • text (the raw text for the entity)
    • offset (the location where it was found)
    • length (the length of the raw entity text)
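The offset and length fields in matches can be used to locate each mention in the source text. The sketch below assumes that offsets index the text the same way Python string indices do, which holds for plain ASCII input; text containing characters outside the Basic Multilingual Plane may need a different indexing scheme. The function name matched_spans is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def matched_spans(text: str, entities: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (entity name, raw matched text) pairs sliced out of text."""
    spans = []
    for entity in entities:
        for match in entity["matches"]:
            start = match["offset"]
            end = start + match["length"]
            spans.append((entity["name"], text[start:end]))
    return spans


# Example data shaped like the entities output described above.
text = ("Contoso corporation was founded by John Smith. "
        "They can be reached at contact@contoso.com")
entities = [
    {"name": "John Smith",
     "matches": [{"text": "John Smith", "offset": 35, "length": 10}]},
    {"name": "Contoso",
     "matches": [{"text": "Contoso", "offset": 0, "length": 7}]},
]
spans = matched_spans(text, entities)
```

Comparing each sliced span with the match's own text field is a quick sanity check that offsets are being interpreted correctly.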

Sample definition

  {
    "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
    "categories": [ "Person", "Email"],
    "defaultLanguageCode": "en",
    "includeTypelessEntities": true,
    "minimumPrecision": 0.5,
    "inputs": [
      {
        "name": "text",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "persons",
        "targetName": "people"
      },
      {
        "name": "emails",
        "targetName": "contact"
      },
      {
        "name": "entities"
      }
    ]
  }

Sample input

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "Contoso corporation was founded by John Smith. They can be reached at contact@contoso.com",
             "languageCode": "en"
           }
      }
    ]
}

Sample output

{
  "values": [
    {
      "recordId": "1",
      "data" : 
      {
        "persons": [ "John Smith"],
        "emails":["contact@contoso.com"],
        "namedEntities": 
        [
          {
            "category":"Person",
            "value": "John Smith",
            "offset": 35,
            "confidence": 0.98
          }
        ],
        "entities":  
        [
          {
            "name":"John Smith",
            "wikipediaId": null,
            "wikipediaLanguage": null,
            "wikipediaUrl": null,
            "bingId": null,
            "type": "Person",
            "subType": null,
            "matches": [{
                "text": "John Smith",
                "offset": 35,
                "length": 10
            }]
          },
          {
            "name": "contact@contoso.com",
            "wikipediaId": null,
            "wikipediaLanguage": null,
            "wikipediaUrl": null,
            "bingId": null,
            "type": "Email",
            "subType": null,
            "matches": [
            {
                "text": "contact@contoso.com",
                "offset": 70,
                "length": 19
            }]
          },
          {
            "name": "Contoso",
            "wikipediaId": "Contoso",
            "wikipediaLanguage": "en",
            "wikipediaUrl": "https://en.wikipedia.org/wiki/Contoso",
            "bingId": "349f014e-7a37-e619-0374-787ebb288113",
            "type": null,
            "subType": null,
            "matches": [
            {
                "text": "Contoso",
                "offset": 0,
                "length": 7
            }]
          }
        ]
      }
    }
  ]
}
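The sample definition sets minimumPrecision to 0.5, so only named entities whose confidence is at least 0.5 appear in namedEntities. The following client-side sketch illustrates that documented behavior; it is not how the service implements it. The function name apply_minimum_precision and the low-confidence "Smith" entry are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def apply_minimum_precision(
    named_entities: List[Dict[str, Any]], minimum_precision: float = 0.0
) -> List[Dict[str, Any]]:
    """Drop entities whose confidence is lower than minimum_precision."""
    return [e for e in named_entities if e["confidence"] >= minimum_precision]


candidates = [
    {"category": "Person", "value": "John Smith", "offset": 35, "confidence": 0.98},
    # Hypothetical low-confidence candidate, not part of the sample output.
    {"category": "Person", "value": "Smith", "offset": 40, "confidence": 0.30},
]
kept = apply_minimum_precision(candidates, minimum_precision=0.5)
```

With the default of 0, no candidate is filtered out.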

Error cases

If the language code for the document is unsupported, an error is returned and no entities are extracted.

See also