Key Phrase Extraction cognitive skill

The Key Phrase Extraction skill evaluates unstructured text and, for each record, returns a list of key phrases. This skill uses the machine learning models provided by Text Analytics in Cognitive Services.

This capability is useful if you need to quickly identify the main talking points in the record. For example, given input text "The food was delicious and there were wonderful staff", the service returns "food" and "wonderful staff".

Note

Starting December 21, 2018, you can attach a Cognitive Services resource to an Azure Search skillset. This allows us to start charging for skillset execution. On this date, we also began charging for image extraction as part of the document-cracking stage. Text extraction from documents continues to be offered at no additional cost.

Built-in cognitive skill execution is charged at the Cognitive Services pay-as-you-go price, at the same rate as if you had performed the task directly. Image extraction is an Azure Search charge, currently offered at preview pricing. For details, see the Azure Search pricing page or How billing works.

@odata.type

Microsoft.Skills.Text.KeyPhraseExtractionSkill

Data limits

A record should be no longer than 50,000 characters, as measured by String.Length. If you need to break up your data before sending it to the key phrase extractor, consider using the Text Split skill.
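
If you pre-process text outside the pipeline instead of using the Text Split skill, the limit can be checked the way String.Length measures it, that is, in UTF-16 code units. The following Python sketch is illustrative only: the function names are hypothetical, and it collapses whitespace between words, which the Text Split skill may not do.

```python
def utf16_length(text: str) -> int:
    """Length in UTF-16 code units, the unit that .NET's String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def split_for_key_phrases(text: str, limit: int = 50_000) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most `limit` code units."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    chunks: list[str] = []
    words: list[str] = []  # words in the chunk being built
    length = 0             # UTF-16 length of " ".join(words)
    for word in text.split():
        # Cut a single over-long word. A code point takes at most 2 code units,
        # so limit // 2 code points can never exceed the limit.
        while utf16_length(word) > limit:
            if words:
                chunks.append(" ".join(words))
                words, length = [], 0
            chunks.append(word[: limit // 2])
            word = word[limit // 2:]
        word_len = utf16_length(word)
        needed = word_len if not words else length + 1 + word_len
        if needed > limit:
            # The word does not fit; close the current chunk and start a new one.
            chunks.append(" ".join(words))
            words, length = [word], word_len
        else:
            words.append(word)
            length = needed
    if words:
        chunks.append(" ".join(words))
    return chunks
```

Each returned chunk can then be sent as its own record. Splitting on sentence boundaries would usually give better key phrases than splitting on arbitrary word boundaries; this sketch only guarantees the size limit.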

Skill parameters

Parameters are case-sensitive.

Parameter name Description
defaultLanguageCode (Optional) The language code to apply to documents that don't specify a language explicitly. If no default language code is specified, English (en) is used. See Full list of supported languages.
maxKeyPhraseCount (Optional) The maximum number of key phrases to produce.
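
The sample definition on this page does not set either parameter. As a rough illustration, the Python sketch below builds a skill definition that does, assuming that parameters are written as top-level properties of the skill next to @odata.type. The helper name key_phrase_skill and its arguments are hypothetical and not part of any SDK.

```python
import json
from typing import Optional


def key_phrase_skill(
    text_source: str,
    language_source: Optional[str] = None,
    default_language_code: Optional[str] = None,
    max_key_phrase_count: Optional[int] = None,
) -> dict:
    """Build a Key Phrase Extraction skill definition as a plain dict."""
    skill: dict = {
        "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
        "inputs": [{"name": "text", "source": text_source}],
        "outputs": [{"name": "keyPhrases", "targetName": "myKeyPhrases"}],
    }
    if language_source is not None:
        skill["inputs"].append({"name": "languageCode", "source": language_source})
    # Assumption: optional parameters sit at the top level of the skill object.
    # Parameter names are case-sensitive, so they are spelled exactly as in the table.
    if default_language_code is not None:
        skill["defaultLanguageCode"] = default_language_code
    if max_key_phrase_count is not None:
        skill["maxKeyPhraseCount"] = max_key_phrase_count
    return skill


if __name__ == "__main__":
    print(json.dumps(key_phrase_skill("/document/text", max_key_phrase_count=5), indent=2))
```

Omitted parameters are left out of the definition entirely, so the service defaults described in the table apply.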

Skill inputs

Input name Description
text The text to be analyzed.
languageCode (Optional) A string indicating the language of the records. If this input is not specified, the default language code is used to analyze the records. See Full list of supported languages.

Sample definition

{
    "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
    "inputs": [
      {
        "name": "text",
        "source": "/document/text"
      },
      {
        "name": "languageCode",
        "source": "/document/languagecode"
      }
    ],
    "outputs": [
      {
        "name": "keyPhrases",
        "targetName": "myKeyPhrases"
      }
    ]
}

Sample input

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "Glaciers are huge rivers of ice that ooze their way over land, powered by gravity and their own sheer weight. They accumulate ice from snowfall and lose it through melting. As global temperatures have risen, many of the world’s glaciers have already started to shrink and retreat. Continued warming could see many iconic landscapes – from the Canadian Rockies to the Mount Everest region of the Himalayas – lose almost all their glaciers by the end of the century.",
             "languageCode": "en"
           }
      }
    ]
}

Sample output

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
            "keyPhrases": 
            [
              "world’s glaciers", 
              "huge rivers of ice", 
              "Canadian Rockies", 
              "iconic landscapes",
              "Mount Everest region",
              "Continued warming"
            ]
           }
      }
    ]
}
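
To consume output of this shape, the key phrases can be read from values, then data, then keyPhrases for each record. The Python sketch below parses the sample output above; the function name key_phrases_by_record is hypothetical.

```python
import json

# The sample output from this page, as a JSON string.
SAMPLE_OUTPUT = """
{
    "values": [
      {
        "recordId": "1",
        "data": {
          "keyPhrases": [
            "world’s glaciers",
            "huge rivers of ice",
            "Canadian Rockies",
            "iconic landscapes",
            "Mount Everest region",
            "Continued warming"
          ]
        }
      }
    ]
}
"""


def key_phrases_by_record(response: dict) -> dict[str, list[str]]:
    """Map each recordId to its list of key phrases."""
    return {
        record["recordId"]: record["data"].get("keyPhrases", [])
        for record in response["values"]
    }


if __name__ == "__main__":
    for record_id, phrases in key_phrases_by_record(json.loads(SAMPLE_OUTPUT)).items():
        print(record_id, phrases)
```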

Errors and warnings

If you provide an unsupported language code, an error is generated and key phrases are not extracted. If your text is empty, a warning is produced. If your text is longer than 50,000 characters, only the first 50,000 characters are analyzed and a warning is issued.
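
These three cases can be mirrored in a client-side check before records are sent, so that problems surface earlier. The Python sketch below is only an approximation of the behavior described above, not the service's implementation: SUPPORTED_LANGUAGES is a hypothetical placeholder rather than the real list, the messages are invented, and len() counts Unicode code points, which matches String.Length only for text without surrogate pairs.

```python
MAX_CHARS = 50_000

# Hypothetical placeholder; consult the full list of supported languages.
SUPPORTED_LANGUAGES = {"en", "de", "es", "fr"}


def precheck(text: str, language_code: str = "en") -> tuple[str, list[str]]:
    """Return the text that would be analyzed plus any warnings.

    Raises ValueError for an unsupported language code, mirroring the error case.
    """
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    warnings: list[str] = []
    if not text:
        warnings.append("text is empty")
    elif len(text) > MAX_CHARS:
        # Only the first 50,000 characters are analyzed.
        text = text[:MAX_CHARS]
        warnings.append(f"text truncated to {MAX_CHARS} characters")
    return text, warnings
```

For example, precheck("x" * 60_000) returns a 50,000-character string together with a single truncation warning, while precheck("hello", "xx") raises ValueError.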
