Transparency note for Key Phrase Extraction


This article assumes that you're familiar with guidelines and best practices for the Text Analytics service. For more information, see Transparency note for Text Analytics.

The Text Analytics API's Key Phrase Extraction feature allows you to quickly identify the main concepts in text. For example, in the text "The food was delicious and there were wonderful staff", Key Phrase Extraction will return the main talking points: "food" and "wonderful staff". Non-essential words are discarded, and single terms or phrases that appear to be the subject or object of a sentence are returned.

Note that no confidence score is returned for this feature, unlike some other Text Analytics features.

Example use cases

Key Phrase Extraction is used in multiple scenarios across a variety of industries. Some examples include:

  • Enhancing search. Key phrases can be used to create a search index that improves search results. For example, customers can provide thousands of documents and then run Key Phrase Extraction on them using the built-in Azure Search Text Analytics skill. The output is a set of key phrases from the input dataset, which can then be used to create an index. This index can be updated by running the skill again whenever a new document set is available.
  • Viewing aggregate trends in text data. For example, a word cloud can be generated from key phrases to help visualize key concepts in text comments or feedback. A hotel could generate a word cloud based on key phrases identified in its guest comments and might see that people comment most frequently about the location, cleanliness, and helpful staff.
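The word cloud scenario comes down to counting how often each key phrase appears across documents. The following is a minimal sketch, assuming the key phrases have already been returned by the service. The function name `aggregate_key_phrases` and the sample phrases are hypothetical and used only for illustration.

```python
from collections import Counter


def aggregate_key_phrases(phrases_per_document):
    """Count in how many documents each key phrase appears."""
    counts = Counter()
    for phrases in phrases_per_document:
        # Normalize case and de-duplicate within a document so that one
        # long review cannot dominate the totals.
        counts.update({phrase.strip().lower() for phrase in phrases})
    return counts


# Hypothetical key phrases, one list per hotel review (not real service output).
reviews = [
    ["location", "helpful staff"],
    ["Location", "cleanliness"],
    ["helpful staff", "location", "breakfast"],
]

# The two most frequent phrases, ready to feed into a word cloud library.
top = aggregate_key_phrases(reviews).most_common(2)
```

The resulting counts can be passed to whatever visualization tool you use. Counting each phrase once per document is a design choice for this sketch; counting every occurrence is equally valid if you want long documents to carry more weight.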

Considerations when choosing a use case

Do not use

  • Do not use Key Phrase Extraction to take automatic actions without human intervention in high-risk scenarios. A person should always review source data when another person's economic situation, health, or safety is affected.

Characteristics and limitations

Depending on your scenario and input data, you could experience different levels of performance. The following information is designed to help you understand key concepts about performance as they apply to Key Phrase Extraction.

System limitations and best practices for enhancing performance

Unlike other Text Analytics models, the key phrase extraction model is an unsupervised model that is not trained on human-labeled ground truth data. All of the noun phrases in the text sent to the service are detected and then ranked based on frequency and co-occurrence. Therefore, what the model returns may not agree with what a human would choose as the most important phrases. In some cases, the model may appear partially correct, in that a noun is returned without the adjective that modifies it.
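To give a feel for what ranking by frequency and co-occurrence can look like, here is a toy sketch in the style of the publicly known RAKE heuristic. It is not the service's algorithm, which is not described in this article. The stop-word list, function names, and scoring formula are all assumptions made for illustration, and the sketch splits on stop words instead of detecting noun phrases.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption, not the service's list).
STOP_WORDS = {"the", "was", "and", "there", "were", "a", "an", "of", "is", "in", "to"}


def candidate_phrases(text):
    """Split text into runs of non-stop-words, a crude stand-in for noun phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z']+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rank_phrases(text):
    """Score each candidate by summing degree/frequency over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs in any candidate
    degree = defaultdict(int)  # occurrences plus co-occurrences within candidates
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_phrases("The food was delicious and there were wonderful staff")
```

On this sentence the toy ranks "wonderful staff" first because its two words co-occur. It also returns "delicious", which the service would not, because the sketch has no notion of parts of speech. That gap illustrates why unsupervised rankings can differ from what a human would pick.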

  • The model performs better on longer text. Do not break your source text up into pieces such as sentences or paragraphs. Send the entire text, for example, a complete customer review or paper abstract.
  • If your text includes some boilerplate or other text that has no topical relevance to the actual content you're trying to analyze, the words in this text will affect your results. For example, emails might have "Subject:", "Body:", "Sender:", etc. included in the text. We recommend removing any known text that is not part of the actual content you are trying to analyze before sending it to the service.
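Both recommendations above can be applied in a small preprocessing step before text is sent to the service. The following is a minimal sketch, assuming email-style input where "Subject:" and "Body:" are labels to strip and "Sender:" lines carry no topical content. The patterns and the function name `prepare_document` are hypothetical; adapt them to the boilerplate in your own data.

```python
import re

# Hypothetical patterns for this example only.
# Lines that carry no topical content are dropped entirely.
DROP_LINES = re.compile(r"^[ \t]*Sender:.*$\n?", re.IGNORECASE | re.MULTILINE)
# Labels whose values are real content: remove the label, keep the value.
STRIP_LABELS = re.compile(r"^[ \t]*(?:Subject|Body):\s*", re.IGNORECASE | re.MULTILINE)


def prepare_document(raw_text):
    """Remove known boilerplate and return the whole text as one document."""
    text = DROP_LINES.sub("", raw_text)
    text = STRIP_LABELS.sub("", text)
    # Collapse whitespace but keep the full text together instead of
    # splitting it into sentences or paragraphs.
    return " ".join(text.split())


raw = (
    "Sender: reviews@example.com\n"
    "Subject: Our stay\n"
    "Body: The food was delicious\n"
    "and there were wonderful staff."
)
document = prepare_document(raw)
```

The cleaned string in `document` is what you would submit as a single document. Whether a field such as the subject line counts as content or boilerplate depends on your scenario, so review the patterns against a sample of real input before relying on them.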

See also