Extract Key Phrases from Text
Extracts key phrases from given text
Category: Text Analytics
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
This article explains how to use the Extract Key Phrases from Text module in Azure Machine Learning Studio to pre-process a text column. Given a column of natural language text, the module extracts one or more meaningful phrases. A phrase might be a single word, a compound noun, or a modifier plus a noun.
This module is a wrapper for natural language processing APIs for key-phrase extraction. The phrases are analyzed as potentially meaningful in the context of the sentence for various reasons:
- The phrase captures the topic of the sentence.
- The phrase contains a combination of modifier and noun that indicates sentiment.
For example, assume the sentence analyzed is: "It was a wonderful hotel to stay at, with unique decor and friendly staff."
The Extract Key Phrases from Text module might return these key phrases:
- wonderful hotel
- friendly staff
- unique decor
How to configure Extract Key Phrases from Text
To extract key phrases, you must connect a dataset that has a column of text.
Add the Extract Key Phrases from Text module to your experiment in Azure Machine Learning Studio. Then, connect a dataset that has at least one full-text column.
Use the Column Selector to select a column of type string, from which to extract key phrases.
For Language, select a language to use when analyzing phrases. If you specify a language, only phrases in the target language will be output.
If the text column contains phrases in multiple languages, choose the option Language identified in columns. A new column selector is displayed that lets you select a column in your dataset that contains a language identifier. The language identifier can be either the language name or the ISO 639-1 language code. For example, either "English" or "en" is acceptable.
Before running Extract Key Phrases from Text, use the Detect Languages module to identify the language in each row and generate the identifier for you. An error is raised if the language identifier column contains any languages not supported by Extract Key Phrases from Text.
The output of the module is a dataset containing a column of comma-separated key phrases.
For example, the following results are from an input dataset containing reviews in multiple languages:
|novel,nuclear submarine,good book,adventure story,avalanche of events,good characters|
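Downstream steps often need the individual phrases rather than the combined string. The following is a minimal sketch, not part of the module itself, that splits one output cell into a list. It assumes that phrases never contain embedded commas, which this article does not state; the function name `split_key_phrases` is hypothetical.

```python
from typing import List


def split_key_phrases(cell: str) -> List[str]:
    """Split one comma-separated output cell into phrases, dropping blanks."""
    # Assumption: a comma always separates phrases and never appears inside one.
    return [phrase.strip() for phrase in cell.split(",") if phrase.strip()]


# The sample output row shown above.
row = "novel,nuclear submarine,good book,adventure story,avalanche of events,good characters"
phrases = split_key_phrases(row)
print(phrases)
```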
All output phrases are contained in a single column; no other columns are passed through, and an identifier is not added. However, if you want to align the output phrases with the source text, you can recombine the output phrases with the input by using the Add Columns module.
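Outside Studio, the same recombination can be pictured as pairing rows by position. The sketch below is an illustration only: it assumes the output rows are in the same order and number as the input rows, which is what positional recombination relies on but which you should verify for your own data. The function name `add_phrase_column` and the field names are hypothetical.

```python
from typing import Dict, List


def add_phrase_column(source_rows: List[str], phrase_rows: List[str]) -> List[Dict[str, str]]:
    """Pair each source text with the key phrases at the same row position."""
    # Without an identifier column, position is the only link between the two
    # datasets, so a row-count mismatch means the pairing cannot be trusted.
    if len(source_rows) != len(phrase_rows):
        raise ValueError("row counts differ; positional alignment would be wrong")
    return [
        {"text": text, "key_phrases": phrases}
        for text, phrases in zip(source_rows, phrase_rows)
    ]


source_rows = [
    "It was a wonderful hotel to stay at, with unique decor and friendly staff.",
    "The staff were friendly.",  # hypothetical second row, for illustration
]
phrase_rows = [
    "wonderful hotel,friendly staff,unique decor",
    "friendly staff",  # hypothetical output for the second row
]
combined = add_phrase_column(source_rows, phrase_rows)
print(combined[0]["key_phrases"])
```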
The output of key-phrase extraction does not flag the language of individual phrases.
If a language is included that is not supported by the Extract Key Phrases module, an error is raised (0039). To avoid errors, be sure to filter out input text that has an incompatible language identifier.
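One way to do this filtering before the data reaches the module is sketched below. It is a hedged illustration, not the module's own logic: the supported set is taken from the Language parameter listed in this article, the ISO 639-1 codes are the standard ones for those languages, and the case-insensitive matching is an assumption about what a cleaning step might tolerate. The names `normalize_language`, `partition_rows`, and the `language` field are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Language names from the module's Language parameter, mapped to ISO 639-1 codes.
SUPPORTED = {
    "english": "en",
    "spanish": "es",
    "french": "fr",
    "dutch": "nl",
    "german": "de",
    "italian": "it",
}


def normalize_language(identifier: str) -> Optional[str]:
    """Return the ISO 639-1 code for a supported name or code, else None."""
    key = identifier.strip().lower()  # assumption: ignore case and padding
    if key in SUPPORTED:
        return SUPPORTED[key]
    if key in SUPPORTED.values():
        return key
    return None


def partition_rows(
    rows: List[Dict[str, str]], language_field: str = "language"
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split rows into (supported, unsupported) by their language identifier."""
    kept, rejected = [], []
    for row in rows:
        if normalize_language(row.get(language_field) or "") is None:
            rejected.append(row)
        else:
            kept.append(row)
    return kept, rejected


rows = [
    {"text": "A good book.", "language": "English"},
    {"text": "Un bon livre.", "language": "fr"},
    {"text": "Sample review.", "language": "Japanese"},  # not in the supported list
]
kept, rejected = partition_rows(rows)
print(len(kept), len(rejected))
```

Keeping the rejected rows in a separate list, rather than silently dropping them, makes it easier to see how much text was excluded.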
If there are very few rows in other languages, you can also avoid the error by omitting the language identifier and analyzing all text using a single language selection. However, when you do so, results for those rows are very poor, because entire sentences in the other languages might be output as a single key phrase.
The following example demonstrates how to use this module to extract key phrases and then build a word cloud from the phrases: Extract Key Phrases and Show Word Cloud
See the Azure AI Gallery for more examples of text processing using Azure Machine Learning.
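A word cloud is typically sized by how often each phrase occurs. The sketch below shows one way to compute those counts from the comma-separated output column. It is not the method used by the linked example, whose details are not given here; the function name `phrase_frequencies` is hypothetical, and lowercasing phrases before counting is an assumption.

```python
from collections import Counter
from typing import Iterable


def phrase_frequencies(cells: Iterable[str]) -> Counter:
    """Count how often each key phrase appears across all output cells."""
    counts: Counter = Counter()
    for cell in cells:
        for phrase in cell.split(","):
            phrase = phrase.strip().lower()  # assumption: counts ignore case
            if phrase:
                counts[phrase] += 1
    return counts


cells = [
    "wonderful hotel,friendly staff,unique decor",
    "friendly staff,good book",  # hypothetical second row, for illustration
]
frequencies = phrase_frequencies(cells)
print(frequencies.most_common(1))
```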
This module currently supports the following languages, as listed for the Language parameter: English, Spanish, French, Dutch, German, and Italian.
Expected inputs
| Name | Type | Description |
| --- | --- | --- |
| Dataset | Data Table | The table containing the text to be processed. |
Module parameters
| Name | Type | Range | Optional | Default | Description |
| --- | --- | --- | --- | --- | --- |
| Culture-language column | ColumnSelection | | Applies when Language is set to Column contains language | | Name or one-based index of the column containing the culture-language information. |
| Text column | ColumnSelection | | Required | | Name or one-based index of the text column. |
| Language | T_Language | English, Spanish, French, Dutch, German, Italian, Column contains language | Required | English | Select the language of the text to be processed. |
Outputs
| Name | Type | Description |
| --- | --- | --- |
| Results dataset | Data Table | The extracted key phrases. |
Exceptions
| Exception | Description |
| --- | --- |
| Error 0003 | Exception occurs if one or more inputs are null or empty. |
| Error 0010 | Exception occurs if input datasets have column names that should match but do not. |
| Error 0016 | Exception occurs if input datasets passed to the module should have compatible column types but do not. |
| Error 0008 | Exception occurs if a parameter is not in range. |
For a list of errors specific to Studio modules, see Machine Learning Error Codes.
For a list of API exceptions, see Machine Learning REST API Error Codes.