Text Analytics

This article describes the text analytics modules included in Azure Machine Learning Studio. These modules provide specialized computational tools for working with both structured and unstructured text, including:

  • Multiple options for preprocessing text.
  • Language detection.
  • Creation of features from text using customizable n-gram dictionaries.
  • Feature hashing, to efficiently analyze text without preprocessing or advanced linguistic analysis.
  • Vowpal Wabbit, for very fast machine learning on text. Vowpal Wabbit supports feature hashing, topic modeling (LDA), and classification.
  • Named entity recognition, to extract the names of people, places, and organizations from unstructured text.
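
To make the n-gram and feature hashing items above more concrete, here is a minimal sketch in Python of the general hashing technique. It is not the Studio implementation: the function names, the `bits` and `n_max` parameters, the simple regular-expression tokenizer, and the use of MD5 as the hash function are all assumptions chosen for illustration.

```python
import hashlib
import re


def ngrams(text, n_max=2):
    """Return lowercase word n-grams of length 1..n_max (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def hash_features(text, bits=10, n_max=2):
    """Map the n-grams of a text to a count vector of fixed length 2**bits.

    MD5 is used here only because it is deterministic and in the standard
    library; it is an assumption, not the hash that Studio uses.
    """
    size = 1 << bits
    vector = [0] * size
    for gram in ngrams(text, n_max):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % size
        vector[index] += 1  # colliding n-grams share a slot and add up
    return vector


vec = hash_features("I loved this book", bits=10, n_max=2)
print(len(vec), sum(vec))  # prints "1024 7": 4 unigrams + 3 bigrams
```

The vector length depends only on `bits`, not on the vocabulary, which is why no dictionary has to be built beforehand. The trade-off is that distinct n-grams can collide in the same slot, so a larger `bits` value reduces collisions at the cost of a wider feature vector.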

Note

Applies to: Machine Learning Studio

This content pertains only to Studio. Similar drag-and-drop modules have been added to the visual interface in Azure Machine Learning service. To learn more, see the article comparing the two versions.

Examples

For examples of text analytics using Azure Machine Learning, see the Azure AI Gallery:

  • News categorization: Uses feature hashing to classify articles into a predefined list of categories.
  • Find similar companies: Uses the text of Wikipedia articles to categorize companies.
  • Text classification: Demonstrates the end-to-end process of using text from Twitter messages in sentiment analysis (five-part sample).

List of modules

The Text Analytics category in Azure Machine Learning Studio includes these modules:

See also