Text Analytics

This article describes the text analytics modules included in Azure Machine Learning Studio (classic). These modules provide specialized computational tools for working with both structured and unstructured text, including:

  • Multiple options for preprocessing text.
  • Language detection.
  • Creation of features from text using customizable n-gram dictionaries.
  • Feature hashing, to efficiently analyze text without preprocessing or advanced linguistic analysis.
  • Vowpal Wabbit, for very fast machine learning on text. Vowpal Wabbit supports feature hashing, topic modeling (LDA), and classification.
  • Named entity recognition, to extract the names of people, places, and organizations from unstructured text.
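To make the n-gram and feature hashing items above concrete, here is a minimal Python sketch of the general technique: n-grams are extracted from whitespace-split tokens and hashed into a fixed-length count vector. It is an illustration only, not the implementation used by the Studio (classic) modules. The function names `ngrams` and `hash_features`, the `num_bits` parameter, and the use of MD5 as the hash function are all assumptions chosen for this example.

```python
import hashlib


def ngrams(tokens, n):
    """Return all contiguous n-grams of sizes 1..n as space-joined strings."""
    return [
        " ".join(tokens[i:i + size])
        for size in range(1, n + 1)
        for i in range(len(tokens) - size + 1)
    ]


def hash_features(text, n=2, num_bits=10):
    """Map the n-grams of `text` into a count vector of length 2**num_bits.

    MD5 is used only as a convenient, deterministic stand-in hash
    (an assumption for this sketch, not what any particular module uses).
    """
    tokens = text.lower().split()
    vector = [0] * (2 ** num_bits)
    for gram in ngrams(tokens, n):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(vector)
        vector[index] += 1  # distinct n-grams may collide in the same slot
    return vector


features = hash_features("the quick brown fox", n=2, num_bits=10)
print(len(features), sum(features))  # 1024 7  (4 unigrams + 3 bigrams)
```

The vector length is fixed in advance, so no dictionary of n-grams has to be built or stored. The trade-off is that different n-grams can hash to the same index; a larger `num_bits` makes such collisions less likely at the cost of a wider feature vector.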


Applies to: Machine Learning Studio (classic)

This content pertains only to Studio (classic). Similar drag-and-drop modules have been added to Azure Machine Learning designer. Learn more in this article comparing the two versions.


For examples of text analytics using Azure Machine Learning, see the Azure AI Gallery:

  • News categorization: Uses feature hashing to classify articles into a predefined list of categories.

  • Find similar companies: Uses the text of Wikipedia articles to categorize companies.

  • Text classification: Demonstrates the end-to-end process of using text from Twitter messages in sentiment analysis (five-part sample).

List of modules

The Text Analytics category in Azure Machine Learning Studio (classic) includes these modules:

See also