Text Analytics

This article describes the text analytics modules included in Azure Machine Learning Studio. These modules provide specialized computational tools for working with both structured and unstructured text, including:

  • Multiple options for preprocessing text.
  • Language detection.
  • Creation of features from text using customizable n-gram dictionaries.
  • Feature hashing, to efficiently analyze text without preprocessing or advanced linguistic analysis.
  • Vowpal Wabbit, for very fast machine learning on text. Vowpal Wabbit supports feature hashing, topic modeling (LDA), and classification.
  • Named entity recognition, to extract the names of people, places, and organizations from unstructured text.
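The difference between a customizable n-gram dictionary and feature hashing can be sketched in plain Python. This is a minimal illustration of the two general techniques, not the implementation used by the Studio modules. The stop-word list, the function names, the use of zlib.crc32 as the hash function, and the 8-bit table size are all hypothetical choices made for this example.

```python
import re
import zlib
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "to", "and"}

def preprocess(text):
    """Lowercase, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n_max=2):
    """Return every n-gram of length 1..n_max as a space-joined string.
    Stop words are removed first, so bigrams can span a removed word."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def build_vocabulary(docs, n_max=2, min_count=1):
    """Dictionary approach: each n-gram seen at least min_count times
    gets its own fixed column index."""
    counts = Counter(g for d in docs for g in ngrams(preprocess(d), n_max))
    kept = sorted(g for g, c in counts.items() if c >= min_count)
    return {g: i for i, g in enumerate(kept)}

def vectorize_with_vocabulary(text, vocab, n_max=2):
    """Count n-grams that appear in the dictionary; unknown n-grams are dropped."""
    vec = [0] * len(vocab)
    for g in ngrams(preprocess(text), n_max):
        if g in vocab:
            vec[vocab[g]] += 1
    return vec

def hash_features(text, bits=8, n_max=2):
    """Hashing trick: map each n-gram to one of 2**bits columns.
    No dictionary is stored, so distinct n-grams may collide."""
    vec = [0] * (1 << bits)
    for g in ngrams(preprocess(text), n_max):
        # crc32 is deterministic across runs, unlike Python's built-in hash() on str.
        vec[zlib.crc32(g.encode("utf-8")) % len(vec)] += 1
    return vec

if __name__ == "__main__":
    docs = ["The cat sat on the mat", "A dog sat on the log"]
    vocab = build_vocabulary(docs)
    print(len(vocab))                                    # 11 distinct unigrams and bigrams
    print(sum(vectorize_with_vocabulary("the cat sat", vocab)))  # 3
    print(len(hash_features("the cat sat")))             # 256 columns for bits=8
```

The dictionary approach gives interpretable, collision-free columns but must be built from training data and ignores unseen n-grams. The hashing approach needs no stored vocabulary and handles new text directly, at the cost of occasional collisions and columns that cannot be mapped back to specific words.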

Examples

For examples of text analytics using Azure Machine Learning, see the Azure AI Gallery:

  • News categorization: Uses feature hashing to classify articles into a predefined list of categories.

  • Find similar companies: Uses the text of Wikipedia articles to categorize companies.

  • Text classification: Demonstrates the end-to-end process of using Twitter messages for sentiment analysis (a five-part sample).

List of modules

The Text Analytics category in Azure Machine Learning Studio includes these modules:

See also