Text Analytics

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

This article describes the text analytics modules included in Machine Learning Studio (classic). These modules provide specialized computational tools for working with both structured and unstructured text, including:

  • Multiple options for preprocessing text.
  • Language detection.
  • Creation of features from text using customizable n-gram dictionaries.
  • Feature hashing, to efficiently analyze text without preprocessing or advanced linguistic analysis.
  • Vowpal Wabbit, for very fast machine learning on text. Vowpal Wabbit supports feature hashing, topic modeling (LDA), and classification.
  • Named entity recognition, to extract the names of people, places, and organizations from unstructured text.
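
To illustrate the n-gram and feature hashing ideas listed above, here is a minimal Python sketch. It is not the implementation used by the Studio (classic) modules. The function names (`tokenize`, `ngrams`, `hash_features`), the `num_bits` and `max_n` parameters, and the choice of SHA-256 as the hash function are all assumptions made for this example. The sketch splits text into word n-grams and maps each n-gram to a slot in a fixed-length count vector, so no dictionary has to be built or stored.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=2):
    """Return all word n-grams of length 1..max_n, shortest first."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def hash_features(text, num_bits=10, max_n=2):
    """Map the n-grams of `text` into a count vector of length 2**num_bits.

    Hypothetical helper: SHA-256 is used only because it ships with the
    standard library; the Studio (classic) modules may hash differently.
    """
    size = 1 << num_bits
    vector = [0] * size
    for gram in ngrams(tokenize(text), max_n):
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % size
        vector[index] += 1  # distinct n-grams may collide in one slot
    return vector


if __name__ == "__main__":
    features = hash_features("The quick brown fox", num_bits=10)
    print(len(features), sum(features))
```

A smaller `num_bits` gives a shorter vector but more collisions between unrelated n-grams; a larger value reduces collisions at the cost of a wider, sparser feature set.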

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Examples

For examples of text analytics in Machine Learning Studio (classic), see these samples in the Azure AI Gallery:

  • News categorization: Uses feature hashing to classify articles into a predefined list of categories.
  • Find similar companies: Uses the text of Wikipedia articles to categorize companies.
  • Text classification: Demonstrates the end-to-end process of using text from Twitter messages in sentiment analysis (five-part sample).

List of modules

The Text Analytics category in Machine Learning Studio (classic) includes these modules:
