TextAnalytics with AzureML
Text Analytics is the process of transforming text into information and actionable output. Text is prevalent in every industry. If something doesn’t already exist as text, it ultimately ends up as text so that it can be consumed. Most machine learning tools read data in CSV or JSON, which are textual representations of objects. Speech is also converted to text in speech recognition applications. Why? Because text is easier for applications to parse and consume. Understanding the context is a different story, and that is where Natural Language Processing comes into play, transforming text into understandable corpora and lexicons.
What does Microsoft have to offer in Text Analytics? Combined, our applications (Office and Bing) process more text than anyone else in the world.
The AzureML Team has produced a series of experiments for building Text Classification models.
This module covers loading, editing, cleaning, partitioning, and filtering the dataset.
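As a rough illustration of what loading, filtering, and partitioning can look like outside the AzureML designer, here is a minimal Python sketch. The column names (label, text), the sample rows, and the helper names load_rows and split_rows are assumptions made for this example and are not taken from the experiments.

```python
import csv
import io
import random


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, dropping rows whose text is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["text"].strip()]


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and partition it into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample data; the fourth record has no text and is filtered out.
SAMPLE = """label,text
pos,Great phone and fast delivery
neg,Battery died after a week
pos,Works as described
neg,
neg,Screen cracked on arrival
"""

rows = load_rows(SAMPLE)
train, test = split_rows(rows, train_fraction=0.75)
print(len(rows), len(train), len(test))  # prints: 4 3 1
```

Fixing the random seed is a design choice for the sketch: it makes the train/test partition reproducible between runs.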
In the preprocessing step, the experiment demonstrates why the text needs to be cleaned before modeling. Preprocessing examples include removing special characters, assigning contextual meaning to special characters and text symbols (e.g. the :) emoticon or LOL), and removing duplicates, punctuation, and stop words.
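The preprocessing ideas above can be sketched in a few lines of Python. This is only an illustration: the emoticon, slang, and stop-word tables below are tiny hypothetical stand-ins, and the function names are assumptions rather than anything defined by the experiment.

```python
import re

# Hypothetical lookup tables, kept tiny for illustration only.
EMOTICONS = {":)": "smile", ":(": "frown"}
SLANG = {"lol": "laugh"}
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of"}


def preprocess(text):
    """Return a list of cleaned tokens for one document."""
    text = text.lower()
    # Give emoticons a word meaning before punctuation is stripped.
    for symbol, meaning in EMOTICONS.items():
        text = text.replace(symbol, f" {meaning} ")
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return [tok for tok in tokens if tok not in STOP_WORDS]


def drop_duplicates(docs):
    """Keep the first document for each distinct cleaned token sequence."""
    seen, unique = set(), []
    for doc in docs:
        key = tuple(preprocess(doc))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


print(preprocess("LOL, this is the BEST phone!!! :)"))
# prints: ['laugh', 'best', 'phone', 'smile']
```

Emoticons are mapped to words before the special-character pass, since otherwise the punctuation that forms them would be removed first and their meaning lost.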
Step 3: Feature engineering
Step 3 has two parts:
Now that you have the data cleaned up, it’s time to extract features.
Textual features don’t mean much to a mathematical machine learning algorithm, so this module demonstrates the use of Feature Hashing to convert variable-length text into fixed-length numeric feature vectors. When it’s numbers, math is happy. The step also demonstrates how to reduce the dimensionality of the feature vectors using the “Filter Based Feature Selection” module.
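To make the two ideas concrete, here is a minimal Python sketch of feature hashing followed by a simple filter-style selection. It is not the AzureML modules' implementation: the MD5-based hash, the n_bits parameter, the Pearson-correlation score, and the function names are all assumptions chosen so the example runs with the standard library alone.

```python
import hashlib


def hash_features(tokens, n_bits=4):
    """Map a variable-length token list to a count vector of length 2**n_bits."""
    size = 2 ** n_bits
    vector = [0] * size
    for tok in tokens:
        # MD5 is used here only as a stable stand-in hash (an assumption).
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % size
        vector[index] += 1
    return vector


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_top_k(rows, labels, k):
    """Return the indices of the k columns most correlated with the label."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


docs = [["great", "phone"], ["love", "battery"], ["bad", "screen"], ["terrible", "battery"]]
labels = [1, 1, 0, 0]  # hypothetical sentiment labels: 1 = positive, 0 = negative
matrix = [hash_features(doc) for doc in docs]
kept = select_top_k(matrix, labels, k=4)
print(len(matrix[0]), kept)
```

Every document ends up with the same vector length regardless of how many words it contains, and different words may collide in the same slot; a larger n_bits lowers the collision rate at the cost of wider vectors, which is why a selection step afterwards is useful.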
The hard part is over. In this module, you select your favorite algorithm(s) and train the machine learning model.
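For readers who want to see what training a text classifier involves, here is a minimal sketch of one possible choice, a multinomial Naive Bayes model with add-one smoothing, written in plain Python. The algorithm is picked purely for illustration and is not necessarily one the experiments use; the toy documents, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, tokens):
    """Return the class with the highest log-probability for the tokens."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            if tok in model["vocab"]:  # ignore words never seen in training
                # Add-one smoothing avoids log(0) for words unseen in this class.
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set of already-preprocessed tokens.
docs = [["great", "phone"], ["love", "battery"], ["bad", "screen"], ["terrible", "battery"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict(model, ["great", "battery"]))  # prints: pos
```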
Step 5: Deploy trained models as web services
Step 5 has two parts:
And finally, you deploy the web services to be used in your applications.
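Once a model is deployed, an application typically calls it over HTTPS with a JSON body and an API key. The sketch below only builds such a request and does not send it. The URL and key are placeholders, and the payload layout (an "Inputs" object with an "input1" table of "ColumnNames" and "Values") is an assumption based on the request-response format commonly shown on a service's API help page; check that page for the exact shape your service expects.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a deployed scoring service."""
    # Assumed payload layout; "input1" is a placeholder input name.
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_scoring_request(
    url="https://example.invalid/execute?api-version=2.0",  # placeholder URL
    api_key="YOUR_API_KEY",  # placeholder key
    column_names=["text"],
    rows=[["Great phone and fast delivery"]],
)

# Sending would look like: urllib.request.urlopen(request).read()
print(request.get_method(), json.loads(request.data)["Inputs"]["input1"]["Values"])
```

Building the request separately from sending it keeps the payload easy to inspect and test before any real endpoint or key is involved.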
Text Analytics Web Service
If you don’t want to build an ML service from scratch, the AzureML team has also published a Text Analytics web service in the Azure DataMarket, with sample code and documentation.