Introduction

Text analytics, also known as text mining or text data mining, is the process of analyzing unstructured text data to derive high-quality, meaningful information. That information can be used for predictive classification or to address different types of business problems.

The following are some of the basic outputs generated from text analytics algorithms:

  • Text analytics can be used to identify who, what, or where in a set of unstructured text data. Algorithms can be designed to output keywords such as a location, a company name, or a product name.

  • It's also used to derive a concise, meaningful summary of large volumes of text data.

  • Text mining algorithms scan a given set of text data to derive the theme or concept it conveys.

  • Text analytics is also widely used to perform sentiment analysis based on the positive, negative, or neutral keywords used in the data.
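
As a rough illustration of the last item, keyword-based sentiment analysis can be sketched in a few lines of Python. This is a minimal sketch, not how a production system or any particular library works: the keyword sets and the function name keyword_sentiment are hypothetical and chosen only for this example, and the scoring simply counts positive keywords minus negative keywords.

```python
import re

# Hypothetical keyword sets, assumed only for this illustration.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def keyword_sentiment(text):
    """Label text as positive, negative, or neutral by counting keyword hits."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive keywords minus number of negative keywords.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in [
        "The service was great and fast",
        "Terrible, slow support",
        "The package arrived on Tuesday",
    ]:
        print(sample, "->", keyword_sentiment(sample))
```

A plain keyword count like this ignores negation and context (for example, "not good"), which is one reason trained classification models are often preferred for sentiment analysis.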

Note

This module's labs can be completed for free using the Databricks 14-day trial, but you cannot use an Azure free trial subscription to create a Databricks workspace. To switch a free trial subscription to pay-as-you-go, go to your profile and change your subscription offer to pay-as-you-go. You may also need to remove the spending limit and request a quota increase for vCPUs in your region. When you create your Azure Databricks workspace, you can select the Trial (Premium - 14-Days Free DBUs) pricing tier, which gives the workspace access to free Premium Azure Databricks DBUs for 14 days.

Learning objectives

In this module, you will:

  • Identify the different techniques for performing text analytics using Azure Databricks.
  • Train and evaluate multiple machine learning models for text classification.
  • Create a text classifier using deep learning.