What is the Text Analytics API?
The Text Analytics API is a cloud-based service that provides advanced natural language processing over raw text, and includes four main functions: sentiment analysis, key phrase extraction, language detection, and entity recognition.
The API is a part of Azure Cognitive Services, a collection of machine learning and AI algorithms in the cloud for your development projects.
Text analysis can mean different things, but in Cognitive Services, the Text Analytics API provides four types of analysis as described below. You can use these features with the REST API, or a client library for .NET, Python, Node.js, Go, or Ruby.
Sentiment Analysis
Use sentiment analysis to find out what customers think of your brand or topic by analyzing raw text for clues about positive or negative sentiment. This API returns a sentiment score between 0 and 1 for each document, where 1 is the most positive.
The analysis models are pretrained using an extensive body of text and natural language technologies from Microsoft. For selected languages, the API can analyze and score any raw text that you provide, directly returning results to the calling application.
Key Phrase Extraction
Automatically extract key phrases to quickly identify the main points. For example, for the input text "The food was delicious and there were wonderful staff", the API returns the main talking points: "food" and "wonderful staff".
Language Detection
Detect the language of the input text. The API reports a single language code for every document submitted in the request, and it covers a wide range of languages, variants, dialects, and some regional/cultural languages. The language code is paired with a score indicating the confidence of the detection.
Named Entity Recognition
Identify and categorize entities in your text as people, places, organizations, date/time, quantities, percentages, currencies, and more. Well-known entities are also recognized and linked to more information on the web.
Use the Text Analytics containers to extract key phrases, detect language, and analyze sentiment locally, by installing standardized Docker containers closer to your data.
The workflow is simple: you submit data for analysis and handle outputs in your code. Analyzers are consumed as-is, with no additional configuration or customization.
Formulate a request containing your data as raw unstructured text, in JSON.
Post the request to the endpoint established during sign-up, appending the desired resource: sentiment analysis, key phrase extraction, language detection, or entity recognition.
Stream or store the response locally. Depending on the request, results are a sentiment score, a collection of extracted key phrases, a language code, or a set of recognized entities.
Output is returned as a single JSON document, with results for each text document you posted, based on ID. You can subsequently analyze, visualize, or categorize the results into actionable insights.
Data is not stored in your account. Operations performed by the Text Analytics API are stateless, which means the text you provide is processed and results are returned immediately.
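The workflow above can be sketched in Python as follows. This is a minimal illustration, not the official client: the helper names (`build_url`, `build_body`, `results_by_id`) are hypothetical, and the resource paths and the `documents`/`id`/`language`/`text` field names assume the v2.1 REST API, so check them against the version you are calling.

```python
import json

# Assumed resource paths for the v2.1 REST API; verify against your API version.
RESOURCES = {
    "sentiment": "sentiment",
    "key_phrases": "keyPhrases",
    "language": "languages",
    "entities": "entities",
}


def build_url(endpoint, feature, version="v2.1"):
    """Append the desired resource to the endpoint obtained at sign-up."""
    return f"{endpoint.rstrip('/')}/text/analytics/{version}/{RESOURCES[feature]}"


def build_body(texts, language="en"):
    """Wrap raw texts in a JSON request body, assigning one ID per document."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    return json.dumps({"documents": documents})


def results_by_id(response_text):
    """Index the per-document results of a JSON response by document ID."""
    payload = json.loads(response_text)
    return {doc["id"]: doc for doc in payload.get("documents", [])}
```

To send the request, you would post the body to the URL with your key in a request header (in the REST API this is typically `Ocp-Apim-Subscription-Key`) and pass the response text to `results_by_id`, which lets you match each result back to the document you submitted.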
Text Analytics for multiple programming experience levels
You can start using the Text Analytics API in your processes, even if you don't have much experience in programming. Use these tutorials to learn how you can use the API to analyze text in different ways to fit your experience level.
- Minimal programming required:
- Programming experience recommended:
This section has been moved to a separate article for better discoverability. Refer to Supported languages in the Text Analytics API for this content.
All of the Text Analytics API endpoints accept raw text data. The current limit is 5,120 characters for each document; if you need to analyze larger documents, you can break them up into smaller chunks. If you still require a higher limit, contact us so that we can discuss your requirements.
|Maximum size of a single document||5,120 characters as measured by StringInfo.LengthInTextElements|
|Maximum size of entire request||1 MB|
|Maximum number of documents in a request||1,000 documents|
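One way to stay within the limits in the table above is to split long texts before submission and then group the documents into requests. The sketch below is an illustration under stated assumptions: `chunk_text` and `batch_documents` are hypothetical helpers, 1 MB is interpreted as 1,048,576 bytes, and Python's `len` counts Unicode code points, which may not match the service's own measurement exactly for every input.

```python
import json

MAX_CHARS = 5120                  # per-document limit
MAX_DOCS = 1000                   # per-request document limit
MAX_REQUEST_BYTES = 1024 * 1024   # 1 MB, assumed to mean 1,048,576 bytes


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into pieces of at most max_chars, breaking at a space when possible."""
    chunks = []
    while len(text) > max_chars:
        cut = text.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no usable space found: fall back to a hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip(" ")
    if text:
        chunks.append(text)
    return chunks


def batch_documents(documents, max_docs=MAX_DOCS, max_bytes=MAX_REQUEST_BYTES):
    """Group documents into batches that respect both request-level limits."""
    overhead = len(json.dumps({"documents": []}).encode("utf-8"))
    batches, current, current_size = [], [], 0
    for doc in documents:
        # Estimated serialized size, plus 2 bytes for the ", " separator.
        doc_size = len(json.dumps(doc).encode("utf-8")) + 2
        too_many = len(current) >= max_docs
        too_big = current_size + doc_size + overhead > max_bytes
        if current and (too_many or too_big):
            batches.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += doc_size
    if current:
        batches.append(current)
    return batches
```

Breaking at whitespace avoids cutting words in half, which would likely degrade key phrase and entity results. The size estimate is conservative because `json.dumps` escapes non-ASCII characters by default.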
Your rate limit will vary with your pricing tier.
|Tier||Requests per second||Requests per minute|
Requests are measured for each Text Analytics feature separately. For example, you can send the maximum number of requests for your pricing tier to each feature, at the same time.
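Because requests are measured per feature, a client can keep one throttle per feature. The following is a rough client-side sketch, not part of the API: the `Throttle` class is hypothetical, and the limit you pass in should come from your own pricing tier.

```python
import time
from collections import deque


class Throttle:
    """Sliding-window limiter: allow at most max_calls per period (in seconds)."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))


# One independent throttle per feature; the limit of 100 per second is a placeholder.
throttles = {name: Throttle(max_calls=100, period=1.0)
             for name in ("sentiment", "key_phrases", "language", "entities")}
```

Call `throttles["sentiment"].wait()` before each sentiment request, and likewise for the other features. The `clock` and `sleep` parameters exist only so the class can be exercised without real delays.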
The Text Analytics API uses Unicode encoding for text representation and character count calculations. Requests can be submitted in both UTF-8 and UTF-16 with no measurable differences in the character count. Unicode codepoints are used as the heuristic for character length and are considered equivalent for the purposes of text analytics data limits. If you use StringInfo.LengthInTextElements to get the character count, you are using the same method we use to measure data size.
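To see why the choice of encoding does not change the character count, compare byte lengths with code point counts. The Python sketch below is only an approximation of the service's measurement: `len` counts Unicode code points, whereas .NET's `StringInfo.LengthInTextElements` counts text elements, so the two can differ for text that contains combining character sequences. The `describe` and `within_document_limit` helpers are hypothetical.

```python
MAX_CHARS = 5120  # per-document character limit


def describe(text):
    """Report the code point count alongside the encoded sizes in bytes."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


def within_document_limit(text, max_chars=MAX_CHARS):
    """Check the document against the character limit, counting code points."""
    return len(text) <= max_chars


# "h\u00e9llo" followed by a space and an emoji outside the Basic Multilingual Plane.
sample = "h\u00e9llo \U0001F600"
print(describe(sample))
# prints {'code_points': 7, 'utf8_bytes': 11, 'utf16_bytes': 16}
```

The byte sizes differ between encodings, but the code point count is the same, which is why the limit is expressed in characters rather than bytes.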
Create an Azure resource for Text Analytics to get a key and endpoint for your applications.
Quickstart is a walkthrough of the REST API calls written in C#. Learn how to submit text, choose an analysis, and view results with minimal code. If you prefer, you can start with the Python quickstart instead.
See what's new in the Text Analytics API for information on new releases and features.
Dig in a little deeper with this sentiment analysis tutorial using Azure Databricks.
Check out our list of blog posts and more videos on how to use the Text Analytics API with other tools and technologies in our External & Community Content page.