Text Analytics API Version 2.0

Text Analytics API is a cloud-based service that provides advanced natural language processing over raw text, and includes three main functions: sentiment analysis, key phrase extraction, and language detection.

The API is backed by resources in Microsoft Cognitive Services, a collection of machine learning and AI algorithms in the cloud, readily consumable in your development projects.

Capabilities in Text Analytics

Text analysis can mean different things, but in Cognitive Services, the Text Analytics APIs provide the three types of analysis described in the following table.

Operations | Description | APIs
Sentiment Analysis | Find out what customers think of your brand or topic by analyzing raw text for clues about positive or negative sentiment. This API returns a sentiment score between 0 and 1 for each document, where 1 is the most positive. Our models are pretrained using an extensive body of text and natural language technologies from Microsoft. For selected languages, the API can analyze and score any raw text that you provide, directly returning results to the calling application. | REST
Key Phrase Extraction | Automatically extract key phrases to quickly identify the main points. For example, for the input text ‘The food was delicious and there were wonderful staff’, the API returns the main talking points: ‘food’ and ‘wonderful staff’. | REST
Language Detection | For up to 120 languages, detect which language the input text is written in and report a single language code for every document submitted on the request. The language code is paired with a score indicating how confident the detection is. | REST

Typical workflow

The workflow is simple: you submit data for analysis and handle outputs in your code. Analyzers are consumed as-is, with no additional configuration or customization.

  1. Sign up for an access key. The key must be passed on each request.

  2. Formulate a request containing your data as raw unstructured text, in JSON.

  3. Post the request to the endpoint established during sign-up, appending the desired resource: sentiment analysis, key phrase extraction, or language detection.

  4. Stream or store the response locally. Depending on the request, the results are a sentiment score, a collection of extracted key phrases, or a language code.

Output is returned as a single JSON document, with results for each text document you posted, based on ID. You can subsequently analyze, visualize, or categorize the results into actionable insights.
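The four steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the service's official client: the endpoint URL, the placeholder key, the `Ocp-Apim-Subscription-Key` header name, the resource path segments, and the `documents` request and response shapes are assumptions here and should be checked against the API reference and the values you receive at sign-up. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed values: replace with the endpoint and key from your own sign-up.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0"
ACCESS_KEY = "<your-access-key>"


def build_request(texts, resource, language="en"):
    """Build (url, headers, body) for one call.

    resource is the path segment appended to the endpoint, assumed here to be
    'sentiment', 'keyPhrases', or 'languages'. Pass language=None to omit the
    language hint, for example when asking the service to detect it.
    """
    documents = []
    for i, text in enumerate(texts, start=1):
        doc = {"id": str(i), "text": text}
        if language is not None:
            doc["language"] = language
        documents.append(doc)
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": ACCESS_KEY,  # the key travels on every request
    }
    return f"{ENDPOINT}/{resource}", headers, body


def post(url, headers, body):
    """POST the request and decode the single JSON document that comes back."""
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def results_by_id(response):
    """Index per-document results by the ID sent with each document."""
    return {doc["id"]: doc for doc in response.get("documents", [])}
```

With a valid key, `results_by_id(post(*build_request(["I have a dog"], "sentiment")))` would return the results keyed by document ID, so each result can be matched back to the text that produced it.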

Data is not stored in your account. Operations performed by Text Analytics API are stateless, which means the text you provide is processed and results are returned immediately.

Supported languages

Text Analytics can detect up to 120 different languages. Language Detection returns the "script" of a language. For instance, for the phrase "I have a dog" it will return en instead of en-US. The only special case is Chinese, where the language detection capability will return zh_CHS or zh_CHT if it can determine the script given the text provided. In situations where a specific script cannot be identified for a Chinese document, it will return simply zh.
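As a small illustration of the code format described above, the following sketch separates a detected code into its language and optional script parts. The function name is hypothetical, and the sketch assumes the underscore is the only separator that Language Detection uses between language and script.

```python
def split_language_code(code):
    """Split a detected code such as 'zh_CHS' into (language, script).

    Codes without a script part, such as 'en' or plain 'zh', yield None
    for the script.
    """
    language, _, script = code.partition("_")
    return language, script or None


for detected in ("en", "zh_CHS", "zh_CHT", "zh"):
    print(detected, "->", split_language_code(detected))
```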

For sentiment analysis and key phrase extraction, the list of supported languages is more selective as we refine the analyzers to accommodate the linguistic rules of additional languages.

Support for each language is initially rolled out in preview and later graduates to generally available (GA) status, independently of other languages and of the Text Analytics service overall. A language can remain in preview even after the Text Analytics API itself becomes generally available.

Language Language code Sentiment Key phrases Notes
Danish da ✔ *
Dutch nl ✔ *
English en
Finnish fi ✔ *
French fr
German de ✔ *
Greek el ✔ *
Italian it ✔ *
Japanese ja
Norwegian no ✔ *
Polish pl ✔ *
Portuguese (Portugal) pt-PT pt also accepted
Portuguese (Brazil) pt-BR
Russian ru ✔ *
Spanish es
Swedish sv ✔ *
Turkish tr ✔ *

* indicates language support in preview

Data limits

All three Text Analytics APIs accept raw text data. The current limit is 5,000 characters for each document; if you need to analyze larger documents, you can break them up into smaller chunks. If you still require a higher limit, contact us so that we can discuss your requirements.

Maximum size of a single document | 5,000 characters, as measured by String.Length
Maximum size of entire request | 1 MB
Maximum number of documents in a request | 1,000 documents

The rate limit is 100 calls per minute. Each call can carry up to 1,000 documents, so a single call can still cover a large quantity of text.
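One way to stay within these limits is to split long texts and group documents into batches before sending them. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, "1 MB" is treated as 1,000,000 bytes to stay on the conservative side, the request size is estimated from the serialized JSON, and the splitter cuts at fixed character offsets without regard to word or sentence boundaries.

```python
import json

MAX_CHARS_PER_DOCUMENT = 5000
MAX_DOCUMENTS_PER_REQUEST = 1000
MAX_REQUEST_BYTES = 1_000_000  # assumption: a conservative reading of "1 MB"

# Size of the empty JSON envelope that wraps the documents.
ENVELOPE_BYTES = len(json.dumps({"documents": []}))


def split_text(text, limit=MAX_CHARS_PER_DOCUMENT):
    """Cut text into consecutive chunks of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def batch_documents(documents):
    """Group document dicts into batches that respect the count and size limits."""
    batches, current, size = [], [], ENVELOPE_BYTES
    for doc in documents:
        # json.dumps escapes non-ASCII by default, so len() equals the byte count.
        doc_size = len(json.dumps(doc)) + 2  # +2 for the ", " separator
        too_many = len(current) >= MAX_DOCUMENTS_PER_REQUEST
        too_big = size + doc_size > MAX_REQUEST_BYTES
        if current and (too_many or too_big):
            batches.append(current)
            current, size = [], ENVELOPE_BYTES
        current.append(doc)
        size += doc_size
    if current:
        batches.append(current)
    return batches
```

In practice you would probably want to split on sentence or paragraph boundaries rather than fixed offsets, since a chunk that ends mid-sentence may score differently from the full text.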

Unicode encoding

Text Analytics API uses Unicode encoding for text representation and character count calculations. Requests can be submitted in either UTF-8 or UTF-16 with no measurable difference in the character count. Unicode code points are used as the heuristic for character length and are considered equivalent for the purposes of Text Analytics data limits. If you use String.Length (strlen) to get the character count, you are using the same method we use to measure data size.
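If you count characters in a language other than C#, it can help to know what your length function measures. In Python, `len` on a string counts code points, whereas .NET's String.Length counts UTF-16 code units; the two agree for most text and differ only for characters outside the Basic Multilingual Plane, such as emoji. The sketch below computes both and checks the larger one against the limit. Taking the larger value is an assumption made here to stay conservative, not documented service behavior, and the function names are hypothetical.

```python
MAX_CHARS_PER_DOCUMENT = 5000


def code_point_count(text):
    """Number of Unicode code points (what Python's len reports)."""
    return len(text)


def utf16_code_unit_count(text):
    """Number of UTF-16 code units (what .NET's String.Length reports)."""
    return len(text.encode("utf-16-le")) // 2


def fits_document_limit(text, limit=MAX_CHARS_PER_DOCUMENT):
    """Conservative check: the larger of the two counts must fit the limit."""
    return max(code_point_count(text), utf16_code_unit_count(text)) <= limit
```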

Next steps

First, try the interactive demo. You can paste a text input (5K character maximum) to detect the language (up to 120), calculate a sentiment score, or extract key phrases. No sign-up necessary.

When you are ready to call the API directly:

  • Sign up for an access key and review the steps for calling the API.

  • Quickstart is a walkthrough of the REST API calls written in C#. Learn how to submit text, choose an analysis, and view results with minimal code.

  • API reference documentation provides the technical documentation for the APIs. The documentation supports embedded calls so that you can call the API from each documentation page.

  • External & Community Content provides a list of blog posts and videos demonstrating how to use Text Analytics with other tools and technologies.

See also

Cognitive Services Documentation page
Cognitive Services Product page