Quickstart for Text Analytics API with Python

This walkthrough shows you how to detect language, analyze sentiment, and extract key phrases using the Text Analytics APIs with Python.

You can run this example as a Jupyter notebook on MyBinder.


Refer to the API definitions for the technical documentation of these APIs.


You must have a Cognitive Services API account with the Text Analytics API. You can use the free tier, which allows 5,000 transactions per month, to complete this walkthrough.

You must also have the endpoint and access key that were generated for you during sign-up.

To continue with this walkthrough, replace subscription_key with a valid subscription key that you obtained earlier.

subscription_key = None  # Replace None with your subscription key as a string
assert subscription_key

Next, verify that the region in text_analytics_base_url corresponds to the one you used when setting up the service. If you are using a free trial key, you do not need to change anything.

text_analytics_base_url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.0/"
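
For example (illustrative only; westeurope stands in for whichever region issued your key), a key created in the West Europe region would use a different base URL:

# Example only: a key issued for the West Europe region would instead use
# text_analytics_base_url = "https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.0/"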

Detect languages

The Language Detection API detects the language of a text document, using the Detect Language method. The service endpoint of the language detection API for your region is available via the following URL:

language_api_url = text_analytics_base_url + "languages"

The payload to the API consists of a list of documents, each of which in turn contains an id and a text attribute. The text attribute stores the text to be analyzed.

You can replace the text in the documents dictionary with any other text whose language you want to detect.

documents = { 'documents': [
    { 'id': '1', 'text': 'This is a document written in English.' },
    { 'id': '2', 'text': 'Este es un document escrito en Español.' },
    { 'id': '3', 'text': '这是一个用中文写的文件' }
]}

The next few lines of code call out to the language detection API using the requests library in Python to determine the language in the documents.

import requests
from pprint import pprint

# POST the documents to the language detection endpoint, passing the
# subscription key in the Ocp-Apim-Subscription-Key header.
headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(language_api_url, headers=headers, json=documents)
languages = response.json()
pprint(languages)

The response looks like the following:

{'documents': [{'detectedLanguages': [{'iso6391Name': 'en',
                                       'name': 'English',
                                       'score': 1.0}],
                'id': '1'},
               {'detectedLanguages': [{'iso6391Name': 'es',
                                       'name': 'Spanish',
                                       'score': 1.0}],
                'id': '2'},
               {'detectedLanguages': [{'iso6391Name': 'zh_chs',
                                       'name': 'Chinese_Simplified',
                                       'score': 1.0}],
                'id': '3'}],
 'errors': []}
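
The walkthrough assumes each request succeeds. In your own code, you may want to fail fast on HTTP errors before parsing the response; a minimal sketch using the requests library's built-in check:

response = requests.post(language_api_url, headers=headers, json=documents)
response.raise_for_status()  # raises requests.exceptions.HTTPError for 4xx/5xx responses
languages = response.json()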

The following lines of code render the JSON data as an HTML table.

from IPython.display import HTML
table = []
for document in languages["documents"]:
    # Look up the original text by document id and join the detected
    # languages with their confidence scores.
    text  = next(filter(lambda d: d["id"] == document["id"], documents["documents"]))["text"]
    langs = ", ".join(["{0}({1})".format(lang["name"], lang["score"]) for lang in document["detectedLanguages"]])
    table.append("<tr><td>{0}</td><td>{1}</td></tr>".format(text, langs))
HTML("<table><tr><th>Text</th><th>Detected languages(scores)</th></tr>{0}</table>".format("\n".join(table)))

Analyze sentiment

The Sentiment Analysis API detects the sentiment of a set of text records, using the Sentiment method. The following example scores four documents, two in English and two in Spanish.

The service endpoint for sentiment analysis is available for your region via the following URL:

sentiment_api_url = text_analytics_base_url + "sentiment"

As with the language detection example, the service is provided with a dictionary whose documents key contains a list of documents. Each document is a dictionary consisting of an id, the text to be analyzed, and the language of the text. You can use the language detection API from the previous section to populate the language field; a sketch of that approach follows the payload below.

documents = {'documents' : [
  {'id': '1', 'language': 'en', 'text': 'I had a wonderful experience! The rooms were wonderful and the staff was helpful.'},
  {'id': '2', 'language': 'en', 'text': 'I had a terrible time at the hotel. The staff was rude and the food was awful.'},
  {'id': '3', 'language': 'es', 'text': 'Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos.'},
  {'id': '4', 'language': 'es', 'text': 'La carretera estaba atascada. Había mucho tráfico el día de ayer.'}
]}
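
If you do not know the language of a document ahead of time, here is a minimal sketch of populating the field from the earlier language detection response. The helper below is hypothetical, not part of the walkthrough, and assumes the document ids match between the two payloads:

# Hypothetical helper: copy each document's top-scoring detected language
# (from the language detection response) into its 'language' field.
def add_detected_language(documents, languages):
    detected = {doc["id"]: doc["detectedLanguages"][0]["iso6391Name"]
                for doc in languages["documents"]}
    for doc in documents["documents"]:
        doc["language"] = detected.get(doc["id"], "en")  # fall back to English
    return documents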

The sentiment API can now be used to analyze the documents for their sentiments.

headers    = {"Ocp-Apim-Subscription-Key": subscription_key}
response   = requests.post(sentiment_api_url, headers=headers, json=documents)
sentiments = response.json()
pprint(sentiments)

The response looks similar to the following:

{'documents': [{'id': '1', 'score': 0.7673527002334595},
               {'id': '2', 'score': 0.18574094772338867},
               {'id': '3', 'score': 0.5}],
 'errors': []}

The sentiment score for a document is between 0 and 1, with a higher score indicating a more positive sentiment.
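
The API returns only the score. For example, a simple way to turn the scores into coarse labels is to threshold them; the 0.5 cutoff below is an assumption, not part of the API:

# Bucket each score into a coarse label; the 0.5 threshold is an assumption.
for doc in sentiments["documents"]:
    label = "positive" if doc["score"] >= 0.5 else "negative"
    print(doc["id"], round(doc["score"], 2), label)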

Extract key phrases

The Key Phrase Extraction API extracts key-phrases from a text document, using the Key Phrases method. This section of the walkthrough extracts key phrases for both English and Spanish documents.

The service endpoint for the key-phrase extraction service is accessed via the following URL:

key_phrase_api_url = text_analytics_base_url + "keyPhrases"

The collection of documents used in this example is the same as the one used for language detection:

documents = {'documents': [
    {'id': '1', 'text': 'This is a document written in English.'},
    {'id': '2', 'text': 'Este es un document escrito en Español.'},
    {'id': '3', 'text': '这是一个用中文写的文件'}
]}
headers     = {"Ocp-Apim-Subscription-Key": subscription_key}
response    = requests.post(key_phrase_api_url, headers=headers, json=documents)
key_phrases = response.json()
pprint(key_phrases)

The response looks like the following:

{'documents': [{'id': '1', 'keyPhrases': ['document', 'English']},
               {'id': '2',
                'keyPhrases': ['Este es', 'document escrito en Español']},
               {'id': '3', 'keyPhrases': ['这是一个用中文写的文件']}],
 'errors': []}

The JSON object can once again be rendered as an HTML table using the following lines of code:

from IPython.display import HTML
table = []
for document in key_phrases["documents"]:
    # Match the original text by document id and list its key phrases.
    text    = next(filter(lambda d: d["id"] == document["id"], documents["documents"]))["text"]
    phrases = ", ".join(document["keyPhrases"])
    table.append("<tr><td>{0}</td><td>{1}</td></tr>".format(text, phrases))
HTML("<table><tr><th>Text</th><th>Key phrases</th></tr>{0}</table>".format("\n".join(table)))

See also

Text Analytics overview
Frequently asked questions (FAQ)