Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service

Use this quickstart to begin analyzing language with the Text Analytics REST API and Python. This article shows you how to detect language, analyze sentiment, extract key phrases, and identify linked entities.

Refer to the API definitions for technical documentation.

Prerequisites

  • Python 3.x

  • The endpoint and access key that were generated for you during sign-up.

  • The Python requests library

    You can install the library with this command:

    pip install --upgrade requests
    

You must have a Cognitive Services API subscription with access to the Text Analytics API. If you don't have a subscription, you can create an account for free. Before continuing, you will need the Text Analytics subscription key provided after activating your account.

Create a new Python application

Create a new Python application in your favorite editor or IDE. Add the following imports to your file.

import requests
# pprint is used to format the JSON response
from pprint import pprint

Create variables for your subscription key and the endpoint for the Text Analytics REST API. Verify that the region in the endpoint corresponds to the one you used when you signed up (for example westcentralus). If you're using a free trial key, you don't need to change anything.

subscription_key = "<ADD YOUR KEY HERE>"
text_analytics_base_url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/"

The following sections describe how to call each of the API's features.
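Every call below follows the same pattern: append a feature path to the base endpoint and pass the subscription key in the Ocp-Apim-Subscription-Key header. As a sketch of that pattern, here is a small helper (build_request is a hypothetical convenience function, not part of the API):

```python
def build_request(base_url, feature, subscription_key):
    """Join the Text Analytics base endpoint with a feature path and
    build the authentication header the service requires."""
    url = base_url.rstrip("/") + "/" + feature
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return url, headers

url, headers = build_request(
    "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/",
    "languages",
    "<ADD YOUR KEY HERE>")
print(url)
```

The sections below spell out the same URL construction inline so each example stands on its own.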

Detect languages

Append languages to the Text Analytics base endpoint to form the language detection URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/languages

language_api_url = text_analytics_base_url + "languages"

The payload to the API consists of a list of documents, which are JSON objects containing an id and a text attribute. The text attribute stores the text to be analyzed, and the id can be any value.

documents = { "documents": [
    { "id": "1", "text": "This is a document written in English." },
    { "id": "2", "text": "Este es un documento escrito en Español." },
    { "id": "3", "text": "这是一个用中文写的文件" }
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(language_api_url, headers=headers, json=documents)
languages = response.json()
pprint(languages)

Output

{
"documents":[
    {
        "detectedLanguages":[
        {
            "iso6391Name":"en",
            "name":"English",
            "score":1.0
        }
        ],
        "id":"1"
    },
    {
        "detectedLanguages":[
        {
            "iso6391Name":"es",
            "name":"Spanish",
            "score":1.0
        }
        ],
        "id":"2"
    },
    {
        "detectedLanguages":[
        {
            "iso6391Name":"zh_chs",
            "name":"Chinese_Simplified",
            "score":1.0
        }
        ],
        "id":"3"
    }
],
"errors":[]
}
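A document can return more than one detection, so when you process the response you typically want the highest-scoring language for each document. A minimal sketch, using a hand-copied subset of the sample response above (trimmed to the fields used):

```python
# Subset of the language detection response shown above.
languages = {"documents": [
    {"id": "1", "detectedLanguages": [{"iso6391Name": "en", "name": "English", "score": 1.0}]},
    {"id": "2", "detectedLanguages": [{"iso6391Name": "es", "name": "Spanish", "score": 1.0}]},
    {"id": "3", "detectedLanguages": [{"iso6391Name": "zh_chs", "name": "Chinese_Simplified", "score": 1.0}]},
]}

# Map each document id to the ISO code of its highest-scoring detection.
best = {
    doc["id"]: max(doc["detectedLanguages"], key=lambda d: d["score"])["iso6391Name"]
    for doc in languages["documents"]
}
print(best)  # {'1': 'en', '2': 'es', '3': 'zh_chs'}
```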

Analyze sentiment

To detect the sentiment (which ranges from negative to positive) of a set of documents, append sentiment to the Text Analytics base endpoint to form the sentiment analysis URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/sentiment

sentiment_url = text_analytics_base_url + "sentiment"

As with the language detection example, create a dictionary with a documents key that consists of a list of documents. Each document is a JSON object consisting of an id, the text to be analyzed, and the language of the text.

documents = {"documents" : [
  {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
  {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},  
  {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},  
  {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(sentiment_url, headers=headers, json=documents)
sentiments = response.json()
pprint(sentiments)

Output

The sentiment score for a document is between 0.0 and 1.0, with a higher score indicating a more positive sentiment.

{
  "documents":[
    {
      "id":"1",
      "score":0.9708490371704102
    },
    {
      "id":"2",
      "score":0.0019068121910095215
    },
    {
      "id":"3",
      "score":0.7456425428390503
    },
    {
      "id":"4",
      "score":0.334433376789093
    }
  ],
  "errors":[

  ]
}
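Because scores near 1.0 are positive and scores near 0.0 are negative, a simple way to label documents is to apply a cutoff such as 0.5. The threshold is a choice you make, not part of the API; this sketch uses the scores from the sample response above:

```python
# Scores copied from the sentiment response shown above.
sentiments = {"documents": [
    {"id": "1", "score": 0.9708490371704102},
    {"id": "2", "score": 0.0019068121910095215},
    {"id": "3", "score": 0.7456425428390503},
    {"id": "4", "score": 0.334433376789093},
]}

# Label each document using a 0.5 cutoff.
labels = {
    doc["id"]: "positive" if doc["score"] >= 0.5 else "negative"
    for doc in sentiments["documents"]
}
print(labels)  # {'1': 'positive', '2': 'negative', '3': 'positive', '4': 'negative'}
```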

Extract key phrases

To extract the key phrases from a set of documents, append keyPhrases to the Text Analytics base endpoint to form the key phrase extraction URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/keyPhrases

keyphrase_url = text_analytics_base_url + "keyPhrases"

This collection of documents is the same one used in the sentiment analysis example.

documents = {"documents" : [
  {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
  {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},  
  {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},  
  {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(keyphrase_url, headers=headers, json=documents)
key_phrases = response.json()
pprint(key_phrases)

Output

{
  "documents":[
    {
      "keyPhrases":[
        "wonderful experience",
        "staff",
        "rooms"
      ],
      "id":"1"
    },
    {
      "keyPhrases":[
        "food",
        "terrible time",
        "hotel",
        "staff"
      ],
      "id":"2"
    },
    {
      "keyPhrases":[
        "Monte Rainier",
        "caminos"
      ],
      "id":"3"
    },
    {
      "keyPhrases":[
        "carretera",
        "tráfico",
        "día"
      ],
      "id":"4"
    }
  ],
  "errors":[

  ]
}
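The response groups phrases by document, so a common next step is to index them by document id. A sketch using a hand-copied subset of the sample response above:

```python
# Subset of the key phrase response shown above.
key_phrases = {"documents": [
    {"id": "1", "keyPhrases": ["wonderful experience", "staff", "rooms"]},
    {"id": "2", "keyPhrases": ["food", "terrible time", "hotel", "staff"]},
    {"id": "3", "keyPhrases": ["Monte Rainier", "caminos"]},
    {"id": "4", "keyPhrases": ["carretera", "tráfico", "día"]},
]}

# Index the phrases by document id and print one line per document.
phrases_by_id = {doc["id"]: doc["keyPhrases"] for doc in key_phrases["documents"]}
for doc_id, phrases in phrases_by_id.items():
    print(doc_id, "->", ", ".join(phrases))
```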

Identify entities

To identify well-known entities (people, places, and things) in text documents, append entities to the Text Analytics base endpoint to form the entity recognition URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/entities

entities_url = text_analytics_base_url + "entities"

Create a collection of documents, like in the previous examples.

documents = {"documents" : [
  {"id": "1", "text": "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(entities_url, headers=headers, json=documents)
entities = response.json()
pprint(entities)

Output

{'documents': [{'id': '1',
   'entities': [{'name': 'Microsoft',
     'matches': [{'wikipediaScore': 0.502357972145024,
       'entityTypeScore': 1.0,
       'text': 'Microsoft',
       'offset': 0,
       'length': 9}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Microsoft',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Microsoft',
     'bingId': 'a093e9b9-90f5-a3d5-c4b8-5855e1b01f85',
     'type': 'Organization'},
    {'name': 'Bill Gates',
     'matches': [{'wikipediaScore': 0.5849375085784292,
       'entityTypeScore': 0.999847412109375,
       'text': 'Bill Gates',
       'offset': 25,
       'length': 10}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Bill Gates',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Bill_Gates',
     'bingId': '0d47c987-0042-5576-15e8-97af601614fa',
     'type': 'Person'},
    {'name': 'Paul Allen',
     'matches': [{'wikipediaScore': 0.5314163053043621,
       'entityTypeScore': 0.9988409876823425,
       'text': 'Paul Allen',
       'offset': 40,
       'length': 10}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Paul Allen',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Paul_Allen',
     'bingId': 'df2c4376-9923-6a54-893f-2ee5a5badbc7',
     'type': 'Person'},
    {'name': 'April 4',
     'matches': [{'wikipediaScore': 0.37312706493069636,
       'entityTypeScore': 0.8,
       'text': 'April 4',
       'offset': 54,
       'length': 7}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'April 4',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/April_4',
     'bingId': '52535f87-235e-b513-54fe-c03e4233ac6e',
     'type': 'Other'},
    {'name': 'April 4, 1975',
     'matches': [{'entityTypeScore': 0.8,
       'text': 'April 4, 1975',
       'offset': 54,
       'length': 13}],
     'type': 'DateTime',
     'subType': 'Date'},
    {'name': 'BASIC',
     'matches': [{'wikipediaScore': 0.35916049097766867,
       'entityTypeScore': 0.8,
       'text': 'BASIC',
       'offset': 89,
       'length': 5}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'BASIC',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/BASIC',
     'bingId': '5b16443d-501c-58f3-352e-611bbe75aa6e',
     'type': 'Other'},
    {'name': 'Altair 8800',
     'matches': [{'wikipediaScore': 0.8697256853652899,
       'entityTypeScore': 0.8,
       'text': 'Altair 8800',
       'offset': 116,
       'length': 11}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Altair 8800',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Altair_8800',
     'bingId': '7216c654-3779-68a2-c7b7-12ff3dad5606',
     'type': 'Other'}]}],
 'errors': []}
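Each entity in the response carries a type (and sometimes a subType), so grouping entities by type is a useful first pass over the result. A sketch using a hand-copied subset of the sample response above, trimmed to the name and type fields:

```python
# Subset of the entity recognition response shown above.
entities = {"documents": [{"id": "1", "entities": [
    {"name": "Microsoft", "type": "Organization"},
    {"name": "Bill Gates", "type": "Person"},
    {"name": "Paul Allen", "type": "Person"},
    {"name": "April 4, 1975", "type": "DateTime", "subType": "Date"},
    {"name": "BASIC", "type": "Other"},
    {"name": "Altair 8800", "type": "Other"},
]}]}

# Group entity names by their type.
by_type = {}
for entity in entities["documents"][0]["entities"]:
    by_type.setdefault(entity["type"], []).append(entity["name"])
print(by_type)
```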

See also

Text Analytics overview
Frequently asked questions (FAQ)