Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service

Use this quickstart to begin analyzing language with the Text Analytics REST API and Python. This article shows you how to detect language, analyze sentiment, extract key phrases, and identify linked entities.

Refer to the API definitions for technical documentation for the APIs.

Prerequisites

You must have a Cognitive Services API subscription with access to the Text Analytics API. If you don't have a subscription, you can create an account for free. Before continuing, you will need the Text Analytics subscription key provided after activating your account.

Create a new Python application

Create a new Python application in your favorite editor or IDE. Add the following imports to your file.

import requests
# pprint is used to format the JSON response
from pprint import pprint
from IPython.display import HTML

Create variables for your subscription key and the endpoint for the Text Analytics REST API. Verify that the region in the endpoint corresponds to the one you used when you signed up (for example, westcentralus). If you're using a free trial key, you don't need to change anything.

subscription_key = "<ADD YOUR KEY HERE>"
text_analytics_base_url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/"
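
If you prefer not to hardcode the key, the same variable can be populated from an environment variable. This is a minimal sketch; the variable name TEXT_ANALYTICS_KEY is an assumption, not something the service requires.

```python
import os

# Read the subscription key from an environment variable when it is set,
# falling back to the placeholder otherwise. TEXT_ANALYTICS_KEY is an
# illustrative name chosen for this sketch.
subscription_key = os.environ.get("TEXT_ANALYTICS_KEY", "<ADD YOUR KEY HERE>")
text_analytics_base_url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/"
```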

The following sections describe how to call each of the API's features.

Detect languages

Append languages to the Text Analytics base endpoint to form the language detection URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/languages

language_api_url = text_analytics_base_url + "languages"

The payload sent to the API consists of a documents list, where each document is an object containing an id and a text attribute. The text attribute stores the text to be analyzed, and the id can be any value.

documents = { "documents": [
    { "id": "1", "text": "This is a document written in English." },
    { "id": "2", "text": "Este es un document escrito en Español." },
    { "id": "3", "text": "这是一个用中文写的文件" }
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(language_api_url, headers=headers, json=documents)
languages = response.json()
pprint(languages)

Output

{
"documents":[
    {
        "detectedLanguages":[
        {
            "iso6391Name":"en",
            "name":"English",
            "score":1.0
        }
        ],
        "id":"1"
    },
    {
        "detectedLanguages":[
        {
            "iso6391Name":"es",
            "name":"Spanish",
            "score":1.0
        }
        ],
        "id":"2"
    },
    {
        "detectedLanguages":[
        {
            "iso6391Name":"zh_chs",
            "name":"Chinese_Simplified",
            "score":1.0
        }
        ],
        "id":"3"
    }
],
"errors":[]
}
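
Because the response is plain JSON, you can post-process it with ordinary dictionary access. The sketch below picks the highest-scoring detected language for each document, using a hardcoded sample shaped like the response above rather than a live API call.

```python
# A sample response shaped like the language detection output above.
sample_response = {
    "documents": [
        {"id": "1", "detectedLanguages": [{"iso6391Name": "en", "name": "English", "score": 1.0}]},
        {"id": "2", "detectedLanguages": [{"iso6391Name": "es", "name": "Spanish", "score": 1.0}]},
    ],
    "errors": [],
}

def top_languages(response):
    """Map each document id to the name of its highest-scoring detected language."""
    return {
        doc["id"]: max(doc["detectedLanguages"], key=lambda lang: lang["score"])["name"]
        for doc in response["documents"]
    }

print(top_languages(sample_response))  # {'1': 'English', '2': 'Spanish'}
```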

Analyze sentiment

To detect the sentiment (which ranges between positive and negative) of a set of documents, append sentiment to the Text Analytics base endpoint to form the sentiment analysis URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/sentiment

sentiment_url = text_analytics_base_url + "sentiment"

As with the language detection example, create a dictionary with a documents key that consists of a list of documents. Each document is an object consisting of the id, the text to be analyzed, and the language of the text.

documents = {"documents" : [
  {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
  {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},  
  {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},  
  {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(sentiment_url, headers=headers, json=documents)
sentiments = response.json()
pprint(sentiments)

Output

The sentiment score for a document is between 0.0 and 1.0, with a higher score indicating a more positive sentiment.

{
  "documents":[
    {
      "id":"1",
      "score":0.9708490371704102
    },
    {
      "id":"2",
      "score":0.0019068121910095215
    },
    {
      "id":"3",
      "score":0.7456425428390503
    },
    {
      "id":"4",
      "score":0.334433376789093
    }
  ],
  "errors":[

  ]
}
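
In practice you often want a coarse label rather than a raw score. The thresholds below are illustrative choices for this sketch, not part of the API; the scores are taken from the sample output above.

```python
# Turn numeric sentiment scores into coarse labels.
# The neutral band (0.4, 0.6) is an illustrative choice, not an API value.
def sentiment_label(score, neutral_band=(0.4, 0.6)):
    low, high = neutral_band
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"

scores = {"1": 0.97, "2": 0.0019, "3": 0.75, "4": 0.33}
labels = {doc_id: sentiment_label(s) for doc_id, s in scores.items()}
print(labels)  # {'1': 'positive', '2': 'negative', '3': 'positive', '4': 'negative'}
```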

Extract key phrases

To extract the key phrases from a set of documents, append keyPhrases to the Text Analytics base endpoint to form the key phrase extraction URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/keyPhrases

keyphrase_url = text_analytics_base_url + "keyPhrases"

This collection of documents is the same one used for the sentiment analysis example.

documents = {"documents" : [
  {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
  {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},  
  {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},  
  {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(keyphrase_url, headers=headers, json=documents)
key_phrases = response.json()
pprint(key_phrases)

Output

{
  "documents":[
    {
      "keyPhrases":[
        "wonderful experience",
        "staff",
        "rooms"
      ],
      "id":"1"
    },
    {
      "keyPhrases":[
        "food",
        "terrible time",
        "hotel",
        "staff"
      ],
      "id":"2"
    },
    {
      "keyPhrases":[
        "Monte Rainier",
        "caminos"
      ],
      "id":"3"
    },
    {
      "keyPhrases":[
        "carretera",
        "tráfico",
        "día"
      ],
      "id":"4"
    }
  ],
  "errors":[

  ]
}
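
A common follow-up step is flattening this response into a lookup from document id to its phrases. A minimal sketch, using a hardcoded subset of the sample response above:

```python
# A trimmed sample shaped like the keyPhrases output above.
sample_key_phrases = {
    "documents": [
        {"id": "1", "keyPhrases": ["wonderful experience", "staff", "rooms"]},
        {"id": "3", "keyPhrases": ["Monte Rainier", "caminos"]},
    ],
    "errors": [],
}

# Build a {document id: list of key phrases} lookup table.
phrases_by_id = {doc["id"]: doc["keyPhrases"] for doc in sample_key_phrases["documents"]}
print(phrases_by_id["3"])  # ['Monte Rainier', 'caminos']
```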

Identify Entities

To identify well-known entities (people, places, and things) in text documents, append entities to the Text Analytics base endpoint to form the entity recognition URL. For example: https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.1/entities

entities_url = text_analytics_base_url + "entities"

Create a collection of documents, as in the previous examples.

documents = {"documents" : [
  {"id": "1", "text": "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(entities_url, headers=headers, json=documents)
entities = response.json()
pprint(entities)

Output

{'documents': [{'id': '1',
   'entities': [{'name': 'Microsoft',
     'matches': [{'wikipediaScore': 0.502357972145024,
       'entityTypeScore': 1.0,
       'text': 'Microsoft',
       'offset': 0,
       'length': 9}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Microsoft',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Microsoft',
     'bingId': 'a093e9b9-90f5-a3d5-c4b8-5855e1b01f85',
     'type': 'Organization'},
    {'name': 'Bill Gates',
     'matches': [{'wikipediaScore': 0.5849375085784292,
       'entityTypeScore': 0.999847412109375,
       'text': 'Bill Gates',
       'offset': 25,
       'length': 10}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Bill Gates',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Bill_Gates',
     'bingId': '0d47c987-0042-5576-15e8-97af601614fa',
     'type': 'Person'},
    {'name': 'Paul Allen',
     'matches': [{'wikipediaScore': 0.5314163053043621,
       'entityTypeScore': 0.9988409876823425,
       'text': 'Paul Allen',
       'offset': 40,
       'length': 10}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Paul Allen',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Paul_Allen',
     'bingId': 'df2c4376-9923-6a54-893f-2ee5a5badbc7',
     'type': 'Person'},
    {'name': 'April 4',
     'matches': [{'wikipediaScore': 0.37312706493069636,
       'entityTypeScore': 0.8,
       'text': 'April 4',
       'offset': 54,
       'length': 7}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'April 4',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/April_4',
     'bingId': '52535f87-235e-b513-54fe-c03e4233ac6e',
     'type': 'Other'},
    {'name': 'April 4, 1975',
     'matches': [{'entityTypeScore': 0.8,
       'text': 'April 4, 1975',
       'offset': 54,
       'length': 13}],
     'type': 'DateTime',
     'subType': 'Date'},
    {'name': 'BASIC',
     'matches': [{'wikipediaScore': 0.35916049097766867,
       'entityTypeScore': 0.8,
       'text': 'BASIC',
       'offset': 89,
       'length': 5}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'BASIC',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/BASIC',
     'bingId': '5b16443d-501c-58f3-352e-611bbe75aa6e',
     'type': 'Other'},
    {'name': 'Altair 8800',
     'matches': [{'wikipediaScore': 0.8697256853652899,
       'entityTypeScore': 0.8,
       'text': 'Altair 8800',
       'offset': 116,
       'length': 11}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Altair 8800',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Altair_8800',
     'bingId': '7216c654-3779-68a2-c7b7-12ff3dad5606',
     'type': 'Other'}]}],
 'errors': []}
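
Since each entity carries a type field, you can filter the response by type. The sketch below pulls out the Person entities from a hardcoded subset of the sample response above.

```python
# A trimmed sample shaped like the entities output above,
# keeping only the fields this sketch uses.
sample_entities = {
    "documents": [{"id": "1", "entities": [
        {"name": "Microsoft", "type": "Organization"},
        {"name": "Bill Gates", "type": "Person"},
        {"name": "Paul Allen", "type": "Person"},
    ]}],
    "errors": [],
}

# Collect the names of all entities typed as Person.
people = [entity["name"]
          for doc in sample_entities["documents"]
          for entity in doc["entities"]
          if entity["type"] == "Person"]
print(people)  # ['Bill Gates', 'Paul Allen']
```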

Next steps

See also

Text Analytics overview
Frequently asked questions (FAQ)