How to summarize sentiment for a long text with Text Analytics sentiment API?

Yu Zhu 6 Reputation points
2020-07-12T22:34:58.627+00:00

With the Azure Text Analytics sentiment API, we can label the sentiment of a piece of text. However, a long text that exceeds the 5,120-character limit has to be split into chunks. My question is: after getting the sentiment labels for the individual chunks, what is the best way to aggregate them into an overall sentiment label for the whole text?

Currently, I see two approaches: 1) take the average of the chunk-level sentiment scores; 2) take the average of the sentence-level sentiment scores. Which is more appropriate?

Thank you.

Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 answer

  1. Ramr-msft 17,611 Reputation points
    2020-07-13T08:24:31.437+00:00

    @Yu Zhu Version 3 of Sentiment Analysis can now detect sentiment at both the document level and the level of individual sentences.
    https://learn.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis?tabs=version-3
    https://learn.microsoft.com/en-us/azure/cognitive-services/text-analytics/tutorials/tutorial-power-bi-key-phrases
    You don’t have to split each document into sentences yourself; the API does that for you. You can also ignore the sentence-level sentiment and use only the document-level one. See the example below:

    Input: you can pass in 1000 documents (i.e. comments) in a single API call

    {  
      "documents": [  
        {  
          "language": "en",  
          "id": "1",  
          "text": "This program is very helpful. I received a great response to my inquiry. I was disappointed with the response time however."  
        },  
        {  
          "language": "en",  
          "id": "2",  
          "text": "I really enjoyed the interaction with the agent."  
        }  
      ]  
    }  
      
    Output: (get document and sentence scores)  
      
    {  
        "documents": [  
            {  
                "id": "1",  
                "sentiment": "mixed",  
                "documentScores": {  
                    "positive": 0.66642224788665771,  
                    "neutral": 0.0001706598413875,  
                    "negative": 0.333407074213028  
                },  
                "sentences": [  
                    {  
                        "sentiment": "positive",  
                        "sentenceScores": {  
                            "positive": 0.99993038177490234,  
                            "neutral": 2.86401773337E-05,  
                            "negative": 4.09220410802E-05  
                        },  
                        "offset": 0,  
                        "length": 29  
                    },  
                    {  
                        "sentiment": "positive",  
                        "sentenceScores": {  
                            "positive": 0.99930989742279053,  
                            "neutral": 0.0004785652272403,  
                            "negative": 0.0002115316892741  
                        },  
                        "offset": 30,  
                        "length": 42  
                    },  
                    {  
                        "sentiment": "negative",  
                        "sentenceScores": {  
                            "positive": 2.64735208475E-05,  
                            "neutral": 4.7741514209E-06,  
                            "negative": 0.9999687671661377  
                        },  
                        "offset": 73,  
                        "length": 50  
                    }  
                ]  
            },  
            {  
                "id": "2",  
                "sentiment": "positive",  
                "documentScores": {  
                    "positive": 0.99982720613479614,  
                    "neutral": 9.10629023565E-05,  
                    "negative": 8.17300679046E-05  
                },  
                "sentences": [  
                    {  
                        "sentiment": "positive",  
                        "sentenceScores": {  
                            "positive": 0.99982720613479614,  
                            "neutral": 9.10629023565E-05,  
                            "negative": 8.17300679046E-05  
                        },  
                        "offset": 0,  
                        "length": 48  
                    }  
                ]  
            }  
        ],  
        "errors": []  
    }
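
For the original question of combining several chunks, one possible approach (a sketch, not an official recommendation) is to average the sentence-level scores across all chunks, weighting each sentence by its `length` so that chunk boundaries do not affect the result. The function name `aggregate_sentiment` is hypothetical, and the code assumes each chunk result has the same shape as the entries under `"documents"` in the output above. It returns only the highest-scoring of the three labels and does not try to reproduce the service's own `"mixed"` label.

```python
from typing import Dict, List

LABELS = ("positive", "neutral", "negative")


def aggregate_sentiment(chunk_results: List[dict]) -> Dict[str, object]:
    """Length-weighted average of sentence scores across chunk results.

    Each item in chunk_results is assumed to look like one entry of the
    "documents" array in the response shown above.
    """
    totals = {label: 0.0 for label in LABELS}
    total_length = 0

    for doc in chunk_results:
        for sentence in doc["sentences"]:
            # Longer sentences contribute proportionally more.
            weight = sentence["length"]
            for label in LABELS:
                totals[label] += weight * sentence["sentenceScores"][label]
            total_length += weight

    if total_length == 0:
        raise ValueError("no sentences to aggregate")

    scores = {label: totals[label] / total_length for label in LABELS}
    # Pick the label with the highest averaged score.
    return {"scores": scores, "label": max(scores, key=scores.get)}
```

Calling `aggregate_sentiment(response["documents"])` on the parsed results for all chunks of one text gives a single set of scores and a label. Averaging per chunk instead would give a short final chunk the same weight as a full-length one, which is the main reason for weighting by sentence length here.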