question

Mike-Ubezzi-MSFT asked:
What is the difference between Tags and Description["tags"] in the Computer Vision API?

"My team is using the Computer Vision service from the Microsoft Cognitive Services API. Within the JSON output from the images we are submitting, there are two sets of data, one with a key of 'Tags' and one with a key of 'Description[""tags""].

There appears to be some overlap between the data in these two sections, but there are also tags unique to each, and I do not understand the difference; no one else on the team seems to understand it either.

Can anyone enlighten us?

[Note: As we migrate from MSDN, this question has been posted by an Azure Cloud Engineer as a frequently asked question] Source: MSDN


azure-computer-vision

1 Answer

RohitMungi-7434 answered:

We have checked with our product team to understand the inner details of the two fields in the response. Here are more details to clarify.

The Tags section of the response is based on a model that is different from the Description[Tags] model, and the two use different threshold settings internally. This produces different sets of tags: some are common to both, while others appear only in the Description[Tags] section, as in the example response below.

 {
   "description": {
     "tags": ["outdoor", "road", "grass", "path", "trail", "forest", "tree", "side", "area", "narrow", "country", "track", "train", "street", "traveling", "dirt", "covered", "sign", "riding", "standing", "stop", "man", "red", "snow"],
     "captions": [{
       "text": "a path with trees on the side of a road",
       "confidence": 0.965715635493424
     }]
   },
   "requestId": "<id>",
   "metadata": {
     "width": 800,
     "height": 600,
     "format": "Jpeg"
   }
 }
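To see the overlap concretely, the two lists can be compared directly in code. A minimal sketch, assuming a trimmed, hypothetical Analyze response that includes both the top-level Tags array (which carries confidence scores) and Description["tags"] (bare strings):

```python
# Hypothetical, trimmed Analyze response containing both tag fields.
response = {
    "tags": [
        {"name": "outdoor", "confidence": 0.99},
        {"name": "tree", "confidence": 0.95},
        {"name": "road", "confidence": 0.90},
    ],
    "description": {
        "tags": ["outdoor", "road", "grass", "path", "tree", "narrow"],
    },
}

# Top-level Tags: objects with confidence scores (precision-oriented model).
top_level = {t["name"] for t in response["tags"]}

# Description["tags"]: bare strings used to seed captioning (recall-oriented model).
caption_tags = set(response["description"]["tags"])

print("common:", sorted(top_level & caption_tags))
print("only in description.tags:", sorted(caption_tags - top_level))
```

In practice the Description list tends to be the longer of the two, since its model is tuned to surface more candidate words for caption generation.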


The threshold setting for Tags is optimized for precision, while the Description[Tags] section is optimized for recall, to encourage more words for image captioning and sentence generation.

If you want to understand more about precision and recall, please check the Custom Vision documentation, which explains these scenarios.
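As a quick illustration of the trade-off (a toy example, not taken from the Azure docs): precision is the fraction of predicted tags that are correct, while recall is the fraction of the image's true tags that were recovered.

```python
# Toy example: precision vs recall for a hypothetical set of predicted tags.
ground_truth = {"outdoor", "road", "tree", "grass", "path"}  # tags actually in the image
predicted = {"outdoor", "road", "tree", "snow"}              # hypothetical model output

true_positives = len(predicted & ground_truth)
precision = true_positives / len(predicted)      # how many predictions were right: 3/4
recall = true_positives / len(ground_truth)      # how many true tags were found: 3/5

print(f"precision={precision:.2f} recall={recall:.2f}")
```

A higher confidence threshold drops uncertain predictions such as "snow", raising precision at the cost of recall; a lower threshold does the opposite, which is why the two fields behave differently.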

So, the two fields exist to support different customer scenarios: use Tags when precision (higher thresholds) is required, and Description[Tags] when recall matters, i.e., when sentence or text generation for an image is the primary objective.


Source: Azure Documentation

