Quickstart: Analyze a remote image with Python

In this quickstart, you analyze a remote image using Computer Vision. To analyze a local image, see Analyze a local image with Python.

You can run this quickstart in a step-by step fashion using a Jupyter notebook on MyBinder. To launch Binder, select the following button:

Binder

Prerequisites

To use Computer Vision, you need a subscription key; see Obtaining Subscription Keys.

Analyze a remote image

With the Analyze Image method, you can extract visual features based on image content. You can upload an image or specify an image URL and choose which features to return, including:

  • A detailed list of tags related to the image content.
  • A description of image content in a complete sentence.
  • The coordinates, gender, and age of any faces contained in the image.
  • The ImageType (clip art or a line drawing).
  • The dominant color, the accent color, or whether an image is black & white.
  • The category defined in this taxonomy.
  • Does the image contain adult or sexually suggestive content?

To run the sample, do the following steps:

  1. Copy the following code to a new Python script file.
  2. Replace <Subscription Key> with your valid subscription key.
  3. Change the vision_base_url value to the location where you obtained your subscription keys, if necessary.
  4. Optionally, change the image_url value to another image.
  5. Run the script.

The following code uses the Python requests library to call the Computer Vision Analyze Image API. It returns the results as a JSON object. The API key is passed in via the headers dictionary. The types of features to recognize is passed in via the params dictionary.

Analyze Image request

# Replace <Subscription Key> with your valid subscription key.
subscription_key = "<Subscription Key>"
assert subscription_key

# You must use the same region in your REST call as you used to get your
# subscription keys. For example, if you got your subscription keys from
# westus, replace "westcentralus" in the URI below with "westus".
#
# Free trial subscription keys are generated in the westcentralus region.
# If you use a free trial subscription key, you shouldn't need to change
# this region.
vision_base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/"

vision_analyze_url = vision_base_url + "analyze"

# Set image_url to the URL of an image that you want to analyze.
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/" + \
    "Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"

import requests
headers  = {'Ocp-Apim-Subscription-Key': subscription_key }
params   = {'visualFeatures': 'Categories,Description,Color'}
data     = {'url': image_url}
response = requests.post(vision_analyze_url, headers=headers, params=params, json=data)
response.raise_for_status()

# The 'analysis' object contains various fields that describe the image. The most
# relevant caption for the image is obtained from the 'descriptions' property.
analysis = response.json()
print(analysis)

image_caption = analysis["description"]["captions"][0]["text"].capitalize()

# Display the image and overlay it with the caption.
# If you are using a Jupyter notebook, uncomment the following line.
#%matplotlib inline
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
image = Image.open(BytesIO(requests.get(image_url).content))
plt.imshow(image)
plt.axis("off")
_ = plt.title(image_caption, size="x-large", y=-0.1)

Analyze Image response

A successful response is returned in JSON, for example:

{
  "categories": [
    {
      "name": "outdoor_",
      "score": 0.00390625,
      "detail": {
        "landmarks": []
      }
    },
    {
      "name": "outdoor_street",
      "score": 0.33984375,
      "detail": {
        "landmarks": []
      }
    }
  ],
  "description": {
    "tags": [
      "building",
      "outdoor",
      "street",
      "city",
      "people",
      "busy",
      "table",
      "walking",
      "traffic",
      "filled",
      "large",
      "many",
      "group",
      "night",
      "light",
      "crowded",
      "bunch",
      "standing",
      "man",
      "sign",
      "crowd",
      "umbrella",
      "riding",
      "tall",
      "woman",
      "bus"
    ],
    "captions": [
      {
        "text": "a group of people on a city street at night",
        "confidence": 0.9122243847383961
      }
    ]
  },
  "color": {
    "dominantColorForeground": "Brown",
    "dominantColorBackground": "Brown",
    "dominantColors": [
      "Brown"
    ],
    "accentColor": "B54316",
    "isBwImg": false
  },
  "requestId": "c11894eb-de3e-451b-9257-7c8b168073d1",
  "metadata": {
    "height": 600,
    "width": 450,
    "format": "Jpeg"
  }
}

Next steps

Explore a Python application that uses Computer Vision to perform optical character recognition (OCR); create smart-cropped thumbnails; plus detect, categorize, tag, and describe visual features, including faces, in an image.