Analyze an image

To analyze an image, you can use the Analyze Image REST method or the equivalent method in the SDK for your preferred programming language, specifying the visual features you want to include in the analysis (and, if you select categories, whether to include details of celebrities or landmarks). This method returns a JSON document containing the requested information.

Note

Celebrity detection requires approval through a Limited Access policy. You can read more about the addition of this policy to our Responsible AI standard. Celebrity recognition appears in some screenshots, but it isn't included in the lab.
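To see what the SDK does for you, here's a minimal sketch of the underlying REST call in Python. It's illustrative only: the requests library, the ENDPOINT and KEY environment variable names, and the api-version value are assumptions, and the URL pattern follows the Image Analysis 4.0 REST interface, so check the details for your own resource.

import os
import requests

# Assumed environment variables holding your Azure AI Vision endpoint and key
endpoint = os.environ["ENDPOINT"].rstrip("/")
key = os.environ["KEY"]

# Image Analysis 4.0 REST pattern: visual features go in the query string
response = requests.post(
    f"{endpoint}/computervision/imageanalysis:analyze",
    params={
        "api-version": "2023-10-01",       # use a version your resource supports
        "features": "caption,read",        # comma-separated visual features
        "gender-neutral-caption": "true",
    },
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"url": "<url>"},                 # JSON body with the image URL
)
response.raise_for_status()
analysis = response.json()                 # JSON document with the requested results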

C#

using Azure;
using Azure.AI.Vision.ImageAnalysis;

// Authenticate a client by using the endpoint and key for your Azure AI Vision resource
ImageAnalysisClient client = new ImageAnalysisClient(
    new Uri(Environment.GetEnvironmentVariable("ENDPOINT")),
    new AzureKeyCredential(Environment.GetEnvironmentVariable("KEY")));

// Analyze an image from its URL, requesting a caption and extracted text (OCR)
ImageAnalysisResult result = client.Analyze(
    new Uri("<url>"),
    VisualFeatures.Caption | VisualFeatures.Read,
    new ImageAnalysisOptions { GenderNeutralCaption = true });

Python

import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Authenticate a client by using the endpoint and key for your Azure AI Vision resource
client = ImageAnalysisClient(
    endpoint=os.environ["ENDPOINT"],
    credential=AzureKeyCredential(os.environ["KEY"])
)

# Analyze an image from its URL, requesting a caption and extracted text (OCR)
result = client.analyze_from_url(
    image_url="<url>",
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.READ],
    gender_neutral_caption=True,
    language="en",
)
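
Once the call returns, the result object exposes one property per requested feature; properties for features you didn't request are None. A sketch of reading the caption and OCR results in Python, using the property names from the azure-ai-vision-imageanalysis package:

# The caption is a single sentence with a confidence score
if result.caption is not None:
    print(f"Caption: '{result.caption.text}' (confidence: {result.caption.confidence:.4f})")

# OCR results are organized into blocks of lines, and lines into words
if result.read is not None:
    for block in result.read.blocks:
        for line in block.lines:
            print(f"Line: '{line.text}'")
            for word in line.words:
                print(f"  Word: '{word.text}' (confidence: {word.confidence:.4f})")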

Available visual features are defined in the VisualFeatures enum. The C# names are shown here; the Python enum members are uppercase (for example, VisualFeatures.TAGS, VisualFeatures.DENSE_CAPTIONS, and VisualFeatures.SMART_CROPS):

  • VisualFeatures.Tags: Identifies tags about the image, including objects, scenery, setting, and actions
  • VisualFeatures.Objects: Returns the bounding box for each detected object
  • VisualFeatures.Caption: Generates a caption of the image in natural language
  • VisualFeatures.DenseCaptions: Generates more detailed captions for the objects detected
  • VisualFeatures.People: Returns the bounding box for detected people
  • VisualFeatures.SmartCrops: Returns the bounding box of the specified aspect ratio for the area of interest
  • VisualFeatures.Read: Extracts readable text

The visual features you request determine what information the response contains. Most results include either a bounding box (when a location in the image is meaningful, such as for objects or people) or a confidence score (for features such as tags or captions).
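
For example, in Python, tags come back with confidence scores while detected objects come back with bounding boxes. A sketch assuming VisualFeatures.TAGS and VisualFeatures.OBJECTS were included in the analyze call:

# Tags: classification-style results with a confidence score but no location
if result.tags is not None:
    for tag in result.tags.list:
        print(f"Tag: '{tag.name}' (confidence: {tag.confidence:.4f})")

# Objects: each detection carries a bounding box locating it in the image
if result.objects is not None:
    for detected in result.objects.list:
        box = detected.bounding_box
        print(f"Object: '{detected.tags[0].name}' at "
              f"x={box.x}, y={box.y}, w={box.width}, h={box.height}")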

The JSON response for image analysis looks similar to this example, depending on your requested features:

{
  "apim-request-id": "abcde-1234-5678-9012-f1g2h3i4j5k6",
  "modelVersion": "<version>",
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a house in the woods",
        "confidence": 0.7055229544639587,
        "boundingBox": {
          "x": 0,
          "y": 0,
          "w": 640,
          "h": 640
        }
      },
      {
        "text": "a trailer with a door and windows",
        "confidence": 0.6675070524215698,
        "boundingBox": {
          "x": 214,
          "y": 434,
          "w": 154,
          "h": 108
        }
      }
    ]
  },
  "metadata": {
    "width": 640,
    "height": 640
  }
}
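
If you call the REST method directly instead of using an SDK, you parse this JSON yourself. A minimal sketch that walks the dense captions in the example above, assuming the response body has been parsed into a Python dict named analysis:

# Each dense caption has text, a confidence score, and a bounding box
for item in analysis.get("denseCaptionsResult", {}).get("values", []):
    box = item["boundingBox"]
    print(f"'{item['text']}' (confidence: {item['confidence']:.4f}) "
          f"at x={box['x']}, y={box['y']}, w={box['w']}, h={box['h']}")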