Observed people tracking & matched faces - transparency note

Important

Access to the face identification, customization, and celebrity recognition features is limited based on eligibility and usage criteria in order to support our Responsible AI principles. These features are only available to Microsoft managed customers and partners. Use the Face Recognition intake form to apply for access.

Observed people tracking and matched faces are Azure Video Indexer AI features that automatically detect and match people in media files. These features can be set to display insights on people, their clothing, and the exact timeframe of their appearance.

The resulting insights are displayed in a categorized list in the Insights tab that includes a thumbnail of each person and their ID. Clicking the thumbnail of a person displays the matched person (the corresponding face in the People insight). The insights are also generated in a categorized list in a JSON file that includes the thumbnail ID of the person, the percentage of time they appear in the file, a Wiki link (if they are a celebrity), and the confidence level.

Prerequisites

Review transparency note overview

General principles

This transparency note discusses observed people tracking and matched faces and the key considerations for making use of this technology responsibly. There are a number of things you need to consider when deciding how to use and implement an AI-powered feature:

  • Will this feature perform well in my scenario? Before deploying observed people tracking and matched faces into your scenario, test how it performs using real-life data and make sure it can deliver the accuracy you need.
  • Are we equipped to identify and respond to errors? AI-powered products and features will not be 100% accurate, so consider how you will identify and respond to any errors that may occur.

View the insight

When uploading the media file, go to Video + Audio Indexing and select Advanced.

To display observed people tracking and matched faces insight on the website, do the following:

  1. After the file has been indexed, go to Insights and then scroll to observed people.

To see the insights in a JSON file, do the following:

  1. Click Download and then Insights (JSON).

  2. Copy the observedPeople text and paste it into your JSON viewer.

    The following section shows observed people and their clothing. For the person with id 4 ("id": 4), there is also a matching face.

    "observedPeople": [
        {
     "id": 1,
     "thumbnailId": "4addcebf-6c51-42cd-b8e0-aedefc9d8f6b",
     "clothing": [
     	{
     		"id": 1,
     		"type": "sleeve",
     		"properties": {
     			"length": "long"
     		}
     	},
     	{
     		"id": 2,
     		"type": "pants",
     		"properties": {
     			"length": "long"
     		}
     	}
     ],
     "instances": [
     	{
     		"adjustedStart": "0:00:00.0667333",
     		"adjustedEnd": "0:00:12.012",
     		"start": "0:00:00.0667333",
     		"end": "0:00:12.012"
     	}
     ]
    },
    {
     "id": 2,
     "thumbnailId": "858903a7-254a-438e-92fd-69f8bdb2ac88",
     "clothing": [
     	{
     		"id": 1,
     		"type": "sleeve",
     		"properties": {
     			"length": "short"
     		}
     	}
     ],
     "instances": [
     	{
     		"adjustedStart": "0:00:23.2565666",
     		"adjustedEnd": "0:00:25.4921333",
     		"start": "0:00:23.2565666",
     		"end": "0:00:25.4921333"
     	},
     	{
     		"adjustedStart": "0:00:25.8925333",
     		"adjustedEnd": "0:00:25.9926333",
     		"start": "0:00:25.8925333",
     		"end": "0:00:25.9926333"
     	},
     	{
     		"adjustedStart": "0:00:26.3930333",
     		"adjustedEnd": "0:00:28.5618666",
     		"start": "0:00:26.3930333",
     		"end": "0:00:28.5618666"
     	}
     ]
    },
    {
     "id": 3,
     "thumbnailId": "1406252d-e7f5-43dc-852d-853f652b39b6",
     "clothing": [
     	{
     		"id": 1,
     		"type": "sleeve",
     		"properties": {
     			"length": "short"
     		}
     	},
     	{
     		"id": 2,
     		"type": "pants",
     		"properties": {
     			"length": "long"
     		}
     	},
     	{
     		"id": 3,
     		"type": "skirtAndDress"
     	}
     ],
     "instances": [
     	{
     		"adjustedStart": "0:00:31.9652666",
     		"adjustedEnd": "0:00:34.4010333",
     		"start": "0:00:31.9652666",
     		"end": "0:00:34.4010333"
     	}
     ]
    },
    {
     "id": 4,
     "thumbnailId": "d09ad62e-e0a4-42e5-8ca9-9a640c686596",
     "clothing": [
     	{
     		"id": 1,
     		"type": "sleeve",
     		"properties": {
     			"length": "short"
     		}
     	},
     	{
     		"id": 2,
     		"type": "pants",
     		"properties": {
     			"length": "short"
     		}
     	}
     ],
     "matchingFace": {
     	"id": 1310,
     	"confidence": 0.3819
     },
     "instances": [
     	{
     		"adjustedStart": "0:00:34.8681666",
     		"adjustedEnd": "0:00:36.0026333",
     		"start": "0:00:34.8681666",
     		"end": "0:00:36.0026333"
     	},
     	{
     		"adjustedStart": "0:00:36.6699666",
     		"adjustedEnd": "0:00:36.7367",
     		"start": "0:00:36.6699666",
     		"end": "0:00:36.7367"
     	},
     	{
     		"adjustedStart": "0:00:37.2038333",
     		"adjustedEnd": "0:00:39.6729666",
     		"start": "0:00:37.2038333",
     		"end": "0:00:39.6729666"
     	}
     ]
     },    
    
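
    If you prefer to inspect the output programmatically rather than in a JSON viewer, the following Python sketch lists each observed person, their clothing, on-screen time, and matched face (if any). It assumes the downloaded insights file is saved as insights.json and that the observedPeople array sits under videos[0].insights, which is where it typically appears in the downloaded index; the file name and lookup path are illustrative, so adjust them if your file is structured differently.

    import json

    # Load the downloaded insights file (the file name is illustrative).
    with open("insights.json", encoding="utf-8") as f:
        index = json.load(f)

    # The observedPeople array typically sits under videos[0].insights in the
    # downloaded index; adjust this lookup if your JSON is structured differently.
    observed_people = index["videos"][0]["insights"].get("observedPeople", [])

    def to_seconds(timestamp: str) -> float:
        """Convert an "H:MM:SS.fffffff" timestamp into seconds."""
        hours, minutes, seconds = timestamp.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

    for person in observed_people:
        on_screen = sum(
            to_seconds(i["end"]) - to_seconds(i["start"]) for i in person["instances"]
        )
        clothing = ", ".join(c["type"] for c in person.get("clothing", []))
        face = person.get("matchingFace")
        matched = f"face {face['id']} ({face['confidence']:.2f})" if face else "no matched face"
        print(f"Person {person['id']}: {on_screen:.1f}s on screen; clothing: {clothing}; {matched}")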

To download the JSON file via the API, use the Azure Video Indexer developer portal.
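
As a rough sketch of the API route (as opposed to the developer portal UI), the following Python example calls the Get Video Index request, whose response contains the observedPeople section shown above. The location, account ID, video ID, and access token values are placeholders you obtain from your own account; confirm the exact URL and parameters against the developer portal before relying on them.

    import requests

    # Placeholder values; obtain real ones from your Azure Video Indexer account
    # and the developer portal.
    LOCATION = "trial"
    ACCOUNT_ID = "<your-account-id>"
    VIDEO_ID = "<your-video-id>"
    ACCESS_TOKEN = "<your-access-token>"

    # Get Video Index request; the observed people insight is part of the returned index.
    url = (
        f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
        f"/Videos/{VIDEO_ID}/Index"
    )
    response = requests.get(url, params={"accessToken": ACCESS_TOKEN}, timeout=30)
    response.raise_for_status()

    # Save the full index so it can be inspected or parsed as shown earlier.
    with open("insights.json", "w", encoding="utf-8") as f:
        f.write(response.text)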

Observed people tracking and matched faces components

During the observed people tracking and matched faces procedure, images in a media file are processed, as follows:

  • Source file: The user uploads the source file for indexing.
  • Detection: The media file is tracked to detect observed people and their clothing, for example, a shirt with long sleeves, a dress, or long pants. Note that to be detected, the full upper body of the person must appear in the media.
  • Local grouping: The identified observed faces are filtered into local groups. If a person is detected more than once, additional observed face instances are created for this person.
  • Matching and classification: The observed people instances are matched to faces. If there is a known celebrity, the observed person is given their name. Any number of observed people instances can be matched to the same face.
  • Confidence value: The estimated confidence level of each observed person is calculated as a value between 0 and 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a score of 0.82.
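
As a minimal sketch of how the confidence value might be used downstream, the following assumes the observed_people list from the earlier parsing example; the 0.8 threshold and the helper name are illustrative choices for your scenario, not part of the service.

    # Keep only matched faces whose confidence clears a scenario-specific threshold.
    CONFIDENCE_THRESHOLD = 0.8  # illustrative; tune against your own test data

    def reliable_matches(observed_people, threshold=CONFIDENCE_THRESHOLD):
        """Yield (person id, face id, confidence) for matches at or above the threshold."""
        for person in observed_people:
            match = person.get("matchingFace")
            if match and match["confidence"] >= threshold:
                yield person["id"], match["id"], match["confidence"]

    # With the example above, the match with confidence 0.3819 falls below the
    # threshold and would be excluded, leaving it for human review instead.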

Example use cases

  • Tracking a person’s movement, for example, in law enforcement, to analyze an accident or crime more efficiently.
  • Searching organizational archives for matched people to get insights on specific celebrities more efficiently, for example, when creating promos and trailers.
  • Creating feature stories more efficiently, for example, by searching the archives of a football game at a news or sports agency for people wearing a red shirt.

Considerations and limitations when choosing a use case

Below are some considerations to keep in mind when using observed people and matched faces.

  • When uploading a file, always use high-quality video content. The recommended maximum frame size is HD, and the recommended frame rate is 30 FPS. A frame should contain no more than 10 people. When outputting frames from videos to AI models, only send around 2 or 3 frames per second (see the sampling sketch after this list); processing 10 or more frames might delay the AI result. People and faces in videos recorded by cameras that are high-mounted, down-angled, or have a wide field of view (FOV) may have fewer pixels, which may result in lower accuracy of the generated insights.
  • Typically, small people or objects under 200 pixels and people who are seated may not be detected. People wearing similar clothes or uniforms might be detected as the same person and given the same ID number. People or objects that are obstructed may not be detected. Tracks of people with front and back poses may be split into different instances.
  • An observed person must first be detected and appear in the people category before they are matched. Tracks are optimized to handle observed people who frequently appear in the front of the frame. Obstructions, such as overlapping people or faces, can cause mismatches between matched people and observed people. Mismatching may occur when different people appear in the same relative spatial position in the frame within a short period.
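
As a rough illustration of the 2 or 3 frames-per-second guidance in the first bullet, the following sketch samples frames from a local video with OpenCV before sending them on for further processing. The library choice, target rate, and file path are assumptions for this example, not requirements of Azure Video Indexer.

    import cv2  # OpenCV is an assumption for this sketch, not required by Video Indexer

    TARGET_FPS = 3  # roughly the 2 or 3 frames per second suggested above

    def sample_frames(video_path: str, target_fps: float = TARGET_FPS):
        """Yield frames from the video at roughly target_fps, skipping the rest."""
        capture = cv2.VideoCapture(video_path)
        source_fps = capture.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
        step = max(1, round(source_fps / target_fps))
        index = 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % step == 0:
                yield frame
            index += 1
        capture.release()

    # Usage (path is illustrative):
    # for frame in sample_frames("football-match.mp4"):
    #     ...  # pass the frame to your model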

When used responsibly and carefully, Azure Video Indexer is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:

  • Always respect an individual’s right to privacy, and only ingest videos for lawful and justifiable purposes.
  • Do not purposely disclose inappropriate media showing young children, family members of celebrities, or other content that may be detrimental or pose a threat to an individual’s personal freedom.
  • Commit to respecting and promoting human rights in the design and deployment of your analyzed media.
  • When using third-party materials, be aware of any existing copyrights and of any permissions required before distributing content derived from them.
  • Always seek legal advice when using media from unknown sources.
  • Always obtain appropriate legal and professional advice to ensure that your uploaded videos are secured and have adequate controls to preserve the integrity of your content and to prevent unauthorized access.
  • Provide a feedback channel that allows users and individuals to report issues with the service.
  • Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing media containing people.
  • Keep a human in the loop. Do not use any solution as a replacement for human oversight and decision-making.
  • Fully examine and review the potential of any AI model you are using to understand its capabilities and limitations.

Next steps

Learn more about Responsible AI

Contact us

visupport@microsoft.com

Azure Video Indexer transparency notes