Redact faces with Azure Media Analytics

Overview

Azure Media Redactor is an Azure Media Analytics media processor (MP) that offers scalable face redaction in the cloud. Face redaction enables you to modify your video in order to blur faces of selected individuals. You may want to use the face redaction service in public safety and news media scenarios. A few minutes of footage that contains multiple faces can take hours to redact manually, but with this service the face redaction process will require just a few simple steps. For more information, see this blog.

This article gives details about Azure Media Redactor and shows how to use it with the Media Services SDK for .NET.

Face redaction modes

Facial redaction works by detecting faces in every frame of video and tracking the face object both forwards and backwards in time, so that the same individual can be blurred from other angles as well. The automated redaction process is complex and does not always produce 100% of the desired output; for this reason, Media Analytics provides a couple of ways to modify the final output.

In addition to a fully automatic mode, there is a two-pass workflow that allows the selection/de-selection of found faces via a list of IDs. To make arbitrary per-frame adjustments possible, the MP uses a metadata file in JSON format. This workflow is split into Analyze and Redact modes. You can combine the two modes in a single pass that runs both tasks in one job; this mode is called Combined.

Combined mode

Combined mode produces a redacted MP4 automatically, without any manual input.

| Stage | File name | Notes |
| --- | --- | --- |
| Input asset | foo.bar | Video in WMV, MOV, or MP4 format |
| Input config | Job configuration preset | {'version':'1.0', 'options': {'mode':'combined'}} |
| Output asset | foo_redacted.mp4 | Video with blurring applied |

Input example:

view this video

Output example:

view this video

Analyze mode

The analyze pass of the two-pass workflow takes a video input and produces a JSON file of face locations, and jpg images of each detected face.

| Stage | File name | Notes |
| --- | --- | --- |
| Input asset | foo.bar | Video in WMV, MOV, or MP4 format |
| Input config | Job configuration preset | {'version':'1.0', 'options': {'mode':'analyze'}} |
| Output asset | foo_annotations.json | Annotation data of face locations in JSON format. This can be edited by the user to modify the blurring bounding boxes. See the sample below. |
| Output asset | foo_thumb%06d.jpg (for example, foo_thumb000001.jpg, foo_thumb000002.jpg) | A cropped JPG of each detected face, where the number indicates the labelId of the face |

Output example:

    {
      "version": 1,
      "timescale": 24000,
      "offset": 0,
      "framerate": 23.976,
      "width": 1280,
      "height": 720,
      "fragments": [
        {
          "start": 0,
          "duration": 48048,
          "interval": 1001,
          "events": [
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [],
            [
              {
                "index": 13,
                "id": 1138,
                "x": 0.29537,
                "y": -0.18987,
                "width": 0.36239,
                "height": 0.80335
              },
              {
                "index": 13,
                "id": 2028,
                "x": 0.60427,
                "y": 0.16098,
                "width": 0.26958,
                "height": 0.57943
              }
            ],

    … truncated

Redact mode

The Redact pass of the two-pass workflow takes several inputs that must be combined into a single asset: the original video, the annotations JSON from the Analyze pass, and, optionally, a list of face IDs to blur. This mode uses the annotations to apply blurring to the input video.

The output from the Analyze pass does not include the original video. The video must be uploaded into the input asset for the Redact mode task and selected as the primary file (see the sketch after the ID list example below).

| Stage | File name | Notes |
| --- | --- | --- |
| Input asset | foo.bar | Video in WMV, MOV, or MP4 format. Same video as in step 1. |
| Input asset | foo_annotations.json | Annotations metadata file from phase one, with optional modifications. |
| Input asset | foo_IDList.txt (optional) | Optional newline-separated list of face IDs to redact. If left blank, all faces are blurred. |
| Input config | Job configuration preset | {'version':'1.0', 'options': {'mode':'redact'}} |
| Output asset | foo_redacted.mp4 | Video with blurring applied based on annotations |

Example output

This is the output from an IDList with one ID selected.

view this video

Example foo_IDList.txt

 1
 2
 3
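
The Media Services .NET SDK shown later in this article can be used to assemble this multi-file input asset. The following is a minimal sketch, assuming the same usings and an initialized CloudMediaContext as in the full sample below; the local file paths and method name are placeholders:

    // Sketch only: build the Redact mode input asset with the Media Services .NET SDK.
    // File names follow the table above; local paths are placeholders.
    static IAsset CreateRedactModeInputAsset(CloudMediaContext context)
    {
        IAsset redactInputAsset = context.Assets.Create("My Redact Mode Input Asset",
            AssetCreationOptions.None);

        string[] inputFiles =
        {
            @"C:\supportFiles\FaceRedaction\foo.bar",              // original video from step 1
            @"C:\supportFiles\FaceRedaction\foo_annotations.json", // annotations from the Analyze pass
            @"C:\supportFiles\FaceRedaction\foo_IDList.txt"        // optional list of face IDs to blur
        };

        foreach (string path in inputFiles)
        {
            IAssetFile assetFile = redactInputAsset.AssetFiles.Create(Path.GetFileName(path));
            assetFile.Upload(path);

            // The video (not the JSON or the ID list) must be marked as the primary file.
            if (Path.GetFileName(path) == "foo.bar")
            {
                assetFile.IsPrimary = true;
                assetFile.Update();
            }
        }

        return redactInputAsset;
    }

The returned asset would then be added to task.InputAssets for a task that uses the redact preset from the table above, in place of the single-file asset created in the Combined mode sample.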

Blur types

In the Combined or Redact mode, there are five blur types that you can choose from via the JSON input configuration: Low, Med, High, Box, and Black. By default, Med is used.

You can find samples of the blur types below.

Example JSON:

    {'version':'1.0', 'options': {'Mode': 'Combined', 'BlurType': 'High'}}
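
The preset can also be passed to the task as an inline string instead of being read from a file. A minimal sketch, where job and processor are created exactly as in the .NET sample later in this article:

    // Sketch: pass the blur-type preset inline instead of reading config.json from disk.
    string configuration = @"{'version':'1.0', 'options': {'Mode': 'Combined', 'BlurType': 'High'}}";

    ITask task = job.Tasks.AddNew("My Face Redaction Task",
        processor,
        configuration,
        TaskOptions.None);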

Low

Med

High

Box

Black

Elements of the output JSON file

The Redaction MP provides high precision face location detection and tracking that can detect up to 64 human faces in a video frame. Frontal faces provide the best results, while side faces and small faces (less than or equal to 24x24 pixels) are challenging.

The job produces a JSON output file that contains metadata about detected and tracked faces. The metadata includes coordinates indicating the location of faces, as well as a face ID number indicating the tracking of that individual. Face ID numbers can reset when the frontal face is lost or overlapped in the frame, which can result in an individual being assigned multiple IDs.

The output JSON includes the following elements:

Root JSON elements

| Element | Description |
| --- | --- |
| version | This refers to the version of the Video API. |
| timescale | "Ticks" per second of the video. |
| offset | This is the time offset for timestamps. In version 1.0 of the Video APIs, this is always 0. In future scenarios that we support, this value may change. |
| width, height | The width and height of the output video frame, in pixels. |
| framerate | Frames per second of the video. |
| fragments | The metadata is chunked up into different segments called fragments. Each fragment contains a start, duration, interval number, and event(s). |

Fragments JSON elements

| Element | Description |
| --- | --- |
| start | The start time of the first event, in "ticks." |
| duration | The length of the fragment, in "ticks." |
| index | (Applies to Azure Media Redactor only) Defines the frame index of the current event. |
| interval | The interval of each event entry within the fragment, in "ticks." |
| events | Each event contains the faces detected and tracked within that time duration. It is an array of events. The outer array represents one interval of time. The inner array consists of 0 or more events that happened at that point in time. An empty bracket [] means no faces were detected. |
| id | The ID of the face that is being tracked. This number may inadvertently change if a face becomes undetected. A given individual should have the same ID throughout the overall video, but this cannot be guaranteed due to limitations in the detection algorithm (occlusion, and so on). |
| x, y | The upper-left X and Y coordinates of the face bounding box, on a normalized scale of 0.0 to 1.0. X and Y coordinates are always relative to landscape orientation, so if you have a portrait video (or an upside-down video, in the case of iOS), you have to transpose the coordinates accordingly. |
| width, height | The width and height of the face bounding box, on a normalized scale of 0.0 to 1.0. |
| facesDetected | This is found at the end of the JSON results and summarizes the number of faces that the algorithm detected during the video. Because the IDs can be reset inadvertently if a face becomes undetected (for example, the face goes off screen or looks away), this number may not always equal the true number of faces in the video. |
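
To work with these elements programmatically, for example to collect the distinct face IDs for foo_IDList.txt or to convert fragment "ticks" into seconds using timescale, you can parse the annotations file. The following console sketch is an illustration only; it assumes the Json.NET (Newtonsoft.Json) package, which is not referenced by the sample below, and uses placeholder file paths:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json.Linq;   // assumption: Json.NET is referenced by the project

    class AnnotationReader
    {
        static void Main()
        {
            // Placeholder path to the Analyze mode output described above.
            string json = File.ReadAllText(@"C:\supportFiles\FaceRedaction\foo_annotations.json");
            JObject annotations = JObject.Parse(json);

            double timescale = (double)annotations["timescale"];
            var faceIds = new SortedSet<int>();

            foreach (JObject fragment in annotations["fragments"])
            {
                long start = (long)fragment["start"];
                long interval = fragment["interval"] != null ? (long)fragment["interval"] : 0;
                JArray events = fragment["events"] as JArray ?? new JArray();

                for (int i = 0; i < events.Count; i++)
                {
                    // Each inner array holds the faces detected during one interval;
                    // an empty array means no faces were detected at that time.
                    foreach (JObject face in events[i])
                    {
                        faceIds.Add((int)face["id"]);
                        double seconds = (start + i * interval) / timescale; // "ticks" to seconds
                        Console.WriteLine("t={0:F2}s id={1} x={2} y={3}",
                            seconds, face["id"], face["x"], face["y"]);
                    }
                }
            }

            // The distinct IDs can be written, one per line, to foo_IDList.txt for the Redact pass.
            Console.WriteLine("Detected face IDs: " + string.Join(", ", faceIds));
        }
    }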

.NET sample code

The following program shows how to:

  1. Create an asset and upload a media file into the asset.
  2. Create a job with a face redaction task based on a configuration file that contains the following JSON preset:

            {
                'version':'1.0',
                'options': {
                    'mode':'combined'
                }
            }
    
  3. Download the output JSON files.

Create and configure a Visual Studio project

Set up your development environment and populate the app.config file with connection information, as described in Media Services development with .NET.
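
For reference, the appSettings keys that the sample reads correspond to entries like the following in app.config; the values are placeholders for your own Azure AD tenant, Media Services REST API endpoint, and service principal credentials:

    <configuration>
      <appSettings>
        <add key="AMSAADTenantDomain" value="tenant.onmicrosoft.com" />
        <add key="AMSRESTAPIEndpoint" value="https://accountname.restv2.region.media.azure.net/api/" />
        <add key="AMSClientId" value="your-client-id" />
        <add key="AMSClientSecret" value="your-client-secret" />
      </appSettings>
    </configuration>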

Example

using System;
using System.Configuration;
using System.IO;
using System.Linq;
using Microsoft.WindowsAzure.MediaServices.Client;
using System.Threading;
using System.Threading.Tasks;

namespace FaceRedaction
{
    class Program
    {
        // Read values from the App.config file.
        private static readonly string _AADTenantDomain =
            ConfigurationManager.AppSettings["AMSAADTenantDomain"];
        private static readonly string _RESTAPIEndpoint =
            ConfigurationManager.AppSettings["AMSRESTAPIEndpoint"];
        private static readonly string _AMSClientId =
            ConfigurationManager.AppSettings["AMSClientId"];
        private static readonly string _AMSClientSecret =
            ConfigurationManager.AppSettings["AMSClientSecret"];

        // Field for service context.
        private static CloudMediaContext _context = null;

        static void Main(string[] args)
        {
            AzureAdTokenCredentials tokenCredentials =
                new AzureAdTokenCredentials(_AADTenantDomain,
                    new AzureAdClientSymmetricKey(_AMSClientId, _AMSClientSecret),
                    AzureEnvironments.AzureCloudEnvironment);

            var tokenProvider = new AzureAdTokenProvider(tokenCredentials);

            _context = new CloudMediaContext(new Uri(_RESTAPIEndpoint), tokenProvider);

            // Run the FaceRedaction job.
            var asset = RunFaceRedactionJob(@"C:\supportFiles\FaceRedaction\SomeFootage.mp4",
                        @"C:\supportFiles\FaceRedaction\config.json");

            // Download the job output asset.
            DownloadAsset(asset, @"C:\supportFiles\FaceRedaction\Output");
        }

        static IAsset RunFaceRedactionJob(string inputMediaFilePath, string configurationFile)
        {
            // Create an asset and upload the input media file to storage.
            IAsset asset = CreateAssetAndUploadSingleFile(inputMediaFilePath,
            "My Face Redaction Input Asset",
            AssetCreationOptions.None);

            // Declare a new job.
            IJob job = _context.Jobs.Create("My Face Redaction Job");

            // Get a reference to Azure Media Redactor.
            string MediaProcessorName = "Azure Media Redactor";

            var processor = GetLatestMediaProcessorByName(MediaProcessorName);

            // Read configuration from the specified file.
            string configuration = File.ReadAllText(configurationFile);

            // Create a task with the encoding details, using a string preset.
            ITask task = job.Tasks.AddNew("My Face Redaction Task",
            processor,
            configuration,
            TaskOptions.None);

            // Specify the input asset.
            task.InputAssets.Add(asset);

            // Add an output asset to contain the results of the job.
            task.OutputAssets.AddNew("My Face Redaction Output Asset", AssetCreationOptions.None);

            // Use the following event handler to check job progress.  
            job.StateChanged += new EventHandler<JobStateChangedEventArgs>(StateChanged);

            // Launch the job.
            job.Submit();

            // Check job execution and wait for job to finish.
            Task progressJobTask = job.GetExecutionProgressTask(CancellationToken.None);

            progressJobTask.Wait();

            // If job state is Error, the event handling
            // method for job progress should log errors.  Here we check
            // for error state and exit if needed.
            if (job.State == JobState.Error)
            {
                ErrorDetail error = job.Tasks.First().ErrorDetails.First();
                Console.WriteLine(string.Format("Error: {0}. {1}",
                                error.Code,
                                error.Message));
                return null;
            }

            return job.OutputMediaAssets[0];
        }

        static IAsset CreateAssetAndUploadSingleFile(string filePath, string assetName, AssetCreationOptions options)
        {
            IAsset asset = _context.Assets.Create(assetName, options);

            var assetFile = asset.AssetFiles.Create(Path.GetFileName(filePath));
            assetFile.Upload(filePath);

            return asset;
        }

        static void DownloadAsset(IAsset asset, string outputDirectory)
        {
            foreach (IAssetFile file in asset.AssetFiles)
            {
                file.Download(Path.Combine(outputDirectory, file.Name));
            }
        }

        static IMediaProcessor GetLatestMediaProcessorByName(string mediaProcessorName)
        {
            var processor = _context.MediaProcessors
            .Where(p => p.Name == mediaProcessorName)
            .ToList()
            .OrderBy(p => new Version(p.Version))
            .LastOrDefault();

            if (processor == null)
                throw new ArgumentException(string.Format("Unknown media processor {0}",
                                       mediaProcessorName));

            return processor;
        }

        static private void StateChanged(object sender, JobStateChangedEventArgs e)
        {
            Console.WriteLine("Job state changed event:");
            Console.WriteLine("  Previous state: " + e.PreviousState);
            Console.WriteLine("  Current state: " + e.CurrentState);

            switch (e.CurrentState)
            {
                case JobState.Finished:
                    Console.WriteLine();
                    Console.WriteLine("Job is finished.");
                    Console.WriteLine();
                    break;
                case JobState.Canceling:
                case JobState.Queued:
                case JobState.Scheduled:
                case JobState.Processing:
                    Console.WriteLine("Please wait...\n");
                    break;
                case JobState.Canceled:
                case JobState.Error:
                    // Cast sender as a job.
                    IJob job = (IJob)sender;
                    // Display or log error details as needed.
                    // LogJobStop(job.Id);
                    break;
                default:
                    break;
            }
        }
    }
}

Next steps

Read about the Azure Media Services learning paths.

Provide feedback

Use the User Voice forum to provide feedback and make suggestions on how to improve Azure Media Services.

Related links

Azure Media Services Analytics Overview

Azure Media Analytics demos