November 2016

Volume 31 Number 11

[Modern Apps]

Add Facial Recognition Features to Your App

By Frank La Vigne

At Build 2016, Microsoft announced the release of the Cognitive Services API. Among the many APIs available are several computer vision services. These services can analyze the age and gender of faces in an input image. There's even an API for detecting individuals' emotions based on their facial expressions. To highlight the technology, numerous kiosks throughout the event space demonstrated various uses of it. The Cognitive Services API leverages Microsoft's experience and efforts in machine learning: thousands of labeled images were fed through a neural network. Best of all, you can leverage these services without any knowledge of machine learning or artificial intelligence; you simply call a Web service from your app. You can watch an interview with one of the team members involved with the project to learn more about the process at bit.ly/1TGi1QK.
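
The rest of this column focuses on the local face detection APIs, but to give a sense of how simple the cloud side is, here's a minimal sketch of calling the Face API's detect endpoint with HttpClient. This sketch isn't part of the original walkthrough; the endpoint URL, region and query string are assumptions, so substitute the values shown in your own Cognitive Services subscription:

// A hedged sketch of calling the Cognitive Services Face API over HTTP.
// Requires: using System.Net.Http; using System.Net.Http.Headers;
//           using System.Threading.Tasks;
private static async Task<string> DetectFacesInCloudAsync(
  byte[] imageBytes, string subscriptionKey)
{
  using (var client = new HttpClient())
  using (var content = new ByteArrayContent(imageBytes))
  {
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    // Hypothetical endpoint; the region and attribute list depend on your subscription.
    var uri = "https://westus.api.cognitive.microsoft.com/face/v1.0/detect" +
              "?returnFaceAttributes=age,gender";
    var response = await client.PostAsync(uri, content);
    return await response.Content.ReadAsStringAsync(); // JSON array of detected faces
  }
}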

You can also add basic facial detection to your app without calling the Cognitive Services APIs at all. The Windows.Media.FaceAnalysis namespace contains functionality to detect faces in images or video. The feature set is basic and lacks the rich data set of Cognitive Services; in fact, it's very similar to the facial detection found in many digital cameras. While the features are basic, they have two distinct advantages: they work offline and, because you're not calling an API, you won't incur any charges. As an optimization strategy, your app can detect the presence of a face locally before calling the Cognitive Services API, so the app never sends images without faces to the service. That can amount to significant cost savings for you and reduced bandwidth usage for your users. Detecting faces locally can be a useful augmentation to intelligent cloud services like Cognitive Services.
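
As a concrete illustration of that optimization, the following sketch uses the UWP FaceDetector class to check a still image for faces before deciding whether to upload it. The helper name and the pixel-format choice are assumptions for illustration, not part of the sample built later in this column:

// A minimal sketch, assuming a SoftwareBitmap you've already captured or loaded.
// Requires: using System.Linq; using System.Threading.Tasks;
//           using Windows.Graphics.Imaging; using Windows.Media.FaceAnalysis;
private static async Task<bool> ContainsFaceAsync(SoftwareBitmap bitmap)
{
  var detector = await FaceDetector.CreateAsync();
  // FaceDetector only accepts certain pixel formats (typically Gray8 or Nv12),
  // so convert the bitmap to a supported format before detection.
  var supportedFormat = FaceDetector.GetSupportedBitmapPixelFormats().First();
  using (var converted = SoftwareBitmap.Convert(bitmap, supportedFormat))
  {
    var faces = await detector.DetectFacesAsync(converted);
    return faces.Count > 0; // Only call the Cognitive Services API when this is true
  }
}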

Setting Up the Project

In Visual Studio 2015, create a new Universal Windows Platform (UWP) app project, choose the Blank template, and name it FaceDetection. Because the app will use the webcam, you must add that capability to the app. In Solution Explorer, double-click on the Package.appxmanifest file. In the Capabilities tab, check the checkboxes next to Microphone and Webcam, as shown in Figure 1. Save the file.

Figure 1 Adding Webcam and Microphone Capabilities to the App

Now, add the following XAML to the MainPage.xaml file to create the UI:

<Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
  <Grid.RowDefinitions>
    <RowDefinition Height="320*"/>
    <RowDefinition Height="389*"/>
  </Grid.RowDefinitions>
  <CaptureElement Name="cePreview" Stretch="Uniform" Grid.Row="0" />
  <Canvas x:Name="cvsFaceOverlay" Grid.Row="0" ></Canvas>
  <StackPanel Grid.Row="1" HorizontalAlignment="Center" Margin="5">
    <Button x:Name="btnCamera" Click="btnCamera_Click" >Turn on Camera</Button>
    <Button x:Name="btnDetectFaces" Click="btnDetectFaces_Click" >Detect
      Faces</Button>
  </StackPanel>
</Grid>

You might not be familiar with the CaptureElement control. It renders a stream from an attached capture device, usually a camera or webcam. In the code-behind, you'll use the MediaCapture API to connect it to a stream from the webcam.

Previewing the Video from the Camera

In the MainPage.xaml.cs file, add the following namespaces (the drawing code later in this article also uses types from Windows.UI, Windows.UI.Core and Windows.UI.Xaml.Shapes, so add those, too, if they're not already present):

using Windows.Media.Capture;
using Windows.Media.Core;
using Windows.Media.FaceAnalysis;
using Windows.Media.MediaProperties;

Next, add the following three members to the MainPage class:

private FaceDetectionEffect _faceDetectionEffect;
private MediaCapture _mediaCapture;
private IMediaEncodingProperties _previewProperties;

Now, add the following event handler for the Turn on Camera button:

private async void btnCamera_Click(object sender, RoutedEventArgs e)
{
  _mediaCapture = new MediaCapture();
  await _mediaCapture.InitializeAsync();
  cePreview.Source = _mediaCapture;
  await _mediaCapture.StartPreviewAsync();
  // Store the preview stream's format; it's needed later to map
  // face coordinates from the stream to the UI.
  _previewProperties = _mediaCapture.VideoDeviceController
    .GetMediaStreamProperties(MediaStreamType.VideoPreview);
}

Run the project and then click the Turn on Camera button. You should now see the output of your webcam in the app. If no webcam is attached to your system, an exception will be thrown.
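
If you want the app to fail more gracefully, one option (an addition to the article's listing, not part of it) is to wrap the initialization in a try/catch; MediaCapture.InitializeAsync throws UnauthorizedAccessException when the user has blocked camera access in the privacy settings:

// Hedged variant of btnCamera_Click with basic error handling.
// Requires: using Windows.UI.Popups;
private async void btnCamera_Click(object sender, RoutedEventArgs e)
{
  try
  {
    _mediaCapture = new MediaCapture();
    await _mediaCapture.InitializeAsync();
    cePreview.Source = _mediaCapture;
    await _mediaCapture.StartPreviewAsync();
    _previewProperties = _mediaCapture.VideoDeviceController
      .GetMediaStreamProperties(MediaStreamType.VideoPreview);
  }
  catch (UnauthorizedAccessException)
  {
    // The user has denied camera access in Settings | Privacy | Camera.
    await new MessageDialog("This app needs access to the camera.").ShowAsync();
  }
  catch (Exception ex)
  {
    // No webcam attached, or the device is in use by another application.
    await new MessageDialog("Unable to start the camera: " + ex.Message).ShowAsync();
  }
}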

Tracking Faces

With the CaptureElement control successfully streaming video from the webcam, it's time to start tracking faces. Tracking faces requires creating a FaceDetectionEffectDefinition object, setting some properties on it, and then connecting it to the _mediaCapture object that streams video to the CaptureElement you created.

Inside the event handler for the Detect Faces button, add the following code:

private async void btnDetectFaces_Click(object sender, RoutedEventArgs e)
{
  var faceDetectionDefinition = new FaceDetectionEffectDefinition();
  faceDetectionDefinition.DetectionMode = FaceDetectionMode.HighPerformance;
  faceDetectionDefinition.SynchronousDetectionEnabled = false;
  _faceDetectionEffect = (FaceDetectionEffect)await
    _mediaCapture.AddVideoEffectAsync(faceDetectionDefinition,
      MediaStreamType.VideoPreview);
  _faceDetectionEffect.FaceDetected += FaceDetectionEffect_FaceDetected;
  _faceDetectionEffect.DesiredDetectionInterval = TimeSpan.FromMilliseconds(33);
  _faceDetectionEffect.Enabled = true;
}

This code creates a FaceDetectionEffectDefinition object that's optimized for performance, which you can see in the line where DetectionMode is set to HighPerformance. The FaceDetectionMode enumeration has three members: HighPerformance prioritizes speed over accuracy, HighQuality prioritizes accuracy over speed, and Balanced strikes a compromise between the two. Setting SynchronousDetectionEnabled to false tells the effect not to delay incoming video frames while the face detection algorithm runs, which keeps the preview video running smoothly.

Next, the FaceDetectionEffectDefinition is added to the MediaCapture object, along with an enumeration value specifying the type of media stream. The call returns a FaceDetectionEffect object, which has a FaceDetected event that fires when a face is detected, a DesiredDetectionInterval property that sets how frequently face detection runs, and an Enabled property that turns face detection on or off.

Drawing Rectangles Around Faces

Now that the FaceDetectionEffect has been added to the MediaCapture object and enabled, it's time to add code to the FaceDetected event handler:

private async void FaceDetectionEffect_FaceDetected(
  FaceDetectionEffect sender, FaceDetectedEventArgs args)
{
  var detectedFaces = args.ResultFrame.DetectedFaces;
  await Dispatcher
    .RunAsync(CoreDispatcherPriority.Normal, 
      () => DrawFaceBoxes(detectedFaces));
}

Because this event fires on a non-UI thread, you must use the Dispatcher to marshal the work back to the UI thread. The DrawFaceBoxes method then iterates through the IReadOnlyList of detected faces. Each detected face has a bounding box describing where in the frame the face was found. Based on that data, you create a new Rectangle object and add it to the face overlay canvas, as shown in Figure 2.

Figure 2 Adding a Rectangle Object to the Faces Overlay Canvas

private void DrawFaceBoxes(IReadOnlyList<DetectedFace> detectedFaces)
{
  cvsFaceOverlay.Children.Clear();
  for (int i = 0; i < detectedFaces.Count; i++)
  {
    var face = detectedFaces[i];
    var faceBounds = face.FaceBox;
    Rectangle faceHighlightRectangle = new Rectangle()
    {
      Height = faceBounds.Height,
      Width = faceBounds.Width
    };
    Canvas.SetLeft(faceHighlightRectangle, faceBounds.X);
    Canvas.SetTop(faceHighlightRectangle, faceBounds.Y);
    faceHighlightRectangle.StrokeThickness = 2;
    faceHighlightRectangle.Stroke = new SolidColorBrush(Colors.Red);
    cvsFaceOverlay.Children.Add(faceHighlightRectangle);
  }
}

Run the solution now and you’ll notice something is “off” about the rectangles, as shown in Figure 3.

Figure 3 Face Detected but Location in the UI Isn’t Right

Finding the Correct Offset

The reason the rectangles are off is that the face detection algorithm reports coordinates on the media stream's own pixel grid, which starts at the top left of the stream, not at the top left of the preview rendered in the UI. You must also consider that the resolution of the video feed from the camera may differ from the resolution of the preview in the UI. To place each rectangle correctly, you must take both the position and the scale differences between the UI and the video stream into account. To accomplish this, you'll add two functions that do the work: MapRectangleToDetectedFace and LocatePreviewStreamCoordinates.

The first step is to retrieve information about the preview stream. You do this by casting the class-level _previewProperties member (set when the camera was started) to a VideoEncodingProperties object. VideoEncodingProperties describes the format of a video stream; primarily, you want to know the stream's height and width. With that information, you can determine the aspect ratio of the media stream and whether it differs from that of the CaptureElement control.

The LocatePreviewStreamCoordinates method compares the media stream's dimensions to those of the CaptureElement control. Depending on the two aspect ratios, one of three cases is possible: If the aspect ratios match, no adjustment is needed. If the CaptureElement's aspect ratio is greater than the media stream's, letterbox bars appear to the left and right of the video, and the face rectangle's X coordinate must be adjusted. If the media stream's aspect ratio is greater than the CaptureElement's, letterbox bars appear above and below the video, and the face rectangle's Y coordinate must be adjusted instead.

With the placement of the letterbox bars taken into account, you now must determine the difference in scale between the media stream and the CaptureElement control in the UI. For each face Rectangle there are four values to set: top, left, width and height. Top and left are set through the Canvas.Top and Canvas.Left attached dependency properties. For a good overview of dependency properties, read the article at bit.ly/2bqvsVY.

Add the code in Figure 4 and run the solution again; the placement of the face highlight rectangle should be much more accurate, as seen in Figure 5. (The sketch after the listing shows one way to wire DrawFaceBoxes up to MapRectangleToDetectedFace.)

Figure 4 Code to Calculate the Correct Offset

private Rectangle MapRectangleToDetectedFace(BitmapBounds detectedfaceBoxCoordinates)
{
  var faceRectangle = new Rectangle();
  var previewStreamProperties = _previewProperties as VideoEncodingProperties;
  double mediaStreamWidth = previewStreamProperties.Width;
  double mediaStreamHeight = previewStreamProperties.Height;
  // The region of the CaptureElement actually occupied by the video,
  // including any letterbox offset.
  var faceHighlightRect = LocatePreviewStreamCoordinates(previewStreamProperties,
    this.cePreview);
  faceRectangle.Width = (detectedfaceBoxCoordinates.Width / mediaStreamWidth) *
    faceHighlightRect.Width;
  faceRectangle.Height = (detectedfaceBoxCoordinates.Height / mediaStreamHeight) *
    faceHighlightRect.Height;
  // Scale the stream coordinates to the video region and add the letterbox offset.
  var x = faceHighlightRect.X + (detectedfaceBoxCoordinates.X / mediaStreamWidth) *
    faceHighlightRect.Width;
  var y = faceHighlightRect.Y + (detectedfaceBoxCoordinates.Y / mediaStreamHeight) *
    faceHighlightRect.Height;
  Canvas.SetLeft(faceRectangle, x);
  Canvas.SetTop(faceRectangle, y);
  return faceRectangle;
}
public Rect LocatePreviewStreamCoordinates(
  VideoEncodingProperties previewResolution,
  CaptureElement previewControl)
{
  var uiRectangle = new Rect();
  double mediaStreamWidth = previewResolution.Width;
  double mediaStreamHeight = previewResolution.Height;
  uiRectangle.Width = previewControl.ActualWidth;
  uiRectangle.Height = previewControl.ActualHeight;
  var uiRatio = previewControl.ActualWidth / previewControl.ActualHeight;
  var mediaStreamRatio = mediaStreamWidth / mediaStreamHeight;
  if (uiRatio > mediaStreamRatio)
  {
    // The control is wider than the video: bars appear on the left and right.
    var scaleFactor = previewControl.ActualHeight / mediaStreamHeight;
    var scaledWidth = mediaStreamWidth * scaleFactor;
    uiRectangle.X = (previewControl.ActualWidth - scaledWidth) / 2.0;
    uiRectangle.Width = scaledWidth;
  }
  else
  {
    // The video is wider than the control: bars appear above and below.
    var scaleFactor = previewControl.ActualWidth / mediaStreamWidth;
    var scaledHeight = mediaStreamHeight * scaleFactor;
    uiRectangle.Y = (previewControl.ActualHeight - scaledHeight) / 2.0;
    uiRectangle.Height = scaledHeight;
  }
  return uiRectangle;
}

Figure 5 More Accurate Placement of Face Highlight Rectangle

Stopping Face Detection

Face detection consumes processing power and, on battery-powered mobile devices, can significantly reduce battery life. Once face detection is on, you might want to give users the option to turn it off.

Fortunately, turning face detection off is fairly straightforward. First, add a button to the StackPanel in the MainPage.xaml file:

<Button x:Name="btnStopDetection" Click="btnStopDetection_Click">Stop
  Detecting Faces</Button>

Now, add the following code to the event handler for the Stop Detecting Faces button:

private async void btnStopDetection_Click(object sender, RoutedEventArgs e)
{
  _faceDetectionEffect.Enabled = false;
  _faceDetectionEffect.FaceDetected -= FaceDetectionEffect_FaceDetected;
  await _mediaCapture.ClearEffectsAsync(MediaStreamType.VideoPreview);
  _faceDetectionEffect = null;
}

The code essentially undoes the setup process: the face detection effect is disabled, the event handler is unsubscribed from the FaceDetected event and the effect is cleared from the media capture object. Finally, _faceDetectionEffect is set to null so the object can be garbage collected.

Run the project now. Click the Turn on Camera button, then Detect Faces and, finally, Stop Detecting Faces. You should notice that even though the app is no longer detecting faces, a rectangle remains at the last location where a face was detected. Let's fix that.

Stop the app, go back to the btnStopDetection_Click event handler and add the following line of code to clear the contents of the cvsFaceOverlay canvas:

this.cvsFaceOverlay.Children.Clear();

Run the solution again and repeat all the steps. Now, when face detection is turned off, there are no rectangle highlights.

Wrapping Up

The Cognitive Services APIs provide easy access to powerful computer vision algorithms. Like all cloud services, however, they require Internet access to work. For use cases that must work offline, you can still perform basic facial detection by using the face detection APIs built right into the UWP.

Offline use cases can include placing a webcam attached to a Raspberry Pi 2 running Windows IoT Core in a remote location. The app would then save images locally when it detected a face. When data from the device was collected, the images could then be uploaded to Cognitive Services for advanced analysis.
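
A hedged sketch of that scenario follows: when the FaceDetected event fires, the app writes a JPEG of the current frame to local storage for later upload. The throttling interval and file naming are assumptions for illustration; _mediaCapture is the same field used earlier in this column.

// A minimal sketch of the offline capture scenario described above.
// Requires: using Windows.Media.MediaProperties; using Windows.Storage;
private DateTimeOffset _lastCapture = DateTimeOffset.MinValue;

private async void FaceDetected_SaveSnapshot(
  FaceDetectionEffect sender, FaceDetectedEventArgs args)
{
  if (args.ResultFrame.DetectedFaces.Count == 0)
    return;
  // Don't write a file for every frame; capture at most one image per minute.
  if (DateTimeOffset.UtcNow - _lastCapture < TimeSpan.FromMinutes(1))
    return;
  _lastCapture = DateTimeOffset.UtcNow;
  var file = await ApplicationData.Current.LocalFolder.CreateFileAsync(
    "face-" + DateTime.UtcNow.ToString("yyyyMMdd-HHmmss") + ".jpg",
    CreationCollisionOption.GenerateUniqueName);
  await _mediaCapture.CapturePhotoToStorageFileAsync(
    ImageEncodingProperties.CreateJpeg(), file);
}

You'd subscribe this handler to _faceDetectionEffect.FaceDetected in the same way the article subscribes FaceDetectionEffect_FaceDetected.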

Performing face detection locally also optimizes bandwidth and cloud service usage by letting developers upload only images that contain faces. In short, local face detection can augment online scenarios and empower offline ones, as well.

For a deeper dive, be sure to look at the CameraFaceDetection sample in the UWP sample apps on GitHub (bit.ly/2b27gLk).


Frank La Vigne is a technology evangelist on the Microsoft Technology and Civic Engagement team, where he helps users leverage technology to create a better community. He blogs regularly at FranksWorld.com and has a YouTube channel called Frank’s World TV (youtube.com/FranksWorldTV).

Thanks to the following technical expert for reviewing this article: Rachel Appel


Discuss this article in the MSDN Magazine forum