November 2019

Volume 34 Number 11

[Artificially Intelligent]

Exploring Face Detection and Recognition

By Frank La Vigne | November 2019

Humans are uniquely adept at detecting and recognizing faces, and face recognition is among the first cognitive skills that humans develop. In fact, we’re so good at this that we often perceive faces where there are none—a phenomenon known as pareidolia. In engineering terms, face detection algorithms in the human brain are prone to false positives. It’s taken computer science a considerable amount of time to develop algorithms that can detect faces and accurately identify people based solely on their facial appearance, but now that it’s happened, it’s provoking an important ethical and legal debate.

This isn’t my first dance with face recognition. In my November 2016 Modern Apps column (msdn.com/magazine/mt788628), I used a function built into the Universal Windows Platform (UWP) to detect faces in images captured by a camera. Exploring how that built-in system actually detected faces was beyond the scope of the article at the time, but now I’ll explore in some detail how face detection and recognition work. Yes, the Cognitive Services Face API (bit.ly/2l0d5y1) can perform all these tasks for you, but I wanted to look more closely at the underlying mechanisms. Doing so gave me a greater appreciation of the Face API.

Detecting Faces

It’s important to distinguish between two terms that are often used interchangeably: face detection and face recognition. Face detection, as the name implies, is limited to detecting the presence of faces in an image. Face recognition involves discerning unique facial characteristics (such as the location and shape of the eyes, nose and mouth) to identify individuals based solely on their facial appearance.

Finding faces in images was a problem that eluded computer science until the early 2000s, when researchers Paul Viola and Michael Jones pioneered an algorithm that now bears their name. The Viola-Jones algorithm scans an image using a rectangular filter looking for contrasting patterns of light and dark. While prone to false positives, Viola-Jones is fast and ideal for low-power and battery-driven devices. For an in-depth explanation of the Viola-Jones algorithm, I recommend the following YouTube video: youtu.be/uEJ71VlUmMQ.
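To get a feel for Viola-Jones in practice, OpenCV ships pre-trained Haar cascade classifiers that implement the algorithm. The following is a minimal sketch of my own, not part of this article’s project; it assumes the opencv-python package is installed (pip install opencv-python) and that test.jpg is a hypothetical image in the working directory:

import cv2

# Load the pre-trained frontal-face Haar cascade that ships with OpenCV.
face_cascade = cv2.CascadeClassifier(
  cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Viola-Jones works on grayscale intensity values.
image = cv2.imread("test.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide the rectangular filters over the image at multiple scales.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print("{} face(s) found".format(len(faces)))

Raising minNeighbors makes the detector more conservative, which is one way to trade away some of those false positives.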

Another approach to detecting faces is the Histogram of Oriented Gradients (HOG), which analyzes each pixel in an image and its immediate neighbors, looking for changes from light to dark. This captures two pieces of data: the magnitude and the direction of the change. The end effect is to find the edges in an image and ignore smoother areas where there’s little difference between pixels. The next step is to break the image into smaller units, say 16 x 16 pixels, and then average out the magnitude and direction values. This reduces the data significantly and provides a bit of noise reduction. Noise, in this context, means any changes from light to dark that aren’t of interest to the algorithm. By taking an average over a set of pixels, we blur out minor variations in an image, such as subtle shadows, and are left with only the major differences that tend to denote the edges of objects.

For a more in-depth look at this algorithm, check out this tutorial: bit.ly/2mkjCne. While it focuses on using HOG in the context of the OpenCV library, the algorithm and the mathematics behind it are the same.
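If you want to experiment with HOG directly, scikit-image exposes a hog function with these same knobs. Here’s a brief sketch of my own (not part of this article’s project); it assumes the scikit-image and matplotlib packages are installed and that test.jpg is a hypothetical sample image. It uses the 16 x 16 cells described earlier and visualizes the resulting gradient map:

import matplotlib.pyplot as plt
from skimage import color, io
from skimage.feature import hog

# Load an image and convert it to grayscale intensities.
image = color.rgb2gray(io.imread("test.jpg"))

# Compute per-pixel gradient magnitude and direction, then average
# them over 16 x 16 pixel cells as described earlier.
features, hog_image = hog(image, orientations=9,
                          pixels_per_cell=(16, 16),
                          cells_per_block=(1, 1),
                          visualize=True)

# The visualization highlights edges and suppresses smooth regions.
plt.imshow(hog_image, cmap="gray")
plt.show()

Note that the visualize keyword requires a recent version of scikit-image (it was spelled visualise in older releases).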

Finding Face Landmarks

Once a face is detected, the next step is to determine the coordinates of common facial features in the image. A widely used convention, and the one followed by the library used later in this article, identifies 68 landmark points on the human face. These points trace the nose, the mouth, the eyebrows, the jaw and more. Figure 1 depicts highlighted facial landmark points.

Figure 1 The Author with a Face Bounding Box (Left) and Facial Landmark Points (Right)

This level of detail provides data for two useful purposes. First, the algorithm can determine the orientation of the face. In a two-dimensional image, the locations of the 68 landmark points shift with the roll, tilt and angle of the face. The algorithm’s ability to infer this information is how popular messaging apps can place virtual sunglasses over people’s eyes or apply makeup to a face and keep it in sync as the person moves. Second, this collection of facial landmark points can be used to help identify an individual. The distances between these points form a distinctive signature, and algorithms can be trained to recognize an individual with accuracy rivaling that of humans.
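To make the orientation point concrete, here’s a rough sketch of my own that estimates the roll of a face from the angle between its eye centers, the same basic trick that lets an app keep virtual sunglasses level. It uses the face_recognition library that’s installed in the next section, along with the frank.jpg image from the project:

import math
import numpy as np
import face_recognition

image = face_recognition.load_image_file("frank.jpg")
# face_landmarks returns one dict per face, keyed by feature name.
landmarks = face_recognition.face_landmarks(image)[0]

# Average each eye's points to get the center of the eye.
left_eye = np.mean(landmarks["left_eye"], axis=0)
right_eye = np.mean(landmarks["right_eye"], axis=0)

# The angle of the line between the eye centers approximates head roll.
dx, dy = right_eye - left_eye
roll = math.degrees(math.atan2(dy, dx))
print("Estimated roll: {:.1f} degrees".format(roll))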

Coding a Face Detection System

Let’s start by creating a Python 3 notebook on your preferred platform (I covered Jupyter Notebooks in a previous column at msdn.com/magazine/mt829269). Create an empty cell, enter the following code to enable inline display of images and execute the cell:

%matplotlib inline

There are multiple libraries that perform face detection and face recognition. For this article, I chose to work with the face_recognition library at pypi.org/project/face_recognition. Create a new cell and enter the following code to install it:

! pip install face_recognition

This process may take a few moments as the package downloads, compiles and installs. Enter the following code into a new cell and execute it to import the required libraries:

from matplotlib.pyplot import imshow
import numpy as np
import PIL.Image
import PIL.ImageDraw
import face_recognition

Next, enter the code in Figure 2 to create a function that takes an image file and runs it through the face_recognition library.

Figure 2 Code for the findFaces Function

def findFaces(imageName):
  # Load the image file into a NumPy array.
  image = face_recognition.load_image_file(imageName)
  # Detect faces; each location is a (top, right, bottom, left) tuple.
  face_locations = face_recognition.face_locations(image)
  number_of_faces = len(face_locations)
  print("{} face(s) in this image".format(number_of_faces))
  # Convert the array to a PIL image so we can draw on it.
  pil_image = PIL.Image.fromarray(image)
  draw = PIL.ImageDraw.Draw(pil_image)
  for face_location in face_locations:
    top, right, bottom, left = face_location
    print("Face coordinates: Top: {}, Left: {}, Bottom: {}, Right: {}".format(
      top, left, bottom, right))
    # Draw the bounding box shown on the left of Figure 1.
    draw.rectangle([left, top, right, bottom], outline="red")
  imshow(np.asarray(pil_image))

After executing that code, create a new cell and enter the following:

findFaces("frank.jpg")

The output should read:

1 face(s) in this image
Face coordinates: Top: 172, Left: 171, Bottom: 726, Right: 726

Feel free to test the algorithm with your own images. In the project on the Azure Notebook service, I have several images in the project directory to test with, including a crowd image with multiple faces.

Next, enter the following code to display all the face landmarks in an image:

faceImage = face_recognition.load_image_file("frank.jpg")
face_landmarks = face_recognition.face_landmarks(faceImage)
print(face_landmarks)

The result should display sets of coordinates labeled “left_eyebrow,” “nose_tip” and so on. Again, feel free to experiment with your own images.
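To see the landmarks instead of just reading their coordinates, the following sketch (my addition) overlays them on the image using the PIL.ImageDraw module imported earlier, producing something like the right half of Figure 1:

pil_image = PIL.Image.fromarray(faceImage)
draw = PIL.ImageDraw.Draw(pil_image)

# Each detected face is a dict mapping a feature name to a list of (x, y) points.
for landmarks in face_landmarks:
  for points in landmarks.values():
    draw.line(points, fill="red", width=2)

imshow(np.asarray(pil_image))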

As stated earlier, detecting faces and finding landmarks are the precursors to face recognition. Let’s now use the face_recognition library to compare two images—frank.jpg and frank2.jpg—to see if they’re the same person. Enter the following code into a new cell and execute it:

known_image = face_recognition.load_image_file("frank.jpg")
mystery_image = face_recognition.load_image_file("frank2.jpg")
frank_encoding = face_recognition.face_encodings(known_image)[0]
mystery_encoding = face_recognition.face_encodings(mystery_image)[0]
results = face_recognition.compare_faces([frank_encoding], mystery_encoding)
print (results)

Not surprisingly (as both images are of me), the code returns a list containing the Boolean value True. Under the hood, compare_faces measures the distance between the two encodings and checks it against a tolerance that defaults to 0.6; the smaller the distance, the more similar the faces.
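If you’d like to see the raw distance rather than a Boolean verdict, the library also exposes a face_distance function. This short sketch (my addition) prints the distance and then repeats the comparison with a stricter tolerance:

distance = face_recognition.face_distance([frank_encoding], mystery_encoding)
print(distance)

# A lower tolerance makes the comparison more conservative.
results = face_recognition.compare_faces(
  [frank_encoding], mystery_encoding, tolerance=0.5)
print(results)

Next, enter the following code into a new cell and execute it: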

mystery_image2 = face_recognition.load_image_file("andy.jpg")
mystery_encoding2 = face_recognition.face_encodings(mystery_image2)[0]
results = face_recognition.compare_faces([frank_encoding], mystery_encoding2)
print (results)

In this instance, I’ve introduced an image of another person (andy.jpg) and compared its encoding to the one computed from frank.jpg. No surprise: the result comes back False. Thus far we’ve only compared images containing one face each. What about more complicated images? Let’s now apply face recognition to an image depicting a crowd of people. Enter the following code into a new cell and execute it:

crowd_image = face_recognition.load_image_file("crowd.jpg")
# face_encodings returns one encoding per face found in the image.
crowd_encodings = face_recognition.face_encodings(crowd_image)
for encoding in crowd_encodings:
  is_frank_in_the_crowd = face_recognition.compare_faces([frank_encoding], encoding)
  print(is_frank_in_the_crowd)

The answer comes back False 21 times, which is correct: the library detects 21 faces in that image and I’m not in the picture. Feel free to experiment with various pictures of your own to see what kind of results you get.
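To go a step further and mark any matching face in the crowd shot, a sketch like the following (my own extension of the code above) ties each encoding back to its bounding box. With a crowd photo that actually contained me, the matching face would be boxed in red:

crowd_locations = face_recognition.face_locations(crowd_image)
crowd_encodings = face_recognition.face_encodings(crowd_image, crowd_locations)

pil_crowd = PIL.Image.fromarray(crowd_image)
draw = PIL.ImageDraw.Draw(pil_crowd)

# Each encoding lines up with its bounding box by index.
for (top, right, bottom, left), encoding in zip(crowd_locations, crowd_encodings):
  match = face_recognition.compare_faces([frank_encoding], encoding)
  if match[0]:
    draw.rectangle([left, top, right, bottom], outline="red")

imshow(np.asarray(pil_crowd))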

Wrapping Up

Our faces are, by definition, personally identifiable information (PII), so it comes as little surprise that work in the field of face recognition has stirred up controversy. The Viola-Jones algorithm was a landmark discovery that led to widespread innovation in everything from digital cameras to surveillance systems, and it has also raised deep concerns about privacy and civil liberties, to the point that some jurisdictions have banned the use of face recognition by law enforcement. Given our relentless march toward a more connected and data-driven society, concerns around this technology, especially in the context of AI-driven systems, will only intensify.

The ethical and political issues around facial recognition may be complicated, but the algorithms enabling basic face detection and recognition are not. In fact, much of what this article explored could’ve been easily accomplished with the Cognitive Services Face API via a few simple REST calls. However, I feel it’s important for software engineers to have a deeper knowledge of the algorithmic underpinnings of these tools and to appreciate the work that went into creating the Face API. Additionally, knowing more about the underlying mathematical principles can help developers identify edge cases they may encounter.

In this article, I touched on the rich and fascinating subject of face recognition, a field very much at the forefront of AI research. Much work is still being done to reduce the rate of false positives and to mitigate bias, and researchers continue to make discoveries, such as the recent finding that certain patterns and colors in face paint can hinder many face detection systems (bit.ly/2mkn0hW).

Saying Goodbye

These last few years have been an interesting personal and professional journey for me, as I transitioned from a smart client developer to a machine learning engineer. I discovered that much of the training material in the artificial intelligence (AI) space was geared toward academic researchers and those with extensive experience in advanced mathematics. There just wasn’t much for software engineers to grab hold of. This inspired me to retire the Modern Apps column and its coverage of UWP development, and re-launch as Artificially Intelligent. The goal: To explore the realms of data science and AI from a software engineer’s point of view.

This column will be coming to an end, but I plan to continue my work making AI and data science more approachable to software developers. I’ll continue to post articles on my blog and to podcast at DataDriven. Recently, I started hosting virtual summits, one-day virtual events focused on a particular topic. And I have other projects in the works to continue my mission of developer training and empowerment. For a full list of activities and resources, visit franksworld.com/msdn for offers, links and discounts just for readers of this column.

It has been a distinct honor and privilege to write for MSDN Magazine these last several years. I’ve been amazed and delighted to be recognized for my column by people at an event or customer meeting, and to hear how much they learned from the column over the years. Finally, I would like to extend my deep gratitude to Rachel Appel, whose column I took over in 2016, and to Michael Desmond for being such a great (and patient) editor.


Frank La Vigne works at Microsoft as an AI Technology Solutions Professional where he helps companies achieve more by getting the most out of their data with analytics and AI. He also co-hosts the DataDriven podcast. He blogs regularly at FranksWorld.com and you can watch him on his YouTube channel, “Frank’s World TV” (FranksWorld.TV).

Thanks to the following Microsoft technical expert for reviewing this article: Andy Leonard