This technology is called face recognition. Facebook’s algorithms are able to recognize your friends’ faces after they have been tagged only a few times. It’s pretty amazing technology — Facebook can recognize faces with 98% accuracy, which is pretty much as good as humans can do!
Let’s learn how modern face recognition works! But just recognizing your friends would be too easy. We can push this tech to the limit to solve a more challenging problem — telling Will Ferrell (famous actor) apart from Chad Smith (famous rock musician)!
How to use Machine Learning on a Very Complicated Problem
But face recognition is really a series of several related problems:
First, look at a picture and find all the faces in it
Second, focus on each face and be able to understand that even if a face is turned in a weird direction or in bad lighting, it is still the same person.
Third, be able to pick out unique features of the face that you can use to tell it apart from other people— like how big the eyes are, how long the face is, etc.
Finally, compare the unique features of that face to all the people you already know to determine the person’s name.
As a human, your brain is wired to do all of this automatically and instantly. In fact, humans are too good at recognizing faces and end up seeing faces in everyday objects:
Computers are not capable of this kind of high-level generalization (at least not yet…), so we have to teach them how to do each step in this process separately.
We need to build a pipeline where we solve each step of face recognition separately and pass the result of the current step to the next step. In other words, we will chain together several machine learning algorithms:
Face Recognition — Step by Step
Let’s tackle this problem one step at a time. For each step, we’ll learn about a different machine learning algorithm. I’m not going to explain every single algorithm completely to keep this from turning into a book, but you’ll learn the main ideas behind each one and you’ll learn how you can build your own facial recognition system in Python using OpenFace and dlib.
Step 1: Finding all the Faces
The first step in our pipeline is face detection. Obviously we need to locate the faces in a photograph before we can try to tell them apart!
If you’ve used any camera in the last 10 years, you’ve probably seen face detection in action:
Face detection is a great feature for cameras. When the camera can automatically pick out faces, it can make sure that all the faces are in focus before it takes the picture. But we’ll use it for a different purpose — finding the areas of the image we want to pass on to the next step in our pipeline.
Face detection went mainstream in the early 2000s when Paul Viola and Michael Jones invented a way to detect faces that was fast enough to run on cheap cameras. However, much more reliable solutions exist now. We’re going to use a method invented in 2005 called Histogram of Oriented Gradients — or just HOG for short.
To find faces in an image, we’ll start by making our image black and white because we don’t need color data to find faces:
Then we’ll look at every single pixel in our image one at a time. For every single pixel, we want to look at the pixels directly surrounding it:
Our goal is to figure out how dark the current pixel is compared to the pixels directly surrounding it. Then we want to draw an arrow showing in which direction the image is getting darker:
If you repeat that process for every single pixel in the image, you end up with every pixel being replaced by an arrow. These arrows are called gradients and they show the flow from light to dark across the entire image:
This might seem like a random thing to do, but there’s a really good reason for replacing the pixels with gradients. If we analyze pixels directly, really dark images and really light images of the same person will have totally different pixel values. But by only considering the direction that brightness changes, both really dark images and really bright images will end up with the same exact representation. That makes the problem a lot easier to solve!
But saving the gradient for every single pixel gives us way too much detail. We end up missing the forest for the trees. It would be better if we could just see the basic flow of lightness/darkness at a higher level so we could see the basic pattern of the image.
To do this, we’ll break up the image into small squares of 16x16 pixels each. In each square, we’ll count up how many gradients point in each major direction (how many point up, point up-right, point right, etc…). Then we’ll replace that square in the image with the arrow directions that were the strongest.
The end result is we turn the original image into a very simple representation that captures the basic structure of a face in a simple way:
To find faces in this HOG image, all we have to do is find the part of our image that looks the most similar to a known HOG pattern that was extracted from a bunch of other training faces:
Using this technique, we can now easily find faces in any image:
If you want to try this step out yourself using Python and dlib, here’s code showing how to generate and view HOG representations of images.
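The original code listing isn’t embedded here, but a minimal sketch of this step might look like the following, assuming dlib and scikit-image are installed (the image filename is a placeholder):

# a sketch: find faces with dlib's HOG-based detector and view the HOG representation
import dlib
import matplotlib.pyplot as plt
from skimage import io
from skimage.color import rgb2gray
from skimage.feature import hog

image = io.imread("test_image.jpg")  # placeholder filename

# dlib's default face detector is a HOG pattern plus a linear classifier
detector = dlib.get_frontal_face_detector()
faces = detector(image, 1)  # upsample once so smaller faces are found
print("Found {} face(s)".format(len(faces)))
for rect in faces:
    print("Face at left={}, top={}, right={}, bottom={}".format(
        rect.left(), rect.top(), rect.right(), rect.bottom()))

# visualize the HOG representation of the (grayscale) image
_, hog_image = hog(rgb2gray(image), orientations=8,
                   pixels_per_cell=(16, 16), cells_per_block=(1, 1),
                   visualize=True)
plt.imshow(hog_image, cmap="gray")
plt.show()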
Step 2: Posing and Projecting Faces
Whew, we isolated the faces in our image. But now we have to deal with the problem that faces turned different directions look totally different to a computer:
To account for this, we will try to warp each picture so that the eyes and lips are always in the same place in the image. This will make it a lot easier for us to compare faces in the next steps.
The basic idea is we will come up with 68 specific points (called landmarks) that exist on every face — the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, etc. Then we will train a machine learning algorithm to be able to find these 68 specific points on any face:
Here’s the result of locating the 68 face landmarks on our test image:
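If you want to try this yourself, here is a rough sketch using dlib’s pre-trained 68-point shape predictor. It assumes you’ve downloaded the shape_predictor_68_face_landmarks.dat model file separately, and the image filename is a placeholder:

# a sketch: locate 68 facial landmarks with dlib, then produce an aligned face crop
import dlib
from skimage import io

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = io.imread("test_image.jpg")  # placeholder filename

for rect in detector(image, 1):
    # locate the 68 landmark points inside the detected face box
    landmarks = predictor(image, rect)
    # point 8 is the bottom of the chin in the standard 68-point layout
    print("chin tip is roughly at ({}, {})".format(
        landmarks.part(8).x, landmarks.part(8).y))

    # newer versions of dlib can also use the landmarks to produce an
    # aligned (rotated/scaled/cropped) face image in one call
    aligned_face = dlib.get_face_chip(image, landmarks, size=150)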
Now that we know where the eyes and mouth are, we’ll simply rotate, scale and shear the image so that the eyes and mouth are centered as best as possible. We won’t do any fancy 3D warps because that would introduce distortions into the image. We are only going to use basic image transformations like rotation and scale that preserve parallel lines (called affine transformations):
Now no matter how the face is turned, we are able to center the eyes and mouth in roughly the same position in the image. This will make our next step a lot more accurate.
Step 3: Encoding Faces
Now we are at the meat of the problem — actually telling faces apart. This is where things get really interesting!
The simplest approach to face recognition is to directly compare the unknown face we found in Step 2 with all the pictures we have of people that have already been tagged. When we find a previously tagged face that looks very similar to our unknown face, it must be the same person. Seems like a pretty good idea, right?
There’s actually a huge problem with that approach. A site like Facebook with billions of users and a trillion photos can’t possibly loop through every previously tagged face to compare it to every newly uploaded picture. That would take way too long. They need to be able to recognize faces in milliseconds, not hours.
What we need is a way to extract a few basic measurements from each face. Then we could measure our unknown face the same way and find the known face with the closest measurements. For example, we might measure the size of each ear, the spacing between the eyes, the length of the nose, etc. If you’ve ever watched a bad crime show like CSI, you know what I am talking about:
The most reliable way to measure a face
Ok, so which measurements should we collect from each face to build our known face database? Ear size? Nose length? Eye color? Something else?
It turns out that the measurements that seem obvious to us humans (like eye color) don’t really make sense to a computer looking at individual pixels in an image. Researchers have discovered that the most accurate approach is to let the computer figure out the measurements to collect itself. Deep learning does a better job than humans at figuring out which parts of a face are important to measure.
The solution is to train a Deep Convolutional Neural Network (just like we did in Part 3). But instead of training the network to recognize objects in pictures like we did last time, we are going to train it to generate 128 measurements for each face.
The training process works by looking at 3 face images at a time:
Load a training face image of a known person
Load another picture of the same known person
Load a picture of a totally different person
Then the algorithm looks at the measurements it is currently generating for each of those three images. It then tweaks the neural network slightly so that it makes sure the measurements it generates for #1 and #2 are slightly closer while making sure the measurements for #2 and #3 are slightly further apart:
After repeating this step millions of times for millions of images of thousands of different people, the neural network learns to reliably generate 128 measurements for each person. Any ten different pictures of the same person should give roughly the same measurements.
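To make the idea concrete, here is a toy sketch of the triplet comparison in plain NumPy. This is just the distance bookkeeping, not the actual FaceNet/OpenFace training code:

# toy sketch of the triplet idea: the anchor and the positive (same person)
# should end up close together, and the negative (different person) should
# end up farther away by at least some margin
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor/positive/negative are 128-d embeddings produced by the network
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    # the loss is zero once the negative is at least `margin` farther away
    return max(pos_dist - neg_dist + margin, 0.0)

# during training, the network's weights are nudged to minimize this loss
a, p, n = np.random.rand(3, 128)
print(triplet_loss(a, p, n))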
Machine learning people call the 128 measurements of each face an embedding. The idea of reducing complicated raw data like a picture into a list of computer-generated numbers comes up a lot in machine learning (especially in language translation). The exact approach for faces we are using was invented in 2015 by researchers at Google but many similar approaches exist.
Encoding our face image
This process of training a convolutional neural network to output face embeddings requires a lot of data and computer power. Even with an expensive NVIDIA Tesla video card, it takes about 24 hours of continuous training to get good accuracy.
But once the network has been trained, it can generate measurements for any face, even ones it has never seen before! So this step only needs to be done once. Lucky for us, the fine folks at OpenFace already did this and they published several trained networks which we can directly use. Thanks Brandon Amos and team!
So all we need to do ourselves is run our face images through their pre-trained network to get the 128 measurements for each face. Here are the measurements for our test image:
So what parts of the face are these 128 numbers measuring exactly? It turns out that we have no idea, and it doesn’t really matter to us. All we care about is that the network generates nearly the same numbers when looking at two different pictures of the same person.
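If you just want the 128 numbers without setting up OpenFace, the face_recognition library (mentioned in the update later in this post) wraps a similar pre-trained network. A minimal sketch, with a placeholder image filename:

# a sketch: compute a 128-d face embedding with the face_recognition library
import face_recognition

image = face_recognition.load_image_file("will_ferrell.jpg")  # placeholder filename
encodings = face_recognition.face_encodings(image)  # one 128-d vector per detected face

if encodings:
    print(len(encodings[0]))   # 128
    print(encodings[0][:5])    # the first few of the 128 measurements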
If you want to try this step yourself, OpenFace provides a Lua script that will generate embeddings for all images in a folder and write them to a CSV file.
Step 4: Finding the person’s name from the encoding
This last step is actually the easiest step in the whole process. All we have to do is find the person in our database of known people who has the closest measurements to our test image.
You can do that by using any basic machine learning classification algorithm. No fancy deep learning tricks are needed. We’ll use a simple linear SVM classifier, but lots of classification algorithms could work.
All we need to do is train a classifier that can take in the measurements from a new test image and tell us which known person is the closest match. Running this classifier takes milliseconds. The result of the classifier is the name of the person!
So let’s try out our system. First, I trained a classifier with the embeddings of about 20 pictures each of Will Ferrell, Chad Smith and Jimmy Fallon:
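The article doesn’t prescribe a particular library for the classifier, but with scikit-learn a linear SVM on top of the embeddings takes only a few lines. Here is a rough sketch using stand-in data in place of real embeddings:

# a sketch: train a linear SVM on precomputed 128-d face embeddings
import numpy as np
from sklearn.svm import SVC

# stand-ins for the real data: ~20 embeddings per person plus matching name labels
# (in practice these come out of the pre-trained network from Step 3)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 128))
labels = ["will_ferrell"] * 20 + ["chad_smith"] * 20 + ["jimmy_fallon"] * 20

clf = SVC(kernel="linear", probability=True)
clf.fit(embeddings, labels)

# classify a new, unknown 128-d face embedding
unknown_embedding = rng.normal(size=(1, 128))
print(clf.predict(unknown_embedding)[0])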
It works! And look how well it works for faces in different poses — even sideways faces!
Running this Yourself
Let’s review the steps we followed:
Encode a picture using the HOG algorithm to create a simplified version of the image. Using this simplified image, find the part of the image that most looks like a generic HOG encoding of a face.
Figure out the pose of the face by finding the main landmarks in the face. Once we find those landmarks, use them to warp the image so that the eyes and mouth are centered.
Pass the centered face image through a neural network that knows how to measure features of the face. Save those 128 measurements.
Looking at all the faces we’ve measured in the past, see which person has the closest measurements to our face’s measurements. That’s our match!
Now that you know how this all works, here are start-to-finish instructions for how to run this entire face recognition pipeline on your own computer:
UPDATE 4/9/2017: You can still follow the steps below to use OpenFace. However, I’ve released a new Python-based face recognition library called face_recognition that is much easier to install and use. So I’d recommend trying out face_recognition first instead of continuing below!
Note: For the following installs, ensure you are in a Python virtual environment if you’re using one. I highly recommend virtual environments for isolating your projects — it is a Python best practice. If you’ve followed my OpenCV install guides (and installed virtualenv + virtualenvwrapper), then you can use the workon command prior to installing dlib and face_recognition.
Face recognition with OpenCV, Python, and deep learning
Inside this tutorial, you will learn how to perform facial recognition using OpenCV, Python, and deep learning.
We’ll start with a brief discussion of how deep learning-based facial recognition works, including the concept of “deep metric learning.”
From there, I will help you install the libraries you need to actually perform face recognition.
Finally, we’ll implement face recognition for both still images and video streams.
As we’ll discover, our face recognition implementation will be capable of running in real-time.
Understanding deep learning face recognition embeddings
So, how does deep learning + face recognition work?
The secret is a technique called deep metric learning.
If you have any prior experience with deep learning you know that we typically train a network to:
Accept a single input image
And output a classification/label for that image
However, deep metric learning is different.
Instead of trying to output a single label (or even the coordinates/bounding box of objects in an image), we output a real-valued feature vector.
For the dlib facial recognition network, the output feature vector is 128-d (i.e., a list of 128 real-valued numbers) and is used to quantify the face. Training the network is done using triplets:
Here we provide three images to the network:
Two of these images are example faces of the same person.
The third image is a random face from our dataset and is not the same person as the other two images.
As an example, let’s again consider Figure 1 where we provided three images: one of Chad Smith and two of Will Ferrell.
Our network quantifies the faces, constructing the 128-d embedding (quantification) for each.
From there, the general idea is that we’ll tweak the weights of our neural network so that the 128-d measurements of the two Will Ferrell images will be closer to each other and farther from the measurements for Chad Smith.
Our network architecture for face recognition is based on ResNet-34 from the Deep Residual Learning for Image Recognition paper by He et al., but with fewer layers and the number of filters reduced by half.
The network itself was trained by Davis King on a dataset of ≈3 million images. On the Labeled Faces in the Wild (LFW) dataset the network compares favorably with other state-of-the-art methods, reaching 99.38% accuracy.
Both Davis King (the creator of dlib) and Adam Geitgey (the author of the face_recognition module we’ll be using shortly) have written detailed articles on how deep learning-based facial recognition works:
The dlib library, maintained by Davis King, contains our implementation of “deep metric learning” which is used to construct our face embeddings used for the actual recognition process.
The face_recognition library, created by Adam Geitgey, wraps around dlib’s facial recognition functionality, making it easier to work with.
I assume that you have OpenCV installed on your system. If not, no worries — just visit my OpenCV install tutorials page and follow the guide appropriate for your system.
From there, let’s install the dlib and face_recognition packages.
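The exact commands vary by platform, but with pip the installs usually look something like this (building dlib typically requires CMake and a C++ compiler to be available):

$ pip install dlib
$ pip install face_recognition

The imutils convenience package is used by the scripts as well: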
$ workon <your env name here> # optional
$ pip install imutils
Our face recognition dataset
Since Jurassic Park (1993) is my favorite movie of all time, and in honor of Jurassic World: Fallen Kingdom (2018) being released this Friday in the U.S., we are going to apply face recognition to a sample of the characters in the films:
$ tree --filelimit 10 --dirsfirst
.
├── dataset
│ ├── alan_grant [22 entries]
│ ├── claire_dearing [53 entries]
│ ├── ellie_sattler [31 entries]
│ ├── ian_malcolm [41 entries]
│ ├── john_hammond [36 entries]
│ └── owen_grady [35 entries]
├── examples
│ ├── example_01.png
│ ├── example_02.png
│ └── example_03.png
├── output
│ └── lunch_scene_output.avi
├── videos
│ └── lunch_scene.mp4
├── search_bing_api.py
├── encode_faces.py
├── recognize_faces_image.py
├── recognize_faces_video.py
├── recognize_faces_video_file.py
└── encodings.pickle
10 directories, 11 files
Our project has 4 top-level directories:
dataset/ : Contains face images for six characters organized into subdirectories based on their respective names.
examples/ : Has three face images for testing that are not in the dataset.
output/ : This is where you can store your processed face recognition videos. I’m leaving one of mine in the folder — the classic “lunch scene” from the original Jurassic Park movie.
videos/ : Input videos should be stored in this folder. This folder also contains the “lunch scene” video but it hasn’t undergone our face recognition system yet.
We also have 6 files in the root directory:
search_bing_api.py : Step 1 is to build a dataset (I’ve already done this for you). To learn how to use the Bing API to build a dataset with my script, just see this blog post.
encode_faces.py : Encodings (128-d vectors) for faces are built with this script.
recognize_faces_image.py : Recognize faces in a single image (based on encodings from your dataset).
recognize_faces_video.py : Recognize faces in a live video stream from your webcam and output a video.
recognize_faces_video_file.py : Recognize faces in a video file residing on disk and output the processed video to disk. I won’t be discussing this file today as the bones are from the same skeleton as the video stream file.
encodings.pickle : Facial recognition encodings are generated from your dataset via encode_faces.py and then serialized to disk.
After a dataset of images is created (with search_bing_api.py), we’ll run encode_faces.py to build the embeddings.
From there, we’ll run the recognize scripts to actually recognize the faces.
Encoding the faces using OpenCV and deep learning
Before we can recognize faces in images and videos, we first need to quantify the faces in our training set. Keep in mind that we are not actually training a network here — the network has already been trained to create 128-d embeddings on a dataset of ~3 million images.
We certainly could train a network from scratch or even fine-tune the weights of an existing model but that is more than likely overkill for many projects. Furthermore, you would need a lot of images to train the network from scratch.
Instead, it’s easier to use the pre-trained network and then use it to construct 128-d embeddings for each of the 218 faces in our dataset.
Then, during classification, we can use a simple k-NN model + votes to make the final face classification. Other traditional machine learning models can be used here as well.
To construct our face embeddings, open up encode_faces.py from the “Downloads” associated with this blog post:
help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())
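Only the last two lines of the argument parser made it into the excerpt above. Based on the three flags discussed in this post (--dataset, --encodings, and --detection-method), the top of the script plausibly looks like this sketch:

# import the necessary packages
from imutils import paths
import face_recognition
import argparse
import pickle
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--dataset", required=True,
    help="path to input directory of faces + images")
ap.add_argument("-e", "--encodings", required=True,
    help="path to serialized db of facial encodings")
ap.add_argument("-d", "--detection-method", type=str, default="cnn",
    help="face detection model to use: either `hog` or `cnn`")
args = vars(ap.parse_args())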
If you’re new to PyImageSearch, let me direct your attention to the above code block which will become familiar to you as you read more of my blog posts. We’re using argparse to parse command line arguments. When you run a Python program in your command line, you can provide additional information to the script without leaving your terminal. Lines 10-17 do not need to be modified as they parse input coming from the terminal. Check out my blog post about command line arguments if these lines look unfamiliar.
Let’s list out the argument flags and discuss them:
--dataset : The path to our dataset (we created the dataset with search_bing_api.py in an earlier step).
--encodings : The path where the serialized encodings.pickle file will be written.
--detection-method : The face detection model to use, either hog or cnn (the CNN detector is more accurate but slower).
# add each encoding + name to our set of known names and
# encodings
knownEncodings.append(encoding)
knownNames.append(name)
This is the fun part of the script!
For each iteration of the loop, we’re going to detect a face (or possibly multiple faces and assume that it is the same person in multiple locations of the image — this assumption may or may not hold true in your own images so be careful here).
For example, let’s say that rgb contains a picture (or pictures) of Ellie Sattler’s face.
Lines 41 and 42 actually find/localize her face, resulting in a list of face boxes. We pass two parameters to the face_recognition.face_locations method:
rgb : Our RGB image.
model : Either cnn or hog (this value is contained within our command line arguments dictionary associated with the "detection_method" key). The CNN method is more accurate but slower. HOG is faster but less accurate.
Then, we’re going to turn the bounding boxes of Ellie Sattler’s face into a list of 128 numbers on Line 45. This is known as encoding the face into a vector, and the face_recognition.face_encodings method handles it for us.
From there we just need to append the Ellie Sattler encoding and name to the appropriate lists (knownEncodings and knownNames).
We’ll continue to do this for all 218 images in the dataset.
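Since only fragments of the loop body appear above, here is a hedged sketch of the detection-and-encoding loop using the face_recognition API (variable names such as imagePaths follow the descriptions in this post but are assumptions):

# (continuing from the imports and argument parsing sketched earlier)
# grab the paths to the dataset images and initialize the known lists
imagePaths = list(paths.list_images(args["dataset"]))
knownEncodings = []
knownNames = []

# loop over every image in the dataset
for (i, imagePath) in enumerate(imagePaths):
    # the person's name is the subdirectory the image lives in
    name = imagePath.split(os.path.sep)[-2]

    # load the image and convert from OpenCV's BGR ordering to the RGB
    # ordering that face_recognition (dlib) expects
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # detect the face boxes, then compute a 128-d embedding for each box
    boxes = face_recognition.face_locations(rgb,
        model=args["detection_method"])
    encodings = face_recognition.face_encodings(rgb, boxes)

    # add each encoding + name to our set of known names and encodings
    for encoding in encodings:
        knownEncodings.append(encoding)
        knownNames.append(name)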
What would be the point of encoding the images unless we could save those encodings to disk and reuse them later for recognition?
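The serialization code itself isn’t shown in this excerpt, but given that the recognition scripts later load the file with pickle, the end of encode_faces.py plausibly looks like this:

# (continuing the sketch) dump the facial encodings + names to disk so the
# recognition scripts can load them later with pickle
data = {"encodings": knownEncodings, "names": knownNames}
with open(args["encodings"], "wb") as f:
    f.write(pickle.dumps(data))

After the script finishes, the serialized encodings show up as a new file on disk: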
-rw-r--r--@ 1 adrian  staff   234K May 29 13:03 encodings.pickle
As you can see from our output, we now have a file named encodings.pickle — this file contains the 128-d face embeddings for each face in our dataset.
On my Titan X GPU, processing the entire dataset took a little over a minute, but if you’re using a CPU, be prepared to wait a while for this script to complete!
On my Macbook Pro (no GPU), encoding 218 images required 21min 20sec.
You should expect much faster speeds if you have a GPU and compiled dlib with GPU support.
Recognizing faces in images
Now that we have created our 128-d face embeddings for each image in our dataset, we are ready to recognize faces in images using OpenCV, Python, and deep learning.
Open up recognize_faces_image.py and insert the following code (or better yet, grab the files and image data associated with this blog post from the “Downloads” section found at the bottom of this post, and follow along):
On Line 37, we begin to loop over the face encodings computed from our input image.
Then the facial recognition magic happens!
We attempt to match each face in the input image (encoding) to our known encodings dataset (held in data["encodings"]) using face_recognition.compare_faces (Lines 40 and 41).
This function returns a list of True/False values, one for each image in our dataset. For our Jurassic Park example, there are 218 images in the dataset and therefore the returned list will have 218 boolean values.
Internally, the compare_faces function is computing the Euclidean distance between the candidate embedding and all faces in our dataset:
If the distance is below some tolerance (the smaller the tolerance, the more strict our facial recognition system will be), then we return True, indicating the faces match.
Otherwise, if the distance is above the tolerance threshold, we return False, indicating the faces do not match.
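If you want to see the distances themselves rather than just True/False values, face_recognition also exposes a face_distance helper. A quick sketch with stand-in arrays:

# sketch: compare_faces boils down to a Euclidean distance plus a tolerance check
import numpy as np
import face_recognition

known = np.random.rand(218, 128)   # stand-in for data["encodings"]
candidate = np.random.rand(128)    # stand-in for one face's encoding

distances = face_recognition.face_distance(known, candidate)
matches = list(distances <= 0.6)   # 0.6 is the library's default tolerance
print(sum(matches), "matches out of", len(matches))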
On Line 67, we begin looping over the detected face bounding boxes and predicted names. To create an iterable object so we can easily loop through the values, we call zip(boxes, names), resulting in tuples from which we can extract the box coordinates and name.
We use the box coordinates to draw a green rectangle on Line 69.
We also use the coordinates to calculate where we should draw the text for the person’s name (Line 70) followed by actually placing the name text on the image (Lines 71 and 72). If the face bounding box is at the very top of the image, we need to move the text below the top of the box (handled on Line 70), otherwise, the text would be cut off.
We then proceed to display the image until a key is pressed (Lines 75 and 76).
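The drawing lines aren’t reproduced in this excerpt; with OpenCV they look roughly like the following sketch (the box coordinates and name here are stand-ins):

# sketch of the drawing loop: a green box plus the person's name above it
import cv2
import numpy as np

image = np.zeros((400, 400, 3), dtype="uint8")  # stand-in for the loaded photo
boxes = [(50, 300, 250, 100)]                   # (top, right, bottom, left), as face_locations returns
names = ["alan_grant"]                          # stand-in prediction

for ((top, right, bottom, left), name) in zip(boxes, names):
    # draw a green rectangle around the face
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
    # place the name just above the box, or inside it if the box touches the top edge
    y = top - 15 if top - 15 > 15 else top + 15
    cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX,
        0.75, (0, 255, 0), 2)

cv2.imshow("Image", image)
cv2.waitKey(0)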
How should you run the facial recognition Python script?
Using your terminal, first ensure you’re in the correct Python virtual environment with the workon command (if you are using a virtual environment, of course).
Then run the script while providing the two command line arguments at a minimum. If you choose to use the HOG method, be sure to pass --detection-method hog as well (otherwise it will default to the deep learning detector).
Let’s go for it!
To recognize a face using OpenCV and Python open up your terminal and execute our script:
Now that we have applied face recognition to images, let’s also apply face recognition to videos (in real time).
Important Performance Note: The CNN face recognizer should only be used in real-time if you are working with a GPU (you can use it with a CPU, but expect less than 0.5 FPS, which makes for choppy video). Alternatively, if you are using a CPU, you should use the HOG method (or even the OpenCV Haar cascades covered in a future blog post) and expect adequate speeds.
The following script draws many parallels to the previous recognize_faces_image.py script. Therefore I’ll be breezing past what we’ve already covered and just review the video components so that you understand what is going on.
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(args["encodings"], "rb").read())
# initialize the video stream and pointer to output video file, then
# allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
writer = None
time.sleep(2.0)
To access our camera we’re using the VideoStream class from imutils. Line 29 starts the stream. If you have multiple cameras on your system (such as a built-in webcam and an external USB cam), you can change the src=0 to src=1 and so forth.
We’ll be optionally writing processed video frames to disk later, so we initialize writer to None (Line 30). Sleeping for 2 complete seconds allows our camera to warm up (Line 31).
Our loop begins on Line 34 and the first step we take is to grab a frame from the video stream (Line 36).
The remaining Lines 40-50 in the above code block are nearly identical to the lines in the previous script, with the exception being that this is a video frame and not a static image. Essentially we read the frame, preprocess it, and then detect the face bounding boxes and calculate the encodings for each box.
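The lines that compute the matches aren’t shown in this excerpt. Based on the image script reviewed earlier, they plausibly look like this, just before the vote-counting block below:

# (sketch of the omitted lines) attempt to match each detected face
# against the known encodings, defaulting the label to "Unknown"
matches = face_recognition.compare_faces(data["encodings"], encoding)
name = "Unknown"

# if at least one known face matched, collect the indexes of the matches
if True in matches:
    matchedIdxs = [i for (i, b) in enumerate(matches) if b]
    counts = {}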
# loop over the matched indexes and maintain a count for
# each recognized face
for i in matchedIdxs:
name = data["names"][i]
counts[name] = counts.get(name, 0) + 1
# determine the recognized face with the largest number
# of votes (note: in the event of an unlikely tie Python
# will select first entry in the dictionary)
name = max(counts, key=counts.get)
# update the list of names
names.append(name)
In this code block, we loop over each of the encodings and attempt to match the face. If there are matches found, we count the votes for each name in the dataset. We then extract the highest vote count, and that is the name associated with the face. These lines are identical to the previous script we reviewed, so let’s move on.
In this next block, we loop over the recognized faces and proceed to draw a box around the face and the display name of the person above the face:
Below you can find an output example video that I recorded demonstrating the face recognition system in action:
Face recognition in video files
As I mentioned in our “Face recognition project structure” section, there’s an additional script included in the “Downloads” for this blog post — recognize_faces_video_file.py.
This file is essentially the same as the one we just reviewed for the webcam except it will take an input video file and generate an output video file if you’d like.
I applied our face recognition code to the popular “lunch scene” from the original Jurassic Park movie where the cast is sitting around a table sharing their concerns with the park:
Note: Recall that our model was trained on four members of the original cast: Alan Grant, Ellie Sattler, Ian Malcolm, and John Hammond. The model was not trained on Donald Gennaro (the lawyer), which is why his face is labeled as “Unknown”. This behavior was by design (not an accident) to show that our face recognition system can recognize faces it was trained on while leaving faces it cannot recognize as “Unknown”.
And in the following video I have put together a “highlight reel” of Jurassic Park and Jurassic World clips, mainly from the trailers:
As we can see, our face recognition and OpenCV code works quite well!