FP-AI-FACEREC1 getting started

Revision as of 17:46, 30 November 2020 by Registered User (→‎Performances)

This article explains how to get started with the Face Recognition application running on an STM32 microcontroller. The application can recognize the face of a known (that is, enrolled) user.

This article covers the following topics:

  • Overview of the required hardware setup
  • Overview of the software architecture
  • Description of the Face Reco application running on STM32

1. Hardware setup

The Face Reco application runs on a hardware setup consisting of an STM32 microcontroller board connected to a camera module board.

1.1. STM32 board: STM32H747 Discovery Kit

The STM32H747I-DISCO is a complete demonstration and development platform for the STMicroelectronics STM32H747XIH6 microcontroller, designed to simplify user application development. The STM32H747XIH6 device is based on the high-performance Arm® Cortex®-M7 and Cortex®-M4 32-bit RISC cores. The Cortex®-M7 core operates at up to 480 MHz and the Cortex®-M4 core at up to 240 MHz.

The STM32H747XIH6 device incorporates high-speed embedded memories: 2 Mbytes of dual-bank Flash memory and 1 Mbyte of RAM.

The other key specifications of the board are:

  • On-board STLINK-V3E debugger/programmer
  • USB OTG HS
  • 4” capacitive touch LCD display module with MIPI® DSI interface
  • 2 x 512-Mbit Quad-SPI NOR Flash memory
  • 256-Mbit SDRAM
  • 8-bit camera connector
  • microSD™ card
An image of STM32H747I-DISCO board.

1.2. Camera board

In the context of the Face Reco application, the firmware supports the following two sensors:

  • The OV5640 sensor (which is the one mounted on the MB1379 camera daughterboard)
  • The OV9655 sensor (which is the one mounted on the STM32F4DIS-CAM camera module)

Two camera module boards are supported:

1.2.1. STM32F4DIS-CAM camera module

The picture below shows the STM32F4DIS-CAM camera module featuring an OV9655 sensor:


1.2.2. B-CAMS-OMV based camera module

The picture below shows the B-CAMS-OMV mother board with an MB1379 camera daughterboard plugged into it:

An image of B-CAMS-OMV camera board.

The B-CAMS-OMV board is an adapter board that, in addition to the MB1379 camera daughterboard, supports camera boards from OpenMV and Arducam.

With this board, the Face Reco firmware supports only the OV5640 sensor (which is the one mounted on the MB1379 camera daughterboard).

1.3. STM32H747I-DISCO connection to camera boards

The picture below shows how to connect the camera board to the STM32H747I-DISCO board using a flex cable:

An image of connection between STM32H747I-DISCO and camera board.

2. Software architecture

The figure below depicts the software architecture of the Face recognition application:

An image of the software architecture for the Face Reco application.

The middleware components are briefly described below:

2.1. STM32 AI Runtime

This is an STM32-optimized AI library produced by the X-CUBE-AI tool when it generates the C neural network model.

2.2. STM32 Image

This is a library of image preprocessing functions: rescaling, pixel format conversion, and so on.

2.3. STM32 Face Detect

This is a library of Face Detection functions.

2.4. STM32 Face Reco

This is a library of Face Recognition functions.

3. Description of Face Reco application

3.1. Frame processing flow

The figure below shows the different frame processing stages involved in the Face Recognition application:

An image of the data pipe for the Face Reco application.

3.1.1. Camera capture

The camera frame capture has the following characteristics:

  • Resolution is set to QVGA (320x240)
  • Pixel color format is set to RGB565

3.1.2. Frame Pre-processing

The main preprocessing stage in the Face Reco application is the pixel color format conversion, which converts the captured RGB565 frame into an RGB888 frame.
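As an illustration of this conversion, here is a minimal sketch in C. The function names are hypothetical and are not taken from the STM32 Image library, whose actual API is not shown in this article. The sketch assumes the usual RGB565 layout (5 bits red, 6 bits green, 5 bits blue in one 16-bit word) and uses bit replication to fill the low bits, which is one common choice among several.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one RGB565 pixel (RRRRRGGG GGGBBBBB) into three 8-bit channels.
 * The high bits of each channel are replicated into the low bits so that
 * a full-scale 5-bit or 6-bit value maps to 255 rather than 248 or 252. */
static void rgb565_to_rgb888_pixel(uint16_t px, uint8_t out[3])
{
    uint8_t r5 = (uint8_t)((px >> 11) & 0x1F);
    uint8_t g6 = (uint8_t)((px >> 5) & 0x3F);
    uint8_t b5 = (uint8_t)(px & 0x1F);

    out[0] = (uint8_t)((r5 << 3) | (r5 >> 2)); /* red   */
    out[1] = (uint8_t)((g6 << 2) | (g6 >> 4)); /* green */
    out[2] = (uint8_t)((b5 << 3) | (b5 >> 2)); /* blue  */
}

/* Convert a whole frame. dst must hold 3 * pixel_count bytes. */
void rgb565_to_rgb888(const uint16_t *src, uint8_t *dst, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        rgb565_to_rgb888_pixel(src[i], &dst[3 * i]);
    }
}
```

For a QVGA frame, pixel_count is 320 x 240 = 76800, so the RGB565 source occupies 153600 bytes and the RGB888 destination 230400 bytes.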

3.1.3. Face Detection

The Face Detection block is in charge of finding the faces present in the input frame (QVGA, RGB888). In the current version of the application, the maximum number of faces that can be found is set to one. The output of this block is a frame of resolution 96x96 that contains the face found in the input captured frame.

3.1.4. Face Recognition

The Face Recognition block is in charge of extracting features and computing a signature (feature vector) corresponding to the input face.

3.1.5. Face Identification

The Face Identification block is in charge of computing the distance between:

  • The vector produced by the Face Recognition block, and
  • Each of the vectors stored in memory (corresponding to the enrolled faces)

The Face Identification block generates the following two outputs:

  • a User Face ID corresponding to the minimum distance
  • a similarity score
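The search for the closest enrolled vector can be sketched as follows. This is only an illustration: the article does not state which distance metric or similarity formula the firmware uses. The sketch assumes squared Euclidean distance and unit-length feature vectors, in which case 1 - d²/2 equals the cosine similarity. The type face_match_t and the function identify_face are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    int   user_id;    /* index of the closest enrolled vector, -1 if none */
    float similarity; /* 1 - d^2/2: cosine similarity for unit vectors    */
} face_match_t;

/* Squared Euclidean distance between two vectors of length len. */
static float squared_distance(const float *a, const float *b, size_t len)
{
    float sum = 0.0f;
    for (size_t i = 0; i < len; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* enrolled is a flat array of user_count vectors, each of length len.
 * Returns the enrolled index at minimum distance from probe. */
face_match_t identify_face(const float *probe, const float *enrolled,
                           size_t user_count, size_t len)
{
    face_match_t best = { -1, 0.0f };
    float best_d2 = 0.0f;

    for (size_t u = 0; u < user_count; ++u) {
        float d2 = squared_distance(probe, &enrolled[u * len], len);
        if (best.user_id < 0 || d2 < best_d2) {
            best.user_id = (int)u;
            best_d2 = d2;
        }
    }
    if (best.user_id >= 0) {
        best.similarity = 1.0f - 0.5f * best_d2;
    }
    return best;
}
```

Working with squared distances avoids a square root per comparison without changing which enrolled vector is closest.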

3.2. Running the application

The application has two main operating modes:

3.2.1. The "nominal" mode

Nominal mode is the default operating mode during which the application is attempting to match the face contained in the input frame with a specific User Face ID.

3.2.2. The "enrollment" mode

Before a face can be linked to a User Face ID, that User Face ID must be recorded in the system database. For that purpose, users must enroll themselves so that their face features (in the form of a feature vector) are recorded in memory.

Note that the current version of the Face Reco application does not retain enrolled users: all enrollment information is lost upon reset.

3.2.3. In practice

Practically, here is how it works:

3.2.3.1. Nominal mode

Upon reset, the system runs in nominal mode.

As soon as a face is detected in the captured camera frame, a rectangular box is drawn around it.

If the system cannot match the detected face with one of the enrolled faces (either because the user's face is not yet enrolled or because the Face Identification similarity score is lower than the recognition threshold), the "No match" message is displayed on the right banner of the LCD screen and the box is drawn in red.

If the system matches the detected face with one of the enrolled faces, the box is drawn in green and a thumbnail (showing the enrolled face that matches the detected face) is displayed on the right banner of the LCD screen along with the following information:

  • The User Face ID
  • The similarity score expressed in percentage (%).
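The match decision described above can be summarized by the sketch below. All names (nominal_decide, nominal_result_t, box_color_t) are hypothetical and do not come from the application source. The sketch assumes a similarity score in the range 0 to 1 and treats a score equal to the threshold as a match, since the article only states that a score lower than the threshold gives "No match".

```c
typedef enum { BOX_RED, BOX_GREEN } box_color_t;

typedef struct {
    box_color_t box;           /* color of the box drawn around the face */
    int         user_id;       /* matched User Face ID, -1 for "No match" */
    int         score_percent; /* similarity score rounded to a percentage */
} nominal_result_t;

/* best_user_id is -1 when no user is enrolled. */
nominal_result_t nominal_decide(int best_user_id, float similarity,
                                float threshold)
{
    nominal_result_t r;

    r.score_percent = (int)(similarity * 100.0f + 0.5f);
    if (best_user_id >= 0 && similarity >= threshold) {
        r.box = BOX_GREEN;
        r.user_id = best_user_id;
    } else {
        r.box = BOX_RED;
        r.user_id = -1;
    }
    return r;
}
```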

In nominal mode, the frame rate in FPS (frames per second) is displayed at the top of the right banner.

3.2.3.2. Enrollment mode

To enroll a face, the user must perform a long press on the blue joystick [SEL] button on the board. There are two options for triggering the enrollment:

  • Either the enrollment is triggered immediately and automatically after a long press on the joystick [SEL] button (default behavior upon reset)
  • Or the enrollment is triggered after a 4-second countdown following a long press on the joystick [SEL] button

In both cases, once the enrollment is completed, the "ENROLLMENT COMPLETED RELEASE JOYSTICK" message is displayed and a thumbnail corresponding to the enrolled face is displayed on the right banner of the LCD screen along with the User Face ID.

Up to 100 users can be enrolled in the system.

In this version of the application, the enrollment information is stored in volatile memory. As a consequence, it is lost when the system is reset.
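A volatile enrollment store with a 100-user limit could look like the sketch below. It is an assumption-laden illustration, not the application's actual data structure: FEATURE_LEN (the feature vector length), the enroll_db_t type, the enroll_user function, and the choice of numbering User Face IDs from 0 are all hypothetical. Only the limit of 100 users comes from this article.

```c
#include <stddef.h>
#include <string.h>

#define MAX_USERS   100 /* limit stated in this article */
#define FEATURE_LEN 128 /* hypothetical feature vector length */

/* Kept in RAM only, so its content does not survive a reset. */
typedef struct {
    float  features[MAX_USERS][FEATURE_LEN];
    size_t count;
} enroll_db_t;

/* Store one feature vector. Returns the new User Face ID (numbered
 * from 0 in this sketch), or -1 when the database is full. */
int enroll_user(enroll_db_t *db, const float *features)
{
    if (db->count >= MAX_USERS) {
        return -1;
    }
    memcpy(db->features[db->count], features, sizeof db->features[0]);
    return (int)db->count++;
}
```

With these assumed sizes the store occupies about 50 Kbytes (100 x 128 x 4 bytes), plus the counter.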

3.2.3.3. FPS number

The FPS (frames per second) value displayed at the top of the right banner in nominal mode corresponds to the number of frames the system can process in one second. It is therefore directly linked to the following timings:

  • Duration of the Face Detection execution
  • Duration of the Face Recognition execution

The camera frame acquisition is performed in parallel with the execution of the Face Detection and Face Recognition algorithms, so it does not add to the per-frame processing time.
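The relationship between these timings and the FPS can be written as a short calculation. The function name estimate_fps is hypothetical. The sketch assumes that detection and recognition run one after the other for each frame and that no other stage adds significant time.

```c
/* Estimated frames per second when Face Detection and Face Recognition
 * run back to back and frame capture overlaps with them.
 * Both durations are in milliseconds. */
float estimate_fps(float detect_ms, float recog_ms)
{
    float total_ms = detect_ms + recog_ms;

    return (total_ms > 0.0f) ? 1000.0f / total_ms : 0.0f;
}
```

With the execution times quoted in the Performances section (about 150 ms and 125 ms), this gives 1000 / 275, or roughly 3.6 FPS.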

3.2.3.4. Performances
  • The default recognition threshold is set to 0.70. Based on our validation dataset, this corresponds to a FAR of ~1%. FAR stands for False Acceptance Rate: the probability that the system incorrectly accepts an unauthorized person.
  • The current FPS is ~3.6:
    • Face Detection execution time: ~150 ms
    • Face Recognition execution time: ~125 ms
  • On-chip memory footprint: the Face Reco system runs entirely from internal on-chip memory. The external SDRAM is used only for the LCD display.
    • SRAM (data) memory footprint: 650 Kbytes
    • Flash (code) memory footprint: 1.6 Mbytes
  • The user must keep in mind that the following parameters affect the accuracy of the recognition:
    • Ambient light and illumination
    • Distance between the camera and the user (1.5 m maximum)
    • Enrollment phase: it is important to perform a good-quality enrollment.

Adjusting the camera contrast by pressing the joystick [LEFT] and [RIGHT] buttons is one way to improve the accuracy of the recognition. For best results, make sure the scene is well lit.

3.2.3.5. Configuration menu

A configuration menu enables the user to:

  • Visualize debug information such as:
    • Execution timing
    • Recognition threshold value
    • Number of enrolled users
  • Change the enrollment trigger mode:
    • Countdown
    • Immediate
  • Update the Recognition threshold value (in steps of 0.05)
  • Change the camera orientation:
    • Flip
    • Mirror
    • Mirror and Flip

The configuration menu is accessible by performing a long press on the blue Wakeup button.
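As an illustration of the threshold update in steps of 0.05, the sketch below stores the threshold as an integer percentage so that repeated adjustments do not accumulate floating-point rounding error. The names and the clamping range of 0 to 100% are assumptions. Only the step size and the default value of 0.70 come from this article.

```c
#define THRESHOLD_STEP_PERCENT    5  /* 0.05 per step */
#define THRESHOLD_DEFAULT_PERCENT 70 /* default threshold of 0.70 */

/* Move the threshold by a number of steps (negative to lower it),
 * clamping the result to the range 0..100 percent. */
int threshold_adjust(int threshold_percent, int steps)
{
    int t = threshold_percent + steps * THRESHOLD_STEP_PERCENT;

    if (t < 0) {
        t = 0;
    }
    if (t > 100) {
        t = 100;
    }
    return t;
}

/* Convert the stored percentage to the 0..1 value used for comparison. */
float threshold_as_float(int threshold_percent)
{
    return (float)threshold_percent / 100.0f;
}
```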