
On-device learning for object detection



1. Article purpose

This article explains how to use the teacher-student learning feature for object detection applications with on-device automatic labeling. We demonstrate the concept of transfer learning using the ONNX Runtime training API on an STM32MP2 series board.

2. Description

The application demonstrates the teacher-student machine learning use case for object detection. Frames are grabbed from a camera sensor, then processed and labeled on the device by a larger, more accurate model known as the teacher model, which runs with relaxed real-time constraints. This local dataset is used to retrain the student model, validate its loss convergence, and export a new inference model that runs in the same application within a GStreamer pipeline. If the data are collected in a uniform, representative way, the user will notice an improvement in the inference accuracy of the student model.
The models used with this application are:

  • The SSD MobileNet V2 as the student: defined and exported from the PyTorch-SSD repo[1], and trained on the PASCAL VOC dataset. The training artifacts of this model are provided along with the application package.
  • The RT-DETR as the teacher: exported in its large version from the Ultralytics[2] Python module and trained on the larger COCO dataset. The ONNX model is provided with the application package, but the user can optionally export it to ONNX format, as detailed in this article.


3. Installation

3.1. Install from the OpenSTLinux AI package repository

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components for the on-device learning application.

3.1.1. Install the GTK+3.0 UI application

The application is available only in Python and on STM32MP2 series boards.

  • To install this application, use the following command:
x-linux-ai -i on-device-learning-obj-detect-python


  • Then, restart the demo launcher:
systemctl restart weston-graphical-session.service


3.2. Export the teacher model: RT-DETR (optional)
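The ONNX file of the teacher model is already provided with the application package, so this step is optional. It can nevertheless be regenerated on a host PC from the Ultralytics Python module. The snippet below is a minimal sketch of such an export, assuming the ultralytics package is installed; the export options (image size, opset) are illustrative and may differ from those used for the packaged model.

# Host-side export sketch (assumes: pip install ultralytics).
# "rtdetr-l.pt" is the large RT-DETR checkpoint trained on COCO;
# imgsz and opset values are illustrative assumptions.
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")                        # downloads the pretrained weights if needed
model.export(format="onnx", imgsz=640, opset=17)     # produces an .onnx file next to the weights

The resulting .onnx file can then be copied to the board and passed to the application with the -t option.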

3.3. Generate the student model training artifacts (optional)
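The training artifacts of the SSD MobileNet V2 student (training, evaluation and optimizer graphs, plus the initial checkpoint) are provided with the application package, so this step is also optional. They can be regenerated offline with the onnxruntime-training artifacts utility. The snippet below is a minimal sketch only, assuming an ONNX export of the student is available as ssd_mobilenet_v2.onnx; the trainable-parameter selection and the loss are placeholders (SSD training normally relies on a MultiBox-style loss) and differ from the artifacts actually shipped with the package.

# Host-side sketch (assumes: pip install onnx onnxruntime-training).
# File names, the trainable-parameter selection and the loss are illustrative assumptions.
import onnx
from onnxruntime.training import artifacts

model = onnx.load("ssd_mobilenet_v2.onnx")

# Illustrative split: train only the detection head parameters, freeze the backbone
requires_grad = [p.name for p in model.graph.initializer if "head" in p.name]
frozen_params = [p.name for p in model.graph.initializer if p.name not in requires_grad]

# Produces training_model.onnx, eval_model.onnx, optimizer_model.onnx and a checkpoint
artifacts.generate_artifacts(
    model,
    requires_grad=requires_grad,
    frozen_params=frozen_params,
    loss=artifacts.LossType.CrossEntropyLoss,   # placeholder, not the real SSD MultiBox loss
    optimizer=artifacts.OptimType.AdamW,
    artifact_directory="training_artifacts",
)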

3.4. Source code location

  • In the OpenSTLinux Distribution with X-LINUX-AI Expansion Package:
<Distribution Package installation directory>/layers/meta-st/meta-st-x-linux-ai/recipes-samples/on-device-learning/files/
  • On GitHub:
recipes-samples/on-device-learning/files/teacher-student

4. How to use the application

4.1. Launching via the demo launcher

You can click on the icon to run the Python application installed on your STM32MP2 series board.

Demo launcher

4.2. Executing with the command line

The on-device learning for object detection Python application is located in the userfs partition:

/usr/local/x-linux-ai/on-device-learning/odl_teacher_student_obj_detect.py

It accepts the following input parameters:

  • In the Python application:
 
Usage: python3 odl_teacher_student_obj_detect.py -t <model .onnx> -l <label .txt file> --training_artifacts_path <artifacts parent dir>

-t --teacher_model <.onnx file path>:        .onnx teacher model to be executed for data annotation 
-l --label_file <label file path>:           Name of file containing labels
--training_artifact_path <directory path>:   Path of the directory containing the training artifacts
--inference_model_path <file path>:          The initial inference model path in case there is no new inference model
--frame_width  <val>:                        Width of the camera frame (default is 640)
--frame_height <val>:                        Height of the camera frame (default is 480)
--framerate <val>:                           Framerate of the camera (default is 15fps)
--conf_threshold <val>:                      Threshold of confidence above which the boxes are displayed (default 0.60)
--iou_threshold <val>:                       Threshold of intersection over union above which the boxes are displayed (default 0.45)
--nb_calib_img                               Number of images to consider for static quantization parameters
--help:                                      Show this help
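
For example, a typical launch on the target could look like the following, where the teacher model, label file, and training artifacts directory are placeholders to be replaced with the paths installed by the package:

python3 /usr/local/x-linux-ai/on-device-learning/odl_teacher_student_obj_detect.py -t <teacher_model.onnx> -l <label_file.txt> --training_artifacts_path <training_artifacts_directory>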


4.3. Navigating through the tabs

The application is developed with GTK[5] and provided as a GTK notebook, allowing smooth navigation through all the steps of the teacher-student workflow.

4.3.1. Data retrieval tab

The primary advantage of on-device learning is that data remain on the device, ensuring enhanced privacy and security. This approach eliminates the need to transfer sensitive data to external servers for processing, thereby reducing the risk of data breaches and unauthorized access; hence the use of the on-device camera sensor.
On this tab, the user has the option to choose the retrieval frequency of the images and the number of samples to grab and save.
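
As an illustration of what this step does under the hood, the sketch below grabs a fixed number of frames from the camera at a regular interval and saves them locally. It uses OpenCV for brevity; the application itself relies on a GStreamer pipeline, and the device index, interval, sample count, and output directory are assumptions.

# Illustrative sketch only: periodic frame capture and local storage.
# The real application uses a GStreamer pipeline; the device index, interval
# and output directory below are assumptions.
import os, time
import cv2

NB_SAMPLES = 20          # number of images to grab
INTERVAL_S = 2.0         # retrieval frequency (seconds between captures)
OUTPUT_DIR = "new_data"

os.makedirs(OUTPUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(0)  # on-device camera sensor

for i in range(NB_SAMPLES):
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(os.path.join(OUTPUT_DIR, f"sample_{i:04d}.jpg"), frame)
    time.sleep(INTERVAL_S)

cap.release()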

Data retrieval tab

Once the retrieval process is done, you are now set to move to the next tab.

4.3.2. Data Visualization

This tab displays the retrieved images so you can visually inspect their quality and identify potential errors or inconsistencies in the data retrieval process. There are two sections, called Old data and New data. This is a solution to a common problem known as catastrophic forgetting, an issue occurring when a neural network trained on new tasks or data loses performance on previously learned tasks. This happens because updating the model's parameters to optimize for new data overwrites the knowledge encoded from old data, akin to a system forgetting past learning. Keeping a small subset of old data and interleaving it with the new data during training mitigates the issue by refreshing the model's memory of prior patterns.
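
The replay strategy described above can be sketched as follows: keep a small, randomly chosen subset of old images, merge it with the newly retrieved ones, and split the result into training and evaluation sets. The percentages and directory names below are placeholders for the values chosen in the UI.

# Illustrative sketch of the old/new data interleaving and train/eval split.
# Directory names and percentages are placeholders for the values set in the UI.
import random
from pathlib import Path

OLD_DATA_PCT = 20   # percentage of old data replayed alongside the new data
EVAL_PCT = 10       # percentage of the merged dataset kept for evaluation

old_images = sorted(Path("old_data").glob("*.jpg"))
new_images = sorted(Path("new_data").glob("*.jpg"))

# Replay a small random subset of old data to mitigate catastrophic forgetting
nb_old = int(len(old_images) * OLD_DATA_PCT / 100)
dataset = new_images + random.sample(old_images, min(nb_old, len(old_images)))

random.shuffle(dataset)
nb_eval = int(len(dataset) * EVAL_PCT / 100)
eval_set, train_set = dataset[:nb_eval], dataset[nb_eval:]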

  • Start by setting the percentage of old data you want to add to your new data by moving the scale.
  • Choose the percentage of the data you want to set for training. 10% is set by default for testing. The remaining part is allocated for evaluation during the training phase.
Data visualization tab

Once the dataset splitting process is done, you will be redirected automatically to the next tab.

4.3.3. Data Annotation using teacher model

To annotate the collected data, run inferences using the STAI_MPU API and the RT-DETR model converted previously to ONNX format and deployed on the target, as shown in this section.
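
As a rough illustration of this annotation step, the sketch below runs the ONNX teacher on each saved image and keeps the detections above the confidence threshold as pseudo-labels. It uses onnxruntime directly for readability; the application itself goes through the STAI_MPU API, and the input size, output layout, and file names are assumptions that depend on how the RT-DETR teacher was exported.

# Illustrative annotation sketch using onnxruntime; the application uses the STAI_MPU API.
# Input size and output decoding are assumptions depending on the exported RT-DETR model.
import numpy as np
import onnxruntime as ort
import cv2
from pathlib import Path

CONF_THRESHOLD = 0.60
session = ort.InferenceSession("rtdetr-l.onnx")
input_name = session.get_inputs()[0].name

for image_path in sorted(Path("new_data").glob("*.jpg")):
    img = cv2.imread(str(image_path))
    blob = cv2.resize(img, (640, 640)).astype(np.float32) / 255.0
    blob = blob.transpose(2, 0, 1)[np.newaxis, ...]          # NCHW layout
    preds = session.run(None, {input_name: blob})[0][0]      # assumed shape: (num_queries, 4 + num_classes)
    boxes, scores = preds[:, :4], preds[:, 4:]
    class_ids, confidences = scores.argmax(axis=1), scores.max(axis=1)
    keep = confidences >= CONF_THRESHOLD
    # Save detections above the threshold as pseudo-labels next to the image
    with open(image_path.with_suffix(".txt"), "w") as f:
        for c, box in zip(class_ids[keep], boxes[keep]):
            f.write(f"{int(c)} " + " ".join(f"{v:.6f}" for v in box) + "\n")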

Data annotation tab

The use of this tab is straightforward: pressing the launch annotation button starts the annotation process. The annotated images are displayed one after another on this tab, to help you visually monitor the annotation process, which may take some time.

Once the annotation process is done, you are now set to move to the next tab which represents the training phase.

4.3.4. Training, quantizing and evaluating the student model

The next step is to train the student model using the annotations generated by the teacher model. This process involves feeding the pre-processed images and their corresponding labels into the student model, optimizing the model parameters through iterative learning. The goal is to achieve a model that efficiently and accurately detects objects, even with potentially fewer resources or simpler architecture compared to the teacher model.
As you can see on this tab, you have the option to set the number of epochs for which you want to train your student model, the learning rate, and the batch size.
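
Under the hood, the training session relies on the ONNX Runtime on-device training API and the artifacts generated earlier. The following is a simplified sketch of such a loop, assuming the default artifacts layout produced by generate_artifacts, a hypothetical train_batches() iterator yielding pre-processed batches, and an illustrative graph output name; the real application's loss inputs and batching differ.

# Simplified on-device training loop sketch based on the ONNX Runtime training API.
# Artifact file names follow the default generate_artifacts layout; the batch iterator,
# loss inputs and graph output name are assumptions for illustration.
from onnxruntime.training.api import CheckpointState, Module, Optimizer

ARTIFACTS = "training_artifacts"
EPOCHS, LEARNING_RATE = 5, 1e-4   # values chosen in the UI

state = CheckpointState.load_checkpoint(f"{ARTIFACTS}/checkpoint")
module = Module(f"{ARTIFACTS}/training_model.onnx", state,
                f"{ARTIFACTS}/eval_model.onnx", device="cpu")
optimizer = Optimizer(f"{ARTIFACTS}/optimizer_model.onnx", module)
optimizer.set_learning_rate(LEARNING_RATE)

for epoch in range(EPOCHS):
    module.train()
    for images, targets in train_batches():   # hypothetical pre-processed batch iterator
        loss = module(images, targets)        # forward + backward on the training graph
        optimizer.step()
        module.lazy_reset_grad()

# Export the updated weights as a plain inference model ("output" is an assumed graph output name)
module.export_model_for_inferencing("student_inference.onnx", ["output"])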

Once the parameters are set, you can launch a training session by pressing the associated button. The training session may take a few minutes, depending on the number of samples and the number of epochs.

Training tab

The final step of this tab is to run an evaluation on the evaluation set images to observe the mean average precision of the model before and after training, on both old and new data images, in order to monitor any forgetting of old patterns. After the end of the training session and the evaluation, the new inference model is automatically exported and quantized on the device so that it can run in inference mode on the NPU/GPU through the STAI_MPU API, as shown in the next paragraph.
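
The on-device post-training quantization mentioned above can be illustrated with the onnxruntime static quantization API: a small set of calibration images (the --nb_calib_img parameter) is fed through a calibration data reader before the quantized model is produced. The data reader, pre-processing, input name, input size, and file names below are simplified assumptions.

# Illustrative static quantization sketch with onnxruntime; calibration
# pre-processing, input name/size and file names are simplified assumptions.
import numpy as np
import cv2
from pathlib import Path
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class ImageCalibrationReader(CalibrationDataReader):
    def __init__(self, image_dir, input_name, nb_calib_img=20, size=(300, 300)):
        paths = sorted(Path(image_dir).glob("*.jpg"))[:nb_calib_img]
        self.input_name, self.size = input_name, size
        self.iterator = iter(paths)

    def get_next(self):
        path = next(self.iterator, None)
        if path is None:
            return None
        img = cv2.resize(cv2.imread(str(path)), self.size).astype(np.float32) / 255.0
        return {self.input_name: img.transpose(2, 0, 1)[np.newaxis, ...]}

reader = ImageCalibrationReader("new_data", "input", nb_calib_img=20)
quantize_static("student_inference.onnx", "student_inference_quant.onnx", reader,
                activation_type=QuantType.QUInt8, weight_type=QuantType.QInt8)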

4.3.5. Inferencing using the new updated student model

The final step of this workflow is to run the newly trained and exported model in an inference application based on the live video stream from the camera sensor. This enables a visual validation of the new model's behavior compared to the old model. You can switch between the two models by pressing one of the buttons at the bottom.

Inference tab

5. References