1. Article purpose
The purpose of this article is to give the main steps and advice for deploying neural network (NN) models on STM32MPU boards through the X-LINUX-AI expansion package. X-LINUX-AI is designed to be user-friendly and to facilitate NN model deployment on all STM32MPU targets with a common and coherent ecosystem.
3. Deploy NN model on STM32MP1x board
This section details the steps to follow to deploy an NN model on an STM32MP1x board.
3.1. Which type of NN model is used
On STM32MP1x, the X-LINUX-AI ecosystem supports only three types of NN models:
- TensorFlow™ Lite model: the common extension for this type of model is .tflite
- Coral Edge TPU™ model: this type of model is derived from a classic TensorFlow™ Lite model. The extension remains .tflite, but the model is precompiled for the Edge TPU™ using a dedicated compiler. To go further with Coral models, refer to the dedicated wiki article: How to compile model and run inference on Coral Edge TPU™
- ONNX™ model: the common extension for this type of model is .onnx
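When a model file arrives without context, a quick look at its content can tell which of the three supported types it is. The following is a heuristic sketch for illustration, built on two assumptions about the formats: TensorFlow™ Lite files are FlatBuffers carrying the file identifier "TFL3" at byte offset 4, and models compiled by the Edge TPU™ compiler contain the custom operator name "edgetpu-custom-op". ONNX™ files are Protocol Buffers without a magic number, so they are recognized by their .onnx extension only. The function names are hypothetical.

```python
from pathlib import Path

# FlatBuffers file identifier stored at bytes 4..8 of a TensorFlow Lite model.
TFLITE_IDENTIFIER = b"TFL3"
# Custom operator name inserted by the Edge TPU compiler (heuristic: we only
# search for the string, we do not parse the FlatBuffer).
EDGETPU_CUSTOM_OP = b"edgetpu-custom-op"


def classify_model(name: str, data: bytes) -> str:
    """Return 'edgetpu', 'tflite', 'onnx' or 'unsupported' for a model file."""
    if data[4:8] == TFLITE_IDENTIFIER:
        # Plain and Edge TPU models are both TFLite FlatBuffers (.tflite).
        return "edgetpu" if EDGETPU_CUSTOM_OP in data else "tflite"
    if Path(name).suffix.lower() == ".onnx":
        # ONNX files are protobufs with no magic number: rely on the extension.
        return "onnx"
    return "unsupported"


def classify_model_file(path: str) -> str:
    """Read a model file from disk and classify it."""
    return classify_model(path, Path(path).read_bytes())


if __name__ == "__main__":
    import sys

    for model_path in sys.argv[1:]:
        print(f"{model_path}: {classify_model_file(model_path)}")
```

An "unsupported" result means the model must first go through one of the conversion paths described next.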
If the model you want to deploy is not one of the types above, it is not supported as is and must be converted first. The following list gives conversion paths from common AI framework formats to TensorFlow™ Lite or ONNX™:
- TensorFlow™: a TensorFlow™ SavedModel (.pb extension) can easily be converted to a TensorFlow™ Lite model using the TensorFlow™ Lite converter.
- Keras: a Keras .h5 file can also be converted to a TensorFlow™ Lite model using the TensorFlow™ Lite converter, as Keras has been part of TensorFlow™ since 2017.
- PyTorch™: a typical PyTorch™ .pt model can be exported directly to an ONNX™ model using the PyTorch™ built-in function torch.onnx.export. It cannot be exported directly to TensorFlow™ Lite, but the resulting ONNX™ model can be converted to a TensorFlow™ Lite model using packages such as onnx-tf or onnx2tf.
- Ultralytics YOLO: Ultralytics provides a built-in function to export YOLOvx models to several formats, including ONNX™ and TensorFlow™ Lite.
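The conversion paths listed above can be sketched as small host-side Python helpers. The calls tf.lite.TFLiteConverter.from_saved_model, from_keras_model, torch.onnx.export and the Ultralytics YOLO(...).export(format=...) method are the frameworks' public APIs; the helper names, the (1, 3, 224, 224) dummy input shape and ONNX™ opset 13 are assumptions to adapt to your model. Each framework is imported inside its helper, so the script only needs the framework actually used.

```python
from pathlib import Path


def saved_model_to_tflite(saved_model_dir: str, out_path: str) -> None:
    """TensorFlow SavedModel directory (containing saved_model.pb) -> .tflite."""
    import tensorflow as tf  # lazy import: only needed for this route

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    Path(out_path).write_bytes(converter.convert())  # convert() returns bytes


def keras_h5_to_tflite(h5_path: str, out_path: str) -> None:
    """Keras .h5 file -> .tflite, through the same TFLite converter."""
    import tensorflow as tf

    model = tf.keras.models.load_model(h5_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    Path(out_path).write_bytes(converter.convert())


def torch_to_onnx(module, out_path: str, input_shape=(1, 3, 224, 224)) -> None:
    """Instantiated torch.nn.Module -> .onnx (input_shape is an assumption)."""
    import torch

    module.eval()  # export in inference mode (dropout off, batch-norm frozen)
    dummy_input = torch.randn(*input_shape)  # tracing input: only shape/dtype matter
    torch.onnx.export(
        module,
        dummy_input,
        out_path,
        opset_version=13,  # assumption: pick an opset supported on the target
        input_names=["input"],
        output_names=["output"],
    )


def yolo_export(weights: str, fmt: str) -> str:
    """Ultralytics YOLO weights (.pt) -> fmt 'onnx' or 'tflite'; returns exported path."""
    from ultralytics import YOLO

    return YOLO(weights).export(format=fmt)
```

For example, keras_h5_to_tflite("model.h5", "model.tflite") writes a TensorFlow™ Lite file next to the original (file names are placeholders).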
3.2. Which quantization type is used
The most important point is to determine whether the model to execute on the target is quantized. Common AI frameworks such as TensorFlow™, ONNX™ and PyTorch™ generally use a 32-bit floating-point representation during training. This representation suits modern GPUs and CPUs, but not embedded devices.
The most convenient way to determine whether a model is quantized is to use a tool such as Netron, a visualizer for neural network models. For each layer of the NN, Netron shows the data type (float32, int8, uint8, etc.) as well as the quantization type and quantization parameters. If the internal layers (excluding the input and output layers) use 8-bit or lower data types, the model is quantized.
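The Netron check can also be scripted for TensorFlow™ Lite models. The sketch below applies the same rule to the tensor list returned by the interpreter's get_tensor_details(), get_input_details() and get_output_details() methods, trying the tflite_runtime package first and full TensorFlow™ second. The function names and the three-way "float" / "hybrid" / "quantized" result are assumptions for illustration: in a fully quantized int8 model, biases are typically int32, so the rule looks for 8-bit tensors and the absence of float tensors rather than requiring every tensor to be 8-bit. It covers TensorFlow™ Lite only: Edge TPU™ models need their delegate to load, and ONNX™ models quantized in QDQ format keep float tensors between the quantize and dequantize nodes, so they need a different check.

```python
# Data types treated as "8-bit or lower" and as floating point.
LOW_BIT_TYPES = {"int8", "uint8"}
FLOAT_TYPES = {"float16", "float32", "float64"}


def quantization_status(tensors, io_indices):
    """Classify a model from (tensor_index, dtype_name) pairs.

    Input/output tensors are ignored, as in the Netron check: a quantized
    model may still expose float32 inputs and outputs.
    """
    internal = [dtype for index, dtype in tensors if index not in io_indices]
    has_low_bit = any(dtype in LOW_BIT_TYPES for dtype in internal)
    has_float = any(dtype in FLOAT_TYPES for dtype in internal)
    if has_low_bit and not has_float:
        return "quantized"  # full-integer model (int32 biases are allowed)
    if has_low_bit:
        return "hybrid"  # e.g. int8 weights with float activations
    return "float"


def tflite_tensors(model_path):
    """Read (index, dtype_name) pairs and the I/O tensor indices of a .tflite file."""
    try:
        from tflite_runtime.interpreter import Interpreter  # lightweight runtime
    except ImportError:
        import tensorflow as tf  # fallback on a host PC with full TensorFlow

        Interpreter = tf.lite.Interpreter
    interpreter = Interpreter(model_path=model_path)
    # dtype is a NumPy scalar type such as numpy.int8; __name__ gives "int8".
    tensors = [(t["index"], t["dtype"].__name__) for t in interpreter.get_tensor_details()]
    io_details = interpreter.get_input_details() + interpreter.get_output_details()
    return tensors, {d["index"] for d in io_details}


if __name__ == "__main__":
    import sys

    print(quantization_status(*tflite_tensors(sys.argv[1])))
```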
Float32 models can run on the STM32MP1x CPU using TensorFlow™ Lite or ONNX™ Runtime, but inference is very slow. It is highly recommended to apply 8-bit quantization with a per-channel quantization scheme: an 8-bit quantized model runs faster, with an acceptable accuracy loss in most cases.
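The benefit of the per-channel scheme can be illustrated with plain arithmetic. With symmetric int8 quantization, as used for weights in the TensorFlow™ Lite quantization specification, a weight w is stored as q = round(w / scale) clamped to [-127, 127] and recovered as scale × q, with scale = max|w| / 127. Per-tensor quantization uses one scale for the whole weight tensor, so a channel with small weights lands on very few levels; per-channel quantization computes one scale per output channel. The weight values below are made up for illustration.

```python
def quantize_symmetric(values, scale):
    """Map floats to int8 levels in [-127, 127] with the given scale (zero point 0)."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def scale_for(values):
    """Symmetric scale so that the largest magnitude maps to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127 if max_abs > 0 else 1.0


def per_tensor(weights):
    """One shared scale for all output channels."""
    scale = scale_for([w for channel in weights for w in channel])
    return [(scale, quantize_symmetric(channel, scale)) for channel in weights]


def per_channel(weights):
    """One scale per output channel."""
    result = []
    for channel in weights:
        scale = scale_for(channel)
        result.append((scale, quantize_symmetric(channel, scale)))
    return result


def channel_errors(weights, quantized):
    """Largest absolute reconstruction error of each channel."""
    return [
        max(abs(w - scale * q) for w, q in zip(channel, levels))
        for channel, (scale, levels) in zip(weights, quantized)
    ]


# Two output channels with very different weight ranges (made-up values).
weights = [[0.012, -0.02, 0.007], [0.9, -2.0, 1.3]]

if __name__ == "__main__":
    print("per-tensor errors :", channel_errors(weights, per_tensor(weights)))
    print("per-channel errors:", channel_errors(weights, per_channel(weights)))
```

Running it shows that the small-weight channel is reconstructed far more accurately with its own scale, while the large-weight channel is unchanged because its scale is the same in both schemes.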
For post-training quantization, the TensorFlow™ Lite converter and ONNX™ Runtime provide everything needed to quantize a model directly on the host PC; refer to their respective websites for documentation.
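As a starting point, the sketch below shows static post-training quantization with both tools. The TensorFlow™ Lite part sets the converter's optimizations, representative_dataset, target_spec.supported_ops and inference_input_type / inference_output_type attributes; the ONNX™ Runtime part calls onnxruntime.quantization.quantize_static with a CalibrationDataReader and per_channel=True. The helper names, the uint8 input/output type, the QDQ format and the calibration samples (a representative set of preprocessed float32 arrays with a batch dimension, typically a few hundred) are assumptions to adapt; check the documentation of your framework versions for the options they support.

```python
from pathlib import Path


def tflite_full_int8(saved_model_dir, calibration_samples, out_path):
    """Full-integer post-training quantization with the TensorFlow Lite converter."""
    import tensorflow as tf

    def representative_dataset():
        # One list per calibration step, holding one array per model input.
        for sample in calibration_samples:
            yield [sample]

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Fail instead of silently keeping float ops that int8 kernels cannot run.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8  # assumption: uint8 camera frames
    converter.inference_output_type = tf.uint8
    Path(out_path).write_bytes(converter.convert())


def onnx_static_int8(model_in, model_out, input_name, calibration_samples):
    """Static post-training quantization with ONNX Runtime, per-channel weights."""
    from onnxruntime.quantization import (
        CalibrationDataReader,
        QuantFormat,
        QuantType,
        quantize_static,
    )

    class SampleReader(CalibrationDataReader):
        """Returns one {input_name: array} dict per call, then None when exhausted."""

        def __init__(self):
            self._samples = iter(calibration_samples)

        def get_next(self):
            sample = next(self._samples, None)
            return None if sample is None else {input_name: sample}

    quantize_static(
        model_in,
        model_out,
        SampleReader(),
        quant_format=QuantFormat.QDQ,  # assumption: QDQ representation
        per_channel=True,  # per-channel weight scales, as recommended above
        activation_type=QuantType.QInt8,
        weight_type=QuantType.QInt8,
    )
```

The calibration samples drive the activation ranges, so they should come from the same distribution and preprocessing as the data seen on the target.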
3.3. Deploy the model on target
Once the model is in TensorFlow™ Lite or ONNX™ format and optimized for embedded deployment, the next step is to benchmark it on the target with the X-LINUX-AI unified benchmark to validate that the model works correctly. To do so, refer to the dedicated article: How to benchmark your NN model on STM32MPU
To go further and develop an AI application based on this model with the TensorFlow™ Lite runtime or ONNX™ Runtime, refer to the application example wiki articles: AI - Application examples
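For a first functional test on the target, a model can be run with a few lines of Python. The sketch below assumes that the Python bindings of both runtimes are installed on the board and importable as tflite_runtime and onnxruntime; the helper names, the single-input/single-output model and the num_threads value are assumptions. Preprocessing (resizing, normalization, data type) must match what the model expects and is left out, and Edge TPU™ models additionally need the Edge TPU™ delegate covered in the Coral article.

```python
from pathlib import Path


def run_tflite(model_path, input_array, num_threads=2):
    """Run one inference with the TensorFlow Lite runtime and return the first output."""
    from tflite_runtime.interpreter import Interpreter

    # Assumption: num_threads=2 matches a dual-core Cortex-A7 STM32MP15x; use 1 on single-core parts.
    interpreter = Interpreter(model_path=model_path, num_threads=num_threads)
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    interpreter.set_tensor(input_index, input_array)  # dtype and shape must match the model
    interpreter.invoke()
    output_index = interpreter.get_output_details()[0]["index"]
    return interpreter.get_tensor(output_index)


def run_onnx(model_path, input_array):
    """Run one inference with ONNX Runtime on the CPU and return the first output."""
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: input_array})[0]


def pick_runner(model_path):
    """Choose the runtime from the model file extension."""
    suffix = Path(model_path).suffix.lower()
    if suffix == ".tflite":
        return run_tflite
    if suffix == ".onnx":
        return run_onnx
    raise ValueError(f"unsupported model type: {model_path}")
```

These helpers create the interpreter or session on every call to stay short; an application processing a camera stream would create it once and reuse it for each frame.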