This article shows how to use the Teachable Machine online tool with STM32Cube.AI and the FP-AI-VISION1 function pack to create an image classifier running on the STM32H747I-DISCO board.
This tutorial is divided into three parts: the first part shows how to use Teachable Machine to train and export a deep learning model; then STM32Cube.AI is used to convert this model into optimized C code for STM32 MCUs. The last part explains how to integrate this new model into the FP-AI-VISION1 function pack to run live inference on an STM32 board with a camera. The whole process is described below:
1. Prerequisites
1.1. Hardware
- STM32H747I-DISCO Board
- STM32F4DIS-CAM camera module
- A Micro-USB to USB cable
- A webcam (optional)
1.2. Software
- STM32Cube IDE
- X-Cube-AI command line tool
- FP-AI-VISION1
- STM32CubeProgrammer
2. Training a model using Teachable Machine
In this section, we train a deep neural network directly in the browser using Teachable Machine. We first need to choose something to classify. In this example, we classify ST boards and modules. The chosen boards are shown in the figure below:
You can choose whatever objects you want to classify: fruits, pasta, animals, people, and so on.
Let's get started: open https://teachablemachine.withgoogle.com/, preferably in the Chrome browser.
Click Get started, then select Image Project. You will be presented with the following interface.
2.1. Adding training data
For each category you want to classify, edit the class name by clicking the pencil icon. In this example, we choose to start with SensorTile.
To add images with your webcam, click the webcam icon and record some images. If you have image files on your computer, click upload and select the directory containing your images.
Once you have a satisfactory amount of images for this class, repeat the process for the next one until your dataset is complete.
2.2. Training the model
Now that we have a good amount of data, we are going to train a deep learning model for classifying these different objects. To do this, click the Train Model button as shown below:
This process can take a while, depending on the amount of data you have. To monitor the training progress, you can select Advanced and click Under the hood. A side panel displays training metrics.
When the training is complete, you can see the predictions of your network on the "Preview" panel. You can either choose a webcam input or an imported file.
2.2.1. What happens under the hood (for the curious)
Teachable Machine is based on TensorFlow.js, which allows neural network training and inference in the browser. However, as image classification is a task that normally requires a lot of training time, Teachable Machine uses a technique called transfer learning: the webpage downloads a MobileNetV2 model that was previously trained on a large image dataset with 1000 categories. The convolution layers of this pre-trained model are already very good at feature extraction, so they do not need to be trained again. Only the last layers of the neural network are trained using TensorFlow.js, which saves a lot of time.
2.3. Exporting the model
If you are happy with your model, it is time to export it. To do so, click the Export Model button. In the pop-up window, select TensorFlow Lite, check Quantized, and click Download my model.
Since the model conversion is done in the cloud, this step can take a few minutes.
Your browser downloads a zip file containing the model as a .tflite file and a .txt file containing your labels. Extract these two files into an empty directory that we will call workspace in the rest of this tutorial.
2.3.1. Inspect the model using Netron (optional)
It is always interesting to take a look at a model architecture as well as its input and output formats and shapes. To do this, use the Netron webapp.
Visit https://lutzroeder.github.io/netron/ and select Open model, then choose the model.tflite file from Teachable Machine. Click sequential_1_input: we observe that the input is of type uint8 and of size [1, 224, 224, 3]. Now let's look at the outputs: in this example we have 6 classes, so the output shape is [1, 6]. The quantization parameters are also reported. Refer to part 3 for how to use them.
3. Porting to target
3.1. STM32H747I-DISCO
In this part, we use the stm32ai command line tool to convert the TensorFlow Lite model into optimized C code for STM32.
Start by opening a shell in your workspace directory, then execute the following command:
cd <path to your workspace>
stm32ai generate -m model.tflite -v 2
The expected output is:
Neural Network Tools for STM32 v1.2.0 (AI tools v5.0.0)
Running "generate" cmd...
-- Importing model
model files : /path/to/workspace/model.tflite
model type : tflite (tflite)
-- Importing model - done (elapsed time 0.531s)
-- Rendering model
-- Rendering model - done (elapsed time 0.184s)
-- Generating C-code
Creating /path/to/workspace/stm32ai_output/network.c
Creating /path/to/workspace/stm32ai_output/network_data.c
Creating /path/to/workspace/stm32ai_output/network.h
Creating /path/to/workspace/stm32ai_output/network_data.h
-- Generating C-code - done (elapsed time 0.782s)
Creating report file /path/to/workspace/stm32ai_output/network_generate_report.txt
Exec/report summary (generate dur=1.500s err=0)
-----------------------------------------------------------------------------------------------------------------
model file : /path/to/workspace/model.tflite
type : tflite (tflite)
c_name : network
compression : None
quantize : None
L2r error : NOT EVALUATED
workspace dir : /path/to/workspace/stm32ai_ws
output dir : /path/to/workspace/stm32ai_output
model_name : model
model_hash : 2d2102c4ee97adb672ca9932853941b6
input : input_0 [150,528 items, 147.00 KiB, ai_u8, scale=0.003921568859368563, zero=0, (224, 224, 3)]
input (total) : 147.00 KiB
output : nl_71 [6 items, 6 B, ai_i8, scale=0.00390625, zero=-128, (6,)]
output (total) : 6 B
params # : 517,794 items (526.59 KiB)
macc : 63,758,922
weights (ro) : 539,232 (526.59 KiB)
activations (rw) : 853,648 (833.64 KiB)
ram (total) : 1,004,182 (980.65 KiB) = 853,648 + 150,528 + 6
------------------------------------------------------------------------------------------------------------------
id layer (type) output shape param # connected to macc rom
------------------------------------------------------------------------------------------------------------------
0 input_0 (Input) (224, 224, 3)
conversion_0 (Conversion) (224, 224, 3) input_0 301,056
------------------------------------------------------------------------------------------------------------------
1 pad_1 (Pad) (225, 225, 3) conversion_0
------------------------------------------------------------------------------------------------------------------
2 conv2d_2 (Conv2D) (112, 112, 16) 448 pad_1 5,820,432 496
nl_2 (Nonlinearity) (112, 112, 16) conv2d_2
------------------------------------------------------------------------------------------------------------------
( ... )
------------------------------------------------------------------------------------------------------------------
71 nl_71 (Nonlinearity) (1, 1, 6) dense_70 102
------------------------------------------------------------------------------------------------------------------
72 conversion_72 (Conversion) (1, 1, 6) nl_71
------------------------------------------------------------------------------------------------------------------
model p=517794(526.59 KBytes) macc=63758922 rom=526.59 KBytes ram=833.64 KiB io_ram=147.01 KiB
Complexity per-layer - macc=63,758,922 rom=539,232
------------------------------------------------------------------------------------------------------------------
id layer (type) macc rom
------------------------------------------------------------------------------------------------------------------
0 conversion_0 (Conversion) || 0.5% | 0.0%
2 conv2d_2 (Conv2D) ||||||||||||||||||||||||| 9.1% | 0.1%
3 conv2d_3 (Conv2D) |||||||||| 3.5% | 0.0%
4 conv2d_4 (Conv2D) ||||||| 2.5% | 0.0%
5 conv2d_5 (Conv2D) |||||||||||||||||||||||||| 9.4% | 0.1%
7 conv2d_7 (Conv2D) ||||||| 2.6% | 0.1%
( ... )
64 conv2d_64 (Conv2D) |||| 1.5% ||||| 3.7%
65 conv2d_65 (Conv2D) | 0.3% | 0.8%
66 conv2d_66 (Conv2D) |||||||| 2.9% |||||||| 7.1%
67 conv2d_67 (Conv2D) ||||||||||||||||||||||||||||||| 11.3% ||||||||||||||||||||||||||||||| 27.5%
69 dense_69 (Dense) | 0.2% |||||||||||||||||||||||||| 23.8%
70 dense_70 (Dense) | 0.0% | 0.1%
71 nl_71 (Nonlinearity) | 0.0% | 0.0%
------------------------------------------------------------------------------------------------------------------
This command will generate 4 files:
- network.c
- network_data.c
- network.h
- network_data.h
under workspace/stm32ai_output/.
Let's take a look at the report summary: we learn that the model uses 526.59 Kbytes of weights (read-only memory) and 833.64 Kbytes of activations (read-write memory). As the STM32H747xx MCUs do not have 833 Kbytes of contiguous RAM, we need to use the external SDRAM present on the STM32H747I-DISCO board. Refer to UM2411 section 5.8 "SDRAM" for more information.
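As an illustration of what such an SDRAM placement can look like, here is a minimal sketch (not code taken from FP-AI-VISION1, which already handles this in its memory configurations): a large buffer is mapped to the external SDRAM through a dedicated linker section. The section name .sdram_data is an assumption and must match a region defined in the project's linker script.
/* Minimal sketch: place the activation buffer in external SDRAM.
 * ".sdram_data" is an assumed section name that must be mapped to the
 * SDRAM region in the linker script. 853648 is the activation size
 * reported by stm32ai. */
#include <stdint.h>
#define NN_ACTIVATIONS_SIZE 853648u
__attribute__((section(".sdram_data")))
static uint8_t nn_activations[NN_ACTIVATIONS_SIZE];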
3.1.1. Integration with FP-AI-VISION1
In this part, we import our brand new model into the FP-AI-VISION1 function pack. This function pack provides a software example for a food classification application. For more information on FP-AI-VISION1, refer to UM2611.
The main objective of this step is to replace the network and network_data files in FP-AI-VISION1 with the newly generated files, and make a few adjustments to the code.
3.1.1.1. Open the project
If not already done, download the FP-AI-VISION1 zip file from the ST website and extract its content into your workspace. The workspace must now contain the following elements:
- model.tflite
- labels.txt
- stm32ai_output
- FP_AI_VISION1
If we take a look inside the function pack, we see two configurations for the model data type, as shown below.
Our model is a quantized one, so we choose the Quantized_Model directory.
Go into workspace/FP_AI_VISION1/Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative/Quantized_Model/STM32CubeIDE/STM32H747I_DISCO and double-click .project. STM32CubeIDE starts with the project loaded.
3.1.1.2. Replace the network files
The model files are located in workspace/FP_AI_VISION1/Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative/Quantized_Model/CM7/, in the Src and Inc directories.
Delete the following files and replace them with the ones from workspace/stm32ai_output:
In Src:
- network.c
- network_data.c
In Inc:
- network.h
- network_data.h
3.1.1.3. Update the labels
In this step, we update the labels for the network output. The labels.txt file downloaded with Teachable Machine helps you do this. In our example, the content of this file looks like this:
0 SensorTile
1 IoTNode
2 STLink
3 Craddle Ext
4 Fanout
5 Background
From STM32CubeIDE, open fp_vision_app.c and go to line 129, where the g_food_classes array is defined. Update this variable with our label names:
// fp_vision_app.c line 129
const char* g_food_classes[AI_NET_OUTPUT_SIZE] = {
"SensorTile", "IoTNode", "STLink", "Craddle Ext", "Fanout", "Background"};
3.1.1.4. Update the output format
In the original code, the neural network output is in floating-point format (ai_float). With Teachable Machine, the output is an 8-bit signed integer (ai_i8). Update the code accordingly.
In fp_vision_app.c, line 97, modify the line as follows:
// fp_vision_app.c line 97
// ai_float nn_output_buff[AI_NET_OUTPUT_SIZE] = {0}; // Old code
ai_i8 nn_output_buff[AI_NET_OUTPUT_SIZE] = {0}; // New code
Apply this change to fp_vision_app.h as well. You can open the file from the Include directory in the Project Explorer as shown below.
// fp_vision_app.h line 48
//extern ai_float nn_output_buff[]; // Old code
extern ai_i8 nn_output_buff[]; // New code
3.1.1.5. Dequantize the output
The last operation that needs to be done is the dequantization of the model output. As we can see from the stm32ai report, the output is quantized in 8-bit signed integer format with the following parameters:
scale = 0.00390625
zero_point = -128
The formula for converting a quantized value back to a real value is the following:
real_value = scale × (quantized_value − zero_point)
Let's apply this transformation to the code: open main.c and go to the AI_Output_Display() function located at line 562.
Just before the call to the Bubblesort() function, perform the dequantization: add the following lines and update the Bubblesort() call as follows:
/* Added lines */
const ai_i32 zero_point = -128;
const ai_float scale = 0.00390625f;
ai_float float_output[NN_OUPUT_CLASS_NUMBER];
for(int i = 0; i < NN_OUPUT_CLASS_NUMBER; i++){
float_output[i] = scale * ( (ai_float) (NN_OUTPUT_BUFFER[i]) - zero_point);
}
/* End added lines */
Bubblesort(float_output, ranking, NN_OUPUT_CLASS_NUMBER); /* Updated line */
We still need to add two modifications to this function:
- First, update the display mode by setting display_mode to 1 in order to see the image and the label on the LCD display.
- Then, use float_output in the display function (as shown below).
All the modifications are highlighted in the snippet below:
/**
* Copyright (c) 2020 STMicroelectronics. All rights reserved.
*
* This software component is licensed by ST under Ultimate Liberty license
* SLA0044, the "License"; You may not use this file except in compliance with
* the License. You may obtain a copy of the License at:
* www.st.com/SLA0044
*/
/* (...) */
static void AI_Output_Display(void)
{
static uint32_t occurrence_number = NN_OUTPUT_DISPLAY_REFRESH_RATE;
static uint32_t display_mode=1; /* Updated line */
occurrence_number--;
if (occurrence_number == 0)
{
char msg[70];
int ranking[NN_OUPUT_CLASS_NUMBER];
occurrence_number = NN_OUTPUT_DISPLAY_REFRESH_RATE;
for (int i = 0; i < NN_OUPUT_CLASS_NUMBER; i++)
{
ranking[i] = i;
}
/* Added lines */
const ai_i32 zero_point = -128;
const ai_float scale = 0.00390625f;
ai_float float_output[NN_OUPUT_CLASS_NUMBER];
for(int i = 0; i < NN_OUPUT_CLASS_NUMBER; i++){
float_output[i] = scale * ( (ai_float) (NN_OUTPUT_BUFFER[i]) - zero_point);
}
/* End added lines */
Bubblesort(float_output, ranking, NN_OUPUT_CLASS_NUMBER); /* Updated line */
/*Check if PB is pressed*/
if (BSP_PB_GetState(BUTTON_WAKEUP) != RESET)
{
display_mode = !display_mode;
BSP_LCD_Clear(LCD_COLOR_BLACK);
if (display_mode == 1)
{
sprintf(msg, "Entering CAMERA PREVIEW mode");
}
else if (display_mode == 0)
{
sprintf(msg, "Exiting CAMERA PREVIEW mode");
}
BSP_LCD_DisplayStringAt(0, LINE(9), (uint8_t*)msg, CENTER_MODE);
sprintf(msg, "Please release button");
BSP_LCD_DisplayStringAt(0, LINE(11), (uint8_t*)msg, CENTER_MODE);
LCD_Refresh();
/*Wait for PB release*/
while (BSP_PB_GetState(BUTTON_WAKEUP) != RESET);
HAL_Delay(200);
BSP_LCD_Clear(LCD_COLOR_BLACK);
}
if (display_mode == 0)
{
BSP_LCD_Clear(LCD_COLOR_BLACK);/*To clear the camera capture*/
DisplayFoodLogo(LCD_RES_WIDTH / 2 - 64, LCD_RES_HEIGHT / 2 -100, ranking[0]);
}
else if (display_mode == 1)
{
sprintf(msg, "CAMERA PREVIEW MODE");
BSP_LCD_DisplayStringAt(0, LINE(DISPLAY_ACQU_MODE_LINE), (uint8_t*)msg, CENTER_MODE);
}
for (int i = 0; i < NN_TOP_N_DISPLAY; i++)
{
sprintf(msg, "%s %.0f%%", NN_OUTPUT_CLASS_LIST[ranking[i]], float_output[i] * 100); /* Updated Line */
BSP_LCD_DisplayStringAt(0, LINE(DISPLAY_TOP_N_LAST_LINE - NN_TOP_N_DISPLAY + i), (uint8_t *)msg, CENTER_MODE);
}
sprintf(msg, "Inference: %ldms", *(NN_INFERENCE_TIME));
BSP_LCD_DisplayStringAt(0, LINE(DISPLAY_INFER_TIME_LINE), (uint8_t *)msg, CENTER_MODE);
sprintf(msg, "Fps: %.1f", 1000.0F / (float)(Tfps));
BSP_LCD_DisplayStringAt(0, LINE(DISPLAY_FPS_LINE), (uint8_t *)msg, CENTER_MODE);
LCD_Refresh();
/* (...) */
3.1.1.6. Crop the image
Teachable Machine crops the webcam image to fit the model input size. In FP-AI-VISION1, the image is simply resized to the model input size, which does not preserve the aspect ratio. We will change this default behavior and crop the camera image instead.
In order to get a square image and avoid deformation, we crop the camera image using the DCMI peripheral. The goal of this step is to go from the 640x480 resolution down to a 480x480 resolution.
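Before touching the code, here is the crop-window arithmetic used below, as a purely illustrative sketch (these macro and variable names are not from the function pack): the 480x480 window is centered inside the 640x480 VGA frame, and since the DCMI horizontal counters work in pixel clocks with 2 bytes per pixel, the horizontal size is multiplied by 2.
/* Sketch: crop-window arithmetic, assuming 2 bytes per pixel. */
#include <stdint.h>
#define VGA_WIDTH   640u
#define VGA_HEIGHT  480u
#define CROP_SIZE   480u
static const uint32_t crop_x0     = (VGA_WIDTH  - CROP_SIZE) / 2u;  /* 80 pixels skipped on the left */
static const uint32_t crop_y0     = (VGA_HEIGHT - CROP_SIZE) / 2u;  /* 0 lines skipped at the top */
static const uint32_t crop_width  = (CROP_SIZE * 2u) - 1u;          /* horizontal window size in pixel clocks, minus 1 */
static const uint32_t crop_height = CROP_SIZE - 1u;                 /* vertical window size in lines, minus 1 */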
First, edit fp_vision_app.h at line 148 to update the CAM_RES_WIDTH define to 480 pixels:
/****************************/
/***CAMERA related defines***/
/****************************/
#if CAMERA_CAPTURE_RES == VGA_640_480_RES
#define CAMERA_RESOLUTION RESOLUTION_R640x480
#define CAM_RES_WIDTH 480 /* was 640 */
#define CAM_RES_HEIGHT 480
Then, edit stm32h747i_discovery_camera_patch.c located in Drivers/BSP/STM32H747I_DISCO. Modify the BSP_CAMERA_Init function to configure DCMI cropping:
/**
* @brief Initializes the camera.
* @param Resolution : camera sensor requested resolution (x, y) : standard resolution
* naming QQVGA, QVGA, VGA ...
* @retval Camera status
*/
uint8_t BSP_CAMERA_Init(uint32_t Resolution)
{
DCMI_HandleTypeDef *phdcmi;
uint8_t status = CAMERA_ERROR;
/* Get the DCMI handle structure */
phdcmi = &hdcmi_discovery;
/*** Configures the DCMI to interface with the camera module ***/
/* DCMI configuration */
phdcmi->Init.CaptureRate = DCMI_CR_ALL_FRAME;
phdcmi->Init.HSPolarity = DCMI_HSPOLARITY_LOW;
phdcmi->Init.SynchroMode = DCMI_SYNCHRO_HARDWARE;
phdcmi->Init.VSPolarity = DCMI_VSPOLARITY_HIGH;
phdcmi->Init.ExtendedDataMode = DCMI_EXTEND_DATA_8B;
phdcmi->Init.PCKPolarity = DCMI_PCKPOLARITY_RISING;
phdcmi->Instance = DCMI;
/* Power up camera */
BSP_CAMERA_PwrUp();
/* Read ID of Camera module via I2C */
if(ov9655_ReadID(CAMERA_I2C_ADDRESS) == OV9655_ID)
{
/* Initialize the camera driver structure */
camera_drv = &ov9655_drv;
CameraHwAddress = CAMERA_I2C_ADDRESS;
/* DCMI Initialization */
BSP_CAMERA_MspInit(&hdcmi_discovery, NULL);
HAL_DCMI_Init(phdcmi);
/* Camera Module Initialization via I2C to the wanted 'Resolution' */
if (Resolution == CAMERA_R480x272)
{ /* For 480x272 resolution, the OV9655 sensor is set to VGA resolution
* as the OV9655 does not support the 480x272 resolution,
* then the DCMI is configured to output a 480x272 cropped window */
camera_drv->Init(CameraHwAddress, CAMERA_R640x480);
HAL_DCMI_ConfigCROP(phdcmi, /* Crop in the middle of the VGA picture */
(CAMERA_VGA_RES_X - CAMERA_480x272_RES_X)/2,
(CAMERA_VGA_RES_Y - CAMERA_480x272_RES_Y)/2,
(CAMERA_480x272_RES_X * 2) - 1,
CAMERA_480x272_RES_Y - 1);
HAL_DCMI_EnableCROP(phdcmi);
}
else
{
camera_drv->Init(CameraHwAddress, Resolution);
/* Removed HAL_DCMI_DisableCROP(phdcmi); */
HAL_DCMI_ConfigCROP(phdcmi, /* Crop in the middle of the VGA picture */
(CAMERA_VGA_RES_X - 480)/2,
(CAMERA_VGA_RES_Y - 480)/2,
(480 * 2) - 1,
480 - 1);
HAL_DCMI_EnableCROP(phdcmi);
}
CameraCurrentResolution = Resolution;
/* Return CAMERA_OK status */
status = CAMERA_OK;
}
else
{
/* Return CAMERA_NOT_SUPPORTED status */
status = CAMERA_NOT_SUPPORTED;
}
/* Configure the OV9655 MVFP register to mirror/flip the camera image */
const uint8_t MVFP_REG = 0x1E;
CAMERA_IO_Write(CameraHwAddress, MVFP_REG, 0x30);
return status;
}
Lastly, in the same file, update the BSP_CAMERA_ContinuousStart function to start the DMA transfer with the correct size:
/**
* @brief Starts the camera capture in continuous mode.
* @param buff: pointer to the camera output buffer
* @retval None
*/
void BSP_CAMERA_ContinuousStart(uint8_t *buff)
{
/* Start the camera capture */
/* Removed HAL_DCMI_Start_DMA(&hdcmi_discovery, DCMI_MODE_CONTINUOUS, (uint32_t)buff, GetSize(CameraCurrentResolution)); */
HAL_DCMI_Start_DMA(&hdcmi_discovery, DCMI_MODE_CONTINUOUS, (uint32_t)buff, 480*480/2);
}
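The length passed to HAL_DCMI_Start_DMA() is expressed in 32-bit words. With a 480x480 image and 2 bytes per pixel, this gives 480*480*2/4 = 480*480/2 = 115200 words, hence the value above. A small sketch of this arithmetic (macro names are illustrative, not from the function pack):
/* Sketch: how the DMA transfer length is derived.
 * 480 x 480 pixels, 2 bytes per pixel, DMA length counted in 32-bit words. */
#define CROP_WIDTH        480u
#define CROP_HEIGHT       480u
#define BYTES_PER_PIXEL   2u
#define DMA_LENGTH_WORDS  ((CROP_WIDTH * CROP_HEIGHT * BYTES_PER_PIXEL) / 4u)  /* = 115200 */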
Now cropping is enabled and the image is square.
3.1.2. Compile the project
The function pack for quantized models comes in four memory configurations:
- Quantized_Ext
- Quantized_Int_Fps
- Quantized_Int_Mem
- Quantized_Int_Split
As we saw in the stm32ai report, the activation buffer requires more than 800 Kbytes of RAM. For this reason, we can only use the Quantized_Ext configuration, which places the activation buffer in the external SDRAM. For more details on the memory configurations, refer to UM2611 section 3.2.4 "Memory requirements".
In order to compile only the Quantized_Ext configuration, select Project > Properties from the top bar, then select C/C++ Build in the left pane. Click Manage Configurations... and delete all configurations other than Quantized_Ext. You should be left with only one configuration.
Clean the project by selecting Project > Clean... and clicking Clean.
Finally, build the project by selecting Project > Build All.
At the end of the compilation, a file named STM32H747I_DISCO_CM7.hex is generated in workspace > FP_AI_VISION1 > Projects > STM32H747I-DISCO > Applications > FoodReco_MobileNetDerivative > Quantized_Model > STM32CubeIDE > STM32H747I_DISCO > Quantized_Ext
3.1.3. Flash the board
Connect the STM32H747I-DISCO to your PC with a micro-USB to USB cable. Open STM32CubeProgrammer, connect to the board via ST-LINK, and flash it with the generated hex file.
3.1.4. Test the model
Plug the camera into the STM32H747I-DISCO board using the flex cable. In order to have the image in the upright position, the camera should be placed with the flex cable facing up, as shown in the figure below. Once the camera is connected, power the board and press the reset button. After the welcome screen, you will see the camera preview and the output prediction of the model on the LCD screen.
3.2. Troubleshooting
You may notice that, once the model is running on the STM32, the performance of the deep learning model is not as good as expected. This can be explained by a few factors:
- Quantization: the quantization process can reduce the accuracy of the model, as going from a 32-bit floating-point to an 8-bit integer representation means a loss of precision (see the sketch after this list).
- Camera: the webcam used for training the model is different from the camera on the DISCO board. This difference in data between training and inference can explain a loss of performance.
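As a purely illustrative example of this precision loss, here is a minimal sketch (standalone C, not code from the function pack) of the quantization round trip with the scale and zero_point reported by stm32ai; output values can only be represented in steps of about 0.0039:
/* Sketch: quantization round trip with scale = 0.00390625 and zero_point = -128. */
#include <stdio.h>
#include <stdint.h>
#include <math.h>
int main(void)
{
    const float scale = 0.00390625f;
    const int zero_point = -128;
    float real = 0.837f;                                              /* e.g. a class probability of 83.7% */
    int8_t quantized = (int8_t)(lroundf(real / scale) + zero_point);  /* quantize */
    float recovered = scale * ((float)quantized - zero_point);        /* dequantize */
    printf("original %.4f -> quantized %d -> recovered %.4f\n", real, quantized, recovered);
    return 0;
}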