This message will disappear after all relevant tasks have been resolved.

Semantic MediaWiki

There are 1 incomplete or pending task to finish installation of Semantic MediaWiki. An administrator or user with sufficient rights can complete it. This should be done before adding new data to avoid inconsistencies.

This article shows how to use the Teachable Machine online tool with STM32Cube.AI and the FP-AI-VISION function pack to create an image classifier running on the STM32H747I-DISCO board.

This tutorial is divided into three parts: the first part shows how to use the Teachable Machine to train and export a deep learning model, then STM32Cube.AI is used to convert this model into optimized C code for STM32 MCUs. The last part explains how to integrate this new model into the FP-AI-VISION1 to run live inference on an STM32 board with a camera. The whole process is described below:

Information

Teachable Machine is an online tool allowing to quickly train a deep learning model for various tasks including image classification. It is an educational tool not suitable for production purposes.
STM32Cube.AI is a software aiming at the generation of optimized C code for STM32 and neural network inference. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).
The FP-AI-VISION1 Function Pack is a software example of an image classifier running on the STM32H747I-DISCO board. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048)

1. Prerequisites

1.1. Hardware

STM32H747I-DISCO Board
B-CAMS-OMV Flexible Camera Adapter board
A Micro-USB to USB cable
Optional: a webcam

1.2. Software

STM32Cube IDE
X-Cube-AI version 7.1.0 command line tool
FP-AI-VISION1 version 3.1.0
STM32CubeProgrammer

2. Training a model using Teachable Machine

In this section, we will train deep neural network in the browser using Teachable Machine. We first need to choose something to classify. In this example, we will classify ST boards and modules. The chosen boards are shown in the figure below:

You can choose whatever object you want to classify it: fruits, pasta, animals, people, etc...

Information

Note: If you do not have any webcam, Teachable Machine allows to import images from your computer.

Let's get started. Open https://teachablemachine.withgoogle.com/, preferably from Chrome browser.

Click Get started, then select Image Project, then Standard image model (224x244px color images). You will be presented with the following interface.

2.1. Adding training data

For each category you want to classify, edit the class name by clicking the pencil icon. In this example, we choose to start with SensorTile.

To add images with your webcam, click the webcam icon and record some images. If you have image files on your computer, click upload and select the directory containing your images.

The STM32H747 discovery kit combined with the B-CAMS-OMV camera daughter board can be used as a USB webcam. Using the ST kit for data collection will help to get better results as the same camera will be used for data collection and inference when the model will have been trained.

To use the ST kit as a webcam, simply program the board with the following binary of the function pack:

FP-AI-VISION1_V3.1.0/Projects/STM32H747I-DISCO/Applications/USB_Webcam/Binary/STM32H747I-DISCO_Webcam_V310.bin

Then plug a USB cable from the PC to the USB connector identified as USB OTG HS. Depending on how you oriented the camera board, you might prefer to flip the image. If you do so you need to use the same option when generating the code on STM32.

Information

No need to capture too many images, ~100 images per class is usually well enough. Try to vary camera angle, subject pose and scale as much as possible. To obtain the best results, a uniform background is recommended.

Once you have a satisfactory amount of images for this class, repeat the process for the next one until your dataset is complete.

Information

Note: It can be nice to have a "Background/Nothing" class, so that the model is able to tell when nothing is presented to the camera.

2.2. Training the model

Now that we have a good amount of data, we are going to train a deep learning model for classifying these different objects. To do this, click the Train Model button as shown below:

This process can take a while, depending on the amount of data you have. To monitor the training progress, you can select Advanced and click Under the hood. A side panel displays training metrics.

When the training is complete, you can see the predictions of your network on the "Preview" panel. You can either choose a webcam input or an imported file.

2.2.1. What happens under the hood (for the curious)

Teachable Machine is based on Tensorflow.js to allow neural network training and inference in the browser. However, as image classification is a task that requires a lot of training time, Teachable Machine uses a technique called transfer learning: The webpage downloads a MobileNetV2 model that was previously trained on a big image dataset of 1000 categories. The convolution layers of this pre-trained model are very good at doing feature extraction so they do not need to be trained again. Only the last layers of the neural network are trained using Tensorflow.js, thus saving a lot of time.

2.3. Exporting the model

If you are happy with your model, it is time to export it. To do so, click the Export Model button. In the pop-up window, select Tensorflow Lite, check Quantized and click Download my model.

Since the model conversion is done in the cloud, this step can take a few minutes.

Your browser downloads a zip file containing the model as a .tflite file and a .txt file containing your label. Extract these two files in an empty directory that we will call workspace in the rest of this tutorial.

2.3.1. Inspecting the model using Netron (optional)

It is always interesting to take a look at a model architecture as well as its input and output formats and shapes. To do this, use the Netron webapp.

Visit https://lutzroeder.github.io/netron/ and select Open model, then choose the model.tflite file from Teachable Machine. Click sequental_1_input: we observe that the input is of type uint8 and of size [1, 244, 244, 3]. Now let's look at the outputs: in this example we have 6 classes, so we see that the output shape is [1,6]. The quantization parameters are also reported. Refer to part 3 for how to use them.

3. Porting to a target board

3.1. STM32H747I-DISCO

In this part we will use the stm32ai command line tool to convert the TensorflowLite model to optimized C code for STM32.

Warning

Warning: FP-AI-VISION1 v3.1.0 is based on X-Cube-AI version 7.1.0. You can check your version of Cube.AI by running stm32ai --version

For ease of usage, add the X-Cube-AI installation folder to your path, for Windows:

 set CUBE_FW_DIR=C:\Users\<USERNAME>\STM32Cube\Repository
 set X_CUBE_AI_DIR=%CUBE_FW_DIR%\Packs\STMicroelectronics\X-CUBE-AI\7.1.0
 set PATH=%X_CUBE_AI_DIR%\Utilities\windows;%PATH%

Start by opening a shell in your workspace directory, then execute the following command:

 cd <path to your workspace>
 stm32ai generate -m model.tflite -v 2

The expected output is:

Neural Network Tools for STM32AI v1.6.0 (STM.ai v7.1.0-RC3)

 Exec/report summary (generate)
 ------------------------------------------------------------------------------------------------------------------------
 model file           : C:\path_to_workspace\model.tflite
 type                 : tflite
 c_name               : network
 compression          : lossless
 workspace dir        : C:\path_to_workspace\stm32ai_ws
 output dir           : C:\path_to_workspace\stm32ai_output
 model_name           : model
 model_hash           : 3e04a924905fea0274099abd7153e500
 input 1/1            : 'serving_default_sequential_1_input0'
                        150528 items, 147.00 KiB, ai_u8, scale=0.00747405, zero_point=134, (1,224,224,3), domain:user/
 output 1/1           : 'nl_71_0_conversion'
                        5 items, 5 B, ai_u8, scale=0.00390625, zero_point=0, (1,1,1,5), domain:user/
 params #             : 517,688 items (526.47 KiB)
 macc                 : 58,587,644
 weights (ro)         : 539,128 B (526.49 KiB) / +20(+0.0%) vs original model (1 segment)
 activations (rw)     : 610,784 B (596.47 KiB) (1 segment)
 ram (total)          : 761,317 B (743.47 KiB) = 610,784 + 150,528 + 5


 Generated C-graph summary
 ------------------------------------------------------------------------------------------------------------------------
 model name           : model
 c-name               : network
 c-node #             : 68
 c-array #            : 264
 activations size     : 610784 (1 segments)
 weights size         : 539128 (1 segments)
 macc                 : 58587644
 inputs               : ['serving_default_sequential_1_input0_output']
 outputs              : ['nl_71_0_conversion_output']

 C-Arrays (264)
 ------------------------------------------------------------------------------------------------------------------------------------------------
 c_id   name (*_array)                               item/size       domain/mem-pool           c-type          fmt                       comment
 ------------------------------------------------------------------------------------------------------------------------------------------------
 0      serving_default_sequential_1_input0_output   150528/150528   user/                     uint8_t         int8/ua                   /input
 1      conversion_0_output                          150529/150529   activations/**default**   int8_t          int8/sa  
 2      conv2d_2_output                              200704/200704   activations/**default**   int8_t          int8/sa  

( ... )

 261    conv2d_67_scratch0                           13248/13248     activations/**default**   int8_t          int/ss   
 262    conv2d_67_scratch1                           62720/62720     activations/**default**   int8_t          int8/sa  
 263    conv2d_67_scratch2                           62720/62720     activations/**default**   int8_t          int8/sa  
 ------------------------------------------------------------------------------------------------------------------------------------------------

 C-Layers (68)
 -----------------------------------------------------------------------------------------------------------------------------------------------
 c_id   name (*_layer)          id   layer_type         macc      rom      tensors                                         shape (array id)
 -----------------------------------------------------------------------------------------------------------------------------------------------
 0      conversion_0            0    conv               301056    0        I: serving_default_sequential_1_input0_output   (1,224,224,3) (0)
                                                                           O: conversion_0_output                          (1,224,224,3) (1)
 -----------------------------------------------------------------------------------------------------------------------------------------------
 1      conv2d_2                2    conv2d             5419024   496      I: conversion_0_output                          (1,224,224,3) (1)
                                                                           S: conv2d_2_scratch0                         
                                                                           S: conv2d_2_scratch1                         
                                                                           W: conv2d_2_weights                             (3,16,3,3) (69)
                                                                           W: conv2d_2_bias                                (1,1,1,16) (70)
                                                                           O: conv2d_2_output                              (1,112,112,16) (2)
 -----------------------------------------------------------------------------------------------------------------------------------------------

( ... )

Complexity report per layer - macc=58,587,764 weights=539,232 act=610,784 ram_io=150,534
---------------------------------------------------------------------------------
id   name           c_macc                    c_rom                     c_id    
---------------------------------------------------------------------------------
0    conversion_0   |                  0.5%   |                  0.0%   [0]     
2    conv2d_2       ||||||||||||       9.2%   |                  0.1%   [1]     
3    conv2d_3       ||||               3.1%   |                  0.0%   [2]     
4    conv2d_4       ||||               2.7%   |                  0.0%   [3]     
5    conv2d_5       |||||||||||        8.2%   |                  0.1%   [4]     
7    conv2d_7       |||                2.3%   |                  0.1%   [5]     
  
( ... )
  
-----------------------------------------------------------------------------------------------------------------------------------------------
 66     nl_71                   71   nl                 75        0        I: dense_70_0_conversion_output                 (1,1,1,5) (66)
                                                                           O: nl_71_output                                 (1,1,1,5) (67)
 -----------------------------------------------------------------------------------------------------------------------------------------------
 67     nl_71_0_conversion      71   conv               10        0        I: nl_71_output                                 (1,1,1,5) (67)
                                                                           O: nl_71_0_conversion_output                    (1,1,1,5) (68)
 -----------------------------------------------------------------------------------------------------------------------------------------------

This command generates five files under workspace/stm32ai_ouptut/:

network_config.h
network.c
network_data.c
network.h
network_data.h

Let's take a look at the highlighted lines: we learn that the model uses 526.49 Kbytes of weights (read-only memory) and 596.47 Kbytes of activations. The STM32H747xx MCUs do not have 596.47 Kbytes of contiguous RAM, we need either to use the external SDRAM present on the STM32H747-DISCO board. Refer to UM2411 section 5.8 "SDRAM" for more information. The optimal option is to use the multi-heap feature available from the v7.1 version of X-CUBE-AI (more information can be found in the m file:///C:/Users/<USERNAME>/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.1.0/Documentation/embedded_client_api.html#ref_multiple_heap. For easiness in this tutorial we will use the external memory option.

Information

These figures about memory footprint might be different for your model as it depends on the number of classes you have.

3.1.1. Integration with FP-AI-VISION1

In this part we will import our brand-new model into the FP-AI-VISION1 function pack. This function pack provides a software example for a food classification application. For more information on FP-AI-VISION1, go here.

The main objective of this section is to replace the network and network_data files in FP-AI-VISION1 by the newly generated files and make a few adjustments to the code.

3.1.1.1. Open the project

If it is not already done, download the zip file from ST website and extract the content to your workspace. It must now contain the following elements:

model.tflite
labels.txt
stm32ai_output
FP_AI_VISION1

If we take a look inside the function pack, we'll start from the FoodReco_MobileNetDerivative application we can see two configurations for the model data type, as shown below.

Since our model is a quantized one, we have to select the Quantized_Model directory.

Go into workspace/FP_AI_VISION1/Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative/Quantized_Model/STM32CubeIDE and double-click .project. STM32CubeIDE starts with the project loaded. You will notice 2 sub-project for each core of the microcontroller : CM4 and CM7, as we don't use CM4, ignore it and work with the CM7 project.

3.1.1.2. Replacing the network files

The model files are located in workspace/FP_AI_VISION1/Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative/Quantized_Model/CM7/ Src and Inc directory.

Delete the following files and replace them with the ones from workspace/stm32ai_output:

In Src:

network.c
network_data.c

In Inc:

network.h
network_data.h

3.1.1.3. Updating the labels and display

In this step we will update the labels for the network output. The label.txt file downloaded with Teachable Machine can help you doing this. In our example, the content of this file looks like this:

0 SensorTile
1 IoTNode
2 STLink
3 Craddle Ext
4 Fanout
5 Background

From STM32CubeIDE, open fp_vision_app.c. Go to line 125 where the output_labels is defined and update this variable with our label names: