AI:How to use Teachable Machine to create an image classification application on STM32


This article shows how to use the '''Teachable Machine''' online tool with '''STM32Cube.AI''' and the '''FP-AI-VISION1''' function pack to create an image classifier running on the STM32H747I-DISCO board.

This tutorial is divided into three parts: the first part shows how to use Teachable Machine to train and export a deep learning model, the second part uses STM32Cube.AI to convert this model into optimized C code for STM32 MCUs, and the last part explains how to integrate this new model into the FP-AI-VISION1 function pack to run live inference on an STM32 board with a camera. The whole process is described below:
<div class="res-img">

[[File:tm_workflow.png|center|alt=CLI_as_back-end|Teachable Machine with STM32 workflow]]</div>


{{Info| 
* Teachable Machine is an online tool that allows you to quickly train a deep learning model for various tasks, including image classification. It is an educational tool, not suitable for production purposes. 
* STM32Cube.AI is a software tool that generates optimized C code to run neural network inference on STM32 MCUs. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement ([https://st.com/SLA0048 SLA0048]).
* The FP-AI-VISION1 function pack is a software example of an image classifier running on the STM32H747I-DISCO board. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement ([https://st.com/SLA0048 SLA0048]). }}

= Prerequisites =

== Hardware ==

* [https://www.st.com/en/evaluation-tools/stm32h747i-disco.html STM32H747I-DISCO Board] 
* [https://www.st.com/en/development-tools/b-cams-omv.html B-CAMS-OMV Flexible Camera Adapter board]
* A Micro-USB to USB cable
* ''Optional: A webcam''

== Software ==

* [https://www.st.com/en/development-tools/stm32cubeide.html STM32Cube IDE]
* [https://www.st.com/en/embedded-software/x-cube-ai.html X-Cube-AI] '''version 6.0.0''' command line tool
* [https://www.st.com/en/embedded-software/fp-ai-vision1.html FP-AI-VISION1] version 3.0.0
* [https://www.st.com/en/development-tools/stm32cubeprog.html STM32CubeProgrammer]

= Training a model using Teachable Machine =

In this section, we will train a deep neural network in the browser using ''Teachable Machine''. We first need to choose something to classify. In this example, we will classify ST boards and modules. The chosen boards are shown in the figure below:
<div class="res-img">

[[File:tm_boards.png|center|alt=Boards used for classification|Boards used for classification]]</div>


You can choose whatever objects you want to classify: fruits, pasta, animals, people, and so on.

{{Info| Note: If you do not have a webcam, Teachable Machine allows you to import images from your computer.}}

Let's get started. Open https://teachablemachine.withgoogle.com/, preferably from [https://www.google.com/chrome/ Chrome] browser.

Click {{Highlight|Get started}}, then select {{Highlight| Image Project}}, then {{Highlight| Standard image model}} (224x224px color images). You will be presented with the following interface.
<div class="res-img">

[[File:tm_blank_interface.png|center|alt=Teachable Machine interface|Teachable Machine interface]]</div>


== Adding training data ==
<!--
{{Info|You can skip this part by importing the dataset for this example provided in  [http://stm32ai_github_link.here here] and clicking on {{Highlight| Teachable Machine}} on the top-left side, then {{Highlight| Open Project file}} }}
-->

For each category you want to classify, edit the class name by clicking the pencil icon. In this example, we choose to start with <code>SensorTile</code>.

To add images with your webcam, click the webcam icon and record some images. If you have image files on your computer, click upload and select the directory containing your images.
The STM32H747 discovery kit combined with the B-CAMS-OMV camera daughter board can be used as a USB webcam.
Using the ST kit for data collection helps to get better results, as the same camera is used for data collection and for inference once the model has been trained.

To use the ST kit as a webcam, simply program the board with the following binary of the function pack:

FP-AI-VISION1_V3.0.0/Projects/STM32H747I-DISCO/Applications/USB_Webcam/Binary/STM32H747I-DISCO_Webcam_V300.bin

Then plug a USB cable from the PC to the USB connector identified as USB OTG HS.
<div class="res-img">

[[File:tm_add_images.png|center|alt=Adding images with a webcam|Adding images with a webcam]]</div>


{{Info| There is no need to capture too many images: ~100 images per class is usually enough. Try to vary the camera angle, subject pose and scale as much as possible. For best results, a uniform background is recommended.}}

Once you have a satisfactory amount of images for this class, repeat the process for the next one until your dataset is complete.

{{Info|Note: It is a good idea to have a &quot;Background/Nothing&quot; class, so that the model can tell when nothing is presented to the camera.}}

== Training the model ==

Now that we have a good amount of data, we are going to train a deep learning model for classifying these different objects. To do this, click the {{Highlight| Train Model}} button as shown below: 
<div class="res-img">

[[File:tm_train_model.png|center|alt=Train a model|Train a model]]</div>


This process can take a while, depending on the amount of data you have. To monitor the training progress, you can select {{Highlight|Advanced}} and click {{Highlight|Under the hood}}. A side panel displays training metrics.

When the training is complete, you can see the '''predictions''' of your network on the &quot;Preview&quot; panel. You can either choose a webcam input or an imported file.
<div class="res-img">

[[File:tm_predictions.png|center|alt=Predictions from the model|Predictions from the model]]</div>


=== What happens under the hood  (for the curious) ===

Teachable Machine is based on [https://www.tensorflow.org/js Tensorflow.js] to allow neural network training and inference in the browser. However, as image classification normally requires a lot of training time, Teachable Machine uses a technique called '''transfer learning''': the webpage downloads a MobileNetV2 model that was previously trained on a large image dataset covering 1000 categories. The convolution layers of this pre-trained model are already very good at feature extraction, so they do not need to be trained again. Only the last layers of the neural network are trained using Tensorflow.js, which saves a lot of time.

== Exporting the model ==

{{InternalInfo|You may run into an issue during this step when using the website from an ST PC: the message ''Something went wrong while converting the model'' appears when clicking Download. This is due to the fact that Zscaler issues a warning when Teachable Machine makes the request for the model file. To overcome this problem, visit https://converter-release-2-1-2-bqjfnlxgwq-uc.a.run.app and accept ZScaler Warning. Ignore whatever is on this page and go back to Teachable Machine.}} 

If you are happy with your model, it is time to export it. To do so, click the {{Highlight| Export Model}} button. In the pop-up window, select {{Highlight|Tensorflow Lite}}, check {{Highlight| Quantized}} and click {{Highlight| Download my model}}.
<div class="res-img">

[[File:tm_export.png|center|alt=Export the model|Export the model]]</div>


Since the model conversion is done in the cloud, this step can take a few minutes.

Your browser downloads a zip file containing the model as a <code>.tflite</code> file and a <code>.txt</code> file containing your labels. Extract these two files into an empty directory that we will call <code>workspace</code> in the rest of this tutorial.

=== Inspecting the model using Netron (optional)===

It is always interesting to take a look at a model architecture as well as its input and output formats and shapes. To do this, use the Netron webapp.

Visit https://lutzroeder.github.io/netron/ and select {{Highlight|Open model}}, then choose the <code>model.tflite</code> file from Teachable Machine. Click <code>sequential_1_input</code>: we observe that the input is of type <code>uint8</code> and of size <code>[1, 224, 224, 3]</code>. Now let's look at the outputs: in this example we have 6 classes, so we see that the output shape is <code>[1,6]</code>. The quantization parameters are also reported. Refer to part 3 for how to use them; a short example is also given after the figure below.
<div class="res-img">

[[File:tm_netron.png|center|alt=Model visualization|Model visualization]]</div>
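For reference, these quantization parameters are all you need to convert the raw <code>uint8</code> output of the network back into class probabilities. The helper below is a minimal sketch, not code from the function pack; the scale and zero-point values are the ones reported for this example model and will differ for yours.

<source lang="c">
#include <stdint.h>

/* Dequantize one uint8 network output: value = scale * (q - zero_point).
   For this example model the output tensor uses scale = 0.00390625 (1/256) and
   zero_point = 0, so a raw value of 255 maps to a probability of about 0.996. */
static float dequantize_u8(uint8_t q, float scale, int32_t zero_point)
{
    return scale * (float)((int32_t)q - zero_point);
}
</source>

Applying this helper to each of the 6 output values gives per-class scores between 0 and 1.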


= Porting to a target board =

== STM32H747I-DISCO ==

In this part we will use the <code>stm32ai</code> command line tool to convert the TensorFlow Lite model into optimized C code for STM32.
<!--
{{Info|For instructions on how to setup X-Cube-AI command line environment, please refer to [[How_to_use_STM32Cube.AI_command_line#Setting_the_environment|this article]]
 }}
-->

{{Warning|Warning: FP-AI-VISION1 v3.0.0 is based on X-Cube-AI version 6.0.0. You can check your version of Cube.AI by running <code>stm32ai --version</code>}}

Start by opening a shell in your workspace directory, then execute the following command:
 {{PC$}} cd <path to your workspace>

 {{PC$}} stm32ai generate -m model.tflite -v 2

The expected output is:
<source lang="text" highlight="39-40">
Neural Network Tools for STM32AI v1.4.1 (STM.ai v6.0.0-RC6)
Created date       : date
Parameters         : generate -m model.tflite -v 2

-- Importing model
 model files : /path/to/workspace/model.tflite
 model type  : tflite (tflite)
-- Importing model - done (elapsed time 0.531s)
-- Rendering model
-- Rendering model - done (elapsed time 0.184s)
-- Generating C-code
Creating /path/to/workspace/stm32ai_output/network.c
Creating /path/to/workspace/stm32ai_output/network_data.c
Creating /path/to/workspace/stm32ai_output/network.h
Creating /path/to/workspace/stm32ai_output/network_data.h
-- Generating C-code - done (elapsed time 0.782s)

Creating report file /path/to/workspace/stm32ai_output/network_generate_report.txt

Exec/report summary (generate)
------------------------------------------------------------------------------------------------------------------------
model file         : /path/to/workspace/model.tflite
type               : tflite
c_name             : network
compression        : None
quantize           : None
L2r error          : NOT EVALUATED
workspace dir      : /path/to/workspace/stm32ai_ws
output dir         : /path/to/workspace/stm32ai_output

model_name         : model
model_hash         : bc9c8f8c7d3364832d581f05626edf2a
input              : sequential_1_input [150528 items, 147.00 KiB, ai_u8, scale=0.007843137718737125, zero_point=127, (224, 224, 3)]
inputs (total)     : 147.00 KiB
output             : nl_71_fmt [6 items, 6 B, ai_u8, scale=0.00390625, zero_point=0, (1, 1, 6)]
outputs (total)    : 6 B
params #           : 517,794 items (526.59 KiB)
macc               : 58,587,764
weights (ro)       : 539,232 B (526.59 KiB)
activations (rw)   : 610,784 B (596.47 KiB)
ram (total)        : 761,318 B (743.47 KiB) = 610,784 + 150,528 + 6

Model name - model ['sequential_1_input'] ['conversion_72']
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
id   layer (type)                 shape                  param/size        macc        connected to         |   c_size   c_macc              c_type                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
0    sequential_1_input (Input)   (h:224, w:224, c:3)                                                       |                               
     conversion_0 (Conversion)    (h:224, w:224, c:3)                      301,056     sequential_1_input   |                                conv(i)[0]             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
1    pad_1 (Pad)                  (h:225, w:225, c:3)                                  conversion_0         |                               
-------------------------------------------------------------------------------------------------

( ... )

--------------------------------------------------------------------
71   nl_71 (Nonlinearity)         (c:6)                                    90          dense_70             |            +12(+13.3%)         nl()/conv(i)/o[66, 67] 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
72   conversion_72 (Conversion)   (c:6)                                    12          nl_71                |            -12(-100.0%)       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
model/c-model: macc=61,211,116/58,587,764 -2,623,352(-4.3%) weights=539,232/539,232  activations=--/610,784 io=--/150,534

( ... )

Complexity report per layer - macc=58,587,764 weights=539,232 act=610,784 ram_io=150,534
---------------------------------------------------------------------------------
id   name           c_macc                    c_rom                     c_id    
---------------------------------------------------------------------------------
0    conversion_0   |                  0.5%   |                  0.0%   [0]     
2    conv2d_2       ||||||||||||       9.2%   |                  0.1%   [1]     
3    conv2d_3       ||||               3.1%   |                  0.0%   [2]     
4    conv2d_4       ||||               2.7%   |                  0.0%   [3]     
5    conv2d_5       |||||||||||        8.2%   |                  0.1%   [4]     
7    conv2d_7       |||                2.3%   |                  0.1%   [5]     

( ... )

64   conv2d_64      ||                 1.6%   |||                3.7%   [59]    
65   conv2d_65      |                  0.3%   |                  0.8%   [60]    
66   conv2d_66      ||||               3.1%   ||||               7.1%   [61]    
67   conv2d_67      ||||||||||||||||  12.1%   ||||||||||||||||  27.5%   [62]    
69   dense_69       |                  0.2%   |||||||||||||     23.8%   [63]    
70   dense_70       |                  0.0%   |                  0.1%   [64, 65]
71   nl_71          |                  0.0%   |                  0.0%   [66, 67]</source>

This command generates four files under <code>workspace/stm32ai_output/</code>:

* network.c
* network_data.c
* network.h
* network_data.h

Let's take a look at the highlighted lines: we learn that the model uses 526.59 Kbytes of weights (read-only memory) and 596.47 Kbytes of activations. As the STM32H747xx MCUs do not have 596.47 Kbytes of contiguous RAM, we need to use the external SDRAM present on the STM32H747I-DISCO board. Refer to  [https://www.st.com/resource/en/user_manual/dm00504240-discovery-kit-with-stm32h747xi-mcu-stmicroelectronics.pdf UM2411] section 5.8 "SDRAM" for more information.

{{Info|These memory footprint figures might be different for your model, as they depend on the number of classes you have.
 }}
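The generated files expose a small C API, declared in <code>network.h</code> and <code>network_data.h</code>, that creates the network, binds the weight and activation buffers, and runs one inference. FP-AI-VISION1 already wraps these calls for you, so the following is only a reference sketch of the typical call sequence; the exact macro names and signatures to use are the ones found in your own generated headers.

<source lang="c">
#include "network.h"
#include "network_data.h"

/* Activation buffer: for this model it is larger than the internal RAM,
   so it has to live in external SDRAM (see the memory configurations below). */
AI_ALIGNED(4)
static ai_u8 activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

static ai_handle network = AI_HANDLE_NULL;

/* Create and initialize the network once at startup. */
static int net_bootstrap(void)
{
  ai_error err = ai_network_create(&network, AI_NETWORK_DATA_CONFIG);
  if (err.type != AI_ERROR_NONE)
    return -1;

  const ai_network_params params = AI_NETWORK_PARAMS_INIT(
      AI_NETWORK_DATA_WEIGHTS(ai_network_data_weights_get()),
      AI_NETWORK_DATA_ACTIVATIONS(activations));

  return ai_network_init(network, &params) ? 0 : -1;
}

/* Run one inference: in_data points to the 224x224x3 uint8 image,
   out_data receives the 6 uint8 class scores. */
static int net_infer(void *in_data, void *out_data)
{
  ai_buffer inputs[AI_NETWORK_IN_NUM]   = AI_NETWORK_IN;
  ai_buffer outputs[AI_NETWORK_OUT_NUM] = AI_NETWORK_OUT;

  inputs[0].data  = AI_HANDLE_PTR(in_data);
  outputs[0].data = AI_HANDLE_PTR(out_data);

  return ai_network_run(network, inputs, outputs); /* returns the number of batches processed */
}
</source>

FP-AI-VISION1 performs this initialization for you at startup, so you do not need to add this code; it is shown here only to explain what the generated files contain.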

=== Integration with FP-AI-VISION1 ===

In this part we will import our brand-new model into the FP-AI-VISION1 function pack. This function pack provides a software example of a food classification application. For more information, refer to the [https://www.st.com/en/embedded-software/fp-ai-vision1.html FP-AI-VISION1 page].

The main objective of this section is to replace the <code>network</code> and <code>network_data</code> files in FP-AI-VISION1 by the newly generated files and make a few adjustments to the code.
<!-- Not used 
{{Info| The whole project containing all modifications presented bellow is available for download [http://github_link here]}}
-->

==== Open the project ====

If it is not already done, download the zip file from ST website and extract the content to your workspace. It must now contain the following elements:

* model.tflite
* labels.txt
* stm32ai_output
* FP_AI_VISION1

If we take a look inside the function pack, starting from the '''FoodReco_MobileNetDerivative''' application, we can see two configurations for the model data type, as shown below.
<div class="res-img">

[[File:tm_fp_files.png|center|alt=FP-AI-VISION1 model data types|FP-AI-VISION1 model data types]]</div>


Since our model is a quantized one, we have to select the ''Quantized_Model'' directory.

Go into <code>workspace/FP_AI_VISION1/Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative/Quantized_Model/STM32CubeIDE</code> and double-click <code>.project</code>. STM32CubeIDE starts with the project loaded. You will notice two sub-projects, one for each core of the microcontroller: CM4 and CM7. As the CM4 core is not used, ignore it and work with the CM7 project.

==== Replacing the network files ====

The model files are located in the <code>Src</code> and <code>Inc</code> directories under <code>workspace/FP_AI_VISION1/Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative/Quantized_Model/CM7/</code>.

Delete the following files and replace them with the ones from <code>workspace/stm32ai_output</code>:

In <code>Src</code>:

* network.c
* network_data.c

In <code>Inc</code>:

* network.h
* network_data.h

==== Updating the labels and display ====

In this step we will update the labels for the network output. The <code>labels.txt</code> file downloaded from Teachable Machine can help you do this. In our example, the content of this file looks like this:
<pre>0 SensorTile
1 IoTNode
2 STLink
3 Craddle Ext
4 Fanout
5 Background</pre>


From STM32CubeIDE, open <code>fp_vision_app.c</code>. Go to line 123, where <code>output_labels</code> is defined, and update this variable with our label names:
{{Snippet | category=AI | component=Application | snippet=<source lang="c">

// fp_vision_app.c line 123

const char* output_labels[AI_NET_OUTPUT_SIZE] = {
    "SensorTile", "IoTNode", "STLink", "Craddle Ext", "Fanout", "Background"};</source>

}}

While we're here, we'll also update the display mode so that it shows the camera image instead of food logos. Go to around line 224 and update the <code>App_Output_Display</code> function: at the top of the function, set the <code>display_mode</code> variable to 1.

{{Snippet | category=AI | component=Application | snippet=<source lang="c">

static void App_Output_Display(AppContext_TypeDef *App_Context_Ptr)
{
  static uint32_t occurrence_number = NN_OUTPUT_DISPLAY_REFRESH_RATE;
  static uint32_t display_mode = 1; // Was 0</source>

}}

==== Cropping the image ====
Teachable Machine crops the webcam image to fit the model input size. In FP-AI-VISION1, the image is resized to the model input size, hence losing the aspect ratio. We will change this default behavior and implement a crop of the camera image.

In order to have square images and avoid image deformation we are going to crop the camera image using the DCMI. The goal of this step is to go from the 640x480 resolution to a 480x480 resolution.

First, edit <code>fp_vision_camera.h</code> line 60 to update the <code>CAM_RES_WIDTH</code> define to 480 pixels: 

{{Snippet | category=AI | component=Application | snippet=<source lang="c" highlight="4">

//fp_vision_camera.h line 60

#if CAMERA_CAPTURE_RES == VGA_640_480_RES
#define CAMERA_RESOLUTION CAMERA_R640x480
#define CAM_RES_WIDTH 480 // Was 640
#define CAM_RES_HEIGHT 480</source>

}}

Then, edit <code>fp_vision_camera.c</code> located in <code>Application/</code>.

Modify the <code>CAMERA_Init</code> function (line 59) to configure DCMI cropping (update the function with the highlighted code below): 

{{Snippet | category=AI | component=Application | snippet=<source lang="c" highlight="17,26-58">
void CAMERA_Init(CameraContext_TypeDef* Camera_Context_Ptr)
{
  CAMERA_Context_Init(Camera_Context_Ptr);

  /* Reset and power down camera to be sure camera is Off prior start */
  BSP_CAMERA_PwrDown(0);

  /* Wait delay */
  HAL_Delay(200);

  /* Initialize the Camera */
  if (BSP_CAMERA_Init(0, CAMERA_RESOLUTION, CAMERA_PF_RGB565) != BSP_ERROR_NONE)
  {
    Error_Handler();
  }

  __HAL_RCC_MDMA_CLK_ENABLE();

  (...)

  /* Set camera mirror / flip configuration */
  CAMERA_Set_MirrorFlip(Camera_Context_Ptr, Camera_Context_Ptr->mirror_flip);

  HAL_Delay(100);

  /* Center-crop the 640x480 frame to 480x480 */
  const uint32_t x0 = (640 - 480) / 2;
  const uint32_t y0 = 0;

  /* Note: 1 px every 2 DCMI_PXCLK (8-bit interface in RGB565) */
  HAL_DCMI_ConfigCrop(&hcamera_dcmi,
                      x0 * 2,
                      y0,
                      CAM_RES_WIDTH * 2 - 1,
                      CAM_RES_HEIGHT - 1);

  HAL_DCMI_EnableCrop(&hcamera_dcmi);

  /* [REMOVED] Start the Camera Capture through the BSP
  if (BSP_CAMERA_Start(0, (uint8_t *)Camera_Context_Ptr->camera_capture_buffer, CAMERA_MODE_CONTINUOUS) != BSP_ERROR_NONE)
  {
    while(1);
  }
  */

  /* Wait for the camera initialization after HW reset */
  HAL_Delay(200);

  /*
   * Start the Camera Capture
   * Using intermediate line buffer in D2-AHB domain to support high pixel clocks.
   */
  if (HAL_DCMIEx_Start_DMA_MDMA(&hcamera_dcmi, CAMERA_MODE_CONTINUOUS,
                                (uint8_t *)Camera_Context_Ptr->camera_capture_buffer,
                                CAM_LINE_SIZE, CAM_RES_HEIGHT) != HAL_OK)
  {
    while(1);
  }

#if MEMORY_SCHEME == FULL_INTERNAL_MEM_OPT
  /* Wait until camera acquisition of the first frame is completed => frame ignored */
  while (Camera_Context_Ptr->new_frame_ready == 0)
  {
    BSP_LED_Toggle(LED_GREEN);
    HAL_Delay(100);
  }
  Camera_Context_Ptr->new_frame_ready = 0;
#endif
}</source>

}}

Now image cropping is enabled and the image is square.
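For reference, here is the arithmetic behind the crop window passed to <code>HAL_DCMI_ConfigCrop()</code> above. This is only an illustration of the values for this 640x480 RGB565 case; the macro names below are hypothetical and not part of the function pack.

<source lang="c">
/* DCMI crop window for a 640x480 RGB565 frame cropped to 480x480.
 * On the 8-bit DCMI interface, one RGB565 pixel takes 2 pixel-clock cycles,
 * so horizontal coordinates and sizes are expressed in pixel-clock counts. */
#define CROP_X0_PIXELS   ((640 - 480) / 2)     /* 80 pixels skipped on the left        */
#define DCMI_CROP_X0     (CROP_X0_PIXELS * 2)  /* 160 pixel-clock cycles               */
#define DCMI_CROP_XSIZE  (480 * 2 - 1)         /* 959: capture 480 pixels per line     */
#define DCMI_CROP_Y0     (0)                   /* start at the first line              */
#define DCMI_CROP_YSIZE  (480 - 1)             /* 479: the line count is zero-based    */
</source>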

=== Compiling the project ===

The function pack for quantized models comes in four different memory configurations :

* Quantized_Ext
* Quantized_Int_Fps
* Quantized_Int_Mem
* Quantized_Int_Split

As we saw in Part 2, the activation buffer requires more than 512 Kbytes of RAM. For this reason, we can only use the '''Quantized_Ext''' configuration, which places the activation buffer in external SDRAM. For more details on the memory configurations, refer to [https://www.st.com/resource/en/user_manual/dm00630755-artificial-intelligence-ai-and-computer-vision-function-pack-for-stm32h7-microcontrollers-stmicroelectronics.pdf UM2611] section 3.2.4 "Memory requirements".
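As an illustration of what this configuration does, a large buffer can be placed in external SDRAM by assigning it to a dedicated linker section. The sketch below uses a hypothetical section name; the actual section names and linker scripts used by FP-AI-VISION1 differ for each memory configuration, so this is only meant to show the idea.

<source lang="c">
#include <stdint.h>

/* Hypothetical example: place the activation buffer in external SDRAM.
 * ".sdram_buffers" must be defined in the linker script and mapped to the
 * SDRAM address range of the STM32H747I-DISCO board (0xD0000000). */
__attribute__((section(".sdram_buffers"), aligned(32)))
static uint8_t nn_activation_buffer[610784];  /* 596.47 Kbytes, as reported by stm32ai */
</source>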

To compile only the Quantized_Ext configuration, select <code>Project &gt; Properties</code> from the top bar. Then select C/C++ Build from the left pane. Click manage configurations, then delete all configurations that are not Quantized_Ext, so that only one configuration is left.
<div class="res-img">

[[File:tm_del_config.png|center|alt=Memory configuration settings|Memory configuration settings]]</div>


Clean the project by selecting <code>Project &gt; Clean...</code> and clicking <code>Clean</code>.

Finally, build the project by clicking <code>Project &gt; Build All</code>.

When the compilation is complete, a file named <code>STM32H747I_DISCO_CM7.hex</code> is generated in
<code>workspace > FP_AI_VISION1 > Projects > STM32H747I-DISCO > Applications > FoodReco_MobileNetDerivative > Quantized_Model > STM32CubeIDE > STM32H747I_DISCO > Quantized_Ext </code>


=== Flashing the board ===

Connect the STM32H747I-DISCO to your PC via a Micro-USB to USB cable. Open STM32CubeProgrammer and connect to ST-LINK. Then flash the board with the hex file.
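Alternatively, the board can be flashed from the command line with the STM32CubeProgrammer CLI. The command below is a sketch: it assumes that <code>STM32_Programmer_CLI</code> is in your PATH and that it is run from the directory containing the hex file.

 {{PC$}} STM32_Programmer_CLI -c port=SWD -w STM32H747I_DISCO_CM7.hex -v -rst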

=== Testing the model ===

Connect the camera to the STM32H747I-DISCO board using a flex cable. To have the image in the upright position, the camera must be placed with the flex cable facing up as shown in the figure below.
Once the camera is connected, power on the board and press the reset button. After the "Welcome Screen", you will see the camera preview and output prediction of the model on the LCD Screen.
<div class="res-img">

[[File:tm_output.png|center|alt=Model inference running onboard|Model inference running on target]]</div>


== Troubleshooting ==
You may notice that once the model is running on the STM32, its performance is not as good as expected. Possible reasons are the following: 

* '''Quantization''': the quantization process can reduce the performance of the model, as going from a 32-bit floating-point to an 8-bit integer representation means a loss of precision (see the short sketch below).
* '''Camera''': the webcam used for training the model is different from the camera on the Discovery board. This difference between the training data and the inference data can explain a loss in performance.
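The sketch below illustrates the precision loss mentioned in the first bullet. It is a standalone example, not code from the function pack; the scale and zero-point values are the ones reported by <code>stm32ai</code> for the output of this example model.

<source lang="c">
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Error introduced by the 8-bit affine quantization q = round(x / scale) + zero_point. */
static float quantization_error(float x, float scale, int32_t zero_point)
{
    int32_t q = (int32_t)lroundf(x / scale) + zero_point;
    if (q < 0)   q = 0;      /* clamp to the uint8 range */
    if (q > 255) q = 255;
    float x_back = scale * (float)(q - zero_point);   /* value actually seen by the next layer */
    return x_back - x;
}

int main(void)
{
    /* Output tensor of the example model: scale = 0.00390625, zero_point = 0.
       A score of 0.5004 is stored as 128 and read back as 0.5, a small error that
       is introduced on every quantized tensor of the network. */
    printf("error = %f\n", quantization_error(0.5004f, 0.00390625f, 0));
    return 0;
}
</source>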
<noinclude>

[[Category:Artifical Intelligence|06]]
{{PublicationRequestId | 16309 | 2020-06-03 }}	</noinclude>
(7 intermediate revisions by 3 users not shown)
Line 26: Line 26:
   
 
* [https://www.st.com/en/development-tools/stm32cubeide.html STM32Cube IDE]
 
* [https://www.st.com/en/development-tools/stm32cubeide.html STM32Cube IDE]
* [https://www.st.com/en/embedded-software/x-cube-ai.html X-Cube-AI] '''version 5.1.0''' command line tool
+
* [https://www.st.com/en/embedded-software/x-cube-ai.html X-Cube-AI] '''version 6.0.0''' command line tool
* [https://www.st.com/en/embedded-software/fp-ai-vision1.html FP-AI-VISION1] version 2.0.0
+
* [https://www.st.com/en/embedded-software/fp-ai-vision1.html FP-AI-VISION1] version 3.0.0
 
* [https://www.st.com/en/development-tools/stm32cubeprog.html STM32CubeProgrammer]
 
* [https://www.st.com/en/development-tools/stm32cubeprog.html STM32CubeProgrammer]
   
Line 44: Line 44:
 
Let's get started. Open https://teachablemachine.withgoogle.com/, preferably from [https://www.google.com/chrome/ Chrome] browser.
 
Let's get started. Open https://teachablemachine.withgoogle.com/, preferably from [https://www.google.com/chrome/ Chrome] browser.
   
Click {{Highlight|Get started}}, then select {{Highlight| Image Project}}. You will be presented with the following interface.
+
Click {{Highlight|Get started}}, then select {{Highlight| Image Project}}, then {{Highlight| Standard image model}} (224x244px color images). You will be presented with the following interface.
   
 
<div class="res-img">
 
<div class="res-img">
Line 58: Line 58:
   
 
To add images with your webcam, click the webcam icon and record some images. If you have image files on your computer, click upload and select the directory containing your images.
 
To add images with your webcam, click the webcam icon and record some images. If you have image files on your computer, click upload and select the directory containing your images.
  +
  +
The STM32H747 discovery kit combined with the B-CAMS-OMV camera daughter board can be used as a USB webcam.
  +
Using the ST kit for data collection will help to get better results as the same camera will be used for data collection and inference when the model will have been trained.
  +
  +
To use the ST kit as a webcam, simply program the board with the following binary of the function pack:
  +
  +
FP-AI-VISION1_V3.0.0/Projects/STM32H747I-DISCO/Applications/USB_Webcam/Binary/STM32H747I-DISCO_Webcam_V300.bin
  +
  +
Then plug a USB cable from the PC to the USB connector identified as USB OTG HS.
   
 
<div class="res-img">
 
<div class="res-img">
Line 123: Line 132:
 
  }}
 
  }}
 
-->
 
-->
{{Warning|Warning: FP-AI-VISION1 v2.0.0 is based on X-Cube-AI version 5.1.0. You can check your version of Cube.AI by running <code>stm32ai --version</code>}}
+
{{Warning|Warning: FP-AI-VISION1 v3.0.0 is based on X-Cube-AI version 6.0.0. You can check your version of Cube.AI by running <code>stm32ai --version</code>}}
   
   
Line 132: Line 141:
 
The expected output is:
 
The expected output is:
   
<source lang="text" highlight="37-38">
+
<source lang="text" highlight="23-24">
Neural Network Tools for STM32 v1.3.0 (AI tools v5.1.0)
+
Neural Network Tools for STM32AI v1.4.1 (STM.ai v6.0.0-RC6)
Running "generate" cmd...
+
Created date      : date
-- Importing model
+
Parameters        : generate -m model.tflite -v 2
model files : /path/to/workspace/model.tflite
 
model type  : tflite (tflite)
 
-- Importing model - done (elapsed time 0.531s)
 
-- Rendering model
 
-- Rendering model - done (elapsed time 0.184s)
 
-- Generating C-code
 
Creating /path/to/workspace/stm32ai_output/network.c
 
Creating /path/to/workspace/stm32ai_output/network_data.c
 
Creating /path/to/workspace/stm32ai_output/network.h
 
Creating /path/to/workspace/stm32ai_output/network_data.h
 
-- Generating C-code - done (elapsed time 0.782s)
 
   
Creating report file /path/to/workspace/stm32ai_output/network_generate_report.txt
+
Exec/report summary (generate)
 
+
------------------------------------------------------------------------------------------------------------------------
Exec/report summary (generate dur=1.500s err=0)
 
-----------------------------------------------------------------------------------------------------------------
 
 
model file        : /path/to/workspace/model.tflite
 
model file        : /path/to/workspace/model.tflite
type              : tflite (tflite)
+
type              : tflite
 
c_name            : network
 
c_name            : network
 
compression        : None
 
compression        : None
 
quantize          : None
 
quantize          : None
L2r error          : NOT EVALUATED
 
 
workspace dir      : /path/to/workspace/stm32ai_ws
 
workspace dir      : /path/to/workspace/stm32ai_ws
 
output dir        : /path/to/workspace/stm32ai_output
 
output dir        : /path/to/workspace/stm32ai_output
   
 
model_name        : model
 
model_name        : model
model_hash        : 2d2102c4ee97adb672ca9932853941b6
+
model_hash        : bc9c8f8c7d3364832d581f05626edf2a
input              : input_0 [150,528 items, 147.00 KiB, ai_u8, scale=0.003921568859368563, zero=0, (224, 224, 3)]
+
input              : sequential_1_input [150528 items, 147.00 KiB, ai_u8, scale=0.007843137718737125, zero_point=127, (224, 224, 3)]
input (total)     : 147.00 KiB
+
inputs (total)     : 147.00 KiB
output            : nl_71 [6 items, 6 B, ai_i8, scale=0.00390625, zero=-128, (6,)]
+
output            : nl_71_fmt [6 items, 6 B, ai_u8, scale=0.00390625, zero_point=0, (1, 1, 6)]
output (total)     : 6 B
+
outputs (total)   : 6 B
 
params #          : 517,794 items (526.59 KiB)
 
params #          : 517,794 items (526.59 KiB)
macc              : 63,758,922
+
macc              : 58,587,764
weights (ro)      : 539,232 (526.59 KiB)  
+
weights (ro)      : 539,232 B (526.59 KiB)  
activations (rw)  : 853,648 (833.64 KiB)  
+
activations (rw)  : 610,784 B (596.47 KiB)  
ram (total)        : 1,004,182 (980.65 KiB) = 853,648 + 150,528 + 6
+
ram (total)        : 761,318 B (743.47 KiB) = 610,784 + 150,528 + 6
   
------------------------------------------------------------------------------------------------------------------
+
Model name - model ['sequential_1_input'] ['conversion_72']
id  layer (type)              output shape      param #    connected to            macc          rom         
+
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------
+
id  layer (type)                shape                  param/size        macc        connected to        |  c_size  c_macc              c_type               
0  input_0 (Input)            (224, 224, 3)                                                                     
+
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
    conversion_0 (Conversion)  (224, 224, 3)                input_0                  301,056                   
+
0    sequential_1_input (Input)  (h:224, w:224, c:3)                                                      |                             
------------------------------------------------------------------------------------------------------------------
+
    conversion_0 (Conversion)    (h:224, w:224, c:3)                      301,056    sequential_1_input  |                                conv(i)[0]           
1  pad_1 (Pad)                (225, 225, 3)                conversion_0                                       
+
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------
+
1    pad_1 (Pad)                  (h:225, w:225, c:3)                                  conversion_0        |                             
2  conv2d_2 (Conv2D)          (112, 112, 16)    448        pad_1                    5,820,432      496         
+
-------------------------------------------------------------------------------------------------
    nl_2 (Nonlinearity)        (112, 112, 16)                conv2d_2                                           
 
------------------------------------------------------------------------------------------------------------------
 
   
 
( ... )
 
( ... )
   
------------------------------------------------------------------------------------------------------------------
+
--------------------------------------------------------------------
71  nl_71 (Nonlinearity)      (1, 1, 6)                    dense_70                102                       
+
71  nl_71 (Nonlinearity)        (c:6)                                    90          dense_70            |            +12(+13.3%)        nl()/conv(i)/o[66, 67]
------------------------------------------------------------------------------------------------------------------
+
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
72 conversion_72 (Conversion) (1, 1, 6)                     nl_71                                              
+
72   conversion_72 (Conversion)   (c:6)                                   12          nl_71               |            -12(-100.0%)     
------------------------------------------------------------------------------------------------------------------
+
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
model p=517794(526.59 KBytes) macc=63758922 rom=526.59 KBytes ram=833.64 KiB io_ram=147.01 KiB
+
model/c-model: macc=61,211,116/58,587,764 -2,623,352(-4.3%) weights=539,232/539,232  activations=--/610,784 io=--/150,534
   
+
( ... )
Complexity per-layer - macc=63,758,922 rom=539,232
 
------------------------------------------------------------------------------------------------------------------
 
id      layer (type)              macc                                    rom                                   
 
------------------------------------------------------------------------------------------------------------------
 
0      conversion_0 (Conversion)  ||                                0.5%  |                                0.0%
 
2      conv2d_2 (Conv2D)          |||||||||||||||||||||||||        9.1%  |                                0.1%
 
3      conv2d_3 (Conv2D)         ||||||||||                        3.5%  |                                0.0%
 
4      conv2d_4 (Conv2D)          |||||||                          2.5%  |                                0.0%
 
5      conv2d_5 (Conv2D)          ||||||||||||||||||||||||||        9.4%  |                                0.1%
 
7      conv2d_7 (Conv2D)          |||||||                          2.6%  |                                0.1%
 
   
  +
Complexity report per layer - macc=58,587,764 weights=539,232 act=610,784 ram_io=150,534
  +
---------------------------------------------------------------------------------
  +
id  name          c_macc                    c_rom                    c_id   
  +
---------------------------------------------------------------------------------
  +
0    conversion_0  |                  0.5%  |                  0.0%  [0]   
  +
2    conv2d_2      ||||||||||||      9.2%  |                  0.1%  [1]   
  +
3    conv2d_3      ||||              3.1%  |                  0.0%  [2]   
  +
4    conv2d_4      ||||              2.7%  |                  0.0%  [3]   
  +
5    conv2d_5      |||||||||||        8.2%  |                  0.1%  [4]   
  +
7    conv2d_7      |||                2.3%  |                  0.1%  [5]   
  +
 
 
( ... )
 
( ... )
 
+
 
64      conv2d_64 (Conv2D)        ||||                             1.5% |||||                            3.7%  
+
64   conv2d_64     ||                 1.6%   |||               3.7%   [59]   
65      conv2d_65 (Conv2D)        |                                 0.3% |                                 0.8%  
+
65   conv2d_65     |                 0.3%   |                 0.8%   [60]   
66      conv2d_66 (Conv2D)        ||||||||                          2.9% ||||||||                         7.1%  
+
66   conv2d_66     ||||               3.1%   ||||               7.1%   [61]   
67      conv2d_67 (Conv2D)        |||||||||||||||||||||||||||||||  11.3% |||||||||||||||||||||||||||||||  27.5%  
+
67   conv2d_67     ||||||||||||||||  12.1%   ||||||||||||||||  27.5%   [62]   
69     dense_69 (Dense)          |                                 0.2% ||||||||||||||||||||||||||      23.8%  
+
69   dense_69       |                 0.2%   |||||||||||||     23.8%   [63]   
70     dense_70 (Dense)          |                                 0.0% |                                 0.1%  
+
70   dense_70       |                 0.0%   |                 0.1%   [64, 65]
71     nl_71 (Nonlinearity)      |                                 0.0% |                                 0.0%  
+
71   nl_71         |                 0.0%   |                 0.0%   [66, 67]
------------------------------------------------------------------------------------------------------------------
 
 
</source>
 
</source>
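
For reference, the <code>input</code> line in the report above shows that the quantized model expects <code>ai_u8</code> data with <code>scale ≈ 0.00784</code> and <code>zero_point = 127</code>. The helper below is only an illustration of what these parameters mean (it is not part of the function pack, and the function name is arbitrary): with these values, pixel bytes 0 to 255 map to roughly the [-1, +1] range seen by the network.

<source lang="c">
#include <stdint.h>

/* Illustration only: standard affine dequantization used for TFLite models,
 * real_value = scale * (quantized_value - zero_point).
 * scale and zero_point are taken from the Cube.AI report above. */
static inline float dequantize_input_u8(uint8_t q)
{
  const float   scale      = 0.007843137718737125f;
  const int32_t zero_point = 127;

  return scale * ((int32_t)q - zero_point);  /* 0 -> ~-1.0, 255 -> ~+1.0 */
}
</source>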
 
This command generates four files under <code>workspace/stm32ai_output/</code>:

* network.c
* network_data.c
* network.h
* network_data.h
   
Let's take a look at the highlighted lines: the model uses 526.59 Kbytes of weights (read-only memory) and 596.47 Kbytes of activations (read-write memory). As the STM32H747xx MCUs do not have a contiguous internal RAM region large enough for this activation buffer (the largest, the AXI SRAM, is 512 Kbytes), we need to use the external SDRAM present on the STM32H747I-DISCO board. Refer to [https://www.st.com/resource/en/user_manual/dm00504240-discovery-kit-with-stm32h747xi-mcu-stmicroelectronics.pdf UM2411] section 5.8 "SDRAM" for more information.
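As an illustration of what using the external SDRAM means in practice, a buffer of this size can be placed in a dedicated linker section mapped to the SDRAM address range. The sketch below is not the function pack's actual code (the section name <code>.sdram</code> and the buffer name are assumptions); FP-AI-VISION1 already provides ready-made memory configurations that handle this placement.

<source lang="c">
#include <stdint.h>

/* Activation buffer size reported by Cube.AI, in bytes */
#define NN_ACTIVATIONS_SIZE  (610784u)

/* Sketch only: place the activation buffer in a section that the linker
 * script locates in the external SDRAM (for example at 0xD0000000,
 * the FMC SDRAM bank address used on the STM32H747I-DISCO). */
__attribute__((section(".sdram"), aligned(32)))
static uint8_t nn_activations[NN_ACTIVATIONS_SIZE];
</source>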
   
 
{{Info|These memory footprint figures may differ for your model, as they depend on the number of classes you have.
 
5 Background</pre>
   
From STM32CubeIDE, open <code>fp_vision_app.c</code>. Go to line 123, where the <code>output_labels</code> variable is defined, and update it with our label names:
 
{{Snippet | category=AI | component=Application | snippet=
<source lang="c">
// fp_vision_app.c line 123
const char* output_labels[AI_NET_OUTPUT_SIZE] = {
    "SensorTile", "IoTNode", "STLink", "Craddle Ext", "Fanout", "Background"};</source>
}}
   
While we're here, let's also update the display mode so that the camera image is shown instead of the food logos. Go to the <code>App_Output_Display</code> function around line 224: at the top of the function, set the <code>display_mode</code> variable to 1.
   
 
{{Snippet | category=AI | component=Application | snippet=
<source lang="c">
{
   static uint32_t occurrence_number = NN_OUTPUT_DISPLAY_REFRESH_RATE;
   static uint32_t display_mode = 1; // Was 0
</source>
}}
 
In order to have square images and avoid image deformation, we are going to crop the camera image using the DCMI. The goal of this step is to go from the 640x480 resolution down to a 480x480 resolution.
   
First, edit <code>fp_vision_camera.h</code> line 60 to update the <code>CAMERA_WIDTH</code> define to 480 pixels:
   
 
{{Snippet | category=AI | component=Application | snippet=
<source lang="c" highlight="4">
//fp_vision_camera.h line 60
#if CAMERA_CAPTURE_RES == VGA_640_480_RES
#define CAMERA_RESOLUTION CAMERA_R640x480
#define CAMERA_WIDTH      480 // Was 640
(...)
</source>
}}
 
Then, edit <code>fp_vision_camera.c</code> located in <code>Application/</code>.

Modify the <code>CAMERA_Init</code> function (line 59) to configure DCMI cropping (update the function with the highlighted code below):
   
 
{{Snippet | category=AI | component=Application | snippet=
<source lang="c" highlight="14-27">
void CAMERA_Init(CameraContext_TypeDef* Camera_Context_Ptr)
{
  CAMERA_Context_Init(Camera_Context_Ptr);

  __HAL_RCC_MDMA_CLK_ENABLE();

  (...)

  /* Set camera mirror / flip configuration */

  (...)

  HAL_DCMI_EnableCrop(&hcamera_dcmi);

  /* Wait for the camera initialization after HW reset */
  HAL_Delay(200);

  /*
  * Start the Camera Capture
  * Using intermediate line buffer in D2-AHB domain to support high pixel clocks.
  */
  if (HAL_DCMIEx_Start_DMA_MDMA(&hcamera_dcmi, CAMERA_MODE_CONTINUOUS,
                                (uint8_t *)Camera_Context_Ptr->camera_capture_buffer,
                                CAM_LINE_SIZE, CAM_RES_HEIGHT) != HAL_OK)
  {
    while(1);
  }
</source>
}}
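
The crop window itself is set up in the part of <code>CAMERA_Init</code> elided above. As a rough sketch of the idea (the values and helper below are illustrative, not the function pack's exact code): with RGB565 data on the 8-bit DCMI interface each pixel takes two pixel clocks, so the horizontal parameters of <code>HAL_DCMI_ConfigCrop()</code> are counted in pixel clocks while the vertical ones are counted in lines.

<source lang="c">
#include "stm32h7xx_hal.h"

extern DCMI_HandleTypeDef hcamera_dcmi;  /* DCMI handle used by the function pack */

/* Sketch only: crop a centered 480x480 window out of the 640x480 VGA frame. */
static void Camera_Crop_480x480(void)
{
  const uint32_t x_offset = (640 - 480) / 2;   /* 80-pixel margin on each side */

  HAL_DCMI_ConfigCrop(&hcamera_dcmi,
                      x_offset * 2,            /* X0: horizontal offset, in pixel clocks */
                      0,                       /* Y0: first line                         */
                      (480 * 2) - 1,           /* XSize: pixel clocks to capture - 1     */
                      480 - 1);                /* YSize: lines to capture - 1            */
  HAL_DCMI_EnableCrop(&hcamera_dcmi);
}
</source>

Because the cropping is done in the DCMI, the 480x480 window is extracted at acquisition time, so the capture buffer and all downstream processing only ever see square frames.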
 
* Quantized_Int_Split
   
As we saw in Part 2, the activation buffer requires more than 512 Kbytes of RAM. For this reason, we can only use the '''Quantized_Ext''' configuration, which places the activation buffer in the external SDRAM. For more details on the memory configurations, refer to [https://www.st.com/resource/en/user_manual/dm00630755-artificial-intelligence-ai-and-computer-vision-function-pack-for-stm32h7-microcontrollers-stmicroelectronics.pdf UM2611] section 3.2.4 "Memory requirements".
   
 
To compile only the Quantized_Ext configuration, select <code>Project &gt; Properties</code> from the top bar, then select C/C++ Build from the left pane. Click Manage Configurations and delete all configurations other than Quantized_Ext, so that only one configuration is left.
   
 
<noinclude>
[[Category:Artifical Intelligence|06]]
{{PublicationRequestId | 16309 | 2020-06-03 }}
</noinclude>