STM32 step by step advance


STM32 advanced step-by-step is designed for people who have already completed the five tutorials of the STM32 step-by-step series and who want to go further.


Advanced analog tutorial

In this tutorial, learn how to capture, filter and record sound with an STM32F769I-DISCO board. Go further and apply the cross-correlation algorithm in order to estimate the direction of the sound.


Advanced motor-control tutorial

In this tutorial, learn how to install and use ST Motor Profiler / STM32 Motor Control SDK, and increase your skills by running a motor.



1. STM32 Advanced DSP

1.1. Introduction

This tutorial....

1.2. Prerequisites

1.3. Step 1 - Quantization

1.4. Step 2 - Code generation

1.5. Step 3 - Testing

2. (Optional step) Quantize the model


2.1. Quantization script with TensorflowLite Converter

Model quantization reduces the weight size by a factor of 4 and can also speed up inference by up to a factor of 3. For more information, refer to the Cube.AI documentation about quantization.
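As a quick reminder of what quantization does numerically, the sketch below (a standalone illustration, not part of the tutorial scripts) maps float32 values to 8-bit integers through a scale and a zero point; storing one byte per weight instead of four is where the factor of 4 comes from:

import numpy as np

# Affine quantization of a float32 tensor to uint8 (illustrative values):
# q = round(x / scale) + zero_point    and    x ~ scale * (q - zero_point)
x = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)
scale, zero_point = 1.0 / 255.0, 0
q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
x_restored = scale * (q.astype(np.float32) - zero_point)
print(q)           # [  0  64 128 255]
print(x_restored)  # values close to the original x, within quantization error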

In this article we use the TensorFlow Lite Converter tool to quantize our MNIST network. In the stm32cubeai/example directory, create a new Python file named quantize.py and paste the following lines:

'''
Example of post-training quantization for the mnist_cnn Keras model
'''
from keras.datasets import mnist
import numpy as np
import tensorflow as tf

# input image dimensions
img_rows, img_cols = 28, 28

# load the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# reshape x_train to (samples, rows, cols, channels)
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)

# This generator gives the TFLite converter a representative dataset of inputs,
# used to calibrate the quantization ranges of the activations
def representative_dataset_gen():
    for img in x_train[:512]:
        img = img.astype(np.float32)
        img /= 255.0
        yield [np.expand_dims(img, axis=0)]

converter = tf.lite.TFLiteConverter.from_keras_model_file('./mnist_cnn.h5')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # Enable post-training quantization
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]  # Use INT8 built-in operators
converter.inference_input_type = tf.uint8    # Our input image will be in uint8 format
converter.inference_output_type = tf.float32 # The output from softmax will be in floating point

quant_model = converter.convert()  # Run the converter

# Write the quantized model to a file
with open('mnist_cnn_quant.tflite', 'wb') as f:
    f.write(quant_model)
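Running the script with your usual Python/TensorFlow environment produces mnist_cnn_quant.tflite. Optionally, you can check that the conversion really produced an 8-bit model by inspecting it with the TFLite Interpreter; this short sketch is not part of the original tutorial:

import tensorflow as tf

# Load the quantized model and print the input/output tensor details
interpreter = tf.lite.Interpreter(model_path='mnist_cnn_quant.tflite')
interpreter.allocate_tensors()

for detail in interpreter.get_input_details() + interpreter.get_output_details():
    # 'quantization' is a (scale, zero_point) pair; the input scale should be
    # close to 1/255 (about 0.0039), as discussed later in this article
    print(detail['name'], detail['dtype'], detail['quantization'])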

2.2. Analyse the quantized model

In a shell, run:

$ cd openmv/src/stm32ai/examples
$ stm32ai analyse -m mnist_cnn_quant.tflite
# You can compare with the floating point model
# stm32ai analyse -m mnist_cnn.h5

A comparison between the floating-point and the quantized models yields the following results:

Model type        ROM usage (weights)   RAM usage (activations and I/Os)
Floating point    136 KB                51 KB
Quantized         34 KB                 14 KB

We notice a factor of roughly 4 between the floating-point and the quantized model, due to the move from a 32-bit to an 8-bit representation of the weights and tensors: 136 KB of float32 weights becomes about 34 KB once each weight is stored on a single byte.
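If you want to cross-check these numbers, an optional sketch (not part of the tutorial, assuming the mnist_cnn.h5 model used above) is to derive the weight storage from the Keras parameter count:

from keras.models import load_model

# Load the floating-point Keras model and count its parameters
model = load_model('./mnist_cnn.h5')
n_params = model.count_params()

print('parameters      : %d' % n_params)
print('float32 weights : ~%.0f KB' % (n_params * 4 / 1024))  # 4 bytes per weight
print('int8 weights    : ~%.0f KB' % (n_params * 1 / 1024))  # 1 byte per weight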

2.3. Generate the code

To generate the C code, run:

stm32ai generate -m mnist_cnn_quant.tflite -o ../data

The generated .c and .h files are placed in stm32cubeai/data.

2.4. Edit the preprocessing function

Now that we are working with uint8 data instead of floating point, we need to update the ai_transform_input function located in nn_st.c.

If we inspect the mnist_cnn_quant.tflite file with Netron, we notice that the input is expected to be quantized with a scaling parameter of 0.0039. This value is equal to 1/255, meaning that the floating-point value equals the quantized value times 1/255. In this case we therefore do not need to apply any transformation to the input, since we already have greyscale values ranging from 0 to 255.

Back in the code, we just need to get rid of the conversion to floating point and copy the greyscale value directly into the neural network input buffer, as shown in the listing below.

void ai_transform_input(ai_buffer *input_net, image_t *img, ai_u8 *input_data,
                        rectangle_t *roi) {

  // Example for MNIST CNN
  // We don't need the cast to a floating-point buffer anymore
  // ai_float *_input_data = (ai_float *)input_data;

  // Fixed-point (16.16) ratios used to resize the ROI to the network input size
  int x_ratio = (int)((roi->w << 16) / input_net->width) + 1;
  int y_ratio = (int)((roi->h << 16) / input_net->height) + 1;

  for (int y = 0, i = 0; y < input_net->height; y++) {
    int sy = (y * y_ratio) >> 16;
    for (int x = 0; x < input_net->width; x++, i++) {
      int sx = (x * x_ratio) >> 16;
      uint8_t p = IM_GET_GS_PIXEL(img, sx + roi->x, sy + roi->y);

      // We don't need the conversion to floating point anymore
      // _input_data[i] = (float)(p / 255.0f);
      // instead we simply copy the greyscale value to the input buffer
      input_data[i] = p;
    }
  }
}
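If you want to reproduce the same behaviour on the desktop before flashing the board, a minimal sketch (assuming the mnist_cnn_quant.tflite file generated earlier) is to feed a raw uint8 image to the TFLite Interpreter without any scaling, exactly like the modified C function does:

import numpy as np
import tensorflow as tf
from keras.datasets import mnist

interpreter = tf.lite.Interpreter(model_path='mnist_cnn_quant.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Take one test image: raw greyscale values 0..255, no division by 255
(_, _), (x_test, y_test) = mnist.load_data()
img = x_test[0].reshape(1, 28, 28, 1).astype(np.uint8)

interpreter.set_tensor(input_details['index'], img)
interpreter.invoke()
probs = interpreter.get_tensor(output_details['index'])
print('predicted digit:', np.argmax(probs), 'expected:', y_test[0])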

2.5. Compile and run

A few modifications are required to compile the quantized model. First, edit stm32cubeai/cube.mk line 26 to add -lgcc. After the modification, the line should look like this:

LIBS += -l:NetworkRuntime_CM7_GCC.a -Lstm32cubeai/AI/Lib -lc -lm -lgcc

Then, edit the file stm32cubeai/Makefile line 20 to add some source files from the CMSIS-DSP library. After the modification, the full block of code should look like this:

SRCS += $(addprefix ../cmsis/src/dsp/SupportFunctions/,\
arm_float_to_q7.c\
arm_float_to_q15.c\
arm_q7_to_float.c\
arm_q15_to_float.c\
arm_q7_to_q15.c\
arm_q15_to_q7.c\
arm_fill_q7.c\
arm_fill_q15.c\
arm_copy_q15.c\
arm_copy_q7.c\
)

Note that 4 files have been added at the end of the list.

The steps to compile the firmware and run the Python code are exactly the same as for the floating-point case. Consider running make clean between builds.
