This article provides benchmarks of a set of well-known or reference pre-trained neural network models. Some STM32 results will be officially submitted to the MLPerf™ Tiny benchmark from MLCommons™.
1. Benchmark Results
1.1. STM32 High Performance
STM32 Board | STM32 characteristics | Model Source/Link | Flash Wgt. | RAM Buf. | Proc. Time | Cur. (mA) | Energy (mJ) at 3.3V | Version
---|---|---|---|---|---|---|---|---
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432), Freq 550MHz | mobilenet v1 0.25 128 quant tfl (source) | 500 KB | 200 KB | 10 ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432), Freq 550MHz | FoodReco quant h5 (derived from mobilenet, FP-AI-VISION1) | 500 KB | 200 KB | 10 ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432), Freq 550MHz | Anomaly Detection v0.5 tfl (MLPerf™ Tiny) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432), Freq 550MHz | Key Word Spotting v0.5 tfl (MLPerf™ Tiny) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432), Freq 550MHz | Image Classif. v0.5 tfl (MLPerf™ Tiny) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432), Freq 550MHz | Visual Wake Word v0.5 tfl (MLPerf™ Tiny) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H743 NUCLEO-H743ZI | Flash 2MB, RAM 1MB (512), Freq 480MHz | mobilenet v1 0.25 128 quant tfl (source) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H747 (SMPS) STM32H747I-DISCO | Cortex®-M7, Flash 2MB, RAM 1MB (0.5), Freq 400MHz(1) | mobilenet v1 0.25 128 quant tfl (source) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32H7A3 NUCLEO-H7A3ZI-Q | Flash 2MB, RAM 1.4MB (1.18), Freq 280MHz | mobilenet v1 0.25 128 quant tfl (source) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
(1) On the Cortex®-M7 core, 400 MHz in SMPS mode instead of the 480 MHz maximum in LDO mode.
1.2. STM32 Ultra Low Power
STM32 Board | STM32 characteristics | Model Source/Link | Flash Wgt. | RAM Buf. | Proc. Time | Cur. (mA) | Energy (mJ) at 3.3V | Version
---|---|---|---|---|---|---|---|---
STM32U585 B-U585I-IOT02A | Flash 2MB, RAM 786KB, Freq 160MHz | mobilenet v1 0.25 128 quant tfl (source) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB, RAM 640KB, Freq 120MHz | mobilenet v1 0.25 128 quant tfl (source) | KB | KB | ms | NA | NA | Cube AI 7.0.0, Cube IDE 1.7.0
2. Measurement process
In this benchmark, only the machine learning model inference processing is reported. In a complete application, the sensor acquisition, data conditioning, and pre-processing must also be considered.
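As a rough illustration, the inference time reported in the tables is only one term of the total per-sample latency of a full application. A minimal sketch, where the acquisition and conditioning figures are assumptions for the sake of the example (only the inference term corresponds to what the tables report):

```python
# Illustrative per-sample timing budget for a complete application.
# acquisition_ms and conditioning_ms are assumed values, not measurements;
# inference_ms is the kind of value reported in the benchmark tables.
acquisition_ms = 2.0    # sensor acquisition (assumed)
conditioning_ms = 3.0   # data conditioning and pre-processing (assumed)
inference_ms = 10.0     # model inference (what this benchmark reports)

total_ms = acquisition_ms + conditioning_ms + inference_ms
print(f"total latency per sample: {total_ms} ms")
```

With these assumed numbers, inference accounts for only two thirds of the end-to-end latency, which is why the surrounding steps must not be neglected when sizing an application.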
The STM32 characteristics column gives the available internal Flash size, the full internal RAM size, and the frequency. The RAM size includes the different kinds of memories and banks (TCM, SRAM, and so on). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; when the maximum RAM size available as a contiguous area differs from the full size, it is given between parentheses. The frequency indicated is the operating frequency used for the benchmark, generally the maximum frequency. The only exception is the STM32H747 Discovery kit, which operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz.
The memory footprints are the ones reported by X-CUBE-AI using the "Analyze" function (the version of X-CUBE-AI used is mentioned in the table).
The column Model Source/Link indicates the pre-trained ML model and its source: either how it was built / trained or where it can be downloaded. tfl stands for a TensorFlow™ Lite .tfl model, h5 for a Keras .h5 model, and quant for models quantized on 8 bits. The FP-AI-VISION1 models are located in the package directory FP-AI-VISION1_V3.0.0\Utilities\AI_resources.
The column Flash Wgt. reports the Flash memory occupied by the model weights.
The column RAM Buf. reports the RAM buffers used to store the model activations as well as the input and output buffers. Note that, to save RAM, the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options are selected (through the X-CUBE-AI Advanced Settings panel).
The column Proc. Time reports the model inference processing time. When the current / energy is reported, the measurement is done with the X-CUBE-AI "System Performance" application, following the process described in the Wiki article on power measurement. Otherwise, the "Validation on target" application is used. In all cases, when generating the application, the selected clock source is HSI: X-CUBE-AI first generates the optimal clock settings, then the clock source is changed to HSI and STM32CubeMX autonomously reconfigures the clock settings accordingly.
Cur. and Energy are the current and energy computed following the process described in the Wiki article on power measurement, specifically in its section 4.3.1 "Measure process when current is below 50 mA" (using 10 s windows for averaging), with HSI as clock source.
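The energy column can be reproduced offline from the other columns: average current times supply voltage gives power, and power times inference time gives energy per inference. A minimal sketch, where the numeric values are illustrative, not measured:

```python
def energy_mj(i_avg_ma: float, t_ms: float, v_supply: float = 3.3) -> float:
    """Energy per inference in mJ, from average current (mA), time (ms), voltage (V).

    mA * V = mW, and mW * ms = uJ, hence the division by 1000 to get mJ.
    """
    return i_avg_ma * v_supply * t_ms / 1000.0

# Illustrative values: 50 mA average current over a 10 ms inference at 3.3 V.
print(f"{energy_mj(50.0, 10.0):.2f} mJ")  # 1.65 mJ
```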
Accuracy is not reported. X-CUBE-AI does not modify the DL/ML model topology, so the impact on accuracy should be limited. Through the "Validation" application, X-CUBE-AI provides a way to measure the accuracy either on x86 or on the target, which can be used to check any impact on accuracy. When running the "Validation on target" application, several metrics are computed; one of them is the X-Cross, which provides error metrics between the original model executed in Python and the C model executed on the target. Random data can be used to compute the RMSE/MAE/L2R errors. For all models reported here, the L2R (scalar value of the relative 2-norm, or Euclidean distance, between the outputs of the original model and of the C model) is below 0.01, so the probability of a significant impact on accuracy is low. For more details on the metrics, refer to the X-CUBE-AI embedded documentation.
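For reference, the L2R metric described above (relative 2-norm between the original-model outputs and the C-model outputs) can be computed with NumPy as follows; the sample output vectors are hypothetical, not taken from any of the models in the tables:

```python
import numpy as np

def l2r(ref: np.ndarray, c_out: np.ndarray) -> float:
    """Relative 2-norm (Euclidean distance) between reference and C-model outputs."""
    return float(np.linalg.norm(ref - c_out) / np.linalg.norm(ref))

# Hypothetical outputs of the original (Python) model and of the C model.
ref = np.array([0.10, 0.70, 0.20])
c_out = np.array([0.1005, 0.6995, 0.2001])

print(l2r(ref, c_out) < 0.01)  # True: below the 0.01 threshold quoted in this article
```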
Note that an accuracy check is important when comparing a float model to a quantized model, or when using the weight compression feature of X-CUBE-AI for float models.