This article provides benchmark results for a set of well-known or reference pre-trained neural network models. Some of the STM32 results will be officially submitted to the MLPerf™ Tiny benchmark from MLCommons™.
1. Benchmark results
STM32 Board | STM32 Characteristics | Model Source/Link | Flash (Weights) | RAM (Buffers) | Inference Time | Current (mA) | Energy (mJ) @ 3.3V | Version | Comments |
---|---|---|---|---|---|---|---|---|---|
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432KB), Freq 550MHz | mobilenet v1 0.25 128 quant source | 500 KB | 200 KB | 10 ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432KB), Freq 550MHz | Anomaly Detection v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432KB), Freq 550MHz | Keyword Spotting v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432KB), Freq 550MHz | Image Classification v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB (432KB), Freq 550MHz | Visual Wake Words v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H743 NUCLEO-H743ZI | Flash 2MB, RAM 1MB (512 KB), Freq 480MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7, Flash 2MB, RAM 1MB (512 KB), Freq 400MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | On the Cortex® M7 core, in SMPS mode: 400MHz instead of the 480MHz maximum in LDO mode |
STM32H7A3 NUCLEO-H7A3ZI-Q | Flash 2MB, RAM 1.4MB (1.18 MB), Freq 280MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32U585 B-U585I-IOT02A | Flash 2MB, RAM 786KB, Freq 160MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB, RAM 640KB, Freq 120MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
2. Measurement process
In this benchmark, only the inference of the machine learning model is reported. In a complete application, sensor acquisition, data conditioning and pre-processing must also be taken into account. The "STM32 Characteristics" column gives the available internal Flash size, the full internal RAM size and the maximum frequency. The RAM size includes all kinds of memories and banks (TCM, SRAM, etc.). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; the largest RAM size available in a contiguous area is given in parentheses when it differs from the full size. The frequency indicated is the operating frequency used for the benchmark, which is generally the maximum frequency. The only exception is the STM32H747 Discovery kit, which operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz.
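The contiguous-area constraint can be illustrated by merging RAM banks that sit end-to-end in the address map and keeping the largest merged block. The following is a minimal sketch, assuming each bank is described by a base address and a size in bytes; the bank list is made up for illustration and is not the memory map of any particular STM32 device.

```python
from typing import List, Optional, Tuple

# Each region is (base_address, size_in_bytes). The values used below are
# hypothetical and do not describe a real STM32 memory map.
Region = Tuple[int, int]


def largest_contiguous_ram(regions: List[Region]) -> int:
    """Return the size of the largest block made of banks that touch end-to-end."""
    best = 0
    run = 0
    run_end: Optional[int] = None
    for base, size in sorted(regions):
        if base == run_end:
            # This bank starts exactly where the previous one ends: extend the run.
            run += size
        else:
            # Gap in the address map: start a new contiguous run.
            run = size
        run_end = base + size
        best = max(best, run)
    return best


banks = [
    (0x2000_0000, 128 * 1024),
    (0x2400_0000, 320 * 1024),
    (0x3000_0000, 16 * 1024),
    (0x3000_4000, 16 * 1024),
]
# Total RAM vs. largest contiguous area, in KB.
print(sum(size for _, size in banks) // 1024, largest_contiguous_ram(banks) // 1024)  # prints: 480 320
```

In this made-up layout the total is 480 KB, but only 320 KB could host the X-CUBE-AI buffers, which is the kind of figure the table reports in parentheses.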
The memory footprints are those reported by the X-CUBE-AI "Analyze" function (the X-CUBE-AI version used is given in the table). The input/output buffers are included, but the options allowing these buffers to be overlaid with the activations have been selected. The input/output buffer sizes are also reported.
RAM (model): buffers required to run the model, i.e. the activation, input and output buffers, with the "" option activated.
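The effect of overlaying the input/output buffers with the activations can be approximated with a simple model. This is a rough sketch under stated assumptions, not the X-CUBE-AI allocator: without overlay the three buffers are assumed to be allocated separately; with overlay the input and output buffers are assumed to live inside the activation buffer, which must then be at least as large as input plus output. The `ModelBuffers` type and all sizes except the input are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelBuffers:
    """Buffer sizes in bytes (hypothetical values, for illustration only)."""
    activations: int
    inputs: int
    outputs: int


def estimate_ram(buffers: ModelBuffers, overlay_io: bool) -> int:
    """Rough RAM estimate under a simplified allocation model (assumption)."""
    if not overlay_io:
        # Separate allocation: the three buffers simply add up.
        return buffers.activations + buffers.inputs + buffers.outputs
    # Overlay: I/O buffers are placed inside the activation buffer.
    return max(buffers.activations, buffers.inputs + buffers.outputs)


# A 128x128 RGB int8 input is 128 * 128 * 3 = 49152 bytes; the activation and
# output sizes here are made up.
m = ModelBuffers(activations=100_000, inputs=128 * 128 * 3, outputs=1_001)
print(estimate_ram(m, overlay_io=False), estimate_ram(m, overlay_io=True))  # prints: 150153 100000
```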
The inference time and the X-Cross error are those reported by the "Validation on target" function. STM32Cube.AI does not modify the DL/ML model topology, so the impact on accuracy should be limited, and the X-Cross error ensures that the difference... The clock source is always HSI, and the maximum frequency is used. Clock settings are configured automatically by X-CUBE-AI / STM32CubeMX.
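A cross-comparison error of this kind compares the outputs of the reference model with the outputs produced on the target for the same inputs. The sketch below computes three common metrics (RMSE, MAE and the relative L2 error). It assumes these definitions; the exact metrics and formulas reported by the X-CUBE-AI validation may differ, and `xcross_metrics` is a hypothetical helper, not part of the tool.

```python
import math
from typing import Dict, Sequence


def xcross_metrics(reference: Sequence[float], target: Sequence[float]) -> Dict[str, float]:
    """Compare reference outputs with outputs produced on the target."""
    if len(reference) != len(target) or not reference:
        raise ValueError("expected two non-empty sequences of equal length")
    diffs = [r - t for r, t in zip(reference, target)]
    n = len(diffs)
    sq_err = sum(d * d for d in diffs)
    ref_norm = math.sqrt(sum(r * r for r in reference))
    return {
        "rmse": math.sqrt(sq_err / n),          # root mean squared error
        "mae": sum(abs(d) for d in diffs) / n,  # mean absolute error
        # L2 norm of the difference relative to the L2 norm of the reference.
        "l2r": math.sqrt(sq_err) / ref_norm if ref_norm else 0.0,
    }


print(xcross_metrics([3.0, 4.0], [3.0, 3.5]))
```

The relative L2 error is scale-independent, which makes it convenient for comparing models whose outputs have very different magnitudes.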
The validation can also be done with a dataset...
Quantized case: through CLI scripts, with data compression.
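The models marked "quant" in the table use quantized weights and activations. As an illustration only, the sketch below shows per-tensor affine int8 quantization of the kind used by TensorFlow Lite-style quantized models; it is an assumption that this is the scheme relevant here, and the `scale` and `zero_point` values in the usage note are arbitrary.

```python
from typing import List, Sequence


def quantize_int8(values: Sequence[float], scale: float, zero_point: int) -> List[int]:
    """Affine quantization: q = round(x / scale) + zero_point, clamped to int8.

    Note: Python's round() rounds halves to even; a real runtime may round
    halves differently (e.g. away from zero).
    """
    return [max(-128, min(127, round(x / scale) + zero_point)) for x in values]


def dequantize_int8(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Inverse mapping: x ~= scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]


q = quantize_int8([0.0, 1.0, 100.0, -100.0], scale=0.5, zero_point=-10)
print(q)                                    # prints: [-10, -8, 127, -128]
print(dequantize_int8(q, 0.5, -10))         # prints: [0.0, 1.0, 68.5, -59.0]
```

Values inside the representable range round-trip with an error of at most `scale / 2`, while values outside it (100.0 and -100.0 here) saturate, which is one source of the accuracy difference that the validation step measures.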
When power is measured, the procedure is described in https://wiki.st.com/stm32mcu/wiki/AI:How_to_measure_machine_learning_model_power_consumption_with_STM32Cube.AI_generated_application
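The table reports current in mA and energy in mJ at 3.3 V. Assuming the energy figure is derived as supply voltage × average current × inference time (E = V · I · t), the unit conversion can be sketched as follows: mA × V gives mW, and mW × ms gives µJ, so the product is divided by 1000 to obtain mJ. The constant-current assumption and the 50 mA / 10 ms values are hypothetical, not measured results.

```python
def energy_mj(current_ma: float, inference_ms: float, supply_v: float = 3.3) -> float:
    """Energy per inference in mJ, assuming a constant average current.

    mA * V = mW, and mW * ms = uJ, so divide by 1000 to get mJ.
    """
    return current_ma * supply_v * inference_ms / 1000.0


# Hypothetical example: 50 mA average current during a 10 ms inference.
print(energy_mj(50.0, 10.0))
```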