This article provides benchmarks of a set of well-known or reference pre-trained neural network models on STM32 boards. Some STM32 results will be officially submitted to the MLPerf™ Tiny benchmark from MLCommons™.
1. Benchmark results
STM32 Board | STM32 characteristics | Model Source/Link | Flash (weights) | RAM (buffers) | Processing time | Current (mA) | Energy (mJ) @ 3.3V | Version | Comments |
---|---|---|---|---|---|---|---|---|---|
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB, Freq 550MHz | mobilenet v1 0.25 128 quant source | 500 KB | 200 KB | 10 ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB, Freq 550MHz | Anomaly Detection v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB, Freq 550MHz | Key Word Spotting v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB, Freq 550MHz | Image Classif. v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H723 NUCLEO-H723ZG | Flash 1MB, RAM 564KB, Freq 550MHz | Visual Wake Word v0.5 MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H743 NUCLEO-H743ZI | Flash 2MB, RAM 1MB, Freq 480MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7, Flash 2MB, RAM 1MB, Freq 400MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | On Cortex® M7 core in SMPS mode 400MHz instead of 480 max in LDO |
STM32H7A3 NUCLEO-H7A3ZI-Q | Flash 2MB, RAM 1376KB, Freq 280MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32U585 B-U585I-IOT02A | Flash 2MB, RAM 786KB, Freq 160MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB, RAM 640KB, Freq 120MHz | mobilenet v1 0.25 128 quant source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
2. Measurement process
Only the machine learning inference is measured. In a complete application, sensor acquisition, data conditioning and pre-processing must also be taken into account.
The memory footprints are those reported by the X-CUBE-AI "Analyze" function (the X-CUBE-AI version used is given in the table). The input/output buffers are included, but the options allowing these buffers to be overlaid with the activations have been selected. The input/output buffer sizes are also reported.
RAM Model: buffers required to run the model, i.e. the activations and the input/output buffers, with the "" option activated.
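The RAM figure depends on whether the input/output buffers are overlaid with the activations. The simplified model below is a hypothetical helper, `ram_footprint_bytes`, not something provided by X-CUBE-AI. It assumes that overlaid I/O buffers are placed inside the activations buffer (which must then be at least as large as each of them) and that non-overlaid buffers are allocated separately; the exact sizes should always be taken from the "Analyze" report.

```python
def ram_footprint_bytes(activations: int, inputs: int, outputs: int,
                        overlay_io: bool) -> int:
    """Simplified RAM estimate for running a model (hypothetical helper).

    Assumption: with overlay, the input/output buffers live inside the
    activations buffer, so they add no RAM beyond that buffer, which must
    be at least as large as each I/O buffer. Without overlay, the three
    buffers are allocated separately.
    """
    if overlay_io:
        # I/O buffers share memory with the activations buffer.
        return max(activations, inputs, outputs)
    # Separate allocations: activations + input buffer + output buffer.
    return activations + inputs + outputs


if __name__ == "__main__":
    # Illustrative sizes only, not measured values from the table above.
    print(ram_footprint_bytes(150_000, 128 * 128 * 3, 1_001, overlay_io=True))
    print(ram_footprint_bytes(150_000, 128 * 128 * 3, 1_001, overlay_io=False))
```

With a hypothetical 150,000-byte activations buffer, a 49,152-byte input (128 × 128 × 3 bytes) and a 1,001-byte output, this estimate gives 150,000 bytes with overlay and 200,153 bytes without it.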
The inference time and the X-cross error are those reported by the "Validation on target" function. STM32Cube.AI does not modify the DL/ML model topology; the impact on accuracy should therefore be limited, and the X-cross error ensures that the difference... The clock source is always the HSI, and the core runs at its maximum frequency. Clock settings are configured automatically by X-CUBE-AI / STM32CubeMX.
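The X-cross validation compares the outputs of the generated C model with those of the original model and summarizes the difference with error metrics, such as an L2 relative error. The sketch below shows how such a metric can be computed from two output vectors, assuming the standard definition ||reference - predicted||₂ / ||reference||₂. It is an illustration only, not the X-CUBE-AI implementation; the exact metrics and formulas it reports are described in its documentation.

```python
import math
from typing import Sequence


def l2_relative_error(reference: Sequence[float],
                      predicted: Sequence[float]) -> float:
    """L2 relative error between reference-model and C-model outputs.

    Computes ||reference - predicted||_2 / ||reference||_2.
    """
    if len(reference) != len(predicted):
        raise ValueError("output vectors must have the same length")
    # Euclidean norm of the element-wise difference.
    diff = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)))
    # Euclidean norm of the reference output, used for normalization.
    norm = math.sqrt(sum(r * r for r in reference))
    if norm == 0.0:
        raise ValueError("reference output has zero norm")
    return diff / norm


if __name__ == "__main__":
    print(l2_relative_error([3.0, 4.0], [3.0, 4.5]))
```

For the reference output [3.0, 4.0] and a C-model output [3.0, 4.5], the difference norm is 0.5 and the reference norm is 5.0, giving a relative error of 0.1.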
The validation can also be done with a dataset...
Quantized case: through the CLI scripts, with data compression.
When power is measured, see https://wiki.st.com/stm32mcu/wiki/AI:How_to_measure_machine_learning_model_power_consumption_with_STM32Cube.AI_generated_application
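The table reports energy at 3.3 V next to the current and processing time. Assuming the energy per inference is the average current drawn during inference multiplied by the supply voltage and the inference time (an assumption; the actual measurement procedure is the one on the page linked above), the conversion can be sketched as follows. The helper name `inference_energy_mj` is hypothetical.

```python
def inference_energy_mj(current_ma: float, time_ms: float,
                        voltage_v: float = 3.3) -> float:
    """Energy of one inference in millijoules (hypothetical helper).

    Assumes E = I * V * t with the average current over the inference.
    mA * V gives mW, and mW * ms gives microjoules, so the product is
    divided by 1000 to obtain millijoules.
    """
    return current_ma * voltage_v * time_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical values: 50 mA average current during a 10 ms inference.
    print(inference_energy_mj(50.0, 10.0))
```

With these hypothetical values (50 mA, 10 ms, 3.3 V) the estimate is about 1.65 mJ per inference.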