
This article provides benchmark results for a set of well-known or reference pre-trained neural network models. Some STM32 results will be officially submitted to the MLPerf™ Tiny benchmark from MLCommons™.

Information

STM32Cube.AI is a software tool that generates optimized C code for neural network inference on STM32. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).
The process used to measure inference time, current and energy is described below. The measurements were not made in a certified laboratory, but any user can reproduce them. The results are average values and vary with the input data (random data are currently used), the temperature and the STM32 device itself.
The data published in this article are not contractual.

1. Benchmark results

STM32 board	STM32 characteristics	Model source/link	Flash weights	RAM buffers	Processing time	Current (mA)	Energy (mJ) @ 3.3 V	Version	Comments
STM32H723 NUCLEO-H723ZG	Flash 1MB RAM 564KB (432KB) Freq 550MHz	mobilenet v1 0.25 128 quant source	500 KB	200 KB	10 ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32H723 NUCLEO-H723ZG	Flash 1MB RAM 564KB (432KB) Freq 550MHz	Anomaly Detection v0.5 MLPerf™ Tiny	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32H723 NUCLEO-H723ZG	Flash 1MB RAM 564KB (432KB) Freq 550MHz	Key Word Spotting v0.5 MLPerf™ Tiny	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32H723 NUCLEO-H723ZG	Flash 1MB RAM 564KB (432KB) Freq 550MHz	Image Classification v0.5 MLPerf™ Tiny	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32H723 NUCLEO-H723ZG	Flash 1MB RAM 564KB (432KB) Freq 550MHz	Visual Wake Word v0.5 MLPerf™ Tiny	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32H743 NUCLEO-H743ZI	Flash 2MB RAM 1MB (512 KB) Freq 480MHz	mobilenet v1 0.25 128 quant source	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32H747 SMPS STM32H747I-DISCO	Cortex® M7 Flash 2MB RAM 1MB (512 KB) Freq 400MHz	mobilenet v1 0.25 128 quant source	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0	Runs on the Cortex® M7 core in SMPS mode at 400MHz instead of the 480MHz maximum in LDO mode
STM32H7A3 NUCLEO-H7A3ZI-Q	Flash 2MB RAM 1.4MB (1.18 MB) Freq 280MHz	mobilenet v1 0.25 128 quant source	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32U585 B-U585I-IOT02A	Flash 2MB RAM 786KB Freq 160MHz	mobilenet v1 0.25 128 quant source	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0
STM32L4R5 NUCLEO-L4R5ZI	Flash 2MB RAM 640KB Freq 120MHz	mobilenet v1 0.25 128 quant source	KB	KB	ms	NA	NA	Cube AI v7.0.0 Cube IDE v1.7.0

2. Measure process

In this benchmark, only the machine learning model inference is reported. In a complete application, sensor acquisition, data conditioning and pre-processing must also be considered. The "STM32 characteristics" column gives the available internal Flash size, the full internal RAM size and the frequency. The RAM size includes the different kinds of memories and banks (TCM, SRAM, etc.). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; the maximum RAM size available in a contiguous area is given in parentheses when it differs from the full size. The frequency indicated is the operating frequency used for the benchmark, which is generally the maximum frequency. The only exception is the STM32H747 Discovery kit, which operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz.
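As a rough illustration of the contiguous-memory constraint described above, the following Python sketch checks whether a model's reported footprint fits a given device. `Device` and `fits` are hypothetical helpers written for this article, not part of X-CUBE-AI. The figures come from the first row of the results table, under the assumption that 1 MB = 1024 KB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Memory characteristics of an STM32 device, in KB (hypothetical helper)."""
    name: str
    flash_kb: int
    ram_kb: int
    contiguous_ram_kb: int  # largest contiguous RAM area; equals ram_kb when no "()" value is given


def fits(device: Device, flash_weights_kb: int, ram_buffers_kb: int) -> bool:
    """Return True if the weights fit in Flash and the buffers fit in contiguous RAM."""
    return (flash_weights_kb <= device.flash_kb
            and ram_buffers_kb <= device.contiguous_ram_kb)


# Values from the STM32H723 row of the table (assumption: 1 MB = 1024 KB).
h723 = Device("STM32H723", flash_kb=1024, ram_kb=564, contiguous_ram_kb=432)

print(fits(h723, flash_weights_kb=500, ram_buffers_kb=200))  # True
print(fits(h723, flash_weights_kb=500, ram_buffers_kb=500))  # False: above the 432 KB contiguous area
```

This check ignores the Flash and RAM needed by the application code itself, so a real project needs some margin on top of the model footprint.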

The memory footprints are those reported by X-CUBE-AI using the "Analyze" function (the X-CUBE-AI version used is given in the table). The input / output buffers are included, with the options selected that allow these buffers to be overlaid with the activations. The input / output buffer sizes are also reported.

RAM Model: buffers required to run the model, activations / input / output buffers with the "" option activated.

The inference time and the X-Cross error are those reported by "Validation on target". STM32Cube.AI does not modify the DL/ML model topology. The impact on accuracy should be limited and the X-Cross error ensures that the difference... The clock source is always HSI, at the maximum frequency. Clock settings are configured automatically by X-CUBE-AI / STM32CubeMX.
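Since the boards in the table run at different frequencies, it can help to convert an inference time into an approximate CPU cycle count before comparing devices. The sketch below shows this arithmetic only. `inference_cycles` is a hypothetical helper, and the example uses the 10 ms and 550 MHz values from the STM32H723 row.

```python
def inference_cycles(time_ms: float, freq_mhz: float) -> float:
    """Approximate CPU cycles for one inference.

    1 ms at 1 MHz is 1000 cycles, so cycles = time (ms) x frequency (MHz) x 1000.
    """
    return time_ms * freq_mhz * 1000.0


cycles = inference_cycles(time_ms=10, freq_mhz=550)
print(f"{cycles:,.0f} cycles")  # 5,500,000 cycles
```

The cycle count is only an indication: memory wait states and cache behaviour differ between devices, so it does not scale exactly with frequency.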

The validation can also be done with a dataset...

Quantized case through CLI scripts + data compression.

When power is measured, the procedure followed is the one described at https://wiki.st.com/stm32mcu/wiki/AI:How_to_measure_machine_learning_model_power_consumption_with_STM32Cube.AI_generated_application
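The "Energy (mJ) @ 3.3 V" column relates to the current and processing time columns through E = V x I x t. A minimal sketch of that conversion follows. It assumes the current is the average current drawn during one inference, `energy_mj` is a hypothetical helper, and the input values are made up for illustration because the table currently lists the current as NA.

```python
def energy_mj(current_ma: float, time_ms: float, voltage_v: float = 3.3) -> float:
    """Energy per inference in mJ.

    V x mA gives mW, mW x ms gives microjoules, and dividing by 1000 gives mJ.
    """
    return voltage_v * current_ma * time_ms / 1000.0


# Hypothetical example values, not measured results.
print(round(energy_mj(current_ma=100.0, time_ms=10.0), 3))  # 3.3
```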

STM32Cube.AI model performances
