STM32Cube.AI model performances

Revision as of 14:30, 9 August 2021 by Registered User

This article provides benchmark results for a set of well-known or reference pre-trained neural network models. Some STM32 results will be officially submitted to the MLPerf™ Tiny benchmark from MLCommons™.

Information
  • STM32Cube.AI is a software tool that generates optimized C code for neural network inference on STM32. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).
  • The process used to measure inference time, current and energy is described in this article. The measurements are not made in a certified laboratory, but any user can reproduce them. The results are average values and vary with the input data (random data are currently used), the temperature and the individual STM32 device.
  • Data published in this article are not contractual.

1. Benchmark results

| STM32 | Board | STM32 characteristics | Model | Source/Link | Flash (weights) | RAM (buffers) | Proc. time | Current (mA) | Energy (mJ) @ 3.3 V | Version | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|
| STM32H723 | NUCLEO-H723ZG | Flash 1 MB, RAM 564 KB, Freq 550 MHz | mobilenet v1 0.25 128 quant | source | 500 KB | 200 KB | 10 ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32H723 | NUCLEO-H723ZG | Flash 1 MB, RAM 564 KB, Freq 550 MHz | Anomaly Detection v0.5 | MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32H723 | NUCLEO-H723ZG | Flash 1 MB, RAM 564 KB, Freq 550 MHz | Key Word Spotting v0.5 | MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32H723 | NUCLEO-H723ZG | Flash 1 MB, RAM 564 KB, Freq 550 MHz | Image Classif. v0.5 | MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32H723 | NUCLEO-H723ZG | Flash 1 MB, RAM 564 KB, Freq 550 MHz | Visual Wake Word v0.5 | MLPerf™ Tiny | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32H743 | NUCLEO-H743ZI | Flash 2 MB, RAM 1 MB, Freq 480 MHz | mobilenet v1 0.25 128 quant | source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32H747 SMPS | STM32H747I-DISCO | Cortex® M7, Flash 2 MB, RAM 1 MB, Freq 400 MHz | mobilenet v1 0.25 128 quant | source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | On the Cortex® M7 core in SMPS mode at 400 MHz, instead of the 480 MHz maximum in LDO mode |
| STM32H7A3 | NUCLEO-H7A3ZI-Q | Flash 2 MB, RAM 1376 KB, Freq 280 MHz | mobilenet v1 0.25 128 quant | source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32U585 | B-U585I-IOT02A | Flash 2 MB, RAM 786 KB, Freq 160 MHz | mobilenet v1 0.25 128 quant | source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
| STM32L4R5 | NUCLEO-L4R5ZI | Flash 2 MB, RAM 640 KB, Freq 120 MHz | mobilenet v1 0.25 128 quant | source | KB | KB | ms | NA | NA | Cube AI v7.0.0, Cube IDE v1.7.0 | |
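
The energy column is expressed in mJ at a 3.3 V supply. The article does not state how the figure is derived. A minimal sketch, assuming it is simply the supply voltage multiplied by the average current and the inference time, could look like this (the function name `inference_energy_mj` is hypothetical):

```c
#include <assert.h>

/* Energy of one inference in millijoules (illustrative helper, not part of
 * X-CUBE-AI). Assumes E = V * I * t with a constant supply voltage and an
 * average current over the inference.
 *   supply_v   : supply voltage in volts (3.3 V in the table header)
 *   current_ma : average current in milliamperes
 *   time_ms    : inference time in milliseconds
 * mA * ms gives microcoulombs, multiplied by volts gives microjoules,
 * divided by 1000 gives millijoules. */
double inference_energy_mj(double supply_v, double current_ma, double time_ms)
{
    assert(supply_v > 0.0 && current_ma >= 0.0 && time_ms >= 0.0);
    return supply_v * current_ma * time_ms / 1000.0;
}
```

For example, with the 10 ms inference time of the first table row and a purely hypothetical average current of 100 mA (the table gives NA), `inference_energy_mj(3.3, 100.0, 10.0)` evaluates to about 3.3 mJ.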

2. Measurement process

Only the machine learning inference is considered. In a complete application, sensor acquisition, data conditioning and pre-processing must also be taken into account.

The memory footprints are those reported by the X-CUBE-AI "Analyze" function (the X-CUBE-AI version used is given in the table). The input / output buffers are included, but options have been selected that allow these buffers to be overlaid with the activations. The input / output buffer sizes are also reported.

RAM Model: buffers required to run the model, that is, the activation / input / output buffers with the "" option activated.
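
As an illustration of how the reported footprints can be read against the board characteristics in the table, the sketch below checks whether the weights fit in flash and the buffers fit in RAM. It is not part of X-CUBE-AI: the type and function names are hypothetical, and the two "reserved" parameters are an assumed margin for application code, stack and other data, which the article does not quantify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device memory sizes in KB, as listed in the "STM32 characteristics" column. */
typedef struct {
    uint32_t flash_kb;
    uint32_t ram_kb;
} device_memory_t;

/* Model footprint in KB, as reported by the "Analyze" function. */
typedef struct {
    uint32_t weights_kb;     /* flash used by the weights */
    uint32_t ram_buffers_kb; /* activation / input / output buffers */
} model_footprint_t;

/* Returns true when the model fits, leaving the given amounts of flash and
 * RAM (assumed values chosen by the user) free for the rest of the
 * application. */
bool model_fits(device_memory_t dev, model_footprint_t model,
                uint32_t flash_reserved_kb, uint32_t ram_reserved_kb)
{
    assert(dev.flash_kb > 0u && dev.ram_kb > 0u);

    /* Reject margins larger than the device itself (avoids unsigned wrap). */
    if (flash_reserved_kb > dev.flash_kb || ram_reserved_kb > dev.ram_kb) {
        return false;
    }
    return model.weights_kb <= dev.flash_kb - flash_reserved_kb
        && model.ram_buffers_kb <= dev.ram_kb - ram_reserved_kb;
}
```

With the first table row (1 MB of flash taken as 1024 KB, 564 KB of RAM, 500 KB of weights, 200 KB of buffers) and assumed margins of 128 KB of flash and 64 KB of RAM, the check passes.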

The inference time and the X-Cross error are those reported by the "Validation on target". STM32Cube.AI does not modify the DL/ML model topology. The impact on accuracy should be limited and the X-Cross error ensures that the difference... The clock source is always HSI, and the device runs at its maximal frequency. Clock settings are configured automatically by X-CUBE-AI / STM32CubeMX.
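
The article does not describe how "Validation on target" obtains the inference time. For readers who want to cross-check a figure by hand, a minimal sketch, assuming a CPU cycle count is available (for instance from the Cortex-M DWT cycle counter) and the core clock is the frequency listed in the table, converts cycles to milliseconds as follows. The function name `cycles_to_ms` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a CPU cycle count into milliseconds for a given core clock.
 * Illustrative helper, not an X-CUBE-AI API.
 *   cycles        : number of core clock cycles spent in the inference
 *   core_clock_hz : core frequency in Hz (e.g. 550000000 for 550 MHz) */
double cycles_to_ms(uint64_t cycles, uint32_t core_clock_hz)
{
    assert(core_clock_hz > 0u);
    return (double)cycles * 1000.0 / (double)core_clock_hz;
}
```

For example, a hypothetical count of 5,500,000 cycles at the 550 MHz of the STM32H723 corresponds to 10 ms.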

The validation can also be done with a dataset...

Quantized case through CLI scripts + data compression.

For power measurement, see https://wiki.st.com/stm32mcu/wiki/AI:How_to_measure_machine_learning_model_power_consumption_with_STM32Cube.AI_generated_application
