STM32Cube.AI model performances

Revision as of 09:10, 22 September 2021 by Registered User

This article provides benchmark results for a set of well-known or reference pre-trained Neural Network models. Some STM32 results are candidates for official submission to the MLPerf™ Tiny benchmark from MLCommons™.

Information
  • STM32Cube.AI (X-CUBE-AI) is a software tool that generates optimized C code for neural network inference on STM32 microcontrollers. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).
  • The inference time, current and energy measurements described here are not performed in a certified laboratory, but the process can be reproduced by any user. The results are average values, which may vary with the input data (random data are currently used), the temperature and the individual STM32 device.
  • Published data in this article is not contractual.
  • Copyright STMicroelectronics - All rights reserved. Do not publish the following data without the written consent of STMicroelectronics.

1. Benchmark results

1.1. STM32 High Performance MCUs

Version for all results below: STM32Cube.AI 7.0.0, Cube IDE 1.7.0.

Model | Source/Link | Flash | RAM | Proc. time | Cur. | Energy (3.3 V)

STM32H723 on NUCLEO-H723ZG: Flash 1 Mbyte, RAM 564 Kbyte (432 Kbyte contiguous), Freq 550 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 49 ms | 203 mA | 33 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 51 ms | 197 mA | 33 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 93 ms | 200 mA | 62 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 1.2 ms | 176 mA | 0.7 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 11.5 ms | 190 mA | 7 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 37 ms | 190 mA | 23 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 31 ms | 198 mA | 20 mJ

STM32H735 SMPS on STM32H735G-DK: Flash 1 Mbyte, RAM 564 Kbyte (432 Kbyte contiguous), Freq 550 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 49 ms | 97 mA | 16 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 51 ms | 95 mA | 16 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 93 ms | 96 mA | 30 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 1.2 ms | 86 mA | 0.34 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 11.5 ms | 97.5 mA | 3.74 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 37 ms | 91 mA | 11 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 31 ms | 94 mA | 9.6 mJ

STM32H743 on NUCLEO-H743ZI: Flash 2 Mbyte, RAM 1 Mbyte (512 Kbyte contiguous), Freq 480 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 49 ms | 203 mA | 33 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 59 ms | 192 mA | 37 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 108 ms | 194 mA | 69 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 1.4 ms | 168 mA | 0.78 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 13 ms | 196 mA | 8.7 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 42 ms | 183 mA | 25.6 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 36 ms | 189 mA | 22 mJ

STM32H747 SMPS on STM32H747I-DISCO: Cortex®-M7, Flash 2 Mbyte, RAM 1 Mbyte (0.5 Mbyte contiguous), Freq 400 MHz (1)
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 68 ms | 68 mA | 15 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 70.5 ms | 69.5 mA | 16 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 130 ms | 69.5 mA | 30 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 1.6 ms | 64 mA | 0.34 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 16 ms | 70 mA | 3.7 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 51 ms | 66 mA | 11 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 43 ms | 68.5 mA | 9.7 mJ

STM32H7A3 SMPS on NUCLEO-H7A3ZI-Q: Flash 2 Mbyte, RAM 1.4 Mbyte (1.18 Mbyte contiguous), Freq 280 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 96 ms | 44 mA | 14 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 100 ms | 43.5 mA | 14 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 184 ms | 44 mA | 26 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 2.3 ms | 40 mA | 0.3 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 23 ms | 44.5 mA | 3.3 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 72 ms | 42 mA | 10 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 61 ms | 43 mA | 9 mJ

STM32H7A3 SMPS on NUCLEO-H7A3ZI-Q: Flash 2 Mbyte, RAM 1.4 Mbyte (1.18 Mbyte contiguous), Freq 280 MHz, AXI 140 MHz (2)
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 97.5 ms | 40.6 mA | 13 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 101 ms | 40.2 mA | 13 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 185 ms | 40.4 mA | 25 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 2.3 ms | 36 mA | 0.28 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 22.9 ms | 40.1 mA | 3 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 72.5 ms | 38.4 mA | 9.2 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 61.4 ms | 39.6 mA | 8 mJ

STM32H7A3 SMPS on NUCLEO-H7A3ZI-Q: Flash 2 Mbyte, RAM 1.4 Mbyte (1.18 Mbyte contiguous), Freq 220 MHz (280 MHz max), AXI 110 MHz (3)
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 124 ms | 28.4 mA | 11.6 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 128 ms | 28 mA | 11.8 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 235 ms | 28.1 mA | 21.8 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 2.9 ms | 25.5 mA | 0.24 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 29.1 ms | 28.7 mA | 2.7 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 92.3 ms | 26.8 mA | 8.2 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 78 ms | 27.6 mA | 7.1 mJ

(1) In SMPS mode, the Cortex®-M7 core runs at 400 MHz instead of the 480 MHz maximum available in LDO mode. The Cortex®-M4 core runs a while(1) infinite loop.

(2) The MCU core frequency is set to its maximum of 280 MHz, but the HPRE prescaler is set to /2 to provide an HCLK/AXI frequency of 140 MHz instead of the maximum 280 MHz. Note that this setting might impact the rest of the system in a broader application, for example DMA transfers or Chrom-ART Accelerator speed.

(3) The MCU core frequency is set to 220 MHz and the HPRE prescaler is set to /2 to provide an HCLK/AXI frequency of 110 MHz, giving an optimal power setting with limited impact on latency. In particular, the Voltage Scale of the power regulator can be set to VOS1 instead of VOS0. Note that this setting might impact the rest of the system in a broader application, for example DMA transfers or Chrom-ART Accelerator speed.



For a given STM32 in a fixed configuration, the current consumption is in the same range regardless of the model, although it can vary with the complexity and topology of the model. The following table gives the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from the measurement of its inference time alone: with an average inference time of t seconds, an average current of i amperes and a supply voltage of u volts, the average energy per inference is E = t × i × u joules.

STM32 board | Average current (mA)
STM32H723, 550 MHz | 196
STM32H735 SMPS, 550 MHz | 95
STM32H743, 480 MHz | 191
STM32H747 SMPS, 400 MHz | 69
STM32H7A3, 280 MHz | 43
STM32H7A3, 280 MHz, AXI 140 MHz (2) | 40
STM32H7A3, 220 MHz, AXI 110 MHz (3) | 28
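As a minimal sketch of this first-order estimate, assuming a 3.3 V supply as in the measurements above, the Python snippet below applies E = t × i × u to the average currents of the table. The dictionary keys and function names are illustrative only and are not part of X-CUBE-AI.

```python
# Average current (mA) per configuration, copied from the table above
# (Anomaly Detection model excluded, as in the table).
AVG_CURRENT_MA = {
    "STM32H723 550 MHz": 196,
    "STM32H735 SMPS 550 MHz": 95,
    "STM32H743 480 MHz": 191,
    "STM32H747 SMPS 400 MHz": 69,
    "STM32H7A3 280 MHz": 43,
    "STM32H7A3 280 MHz, AXI 140 MHz": 40,
    "STM32H7A3 220 MHz, AXI 110 MHz": 28,
}


def energy_joules(t_s: float, i_a: float, u_v: float = 3.3) -> float:
    """Average energy per inference: E = t x i x u (s, A, V -> J)."""
    return t_s * i_a * u_v


def estimate_energy_mj(config: str, inference_time_ms: float, u_v: float = 3.3) -> float:
    """First-order energy estimate (mJ) from a measured inference time only."""
    i_a = AVG_CURRENT_MA[config] / 1000.0  # mA -> A
    t_s = inference_time_ms / 1000.0       # ms -> s
    return energy_joules(t_s, i_a, u_v) * 1000.0  # J -> mJ


if __name__ == "__main__":
    # MobileNet v1 0.25 on STM32H723: 49 ms measured inference time.
    print(f"{estimate_energy_mj('STM32H723 550 MHz', 49):.1f} mJ")
```

With the 196 mA average of the STM32H723, the 49 ms MobileNet v1 inference gives about 31.7 mJ, close to the 33 mJ obtained with the measured 203 mA. The estimate only makes sense for the same board configuration and for models of a comparable topology.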


STM32Cube.AI (X-CUBE-AI) can also generate a TensorFlow™ Lite for Microcontrollers (TFLm) runtime implementation (version 2.5.0 for STM32Cube.AI v7.0.0). The following table compares the TFLm runtime with the X-CUBE-AI runtime.

All results below were built with Cube IDE 1.7.0; the runtime version is given per row.

Model | Source/Link | Runtime | Flash | RAM | Proc. time | Version

STM32H723 on NUCLEO-H723ZG: Flash 1 Mbyte, RAM 564 Kbyte (432 Kbyte contiguous), Freq 550 MHz
Image Classif. v0.5 tfl | MLPerf™ Tiny | X-CUBE-AI | 77 KB | 49 KB | 37 ms | STM32Cube.AI 7.0.0
Image Classif. v0.5 tfl | MLPerf™ Tiny | TFL micro | 96 KB | 53 KB | 61 ms | TFLm 2.5.0
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | X-CUBE-AI | 214 KB | 37 KB | 31 ms | STM32Cube.AI 7.0.0
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | TFL micro | 325 KB | 98 KB | 42 ms | TFLm 2.5.0

STM32H7A3 SMPS on NUCLEO-H7A3ZI-Q: Flash 2 Mbyte, RAM 1.4 Mbyte (1.18 Mbyte contiguous), Freq 280 MHz
Image Classif. v0.5 tfl | MLPerf™ Tiny | X-CUBE-AI | 77 KB | 49 KB | 72 ms | STM32Cube.AI 7.0.0
Image Classif. v0.5 tfl | MLPerf™ Tiny | TFL micro | 96 KB | 53 KB | 120 ms | TFLm 2.5.0
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | X-CUBE-AI | 214 KB | 37 KB | 61 ms | STM32Cube.AI 7.0.0
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | TFL micro | 325 KB | 98 KB | 84 ms | TFLm 2.5.0
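The relative cost of the TFLm runtime can be read from the table as TFLm / X-CUBE-AI ratios. A minimal Python sketch, with the values copied from the table above (the data structure and names are illustrative, not an X-CUBE-AI output format):

```python
from typing import NamedTuple


class Footprint(NamedTuple):
    flash_kb: float
    ram_kb: float
    time_ms: float


# (board, model) -> (X-CUBE-AI, TFL micro), values from the comparison table.
COMPARISON = {
    ("STM32H723 550 MHz", "Image Classif."): (Footprint(77, 49, 37), Footprint(96, 53, 61)),
    ("STM32H723 550 MHz", "Visual Wake Word"): (Footprint(214, 37, 31), Footprint(325, 98, 42)),
    ("STM32H7A3 280 MHz", "Image Classif."): (Footprint(77, 49, 72), Footprint(96, 53, 120)),
    ("STM32H7A3 280 MHz", "Visual Wake Word"): (Footprint(214, 37, 61), Footprint(325, 98, 84)),
}


def tflm_over_xcubeai(xcubeai: Footprint, tflm: Footprint) -> Footprint:
    """Per-column ratio TFLm / X-CUBE-AI; a value above 1 means TFLm needs more."""
    return Footprint(*(t / x for x, t in zip(xcubeai, tflm)))


for (board, model), (xc, tf) in COMPARISON.items():
    r = tflm_over_xcubeai(xc, tf)
    print(f"{board:18} {model:17} Flash x{r.flash_kb:.2f}  RAM x{r.ram_kb:.2f}  time x{r.time_ms:.2f}")
```

With these values, the ratios range from about 1.35 to 1.67 for the inference time, 1.25 to 1.52 for Flash and 1.08 to 2.65 for RAM.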

1.2. STM32 Ultra Low Power MCUs

Version for all results below: STM32Cube.AI 7.0.0, Cube IDE 1.7.0.

Model | Source/Link | Flash | RAM Buf. | Proc. time | Cur. | Energy (3.3 V)

STM32U585 SMPS on B-U585I-IOT02A: Flash 2 Mbyte, RAM 786 Kbyte, Freq 160 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 249 ms | 10 mA | 8.2 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 250 ms | 9.1 mA | 7.5 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 468 ms | 9.4 mA | 14.5 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 5.7 ms | 9.1 mA | 0.17 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 61 ms | 10 mA | 2 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 164 ms | 9 mA | 4.9 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 157 ms | 9.1 mA | 4.7 mJ

STM32U585 SMPS on B-U585I-IOT02A: Flash 2 Mbyte, RAM 786 Kbyte, Freq 110 MHz (4)
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 361 ms | 5.7 mA | 6.8 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 363 ms | 5.6 mA | 6.7 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 681 ms | 5.8 mA | 13 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 8.3 ms | 5.6 mA | 0.16 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 88.2 ms | 5.8 mA | 1.7 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 239 ms | 5.6 mA | 4.4 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 229 ms | 5.7 mA | 4.3 mJ

STM32L4R5 on NUCLEO-L4R5ZI: Flash 2 Mbyte (single bank), RAM 640 Kbyte, Freq 120 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 420 ms | 23 mA | 32 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 433 ms | 23 mA | 33 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 781 ms | 24 mA | 63 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 9 ms | 22 mA | 0.66 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 104 ms | 24 mA | 8.3 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 255 ms | 24 mA | 20 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 260 ms | 23 mA | 20 mJ

STM32L4R5 on NUCLEO-L4R5ZI: Flash 2 Mbyte (dual bank), RAM 640 Kbyte, Freq 120 MHz
MobileNet v1 0.25 128 quant tfl | source | 468 KB | 66 KB | 427 ms | 24 mA | 34 mJ
FoodReco quant h5 (derived from MobileNet) | FP-AI-VISION1 | 132 KB | 148 KB | 455 ms | 24 mA | 36 mJ
Person Presence MobileNet v2 128 | FP-AI-VISION1 | 403 KB | 197 KB | 813 ms | 25 mA | 66 mJ
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 9.7 ms | 23 mA | 0.73 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 111 ms | 24 mA | 9 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 263 ms | 25 mA | 22 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 270 ms | 24 mA | 21 mJ

STM32WB55 SMPS on NUCLEO-WB55RG: Flash 1 Mbyte, RAM 256 Kbyte, Freq 64 MHz
Anomaly Detection v0.5 tfl | MLPerf™ Tiny | 265 KB | 0.75 KB | 17 ms | 9.8 mA | 0.55 mJ
Key Word Spotting v0.5 tfl | MLPerf™ Tiny | 24 KB | 18 KB | 180 ms | 11 mA | 6.6 mJ
Image Classif. v0.5 tfl | MLPerf™ Tiny | 77 KB | 49 KB | 457 ms | 11 mA | 16 mJ
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | 214 KB | 37 KB | 459 ms | 11 mA | 16 mJ

(4) The MCU core frequency is set to 110 MHz, so the Voltage Scale of the power regulator can be set to VOS2 instead of VOS1. Note that this setting might impact the rest of the system in a broader application, for example DMA transfers or Chrom-ART Accelerator speed.


The following table gives the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from the measurement of its inference time alone: with an average inference time of t seconds, an average current of i amperes and a supply voltage of u volts, the average energy per inference is E = t × i × u joules.

STM32 board | Average current (mA)
STM32U585, 160 MHz | 9.5
STM32U585, 110 MHz (4) | 5.7
STM32L4R5, 120 MHz, single bank | 23.7
STM32L4R5, 120 MHz, dual bank | 24.3
STM32WB55, 64 MHz | 10.8


STM32Cube.AI (X-CUBE-AI) can also generate a TensorFlow™ Lite for Microcontrollers (TFLm) runtime implementation (version 2.5.0 for STM32Cube.AI v7.0.0). The following table compares the TFLm runtime with the X-CUBE-AI runtime.

All results below were built with Cube IDE 1.7.0; the runtime version is given per row.

Model | Source/Link | Runtime | Flash | RAM | Proc. time | Version

STM32U585 SMPS on B-U585I-IOT02A: Flash 2 Mbyte, RAM 786 Kbyte, Freq 160 MHz
Image Classif. v0.5 tfl | MLPerf™ Tiny | X-CUBE-AI | 77 KB | 49 KB | 164 ms | STM32Cube.AI 7.0.0
Image Classif. v0.5 tfl | MLPerf™ Tiny | TFL micro | 96 KB | 53 KB | 315 ms | TFLm 2.5.0
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | X-CUBE-AI | 214 KB | 37 KB | 157 ms | STM32Cube.AI 7.0.0
Visual Wake Word v0.5 tfl | MLPerf™ Tiny | TFL micro | 325 KB | 98 KB | 216 ms | TFLm 2.5.0

2. Measurement process

In this benchmark, only the machine learning model inference processing is reported. In a complete application, sensor acquisition, data conditioning and pre-processing must also be taken into account.

The STM32 Board column indicates the STM32 reference and the board used for the measurement. By default, the STM32 is configured for maximum performance, that is, at maximum frequency and in particular with the HCLK/AXI clock at its maximum frequency. When a different setting is used, it is specified (for instance, a lower frequency to allow a different Voltage Scale or, for STM32H7, a lower HCLK/AXI frequency). SMPS indicates that the internal voltage regulator is the SMPS (switched-mode power supply) step-down converter instead of the LDO (low-dropout linear voltage regulator).

The STM32 Characteristics column gives the available internal Flash size, the full internal RAM size and the frequency. The RAM size includes all kinds of memories and banks (TCM, SRAM, etc.). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; the largest contiguous RAM area available is given in parentheses when it differs from the full size. The frequency indicated is the operating frequency used for the benchmark, generally the maximum frequency. Exceptions are marked with footnotes; for instance, the STM32H747 Discovery kit operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz. Data are rounded to 3 decimals.
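Because the X-CUBE-AI buffers must fit in one contiguous area, a quick feasibility check should compare a model's RAM footprint with the contiguous size given in parentheses, not with the full RAM size. The Python sketch below assumes 1 Mbyte = 1024 Kbyte and takes an optional Flash budget for the rest of the application (the `app_flash_kb` parameter and all names are hypothetical); other application RAM such as stack and heap is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    flash_kb: float
    contiguous_ram_kb: float  # value in "()" when given, otherwise the full RAM size


@dataclass(frozen=True)
class ModelFootprint:
    name: str
    flash_kb: float  # weights + runtime code + constants
    ram_kb: float    # activations and input/output buffers


def fits(model: ModelFootprint, target: Target, app_flash_kb: float = 0.0) -> bool:
    """True if the model Flash (plus an application budget) and its RAM buffers fit."""
    flash_ok = model.flash_kb + app_flash_kb <= target.flash_kb
    ram_ok = model.ram_kb <= target.contiguous_ram_kb
    return flash_ok and ram_ok


H723 = Target("STM32H723", flash_kb=1024, contiguous_ram_kb=432)
H743 = Target("STM32H743", flash_kb=2048, contiguous_ram_kb=512)

person = ModelFootprint("Person Presence MobileNet v2 128", 403, 197)
# Hypothetical model needing 500 Kbyte of contiguous RAM buffers.
big = ModelFootprint("hypothetical model", 300, 500)

for model in (person, big):
    for target in (H723, H743):
        print(f"{model.name} on {target.name}: {fits(model, target)}")
```

The hypothetical 500 Kbyte model fits on the STM32H743 (512 Kbyte contiguous) but not on the STM32H723 (432 Kbyte contiguous), even though the full RAM of the STM32H723 (564 Kbyte) would be large enough.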

The memory footprints are those reported by the X-CUBE-AI "Analyze" function (the X-CUBE-AI version used is given in the table).

The Model Source/Link column indicates the pre-trained ML model and its source: either how it was built or trained, or where it can be downloaded. "tfl" stands for a TensorFlow™ Lite .tflite model, "h5" for a Keras .h5 model and "quant" for a model quantized to 8 bits. The FP-AI-VISION1 models are located in the package directory FP-AI-VISION1_V3.0.0\Utilities\AI_resources.

The column Flash reports the Flash occupancy including the model weights, the runtime code generated by X-CUBE-AI to run the neural network and its constants (including the initialized tables).

The RAM Buf. column reports the RAM buffer occupancy: the model activations, the input and output buffers, and the RAM required by the runtime to run the model inference. Note that to save RAM, the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options are selected (in the X-CUBE-AI Advanced Settings panel).

The Flash and RAM memory footprints related to the runtime/code execution are computed from the memory map once the X-CUBE-AI validation project for the given model has been built with STM32CubeIDE. For the X-CUBE-AI runtime, the runtime/code part takes into account the NetworkRuntime700_CMx_GCC.a, libc_nano.a and network.o memory footprints. For the TensorFlow Lite for Microcontrollers runtime, it takes into account all the modules used by tflite_micro and libc_nano.a.

The Proc Time column reports the model inference processing time. When current and energy are indicated, the measurement is obtained with the X-CUBE-AI "System Performance" application, following the process described in the wiki article on power measurement. Otherwise, the "Validation on target" application is used. In all cases, HSI is selected as clock source when generating the application: X-CUBE-AI first generates the optimal clock settings, then the clock source is set to HSI and STM32CubeMX automatically reconfigures the clock settings.

Cur. and Energy are the current and energy computed following the process described in the wiki article on power measurement. For STM32 Ultra Low Power microcontrollers, measurements are done with the X-NUCLEO-LPM01A power shield as described in section 4.3.1 "Measure process when current is below 50 mA". For STM32 High Performance microcontrollers, measurements are done with the Qoitec Otii Arc power analyzer as described in section 4.3.2 "Measure process when current is above 50 mA". In both cases, a 10 s window is used for averaging, and HSI is selected as clock source.

Accuracy is not reported. X-CUBE-AI does not modify the DL/ML model topology, so the impact on accuracy should be limited. Through the "Validation" application, X-CUBE-AI provides a way to measure accuracy either on x86 or on the target, which can be used to check for any impact on accuracy. When running the "Validation on target" application, several metrics are computed; one of them, X-Cross, provides error metrics between the original model executed in Python and the C model executed on the target. Random data can be used to compute the RMSE/MAE/L2R errors; however, true data are recommended to get the final accuracy. For more details on the metrics, refer to the X-CUBE-AI Embedded Documentation.
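The X-Cross error metrics mentioned above can be sketched in Python with their usual textbook definitions: RMSE (root mean square error), MAE (mean absolute error) and L2R, taken here as the relative L2 error ‖ref − out‖₂ / ‖ref‖₂. These definitions are an assumption for illustration; the exact formulas and report format used by X-CUBE-AI are given in its Embedded Documentation, and the function name and data below are hypothetical.

```python
import math
from typing import Dict, Sequence


def xcross_metrics(reference: Sequence[float], output: Sequence[float]) -> Dict[str, float]:
    """Compare reference outputs (original model) with outputs of the C model on target."""
    if len(reference) != len(output) or not reference:
        raise ValueError("outputs must be non-empty and of equal length")
    diffs = [r - o for r, o in zip(reference, output)]
    n = len(diffs)
    sq_err = sum(d * d for d in diffs)
    ref_norm = math.sqrt(sum(r * r for r in reference))
    return {
        "rmse": math.sqrt(sq_err / n),
        "mae": sum(abs(d) for d in diffs) / n,
        # Relative L2 error; undefined (reported as inf) when the reference is all zeros.
        "l2r": math.sqrt(sq_err) / ref_norm if ref_norm else math.inf,
    }


# Hypothetical flattened outputs for one inference.
ref = [1.0, 2.0, 2.0]
out = [1.0, 2.0, 1.0]
print(xcross_metrics(ref, out))
```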

Note that an accuracy check is important when comparing a float model with a quantized model, or when using the weight compression feature of X-CUBE-AI for float models.