This article provides performance results for a set of well-known or reference pre-trained neural network models. Some of these STM32 results were also part of the official v0.7 submission of the MLPerf™ Tiny inference benchmark from MLCommons™.
1. Performance results
1.1. STM32 High Performance MCUs
STM32 High Performance MCUs inference time, memory footprint and energy at 3.3 V:
STM32 Board | STM32 characteristics | Model Source/Link | Flash Wgt. (Kbyte) | RAM Buf. (Kbyte) | Proc Time (ms) | Cur. (mA) | Energy at 3.3 V (mJ) | Version |
---|---|---|---|---|---|---|---|---|
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 49 ms | 203 mA | 33 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 51 ms | 197 mA | 33 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 93 ms | 200 mA | 62 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 1.2 ms | 176 mA | 0.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 11.5 ms | 190 mA | 7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 37 ms | 190 mA | 23 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 31 ms | 198 mA | 20 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 49 ms | 97 mA | 16 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 51 ms | 95 mA | 16 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 93 ms | 96 mA | 30 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 1.2 ms | 86 mA | 0.34 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 11.5 ms | 97.5 mA | 3.74 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 37 ms | 91 mA | 11 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 31 ms | 94 mA | 9.6 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 49 ms | 203 mA | 33 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 59 ms | 192 mA | 37 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 108 ms | 194 mA | 69 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 1.4 ms | 168 mA | 0.78 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 13 ms | 196 mA | 8.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 42 ms | 183 mA | 25.6 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2 Mbytes RAM 1 Mbyte (512) Freq 480 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 36 ms | 189 mA | 22 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 68 ms | 68 mA | 15 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 70.5 ms | 69.5 mA | 16 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 130 ms | 69.5 mA | 30 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 1.6 ms | 64 mA | 0.34 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 16 ms | 70 mA | 3.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 51 ms | 66 mA | 11 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex®-M7 Flash 2 Mbytes RAM 1 Mbyte (0.5) Freq 400 MHz(1) | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 43 ms | 68.5 mA | 9.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 96 ms | 44 mA | 14 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 100 ms | 43.5 mA | 14 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 184 ms | 44 mA | 26 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 2.3 ms | 40 mA | 0.3 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 23 ms | 44.5 mA | 3.3 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 72 ms | 42 mA | 10 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 61 ms | 43 mA | 9 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 97.5 ms | 40.6 mA | 13 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 101 ms | 40.2 mA | 13 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 185 ms | 40.4 mA | 25 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 2.3 ms | 36 mA | 0.28 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 22.9 ms | 40.1 mA | 3 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 72.5 ms | 38.4 mA | 9.2 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz AXI 140 MHz(2) | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 61.4 ms | 39.6 mA | 8 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 124 ms | 28.4 mA | 11.6 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 128 ms | 28 mA | 11.8 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 235 ms | 28.1 mA | 21.8 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 2.9 ms | 25.5 mA | 0.24 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 29.1 ms | 28.7 mA | 2.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 92.3 ms | 26.8 mA | 8.2 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 220 MHz (280) AXI 110 MHz(3) | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 78 ms | 27.6 mA | 7.1 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
(1) On the Cortex®-M7 core in SMPS mode: 400 MHz instead of the 480 MHz maximum available in LDO mode. The Cortex®-M4 is running a while(1) infinite loop.
(2) The MCU core frequency is set to its maximum of 280 MHz, but the HPRE prescaler is set to /2 to provide an HCLK/AXI frequency of 140 MHz instead of the maximum 280 MHz. Note that this setting might impact the rest of the system in a broader application, for example DMA transfer or Chrom-ART speed.
(3) The MCU core frequency is set to 220 MHz and the HPRE prescaler is set to /2 to provide an HCLK/AXI frequency of 110 MHz, giving an optimal power setting with limited impact on latency. In particular, the Voltage Scale power regulator can be set to VOS1 instead of VOS0. Note that this setting might impact the rest of the system in a broader application, for example DMA transfer or Chrom-ART speed.
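The trade-off described in footnotes (2) and (3) can be checked by recomputing the energy per inference from the STM32H7A3 Visual Wake Word rows of the table above. This is only a sketch: `SUPPLY_V` and `energy_mj` are illustrative names, and the recomputed values differ slightly from the table because the table figures are rounded.

```python
# Visual Wake Word rows for the STM32H7A3 from the table above:
# (configuration, inference time in ms, average current in mA)
H7A3_VWW = [
    ("280 MHz, AXI 280 MHz", 61.0, 43.0),
    ("280 MHz, AXI 140 MHz (2)", 61.4, 39.6),
    ("220 MHz, AXI 110 MHz (3)", 78.0, 27.6),
]

SUPPLY_V = 3.3  # supply voltage used for all measurements in this article


def energy_mj(time_ms: float, current_ma: float, volts: float = SUPPLY_V) -> float:
    """Energy per inference in mJ: ms * mA * V gives microjoules, /1000 gives mJ."""
    return time_ms * current_ma * volts / 1000.0


for name, t_ms, i_ma in H7A3_VWW:
    print(f"{name:26s} {t_ms:6.1f} ms  {energy_mj(t_ms, i_ma):5.2f} mJ")
```

With these rows, the 220 MHz / AXI 110 MHz setting is about 28% slower than the default configuration but needs roughly 18% less energy per inference.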
For a given STM32 in a fixed configuration, the current consumption is in the same range regardless of the model. It might, however, vary depending on the complexity and topology of the model.
The following table provides the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from a measurement of its inference time alone: for an average inference time of t seconds, an average current of i amperes and a supply voltage of u volts, the average energy is t × i × u joules.
STM32 Board | STM32H723 550 MHz | STM32H735 SMPS 550 MHz | STM32H743 480 MHz | STM32H747 400 MHz SMPS | STM32H7A3 280 MHz | STM32H7A3 280 MHz AXI 140 MHz(2) | STM32H7A3 220 MHz AXI 110 MHz(3) |
---|---|---|---|---|---|---|---|
Average current (mA) | 196 | 95 | 191 | 69 | 43 | 40 | 28 |
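As a minimal sketch of the t × i × u estimate, the average currents from the table can be stored in a lookup table and combined with a measured inference time. The 80 ms inference time below is a hypothetical measurement, and `AVG_CURRENT_MA` and `estimate_energy_mj` are illustrative names, not part of any ST tool.

```python
# Average current (mA) per configuration, from the table above
# (Anomaly Detection excluded, as in the article).
AVG_CURRENT_MA = {
    "STM32H723 550 MHz": 196,
    "STM32H735 SMPS 550 MHz": 95,
    "STM32H743 480 MHz": 191,
    "STM32H747 400 MHz SMPS": 69,
    "STM32H7A3 280 MHz": 43,
    "STM32H7A3 280 MHz AXI 140 MHz": 40,
    "STM32H7A3 220 MHz AXI 110 MHz": 28,
}


def estimate_energy_mj(inference_ms: float, config: str, supply_v: float = 3.3) -> float:
    """First-order energy estimate E = t x i x u, returned in mJ."""
    t_s = inference_ms / 1000.0             # t in seconds
    i_a = AVG_CURRENT_MA[config] / 1000.0   # i in amperes
    return t_s * i_a * supply_v * 1000.0    # joules -> millijoules


# Hypothetical new model measured at 80 ms on a NUCLEO-H723ZG:
print(round(estimate_energy_mj(80.0, "STM32H723 550 MHz"), 1))  # 51.7
```

The estimate is only as good as the assumption that the new model draws a current close to the average of the reference models.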
STM32Cube.AI (X-CUBE-AI) can also generate a TensorFlow™ Lite for Microcontrollers (TFLm) runtime implementation (version 2.7.0 for STM32Cube.AI v7.1.0). The results below were obtained with TFLm 2.5.0, as indicated in the Version column.
The following table compares the TFLm runtime with the X-CUBE-AI runtime.
STM32 Board | STM32 characteristics | Model Source/Link | Runtime | Flash (Kbyte) | RAM (Kbyte) | Proc Time (ms) | Version |
---|---|---|---|---|---|---|---|
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Image Classif. MLPerf™ Tiny | X-CUBE-AI | 77 Kbytes | 49 Kbytes | 37 ms | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Image Classif. MLPerf™ Tiny | TFLm | 96 Kbytes | 53 Kbytes | 61 ms | TFLm 2.5.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Visual Wake Word MLPerf™ Tiny | X-CUBE-AI | 214 Kbytes | 37 Kbytes | 31 ms | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1 Mbyte RAM 564 Kbytes (432) Freq 550 MHz | Visual Wake Word MLPerf™ Tiny | TFLm | 325 Kbytes | 98 Kbytes | 42 ms | TFLm 2.5.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Image Classif. MLPerf™ Tiny | X-CUBE-AI | 77 Kbytes | 49 Kbytes | 72 ms | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Image Classif. MLPerf™ Tiny | TFLm | 96 Kbytes | 53 Kbytes | 120 ms | TFLm 2.5.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Visual Wake Word MLPerf™ Tiny | X-CUBE-AI | 214 Kbytes | 37 Kbytes | 61 ms | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2 Mbytes RAM 1.4 Mbyte (1.18) Freq 280 MHz | Visual Wake Word MLPerf™ Tiny | TFLm | 325 Kbytes | 98 Kbytes | 84 ms | TFLm 2.5.0 STM32CubeIDE 1.7.0 |
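To put the comparison in relative terms, the ratios of TFLm to X-CUBE-AI footprint and latency can be computed from the NUCLEO-H723ZG rows above. This is a small illustrative sketch; `RESULTS` and `tflm_to_xcubeai_ratios` are hypothetical names.

```python
# (Flash KB, RAM KB, processing time ms) per runtime, from the NUCLEO-H723ZG rows above
RESULTS = {
    "Image Classif.": {"X-CUBE-AI": (77, 49, 37), "TFLm": (96, 53, 61)},
    "Visual Wake Word": {"X-CUBE-AI": (214, 37, 31), "TFLm": (325, 98, 42)},
}


def tflm_to_xcubeai_ratios(model: str) -> tuple:
    """Return (Flash, RAM, time) ratios of the TFLm runtime over the X-CUBE-AI runtime."""
    xc, tf = RESULTS[model]["X-CUBE-AI"], RESULTS[model]["TFLm"]
    return tuple(t / x for t, x in zip(tf, xc))


for model in RESULTS:
    flash_r, ram_r, time_r = tflm_to_xcubeai_ratios(model)
    print(f"{model}: TFLm uses {flash_r:.2f}x Flash, {ram_r:.2f}x RAM, {time_r:.2f}x time")
```

For these two models on the STM32H723, TFLm needs about 1.25x to 1.52x the Flash and runs 1.35x to 1.65x slower than the X-CUBE-AI runtime.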
1.2. STM32 Ultra Low Power MCUs
STM32 Ultra Low Power MCUs inference time, memory footprint and energy at 3.3 V:
STM32 Board | STM32 characteristics | Model Source/Link | Flash Wgt. (Kbyte) | RAM Buf. (Kbyte) | Proc Time (ms) | Cur. (mA) | Energy at 3.3 V (mJ) | Version |
---|---|---|---|---|---|---|---|---|
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 249 ms | 10 mA | 8.2 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 250 ms | 9.1 mA | 7.5 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 468 ms | 9.4 mA | 14.5 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 5.7 ms | 9.1 mA | 0.17 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 61 ms | 10 mA | 2 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 164 ms | 9 mA | 4.9 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 157 ms | 9.1 mA | 4.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 361 ms | 5.7 mA | 6.8 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 363 ms | 5.6 mA | 6.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 681 ms | 5.8 mA | 13 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 8.3 ms | 5.6 mA | 0.16 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 88.2 ms | 5.8 mA | 1.7 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 239 ms | 5.6 mA | 4.4 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 110 MHz(4) | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 229 ms | 5.7 mA | 4.3 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 420 ms | 23 mA | 32 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 433 ms | 23 mA | 33 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 781 ms | 24 mA | 63 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 9 ms | 22 mA | 0.66 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 104 ms | 24 mA | 8.3 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 255 ms | 24 mA | 20 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank RAM 640 Kbytes Freq 120 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 260 ms | 23 mA | 20 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | MobileNet v1 0.25 128 quant tfl source | 468 Kbytes | 66 Kbytes | 427 ms | 24 mA | 34 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 Kbytes | 148 Kbytes | 455 ms | 24 mA | 36 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 Kbytes | 197 Kbytes | 813 ms | 25 mA | 66 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 9.7 ms | 23 mA | 0.73 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 111 ms | 24 mA | 9 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 263 ms | 25 mA | 22 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2 Mbytes Dual Bank RAM 640 Kbytes Freq 120 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 270 ms | 24 mA | 21 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1 Mbyte RAM 256 Kbytes Freq 64 MHz | Anomaly Detection MLPerf™ Tiny | 265 Kbytes | 0.75 Kbyte | 17 ms | 9.8 mA | 0.55 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1 Mbyte RAM 256 Kbytes Freq 64 MHz | Key Word Spotting MLPerf™ Tiny | 24 Kbytes | 18 Kbytes | 180 ms | 11 mA | 6.6 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1 Mbyte RAM 256 Kbytes Freq 64 MHz | Image Classif. MLPerf™ Tiny | 77 Kbytes | 49 Kbytes | 457 ms | 11 mA | 16 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1 Mbyte RAM 256 Kbytes Freq 64 MHz | Visual Wake Word MLPerf™ Tiny | 214 Kbytes | 37 Kbytes | 459 ms | 11 mA | 16 mJ | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
(4) The MCU core frequency is set to 110 MHz, so that the Voltage Scale power regulator can be set to VOS2 instead of VOS1. Note that this setting might impact the rest of the system in a broader application, for example DMA transfer or Chrom-ART speed.
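The measurements suggest that, on a given device, the inference time scales roughly inversely with the core frequency: on the STM32U585, Visual Wake Word takes 157 ms at 160 MHz and 229 ms at 110 MHz. The sketch below applies this first-order rule; it is an approximation assumed here, not a rule stated by X-CUBE-AI, and it ignores effects such as Flash wait states or bus clock changes. `scale_inference_time` is a hypothetical helper.

```python
def scale_inference_time(t_ms: float, f_from_mhz: float, f_to_mhz: float) -> float:
    """First-order estimate: inference time scales inversely with the core frequency."""
    return t_ms * f_from_mhz / f_to_mhz


# STM32U585, Visual Wake Word: 157 ms measured at 160 MHz
print(round(scale_inference_time(157, 160, 110)))  # 228 (table: 229 ms at 110 MHz)
```

The same rule gives about 78 ms for the STM32H7A3 Visual Wake Word row when going from 280 MHz (61 ms) to 220 MHz, which matches the measured value.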
The following table provides the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from a measurement of its inference time alone: for an average inference time of t seconds, an average current of i amperes and a supply voltage of u volts, the average energy is t × i × u joules.
STM32 Board | STM32U585 160 MHz | STM32U585 110 MHz(4) | STM32L4R5 120 MHz Single Bank | STM32L4R5 120 MHz Dual Bank | STM32WB55 64 MHz |
---|---|---|---|---|---|
Average current (mA) | 9.5 | 5.7 | 23.7 | 24.3 | 10.8 |
STM32Cube.AI (X-CUBE-AI) can also generate a TensorFlow™ Lite for Microcontrollers (TFLm) runtime implementation (version 2.5.0 for STM32Cube.AI v7.0.0).
The following table compares the TFLm runtime with the X-CUBE-AI runtime.
STM32 Board | STM32 characteristics | Model Source/Link | Runtime | Flash (Kbyte) | RAM (Kbyte) | Proc Time (ms) | Version |
---|---|---|---|---|---|---|---|
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Image Classif. MLPerf™ Tiny | X-CUBE-AI | 77 Kbytes | 49 Kbytes | 164 ms | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Image Classif. MLPerf™ Tiny | TFLm | 96 Kbytes | 53 Kbytes | 315 ms | TFLm 2.5.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Visual Wake Word MLPerf™ Tiny | X-CUBE-AI | 214 Kbytes | 37 Kbytes | 157 ms | STM32Cube.AI 7.0.0 STM32CubeIDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2 Mbytes RAM 786 Kbytes Freq 160 MHz | Visual Wake Word MLPerf™ Tiny | TFLm | 325 Kbytes | 98 Kbytes | 216 ms | TFLm 2.5.0 STM32CubeIDE 1.7.0 |
2. Measurement process
Only the machine learning model inference processing is reported here. In a complete application, sensor acquisition, data conditioning and pre-processing must also be taken into account.
The STM32 Board column indicates the STM32 reference and the board used for the measurement. By default, the STM32 is configured for maximum performance, that is, at maximum frequency and in particular with the HCLK/AXI clock at its maximum frequency. When a different setting is used, it is specified (for instance, a lower frequency to allow a different Voltage Scale or, for the STM32H7, a lower HCLK/AXI frequency). SMPS indicates that the internal voltage regulator used is the SMPS (Switched-Mode Power Supply) step-down converter instead of the LDO (low-dropout linear voltage regulator).
The STM32 characteristics column provides the available internal Flash size, the full internal RAM size and the frequency. The RAM size includes all the different kinds of memory and banks (TCM, SRAM, etc.). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; the maximum RAM size available in a contiguous area is given in parentheses when it differs from the full size. The frequency indicated is the operating frequency used for the test, which is generally the maximum frequency. The only exception is the STM32H747 Discovery kit (STM32H747I-DISCO), which operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz. Data are rounded to 3 decimals.
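Because the activation buffers must fit in one contiguous RAM area, the value in parentheses, not the full RAM size, is the relevant limit. The following sketch is a deliberately rough fit check: it compares only the Flash Wgt. and RAM Buf. figures against the device limits and ignores runtime code, application code, stack and heap. The `Target` class, the `fits` helper and the 500 Kbyte model are hypothetical, and 1 Mbyte is taken as 1024 Kbytes.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    flash_kb: float           # internal Flash size
    contiguous_ram_kb: float  # largest contiguous RAM area (value in parentheses)


def fits(target: Target, weights_kb: float, ram_buf_kb: float) -> bool:
    """Rough check: weights in Flash, activation buffers in one contiguous RAM area."""
    return weights_kb <= target.flash_kb and ram_buf_kb <= target.contiguous_ram_kb


h723 = Target("STM32H723", flash_kb=1024, contiguous_ram_kb=432)
print(fits(h723, weights_kb=403, ram_buf_kb=197))  # Person Presence -> True
print(fits(h723, weights_kb=403, ram_buf_kb=500))  # hypothetical model -> False
```

A 500 Kbyte buffer would fit in the 564 Kbytes of total RAM on the STM32H723, but not in its 432 Kbyte contiguous area.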
The Model Source/Link column indicates the pre-trained ML model and its source: either how it was built/trained or where it can be downloaded. "tfl" stands for a TensorFlow™ Lite .tflite model, "h5" for a Keras .h5 model, and "quant" for a model quantized to 8 bits. The FP-AI-VISION1 models are located in the package directory FP-AI-VISION1_V3.0.0\Utilities\AI_resources.
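For the "quant" models, TensorFlow™ Lite 8-bit quantization maps each real value to an int8 value through a scale and a zero point: real_value = scale × (int8_value − zero_point). The sketch below illustrates that affine mapping; the scale of 0.02 and zero point of −128 are hypothetical values chosen for the example, and Python's `round` (half-to-even) is used for simplicity.

```python
def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Affine int8 mapping used by TensorFlow Lite: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Inverse mapping, rounded and clamped to the int8 range [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


# Hypothetical quantization parameters for one tensor
SCALE, ZERO_POINT = 0.02, -128
q = quantize(1.0, SCALE, ZERO_POINT)      # -78
x = dequantize(q, SCALE, ZERO_POINT)      # about 1.0
```

Values outside the representable range are clamped, which is one source of the accuracy differences between a float model and its quantized version.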
The memory footprints are those reported by X-CUBE-AI using the "Analyze" function (the X-CUBE-AI version used is given in the Version column).
The Flash Wgt. column reports the Flash occupancy of the model weights.
The RAM Buf. column reports the occupancy of the RAM buffers used to store the model activations as well as the input and output buffers. Note that, to save RAM, the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options are selected (in the X-CUBE-AI Advanced Settings panel).
For the tables comparing TensorFlow™ Lite for Microcontrollers to the STM32Cube.AI runtime:
The Flash column reports the Flash occupancy including the model weights, the runtime code generated by X-CUBE-AI to run the neural network, and its constants (including the initialized tables).
The RAM column reports the occupancy of the RAM buffers used to store the model activations as well as the input and output buffers, plus the RAM required by the runtime to run inference on the model. Note that, to save RAM, the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options are selected (in the X-CUBE-AI Advanced Settings panel).
For the X-CUBE-AI runtime, the total Flash and RAM footprints are those reported in the Used Flash and Used RAM fields of the main panel after an "Analyze" operation. The compiler used is the GCC toolchain embedded in STM32CubeIDE.
Limitation: in X-CUBE-AI version 7.1 and below, the STM32U5 memory footprints reported in these fields do not include the runtime/code parts (this bug is to be fixed in version 7.2). The complete footprints are, however, identical to those obtained with the NUCLEO-L552ZE-Q, which is also based on a Cortex®-M33.
For the TensorFlow™ Lite for Microcontrollers runtime, the Flash and RAM footprints related to the runtime/code execution are computed from the memory map of the validation project of the given model, built with STM32CubeIDE. The runtime/code part takes into account all the modules used by tflite_micro.
The Proc Time column reports the model inference processing time. When current/energy is indicated, the measurement is obtained with the X-CUBE-AI "System Performance" application, following the process described in the wiki article on power measurement. Otherwise, the "Validation on target" application is used. In all cases, the HSI is selected as the clock source when generating the application: X-CUBE-AI first generates the optimal clock settings, the clock source is then set to HSI, and STM32CubeMX autonomously reconfigures the clock settings accordingly.
Cur. and Energy are the current and energy computed following the process described in the wiki article on power measurement. For STM32 Ultra Low Power microcontrollers, the measurement is done with the X-NUCLEO-LPM01A power shield, as described in section 4.3.1 "Measure process when current is below 50 mA". For STM32 High Performance microcontrollers, the measurement is done with the Qoitec Otii Arc power analyzer, as described in section 4.3.2 "Measure process when current is above 50 mA". In both cases, the current is averaged over a 10 s window and HSI is selected as the clock source.
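As an illustration of how an averaged current turns into an energy figure, the sketch below averages current samples over the measurement window and then applies E = t × i × u. It assumes that inference runs continuously during the window, so that the average current reflects the inference current; the five sample values are hypothetical (a real 10 s window contains far more samples), and `average_energy_per_inference_mj` is an illustrative name.

```python
def average_energy_per_inference_mj(samples_ma, inference_ms, supply_v=3.3):
    """Average the current samples over the window, then apply E = t x i x u (in mJ)."""
    if not samples_ma:
        raise ValueError("need at least one current sample")
    i_avg_ma = sum(samples_ma) / len(samples_ma)
    return inference_ms * i_avg_ma * supply_v / 1000.0  # ms * mA * V = uJ, /1000 -> mJ


# Hypothetical current samples (mA) captured while a 49 ms inference runs in a loop
samples = [200, 205, 198, 203, 204]
print(round(average_energy_per_inference_mj(samples, 49), 2))  # 32.66
```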
Accuracy is not reported. X-CUBE-AI does not modify the DL/ML model topology, so the impact on accuracy should be limited. Through the "Validation" application, X-CUBE-AI provides a way to measure accuracy either on x86 or on the target, which can be used to check for any impact on accuracy. When running the "Validation on target" application, several metrics are computed; one of them, X-cross, provides error metrics between the original model executed in Python™ and the C model executed on the target. Random data can be used to compute the RMSE/MAE/L2R errors, but it is recommended to use real data to assess the final accuracy. For more details on the metrics, refer to the X-CUBE-AI Embedded Documentation.
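The sketch below shows how RMSE, MAE and a relative L2 error can be computed between the outputs of the reference model and those of the model running on the target. The L2R formula used here (L2 norm of the difference divided by the L2 norm of the reference) is an assumption; the exact definitions used by X-CUBE-AI are given in its Embedded Documentation. The output vectors are hypothetical.

```python
import math


def xcross_metrics(reference, target):
    """Return (RMSE, MAE, relative L2 error) between reference and target outputs."""
    if len(reference) != len(target) or not reference:
        raise ValueError("outputs must be non-empty and of equal length")
    diffs = [r - t for r, t in zip(reference, target)]
    n = len(diffs)
    sq_sum = sum(d * d for d in diffs)
    rmse = math.sqrt(sq_sum / n)
    mae = sum(abs(d) for d in diffs) / n
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference output has zero norm")
    l2r = math.sqrt(sq_sum) / ref_norm  # assumed definition: ||ref - target|| / ||ref||
    return rmse, mae, l2r


# Hypothetical outputs: reference model (Python) vs C model on the target
rmse, mae, l2r = xcross_metrics([0.1, 0.7, 0.2], [0.12, 0.68, 0.2])
```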
Note that an accuracy check is important when comparing a float model with a quantized model, or when using the weight compression feature of X-CUBE-AI for float models.