This article provides benchmark results for a set of well-known or reference pre-trained neural network models. Some STM32 results will be officially submitted to the MLPerf™ Tiny benchmark from MLCommons™.
1. Benchmark Results
1.1. STM32 High Performance
STM32 Board | STM32 characteristics | Model Source/Link | Flash (KiB) | RAM (KiB) | Proc Time (ms) | Cur. (mA) | Energy (mJ) 3.3V | Version |
---|---|---|---|---|---|---|---|---|
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 49 ms | 203 mA | 33 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 51 ms | 197 mA | 33 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 93 ms | 200 mA | 62 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 1.2 ms | 176 mA | 0.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 11.5 ms | 190 mA | 7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 37 ms | 190 mA | 23 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 31 ms | 198 mA | 20 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 49 ms | 97 mA | 16 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 51 ms | 95 mA | 16 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 93 ms | 96 mA | 30 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 1.2 ms | 86 mA | 0.34 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 11.5 ms | 97.5 mA | 3.74 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 37 ms | 91 mA | 11 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H735 SMPS STM32H735G-DK | Flash 1MB RAM 564KB (432) Freq 550MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 31 ms | 94 mA | 9.6 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 49 ms | 203 mA | 33 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 59 ms | 192 mA | 37 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 108 ms | 194 mA | 69 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 1.4 ms | 168 mA | 0.78 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 13 ms | 196 mA | 8.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 42 ms | 183 mA | 25.6 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H743 NUCLEO-H743ZI | Flash 2MB RAM 1MB (512) Freq 480MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 36 ms | 189 mA | 22 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 68 ms | 68 mA | 15 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 70.5 ms | 69.5 mA | 16 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 130 ms | 69.5 mA | 30 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 1.6 ms | 64 mA | 0.34 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 16 ms | 70 mA | 3.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 51 ms | 66 mA | 11 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H747 SMPS STM32H747I-DISCO | Cortex® M7 Flash 2MB RAM 1MB (0.5) Freq 400MHz(1) | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 43 ms | 68.5 mA | 9.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 96 ms | 44 mA | 14 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 100 ms | 43.5 mA | 14 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 184 ms | 44 mA | 26 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 2.3 ms | 40 mA | 0.3 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 23 ms | 44.5 mA | 3.3 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 72 ms | 42 mA | 10 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 61 ms | 43 mA | 9 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 97.5 ms | 40.6 mA | 13 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 101 ms | 40.2 mA | 13 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 185 ms | 40.4 mA | 25 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 2.3 ms | 36 mA | 0.28 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 22.9 ms | 40.1 mA | 3 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 72.5 ms | 38.4 mA | 9.2 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz AXI 140MHz(2) | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 61.4 ms | 39.6 mA | 8 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 124 ms | 28.4 mA | 11.6 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 128 ms | 28 mA | 11.8 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 235 ms | 28.1 mA | 21.8 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 2.9 ms | 25.5 mA | 0.24 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 29.1 ms | 28.7 mA | 2.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 92.3 ms | 26.8 mA | 8.2 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 220MHz (280) AXI 110MHz(3) | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 78 ms | 27.6 mA | 7.1 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
(1) On the Cortex®-M7 core in SMPS mode, the frequency is limited to 400 MHz instead of the 480 MHz maximum available in LDO mode.
(2) The MCU core frequency is set to its maximum of 280 MHz, but the HPRE prescaler is set to /2 to provide an HCLK/AXI frequency of 140 MHz instead of the maximum 280 MHz. Note that this setting might impact the rest of the system in a broader application, for instance DMA transfer or Chrom-ART speed.
(3) The MCU core frequency is set to 220 MHz and the HPRE prescaler is set to /2 to provide an HCLK/AXI frequency of 110 MHz, which gives an optimal power setting with limited impact on latency. In particular, the Voltage Scale power regulator can be set to VOS1 instead of VOS0. Note that this setting might impact the rest of the system in a broader application, for instance DMA transfer or Chrom-ART speed.
For a given STM32 in a fixed configuration, the current consumption stays in the same range regardless of the model. It might, however, vary with the complexity and topology of the model.
The following table provides the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from the measurement of its inference time alone.
STM32 Board | STM32H723 550 MHz | STM32H735 SMPS 550 MHz | STM32H743 480 MHz | STM32H747 400 MHz SMPS | STM32H7A3 280 MHz |
---|---|---|---|---|---|
Average current (mA) | 196 | 95 | 191 | 69 | 43 |
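As a rough illustration of this first estimate, the sketch below multiplies the 3.3 V supply voltage by the average current and the measured inference time. The function name and the 80 ms inference time are hypothetical; the average currents are copied from the table above. The same formula reproduces the energy column of the benchmark table (for example 3.3 V × 203 mA × 49 ms ≈ 33 mJ).

```python
SUPPLY_VOLTAGE_V = 3.3  # supply voltage stated in the "Energy (mJ) 3.3V" column

# Average currents in mA, copied from the table above.
AVERAGE_CURRENT_MA = {
    "STM32H723 550 MHz": 196,
    "STM32H735 SMPS 550 MHz": 95,
    "STM32H743 480 MHz": 191,
    "STM32H747 400 MHz SMPS": 69,
    "STM32H7A3 280 MHz": 43,
}


def estimate_energy_mj(inference_time_ms, current_ma, voltage_v=SUPPLY_VOLTAGE_V):
    """Estimate the energy of one inference in mJ.

    V * A * ms gives mJ, so the current is converted from mA to A.
    """
    return voltage_v * (current_ma / 1000.0) * inference_time_ms


# Hypothetical new model measured at 80 ms on an STM32H723 at 550 MHz.
energy = estimate_energy_mj(80.0, AVERAGE_CURRENT_MA["STM32H723 550 MHz"])
print(f"Estimated energy: {energy:.1f} mJ")
```

This is only a first-order estimate: as noted above, the actual current can vary with the complexity and topology of the model.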
STM32Cube.AI can also generate a TensorFlow Lite for Microcontrollers (TFLm) runtime implementation (version 2.5.0 for STM32Cube.AI v7.0.0).
The following table compares the TFLm runtime to the X-CUBE-AI runtime.
STM32 Board | STM32 characteristics | Model Source/Link | Runtime | Flash (KiB) | RAM (KiB) | Proc Time (ms) | Version |
---|---|---|---|---|---|---|---|
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | X-CUBE-AI | 77 KB | 49 KB | 37 ms | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | TFL micro | 96 KB | 53 KB | 61 ms | TFLm 2.5.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | X-CUBE-AI | 214 KB | 37 KB | 31 ms | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H723 NUCLEO-H723ZG | Flash 1MB RAM 564KB (432) Freq 550MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | TFL micro | 325 KB | 98 KB | 42 ms | TFLm 2.5.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | X-CUBE-AI | 77 KB | 49 KB | 72 ms | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | TFL micro | 96 KB | 53 KB | 120 ms | TFLm 2.5.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | X-CUBE-AI | 214 KB | 37 KB | 61 ms | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32H7A3 SMPS NUCLEO-H7A3ZI-Q | Flash 2MB RAM 1.4MB (1.18) Freq 280MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | TFL micro | 325 KB | 98 KB | 84 ms | TFLm 2.5.0 Cube IDE 1.7.0 |
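To read the comparison as relative figures, the following sketch computes how much more Flash, RAM and processing time one runtime needs relative to the other. The helper names are hypothetical, and the values are copied from the STM32H723 Image Classif. rows of the table above.

```python
# Figures copied from the comparison table above (STM32H723, Image Classif. model).
X_CUBE_AI = {"flash_kb": 77, "ram_kb": 49, "time_ms": 37}
TFL_MICRO = {"flash_kb": 96, "ram_kb": 53, "time_ms": 61}


def relative_overhead(reference, other):
    """Return (other - reference) / reference; 0.25 means 25 % more."""
    return (other - reference) / reference


def compare(reference_row, other_row):
    """Compute the relative overhead of other_row for every metric."""
    return {
        key: relative_overhead(reference_row[key], other_row[key])
        for key in reference_row
    }


for metric, overhead in compare(X_CUBE_AI, TFL_MICRO).items():
    print(f"{metric}: {overhead:+.0%}")
```

The same helper can be applied to any pair of rows of the table that share the board and the model.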
1.2. STM32 Ultra Low Power
STM32 Board | STM32 characteristics | Model Source/Link | Flash Wgt. | RAM Buf. | Proc Time | Cur. (mA) | Energy (mJ) 3.3V | Version |
---|---|---|---|---|---|---|---|---|
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 249 ms | 10 mA | 8.2 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 250 ms | 9.1 mA | 7.5 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 468 ms | 9.4 mA | 14.5 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 5.7 ms | 9.1 mA | 0.17 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 61 ms | 10 mA | 2 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 164 ms | 9 mA | 4.9 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 157 ms | 9.1 mA | 4.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 361 ms | 5.7 mA | 6.8 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 363 ms | 5.6 mA | 6.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 681 ms | 5.8 mA | 13 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 8.3 ms | 5.6 mA | 0.16 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 88.2 ms | 5.8 mA | 1.7 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 239 ms | 5.6 mA | 4.4 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 110MHz(4) | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 229 ms | 5.7 mA | 4.3 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 420 ms | 23 mA | 32 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 433 ms | 23 mA | 33 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 781 ms | 24 mA | 63 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 9 ms | 22 mA | 0.66 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 104 ms | 24 mA | 8.3 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 255 ms | 24 mA | 20 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Single Bank RAM 640KB Freq 120MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 260 ms | 23 mA | 20 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | MobileNet v1 0.25 128 quant tfl source | 468 KB | 66 KB | 427 ms | 24 mA | 34 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | FoodReco quant h5 deriv MobileNet FP-AI-VISION1 | 132 KB | 148 KB | 455 ms | 24 mA | 36 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | Person Presence MobileNet v2 128 FP-AI-VISION1 | 403 KB | 197 KB | 813 ms | 25 mA | 66 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 9.7 ms | 23 mA | 0.73 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 111 ms | 24 mA | 9 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 263 ms | 25 mA | 22 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32L4R5 NUCLEO-L4R5ZI | Flash 2MB Dual Bank RAM 640KB Freq 120MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 270 ms | 24 mA | 21 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1MB RAM 256KB Freq 64MHz | Anomaly Detection v0.5 tfl MLPerf™ Tiny | 265 KB | 0.75 KB | 17 ms | 9.8 mA | 0.55 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1MB RAM 256KB Freq 64MHz | Key Word Spotting v0.5 tfl MLPerf™ Tiny | 24 KB | 18 KB | 180 ms | 11 mA | 6.6 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1MB RAM 256KB Freq 64MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | 77 KB | 49 KB | 457 ms | 11 mA | 16 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32WB55 SMPS NUCLEO-WB55RG | Flash 1MB RAM 256KB Freq 64MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | 214 KB | 37 KB | 459 ms | 11 mA | 16 mJ | Cube AI 7.0.0 Cube IDE 1.7.0 |
(4) The MCU core frequency is set to 110 MHz; the Voltage Scale power regulator can then be set to VOS2 instead of VOS1. Note that this setting might impact the rest of the system in a broader application, for instance DMA transfer or Chrom-ART speed.
The following table provides the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from the measurement of its inference time alone.
STM32 Board | STM32U585 160 MHz | STM32L4R5 120 MHz Single Bank | STM32L4R5 120 MHz Dual Bank |
---|---|---|---|
Average current (mA) | 9.5 | 23.7 | 24.3 |
STM32Cube.AI can also generate a TensorFlow Lite for Microcontrollers (TFLm) runtime implementation (version 2.5.0 for STM32Cube.AI v7.0.0).
The following table compares the TFLm runtime to the X-CUBE-AI runtime.
STM32 Board | STM32 characteristics | Model Source/Link | Runtime | Flash (KiB) | RAM (KiB) | Proc Time (ms) | Version |
---|---|---|---|---|---|---|---|
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | X-CUBE-AI | 77 KB | 49 KB | 164 ms | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Image Classif. v0.5 tfl MLPerf™ Tiny | TFL micro | 96 KB | 53 KB | 315 ms | TFLm 2.5.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | X-CUBE-AI | 214 KB | 37 KB | 157 ms | Cube AI 7.0.0 Cube IDE 1.7.0 |
STM32U585 SMPS B-U585I-IOT02A | Flash 2MB RAM 786KB Freq 160MHz | Visual Wake Word v0.5 tfl MLPerf™ Tiny | TFL micro | 325 KB | 98 KB | 216 ms | TFLm 2.5.0 Cube IDE 1.7.0 |
2. Measure process
In this benchmark, only the machine learning model inference processing is reported. In a complete application, the sensor acquisition, data conditioning and pre-processing must also be considered.
The STM32 Board column indicates the STM32 reference and the board used for the measurement. By default, the STM32 is configured for maximum performance, that is, at maximum frequency and in particular with the HCLK/AXI clock at its maximum frequency. When a different setting is used, it is specified (for instance a lower frequency to use a different Voltage Scale or, for STM32H7, a lower HCLK/AXI frequency). When SMPS is indicated, the internal voltage regulator used is the SMPS (Switched-Mode Power Supply) step-down converter instead of the LDO (low-dropout linear voltage regulator).
The STM32 characteristics column provides the available internal Flash size, the full internal RAM size and the frequency. The RAM size includes the different kinds of memories and banks (TCM, SRAM, etc.). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; the maximum RAM size available in a contiguous area is given in parentheses when it differs from the full size. The frequency indicated is the operating frequency used for the benchmark, generally the maximum frequency. The only different case is the STM32H747 Discovery Kit, which operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz. Data are rounded to 3 decimals.
The memory footprints are those reported by X-CUBE-AI using the "Analyze" function (the X-CUBE-AI version used is mentioned in the table).
The Model Source/Link column indicates the pre-trained ML model and its source: either how it was built and trained or where it can be downloaded. tfl stands for a TensorFlow™ Lite .tflite model, h5 for a Keras .h5 model, and quant for models quantized on 8 bits. The FP-AI-VISION1 models are located in the package directory FP-AI-VISION1_V3.0.0\Utilities\AI_resources.
The Flash column reports the Flash occupancy, including the model weights, the runtime code generated by X-CUBE-AI to run the neural network, and its constants (including the initialized tables).
The RAM Buf. column reports the RAM buffer occupancy, used to store the model activations as well as the input and output buffers, plus the RAM required by the runtime to run the model inference. Note that, to save RAM, the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options are selected (through the X-CUBE-AI Advanced Settings panel).
The Flash and RAM footprints related to the runtime/code execution are computed from the memory map once the X-CUBE-AI validation project for the given model has been built with STM32CubeIDE. For the X-CUBE-AI runtime, the runtime/code part is computed taking into account the NetworkRuntime700_CMx_GCC.a, libc_nano.a and network.o memory footprints. For the TensorFlow Lite for Microcontrollers runtime, the runtime/code part is computed taking into account all the modules used by tflite_micro and libc_nano.a.
The Proc Time column reports the model inference processing time. When the current or energy is indicated, the measurement is done with the X-CUBE-AI "System Performance" application, following the process described in the wiki article on power measurement. Otherwise, the "Validation on target" application is used. In all cases, when generating the application, the selected clock source is always the HSI: X-CUBE-AI first generates the optimal clock settings and, if needed, the clock source is set to HSI afterwards. STM32CubeMX then autonomously reconfigures the clock settings.
Cur. and Energy are the current and energy computed following the process described in the wiki article on power measurement. For STM32 Ultra Low Power microcontrollers, measurements are done with the X-NUCLEO-LPM01A power shield as described in section 4.3.1 "Measure process when current is below 50 mA". For STM32 High Performance microcontrollers, measurements are done with the Qoitech Otii Arc power analyzer as described in section 4.3.2 "Measure process when current is above 50 mA". In both cases, a 10 s window is used for averaging and HSI is selected as the clock source.
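The averaging step can be pictured with the minimal sketch below. It assumes the power analyzer exports uniformly spaced (timestamp in seconds, current in mA) samples; this export format and the function name are assumptions for illustration, not the actual output of the tools mentioned above.

```python
def windowed_average_ma(samples, window_s=10.0):
    """Average the current over the first window_s seconds of a capture.

    samples: list of (timestamp_s, current_ma) tuples sorted by timestamp.
    Uniform sample spacing is assumed, so a plain mean is used.
    """
    if not samples:
        raise ValueError("no samples provided")
    start = samples[0][0]
    in_window = [current for t, current in samples if t - start < window_s]
    return sum(in_window) / len(in_window)


# Hypothetical capture: the last sample falls outside the 10 s window.
capture = [(0.0, 10.0), (5.0, 20.0), (9.9, 30.0), (12.0, 1000.0)]
print(windowed_average_ma(capture))
```

With non-uniform sampling, a time-weighted average would be needed instead of a plain mean.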
Accuracy is not reported. X-CUBE-AI does not modify the DL/ML model topology, so the impact on accuracy should be limited. Through the "Validation" application, X-CUBE-AI provides a way to measure the accuracy either on x86 or on the target; it can be used to check the possible impact on accuracy. When running the "Validation on target" application, several metrics are computed. One of them is the X-Cross, which provides error metrics between the original model executed in Python and the C model executed on the target. Random data can be used to compute the RMSE/MAE/L2R errors; however, it is recommended to use true data to get the final accuracy. For more details on the metrics, refer to the X-CUBE-AI Embedded Documentation.
Note that the accuracy check is important when comparing a float model to a quantized model, or when using the weight compression feature of X-CUBE-AI for float models.
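As an illustration of what such error metrics measure, the sketch below compares reference outputs with the outputs of a generated model. It is not the X-CUBE-AI implementation: the function names are hypothetical, and L2R is assumed here to be the L2 norm of the error divided by the L2 norm of the reference. The X-CUBE-AI Embedded Documentation remains the reference for the exact definitions.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between two equally sized sequences."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(reference, predicted):
    """Mean absolute error between two equally sized sequences."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def l2r(reference, predicted):
    """Relative L2 error (assumed definition): ||ref - pred|| / ||ref||."""
    error_norm = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)))
    reference_norm = math.sqrt(sum(r ** 2 for r in reference))
    return error_norm / reference_norm


# Hypothetical outputs of the original model and of the C model.
original_outputs = [1.0, 2.0, 2.0]
c_model_outputs = [1.0, 2.0, 5.0]
print(rmse(original_outputs, c_model_outputs))
print(mae(original_outputs, c_model_outputs))
print(l2r(original_outputs, c_model_outputs))
```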