STM32Cube.AI model performances

Revision as of 10:11, 4 August 2022 by Registered User

This article provides performance results for a set of well-known or reference pre-trained Neural Network models.

Performance metrics verified by the MLCommons association have been published in the MLPerf™ Tiny v0.7 benchmark. Below are additional performance metrics measured by STMicroelectronics, which have not been verified by MLCommons [ST 1].

Information
  • X-CUBE-AI[ST 2] is an expansion software package for STM32CubeMX that generates optimized C code for neural network inference on STM32 microcontrollers. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement[ST 3], with the additional component license schemes listed in the product data brief[ST 4].
  • The inference time, current and energy measurement process described in this article is not carried out in a certified laboratory, but it can be reproduced by any user. The results are average values, which may vary depending on the input data (random data are currently used), the temperature, and the STM32 device itself.
  • The data published in this article are not contractual.
  • Copyright STMicroelectronics - All rights reserved. Do not publish the following data without the written consent of STMicroelectronics.

1 Performance results

1.1 STM32 High Performance MCUs

STM32 High Performance MCUs inference time, memory footprint and energy at 3.3 V:

| STM32 Board | STM32 characteristics | Model Source/Link | Flash total (Kbytes) | RAM total (Kbytes) | Proc Time (ms) | Cur. (mA) | Energy at 3.3 V (mJ) | Version |
|---|---|---|---|---|---|---|---|---|
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | MobileNet v1 0.25 128 quant tfl, source | 516 | 80 | 46 | 77 | 12 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | FoodReco quant h5 deriv MobileNet, FP-AI-VISION1 | 177 | 161 | 50 | 70 | 11 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | Person Presence MobileNet v2 128, FP-AI-VISION1 | 542 | 261 | 83 | 71 | 19 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | Anomaly Detection, MLPerf™ Tiny | 282 | 6.87 | 0.9 | 61 | 0.18 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | Key Word Spotting, MLPerf™ Tiny | 85 | 25 | 15.5 | 62 | 2.13 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | Image Classif., MLPerf™ Tiny | 141 | 50 | 28 | 65 | 6 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H735 SMPS, STM32H735G-DK | Flash 1 Mbyte, RAM 564 Kbytes (432), Freq 550 MHz | Visual Wake Word, MLPerf™ Tiny | 301 | 58 | 24 | 64 | 5 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | MobileNet v1 0.25 128 quant tfl, source | 516 | 80 | 64 | 54 | 11.4 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | FoodReco quant h5 deriv MobileNet, FP-AI-VISION1 | 177 | 161 | 69 | 53 | 12 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | Person Presence MobileNet v2 128, FP-AI-VISION1 | 542 | 261 | 115 | 69.5 | 52 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | Anomaly Detection, MLPerf™ Tiny | 282 | 6.87 | 1.22 | 53 | 0.21 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | Key Word Spotting, MLPerf™ Tiny | 85 | 25 | 14.4 | 54 | 2.57 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | Image Classif., MLPerf™ Tiny | 141 | 50 | 39 | 54.4 | 7 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H747 SMPS, STM32H747I-DISCO, Cortex®-M7 | Flash 2 Mbytes, RAM 1 Mbyte (0.5), Freq 400 MHz(1) | Visual Wake Word, MLPerf™ Tiny | 301 | 58 | 43 | 37 | 6.57 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | MobileNet v1 0.25 128 quant tfl, source | 516 | 80 | 91 | 42.8 | 12.8 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | FoodReco quant h5 deriv MobileNet, FP-AI-VISION1 | 177 | 161 | 97.8 | 42 | 14.6 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Person Presence MobileNet v2 128, FP-AI-VISION1 | 542 | 261 | 163 | 41 | 22 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Anomaly Detection, MLPerf™ Tiny | 282 | 6.87 | 1.83 | 40 | 0.2 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Key Word Spotting, MLPerf™ Tiny | 85 | 25 | 20.6 | 42.4 | 2.8 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Image Classif., MLPerf™ Tiny | 141 | 50 | 55.8 | 42.6 | 7.8 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Visual Wake Word, MLPerf™ Tiny | 301 | 58 | 53 | 41.6 | 7.8 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |

(1) The Cortex®-M7 core runs at 400 MHz in SMPS mode, instead of the 480 MHz maximum available in LDO mode. The Cortex®-M4 core runs a while(1) infinite loop.

For a given STM32 in a fixed configuration, the current consumption stays in the same range regardless of the model. It may however vary with the complexity and topology of the model. The following table provides the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from just the measurement of its inference time: for an average inference time of t seconds, an average current of i amperes and an input voltage of u volts, the average energy in joules is t × i × u.

| STM32 Board | STM32H735 550 MHz SMPS | STM32H747 400 MHz SMPS | STM32H7A3 280 MHz SMPS |
|---|---|---|---|
| Average current (mA) | 68 | 54 | 42 |
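The first-order estimate described above (energy = t × i × u) can be sketched in a few lines of Python. This is only an illustrative helper, not part of X-CUBE-AI: the function name `estimate_energy_mj` and the dictionary `AVERAGE_CURRENT_MA` are assumptions introduced here, and the 28 ms inference time is the value reported above for the Image Classif. model on the STM32H735.

```python
# Average currents (mA) taken from the table above, at 3.3 V.
AVERAGE_CURRENT_MA = {
    "STM32H735": 68.0,  # 550 MHz, SMPS
    "STM32H747": 54.0,  # 400 MHz, SMPS
    "STM32H7A3": 42.0,  # 280 MHz, SMPS
}


def estimate_energy_mj(inference_time_ms, avg_current_ma, supply_voltage_v=3.3):
    """Return the estimated energy per inference in millijoules.

    Energy (J) = t (s) * i (A) * u (V); inputs are given in ms and mA
    to match the units used in the tables.
    """
    t_s = inference_time_ms / 1000.0
    i_a = avg_current_ma / 1000.0
    return t_s * i_a * supply_voltage_v * 1000.0  # J -> mJ


# Example: a model measured at 28 ms on the STM32H735.
energy = estimate_energy_mj(28.0, AVERAGE_CURRENT_MA["STM32H735"])
print(f"Estimated energy: {energy:.2f} mJ")
```

With these inputs the estimate is about 6.3 mJ, in the same range as the 6 mJ measured for that model with its own 65 mA current.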


STM32Cube.AI (X-CUBE-AI) can also generate a TensorFlow™ Lite for Microcontrollers (TFLm) runtime implementation (based on TensorFlow™ version 2.7.0, sha-1 = 86c8d52, for STM32Cube.AI v7.2.0). The following table compares the TFLm runtime to the X-CUBE-AI runtime. The Flash and RAM footprints include the code/runtime footprint on top of the weights and activation buffer.

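| STM32 Board | STM32 characteristics | Model Source/Link | Runtime | Flash (Kbytes) | RAM (Kbytes) | Proc Time (ms) | Version |
|---|---|---|---|---|---|---|---|
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Image Classif., MLPerf™ Tiny | X-CUBE-AI | 141 | 50 | 56 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Image Classif., MLPerf™ Tiny | TFLm | 148 | 55 | 99 | TFLm sha-1 = 86c8d52, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Visual Wake Word, MLPerf™ Tiny | X-CUBE-AI | 301 | 58 | 53 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32H7A3 SMPS, NUCLEO-H7A3ZI-Q | Flash 2 Mbytes, RAM 1.4 Mbyte (1.18), Freq 280 MHz | Visual Wake Word, MLPerf™ Tiny | TFLm | 381 | 102 | 72 | TFLm sha-1 = 86c8d52, STM32CubeIDE 1.9.0 |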

1.2 STM32 Ultra Low Power MCUs

STM32 Ultra Low Power MCUs inference time, memory footprint and energy at 3.3 V:

| STM32 Board | STM32 characteristics | Model Source/Link | Flash total (Kbytes) | RAM total (Kbytes) | Proc Time (ms) | Cur. (mA) | Energy at 3.3 V (mJ) | Version |
|---|---|---|---|---|---|---|---|---|
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | MobileNet v1 0.25 128 quant tfl, source | 516 | 80 | 238 | 9.3 | 7.3 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | FoodReco quant h5 deriv MobileNet, FP-AI-VISION1 | 177 | 161 | 244 | 9.5 | 7.6 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Person Presence MobileNet v2 128, FP-AI-VISION1 | 542 | 261 | 399 | 9.9 | 13.1 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Anomaly Detection, MLPerf™ Tiny | 282 | 6.87 | 4.8 | 9 | 0.14 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Key Word Spotting, MLPerf™ Tiny | 85 | 25 | 52 | 8.3 | 1.4 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Image Classif., MLPerf™ Tiny | 142 | 50 | 148 | 9.48 | 4.6 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585 SMPS, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Visual Wake Word, MLPerf™ Tiny | 302 | 58 | 134 | 9.29 | 4.2 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | MobileNet v1 0.25 128 quant tfl, source | 516 | 80 | 408 | 24 | 33 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | FoodReco quant h5 deriv MobileNet, FP-AI-VISION1 | 177 | 161 | 487 | 25 | 40 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | Person Presence MobileNet v2 128, FP-AI-VISION1 | 542 | 261 | 696 | 26 | 60 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | Anomaly Detection, MLPerf™ Tiny | 282 | 6.87 | 7.6 | 21.56 | 0.54 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | Key Word Spotting, MLPerf™ Tiny | 85 | 25 | 96.5 | 22 | 7 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | Image Classif., MLPerf™ Tiny | 142 | 50 | 244 | 32.5 | 26 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32L4R5 LDO, NUCLEO-L4R5ZI | Flash 2 Mbytes Single Bank, RAM 640 Kbytes, Freq 120 MHz | Visual Wake Word, MLPerf™ Tiny | 302 | 58 | 233 | 28 | 21 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32G474 LDO, NUCLEO-G474REI | Flash 512 Kbytes, RAM 128 Kbytes, Freq 170 MHz | Anomaly Detection, MLPerf™ Tiny | 282 | 6.87 | 5.15 | 30 | 0.52 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32G474 LDO, NUCLEO-G474REI | Flash 512 Kbytes, RAM 128 Kbytes, Freq 170 MHz | Key Word Spotting, MLPerf™ Tiny | 85 | 25 | 65.9 | 31.8 | 6.9 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32G474 LDO, NUCLEO-G474REI | Flash 512 Kbytes, RAM 128 Kbytes, Freq 170 MHz | Image Classif., MLPerf™ Tiny | 142 | 50 | 171 | 32.6 | 18.42 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32G474 LDO, NUCLEO-G474REI | Flash 512 Kbytes, RAM 128 Kbytes, Freq 170 MHz | Visual Wake Word, MLPerf™ Tiny | 302 | 58 | 162 | 31.5 | 16.8 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |

The following table provides the average current consumption of the models listed in the table above (excluding the Anomaly Detection model, which has a specific topology). These data can be used as a first estimate of the current and energy consumption of a new model from just the measurement of its inference time: for an average inference time of t seconds, an average current of i amperes and an input voltage of u volts, the average energy in joules is t × i × u.

| STM32 Board | STM32U585 160 MHz SMPS | STM32L4R5 120 MHz LDO Single Bank | STM32G474 170 MHz LDO |
|---|---|---|---|
| Average current (mA) | 9.3 | 26.5 | 31.7 |



2 SMPS vs LDO

Current, energy and inference time for the SMPS and LDO power configurations at 3.3 V:

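| STM32 Board | STM32 characteristics | Model Source/Link | PWR config | Cur. (mA) | Energy (mJ) | Proc Time (ms) | Version |
|---|---|---|---|---|---|---|---|
| STM32U585, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Image Classif., MLPerf™ Tiny | SMPS | 9.48 | 4.6 | 148 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Image Classif., MLPerf™ Tiny | LDO | 18 | 8.8 | 148 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Visual Wake Word, MLPerf™ Tiny | SMPS | 9.29 | 4.2 | 134 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |
| STM32U585, NUCLEO-U575ZI-Q | Flash 2 Mbytes, RAM 786 Kbytes, Freq 160 MHz | Visual Wake Word, MLPerf™ Tiny | LDO | 17.58 | 7.79 | 134 | STM32Cube.AI 7.2.0, STM32CubeIDE 1.9.0 |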


3 Measurement process

These performance results cover only the machine learning model inference processing. In a complete application, the sensor acquisition, the data conditioning and the pre-processing must also be considered.

The STM32 Board column indicates the STM32 reference and the board used for the measurement. By default, the STM32 is configured for maximum performance, that is, at its maximum frequency and in particular with the HCLK/AXI clock at its maximum frequency. Any different setting is specified (for instance, a lower frequency to use a different voltage scale or, for STM32H7, a lower HCLK/AXI frequency). Many STM32 devices embed a switched-mode power supply (SMPS) that can be used to improve power efficiency when the supply voltage is high enough. When it is used instead of the integrated low-dropout regulator (LDO), power consumption is reduced by a factor equal to the ratio of the internal VCORE supply voltage to the VDD voltage. The improvement due to the SMPS depends only on the SMPS efficiency and the VDD voltage. When SMPS is indicated, the internal voltage regulator used is the SMPS step-down converter instead of the LDO.
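The SMPS versus LDO relationship described above can be illustrated with a small idealized model. This is only a sketch under stated assumptions: it treats the whole measured current as core current, the function name `ideal_smps_current_ma` is introduced here for illustration, and the VCORE value and converter efficiency used in the example are hypothetical placeholders, not datasheet figures. The measured currents are those of the Image Classif. model on the STM32U585 in the SMPS vs LDO table.

```python
def ideal_smps_current_ma(ldo_current_ma, vcore_v, vdd_v, efficiency=1.0):
    """Estimate the supply current with the SMPS from the current seen with the LDO.

    With an LDO the supply current equals the core current. With a step-down
    converter, input power = core power / efficiency, hence
    I_smps = I_ldo * (VCORE / VDD) / efficiency.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if vcore_v <= 0.0 or vdd_v <= 0.0:
        raise ValueError("voltages must be positive")
    return ldo_current_ma * (vcore_v / vdd_v) / efficiency


# Measured values from the SMPS vs LDO table (Image Classif., STM32U585, 3.3 V).
LDO_CURRENT_MA = 18.0
SMPS_CURRENT_MA = 9.48
measured_ratio = SMPS_CURRENT_MA / LDO_CURRENT_MA

# Hypothetical values chosen only to exercise the model (assumptions).
ASSUMED_VCORE_V = 1.2
ASSUMED_EFFICIENCY = 0.9

predicted = ideal_smps_current_ma(
    LDO_CURRENT_MA, ASSUMED_VCORE_V, 3.3, ASSUMED_EFFICIENCY
)
print(f"Measured SMPS/LDO current ratio: {measured_ratio:.2f}")
print(f"Idealized SMPS current estimate: {predicted:.2f} mA")
```

The idealized estimate will generally differ from the measurement, since part of the current does not flow through the core regulator and the real efficiency depends on the load.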

The STM32 Characteristics column provides the available internal Flash size, the full internal RAM size and the frequency. The RAM size includes the different kinds of memories and banks (TCM, SRAM, and so on). For the time being, the buffers used by X-CUBE-AI must be placed in a contiguous memory area; the maximum RAM size available in a contiguous area is given in parentheses when it differs from the full size. The frequency indicated is the operating frequency used for the test, which is generally the maximum frequency. The only exception is the STM32H747 Discovery kit (STM32H747I-DISCO), which operates by default in SMPS power mode and is therefore limited to 400 MHz instead of 480 MHz. Data are rounded to three decimal places.
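The contiguous-RAM constraint described above lends itself to a quick feasibility check before trying a model on a board. The sketch below is an assumption-laden illustration, not an X-CUBE-AI feature: the `Board` and `Model` classes and the `fits` function are introduced here, the check ignores the Flash and RAM needed by the rest of the application, and the "Hypothetical model" entry is invented purely to show a failing case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    name: str
    flash_kbytes: float
    contiguous_ram_kbytes: float  # value given in parentheses in the tables


@dataclass(frozen=True)
class Model:
    name: str
    flash_kbytes: float
    ram_kbytes: float


def fits(board, model):
    """First-order check: the model must fit in Flash and in contiguous RAM."""
    return (
        model.flash_kbytes <= board.flash_kbytes
        and model.ram_kbytes <= board.contiguous_ram_kbytes
    )


# STM32H735G-DK: Flash 1 Mbyte, RAM 564 Kbytes of which 432 Kbytes contiguous.
h735 = Board("STM32H735G-DK", flash_kbytes=1024, contiguous_ram_kbytes=432)

# Footprints reported above for the Person Presence model.
person_presence = Model("Person Presence", flash_kbytes=542, ram_kbytes=261)

# Hypothetical model: below the 564 Kbytes total RAM, but above the
# 432 Kbytes available in a single contiguous area.
hypothetical = Model("Hypothetical model", flash_kbytes=300, ram_kbytes=500)

for model in (person_presence, hypothetical):
    status = "fits" if fits(h735, model) else "does not fit"
    print(f"{model.name} {status} on {h735.name}")
```

The second case shows why the contiguous figure, not the total RAM size, is the one to compare against.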

The Model Source/Link column indicates the pre-trained ML model and its source, that is, either how it was built and trained or where it can be downloaded. tfl stands for a TensorFlow™ Lite .tflite model, h5 for a Keras .h5 model, and quant for a model quantized on 8 bits. The FP-AI-VISION1 models are located in the package directory: FP-AI-VISION1_V3.0.0\Utilities\AI_resources.

The memory footprints are those reported by X-CUBE-AI using the "Analyze" function (the version of X-CUBE-AI used is mentioned in the table).
The Flash column reports the Flash occupancy, including the model weights, the runtime code generated by X-CUBE-AI to run the neural network, and its constants (including the initialized tables).
The RAM column reports the occupancy of the RAM buffers used to store the model activations and the input and output buffers, plus the RAM required by the runtime to run the model inference. Note that, to save RAM, the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options are selected (through the X-CUBE-AI Advanced Settings panel).
For the X-CUBE-AI runtime, the total Flash and RAM memory footprints are reported after an "Analyze" operation on the main panel by the Used Flash and Used RAM fields. The compiler used is the GCC compiler embedded in STM32CubeIDE. Limitation: on X-CUBE-AI version 7.1 and below, the STM32U5 memory footprints reported in these fields do not include the runtime/code parts (the bug will be fixed in version 7.2), but they will be identical to the ones obtained with the NUCLEO-L552ZE-Q (also based on a Cortex®-M33).
For the TensorFlow™ Lite for Microcontrollers runtime, the Flash and RAM memory footprints related to the runtime/code execution are computed from the memory map of the validation project of the given model built with STM32CubeIDE. The runtime/code part is computed taking into account all the modules used by tflite_micro. The STM32CubeIDE build options for TensorFlow™ Lite for Microcontrollers are the optimal ones (best compromise between speed and code size): -Ofast for the GCC compiler and -Osize for the G++ compiler.

The Proc Time column reports the model inference processing time. When the current and energy are indicated, the measurement is obtained through the X-CUBE-AI "System Performance" application, following the process described in the wiki article on power measurement. Otherwise, the "Validation on target" application is used. In all cases, HSI is the clock source selected when generating the application: X-CUBE-AI first generates the optimal clock settings, the clock source is then set to HSI, and STM32CubeMX automatically reconfigures the clock settings.

Cur. and Energy are the current and energy computed following the process described in the wiki article on power measurement. For STM32 Ultra Low Power microcontrollers, the measurement is done with the X-NUCLEO-LPM01A power shield, as described in section 4.3.1 "Measure process when current is below 50 mA". For STM32 High Performance microcontrollers, the measurement is done with the Qoitech Otii Arc power analyzer, as described in section 4.3.2 "Measure process when current is above 50 mA". In both cases, a 10 s window is used for averaging and HSI is selected as the clock source.

Accuracy is not reported. X-CUBE-AI does not modify the DL/ML model topology, so the impact on accuracy should be limited. Through the "Validation" application, X-CUBE-AI provides a way to measure the accuracy either on x86 or on the target, which can be used to check the possible impact on accuracy. When running the "Validation on target" application, several metrics are computed. One of them is the X-Cross, which provides error metrics between the original model executed in Python™ and the C model executed on the target. Random data can be used to compute the RMSE/MAE/L2R errors; however, it is recommended to use true data to get the final accuracy. For more details on the metrics, refer to the X-CUBE-AI Embedded Documentation.

Note that an accuracy check is important when comparing a float model with a quantized model, or when using the weight compression feature of X-CUBE-AI for float models.
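To make the RMSE/MAE/L2R error metrics mentioned above more concrete, the sketch below computes them between a reference output and the output of a generated model. It uses the textbook definitions (root mean square error, mean absolute error, and L2 norm of the error relative to the L2 norm of the reference); whether these match the exact formulas used by X-CUBE-AI is an assumption to be checked against its embedded documentation. The function name `error_metrics` and the sample vectors are illustrative only.

```python
import math


def error_metrics(reference, predicted):
    """Return (rmse, mae, l2r) between two equally sized sequences of floats."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [r - p for r, p in zip(reference, predicted)]
    n = len(errors)

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # L2 relative error: ||reference - predicted||_2 / ||reference||_2
    ref_norm = math.sqrt(sum(r * r for r in reference))
    err_norm = math.sqrt(sum(e * e for e in errors))
    l2r = err_norm / ref_norm if ref_norm > 0.0 else float("inf")

    return rmse, mae, l2r


# Illustrative values: outputs of the original model and of the C model.
reference_output = [1.0, 2.0, 3.0]
c_model_output = [1.0, 2.0, 4.0]

rmse, mae, l2r = error_metrics(reference_output, c_model_output)
print(f"RMSE={rmse:.4f} MAE={mae:.4f} L2R={l2r:.4f}")
```

Low values on random inputs indicate that the generated C model tracks the original closely, but only real data gives the final accuracy.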

4 STMicroelectronics references

  1. The MLPerf™ name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
  2. X-CUBE-AI Expansion Package
  3. SLA0048 software license agreement
  4. DB3788 product data brief
