Literature
- UM2237 STM32CubeProgrammer software description
- AN5054 Secure programming using STM32CubeProgrammer
- BootRom wiki article
- Secure Boot wiki article
- Security features for STM32N6 MCUs wiki article
Prerequisites
- Hardware
- STM32N6 discovery board
- Discovery MB1860 - STM32N6 (a USB-C cable is needed)
- Required tools
- IAR: v9.40.1 + IAR patch to support STM32N6 (delivered with V0.5.0) + IAR patch EWARMv9_STM32N6xx_V0.6.2
- The IAR patch is available in the STM32Cube firmware package: STM32Cube_FW_N6_Vx.x.x\Utilities\PC_Software
- STM32CubeProgrammer version 2.18.0
- STM32CubeIDE v1.17.0 or later
- python>=3.9
- pyserial>=3.5
- protobuf>=3.20.3
- tqdm>=4.64
- STEdgeAI
- Download the STEdgeAI package
1. Introduction
This article provides a step-by-step guide to encrypting neural network weights for deployment on STM32N6 platforms equipped with the Neural-ART accelerator. Weight encryption enhances the security of your AI models by protecting sensitive parameters during storage and transfer.
The encryption process relies on a protobuf-defined communication protocol between a host computer and the embedded target.
After setting encryption parameters such as keys and the number of rounds, unencrypted weights are sent to the board, encrypted by the Neural-ART hardware, and returned to the host in encrypted form.
2. Generate the specialized c-files using Neural-ART compiler
The Generate phase corresponds to the first key step in the complete process of encrypting the weights of a neural network intended to run on the STM32N6 platform with the Neural-ART accelerator.
The goal of this step is to transform a high-level neural network model (e.g., in TensorFlow Lite format) into low-level code and data files that can be executed efficiently on the target hardware with Neural-ART acceleration.
It prepares the raw data (weights) that will be encrypted in the following step.
2.1. Inputs
- Model File ( .tflite or .onnx)
This is the neural network model file containing the network architecture and trained weights.
For this example, the TFLite model delivered in the STEdgeAI package is used: C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite.
- Target Specification
Specifies the target hardware platform (STM32N6 MCU), enabling the compiler to optimize the code accordingly.
- Compiler Profile
A JSON or configuration file defines compilation options, including optimization flags, memory layouts, and optional parameters.
To support the encryption of the weights, the NPU compiler provides a specific option --encrypt-weights which generates the extra code needed to configure the stream engines to fetch the encrypted data and decrypt them on-the-fly for the processing units. Only three or four cycles of latency are added.
With this option, all weights/parameters regions are considered encrypted by the NPU compiler.
Importing the mpool in the json profile file is crucial to define and manage the memory layout for neural network weights and parameters. It ensures correct memory referencing, supports encryption workflows, and enables efficient use of embedded memory.
To enable the encryption, a profile with the option "--encrypt-weights" must be created in a JSON file.
In the C:\ST\STEdgeAI\2.2\scripts\N6_encrypt folder, create a new file neural_art_encrypt.json and paste the code below into it:
{
  "Profiles": {
    // Automatic search of best options + weight encryption option
    "profile": {
      "memory_pool": "../../Utilities/windows/targets/stm32/resources/mpools/stm32n6.mpool",
      "options": "--encrypt-weights --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto"
    }
  }
}
Save the file.
- Additional Options
Flags or parameters to customize the generation process: enabling Neural-ART support with --st-neural-art.
2.2. Command line usage example and generate process
From folder: C:\ST\STEdgeAI\2.2\Utilities\windows, launch
stedgeai.exe generate -m ..\..\scripts\N6_scripts\models\mnist_int8_io_i8.tflite --target stm32n6 --st-neural-art profile@"..\..\scripts\N6_encrypt\neural_art_encrypt.json"
The tool:
- Parses the input .tflite model, checks for compatibility, and validates the network structure.
- Generates C source files implementing the neural network inference logic, tailored to the target hardware and acceleration features.
- Extracts the trained weights from the model and formats them into raw binary initializers (.raw files) suitable for loading into target memory.
- Produces an auxiliary file c_info.json that describes the memory layout, weight locations, and other relevant information needed for subsequent steps like encryption and flashing.
- If the --encrypt-weights option is set, prepares the code and metadata to support encrypted weights, marking regions for encryption.
Success message:
C:\ST\STEdgeAI\2.2\Utilities\windows>stedgeai.exe generate -m ..\..\scripts\N6_scripts\models\mnist_int8_io_i8.tflite --target stm32n6 --st-neural-art profile@"..\..\scripts\N6_encrypt\neural_art_encrypt.json"
ST Edge AI Core v2.2.0-20266 2adc00962
>>>> EXECUTING NEURAL ART COMPILER
C:/ST/STEdgeAI/2.2/Utilities/windows/atonn.exe -i "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/mnist_int8_io_i8_OE_3_3_0.onnx" --json-quant-file "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/mnist_int8_io_i8_OE_3_3_0_Q.json" -g "network.c" --load-mdesc "C:/ST/STEdgeAI/2.2/Utilities/configs/stm32n6.mdesc" --load-mpool "C:/ST/STEdgeAI/2.2/Utilities/windows/targets/stm32/resources/mpools/stm32n6.mpool" --save-mpool-file "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_ws/neural_art__network/stm32n6.mpool" --out-dir-prefix "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_ws/neural_art__network/" --encrypt-weights --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto --output-info-file "c_info.json" --d-auto 1
--Oauto will optimize the options: max-ca-pipe values = [2, 4], alt-scheduler values = [false, true], experimental values = [false, true], conv-split-cw values = [false, true], conv-split-kw values = [false, true], conv-split-stripe-1x1 values = [false, true], O_level values = [0, 2, 4]
--Oauto best solution found: max-ca-pipe = 2, alt-scheduler = false, experimental = false, conv-split-cw = false, conv-split-kw = false, conv-split-stripe-1x1 = false, O_level = 0
<<<< DONE EXECUTING NEURAL ART COMPILER
Exec/report summary (generate)
------------------------------------------------------------------------------------------------------------------
model file : C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite
type : tflite
c_name : network
options : allocate-inputs, allocate-outputs
optimization : balanced
target/series : stm32n6npu
workspace dir : C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws
output dir : C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output
model_fmt : ss/ss per tensor
model_name : mnist_int8_io_i8
model_hash : 0xbe0a77fcd37f19c0f475d4e7bc5e94fc
params # : 20,410 items (20.00 KiB)
------------------------------------------------------------------------------------------------------------------
input 1/1 : 'Input_0_out_0', int8(1x28x28x1), 784 Bytes, QLinear(0.003921569,-128,int8), activations
output 1/1 : 'Quantize_12_out_0', int8(1x10), 10 Bytes, QLinear(0.003906250,-128,int8), activations
macc : 0
weights (ro) : 20,625 B (20.14 KiB) (1 segment) / -61,015(-74.7%) vs float model
activations (rw) : 4,065 B (3.97 KiB) (1 segment) *
ram (total) : 4,065 B (3.97 KiB) = 4,065 + 0 + 0
------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers are allocated in the activations buffer
Computing AI RT data/code size (target=stm32n6npu)..
-> compiler "gcc:arm-none-eabi-gcc" is not in the PATH
Compilation details
---------------------------------------------------------------------------------
Compiler version: 1.1.1-14
Compiler arguments: -i C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0.onnx --json-quant-file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0_Q.json -g network.c --load-mdesc C:\ST\STEdgeAI\2.2\Utilities\configs\stm32n6.mdesc --load-mpool C:\ST\STEdgeAI\2.2\Utilities\windows\targets\stm32\resources\mpools\stm32n6.mpool --save-mpool-file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws\neural_art__network\stm32n6.mpool --out-dir-prefix C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws\neural_art__network/ --encrypt-weights --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto --output-info-file c_info.json --d-auto 1
====================================================================================
Memory usage information (input/output buffers are included in activations)
---------------------------------------------------------------------------------
flexMEM [0x34000000 - 0x34000000]: 0 B / 0 B ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
cpuRAM1 [0x34064000 - 0x34064000]: 0 B / 0 B ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
cpuRAM2 [0x34100000 - 0x34200000]: 0 B / 1.000 MB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
npuRAM3 [0x34200000 - 0x34270000]: 0 B / 448.000 kB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
npuRAM4 [0x34270000 - 0x342E0000]: 0 B / 448.000 kB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
npuRAM5 [0x342E0000 - 0x34350000]: 3.970 kB / 448.000 kB ( 0.89 % used) -- weights: 0 B ( 0.00 % used) activations: 3.970 kB ( 0.89 % used)
npuRAM6 [0x34350000 - 0x343C0000]: 0 B / 448.000 kB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
octoFlash [0x71000000 - 0x78000000]: 20.142 kB / 112.000 MB ( 0.02 % used) -- weights: 20.142 kB ( 0.02 % used) activations: 0 B ( 0.00 % used)
hyperRAM [0x90000000 - 0x92000000]: 0 B / 32.000 MB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
Total: 24.111 kB -- weights: 20.142 kB activations: 3.970 kB
====================================================================================
Used memory ranges
---------------------------------------------------------------------------------
npuRAM5 [0x342E0000 - 0x34350000]: 0x342E0000-0x342E0FF0
octoFlash [0x71000000 - 0x78000000]: 0x71000000-0x710050A0
====================================================================================
Epochs details
---------------------------------------------------------------------------------
Total number of epochs: 16 of which 1 implemented in software
epoch ID HW/SW/EC Operation (SW only)
epoch 1 HW
epoch 2 HW
epoch 3 HW
epoch 4 HW
epoch 5 HW
epoch 6 HW
epoch 7 HW
epoch 8 HW
epoch 9 HW
epoch 10 HW
epoch 11 HW
epoch 12 HW
epoch 13 HW
epoch 14 HW
epoch 15 HW
epoch 16 -SW- ( Softmax )
====================================================================================
Generated files (5)
------------------------------------------------------------------------------------
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0.onnx
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0_Q.json
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network.c
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network.h
Creating txt report file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_generate_report.txt
elapsed time (generate): 36.857s
2.3. Outputs
- C source files (*.c, *.h)
Contain the inference engine code, ready to be compiled and linked into an STM32N6 Neural-ART C project. The network.c and network.h files are generated in the default output folder.
- Raw weight initializer files (*.raw)
Binary files containing the unencrypted neural network weights, organized for direct memory loading. They are the memory initializers for the weights (and activations) that must be programmed to memory before running an inference. In this example: network_atonbuf.xSPI2.raw.
- Intermediate files (*.onnx, *.json)
Files describing the memory layout and locations of weights and parameters, essential for encryption and flashing tools. They can be used later with the ST Neural-ART compiler.
Intermediate files created during this step:
- mnist_int8_io_i8_OE_3_3_0.onnx
- Represents the complete neural network model, including layer structure, weights, biases, and model parameters.
- A standard interoperable format for exchanging models between different AI frameworks
- mnist_int8_io_i8_OE_3_3_0_Q.json
- Contains the quantization parameters of the model, such as scales, zero points, and other metadata related to int8 quantization.
- Describes how weights and activations have been quantized to reduce size and optimize performance.
- network_c_info.json
- Memory locations of weights and parameters
- Layout of input/output buffers
- Sizes of various memory regions
- Details required for encryption and memory loading
- Generation report: the network_generate_report.txt file provides the main information about the imported model and how it is deployed: a summary of the memory usage, the options used, the epoch types, and so on.
Below are the output files generated in the default st_ai_output folder:
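The c_info metadata can be inspected with a few lines of Python. Note that the field names below ("memory_pools", "address", "size", "encrypted") are assumptions made for illustration, shaped after the log output of the encryption script; check your actual network_c_info.json for the real schema.

```python
import json

def find_pools_to_encrypt(c_info: dict) -> list[tuple[str, int, int]]:
    """Return (name, base_address, size_in_bytes) for each pool marked
    for encryption. Field names are ASSUMED for illustration only."""
    pools = []
    for pool in c_info.get("memory_pools", []):
        if pool.get("encrypted"):
            pools.append((pool["name"], int(pool["address"], 16), pool["size"]))
    return pools

# Hand-made dict mimicking what the encryption script later reports
# (0x71000000, 20,625 bytes = 20.142 kB):
c_info = {"memory_pools": [
    {"name": "octoFlash", "address": "0x71000000", "size": 20625, "encrypted": True},
    {"name": "npuRAM5", "address": "0x342E0000", "size": 4065, "encrypted": False},
]}
for name, addr, size in find_pools_to_encrypt(c_info):
    print(f"{name}: {size/1024:.3f} kB to encrypt at 0x{addr:08X}")
    # prints: octoFlash: 20.142 kB to encrypt at 0x71000000
```

This matches the "Memory pool to encrypt found at address: 0x71000000 -- 20.142 kBytes" line shown in the encryption log of section 3.2.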
3. Weights encryption
This step requires an STM32N6 board connected to the computer. A Nucleo board or a Discovery board can be used.
To perform weight encryption, STEdgeAI provides:
- an embedded firmware in STEdgeAI\2.2\scripts\N6_encrypt\c\Weights_encryption: used to receive orders from the host computer, process the data, and send it back as fast as possible to the host computer.
- a Python script that sends unencrypted weights to the STM32N6 and receives the weights encrypted by the Neural-ART accelerator integrated in the STM32N6 board.
3.1. Weights_encryption embedded firmware
This firmware:
- Implements hardware encryption of the weights
  - It leverages the Neural-ART accelerator integrated in the STM32N6 board to perform hardware-based encryption of the neural network weights. Encryption in the ST NPU is performed by an integrated hardware module that enables fast and transparent encryption and decryption of neural network weights. This mechanism is symmetric, involutive (encrypting twice with the same parameters returns the original data), and depends on the memory address, which prevents simple copying of encrypted weights in memory.
  - The file ll_aton_cipher.c is a low-level driver provided in the Neural-ART SDK that allows interaction with this hardware encryption module. It offers functions to:
    - configure encryption keys on bus interfaces (LL_Busif_SetKeys),
    - initialize and start encrypted or decrypted DMA transfers (LL_DmaCypherInit),
    - manage encryption parameters such as the number of rounds, the encryption ID, and activation masks.
  - This driver is used in the embedded firmware to ensure transparent decryption of the weights during inference, as well as in encryption tools to encrypt the weights before flashing.
- Manages communication between the PC and the board
  - It handles communication (via UART or debug interface) with the Python script, enabling reliable transfer of the data to be encrypted and return of the encrypted data.
  - It synchronizes the data transfer, encryption, and reception operations.
  - It receives the unencrypted weights sent by the Python script, encrypts them in real time using the hardware, and sends the encrypted weights back to the PC.
- Ensures consistency and security
  - By delegating encryption to the embedded hardware, this firmware guarantees that the weights are encrypted according to the NPU's requirements.
  - It avoids errors related to software or off-target encryption.
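The symmetric, involutive, address-dependent behaviour described above can be illustrated with a toy model. The code below is NOT the real NPU cipher (its algorithm is not described here); it only mimics the three properties, using an XOR keystream derived from the key and the absolute memory address:

```python
import hashlib

def toy_cipher(data: bytes, key: bytes, base_address: int) -> bytes:
    """Toy address-dependent XOR cipher -- illustration only, NOT the NPU algorithm."""
    out = bytearray()
    for offset in range(0, len(data), 16):
        # The keystream depends on both the key and the absolute address,
        # so moving encrypted data to another address breaks decryption.
        block_addr = base_address + offset
        stream = hashlib.sha256(key + block_addr.to_bytes(8, "little")).digest()
        chunk = data[offset:offset + 16]
        out.extend(b ^ s for b, s in zip(chunk, stream))
    return bytes(out)

weights = bytes(range(256))
key = bytes.fromhex("aabbccddaabbccdd" * 2)   # 128-bit key as in the example log

enc = toy_cipher(weights, key, 0x71000000)
# Involutive: applying the same transform twice restores the data.
assert toy_cipher(enc, key, 0x71000000) == weights
# Address-dependent: "decrypting" at another address does not work.
assert toy_cipher(enc, key, 0x71000100) != weights
```

Because the transform is involutive, the firmware needs only one operation for both directions, which is also why the same hardware path decrypts the weights on-the-fly during inference.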
The base program was generated using CubeMX (see the .ioc file for reference): "FSBL" and "appli" projects are generated, but only the FSBL project is used in this example.
From C:\ST\STEdgeAI\2.2\scripts\N6_encrypt\c\Weights_encryption, double-click on the .project file to open the STM32CubeIDE project dedicated to:
- sending the raw weights from the computer to the board,
- encrypting the raw weights with the NPU,
- sending the encrypted weights back from the board to the computer.
STM32CubeIDE opens automatically, and you can inspect the delivered code.
The build and programming are done automatically by the Python script in the next step.
3.2. Python script: command line usage example and encryption process
Below is a scheme giving an overview of the process:
The Python script needs to load the embedded firmware onto the STM32 board. The board must be set in dev boot configuration and connected to the computer.
From C:\ST\STEdgeAI\2.2\scripts\N6_encrypt\python, open a command window and launch the python script:
python end_to_end_encrypt.py --cubeide C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE --postprocess ..\..\..\Utilities\windows\st_ai_output\network_c_info.json ..\..\..\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
Parameters
- --cubeide C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE
  - Specifies the path to STM32CubeIDE, the IDE/toolchain used to program and communicate with the STM32 board.
  - The script uses this path to launch the IDE's debugging tools (e.g., GDB server) to upload and interact with the encryption firmware on the board.
- --postprocess
  - Indicates that after encryption, the script should perform post-processing steps.
  - This usually involves backing up the original unencrypted .raw file and replacing it with the encrypted version.
  - It may also update related metadata or configuration files to reflect the encryption status.
- ..\..\..\Utilities\windows\st_ai_output\network_c_info.json
  - The metadata file generated during the model generation step.
  - Contains information about the memory layout, the locations of the weights, and other critical data needed to correctly encrypt the right portions of the weight file.
- ..\..\..\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
  - The raw weight file containing the unencrypted neural network weights.
  - This file is the input data that will be encrypted by the hardware on the STM32 board.
Python script process flow:
- Launch STM32CubeIDE tools
  - The script starts the STM32CubeIDE debugging environment to connect to the STM32 board.
- Load the encryption firmware
  - The encryption firmware is loaded onto the STM32 board. This firmware uses the Neural-ART hardware encryption engine.
- Send the raw weights to the board
  - The script reads the .raw weight file and sends it to the STM32 board via the debug interface (over ST-LINK).
- Hardware encryption on the board
  - The Neural-ART accelerator on the STM32 board encrypts the weights in real time. The encryption key is sent by the encrypt_neural_art.py script.
  - The 12 rounds are a compromise between security and latency.
- Receive the encrypted weights
  - The encrypted data is sent back to the host computer and reassembled into a complete encrypted .raw file.
- Post-processing
  - The original unencrypted .raw file is backed up (renamed with a .unencrypted extension).
  - The encrypted .raw file replaces the original file, ready for flashing onto the target device.
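The post-processing performed with --postprocess can be reproduced in a few lines of Python. This is a sketch of the rename/replace behaviour visible in the log below, not the actual source of the delivered script:

```python
import tempfile
from pathlib import Path

def postprocess(original_raw: Path, encrypted_raw: Path) -> Path:
    """Back up the unencrypted .raw file and promote the encrypted one (sketch)."""
    backup = original_raw.with_suffix(".unencrypted")
    original_raw.rename(backup)         # network_atonbuf.xSPI2.raw -> .unencrypted
    encrypted_raw.rename(original_raw)  # *_encrypted.raw -> network_atonbuf.xSPI2.raw
    return backup

# Demo in a temporary directory, with the file names from this article
with tempfile.TemporaryDirectory() as d:
    orig = Path(d) / "network_atonbuf.xSPI2.raw"
    enc = Path(d) / "network_atonbuf.xSPI2_encrypted.raw"
    orig.write_bytes(b"plain weights")
    enc.write_bytes(b"encrypted weights")
    backup = postprocess(orig, enc)
    assert backup.name == "network_atonbuf.xSPI2.unencrypted"
    assert orig.read_bytes() == b"encrypted weights"
```

After this step, the file named network_atonbuf.xSPI2.raw holds the encrypted weights, so the later flashing steps need no change.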
Command line result:
C:\ST\STEdgeAI\2.2\scripts\N6_encrypt\python>python end_to_end_encrypt.py --cubeide C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE --postprocess ..\..\..\Utilities\windows\st_ai_output\network_c_info.json ..\..\..\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
16:58:12.724 :: cubeIDE_toolbox.py :: INFO :: Resetting the board
16:58:13.362 :: cubeIDE_toolbox.py :: INFO :: Starting GDB server
16:58:13.381 :: cubeIDE_toolbox.py :: INFO :: Starting GDB client
16:58:14.369 :: end_to_end_encrypt.py :: INFO :: Waiting for the firmware to initialize
16:58:15.370 :: end_to_end_encrypt.py :: INFO :: Starting encryption script
16:58:15.370 :: encrypt_neural_art.py :: INFO :: Parsing c_info file
16:58:15.375 :: encrypt_neural_art.py :: INFO :: Memory pool to encrypt found at address: 0x71000000 -- 20.142 kBytes to encrypt at offset 0
16:58:15.415 :: encrypt_neural_art.py :: INFO :: Starting encryption
16:58:15.416 :: encrypt_neural_art.py :: INFO :: Sending encryption params: keys = (MSB:0xaabbccddaabbccdd)(LSB:0xaabbccddaabbccdd) -- nb_rounds = 12
16:58:15.632 :: encrypt_neural_art.py :: INFO :: Data transfer finished -- Took 0.200 seconds -- size = 20.142kB -- Encryption rate: 100.708kB/s
16:58:15.635 :: encrypt_neural_art.py :: INFO :: Encrypted data injected into network_atonbuf.xSPI2_encrypted.raw
16:58:15.635 :: encrypt_neural_art.py :: INFO :: Done
16:58:15.637 :: end_to_end_encrypt.py :: INFO :: Postprocessing the files
16:58:15.637 :: end_to_end_encrypt.py :: INFO :: Backup of original unencrypted weights: network_atonbuf.xSPI2.raw -> network_atonbuf.xSPI2.unencrypted
16:58:15.637 :: end_to_end_encrypt.py :: INFO :: Replacing original file with encrypted weights: network_atonbuf.xSPI2_encrypted.raw -> network_atonbuf.xSPI2.raw
16:58:15.646 :: end_to_end_encrypt.py :: INFO :: Done
Below are the output files generated in the default st_ai_output folder:
4. Loading encrypted weights into the target
The n6_loader.py script is a loader and builder utility.
It automates a process that:
- Configures the firmware project for the STM32N6 platform,
- Copies the generated neural network source files and related memory dumps into the project,
- Converts the raw memory files to HEX format,
- Programs the STM32N6 board's flash and RAM with these files,
- Builds and flashes the firmware,
- Runs the program on the target board.
The weights shall be encrypted (using the tools from this article) before being flashed into the final board.
From C:\ST\STEdgeAI\2.2\scripts\N6_scripts, adapt config.json file to your working environment and save it:
{
  // Set compiler_type to either gcc or iar
  "compiler_type": "iar",
  // Path to IAR directory (ends up in bin/)
  "iar_binary_path": "C:/Program Files/IAR Systems/Embedded Workbench 9.2/common/bin/",
  // Path to CubeIDE directory (ends up in STM32CubeIDE)
  "cubeide_path": "C:/ST/STM32CubeIDE_1.18.1/STM32CubeIDE"
}
Adapt config_n6l.json file to your working environment:
// This file is for configuring the N6 loader, the util used to copy STEdgeAI outputs into a project,
// compile the project, and load the results on the board.
{
  // The 2 lines below are only used if you call n6_loader.py ALONE (memdump is optional and will be the parent dir of network.c by default)
  "network.c": "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/network.c",
  //"memdump_path": "C:/Users/foobar/CODE/stm.ai/stm32ai_output",
  // Location of the "validation" project + build config name to be built (if applicable)
  "project_path": "C:/ST/STEdgeAI/2.2/Projects/STM32N6570-DK/Applications/NPU_Validation",
  // If using the NPU_Validation project, valid build_conf names are "N6-DK", "N6-DK-USB", "N6-Nucleo", "N6-Nucleo-USB"
  "project_build_conf": "N6-DK",
  // Skip programming the weights to save time (but lose accuracy) -- useful for performance tests
  "skip_external_flash_programming": false,
  "skip_ram_data_programming": false
}
From C:\ST\STEdgeAI\2.2\scripts\N6_scripts, launch:
python n6_loader.py --n6-loader-config config_n6l.json
Result:
C:\ST\STEdgeAI\2.2\scripts\N6_scripts>python n6_loader.py --n6-loader-config config_n6l.json
12/01/2025 04:48:02 PM __main__ -- Preparing compiler IAR
12/01/2025 04:48:02 PM __main__ -- Setting a breakpoint in main.c at line 137 (before the infinite loop)
12/01/2025 04:48:02 PM __main__ -- Copying network.c to project: -> C:\ST\STEdgeAI\2.2\Projects\STM32N6570-DK\Applications\NPU_Validation\X-CUBE-AI\App\network.c
12/01/2025 04:48:02 PM __main__ -- Extracting information from the c-file
12/01/2025 04:48:02 PM __main__ -- Converting memory files in results/<model>/generation/ to Intel-hex with proper offsets
12/01/2025 04:48:02 PM __main__ -- arm-none-eabi-objcopy.exe --change-addresses 0x71000000 -Ibinary -Oihex network_atonbuf.xSPI2.raw network_atonbuf.xSPI2.hex
12/01/2025 04:48:02 PM __main__ -- Resetting the board...
12/01/2025 04:48:04 PM __main__ -- Flashing memory xSPI2 -- 20.625 kB
12/01/2025 04:48:05 PM __main__ -- Building project (conf= N6-DK)
12/01/2025 04:48:07 PM __main__ -- Loading internal memories & Running the program
12/01/2025 04:48:18 PM __main__ -- Start operation achieved successfully
The generated network implementation is properly integrated into the project.
The weights are correctly converted and programmed into memory.
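The raw-to-HEX conversion logged above is done with arm-none-eabi-objcopy. For illustration, a minimal equivalent Intel HEX writer can be sketched in Python; it handles only what this use case needs (data, extended linear address, and end-of-file records):

```python
def ihex_record(rectype: int, address: int, payload: bytes) -> str:
    """Build one Intel HEX record: :LLAAAATT<data>CC."""
    body = bytes([len(payload), (address >> 8) & 0xFF, address & 0xFF, rectype]) + payload
    checksum = (-sum(body)) & 0xFF        # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()

def raw_to_ihex(data: bytes, base_address: int) -> str:
    """Convert a raw blob to Intel HEX, like objcopy --change-addresses ... -O ihex."""
    lines = []
    upper = -1
    for offset in range(0, len(data), 16):
        addr = base_address + offset
        if (addr >> 16) != upper:         # emit a type-04 extended linear address record
            upper = addr >> 16
            lines.append(ihex_record(0x04, 0, upper.to_bytes(2, "big")))
        lines.append(ihex_record(0x00, addr & 0xFFFF, data[offset:offset + 16]))
    lines.append(ihex_record(0x01, 0, b""))   # end-of-file record
    return "\n".join(lines)

# Four bytes placed at 0x71000000, the xSPI2/octoFlash base used in this article
print(raw_to_ihex(b"\x01\x02\x03\x04", 0x71000000))
```

The first emitted record sets the upper address word to 0x7100, so the data records that follow land at 0x71000000, matching the --change-addresses 0x71000000 offset used by the loader.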
5. Validate (optional)
Running stedgeai validate verifies that:
- The model runs correctly on the target hardware.
- The inference results meet expected accuracy and correctness.
- Data communication and loading are reliable.
- No critical errors occur during execution.
From folder: C:\ST\STEdgeAI\2.2\Utilities\windows, launch
stedgeai.exe validate -m C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite --target stm32n6 --mode target --desc serial:921600 --val-json st_ai_output/network_c_info.json
Below is the result:
C:\ST\STEdgeAI\2.2\Utilities\windows>stedgeai.exe validate -m C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite --target stm32n6 --mode target --desc serial:921600 --val-json st_ai_output/network_c_info.json
ST Edge AI Core v2.2.0-20266 2adc00962
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 28, 28, 1)/float32, min/max=[0.000012, 0.999718], mean/std=[0.495359, 0.288935]
c/I[1] conversion [Q(0.00392157,-128)]-> (10, 28, 28, 1)/int8, min/max=[-128, 127], mean/std=[-1.680740, 73.678841]
m/I[1] conversion [Q(0.00392157,-128)]-> (10, 28, 28, 1)/int8, min/max=[-128, 127], mean/std=[-1.680740, 73.678841]
no output/reference samples are provided
Running the TFlite model...
PASS: 0%| | 0/24 [00:04<?, ?it/s]INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running the ST.AI c-model (AI RUNNER)...(name=network, mode=TARGET)
Proto-buffer driver v2.0 (msg v3.1) (Serial driver v1.0 - COM6:921600) ['network']
Summary 'network' - ['network']
--------------------------------------------------------------------------------------------------------------------------------------------
I[1/1] 'Input_0_out_0' : int8[1,28,28,1], 784 Bytes, QLinear(0.003921569,-128,int8), activations
O[1/1] 'Quantize_12_out_0' : int8[1,10], 10 Bytes, QLinear(0.003906250,-128,int8), activations
n_nodes : 15
compile_datetime : Dec 1 2025 15:54:18
--------------------------------------------------------------------------------------------------------------------------------------------
protocol : Proto-buffer driver v2.0 (msg v3.1) (Serial driver v1.0 - COM6:921600)
tools : ST Neural ART (LL_ATON api) v1.1.1
runtime lib : atonn-v1.1.1-14-ge619e860 (optimized SW lib v10.1.0-ae536891 IAR)
capabilities : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
device.desc : stm32 family - 0x486 - STM32N6xx @800/400MHz
device.attrs : fpu,core_icache,core_dcache,npu_cache=1,mcu_freq=800MHz,noc_freq=400MHz,npu_freq=1000MHz,nic_freq=900MHz
--------------------------------------------------------------------------------------------------------------------------------------------
Warning: C-network signature checking has been skipped
ST.AI Profiling results v2.0 - "network"
------------------------------------------------------------
nb sample(s) : 10
duration : 0.241 ms by sample (0.239/0.256/0.005)
CPU cycles : [55,692 114,285 22,686]
------------------------------------------------------------
Inference time per node
---------------------------------------------------------------------------------------------------------------
c_id m_id type dur (ms) % cumul CPU cycles name
---------------------------------------------------------------------------------------------------------------
0 - epoch 0.006 2.7% 2.7% [ 3,160 910 1,063 ] EpochBlock_2
1 - epoch 0.008 3.3% 5.9% [ 3,661 1,289 1,318 ] EpochBlock_3
2 - epoch 0.006 2.3% 8.2% [ 2,580 982 924 ] EpochBlock_4
3 - epoch 0.005 2.3% 10.5% [ 2,517 989 884 ] EpochBlock_5
4 - epoch 0.028 11.7% 22.2% [ 12,472 7,459 2,541 ] EpochBlock_6
5 - epoch 0.005 2.3% 24.4% [ 2,595 838 903 ] EpochBlock_7
6 - epoch 0.074 30.9% 55.3% [ 3,244 55,070 1,222 ] EpochBlock_8
7 - epoch 0.005 2.3% 57.6% [ 2,586 873 932 ] EpochBlock_9
8 - epoch 0.005 2.3% 59.9% [ 2,516 943 915 ] EpochBlock_10
9 - epoch 0.068 28.2% 88.1% [ 10,160 41,341 2,876 ] EpochBlock_11
10 - epoch 0.005 2.2% 90.3% [ 2,502 873 903 ] EpochBlock_12
11 - epoch 0.005 2.2% 92.5% [ 2,477 872 917 ] EpochBlock_13
12 - epoch 0.005 2.2% 94.8% [ 2,516 886 918 ] EpochBlock_14
13 - epoch 0.005 2.2% 97.0% [ 2,518 891 914 ] EpochBlock_15
14 - epoch (SW) 0.007 3.0% 100.0% [ 188 69 5,456 ] EpochBlock_16
---------------------------------------------------------------------------------------------------------------
n/a n/a Inter-nodal 0.000 0.0% 100.0% n/a
---------------------------------------------------------------------------------------------------------------
total 0.241 [ 55,692 114,285 22,686 ]
4151.94 inf/s [ 28.9% 59.3% 11.8% ]
---------------------------------------------------------------------------------------------------------------
Statistic per tensor
----------------------------------------------------------------------------------------
tensor # type[shape]:size min max mean std name
----------------------------------------------------------------------------------------
I.0 10 i8[1,28,28,1]:784 -128 127 -1.681 73.679 Input_0_out_0
O.0 10 i8[1,10]:10 -128 127 -102.440 71.444 Quantize_12_out_0
----------------------------------------------------------------------------------------
Saving validation data...
output directory: C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output
creating C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_val_io.npz
m_outputs_1: (10, 10)/int8, min/max=[-128, 127], mean/std=[-102.430000, 71.446799], nl_4
c_outputs_1: (10, 10)/int8, min/max=[-128, 127], mean/std=[-102.440000, 71.444149], scale=0.003906250 zp=-128, nl_4
Computing the metrics...
Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - r/int8 data are dequantized with s=0.00390625 zp=-128
- p/int8 data are dequantized with s=0.00390625 zp=-128
- the output of the reference model is used as ground truth/reference value
- 10 samples (10 items per sample)
acc=100.00% rmse=0.000390625 mae=0.000039062 l2r=0.001317892 mean=0.000039 std=0.000391 nse=0.999998 cos=0.999999
Confusion matrix (axis=-1) - 10 classes (10 samples)
----------------------------------------------------------
C0 0 . . . . . . . . .
C1 . 0 . . . . . . . .
C2 . . 5 . . . . . . .
C3 . . . 0 . . . . . .
C4 . . . . 0 . . . . .
C5 . . . . . 4 . . . .
C6 . . . . . . 1 . . .
C7 . . . . . . . 0 . .
C8 . . . . . . . . 0 .
C9 . . . . . . . . . 0
Evaluation report (summary)
----------------------------------------------------------------------------------------------------------------------
Output acc rmse mae l2r mean std nse cos tensor
----------------------------------------------------------------------------------------------------------------------
X-cross #1 100.00% 0.000390625 0.000039062 0.001317892 0.000039 0.000391 0.999998 0.999999 nl_4
----------------------------------------------------------------------------------------------------------------------
acc : Accuracy (class, axis=-1)
rmse : Root Mean Squared Error
mae : Mean Absolute Error
l2r : L2 relative error
mean : Mean error
std : Standard deviation error
nse : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
cos : COsine Similarity, bigger is better, best=1, range=(0, 1]
Creating txt report file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_validate_report.txt
elapsed time (validate): 11.738s
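The metrics in the report legend above can be computed from the dequantized reference (r) and prediction (p) tensors roughly as follows. This is a numpy sketch following the standard definitions, not the tool's internal code:

```python
import numpy as np

def error_metrics(r: np.ndarray, p: np.ndarray) -> dict:
    """Error metrics as listed in the validation report legend."""
    err = p - r
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),                    # root mean squared error
        "mae":  float(np.mean(np.abs(err))),                          # mean absolute error
        "l2r":  float(np.linalg.norm(err) / np.linalg.norm(r)),       # L2 relative error
        "mean": float(np.mean(err)),
        "std":  float(np.std(err)),
        "nse":  float(1.0 - np.sum(err ** 2) / np.sum((r - np.mean(r)) ** 2)),
        "cos":  float(np.dot(r.ravel(), p.ravel())
                      / (np.linalg.norm(r) * np.linalg.norm(p))),     # cosine similarity
    }

# Identical tensors give the near-perfect scores reported for a faithful deployment.
r = np.array([0.1, 0.2, 0.7])
m = error_metrics(r, r.copy())
```

With identical inputs, rmse and mae are 0 while nse and cos are 1, which is the behaviour the near-perfect figures in the report reflect.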
Summary of this report:
Context and Configuration
- Tool used: ST Edge AI Core v2.2.0
- Validated model: mnist_int8_io_i8.tflite (int8 quantized model)
- Target hardware: STM32N6 (CPU frequency 800MHz, NPU 1000MHz)
- Execution mode: Target (running on STM32N6 hardware via serial driver COM6 at 921600 baud)
- Validation data: 10 randomly generated samples (size 28x28x1, float32 then converted to int8)
Execution and Profiling
- Average inference time: 0.241 ms per sample
- Number of nodes in the network: 15
- Profiling per block (epoch):
- Blocks 6 and 11 consume most of the time (~30.9% and 28.2%)
- Other blocks each take about 2-3% of the time
- CPU cycles used: between 22k and 114k cycles depending on the block
- Overall performance: approximately 4152 inferences per second
Input/Output Data
- Input: int8 tensor (1,28,28,1), values range from -128 to 127
- Output: int8 tensor (1,10), values range from -128 to 127
- Data conversion: quantization with scale ~0.0039 and zero-point -128
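The int8 conversion mentioned above follows the standard affine quantization scheme, real = scale * (q - zero_point). With the input tensor values from the report (scale 0.003921569 = 1/255, zero-point -128), this gives:

```python
def quantize(x: float, scale: float, zp: int) -> int:
    """Affine quantization: q = round(x / scale) + zp, clamped to int8."""
    q = round(x / scale) + zp
    return max(-128, min(127, q))

def dequantize(q: int, scale: float, zp: int) -> float:
    """Inverse mapping: real = scale * (q - zp)."""
    return scale * (q - zp)

scale, zp = 1 / 255, -128          # input tensor: QLinear(0.003921569, -128, int8)

assert quantize(0.0, scale, zp) == -128        # float 0.0 maps to int8 -128
assert quantize(1.0, scale, zp) == 127         # float 1.0 maps to int8 127
assert dequantize(-128, scale, zp) == 0.0
```

This is why the random float validation samples in range (0, 1) appear on the target as int8 values spanning the full [-128, 127] range.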
Validation Results
- Accuracy: 100% on 10 tested samples
- Errors:
- RMSE (Root Mean Squared Error): 0.00039 (very low)
- MAE (Mean Absolute Error): 0.000039
- L2 relative error: 0.00132
- Nash-Sutcliffe Efficiency (NSE): 0.999998 (near perfect)
- Cosine similarity: 0.999999 (very high)
- Confusion matrix: perfect, no classification errors across 10 classes
Additional Observations
- C-network signature checking was skipped (non-blocking warning)
- Output results on target hardware closely match the TensorFlow Lite reference
- Validation report saved as a text file
- Total validation time: approximately 11.7 seconds
The int8 quantized MNIST model runs perfectly on the STM32N6 target with very high accuracy and very fast inference time.
The validation confirms excellent alignment between the TensorFlow Lite model and the ST hardware execution, with low error metrics.

