How to start with Weights encryption using ST Neural-ART NPU on STM32N6 MCUs

Prerequisites

  • Hardware
    • STM32N6 discovery board
    • Discovery board MB1860 - STM32N6 (a USB-C cable is needed)
SECURITY STM32N6-DK board-recto pic 1.png
  • Required tools
    • IAR: v9.40.1 + IAR patch to support STM32N6 (delivered with V0.5.0) + IAR Patch EWARMv9_STM32N6xx_V0.6.2
      • IAR patch is available in the STM32CubeFW: STM32Cube_FW_N6_Vx.x.x\Utilities\PC_Software
    • STM32CubeProgrammer version 2.18.0
    • STM32CubeIDE v1.17.0 or later
    • python>=3.9
    • pyserial>=3.5
    • protobuf>=3.20.3
    • tqdm>=4.64


1. Introduction

This article provides a step-by-step guide to encrypting neural network weights for deployment on STM32N6 platforms equipped with the Neural-ART accelerator. Weight encryption enhances the security of your AI models by protecting sensitive parameters during storage and transfer.

The encryption process relies on a protobuf-defined communication protocol between a host computer and the embedded target.

After setting encryption parameters such as keys and the number of rounds, unencrypted weights are sent to the board, encrypted by the Neural-ART hardware, and returned to the host in encrypted form.
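As a rough illustration of such a host-target exchange, the sketch below uses a hypothetical 4-byte length-prefixed framing. The real protocol uses the protobuf messages shipped with the STEdgeAI scripts, so every name and framing choice here is an assumption for illustration only:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte little-endian length
    (hypothetical framing, for illustration only)."""
    return struct.pack("<I", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Recover the payload from a length-prefixed frame."""
    (length,) = struct.unpack_from("<I", data, 0)
    return data[4:4 + length]

# Round trip: what the host frames is what the target would parse back.
chunk = bytes(range(8))
assert unframe(frame(chunk)) == chunk
```

With pyserial, the framed bytes would be written to the board's COM port, and the encrypted reply read back and unframed the same way.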

2. Generate the specialized c-files using the Neural-ART compiler

The Generate phase is the first key step in the complete process of encrypting the weights of a neural network intended to run on the STM32N6 platform with the Neural-ART accelerator.

The goal of this step is to transform a high-level neural network model (e.g., TensorFlow Lite format) into low-level code and data files that can be executed efficiently on the target hardware with Neural-ART acceleration.

It prepares the raw data (weights) that will be encrypted in the following step.


SECURITY AI stedgeai generate scheme.png

2.1. Inputs

  • Model File ( .tflite or .onnx)

This is the neural network model file containing the network architecture and trained weights.

For this example, the TFLite model delivered in the STEdgeAI package is used: C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite.

  • Target Specification

Specifies the target hardware platform (STM32N6 MCU), enabling the compiler to optimize the code accordingly.

  • Compiler Profile

A JSON configuration file that defines compilation options, including optimization flags, memory layouts, and optional parameters.

To support the encryption of the weights, the NPU compiler provides a specific option --encrypt-weights which generates the extra code needed to configure the stream engines to fetch the encrypted data and decrypt them on-the-fly for the processing units. Only three or four cycles of latency are added.

With this option, all weights/parameters regions are considered encrypted by the NPU compiler.

Referencing the mpool (memory pool) file in the JSON profile is crucial to define and manage the memory layout for neural network weights and parameters. It ensures correct memory referencing, supports encryption workflows, and enables efficient use of embedded memory.

To enable the encryption, a profile with the option "--encrypt-weights" must be created in a JSON file.

In the C:\ST\STEdgeAI\2.2\scripts\N6_encrypt folder, create a new file named neural_art_encrypt.json and copy/paste the content below:

{
	"Profiles": {
		// Automatic search of best options + weight encryption option
		"profile": {
			"memory_pool": "../../Utilities/windows/targets/stm32/resources/mpools/stm32n6.mpool",
			"options": "--encrypt-weights --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto"
		}
	}
}

Save the file.

  • Additional Options

Flags or parameters that customize the generation process, such as enabling Neural-ART support with --st-neural-art.

2.2. Command line usage example and generate process

From the folder C:\ST\STEdgeAI\2.2\Utilities\windows, launch:

stedgeai.exe generate -m ..\..\scripts\N6_scripts\models\mnist_int8_io_i8.tflite  --target stm32n6 --st-neural-art profile@"..\..\scripts\N6_encrypt\neural_art_encrypt.json"

The tool:

  1. Parses the input .tflite model, checks for compatibility, and validates the network structure.
  2. Generates C source files implementing the neural network inference logic, tailored to the target hardware and acceleration features.
  3. Extracts the trained weights from the model and formats them into raw binary initializers (.raw files) suitable for loading into target memory.
  4. Produces the auxiliary file c_info.json, which describes the memory layout, weight locations, and other information needed for subsequent steps like encryption and flashing.
  5. If the --encrypt-weights option is set, prepares the code and metadata to support encrypted weights, marking the regions to be encrypted.


Success message:

C:\ST\STEdgeAI\2.2\Utilities\windows>stedgeai.exe generate -m ..\..\scripts\N6_scripts\models\mnist_int8_io_i8.tflite  --target stm32n6 --st-neural-art profile@"..\..\scripts\N6_encrypt\neural_art_encrypt.json"
ST Edge AI Core v2.2.0-20266 2adc00962
 >>>> EXECUTING NEURAL ART COMPILER
   C:/ST/STEdgeAI/2.2/Utilities/windows/atonn.exe -i "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/mnist_int8_io_i8_OE_3_3_0.onnx" --json-quant-file "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/mnist_int8_io_i8_OE_3_3_0_Q.json" -g "network.c" --load-mdesc "C:/ST/STEdgeAI/2.2/Utilities/configs/stm32n6.mdesc" --load-mpool "C:/ST/STEdgeAI/2.2/Utilities/windows/targets/stm32/resources/mpools/stm32n6.mpool" --save-mpool-file "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_ws/neural_art__network/stm32n6.mpool" --out-dir-prefix "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_ws/neural_art__network/" --encrypt-weights --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto --output-info-file "c_info.json" --d-auto 1
   --Oauto will optimize the options: max-ca-pipe values = [2, 4], alt-scheduler values = [false, true], experimental values = [false, true], conv-split-cw values = [false, true], conv-split-kw values = [false, true], conv-split-stripe-1x1 values = [false, true], O_level values = [0, 2, 4]
   --Oauto best solution found: max-ca-pipe = 2, alt-scheduler = false, experimental = false, conv-split-cw = false, conv-split-kw = false, conv-split-stripe-1x1 = false, O_level = 0
 <<<< DONE EXECUTING NEURAL ART COMPILER

 Exec/report summary (generate)
 ------------------------------------------------------------------------------------------------------------------
 model file         :   C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite
 type               :   tflite
 c_name             :   network
 options            :   allocate-inputs, allocate-outputs
 optimization       :   balanced
 target/series      :   stm32n6npu
 workspace dir      :   C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws
 output dir         :   C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output
 model_fmt          :   ss/ss per tensor
 model_name         :   mnist_int8_io_i8
 model_hash         :   0xbe0a77fcd37f19c0f475d4e7bc5e94fc
 params #           :   20,410 items (20.00 KiB)
 ------------------------------------------------------------------------------------------------------------------
 input 1/1          :   'Input_0_out_0', int8(1x28x28x1), 784 Bytes, QLinear(0.003921569,-128,int8), activations
 output 1/1         :   'Quantize_12_out_0', int8(1x10), 10 Bytes, QLinear(0.003906250,-128,int8), activations
 macc               :   0
 weights (ro)       :   20,625 B (20.14 KiB) (1 segment) / -61,015(-74.7%) vs float model
 activations (rw)   :   4,065 B (3.97 KiB) (1 segment) *
 ram (total)        :   4,065 B (3.97 KiB) = 4,065 + 0 + 0
 ------------------------------------------------------------------------------------------------------------------
 (*) 'input'/'output' buffers are allocated in the activations buffer

Computing AI RT data/code size (target=stm32n6npu)..
 -> compiler "gcc:arm-none-eabi-gcc" is not in the PATH

Compilation details
   ---------------------------------------------------------------------------------
Compiler version: 1.1.1-14
Compiler arguments:  -i C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0.onnx --json-quant-file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0_Q.json -g network.c --load-mdesc C:\ST\STEdgeAI\2.2\Utilities\configs\stm32n6.mdesc --load-mpool C:\ST\STEdgeAI\2.2\Utilities\windows\targets\stm32\resources\mpools\stm32n6.mpool --save-mpool-file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws\neural_art__network\stm32n6.mpool --out-dir-prefix C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws\neural_art__network/ --encrypt-weights --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto --output-info-file c_info.json --d-auto 1
====================================================================================
Memory usage information  (input/output buffers are included in activations)
   ---------------------------------------------------------------------------------
        flexMEM    [0x34000000 - 0x34000000]:          0  B /          0  B  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)
        cpuRAM1    [0x34064000 - 0x34064000]:          0  B /          0  B  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)
        cpuRAM2    [0x34100000 - 0x34200000]:          0  B /      1.000 MB  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)
        npuRAM3    [0x34200000 - 0x34270000]:          0  B /    448.000 kB  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)
        npuRAM4    [0x34270000 - 0x342E0000]:          0  B /    448.000 kB  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)
        npuRAM5    [0x342E0000 - 0x34350000]:      3.970 kB /    448.000 kB  (  0.89 % used) -- weights:          0  B (  0.00 % used)  activations:      3.970 kB (  0.89 % used)
        npuRAM6    [0x34350000 - 0x343C0000]:          0  B /    448.000 kB  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)
        octoFlash  [0x71000000 - 0x78000000]:     20.142 kB /    112.000 MB  (  0.02 % used) -- weights:     20.142 kB (  0.02 % used)  activations:          0  B (  0.00 % used)
        hyperRAM   [0x90000000 - 0x92000000]:          0  B /     32.000 MB  (  0.00 % used) -- weights:          0  B (  0.00 % used)  activations:          0  B (  0.00 % used)

Total:                                            24.111 kB                                  -- weights:     20.142 kB                  activations:      3.970 kB
====================================================================================
Used memory ranges
   ---------------------------------------------------------------------------------
        npuRAM5    [0x342E0000 - 0x34350000]: 0x342E0000-0x342E0FF0
        octoFlash  [0x71000000 - 0x78000000]: 0x71000000-0x710050A0
====================================================================================
Epochs details
   ---------------------------------------------------------------------------------
Total number of epochs: 16 of which 1 implemented in software

epoch ID   HW/SW/EC Operation (SW only)
epoch 1       HW
epoch 2       HW
epoch 3       HW
epoch 4       HW
epoch 5       HW
epoch 6       HW
epoch 7       HW
epoch 8       HW
epoch 9       HW
epoch 10      HW
epoch 11      HW
epoch 12      HW
epoch 13      HW
epoch 14      HW
epoch 15      HW
epoch 16     -SW-   (      Softmax       )
====================================================================================

 Generated files (5)
 ------------------------------------------------------------------------------------
 C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0.onnx
 C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\mnist_int8_io_i8_OE_3_3_0_Q.json
 C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network.c
 C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
 C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network.h

Creating txt report file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_generate_report.txt
elapsed time (generate): 36.857s

2.3. Outputs

  • C Source Files (*.c, *.h)

Contain the inference engine code, ready to be compiled and linked into an STM32N6 Neural-ART C project. The network.c and network.h files are generated in the default output folder.

  • Raw Weight Initializer Files (*.raw)

Binary files containing the unencrypted neural network weights, organized for direct memory loading. They contain memory initializers for weights (and activations) that must be programmed to memory before running an inference: network_atonbuf.xSPI2.raw.

  • Intermediate files (*.onnx, *.json)

Intermediate representations of the model and its quantization parameters. They can be used later as inputs to the ST Neural-ART compiler.

Intermediate files created during this step:

  • mnist_int8_io_i8_OE_3_3_0.onnx
    • Represents the complete neural network model, including layer structure, weights, biases, and model parameters.
    • A standard interoperable format for exchanging models between different AI frameworks
  • mnist_int8_io_i8_OE_3_3_0_Q.json
    • Contains the quantization parameters of the model, such as scales, zero points, and other metadata related to int8 quantization.
    • Describes how weights and activations have been quantized to reduce size and optimize performance.
Info white.png Information
mnist_int8_io_i8_OE_3_3_0.onnx and mnist_int8_io_i8_OE_3_3_0_Q.json are generated only to be used as input by the Neural-ART compiler.
  • network_c_info.json
    • Memory locations of weights and parameters
    • Layout of input/output buffers
    • Sizes of various memory regions
    • Details required for encryption and memory loading
  • Generation Report: The network_generate_report.txt file provides the main information about the imported model and how it is deployed: a summary of memory usage, the options used, epoch types, and so on.
Info white.png Information
These two files are not listed in the command-line output above since they are considered secondary and are not mandatory for the next step.
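The pool lookup that the encryption script performs on this metadata can be sketched in Python. Note that the JSON key names below ('mempools', 'address', 'size', 'encrypted') are assumptions made for the example, not the actual network_c_info.json schema, which should be inspected directly:

```python
import json

def find_pools_to_encrypt(c_info_path: str):
    """Return (address, size_bytes) pairs for the memory pools flagged
    for encryption. Key names are illustrative, not the real schema."""
    with open(c_info_path) as f:
        info = json.load(f)
    return [(int(pool["address"], 16), pool["size"])
            for pool in info.get("mempools", [])
            if pool.get("encrypted")]
```

For the example model, such a lookup would report the weights pool in external flash at 0x71000000, matching the "Memory pool to encrypt" line in the encryption log.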



Below are the output files generated in the default st_ai_output folder:

SECURITY AI Generate cmd output folder.png

Info white.png Information
Generated files are stored in the default 'st_ai_output' folder. The default file-prefix 'network_' is used.

The '-n/--name' option can be used to override the default c-name prefix.

The '-o/--output' option can be used to override the output folder.

3. Weights encryption

This step requires an STM32N6 board connected to the computer.

Either a Nucleo board or a Discovery board can be used.

To perform weight encryption, STEdgeAI provides:

  • an embedded firmware in STEdgeAI\2.2\scripts\N6_encrypt\c\Weights_encryption: it receives commands from the host computer, processes the data, and sends it back to the host as fast as possible.
  • a Python script that sends unencrypted weights to the STM32N6 and receives the weights encrypted by the Neural-ART accelerator integrated in the STM32N6 device.

3.1. Weights_encryption embedded firmware

This firmware:

  1. Implements hardware encryption of weights
    • This firmware leverages the Neural-ART accelerator integrated in the STM32N6 board to perform hardware-based encryption of the neural network weights. Encryption in the ST NPU is performed by an integrated hardware module that enables fast and transparent encryption and decryption of neural network weights. This mechanism is symmetric, involutive (encrypting twice with the same parameters returns the original data), and depends on the memory address, which prevents simple copying of encrypted weights in memory.
    • The file ll_aton_cipher.c is a low-level driver provided in the Neural-ART SDK that allows interaction with this hardware encryption module. It offers functions to:
      • Configure encryption keys on bus interfaces (LL_Busif_SetKeys).
      • Initialize and start encrypted or decrypted DMA transfers (LL_DmaCypherInit).
      • Manage encryption parameters such as the number of rounds, encryption ID, and activation masks. This driver is used in the embedded firmware to ensure transparent decryption of weights during inference, as well as in encryption tools to encrypt weights before flashing.
  2. Manages communication between the PC and the board
    • It handles communication (via UART or debug interface) with the Python script, enabling reliable transfer of data to be encrypted and returning the encrypted data.
    • It synchronizes data transfer, encryption, and reception operations.
    • It receives the unencrypted weights sent by the Python script, encrypts them in real-time using the hardware, and sends the encrypted weights back to the PC.
  3. Ensures consistency and security
    • By delegating encryption to the embedded hardware, this firmware guarantees that weights are encrypted according to the NPU’s requirements.
    • It avoids errors related to software or off-target encryption.
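The involutive, address-dependent behaviour described in point 1 can be illustrated with a deliberately simplified XOR-based toy cipher. This is NOT ST's actual algorithm; the keystream mixing below is invented purely to demonstrate the two properties (encrypting twice restores the data, and the result depends on the memory address):

```python
def _ks(key: int, addr: int, rounds: int) -> int:
    """One toy keystream byte mixing the key, the byte's address, and
    the round count (illustrative mixing, not the NPU's algorithm)."""
    k = (key >> (8 * (addr % 8))) & 0xFF          # walk through the key bytes
    return k ^ (addr & 0xFF) ^ ((addr >> 4) & 0xFF) ^ (rounds & 0xFF)

def toy_encrypt(data: bytes, key: int, base_addr: int, rounds: int = 12) -> bytes:
    """XOR with an address-dependent keystream. XOR is its own inverse,
    so applying this twice with the same parameters is the identity."""
    return bytes(b ^ _ks(key, base_addr + i, rounds) for i, b in enumerate(data))

weights = bytes(range(16))
key = 0xAABBCCDDAABBCCDD
enc = toy_encrypt(weights, key, 0x71000000)
assert enc != weights                                  # data is transformed
assert toy_encrypt(enc, key, 0x71000000) == weights    # involutive
assert toy_encrypt(weights, key, 0x71000100) != enc    # address-dependent
```

The last assertion shows why encrypted weights cannot simply be copied to another memory location: the keystream changes with the address.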

SECURITY AI Weights Encryption cubeIDE pj folder.png

The base program was generated using STM32CubeMX (see the .ioc file for reference). The "FSBL" and "appli" projects are generated, but only the FSBL project is used in this example.


From C:\ST\STEdgeAI\2.2\scripts\N6_encrypt\c\Weights_encryption, double-click the .project file to open the STM32CubeIDE project dedicated to:

  • sending raw weights from the computer to the board, where they are encrypted by the NPU,
  • sending the encrypted weights back from the board to the computer.

STM32CubeIDE opens automatically, and you can browse the delivered code.

The build and programming are done automatically by the Python script in the next step.

3.2. Python script: command line usage example and encryption process

Below is a scheme giving an overview of the process:

SECURITY AI Weights Encryption scheme.png


The Python script needs to load the embedded firmware onto the STM32 board. The board must be set to the dev boot configuration and connected to the computer.

SECURITY STM32N6-DK board-Dev Boot config.png


From C:\ST\STEdgeAI\2.2\scripts\N6_encrypt\python, open a command window and launch the python script:

python end_to_end_encrypt.py --cubeide C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE --postprocess ..\..\..\Utilities\windows\st_ai_output\network_c_info.json ..\..\..\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw

Parameters

  • --cubeide C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE
    • Specifies the path to STM32CubeIDE, the IDE/toolchain used to program and communicate with the STM32 board.
    • The script uses this path to launch the IDE’s debugging tools (e.g., GDB server) to upload and interact with the encryption firmware on the board.
  • --postprocess
    • Indicates that after encryption, the script should perform post-processing steps.
    • This usually involves backing up the original unencrypted .raw file and replacing it with the encrypted version.
    • It may also update related metadata or configuration files to reflect the encryption status.
  • ..\..\..\Utilities\windows\st_ai_output\network_c_info.json
    • The metadata file generated during the model generation step.
    • Contains information about memory layout, locations of weights, and other critical data needed to correctly encrypt the right portions of the weight file.
  • ..\..\..\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
    • The raw weight file containing unencrypted neural network weights.
    • This file is the input data that will be encrypted by the hardware on the STM32 board.

Python script process flow:

  1. Launch STM32CubeIDE tools
    • The script starts the STM32CubeIDE debugging environment to connect to the STM32 board.
  2. Load encryption firmware
    • The encryption firmware is loaded onto the STM32 board. This firmware uses the Neural-ART hardware encryption engine.
  3. Send raw weights to the board
    • The script reads the .raw weight file and sends it to the STM32 board via the debug interface (over ST-Link).
  4. Hardware encryption on the board
    • The Neural-ART accelerator on the STM32 board encrypts the weights in real time. The encryption key is sent by the encrypt_neural_art.py script.
    • The number of rounds (12) is a compromise between security and latency.
  5. Receive encrypted weights
    • The encrypted data is sent back to the host computer and reassembled into a complete encrypted .raw file.
  6. Post-processing
    • The original unencrypted .raw file is backed up (renamed with a .unencrypted extension).
    • The encrypted .raw file replaces the original file, ready for flashing onto the target device.
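The post-processing in step 6 can be sketched in a few lines. This is a simplification of what end_to_end_encrypt.py does, and the function name is illustrative:

```python
from pathlib import Path

def postprocess(raw_path: str, encrypted_path: str) -> None:
    """Back up the original unencrypted weights, then let the encrypted
    file take over the original name (sketch of the --postprocess step)."""
    raw = Path(raw_path)
    raw.rename(raw.with_suffix(".unencrypted"))   # e.g. *.xSPI2.unencrypted
    Path(encrypted_path).rename(raw)              # encrypted file replaces it
```

After this step, the flashing tools can keep using the original file name while actually programming encrypted weights.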


Command line result:

C:\ST\STEdgeAI\2.2\scripts\N6_encrypt\python>python end_to_end_encrypt.py --cubeide C:\ST\STM32CubeIDE_1.18.1\STM32CubeIDE --postprocess ..\..\..\Utilities\windows\st_ai_output\network_c_info.json ..\..\..\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
16:58:12.724 :: cubeIDE_toolbox.py :: INFO     :: Resetting the board
16:58:13.362 :: cubeIDE_toolbox.py :: INFO     :: Starting GDB server
16:58:13.381 :: cubeIDE_toolbox.py :: INFO     :: Starting GDB client
16:58:14.369 :: end_to_end_encrypt.py :: INFO     :: Waiting for the firmware to initialize
16:58:15.370 :: end_to_end_encrypt.py :: INFO     :: Starting encryption script
16:58:15.370 :: encrypt_neural_art.py :: INFO     :: Parsing c_info file
16:58:15.375 :: encrypt_neural_art.py :: INFO     :: Memory pool to encrypt found at address: 0x71000000 -- 20.142 kBytes to encrypt at offset 0
16:58:15.415 :: encrypt_neural_art.py :: INFO     :: Starting encryption
16:58:15.416 :: encrypt_neural_art.py :: INFO     :: Sending encryption params: keys = (MSB:0xaabbccddaabbccdd)(LSB:0xaabbccddaabbccdd) -- nb_rounds = 12
16:58:15.632 :: encrypt_neural_art.py :: INFO     :: Data transfer finished -- Took 0.200 seconds -- size = 20.142kB -- Encryption rate: 100.708kB/s
16:58:15.635 :: encrypt_neural_art.py :: INFO     :: Encrypted data injected into network_atonbuf.xSPI2_encrypted.raw
16:58:15.635 :: encrypt_neural_art.py :: INFO     :: Done
16:58:15.637 :: end_to_end_encrypt.py :: INFO     :: Postprocessing the files
16:58:15.637 :: end_to_end_encrypt.py :: INFO     :: Backup of original unencrypted weights: network_atonbuf.xSPI2.raw -> network_atonbuf.xSPI2.unencrypted
16:58:15.637 :: end_to_end_encrypt.py :: INFO     :: Replacing original file with encrypted weights: network_atonbuf.xSPI2_encrypted.raw -> network_atonbuf.xSPI2.raw
16:58:15.646 :: end_to_end_encrypt.py :: INFO     :: Done

Below are the output files generated in the default st_ai_output folder:

SECURITY AI Weights Encryption cmd output folder.png

4. Loading encrypted weights into the target

This script is a loader and builder utility.

It automates a process that:

  • Configures the firmware project for the STM32N6 platform,
  • Copies the generated neural network source files and related memory dumps into the project,
  • Converts raw memory files to HEX format,
  • Programs the STM32N6 board's flash and RAM with these files,
  • Builds and flashes the firmware,
  • Runs the program on the target board.
Warning white.png Warning
This "n6-loader utility" is not a generic utility that works in all conditions/projects.

Here we copy the generated source files to the NPU_Validation project delivered with stedgeai. The N6 loader utility is used as a primary step for validating the generated code. The second step is the call to stedgeai validate (see the next section).


The weights shall be encrypted (using the tools from this article) before being flashed into the final board.
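For illustration, the raw-to-HEX conversion that the loader performs with arm-none-eabi-objcopy (--change-addresses 0x71000000 -Ibinary -Oihex) can be approximated in pure Python. This sketch assumes, for brevity, that the data does not cross a 64 KiB boundary:

```python
def _record(rtype: int, addr: int, data: bytes) -> str:
    """Build one Intel HEX record: length, 16-bit address, type, data,
    and a two's-complement checksum over all preceding bytes."""
    body = bytes([len(data), (addr >> 8) & 0xFF, addr & 0xFF, rtype]) + data
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()

def raw_to_ihex(raw: bytes, base: int, chunk: int = 16) -> str:
    """Emit Intel HEX for 'raw' placed at 'base' (e.g. 0x71000000)."""
    # Type-04 record carries the upper 16 address bits (0x7100 here).
    lines = [_record(0x04, 0, (base >> 16).to_bytes(2, "big"))]
    for off in range(0, len(raw), chunk):           # type-00 data records
        lines.append(_record(0x00, (base + off) & 0xFFFF, raw[off:off + chunk]))
    lines.append(":00000001FF")                     # type-01 end-of-file
    return "\n".join(lines)
```

The first emitted record for the external flash base is ":02000004710089", which places all following data records in the 0x7100xxxx region.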


From C:\ST\STEdgeAI\2.2\scripts\N6_scripts, adapt config.json file to your working environment and save it:

{
	// Set Compiler_type to either gcc or iar
	"compiler_type": "iar",
	// Path to IAR directory (ends up in bin/)
	"iar_binary_path": "C:/Program Files/IAR Systems/Embedded Workbench 9.2/common/bin/",
	// Path to CubeIDE directory (ends up in STM32CubeIDE)
	"cubeide_path":"C:/ST/STM32CubeIDE_1.18.1/STM32CubeIDE"
}

Adapt the config_n6l.json file to your working environment:

// This file is for configuring N6 loader, the util used to copy STEdgeAI outputs into a project,
// compile the project, and load the results on the board.
{
	// The 2 lines below are _only_ used if you call n6_loader.py ALONE (memdump is optional and will be the parent dir of network.c by default)
	"network.c": "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/network.c",
	//"memdump_path": "C:/Users/foobar/CODE/stm.ai/stm32ai_output",
	// Location of the "validation" project  + build config name to be built (if applicable)
	"project_path": "C:/ST/STEdgeAI/2.2/Projects/STM32N6570-DK/Applications/NPU_Validation",
	// If using the NPU_Validation project, valid build_conf names are "N6-DK", "N6-DK-USB", "N6-Nucleo", "N6-Nucleo-USB"
	"project_build_conf": "N6-DK",
	// Skip programming weights to earn time (but lose accuracy) -- useful for performance tests
	"skip_external_flash_programming": false,
	"skip_ram_data_programming": false
}

From C:\ST\STEdgeAI\2.2\scripts\N6_scripts, launch:

python n6_loader.py --n6-loader-config config_n6l.json


Result:

C:\ST\STEdgeAI\2.2\scripts\N6_scripts>python n6_loader.py --n6-loader-config config_n6l.json
12/01/2025 04:48:02 PM  __main__ -- Preparing compiler IAR
12/01/2025 04:48:02 PM  __main__ -- Setting a breakpoint in main.c at line 137 (before the infinite loop)
12/01/2025 04:48:02 PM  __main__ -- Copying network.c to project: -> C:\ST\STEdgeAI\2.2\Projects\STM32N6570-DK\Applications\NPU_Validation\X-CUBE-AI\App\network.c
12/01/2025 04:48:02 PM  __main__ -- Extracting information from the c-file
12/01/2025 04:48:02 PM  __main__ -- Converting memory files in results/<model>/generation/ to Intel-hex with proper offsets
12/01/2025 04:48:02 PM  __main__ -- arm-none-eabi-objcopy.exe --change-addresses 0x71000000 -Ibinary -Oihex network_atonbuf.xSPI2.raw network_atonbuf.xSPI2.hex
12/01/2025 04:48:02 PM  __main__ -- Resetting the board...
12/01/2025 04:48:04 PM  __main__ -- Flashing memory xSPI2 -- 20.625 kB
12/01/2025 04:48:05 PM  __main__ -- Building project (conf= N6-DK)
12/01/2025 04:48:07 PM  __main__ -- Loading internal memories & Running the program
12/01/2025 04:48:18 PM  __main__ -- Start operation achieved successfully

The generated network implementation is properly integrated into the project, and the weights are correctly converted and programmed into memory.

5. Validate (optional)

Running stedgeai validate checks that:

  • The model runs correctly on the target hardware.
  • The inference results meet expected accuracy and correctness.
  • Data communication and loading are reliable.
  • No critical errors occur during execution.


From the folder C:\ST\STEdgeAI\2.2\Utilities\windows, launch:

stedgeai.exe validate  -m C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite --target stm32n6 --mode target --desc serial:921600 --val-json st_ai_output/network_c_info.json

Below is the result:

C:\ST\STEdgeAI\2.2\Utilities\windows>stedgeai.exe validate  -m C:\ST\STEdgeAI\2.2\scripts\N6_scripts\models\mnist_int8_io_i8.tflite --target stm32n6 --mode target --desc serial:921600 --val-json st_ai_output/network_c_info.json
ST Edge AI Core v2.2.0-20266 2adc00962

Setting validation data...
 generating random data, size=10, seed=42, range=(0, 1)
   I[1]: (10, 28, 28, 1)/float32, min/max=[0.000012, 0.999718], mean/std=[0.495359, 0.288935]
    c/I[1] conversion [Q(0.00392157,-128)]-> (10, 28, 28, 1)/int8, min/max=[-128, 127], mean/std=[-1.680740, 73.678841]
    m/I[1] conversion [Q(0.00392157,-128)]-> (10, 28, 28, 1)/int8, min/max=[-128, 127], mean/std=[-1.680740, 73.678841]
 no output/reference samples are provided

Running the TFlite model...
PASS:   0%|                                                                                     | 0/24 [00:04<?, ?it/s]INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

Running the ST.AI c-model (AI RUNNER)...(name=network, mode=TARGET)

 Proto-buffer driver v2.0 (msg v3.1) (Serial driver v1.0 - COM6:921600) ['network']

  Summary 'network' - ['network']
  --------------------------------------------------------------------------------------------------------------------------------------------
  I[1/1] 'Input_0_out_0'       :   int8[1,28,28,1], 784 Bytes, QLinear(0.003921569,-128,int8), activations
  O[1/1] 'Quantize_12_out_0'   :   int8[1,10], 10 Bytes, QLinear(0.003906250,-128,int8), activations
  n_nodes                      :   15
  compile_datetime             :   Dec  1 2025 15:54:18
  --------------------------------------------------------------------------------------------------------------------------------------------
  protocol                     :   Proto-buffer driver v2.0 (msg v3.1) (Serial driver v1.0 - COM6:921600)
  tools                        :   ST Neural ART (LL_ATON api) v1.1.1
  runtime lib                  :   atonn-v1.1.1-14-ge619e860 (optimized SW lib v10.1.0-ae536891 IAR)
  capabilities                 :   IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
  device.desc                  :   stm32 family - 0x486 - STM32N6xx @800/400MHz
  device.attrs                 :   fpu,core_icache,core_dcache,npu_cache=1,mcu_freq=800MHz,noc_freq=400MHz,npu_freq=1000MHz,nic_freq=900MHz
  --------------------------------------------------------------------------------------------------------------------------------------------
 Warning: C-network signature checking has been skipped


  ST.AI Profiling results v2.0 - "network"
  ------------------------------------------------------------
  nb sample(s)   :   10
  duration       :   0.241 ms by sample (0.239/0.256/0.005)
  CPU cycles     :   [55,692 114,285 22,686]
  ------------------------------------------------------------

   Inference time per node
   ---------------------------------------------------------------------------------------------------------------
   c_id    m_id   type                 dur (ms)       %    cumul  CPU cycles                      name
   ---------------------------------------------------------------------------------------------------------------
   0       -      epoch                   0.006    2.7%     2.7%  [   3,160      910    1,063 ]   EpochBlock_2
   1       -      epoch                   0.008    3.3%     5.9%  [   3,661    1,289    1,318 ]   EpochBlock_3
   2       -      epoch                   0.006    2.3%     8.2%  [   2,580      982      924 ]   EpochBlock_4
   3       -      epoch                   0.005    2.3%    10.5%  [   2,517      989      884 ]   EpochBlock_5
   4       -      epoch                   0.028   11.7%    22.2%  [  12,472    7,459    2,541 ]   EpochBlock_6
   5       -      epoch                   0.005    2.3%    24.4%  [   2,595      838      903 ]   EpochBlock_7
   6       -      epoch                   0.074   30.9%    55.3%  [   3,244   55,070    1,222 ]   EpochBlock_8
   7       -      epoch                   0.005    2.3%    57.6%  [   2,586      873      932 ]   EpochBlock_9
   8       -      epoch                   0.005    2.3%    59.9%  [   2,516      943      915 ]   EpochBlock_10
   9       -      epoch                   0.068   28.2%    88.1%  [  10,160   41,341    2,876 ]   EpochBlock_11
   10      -      epoch                   0.005    2.2%    90.3%  [   2,502      873      903 ]   EpochBlock_12
   11      -      epoch                   0.005    2.2%    92.5%  [   2,477      872      917 ]   EpochBlock_13
   12      -      epoch                   0.005    2.2%    94.8%  [   2,516      886      918 ]   EpochBlock_14
   13      -      epoch                   0.005    2.2%    97.0%  [   2,518      891      914 ]   EpochBlock_15
   14      -      epoch (SW)              0.007    3.0%   100.0%  [     188       69    5,456 ]   EpochBlock_16
   ---------------------------------------------------------------------------------------------------------------
   n/a     n/a    Inter-nodal             0.000    0.0%   100.0%                                  n/a
   ---------------------------------------------------------------------------------------------------------------
   total                                  0.241                   [  55,692  114,285   22,686 ]
                                  4151.94 inf/s                   [   28.9%    59.3%    11.8% ]
   ---------------------------------------------------------------------------------------------------------------

   Statistic per tensor
   ----------------------------------------------------------------------------------------
   tensor   #    type[shape]:size       min   max       mean      std  name
   ----------------------------------------------------------------------------------------
   I.0      10   i8[1,28,28,1]:784     -128   127     -1.681   73.679  Input_0_out_0
   O.0      10   i8[1,10]:10           -128   127   -102.440   71.444  Quantize_12_out_0
   ----------------------------------------------------------------------------------------

Saving validation data...
 output directory: C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output
 creating C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_val_io.npz
 m_outputs_1: (10, 10)/int8, min/max=[-128, 127], mean/std=[-102.430000, 71.446799], nl_4
 c_outputs_1: (10, 10)/int8, min/max=[-128, 127], mean/std=[-102.440000, 71.444149], scale=0.003906250 zp=-128, nl_4

Computing the metrics...

 Cross accuracy report #1 (reference vs C-model)
 ----------------------------------------------------------------------------------------------------
 notes: - r/int8 data are dequantized with s=0.00390625 zp=-128
        - p/int8 data are dequantized with s=0.00390625 zp=-128
        - the output of the reference model is used as ground truth/reference value
        - 10 samples (10 items per sample)

  acc=100.00% rmse=0.000390625 mae=0.000039062 l2r=0.001317892 mean=0.000039 std=0.000391 nse=0.999998 cos=0.999999

  Confusion matrix (axis=-1) - 10 classes (10 samples)
  ----------------------------------------------------------
  C0        0    .    .    .    .    .    .    .    .    .
  C1        .    0    .    .    .    .    .    .    .    .
  C2        .    .    5    .    .    .    .    .    .    .
  C3        .    .    .    0    .    .    .    .    .    .
  C4        .    .    .    .    0    .    .    .    .    .
  C5        .    .    .    .    .    4    .    .    .    .
  C6        .    .    .    .    .    .    1    .    .    .
  C7        .    .    .    .    .    .    .    0    .    .
  C8        .    .    .    .    .    .    .    .    0    .
  C9        .    .    .    .    .    .    .    .    .    0

 Evaluation report (summary)
 ----------------------------------------------------------------------------------------------------------------------
 Output       acc       rmse          mae           l2r           mean       std        nse        cos        tensor
 ----------------------------------------------------------------------------------------------------------------------
 X-cross #1   100.00%   0.000390625   0.000039062   0.001317892   0.000039   0.000391   0.999998   0.999999   nl_4
 ----------------------------------------------------------------------------------------------------------------------

  acc  : Accuracy (class, axis=-1)
  rmse : Root Mean Squared Error
  mae  : Mean Absolute Error
  l2r  : L2 relative error
  mean : Mean error
  std  : Standard deviation error
  nse  : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
  cos  : COsine Similarity, bigger is better, best=1, range=(0, 1]

Creating txt report file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_validate_report.txt
elapsed time (validate): 11.738s

Summary of this report:

Context and Configuration

  • Tool used: ST Edge AI Core v2.2.0
  • Validated model: mnist_int8_io_i8.tflite (int8 quantized model)
  • Target hardware: STM32N6 (CPU at 800 MHz, NPU at 1000 MHz)
  • Execution mode: Target (running on STM32N6 hardware via serial driver COM6 at 921600 baud)
  • Validation data: 10 randomly generated samples (size 28x28x1, float32 then converted to int8)

Execution and Profiling

  • Average inference time: 0.241 ms per sample
  • Number of nodes in the network: 15
  • Profiling per block (epoch):
    • Blocks 6 and 11 dominate the inference time (~30.9% and ~28.2% respectively)
    • Each of the other blocks accounts for about 2-3% of the time
  • Total CPU cycles per inference: [55,692 / 114,285 / 22,686] across the three reported counters
  • Overall performance: approximately 4152 inferences per second
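As a quick sanity check, the reported throughput follows directly from the average inference time. A minimal sketch using the 0.241 ms figure from the report (the tool itself uses the unrounded duration, hence its slightly higher 4151.94 inf/s):

```python
# Throughput derived from the average inference time reported above.
avg_inference_ms = 0.241                # average duration per sample (ms)
throughput = 1000.0 / avg_inference_ms  # inferences per second

print(f"{throughput:.0f} inf/s")        # ~4149 inf/s with the rounded duration
```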

Input/Output Data

  • Input: int8 tensor (1,28,28,1), values range from -128 to 127
  • Output: int8 tensor (1,10), values range from -128 to 127
  • Data conversion: quantization with scale ~0.0039 and zero-point -128
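The int8 values map back to real values through the standard affine dequantization rule, real = scale * (q - zero_point). A minimal sketch using the scale (0.00390625 = 1/256) and zero-point (-128) from the report:

```python
import numpy as np

# Dequantization parameters taken from the validation report.
scale, zero_point = 0.00390625, -128

# A few int8 sample values (hypothetical, for illustration).
q = np.array([-128, 0, 127], dtype=np.int8)
real = scale * (q.astype(np.float32) - zero_point)

print(real)  # -128 -> 0.0, 0 -> 0.5, 127 -> ~0.996
```

With these parameters, the full int8 range [-128, 127] covers real values [0.0, ~0.996], which matches an output that represents normalized class scores.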

Validation Results

  • Accuracy: 100% on 10 tested samples
  • Errors:
    • RMSE (Root Mean Squared Error): 0.00039 (very low)
    • MAE (Mean Absolute Error): 0.000039
    • L2 relative error: 0.00132
    • Nash-Sutcliffe Efficiency (NSE): 0.999998 (near perfect)
    • Cosine similarity: 0.999999 (very high)
  • Confusion matrix: perfect, no classification errors across 10 classes
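The error metrics in the report can be recomputed offline from the dequantized reference and C-model outputs saved in network_val_io.npz. A minimal sketch of the formulas, using hypothetical arrays r (reference) and p (prediction) in place of the saved data:

```python
import numpy as np

# Hypothetical dequantized outputs: r = reference model, p = C-model on target.
rng = np.random.default_rng(0)
r = rng.standard_normal((10, 10)).astype(np.float32)
p = r + 0.0004 * rng.standard_normal((10, 10)).astype(np.float32)

err = p - r
rmse = np.sqrt(np.mean(err**2))                             # Root Mean Squared Error
mae = np.mean(np.abs(err))                                  # Mean Absolute Error
l2r = np.linalg.norm(err) / np.linalg.norm(r)               # L2 relative error
nse = 1 - np.sum(err**2) / np.sum((r - r.mean())**2)        # Nash-Sutcliffe efficiency
cos = np.sum(r * p) / (np.linalg.norm(r) * np.linalg.norm(p))  # cosine similarity
acc = np.mean(np.argmax(r, axis=-1) == np.argmax(p, axis=-1))  # class accuracy
```

With target outputs this close to the reference, all similarity metrics (nse, cos) sit near their ideal value of 1 and the class predictions agree, which is exactly the pattern seen in the report.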

Additional Observations

  • C-network signature checking was skipped (non-blocking warning)
  • Output results on target hardware closely match the TensorFlow Lite reference
  • Validation report saved as a text file
  • Total validation time: approximately 11.7 seconds


The int8 quantized MNIST model runs correctly on the STM32N6 target, with very high accuracy and very fast inference.

The validation confirms excellent alignment between the TensorFlow Lite model and the ST hardware execution, with low error metrics.