How to deploy Ultralytics YOLO models to STM32N6

1. YOLO for STM32

The STMicroelectronics Ultralytics fork provides a collection of pre-trained and quantized YOLOv8, YOLO11, and YOLO26 models. These models are compatible with STM32 platforms and target edge computing applications.

The tutorial below is a standalone, end-to-end guide to deploying YOLOv8, YOLO11, and YOLO26 models to STM32N6.

1.1. Benefits

Offers a set of models compatible with STM32 platforms and stm32ai-modelzoo.
Offers a step-by-step guide on how to export quantization-friendly YOLO26 ONNX models for use with stm32ai-modelzoo-services and deployment on the STM32N6 Discovery Kit.
Offers a quantization-friendly pose estimation model (fixed on the latest version of Ultralytics).
Offers a step-by-step guide on how to use AiRunner to evaluate YOLOv8 models on STM32N6. See here.
Offers a guide on how to deploy the gesture detection model on STM32N6. See here.

1.2. Notice

If You combine this software (“Software”) with other software from STMicroelectronics ("ST Software"), to generate a software or software package ("Combined Software"), for instance for use in or in combination with STM32 products, You must comply with the license terms under which ST distributed such ST Software ("ST Software Terms"). Since this Software is provided to You under AGPL-3.0-only license terms, in most cases (such as, but not limited to, ST Software delivered under the terms of SLA0044, SLA0048, or SLA0078), ST Software Terms contain restrictions which will strictly forbid any distribution or non-internal use of the Combined Software. You are responsible for compliance with applicable license terms for any Software You use, and as such, You must limit your use of this software and any Combined Software accordingly.

1.3. Available YOLO Models


Models	Task	Input Resolution	Format	Input Type	Output Type
YOLOv8n	person_detection	256x256x3	per channel int8	uint8	float
YOLOv8n	person_detection	320x320x3	per channel int8	uint8	float
YOLOv8n	person_detection	416x416x3	per channel int8	uint8	float
YOLO11n	person_detection	256x256x3	per channel int8	uint8	float
YOLOv8n	gesture detection	256x256x3	per channel int8	uint8	float
YOLOv8n	gesture detection	320x320x3	per channel int8	uint8	float
YOLOv8n	pose_estimation	256x256x3	per tensor int8	uint8	float
YOLOv8n	pose_estimation	256x256x3	per channel int8	uint8	float
YOLOv8n	pose_estimation	320x320x3	per channel int8	uint8	float
YOLOv8n	pose_estimation	192x192x3	per channel int8	uint8	float
YOLO11n	pose_estimation	256x256x3	per channel int8	uint8	float
YOLO11n	pose_estimation	320x320x3	per channel int8	uint8	float
YOLOv8n	segmentation	256x256x3	per channel int8	int8	int8
YOLOv8n	segmentation	320x320x3	per channel int8	int8	int8
YOLO11n	segmentation	256x256x3	per channel int8	int8	int8
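
The Format column distinguishes per-tensor from per-channel int8 quantization. As a rough illustration only (this is not the quantizer used by the ST tooling, and the weights and function names are hypothetical), the sketch below applies symmetric int8 quantization to a tiny weight matrix, once with a single scale for the whole tensor and once with one scale per output channel, and reports the round-trip error of each channel.

```python
# Illustrative only: symmetric int8 quantization with a single scale
# (per tensor) versus one scale per output channel (per channel).

def quantize(values, scale):
    """Map floats to int8 codes, clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(codes, scale):
    """Map int8 codes back to floats."""
    return [c * scale for c in codes]

def roundtrip_errors(weights, per_channel):
    """Return the largest quantize/dequantize error of each channel (row)."""
    tensor_scale = (max(abs(v) for row in weights for v in row) / 127) or 1.0
    errors = []
    for row in weights:
        if per_channel:
            scale = (max(abs(v) for v in row) / 127) or 1.0
        else:
            scale = tensor_scale
        restored = dequantize(quantize(row, scale), scale)
        errors.append(max(abs(a - b) for a, b in zip(row, restored)))
    return errors

# Hypothetical weights: channel 0 has a small range, channel 1 a large one.
weights = [[0.01, -0.02, 0.015], [1.5, -2.0, 0.7]]
print("per tensor :", roundtrip_errors(weights, per_channel=False))
print("per channel:", roundtrip_errors(weights, per_channel=True))
```

With a shared scale, the channel with the small value range is represented by only a few integer codes, which is why per-channel quantization usually preserves accuracy better when channel ranges differ.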

2. YOLO Ultralytics deployment guide to STM32N6

This tutorial explains how to train a YOLO26n model using the ST Ultralytics fork, export it to ONNX format, quantize it using STM32AI Model Zoo Services, and deploy it on an STM32N6 board.

2.1. Train and Export in ST Ultralytics Fork

2.1.1. Clone ST Ultralytics fork

git clone https://github.com/stm32-hotspot/ultralytics.git

2.1.2. Navigate to repository

cd ultralytics

2.1.3. Install Ultralytics

Create a Python 3.12 environment:

python -m venv yolo-env

Activate the environment:

On Windows:

yolo-env\Scripts\activate

On Unix or macOS:

source yolo-env/bin/activate

Then install the package in editable mode. This makes the yolo command line tool available anywhere in the environment, and any change made to the code takes effect without reinstalling.

pip install -e .

2.1.4. Go to YOLOv8-STEdgeAI example folder

cd examples/YOLOv8-STEdgeAI

2.1.5. Install ONNX dependencies for export and inference

pip install onnx==1.16.1 onnxruntime==1.20.1 tqdm

2.1.6. Download dataset

This tutorial uses a small example dataset based on the COCO 2017 validation dataset. To use a different dataset, update the paths in the next steps accordingly.

We provide a script to download and prepare the dataset for training and evaluation. Run the following command from the current directory:

python download_dataset.py

2.1.7. Convert dataset to YOLO format

As required by the current version of the training pipeline, we need to convert the COCO annotations to YOLO format.

Run the following command from the current directory:

python convert_dataset.py --coco_images_dir datasets/coco/images/val --coco_annotations_file datasets/coco/annotations/instances_val2017.json
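
For orientation, the core of a COCO to YOLO conversion is a change of box representation: COCO stores each box as [x_min, y_min, width, height] in pixels, while a YOLO label line stores the class index followed by the box center and size, normalized by the image dimensions. The sketch below shows only that arithmetic. It is not the code of convert_dataset.py, the function names are hypothetical, and it assumes the COCO category id has already been mapped to a zero-based class index.

```python
def coco_to_yolo_bbox(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box (x_center, y_center, width, height) normalized to [0, 1]."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )

def yolo_label_line(class_index, bbox, image_width, image_height):
    """Format one line of a YOLO .txt label file."""
    values = coco_to_yolo_bbox(bbox, image_width, image_height)
    return f"{class_index} " + " ".join(f"{v:.6f}" for v in values)

# A 30x40 pixel box whose top-left corner is at (10, 20) in a 100x200 image.
print(yolo_label_line(0, [10, 20, 30, 40], image_width=100, image_height=200))
```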

2.1.8. Train model

In this example we will train a YOLO26n model for 3 epochs with an image size of 256. You can adjust these parameters as needed, but make sure to keep the image size consistent across the training, export, and evaluation steps.

yolo train model=yolo26n.pt data=dataset.yaml epochs=3 imgsz=256

2.1.9. Export to ONNX

Now we will export the trained model to ONNX format, which is compatible with STM32AI Model Zoo Services. Make sure to keep the opset version consistent with the one specified in the quantization configuration in the next steps.

Update the model path in the command below if the model is saved in a different location.

yolo export model=../../runs/detect/train/weights/best.pt format=onnx end2end=False imgsz=256 simplify=True opset=17

2.1.10. Evaluate exported ONNX model

This step verifies that the exported ONNX model retains good accuracy before you proceed with quantization and deployment. Update the model path in the command below if the model is saved in a different location, and keep the image size and dataset paths consistent with the previous steps.

yolo val task=detect model=../../runs/detect/train/weights/best.onnx imgsz=256 data=dataset.yaml
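
Because the image size, dataset file, and weights location must agree across the train, export, and val commands, one option is to generate the three command lines from a single set of values. The helper below is a hypothetical convenience, not part of the ST Ultralytics fork. It only assembles the commands already shown above and prints them, and it assumes the default output folder used in this tutorial.

```python
import shlex

def build_commands(weights_dir, data="dataset.yaml", imgsz=256, epochs=3, opset=17):
    """Return the train, export, and val command lines with shared settings."""
    train = ["yolo", "train", "model=yolo26n.pt", f"data={data}",
             f"epochs={epochs}", f"imgsz={imgsz}"]
    export = ["yolo", "export", f"model={weights_dir}/best.pt", "format=onnx",
              "end2end=False", f"imgsz={imgsz}", "simplify=True", f"opset={opset}"]
    val = ["yolo", "val", "task=detect", f"model={weights_dir}/best.onnx",
           f"imgsz={imgsz}", f"data={data}"]
    # shlex.join quotes arguments only when the shell would need it.
    return [shlex.join(cmd) for cmd in (train, export, val)]

for line in build_commands("../../runs/detect/train/weights"):
    print(line)
```

Changing imgsz or opset in one place then updates every command, which avoids the mismatches this section warns about.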

2.2. Quantize and Deploy with STM32AI Model Zoo Services

2.2.1. Clone STM32AI Model Zoo Services

For easier path management, we recommend cloning the project in the same location as the ST Ultralytics fork. This is not mandatory: you can clone it anywhere and update the paths in the next steps accordingly.

git clone https://github.com/STMicroelectronics/stm32ai-modelzoo-services.git --depth 1

2.2.2. Navigate to repository

cd stm32ai-modelzoo-services

2.2.3. Initialize and update submodules

git submodule update --init --recursive

If this step fails, rerun it from the repository root and verify network/proxy access.

2.2.4. Create a dedicated environment and install requirements

conda create -n st_zoo python=3.12.9
conda activate st_zoo
pip install -r requirements.txt

2.2.5. Navigate to object detection use case

cd object_detection

2.2.6. Update user_config.yaml to quantize and evaluate exported ONNX model

Update the user_config.yaml file with the content below. Set model_path to the ONNX model exported in the previous steps, and set the dataset paths to the COCO validation images and the YOLO format labels generated in the previous steps:

operation_mode: chain_eqe

model:
   model_type: yolo26n
   model_path: tuto/ultralytics/runs/detect/train/weights/best.onnx

dataset:
  format: darknet_yolo
  dataset_name: darknet_yolo
  exclude_unlabeled: true
  download_data: false
  class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
                    'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
                    'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase',
                    'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
                    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
                    'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog',
                    'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',
                    'toilet','tv','laptop','mouse','remote','keyboard','cell phone','microwave','oven','toaster',
                    'sink','refrigerator','book','clock','vase','scissors','teddy bear','hair drier','toothbrush']
  test_images_path: tuto/ultralytics/examples/YOLOv8-STEdgeAI/datasets/coco/images/val 
  test_annotations_path: tuto/ultralytics/examples/YOLOv8-STEdgeAI/datasets/coco/labels/val
  quantization_path: tuto/ultralytics/examples/YOLOv8-STEdgeAI/datasets/coco/images/val
  quantization_split: 0.001

preprocessing:
   rescaling:
      scale: 1/255
      offset: 0
   resizing:
      aspect_ratio: fit
      interpolation: nearest
   color_mode: rgb

postprocessing:
  confidence_thresh: 0.001
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.5
  plot_metrics: False   # Plot precision versus recall curves. Default is False.
  max_detection_boxes: 100

quantization:
  quantizer: onnx_quantizer
  target_opset: 17
  granularity: per_channel
  quantization_type: PTQ
  quantization_input_type: float 
  quantization_output_type: float
  export_dir: quantized_models

mlflow:
   uri: ./tf/src/experiments_outputs/mlruns

hydra:
   run:
      dir: ./tf/src/experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}
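
The postprocessing section of the configuration above sets confidence_thresh, NMS_thresh, and max_detection_boxes. As a minimal sketch of how such thresholds are commonly applied (a generic, class-agnostic greedy non-maximum suppression, not the Model Zoo Services implementation, with hypothetical function names and sample detections):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(detections, confidence_thresh=0.001, nms_thresh=0.5,
                      max_detection_boxes=100):
    """Greedy NMS over (box, score) pairs, highest score first."""
    candidates = sorted((d for d in detections if d[1] >= confidence_thresh),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if len(kept) >= max_detection_boxes:
            break
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(box, other) <= nms_thresh for other, _ in kept):
            kept.append((box, score))
    return kept

# Hypothetical detections: the second box overlaps the first, the last one
# falls below the confidence threshold.
detections = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8),
              ((20, 20, 30, 30), 0.7), ((0, 0, 5, 5), 0.0005)]
print(filter_detections(detections))
```

A very low confidence threshold such as 0.001 keeps almost every candidate, which suits metric evaluation, while the deployment configuration later in this tutorial uses 0.5 to show only confident boxes.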

Run the quantization and evaluation pipeline:

python stm32ai_main.py
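
Each run writes its results to a new timestamped directory, following the hydra run dir pattern in the configuration. The helper below is a hypothetical convenience for locating the most recent quantized model. It assumes that run directories are named with the %Y_%m_%d_%H_%M_%S pattern and that the quantized model is stored in a quantized_models subfolder, as the export_dir setting suggests; adjust it if your layout differs.

```python
from datetime import datetime
from pathlib import Path

def latest_quantized_model(outputs_dir="tf/src/experiments_outputs"):
    """Return the .onnx file of the most recent timestamped run, or None."""
    base = Path(outputs_dir)
    if not base.is_dir():
        return None
    runs = []
    for entry in base.iterdir():
        try:
            stamp = datetime.strptime(entry.name, "%Y_%m_%d_%H_%M_%S")
        except ValueError:
            continue  # skip mlruns and anything not named like a run
        runs.append((stamp, entry))
    # Walk runs from newest to oldest and return the first model found.
    for _, run_dir in sorted(runs, key=lambda item: item[0], reverse=True):
        models = sorted((run_dir / "quantized_models").glob("*.onnx"))
        if models:
            return models[0]
    return None

print(latest_quantized_model())
```

Run it from the object_detection folder; the printed path can then be used as the model_path of the deployment configuration.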

2.2.7. Deploy quantized ONNX model on STM32N6

Update user_config.yaml as follows to deploy the quantized ONNX model on the STM32N6. Make sure to set model_path to the quantized model generated in the previous step:

operation_mode: deployment

model:
  model_type: yolo26n
  model_path: tf/src/experiments_outputs/<YYYY_MM_DD_HH_MM_SS>/quantized_models/best_quant_qdq_pc.onnx # replace <YYYY_MM_DD_HH_MM_SS> with the timestamped run directory created by the previous step

dataset:
  class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
                    'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
                    'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase',
                    'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
                    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
                    'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog',
                    'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table',
                    'toilet','tv','laptop','mouse','remote','keyboard','cell phone','microwave','oven','toaster',
                    'sink','refrigerator','book','clock','vase','scissors','teddy bear','hair drier','toothbrush']

preprocessing:
  resizing:
    aspect_ratio: crop
    interpolation: nearest
  color_mode: rgb

postprocessing:
  confidence_thresh: 0.5
  NMS_thresh: 0.5
  IoU_eval_thresh: 0.5
  max_detection_boxes: 10

tools:
  stedgeai:
    optimization: balanced
    on_cloud: False
    path_to_stedgeai: C:/ST/STEdgeAI/4.0/Utilities/windows/stedgeai.exe
  path_to_cubeIDE: C:/ST/STM32CubeIDE_1.17.0/STM32CubeIDE/stm32cubeide.exe

deployment:
  c_project_path: ../application_code/object_detection/STM32N6/
  IDE: GCC
  verbosity: 1
  hardware_setup:
    serie: STM32N6
    board: STM32N6570-DK

mlflow:
   uri: ./tf/src/experiments_outputs/mlruns

hydra:
   run:
      dir: ./tf/src/experiments_outputs/${now:%Y_%m_%d_%H_%M_%S}

Verify the board setup following this tutorial: https://github.com/STMicroelectronics/stm32ai-modelzoo-services/blob/main/object_detection/docs/README_DEPLOYMENT_STM32N6.md#3-deployment

Run deployment:

python stm32ai_main.py

2.3. Notes

Replace placeholder paths (for example, model_path values) with your real local paths before running.
Keep ONNX opset aligned between export and quantization settings.
Use the same class order everywhere (training, evaluation, quantization, deployment).
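
A class order mismatch does not raise an error; it silently mislabels detections. As a small illustrative check (a hypothetical helper, not part of either repository), two class lists can be compared before running a step:

```python
def first_class_mismatch(reference, other):
    """Return (index, reference_name, other_name) for the first difference,
    or None when both lists are identical."""
    for index in range(max(len(reference), len(other))):
        a = reference[index] if index < len(reference) else None
        b = other[index] if index < len(other) else None
        if a != b:
            return index, a, b
    return None

# Hypothetical example: two classes swapped between training and deployment.
training_names = ["person", "bicycle", "car"]
deployment_names = ["person", "car", "bicycle"]
print(first_class_mismatch(training_names, deployment_names))
```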