STM32MP2 NPU description


1. Description

1.1. IP and capabilities

The IP integrated in STM32MP2x devices to accelerate AI processing is the GCNanoUltra31-VIP. This IP combines a GPU and an NPU; the two parts share the same parallel processing unit (shaders). The NPU part is composed of one AI core that delivers 1.2 TOPS at 800 MHz. It can be overdriven to 900 MHz to reach 1.35 TOPS, since throughput scales linearly with the clock (1.2 × 900/800 = 1.35).

To give a clearer view of the performance that can be reached with this NPU IP, here are the measured inference rates for several common models:

Model                                | Performance
MobilenetV1_0.5_128_quant            | 515 FPS
SSD_MobilenetV1_1.0_300_quant        | FPS
Movenet_SinglePose_Lightning_int8_4  | 15 FPS (GPU execution only)
DeepLabV3_quant                      | 16 FPS
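For context, FPS figures of this kind are typically obtained by timing repeated inferences and averaging. Below is a minimal, runtime-agnostic sketch; run_inference is a placeholder for whichever runtime executes the model (for example the stai_mpu API presented in the next section).

import time

def measure_fps(run_inference, warmup=10, iterations=100):
    # Warm up first so one-time costs (graph compilation, memory
    # allocation) do not skew the average.
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference()
    elapsed = time.perf_counter() - start
    return iterations / elapsed  # frames per second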


1.2. Restriction and usage

To access and run an NN model on the NPU IP, you need to use the OpenVX software stack. To simplify the usage of the NPU software stack, we have developed the stai_mpu unified API, which allows you to run an NN model easily. For more information, please visit this wiki page: LINK
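As an illustration, here is a minimal sketch of running one inference through the stai_mpu Python API. The class and method names (stai_mpu_network, set_input, run, get_output) and the model file name are assumptions based on the stai_mpu unified API; refer to the wiki page linked above for the exact signatures.

import numpy as np
from stai_mpu import stai_mpu_network

# Load the network; hardware acceleration routes supported layers to the NPU.
model = stai_mpu_network(model_path="mobilenet_v1_0.5_128_quant.tflite",
                         use_hw_acceleration=True)

# Prepare an input tensor matching the model's expected shape and data type
# (asym-u8 here, i.e. per-tensor asymmetric uint8).
input_data = np.zeros((1, 128, 128, 3), dtype=np.uint8)

model.set_input(0, input_data)     # bind input tensor 0
model.run()                        # execute the inference
predictions = model.get_output(0)  # fetch output tensor 0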

This NPU IP only supports 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the quantization scheme is different (per-channel, for example), the model will run mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, with the data formats required for execution on the hardware.
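For reference, per-tensor asymmetric quantization maps a float tensor to uint8 using a single scale and zero-point pair for the whole tensor, whereas a per-channel scheme carries one scale per output channel. A minimal sketch of the mapping; the helper names and values are illustrative:

import numpy as np

# Per-tensor asymmetric quantization: ONE scale and ONE zero-point for the
# whole tensor (the scheme this NPU accepts).
def quantize_per_tensor(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-0.5, 0.0, 0.75], dtype=np.float32)
scale, zero_point = 0.01, 128  # a single pair for the whole tensor
q = quantize_per_tensor(x, scale, zero_point)
print(q, dequantize(q, scale, zero_point))  # round-trips back to x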

Using the GPU and the NPU at the same time may introduce some delay in the processing.

2. Operation Support

Data type abbreviations:

  • asym-u8: asymmetric_affine-uint8
  • asym-i8: asymmetric_affine-int8
  • fp32: float32
  • fp16: float16
  • bool8: bool8
  • int16: int16
  • int32: int32

Execution engine abbreviations:

  • NPU: Neural Processing Unit
  • GPU: Graphics Processing Unit

2.1. Basic Operations

Operation                  | Type            | NPU support | GPU support
VSI_NN_OP_CONV2D           | asym-u8/asym-i8 | Yes         | No
VSI_NN_OP_CONV2D           | fp32/fp16       | No          | Yes
VSI_NN_OP_CONV1D           | asym-u8/asym-i8 | Yes         | No
VSI_NN_OP_CONV1D           | fp32/fp16       | No          | Yes
VSI_NN_OP_CONV3D           | asym-u8/asym-i8 | Yes         | No
VSI_NN_OP_CONV3D           | fp32/fp16       | No          | Yes
VSI_NN_OP_DECONVOLUTION    | asym-u8/asym-i8 | Yes         | No
VSI_NN_OP_DECONVOLUTION    | fp32/fp16       | No          | Yes
VSI_NN_OP_DECONVOLUTION1D  | asym-u8         | Yes         | No
VSI_NN_OP_DECONVOLUTION1D  | asym-i8         | Yes         | Yes
VSI_NN_OP_DECONVOLUTION1D  | fp32/fp16       | No          | Yes
VSI_NN_OP_FCL2             | asym-u8/asym-i8 | Yes         | No
VSI_NN_OP_FCL2             | fp32/fp16       | No          | Yes
VSI_NN_OP_GROUPED_CONV1D   | asym-u8         | Yes         | No
VSI_NN_OP_GROUPED_CONV1D   | asym-i8         | Yes         | Yes
VSI_NN_OP_GROUPED_CONV1D   | fp32/fp16       | No          | Yes
VSI_NN_OP_GROUPED_CONV2D   | asym-u8         | Yes         | No
VSI_NN_OP_GROUPED_CONV2D   | asym-i8         | Yes         | Yes
VSI_NN_OP_GROUPED_CONV2D   | fp32/fp16       | No          | Yes
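Since the NPU columns above only apply to 8-bit per-tensor quantized tensors, it can be useful to check how a given TensorFlow Lite model is quantized before deployment. Below is a minimal sketch using the standard TensorFlow Lite Python interpreter (the model path is a placeholder): a tensor carrying more than one quantization scale is per-channel quantized, so its operations are expected to fall back to the GPU.

import tensorflow as tf

# Inspect quantization parameters of a TFLite model (path is illustrative).
interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    qp = detail["quantization_parameters"]
    scales = qp["scales"]
    if len(scales) > 1:
        # More than one scale means per-channel quantization: this tensor's
        # operations will not map onto the NPU (per-tensor asymmetric only).
        print(f"per-channel: {detail['name']} ({len(scales)} scales)")
    elif len(scales) == 1:
        print(f"per-tensor : {detail['name']} "
              f"(scale={scales[0]}, zero_point={qp['zero_points'][0]})")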