1. Description
1.1. IP and capabilities
The IP integrated in STM32MP2x boards to accelerate AI processing is the GCNanoUltra31-VIP. This IP is a combination of a GPU and an NPU; these two parts of the IP share the same parallel processing units (shaders). The NPU part is composed of one AI core that delivers 1.2 TOPS at 800 MHz. It can be overdriven to 900 MHz to reach 1.35 TOPS.
To give a clearer view of the performance you can reach with this NPU IP, here is a table of the performance of several common models:
Model | Performance |
---|---|
MobilenetV1_0.5_128_quant | 515 FPS |
SSD_MobilenetV1_1.0_300_quant | FPS |
Movenet_SinglePose_Lightning_int8_4 | 15 FPS (GPU execution only) |
DeepLabV3_quant | 16 FPS |
1.2. Restriction and usage
To access and run an NN model on the NPU IP, you need to use the OpenVX software stack. To simplify the usage of the NPU software stack, we have developed the stai_mpu unified API, which allows you to run an NN model easily. For more information, please visit this wiki page: LINK
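As a sketch of what running a model through the stai_mpu Python API can look like: the model file name, input data shape, and the `use_hw_acceleration` flag below are illustrative assumptions, not values taken from this page, and the exact method names should be checked against the stai_mpu API documentation linked above.

```python
import numpy as np
from stai_mpu import stai_mpu_network

# Load a quantized model; use_hw_acceleration requests NPU/GPU offload
# (model path and flag are assumptions for this sketch).
model = stai_mpu_network(model_path="mobilenet_v1_quant.nb",
                         use_hw_acceleration=True)

# Inspect the network interface.
num_inputs = model.get_num_inputs()
input_infos = model.get_input_infos()

# Feed one input tensor (shape assumed here), run inference, read result.
input_data = np.zeros((1, 128, 128, 3), dtype=np.uint8)
model.set_input(0, input_data)
model.run()
output = model.get_output(0)
```

The same flow (load, set inputs, run, get outputs) applies whichever framework backend the stai_mpu API wraps underneath.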
This NPU IP supports only 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the quantization scheme is different, such as per-channel, the model will run mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, with the data formats required for execution on the hardware.
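To illustrate what the per-tensor asymmetric scheme means: a single scale and zero-point pair is shared by the whole tensor, mapping floats to uint8. The scale and zero-point values below are made up for the example; a real model's values come from its quantization step.

```python
def quantize(x, scale, zero_point):
    # Per-tensor asymmetric: one scale/zero_point for the whole tensor,
    # result clamped to the uint8 range [0, 255].
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize(q, scale, zero_point):
    # Recover the approximate float value.
    return (q - zero_point) * scale

# Illustrative parameters (not from a real model).
scale, zp = 0.05, 128
q = quantize(0.5, scale, zp)    # -> 138
x = dequantize(q, scale, zp)    # -> 0.5
```

A per-channel scheme would instead carry one scale per output channel, which is what pushes such models off the NPU and onto the GPU here.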
Using the GPU and the NPU at the same time may introduce some delay in the processing.
2. Operation Support
Data type abbreviations:
- asym-u8: asymmetric_affine-uint8
- asym-i8: asymmetric_affine-int8
- fp32: float32
- fp16: float16
- bool8: bool8
- int16: int16
- int32: int32
Execution engine abbreviations:
- NPU: Neural Processing Unit
- GPU: Graphics Processing Unit
2.1. Basic Operations
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_CONV2D | asym-u8/asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_CONV1D | asym-u8/asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_CONV3D | asym-u8/asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_DECONVOLUTION | asym-u8/asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_DECONVOLUTION1D | asym-u8 | | |
 | asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_FCL2 | asym-u8/asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_GROUPED_CONV1D | asym-u8 | | |
 | asym-i8 | | |
 | fp32/fp16 | | |
VSI_NN_OP_GROUPED_CONV2D | asym-u8 | | |
 | asym-i8 | | |
 | fp32/fp16 | | |