1. Description
1.1. IP and capabilities
The unit integrated in STM32MP2x boards to accelerate AI processing is the GCNanoUltra31-VIP. This unit combines a GPU and an NPU, which share the same parallel processing unit (shaders). The NPU part consists of one AI core that delivers 1.2 TOPS at 800 MHz and can be overdriven to 900 MHz to reach 1.35 TOPS. On the GPU side, the computing power is 12.8 GFLOPS at 800 MHz when processing 16-bit data.
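The overdrive figure is consistent with throughput scaling linearly with the clock frequency. The following is a minimal sketch of that arithmetic, assuming the AI core performs a fixed number of operations per clock cycle; the function name `scaled_tops` is hypothetical and used for illustration only.

```python
def scaled_tops(base_tops: float, base_mhz: float, target_mhz: float) -> float:
    """Scale a throughput figure linearly with the clock frequency.

    Assumption: the number of operations per clock cycle is constant.
    """
    return base_tops * target_mhz / base_mhz


# Figures quoted above: 1.2 TOPS at 800 MHz, overdriven to 900 MHz.
overdrive_tops = scaled_tops(1.2, 800, 900)
print(f"{overdrive_tops:.2f} TOPS at 900 MHz")

# Implied operations per clock cycle (1 TOPS = 1e12 operations per second).
ops_per_cycle = 1.2e12 / 800e6
print(f"{ops_per_cycle:.0f} operations per cycle")
```

These are peak figures; the rate actually reached depends on the model and on which operations run on the NPU.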
To give a clearer view of the performance you can reach with this NPU, the following table lists the performance of several common models:
Model | Performance |
---|---|
MobilenetV1_0.5_128_quant | 515 FPS |
SSD_MobilenetV1_1.0_300_quant | ?? FPS |
Movenet_SinglePose_Lightning_int8_4 | 15 FPS (GPU execution ONLY) |
DeepLabV3_quant | 16 FPS |
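A frame rate can be converted to a per-inference time for comparison with a latency budget. The sketch below assumes that frames are processed one at a time without pipelining, so that latency is simply the inverse of the frame rate; the helper name `latency_ms` is hypothetical.

```python
def latency_ms(fps: float) -> float:
    """Convert a frame rate to the time per inference in milliseconds.

    Assumption: one inference at a time, no pipelining or batching.
    """
    return 1000.0 / fps


# Frame rates taken from the table above (the entry without a value is omitted).
benchmarks = {
    "MobilenetV1_0.5_128_quant": 515,
    "Movenet_SinglePose_Lightning_int8_4": 15,
    "DeepLabV3_quant": 16,
}

for model, fps in benchmarks.items():
    print(f"{model}: {latency_ms(fps):.2f} ms per inference")
```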
1.2. Restriction and usage
To access and run a neural network (NN) model on the NPU, you need to use the OpenVX software stack. To simplify the usage of this stack, a unified stai_mpu API has been developed that lets you run an NN model easily. For more information, please visit this wiki page: LINK
This NPU IP only supports 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the quantization scheme is different, such as per-channel, the model runs mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, together with the data formats required for execution on the hardware.
Because the GPU and the NPU share the same parallel processing unit, using both at the same time may introduce some delay in the processing.
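To illustrate what per-tensor asymmetric quantization means, the following is a minimal pure-Python sketch, not the quantizer used by any particular tool. A single (scale, zero point) pair is computed for the whole tensor and maps real values to unsigned 8-bit integers; a per-channel scheme would instead keep one pair per output channel. All function names here are hypothetical.

```python
def quant_params(x_min, x_max, qmin=0, qmax=255):
    """Compute one (scale, zero_point) pair for a whole tensor."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # all-zero tensor: any positive scale works
        scale = 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map real values to integers: q = round(x / scale) + zero_point, clamped."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to real values: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Example tensor flattened to a list; the same pair is used for every element.
weights = [-1.0, -0.5, 0.0, 0.5, 1.5]
scale, zero_point = quant_params(min(weights), max(weights))
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(scale, zero_point, q)
```

The zero point is non-zero whenever the value range is not symmetric around zero, which is what makes the scheme asymmetric.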
2. Operation Support
Data type abbreviations:
- asym-u8: asymmetric_affine-uint8
- asym-i8: asymmetric_affine-int8
- fp32: float32
- fp16: float16
- bool8: bool8
- int16: int16
- int32: int32
Execution engine abbreviations:
- NPU: Neural Processing Unit
- GPU: Graphics Processing Unit
2.1. Basic Operations
This is the list of the basic operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_CONV2D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONV2D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CONV1D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONV1D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CONV3D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONV3D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_DECONVOLUTION | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_DECONVOLUTION | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_DECONVOLUTION1D | asym-u8 | ![]() | ![]() |
VSI_NN_OP_DECONVOLUTION1D | asym-i8 | ![]() | ![]() |
VSI_NN_OP_DECONVOLUTION1D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_FCL2 | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_FCL2 | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GROUPED_CONV1D | asym-u8 | ![]() | ![]() |
VSI_NN_OP_GROUPED_CONV1D | asym-i8 | ![]() | ![]() |
VSI_NN_OP_GROUPED_CONV1D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GROUPED_CONV2D | asym-u8 | ![]() | ![]() |
VSI_NN_OP_GROUPED_CONV2D | asym-i8 | ![]() | ![]() |
VSI_NN_OP_GROUPED_CONV2D | fp32 / fp16 | ![]() | ![]() |
2.2. Activation Operations
This is the list of the OVXLIB activation operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_ABS | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ABS | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ACOSH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ACOSH | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ATAN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ATAN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ATANH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ATANH | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CLIP | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CLIP | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_COS | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_COS | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ERF | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ERF | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_EXP | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_EXP | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_HARD_SIGMOID | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_HARD_SIGMOID | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_INVERSE_SIGMOID | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_INVERSE_SIGMOID | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LEAKY_RELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LEAKY_RELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LINEAR | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LINEAR | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LOG | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LOG | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LOG_SOFTMAX | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LOG_SOFTMAX | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MISH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MISH | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_NEG | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_NEG | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_PRELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_PRELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RCP | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RCP | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RELUN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RELUN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RSQRT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RSQRT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SIGMOID | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SIGMOID | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SIGN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SIGN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SIN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SIN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SOFTMAX | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SOFTMAX | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SOFTRELU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SOFTRELU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SOFTSIGN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SOFTSIGN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SQRT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SQRT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SQUARE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SQUARE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SWISH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SWISH | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_TANH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_TANH | fp32 / fp16 | ![]() | ![]() |
2.3. Elementwise Operations
This is the list of the elementwise operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_ADD | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ADD | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ADDN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ADDN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_DIVIDE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_DIVIDE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_FLOORDIV | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_FLOORDIV | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LOGICAL_NOT | bool8 | ![]() | ![]() |
VSI_NN_OP_LOGICAL_OPS | bool8 | ![]() | ![]() |
VSI_NN_OP_MATRIXMUL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MATRIXMUL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MAXIMUM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MAXIMUM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MINIMUM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MINIMUM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MOD | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MOD | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MULTIPLY | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MULTIPLY | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_POW | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_POW | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RELATIONAL_OPS | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RELATIONAL_OPS | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RELATIONAL_OPS | bool8 | ![]() | ![]() |
VSI_NN_OP_SELECT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SELECT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SELECT | bool8 | ![]() | ![]() |
VSI_NN_OP_SUBTRACT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SUBTRACT | fp32 / fp16 | ![]() | ![]() |
2.4. Normalization Operations
This is the list of the normalization operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_BATCH_NORM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_BATCH_NORM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_BATCHNORM_SINGLE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_BATCHNORM_SINGLE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GROUP_NORM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GROUP_NORM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_INSTANCE_NORM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_INSTANCE_NORM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_L2_NORMALIZE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_L2_NORMALIZE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LAYER_NORM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LAYER_NORM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LPNORM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LPNORM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LRN2 | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LRN2 | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MOMENTS | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MOMENTS | fp32 / fp16 | ![]() | ![]() |
2.5. Reshape Operations
This is the list of the reshape operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_ARGMAX | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ARGMAX | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ARGMIN | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ARGMIN | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_BATCH2SPACE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_BATCH2SPACE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CONCAT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONCAT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_DEPTH2SPACE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_DEPTH2SPACE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_EXPAND_BROADCAST | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_EXPAND_BROADCAST | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_PAD2 | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_PAD2 | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_PERMUTE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_PERMUTE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_REDUCE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_REDUCE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_REORG | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_REORG | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RESHAPE2 | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RESHAPE2 | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_REVERSE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_REVERSE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SHUFFLECHANNEL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SHUFFLECHANNEL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SLICE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SLICE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SPACE2BATCH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SPACE2BATCH | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SPACE2DEPTH | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SPACE2DEPTH | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SPLIT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SPLIT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SQUEEZE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SQUEEZE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_STACK | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_STACK | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_STRIDED_SLICE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_STRIDED_SLICE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_UNSTACK | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_UNSTACK | fp32 / fp16 | ![]() | ![]() |
2.6. RNN Operations
This is the list of the recurrent neural network (RNN) operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_CONV2D_LSTM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONV2D_LSTM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CONV2D_LSTM_CELL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONV2D_LSTM_CELL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GRU | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GRU | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GRUCELL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GRUCELL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LSTM_OVXLIB | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LSTM_OVXLIB | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LSTMUNIT_OVXLIB | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LSTMUNIT_OVXLIB | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SVDF | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SVDF | fp32 / fp16 | ![]() | ![]() |
2.7. Pooling Operations
This is the list of the pooling operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_AVG_POOL3D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_AVG_POOL3D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GLOBALLPPOOL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GLOBALLPPOOL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_LPPOOL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_LPPOOL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MAX_POOL3D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MAX_POOL3D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MAXPOOLWITHARGMAX | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MAXPOOLWITHARGMAX | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_MAXUNPOOL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_MAXUNPOOL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_POOL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_POOL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_POOLWITHARGMAX | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_POOLWITHARGMAX | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ROI_POOL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ROI_POOL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_UPSAMPLE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_UPSAMPLE | fp32 / fp16 | ![]() | ![]() |
2.8. Miscellaneous Operations
This is the list of the other operations supported by the NPU.
Operation | Type | NPU support | GPU support |
---|---|---|---|
VSI_NN_OP_BUCKETIZE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_BUCKETIZE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CAST | all types | ![]() | ![]() |
VSI_NN_OP_CEIL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CEIL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CONCATSHIFT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CONCATSHIFT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_CUMSUM | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_CUMSUM | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_DATACONVERT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_DROPOUT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_DROPOUT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_EMBEDDING_LOOKUP | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_EMBEDDING_LOOKUP | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_FLOOR | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_FLOOR | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GATHER | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GATHER | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GATHER_ELEMENTS | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GATHER_ELEMENTS | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GATHER_ND | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GATHER_ND | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_GRID_SAMPLE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_GRID_SAMPLE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ONE_HOT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ONE_HOT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_PROPOSAL | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_PROPOSAL | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_REPEAT | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_REPEAT | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RESIZE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RESIZE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RESIZE_1D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RESIZE_1D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_RESIZE_3D | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_RESIZE_3D | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_REVERSESEQUENCE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_REVERSESEQUENCE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_ROUND | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_ROUND | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SCATTER_ELEMENTS | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SCATTER_ELEMENTS | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SCATTER_ND | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SCATTER_ND | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SCATTER_ND_UPDATE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SCATTER_ND_UPDATE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SEQUENCE_MASK | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SEQUENCE_MASK | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_SIGNAL_FRAME | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_SIGNAL_FRAME | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_TILE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_TILE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_UPSAMPLESCALE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_UPSAMPLESCALE | fp32 / fp16 | ![]() | ![]() |
VSI_NN_OP_VARIABLE | asym-u8 / asym-i8 | ![]() | ![]() |
VSI_NN_OP_VARIABLE | fp32 / fp16 | ![]() | ![]() |