STM32MP2 NPU description

Revision as of 22:04, 10 June 2024 by Registered User

1. Description

1.1. Hardware and software capabilities

The unit integrated in STM32MP2x devices to accelerate AI processing is the GCNanoUltra31-VIP.
This unit combines a GPU and an NPU, and the two parts share the same Parallel Processing Unit (shaders).
The NPU part consists of one AI core that delivers 1.2 TOPS at 800 MHz; it can be overdriven to 900 MHz to reach 1.35 TOPS.
On the GPU side, the computing power is 12.8 GFLOPS at 800 MHz when processing 16-bit data.
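Assuming peak throughput scales linearly with clock frequency, the figures above are consistent with each other: 1.2 TOPS at 800 MHz corresponds to 1500 operations per cycle, and the same per-cycle rate at 900 MHz gives 1.35 TOPS. The short Python sketch below reproduces this arithmetic; the helper names are illustrative only.

```python
# Consistency check of the throughput figures quoted above.
# Assumption: peak throughput scales linearly with the clock frequency.

def ops_per_cycle(tera_ops: float, freq_mhz: float) -> float:
    """Operations per clock cycle for a peak figure given in tera-operations per second."""
    return tera_ops * 1e12 / (freq_mhz * 1e6)


def peak_tera_ops(ops_cycle: float, freq_mhz: float) -> float:
    """Peak throughput, in tera-operations per second, at a given clock frequency."""
    return ops_cycle * freq_mhz * 1e6 / 1e12


npu_ops_cycle = ops_per_cycle(1.2, 800)  # NPU: 1.2 TOPS at 800 MHz
print(f"NPU: {npu_ops_cycle:.0f} ops/cycle")
print(f"NPU at 900 MHz: {peak_tera_ops(npu_ops_cycle, 900):.2f} TOPS")

gpu_flop_cycle = ops_per_cycle(12.8e-3, 800)  # GPU: 12.8 GFLOPS = 0.0128 TFLOPS
print(f"GPU (16-bit): {gpu_flop_cycle:.0f} FLOP/cycle")
```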

To give a clearer view of the performance you can reach with this NPU, the following table lists the frame rates obtained with several common models:

Model                                  Frame rate
MobilenetV1_0.5_128_quant              515 FPS
SSD_MobilenetV1_1.0_300_quant          ?? FPS
Movenet_SinglePose_Lightning_int8_4    15 FPS (GPU execution only)
DeepLabV3_quant                        16 FPS
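Frame rates are easier to compare with a real-time budget when expressed as time per frame (1000 / FPS milliseconds). The sketch below does this conversion for the entries that have a figure; it treats the FPS values as average throughput, which is an assumption since the table does not state what the measurement includes.

```python
# Convert the frame rates from the table above into average time per frame.
# The SSD_MobilenetV1 entry is omitted because its figure is not available yet.

BENCHMARKS_FPS = {
    "MobilenetV1_0.5_128_quant": 515,
    "Movenet_SinglePose_Lightning_int8_4": 15,  # GPU execution only
    "DeepLabV3_quant": 16,
}


def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


for model, fps in BENCHMARKS_FPS.items():
    print(f"{model:40s} {fps:4d} FPS -> {frame_time_ms(fps):6.2f} ms/frame")
```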

1.2. Restrictions and usage

To access and run an NN model on the NPU, you need to use the OpenVX software stack. To ease the use of this stack, we have developed the stai_mpu unified API, which lets you run an NN model easily. For more information, please visit this wiki page: LINK available soon

This NPU IP only supports 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the model uses a different quantization scheme, such as per-channel, it runs mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, together with the data formats required for execution on the hardware.
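As a minimal sketch of what "per-tensor asymmetric" means, the Python code below derives a single (scale, zero_point) pair for a whole tensor, with real_value = scale * (q - zero_point), and contrasts it with per-channel parameters. The function names are hypothetical, and real conversion tools usually derive the value range from calibration data rather than from the raw min and max used here. For int8 instead of uint8, pass qmin=-128 and qmax=127.

```python
def affine_params(values, qmin=0, qmax=255):
    """Return (scale, zero_point) so that [min(values), max(values)] maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # extend the range to include 0.0 so it is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Real value -> integer code: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Integer code -> real value: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in codes]


# Per-tensor: one (scale, zero_point) pair shared by every value of the tensor.
weights = [-1.0, -0.5, 0.0, 0.25, 1.5]
scale, zp = affine_params(weights)
codes = quantize(weights, scale, zp)
print(f"scale={scale:.6f} zero_point={zp} codes={codes}")
print(dequantize(codes, scale, zp))

# Per-channel (the scheme that makes a model run mainly on the GPU):
# one pair per output channel instead of one pair for the whole tensor.
channels = [[-0.1, 0.05, 0.1], [-2.0, 0.5, 3.0]]
print([affine_params(ch) for ch in channels])
```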

Important
An application that concurrently uses the GPU (for example, for display rendering) and the NPU (for NN processing) will have reduced overall performance, because the GPU and the NPU access the shared shaders through a time-sharing mechanism.

2. Operation Support

Information

Data type abbreviations:

  • asym-u8: asymmetric_affine-uint8
  • asym-i8: asymmetric_affine-int8
  • fp32: float32
  • fp16: float16
  • bool8: bool8
  • int16: int16
  • int32: int32

Execution engine abbreviations:

  • NPU: Neural Processing Unit
  • GPU: Graphics Processing Unit

2.1. Basic Operations

This is the list of basic operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_CONV2D asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_CONV1D asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_CONV3D asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_DECONVOLUTION asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_DECONVOLUTION1D asym-u8 Yes No
asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_FCL2 asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_GROUPED_CONV1D asym-u8 Yes No
asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_GROUPED_CONV2D asym-u8 Yes No
asym-i8 Yes Yes
fp32 / fp16 No Yes
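The table can be read as a lookup keyed by operation and data type. The hypothetical helper below (not part of OpenVX, OVXLIB, or stai_mpu) encodes a few rows of this table and returns the engines that support a given combination; for example, a float CONV2D is reported as GPU-only.

```python
# Hypothetical helper: a few rows of the "Basic Operations" table encoded as a lookup.
# Keys are (operation, data type); values are (NPU support, GPU support).
SUPPORT = {
    ("VSI_NN_OP_CONV2D", "asym-u8"): (True, False),
    ("VSI_NN_OP_CONV2D", "asym-i8"): (True, False),
    ("VSI_NN_OP_CONV2D", "fp16"): (False, True),
    ("VSI_NN_OP_CONV2D", "fp32"): (False, True),
    ("VSI_NN_OP_DECONVOLUTION1D", "asym-u8"): (True, False),
    ("VSI_NN_OP_DECONVOLUTION1D", "asym-i8"): (True, True),
    ("VSI_NN_OP_DECONVOLUTION1D", "fp16"): (False, True),
    ("VSI_NN_OP_DECONVOLUTION1D", "fp32"): (False, True),
}


def engines(op, dtype):
    """Return the engines that support `op` for `dtype`; [] if the pair is not listed."""
    npu, gpu = SUPPORT.get((op, dtype), (False, False))
    return [name for name, ok in (("NPU", npu), ("GPU", gpu)) if ok]


print(engines("VSI_NN_OP_CONV2D", "asym-u8"))
print(engines("VSI_NN_OP_CONV2D", "fp32"))
print(engines("VSI_NN_OP_DECONVOLUTION1D", "asym-i8"))
```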

2.2. Activation Operations

This is the list of OVXLIB activation operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_ABS asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ACOSH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ATAN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ATANH asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_CELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_CLIP asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_COS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ERF asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_EXP asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_HARD_SIGMOID asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_INVERSE_SIGMOID asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LEAKY_RELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LINEAR asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LOG asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LOG_SOFTMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MISH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_NEG asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_PRELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RCP asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RELUN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RSQRT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SIGMOID asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SIGN asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SIN asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SOFTMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SOFTRELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SOFTSIGN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SQRT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SQUARE asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SWISH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_TANH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
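Because an 8-bit input can only take 256 values, an elementwise activation on asymmetric-quantized data can be precomputed as a 256-entry lookup table. This is a common software technique shown for illustration only; this page does not state how the NPU evaluates these functions. The sketch below builds such a table for SIGMOID, with example quantization parameters chosen as assumptions.

```python
import math


def build_lut(fn, in_scale, in_zp, out_scale, out_zp, qmin=0, qmax=255):
    """Precompute fn for every possible 8-bit input code (asymmetric quantization)."""
    lut = []
    for q in range(qmin, qmax + 1):
        x = in_scale * (q - in_zp)             # dequantize the input code
        y = fn(x)                              # evaluate the activation in floating point
        q_out = round(y / out_scale) + out_zp  # requantize with the output parameters
        lut.append(max(qmin, min(qmax, q_out)))
    return lut


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Example parameters (assumptions): input codes cover roughly [-8, 8],
# sigmoid outputs in [0, 1] use scale 1/256 and zero point 0.
lut = build_lut(sigmoid, in_scale=16 / 255, in_zp=128, out_scale=1 / 256, out_zp=0)

input_codes = [0, 64, 128, 192, 255]
print([lut[q] for q in input_codes])  # applying the activation is now a table lookup
```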

2.3. Elementwise Operations

This is the list of elementwise operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_ADD asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ADDN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_DIVIDE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_FLOORDIV asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LOGICAL_NOT bool8 No Yes
VSI_NN_OP_LOGICAL_OPS bool8 No Yes
VSI_NN_OP_MATRIXMUL asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_MAXIMUM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MINIMUM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MOD asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MULTIPLY asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_POW asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RELATIONAL_OPS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
bool8 No Yes
VSI_NN_OP_SELECT asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
bool8 No Yes
VSI_NN_OP_SUBTRACT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
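When the two inputs of an elementwise ADD are asymmetric-quantized with different parameters, the result must be requantized into the output's scale and zero point. The Python function below is a float reference of that computation, useful for checking results; it is an illustrative sketch, and the NPU's internal arithmetic is not described on this page.

```python
def quantized_add(qa, a_scale, a_zp, qb, b_scale, b_zp, out_scale, out_zp, qmin=0, qmax=255):
    """Float reference for elementwise ADD on asymmetric-quantized 8-bit tensors."""
    out = []
    for a, b in zip(qa, qb):
        real = a_scale * (a - a_zp) + b_scale * (b - b_zp)  # dequantize both inputs and add
        q = round(real / out_scale) + out_zp                 # requantize into the output format
        out.append(max(qmin, min(qmax, q)))                  # saturate to the 8-bit range
    return out


# Both inputs encode the real values [0.0, 1.0, 2.0] with different parameters.
a = [128, 138, 148]  # scale 0.1,  zero point 128
b = [0, 20, 40]      # scale 0.05, zero point 0
print(quantized_add(a, 0.1, 128, b, 0.05, 0, out_scale=0.1, out_zp=128))
```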

2.4. Normalization Operations

This is the list of normalization operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_BATCH_NORM asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_BATCHNORM_SINGLE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GROUP_NORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_INSTANCE_NORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_L2_NORMALIZE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LAYER_NORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LPNORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LRN2 asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MOMENTS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
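VSI_NN_OP_BATCH_NORM is the only normalization operation in this table with NPU support for 8-bit data. At inference time, batch normalization uses fixed statistics, so it reduces to a per-channel affine transform y = scale * x + shift. The sketch below performs that rewrite with illustrative values; whether the NPU software stack folds batch normalization in exactly this way is not stated on this page.

```python
import math


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Rewrite y = gamma * (x - mean) / sqrt(var + eps) + beta as y = scale * x + shift,
    with one (scale, shift) pair per channel."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - s * m for b, s, m in zip(beta, scale, mean)]
    return scale, shift


# Example statistics for two channels (illustrative values).
gamma, beta = [1.0, 2.0], [0.0, 1.0]
mean, var = [0.5, -1.0], [1.0, 4.0]
scale, shift = fold_batch_norm(gamma, beta, mean, var)

x = 2.5
for c in range(2):
    direct = gamma[c] * (x - mean[c]) / math.sqrt(var[c] + 1e-5) + beta[c]
    folded = scale[c] * x + shift[c]
    print(f"channel {c}: direct={direct:.6f} folded={folded:.6f}")
```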

2.5. Reshape Operations

This is the list of reshape operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_ARGMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ARGMIN asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_BATCH2SPACE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_CONCAT asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_DEPTH2SPACE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_EXPAND_BROADCAST asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_PAD2 asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_PERMUTE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_REDUCE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_REORG asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_RESHAPE2 asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_REVERSE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SHUFFLECHANNEL asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SLICE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SPACE2BATCH asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SPACE2DEPTH asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SPLIT asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SQUEEZE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_STACK asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_STRIDED_SLICE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_UNSTACK asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
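Many of the reshape operations only rearrange data. As an illustration, DEPTH2SPACE can be written as pure index arithmetic. The sketch below assumes an NHWC layout with the batch dimension omitted and TensorFlow's depth-column-row (DCR) channel ordering; the layout and ordering used by the NPU software stack may differ.

```python
def depth_to_space(x, block):
    """DEPTH2SPACE on an H x W x C nested list (NHWC, batch omitted, DCR ordering).

    Returns an (H * block) x (W * block) x (C // block**2) nested list.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    if c % (block * block) != 0:
        raise ValueError("channel count must be divisible by block**2")
    out_c = c // (block * block)
    out = [[[None] * out_c for _ in range(w * block)] for _ in range(h * block)]
    for i in range(h):
        for j in range(w):
            for bi in range(block):
                for bj in range(block):
                    for k in range(out_c):
                        # Channel group (bi, bj) of pixel (i, j) moves to pixel (i*block+bi, j*block+bj).
                        out[i * block + bi][j * block + bj][k] = x[i][j][(bi * block + bj) * out_c + k]
    return out


# A single pixel with 4 channels becomes a 2 x 2 block with 1 channel.
print(depth_to_space([[[1, 2, 3, 4]]], 2))
```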

2.6. RNN Operations

This is the list of recurrent neural network (RNN) operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_CONV2D_LSTM asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_CONV2D_LSTM_CELL asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_GRU asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_GRUCELL asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_LSTM_OVXLIB asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_LSTMUNIT_OVXLIB asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_SVDF asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes


2.7. Pooling Operations

This is the list of pooling operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_AVG_POOL3D asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GLOBALLPPOOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LPPOOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MAX_POOL3D asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_MAXPOOLWITHARGMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MAXUNPOOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_POOL asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_POOLWITHARGMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ROI_POOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_UPSAMPLE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
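With a positive scale, dequantization x = scale * (q - zero_point) is monotonically increasing, so max pooling selects the same element whether it runs on the 8-bit codes or on the real values. The pure-Python sketch below implements a "valid" 2D max pooling (no padding) and checks that property on example codes and quantization parameters chosen for illustration.

```python
def max_pool2d(x, k, stride):
    """'Valid' 2D max pooling (no padding) over an H x W nested list with a k x k window."""
    out_h = (len(x) - k) // stride + 1
    out_w = (len(x[0]) - k) // stride + 1
    return [
        [max(x[i * stride + di][j * stride + dj] for di in range(k) for dj in range(k))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def dequantize_grid(grid, scale, zero_point):
    """Convert a 2D grid of integer codes to real values."""
    return [[scale * (q - zero_point) for q in row] for row in grid]


# Example uint8 codes and affine parameters (assumptions, positive scale).
codes = [
    [12, 3, 40, 7],
    [0, 25, 9, 9],
    [200, 18, 64, 63],
    [5, 130, 1, 255],
]
scale, zero_point = 0.5, 10

pooled_codes = max_pool2d(codes, k=2, stride=2)
print(pooled_codes)
# Pooling the codes then dequantizing gives the same result as the reverse order.
print(dequantize_grid(pooled_codes, scale, zero_point)
      == max_pool2d(dequantize_grid(codes, scale, zero_point), 2, 2))
```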

2.8. Miscellaneous Operations

This is the list of other operations and their support on the NPU and on the GPU for each data type.

Operation  Data type  NPU support  GPU support
VSI_NN_OP_BUCKETIZE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_CAST all types No Yes
VSI_NN_OP_CEIL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_CONCATSHIFT asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_CUMSUM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_DATACONVERT asym-u8 / asym-i8 Yes No
VSI_NN_OP_DROPOUT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_EMBEDDING_LOOKUP asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_FLOOR asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GATHER asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GATHER_ELEMENTS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GATHER_ND asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GRID_SAMPLE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ONE_HOT asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_PROPOSAL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_REPEAT asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RESIZE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RESIZE_1D asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RESIZE_3D asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_REVERSESEQUENCE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ROUND asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SCATTER_ELEMENTS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SCATTER_ND asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SCATTER_ND_UPDATE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SEQUENCE_MASK asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SIGNAL_FRAME asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_TILE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_UPSAMPLESCALE asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_VARIABLE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
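This page does not describe the exact semantics of VSI_NN_OP_DATACONVERT. Assuming it converts a tensor between two asymmetric 8-bit formats, the sketch below shows the affine requantization such a conversion computes; in the special case where the scale is unchanged and the zero point moves from 128 (uint8) to 0 (int8), it reduces to subtracting 128 from each code. The function name is hypothetical.

```python
def requantize(q, in_scale, in_zp, out_scale, out_zp, out_min, out_max):
    """Map an 8-bit code from one asymmetric format to another: dequantize, then requantize."""
    real = in_scale * (q - in_zp)
    return max(out_min, min(out_max, round(real / out_scale) + out_zp))


# uint8 (scale 0.02, zero point 128) -> int8 (scale 0.02, zero point 0)
u8_codes = [0, 100, 128, 200, 255]
i8_codes = [requantize(q, 0.02, 128, 0.02, 0, -128, 127) for q in u8_codes]
print(i8_codes)  # same scale, so each code is shifted by -128
```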

2.9. Fused Operation Support

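This is the list of operation combinations that the NPU can fuse. Each row gives the second operation (or sequence of operations) and each column gives the first operation; "Yes" means that the NPU can fuse the second operation with the first one.

Second operation    First operation: CONV2D  CONV1D  DW_2D  FCL2  PERMUTE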
ABS Yes Yes Yes Yes No
ACOSH Yes Yes Yes Yes No
ADD Yes Yes Yes Yes No
ATAN Yes Yes Yes Yes No
CELU Yes Yes Yes Yes No
CLIP Yes Yes Yes Yes No
CONV1D No No No No Yes
CONV2D No No No No Yes
DEPTH2SPACE Yes Yes Yes Yes No
DW_2D No No No No Yes
ELU Yes Yes Yes Yes No
ERF Yes Yes Yes Yes No
GELU Yes Yes Yes Yes No
HARD_SIGMOID Yes Yes Yes Yes No
HSWISH Yes Yes Yes Yes No
INVERSE_SIGMOID Yes Yes Yes Yes No
LEAKY_RELU Yes Yes Yes Yes No
LOG Yes Yes Yes Yes No
MAX_POOL Yes Yes Yes Yes No
MISH Yes Yes Yes Yes No
MULTIPLY Yes Yes Yes Yes No
NEG Yes Yes Yes Yes No
PERMUTE Yes Yes Yes Yes No
PRELU Yes Yes Yes Yes No
RCP Yes Yes Yes Yes No
RELU Yes Yes Yes Yes No
RELUN Yes Yes Yes Yes No
RESHAPE Yes Yes Yes Yes No
RSQRT Yes Yes Yes Yes No
SELU Yes Yes Yes Yes No
SIGMOID Yes Yes Yes Yes No
SOFTRELU Yes Yes Yes Yes No
SOFTSIGN Yes Yes Yes Yes No
SPACE2DEPTH Yes Yes Yes Yes No
SQRT Yes Yes Yes Yes No
SQUARE Yes Yes Yes Yes No
SUBTRACT Yes Yes Yes Yes No
SWISH Yes Yes Yes Yes No
TANH Yes Yes Yes Yes No
MAX_POOL + ABS Yes Yes Yes Yes No
MAX_POOL + ACOSH Yes Yes Yes Yes No
MAX_POOL + ADD Yes Yes Yes Yes No
MAX_POOL + ATAN Yes Yes Yes Yes No
MAX_POOL + BATCH_NORM Yes Yes Yes Yes No
MAX_POOL + CELU Yes Yes Yes Yes No
MAX_POOL + CLIP Yes Yes Yes Yes No
MAX_POOL + ELU Yes Yes Yes Yes No
MAX_POOL + ERF Yes Yes Yes Yes No
MAX_POOL + GELU Yes Yes Yes Yes No
MAX_POOL + HARD_SIGMOID Yes Yes Yes Yes No
MAX_POOL + HSWISH Yes Yes Yes Yes No
MAX_POOL + INVERSE_SIGMOID Yes Yes Yes Yes No
MAX_POOL + LEAKY_RELU Yes Yes Yes Yes No
MAX_POOL + MISH Yes Yes Yes Yes No
MAX_POOL + MULTIPLY Yes Yes Yes Yes No
MAX_POOL + NEG Yes Yes Yes Yes No
MAX_POOL + PRELU Yes Yes Yes Yes No
MAX_POOL + RCP Yes Yes Yes Yes No
MAX_POOL + RELU Yes Yes Yes Yes No
MAX_POOL + RELUN Yes Yes Yes Yes No
MAX_POOL + RSQRT Yes Yes Yes Yes No
MAX_POOL + SELU Yes Yes Yes Yes No
MAX_POOL + SIGMOID Yes Yes Yes Yes No
MAX_POOL + SOFTRELU Yes Yes Yes Yes No
MAX_POOL + SOFTSIGN Yes Yes Yes Yes No
MAX_POOL + SQRT Yes Yes Yes Yes No
MAX_POOL + SQUARE Yes Yes Yes Yes No
MAX_POOL + SUBTRACT Yes Yes Yes Yes No
MAX_POOL + SWISH Yes Yes Yes Yes No
MAX_POOL + TANH Yes Yes Yes Yes No
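The fusion table can be read as a set of (first operation, second operation) pairs. The hypothetical pass below encodes a few pairs marked "Yes" and greedily merges them in a linear chain of operations; it ignores the MAX_POOL + activation rows and is only an illustration, not how the NPU software stack performs fusion.

```python
# A few (first operation, second operation) pairs marked "Yes" in the table above.
FUSABLE = {
    ("CONV2D", "RELU"), ("CONV2D", "ADD"), ("CONV2D", "MAX_POOL"),
    ("DW_2D", "RELU"), ("FCL2", "SIGMOID"),
    ("PERMUTE", "CONV2D"), ("PERMUTE", "CONV1D"), ("PERMUTE", "DW_2D"),
}


def fuse(ops):
    """Greedily merge adjacent pairs listed in FUSABLE in a linear chain of operations."""
    fused = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and (ops[i], ops[i + 1]) in FUSABLE:
            fused.append(f"{ops[i]}+{ops[i + 1]}")  # the pair becomes a single fused step
            i += 2
        else:
            fused.append(ops[i])  # no fusion possible: keep the operation as-is
            i += 1
    return fused


print(fuse(["CONV2D", "RELU", "DW_2D", "RELU", "SOFTMAX"]))
print(fuse(["PERMUTE", "CONV2D", "RELU"]))
```

Because the pass is greedy, the second chain fuses PERMUTE with CONV2D and leaves RELU on its own; a real graph optimizer could weigh such choices differently.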