STM32MP2 NPU description

Revision as of 17:56, 2 July 2024 by Registered User

1. Description

1.1. Hardware and software capabilities

The unit integrated in STM32MP2x devices to accelerate AI processing is the GCNanoUltra31-VIP.
This unit combines a GPU and an NPU, and the two parts share the same parallel processing units (shaders).
The NPU part consists of one AI core that delivers 1.2 TOPS at 800 MHz; in overdrive mode, its clock can be raised to 900 MHz to reach 1.35 TOPS.
On the GPU side, the computing power is 12.8 GFLOPS at 800 MHz when processing 16-bit data.
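The overdrive figure is consistent with peak throughput scaling linearly with the clock: 1.2 TOPS × 900/800 = 1.35 TOPS. The short Python sketch below redoes this arithmetic and derives the implied peak work per clock cycle (1.2 × 10¹² / 800 × 10⁶ = 1500 operations per cycle for the NPU, 12.8 × 10⁹ / 800 × 10⁶ = 16 FLOPs per cycle for the GPU). The linear-scaling assumption and the per-cycle view are made for illustration only; they are not figures published for this IP.

```python
# Peak figures quoted above for the GCNanoUltra31-VIP.
NPU_OPS_PER_S = 1.2e12     # 1.2 TOPS at 800 MHz
GPU_FLOPS_PER_S = 12.8e9   # 12.8 GFLOPS at 800 MHz, 16-bit data
BASE_CLOCK_HZ = 800e6


def per_cycle(ops_per_s: float, clock_hz: float) -> float:
    """Peak operations retired per clock cycle."""
    return ops_per_s / clock_hz


def at_clock(ops_per_s: float, base_hz: float, new_hz: float) -> float:
    """Peak throughput at new_hz, ASSUMING it scales linearly with the clock."""
    return ops_per_s * new_hz / base_hz


print(per_cycle(NPU_OPS_PER_S, BASE_CLOCK_HZ))    # NPU operations per cycle
print(per_cycle(GPU_FLOPS_PER_S, BASE_CLOCK_HZ))  # GPU FLOPs per cycle
overdrive_tops = at_clock(NPU_OPS_PER_S, BASE_CLOCK_HZ, 900e6) / 1e12
print(f"NPU at 900 MHz: {overdrive_tops:.2f} TOPS")
```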

The following table gives the performance reached on this NPU by several common models:

Model | Input shape | Frame rate
MobilenetV2_1.0 | 224x224 | 72 FPS
SSD_MobilenetV2 FPNLite | 256x256 | 36 FPS
YoloV8n | 256x256 | 59 FPS
DeepLabV3 | 257x257 | 17 FPS
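When budgeting an application, it can help to convert these frame rates into time per frame. The Python sketch below assumes that the FPS figures are measured with frames processed one after another, so that the average time per frame is 1000 / FPS milliseconds (about 14 ms for MobilenetV2_1.0 and about 59 ms for DeepLabV3). The `keeps_up` helper and the 30 FPS camera target are illustrative assumptions; pre-processing, post-processing and display are not included.

```python
# Frame rates from the table above: (model, input shape) -> FPS.
MEASURED_FPS = {
    ("MobilenetV2_1.0", "224x224"): 72,
    ("SSD_MobilenetV2 FPNLite", "256x256"): 36,
    ("YoloV8n", "256x256"): 59,
    ("DeepLabV3", "257x257"): 17,
}


def frame_time_ms(fps: float) -> float:
    """Average time per frame, assuming frames are processed sequentially."""
    return 1000.0 / fps


def keeps_up(fps: float, target_fps: float) -> bool:
    """Hypothetical check: can the model alone sustain target_fps?"""
    return fps >= target_fps


for (model, shape), fps in MEASURED_FPS.items():
    print(f"{model} ({shape}): {frame_time_ms(fps):.1f} ms/frame, "
          f"keeps up with 30 FPS: {keeps_up(fps, 30)}")
```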

1.2. Restriction and usage

To run a neural network (NN) model on the NPU, you need to use the OpenVX software stack. To simplify the use of this stack, we have developed stai_mpu, a unified API that makes running an NN model easier. For more information, see the wiki article on how to use the stai_mpu API.


This NPU IP only supports 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the model uses a different scheme, such as per-channel quantization, it runs mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, together with the data format each one requires for hardware execution.
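To make the difference between the two schemes concrete, here is a minimal Python sketch of the standard affine (asymmetric) uint8 mapping, real = scale × (q − zero_point), in the form used for example by TensorFlow Lite. It is not ST's quantization tool, and the function names are hypothetical. The per-tensor scheme shares one (scale, zero_point) pair across the whole tensor, which is what the NPU expects; the per-channel scheme computes one pair per output channel, which is usually more accurate but, as stated above, makes the model run mainly on the GPU.

```python
from typing import List, Tuple

QMIN, QMAX = 0, 255  # integer range of asym-u8


def asym_u8_params(values: List[float]) -> Tuple[float, int]:
    """Compute one (scale, zero_point) pair covering all `values`."""
    lo = min(min(values), 0.0)  # the range must include 0.0 ...
    hi = max(max(values), 0.0)  # ... so that 0.0 is exactly representable
    if hi == lo:  # all-zero tensor: any positive scale works
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return scale * (q - zero_point)


def per_tensor(weights: List[List[float]]) -> List[Tuple[float, int]]:
    """Per-tensor scheme: a single pair shared by every output channel."""
    return [asym_u8_params([w for row in weights for w in row])]


def per_channel(weights: List[List[float]]) -> List[Tuple[float, int]]:
    """Per-channel scheme: one pair per output channel (row)."""
    return [asym_u8_params(row) for row in weights]


def max_error(row: List[float], scale: float, zero_point: int) -> float:
    """Worst quantize/dequantize round-trip error over a row."""
    return max(abs(dequantize(quantize(x, scale, zero_point), scale, zero_point) - x)
               for x in row)


# Two output channels with very different ranges (illustrative values).
weights = [[-1.0, 0.5, 2.0], [-0.1, 0.05, 0.2]]
(t_scale, t_zp), = per_tensor(weights)
c_scale, c_zp = per_channel(weights)[1]
print("per-tensor error on channel 1:", max_error(weights[1], t_scale, t_zp))
print("per-channel error on channel 1:", max_error(weights[1], c_scale, c_zp))
```

With a single shared scale, the narrow-range channel loses precision; this accuracy gap is the usual motivation for per-channel quantization, and the reason it is worth checking which scheme a converter applies before deploying on this NPU.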

The NPU and the GPU do not support custom operators coming from other frameworks such as TFLite™ or ONNX™. If a model contains such operators, they are either removed or the conversion to the NBG format fails. However, it is possible to define your own OpenVX operator.

Important
An application that uses the GPU (for example, for display rendering) concurrently with the NPU (for NN processing) has reduced overall performance, because the GPU and the NPU access the shared shaders through a time-sharing mechanism.

2. Operation support

Information

Data type abbreviations:

  • asym-u8: asymmetric_affine-uint8
  • asym-i8: asymmetric_affine-int8
  • fp32: float32
  • fp16: float16
  • bool8: bool8
  • int16: int16
  • int32: int32

Execution engine abbreviations:

  • NPU: Neural processing unit
  • GPU: Graphics processing unit
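Assuming both 8-bit asymmetric types use the same affine mapping, real = scale × (q − zero_point), asym-u8 and asym-i8 differ only in the integer storage range: [0, 255] versus [−128, 127]. The Python sketch below (hypothetical helper names, illustrative scale and zero point) shows that subtracting 128 from both the stored value and the zero point leaves the represented real value unchanged. It illustrates the arithmetic only and does not describe how the ST tools convert between the two types.

```python
from typing import Tuple


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Affine mapping shared by both asymmetric types."""
    return scale * (q - zero_point)


def asym_u8_to_i8(q_u8: int, zp_u8: int) -> Tuple[int, int]:
    """Re-express an asym-u8 value and zero point in the asym-i8 range.

    Shifting both by 128 keeps (q - zero_point), hence the real value, unchanged.
    """
    if not (0 <= q_u8 <= 255 and 0 <= zp_u8 <= 255):
        raise ValueError("asym-u8 values and zero points lie in [0, 255]")
    return q_u8 - 128, zp_u8 - 128


scale, zp_u8 = 0.02, 85  # illustrative parameters
for q_u8 in (0, 85, 255):
    q_i8, zp_i8 = asym_u8_to_i8(q_u8, zp_u8)
    print(q_u8, "->", q_i8, "real value:", dequantize(q_i8, scale, zp_i8))
```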

2.1. Basic operations

This is the list of the basic operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_CONV2D asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_CONV1D asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_CONV3D asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_DECONVOLUTION asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_DECONVOLUTION1D asym-u8 Yes No
asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_FCL2 asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_GROUPED_CONV1D asym-u8 Yes No
asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_GROUPED_CONV2D asym-u8 Yes No
asym-i8 Yes Yes
fp32 / fp16 No Yes
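When checking a model against these tables, it can help to encode them as data. The Python sketch below does this for the basic operations table only; the dictionary layout and the `engines` helper are hypothetical and not part of any ST tool. It reflects, for example, that a float CONV2D is GPU-only while an 8-bit CONV2D runs on the NPU, and that an operation absent from the table (such as a framework-specific custom operator) has no supported engine.

```python
from typing import Dict, List, Tuple

Support = Tuple[bool, bool]  # (NPU support, GPU support)

NPU_ONLY: Support = (True, False)
GPU_ONLY: Support = (False, True)
BOTH: Support = (True, True)

# Encoding of the "Basic operations" table above, keyed by operation name
# (without the VSI_NN_OP_ prefix), then by data type.
BASIC_OPS: Dict[str, Dict[str, Support]] = {}
for op in ("CONV2D", "CONV1D", "CONV3D", "DECONVOLUTION", "FCL2"):
    BASIC_OPS[op] = {"asym-u8": NPU_ONLY, "asym-i8": NPU_ONLY,
                     "fp32": GPU_ONLY, "fp16": GPU_ONLY}
for op in ("DECONVOLUTION1D", "GROUPED_CONV1D", "GROUPED_CONV2D"):
    BASIC_OPS[op] = {"asym-u8": NPU_ONLY, "asym-i8": BOTH,
                     "fp32": GPU_ONLY, "fp16": GPU_ONLY}


def engines(op: str, dtype: str) -> List[str]:
    """Engines that can run `op` for `dtype` (empty list if not in the table)."""
    npu, gpu = BASIC_OPS.get(op, {}).get(dtype, (False, False))
    return [name for name, ok in (("NPU", npu), ("GPU", gpu)) if ok]


print(engines("CONV2D", "asym-u8"))          # 8-bit convolution
print(engines("CONV2D", "fp16"))             # float convolution
print(engines("GROUPED_CONV2D", "asym-i8"))  # listed for both engines
print(engines("MY_CUSTOM_OP", "asym-u8"))    # hypothetical unsupported op
```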

2.2. Activation operations

This is the list of the OVXLIB activation operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_ABS asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ACOSH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ATAN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ATANH asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_CELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_CLIP asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_COS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ERF asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_EXP asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_HARD_SIGMOID asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_INVERSE_SIGMOID asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LEAKY_RELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LINEAR asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LOG asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_LOG_SOFTMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MISH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_NEG asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_PRELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RCP asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RELUN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_RSQRT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SIGMOID asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SIGN asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SIN asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SOFTMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SOFTRELU asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SOFTSIGN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SQRT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SQUARE asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_SWISH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_TANH asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes

2.3. Elementwise operations

This is the list of the elementwise operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_ADD asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_ADDN asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_DIVIDE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_FLOORDIV asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LOGICAL_NOT bool8 No Yes
VSI_NN_OP_LOGICAL_OPS bool8 No Yes
VSI_NN_OP_MATRIXMUL asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_MAXIMUM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MINIMUM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MOD asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MULTIPLY asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_POW asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RELATIONAL_OPS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
bool8 No Yes
VSI_NN_OP_SELECT asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
bool8 No Yes
VSI_NN_OP_SUBTRACT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes

2.4. Normalization operations

This is the list of the normalization operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_BATCH_NORM asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_BATCHNORM_SINGLE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GROUP_NORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_INSTANCE_NORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_L2_NORMALIZE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LAYER_NORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LPNORM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LRN2 asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MOMENTS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes

2.5. Reshape operations

This is the list of the reshape operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_ARGMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ARGMIN asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_BATCH2SPACE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_CONCAT asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_DEPTH2SPACE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_EXPAND_BROADCAST asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_PAD2 asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_PERMUTE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_REDUCE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_REORG asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_RESHAPE2 asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_REVERSE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SHUFFLECHANNEL asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SLICE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SPACE2BATCH asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SPACE2DEPTH asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SPLIT asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_SQUEEZE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_STACK asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_STRIDED_SLICE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No
VSI_NN_OP_UNSTACK asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No

2.6. RNN operations

This is the list of the recurrent neural network (RNN) operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_CONV2D_LSTM asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_CONV2D_LSTM_CELL asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_GRU asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_GRUCELL asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_LSTM_OVXLIB asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_LSTMUNIT_OVXLIB asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_SVDF asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes


2.7. Pooling operations

This is the list of the pooling operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_AVG_POOL3D asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GLOBALLPPOOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_LPPOOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MAX_POOL3D asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_MAXPOOLWITHARGMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_MAXUNPOOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_POOL asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_POOLWITHARGMAX asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ROI_POOL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_UPSAMPLE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes

2.8. Miscellaneous operations

This is the list of other operations, with their support on the NPU and on the GPU for each data type.

Operation | Data type | NPU support | GPU support
VSI_NN_OP_BUCKETIZE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_CAST all types No Yes
VSI_NN_OP_CEIL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_CONCATSHIFT asym-u8 / asym-i8 Yes Yes
fp32 / fp16 No Yes
VSI_NN_OP_CUMSUM asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_DATACONVERT asym-u8 / asym-i8 Yes No
VSI_NN_OP_DROPOUT asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_EMBEDDING_LOOKUP asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_FLOOR asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GATHER asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GATHER_ELEMENTS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GATHER_ND asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_GRID_SAMPLE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ONE_HOT asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_PROPOSAL asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_REPEAT asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RESIZE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RESIZE_1D asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_RESIZE_3D asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_REVERSESEQUENCE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_ROUND asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SCATTER_ELEMENTS asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SCATTER_ND asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SCATTER_ND_UPDATE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SEQUENCE_MASK asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_SIGNAL_FRAME asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_TILE asym-u8 / asym-i8 No Yes
fp32 / fp16 No Yes
VSI_NN_OP_UPSAMPLESCALE asym-u8 / asym-i8 Yes No
fp32 / fp16 No Yes
VSI_NN_OP_VARIABLE asym-u8 / asym-i8 Yes No
fp32 / fp16 Yes No

2.9. Fused operation support

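This is the list of the operation combinations that the NPU can fuse. Each row gives the second operation and each column the first operation it is fused into; DW_2D denotes a depthwise 2D convolution.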

Second operation (rows) / first operation (columns) | CONV2D | CONV1D | DW_2D | FCL2 | PERMUTE
ABS Yes Yes Yes Yes No
ACOSH Yes Yes Yes Yes No
ADD Yes Yes Yes Yes No
ATAN Yes Yes Yes Yes No
CELU Yes Yes Yes Yes No
CLIP Yes Yes Yes Yes No
CONV1D No No No No Yes
CONV2D No No No No Yes
DEPTH2SPACE Yes Yes Yes Yes No
DW_2D No No No No Yes
ELU Yes Yes Yes Yes No
ERF Yes Yes Yes Yes No
GELU Yes Yes Yes Yes No
HARD_SIGMOID Yes Yes Yes Yes No
HSWISH Yes Yes Yes Yes No
INVERSE_SIGMOID Yes Yes Yes Yes No
LEAKY_RELU Yes Yes Yes Yes No
LOG Yes Yes Yes Yes No
MAX_POOL Yes Yes Yes Yes No
MISH Yes Yes Yes Yes No
MULTIPLY Yes Yes Yes Yes No
NEG Yes Yes Yes Yes No
PERMUTE Yes Yes Yes Yes No
PRELU Yes Yes Yes Yes No
RCP Yes Yes Yes Yes No
RELU Yes Yes Yes Yes No
RELUN Yes Yes Yes Yes No
RESHAPE Yes Yes Yes Yes No
RSQRT Yes Yes Yes Yes No
SELU Yes Yes Yes Yes No
SIGMOID Yes Yes Yes Yes No
SOFTRELU Yes Yes Yes Yes No
SOFTSIGN Yes Yes Yes Yes No
SPACE2DEPTH Yes Yes Yes Yes No
SQRT Yes Yes Yes Yes No
SQUARE Yes Yes Yes Yes No
SUBTRACT Yes Yes Yes Yes No
SWISH Yes Yes Yes Yes No
TANH Yes Yes Yes Yes No
MAX_POOL + ABS Yes Yes Yes Yes No
MAX_POOL + ACOSH Yes Yes Yes Yes No
MAX_POOL + ADD Yes Yes Yes Yes No
MAX_POOL + ATAN Yes Yes Yes Yes No
MAX_POOL + BATCH_NORM Yes Yes Yes Yes No
MAX_POOL + CELU Yes Yes Yes Yes No
MAX_POOL + CLIP Yes Yes Yes Yes No
MAX_POOL + ELU Yes Yes Yes Yes No
MAX_POOL + ERF Yes Yes Yes Yes No
MAX_POOL + GELU Yes Yes Yes Yes No
MAX_POOL + HARD_SIGMOID Yes Yes Yes Yes No
MAX_POOL + HSWISH Yes Yes Yes Yes No
MAX_POOL + INVERSE_SIGMOID Yes Yes Yes Yes No
MAX_POOL + LEAKY_RELU Yes Yes Yes Yes No
MAX_POOL + MISH Yes Yes Yes Yes No
MAX_POOL + MULTIPLY Yes Yes Yes Yes No
MAX_POOL + NEG Yes Yes Yes Yes No
MAX_POOL + PRELU Yes Yes Yes Yes No
MAX_POOL + RCP Yes Yes Yes Yes No
MAX_POOL + RELU Yes Yes Yes Yes No
MAX_POOL + RELUN Yes Yes Yes Yes No
MAX_POOL + RSQRT Yes Yes Yes Yes No
MAX_POOL + SELU Yes Yes Yes Yes No
MAX_POOL + SIGMOID Yes Yes Yes Yes No
MAX_POOL + SOFTRELU Yes Yes Yes Yes No
MAX_POOL + SOFTSIGN Yes Yes Yes Yes No
MAX_POOL + SQRT Yes Yes Yes Yes No
MAX_POOL + SQUARE Yes Yes Yes Yes No
MAX_POOL + SUBTRACT Yes Yes Yes Yes No
MAX_POOL + SWISH Yes Yes Yes Yes No
MAX_POOL + TANH Yes Yes Yes Yes No
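To illustrate how the table can be read, here is a minimal Python sketch of a greedy, left-to-right fusion pass over a linear chain of operations. The pass and its names are hypothetical: the graph compiler in the NPU software stack applies its own rules, which this table only summarizes. The sets encode the rows above: a first operation among CONV2D, CONV1D, DW_2D and FCL2 may absorb one listed second operation, or a MAX_POOL followed by one listed operation, and PERMUTE may absorb a following CONV1D, CONV2D or DW_2D.

```python
from typing import List

CONV_LIKE = set("CONV2D CONV1D DW_2D FCL2".split())

# Second operations marked "Yes" for the CONV2D/CONV1D/DW_2D/FCL2 columns.
AFTER_CONV = set("""ABS ACOSH ADD ATAN CELU CLIP DEPTH2SPACE ELU ERF GELU
HARD_SIGMOID HSWISH INVERSE_SIGMOID LEAKY_RELU LOG MAX_POOL MISH MULTIPLY NEG
PERMUTE PRELU RCP RELU RELUN RESHAPE RSQRT SELU SIGMOID SOFTRELU SOFTSIGN
SPACE2DEPTH SQRT SQUARE SUBTRACT SWISH TANH""".split())

# Operations X allowed in the "MAX_POOL + X" rows.
AFTER_CONV_MAXPOOL = set("""ABS ACOSH ADD ATAN BATCH_NORM CELU CLIP ELU ERF
GELU HARD_SIGMOID HSWISH INVERSE_SIGMOID LEAKY_RELU MISH MULTIPLY NEG PRELU RCP
RELU RELUN RSQRT SELU SIGMOID SOFTRELU SOFTSIGN SQRT SQUARE SUBTRACT SWISH
TANH""".split())

# Second operations marked "Yes" for the PERMUTE column.
AFTER_PERMUTE = {"CONV1D", "CONV2D", "DW_2D"}


def fuse(ops: List[str]) -> List[List[str]]:
    """Greedily group a linear chain of operations into fused kernels."""
    groups: List[List[str]] = []
    i = 0
    while i < len(ops):
        first = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        nxt2 = ops[i + 2] if i + 2 < len(ops) else None
        if first in CONV_LIKE and nxt == "MAX_POOL" and nxt2 in AFTER_CONV_MAXPOOL:
            groups.append([first, nxt, nxt2])  # first + (MAX_POOL + X)
            i += 3
        elif first in CONV_LIKE and nxt in AFTER_CONV:
            groups.append([first, nxt])
            i += 2
        elif first == "PERMUTE" and nxt in AFTER_PERMUTE:
            groups.append([first, nxt])
            i += 2
        else:
            groups.append([first])  # runs as a standalone operation
            i += 1
    return groups


print(fuse(["CONV2D", "RELU", "DW_2D", "MAX_POOL", "RELU", "SOFTMAX"]))
```

Because the pass is greedy, it always prefers the longer MAX_POOL + X pattern and never revisits a choice; a real compiler may weigh alternatives differently.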