STM32MP2 NPU description

1. Description

1.1. Hardware and software capabilities

The unit integrated in STM32MP2x boards to accelerate AI processing is the GCNanoUltra31-VIP.
This unit combines a GPU and an NPU, which share the same parallel processing units (shaders).
The NPU part consists of one AI core that delivers 1.2 TOPS at 800 MHz and can be overdriven to 900 MHz to reach 1.35 TOPS.
On the GPU side, the computing power is 12.8 GFLOPS at 800 MHz when processing 16-bit data.
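
The figures above can be cross-checked with a short calculation. The sketch below assumes that peak throughput scales linearly with the clock frequency, which is consistent with the 1.2 TOPS and 1.35 TOPS values quoted above. The per-cycle numbers it derives are plain arithmetic on the quoted figures, not values taken from a datasheet, and all names are illustrative.

```python
# Figures quoted in the text above.
NPU_TOPS_NOMINAL = 1.2          # TOPS at the nominal clock
GPU_GFLOPS_NOMINAL = 12.8       # GFLOPS at the nominal clock, 16-bit data
CLOCK_NOMINAL_MHZ = 800
CLOCK_OVERDRIVE_MHZ = 900


def scale_throughput(throughput, base_mhz, target_mhz):
    """Scale a peak throughput with the clock (assumes linear scaling)."""
    return throughput * target_mhz / base_mhz


def per_cycle(throughput_per_second, clock_mhz):
    """Derive operations per clock cycle from a per-second throughput."""
    return throughput_per_second / (clock_mhz * 1e6)


npu_tops_overdrive = scale_throughput(
    NPU_TOPS_NOMINAL, CLOCK_NOMINAL_MHZ, CLOCK_OVERDRIVE_MHZ
)
npu_ops_per_cycle = per_cycle(NPU_TOPS_NOMINAL * 1e12, CLOCK_NOMINAL_MHZ)
gpu_flops_per_cycle = per_cycle(GPU_GFLOPS_NOMINAL * 1e9, CLOCK_NOMINAL_MHZ)

print(f"NPU at {CLOCK_OVERDRIVE_MHZ} MHz: {npu_tops_overdrive:.2f} TOPS")
print(f"NPU operations per cycle: {npu_ops_per_cycle:.0f}")
print(f"GPU 16-bit FLOPs per cycle: {gpu_flops_per_cycle:.0f}")
```

Under the linear-scaling assumption, the overdrive figure matches the 1.35 TOPS stated above.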

To give a clearer view of the performance you can reach with this NPU, here is a table of the performance of several common models:

Model	Input shape	Performance
MobilenetV2_1.0	224x224	72 FPS
SSD_MobilenetV2 FPNLite	256x256	36 FPS
YoloV8n	256x256	59 FPS
DeepLabV3	257x257	17 FPS
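
To relate these frame rates to a per-inference time budget, the sketch below converts each FPS figure from the table into an average latency and compares it with a camera frame rate. It assumes that each figure is a sustained rate for back-to-back inferences of a single model, which the table does not state. The 30 FPS camera rate and all function names are illustrative assumptions.

```python
# FPS figures copied from the table above.
BENCHMARKS_FPS = {
    "MobilenetV2_1.0": 72,
    "SSD_MobilenetV2 FPNLite": 36,
    "YoloV8n": 59,
    "DeepLabV3": 17,
}


def latency_ms(fps):
    """Average time per inference, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


def keeps_up_with(fps, source_fps):
    """True if the model can process every frame of a source at source_fps."""
    return fps >= source_fps


CAMERA_FPS = 30  # hypothetical camera frame rate, not from the article

for model, fps in BENCHMARKS_FPS.items():
    status = "yes" if keeps_up_with(fps, CAMERA_FPS) else "no"
    print(f"{model}: {latency_ms(fps):.1f} ms per inference, "
          f"keeps up with {CAMERA_FPS} FPS: {status}")
```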

1.2. Restriction and usage

To access and run a neural network (NN) model on the NPU, you need to use the OpenVX software stack. To ease the use of this stack, we have developed the stai_mpu unified API, which allows you to run an NN model easily. For more information, visit the wiki article on how to use stai_mpu API.

This NPU IP only supports 8-bit NN models quantized with the per-tensor asymmetric quantization scheme. If the quantization scheme is different, for example per-channel, the model runs mainly on the GPU instead of the NPU. The next section lists the operations supported on the NPU and on the GPU, with the data formats required for execution on the hardware.
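
As an illustration of what per-tensor asymmetric quantization means, here is a minimal sketch of the usual affine uint8 formulation: a single scale and a single zero point are shared by every value of a tensor. This is the textbook scheme, not code from the NPU software stack, and the function names are illustrative. A per-channel scheme would instead keep one scale and zero point per output channel.

```python
def compute_qparams(values, qmin=0, qmax=255):
    """Derive one (scale, zero_point) pair for a whole tensor."""
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map real values to integers: q = clamp(round(x / scale) + zero_point)."""
    return [
        max(qmin, min(qmax, int(round(v / scale)) + zero_point))
        for v in values
    ]


def dequantize(quantized, scale, zero_point):
    """Map integers back to real values: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in quantized]


tensor = [-1.0, 0.0, 1.0, 2.0]
scale, zero_point = compute_qparams(tensor)
q = quantize(tensor, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print("scale:", scale, "zero point:", zero_point)
print("quantized:", q)
print("restored:", restored)
```

The zero point is what makes the scheme asymmetric: the real value 0.0 does not have to sit in the middle of the integer range.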

The NPU/GPU does not support custom operators coming from other frameworks such as TFLite™ or ONNX™. If the model contains such operators, they are removed or the conversion to NBG format fails. However, it is possible to define your own OpenVX operator.

Important

Applications that use the GPU (for example, for display rendering) and the NPU for NN processing concurrently have reduced overall performance, because the GPU and the NPU use a time-sharing mechanism to access the shaders.

2. Operation support

Information

Data type abbreviations:

asym-u8: asymmetric_affine-uint8
asym-i8: asymmetric_affine-int8
fp32: float32
fp16: float16
bool8: bool8
int16: int16
int32: int32

Execution engine abbreviations:

NPU: Neural processing unit
GPU: Graphics processing unit

2.1. Basic operations

This is the list of the basic operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_CONV2D	asym-u8 / asym-i8
VSI_NN_OP_CONV2D	fp32 / fp16
VSI_NN_OP_CONV1D	asym-u8 / asym-i8
VSI_NN_OP_CONV1D	fp32 / fp16
VSI_NN_OP_CONV3D	asym-u8 / asym-i8
VSI_NN_OP_CONV3D	fp32 / fp16
VSI_NN_OP_DECONVOLUTION	asym-u8 / asym-i8
VSI_NN_OP_DECONVOLUTION	fp32 / fp16
VSI_NN_OP_DECONVOLUTION1D	asym-u8
	asym-i8
	fp32 / fp16
VSI_NN_OP_FCL2	asym-u8 / asym-i8
VSI_NN_OP_FCL2	fp32 / fp16
VSI_NN_OP_GROUPED_CONV1D	asym-u8
	asym-i8
	fp32 / fp16
VSI_NN_OP_GROUPED_CONV2D	asym-u8
	asym-i8
	fp32 / fp16

2.2. Activation operations

This is the list of the OVXLIB activation operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_ABS	asym-u8 / asym-i8
VSI_NN_OP_ABS	fp32 / fp16
VSI_NN_OP_ACOSH	asym-u8 / asym-i8
VSI_NN_OP_ACOSH	fp32 / fp16
VSI_NN_OP_ATAN	asym-u8 / asym-i8
VSI_NN_OP_ATAN	fp32 / fp16
VSI_NN_OP_ATANH	asym-u8 / asym-i8
VSI_NN_OP_ATANH	fp32 / fp16
VSI_NN_OP_CELU	asym-u8 / asym-i8
VSI_NN_OP_CELU	fp32 / fp16
VSI_NN_OP_CLIP	asym-u8 / asym-i8
VSI_NN_OP_CLIP	fp32 / fp16
VSI_NN_OP_COS	asym-u8 / asym-i8
VSI_NN_OP_COS	fp32 / fp16
VSI_NN_OP_ELU	asym-u8 / asym-i8
VSI_NN_OP_ELU	fp32 / fp16
VSI_NN_OP_ERF	asym-u8 / asym-i8
VSI_NN_OP_ERF	fp32 / fp16
VSI_NN_OP_EXP	asym-u8 / asym-i8
VSI_NN_OP_EXP	fp32 / fp16
VSI_NN_OP_GELU	asym-u8 / asym-i8
VSI_NN_OP_GELU	fp32 / fp16
VSI_NN_OP_HARD_SIGMOID	asym-u8 / asym-i8
VSI_NN_OP_HARD_SIGMOID	fp32 / fp16
VSI_NN_OP_INVERSE_SIGMOID	asym-u8 / asym-i8
VSI_NN_OP_INVERSE_SIGMOID	fp32 / fp16
VSI_NN_OP_LEAKY_RELU	asym-u8 / asym-i8
VSI_NN_OP_LEAKY_RELU	fp32 / fp16
VSI_NN_OP_LINEAR	asym-u8 / asym-i8
VSI_NN_OP_LINEAR	fp32 / fp16
VSI_NN_OP_LOG	asym-u8 / asym-i8
VSI_NN_OP_LOG	fp32 / fp16
VSI_NN_OP_LOG_SOFTMAX	asym-u8 / asym-i8
VSI_NN_OP_LOG_SOFTMAX	fp32 / fp16
VSI_NN_OP_MISH	asym-u8 / asym-i8
VSI_NN_OP_MISH	fp32 / fp16
VSI_NN_OP_NEG	asym-u8 / asym-i8
VSI_NN_OP_NEG	fp32 / fp16
VSI_NN_OP_PRELU	asym-u8 / asym-i8
VSI_NN_OP_PRELU	fp32 / fp16
VSI_NN_OP_RCP	asym-u8 / asym-i8
VSI_NN_OP_RCP	fp32 / fp16
VSI_NN_OP_RELU	asym-u8 / asym-i8
VSI_NN_OP_RELU	fp32 / fp16
VSI_NN_OP_RELUN	asym-u8 / asym-i8
VSI_NN_OP_RELUN	fp32 / fp16
VSI_NN_OP_RSQRT	asym-u8 / asym-i8
VSI_NN_OP_RSQRT	fp32 / fp16
VSI_NN_OP_SELU	asym-u8 / asym-i8
VSI_NN_OP_SELU	fp32 / fp16
VSI_NN_OP_SIGMOID	asym-u8 / asym-i8
VSI_NN_OP_SIGMOID	fp32 / fp16
VSI_NN_OP_SIGN	asym-u8 / asym-i8
VSI_NN_OP_SIGN	fp32 / fp16
VSI_NN_OP_SIN	asym-u8 / asym-i8
VSI_NN_OP_SIN	fp32 / fp16
VSI_NN_OP_SOFTMAX	asym-u8 / asym-i8
VSI_NN_OP_SOFTMAX	fp32 / fp16
VSI_NN_OP_SOFTRELU	asym-u8 / asym-i8
VSI_NN_OP_SOFTRELU	fp32 / fp16
VSI_NN_OP_SOFTSIGN	asym-u8 / asym-i8
VSI_NN_OP_SOFTSIGN	fp32 / fp16
VSI_NN_OP_SQRT	asym-u8 / asym-i8
VSI_NN_OP_SQRT	fp32 / fp16
VSI_NN_OP_SQUARE	asym-u8 / asym-i8
VSI_NN_OP_SQUARE	fp32 / fp16
VSI_NN_OP_SWISH	asym-u8 / asym-i8
VSI_NN_OP_SWISH	fp32 / fp16
VSI_NN_OP_TANH	asym-u8 / asym-i8
VSI_NN_OP_TANH	fp32 / fp16

2.3. Elementwise operations

This is the list of the elementwise operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_ADD	asym-u8 / asym-i8
VSI_NN_OP_ADD	fp32 / fp16
VSI_NN_OP_ADDN	asym-u8 / asym-i8
VSI_NN_OP_ADDN	fp32 / fp16
VSI_NN_OP_DIVIDE	asym-u8 / asym-i8
VSI_NN_OP_DIVIDE	fp32 / fp16
VSI_NN_OP_FLOORDIV	asym-u8 / asym-i8
VSI_NN_OP_FLOORDIV	fp32 / fp16
VSI_NN_OP_LOGICAL_NOT	bool8
VSI_NN_OP_LOGICAL_OPS	bool8
VSI_NN_OP_MATRIXMUL	asym-u8 / asym-i8
VSI_NN_OP_MATRIXMUL	fp32 / fp16
VSI_NN_OP_MAXIMUM	asym-u8 / asym-i8
VSI_NN_OP_MAXIMUM	fp32 / fp16
VSI_NN_OP_MINIMUM	asym-u8 / asym-i8
VSI_NN_OP_MINIMUM	fp32 / fp16
VSI_NN_OP_MOD	asym-u8 / asym-i8
VSI_NN_OP_MOD	fp32 / fp16
VSI_NN_OP_MULTIPLY	asym-u8 / asym-i8
VSI_NN_OP_MULTIPLY	fp32 / fp16
VSI_NN_OP_POW	asym-u8 / asym-i8
VSI_NN_OP_POW	fp32 / fp16
VSI_NN_OP_RELATIONAL_OPS	asym-u8 / asym-i8
	fp32 / fp16
	bool8
VSI_NN_OP_SELECT	asym-u8 / asym-i8
	fp32 / fp16
	bool8
VSI_NN_OP_SUBTRACT	asym-u8 / asym-i8
VSI_NN_OP_SUBTRACT	fp32 / fp16

2.4. Normalization operations

This is the list of the normalization operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_BATCH_NORM	asym-u8 / asym-i8
VSI_NN_OP_BATCH_NORM	fp32 / fp16
VSI_NN_OP_BATCHNORM_SINGLE	asym-u8 / asym-i8
VSI_NN_OP_BATCHNORM_SINGLE	fp32 / fp16
VSI_NN_OP_GROUP_NORM	asym-u8 / asym-i8
VSI_NN_OP_GROUP_NORM	fp32 / fp16
VSI_NN_OP_INSTANCE_NORM	asym-u8 / asym-i8
VSI_NN_OP_INSTANCE_NORM	fp32 / fp16
VSI_NN_OP_L2_NORMALIZE	asym-u8 / asym-i8
VSI_NN_OP_L2_NORMALIZE	fp32 / fp16
VSI_NN_OP_LAYER_NORM	asym-u8 / asym-i8
VSI_NN_OP_LAYER_NORM	fp32 / fp16
VSI_NN_OP_LPNORM	asym-u8 / asym-i8
VSI_NN_OP_LPNORM	fp32 / fp16
VSI_NN_OP_LRN2	asym-u8 / asym-i8
VSI_NN_OP_LRN2	fp32 / fp16
VSI_NN_OP_MOMENTS	asym-u8 / asym-i8
VSI_NN_OP_MOMENTS	fp32 / fp16

2.5. Reshape operations

This is the list of the reshape operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_ARGMAX	asym-u8 / asym-i8
VSI_NN_OP_ARGMAX	fp32 / fp16
VSI_NN_OP_ARGMIN	asym-u8 / asym-i8
VSI_NN_OP_ARGMIN	fp32 / fp16
VSI_NN_OP_BATCH2SPACE	asym-u8 / asym-i8
VSI_NN_OP_BATCH2SPACE	fp32 / fp16
VSI_NN_OP_CONCAT	asym-u8 / asym-i8
VSI_NN_OP_CONCAT	fp32 / fp16
VSI_NN_OP_DEPTH2SPACE	asym-u8 / asym-i8
VSI_NN_OP_DEPTH2SPACE	fp32 / fp16
VSI_NN_OP_EXPAND_BROADCAST	asym-u8 / asym-i8
VSI_NN_OP_EXPAND_BROADCAST	fp32 / fp16
VSI_NN_OP_PAD2	asym-u8 / asym-i8
VSI_NN_OP_PAD2	fp32 / fp16
VSI_NN_OP_PERMUTE	asym-u8 / asym-i8
VSI_NN_OP_PERMUTE	fp32 / fp16
VSI_NN_OP_REDUCE	asym-u8 / asym-i8
VSI_NN_OP_REDUCE	fp32 / fp16
VSI_NN_OP_REORG	asym-u8 / asym-i8
VSI_NN_OP_REORG	fp32 / fp16
VSI_NN_OP_RESHAPE2	asym-u8 / asym-i8
VSI_NN_OP_RESHAPE2	fp32 / fp16
VSI_NN_OP_REVERSE	asym-u8 / asym-i8
VSI_NN_OP_REVERSE	fp32 / fp16
VSI_NN_OP_SHUFFLECHANNEL	asym-u8 / asym-i8
VSI_NN_OP_SHUFFLECHANNEL	fp32 / fp16
VSI_NN_OP_SLICE	asym-u8 / asym-i8
VSI_NN_OP_SLICE	fp32 / fp16
VSI_NN_OP_SPACE2BATCH	asym-u8 / asym-i8
VSI_NN_OP_SPACE2BATCH	fp32 / fp16
VSI_NN_OP_SPACE2DEPTH	asym-u8 / asym-i8
VSI_NN_OP_SPACE2DEPTH	fp32 / fp16
VSI_NN_OP_SPLIT	asym-u8 / asym-i8
VSI_NN_OP_SPLIT	fp32 / fp16
VSI_NN_OP_SQUEEZE	asym-u8 / asym-i8
VSI_NN_OP_SQUEEZE	fp32 / fp16
VSI_NN_OP_STACK	asym-u8 / asym-i8
VSI_NN_OP_STACK	fp32 / fp16
VSI_NN_OP_STRIDED_SLICE	asym-u8 / asym-i8
VSI_NN_OP_STRIDED_SLICE	fp32 / fp16
VSI_NN_OP_UNSTACK	asym-u8 / asym-i8
VSI_NN_OP_UNSTACK	fp32 / fp16

2.6. RNN operations

This is the list of the recurrent neural network (RNN) operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_CONV2D_LSTM	asym-u8 / asym-i8
VSI_NN_OP_CONV2D_LSTM	fp32 / fp16
VSI_NN_OP_CONV2D_LSTM_CELL	asym-u8 / asym-i8
VSI_NN_OP_CONV2D_LSTM_CELL	fp32 / fp16
VSI_NN_OP_GRU	asym-u8 / asym-i8
VSI_NN_OP_GRU	fp32 / fp16
VSI_NN_OP_GRUCELL	asym-u8 / asym-i8
VSI_NN_OP_GRUCELL	fp32 / fp16
VSI_NN_OP_LSTM_OVXLIB	asym-u8 / asym-i8
VSI_NN_OP_LSTM_OVXLIB	fp32 / fp16
VSI_NN_OP_LSTMUNIT_OVXLIB	asym-u8 / asym-i8
VSI_NN_OP_LSTMUNIT_OVXLIB	fp32 / fp16
VSI_NN_OP_SVDF	asym-u8 / asym-i8
VSI_NN_OP_SVDF	fp32 / fp16

2.7. Pooling operations

This is the list of the pooling operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_AVG_POOL3D	asym-u8 / asym-i8
VSI_NN_OP_AVG_POOL3D	fp32 / fp16
VSI_NN_OP_GLOBALLPPOOL	asym-u8 / asym-i8
VSI_NN_OP_GLOBALLPPOOL	fp32 / fp16
VSI_NN_OP_LPPOOL	asym-u8 / asym-i8
VSI_NN_OP_LPPOOL	fp32 / fp16
VSI_NN_OP_MAX_POOL3D	asym-u8 / asym-i8
VSI_NN_OP_MAX_POOL3D	fp32 / fp16
VSI_NN_OP_MAXPOOLWITHARGMAX	asym-u8 / asym-i8
VSI_NN_OP_MAXPOOLWITHARGMAX	fp32 / fp16
VSI_NN_OP_MAXUNPOOL	asym-u8 / asym-i8
VSI_NN_OP_MAXUNPOOL	fp32 / fp16
VSI_NN_OP_POOL	asym-u8 / asym-i8
VSI_NN_OP_POOL	fp32 / fp16
VSI_NN_OP_POOLWITHARGMAX	asym-u8 / asym-i8
VSI_NN_OP_POOLWITHARGMAX	fp32 / fp16
VSI_NN_OP_ROI_POOL	asym-u8 / asym-i8
VSI_NN_OP_ROI_POOL	fp32 / fp16
VSI_NN_OP_UPSAMPLE	asym-u8 / asym-i8
VSI_NN_OP_UPSAMPLE	fp32 / fp16

2.8. Miscellaneous operations

This is the list of other operations supported by the NPU.

Operation	Type	NPU support	GPU support
VSI_NN_OP_BUCKETIZE	asym-u8 / asym-i8
VSI_NN_OP_BUCKETIZE	fp32 / fp16
VSI_NN_OP_CAST	all types
VSI_NN_OP_CEIL	asym-u8 / asym-i8
VSI_NN_OP_CEIL	fp32 / fp16
VSI_NN_OP_CONCATSHIFT	asym-u8 / asym-i8
VSI_NN_OP_CONCATSHIFT	fp32 / fp16
VSI_NN_OP_CUMSUM	asym-u8 / asym-i8
VSI_NN_OP_CUMSUM	fp32 / fp16
VSI_NN_OP_DATACONVERT	asym-u8 / asym-i8
VSI_NN_OP_DROPOUT	asym-u8 / asym-i8
VSI_NN_OP_DROPOUT	fp32 / fp16
VSI_NN_OP_EMBEDDING_LOOKUP	asym-u8 / asym-i8
VSI_NN_OP_EMBEDDING_LOOKUP	fp32 / fp16
VSI_NN_OP_FLOOR	asym-u8 / asym-i8
VSI_NN_OP_FLOOR	fp32 / fp16
VSI_NN_OP_GATHER	asym-u8 / asym-i8
VSI_NN_OP_GATHER	fp32 / fp16
VSI_NN_OP_GATHER_ELEMENTS	asym-u8 / asym-i8
VSI_NN_OP_GATHER_ELEMENTS	fp32 / fp16
VSI_NN_OP_GATHER_ND	asym-u8 / asym-i8
VSI_NN_OP_GATHER_ND	fp32 / fp16
VSI_NN_OP_GRID_SAMPLE	asym-u8 / asym-i8
VSI_NN_OP_GRID_SAMPLE	fp32 / fp16
VSI_NN_OP_ONE_HOT	asym-u8 / asym-i8
VSI_NN_OP_ONE_HOT	fp32 / fp16
VSI_NN_OP_PROPOSAL	asym-u8 / asym-i8
VSI_NN_OP_PROPOSAL	fp32 / fp16
VSI_NN_OP_REPEAT	asym-u8 / asym-i8
VSI_NN_OP_REPEAT	fp32 / fp16
VSI_NN_OP_RESIZE	asym-u8 / asym-i8
VSI_NN_OP_RESIZE	fp32 / fp16
VSI_NN_OP_RESIZE_1D	asym-u8 / asym-i8
VSI_NN_OP_RESIZE_1D	fp32 / fp16
VSI_NN_OP_RESIZE_3D	asym-u8 / asym-i8
VSI_NN_OP_RESIZE_3D	fp32 / fp16
VSI_NN_OP_REVERSESEQUENCE	asym-u8 / asym-i8
VSI_NN_OP_REVERSESEQUENCE	fp32 / fp16
VSI_NN_OP_ROUND	asym-u8 / asym-i8
VSI_NN_OP_ROUND	fp32 / fp16
VSI_NN_OP_SCATTER_ELEMENTS	asym-u8 / asym-i8
VSI_NN_OP_SCATTER_ELEMENTS	fp32 / fp16
VSI_NN_OP_SCATTER_ND	asym-u8 / asym-i8
VSI_NN_OP_SCATTER_ND	fp32 / fp16
VSI_NN_OP_SCATTER_ND_UPDATE	asym-u8 / asym-i8
VSI_NN_OP_SCATTER_ND_UPDATE	fp32 / fp16
VSI_NN_OP_SEQUENCE_MASK	asym-u8 / asym-i8
VSI_NN_OP_SEQUENCE_MASK	fp32 / fp16
VSI_NN_OP_SIGNAL_FRAME	asym-u8 / asym-i8
VSI_NN_OP_SIGNAL_FRAME	fp32 / fp16
VSI_NN_OP_TILE	asym-u8 / asym-i8
VSI_NN_OP_TILE	fp32 / fp16
VSI_NN_OP_UPSAMPLESCALE	asym-u8 / asym-i8
VSI_NN_OP_UPSAMPLESCALE	fp32 / fp16
VSI_NN_OP_VARIABLE	asym-u8 / asym-i8
VSI_NN_OP_VARIABLE	fp32 / fp16

2.9. Fuse operation support

This is the list of the operation combinations that the NPU can fuse.

Fuse operation	First operation
Second operation	CONV2D	CONV1D	DW_2D	FCL2	PERMUTE
ABS
ACOSH
ADD
ATAN
CELU
CLIP
CONV1D
CONV2D
DEPTH2SPACE
DW_2D
ELU
ERF
GELU
HARD_SIGMOID
HSWISH
INVERSE_SIGMOID
LEAKY_RELU
LOG
MAX_POOL
MISH
MULTIPLY
NEG
PERMUTE
PRELU
RCP
RELU
RELUN
RESHAPE
RSQRT
SELU
SIGMOID
SOFTRELU
SOFTSIGN
SPACE2DEPTH
SQRT
SQUARE
SUBTRACT
SWISH
TANH
MAX_POOL + ABS
MAX_POOL + ACOSH
MAX_POOL + ADD
MAX_POOL + ATAN
MAX_POOL + BATCH_NORM
MAX_POOL + CELU
MAX_POOL + CLIP
MAX_POOL + ELU
MAX_POOL + ERF
MAX_POOL + GELU
MAX_POOL + HARD_SIGMOID
MAX_POOL + HSWISH
MAX_POOL + INVERSE_SIGMOID
MAX_POOL + LEAKY_RELU
MAX_POOL + MISH
MAX_POOL + MULTIPLY
MAX_POOL + NEG
MAX_POOL + PRELU
MAX_POOL + RCP
MAX_POOL + RELU
MAX_POOL + RELUN
MAX_POOL + RSQRT
MAX_POOL + SELU
MAX_POOL + SIGMOID
MAX_POOL + SOFTRELU
MAX_POOL + SOFTSIGN
MAX_POOL + SQRT
MAX_POOL + SQUARE
MAX_POOL + SUBTRACT
MAX_POOL + SWISH
MAX_POOL + TANH