STAI MPU C++ Reference

Revision as of 17:03, 13 September 2024 by Registered User
Applicable for STM32MP13x lines, STM32MP15x lines, STM32MP25x lines

1. Introduction[edit source]

STAI MPU is a cross-platform machine learning and computer vision inference API for STM32MPx series, with a flexible interface to run several deep learning model formats such as TFLite, ONNX and NBG. This unified API powers neural network models on all STM32MPx series, providing a single, simple interface for porting machine learning applications. The API relies on dynamic loading to check at runtime whether the plugins required to run the user's model are installed on the target. This gives the user the flexibility to install only the plugins and packages needed for the application.
To get more information about the available runtimes, please refer to STAI MPU: AI unified API for STM32MPUs.

2. C++ Class interfaces[edit source]

The C++ API is based on several interface classes that are exposed at the user level.

2.1. stai_mpu_network class interface[edit source]

The stai_mpu_network class is the main interface for running inference and accessing model information such as the number of input nodes, the number of output nodes, and the neural network input and output data. It exposes the following methods:

2.1.1. stai_mpu_network constructor method[edit source]

Constructor for the stai_mpu_network class. This function automatically identifies the format of the model passed as argument. The backend_engine attribute is then set to the associated stai_mpu_backend_engine enum value, and the stai_wrapper is instantiated accordingly by dynamically loading the associated plugin (the shared library of the backend).

stai_mpu_network (const std::string& model_path);

2.1.2. get_num_inputs method:[edit source]

This method returns an integer corresponding to the number of input nodes of the neural network model.

int get_num_inputs ();

2.1.3. get_num_outputs method:[edit source]

This method returns an integer corresponding to the number of output nodes of the neural network model.

int get_num_outputs ();

2.1.4. get_backend_engine method:[edit source]

This method returns a stai_mpu_backend_engine enum value identifying the currently used backend (such as TFLite, ORT or OVX) and inference engine (CPU, NPU or EdgeTPU).

stai_mpu_backend_engine get_backend_engine ();

2.1.5. get_input_infos method:[edit source]

This method retrieves information about the input tensors of the neural network model. It returns a vector of stai_mpu_tensor objects, where each element holds the information of one input tensor, such as its shape, data type and quantization type.

std::vector<stai_mpu_tensor> get_input_infos () const;

2.1.6. get_output_infos method:[edit source]

This method retrieves information about the output tensors of the neural network model. It returns a vector of stai_mpu_tensor objects, where each element holds the information of one output tensor, such as its shape, data type and quantization type.

std::vector<stai_mpu_tensor> get_output_infos () const;
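As a minimal sketch of the two methods above (the header name and model path are assumptions for illustration), the I/O tensor information could be listed like this:

```cpp
#include <iostream>
#include <vector>
#include <stai_mpu_network.h>  // assumed header name from the STAI MPU package

int main() {
    // "model.nb" is a hypothetical model path; a TFLite or ONNX file works too.
    stai_mpu_network model("model.nb");

    std::vector<stai_mpu_tensor> inputs = model.get_input_infos();
    for (stai_mpu_tensor& t : inputs) {
        std::cout << "input '" << t.get_name() << "' rank " << t.get_rank()
                  << " shape:";
        for (int d : t.get_shape())
            std::cout << ' ' << d;
        std::cout << '\n';
    }
    return 0;
}
```

The same loop applies to get_output_infos for the output nodes.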

2.1.7. set_input method:[edit source]

This method sets input data for a specific input node of the neural network model based on the input index. If the input index exceeds the number of input nodes, the function throws an error. It takes as a parameter:

  • index: The index of the input tensor. It should be lower than the number of input nodes.
  • data: A void pointer to the input data to be fed for inference.
void set_input (int index, const void* data);

2.1.8. run method:[edit source]

This method runs the inference on the input data set by the set_input method and takes no argument. If the inference fails, this method throws an error explaining why; if it succeeds, it returns true.

bool run ();

2.1.9. get_output method:[edit source]

This method reads the output data of a specific output node of the neural network model, based on the output index passed as an argument. If the output index exceeds the number of output nodes, the function throws an error. It takes as a parameter:

  • index: The index of the output tensor. It must be lower than the number of output nodes, otherwise an error is thrown.

This method returns:

  • a void pointer to the inference results read from output node number index. The pointer must be cast to the proper data type.
void* get_output (int index);
Warning white.png Warning
When using an NBG neural network model, the ownership of the pointer is transferred to the caller. The user must free the memory allocated for the pointer returned by get_output to avoid memory leaks.
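Putting the methods of this class together, a minimal end-to-end inference sketch could look as follows (the header name, model file and uint8 input data type are assumptions for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <stai_mpu_network.h>  // assumed header name from the STAI MPU package

int main() {
    // Hypothetical model file; the backend plugin is selected automatically.
    stai_mpu_network model("mobilenet_v2.tflite");

    // Size an input buffer from the first input tensor's shape.
    stai_mpu_tensor in_info = model.get_input_infos()[0];
    size_t elems = 1;
    for (int d : in_info.get_shape())
        elems *= d;
    std::vector<uint8_t> input(elems, 0);  // assuming a quantized uint8 input

    model.set_input(0, input.data());
    if (model.run()) {
        // Cast the raw pointer to the output tensor's data type before use.
        const uint8_t* out = static_cast<const uint8_t*>(model.get_output(0));
        (void)out;  // post-processing (for example, argmax) goes here
    }
    return 0;
}
```

In a real application, the buffer would be filled with preprocessed image data, and the cast on get_output would match the data type reported by get_output_infos.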

2.2. stai_mpu_tensor class interface[edit source]

The stai_mpu_tensor class was introduced to ease the manipulation of input and output tensors and the retrieval of related information. It stores important tensor information such as the I/O tensor name, shape, data type, quantization type and quantization parameters, with the help of the stai_mpu_quant_params structure. All class attributes are private, but getter methods and a constructor are provided to ease the use of this class.

2.2.1. stai_mpu_tensor constructor method[edit source]

The straightforward way to retrieve a stai_mpu_tensor is to use the get_input_infos and/or get_output_infos methods of the stai_mpu_network class. The constructor here is provided for initialization purposes. It takes as arguments:

  • name : A string representing the name of the I/O tensor. This is very useful for the use-case of object detection where there are usually several output nodes.
  • index: An integer representing the index of the nodes amongst all the I/O tensors.
  • shape: An integer vector taking up to 6 values representing the I/O tensor shape.
  • rank: An integer representing the number of dimensions of the I/O tensor.
  • dtype: A stai_mpu_dtype enum for representing the data type of the I/O tensor.
  • qtype: A stai_mpu_qtype enum for representing the quantization type of the I/O tensor.
  • qparams: A stai_mpu_quant_params structure for storing the quantization parameters of the I/O tensor.
stai_mpu_tensor (const std::string& name, 
                 int index, 
                 const std::vector<int>& shape, 
                 int rank, 
                 stai_mpu_dtype dtype, 
                 stai_mpu_qtype qtype, 
                 stai_mpu_quant_params qparams);

2.2.2. get_name method:[edit source]

This method returns the name of the I/O tensor as a string.

std::string get_name ();
Warning DB.png Important
It is worth noting that, in the case of NBG, the offline compiler renames the I/O tensors. The user might therefore see different names than in the pre-compiled model.

2.2.3. get_index method:[edit source]

This method returns the index of the I/O tensor.

int get_index ();

2.2.4. get_rank method:[edit source]

This method returns the number of dimensions of the I/O tensor.

int get_rank ();

2.2.5. get_shape method:[edit source]

This method returns the shape of the I/O tensor as a vector of integers. The size of this vector is the rank.

std::vector<int> get_shape ();

2.2.6. get_dtype method:[edit source]

This method returns the stai_mpu_dtype associated with the data type of the I/O tensor.

stai_mpu_dtype get_dtype ();

2.2.7. get_qtype method:[edit source]

This method returns the stai_mpu_qtype associated with the quantization scheme of the I/O tensor.

stai_mpu_qtype get_qtype ();

2.2.8. get_qparams method:[edit source]

This method returns the stai_mpu_quant_params structure holding the quantization parameters of the I/O tensor.

stai_mpu_quant_params get_qparams ();

2.3. stai_mpu_dtype enum class[edit source]

The STAI API supports the following data types.

STAI_MPU_DTYPE_INT8
STAI_MPU_DTYPE_INT16
STAI_MPU_DTYPE_INT32
STAI_MPU_DTYPE_INT64
STAI_MPU_DTYPE_UINT8
STAI_MPU_DTYPE_UINT16
STAI_MPU_DTYPE_UINT32
STAI_MPU_DTYPE_UINT64
STAI_MPU_DTYPE_BOOL8
STAI_MPU_DTYPE_CHAR
STAI_MPU_DTYPE_BFLOAT16
STAI_MPU_DTYPE_FLOAT16
STAI_MPU_DTYPE_FLOAT32
STAI_MPU_DTYPE_FLOAT64
STAI_MPU_DTYPE_UNDEFINED
Warning DB.png Important
Conversions between float16 and float32 data types are implemented internally in the API, saving the user the complexity and overhead of implementing them.

2.4. stai_mpu_qtype enum class[edit source]

The STAI API supports the following quantization types, depending on the model used.

STAI_MPU_QTYPE_DYNAMIC_FIXED_POINT
STAI_MPU_QTYPE_AFFINE_PER_CHANNEL
STAI_MPU_QTYPE_AFFINE_PER_TENSOR
STAI_MPU_QTYPE_NONE
Warning DB.png Important
Not all quantization schemes are supported by all model formats. For example, NBG models do not support the dynamic fixed point quantization scheme.

2.5. stai_mpu_quant_params structure[edit source]

The stai_mpu_quant_params structure is a union of three structures corresponding to the three supported quantization schemes, defined as follows:

  • dfp: int8_t fixed_point_position
  • affine_per_tensor: float scale, uint32_t zero_point
  • affine_per_channel: uint32_t quant_dim (channel dimension), float* scales (an array of scales), int32_t* zero_points (an array of zero points)

The retrieved quantization parameters can then be used to quantize the input data before feeding it to the input tensor and, similarly, to dequantize the output data after reading the inference results, when needed.

2.6. stai_mpu_backend_engine enum class[edit source]

The current version of the STAI MPU API supports the following backends and execution engines.

STAI_MPU_TFLITE_CPU_ENGINE
STAI_MPU_ORT_CPU_ENGINE
STAI_MPU_OVX_NPU_ENGINE

To get more information on how to use the STAI MPU C++ API, please refer to the how-to article.