1. Introduction
STAI MPU is a cross-platform machine learning and computer vision inference API for the STM32MPx series, with a flexible interface to run several deep learning model formats such as Network Binary Graph (NBG), TensorFlow™ Lite[1] and ONNX™[2]. This unified API runs neural network models on all STM32MPx series and provides a single, simple interface for porting machine learning applications. The API relies on dynamic loading to check at runtime whether the plugins required to run the user's model are installed on the target. This gives the user the flexibility to install only the plugins and packages needed for the application being developed.
To get more information about the available runtimes, please refer to STAI MPU: AI unified API for STM32MPUs.
2. C++ Class interfaces
The C++ API is based on several interface classes that are exposed at the user level.
2.1. stai_mpu_network class interface
The stai_mpu_network class is the main interface for running inference and accessing model information, such as the number of input nodes, the number of output nodes, and the neural network input and output data. It relies on several methods:
2.1.1. stai_mpu_network constructor method
Constructor for the stai_mpu_network class. This function automatically identifies the format of the model passed to the constructor. The backend_engine attribute is then set to the associated stai_mpu_backend_engine enum value, and the stai_wrapper is instantiated accordingly by dynamically loading the associated plugin (the back-end's shared library).
stai_mpu_network (const std::string& model_path, bool use_hw_acceleration);
2.1.2. get_num_inputs method:
This method returns an integer corresponding to the number of input nodes of the neural network model.
int get_num_inputs ();
2.1.3. get_num_outputs method:
This method returns an integer corresponding to the number of output nodes of the neural network model.
int get_num_outputs ();
2.1.4. get_backend_engine method:
This method returns an enum class value identifying the back-end currently in use, such as TensorFlow™ Lite[1], ONNX Runtime™[3] or OpenVX™[4], and the inference engine among CPU, NPU or EdgeTPU.
stai_mpu_backend_engine get_backend_engine ();
2.1.5. get_input_infos method:
This method retrieves information about the input tensors of the neural network model. It returns a vector of stai_mpu_tensor class objects. Each element of the vector holds the information of one input tensor, such as its shape, data type, quantization type, etc.
std::vector<stai_mpu_tensor> get_input_infos () const;
Information: This function returns a vector of all the input tensor objects. It is up to the user to access the needed tensor based on its index. In case of confusion, please refer to the How to run inference using the STAI MPU C++ API article.
2.1.6. get_output_infos method:
This method retrieves information about the output tensors of the neural network model. It returns a vector of stai_mpu_tensor class objects. Each element of the vector holds the information of one output tensor, such as its shape, data type, quantization type, etc.
std::vector<stai_mpu_tensor> get_output_infos () const;
Information: This function returns a vector of all the output tensor objects. It is up to the user to access the needed tensor based on its index. In case of confusion, please refer to the How to run inference using the STAI MPU C++ API article.
2.1.7. set_input method:
This method sets the input data for a specific input node of the neural network model, based on the input index. If the input index exceeds the number of input nodes, the function throws an error. It takes the following parameters:
- index: The index of the input tensor. It should be lower than the number of input nodes.
- data: A void pointer to the input data to be fed for inference.
void set_input (int index, const void* data);
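As an illustration, input data is typically preprocessed into the data type the model expects before being passed through the void pointer. The helper below is a hypothetical sketch (the scaling of 8-bit pixels to a [0, 1] float range is an assumption for a float32 input model, not part of the API):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical preprocessing helper: convert raw 8-bit image data into a
// float32 buffer whose data() pointer could then be handed to set_input.
// The [0, 1] normalization is illustrative; the required preprocessing
// depends on how the model was trained.
std::vector<float> normalize_pixels(const std::vector<std::uint8_t>& pixels) {
    std::vector<float> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i)
        out[i] = pixels[i] / 255.0f;  // scale each pixel to [0, 1]
    return out;
}
```

The resulting buffer would be passed as `set_input(index, buffer.data())`, with the buffer kept alive until the inference has run.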
2.1.8. run method:
This method runs the inference on the input data set by the set_input method and takes no argument. If the inference fails, this method throws an error explaining why. If the inference is successful, it returns true.
bool run ();
2.1.9. get_output method:
This method reads the output data of a specific output node of the neural network model, based on the output index passed as an argument. If the output index exceeds the number of output nodes, the function throws an error. It takes as a parameter:
- index: The index of the output tensor. It must be lower than the number of output nodes, otherwise an error occurs.
This method returns:
- a void pointer to the inference results read from the output node number index. The pointer is to be casted to the proper data type.
void* get_output (int index);
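Because the returned pointer is type-erased, the caller must cast it to the tensor's actual data type before reading. A minimal sketch, assuming a float32 classification output; the raw pointer below stands in for the value returned by get_output:

```cpp
#include <cstddef>

// Sketch: pick the arg-max from a raw output buffer, assuming the output
// tensor holds float32 data. 'raw' stands in for the void* returned by
// get_output(index); it is cast to the proper type before being read.
int argmax_from_output(const void* raw, std::size_t count) {
    const float* out = static_cast<const float*>(raw);  // cast to the real dtype
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
        if (out[i] > out[best]) best = i;
    return static_cast<int>(best);
}
```

The correct type to cast to can be determined beforehand with get_output_infos and the tensor's get_dtype method.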
2.2. stai_mpu_tensor class interface
The stai_mpu_tensor class was introduced to ease the manipulation of input and output tensors and the retrieval of related information. It stores important tensor information such as the I/O tensor name, shape, data type, quantization type, and quantization parameters with the help of the stai_mpu_quant_params structure. All the class attributes are private, but a constructor and getter methods are provided to ease the manipulation of this class.
2.2.1. stai_mpu_tensor constructor method
The straightforward way to retrieve a stai_mpu_tensor is to use the get_input_infos and/or get_output_infos methods of the stai_mpu_network class. The constructor here is provided for initialization purposes. It takes the following arguments:
- name: A string representing the name of the I/O tensor. This is particularly useful for use cases such as object detection, where there are usually several output nodes.
- index: An integer representing the index of the node among all the I/O tensors.
- shape: An integer vector taking up to 6 values representing the I/O tensor shape.
- rank: An integer representing the number of dimensions of the I/O tensor.
- dtype: A stai_mpu_dtype enum for representing the data type of the I/O tensor.
- qtype: A stai_mpu_qtype enum for representing the quantization type of the I/O tensor.
- qparams: A stai_mpu_quant_params structure for storing the quantization parameters of the I/O tensor.
stai_mpu_tensor (const std::string& name,
int index,
const std::vector<int>& shape,
int rank,
stai_mpu_dtype dtype,
stai_mpu_qtype qtype,
stai_mpu_quant_params qparams);
2.2.2. get_name method:
This method returns the name of the I/O tensor as a string.
std::string get_name ();
2.2.3. get_index method:
This method returns the index of the I/O tensor.
int get_index ();
2.2.4. get_rank method:
This method returns the number of dimensions of the I/O tensor.
int get_rank ();
2.2.5. get_shape method:
This method returns the shape of the I/O tensor as a vector of integers. The size of this vector is the rank.
std::vector<int> get_shape ();
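For example, the total element count of a tensor (useful for sizing buffers) can be derived from the shape vector returned by get_shape; this is plain arithmetic, not an API call:

```cpp
#include <cstdint>
#include <vector>

// Sketch: total number of elements in a tensor, computed as the product of
// the dimensions in the shape vector returned by get_shape().
std::int64_t element_count(const std::vector<int>& shape) {
    std::int64_t n = 1;
    for (int dim : shape) n *= dim;  // multiply across all dimensions
    return n;
}
```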
2.2.6. get_dtype method:
This method returns the stai_mpu_dtype associated with the data type of the I/O tensor.
stai_mpu_dtype get_dtype ();
2.2.7. get_qtype method:
This method returns the stai_mpu_qtype associated with the quantization scheme of the I/O tensor.
stai_mpu_qtype get_qtype ();
2.2.8. get_qparams method:
This method returns the stai_mpu_quant_params structure associated with the quantization scheme of the I/O tensor.
stai_mpu_quant_params get_qparams ();
2.3. stai_mpu_dtype enum class
The STAI API supports the following data types:
- STAI_MPU_DTYPE_INT8
- STAI_MPU_DTYPE_INT16
- STAI_MPU_DTYPE_INT32
- STAI_MPU_DTYPE_INT64
- STAI_MPU_DTYPE_UINT8
- STAI_MPU_DTYPE_UINT16
- STAI_MPU_DTYPE_UINT32
- STAI_MPU_DTYPE_UINT64
- STAI_MPU_DTYPE_BOOL8
- STAI_MPU_DTYPE_CHAR
- STAI_MPU_DTYPE_BFLOAT16
- STAI_MPU_DTYPE_FLOAT16
- STAI_MPU_DTYPE_FLOAT32
- STAI_MPU_DTYPE_FLOAT64
- STAI_MPU_DTYPE_UNDEFINED
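As a rough illustration only, a helper can map each data type to its element size in bytes, for example to compute buffer sizes. The enum below is a local mirror of the documented names, not the real STAI MPU enum, and the byte sizes are assumptions based on the type names:

```cpp
#include <cstddef>

// Illustrative local mirror of the documented data-type names (the real
// stai_mpu_dtype enum lives in the STAI MPU headers). The sizes returned
// are assumptions inferred from the type names.
enum class dtype_sketch {
    INT8, INT16, INT32, INT64,
    UINT8, UINT16, UINT32, UINT64,
    BOOL8, CHAR, BFLOAT16, FLOAT16, FLOAT32, FLOAT64, UNDEFINED
};

std::size_t dtype_size(dtype_sketch t) {
    switch (t) {
        case dtype_sketch::INT8:
        case dtype_sketch::UINT8:
        case dtype_sketch::BOOL8:
        case dtype_sketch::CHAR:     return 1;
        case dtype_sketch::INT16:
        case dtype_sketch::UINT16:
        case dtype_sketch::BFLOAT16:
        case dtype_sketch::FLOAT16:  return 2;
        case dtype_sketch::INT32:
        case dtype_sketch::UINT32:
        case dtype_sketch::FLOAT32:  return 4;
        case dtype_sketch::INT64:
        case dtype_sketch::UINT64:
        case dtype_sketch::FLOAT64:  return 8;
        default:                     return 0;  // UNDEFINED: size unknown
    }
}
```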
2.4. stai_mpu_qtype enum class
The STAI API supports the following quantization types, depending on the model used:
- STAI_MPU_QTYPE_DYNAMIC_FIXED_POINT
- STAI_MPU_QTYPE_STATIC_AFFINE
- STAI_MPU_QTYPE_NONE
2.5. stai_mpu_quant_params structure
The stai_mpu_quant_params structure is a union of two structures corresponding to the two supported quantization schemes, defined as follows:
- dfp: int8_t fixed_point_position
- static_affine: float scale, uint32_t zero_point
The retrieved quantization parameters can then be used to quantize the input data before feeding it to the input tensor and, similarly, to dequantize the output data read from the inference results when needed.
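A minimal sketch of that round trip for the static affine scheme, where real_value = scale * (quantized_value - zero_point); the helper functions and the values in the usage example are illustrative, with scale and zero_point coming from the tensor's stai_mpu_quant_params:

```cpp
#include <cmath>
#include <cstdint>

// Illustrative static affine quantization helpers:
//   quantized = round(real / scale) + zero_point
//   real      = scale * (quantized - zero_point)
// scale and zero_point would be read from stai_mpu_quant_params.static_affine.
std::uint8_t quantize(float x, float scale, std::uint32_t zero_point) {
    return static_cast<std::uint8_t>(std::lround(x / scale) + zero_point);
}

float dequantize(std::uint8_t q, float scale, std::uint32_t zero_point) {
    return scale * (static_cast<float>(q) - static_cast<float>(zero_point));
}
```

For example, with scale 0.5 and zero_point 128, the real value 1.0 quantizes to 130, and 130 dequantizes back to 1.0.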
2.6. stai_mpu_backend_engine enum class
The current version of the STAI MPU API supports the following back-ends and execution engines:
- STAI_MPU_TFLITE_CPU_ENGINE
- STAI_MPU_TFLITE_NPU_ENGINE
- STAI_MPU_ORT_CPU_ENGINE
- STAI_MPU_ORT_NPU_ENGINE
- STAI_MPU_OVX_NPU_ENGINE
To get more information on how to use the STAI MPU C++ API, please refer to the How to run inference using the STAI MPU C++ API article.
3. References