1. Introduction
STAI MPU is a machine learning and computer vision inference API that works across STM32MPx platforms. Its flexible interface runs deep learning models in several formats, such as Network Binary Graph (NBG), TensorFlow™ Lite[1] and ONNX™[2]. The same API calls drive neural network models on every STM32MPx series, which simplifies porting machine learning applications between them. The Python API is built from the C++ API code with the pybind tool, which keeps the two APIs consistent.
For more information about the available runtimes, refer to STAI MPU: AI unified API for STM32MPUs.
2. Python Class interfaces
The Python API is based on several interface classes that are exposed at the user level.
2.1. stai_mpu_network class interface
The stai_mpu_network class is the main interface for running inference and for accessing model information such as the number of input nodes, the number of output nodes, and the neural network input and output data. It relies on the following methods:
2.1.1. stai_mpu_network constructor method
The constructor of the stai_mpu_network class automatically identifies the type of the model passed to it. The backend_engine attribute is then set to the associated stai_mpu_backend_engine enum value, and the stai_wrapper is instantiated accordingly by dynamically loading the associated plugin (the shared library of the back-end).
def stai_mpu_network(model_path: Union[Path, str])
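The article does not describe how the constructor identifies the model. As a rough illustration only, the sketch below assumes that the model type can be inferred from the file extension. The extension table, the `BackendEngine` stand-in enum and the `guess_backend_engine` helper are hypothetical names introduced for this example and are not part of the STAI MPU API.

```python
from enum import Enum
from pathlib import Path
from typing import Union


class BackendEngine(Enum):
    """Illustrative stand-in for stai_mpu_backend_engine (values are arbitrary)."""
    STAI_MPU_TFLITE_CPU_ENGINE = 0
    STAI_MPU_ORT_CPU_ENGINE = 1
    STAI_MPU_OVX_NPU_ENGINE = 2


# Assumed mapping from file extension to back-end; not taken from the library.
_EXTENSION_TO_ENGINE = {
    ".tflite": BackendEngine.STAI_MPU_TFLITE_CPU_ENGINE,
    ".onnx": BackendEngine.STAI_MPU_ORT_CPU_ENGINE,
    ".nb": BackendEngine.STAI_MPU_OVX_NPU_ENGINE,
}


def guess_backend_engine(model_path: Union[Path, str]) -> BackendEngine:
    """Return the back-end this sketch associates with the model file extension."""
    suffix = Path(model_path).suffix.lower()
    try:
        return _EXTENSION_TO_ENGINE[suffix]
    except KeyError:
        raise ValueError(f"unsupported model format: {suffix!r}") from None
```

Accepting both `Path` and `str` mirrors the constructor signature above; an unknown extension raises an error rather than silently picking a back-end.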
2.1.2. get_num_inputs method
This method returns an integer corresponding to the number of input nodes of the neural network model.
def get_num_inputs() -> int
2.1.3. get_num_outputs method
This method returns an integer corresponding to the number of output nodes of the neural network model.
def get_num_outputs() -> int
2.1.4. get_backend_engine method
This method returns the stai_mpu_backend_engine enum value that identifies the back-end currently in use, such as TensorFlow™ Lite[1], ONNX Runtime™[3] or OpenVX™[4], together with the inference engine it runs on (CPU, NPU or EdgeTPU).
def get_backend_engine() -> stai_mpu_backend_engine
2.1.5. get_input_infos method
This method retrieves information about the input tensors of the neural network model. It returns a list of stai_mpu_tensor objects, one per input tensor, each holding information such as the shape, the data type and the quantization type.
def get_input_infos() -> List[stai_mpu_tensor]
Information: This function returns a list of all the input tensor structures. It is up to the user to select the required tensor by its index. For more details, refer to the How to run inference using the STAI MPU Python API article.
2.1.6. get_output_infos method
This method retrieves information about the output tensors of the neural network model. It returns a list of stai_mpu_tensor objects, one per output tensor, each holding information such as the shape, the data type and the quantization type.
def get_output_infos() -> List[stai_mpu_tensor]
Information: This function returns a list of all the output tensor structures. It is up to the user to select the required tensor by its index. For more details, refer to the How to run inference using the STAI MPU Python API article.
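Because both methods return plain lists, the position of a tensor in the list serves as its index. The sketch below shows one possible way to summarize such a list. `FakeTensor` is a hypothetical stand-in that exposes only the get_name, get_shape and get_dtype getters of stai_mpu_tensor so that the example runs without the library, and `describe_tensors` is a helper name invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in exposing a few stai_mpu_tensor getters."""
    name: str
    shape: Tuple[int, ...]
    dtype: str

    def get_name(self) -> str:
        return self.name

    def get_shape(self) -> Tuple[int, ...]:
        return self.shape

    def get_dtype(self) -> str:
        return self.dtype


def describe_tensors(tensors) -> List[str]:
    """Build one summary line per tensor, in list (index) order."""
    return [
        f"[{index}] {t.get_name()}: shape={tuple(t.get_shape())} dtype={t.get_dtype()}"
        for index, t in enumerate(tensors)
    ]
```

With a real model, the same helper would presumably be called as `describe_tensors(network.get_input_infos())` or `describe_tensors(network.get_output_infos())`.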
2.1.7. set_input method
This method sets the input data for a specific input node of the neural network model, selected by its index. If the index is not lower than the number of input nodes, the function throws an error. It takes the following parameters:
- index: The index of the input tensor. It must be lower than the number of input nodes.
- input_tensor: An NDArray containing the preprocessed input tensor data to be used for inference.
def set_input(index: int, input_tensor: NDArray) -> None
2.1.8. run method
This method runs the inference on the input data previously set with the set_input method and takes no argument. If the inference fails, this method throws an error explaining the cause. If the inference succeeds, it returns True.
def run() -> bool
2.1.9. get_output method
This method reads the output data of a specific output node of the neural network model, selected by the output index passed as an argument. If the output index is not lower than the number of output nodes, the function throws an error. It takes as a parameter:
- index: The index of the output tensor. It must be lower than the number of output nodes, otherwise an error occurs.
This method returns:
- an NDArray containing the inference results read from the output node at the given index.
def get_output(index: int) -> NDArray
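Taken together, set_input, run and get_output form a simple three-step flow. The sketch below is one possible way to chain them, not the reference usage. `run_inference` is a helper name invented for this example, and `FakeNetwork` is a hypothetical stand-in for stai_mpu_network that works on plain Python lists (the real API exchanges NumPy arrays) so that the example runs without the library or a target board.

```python
from typing import Any, List, Sequence


def run_inference(network: Any, inputs: Sequence[Any]) -> List[Any]:
    """Feed every input node, run the model, and collect every output node."""
    expected = network.get_num_inputs()
    if len(inputs) != expected:
        raise ValueError(f"model expects {expected} inputs, got {len(inputs)}")
    for index, data in enumerate(inputs):
        network.set_input(index, data)
    if not network.run():
        raise RuntimeError("inference did not complete")
    return [network.get_output(i) for i in range(network.get_num_outputs())]


class FakeNetwork:
    """Hypothetical stand-in for stai_mpu_network; it doubles each input value."""

    def __init__(self, num_nodes: int) -> None:
        self._inputs = [None] * num_nodes
        self._outputs = [None] * num_nodes

    def get_num_inputs(self) -> int:
        return len(self._inputs)

    def get_num_outputs(self) -> int:
        return len(self._outputs)

    def set_input(self, index: int, input_tensor) -> None:
        if not 0 <= index < len(self._inputs):
            raise IndexError("input index out of range")
        self._inputs[index] = input_tensor

    def run(self) -> bool:
        self._outputs = [[2 * v for v in data] for data in self._inputs]
        return True

    def get_output(self, index: int):
        if not 0 <= index < len(self._outputs):
            raise IndexError("output index out of range")
        return self._outputs[index]
```

Checking the number of inputs before calling set_input turns an out-of-range index into an explicit error at the call site; whether that extra check is worthwhile depends on the application.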
2.2. stai_mpu_tensor class interface
The stai_mpu_tensor class eases the manipulation of input and output tensors and the retrieval of related information. It stores important tensor information such as the I/O tensor name, shape, data type, quantization type and quantization parameters. All the class attributes are private, but getter methods and a constructor are provided to ease the manipulation of this class.
2.2.1. get_name method
This method returns the name of the I/O tensor as a string.
def get_name() -> str
2.2.2. get_index method
This method returns the index of the I/O tensor.
def get_index() -> int
2.2.3. get_rank method
This method returns the number of dimensions of the I/O tensor.
def get_rank() -> int
2.2.4. get_shape method
This method returns the shape of the I/O tensor as a tuple of integers. The length of this tuple is the rank.
def get_shape() -> Tuple[int, ...]
2.2.5. get_dtype method
This method returns the stai_mpu_dtype associated with the data type of the I/O tensor. For example, for a tensor with the float32 data type, this function returns the string "float32", which can later be compared against the float32 data type of the NumPy library.
def get_dtype() -> str
2.2.6. get_qtype method
This method returns the stai_mpu_qtype associated with the quantization scheme of the I/O tensor. It returns one of the three supported quantization types: dynamicFixedPoint, affinePerTensor or affinePerChannel.
def get_qtype() -> str
2.2.7. get_scale method
This method returns the scale associated with the I/O tensor when the affine per-tensor quantization scheme is used; otherwise it returns 0.0.
def get_scale() -> float
2.2.8. get_zero_point method
This method returns the zero point associated with the I/O tensor when the affine per-tensor quantization scheme is used; otherwise it returns 0.0.
def get_zero_point() -> float
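The scale and zero point are typically combined using the standard affine quantization relation, real = scale * (q - zero_point). The sketch below applies that relation in both directions on plain Python values. The helper names and the default uint8 range (0 to 255) are assumptions made for this example; with the real API, the scale and zero point would presumably come from get_scale and get_zero_point, and the data would be NumPy arrays.

```python
from typing import Iterable, List


def dequantize_affine(values: Iterable[int], scale: float, zero_point: float) -> List[float]:
    """Map quantized integers to real values: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


def quantize_affine(values: Iterable[float], scale: float, zero_point: float,
                    qmin: int = 0, qmax: int = 255) -> List[int]:
    """Map real values to integers and clamp them to [qmin, qmax] (uint8 by default)."""
    quantized = []
    for x in values:
        q = int(round(x / scale + zero_point))
        quantized.append(max(qmin, min(qmax, q)))
    return quantized
```

Since get_scale returns 0.0 for other quantization schemes, a caller would need to check the quantization type before dividing by the scale.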
2.2.9. get_fixed_point_pos method
This method returns the fixed-point position associated with the I/O tensor when the dynamic fixed-point quantization scheme is used.
def get_fixed_point_pos() -> float
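For the dynamicFixedPoint scheme, quantized values are commonly converted back to real values with a power-of-two factor. The sketch below assumes that the value returned by get_fixed_point_pos is the number of fractional bits, so that real = q * 2^(-fixed_point_pos). This convention is an assumption not confirmed by this article, and `dequantize_dfp` is a helper name invented for this example.

```python
from typing import Iterable, List


def dequantize_dfp(values: Iterable[int], fixed_point_pos: float) -> List[float]:
    """Assumed convention: real = q * 2 ** (-fixed_point_pos)."""
    factor = 2.0 ** (-fixed_point_pos)
    return [q * factor for q in values]
```

Under this assumption, a fixed-point position of 7 on an 8-bit signed tensor would cover real values from -1.0 up to just below 1.0.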
2.3. stai_mpu_backend_engine enum class
The current version of the STAI MPU API supports the following back-ends and execution engines:
- STAI_MPU_TFLITE_CPU_ENGINE
- STAI_MPU_ORT_CPU_ENGINE
- STAI_MPU_OVX_NPU_ENGINE
To get more information on how to use the STAI MPU Python API, please refer to the How to run inference using the STAI MPU Python API article.
3. References