1. Article purpose[edit source]
This article describes how to run an inference on the STM32MPx using the STAI MPU C++ API, based on an image classification example. The unified architecture of the API allows the same application to be deployed on all the STM32MPx platforms.
Information |
This article provides a simple inferencing example using the STAI MPU C++ API. If you wish to explore all the functions provided by the API, please refer to the STAI MPU C++ Reference. |
2. STAI MPU C++ API[edit source]
STAI MPU is a cross-platform machine learning and computer vision inference API for STM32MPx, with a flexible interface able to run several deep learning model formats such as TFLite™, ONNX™, and NBG. If you wish to learn more about the API structure, please refer to STAI MPU: AI unified API for STM32MPUs. In the next section we explore, with a basic image classification example, how to run inference on your models on the board using the STAI MPU C++ API, whether you are running a TFLite™, an ONNX™, or an NBG model on either STM32MP1x or STM32MP2x.
3. Running an inference using the STAI MPU C++ API[edit source]
3.1. Install runtime prerequisites on the target[edit source]
After configuring the AI OpenSTLinux package, you can install the X-LINUX-AI components and the packages needed to run the example. Start by installing the main packages needed for image processing: Python NumPy and Python OpenCV.
Board $> apt-get install python3-numpy python3-opencv
Then, we will need to install the API plugins required during runtime depending on the model format used for the inference:
- If you are using a TFLite™ model, please run the following command:
x-linux-ai -i stai-mpu-tflite
- If you are using an ONNX™ model, please run the following command:
x-linux-ai -i stai-mpu-ort
- If you are running an NBG model on an STM32MP2x board, please run the following command:
x-linux-ai -i stai-mpu-ovx
Information |
The package stai-mpu-ovx is not available on STM32MP1x boards. The TFLite™ and ONNX™ runtimes supported by the API are running exclusively on CPU. |
3.2. Install and launch the X-LINUX-AI SDK[edit source]
First of all, the X-LINUX-AI SDK must be installed on your host machine to be able to cross-compile AI applications for STM32 boards.
Information |
The SDK environment setup script must be run once on each new working terminal on which you cross-compile. |
Once the OpenSTLinux SDK is installed, go to the installation directory and source the environment:
cd <working directory absolute path>/Developer-Package/SDK
source environment-setup-<target-triplet>
3.3. Write a simple NN inference C++ program[edit source]
#include "stai_mpu_network.h"
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
int main (int argc, char* argv[]){
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " <model_file> <image_file>" << std::endl;
return 1;
}
/////////////////////////////////////////////////
/// Loading the model and metadata ///
/////////////////////////////////////////////////
std::string model_path = argv[1]; // .onnx or .tflite or .nb file
stai_mpu_network stai_model = stai_mpu_network(model_path);
int num_inputs = stai_model.get_num_inputs();
int num_outputs = stai_model.get_num_outputs();
std::vector<stai_mpu_tensor> input_infos = stai_model.get_input_infos();
std::vector<stai_mpu_tensor> output_infos = stai_model.get_output_infos();
std::vector<int> input_shape(input_infos[0].get_rank());
std::vector<int> output_shape(output_infos[0].get_rank());
for (int i = 0; i < num_inputs; i++) {
stai_mpu_tensor input_info = input_infos[i];
std::cout << "** Input node: " << i;
std::cout << " -Input name: " << input_info.get_name();
std::cout << " -Input dims: " << input_info.get_rank();
std::cout << " -Input type: " << input_info.get_dtype();
input_shape = input_info.get_shape();
std::cout << std::endl;
}
for (int i = 0; i < num_outputs; i++) {
stai_mpu_tensor output_info = output_infos[i];
std::cout << "** Output node: " << i;
std::cout << " -Output name: " << output_info.get_name();
std::cout << " -Output dims: " << output_info.get_rank();
std::cout << " -Output type: " << output_info.get_dtype();
output_shape = output_info.get_shape();
std::cout << std::endl;
}
int input_width = input_shape[1];
int input_height = input_shape[2];
int input_channels = input_shape[3];
auto size_in_bytes = input_height * input_width * input_channels;
////////////////////////////////////////////////////
/// Pre-processing the Image ///
////////////////////////////////////////////////////
std::string image_path = argv[2]; // input image file (.jpg, .png, ...)
cv::Mat img_bgr = cv::imread(image_path);
cv::Mat img_nn;
cv::Size size_nn(input_width, input_height);
cv::resize(img_bgr, img_nn, size_nn);
cv::cvtColor(img_nn, img_nn, cv::COLOR_BGR2RGB);
uint8_t* input_data = img_nn.data;
bool floating_model = false;
float input_mean = 127.5f;
float input_std = 127.5f;
///////////////////////////////////////////////////
/// Setting input and infer ///
///////////////////////////////////////////////////
uint8_t* input_tensor_int = new uint8_t[size_in_bytes];
float* input_tensor_f = new float[size_in_bytes];
if (input_infos[0].get_dtype() == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32)
floating_model = true;
if (floating_model) {
for (int i = 0; i < size_in_bytes; i++)
input_tensor_f[i] = (input_data[i] - input_mean) / input_std;
stai_model.set_input(0, input_tensor_f);
} else {
for (int i = 0; i < size_in_bytes; i++)
input_tensor_int[i] = input_data[i];
stai_model.set_input(0, input_tensor_int);
}
stai_model.run();
delete[] input_tensor_int; // release the input staging buffers
delete[] input_tensor_f;
///////////////////////////////////////////////////
/// Reading and post-processing output ///
///////////////////////////////////////////////////
void* outputs_tensor = stai_model.get_output(0);
int output_dims = output_infos[0].get_rank();
stai_mpu_dtype output_dtype = output_infos[0].get_dtype();
output_shape = output_infos[0].get_shape();
int output_size = output_shape[output_dims-1];
std::vector<int> results_idx(5);
std::vector<float> results_accu(5);
if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT32 || output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_FLOAT16) {
float* output_data = static_cast<float*>(outputs_tensor);
for (int i = 0; i < 5; i++) {
results_idx[i] = std::distance(&output_data[0],
std::max_element(&output_data[0], &output_data[output_size]));
results_accu[i] = output_data[results_idx[i]];
output_data[results_idx[i]] = 0;
}
} else if (output_dtype == stai_mpu_dtype::STAI_MPU_DTYPE_UINT8){
uint8_t* output_data = static_cast<uint8_t*>(outputs_tensor);
for (int i = 0; i < 5; i++) {
results_idx[i] = std::distance(&output_data[0],
std::max_element(&output_data[0], &output_data[output_size]));
results_accu[i] = output_data[results_idx[i]] / 255.0;
output_data[results_idx[i]] = 0;
}
}
free(outputs_tensor); // Required for NBG to avoid a memory leak
for (int i = 0; i < 5; i++) {
std::cout << i << ": " << results_idx[i] << "-" << results_accu[i] <<
std::endl;
}
}
3.4. Create the Makefile[edit source]
Create the following Makefile in the sources/stai_mpu/examples directory:
OPENCV_PKGCONFIG?="opencv4"
ARCHITECTURE?=""
TARGET_BIN = stai_mpu_img_cls
CXXFLAGS += -Wall $(shell pkg-config --cflags $(OPENCV_PKGCONFIG))
CXXFLAGS += -std=c++17 -O3
CXXFLAGS += -I../../
LDFLAGS += -lpthread -lopencv_core -lopencv_imgproc -lopencv_imgcodecs
LDFLAGS += -lstai_mpu -ldl
SRCS = stai_mpu_img_cls.cc
OBJS = $(SRCS:.cc=.o)
all: $(TARGET_BIN)
$(TARGET_BIN): $(OBJS)
$(CXX) -o $@ $^ $(LDFLAGS)
$(OBJS): $(SRCS)
$(CXX) $(CXXFLAGS) -c $^
clean:
rm -rf $(OBJS) $(TARGET_BIN)
Information |
The runtime plugin libraries such as libstai_mpu_ovx.so, libstai_mpu_tflite.so, or libstai_mpu_ort.so are loaded dynamically at runtime, so there is no need to link against them when building your application. Only -lstai_mpu is required to link with the STAI MPU C++ API. |
3.5. Download and prepare test data[edit source]
3.6. Cross-compilation and launch[edit source]
Run the cross-compilation:
cd ..
make
Once the compilation is finished, a binary file named stai_mpu_img_cls has been created.
Copy the binary file and the test data directory onto the board:
scp -r <test data directory> root@<board_ip>:/path/
scp stai_mpu_img_cls root@<board_ip>:/path/
Information |
The runtime plugin corresponding to your model must be installed before running the binary. |
Connect to the board and launch the example:
./stai_mpu_img_cls <model_file> <image_file>
The program prints the top 5 detected classes, one per line, in the form rank: index-score, where the index identifies the class detected and the score represents the confidence. The label corresponding to each index is found at the matching line of the labels file provided with your model.