How to compile model and run inference on Coral Edge TPU using STM32MP1


1. Article purpose

This article describes how to run an inference on the Coral Edge TPU using the STM32MP1.

Information
There are many ways to achieve the same result; this article aims to provide at least one simple example. You are free to explore other methods that are better adapted to your development constraints.

2. Prerequisites

2.1. Install the EdgeTPU compiler

To run an inference on the Coral Edge TPU hardware, the TensorFlow Lite model must first be converted into an Edge TPU compatible model. This conversion is done with the Edge TPU compiler, which therefore needs to be installed on the host computer.

Information
The Edge TPU compiler is provided only for 64-bit architectures, so it cannot be installed on the STM32MP1 target board. Make sure that your host computer runs a Debian-based Linux distribution.

We install the Coral EdgeTPU compiler by running the following commands on the host computer:

  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
  echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
  sudo apt-get update
  sudo apt-get install edgetpu-compiler

To check that the compiler is correctly installed, enter the following command:

  edgetpu_compiler -v 
 Edge TPU Compiler version 2.x.y

3. Fetch a Coral EdgeTPU compiled model

3.1. Coral's compiled models

Coral [1] offers a large set of quantized and compiled models for demonstration purposes. If you wish to use a ready-to-use model, take a look at: https://coral.ai/models/

3.2. Compile your own model

The Edge TPU compiler's role is to convert one or more TensorFlow Lite models into Edge TPU compatible models. It takes your .tflite model as an argument and returns a .tflite Edge TPU compatible model. You can pass multiple models as arguments (each separated by a space); they are then co-compiled and share the Edge TPU's 8 MB of RAM for parameter data caching.

Be aware that not all the operations supported by TensorFlow Lite are supported on the Edge TPU. While building your own model architecture, check the operations and layers supported by the Edge TPU compiler in this table: https://coral.ai/docs/edgetpu/models-intro/#supported-operations

If your model architecture uses unsupported operations and does not meet all the requirements, only the first portion of the model executes on the Edge TPU. Starting from the first node in the model graph where an unsupported operation occurs, everything is mapped to the CPU: all following operations run on the CPU of the target board, even if Edge TPU supported operations occur later. The Edge TPU compiler cannot partition the model more than once.

Information
If a significant percentage of your model executes on the CPU, expect a significantly degraded inference speed compared to a model that executes entirely on the Edge TPU. For maximum speed, use only Edge TPU supported operations in your model.

To compile your .tflite model, execute the following command:

  edgetpu_compiler your_model_1.tflite your_model_2.tflite ...

If you do not specify the -out_dir option, the compiled model is saved in the current directory under the name input_filename_edgetpu.tflite. A .log file is also created, providing information about data caching and memory consumption in the Edge TPU RAM.

3.3. Example: compile an object detection model

The object detection model used is ssd_mobilenet_v1_coco_quant.tflite, downloaded from the Coral [1] website and compiled for the Coral Edge TPU using the steps detailed below.

  wget http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
  unzip ./coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
  Archive:  ./coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
            inflating: detect.tflite           
            inflating: labelmap.txt  

Now that the model has been downloaded and extracted successfully, it is time to compile it using the Edge TPU compiler:

  edgetpu_compiler detect.tflite

3.4. Send your model to the target

On the board, we create a workspace directory and a models subdirectory to organize our workflow:

 cd /usr/local && mkdir workspace
 cd /usr/local/workspace && mkdir models

We then transfer the compiled model from the host computer to the models directory of the workspace on the board:

 scp path/to/your/compiled_model_edgetpu.tflite root@<board_ip_address>:/usr/local/workspace/models/

Now that our workspace is ready with our compiled model file, it is time to see how to run an inference using the C++ benchmark application.

4. Run the inference

4.1. Install the benchmark application

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components for this application:

 apt-get install tflite-edgetpu-benchmark

4.2. Execute the benchmark on the model

Now that the compiled model is available on the board, it is time to run an inference using the benchmark application. This application measures the average inference time of your model over a desired number of loops. As an example, execute the following commands to benchmark the model performance over 25 loops:

 cd /usr/local/demo-ai/benchmark/tflite-edgetpu/
 ./tflite_edgetpu_benchmark -m /usr/local/workspace/models/your_compiled_model.tflite -l 25

The first inference may take a longer time because the model is being loaded into the Coral Edge TPU RAM. This time is not taken into account in the average inference time computation.
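
As an illustration of this warm-up effect, below is a minimal Python sketch (it is not the benchmark application itself) that times the first and the following inferences separately. It assumes that the tflite_runtime and numpy Python packages and the libedgetpu runtime are installed on the board, and it reuses the model path from the command above.

 # Minimal sketch (not the X-LINUX-AI benchmark): time inferences on the Edge TPU.
 # Assumes tflite_runtime, numpy and the libedgetpu runtime are installed on the board.
 import time
 import numpy as np
 import tflite_runtime.interpreter as tflite
 
 interpreter = tflite.Interpreter(
     model_path="/usr/local/workspace/models/your_compiled_model.tflite",
     experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
 interpreter.allocate_tensors()
 
 # Feed a dummy input tensor with the expected shape and type.
 inp = interpreter.get_input_details()[0]
 interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
 
 # First inference: includes loading the model into the Edge TPU memory.
 start = time.monotonic()
 interpreter.invoke()
 print("first inference  : %.1f ms" % ((time.monotonic() - start) * 1000))
 
 # Following inferences reflect the steady-state latency.
 timings = []
 for _ in range(25):
     start = time.monotonic()
     interpreter.invoke()
     timings.append(time.monotonic() - start)
 print("average inference: %.1f ms" % (sum(timings) / len(timings) * 1000))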

4.3. Customize your application

You can adapt your application to your development constraints and needs.

To build a prototype of your application using Python, take a look at the Image classification Python example or the Object detection Python example.
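
As a rough illustration of such a Python prototype, below is a minimal sketch that runs the object detection model compiled in section 3.3 on a single picture (picture.jpg is a hypothetical file name). It is not one of the X-LINUX-AI examples: it assumes that the tflite_runtime, numpy and Pillow Python packages and the libedgetpu runtime are installed on the board, that detect_edgetpu.tflite and labelmap.txt have been copied to the workspace, and that the output tensors follow the usual TFLite SSD post-processing layout (boxes, class ids, scores), which should be checked against your own model.

 # Minimal object detection sketch (not the X-LINUX-AI example applications).
 # Assumes tflite_runtime, numpy, Pillow and the libedgetpu runtime are installed,
 # and that the model and labels from section 3.3 were copied to the workspace.
 import numpy as np
 from PIL import Image
 import tflite_runtime.interpreter as tflite
 
 MODEL = "/usr/local/workspace/models/detect_edgetpu.tflite"
 LABELS = "/usr/local/workspace/models/labelmap.txt"
 
 interpreter = tflite.Interpreter(
     model_path=MODEL,
     experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
 interpreter.allocate_tensors()
 
 # Resize the input picture to the model input size (uint8, NHWC).
 inp = interpreter.get_input_details()[0]
 _, height, width, _ = inp["shape"]
 image = Image.open("picture.jpg").convert("RGB").resize((width, height))
 interpreter.set_tensor(inp["index"], np.expand_dims(np.asarray(image), 0))
 interpreter.invoke()
 
 # Usual SSD post-processing outputs: boxes, class ids, scores (check your model).
 out = interpreter.get_output_details()
 boxes = interpreter.get_tensor(out[0]["index"])[0]
 classes = interpreter.get_tensor(out[1]["index"])[0]
 scores = interpreter.get_tensor(out[2]["index"])[0]
 
 # Map class ids to the label file (indexing may need adjustment for your labelmap).
 labels = [line.strip() for line in open(LABELS)]
 for box, cls, score in zip(boxes, classes, scores):
     if score > 0.5:
         print("%s (%.2f) at %s" % (labels[int(cls)], score, box))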

To run your application using the C++ API, refer to the Image classification C++ example or the Object detection C++ example.

5. References

1. Coral: https://coral.ai/