How to compile model and run inference on Coral Edge TPU using STM32MP1


1. Article purpose

This article describes how to run an inference on the Coral Edge TPU using STM32MP1 microprocessor devices.

Information
This article provides a simple example. Other methods exist that might be better adapted to your development constraints. Feel free to explore them.

2. Prerequisites

2.1. Installing the Edge TPU compiler

To run an inference on the Coral Edge TPU hardware, the TensorFlow Lite model must first be converted into an Edge TPU compatible model. This is achieved with the Edge TPU compiler, which must be installed on your host computer.

Information
The Edge TPU compiler is provided only for 64-bit architectures, so it cannot be installed on the STM32MP1 target board. In addition, your host computer must run a Debian-based Linux distribution.

Install the Coral Edge TPU compiler by running the following commands on your host computer:

  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
  echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
  sudo apt-get update
  sudo apt-get install edgetpu-compiler

Enter the following command to check that the compiler has been installed:

  edgetpu_compiler -v 
 Edge TPU Compiler version 2.x.y

3. Fetch a Coral Edge TPU compiled model

3.1. Coral's compiled models

Coral [1] offers a large set of quantized and compiled models for demonstration purposes. If you wish to use a ready-made model, take a look at https://coral.ai/models/

3.2. Compile your own model

The Edge TPU compiler converts one or more TensorFlow Lite models into Edge TPU compatible models. It takes your .tflite model as an argument and returns an Edge TPU compatible .tflite model. You can pass multiple models as arguments (separated by spaces): they are then co-compiled and share the Edge TPU's 8 MB of RAM for parameter data caching.

Be aware that not all operations supported by TensorFlow Lite are supported by the Edge TPU. When building your own model architecture, check the operations and layers supported by the Edge TPU compiler in this table: https://coral.ai/docs/edgetpu/models-intro/#supported-operations

If your model architecture uses unsupported operations and therefore does not meet all the requirements, only the first portion of the model executes on the Edge TPU. Starting from the first node in the model graph where an unsupported operation occurs, the compiler falls back to the CPU: all following operations run on the CPU of the target board, even if Edge TPU supported operations occur later. The Edge TPU compiler cannot partition the model more than once.

Information
If a significant percentage of your model is executed on the CPU, expect a significantly degraded inference speed compared to a model that executes entirely on the Edge TPU. For maximum speed, use only Edge TPU supported operations in your model.

To compile your .tflite model, you have to execute the following command:

  edgetpu_compiler your_model_1.tflite your_model_2.tflite ...

If you do not specify the --out_dir option, the compiled model is saved in the current directory under the name input_filename_edgetpu.tflite. A .log file is also generated, providing information about data caching and memory consumption in the Edge TPU RAM.

3.3. Example: compile an object detection model

The object detection model used is ssd_mobilenet_v1_coco_quant.tflite, downloaded from the Coral [1] website and compiled for the Coral Edge TPU using the steps detailed below.

  wget http://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
  unzip ./coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
  Archive:  ./coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
            inflating: detect.tflite           
            inflating: labelmap.txt  

Now that the model has been downloaded and extracted successfully, it is time to compile it using the Edge TPU compiler:

  edgetpu_compiler detect.tflite

3.4. Send your model to the target

On the board, create two directories, workspace and models, to organize the workflow:

 cd /usr/local && mkdir -p workspace
 cd /usr/local/workspace && mkdir -p models

Then transfer the compiled model from the host computer to the models directory of the workspace on the board:

 scp path/to/your/compiled_model_edgetpu.tflite root@<board_ip_address>:/usr/local/workspace/models/
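
Optionally, you can check directly on the board that the Edge TPU is detected and that the compiled model loads correctly through the Edge TPU delegate before running the benchmark. The snippet below is a minimal sketch, assuming the tflite_runtime Python module and the libedgetpu library are available on the board; the model path is a placeholder.

 # Quick sanity check: load the compiled model through the Edge TPU delegate
 from tflite_runtime.interpreter import Interpreter, load_delegate
 
 MODEL = "/usr/local/workspace/models/your_compiled_model.tflite"  # placeholder path
 
 try:
     delegate = load_delegate("libedgetpu.so.1")  # raises ValueError if the Edge TPU is not detected
     interpreter = Interpreter(model_path=MODEL, experimental_delegates=[delegate])
     interpreter.allocate_tensors()
     print("Model loaded, input shape:", interpreter.get_input_details()[0]["shape"])
 except ValueError as error:
     print("Could not load the Edge TPU delegate or the model:", error)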

Now that the workspace contains the compiled model file, it is time to see how to run an inference using the C++ benchmark application.

4. Run the inference

4.1. Install the benchmark application

After having configured the AI OpenSTLinux package, you can install the X-LINUX-AI components for this application:

 apt-get install tflite-edgetpu-benchmark

4.2. Execute the benchmark on the model

Now that the compiled model is on the board, it is time to run an inference using the benchmark example. This example measures the average inference time of your model over a chosen number of loops. As an example, execute the following command to benchmark the model through its average inference time over 25 loops:

 cd /usr/local/demo-ai/benchmark/tflite-edgetpu/
 ./tflite_edgetpu_benchmark -m /usr/local/workspace/models/your_compiled_model.tflite -l 25

The first inference may take longer since the model is loaded into the Coral Edge TPU RAM at that time. This time is not taken into account when computing the average inference time.
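
For reference, a similar measurement can be prototyped in Python directly on the board. The snippet below is a minimal sketch, assuming the tflite_runtime Python module and the libedgetpu library are available; the model path, the number of loops and the random uint8 input data are assumptions to adapt to your own model.

 # Average inference time over several loops, excluding a warm-up inference
 import time
 import numpy as np
 from tflite_runtime.interpreter import Interpreter, load_delegate
 
 MODEL = "/usr/local/workspace/models/your_compiled_model.tflite"  # placeholder path
 LOOPS = 25
 
 interpreter = Interpreter(model_path=MODEL,
                           experimental_delegates=[load_delegate("libedgetpu.so.1")])
 interpreter.allocate_tensors()
 
 # Feed random data matching the input tensor (quantized Edge TPU models expect uint8)
 inp = interpreter.get_input_details()[0]
 interpreter.set_tensor(inp["index"],
                        np.random.randint(0, 256, size=inp["shape"], dtype=np.uint8))
 
 interpreter.invoke()  # warm-up: the model is loaded into the Edge TPU RAM here
 
 start = time.perf_counter()
 for _ in range(LOOPS):
     interpreter.invoke()
 average_ms = (time.perf_counter() - start) / LOOPS * 1000.0
 print("Average inference time over %d loops: %.2f ms" % (LOOPS, average_ms))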

4.3. Customize your application

You can adapt your application to your development constraints and needs.

To build a prototype of your application in Python, take a look at the Image classification Python example or the Object detection Python example.
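
As an illustration, the object detection model compiled in section 3.3 can be prototyped in Python as follows. This is a minimal sketch, not the X-LINUX-AI example itself: it assumes that the tflite_runtime module, the libedgetpu library and the Pillow package are available on the board, and that the model uses the usual TFLite SSD output layout (boxes, classes, scores, count); the image path and the score threshold are placeholders.

 # Minimal object detection prototype on the Coral Edge TPU
 import numpy as np
 from PIL import Image
 from tflite_runtime.interpreter import Interpreter, load_delegate
 
 MODEL = "/usr/local/workspace/models/detect_edgetpu.tflite"  # compiled in section 3.3
 IMAGE = "test.jpg"                                           # placeholder input image
 
 interpreter = Interpreter(model_path=MODEL,
                           experimental_delegates=[load_delegate("libedgetpu.so.1")])
 interpreter.allocate_tensors()
 
 # Resize the input image to the model input size and feed it as a uint8 tensor
 inp = interpreter.get_input_details()[0]
 _, height, width, _ = inp["shape"]
 image = Image.open(IMAGE).convert("RGB").resize((width, height))
 interpreter.set_tensor(inp["index"], np.expand_dims(np.asarray(image, dtype=np.uint8), 0))
 interpreter.invoke()
 
 # Assumed output order for TFLite SSD detection models: boxes, classes, scores, count
 out = interpreter.get_output_details()
 boxes = interpreter.get_tensor(out[0]["index"])[0]    # [ymin, xmin, ymax, xmax], normalized
 classes = interpreter.get_tensor(out[1]["index"])[0]  # class indices (see labelmap.txt)
 scores = interpreter.get_tensor(out[2]["index"])[0]   # confidence scores
 
 for box, cls, score in zip(boxes, classes, scores):
     if score > 0.5:  # arbitrary confidence threshold
         print("class %d  score %.2f  box %s" % (int(cls), score, box))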

To run your application using the C++ API, refer to the Image classification C++ example or the Object detection C++ example.

5. References