How to run Coral Edge TPU inference using Python TensorFlow Lite API

1. Article purpose

This article describes how to run an inference on the STM32MP1 using a Google Coral EdgeTPU device and the Python TensorFlow Lite API, through an image classification application.

Information
There are many ways to achieve the same result; this article aims to provide at least one simple example. You are free to explore other methods that are better adapted to your development constraints.

2. Difference between TensorFlow Lite Python APIs

The Artificial Intelligence expansion package X-LINUX-AI comes with two versions of TensorFlow Lite.

The first runtime is based on TensorFlow Lite[1] 2.2.0, and the second runtime is based on TensorFlow Lite[2] 1.12.1 and is dedicated to the Coral EdgeTPU.

This is because TensorFlow Lite 2.2.0 does not yet support the Coral EdgeTPU runtime. The following figure shows the software structure.

Figure: TensorFlow Lite runtime packages structure

The difference is that if you wish to use TensorFlow Lite 2.2.0, you have to import the following module in your Python script:

import tflite_runtime.interpreter as tflite

If you wish to run inferences on your Coral EdgeTPU device, you need to use the following import in your Python script instead:

import tflite_edgetpu_runtime.interpreter as tflite
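
If you want a single script that works with either runtime, the import can be guarded. The snippet below is a minimal sketch (not required by X-LINUX-AI): it prefers the Edge TPU runtime and falls back to the TensorFlow Lite 2.2.0 runtime when the Edge TPU package is not installed.

 # Minimal sketch: prefer the Edge TPU runtime, otherwise use TensorFlow Lite 2.2.0
 try:
     import tflite_edgetpu_runtime.interpreter as tflite
 except ImportError:
     import tflite_runtime.interpreter as tflite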

In the following paragraphs, we go through a basic image classification example showing how you can run inference on your models on the board using the Coral EdgeTPU device.

3. Run an inference on Coral EdgeTPU using TensorFlow Lite Python API

3.1. Installing prerequisites on the target

We start by installing the X-LINUX-AI components and the packages needed to run our example. The main packages are Python NumPy[3], Python OpenCV[4] 4.1.x, Python Pillow[5] and the Python TensorFlow Lite Edge TPU runtime[2] 1.12.1.

 apt-get install python3-numpy python3-pillow python3-opencv
 apt-get install python3-tensorflow-lite-edgetpu
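
Optionally, you can run a short sanity check on the target to verify that the Python modules are available; this is only a minimal sketch, and the printed versions depend on your X-LINUX-AI release.

 # Sanity check: all imports must succeed on the target
 import numpy
 import cv2
 import PIL
 import tflite_edgetpu_runtime.interpreter as tflite
 print("NumPy  :", numpy.__version__)
 print("OpenCV :", cv2.__version__)
 print("Pillow :", PIL.__version__)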

3.2. Preparing the workspace on the target

Before running the inference, make sure that your .tflite model is compiled for inference on the Coral EdgeTPU. Please take a look at how to Compile your custom model and send it to the board.

 cd /usr/local/ && mkdir workspace
 cd workspace && mkdir models testdata 

After preparing the workspace on the target and sending the compiled model to the models directory of the workspace, it is time to send the associated label file and the input images to the workspace so that the inference can be executed. You can add as many .jpeg, .jpg and .png pictures as you wish, whatever their size: image processing operations are added later in the script to make the pictures fit the model input size. In this example, we use the mobilenet_v1_1.0_224_quant_edgetpu.tflite model, previously downloaded from the Coral[6] website, to classify the images.

 scp path/to/your/compiled/model/mobilenet_v1_1.0_224_quant_edgetpu.tflite root@<board_ip_address>:/usr/local/workspace/models/
 scp path/to/your/label.txt root@<board_ip_address>:/usr/local/workspace/models/
 scp path/to/your/pictures root@<board_ip_address>:/usr/local/workspace/testdata/

Now that our workspace is ready with our compiled model file, label file and some sample pictures, it is time to see how to run an inference using the Python API. We continue by creating a Python script that will be transferred to the target board via the scp command and run there. This is a very basic example that executes an inference on the Coral Edge TPU to classify images.

 gedit classify_on_stm32mp1.py

3.3. Writing the inference script

If you are familiar with running inference on TensorFlow Lite models, you can start by copying the following Python script into your classify_on_stm32mp1.py file. Otherwise, please refer to the chapter explaining the relevant parts of the script.

      #!/usr/bin/python3
      #
      # Copyright (c) 2020 STMicroelectronics. All rights reserved.
      #
      # This software component is licensed by ST under BSD 3-Clause license,
      # the "License"; You may not use this file except in compliance with the
      # License. You may obtain a copy of the License at:
      #                        opensource.org/licenses/BSD-3-Clause
       
      import numpy as np
      import tflite_edgetpu_runtime.interpreter as tflite
      from PIL import Image
      import time
      import cv2
      
      label_file = "/usr/local/workspace/models/labels.txt"
      with open(label_file, 'r') as f:
          labels = [line.strip() for line in f.readlines()]
      model_file = "/usr/local/workspace/models/mobilenet_v1_1.0_224_quant_edgetpu.tflite"
      interpreter = tflite.Interpreter(model_path = model_file, experimental_delegates = [tflite.load_delegate('libedgetpu-max.so.1.0')]) 
      interpreter.allocate_tensors()
      #Getting the model input and output details
      input_details = interpreter.get_input_details()
      output_details = interpreter.get_output_details()
      height = input_details[0]['shape'][1]
      width = input_details[0]['shape'][2]
      # Load the test image with Pillow (loaded in RGB order) and convert it to a NumPy array
      image = Image.open("/usr/local/workspace/testdata/test_image.jpg")
      nn_img_rgb = np.array(image.convert('RGB'))
      # Resize to the model input size and add a batch dimension
      nn_img_rgb_resized = cv2.resize(nn_img_rgb, (width, height))
      input_data = np.expand_dims(nn_img_rgb_resized, axis=0)
      interpreter.set_tensor(input_details[0]['index'], input_data)
      start = time.perf_counter()
      interpreter.invoke()
      inference_time = time.perf_counter() - start
      print("inference time:", inference_time)
      results = np.squeeze(interpreter.get_tensor(output_details[0]['index']))
      top_k = results.argsort()[-5:][::-1]
      for i in top_k:
          print('{0:08.6f}'.format(float(results[i]*100/255.0)) + ":", labels[i])
      print("\n")

Now that our Python script is ready to be executed, it is time to send it to the board.

 scp path/to/your/script/classify_on_stm32mp1.py root@<board_ip_address>:/usr/local/workspace/

3.4. Running the inference from the board on the Coral Edge TPU

After booting the board and connecting it to the host PC through SSH, we are ready to run the inference with the following commands:

 cd /usr/local/workspace
 python3 classify_on_stm32mp1.py

Enjoy the speed of inference on the AI hardware accelerator.

4. Explaining the relevant parts of the script

4.1. Instantiating the TensorFlow Lite interpreter

We start by loading the labels from the label file with the following lines of code:

      label_file = "/usr/local/workspace/models/labels.txt"
      with open(label_file, 'r') as f:
          labels = [line.strip() for line in f.readlines()]

It is now time to load the model and feed it to the interpreter that we instantiate using the Interpreter API[7]. When creating this interpreter, we pass a TensorFlow Lite delegate: it is simply an API that delegates part or all of the graph execution to the Edge TPU accelerator hardware. After loading the Edge TPU library inside the delegate, it is time to allocate the tensors for the graph execution through our interpreter.

     model_file = "/usr/local/workspace/models/mobilenet_v1_1.0_224_quant_edgetpu.tflite"
     interpreter = tflite.Interpreter(model_path = model_file, experimental_delegates = [tflite.load_delegate('libedgetpu-max.so.1.0')]) 
     interpreter.allocate_tensors()
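
Note that tflite.load_delegate() raises a ValueError if the Edge TPU library cannot be loaded, for instance when no Edge TPU device is plugged in. The snippet below is a minimal sketch of a graceful fallback; since an Edge TPU-compiled model cannot run on the CPU, it falls back to a CPU version of the model, whose path (cpu_model_file) is an illustrative assumption.

     # Minimal sketch: use the Edge TPU delegate when available,
     # otherwise fall back to a CPU-only model (illustrative path)
     cpu_model_file = "/usr/local/workspace/models/mobilenet_v1_1.0_224_quant.tflite"
     try:
         delegate = tflite.load_delegate('libedgetpu-max.so.1.0')
         interpreter = tflite.Interpreter(model_path=model_file, experimental_delegates=[delegate])
     except ValueError:
         print("Edge TPU delegate not available, using the CPU model instead")
         interpreter = tflite.Interpreter(model_path=cpu_model_file)
     interpreter.allocate_tensors()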

4.2. Getting the model details and processing the image

Now that the interpreter is ready to be fed with the input images, it is important to get the model details so that we can resize the image to fit the model input.

     #Getting the model input and output details
     input_details = interpreter.get_input_details()
     output_details = interpreter.get_output_details()
     height = input_details[0]['shape'][1]
     width = input_details[0]['shape'][2]
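
input_details and output_details are lists of dictionaries describing each tensor. Printing them is a convenient way to check what the model expects: for the quantized MobileNet model used here, the input is a uint8 tensor of shape [1, 224, 224, 3].

     # Print the tensor descriptions exposed by the interpreter
     print("input shape :", input_details[0]['shape'])
     print("input dtype :", input_details[0]['dtype'])
     print("output shape:", output_details[0]['shape'])
     print("output dtype:", output_details[0]['dtype'])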

It is time to load an image from our testdata directory. The image is opened with Pillow (which loads pictures in RGB order), converted to a NumPy array, resized to fit the model input size, and then expanded with a batch dimension. A sketch showing how to process every picture of the testdata directory is given after the code below.

     image = Image.open("/usr/local/workspace/testdata/test_image.jpg")
     nn_img_rgb = np.array(image.convert('RGB'))
     nn_img_rgb_resized = cv2.resize(nn_img_rgb, (width, height))
     input_data = np.expand_dims(nn_img_rgb_resized, axis=0)
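
The script classifies a single hard-coded test image. To process every picture placed in the testdata directory, a minimal sketch reusing the width and height values obtained above could look like this; the loop is illustrative and not part of the original script.

     import os
     test_dir = "/usr/local/workspace/testdata"
     for picture in sorted(os.listdir(test_dir)):
         if not picture.lower().endswith(('.jpg', '.jpeg', '.png')):
             continue
         # Same preprocessing as above, applied to every picture of the directory
         nn_img_rgb = np.array(Image.open(os.path.join(test_dir, picture)).convert('RGB'))
         nn_img_rgb_resized = cv2.resize(nn_img_rgb, (width, height))
         input_data = np.expand_dims(nn_img_rgb_resized, axis=0)
         # input_data is then fed to the interpreter as shown in the next section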

4.3. Invoking the interpreter and displaying results

Now that our input data has been processed to fit the model input size, it is time to feed the image to the interpreter input and launch the inference. We use the time module to record the duration of each inference; this measurement gives a good idea of the Edge TPU performance compared to the CPU.

     interpreter.set_tensor(input_details[0]['index'], input_data)
     start = time.perf_counter()
     interpreter.invoke()
     inference_time = time.perf_counter() - start
     print("inference time:", inference_time)
     results = np.squeeze(interpreter.get_tensor(output_details[0]['index']))
     top_k = results.argsort()[-5:][::-1]
     for i in top_k:
         print('{0:08.6f}'.format(float(results[i]*100/255.0)) + ":", labels[i])
     print("\n")
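
The expression results[i]*100/255.0 converts the raw uint8 score into a percentage, which works because this quantized model outputs values between 0 and 255. A more generic approach, sketched below, is to dequantize the output with the scale and zero point exposed in output_details before printing the top five results.

     # Minimal sketch: dequantize the output using the model quantization parameters
     # (assumes a quantized uint8 output tensor, as in this example)
     scale, zero_point = output_details[0]['quantization']
     probabilities = (results.astype(np.float32) - zero_point) * scale
     for i in probabilities.argsort()[-5:][::-1]:
         print('{0:08.6f}: {1}'.format(probabilities[i] * 100, labels[i]))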

5. References