This article describes how to measure the performance of a TensorFlow Lite neural network model on STM32 MPU platforms.
1. Installation
1.1. Installing from the OpenSTLinux AI package repository
After having configured the AI OpenSTLinux package repository, install the X-LINUX-AI components for this application. The minimum package required is:
x-linux-ai -i tensorflow-lite-tools
The model used in this example can be installed from the following package:
x-linux-ai -i img-models-mobilenetv2-10-224
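As an optional sanity check, you can verify that the benchmark binary and the model file are present on the target; the paths below are the ones used later in this article, and the exact tensorflow-lite-* directory name depends on the installed version:
ls -d /usr/local/bin/tensorflow-lite-*/tools/benchmark_model
ls /usr/local/x-linux-ai/image-classification/models/mobilenet/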
2. How to use the Benchmark application
2.1. Executing with the command line
The benchmark_model C/C++ application is located in the userfs partition:
/usr/local/bin/tensorflow-lite-*/tools/benchmark_model
It accepts the following input parameters:
usage: ./benchmark_model <flags>
Flags:
--num_runs=50 int32 optional expected number of runs, see also min_secs, max_secs
--min_secs=1 float optional minimum number of seconds to rerun for, potentially s
--max_secs=150 float optional maximum number of seconds to rerun for, potentially .
--run_delay=-1 float optional delay between runs in seconds
--run_frequency=-1 float optional Execute at a fixed frequency, instead of a fixed del.
--num_threads=-1 int32 optional number of threads
--use_caching=false bool optional Enable caching of prepacked weights matrices in matr.
--benchmark_name= string optional benchmark name
--output_prefix= string optional benchmark output prefix
--warmup_runs=1 int32 optional minimum number of runs performed on initialization, s
--warmup_min_secs=0.5 float optional minimum number of seconds to rerun for, potentially s
--verbose=false bool optional Whether to log parameters whose values are not set. .
--dry_run=false bool optional Whether to run the tool just with simply loading the.
--report_peak_memory_footprint=false bool optional Report the peak memory footprint by periodically che.
--memory_footprint_check_interval_ms=50 int32 optional The interval in millisecond between two consecutive .
--graph= string optional graph file name
--input_layer= string optional input layer names
--input_layer_shape= string optional input layer shape
--input_layer_value_range= string optional A map-like string representing value range for *inte.
--input_layer_value_files= string optional A map-like string representing value file. Each item.
--allow_fp16=false bool optional allow fp16
--require_full_delegation=false bool optional require delegate to run the entire graph
--enable_op_profiling=false bool optional enable op profiling
--max_profiling_buffer_entries=1024 int32 optional max profiling buffer entries
--profiling_output_csv_file= string optional File path to export profile data as CSV, if not set .
--print_preinvoke_state=false bool optional print out the interpreter internals just before call.
--print_postinvoke_state=false bool optional print out the interpreter internals just before benc.
--release_dynamic_tensors=false bool optional Ensure dynamic tensor's memory is released when they.
--help=false bool optional Print out all supported flags if true.
--num_threads=-1 int32 optional number of threads used for inference on CPU.
--max_delegated_partitions=0 int32 optional Max number of partitions to be delegated.
--min_nodes_per_partition=0 int32 optional The minimal number of TFLite graph nodes of a partit.
--delegate_serialize_dir= string optional Directory to be used by delegates for serializing an.
--delegate_serialize_token= string optional Model-specific token acting as a namespace for deleg.
--external_delegate_path= string optional The library path for the underlying external.
--external_delegate_options= string optional A list of comma-separated options to be passed to th.
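As an example of combining these flags, op-level profiling can be enabled to obtain a per-operator breakdown of the inference time. The command below is a sketch using the --graph and --enable_op_profiling flags listed above, with the MobileNet model installed earlier; adapt the model path to the model you want to benchmark:
/usr/local/bin/tensorflow-lite-*/tools/benchmark_model --graph=/usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.tflite --enable_op_profiling=true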
2.2. Testing with MobileNet
The model used for testing is mobilenet_v2_1.0_224_int8_per_tensor.tflite, downloaded from the STM32 AI model zoo[1]. It is an image classification model.
On the target, the model is located here:
/usr/local/x-linux-ai/image-classification/models/mobilenet/
2.2.1. Benchmark on CPU
The easiest way to use the benchmark is to run it on the CPU.
To do this, run the benchmark with at least the --graph option. To go a little further, it is worth setting --num_threads to the number of CPU cores to improve performance. Here is the command to execute:
/usr/local/bin/tensorflow-lite-*/tools/benchmark_model --graph=/usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.tflite --num_threads=2
Console output:
STARTING!
Log parameter values verbosely: [0]
Num threads: [2]
Graph: [/usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.tflite]
#threads used for CPU inference: [2]
Loaded model /usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
The input model file size (MB): 3.59541
Initialized session in 273.952ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=5 first=133187 curr=119112 min=119112 max=133187 avg=122056 std=5566
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=120156 curr=119232 min=119081 max=128264 avg=119760 std=1422
Inference timings in us: Init: 273952, First inference: 133187, Warmup (avg): 122056, Inference (avg): 119760
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=13.4102 overall=19.6641
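Beyond the raw timings, the benchmark can also report the peak memory footprint of the model. The command below is a sketch adding the --report_peak_memory_footprint flag listed above to the same run; as noted in the console output, the reported figures remain approximate because the tool itself affects the memory footprint:
/usr/local/bin/tensorflow-lite-*/tools/benchmark_model --graph=/usr/local/x-linux-ai/image-classification/models/mobilenet/mobilenet_v2_1.0_224_int8_per_tensor.tflite --num_threads=2 --report_peak_memory_footprint=true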
3. References