How to measure the performance of NBG-based models

Revision as of 12:15, 6 December 2023 by Registered User
Applicable for STM32MP25x lines

This article describes how to measure the performance of a network binary graph (NBG) generated from your neural network (NN) model using the NBG benchmark tool on STM32MP25x platforms.

Information
If you have difficulty generating the network binary graph from your neural network model, refer to STM32AI-MPU: NN deployment tool for STM32MP25.

1. Description

The NBG benchmark tool runs performance measurements on a network binary graph once it has been generated from your NN model. The main benchmark metrics are:

  1. Inference time: the time a trained deep learning model takes to produce a prediction from new input data.
  2. MAC utilization: the percentage of the NPU's MAC (multiply-accumulate) units used by the network binary graph. This metric indicates how well the model exploits the computing capacity of the NPU hardware accelerator.
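
As a rough illustration of how the two metrics relate, the sketch below converts a model's MAC count and a measured inference time into an achieved MAC rate, then into a utilization percentage. The formula and the `peak_macs_per_cycle` parameter are assumptions made for illustration only; the exact computation performed by the NBG benchmark tool is not described in this article.

```python
def achieved_mac_rate(case_mmac: float, inference_ms: float) -> float:
    """Return the achieved MAC operations per second.

    case_mmac:    model cost in millions of multiply-accumulate operations
    inference_ms: measured time for one inference, in milliseconds
    """
    return (case_mmac * 1e6) / (inference_ms * 1e-3)


def mac_utilization(case_mmac: float, inference_ms: float,
                    npu_freq_hz: float, peak_macs_per_cycle: float) -> float:
    """Return an estimated MAC utilization in percent.

    peak_macs_per_cycle is a hypothetical hardware figure supplied by the
    caller; it is NOT taken from this article.
    """
    peak_rate = npu_freq_hz * peak_macs_per_cycle  # MAC/s if every unit were busy
    return 100.0 * achieved_mac_rate(case_mmac, inference_ms) / peak_rate


if __name__ == "__main__":
    # Made-up values chosen only to show the arithmetic.
    print(f"{mac_utilization(100, 10.0, 1e9, 100):.2f} %")
```

A low percentage means the model spends most of its time on something other than MAC operations (for example memory transfers), not that the measurement is wrong.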

2. Installation

2.1. Installing from the OpenSTLinux AI package repository

Warning
The software package is provided AS IS, and by downloading it, you agree to be bound to the terms of the software license agreement (SLA0048). The detailed content licenses can be found here.

After configuring the AI OpenSTLinux package, install the X-LINUX-AI components for this application:

 apt-get install nbg-benchmark

3. How to use the NBG benchmark tool

3.1. Executing with the command line

The nbg_benchmark tool binary is located in the userfs partition:

/usr/local/bin/nbg-benchmark-*/tools/nbg_benchmark

It accepts the following input parameters:

Usage: /usr/local/bin/nbg-benchmark-*/tools/nbg_benchmark  -m <nbg_file .nb> -i <input_file .tensor/.txt> -c <int case_mmac> -l <int nb_loops>

-m --nb_file <.nb file path>:               .nb network binary file to be benchmarked.
-i --input_file <.tensor/.txt/ file path>:  Input file to be used for benchmark (maximum 32 input files).
-c --case_mmac <int>:                        Theorical value of MMAC (Million Multiply Accumulate) of the model.
-l --loops <int>:                           The number of loops of the inference (default loops=1)
--help:                                     Show this help

Only the .nb file parameter (-m) is mandatory to run the benchmark. Optionally, you can pass the .tensor file generated by the STM32AI-MPU: NN deployment tool for STM32MP25 as the benchmark input (-i).
If you know the case MMAC of your model (its theoretical number of multiply-accumulate operations, in millions), pass it with -c. If it is omitted, the MAC utilization computation is skipped and only the inference time is measured.
Finally, you can set the loops argument (-l) to run the model several times. It defaults to 1.
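
To script repeated runs, the options above can be assembled programmatically. The following is a minimal sketch, not part of X-LINUX-AI: the helper names `build_command` and `run_benchmark` are hypothetical, and only the `-m`, `-i`, `-c` and `-l` flags listed in the usage text are used. It assumes Python 3 is available on the target and that the tool writes its report to standard output or standard error.

```python
import glob
import subprocess

# Path pattern from this article; '*' stands for the installed version.
NBG_BENCHMARK_GLOB = "/usr/local/bin/nbg-benchmark-*/tools/nbg_benchmark"


def build_command(binary, nb_file, input_file=None, case_mmac=None, loops=None):
    """Build the argument list; only the .nb file (-m) is mandatory."""
    cmd = [binary, "-m", nb_file]
    if input_file is not None:
        cmd += ["-i", input_file]       # optional .tensor/.txt input
    if case_mmac is not None:
        cmd += ["-c", str(case_mmac)]   # enables the MAC utilization report
    if loops is not None:
        cmd += ["-l", str(loops)]       # the tool defaults to 1 loop
    return cmd


def run_benchmark(nb_file, **options):
    """Locate the installed binary, run it, and return its console output."""
    matches = glob.glob(NBG_BENCHMARK_GLOB)
    if not matches:
        raise FileNotFoundError("nbg_benchmark not found; is nbg-benchmark installed?")
    cmd = build_command(matches[0], nb_file, **options)
    # Assumption: merge stderr into stdout so the report is captured either way.
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=True)
    return result.stdout
```

For example, `build_command("nbg_benchmark", "model.nb", case_mmac=149, loops=10)` yields the same arguments as passing `-m model.nb -c 149 -l 10` by hand.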

3.2. Testing with NBG based on MobileNet v3

The model used for testing is mobilenet_v3_large_100_224_quant.nb, a MobileNetV3 Large model that has been converted to a network binary graph to run on the NPU.
It can be installed from the following package:

 apt-get install nbg-models-mobilenetv3

On the target, the model is located here:

/usr/local/demo-ai/image-classification/models/mobilenet/

To launch the benchmark tool, use the following command:

   /usr/local/bin/nbg-benchmark-*/tools/nbg_benchmark -m /usr/local/demo-ai/image-classification/models/mobilenet/mobilenet_v3_large_100_224_quant.nb -l 10 -c 149

Console output:

/usr/local/bin/nbg-benchmark-*/tools/nbg_benchmark -m /usr/local/demo-ai/image-classification/models/mobilenet/mobilenet_v3_large_100_224_quant.nb  -l 10 -c 149
Info: Network binary file set to: /usr/local/demo-ai/image-classification/models/mobilenet/mobilenet_v3_large_100_224_quant.nb
Info: Executing  10 inference(s) during this benchmark.
Info: Using 149 as a case MMAC for /usr/local/demo-ai/image-classification/models/mobilenet/mobilenet_v3_large_100_224_quant.nb model.
Info: Verifying graph...
Info: Verifying graph took: 22ms or 22887us
Info: Copied a buffer of a size of 150528 to tensor.
Info: NPU running at frequency: 799984787Hz.
Info: Started running the graph [10] loops ...
Info: MAC utilization is 1.57% with caseMAC set to 149 Million of MAC
Info: Loop:10,Average: 15.41 ms or 15412.21 us
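
When comparing several models, it can help to extract the two metrics from the console output automatically. The sketch below parses the last two lines of the log shown above with regular expressions. The function name `parse_benchmark_log` is hypothetical, and the patterns assume the log format stays exactly as printed in this example.

```python
import re

# Patterns derived from the sample console output in this article.
UTILIZATION_RE = re.compile(r"MAC utilization is\s+([\d.]+)%")
AVERAGE_RE = re.compile(r"Loop:\s*(\d+),\s*Average:\s*([\d.]+)\s*ms")


def parse_benchmark_log(log: str) -> dict:
    """Extract MAC utilization and average inference time from the log.

    Keys are present only if the matching line was found; the MAC
    utilization line is absent when -c is not given.
    """
    results = {}
    match = UTILIZATION_RE.search(log)
    if match:
        results["mac_utilization_percent"] = float(match.group(1))
    match = AVERAGE_RE.search(log)
    if match:
        results["loops"] = int(match.group(1))
        results["average_ms"] = float(match.group(2))
    return results


SAMPLE_LOG = """\
Info: MAC utilization is 1.57% with caseMAC set to 149 Million of MAC
Info: Loop:10,Average: 15.41 ms or 15412.21 us
"""

if __name__ == "__main__":
    print(parse_benchmark_log(SAMPLE_LOG))
```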