How to measure machine learning model power consumption with STM32Cube.AI generated application

Revision as of 17:42, 4 August 2021 by Registered User

This article describes how to modify the system performance application generated by STM32Cube.AI so that power and energy measurements run in an optimal configuration.

The system performance application automatically runs inferences of a machine learning model (a neural network or a traditional machine learning model) converted by STM32Cube.AI, and measures the inference time directly on the target. It can also be used to measure power consumption. However, the default settings are not optimal for measuring the processing alone, because the measurement then also includes the consumption of peripherals and the leakage on unused GPIOs. This article uses the NUCLEO-L4R5ZI as an example, but the process can be adapted to any board supported by STM32Cube.AI.
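To relate the inference time reported by the application to a power measurement, the energy of one inference can be estimated as supply voltage times average current times duration. The following is a minimal sketch of that arithmetic; the function name and the units chosen are assumptions made for illustration and are not part of the generated application.

```c
#include <assert.h>

/* Hypothetical helper: energy of one inference in millijoules.
 *   supply_v       supply voltage in volts
 *   avg_current_ma average current measured during the inference, in mA
 *   inference_ms   inference duration reported by the application, in ms
 */
double energy_per_inference_mj(double supply_v, double avg_current_ma,
                               double inference_ms)
{
    assert(supply_v >= 0.0 && avg_current_ma >= 0.0 && inference_ms >= 0.0);

    double power_mw = supply_v * avg_current_ma; /* V * mA = mW */
    double energy_uj = power_mw * inference_ms;  /* mW * ms = uJ */
    return energy_uj / 1000.0;                   /* uJ -> mJ */
}
```

For instance, with made-up values of 3.3 V, 10 mA and 50 ms, `energy_per_inference_mj(3.3, 10.0, 50.0)` corresponds to 33 mW during 50 ms, that is 1.65 mJ.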

Information
  • STM32Cube.AI is a software tool that generates optimized C code for neural network inference on STM32. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).

1. Prerequisites

1.1. Hardware

1.2. Software

2. Project generation

2.1. Loading a pre-defined ioc

The next section describes how to generate the project from scratch in STM32CubeMX. Pre-defined STM32CubeMX project files (ioc) for some boards will soon be provided on our GitHub. You can load one of them and then go directly to the Import your model section. To load an ioc, select File / Load Project:

2.2. Create a new project

Open STM32CubeMX and start a project using the board selector:

Select the board to use (in our case the NUCLEO-L4R5ZI) and create the project without initializing the peripherals in their default mode:

2.3. Add X-CUBE-AI software pack

Select the X-CUBE-AI software pack with its Core and System Performance application components:

Click on X-CUBE-AI software pack:

If the peripheral parameters are not set for best performance by default, the tool displays a warning. Select Yes to make sure the maximum frequency is used.

X-CUBE-AI sets the default parameters for best performance and configures the UART used to report the results.

You can check which UART X-CUBE-AI uses to communicate with the board. It is the UART connected to the embedded ST-LINK, which the PC sees as a Virtual COM Port once connected over USB. To do so, open the Platform Settings panel:

For the NUCLEO-L4R5ZI it is the LPUART1. You can also check the settings used by X-CUBE-AI for the specified UART in the Connectivity panel / Parameter Settings:

These settings must be used in the terminal emulator on the PC to communicate with the board while the system performance application is running.
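If you prefer to read the reports from your own host program rather than from a terminal emulator, the same settings have to be applied to the Virtual COM Port. The sketch below shows one possible way to do it on a POSIX host with termios. The function name `configure_8n1` is hypothetical, and the 8 data bits, no parity, 1 stop bit framing is an assumption: use the values actually shown in the Parameter Settings panel.

```c
#include <termios.h>

/* Hypothetical helper: fill a termios structure for raw 8N1 communication
 * at the given baud rate. 8N1 is an assumption; adapt it to the values
 * shown in the Parameter Settings panel. Returns 0 on success, -1 on error. */
int configure_8n1(struct termios *tio, speed_t baud)
{
    /* 8 data bits, no parity, 1 stop bit, receiver on, ignore modem lines */
    tio->c_cflag &= ~(tcflag_t)(PARENB | CSTOPB | CSIZE);
    tio->c_cflag |= CS8 | CLOCAL | CREAD;

    /* Raw mode: no software flow control, no line editing, no translation */
    tio->c_iflag &= ~(tcflag_t)(IXON | IXOFF | ICRNL | INLCR);
    tio->c_lflag &= ~(tcflag_t)(ICANON | ECHO | ECHOE | ISIG);
    tio->c_oflag &= ~(tcflag_t)OPOST;

    if (cfsetispeed(tio, baud) != 0) {
        return -1;
    }
    return cfsetospeed(tio, baud);
}
```

A typical use would be to open the port, read its current attributes with `tcgetattr()`, call `configure_8n1()` with a constant such as `B115200` (assumed here, not taken from this article), and apply the result with `tcsetattr()`. The device name of the port depends on the host system.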

Identify the GPIOs used by the UART in the GPIO Settings panel:

For the NUCLEO-L4R5ZI, they are PG7 and PG8 (pins 7 and 8 of bank G).

2.4. Check / modify clock configuration

If needed, you can also open and modify the clock configuration, for instance to select a specific HCLK frequency or to change the clock source (on the NUCLEO-L4R5ZI, for instance, from HSI to MSI). To make the GPIO setup easier, it is recommended to select HSI as the clock source. If HSE (external clock) is selected, make sure not to reset the GPIOs connected to the external crystal, RCC_OSC_IN and RCC_OSC_OUT. Note that on Nucleo boards the HSE crystal is generally not mounted by default. You can also check the clock settings in the System Core / RCC panel, especially the Power Regulator Voltage Scale (see the Important notes section for the STM32H747 case).

When using SMPS for power supply, make sure the right power regulator is selected for the right frequency.
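The dependency between the HCLK frequency and the voltage scale can be sketched as a simple lookup. The thresholds below (26 MHz for Range 2, 80 MHz for Range 1 normal mode, 120 MHz for Range 1 boost mode) are the values assumed here for the STM32L4R5 and must be checked against the reference manual of your device. The type and function names are hypothetical and do not come from the generated code.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    VOS_RANGE2,        /* lowest consumption, lowest maximum frequency */
    VOS_RANGE1_NORMAL,
    VOS_RANGE1_BOOST,
    VOS_UNSUPPORTED    /* frequency above the assumed device maximum */
} vos_range_t;

/* Return the lowest voltage scale that still supports the requested HCLK.
 * Thresholds are assumptions for the STM32L4R5; verify them for your MCU. */
vos_range_t lowest_vos_for_hclk(uint32_t hclk_hz)
{
    if (hclk_hz <= 26000000u) {
        return VOS_RANGE2;
    }
    if (hclk_hz <= 80000000u) {
        return VOS_RANGE1_NORMAL;
    }
    if (hclk_hz <= 120000000u) {
        return VOS_RANGE1_BOOST;
    }
    return VOS_UNSUPPORTED;
}
```

Running at a lower frequency with a lower voltage scale generally reduces the current drawn, at the cost of a longer inference time, so both settings should be noted together with each measurement.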

2.5. Reset unnecessary GPIOs

In the "Pinout & Configuration" view, reset all the unused GPIOs. All pins can be put in reset state except the STLINK_RX and STLINK_TX UART pins (PG7 and PG8 on the NUCLEO-L4R5ZI, configured by X-CUBE-AI), NRST, the voltage pins and, only if the HSE is selected, RCC_OSC_IN and RCC_OSC_OUT. On the NUCLEO-L4R5ZI example, it means going from the following configuration:

to the following one:
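As a complement to the pinout view, the selection rule for one GPIO bank can be written as a bit mask: every pin of the bank is a candidate for reset except the ones that must be kept. The sketch below assumes a 16-pin bank with one bit per pin; the function name is hypothetical and the code is not part of the generated project.

```c
#include <stdint.h>

/* Hypothetical helper: bit mask of the pins of one 16-pin GPIO bank that
 * can be put in reset state, that is all pins except those listed in
 * keep[]. Bit n of the result corresponds to pin n of the bank. */
uint16_t resettable_pins(const uint8_t *keep, unsigned keep_count)
{
    uint16_t mask = 0xFFFFu;

    for (unsigned i = 0; i < keep_count; ++i) {
        if (keep[i] < 16u) { /* ignore invalid pin numbers */
            mask &= (uint16_t)~(1u << keep[i]);
        }
    }
    return mask;
}
```

For bank G of the NUCLEO-L4R5ZI, keeping pins 7 and 8 (the UART pins PG7 and PG8) gives the mask `0xFE7F`.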

2.6. Import your model in X-CUBE-AI

As usual with X-CUBE-AI, import the model you want to analyze:

To optimize RAM usage, it is advised to select the "Use activation buffer for input buffer" and "Use activation buffer for the output buffer" options in the Advanced Settings panel:
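The effect of these two options on RAM can be approximated with a simplified model: with separate buffers the three sizes add up, whereas with shared buffers the input and output are placed inside the activation buffer. This is only a rough sketch under that assumption; the names are hypothetical, and the actual figures are the ones reported by X-CUBE-AI when the model is analyzed.

```c
#include <stddef.h>

/* Hypothetical description of the buffers of one model, in bytes. */
typedef struct {
    size_t activations;
    size_t input;
    size_t output;
} buffer_sizes_t;

/* Both options unchecked: input and output have their own buffers. */
size_t ram_separate(buffer_sizes_t s)
{
    return s.activations + s.input + s.output;
}

/* Both options checked (simplified model): input and output live inside
 * the activation buffer, which must be at least as large as each of them. */
size_t ram_shared(buffer_sizes_t s)
{
    size_t need = s.activations;

    if (s.input > need) {
        need = s.input;
    }
    if (s.output > need) {
        need = s.output;
    }
    return need;
}
```

With made-up sizes of 20480 bytes of activations, 4096 bytes of input and 1024 bytes of output, this model gives 25600 bytes with separate buffers and 20480 bytes with shared buffers.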