How to measure machine learning model power consumption with STM32Cube.AI generated application

Revision as of 18:00, 4 August 2021 by Registered User

This article describes how to modify the system performance application generated by STM32Cube.AI so that power and energy measurements can be run in an optimal configuration.

The system performance application automatically runs inferences of a machine learning model (neural network or traditional machine learning model) generated by STM32Cube.AI. It measures the inference time directly on the target, and it can also be used to measure power consumption. However, the default settings are not optimal for measuring the processing alone, excluding the peripherals and the leakage on unused GPIOs. This article uses the NUCLEO-L4R5ZI as an example, but the process can be adapted to any board supported by STM32Cube.AI.

Information
  • STM32Cube.AI is a software tool that generates optimized C code for neural network inference on STM32. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).

1. Prerequisites

1.1. Hardware

1.2. Software

2. Project generation

2.1. Loading a pre-defined ioc

The next section describes how to generate the project starting from STM32CubeMX. Pre-defined STM32CubeMX project files (ioc) for some boards will soon be provided on our GitHub. You can load one of them and then go directly to the Import your model in X-CUBE-AI section. To load an ioc, select File / Load Project:

2.2. Create a new project

Open STM32CubeMX and start a project using the board selector:

Select the board to use (the NUCLEO-L4R5ZI in this example) and create the project without initializing the peripherals in their default mode:

2.3. Add X-CUBE-AI software pack

Select X-CUBE-AI software pack Core and System Performance application:

Click on X-CUBE-AI software pack:

If the peripheral parameters are not set for the best performance by default, the tool warns you. Select Yes to make sure that the maximum frequency is used.

X-CUBE-AI sets the default parameters for the best performance and configures the UART used to report the performance results.

You can check which UART X-CUBE-AI uses to communicate with the board. It is the UART connected to the embedded ST-LINK, which the PC sees as a Virtual COM Port once connected over USB. To do so, open the Platform Settings panel:

For the NUCLEO-L4R5ZI, it is LPUART1. You can also check the settings used by X-CUBE-AI for the specified UART in the Connectivity panel / Parameter Settings:

Use these settings in the terminal emulator on the PC to communicate with the board when the system performance application is running.

Identify the GPIOs used by the UART in the GPIO Settings panel:

For the NUCLEO-L4R5ZI, these are PG7 and PG8 (pins 7 and 8 of bank G).

2.4. Check / modify clock configuration

You can also open and modify the clock configuration if needed, for instance to select a specific HCLK frequency or to change the clock source (on the NUCLEO-L4R5ZI, for instance, from HSI to MSI). For convenience when setting the GPIOs, it is recommended to select HSI as the clock source. If HSE (external clock) is selected, make sure not to reset the GPIOs connected to the external crystal, RCC_OSC_IN and RCC_OSC_OUT. Note that on Nucleo boards the HSE crystal is generally not mounted by default. You can also check the clock settings in the System Core / RCC panel, especially the Power Regulator Voltage Scale (see Important notes section for STM32H747 case).

When using SMPS for power supply, make sure the right power regulator is selected for the right frequency.
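The link between the HCLK frequency and the Power Regulator Voltage Scale can be sketched as a small lookup. This is only an illustration: the function name `select_vos_range` and the enum are hypothetical, and the frequency limits (26 MHz for Range 2, 80 MHz for Range 1 normal, 120 MHz for Range 1 boost) are assumed for the STM32L4+ series and must be checked against the reference manual of the device actually used. SMPS-specific constraints are not covered.

```c
#include <stdint.h>

/* Hypothetical labels for this sketch. On target they would correspond to
   the HAL constants PWR_REGULATOR_VOLTAGE_SCALE1_BOOST,
   PWR_REGULATOR_VOLTAGE_SCALE1 and PWR_REGULATOR_VOLTAGE_SCALE2. */
typedef enum {
  VOS_RANGE1_BOOST,
  VOS_RANGE1_NORMAL,
  VOS_RANGE2,
  VOS_UNSUPPORTED
} vos_range_t;

/* Return the lowest-power voltage range that still supports hclk_hz.
   Limits assumed for STM32L4+ (check the reference manual):
   Range 2 up to 26 MHz, Range 1 normal up to 80 MHz,
   Range 1 boost up to 120 MHz. */
vos_range_t select_vos_range(uint32_t hclk_hz)
{
  if (hclk_hz <= 26000000u)
    return VOS_RANGE2;
  if (hclk_hz <= 80000000u)
    return VOS_RANGE1_NORMAL;
  if (hclk_hz <= 120000000u)
    return VOS_RANGE1_BOOST;
  return VOS_UNSUPPORTED;
}
```

Running the model at a lower HCLK in a lower voltage range generally reduces the current but lengthens the inference, so both the power and the inference time should be recorded for each configuration.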

2.5. Reset unnecessary GPIOs

In the “Pinout & Configuration” view, reset all the unused GPIOs. All pins can be put in reset state except the STLINK_RX and STLINK_TX UART pins (PG7 and PG8 for the NUCLEO-L4R5ZI, configured by X-CUBE-AI), NRST, the voltage pins and, only if HSE is selected, RCC_OSC_IN and RCC_OSC_OUT. On the NUCLEO-L4R5ZI example, it means going from the following configuration:

to the following one:

2.6. Import your model in X-CUBE-AI

As usual with X-CUBE-AI, import the model you want to analyze:

To optimize the RAM usage, it is advised to select the “Use activation buffer for input buffer” and “Use activation buffer for the output buffer” options in the Advanced Settings panel:

You can also run an Analyze to get the memory footprint of the model:


2.7. System Performance code generation

Open the Project Manager tab and specify a project name. Select the IDE; this example uses STM32CubeIDE:

Select the Code Generator tab and check the option “Set all free pins as analog (to optimize the power consumption)”:

3. Project modification

Once the project is generated, you can open it with STM32CubeIDE:

The ioc can be reused to recover the settings in STM32CubeMX and to modify the project or generate new projects with new models, without the need to reconfigure everything. A set of pre-configured iocs will soon be available on our GitHub. You can do a first build of the project to check the generation.

Add the following functions for the NUCLEO-L4R5ZI to main.c (located in Core/Src), between the two tags “/* USER CODE BEGIN 0 */” and “/* USER CODE END 0 */”:

This snippet is provided AS IS, and by taking it, you agree to be bound to the license terms that can be found here for the component: Application.
/* USER CODE BEGIN 0 */
/**
  * @brief Disable the clock of all GPIOs
  * @param None
  * @retval None
  */
void MX_GPIO_Disable(void)
{
  __HAL_RCC_GPIOA_CLK_DISABLE();
  __HAL_RCC_GPIOB_CLK_DISABLE();
  __HAL_RCC_GPIOC_CLK_DISABLE();
  __HAL_RCC_GPIOD_CLK_DISABLE();
  __HAL_RCC_GPIOE_CLK_DISABLE();
  __HAL_RCC_GPIOF_CLK_DISABLE();
  __HAL_RCC_GPIOG_CLK_DISABLE();
  __HAL_RCC_GPIOH_CLK_DISABLE();
}

/**
  * @brief Disable the VCOM UART
  * @param None
  * @retval None
  */
void MX_UARTx_DeInit(void)
{
  HAL_UART_DeInit(&hlpuart1);
  GPIO_InitTypeDef GPIO_InitStruct = {0};

  /*Configure GPIO pins : PG7, PG8 */
  GPIO_InitStruct.Pin = GPIO_PIN_7|GPIO_PIN_8;
  GPIO_InitStruct.Mode = GPIO_MODE_ANALOG;
  GPIO_InitStruct.Pull = GPIO_NOPULL;
  HAL_GPIO_Init(GPIOG, &GPIO_InitStruct);
}
/* USER CODE END 0 */

For boards other than the NUCLEO-L4R5ZI, you need to adapt:
  • The number and identification of the GPIO banks. You can simply refer to the function MX_GPIO_Init in the main.c file, which enables the clock of all the GPIOs to set them to analog mode.
  • The UART handle used for the Virtual COM Port. It is also specified in main.c, where the UART is configured and enabled. Refer to the “Private variables” and the UART_HandleTypeDef used; for the NUCLEO-L4R5ZI: UART_HandleTypeDef hlpuart1;
  • The GPIOs used for the UART communication, identified earlier in the GPIO Settings panel.

Because the functions are placed between the tags /* USER CODE BEGIN 0 */ and /* USER CODE END 0 */, the code is kept even if you regenerate the project, for instance to test another model.
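One possible way to avoid editing the list of GPIO banks by hand for each board is to guard every call with a preprocessor test. The sketch below assumes that the HAL of the selected series defines the `__HAL_RCC_GPIOx_CLK_DISABLE` helpers as macros only for the banks the device provides, which should be verified in the HAL RCC headers of the project. The name `MX_GPIO_DisableAvailable` and the returned bank count are hypothetical additions for illustration, not part of the generated code.

```c
/* Sketch: gate the clock of every GPIO bank known to the HAL in use.
   Meant to be pasted into main.c, where the HAL headers are already
   included through main.h. A bank whose macro is not defined is skipped.
   Returns the number of banks whose clock was gated. */
int MX_GPIO_DisableAvailable(void)
{
  int banks = 0;

#ifdef __HAL_RCC_GPIOA_CLK_DISABLE
  __HAL_RCC_GPIOA_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOB_CLK_DISABLE
  __HAL_RCC_GPIOB_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOC_CLK_DISABLE
  __HAL_RCC_GPIOC_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOD_CLK_DISABLE
  __HAL_RCC_GPIOD_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOE_CLK_DISABLE
  __HAL_RCC_GPIOE_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOF_CLK_DISABLE
  __HAL_RCC_GPIOF_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOG_CLK_DISABLE
  __HAL_RCC_GPIOG_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOH_CLK_DISABLE
  __HAL_RCC_GPIOH_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOI_CLK_DISABLE
  __HAL_RCC_GPIOI_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOJ_CLK_DISABLE
  __HAL_RCC_GPIOJ_CLK_DISABLE(); banks++;
#endif
#ifdef __HAL_RCC_GPIOK_CLK_DISABLE
  __HAL_RCC_GPIOK_CLK_DISABLE(); banks++;
#endif

  return banks;
}
```

Compiled without the HAL headers (for instance on a PC), none of the macros is defined and the function simply returns 0. On target, it should be called after the UART pins have been set to analog mode, because reconfiguring a pin requires the clock of its GPIO bank to still be enabled.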

Modify the file app_x-cube-ai.c (located in X-CUBE-AI/App). Replace the full function ai_mnetwork_run:

This snippet is provided AS IS, and by taking it, you agree to be bound to the license terms that can be found here for the component: Application.
AI_API_ENTRY
ai_i32 ai_mnetwork_run(ai_handle network, const ai_buffer* input,
        ai_buffer* output) {}
By:
extern void MX_UARTx_DeInit(void);
extern void MX_GPIO_Disable(void);
#define AI_MIN_LOOP 16

AI_API_ENTRY
ai_i32 ai_mnetwork_run(ai_handle network, const ai_buffer* input, ai_buffer* output)
{
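  struct network_instance* inn;
  static ai_i32 Counter = 0; /* number of inferences already run */

  /* Retrieve the network instance from the handle, as the generated
     function being replaced does (check it in your generated file) */
  inn = ai_mnetwork_handle((struct network_instance *)network);
  if (inn == NULL)
    return 0;
  if (Counter < AI_MIN_LOOP)
  {
    /* Regular inferences, used by the application to report the timing */
    Counter++;
    return inn->entry->ai_run(inn->handle, input, output);
  }
  else
  {
    /* Last message sent before the UART is shut down */
    printf("\nStarting infinite power measurement loop\n");
    MX_UARTx_DeInit();
    MX_GPIO_Disable();
    /* Run inferences forever so that the current can be measured */
    while(1)
    {
      inn->entry->ai_run(inn->handle, input, output);
    }
  }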
}

Build and load the firmware into the STM32 as usual using STM32CubeIDE or STM32CubeProgrammer.

The system performance application runs a full loop of 16 inferences to get the average inference time as usual. It then disables the UART and the GPIO clocks and enters an infinite loop that runs inferences back to back, during which the power consumption can be measured. Note that this code must be copied again each time the project is regenerated through STM32Cube.AI.
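Once the average current has been measured during the infinite loop, the power and the energy per inference can be derived from the inference time reported by the application before the UART was disabled. The helper below is a minimal sketch of that arithmetic. The function names are hypothetical, and the supply voltage, average current and inference time are inputs that you measure yourself.

```c
/* Average power in milliwatts, from the supply voltage in volts and the
   average current in milliamperes measured during the infinite loop. */
double average_power_mw(double supply_v, double avg_current_ma)
{
  return supply_v * avg_current_ma;
}

/* Energy of one inference in microjoules. Milliwatts multiplied by
   milliseconds give microjoules. inference_ms is the average inference
   time reported by the system performance application. */
double energy_per_inference_uj(double supply_v, double avg_current_ma,
                               double inference_ms)
{
  return average_power_mw(supply_v, avg_current_ma) * inference_ms;
}
```

As a purely illustrative example (not a measurement), 3.3 V, 10 mA and a 2 ms inference give 33 mW and 66 µJ per inference. Because the loop runs inferences continuously, the measured average current corresponds to the inference processing alone.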