Cryptographic Library Performance: Measurement methodology

This page presents the methodology used to compute the performance figures of the cryptographic library published in the Cryptographic library performance article.

1. Time to perform an action

To compute the time needed to perform an action, the number of cycles is measured, and the time is then derived from this number of cycles and the STM32 System Core clock frequency.
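
The conversion from a cycle count to a duration can be sketched as follows. This is a minimal illustration, not the code used to produce the published figures: the helper name `cycles_to_us` is hypothetical, and the clock frequency is passed as a parameter (on a target it would typically come from the CMSIS `SystemCoreClock` variable).

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count into microseconds
 * for a given core clock frequency in Hz (result rounded down). */
static uint64_t cycles_to_us(uint64_t nb_cycles, uint32_t core_clock_hz)
{
    /* Split into whole seconds and remainder so that the
     * multiplication by 1000000 cannot overflow for large counts. */
    uint64_t seconds   = nb_cycles / core_clock_hz;
    uint64_t remainder = nb_cycles % core_clock_hz;

    return (seconds * 1000000ULL) + ((remainder * 1000000ULL) / core_clock_hz);
}
```

For example, 64000 cycles at a 64 MHz core clock correspond to 1000 microseconds.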

The number of cycles is measured with the Systick peripheral. The Systick is configured as follows:

  • Input clock = (System Core Clock / 8) (1)
  • Reload value set to its maximum (1)
  • Interrupt enabled to capture counter reloads

(1) To limit the impact of the measurement itself on the measured value

1.1. Systick handling code examples

Start & Stop measure

#define MEASURE_START()                                                                                                                           \
do{                                                                                                                                               \
    SysTick->CTRL = 0;                         /* Reset Systick configuration */                                                                  \
    SysTick->LOAD = ((uint32_t)((1<<24) - 1)); /* Set load value to maximum with 1<<24 - 1 value */                                               \
    SysTick->VAL = 0;                          /* Clear current value; counter reloads from LOAD once enabled */                                  \
    SysTick->CTRL = SysTick_CTRL_TICKINT_Msk;  /* Enable interrupt and Clock Source = System core clock / 8 */                                    \
    systick_Counter = 0;                       /* Initialize global interrupt counter */                                                          \
    SysTick->CTRL |= SysTick_CTRL_ENABLE_Msk;  /* Enable Systick */                                                                               \
    __DSB(); __ISB();                          /* Ensure no subsequent action starts before this macro has completed */                           \
}while(0)

#define MEASURE_STOP()                                                                                                                            \
do{                                                                                                                                               \
    __DSB(); __ISB();                          /* Ensure all previous processing has completed before this macro executes */                      \
    SysTick->CTRL &= ~SysTick_CTRL_ENABLE_Msk; /* Stop Systick counter */                                                                         \
    nbCycle = (uint64_t)(0xFFFFFF - SysTick->VAL) * 8;  /* Cycles elapsed in the current Systick period */                                        \
    nbCycle += ((uint64_t)systick_Counter * ((uint32_t)((1<<24) - 1))) * 8;  /* Add interrupt occurrences * Systick load value, in 64-bit */      \
}while(0)

Systick interrupt handling:

volatile uint32_t systick_Counter = 0;    /* Accumulates occurrences of the Systick interrupt; volatile because it is updated in the handler */

/**
  * @brief  This function handles SysTick Handler
  * @param  None
  * @retval None
  */
void SysTick_Handler(void)
{
  systick_Counter++;
}
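
The arithmetic performed by MEASURE_STOP() can be checked on a host machine with the sketch below. It is only an illustration under stated assumptions: the function name `systick_elapsed_cycles` and the two constants are hypothetical, the current counter value and the number of reload interrupts are passed as parameters instead of being read from the peripheral, and the formula mirrors the macro above (load value multiplied by the number of reloads).

```c
#include <stdint.h>

#define SYSTICK_LOAD_MAX   0x00FFFFFFu  /* 24-bit reload value set by MEASURE_START() */
#define SYSTICK_PRESCALER  8u           /* Systick input clock = System Core Clock / 8 */

/* Hypothetical host-side model of the MEASURE_STOP() computation.
 * current_val  : value read from the Systick counter when it is stopped
 * reload_count : number of Systick interrupts counted during the measurement */
static uint64_t systick_elapsed_cycles(uint32_t current_val, uint32_t reload_count)
{
    /* Ticks elapsed in the current (incomplete) period: the counter counts down */
    uint64_t ticks = (uint64_t)(SYSTICK_LOAD_MAX - current_val);

    /* Ticks of the completed periods, computed in 64 bits to avoid overflow */
    ticks += (uint64_t)reload_count * SYSTICK_LOAD_MAX;

    /* Each Systick tick represents SYSTICK_PRESCALER core clock cycles */
    return ticks * SYSTICK_PRESCALER;
}
```

With these settings a single counter period covers about 134 million core cycles, so the product exceeds 32 bits after 32 reloads, which is why the accumulation is done in 64 bits.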

1.2. Action execution cycles measurement example

void test_function(void)
{
  uint64_t nbCycle = 0;
  ... /* Preliminary processing that does not need to be measured */
  MEASURE_START();
  <action(s) processing to measure time>
  MEASURE_STOP();
  ... /* Any remaining processing, not included in the measurement */
  printf("NbCycles to perform action = %llu", nbCycle );   /* Report number of cycles to execute the action(s) */
}


2. Stack usage to perform an action

To capture the stack usage required to perform an action, the lower part of the stack is filled with a pattern before the action is executed. After execution, the depth of the overwritten pattern is computed: it shows how deep into the stack the action went, that is, how many stack elements it used.

2.1. Stack handling code examples

Stack initialization

/* Beginning of the stack before any call */
#if defined(__ICCARM__)
uint32_t Stack_Top __attribute__ ((section (".noinit")));
#elif defined (__ARMCC_VERSION)
#if defined(__ARMCOMPILER_VERSION)
uint32_t Stack_Top __attribute__( ( section( ".bss.NoInit")) ) ;
#else /* compiler V5 */
uint32_t Stack_Top __attribute__( (section( ".bss.NoInit"), zero_init) ) ;
#endif
#elif defined(__GNUC__)
uint32_t Stack_Top __attribute__( ( section( ".bss.NoInit")) ) ;
#endif /* __ICCARM__ */

Startup file (assembly): the stack pointer value at reset is saved into Stack_Top

        EXTERN  Stack_Top

Reset_Handler
        ...
        LDR     R0, =Stack_Top
        STR     SP, [R0]
        ...


Start & Stop measure

#define STACK_START(p_stack_high)                                                                               \
do{                                                                                                             \
    /* Get initial stack position and fill the stack with default pattern 0xCDCDCDCD */                         \
    p_stack_high = UtilGetSP();                                                                                 \
    for (uint32_t stack_address=Stack_Top-STACK_SIZE; stack_address<(int)p_stack_high-4; stack_address+=4)      \
    {                                                                                                           \
       *((int *)stack_address) = 0xCDCDCDCD;                                                                    \
    }                                                                                                           \
}while(0)

#define STACK_STOP(p_stack_high, stack_low)                                                                     \
do{                                                                                                             \
    stack_low = Stack_Top-STACK_SIZE;                                                                           \
    /* Search first stack occurrence != pattern */                                                              \
    for (uint32_t stack_address=Stack_Top-STACK_SIZE; stack_address<(int)p_stack_high-4; stack_address+=4)      \
    {                                                                                                           \
       if (*((int *)stack_address) != 0xCDCDCDCD)                                                               \
       {                                                                                                        \
         stack_low = stack_address-4;                                                                           \
         break;                                                                                                 \
       }                                                                                                        \
    }                                                                                                           \
}while(0)

/**
  * @brief  This function gets the current position of the stack pointer
  * @param  None
  * @retval Current stack pointer value
  */
uint32_t *UtilGetSP( void )
{
  uint32_t *result=0;
#if defined(__ICCARM__)
  asm("MOV %0, SP" : "=r"(result) );
#elif defined (__ARMCC_VERSION)
  result = (uint32_t*)__get_MSP();
#elif defined(__GNUC__)
  __asm__("MOV %0, SP" : "=r"(result) );
#endif /* __ICCARM__ */
  return (result);
}


2.2. Action execution stack usage measurement example

void test_function(void)
{
  uint32_t *p_stack_high;                          /* Top stack             */
  uint32_t stack_low;                              /* Bottom stack          */
  ... /* Preliminary processing that does not need to be included in the measurement */
  STACK_START(p_stack_high);
  <action(s) processing to measure stack usage>
  STACK_STOP(p_stack_high, stack_low);
  ... /* Any remaining processing, not included in the measurement */
  printf("Stack depth usage to perform action = %d", (int)((int)p_stack_high-stack_low));   /* Report stack usage to execute the action(s) */
}
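
The pattern-fill technique can be tried on a host machine with the sketch below, which uses an ordinary array in place of the real stack. The names `paint_region` and `used_bytes` are hypothetical and not part of the library or of the macros above. Unlike STACK_STOP(), which subtracts 4 from the address it finds, this sketch reports the overwritten depth without adjustment.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PATTERN 0xCDCDCDCDu   /* Same fill pattern as in STACK_START() */

/* Fill the whole region with the pattern (what STACK_START() does
 * for the unused part of the stack). */
static void paint_region(uint32_t *region, size_t nb_words)
{
    for (size_t i = 0; i < nb_words; i++)
    {
        region[i] = STACK_PATTERN;
    }
}

/* The stack grows downward: index nb_words-1 is the word closest to the
 * initial stack pointer and index 0 is the stack limit. Scanning upward from
 * the limit, the first word that no longer holds the pattern marks the
 * deepest point reached. */
static size_t used_bytes(const uint32_t *region, size_t nb_words)
{
    size_t i = 0;

    while ((i < nb_words) && (region[i] == STACK_PATTERN))
    {
        i++;
    }
    return (nb_words - i) * sizeof(uint32_t);
}
```

Scanning from the stack limit means that untouched gaps inside the used area (for example, uninitialized local buffers) do not cause the depth to be underestimated. A word that the action happens to write with the pattern value at its deepest point would, however, go undetected.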


3. Working buffer usage to perform an action

Some cryptographic operations use APIs that require a working buffer to be provided.

Example for ECC computing:

void test_function(void)
{
  uint8_t working_buffer[4000];                    /* ECC working buffer    */
  ...
  cmox_ecc_construct(&Ecc_Ctx, CMOX_ECC128MULT_MATH_FUNCS, working_buffer, sizeof(working_buffer));
  ...
}

When the action completes, the maximum usage of this working buffer is directly accessible in the context of the operation:

void test_function(void)
{
  uint8_t working_buffer[4000];                    /* ECC working buffer    */
  ...
  cmox_ecc_construct(&Ecc_Ctx, CMOX_ECC128MULT_MATH_FUNCS, working_buffer, sizeof(working_buffer));
  ...
  <action(s) processing to measure working buffer usage>
  /* Save the max memory size used by ECC operation */
  max_mem_used = Ecc_Ctx.membuf_str.MaxMemUsed;
  ... /* Any remaining processing, not included in the measurement */
  printf("Working buffer usage to perform action = %d", max_mem_used);   /* Report working buffer usage to execute the action(s) */
}


4. Code, constant and global data usage to perform an action

Code size, constant data, and global data usage are extracted from the generated map file.

IAR Embedded Workbench for ARM example:

    Module                        ro code  ro data  rw data
    ------                        -------  -------  -------
    [...]
    libSTM32Cryptographic_CM33.a: [3]
    cmox_aes_common.c.o               296      552
    cmox_aesfast_decrypt.c.o        1'136    1'036
    cmox_cbc_aesfast_decrypt.c.o        8        4
    cmox_cbc_common.c.o               180
    cmox_cbc_decrypt.c.o              592       44
    cmox_cipher.c.o                   136
    cmox_cipher_modes.c.o           1'094        8
    cmox_cipher_utils.c.o             130
    cmox_init.c.o                      36        8        1
    -------------------------------------------------------
    Total:                          3'608    1'652        1

Code size = ro code : 3608 Bytes

Constant data = ro data : 1652 Bytes

Global data = rw data : 1 Byte
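
As a cross-check, the totals above can be recomputed from the per-module rows of the IAR table. The sketch below is only illustrative: the type and function names are hypothetical, and blank cells of the table are assumed to mean 0.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the IAR map file module summary (blank cells taken as 0) */
typedef struct
{
    const char *module;
    uint32_t    ro_code;
    uint32_t    ro_data;
    uint32_t    rw_data;
} map_row_t;

/* Aggregated footprint: code, constant data, global data (in bytes) */
typedef struct
{
    uint32_t code;
    uint32_t const_data;
    uint32_t global_data;
} footprint_t;

/* Rows copied from the IAR example above */
static const map_row_t iar_rows[] =
{
    { "cmox_aes_common.c.o",           296,  552, 0 },
    { "cmox_aesfast_decrypt.c.o",     1136, 1036, 0 },
    { "cmox_cbc_aesfast_decrypt.c.o",    8,    4, 0 },
    { "cmox_cbc_common.c.o",           180,    0, 0 },
    { "cmox_cbc_decrypt.c.o",          592,   44, 0 },
    { "cmox_cipher.c.o",               136,    0, 0 },
    { "cmox_cipher_modes.c.o",        1094,    8, 0 },
    { "cmox_cipher_utils.c.o",         130,    0, 0 },
    { "cmox_init.c.o",                  36,    8, 1 },
};

/* Sum each column over all rows */
static footprint_t sum_rows(const map_row_t *rows, size_t nb_rows)
{
    footprint_t total = { 0, 0, 0 };

    for (size_t i = 0; i < nb_rows; i++)
    {
        total.code        += rows[i].ro_code;
        total.const_data  += rows[i].ro_data;
        total.global_data += rows[i].rw_data;
    }
    return total;
}
```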

MDK-ARM example:

    Code (inc. data)   RO Data    RW Data    ZI Data      Debug   Library Name

    3752        156       1652          0          1          0   libSTM32Cryptographic_CM33.a

Code size = Code : 3752 Bytes

Constant data = RO Data : 1652 Bytes

Global data = RW Data + ZI Data : 0 + 1 Byte

STM32CubeIDE example:

    .rodata        0x0000000000000000        0x8 ../../../../../../../Middlewares/ST/STM32_Cryptographic/lib\libSTM32Cryptographic_CM33.a(cmox_cbc_aesfast_decrypt.c.o)
    .text          0x0000000000000000       0x1c ../../../../../../../Middlewares/ST/STM32_Cryptographic/lib\libSTM32Cryptographic_CM33.a(cmox_cipher.c.o)

Code size = sum of the values on lines starting with .text that concern libSTM32Cryptographic_xxxx

Constant data = sum of the values on lines starting with .rodata or CMOX_CTA_PROTECTED_DATA (1) that concern libSTM32Cryptographic_xxxx

Global data = sum of the values on lines starting with .bss that concern libSTM32Cryptographic_xxxx
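
Summing these lines by hand is tedious, so the extraction can be automated. The sketch below shows one possible way to read the size from a single map file line. It is an assumption-laden illustration: the function name `map_entry_size` is hypothetical, and it only handles entries where the section name, address, size, and object path all sit on one line, as in the excerpt above (entries that the linker splits over two lines are not handled).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the size (in bytes) of a single-line map entry
 * if its section name starts with section_prefix (e.g. ".text") and the line
 * mentions lib_tag (e.g. "libSTM32Cryptographic"); return 0 otherwise. */
static unsigned long long map_entry_size(const char *line,
                                         const char *section_prefix,
                                         const char *lib_tag)
{
    char section[64];
    unsigned long long address = 0;
    unsigned long long size = 0;

    /* Keep only lines that concern the library of interest */
    if (strstr(line, lib_tag) == NULL)
    {
        return 0;
    }

    /* Expected layout: <section> <address in hex> <size in hex> <object path> */
    if (sscanf(line, " %63s %llx %llx", section, &address, &size) != 3)
    {
        return 0;
    }

    /* Prefix match, so that ".text.some_function" also counts as ".text" */
    if (strncmp(section, section_prefix, strlen(section_prefix)) != 0)
    {
        return 0;
    }

    (void)address;  /* The address is parsed but not needed for the sum */
    return size;
}
```

Calling this function on every line of the map file and accumulating the result for ".text", ".rodata", and ".bss" gives the three figures described above.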
