This page presents the methodology used to compute the performance figures of the cryptographic library published in the Cryptographic library performance article.
1. Time to perform an action
To compute the time needed to perform an action, the number of cycles is measured, and the time is derived from that cycle count and the STM32 System Core Clock frequency.
The number of cycles is measured with the SysTick peripheral. The SysTick is configured as follows:
- Input clock = System Core Clock / 8 (1)
- Reload value set to its maximum (1)
- Interrupt enabled to capture counter reloads
(1) To limit the impact of the measurement itself on the measured value.
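The conversion from SysTick ticks to core cycles, and from core cycles to time, can be sketched as follows. This is a minimal illustration and not code from the library: the function names are hypothetical, and `core_clock_hz` stands in for the System Core Clock frequency of the target (for example the CMSIS `SystemCoreClock` variable).

```c
#include <assert.h>
#include <stdint.h>

/* Number of core cycles represented by nb_ticks SysTick ticks when the
 * SysTick input clock is System Core Clock / 8. */
uint64_t ticks_to_cycles(uint32_t nb_ticks)
{
    return (uint64_t)nb_ticks * 8u;
}

/* Convert a measured cycle count into microseconds.
 * core_clock_hz is an assumption: the System Core Clock frequency in Hz. */
double cycles_to_us(uint64_t nb_cycles, uint32_t core_clock_hz)
{
    assert(core_clock_hz != 0u);
    return ((double)nb_cycles * 1e6) / (double)core_clock_hz;
}
```

With these helpers, a hypothetical measurement of 110 000 cycles at a 110 MHz core clock corresponds to 1000 microseconds.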
1.1. Systick handling code examples
Start & Stop measure
#define MEASURE_START() \
do{ \
SysTick->CTRL = 0; /* Reset Systick configuration */ \
SysTick->LOAD = ((uint32_t)((1<<24) - 1)); /* Set load value to maximum with 1<<24 - 1 value */ \
SysTick->VAL = 0; /* Start counting from 0 */ \
SysTick->CTRL = SysTick_CTRL_TICKINT_Msk; /* Enable interrupt and Clock Source = System core clock / 8 */ \
systick_Counter = 0; /* Initialize global interrupt counter */ \
SysTick->CTRL |= SysTick_CTRL_ENABLE_Msk; /* Enable Systick */ \
__DSB(); __ISB(); /* Ensure that no subsequent operation starts before this macro has completed */ \
}while(0)
#define MEASURE_STOP() \
do{ \
__DSB(); __ISB(); /* Ensure that all previous operations have completed before this macro executes */ \
SysTick->CTRL &= ~SysTick_CTRL_ENABLE_Msk; /* Stop Systick counter */ \
nbCycle = (uint64_t)(0xFFFFFFUL - SysTick->VAL) * 8U; /* Ticks elapsed in the current SysTick period, converted to core cycles */ \
nbCycle += (uint64_t)systick_Counter * (1UL << 24) * 8U; /* Add completed periods: one interrupt occurs every (LOAD + 1) ticks; computed in 64 bits to avoid overflow */ \
}while(0)
SysTick interrupt handling:
volatile uint32_t systick_Counter = 0; /* Accumulates occurrences of the SysTick interrupt; volatile because it is modified in the interrupt handler */
/**
* @brief This function handles SysTick Handler
* @param None
* @retval None
*/
void SysTick_Handler(void)
{
systick_Counter++;
}
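The arithmetic performed when the measurement stops can be checked on a host machine with a small model such as the one below. This is a sketch under stated assumptions, not part of the measurement code: the function name and constants are hypothetical, `val` represents the SysTick->VAL reading at stop time, `nb_reloads` the number of SysTick interrupts counted, and each completed period is assumed to span LOAD + 1 ticks, as for a down-counter reloaded after reaching zero.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_LOAD_MAX   0x00FFFFFFu  /* 24-bit reload value (maximum) */
#define SYSTICK_PRESCALER  8u           /* SysTick input clock = System Core Clock / 8 */

/* Host-side model: rebuild the elapsed core cycles from the counter value
 * read at stop time and the number of reloads counted by the interrupt. */
uint64_t systick_to_cycles(uint32_t val, uint32_t nb_reloads)
{
    uint64_t ticks;

    assert(val <= SYSTICK_LOAD_MAX);

    /* Ticks elapsed in the current, partially completed period */
    ticks = (uint64_t)(SYSTICK_LOAD_MAX - val);

    /* Ticks of the completed periods, accumulated in 64 bits */
    ticks += (uint64_t)nb_reloads * ((uint64_t)SYSTICK_LOAD_MAX + 1u);

    return ticks * SYSTICK_PRESCALER;
}
```

Doing the accumulation in 64 bits matters for long measurements: with the divide-by-8 input clock, 32 reloads already represent 2^32 core cycles, which no longer fits in a 32-bit variable.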
1.2. Action execution cycles measurement example
void test_function(void)
{
uint64_t nbCycle = 0;
... /* Preliminary processing not needed to be measured */
MEASURE_START();
<action(s) processing to measure time>
MEASURE_STOP();
... /* Potential residual processing not included into measurement */
printf("NbCycles to perform action = %llu\n", (unsigned long long)nbCycle); /* Report number of cycles to execute the action(s) */
}
2. Stack usage to perform an action
To capture the stack usage required to perform an action, the lower part of the stack is filled with a pattern before executing the action. Afterwards, the stack is scanned to find how deep the pattern has been overwritten, which gives the amount of stack used by the action.
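The principle can be illustrated on a host machine with a simulated stack held in an array. This is a minimal sketch, not the measurement code used on target: the function names are hypothetical, `stack[0]` is taken as the lowest stack address, and `sp_index` as the position of the stack pointer before the action (words at or above it are already in use).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_PATTERN 0xCDCDCDCDu

/* Fill the unused part of the simulated stack (below sp_index) with the pattern. */
void stack_paint(uint32_t *stack, size_t sp_index)
{
    assert(stack != NULL);
    for (size_t i = 0; i < sp_index; i++) {
        stack[i] = STACK_PATTERN;
    }
}

/* Scan upwards from the lowest address and stop at the first word that no
 * longer holds the pattern; everything from there up to sp_index was used. */
size_t stack_used_bytes(const uint32_t *stack, size_t sp_index)
{
    size_t i = 0;

    assert(stack != NULL);
    while ((i < sp_index) && (stack[i] == STACK_PATTERN)) {
        i++;
    }
    return (sp_index - i) * sizeof(uint32_t);
}
```

One limitation of this technique is that a stack word overwritten with a value equal to the pattern is indistinguishable from an unused word, so the result can in rare cases be slightly underestimated.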
2.1. Stack handling code examples
Stack initialization
/* Beginning of the stack before any call */
#if defined(__ICCARM__)
uint32_t Stack_Top __attribute__ ((section (".noinit")));
#elif defined (__ARMCC_VERSION)
#if defined(__ARMCOMPILER_VERSION)
uint32_t Stack_Top __attribute__( ( section( ".bss.NoInit")) ) ;
#else /* compiler V5 */
uint32_t Stack_Top __attribute__( (section( ".bss.NoInit"), zero_init) ) ;
#endif
#elif defined(__GNUC__)
uint32_t Stack_Top __attribute__( ( section( ".bss.NoInit")) ) ;
#endif /* __ICCARM__ */
In the startup file (assembly), save the initial stack pointer into Stack_Top:
EXTERN Stack_Top
Reset_Handler
...
LDR R0, =Stack_Top
STR SP, [R0]
...
Start & Stop measure
#define STACK_START(p_stack_high) \
do{ \
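/* Get initial stack position and fill the unused stack below it with the default pattern 0xCDCDCDCD */ \
/* STACK_SIZE is assumed to be the total stack size in bytes, defined by the application */ \
p_stack_high = UtilGetSP(); \
for (uint32_t stack_address=Stack_Top-STACK_SIZE; stack_address<(uint32_t)p_stack_high-4U; stack_address+=4U) \
{ \
*((volatile uint32_t *)stack_address) = 0xCDCDCDCDU; \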
} \
}while(0)
#define STACK_STOP(p_stack_high, stack_low) \
do{ \
stack_low = Stack_Top-STACK_SIZE; \
/* Search first stack occurrence != pattern */ \
for (uint32_t stack_address=Stack_Top-STACK_SIZE; stack_address<(uint32_t)p_stack_high-4U; stack_address+=4U) \
{ \
if (*((volatile uint32_t *)stack_address) != 0xCDCDCDCDU) \
{ \
stack_low = stack_address-4; \
break; \
} \
} \
}while(0)
/**
* @brief This function gets the current position of the stack pointer
* @param None
* @retval Current stack pointer value
*/
uint32_t *UtilGetSP( void )
{
uint32_t *result=0;
#if defined(__ICCARM__)
asm("MOV %0, SP" : "=r"(result) );
#elif defined (__ARMCC_VERSION)
result = (uint32_t*)__get_MSP();
#elif defined(__GNUC__)
__asm__("MOV %0, SP" : "=r"(result) );
#endif /* __ICCARM__ */
return (result);
}
2.2. Action execution stack usage measurement example
void test_function(void)
{
uint32_t *p_stack_high; /* Top stack */
uint32_t stack_low; /* Bottom stack */
... /* Preliminary processing not needed to be included in measure */
STACK_START(p_stack_high);
<action(s) processing to measure stack usage>
STACK_STOP(p_stack_high, stack_low);
... /* Potential residual processing not included into measurement */
printf("Stack depth usage to perform action = %u bytes\n", (unsigned int)((uint32_t)p_stack_high - stack_low)); /* Report stack usage to execute the action(s) */
}
3. Working buffer usage to perform an action
Some cryptographic operations use APIs that require the caller to provide a working buffer.
Example for ECC computation:
void test_function(void)
{
uint8_t working_buffer[4000]; /* ECC working buffer */
...
cmox_ecc_construct(&Ecc_Ctx, CMOX_ECC128MULT_MATH_FUNCS, working_buffer, sizeof(working_buffer));
...
}
When the action completes, the maximum usage of this working buffer is directly available in the context of the operation:
void test_function(void)
{
uint8_t working_buffer[4000]; /* ECC working buffer */
...
cmox_ecc_construct(&Ecc_Ctx, CMOX_ECC128MULT_MATH_FUNCS, working_buffer, sizeof(working_buffer));
...
<action(s) processing to measure working buffer usage>
/* Save the max memory size used by ECC operation */
max_mem_used = Ecc_Ctx.membuf_str.MaxMemUsed;
... /* Potential residual processing not included into measurement */
printf("Working buffer usage to perform action = %d", max_mem_used); /* Report working buffer usage to execute the action(s) */
}
4. Code, constant and global data usage to perform an action
Code size, constant data, and global data usage are extracted from the generated map file.
IAR Embedded Workbench for ARM example:
Module                            ro code  ro data  rw data
------                            -------  -------  -------
[...]
libSTM32Cryptographic_CM33.a: [3]
    cmox_aes_common.c.o               296      552
    cmox_aesfast_decrypt.c.o        1'136    1'036
    cmox_cbc_aesfast_decrypt.c.o        8        4
    cmox_cbc_common.c.o               180
    cmox_cbc_decrypt.c.o              592       44
    cmox_cipher.c.o                   136
    cmox_cipher_modes.c.o           1'094        8
    cmox_cipher_utils.c.o             130
    cmox_init.c.o                      36        8        1
    -------------------------------------------------------
    Total:                          3'608    1'652        1
Code size = ro code : 3608 Bytes
Constant data = ro data : 1652 Bytes
Global data = rw data : 1 Byte
MDK-ARM example:
      Code (inc. data)   RO Data    RW Data    ZI Data      Debug   Library Name
      3752         156      1652          0          1          0   libSTM32Cryptographic_CM33.a
Code size = Code : 3752 Bytes
Constant data = RO Data : 1652 Bytes
Global data = RW Data + ZI Data : 0 + 1 Byte
STM32CubeIDE example:
.rodata 0x0000000000000000 0x8 ../../../../../../../Middlewares/ST/STM32_Cryptographic/lib\libSTM32Cryptographic_CM33.a(cmox_cbc_aesfast_decrypt.c.o)
.text 0x0000000000000000 0x1c ../../../../../../../Middlewares/ST/STM32_Cryptographic/lib\libSTM32Cryptographic_CM33.a(cmox_cipher.c.o)
Code size = sum of the sizes on lines starting with .text that concern libSTM32Cryptographic_xxxx
Constant data = sum of the sizes on lines starting with .rodata or CMOX_CTA_PROTECTED_DATA (1) that concern libSTM32Cryptographic_xxxx
Global data = sum of the sizes on lines starting with .bss that concern libSTM32Cryptographic_xxxx
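The summation described for the STM32CubeIDE map file can be automated on a host machine. The sketch below is only an illustration under stated assumptions: the type and function names are hypothetical, and each map entry is assumed to sit on a single line in the form `<section> <address> <size> <path>`, as in the excerpt above. Map files produced by a real build may split an entry across two lines, which this sketch does not handle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t code;    /* sum of .text sizes */
    uint64_t rodata;  /* sum of .rodata and CMOX_CTA_PROTECTED_DATA sizes */
    uint64_t bss;     /* sum of .bss sizes */
} map_totals_t;

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Add one map file line to the totals if it concerns the given library. */
void map_add_line(map_totals_t *totals, const char *line, const char *lib_name)
{
    char section[64];
    unsigned long long address;
    unsigned long long size;

    assert((totals != NULL) && (line != NULL) && (lib_name != NULL));

    if (strstr(line, lib_name) == NULL) {
        return; /* line belongs to another module */
    }
    if (sscanf(line, "%63s %llx %llx", section, &address, &size) != 3) {
        return; /* not a single-line section entry */
    }
    if (starts_with(section, ".text")) {
        totals->code += size;
    } else if (starts_with(section, ".rodata") ||
               starts_with(section, "CMOX_CTA_PROTECTED_DATA")) {
        totals->rodata += size;
    } else if (starts_with(section, ".bss")) {
        totals->bss += size;
    }
}
```

Calling `map_add_line()` on every line of the map file with `"libSTM32Cryptographic"` as the library name accumulates the three totals; for the two lines of the excerpt above, this gives 0x1c bytes of code and 0x8 bytes of constant data.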