How to ensure data coherency when cache and MMU are used in STM32CubeMP13

1. Introduction to CACHE and Memory Management Unit (MMU)

STM32MP13xx devices are powered by an Arm® Cortex®-A7 processor core, which has 32 KB L1 caches, an L2 cache, and a Memory Management Unit (MMU).

Caches are fast memories that sit between the processor and main memory. A cache usually holds a copy of the data at main memory locations, which lets the processor run applications faster because reads and writes take less time. A drawback of using a cache is that coherency between the cache and main memory must be maintained. The contents of the cache and main memory can differ: the processor may have updated data in the cache that has not yet been written back to main memory, and a peripheral may have updated data in main memory while the cache still holds the old copy. This problem arises because the cache and main memory are synchronized only occasionally, not on every access. To understand more about cache operation and caching policy, refer to [1].
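The two failure cases described above can be illustrated with a toy software model. This is only a sketch for reasoning on a host machine: the `toy_line_t` type and all function names are hypothetical, a single word stands in for a whole cache line, and nothing here calls the real cache hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one word of main memory plus one cached copy of it. */
typedef struct {
    uint32_t memory;  /* what a DMA-capable peripheral sees */
    uint32_t cached;  /* what the CPU sees while the line is valid */
    bool valid;       /* cached copy may be used */
    bool dirty;       /* cached copy is newer than memory */
} toy_line_t;

static void cpu_write(toy_line_t *l, uint32_t v)
{
    l->cached = v;    /* write-back policy: memory is NOT updated here */
    l->valid = true;
    l->dirty = true;
}

static uint32_t cpu_read(toy_line_t *l)
{
    if (!l->valid) {  /* miss: refill the line from memory */
        l->cached = l->memory;
        l->valid = true;
        l->dirty = false;
    }
    return l->cached;
}

static void dma_write(toy_line_t *l, uint32_t v) { l->memory = v; }
static uint32_t dma_read(const toy_line_t *l) { return l->memory; }

static void cache_clean(toy_line_t *l)
{
    if (l->valid && l->dirty) {  /* write the newer copy back to memory */
        l->memory = l->cached;
        l->dirty = false;
    }
}

static void cache_invalidate(toy_line_t *l)
{
    l->valid = false;  /* next CPU read refills from memory */
    l->dirty = false;
}

/* Walks through both coherency problems and their fixes. */
static void toy_coherency_demo(void)
{
    toy_line_t line = { 0U, 0U, false, false };

    cpu_write(&line, 0xAAU);
    assert(dma_read(&line) == 0U);     /* peripheral sees stale memory */
    cache_clean(&line);
    assert(dma_read(&line) == 0xAAU);  /* clean made memory current */

    dma_write(&line, 0xBBU);
    assert(cpu_read(&line) == 0xAAU);  /* CPU sees stale cached copy */
    cache_invalidate(&line);
    assert(cpu_read(&line) == 0xBBU);  /* invalidate forced a refill */
}
```

Calling `toy_coherency_demo()` runs through the memory-to-peripheral case (fixed by a clean) and the peripheral-to-memory case (fixed by an invalidate) in that order.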

The Memory Management Unit is responsible for address translation using the Translation Tables (TTB), which map virtual memory addresses to physical memory addresses. It also controls memory access permissions, memory ordering, and cache policies for each memory region.
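To make the link between translation tables and cache policy concrete, the sketch below encodes a first-level 1 MB "section" entry in the ARMv7-A short-descriptor format. It assumes TEX remap is disabled, domain 0, and full read/write access; the macro names, the `make_section_entry` helper, and the example addresses are illustrative and are not taken from the STM32CubeMP13 package, whose MMU setup code may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor first-level section entry (1 MB region). */
#define SECT_TYPE    (0x2U)        /* bits[1:0] = 0b10: section */
#define SECT_B       (1U << 2)     /* bufferable */
#define SECT_C       (1U << 3)     /* cacheable */
#define SECT_XN      (1U << 4)     /* execute never */
#define SECT_AP_RW   (0x3U << 10)  /* AP[1:0] = 0b11: full access */
#define SECT_TEX(x)  (((uint32_t)(x) & 0x7U) << 12)

/* Memory types, valid only when TEX remap is disabled (assumption). */
#define SECT_NORMAL_NC    (SECT_TEX(1))                    /* non-cacheable */
#define SECT_NORMAL_WBWA  (SECT_TEX(1) | SECT_C | SECT_B)  /* write-back, write-allocate */

/* Hypothetical helper: builds one section entry for a 1 MB aligned address. */
static uint32_t make_section_entry(uint32_t phys_addr, uint32_t attrs)
{
    assert((phys_addr & 0x000FFFFFU) == 0U);  /* must be 1 MB aligned */
    return phys_addr | attrs | SECT_AP_RW | SECT_TYPE;
}
```

With these definitions, `make_section_entry(0xC0000000U, SECT_NORMAL_NC | SECT_XN)` describes a non-cacheable, non-executable data region, while `make_section_entry(0xC0100000U, SECT_NORMAL_WBWA)` describes a cached one. Only the C, B, and TEX bits differ between the two policies.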

Refer to the ARM Cortex-A Series Programmer's Guide for ARMv7-A [2] for more information on the MMU.

Arm®'s Technical Reference Manual can also help in understanding the cache and MMU in the Cortex®-A7 [3].

2. Maintaining Data Coherency

Data coherency maintenance is required whenever the cache and DMA are used together. It can be done in two ways:

  1. Cleaning and invalidating the cache
  2. Placing the data buffer in a non-cacheable section of memory

2.1. Cleaning and Invalidating Cache

2.1.1. DMA used for data transfer from Memory to Peripheral

In this case, the data in memory must be updated from the cache, so the cache must be cleaned. The clean function writes the data held in the cache back to memory (a cache flush operation). This operation should be done just before the DMA transaction, using the function below, which is available in core_ca.h:

L1C_CleanDCacheAll(); // This cleans entire Data cache

The function above cleans the entire data cache, which can increase processing time if it is called frequently. A better way is to clean the cache by address, which cleans only the required region. The cache has 32-byte lines, so buffers should be 32-byte aligned to avoid data loss during clean and invalidate operations.

L1C_CleanDCacheMVA((void*)current);  // This cleans Data cache by address pointed by current

Example of cleaning a region by size is shown below:

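ALIGN_32BYTES (uint32_t start[256]);  /* Buffer aligned to the 32-byte cache line */
/* Cleans every cache line that overlaps the range [start, start + size) */
__STATIC_FORCEINLINE void __clean_cache_by_addr(uint32_t start, uint32_t size)
{
  uint32_t current = start & ~31U;             /* Round down to the start of the line */
  uint32_t end = (start + size + 31U) & ~31U;  /* Round up to the next line boundary */
  while (current < end)
  {
    L1C_CleanDCacheMVA((void*)current);
    current += 32U;
  }
}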
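The address rounding used in the loop above can be checked on a host machine without the cache hardware. The sketch below repeats the same arithmetic but counts the cache lines instead of cleaning them; `cache_lines_touched` and `CACHE_LINE_SIZE` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* line size stated in the text above */

/* Counts the cache lines that overlap the range [start, start + size). */
static uint32_t cache_lines_touched(uint32_t start, uint32_t size)
{
    assert(size > 0U);  /* an empty range is not meaningful here */

    uint32_t current = start & ~(CACHE_LINE_SIZE - 1U);
    uint32_t end = (start + size + CACHE_LINE_SIZE - 1U) & ~(CACHE_LINE_SIZE - 1U);
    uint32_t lines = 0U;

    while (current < end)
    {
        lines++;  /* the real loop would clean this line */
        current += CACHE_LINE_SIZE;
    }
    return lines;
}
```

A 32-byte buffer at an aligned address such as 0x1000 touches one line, while the same buffer at 0x1004 touches two. The second line is shared with whatever data follows the buffer, which is why 32-byte alignment is recommended.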

2.1.2. DMA used for data transfer from Peripheral to Memory

In this case, the data in the cache must be updated from memory, so the cache must be cleaned and invalidated. The clean function writes the data held in the cache back to memory (a cache flush operation); the cache is then invalidated, that is, the data in the cache is marked as no longer valid. After invalidation, the data in memory is treated as the latest, and the cache is refilled from it. This operation should be done just before the DMA transaction, using the function below, which is available in core_ca.h:

L1C_CleanInvalidateDCacheAll();  // This cleans and invalidates entire Data cache

As with the clean function, it is better to clean and invalidate by address and to align data buffers to 32 bytes.

L1C_CleanInvalidateDCacheMVA((void*)current);   // This cleans and invalidates Data cache by address pointed by current

Example of cleaning and invalidating a region by size is shown below:

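ALIGN_32BYTES (uint32_t start[256]);  /* Buffer aligned to the 32-byte cache line */
/* Cleans and invalidates every cache line that overlaps the range [start, start + size) */
__STATIC_FORCEINLINE void __invalidate_cache_by_addr(uint32_t start, uint32_t size)
{
  uint32_t current = start & ~31U;             /* Round down to the start of the line */
  uint32_t end = (start + size + 31U) & ~31U;  /* Round up to the next line boundary */
  while (current < end)
  {
    /* Clean as well, because buffers may not be 32-byte aligned and the read is done after this anyway. */
    L1C_CleanInvalidateDCacheMVA((void*)current);
    current += 32U;
  }
}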
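One way to keep a DMA buffer from sharing a cache line with unrelated data is to align its start and pad its size to whole lines. The sketch below uses the standard C11 `_Alignas` specifier as a portable stand-in for the `ALIGN_32BYTES` macro; `ROUND_UP_TO_LINE`, `RX_PAYLOAD_SIZE`, and `rx_buffer` are hypothetical names, and the 100-byte payload is an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32U

/* Rounds a byte count up to a whole number of cache lines. */
#define ROUND_UP_TO_LINE(n) \
    (((n) + CACHE_LINE_SIZE - 1U) & ~(CACHE_LINE_SIZE - 1U))

#define RX_PAYLOAD_SIZE 100U  /* arbitrary example payload size */

/* Starts on a line boundary and occupies whole lines only. */
static _Alignas(CACHE_LINE_SIZE) uint8_t rx_buffer[ROUND_UP_TO_LINE(RX_PAYLOAD_SIZE)];

/* Compile-time check that the buffer ends on a line boundary too. */
static_assert(sizeof(rx_buffer) % CACHE_LINE_SIZE == 0U,
              "rx_buffer must cover whole cache lines");
```

Here the 100-byte payload is stored in a 128-byte buffer, so a clean or invalidate over the buffer affects exactly four lines and no neighboring variables.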

2.2. Placing Data in non-cacheable Section of memory

When DMA is used, or when an application changes a data buffer frequently, it is suggested to place such data buffers in a non-cacheable region, which is SYSRAM in the case of STM32MP13xx. With this method, the user can ensure both data accuracy and high application speed. To do this, create a section in the linker file and place it in the non-cacheable area, then assign the buffer to that section when declaring it in the C file.

Linker Modification

    .tcp_sec (NOLOAD) : 
    {
        . = ALIGN(4);
        *(.RxDecripSection)
        . = ALIGN(4);
        *(.TxDecripSection)
        . = ALIGN(4);
        *(.NxServerPoolSection)
        . = ALIGN(4);
        *(.NetXPoolSection)
    } >SYSRAM_BASE

C file Modification

ETH_DMADescTypeDef DMARxDscrTab[ETH_RX_DESC_CNT] __attribute__((section(".RxDecripSection"))); /* Ethernet Rx DMA Descriptors */
ETH_DMADescTypeDef DMATxDscrTab[ETH_TX_DESC_CNT] __attribute__((section(".TxDecripSection")));   /* Ethernet Tx DMA Descriptors */

Example code can be found in the applications of the STM32MP13xx package on GitHub [4].

If a buffer is too large to fit in SYSRAM, sections of DDR can also be made non-cacheable.

Linker Modification

    .tcp_sec (NOLOAD) : 
    {
        . = ALIGN(4);
        *(.RxDecripSection)
        . = ALIGN(4);
        *(.TxDecripSection)
        . = ALIGN(4);
        *(.NxServerPoolSection)
        . = ALIGN(4);
        *(.NetXPoolSection)
    } >SYSRAM_BASE


Example code can be found in the applications of the STM32MP13xx package on GitHub [5].

3. References