How to ensure data coherency when cache and MMU are used in STM32CubeMP13

Applicable for STM32MP13x lines



1. Introduction to cache and memory management unit (MMU)

STM32MP13x lines are powered by an Arm® Cortex®-A7 processor core with a 32KB L1 cache, a 256KB L2 cache, and a memory management unit (MMU). The L2 cache is configurable with a default size of 256KB. For more information, refer to Arm®'s Technical reference manual [1].


Caches are high-speed memories that sit between the processor and main memory. They hold a copy of the data present in main memory locations, which lets the processor run applications faster by reducing read and write times.
One drawback of using a cache is that it can create data incoherency between the cache and main memory. The contents of the two may differ because data updated by the processor in the cache may not yet have been written to main memory. Similarly, a peripheral may update data in main memory while the cache still holds the old copy. This problem occurs because the cache and main memory are not synchronized on every access.
To understand more about cache operation and caching policy, refer to [2].
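The incoherency described above can be pictured with a toy model. This is only a sketch: it reduces a write-back cache to a single line, and every name in it (`toy_line_t`, `cpu_write`, `dma_read`, and so on) is hypothetical and unrelated to the CMSIS or HAL APIs.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model only: one word standing in for a write-back cache line. */
typedef struct {
    uint32_t memory;  /* value held in main memory */
    uint32_t cached;  /* value held in the cache line */
    bool     valid;   /* the cache holds a copy */
    bool     dirty;   /* the cached copy is newer than memory */
} toy_line_t;

/* CPU write: updates the cache only (write-back policy). */
static void cpu_write(toy_line_t *l, uint32_t v)
{
    l->cached = v;
    l->valid = true;
    l->dirty = true;
}

/* CPU read: served from the cache when the line is valid. */
static uint32_t cpu_read(toy_line_t *l)
{
    if (!l->valid) {
        l->cached = l->memory;  /* line fill from main memory */
        l->valid = true;
        l->dirty = false;
    }
    return l->cached;
}

/* DMA accesses bypass the cache and touch main memory directly. */
static void dma_write(toy_line_t *l, uint32_t v) { l->memory = v; }
static uint32_t dma_read(const toy_line_t *l) { return l->memory; }

/* Clean: write a dirty line back to main memory. */
static void toy_clean(toy_line_t *l)
{
    if (l->valid && l->dirty) {
        l->memory = l->cached;
        l->dirty = false;
    }
}

/* Invalidate: drop the cached copy so the next read refetches it. */
static void toy_invalidate(toy_line_t *l)
{
    l->valid = false;
    l->dirty = false;
}
```

In this model, a `dma_read` issued right after a `cpu_write` still returns the old memory value until `toy_clean` runs, and a `cpu_read` issued after a `dma_write` still returns the cached value until `toy_invalidate` runs.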

The memory management unit (MMU) uses translation tables (TTB) to map virtual memory addresses to physical memory addresses. It also controls memory access permissions, memory ordering, and cache policies for each memory region.
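As an illustration of how a cache policy is attached to a memory region, the sketch below builds a 1 MB "section" entry in the ARMv7-A short-descriptor translation table format. It assumes TEX remap is disabled (SCTLR.TRE = 0), domain 0, and full read/write access. The macro and function names are hypothetical and are not taken from the STM32CubeMP13 package.

```c
#include <stdint.h>

/* Field positions of an ARMv7-A short-descriptor section entry (1 MB). */
#define SECT_TYPE     (0x2U)                       /* bits[1:0] = 0b10: section */
#define SECT_B        (1U << 2)                    /* bufferable bit */
#define SECT_C        (1U << 3)                    /* cacheable bit */
#define SECT_AP_RW    (0x3U << 10)                 /* AP[1:0] = 0b11: full access */
#define SECT_TEX(x)   (((uint32_t)(x) & 0x7U) << 12)

/* Attribute encodings valid only when TEX remap is disabled (assumption). */
#define ATTR_NORMAL_WBWA  (SECT_TEX(1) | SECT_C | SECT_B) /* write-back, write-allocate */
#define ATTR_NORMAL_NC    (SECT_TEX(1))                   /* normal, non-cacheable */

/* Build a section entry mapping the 1 MB block that contains phys_addr. */
static inline uint32_t make_section_entry(uint32_t phys_addr, uint32_t attr)
{
    return (phys_addr & 0xFFF00000U) | attr | SECT_AP_RW | SECT_TYPE;
}
```

For example, `make_section_entry(0xC0000000U, ATTR_NORMAL_NC)` describes a non-cacheable region, while the same call with `ATTR_NORMAL_WBWA` describes a cacheable one. Only the TEX, C, and B bits differ between the two entries.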

Refer to the Arm® Cortex®-A Series Programmer's Guide for ARMv7-A [3] for more information on the MMU.

Arm®'s Technical reference manual [4] can also help in understanding the cache and MMU in the Cortex®-A7.

2. Maintaining data coherency

Data coherency maintenance is required whenever cache and DMA are used together. It can be done in two ways:

  1. Cleaning and invalidating the cache
  2. Placing the data buffer in a non-cacheable section of memory

2.1. Cleaning and invalidating cache

2.1.1. DMA used for data transfer from memory to peripheral

In this case, we need to ensure that the data in memory is updated from the cache, so the cache must be cleaned. The clean function writes data from the cache back to memory (cache flush operation). This operation should be done just before the DMA transaction, using the below function available in core_ca.h

L1C_CleanDCacheAll(); // This cleans entire Data cache

The above function cleans the entire data cache, which can increase processing time if called frequently. A better way is to clean the cache by address, which cleans only the required region. The cache has 32-byte lines, hence buffers should be 32-byte aligned to avoid data loss during clean and invalidate operations.

L1C_CleanDCacheMVA((void*)current);  // This cleans Data cache by address pointed by ''current''

Example of cleaning a region by size is shown below:

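ALIGN_32BYTES (uint32_t start[256]);  /* 32-byte aligned buffer */
/* Clean every 32-byte cache line that overlaps [start, start + size). */
__STATIC_FORCEINLINE void __clean_cache_by_addr(uint32_t start, uint32_t size)
{
  uint32_t current = start & ~31U;             /* round down to a line boundary */
  uint32_t end = (start + size + 31U) & ~31U;  /* round up to a line boundary */
  while (current < end)
  {
    L1C_CleanDCacheMVA((void*)current);
    current += 32U;
  }
}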
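The rounding used in the loop above explains why alignment matters. The sketch below reuses the same arithmetic to count how many cache lines a region overlaps. The helper name `cache_lines_touched` is hypothetical, and the 32-byte line size is the value stated in this article.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* line size as stated in this article */

/* Number of cache lines overlapped by the region [start, start + size). */
static uint32_t cache_lines_touched(uint32_t start, uint32_t size)
{
    uint32_t current = start & ~(CACHE_LINE_SIZE - 1U);
    uint32_t end = (start + size + CACHE_LINE_SIZE - 1U) & ~(CACHE_LINE_SIZE - 1U);
    return (end - current) / CACHE_LINE_SIZE;
}
```

A 64-byte region starting at the aligned address 0x1000 overlaps two lines, whereas the same region starting at 0x1004 overlaps three. The extra line is shared with whatever data sits next to the buffer, which is the situation the alignment advice is meant to avoid.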

2.1.2. DMA used for data transfer from peripheral to memory

In this case, we need to ensure that the cache is updated from memory, so the cache must be cleaned and invalidated. The clean operation writes data from the cache back to memory (cache flush operation); the cache is then invalidated, i.e. the data in the cache is marked invalid.
After invalidation, the data in memory is considered the latest, and the cache is refilled from memory on the next access. This operation should be done just before the DMA transaction, using the below function available in core_ca.h

L1C_CleanInvalidateDCacheAll();  // This cleans and invalidates entire Data cache

As with the clean function, it is better to clean and invalidate by address and to align data buffers to 32 bytes.

L1C_CleanInvalidateDCacheMVA((void*)current);   // This cleans and invalidates Data cache by address pointed by current

Example of cleaning and invalidating a region by size is shown below:

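ALIGN_32BYTES (uint32_t start[256]);  /* 32-byte aligned buffer */
/* Clean and invalidate every 32-byte cache line that overlaps [start, start + size). */
__STATIC_FORCEINLINE void __invalidate_cache_by_addr(uint32_t start, uint32_t size)
{
  uint32_t current = start & ~31U;             /* round down to a line boundary */
  uint32_t end = (start + size + 31U) & ~31U;  /* round up to a line boundary */
  while (current < end)
  {
    L1C_CleanInvalidateDCacheMVA((void*)current); /* Clean as well, because buffers may not be 32-byte aligned and the read is done after this anyway. */
    current += 32U;
  }
}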
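One way to keep a receive buffer from sharing cache lines with neighbouring data is to align it and pad its size to a whole number of lines. The following is a minimal sketch assuming a GCC-compatible compiler and the 32-byte line size stated in this article. `CACHE_ROUND_UP`, `RX_PAYLOAD_SIZE`, and `rx_buffer` are hypothetical names.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* line size as stated in this article */

/* Round n up to the next multiple of the cache line size. */
#define CACHE_ROUND_UP(n) (((n) + CACHE_LINE_SIZE - 1U) & ~(CACHE_LINE_SIZE - 1U))

#define RX_PAYLOAD_SIZE 100U  /* hypothetical payload size in bytes */

/* The buffer starts on a line boundary and occupies whole lines only. */
static uint8_t rx_buffer[CACHE_ROUND_UP(RX_PAYLOAD_SIZE)]
    __attribute__((aligned(32)));
```

With these values, the buffer occupies 128 bytes (four lines) even though only 100 bytes carry data, so a clean and invalidate over the buffer does not touch any line that holds other variables.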

2.2. Placing data in non-cacheable section of memory

When DMA is used, or when an application changes a data buffer frequently, it is suggested to place such data buffers in a non-cacheable region, which is SYSRAM in the case of STM32MP13x lines. With this approach, the user can ensure data accuracy while keeping the application fast. To do so, create a section in the linker file, place it in the non-cacheable area, then assign the buffer to that section when declaring it in the C file.

Linker Modification

    .tcp_sec (NOLOAD) : 
    {
        . = ALIGN(4);
        *(.RxDecripSection)
        . = ALIGN(4);
        *(.TxDecripSection)
        . = ALIGN(4);
        *(.NxServerPoolSection)
        . = ALIGN(4);
        *(.NetXPoolSection)
    } >SYSRAM_BASE

C file Modification

ETH_DMADescTypeDef DMARxDscrTab[ETH_RX_DESC_CNT] __attribute__((section(".RxDecripSection"))); /* Ethernet Rx DMA Descriptors */
ETH_DMADescTypeDef DMATxDscrTab[ETH_TX_DESC_CNT] __attribute__((section(".TxDecripSection")));   /* Ethernet Tx DMA Descriptors */
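To check at run time that a buffer really landed in the intended region, a simple address range test can be used. The sketch below is illustrative only. The SYSRAM start address and size are assumptions that must be checked against the linker file and reference manual of the target, and `is_in_sysram` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SYSRAM_START 0x2FFE0000U  /* assumed SYSRAM start address */
#define SYSRAM_SIZE  0x00020000U  /* assumed SYSRAM size (128 KB) */

/* Return true if [addr, addr + len) lies entirely inside the assumed SYSRAM range. */
static bool is_in_sysram(uintptr_t addr, size_t len)
{
    return addr >= SYSRAM_START
        && len <= SYSRAM_SIZE
        && (addr - SYSRAM_START) <= (SYSRAM_SIZE - len);
}
```

A call such as `is_in_sysram((uintptr_t)DMARxDscrTab, sizeof(DMARxDscrTab))` during initialization would flag a descriptor table that the linker placed outside the assumed range.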

Example code is available in the applications of the STM32CubeMP13 Package on GitHub [5].

3. References