1. Introduction to cache and memory management unit (MMU)
The STM32MP13x lines are powered by an Arm® Cortex®-A7 processor core with 32KB of L1 cache, 256KB of L2 cache, and a memory management unit (MMU). The L2 cache is configurable with a default size of 256KB. For more information, refer to the Arm® technical reference manual [1].
Caches are high-speed memories that sit between the processor and main memory. They hold copies of data from main memory locations. These copies let the processor run applications faster by reducing the time spent on memory reads and writes.
One drawback of using a cache is that it can generate data incoherency between the cache and main memory.
The contents of the cache and of main memory may differ because data updated by the processor in the cache may not yet be written to main memory. Similarly, a peripheral may update data in main memory while the cache still holds the old copy. With a write-back policy, the cache and main memory are synchronized only when a cache line is evicted or when software performs a cache maintenance operation, not on every access.
For more on cache operation and caching policies, refer to the Arm® documentation [2].
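The two incoherency cases described above can be illustrated with a toy model that runs on a host PC. This is only a sketch: it models a single write-back cache line in front of a single memory word, and every name in it (toy_line_t, cpu_read, dma_write, and so on) is hypothetical and unrelated to the CMSIS or HAL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical): one write-back cache line in front of one memory word. */
typedef struct {
    uint32_t memory;  /* value held in main memory */
    uint32_t cached;  /* value held in the cache line */
    bool     valid;   /* the cache line holds a copy */
    bool     dirty;   /* the cached copy is newer than memory */
} toy_line_t;

/* CPU read: served from the cache when the line is valid, else fetched from memory. */
static uint32_t cpu_read(toy_line_t *l)
{
    assert(l != NULL);
    if (!l->valid) {
        l->cached = l->memory;
        l->valid  = true;
        l->dirty  = false;
    }
    return l->cached;
}

/* CPU write (write-back policy): only the cache is updated, the line becomes dirty. */
static void cpu_write(toy_line_t *l, uint32_t value)
{
    assert(l != NULL);
    l->cached = value;
    l->valid  = true;
    l->dirty  = true;
}

/* DMA accesses bypass the cache and see main memory only. */
static uint32_t dma_read(const toy_line_t *l)       { return l->memory; }
static void     dma_write(toy_line_t *l, uint32_t v) { l->memory = v; }

/* Clean: write a dirty line back to memory. */
static void cache_clean(toy_line_t *l)
{
    if (l->valid && l->dirty) {
        l->memory = l->cached;
        l->dirty  = false;
    }
}

/* Invalidate: drop the cached copy so the next CPU read refetches from memory. */
static void cache_invalidate(toy_line_t *l)
{
    l->valid = false;
    l->dirty = false;
}
```

In this model, a dma_read() issued after cpu_write() returns the old memory value until cache_clean() is called, and a cpu_read() issued after dma_write() returns the old cached value until cache_invalidate() is called.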
The memory management unit (MMU) uses translation tables (TTB), set up by software, to translate virtual memory addresses to physical memory addresses. It also controls memory access permissions, memory ordering, and cache policies for each memory region.
Refer to the Arm® Cortex®-A Series Programmer's Guide for ARMv7-A [3] for more information on the MMU.
The Arm® technical reference manual also describes the cache and the MMU of the Cortex®-A7 [4].
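To make the per-region cache policy more concrete, the sketch below builds first-level "section" entries (1MB each) in the ARMv7-A short-descriptor translation table format. It assumes that TEX remap is disabled and that domain 0 is used; the macro names, the helper functions, and the base address 0x30000000 used in the usage note are hypothetical and are not taken from the STM32CubeMP13 package.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor, first-level section entry (assumes TEX remap disabled). */
#define SECT_TYPE     UINT32_C(0x2)                 /* bits[1:0] = 0b10: section */
#define SECT_B        (UINT32_C(1) << 2)            /* bufferable bit */
#define SECT_C        (UINT32_C(1) << 3)            /* cacheable bit */
#define SECT_XN       (UINT32_C(1) << 4)            /* execute never */
#define SECT_AP_FULL  (UINT32_C(0x3) << 10)         /* AP[1:0] = 0b11, AP[2] = 0: full access */
#define SECT_TEX(x)   (((uint32_t)(x) & UINT32_C(0x7)) << 12)

/* Normal memory, inner and outer write-back, write-allocate (cacheable). */
#define ATTR_NORMAL_WBWA  (SECT_TEX(1) | SECT_C | SECT_B)
/* Normal memory, inner and outer noncacheable. */
#define ATTR_NORMAL_NC    (SECT_TEX(1))

/* Build the descriptor for the 1MB section starting at phys_base (domain 0). */
static uint32_t make_section_entry(uint32_t phys_base, uint32_t attrs)
{
    assert((phys_base & UINT32_C(0x000FFFFF)) == 0u);  /* must be 1MB aligned */
    return phys_base | attrs | SECT_AP_FULL | SECT_TYPE;
}

/* Index of the first-level entry that translates virtual address va (VA[31:20]). */
static uint32_t section_index(uint32_t va)
{
    return va >> 20;
}
```

For instance, make_section_entry(0x30000000, ATTR_NORMAL_NC | SECT_XN) yields a noncacheable, non-executable data section, while ATTR_NORMAL_WBWA yields a cacheable one. The only difference between the two policies is the C and B bits.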
2. Maintaining data coherency
Data coherency must be maintained whenever the cache is enabled and DMA accesses the same memory. It can be achieved in two ways:
- Cleaning and invalidating the cache,
- Placing the data buffer in a noncacheable section of memory.
2.1. Cleaning and invalidating the cache
2.1.1. DMA used for data transfer from memory to peripheral
In this case, the data in memory must be updated from the cache before the DMA reads it, which requires cleaning the cache. The clean operation writes the data held in the cache back to memory (also known as a cache flush).
Perform this operation just before the DMA transaction, using the function below, which is available in core_ca.h.
L1C_CleanDCacheAll(); // This cleans entire Data cache
The function above cleans the entire data cache, which can increase processing time if it is called frequently.
A better way is to clean the cache by address, which cleans only the required region. Since the cache line is 32 bytes, it is important to align buffers to 32 bytes to prevent data loss when performing clean and invalidate operations.
L1C_CleanDCacheMVA((void*)current); // This cleans the Data cache line at the address pointed to by current
An example of cleaning a region by size is shown below:
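ALIGN_32BYTES (uint32_t start[256]); /* Buffer aligned to the 32-byte cache line */

/* Clean every cache line that overlaps [start, start + size). */
__STATIC_FORCEINLINE void __clean_cache_by_addr(uint32_t start, uint32_t size)
{
  uint32_t current = start & ~31U;            /* Round down to a line boundary */
  uint32_t end = (start + size + 31U) & ~31U; /* Round up to a line boundary */
  while (current < end)
  {
    L1C_CleanDCacheMVA((void*)current);
    current += 32U;
  }
}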
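The range walk above can be checked on a host PC by passing the per-line operation as a callback. The sketch below is an assumption-laden variant, not part of core_ca.h: cache_op_by_range, cache_line_op_t, and the counting stub are hypothetical names, and the 32-byte line size is the value used in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* line size used in this article */
static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1U)) == 0U,
              "cache line size must be a power of two");

/* Per-line maintenance operation (hypothetical type). */
typedef void (*cache_line_op_t)(void *va);

/* Apply op to every cache line that overlaps [start, start + size).
 * Returns the number of lines processed. */
static size_t cache_op_by_range(uintptr_t start, size_t size, cache_line_op_t op)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1U;
    uintptr_t current = start & ~mask;              /* round down to a line boundary */
    uintptr_t end = (start + size + mask) & ~mask;  /* round up to a line boundary */
    size_t lines = 0U;

    assert(op != NULL);
    for (; current < end; current += CACHE_LINE_SIZE) {
        op((void *)current);
        lines++;
    }
    return lines;
}

/* Host-side stand-in for the real cache operation: it only counts calls. */
static size_t stub_calls;
static void stub_line_op(void *va)
{
    (void)va;
    stub_calls++;
}
```

On the target, the stub would be replaced by a small wrapper that calls L1C_CleanDCacheMVA(). A 60-byte buffer starting at the unaligned address 0x1004 spans two cache lines (0x1000 and 0x1020), which is why unaligned buffers end up sharing lines with neighboring data.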
2.1.2. DMA used for data transfer from peripheral to memory
In this case, the data in the cache must be updated from memory. The cache must therefore be cleaned and invalidated.
The clean operation updates main memory with the data from the cache (cache flush), and the invalidate operation then discards the cached copy.
In other words, the data in the cache is marked as no longer valid. After invalidation, the data in memory is considered the latest, and the cache is refilled from memory on the next access. This operation should be performed just before the DMA transaction. It can be achieved by using the function below, which is available in core_ca.h.
L1C_CleanInvalidateDCacheAll(); // This cleans and invalidates entire Data cache
As with the clean function, the clean and invalidate function by address should be used instead, with the data buffers aligned to 32 bytes.
L1C_CleanInvalidateDCacheMVA((void*)current); // This cleans and invalidates the Data cache line at the address pointed to by current
An example of cleaning and invalidating a region by size is shown below:
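ALIGN_32BYTES (uint32_t start[256]); /* Buffer aligned to the 32-byte cache line */

/* Clean and invalidate every cache line that overlaps [start, start + size). */
__STATIC_FORCEINLINE void __invalidate_cache_by_addr(uint32_t start, uint32_t size)
{
  uint32_t current = start & ~31U;            /* Round down to a line boundary */
  uint32_t end = (start + size + 31U) & ~31U; /* Round up to a line boundary */
  while (current < end)
  {
    /* We also clean because buffers may not be 32-byte aligned, and the read is done after this anyway. */
    L1C_CleanInvalidateDCacheMVA((void*)current);
    current += 32U;
  }
}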
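The comment in the routine above explains that the clean step is there because a buffer may not be 32-byte aligned. When a buffer shares a cache line with other variables, invalidating that line without cleaning it first would discard any pending writes to those variables. The hypothetical helper below (not part of core_ca.h) sketches how to check whether a buffer owns all of its cache lines, again assuming the 32-byte line size used in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32U  /* line size used in this article */
static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1U)) == 0U,
              "cache line size must be a power of two");

/* Returns true when [start, start + size) begins and ends on cache line
 * boundaries, so that no cache line is shared with unrelated data. */
static bool buffer_owns_whole_lines(uintptr_t start, size_t size)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1U;
    return ((start & mask) == 0U) && (((uintptr_t)size & mask) == 0U);
}
```

A buffer declared with ALIGN_32BYTES whose size is a multiple of 32 bytes, such as the uint32_t start[256] array above (1024 bytes), passes this check.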
2.2. Placing data in noncacheable section of memory
If DMA is used, or if the application changes the data buffer frequently, it is recommended to place such data buffers in a noncacheable region, which is SYSRAM in the case of the STM32MP13x lines. This approach ensures data accuracy without the cost of repeated cache maintenance operations. To achieve this, create a section in the linker file, place it in a noncacheable area, and then assign the buffer to that section when declaring it in the C file.
Linker modification:
.tcp_sec (NOLOAD) :
{
  . = ALIGN(4);
  *(.RxDecripSection)
  . = ALIGN(4);
  *(.TxDecripSection)
  . = ALIGN(4);
  *(.NxServerPoolSection)
  . = ALIGN(4);
  *(.NetXPoolSection)
} >SYSRAM_BASE
C file modification:
ETH_DMADescTypeDef DMARxDscrTab[ETH_RX_DESC_CNT] __attribute__((section(".RxDecripSection"))); /* Ethernet Rx DMA Descriptors */
ETH_DMADescTypeDef DMATxDscrTab[ETH_TX_DESC_CNT] __attribute__((section(".TxDecripSection"))); /* Ethernet Tx DMA Descriptors */
For example code, refer to the applications in the STM32CubeMP13 package on GitHub [5].
3. References