
SWIOTLB mechanism overview



1. Article purpose

This article explains the Linux® kernel SWIOTLB (Software Input Output Translation Lookaside Buffer) mechanism. The SWIOTLB was introduced for platforms that embed DMA masters unable to address more than a 32-bit address space and that do not have a hardware IOMMU. The SWIOTLB can be seen as a software IOMMU. It is a native Linux® kernel feature, enabled by default.

2. STM32 memory space

To understand why the SWIOTLB feature is needed, the STM32 memory space must be examined. It is organized as follows:

  • The first 2 GB [0x0000 0000 - 0x7fff ffff] of the memory space is used for internal memories and internal peripherals.
  • The rest of the memory space is used for the DDR:
    • [0x8000 0000 - 0xffff ffff] for DDR up to 2 GB.
    • [0x8000 0000 - 0x1 7fff ffff] for DDR up to 4 GB.

The CPU (Arm® Cortex®-A35) and some peripherals are bus masters and can access the DDR directly. These peripherals are essentially the ones that embed a DMA: HPDMA, ETH, SDMMC, DCMIPP, USB3, USBH, LTDC, VDENC, VENC, PCIE.

STM32 Memory space

On STM32MP25x lines, all bus master peripherals (except the Arm® Cortex®-A35) are only 32-bit capable, meaning that they cannot access any address greater than 0xffffffff. This is not a problem for a DDR configuration of 2 GB or less, but it is for a DDR larger than 2 GB: if the application allocates a buffer above the first 2 GB of DDR, bus master peripherals cannot access it.
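The limit can be checked with a little arithmetic. The following is a minimal sketch, not taken from the kernel: the constants come from the memory map above, and the helper names (`ddr_last_address`, `unreachable_bytes`) are hypothetical. It computes how much of the DDR lies above the highest address a 32-bit bus master can reach.

```python
DDR_BASE = 0x8000_0000      # DDR start address, from the memory map above
MASK_32BIT = 0xFFFF_FFFF    # highest address a 32-bit bus master can emit
GIB = 1 << 30


def ddr_last_address(ddr_size: int) -> int:
    """Return the last physical address of a DDR of ddr_size bytes."""
    return DDR_BASE + ddr_size - 1


def unreachable_bytes(ddr_size: int, dma_mask: int = MASK_32BIT) -> int:
    """Return the number of DDR bytes located above dma_mask."""
    return max(0, ddr_last_address(ddr_size) - dma_mask)


if __name__ == "__main__":
    for size_gib in (1, 2, 4):
        size = size_gib * GIB
        print(f"{size_gib} GB DDR: last address {ddr_last_address(size):#x}, "
              f"{unreachable_bytes(size) // GIB} GB out of reach of a 32-bit master")
```

With a 4 GB DDR, the upper 2 GB sit above 0xffffffff, which is the area where a buffer becomes invisible to the 32-bit masters.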

The SWIOTLB mechanism solves this issue.

3. SWIOTLB feature

The SWIOTLB feature prevents a DMA from being handed a buffer that lies outside its addressable range. As soon as a dma_map_xxx API is called, the SWIOTLB code checks the DMA capability (32-bit or more) of the peripheral:

  • If the address of the buffer to transmit is within the range of the DMA capability, the buffer is used as is.
  • If the address of the buffer to transmit is higher than the DMA capability, the SWIOTLB copies the buffer into an area (the "aperture") that is accessible by the DMA. The DMA then accesses the newly allocated buffer (the "bounce buffer"). This copy is done using "memcpy", which can impact performance.
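The decision described in the two cases above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel implementation: the class `BounceModel`, its method `map_to_device`, and the aperture base address used in the demo are all hypothetical, and memory is represented by a plain dictionary.

```python
class BounceModel:
    """Toy model of the mapping decision; not the kernel implementation."""

    def __init__(self, memory: dict, aperture_base: int):
        self.memory = memory            # address -> bytes, stands in for the DDR
        self.next_free = aperture_base  # next free address in the aperture
        self.bounced = {}               # bounce address -> original address

    def map_to_device(self, addr: int, dma_mask: int) -> int:
        """Return the address the device must use to read the buffer."""
        data = self.memory[addr]
        if addr + len(data) - 1 <= dma_mask:
            return addr                 # buffer already reachable: no copy
        bounce = self.next_free
        self.next_free += len(data)
        self.memory[bounce] = bytes(data)  # the extra copy (memcpy in the kernel)
        self.bounced[bounce] = addr
        return bounce


if __name__ == "__main__":
    # Hypothetical addresses: one buffer below 4 GB, one above it.
    ddr = {0x9000_0000: b"low buffer", 0x1_2000_0000: b"high buffer"}
    model = BounceModel(ddr, aperture_base=0xF000_0000)
    for address in list(ddr):
        print(f"{address:#x} -> {model.map_to_device(address, 0xFFFF_FFFF):#x}")
```

In this model, only the buffer located above the 32-bit limit pays the cost of the copy; a device with a wider DMA mask would receive the original address unchanged.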

More details are given in the SWIOTLB tutorial [1].

STM32 Memory space


Useful information about SWIOTLB:

  • By default, the SWIOTLB reserves 64 MB of DDR for the aperture. If this is insufficient, the following kernel log appears:
swiotlb buffer is full (sz: 64 bytes), total 32768 (slots), used 32768 (slots)
  • This can be fixed by increasing the size of the SWIOTLB aperture through the kernel command line.
  • Add swiotlb=n, where n is the number of TLB slabs requested. On the STM32 platform, a slab is 2 KB. Therefore, if 128 MB is needed for the SWIOTLB aperture, set swiotlb=65536 in the kernel command line. Refer to the kernel command line documentation [2] for more details.
  • The SWIOTLB cannot copy a buffer larger than 256 KB. If a larger buffer has to be copied, the following kernel log appears:
swiotlb buffer is full (sz: 1081345 bytes), total 32768 (slots), used 80 (slots)
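The figures in the list above follow from simple arithmetic, sketched here. The 2 KB slab size and the 256 KB limit are the values stated in this article; the helper names (`swiotlb_slabs`, `fits_in_one_mapping`) are hypothetical and only serve the illustration.

```python
SLAB_SIZE = 2 * 1024        # one slab (slot) is 2 KB, as stated above
MAX_MAPPING = 256 * 1024    # largest buffer the SWIOTLB can copy, as stated above
MIB = 1 << 20


def swiotlb_slabs(aperture_bytes: int) -> int:
    """Return the value n to pass as swiotlb=n for a given aperture size."""
    return -(-aperture_bytes // SLAB_SIZE)  # ceiling division


def fits_in_one_mapping(buffer_bytes: int) -> bool:
    """Tell whether a buffer is small enough to be bounced in one mapping."""
    return buffer_bytes <= MAX_MAPPING


if __name__ == "__main__":
    print(f"default 64 MB aperture: {swiotlb_slabs(64 * MIB)} slabs")
    print(f"128 MB aperture: swiotlb={swiotlb_slabs(128 * MIB)}")
    print(f"1081345-byte buffer fits: {fits_in_one_mapping(1081345)}")
```

Note that the second log reports only 80 used slots out of 32768: in that case the failure comes from the size of the request (1081345 bytes), not from a full aperture, so enlarging the aperture would not help.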

4. Good practices

The SWIOTLB is an effective way to make all DMA transfers possible, whatever the buffer location, but, as mentioned earlier in this article, it adds an extra memcpy, which impacts performance.

To avoid this drawback, users must ensure that buffers are allocated in areas accessible by the DMA.
On STM32 platforms, this means avoiding buffer allocation beyond the first 2 GB of DDR. The buffer should be allocated with the GFP_DMA flag instead of GFP_KERNEL, or by using dma_alloc_coherent.

For more details, refer to the kernel DMA API documentation [3].

5. References

  1. [1] SWIOTLB tutorial
  2. [2] Kernel command line documentation
  3. [3] Linux® kernel DMA API documentation