Dmaengine overview

Applicable for STM32MP13x lines, STM32MP15x lines

This article provides basic information about the DMA engine and how STM32 DMA, DMAMUX and MDMA drivers are plugged into it.

1. Framework purpose

This article provides basic information about the DMA framework. However, it is worth browsing the kernel documentation related to DMA concepts[1].

Direct memory access (DMA) is a feature that allows some hardware subsystems to access memory independently of the central processing unit (CPU).
The DMA can transfer data between peripherals and memory or between memory and memory.

2. System overview

2.1. Component description

  • Peripheral DMA client drivers:

DMA clients are drivers that are mapped on the DMA API[2].

  • DMA engine:

The DMA engine is the engine core on which all clients rely.
Refer to DMA provider[1] for useful information on DMA internal behaviour.

  • Virtual DMA channel support:

The virtual DMA channel support manages virtual DMA channels and DMA request queues. This layer is not used by DMA clients.

  • STM32 xDMA driver:

The STM32 xDMA driver implements the DMA engine API for the corresponding controller.

  • STM32 DMAMUX driver:

The STM32 DMAMUX driver manages the DMA request multiplexer, which routes DMA request lines between the device peripherals and the DMA controllers.

  • DMAMUX, DMA and MDMA IP controller:

These are the STM32 DMA controllers, which handle data transfers between peripherals and memories, or between memories, connected to the same bus.

DMAMUX (DMA request router): DMAMUX internal peripheral
DMA: DMA internal peripheral
MDMA: MDMA internal peripheral

  • Peripheral clients:

Peripheral clients are peripherals where at least one DMA request line is mapped on DMAMUX.

  • Memories:

Memories can be either internal (e.g. SRAM, RETRAM or BCKRAM) or external (DDR).

2.2. APIs description

Please refer to DMA Engine API Guide[3] for a clear description of the DMA framework API.

In addition, going through Dynamic API[4] provides insight on the DMA memory allocation API. The client has to rely on this API to properly allocate DMA buffers so that they are processed by the DMA engine without any trouble.

The document Dynamic DMA mapping Guide[5] can be read in conjunction with the previous one.
It presents some examples and use cases.

3. Configuration

3.1. Kernel Configuration

The DMA engine and drivers are enabled through menuconfig (see Menuconfig or how to configure kernel):

For DMA:

Device Drivers -> 
    [*] DMA Engine support ->
        [*] STMicroelectronics STM32 DMA support

For DMAMUX:

Device Drivers -> 
    [*] DMA Engine support ->
        [*] STMicroelectronics STM32 dma multiplexer support

For MDMA:

Device Drivers -> 
    [*] DMA Engine support ->
        [*] STMicroelectronics STM32 master dma support

3.2. Device Tree configuration

The DT configuration can be done using STM32CubeMX.

Refer to the following articles for a description of the DT configuration:

4. How to use the framework

Refer to the DMA Engine API Guide[3] for an exhaustive description of the DMA engine client API.

4.1. Request a DMA channel

The device tree at STM32 level (arch/arm/boot/dts/stm32mp131.dtsi for STM32MP13x lines, arch/arm/boot/dts/stm32mp151.dtsi for STM32MP15x lines) contains the "dmas" and "dma-names" properties in the nodes of peripherals that have a DMA request line mapped.

Peripheral drivers just have to request one or more DMA channels, generally during probe.

#include <linux/dmaengine.h>
struct dma_chan *dma_request_chan(struct device *dev, const char *name);

Thanks to the name, dmaengine finds a channel matching the configuration specified in the dmas property.

struct dma_chan *chan_rx, *chan_tx;

chan_rx = dma_request_chan(&pdev->dev, "rx");
chan_tx = dma_request_chan(&pdev->dev, "tx");

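dma_request_chan() does not return NULL on failure: it returns an error pointer, for example when no channel is available or none fits the requested configuration. The peripheral driver must therefore check the returned value with IS_ERR() and, on error, either propagate it (notably -EPROBE_DEFER, when the DMA controller is not probed yet) or switch to interrupt mode.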
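The difference between an error pointer and NULL can be illustrated with a small userspace model. This is only a sketch: the model_* helpers are hypothetical re-implementations of the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and pick_mode() is an assumed name for the decision a probe() routine would take on the value returned by dma_request_chan().

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: simplified stand-ins for the kernel error-pointer
 * helpers. All names below are hypothetical, not kernel API. */
#define MODEL_MAX_ERRNO    4095
#define MODEL_ENODEV       19   /* same value as ENODEV */
#define MODEL_EPROBE_DEFER 517  /* kernel-internal code, absent from userspace errno.h */

static inline void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* An error pointer lives in the last MODEL_MAX_ERRNO bytes of the address space. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

enum xfer_mode { XFER_DMA, XFER_IRQ, XFER_DEFER };

/* Decide how to continue depending on the channel request result:
 * use DMA, retry the probe later, or fall back to interrupt mode. */
static enum xfer_mode pick_mode(const void *chan)
{
    if (!model_is_err(chan))
        return XFER_DMA;
    if (model_ptr_err(chan) == -MODEL_EPROBE_DEFER)
        return XFER_DEFER;      /* DMA controller not ready: propagate the error */
    return XFER_IRQ;            /* no usable channel: interrupt mode */
}
```

Note that model_err_ptr(-MODEL_ENODEV) is not NULL, which is why a plain NULL check would treat a failed request as a valid channel.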

4.2. Configure the DMA channel

Part of the channel configuration comes from the dmas property in the peripheral device tree node. Refer to the description in DMA controller Device Tree bindings. The dma_slave_config structure is also used to set up the channel. See the dma_slave_config structure definition in include/linux/dmaengine.h for an exhaustive description.

struct dma_slave_config {
	enum dma_transfer_direction direction;
	phys_addr_t src_addr;
	phys_addr_t dst_addr;
	enum dma_slave_buswidth src_addr_width;
	enum dma_slave_buswidth dst_addr_width;
	u32 src_maxburst;
	u32 dst_maxburst;
	u32 src_port_window_size;
	u32 dst_port_window_size;
	bool device_fc;
	unsigned int slave_id;
};

The source/destination addresses, address widths and maximum bursts are used by the DMA controller driver to configure the channel. The user should call dmaengine_slave_config() to pass this dma_slave_config structure to the DMA controller driver.

struct dma_slave_config config;

/* In case of Memory to Device (TX) */
memset(&config, 0, sizeof(config));
config.dst_addr = phy_addr + txdr_offset;
config.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
config.dst_maxburst = 1;
config.direction = DMA_MEM_TO_DEV;

/* In case of Device to Memory (RX/Capture) */
memset(&config, 0, sizeof(config));
config.src_addr = phy_addr + rxdr_offset;
config.src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
config.src_maxburst = 1;
config.direction = DMA_DEV_TO_MEM;

int dmaengine_slave_config(struct dma_chan *chan, struct dma_slave_config *config);

4.3. Configure the DMA transfer

The DMA engine transfer API must be used to prepare the DMA transfer. Three modes are supported by the STM32 DMA controller drivers:

  • slave_sg: prepare a transfer of a scatter-gather list of buffers from/to a peripheral
  • dma_cyclic: prepare a cyclic operation from/to a peripheral until the operation is stopped by the user
  • dma_memcpy: prepare a memcpy operation (rarely used except by dmatest)

struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
           struct dma_chan *chan, struct scatterlist *sgl,
           unsigned int sg_len, enum dma_transfer_direction direction,
           unsigned long flags);

struct dma_async_tx_descriptor *dmaengine_prep_dma_cyclic(
           struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
           size_t period_len, enum dma_transfer_direction direction,
           unsigned long flags);

struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
           struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
           size_t len, unsigned long flags);

A peripheral driver completion callback can be set up using the callback* fields of the dma_async_tx_descriptor returned by the dmaengine_prep* function.

struct dma_async_tx_descriptor *txdesc;

txdesc = dmaengine_prep_...
txdesc->callback = peripheral_driver_dma_callback;
txdesc->callback_param = peripheral_dev;
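For the cyclic mode, the relation between buf_len, period_len and the completion callback can be sketched with plain arithmetic. The two helpers below are hypothetical and run in userspace. They assume that the callback is invoked once per completed period and that the controller driver requires buf_len to be a non-zero multiple of period_len, which is a common driver check but should be verified against the driver in use.

```c
#include <stddef.h>

/* Number of periods in a cyclic buffer, or 0 if the layout is unusable.
 * Assumption: buf_len must be a non-zero multiple of period_len. */
static size_t cyclic_period_count(size_t buf_len, size_t period_len)
{
    if (period_len == 0 || buf_len == 0 || buf_len % period_len != 0)
        return 0;
    return buf_len / period_len;
}

/* Byte offset, inside the cyclic buffer, of the period that has just
 * completed, given the number of period callbacks received so far
 * (1 for the first callback). Returns 0 for an unusable layout. */
static size_t cyclic_completed_offset(size_t callbacks_seen,
                                      size_t buf_len, size_t period_len)
{
    size_t periods = cyclic_period_count(buf_len, period_len);

    if (periods == 0 || callbacks_seen == 0)
        return 0;
    /* The buffer wraps around after the last period. */
    return ((callbacks_seen - 1) % periods) * period_len;
}
```

With a 4096-byte buffer split into 1024-byte periods, the fifth callback refers again to the period at offset 0, because the transfer has wrapped around.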

4.4. Submit the DMA transfer

Once the transfer is prepared, it can be submitted for execution. dmaengine_submit() adds it to the pending queue and returns a cookie. Passing this cookie to dma_submit_error() tells whether the submission failed.

dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc);
static inline int dma_submit_error(dma_cookie_t cookie);

ret = dma_submit_error(dmaengine_submit(desc));

Then the transfer can be started with dma_async_issue_pending(). If the channel is idle, the first transfer in the queue is started.

void dma_async_issue_pending(struct dma_chan *chan);

On completion of each DMA transfer, a DMA interrupt is raised, then the next transfer in the queue is started and a tasklet is triggered. When scheduled, this tasklet calls the peripheral driver completion callback, if set.
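The submit / issue / complete sequence described above can be pictured with a toy userspace model of one channel. It is a sketch only: every model_* name is hypothetical, the queue depth is arbitrary, and the model assumes that a transfer submitted after the last dma_async_issue_pending() call waits for the next call before it can start.

```c
#include <stddef.h>

#define MODEL_QUEUE_DEPTH 8     /* arbitrary depth for this model */

/* Toy model of one DMA channel (not kernel code). */
struct model_chan {
    int queue[MODEL_QUEUE_DEPTH]; /* cookies, in submission order */
    size_t head;                  /* index of the oldest queued cookie */
    size_t count;                 /* number of queued transfers */
    size_t issued;                /* queued transfers allowed to start */
    int active;                   /* cookie being transferred, 0 when idle */
    int completed;                /* last completed cookie */
    int next_cookie;
};

/* Models dmaengine_submit(): queue the transfer, return its cookie
 * (a negative value means the submission failed). */
static int model_submit(struct model_chan *c)
{
    int cookie;

    if (c->count == MODEL_QUEUE_DEPTH)
        return -1;
    cookie = ++c->next_cookie;
    c->queue[(c->head + c->count) % MODEL_QUEUE_DEPTH] = cookie;
    c->count++;
    return cookie;
}

/* Start the oldest issued transfer if the channel is idle. */
static void model_start_next(struct model_chan *c)
{
    if (c->active || c->issued == 0)
        return;
    c->active = c->queue[c->head];
    c->head = (c->head + 1) % MODEL_QUEUE_DEPTH;
    c->count--;
    c->issued--;
}

/* Models dma_async_issue_pending(): everything queued so far may run. */
static void model_issue_pending(struct model_chan *c)
{
    c->issued = c->count;
    model_start_next(c);
}

/* Models the end-of-transfer interrupt: record the completion (where the
 * callback would be scheduled) and start the next issued transfer. */
static void model_transfer_done(struct model_chan *c)
{
    if (!c->active)
        return;
    c->completed = c->active;
    c->active = 0;
    model_start_next(c);
}
```

In this model, submitting two transfers leaves the channel idle until model_issue_pending() is called. The second transfer then starts automatically when the first one completes.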

4.5. Terminate the DMA transfer

Two variants are available to force the DMA channel to stop any ongoing transfer. No completion callback is called for incomplete transfers, and data in the DMA controller FIFO may be lost. Refer to the DMA Engine API Guide[3] for more details.

  • dmaengine_terminate_async(): can be called from atomic context or from within a completion callback;
  • dmaengine_terminate_sync(): must not be called from atomic context or from within a completion callback.

int dmaengine_terminate_sync(struct dma_chan *chan)
int dmaengine_terminate_async(struct dma_chan *chan)

dmaengine_synchronize() should be used after dmaengine_terminate_async() and outside atomic context or completion callback, to synchronize the termination of the DMA channel to the current context. The function waits for the ongoing transfer and any completion callback to finish before it returns.

void dmaengine_synchronize(struct dma_chan *chan)

4.6. Release the DMA channel

The peripheral driver can ask for new transfers or simply release the channel if it is no longer needed. This is typically done in the peripheral driver remove().

void dma_release_channel(struct dma_chan *chan)

5. How to trace and debug the framework

5.1. How to trace

Through menuconfig, enable DMA engine debugging and DMA engine verbose debugging (including STM32 drivers):

Device Drivers -> 
    [*] DMA Engine support ->
        [*] DMA Engine debugging
        [*]   DMA Engine verbose debugging (NEW)

5.2. How to debug

5.2.1. sysfs

sysfs entries can be used to browse the available DMA channels.

More information can be found in sysfs.

The following command lists all the registered DMA channels:

Board $> ls /sys/class/dma/
dma0chan0   dma0chan13  dma0chan18  dma0chan22  dma0chan27  dma0chan31  dma0chan8  dma1chan3  dma2chan0  dma2chan5
dma0chan1   dma0chan14  dma0chan19  dma0chan23  dma0chan28  dma0chan4   dma0chan9  dma1chan4  dma2chan1  dma2chan6
dma0chan10  dma0chan15  dma0chan2   dma0chan24  dma0chan29  dma0chan5   dma1chan0  dma1chan5  dma2chan2  dma2chan7
dma0chan11  dma0chan16  dma0chan20  dma0chan25  dma0chan3   dma0chan6   dma1chan1  dma1chan6  dma2chan3
dma0chan12  dma0chan17  dma0chan21  dma0chan26  dma0chan30  dma0chan7   dma1chan2  dma1chan7  dma2chan4

Each channel is expanded as follows:

Board $> ls -la /sys/class/dma/dma0chan0/
total 0
drwxr-xr-x  3 root root    0 Jun  7 21:22 .
drwxr-xr-x 34 root root    0 Jun  7 21:22 ..
-r--r--r--  1 root root 4096 Jun  9 13:11 bytes_transferred
lrwxrwxrwx  1 root root    0 Jun  9 13:11 device -> ../../../58000000.dma
-r--r--r--  1 root root 4096 Jun  9 13:11 in_use
-r--r--r--  1 root root 4096 Jun  9 13:11 memcpy_count
drwxr-xr-x  2 root root    0 Jun  9 13:11 power
lrwxrwxrwx  1 root root    0 Jun  9 13:11 subsystem -> ../../../../../../class/dma
-rw-r--r--  1 root root 4096 Jun  7 21:22 uevent

device indicates which DMA driver manages the channel.

Reading in_use indicates whether the channel has been allocated or not.

Board $> cat /sys/class/dma/dma0chan0/in_use
1
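The same information can be collected from a small userspace program. The sketch below relies only on the entry names and the in_use file shown above. The function names are hypothetical, and the program simply reports -1 when run on a machine where the sysfs file cannot be read.

```c
#include <stdio.h>

/* Split a /sys/class/dma entry name such as "dma0chan13" into its DMA
 * device index and channel index. Returns 0 on success, -1 otherwise. */
static int parse_dma_chan_name(const char *name,
                               unsigned int *dev, unsigned int *chan)
{
    int consumed = 0;

    if (sscanf(name, "dma%uchan%u%n", dev, chan, &consumed) != 2)
        return -1;
    /* Reject names with trailing characters after the channel index. */
    return name[consumed] == '\0' ? 0 : -1;
}

/* Read the in_use value of one channel, for example "dma0chan0".
 * Returns -1 if the file cannot be opened or parsed. */
static long read_in_use(const char *chan_name)
{
    char path[128];
    long value = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/dma/%s/in_use", chan_name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A caller would typically iterate over the directory entries of /sys/class/dma, keep those accepted by parse_dma_chan_name(), and print read_in_use() for each of them.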

5.2.2. Debugfs

debugfs entries are available. Via /sys/kernel/debug/dmaengine, users can get information about the DMA devices and the channels in use.

root@stm32mp1:~# cat /sys/kernel/debug/dmaengine/summary 
dma0 (58000000.dma-controller): number of channels: 32
 dma0chan0    | 48000000.dma-controller:ch0
 dma0chan1    | 48000000.dma-controller:ch1
 dma0chan2    | 48000000.dma-controller:ch2
 dma0chan3    | 48000000.dma-controller:ch3
 dma0chan4    | 48000000.dma-controller:ch4
 dma0chan5    | 48000000.dma-controller:ch5
 dma0chan6    | 48000000.dma-controller:ch6
 dma0chan7    | 48000000.dma-controller:ch7
 dma0chan8    | 48001000.dma-controller:ch0
 dma0chan9    | 48001000.dma-controller:ch1
 dma0chan10   | 48001000.dma-controller:ch2
 dma0chan11   | 48001000.dma-controller:ch3
 dma0chan12   | 48001000.dma-controller:ch4
 dma0chan13   | 48001000.dma-controller:ch5
 dma0chan14   | 48001000.dma-controller:ch6
 dma0chan15   | 48001000.dma-controller:ch7
 dma0chan16   | 54002000.hash:in

dma1 (48000000.dma-controller): number of channels: 8
 dma1chan0    | 4000e000.serial:rx (via router: 48002000.dma-router)
 dma1chan1    | 4000e000.serial:tx (via router: 48002000.dma-router)
 dma1chan2    | 4000b000.audio-controller:tx (via router: 48002000.dma-router)
 dma1chan3    | 4000b000.audio-controller:rx (via router: 48002000.dma-router)
 dma1chan4    | 4400b004.audio-controller:tx (via router: 48002000.dma-router)
 dma1chan5    | 4400b024.audio-controller:rx (via router: 48002000.dma-router)

dma2 (48001000.dma-controller): number of channels: 8
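When this summary has to be processed by a tool, each channel line can be split into its fields. The parser below is a sketch that assumes the line layout shown in the output above (channel name, a "|" separator, the client name and an optional "via router" suffix). The structure and function names are hypothetical, and the layout may differ between kernel versions.

```c
#include <stdio.h>
#include <string.h>

/* One channel line of the dmaengine debugfs summary. */
struct summary_entry {
    char chan[32];     /* e.g. "dma1chan0" */
    char client[64];   /* e.g. "4000e000.serial:rx" */
    char router[64];   /* e.g. "48002000.dma-router", empty if none */
};

/* Parse one line of the summary. Returns 0 for a channel line and -1 for
 * any other line (device header, blank line). */
static int parse_summary_line(const char *line, struct summary_entry *e)
{
    const char *tag = "(via router: ";
    const char *r;

    e->router[0] = '\0';
    /* Channel lines look like " <chan> | <client> ...". */
    if (sscanf(line, " %31s | %63s", e->chan, e->client) != 2)
        return -1;

    /* The router, when present, is enclosed in parentheses. */
    r = strstr(line, tag);
    if (r && sscanf(r + strlen(tag), "%63[^)]", e->router) != 1)
        e->router[0] = '\0';
    return 0;
}
```

Device header lines such as "dma1 (48000000.dma-controller): number of channels: 8" are rejected because they do not contain the "|" separator after the first word.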

Other DMA debugfs entries are available when the Linux kernel is compiled with the "Enable debugging of DMA-API usage" configuration option. They are documented in Part III - Debug drivers use of the DMA-API[4].

5.2.3. dmatest

dmatest can be used to validate or debug the DMA engine and drivers without using client devices. This module is more a test module than a debug module. It performs a memory-to-memory copy using the standard DMA engine API.

For details on how to use this kernel module, refer to [6].

6. Source code location

DMA: drivers/dma/stm32-dma.c
MDMA: drivers/dma/stm32-mdma.c
DMAMUX: drivers/dma/stm32-dmamux.c

DMA engine:

7. To go further

Very useful documentation can be found in the DMAEngine documentation.

8. References