Dmaengine overview

Revision as of 10:59, 25 November 2021 by Registered User (Merge articles)
Applicable for STM32MP13x lines, STM32MP15x lines

This article provides basic information about the DMA engine and how STM32 DMA, DMAMUX and MDMA drivers are plugged into it.

1. Framework purpose

For additional information about the DMA framework, browse the kernel documentation related to the DMA concept[1].

Direct memory access (DMA) is a feature that allows some hardware subsystems to access memory independently of the central processing unit (CPU).
DMA can transfer data between peripherals and memory, or from memory to memory.

2. System overview

2.1. Component description

  • Peripheral DMA client drivers:

DMA clients are peripheral drivers that rely on the DMA API[2].

  • DMA engine:

The DMA engine is the engine core on which all clients rely.
Refer to DMA provider[1] for useful information on DMA internal behavior.

  • Virtual DMA channel support:

The virtual DMA channel support manages virtual DMA channels and DMA request queues. This layer is not used by DMA clients.

  • STM32 xDMA driver:

The STM32 xDMA driver implements the DMA engine API.

  • STM32 DMAMUX driver:

The STM32 DMAMUX driver manages the DMA request multiplexer, which routes DMA request lines between the device peripherals and the DMA controllers.

  • DMAMUX, DMA and MDMA IP controller:

These are the STM32 DMA controllers, which handle data transfers between peripherals and memories, or between memories, connected to the same bus.

DMAMUX (DMA request router): DMAMUX internal peripheral
DMA: DMA internal peripheral
MDMA: MDMA internal peripheral

  • Peripheral clients:

Peripheral clients are peripherals where at least one DMA request line is mapped on DMAMUX.

  • Memories:

Memories can be either internal (such as SRAM, RETRAM or backup RAM) or external (DDR memories).

2.2. API description

Refer to DMA Engine API Guide[3] for a clear description of the DMA framework API.

In addition, going through Dynamic API[4] provides insight on the DMA memory allocation API. The client has to rely on this API to properly allocate DMA buffers so that they are processed by the DMA engine without any trouble.

The document Dynamic DMA mapping Guide[5] presents some examples and use cases.
It can be read in conjunction with the previous one.

3. Configuration

3.1. Kernel configuration

The DMA engine and drivers are enabled through menuconfig (see Menuconfig or how to configure kernel):

For DMA:

Device drivers -> 
    [*] DMA engine support ->
        [*] STMicroelectronics STM32 DMA support

For DMAMUX:

Device drivers -> 
    [*] DMA engine support ->
        [*] STMicroelectronics STM32 DMA multiplexer support

For MDMA:

Device drivers -> 
    [*] DMA engine support ->
        [*] STMicroelectronics STM32 master DMA support

3.2. Device tree configuration

The device tree (DT) configuration can be done using STM32CubeMX.

Refer to the following articles for a description of the DT configuration:

4. How to use the framework

Refer to the DMA Engine API Guide[3] for an exhaustive description of the DMA engine client API.

4.1. Request a DMA channel

The device tree configuration at STM32 level (arch/arm/boot/dts/stm32mp131.dtsi for STM32MP13x lines, arch/arm/boot/dts/stm32mp151.dtsi for STM32MP15x lines) contains the "dmas" and "dma-names" properties in the peripheral nodes that have a DMA request line mapped.

The peripheral drivers just have to request one or more DMA channels. This is generally done during probe.

#include <linux/dmaengine.h>
struct dma_chan *dma_request_chan(struct device *dev, const char *name);

Based on this name, the DMA engine finds a channel that matches the configuration specified in the corresponding entry of the dmas property.

struct dma_chan *chan_rx, *chan_tx;

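chan_rx = dma_request_chan(&pdev->dev, "rx");
chan_tx = dma_request_chan(&pdev->dev, "tx");

dma_request_chan() never returns NULL. On failure it returns an error pointer, such as -ENODEV if there are no more available channels or none of them fits the requested configuration, or -EPROBE_DEFER if the DMA controller has not been probed yet. The peripheral driver must check the returned channel with IS_ERR() and, unless the probe has to be deferred, switch to interrupt mode.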

4.2. Configure the DMA channel

Part of the channel configuration comes from the dmas property in the peripheral device tree node. Refer to the description in DMA controller device tree bindings. The dma_slave_config structure is also used to set up the channel. Refer to its definition in include/linux/dmaengine.h for an exhaustive description.

struct dma_slave_config {
	enum dma_transfer_direction direction;
	phys_addr_t src_addr;
	phys_addr_t dst_addr;
	enum dma_slave_buswidth src_addr_width;
	enum dma_slave_buswidth dst_addr_width;
	u32 src_maxburst;
	u32 dst_maxburst;
	u32 src_port_window_size;
	u32 dst_port_window_size;
	bool device_fc;
	unsigned int slave_id;
};

Source/Destination addresses, Source/Destination address width, Source/Destination maximum burst are used by the DMA controller driver to configure the channel. The user must use dmaengine_slave_config() to set this dma_slave_config structure in the DMA controller driver.

struct dma_slave_config config;

/* In case of memory to device (TX) */
memset(&config, 0, sizeof(config));
config.dst_addr = phy_addr + txdr_offset;
config.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
config.dst_maxburst = 1;
config.direction = DMA_MEM_TO_DEV;

/* In case of device to memory (RX/Capture) */
memset(&config, 0, sizeof(config));
config.src_addr = phy_addr + rxdr_offset;
config.src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
config.src_maxburst = 1;
config.direction = DMA_DEV_TO_MEM;

int dmaengine_slave_config(struct dma_chan *chan, struct dma_slave_config *config);

4.3. Configure the DMA transfer

The DMA engine transfer API must be used to prepare the DMA transfer. Three modes are supported by STM32 DMA controller drivers:

  • slave_sg: prepares a transfer of a list of scatter-gather buffers from/to a peripheral
  • dma_cyclic: prepares a cyclic operation from/to a peripheral until the operation is stopped by the user
  • dma_memcpy: prepares a memcpy operation (seldom used except by dmatest)

struct dma_async_tx_descriptor *dmaengine_prep_slave_sg(
           struct dma_chan *chan, struct scatterlist *sgl,
           unsigned int sg_len, enum dma_transfer_direction direction,
           unsigned long flags);

struct dma_async_tx_descriptor *dmaengine_prep_dma_cyclic(
           struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
           size_t period_len, enum dma_transfer_direction direction,
           unsigned long flags);

struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy(
           struct dma_chan *chan, dma_addr_t dst, dma_addr_t src,
           size_t len, unsigned long flags);

A peripheral driver completion callback can be set up using the callback* fields of the dma_async_tx_descriptor returned by the dmaengine_prep* function.

struct dma_async_tx_descriptor *txdesc;

txdesc = dmaengine_prep_...
txdesc->callback = peripheral_driver_dma_callback;
txdesc->callback_param = peripheral_dev;

4.4. Submit the DMA transfer

Once the transfer is prepared, it can be submitted for execution. dmaengine_submit() adds it to the pending queue and returns a cookie, which is passed to dma_submit_error() to check whether the submission failed.

dma_cookie_t dmaengine_submit(struct dma_async_tx_descriptor *desc);
static inline int dma_submit_error(dma_cookie_t cookie);

ret = dma_submit_error(dmaengine_submit(desc));
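
As an illustration of how the cookie is interpreted, the following userspace sketch models the check performed by dma_submit_error(): a negative cookie carries an error code, any other value identifies a queued descriptor. The typedef and the function name below are stand-ins written for this example and assume a signed 32-bit cookie. In a driver, use the helper provided by include/linux/dmaengine.h.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel type (assumption: signed 32-bit cookie). */
typedef int32_t dma_cookie_t;

/*
 * Hypothetical model of the kernel helper:
 * returns the negative error code carried by the cookie, or 0 when the
 * cookie identifies a successfully queued descriptor.
 */
static int dma_submit_error_model(dma_cookie_t cookie)
{
	return cookie < 0 ? cookie : 0;
}
```

With this model, a cookie of 5 gives 0 (no error), while a cookie of -16 is returned unchanged as the error code.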

The transfer can then be started using dma_async_issue_pending(). If the channel is idle, the first transfer in the queue is started.

void dma_async_issue_pending(struct dma_chan *chan);

On completion of each DMA transfer, a DMA interrupt is raised, then the next transfer in the queue is started and a tasklet is triggered. When scheduled, this tasklet calls the peripheral driver completion callback, provided it is set.

4.5. Terminate the DMA transfer

Two variants are available to force the DMA channel to stop an ongoing transfer. No completion callback is called for an incomplete transfer and the data in DMA controller FIFO may be lost. Refer to the DMA Engine API Guide[3] for more details.

  • dmaengine_terminate_async(): this function can be called from atomic context or from within a completion callback;
  • dmaengine_terminate_sync(): this function must not be called from atomic context or from within a completion callback.

int dmaengine_terminate_sync(struct dma_chan *chan);
int dmaengine_terminate_async(struct dma_chan *chan);

dmaengine_synchronize() must be used after dmaengine_terminate_async(), outside atomic context and outside any completion callback, to synchronize the termination of the DMA channel with the current context. The function waits for the completion of the ongoing transfer and of any callback before returning.

void dmaengine_synchronize(struct dma_chan *chan);

4.6. Release the DMA channel

The peripheral driver can ask for new transfers or simply release the channel if it is no longer needed. This is typically done in the peripheral driver remove() function.

void dma_release_channel(struct dma_chan *chan);

5. How to trace and debug the framework

5.1. How to trace

Through menuconfig, enable DMA engine debugging and DMA engine verbose debugging (including STM32 drivers):

Device Drivers -> 
    [*] DMA Engine support ->
        [*] DMA Engine debugging
        [*]   DMA Engine verbose debugging (NEW)

5.2. How to debug

5.2.1. sysfs

sysfs entries can be used to browse the available DMA channels.

More information can be found in sysfs.

The following command lists all the registered DMA channels:

Board $> ls /sys/class/dma/
dma0chan0   dma0chan13  dma0chan18  dma0chan22  dma0chan27  dma0chan31  dma0chan8  dma1chan3  dma2chan0  dma2chan5
dma0chan1   dma0chan14  dma0chan19  dma0chan23  dma0chan28  dma0chan4   dma0chan9  dma1chan4  dma2chan1  dma2chan6
dma0chan10  dma0chan15  dma0chan2   dma0chan24  dma0chan29  dma0chan5   dma1chan0  dma1chan5  dma2chan2  dma2chan7
dma0chan11  dma0chan16  dma0chan20  dma0chan25  dma0chan3   dma0chan6   dma1chan1  dma1chan6  dma2chan3
dma0chan12  dma0chan17  dma0chan21  dma0chan26  dma0chan30  dma0chan7   dma1chan2  dma1chan7  dma2chan4
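
The entries follow a dma<X>chan<Y> naming pattern, where X is the index of the DMA device and Y the index of the channel on that device. ls sorts them alphabetically, so dma0chan10 appears before dma0chan2. As a minimal userspace sketch (the structure and function names are hypothetical, not part of any kernel or library API), the names can be parsed and then sorted numerically as follows:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper type: one entry of /sys/class/dma/. */
struct dma_chan_name {
	int dev;   /* X in dmaXchanY: index of the DMA device */
	int chan;  /* Y in dmaXchanY: index of the channel on that device */
};

/*
 * Parse a sysfs entry name such as "dma0chan13".
 * Returns 0 on success, -1 if the name does not follow the pattern.
 */
static int parse_dma_chan_name(const char *name, struct dma_chan_name *out)
{
	int consumed = 0;

	if (sscanf(name, "dma%dchan%d%n", &out->dev, &out->chan, &consumed) != 2)
		return -1;
	/* Reject trailing characters and negative indexes. */
	if (name[consumed] != '\0' || out->dev < 0 || out->chan < 0)
		return -1;
	return 0;
}

/* qsort() comparator: order by device index, then by channel index. */
static int cmp_dma_chan_name(const void *a, const void *b)
{
	const struct dma_chan_name *x = a, *y = b;

	if (x->dev != y->dev)
		return x->dev < y->dev ? -1 : 1;
	if (x->chan != y->chan)
		return x->chan < y->chan ? -1 : 1;
	return 0;
}
```

For example, parsing "dma0chan13" yields device index 0 and channel index 13, while an entry that does not follow the pattern is rejected. An array of parsed entries can be passed to qsort() with cmp_dma_chan_name to list the channels in numerical order.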

Each channel is expanded as follows:

Board $> ls -la /sys/class/dma/dma0chan0/
total 0
drwxr-xr-x  3 root root    0 Jun  7 21:22 .
drwxr-xr-x 34 root root    0 Jun  7 21:22 ..
-r--r--r--  1 root root 4096 Jun  9 13:11 bytes_transferred
lrwxrwxrwx  1 root root    0 Jun  9 13:11 device -> ../../../58000000.dma
-r--r--r--  1 root root 4096 Jun  9 13:11 in_use
-r--r--r--  1 root root 4096 Jun  9 13:11 memcpy_count
drwxr-xr-x  2 root root    0 Jun  9 13:11 power
lrwxrwxrwx  1 root root    0 Jun  9 13:11 subsystem -> ../../../../../../class/dma
-rw-r--r--  1 root root 4096 Jun  7 21:22 uevent

device indicates which DMA driver manages the channel.

Reading in_use indicates whether the channel has been allocated or not.

Board $> cat /sys/class/dma/dma0chan0/in_use
1

5.2.2. Debugfs

debugfs entries are available. The user can get information about the DMA devices and the channels in use through the /sys/kernel/debug/dmaengine directory.

root@stm32mp1:~# cat /sys/kernel/debug/dmaengine/summary 
dma0 (58000000.dma-controller): number of channels: 32
 dma0chan0    | 48000000.dma-controller:ch0
 dma0chan1    | 48000000.dma-controller:ch1
 dma0chan2    | 48000000.dma-controller:ch2
 dma0chan3    | 48000000.dma-controller:ch3
 dma0chan4    | 48000000.dma-controller:ch4
 dma0chan5    | 48000000.dma-controller:ch5
 dma0chan6    | 48000000.dma-controller:ch6
 dma0chan7    | 48000000.dma-controller:ch7
 dma0chan8    | 48001000.dma-controller:ch0
 dma0chan9    | 48001000.dma-controller:ch1
 dma0chan10   | 48001000.dma-controller:ch2
 dma0chan11   | 48001000.dma-controller:ch3
 dma0chan12   | 48001000.dma-controller:ch4
 dma0chan13   | 48001000.dma-controller:ch5
 dma0chan14   | 48001000.dma-controller:ch6
 dma0chan15   | 48001000.dma-controller:ch7
 dma0chan16   | 54002000.hash:in

dma1 (48000000.dma-controller): number of channels: 8
 dma1chan0    | 4000e000.serial:rx (via router: 48002000.dma-router)
 dma1chan1    | 4000e000.serial:tx (via router: 48002000.dma-router)
 dma1chan2    | 4000b000.audio-controller:tx (via router: 48002000.dma-router)
 dma1chan3    | 4000b000.audio-controller:rx (via router: 48002000.dma-router)
 dma1chan4    | 4400b004.audio-controller:tx (via router: 48002000.dma-router)
 dma1chan5    | 4400b024.audio-controller:rx (via router: 48002000.dma-router)

dma2 (48001000.dma-controller): number of channels: 8
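
Each channel line of the summary holds the channel name, the client device with its dma-names entry and, when the request goes through DMAMUX, the router device. The following userspace sketch splits such a line into these fields. It assumes the line layout shown in the sample output above, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one channel line of the summary. */
struct dma_chan_usage {
	char chan[32];    /* channel name, e.g. "dma1chan0" */
	char client[64];  /* client device and DMA name, e.g. "4000e000.serial:rx" */
	char router[64];  /* router device, empty string when there is none */
};

/*
 * Split a channel line of the summary into its fields.
 * Returns 0 on success, -1 for any other line (controller header, empty line).
 */
static int parse_summary_line(const char *line, struct dma_chan_usage *u)
{
	int n;

	memset(u, 0, sizeof(*u));
	/* The router part is optional: 2 or 3 fields may be converted. */
	n = sscanf(line, " %31s | %63s (via router: %63[^)])",
		   u->chan, u->client, u->router);
	if (n < 2 || strncmp(u->chan, "dma", 3) != 0)
		return -1;
	return 0;
}
```

Controller header lines such as "dma1 (48000000.dma-controller): number of channels: 8" and empty lines are rejected, so the function can be applied to every line read from the summary file.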

Other DMA debugfs entries are available when the Linux® kernel is compiled with the "Enable debugging of DMA-API usage" configuration option. They are documented in Part III - Debug drivers use of the DMA-API[4].

5.2.3. dmatest

dmatest can be used to validate or debug the DMA engine and drivers without using client devices. It is more a test module than a debug module. It performs memory-to-memory copies using the standard DMA engine API.

For details on how to use this kernel module, refer to [6].

6. Source code location

DMA: drivers/dma/stm32-dma.c
MDMA: drivers/dma/stm32-mdma.c
DMAMUX: drivers/dma/stm32-dmamux.c

DMA engine:

7. To go further

Very useful documentation can be found in the DMAEngine documentation.

8. References