How to exchange data buffers with the coprocessor

Revision as of 18:49, 8 July 2019 by Registered User



1. Article purpose

This article explains how to exchange large data buffers between Cortex M4 and Cortex A7.


2. Introduction

A virtual UART over RPMSG is a good solution for exchanging small or medium amounts of data between Cortex M4 and Cortex A7.

Experiments have demonstrated transfer rates of up to 7MB/s from the M4 firmware to a user land application.

When more than 7MB/s has to be exchanged, RPMSG is no longer suitable and a DMA transfer is required.

DMA also greatly reduces the CPU load caused by copying data.

However, a DMA-based solution requires:

  • contiguous memory allocation
  • the physical address and size of each buffer to be known on Cortex M4
  • an mmap of the buffer so that the Linux user land application can access it

A dedicated Linux driver has to be developed to handle these constraints.

3. Static architecture of the big data solution

The big data solution is composed of:

  • Linux user land application
  • Linux RPMSG sdb (shared data buffer) driver
  • Linux RPMSG tty driver


4. System use case to demonstrate

Let's implement a logic analyser running on the STM32MP1 discovery board. Since many GPIO port E pins are available on the Arduino connector, we will sample PE8..15. To put an intensive data-processing load on Cortex M4, the M4 firmware is responsible for:

  • receiving a Start/Stop sampling command that includes the sampling frequency
  • sampling the data at the requested sampling frequency
  • filtering input data to keep only bits 8..12
  • compressing data using bits 13..15
    • bits 13..15 represent a repetition factor:
      • 000: repetition=0, meaning the data is present once
      • 111: repetition=7, meaning the data is present eight times
  • transferring the packed data to a DDR buffer in packets of 1024 bytes using DMA
  • informing the user interface when a DDR buffer of 1 megabyte is filled
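
The filtering and compression steps above can be sketched as a small run-length packer. This is only an illustration, not the firmware's actual code: the function name la_pack and its signature are hypothetical, and it assumes each raw sample is a 16-bit read of port E, with the packed byte holding PE8..PE12 in bits 0..4 and the repetition factor in bits 5..7.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack raw 16-bit port E samples into bytes.
 * Bits 4..0 of each output byte hold PE12..PE8, bits 7..5 hold the
 * repetition factor (0 = value present once, 7 = present eight times).
 * Packing stops early if the output buffer is full.
 * Returns the number of packed bytes written. */
size_t la_pack(const uint16_t *samples, size_t n, uint8_t *out, size_t out_cap)
{
    size_t written = 0;
    size_t i = 0;

    while (i < n && written < out_cap) {
        /* Keep only bits 8..12 of the sample. */
        uint8_t value = (uint8_t)((samples[i] >> 8) & 0x1Fu);
        unsigned rep = 0;

        i++;
        /* Absorb up to 7 further identical samples into this byte. */
        while (i < n && rep < 7u &&
               (uint8_t)((samples[i] >> 8) & 0x1Fu) == value) {
            rep++;
            i++;
        }
        out[written++] = (uint8_t)((rep << 5) | value);
    }
    return written;
}
```

With this layout, a run of ten identical samples is stored as two bytes: one with repetition 7 (eight samples) and one with repetition 1 (two samples).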


A user interface, running in the Linux application, is available as a web page.


The user interface allows the user to control:

  • the sampling frequency
  • the start / stop of the sampling

The user interface displays statistics of:

  • number of packed data received by user interface
  • number of unpacked data decompressed by user interface
  • number of packed data written by user interface in file system

The user interface saves packed data in a binary file.

5. Behavior

At startup, the application has to:

  • load the Cortex M4 firmware
  • open the rpmsg_sdb driver
  • use the ioctl function of the rpmsg_sdb driver to allocate and mmap 3 buffers of 1MB each in DDR memory
  • open the rpmsg_tty driver
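
The mapping part of this startup sequence might look like the sketch below. It is not the application's real code. The names la_map_buffers, LA_NB_BUFFERS and LA_BUFFER_SIZE are hypothetical, and the convention of selecting buffer i through the mmap offset is an assumption. The ioctl request codes are defined by the rpmsg_sdb driver and are deliberately not reproduced here.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

#define LA_NB_BUFFERS  3              /* 3 buffers, as described above */
#define LA_BUFFER_SIZE (1024 * 1024)  /* 1MB each */

/* Hypothetical helper: map `count` buffers of `size` bytes from the
 * already opened descriptor `fd`. ASSUMPTION: buffer i is selected by
 * the mmap offset i * size; the real driver may use another convention.
 * Returns 0 on success, or -1 after unmapping any partial result. */
int la_map_buffers(int fd, void *bufs[], size_t count, size_t size)
{
    for (size_t i = 0; i < count; i++) {
        bufs[i] = mmap(NULL, size, PROT_READ, MAP_SHARED, fd,
                       (off_t)(i * size));
        if (bufs[i] == MAP_FAILED) {
            while (i-- > 0)
                munmap(bufs[i], size);
            return -1;
        }
    }
    return 0;
}
```

The application would call this once after opening the driver's device node, keep the returned pointers for the whole session, and release them with munmap at exit.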

When the START button is pressed, the application sends the sampling command (including the sampling frequency) to the M4 firmware and opens a "date-time.dat" binary file.

When the STOP button is pressed, or on a data overrun error or a file system full error, the application sends the stop command to the M4 firmware and closes the "date-time.dat" binary file.

On a "buffer full" event from the Cortex M4 firmware, the application unpacks the data, writes the packed data to the "date-time.dat" file, and updates the statistics.
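
The unpacking step can be sketched as follows. The function name la_unpack is hypothetical, and the sketch assumes the byte layout described in this article: the 3 upper bits of each byte are the repetition factor and the 5 lower bits are the sampled value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand packed bytes back into one byte per
 * sample. Each packed byte yields (repetition factor + 1) samples.
 * Stops before a byte whose expansion would overflow `out`.
 * Returns the number of unpacked samples produced. */
size_t la_unpack(const uint8_t *packed, size_t n, uint8_t *out, size_t out_cap)
{
    size_t produced = 0;

    for (size_t i = 0; i < n; i++) {
        uint8_t value = packed[i] & 0x1Fu;            /* bits 4..0 */
        size_t count = (size_t)(packed[i] >> 5) + 1u; /* bits 7..5, plus one */

        if (produced + count > out_cap)
            break;
        for (size_t k = 0; k < count; k++)
            out[produced++] = value;
    }
    return produced;
}
```

Since a packed byte can represent up to eight samples, a 1MB packed buffer may expand to as much as 8MB, so the output buffer has to be sized accordingly. The return value can feed the "unpacked data" statistic directly.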

The Cortex M4 firmware is responsible for:

  • receiving and managing DDR buffer information (address and size) in an array
  • receiving the sampling command and sampling PE8..PE12 into SRAM
  • packing the sampled data, using the 3 upper bits of each data byte as a repetition factor
  • transferring the packed data to the DDR buffer using DMA
  • informing the application when a 1MB DDR buffer is full and rolling over to the next buffer
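
The buffer bookkeeping described above (an array of address/size pairs, 1024-byte DMA packets, rolling to the next buffer when one is full) could be organised as in this sketch. All names (la_ddr_buffer, la_ring, la_ring_next_packet) are hypothetical; the actual firmware may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

#define LA_NB_BUFFERS  3u     /* number of DDR buffers */
#define LA_PACKET_SIZE 1024u  /* bytes moved per DMA transfer */

/* One DDR buffer as announced by the Linux side. */
struct la_ddr_buffer {
    uint32_t phys_addr;  /* physical address of the buffer */
    uint32_t size;       /* size of the buffer in bytes */
};

struct la_ring {
    struct la_ddr_buffer buf[LA_NB_BUFFERS];
    unsigned current;    /* index of the buffer being filled */
    uint32_t offset;     /* bytes already assigned in that buffer */
};

/* Hypothetical helper: return the DMA destination address for the next
 * packet and advance the ring. *full_index receives the index of the
 * buffer that this packet completes, or -1 if none. The "buffer full"
 * event should only be sent once that DMA transfer has finished. */
uint32_t la_ring_next_packet(struct la_ring *r, int *full_index)
{
    uint32_t dst = r->buf[r->current].phys_addr + r->offset;

    *full_index = -1;
    r->offset += LA_PACKET_SIZE;
    if (r->offset >= r->buf[r->current].size) {
        *full_index = (int)r->current;
        r->current = (r->current + 1u) % LA_NB_BUFFERS;  /* roll over */
        r->offset = 0;
    }
    return dst;
}
```

Keeping the roll-over logic in one function means the DMA completion handler only has to program the returned address and, when full_index is not -1, notify the application.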


The rpmsg_sdb driver is responsible for:


  • allocating / freeing buffers in contiguous memory (DDR)
  • sending the physical address and size of each DDR buffer to the Cortex M4 firmware
  • mmapping the DDR buffers so that the user land application can access them


The rpmsg_tty driver is used to transport commands and events between the Cortex M4 firmware and the user land application.

6. Data flow

File:LA dataflow.png

7. Dynamic view

File:LA msc.png

8. Snapshot view of user interface

File:LA snapshot.png

9. Results

In this use case, the M4 CPU is able to compress data at a rate of up to 64 megabits per second (8 megabytes per second); this is the maximum achievable rate.


Depending on the data present on port E, the transfer rate between Cortex M4 and Cortex A7 varies from 1 megabyte per second (repetition factor of 7) to 8 megabytes per second (repetition factor of 0).
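
These two bounds follow from the packing scheme. As a rough model, assume the firmware consumes one sample byte per sample at the maximum input rate and that every packed byte carries the same repetition factor r (real signals will mix factors, so this only gives the envelope):

```latex
R_{\text{out}} = \frac{R_{\text{in}}}{r + 1}, \qquad R_{\text{in}} = 8\ \text{MB/s}

r = 0:\quad R_{\text{out}} = \frac{8}{1} = 8\ \text{MB/s}
\qquad
r = 7:\quad R_{\text{out}} = \frac{8}{8} = 1\ \text{MB/s}

T_{\text{fill}} = \frac{1\ \text{MB}}{R_{\text{out}}}
\;\Rightarrow\;
T_{\text{fill}}(r = 0) = 0.125\ \text{s}, \quad T_{\text{fill}}(r = 7) = 1\ \text{s}
```

Under these assumptions, a "buffer full" event for a 1 megabyte DDR buffer would arrive between roughly every 0.125 s and every 1 s.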

10. Source code

The source code of this use case is available as a Yocto layer here: https://codex.cro.st.com/plugins/git/stm32mpuapp/meta/meta-st-stm32mpu-app-logicanalyser