Difference between revisions of "How to exchange large data buffers with the coprocessor - example"

 

1 Article purpose

This article gives an example of high-rate transfers of data chunks from the Arm® Cortex®-M core to the Arm® Cortex®-A core.

2 Introduction

Relying on a logic analyzer sample, this article describes the mechanism and the software implemented to perform high-rate transfers. In this example, the Cortex-M core is used to perform continuously:

  • real-time operations
  • a simple data algorithm (bit masking)
  • transfer of the resulting data to the Cortex-A

Depending on the sampling frequency, the data is transferred using:

  • either direct buffer exchange mode
    • TTY RPMsg channel for control and data transfer
    • sampling frequency is less than or equal to 5MHz
  • or indirect buffer exchange mode
    • transfer using DDR buffers requiring:
      • contiguous memory allocation in DDR memory
      • Cortex-M awareness of the physical address and size of the memory buffers
      • mmaping of buffers to enable Linux® user land application access to them
    • rpmsg_sdb (shared data buffer) Linux driver, developed to take care of DDR constraints. For details on this buffer exchange mechanism, refer to the "how to exchange large data buffers with the coprocessor - principle" article.
    • TTY RPMsg channel for control
    • sampling frequency is more than 5MHz

3 Example of context description

Let us implement a logic analyzer running on the STM32MP1 discovery kit.

From the user interface, press the START button to start the logic analyzer sampling. The logic analyzer samples GPIO PORT E bits 8 to 14, which are present on the Arduino connector. They correspond to 7 bits. The 8th bit is reset by the M4 algorithm.

The number of received data is displayed on the screen as bytes and megabytes.

4 Example of static architecture for exchanging large data buffers

The example of large data buffer exchange includes:

  • A Cortex-M firmware
  • A Linux user land application
  • A Linux rpmsg_tty driver
  • A Linux rpmsg_sdb (shared data buffer) driver

Note that there is a direct correspondence between the sampling frequency and the data flow rate: thus, a sampling frequency of 8MHz means a data flow rate of 8MB/s (megabytes per second).
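The correspondence above, together with the 5MHz threshold given in the introduction, can be sketched as follows. This is only an illustration: the function and constant names are hypothetical, and it assumes that each sample produces exactly one byte, which is what the 8MHz => 8MB/s statement implies.

```python
# Threshold stated in the article: up to 5 MHz the data goes over TTY RPMsg.
DIRECT_MODE_MAX_HZ = 5_000_000


def data_rate_bytes_per_second(sampling_hz: int) -> int:
    """Assumption: one byte per sample, so the byte rate equals the sampling rate."""
    return sampling_hz


def exchange_mode(sampling_hz: int) -> str:
    """Return which buffer exchange mode the example would use (hypothetical helper)."""
    if sampling_hz <= DIRECT_MODE_MAX_HZ:
        return "direct"    # TTY RPMsg channel carries control and data
    return "indirect"      # DDR buffers carry data, TTY RPMsg carries control


if __name__ == "__main__":
    for hz in (4_000_000, 5_000_000, 8_000_000):
        print(hz, data_rate_bytes_per_second(hz), exchange_mode(hz))
```

With these assumptions, 4MHz and 5MHz select the direct mode, while 8MHz selects the indirect mode at 8MB/s.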

Data flow when sampling frequency is less than or equal to 5MHz:


Data flow when sampling frequency is more than 5MHz:

5 Cortex-M firmware

The Cortex-M firmware is responsible for:

  • receiving a command giving the number of DDR buffers through the TTY RPMsg channel, from the Linux application
  • receiving messages containing the physical address and size of the DDR buffer(s), from the Linux rpmsg_sdb driver. These DDR buffers are always allocated during the initialization step, even if they are only used when the sampling frequency is more than 5MHz (see below).
  • receiving a Start/Stop sampling command (including the sampling frequency) through the TTY RPMsg channel, from the Linux application
  • On start request:
    • sampling the data at the requested sampling frequency thanks to DMA2 stream0 from GPIOE to SRAM buffers
    • masking and transferring data buffers from SRAM to the Linux application
      • thanks to TTY over RPMsg buffers, by packets of 256 bytes, if the sampling frequency is less than or equal to 5MHz
      • thanks to copy to DDR buffer, by packets of 1024 bytes, if the sampling frequency is more than 5MHz, and informing the Cortex-A user interface (through the SDB RPMsg channel) when a DDR buffer of 1 Mbyte is filled, then rolling to the next DDR buffer.
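The masking, packet splitting and buffer rollover listed above can be modelled on a host as follows. This is a minimal sketch and not the firmware code: all names are hypothetical, and it assumes that the "8th bit" is bit 7 of each sample byte, so that masking amounts to an AND with 0x7F.

```python
MASK_7_BITS = 0x7F        # assumption: bit 7 of each sample byte is the one reset
TTY_PACKET_SIZE = 256     # packet size stated for the TTY over RPMsg path
DDR_PACKET_SIZE = 1024    # packet size stated for the copy to DDR path
DDR_BUFFER_COUNT = 10     # number of 1 Mbyte buffers allocated in the example


def mask_samples(raw: bytes) -> bytes:
    """Reset the 8th bit of every sample byte, keeping the 7 sampled bits."""
    return bytes(b & MASK_7_BITS for b in raw)


def split_into_packets(data: bytes, packet_size: int) -> list:
    """Cut a buffer into consecutive packets; the last one may be shorter."""
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]


def next_ddr_buffer(index: int, buffer_count: int = DDR_BUFFER_COUNT) -> int:
    """Roll to the next DDR buffer, wrapping around after the last one."""
    return (index + 1) % buffer_count


if __name__ == "__main__":
    masked = mask_samples(bytes([0xFF, 0x80, 0x7F]))
    print(masked.hex())
    print([len(p) for p in split_into_packets(bytes(600), TTY_PACKET_SIZE)])
    print(next_ddr_buffer(9))
```

The modulo in next_ddr_buffer is one simple way to express "roll to next DDR buffer"; the firmware may implement the rollover differently.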

6 Linux user land application

The Cortex-A Linux application includes a GTK user interface.

It allows controlling:

  • the sampling frequency
  • the start / stop of the sampling
  • the data to be sampled thanks to the "Set data" notch UI widget

The user interface displays:

  • statistics: the number of data received by the user interface, as bytes and Mbytes
  • the first data of every new received megabyte

7 Linux drivers

  • The rpmsg_tty driver (drivers/rpmsg/rpmsg_tty.c) is used to communicate (transport commands and status/events) between the Cortex-M firmware and the Cortex-A user land application.
  • The rpmsg_sdb Linux driver is responsible for the DDR shared buffers management.

8 Dynamic view

At startup, the Linux application performs the following actions:

  • It loads the rpmsg_sdb.ko module.
  • It loads the Cortex-M firmware, then starts it.
  • It opens a rpmsg_tty channel for Cortex-M firmware control.
  • It opens a rpmsg_tty channel for Cortex-M firmware trace debug.
  • It opens the rpmsg_sdb driver, then uses the rpmsg_sdb IOCTL interface to allocate and mmap 10 buffers of 1Mbyte in DDR memory.

When the user presses User button2, the Linux application starts.


When the START button is pressed, the application sends the sampling command to the Cortex-M firmware (including the sampling frequency).

Case 1: user selects a sampling frequency of 4MHz => case of TTY buffers (sampling frequency less than or equal to 5MHz)

When the Linux application receives a data buffer over TTY, it checks whether a new Mbyte has been fully received, and in this case it updates the statistics information.

Case 2: user selects a sampling frequency of 8MHz => case of copy to DDR buffers (sampling frequency more than 5MHz)

When the Cortex-M firmware sends a "buffer full" signal via the rpmsg_sdb driver, the Linux application updates the statistics information.

When the STOP button is pressed, the application sends the stop command to the Cortex-M firmware.

When the user presses User button2, the Linux application stops.

9 Results

The Cortex-M CPU performs a mask and copy data operation on 1024 bytes within 75.4µs. This implies a maximum sampling frequency of 1 / (75.4e-6 / 1024) => 13.58MHz. This corresponds to the maximum sampling frequency that can be achieved. In order to leave a margin, the maximum sampling frequency implemented in this example is set to 12MHz.

On this oscilloscope snapshot, a GPIO is set at the beginning of the data masking and copying algorithm, and reset at the end of the algorithm. So, 75.4 µs are spent to mask and copy 1024 bytes of data in DDR.
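The arithmetic behind the 13.58MHz limit can be checked with a few lines. This is a sketch of the calculation only, with hypothetical function names, and it assumes one byte per sample as elsewhere in this article.

```python
BLOCK_BYTES = 1024        # bytes masked and copied per measurement
BLOCK_TIME_S = 75.4e-6    # duration measured on the oscilloscope


def max_sampling_frequency_hz(block_bytes: int, block_time_s: float) -> float:
    """Highest rate at which a block can be processed before the next one is ready."""
    return block_bytes / block_time_s


def block_period_s(sampling_hz: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Time taken to acquire one block at the given sampling frequency."""
    return block_bytes / sampling_hz


if __name__ == "__main__":
    limit = max_sampling_frequency_hz(BLOCK_BYTES, BLOCK_TIME_S)
    print(f"limit: {limit / 1e6:.2f} MHz")
    print(f"block period at 12 MHz: {block_period_s(12e6) * 1e6:.1f} us")
```

At 12MHz, a block of 1024 bytes is acquired in about 85.3µs, which leaves roughly 10µs of margin over the 75.4µs needed to mask and copy it.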

10 Source code

The source code corresponding to this use case is available as a Yocto layer at:

https://github.com/STMicroelectronics/meta-st-stm32mpu-app-logicanalyser.git

The firmware is included in the Yocto layer as an .elf file.

The source code of the Cortex-M firmware is available at:

https://github.com/STMicroelectronics/logicanalyser

For firmware compilation, please have a look into: Developer Package for STM32CubeMP1

In the source code example, 10 buffers of 1Mbyte each are allocated for the exchange. 3 buffers is the minimum to guarantee the real-time behavior of the application. If the number of buffers needs to be increased (more than 10), then the rpmsg_sdb driver, the M4 firmware, and the Linux application must be modified, as the messaging relies on a single digit for the buffer index: "BxLyyyyyyyy" would have to become "BxxLyyyyyyyy".
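The single-digit limitation can be illustrated with a small parser for the "BxLyyyyyyyy" message. This is a sketch under stated assumptions and not the driver code: the article only gives the message shape, so reading x as a decimal buffer index and yyyyyyyy as an 8-digit hexadecimal value is an assumption, and the function name is hypothetical.

```python
import re
from typing import Tuple

# Assumed layout: 'B', one decimal digit (buffer index), 'L', eight hex digits.
_BUFFER_MESSAGE = re.compile(r"B([0-9])L([0-9A-Fa-f]{8})")


def parse_buffer_message(message: str) -> Tuple[int, int]:
    """Return (buffer index, value of the yyyyyyyy field) or raise ValueError."""
    match = _BUFFER_MESSAGE.fullmatch(message)
    if match is None:
        raise ValueError(f"unexpected message format: {message!r}")
    return int(match.group(1)), int(match.group(2), 16)


if __name__ == "__main__":
    print(parse_buffer_message("B3L00100000"))
    try:
        parse_buffer_message("B10L00100000")  # two-digit index does not fit
    except ValueError as error:
        print(error)
```

With a single digit, only indexes 0 to 9 can be expressed, which matches the 10 buffers of the example. A two-digit index is rejected by this format, hence the need to change all three components together.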

11 Key messages

The Cortex-M is capable of performing basic algorithms on high data flows. If a more complex data treatment is needed, the data rate must be adapted so that the Cortex-M can handle it.

Code instrumentation and GPIO set/reset to measure the data algorithm timing are highly recommended to check the real-time behavior of the full system.

For a data rate less than or equal to 5MB/s, TTY over RPMSG should be the preferred solution.

It is not recommended to use several DMA2 streams to deal with high data rates: using a second DMA stream to transfer data to DDR does not work if both streams work at same high rate (trials at 6MHz proved this); in this case a DMA stream0 error occurs.

12 Usage

Please follow README.md of the Yocto layer to perform installation.

The logicanalyser application is launched/stopped by pressing User2 button of the STM32MP1 Discovery board.

Select the sampling frequency and click on Start to start the use case.

Snapshot view of user interface :



== Article purpose ==
This article gives an example of high-rate transfers of data chunks from the Arm® Cortex®-M core to the Arm® Cortex®-A core.

== Introduction ==
Relying on a logic analyzer sample, this article describes the mechanism and the software implemented to perform high-rate transfers.
In this example, the Cortex-M core is used to perform continuously:
* real-time operations
* a simple data algorithm (bit masking)
* transfer of the resulting data to the Cortex-A

Depending on the sampling frequency, the data is transferred using:
* either [[Exchanging_buffers_with_the_coprocessor#Direct_buffer_exchange_mode | direct buffer exchange mode]]
** TTY RPMsg channel for control and data transfer
** sampling frequency is less than or equal to 5MHz
* or [[Exchanging_buffers_with_the_coprocessor#Indirect_buffer_exchange_mode | indirect buffer exchange mode]]
** transfer using DDR buffers requiring:
*** contiguous memory allocation in DDR memory
*** Cortex-M awareness of the physical address and size of the memory buffers
*** mmaping of buffers to enable Linux® user land application access to them
** rpmsg_sdb (shared data buffer) Linux driver, developed to take care of DDR constraints. For details on this buffer exchange mechanism, refer to the [[how_to_exchange_large_data_buffers_with_the_coprocessor_-_principle | how to exchange large data buffers with the coprocessor - principle]] article.
** TTY RPMsg channel for control
** sampling frequency is more than 5MHz

== Example of context description == 
Let us implement a logic analyzer running on the [[STM32MP15 Discovery kits - getting started|STM32MP1 discovery kit]].

From the user interface, press the START button to start the logic analyzer sampling.
The logic analyzer samples GPIO PORT E bits 8 to 14, which are present on the Arduino connector. They correspond to 7 bits.
The 8th bit is reset by the M4 algorithm.

The number of received data is displayed on the screen as bytes and megabytes.

== Example of static architecture for exchanging large data buffers ==
The example of large data buffer exchange includes:
* A Cortex-M firmware
* A Linux user land application
* A Linux rpmsg_tty driver
* A Linux rpmsg_sdb (shared data buffer) driver

Note that there is a direct correspondence between the sampling frequency and the data flow rate: thus, a sampling frequency of 8MHz means a data flow rate of 8MB/s (megabytes per second).

Data flow when sampling frequency is less than or equal to 5MHz:
[[File:how2bigdatarchiTTY1.jpg|thumb|center|800px|link=]]
<br clear=all>


Data flow when sampling frequency is more than 5MHz:
[[File:how2bigdatarchiDDR1.jpg|thumb|center|800px|link=]]


== Cortex-M firmware ==
The Cortex-M firmware is responsible for:
* receiving a command giving the number of DDR buffers through the TTY RPMsg channel, from the Linux application
* receiving messages containing the physical address and size of the DDR buffer(s), from the Linux rpmsg_sdb driver. These DDR buffers are always allocated during the initialization step, even if they are only used when the sampling frequency is more than 5MHz (see below).
* receiving a Start/Stop sampling command (including the sampling frequency) through the TTY RPMsg channel, from the Linux application
* On start request:
** sampling the data at the requested sampling frequency thanks to DMA2 stream0 from GPIOE to SRAM buffers
** masking and transferring data buffers from SRAM to the Linux application
*** thanks to TTY over RPMsg buffers, by packets of 256 bytes, if the sampling frequency is less than or equal to 5MHz
*** thanks to copy to DDR buffer, by packets of 1024 bytes, if the sampling frequency is more than 5MHz, and informing the Cortex-A user interface (through the SDB RPMsg channel) when a DDR buffer of 1 Mbyte is filled, then rolling to the next DDR buffer.

== Linux user land application ==
The Cortex-A Linux application includes a GTK user interface.

It allows controlling:
* the sampling frequency
* the start / stop of the sampling
* the data to be sampled thanks to the "Set data" notch UI widget

The user interface displays:
* statistics: the number of data received by the user interface, as bytes and Mbytes
* the first data of every new received megabyte

== Linux drivers ==
* The rpmsg_tty driver ({{CodeSource | Linux kernel | drivers/rpmsg/rpmsg_tty.c}}) is used to communicate (transport commands and status/events) between the Cortex-M firmware and the Cortex-A user land application.
* The [[how_to_exchange_large_data_buffers_with_the_coprocessor_-_principle#rpmsg_sdb_driver | rpmsg_sdb]] Linux driver is responsible for the DDR shared buffers management.

== Dynamic view ==
At startup, the Linux application performs the following actions:
* It loads the rpmsg_sdb.ko module.
* It loads the Cortex-M firmware, then starts it.
* It opens a rpmsg_tty channel for Cortex-M firmware control.
* It opens a rpmsg_tty channel for Cortex-M firmware trace debug.
* It opens the rpmsg_sdb driver, then uses the rpmsg_sdb IOCTL interface to allocate and mmap 10 buffers of 1Mbyte in DDR memory.

When the user presses User button2, the Linux application starts.

[[File:how2bigdataSTARTmsc.png|thumb|left|800px|link=]]
<br clear=all>


When the START button is pressed, the application sends the sampling command to the Cortex-M firmware (including the sampling frequency).

Case 1: user selects a sampling frequency of 4MHz => case of TTY buffers (sampling frequency less than or equal to 5MHz)

When the Linux application receives a data buffer over TTY, it checks whether a new Mbyte has been fully received, and in this case it updates the statistics information.

[[File:how2bigdataTTYmsc.png|thumb|left|800px|link=]]
<br clear=all>


Case 2: user selects a sampling frequency of 8MHz => case of copy to DDR buffers (sampling frequency more than 5MHz)

When the Cortex-M firmware sends a "buffer full" signal via the rpmsg_sdb driver, the Linux application updates the statistics information.

[[File:how2bigdataDDRmsc.png|thumb|left|800px|link=]]
<br clear=all>

When the STOP button is pressed, the application sends the stop command to the Cortex-M firmware.

When the user presses User button2, the Linux application stops.

[[File:how2bigdataENDmsc.png|thumb|left|800px|link=]]
<br clear=all>


== Results ==
The Cortex-M CPU performs a mask and copy data operation on 1024 bytes within 75.4µs. This implies a maximum sampling frequency of 1 / (75.4e-6 / 1024) => 13.58MHz. This corresponds to the maximum sampling frequency that can be achieved. In order to leave a margin, the maximum sampling frequency implemented in this example is set to 12MHz.

[[File:how2bigdataChrono.png|thumb|center|800px|link=]]

On this oscilloscope snapshot, a GPIO is set at the beginning of the data masking and copying algorithm, and reset at the end of the algorithm.
So, 75.4 µs are spent to mask and copy 1024 bytes of data in DDR.

== Source code ==
The source code corresponding to this use case is available as a Yocto layer at:
:https://github.com/STMicroelectronics/meta-st-stm32mpu-app-logicanalyser.git
The firmware is included in the Yocto layer as an .elf file.

The source code of the Cortex-M firmware is available at:
:https://github.com/STMicroelectronics/logicanalyser
For firmware compilation, please have a look into: [[STM32CubeMP1 Package#Developer Package for STM32CubeMP1|Developer Package for STM32CubeMP1]]

In the source code example, 10 buffers of 1Mbyte each are allocated for the exchange. 3 buffers is the minimum to guarantee the real-time behavior of the application. If the number of buffers needs to be increased (more than 10), then the rpmsg_sdb driver, the M4 firmware, and the Linux application must be modified, as the messaging relies on a single digit for the buffer index: "BxLyyyyyyyy" would have to become "BxxLyyyyyyyy".

== Key messages ==
The Cortex-M is capable of performing basic algorithms on high data flows.
If a more complex data treatment is needed, the data rate must be adapted so that the Cortex-M can handle it.

Code instrumentation and GPIO set/reset to measure the data algorithm timing are highly recommended to check the real-time behavior of the full system.

'''For a data rate less than or equal to 5MB/s, TTY over RPMSG should be the preferred solution.'''

It is not recommended to use several DMA2 streams to deal with high data rates: using a second DMA stream to transfer data to DDR does not work if both streams work at same high rate (trials at 6MHz proved this); in this case a DMA stream0 error occurs.

== Usage ==
Please follow README.md of the Yocto layer to perform installation.

The '''logicanalyser application''' is launched/stopped by pressing User2 button of the STM32MP1 Discovery board.

Select the sampling frequency and click on Start to start the use case.

Snapshot view of user interface :
[[File:screenshotLA.png|thumb|center|800px|link=]]
<noinclude>

{{PublicationRequestId | 14364 | 2019-12-10 | AnneJ}}
[[Category:How to run use cases with expansions]]</noinclude>
(8 intermediate revisions by 3 users not shown)
Line 6: Line 6:
 
In this example, the Cortex-M core is used to perform continuously:
 
In this example, the Cortex-M core is used to perform continuously:
 
* real-time operations
 
* real-time operations
* offload of a heavy data algorithm
+
* simple data algorithm (masking bit)
* transfer of the resulting data flow to DDR buffers via DMA.
+
* transfer of the resulting data to the cortex-A
   
Such kind of implementation requires :
+
Depending on the frequency sampling, the data is transferred using:
* contiguous memory allocation in DDR memory
+
* either [[Exchanging_buffers_with_the_coprocessor#Direct_buffer_exchange_mode | direct buffer exchange mode]]
* Cortex-M awareness of the physical address and size of the memory buffers
+
** TTY RPMsg channel for control and data transfer
* mmaping of buffers to enable Linux&reg; user land application access to them.
+
** sampling frequency is less than or equal to 5MHz
   
A specific Linux driver, rpmsg_sdb (shared data buffer), has been developed to take care of such constraints. <br>
+
* or  [[Exchanging_buffers_with_the_coprocessor#Indirect_buffer_exchange_mode | indirect buffer exchange mode]]
For details on the buffer exchange mechanisms, refer to the [[how_to_exchange_large_data_buffers_with_the_coprocessor_-_principle | how to exchange large data buffers with the coprocessor - principle]] article.
+
** transfer using DDR buffers requiring:
  +
*** contiguous memory allocation in DDR memory
  +
*** Cortex-M awareness of the physical address and size of the memory buffers
  +
*** mmaping of buffers to enable Linux&reg; user land application access to them
  +
**rpmsg_sdb (shared data buffer) Linux driver, developed to take care of DDR constraints. For details on this buffer exchange mechanisms, refer to the [[how_to_exchange_large_data_buffers_with_the_coprocessor_-_principle | how to exchange large data buffers with the coprocessor - principle]] article
  +
** TTY RPMsg channel for control
  +
** sampling frequency is more than 5MHz
   
 
== Example of context description ==  
 
== Example of context description ==  
Line 21: Line 27:
   
 
From the user interface, press the START button to start the logic analyzer sampling.
 
From the user interface, press the START button to start the logic analyzer sampling.
The logic analyzer samples GPIO PORT E bits 8 to 12, which are present on the Arduino connector. They correspond to 5 bits.
+
The logic analyzer samples GPIO PORT E bits 8 to 14, which are present on the Arduino connector. They correspond to 7 bits.
The 3 remaining bits in each data byte are used to implement a packing algorithm.
+
The 8th bit will be reset by M4 algorithm.
   
After packing, the logic analyzer saves data in a binary file system named "date-time".dat, where date-time is the date and time of the system when the START button is pressed.
+
The number of received data is displayed on the screen as bytes and megabytes.
   
 
== Example of static architecture for exchanging large data buffers ==
 
== Example of static architecture for exchanging large data buffers ==
Line 30: Line 36:
 
* A Cortex-M firmware
 
* A Cortex-M firmware
 
* A Linux user land application
 
* A Linux user land application
  +
* A Linux rpmsg_tty driver
 
* A Linux rpmsg_sdb (shared data buffer) driver
 
* A Linux rpmsg_sdb (shared data buffer) driver
* A Linux rpmsg_tty driver
 
   
[[File:How2ELDBArchi.jpg|thumb|center|800px|link=]]
+
Note that the there is a direct correspondence between the sampling frequency and the data flow rate: thus, a sampling frequency of 8MHz means a data flow rate of 8MB/s (megabytes per second).
In the figure above, the numbers indicate the chronological order of data flows.
+
 
  +
Data flow when sampling frequency is less than or equal to 5MHz:
  +
[[File:how2bigdatarchiTTY1.jpg|thumb|center|800px|link=]]
  +
<br clear=all>
  +
 
  +
Data flow when sampling frequency is more than 5MHz:
  +
[[File:how2bigdatarchiDDR1.jpg|thumb|center|800px|link=]]
   
 
== Cortex-M firmware ==
 
== Cortex-M firmware ==
 
The Cortex-M firmware is responsible for:
 
The Cortex-M firmware is responsible for:
* receiving messages containing the physical address and size of DDR buffer(s), from the Linux rpmsg_sdb driver  
+
* receiving a command giving of the number of DDR buffers through the TTY RPMsg channel, from the Linux application.
  +
* receiving messages containing the physical address and size of DDR buffer(s), from the Linux rpmsg_sdb driver; These DDR buffers are always allocated, during the initialisation step, even if they are only used when the frequency sampling is more than 5MHz (see below).
 
* receiving a command Start/Stop sampling (including sampling frequency) through the TTY RPMsg channel, from the Linux application
 
* receiving a command Start/Stop sampling (including sampling frequency) through the TTY RPMsg channel, from the Linux application
 
* On start request:
 
* On start request:
** sampling the data at the requested sampling frequency
+
** sampling the data at the requested sampling frequency thanks to DMA2 stream0 from GPIOE to SRAM buffers
** filtering and packing data
+
** masking and transferring data buffers from SRAM to Linux application
** transferring via DMA the packed data to the DDR buffer by packet of 1024 bytes  
+
*** thanks to TTY over RPMSG buffer by packet of 256 bytes if the frequency is less than or equal to 5MHz
** informing the Cortex-A user interface (through the TTY RPMsg channel) when a DDR buffer of 1 Mbyte is filled, and roll to next DDR buffer.
+
*** thanks to copy to DDR buffer by packet of 1024 bytes, if the frequency sampling is more than 5MHz, and informing the Cortex-A user interface (through the SDB RPMsg channel) when a DDR buffer of 1 Mbyte is filled, and roll to next DDR buffer.
   
 
== Linux user land application ==
 
== Linux user land application ==
The Cortex-A Linux application includes a web user interface.
+
The Cortex-A Linux application includes a GTK user interface.
   
 
It allows controlling:
 
It allows controlling:
 
* the sampling frequency  
 
* the sampling frequency  
 
* the start / stop of the sampling.
 
* the start / stop of the sampling.
  +
* the data to be sampled thanks to "Set data" notch UI widget
   
The user interface displays statistics, including:
+
The user interface displays :
* the number of packed data received by the user interface
+
* statistics : the number of data received by the user interface, as bytes and Mbytes
* the number of unpacked data decompressed by the user interface
+
* the first data of every new received megabyte
* the number of packed data written by the user interface in the file system.
 
 
 
The user interface saves the packed data in a binary file.
 
   
 
== Linux drivers ==
 
== Linux drivers ==
* The [[how_to_exchange_large_data_buffers_with_the_coprocessor_-_principle#rpmsg_sdb_driver | rpmsg_sdb]] Linux driver is responsible for the shared buffer management.  
+
* The rpmsg_tty driver ({{CodeSource | Linux kernel | drivers/rpmsg/rpmsg_tty.c}}) is used to communicate (transport commands and status/events) between the Cortex-M firmware and the Cortex-A user land application.
 
+
* The [[how_to_exchange_large_data_buffers_with_the_coprocessor_-_principle#rpmsg_sdb_driver | rpmsg_sdb]] Linux driver is responsible for the DDR shared buffers management.  
* The rpmsg_tty driver is used to communicate (transport commands and status/events) between the Cortex-M firmware and the Cortex-A user land application.
 
   
 
== Dynamic view ==
 
== Dynamic view ==
Line 69: Line 79:
 
* It loads the rpmsg_sdb.ko module.
 
* It loads the rpmsg_sdb.ko module.
 
* It loads the Cortex-M firmware, then starts it.
 
* It loads the Cortex-M firmware, then starts it.
* It opens the rpmsg_tty driver for Cortex-M firmware control.
+
* It opens a rpmsg_tty channel for Cortex-M firmware control.
* It opens the rpmsg_sdb driver, then uses rpmsg_sdb IOCTL interface to allocate and mmap 3 buffers of 1Mbyte in DDR memory.
+
* It opens a rpmsg_tty channel for Cortex-M firmware trace debug.
  +
* It opens the rpmsg_sdb driver, then uses rpmsg_sdb IOCTL interface to allocate and mmap 10 buffers of 1Mbyte in DDR memory.
 
   
 
   
When the START button is pressed, the application sends the sampling command to the Cortex-M firmware (including the sampling frequency), and creates a "date-time.dat" binary file that is used to store the data sample in mass storage.
+
When the user presses User button2, the Linux application starts.
  +
 
  +
[[File:how2bigdataSTARTmsc.png|thumb|left|800px|link=]]
  +
 
  +
<br clear=all>
  +
 
  +
When the START button is pressed, the application sends the sampling command to the Cortex-M firmware (including the sampling frequency).
  +
 
  +
Case 1: user selects a frequency sampling of 4MHz => case of  TTY buffers (frequency scaling less than or equal to 5MHz)
  +
 
  +
When the Linux application receives a data buffer over TTY it checks if a new MByte has been fully received, and in this case it updates the statistics information.
  +
 
  +
[[File:how2bigdataTTYmsc.png|thumb|left|800px|link=]]
  +
 
  +
<br clear=all>
  +
 
  +
Case 2: user selects a frequency sampling of 8MHz => case of copy to DDR buffers (frequency scaling more than 5MHz)
  +
 
  +
When the Cortex-M firmware sends a "buffer full" signal via the rpmsg_sdb driver, Linux application updates the statistics information.
  +
 
  +
[[File:how2bigdataDDRmsc.png|thumb|left|800px|link=]]
  +
 
  +
<br clear=all>
   
When the STOP button is pressed, an overrun data error or a file system full error occurs, the application sends the stop command to the Cortex-M firmware and finalizes the "date-time.dat" binary file.
+
When the STOP button is pressed, the application sends the stop command to the Cortex-M firmware.
   
When the Cortex-M firmware sends a "buffer full" signal via the rpmsg_sdb driver, the application unpacks data, writes the packed data in "date-time.dat" file, and updates the statistic information.
+
When the user presses User button2, the Linux application stops.
   
[[File:howtobigdatamsc.jpg|thumb|left|800px|link=]]
+
[[File:how2bigdataENDmsc.png|thumb|left|800px|link=]]
   
 
<br clear=all>
 
<br clear=all>
   
 
== Results ==
 
== Results ==
Depending on the type of data available on GPIOE, the transfer rate between the Cortex-M and the Cortex-A core varies from 1 Mbyte per second (repetition factor of 7) to 8 Mbytes per second (repetition factor of 0).
+
The Cortex-M CPU performs a mask and copy data operation on 1024 bytes within 75.4µs; This implies a maximum frequency sampling of: 1 / (75.4e-6 / 1024) => 13.58MHz. This corresponds to the maximum frequency sampling that can be achieved. In order to let a margin, the maximum frequency sampling implemented in this example is set to 12MHz.   
 
 
In the later use case, the Cortex-M CPU is able to compress data at a rate up to 64 Mbits per second (8 Mbytes per second). This corresponds to the maximum rate that can be achieved.   
 
   
== Limitation ==
+
[[File:how2bigdataChrono.png|thumb|center|800px|link=]]
The limitation is due to data packing, as shown in the figure below:
 
 
[[File:LA_perf.png|thumb|center|800px|link=]]
 
   
On this oscilloscope snapshot, a GPIO is set at the beginning of the packing algorithm, and reset at the end of the algorithm.  
+
On this oscilloscope snapshot, a GPIO is set at the beginning of the data masking and copying algorithm, and reset at the end of the algorithm.  
So, 109.5 µs are spent to pack 1024 bytes of data at a sampling frequency of 8 MHz. Increasing the frequency to 9 MHz would cross the limit : 9MHz => 111µs.  
+
So, 75.4 µs are spent to mask and copy 1024 bytes of data in DDR.
   
 
== Source code ==
 
== Source code ==
Line 103: Line 131:
 
:https://github.com/STMicroelectronics/logicanalyser
 
:https://github.com/STMicroelectronics/logicanalyser
 
For firmware compilation, please have a look into: [[STM32CubeMP1 Package#Developer Package for STM32CubeMP1|Developer Package for STM32CubeMP1]]
 
For firmware compilation, please have a look into: [[STM32CubeMP1 Package#Developer Package for STM32CubeMP1|Developer Package for STM32CubeMP1]]
  +
  +
In the source code example, 10 buffers of 1MByte each are allocated for the exchange. 3 buffers is the minimum to guarantee the real time behavior of the application. If the number of buffer needs to be increased (more than 10), then  rpmsg_sdb_driver, M4 firmware, and Linux application must be modified, as the messaging relies on a single digit for the buffer index "BxLyyyyyyyy" => "BxxLyyyyyyyy".
  +
  +
== Key messages ==
  +
Cortex-M is capable to perform basic algorithm on high data flow.
  +
If a more complex data treatment is needed, the data rate must be adapted to be able to treat it on Cortex-M.
  +
  +
Code instrumentation and GPIO set/reset to measure data algorithm timing are highly recommended to check real time of the full system.
  +
  +
'''For a data rate less than or equal to 5MB/s, TTY over RPMSG should be the preferred solution.'''
  +
  +
It is not recommended to use several DMA2 streams to deal with high data rates: using a second DMA stream to transfer data to DDR does not work if both streams work at same high rate (trials at 6MHz proved this); in this case a DMA stream0 error occurs.
   
 
== Usage ==
 
== Usage ==
Please follow README.md of Yocto layer to perform installation.
+
Please follow README.md of the Yocto layer to perform installation.
   
 
The '''logicanalyser application''' is launched/stopped by pressing User2 button of the STM32MP1 Discovery board.
 
The '''logicanalyser application''' is launched/stopped by pressing User2 button of the STM32MP1 Discovery board.
Line 112: Line 152:
   
 
Snapshot view of user interface :
 
Snapshot view of user interface :
[[File:Screenshot.jpg|thumb|center|800px|link=]]
+
[[File:screenshotLA.png|thumb|center|800px|link=]]
   
 
<noinclude>
 
<noinclude>