Exchanging buffers with the coprocessor

<noinclude>{{ApplicableFor
|MPUs list=STM32MP15x
|MPUs checklist=STM32MP13x, STM32MP15x
}}</noinclude>


== Introduction ==
In the [[STM32MPU_Embedded_Software_distribution|STM32MPU Embedded Software distribution]], the [[Linux_RPMsg_framework_overview|RPMsg]] protocol allows communication between the Arm<sup>&reg;</sup> Cortex<sup>&reg;</sup>-A and Cortex<sup>&reg;</sup>-M cores <ref name="coprocessor management overview">[[Coprocessor_management_overview]]</ref><ref name="Rpmsg framework overview">[[Linux_RPMsg_framework_overview]]</ref>.<br>


To implement a feature relying on the RPMsg protocol, it is important to understand that this protocol has not been designed to directly transfer high data rate streams. As a result, the implementation needs to be adapted to the use case constraints:
* For control and low data rate exchanges, RPMsg is sufficient.
* For high-rate transfers and large data buffers, the indirect buffer exchange mode should be preferred.

It is not possible to provide strict rules for choosing one or the other implementation. This depends on the use case but also on:
*the load on the Cortex CPUs,
*process priorities,
*preemptions (such as interrupts and secure services),
*other services implemented on top of the RPMsg instance,
*...

However, as shown in the [[How to exchange data buffers with the coprocessor]] article, for a data rate less than or equal to 5 Mbyte/s, the [[#Direct buffer exchange mode|direct buffer exchange mode]] is recommended (as it is easier to implement than the indirect buffer exchange mode).
The aim of this article is to help choose the best-adapted implementation. If this guidance is not sufficient, another approach is to implement the direct mode first and test its performance in your system.

== RPMsg protocol awareness ==
The [[Linux_RPMsg_framework_overview|RPMsg]] protocol provides, through the virtio framework, a basic transport layer based on a shared ring buffer:
* The buffers are prenegotiated and preallocated during coprocessor loading (size, number of buffers).
* The buffers are allocated in non-cacheable memory.
* There is no direct access from the RPMsg clients to these buffers. They are filled by a copy (no zero-copy or DMA transfers).
* No bandwidth is guaranteed. Buffers can be shared between several RPMsg clients.

The size of the buffers is hard-coded (512 bytes). However, it is possible to customize the number of buffers used. Modifying this parameter impacts the number of buffers allocated in both directions (refer to the [[Coprocessor_resource_table#How_to_add_RPMsg_inter-processor_communication|resource table]] for details).<br>

A doorbell signal is associated with each RPMsg transfer. It is sent to the destination processor via the [[Linux_Mailbox_framework_overview|stm32 IPCC mailbox]] and generates an IRQ for each message transfer.<br>

Notice that the IRQ frequency can be a criterion for the decision. For instance, a 1 Mbyte/s transfer from the Cortex-M4 to the Cortex-A7 generates around 2000 IRQs per second on each Cortex to transfer the 512-byte RPMsg buffers.
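
The arithmetic behind this estimate is simple enough to sketch. The snippet below (Python, for illustration only) divides the stream rate by the hard-coded 512-byte buffer size; it assumes each message fills a whole buffer, which slightly underestimates the rate since part of each buffer is header.

```python
# Estimate the doorbell IRQ rate generated by an RPMsg data stream:
# each RPMsg buffer carries one message and triggers one IPCC mailbox
# IRQ on each Cortex.
RPMSG_BUF_SIZE = 512  # hard-coded RPMsg buffer size, in bytes

def irq_rate(bytes_per_second: int, buf_size: int = RPMSG_BUF_SIZE) -> float:
    """Approximate number of IRQs per second on each Cortex."""
    return bytes_per_second / buf_size

# 1 Mbyte/s from the Cortex-M4 to the Cortex-A7:
print(round(irq_rate(1_000_000)))  # → 1953, i.e. around 2000 IRQs/s
```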

==Direct buffer exchange mode==
This mode consists in using the RPMsg buffers to transfer data between the processors.
* The RPMsg message contains the effective data.
* Memory allocation is limited to the RPMsg buffer allocation.
* The RPMsg client implementation is quite straightforward.

[[File:copro-sw-ipc-overview.png|800px|link=]]

The direct buffer exchange implementation is recommended:
* for control messages, for instance to control a remote processor application,
* to exchange low data rate streams (similar to a slow data rate bus).

For application samples, refer to:
* the "OpenAMP_TTY_echo" application example in the [[STM32CubeMP1_Package_release_note#Available_projects|list of available projects]],
* the [[How to exchange data buffers with the coprocessor]] article, which provides a source code example for the direct buffer exchange mode.

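Since each RPMsg buffer is limited to 512 bytes, a direct-mode sender must split larger payloads across several messages. A minimal sketch (the 16-byte header overhead is an assumption for illustration; check the actual payload room of your RPMsg implementation):

```python
# Split a data stream into chunks that each fit into one RPMsg buffer.
RPMSG_BUF_SIZE = 512                            # hard-coded buffer size
RPMSG_HDR_SIZE = 16                             # assumed header overhead
MAX_PAYLOAD = RPMSG_BUF_SIZE - RPMSG_HDR_SIZE   # 496 bytes of payload

def split_for_rpmsg(data: bytes, max_payload: int = MAX_PAYLOAD):
    """Yield successive payload-sized chunks of 'data'."""
    for offset in range(0, len(data), max_payload):
        yield data[offset:offset + max_payload]

# A 4-Kbyte buffer needs 9 RPMsg messages (8 x 496 bytes + 1 x 128 bytes)
chunks = list(split_for_rpmsg(bytes(4096)))
print(len(chunks), len(chunks[-1]))  # → 9 128
```

This per-message overhead (copy, doorbell IRQ, header) is exactly what makes the indirect mode below attractive for large buffers.
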
==Indirect buffer exchange mode==
This mode is also called '''"large data buffers exchange"''' or '''"Big Data"'''. It consists in using RPMsg to carry references to some other buffers that contain the effective data. These other buffers can be:
* of any size,
* allocated by multiple means, in cached or non-cached memory, DDR or MCU SRAM, etc.,
* mmapped for direct access by the application,
* accessed by DMA or any master peripheral.

This implementation limits data copies between producer and consumer, offering direct data access to buffer clients such as applications.

[[File:copro-sw-ipc-big-data.png|800px|link=]]

In the above overview, [[How to exchange data buffers with the coprocessor#rpmsg_sdb driver|rpmsg_sdb]] is a driver taken as an example. It offers the application an interface to allocate and exchange buffers with the remote processor.
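
The principle can be sketched with a small descriptor message: instead of the data itself, the RPMsg payload carries a reference to it. The field names and layout below are illustrative assumptions, not the actual rpmsg_sdb format.

```python
import struct

# Descriptor carried inside a (512-byte max) RPMsg message: it refers
# to a large data buffer instead of containing the data itself.
# Illustrative layout: u32 buffer id, u64 physical address, u32 length,
# little-endian as seen by both the Cortex-A and the Cortex-M.
DESC_FMT = "<IQI"

def pack_desc(buf_id: int, phys_addr: int, length: int) -> bytes:
    return struct.pack(DESC_FMT, buf_id, phys_addr, length)

def unpack_desc(msg: bytes):
    return struct.unpack(DESC_FMT, msg)

# A 1-Mbyte buffer is announced with a 16-byte message instead of being
# copied through ~2000 RPMsg buffers.
msg = pack_desc(3, 0xD0000000, 1_048_576)
print(len(msg), unpack_desc(msg))  # → 16 (3, 3489660928, 1048576)
```

On the real system, the descriptor would point into a DMA-able memory region, and the consumer would mmap or DMA from that address rather than copying the data through RPMsg.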

This implementation is recommended:
* for high bit rate transfers,
* for real-time transfers (e.g. audio buffers),
* to favor dynamic buffer allocation and/or minimize copies,
* to adapt to an existing Linux framework or application,
* ...

For details on the mechanisms that can be implemented for large data buffer exchanges, refer to the [[How to exchange large data buffers with the coprocessor - principle]] article.<br>

For an application sample (and source code), refer to the [[How to exchange large data buffers with the coprocessor - example]] article.

==References==
<references/>

<noinclude>

[[Category:Coprocessor_management_Linux]]
[[Category:Coprocessor_management_STM32Cube]]
{{PublicationRequestId | 14610 | 2020-01-15 |}}</noinclude>