Exchanging buffers with the coprocessor


1 Introduction

In the STM32MPU Embedded Software distribution, the RPMsg protocol allows communication between the Arm® Cortex®-A and Cortex®-M cores [1][2].

To implement a feature relying on the RPMsg protocol, it is important to understand that this protocol has not been designed to directly transfer high data rate streams. As a result, the implementation needs to be adapted to the constraints of the use case:

  • For control and low data rate exchange, RPMsg is enough.
  • For high rate transfer and large data buffers, an indirect buffer exchange mode should be preferred.

It is not possible to provide strict rules for choosing one or the other implementation. This depends on the use case but also on:

  • the load on the Cortex CPUs,
  • process priorities,
  • preemptions (such as interrupts and secure services),
  • other services implemented on top of the RPMsg instance.

The aim of this article is to help you choose the best-suited implementation. If this is not sufficient, another approach is to implement the direct mode first and test its performance in your system.

2 RPMsg protocol awareness

RPMsg provides, through the virtio framework, a basic transport layer based on a shared ring buffer:

  • The buffers are prenegotiated and preallocated during coprocessor loading (size, number of buffers).
  • The buffers are allocated in non-cacheable memory.
  • An RPMsg client has no direct access to these buffers. They are filled by a copy (no zero copy or DMA transfers).
  • No bandwidth is guaranteed. Buffers can be shared between several RPMsg clients.
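
The properties listed above can be illustrated with a small toy model. This is only a sketch, not an RPMsg API: the class name RpmsgBufferPool, its methods and the pool of two buffers are hypothetical, and the RPMsg header stored in each buffer is ignored. It shows a pool that is allocated once, filled by copy, and shared between clients, so that one client can temporarily exhaust the buffers available to another.

```python
class RpmsgBufferPool:
    """Toy model of a preallocated, fixed-size buffer pool (hypothetical, not a real RPMsg API)."""

    def __init__(self, num_buffers=4, buf_size=512):
        self.buf_size = buf_size
        # All buffers are allocated once, up front, as when the coprocessor is loaded.
        self.buffers = [bytearray(buf_size) for _ in range(num_buffers)]
        self.free = list(range(num_buffers))  # indices of free buffers
        self.pending = []                     # (index, length) of filled buffers, FIFO order

    def send(self, payload):
        """Copy payload into a free buffer; return False if no buffer is free."""
        if len(payload) > self.buf_size:
            raise ValueError("payload does not fit in one buffer")
        if not self.free:
            return False
        index = self.free.pop(0)
        self.buffers[index][:len(payload)] = payload  # data is copied, never passed by reference
        self.pending.append((index, len(payload)))
        return True

    def receive(self):
        """Copy the oldest message out and release its buffer; return None if empty."""
        if not self.pending:
            return None
        index, length = self.pending.pop(0)
        data = bytes(self.buffers[index][:length])
        self.free.append(index)
        return data


# Two clients share a pool of two buffers: the third send finds no free buffer.
pool = RpmsgBufferPool(num_buffers=2)
sent = [
    pool.send(b"client A, msg 1"),
    pool.send(b"client B, msg 1"),
    pool.send(b"client A, msg 2"),
]
```

In this model the third send fails until the receiver has released a buffer, which is one way to picture why no bandwidth can be guaranteed to a given client.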

The size of the buffers is hard-coded (512 bytes). However, it is possible to customize the number of buffers used. Modifying this parameter impacts the number of buffers allocated in both directions (refer to the resource table for details).
A doorbell signal is associated with each RPMsg transfer. It is sent to the destination processor via the stm32 IPCC mailbox and generates an IRQ for each message transfer.
Note that the IRQ frequency can be a criterion for the decision. For instance, a 1 Mbyte/s transfer from the Cortex-M4 to the Cortex-A7 using 512-byte RPMsg buffers generates around 2000 IRQs per second on each Cortex (1 000 000 / 512 ≈ 1953 messages per second).
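
The IRQ estimate can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch: the function name is hypothetical, one doorbell IRQ is assumed per completely filled buffer, and the whole 512 bytes are counted as data. Since part of each buffer is used by the RPMsg header, the real rate is expected to be slightly higher.

```python
def doorbell_irqs_per_second(bytes_per_second, buffer_size=512):
    """Estimate doorbell IRQs per second, assuming one IRQ per full buffer."""
    return bytes_per_second / buffer_size


# 1 Mbyte/s taken as 10**6 bytes per second.
rate = doorbell_irqs_per_second(1_000_000)

# The same estimate with 1 Mbyte taken as 2**20 bytes.
rate_binary = doorbell_irqs_per_second(1024 * 1024)
```

Both readings of "1 Mbyte/s" give a value close to 2000 IRQs per second, so the order of magnitude does not depend on the convention used.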

3 Direct buffer exchange mode

This mode consists of using the RPMsg buffers themselves to transfer data between the processors.

  • The RPMsg message contains the actual data (the payload).
  • Memory allocation is limited to the allocation of the RPMsg buffers.
  • The RPMsg client is straightforward to implement.
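
As an illustration of the direct mode, the sketch below splits a payload into chunks that each fit in one RPMsg buffer and reassembles them on the other side. It is a model only: the function names are hypothetical, and the 16 bytes subtracted for the RPMsg header are an assumption that should be checked against the RPMsg implementation in use.

```python
RPMSG_BUFFER_SIZE = 512   # buffer size stated in this article
ASSUMED_HEADER_SIZE = 16  # assumption: bytes reserved in each buffer for the RPMsg header
MAX_PAYLOAD = RPMSG_BUFFER_SIZE - ASSUMED_HEADER_SIZE


def split_into_messages(data, max_payload=MAX_PAYLOAD):
    """Cut data into consecutive chunks of at most max_payload bytes."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


def reassemble(messages):
    """Concatenate the received chunks back into the original data."""
    return b"".join(messages)


data = bytes(range(256)) * 4           # 1024 bytes to transfer
messages = split_into_messages(data)   # each chunk travels in its own RPMsg message
```

Every chunk costs one copy on each side and one doorbell IRQ, which is why this mode suits control messages and low data rates.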

Copro-sw-ipc-overview.png

The direct buffer exchange implementation is recommended:

  • for control messages, for instance to control a remote processor application,
  • to exchange low data rate streams (similar to a slow data rate bus).

For a sample application, refer to the list of available projects in the STM32CubeMP1 Package release note and have a look at the OpenAMP_TTY_echo application.

4 Indirect buffer exchange mode

This mode is also called "large data buffers exchange" or "Big Data". It consists of using RPMsg to carry references to other buffers that contain the actual data. These other buffers can be:

  • of any size,
  • allocated by multiple means, in cached or non-cached memory, in DDR or MCU SRAM,
  • mmapped for direct access by the application,
  • accessed by DMA or any master peripheral.

This implementation limits data copies between producer and consumer by offering direct data access to buffer clients such as applications.
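
The principle can be sketched as follows. This is a toy model under stated assumptions, not the interface of any existing driver: the descriptor layout (buffer id, offset, length as three 32-bit values), the function names and the bytearray standing in for the shared memory region are all hypothetical. Only the small descriptor would travel in the RPMsg message, while the consumer accesses the data in place.

```python
import struct

# Hypothetical descriptor layout: buffer id, offset, length (three little-endian u32).
DESCRIPTOR = struct.Struct("<III")

# Stands in for a memory region visible to both processors (DDR or MCU SRAM).
shared_memory = bytearray(64 * 1024)


def produce(buffer_id, offset, data):
    """Place data in shared memory and return the small descriptor message.

    In a real system the producer (or a DMA) would write in place;
    the slice assignment here only fills the toy region.
    """
    shared_memory[offset:offset + len(data)] = data
    return DESCRIPTOR.pack(buffer_id, offset, len(data))


def consume(message):
    """Decode the descriptor and return a view of the referenced data, without copying it."""
    buffer_id, offset, length = DESCRIPTOR.unpack(message)
    return buffer_id, memoryview(shared_memory)[offset:offset + length]


payload = b"\xAA" * 8192  # far larger than one 512-byte RPMsg buffer
message = produce(buffer_id=1, offset=4096, data=payload)
buffer_id, view = consume(message)
```

Here a 12-byte message describes an 8192-byte buffer, so a single doorbell IRQ is enough where the direct mode would need many messages.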

Copro-sw-ipc-big-data.png

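In the above overview, rpmsg_sdb (https://github.com/STMicroelectronics/meta-st-stm32mpu-app-logicanalyser/tree/thud/recipes-kernel/rpsmg-sdb-mod/files) is a driver taken as an example. It offers the application an interface to allocate buffers and exchange them with the remote processor.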

This implementation is recommended:

  • for high bit rate transfers,
  • for real-time transfers (e.g. audio buffers),
  • to favor dynamic buffer allocation and/or minimize copies,
  • to adapt to an existing Linux framework or application.

For details on the mechanisms that can be implemented for large data buffer exchanges, refer to the article "How to exchange large data buffers with the coprocessor - principle".
For a sample application, refer to the article "How to exchange large data buffers with the coprocessor - example".

5 References

  [1] Coprocessor management overview
  [2] Linux RPMsg framework overview

