
Coprocessor management troubleshooting grid

Applicable for STM32MP15x lines, STM32MP25x lines


Some typical issues related to the management of a coprocessor are listed below. Solutions or debugging methods are proposed for these issues. If your issue is not listed, try looking in the articles in the Coprocessor management Linux, Coprocessor management STM32Cube or troubleshooting grids categories.

1. Coprocessor firmware loading and control

Symptom Resolution

The coprocessor traces are not available on the Linux® side:

 cat /sys/kernel/debug/remoteproc/remoteproc0/trace0
No such file or directory

This may happen for two reasons:

  • The firmware does not include a resource table, or the resource table does not define any trace.
Update the firmware resource table and rebuild the firmware.
  • The ".resource_table" section is empty or not defined in the ELF file.
Use the following command to check the generated ELF file:
  readelf -l <elf file>
 ...
 02     .data .resource_table .bss ._user_heap_stack
 ....

When starting the coprocessor from the bootloader (U-Boot):

"unsupported fw ver: 0
Remote Processor 0 resource table Not found : 0x00000000-0x0 "

The firmware does not include any resource table. This is only a warning and does not prevent the firmware from starting properly. The "rproc load_rsc" step can be skipped.

When loading a signed firmware from OP-TEE:

"remoteproc remoteproc0: bad phdr da 0x80123db8 mem 0x7f36a8"

There is probably a mismatch between the firmware memory mapping and the memory regions declared in the Linux remoteproc device tree. Use the following command to check the generated ELF file:

  readelf -l <elf file>

and compare the ELF program headers with the Linux device tree reserved-memory declaration.

When loading a signed firmware from OP-TEE:

"E/TA:  remoteproc_load_segment:887 Fails to clear segment, res = 0xffff0001"

There is probably a mismatch between the firmware memory mapping and the memory regions declared in the OP-TEE remoteproc device tree. Use the following command to check the generated ELF file:

  readelf -l <elf file>

and compare the ELF program headers with the OP-TEE device tree reserved-memory declaration.
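
The comparison described in the two entries above can be automated once the LOAD program headers and the reserved-memory regions have been copied out by hand. The following is a minimal sketch in C and is not part of remoteproc or OP-TEE: `struct mem_region` and `segment_fits()` are hypothetical names, and the region values must be taken from your own device tree. It reports whether one segment lies entirely inside one declared region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One reserved-memory region (base and size), copied from the device tree. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/*
 * Return true if the segment [da, da + memsz) lies entirely inside one region.
 * da and memsz are the load address and MemSiz values that "readelf -l"
 * prints for a LOAD program header.
 */
bool segment_fits(uint64_t da, uint64_t memsz,
                  const struct mem_region *regions, size_t count)
{
    assert(regions != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const struct mem_region *r = &regions[i];

        /* Written with subtractions only, so that no sum can overflow. */
        if (da >= r->base && da - r->base <= r->size &&
            memsz <= r->size - (da - r->base))
            return true;
    }
    return false;
}
```

A segment that straddles two adjacent regions is reported as not fitting, which is the conservative choice for this check. The values from the first trace above (da 0x80123db8, mem 0x7f36a8) would be rejected unless one declared region covers that whole range.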

When loading a signed firmware from OP-TEE:

"E/TA:  remoteproc_load_fw:1014 Can't Authenticate the firmware <res = 0xFFFF0000>"

This means that the OP-TEE remoteproc framework has detected something invalid in the firmware format. The root cause can be:

  • The firmware is an ELF file while a signed image is required. The following command should return the format supported by the TEE:
 cat /sys/class/remoteproc/remoteproc0/fw_format
  • The firmware file is corrupted.
  • One or more firmware images have not been signed.
  • One or more firmware images have been signed with an incorrect key.
  • The TLV parameters are not valid.

See Firmware_signature for details.
To debug, traces can be added in ta/remoteproc/remoteproc_core.c.

2. Inter processor communication


Symptom Resolution

Frozen firmware as a consequence of a deadlock in OpenAMP during inter-processor communication with the main processor.

This issue probably comes from the rpmsg_virtio_rx_callback or rpmsg_virtio_send_offchannel_raw functions (rpmsg_virtio.c) being called in interrupt context. These functions take a mutex lock in the rpmsg_device struct when accessing the virtio queue index. Rework your code so that these functions are not called in interrupt context.
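
One common way to keep such functions out of interrupt context is to let the interrupt handler only record the event, and to run the RPMsg processing from the main loop or a task. The sketch below illustrates that pattern with hypothetical names (`mailbox_rx_isr()`, `mailbox_poll()`, `process_rx_messages()`); it simulates the interrupt with a plain function call and does not call OpenAMP. The `in_isr` flag is an assumption standing in for a real check, such as reading IPSR with the CMSIS `__get_IPSR()` intrinsic on a Cortex-M.

```c
#include <assert.h>
#include <stdbool.h>

/* Set by the interrupt handler, consumed by the main loop. */
static volatile bool rx_pending;

/* Simulated "currently inside an interrupt handler" indicator. */
static volatile bool in_isr;

/* Number of deferred processing passes, kept only for illustration. */
static unsigned int processed_count;

/* Placeholder for the work that may take the rpmsg_device mutex. */
static void process_rx_messages(void)
{
    assert(!in_isr);   /* debug guard: must never run in interrupt context */
    processed_count++;
}

/* Hypothetical mailbox interrupt handler: it only records the event. */
void mailbox_rx_isr(void)
{
    in_isr = true;
    rx_pending = true;
    in_isr = false;
}

/*
 * Call from the main loop or a task, never from an interrupt handler.
 * Returns true if a pending event was handled.
 */
bool mailbox_poll(void)
{
    if (!rx_pending)
        return false;

    /* Clear the flag first so an event raised during processing is kept. */
    rx_pending = false;
    process_rx_messages();
    return true;
}

unsigned int mailbox_processed_count(void)
{
    return processed_count;
}
```

Several interrupts that arrive before the next poll collapse into a single processing pass, so the real processing function should handle every buffered message in one call.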

Linux kernel trace:

stm32-ipcc 4c001000.mailbox: Try increasing MBOX_TX_QUEUE_LEN
  • On each IPCC interrupt, the coprocessor processes all the buffered RPMsgs (one IPCC event for several RPMsgs). On the Linux side, one IPCC signal is programmed for each RPMsg sent. This can result in an overflow warning on Linux because too many IPCC events are queued. No RPMsgs are dropped, but this message can indicate that the coprocessor has reached its capacity to process the received messages in time. Consider reworking the code so that the coprocessor processes messages more efficiently, or decreasing the rate of messages sent by Linux.
  • Another possible root cause is that the IPCC was not properly deinitialized when the STM32Cube firmware was previously stopped, so the IPCC channels were not released. To prevent this issue, the STM32Cube firmware should implement the shutdown process as shown in the CoproSync_ShutDown example, and call the HAL_IPCC_DeInit() API.

Linux kernel trace (example with rpmsg_tty driver):

 rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: timeout waiting for a tx buffer

This message means that no TX buffer is available to transmit messages to the remote processor. This may happen for two reasons:

  • The firmware implementation is incomplete and the coprocessor does not process any IPCC interrupt (e.g. interrupt disabled or interrupt handler not defined). Fix the firmware code: refer to IPCC_internal_peripheral for details on the peripheral.
  • The coprocessor is busy, frozen or crashed. This can be confirmed by analysis with a debug tool.

Linux kernel trace:

rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: No memory for tty_prepare_flip_string

There is no more space in the TTY temporary buffer on reception. The root cause is probably that the Linux application did not read the tty device in time to process the incoming TTY stream. Consider reworking the Linux application so that it takes less time to process messages, or decreasing the message exchange rate.
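
One way to shorten the time the Linux application spends between reads is to drain the device into an application buffer first and to process the data afterwards. The following is a minimal sketch under that assumption: `drain_fd()` is a hypothetical helper that works on any file descriptor, it is not an API of the rpmsg_tty driver, and the descriptor is assumed to have been opened by the application on its tty device.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read everything currently available on fd into buf (at most cap bytes)
 * without blocking. Returns the number of bytes stored, or -1 on error.
 * The descriptor is left in non-blocking mode.
 */
ssize_t drain_fd(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);

        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                 /* end of file */
        if (errno == EINTR)
            continue;              /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                 /* nothing more to read for now */
        return -1;
    }
    return (ssize_t)total;
}
```

The application can call this helper as soon as the descriptor becomes readable and hand the filled buffer to a slower processing step, so that the kernel-side buffer is emptied quickly.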

Linux kernel trace:

remoteproc remoteproc0: stm32_rproc_kick: failed (<mbx>, <error value>)

The Linux remoteproc driver cannot use the IPCC mailbox. This may happen for two reasons:

  • The Linux kernel is built without support for the STM32 IPCC mailbox: to enable it, refer to IPCC configuration.
  • The mailboxes are missing or incorrectly defined in the Linux kernel device tree: fix this as described in mailbox DeviceTree.

STM32Cube error: VIRT_UART_Transmit() returns an error until a first message is received from the Cortex-A.

Before the Cortex-M can send any message, the RPMsg protocol requires that the Linux application has sent a first message (to provide its address to the Cortex-M).

The Linux application, the Cortex-M application, or both must be reworked to respect this constraint.
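
On the Linux side, the constraint can be met by writing one message to the RPMsg device before waiting for data from the Cortex-M. A minimal sketch in C follows. `send_first_message()` is a hypothetical helper, and the device path (for example `/dev/ttyRPMSG0` with the rpmsg_tty driver) is an assumption that depends on your configuration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one initial message to an existing RPMsg device node so that the
 * Cortex-M learns the address of the Linux endpoint.
 * Returns 0 on success, -1 on failure.
 */
int send_first_message(const char *dev_path, const char *msg)
{
    assert(dev_path != NULL && msg != NULL);

    /* O_NOCTTY: do not make the tty the controlling terminal. */
    int fd = open(dev_path, O_WRONLY | O_NOCTTY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    int ret = (written == (ssize_t)len) ? 0 : -1;

    if (ret != 0)
        fprintf(stderr, "failed or short write to %s\n", dev_path);

    close(fd);
    return ret;
}
```

The Cortex-M firmware can then wait for this first reception before its first call to VIRT_UART_Transmit().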