How to optimize acquisition speed

Revision as of 11:54, 8 July 2020 by Registered User

1. Introduction

The acquisition rate is an important parameter when monitoring values that change quickly. The time required to retrieve the data values from the target depends on many parameters. This article explains which parameters to check and how to get the best sampling frequency.

STM32CubeMonitor variable monitoring
STM32CubeMonitor performs data acquisition by reading the target MCU memory through the ST-Link and the SWD/JTAG connection.

Acquisition chain


The elements involved are:

  • STM32 MCU debug block, which accesses the memory through the MCU bus.
  • STM32 JTAG/SWD access port, used to connect the MCU debug block to the ST-Link.
  • ST-Link device.
  • USB protocol on the computer.
  • STM32CubeMonitor software.


Each time data needs to be sampled, the software reads the memory area where the variables are located. If several memory areas need to be accessed, each read cycle requires multiple requests.

2. Acquisition speed setting in STM32CubeMonitor

The acquisition frequency is set in the 'variable' node. If several 'variable' nodes are used, a different speed can be set for each node. Some variables can then be refreshed very quickly, while others are refreshed at a lower rate. To set the frequency, open the 'variable' node. In the "Acquisition Parameters" part, the "Sampling Frequency" field sets the acquisition rate:

  • "sequential loop" provides the fastest acquisition rate: as soon as a measurement is complete, a new one is started.
  • 0.1 Hz ... 1000 Hz: predefined frequencies.
  • Custom: used to set a specific frequency in Hz.

Click on "Done" and then "Deploy" to update the flow. The next acquisition is done at the requested frequency, or at the maximum achievable speed if the requested frequency cannot be reached.

3. Elements influencing acquisition time

3.1. ST-Link SWD/JTAG speed

The clock speed of the link between the ST-Link and the MCU can be changed. With a higher clock speed, the time needed to transmit the read request and the response is shorter. To optimize the sampling frequency, use the highest clock frequency that gives a stable connection.
The JTAG/SWD clock frequency can be set in the "probe config" of the "Probe in" or "Probe out" nodes.
The ST-Link V3 hardware supports a higher frequency than the ST-Link V2, so the acquisition is faster.
If the JTAG/SWD connection is made through long wires, the connection may be unstable at the maximum speed. In this case, use a lower frequency to get a reliable connection.

3.2. Computer performance

A read operation from STM32CubeMonitor involves the software, the drivers and the USB stack on the computer. The USB stack is managed by the computer OS, and some latency is added each time a request is sent to the ST-Link.
The total time required to process the transaction depends on the operating system and on the speed of the computer. It can take more than 1 ms on some computers.
This latency is a major part of the read time. It cannot be improved, so it is important to reduce the number of read operations required for each acquisition cycle.
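
As a rough illustration of why the number of read operations matters, the sketch below estimates an upper bound on the acquisition rate when every read request costs the same round-trip latency. The function name and the fixed-latency model are assumptions made for this example; the real latency varies with the OS, the USB stack and the probe.

```c
#include <stdint.h>

/* Rough upper bound on the acquisition rate, in Hz, assuming that every
 * read request costs the same round-trip latency (a simplification).
 *
 * latency_us:         round-trip time of one read request, in microseconds
 * requests_per_cycle: number of read requests needed for one acquisition
 *
 * Returns 0 if either argument is 0. */
uint32_t estimate_max_rate_hz(uint32_t latency_us, uint32_t requests_per_cycle)
{
    if (latency_us == 0u || requests_per_cycle == 0u) {
        return 0u;
    }
    /* Time spent on one full acquisition cycle, in microseconds. */
    uint64_t cycle_us = (uint64_t)latency_us * requests_per_cycle;
    return (uint32_t)(1000000u / cycle_us);
}
```

With a latency of 1 ms per request, one request per cycle gives at most about 1000 Hz under this model, while four requests per cycle give at most about 250 Hz.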

3.3. Use of ST-Link server

When the ST-Link server is used, it adds an extra protocol layer (TCP mode). The ST-Link can then be shared between several programs, but the extra layer decreases performance: the acquisition speed can be reduced by 50%.

3.4. Data size and location

The size and location of the variables have a direct impact on the time required to read the data. The size effect is easy to understand: the amount of data to read is at least the size of a variable multiplied by the number of variables. Reading 10 u32 variables requires transferring 40 bytes of payload. To increase the access speed, reduce the amount of data to read.

The impact of data location is more complex to estimate. Each data access takes time, so it is more efficient to read many variables in one read operation when possible.
STM32CubeMonitor performs this optimization automatically:

  • If three u8 variables are in the same u32 block, the tool reads one 32-bit word instead of performing three u8 accesses. This is three times faster, and the u8 values are extracted inside STM32CubeMonitor.
  • If two variables are in the same memory area, it is more efficient to read one bigger block that contains both.
  • When variables are not in the same area, the software must perform two separate accesses.
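
The effect of variable placement on the number of read requests can be sketched as follows. This is only a model: the merging rule, the `max_gap` threshold and all names are assumptions made for illustration, and the actual grouping strategy used by STM32CubeMonitor may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one monitored variable. */
typedef struct {
    uint32_t address; /* start address of the variable */
    uint32_t size;    /* size in bytes */
} var_desc_t;

/* Count the read requests needed for a list of variables sorted by
 * ascending address. A variable is merged into the current block when it
 * overlaps the block or starts at most max_gap bytes after the block end
 * (assumed rule); otherwise a new request is counted. */
size_t count_read_requests(const var_desc_t *vars, size_t count, uint32_t max_gap)
{
    if (vars == NULL || count == 0u) {
        return 0u;
    }
    size_t requests = 1u;
    uint32_t block_end = vars[0].address + vars[0].size;
    for (size_t i = 1u; i < count; ++i) {
        uint32_t end = vars[i].address + vars[i].size;
        if (vars[i].address > block_end &&
            (vars[i].address - block_end) > max_gap) {
            /* Too far from the current block: start a new request. */
            requests++;
            block_end = end;
        } else if (end > block_end) {
            /* Close enough: extend the current block. */
            block_end = end;
        }
    }
    return requests;
}
```

In this model, three u8 variables at consecutive addresses count as one request, while two u32 variables placed 4 KB apart count as two requests for any small `max_gap`.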

To improve performance, declare all the variables to monitor in the same memory area.
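
One possible way to keep the monitored variables together in firmware is to group them in a single global structure. The sketch below is an example under assumptions: the structure name, the member names and the types are hypothetical, and it assumes that the tool can resolve the structure members from the debug information of the firmware.

```c
#include <stdint.h>

/* Hypothetical set of monitored variables, grouped in one structure so
 * that they occupy one contiguous memory area. Members are ordered from
 * the largest to the smallest type to avoid padding between them. */
typedef struct {
    uint32_t motor_speed_rpm;
    uint32_t bus_voltage_mv;
    int16_t  phase_current_ma;
    uint8_t  fault_flags;
    uint8_t  state;
} monitor_vars_t;

/* Global instance with a fixed address. It is declared volatile because
 * the firmware writes it while it is read from outside through the
 * debug port. */
volatile monitor_vars_t g_monitor;

/* Example update routine, called from the application code. */
void monitor_update(uint32_t speed_rpm, uint32_t voltage_mv)
{
    g_monitor.motor_speed_rpm = speed_rpm;
    g_monitor.bus_voltage_mv = voltage_mv;
}
```

With this layout the five variables fit in 12 contiguous bytes, so they can in principle be fetched as a single block rather than as scattered locations.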

4. Snapshot mode

In snapshot mode, the data is copied by the embedded software to a buffer, and STM32CubeMonitor downloads this buffer. The "sampling frequency" is the rate at which the buffer is dumped from the target.
The real acquisition rate can be higher than the "sampling frequency", as it is managed by the target MCU firmware. Snapshot mode is useful to capture fast events, but the buffer must not be filled faster than it is dumped, otherwise it overflows. Snapshot mode can be very efficient to store "burst" events.
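
The overflow condition can be checked with simple arithmetic. The sketch below assumes that the firmware produces records at a constant rate and that each dump empties the buffer completely; both function names are hypothetical, and a real design would keep some margin on top of this estimate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of records accumulated between two buffer dumps, rounded up.
 * Returns UINT32_MAX if the dump rate is 0 (the buffer is never emptied). */
uint32_t records_per_dump(uint32_t record_rate_hz, uint32_t dump_rate_hz)
{
    if (dump_rate_hz == 0u) {
        return UINT32_MAX;
    }
    return (uint32_t)(((uint64_t)record_rate_hz + dump_rate_hz - 1u) / dump_rate_hz);
}

/* True if a buffer holding capacity_records records does not overflow
 * between two dumps, under the constant-rate assumption above. */
bool buffer_is_large_enough(uint32_t capacity_records,
                            uint32_t record_rate_hz,
                            uint32_t dump_rate_hz)
{
    return records_per_dump(record_rate_hz, dump_rate_hz) <= capacity_records;
}
```

For example, with records written at 10 kHz and a "sampling frequency" of 100 Hz, 100 records accumulate between two dumps, so a 64-record buffer would overflow while a 128-record buffer would not.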

5. Optimization of flow

When graphical elements such as gauges or bar graphs are used, there is no need to display data hundreds of times per second. The "single-value" subflow includes a rate limiter to decrease the amount of data sent to the dashboard nodes. This rate limitation reduces the computer CPU load and keeps bandwidth available for the acquisition.
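
The principle of such a rate limiter can be sketched as below. This is not the implementation of the "single-value" subflow, which is part of the STM32CubeMonitor flow; it is a generic illustration in C, and every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal time-based rate limiter: at most one value is forwarded per
 * min_interval_ms; the values in between are dropped. */
typedef struct {
    uint32_t min_interval_ms; /* minimum delay between forwarded values */
    uint32_t last_sent_ms;    /* timestamp of the last forwarded value */
    bool     has_sent;        /* false until the first value is forwarded */
} rate_limiter_t;

void rate_limiter_init(rate_limiter_t *rl, uint32_t min_interval_ms)
{
    rl->min_interval_ms = min_interval_ms;
    rl->last_sent_ms = 0u;
    rl->has_sent = false;
}

/* Returns true if the value received at now_ms should be forwarded to the
 * display. The unsigned subtraction stays correct when the millisecond
 * counter wraps around. */
bool rate_limiter_allow(rate_limiter_t *rl, uint32_t now_ms)
{
    if (!rl->has_sent ||
        (uint32_t)(now_ms - rl->last_sent_ms) >= rl->min_interval_ms) {
        rl->last_sent_ms = now_ms;
        rl->has_sent = true;
        return true;
    }
    return false;
}
```

With a 100 ms interval, a gauge fed by a 1000 Hz acquisition would be refreshed about 10 times per second, while the acquisition itself keeps running at full speed.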

6. Conclusion

The sampling rate can reach 1000 Hz for a single variable. When many variables are monitored, the bandwidth is shared and the sampling rate is reduced.
It is worth grouping all data in the same memory area to benefit from the read optimizations.