How to optimize acquisition speed

1. Introduction

The acquisition rate is an important parameter when monitoring values that change quickly. The time required to retrieve data values from the target depends on many parameters. This article explains which parameters to check and how to get the best sampling frequency.

STM32CubeMonitor variable monitoring
STM32CubeMonitor performs data acquisition by reading the target MCU memory through the ST-Link and its SWD/JTAG connection.

Acquisition chain

The elements involved are:

  • STM32 MCU debug block which performs access to the memory through the MCU bus.
  • STM32 JTAG/SWD access port, used to connect the MCU debug block to ST-Link.
  • ST-Link device.
  • USB protocol on the computer.
  • STM32CubeMonitor software.

Each time data need to be sampled, the software reads the memory area where variables are located. If several memory areas need to be accessed, there are multiple requests for each read cycle.

2. Acquisition speed setting in STM32CubeMonitor

The acquisition frequency is set in the 'variable' node. If several 'variable' nodes are used, a different speed can be set for each node, so some variables can be refreshed very quickly while others are refreshed at a lower rate. To set the frequency, open the 'variable' node. In the "Acquisition Parameters" section, "Sampling Frequency" sets the acquisition rate:

  • "sequential loop" provides the fastest acquisition rate: as soon as a measurement is complete, a new one is started.
  • 0.1 Hz ... 1000 Hz: predefined frequencies.
  • Custom: used to set a specific frequency in Hz.

Click "Done" and then "Deploy" to update the flow. The next acquisition runs at the requested frequency, or at maximum speed if the requested frequency cannot be reached.

3. Elements influencing acquisition time

3.1. ST-Link SWD/JTAG speed

The clock speed of the link between the ST-Link and the MCU can be changed. With a faster clock, the read request and its response take less time to transmit. To optimize the sampling frequency, use the highest clock frequency that gives a stable connection.
The JTAG/SWD clock frequency is set in the "probe config" of the "Probe in" or "Probe out" nodes.
The ST-Link V3 hardware supports a higher clock frequency than the ST-Link V2, so the acquisition is faster.
If the JTAG/SWD connection uses long wires, it may be unstable at maximum speed. In this case, use a lower frequency to get a reliable connection.

3.2. Computer performance

A read operation from STM32CubeMonitor involves the software, the drivers, and the USB stack on the computer. The USB stack is managed by the computer's operating system, and some latency is added each time a request is sent to the ST-Link. This latency is a major part of the read time. It cannot be improved, so it is important to reduce the number of read operations required for each acquisition cycle.
The total time required to process a transaction depends on the operating system and the speed of the computer. It can exceed 1 ms on some computers. Make sure that the CPU runs at the maximum frequency available on your computer: on a laptop, check the power mode, because the frequency can be lowered automatically to save battery, especially when the power supply is unplugged.
Avoid running other applications or processes that need significant resources, to reduce contention for the CPU. A background process that periodically changes the desktop wallpaper is one example.
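The effect of the per-request latency can be estimated with a simple model. The sketch below assumes that every read request costs the same fixed round-trip time and that the transfer time of the payload itself is negligible, which is a simplification. The function names and the latency figure are illustrative assumptions, not part of STM32CubeMonitor.

```c
#include <stdint.h>

/* Estimated upper bound on the sampling frequency, in Hz.
 * read_requests: number of separate memory reads per acquisition cycle.
 * latency_us:    assumed round-trip time of one read request, in microseconds.
 * Returns 0.0 when either argument is 0 (no meaningful estimate). */
double max_sampling_hz(uint32_t read_requests, uint32_t latency_us)
{
    if (read_requests == 0u || latency_us == 0u) {
        return 0.0;
    }
    return 1e6 / ((double)read_requests * (double)latency_us);
}

/* Frequency actually obtained: the requested one, or the estimated
 * maximum when the requested frequency cannot be reached. */
double effective_sampling_hz(double requested_hz,
                             uint32_t read_requests,
                             uint32_t latency_us)
{
    double max_hz = max_sampling_hz(read_requests, latency_us);
    return (requested_hz < max_hz) ? requested_hz : max_hz;
}
```

With an assumed latency of 1000 µs per request, one read per cycle gives an estimated maximum of 1000 Hz, while four separate reads per cycle limit it to 250 Hz. This is why reducing the number of read operations matters more than reducing the number of bytes.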
In STM32CubeMonitor v1.4.0, multithreading of the acquisition task was implemented to improve acquisition performance:

  • Multi probe acquisition speed is increased.
  • The sampling time is computed to better match the requested sampling frequency.

In STM32CubeMonitor v1.5.0, a new parameter named acqFrequencyThreshold can be set in the settings.js file. Its default value is 10 Hz. This threshold defines two modes:

  • Normal mode, for acquisition frequencies higher than acqFrequencyThreshold: the acquisition is more accurate, but the CPU workload is higher.
  • Eco mode, for acquisition frequencies lower than or equal to acqFrequencyThreshold: the acquisition is less accurate, but the CPU workload is lower. This mode is useful when many probes run at a low acquisition frequency.
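The boundary between the two modes can be written as a single comparison. The following sketch only restates the rule above; the type and function names are hypothetical and do not reflect how STM32CubeMonitor implements the selection internally.

```c
/* Hypothetical names, for illustration only. */
typedef enum {
    ACQ_MODE_ECO,    /* lower CPU workload, less accurate timing */
    ACQ_MODE_NORMAL  /* more accurate timing, higher CPU workload */
} acq_mode_t;

/* Normal mode is used only when the frequency is strictly higher than
 * the threshold; a frequency equal to the threshold selects eco mode. */
acq_mode_t select_acq_mode(double frequency_hz, double threshold_hz)
{
    return (frequency_hz > threshold_hz) ? ACQ_MODE_NORMAL : ACQ_MODE_ECO;
}
```

With the default threshold of 10 Hz, an acquisition at exactly 10 Hz falls in eco mode, and any higher frequency falls in normal mode.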

3.3. Use of ST-Link server

When the ST-Link server is used, it adds an extra protocol layer for TCP mode. The ST-Link can then be shared between several programs, but the extra layer decreases performance (the speed can be reduced by 50%).

3.4. Data size and location

The size and location of the variables have a direct impact on the time required to read the data. The size effect is easy to understand: the amount of data to read is at least the size of a variable multiplied by the number of variables. Reading 10 u32 variables requires transferring 40 bytes of payload. To increase the access speed, it is useful to reduce the amount of data.

The impact of data location is harder to estimate. Each data access takes time, so it is more efficient to read many variables in one read operation when possible.
STM32CubeMonitor performs this optimization automatically:

  • If 3 u8 variables are in the same u32 block, the tool reads one 32-bit word instead of performing 3 u8 accesses. It is 3 times faster, and the u8 values are extracted inside STM32CubeMonitor.
  • If 2 variables are in the same memory area, it is more efficient to read one bigger block.
  • Unfortunately, when variables are not in the same area, the software needs to perform 2 accesses.
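The following sketch illustrates the idea of merging nearby variables into one read. It is not the algorithm used by STM32CubeMonitor, whose exact merging rules are not described here. It assumes a hypothetical rule: two variables share a read when the gap between them is at most max_gap bytes. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t address; /* start address of the variable in target memory */
    uint32_t size;    /* size of the variable in bytes */
} var_desc_t;

/* Count the read requests needed for one acquisition cycle.
 * vars must be sorted by ascending address. A variable joins the current
 * block when it starts at most max_gap bytes after the end of that block;
 * otherwise a new read request is started. */
size_t count_read_requests(const var_desc_t *vars, size_t count,
                           uint32_t max_gap)
{
    if (count == 0u) {
        return 0u;
    }

    size_t requests = 1u;
    uint32_t block_end = vars[0].address + vars[0].size;

    for (size_t i = 1u; i < count; i++) {
        if (vars[i].address > block_end &&
            vars[i].address - block_end > max_gap) {
            requests++; /* too far from the current block: separate read */
        }
        uint32_t end = vars[i].address + vars[i].size;
        if (end > block_end) {
            block_end = end; /* extend (or restart) the current block */
        }
    }
    return requests;
}
```

Three u8 variables at consecutive addresses count as a single request, while a variable located a few kilobytes away from the others adds a second one.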

To improve performance, declare all the variables to monitor in the same memory area.
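One way to keep monitored variables in the same memory area is to group them in a single global structure in the firmware. The sketch below is an assumption about how this could be done; the structure, its fields, and the update function are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* All monitored values live in one contiguous block.
 * The three u8 fields and the explicit padding byte share one 32-bit
 * word, so the whole structure occupies 12 bytes. */
typedef struct {
    uint8_t  motor_state;
    uint8_t  fault_flags;
    uint8_t  duty_percent;
    uint8_t  reserved;     /* explicit padding, keeps the layout obvious */
    uint32_t speed_rpm;
    uint32_t tick_ms;
} monitor_vars_t;

/* Global, so it has a fixed address; volatile, because it is read by
 * the debug probe outside the normal program flow. */
volatile monitor_vars_t g_monitor;

/* Called by the application whenever the values change. */
void monitor_update(uint8_t state, uint32_t speed_rpm, uint32_t tick_ms)
{
    g_monitor.motor_state = state;
    g_monitor.speed_rpm   = speed_rpm;
    g_monitor.tick_ms     = tick_ms;
}
```

Declaring the same values as unrelated globals leaves their placement to the linker, which may scatter them across memory and increase the number of read requests.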

4. Snapshot mode

In snapshot mode, the embedded software copies the data to a buffer, and STM32CubeMonitor downloads this buffer. The "sampling frequency" is the rate at which the buffer is dumped from the target.
The real acquisition rate can be higher than the "sampling frequency", because it is managed by the target MCU firmware. Snapshot mode is useful to capture fast events, but the buffer must not fill faster than it is dumped, otherwise it overflows. Snapshot mode is very efficient for storing "burst" events.
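The relation between the fill rate and the dump rate can be illustrated with a generic ring buffer. This sketch is not the firmware interface that STM32CubeMonitor uses for snapshot mode; it is a simplified model under the assumption of a fixed-size buffer that drops new samples when full. All names and the capacity are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SNAP_CAPACITY 8u

typedef struct {
    uint32_t samples[SNAP_CAPACITY];
    uint32_t head;      /* index of the next slot to write */
    uint32_t count;     /* samples waiting to be dumped */
    uint32_t overflows; /* samples dropped because the buffer was full */
} snap_buffer_t;

/* Firmware side: store one sample at the real acquisition rate.
 * Returns false and counts an overflow when the buffer is full. */
bool snap_push(snap_buffer_t *b, uint32_t value)
{
    if (b->count == SNAP_CAPACITY) {
        b->overflows++;
        return false;
    }
    b->samples[b->head] = value;
    b->head = (b->head + 1u) % SNAP_CAPACITY;
    b->count++;
    return true;
}

/* Dump side: copy out up to max_out samples, oldest first, and free
 * their slots. Returns the number of samples copied. */
uint32_t snap_drain(snap_buffer_t *b, uint32_t *out, uint32_t max_out)
{
    uint32_t n = 0u;
    uint32_t tail = (b->head + SNAP_CAPACITY - b->count) % SNAP_CAPACITY;

    while (b->count > 0u && n < max_out) {
        out[n++] = b->samples[tail];
        tail = (tail + 1u) % SNAP_CAPACITY;
        b->count--;
    }
    return n;
}
```

If the firmware pushes 10 samples between two dumps of this 8-slot buffer, 2 samples are lost. Either the buffer must be larger, the dump more frequent, or the fill rate lower.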

5. Optimization of flow

When graphical elements such as gauges or bar graphs are used, there is no need to display data hundreds of times per second. The "single-value" subflow includes a rate limiter that decreases the amount of data sent to the dashboard nodes. This rate limitation reduces the computer CPU load and keeps the bandwidth available for the acquisition.
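The principle of a rate limiter can be sketched as follows. This is not the implementation of the "single-value" subflow, only an illustration of the technique in C under the assumption of a minimum interval between two forwarded values. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t min_interval_ms; /* minimum time between two forwarded values */
    uint32_t last_sent_ms;    /* timestamp of the last forwarded value */
    bool     has_sent;        /* false until the first value is forwarded */
} rate_limiter_t;

/* Returns true when a value arriving at now_ms should be forwarded to
 * the display, false when it should be dropped. The unsigned
 * subtraction stays correct when the millisecond counter wraps. */
bool rate_limiter_allow(rate_limiter_t *rl, uint32_t now_ms)
{
    if (rl->has_sent &&
        (uint32_t)(now_ms - rl->last_sent_ms) < rl->min_interval_ms) {
        return false;
    }
    rl->last_sent_ms = now_ms;
    rl->has_sent = true;
    return true;
}
```

With a minimum interval of 100 ms, a variable acquired at 1000 Hz reaches the gauge about 10 times per second, while the acquisition itself still runs at full speed.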

6. Conclusion

The sampling rate can reach 1000 Hz for a single variable. If many variables are monitored, the bandwidth is shared and the speed is reduced.
It is worth grouping all data in the same memory area to benefit from the read optimizations.
It is important to ensure that the CPU runs at its maximum frequency and that not too many applications run in parallel.