How to optimize the boot time

Revision as of 11:06, 3 December 2021 by Registered User
Applicable for STM32MP13x lines, STM32MP15x lines

1. Article purpose[edit source]

The purpose of this document is to provide information on how to measure and improve the boot-time[1] of a typical STM32MP1 series Linux system. This article is not an exhaustive list of possible optimizations, and those that are considered insufficiently reliable for industrial use are intentionally omitted.

2. Overview[edit source]

On a typical STM32MP1 Linux system, the boot-chain is performed, respectively, by: the ROM code, TF-A, U-Boot, the Linux kernel, and the user-land[2]. All these components except the ROM code can be modified, and thus configured to start more quickly. For each of them, the procedure is the same: features that are not required at boot-time must be initiated after system boot (or disabled), and features that improve the boot time must be enabled.

3. Measuring boot-time[edit source]

Before optimizing the performance of any piece of software, the time duration of each part must be considered so that the effort can be focused effectively.

3.1. Using a serial console[edit source]

One of the easiest way to measure the boot-time of a Linux system is to observe traces emitted on a serial console using timing software, such as serialgrab or like the script File:Measure-timing.txt based on microcom and p2f.

For example:

  microcom -p /dev/ttyACM0 | bash Measure-timing.txt
 Waiting for board reset...
 NOTICE:  Model: STMicroelectronics STM32MP157C eval daughter on eval mother
 NOTICE:  Board: MB1263 Var1 Rev.C-01
 U-Boot 2018.11-stm32mp-r2 (Nov 14 2018 - 16:10:06 +0000)
 Model: STMicroelectronics STM32MP157C eval daughter on eval mother
 Starting kernel ...
 [    0.000000] Booting Linux on physical CPU 0x0
 Freeing unused kernel memory: 1024K
 Run /sbin/init as init process
 ST OpenSTLinux - Weston - (A Yocto Project Based Distro) 2.6-openstlinux-4.19-thud-mp1-19-02-20 stm32mp1 ttySTM0
 stm32mp1 login:
 Timing results:
 FSBL:  1.23s
 SSBL:  2.34s
 Linux: 2.34s
 init:  1.23s
 total: 7.14s

Note: these are illustrative figures since the boot-time depends on too many parameters to provide meaningful figures here.

3.2. Using hardware timers[edit source]

When measuring traces from a serial console is not precise enough, or when traces are not available, it is possible to use hardware timers. Some components of the boot chain provide timing information based on hardware timers; for instance with U-Boot, the following options can be enabled:


This prints on the serial console — just before booting the OS — something similar to the output below (illustrative figures):

 Starting kernel ...
 Timer summary in microseconds (11 records):
        Mark    Elapsed  Stage
           0          0  reset
     123,456    123,456  board_init_f
   2,345,678  2,222,222  board_init_r
   3,456,789  1,111,111  id=64
   3,567,890    111,101  id=65
   3,678,901    111,011  main_loop
   4,567,890    888,989  bootm_start
   4,678,901    111,011  id=15
   4,789,012    110,111  start_kernel
 Accumulated time:
                 23,456  dm_r
                567,890  dm_f
 Booting Linux on physical CPU 0x0

If the serial console is not available, this timing information can also be stored in the device-tree or in memory:


It is possible to add this feature to boot-chain components that do not implement it by default, like TF-A. See the file arch/arm/cpu/armv7/arch_timer.c from U-Boot for an example.

3.3. Using GPIOs[edit source]

When the console is disabled, the boot-chain components can be modified to trigger events on GPIOs according to the current stage of the boot-process. Then a logical analyzer can be used to measure the time between each such event.

Error creating thumbnail: Unable to save thumbnail to destination
Example of trace captured by Sigrok and displayed by PulseView.

4. Optimizing boot-time[edit source]

4.1. TF-A[edit source]

4.1.1. Configuration[edit source]

The TF-A execution time can be noticeably reduced by disabling features that are not required. To achieve this, the right options have to be specified when building the TF-A; for instance with a system that boots exclusively from NAND logic connected through the FMC:

  make STM32MP1_DEBUG_ENABLE=0    \
            STM32MP1_UART_PROGRAMMER=0 \
            STM32MP1_USB=0             \
            STM32MP1_QSPI_NOR=0        \
            STM32MP1_QSPI_NAND=0       \
            STM32MP_FMC_NAND=1         \
            STM32MP_EMMC=0             \
            STM32MP_SDMMC=0            \

These build options must of course be adjusted according to the expected usage and the TF-A version.

4.1.2. I2C timing[edit source]

I2C driver computes device timing at init, in order to guarantee correct data hold and setup times. The execution time of the computation algorithm is not so important, but with all associated device tree accesses, it can last several 10ms. It has to be done for each instance, so it can happen several times in the boot sequence.

For specific needs, a hard-coded timing (and its related frequency) can be used in order to optimize boot performances. Dedicated fields can be added in the I2C instance related handle structure. Then, once retrieved, these boot values can be configured by the client, this avoids to compute timings (and to do several device tree accesses) on first init for each instance.

4.2. U-Boot[edit source]

4.2.1. Delay[edit source]

Before loading the Linux kernel and its device tree, U-Boot, by default, waits one second for a potential user input. This behavior — undesirable when boot time is a concern — can be easily removed by specifying bootdelay=0 in include/configs/stm32mp1.h.

4.2.2. Configuration and device tree[edit source]

More broadly, removing support for all unused devices from U-Boot configuration and unused nodes from the device-tree drastically reduces the U-Boot execution time, since this eliminates the initialization time of those devices and the time to parse the device-tree.

There are also a couple of U-Boot features that are specially designed to improve the boot-time, for example:


Regarding support for UBI fastmap (for NOR and NAND storage media), please see the "Linux" section below for more information.

4.2.3. Configuration access[edit source]

By default U-Boot searches a boot configuration file extlinux.conf or a boot script boot.scr.uimg from a bootable partition ("bootfs" for OpenSTLinux), see U-Boot_overview#Generic_Distro_configuration for details. Such access to a file system on storage media can take a short of time, but fortunately it is possible to embed a boot command into the U-Boot binary instead. To do this, the whole U-Boot environment must be specified from scratch:


Or the U-Boot "distro" mode must be disabled:


In the latter case, be aware that if booting from an ext2/4 partition — typically when booting from an SD card or from eMMC — the following options must be explicitly selected (they were previously selected implicitly by CONFIG_DISTRO_DEFAULTS):


In both cases only CONFIG_ENV_IS_NOWHERE must be set to y (remove all the other CONFIG_ENV_IS...) , and the environment variable bootcmd must contain the expected boot command, for example:

 CONFIG_BOOTCOMMAND="run bootcmd_ubi"
 bootcmd_ubi=env set bootargs ubi.mtd=UBI root=ubi0:boot \
                          rootfstype=ubifs rootwait  \
                          rw console=ttySTM0,115200; \
          ubi part UBI;                              \
          ubifsmount ubi0:boot;                      \
          ubifsload 0xc2000000 /zImage;              \
          ubifsload 0xc4000000 /stm32mp157c-ev1.dtb; \
          bootz 0xc2000000 - 0xc4000000

4.3. Linux[edit source]

4.3.1. Configuration and device-tree[edit source]

For U-Boot, one of the most efficient ways to decrease the Linux boot time is to remove support for all unused devices from its configuration and device-tree, since this eliminates the time taken to initialize those devices. Also, support for devices and features that are required but not mandatory at boot-time can be compiled as modules, and then loaded by the user-land once the boot-process is done (see the "Init" subsection below).

4.3.2. Traces[edit source]

Linux boot traces have a size of about 2 Kbytes, so they take about 2 seconds to transfer on a serial link configured at 128 Kbit/s. If this is an issue, the following kernel parameters (bootargs variable in U-Boot) can be used to remove these traces:

 quiet loglevel=0

This is automatically done by U-Boot when silent mode is required (see CONFIG_SILENT_CONSOLE and CONFIG_SILENT_U_BOOT_ONLY): silent=1 in used U-Boot environment.

4.3.3. UBI volume[edit source]

When partitions such as "bootfs" and "rootfs" are stored on a UBI volume, the following two actions are highly recommended to greatly reduce the volume attachment time at boot-time, both by U-Boot and by the Linux kernel:

1. Decrease the size of the UBI volume 2. Enable the fastmap feature. For this, the CONFIG_MTD_UBI_FASTMAP option must be set to y both for Linux and U-Boot, and ubi.fm_autoconvert=1 has to be added to the kernel boot parameters. Note that the very first boot is used to create the fastmap information, thus this one does not get faster.

A lot of other volume/file-systems exist for Flash media, such as JFFS, LogFS, F2FS, and so on. Depending on the use-case, some may be faster than others.

4.4. User-land[edit source]

4.4.1. Init[edit source]

The very first user-land process launched by the Linux kernel is init; it is in charge of launching other processes in the right order. As a consequence, removing all unnecessary services can greatly speed up this part of the boot-process. The way services can be removed highly depends on the system in use (systemd, OpenRC and so on), so please refer to its documentation.

Ultimately, when no services are required, the init system can be replaced by the final application itself; either by storing its binary in /sbin/init, or by passing the following Linux boot parameter:


4.4.2. Framework[edit source]

For graphical applications, the choice of display framework can seriously impact the startup time of the application itself. For instance, one could use the Linux framebuffer instead of Weston or Xorg, since the former is set up faster than the latter.

Likewise for video applications; if the boot-time is a concern, direct use of the V4L2 interface instead of higher-level interfaces such as GStreamer is recommended.

5. Conclusion[edit source]

There is no universal recipe to improve the boot-time of a Linux system, since it depends on the contraints of the final use-case. However the universal rule is: always benchmark, before and after optimizing.

6. Further reading[edit source]

7. References[edit source]

  1. Sometimes referred to as startup-time.
  2. Sometimes referred to as user-space.