X-CUBE-AI documentation

This article describes the documentation related to X-CUBE-AI, in particular the contents of the X-CUBE-AI embedded documentation and how to access it. The embedded documentation is installed together with X-CUBE-AI, which ensures that the documentation always matches the installed version of X-CUBE-AI.

Information
  • X-CUBE-AI is a software package that generates optimized C code for Neural Network inference on STM32 microcontrollers. It is delivered under the Mix Ultimate Liberty+OSS+3rd-party V1 software license agreement (SLA0048).

1 X-CUBE-AI getting started and user guide

There are four main documentation items for X-CUBE-AI, complemented by wiki articles:

2 X-CUBE-AI embedded documentation contents

The embedded documentation describes the following topics (for X-CUBE-AI version 7.0.0) in detail:

  • User guide
    • Installation: specific environment settings to use the command-line interface in a console.
    • Command-line interface: the stm32ai application is a console utility, which provides a complete and unified command-line interface (CLI) to generate the X-CUBE-AI optimized library for an STM32 device family from a pre-trained DL/ML model. This section describes the CLI in detail.
    • Embedded inference client API: this section describes the embedded inference client API, which must be used by a C application layer (AI client) to use a deployed C model (a minimal usage sketch is given after this list).
    • Evaluation report and metrics: this section describes the different metrics (and associated computing flow) that are used to evaluate the performance of the generated C files (or C model), mainly through the validate command.
    • Quantized model and quantize command: the X-CUBE-AI code generator can be used to deploy a quantized model (8-bit integer format). Quantization (also called calibration) is an optimization technique that compresses a 32-bit floating-point model: it reduces its size and improves CPU/MCU usage and latency, at the expense of a small degradation of the accuracy (a small numerical illustration is given after this list). This section describes the way X-CUBE-AI supports quantized models and the CLI internal post-training quantization process.
  • Advanced features
    • Relocatable binary network support: a relocatable binary model designates a binary object, which can be installed and executed anywhere in an STM32 memory sub-system. It contains a compiled version of the generated NN C files including the requested forward kernel functions and the weights. The principal objective is to provide a flexible way to upgrade an AI-based application without generating and programming the whole end-user firmware again. For example, this is the primary element needed to use the FOTA (Firmware Over-The-Air) technology. This section describes how to build and use a relocatable binary.
    • Keras LSTM stateful support: this section describes how X-CUBE-AI provides an initial support for the Keras stateful LSTM.
    • Keras Lambda/custom layer support: the goal of this section is to explain how to import a Keras model containing Lambda or custom layers, which can be done in different ways depending on the nature of the model.
    • Platform Observer API: for advanced run-time, debug, or profiling purposes, an AI client application can register a call-back function to be notified before and/or after the execution of a C node. The call-back can be used to measure the execution time, dump the intermediate values, or both. This section describes how to use and take advantage of this feature.
    • STM32 CRC IP as a shared resource: to use the network_runtime library, the STM32 CRC IP must be enabled (or clocked); otherwise, the application hangs (see the snippet after this list). To improve the usage of the CRC IP and consider it as a shared resource, two optional specific hooks or call-back functions are defined to facilitate its usage with a resource manager. This section describes how it works.
    • TensorFlow™ Lite for Microcontrollers support: the X-CUBE-AI Expansion Package integrates a specific path, which makes it possible to generate a ready-to-use STM32 IDE project embedding a TensorFlow™ Lite for Microcontrollers runtime and its associated TFLite (TensorFlow™ Lite) model. This can be considered as an alternative to the default X-CUBE-AI solution for deploying an AI solution based on a TFLite model. This section describes how X-CUBE-AI supports TensorFlow™ Lite for Microcontrollers.
  • How to
    • How to use USB-CDC driver for validation: this article explains how to enable the USB-CDC profile to perform the validation on the board faster. A client USB device with the STM32 Communication Device Class (namely the Virtual COM port) is used as the communication link with the host. It avoids the overhead of the ST-LINK bridge connecting the UART pins to or from the ST-LINK USB port. However, an STM32 Nucleo or Discovery board with a built-in USB device peripheral is required.
    • How to run locally a C model: this article explains how to run the generated C model locally. The first goal is to enhance an end-user validation process with a large data set, including the specific pre- and post-processing functions, using the X-CUBE-AI inference run-time. It is also useful to integrate an X-CUBE-AI validation step in a CI/CD/MLOps flow without an STM32 board.
    • How to upgrade an STM32 project: this article describes how to upgrade, manually or with the CLI, an STM32CubeMX-based or proprietary source tree with a new version of the X-CUBE-AI library.
  • Supported DL/ML frameworks: lists the supported Deep Learning frameworks, and the operators and layers supported for each of them.
    • Keras toolbox: lists the Keras layers (or operators) that can be imported and converted. Keras is supported through the TensorFlow™ backend with channels-last dimension ordering. Keras.io 2.0 up to version 2.5.1 is supported, while networks defined in Keras 1.x are not officially supported. Up to TF Keras 2.5.0 is supported.
    • TensorFlow™ Lite toolbox: lists the TensorFlow™ Lite layers (or operators) that can be imported and converted. TensorFlow™ Lite is the format used to deploy a Neural Network model on mobile platforms. STM32Cube.AI imports and converts the .tflite files based on the flatbuffer technology. The official ‘schema.fbs’ definition (tags v2.5.0) is used to import the models. Operators from the supported operator list are handled, including quantized models and operators generated by the Quantization Aware Training process, the post-training quantization process, or both.
    • ONNX toolbox: lists the ONNX layers (or operators) that can be imported and converted. ONNX is an open format built to represent Machine Learning models. A subset of operators from Opset 7, 8, 9 and 10 of ONNX 1.6 is supported.
  • Frequently asked questions
    • Generic aspects
      • How to know the version of the Deep Learning framework components used?
      • Channel first support for ONNX model
      • How is the CMSIS-NN library used?
      • What is the EABI used for the network_runtime libraries?
      • X-CUBE-AI Python™ API availability?
      • Stateful LSTM support?
      • How is the ONNX optimizer used?
      • How is the TFLite interpreter used?
      • TensorFlow™ Keras (tf.keras) vs. Keras.io
      • Is it possible to update a model in the firmware without having to do a full firmware update?
      • Keras model or sequential layer support?
      • Is it possible to split the weights buffer?
      • Is it possible to place the “activations” buffer in different memory segments?
      • How to compress the non-dense/fully-connected layers?
      • Is it possible to apply a compression factor different from x8, x4?
      • How to specify or indicate a compression factor by layer?
      • Why is a small negative ratio reported for the weights size with a model without compression?
      • Is it possible to dump/capture the intermediate values during inference execution?
    • Validation aspects
      • Validation on target vs. validation on desktop
      • How to interpret the validation results?
      • How to generate npz/npy files from an image data set?
      • How to validate a specific network when multiple networks are embedded into the same firmware?
      • Reported STM32 results are incoherent
      • Unable to perform automatic validation on target
      • Long time process or crash with a large test data set
    • Quantization and post-training quantization process
      • Backward compatibility with X-CUBE-AI 4.0 and X-CUBE-AI 4.1
      • Is it possible to use the Keras post-training quantization process through the UI?
      • Is it possible to use the Keras post-training quantization process with a non-classifier model?
      • Is it possible to use the compression for a quantized model?
      • How to apply the Keras post-training quantization process on a non-Keras model?
      • TensorFlow™ Lite, OPTIMIZE_FOR_SIZE option support
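
The embedded inference client API section is the one most directly used from application code. The sketch below is a minimal illustration only, assuming a model generated with the default name "network" (hence the ai_network_* prefix and AI_NETWORK_* macros), a single input/output pair, and arbitrary helper names (ai_init, ai_run); the authoritative names, macros, and signatures are those generated in network.h, network_data.h, and ai_platform.h, and documented in that section.

    /* Minimal sketch of the embedded inference client API (illustration only).
     * Assumes a model generated with the default name "network". */
    #include "network.h"
    #include "network_data.h"

    /* Activations (scratch) buffer owned by the application; its size is
     * provided by a macro generated with the model. */
    AI_ALIGNED(4)
    static ai_u8 activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

    static ai_handle network = AI_HANDLE_NULL;

    int ai_init(void)
    {
      /* 1 - Create an instance of the C model */
      ai_error err = ai_network_create(&network, AI_NETWORK_DATA_CONFIG);
      if (err.type != AI_ERROR_NONE)
        return -1;

      /* 2 - Initialize it with the weights and the activations buffer */
      const ai_network_params params = AI_NETWORK_PARAMS_INIT(
          AI_NETWORK_DATA_WEIGHTS(ai_network_data_weights_get()),
          AI_NETWORK_DATA_ACTIVATIONS(activations));
      if (!ai_network_init(network, &params))
        return -1;

      return 0;
    }

    int ai_run(void *in_data, void *out_data)
    {
      /* 3 - Bind the user input/output buffers and run one inference */
      ai_buffer ai_input[AI_NETWORK_IN_NUM] = AI_NETWORK_IN;
      ai_buffer ai_output[AI_NETWORK_OUT_NUM] = AI_NETWORK_OUT;

      ai_input[0].data = AI_HANDLE_PTR(in_data);
      ai_output[0].data = AI_HANDLE_PTR(out_data);

      /* Returns the number of processed batches (1 on success here) */
      return ai_network_run(network, ai_input, ai_output);
    }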
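
Regarding the quantized model support, the numerical principle behind 8-bit quantization can be illustrated as follows. This is a generic example of affine quantization, r ≈ scale × (q − zero_point), with arbitrary example values; it is not a description of the X-CUBE-AI internals, which are covered in the "Quantized model and quantize command" section.

    /* Generic illustration of 8-bit affine quantization: r ~= scale * (q - zero_point).
     * The scale and zero-point values below are arbitrary examples. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
      const float scale = 0.05f;   /* example scale derived from a calibration range */
      const int zero_point = -128; /* example zero-point for int8 data */

      float r = 2.37f;                                   /* 32-bit float value */
      int q = (int)lroundf(r / scale) + zero_point;      /* quantize */
      if (q < -128) q = -128; else if (q > 127) q = 127; /* saturate to the int8 range */
      float r_back = scale * (float)(q - zero_point);    /* dequantize */

      printf("q = %d, dequantized = %.2f\n", q, r_back); /* q = -81, dequantized = 2.35 */
      return 0;
    }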
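
Finally, regarding the STM32 CRC IP requirement mentioned above, the practical consequence for a simple application (without a resource manager) is that the CRC clock must be enabled before the first call to the network_runtime library. The sketch below uses the STM32 HAL; the header name and the ai_board_init helper are illustrative and must be adapted to the actual STM32 series and project, while the optional shared-resource hooks themselves are described in the corresponding section of the embedded documentation.

    #include "stm32f4xx_hal.h"  /* illustrative: adapt to the actual STM32 series HAL header */

    /* The network_runtime library uses the STM32 CRC IP: its clock must be
     * enabled before the first inference call, otherwise the application hangs. */
    void ai_board_init(void)
    {
      __HAL_RCC_CRC_CLK_ENABLE();
    }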

3 X-CUBE-AI embedded documentation access

To access the embedded documentation, first install X-CUBE-AI. The installation process is described in the getting started user manual. Once the installation is done, the documentation is available in the installation directory under X-CUBE-AI/7.0.0/Documentation/index.html (adapt the example, given here for version 7.0.0, to the version used). For Windows®, by default, the documentation is located here (replace the string username by your Windows® username): file:///C:/Users/username/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.0.0/Documentation/index.html. The release notes are available here: file:///C:/Users/username/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/7.0.0/Release_Notes.html.

The embedded documentation can also be accessed through the STM32CubeMX UI, once the X-CUBE-AI Expansion Package has been selected and loaded, by clicking on the "Help" menu and then on "X-CUBE-AI documentation":

X-CUBE-AI documentation