Getting started with STM32Cube.AI Developer Cloud

This article is a step-by-step guide on how to use the STM32Cube.AI Developer Cloud online platform and services.

STM32Cube.AI Developer Cloud workflow


1. Glossary

  • DMZ is a physical or logical subnetwork that contains and exposes an organization's external-facing services to an untrusted, usually larger network, such as the Internet.
  • ELF (executable and linkable format) is a common standard format for executables and other binary files.
  • GUI stands for graphical user interface.
  • IOC is the project configuration file for STM32CubeMX.
  • K-means is a type of machine learning algorithm.
  • Keras is an open-source library that provides a Python interface for artificial neural networks.
  • MACC stands for multiply-accumulate, an operation commonly used as a measure of the computational complexity of a model.
  • MATLAB® is a proprietary multi-paradigm programming language and numeric computing environment.
  • MCU is an acronym for microcontroller unit.
  • Netron is a viewer for neural network, deep learning, and machine learning models.
  • npz is NumPy's zipped binary file format used to store arrays.
  • ONNX is an open ecosystem that provides an open-source format for AI models.
  • PyTorch is a machine learning framework.
  • RAM stands for random access memory, which is volatile memory.
  • REST API stands for representational state transfer application programming interface.
  • scikit-learn is a free software machine learning library for the Python programming language.
  • STM32 model zoo is a collection of reference machine learning models that are optimized to run on STM32 microcontrollers.
  • Support vector machine (SVM) is a type of machine learning algorithm.
  • TensorFlow Lite is a mobile library for deploying models on mobile, microcontrollers, and other edge devices.
  • TFLiteConverter is the API for converting TensorFlow models to TensorFlow Lite.
  • USART stands for universal synchronous/asynchronous receiver/transmitter, which allows a device to communicate using serial protocols.
  • X-CUBE-AI is an STM32Cube Expansion Package and part of the STM32Cube.AI ecosystem.

2. Overview

STM32Cube.AI Developer Cloud (STM32CubeAI-DC) is a free-of-charge online platform and service that enables the creation, optimization, benchmarking, and generation of artificial intelligence (AI) models for STM32 microcontrollers based on Arm® Cortex®‑M processors. It is based on the STM32Cube.AI core technology.

The benefits and features of STM32Cube.AI Developer Cloud are:

  • Online GUI (no installation required) accessible with STMicroelectronics extranet user credentials.
  • Network optimization and visualization providing the RAM and flash memory sizes required to run on the STM32 target.
  • Quantization tool to convert a floating-point model into an integer model.
  • Benchmark service on the STMicroelectronics-hosted board farm, which includes various STM32 boards, to help select the best-suited hardware.
  • Code generator including the network C code and optionally the full STM32 project.
  • STM32 model zoo:
    • Easy access to model selection, training script, and key model metrics, directly available for benchmark.
    • Application code generator from the user's model with “Getting started” code examples.
    • Machine learning (ML) workflow automation service with Python scripts (REST API); a minimal sketch follows this list.
  • Supports all X-CUBE-AI features, such as:
    • Native support for various deep learning frameworks, such as Keras and TensorFlow Lite, and support for frameworks that can export to the ONNX standard format, such as PyTorch, MATLAB®, and more.
    • Support for the 8-bit quantization of Keras networks and TensorFlow Lite quantized networks.
    • Support for various built-in scikit-learn models, such as isolation forest, support vector machine (SVM), K-means, and more.
    • Possibility to use larger networks by storing weights in external flash memory and activation buffers in external RAM.
    • Easy portability across different STM32 microcontroller series.
  • User-friendly license terms.
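
As an illustration of the REST automation service, the following minimal Python sketch uploads a model and launches an analysis run. The endpoint paths, field names, and authentication scheme shown here are assumptions made for illustration only; refer to the REST API documentation available in STM32Cube.AI Developer Cloud for the actual interface.

```python
import requests

# Hypothetical sketch of the REST automation workflow. The base URL,
# endpoints, and JSON fields below are assumptions, not the documented API.
BASE_URL = "https://stm32ai-cs.st.com/api"   # assumed base URL
TOKEN = "..."                                # token obtained with your myST credentials

headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Upload a pretrained model.
with open("model.h5", "rb") as f:
    requests.post(f"{BASE_URL}/models", headers=headers,
                  files={"model": f}).raise_for_status()

# 2. Launch an analysis (optimization) run and read back the footprint report.
run = requests.post(f"{BASE_URL}/analyze", headers=headers,
                    json={"model": "model.h5", "optimization": "balanced"})
run.raise_for_status()
print(run.json())  # for example, MACC, flash, and RAM figures
```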

3. Logging in

To start using the tool, navigate to the home page of the STM32Cube.AI Developer Cloud at https://stm32ai-cs.st.com/. The welcome page, as shown below, should appear:

STM32Cube.AI Developer Cloud welcome page


When the page has fully loaded, click the "START NOW" button. This redirects you to the sign-in page, as shown below:

STM32Cube.AI Developer Cloud login page


  • If you have a myST account, type your credentials and click the "Login" button.
  • If you do not currently have an account, create one by clicking the "Create Account" button and filling out the required form. This process is entirely free.
Information: Using STM32Cube.AI Developer Cloud requires a working internet connection. A stable connection is recommended to avoid losing any data during the process.

4. Creating a project

After a successful login, you are directed to the main page. This page presents three main zones, as shown below:

STM32Cube.AI Developer Cloud home page


4.1. Uploading a model

In the first zone, you can upload any pretrained AI model by clicking the "Upload" button and selecting the file in the file explorer, or by dragging and dropping the file.

4.2. Importing from STM32 model zoo

The second zone shows a list of pretrained AI models, which can be used as a starting point for various use cases. These currently include five categories:

  • Hand posture
  • Image classification
  • Human activity recognition
  • Audio event detection
  • Object detection

These models originate from the STM32 model zoo.
To create a new project, click the "Import" button located next to the desired model.

4.3. Using saved models

The third zone functions as a workspace. This area contains all of the AI models that you have previously analyzed and benchmarked in STM32Cube.AI Developer Cloud.

4.4. Starting the project

In this article, we use a .h5 model from the model zoo, which was created using the TensorFlow Keras API. To use the model, follow these steps:

  1. Scroll down the model zoo list and locate the MOBILENET_V2_0.35_128_IMAGE_CLASSIFICATION_PERSON.H5 model. Once located, click the "Import" button adjacent to it.
    STM32Cube.AI Developer Cloud: import model from model zoo

    The model now appears in the workspace.
  2. Press the "Play" button to compute the Netron graph. When the analysis is complete, click the "Netron" icon to open a popup window displaying the model architecture in Netron. This view lets you inspect all layers and components of the model. To exit this view, click the "cross" button in the top-right corner. (If you prefer to inspect a model locally, see the sketch after these steps.)
    STM32Cube.AI Developer Cloud: start new project
  3. Click the "Start" button to create a new project.
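
As a side note, the same architecture view is available offline, since the Netron viewer can also be run locally as a Python package. A minimal sketch, assuming the netron package is installed and using a hypothetical local filename:

```python
import netron  # pip install netron

# Serve the model in a local browser tab (the filename is hypothetical).
netron.start("mobilenet_v2_0.35_128_person.h5")
```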

After the project has been created, five action items become available in the top bar:

STM32Cube.AI Developer Cloud: five action items
  • Optimize: optimize the model with different options.
  • Quantize: quantize the float model using the post-training quantization.
  • Benchmark: benchmark the AI model on different STM32 boards from our board farm.
  • Results: view, analyze, and compare the results generated from different benchmarking runs.
  • Generate: generate the code and the projects for the target MCU family and board for the optimized and benchmarked model.

The following chapters of this page provide more details about these action items.

In this section, you can also view the details of the currently selected model. This includes relevant information, such as input and output shape, type, and MACC numbers, among other data.
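
You can also cross-check this shape information locally before uploading. A minimal sketch using the standard Keras API, with a hypothetical local filename:

```python
import tensorflow as tf

# Load the model and print the same input/output shape information
# that the Developer Cloud header displays (filename is hypothetical).
model = tf.keras.models.load_model("mobilenet_v2_0.35_128_person.h5")
model.summary()
print("input shape:", model.input_shape)    # for example, (None, 128, 128, 3)
print("output shape:", model.output_shape)
```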

STM32Cube.AI Developer Cloud: current selected model
Information: This information is also displayed in the section of each of the other action items.


5. Action: optimize

In the optimize section, a default optimization is conducted using the balanced optimization option. This enables the "use activation buffer for input buffer" and "use activation buffer for output buffer" options. You can modify the default settings and click "Optimize" to observe the impact.

5.1. Optimization options

There are three distinct options available for optimizing AI models:

  • Balance between RAM size and inference time: this approach seeks to find a tradeoff between minimum RAM and the shortest inference time.
  • Optimize for RAM size: this approach seeks to optimize the RAM size.
  • Optimize for inference time: this approach seeks to optimize the inference time.
STM32Cube.AI Developer Cloud: optimize options
Information: This optimization is based on a dataset-less approach, meaning that no training, validation, or test dataset is required to apply the compression and optimization algorithms. For further information, refer to UM2526.

By default, the balanced option is selected to provide users with the best compromise between the smallest footprint and the shortest inference time possible.

5.2. Use activation buffer for input/output buffer

When these options are enabled, the “activations” buffer is also used to handle the input/output buffers. This impacts memory but not inference time. Depending on the size of the input/output data, the “activations” buffer may grow, but it remains smaller overall than the sum of a separate activation buffer plus input/output buffers.
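
As a purely illustrative example: if the activations alone require 80 Kbytes and the input buffer 20 Kbytes, enabling the option might grow the shared “activations” buffer to about 90 Kbytes instead of allocating 80 + 20 = 100 Kbytes as separate buffers. The exact savings depend on the model and are reported by the optimize step.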

Information: For more details about these options, refer to UM2526.

5.3. Launch the optimization

To initiate the optimization process, select the desired options and click the "Optimize" button. While the optimization is running, you can monitor the terminal, which provides details regarding the optimization run. Any errors are displayed here.

STM32Cube.AI Developer Cloud: terminal

Multiple optimizations can be launched by selecting different options, allowing you to choose the option that best suits your needs. After each optimization operation, the terminal displays the reported numbers in terms of MACC, flash size, and RAM size.

STM32Cube.AI Developer Cloud: optimize results

When all optimizations have been conducted, you can select the option that best suits your needs. In this example, we use the balanced approach. From here, you can proceed to the next action item. If you are using a float model, the next step is quantization. However, if you are using a quantized model, you can skip quantization and proceed directly to benchmarking.

To start the quantization step, click the "Go to quantize" button.
To skip the quantization step and proceed directly to benchmarking, click the "Go to benchmark" button.

6. Quantize

This panel can be used to create a quantized model (8-bit integer format) from a Keras float model. Quantization is an optimization technique that compresses a 32-bit floating-point model by reducing the size of its weights (smaller storage size and lower peak memory usage at runtime) and improving CPU/MCU usage and latency (including power consumption), with a possible degradation of accuracy. A quantized model executes some or all of its operations on tensors with integers rather than floating-point values.
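
For reference, the 8-bit quantization scheme used by TensorFlow Lite is affine: each real value r is represented by an 8-bit integer q such that r ≈ S × (q − Z), where S is a floating-point scale and Z an integer zero point, stored per tensor or per channel.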

The quantize service uses the TensorFlow post-training quantization interface offered by TFLiteConverter. You may select any input or output type from the three supported options: int8, uint8, or float32. To limit accuracy loss during the quantization process, users are advised to provide their training dataset, or a portion thereof, in the form of .npz files. If no quantization file is provided, the quantization is conducted with random data, and the resulting quantized model can only be used for benchmarks to obtain the required flash and RAM sizes and the inference time; no accuracy is calculated.
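
The cloud service runs this conversion for you, but the underlying TFLiteConverter flow can be reproduced locally. A minimal sketch, assuming a Keras .h5 model and a calibration file saved with np.savez (all filenames and array names are hypothetical):

```python
import numpy as np
import tensorflow as tf

# Load the float Keras model and set up post-training quantization.
model = tf.keras.models.load_model("model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Calibration data, for example saved earlier with:
#   np.savez("calibration.npz", x=x_train[:200])
samples = np.load("calibration.npz")["x"]

def representative_dataset():
    # Yield one sample at a time, with a batch dimension, as float32.
    for sample in samples:
        yield [sample[np.newaxis, ...].astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8    # int8, uint8, or float32
converter.inference_output_type = tf.int8

with open("model_quant.tflite", "wb") as f:
    f.write(converter.convert())
```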

Information: Data protection: please note that all user data and models are stored encrypted and protected in the Microsoft Azure Cloud Service. ST has no access to them. For more information, refer to the Data Protection section of this article.


Once an .npz file of the dataset has been provided, quantization can be performed by pressing the "Launch quantization" button. For the purposes of this demonstration, we conduct the quantization without a dataset, using random numbers instead.

STM32Cube.AI Developer Cloud: launch quantization


Once the quantization has been completed, the quantized model is listed below. You can press the "Optimize selected quantized model" button and compare the results with the float model. During the optimization process, the output is also displayed in the terminal, as seen in the previous step.

STM32Cube.AI Developer Cloud quantized model


After the optimization of the quantized model, the results are displayed in the "History of optimization results" table. In our example, an approximately 70% reduction in both flash and RAM size should be observed, which is consistent with the weights and activations shrinking from 32-bit floats to 8-bit integers:

STM32Cube.AI Developer Cloud: quantization results


Once satisfied with the quantization results, you can press the "Go to benchmark" button. This will direct you to the model benchmarking step.

7. Benchmark

In the Benchmark panel, you can find the currently selected model, as well as the current parameters used for optimization.

STM32Cube.AI Developer Cloud: benchmark header


The benchmark service lets users remotely run selected AI models on several STM32 boards and obtain the internal or external flash and RAM usage, as well as the inference time. These STM32 boards are hosted on ST premises in what is known as the "board farm" and are accessible via a waiting queue. Basic information about each board is listed, including CPU type and frequency, as well as internal and external memory sizes.

STM32Cube.AI Developer Cloud: benchmark boards


To initiate the benchmark on a given board, simply press the "Start Benchmark" button next to that board. This will start the benchmarking process. For the purposes of this article, we will choose the STM32H735G-DK.

Information: If the required memory size of the selected model exceeds the available memory on the board, the "Start Benchmark" button is grayed out.


During the benchmarking process, users can observe the progress bar and current status in real time. A system performance application project is created. The tool builds the project with the C code of the AI model and flashes one of the actual physical boards in our board farm. It runs the application on the board and provides users with the memory footprint and inference time. Note that the inference times reported here are measured on real physical boards, not the result of an emulation.

Once the benchmark has been completed, the measured inference time is reported next to the board.

STM32Cube.AI Developer Cloud - benchmark - inference time


To obtain further details regarding the resources used per layer, simply click on the "three-dot" icon and choose the "Show details per layer" option. A dialog box with the corresponding bar chart will appear.

STM32Cube.AI Developer Cloud: benchmark - show details


The bar chart displays the size of each of the layers in bytes. Users may toggle between the bar chart and the pie chart by clicking the "Toggle Pie/Bar Chart" button. While the bar chart displays the actual size, the pie chart displays the distribution.

STM32Cube.AI Developer Cloud: benchmark - toggle charts


Users can perform the same action for the execution time. The bar chart displays the duration in milliseconds spent on each layer, while the pie chart displays the distribution of the execution time across the layers. To close this view, simply click anywhere outside the dialog.

The tool enables users to launch benchmarks on multiple boards simultaneously. You can launch the benchmarking on all boards by pressing all the "Start Benchmark" buttons; this repeats the process and reports the inference time next to each board. Once all launched benchmarking runs have been completed, you can go to the Results section.

Information: It is not necessary to run the benchmarking on all boards. If satisfied with one of the benchmarks, you can directly click the "three-dot" icon and choose "Generate code for this board". This skips the "Results" step and proceeds directly to the "Generate" step.


8. Results

The Results page shows a table of all benchmarks conducted in STM32Cube.AI Developer Cloud, making it easy to compare them. Users can directly select a benchmark for code generation by pressing the "Generate with this configuration" button. For this demonstration, we choose the STM32H735G-DK, which provides the shortest inference time.

STM32Cube.AI Developer Cloud - Results


9. Generate

The Generate page offers various outputs that cater to different needs. These include updating an existing project, creating a new STM32CubeMX project, generating an STM32CubeIDE project with all the sources included, or downloading a compiled firmware to estimate the inference time on your own board.

Upon reaching the "Generate" step, users are presented with the same header showing the currently selected model and parameters. If coming from the Benchmark or Results step, the target board is already selected. Otherwise, users need to filter by CPU type or STM32 series to select their target STM32 board.

STM32Cube.AI Developer Cloud: Generate - select board


Following this, users will have four generation options to choose from:

  • Download C Code
  • Download STM32CubeMX IOC file
  • Download STM32CubeIDE Project
  • Download Firmware
STM32Cube.AI Developer Cloud: Generate - 4 types


9.1. Download C code

This option generates the STM32-optimized C code associated with your pretrained neural network. The ZIP package contains all the network .c and .h files, the Cube.AI runtime library files, and the stm32ai command-line reports and outputs. Users can replace the existing files in their project with these newly generated files.

9.2. Download STM32CubeMX IOC file

This option generates a ZIP package containing an IOC file and the selected model, which is ready to start a new STM32CubeMX project locally on your machine. Users can open the IOC file with STM32CubeMX v6.6.1 or above and either directly generate the code or add other peripherals required for their application.

9.3. Download STM32CubeIDE Project

This option generates a ZIP package with an STM32CubeIDE project including the IOC file, project file, file tree, and STM32-optimized C code:

  • The board is the selected board.
  • X-CUBE-AI is activated in the project and the SystemPerformance application is selected.
  • The neural network model is configured in the project and is available in the ZIP package.
  • The USART needed for the system performance application is already configured.
  • The STM32CubeIDE project and all the code and libraries are generated.

Users can open the project with STM32CubeIDE 1.10.0 or above by double-clicking the .project file in the package. Once the project is imported into STM32CubeIDE, users can either edit the code or directly compile and flash the program on their own board.

9.4. Download Firmware

This option generates the ELF file(s) associated with the selected board, which can be directly flashed onto the user's own board using STM32CubeProgrammer. The system performance application is enabled in this firmware.
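
STM32CubeProgrammer also ships a command-line interface, so flashing the downloaded ELF can be scripted. A minimal sketch; the STM32_Programmer_CLI options shown are assumptions and should be verified against the STM32CubeProgrammer documentation for your version:

```python
import subprocess

# Flash the downloaded ELF over ST-LINK (SWD), verify, and reset the board.
# Assumes STM32_Programmer_CLI is on the PATH; the filename is hypothetical
# and the flags should be checked against the STM32CubeProgrammer manual.
subprocess.run(
    ["STM32_Programmer_CLI", "-c", "port=SWD",
     "-w", "network_firmware.elf", "-v", "-rst"],
    check=True,
)
```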

The AI system performance application is a self-contained, bare-metal, on-device application that allows out-of-the-box measurement of the critical system integration aspects of the generated neural network. Accuracy is not, and cannot be, measured here. The reported measurements include CPU cycles per inference (duration in ms, CPU cycles, CPU workload) and used stack and heap (in bytes).
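
Since the system performance application typically reports its measurements over the configured USART (see the STM32CubeIDE option above), they can be captured from the host. A minimal sketch using the pyserial package; the port name and baud rate are assumptions, so check the generated project for the actual UART settings:

```python
import serial  # pip install pyserial

# Read the system performance report from the board's virtual COM port.
# "/dev/ttyACM0" and 115200 baud are assumptions; adjust to your setup.
with serial.Serial("/dev/ttyACM0", 115200, timeout=5) as port:
    while True:
        line = port.readline().decode(errors="replace").rstrip()
        if line:
            print(line)
```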

10. Other information

10.1. Project cannot be saved

Please note that, in certain circumstances, your progress in the working project may be lost and cannot be retrieved, for example, if the web browser is accidentally closed or the internet connection is lost. If this happens, you must start over.

10.2. Data Protection

STM32Cube.AI Developer Cloud is deployed on the Microsoft Azure Cloud Service.

10.2.1. Customer External access and data upload

  • External access to the service always goes through a firewall, a load balancer, and a route dispatcher. All accesses are performed using encrypted, secure HTTPS.
  • All users are authenticated using my.st.com authentication.
  • There is no direct access to the internal Azure services nor uploaded resources.
  • Uploaded data are checked for malicious content.

10.2.2. Model and Data storage

  • Uploaded models are stored in an Azure storage service and are accessible only by the user who uploaded the model, and by the stm32ai microservices and the benchmark farm for the purposes of their services.
  • Models are automatically deleted after 6 months of inactivity.
  • Uploaded data is kept only for the time of the action.
  • Access to the storage is only allowed through private endpoints that are not visible outside the DMZ.
Information: For more information about the deployment architecture and data protection, refer to this online documentation.


10.3. Embedded Documentation

STM32Cube.AI Developer Cloud: documentation and information bubbles


Users can open the online documentation at any time by clicking the "Open documentation" button embedded in the page, or by clicking the "Documentation" icon at the top of the page to open it in a new tab or window. To close the documentation, simply press the "Close documentation" button.

Some pages also provide "information bubbles". By clicking a bubble, users go directly to the corresponding documentation page for that topic.

10.4. Step-by-step video tutorial on YouTube

ST has also created a step-by-step video guide for STM32Cube.AI Developer Cloud. You can find it on the STMicroelectronics official YouTube channel: Getting started with STM32Cube.AI Developer Cloud

11. References

  • UM2526: Getting started with X-CUBE-AI Expansion Package for Artificial Intelligence (AI)