How to transcode a video using hardware acceleration

Applicable for STM32MP25x lines

1. Introduction

Transcoding a video consists of decoding the video content and re-encoding it in a target compression format, optionally applying a video filter in between, such as framerate adaptation, rescaling, a color effect or any other pixel processing.

The article V4L2 video codec overview explains how to decode and encode video using hardware acceleration. This article focuses on video filters and on making the best use of hardware resources such as the CPU and GPU.


2. GStreamer transcode how-to

Here is the topology of a typical GStreamer transcoding pipeline using gst-launch:

 <H264 source> !
    \_ Any source providing H264 video bitstream content
       (decodebin, uridecodebin, v4l2src, ...)

 videorate ! video/x-raw, framerate=<target framerate>/1 !
    \_ Drop input frames until <target framerate> is reached;
       this saves processing/encoding CPU/GPU resources

 videoscale ! video/x-raw, width=<target width>, height=<target height> !
    \_ Software scaling to <target width>/<target height>
        OR
 glupload ! glcolorscale ! gldownload ! video/x-raw, width=<target width>, height=<target height> !
    \_ Hardware scaling to <target width>/<target height> using GPU

 v4l2slh264enc ! h264parse !
    \_ H264 hardware encoding to H264 bitstream using VENC peripheral

 qtmux ! filesink location=<MP4 file>
    \_ Muxing to MP4 video content and save to local disk

 sync=true
    \_ Synchronize filesink on the pipeline clock (pipeline framerate).
       Here, the pipeline framerate is set by the videorate src caps, so
       the whole scaling and encoding chain runs at exactly this rate
       instead of 'as fast as possible' ("sync=false" is the filesink default),
       which reduces the CPU/GPU load.

 -e
   \_ Send EOS (End Of Stream) when the command is interrupted (CTRL+C for example).
      This allows qtmux to close the file cleanly and
      generate a valid playable file; otherwise the file is seen as corrupted.

 | grep ended 2>&1
    \_ Filter out all traces, including the pipeline time counter.
       This keeps the CPU/GPU load readable on the terminal
       while keeping the final "Execution ended after" line, used to compute the transcoding framerate
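
The topology above can be assembled into a single command line. The following is a minimal sketch under stated assumptions, not a reference implementation: build_transcode_cmd is a hypothetical helper name, input.mp4 and output.mp4 are placeholder file names, and the H264 source is assumed to be an MP4 file read through filesrc, qtdemux and v4l2slh264dec as in the examples of this article. The helper only prints the command so that it can be reviewed before being run on the target. The software branch adds the videoconvert to NV12 step used in the software scaling examples.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a transcoding command line
# that follows the topology described above.
# usage: build_transcode_cmd <input> <output> <width> <height> <fps> <sw|gpu>
build_transcode_cmd() {
    in=$1; out=$2; width=$3; height=$4; fps=$5; scaler=$6

    if [ "$scaler" = "gpu" ]; then
        # Hardware scaling on the GPU
        scale="glupload ! glcolorscale ! gldownload ! video/x-raw, width=$width, height=$height"
    else
        # Software scaling, then conversion to NV12 before the encoder
        scale="videoscale ! video/x-raw, width=$width, height=$height ! videoconvert ! video/x-raw, format=NV12"
    fi

    printf '%s\n' "gst-launch-1.0 filesrc location=$in ! qtdemux ! h264parse ! v4l2slh264dec ! videorate ! video/x-raw, framerate=$fps/1 ! $scale ! v4l2slh264enc ! h264parse ! qtmux ! filesink sync=true location=$out -e"
}

# Example: GPU downscale to 960x520 at 15 fps
build_transcode_cmd input.mp4 output.mp4 960 520 15 gpu
```

The queue elements used in the examples of this article are left out of the sketch for readability; they can be inserted between elements in the same way.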


3. Examples

Information
Do not have big_buck_bunny_720p_H264_AAC_25fps_3400K_short.mp4? Download it from here.

First, execute the following commands on the target to display the CPU and GPU load periodically:


(while true; do \
gpu_last_gc=$(cat /sys/kernel/debug/gc/idle); \
m=$(mpstat 1 1 | grep "Average:     all" | awk -F" " '{print "cpu load " $3 "%"}'); \
echo "-------------------------"; \
gpu_on=$(echo $gpu_last_gc | tr -d '\n' | tr -d ',' | awk -F" " '{printf("%f\n", $2)}'); \
gpu_start=$(echo $gpu_last_gc | tr -d '\n' | tr -d ',' | awk -F" " '{printf("%f\n", ($2+$5+$8+$11))}'); \
tr -d '\n' < /sys/kernel/debug/gc/idle | tr -d ',' | awk -v on=$gpu_on -v start=$gpu_start -F" " '{printf ("gpu load  %.0f%%\n", ($2 - on) * 100/($2+$4+$6+$8 - start));}'; \
echo $m; \
done) &

More details on these commands can be found in the How to monitor the GCNANO GPU load and mpstat articles.
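
The GPU load computation in the loop above can be hard to follow on one line. The sketch below isolates it: the GPU load is the increase of the "On" counter divided by the increase of the sum of all counters between two snapshots. It assumes, as the field indexes of the loop suggest, that /sys/kernel/debug/gc/idle reports four cumulative counters (On, Off, Idle, Suspend) in nanoseconds with comma thousands separators. The function names gpu_counters and gpu_load and the sample values are hypothetical.

```shell
#!/bin/sh
# Print "<on> <total>" in nanoseconds for one snapshot of the idle file.
# Assumed line format: "<Name>: <value with commas> ns"
gpu_counters() {
    printf '%s\n' "$1" | tr -d ',' | awk '
        $1 == "On:"     { on = $2 }
        $2 ~ /^[0-9]+$/ { total += $2 }
        END             { printf "%.0f %.0f\n", on, total }'
}

# gpu_load <first snapshot> <second snapshot>
# Load = time spent "On" between the snapshots / total elapsed time.
gpu_load() {
    first=$(gpu_counters "$1")
    second=$(gpu_counters "$2")
    awk -v a="$first" -v b="$second" 'BEGIN {
        split(a, f, " "); split(b, s, " ")
        d_on = s[1] - f[1]
        d_total = s[2] - f[2]
        if (d_total > 0) printf "gpu load %.0f%%\n", d_on * 100 / d_total
        else print "gpu load n/a"
    }'
}

# Hypothetical snapshots taken one second apart
before='On:       1,000,000,000 ns
Off:      5,000,000,000 ns
Idle:                 0 ns
Suspend:  2,000,000,000 ns'
after='On:       1,250,000,000 ns
Off:      5,000,000,000 ns
Idle:                 0 ns
Suspend:  2,750,000,000 ns'

gpu_load "$before" "$after"   # prints "gpu load 25%"
```

On the target, the two snapshots would be taken with cat /sys/kernel/debug/gc/idle before and after the measurement interval.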


3.1. Rescale video

Transcode a 720p 25 fps stream to 960x520 as fast as possible with software downscaling using the GStreamer videoscale element:

gst-launch-1.0 filesrc location= big_buck_bunny_720p_H264_AAC_25fps_3400K_short.MP4 ! queue name=qtdemux ! qtdemux ! queue name=dparse ! h264parse ! queue name=h264dec ! v4l2slh264dec ! queue name=vscale ! videoscale ! video/x-raw, width=960, height=520 ! queue name=vconv ! videoconvert ! video/x-raw, format=NV12 ! queue name=h264enc ! v4l2slh264enc ! queue name=eparse ! h264parse ! queue name=qtmux ! qtmux ! queue name=filesink ! filesink location=720p_to_960x520_30fps.mp4 -e | grep ended 2>&1
gpu load 0%
cpu load 50.44%
Execution ended after 0:00:13.765451424

Transcoding occurs at 7.3 fps (100 frames in 13.7 s) with 50% CPU load. The GPU is not used.
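
The transcoding framerate is the number of frames divided by the duration reported on the "Execution ended after" line. The sketch below automates that arithmetic. It assumes the 100-frame clip mentioned above; transcode_fps is a hypothetical helper name.

```shell
#!/bin/sh
# transcode_fps <frame count> <"Execution ended after H:MM:SS.nnnnnnnnn" line>
transcode_fps() {
    frames=$1
    # Keep only the last word of the line, i.e. the H:MM:SS.ns duration
    duration=${2##* }
    printf '%s\n' "$duration" | awk -F: -v frames="$frames" '{
        seconds = $1 * 3600 + $2 * 60 + $3
        if (seconds > 0) printf "%.1f\n", frames / seconds
        else print "n/a"
    }'
}

transcode_fps 100 "Execution ended after 0:00:13.765451424"   # prints 7.3
```

The same helper can be applied to the final line of any of the other examples in this article.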


3.2. Decrease video framerate

Transcode a 720p 25 fps stream to a 960x520 15 fps stream as fast as possible, with software downscaling and framerate reduction using the GStreamer videorate element:

gst-launch-1.0 filesrc location= big_buck_bunny_720p_H264_AAC_25fps_3400K_short.MP4 ! queue name=qtdemux ! qtdemux ! queue name=dparse ! h264parse ! queue name=h264dec ! v4l2slh264dec ! videorate ! video/x-raw, framerate=15/1 ! queue name=vscale ! videoscale ! video/x-raw, width=960, height=520 ! queue name=vconv ! videoconvert ! video/x-raw, format=NV12 ! queue name=h264enc ! v4l2slh264enc ! queue name=eparse ! h264parse ! queue name=qtmux ! qtmux ! queue name=filesink ! filesink location=720p_to_960x520_10fps.mp4 -e | grep ended 2>&1
gpu load 0%
cpu load 51.22%
Execution ended after 0:00:08.192601067

Transcoding occurs faster, at 12.2 fps with 51% CPU load. The GPU is not used.

This shows how limiting the framerate increases the transcoding speed.


3.3. Rescale video using GPU

Transcode a 720p 25 fps stream to a 960x520 stream as fast as possible with GPU downscaling using the GStreamer glcolorscale element (the pipeline contains no videorate element, so the framerate is unchanged):

gst-launch-1.0 filesrc location= big_buck_bunny_720p_H264_AAC_25fps_3400K_short.MP4 ! queue name=qtdemux ! qtdemux ! queue name=dparse ! h264parse ! queue name=h264dec ! v4l2slh264dec ! queue name=glup ! glupload ! queue name=glscale ! glcolorscale ! queue name=gldown ! gldownload ! video/x-raw, width=960, height=520 ! queue name=h264enc ! v4l2slh264enc ! queue name=eparse ! h264parse ! queue name=qtmux ! qtmux ! queue name=filesink ! filesink location=720p_to_960x520_30fps.mp4 -e | grep ended 2>&1
gpu load 26%
cpu load 34.2%
Execution ended after 0:00:02.579300742

Transcoding occurs faster, at 38.9 fps with 34% CPU load and 26% GPU load.

Replacing software scaling with GPU scaling multiplies the transcoding rate by about five compared with section 3.1 (38.9 fps versus 7.3 fps).


3.4. Rescale video using GPU while decreasing framerate

Transcode a 720p 25 fps stream to a 960x520 15 fps stream as fast as possible using GPU downscaling:

gst-launch-1.0 filesrc location= big_buck_bunny_720p_H264_AAC_25fps_3400K_short.MP4 ! queue name=qtdemux ! qtdemux ! queue name=dparse ! h264parse ! queue name=h264dec ! v4l2slh264dec ! videorate ! video/x-raw, framerate=15/1 ! queue name=glup ! glupload ! queue name=glscale ! glcolorscale ! queue name=gldown ! gldownload ! video/x-raw, width=960, height=520 ! queue name=h264enc ! v4l2slh264enc ! queue name=eparse ! h264parse ! queue name=qtmux ! qtmux ! queue name=filesink ! filesink location=720p_to_960x520_10fps.mp4 -e | grep ended 2>&1
gpu load 24%
cpu load 39%
Execution ended after 0:00:01.447157184

Transcoding is even faster at 69.4 fps with 39% CPU load and 24% GPU load.

As seen before, reducing the framerate increases the transcoding speed, at the cost of a lower output framerate.

3.5. Rescale video using GPU in real-time

Transcode a 720p 25 fps stream to a 960x520 25 fps stream using GPU downscaling, limiting the transcoding rate to 25 fps with the sync=true filesink property, which locks the transcoding pipeline rate to the source framerate:

gst-launch-1.0 filesrc location= big_buck_bunny_720p_H264_AAC_25fps_3400K_short.MP4 ! queue name=qtdemux ! qtdemux ! queue name=dparse ! h264parse ! queue name=h264dec ! v4l2slh264dec ! queue name=glup ! glupload ! queue name=glscale ! glcolorscale ! queue name=gldown ! gldownload ! video/x-raw, width=960, height=520 ! queue name=h264enc ! v4l2slh264enc ! queue name=eparse ! h264parse ! queue name=qtmux ! qtmux ! queue name=filesink ! filesink sync=true location=720p_to_960x520_10fps.mp4 -e | grep ended 2>&1
gpu load 28%
cpu load 38.97%
Execution ended after 0:00:03.901553199

The transcoding rate is slowed down to the 25 fps framerate of the input video, which keeps the CPU and GPU load low while preserving the input framerate.

This also shows the CPU and GPU load required for real-time live transcoding.
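
A rough way to judge whether a pipeline can sustain real-time transcoding is to compare the playback duration of the clip (frames divided by source framerate) with the time the pipeline takes when run as fast as possible. The sketch below computes that ratio. It assumes the 100-frame, 25 fps clip used in this article; realtime_factor is a hypothetical helper name. A value above 1 suggests headroom for live transcoding, a value below 1 suggests the pipeline cannot keep up.

```shell
#!/bin/sh
# realtime_factor <frame count> <source fps> <elapsed seconds, free-running>
realtime_factor() {
    awk -v frames="$1" -v fps="$2" -v elapsed="$3" 'BEGIN {
        if (fps + 0 <= 0 || elapsed + 0 <= 0) { print "n/a"; exit }
        # Clip playback duration divided by measured transcoding duration
        printf "%.2f\n", (frames / fps) / elapsed
    }'
}

realtime_factor 100 25 13.765451424   # section 3.1, software scaling: prints 0.29
realtime_factor 100 25 2.579300742    # section 3.3, GPU scaling: prints 1.55
```

This is only an indicator: it does not account for the latency or jitter of a real live source.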