Blade 3 YOLOv5 TensorRT benchmark

Following this benchmark (YOLOv5 TensorRT Benchmark for NVIDIA® Jetson™ AGX Xavier™ and NVIDIA® Laptop), I tested the popular YOLOv5 on the Blade 3 I have at hand to see how it performs on the RK3588 chip.


Briefly, RKNN-Toolkit is a software development kit that provides model conversion, inference, and performance evaluation on PC and Rockchip NPU platforms (RK1808/RK1806/RK3399Pro/RV1109/RV1126). GitHub address: GitHub - rockchip-linux/rknn-toolkit. The RK3588 is not on that list; the steps below use the rknpu2 repository for it. Thanks to airockchip and shaoshengsong for sharing the trained YOLOv5 model.
The Blade 3 runs Debian 11.

First, build the YOLOv5 demo on the Blade 3:

  1. Install adb on the Debian system. It is easy to install, and tutorials are easy to find online. Alternatively, connect a keyboard to the Blade 3 and use its terminal directly (switch to root with ‘sudo su’).
  2. Run adb shell to log in to the Blade 3.
  3. Install the necessary packages on the Blade 3. (If your sources.list mirror is slow, you can switch to another one; search online for how to change sources.list in Debian.)
apt update
apt install gcc cmake git build-essential
  4. Download the NPU demo onto the Blade 3 (source: GitHub - rockchip-linux/rknpu2; this test uses the YOLOv5 small model).
cd /data
git clone https://github.com/rockchip-linux/rknpu2.git
  5. Enter the YOLOv5 demo directory.
cd /data/rknpu2/examples/rknn_yolov5_demo
  6. Use vi to edit build-linux_RK3588.sh. Change GCC_COMPILER and LD_LIBRARY_PATH as follows.
set -e

TARGET_SOC="rk3588"
GCC_COMPILER=/usr/bin/aarch64-linux-gnu

export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:$LD_LIBRARY_PATH
export CC=${GCC_COMPILER}-gcc
export CXX=${GCC_COMPILER}-g++

ROOT_PWD=$( cd "$( dirname $0 )" && cd -P "$( dirname "$SOURCE" )" && pwd )

# build
BUILD_DIR=${ROOT_PWD}/build/build_linux_aarch64

if [[ ! -d "${BUILD_DIR}" ]]; then
  mkdir -p ${BUILD_DIR}
fi

cd ${BUILD_DIR}
cmake ../.. -DCMAKE_SYSTEM_NAME=Linux -DTARGET_SOC=${TARGET_SOC}
make -j4
make install
cd -
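
Before building, it can help to confirm that the compiler prefix set in GCC_COMPILER really resolves to a gcc and g++ binary. The following is a minimal sketch; check_toolchain is a hypothetical helper name, not something shipped with rknpu2.

```shell
#!/usr/bin/env bash
# check_toolchain PREFIX
# Hypothetical helper: verifies that PREFIX-gcc and PREFIX-g++ exist and are
# executable, which is what the build script expects for CC and CXX.
check_toolchain() {
  local prefix=$1 tool missing=0
  for tool in gcc g++; do
    if [ ! -x "${prefix}-${tool}" ]; then
      echo "missing: ${prefix}-${tool}"
      missing=1
    fi
  done
  return "$missing"
}

# Same prefix as GCC_COMPILER in the build script above
check_toolchain /usr/bin/aarch64-linux-gnu || echo "fix GCC_COMPILER before building"
```

If either binary is reported missing, installing build-essential (step 3) or adjusting the prefix should resolve it.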
  7. Build the demo.
./build-linux_RK3588.sh
  8. Run the demo.
cd install/rknn_yolov5_demo_Linux
./rknn_yolov5_demo ./model/RK3588/yolov5s-640-640.rknn ./model/bus.jpg 

Output results:

img width = 640, img height = 640
Loading mode...
sdk version: 1.4.0 (a10f100eb@2022-09-09T09:07:14) driver version: 0.7.2
model input num: 1, output num: 3
  index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
  index=0, name=output, n_dims=5, dims=[1, 3, 85, 80], n_elems=1632000, size=1632000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=77, scale=0.080445
  index=1, name=371, n_dims=5, dims=[1, 3, 85, 40], n_elems=408000, size=408000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=56, scale=0.080794
  index=2, name=390, n_dims=5, dims=[1, 3, 85, 20], n_elems=102000, size=102000, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=69, scale=0.081305
model is NHWC input fmt
model input height=640, width=640, channel=3
once run use 40.653000 ms
loadLabelName ./model/coco_80_labels_list.txt
person @ (114 235 212 527) 0.819099
person @ (210 242 284 509) 0.814970
person @ (479 235 561 520) 0.790311
bus @ (99 141 557 445) 0.693320
person @ (78 338 122 520) 0.404960
loop count = 10 , average run  34.209300 ms

The output image is saved as out.jpg.
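
To compare with benchmarks that report frames per second, the average latency printed by the demo can be converted with 1000 / ms. A small sketch, assuming the log format shown above; avg_ms_to_fps is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# avg_ms_to_fps: hypothetical helper, reads demo output on stdin and converts
# the "average run" latency (milliseconds) into frames per second.
avg_ms_to_fps() {
  # The value sits just before the trailing "ms" field, e.g.
  # "loop count = 10 , average run  34.209300 ms"
  awk '/average run/ { ms = $(NF-1); printf "%.2f\n", 1000 / ms }'
}

# Example using the line from the run above
echo "loop count = 10 , average run  34.209300 ms" | avg_ms_to_fps
```

For the run above this gives roughly 29 FPS for inference alone; image loading and post-processing are not included in that figure.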

  9. Download the test images for the benchmark.
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1nZzd4jOM6XyVmne_BtmrmHGHGfrsBstP' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1nZzd4jOM6XyVmne_BtmrmHGHGfrsBstP" -O coco_calib.zip && rm -rf /tmp/cookies.txt
unzip coco_calib.zip 
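
One way to run the demo over every extracted image and summarize the results is sketched below. It assumes the images are .jpg files in a single directory and that the demo prints the "average run" and "label @ (box) score" lines shown in the output above; bench_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# bench_dir DEMO MODEL IMAGE_DIR
# Hypothetical helper: runs DEMO once per .jpg in IMAGE_DIR and reports the
# number of images, the total number of detected objects, and the mean of the
# per-image "average run" latencies in milliseconds.
bench_dir() {
  local demo=$1 model=$2 dir=$3 img
  for img in "$dir"/*.jpg; do
    [ -e "$img" ] || continue          # glob matched nothing
    "$demo" "$model" "$img" 2>/dev/null
  done | awk '
    /average run/ { total += $(NF-1); n++ }   # latency of one image, in ms
    / @ \(/       { objects++ }               # one line per detected object
    END {
      if (n == 0) { print "no images processed"; exit 1 }
      printf "images=%d objects=%d avg_ms=%.2f\n", n, objects, total / n
    }'
}

# Usage, from install/rknn_yolov5_demo_Linux (the image directory name is an
# assumption; use whatever directory unzip created):
#   bench_dir ./rknn_yolov5_demo ./model/RK3588/yolov5s-640-640.rknn /data/coco_calib
```

Note that each run of the demo overwrites out.jpg, so copy it elsewhere inside the loop if you want to keep the annotated images.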

Without further ado, let’s see how many objects are identified in the photos, along with the average processing time per image: