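Benchmarking Alibaba's MNN inference engine on the rk3399 with CPU and GPU: is OpenCL underperforming?
Background
MNN is Alibaba's open-source inference engine. In this post I run its benchmark on the rk3399 platform to see how it performs.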
alibaba/MNN: MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba (github.com)
First, clone the repository
git clone git@github.com:alibaba/MNN.git
Create the build directory
cd MNN
mkdir build
cd build
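CMake configuration
Pay attention to the cross compiler and to how the OpenCL library is loaded: either link against the system OpenCL library, or use the wrapper that loads it with dlopen.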
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DMNN_BUILD_DEMO=ON \
-DMNN_BUILD_BENCHMARK=true \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_VERSION=1 \
-DCMAKE_SYSTEM_PROCESSOR=aarch64 \
-DMNN_OPENCL=ON \
-DMNN_USE_SYSTEM_LIB=ON \
-DCMAKE_C_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-gcc \
-DCMAKE_CXX_COMPILER=${cross_compile_toolchain}/bin/aarch64-linux-gnu-g++
make -j32
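The two OpenCL loading modes mentioned above differ by a single flag. The helper below is a minimal sketch that prints the flags for each mode. The function name opencl_cmake_flags is hypothetical, and it assumes that MNN_USE_SYSTEM_LIB=OFF is what selects the dlopen wrapper; check the MNN CMake options for your version before relying on it.

```shell
# Hypothetical helper: print the CMake flags for one of the two OpenCL loading modes.
#   system : link against the OpenCL library provided by the system/toolchain
#   dlopen : let the wrapper load the library at run time
#            (assumption: this corresponds to MNN_USE_SYSTEM_LIB=OFF)
opencl_cmake_flags() {
    case "$1" in
        system) echo "-DMNN_OPENCL=ON -DMNN_USE_SYSTEM_LIB=ON" ;;
        dlopen) echo "-DMNN_OPENCL=ON -DMNN_USE_SYSTEM_LIB=OFF" ;;
        *) echo "usage: opencl_cmake_flags system|dlopen" >&2; return 1 ;;
    esac
}
```

It could be spliced into the configure step as cmake .. $(opencl_cmake_flags system) followed by the remaining options.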
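Deployment
Next, put libMNN.so and benchmark.out from the build directory together with the benchmark models from the parent directory. libMNN.so also has to be copied into the rk3399's lib directory: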
sudo cp libMNN.so /lib
sudo cp -rf ../benchmark/models .
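Before running anything, it can help to confirm that the three pieces are where the benchmark expects them. This is only a sketch: check_deploy is a hypothetical helper, and it assumes the models live in a models/ directory next to benchmark.out, as in the benchmark commands used in this post.

```shell
# Hypothetical pre-flight check.
#   $1 = directory holding benchmark.out and models/
#   $2 = directory where libMNN.so was installed (defaults to /lib, as in the text)
check_deploy() {
    run_dir=$1 lib_dir=${2:-/lib} status=0
    [ -f "$run_dir/benchmark.out" ] || { echo "missing: $run_dir/benchmark.out" >&2; status=1; }
    [ -f "$lib_dir/libMNN.so" ] || { echo "missing: $lib_dir/libMNN.so" >&2; status=1; }
    # At least one .mnn model must be present under models/.
    ls "$run_dir"/models/*.mnn >/dev/null 2>&1 || { echo "no .mnn files in $run_dir/models" >&2; status=1; }
    return $status
}
```

Running check_deploy . on the board returns a non-zero status and names whatever is missing.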
Then run the benchmark. The second argument is the number of test loops, and the fourth argument selects the backend: 0 means CPU, 3 means OpenCL.
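The positional arguments are easy to mix up, so here is a small sketch that builds the command line from a backend name. Both function names are hypothetical. Treating the third argument as the warmup count is an assumption, based on the "loop = 1, warmup = 0" line that the tool prints for the arguments 1 0.

```shell
# Map a backend name to the forward-type number passed as the fourth argument.
# Only the two values stated in the text are covered: 0 = CPU, 3 = OpenCL.
forward_type() {
    case "$1" in
        cpu) echo 0 ;;
        opencl) echo 3 ;;
        *) echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Print the benchmark command line: models dir, loop count, warmup count, forward type.
# Loop count defaults to 1 and warmup to 0, matching the runs in this post.
benchmark_cmd() {
    backend=$1 loops=${2:-1} warmup=${3:-0}
    ft=$(forward_type "$backend") || return 1
    echo "./benchmark.out models/ $loops $warmup $ft"
}
```

For example, benchmark_cmd opencl 10 3 prints a command that runs ten loops after three warmup iterations on the OpenCL backend.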
CPU test
firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 0
MNN benchmark
Forward type: CPU thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
[ - ] SqueezeNetV1.0.mnn max = 86.128 ms min = 86.128 ms avg = 86.128 ms
[ - ] MobileNetV2_224.mnn max = 42.041 ms min = 42.041 ms avg = 42.041 ms
[ - ] inception-v3.mnn max = 505.111 ms min = 505.111 ms avg = 505.111 ms
[ - ] mobilenetV3.mnn max = 13.533 ms min = 13.533 ms avg = 13.533 ms
[ - ] nasnet.mnn max = 145.489 ms min = 145.489 ms avg = 145.489 ms
[ - ] mobilenet-v1-1.0.mnn max = 66.624 ms min = 66.624 ms avg = 66.624 ms
[ - ] squeezenetv1.1.mnn max = 40.437 ms min = 40.437 ms avg = 40.437 ms
[ - ] resnet-v2-50.mnn max = 308.836 ms min = 308.836 ms avg = 308.836 ms
GPU test
firefly@firefly:~/MNN$ sudo ./benchmark.out models/ 1 0 3
MNN benchmark
Forward type: OpenCL thread=4 precision=2 sparsity=0 sparseBlockOC=1 testQuantizedModel=0
--------> Benchmarking... loop = 1, warmup = 0
[-INFO-]: precision=2, use fp16 inference if your device supports and open MNN_ARM82=ON.
The device support i8sdot:0, support fp16:0, support i8mm: 0
arm_release_ver of this libmali is 'r18p0-01rel0', rk_so_ver is '4'.[ - ] SqueezeNetV1.0.mnn max = 159.619 ms min = 159.619 ms avg = 159.619 ms
[ - ] MobileNetV2_224.mnn max = 126.671 ms min = 126.671 ms avg = 126.671 ms
[ - ] inception-v3.mnn max = 800.436 ms min = 800.436 ms avg = 800.436 ms
[ - ] mobilenetV3.mnn max = 61.661 ms min = 61.661 ms avg = 61.661 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] nasnet.mnn max = 140.189 ms min = 140.189 ms avg = 140.189 ms
[ - ] mobilenet-v1-1.0.mnn max = 98.918 ms min = 98.918 ms avg = 98.918 ms
[ - ] squeezenetv1.1.mnn max = 121.158 ms min = 121.158 ms avg = 121.158 ms
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
Map error scalePtrCL == nullptr
Map error biasPtrCL == nullptr
[ - ] resnet-v2-50.mnn max = 428.075 ms min = 428.075 ms avg = 428.075 ms
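To compare the two runs without reading the logs by eye, the averages can be pulled out with awk. This is a sketch, not part of MNN: compare_backends is a hypothetical helper, and it assumes each run was saved to its own log file (for example by redirecting the benchmark output to cpu.log and gpu.log).

```shell
# Hypothetical helper: $1 = CPU log, $2 = GPU log.
# For every model present in both logs, print the two averages and the GPU/CPU ratio.
compare_backends() {
    awk '
        /avg =/ {
            model = ""; avg = ""
            # Scan the fields instead of using fixed positions, because a driver
            # message can be glued to the front of a result line.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /\.mnn$/) model = $i
                if ($i == "avg") avg = $(i + 2)
            }
            if (model == "" || avg == "") next
            if (FNR == NR) cpu[model] = avg          # first file: remember CPU average
            else if (model in cpu)                   # second file: compare against it
                printf "%s cpu=%.3f gpu=%.3f gpu/cpu=%.2f\n", model, cpu[model], avg, avg / cpu[model]
        }' "$1" "$2"
}
```

A ratio above 1 means the GPU run was slower. The number of operator errors in the GPU log can be counted separately with grep -c 'Map error' gpu.log.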
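Conclusion
As the results show, the GPU (OpenCL) path on the rk3399 is slow and also has operator problems: it is slower than the CPU on every model except nasnet, and the log is full of "Map error scalePtrCL == nullptr" and "Map error biasPtrCL == nullptr" messages. Note that each figure comes from a single run with no warmup (loop = 1, warmup = 0), so the numbers are only a rough indication. By contrast, when I tested OpenCL on the rk3568 it ran smoothly with no such problems. I'll leave this as an open question to investigate later.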