Host machine: desktop PC running Ubuntu 20.04
Development board: A1000 (flashed with SDK v2.3.1.2)
Model conversion container: bsnn-tools-container-stk-4.2.0
Compilation container: a1000b-sdk-fad-2.3.1.2
YOLOv5 project used: the "yolov5 float" project, which Black Sesame modified from https://github.com/ultralytics/yolov5 (yolov5 float can be downloaded from the Black Sesame documentation site)
I. YOLOv5 Training, ONNX Conversion, and Validation
The Black Sesame A1000 currently only supports models trained in floating point, and ReLU runs more efficiently on the A1000 than other activations. The bst_yolov5 project inside yolov5 float is therefore based on https://github.com/ultralytics/yolov5 but changes the training setup and the activation function, and adds opt optimizations. Compare the two codebases yourself for the exact changes.
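The activation swap is the core change: stock YOLOv5 builds its Conv blocks with nn.SiLU, while the bst variant trains with ReLU. A minimal PyTorch sketch of the idea (an illustration only, not the actual bst_yolov5 patch, which presumably changes the default activation in the model definition so training uses ReLU from the start):

import torch.nn as nn

def silu_to_relu(model: nn.Module) -> nn.Module:
    # Recursively replace every SiLU activation with ReLU,
    # which maps onto the A1000 NPU more efficiently.
    for name, child in model.named_children():
        if isinstance(child, nn.SiLU):
            setattr(model, name, nn.ReLU(inplace=True))
        else:
            silu_to_relu(child)
    return model

Note that swapping activations on an already-trained SiLU checkpoint would cost accuracy; the point is to train with ReLU from the beginning, as the modified project does.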
1. Training dataset
The coco128 dataset is already present under bst_yolov5/yolov5/datasets.
2. Environment setup
See requirements.txt. I reused a previously configured conda yolov8 environment and additionally installed IPython and the ONNX-related packages (using the Tsinghua mirror to speed up downloads):
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple IPython
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorboard
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple onnx
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple onnx-simplifier
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple onnx2torch
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple onnxruntime
3. Model training
coco128.yaml lives in data/; I copied it out and updated the dataset paths inside it. Then run in a terminal:
python train.py --data coco128.yaml --weights yolov5m.pt --img 640 --epochs 300
Copy best.pt from the weights directory into bst_yolov5/yolov5 and rename it yolov5m_relu_0921.pt.
Note: if training fails with AttributeError: 'FreeTypeFont' object has no attribute 'getsize', it is because newer Pillow releases removed getsize; downgrading to Pillow 9.5 fixes it:
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple Pillow==9.5
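Alternatively, if you would rather patch the plotting code than pin Pillow, getbbox is the replacement API (a hedged sketch; exactly where YOLOv5 calls getsize varies by version):

from PIL import ImageFont

font = ImageFont.load_default()  # stand-in for the font YOLOv5 actually loads
# Pillow < 10:   w, h = font.getsize("label")   # removed in Pillow 10
# Pillow >= 10:  derive the size from the bounding box instead
left, top, right, bottom = font.getbbox("label")
w, h = right - left, bottom - top
print(w, h)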
4. ONNX export
The export script differs somewhat from export.py in https://github.com/ultralytics/yolov5; compare them yourself for the details.
Run:
python export.py --weights yolov5m_relu_0921.pt --include=onnx
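Before handing the model to the toolchain, it is worth a quick sanity check with onnx/onnxruntime (installed above). A small sketch; the 1x3x640x640 input shape is what the export settings above should produce:

import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("yolov5m_relu_0921.onnx")
onnx.checker.check_model(model)          # structural validation

sess = ort.InferenceSession("yolov5m_relu_0921.onnx")
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)               # expect something like 1x3x640x640

dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {inp.name: dummy})
print([o.shape for o in outputs])        # detection head output shapes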
5. Validate and test the ONNX model
1) Validation
Run:
python val.py --weights yolov5m_relu_0921.onnx --data coco128.yaml --img 640
Results:
2) Detection test
Run:
python detect.py --weights yolov5m_relu_0921.onnx --source test.jpg
II. Toolchain Model Conversion and Quantization Testing
1. Prepare the files the toolchain needs for model conversion
1) inputs_jpg (the folder of calibration images)
2) image_process_config.json
{"source_image_format": "RGB","target_image_format": "RGB","mean": [0,0,0],"scale": [0.00390625,0.00390625,0.00390625]
}
3) run.yaml
# Target engine must be set.
device_engine: A1000B0
# Tag your mode name.
model_name: yolov5s_relu_20240918
# Path to your float onnx model.
model_path: /workspace/models/yolov5_bstnnx/yolov5m_relu_0921.onnx
# Folder of calibration data.
input_data_set_path: /workspace/models/yolov5_bstnnx/inputs_jpg
# Setup data reader.
data_reader_method: image_folder_data_reader
# How to calculate "scale" and "mean" in image_process_config, let's assume X' = (X-mean)/std
# mean = mean; scale = np.floor(std/2**-12 + 0.5) * 2**-12 -> std=1/255 -> scale = 0.00390625
image_process_config: /workspace/models/yolov5_bstnnx/image_process_config.json
# Number of calibration data used in quantization.
size_limit: 500
# Enable auto_ptq
auto_ptq: /workspace/models/yolov5_bstnnx/task_info_for_autoptq.yaml
# Setup result_dir
result_dir: auto_ptq_092101
# Setup priority range
priority_range: 100-1200
# General stage settings
stage:
  - stage_name: pre_processing_stage
    priority: 100
  - stage_name: graph_optimization_stage
    priority: 200
  - stage_name: quantization_stage
    priority: 300
  - stage_name: graph_partition_stage
    priority: 400
  - stage_name: section_binding_stage
    priority: 500
  - stage_name: code_generation_stage
    priority: 600
  - stage_name: code_compilation_stage
    priority: 700
  - stage_name: run_emulation_stage
    profiling_mode: 2
    priority: 800
  - stage_name: hardware_testing_stage
    XTSC_NET_SIM: True
    generate_rbf_only: True
    priority: 1100
  - stage_name: report_generation_stage
    priority: 1200
  - stage_name: userland_stage
    priority: 1500
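The scale in image_process_config.json follows the formula in the run.yaml comment above: the toolchain rounds scale to a multiple of 2^-12, so std = 1/255 lands on exactly 0.00390625 = 16/4096. A quick check:

import numpy as np

std = 1 / 255
scale = np.floor(std / 2**-12 + 0.5) * 2**-12
print(scale)   # 0.00390625, i.e. 16 * 2**-12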
4) task_info_for_autoptq.yaml
task_settings:
  quantization_method: ["weight_perchannel_kl"]
  convert_int8conv_to_int16: [[]]
  # convert_int8conv_to_int16: [[], ["/model.24/m.0/Conv", "/model.24/m.1/Conv", "/model.24/m.2/Conv"]]
  unadjustable_activations_last_n: [-1, 0, 1]
  calibration_method: ["minmax", "kl", "percentile_0.999"]
  # calibration_method: ["minmax", "kl", "percentile_0.999", "percentile_0.998", "percentile_0.997", "percentile_0.996", "percentile_0.995"]
  auto_update: false  # default to False,
  # if set to True it will align the current task_order
  # with current task_settings and append the new tasks
  # to the queue of task orders.
# task_order_schema:
#   ["index", "quantization_method", "convert_int8conv_to_int16",
#    "unadjustable_activations_last_n", "calibration_method"]
5) yolov5m_relu_0921.onnx
Place all of the above in a newly created yolov5_bstnnx folder.
2. Copy yolov5_bstnnx from the host into the model conversion container
Start the bsnn-tools-container-stk-4.2.0 container and copy yolov5_bstnnx into it:
docker cp /home/stk/bst_yolov5/yolov5_bstnnx bsnn-tools-container-stk-4.2.0:/workspace/models/
Log in to the container's Jupyter; yolov5_bstnnx is now in place.
Remember to update the paths inside run.yaml to match.
3. Run the model conversion
Inside the container, run:
bstnnx_run --config models/yolov5_bstnnx/run.yaml
Copy the result /workspace/auto_ptq_092101/1100_HardwareTestingStage/yolov5s_relu_20240918.20240921032730.hw_test_config to the host path /home/stk/heizhima:
docker cp bsnn-tools-container-stk-4.2.0:/workspace/auto_ptq_092101/1100_HardwareTestingStage/yolov5s_relu_20240918.20240921032730.hw_test_config /home/stk/heizhima/
4. On-board quantization test
Boot the dev board and push yolov5s_relu_20240918.20240921032730.hw_test_config from the host to the board:
adb push /home/stk/heizhima/yolov5s_relu_20240918.20240921032730.hw_test_config /home/root
On the board, run:
cd /home/root/yolov5s_relu_20240918.20240921032730.hw_test_config
./run_dsp.sh
The test passes.
III. Building a CMake Project Around the Converted Model
1. First, look at the CMake demo provided by Black Sesame
From the BSNN toolchain section of the Black Sesame documentation site, download yolov5_demo_opencv from the demo code under Demos / yolov5 / linux_yolov5_demo. Its directory layout:
In it:
1) The 3rdparty and usr folders hold the bst-bsnn libraries and related files;
2) yolov5model holds the three files (weights.bin, .meta, .lib) obtained from the toolchain's 1100_HardwareTestingStage .hw_test_config output;
3) src contains the detection program;
4) start_yolov5m.sh is the launch script used later on the dev board.
2. Create your own CMake project
Create a new yolov5demo folder, copy 3rdparty, usr, src, and CMakeLists.txt straight over from the demo, and add a new yolov5s_model folder:
Copy weights.bin from the earlier yolov5s_relu_20240918.20240921032730.hw_test_config, together with the .meta and .lib files from its fw_integration subfolder, into yolov5demo/yolov5s_model and rename them:
3. Modify your CMake project
Adapt CMakeLists.txt and the code in src to your own needs. Here I only made minor tweaks to CMakeLists.txt:
I modified src/main.cpp: since no display is attached to my board, I commented out the OpenCV display code and added saving of the detection result image:
#include <iostream>
#include <opencv2/opencv.hpp>
#include "bsnn_model_load.h"
#include "drmshow.h"
#include "image_process.h"
#include <string.h>

using namespace std;

int main(int argc, char* argv[])
{
    // std::string in_image_path = "./000000000139.jpg";
    std::string in_image_path = "./fad_video.avi";
    std::string bsnn_model_path = "./model";
    cv::Mat img, input_image;
    // asic_type_check();
    if (2 <= argc) { in_image_path = argv[1]; }
    if (3 <= argc) { bsnn_model_path = argv[2]; }
    cv::VideoCapture capture(in_image_path);
    BSNN_MODEL bsnn_model(bsnn_model_path);
    Timer time, fps;
    while (true)
    {
        // step 1: image preprocessing
        capture >> img;
        if (img.empty()) {
            printf("img.empty = %d\n", img.empty());
            capture.release();
            capture.open(in_image_path);  // loop the input from the start
            continue;
        }
        time.reset();
        fps.reset();
        preprocess(img, input_image);
        size_t input_img_len = 3 * IMG_HEIGHT * IMG_WIDTH;
        // rearrange the image in memory from interleaved HWC to planar CHW
        if (!input_image.isContinuous())
            printf("-> input image is not continuous in memory...");
        uchar input_buf[input_img_len];
        memset(input_buf, 0, input_img_len);  // a VLA cannot take a {0} initializer
        for (int c = 0; c < 3; c++) {
            for (int i = 0; i < IMG_HEIGHT; i++) {
                for (int j = 0; j < IMG_WIDTH; j++) {
                    input_buf[c * IMG_HEIGHT * IMG_WIDTH + i * IMG_WIDTH + j] =
                        input_image.data[3 * (i * IMG_WIDTH + j) + c];
                }
            }
        }
        cout << "-> preprocess time : " << time.elapsed() << endl;
        // step 2: model inference
        time.reset();
        bsnn_model.Run(input_buf, input_img_len);
        auto bsnn_output = bsnn_model.GetModelOutput();
        printf("-> lite engine inference FPS: %.2f\n", 1.0 / time.elapsed() * 1000);
        cout << "-> bsnn model inference time : " << time.elapsed() << endl;
        // step 3: post-processing
        time.reset();
        std::vector<ObjInfo> result;
        process_output(bsnn_output.get(), result);
        // cout << "-> The number of objects detected: " << result.size() << endl;
        cout << "-> post process time : " << time.elapsed() << endl;
        draw_bboxes(img, result);
        cv::imwrite("./inferresult.jpg", img);  // save the result instead of displaying it
        bsnn_model.ReleaseOutputBuffer();
        // printf("-> full flow FPS: %.2f\n", 1.0 / fps.elapsed() * 1000);
        cv::Mat show_img;
        cv::resize(img, show_img, cv::Size(1920, 1080));
        // cv::imshow("img", show_img);
        // cv::waitKey(1);
    }
    return 0;
}
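For reference, the triple loop above is just an interleaved-to-planar (HWC to CHW) transpose. The equivalent preprocessing in Python/NumPy (a sketch; it assumes preprocess() amounts to a resize to the model input size, whereas the real function may also letterbox or convert color):

import cv2

IMG_WIDTH, IMG_HEIGHT = 640, 640   # assumed model input size

img = cv2.imread("test.jpg")
resized = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))
chw = resized.transpose(2, 0, 1).copy()   # HWC -> CHW, matching input_buf's layout
print(chw.shape)                          # (3, 640, 640)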
4. Build the project
1) Copy
Stop the model conversion container, start the a1000b-sdk-fad-2.3.1.2 container, and copy yolov5demo into it:
docker cp yolov5demo a1000b-sdk-fad-2.3.1.2:opt/bstos/2.3.1.2/sysroots/aarch64-bst-linux/usr/include/src
The container now contains yolov5demo:
cd /opt/bstos/2.3.1.2/sysroots/aarch64-bst-linux/usr/include/src
ls  # check that yolov5demo is present
2) Compile
Inside the container:
cd yolov5demo
mkdir build
cd build
cmake ..
make
IV. Running on the Dev Board
1. Copy the built yolov5demo from the container to the host:
docker cp a1000b-sdk-fad-2.3.1.2:opt/bstos/2.3.1.2/sysroots/aarch64-bst-linux/usr/include/src/yolov5demo /home/stk/heizhima
2. Assemble the runtime files
Create a new yolov5demo_run folder inside yolov5demo, and inside it:
1) Create an app folder
Copy the yolov5s executable from yolov5demo's build folder into app:
2) Create a yolov5s folder
Copy the three files from yolov5demo's yolov5s_model folder into the yolov5s folder:
3) Create a datasets folder
This folder holds the images or videos to run detection on; I put in a few of each:
4) Create a folder for detection results if you need one
In yolov5demo/src/main.cpp I save the detection output straight to the current working directory, so no folder is needed.
5) Create the launch script
Create start_yolov5s.sh with the content below (the first four lines are supplied by Black Sesame):
#!/bin/sh
mkdir -p /run/user/1000
export XDG_RUNTIME_DIR="/run/user/1000"
echo 0 > /sys/devices/platform/vsp@1/enable && weston --tty=1 &
./app/yolov5s ./datasets/fad_quick_start_video_01.avi ./yolov5s
The final yolov5demo_run folder:
3. Run
1) Push the files under yolov5demo's usr/lib64 folder to /usr/lib/ on the dev board:
adb push libbsnn.so /usr/lib/
adb push libbsnn.so.3 /usr/lib/
2) Push the yolov5demo_run folder from the host to the board:
adb push /home/stk/heizhima/yolov5demo/yolov5demo_run /userdata
3) Run:
cd /userdata/yolov5demo_run
chmod +x start_yolov5s.sh
./start_yolov5s.sh
While it is running, pull the result image back to the host at any time to inspect it:
adb pull /userdata/yolov5demo_run/inferresult.jpg /home/stk/heizhima