Orange Pi AIpro Development Experience: Object Detection on USB Camera Frames with YOLOv8

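Preface
I. Hardware preparation for the Orange Pi AIpro
II. Connecting to the Orange Pi AIpro
- 1. Connect the Orange Pi AIpro to the router with a network cable
- 2. Connect the Orange Pi AIpro over Wi-Fi
- 3. Connect to the Orange Pi AIpro from VS Code over SSH
III. USB camera test
- 1. Set up the remote ipynb development environment
  - 1.1 Create a video.ipynb file
  - 1.2 Install the Jupyter and Python extensions on the remote host
- 2. Take a photo from the USB camera with OpenCV
- 3. Display a live stream from the USB camera with OpenCV
IV. Object detection with YOLOv8
- 1. YOLOv8 inference on the CPU with torch
- 2. ONNX model inference with OpenCV
  - 2.1 Export the YOLOv8 ONNX model
  - 2.2 ONNX inference
- 3. YOLOv8 inference on the NPU
  - 3.1 Convert the ONNX model to an OM model
  - 3.2 Add swap space
  - 3.3 NPU inference
V. Summary
VI. References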

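Preface

YOLOv8 is a recent object detection model, and its accuracy, speed and ease of use have made it the first choice of many developers. The Orange Pi AIpro is a high-performance embedded development board built on the Ascend AI technology stack. It integrates a GPU, comes with 8 GB or 16 GB of LPDDR4X memory, and delivers 8 or 20 TOPS of AI compute, which gives AI applications a solid hardware foundation. This article shares hands-on experience of running object detection on a USB camera feed with the Orange Pi AIpro and YOLOv8, and looks at its value in practical applications.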

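I. Hardware preparation for the Orange Pi AIpro

Prepare an Orange Pi AIpro development board, a USB camera, a power adapter, a network cable, and a micro SD card pre-flashed with an Ubuntu system image.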

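II. Connecting to the Orange Pi AIpro

1. Connect the Orange Pi AIpro to the router with a network cable

To give the Orange Pi AIpro a stable network connection, we plug it directly into the router with a network cable. We then run an IP scanner on the PC to scan the local network. In this setup the scan found the device "orangepiaipro" at IP address 192.168.1.7.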
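
Once the scanner reports an address, a quick way to confirm that the board accepts SSH connections is to probe TCP port 22. The following is a minimal sketch using only the Python standard library. The helper name is_port_open and the 2-second timeout are illustrative choices, not part of any Orange Pi tooling.

```python
import socket


def is_port_open(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, is_port_open("192.168.1.7") can be called from the PC with the address found by the scan. A False result usually means the board is still booting, the address has changed, or the SSH service is not running.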

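2. Connect the Orange Pi AIpro over Wi-Fi

After logging in to the Orange Pi AIpro, Wi-Fi can be set up as follows.
Scan for Wi-Fi networks:

sudo nmcli dev wifi

Connect to a network (replace wifi_name and wifi_password with the actual network name and password):

sudo nmcli dev wifi connect wifi_name password wifi_password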

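3. Connect to the Orange Pi AIpro from VS Code over SSH

Tip: with VS Code, remote development on the Orange Pi AIpro feels just like local development.
Install the following VS Code extensions:
1. Remote - SSH
2. Remote - SSH: Editing Configuration Files
3. Remote Explorer

Create an SSH connection. The default user name is HwHiAiUser and the default password is Mind@123.

ssh HwHiAiUser@192.168.1.7

Once connected, open the Desktop folder (select the Desktop path in the folder picker) and do the development work there.

Also open a terminal in VS Code.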

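III. USB camera test

1. Set up the remote ipynb development environment

1.1 Create a video.ipynb file

Save the new file in the Desktop folder. VS Code keeps the files in this directory in sync, which makes development convenient.

1.2 Install the Jupyter and Python extensions on the remote host

Install these two main extensions. The extensions they depend on are installed automatically.

Then open video.ipynb and select the Python version to use.
Python 3.10.12 is the system default Python.
base (Python 3.9.2) is the base Python of the Anaconda installation.
We should use a conda environment, preferably a newly created one, to avoid dependency conflicts.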

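2. Take a photo from the USB camera with OpenCV

Reading from the camera may fail because the current user does not have permission to access the device.

The direct fix is to loosen the permissions on the camera device so that the current user can access it:

sudo chmod 666 /dev/video0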
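
Before opening the camera in the notebook, it can help to check whether the device node exists and is accessible. The sketch below uses only the standard library. The function name check_device_access is hypothetical, and /dev/video0 is assumed to be the camera node as in the command above.

```python
import os


def check_device_access(path: str = "/dev/video0") -> str:
    """Describe whether the current user can read and write a device node."""
    if not os.path.exists(path):
        return "missing"            # camera not plugged in or a different index
    if os.access(path, os.R_OK | os.W_OK):
        return "ok"                 # OpenCV should be able to open the device
    return "permission denied"      # the chmod step is still needed


if __name__ == "__main__":
    print(check_device_access())
```

If the result is "missing", the camera may be registered under another node such as /dev/video1, in which case the index passed to cv2.VideoCapture has to change accordingly.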

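Write the following code in video.ipynb to grab one frame and show it directly in the notebook:

import cv2
from IPython.display import display, Image

# Open the default camera and request MJPG frames
camera = cv2.VideoCapture(0)
camera.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))
if not camera.isOpened():
    raise IOError("Cannot open the webcam")

# Grab a single frame, then release the device
ret, frame = camera.read()
camera.release()
if not ret:
    raise IOError("Cannot capture an image")

# Encode the frame as JPEG and show it in the notebook output
display(Image(data=cv2.imencode('.jpg', frame)[1].tobytes()))

The captured frame appears in the notebook output.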

3. Display a live stream from the USB camera with OpenCV

Write the following Python code in video.ipynb to show the camera feed continuously:

import cv2
from IPython.display import display, clear_output, Image

# Initialize the camera (0 is the default camera)
camera = cv2.VideoCapture(0)
if not camera.isOpened():
    raise IOError("Cannot open the webcam")
# Request MJPG frames if the camera supports them
camera.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))

try:
    while True:
        # Capture frame by frame
        ret, frame = camera.read()
        if not ret:
            raise IOError("Cannot capture frame")
        # Replace the previous output with the newly captured frame
        clear_output(wait=True)
        display(Image(data=cv2.imencode('.jpg', frame)[1].tobytes()))
finally:
    # Release the camera when the loop ends, for example on interrupt
    camera.release()

The USB camera used here has a low frame rate, so moving objects leave a trail, but the live view still feels quite responsive.
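
To put a number on the frame rate instead of judging it by eye, a small rolling counter can be added to the capture loop. This is a minimal sketch with the standard library only. The class name FpsMeter and the 30-frame window are assumptions made for illustration.

```python
from collections import deque
from time import perf_counter
from typing import Optional


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30):
        # Only the most recent timestamps are kept
        self.stamps = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame and return the current estimate (0.0 until two frames)."""
        self.stamps.append(perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        # N timestamps span N - 1 frame intervals
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

In the loop above, creating meter = FpsMeter() before the while statement and calling meter.tick() right after camera.read() gives the measured capture rate, which can be printed or drawn on the frame.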

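IV. Object detection with YOLOv8

1. YOLOv8 inference on the CPU with torch

This test uses YOLOv8 version 8.2. First copy the ultralytics folder from the YOLOv8 repository to the Desktop of the Orange Pi AIpro.

Then write the following code in video.ipynb to call the YOLOv8 library for inference: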

import cv2
from time import time
from IPython.display import display, clear_output, Image
from ultralytics import YOLO

# Load the pretrained YOLOv8n model
model = YOLO('yolov8n.pt')

# Initialize the camera (0 is the default camera)
camera = cv2.VideoCapture(0)
if not camera.isOpened():
    raise IOError("Cannot open the webcam")
# camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1280.0)
# camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 720.0)
# Request MJPG frames if the camera supports them
camera.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))

try:
    while True:
        # Capture frame by frame
        ret, frame = camera.read()
        if not ret:
            raise IOError("Cannot capture frame")
        # Time one full call to the model, including pre- and post-processing
        s = time()
        results = model(frame, conf=0.25, iou=0.5, verbose=False)
        print(time() - s)
        # Draw the detections of this frame
        im = results[0].plot()
        # Replace the previous output with the annotated frame
        clear_output(wait=True)
        display(Image(data=cv2.imencode('.jpg', im)[1].tobytes()))
finally:
    # Release the camera when the loop ends
    camera.release()

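Calling the library directly, one inference takes about 0.5 s.

When the Orange Pi AIpro runs YOLOv8 on the CPU through torch, inference occupies 2 CPU cores, about 50% of the total CPU. By my estimate, a multi-threaded implementation should bring this down to around 0.2 s per frame, that is, 4-5 frames per second. Memory usage during inference is also modest, so the overall performance is quite good.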
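
One common way to use more than one thread in such a loop is to move camera capture into a background thread, so that inference always works on the newest frame instead of waiting for the camera. The sketch below shows only that capture side, with the standard library. It is not the implementation measured in this article, the class name LatestFrameGrabber is hypothetical, and whether it reaches the 0.2 s estimate has not been tested here.

```python
import threading
from typing import Any, Callable, Optional


class LatestFrameGrabber:
    """Background reader that keeps only the newest result of `read_frame`."""

    def __init__(self, read_frame: Callable[[], Any]):
        self._read_frame = read_frame
        self._lock = threading.Lock()
        self._latest: Optional[Any] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            frame = self._read_frame()   # blocks until a frame is available
            with self._lock:
                self._latest = frame     # overwrite, so stale frames are dropped

    def start(self) -> "LatestFrameGrabber":
        self._thread.start()
        return self

    def latest(self) -> Optional[Any]:
        """Return the most recent frame, or None if nothing was read yet."""
        with self._lock:
            return self._latest

    def stop(self) -> None:
        self._stop.set()
        self._thread.join(timeout=1.0)
```

With OpenCV, read_frame could be lambda: camera.read()[1], and the main loop would call grabber.latest() before each model(frame, ...) call. This overlaps capture with inference but does not make a single inference faster.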

2. ONNX model inference with OpenCV

2.1 Export the YOLOv8 ONNX model
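
One way to produce yolov8n.onnx is the export method of the Ultralytics Python API. The sketch below is an assumption about how the model could be exported, not necessarily the exact steps used for this article. The wrapper name export_yolov8_onnx is hypothetical, and opset 12 is an assumed value chosen for compatibility with the OpenCV DNN module.

```python
def export_yolov8_onnx(weights: str = "yolov8n.pt", imgsz: int = 640, opset: int = 12) -> str:
    """Export Ultralytics YOLOv8 weights to ONNX and return the output path."""
    # Imported lazily so this helper can be defined without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    # export() writes the .onnx file next to the weights and returns its path
    return model.export(format="onnx", imgsz=imgsz, opset=opset)
```

Calling export_yolov8_onnx() in the notebook should leave a yolov8n.onnx file beside yolov8n.pt, which is the file name the inference code loads.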

2.2 ONNX inference

The following Python code uses OpenCV DNN to run the ONNX model on frames read from the USB camera:

import cv2
import cv2.dnn
import numpy as np
from IPython.display import display, clear_output, Image
from time import time
from ultralytics.utils import yaml_load
from ultralytics.utils.checks import check_yaml

# Class names and one random color per class
CLASSES = yaml_load(check_yaml('coco128.yaml'))['names']
colors = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# Load the exported ONNX model with OpenCV DNN
model: cv2.dnn.Net = cv2.dnn.readNetFromONNX("yolov8n.onnx")


def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
    """Draw one labelled box on img in place."""
    label = f'{CLASSES[class_id]} ({confidence:.2f})'
    color = colors[class_id]
    cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2)
    cv2.putText(img, label, (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)


def main(original_image):
    """Run detection on one frame and draw the results on it."""
    height, width, _ = original_image.shape
    # Pad the frame to a square so that resizing to 640x640 keeps the aspect ratio
    length = max(height, width)
    image = np.zeros((length, length, 3), np.uint8)
    image[0:height, 0:width] = original_image
    scale = length / 640

    blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255, size=(640, 640), swapRB=True)
    model.setInput(blob)
    outputs = model.forward()
    # Transpose so that each row holds one candidate: [cx, cy, w, h, class scores...]
    outputs = np.array([cv2.transpose(outputs[0])])
    rows = outputs.shape[1]

    boxes = []
    scores = []
    class_ids = []
    for i in range(rows):
        classes_scores = outputs[0][i][4:]
        (minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)
        if maxScore >= 0.25:
            # Convert center format to [left, top, width, height]
            box = [outputs[0][i][0] - (0.5 * outputs[0][i][2]),
                   outputs[0][i][1] - (0.5 * outputs[0][i][3]),
                   outputs[0][i][2], outputs[0][i][3]]
            boxes.append(box)
            scores.append(maxScore)
            class_ids.append(maxClassIndex)

    # Non-maximum suppression removes overlapping boxes
    result_boxes = cv2.dnn.NMSBoxes(boxes, scores, 0.25, 0.45, 0.5)

    detections = []
    for i in range(len(result_boxes)):
        index = result_boxes[i]
        box = boxes[index]
        detection = {'class_id': class_ids[index],
                     'class_name': CLASSES[class_ids[index]],
                     'confidence': scores[index],
                     'box': box,
                     'scale': scale}
        detections.append(detection)
        # Scale the box back to the original frame and draw it
        draw_bounding_box(original_image, class_ids[index], scores[index],
                          round(box[0] * scale), round(box[1] * scale),
                          round((box[0] + box[2]) * scale), round((box[1] + box[3]) * scale))
    return detections


# Initialize the camera (0 is the default camera)
camera = cv2.VideoCapture(0)
if not camera.isOpened():
    raise IOError("Cannot open the webcam")
# camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1280.0)
# camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 720.0)
# Request MJPG frames if the camera supports them
camera.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))

try:
    while True:
        # Capture frame by frame
        ret, frame = camera.read()
        if not ret:
            raise IOError("Cannot capture frame")
        # Time pre-processing, inference, post-processing and drawing together
        s = time()
        main(frame)
        print(time() - s)
        # Replace the previous output with the annotated frame
        clear_output(wait=True)
        display(Image(data=cv2.imencode('.jpg', frame)[1].tobytes()))
finally:
    # Release the camera when the loop ends
    camera.release()

ONNX inference uses a single CPU core, and one inference takes about 0.7 s.
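
The post-processing in main is easier to follow on a single output row. Each row has the form [cx, cy, w, h, score_0, score_1, ...] in 640x640 model coordinates, and scale is the padded side length divided by 640. The following is a minimal pure-Python sketch of that step. The function name decode_row is hypothetical, and the input in the usage note is made up for illustration.

```python
from typing import Optional, Sequence, Tuple


def decode_row(row: Sequence[float], scale: float,
               conf_thres: float = 0.25) -> Optional[Tuple[int, float, Tuple[int, int, int, int]]]:
    """Turn one row [cx, cy, w, h, score_0, ...] into (class_id, score, box).

    The box is (x1, y1, x2, y2) in original-image pixels. Returns None when
    the best class score is below conf_thres.
    """
    cx, cy, w, h = row[:4]
    scores = row[4:]
    # Index of the highest class score, like cv2.minMaxLoc in the code above
    class_id = max(range(len(scores)), key=scores.__getitem__)
    score = scores[class_id]
    if score < conf_thres:
        return None
    # Center format -> top-left corner, still in model coordinates
    left, top = cx - 0.5 * w, cy - 0.5 * h
    # Multiply by scale to map back to pixels of the padded source frame
    box = (round(left * scale), round(top * scale),
           round((left + w) * scale), round((top + h) * scale))
    return class_id, score, box
```

For a 1280x720 frame the padded side is 1280, so scale is 2.0. A row [320, 320, 100, 50, 0.1, 0.9, 0.2] then decodes to class 1 with score 0.9 and the box (540, 590, 740, 690).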

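3. YOLOv8 inference on the NPU

3.1 Convert the ONNX model to an OM model

Use the ATC tool provided by CANN to convert the ONNX model into an OM model that the Ascend AI processor can load.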

atc --framework=5 --model=yolov8n.onnx --input_format=NCHW --output=yolov8n --soc_version=Ascend310B4

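The atc parameters have the following meanings:
--framework: the source framework type; 5 means ONNX.
--model: path to the ONNX model file.
--input_format: layout of the input data.
--output: path and file name of the offline OM model.
--soc_version: model of the Ascend AI processor.
To find the processor model, run npu-smi info on the device and prefix the value shown under "Name" with Ascend. For example, if "Name" is 310B4, the value to pass to --soc_version is Ascend310B4.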
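
The rule for --soc_version can be captured in a small helper that assembles the same command from the "Name" value. This is a sketch using only the flags listed above. The function name build_atc_command is hypothetical, and it only builds the argument list without running atc.

```python
import shlex
from typing import List


def build_atc_command(onnx_path: str, output_name: str, npu_name: str,
                      input_format: str = "NCHW") -> List[str]:
    """Assemble the atc argument list used in this article.

    npu_name is the "Name" value printed by `npu-smi info`, for example "310B4".
    """
    return [
        "atc",
        "--framework=5",                      # 5 means ONNX
        f"--model={onnx_path}",
        f"--input_format={input_format}",
        f"--output={output_name}",            # output name without the .om suffix
        f"--soc_version=Ascend{npu_name}",    # "Ascend" + Name from npu-smi info
    ]


if __name__ == "__main__":
    cmd = build_atc_command("yolov8n.onnx", "yolov8n", "310B4")
    print(shlex.join(cmd))
```

The printed string matches the command shown earlier. To run the conversion from Python, the list can be passed to subprocess.run(cmd, check=True) on a machine where the CANN environment is set up.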

3.2 Add swap space

If the following error appears, the system has run out of memory, and adding swap space can help:

BrokenPipeError: [Errno 32] Broken pipe
/usr/local/miniconda3/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 97 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

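Switch to the root user:

su -

Create an 8 GB file to use as swap space and restrict its permissions:

dd if=/dev/zero of=/swapfile bs=1M count=8192
chmod 600 /swapfile

Format the file as swap:

mkswap /swapfile

Enable the swap space:

swapon /swapfile

Add an entry to /etc/fstab so that the swap space is enabled automatically at boot:

echo '/swapfile none swap defaults 0 0' >> /etc/fstab

Verify that the swap space is active:

free -m

Monitoring with top shows that the conversion uses roughly 12 GB of memory, so without the added swap space there is not enough memory.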
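
The same check can be done from the notebook by reading the SwapTotal line of /proc/meminfo, which is the kernel source that free reports from. A minimal sketch follows. The function name swap_total_mib is hypothetical.

```python
def swap_total_mib(meminfo_text: str) -> float:
    """Return SwapTotal in MiB from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # The line looks like "SwapTotal:  <number> kB"
            return int(line.split()[1]) / 1024
    raise ValueError("SwapTotal not found")
```

On the board, swap_total_mib(open("/proc/meminfo").read()) should report a value close to 8192 after the steps above, and 0 if swapon did not take effect.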

3.3 NPU inference

The following Python code runs YOLOv8 on the NPU to detect objects in the USB camera feed:

import os

# Verify the path
print(os.environ['LD_LIBRARY_PATH'])

import cv2
import numpy as np
from IPython.display import display, clear_output, Image
from time import time
from ais_bench.infer.interface import InferSession
from ultralytics.utils import yaml_load
from ultralytics.utils.checks import check_yaml

# Class names and one random color per class
CLASSES = yaml_load(check_yaml('coco128.yaml'))['names']
colors = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# Load the converted OM model on NPU device 0
model = InferSession(device_id=0, model_path="yolov8n.om")


def draw_bounding_box(img, class_id, confidence, x, y, x_plus_w, y_plus_h):
    """Draw one labelled box on img in place."""
    label = f'{CLASSES[class_id]} ({confidence:.2f})'
    color = colors[class_id]
    cv2.rectangle(img, (x, y), (x_plus_w, y_plus_h), color, 2)
    cv2.putText(img, label, (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)


def main(original_image):
    """Run detection on one frame and draw the results on it."""
    height, width, _ = original_image.shape
    # Pad the frame to a square so that resizing to 640x640 keeps the aspect ratio
    length = max(height, width)
    image = np.zeros((length, length, 3), np.uint8)
    image[0:height, 0:width] = original_image
    scale = length / 640

    blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255, size=(640, 640), swapRB=True)

    # Time only the NPU inference call
    begin_time = time()
    outputs = model.infer(feeds=blob, mode="static")
    end_time = time()
    print("om infer time:", end_time - begin_time)

    # Transpose so that each row holds one candidate: [cx, cy, w, h, class scores...]
    outputs = np.array([cv2.transpose(outputs[0][0])])
    rows = outputs.shape[1]

    boxes = []
    scores = []
    class_ids = []
    for i in range(rows):
        classes_scores = outputs[0][i][4:]
        (minScore, maxScore, minClassLoc, (x, maxClassIndex)) = cv2.minMaxLoc(classes_scores)
        if maxScore >= 0.25:
            # Convert center format to [left, top, width, height]
            box = [outputs[0][i][0] - (0.5 * outputs[0][i][2]),
                   outputs[0][i][1] - (0.5 * outputs[0][i][3]),
                   outputs[0][i][2], outputs[0][i][3]]
            boxes.append(box)
            scores.append(maxScore)
            class_ids.append(maxClassIndex)

    # Non-maximum suppression removes overlapping boxes
    result_boxes = cv2.dnn.NMSBoxes(boxes, scores, 0.25, 0.45, 0.5)

    detections = []
    for i in range(len(result_boxes)):
        index = result_boxes[i]
        box = boxes[index]
        detection = {'class_id': class_ids[index],
                     'class_name': CLASSES[class_ids[index]],
                     'confidence': scores[index],
                     'box': box,
                     'scale': scale}
        detections.append(detection)
        # Scale the box back to the original frame and draw it
        draw_bounding_box(original_image, class_ids[index], scores[index],
                          round(box[0] * scale), round(box[1] * scale),
                          round((box[0] + box[2]) * scale), round((box[1] + box[3]) * scale))
    return detections


# Initialize the camera (0 is the default camera)
camera = cv2.VideoCapture(0)
if not camera.isOpened():
    raise IOError("Cannot open the webcam")
# camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1280.0)
# camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 720.0)
# Request MJPG frames if the camera supports them
camera.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'))

# Get the width and height of the frames
frame_width = int(camera.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"Frame width: {frame_width}, Frame height: {frame_height}")

# Define the codec and create the VideoWriter object (30.0 is the frame rate)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, 30.0, (frame_width, frame_height))

try:
    _start_time = time()
    # Record annotated frames for 5 seconds
    while time() - _start_time < 5:
        # Capture frame by frame
        ret, frame = camera.read()
        if not ret:
            raise IOError("Cannot capture frame")
        main(frame)
        out.write(frame)
        # Uncomment to show the frame in the notebook instead
        # clear_output(wait=True)
        # display(Image(data=cv2.imencode('.jpg', frame)[1]))
finally:
    # Release the camera and the video file when done
    camera.release()
    out.release()

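Compared with the torch and ONNX runs above, inference on Ascend CANN is a qualitative leap in speed.

With the NPU, YOLOv8 inference on one frame takes only about 0.017 s, roughly 20-30 times faster than on the CPU (0.5 s / 0.017 s is about 29x against the torch run).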
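
The speedup can be worked out directly from the timings reported in this article: 0.5 s for torch on the CPU, 0.7 s for OpenCV DNN with ONNX, and 0.017 s on the NPU. The snippet below is a small worked calculation. The comparison is indicative only, because the NPU code times just the model.infer call while the CPU timings also include pre- and post-processing.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def fps(latency_s: float) -> float:
    """Frames per second achievable at a given per-frame latency."""
    return 1.0 / latency_s


# Per-frame timings reported in this article, in seconds
cpu_timings = {"torch CPU": 0.5, "OpenCV DNN (ONNX)": 0.7}
npu_latency = 0.017

for name, latency in cpu_timings.items():
    print(f"{name}: {speedup(latency, npu_latency):.1f}x slower than NPU")
print(f"NPU inference throughput: {fps(npu_latency):.1f} FPS")
```

This gives about 29.4x against torch, about 41.2x against the ONNX run, and an inference-only throughput of about 58.8 frames per second.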

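V. Summary

Advantages of the Ascend CANN framework:

Much faster inference: with the YOLOv8 model, single-frame inference with Ascend CANN took about 0.017 s, 20-30 times faster than on the CPU. This matters a great deal for applications with strict real-time requirements, such as autonomous driving and security monitoring.
Strong compatibility and extensibility: Ascend CANN supports a variety of models and algorithms, and its performance keeps improving as Ascend hardware is upgraded, which gives developers more options.

Advantages of the Orange Pi AIpro development board:

Strong hardware performance: it runs complex AI algorithms and models smoothly and met my development needs.
Easy to use: with simple setup and configuration, an AI application can be deployed to the board for testing and verification, which greatly improves development efficiency.
Good expandability: the rich set of interfaces makes it easy to connect other hardware, which opens up more possibilities for building more complex AI applications.

Overall, I was impressed by the fast inference, the convenient development experience and the strong expandability.
The Orange Pi AIpro gives developers a good platform for trying out and using the AI inference capabilities of Ascend CANN. I believe that as Ascend CANN and the Orange Pi AIpro continue to develop, they will further promote the application and adoption of AI technology.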