Use the labelme tool and annotate each target with a polygon.
pip install labelme
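Once annotation is finished, each polygon only needs to be converted with OpenCV's minimum-area bounding rectangle (cv2.minAreaRect).
The script below converts labelme polygon annotations into COCO-style rotated-box annotations: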
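To make the geometry concrete, here is a minimal pure-Python sketch of how a rotated rectangle given as center, size, and angle maps to four corner points, which is roughly what `cv2.boxPoints` computes. The function name `rect_corners` is hypothetical, and the corner order is an assumption that may not match OpenCV's.

```python
import math


def rect_corners(center, width, height, angle_deg):
    """Return the four corners of a rectangle rotated by angle_deg about its center."""
    cx, cy = center
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    corners = []
    # Offsets of the corners in the rectangle's own (unrotated) frame.
    for dx, dy in [(-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)]:
        # Rotate the offset, then translate it to the center.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners
```

The eight numbers obtained by flattening these corners are what ends up in the `segmentation` field of each annotation.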
import os
import json

import cv2
import numpy as np


def rotate_rectangle(polygon):
    """Convert a polygon to its minimum-area rotated rectangle.

    :param polygon: list of polygon vertex coordinates
    :return: center point, width, height, and rotation angle of the rectangle
    """
    points = np.array(polygon, dtype=np.float32)
    rect = cv2.minAreaRect(points)
    center, (width, height), angle = rect
    return list(center), width, height, angle


def get_rectangle_points(center, width, height, angle):
    """Compute the four corner points of a rotated rectangle.

    :param center: center point coordinates
    :param width: width
    :param height: height
    :param angle: rotation angle
    :return: the four corners as a flat list [x1, y1, ..., x4, y4]
    """
    rect = ((center[0], center[1]), (width, height), angle)
    box = cv2.boxPoints(rect)
    # np.int0 was removed in NumPy 2.0; np.intp is the same integer type.
    box = box.astype(np.intp)
    return box.flatten().tolist()


def convert_to_coco_format(input_folder, output_file):
    """Convert labelme JSON files to a single COCO-format JSON file.

    :param input_folder: folder containing the labelme JSON files
    :param output_file: path of the COCO-format JSON file to write
    """
    coco_data = {
        "info": {
            "description": "HanXi_locate",
            "url": "",
            "version": "1.0",
            "year": 2017,
            "contributor": "",
            "date_created": "2025-01-09"
        },
        "licenses": [],
        "images": [],
        "annotations": [],
        "categories": []
    }
    category_set = set()  # categories that have already been added
    category_id_map = {}  # maps category name to category ID
    image_id = 1
    annotation_id = 1

    for filename in os.listdir(input_folder):
        if filename.endswith(".json"):
            with open(os.path.join(input_folder, filename), "r", encoding="utf-8") as f:
                data = json.load(f)

            # Add the image entry
            image_info = {
                "id": image_id,
                "width": data["imageWidth"],
                "height": data["imageHeight"],
                "file_name": data["imagePath"],
                "license": 0,
                "flickr_url": "",
                "coco_url": "",
                "date_captured": ""
            }
            coco_data["images"].append(image_info)

            # Add the annotation entries
            for shape in data["shapes"]:
                if shape["shape_type"] == "polygon":
                    polygon = shape["points"]
                    center, width, height, angle = rotate_rectangle(polygon)
                    points = get_rectangle_points(center, width, height, angle)

                    label = shape["label"]
                    if label not in category_set:
                        category_id = len(category_set) + 1
                        category_id_map[label] = category_id
                        category_set.add(label)
                        coco_data["categories"].append(
                            {"supercategory": "none", "id": category_id, "name": label})

                    # Axis-aligned bounding box of the rotated rectangle.
                    # cv2.boundingRect needs int32 or float32 points.
                    x, y, w, h = cv2.boundingRect(
                        np.array(points, dtype=np.int32).reshape((-1, 1, 2)))

                    annotation_info = {
                        "id": annotation_id,
                        "image_id": image_id,
                        "category_id": category_id_map[label],
                        "segmentation": [points],
                        "area": width * height,
                        "bbox": [x, y, w, h],
                        "iscrowd": 0,
                        "ignore": 0
                    }
                    coco_data["annotations"].append(annotation_info)
                    annotation_id += 1

            image_id += 1

    # Save as a COCO-format JSON file
    with open(output_file, "w") as f:
        json.dump(coco_data, f, indent=4)


if __name__ == "__main__":
    input_folder = "./hanxi"
    output_file = "val.json"
    convert_to_coco_format(input_folder, output_file)
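Before training, it can help to sanity-check the generated file. The sketch below is one possible check, assuming the field names written by the conversion script above; the function name `check_rotated_coco` is hypothetical and the rules it applies are illustrative rather than an official COCO validator.

```python
def check_rotated_coco(coco):
    """Return a list of problems found in a COCO dict holding rotated-box polygons."""
    problems = []
    image_ids = {img["id"] for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        # The conversion script stores one polygon of four corners (8 numbers).
        seg = ann.get("segmentation", [])
        if len(seg) != 1 or len(seg[0]) != 8:
            problems.append(f"annotation {ann_id}: expected one 4-point polygon")
        # bbox is [x, y, w, h]; width and height should be positive.
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: degenerate bbox")
    return problems
```

To use it, load `val.json` with `json.load` and pass the resulting dict to the function; an empty list means none of these checks failed.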
Finally, organize the data into the following layout:
hanxi_locate_coco
├── annotations
│ ├── train.json
│ ├── val.json
├── train
│ ├── 1.jpg
│ ├── 5.jpg
│ ├── ...
├── val
│ ├── 2.jpg
│ ├── 7.jpg
│ ├── ...
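One way to arrive at this layout is a seeded random split of the annotated files. The sketch below is only an illustration: the helper names `split_names` and `copy_split`, the 20% validation ratio, and the assumption that each labelme JSON sits next to a `.jpg` with the same stem are all hypothetical.

```python
import random
import shutil
from pathlib import Path


def split_names(names, val_ratio=0.2, seed=0):
    """Shuffle names deterministically and return (train, val) lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_val = max(1, int(len(names) * val_ratio)) if names else 0
    return names[n_val:], names[:n_val]


def copy_split(src_dir, dst_root, val_ratio=0.2, seed=0):
    """Copy images into dst_root/train and dst_root/val; return the stems of each subset."""
    src_dir, dst_root = Path(src_dir), Path(dst_root)
    stems = [p.stem for p in src_dir.glob("*.json")]
    train, val = split_names(stems, val_ratio, seed)
    for subset, subset_stems in (("train", train), ("val", val)):
        out_dir = dst_root / subset
        out_dir.mkdir(parents=True, exist_ok=True)
        for stem in subset_stems:
            image = src_dir / f"{stem}.jpg"  # assumption: images are .jpg files
            if image.exists():
                shutil.copy2(image, out_dir / image.name)
    # train.json and val.json are written here later by the conversion script.
    (dst_root / "annotations").mkdir(parents=True, exist_ok=True)
    return train, val
```

The returned stem lists tell you which labelme JSON files belong to each subset, so each group can be run through the conversion script separately to produce `train.json` and `val.json`.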
This article uses the PaddleDetection toolkit for training. For installation details, refer to the official PaddleDetection documentation:
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python setup.py install
cd ppdet/ext_op
python setup.py install
We choose the lightweight PP-YOLOE-R algorithm. The configuration below is adapted from PaddleDetection/configs/rotate/ppyoloe_r/ppyoloe_r_crn_s_3x_dota_ms.yml.
################################## dataset ##############################
metric: RBOX
num_classes: 1

TrainDataset:
  name: COCODataSet
  image_dir: train
  anno_path: annotations/train.json
  dataset_dir: dataset/hanxi_locate_coco/
  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']

EvalDataset:
  name: COCODataSet
  image_dir: val
  anno_path: annotations/val.json
  dataset_dir: dataset/hanxi_locate_coco/
  data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd', 'gt_poly']

TestDataset:
  name: ImageFolder
  anno_path: annotations/val.json
  dataset_dir: dataset/hanxi_locate_coco/

################################## runtime ##############################
use_gpu: true
use_xpu: false
use_mlu: false
use_npu: false
log_iter: 5
save_dir: output_locate
snapshot_epoch: 5
print_flops: false
print_params: false

# Exporting the model
export:
  post_process: True  # Whether post-processing is included in the network when exporting the model.
  nms: True           # Whether NMS is included in the network when exporting the model.
  benchmark: False    # Used for testing model performance; if set to `True`, post-processing and NMS are not exported.
  fuse_conv_bn: False

################################## optimizer_3x ##############################
epoch: 50
LearningRate:
  base_lr: 0.008
  schedulers:
    - name: CosineDecay
      max_epochs: 44
    - name: LinearWarmup
      start_factor: 0.
      steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 35.
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

################################## ppyoloe_r_reader ##############################
worker_num: 4
image_height: &image_height 512
image_width: &image_width 512
image_size: &image_size [*image_height, *image_width]

TrainReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RandomRFlip: {}
    - RandomRRotate: {angle_mode: 'value', angle: [0, 90, 180, -90]}
    - RandomRRotate: {angle_mode: 'value', angle: [30, 60], rotate_prob: 0.5}
    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - Poly2RBox: {filter_threshold: 2, filter_mode: 'edge', rbox_type: 'oc'}
  batch_transforms:
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - PadRGT: {}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Poly2Array: {}
    - RResize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2
  collate_batch: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: *image_size, keep_ratio: True, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_transforms:
    - PadBatch: {pad_to_stride: 32}
  batch_size: 2

################################## model ##############################
architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

YOLOv3:
  backbone: CSPResNet
  neck: CustomCSPPAN
  yolo_head: PPYOLOERHead
  post_process: ~

CSPResNet:
  layers: [3, 6, 6, 3]
  channels: [64, 128, 256, 512, 1024]
  return_idx: [1, 2, 3]
  use_large_stem: True
  use_alpha: True

CustomCSPPAN:
  out_channels: [768, 384, 192]
  stage_num: 1
  block_num: 3
  act: 'swish'
  spp: true
  use_alpha: True

PPYOLOERHead:
  fpn_strides: [32, 16, 8]
  grid_cell_offset: 0.5
  use_varifocal_loss: true
  static_assigner_epoch: -1
  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.05}
  static_assigner:
    name: FCOSRAssigner
    factor: 12
    threshold: 0.23
    boundary: [[512, 10000], [256, 512], [-1, 256]]
  assigner:
    name: RotatedTaskAlignedAssigner
    topk: 13
    alpha: 1.0
    beta: 6.0
  nms:
    name: MultiClassNMS
    nms_top_k: 2000
    keep_top_k: -1
    score_threshold: 0.1
    nms_threshold: 0.1
    normalized: False

################################## custom ##############################
weights: output_locate/model_final
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/CSPResNetb_s_pretrained.pdparams
depth_mult: 0.33
width_mult: 0.50
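The `depth_mult` and `width_mult` values are what make this the small ("s") variant. The sketch below shows how such multipliers would shrink the `layers` and `channels` lists from the config, assuming each entry is scaled, rounded, and clamped to at least 1; the helper name `scale_backbone` is hypothetical and the exact rounding rule inside PaddleDetection's CSPResNet is an assumption here.

```python
def scale_backbone(layers, channels, depth_mult, width_mult):
    """Scale block counts by depth_mult and channel counts by width_mult."""
    # Assumed rule: round to the nearest integer and never go below 1.
    scaled_layers = [max(round(n * depth_mult), 1) for n in layers]
    scaled_channels = [max(round(c * width_mult), 1) for c in channels]
    return scaled_layers, scaled_channels


layers, channels = scale_backbone(
    [3, 6, 6, 3], [64, 128, 256, 512, 1024], depth_mult=0.33, width_mult=0.50)
print(layers)    # [1, 2, 2, 1]
print(channels)  # [32, 64, 128, 256, 512]
```

Under that assumption, the backbone keeps the same stage structure but uses about a third of the blocks and half the channels per stage.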
Single-GPU training:
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c config_locate.yml
Multi-GPU training:
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c config_locate.yml
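During training, the learning rate follows the warmup and cosine settings from the optimizer section of the config. The sketch below approximates that schedule under stated assumptions: `steps_per_epoch` is a hypothetical value that depends on dataset size and batch size, the warmup simply overrides the first 1000 steps, and the rate is held at its final value once the cosine period ends. It is not PaddleDetection's actual scheduler code.

```python
import math


def lr_at_step(step, steps_per_epoch, base_lr=0.008, max_epochs=44,
               warmup_steps=1000, start_factor=0.0):
    """Approximate learning rate at a given global step."""
    if step < warmup_steps:
        # Linear warmup from base_lr * start_factor up to base_lr.
        alpha = step / warmup_steps
        factor = start_factor * (1.0 - alpha) + alpha
        return base_lr * factor
    max_iters = max_epochs * steps_per_epoch
    # Assumption: clamp so the rate stays at its final value after max_iters.
    step = min(step, max_iters)
    return base_lr * 0.5 * (math.cos(step * math.pi / max_iters) + 1.0)
```

Printing `lr_at_step` for a few steps is a quick way to see how `base_lr`, `steps`, and `max_epochs` interact before launching a long run.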
# Export the model
python tools/export_model.py -c config_locate.yml -o weights=output_locate/model_final.pdparams
# Run inference on images
python deploy/python/infer.py --image_dir dataset/hanxi_locate_coco/val --model_dir=output_locate/output_inference/config_locate --device=gpu