Important things first
Dataset:
https://pan.baidu.com/s/1YayAeqgdqZ0u2vSovd0Z4w
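Extraction code: 8888
If the author accidentally deletes the share, obtain CCPD2019.tar.xz and CCPD2020.zip from the download here instead.
Background
The previous part, License Plate Recognition Part 1: License Plate Detection (including free downloads of all datasets, source code, and models), used object detection to locate the plate, including its bounding box and four corner points. This article explores OCR recognition of the plate region.
Dependencies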
paddlepaddle-gpu==2.6.2
shapely
scikit-image
six
pyclipper
lmdb
tqdm
numpy
rapidfuzz
opencv-python
opencv-contrib-python
cython
Pillow
pyyaml
requests
albumentations==1.4.10
# to be compatible with albumentations
albucore==0.0.13
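Install PaddleOCR
PaddleOCR is widely regarded as one of the most accurate frameworks for Chinese text recognition, which makes it the natural choice for this project.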
git clone https://github.com/PaddlePaddle/PaddleOCR
cd PaddleOCR
pip install -v -e .
This project was tested with PaddleOCR 2.9.0.
Download the dataset to data/ccpd and extract it
After extraction, the data directory looks like this:
data/ccpd
├── CCPD2019
├── CCPD2019.tar.xz
├── CCPD2020
└── CCPD2020.zip
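Generate the training and test datasets
Next, split the extracted data into training and validation sets. The script below sends a sample to the training set when a uniform random number exceeds 0.1, which gives roughly a 9:1 split. Create gen_paddleocr_format_data.py with the following content: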
import glob
import os
import random
import sys

import cv2
from tqdm import tqdm

# Province abbreviations, indexed by the first number of the CCPD label.
# Index 25 is Tibet, whose abbreviation is "藏" (the original listing had "西").
provincelist = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑",
                "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤",
                "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新"]
# Letters (without I and O) and digits, indexed by the remaining label numbers.
wordlist = ["A", "B", "C", "D", "E", "F", "G", "H", "J", "K",
            "L", "M", "N", "P", "Q", "R", "S", "T", "U", "V",
            "W", "X", "Y", "Z", "0", "1", "2", "3", "4", "5",
            "6", "7", "8", "9"]


def gen_paddleocr_format_data(rootpath, dstpath):
    os.makedirs(dstpath, exist_ok=True)
    os.makedirs(os.path.join(dstpath, 'train'), exist_ok=True)
    os.makedirs(os.path.join(dstpath, 'val'), exist_ok=True)
    list_images = glob.glob(f'{rootpath}/**/*.jpg', recursive=True)
    for imgpath in tqdm(list_images):
        # ccpd_np contains images without license plates; skip them
        if "/ccpd_np/" in imgpath:
            continue
        img = cv2.imread(imgpath)
        if img is None:
            # unreadable image; skip it
            continue
        # image name without extension
        imgname = os.path.basename(imgpath).split('.')[0]
        # the annotation is encoded in the image name, separated by '-'
        _, _, box, points, label, brightness, blurriness = imgname.split('-')
        # --- bounding box: "x1&y1_x2&y2"
        box = box.split('_')
        box = [list(map(int, i.split('&'))) for i in box]
        # the raw label string is reused as the output file name
        filename = label
        # --- decode the plate number
        label = label.split('_')
        # province abbreviation
        province = provincelist[int(label[0])]
        # remaining characters
        words = [wordlist[int(i)] for i in label[1:]]
        # full plate number
        label = province + ''.join(words)
        # crop the plate region
        img_plate = img[box[0][1]:box[1][1], box[0][0]:box[1][0]]
        random_number = random.uniform(0, 1)
        if random_number > 0.1:
            # train
            dst_img_path = os.path.join(dstpath, 'train', f"{filename}.jpg")
            labelfileadd = os.path.join(dstpath, 'train.txt')
            labelcontent = f"train/{filename}.jpg\t{label}\n"
        else:
            # val
            dst_img_path = os.path.join(dstpath, 'val', f"{filename}.jpg")
            labelfileadd = os.path.join(dstpath, 'val.txt')
            labelcontent = f"val/{filename}.jpg\t{label}\n"
        cv2.imwrite(dst_img_path, img_plate)
        # write labels as UTF-8 so the Chinese province characters survive
        with open(labelfileadd, 'a', encoding='utf-8') as f:
            f.write(labelcontent)


def gen_license_dict(dict_save_path):
    # the recognition dictionary: one character per line, sorted
    allwordlist = wordlist + provincelist
    allwordlist.sort()
    print(allwordlist)
    with open(dict_save_path, 'w', encoding='utf-8') as f:
        for word in allwordlist:
            f.write(word + '\n')


if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: python gen_paddleocr_format_data.py <ccpd_dataset_path> <output_path>")
        sys.exit(1)
    gen_paddleocr_format_data(sys.argv[1], sys.argv[2])
    # dict.txt is written next to train.txt and val.txt in the output directory
    gen_license_dict(os.path.join(sys.argv[2], "dict.txt"))
Run the command:
python gen_paddleocr_format_data.py data/ccpd data/ccpd_paddleocr
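To make the filename convention concrete, here is a minimal standalone sketch of the decoding step: the third `-`-separated field holds the bounding box and the fifth holds the plate as a list of indices. The helper names (`decode_plate`, `parse_box`, `parse_ccpd_name`) and the sample filename stem are made up for illustration. The tables are assumed to follow the CCPD ordering, with index 25 written as 藏.

```python
# Lookup tables in CCPD label order (assumed; index 25 is Tibet, "藏").
PROVINCES = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑",
             "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤",
             "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新"]
CHARS = list("ABCDEFGHJKLMNPQRSTUVWXYZ0123456789")  # no I or O


def decode_plate(label_field):
    """Turn an index string such as '22_16_5_0_25_24_25' into plate text."""
    idx = [int(i) for i in label_field.split("_")]
    return PROVINCES[idx[0]] + "".join(CHARS[i] for i in idx[1:])


def parse_box(box_field):
    """Turn 'x1&y1_x2&y2' into (x1, y1, x2, y2)."""
    (x1, y1), (x2, y2) = [tuple(map(int, p.split("&")))
                          for p in box_field.split("_")]
    return x1, y1, x2, y2


def parse_ccpd_name(stem):
    """Split a CCPD filename stem (no extension) into box and plate text."""
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 '-'-separated fields, got {len(fields)}")
    return parse_box(fields[2]), decode_plate(fields[4])


if __name__ == "__main__":
    # Hypothetical stem, composed only to exercise the parser.
    stem = ("025-95_113-154&383_386&473-"
            "386&473_177&454_154&383_363&402-22_16_5_0_25_24_25-37-15")
    print(parse_ccpd_name(stem))
```

The label `22_16_5_0_25_24_25` decodes to 川SFA101, which is also why the cropped images are named after the raw index string rather than the plate text.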
The generated recognition dataset has the following directory layout:
ccpd_paddleocr/
├── dict.txt
├── train
├── train.txt
├── val
└── val.txt
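Before training, it can be worth checking that every line of train.txt and val.txt has the `path<TAB>label` form and that every label character appears in dict.txt. The following is a minimal sketch of such a check. The function name `check_label_lines` is hypothetical, and the default length limit of 12 is taken from `max_text_length` in the config used later in this article.

```python
def check_label_lines(lines, charset, max_text_length=12):
    """Return (line_number, reason) pairs for suspicious label lines."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            problems.append((lineno, "expected 'path<TAB>label'"))
            continue
        label = parts[1]
        if not 0 < len(label) <= max_text_length:
            problems.append((lineno, "label length out of range"))
        unknown = sorted(set(label) - set(charset))
        if unknown:
            problems.append((lineno, "characters not in dict: " + "".join(unknown)))
    return problems


if __name__ == "__main__":
    # In-memory demo; for real data, read dict.txt and train.txt with
    # open(..., encoding="utf-8") and pass the lines in the same way.
    demo_charset = set("皖川ADSF0123456789")
    demo_lines = [
        "train/a.jpg\t皖AD49590\n",   # well-formed
        "train/b.jpg 川SFA101\n",     # space instead of tab
        "val/c.jpg\t川SFI101\n",      # 'I' is not in the dictionary
    ]
    for lineno, reason in check_label_lines(demo_lines, demo_charset):
        print(lineno, reason)
```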
Download the pretrained model
Get it here, then extract it into pretrain_models.
Configuration file
In PaddleOCR/configs/rec/PP-OCRv3, create ch_PP-OCRv3_rec_liecece.yml with the following content:
Global:
  debug: false
  use_gpu: true
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_licence_v3
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: data/ccpd_paddleocr/dict.txt
  #character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 12
  infer_mode: false
  use_space_char: false
  distributed: true
  save_res_path: ./output/rec/predicts_licence_ppocrv3.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
    last_pool_kernel_size: [2, 2]
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 64
            depth: 2
            hidden_dims: 120
            use_guide: True
          Head:
            fc_decay: 0.00001
      - SARHead:
          enc_dim: 512
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - SARLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: data/ccpd_paddleocr/
    ext_op_transform_idx: 1
    label_file_list:
    - data/ccpd_paddleocr/train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: data/ccpd_paddleocr/
    label_file_list:
    - data/ccpd_paddleocr/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
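The Optimizer section of this config asks for a cosine learning-rate schedule with a base rate of 0.001 and 5 warmup epochs. As a rough illustration of what such a schedule looks like, here is a generic linear-warmup plus cosine-decay function. This is a sketch and not PaddleOCR's implementation: the function name, the choice to start the cosine phase at the end of warmup, and the `steps_per_epoch` value in the demo are all assumptions.

```python
import math


def warmup_cosine_lr(step, base_lr, total_steps, warmup_steps):
    """Generic schedule: linear ramp to base_lr, then cosine decay.

    Assumption: the cosine phase is counted from the end of warmup and
    spans total_steps, so the rate does not reach exactly zero at the end.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps_per_epoch = 2550           # hypothetical; depends on dataset size
    total = 100 * steps_per_epoch    # epoch_num: 100
    warmup = 5 * steps_per_epoch     # warmup_epoch: 5
    for epoch in (0, 5, 50, 85, 100):
        lr = warmup_cosine_lr(epoch * steps_per_epoch, 0.001, total, warmup)
        print(f"epoch {epoch:3d}: lr = {lr:.6f}")
```

The point of the sketch is only the shape of the curve: a short ramp up, then a smooth decay, which is why the rate late in training is far below the configured 0.001.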
Start training
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_liecece.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams
Training process
...
[2024/12/17 18:25:33] ppocr INFO: epoch: [85/100], global_step: 215960, lr: 0.000101, acc: 0.992187, norm_edit_dis: 0.998884, CTCLoss: 0.029224, SARLoss: 0.201802, loss: 0.234332, avg_reader_cost: 0.23257 s, avg_batch_cost: 0.42802 s, avg_samples: 128.0, ips: 299.05106 samples/s, eta: 4:16:11, max_mem_reserved: 9719 MB, max_mem_allocated: 9308 MB
[2024/12/17 18:25:37] ppocr INFO: epoch: [85/100], global_step: 215970, lr: 0.000101, acc: 0.984375, norm_edit_dis: 0.997768, CTCLoss: 0.037204, SARLoss: 0.198890, loss: 0.242042, avg_reader_cost: 0.15294 s, avg_batch_cost: 0.35018 s, avg_samples: 128.0, ips: 365.52281 samples/s, eta: 4:16:07, max_mem_reserved: 9719 MB, max_mem_allocated: 9308 MB
[2024/12/17 18:25:41] ppocr INFO: epoch: [85/100], global_step: 215980, lr: 0.000101, acc: 0.984375, norm_edit_dis: 0.997210, CTCLoss: 0.071525, SARLoss: 0.196448, loss: 0.259408, avg_reader_cost: 0.20332 s, avg_batch_cost: 0.39946 s, avg_samples: 128.0, ips: 320.43009 samples/s, eta: 4:16:03, max_mem_reserved: 9719 MB, max_mem_allocated: 9308 MB
[2024/12/17 18:25:44] ppocr INFO: epoch: [85/100], global_step: 215990, lr: 0.000101, acc: 0.984375, norm_edit_dis: 0.997280, CTCLoss: 0.063012, SARLoss: 0.195158, loss: 0.248105, avg_reader_cost: 0.10683 s, avg_batch_cost: 0.30685 s, avg_samples: 128.0, ips: 417.13550 samples/s, eta: 4:15:59, max_mem_reserved: 9719 MB, max_mem_allocated: 9308 MB
[2024/12/17 18:25:48] ppocr INFO: epoch: [85/100], global_step: 216000, lr: 0.000101, acc: 0.988281, norm_edit_dis: 0.997280, CTCLoss: 0.038435, SARLoss: 0.206021, loss: 0.248680, avg_reader_cost: 0.22180 s, avg_batch_cost: 0.41802 s, avg_samples: 128.0, ips: 306.20233 samples/s, eta: 4:15:56, max_mem_reserved: 9719 MB, max_mem_allocated: 9308 MB
eval model:: 100%|██████████████████████████████████████████████████████████| 284/284 [00:16<00:00, 16.73it/s]
[2024/12/17 18:26:05] ppocr INFO: cur metric, acc: 0.9785529428504617, norm_edit_dis: 0.9959577823762181, fps: 3396.2841101808885
[2024/12/17 18:26:06] ppocr INFO: save best model is to ./output/rec_ppocr_licence_v3/best_accuracy
[2024/12/17 18:26:06] ppocr INFO: best metric, acc: 0.9785529428504617, is_float16: False, norm_edit_dis: 0.9959577823762181, fps: 3396.2841101808885, best_epoch: 85
...
The evaluation accuracy reaches 97.86% (acc 0.97855 at epoch 85), which shows that the recognition model performs quite well.
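The log reports two numbers, acc and norm_edit_dis. As I understand PaddleOCR's RecMetric, acc counts a plate as correct only when the whole string matches, while norm_edit_dis is one minus the average normalized edit distance, so a single wrong character costs much less there. The sketch below reproduces that idea in plain Python. The function names are hypothetical, and normalizing by the longer string's length is an assumption about how the library normalizes.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def rec_metrics(preds, targets):
    """Exact-match accuracy and 1 - mean normalized edit distance."""
    n = len(targets)
    correct = sum(p == t for p, t in zip(preds, targets))
    norm_dist = sum(levenshtein(p, t) / max(len(p), len(t), 1)
                    for p, t in zip(preds, targets))
    return {"acc": correct / n, "norm_edit_dis": 1 - norm_dist / n}


if __name__ == "__main__":
    # One plate fully correct, one with a single wrong character.
    print(rec_metrics(["皖AD49590", "川SFA107"], ["皖AD49590", "川SFA101"]))
```

This is why the two figures in the log differ: most plates that are counted as wrong by acc are off by only one or two characters.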
Saved training results
output/rec_ppocr_licence_v3/
├── best_accuracy.pdopt
├── best_accuracy.pdparams
├── best_accuracy.states
├── best_model
│ ├── model.pdopt
│ └── model.pdparams
├── config.yml
├── latest.pdopt
├── latest.pdparams
├── latest.states
└── train.log
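The best-scoring model is saved in output/rec_ppocr_licence_v3/best_model.
Next, pick a few images, crop the plates out by hand, and check the recognition results.
The inference command is: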
python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_liecece.yml -o Global.pretrained_model=./output/rec_ppocr_licence_v3/best_model/model Global.infer_img=../license_plate/test_plate.jpg
Output:
infer_img: ../license_plate/test_plate.jpg
result: 皖AD49590 0.9894002079963684
infer_img: ../license_plate/22_16_5_0_25_24_25.jpg
result: 川SFA101
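Each result line pairs the decoded text with a confidence score. The config uses CTCLabelDecode as the post-processor. A minimal sketch of greedy CTC decoding is shown here to illustrate how per-timestep predictions become a plate string: collapse consecutive repeats, drop blanks, and average the probabilities of the kept characters. The function name is hypothetical. Treating index 0 as the blank, with dictionary characters shifted by one, is an assumption about how the library builds its character table.

```python
def ctc_greedy_decode(indices, probs, charset, blank=0):
    """Collapse repeats and blanks from per-timestep argmax results.

    indices: argmax class index per timestep (0 is assumed to be blank)
    probs:   probability of that class per timestep
    charset: dictionary characters; class k maps to charset[k - 1]
    """
    chars, confs = [], []
    prev = None
    for idx, p in zip(indices, probs):
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
            confs.append(p)
        prev = idx
    score = sum(confs) / len(confs) if confs else 0.0
    return "".join(chars), score


if __name__ == "__main__":
    # Toy dictionary and timestep sequence, made up for illustration.
    charset = ["川", "S", "F", "A", "1", "0"]
    indices = [1, 1, 0, 2, 3, 3, 4, 0, 5, 6, 6, 0, 5]
    probs = [0.99] * len(indices)
    text, score = ctc_greedy_decode(indices, probs, charset)
    print(text, score)
```

Note that the blank between the two final `5` entries is what lets the same character appear twice in a row, as in the trailing "101".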
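A small regret
This model is sufficient for license plate recognition in ordinary scenarios. However, the training dictionary contains no special plates, such as double-row plates, embassy plates, and military plates. This is a gap in the dataset; the only remedy is to collect a more complete dataset later and retrain with the same method.
Model attachment
Download it here
License Plate Recognition Part 1: License Plate Detection (including free downloads of all datasets, source code, and models)
License Plate Recognition Part 3: ONNX Deployment of Detection + Recognition (free download of high-accuracy ONNX models)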