PaddleOCR Ultra-High-Resolution Text Detection: A Code Tutorial
Contents
1. Prerequisites
2. Deploying PaddleOCR (on Windows 10)
3. Approach and Code
1. Prerequisites
The issue I filed about this: https://github.com/PaddlePaddle/PaddleOCR/issues/11888
Many common questions are answered in the official FAQ: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_ch/FAQ.md
At ultra-high resolutions, simply resizing the image to the network input size is no longer viable, so we need a sliding window, and windows at several scales. Object detectors such as YOLOv5 and YOLOv8 sometimes do this, at the cost of running NMS many times over the tiled results. Here, though, OCR text detection built on a segmentation framework like DB sidesteps most of that work: the per-window probability maps can be merged directly, and post-processing runs once on the merged map.
This only works with segmentation-style text detectors such as DB!
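To make the tiling concrete, here is a minimal per-axis sketch of how window start offsets can be computed so that the last window is aligned to the image edge (the helper name `window_starts` is mine, not PaddleOCR's; the full code below does the same thing with `np.meshgrid` over both axes):

```python
import numpy as np

def window_starts(length, window, stride):
    """Start offsets of sliding windows along one axis, plus one final
    window aligned to the far edge so the whole axis is covered."""
    n = (length - window) // stride + 1
    starts = np.arange(0, n * stride, stride)
    # append the edge-aligned window; unique() drops it if it already exists
    return np.unique(np.append(starts, length - window))

# a 1000-pixel axis tiled by 640-pixel windows with a 480-pixel stride
print(window_starts(1000, 640, 480))  # windows start at 0 and 360
```

With these offsets, consecutive windows overlap, which is intentional: text cut by a window boundary is still seen whole by a neighboring window.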
2. Deploying PaddleOCR (on Windows 10)
A key step in deploying PaddleOCR on Windows 10 is setting up a conda virtual environment. There is no need to install CUDA system-wide: just install the latest NVIDIA driver (drivers are backward compatible), then follow the paddlepaddle-gpu instructions on the official install page to conda-install the matching cudatoolkit and cudnn versions.
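As a sketch, the setup might look like the following; the package versions and channels here are placeholders, so pick the exact combination listed on the official paddlepaddle-gpu install page for your driver:

```shell
# Version numbers below are illustrative -- check the official
# paddlepaddle-gpu install page for the combination matching your driver.
conda create -n paddle_ocr python=3.9
conda activate paddle_ocr
# cudatoolkit and cudnn come from conda; no system-wide CUDA install needed
conda install paddlepaddle-gpu==2.6.1 cudatoolkit=11.7 -c paddle -c conda-forge
pip install paddleocr
```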
3. Approach and Code
The code is straightforward, so let's go straight to it.
The snippet below is inserted into the __call__ method of TextDetector in PaddleOCR\tools\infer\predict_det.py and adds multi-scale, segmentation-based text detection:
```python
ori_im = img.copy()
image_height, image_width = img.shape[:2]
# Prerequisite: a two-stage detect-then-recognize pipeline where detection
# uses a segmentation method such as DB / DB++.
# window_size and stride are user-supplied lists of per-scale window sizes
# and strides.
if (image_width // max(window_size) >= 2) and (image_height // max(window_size) >= 2) \
        and self.det_algorithm in ['DB', 'PSE', 'DB++']:
    preds_all = np.zeros([1, 1, image_height, image_width])  # merged probability map
    assert len(window_size) == len(stride), "window_size and stride lists differ in length"
    st = time.time()
    preds = {}
    for i in range(len(window_size)):
        window_size_i = window_size[i]
        stride_i = stride[i]
        # Number of windows in the vertical and horizontal directions
        num_windows_height = (image_height - window_size_i) // stride_i + 1
        num_windows_width = (image_width - window_size_i) // stride_i + 1
        # Inherent limitation: if the window exceeds det_limit_side_len, the
        # tile is resized internally and the prediction can no longer be
        # written back at the original tile size.
        if window_size_i > self.args.det_limit_side_len:
            raise ValueError(
                "window_size exceeds det_limit_side_len, so predictions cannot "
                "be written back; lower window_size!")
        windows_x, windows_y = np.meshgrid(
            np.append(np.arange(0, num_windows_width * stride_i, stride_i),
                      image_width - window_size_i),
            np.append(np.arange(0, num_windows_height * stride_i, stride_i),
                      image_height - window_size_i),
        )  # x -> width axis, y -> height axis
        # Slide the window over the ultra-high-resolution image
        h, w = windows_x.shape[:2]
        for y in range(h):
            for x in range(w):
                start_h, start_w = windows_y[y, x], windows_x[y, x]
                print(f"Processing tile y{start_h}-x{start_w} ...")
                img_ = img[start_h:start_h + window_size_i,
                           start_w:start_w + window_size_i]
                data = {'image': img_}
                if self.args.benchmark:
                    self.autolog.times.start()
                data = transform(data, self.preprocess_op)
                img_, shape_list = data
                if img_ is None:
                    return None, 0
                img_ = np.expand_dims(img_, axis=0)
                # Report the full-image size so post-processing rescales
                # boxes to the original image, not the tile
                shape_list[0] = image_height
                shape_list[1] = image_width
                shape_list = np.expand_dims(shape_list, axis=0)
                img_ = img_.copy()
                if self.args.benchmark:
                    self.autolog.times.stamp()
                if self.use_onnx:
                    input_dict = {}
                    input_dict[self.input_tensor.name] = img_
                    outputs = self.predictor.run(self.output_tensors, input_dict)
                else:
                    self.input_tensor.copy_from_cpu(img_)
                    self.predictor.run()
                    outputs = []
                    for output_tensor in self.output_tensors:
                        output = output_tensor.copy_to_cpu()
                        outputs.append(output)
                if self.args.benchmark:
                    self.autolog.times.stamp()
                if self.det_algorithm in ['DB', 'PSE', 'DB++']:
                    # Fuse overlapping tiles with an element-wise maximum
                    preds_all[:, :, start_h:start_h + window_size_i,
                              start_w:start_w + window_size_i] = np.maximum(
                        preds_all[:, :, start_h:start_h + window_size_i,
                                  start_w:start_w + window_size_i],
                        outputs[0])
                else:
                    raise NotImplementedError
    preds['maps'] = preds_all
    post_result = self.postprocess_op(preds, shape_list)
    dt_boxes = post_result[0]['points']
    if self.args.det_box_type == 'poly':
        dt_boxes = self.filter_tag_det_res_only_clip(dt_boxes, ori_im.shape)
    else:
        dt_boxes = self.filter_tag_det_res(dt_boxes, ori_im.shape)
    if self.args.benchmark:
        self.autolog.times.end(stamp=True)
    et = time.time()
    return dt_boxes, et - st
```
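The key trick above is that DB outputs a per-pixel text probability map, so overlapping window predictions can be fused with an element-wise maximum and post-processed once, instead of running NMS over per-window boxes. A toy illustration (shapes and values are made up):

```python
import numpy as np

# Two overlapping windows written into a 1x1x4x8 probability map, mirroring
# the np.maximum merge in the snippet above.
preds_all = np.zeros((1, 1, 4, 8))

win_a = np.full((1, 1, 4, 4), 0.2)  # window covering columns 0..3
win_b = np.full((1, 1, 4, 4), 0.9)  # window covering columns 2..5, overlaps win_a

preds_all[:, :, :, 0:4] = np.maximum(preds_all[:, :, :, 0:4], win_a)
preds_all[:, :, :, 2:6] = np.maximum(preds_all[:, :, :, 2:6], win_b)

# Overlap region (columns 2..3) keeps the higher probability, 0.9;
# untouched columns 6..7 stay at 0.
print(preds_all[0, 0, 0])
```

Because the merged map has full-image coordinates, a single postprocess_op call produces boxes in the original image frame, with no per-tile box stitching required.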