Advanced image-to-text with Python OCR: Duguang OCR line detection model + line recognition model

Introduction
- Alibaba Cloud OCR (Duguang OCR)
  - Prerequisites
    - Model 1: line detection model
    - Model 2: line recognition model
  - Code: main.py

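Introduction

What is OCR?

OCR stands for "Optical Character Recognition". It is a technology that recognizes text and characters printed on paper or contained in images and converts them into a machine-readable format, such as a plain text file. With OCR, paper documents can be digitized quickly so that their text can be edited, searched, and analyzed. It is widely used, for example to digitize holdings in libraries and archives and to make PDF files searchable, and it is often bundled with barcode and QR code reading for scanned documents.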

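Alibaba Cloud OCR (Duguang OCR)

Alibaba Cloud OCR (Duguang OCR) is an OCR product built by Alibaba DAMO Academy. It recognizes the text contained in images, documents, cards, certificates, and similar files.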

Prerequisites

1. Prepare a machine to run on (I am currently using a 4060 GPU)
2. Install the environment (conda, Python)
3. Download the models (from the links below)
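
Before running anything, it can help to confirm that the environment from step 2 actually contains the packages the script imports. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is made up for this example, and listing `torch` is an assumption about the backend the models run on; it is not imported directly by main.py.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names used by main.py.
# "torch" is an assumption about the deep learning backend, not taken from the script.
required = ["modelscope", "PIL", "torch"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, install it into the active conda environment before continuing.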

Model 1: line detection model

https://www.modelscope.cn/models/iic/cv_resnet18_ocr-detection-db-line-level_damo/summary

Model 2: line recognition model

https://www.modelscope.cn/models/iic/cv_convnextTiny_ocr-recognition-general_damo/summary

Code: main.py

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from PIL import Image

# Load the line detection model
ocr_detection = pipeline(Tasks.ocr_detection, model='damo/cv_resnet18_ocr-detection-db-line-level_damo')

# Path of the target image
pic_path = 'ipp.jpg'
result = ocr_detection(pic_path)
print(result)

# Polygon coordinates of the detected text lines
polygons = result["polygons"]
fg_name = "cropped_image_"

# Open the image
image = Image.open(pic_path)

# List that collects the cropped line images
cropped_images = []

# Iterate over each polygon
for i, polygon_coords in enumerate(polygons):
    # Bounding box of the polygon (even indices are x values, odd indices are y values)
    x_min, y_min = min(polygon_coords[::2]), min(polygon_coords[1::2])
    x_max, y_max = max(polygon_coords[::2]), max(polygon_coords[1::2])
    # Crop the image to the bounding box
    cropped_region = image.crop((x_min, y_min, x_max, y_max))
    # Keep the cropped image in the list
    cropped_images.append(cropped_region)

# Save the cropped images
for i, cropped_image in enumerate(cropped_images):
    cropped_image.save(f"{fg_name}{i}.jpg")
print("Images cropped and saved successfully.")

# Load the line recognition model
ocr_recognition = pipeline(Tasks.ocr_recognition, model='damo/cv_convnextTiny_ocr-recognition-general_damo')

# Recognize each cropped line and print the result
for i, cropped_image in enumerate(cropped_images):
    img_url = f"{fg_name}{i}.jpg"
    result = ocr_recognition(img_url)
    print(result)

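from modelscope.pipelines import pipeline: imports ModelScope's pipeline function, which builds a ready-to-use model pipeline from a task type and a model.
from modelscope.utils.constant import Tasks: imports ModelScope's constants; Tasks is an enumeration-style class that defines the available task types.
ocr_detection = pipeline(Tasks.ocr_detection, model='damo/cv_resnet18_ocr-detection-db-line-level_damo'): creates a text detection pipeline that uses the damo/cv_resnet18_ocr-detection-db-line-level_damo model.
pic_path = 'ipp.jpg': sets the path of the image to process.
result = ocr_detection(pic_path): passes the image path to the text detection pipeline and gets the detection result.
polygons = result["polygons"]: extracts the polygon coordinates from the result; each polygon marks the boundary of one text region.
image = Image.open(pic_path): opens the image with the Pillow library.
cropped_images = []: initializes an empty list that will hold the cropped images.
for i, polygon_coords in enumerate(polygons):: iterates over the polygon coordinates of each text region.
x_min, y_min = min(polygon_coords[::2]), min(polygon_coords[1::2]): computes the minimum x and y coordinates of the polygon.
x_max, y_max = max(polygon_coords[::2]), max(polygon_coords[1::2]): computes the maximum x and y coordinates of the polygon.
cropped_region = image.crop((x_min, y_min, x_max, y_max)): crops the image to the computed bounding box.
cropped_images.append(cropped_region): appends the cropped image to the list.
for i, cropped_image in enumerate(cropped_images):: iterates over the list of cropped images.
cropped_image.save(f"{fg_name}{i}.jpg"): saves each cropped image to disk.
print(...): prints a message confirming that the images were cropped and saved.
ocr_recognition = pipeline(Tasks.ocr_recognition, model='damo/cv_convnextTiny_ocr-recognition-general_damo'): creates a text recognition pipeline that uses the damo/cv_convnextTiny_ocr-recognition-general_damo model.
for i, cropped_image in enumerate(cropped_images):: iterates over the list of cropped images again.
img_url = f"{fg_name}{i}.jpg": builds the file path of each cropped image.
result = ocr_recognition(img_url): passes the cropped image path to the text recognition pipeline and gets the recognition result.
print(result): prints the recognition result.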
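
The polygon-to-bounding-box step can also be tried on its own, without loading any model. The sketch below assumes each polygon is a flat sequence [x1, y1, x2, y2, ...], which is how the script above slices it. The helper names `polygon_to_bbox` and `sort_reading_order` and the sample coordinates are made up for illustration. Clamping the box to the image size and sorting the boxes top to bottom are optional additions that main.py does not do; sorting may be useful if the detected lines do not come back in reading order.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[int, int, int, int]


def polygon_to_bbox(polygon: Sequence[float], width: int, height: int) -> BBox:
    """Return (x_min, y_min, x_max, y_max) around a flat polygon [x1, y1, x2, y2, ...].

    The box is clamped to the image size so it never extends past the image.
    """
    xs = [int(v) for v in polygon[0::2]]  # even indices hold x values
    ys = [int(v) for v in polygon[1::2]]  # odd indices hold y values
    x_min = max(0, min(xs))
    y_min = max(0, min(ys))
    x_max = min(width, max(xs))
    y_max = min(height, max(ys))
    return x_min, y_min, x_max, y_max


def sort_reading_order(boxes: List[BBox]) -> List[BBox]:
    """Sort boxes top to bottom, then left to right."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# Hypothetical detection output for a 200x100 image (not real model output)
polygons = [
    [10, 60, 190, 58, 192, 80, 12, 82],
    [-3, 5, 150, 8, 150, 30, -2, 28],
]
boxes = sort_reading_order([polygon_to_bbox(p, 200, 100) for p in polygons])
print(boxes)  # [(0, 5, 150, 30), (10, 58, 192, 82)]
```

Each resulting tuple can be passed directly to `image.crop(...)`, as in main.py. Note that an axis-aligned box around a tilted text line also includes some background; for strongly rotated text a perspective crop would be a better fit, which is outside the scope of this script.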