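- Flask file upload and receiving
- Flask server (receiving requests: files and data)
- Flask client (uploading a file)
- Passing parameters together with a file
- argparse: supplying arguments without the command line
- 1. Set a default value
- 2. "command-line arguments".split()
- 3. ['--input', 'value']
- Extracting archives in Python
- Error when saving a file such as a jpg to a given folder in Python
- A mess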
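Flask file upload and receiving
Receiving a file stream
1. The object sent by the front end is a binary file stream. There are two ways to save it locally:
(1) Write the stream into a file opened with open().
(2) Call file.save() directly on the uploaded file object.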
Flask server (receiving requests: files and data)
from flask import Flask, request

app = Flask(__name__)


@app.route('/upload', methods=['POST'])
def file_receive():
    # Get the uploaded file object
    file = request.files['file']
    # Get the file name
    filename = file.filename
    # file.save() can also save the uploaded file:
    # file.save(f'./{filename}')
    with open(f'./{filename}', 'wb') as f:
        f.write(file.stream.read())
    return {'success': 1}


if __name__ == '__main__':
    app.run()
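Flask client (uploading a file)
To test the upload endpoint above, use requests: open the file in binary mode with open() and send it to the back end:
import requests


def uploads():
    url = 'http://127.0.0.1:5000/upload'
    # Open the file in binary mode; the with block closes it after the request
    with open('C:\\Users\\xxx\\Desktop\\push\\test.mp4', 'rb') as fh:
        files = {'file': fh}
        r = requests.post(url, files=files)
    print(r.text)


if __name__ == "__main__":
    uploads()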
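Passing parameters together with a file
When files= is given, requests builds a multipart/form-data body and ignores json=, and request.data is empty for multipart requests. Extra fields are therefore sent with data= and read on the server with request.form.
from flask import Flask, request

app = Flask(__name__)


@app.route('/upload', methods=['POST'])
def file_receive():
    # Get the uploaded file object
    file = request.files['file']
    # Get the extra parameters sent as multipart form fields
    body = request.form
    filename = file.filename
    # file.save() can also save the uploaded file:
    # file.save(f'./{filename}')
    with open(f'./{filename}', 'wb') as f:
        f.write(file.stream.read())
    return {'success': 1}


if __name__ == '__main__':
    app.run()
requests test code:
import requests


def uploads():
    url = 'http://127.0.0.1:5000/upload'
    body = {'info': 'test'}
    with open('C:\\Users\\xxx\\Desktop\\push\\test.mp4', 'rb') as fh:
        files = {'file': fh}
        # data= (not json=) so the fields travel in the same multipart body as the file
        r = requests.post(url, data=body, files=files)
    print(r.text)


if __name__ == "__main__":
    uploads()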
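The parameter-plus-file exchange can also be checked without starting a server. The following is a minimal sketch, assuming Flask and Werkzeug are installed. The temporary upload directory, the demo_upload helper, and the use of secure_filename are additions for illustration and are not part of the original code.

```python
import io
import os
import tempfile

from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
# Assumption: a throwaway directory stands in for the real upload folder.
UPLOAD_DIR = tempfile.mkdtemp()


@app.route('/upload', methods=['POST'])
def file_receive():
    file = request.files['file']           # the uploaded file part
    info = request.form.get('info', '')    # an ordinary multipart form field
    # secure_filename strips path separators from the client-supplied name
    filename = secure_filename(file.filename)
    file.save(os.path.join(UPLOAD_DIR, filename))
    return {'success': 1, 'info': info, 'filename': filename}


def demo_upload():
    # Flask's test client builds the multipart/form-data request in memory
    client = app.test_client()
    response = client.post(
        '/upload',
        data={'info': 'test', 'file': (io.BytesIO(b'fake video bytes'), 'test.mp4')},
        content_type='multipart/form-data',
    )
    return response.get_json()
```

Calling demo_upload() returns the JSON body produced by the view, which makes it easy to confirm that both the form field and the file arrived.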
Reference: Flask file upload and receiving
argparse: supplying arguments without the command line
1. Set a default value
parser.add_argument('-f', '--config_file', dest='config_file', type=argparse.FileType(mode='r'))
Improved version, which falls back to a default path when the option is omitted:
yaml_path = 'test.yaml'
parser.add_argument('-f', '--config_file', dest='config_file', type=argparse.FileType(mode='r'), default=yaml_path)
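When a default is a string, argparse applies the type conversion to it just as it would to a command-line value, so the default path is opened by FileType. A minimal sketch of that behaviour follows; the temporary file and the helper names build_parser and read_default_config are hypothetical and exist only for the demonstration.

```python
import argparse
import os
import tempfile


def build_parser(default_path):
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--config_file', dest='config_file',
                        type=argparse.FileType(mode='r'), default=default_path)
    return parser


def read_default_config():
    # Hypothetical config file created just for this demonstration
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as tmp:
        tmp.write('name: demo\n')
    try:
        # An empty list means "no command-line arguments were given"
        args = build_parser(tmp.name).parse_args([])
        with args.config_file as fh:  # the default string was opened by FileType
            return fh.read()
    finally:
        os.remove(tmp.name)
```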
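2. "command-line arguments".split()
Much Python code parses its input with an argparse parser. When stepping through such code in an IDE (for example PyCharm), launching it from the command line every time is impractical. The trick is this: after defining its command-line arguments, the program calls args = parser.parse_args() to receive the real command-line input. Replace that call with:
args = parser.parse_args("command-line arguments".split())
For example:
args = parser.parse_args("--input ../example_graphs/karate.adjlist --output ./output".split())
When I stored the string in a variable first:
str = "--input ../example_graphs/karate.adjlist"
args = parser.parse_args(str.split())
I got AttributeError: 'str' object has no attribute 'spilt'. The message shows that the method name was misspelled as spilt in the code that actually ran; str.split() as written above works. Naming the variable str also shadows the built-in type, so a different name is safer.
The third form can be used as well (here str should hold only the value, not the option name):
args = parser.parse_args(['--input', str])
Reference: PyCharm, passing arguments without the command line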
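One limitation of str.split() is that it breaks on every space, so a value containing spaces is torn apart. A minimal sketch using shlex.split from the standard library, which honours shell-style quoting; the path with a space in it and the helper names are hypothetical.

```python
import argparse
import shlex


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input')
    parser.add_argument('--output')
    return parser


def parse_line(command_line):
    # shlex.split keeps quoted text together, unlike str.split
    return build_parser().parse_args(shlex.split(command_line))


args = parse_line('--input "../example graphs/karate.adjlist" --output ./output')
print(args.input)
print(args.output)
```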
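3. ['--input', 'value']
Reference: "Parsing command-line arguments with argparse in Python" | Linux China
Several third-party libraries exist for command-line parsing, but the standard-library argparse holds up well against them.
Without adding many dependencies, you can write polished command-line tools with useful argument parsing.
Argument parsing in Python
When parsing command-line arguments with argparse, the first step is to configure an ArgumentParser object. This is usually done at module (global) scope, because merely configuring a parser has no side effects.
import argparse

PARSER = argparse.ArgumentParser()
The most important method on ArgumentParser is .add_argument(), which has several variants. By default it adds an option that expects one value.
PARSER.add_argument("--value")
To see it in action, call .parse_args():
PARSER.parse_args(["--value", "some-value"])
Namespace(value='some-value')
The = syntax also works:
PARSER.parse_args(["--value=some-value"])
Namespace(value='some-value')
To shorten what has to be typed on the command line, an option can also be given a short alias:
PARSER.add_argument("--thing", "-t")
The short option can be passed:
PARSER.parse_args("-t some-thing".split())
Namespace(value=None, thing='some-thing')
Or the long option:
PARSER.parse_args("--thing some-thing".split())
Namespace(value=None, thing='some-thing')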
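Types
Many kinds of arguments are available. Besides the default kind, the two most popular are booleans and counters. Booleans come in two variants: store_true, whose value defaults to False, and store_false, whose value defaults to True.
PARSER.add_argument("--active", action="store_true")
PARSER.add_argument("--no-dry-run", action="store_false", dest="dry_run")
PARSER.add_argument("--verbose", "-v", action="count")
Unless --active is passed explicitly, active is False. dry_run defaults to True unless --no-dry-run is passed. Short options that take no value can be stacked, as in -vvvv.
Passing all of the options yields the non-default state:
PARSER.parse_args("--active --no-dry-run -vvvv".split())
Namespace(value=None, thing=None, active=True, dry_run=False, verbose=4)
The defaults are plainer:
PARSER.parse_args("".split())
Namespace(value=None, thing=None, active=False, dry_run=True, verbose=None)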
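Because a counter that is never passed stays None, code that compares the verbosity level against a number has to handle that case. A minimal sketch, assuming that giving the counter default=0 is acceptable for the program; the describe helper is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--active", action="store_true")
parser.add_argument("--no-dry-run", action="store_false", dest="dry_run")
# default=0 keeps verbose an int even when -v is never passed
parser.add_argument("--verbose", "-v", action="count", default=0)


def describe(argv):
    args = parser.parse_args(argv)
    mode = "dry run" if args.dry_run else "real run"
    return f"active={args.active}, {mode}, verbosity={args.verbose}"


print(describe([]))
print(describe(["--active", "--no-dry-run", "-vv"]))
```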
Subcommands
Classic Unix commands follow the principle "do one thing and do it well", but the modern trend is to group several closely related operations into one command.
git, podman, and kubectl show how popular this pattern has become. The argparse library supports it too:
MULTI_PARSER = argparse.ArgumentParser()
subparsers = MULTI_PARSER.add_subparsers()
get = subparsers.add_parser("get")
get.add_argument("--name")
get.set_defaults(command="get")
search = subparsers.add_parser("search")
search.add_argument("--query")
search.set_defaults(command="search")
MULTI_PARSER.parse_args("get --name awesome-name".split())
Namespace(name='awesome-name', command='get')
MULTI_PARSER.parse_args("search --query name~awesome".split())
Namespace(query='name~awesome', command='search')
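A common next step is to attach a handler function to each subcommand instead of a string, so the program can dispatch without an if/elif chain. This is a sketch of that variation, not the article's own code; do_get, do_search, and run are hypothetical names, and required=True on add_subparsers needs Python 3.7 or later.

```python
import argparse


def do_get(args):
    return f"getting {args.name}"


def do_search(args):
    return f"searching for {args.query}"


def build_parser():
    parser = argparse.ArgumentParser()
    # required=True makes argparse reject a call with no subcommand
    subparsers = parser.add_subparsers(dest="command", required=True)

    get = subparsers.add_parser("get")
    get.add_argument("--name")
    get.set_defaults(func=do_get)

    search = subparsers.add_parser("search")
    search.add_argument("--query")
    search.set_defaults(func=do_search)
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    # Each subparser stored its own handler in args.func
    return args.func(args)


print(run("get --name awesome-name".split()))
```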
Program architecture
One way to use argparse is the following structure:
## my_package/__main__.py
import sys

from my_package import toplevel

parsed_arguments = toplevel.PARSER.parse_args(sys.argv[1:])
toplevel.main(parsed_arguments)

## my_package/toplevel.py
import argparse

PARSER = argparse.ArgumentParser()
## .add_argument, etc.


def main(parsed_args):
    ...  # do stuff with parsed_args
With this layout, run the program with python -m my_package. Alternatively, declare a console_scripts entry point so that a command is created when the package is installed.
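This structure also fits the earlier goal of running without a command line: if main accepts an optional argument list, an IDE session or a test can call it directly. A minimal sketch under that assumption; the --value option and the function names are placeholders, not part of the article.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="my_package")
    parser.add_argument("--value", default="nothing")
    return parser


def main(argv=None):
    # With argv=None, argparse falls back to sys.argv[1:]
    args = build_parser().parse_args(argv)
    return f"value is {args.value}"


# Called from an IDE or a test, with no real command line involved
print(main(["--value", "some-value"]))
```

In an entry-point script, main() would be called with no argument so that the real command line is used.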
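Summary
The argparse module is a powerful command-line argument parser with many more features than could be covered here. It handles most of what a command-line tool is likely to need.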
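Extracting archives in Python
If a zip is uploaded from the front end and only the extracted folder should be kept on the server, extract first and then save (a file path only exists once something has been saved). The uploaded zip can be handled as bytes in memory.
The following uses the contents of a spam.zip file created beforehand (the "previous step" of the referenced article). It assumes the input is bytes, then inspects the file information and contents of every entry.
import io
import os
import zipfile


def read_zipfiles(path, folder=''):
    # Walk a zipfile.Path recursively and print each file's name and text
    for member in path.iterdir():
        filename = os.path.join(folder, member.name)
        if member.is_file():
            print(filename, ':', member.read_text())  # or member.read_bytes()
        else:
            read_zipfiles(member, filename)


with open('spam.zip', 'rb') as myzip:
    zip_data = myzip.read()

with zipfile.ZipFile(io.BytesIO(zip_data)) as zip_file:
    read_zipfiles(zipfile.Path(zip_file))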
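To actually keep the extracted folder on the server, the same in-memory bytes can be passed to extractall. A minimal sketch, assuming the upload has already been read into bytes; extract_zip_bytes and make_sample_zip are hypothetical helpers, and the explicit path check is an extra precaution added here rather than something the original notes do.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_bytes(zip_data, target_dir):
    """Extract an in-memory zip into target_dir and return the entry names."""
    target_root = os.path.realpath(target_dir)
    with zipfile.ZipFile(io.BytesIO(zip_data)) as zip_file:
        for name in zip_file.namelist():
            # Reject entries that would resolve to a location outside target_dir
            destination = os.path.realpath(os.path.join(target_root, name))
            if os.path.commonpath([target_root, destination]) != target_root:
                raise ValueError(f"unsafe path in archive: {name}")
        zip_file.extractall(target_root)
        return zip_file.namelist()


def make_sample_zip():
    # Build a small archive in memory to stand in for the uploaded bytes
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zip_file:
        zip_file.writestr('spam/eggs.txt', 'hello')
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as demo_dir:
    extracted = extract_zip_bytes(make_sample_zip(), demo_dir)
    print(extracted)
```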
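Reference: Python zipfile, compressing and extracting using only memory
# Handle the uploaded archive
if file and allowed_file(file.filename):
    filename = secure_filename(file.filename)
    # Save the archive under the project path (this assumes UPLOAD_FOLDER is base_dir)
    file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    local_dir = os.path.join(base_dir, '11')  # new directory that will hold the extracted files
    hh = os.path.join(base_dir, filename)  # path of the archive, e.g. C:/Code/haha.zip
    print(hh)
    print(local_dir)
    # Extract the archive into the directory chosen above
    shutil.unpack_archive(filename=hh, extract_dir=local_dir)
    os.remove(hh)  # finally, delete the archive
Reference: Flask file upload and zip upload example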
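Error when saving a file such as a jpg to a given folder in Python
dst = open(dst, "wb")
```python
from PIL import Image
import os

# Open the image
image = Image.open('example.jpg')
# Save the image to the target folder, creating the folder first if needed
if not os.path.exists('new_folder'):
    os.makedirs('new_folder')
image.save('new_folder/example_new.jpg')
```
The code above uses the os module to create a new folder, new_folder, and saves the image into it.
My code raises an error.
Reference: python - IOError: Errno 13 Permission denied for specific files
That case is somewhat similar to mine, but I did not understand it.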
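One common cause of an error at open(dst, "wb") is that dst names a folder rather than a file. This is an assumption about the failure, since the original traceback is not shown. The sketch below reproduces that situation with a temporary folder; try_open_for_write is a hypothetical helper. The exception is typically IsADirectoryError on Linux and PermissionError (Errno 13) on Windows.

```python
import os
import tempfile


def try_open_for_write(path):
    """Return None on success, or the name of the OSError subclass raised."""
    try:
        with open(path, "wb") as dst:
            dst.write(b"data")
    except OSError as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as folder:
    # Passing the folder itself fails: a directory cannot be opened for writing
    error_for_folder = try_open_for_write(folder)
    # Joining a file name onto the folder gives a writable destination
    error_for_file = try_open_for_write(os.path.join(folder, "example_new.jpg"))

print(error_for_folder, error_for_file)
```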
A mess
if suffix.lower() in ['jpg', 'png', 'jpeg']:
    # Earlier attempts, kept for reference:
    # uploaded_file.save(image_folder + uploaded_file.filename.split('.')[-2])
    # image_folder = image_folder + uploaded_file.filename.split('.')[-2]
    save_path = image_folder + uploaded_file.filename.split('.')[-2]
    # uploaded_file.save(save_path + uploaded_file.filename)
    # uploaded_file.save('./images/hhh/' + uploaded_file.filename)
    # image_folder = image_folder + 'hhh/'
    # save_path = image_folder + uploaded_file.filename.split('.')[-2] + '/' + uploaded_file.filename.split('.')[-2] + '.'
    # print(save_path)
    # uploaded_file.save(save_path + suffix.lower())
    print(uploaded_file.filename)
    print(type(uploaded_file.filename))
    save_path = os.path.join(save_path, uploaded_file.filename)
    print(save_path)
    # Note: this writes into the current directory, not into save_path
    with open(uploaded_file.filename, 'wb') as f:
        print('222')
        # f.write() needs bytes; passing the FileStorage object itself raises TypeError
        f.write(uploaded_file.read())
        print('111')
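Most of the confusion above comes from assembling the destination by string concatenation. A minimal sketch of a helper that builds image_folder/<stem>/<filename> and creates the folder first; build_save_path is a hypothetical name, and using the file stem as the folder name follows the intent of the code above.

```python
import os
import tempfile
from pathlib import PurePath


def build_save_path(image_folder, filename):
    """Return image_folder/<stem>/<filename>, creating the folder if needed."""
    name = PurePath(filename).name   # drop any directory part (native separators)
    stem = PurePath(name).stem       # 'scan.v2.jpg' -> 'scan.v2'
    folder = os.path.join(image_folder, stem)
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, name)


with tempfile.TemporaryDirectory() as demo_folder:
    demo_path = build_save_path(demo_folder, 'scan.v2.jpg')
    demo_relative = os.path.relpath(demo_path, demo_folder)

print(demo_relative)
```

Note that filename.split('.')[-2] gives 'v2' for 'scan.v2.jpg', whereas the stem keeps 'scan.v2'. The returned path can then be passed to uploaded_file.save().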
# from paddleocr import PaddleOCR
import os
import sys
import importlib

__dir__ = os.path.dirname(__file__)
sys.path.append(os.path.join(__dir__, ''))

import cv2
import logging
import numpy as np
from pathlib import Path
import base64  # used by the fallback decoding path in check_img
from io import BytesIO  # used by the fallback decoding path in check_img
from PIL import Image


def _import_file(module_name, file_path, make_importable=False):
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if make_importable:
        sys.modules[module_name] = module
    return module


tools = _import_file('tools', os.path.join(__dir__, 'tools/__init__.py'), make_importable=True)
ppocr = importlib.import_module('ppocr', 'paddleocr')
ppstructure = importlib.import_module('ppstructure', 'paddleocr')
from ppocr.utils.logging import get_logger
from tools.infer import predict_system
from ppocr.utils.utility import check_and_read, get_image_file_list, alpha_to_color, binarize_img
from ppocr.utils.network import maybe_download, download_with_progressbar, is_link, confirm_model_dir_url
from tools.infer.utility import draw_ocr, str2bool, check_gpu
from ppstructure.utility import init_args, draw_structure_result
from ppstructure.predict_system import StructureSystem, save_structure_res, to_excel

logger = get_logger()
__all__ = [
    'PaddleOCR', 'PPStructure', 'draw_ocr', 'draw_structure_result',
    'save_structure_res', 'download_with_progressbar', 'to_excel'
]

SUPPORT_DET_MODEL = ['DB']
VERSION = '2.7.0.3'
SUPPORT_REC_MODEL = ['CRNN', 'SVTR_LCNet']
BASE_DIR = os.path.expanduser("~/.paddleocr/")

DEFAULT_OCR_MODEL_VERSION = 'PP-OCRv4'
SUPPORT_OCR_MODEL_VERSION = ['PP-OCR', 'PP-OCRv2', 'PP-OCRv3', 'PP-OCRv4']
DEFAULT_STRUCTURE_MODEL_VERSION = 'PP-StructureV2'
SUPPORT_STRUCTURE_MODEL_VERSION = ['PP-Structure', 'PP-StructureV2']
MODEL_URLS = {
    'OCR': {
        'PP-OCRv4': {
            'det': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_infer.tar'},
                'en': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar'},
                'ml': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar'}
            },
            'rec': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_rec_infer.tar',
                       'dict_path': './ppocr/utils/ppocr_keys_v1.txt'},
                'en': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/english/en_PP-OCRv4_rec_infer.tar',
                       'dict_path': './ppocr/utils/en_dict.txt'},
                'korean': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/korean_PP-OCRv4_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/korean_dict.txt'},
                'japan': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/japan_PP-OCRv4_rec_infer.tar',
                          'dict_path': './ppocr/utils/dict/japan_dict.txt'},
                'chinese_cht': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar',
                                'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'},
                'ta': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/ta_PP-OCRv4_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/ta_dict.txt'},
                'te': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/te_PP-OCRv4_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/te_dict.txt'},
                'ka': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/ka_PP-OCRv4_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/ka_dict.txt'},
                'latin': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar',
                          'dict_path': './ppocr/utils/dict/latin_dict.txt'},
                'arabic': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/arabic_PP-OCRv4_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/arabic_dict.txt'},
                'cyrillic': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar',
                             'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'},
                'devanagari': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv4/multilingual/devanagari_PP-OCRv4_rec_infer.tar',
                               'dict_path': './ppocr/utils/dict/devanagari_dict.txt'},
            },
            'cls': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar'}
            },
        },
        'PP-OCRv3': {
            'det': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar'},
                'en': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_infer.tar'},
                'ml': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/Multilingual_PP-OCRv3_det_infer.tar'}
            },
            'rec': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar',
                       'dict_path': './ppocr/utils/ppocr_keys_v1.txt'},
                'en': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_rec_infer.tar',
                       'dict_path': './ppocr/utils/en_dict.txt'},
                'korean': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/korean_PP-OCRv3_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/korean_dict.txt'},
                'japan': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/japan_PP-OCRv3_rec_infer.tar',
                          'dict_path': './ppocr/utils/dict/japan_dict.txt'},
                'chinese_cht': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/chinese_cht_PP-OCRv3_rec_infer.tar',
                                'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'},
                'ta': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ta_PP-OCRv3_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/ta_dict.txt'},
                'te': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/te_PP-OCRv3_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/te_dict.txt'},
                'ka': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/ka_PP-OCRv3_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/ka_dict.txt'},
                'latin': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/latin_PP-OCRv3_rec_infer.tar',
                          'dict_path': './ppocr/utils/dict/latin_dict.txt'},
                'arabic': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/arabic_dict.txt'},
                'cyrillic': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/cyrillic_PP-OCRv3_rec_infer.tar',
                             'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'},
                'devanagari': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/devanagari_PP-OCRv3_rec_infer.tar',
                               'dict_path': './ppocr/utils/dict/devanagari_dict.txt'},
            },
            'cls': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar'}
            },
        },
        'PP-OCRv2': {
            'det': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_infer.tar'},
            },
            'rec': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_infer.tar',
                       'dict_path': './ppocr/utils/ppocr_keys_v1.txt'}
            },
            'cls': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar'}
            },
        },
        'PP-OCR': {
            'det': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar'},
                'en': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_ppocr_mobile_v2.0_det_infer.tar'},
                'structure': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_det_infer.tar'}
            },
            'rec': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar',
                       'dict_path': './ppocr/utils/ppocr_keys_v1.txt'},
                'en': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_infer.tar',
                       'dict_path': './ppocr/utils/en_dict.txt'},
                'french': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/french_mobile_v2.0_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/french_dict.txt'},
                'german': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/german_mobile_v2.0_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/german_dict.txt'},
                'korean': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/korean_mobile_v2.0_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/korean_dict.txt'},
                'japan': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/japan_mobile_v2.0_rec_infer.tar',
                          'dict_path': './ppocr/utils/dict/japan_dict.txt'},
                'chinese_cht': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/chinese_cht_mobile_v2.0_rec_infer.tar',
                                'dict_path': './ppocr/utils/dict/chinese_cht_dict.txt'},
                'ta': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ta_mobile_v2.0_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/ta_dict.txt'},
                'te': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/te_mobile_v2.0_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/te_dict.txt'},
                'ka': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/ka_mobile_v2.0_rec_infer.tar',
                       'dict_path': './ppocr/utils/dict/ka_dict.txt'},
                'latin': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/latin_ppocr_mobile_v2.0_rec_infer.tar',
                          'dict_path': './ppocr/utils/dict/latin_dict.txt'},
                'arabic': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/arabic_ppocr_mobile_v2.0_rec_infer.tar',
                           'dict_path': './ppocr/utils/dict/arabic_dict.txt'},
                'cyrillic': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/cyrillic_ppocr_mobile_v2.0_rec_infer.tar',
                             'dict_path': './ppocr/utils/dict/cyrillic_dict.txt'},
                'devanagari': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/devanagari_ppocr_mobile_v2.0_rec_infer.tar',
                               'dict_path': './ppocr/utils/dict/devanagari_dict.txt'},
                'structure': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_rec_infer.tar',
                              'dict_path': 'ppocr/utils/dict/table_dict.txt'}
            },
            'cls': {
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar'}
            },
        }
    },
    'STRUCTURE': {
        'PP-Structure': {
            'table': {
                'en': {'url': 'https://paddleocr.bj.bcebos.com/dygraph_v2.0/table/en_ppocr_mobile_v2.0_table_structure_infer.tar',
                       'dict_path': 'ppocr/utils/dict/table_structure_dict.txt'}
            }
        },
        'PP-StructureV2': {
            'table': {
                'en': {'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_infer.tar',
                       'dict_path': 'ppocr/utils/dict/table_structure_dict.txt'},
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/ch_ppstructure_mobile_v2.0_SLANet_infer.tar',
                       'dict_path': 'ppocr/utils/dict/table_structure_dict_ch.txt'}
            },
            'layout': {
                'en': {'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_infer.tar',
                       'dict_path': 'ppocr/utils/dict/layout_dict/layout_publaynet_dict.txt'},
                'ch': {'url': 'https://paddleocr.bj.bcebos.com/ppstructure/models/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer.tar',
                       'dict_path': 'ppocr/utils/dict/layout_dict/layout_cdla_dict.txt'}
            }
        }
    }
}


def parse_args(mMain=True):
    import argparse
    parser = init_args()
    parser.add_help = mMain
    parser.add_argument("--lang", type=str, default='ch')
    parser.add_argument("--det", type=str2bool, default=True)
    parser.add_argument("--rec", type=str2bool, default=True)
    parser.add_argument("--type", type=str, default='ocr')
    parser.add_argument(
        "--ocr_version",
        type=str,
        choices=SUPPORT_OCR_MODEL_VERSION,
        default='PP-OCRv4',
        help='OCR Model version, the current model support list is as follows: '
        '1. PP-OCRv4/v3 Support Chinese and English detection and recognition model, and direction classifier model'
        '2. PP-OCRv2 Support Chinese detection and recognition model. '
        '3. PP-OCR support Chinese detection, recognition and direction classifier and multilingual recognition model.'
    )
    parser.add_argument(
        "--structure_version",
        type=str,
        choices=SUPPORT_STRUCTURE_MODEL_VERSION,
        default='PP-StructureV2',
        help='Model version, the current model support list is as follows:'
        ' 1. PP-Structure Support en table structure model.'
        ' 2. PP-StructureV2 Support ch and en table structure model.')

    for action in parser._actions:
        if action.dest in ['rec_char_dict_path', 'table_char_dict_path', 'layout_dict_path']:
            action.default = None
    if mMain:
        return parser.parse_args()
    else:
        inference_args_dict = {}
        for action in parser._actions:
            inference_args_dict[action.dest] = action.default
        return argparse.Namespace(**inference_args_dict)


def parse_lang(lang):
    latin_lang = [
        'af', 'az', 'bs', 'cs', 'cy', 'da', 'de', 'es', 'et', 'fr', 'ga', 'hr',
        'hu', 'id', 'is', 'it', 'ku', 'la', 'lt', 'lv', 'mi', 'ms', 'mt', 'nl',
        'no', 'oc', 'pi', 'pl', 'pt', 'ro', 'rs_latin', 'sk', 'sl', 'sq', 'sv',
        'sw', 'tl', 'tr', 'uz', 'vi', 'french', 'german'
    ]
    arabic_lang = ['ar', 'fa', 'ug', 'ur']
    cyrillic_lang = [
        'ru', 'rs_cyrillic', 'be', 'bg', 'uk', 'mn', 'abq', 'ady', 'kbd', 'ava',
        'dar', 'inh', 'che', 'lbe', 'lez', 'tab'
    ]
    devanagari_lang = [
        'hi', 'mr', 'ne', 'bh', 'mai', 'ang', 'bho', 'mah', 'sck', 'new', 'gom',
        'sa', 'bgc'
    ]
    if lang in latin_lang:
        lang = "latin"
    elif lang in arabic_lang:
        lang = "arabic"
    elif lang in cyrillic_lang:
        lang = "cyrillic"
    elif lang in devanagari_lang:
        lang = "devanagari"
    assert lang in MODEL_URLS['OCR'][DEFAULT_OCR_MODEL_VERSION]['rec'], 'param lang must in {}, but got {}'.format(
        MODEL_URLS['OCR'][DEFAULT_OCR_MODEL_VERSION]['rec'].keys(), lang)
    if lang == "ch":
        det_lang = "ch"
    elif lang == 'structure':
        det_lang = 'structure'
    elif lang in ["en", "latin"]:
        det_lang = "en"
    else:
        det_lang = "ml"
    return lang, det_lang


def get_model_config(type, version, model_type, lang):
    if type == 'OCR':
        DEFAULT_MODEL_VERSION = DEFAULT_OCR_MODEL_VERSION
    elif type == 'STRUCTURE':
        DEFAULT_MODEL_VERSION = DEFAULT_STRUCTURE_MODEL_VERSION
    else:
        raise NotImplementedError

    model_urls = MODEL_URLS[type]
    if version not in model_urls:
        version = DEFAULT_MODEL_VERSION
    if model_type not in model_urls[version]:
        if model_type in model_urls[DEFAULT_MODEL_VERSION]:
            version = DEFAULT_MODEL_VERSION
        else:
            logger.error('{} models is not support, we only support {}'.format(
                model_type, model_urls[DEFAULT_MODEL_VERSION].keys()))
            sys.exit(-1)

    if lang not in model_urls[version][model_type]:
        if lang in model_urls[DEFAULT_MODEL_VERSION][model_type]:
            version = DEFAULT_MODEL_VERSION
        else:
            logger.error('lang {} is not support, we only support {} for {} models'.format(
                lang, model_urls[DEFAULT_MODEL_VERSION][model_type].keys(), model_type))
            sys.exit(-1)
    return model_urls[version][model_type][lang]


def img_decode(content: bytes):
    np_arr = np.frombuffer(content, dtype=np.uint8)
    return cv2.imdecode(np_arr, cv2.IMREAD_UNCHANGED)


def check_img(img):
    if isinstance(img, bytes):
        img = img_decode(img)
    if isinstance(img, str):
        # download net image
        if is_link(img):
            download_with_progressbar(img, 'tmp.jpg')
            img = 'tmp.jpg'
        image_file = img
        img, flag_gif, flag_pdf = check_and_read(image_file)
        if not flag_gif and not flag_pdf:
            with open(image_file, 'rb') as f:
                img_str = f.read()
                img = img_decode(img_str)
            if img is None:
                try:
                    buf = BytesIO()
                    image = BytesIO(img_str)
                    im = Image.open(image)
                    rgb = im.convert('RGB')
                    rgb.save(buf, 'jpeg')
                    buf.seek(0)
                    image_bytes = buf.read()
                    data_base64 = str(base64.b64encode(image_bytes), encoding="utf-8")
                    image_decode = base64.b64decode(data_base64)
                    img_array = np.frombuffer(image_decode, np.uint8)
                    img = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
                except:
                    logger.error("error in loading image:{}".format(image_file))
                    return None
        if img is None:
            logger.error("error in loading image:{}".format(image_file))
            return None
    if isinstance(img, np.ndarray) and len(img.shape) == 2:
        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    return img


class PaddleOCR(predict_system.TextSystem):
    def __init__(self, **kwargs):
        """
        paddleocr package
        args:
            **kwargs: other params show in paddleocr --help
        """
        params = parse_args(mMain=False)
        params.__dict__.update(**kwargs)
        assert params.ocr_version in SUPPORT_OCR_MODEL_VERSION, "ocr_version must in {}, but get {}".format(
            SUPPORT_OCR_MODEL_VERSION, params.ocr_version)
        params.use_gpu = check_gpu(params.use_gpu)

        if not params.show_log:
            logger.setLevel(logging.INFO)
        self.use_angle_cls = params.use_angle_cls
        lang, det_lang = parse_lang(params.lang)

        # init model dir
        det_model_config = get_model_config('OCR', params.ocr_version, 'det', det_lang)
        params.det_model_dir, det_url = confirm_model_dir_url(
            params.det_model_dir,
            os.path.join(BASE_DIR, 'whl', 'det', det_lang),
            det_model_config['url'])
        rec_model_config = get_model_config('OCR', params.ocr_version, 'rec', lang)
        params.rec_model_dir, rec_url = confirm_model_dir_url(
            params.rec_model_dir,
            os.path.join(BASE_DIR, 'whl', 'rec', lang), rec_model_config['url'])
        cls_model_config = get_model_config('OCR', params.ocr_version, 'cls', 'ch')
        params.cls_model_dir, cls_url = confirm_model_dir_url(
            params.cls_model_dir,
            os.path.join(BASE_DIR, 'whl', 'cls'), cls_model_config['url'])
        if params.ocr_version in ['PP-OCRv3', 'PP-OCRv4']:
            params.rec_image_shape = "3, 48, 320"
        else:
            params.rec_image_shape = "3, 32, 320"
        # download model if using paddle infer
        if not params.use_onnx:
            maybe_download(params.det_model_dir, det_url)
            maybe_download(params.rec_model_dir, rec_url)
            maybe_download(params.cls_model_dir, cls_url)

        if params.det_algorithm not in SUPPORT_DET_MODEL:
            logger.error('det_algorithm must in {}'.format(SUPPORT_DET_MODEL))
            sys.exit(0)
        if params.rec_algorithm not in SUPPORT_REC_MODEL:
            logger.error('rec_algorithm must in {}'.format(SUPPORT_REC_MODEL))
            sys.exit(0)

        if params.rec_char_dict_path is None:
            params.rec_char_dict_path = str(Path(__file__).parent / rec_model_config['dict_path'])

        logger.debug(params)
        # init det_model and rec_model
        super().__init__(params)
        self.page_num = params.page_num

    def ocr(self, img, det=True, rec=True, cls=True, bin=False, inv=False, alpha_color=(255, 255, 255)):
        """
        OCR with PaddleOCR
        args:
            img: img for OCR, support ndarray, img_path and list or ndarray
            det: use text detection or not. If False, only rec will be exec. Default is True
            rec: use text recognition or not. If False, only det will be exec. Default is True
            cls: use angle classifier or not. Default is True. If True, the text with rotation of 180 degrees can be recognized. If no text is rotated by 180 degrees, use cls=False to get better performance. Text with rotation of 90 or 270 degrees can be recognized even if cls=False.
            bin: binarize image to black and white. Default is False.
            inv: invert image colors. Default is False.
            alpha_color: set RGB color Tuple for transparent parts replacement. Default is pure white.
        """
        assert isinstance(img, (np.ndarray, list, str, bytes))
        if isinstance(img, list) and det == True:
            logger.error('When input a list of images, det must be false')
            exit(0)
        if cls == True and self.use_angle_cls == False:
            logger.warning('Since the angle classifier is not initialized, it will not be used during the forward process')

        img = check_img(img)
        # for infer pdf file
        if isinstance(img, list):
            if self.page_num > len(img) or self.page_num == 0:
                self.page_num = len(img)
            imgs = img[:self.page_num]
        else:
            imgs = [img]

        def preprocess_image(_image):
            _image = alpha_to_color(_image, alpha_color)
            if inv:
                _image = cv2.bitwise_not(_image)
            if bin:
                _image = binarize_img(_image)
            return _image

        if det and rec:
            ocr_res = []
            for idx, img in enumerate(imgs):
                img = preprocess_image(img)
                dt_boxes, rec_res, _ = self.__call__(img, cls)
                if not dt_boxes and not rec_res:
                    ocr_res.append(None)
                    continue
                tmp_res = [[box.tolist(), res] for box, res in zip(dt_boxes, rec_res)]
                ocr_res.append(tmp_res)
            return ocr_res
        elif det and not rec:
            ocr_res = []
            for idx, img in enumerate(imgs):
                img = preprocess_image(img)
                dt_boxes, elapse = self.text_detector(img)
                if not dt_boxes:
                    ocr_res.append(None)
                    continue
                tmp_res = [box.tolist() for box in dt_boxes]
                ocr_res.append(tmp_res)
            return ocr_res
        else:
            ocr_res = []
            cls_res = []
            for idx, img in enumerate(imgs):
                if not isinstance(img, list):
                    img = preprocess_image(img)
                    img = [img]
                if self.use_angle_cls and cls:
                    img, cls_res_tmp, elapse = self.text_classifier(img)
                    if not rec:
                        cls_res.append(cls_res_tmp)
                rec_res, elapse = self.text_recognizer(img)
                ocr_res.append(rec_res)
            if not rec:
                return cls_res
            return ocr_res


import json
import os
import io
import zipfile
import shutil


class Result:
    def __init__(self, id, value):
        self.id = id
        self.value = value


def result_encoder(obj):
    if isinstance(obj, Result):
        return {'id': obj.id, 'PaddleOCR': obj.value}
    return json.JSONEncoder.default(obj)


import paddle

paddle.disable_signal_handler()  # the disable_signal_handler API is available since version 2.2

from flask import Flask, request

app = Flask(__name__)


@app.route('/OCR', methods=['GET', 'POST'])
def fun():
    print(request.files)
    uploaded_file = request.files['file']
    if not uploaded_file:
        return {'error': 'No file is provided'}
    suffix = uploaded_file.filename.split('.')[-1]  # file extension
    # The extension can also be used to filter the accepted file types:
    if suffix.lower() not in ['jpg', 'png', 'jpeg', 'zip']:
        return {'error': 'The uploaded file type is invalid'}
    image_folder = './images/'
    if not os.path.exists(image_folder):
        os.makedirs(image_folder)
    if suffix.lower() in ['jpg', 'png', 'jpeg']:
        # uploaded_file.save(image_folder + uploaded_file.filename.split('.')[-2])
        # image_folder = image_folder + uploaded_file.filename.split('.')[-2]
        save_path = image_folder + uploaded_file.filename.split('.')[-2]
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        uploaded_file.save(os.path.join(save_path, uploaded_file.filename))
        image_folder = save_path
    else:
        zip_buffer = io.BytesIO(uploaded_file.read())
        with zipfile.ZipFile(zip_buffer, 'r') as zip_ref:
            zip_ref.extractall(image_folder)  # extract into the target folder
        save_path = image_folder + uploaded_file.filename.split('.')[-2]
        # uploaded_file.save(image_folder+'/'+uploaded_file.filename)
        # with zipfile.ZipFile('/data1/xyj/PaddleOCR/images/app_test.zip', 'r') as zip_ref:
        #     zip_ref.extractall(image_folder)  # extract into the target folder
        # image_folder = save_path
        # with zipfile.ZipFile('/data1/xyj/PaddleOCR/images/app_test.zip', 'r') as zip_ref:
        #     for member in zip_ref.infolist():
        #         zip_ref.extract(member.filename, image_folder)
        # image_folder = save_path
        # shutil.unpack_archive('/data1/xyj/PaddleOCR/images/app_test.zip', image_folder, 'zip')
        image_folder = save_path
    # image_folder = "/data1/xyj/datasets/zh_test"
    # image_folder = request.json['image_folder']
    output_path = 'outputs/'
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    output_path = os.path.join(output_path, "zh_test_PaddleOCR.json")
    if (os.path.exists(output_path)):
        os.remove(output_path)
    # The languages PaddleOCR supports can be switched through the lang parameter,
    # for example `ch`, `en`, `fr`, `german`, `korean`, `japan`
    ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # need to run only once to download and load model into memory
    ans = {}
    for filename in os.listdir(image_folder):
        img_path = os.path.join(image_folder, filename)
        result = ocr.ocr(img_path, cls=True)
        for res in result:
            outputs = ''
            if res is not None:
                for line in res:
                    outputs = outputs + line[1][0] + ' '
            res = Result(filename, outputs)
            with open(output_path, "a", encoding="utf8") as file:
                json.dump(result_encoder(res), file, ensure_ascii=False, indent=4)
            ans[filename] = outputs
    # Convert the results to a JSON string
    json_data = json.dumps(ans, ensure_ascii=False)
    # Write the JSON string to a file (UTF-8, because ensure_ascii=False keeps non-ASCII text)
    with open("data.json", "w", encoding="utf8") as file:
        file.write(json_data)
    return {'result': json_data}


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5009)
app_ppocr.py
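The text-collection loop in the endpoint can be tried without loading any model. A minimal sketch on made-up data, assuming each recognized line has the shape [box, (text, score)] as implied by the line[1][0] access in the code above; join_recognized_text and the sample values are hypothetical.

```python
import json


def join_recognized_text(page_result):
    """Concatenate the text of every line in one page of OCR output."""
    if page_result is None:  # ocr() appends None for a page with no detections
        return ''
    # Each line is [box, (text, score)]; line[1][0] is the recognized text
    return ' '.join(line[1][0] for line in page_result)


# Hypothetical OCR output for one image consisting of a single page
sample_result = [[
    [[[0, 0], [10, 0], [10, 5], [0, 5]], ('hello', 0.99)],
    [[[0, 6], [10, 6], [10, 11], [0, 11]], ('world', 0.95)],
]]

answers = {'demo.jpg': ' '.join(join_recognized_text(page) for page in sample_result)}
json_data = json.dumps(answers, ensure_ascii=False)
print(json_data)
```

Unlike the string concatenation in the endpoint, ' '.join leaves no trailing space after the last line.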