目录
- 背景
- 尝试1: pymysql模块的escape_string方法
- 尝试2: 修改pandas.read_excel引擎
- 尝试3: 回退xlrd版本
- 总结
背景
在Linux上部署的时候, 使用pandas
模块读取Excel, 然后pymysql
模块入库, 结果发生了错误
Traceback (most recent call last):File "/usr/local/lib64/python3.6/site-packages/tornado/web.py", line 1702, in _executeresult = method(*self.path_args, **self.path_kwargs)File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1135, in postself._replyPrivateMessage(message)File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1221, in _replyPrivateMessageself._junming(filepath)File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1263, in _junmingsdk.mysql(config.db_qihua).insmany('qihua.log_bjmantis_cost',data_db)File "/root/.jenkins/workspace/git_python_web_linux/sdk.py", line 65, in insmanyself._cur.executemany(sqlInsert,listTuple)File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 182, in executemanyself.rowcount = sum(self.execute(query, arg) for arg in args)File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 182, in <genexpr>self.rowcount = sum(self.execute(query, arg) for arg in args)File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 146, in executequery = self.mogrify(query, args)File "/usr/local/lib64/python3.6/site-packages/pymysql/cursors.py", line 125, in mogrifyquery = query % self._escape_args(args, conn)TypeError: not all arguments converted during string formatting
尝试1: pymysql模块的escape_string方法
from pymysql.converters import escape_string
...
data_database = [escape_string(x) for x in data]
...
还是报错, 所以应该不是字符的问题.
打印出来肉眼观察, 也没有字符问题
尝试2: 修改pandas.read_excel引擎
修改引擎, 是可以解决字符问题, 但出现了新的问题
...
excel = pandas.read_excel(filepath, engine='openpyxl')
...
正常入库. 但是, 无法读取xls
文件
Traceback (most recent call last):File "/usr/local/lib64/python3.6/site-packages/tornado/web.py", line 1702, in _executeresult = method(*self.path_args, **self.path_kwargs)File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1135, in postself._replyPrivateMessage(message)File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1224, in _replyPrivateMessageself._qingzhilan(filepath)File "/root/.jenkins/workspace/git_python_web_linux/handler.py", line 1296, in _qingzhilandf = pandas.read_excel(filepath,sheet_name=['Sheet1'], engine='openpyxl')File "/usr/local/lib64/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapperreturn func(*args, **kwargs)File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_base.py", line 304, in read_excelio = ExcelFile(io, engine=engine)File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_base.py", line 867, in __init__self._reader = self._engines[engine](self._io)File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_openpyxl.py", line 480, in __init__super().__init__(filepath_or_buffer)File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_base.py", line 353, in __init__self.book = self.load_workbook(filepath_or_buffer)File "/usr/local/lib64/python3.6/site-packages/pandas/io/excel/_openpyxl.py", line 492, in load_workbookfilepath_or_buffer, read_only=True, data_only=True, keep_links=FalseFile "/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 345, in load_workbookdata_only, keep_links, rich_text)File "/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 123, in __init__self.archive = _validate_archive(fn)File "/usr/local/lib/python3.6/site-packages/openpyxl/reader/excel.py", line 93, in _validate_archiveraise InvalidFileException(msg)openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
尝试3: 回退xlrd版本
注: Linux平台上默认应该用pip3
替代pip
pip list | grep xlrd # 查看现有xlrd版本
# pip list | findstr 'xlrd' ## Windows平台查包
pip uninstall xlrd # 卸载包
pip install xlrd==1.2.0 # 安装包
回退到1.2.0版本后,我之前遇到的错误消失了。我的代码再次可以正常运行,我能够继续我的数据分析工作。
总结
Linux平台使用pandas.read_excel
并且可能会有xls
和xlsx
格式的话, 建议回退xlrd
版本到1.2.0来解决字符和格式问题