Environment: hardware configuration and Hadoop/Hive versions
1. Installation steps
pip install pure-sasl
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/16/83/30eaf3765de89808375a8358f9c15d45a3dd44ed26be991471abc0b4480b/pure_sasl-0.5.1-py2.py3-none-any.whl
pip install thrift_sasl==0.2.1 --no-deps
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/80/36/16dfe92d32df63cc2b7b7be8d0e4a736617b7e52daaa7f83ae386a89d179/thrift_sasl-0.2.1.tar.gz
pip install thrift==0.9.3
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ae/58/35e3f0cd290039ff862c2c9d8ae8a76896665d70343d833bdc2f748b8e55/thrift-0.9.3.tar.gz
pip install impyla (the packages installed above are all its dependencies)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/80/93/f0d226061ee4679d5b593c88c7b2e9e077a271c799d29facf31bf03666c1/impyla-0.14.1.tar.gz (151kB)
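Running pip install impyla failed with: error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools" (the message links to the Visual Studio downloads page, "Downloads | IDE, Code, & Team Foundation Server | Visual Studio").
Fix: I downloaded and installed Microsoft Visual C++ 2019 from that page, but the error persisted. Later I installed a toolkit that someone else had shared, and after that pip install impyla succeeded.
Link to that shared toolkit: I no longer remember it.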
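Because the steps above pin thrift_sasl to 0.2.1 and thrift to 0.9.3, it can be worth confirming what actually got installed before moving on. The following is only a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the EXPECTED table and the check_versions helper are names made up for this illustration, not part of impyla.

```python
from importlib import metadata

# Versions taken from the pip commands above; None means "any version is fine".
EXPECTED = {
    "pure-sasl": None,
    "thrift_sasl": "0.2.1",
    "thrift": "0.9.3",
    "impyla": None,
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, ok_flag)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        ok = installed is not None and (wanted is None or installed == wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "OK" if ok else "CHECK"
        print(f"{name}: {installed or 'not installed'} [{status}]")
```

Any line marked CHECK points at a package that is missing or at a different version than the one pinned above.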
2. Writing the script
# At this point we can write a script that connects to the database
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host='***', port=10000, auth_mechanism='PLAIN', user='***', password='***', database='***')
cursor = conn.cursor()
cursor.execute('show databases')
print(as_pandas(cursor))
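The script above leaves the cursor open. Since impyla follows the Python DB-API, the same query pattern can be wrapped so the cursor is always closed. This is a hedged sketch rather than the original script: run_query is a hypothetical helper, and sqlite3 stands in for Hive only so the example runs without a server.

```python
import sqlite3
from contextlib import closing


def run_query(conn, sql):
    """Run sql on a DB-API connection and return (column_names, rows)."""
    with closing(conn.cursor()) as cursor:  # cursor is closed even on error
        cursor.execute(sql)
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    return columns, rows


if __name__ == "__main__":
    # sqlite3 is only a stand-in here; with impyla, conn would come from
    # impala.dbapi.connect(...) as in the script above.
    with closing(sqlite3.connect(":memory:")) as conn:
        print(run_query(conn, "select 1 as one"))
```

With a real Hive connection the call would look like run_query(conn, 'show databases').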
3. Troubleshooting
After running the connection code, the following error appeared:
ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
The error is raised in Lib\site-packages\thriftpy\parser\parser.py, in this block:
    if url_scheme == '':
        with open(path) as fh:
            data = fh.read()
    elif url_scheme in ('http', 'https'):
        data = urlopen(path).read()
    else:
        raise ThriftParserError('ThriftPy does not support generating module '
                                'with path in protocol \'{}\''.format(
                                    url_scheme))
Change it to the following, so that Windows drive letters (c, d, e, f) are treated as local file paths:
    if url_scheme == '':
        with open(path) as fh:
            data = fh.read()
    elif url_scheme in ('c', 'd', 'e', 'f'):
        with open(path) as fh:
            data = fh.read()
    elif url_scheme in ('http', 'https'):
        data = urlopen(path).read()
    else:
        raise ThriftParserError('ThriftPy does not support generating module '
                                'with path in protocol \'{}\''.format(
                                    url_scheme))
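For context on where the protocol 'c' comes from: presumably url_scheme is obtained by running the .thrift file path through urlparse, and on Windows an absolute path such as C:\... parses with the drive letter as the scheme. The sketch below demonstrates that behaviour with the standard library only. The path is a made-up example and looks_like_windows_drive is a hypothetical helper showing a more general check than listing individual drive letters.

```python
from urllib.parse import urlparse


def looks_like_windows_drive(path):
    """True if urlparse reports a single-letter scheme, i.e. a drive letter."""
    scheme = urlparse(path).scheme
    return len(scheme) == 1 and scheme.isalpha()


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    example = r"C:\Python\Lib\site-packages\impala\thrift\example.thrift"
    print(urlparse(example).scheme)           # the drive letter, lowercased
    print(looks_like_windows_drive(example))
```

A check of this kind would also cover drives beyond c to f, which the hard-coded tuple in the patch does not.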
After running the connection code again, a second error appeared:
TypeError: can't concat str to bytes
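The last entry in the traceback points to line 94 of __init__.py (apparently in the thrift_sasl package, judging by the code):
...
    header = struct.pack(">BI", status, len(body))
    self._trans.write(header + body)
...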
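Change it to the following, so that a str body is encoded to bytes before being appended to the header:
...
    header = struct.pack(">BI", status, len(body))
    if isinstance(body, str):
        body = body.encode()
    self._trans.write(header + body)
...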
Running the connection code now succeeds.
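The second error comes down to struct.pack returning bytes, which cannot be concatenated with a str in Python 3. The sketch below reproduces the framing in isolation. frame_message is a hypothetical stand-in for the patched code, not the library's function. Unlike the patch above, it encodes before measuring the length, which only matters if the body contains non-ASCII characters.

```python
import struct


def frame_message(status, body):
    """Prefix body with a 1-byte status and a 4-byte big-endian length."""
    if isinstance(body, str):
        body = body.encode()  # struct.pack yields bytes, so body must be bytes too
    header = struct.pack(">BI", status, len(body))
    return header + body


if __name__ == "__main__":
    print(frame_message(1, "PLAIN"))
    print(frame_message(1, b"PLAIN"))
```

Both calls produce the same framed bytes, whereas concatenating the header with an unencoded str raises the TypeError shown above.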