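Starting the agent on a node failed with a "connection refused" error. A web search (Baidu) turns up many posts with the same fix. Run ps -ef | grep supervisord to check for a leftover supervisord process. If one exists, kill it with plain kill, because with kill -9 the process is automatically brought back up. Then restart the agent. In my case the agent still would not start after the process was killed.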
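The "find supervisord and stop it with plain kill" step can be sketched in Python as follows. This is a sketch only. It parses ps -ef text instead of using a process library. It sends SIGTERM, which is the signal plain kill sends by default; kill -9 sends SIGKILL instead. The helper names find_supervisord_pids and stop_gracefully are hypothetical and are not part of Cloudera Manager.

```python
import os
import signal


def find_supervisord_pids(ps_output):
    """Return the PIDs of supervisord processes found in `ps -ef` output.

    Works on full `ps -ef` text or on `ps -ef | grep supervisord` text:
    the header row is skipped because its PID column is not numeric,
    and the grep process itself is skipped by its command name.
    """
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        cmd = fields[7]
        if "supervisord" in cmd and not cmd.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


def stop_gracefully(pids):
    """Send SIGTERM (what plain `kill` sends), not SIGKILL (`kill -9`)."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
```

On the affected node, as root, this would be used roughly as stop_gracefully(find_supervisord_pids(subprocess.check_output(["ps", "-ef"], text=True))), followed by a restart of the agent service.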
The cloudera-scm-agent.log file showed the following errors:
[01/Apr/2019 16:46:19 +0000] 97874 MainThread supervisor INFO Trying to connect to supervisor (Attempt 1)
[01/Apr/2019 16:46:20 +0000] 97874 MainThread supervisor ERROR Failed! trying again in 2 second(s): [Errno 111] Connection refused
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 141, in connect
    obj = cls(cfg, os_ops_obj)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 88, in __init__
    self.identifier = self.__get_supervisor_identification()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/__init__.py", line 531, in new_fn
    return fn(self, *args, **kwargs)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 379, in __get_supervisor_identification
    return self.client.supervisor.getIdentification()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
    verbose=self.__verbose
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib64/python2.7/httplib.py", line 1041, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1075, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 843, in send
    self.connect()
  File "/usr/lib64/python2.7/httplib.py", line 824, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[01/Apr/2019 16:46:22 +0000] 97874 MainThread supervisor INFO Trying to connect to supervisor (Attempt 2)
[01/Apr/2019 16:46:23 +0000] 97874 MainThread supervisor ERROR Failed! trying again in 2 second(s): [Errno 111] Connection refused
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 141, in connect
    obj = cls(cfg, os_ops_obj)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 88, in __init__
    self.identifier = self.__get_supervisor_identification()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/util/__init__.py", line 531, in new_fn
    return fn(self, *args, **kwargs)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/supervisor.py", line 379, in __get_supervisor_identification
    return self.client.supervisor.getIdentification()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
    verbose=self.__verbose
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/supervisor/xmlrpc.py", line 460, in request
    self.connection.request('POST', handler, request_body, self.headers)
  File "/usr/lib64/python2.7/httplib.py", line 1041, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib64/python2.7/httplib.py", line 1075, in _send_request
    self.endheaders(body)
  File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 843, in send
    self.connect()
  File "/usr/lib64/python2.7/httplib.py", line 824, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
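On Linux, Errno 111 is ECONNREFUSED. It means the TCP connection attempt was actively rejected. Either nothing was listening on the target port, or a firewall rule rejected the attempt. You can reproduce what the agent sees with a plain TCP connect to the supervisor address. The address 127.0.0.1:19001 below is an assumption: it is the port commonly documented for the Cloudera Manager agent's supervisord. Check it against your own CM version before relying on it. The probe function is a hypothetical helper.

```python
import errno
import socket

# ASSUMPTION: the CM agent's supervisord listens on 127.0.0.1:19001.
SUPERVISOR_ADDR = ("127.0.0.1", 19001)


def probe(addr=SUPERVISOR_ADDR, timeout=2.0):
    """Try a plain TCP connect to `addr` and classify the outcome.

    Returns "open" if something accepted the connection, "refused" for
    ECONNREFUSED (errno 111 on Linux, as in the agent log), or the error
    text for anything else (timeout, host unreachable, ...).
    """
    try:
        conn = socket.create_connection(addr, timeout=timeout)
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "refused"
        return str(exc)
    conn.close()
    return "open"
```

A result of "refused" while supervisord is confirmed running points away from the process and toward something in the path, such as a firewall rule.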
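After several more attempts to find out why the connection was refused, I found that the firewall (iptables) was enabled. After turning it off and restarting the service, the agent started successfully.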
To summarize, these are possible reasons the agent fails to start:
1. Mismatched Python files; see the _io setting described at http://www.cnblogs.com/lion.net/archive/2014/09/02/3950619.html
2. The log file does not exist; uncomment the log_file setting in config.ini.
3. Hostname/IP mapping problems in /etc/hosts.
4. The firewall is still running (iptables/firewalld).
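5. Port configuration: check that the server port configured in config.ini is 7182.
6. Cluster clocks are not synchronized; install ntp and synchronize the time.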
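Items 3 and 5 lend themselves to a quick scripted check. The sketch below reads server_port from the agent's config.ini text with Python's configparser. It also looks up a hostname in /etc/hosts text. Two assumptions are built in and labeled in the code:
- server_port sits in a [General] section.
- A hostname mapped to a loopback address counts as a problem. This is a common rule of thumb for cluster hosts, not something stated above.

Both function names are hypothetical.

```python
import configparser
import ipaddress


def server_port(config_text, section="General"):
    """Return server_port from agent config.ini text as an int, or None if unset.

    ASSUMPTION: server_port lives in a [General] section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_option(section, "server_port"):
        return None
    return parser.getint(section, "server_port")


def hosts_problems(hosts_text, hostname):
    """Return human-readable problems with how /etc/hosts maps `hostname`."""
    ips = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2 and hostname in fields[1:]:
            ips.append(fields[0])
    if not ips:
        return [f"{hostname} is not listed"]
    # ASSUMPTION: a cluster hostname resolving to loopback is treated as an error.
    return [f"{hostname} maps to loopback address {ip}"
            for ip in ips if ipaddress.ip_address(ip).is_loopback]
```

Typical usage is server_port(open(path).read()) == 7182, where path points to the agent's config.ini (usually under /etc/cloudera-scm-agent/). Pair it with hosts_problems(open("/etc/hosts").read(), socket.gethostname()) on each node.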
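The lesson from this incident: when the agent fails this way, the only message is "connection refused", with no further detail. If the firewall must stay enabled, make sure every required port is open; otherwise the failure is hard to track down.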
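If the firewall has to stay on, you can open the required ports explicitly instead. The sketch below only builds the commands, as argument lists to review before running them as root; it executes nothing.
- Port 7182 comes from the list above.
- Port 7180 (Cloudera Manager web console) is an assumption based on common Cloudera Manager defaults.
- Port 9000 (the agent's default listening port) is likewise an assumed default.

Adjust the port list for your deployment. Rules added with iptables -I do not persist across reboots unless saved, for example with service iptables save on CentOS 6. The names CM_PORTS, iptables_commands and firewalld_commands are hypothetical.

```python
# ASSUMPTION: port list based on common Cloudera Manager defaults; adjust as needed.
CM_PORTS = {
    7180: "CM web console (assumed)",
    7182: "agent -> server",
    9000: "server -> agent listening port (assumed)",
}


def iptables_commands(ports):
    """Build iptables argv lists that accept inbound TCP on each port."""
    return [["iptables", "-I", "INPUT", "-p", "tcp", "--dport", str(p), "-j", "ACCEPT"]
            for p in sorted(ports)]


def firewalld_commands(ports):
    """Equivalent firewall-cmd argv lists for hosts running firewalld."""
    cmds = [["firewall-cmd", "--permanent", f"--add-port={p}/tcp"] for p in sorted(ports)]
    cmds.append(["firewall-cmd", "--reload"])  # apply the permanent config
    return cmds
```

Once reviewed, each entry can be passed to subprocess.run(cmd, check=True). Building the list first, rather than running commands directly, keeps the change visible before it touches a production host.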