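One broker in a Kafka cluster is elected as the Controller. It manages brokers joining and leaving the cluster, assigns partition replicas for all topics, and runs partition leader election.
All of the Controller's management work relies on ZooKeeper.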
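The idea behind the election can be modelled in a few lines. The sketch below is a simplified, in-memory model, not Kafka's implementation: `FakeZooKeeper`, `Broker`, `current_controller` and `registered_brokers` are hypothetical names, and the real `/controller` payload also carries a timestamp. Each broker registers an ephemeral `/brokers/ids/<id>` node and tries to create the ephemeral `/controller` node. The first create wins, the losers watch the node, and when the controller's session ends its ephemeral nodes vanish, the watches fire and the survivors race again.

```python
import json


class NodeExistsError(Exception):
    """Hypothetical stand-in for ZooKeeper's 'node already exists' error."""


class FakeZooKeeper:
    """Hypothetical in-memory model of ZooKeeper: ephemeral znodes plus one-shot delete watches."""

    def __init__(self):
        self.nodes = {}    # path -> (data, owning session id)
        self.watches = {}  # path -> callbacks to run once when the node is deleted

    def create_ephemeral(self, path, data, session):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def get(self, path):
        return self.nodes[path][0]

    def watch_delete(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def expire_session(self, session):
        # Ephemeral znodes are removed when the session that created them ends.
        owned = [p for p, (_, owner) in self.nodes.items() if owner == session]
        for path in owned:
            del self.nodes[path]
            # Watches are one-shot: remove them before firing them.
            for callback in self.watches.pop(path, []):
                callback()


class Broker:
    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk
        self.session = f"session-{broker_id}"
        self.alive = True
        # Each live broker registers itself under /brokers/ids/<id>.
        zk.create_ephemeral(f"/brokers/ids/{broker_id}", "{}", self.session)

    def try_elect(self):
        if not self.alive:
            return
        payload = json.dumps({"version": 1, "brokerid": self.broker_id})
        try:
            # Whoever creates the ephemeral /controller node first is the controller.
            self.zk.create_ephemeral("/controller", payload, self.session)
        except NodeExistsError:
            # Lost the race: watch the node and try again once it is deleted.
            self.zk.watch_delete("/controller", self.try_elect)

    def kill(self):
        # Models kill -9 followed by ZooKeeper expiring the broker's session.
        self.alive = False
        self.zk.expire_session(self.session)


def current_controller(zk):
    return json.loads(zk.get("/controller"))["brokerid"]


def registered_brokers(zk):
    prefix = "/brokers/ids/"
    return sorted(int(p[len(prefix):]) for p in zk.nodes if p.startswith(prefix))


zk = FakeZooKeeper()
brokers = [Broker(i, zk) for i in (0, 1, 2)]
for broker in brokers:
    broker.try_elect()
print(current_controller(zk), registered_brokers(zk))  # 0 [0, 1, 2]
brokers[0].kill()
print(current_controller(zk), registered_brokers(zk))  # 1 [1, 2]
```

In this model the race is broken by watch registration order. In a real cluster whichever broker's create request reaches ZooKeeper first wins, so the broker that takes over is not fixed in advance.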
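Partition leader election is carried out by the Controller, so the experiment below first looks at how the Controller itself is elected:
Leader election process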
Let's see which broker takes over when the current controller dies (/brokers/ids/ currently holds three brokers: 0, 1 and 2).
Start the ZooKeeper client:
[root@backup01 bin]# ./zkCli.sh
Connecting to localhost:2181
2020-04-19 16:05:13,799 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT
2020-04-19 16:05:13,815 [myid:] - INFO [main:Environment@100] - Client environment:host.name=backup01
2020-04-19 16:05:13,815 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_172
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/local/java/jdk1.8.0_172/jre
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../build/classes:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../build/lib/*.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../lib/slf4j-log4j12-1.7.25.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../lib/slf4j-api-1.7.25.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../lib/netty-3.10.6.Final.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../lib/log4j-1.2.17.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../lib/jline-0.9.94.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../lib/audience-annotations-0.5.0.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../zookeeper-3.4.13.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../src/java/lib/*.jar:/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin/../conf:
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-862.el7.x86_64
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2020-04-19 16:05:13,817 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/usr/local/hadoop/zookeeper/zookeeper-3.4.13/bin
2020-04-19 16:05:13,818 [myid:] - INFO [main:ZooKeeper@442] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@277050dc
Welcome to ZooKeeper!
2020-04-19 16:05:13,878 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1029] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2020-04-19 16:05:14,007 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@879] - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
2020-04-19 16:05:14,033 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1303] - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x10004d8e0310000, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
Listing the root znodes and reading /controller shows that the current controller is brokerid 0:
[zk: localhost:2181(CONNECTED) 2] ls /
[cluster, controller_epoch, controller, brokers, zookeeper, admin, isr_change_notification, consumers, log_dir_event_notification, latest_producer_id_block, config]
[zk: localhost:2181(CONNECTED) 3] get /controller
{"version":1,"brokerid":0,"timestamp":"1586133981273"}
cZxid = 0x500000003
ctime = Mon Apr 06 08:46:21 CST 2020
mZxid = 0x500000003
mtime = Mon Apr 06 08:46:21 CST 2020
pZxid = 0x500000003
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2000392dd2c0000
dataLength = 54
numChildren = 0
[zk: localhost:2181(CONNECTED) 5]
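The data stored in /controller is a small JSON document, and its "timestamp" field is epoch milliseconds. The sketch below decodes it, assuming that "CST" in the zkCli output means China Standard Time (UTC+8); `parse_controller` is a hypothetical helper. Under that assumption the timestamp lines up with the node's ctime.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the zkCli output is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def parse_controller(payload):
    """Return (brokerid, time the node was written) from the JSON stored in /controller."""
    data = json.loads(payload)
    # "timestamp" is a string holding epoch milliseconds.
    written_at = datetime.fromtimestamp(int(data["timestamp"]) / 1000, tz=CST)
    return data["brokerid"], written_at


broker_id, written_at = parse_controller(
    '{"version":1,"brokerid":0,"timestamp":"1586133981273"}'
)
print(broker_id, written_at.strftime("%Y-%m-%d %H:%M:%S"))  # 0 2020-04-06 08:46:21
```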
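How is a new controller chosen once broker 0 goes down? (ZooKeeper does not pick it itself; the remaining brokers race to create the /controller node.) To find out, we kill the Kafka process of broker 0.
Kill the process of broker 0 (PID 19728 here); jps then shows that no Kafka process is left on this host: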
[root@backup01 bin]# kill -9 19728
[root@backup01 bin]# jps
54464 ZooKeeperMain
12732 Elasticsearch
54750 Jps
14623 QuorumPeerMain
[root@backup01 bin]#
Back in the zkCli session, read /controller again:
[zk: localhost:2181(CONNECTED) 7] get /controller
{"version":1,"brokerid":1,"timestamp":"1587284613737"}
cZxid = 0x600000005
ctime = Sun Apr 19 16:23:33 CST 2020
mZxid = 0x600000005
mtime = Sun Apr 19 16:23:33 CST 2020
pZxid = 0x600000005
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x3000065a6f40003
dataLength = 54
numChildren = 0
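[zk: localhost:2181(CONNECTED) 8]
The controller is now brokerid 1. The /controller node was not updated in place: it has a new cZxid, ctime and ephemeralOwner, because the old ephemeral node disappeared together with broker 0's ZooKeeper session and broker 1 then created a new one.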
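The two stat blocks can also be compared mechanically. The sketch below parses the `key = value` lines that zkCli's `get` prints (`parse_stat` is a hypothetical helper, and only a subset of the stat lines from the two `get /controller` calls is copied in). A different cZxid means the znode was deleted and created again, and a different ephemeralOwner means a different session owns it.

```python
def parse_stat(text):
    """Parse the 'key = value' stat lines that zkCli's get prints after the node data."""
    stat = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            stat[key.strip()] = value.strip()
    return stat


# A subset of the stat lines from the two `get /controller` calls.
before = parse_stat("""
cZxid = 0x500000003
ctime = Mon Apr 06 08:46:21 CST 2020
mZxid = 0x500000003
dataVersion = 0
ephemeralOwner = 0x2000392dd2c0000
""")
after = parse_stat("""
cZxid = 0x600000005
ctime = Sun Apr 19 16:23:33 CST 2020
mZxid = 0x600000005
dataVersion = 0
ephemeralOwner = 0x3000065a6f40003
""")

# A new creation zxid means the znode was deleted and created again;
# an in-place setData would keep cZxid and increment dataVersion instead.
recreated = int(before["cZxid"], 16) != int(after["cZxid"], 16)
# ephemeralOwner is the id of the session that owns an ephemeral znode.
new_owner = before["ephemeralOwner"] != after["ephemeralOwner"]
print(recreated, new_owner, after["dataVersion"])  # True True 0
```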