kafka源码阅读-ReplicaStateMachine(副本状态机)解析

概述

Kafka源码包含多个模块,每个模块负责不同的功能。以下是一些核心模块及其功能的概述:

  1. 服务端源码 :实现Kafka Broker的核心功能,包括日志存储、控制器、协调器、元数据管理及状态机管理、延迟机制、消费者组管理、高并发网络架构模型实现等。

  2. Java客户端源码 :实现了Producer和Consumer与Broker的交互机制,以及通用组件支撑代码。

  3. Connect源码 :用来构建异构数据双向流式同步服务。

  4. Stream源码 :用来实现实时流处理相关功能。

  5. Raft源码 :实现了Raft一致性协议。

  6. Admin模块 :Kafka的管理员模块,操作和管理其topic,partition相关,包含创建,删除topic,或者拓展分区等。

  7. Api模块 :负责数据交互,客户端与服务端交互数据的编码与解码。

  8. Client模块 :包含Producer读取Kafka Broker元数据信息的类,如topic和分区,以及leader。

  9. Cluster模块 :包含Broker、Cluster、Partition、Replica等实体类。

  10. Common模块 :包含各种异常类以及错误验证。

  11. Consumer模块 :消费者处理模块,负责客户端消费者数据和逻辑处理。

  12. Controller模块 :负责中央控制器的选举,分区的Leader选举,Replica的分配或重新分配,分区和副本的扩容等。

  13. Coordinator模块 :负责管理部分consumer group和他们的offset。

  14. Javaapi模块 :提供Java语言的Producer和Consumer的API接口。

  15. Log模块 :负责Kafka文件存储,读写所有Topic消息数据。

  16. Message模块 :封装多条数据组成数据集或压缩数据集。

  17. Metrics模块 :负责内部状态监控。

  18. Network模块 :处理客户端连接,网络事件模块。

  19. Producer模块 :生产者细节实现,包括同步和异步消息发送。

  20. Security模块 :负责Kafka的安全验证和管理。

  21. Serializer模块 :序列化和反序列化消息内容。

  22. Server模块 :涉及Leader和Offset的checkpoint,动态配置,延时创建和删除Topic,Leader选举,Admin和Replica管理等。

  23. Tools模块 :包含多种工具,如导出consumer offset值,LogSegments信息,Topic的log位置信息,Zookeeper上的offset值等。

  24. Utils模块 :包含各种工具类,如Json,ZkUtils,线程池工具类,KafkaScheduler公共调度器类等。

这些模块共同构成了Kafka的整体架构,使其能够提供高吞吐量、高可用性的消息队列服务。

kafka源码分支为1.0.2

分区状态机记录着当前集群所有 Partition 的状态信息以及如何对 Partition 状态转移进行相应的处理;副本状态机则是记录着当前集群所有 Replica 的状态信息以及如何对 Replica 状态转变进行相应的处理。

kafkaController初始化时,会启动replicaStateMachine和partitionStateMachine:

    //在 KafkaController 中//有两个状态机:分区状态机和副本状态机;//一个管理器:Channel 管理器,负责管理所有的 Broker 通信;//相关缓存:Partition 信息、Topic 信息、broker id 信息等;//四种 leader 选举机制:分别是用 leader offline、broker 掉线、partition reassign、最优 leader 选举时触发;//启动副本状态机,初始化所有 Replica 的状态信息,如果 Replica 所在节点是 alive 的,那么状态更新为 OnlineReplica, 否则更新为 ReplicaDeletionIneligible;replicaStateMachine.startup()//启动分区状态机,初始化所有 Partition 的状态信息,如果 leader 所在 broker 是 alive 的,那么状态更新为 OnlinePartition,否则更新为 OfflinePartitionpartitionStateMachine.startup()

ReplicaStateMachine类相关方法:

  /*** Invoked on successful controller election. First registers a broker change listener since that triggers all* state transitions for replicas. Initializes the state of replicas for all partitions by reading from zookeeper.* Then triggers the OnlineReplica state change for all replicas.*/def startup() {// //初始化所有副本的状态信息initializeReplicaState()//将online的replica状态转变为OnlineReplicahandleStateChanges(controllerContext.allLiveReplicas(), OnlineReplica)info("Started replica state machine with initial state -> " + replicaState.toString())}/*** Invoked on startup of the replica's state machine to set the initial state for replicas of all existing partitions* in zookeeper*///初始化所有副本的状态信息// 这里只是将 Replica 的状态信息更新副本状态机的缓存 replicaState 中,并没有真正进行状态转移的操作。private def initializeReplicaState() {for((topicPartition, assignedReplicas) <- controllerContext.partitionReplicaAssignment) {val topic = topicPartition.topicval partition = topicPartition.partitionassignedReplicas.foreach { replicaId =>val partitionAndReplica = PartitionAndReplica(topic, partition, replicaId)//如果 Replica 所在机器是 alive 的,那么将其状态设置为 OnlineReplica//replicaId即brokerIdif (controllerContext.isReplicaOnline(replicaId, topicPartition))replicaState.put(partitionAndReplica, OnlineReplica)else {// mark replicas on dead brokers as failed for topic deletion, if they belong to a topic to be deleted.// This is required during controller failover since during controller failover a broker can go down,// so the replicas on that broker should be moved to ReplicaDeletionIneligible to be on the safer side.//否则设置为 ReplicaDeletionIneligible 状态replicaState.put(partitionAndReplica, ReplicaDeletionIneligible)}}}}/*** This API is invoked by the broker change controller callbacks and the startup API of the state machine* @param replicas     The list of replicas (brokers) that need to be transitioned to the target state* @param targetState  The state that the replicas should be moved to* The controller's allLeaders cache should have been updated before this*///用于处理 Replica 状态的变化def handleStateChanges(replicas: Set[PartitionAndReplica], targetState: ReplicaState,callbacks: Callbacks = (new CallbackBuilder).build) {if (replicas.nonEmpty) {info("Invoking state change to %s for replicas %s".format(targetState, replicas.mkString(",")))try {brokerRequestBatch.newBatch()//状态转变replicas.foreach(r => handleStateChange(r, targetState, callbacks))//向 broker 发送相应请求brokerRequestBatch.sendRequestsToBrokers(controller.epoch)} catch {case e: Throwable => error("Error while moving some replicas to %s state".format(targetState), e)}}}/*** This API exercises the replica's state machine. It ensures that every state transition happens from a legal* previous state to the target state. Valid state transitions are:* NonExistentReplica --> NewReplica* --send LeaderAndIsr request with current leader and isr to the new replica and UpdateMetadata request for the*   partition to every live broker** NewReplica -> OnlineReplica* --add the new replica to the assigned replica list if needed** OnlineReplica,OfflineReplica -> OnlineReplica* --send LeaderAndIsr request with current leader and isr to the new replica and UpdateMetadata request for the*   partition to every live broker** NewReplica,OnlineReplica,OfflineReplica,ReplicaDeletionIneligible -> OfflineReplica* --send StopReplicaRequest to the replica (w/o deletion)* --remove this replica from the isr and send LeaderAndIsr request (with new isr) to the leader replica and*   UpdateMetadata request for the partition to every live broker.** OfflineReplica -> ReplicaDeletionStarted* --send StopReplicaRequest to the replica (with deletion)** ReplicaDeletionStarted -> ReplicaDeletionSuccessful* -- mark the state of the replica in the state machine** ReplicaDeletionStarted -> ReplicaDeletionIneligible* -- mark the state of the replica in the state machine** ReplicaDeletionSuccessful -> NonExistentReplica* -- remove the replica from the in memory partition replica assignment cache* @param partitionAndReplica The replica for which the state transition is invoked* @param targetState The end state that the replica should be moved to*/def handleStateChange(partitionAndReplica: PartitionAndReplica, targetState: ReplicaState,callbacks: Callbacks) {val topic = partitionAndReplica.topicval partition = partitionAndReplica.partitionval replicaId = partitionAndReplica.replicaval topicAndPartition = TopicAndPartition(topic, partition)// Replica 不存在的话,状态初始化为 NonExistentReplicaval currState = replicaState.getOrElseUpdate(partitionAndReplica, NonExistentReplica)val stateChangeLog = stateChangeLogger.withControllerEpoch(controller.epoch)try {def logStateChange(): Unit =stateChangeLog.trace(s"Changed state of replica $replicaId for partition $topicAndPartition from " +s"$currState to $targetState")val replicaAssignment = controllerContext.partitionReplicaAssignment(topicAndPartition)//校验状态转变是否符合要求assertValidTransition(partitionAndReplica, targetState)targetState match {case NewReplica => 其前置状态只能为 NonExistentReplica// start replica as a follower to the current leader for its partition//从 zk 获取 Partition 的 leaderAndIsr 信息val leaderIsrAndControllerEpochOpt = ReplicationUtils.getLeaderIsrAndEpochForPartition(zkUtils, topic, partition)leaderIsrAndControllerEpochOpt match {case Some(leaderIsrAndControllerEpoch) =>//若是leader的replica状态不能变为NewReplicaif(leaderIsrAndControllerEpoch.leaderAndIsr.leader == replicaId)throw new StateChangeFailedException(s"Replica $replicaId for partition $topicAndPartition cannot " +s"be moved to NewReplica state as it is being requested to become leader")//向该 replicaId 发送 LeaderAndIsr 请求,这个方法同时也会向所有的 broker 发送 updateMeta 请求brokerRequestBatch.addLeaderAndIsrRequestForBrokers(List(replicaId),topic, partition, leaderIsrAndControllerEpoch,replicaAssignment, isNew = true)//对于新建的 Partition,处于这个状态时,该 Partition 是没有相应的 LeaderAndIsr 信息的case None => // new leader request will be sent to this replica when one gets elected}//将该 Replica 的状态转移成 NewReplica,然后结束流程。replicaState.put(partitionAndReplica, NewReplica)logStateChange()case ReplicaDeletionStarted => //其前置状态只能为 OfflineReplica//更新向该 Replica 的状态为 ReplicaDeletionStarted;replicaState.put(partitionAndReplica, ReplicaDeletionStarted)// send stop replica command//发送 StopReplica 请求给该副本,并设置 deletePartition=true//broker收到这请求后,会从物理存储上删除这个 Replica 的数据内容brokerRequestBatch.addStopReplicaRequestForBrokers(List(replicaId), topic, partition, deletePartition = true,callbacks.stopReplicaResponseCallback)logStateChange()case ReplicaDeletionIneligible => //其前置状态只能为 ReplicaDeletionStartedreplicaState.put(partitionAndReplica, ReplicaDeletionIneligible)logStateChange()case ReplicaDeletionSuccessful => //其前置状态只能为 ReplicaDeletionStartedreplicaState.put(partitionAndReplica, ReplicaDeletionSuccessful)logStateChange()case NonExistentReplica => //其前置状态只能为 ReplicaDeletionSuccessful。// NonExistentReplica 是副本完全删除、不存在这个副本的状态// remove this replica from the assigned replicas list for its partition//在 controller 的 partitionReplicaAssignment 删除这个 Partition 对应的 replica 信息;val currentAssignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)controllerContext.partitionReplicaAssignment.put(topicAndPartition, currentAssignedReplicas.filterNot(_ == replicaId))//将这个 Topic 从缓存中删除。replicaState.remove(partitionAndReplica)logStateChange()case OnlineReplica =>//其前置状态只能为 NewReplica, OnlineReplica, OfflineReplica, ReplicaDeletionIneligible//副本正常工作时的状态,此时的 Replica 既可以作为 leader 也可以作为 followerreplicaState(partitionAndReplica) match {case NewReplica => //其前置状态如果为 NewReplica// add this replica to the assigned replicas list for its partition//从 Controller 的 partitionReplicaAssignment 中获取这个 Partition 的 AR;val currentAssignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)//如果 Replica 不在 AR 中的话,那么就将其添加到 Partition 的 AR 中;if(!currentAssignedReplicas.contains(replicaId))controllerContext.partitionReplicaAssignment.put(topicAndPartition, currentAssignedReplicas :+ replicaId)logStateChange()case _ => //其前置状态如果为:OnlineReplica, OfflineReplica, ReplicaDeletionIneligible// check if the leader for this partition ever existed//如果该 Partition 的 LeaderIsrAndControllerEpoch 信息存在,那么就更新副本的状态,并发送相应的请求//否则不做任何处理;controllerContext.partitionLeadershipInfo.get(topicAndPartition) match {case Some(leaderIsrAndControllerEpoch) =>brokerRequestBatch.addLeaderAndIsrRequestForBrokers(List(replicaId), topic, partition, leaderIsrAndControllerEpoch,replicaAssignment)replicaState.put(partitionAndReplica, OnlineReplica)logStateChange()case None => // that means the partition was never in OnlinePartition state, this means the broker never// started a log for that partition and does not have a high watermark value for this partition}}//最后将 Replica 的状态设置为 OnlineReplica 状态。replicaState.put(partitionAndReplica, OnlineReplica)case OfflineReplica => //其前置状态只能为 NewReplica, OnlineReplica, OfflineReplica, ReplicaDeletionIneligible// send stop replica command to the replica so that it stops fetching from the leader//发送 StopReplica 请求给该副本,先停止副本同步 (deletePartition = false)brokerRequestBatch.addStopReplicaRequestForBrokers(List(replicaId), topic, partition, deletePartition = false)// As an optimization, the controller removes dead replicas from the ISRval leaderAndIsrIsEmpty: Boolean =controllerContext.partitionLeadershipInfo.get(topicAndPartition) match {case Some(_) =>//将该 replica 从 Partition 的 isr 移除这个 replica(前提 isr 中还有其他有效副本)controller.removeReplicaFromIsr(topic, partition, replicaId) match {case Some(updatedLeaderIsrAndControllerEpoch) =>// send the shrunk ISR state change request to all the remaining alive replicas of the partition.val currentAssignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)if (!controller.topicDeletionManager.isPartitionToBeDeleted(topicAndPartition)) {// 发送 LeaderAndIsr 请求给剩余的其他副本,因为 ISR 变动了brokerRequestBatch.addLeaderAndIsrRequestForBrokers(currentAssignedReplicas.filterNot(_ == replicaId),topic, partition, updatedLeaderIsrAndControllerEpoch, replicaAssignment)}//更新这个 Replica 的状态为 OfflineReplicareplicaState.put(partitionAndReplica, OfflineReplica)logStateChange()falsecase None =>true}case None =>true}if (leaderAndIsrIsEmpty && !controller.topicDeletionManager.isPartitionToBeDeleted(topicAndPartition))throw new StateChangeFailedException(s"Failed to change state of replica $replicaId for partition $topicAndPartition since the leader " +s"and isr path in zookeeper is empty")}}catch {case t: Throwable =>stateChangeLog.error(s"Initiated state change of replica $replicaId for partition $topicAndPartition from " +s"$currState to $targetState failed", t)}}

上面 Replica 各种转移的触发的条件:
在这里插入图片描述
在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/diannao/50659.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Vue常用指令及其生命周期

作者&#xff1a;CSDN-PleaSure乐事 欢迎大家阅读我的博客 希望大家喜欢 目录 1.常用指令 1.1 v-bind 1.2 v-model 注意事项 1.3 v-on 注意事项 1.4 v-if / v-else-if / v-else 1.5 v-show 1.6 v-for 无索引 有索引 生命周期 定义 流程 1.常用指令 Vue当中的指令…

数据库水印算法三道题

针对数据库水印算法的面试题&#xff0c;由简单到困难&#xff0c;可以设计以下三道题目&#xff1a; 1. 基础理解题 题目&#xff1a;请简要解释什么是数据库水印算法&#xff0c;并说明其主要应用场景。 参考答案&#xff1a; 数据库水印算法是一种在数据库中嵌入隐蔽信息…

Red Hat 9.4 配置Yum镜像源

1. 虚拟机信息 镜像&#xff1a;rhel-server-9.4-x86_64-dvd.iso 系统版本&#xff1a;Red Hat 9.4 版本信息&#xff1a; cat /etc/redhat-release Red Hat Enterprise Linux release 9.4 (Plow)2. 配置文件 vim /etc/yum.repos.d/local.repo # 按i键&#xff0c;输入以下内…

Linux 普通用户启动Nginx使用80端口,小于1024的端口

让 Nginx 运行在 root 权限下&#xff1a; 在root用户下执行 cd /usr/local/nginx/sbin/ chown root nginx chmod us nginx或者&#xff1a;cd /usr/local/nginx/sbin/ sudo chown root nginx sudo chmod us nginx

远程项目调试-informer2020

informer2020 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting(原文&#xff09;Informer 是一个基于Transformer的模型&#xff0c;是为了应对长依赖关系而开发的。本文的主要主题是序列预测。序列预测可以在任何具有不断变化的数据的地方…

[笔记]ONVIF服务端实现[进行中...]

1.文档搜索&#xff1a; 从&#xff1a;https://www.cnblogs.com/liwen01/p/17337916.html 跳转到了&#xff1a;ONVIF协议网络摄像机&#xff08;IPC&#xff09;客户端程序开发&#xff08;1&#xff09;&#xff1a;专栏开篇_onvif 许振坪-CSDN博客 1.1原生代码支持&…

Linux——管理本地用户和组(详细介绍了Linux中用户和组的概念及用法)

目录 一、用户和组概念 &#xff08;一&#xff09;、用户的概念 &#xff08;二&#xff09;、组的概念 补充组 主要组 二、获取超级用户访问权限 &#xff08;一&#xff09;、su 命令和su -命令 &#xff08; 二&#xff09;、sudo命令 三、管理本地用户账户 &…

ERROR: Cannot find command ‘git’- do you have ‘git’ installed and in your PATH?

ERROR: Cannot find command ‘git’- do you have ‘git’ installed and in your PATH? 目录 ERROR: Cannot find command ‘git’- do you have ‘git’ installed and in your PATH? 【常见模块错误】 【解决方案】 欢迎来到英杰社区https://bbs.csdn.net/topics/61780…

Transformer自然语言处理实战pdf阅读

一.第一章 欢迎来到transformer的世界 1.解码器-编码器框架 在Transformer出现之前&#xff0c;NLP的最新技术是LSTM等循环架构。这些架 构通过在神经网络连接使用反馈循环&#xff0c;允许信息从一步传播到另一 步&#xff0c;使其成为对文本等序列数据进行建模的理想选择。如…

图片检查 python脚本

图片检查 python脚本 import os from PIL import Imagedef is_image_broken(image_path):try:img Image.open(image_path)img.verify() # Verify that it is, in fact an imagereturn Falseexcept (IOError, SyntaxError) as e:return Truedef check_images_in_directory(di…

Unity分享:继承自MonoBehaviour的脚步不要对引用类型的字段在声明时就初始化

如果某些字段在每个构造函数中都要进行初始化&#xff0c;很多人都喜欢在字段声明时就进行初始化&#xff0c;对于一个非继承自MonoBehaviour的脚步&#xff0c;这样做是没有问题的&#xff0c;然而继承自MonoBehaviour后就会造成内存的浪费&#xff0c;为什么呢&#xff1f;因…

多模态大模型应用中的Q-Former是什么?

多模态大模型应用中的Q-Former是什么&#xff1f; Q-Former是一种新型的神经网络架构&#xff0c;专注于通过查询&#xff08;Query&#xff09;机制来改进信息检索和表示学习。在这篇博客中&#xff0c;我们将详细探讨Q-Former的工作原理、应用场景&#xff0c;并在必要时通过…

pyqt designer使用spliter

1、在designer界面需要使用spliter需要父界面不使用布局&#xff0c;减需要分割两个模块选中&#xff0c;再点击spliter分割 2、在分割后&#xff0c;再对父界面进行布局设置 3、对于两边需要不等比列放置的&#xff0c;需要套一层 group box在最外层进行分割

大数据学习之Flink基础

Flink基础 1、系统时间与时间时间 系统时间&#xff08;处理时间&#xff09; 在Sparksreaming的任务计算时&#xff0c;使用的是系统时间。 假设所用窗口为滚动窗口&#xff0c;大小为5分钟。那么每五分钟&#xff0c;都会对接收的数据进行提交任务. 但是&#xff0c;这里有…

GoogleCTF2023 Writeup

GoogleCTF2023 Writeup Misc NPC Crypto LEAST COMMON GENOMINATOR? Web UNDER-CONSTRUCTION NPC A friend handed me this map and told me that it will lead me to the flag. It is confusing me and I don’t know how to read it, can you help me out? Attach…

VSCode切换默认终端

我的VSCode默认终端为PowerShell&#xff0c;每次新建都会自动打开PowerShell。但是我想让每次都变为cmd&#xff0c;也就是Command Prompt 更改默认终端的操作方法如下&#xff1a; 键盘调出命令面板&#xff08;CtrlShiftP&#xff09;中,输入Terminal: Select Default Prof…

Hisilicon 适配新遥控器

Hisilicon 适配新遥控器 适配NEC红外遥控器: 相关文档: Android解决方案开发指南:输入 红外驱动使用说明及注意事项 Application Notes HMS 开发指南:IR HMS sample 使用指南:IR 1、查看公版遥控器 sample_ir 没有此命令,不是没有编译打开,而是名字变成ir_user:…

Java 中的Stream流

Stream流就像工厂中的流水线操作。 如何使用Stream&#xff1f; 1、首先要获取Stream流&#xff0c;那么如何获取呢? 对于不同的数据&#xff0c;有不同的获取方法。 ①单列集合 方法名说明default Stream<E> stream()Collection接口中的默认方法 所以实现了Colle…

Multi Range Read与Covering Index是如何优化回表的?

上篇文章末尾我们提出一个问题&#xff1a;有没有什么办法可以尽量避免回表或让回表的开销变小呢&#xff1f; 本篇文章围绕这个问题提出解决方案&#xff0c;一起来看看MySQL是如何优化的 回表 为什么会发生回表&#xff1f; 因为使用的索引并没有整条记录的所有信息&…

使用shell脚本在Linux主机上创建一个admin账号,并将uid配置为特定值

#!/bin/bash #说明&#xff1a;在当前主机上创建一个admin账号&#xff0c;将uid设置为1101&#xff1b; #如果账号已存在&#xff0c;则要判断uid是否为1101&#xff0c;不是的话则配置为1101&#xff1b; #如果系统中已存在其它账号使用了1101这个uid&#xff0c;则要提前变更…