kafka源码阅读-ReplicaStateMachine(副本状态机)解析

概述

Kafka源码包含多个模块,每个模块负责不同的功能。以下是一些核心模块及其功能的概述:

  1. 服务端源码 :实现Kafka Broker的核心功能,包括日志存储、控制器、协调器、元数据管理及状态机管理、延迟机制、消费者组管理、高并发网络架构模型实现等。

  2. Java客户端源码 :实现了Producer和Consumer与Broker的交互机制,以及通用组件支撑代码。

  3. Connect源码 :用来构建异构数据双向流式同步服务。

  4. Stream源码 :用来实现实时流处理相关功能。

  5. Raft源码 :实现了Raft一致性协议。

  6. Admin模块 :Kafka的管理员模块,操作和管理其topic,partition相关,包含创建,删除topic,或者拓展分区等。

  7. Api模块 :负责数据交互,客户端与服务端交互数据的编码与解码。

  8. Client模块 :包含Producer读取Kafka Broker元数据信息的类,如topic和分区,以及leader。

  9. Cluster模块 :包含Broker、Cluster、Partition、Replica等实体类。

  10. Common模块 :包含各种异常类以及错误验证。

  11. Consumer模块 :消费者处理模块,负责客户端消费者数据和逻辑处理。

  12. Controller模块 :负责中央控制器的选举,分区的Leader选举,Replica的分配或重新分配,分区和副本的扩容等。

  13. Coordinator模块 :负责管理部分consumer group和他们的offset。

  14. Javaapi模块 :提供Java语言的Producer和Consumer的API接口。

  15. Log模块 :负责Kafka文件存储,读写所有Topic消息数据。

  16. Message模块 :封装多条数据组成数据集或压缩数据集。

  17. Metrics模块 :负责内部状态监控。

  18. Network模块 :处理客户端连接,网络事件模块。

  19. Producer模块 :生产者细节实现,包括同步和异步消息发送。

  20. Security模块 :负责Kafka的安全验证和管理。

  21. Serializer模块 :序列化和反序列化消息内容。

  22. Server模块 :涉及Leader和Offset的checkpoint,动态配置,延时创建和删除Topic,Leader选举,Admin和Replica管理等。

  23. Tools模块 :包含多种工具,如导出consumer offset值,LogSegments信息,Topic的log位置信息,Zookeeper上的offset值等。

  24. Utils模块 :包含各种工具类,如Json,ZkUtils,线程池工具类,KafkaScheduler公共调度器类等。

这些模块共同构成了Kafka的整体架构,使其能够提供高吞吐量、高可用性的消息队列服务。

kafka源码分支为1.0.2

分区状态机记录着当前集群所有 Partition 的状态信息以及如何对 Partition 状态转移进行相应的处理;副本状态机则是记录着当前集群所有 Replica 的状态信息以及如何对 Replica 状态转变进行相应的处理。

kafkaController初始化时,会启动replicaStateMachine和partitionStateMachine:

    //在 KafkaController 中//有两个状态机:分区状态机和副本状态机;//一个管理器:Channel 管理器,负责管理所有的 Broker 通信;//相关缓存:Partition 信息、Topic 信息、broker id 信息等;//四种 leader 选举机制:分别是用 leader offline、broker 掉线、partition reassign、最优 leader 选举时触发;//启动副本状态机,初始化所有 Replica 的状态信息,如果 Replica 所在节点是 alive 的,那么状态更新为 OnlineReplica, 否则更新为 ReplicaDeletionIneligible;replicaStateMachine.startup()//启动分区状态机,初始化所有 Partition 的状态信息,如果 leader 所在 broker 是 alive 的,那么状态更新为 OnlinePartition,否则更新为 OfflinePartitionpartitionStateMachine.startup()

ReplicaStateMachine类相关方法:

  /*** Invoked on successful controller election. First registers a broker change listener since that triggers all* state transitions for replicas. Initializes the state of replicas for all partitions by reading from zookeeper.* Then triggers the OnlineReplica state change for all replicas.*/def startup() {// //初始化所有副本的状态信息initializeReplicaState()//将online的replica状态转变为OnlineReplicahandleStateChanges(controllerContext.allLiveReplicas(), OnlineReplica)info("Started replica state machine with initial state -> " + replicaState.toString())}/*** Invoked on startup of the replica's state machine to set the initial state for replicas of all existing partitions* in zookeeper*///初始化所有副本的状态信息// 这里只是将 Replica 的状态信息更新副本状态机的缓存 replicaState 中,并没有真正进行状态转移的操作。private def initializeReplicaState() {for((topicPartition, assignedReplicas) <- controllerContext.partitionReplicaAssignment) {val topic = topicPartition.topicval partition = topicPartition.partitionassignedReplicas.foreach { replicaId =>val partitionAndReplica = PartitionAndReplica(topic, partition, replicaId)//如果 Replica 所在机器是 alive 的,那么将其状态设置为 OnlineReplica//replicaId即brokerIdif (controllerContext.isReplicaOnline(replicaId, topicPartition))replicaState.put(partitionAndReplica, OnlineReplica)else {// mark replicas on dead brokers as failed for topic deletion, if they belong to a topic to be deleted.// This is required during controller failover since during controller failover a broker can go down,// so the replicas on that broker should be moved to ReplicaDeletionIneligible to be on the safer side.//否则设置为 ReplicaDeletionIneligible 状态replicaState.put(partitionAndReplica, ReplicaDeletionIneligible)}}}}/*** This API is invoked by the broker change controller callbacks and the startup API of the state machine* @param replicas     The list of replicas (brokers) that need to be transitioned to the target state* @param targetState  The state that the replicas should be moved to* The controller's allLeaders cache should have been updated before this*///用于处理 Replica 状态的变化def handleStateChanges(replicas: Set[PartitionAndReplica], targetState: ReplicaState,callbacks: Callbacks = (new CallbackBuilder).build) {if (replicas.nonEmpty) {info("Invoking state change to %s for replicas %s".format(targetState, replicas.mkString(",")))try {brokerRequestBatch.newBatch()//状态转变replicas.foreach(r => handleStateChange(r, targetState, callbacks))//向 broker 发送相应请求brokerRequestBatch.sendRequestsToBrokers(controller.epoch)} catch {case e: Throwable => error("Error while moving some replicas to %s state".format(targetState), e)}}}/*** This API exercises the replica's state machine. It ensures that every state transition happens from a legal* previous state to the target state. Valid state transitions are:* NonExistentReplica --> NewReplica* --send LeaderAndIsr request with current leader and isr to the new replica and UpdateMetadata request for the*   partition to every live broker** NewReplica -> OnlineReplica* --add the new replica to the assigned replica list if needed** OnlineReplica,OfflineReplica -> OnlineReplica* --send LeaderAndIsr request with current leader and isr to the new replica and UpdateMetadata request for the*   partition to every live broker** NewReplica,OnlineReplica,OfflineReplica,ReplicaDeletionIneligible -> OfflineReplica* --send StopReplicaRequest to the replica (w/o deletion)* --remove this replica from the isr and send LeaderAndIsr request (with new isr) to the leader replica and*   UpdateMetadata request for the partition to every live broker.** OfflineReplica -> ReplicaDeletionStarted* --send StopReplicaRequest to the replica (with deletion)** ReplicaDeletionStarted -> ReplicaDeletionSuccessful* -- mark the state of the replica in the state machine** ReplicaDeletionStarted -> ReplicaDeletionIneligible* -- mark the state of the replica in the state machine** ReplicaDeletionSuccessful -> NonExistentReplica* -- remove the replica from the in memory partition replica assignment cache* @param partitionAndReplica The replica for which the state transition is invoked* @param targetState The end state that the replica should be moved to*/def handleStateChange(partitionAndReplica: PartitionAndReplica, targetState: ReplicaState,callbacks: Callbacks) {val topic = partitionAndReplica.topicval partition = partitionAndReplica.partitionval replicaId = partitionAndReplica.replicaval topicAndPartition = TopicAndPartition(topic, partition)// Replica 不存在的话,状态初始化为 NonExistentReplicaval currState = replicaState.getOrElseUpdate(partitionAndReplica, NonExistentReplica)val stateChangeLog = stateChangeLogger.withControllerEpoch(controller.epoch)try {def logStateChange(): Unit =stateChangeLog.trace(s"Changed state of replica $replicaId for partition $topicAndPartition from " +s"$currState to $targetState")val replicaAssignment = controllerContext.partitionReplicaAssignment(topicAndPartition)//校验状态转变是否符合要求assertValidTransition(partitionAndReplica, targetState)targetState match {case NewReplica => 其前置状态只能为 NonExistentReplica// start replica as a follower to the current leader for its partition//从 zk 获取 Partition 的 leaderAndIsr 信息val leaderIsrAndControllerEpochOpt = ReplicationUtils.getLeaderIsrAndEpochForPartition(zkUtils, topic, partition)leaderIsrAndControllerEpochOpt match {case Some(leaderIsrAndControllerEpoch) =>//若是leader的replica状态不能变为NewReplicaif(leaderIsrAndControllerEpoch.leaderAndIsr.leader == replicaId)throw new StateChangeFailedException(s"Replica $replicaId for partition $topicAndPartition cannot " +s"be moved to NewReplica state as it is being requested to become leader")//向该 replicaId 发送 LeaderAndIsr 请求,这个方法同时也会向所有的 broker 发送 updateMeta 请求brokerRequestBatch.addLeaderAndIsrRequestForBrokers(List(replicaId),topic, partition, leaderIsrAndControllerEpoch,replicaAssignment, isNew = true)//对于新建的 Partition,处于这个状态时,该 Partition 是没有相应的 LeaderAndIsr 信息的case None => // new leader request will be sent to this replica when one gets elected}//将该 Replica 的状态转移成 NewReplica,然后结束流程。replicaState.put(partitionAndReplica, NewReplica)logStateChange()case ReplicaDeletionStarted => //其前置状态只能为 OfflineReplica//更新向该 Replica 的状态为 ReplicaDeletionStarted;replicaState.put(partitionAndReplica, ReplicaDeletionStarted)// send stop replica command//发送 StopReplica 请求给该副本,并设置 deletePartition=true//broker收到这请求后,会从物理存储上删除这个 Replica 的数据内容brokerRequestBatch.addStopReplicaRequestForBrokers(List(replicaId), topic, partition, deletePartition = true,callbacks.stopReplicaResponseCallback)logStateChange()case ReplicaDeletionIneligible => //其前置状态只能为 ReplicaDeletionStartedreplicaState.put(partitionAndReplica, ReplicaDeletionIneligible)logStateChange()case ReplicaDeletionSuccessful => //其前置状态只能为 ReplicaDeletionStartedreplicaState.put(partitionAndReplica, ReplicaDeletionSuccessful)logStateChange()case NonExistentReplica => //其前置状态只能为 ReplicaDeletionSuccessful。// NonExistentReplica 是副本完全删除、不存在这个副本的状态// remove this replica from the assigned replicas list for its partition//在 controller 的 partitionReplicaAssignment 删除这个 Partition 对应的 replica 信息;val currentAssignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)controllerContext.partitionReplicaAssignment.put(topicAndPartition, currentAssignedReplicas.filterNot(_ == replicaId))//将这个 Topic 从缓存中删除。replicaState.remove(partitionAndReplica)logStateChange()case OnlineReplica =>//其前置状态只能为 NewReplica, OnlineReplica, OfflineReplica, ReplicaDeletionIneligible//副本正常工作时的状态,此时的 Replica 既可以作为 leader 也可以作为 followerreplicaState(partitionAndReplica) match {case NewReplica => //其前置状态如果为 NewReplica// add this replica to the assigned replicas list for its partition//从 Controller 的 partitionReplicaAssignment 中获取这个 Partition 的 AR;val currentAssignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)//如果 Replica 不在 AR 中的话,那么就将其添加到 Partition 的 AR 中;if(!currentAssignedReplicas.contains(replicaId))controllerContext.partitionReplicaAssignment.put(topicAndPartition, currentAssignedReplicas :+ replicaId)logStateChange()case _ => //其前置状态如果为:OnlineReplica, OfflineReplica, ReplicaDeletionIneligible// check if the leader for this partition ever existed//如果该 Partition 的 LeaderIsrAndControllerEpoch 信息存在,那么就更新副本的状态,并发送相应的请求//否则不做任何处理;controllerContext.partitionLeadershipInfo.get(topicAndPartition) match {case Some(leaderIsrAndControllerEpoch) =>brokerRequestBatch.addLeaderAndIsrRequestForBrokers(List(replicaId), topic, partition, leaderIsrAndControllerEpoch,replicaAssignment)replicaState.put(partitionAndReplica, OnlineReplica)logStateChange()case None => // that means the partition was never in OnlinePartition state, this means the broker never// started a log for that partition and does not have a high watermark value for this partition}}//最后将 Replica 的状态设置为 OnlineReplica 状态。replicaState.put(partitionAndReplica, OnlineReplica)case OfflineReplica => //其前置状态只能为 NewReplica, OnlineReplica, OfflineReplica, ReplicaDeletionIneligible// send stop replica command to the replica so that it stops fetching from the leader//发送 StopReplica 请求给该副本,先停止副本同步 (deletePartition = false)brokerRequestBatch.addStopReplicaRequestForBrokers(List(replicaId), topic, partition, deletePartition = false)// As an optimization, the controller removes dead replicas from the ISRval leaderAndIsrIsEmpty: Boolean =controllerContext.partitionLeadershipInfo.get(topicAndPartition) match {case Some(_) =>//将该 replica 从 Partition 的 isr 移除这个 replica(前提 isr 中还有其他有效副本)controller.removeReplicaFromIsr(topic, partition, replicaId) match {case Some(updatedLeaderIsrAndControllerEpoch) =>// send the shrunk ISR state change request to all the remaining alive replicas of the partition.val currentAssignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)if (!controller.topicDeletionManager.isPartitionToBeDeleted(topicAndPartition)) {// 发送 LeaderAndIsr 请求给剩余的其他副本,因为 ISR 变动了brokerRequestBatch.addLeaderAndIsrRequestForBrokers(currentAssignedReplicas.filterNot(_ == replicaId),topic, partition, updatedLeaderIsrAndControllerEpoch, replicaAssignment)}//更新这个 Replica 的状态为 OfflineReplicareplicaState.put(partitionAndReplica, OfflineReplica)logStateChange()falsecase None =>true}case None =>true}if (leaderAndIsrIsEmpty && !controller.topicDeletionManager.isPartitionToBeDeleted(topicAndPartition))throw new StateChangeFailedException(s"Failed to change state of replica $replicaId for partition $topicAndPartition since the leader " +s"and isr path in zookeeper is empty")}}catch {case t: Throwable =>stateChangeLog.error(s"Initiated state change of replica $replicaId for partition $topicAndPartition from " +s"$currState to $targetState failed", t)}}

上面 Replica 各种转移的触发的条件:
在这里插入图片描述
在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/diannao/50659.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Vue常用指令及其生命周期

作者&#xff1a;CSDN-PleaSure乐事 欢迎大家阅读我的博客 希望大家喜欢 目录 1.常用指令 1.1 v-bind 1.2 v-model 注意事项 1.3 v-on 注意事项 1.4 v-if / v-else-if / v-else 1.5 v-show 1.6 v-for 无索引 有索引 生命周期 定义 流程 1.常用指令 Vue当中的指令…

远程项目调试-informer2020

informer2020 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting(原文&#xff09;Informer 是一个基于Transformer的模型&#xff0c;是为了应对长依赖关系而开发的。本文的主要主题是序列预测。序列预测可以在任何具有不断变化的数据的地方…

Linux——管理本地用户和组(详细介绍了Linux中用户和组的概念及用法)

目录 一、用户和组概念 &#xff08;一&#xff09;、用户的概念 &#xff08;二&#xff09;、组的概念 补充组 主要组 二、获取超级用户访问权限 &#xff08;一&#xff09;、su 命令和su -命令 &#xff08; 二&#xff09;、sudo命令 三、管理本地用户账户 &…

ERROR: Cannot find command ‘git’- do you have ‘git’ installed and in your PATH?

ERROR: Cannot find command ‘git’- do you have ‘git’ installed and in your PATH? 目录 ERROR: Cannot find command ‘git’- do you have ‘git’ installed and in your PATH? 【常见模块错误】 【解决方案】 欢迎来到英杰社区https://bbs.csdn.net/topics/61780…

Transformer自然语言处理实战pdf阅读

一.第一章 欢迎来到transformer的世界 1.解码器-编码器框架 在Transformer出现之前&#xff0c;NLP的最新技术是LSTM等循环架构。这些架 构通过在神经网络连接使用反馈循环&#xff0c;允许信息从一步传播到另一 步&#xff0c;使其成为对文本等序列数据进行建模的理想选择。如…

多模态大模型应用中的Q-Former是什么?

多模态大模型应用中的Q-Former是什么&#xff1f; Q-Former是一种新型的神经网络架构&#xff0c;专注于通过查询&#xff08;Query&#xff09;机制来改进信息检索和表示学习。在这篇博客中&#xff0c;我们将详细探讨Q-Former的工作原理、应用场景&#xff0c;并在必要时通过…

pyqt designer使用spliter

1、在designer界面需要使用spliter需要父界面不使用布局&#xff0c;减需要分割两个模块选中&#xff0c;再点击spliter分割 2、在分割后&#xff0c;再对父界面进行布局设置 3、对于两边需要不等比列放置的&#xff0c;需要套一层 group box在最外层进行分割

大数据学习之Flink基础

Flink基础 1、系统时间与时间时间 系统时间&#xff08;处理时间&#xff09; 在Sparksreaming的任务计算时&#xff0c;使用的是系统时间。 假设所用窗口为滚动窗口&#xff0c;大小为5分钟。那么每五分钟&#xff0c;都会对接收的数据进行提交任务. 但是&#xff0c;这里有…

GoogleCTF2023 Writeup

GoogleCTF2023 Writeup Misc NPC Crypto LEAST COMMON GENOMINATOR? Web UNDER-CONSTRUCTION NPC A friend handed me this map and told me that it will lead me to the flag. It is confusing me and I don’t know how to read it, can you help me out? Attach…

VSCode切换默认终端

我的VSCode默认终端为PowerShell&#xff0c;每次新建都会自动打开PowerShell。但是我想让每次都变为cmd&#xff0c;也就是Command Prompt 更改默认终端的操作方法如下&#xff1a; 键盘调出命令面板&#xff08;CtrlShiftP&#xff09;中,输入Terminal: Select Default Prof…

Java 中的Stream流

Stream流就像工厂中的流水线操作。 如何使用Stream&#xff1f; 1、首先要获取Stream流&#xff0c;那么如何获取呢? 对于不同的数据&#xff0c;有不同的获取方法。 ①单列集合 方法名说明default Stream<E> stream()Collection接口中的默认方法 所以实现了Colle…

Multi Range Read与Covering Index是如何优化回表的?

上篇文章末尾我们提出一个问题&#xff1a;有没有什么办法可以尽量避免回表或让回表的开销变小呢&#xff1f; 本篇文章围绕这个问题提出解决方案&#xff0c;一起来看看MySQL是如何优化的 回表 为什么会发生回表&#xff1f; 因为使用的索引并没有整条记录的所有信息&…

DataEase一键部署:轻松搭建数据可视化平台

DataEase是一个开源的数据可视化和分析工具&#xff0c;旨在帮助用户轻松创建和共享数据仪表盘。它支持多种数据源&#xff0c;包括关系型数据库&#xff0c;文件数据源&#xff0c;NoSQL数据库等&#xff0c;提供强大的数据查询、处理和可视化功能。DataEase 不仅是一款数据可…

VMware虚拟机中CentOS7自定义ip地址并且固定ip

配置固定ip(虚拟机) 前提&#xff1a;虚拟机网络配置成&#xff0c;自定义网络并选择VMnet8(NAT 模式) 操作(如下图)&#xff1a;点击虚拟机–》设置–》–》硬件–》网络适配器–》自定义&#xff1a;特定虚拟网络–》选择&#xff1a;VMnet8(NAT 模式) 虚拟机网络设置 需要记…

【漏洞复现】Jenkins CLI 接口任意文件读取漏洞(CVE-2024-23897)

漏洞简介 Jenkins是一款基于JAVA开发的开源自动化服务器。 Jenkins使用args4j来解析命令行输入&#xff0c;并支持通过HTTP、WebSocket等协议远程传入命令行参数。在args4j中&#xff0c;用户可以通过字符来加载任意文件&#xff0c;这导致攻击者可以通过该特性来读取服务器上…

论文快过(图像配准|Coarse_LoFTR_TRT)|适用于移动端的LoFTR算法的改进分析 1060显卡上45fps

项目地址&#xff1a;https://github.com/Kolkir/Coarse_LoFTR_TRT 创建时间&#xff1a;2022年 相关训练数据&#xff1a;BlendedMVS LoFTR [19]是一种有效的深度学习方法&#xff0c;可以在图像对上寻找合适的局部特征匹配。本文报道了该方法在低计算性能和有限内存条件下的…

【PyTorch】基于LSTM网络的气温预测模型实现

假设CSV文件名为temperature_data.csv&#xff0c;其前五行和标题如下&#xff1a; 这里&#xff0c;我们只使用Temperature列进行单步预测。以下是整合的代码示例&#xff1a; import pandas as pd import numpy as np import torch import torch.nn as nn import torch.op…

RocketMQ消息短暂而又精彩的一生(荣耀典藏版)

目录 前言 一、核心概念 二、消息诞生与发送 2.1.路由表 2.2.队列的选择 2.3.其它特殊情况处理 2.3.1.发送异常处理 2.3.2.消息过大的处理 三、消息存储 3.1.如何保证高性能读写 3.1.1.传统IO读写方式 3.2零拷贝 3.2.1.mmap() 3.2.2sendfile() 3.2.3.CommitLog …

Redis 7.x 系列【27】集群原理之通信机制

有道无术&#xff0c;术尚可求&#xff0c;有术无道&#xff0c;止于术。 本系列Redis 版本 7.2.5 源码地址&#xff1a;https://gitee.com/pearl-organization/study-redis-demo 文章目录 1. 概述2 节点和节点2.1 集群拓扑2.2 集群总线协议2.3 流言协议2.4 心跳机制2.5 节点握…

OpenGauss和GaussDB有何不同

OpenGauss和GaussDB是两个不同的数据库产品&#xff0c;它们都具有高性能、高可靠性和高可扩展性等优点&#xff0c;但是它们之间也有一些区别和相似之处。了解它们之间的关系、区别、建议、适用场景和如何学习&#xff0c;对于提高技能和保持行业敏感性非常重要。本文将深入探…