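Redis Source Code Analysis: 21 Sentinel (Part 2), Sending Periodic Messages and Detecting Subjective Down

VI: Sending periodic messages

At regular intervals, a Sentinel sends commands to every instance it monitors in order to learn their state. The commands are "PING", "INFO" and "PUBLISH".

"PING" is mainly used to probe whether an instance is alive. If the instance does not reply to "PING" within the configured time, the Sentinel considers it subjectively down.

"INFO" is mainly used to fetch the current state of an instance: whether it is currently a master or a slave; whether the IP address and port it reports match what this Sentinel has recorded; which slaves it has, if it is a master; and, if it is a slave, whether its link to the master is up, what its priority is, what its replication offset is, and so on. During failover, this information is essential for judging the state of each instance.

"PUBLISH" is mainly used by a Sentinel to publish information about itself and about the master to the instance's HELLO channel. This is the so-called HELLO message. Because every Sentinel subscribes to the HELLO channel of the master and of its slaves, each Sentinel receives what the other Sentinels publish.

As a result, even though only the master is listed in the configuration file, a Sentinel can learn about all slaves from the master's "INFO" reply, and it can learn about all other Sentinels monitoring the same master by subscribing to the HELLO channel and receiving what they publish with "PUBLISH".

 

In the "main function" sentinelHandleRedisInstance, these commands are sent by calling sentinelSendPeriodicCommands. Note that each command has its own sending period. sentinelSendPeriodicCommands does not send all three commands together; each call sends at most one command, namely the one that is due according to its period. The code of the function is as follows: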

void sentinelSendPeriodicCommands(sentinelRedisInstance *ri) {
    mstime_t now = mstime();
    mstime_t info_period, ping_period;
    int retval;

    /* Return ASAP if we have already a PING or INFO already pending, or
     * in the case the instance is not properly connected. */
    if (ri->flags & SRI_DISCONNECTED) return;

    /* For INFO, PING, PUBLISH that are not critical commands to send we
     * also have a limit of SENTINEL_MAX_PENDING_COMMANDS. We don't
     * want to use a lot of memory just because a link is not working
     * properly (note that anyway there is a redundant protection about this,
     * that is, the link will be disconnected and reconnected if a long
     * timeout condition is detected. */
    if (ri->pending_commands >= SENTINEL_MAX_PENDING_COMMANDS) return;

    /* If this is a slave of a master in O_DOWN condition we start sending
     * it INFO every second, instead of the usual SENTINEL_INFO_PERIOD
     * period. In this state we want to closely monitor slaves in case they
     * are turned into masters by another Sentinel, or by the sysadmin. */
    if ((ri->flags & SRI_SLAVE) &&
        (ri->master->flags & (SRI_O_DOWN|SRI_FAILOVER_IN_PROGRESS))) {
        info_period = 1000;
    } else {
        info_period = SENTINEL_INFO_PERIOD;
    }

    /* We ping instances every time the last received pong is older than
     * the configured 'down-after-milliseconds' time, but every second
     * anyway if 'down-after-milliseconds' is greater than 1 second. */
    ping_period = ri->down_after_period;
    if (ping_period > SENTINEL_PING_PERIOD) ping_period = SENTINEL_PING_PERIOD;

    if ((ri->flags & SRI_SENTINEL) == 0 &&
        (ri->info_refresh == 0 ||
        (now - ri->info_refresh) > info_period))
    {
        /* Send INFO to masters and slaves, not sentinels. */
        retval = redisAsyncCommand(ri->cc,
            sentinelInfoReplyCallback, NULL, "INFO");
        if (retval == REDIS_OK) ri->pending_commands++;
    } else if ((now - ri->last_pong_time) > ping_period) {
        /* Send PING to all the three kinds of instances. */
        sentinelSendPing(ri);
    } else if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD) {
        /* PUBLISH hello messages to all the three kinds of instances. */
        sentinelSendHello(ri);
    }
}

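If the SRI_DISCONNECTED flag is set in the instance's flags, the asynchronous context for this instance has not been set up yet, so the function returns immediately.

The instance's pending_commands attribute counts the commands that have been sent to the instance but have not yet received a reply. Each time redisAsyncCommand successfully sends a command to the instance asynchronously, the value is incremented; each time a reply arrives, it is decremented.

Therefore, if the value has reached SENTINEL_MAX_PENDING_COMMANDS (100) (the check in the code is >=), at least 100 replies from this instance are still outstanding. This indicates that the connection to the instance is no longer healthy, so, to save memory, the function returns without sending anything.

Next, info_period and ping_period are computed. They are the periods for sending "INFO" and "PING". If the time since the last "INFO" or "PING" reply exceeds info_period or ping_period respectively, the corresponding command is sent to the instance.

If the instance is a slave and its master is objectively down (SRI_O_DOWN) or has a failover in progress (SRI_FAILOVER_IN_PROGRESS), info_period is set to 1000; otherwise it is set to SENTINEL_INFO_PERIOD (10000). "INFO" is sent to slaves more frequently in this state because a slave may be promoted to master, so its state has to be tracked more closely.

ping_period is set to ri->down_after_period, whose value comes from the down-after-milliseconds option in the configuration file. If that value is greater than SENTINEL_PING_PERIOD (1000), ping_period is capped at SENTINEL_PING_PERIOD.

Then the commands are sent. If the instance is not a Sentinel, and either no "INFO" reply has been received yet (ri->info_refresh is 0) or more than info_period has passed since the last "INFO" reply, "INFO" is sent to the instance asynchronously.

Otherwise, if more than ping_period has passed since the last "PING" reply, sentinelSendPing is called to send "PING" to the instance asynchronously.

Otherwise, if more than SENTINEL_PUBLISH_PERIOD (2000) has passed since the last "PUBLISH" reply, sentinelSendHello is called to send "PUBLISH" to the instance asynchronously.

To summarize: "PING" probes whether an instance is alive and can be sent to every kind of instance. "INFO" fetches instance information and only needs to be sent to masters and slaves. "PUBLISH" publishes information about this Sentinel and the master to the HELLO channel; besides masters and slaves, Sentinel itself implements a handler for "PUBLISH", so this command is sent to Sentinel instances as well.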

 

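1: PING messages

The function sentinelSendPing sends the "PING" command to an instance. Because this command is used to detect whether an instance is subjectively down, it is analyzed later, in the section on subjective-down detection.

 

2: HELLO messages

The function sentinelSendHello publishes the HELLO message. Its code is as follows: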

int sentinelSendHello(sentinelRedisInstance *ri) {
    char ip[REDIS_IP_STR_LEN];
    char payload[REDIS_IP_STR_LEN+1024];
    int retval;
    char *announce_ip;
    int announce_port;
    sentinelRedisInstance *master = (ri->flags & SRI_MASTER) ? ri : ri->master;
    sentinelAddr *master_addr = sentinelGetCurrentMasterAddress(master);

    if (ri->flags & SRI_DISCONNECTED) return REDIS_ERR;

    /* Use the specified announce address if specified, otherwise try to
     * obtain our own IP address. */
    if (sentinel.announce_ip) {
        announce_ip = sentinel.announce_ip;
    } else {
        if (anetSockName(ri->cc->c.fd,ip,sizeof(ip),NULL) == -1)
            return REDIS_ERR;
        announce_ip = ip;
    }
    announce_port = sentinel.announce_port ?
                    sentinel.announce_port : server.port;

    /* Format and send the Hello message. */
    snprintf(payload,sizeof(payload),
        "%s,%d,%s,%llu," /* Info about this sentinel. */
        "%s,%s,%d,%llu", /* Info about current master. */
        announce_ip, announce_port, server.runid,
        (unsigned long long) sentinel.current_epoch,
        /* --- */
        master->name,master_addr->ip,master_addr->port,
        (unsigned long long) master->config_epoch);
    retval = redisAsyncCommand(ri->cc,
        sentinelPublishReplyCallback, NULL, "PUBLISH %s %s",
            SENTINEL_HELLO_CHANNEL,payload);
    if (retval != REDIS_OK) return REDIS_ERR;
    ri->pending_commands++;
    return REDIS_OK;
}

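First, the master instance that ri belongs to is obtained (ri itself if ri is a master, otherwise ri->master); then sentinelGetCurrentMasterAddress is called to get the master's address.

If the SRI_DISCONNECTED flag is set on ri, the function returns immediately.

If this Sentinel is configured with sentinel.announce_ip, that IP is used as its own address; otherwise anetSockName is called to obtain this Sentinel's IP address from the socket descriptor.

If this Sentinel is configured with sentinel.announce_port, that port is used as its own port; otherwise server.port is used.

Next, the HELLO message to publish is assembled. Its format is: "sentinel_ip,sentinel_port,sentinel_runid,current_epoch,master_name,master_ip,master_port,master_config_epoch"

Finally, the command "PUBLISH __sentinel__:hello <HELLO>" is sent to ri asynchronously, with sentinelPublishReplyCallback as the reply callback, and ri->pending_commands is incremented.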
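To make the eight-field layout concrete, here is a minimal sketch that formats a HELLO payload with snprintf. format_hello and its parameter names are hypothetical. Unlike sentinelSendHello, this sketch reports truncation to the caller and rejects a master name containing a comma; both checks are assumptions added for illustration, not behavior taken from the Redis source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a HELLO payload into buf.
 * Returns 0 on success, -1 if the payload would be malformed or truncated. */
int format_hello(char *buf, size_t buflen,
                 const char *sentinel_ip, int sentinel_port,
                 const char *sentinel_runid,
                 unsigned long long current_epoch,
                 const char *master_name, const char *master_ip,
                 int master_port, unsigned long long master_config_epoch) {
    assert(buf != NULL && buflen > 0);
    /* Illustrative guard: a comma inside a field would break tokenizing. */
    if (strchr(master_name, ',') != NULL) return -1;

    int n = snprintf(buf, buflen, "%s,%d,%s,%llu,%s,%s,%d,%llu",
                     sentinel_ip, sentinel_port, sentinel_runid, current_epoch,
                     master_name, master_ip, master_port, master_config_epoch);
    /* snprintf returns the length it wanted to write. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For example, a Sentinel at 10.0.0.1:26379 with run ID "abc" and current epoch 3, monitoring "mymaster" at 10.0.0.2:6379 with config epoch 2, would produce the payload "10.0.0.1,26379,abc,3,mymaster,10.0.0.2,6379,2".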

 

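When the Sentinel receives the instance's reply to this "PUBLISH" command, the callback sentinelPublishReplyCallback is invoked. It decrements ri->pending_commands and, unless the reply is an error, updates ri->last_pub_time; the rest of the reply content is ignored: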

void sentinelPublishReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
    sentinelRedisInstance *ri = c->data;
    redisReply *r;
    REDIS_NOTUSED(privdata);

    if (ri) ri->pending_commands--;
    if (!reply || !ri) return;
    r = reply;

    /* Only update pub_time if we actually published our message. Otherwise
     * we'll retry again in 100 milliseconds. */
    if (r->type != REDIS_REPLY_ERROR)
        ri->last_pub_time = mstime();
}

 

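As described earlier for sentinelReconnectInstance, when a Sentinel establishes the subscription connection to a master or slave, it sends "SUBSCRIBE __sentinel__:hello" to subscribe to the HELLO channel and registers sentinelReceiveHelloMessages as the callback for that command. Therefore, whenever a message is published on that channel, sentinelReceiveHelloMessages is called.

The messages on this channel are HELLO messages sent by the other Sentinels monitoring the same instance. Through them, this Sentinel discovers the other Sentinels, and the Sentinels exchange the latest information about the master. The code of sentinelReceiveHelloMessages is as follows: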

void sentinelReceiveHelloMessages(redisAsyncContext *c, void *reply, void *privdata) {
    sentinelRedisInstance *ri = c->data;
    redisReply *r;
    REDIS_NOTUSED(privdata);

    if (!reply || !ri) return;
    r = reply;

    /* Update the last activity in the pubsub channel. Note that since we
     * receive our messages as well this timestamp can be used to detect
     * if the link is probably disconnected even if it seems otherwise. */
    ri->pc_last_activity = mstime();

    /* Sanity check in the reply we expect, so that the code that follows
     * can avoid to check for details. */
    if (r->type != REDIS_REPLY_ARRAY ||
        r->elements != 3 ||
        r->element[0]->type != REDIS_REPLY_STRING ||
        r->element[1]->type != REDIS_REPLY_STRING ||
        r->element[2]->type != REDIS_REPLY_STRING ||
        strcmp(r->element[0]->str,"message") != 0) return;

    /* We are not interested in meeting ourselves */
    if (strstr(r->element[2]->str,server.runid) != NULL) return;

    sentinelProcessHelloMessage(r->element[2]->str, r->element[2]->len);
}

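The function first updates ri->pc_last_activity to the current time.

It then decides whether to process the received reply. Note that only "message" replies are processed; "subscribe" replies, for example, are not.

Note also that if the "message" payload contains this Sentinel's own runid, the message was published by this Sentinel itself, so it needs no processing and the function returns.

Finally, sentinelProcessHelloMessage is called to process the received HELLO message.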
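The filtering steps can be tried out without hiredis. The sketch below is an illustration under assumptions: should_process_hello is a hypothetical helper, and the reply is flattened into an array of C strings instead of a redisReply, so the per-element type checks of the real callback are reduced to NULL checks.

```c
#include <assert.h>
#include <string.h>

/* Decide whether a Pub/Sub reply should be handed to HELLO processing.
 * `elements` holds the reply's string elements (a hypothetical flattening
 * of the reply array). Returns 1 to process, 0 to ignore. */
int should_process_hello(const char *elements[], size_t count,
                         const char *my_runid) {
    assert(my_runid != NULL);
    if (elements == NULL || count != 3) return 0;
    for (size_t i = 0; i < count; i++)
        if (elements[i] == NULL) return 0;
    /* Only "message" replies carry a HELLO payload. */
    if (strcmp(elements[0], "message") != 0) return 0;
    /* A payload containing our own run ID was published by ourselves. */
    if (strstr(elements[2], my_runid) != NULL) return 0;
    return 1;
}
```

The self-check is a substring search over the whole payload, mirroring the strstr call in the callback, rather than a comparison against the third field only.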

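Note: during testing, duplicate HELLO messages were observed arriving from slaves, that is, two identical messages published by the same Sentinel at the same time. The reason is that the "PUBLISH" command a Sentinel sends to the master is propagated to the slaves through replication, while the same Sentinel also sends "PUBLISH" to each slave directly. A slave therefore receives two identical HELLO messages at about the same time and publishes both on the channel.

 

In addition, once a Sentinel has discovered other Sentinels, it can send "PUBLISH __sentinel__:hello <HELLO>" to them directly. Sentinel implements its own handler for "PUBLISH", sentinelPublishCommand, which is called when a HELLO message arrives directly from another Sentinel. The code of the function is as follows: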

void sentinelPublishCommand(redisClient *c) {
    if (strcmp(c->argv[1]->ptr,SENTINEL_HELLO_CHANNEL)) {
        addReplyError(c, "Only HELLO messages are accepted by Sentinel instances.");
        return;
    }
    sentinelProcessHelloMessage(c->argv[2]->ptr,sdslen(c->argv[2]->ptr));
    addReplyLongLong(c,1);
}


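So whether a HELLO message arrives over the real subscription channel or as a "PUBLISH" command sent directly by another Sentinel, it is ultimately processed by sentinelProcessHelloMessage. The code of the function is as follows: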

void sentinelProcessHelloMessage(char *hello, int hello_len) {
    /* Format is composed of 8 tokens:
     * 0=ip,1=port,2=runid,3=current_epoch,4=master_name,
     * 5=master_ip,6=master_port,7=master_config_epoch. */
    int numtokens, port, removed, master_port;
    uint64_t current_epoch, master_config_epoch;
    char **token = sdssplitlen(hello, hello_len, ",", 1, &numtokens);
    sentinelRedisInstance *si, *master;

    if (numtokens == 8) {
        /* Obtain a reference to the master this hello message is about */
        master = sentinelGetMasterByName(token[4]);
        if (!master) goto cleanup; /* Unknown master, skip the message. */

        /* First, try to see if we already have this sentinel. */
        port = atoi(token[1]);
        master_port = atoi(token[6]);
        si = getSentinelRedisInstanceByAddrAndRunID(
                        master->sentinels,token[0],port,token[2]);
        current_epoch = strtoull(token[3],NULL,10);
        master_config_epoch = strtoull(token[7],NULL,10);

        if (!si) {
            /* If not, remove all the sentinels that have the same runid
             * OR the same ip/port, because it's either a restart or a
             * network topology change. */
            removed = removeMatchingSentinelsFromMaster(master,token[0],port,
                            token[2]);
            if (removed) {
                sentinelEvent(REDIS_NOTICE,"-dup-sentinel",master,
                    "%@ #duplicate of %s:%d or %s",
                    token[0],port,token[2]);
            }

            /* Add the new sentinel. */
            si = createSentinelRedisInstance(NULL,SRI_SENTINEL,
                            token[0],port,master->quorum,master);
            if (si) {
                sentinelEvent(REDIS_NOTICE,"+sentinel",si,"%@");
                /* The runid is NULL after a new instance creation and
                 * for Sentinels we don't have a later chance to fill it,
                 * so do it now. */
                si->runid = sdsnew(token[2]);
                sentinelFlushConfig();
            }
        }

        /* Update local current_epoch if received current_epoch is greater.*/
        if (current_epoch > sentinel.current_epoch) {
            sentinel.current_epoch = current_epoch;
            sentinelFlushConfig();
            sentinelEvent(REDIS_WARNING,"+new-epoch",master,"%llu",
                (unsigned long long) sentinel.current_epoch);
        }

        /* Update master info if received configuration is newer. */
        if (master->config_epoch < master_config_epoch) {
            master->config_epoch = master_config_epoch;
            if (master_port != master->addr->port ||
                strcmp(master->addr->ip, token[5]))
            {
                sentinelAddr *old_addr;

                sentinelEvent(REDIS_WARNING,"+config-update-from",si,"%@");
                sentinelEvent(REDIS_WARNING,"+switch-master",
                    master,"%s %s %d %s %d",
                    master->name,
                    master->addr->ip, master->addr->port,
                    token[5], master_port);

                old_addr = dupSentinelAddr(master->addr);
                sentinelResetMasterAndChangeAddress(master, token[5], master_port);
                sentinelCallClientReconfScript(master,
                    SENTINEL_OBSERVER,"start",
                    old_addr,master->addr);
                releaseSentinelAddr(old_addr);
            }
        }

        /* Update the state of the Sentinel. */
        if (si) si->last_hello_time = mstime();
    }

cleanup:
    sdsfreesplitres(token,numtokens);
}

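First, using the master_name in the message, sentinelGetMasterByName looks up the corresponding master instance in the dictionary sentinel.masters. If it is not found, the function exits.

Then getSentinelRedisInstanceByAddrAndRunID is called with the sentinel_ip, sentinel_port and sentinel_runid from the message, to find in the dictionary master->sentinels a Sentinel instance whose runid, ip and port all match.

If no matching Sentinel instance is found, either this is a newly discovered Sentinel, or some known Sentinel's information has changed (for example, the Sentinel restarted and its runid changed, or the network topology changed and its ip or port changed).

In that case, removeMatchingSentinelsFromMaster is first called to delete from master->sentinels every Sentinel instance that has the same runid, or the same ip and port. Then a new Sentinel instance is created from the ip and port in the HELLO message and added to master->sentinels, so that the next call to sentinelReconnectInstance establishes the connection to that Sentinel.

Regardless of whether a matching Sentinel instance was found, if the current_epoch in the HELLO message is greater than this Sentinel's own current_epoch, this Sentinel's current_epoch is updated.

If the master_config_epoch in the HELLO message is greater than the config_epoch this Sentinel has recorded for the master, the recorded config_epoch is updated. If, in addition, the master_ip or master_port in the message differs from the recorded ip or port of the master, a failover has probably taken place and a slave has been promoted to be the new master, so sentinelResetMasterAndChangeAddress is called to reset the master and its slave instances.

Finally, si->last_hello_time is updated to the current time.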
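The tokenizing and the "adopt only if newer" epoch rule can be sketched as follows. This is a simplified illustration, not the Redis code: hello_msg, parse_hello and adopt_newer_epoch are hypothetical names, fixed-size buffers replace sds strings, and the lookup and update of Sentinel instances is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a HELLO message. */
typedef struct {
    char ip[64];
    int port;
    char runid[64];
    unsigned long long current_epoch;
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_config_epoch;
} hello_msg;

/* Copy src into dst (capacity cap); return 0 if it fits, -1 otherwise. */
static int copy_field(char *dst, size_t cap, const char *src) {
    size_t len = strlen(src);
    if (len >= cap) return -1;
    memcpy(dst, src, len + 1);
    return 0;
}

/* Parse "ip,port,runid,epoch,name,master_ip,master_port,master_epoch".
 * Returns 0 on success, -1 unless there are exactly 8 tokens that fit. */
int parse_hello(const char *hello, hello_msg *out) {
    assert(hello != NULL && out != NULL);
    char buf[512];
    char *tok[8];
    int n = 0;

    if (copy_field(buf, sizeof(buf), hello) != 0) return -1;
    char *p = buf;
    for (;;) {
        if (n == 8) return -1;            /* a ninth token: reject */
        tok[n++] = p;
        char *comma = strchr(p, ',');
        if (comma == NULL) break;
        *comma = '\0';
        p = comma + 1;
    }
    if (n != 8) return -1;

    if (copy_field(out->ip, sizeof(out->ip), tok[0]) != 0 ||
        copy_field(out->runid, sizeof(out->runid), tok[2]) != 0 ||
        copy_field(out->master_name, sizeof(out->master_name), tok[4]) != 0 ||
        copy_field(out->master_ip, sizeof(out->master_ip), tok[5]) != 0)
        return -1;
    out->port = atoi(tok[1]);
    out->current_epoch = strtoull(tok[3], NULL, 10);
    out->master_port = atoi(tok[6]);
    out->master_config_epoch = strtoull(tok[7], NULL, 10);
    return 0;
}

/* Adopt the received epoch only if it is newer; returns 1 if updated. */
int adopt_newer_epoch(unsigned long long *local, unsigned long long received) {
    assert(local != NULL);
    if (received > *local) { *local = received; return 1; }
    return 0;
}
```

The same comparison applies to both epochs in the message: current_epoch is compared against the Sentinel's own value, and master_config_epoch against the value recorded for the master.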

 

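3: The "INFO" command

"INFO" is mainly used by a Sentinel to fetch the current state of master and slave instances: whether the instance is currently a master or a slave; whether the IP address and port it reports match what this Sentinel has recorded; which slaves it has, if it is a master; and, if it is a slave, whether its link to the master is up, what its priority is, what its replication offset is, and so on. During failover, this information is essential for judging the state of each instance.

In sentinelSendPeriodicCommands, the callback registered for "INFO" is sentinelInfoReplyCallback. That function is simple; it mainly calls sentinelRefreshInstanceInfo to process the reply. So the function to look at is sentinelRefreshInstanceInfo: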

void sentinelRefreshInstanceInfo(sentinelRedisInstance *ri, const char *info) {
    sds *lines;
    int numlines, j;
    int role = 0;

    /* The following fields must be reset to a given value in the case they
     * are not found at all in the INFO output. */
    ri->master_link_down_time = 0;

    /* Process line by line. */
    lines = sdssplitlen(info,strlen(info),"\r\n",2,&numlines);
    for (j = 0; j < numlines; j++) {
        sentinelRedisInstance *slave;
        sds l = lines[j];

        /* run_id:<40 hex chars>*/
        if (sdslen(l) >= 47 && !memcmp(l,"run_id:",7)) {
            if (ri->runid == NULL) {
                ri->runid = sdsnewlen(l+7,40);
            } else {
                if (strncmp(ri->runid,l+7,40) != 0) {
                    sentinelEvent(REDIS_NOTICE,"+reboot",ri,"%@");
                    sdsfree(ri->runid);
                    ri->runid = sdsnewlen(l+7,40);
                }
            }
        }

        /* old versions: slave0:<ip>,<port>,<state>
         * new versions: slave0:ip=127.0.0.1,port=9999,... */
        if ((ri->flags & SRI_MASTER) &&
            sdslen(l) >= 7 &&
            !memcmp(l,"slave",5) && isdigit(l[5]))
        {
            char *ip, *port, *end;

            if (strstr(l,"ip=") == NULL) {
                /* Old format. */
                ip = strchr(l,':'); if (!ip) continue;
                ip++; /* Now ip points to start of ip address. */
                port = strchr(ip,','); if (!port) continue;
                *port = '\0'; /* nul term for easy access. */
                port++; /* Now port points to start of port number. */
                end = strchr(port,','); if (!end) continue;
                *end = '\0'; /* nul term for easy access. */
            } else {
                /* New format. */
                ip = strstr(l,"ip="); if (!ip) continue;
                ip += 3; /* Now ip points to start of ip address. */
                port = strstr(l,"port="); if (!port) continue;
                port += 5; /* Now port points to start of port number. */
                /* Nul term both fields for easy access. */
                end = strchr(ip,','); if (end) *end = '\0';
                end = strchr(port,','); if (end) *end = '\0';
            }

            /* Check if we already have this slave into our table,
             * otherwise add it. */
            if (sentinelRedisInstanceLookupSlave(ri,ip,atoi(port)) == NULL) {
                if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,ip,
                            atoi(port), ri->quorum, ri)) != NULL)
                {
                    sentinelEvent(REDIS_NOTICE,"+slave",slave,"%@");
                    sentinelFlushConfig();
                }
            }
        }

        /* master_link_down_since_seconds:<seconds> */
        if (sdslen(l) >= 32 &&
            !memcmp(l,"master_link_down_since_seconds",30))
        {
            ri->master_link_down_time = strtoll(l+31,NULL,10)*1000;
        }

        /* role:<role> */
        if (!memcmp(l,"role:master",11)) role = SRI_MASTER;
        else if (!memcmp(l,"role:slave",10)) role = SRI_SLAVE;

        if (role == SRI_SLAVE) {
            /* master_host:<host> */
            if (sdslen(l) >= 12 && !memcmp(l,"master_host:",12)) {
                if (ri->slave_master_host == NULL ||
                    strcasecmp(l+12,ri->slave_master_host))
                {
                    sdsfree(ri->slave_master_host);
                    ri->slave_master_host = sdsnew(l+12);
                    ri->slave_conf_change_time = mstime();
                }
            }

            /* master_port:<port> */
            if (sdslen(l) >= 12 && !memcmp(l,"master_port:",12)) {
                int slave_master_port = atoi(l+12);

                if (ri->slave_master_port != slave_master_port) {
                    ri->slave_master_port = slave_master_port;
                    ri->slave_conf_change_time = mstime();
                }
            }

            /* master_link_status:<status> */
            if (sdslen(l) >= 19 && !memcmp(l,"master_link_status:",19)) {
                ri->slave_master_link_status =
                    (strcasecmp(l+19,"up") == 0) ?
                    SENTINEL_MASTER_LINK_STATUS_UP :
                    SENTINEL_MASTER_LINK_STATUS_DOWN;
            }

            /* slave_priority:<priority> */
            if (sdslen(l) >= 15 && !memcmp(l,"slave_priority:",15))
                ri->slave_priority = atoi(l+15);

            /* slave_repl_offset:<offset> */
            if (sdslen(l) >= 18 && !memcmp(l,"slave_repl_offset:",18))
                ri->slave_repl_offset = strtoull(l+18,NULL,10);
        }
    }
    ri->info_refresh = mstime();
    sdsfreesplitres(lines,numlines);

    /* ---------------------------- Acting half -----------------------------
     * Some things will not happen if sentinel.tilt is true, but some will
     * still be processed. */

    /* Remember when the role changed. */
    if (role != ri->role_reported) {
        ri->role_reported_time = mstime();
        ri->role_reported = role;
        if (role == SRI_SLAVE) ri->slave_conf_change_time = mstime();
        /* Log the event with +role-change if the new role is coherent or
         * with -role-change if there is a mismatch with the current config. */
        sentinelEvent(REDIS_VERBOSE,
            ((ri->flags & (SRI_MASTER|SRI_SLAVE)) == role) ?
            "+role-change" : "-role-change",
            ri, "%@ new reported role is %s",
            role == SRI_MASTER ? "master" : "slave",
            ri->flags & SRI_MASTER ? "master" : "slave");
    }

    /* None of the following conditions are processed when in tilt mode, so
     * return asap. */
    if (sentinel.tilt) return;

    /* Handle master -> slave role switch. */
    if ((ri->flags & SRI_MASTER) && role == SRI_SLAVE) {
        /* Nothing to do, but masters claiming to be slaves are
         * considered to be unreachable by Sentinel, so eventually
         * a failover will be triggered. */
    }
    ...
}

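The function first parses the "INFO" reply in the for loop.

It extracts the value after "run_id" and stores it in ri->runid. If the instance's runid has changed, a "+reboot" event is also logged and published.

If the instance is a master, the "slave" lines are parsed to extract each slave's ip and port, and sentinelRedisInstanceLookupSlave is called with that ip and port to check whether the slave is already stored in the dictionary ri->slaves. If it is not, createSentinelRedisInstance is called to create the slave instance and insert it into ri->slaves. This is how the slaves of a master are discovered; the next call to sentinelReconnectInstance then establishes the connection to the new slave.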
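As an illustration of the slave-line parsing, here is a minimal sketch for the newer "slaveN:ip=...,port=...,..." format only. parse_slave_line is a hypothetical helper. Unlike the original, which writes NUL terminators into the line in place, this version copies the IP into a caller-supplied buffer, and it does not handle the old "slave0:<ip>,<port>,<state>" format.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Extract ip and port from a new-format "slaveN:ip=...,port=...,..." line.
 * Returns 1 on success, 0 if the line is not such a slave line. */
int parse_slave_line(const char *line, char *ip, size_t iplen, int *port) {
    assert(line != NULL && ip != NULL && port != NULL);
    /* Must look like "slave<digit>...", not e.g. "slave_priority:...". */
    if (strlen(line) < 7 || memcmp(line, "slave", 5) != 0 ||
        !isdigit((unsigned char)line[5])) return 0;

    const char *ipstart = strstr(line, "ip=");
    const char *portstart = strstr(line, "port=");
    if (ipstart == NULL || portstart == NULL) return 0;
    ipstart += 3;
    portstart += 5;

    size_t len = strcspn(ipstart, ",");  /* ip ends at the next comma */
    if (len == 0 || len >= iplen) return 0;
    memcpy(ip, ipstart, len);
    ip[len] = '\0';
    *port = atoi(portstart);             /* atoi stops at the comma */
    return 1;
}
```

The isdigit check on the sixth character is what separates "slave0:..." lines from other keys that merely start with "slave", such as "slave_priority".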

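The "master_link_down_since_seconds" field is parsed; it tells how long the slave's link to the master has been down. The value is converted to an integer, multiplied by 1000 to obtain milliseconds, and stored in ri->master_link_down_time.

The "role" field is parsed. If the line is "role:master", role is set to SRI_MASTER, meaning the instance reports itself as a master; if it is "role:slave", role is set to SRI_SLAVE, meaning the instance reports itself as a slave.

If role is SRI_SLAVE: the "master_host:" value is stored in ri->slave_master_host; the "master_port:" value is stored in ri->slave_master_port; the "master_link_status:" value is stored in ri->slave_master_link_status according to whether it is "up"; the "slave_priority:" value is stored in ri->slave_priority; and the "slave_repl_offset:" value is stored in ri->slave_repl_offset.

After the whole "INFO" reply has been parsed, ri->info_refresh is updated to the current time.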
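The line-by-line parsing can be sketched for a few of these fields. This is an illustration under assumptions: info_summary, value_of and parse_info are hypothetical names, only four fields are extracted, and lines are split on "\r\n" with strcspn instead of sdssplitlen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ROLE_UNKNOWN = 0, ROLE_MASTER, ROLE_SLAVE } demo_role;

/* Hypothetical subset of the fields Sentinel keeps per instance. */
typedef struct {
    demo_role role;
    long long master_link_down_ms;      /* seconds from INFO, times 1000 */
    int slave_priority;
    unsigned long long slave_repl_offset;
} info_summary;

/* Return the value part if `line` starts with `key`, else NULL. */
static const char *value_of(const char *line, const char *key) {
    size_t klen = strlen(key);
    return strncmp(line, key, klen) == 0 ? line + klen : NULL;
}

/* Parse an INFO reply whose lines are separated by "\r\n". */
void parse_info(const char *info, info_summary *out) {
    assert(info != NULL && out != NULL);
    memset(out, 0, sizeof(*out));        /* fields absent from INFO stay 0 */
    while (*info != '\0') {
        size_t len = strcspn(info, "\r\n");
        char line[256];
        if (len < sizeof(line)) {        /* overlong lines are skipped */
            const char *v;
            memcpy(line, info, len);
            line[len] = '\0';
            if (strcmp(line, "role:master") == 0) out->role = ROLE_MASTER;
            else if (strcmp(line, "role:slave") == 0) out->role = ROLE_SLAVE;
            if ((v = value_of(line, "master_link_down_since_seconds:")) != NULL)
                out->master_link_down_ms = strtoll(v, NULL, 10) * 1000;
            if (out->role == ROLE_SLAVE) {
                if ((v = value_of(line, "slave_priority:")) != NULL)
                    out->slave_priority = atoi(v);
                if ((v = value_of(line, "slave_repl_offset:")) != NULL)
                    out->slave_repl_offset = strtoull(v, NULL, 10);
            }
        }
        info += len;
        info += strspn(info, "\r\n");    /* skip the line terminator */
    }
}
```

As in the original loop, the slave-only fields are read only after a "role:slave" line has been seen, so the sketch relies on the role line appearing before them in the reply.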

        

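Next, some actions are taken based on the instance's role.

The initial value of ri->role_reported is derived from ri->flags. If the role parsed from the "INFO" reply differs from ri->role_reported, the instance's role has changed, for example from master to slave or the other way round. Whenever role differs from ri->role_reported, ri->role_reported_time is updated to the current time and ri->role_reported is set to role; if role is SRI_SLAVE, ri->slave_conf_change_time is also updated to the current time. Finally, an event is logged and published: "+role-change" if the reported role agrees with the role in ri->flags, and "-role-change" if it does not.

If this Sentinel is in TILT mode, the function returns at this point.

If ri->flags says the instance is a master but role is slave, no action is needed: a master claiming to be a slave is treated as unreachable, which eventually triggers a failover.

The remaining actions in this function relate to the failover process and are covered later.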

 

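VII: Determining whether an instance is subjectively down

First, the difference between subjectively down and objectively down.

Subjectively down means that, from "my" point of view (that of the current Sentinel), an instance is down. But the view of a single Sentinel can be wrong, and deciding that an instance is down from "my" view alone would be arbitrary. So "I" also ask the other Sentinels, with a command, whether they too consider the instance down. If at least quorum Sentinels (including "me") report that the instance is down, "I" conclude that it really is down; this is what objectively down means.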
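The quorum rule can be written down as a small counting function. This is only a sketch of the rule as stated above, with hypothetical names (is_objectively_down and its parameters); how Sentinel actually collects the other Sentinels' opinions is not shown in this part of the article.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if at least `quorum` Sentinels, this one included, consider the
 * instance down. others_say_down[i] is non-zero when the i-th other
 * Sentinel reported the instance as down. */
int is_objectively_down(int i_say_down, const int *others_say_down,
                        size_t n_others, unsigned int quorum) {
    assert(others_say_down != NULL || n_others == 0);
    if (!i_say_down) return 0;   /* our own subjective view comes first */
    unsigned int votes = 1;      /* count ourselves */
    for (size_t i = 0; i < n_others; i++)
        if (others_say_down[i]) votes++;
    return votes >= quorum;
}
```

With a quorum of 2, for instance, this Sentinel's own vote plus one agreeing Sentinel is enough.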

 

Whether an instance is subjectively down is decided mainly by whether it replies to "PING" in time. So first look at sentinelSendPing, the function that sends "PING":

int sentinelSendPing(sentinelRedisInstance *ri) {
    int retval = redisAsyncCommand(ri->cc,
        sentinelPingReplyCallback, NULL, "PING");
    if (retval == REDIS_OK) {
        ri->pending_commands++;
        /* We update the ping time only if we received the pong for
         * the previous ping, otherwise we are technically waiting
         * since the first ping that did not received a reply. */
        if (ri->last_ping_time == 0) ri->last_ping_time = mstime();
        return 1;
    } else {
        return 0;
    }
}

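This function registers sentinelPingReplyCallback as the callback for the "PING" reply.

Note that ri->last_ping_time is set to the current time only if its value is 0, and it is reset to 0 only after an acceptable reply to "PING" has been received.

 

Here is the code of the callback sentinelPingReplyCallback: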

void sentinelPingReplyCallback(redisAsyncContext *c, void *reply, void *privdata) {
    sentinelRedisInstance *ri = c->data;
    redisReply *r;
    REDIS_NOTUSED(privdata);

    if (ri) ri->pending_commands--;
    if (!reply || !ri) return;
    r = reply;

    if (r->type == REDIS_REPLY_STATUS ||
        r->type == REDIS_REPLY_ERROR) {
        /* Update the "instance available" field only if this is an
         * acceptable reply. */
        if (strncmp(r->str,"PONG",4) == 0 ||
            strncmp(r->str,"LOADING",7) == 0 ||
            strncmp(r->str,"MASTERDOWN",10) == 0)
        {
            ri->last_avail_time = mstime();
            ri->last_ping_time = 0; /* Flag the pong as received. */
        } else {
            /* Send a SCRIPT KILL command if the instance appears to be
             * down because of a busy script. */
            if (strncmp(r->str,"BUSY",4) == 0 &&
                (ri->flags & SRI_S_DOWN) &&
                !(ri->flags & SRI_SCRIPT_KILL_SENT))
            {
                if (redisAsyncCommand(ri->cc,
                        sentinelDiscardReplyCallback, NULL,
                        "SCRIPT KILL") == REDIS_OK)
                    ri->pending_commands++;
                ri->flags |= SRI_SCRIPT_KILL_SENT;
            }
        }
    }
    ri->last_pong_time = mstime();
}

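If the reply starts with "PONG", "LOADING" or "MASTERDOWN", it counts as an acceptable reply: ri->last_avail_time is set to the current time and ri->last_ping_time is set to 0, so that the next "PING" that is sent updates ri->last_ping_time again.

If the reply starts with "BUSY", the instance is already flagged as subjectively down, and no "SCRIPT KILL" has been sent to it yet, a "SCRIPT KILL" command is sent to the instance.

Finally, whatever the reply is, ri->last_pong_time is updated to the current time.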
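The two decisions made on the reply text can be isolated into small predicates. This is a sketch with hypothetical function names; the prefix comparisons are the same strncmp checks used in the callback above, applied to a plain C string instead of a redisReply.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if a PING reply counts as proof that the instance is available. */
int is_acceptable_ping_reply(const char *reply) {
    assert(reply != NULL);
    return strncmp(reply, "PONG", 4) == 0 ||
           strncmp(reply, "LOADING", 7) == 0 ||
           strncmp(reply, "MASTERDOWN", 10) == 0;
}

/* Return 1 if SCRIPT KILL should be sent: the reply starts with BUSY,
 * the instance is already S_DOWN, and no SCRIPT KILL was sent before. */
int should_send_script_kill(const char *reply, int is_sdown, int kill_sent) {
    assert(reply != NULL);
    return strncmp(reply, "BUSY", 4) == 0 && is_sdown && !kill_sent;
}
```

Because the comparison is by prefix, an error reply whose text begins with "LOADING" is accepted regardless of the explanatory text that follows.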

 

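The time attributes related to "PING" can therefore be summarized as follows:

ri->last_ping_time: the time at which the oldest still-unanswered "PING" was sent. The attribute is set to the current timestamp when a "PING" is sent only if an acceptable reply to the previous "PING" has been received. If no reply, or no acceptable reply, has arrived after a "PING" was sent, the next "PING" does not update it. A value of 0 means that an acceptable reply to the last "PING" has been received and the next "PING" has not been sent yet. Subjective-down detection is based mainly on this attribute.

ri->last_pong_time: updated to the current timestamp whenever any reply to "PING" is received, whether acceptable or not.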

 

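In Sentinel's "main function" sentinelHandleRedisInstance, sentinelCheckSubjectivelyDown is called to check whether an instance is subjectively down. The same function also checks whether the TCP connections are healthy. Its code is as follows: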

void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    mstime_t elapsed = 0;

    if (ri->last_ping_time)
        elapsed = mstime() - ri->last_ping_time;

    /* Check if we are in need for a reconnection of one of the
     * links, because we are detecting low activity.
     *
     * 1) Check if the command link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have a
     *    pending ping for more than half the timeout. */
    if (ri->cc &&
        (mstime() - ri->cc_conn_time) > SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        ri->last_ping_time != 0 && /* Ther is a pending ping... */
        /* The pending ping is delayed, and we did not received
         * error replies as well. */
        (mstime() - ri->last_ping_time) > (ri->down_after_period/2) &&
        (mstime() - ri->last_pong_time) > (ri->down_after_period/2))
    {
        sentinelKillLink(ri,ri->cc);
    }

    /* 2) Check if the pubsub link seems connected, was connected not less
     *    than SENTINEL_MIN_LINK_RECONNECT_PERIOD, but still we have no
     *    activity in the Pub/Sub channel for more than
     *    SENTINEL_PUBLISH_PERIOD * 3.
     */
    if (ri->pc &&
        (mstime() - ri->pc_conn_time) > SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
        (mstime() - ri->pc_last_activity) > (SENTINEL_PUBLISH_PERIOD*3))
    {
        sentinelKillLink(ri,ri->pc);
    }

    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            sentinelEvent(REDIS_WARNING,"+sdown",ri,"%@");
            ri->s_down_since_time = mstime();
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            sentinelEvent(REDIS_WARNING,"-sdown",ri,"%@");
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}

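The ri->cc_conn_time attribute is the time at which the command-type TCP connection to the instance was last initiated; ri->pc_conn_time is the time at which the subscription-type TCP connection to the instance was last initiated.

First, elapsed is computed: it is the difference between the current time and ri->last_ping_time, or 0 if ri->last_ping_time is 0 (no "PING" is pending).

Then the command-type TCP connection is checked. It is considered unhealthy when all of the following hold: more than SENTINEL_MIN_LINK_RECONNECT_PERIOD has passed since the connection was made; a "PING" has been sent that has not yet received an acceptable reply; the time since ri->last_ping_time exceeds ri->down_after_period/2; and the time since any "PING" reply was last received also exceeds ri->down_after_period/2.

If the command connection is unhealthy, sentinelKillLink is called to close it and release the asynchronous context.

 

Then the subscription-type TCP connection is checked. It is considered unhealthy when more than SENTINEL_MIN_LINK_RECONNECT_PERIOD has passed since the connection was made and more than SENTINEL_PUBLISH_PERIOD*3 has passed since any message was last received on the subscribed channel.

If the subscription connection is unhealthy, sentinelKillLink is called to close it and release the asynchronous context.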
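Both link checks reduce to pure predicates over timestamps, sketched below with hypothetical function names. The value 15000 for the minimum reconnect period is an assumption made for this example, since the text does not quote the value of SENTINEL_MIN_LINK_RECONNECT_PERIOD; 2000 is the SENTINEL_PUBLISH_PERIOD value quoted earlier.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_LINK_RECONNECT_PERIOD_MS 15000 /* assumed value, not from the text */
#define PUBLISH_PERIOD_MS 2000             /* value quoted in the text */

/* Return 1 if the command link should be killed and reconnected. */
int command_link_is_stale(int64_t now, int64_t cc_conn_time,
                          int64_t last_ping_time, int64_t last_pong_time,
                          int64_t down_after_period) {
    assert(down_after_period > 0);
    return (now - cc_conn_time) > MIN_LINK_RECONNECT_PERIOD_MS &&
           last_ping_time != 0 &&                          /* PING pending */
           (now - last_ping_time) > down_after_period / 2 &&
           (now - last_pong_time) > down_after_period / 2; /* no reply at all */
}

/* Return 1 if the Pub/Sub link should be killed and reconnected. */
int pubsub_link_is_stale(int64_t now, int64_t pc_conn_time,
                         int64_t pc_last_activity) {
    return (now - pc_conn_time) > MIN_LINK_RECONNECT_PERIOD_MS &&
           (now - pc_last_activity) > PUBLISH_PERIOD_MS * 3;
}
```

The requirement that the link be older than the minimum reconnect period keeps a freshly established connection from being torn down again before it has had a chance to show activity.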

 

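The instance is considered subjectively down if either of the following conditions holds: elapsed is greater than ri->down_after_period; or this Sentinel believes the instance is a master, but its "INFO" reply reports it as a slave, and more than ri->down_after_period+SENTINEL_INFO_PERIOD*2 has passed since its reported role last changed (ri->role_reported_time).

In that case, if the instance is not yet flagged as subjectively down, the SRI_S_DOWN flag is added to its flags and ri->s_down_since_time is set to the current time.

If neither condition holds but the instance was previously flagged as subjectively down, it is considered subjectively up again, and the SRI_S_DOWN and SRI_SCRIPT_KILL_SENT flags are cleared.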

Reposted from: https://www.cnblogs.com/gqtcgq/p/7247048.html
