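What is Redis Sentinel
Reference: https://redis.io/topics/sentinel
In short, Redis Sentinel provides high availability for Redis. It does so in four ways:
1. Monitoring: Sentinel continuously checks whether the master and replica (slave) servers are working as expected.
2. Notification: when a monitored instance has a problem, Sentinel can notify the system administrator or other programs through an API.
3. Automatic failover: if the master fails, Sentinel can start a failover that promotes one of the replicas to master and reconfigures the remaining replicas to replicate from the new master.
4. Configuration provider: Sentinel acts as the source of authority for client service discovery. Clients connect to Sentinel and ask for the address of the current Redis master for a given service name; after a failover, Sentinel reports the new address.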
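To make the "configuration provider" role concrete, here is a minimal Python sketch of how a client might ask a Sentinel for the current master address using the `SENTINEL get-master-addr-by-name` command. The helper names (`encode_command`, `parse_master_addr`, `get_master_addr`) are made up for this illustration, and the parser assumes the whole reply arrives in a single `recv`, which is a simplification; a real client library handles this more carefully.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_master_addr(reply: bytes):
    """Parse a two-element array reply into (host, port).

    Returns None for any other reply, e.g. the nil reply Sentinel
    sends when it does not know the requested master name.
    """
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None
    # Layout: *2, $<len>, host, $<len>, port, ''
    return lines[2].decode(), int(lines[4])


def get_master_addr(sentinel_host: str, sentinel_port: int, master_name: str):
    """Ask one Sentinel for the current master (hypothetical helper).

    Assumption: the reply fits in one recv() call.
    """
    with socket.create_connection((sentinel_host, sentinel_port), timeout=1.0) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return parse_master_addr(conn.recv(4096))
```

With the environment described in this post, a call such as `get_master_addr("127.0.0.1", 26379, "mymaster")` would be expected to return the address of whichever instance Sentinel currently considers the master.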
Redis Sentinel test environment
The test environment has one master and two replicas, each paired with its own Sentinel:
    role      redis             sentinel
    master    127.0.0.1 6379    127.0.0.1 26379
    slave1    127.0.0.1 6380    127.0.0.1 26380
    slave2    127.0.0.1 6381    127.0.0.1 26381
Setting up the environment
redis.conf configuration
6379
    # cat redis-6379.conf | grep -Ev "^$|^#"
    bind 127.0.0.1
    port 6379
    daemonize yes
    pidfile /var/run/redis_6379.pid
    logfile "/root/redis/redis-6379.log"
    dbfilename dump-6379.rdb
    dir /root/redis
    ...
6380
    # cat redis-6380.conf | grep -Ev "^$|^#"
    bind 127.0.0.1
    port 6380
    daemonize yes
    pidfile /var/run/redis_6380.pid
    logfile "/root/redis/redis-6380.log"
    dbfilename dump-6380.rdb
    dir /root/redis
    ...
6381
    # cat redis-6381.conf | grep -Ev "^$|^#"
    bind 127.0.0.1
    port 6381
    daemonize yes
    pidfile /var/run/redis_6381.pid
    logfile "/root/redis/redis-6381.log"
    dbfilename dump-6381.rdb
    dir /root/redis
    ...
sentinel.conf configuration
6379/6380/6381 (only the 26379 instance is shown; going by the environment table and the log file names, the other two differ in port and logfile)
    # cat sentinel-*.conf | grep -Ev "^#|^$"
    port 26379
    daemonize yes
    logfile "/root/redis/sentinel-6379.log"
    dir "/tmp"
    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 30000
    sentinel parallel-syncs mymaster 1
    sentinel failover-timeout mymaster 180000
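Since the three Sentinel files share everything except the port and log file, they can be generated from one template. The Python sketch below does that with the directives shown above. The names `SENTINEL_TEMPLATE` and `render_sentinel_conf` are illustrative, and the rule "sentinel port = 20000 + redis port" is an assumption read off the environment table, not something the configuration itself requires.

```python
# Template built from the directives shown in the post.
# The trailing "2" on the monitor line is the quorum: the number of
# Sentinels that must agree the master is unreachable before it is
# flagged as objectively down.
SENTINEL_TEMPLATE = """\
port {sentinel_port}
daemonize yes
logfile "/root/redis/sentinel-{redis_port}.log"
dir "/tmp"
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""


def render_sentinel_conf(redis_port: int) -> str:
    """Render one sentinel config; assumes sentinel port = 20000 + redis port."""
    return SENTINEL_TEMPLATE.format(
        sentinel_port=20000 + redis_port,
        redis_port=redis_port,
    )


# One config text per instance, keyed by a hypothetical file name.
configs = {
    f"sentinel-{port}.conf": render_sentinel_conf(port)
    for port in (6379, 6380, 6381)
}
```

All three files monitor the same master (127.0.0.1 6379) under the same name, which is what lets the Sentinels discover each other and vote together.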
Starting the Redis servers and Sentinels
redis:
    # redis-server /etc/redis_6379.conf
    # redis-server /etc/redis_6380.conf
    # redis-server /etc/redis_6381.conf
sentinel:
    # redis-sentinel /etc/sentinel-6379.conf
    # redis-sentinel /etc/sentinel-6380.conf
    # redis-sentinel /etc/sentinel-6381.conf
Configuring master-replica replication
    # redis-cli -p 6380
    127.0.0.1:6380> SLAVEOF 127.0.0.1 6379
    OK
    127.0.0.1:6380> exit
    # redis-cli -p 6381
    127.0.0.1:6381> SLAVEOF 127.0.0.1 6379
    OK
    127.0.0.1:6381> exit
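One way to confirm that replication is actually in place is to look at the output of `INFO replication` on each replica. The Python sketch below parses that key:value text and checks the relevant fields. The function names are illustrative, and the `sample` text is hand-written for this example rather than captured from the environment in this post.

```python
def parse_info(text: str) -> dict:
    """Parse INFO-style output ("key:value" lines, "# Section" headers)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def is_replica_of(info: dict, host: str, port: int) -> bool:
    """True if the parsed INFO says this node is a connected replica of host:port."""
    return (
        info.get("role") == "slave"
        and info.get("master_host") == host
        and info.get("master_port") == str(port)
        and info.get("master_link_status") == "up"
    )


# Hand-written sample, NOT real output from the author's servers.
sample = """# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
"""

replica_ok = is_replica_of(parse_info(sample), "127.0.0.1", 6379)
```

In practice the text would come from something like `redis-cli -p 6380 INFO replication`; a `master_link_status` of `down` would indicate that the replica is configured but not currently connected to its master.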
Simulating a failover
First, kill the Redis master process:
    # for n in `ps aux | grep redis-server | grep 6379 | awk '{print $2}'`;do kill -9 $n ;done;
Analyzing the logs
The Redis replicas are the first to notice that the master can no longer be reached, and log the following errors:
    # tail -F redis-63*.log
    ==> redis-6380.log <==
    2851:S 13 Nov 14:48:54.235 # Connection with master lost.
    2851:S 13 Nov 14:48:54.235 * Caching the disconnected master state.
    ==> redis-6381.log <==
    3695:S 13 Nov 14:48:54.466 * Connecting to MASTER 127.0.0.1:6379
    3695:S 13 Nov 14:48:54.466 * MASTER <-> SLAVE sync started
    3695:S 13 Nov 14:48:54.467 # Error condition on socket for SYNC: Connection refused
    ==> redis-6380.log <==
    2851:S 13 Nov 14:48:54.781 * Connecting to MASTER 127.0.0.1:6379
    2851:S 13 Nov 14:48:54.782 * MASTER <-> SLAVE sync started
    2851:S 13 Nov 14:48:54.782 # Error condition on socket for SYNC: Connection refused
    ...
Next, Redis Sentinel carries out the failover. The log shows that about 30 seconds after master 6379 went down (matching down-after-milliseconds 30000), all three Sentinels marked it +sdown, and two of them then escalated to +odown and tried to start a failover. The Sentinel on port 26381 (run ID 06f94705...) collected two of the three votes, was elected leader, and promoted replica 6380 to master. The log is as follows:
    # tail -F sentinel-63*.log
    ==> sentinel-6379.log <==
    3225:X 13 Nov 14:49:24.322 # +sdown master mymaster 127.0.0.1 6379
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:24.327 # +sdown master mymaster 127.0.0.1 6379
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:24.332 # +sdown master mymaster 127.0.0.1 6379
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:24.386 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
    3235:X 13 Nov 14:49:24.386 # +new-epoch 1
    3235:X 13 Nov 14:49:24.386 # +try-failover master mymaster 127.0.0.1 6379
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:24.388 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
    3230:X 13 Nov 14:49:24.388 # +new-epoch 1
    3230:X 13 Nov 14:49:24.388 # +try-failover master mymaster 127.0.0.1 6379
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:24.409 # +vote-for-leader 06f94705a99df53e468af594737913ce7c6287d5 1
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:24.416 # +vote-for-leader 858e250193e7f985bd7d63569a158f52a9cb9e0c 1
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:24.416 # 858e250193e7f985bd7d63569a158f52a9cb9e0c voted for 858e250193e7f985bd7d63569a158f52a9cb9e0c 1
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:24.417 # 06f94705a99df53e468af594737913ce7c6287d5 voted for 06f94705a99df53e468af594737913ce7c6287d5 1
    ==> sentinel-6379.log <==
    3225:X 13 Nov 14:49:24.422 # +new-epoch 1
    3225:X 13 Nov 14:49:24.432 # +vote-for-leader 06f94705a99df53e468af594737913ce7c6287d5 1
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:24.432 # d0e6638165ba8f8186562da586f4e0789dd4abd1 voted for 06f94705a99df53e468af594737913ce7c6287d5 1
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:24.432 # d0e6638165ba8f8186562da586f4e0789dd4abd1 voted for 06f94705a99df53e468af594737913ce7c6287d5 1
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:24.468 # +elected-leader master mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:24.468 # +failover-state-select-slave master mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:24.545 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:24.545 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:24.608 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:25.295 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:25.295 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:25.345 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    ==> sentinel-6379.log <==
    3225:X 13 Nov 14:49:25.345 # +config-update-from sentinel 06f94705a99df53e468af594737913ce7c6287d5 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
    3225:X 13 Nov 14:49:25.345 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
    3225:X 13 Nov 14:49:25.345 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
    3225:X 13 Nov 14:49:25.345 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:25.346 # +config-update-from sentinel 06f94705a99df53e468af594737913ce7c6287d5 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
    3230:X 13 Nov 14:49:25.346 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
    3230:X 13 Nov 14:49:25.346 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
    3230:X 13 Nov 14:49:25.346 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:25.561 # -odown master mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:25.814 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:26.893 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:26.954 # +failover-end master mymaster 127.0.0.1 6379
    3235:X 13 Nov 14:49:26.954 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
    3235:X 13 Nov 14:49:26.955 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
    3235:X 13 Nov 14:49:26.955 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
    ==> sentinel-6379.log <==
    3225:X 13 Nov 14:49:55.349 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
    ==> sentinel-6380.log <==
    3230:X 13 Nov 14:49:55.397 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
    ==> sentinel-6381.log <==
    3235:X 13 Nov 14:49:57.014 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
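As a rough aid for reading a Sentinel log like the one above, the following Python sketch parses individual log lines, finds the `+switch-master` event, and measures how long the switch took after the first `+sdown`. The regular expression and function names are assumptions made for this example, based only on the line layout visible in this post; other Redis versions may format their logs differently. The four sample lines are copied from the log above.

```python
import re

# pid:role  day month  HH:MM:SS.mmm  level-char  event  detail
LINE_RE = re.compile(
    r"^(\d+):([A-Z]) (\d{1,2}) (\w{3}) "
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) "
    r"([#*.\-]) (\S+)\s*(.*)$"
)


def parse_line(line: str):
    """Return {'ms', 'event', 'detail'} for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    hours, minutes, seconds, millis = (int(m.group(i)) for i in (5, 6, 7, 8))
    return {
        # Milliseconds since midnight; enough to compare events on the same day.
        "ms": ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis,
        "event": m.group(10),
        "detail": m.group(11),
    }


def find_switch(events):
    """Return (old_addr, new_addr, ms) for the first +switch-master event, if any."""
    for ev in events:
        if ev["event"] == "+switch-master":
            _name, old_ip, old_port, new_ip, new_port = ev["detail"].split()
            return (old_ip, int(old_port)), (new_ip, int(new_port)), ev["ms"]
    return None


# Lines copied from the sentinel log shown in the post.
log_lines = [
    "3225:X 13 Nov 14:49:24.322 # +sdown master mymaster 127.0.0.1 6379",
    "3235:X 13 Nov 14:49:24.386 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2",
    "3235:X 13 Nov 14:49:25.295 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379",
    "3225:X 13 Nov 14:49:25.345 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380",
]

events = [ev for ev in map(parse_line, log_lines) if ev is not None]
first_sdown_ms = next(ev["ms"] for ev in events if ev["event"] == "+sdown")
old_master, new_master, switch_ms = find_switch(events)
elapsed_ms = switch_ms - first_sdown_ms
```

On these sample lines, `elapsed_ms` is the gap between the first `+sdown` and the first `+switch-master`, which shows that once the master was flagged as down, the election and promotion themselves took only about a second; most of the outage was the 30-second detection window.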
Going back to the Redis server logs, we can see that 6381, now a replica of the new master 6380, has requested synchronization from it and completed a partial resynchronization:
    ==> redis-6380.log <==
    2851:M 13 Nov 14:49:25.823 * Slave 127.0.0.1:6381 asks for synchronization
    2851:M 13 Nov 14:49:25.823 * Partial resynchronization request from 127.0.0.1:6381 accepted. Sending 422 bytes of backlog starting from offset 124407.
    ==> redis-6381.log <==
    3695:S 13 Nov 14:49:25.823 * Successful partial resynchronization with master.
    3695:S 13 Nov 14:49:25.823 # Master replication ID changed to 0288d040464ebccbb56dc56d54455434a406bcb2
    3695:S 13 Nov 14:49:25.823 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
When we start the 6379 server again, Sentinel turns it into a replica and points it at the 6380 server. The logs are as follows:
Start the 6379 server:
    # redis-server /root/redis/redis-6379.conf
    # tail -F sentinel-63*.log
    ...
    ==> sentinel-6379.log <==
    3225:X 13 Nov 16:05:00.384 * +convert-to-slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
    ...
    # tail -F redis-63*.log
    ...
    ==> redis-6379.log <==
    7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: receiving 194 bytes from master
    7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: Flushing old data
    7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: Loading DB in memory
    7493:S 13 Nov 16:05:00.566 * MASTER <-> SLAVE sync: Finished with success
    ==> redis-6381.log <==
    3695:S 13 Nov 16:05:36.467 * 1 changes in 900 seconds. Saving...
    3695:S 13 Nov 16:05:36.468 * Background saving started by pid 7519
    7519:C 13 Nov 16:05:36.486 * DB saved on disk
    7519:C 13 Nov 16:05:36.487 * RDB: 8 MB of memory used by copy-on-write
    3695:S 13 Nov 16:05:36.569 * Background saving terminated with success
    ...
To be continued...