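1. State of the two servers in the cluster
Instance | Role in normal operation | IP | Port |
node1 | Primary | 192.168.6.6 | 9088 |
node2 | Secondary | 192.168.6.7 | 9088 |
2. Test steps
- Take node1 down
- Observe the state of node2
- Before node2 switches over automatically, manually switch node2 to standalone mode to simulate emergency use
- For the non-urgent case, promote node2 to primary and restore node1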
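The steps above map onto a short command sequence. The following is a minimal dry-run sketch that only prints the commands used later in this test, each tagged with the server it runs on; it executes nothing. The function name `print_failover_plan` is hypothetical, and the server names are passed in as arguments.

```shell
#!/bin/sh
# Print (do not execute) the manual failover commands used in this test.
# $1 = server name of the primary that is taken down
# $2 = server name of the surviving secondary
# print_failover_plan is a hypothetical helper, not a GBase tool.
print_failover_plan() {
    old_primary=$1
    survivor=$2
    cat <<EOF
[$old_primary] onclean -ky
[$survivor] onmode -d standard
[$survivor] onmode -d primary $old_primary
[$old_primary] oninit -PHY
[$old_primary] onmode -d secondary $survivor
EOF
}

print_failover_plan node1 node2
```

The first line simulates the failure, the second is the emergency path, and the last three restore the pair with the roles swapped.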
3. After the primary goes down, manually operate the secondary so that it quickly becomes usable
[gbasedbt@node01 install]$ onstat -g dri
On-Line (Prim) -- Up 00:16:11 -- 1650580 Kbytes
Data Replication at 0x4cf1a028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  primary        on           node2                9 / 1                   NA
  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:01:20
  Last Receive 2024/06/17 22:01:20
  Last Ping    2024/06/17 22:01:05
  Last log page applied(log id,page): 9,2
[root@node01 GBASE]# onstat -
On-Line (Prim) -- Up 00:14:11 -- 1650580 Kbytes
[root@node01 GBASE]# su - gbasedbt
Last login: Mon Jun 17 21:45:54 CST 2024 on pts/0
[gbasedbt@node01 ~]$ onclean -ky
onclean: Cleaning up processes and resources for 'node1'...
- Looking for the master daemon process: 13760
- Looking for the shmem key: 52934803
- Looking for the shmem key: 52934804
- Looking for semaphore ID: 10
- Looking for the shmem key: 52934801
- Looking for the shmem key: 52934802
[gbasedbt@node01 ~]$
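-- The primary and secondary use a health check to decide whether the cluster is healthy. The heartbeat check makes several connection attempts with a few seconds between them, so there is a health-check window between the primary going down and the secondary switching over. During this window the secondary still reports the cluster as normal.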
[gbasedbt@node02 ~]$ onstat -g dri
Read-Only (Sec) -- Up 00:01:22 -- 1635008 Kbytes
Data Replication at 0x4c13d028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node1                9 / 1                   N
  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:02:04
  Last Receive 2024/06/17 22:02:04
  Last Ping    2024/06/17 22:01:59
  Last log page applied(log id,page): 0,0
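The length of the health-check window can be roughly estimated from the DRTIMEOUT value in the output. The sketch below is an assumption, not confirmed GBase 8s behavior: it borrows the rule documented for Informix HDR (where these parameter names originate) that a failure is declared after four unanswered DRTIMEOUT intervals. The number of attempts is a parameter so it can be changed if the GBase 8s documentation says otherwise. The function names `detection_window` and `drtimeout_from_dri` are hypothetical.

```shell
#!/bin/sh
# Estimate how long the secondary may keep reporting a healthy pair.
# ASSUMPTION: failure is declared after N unanswered DRTIMEOUT intervals,
# with N defaulting to 4 (the Informix HDR rule); verify for GBase 8s.
# $1 = DRTIMEOUT in seconds, $2 = number of attempts (optional)
detection_window() {
    drtimeout=$1
    attempts=${2:-4}
    echo $((drtimeout * attempts))
}

# Extract the DRTIMEOUT value from `onstat -g dri` text read on stdin.
# Expects one parameter per line, e.g. "  DRTIMEOUT    30".
drtimeout_from_dri() {
    awk '$1 == "DRTIMEOUT" { print $2; exit }'
}

# Sample fragment in the one-parameter-per-line layout.
sample='  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0'

t=$(printf '%s\n' "$sample" | drtimeout_from_dri)
echo "DRTIMEOUT=${t}s, estimated window=$(detection_window "$t")s"
```

With the DRTIMEOUT of 30 shown here, the estimate under that assumption is 120 seconds.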
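- This test simulates putting the secondary back into service after the primary has gone down but before the secondary has detected the failure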
[gbasedbt@node02 ~]$ onstat -g dri
Read-Only (Sec) -- Up 00:01:22 -- 1635008 Kbytes
Data Replication at 0x4c13d028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node1                9 / 1                   N
  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:02:04
  Last Receive 2024/06/17 22:02:04
  Last Ping    2024/06/17 22:01:59
  Last log page applied(log id,page): 0,0
[gbasedbt@node02 ~]$ onstat -
Read-Only (Sec) -- Up 00:01:55 -- 1635008 Kbytes
[gbasedbt@node02 ~]$ onmode -d standard
[gbasedbt@node02 ~]$ onstat -
On-Line -- Up 00:02:21 -- 1635008 Kbytes
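The `onstat -` banner is what shows which mode a server is in at each step. As a minimal sketch, the banner forms seen in this test can be mapped to short labels for use in scripts. The function name `server_role` and the label names are hypothetical. The patterns match anywhere in the line, in case the real banner carries a prefix that the listings here omit.

```shell
#!/bin/sh
# Map an `onstat -` banner line to a short role label.
# Only the four banner forms that appear in this test are recognized;
# server_role and its labels are hypothetical names.
server_role() {
    case $1 in
        *"On-Line (Prim) --"*)  echo primary ;;
        *"Read-Only (Sec) --"*) echo secondary ;;
        *"Fast Recovery --"*)   echo recovering ;;
        *"On-Line --"*)         echo standard ;;
        *)                      echo unknown ;;
    esac
}

# Banners before and after `onmode -d standard` on node2.
server_role "Read-Only (Sec) -- Up 00:01:55 -- 1635008 Kbytes"
server_role "On-Line -- Up 00:02:21 -- 1635008 Kbytes"
```

The two calls print `secondary` and then `standard`, matching the transition shown above.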
4. After the secondary has become standalone, promote it to primary and restore the cluster
[gbasedbt@node02 ~]$ onmode -d primary node1
[gbasedbt@node02 ~]$ onstat -
On-Line (Prim) -- Up 00:02:38 -- 1635008 Kbytes
-- On node1, run oninit -PHY to perform physical log recovery
[gbasedbt@node01 node1_dbs]$ oninit -PHY
[gbasedbt@node01 node1_dbs]$ onstat -m
Fast Recovery -- Up 00:00:13 -- 1650580 Kbytes
Message Log File: /opt/GBASE/gbase/tmp/online_node1.log
06/17/24 22:49:31 SQL_FEAT_CTRL value set to 0x8008
06/17/24 22:49:31 SQL_DEF_CTRL value set to 0x4b0
06/17/24 22:49:31 GBase Database Server Version 12.10.FC4G1AEE Software Serial Number AAA#B000000
06/17/24 22:49:32 GBase Database Server Initialized -- Shared Memory Initialized.
06/17/24 22:49:32 Started 1 B-tree scanners.
06/17/24 22:49:32 B-tree scanner threshold set at 5000.
06/17/24 22:49:32 B-tree scanner range scan size set to -1.
06/17/24 22:49:32 B-tree scanner ALICE mode set to 6.
06/17/24 22:49:32 B-tree scanner index compression level set to med.
06/17/24 22:49:32 DR: Reservation of the last logical log for log backup turned on
06/17/24 22:49:32 Data replication type and state information reset. To start DR, use the 'onmode -d' command and wait for the pair to be operational, before shutting down the database server
06/17/24 22:49:32 Physical Recovery Started at Page (3:394).
06/17/24 22:49:32 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
06/17/24 22:49:32 Dataskip is now OFF for all dbspaces
06/17/24 22:49:32 Restartable Restore has been ENABLED
06/17/24 22:49:32 Recovery Mode
-- Check the node: it is in the fast recovery phase
[gbasedbt@node01 node1_dbs]$ onstat -
Fast Recovery -- Up 00:00:21 -- 1650580 Kbytes
-- Add node1 to the cluster as the secondary
[gbasedbt@node01 node1_dbs]$ onmode -d secondary node2
[gbasedbt@node01 node1_dbs]$ onstat -
Read-Only (Sec) -- Up 00:02:04 -- 2188180 Kbytes
[gbasedbt@node01 node1_dbs]$ onstat -g dri
Read-Only (Sec) -- Up 00:04:31 -- 2188180 Kbytes
Data Replication at 0x4cf1a028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node2                9 / 5                   N
  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       2
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:50:42
  Last Receive 2024/06/17 22:50:44
  Last Ping    2024/06/17 22:53:35
  Last log page applied(log id,page): 0,0
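To confirm that the pair is restored, the replication type, state, and paired server can be pulled out of the `onstat -g dri` output. The following is a minimal sketch with a hypothetical function name `dri_summary`. It assumes the layout in which the header row containing "Paired server" is followed by a single data row, and it counts fields from the right because the type can be one word ("primary") or two ("HDR Secondary").

```shell
#!/bin/sh
# Summarize the data row of `onstat -g dri` output read on stdin.
# ASSUMPTION: the row after the "Paired server" header has the form
#   <type...> <state> <peer> <id> / <pg> <proxy>
# dri_summary is a hypothetical helper, not a GBase tool.
dri_summary() {
    awk '
        /Paired server/ { want = 1; next }   # header row; data row follows
        want && NF >= 7 {
            type = $1
            for (i = 2; i <= NF - 6; i++) type = type " " $i
            printf "type=%s state=%s peer=%s\n", type, $(NF - 5), $(NF - 4)
            exit
        }
    '
}

# Sample rows in the assumed layout, taken from node1 after it rejoined.
sample='Data Replication at 0x4cf1a028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node2                9 / 5                   N'

printf '%s\n' "$sample" | dri_summary
```

For the sample this prints `type=HDR Secondary state=on peer=node2`, which is the state expected on node1 once it has rejoined as the secondary of node2.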