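1. Background
This is a record of replacing a failed disk in a NetApp FAS2554 that serves as PACS storage at a hospital.
I had never worked on this NetApp system before. According to the customer it has been in production for almost nine years, and this is its first disk failure. Because the system is out of warranty, the customer only purchased the disk, and I did the replacement as a free service.
NetApp administration is done almost entirely on the command line, and I ran into a few problems during the replacement, so I am writing them down here.
The device looks roughly like this (image from the web; I did not take a photo on site).
2. Replacement procedure
Before the replacement, disk 0a.00.2 had already been replaced by the hot spare, and the RAID-DP group was in the normal, active state.
Pull the failed disk from that bay and insert the new disk.
Check the disk status: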
FAS2240-A> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
Plex /aggr0/plex0 (online, normal, active, pool0)
RAID group /aggr0/plex0/rg0 (normal, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.00.0 0a 0 0 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.00.11 0a 0 11 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.4 0a 0 4 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.6 0a 0 6 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.7 0a 0 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.8 0a 0 8 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.9 0a 0 9 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.10 0a 0 10 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.1 0a 0 1 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Pool1 spare disks (empty)
Pool0 spare disks (empty)
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
partner 0a.00.5 0a 0 5 SA:A 0 BSAS 7200 0/0 1695759/3472914816
partner 0a.00.3 0a 0 3 SA:A 0 BSAS 7200 0/0 1695759/3472914816
List all disks; the new disk 0a.00.2 has been detected:
FAS2240-A> disk show -v
DISK OWNER POOL SERIAL NUMBER HOME CHKSUM
------------ ------------- ----- ------------- ------------- ------
0a.00.0 FAS2240-A (1897445747) Pool0 WD-WCAY01059319 FAS2240-A (1897445747) Block
0a.00.9 FAS2240-A (1897445747) Pool0 WD-WCAY01110442 FAS2240-A (1897445747) Block
0a.00.11 FAS2240-A (1897445747) Pool0 WD-WCAY01118699 FAS2240-A (1897445747) Block
0a.00.3 FAS2240-B (1897447544) Pool0 WD-WCAY01148720 FAS2240-B (1897447544) Block
0a.00.5 FAS2240-B (1897447544) Pool0 WD-WCAY01110637 FAS2240-B (1897447544) Block
0a.00.6 FAS2240-A (1897445747) Pool0 WD-WCAY01138268 FAS2240-A (1897445747) Block
0a.00.8 FAS2240-A (1897445747) Pool0 WD-WCAY01140234 FAS2240-A (1897445747) Block
0a.00.7 FAS2240-A (1897445747) Pool0 WD-WCAY01148812 FAS2240-A (1897445747) Block
0a.00.10 FAS2240-A (1897445747) Pool0 WD-WCAY01059555 FAS2240-A (1897445747) Block
0a.00.2 FAS3250-B (2017886517) Pool0 YGHNUAWA FAS3250-B (2017886517) Block
0a.00.1 FAS2240-A (1897445747) Pool0 WD-WCAY01110728 FAS2240-A (1897445747) Block
0a.00.4 FAS2240-A (1897445747) Pool0 WD-WCAY01140030 FAS2240-A (1897445747) Block
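0a.00.2 is the newly inserted disk. It shows up as owned by FAS3250-B (system ID 2017886517), an ID that matches neither FAS2240-A (1897445747) nor FAS2240-B (1897447544), so the disk appears to carry ownership information from a system it was used in before. Controller B currently holds only its system RAID4 aggregate on disks 0a.00.5 and 0a.00.3 and has no use for 0a.00.2, so the disk has to be assigned to controller A.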
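The ownership check described above can be automated when many disks are involved. The following is a minimal sketch, assuming the row layout of the `disk show -v` output pasted above; the function name `foreign_disks`, the regular expression, and the `PAIR_IDS` set are my own illustrative choices, not part of Data ONTAP.

```python
import re

# One row of `disk show -v` as pasted above:
# disk, owner (sysid), pool, serial number, home (sysid), checksum type.
ROW = re.compile(
    r"^(?P<disk>\S+)\s+(?P<owner>\S+)\s+\((?P<owner_id>\d+)\)\s+(?P<pool>\S+)\s+"
    r"(?P<serial>\S+)\s+(?P<home>\S+)\s+\((?P<home_id>\d+)\)\s+(?P<chksum>\S+)\s*$"
)


def foreign_disks(output, pair_ids):
    """Return (disk, owner, owner_id) for rows whose owner ID is not in pair_ids."""
    found = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("owner_id") not in pair_ids:
            found.append((m.group("disk"), m.group("owner"), m.group("owner_id")))
    return found


# Three rows copied from the output above.
SAMPLE = """\
0a.00.3 FAS2240-B (1897447544) Pool0 WD-WCAY01148720 FAS2240-B (1897447544) Block
0a.00.2 FAS3250-B (2017886517) Pool0 YGHNUAWA FAS3250-B (2017886517) Block
0a.00.1 FAS2240-A (1897445747) Pool0 WD-WCAY01110728 FAS2240-A (1897445747) Block
"""

# System IDs of the two controllers in this HA pair (hard-coded for the sketch).
PAIR_IDS = {"1897445747", "1897447544"}

print(foreign_disks(SAMPLE, PAIR_IDS))
```

Header and separator lines do not match the pattern and are skipped, so the whole command output can be passed in unchanged.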
Switch to the diagnostic privilege level:
FAS2240-A> priv set diag
Warning: These diagnostic commands are for use by NetApp personnel only.
Show the usage of the disk command:
FAS2240-A*> disk
usage: disk <options>
Options are:
assign {<disk_name> | all | [-T <storage type> | -shelf <shelf name>] [-n <count>] | auto} [-p <pool>] [-o <ownername>] [-s <sysid>] [-c block|zoned] [-f] - assign a disk to a filer or all unowned disks by specifying "all" or <count> number of unowned disks
ddr_label {repair | print | delete | dumpraw | modify [-c] -o <offset> -v <value> | start_scan | pause_scan | resume_scan | error_scan | rediscover | reinit } [-f] [-d all | <disk_list>]
encrypt { lock | rekey | destroy | sanitize | show } - perform tasks specific to self-encrypting disks
fail [-i] [-f] <disk_name> - fail a file system disk
maint { start | abort | status | list} - run maintenance tests on one or more disks
power_cycle [ -f ] { [-d <disk_list>] | [ -c <channel_name> [ -s <shelf_number> ] ] } - power-cycle one or more disks
reassign {-o <old_name> | -s <old_sysid>} [-n <new_name>] [-d <new_sysid>] - reassign disks from old filer
remove [-w] <disk_name> - remove a spare disk
remove_ownership [<disk_name> | all | -s <sysid>] [-f] - revert/remove disk ownership
replace {start [-f] [-m] <disk_name> <spare_disk_name>} | {stop <disk_name>} - replace a file system disk with a spare disk or stop replacing
sanitize { start | abort | status | release } - sanitize one or more disks
sanown_stats {start| stop| show }- collect sanown event stats
scrub { start | stop } - start or stop disk scrubbing
shm_stats [<disk_name> | asup | clear_errors] - Storage Health Monitor statistics for a disk
show [-o <ownername> | -s <sysid> | -n | -v | -a] - lists disks and owners
simpull <disk_name1> [<disk_name2> [<disk_name3> ... ]] - simulate one or more disk pulls
simpush [<sim_disk_path_name1> [<sim_disk_path_name2> [<sim_disk_path_name3> ...]] | -l] - simulate one or more disk pushes or list available disks to push
unfail [-s] <disk_name> - unfail a disk (-s not valid in maintenance mode)
zero spares - Zero all spare disks
Try to remove the disk's ownership from the controller; this fails:
FAS2240-A*> disk remove_ownership 0a.00.2
disk remove_ownership: Disk 0a.00.2 is not owned by this node.
FAS2240-A*> disk remove_ownership 0a.00.2 -f
disk remove_ownership: Disk 0a.00.2 is not owned by this node.
FAS2240-A*> sysconfig
NetApp Release 8.1.3 7-Mode: Sat Jun 8 08:11:51 PDT 2013
System ID: 1897445747 (FAS2240-A); partner ID: 1897447544 (FAS2240-B)
System Serial Number: 700001384306 (FAS2240-A)
System Rev: D1
System Storage Configuration: Single-Path HA
System ACP Connectivity: Partial Connectivity
slot 0: System Board
Processors: 4
Processor type: Intel(R) Xeon(R) CPU C3528 @ 1.73GHz
Memory Size: 6144 MB
Memory Attributes: Hoisting
Normal ECC
Controller: A
Service Processor Status: Online
slot 0: Internal 10/100 Ethernet Controller
e0M MAC Address: 00:a0:98:3f:8b:05 (auto-100tx-fd-up)
e0P MAC Address: 00:a0:98:3f:8b:04 (auto-100tx-fd-up)
slot 0: Quad Gigabit Ethernet Controller 82580
e0a MAC Address: 00:a0:98:3f:8b:00 (auto-100tx-fd-up)
e0b MAC Address: 00:a0:98:3f:8b:01 (auto-1000t-fd-down)
e0c MAC Address: 00:a0:98:3f:8b:02 (auto-1000t-fd-down)
e0d MAC Address: 00:a0:98:3f:8b:03 (auto-1000t-fd-down)
slot 0: Interconnect HBA: Mellanox IB MT25204
slot 0: SAS Host Adapter 0a 12 Disks: 20345.5GB
1 shelf with IOM6E
slot 0: SAS Host Adapter 0b
slot 0: Intel ICH USB EHCI Adapter u0a (0xdf101000)
boot0 Micron Technology Real SSD eUSB 2GB, class 0/0, rev 2.00/11.10, addr 2 1936MB 512B/sect (B9F0022700107745)
slot 1: Fibre Channel Target Host Adapter 1a
slot 1: Fibre Channel Target Host Adapter 1b
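The `System ID` line of the `sysconfig` output above names both controllers of the HA pair and their IDs, which is what the owner column of `disk show -v` has to be compared against. A small sketch of pulling those values out of the text follows; the function name `ha_pair_ids` and the pattern are hypothetical helpers of mine and only assume the line format shown above.

```python
import re

# Matches the "System ID: <id> (<name>); partner ID: <id> (<name>)" line.
SYSID_LINE = re.compile(
    r"System ID:\s*(?P<local_id>\d+)\s*\((?P<local_name>[^)]+)\);\s*"
    r"partner ID:\s*(?P<partner_id>\d+)\s*\((?P<partner_name>[^)]+)\)"
)


def ha_pair_ids(sysconfig_output):
    """Map controller name to system ID for the local node and its partner."""
    m = SYSID_LINE.search(sysconfig_output)
    if m is None:
        raise ValueError("no 'System ID' line found")
    return {
        m.group("local_name"): m.group("local_id"),
        m.group("partner_name"): m.group("partner_id"),
    }


# Line copied from the sysconfig output above.
SAMPLE = "System ID: 1897445747 (FAS2240-A); partner ID: 1897447544 (FAS2240-B)"

print(ha_pair_ids(SAMPLE))
```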
Assigning the disk to controller A with the plain command fails:
FAS2240-A*> disk assign 0a.00.2
disk assign: Assign failed for one or more disks in the disk list.
With the force option -f the assignment succeeds:
FAS2240-A*> disk assign 0a.00.2 -f
FAS2240-A*> Sun Oct 20 20:10:18 CST [FAS2240-A:raid.assim.disk.nolabels:error]: Disk 0a.00.2 Shelf 0 Bay 2 [NETAPP X306_HMARK02TSSA 4321] S/N [YGHNUAWA] has no valid labels. It will be taken out of service to prevent possible data loss.
Sun Oct 20 20:10:18 CST [FAS2240-A:raid.config.disk.bad.label:error]: Disk 0a.00.2 Shelf 0 Bay 2 [NETAPP X306_HMARK02TSSA 4321] S/N [YGHNUAWA] has bad label.
Sun Oct 20 20:10:18 CST [FAS2240-A:callhome.dsk.label:CRITICAL]: Call home for DISK BAD LABEL
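Each console message above carries a bracketed `[node:event:severity]` tag, which is handy for filtering when the console scrolls quickly. The sketch below splits one such line into its parts; `parse_event` and the pattern are illustrative names of mine and assume only the message format visible in the lines above.

```python
import re

# "[<node>:<event name>:<severity>]: <message>" as seen in the console lines above.
EVENT = re.compile(
    r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]:\s*(?P<message>.*)"
)


def parse_event(line):
    """Return node, event, severity and message of a console line, or None."""
    m = EVENT.search(line)
    return m.groupdict() if m else None


# Line copied from the console output above.
SAMPLE = (
    "Sun Oct 20 20:10:18 CST [FAS2240-A:raid.config.disk.bad.label:error]: "
    "Disk 0a.00.2 Shelf 0 Bay 2 [NETAPP X306_HMARK02TSSA 4321] S/N [YGHNUAWA] has bad label."
)

event = parse_event(SAMPLE)
print(event["event"], event["severity"])
```

The other bracketed fields in the message (model and serial number) contain no colons, so they are not mistaken for the event tag.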
After the assignment succeeds, the disk is marked as broken (bad label):
FAS2240-A*> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
Plex /aggr0/plex0 (online, normal, active, pool0)
RAID group /aggr0/plex0/rg0 (normal, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.00.0 0a 0 0 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.00.11 0a 0 11 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.4 0a 0 4 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.6 0a 0 6 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.7 0a 0 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.8 0a 0 8 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.9 0a 0 9 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.10 0a 0 10 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.1 0a 0 1 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Pool1 spare disks (empty)
Pool0 spare disks (empty)
Broken disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
bad label 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
partner 0a.00.5 0a 0 5 SA:A 0 BSAS 7200 0/0 1695759/3472914816
partner 0a.00.3 0a 0 3 SA:A 0 BSAS 7200 0/0 1695759/3472914816
FAS2240-A*> disk show -v
DISK OWNER POOL SERIAL NUMBER HOME CHKSUM
------------ ------------- ----- ------------- ------------- ------
0a.00.0 FAS2240-A (1897445747) Pool0 WD-WCAY01059319 FAS2240-A (1897445747) Block
0a.00.9 FAS2240-A (1897445747) Pool0 WD-WCAY01110442 FAS2240-A (1897445747) Block
0a.00.11 FAS2240-A (1897445747) Pool0 WD-WCAY01118699 FAS2240-A (1897445747) Block
0a.00.3 FAS2240-B (1897447544) Pool0 WD-WCAY01148720 FAS2240-B (1897447544) Block
0a.00.5 FAS2240-B (1897447544) Pool0 WD-WCAY01110637 FAS2240-B (1897447544) Block
0a.00.6 FAS2240-A (1897445747) Pool0 WD-WCAY01138268 FAS2240-A (1897445747) Block
0a.00.8 FAS2240-A (1897445747) Pool0 WD-WCAY01140234 FAS2240-A (1897445747) Block
0a.00.7 FAS2240-A (1897445747) Pool0 WD-WCAY01148812 FAS2240-A (1897445747) Block
0a.00.10 FAS2240-A (1897445747) Pool0 WD-WCAY01059555 FAS2240-A (1897445747) Block
0a.00.1 FAS2240-A (1897445747) Pool0 WD-WCAY01110728 FAS2240-A (1897445747) Block
0a.00.4 FAS2240-A (1897445747) Pool0 WD-WCAY01140030 FAS2240-A (1897445747) Block
0a.00.2 FAS2240-A (1897445747) Pool0 YGHNUAWA FAS2240-A (1897445747) Block
FAS2240-A*> Sun Oct 20 20:11:00 CST [FAS2240-A:monitor.globalStatus.nonCritical:warning]: Disk on adapter 0a, shelf 0, bay 2, failed.
Unfail the disk so that it is treated as a healthy disk again; it then shows up as a spare:
FAS2240-A*> disk unfail -s 0a.00.2
disk unfail: unfailing disk 0a.00.2...
FAS2240-A*> disk show -v
DISK OWNER POOL SERIAL NUMBER HOME CHKSUM
------------ ------------- ----- ------------- ------------- ------
0a.00.0 FAS2240-A (1897445747) Pool0 WD-WCAY01059319 FAS2240-A (1897445747) Block
0a.00.9 FAS2240-A (1897445747) Pool0 WD-WCAY01110442 FAS2240-A (1897445747) Block
0a.00.11 FAS2240-A (1897445747) Pool0 WD-WCAY01118699 FAS2240-A (1897445747) Block
0a.00.3 FAS2240-B (1897447544) Pool0 WD-WCAY01148720 FAS2240-B (1897447544) Block
0a.00.5 FAS2240-B (1897447544) Pool0 WD-WCAY01110637 FAS2240-B (1897447544) Block
0a.00.6 FAS2240-A (1897445747) Pool0 WD-WCAY01138268 FAS2240-A (1897445747) Block
0a.00.8 FAS2240-A (1897445747) Pool0 WD-WCAY01140234 FAS2240-A (1897445747) Block
0a.00.7 FAS2240-A (1897445747) Pool0 WD-WCAY01148812 FAS2240-A (1897445747) Block
0a.00.10 FAS2240-A (1897445747) Pool0 WD-WCAY01059555 FAS2240-A (1897445747) Block
0a.00.1 FAS2240-A (1897445747) Pool0 WD-WCAY01110728 FAS2240-A (1897445747) Block
0a.00.4 FAS2240-A (1897445747) Pool0 WD-WCAY01140030 FAS2240-A (1897445747) Block
0a.00.2 FAS2240-A (1897445747) Pool0 YGHNUAWA FAS2240-A (1897445747) Block
FAS2240-A*> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
Plex /aggr0/plex0 (online, normal, active, pool0)
RAID group /aggr0/plex0/rg0 (normal, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.00.0 0a 0 0 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.00.11 0a 0 11 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.4 0a 0 4 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.6 0a 0 6 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.7 0a 0 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.8 0a 0 8 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.9 0a 0 9 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.10 0a 0 10 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.1 0a 0 1 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (not zeroed)
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
partner 0a.00.5 0a 0 5 SA:A 0 BSAS 7200 0/0 1695759/3472914816
partner 0a.00.3 0a 0 3 SA:A 0 BSAS 7200 0/0 1695759/3472914816
FAS2240-A*> aggr status -s
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (not zeroed)
FAS2240-A*> vol status -s
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (not zeroed)
The spare is reported as not zeroed, so zero it:
FAS2240-A*> disk zero spares
Re-run the status command to watch the zeroing progress until it completes:
FAS2240-A*> vol status -s
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (zeroing, 0% done)
FAS2240-A*> vol status -s
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (zeroing, 8% done)
FAS2240-A*> vol status -s
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (zeroing, 11% done)
FAS2240-A> vol status -s
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
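Instead of reading the percentage by eye on every run, the trailing annotation of the spare row can be classified with a few lines of Python. This is only a sketch: `spare_state` is a hypothetical helper, and treating a row without any annotation as "zeroed" is an assumption based on the last output above, where the annotation disappears once zeroing has finished.

```python
import re

# Trailing annotation printed while a spare is being zeroed.
ZEROING = re.compile(r"\(zeroing,\s*(\d+)% done\)")


def spare_state(row):
    """Classify one spare row from `vol status -s` as (state, percent or None)."""
    m = ZEROING.search(row)
    if m:
        return ("zeroing", int(m.group(1)))
    if "(not zeroed)" in row:
        return ("not zeroed", None)
    # Assumption: no annotation means the spare is already zeroed.
    return ("zeroed", None)


# Row prefix copied from the output above; only the annotation differs.
BASE = "spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816"

for suffix in (" (not zeroed)", " (zeroing, 8% done)", ""):
    print(spare_state(BASE + suffix))
```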
FAS2240-A> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
Plex /aggr0/plex0 (online, normal, active, pool0)
RAID group /aggr0/plex0/rg0 (normal, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.00.0 0a 0 0 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.00.11 0a 0 11 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.4 0a 0 4 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.6 0a 0 6 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.7 0a 0 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.8 0a 0 8 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.9 0a 0 9 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.10 0a 0 10 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.1 0a 0 1 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Pool1 spare disks (empty)
Pool0 spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------------- ------------- ---- ---- ---- ----- -------------- --------------
partner 0a.00.5 0a 0 5 SA:A 0 BSAS 7200 0/0 1695759/3472914816
partner 0a.00.3 0a 0 3 SA:A 0 BSAS 7200 0/0 1695759/3472914816
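As a final sanity check, the `sysconfig -r` rows can be grouped by their role column to confirm that the RAID group is complete and that 0a.00.2 is back as a spare. The sketch below assumes the row layout shown above; `disks_by_role` and the list of role keywords are my own choices and cover only the roles that appear in this post.

```python
import re
from collections import defaultdict

# Role keyword followed by a device name such as 0a.00.11.
ROW = re.compile(
    r"^\s*(?P<role>dparity|parity|data|spare|partner|bad label)\s+"
    r"(?P<device>\d+[a-z]\.\d+\.\d+)\s"
)


def disks_by_role(output):
    """Group device names from `sysconfig -r` output by their RAID role."""
    roles = defaultdict(list)
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            roles[m.group("role")].append(m.group("device"))
    return dict(roles)


# A few rows copied from the final output above.
SAMPLE = """\
dparity 0a.00.0 0a 0 0 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.00.11 0a 0 11 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.00.4 0a 0 4 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
Spare disks for block checksum
spare 0a.00.2 0a 0 2 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816
partner 0a.00.5 0a 0 5 SA:A 0 BSAS 7200 0/0 1695759/3472914816
"""

roles = disks_by_role(SAMPLE)
print(roles)
```

Section headings such as "Spare disks for block checksum" are ignored because they are not followed by a device name.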
With that, the whole replacement procedure is complete.
You are welcome to follow my WeChat official account, 徐sir的IT之路, and learn together.
----------------------------------------
WeChat official account: 徐sir的IT之路
CSDN: https://blog.csdn.net/xxddxhyz?type=blog
墨天轮: https://www.modb.pro/u/3605
PGFANS: https://www.pgfans.cn/user/home?userId=5568
----------------------------------------