1. Symptoms:
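Decommissioning DataNodes is extremely slow. There are 10 nodes, each with roughly 600,000 blocks to decommission, yet after 3 hours only a little over 3,000 blocks had been decommissioned.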
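The NameNode (NN) logs the following. In 30 seconds the replication queue shrank by only 112518 - 112494 = 24 blocks. At that rate, how long would it take to decommission some 6 million blocks?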
2024-03-19 14:44:42,952 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor: There are 10000 blocks pending replication and the limit is 10000. A further 6174331 blocks are waiting to be processed. The replication queue currently has 112518 blocks
2024-03-19 14:45:12,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor: There are 10000 blocks pending replication and the limit is 10000. A further 6174331 blocks are waiting to be processed. The replication queue currently has 112494 blocks
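To put a number on "how long", here is a rough sketch that parses the two log lines above and extrapolates. It assumes, as the text does, that the shrinkage of the replication queue is a fair proxy for decommission progress and that the rate stays constant. It counts only the blocks reported as "waiting to be processed". The function and variable names are made up for illustration.

```python
import re
from datetime import datetime

# The two NN log lines quoted above (logger name shortened with ".*?" in the regex).
LINE_1 = ("2024-03-19 14:44:42,952 INFO org.apache.hadoop.hdfs.server.blockmanagement."
          "DatanodeAdminBackoffMonitor: There are 10000 blocks pending replication and "
          "the limit is 10000. A further 6174331 blocks are waiting to be processed. "
          "The replication queue currently has 112518 blocks")
LINE_2 = ("2024-03-19 14:45:12,972 INFO org.apache.hadoop.hdfs.server.blockmanagement."
          "DatanodeAdminBackoffMonitor: There are 10000 blocks pending replication and "
          "the limit is 10000. A further 6174331 blocks are waiting to be processed. "
          "The replication queue currently has 112494 blocks")

LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"A further (\d+) blocks are waiting to be processed\. "
    r"The replication queue currently has (\d+) blocks"
)

def parse(line):
    """Return (timestamp, blocks_waiting, replication_queue_size) for one log line."""
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("unrecognized log line")
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return ts, int(m.group(2)), int(m.group(3))

def estimate(first, second):
    """Return (blocks drained, blocks per second, estimated days for the waiting blocks)."""
    t1, _, q1 = parse(first)
    t2, waiting, q2 = parse(second)
    seconds = (t2 - t1).total_seconds()
    drained = q1 - q2              # queue shrinkage between the two samples
    rate = drained / seconds       # blocks per second
    eta_days = waiting / rate / 86400
    return drained, rate, eta_days

if __name__ == "__main__":
    drained, rate, eta_days = estimate(LINE_1, LINE_2)
    print(f"drained={drained} rate={rate:.2f} blocks/s eta={eta_days:.0f} days")
```

With these two samples the estimate comes out at around 89 days, which confirms that something is badly wrong rather than merely slow.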
NN monitoring also shows that quite a few block replications have timed out.
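DataNode (DN) monitoring shows that only one node is handling replication tasks: its XmitsInProgress thread count is non-zero, while every other DN reports 0.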
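If no dashboard is at hand, the same check can be sketched against each DataNode's JMX endpoint. This is only an illustration under assumptions not stated in the original: that the DN web UI listens on port 9864 (the Hadoop 3 default), that XmitsInProgress is exposed on the DataNodeInfo bean, and that the hostnames are placeholders.

```python
import json
from urllib.request import urlopen

# Assumption: bean name and default port; adjust for your cluster and version.
JMX_QUERY = "/jmx?qry=Hadoop:service=DataNode,name=DataNodeInfo"

def xmits_from_jmx(payload):
    """Extract XmitsInProgress from a parsed /jmx response, or None if absent."""
    for bean in payload.get("beans", []):
        if "XmitsInProgress" in bean:
            return int(bean["XmitsInProgress"])
    return None

def fetch_xmits(host, port=9864, timeout=5):
    """Query one DataNode over HTTP and return its XmitsInProgress value."""
    with urlopen(f"http://{host}:{port}{JMX_QUERY}", timeout=timeout) as resp:
        return xmits_from_jmx(json.load(resp))

def busy_nodes(xmits_by_host):
    """Return the hosts whose XmitsInProgress is non-zero, sorted by name."""
    return sorted(h for h, x in xmits_by_host.items() if x)

if __name__ == "__main__":
    hosts = ["dn1.example.com", "dn2.example.com"]  # hypothetical hostnames
    print(busy_nodes({h: fetch_xmits(h) for h in hosts}))
```

If only one host comes back from `busy_nodes`, that matches the symptom described above: a single DN is doing all the replication work.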