Problem scenario:
While training CondInst with detectron2, training aborted at iteration 88 with the following error:
Traceback (most recent call last):
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 255, in <module>
    args=(args,),
  File "/home/yuan/anaconda3/envs/AdelaiNet/lib/python3.7/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 235, in main
    return trainer.train()
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 118, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "/home/yuan/桌面/shenchunhua/CondInst-master/train_net.py", line 107, in train_loop
    self.run_step()
  File "/home/yuan/anaconda3/envs/AdelaiNet/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 232, in run_step
    self._detect_anomaly(losses, loss_dict)
  File "/home/yuan/anaconda3/envs/AdelaiNet/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 245, in _detect_anomaly
    self.iter, loss_dict
FloatingPointError: Loss became infinite or NaN at iteration=88!
loss_dict = {'loss_fcos_cls': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_fcos_loc': tensor(0.5552, device='cuda:0', grad_fn=<DivBackward0>), 'loss_fcos_ctr': tensor(0.7676, device='cuda:0', grad_fn=<DivBackward0>), 'loss_mask': tensor(0.8649, device='cuda:0', grad_fn=<DivBackward0>), 'data_time': 0.0022056670004531043}
Cause analysis:
The learning rate is the problem: it is too high, so the loss exploded and became NaN at iteration 88 (note that only loss_fcos_cls is NaN while the other losses are still finite). Lower the learning rate and train again.
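As a concrete starting point, here is a minimal sketch of the solver settings one would typically turn down. It is not the repository's own code: it assumes the AdelaiDet-style config API (adet.config.get_cfg) and uses configs/CondInst/MS_R_50_1x.yaml as an example config path; the values IMS_PER_BATCH=2 and BASE_LR=0.0025 follow detectron2's linear-scaling convention and are illustrative, not tuned.

from adet.config import get_cfg  # assumption: AdelaiDet's extended detectron2 config is available

cfg = get_cfg()
cfg.merge_from_file("configs/CondInst/MS_R_50_1x.yaml")  # example path, replace with your config

# The default BASE_LR of 0.01 assumes 8 GPUs with 16 images per batch.
# On a single GPU with a small batch, scale the learning rate down roughly linearly.
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.0025

# A longer, gentler warmup also helps keep loss_fcos_cls finite
# during the first few hundred iterations.
cfg.SOLVER.WARMUP_ITERS = 2000
cfg.SOLVER.WARMUP_FACTOR = 1.0 / 2000

If the training script uses detectron2's default argument parser, the same overrides can instead be appended to the train_net.py command line as key-value pairs (for example, SOLVER.BASE_LR 0.0025 SOLVER.WARMUP_ITERS 2000), since trailing options are merged into the config.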