Running a PyTorch job fails with RuntimeError: unable to write to file · Issue #26 · huaweicloud/dls-example · GitHub
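PyTorch stores the temporary files for its shared memory as /torch_xxx, that is, in the root directory of the container. The error occurs because the container runs out of disk space. For now, you can work around it by temporarily disabling PyTorch's shared memory feature with the code below.
Just add it at the very top of train.py.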
import sys
import torch
from torch.utils.data import dataloader
from torch.multiprocessing import reductions
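from multiprocessing.reduction import ForkingPickler

# Note: the names used below are private PyTorch internals and may differ between versions.
default_collate_func = dataloader.default_collate

def default_collate_override(batch):
    # Turn off shared memory before delegating to the original collate function.
    dataloader._use_shared_memory = False
    return default_collate_func(batch)

setattr(dataloader, 'default_collate', default_collate_override)

# Unregister PyTorch's shared-memory reducers for every storage class.
for t in torch._storage_classes:
    if sys.version_info[0] == 2:
        if t in ForkingPickler.dispatch:
            del ForkingPickler.dispatch[t]
    else:
        if t in ForkingPickler._extra_reducers:
            del ForkingPickler._extra_reducers[t]

#### The original train code follows below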
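Since the explanation above attributes the error to the container's root directory running out of space, it may be worth confirming that before applying the workaround. The following is only a minimal sketch using the standard library; the helper name `free_space_report` and the 1 GiB threshold are assumptions made for illustration and are not part of the original answer.

```python
import shutil


def free_space_report(path="/", min_free_bytes=1 << 30):
    """Return (free_bytes, is_low) for the filesystem that contains `path`.

    `min_free_bytes` is a hypothetical threshold (1 GiB by default); choose
    a value that matches the size of the batches your job shares.
    """
    usage = shutil.disk_usage(path)
    return usage.free, usage.free < min_free_bytes


if __name__ == "__main__":
    free, is_low = free_space_report("/")
    print("free bytes on /: %d, low: %s" % (free, is_low))
```

If the reported free space on / is small, that would be consistent with the diagnosis above. If there is plenty of space, the cause may lie elsewhere.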