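Combining Transformer with U-Net was my 2024 graduation project. I had never worked in this area before, and starting from zero was painful, so I hope these notes help you.
These notes are updated irregularly.
Table of contents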
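- Reproduction errors
- Common problems with the preprocessed Synapse dataset
- Common problems with the preprocessed ACDC dataset
- Cannot find pytorch==1.4.0 from requirements.txt
- Environment errors / code errors
- protobuf error
- Running train.py shows only a progress bar, then exits without an error
- A flood of Missing key(s) in state_dict errors
- ImportError: DLL load failed while importing cv2: 找不到指定的程序 (the specified procedure could not be found)
- Custom dataset (images)
- RuntimeError: CUDA error: device-side assert triggered
- ValueError: could not broadcast input array from shape (xxx,xxx) into shape (xxx,)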
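Reproduction errors
Common problems with the preprocessed Synapse dataset
All of these can be resolved by following this article: TransUNet模型复现_transunet复现-CSDN博客 (TransUNet model reproduction, CSDN blog)
Common problems with the preprocessed ACDC dataset
This preprocessed dataset is in the data2 section of the project's GitHub repository, Beckschen/TransUNet (the official TransUNet project, from the paper "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation"). There are two files: one is the dataset, the other is partial code.
This code is incomplete and cannot be run directly:
- Logging is not enabled; fill it in by following the Synapse trainer.
- db_test and test_loader are used without the corresponding imports; fill them in by following test.py.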
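For the logging item, here is a minimal sketch of what enabling logging can look like, using only the standard library. The function name `setup_logging`, the logger name, and the `log.txt` file name are assumptions for illustration; the Synapse trainer in the repository remains the reference for the exact setup.

```python
import logging
import os
import sys


def setup_logging(snapshot_path, name="transunet_acdc"):
    """Send log records to <snapshot_path>/log.txt and to stdout.

    `setup_logging`, the logger name and the file name are illustrative
    assumptions, not names taken from the repository.
    """
    os.makedirs(snapshot_path, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Avoid stacking duplicate handlers if the function is called twice.
    if not logger.handlers:
        fmt = logging.Formatter("[%(asctime)s] %(message)s", datefmt="%H:%M:%S")
        file_handler = logging.FileHandler(
            os.path.join(snapshot_path, "log.txt"), encoding="utf-8"
        )
        file_handler.setFormatter(fmt)
        stream_handler = logging.StreamHandler(sys.stdout)
        stream_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
        logger.addHandler(stream_handler)
    return logger
```

Inside a trainer function this would be called once at the top, for example `logger = setup_logging(snapshot_path)`, followed by `logger.info(...)` wherever the Synapse trainer logs.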
Cannot find pytorch==1.4.0 from requirements.txt
Install a newer PyTorch version instead; I had no problems doing so.
Environment errors / code errors
protobuf error
D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\internal\api_implementation.py:87: UserWarning: Selected implementation cpp is not available.
  warnings.warn(
Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from trainer import trainer_synapse, trainer_acdc, trainer_drive
  File "G:\TranUNet\project\TransUNet\trainer.py", line 11, in <module>
    from tensorboardX import SummaryWriter
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\tensorboardX\__init__.py", line 5, in <module>
    from .torchvis import TorchVis
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\tensorboardX\torchvis.py", line 10, in <module>
    from .writer import SummaryWriter
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\tensorboardX\writer.py", line 16, in <module>
    from .comet_utils import CometLogger
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\tensorboardX\comet_utils.py", line 5, in <module>
    from google.protobuf.json_format import MessageToJson
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\json_format.py", line 30, in <module>
    from google.protobuf.internal import type_checkers
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\internal\type_checkers.py", line 28, in <module>
    from google.protobuf.internal import decoder
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\internal\decoder.py", line 64, in <module>
    from google.protobuf.internal import encoder
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\internal\encoder.py", line 48, in <module>
    from google.protobuf.internal import wire_format
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\internal\wire_format.py", line 13, in <module>
    from google.protobuf import descriptor
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\google\protobuf\descriptor.py", line 28, in <module>
    from google.protobuf.pyext import _message
ImportError: DLL load failed while importing _message: 找不到指定的程序。
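Most fixes suggested online are to downgrade protobuf, but that did not work for me.
My fix: rebuild the environment and clear the PyCharm cache. This applies especially when the error appears suddenly without any change to the environment.
Running train.py shows only a progress bar, then exits without an error
This is a CUDA problem. After rebooting the system, training either runs normally or an actual error message appears.
If there is an error message, the usual cause is that the installed CUDA version does not match the pytorch-cuda version.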
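A quick way to see both sides of that mismatch is to print what the installed PyTorch build reports. The sketch below is only a diagnostic helper; the function name `cuda_report` is made up for this note, and it falls back gracefully when PyTorch cannot be imported.

```python
def cuda_report():
    """Collect the PyTorch/CUDA facts that matter when training exits silently."""
    try:
        import torch
    except ImportError as exc:
        return {"torch_installed": False, "error": str(exc)}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # False usually points at a driver/build mismatch or a CPU-only install.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Compare `built_with_cuda` with the CUDA version the driver reports (for example in the `nvidia-smi` header). If `cuda_available` is False on a machine with a GPU, the two most likely do not match.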
A flood of Missing key(s) in state_dict errors
Traceback (most recent call last):
  File "D:/Program Files/JetBrains/PyCharm 2023.3.4/plugins/python/helpers/pydev/pydevd.py", line 1534, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "G:\TranUNet\project\TransUNet\test.py", line 154, in <module>
    net.load_state_dict(torch.load(snapshot))
  File "D:\Environment\anaconda3\envs\TransUNet\lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VisionTransformer:
	Missing key(s) in state_dict: "transformer.embeddings.hybrid_model.body.block1/.unit1/.gn1.weight", "transformer.embeddings.hybrid_model.body.block1/.unit1/.gn1.bias", "transformer.embeddings.hybrid_model.body.block1/.unit1/.conv1.weight", "transformer.embeddings.hybrid_model.body.block1/.unit1/.gn2.weight", "transformer.embeddings.hybrid_model.body.block1/.unit1/.gn2.bias", "transformer.embeddings.hybrid_model.body.block1/.unit1/.conv2.weight", "transformer.embeddings.hybrid_model.body.block1/.unit1/.gn3.weight", "transformer.embeddings.hybrid_model.body.block1/.unit1/.gn3.bias", "transformer.embeddings.hybrid_model.body.block1/.unit1/.conv3.weight", "transformer.embeddings.hybrid_model.body.block1/.unit1/.downsample.weight",
......
In networks/vit_seg_modeling_resnet_skip.py, inside the ResNetV2 class, modify the following part of the code:
self.body = nn.Sequential(OrderedDict([
    ('block1', nn.Sequential(OrderedDict(
        [('unit1', PreActBottleneck(cin=width, cout=width*4, cmid=width))] +
        [(f'unit{i:d}', PreActBottleneck(cin=width*4, cout=width*4, cmid=width)) for i in range(2, block_units[0] + 1)],
    ))),
    ('block2', nn.Sequential(OrderedDict(
        [('unit1', PreActBottleneck(cin=width*4, cout=width*8, cmid=width*2, stride=2))] +
        [(f'unit{i:d}', PreActBottleneck(cin=width*8, cout=width*8, cmid=width*2)) for i in range(2, block_units[1] + 1)],
    ))),
    ('block3', nn.Sequential(OrderedDict(
        [('unit1', PreActBottleneck(cin=width*8, cout=width*16, cmid=width*4, stride=2))] +
        [(f'unit{i:d}', PreActBottleneck(cin=width*16, cout=width*16, cmid=width*4)) for i in range(2, block_units[2] + 1)],
    ))),
]))
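This error depends on what this part of the class looked like when the model was trained: the module names used at test time must match the names stored in the checkpoint.
If the names 'block1'/'block2'/'block3', 'unit1', and f'unit{i:d}' appear as shown above, replace them with 'block1/'/'block2/'/'block3/', 'unit1/', and f'unit{i:d}/'.
If they already end with /, remove the /.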
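An alternative to editing the class, offered only as a sketch: rename the keys of the loaded checkpoint so they match whichever naming the current model code uses. The helper names `remove_slashes` and `add_slashes` are hypothetical, and the regular expression assumes the only difference is the trailing / after blockN and unitN, as in the error above.

```python
import re
from collections import OrderedDict

# Matches a "blockN" or "unitN" path component that is followed directly by "."
# (that is, one that does not yet carry the trailing "/").
_NO_SLASH = re.compile(r"(?<![\w/])((?:block|unit)\d+)(?=\.)")


def remove_slashes(state_dict):
    """'body.block1/.unit1/.gn1.weight' -> 'body.block1.unit1.gn1.weight'"""
    return OrderedDict((k.replace("/.", "."), v) for k, v in state_dict.items())


def add_slashes(state_dict):
    """'body.block1.unit1.gn1.weight' -> 'body.block1/.unit1/.gn1.weight'"""
    return OrderedDict((_NO_SLASH.sub(r"\1/", k), v) for k, v in state_dict.items())


# Usage with PyTorch (not executed here; `net` and `snapshot` come from test.py):
#   state = torch.load(snapshot)
#   net.load_state_dict(add_slashes(state))     # model expects 'block1/' names
#   net.load_state_dict(remove_slashes(state))  # model expects 'block1' names
```

In the error shown above the missing keys contain /, so the model expects the 'block1/' names and `add_slashes` would be the direction to try. Keys that do not contain blockN or unitN components are left unchanged.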
ImportError: DLL load failed while importing cv2: 找不到指定的程序 (the specified procedure could not be found)
Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from trainer import trainer_synapse, trainer_acdc, trainer_drive, trainer_car
  File "G:\TranUNet\project\TransUNet\trainer.py", line 18, in <module>
    from datasets.own_data import ImageFolder
  File "G:\TranUNet\project\TransUNet\datasets\own_data.py", line 2, in <module>
    import cv2
ImportError: DLL load failed while importing cv2: 找不到指定的程序。
The source code does not actually use cv2. If you are only reproducing the original code, you can delete the import cv2 line.
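If you would rather keep the import for later use with your own data, one possible pattern is to make it optional. This is a sketch, not code from the repository; `require_cv2` is a hypothetical helper name.

```python
try:
    import cv2  # only needed if your own dataset code really calls OpenCV
except ImportError:
    # "DLL load failed" is raised as an ImportError, so it is caught here too.
    cv2 = None


def require_cv2():
    """Return the cv2 module, or fail with a clear message at the point of use."""
    if cv2 is None:
        raise RuntimeError("OpenCV (cv2) could not be imported in this environment")
    return cv2
```

With this in place the module imports cleanly, and the failure only surfaces if some function actually calls `require_cv2()`.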
Custom dataset (images)
RuntimeError: CUDA error: device-side assert triggered
The length of train set is: 1000
167 iterations per epoch. 25050 max iterations
 0%| | 0/150 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [5,0,0], thread: [497,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [5,0,0], thread: [106,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [5,0,0], thread: [853,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [5,0,0], thread: [206,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [3,0,0], thread: [650,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [3,0,0], thread: [911,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [3,0,0], thread: [160,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [1,0,0], thread: [930,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [0,0,0], thread: [324,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [0,0,0], thread: [173,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [0,0,0], thread: [563,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [0,0,0], thread: [435,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [0,0,0], thread: [1015,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/NLLLoss2d.cu:104: nll_loss2d_forward_kernel: block: [1,0,0], thread: [1020,0,0] Assertion `t >= 0 && t < n_classes` failed.
 0%| | 0/150 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "/hy-tmp/TransUNet/train.py", line 136, in <module>
    trainer[dataset_name](args, net, snapshot_path)
  File "/hy-tmp/TransUNet/trainer.py", line 576, in trainer_kvasir
    loss_dice = dice_loss(outputs, label_batch, softmax=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hy-tmp/TransUNet/utils.py", line 36, in forward
    target = self._one_hot_encoder(target)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hy-tmp/TransUNet/utils.py", line 18, in _one_hot_encoder
    temp_prob = input_tensor == i  # * torch.ones_like(input_tensor)
                ^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
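Make sure the label images have been mapped to the class indices 0, 1, 2, 3, and so on; see Section 1.4 of 【科研02】【代码复现】【代码分享】TransUnet-RoadExtract 道路提取【数据预处理-raster2npz】 - 安知燕雀? - 博客园 (cnblogs.com).
Note that the images should preferably be in PNG format. JPG may be converted incompletely: even though the processing maps every non-zero pixel to 1, some pixels can still end up as 2.
If the error persists, a workaround is to increase num_classes for this dataset in train.py by 1 at a time until the error disappears. The assertion `t >= 0 && t < n_classes` in the log fails whenever a label value is not smaller than num_classes.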
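Before touching num_classes, it can help to check which values the labels really contain. The sketch below uses NumPy only; `binarize_mask` and `check_labels` are hypothetical helper names, and the tiny `raw` array stands in for a mask loaded from a JPG file.

```python
import numpy as np


def binarize_mask(mask):
    """Map every non-zero pixel to class 1 and return a uint8 label map."""
    return (np.asarray(mask) > 0).astype(np.uint8)


def check_labels(label, num_classes):
    """Raise ValueError if any label value falls outside [0, num_classes)."""
    values = np.unique(label)
    bad = values[(values < 0) | (values >= num_classes)]
    if bad.size:
        raise ValueError(f"label values {bad.tolist()} are outside [0, {num_classes})")
    return values.tolist()


# A mask as it might look after JPG compression: background 0, foreground 255,
# plus a few in-between values introduced at the edges.
raw = np.array([[0, 3, 255], [0, 128, 255]], dtype=np.uint8)
label = binarize_mask(raw)
print(check_labels(label, num_classes=2))  # [0, 1]
```

Running `check_labels` on every label in the training set, on the CPU, reports the offending values directly instead of the asynchronous CUDA assertion.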
ValueError: could not broadcast input array from shape (xxx,xxx) into shape (xxx,)
Traceback (most recent call last):
  File "test.py", line 159, in <module>
    inference(args, net, test_save_path)
  File "test.py", line 56, in inference
    metric_i = test_single_volume(image, label, model, classes=args.num_classes, patch_size=[args.img_size, args.img_size],
  File "G:\TranUNet\project\TransUNet\utils.py", line 80, in test_single_volume
    prediction[ind] = pred
ValueError: could not broadcast input array from shape (384,384) into shape (384,)
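The cause is that the original code treats both image and label as 3D volumes. After processing ordinary images, however, image is a three-dimensional RGB array while label is a two-dimensional grayscale map. prediction is created with the label's shape, so prediction[ind] is a single row of length 384 and cannot hold a (384,384) result.
If the inputs are plain color images, find the line prediction[ind] = pred (the traceback above places it in test_single_volume in utils.py, which test.py calls)
and change it to prediction = pred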
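The shape mismatch can be reproduced without the model. In this NumPy sketch, `store_prediction` is a hypothetical helper (not a function from the repository) that keeps the slice-wise behaviour for 3D labels and falls back to replacing the whole array for 2D labels; `pred` is a stand-in for the network output of one slice.

```python
import numpy as np


def store_prediction(prediction, ind, pred):
    """Write one predicted slice into `prediction`.

    3D label (D, H, W): each slice fills prediction[ind], as in the original code.
    2D label (H, W): the single prediction replaces the whole array.
    """
    if prediction.ndim == 3:
        prediction[ind] = pred
        return prediction
    return pred.astype(prediction.dtype)


label_2d = np.zeros((384, 384), dtype=np.uint8)  # grayscale label of an ordinary image
pred = np.ones((384, 384), dtype=np.uint8)       # stand-in network output for one slice

prediction = np.zeros_like(label_2d)
try:
    prediction[0] = pred                         # what the original line effectively does
except ValueError as exc:
    print(exc)                                   # the broadcast error from this section

prediction = store_prediction(prediction, 0, pred)
print(prediction.shape)                          # (384, 384)
```

Branching on `prediction.ndim` keeps the code usable for the original 3D datasets as well, which a plain `prediction = pred` does not.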