VGG16 / CF-VGG11 Experiment Report

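Note: VGG16 and CF-VGG11 are the two main models used in the paper "A 3D Fluorescence Classification and Component Prediction Method Based on VGG Convolutional Neural Network and PARAFAC Analysis Method". The accompanying code repository provides the datasets used in the experiments, the parallel factor analysis (PARAFAC) results, and the CNN models. The paper and the code repository are the basic materials for the experiments in this report.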

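Paper Abstract
Dataset Information
Environment Setup
Classification Experiment (working directory: code repository/VGG16)
- Code Changes
- Procedure
- Results
- Notes on train.py
  - Data Flow
Component Fitting Experiment (working directory: code repository/CF-VGG11)
- Code Changes
- Procedure
- Results
- Notes
  - Data Flow Analysis of train.py
  - FVGG11.py
  - FAlexNet.py
  - SimpleCNN.py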

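Paper Abstract

Research on three-dimensional (3D) fluorescence currently relies mainly on methods such as parallel factor analysis (PARAFAC), fluorescence regional integration (FRI), and principal component analysis (PCA).
Many studies also combine convolutional neural networks (CNNs), but among the methods that combine CNNs with 3D fluorescence analysis, none is yet recognized as the most effective.
Building on existing research, this paper collected samples from real environments to measure 3D fluorescence data and obtained a batch of public datasets from the Internet.
The data are first preprocessed (in two steps: PARAFAC analysis and CNN dataset generation); then a 3D fluorescence classification method and a component fitting method based on the VGG16 and VGG11 convolutional neural networks are proposed.
Classifying the 3D fluorescence data with the VGG16 network gives a training accuracy of 99.6% (as accurate as the PCA + SVM method).
For the component map fitting network, we comprehensively compared an improved LeNet network, an improved AlexNet network, and an improved VGG11 network, and selected the improved VGG11 network.
In training the improved VGG11 network, we used the MSE loss function and cosine similarity to judge the quality of the model; the MSE loss of network training reached 4.6×10⁻⁴, and the cosine similarity of the training results reached 0.99. The network therefore performs very well.
Experiments show that CNNs have great application value in 3D fluorescence analysis.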

Dataset Information

In the table below, the columns Samples, Number, Train, Validate, Test, and Total Samples after Expansion come from Table 3 of the paper.

Samples	Number	Train	Validate	VGG16/main	VGG11/train	Test	VGG11/test	VGG16/test	Total Samples after Expansion
FU	45	27	9	35	35	9	7	35	135
F	105	63	21	81	81	21	21	81	315
P	206	124	41	161	161	41	42	161	618
PU	60	36	12	45	45	12	12	45	180

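Note on data augmentation from the paper: during actual training, we expand the images through color gamut distortion and mirror flipping.

Table analysis: Train + Validate is close to the training set size of both networks; the VGG16 test set is about 4 times the Test column; Total Samples after Expansion = 3 × Number.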
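The relations noted in the table analysis can be checked directly against the table. The snippet below is a small sanity check with the rows copied by hand from the table; the variable names are illustrative only.

```python
# Rows copied from the table: (Number, Train, Validate, VGG16/main, Test, VGG16/test, Total)
rows = {
    "FU": (45, 27, 9, 35, 9, 35, 135),
    "F":  (105, 63, 21, 81, 21, 81, 315),
    "P":  (206, 124, 41, 161, 41, 161, 618),
    "PU": (60, 36, 12, 45, 12, 45, 180),
}

for name, (number, train, validate, main, test, vgg16_test, total) in rows.items():
    # Total Samples after Expansion is exactly three times Number.
    assert total == 3 * number
    # Train + Validate is close to, but not equal to, the training set size on disk.
    print(name, "train+validate =", train + validate, "vs", main,
          "| VGG16/test : Test =", round(vgg16_test / test, 2))
```

The printed ratios fall a little below 4, which is why "about 4 times" is the safer wording.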

Environment Setup

Install CUDA 12.1
Install cuDNN
Create a new environment: conda create -n 3deem python=3.10
Install torch-2.2.1+cu121-cp310-cp310-win_amd64.whl
Install matplotlib and opencv in the virtual environment
pip3 install torchvision --index-url https://download.pytorch.org/whl/cu121
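After these steps it can help to confirm that the packages are visible from inside the 3deem environment. The following is only a sketch: check_packages is a hypothetical helper, not part of the repository, and cv2 is the import name of the opencv package.

```python
import importlib.util

def check_packages(names=("torch", "torchvision", "cv2", "matplotlib")):
    """Return {module name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    status = check_packages()
    for name, ok in status.items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
    if status["torch"]:
        import torch
        # True only when the CUDA build of torch can see a GPU.
        print("CUDA available:", torch.cuda.is_available())
```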

Classification Experiment (working directory: code repository/VGG16)

Code Changes

Add annotation_generator.py

import os
from utils.utils import get_classes

classes_path = 'model_data/cls_classes.txt'
img_root_path = 'datasets/main/'
txt_path = 'model_data/cls_train.txt'

assert os.path.exists(img_root_path)
class_names, num_classes = get_classes(classes_path)
# Write one line per image in the format: label index;image path
with open(txt_path, 'w') as txt:
    for tag_index in range(0, num_classes):
        class_name = class_names[tag_index]
        img_path = img_root_path + class_name + '/'
        files = os.listdir(img_path)
        for img_file in files:
            line = str(tag_index) + ';' + img_path + img_file + '\n'
            txt.write(line)

nets/mobilenet.py, nets/resnet50.py, nets/vgg16.py
In the import statement: torchvision.models.utils → torch.hub
Modify vit.py

# Insert the following 4 lines at line 9
from torch.hub import load_state_dict_from_url
model_urls = {
    'vit': 'https://download.pytorch.org/models/vit_b_16-c867db91.pth',
}
# Change the vit function as follows
def vit(input_shape=[224, 224], pretrained=False, progress=True, num_classes=1000):
    model = VisionTransformer(input_shape)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['vit'], model_dir='./model_data', progress=progress)
        model.load_state_dict(state_dict, strict=False)
    # ... (the rest is unchanged)

Modify train.py
"cls_train.txt" → "model_data/cls_train.txt"
Modify eval_top1.py
"./cls_test.txt" → "model_data/cls_test.txt"

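Procedure

Run annotation_generator.py to generate cls_train.txt.
Open train.py, set backbone, and run it. (Running train.py automatically downloads the pretrained model to the model_data directory; the download URL is in the model_urls variable in the model's source code. When using the vit model, adjust the lr parameter as described in the comments; if GPU memory runs out, adjust the Batch_size parameter as described in the comments.)
Open classification.py, set backbone, change model_path to the path of the trained weights file, and run it.
Run predict.py to inspect individual recognition results.
Edit img_root_path and txt_path in annotation_generator.py, then run it again to generate cls_test.txt.
Run eval_top1.py to obtain the test accuracy.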
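The top-1 accuracy obtained in the last step is the fraction of test images whose highest-scoring class equals the label. A minimal sketch of that metric follows; it uses made-up scores and is not the code of eval_top1.py.

```python
def top1_accuracy(scores, labels):
    """scores: one list of per-class scores per image; labels: true class indices."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # index of the best score
        correct += int(predicted == label)
    return correct / len(labels)

# Made-up scores for 4 images and 4 classes.
scores = [
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.4, 0.3, 0.2, 0.1],   # misclassified: the true class is 3
]
labels = [0, 1, 2, 3]
print(f"top-1 = {top1_accuracy(scores, labels):.2%}")
```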

Results

backbone	acc1
VGG16	top-1=99.69%
mobilenet	top-1=100.00%
resnet50	top-1=100.00%
vit	top-1=99.38%

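Notes on train.py

classes_path: path of the class file. Each line of the class file is the name of one class.

input_shape: 224*224 is the standard input size of VGG.

annotation_path: each line represents one sample, in the format: label index;image path

// operator: floor division (divide, then round down)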
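A short illustration of the // operator. Counting full batches per epoch is a typical use; num_train and batch_size below are illustrative values, not taken from train.py.

```python
num_train, batch_size = 322, 32        # illustrative values
steps_per_epoch = num_train // batch_size
print(steps_per_epoch)                 # 10, because 322 / 32 = 10.06...

# The result is rounded toward negative infinity, not toward zero.
print(7 // 2, -7 // 2)                 # 3 -4
```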

Difference between the freeze stage and the unfreeze stage: the former calls model.freeze_backbone(), the latter calls model.Unfreeze_backbone().

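Data Flow

classes_path → class_names, num_classes. annotation_path → lines (label + path) → train_dataset, val_dataset → gen, gen_val

(1) train_dataset, val_dataset -> DataGenerator(data.Dataset)
def __init__(self, annotation_lines, input_shape, random): an instance of this class is a list-like object (a "fake list"). The programmer decides what the constructor takes in and implements the list operations, so from the outside the instance behaves like a list.
__getitem__: first reads the image at the path given in annotation_lines, then calls get_random_data, then calls preprocess_input to scale the values to (-1, 1), and finally swaps the array axes with numpy.transpose(x, [2, 0, 1]). ([2, 0, 1] is the new axis order: the third axis of x becomes the first, the first becomes the second, and the second becomes the third.)
get_random_data: first converts grayscale images to RGB, then branches. If random is False, the image is resized to input_shape (blank areas padded with gray bars) and returned; if random is True, the image is randomly rescaled and randomly stretched, randomly flipped or not, randomly rotated or not (blank areas filled with gray), and randomly distorted in color gamut.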
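The list-like behaviour and the axis swap can be shown without PyTorch or NumPy. The sketch below is a toy stand-in for DataGenerator: it implements only __len__ and __getitem__ and works on nested lists instead of image files. The scaling x / 127.5 - 1 is an assumption about how preprocess_input reaches the (-1, 1) range, and ToyDataset is a hypothetical name.

```python
class ToyDataset:
    """Behaves like a read-only list of (image, label) pairs."""

    def __init__(self, images, labels):
        self.images = images      # each image: H x W x C nested list, values 0..255
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img = self.images[index]
        # Assumed scaling: 0..255 -> -1..1
        img = [[[v / 127.5 - 1 for v in px] for px in row] for row in img]
        h, w, c = len(img), len(img[0]), len(img[0][0])
        # Same effect as numpy.transpose(x, [2, 0, 1]): H x W x C -> C x H x W
        chw = [[[img[y][x][k] for x in range(w)] for y in range(h)] for k in range(c)]
        return chw, self.labels[index]


# One image of height 1 and width 2 with 3 channels.
ds = ToyDataset([[[[0, 127.5, 255], [255, 255, 0]]]], [2])
image, label = ds[0]              # indexing calls __getitem__
print(len(ds), label)             # 1 2
print(image)                      # [[[-1.0, 1.0]], [[0.0, 1.0]], [[1.0, -1.0]]]
```

After the swap the outer index selects the channel, which is the layout PyTorch convolution layers expect.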

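(2) fit_one_epoch(…gen, gen_val)
gen is traversed with enumerate: iteration is the index and batch is the loop variable (one element of the list-like object). Each loop yields images and y, which are produced through the dataset's __getitem__. y serves as targets; model_train(images) produces outputs, and the cross-entropy between outputs and targets gives loss.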
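The loss in this loop can be reproduced by hand. The sketch below computes the mean cross-entropy between raw scores and integer targets, which is what a loss such as torch.nn.CrossEntropyLoss returns by default; whether train.py uses exactly that class is an assumption, and the batches are made up.

```python
import math

def cross_entropy(outputs, targets):
    """Mean cross-entropy between raw scores (logits) and integer class targets."""
    total = 0.0
    for logits, target in zip(outputs, targets):
        m = max(logits)                                   # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_sum_exp - logits[target]             # equals -log(softmax(logits)[target])
    return total / len(targets)

# Made-up (outputs, targets) batches standing in for what the loader yields.
batches = [
    ([[2.0, 0.5, 0.1, 0.1], [0.2, 3.0, 0.1, 0.1]], [0, 1]),
    ([[0.0, 0.0, 0.0, 0.0]], [3]),
]
for iteration, (outputs, targets) in enumerate(batches):
    print(iteration, round(cross_entropy(outputs, targets), 4))
```

For the second batch all four scores are equal, so the loss is log(4), the value expected from an untrained 4-class classifier.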

Component Fitting Experiment (working directory: code repository/CF-VGG11)

Code Changes

Create a nets directory, move FAlexNet.py, FVGG11.py, and SimpleCNN.py into it, and add __init__.py

from .FAlexNet import FAlexNet
from .FVGG11 import FVGG11
from .SimpleCNN import SimpleCNN

get_model_from_name = {
    "FAlexNet" : FAlexNet,
    "FVGG11"   : FVGG11,
    "SimpleCNN": SimpleCNN,
}

Modify train.py
from ModelName import ModelName → from nets import get_model_from_name

model = ModelName()
# Replace the line above with the following three lines
# Options: FAlexNet, FVGG11, or SimpleCNN
backbone = 'FVGG11'
model = get_model_from_name[backbone]()
# Comment out the following lines
print('iter', idx+1, '余弦相似度(准确率):', accuracy)
print('iter %d finished, total_loss:%4f, cossine similarity: %4f' % (idx+1, total_loss / (idx+1), total_accuracy / (idx+1)))
print('val total loss:', val_loss / (iteration + 1), ',  lr:', get_lr(optimizer))

if epoch % 10 == 0 → if (epoch + 1) % 10 == 0
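The effect of the last change is easiest to see by listing the epochs on which each condition fires. The sketch assumes that epoch is a zero-based loop variable, as in for epoch in range(...), and uses 30 epochs purely for illustration.

```python
total_epochs = 30   # illustrative value

# Epochs are reported 1-based (epoch + 1) to match how a person counts them.
before = [epoch + 1 for epoch in range(total_epochs) if epoch % 10 == 0]
after = [epoch + 1 for epoch in range(total_epochs) if (epoch + 1) % 10 == 0]

print("epoch % 10 == 0       fires on epochs", before)   # [1, 11, 21]
print("(epoch + 1) % 10 == 0 fires on epochs", after)    # [10, 20, 30]
```

With the original condition the branch runs right after the first epoch and never on the final one; the modified condition runs it after every tenth completed epoch.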

Add eval.py

import torch
from torch.utils.data import DataLoader
from torchvision import transforms

from dataloader import FeemDataSet
from nets import get_model_from_name

WEIGHT_PATH = r'./logs/model_name_CX.pth'
# Options: FVGG11 / SimpleCNN / FAlexNet
backbone = 'FVGG11'
annotation_path = 'cls_test.txt'

if __name__ == '__main__':
    model = get_model_from_name[backbone]()
    model.load_state_dict(torch.load(WEIGHT_PATH))
    model.eval()
    # Read the test set
    with open(annotation_path, "r") as f:
        lines = f.readlines()
    dataset = FeemDataSet(lines, transform=transforms.Compose([transforms.ToTensor()]))
    datas = DataLoader(dataset, batch_size=1, shuffle=True)
    total_accuracy = 0.0
    with torch.no_grad():  # gradients are not needed for evaluation
        for idx, (data, target) in enumerate(datas):
            pred = model(data)
            # Cosine similarity between the prediction and the flattened target
            accuracy = torch.cosine_similarity(pred, target.view(1, -1)).item()
            total_accuracy += accuracy
            print('Similarity between image %d and its label = %f' % (idx + 1, accuracy))
    print('Mean similarity = %f' % (total_accuracy / (idx + 1)))

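Procedure

Run txt_annotation.py to generate cls_train.txt, cls_test.txt, and cls_predict.txt. Note that only portsur and pure have 5 components and only portsur has 6, so dedicated index files should be used for comp=5 and comp=6: cls_train_C5.txt, cls_train_C6.txt, cls_test_C5.txt, cls_test_C6.txt.
Open dataloader.py, change the default value of comp in FeemDataSet.__init__(), and run it.
Open train.py, set model and annotation_path, and run it.
Open eval.py, set WEIGHT_PATH, backbone, and annotation_path, and run it.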
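One way to build the component-specific index files is to filter the general ones by dataset name. The sketch below assumes that each annotation line looks like 1;./data/train/fish/fish_100.jpg and that the dataset name is the directory containing the image. The helper filter_annotations, the label indices, and the sample file names are hypothetical.

```python
import posixpath

def filter_annotations(lines, allowed_datasets):
    """Keep annotation lines whose image sits in one of the allowed dataset folders."""
    kept = []
    for line in lines:
        _, path = line.strip().split(";", 1)
        dataset = posixpath.basename(posixpath.dirname(path))
        if dataset in allowed_datasets:
            kept.append(line)
    return kept

lines = [
    "1;./data/train/fish/fish_100.jpg\n",          # hypothetical sample lines
    "2;./data/train/portsur/portsur_7.jpg\n",
    "3;./data/train/pure/pure_3.jpg\n",
]
c5 = filter_annotations(lines, {"portsur", "pure"})   # datasets that have 5 components
c6 = filter_annotations(lines, {"portsur"})           # dataset that has 6 components
print(len(c5), len(c6))                               # 2 1
```

Writing c5 and c6 out with writelines would then produce files such as cls_train_C5.txt and cls_train_C6.txt.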

Results

backbone	C1	C2	C3	C4	C5	C6	avg
FVGG11	0.999743	0.999852	0.999738	0.999915	0.999094	0.999903	0.999708
FAlexNet	0.998036	0.999412	0.994359	0.941888	0.990296	0.999894	0.987314
SimpleCNN	0.994475	0.997314	0.9911	0.997244	0.98159	0.999983	0.993618

Notes

Data Flow Analysis of train.py

annotation_path → lines (label index + path) → train_dataset, val_dataset → train_datas, val_datas → idx, batch = images, targets

(1) train_dataset, val_dataset -> FeemDataSet(data.Dataset)
def __init__(self, data_pathes, comp, transform)
comp: component index
path in data_pathes → data, label. Example: 1;./data/train/fish/fish_100.jpg (an image from the fish dataset) → fish_comp1
label, comp → target_path
data/target_path → datas/targets (opened by path, converted to grayscale with convert('L'), added to the collection) → self.data/self.target
__getitem__: index → data/target, with transform applied when needed

(2) train_iteration
model_train(images) produces outputs, and the MSELoss between outputs and targets is computed. sim (an array of cosine similarities) between outputs and targets is then computed, and its mean gives accuracy.
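Both quantities in this step can be written out in a few lines. The sketch below uses plain Python and tiny made-up vectors in place of the flattened network outputs; it illustrates the two formulas and is not the code of train.py.

```python
import math

def mse(output, target):
    """Mean squared error over all elements."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)

def cosine_similarity(output, target):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(o * t for o, t in zip(output, target))
    norm = math.sqrt(sum(o * o for o in output)) * math.sqrt(sum(t * t for t in target))
    return dot / norm

# A made-up batch of two flattened outputs and their targets.
outputs = [[0.1, 0.4, 0.8], [0.5, 0.5, 0.0]]
targets = [[0.1, 0.4, 0.8], [0.0, 0.5, 0.5]]

loss = sum(mse(o, t) for o, t in zip(outputs, targets)) / len(outputs)
sim = [cosine_similarity(o, t) for o, t in zip(outputs, targets)]   # one value per sample
accuracy = sum(sim) / len(sim)                                      # mean over the batch
print(round(loss, 4), [round(s, 4) for s in sim], round(accuracy, 4))
```

Cosine similarity ignores overall scale: multiplying an output by a constant leaves it unchanged, whereas the MSE grows. That is one reason to track both.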

FVGG11.py

Network structure as reported by torchsummary:

        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 60, 60]             640
       BatchNorm2d-2           [-1, 64, 60, 60]             128
              ReLU-3           [-1, 64, 60, 60]               0
         MaxPool2d-4           [-1, 64, 30, 30]               0
            Conv2d-5          [-1, 128, 30, 30]          73,856
       BatchNorm2d-6          [-1, 128, 30, 30]             256
              ReLU-7          [-1, 128, 30, 30]               0
         MaxPool2d-8          [-1, 128, 15, 15]               0
            Conv2d-9          [-1, 256, 15, 15]         295,168
      BatchNorm2d-10          [-1, 256, 15, 15]             512
             ReLU-11          [-1, 256, 15, 15]               0
        MaxPool2d-12            [-1, 256, 7, 7]               0
           Conv2d-13            [-1, 512, 7, 7]       1,180,160
      BatchNorm2d-14            [-1, 512, 7, 7]           1,024
             ReLU-15            [-1, 512, 7, 7]               0
        MaxPool2d-16            [-1, 512, 3, 3]               0
           Conv2d-17            [-1, 512, 3, 3]       2,359,808
      BatchNorm2d-18            [-1, 512, 3, 3]           1,024
             ReLU-19            [-1, 512, 3, 3]               0
        MaxPool2d-20            [-1, 512, 1, 1]               0
           Linear-21                 [-1, 2048]       1,050,624
             ReLU-22                 [-1, 2048]               0
          Dropout-23                 [-1, 2048]               0
           Linear-24                 [-1, 1024]       2,098,176
             ReLU-25                 [-1, 1024]               0
          Dropout-26                 [-1, 1024]               0
           Linear-27                 [-1, 3600]       3,690,000
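The Param # and Output Shape columns of this summary follow from two simple formulas, which makes the table easy to cross-check. In the sketch below, the 3 x 3 kernels and the 1-channel 60 x 60 input are inferred from the numbers in the summary, not read from FVGG11.py.

```python
def conv2d_params(c_in, c_out, k):
    return c_out * (c_in * k * k) + c_out        # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out                  # weights + biases

print(conv2d_params(1, 64, 3))            # 640      (Conv2d-1)
print(conv2d_params(64, 128, 3))          # 73856    (Conv2d-5)
print(conv2d_params(512, 512, 3))         # 2359808  (Conv2d-17)
print(linear_params(512 * 1 * 1, 2048))   # 1050624  (Linear-21)
print(linear_params(1024, 3600))          # 3690000  (Linear-27)

# Each MaxPool2d halves the side length, rounding down.
side, sides = 60, [60]
for _ in range(5):
    side //= 2
    sides.append(side)
print(sides)                              # [60, 30, 15, 7, 3, 1]
```

The 3600 outputs of the last layer equal 60 * 60, the same number of values as one input channel.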

For torchvision.models.vgg.vgg11_bn, the structure reported by torchsummary is shown below and compared with FVGG11. The notes in the last column give the matching FVGG11 layer number; "..." continues the correspondence from the row above, and rows without a note have no counterpart in FVGG11.

        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792  matches Layer 1
       BatchNorm2d-2         [-1, 64, 224, 224]             128  ...
              ReLU-3         [-1, 64, 224, 224]               0  ...
         MaxPool2d-4         [-1, 64, 112, 112]               0  ... (side length halved)
            Conv2d-5        [-1, 128, 112, 112]          73,856  ... (channels doubled)
       BatchNorm2d-6        [-1, 128, 112, 112]             256  ...
              ReLU-7        [-1, 128, 112, 112]               0  ...
         MaxPool2d-8          [-1, 128, 56, 56]               0  ... (side length halved)
            Conv2d-9          [-1, 256, 56, 56]         295,168  ...
      BatchNorm2d-10          [-1, 256, 56, 56]             512  ...
             ReLU-11          [-1, 256, 56, 56]               0  matches Layer 11
           Conv2d-12          [-1, 256, 56, 56]         590,080
      BatchNorm2d-13          [-1, 256, 56, 56]             512
             ReLU-14          [-1, 256, 56, 56]               0
        MaxPool2d-15          [-1, 256, 28, 28]               0  matches Layer 12 (side length halved)
           Conv2d-16          [-1, 512, 28, 28]       1,180,160  ...
      BatchNorm2d-17          [-1, 512, 28, 28]           1,024  ...
             ReLU-18          [-1, 512, 28, 28]               0  matches Layer 15
           Conv2d-19          [-1, 512, 28, 28]       2,359,808
      BatchNorm2d-20          [-1, 512, 28, 28]           1,024
             ReLU-21          [-1, 512, 28, 28]               0
        MaxPool2d-22          [-1, 512, 14, 14]               0  matches Layer 16 (side length halved)
           Conv2d-23          [-1, 512, 14, 14]       2,359,808  ...
      BatchNorm2d-24          [-1, 512, 14, 14]           1,024  ...
             ReLU-25          [-1, 512, 14, 14]               0  matches Layer 19
           Conv2d-26          [-1, 512, 14, 14]       2,359,808
      BatchNorm2d-27          [-1, 512, 14, 14]           1,024
             ReLU-28          [-1, 512, 14, 14]               0
        MaxPool2d-29            [-1, 512, 7, 7]               0  matches Layer 20 (side length halved)
AdaptiveAvgPool2d-30            [-1, 512, 7, 7]               0
           Linear-31                 [-1, 4096]     102,764,544  matches Layer 21
             ReLU-32                 [-1, 4096]               0  ...
          Dropout-33                 [-1, 4096]               0  ...
           Linear-34                 [-1, 4096]      16,781,312  ...
             ReLU-35                 [-1, 4096]               0  ...
          Dropout-36                 [-1, 4096]               0  ...
           Linear-37                 [-1, 1000]       4,097,000  matches Layer 27

Further investigation shows that the network structure of FVGG11 actually comes from the paper "Video object forgery detection algorithm based on VGG-11 convolutional neural network".

FAlexNet.py

Network structure as reported by torchsummary:

        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 96, 58, 58]             960
         MaxPool2d-2           [-1, 96, 28, 28]               0
            Conv2d-3          [-1, 256, 26, 26]         221,440
         MaxPool2d-4          [-1, 256, 12, 12]               0
            Conv2d-5          [-1, 384, 12, 12]         885,120
            Conv2d-6          [-1, 384, 12, 12]       1,327,488
            Conv2d-7          [-1, 256, 12, 12]         884,992
         MaxPool2d-8            [-1, 256, 5, 5]               0
            Linear-9                 [-1, 2048]      13,109,248
          Dropout-10                 [-1, 2048]               0
           Linear-11                 [-1, 1024]       2,098,176
          Dropout-12                 [-1, 1024]               0
           Linear-13                 [-1, 3600]       3,690,000
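The output sizes in this summary are consistent with the usual size formula for convolution and pooling layers. The sketch below reproduces the chain 60 → 58 → 28 → 26 → 12 → 12 → 5. The kernel sizes, strides, and padding are assumptions chosen to match the shapes and parameter counts above (3 x 3 convolutions, 3 x 3 pooling with stride 2); they are not read from FAlexNet.py, and other settings could give the same shapes.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 60                                   # input side length, inferred from the summary
side = out_size(side, 3)                    # Conv2d-1     -> 58
side = out_size(side, 3, stride=2)          # MaxPool2d-2  -> 28
side = out_size(side, 3)                    # Conv2d-3     -> 26
side = out_size(side, 3, stride=2)          # MaxPool2d-4  -> 12
side = out_size(side, 3, padding=1)         # Conv2d-5..7  -> 12 (size preserved)
side = out_size(side, 3, stride=2)          # MaxPool2d-8  -> 5
print(side, 256 * side * side)              # 5 6400
print(6400 * 2048 + 2048)                   # 13109248, the Linear-9 parameter count
```

The flattened feature size of 256 * 5 * 5 = 6400 explains the Linear-9 parameter count in the table.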

For torchvision.models.alexnet.alexnet, the structure reported by torchsummary is shown below and compared with FAlexNet. The notes in the last column give the matching FAlexNet layer number; "standard, hidden" marks a standard ReLU that does not appear in the FAlexNet summary.

        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 55, 55]          23,296  matches Layer 1
              ReLU-2           [-1, 64, 55, 55]               0  standard, hidden
         MaxPool2d-3           [-1, 64, 27, 27]               0  matches Layer 2
            Conv2d-4          [-1, 192, 27, 27]         307,392  matches Layer 3
              ReLU-5          [-1, 192, 27, 27]               0  standard, hidden
         MaxPool2d-6          [-1, 192, 13, 13]               0  matches Layer 4
            Conv2d-7          [-1, 384, 13, 13]         663,936  matches Layer 5
              ReLU-8          [-1, 384, 13, 13]               0  standard, hidden
            Conv2d-9          [-1, 256, 13, 13]         884,992  matches Layer 6
             ReLU-10          [-1, 256, 13, 13]               0  standard, hidden
           Conv2d-11          [-1, 256, 13, 13]         590,080  matches Layer 7
             ReLU-12          [-1, 256, 13, 13]               0  standard, hidden
        MaxPool2d-13            [-1, 256, 6, 6]               0  matches Layer 8
AdaptiveAvgPool2d-14            [-1, 256, 6, 6]               0  extra layer
          Dropout-15                 [-1, 9216]               0  matches Layer 10
           Linear-16                 [-1, 4096]      37,752,832  matches Layer 9
             ReLU-17                 [-1, 4096]               0  standard, hidden
          Dropout-18                 [-1, 4096]               0  matches Layer 12
           Linear-19                 [-1, 4096]      16,781,312  matches Layer 11
             ReLU-20                 [-1, 4096]               0  standard, hidden
           Linear-21                 [-1, 1000]       4,097,000  matches Layer 13

Summary of the modifications: the order of Linear and Dropout is swapped (in 2 places).