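How to optimize a neural network model and tune its hyperparameters (case study: tensor prediction)

Summary:

Basic: tweak a few hyperparameters by hand. The results are mediocre: sometimes good enough, sometimes not.

Intermediate: let Optuna find the best hyperparameters, then train for more epochs so that both the train loss and the test loss keep dropping. The results become quite good.

Advanced: building on the intermediate level, switch to a loss function that suits the task better. Backpropagate this loss during training and use the same loss as Optuna's objective. This brought a qualitative improvement.

Problem:

I have recently been working in CFD and need to predict flow fields. Once a flow field is extracted for deep learning, it is simply a multi-dimensional tensor. The goal of the network is to make the predicted tensor as close as possible to the actual tensor, that is, to make the error of every individual value as small as possible.

Current status: the model architecture is more or less settled, but the results are mediocre, so the tuning methods below are needed.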

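Optimization methods:

I. Basic optimization:

Tweak a few hyperparameters by hand. The results are mediocre: sometimes good enough, sometimes not.

II. Intermediate optimization: tune with Optuna, then train for more epochs

Let Optuna find the best hyperparameters, then train for more epochs so that both the train loss and the test loss keep dropping. The results become quite good.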

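III. Advanced optimization:

Building on the intermediate level, switch to a loss function that suits the task better. Backpropagate this loss during training and use the same loss as Optuna's objective. This brought a qualitative improvement.

It comes down to the following three lines of code: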

smooth_l1 = F.smooth_l1_loss(out.view(shape1, shape2), y.view(shape1, shape2))  # !!! compute Smooth L1 loss
smooth_l1.backward()  # !!! backpropagate the Smooth L1 loss
return test_smooth_l1  # !!! test-set Smooth L1 loss from the last epoch, used as the Optuna objective

Comparing the predicted data with the actual data shows that each predicted value is within about 0.01 of the actual value (the actual data lies in [-4, 4]).
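One way to check a claim like this is to compute element-wise error statistics directly. The sketch below assumes `pred` and `target` are tensors of the same shape; the function name `error_report`, the tolerance, and the synthetic data are illustrative and not taken from the project.

```python
import torch


def error_report(pred, target, tol=0.01):
    """Return (max abs error, mean abs error, fraction of elements within tol)."""
    abs_err = (pred - target).abs()
    within = (abs_err <= tol).float().mean().item()
    return abs_err.max().item(), abs_err.mean().item(), within


# Hypothetical data: targets in [-4, 4], predictions off by at most 0.005.
torch.manual_seed(0)
target = torch.empty(8, 80, 40, 3).uniform_(-4.0, 4.0)
pred = target + torch.empty_like(target).uniform_(-0.005, 0.005)

max_err, mean_err, frac = error_report(pred, target)
print(f"max={max_err:.4f}, mean={mean_err:.4f}, within tol={frac:.1%}")
```

Reporting the maximum alongside the mean matters here: a low average loss can still hide a few points that are far off.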

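Choosing the loss function:

To make the values of two matrices as close as possible, choosing a suitable loss function is key. Common choices for this purpose include:

Mean Squared Error (MSE): the average of the squared differences between predictions and targets. MSE is sensitive to large errors and penalizes predictions that deviate a lot much more heavily.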
```
import torch.nn.functional as F

loss = F.mse_loss(predicted, target)
```
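Mean Absolute Error (MAE): the average of the absolute differences between predictions and targets. MAE is less sensitive to outliers than MSE, so it suits data that contains outliers.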
```
import torch

loss = torch.mean(torch.abs(predicted - target))
```
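Smooth L1 Loss: closely related to Huber loss (the two are identical when the threshold is 1, which is PyTorch's default `beta`). When the absolute error is below `beta`, the loss is quadratic like L2 (0.5 * x^2 / beta); when the error is larger, it is linear like L1 (|x| - 0.5 * beta). This keeps gradients smooth near zero while limiting the influence of large errors, so it suits noisy datasets.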
```
import torch.nn.functional as F

loss = F.smooth_l1_loss(predicted, target)
```
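In summary:

MSE: suited to cases where large deviations must be penalized heavily.
MAE: suited to data with outliers, when you want to be less sensitive to them.
Smooth L1 Loss: suited to cases that need some robustness to noise while still penalizing large deviations moderately.

For this task I chose Smooth L1 Loss.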
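To make the difference concrete, the per-element formulas can be evaluated by hand for a few residuals. The sketch below uses plain Python rather than PyTorch; `smooth_l1` follows the piecewise definition documented for `F.smooth_l1_loss`, with `beta = 1.0` as its default, and the residual values are arbitrary examples.

```python
def mse(r):
    """Squared error for a single residual r = prediction - target."""
    return r * r


def mae(r):
    """Absolute error for a single residual."""
    return abs(r)


def smooth_l1(r, beta=1.0):
    """Quadratic below beta, linear above it."""
    a = abs(r)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta


print(f"{'residual':>8} {'MSE':>10} {'MAE':>10} {'SmoothL1':>10}")
for r in (0.01, 0.5, 1.0, 4.0):
    print(f"{r:>8} {mse(r):>10.5f} {mae(r):>10.5f} {smooth_l1(r):>10.5f}")
```

For a residual of 4.0 the squared error is 16 while Smooth L1 gives 3.5, so a single large deviation dominates the gradient far less. For residuals below `beta` it behaves like half the squared error.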

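Concrete steps:

The model had already been tuned with Optuna. I then made the following changes (the idea is to switch both the backpropagated loss and Optuna's objective to smooth_l1_loss, which suits this task better):

1. Replace the mse loss with smooth_l1_loss.
2. Replace l2.backward() with smooth_l1.backward().
3. Replace return test_l2 with return test_smooth_l1.

Result: the values in point_data look very close, with every value within 0.01 of the target, which suggests the approach is right. The plots also improved, and step_loss is now very low.

The lines marked with exclamation marks in the code below are the ones I changed in my project following this approach. Important!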

import optuna
import torch
import torch.nn.functional as F
import torch.optim as optim
from timeit import default_timer

# data, data_cp, ntrain, ntest, epochs, step_size, gamma, level, h, wavelet,
# in_channel, grid_range, WNO1d and LpLoss are defined elsewhere in the project.

# The two shape arguments used when flattening tensors for the loss
shape1 = -1
shape2 = data.shape[1] * 3


def objective1(trial):
    batch_size = trial.suggest_categorical('batch_size', [32])
    learning_rate = trial.suggest_float('learning_rate', 1e-6, 1e-2, log=True)
    layers = trial.suggest_categorical('layers', [2, 4, 6])
    width = trial.suggest_categorical('width', [10, 20, 30])  # newly added
    weight_decay = trial.suggest_float('weight_decay', 1e-6, 1e-2, log=True)  # newly added
    # also search over the optimizer
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'SGD', 'RMSprop'])
    # loss_function_name = trial.suggest_categorical('loss_function', ['LpLoss', 'MSELoss'])

    """ Read data """
    # data is [1991, 80, 40, 30]; data_cp is the un-normalized [2000, 80, 40, 30]
    # data_cp is offset by 10: its 11th entry pairs with the 1st entry of data
    train_a = data[ntest:-1, :, :]
    train_u = data_cp[ntest + 10:, :, :]
    test_a = data[:ntest, :, :]  # test set: the first ntest samples
    test_u = data_cp[10:ntest + 10, :, :]

    train_loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(train_a, train_u),
        batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(test_a, test_u),
        batch_size=batch_size, shuffle=False)
    # unshuffled loader over all data, used later for prediction and visualization
    data_loader_noshuffle = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(data[:, :, :], data_cp[9:, :, :]),
        batch_size=batch_size, shuffle=False)

    """ The model definition """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = WNO1d(width=width, level=level, layers=layers, size=h, wavelet=wavelet,
                  in_channel=in_channel, grid_range=grid_range).to(device)
    # print(count_params(model))

    # optimizer chosen by the trial
    if optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    elif optimizer_name == 'SGD':
        optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                              weight_decay=weight_decay, momentum=0.9)
    else:  # RMSprop
        optimizer = optim.RMSprop(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)

    train_loss = torch.zeros(epochs)
    test_loss = torch.zeros(epochs)
    myloss = LpLoss(size_average=False)

    """ Training and testing """
    for ep in range(epochs):
        model.train()
        t1 = default_timer()
        train_mse = 0
        train_l2 = 0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            out = model(x)
            mse = F.mse_loss(out.view(shape1, shape2), y.view(shape1, shape2))
            # train with Smooth L1 loss
            smooth_l1 = F.smooth_l1_loss(out.view(shape1, shape2), y.view(shape1, shape2))  # !!!
            l2 = myloss(out.view(shape1, shape2), y.view(shape1, shape2))
            # l2.backward()
            smooth_l1.backward()  # !!! backpropagate the Smooth L1 loss
            optimizer.step()
            train_mse += mse.item()
            train_l2 += l2.item()
        scheduler.step()

        model.eval()
        test_l2 = 0.0
        test_smooth_l1 = 0.0
        with torch.no_grad():
            for x, y in test_loader:
                x, y = x.to(device), y.to(device)
                out = model(x)
                test_l2 += myloss(out.view(shape1, shape2), y.view(shape1, shape2)).item()
                test_smooth_l1 += F.smooth_l1_loss(out.view(shape1, shape2), y.view(shape1, shape2)).item()  # !!!

        # mse and smooth_l1 are mean-reduced per batch, so average them over batches;
        # LpLoss(size_average=False) is assumed to sum over samples, so divide by the sample count
        train_mse /= len(train_loader)
        train_l2 /= ntrain
        test_l2 /= ntest
        test_smooth_l1 /= len(test_loader)  # !!!
        train_loss[ep] = train_l2
        test_loss[ep] = test_l2
        t2 = default_timer()
        print('Epoch-{}, Time-{:0.4f}, [step_loss:] -> Train-MSE-{:0.4f}, test_smooth_l1-{:0.4f}, Train-L2-{:0.4f}, Test-L2-{:0.4f}'
              .format(ep, t2 - t1, train_mse, test_smooth_l1, train_l2, test_l2))  # !!!

        # report the intermediate value; without it should_prune() has nothing to act on
        trial.report(test_smooth_l1, ep)
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    # printed separately so the trial summary does not get mixed into the epoch log
    print(f"Trial {trial.number} finished with value: {test_smooth_l1}")
    return test_smooth_l1  # !!! test-set Smooth L1 loss from the last epoch


""" For saving the trained model and prediction data """
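A detail worth checking in a loop like this is how accumulated losses are normalized. `F.mse_loss` and `F.smooth_l1_loss` default to `reduction='mean'`, so each call already returns a per-element average for its batch; summing those values and dividing by the number of samples shrinks the result by roughly the batch size. The plain-Python sketch below uses made-up per-sample losses to show the difference.

```python
# Hypothetical per-sample losses for 8 samples, processed in batches of 4.
per_sample = [0.2, 0.4, 0.1, 0.3, 0.6, 0.2, 0.5, 0.3]
batch_size = 4
batches = [per_sample[i:i + batch_size] for i in range(0, len(per_sample), batch_size)]

# What a mean-reduced loss function returns for each batch.
batch_means = [sum(b) / len(b) for b in batches]

true_mean = sum(per_sample) / len(per_sample)
by_batches = sum(batch_means) / len(batches)      # matches the true mean
by_samples = sum(batch_means) / len(per_sample)   # too small by a factor of batch_size

print(f"true mean={true_mean:.5f}, /num_batches={by_batches:.5f}, /num_samples={by_samples:.5f}")
```

The ranking of trials is unaffected while `batch_size` stays fixed, but the printed value is not a true mean and would not be comparable across different batch sizes. If the last batch is smaller than the others, dividing by the number of batches is itself only an approximation.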