Continuing from Model Optimization and Tuning (1).
Tuning Backpropagation
Vanishing and Exploding Gradients
Both vanishing and exploding gradients are related to the computed "delta". Ideally, the delta should shrink gradually. If the delta stays too small, descent is very slow and the weights may barely change at all; this is the vanishing gradient problem. If the delta stays very large, the learning process becomes choppy and never actually descends; this is the exploding gradient problem. The figure below illustrates vanishing and exploding gradients.
Possible solutions include (a small Keras sketch follows this list):
- Weight initialization: choose better initial weights
- Activation functions: the activation function affects gradient descent, so choose an appropriate one
- Batch normalization: this concept came up in GANs and Diffusion Models (2), and is explained further below
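The following is a minimal Keras sketch of the first two points; the layer sizes, He initialization, and ReLU choice are illustrative assumptions, not the configuration used by the base model in this series.
import tensorflow as tf

# Hypothetical two-hidden-layer network (sizes are illustrative only):
# He initialization pairs well with ReLU and helps keep gradients in a reasonable range.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu',
                          kernel_initializer='he_normal', input_shape=(7,)),
    tf.keras.layers.Dense(32, activation='relu',
                          kernel_initializer='he_normal'),
    tf.keras.layers.Dense(3, activation='softmax')
])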
Batch Normalization
Batch normalization is an important technique for handling vanishing and exploding gradients. Specifically (a minimal Keras sketch follows this list):
- The inputs are normalized before each hidden layer
- Normalization here means centering and scaling the values (Center and Scale), the same idea as a StandardScaler
- The mean and standard deviation take the hidden layer outputs into account, so the normalized inputs share the same scale. Even as the deltas are updated and the activation functions change the scale of the data, this step keeps the inputs of every hidden layer on the same scale.
- Helps reach higher accuracy in fewer epochs.
- Requires extra computation, which increases resource usage and execution time.
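Below is a minimal sketch of what batch normalization looks like in Keras, assuming a plain Dense network; the layer sizes are illustrative and the actual model in this series is built inside create_and_run_model().
import tensorflow as tf

# A minimal sketch: BatchNormalization re-centers and re-scales the values
# flowing into each hidden layer (layer sizes here are illustrative only).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(7,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(3, activation='softmax')
])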
Experiment Code
The experiment code is still based on the base model from Model Optimization and Tuning (1).
#Initialize the measures
accuracy_measures = {}
normalization_list = ['none', 'batch']

for normalization in normalization_list:
    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_data()

    model_config["NORMALIZATION"] = normalization
    model_name = "Normalization-" + normalization
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Batch Normalization")
Running the program gives the following results.
As can be seen, the model's accuracy improves with batch normalization.
Optimizers
Optimizers are key tools for speeding up gradient descent. Available optimizers include:
- SGD(Stochastic Gradient Descent)
- RMSprop
- Adam
- Adagrad
This article does not go into the mathematics behind each optimizer; interested readers can look up the relevant material.
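As a rough illustration (not the actual create_and_run_model() implementation), Keras lets you select an optimizer simply by name when compiling a model:
import tensorflow as tf

# Sketch only: Keras accepts the optimizer as a name string or an optimizer
# object; the helper in this series presumably maps model_config["OPTIMIZER"]
# to one of these in a similar way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(7,)),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='rmsprop',   # or 'sgd', 'adam', 'adagrad'
              loss='categorical_crossentropy',
              metrics=['accuracy'])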
Experiment Code
#Initialize the measures
accuracy_measures = {}
optimizer_list = ['sgd', 'rmsprop', 'adam', 'adagrad']

for optimizer in optimizer_list:
    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_data()

    model_config["OPTIMIZER"] = optimizer
    model_name = "Optimizer-" + optimizer
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Optimizers")
Learning Rate
Another hyperparameter related to the optimizer is the learning rate. The learning rate:
- Is the ratio between the weight change and the corresponding estimated error
- Works together with the optimizer: after the error is estimated, the optimizer adjusts the delta according to the learning rate
- Is typically a small fraction less than 1
Choosing a learning rate (a small configuration sketch follows this list):
- Larger values
  - Learn faster and need fewer epochs
  - Increase the risk of exploding gradients
- Smaller values
  - Learn more slowly but more stably
  - Increase the risk of vanishing gradients
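As a minimal sketch, the learning rate is a constructor argument of the optimizer; the 0.001 value below is only illustrative:
import tensorflow as tf

# Sketch only: the learning rate is passed to the optimizer object,
# which then scales how much the weights are adjusted each step.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)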
Experiment Code
#Initialize the measures
accuracy_measures = {}
learning_rate_list = [0.001, 0.005, 0.01, 0.1, 0.5]

for learning_rate in learning_rate_list:
    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_data()

    model_config["LEARNING_RATE"] = learning_rate
    model_name = "Learning_Rate-" + str(learning_rate)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Learning Rates")
Dealing with Overfitting
Overfitting means the model fits the training data very closely but has relatively low accuracy on independent data outside the training set. Ways to counter overfitting include:
- Simplify the model
- Reduce the number of layers and the number of nodes per layer
- Train with fewer epochs and smaller batch sizes
- Increase the size and diversity of the training data
- Regularization
- Dropout
Regularization
Regularization:
- Controls overfitting during model training
- Applies an adjustment to the model parameters after they are updated, preventing overfitting
- Applies a penalty as overfitting increases, pulling the model back toward simpler solutions
- Comes in several variants (see the sketch after this list)
- L1, L2, and a combination of L1 and L2 (l1_l2)
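A minimal sketch of attaching a regularizer to a single Dense layer in Keras; the penalty strength 0.01 is an illustrative default, not a value taken from the experiments below:
import tensorflow as tf

# Sketch only: the regularizer is attached per layer via kernel_regularizer;
# 'l1' and 'l1_l2' variants are available in the same module.
layer = tf.keras.layers.Dense(32, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(0.01))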
Experiment Code
#Initialize the measures
accuracy_measures = {}
regularizer_list = ['l1', 'l2', 'l1_l2']

for regularizer in regularizer_list:
    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_data()

    model_config["REGULARIZER"] = regularizer
    model_config["EPOCHS"] = 25
    model_name = "Regularizer-" + regularizer
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Regularization")
Dropout
Dropout is a very popular way to reduce overfitting. Dropout (a minimal Keras sketch follows this list):
- Randomly drops some nodes during forward propagation
- Drops nodes at random according to a given percentage
- The dropout rate should be chosen so that accuracy on the training set and the test set are similar
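Below is a minimal Keras sketch of dropout layers; the 0.2 rate and the layer sizes are illustrative only:
import tensorflow as tf

# Sketch only: each Dropout layer randomly zeroes the given fraction of its
# inputs during training, which discourages co-adaptation of nodes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(7,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation='softmax')
])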
Experiment Code
#Initialize the measures
accuracy_measures = {}
dropout_list = [0.0, 0.1, 0.2, 0.5]

for dropout in dropout_list:
    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_data()

    model_config["DROPOUT_RATE"] = dropout
    model_config["EPOCHS"] = 25
    model_name = "dropout-" + str(dropout)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Dropouts")
Model Optimization Exercise
In this exercise, the model is optimized along the following dimensions:
- Model
  - Number of layers
  - Number of nodes per layer (based on the optimized layer count)
- Backpropagation
  - Optimizer
  - Learning rate (based on the chosen optimizer)
- Overfitting
  - Regularization
  - Dropout rate (based on the chosen regularization method)
- Final model
  - Assemble all the optimized parameters
  - Compare against the default settings
Environment Setup
Using the Google Colab development environment requires the following preparation:
- Create your own working directory in Google Drive for Colab: Colab Notebooks/DeepLearning/tuning
- Upload the data file root_cause_analysis.csv to this directory
- Package the common-function code from Model Optimization and Tuning (1) into a separate file, CommonFunctions.ipynb, so it can be reused
Because local files on Google Drive are used, mount your Google Drive first:
# mount my drive in google colab
from google.colab import drive
drive.mount('/content/drive')

# change to my working directory, all sources are in this folder
%cd /content/drive/My Drive/Colab Notebooks/DeepLearning/tuning
Also, to reuse the common functions, run the following:
%run CommonFunctions.ipynb
Acquiring and Preparing the Data
This step is wrapped in a function, get_rca_data(), for later reuse.
The program one-hot encodes the categories, a step covered in many of my earlier posts. Specifically, laber_encoder.fit_transform converts the string categories into integer labels such as 0, 1, 2; to_categorical() then converts these integer labels into vectors containing only 0s and 1s, e.g. 1 becomes [0, 1, 0] and 2 becomes [0, 0, 1].
import pandas as pd
import os
import tensorflow as tf

def get_rca_data():
    #Load the data file into a Pandas Dataframe
    symptom_data = pd.read_csv("root_cause_analysis.csv")

    #Explore the data loaded
    print(symptom_data.dtypes)
    symptom_data.head()

    from sklearn import preprocessing
    from sklearn.model_selection import train_test_split

    laber_encoder = preprocessing.LabelEncoder()
    symptom_data['ROOT_CAUSE'] = laber_encoder.fit_transform(symptom_data['ROOT_CAUSE'])
    print(symptom_data['ROOT_CAUSE'][:5])

    #Convert Pandas Dataframe into a numpy vector
    np_symptom = symptom_data.to_numpy().astype(float)

    #Extract the features (X), from 2nd column ~ 8th column (column B~H)
    X_data = np_symptom[:,1:8]

    #Extract the targets (Y), one-hot encode the 9th column (column I)
    Y_data = np_symptom[:,8]
    Y_data = tf.keras.utils.to_categorical(Y_data, 3)

    return X_data, Y_data
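As a quick standalone check of the encoding described above (this snippet is illustrative and not part of get_rca_data()):
import tensorflow as tf

# Integer labels 0, 1, 2 become one-hot vectors of length 3.
print(tf.keras.utils.to_categorical([0, 1, 2], 3))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]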
Tuning the Network Parameters
First optimize the number of layers, largely following the program from Model Optimization and Tuning (1):
#Initialize the measures
accuracy_measures = {}
layer_list = []
for layer_count in range(1, 6):
    #32 nodes in each layer
    layer_list.append(32)

    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_rca_data()

    #"HIDDEN_NODES" includes all nodes in layers from input layer to the last hidden layer
    model_config["HIDDEN_NODES"] = layer_list
    model_name = "Layer-" + str(layer_count)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Layers")
The results are as follows.
Two layers give good performance, so the layer count is set to 2.
#2 layers seem to provide the highest accuracy level at lower epoch counts
LAYERS = 2
Next, fix the chosen number of layers and optimize the number of nodes.
Again following the program from Model Optimization and Tuning (1):
#Initialize the measures
accuracy_measures = {}

for node_count in range(8, 40, 8):
    #have a fixed number of 2 hidden layers
    layer_list = []
    for layer_count in range(LAYERS):
        layer_list.append(node_count)

    #Load default configuration
    model_config = base_model_config()
    #Acquire and process input data
    X,Y = get_rca_data()

    #"HIDDEN_NODES" includes all nodes in layers from input layer to the last hidden layer
    model_config["HIDDEN_NODES"] = layer_list
    model_name = "Nodes-" + str(node_count)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Nodes")
32 nodes give good performance:
#32 nodes seem to be best
NODES = 32
Tuning Backpropagation
Tuning the Optimizer
#Initialize the measures
accuracy_measures = {}
optimizer_list = ['sgd', 'rmsprop', 'adam', 'adagrad']

for optimizer in optimizer_list:
    #Load default configuration
    model_config = base_model_config()

    #apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)

    #Acquire and process input data
    X,Y = get_rca_data()

    model_config["OPTIMIZER"] = optimizer
    model_name = "Optimizer-" + optimizer
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Optimizers")
The results are as follows.
'rmsprop' should be chosen:
#rmsprop seems to be best
OPTIMIZER = 'rmsprop'
Tuning the Learning Rate
#Initialize the measures
accuracy_measures = {}
learning_rate_list = [0.001, 0.005, 0.01, 0.1, 0.5]

for learning_rate in learning_rate_list:
    #Load default configuration
    model_config = base_model_config()

    #apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    model_config["OPTIMIZER"] = OPTIMIZER

    #Acquire and process input data
    X,Y = get_rca_data()

    model_config["LEARNING_RATE"] = learning_rate
    model_name = "Learning_Rate-" + str(learning_rate)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Learning Rates")
The results are as follows.
Across multiple runs the results are not very stable, because the dataset is small. In the end 0.001 was chosen.
#All seems to be OK, choose 0.001
LEARNING_RATE = 0.001
Avoiding Overfitting
Tuning Regularization
#Initialize the measures
accuracy_measures = {}
regularizer_list = [None, 'l1', 'l2', 'l1_l2']

for regularizer in regularizer_list:
    #Load default configuration
    model_config = base_model_config()

    #apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    model_config["OPTIMIZER"] = OPTIMIZER
    model_config["LEARNING_RATE"] = LEARNING_RATE

    #Acquire and process input data
    X,Y = get_rca_data()

    model_config["REGULARIZER"] = regularizer
    model_config["EPOCHS"] = 25
    model_name = "Regularizer-" + str(regularizer)
    history = create_and_run_model(model_config, X, Y, model_name)
    # since we care about overfitting, use validation accuracy as the metric
    accuracy_measures[model_name] = history.history["val_accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Regularization")
The results are as follows.
None and 'l2' perform about the same; after several runs, None was chosen.
# None & l2 have similar performance; after several runs, choose None
REGULARIZER = None
Tuning the Dropout Rate
#Initialize the measures
accuracy_measures = {}
dropout_list = [0.0, 0.1, 0.2, 0.5]

for dropout in dropout_list:
    #Load default configuration
    model_config = base_model_config()

    #apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    model_config["OPTIMIZER"] = OPTIMIZER
    model_config["LEARNING_RATE"] = LEARNING_RATE
    model_config["REGULARIZER"] = REGULARIZER

    #Acquire and process input data
    X,Y = get_rca_data()

    model_config["DROPOUT_RATE"] = dropout
    model_name = "dropout-" + str(dropout)
    history = create_and_run_model(model_config, X, Y, model_name)
    # since we care about overfitting, use validation accuracy as the metric
    accuracy_measures[model_name] = history.history["val_accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Dropouts")
These results are also not very stable; after multiple runs, 0.1 was chosen.
# 0.1 is the best
DROPOUT = 0.1
Building the Final Model
Build the model with both the default configuration and the optimized configuration, and compare the two:
#Initialize the measures
accuracy_measures = {}

#Base model with default configurations
model_config = base_model_config()
model_config["HIDDEN_NODES"] = [16]
model_config["NORMALIZATION"] = None
model_config["OPTIMIZER"] = 'rmsprop'
model_config["LEARNING_RATE"] = 0.001
model_config["REGULARIZER"] = None
model_config["DROPOUT_RATE"] = 0.0

#Acquire and process input data
X,Y = get_rca_data()

model_name = "Base-Model"
history = create_and_run_model(model_config, X, Y, model_name)
accuracy_measures[model_name] = history.history["accuracy"]

#Optimized model
#apply the chosen config
model_config["HIDDEN_NODES"] = []
for i in range(LAYERS):
    model_config["HIDDEN_NODES"].append(NODES)
model_config["NORMALIZATION"] = 'batch'
model_config["OPTIMIZER"] = OPTIMIZER
model_config["LEARNING_RATE"] = LEARNING_RATE
model_config["REGULARIZER"] = REGULARIZER
model_config["DROPOUT_RATE"] = DROPOUT

#Acquire and process input data
X,Y = get_rca_data()

model_name = "Optimized-Model"
history = create_and_run_model(model_config, X, Y, model_name)
accuracy_measures[model_name] = history.history["accuracy"]

#Plot
plot_graph(accuracy_measures, "Compare Base and Optimized Model")
These results are also not entirely stable, but over multiple runs the optimized model generally performs better.