Table of contents
- 1. Baseline
- 2. Improvements
- 2.1 Training longer
- 2.2 Changing the network architecture
Digit Recognizer practice page
Related posts:
[Hands On ML] 3. Classification (MNIST handwritten digit prediction)
[Kaggle] Digit Recognizer handwritten digit recognition
1. Baseline
- Import packages
import tensorflow as tf
from tensorflow import keras
# import keras
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

train = pd.read_csv('train.csv')
y_train_full = train['label']
X_train_full = train.drop(['label'], axis=1)
X_test = pd.read_csv('test.csv')
- Data dimensions
X_train_full.shape
(42000, 784)
There are 42000 training samples; each one is a 28*28 image
flattened into 784 pixel values.
- Normalize the pixels and split into training and validation sets
X_valid, X_train = X_train_full[:8000] / 255.0, X_train_full[8000:] / 255.0
y_valid, y_train = y_train_full[:8000], y_train_full[8000:]
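The scaling and slicing above can be sketched on synthetic data so the resulting shapes and value range are easy to check. This is only an illustration: `scale_and_split` and the random arrays are hypothetical stand-ins for the real `train.csv` contents.

```python
import numpy as np

def scale_and_split(pixels, labels, n_valid):
    """Scale 0-255 pixels to [0, 1] and hold out the first n_valid rows for validation."""
    scaled = np.asarray(pixels, dtype=np.float32) / 255.0
    labels = np.asarray(labels)
    X_valid, X_train = scaled[:n_valid], scaled[n_valid:]
    y_valid, y_train = labels[:n_valid], labels[n_valid:]
    return X_train, y_train, X_valid, y_valid

# Synthetic stand-in for train.csv (hypothetical data: 50 rows of 784 pixels).
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(50, 784))
fake_labels = rng.integers(0, 10, size=50)

X_tr, y_tr, X_va, y_va = scale_and_split(fake_pixels, fake_labels, n_valid=10)
print(X_tr.shape, X_va.shape)
print(float(X_tr.min()), float(X_tr.max()))
```

Whatever scaling is applied here should also be applied to any data later passed to `predict`, otherwise the model sees inputs on a different scale than it was trained on.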
- Preview the data
from PIL import Image
img = Image.fromarray(np.uint8(np.array(X_train_full)[0].reshape(28,28)))
img.show()
print(np.uint8(np.array(X_train_full)[0].reshape(28,28)))
Pixel matrix of the digit 1:
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 188 255 94 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 191 250 253 93 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 123 248 253 167 10 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 247 253 208 13 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29 207 253 235 77 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54 209 253 253 88 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 93 254 253 238 170 17 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 23 210 254 253 159 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 16 209 253 254 240 81 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 27 253 253 254 13 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 20 206 254 254 198 7 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 168 253 253 196 7 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 20 203 253 248 76 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 22 188 253 245 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 103 253 253 191 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 89 240 253 195 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 15 220 253 253 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 94 253 253 253 94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 89 251 253 250 131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 214 218 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
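When no image viewer is available, a pixel matrix like the one above can also be previewed as text. The helper below is a small sketch: `to_ascii`, its `threshold` default, and the 5x5 toy image are hypothetical and only illustrate the idea on a tiny array.

```python
import numpy as np

def to_ascii(image, threshold=128):
    """Render a 2-D uint8 image as text: '#' for bright pixels, '.' for dark ones."""
    image = np.asarray(image)
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in row)
        for row in image
    )

# Hypothetical 5x5 toy image with a diagonal stroke, standing in for a 28x28 digit.
toy = np.zeros((5, 5), dtype=np.uint8)
for i in range(5):
    toy[i, 4 - i] = 255

art = to_ascii(toy)
print(art)
```

For a real sample, the same function could be called on a row of the training frame reshaped to 28x28.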
- Build the model
model = keras.models.Sequential()
# model.add(keras.layers.Flatten(input_shape=[784]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))
Or, equivalently:
model = keras.models.Sequential([
# keras.layers.Flatten(input_shape=[784]),
keras.layers.Dense(300, activation="relu"),
keras.layers.Dense(100, activation="relu"),
keras.layers.Dense(10, activation="softmax")
])
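To make it concrete what this stack of Dense layers computes, here is a NumPy sketch of the forward pass for a 784-300-100-10 network with ReLU hidden layers and a softmax output. The weights are random and the helper names (`relu`, `softmax`, `forward`) are hypothetical; it mirrors the layer sizes above but is not Keras's implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, params):
    """params is a list of (W, b); ReLU on hidden layers, softmax on the last layer."""
    a = x
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    W, b = params[-1]
    return softmax(a @ W + b)

rng = np.random.default_rng(0)
sizes = [784, 300, 100, 10]
# Random weights for illustration only (a trained model would have learned values).
params = [
    (rng.normal(0.0, 0.05, size=(m, n)), np.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]

batch = rng.random((4, 784))   # four fake, already-normalized images
probs = forward(batch, params)
print(probs.shape)             # one row of 10 class probabilities per image
print(probs.sum(axis=1))       # each row sums to 1
```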
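- Define the optimizer and compile the model
# note: `decay` is a legacy argument; newer Keras optimizers may reject it
opt = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])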
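It helps to see how quickly `decay=0.01` can shrink the learning rate. The sketch below assumes the inverse-time rule used by older Keras optimizers, `lr_t = lr / (1 + decay * step)`; that rule, the helper name `decayed_learning_rate`, and the use of 1063 steps per epoch (read off the training log in this post) are assumptions, so treat the numbers as a rough guide only.

```python
def decayed_learning_rate(initial_lr, decay, step):
    """Inverse-time decay (assumed rule): lr / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)

steps_per_epoch = 1063  # taken from the progress bar in the training log
for epoch in (0, 1, 10, 30):
    step = epoch * steps_per_epoch
    print(epoch, decayed_learning_rate(0.001, 0.01, step))
```

If the assumption holds, the learning rate drops by more than a factor of 10 after the first epoch, which may be one reason why training accuracy improves slowly in the runs below.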
- Train
history = model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))
...
Epoch 30/30
1063/1063 [==============================] - 2s 2ms/step -
loss: 0.0927 - accuracy: 0.9748 -
val_loss: 0.1295 - val_accuracy: 0.9643
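The `1063/1063` in the progress bar is the number of mini-batches per epoch. A quick check, assuming the Keras default `batch_size=32` and the 34000 training rows left after holding out 8000 for validation (`steps_per_epoch` is a hypothetical helper name):

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

n_train = 42000 - 8000  # rows left for training after the validation split
print(steps_per_epoch(n_train, 32))   # 1063
print(steps_per_epoch(n_train, 128))  # 266
```

The second value matches the `266/266` shown in the later runs of this post that use `batch_size=128`.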
- Model parameters
model.summary()
Output:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_15 (Dense) (None, 300) 235500
_________________________________________________________________
dense_16 (Dense) (None, 100) 30100
_________________________________________________________________
dense_17 (Dense) (None, 10) 1010
=================================================================
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
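The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. A small sketch (the helper names `dense_params` and `mlp_params` are hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of one Dense layer."""
    return n_inputs * n_units + n_units

def mlp_params(layer_sizes):
    """Per-layer parameter counts for a fully connected network."""
    return [dense_params(m, n) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

counts = mlp_params([784, 300, 100, 10])
print(counts, sum(counts))  # [235500, 30100, 1010] 266610
```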
- Plot the model architecture
from tensorflow.keras.utils import plot_model
plot_model(model, './model.png', show_shapes=True)
- Plot the training curves
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1) # set the vertical range to [0-1]
plt.show()
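Besides plotting, the same `history.history` dictionary (metric name mapped to a list with one value per epoch) can be scanned for the epoch with the best validation metric. A minimal sketch; `best_epoch` is a hypothetical helper and the numbers in `fake_history` are made up, not taken from the runs in this post.

```python
def best_epoch(history_dict, metric="val_loss", mode="min"):
    """Return (1-based epoch, value) where the metric is lowest (or highest for mode='max')."""
    values = history_dict[metric]
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Hypothetical history with four epochs.
fake_history = {
    "loss": [0.40, 0.20, 0.12, 0.08],
    "accuracy": [0.88, 0.94, 0.96, 0.97],
    "val_loss": [0.30, 0.18, 0.15, 0.17],
    "val_accuracy": [0.91, 0.95, 0.96, 0.955],
}

print(best_epoch(fake_history))                        # (3, 0.15)
print(best_epoch(fake_history, "val_accuracy", "max")) # (3, 0.96)
```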
- Predict on the test set
y_pred = model.predict(X_test / 255.0)  # scale test pixels the same way as the training data
pred = y_pred.argmax(axis=1).reshape(-1)
print(pred.shape)

image_id = pd.Series(range(1, len(pred) + 1))
output = pd.DataFrame({'ImageId': image_id, 'Label': pred})
output.to_csv("submission_svc.csv", index=False)
Score: 0.95989
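The prediction step turns a matrix of class probabilities into one label per image and pairs it with a 1-based `ImageId`. The sketch below shows that logic on made-up probabilities and writes the CSV to an in-memory buffer; `to_submission_rows` and `fake_probs` are hypothetical names used only for illustration.

```python
import csv
import io

import numpy as np

def to_submission_rows(probabilities):
    """Pick the most probable class per row and pair it with a 1-based image id."""
    labels = np.asarray(probabilities).argmax(axis=1)
    return [(i + 1, int(label)) for i, label in enumerate(labels)]

# Hypothetical model output: 3 images, 10 class scores each.
fake_probs = np.zeros((3, 10))
fake_probs[0, 2] = 0.9
fake_probs[1, 0] = 0.8
fake_probs[2, 9] = 0.7

rows = to_submission_rows(fake_probs)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ImageId", "Label"])
writer.writerows(rows)
print(buffer.getvalue())
```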
2. Improvements
Based on the accuracy above:
...
Epoch 30/30
1063/1063 [==============================] - 2s 2ms/step -
loss: 0.0927 - accuracy: 0.9748 -
val_loss: 0.1295 - val_accuracy: 0.9643
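Human accuracy on this task is close to 100%, while our training accuracy is 97.48% and our validation accuracy is 96.43%. The gap between human level and training accuracy (about 2.5 points) is larger than the gap between training and validation accuracy (about 1.05 points), so the model suffers mainly from high bias.
Reference, hyperparameter tuning, regularization and optimization: https://michael.blog.csdn.net/article/details/108372707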
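The bias/variance reading of those numbers can be written down as a tiny helper. This is a sketch of the usual rule of thumb, assuming human-level accuracy of 100% as the reference; `diagnose` is a hypothetical name.

```python
def diagnose(train_acc, valid_acc, reference_acc=100.0):
    """Compare avoidable bias (reference - train) with variance (train - validation)."""
    avoidable_bias = reference_acc - train_acc
    variance = train_acc - valid_acc
    dominant = "bias" if avoidable_bias > variance else "variance"
    return round(avoidable_bias, 2), round(variance, 2), dominant

print(diagnose(97.48, 96.43))   # (2.52, 1.05, 'bias')
print(diagnose(100.0, 97.66))   # (0.0, 2.34, 'variance')
```

The second call uses the accuracies of the 500-unit, 250-epoch run reported later in this post, where the picture flips from bias to variance.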
What can we do?
2.1 Training longer
Change the number of training epochs to epochs=100
...
Epoch 100/100
1063/1063 [==============================] - 2s 2ms/step -
loss: 0.0751 - accuracy: 0.9798 -
val_loss: 0.1194 - val_accuracy: 0.9661
Score: 0.96296, which is 0.307 percentage points better than the baseline
2.2 Changing the network architecture
- Add a hidden layer
model = keras.models.Sequential()
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu")) # added layer
model.add(keras.layers.Dense(10, activation="softmax"))
Epoch 100/100
1063/1063 [==============================] - 2s 2ms/step -
loss: 0.0585 - accuracy: 0.9847 -
val_loss: 0.1114 - val_accuracy: 0.9672
Score: 0.96546, which is 0.25 percentage points better than the previous run
- Add another hidden layer
model = keras.models.Sequential()
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu")) # added layer
model.add(keras.layers.Dense(50, activation="relu")) # added layer
model.add(keras.layers.Dense(10, activation="softmax"))
Epoch 100/100
1063/1063 [==============================] - 2s 2ms/step -
loss: 0.0544 - accuracy: 0.9860 -
val_loss: 0.1039 - val_accuracy: 0.9700
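Score: 0.96578, which is 0.032 percentage points better than the previous run
- Increase the number of hidden units, use batch_size = 128, and train for 250 epochs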
DROP_OUT = 0.3
model = keras.models.Sequential()
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))
history = model.fit(X_train, y_train, epochs=250, batch_size=128,validation_data=(X_valid, y_valid))
Epoch 250/250
266/266 [==============================] - 3s 10ms/step -
loss: 9.7622e-08 - accuracy: 1.0000 -
val_loss: 0.2358 - val_accuracy: 0.9766
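Score: 0.97442, which is 0.864 percentage points better than the previous run
- Use dropout, a regularization method that randomly deactivates some neurons during training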
DROP_OUT = 0.3
model = keras.models.Sequential()
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dropout(DROP_OUT)) # dropout regularization
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dropout(DROP_OUT))
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dropout(DROP_OUT))
model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dropout(DROP_OUT))
model.add(keras.layers.Dense(10, activation="softmax"))
history = model.fit(X_train, y_train, epochs=250, batch_size=128,validation_data=(X_valid, y_valid))
Epoch 250/250
266/266 [==============================] - 4s 16ms/step -
loss: 0.0171 - accuracy: 0.9940 -
val_loss: 0.0928 - val_accuracy: 0.9779
Score: 0.97546, which is 0.104 percentage points better than the previous run
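What a dropout layer does during training can be sketched in NumPy: each activation is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected value stays the same (often called inverted dropout). The function below is an illustration under that description, not the Keras source; `dropout` and its arguments are hypothetical names.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations  # at inference time the layer passes inputs through unchanged
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((1000, 500))
dropped = dropout(a, rate=0.3, rng=rng)

print((dropped == 0).mean())  # fraction of zeroed units, close to 0.3
print(dropped.mean())         # mean activation, close to 1.0
```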
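- Summary of the experiments (gains are in percentage points over the previous row):
Model / accuracy (%) | Training set | Validation set | Test set |
---|---|---|---|
Simple model | 97.48 | 96.43 | 95.989 |
More training epochs | 97.98 | 96.61 | 96.296 (+0.307) |
One more hidden layer | 98.47 | 96.72 | 96.546 (+0.25) |
Another hidden layer | 98.60 | 97.00 | 96.578 (+0.032) |
More hidden units, batch_size = 128, 250 epochs | 100 | 97.66 | 97.442 (+0.864) |
Dropout (regularization) | 99.40 | 97.79 | 97.546 (+0.104) |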
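The gains in the last column are differences between consecutive test scores, expressed in percentage points. A short sketch that recomputes them from the scores reported in this post (the English labels and the helper `score_gains` are hypothetical):

```python
def score_gains(scores):
    """Gain of each run over the previous one, in percentage points."""
    gains = []
    for (_, previous), (name, current) in zip(scores, scores[1:]):
        gains.append((name, round((current - previous) * 100, 3)))
    return gains

# Test scores reported in this post, in the order the experiments were run.
scores = [
    ("baseline", 0.95989),
    ("more epochs", 0.96296),
    ("one more hidden layer", 0.96546),
    ("another hidden layer", 0.96578),
    ("more units, batch 128, 250 epochs", 0.97442),
    ("dropout", 0.97546),
]

for name, gain in score_gains(scores):
    print(f"{name}: +{gain}")
```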
The best score so far would place 1597th on the Kaggle leaderboard.
My CSDN blog: https://michael.blog.csdn.net/