I got into deep learning through Keras. The first demo I worked with was a Keras implementation of YOLOv3; the code was easy to follow (well, not that easy, it actually took me quite a while to understand it the first time).
Then I built a license plate recognition project, using tiny-yolo to detect the plate location. I had 40k training images at the time, and training took a whole day. I assumed that was just how long it took, and since I knew nothing about GPU utilization during training, I didn't pay attention. But as the project's image set grew, training time kept growing until I couldn't stand it anymore, and an article I read finally made me notice the GPU utilization problem. That's when I thought of training with TensorFlow's native APIs, e.g. tf.data.Dataset, to keep the GPU fed (a minimal sketch below).
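(For reference, a minimal tf.data input pipeline looks roughly like this. It's my own illustration, not the project's code; the file pattern and record fields are made up.)

```python
import tensorflow as tf  # TF 1.x, same as the rest of this post

def parse_example(serialized):
    # Hypothetical parser for illustration -- the feature names are made up.
    feats = tf.parse_single_example(serialized, {
        "image": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64),
    })
    image = tf.image.decode_jpeg(feats["image"], channels=3)
    image = tf.image.resize_images(image, [416, 416])  # fixed size so batching works
    return image, feats["label"]

dataset = (tf.data.TFRecordDataset(tf.gfile.Glob("./train*.tfrecords"))
           .map(parse_example, num_parallel_calls=4)  # decode on CPU threads
           .shuffle(1000)
           .batch(8)
           .prefetch(1))  # prepare the next batch while the GPU computes
images, labels = dataset.make_one_shot_iterator().get_next()
```

The point is the prefetch/parallel map: the CPU prepares the next batch while the GPU is busy, instead of the GPU sitting idle waiting for Python to feed it.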
That's how I found this project, a native TensorFlow implementation of YOLO. While training it I noticed it had no learning rate decay, and after a while the total loss stopped going down, so I added one. I figured I'd write it up, my first post as a beginner, haha, so please don't flame me for how simple the content is.
YunYang1994/tensorflow-yolov3: https://github.com/YunYang1994/tensorflow-yolov3 (it looks like he has since changed train.py)
It originally looked like this:
```python
import tensorflow as tf
from core import utils, yolov3
from core.dataset import dataset, Parser

sess = tf.Session()

IMAGE_H, IMAGE_W = 416, 416
BATCH_SIZE = 8
EPOCHS = 2000*1000
LR = 0.0001
SHUFFLE_SIZE = 1000
CLASSES = utils.read_coco_names('./data/voc.names')
ANCHORS = utils.get_anchors('./data/voc_anchors.txt')
NUM_CLASSES = len(CLASSES)

train_tfrecord = "../VOC/train/voc_train*.tfrecords"
test_tfrecord = "../VOC/test/voc_test*.tfrecords"

parser = Parser(IMAGE_H, IMAGE_W, ANCHORS, NUM_CLASSES)
trainset = dataset(parser, train_tfrecord, BATCH_SIZE, shuffle=SHUFFLE_SIZE)
testset = dataset(parser, test_tfrecord, BATCH_SIZE, shuffle=None)

# one placeholder switches the input pipeline between train and test sets
is_training = tf.placeholder(tf.bool)
example = tf.cond(is_training, lambda: trainset.get_next(), lambda: testset.get_next())
images, *y_true = example

model = yolov3.yolov3(NUM_CLASSES, ANCHORS)
with tf.variable_scope('yolov3'):
    y_pred = model.forward(images, is_training=is_training)
    loss = model.compute_loss(y_pred, y_true)

optimizer = tf.train.AdamOptimizer(LR)
saver = tf.train.Saver(max_to_keep=2)

tf.summary.scalar("loss/coord_loss", loss[1])
tf.summary.scalar("loss/sizes_loss", loss[2])
tf.summary.scalar("loss/confs_loss", loss[3])
tf.summary.scalar("loss/class_loss", loss[4])
write_op = tf.summary.merge_all()
writer_train = tf.summary.FileWriter("./data/train")
writer_test = tf.summary.FileWriter("./data/test")

update_var = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="yolov3/yolo-v3")
with tf.control_dependencies(update_var):
    train_op = optimizer.minimize(loss[0], var_list=update_var)  # only update the yolo layers

sess.run(tf.global_variables_initializer())
pretrained_weights = tf.global_variables(scope="yolov3/darknet-53")
load_op = utils.load_weights(var_list=pretrained_weights, weights_file="./darknet53.conv.74")
sess.run(load_op)

for epoch in range(EPOCHS):
    run_items = sess.run([train_op, write_op] + loss, feed_dict={is_training: True})
    writer_train.add_summary(run_items[1], global_step=epoch)
    writer_train.flush()  # flush the event file to disk

    if (epoch+1) % 1000 == 0:
        saver.save(sess, save_path="./checkpoint/yolov3.ckpt", global_step=epoch)

    run_items = sess.run([write_op] + loss, feed_dict={is_training: False})
    writer_test.add_summary(run_items[0], global_step=epoch)
    writer_test.flush()  # flush the event file to disk

    print("EPOCH:%7d \tloss_xy:%7.4f \tloss_wh:%7.4f \tloss_conf:%7.4f \tloss_class:%7.4f"
          % (epoch, run_items[2], run_items[3], run_items[4], run_items[5]))
```
Since there was no learning rate decay, I looked up how to implement one. The change is as follows:
```python
optimizer = tf.train.AdamOptimizer(LR)
```

becomes
```python
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LR, global_step, 100, 0.93, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)
# ...and pass global_step to minimize() so the step counter actually advances:
train_op = optimizer.minimize(loss[0], var_list=update_var, global_step=global_step)
```
Here learning_rate is the decayed learning rate tensor and LR is the initial learning rate. The 100 is decay_steps, meaning the rate is multiplied by the decay rate every 100 steps, and 0.93 is that decay rate (staircase=True makes the decay happen in discrete jumps rather than continuously). Passing global_step=global_step to minimize() is mandatory; without it the step counter never increments and the learning rate stays stuck at its initial value.
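To make the numbers concrete (my own sanity check, not from the repo): with staircase=True the decayed rate is LR * 0.93 ** (global_step // 100), so for example:

```python
LR = 0.0001
for step in [0, 100, 1000, 10000]:
    print(step, LR * 0.93 ** (step // 100))
# 0     -> 1e-04
# 100   -> 9.3e-05
# 1000  -> ~4.84e-05  (0.93 ** 10)
# 10000 -> ~7.0e-08   (0.93 ** 100)
```

Note how fast the rate shrinks when you decay every 100 steps; the decay_steps/decay_rate pair is worth tuning for your own training run.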
Finally, log it with a scalar summary so you can watch it:
```python
tf.summary.scalar('learning_rate', learning_rate)
```
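One placement detail worth flagging (my own note, not from the original post): tf.summary.merge_all() only merges the summary ops that exist when it is called, so this line has to go before the write_op = tf.summary.merge_all() line in the script above, or the learning-rate curve will never show up in TensorBoard:

```python
tf.summary.scalar('learning_rate', learning_rate)  # create the summary first...
write_op = tf.summary.merge_all()                  # ...then merge, so it gets included
```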
And then you can happily get on with training.