Video Action Recognition with C3D in TensorFlow

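This article is a simple implementation of C3D, a classic network for video action recognition, and can serve as an introduction to the field. The paper is "Learning Spatiotemporal Features with 3D Convolutional Networks" (ICCV 2015).

Framework: TensorFlow (1.6) + Python (2.7) + slim

Dataset: UCF101, from the Center for Research in Computer Vision at the University of Central Florida

Code: 2012013382/C3D-Tensorflow-slim

The basic concepts of 3D convolution are well covered online, so I won't repeat them. Here I mainly describe how the input frames (images) change shape as they pass through the network.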

The basic C3D network structure is shown in Figure 1:

Figure 1. C3D network structure

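Details:

1) The input clip (video segment) has shape [batch_size, frame_length, crop_size, crop_size, channel_num]. frame_length is 16, meaning each sample consists of 16 frames; crop_size is 112 and channel_num is 3, so every frame has size [112, 112, 3].

2) Every convolution kernel has size [3, 3, 3]: the first dimension is temporal and the last two are the kernel size on the frame (image). All strides are [1, 1, 1], with padding='SAME'.

3) All pooling layers are 3D max pooling. Only the first pooling layer uses size and stride [1, 2, 2]; the rest use [2, 2, 2]. The dimensions have the same meaning as in 2), with padding='SAME'. The authors say the first layer uses 1 in the temporal dimension to avoid shrinking the temporal dimension to 1 too early.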
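To make the pooling rule in 3) concrete, here is a minimal NumPy sketch. It is not the article's code: the function name max_pool3d_same is made up for illustration, it handles a single-channel [frames, height, width] volume, and it assumes the stride equals the kernel size (true for every pooling layer here). It imitates padding='SAME' by padding with -inf so that padded cells never win the max.

```python
import numpy as np


def max_pool3d_same(x, kernel):
    """3D max pooling with stride == kernel and SAME-style padding.

    x has shape [frames, height, width]; kernel is (kt, kh, kw).
    Hypothetical helper for illustration only.
    """
    pads = []
    for n, k in zip(x.shape, kernel):
        total = (-n) % k  # cells needed to reach a multiple of k
        pads.append((total // 2, total - total // 2))
    # Pad with -inf so padding can never be selected as a maximum.
    x = np.pad(x.astype(np.float32), pads, mode="constant",
               constant_values=-np.inf)
    t, h, w = x.shape
    kt, kh, kw = kernel
    # Split every axis into (number of windows, window size), then reduce.
    x = x.reshape(t // kt, kt, h // kh, kh, w // kw, kw)
    return x.max(axis=(1, 3, 5))


clip = np.zeros((16, 112, 112))
print(max_pool3d_same(clip, (1, 2, 2)).shape)  # (16, 56, 56)
print(max_pool3d_same(np.zeros((2, 7, 7)), (2, 2, 2)).shape)  # (1, 4, 4)
```

With kernel (1, 2, 2) the 16 frames survive the first pooling layer untouched, while (2, 2, 2) halves the temporal dimension as well; the odd size 7 rounds up to 4 because of the padding.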

The shape of an input clip changes through the network as follows.

Assume batch_size is 10.

Input shape: [10, 16, 112, 112, 3]

After conv1: [10, 16, 112, 112, 64]

After pool1: [10, 16, 56, 56, 64]

After conv2a: [10, 16, 56, 56, 128]

After pool2: [10, 8, 28, 28, 128]

After conv3a: [10, 8, 28, 28, 256]

After conv3b: [10, 8, 28, 28, 256]

After pool3: [10, 4, 14, 14, 256]

After conv4a: [10, 4, 14, 14, 512]

After conv4b: [10, 4, 14, 14, 512]

After pool4: [10, 2, 7, 7, 512]

After conv5a: [10, 2, 7, 7, 512]

After conv5b: [10, 2, 7, 7, 512]

After pool5: [10, 1, 4, 4, 512]

After fc6: [10, 4096]

After fc7: [10, 4096]

out: [10, num_classes] (num_classes is 101 for UCF101)
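The shape list above can be reproduced with a few lines of plain Python. This is only a sketch of the arithmetic: the LAYERS table and the helper names are my own, and it relies on the fact that with padding='SAME' each output length is ceil(input / stride), so the stride-1 convolutions change only the channel count.

```python
def same_out(size, stride):
    # With padding='SAME' the output length is ceil(size / stride).
    return -(-size // stride)


# (name, kind, arg): conv -> output channels, pool -> (t, h, w) stride.
# Hypothetical table mirroring the layer list in this article.
LAYERS = [
    ("conv1", "conv", 64), ("pool1", "pool", (1, 2, 2)),
    ("conv2a", "conv", 128), ("pool2", "pool", (2, 2, 2)),
    ("conv3a", "conv", 256), ("conv3b", "conv", 256),
    ("pool3", "pool", (2, 2, 2)),
    ("conv4a", "conv", 512), ("conv4b", "conv", 512),
    ("pool4", "pool", (2, 2, 2)),
    ("conv5a", "conv", 512), ("conv5b", "conv", 512),
    ("pool5", "pool", (2, 2, 2)),
]


def trace_shapes(batch_size=10, frames=16, crop=112, channels=3):
    """Return [(layer_name, shape), ...] for the convolutional part."""
    shape = [batch_size, frames, crop, crop, channels]
    shapes = [("input", list(shape))]
    for name, kind, arg in LAYERS:
        if kind == "conv":
            # Stride 1 + SAME keeps t/h/w; only the channel count changes.
            shape = shape[:4] + [arg]
        else:
            pooled = [same_out(n, s) for n, s in zip(shape[1:4], arg)]
            shape = [shape[0]] + pooled + [shape[4]]
        shapes.append((name, list(shape)))
    return shapes


for name, shape in trace_shapes():
    print("%-7s %s" % (name, shape))
```

The last entry, pool5, comes out as [10, 1, 4, 4, 512], which is why the feature map is flattened to 512 * 4 * 4 = 8192 values before fc6.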

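Data preprocessing

Data preprocessing is relatively involved when working with video. Video datasets are usually large, so we typically convert them to images first and then read one batch at a time from disk. After downloading UCF101, extract it into the project root. Create a file convert_video_to_images.sh with the following content: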

#!/bin/bash
for folder in "$1"/*
do
    for file in "$folder"/*.avi
    do
        if [[ ! -d "${file%.avi}" ]]; then
            mkdir -p "${file%.avi}"
        fi
        ffmpeg -i "$file" -vf fps="$2" "${file%.avi}"/%05d.jpg
        # The source video is deleted once its frames are extracted.
        rm "$file"
    done
done

Run

sudo ./convert_video_to_images.sh UCF101/ 5

which extracts 5 frames per second from each video.
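For readers who prefer Python to shell, the same per-video step can be sketched as below. This is an assumption-laden illustration, not part of the original project: the function names are hypothetical, only the ffmpeg options already used in the script (-i and -vf fps=N) appear, and unlike the shell script it does not delete the source video.

```python
import os
import subprocess


def build_ffmpeg_command(video_path, fps):
    """Return (output_dir, argv) for extracting frames from one .avi file.

    The frame directory is the video path without its extension,
    mirroring "${file%.avi}" in the shell script.
    """
    output_dir, ext = os.path.splitext(video_path)
    if ext.lower() != ".avi":
        raise ValueError("expected an .avi file: %s" % video_path)
    pattern = os.path.join(output_dir, "%05d.jpg")
    argv = ["ffmpeg", "-i", video_path, "-vf", "fps=%s" % fps, pattern]
    return output_dir, argv


def extract_frames(video_path, fps):
    """Create the frame directory and run ffmpeg (requires ffmpeg on PATH)."""
    output_dir, argv = build_ffmpeg_command(video_path, fps)
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    subprocess.check_call(argv)
```

Calling extract_frames on every .avi file under UCF101/ with fps=5 would give the same directory layout as the shell script, one folder of numbered JPEGs per video.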

Next, generate the training and test sets. Create a file convert_images_to_list.sh with the following content:

#!/bin/bash
> train.list
> test.list
COUNT=-1
for folder in "$1"/*
do
    COUNT=$((COUNT + 1))
    for imagesFolder in "$folder"/*
    do
        # jot (a BSD utility) prints a random integer in [1, $2]
        if (( $(jot -r 1 1 "$2") > 1 )); then
            echo "$imagesFolder" $COUNT >> train.list
        else
            echo "$imagesFolder" $COUNT >> test.list
        fi
    done
done

Run

./convert_images_to_list.sh UCF101/ 4

so that roughly 1/4 of the data becomes the test set and the rest the training set (each video is assigned at random).
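The split logic is easy to miss inside the shell arithmetic, so here is a small Python sketch of it. It is not the project's code: build_lists and its arguments are hypothetical, and sorted() only approximates the order in which the shell glob visits class folders. Each video draws a uniform integer in [1, n]; drawing 1 sends it to the test list, so the test share is about 1/n rather than exactly 1/n.

```python
import random


def build_lists(class_to_videos, n, seed=None):
    """Return (train_lines, test_lines) in the "path label" format.

    class_to_videos maps a class folder name to a list of frame folders.
    Labels are assigned by the sorted order of the class names.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, class_name in enumerate(sorted(class_to_videos)):
        for video_dir in class_to_videos[class_name]:
            line = "%s %d" % (video_dir, label)
            # randint(1, n) plays the role of `jot -r 1 1 n`.
            if rng.randint(1, n) > 1:
                train.append(line)
            else:
                test.append(line)
    return train, test


demo = {
    "Archery": ["UCF101/Archery/v%d" % i for i in range(500)],
    "Bowling": ["UCF101/Bowling/v%d" % i for i in range(500)],
}
train_lines, test_lines = build_lists(demo, 4, seed=0)
print(len(train_lines), len(test_lines))
```

Because the assignment is random per video, rerunning the script produces a different split unless a seed is fixed, which matters when comparing results across runs.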

For both training and testing, batch_size samples are read from disk each time, as follows.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import random
import time

import cv2
import numpy as np
import PIL.Image as Image

CLIP_LENGTH = 16
VALIDATION_PRO = 0.2

# Per-frame mean image, subtracted from every cropped frame.
np_mean = np.load('crop_mean.npy').reshape([CLIP_LENGTH, 112, 112, 3])


def get_test_num(filename):
    # Number of videos listed in the given list file.
    with open(filename, 'r') as f:
        return len(list(f))


def get_video_indices(filename):
    with open(filename, 'r') as f:
        lines = list(f)
    # Shuffle the indices, then split off the validation part.
    # list(...) keeps shuffle working on Python 3 as well as 2.7.
    video_indices = list(range(len(lines)))
    random.seed(time.time())
    random.shuffle(video_indices)
    split = int(len(video_indices) * VALIDATION_PRO)
    validation_video_indices = video_indices[:split]
    train_video_indices = video_indices[split:]
    return train_video_indices, validation_video_indices


def frame_process(clip, clip_length=CLIP_LENGTH, crop_size=112, channel_num=3):
    frames_num = len(clip)
    cropped_frames = np.zeros([frames_num, crop_size, crop_size, channel_num]).astype(np.float32)
    # Crop every frame into shape [crop_size, crop_size, channel_num]
    for i in range(frames_num):
        img = Image.fromarray(clip[i].astype(np.uint8))
        # Resize so the shorter side equals crop_size (cv2 takes (width, height)).
        if img.width > img.height:
            scale = float(crop_size) / float(img.height)
            img = np.array(cv2.resize(np.array(img), (int(img.width * scale + 1), crop_size))).astype(np.float32)
        else:
            scale = float(crop_size) / float(img.width)
            img = np.array(cv2.resize(np.array(img), (crop_size, int(img.height * scale + 1)))).astype(np.float32)
        # Center crop, then subtract the mean frame.
        crop_x = int((img.shape[0] - crop_size) / 2)
        crop_y = int((img.shape[1] - crop_size) / 2)
        img = img[crop_x: crop_x + crop_size, crop_y: crop_y + crop_size, :]
        cropped_frames[i, :, :, :] = img - np_mean[i]
    return cropped_frames


def convert_images_to_clip(filename, clip_length=CLIP_LENGTH, crop_size=112, channel_num=3):
    clip = []
    for parent, dirnames, filenames in os.walk(filename):
        filenames = sorted(filenames)
        if len(filenames) < clip_length:
            # Too few frames: take all of them, then repeat the last one as padding.
            for i in range(0, len(filenames)):
                image_name = str(filename) + '/' + str(filenames[i])
                img = Image.open(image_name)
                img_data = np.array(img)
                clip.append(img_data)
            for i in range(clip_length - len(filenames)):
                image_name = str(filename) + '/' + str(filenames[len(filenames) - 1])
                img = Image.open(image_name)
                img_data = np.array(img)
                clip.append(img_data)
        else:
            # Pick a random run of clip_length consecutive frames.
            s_index = random.randint(0, len(filenames) - clip_length)
            for i in range(s_index, s_index + clip_length):
                image_name = str(filename) + '/' + str(filenames[i])
                img = Image.open(image_name)
                img_data = np.array(img)
                clip.append(img_data)
    if len(clip) == 0:
        print(filename)
    clip = frame_process(clip, clip_length, crop_size, channel_num)
    return clip  # shape [clip_length, crop_size, crop_size, channel_num]


def get_batches(filename, num_classes, batch_index, video_indices, batch_size=10, crop_size=112, channel_num=3):
    with open(filename, 'r') as f:
        lines = list(f)
    clips = []
    labels = []
    for i in video_indices[batch_index: batch_index + batch_size]:
        line = lines[i].strip('\n').split()
        dirname = line[0]
        label = line[1]
        i_clip = convert_images_to_clip(dirname, CLIP_LENGTH, crop_size, channel_num)
        clips.append(i_clip)
        labels.append(int(label))
    clips = np.array(clips).astype(np.float32)
    labels = np.array(labels).astype(np.int64)
    # One-hot encode the labels.
    oh_labels = np.zeros([len(labels), num_classes]).astype(np.int64)
    for i in range(len(labels)):
        oh_labels[i, labels[i]] = 1
    batch_index = batch_index + batch_size
    batch_data = {'clips': clips, 'labels': oh_labels}
    return batch_data, batch_index

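Note that, for simplicity, I randomly draw one run of 16 consecutive frames from each video to form a clip, which serves as one sample. With a batch_size of 10, that means 10 videos are picked and 16 consecutive frames are taken from each at a random position, giving 10 clips as the network input for each step.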
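The sampling rule can be isolated from the image loading. The sketch below works on frame indices only; sample_clip_indices is a hypothetical helper, but the two branches follow convert_images_to_clip: a video with fewer than 16 frames is padded by repeating its last frame, otherwise a random start position is drawn and 16 consecutive frames are taken.

```python
import random


def sample_clip_indices(num_frames, clip_length=16, rng=random):
    """Return the frame indices that make up one clip."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames < clip_length:
        # Use every frame, then repeat the last one until the clip is full.
        padding = [num_frames - 1] * (clip_length - num_frames)
        return list(range(num_frames)) + padding
    # randint is inclusive on both ends, so the last valid start is allowed.
    start = rng.randint(0, num_frames - clip_length)
    return list(range(start, start + clip_length))


print(sample_clip_indices(10))
print(sample_clip_indices(40, rng=random.Random(0)))
```

One consequence of this design is that each epoch sees a different 16-frame window of every video, which acts as a mild form of temporal augmentation; at test time the window is also random, so the reported accuracy can vary slightly between runs.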

The model is written with slim, because it is simple to implement and easy to read.

import tensorflow as tf
import tensorflow.contrib.slim as slim


def C3D(input, num_classes, keep_pro=0.5):
    with tf.variable_scope('C3D'):
        # Defaults shared by every conv3d layer below.
        with slim.arg_scope([slim.conv3d],
                            padding='SAME',
                            weights_regularizer=slim.l2_regularizer(0.0005),
                            activation_fn=tf.nn.relu,
                            kernel_size=[3, 3, 3],
                            stride=[1, 1, 1]):
            net = slim.conv3d(input, 64, scope='conv1')
            net = slim.max_pool3d(net, kernel_size=[1, 2, 2], stride=[1, 2, 2], padding='SAME', scope='max_pool1')
            net = slim.conv3d(net, 128, scope='conv2')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool2')
            net = slim.repeat(net, 2, slim.conv3d, 256, scope='conv3')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool3')
            net = slim.repeat(net, 2, slim.conv3d, 512, scope='conv4')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool4')
            net = slim.repeat(net, 2, slim.conv3d, 512, scope='conv5')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool5')
            # Flatten [batch, 1, 4, 4, 512] for the fully connected layers.
            net = tf.reshape(net, [-1, 512 * 4 * 4])
            net = slim.fully_connected(net, 4096, weights_regularizer=slim.l2_regularizer(0.0005), scope='fc6')
            net = slim.dropout(net, keep_pro, scope='dropout1')
            net = slim.fully_connected(net, 4096, weights_regularizer=slim.l2_regularizer(0.0005), scope='fc7')
            net = slim.dropout(net, keep_pro, scope='dropout2')
            out = slim.fully_connected(net, num_classes, weights_regularizer=slim.l2_regularizer(0.0005),
                                       activation_fn=None, scope='out')
            return out

Training

import os
import time

import tensorflow as tf

import C3D_model
import data_processing

TRAIN_LOG_DIR = os.path.join('Log/train/', time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
TRAIN_CHECK_POINT = 'check_point/'
TRAIN_LIST_PATH = 'train.list'
TEST_LIST_PATH = 'test.list'
BATCH_SIZE = 10
NUM_CLASSES = 101
CROP_SIZE = 112
CHANNEL_NUM = 3
CLIP_LENGTH = 16
EPOCH_NUM = 50
INITIAL_LEARNING_RATE = 1e-4
LR_DECAY_FACTOR = 0.5
EPOCHS_PER_LR_DECAY = 2
MOVING_AV_DECAY = 0.9999

# Get shuffled indices
train_video_indices, validation_video_indices = data_processing.get_video_indices(TRAIN_LIST_PATH)

with tf.Graph().as_default():
    batch_clips = tf.placeholder(tf.float32, [BATCH_SIZE, CLIP_LENGTH, CROP_SIZE, CROP_SIZE, CHANNEL_NUM], name='X')
    batch_labels = tf.placeholder(tf.int32, [BATCH_SIZE, NUM_CLASSES], name='Y')
    keep_prob = tf.placeholder(tf.float32)
    logits = C3D_model.C3D(batch_clips, NUM_CLASSES, keep_prob)
    with tf.name_scope('loss'):
        # Note: the L2 regularization losses declared in the model are not added here.
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=batch_labels))
        tf.summary.scalar('entropy_loss', loss)
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(batch_labels, 1)), tf.float32))
        tf.summary.scalar('accuracy', accuracy)
    # Learning-rate decay is disabled; a fixed rate is used instead.
    # global_step = tf.Variable(0, name='global_step', trainable=False)
    # decay_step = EPOCHS_PER_LR_DECAY * len(train_video_indices) // BATCH_SIZE
    learning_rate = 1e-4  # tf.train.exponential_decay(INITIAL_LEARNING_RATE, global_step, decay_step, LR_DECAY_FACTOR, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)  # , global_step=global_step)
    saver = tf.train.Saver()
    summary_op = tf.summary.merge_all()

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        train_summary_writer = tf.summary.FileWriter(TRAIN_LOG_DIR, sess.graph)
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        step = 0
        for epoch in range(EPOCH_NUM):
            # Train for one epoch.
            accuracy_epoch = 0
            loss_epoch = 0
            batch_index = 0
            for i in range(len(train_video_indices) // BATCH_SIZE):
                step += 1
                batch_data, batch_index = data_processing.get_batches(TRAIN_LIST_PATH, NUM_CLASSES, batch_index,
                                                                      train_video_indices, BATCH_SIZE)
                _, loss_out, accuracy_out, summary = sess.run([optimizer, loss, accuracy, summary_op],
                                                              feed_dict={batch_clips: batch_data['clips'],
                                                                         batch_labels: batch_data['labels'],
                                                                         keep_prob: 0.5})
                loss_epoch += loss_out
                accuracy_epoch += accuracy_out
                if i % 10 == 0:
                    print('Epoch %d, Batch %d: Loss is %.5f; Accuracy is %.5f' % (epoch + 1, i, loss_out, accuracy_out))
                    train_summary_writer.add_summary(summary, step)
            print('Epoch %d: Average loss is: %.5f; Average accuracy is: %.5f' % (
                epoch + 1,
                loss_epoch / (len(train_video_indices) // BATCH_SIZE),
                accuracy_epoch / (len(train_video_indices) // BATCH_SIZE)))

            # Evaluate on the validation split (dropout disabled).
            accuracy_epoch = 0
            loss_epoch = 0
            batch_index = 0
            for i in range(len(validation_video_indices) // BATCH_SIZE):
                batch_data, batch_index = data_processing.get_batches(TRAIN_LIST_PATH, NUM_CLASSES, batch_index,
                                                                      validation_video_indices, BATCH_SIZE)
                loss_out, accuracy_out = sess.run([loss, accuracy],
                                                  feed_dict={batch_clips: batch_data['clips'],
                                                             batch_labels: batch_data['labels'],
                                                             keep_prob: 1.0})
                loss_epoch += loss_out
                accuracy_epoch += accuracy_out
            print('Validation loss is %.5f; Accuracy is %.5f' % (
                loss_epoch / (len(validation_video_indices) // BATCH_SIZE),
                accuracy_epoch / (len(validation_video_indices) // BATCH_SIZE)))
            saver.save(sess, TRAIN_CHECK_POINT + 'train.ckpt', global_step=epoch)

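Here 20% of the training set is held out as a validation set, and after every epoch on the training set the model is evaluated once on the validation set.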
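As a standalone illustration of the hold-out step, here is a sketch of the index split. split_train_validation is a hypothetical name and the explicit seed argument is my addition; the original get_video_indices seeds the generator from the current time instead.

```python
import random


def split_train_validation(num_videos, validation_ratio=0.2, seed=None):
    """Shuffle video indices and hold out the first part for validation."""
    indices = list(range(num_videos))
    random.Random(seed).shuffle(indices)
    cut = int(num_videos * validation_ratio)
    return indices[cut:], indices[:cut]


train_idx, val_idx = split_train_validation(100, seed=1)
print(len(train_idx), len(val_idx))  # 80 20
```

Because the original code seeds from time.time(), a restarted training run gets a different train/validation split; passing a fixed seed, as in this sketch, is one way to keep the split stable across runs.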

Testing

import tensorflow as tf

import C3D_model
import data_processing

TRAIN_LOG_DIR = 'Log/train/'
TRAIN_CHECK_POINT = 'check_point/train.ckpt-36'
TEST_LIST_PATH = 'test.list'
BATCH_SIZE = 10
NUM_CLASSES = 101
CROP_SIZE = 112
CHANNEL_NUM = 3
CLIP_LENGTH = 16
EPOCH_NUM = 50

test_num = data_processing.get_test_num(TEST_LIST_PATH)
test_video_indices = range(test_num)

with tf.Graph().as_default():
    batch_clips = tf.placeholder(tf.float32, [BATCH_SIZE, CLIP_LENGTH, CROP_SIZE, CROP_SIZE, CHANNEL_NUM], name='X')
    batch_labels = tf.placeholder(tf.int32, [BATCH_SIZE, NUM_CLASSES], name='Y')
    keep_prob = tf.placeholder(tf.float32)
    logits = C3D_model.C3D(batch_clips, NUM_CLASSES, keep_prob)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(batch_labels, 1)), tf.float32))
    restorer = tf.train.Saver()

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        # Load the trained weights (overwrites the freshly initialized values).
        restorer.restore(sess, TRAIN_CHECK_POINT)
        accuracy_epoch = 0
        batch_index = 0
        # Videos left over after the last full batch are not evaluated.
        for i in range(test_num // BATCH_SIZE):
            if i % 10 == 0:
                print('Testing %d of %d' % (i + 1, test_num // BATCH_SIZE))
            batch_data, batch_index = data_processing.get_batches(TEST_LIST_PATH, NUM_CLASSES, batch_index,
                                                                  test_video_indices, BATCH_SIZE)
            accuracy_out = sess.run(accuracy,
                                    feed_dict={batch_clips: batch_data['clips'],
                                               batch_labels: batch_data['labels'],
                                               keep_prob: 1.0})
            accuracy_epoch += accuracy_out
        print('Test accuracy is %.5f' % (accuracy_epoch / (test_num // BATCH_SIZE)))

Results

I trained for 36 epochs on the training set; the final model reaches about 72% accuracy on the test set.

Reference

hx173149/C3D-tensorflow (github.com)
