Video Action Recognition with C3D in TensorFlow

This post is a simple implementation of C3D, a classic network for video action recognition, and can serve as an introduction to the field. The paper is "Learning Spatiotemporal Features with 3D Convolutional Networks" (ICCV 2015).

Framework: TensorFlow 1.6 + Python 2.7 + slim

Dataset: UCF101, from the Center for Research in Computer Vision at the University of Central Florida

Code: 2012013382/C3D-Tensorflow-slim

The basics of 3D convolution are covered extensively elsewhere, so they are not repeated here. This post focuses on how the shape of the input frames (images) changes as a clip passes through the network.
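
As a quick illustration of how a five-dimensional input flows through one 3D convolution, here is a minimal toy sketch of my own (not part of the repo): the kernel slides over time as well as space, so the tensor keeps its rank while the channel count changes.

import tensorflow as tf

# Toy 5-D input: [batch, frames, height, width, channels]
x = tf.placeholder(tf.float32, [1, 16, 112, 112, 3])
# 64 filters of size [3, 3, 3] (time, height, width); filter shape is
# [depth, height, width, in_channels, out_channels]
w = tf.get_variable('w', [3, 3, 3, 3, 64])
y = tf.nn.conv3d(x, w, strides=[1, 1, 1, 1, 1], padding='SAME')
print(y.shape)  # (1, 16, 112, 112, 64): time and space preserved, channels expand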

The basic C3D architecture is shown in Figure 1:

Figure 1: Schematic of the C3D network architecture

Details:

1) The input clip (video segment) has shape [batch_size, frame_length, crop_size, crop_size, channel_num], where frame_length is 16 (each sample is 16 frames), crop_size is 112, and channel_num is 3, so every frame is normalized to size [112, 112, 3].

2) Every convolution kernel has size [3, 3, 3]: the first dimension is temporal, and the last two are the spatial kernel size on each frame. All strides are [1, 1, 1], with padding='SAME'.

3) All pooling layers are 3D max pooling. Only the first pooling layer uses kernel size and stride [1, 2, 2]; all the others use [2, 2, 2] (dimensions ordered as in 1), with padding='SAME'. The authors use 1 in the temporal dimension of the first layer to avoid collapsing the temporal dimension to 1 too early.

The shape of an input clip changes through the network as follows:

Assume batch_size is 10.

Input shape:[10, 16, 112, 112, 3]

After conv1:[10, 16, 112, 112, 64]

After pool1:[10, 16, 56, 56, 64]

After conv2a:[10, 16, 56, 56, 128]

After pool2:[10, 8, 28, 28, 128]

After conv3a:[10, 8, 28, 28, 256]

After conv3b:[10, 8, 28, 28, 256]

After pool3:[10, 4, 14, 14, 256]

After conv4a:[10, 4, 14, 14, 512]

After conv4b:[10, 4, 14, 14, 512]

After pool4:[10, 2, 7, 7, 512]

After conv5a:[10, 2, 7, 7, 512]

After conv5b:[10, 2, 7, 7, 512]

After pool5:[10, 1, 4, 4, 512]

After fc6:[10, 4096]

After fc7:[10, 4096]

out:[10, num_classes] (num_classes is 101 for UCF101)
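
With SAME padding, each pooling layer simply divides a dimension by its stride, rounding up, which is why pool5 ends at [1, 4, 4] and fc6 flattens 512 * 4 * 4 = 8192 values. A quick sanity check of this arithmetic (my own sketch, not from the repo):

import math

def pool(shape, stride):
    # SAME padding: each dimension becomes ceil(dim / stride)
    t, h, w = shape
    st, sh, sw = stride
    return (int(math.ceil(t / float(st))),
            int(math.ceil(h / float(sh))),
            int(math.ceil(w / float(sw))))

shape = (16, 112, 112)          # frames, height, width
shape = pool(shape, (1, 2, 2))  # pool1 -> (16, 56, 56)
for _ in range(4):              # pool2 .. pool5
    shape = pool(shape, (2, 2, 2))
print(shape)                    # (1, 4, 4); with 512 channels, fc6 sees 512*4*4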

Data Preprocessing

For video work, data preprocessing is relatively involved. Since video datasets are usually large, we first convert the videos into images and then read one batch at a time from disk. After downloading UCF101, extract it into the project root. Create a file convert_video_to_images.sh with the following content:

#!/bin/bash
# Convert every .avi under the given directory into JPEG frames at the given fps.
for folder in $1/*
do
    for file in "$folder"/*.avi
    do
        if [[ ! -d "${file[@]%.avi}" ]]; then
            mkdir -p "${file[@]%.avi}"
        fi
        ffmpeg -i "$file" -vf fps=$2 "${file[@]%.avi}"/%05d.jpg
        rm "$file"
    done
done

Run

sudo ./convert_video_to_images.sh UCF101/ 5

which extracts 5 frames per second from each video.

Next, generate the training and test lists. Create convert_images_to_list.sh with the following content:

#!/bin/bash
# Split the extracted image folders into train.list and test.list.
# jot -r 1 1 $2 draws one random integer in [1, $2], so roughly 1/$2
# of the samples land in the test set.
> train.list
> test.list
COUNT=-1
for folder in $1/*
do
    COUNT=$[$COUNT + 1]
    for imagesFolder in "$folder"/*
    do
        if (( $(jot -r 1 1 $2) > 1 )); then
            echo "$imagesFolder" $COUNT >> train.list
        else
            echo "$imagesFolder" $COUNT >> test.list
        fi
    done
done

Run

./convert_images_to_list.sh UCF101/ 4

which assigns roughly 1/4 of the data to the test set and the rest to the training set.
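
Note that jot is a BSD utility (standard on macOS); on most Linux systems it is absent (shuf is the usual substitute). If you prefer, the same split can be done in Python — a sketch of my own, not from the repo, writing the same "path label" format the loader below expects:

import os
import random

def make_lists(dataset_dir, test_ratio=0.25):
    # One "image-folder label" line per extracted video, in the same
    # format convert_images_to_list.sh produces.
    with open('train.list', 'w') as train_f, open('test.list', 'w') as test_f:
        for count, folder in enumerate(sorted(os.listdir(dataset_dir))):
            class_dir = os.path.join(dataset_dir, folder)
            for images_folder in sorted(os.listdir(class_dir)):
                line = '%s %d\n' % (os.path.join(class_dir, images_folder), count)
                (test_f if random.random() < test_ratio else train_f).write(line)

make_lists('UCF101/', test_ratio=0.25)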

Each training and test step reads batch_size samples from disk, as follows.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import random
import time

import cv2
import numpy as np
import PIL.Image as Image

CLIP_LENGTH = 16
VALIDATION_PRO = 0.2
# Per-frame crop mean shipped with the repo
np_mean = np.load('crop_mean.npy').reshape([CLIP_LENGTH, 112, 112, 3])

def get_test_num(filename):
    lines = open(filename, 'r')
    return len(list(lines))

def get_video_indices(filename):
    lines = open(filename, 'r')
    # Shuffle data (range returns a list under Python 2)
    lines = list(lines)
    video_indices = range(len(lines))
    random.seed(time.time())
    random.shuffle(video_indices)
    validation_video_indices = video_indices[:int(len(video_indices) * VALIDATION_PRO)]
    train_video_indices = video_indices[int(len(video_indices) * VALIDATION_PRO):]
    return train_video_indices, validation_video_indices

def frame_process(clip, clip_length=CLIP_LENGTH, crop_size=112, channel_num=3):
    frames_num = len(clip)
    cropped_frames = np.zeros([frames_num, crop_size, crop_size, channel_num]).astype(np.float32)
    # Resize so the short side equals crop_size, then center-crop
    # every frame to [crop_size, crop_size, channel_num]
    for i in range(frames_num):
        img = Image.fromarray(clip[i].astype(np.uint8))
        if img.width > img.height:
            scale = float(crop_size) / float(img.height)
            img = np.array(cv2.resize(np.array(img), (int(img.width * scale + 1), crop_size))).astype(np.float32)
        else:
            scale = float(crop_size) / float(img.width)
            img = np.array(cv2.resize(np.array(img), (crop_size, int(img.height * scale + 1)))).astype(np.float32)
        crop_x = int((img.shape[0] - crop_size) / 2)
        crop_y = int((img.shape[1] - crop_size) / 2)
        img = img[crop_x: crop_x + crop_size, crop_y: crop_y + crop_size, :]
        # Subtract the per-frame mean
        cropped_frames[i, :, :, :] = img - np_mean[i]
    return cropped_frames

def convert_images_to_clip(filename, clip_length=CLIP_LENGTH, crop_size=112, channel_num=3):
    clip = []
    for parent, dirnames, filenames in os.walk(filename):
        filenames = sorted(filenames)
        if len(filenames) < clip_length:
            # Short video: take all frames, then pad by repeating the last one
            for i in range(0, len(filenames)):
                image_name = str(filename) + '/' + str(filenames[i])
                img = Image.open(image_name)
                img_data = np.array(img)
                clip.append(img_data)
            for i in range(clip_length - len(filenames)):
                image_name = str(filename) + '/' + str(filenames[len(filenames) - 1])
                img = Image.open(image_name)
                img_data = np.array(img)
                clip.append(img_data)
        else:
            # Randomly pick a contiguous run of clip_length frames
            s_index = random.randint(0, len(filenames) - clip_length)
            for i in range(s_index, s_index + clip_length):
                image_name = str(filename) + '/' + str(filenames[i])
                img = Image.open(image_name)
                img_data = np.array(img)
                clip.append(img_data)
    if len(clip) == 0:
        print(filename)
    clip = frame_process(clip, clip_length, crop_size, channel_num)
    return clip  # shape [clip_length, crop_size, crop_size, channel_num]

def get_batches(filename, num_classes, batch_index, video_indices, batch_size=10, crop_size=112, channel_num=3):
    lines = open(filename, 'r')
    clips = []
    labels = []
    lines = list(lines)
    for i in video_indices[batch_index: batch_index + batch_size]:
        line = lines[i].strip('\n').split()
        dirname = line[0]
        label = line[1]
        i_clip = convert_images_to_clip(dirname, CLIP_LENGTH, crop_size, channel_num)
        clips.append(i_clip)
        labels.append(int(label))
    # Convert to numpy, with one-hot labels
    clips = np.array(clips).astype(np.float32)
    labels = np.array(labels).astype(np.int64)
    oh_labels = np.zeros([len(labels), num_classes]).astype(np.int64)
    for i in range(len(labels)):
        oh_labels[i, labels[i]] = 1
    batch_index = batch_index + batch_size
    batch_data = {'clips': clips, 'labels': oh_labels}
    return batch_data, batch_index

One thing to note: for simplicity, each sample is a clip of 16 consecutive frames drawn at a random starting point from one video. With batch_size 10, each network input therefore consists of 10 videos, each contributing one randomly located 16-frame clip, i.e., 10 clips per step.
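
For example, pulling one training batch with the module above (assuming it is saved as data_processing.py and train.list exists) looks like this:

import data_processing

train_indices, val_indices = data_processing.get_video_indices('train.list')
batch_data, batch_index = data_processing.get_batches(
    'train.list', num_classes=101, batch_index=0,
    video_indices=train_indices, batch_size=10)
print(batch_data['clips'].shape)   # (10, 16, 112, 112, 3)
print(batch_data['labels'].shape)  # (10, 101), one-hot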

The model is built with slim, which keeps the implementation short and easy to read.

import tensorflow as tf
import tensorflow.contrib.slim as slim

def C3D(input, num_classes, keep_pro=0.5):
    with tf.variable_scope('C3D'):
        # All conv3d layers share [3, 3, 3] kernels, stride 1, SAME padding
        with slim.arg_scope([slim.conv3d],
                            padding='SAME',
                            weights_regularizer=slim.l2_regularizer(0.0005),
                            activation_fn=tf.nn.relu,
                            kernel_size=[3, 3, 3],
                            stride=[1, 1, 1]):
            net = slim.conv3d(input, 64, scope='conv1')
            # pool1 keeps the temporal dimension (stride 1 in time)
            net = slim.max_pool3d(net, kernel_size=[1, 2, 2], stride=[1, 2, 2], padding='SAME', scope='max_pool1')
            net = slim.conv3d(net, 128, scope='conv2')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool2')
            net = slim.repeat(net, 2, slim.conv3d, 256, scope='conv3')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool3')
            net = slim.repeat(net, 2, slim.conv3d, 512, scope='conv4')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool4')
            net = slim.repeat(net, 2, slim.conv3d, 512, scope='conv5')
            net = slim.max_pool3d(net, kernel_size=[2, 2, 2], stride=[2, 2, 2], padding='SAME', scope='max_pool5')
            # pool5 output is [batch, 1, 4, 4, 512]; flatten for the FC layers
            net = tf.reshape(net, [-1, 512 * 4 * 4])
            net = slim.fully_connected(net, 4096, weights_regularizer=slim.l2_regularizer(0.0005), scope='fc6')
            net = slim.dropout(net, keep_pro, scope='dropout1')
            net = slim.fully_connected(net, 4096, weights_regularizer=slim.l2_regularizer(0.0005), scope='fc7')
            net = slim.dropout(net, keep_pro, scope='dropout2')
            out = slim.fully_connected(net, num_classes,
                                       weights_regularizer=slim.l2_regularizer(0.0005),
                                       activation_fn=None, scope='out')
    return out
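A quick way to confirm the model reproduces the shape trace above (assuming the file is saved as C3D_model.py):

import tensorflow as tf
import C3D_model

clips = tf.placeholder(tf.float32, [10, 16, 112, 112, 3])
logits = C3D_model.C3D(clips, num_classes=101)
print(logits.shape)  # (10, 101), matching the shape trace above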

Training

import tensorflow as tf
import numpy as np
import C3D_model
import time
import data_processing
import os
import os.path
from os.path import join
TRAIN_LOG_DIR = os.path.join('Log/train/', time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
TRAIN_CHECK_POINT = 'check_point/'
TRAIN_LIST_PATH = 'train.list'
TEST_LIST_PATH = 'test.list'
BATCH_SIZE = 10
NUM_CLASSES = 101
CROP_SIZE = 112
CHANNEL_NUM = 3
CLIP_LENGTH = 16
EPOCH_NUM = 50
INITIAL_LEARNING_RATE = 1e-4
LR_DECAY_FACTOR = 0.5
EPOCHS_PER_LR_DECAY = 2
MOVING_AV_DECAY = 0.9999
# Get shuffled indices
train_video_indices, validation_video_indices = data_processing.get_video_indices(TRAIN_LIST_PATH)

with tf.Graph().as_default():
    batch_clips = tf.placeholder(tf.float32, [BATCH_SIZE, CLIP_LENGTH, CROP_SIZE, CROP_SIZE, CHANNEL_NUM], name='X')
    batch_labels = tf.placeholder(tf.int32, [BATCH_SIZE, NUM_CLASSES], name='Y')
    keep_prob = tf.placeholder(tf.float32)

    logits = C3D_model.C3D(batch_clips, NUM_CLASSES, keep_prob)
    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=batch_labels))
        tf.summary.scalar('entropy_loss', loss)
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(batch_labels, 1)), np.float32))
        tf.summary.scalar('accuracy', accuracy)

    # Exponential decay is left disabled; a fixed learning rate is used instead.
    # global_step = tf.Variable(0, name='global_step', trainable=False)
    # decay_step = EPOCHS_PER_LR_DECAY * len(train_video_indices) // BATCH_SIZE
    learning_rate = 1e-4
    # learning_rate = tf.train.exponential_decay(INITIAL_LEARNING_RATE, global_step, decay_step, LR_DECAY_FACTOR, staircase=True)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)  # , global_step=global_step)

    saver = tf.train.Saver()
    summary_op = tf.summary.merge_all()
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        train_summary_writer = tf.summary.FileWriter(TRAIN_LOG_DIR, sess.graph)
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        step = 0
        for epoch in range(EPOCH_NUM):
            accuracy_epoch = 0
            loss_epoch = 0
            batch_index = 0
            for i in range(len(train_video_indices) // BATCH_SIZE):
                step += 1
                batch_data, batch_index = data_processing.get_batches(TRAIN_LIST_PATH, NUM_CLASSES, batch_index,
                                                                      train_video_indices, BATCH_SIZE)
                _, loss_out, accuracy_out, summary = sess.run([optimizer, loss, accuracy, summary_op],
                                                              feed_dict={batch_clips: batch_data['clips'],
                                                                         batch_labels: batch_data['labels'],
                                                                         keep_prob: 0.5})
                loss_epoch += loss_out
                accuracy_epoch += accuracy_out
                if i % 10 == 0:
                    print('Epoch %d, Batch %d: Loss is %.5f; Accuracy is %.5f'
                          % (epoch + 1, i, loss_out, accuracy_out))
                    train_summary_writer.add_summary(summary, step)

            print('Epoch %d: Average loss is: %.5f; Average accuracy is: %.5f'
                  % (epoch + 1, loss_epoch / (len(train_video_indices) // BATCH_SIZE),
                     accuracy_epoch / (len(train_video_indices) // BATCH_SIZE)))

            # Validate once per epoch (dropout disabled via keep_prob = 1.0)
            accuracy_epoch = 0
            loss_epoch = 0
            batch_index = 0
            for i in range(len(validation_video_indices) // BATCH_SIZE):
                batch_data, batch_index = data_processing.get_batches(TRAIN_LIST_PATH, NUM_CLASSES, batch_index,
                                                                      validation_video_indices, BATCH_SIZE)
                loss_out, accuracy_out = sess.run([loss, accuracy],
                                                  feed_dict={batch_clips: batch_data['clips'],
                                                             batch_labels: batch_data['labels'],
                                                             keep_prob: 1.0})
                loss_epoch += loss_out
                accuracy_epoch += accuracy_out

            print('Validation loss is %.5f; Accuracy is %.5f'
                  % (loss_epoch / (len(validation_video_indices) // BATCH_SIZE),
                     accuracy_epoch / (len(validation_video_indices) // BATCH_SIZE)))
            saver.save(sess, TRAIN_CHECK_POINT + 'train.ckpt', global_step=epoch)

Here 20% of the training set is held out as a validation set; after each epoch on the training set, the model is evaluated once on the validation set.
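
The script above fixes the learning rate at 1e-4 and leaves exponential decay commented out. If you want to re-enable it, the commented lines expand to roughly the following (using the constants already defined above; global_step must be created before the minimize call so Adam can increment it):

global_step = tf.Variable(0, name='global_step', trainable=False)
decay_step = EPOCHS_PER_LR_DECAY * len(train_video_indices) // BATCH_SIZE
learning_rate = tf.train.exponential_decay(INITIAL_LEARNING_RATE, global_step,
                                           decay_step, LR_DECAY_FACTOR, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)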

Testing

import tensorflow as tf
import numpy as np
import C3D_model
import data_processing
TRAIN_LOG_DIR = 'Log/train/'
TRAIN_CHECK_POINT = 'check_point/train.ckpt-36'
TEST_LIST_PATH = 'test.list'
BATCH_SIZE = 10
NUM_CLASSES = 101
CROP_SIZE = 112
CHANNEL_NUM = 3
CLIP_LENGTH = 16
EPOCH_NUM = 50
test_num = data_processing.get_test_num(TEST_LIST_PATH)
test_video_indices = range(test_num)

with tf.Graph().as_default():
    batch_clips = tf.placeholder(tf.float32, [BATCH_SIZE, CLIP_LENGTH, CROP_SIZE, CROP_SIZE, CHANNEL_NUM], name='X')
    batch_labels = tf.placeholder(tf.int32, [BATCH_SIZE, NUM_CLASSES], name='Y')
    keep_prob = tf.placeholder(tf.float32)

    logits = C3D_model.C3D(batch_clips, NUM_CLASSES, keep_prob)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(batch_labels, 1)), np.float32))

    restorer = tf.train.Saver()
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        # Restore the trained weights from the checkpoint
        restorer.restore(sess, TRAIN_CHECK_POINT)
        accuracy_epoch = 0
        batch_index = 0
        for i in range(test_num // BATCH_SIZE):
            if i % 10 == 0:
                print('Testing %d of %d' % (i + 1, test_num // BATCH_SIZE))
            batch_data, batch_index = data_processing.get_batches(TEST_LIST_PATH, NUM_CLASSES, batch_index,
                                                                  test_video_indices, BATCH_SIZE)
            accuracy_out = sess.run(accuracy,
                                    feed_dict={batch_clips: batch_data['clips'],
                                               batch_labels: batch_data['labels'],
                                               keep_prob: 1.0})
            accuracy_epoch += accuracy_out

        print('Test accuracy is %.5f' % (accuracy_epoch / (test_num // BATCH_SIZE)))

Results

I trained for 36 epochs; the final model reaches roughly 72% accuracy on the test set.
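
Note this is clip-level accuracy: each test video is scored from a single randomly sampled 16-frame clip. The paper evaluates at the video level by averaging predictions over several clips per video, which usually scores a bit higher. A rough sketch of that idea (a hypothetical helper of my own, not in the repo):

import numpy as np

def video_level_predict(sess, logits, batch_clips, keep_prob, clips):
    # clips: BATCH_SIZE clips sampled from one video,
    # shape [BATCH_SIZE, 16, 112, 112, 3]; average the clip
    # scores before taking the argmax.
    scores = sess.run(logits, feed_dict={batch_clips: clips, keep_prob: 1.0})
    return np.argmax(scores.mean(axis=0))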

References

hx173149/C3D-tensorflow (github.com)
