Preface
Following up on the topic left over from the previous post, multilabel classification, this post takes a first look at it by working through the official Caffe multilabel tutorial, using the Python interface. As the official page notes at the start, the same thing could be done with HDF5 or LMDB; the Python data layer is simply more convenient.
As usual, the reference links:
Official tutorial: Multilabel classification on PASCAL using python data-layers
Dataset (VOC2012) homepage: Visual Object Classes Challenge 2012 (VOC2012), or download the archive directly
[Note] Most of the code and configuration is the same as for Python on Linux, but a few things need to be changed to run this on Windows.
Preparation
- On this machine the Caffe root is assumed to be E:\CaffeDev-GPU\caffe-master\. Create a new folder named pascal under E:\CaffeDev-GPU\caffe-master\data.
- After downloading, the dataset comes as the archive VOCtrainval_11-May-2012.tar. Drag the VOC2012 folder inside the archive's VOCdevkit directory straight into the E:\CaffeDev-GPU\caffe-master\data\pascal folder created above; you will then have the VOC2012 subfolders (Annotations, ImageSets, JPEGImages, and so on) in place.
- Make sure the Python interface is fully configured; follow my earlier post on setting it up. Since I am not yet fluent in Python (Python users can just read the note below), I copied the compiled E:\CaffeDev-GPU\caffe-master\Build\x64\Release\pycaffe\caffe folder into C:\Users\Bingo\Anaconda2\Lib\site-packages, and also copied the whole E:\CaffeDev-GPU\caffe-master\Build\x64\Release\pycaffe folder into E:\CaffeDev-GPU\caffe-master\python, confirming any replacement prompts.
[Note] The point of this step is to make sure the caffe module Python loads is the one copied from the GPU build. If you are comfortable with Python, simply point the interpreter at the caffe compiled in E:\CaffeDev-GPU\caffe-master\Build\x64\Release\pycaffe (a minimal sketch follows this list); if not, follow my copy-based approach, in which case the caffe under site-packages in Anaconda2 is picked up by default.
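For those comfortable with Python, a minimal sketch of the path-based alternative mentioned in the note, assuming the GPU pycaffe build lives at the path used throughout this post:

import sys
# Put the GPU pycaffe build first on the module search path so that "import caffe"
# picks it up instead of any copy sitting in site-packages.
sys.path.insert(0, r'E:\CaffeDev-GPU\caffe-master\Build\x64\Release\pycaffe')

import caffe
print(caffe.__file__)  # should point into the Release\pycaffe folder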
Running the code
A note up front: the code below is best run and debugged statement by statement; for convenience I have grouped it into code cells where appropriate.
Create a new empty folder named multilabel under E:\CaffeDev-GPU\caffe-master\examples to hold the code that follows, then open jupyter notebook from that folder, i.e. in cmd:

E:\CaffeDev-GPU\caffe-master\examples\multilabel>jupyter notebook
In the page that opens, choose New->Python2, then run the following code cell by cell. First, import the required packages:
import sys
import os
import numpy as np
import os.path as osp
import matplotlib.pyplot as plt
from copy import copy

%matplotlib inline
plt.rcParams['figure.figsize'] = (6, 6)
Then set the Caffe-related paths and import caffe. This is where the trouble lies: the caffe that actually gets imported does not seem to be the one at the path we set, hence step 3 in the preparation above.

caffe_root = '../../'  # this file is expected to be in {caffe_root}/examples
sys.path.append(caffe_root + 'python')
sys.path.append('../../../')
import caffe
from caffe import layers as L, params as P
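To check which copy of caffe was actually imported (a quick diagnostic of my own, not part of the official tutorial), print the module path:

print(caffe.__file__)  # shows whether the module came from site-packages or from the paths appended above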
First check that the file tools.py exists under E:\CaffeDev-GPU\caffe-master\examples\pycaffe, and import it. I was stuck here for a while, mainly because I did not know how to import a .py file from Python code, but after some fiddling it worked:

sys.path.append('../../examples/pycaffe/layers')  # the datalayers we will use are in this directory.
sys.path.append('../../examples/pycaffe')  # the tools file is in this folder
import tools
Set the dataset location and the model we will later fine-tune from, bvlc_reference_caffenet.caffemodel. If you have been following my Caffe posts, this file should already exist in E:\CaffeDev-GPU\caffe-master\models\bvlc_reference_caffenet; if it exists yet the code below still starts downloading it, go back and double-check the various paths above.

# set data root directory, e.g:
pascal_root = osp.join(caffe_root, 'data/pascal/VOC2012')

# these are the PASCAL classes, we'll need them later.
classes = np.asarray(['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair',
                      'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant',
                      'sheep', 'sofa', 'train', 'tvmonitor'])

# make sure we have the caffenet weight downloaded.
if not os.path.isfile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'):
    print("Downloading pre-trained CaffeNet model...")
    !../scripts/download_model_binary.py ../models/bvlc_reference_caffenet
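Before going further, a quick sanity check of the paths can save debugging time later (my own suggestion, assuming the folder layout described in the preparation step):

print(osp.isdir(osp.join(pascal_root, 'JPEGImages')))                 # should print True
print(osp.isfile(osp.join(pascal_root, 'ImageSets/Main/train.txt')))  # should print True
print(osp.isfile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'))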
Set Caffe to GPU mode. I got stuck here as well: Python kept crashing, and it turned out the caffe being loaded automatically was the CPU-mode build I had put in site-packages earlier, which is what broke this step. Keep that detail in mind.

caffe.set_mode_gpu()
caffe.set_device(0)
What follows is the network definition itself; nothing should go wrong here.
# helper function for common structures
def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, group=group)
    return conv, L.ReLU(conv, in_place=True)

# another helper function
def fc_relu(bottom, nout):
    fc = L.InnerProduct(bottom, num_output=nout)
    return fc, L.ReLU(fc, in_place=True)

# yet another helper function
def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

# main netspec wrapper
def caffenet_multilabel(data_layer_params, datalayer):
    # setup the python data layer
    n = caffe.NetSpec()
    n.data, n.label = L.Python(module='pascal_multilabel_datalayers', layer=datalayer,
                               ntop=2, param_str=str(data_layer_params))

    # the net itself
    n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4)
    n.pool1 = max_pool(n.relu1, 3, stride=2)
    n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
    n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2)
    n.pool2 = max_pool(n.relu2, 3, stride=2)
    n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
    n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1)
    n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2)
    n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2)
    n.pool5 = max_pool(n.relu5, 3, stride=2)
    n.fc6, n.relu6 = fc_relu(n.pool5, 4096)
    n.drop6 = L.Dropout(n.relu6, in_place=True)
    n.fc7, n.relu7 = fc_relu(n.drop6, 4096)
    n.drop7 = L.Dropout(n.relu7, in_place=True)
    n.score = L.InnerProduct(n.drop7, num_output=20)
    n.loss = L.SigmoidCrossEntropyLoss(n.score, n.label)

    return str(n.to_proto())
Now write the network structure to prototxt files. There is a pitfall here with / versus \: in code, prefer /, because \ is used as an escape character, so a path like \train\XXXX ends up reading a rain folder, since \t is interpreted as a tab. This tripped me up too; the fix is shown below (a short illustration of the pitfall follows the code):

workdir = './pascal_multilabel_with_datalayer//'
if not os.path.isdir(workdir):
    os.makedirs(workdir)

solverprototxt = tools.CaffeSolver(trainnet_prototxt_path=osp.join(workdir, "trainnet.prototxt"),
                                   testnet_prototxt_path=osp.join(workdir, "valnet.prototxt"))
solverprototxt.sp['display'] = "1"
solverprototxt.sp['base_lr'] = "0.0001"
solverprototxt.write(osp.join(workdir, 'solver.prototxt'))

# write train net.
with open(osp.join(workdir, 'trainnet.prototxt'), 'w') as f:
    # provide parameters to the data layer as a python dictionary. Easy as pie!
    data_layer_params = dict(batch_size=128, im_shape=[227, 227], split='train', pascal_root=pascal_root)
    f.write(caffenet_multilabel(data_layer_params, 'PascalMultilabelDataLayerSync'))

# write validation net.
with open(osp.join(workdir, 'valnet.prototxt'), 'w') as f:
    data_layer_params = dict(batch_size=128, im_shape=[227, 227], split='val', pascal_root=pascal_root)
    f.write(caffenet_multilabel(data_layer_params, 'PascalMultilabelDataLayerSync'))
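As a side note, a tiny illustration of the backslash pitfall mentioned above (made-up file names, purely for demonstration):

# In a normal string literal '\t' becomes a tab, silently corrupting the path:
bad = 'data\train\img.jpg'        # contains a real tab character instead of '\t'
good = 'data/train/img.jpg'       # forward slashes also work on Windows
also_ok = r'data\train\img.jpg'   # a raw string keeps the backslashes literal

print(repr(bad))                  # shows the embedded tab
print(repr(good))
print(repr(also_ok))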
Load the network structure:
solver = caffe.SGDSolver(osp.join(workdir, 'solver.prototxt'))
You should see output like the following:
BatchLoader initialized with 5717 images
PascalMultilabelDataLayerSync initialized for split: train, with bs: 128, im_shape: [227, 227].
BatchLoader initialized with 5823 images
PascalMultilabelDataLayerSync initialized for split: val, with bs: 128, im_shape: [227, 227].
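For reference, the official notebook also copies the pretrained CaffeNet weights into the solver at this point for fine-tuning; if that step is missing in your run, adding something along these lines (using the model path from the preparation step) should reproduce it:

solver.net.copy_from(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
solver.test_nets[0].share_with(solver.net)  # let the validation net reuse the training net's weights
solver.step(1)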
Oddly, when I ran the following lines the result was a completely blank image, while the official notebook shows an actual image being pulled out. Strange.

transformer = tools.SimpleTransformer()  # this is simply to add back the bias, re-shuffle the color channels to RGB, and so on...
image_index = 100  # index of the image to show within the batch
plt.figure()
plt.imshow(transformer.deprocess(copy(solver.net.blobs['data'].data[image_index, ...])))
gtlist = solver.net.blobs['label'].data[image_index, ...].astype(np.int)
plt.title('GT: {}'.format(classes[np.where(gtlist)]))
plt.axis('off');
print(classes)
This is my output; compare it against the official one yourself.
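If you hit the same blank-image issue, one quick check (my own debugging suggestion, not from the tutorial) is to look at the raw data blob; if it is all zeros the batch was never filled, otherwise the problem is on the display side:

blob = solver.net.blobs['data'].data[image_index, ...]
print(blob.shape, blob.min(), blob.max())  # all zeros would point to the data layer, not the plotting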
As the official tutorial says, we need some way to measure accuracy, and a common choice for multilabel problems is the Hamming distance (here, in fact, the fraction of matching label entries). It only takes a simple loop, as follows:

def hamming_distance(gt, est):
    return sum([1 for (g, e) in zip(gt, est) if g == e]) / float(len(gt))

def check_accuracy(net, num_batches, batch_size=128):
    acc = 0.0
    for t in range(num_batches):
        net.forward()
        gts = net.blobs['label'].data
        ests = net.blobs['score'].data > 0
        for gt, est in zip(gts, ests):  # for each ground truth and estimated label vector
            acc += hamming_distance(gt, est)
    return acc / (num_batches * batch_size)
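To see what the metric does, here is a tiny made-up example (not from the tutorial): two 5-dimensional label vectors that agree in 4 positions score 0.8.

gt = [1, 0, 0, 1, 0]    # ground-truth label vector
est = [1, 0, 1, 1, 0]   # predicted label vector
print(hamming_distance(gt, est))  # 4 of 5 entries match -> 0.8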
Train the model:
for itt in range(6):
    solver.step(100)
    print 'itt:{:3d}'.format((itt + 1) * 100), 'accuracy:{0:.4f}'.format(check_accuracy(solver.test_nets[0], 50))
Be patient here; the output will come. The cmd window shows exactly how many iterations have run, and the notebook prints a line every 100 iterations:
itt:100 accuracy:0.9239
itt:200 accuracy:0.9236
itt:300 accuracy:0.9239
itt:400 accuracy:0.9240
itt:500 accuracy:0.9237
itt:600 accuracy:0.9241
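The tutorial itself never saves the fine-tuned weights; if you want to keep them for later use, something along these lines should work (the file name is my own choice):

solver.net.save(osp.join(workdir, 'caffenet_multilabel_iter600.caffemodel'))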
Check the baseline accuracy:
def check_baseline_accuracy(net, num_batches, batch_size=128):
    acc = 0.0
    for t in range(num_batches):
        net.forward()
        gts = net.blobs['label'].data
        ests = np.zeros((batch_size, len(gts)))
        for gt, est in zip(gts, ests):  # for each ground truth and estimated label vector
            acc += hamming_distance(gt, est)
    return acc / (num_batches * batch_size)

print 'Baseline accuracy:{0:.4f}'.format(check_baseline_accuracy(solver.test_nets[0], 5823/128))
Output:
Baseline accuracy:0.9238
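The baseline (predicting all zeros) is already around 0.92 because each image typically carries only one or two positive labels out of 20, so most entries of the ground-truth vector are zero. A rough way to confirm this (my own check, assuming the 'label' blob holds the 20-dimensional 0/1 vectors):

labels = solver.test_nets[0].blobs['label'].data
print(labels.sum(axis=1).mean())  # average number of positive labels per image in the current batch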
Visualize a few results:
test_net = solver.test_nets[0]
for image_index in range(5):
    plt.figure()
    plt.imshow(transformer.deprocess(copy(test_net.blobs['data'].data[image_index, ...])))
    gtlist = test_net.blobs['label'].data[image_index, ...].astype(np.int)
    estlist = test_net.blobs['score'].data[image_index, ...] > 0
    plt.title('GT: {} \n EST: {}'.format(classes[np.where(gtlist)], classes[np.where(estlist)]))
    plt.axis('off')
The resulting figures show, for each of the five images, the ground-truth (GT) and estimated (EST) labels.
That concludes the whole program. The easiest place to slip up is in how the various paths are written, so check them carefully; when necessary, verify them from a bat script, since bat debugging rarely crashes. Next, I plan to look into what was mentioned at the beginning of this post: building a multilabel dataset in HDF5 format and training/testing with it.
Attachment
A package with the code and the files it produces (mainly the prototxt files): link: http://pan.baidu.com/s/1nvofWJf password: m1h0
Please download the dataset from the official site yourself; Thunder (Xunlei) seems to work for it and is fairly fast.