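Starting with this post, we document the full process of using a fully convolutional network (FCN) for semantic segmentation.
Code: https://github.com/shelhamer/fcn.berkeleyvision.org
This post covers three topics:
1. The public datasets provided officially
2. How to prepare your own dataset, mainly how to annotate labels
3. How to colorize the results once training is finished
Public datasets
Here we look at the SIFT Flow dataset and the PASCAL VOC dataset in turn.
1. PASCAL VOC
According to the PASCAL readme in the data folder of the FCN code: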
# PASCAL VOC and SBD

PASCAL VOC is a standard recognition dataset and benchmark with detection and semantic segmentation challenges.
The semantic segmentation challenge annotates 20 object classes and background.
The Semantic Boundary Dataset (SBD) is a further annotation of the PASCAL VOC data that provides more semantic segmentation and instance segmentation masks.

PASCAL VOC has a private test set and [leaderboard for semantic segmentation](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6).

The train/val/test splits of PASCAL VOC segmentation challenge and SBD diverge.
Most notably VOC 2011 segval intersects with SBD train.
Care must be taken for proper evaluation by excluding images from the train or val splits.

We train on the 8,498 images of SBD train.
We validate on the non-intersecting set defined in the included `seg11valid.txt`.

Refer to `classes.txt` for the listing of classes in model output order.
Refer to `../voc_layers.py` for the Python data layer for this dataset.

See the dataset sites for download:

- PASCAL VOC 2012: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
- SBD: see [homepage](http://home.bharathh.info/home/sbd) or [direct download](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz)
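We can download the training set (SBD) and the test set (PASCAL VOC 2012).
Then go into fcn/data, create an sbdd folder if it does not exist, extract the dataset from the benchmark archive into sbdd, and extract VOC2012 into the pascal folder under data. The two folders already come with train.txt for training and seg11valid.txt for testing.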
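The README's warning about overlapping splits can be checked with a simple set difference. The following is a minimal sketch, not part of the FCN repository: `remove_overlap` is a hypothetical helper and the ids are made-up stand-ins for the lines of train.txt and a VOC val list.

```python
def remove_overlap(val_ids, train_ids):
    """Keep only the validation ids that never appear in the training split."""
    train_set = set(train_ids)
    return [image_id for image_id in val_ids if image_id not in train_set]

# Made-up ids; in practice these would be read from the split files, one id per line.
train_ids = ["2008_000002", "2008_000003", "2008_000007"]
val_ids = ["2008_000003", "2008_000009"]

clean_val = remove_overlap(val_ids, train_ids)
print(clean_val)
```

The included seg11valid.txt already plays this role for VOC 2011 segval and SBD train, so a check like this is mainly useful when you assemble your own splits.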
2. SIFT-Flow
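Download the dataset: download link.
Extract it under /fcn.berkeleyvision.org/data/, overwriting the folder named sift-flow.
The FCN source code already provides train.txt and the related files, so there is no need to regenerate them.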
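Preparing your own dataset
Training your own FCN image segmentation model takes roughly three steps:
1. Create labels for your data;
2. Split your data into train, val and test sets;
3. Write your own input data layer, modeled on voc_layers.py.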
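For step 3, the core of a data layer is turning an image and its label into arrays of the shape Caffe expects. The sketch below mirrors the preprocessing in the infer.py code shown later in this post (RGB to BGR, mean subtraction, channel-first layout); I assume a custom data layer would do the same. `preprocess_image` and `preprocess_label` are hypothetical names, and the Caffe layer class itself is left out.

```python
import numpy as np

# Per-channel BGR mean, the same values that infer.py subtracts.
MEAN_BGR = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32)

def preprocess_image(rgb):
    """H x W x 3 RGB array -> 3 x H x W float32 BGR array with the mean subtracted."""
    in_ = np.array(rgb, dtype=np.float32)
    in_ = in_[:, :, ::-1]            # RGB -> BGR
    in_ = in_ - MEAN_BGR             # subtract the channel means
    return in_.transpose((2, 0, 1))  # H x W x C -> C x H x W

def preprocess_label(label):
    """H x W label array -> 1 x H x W uint8 array."""
    return np.array(label, dtype=np.uint8)[np.newaxis, ...]

# Toy inputs standing in for a loaded image and its label map.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
label = np.zeros((4, 6), dtype=np.uint8)
print(preprocess_image(rgb).shape, preprocess_label(label).shape)
```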
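FCN places no restriction on image size. If the images in a dataset differ in size, only one image can be trained at a time; this is the default in the FCN code, i.e. batch_size=1. For batch training, every image in the dataset must have the same size, so we need to resize them. Typically the originals are scaled to 256*256 or 500*500.
1. Resizing images
Here are a few resize functions found online: http://blog.csdn.net/u010402786/article/details/72883421
(1) Resizing a single image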
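from PIL import Image

def convert(width, height):
    # Resize one image and overwrite the original file.
    im = Image.open("C:\\xxx\\test.jpg")
    # Image.LANCZOS is the filter formerly exposed as Image.ANTIALIAS.
    out = im.resize((width, height), Image.LANCZOS)
    out.save("C:\\xxx\\test.jpg")

if __name__ == '__main__':
    convert(256, 256)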
(2) Resizing every image in a folder
import os
from PIL import Image

def convert(dir_path, width, height):
    file_list = os.listdir(dir_path)
    print(file_list)
    for filename in file_list:
        path = os.path.join(dir_path, filename)
        im = Image.open(path)
        # Use the requested size rather than a hard-coded (256, 256).
        out = im.resize((width, height), Image.LANCZOS)
        print("%s has been resized!" % filename)
        out.save(path)

if __name__ == '__main__':
    dir_path = input('please input the operate dir:')  # raw_input on Python 2
    convert(dir_path, 256, 256)
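(3) Resizing proportionally
from PIL import Image

def convert(width, height):
    # Scale to the target width and keep the aspect ratio; height is not used.
    im = Image.open("C:\\workspace\\PythonLearn1\\test_1.jpg")
    (x, y) = im.size
    x_s = width
    y_s = y * x_s // x  # integer division, since resize() expects integer sizes
    out = im.resize((x_s, y_s), Image.LANCZOS)
    out.save("C:\\workspace\\PythonLearn1\\test_1_out.jpg")

if __name__ == '__main__':
    convert(256, 256)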
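The functions above target the input photos. If the label images are resized too, a smoothing filter would blend neighboring class ids into values that belong to no class, so nearest-neighbor resampling is the usual choice there. A minimal sketch, assuming labels are stored as single-channel images; `resize_label` is a hypothetical helper.

```python
import numpy as np
from PIL import Image

def resize_label(label_img, width, height):
    """Resize a label image without creating new class ids."""
    return label_img.resize((width, height), Image.NEAREST)

# Toy 4 x 4 label map containing class ids 0 and 15.
label = np.zeros((4, 4), dtype=np.uint8)
label[:, 2:] = 15

resized = resize_label(Image.fromarray(label), 8, 8)
print(resized.size, np.unique(np.array(resized)))
```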
Creating image labels
Step 1: annotate with an open-source tool from GitHub
URL: https://github.com/wkentaro/labelme
Usage
Annotation
Run labelme --help for detail.
labelme # Open GUI
labelme static/apc2016_obj3.jpg # Specify file
labelme static/apc2016_obj3.jpg -O static/apc2016_obj3.json # Close window after the save
The annotations are saved as a JSON file. The file includes the image itself.
Visualization
To view the json file quickly, you can use utility script:
labelme_draw_json static/apc2016_obj3.json
Convert to Dataset
To convert the json to set of image and label, you can run following:
labelme_json_to_dataset static/apc2016_obj3.json
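Step 2: colorize the annotated label.png
After the JSON file produced by the annotation tool is converted to a dataset, a label.png file is generated. It is a 16-bit grayscale image.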
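If label.png is indeed stored with 16 bits per pixel, it can help to confirm that every id fits into 8 bits before going further, since the later steps work with 8-bit images. A minimal sketch on an in-memory array; `to_uint8_labels` is a hypothetical helper, not something labelme provides.

```python
import numpy as np

def to_uint8_labels(label):
    """Cast a label array to uint8 after checking that every id fits in 8 bits."""
    label = np.asarray(label)
    if label.min() < 0 or label.max() > 255:
        raise ValueError("label ids must lie in [0, 255]")
    return label.astype(np.uint8)

# Toy 16-bit label map standing in for the contents of label.png.
label16 = np.array([[0, 1], [2, 1]], dtype=np.uint16)
label8 = to_uint8_labels(label16)
print(label8.dtype, np.unique(label8))
```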
We therefore need to colorize it with the colors used for VOC segmentation, and the colors must be exact. Matlab code:
function cmap = labelcolormap(N)
if nargin==0
    N=256;
end
cmap = zeros(N,3);
for i=1:N
    id = i-1; r=0; g=0; b=0;
    for j=0:7
        r = bitor(r, bitshift(bitget(id,1),7 - j));
        g = bitor(g, bitshift(bitget(id,2),7 - j));
        b = bitor(b, bitshift(bitget(id,3),7 - j));
        id = bitshift(id,-3);
    end
    cmap(i,1)=r; cmap(i,2)=g; cmap(i,3)=b;
end
cmap = cmap / 255;
Or the Python code:
import numpy as np

# Get the specified bit value
def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)

# Create label-color map, label --- [R G B]
# 0 --- [ 0 0 0], 1 --- [128 0 0], 2 --- [ 0 128 0]
# 3 --- [128 128 0], 4 --- [ 0 0 128], 5 --- [128 0 128]
# 6 --- [ 0 128 128], 7 --- [128 128 128], 8 --- [ 64 0 0]
# 9 --- [192 0 0], 10 --- [ 64 128 0], 11 --- [192 128 0]
# 12 --- [ 64 0 128], 13 --- [192 0 128], 14 --- [ 64 128 128]
# 15 --- [192 128 128], 16 --- [ 0 64 0], 17 --- [128 64 0]
# 18 --- [ 0 192 0], 19 --- [128 192 0], 20 --- [ 0 64 128]
def labelcolormap(N=256):
    color_map = np.zeros((N, 3))
    for n in range(N):
        id_num = n
        r, g, b = 0, 0, 0
        for pos in range(8):
            r = np.bitwise_or(r, (bitget(id_num, 0) << (7-pos)))
            g = np.bitwise_or(g, (bitget(id_num, 1) << (7-pos)))
            b = np.bitwise_or(b, (bitget(id_num, 2) << (7-pos)))
            id_num = (id_num >> 3)
        color_map[n, 0] = r
        color_map[n, 1] = g
        color_map[n, 2] = b
    return color_map/255

if __name__ == "__main__":
    color_map = labelcolormap(21)
    print(color_map)
The code above produces the following matrix (Python output shown):
[[ 0.          0.          0.        ]
 [ 0.50196078  0.          0.        ]
 [ 0.          0.50196078  0.        ]
 [ 0.50196078  0.50196078  0.        ]
 [ 0.          0.          0.50196078]
 [ 0.50196078  0.          0.50196078]
 [ 0.          0.50196078  0.50196078]
 [ 0.50196078  0.50196078  0.50196078]
 [ 0.25098039  0.          0.        ]
 [ 0.75294118  0.          0.        ]
 [ 0.25098039  0.50196078  0.        ]
 [ 0.75294118  0.50196078  0.        ]
 [ 0.25098039  0.          0.50196078]
 [ 0.75294118  0.          0.50196078]
 [ 0.25098039  0.50196078  0.50196078]
 [ 0.75294118  0.50196078  0.50196078]
 [ 0.          0.25098039  0.        ]
 [ 0.50196078  0.25098039  0.        ]
 [ 0.          0.75294118  0.        ]
 [ 0.50196078  0.75294118  0.        ]
 [ 0.          0.25098039  0.50196078]]
These rows correspond to the PASCAL VOC colormap:
background 0 0 0
aeroplane 128 0 0
bicycle 0 128 0
bird 128 128 0
boat 0 0 128
bottle 128 0 128
bus 0 128 128
car 128 128 128
cat 64 0 0
chair 192 0 0
cow 64 128 0
diningtable 192 128 0
dog 64 0 128
horse 192 0 128
motorbike 64 128 128
person 192 128 128
pottedplant 0 64 0
sheep 128 64 0
sofa 0 192 0
train 128 192 0
tvmonitor 0 64 128
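Here a function generates the color for each label, where the labels are 0, 1, 2, …, 20 (PASCAL VOC has 21 classes including background).
The pixel values in the label.png produced in step 1 are these same ids, 0, 1, 2, …, 20; at most 256 distinct values are possible. The file is normally stored as a grayscale image.
We therefore need to use this colormap to convert the grayscale image above into an RGB image.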
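Before turning to the skimage-based methods below, the conversion itself can be sketched as a plain NumPy lookup: build the colormap as a uint8 table and index it with the label array. This is my own minimal sketch (`voc_colormap` and `label_to_rgb` are hypothetical names), using the same bit scheme as labelcolormap. A lookup like this colors each id independently of the others; label2rgb, used in the two methods below, appears to hand out colors in the order of the labels present in an image, so the two should agree only when the ids are consecutive from 0.

```python
import numpy as np

def voc_colormap(n=256):
    """Same bit-interleaving scheme as labelcolormap, returned as uint8 RGB rows."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for label in range(n):
        id_num, r, g, b = label, 0, 0, 0
        for pos in range(8):
            r |= ((id_num >> 0) & 1) << (7 - pos)
            g |= ((id_num >> 1) & 1) << (7 - pos)
            b |= ((id_num >> 2) & 1) << (7 - pos)
            id_num >>= 3
        cmap[label] = (r, g, b)
    return cmap

def label_to_rgb(label, cmap):
    """Look up a color for every pixel: H x W ids -> H x W x 3 uint8."""
    return cmap[np.asarray(label, dtype=np.intp)]

# Toy label map with background, aeroplane, person and tvmonitor ids.
label = np.array([[0, 1], [15, 20]], dtype=np.uint8)
rgb = label_to_rgb(label, voc_colormap(21))
print(rgb.shape, rgb[1, 0])
```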
Method 1: modifying skimage's colormap
skimage already ships with some colormaps, but none in the PASCAL VOC format, so we have to specify ours separately.
Locate the following path:
/*/anaconda2/lib/python2.7/site-packages/skimage/color/
Edit colorlabel.py and add
DEFAULT_COLORS1 = ('maroon', 'lime', 'olive', 'navy', 'purple', 'teal',
                   'gray', 'fcncat', 'fcnchair', 'fcncow', 'fcndining',
                   'fcndog', 'fcnhorse', 'fcnmotor', 'fcnperson', 'fcnpotte',
                   'fcnsheep', 'fcnsofa', 'fcntrain', 'fcntv')
Then change the _label2rgb_overlay function:
if colors is None:
    colors = DEFAULT_COLORS1
Finally, add the following variables to rgb_colors.py:
fcnchair = (0.753, 0, 0)
fcncat = (0.251, 0, 0)
fcncow = (0.251, 0.502, 0)
fcndining = (0.753, 0.502, 0)
fcndog = (0.251, 0, 0.502)
fcnhorse = (0.753, 0, 0.502)
fcnmotor = (0.251, 0.502, 0.502)
fcnperson = (0.753, 0.502, 0.502)
fcnpotte = (0, 0.251, 0)
fcnsheep = (0.502, 0.251, 0)
fcnsofa = (0, 0.753, 0)
fcntrain = (0.502, 0.753, 0)
fcntv = (0, 0.251, 0.502)
If that is too much trouble, simply download: https://github.com/315386775/FCN_train
Then replace skimage's color folder with skimge-color from Add_colortoimg.
Finally, run the conversion:
#!/usr/bin/python
# -*- coding:utf-8 -*-
import PIL.Image
import numpy as np
from skimage import io, data, color, img_as_ubyte
import matplotlib.pyplot as plt

img = PIL.Image.open('xxx.png')
img = np.array(img)
dst = color.label2rgb(img, bg_label=0, bg_color=(0, 0, 0))
# label2rgb returns floats in [0, 1]; convert to 8-bit before saving.
io.imsave('xxx.png', img_as_ubyte(dst))
Method 2: without modifying the source code
#!/usr/bin/python
# -*- coding:utf-8 -*-
import PIL.Image
import numpy as np
from skimage import io, data, color, img_as_ubyte

# Get the specified bit value
def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)

# Create label-color map, label --- [R G B]
# 0 --- [ 0 0 0], 1 --- [128 0 0], 2 --- [ 0 128 0]
# 3 --- [128 128 0], 4 --- [ 0 0 128], 5 --- [128 0 128]
# 6 --- [ 0 128 128], 7 --- [128 128 128], 8 --- [ 64 0 0]
# 9 --- [192 0 0], 10 --- [ 64 128 0], 11 --- [192 128 0]
# 12 --- [ 64 0 128], 13 --- [192 0 128], 14 --- [ 64 128 128]
# 15 --- [192 128 128], 16 --- [ 0 64 0], 17 --- [128 64 0]
# 18 --- [ 0 192 0], 19 --- [128 192 0], 20 --- [ 0 64 128]
def labelcolormap(N=256):
    color_map = np.zeros((N, 3))
    for n in range(N):
        id_num = n
        r, g, b = 0, 0, 0
        for pos in range(8):
            r = np.bitwise_or(r, (bitget(id_num, 0) << (7-pos)))
            g = np.bitwise_or(g, (bitget(id_num, 1) << (7-pos)))
            b = np.bitwise_or(b, (bitget(id_num, 2) << (7-pos)))
            id_num = (id_num >> 3)
        color_map[n, 0] = r
        color_map[n, 1] = g
        color_map[n, 2] = b
    return color_map/255

color_map = labelcolormap(21)
img = PIL.Image.open('label.png')
img = np.array(img)
# color_map[1:] skips the background color, which is passed separately as bg_color.
dst = color.label2rgb(img, colors=color_map[1:], bg_label=0, bg_color=(0, 0, 0))
# label2rgb returns floats in [0, 1]; convert to 8-bit before saving.
io.imsave('xxx.png', img_as_ubyte(dst))
This method loads the colormap directly, which is simpler and clearer.
Note that method 1 changes part of the colormap. For example, the second color in DEFAULT_COLORS1 should be (0 128 0), i.e. (0, 0.502, 0), which skimage calls green, but lime = (0, 1, 0) is used here instead. The difference is small, though.
Step 3: the most critical step
Convert the 24-bit PNG into an 8-bit PNG. Here is the Matlab code:
dirs=dir('F:/xxx/*.png');
map =labelcolormap(256);
for n=1:numel(dirs)
    strname=strcat('F:/xxx/',dirs(n).name);
    img=imread(strname);
    x=rgb2ind(img,map);
    newname=strcat('F:/xxx/',dirs(n).name);
    imwrite(x,map,newname,'png');
end
At this point we have generated the 8-bit color images.
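For readers without Matlab, an 8-bit indexed PNG can also be written from Python with PIL, going straight from the label array instead of through a 24-bit RGB image. This is a sketch under that assumption, not a port of the Matlab script; `voc_palette` and `save_indexed_png` are hypothetical names. It relies on the same fromarray plus putpalette combination that the inference code later in this post uses.

```python
import io
import numpy as np
from PIL import Image

def voc_palette(n=256):
    """Flat [r0, g0, b0, r1, g1, b1, ...] list built with the labelcolormap bit scheme."""
    palette = []
    for label in range(n):
        id_num, r, g, b = label, 0, 0, 0
        for pos in range(8):
            r |= ((id_num >> 0) & 1) << (7 - pos)
            g |= ((id_num >> 1) & 1) << (7 - pos)
            b |= ((id_num >> 2) & 1) << (7 - pos)
            id_num >>= 3
        palette.extend((r, g, b))
    return palette

def save_indexed_png(label, fp):
    """Write an H x W label array as an 8-bit palettized PNG."""
    im = Image.fromarray(np.asarray(label, dtype=np.uint8))  # mode "L"
    im.putpalette(voc_palette())                             # becomes mode "P"
    im.save(fp, format="PNG")

# Toy label map; an in-memory buffer stands in for a file on disk.
label = np.array([[0, 1], [2, 1]], dtype=np.uint8)
buf = io.BytesIO()
save_indexed_png(label, buf)
buf.seek(0)
reloaded = Image.open(buf)
print(reloaded.mode, np.unique(np.array(reloaded)))
```

Passing a file path instead of the buffer writes the PNG to disk.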
Note that we can read a generated image back and check whether the output below matches what a VOC label gives.
In [23]: img = PIL.Image.open('F:/DL/000001_json/test/dstfcn.png')
In [24]: np.unique(img)
Out[24]: array([0, 1, 2], dtype=uint8)
The thing to look for is [0, 1, 2]: if the output consists of class ids like this, the label was generated successfully.
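The same check can be automated over a whole dataset. The sketch below reports any id outside the expected classes; `check_label_values` is a hypothetical helper, and treating 255 as an allowed "ignore" value is my assumption based on the VOC convention for void pixels.

```python
import numpy as np

def check_label_values(label, num_classes, ignore_label=255):
    """Return the unexpected ids found in a label array (an empty list means OK)."""
    allowed = set(range(num_classes)) | {ignore_label}
    return sorted(set(np.unique(label).tolist()) - allowed)

# Toy arrays standing in for np.array(PIL.Image.open(...)) of a label PNG.
good = np.array([[0, 1], [2, 1]], dtype=np.uint8)
bad = np.array([[0, 7], [2, 255]], dtype=np.uint8)
print(check_label_values(good, 3), check_label_values(bad, 3))
```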
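So far we have gone through these stages: generate the grayscale label image -> generate the colormap -> convert to RGB -> convert to 8-bit indexed color.
Next, we need to prepare the following files for training:
test.txt is the test set, train.txt the training set, val.txt the validation set, and trainval.txt the training and validation sets combined.
Here we can follow the proportions used with Faster R-CNN: in VOC2007, trainval is roughly 50% of the whole dataset and test roughly the other 50%; train is roughly 50% of trainval and val roughly the other 50%. The following code can serve as a reference:
Reference: http://blog.csdn.net/sinat_30071459/article/details/50723212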
%%
% This script uses the generated xml files to create trainval.txt, train.txt, test.txt and val.txt for a VOC2007-style dataset.
% trainval is 50% of the whole dataset and test is 50%; train is 50% of trainval and val is 50% of trainval.
% Adjust these percentages to your own dataset; with a small dataset, test and val can be smaller.
%%
% Remember to change the four values below
xmlfilepath='E:\Annotations';
txtsavepath='E:\ImageSets\Main\';
trainval_percent=0.5; % fraction of the whole dataset used for trainval; the rest is test
train_percent=0.5; % fraction of trainval used for train; the rest is val

%%
xmlfile=dir(xmlfilepath);
numOfxml=length(xmlfile)-2; % subtract . and ..; total dataset size

trainval=sort(randperm(numOfxml,floor(numOfxml*trainval_percent)));
test=sort(setdiff(1:numOfxml,trainval));

trainvalsize=length(trainval); % size of trainval
train=sort(trainval(randperm(trainvalsize,floor(trainvalsize*train_percent))));
val=sort(setdiff(trainval,train));

ftrainval=fopen([txtsavepath 'trainval.txt'],'w');
ftest=fopen([txtsavepath 'test.txt'],'w');
ftrain=fopen([txtsavepath 'train.txt'],'w');
fval=fopen([txtsavepath 'val.txt'],'w');

for i=1:numOfxml
    if ismember(i,trainval)
        fprintf(ftrainval,'%s\n',xmlfile(i+2).name(1:end-4));
        if ismember(i,train)
            fprintf(ftrain,'%s\n',xmlfile(i+2).name(1:end-4));
        else
            fprintf(fval,'%s\n',xmlfile(i+2).name(1:end-4));
        end
    else
        fprintf(ftest,'%s\n',xmlfile(i+2).name(1:end-4));
    end
end
fclose(ftrainval);
fclose(ftrain);
fclose(fval);
fclose(ftest);
This script relies on the xml files, however; for segmentation we can simply use the image folder directly.
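A Python counterpart that starts from image file names rather than xml files might look like the sketch below. It is not a line-by-line port of the Matlab script: `split_dataset` and the toy file names are hypothetical, and the fixed seed is only there to make the split repeatable.

```python
import os
import random

def split_dataset(names, trainval_percent=0.5, train_percent=0.5, seed=0):
    """Split names into trainval/test, then split trainval into train/val."""
    names = sorted(names)
    rng = random.Random(seed)
    trainval = set(rng.sample(names, int(len(names) * trainval_percent)))
    train = set(rng.sample(sorted(trainval), int(len(trainval) * train_percent)))
    return {
        "trainval": sorted(trainval),
        "test": sorted(set(names) - trainval),
        "train": sorted(train),
        "val": sorted(trainval - train),
    }

# Toy file names; in practice use os.listdir() on the image folder.
files = ["%06d.jpg" % i for i in range(1, 9)]
names = [os.path.splitext(f)[0] for f in files]  # strip the extension

splits = split_dataset(names)
for key in ("trainval", "train", "val", "test"):
    print(key, splits[key])
```

Each list can then be written to the matching .txt file, one name per line.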
Colorizing test results
This step mainly comes down to modifying infer.py.
Method 1:
import numpy as np
from PIL import Image
import caffe

# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('pascal/VOC2010/JPEGImages/2007_000129.jpg')
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((104.00698793,116.66876762,122.67891434))
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('voc-fcn8s/deploy.prototxt', 'voc-fcn8s/fcn8s-heavy-pascal.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()
out = net.blobs['score'].data[0].argmax(axis=0)

# store the class ids as an 8-bit image and attach the VOC palette
arr = out.astype(np.uint8)
im = Image.fromarray(arr)

palette = []
for i in range(256):
    palette.extend((i,i,i))
palette[:3*21] = np.array([[0, 0, 0],[128, 0, 0],[0, 128, 0],[128, 128, 0],[0, 0, 128],[128, 0, 128],[0, 128, 128],[128, 128, 128],[64, 0, 0],[192, 0, 0],[64, 128, 0],[192, 128, 0],[64, 0, 128],[192, 0, 128],[64, 128, 128],[192, 128, 128],[0, 64, 0],[128, 64, 0],[0, 192, 0],[128, 192, 0],[0, 64, 128]], dtype='uint8').flatten().tolist()
im.putpalette(palette)
im.show()
im.save('test.png')
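The argmax step is what turns the network output into a label map: the score blob holds one channel per class, and the index of the largest channel at each pixel is the predicted class id. A small sketch with made-up scores, so it runs without Caffe:

```python
import numpy as np

# Made-up scores for 21 classes on a 2 x 3 image (C x H x W).
scores = np.zeros((21, 2, 3), dtype=np.float32)
scores[15, 0, :] = 1.0   # class 15 wins on the first row
scores[8, 1, :] = 2.0    # class 8 wins on the second row

# Same operation as in the script above: highest-scoring class per pixel.
out = scores.argmax(axis=0).astype(np.uint8)
print(out.shape)
print(out)
```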
Alternatively, use the same approach as in data preparation:
import numpy as np
from PIL import Image
import caffe
from skimage.io import imsave  # scipy.misc.imsave is no longer available in current SciPy
from skimage.color import label2rgb

# Get the specified bit value
def bitget(byteval, idx):
    return ((byteval & (1 << idx)) != 0)

# Create label-color map, label --- [R G B]
# 0 --- [ 0 0 0], 1 --- [128 0 0], 2 --- [ 0 128 0]
# 3 --- [128 128 0], 4 --- [ 0 0 128], 5 --- [128 0 128]
# 6 --- [ 0 128 128], 7 --- [128 128 128], 8 --- [ 64 0 0]
# 9 --- [192 0 0], 10 --- [ 64 128 0], 11 --- [192 128 0]
# 12 --- [ 64 0 128], 13 --- [192 0 128], 14 --- [ 64 128 128]
# 15 --- [192 128 128], 16 --- [ 0 64 0], 17 --- [128 64 0]
# 18 --- [ 0 192 0], 19 --- [128 192 0], 20 --- [ 0 64 128]
def labelcolormap(N=256):
    color_map = np.zeros((N, 3))
    for n in range(N):
        id_num = n
        r, g, b = 0, 0, 0
        for pos in range(8):
            r = np.bitwise_or(r, (bitget(id_num, 0) << (7-pos)))
            g = np.bitwise_or(g, (bitget(id_num, 1) << (7-pos)))
            b = np.bitwise_or(b, (bitget(id_num, 2) << (7-pos)))
            id_num = (id_num >> 3)
        color_map[n, 0] = r
        color_map[n, 1] = g
        color_map[n, 2] = b
    return color_map  # values stay in 0..255 here (no division by 255)

def main():
    # load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
    im = Image.open('data/pascal/VOCdevkit/VOC2012/JPEGImages/2007_000346.jpg')
    in_ = np.array(im, dtype=np.float32)
    in_ = in_[:,:,::-1]
    in_ -= np.array((104.00698793,116.66876762,122.67891434))
    in_ = in_.transpose((2,0,1))

    # load net
    net = caffe.Net('voc-fcn8s/deploy.prototxt', 'ilsvrc-nets/fcn8s-heavy-pascal.caffemodel', caffe.TEST)
    # shape for input (data blob is N x C x H x W), set data
    net.blobs['data'].reshape(1, *in_.shape)
    net.blobs['data'].data[...] = in_
    # run net and take argmax for prediction
    net.forward()
    out = net.blobs['score'].data[0].argmax(0).astype(np.uint8)

    # colorize the prediction and force the background to black
    color_map = labelcolormap(21)
    label_mask = label2rgb(out, colors=color_map[1:], bg_label=0)
    label_mask[out == 0] = [0, 0, 0]
    imsave('data/pascal/VOCdevkit/VOC2012/JPEGImages/test_prediction.png', label_mask.astype(np.uint8))

if __name__ == '__main__':
    main()