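The links below cover converting a dataset to VOC2007 format for Faster-RCNN training.
- Setting up the PASCAL VOC dataset format for the RCNN family of experiments
- Building a VOC2007 dataset for Faster-RCNN training
- Converting a dataset to VOC2007 format for Faster-RCNN training
  This article explains in detail how to create every file of a VOC2007 dataset and includes the related tools and code; it is worth reading.
- Downloading and extracting the VOC2007 dataset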
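The next group of links covers how the py-faster-rcnn source code builds the imdb and roidb.
- faster-rcnn: how the training data is prepared, and how imdb and roidb are built
- Faster R-CNN source walkthrough (4): the data types in imdb.py and pascal_voc.py (mainly an explanation of the imdb and roidb data types)
- py-faster-rcnn source walkthrough series (2): pascal_voc.py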
A brief note on imdb and roidb.
imdb is an image database class that the author wrote as a wrapper. Among other things it holds the name of the database, for example voc_2007_trainval by default. Part of its definition is shown here:

    import os
    import os.path as osp

    from fast_rcnn.config import cfg


    class imdb(object):
        """Image database."""

        def __init__(self, name):
            self._name = name
            self._num_classes = 0
            self._classes = []
            self._image_index = []
            self._obj_proposer = 'selective_search'
            self._roidb = None
            # A "handle" is normally a pointer; here it is just a reference to a method.
            self._roidb_handler = self.default_roidb
            # Use this dict for storing dataset specific config options
            self.config = {}

        @property
        def name(self):
            return self._name

        @property
        def num_classes(self):
            return len(self._classes)

        @property
        def classes(self):
            return self._classes

        @property
        def image_index(self):
            return self._image_index

        @property
        def roidb_handler(self):
            return self._roidb_handler

        @roidb_handler.setter
        def roidb_handler(self, val):
            self._roidb_handler = val

        # Important: this selects how the roidb will be built.
        def set_proposal_method(self, method):
            # eval evaluates the string as an expression. With method == 'gt' it
            # resolves to the bound method self.gt_roidb (defined in pascal_voc.py);
            # the method is not called here, only stored as the handler.
            method = eval('self.' + method + '_roidb')
            self.roidb_handler = method

        @property
        def roidb(self):
            # A roidb is a list of dictionaries, each with the following keys:
            #   boxes
            #   gt_overlaps
            #   gt_classes
            #   flipped
            if self._roidb is not None:
                return self._roidb
            self._roidb = self.roidb_handler()
            return self._roidb

        # Cache directory, e.g.
        # /data1/caiyong.wang/program/py-faster-rcnn.back_up/data/cache
        @property
        def cache_path(self):
            cache_path = osp.abspath(osp.join(cfg.DATA_DIR, 'cache'))
            if not os.path.exists(cache_path):
                os.makedirs(cache_path)
            return cache_path

        @property
        def num_images(self):
            return len(self.image_index)

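The two mechanisms in this class that are easiest to miss are the lazy `roidb` property and the lookup of the handler by name. The following is a minimal sketch in Python 3, not the project's code: `MiniImdb`, its toy `gt_roidb`, and the `calls` counter are hypothetical names introduced for illustration, and `getattr` is swapped in for `eval` because it performs the same attribute lookup without evaluating arbitrary strings.

```python
class MiniImdb:
    """Toy stand-in for imdb: a lazily built roidb with a swappable handler."""

    def __init__(self, name):
        self._name = name
        self._roidb = None
        self._roidb_handler = self.gt_roidb
        self.calls = 0  # hypothetical counter: how often a handler really ran

    def gt_roidb(self):
        # Hypothetical handler returning one fake image entry.
        self.calls += 1
        return [{'boxes': [[0, 0, 9, 9]], 'flipped': False}]

    def set_proposal_method(self, method):
        # Same lookup as eval('self.' + method + '_roidb'), done with getattr.
        # The handler is only stored here; it is not called yet.
        self._roidb_handler = getattr(self, method + '_roidb')

    @property
    def roidb(self):
        # Build the roidb on first access, then reuse the cached list.
        if self._roidb is None:
            self._roidb = self._roidb_handler()
        return self._roidb


db = MiniImdb('toy_trainval')
db.set_proposal_method('gt')
calls_before_access = db.calls
first = db.roidb
second = db.roidb
print(calls_before_access, db.calls, first is second)
```

Setting the proposal method does no work by itself; the handler runs once, on the first read of `roidb`, and later reads return the same list object.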
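The class contains a roidb. The name stands for "ROI database", and it is essentially the set of bounding boxes used for object detection. As the code above shows, roidb is a list of dictionaries with one entry per image. Each entry holds the information for all boxes on that image, which is why the fields below are arrays with one row per box:

    # A roidb is a list of dictionaries, each with the following keys:
    # - boxes: a 2-D array; each row stores xmin ymin xmax ymax, and the row
    #   index is the index of the box
    # - gt_classes: the class index of each box (the class tuple is declared
    #   in the constructor)
    # - gt_overlaps: a 2-D array; rows are boxes and there are 21 columns
    #   (20 VOC classes plus background) holding 0.0 or 1.0. The column of
    #   the box's own class is 1.0: the candidate boxes here are the
    #   ground-truth boxes themselves, so the overlap with their own class
    #   is 1 and the overlap with every other class is set to 0. The array
    #   is then converted to a sparse matrix.
    # - seg_areas: the area of each box
    # - flipped: False means the image has not been flipped (train.py later
    #   appends the flipped images, and this flag tells the two apart)
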
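To make the layout of one roidb entry concrete, here is a small Python 3 sketch that builds such a dictionary for ground-truth boxes. It is an illustration under stated assumptions: `make_gt_entry` is a hypothetical helper, the class tuple is a shortened stand-in for the 21 VOC classes, and plain lists replace the NumPy arrays and sparse matrix that the real code uses.

```python
# Shortened, hypothetical class tuple; VOC2007 has 21 entries including background.
CLASSES = ('__background__', 'aeroplane', 'bicycle', 'bird')


def make_gt_entry(boxes, class_names):
    """Build one roidb-style entry from ground-truth boxes (plain lists)."""
    class_to_ind = {name: i for i, name in enumerate(CLASSES)}
    gt_classes = [class_to_ind[name] for name in class_names]

    # One row per box, one column per class; 1.0 only in the box's own class.
    gt_overlaps = [[0.0] * len(CLASSES) for _ in boxes]
    for row, cls in zip(gt_overlaps, gt_classes):
        row[cls] = 1.0

    # Inclusive pixel coordinates, hence the +1 on each side length.
    seg_areas = [(x2 - x1 + 1) * (y2 - y1 + 1) for x1, y1, x2, y2 in boxes]

    return {
        'boxes': boxes,
        'gt_classes': gt_classes,
        'gt_overlaps': gt_overlaps,
        'seg_areas': seg_areas,
        'flipped': False,
    }


entry = make_gt_entry([[10, 20, 59, 99], [0, 0, 9, 9]], ['bird', 'bicycle'])
print(entry['gt_classes'])
print(entry['gt_overlaps'])
print(entry['seg_areas'])
```

A full roidb is simply a list of such entries, one per image in `image_index`.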
The main code that builds the imdb and roidb is:

    def get_roidb(imdb_name, rpn_file=None):
        imdb = get_imdb(imdb_name)  # fetch the dataset object through the factory
        print 'Loaded dataset `{:s}` for training'.format(imdb.name)
        imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD)
        print 'Set proposal method: {:s}'.format(cfg.TRAIN.PROPOSAL_METHOD)
        if rpn_file is not None:
            imdb.config['rpn_file'] = rpn_file
        roidb = get_training_roidb(imdb)  # build the training data
        return roidb, imdb

The steps are:
1. Obtain a pascal_voc object through lib/datasets/factory.py.
2. pascal_voc inherits from imdb and defines further functions specific to this dataset.
Some example members of a pascal_voc object:

    # Member variables after initialization:
    # {
    #   year: '2007'
    #   name: 'voc_2007_trainval'
    #   image_set: 'trainval'
    #   devkit_path: 'data/VOCdevkit2007'
    #   data_path: 'data/VOCdevkit2007/VOC2007'
    #   classes: (…)            modify this to train on your own data
    #   class_to_ind: {…}       a dict mapping each class name to its index
    #   image_ext: '.jpg'
    #   image_index: ['000001', '000003', ……]   image indices read from trainval.txt
    #   roidb_handler: <Method gt_roidb>
    #   salt: <Object uuid>
    #   comp_id: 'comp4'
    #   config: {…}
    # }

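The members `class_to_ind`, `image_index`, and the per-image path are all derived from a few inputs. The sketch below (Python 3) shows one way to derive them. The class names `cat` and `dog`, the helper names, and the in-memory stand-in for trainval.txt are assumptions made for illustration; the `JPEGImages` folder name follows the standard VOC directory layout.

```python
import io
import posixpath

# Hypothetical custom classes; index 0 is kept for the background class.
classes = ('__background__', 'cat', 'dog')
class_to_ind = dict(zip(classes, range(len(classes))))


def load_image_index(fileobj):
    """Read one image identifier per line, skipping blank lines."""
    return [line.strip() for line in fileobj if line.strip()]


def image_path_from_index(data_path, index, image_ext='.jpg'):
    """Build the image path under the VOC-style JPEGImages folder."""
    return posixpath.join(data_path, 'JPEGImages', index + image_ext)


# In-memory stand-in for ImageSets/Main/trainval.txt.
fake_trainval = io.StringIO("000001\n000003\n")
image_index = load_image_index(fake_trainval)
first_path = image_path_from_index('data/VOCdevkit2007/VOC2007', image_index[0])
print(class_to_ind)
print(image_index)
print(first_path)
```

When training on your own data, the class tuple is the only one of these three that has to be edited by hand; the other two follow from the files on disk.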
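3. Select how the roidb is built through set_proposal_method in lib/datasets/imdb.py. The roidb itself is created later, the first time imdb.roidb is read.

    def set_proposal_method(self, method):
        # eval evaluates the string as an expression; with method == 'gt'
        # it resolves to self.gt_roidb, which is defined in pascal_voc.py.
        method = eval('self.' + method + '_roidb')
        self.roidb_handler = method

Here this amounts to eval('self.gt_roidb'), that is, the method self.gt_roidb defined in pascal_voc.py, which assembles the information about the ground-truth boxes.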
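The ground-truth information that gt_roidb assembles comes from the VOC annotation XML files. As a rough sketch of that step, and not the project's own loader, the Python 3 snippet below parses the object boxes from an annotation with the standard library. `parse_objects`, the sample XML, and the three-class mapping are hypothetical; the shift to 0-based pixel coordinates reflects that VOC annotations are 1-based.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened annotation in the VOC XML layout.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(xml_text, class_to_ind):
    """Return (boxes, gt_classes) for every <object> in one annotation."""
    root = ET.fromstring(xml_text)
    boxes, gt_classes = [], []
    for obj in root.findall('object'):
        bbox = obj.find('bndbox')
        # VOC coordinates start at 1; subtract 1 to make them 0-based.
        box = [int(bbox.find(tag).text) - 1
               for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(box)
        gt_classes.append(class_to_ind[obj.find('name').text.lower().strip()])
    return boxes, gt_classes


class_to_ind = {'__background__': 0, 'dog': 1, 'person': 2}
boxes, gt_classes = parse_objects(SAMPLE_XML, class_to_ind)
print(boxes)
print(gt_classes)
```

A class name that is missing from `class_to_ind` raises a `KeyError` here, which is the usual symptom of forgetting to update the class tuple for a custom dataset.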
4. Obtain the training data through the following function in lib/fast_rcnn/train.py.

    def get_training_roidb(imdb):
        """Returns a roidb (Region of Interest database) for use in training."""
        if cfg.TRAIN.USE_FLIPPED:
            print 'Appending horizontally-flipped training examples...'
            imdb.append_flipped_images()
            print 'done'

        print 'Preparing training data...'
        rdl_roidb.prepare_roidb(imdb)
        print 'done'

        return imdb.roidb

This function flips the images, which in practice means flipping the boxes (see append_flipped_images in lib/datasets/imdb.py).
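Flipping a box horizontally only changes its x coordinates: the old right edge becomes the new left edge, mirrored about the image width. The Python 3 sketch below shows that arithmetic for 0-based inclusive coordinates. `flip_boxes` is a hypothetical helper written for illustration and works on plain lists rather than NumPy arrays.

```python
def flip_boxes(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes horizontally in an image of given width."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        new_x1 = width - x2 - 1  # old right edge becomes the new left edge
        new_x2 = width - x1 - 1  # old left edge becomes the new right edge
        assert new_x2 >= new_x1
        flipped.append([new_x1, y1, new_x2, y2])
    return flipped


boxes = [[10, 20, 59, 99]]
flipped = flip_boxes(boxes, width=100)
restored = flip_boxes(flipped, width=100)
print(flipped)
print(restored)
```

Applying the flip twice returns the original boxes, which is a convenient sanity check when adapting this logic to a custom dataset.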
It also calls prepare_roidb from lib/roi_data_layer/roidb.py (imported as rdl_roidb). This function enriches the roidb: for every image it adds the image path and image size, and for every box it adds the maximum overlap and the label that the maximum overlap belongs to.

    import numpy as np
    import PIL.Image


    def prepare_roidb(imdb):
        """Enrich the imdb's roidb by adding some derived quantities that
        are useful for training. This function precomputes the maximum
        overlap, taken over ground-truth boxes, between each ROI and
        each ground-truth box. The class with maximum overlap is also
        recorded.
        """
        sizes = [PIL.Image.open(imdb.image_path_at(i)).size
                 for i in xrange(imdb.num_images)]
        roidb = imdb.roidb
        # If imdb.append_flipped_images was called earlier, imdb.image_index
        # has already doubled in length at this point.
        for i in xrange(len(imdb.image_index)):
            roidb[i]['image'] = imdb.image_path_at(i)
            roidb[i]['width'] = sizes[i][0]
            roidb[i]['height'] = sizes[i][1]
            # need gt_overlaps as a dense array for argmax
            gt_overlaps = roidb[i]['gt_overlaps'].toarray()
            # max overlap with gt over classes (columns)
            max_overlaps = gt_overlaps.max(axis=1)
            # gt class that had the max overlap
            max_classes = gt_overlaps.argmax(axis=1)
            roidb[i]['max_classes'] = max_classes
            roidb[i]['max_overlaps'] = max_overlaps
            # sanity checks
            # max overlap of 0 => class should be zero (background)
            zero_inds = np.where(max_overlaps == 0)[0]
            assert all(max_classes[zero_inds] == 0)
            # max overlap > 0 => class should not be zero (must be a fg class)
            nonzero_inds = np.where(max_overlaps > 0)[0]
            assert all(max_classes[nonzero_inds] != 0)
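
The row-wise max and argmax in prepare_roidb can be followed by hand on a tiny example. This Python 3 sketch uses plain lists instead of NumPy arrays; `max_overlap_and_class` and the three-class overlap table are hypothetical and exist only to show what max_overlaps and max_classes contain.

```python
def max_overlap_and_class(gt_overlaps):
    """For each box (row), return its largest overlap and that column's index."""
    max_overlaps, max_classes = [], []
    for row in gt_overlaps:
        best = max(row)
        max_overlaps.append(best)
        # index() returns the first position of the maximum, like argmax.
        max_classes.append(row.index(best))
    return max_overlaps, max_classes


# Hypothetical table: rows are boxes, columns are (background, class 1, class 2).
gt_overlaps = [
    [0.0, 0.0, 1.0],  # a ground-truth box of class 2
    [0.0, 0.7, 0.0],  # a proposal overlapping a class-1 box by 0.7
    [0.0, 0.0, 0.0],  # a proposal overlapping nothing
]
max_overlaps, max_classes = max_overlap_and_class(gt_overlaps)

# The same sanity checks as in the quoted function, written for lists.
for overlap, cls in zip(max_overlaps, max_classes):
    if overlap == 0:
        assert cls == 0   # no overlap: must be background
    else:
        assert cls != 0   # some overlap: must be a foreground class
print(max_overlaps)
print(max_classes)
```

A box whose row is all zeros gets class 0 because the first maximum sits in the background column, which is exactly what the first sanity check relies on.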