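Caffe: assorted bugs when creating your own dataset
This article is reposted from: http://blog.youkuaiyun.com/dcxhun3/article/details/51966921
So far, the bugs have mostly come up when create_imagenet.sh (taken from examples/imagenet) generates the lmdb data.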
bug 1 mkdir *_val_lmdb failed
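This usually happens because the target directory already exists at the specified path, which causes a conflict. At first I deleted it by hand every time this happened. Eventually I realized how silly that was: the cleanup can simply be added to create_imagenet.sh: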
- rm -rf $EXAMPLE/mytask_train_lmdb
- rm -rf $EXAMPLE/mytask_val_lmdb
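The same cleanup can be done from Python, for example when the list files and the lmdb are produced from one driver script. This is only a minimal sketch: `remove_stale_lmdb` is a hypothetical helper (not part of Caffe), and the directory names are assumed to match the ones used in the two `rm -rf` lines.

```python
import os
import shutil


def remove_stale_lmdb(example_dir, names=("mytask_train_lmdb", "mytask_val_lmdb")):
    """Delete leftover lmdb directories so they can be created again.

    Returns the list of directories that were actually removed.
    """
    removed = []
    for name in names:
        path = os.path.join(example_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)  # same effect as `rm -rf` on this directory
            removed.append(path)
    return removed
```

Calling `remove_stale_lmdb("examples/mytask")` before the conversion step mirrors `$EXAMPLE` in create_imagenet.sh; directories that do not exist are simply skipped.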
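bug 2 Images not found at the specified path: could not open or find file
In the first case, I had generated the txt label file from the Windows cmd prompt, so the paths in it used backslashes, which I had not noticed. The simplest fix is to open the txt file and replace every backslash with a forward slash. Alternatively, run make_list.py under Linux and the problem does not occur at all.
The second case puzzled me for a long time: the paths were clearly correct, so why were the files still not found? It turned out that the Python escape character \t was to blame. The separator between the image name and the label was being written as \t; once I replaced it with a plain space ' ', everything worked: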
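If the list file is large, the replacement can be scripted instead of done in an editor. A minimal sketch, assuming no file name legitimately contains a backslash; `normalize_list_line` and `normalize_list_file` are hypothetical helper names:

```python
def normalize_list_line(line):
    """Replace Windows backslashes with forward slashes in one list line."""
    return line.replace("\\", "/")


def normalize_list_file(path_in, path_out):
    """Rewrite a label list so every image path uses forward slashes."""
    with open(path_in) as fin, open(path_out, "w") as fout:
        for line in fin:
            fout.write(normalize_list_line(line))
```

For example, `normalize_list_line("cat\\001.bmp 0\n")` returns `"cat/001.bmp 0\n"`. Writing to a separate output file keeps the original list around in case something goes wrong.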
- #fout.write('%s\t%d\n'%(image_list[i][0], image_list[i][1]))
- fout.write('%s%s%d\n'%(image_list[i][0], ' ',image_list[i][1]))#space not \t
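Since generating the lmdb takes a long time, it can be worth checking the list file for both problems before starting. A minimal sketch, assuming every line should look like `relative/path<space>integer label` with paths relative to the image root; `check_list_file` is a hypothetical helper, not a Caffe tool:

```python
import os


def check_list_file(list_path, image_root):
    """Return (line_number, reason) for every suspicious line in a list file."""
    problems = []
    with open(list_path) as fin:
        for lineno, raw in enumerate(fin, 1):
            line = raw.rstrip("\r\n")
            if not line:
                continue  # ignore empty lines
            if "\t" in line:
                problems.append((lineno, "contains a tab"))
                continue
            if "\\" in line:
                problems.append((lineno, "contains a backslash"))
                continue
            # Split on the last space: everything before it is the image path.
            name, sep, label = line.rpartition(" ")
            if not sep or not label.isdigit():
                problems.append((lineno, "expected '<path> <label>'"))
            elif not os.path.isfile(os.path.join(image_root, name)):
                problems.append((lineno, "image not found"))
    return problems
```

An empty result means no line tripped any of the checks. Splitting on the last space is a design choice here so that file names containing spaces are still handled.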
With everything correct, lmdb generation starts. The dataset is fairly large (378430 images), so this step takes a while.
Code 1
make_list.py
- import os
- import random
- import argparse
- def list_image(root, recursive, exts):
-     """Collect (relative_path, label) pairs for the images under root."""
-     image_list = []
-     if recursive:
-         cat = {}  # maps each directory to an integer label
-         for path, subdirs, files in os.walk(root, True):
-             print(path)
-             for fname in files:
-                 fpath = os.path.join(path, fname)
-                 suffix = os.path.splitext(fname)[1].lower()
-                 if os.path.isfile(fpath) and (suffix in exts):
-                     if path not in cat:
-                         cat[path] = len(cat)
-                     image_list.append((os.path.relpath(fpath, root), cat[path]))
-     else:
-         for fname in os.listdir(root):
-             fpath = os.path.join(root, fname)
-             suffix = os.path.splitext(fname)[1].lower()
-             if os.path.isfile(fpath) and (suffix in exts):
-                 image_list.append((os.path.relpath(fpath, root), 0))
-     return image_list
- def write_list(path_out, image_list):
-     """Write one 'path label' line per image."""
-     with open(path_out, 'w') as fout:
-         for fname, label in image_list:
-             # Separate name and label with a space, not \t:
-             # fout.write('%s\t%d\n' % (fname, label)) caused bug 2.
-             fout.write('%s %d\n' % (fname, label))
- def make_list(prefix_out, root, recursive, exts, num_chunks, train_ratio):
-     image_list = list_image(root, recursive, exts)
-     random.shuffle(image_list)
-     N = len(image_list)
-     # Integer (ceiling) division; '//' keeps chunk_size an int on Python 2 and 3.
-     chunk_size = (N + num_chunks - 1) // num_chunks
-     for i in range(num_chunks):
-         chunk = image_list[i*chunk_size:(i+1)*chunk_size]
-         if num_chunks > 1:
-             str_chunk = '_%d' % i
-         else:
-             str_chunk = ''
-         if train_ratio < 1:
-             sep = int(chunk_size*train_ratio)
-             write_list(prefix_out+str_chunk+'_train.txt', chunk[:sep])
-             write_list(prefix_out+str_chunk+'_val.txt', chunk[sep:])
-         else:
-             write_list(prefix_out+str_chunk+'.txt', chunk)
- def str2bool(value):
-     # argparse's type=bool treats any non-empty string (even 'False') as True.
-     return value.lower() in ('1', 'true', 'yes')
- def main():
-     parser = argparse.ArgumentParser(
-         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
-         description='Make image list files that are '
-                     'required by im2rec')
-     parser.add_argument('root', help='path to folder that contain images.')
-     parser.add_argument('prefix', help='prefix of output list files.')
-     # nargs='+' takes space-separated extensions; type=list would split a string into characters.
-     parser.add_argument('--exts', nargs='+', default=['.bmp'],
-                         help='list of acceptable image extensions.')
-     parser.add_argument('--chunks', type=int, default=1, help='number of chunks.')
-     parser.add_argument('--train_ratio', type=float, default=1.0,
-                         help='Percent of images to use for training.')
-     parser.add_argument('--recursive', type=str2bool, default=True,
-                         help='If true recursively walk through subdirs and assign an unique label '
-                              'to images in each folder. Otherwise only include images in the root folder '
-                              'and give them label 0.')
-     args = parser.parse_args()
-     make_list(args.prefix, args.root, args.recursive,
-               args.exts, args.chunks, args.train_ratio)
- if __name__ == '__main__':
-     main()
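The chunking and the train/val split in make_list come down to a ceiling division followed by a slice. The sketch below mirrors that arithmetic without writing any files; `split_chunks` is a hypothetical helper, and the numbers used in the note after it are made up for illustration.

```python
def split_chunks(items, num_chunks, train_ratio):
    """Mirror make_list's arithmetic: ceil-divide into chunks, then split each."""
    n = len(items)
    chunk_size = (n + num_chunks - 1) // num_chunks  # ceiling division
    sep = int(chunk_size * train_ratio)  # split point, computed from the full chunk size
    result = []
    for i in range(num_chunks):
        chunk = items[i * chunk_size:(i + 1) * chunk_size]
        result.append((chunk[:sep], chunk[sep:]))
    return result
```

With 10 items, 3 chunks and a train ratio of 0.8, the chunk size is 4 and the split point is 3, so the (train, val) sizes come out as (3, 1), (3, 1), (2, 0). The last chunk is shorter than the others, and because the split point is computed from the full chunk size, its val part can end up empty.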
create_imagenet.sh
- #!/usr/bin/env sh
- # Create the imagenet lmdb inputs
- # N.B. set the path to the imagenet train + val data dirs
- EXAMPLE=examples/mytask
- DATA=/mnt/hgfs/caffe
- TOOLS=build/tools
- TRAIN_DATA_ROOT=/mnt/hgfs/caffe/train/
- VAL_DATA_ROOT=/mnt/hgfs/caffe/val/
- # Set RESIZE=true to resize the images to 256x256. Leave as false if images have
- # already been resized using another tool.
- RESIZE=true
- if $RESIZE; then
-   RESIZE_HEIGHT=256
-   RESIZE_WIDTH=256
- else
-   RESIZE_HEIGHT=0
-   RESIZE_WIDTH=0
- fi
- if [ ! -d "$TRAIN_DATA_ROOT" ]; then
-   echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
-   echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
-        "where the ImageNet training data is stored."
-   exit 1
- fi
- if [ ! -d "$VAL_DATA_ROOT" ]; then
-   echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
-   echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
-        "where the ImageNet validation data is stored."
-   exit 1
- fi
- echo "Creating train lmdb..."
- rm -rf $EXAMPLE/mytask_train_lmdb
- rm -rf $EXAMPLE/mytask_val_lmdb
- GLOG_logtostderr=1 $TOOLS/convert_imageset \
-     --resize_height=$RESIZE_HEIGHT \
-     --resize_width=$RESIZE_WIDTH \
-     --shuffle \
-     $TRAIN_DATA_ROOT \
-     $DATA/train.txt \
-     $EXAMPLE/mytask_train_lmdb
- echo "Train lmdb done!"
- echo "Creating val lmdb..."
- GLOG_logtostderr=1 $TOOLS/convert_imageset \
-     --resize_height=$RESIZE_HEIGHT \
-     --resize_width=$RESIZE_WIDTH \
-     --shuffle \
-     $VAL_DATA_ROOT \
-     $DATA/val.txt \
-     $EXAMPLE/mytask_val_lmdb
- echo "val lmdb done!"
- echo "Done."