Creating new train and test datasets


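This post describes how to re-split a dataset into training and test sets by randomly shuffling the original order, so that the two sets contain a much more similar mix of scenes.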

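Previously, the train/test split simply took the first 7000 images of the dataset downloaded from the web as the training set and the remaining 2212 as the test set. The dataset was evidently made by converting video recorded while driving into images, and the scenes in the last 2000-odd images look noticeably different from those earlier in the sequence. So the whole dataset was shuffled randomly and then divided into 7000 training images and 2212 test images. Here is the code: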

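import random
import os

image_index = []

# Read one image filename per line from the index file
with open('/home/bnrc/all_image_index.txt', 'r') as f:
    for line in f:
        line = line.rstrip('\n')  # strip the newline; unlike line[:-1], this is safe if the last line has none
        if line:                  # skip blank lines
            image_index.append(line)

random.shuffle(image_index)  # shuffle in place so scenes from the whole video are mixed
for x in image_index:
    print(x)

# print(len(image_index))


for i in range(7000):
    # os.system passes the string straight to the shell, as if typed in a terminal;
    # %s is old-style string formatting that inserts the filename
    os.system('cp /home/bnrc/all_image/%s /home/bnrc/new_train/' % image_index[i])

for j in range(7000, 9212):
    os.system('cp /home/bnrc/all_image/%s /home/bnrc/new_test/' % image_index[j])

# os.system('pwd')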
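For comparison, here is a minimal sketch of the same shuffle-and-split done in pure Python without calling the shell. It is not the script above: `split_names` and `copy_split` are hypothetical helper names, a fixed `seed` is added (an assumption, not in the original) so the same split can be reproduced later, and `shutil.copy` replaces `os.system('cp ...')`, which avoids shell quoting problems with filenames containing spaces.

```python
import os
import random
import shutil
from typing import List, Tuple


def split_names(names: List[str], n_train: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of `names` and cut it into (train, test) lists."""
    if not 0 <= n_train <= len(names):
        raise ValueError("n_train must be between 0 and len(names)")
    shuffled = list(names)                 # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # seeded RNG: same seed gives the same split
    return shuffled[:n_train], shuffled[n_train:]


def copy_split(names: List[str], src_dir: str, dst_dir: str) -> None:
    """Copy each file in `names` from src_dir into dst_dir (no shell involved)."""
    os.makedirs(dst_dir, exist_ok=True)    # create the target folder if it is missing
    for name in names:
        shutil.copy(os.path.join(src_dir, name), dst_dir)
```

Usage with the paths from the script above: `train, test = split_names(image_index, 7000)`, then `copy_split(train, '/home/bnrc/all_image', '/home/bnrc/new_train')` and the same call with `test` and `/home/bnrc/new_test`. Because the test set is everything after index `n_train`, the total (9212 here) does not need to be hard-coded.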