Splitting Data into Training and Test Sets (a walkthrough of div_train_val.py)

This post describes how to split the LFW dataset into training and test data with a Python script that draws images from both classes, face and non-face, for each set.


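The LFW data is first divided into two image folders, face and non-face. The training and test data are then drawn from these two folders.

Training data: part of the images in the face folder plus part of the images in the non-face folder.

Test data: part of the images in the face folder plus part of the images in the non-face folder.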
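As a minimal sketch of how such a split can work, the snippet below mirrors the rule used in div_train_val.py: within each class, images are counted from 1 and every tenth one goes to the test (validation) list, which gives roughly a 9:1 split. The helper name `split_nine_to_one` and the file names are made up for illustration.

```python
def split_nine_to_one(filenames):
    """Return (train, val): every 10th item goes to val, the rest to train."""
    train, val = [], []
    # Count from 1, as the script does, so items 10, 20, 30, ... go to val.
    for num, name in enumerate(filenames, start=1):
        if num % 10 != 0:
            train.append(name)
        else:
            val.append(name)
    return train, val


names = ["img_%03d.jpg" % i for i in range(1, 26)]  # 25 made-up file names
train, val = split_nine_to_one(names)
print(len(train), len(val))  # 23 2
print(val)  # ['img_010.jpg', 'img_020.jpg']
```

Because the rule is applied to each class separately, both classes contribute to both lists, whatever their sizes.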

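Running div_train_val.py produces the training list (train_*.list) and the test list (val_*.list). Each line of these files holds an image path and its label.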

# -*- coding: utf-8 -*-
"""
Created on Mon Jun  8 14:15:21 2015
@brief: Split the data into training and validation lists, train.list and val.list
@author: Riwei Chen <Riwei.Chen@outlook.com>
"""
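import os


def div_database(filepath, savepath, top_num=1000, equal_num=False, full_path=False):
    '''
    @brief: Extract WebFace face data into train/val lists
    @param : filepath   directory containing one sub-directory per class
    @param : savepath   directory where the .list files are written
    @param : top_num    number of classes to keep (default 1000); for face / non-face, top_num=2
    @param : equal_num  whether to force every class (person) to have the same number of images
    @param : full_path  whether to prefix each written path with the data directory
    '''
    dirlists = os.listdir(filepath)  # sub-directories of crop_images (the image folder)
    dict_id_num = {}  # dict mapping sub-directory name -> number of files in it
    for subdir in dirlists:
        # e.g. the number of images inside the face sub-directory
        dict_id_num[subdir] = len(os.listdir(os.path.join(filepath, subdir)))
    # Sort by image count, largest first: ("face", length) -> (length, "face")
    sorted_num_id = sorted([(v, k) for k, v in dict_id_num.items()], reverse=True)
    select_ids = sorted_num_id[0:top_num]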
    # Create the training and validation list files
    suffix = '_equal' if equal_num else ''
    trainfile = os.path.join(savepath, 'train_' + str(top_num) + suffix + '.list')
    testfile = os.path.join(savepath, 'val_' + str(top_num) + suffix + '.list')
    # With full_path, prefix every written path with the data directory
    pre = filepath if full_path else ""
    # Cap used when equal_num is set: size of the smallest selected class
    min_num = select_ids[-1][0] if select_ids else 0
    # Split each class: part of face goes to training and the rest to validation,
    # and likewise for non-face. Every 10th image of a class goes to validation.
    with open(trainfile, 'w') as fid_train, open(testfile, 'w') as fid_test:
        for pid, select_id in enumerate(select_ids):  # pid is the class label
            subdir = select_id[1]
            filenamelist = os.listdir(os.path.join(filepath, subdir))  # image file names
            num = 1
            for filename in filenamelist:
                if equal_num and num > min_num:
                    break
                # One line per image: path, tab, label
                line = os.path.join(pre, subdir, filename) + '\t' + str(pid) + '\n'
                if num % 10 != 0:
                    fid_train.write(line)
                else:
                    fid_test.write(line)
                num = num + 1

if __name__=='__main__':
    data_path = '/home/zhuangni/code/FaceDetection/ReprocessData/alfw/crop_images'
    save_path = '/home/zhuangni/code/FaceDetection/Data/aflw/'
    div_database(data_path,save_path, top_num=2, equal_num=False,full_path =True)
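
Each line of the generated lists is an image path and an integer label separated by a tab, where label 0 goes to the class directory with the most images. Below is a minimal sketch for reading such a file back into memory. `read_list` is a hypothetical helper, not part of the original script, and the demo file contents are made up.

```python
import os
import tempfile


def read_list(list_path):
    """Parse 'path<TAB>label' lines into a list of (path, int_label) tuples."""
    samples = []
    with open(list_path) as fid:
        for line in fid:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the last tab: everything before it is the path.
            path, label = line.rsplit("\t", 1)
            samples.append((path, int(label)))
    return samples


# Demo on a throwaway file written in the same format as the script's output.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "train_2.list")
    with open(demo, "w") as fid:
        fid.write("face/a.jpg\t0\n")
        fid.write("non-face/b.jpg\t1\n")
    samples = read_list(demo)

print(samples)  # [('face/a.jpg', 0), ('non-face/b.jpg', 1)]
```

Which class receives which label depends on the image counts in each folder, so it is worth checking a few lines of the real list files before training on them.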


