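Introduction
I wanted to train a classification model with PyTorch, so I found a template to work from, adjusted it slightly, and added my own understanding.
1. Data preparation
At this stage we pass in batch_size and the path where the training samples are stored (image_path). The data is organized as follows: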
Data
----class1
--------image01.png
--------image02.png
--------……
----class2
--------image11.png
--------image12.png
--------……
----class3
--------image21.png
--------image22.png
--------……
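ImageFolder turns this layout into labels by treating each subfolder as one class. The following is a minimal pure-Python sketch of that mapping. It assumes torchvision's behavior of sorting folder names alphabetically before numbering them. `build_class_index` is a hypothetical helper, not part of torchvision, and the image files are empty placeholders:

```python
import json
import os
import tempfile


def build_class_index(root: str) -> dict:
    # Hypothetical helper mirroring ImageFolder's labeling rule: each
    # sub-directory of root is one class, names are sorted alphabetically,
    # then numbered 0, 1, 2, ...
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


# Throwaway tree shaped like the layout above; the image files are empty placeholders.
layout = {
    "class2": ["image11.png", "image12.png"],
    "class1": ["image01.png", "image02.png"],
    "class3": ["image21.png", "image22.png"],
}
with tempfile.TemporaryDirectory() as root:
    for cls, files in layout.items():
        os.makedirs(os.path.join(root, cls))
        for name in files:
            open(os.path.join(root, cls, name), "wb").close()
    class_to_idx = build_class_index(root)

# Invert to {index: class name}, the same direction the template writes to class_indices.json
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

# JSON object keys are always strings, so index 0 comes back as "0" after a round trip
restored = json.loads(json.dumps(idx_to_class, indent=4))

print(class_to_idx)
print(idx_to_class)
print(restored)
```

Because the sort is lexicographic, a folder named class10 would be numbered before class2. Zero-padded names such as class01 and class02 keep the indices in the intended order. The JSON round trip also shows that integer keys come back as strings, which matters when class_indices.json is read back to decode predictions.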
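Next, torch.utils.data.DataLoader packages the data into train and val sets (usage of this function is covered at the end). Data augmentation is applied at the same time: random resized crops and horizontal flips for the training set. Note that the function reads the training images from the train subfolder of image_path, so the class folders shown above are expected under image_path/train.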
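Both pipelines end with the same ToTensor and Normalize steps. Only the geometric part differs: random crop and flip for train, and a deterministic resize and center crop for val. The following sketch shows the per-pixel arithmetic those two shared steps perform. It assumes 8-bit RGB input, and the function names are illustrative, not torchvision APIs:

```python
# ImageNet per-channel statistics, the same values both pipelines pass to Normalize
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def to_tensor_value(pixel: int) -> float:
    # ToTensor scales an 8-bit value in 0..255 to a float in [0.0, 1.0]
    return pixel / 255.0


def normalize_pixel(rgb: list) -> list:
    # Normalize computes (x - mean[c]) / std[c] separately for each channel c
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


white = normalize_pixel([to_tensor_value(255)] * 3)
black = normalize_pixel([to_tensor_value(0)] * 3)
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

A pure white pixel ends up at roughly 2.25 to 2.64 per channel, and a pure black one at roughly -2.12 to -1.80. For images whose statistics resemble ImageNet's, the network therefore sees roughly zero-mean, unit-variance inputs rather than values in [0, 1].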
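import os
import json

import torch
from torchvision import datasets, transforms

# Pass in batch_size and the dataset root directory (image_path)
def train_val_data_process(batch_size: int, image_path: str):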
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
    # check whether image_path exists
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    # ImageFolder reads image_path/train and labels each image by its class subfolder
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"), transform=data_transform["train"])
    train_num = len(train_dataset)
    cl_list = train_dataset.class_to_idx  # {class name: index}
    num_classes = len(cl_list)
    print("Number of classes:", num_classes)
    # invert to {index: class name} so predicted indices can be mapped back to names
    cla_dict = dict((val, key) for key, val in cl_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else