What are training set, validation set and test set?

This article introduces the concepts and roles of the training set, validation set, and test set in machine learning. The training set is used to fit model parameters, the validation set to tune model structure and guard against overfitting, and the test set to assess the model's generalization ability.


These three terms appear constantly in the machine learning literature, yet many people are unclear about what they mean, and the latter two in particular are often conflated. Ripley, B.D. (1996) defines all three in his classic monograph *Pattern Recognition and Neural Networks*:
- **Training set**: A set of examples used for learning, which is to fit the parameters [i.e., weights] of the classifier.
- **Validation set**: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network.
- **Test set**: A set of examples used only to assess the performance [generalization] of a fully specified classifier.
Clearly, the training set is used to train the model, i.e., to determine its parameters, such as the weights of an ANN. The validation set is used for model selection — the final tuning and choice of the model, such as an ANN's architecture. The test set exists purely to measure the generalization ability of the fully trained model. Of course, a good test score does not guarantee the model is correct; it only indicates that similar data would produce similar results under this model. In practice, datasets are often split into just two parts, a training set and a test set, and many papers never involve a validation set at all. A minimal sketch of such a three-way split follows.
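For concreteness, here is a minimal sketch of a three-way split using scikit-learn's `train_test_split` (the 60/20/20 ratio is an illustrative assumption, not something Ripley prescribes): applying it twice carves a validation set out of the non-test portion.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold out the test set; it stays untouched until the final evaluation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)  # 0.25 x 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # ~60% / 20% / 20% of the data
```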
Ripley also addresses why the test and validation sets should be kept separate:
1. The error rate estimate of the final model on validation data will be biased (smaller than the true error rate) since the validation set is used to select the final model — the short simulation after this list demonstrates the effect.
2. After assessing the final model with the test set, YOU MUST NOT tune the model any further.
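Point 1 is easy to demonstrate with a small simulation (an illustrative sketch with synthetic numbers, not part of Ripley's text). Fifty candidate models all have the same true accuracy; picking the one with the best validation score makes that score look better than the truth, while the held-out test set gives an unbiased reading.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_val, n_test = 50, 200, 200
true_acc = 0.70  # every candidate model is equally good

# Accuracy measured on a finite set is a noisy estimate of true accuracy.
val_acc = rng.binomial(n_val, true_acc, size=n_models) / n_val
test_acc = rng.binomial(n_test, true_acc, size=n_models) / n_test

best = np.argmax(val_acc)  # model selection using the validation set
print(f"validation accuracy of selected model: {val_acc[best]:.3f}")  # optimistic
print(f"test accuracy of the same model:       {test_acc[best]:.3f}")  # ~= 0.70
```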

```python
from data import *
from utils.augmentations import SSDAugmentation, BaseTransform
from utils.functions import MovingAverage, SavePath
from utils.logger import Log
from utils import timer
from layers.modules import MultiBoxLoss
from yolact import Yolact

import os
import sys
import time
import math, random
from pathlib import Path

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.optim as optim
import torch.backends.cudnn as cudnn
import torch.nn.init as init
import torch.utils.data as data
import numpy as np
import argparse
import datetime

# Oof
import eval as eval_script

def str2bool(v):
    return v.lower() in ("yes", "true", "t", "1")


parser = argparse.ArgumentParser(description='Yolact Training Script')
parser.add_argument('--batch_size', default=2, type=int,
                    help='Batch size for training')
parser.add_argument('--resume', default=None, type=str,
                    help='Checkpoint state_dict file to resume training from. If this is "interrupt"'
                         ', the model will resume training from the interrupt file.')
parser.add_argument('--start_iter', default=-1, type=int,
                    help='Resume training at this iter. If this is -1, the iteration will be '
                         'determined from the file name.')
parser.add_argument('--num_workers', default=0, type=int,
                    help='Number of workers used in dataloading')
parser.add_argument('--cuda', default=True, type=str2bool,
                    help='Use CUDA to train model')
parser.add_argument('--lr', '--learning_rate', default=None, type=float,
                    help='Initial learning rate. Leave as None to read this from the config.')
parser.add_argument('--momentum', default=None, type=float,
                    help='Momentum for SGD. Leave as None to read this from the config.')
parser.add_argument('--decay', '--weight_decay', default=None, type=float,
                    help='Weight decay for SGD. Leave as None to read this from the config.')
parser.add_argument('--gamma', default=None, type=float,
                    help='For each lr step, what to multiply the lr by. Leave as None to read this from the config.')
parser.add_argument('--save_folder', default='weights/',
                    help='Directory for saving checkpoint models.')
parser.add_argument('--log_folder', default='logs/',
                    help='Directory for saving logs.')
parser.add_argument('--config', default=None,
                    help='The config object to use.')
parser.add_argument('--save_interval', default=10000, type=int,
                    help='The number of iterations between saving the model.')
parser.add_argument('--validation_size', default=5000, type=int,
                    help='The number of images to use for validation.')
parser.add_argument('--validation_epoch', default=2, type=int,
                    help='Output validation information every n iterations. If -1, do no validation.')
parser.add_argument('--keep_latest', dest='keep_latest', action='store_true',
                    help='Only keep the latest checkpoint instead of each one.')
parser.add_argument('--keep_latest_interval', default=100000, type=int,
                    help='When --keep_latest is on, don\'t delete the latest file at these intervals. This should be a multiple of save_interval or 0.')
parser.add_argument('--dataset', default=None, type=str,
                    help='If specified, override the dataset specified in the config with this one (example: coco2017_dataset).')
parser.add_argument('--no_log', dest='log', action='store_false',
                    help='Don\'t log per iteration information into log_folder.')
parser.add_argument('--log_gpu', dest='log_gpu', action='store_true',
                    help='Include GPU information in the logs. Nvidia-smi tends to be slow, so set this with caution.')
parser.add_argument('--no_interrupt', dest='interrupt', action='store_false',
                    help='Don\'t save an interrupt when KeyboardInterrupt is caught.')
parser.add_argument('--batch_alloc', default=None, type=str,
                    help='If using multiple GPUS, you can set this to be a comma separated list detailing which GPUs should get what local batch size (It should add up to your total batch size).')
parser.add_argument('--no_autoscale', dest='autoscale', action='store_false',
                    help='YOLACT will automatically scale the lr and the number of iterations depending on the batch size. Set this if you want to disable that.')
parser.set_defaults(keep_latest=False, log=True, log_gpu=False, interrupt=True, autoscale=True)

args = parser.parse_args()

if args.config is not None:
    set_cfg(args.config)

if args.dataset is not None:
    set_dataset(args.dataset)

if args.autoscale and args.batch_size != 8:
    factor = args.batch_size / 8
    if __name__ == '__main__':
        print('Scaling parameters by %.2f to account for a batch size of %d.' % (factor, args.batch_size))

    cfg.lr *= factor
    cfg.max_iter //= factor
    cfg.lr_steps = [x // factor for x in cfg.lr_steps]

# Update training parameters from the config if necessary
def replace(name):
    if getattr(args, name) == None:
        setattr(args, name, getattr(cfg, name))
replace('lr')
replace('decay')
replace('gamma')
replace('momentum')

# This is managed by set_lr
cur_lr = args.lr

if torch.cuda.device_count() == 0:
    print('No GPUs detected. Exiting...')
    exit(-1)

if args.batch_size // torch.cuda.device_count() < 6:
    if __name__ == '__main__':
        print('Per-GPU batch size is less than the recommended limit for batch norm. Disabling batch norm.')
    cfg.freeze_bn = True

loss_types = ['B', 'C', 'M', 'P', 'D', 'E', 'S', 'I']

if torch.cuda.is_available():
    if args.cuda:
        torch.set_default_tensor_type('torch.cuda.FloatTensor')
    if not args.cuda:
        print("WARNING: It looks like you have a CUDA device, but aren't " +
              "using CUDA.\nRun with --cuda for optimal training speed.")
        torch.set_default_tensor_type('torch.FloatTensor')
else:
    torch.set_default_tensor_type('torch.FloatTensor')

class NetLoss(nn.Module):
    """
    A wrapper for running the network and computing the loss.
    This is so we can more efficiently use DataParallel.
    """

    def __init__(self, net: Yolact, criterion: MultiBoxLoss):
        super().__init__()
        self.net = net
        self.criterion = criterion

    def forward(self, images, targets, masks, num_crowds):
        preds = self.net(images)
        losses = self.criterion(self.net, preds, targets, masks, num_crowds)
        return losses

class CustomDataParallel(nn.DataParallel):
    """
    This is a custom version of DataParallel that works better with our training data.
    It should also be faster than the general case.
    """

    def scatter(self, inputs, kwargs, device_ids):
        # More like scatter and data prep at the same time. The point is we prep the data in such a way
        # that no scatter is necessary, and there's no need to shuffle stuff around different GPUs.
        devices = ['cuda:' + str(x) for x in device_ids]
        splits = prepare_data(inputs[0], devices, allocation=args.batch_alloc)

        return [[split[device_idx] for split in splits] for device_idx in range(len(devices))], \
            [kwargs] * len(devices)

    def gather(self, outputs, output_device):
        out = {}

        for k in outputs[0]:
            out[k] = torch.stack([output[k].to(output_device) for output in outputs])

        return out

def train():
    if not os.path.exists(args.save_folder):
        os.mkdir(args.save_folder)

    dataset = COCODetection(image_path=cfg.dataset.train_images,
                            info_file=cfg.dataset.train_info,
                            transform=SSDAugmentation(MEANS))

    if args.validation_epoch > 0:
        setup_eval()
        val_dataset = COCODetection(image_path=cfg.dataset.valid_images,
                                    info_file=cfg.dataset.valid_info,
                                    transform=BaseTransform(MEANS))

    # Parallel wraps the underlying module, but when saving and loading we don't want that
    yolact_net = Yolact()
    net = yolact_net
    net.train()

    if args.log:
        log = Log(cfg.name, args.log_folder, dict(args._get_kwargs()),
                  overwrite=(args.resume is None), log_gpu_stats=args.log_gpu)

    # I don't use the timer during training (I use a different timing method).
    # Apparently there's a race condition with multiple GPUs, so disable it just to be safe.
    timer.disable_all()

    # Both of these can set args.resume to None, so do them before the check
    if args.resume == 'interrupt':
        args.resume = SavePath.get_interrupt(args.save_folder)
    elif args.resume == 'latest':
        args.resume = SavePath.get_latest(args.save_folder, cfg.name)

    if args.resume is not None:
        print('Resuming training, loading {}...'.format(args.resume))
        yolact_net.load_weights(args.resume)

        if args.start_iter == -1:
            args.start_iter = SavePath.from_str(args.resume).iteration
    else:
        print('Initializing weights...')
        yolact_net.init_weights(backbone_path=args.save_folder + cfg.backbone.path)

    optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=args.momentum,
                          weight_decay=args.decay)
    criterion = MultiBoxLoss(num_classes=cfg.num_classes,
                             pos_threshold=cfg.positive_iou_threshold,
                             neg_threshold=cfg.negative_iou_threshold,
                             negpos_ratio=cfg.ohem_negpos_ratio)

    if args.batch_alloc is not None:
        args.batch_alloc = [int(x) for x in args.batch_alloc.split(',')]
        if sum(args.batch_alloc) != args.batch_size:
            print('Error: Batch allocation (%s) does not sum to batch size (%s).' % (args.batch_alloc, args.batch_size))
            exit(-1)

    net = CustomDataParallel(NetLoss(net, criterion))
    if args.cuda:
        net = net.cuda()

    # Initialize everything
    if not cfg.freeze_bn:
        yolact_net.freeze_bn()  # Freeze bn so we don't kill our means
    yolact_net(torch.zeros(1, 3, cfg.max_size, cfg.max_size).cuda())
    if not cfg.freeze_bn:
        yolact_net.freeze_bn(True)

    # loss counters
    loc_loss = 0
    conf_loss = 0
    iteration = max(args.start_iter, 0)
    last_time = time.time()

    epoch_size = len(dataset) // args.batch_size
    num_epochs = math.ceil(cfg.max_iter / epoch_size)

    # Which learning rate adjustment step are we on? lr' = lr * gamma ^ step_index
    step_index = 0

    data_loader = data.DataLoader(dataset, args.batch_size,
                                  num_workers=args.num_workers,
                                  shuffle=True, collate_fn=detection_collate,
                                  pin_memory=True)

    save_path = lambda epoch, iteration: SavePath(cfg.name, epoch, iteration).get_path(root=args.save_folder)
    time_avg = MovingAverage()

    global loss_types  # Forms the print order
    loss_avgs = {k: MovingAverage(100) for k in loss_types}

    print('Begin training!')
    print()
    # try-except so you can use ctrl+c to save early and stop training
    try:
        for epoch in range(num_epochs):
            # Resume from start_iter
            if (epoch + 1) * epoch_size < iteration:
                continue

            for datum in data_loader:
                # Stop if we've reached an epoch if we're resuming from start_iter
                if iteration == (epoch + 1) * epoch_size:
                    break

                # Stop at the configured number of iterations even if mid-epoch
                if iteration == cfg.max_iter:
                    break

                # Change a config setting if we've reached the specified iteration
                changed = False
                for change in cfg.delayed_settings:
                    if iteration >= change[0]:
                        changed = True
                        cfg.replace(change[1])

                        # Reset the loss averages because things might have changed
                        for avg in loss_avgs.values():
                            avg.reset()

                # If a config setting was changed, remove it from the list so we don't keep checking
                if changed:
                    cfg.delayed_settings = [x for x in cfg.delayed_settings if x[0] > iteration]

                # Warm up by linearly interpolating the learning rate from some smaller value
                if cfg.lr_warmup_until > 0 and iteration <= cfg.lr_warmup_until:
                    set_lr(optimizer, (args.lr - cfg.lr_warmup_init) * (iteration / cfg.lr_warmup_until) + cfg.lr_warmup_init)

                # Adjust the learning rate at the given iterations, but also if we resume from past that iteration
                while step_index < len(cfg.lr_steps) and iteration >= cfg.lr_steps[step_index]:
                    step_index += 1
                    set_lr(optimizer, args.lr * (args.gamma ** step_index))

                # Zero the grad to get ready to compute gradients
                optimizer.zero_grad()

                # Forward Pass + Compute loss at the same time (see CustomDataParallel and NetLoss)
                losses = net(datum)

                losses = {k: (v).mean() for k, v in losses.items()}  # Mean here because Dataparallel
                loss = sum([losses[k] for k in losses])
                # no_inf_mean removes some components from the loss, so make sure to backward through all of it
                # all_loss = sum([v.mean() for v in losses.values()])

                # Backprop
                loss.backward()  # Do this to free up vram even if loss is not finite
                if torch.isfinite(loss).item():
                    optimizer.step()

                # Add the loss to the moving average for bookkeeping
                for k in losses:
                    loss_avgs[k].add(losses[k].item())

                cur_time = time.time()
                elapsed = cur_time - last_time
                last_time = cur_time

                # Exclude graph setup from the timing information
                if iteration != args.start_iter:
                    time_avg.add(elapsed)

                if iteration % 10 == 0:
                    eta_str = str(datetime.timedelta(seconds=(cfg.max_iter - iteration) * time_avg.get_avg())).split('.')[0]

                    total = sum([loss_avgs[k].get_avg() for k in losses])
                    loss_labels = sum([[k, loss_avgs[k].get_avg()] for k in loss_types if k in losses], [])

                    print(('[%3d] %7d ||' + (' %s: %.3f |' * len(losses)) + ' T: %.3f || ETA: %s || timer: %.3f')
                          % tuple([epoch, iteration] + loss_labels + [total, eta_str, elapsed]), flush=True)

                if args.log:
                    precision = 5
                    loss_info = {k: round(losses[k].item(), precision) for k in losses}
                    loss_info['T'] = round(loss.item(), precision)

                    if args.log_gpu:
                        log.log_gpu_stats = (iteration % 10 == 0)  # nvidia-smi is sloooow

                    log.log('train', loss=loss_info, epoch=epoch, iter=iteration,
                            lr=round(cur_lr, 10), elapsed=elapsed)

                    log.log_gpu_stats = args.log_gpu

                iteration += 1

                if iteration % args.save_interval == 0 and iteration != args.start_iter:
                    if args.keep_latest:
                        latest = SavePath.get_latest(args.save_folder, cfg.name)

                    print('Saving state, iter:', iteration)
                    yolact_net.save_weights(save_path(epoch, iteration))

                    if args.keep_latest and latest is not None:
                        if args.keep_latest_interval <= 0 or iteration % args.keep_latest_interval != args.save_interval:
                            print('Deleting old save...')
                            os.remove(latest)

            # This is done per epoch
            if args.validation_epoch > 0:
                if epoch % args.validation_epoch == 0 and epoch > 0:
                    compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None)

        # Compute validation mAP after training is finished
        compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None)
    except KeyboardInterrupt:
        if args.interrupt:
            print('Stopping early. Saving network...')

            # Delete previous copy of the interrupted network so we don't spam the weights folder
            SavePath.remove_interrupt(args.save_folder)

            yolact_net.save_weights(save_path(epoch, repr(iteration) + '_interrupt'))
        exit()

    yolact_net.save_weights(save_path(epoch, iteration))


def set_lr(optimizer, new_lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = new_lr

    global cur_lr
    cur_lr = new_lr

def gradinator(x):
    x.requires_grad = False
    return x

def prepare_data(datum, devices: list = None, allocation: list = None):
    with torch.no_grad():
        if devices is None:
            devices = ['cuda:0'] if args.cuda else ['cpu']
        if allocation is None:
            allocation = [args.batch_size // len(devices)] * (len(devices) - 1)
            allocation.append(args.batch_size - sum(allocation))  # The rest might need more/less

        images, (targets, masks, num_crowds) = datum

        cur_idx = 0
        for device, alloc in zip(devices, allocation):
            for _ in range(alloc):
                images[cur_idx] = gradinator(images[cur_idx].to(device))
                targets[cur_idx] = gradinator(targets[cur_idx].to(device))
                masks[cur_idx] = gradinator(masks[cur_idx].to(device))
                cur_idx += 1

        if cfg.preserve_aspect_ratio:
            # Choose a random size from the batch
            _, h, w = images[random.randint(0, len(images) - 1)].size()

            for idx, (image, target, mask, num_crowd) in enumerate(zip(images, targets, masks, num_crowds)):
                images[idx], targets[idx], masks[idx], num_crowds[idx] \
                    = enforce_size(image, target, mask, num_crowd, w, h)

        cur_idx = 0
        split_images, split_targets, split_masks, split_numcrowds \
            = [[None for alloc in allocation] for _ in range(4)]

        for device_idx, alloc in enumerate(allocation):
            split_images[device_idx] = torch.stack(images[cur_idx:cur_idx + alloc], dim=0)
            split_targets[device_idx] = targets[cur_idx:cur_idx + alloc]
            split_masks[device_idx] = masks[cur_idx:cur_idx + alloc]
            split_numcrowds[device_idx] = num_crowds[cur_idx:cur_idx + alloc]

            cur_idx += alloc

        return split_images, split_targets, split_masks, split_numcrowds

def no_inf_mean(x: torch.Tensor):
    """
    Computes the mean of a vector, throwing out all inf values.
    If there are no non-inf values, this will return inf (i.e., just the normal mean).
    """
    no_inf = [a for a in x if torch.isfinite(a)]

    if len(no_inf) > 0:
        return sum(no_inf) / len(no_inf)
    else:
        return x.mean()

def compute_validation_loss(net, data_loader, criterion):
    global loss_types

    with torch.no_grad():
        losses = {}

        # Don't switch to eval mode because we want to get losses
        iterations = 0
        for datum in data_loader:
            images, targets, masks, num_crowds = prepare_data(datum)
            out = net(images)

            wrapper = ScatterWrapper(targets, masks, num_crowds)
            _losses = criterion(out, wrapper, wrapper.make_mask())

            for k, v in _losses.items():
                v = v.mean().item()
                if k in losses:
                    losses[k] += v
                else:
                    losses[k] = v

            iterations += 1
            if args.validation_size <= iterations * args.batch_size:
                break

        for k in losses:
            losses[k] /= iterations

        loss_labels = sum([[k, losses[k]] for k in loss_types if k in losses], [])
        print(('Validation ||' + (' %s: %.3f |' * len(losses)) + ')') % tuple(loss_labels), flush=True)

def compute_validation_map(epoch, iteration, yolact_net, dataset, log: Log = None):
    with torch.no_grad():
        yolact_net.eval()

        start = time.time()
        print()
        print("Computing validation mAP (this may take a while)...", flush=True)
        val_info = eval_script.evaluate(yolact_net, dataset, train_mode=True)
        end = time.time()

        if log is not None:
            log.log('val', val_info, elapsed=(end - start), epoch=epoch, iter=iteration)

        yolact_net.train()

def setup_eval():
    eval_script.parse_args(['--no_bar', '--max_images=' + str(args.validation_size)])

if __name__ == '__main__':
    train()
```

The code above is YOLACT's segmentation training script, but the output should additionally report four metrics — IoU, Parameters, GFLOPs, and FPS. How should the code be modified?
### How to add IoU, Params, GFLOPs, and FPS metrics to YOLACT

#### 1. Parameter count (Params)

Add a parameter-statistics helper right after the model is initialized:

```python
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Call this once the model has been built
total_params = count_parameters(yolact_model)
print(f"Params: {total_params/1e6:.3f}M")  # report in millions
```

#### 2. GFLOPs

Use the `thop` library to count floating-point operations:

```bash
pip install thop  # install the dependency first
```

```python
from thop import profile

# Add this after the model is initialized
input = torch.randn(1, 3, cfg.img_size, cfg.img_size).to(device)
flops, _ = profile(yolact_model, inputs=(input,))
print(f"GFLOPs: {flops/1e9:.2f}G")  # report in billions
```

#### 3. FPS

Add timing around the inference loop:

```python
import time

total_time = 0
num_frames = 100  # number of frames to measure

# Warm up so one-off startup costs don't skew the timing
for _ in range(10):
    yolact_model(torch.randn(1, 3, 550, 550).to(device))

# Timed run
torch.cuda.synchronize()  # make sure queued GPU work is finished before timing
for i in range(num_frames):
    start = time.perf_counter()
    # original inference code
    with torch.no_grad():
        preds = yolact_model(input_image)
    # post-processing code
    # ...
    torch.cuda.synchronize()  # wait for the GPU before reading the clock
    total_time += time.perf_counter() - start

fps = num_frames / total_time
print(f"FPS: {fps:.2f}")
```

#### 4. IoU

Add an intersection-over-union computation to the evaluation function:

```python
def calculate_iou(box1, box2):
    # box format: [x1, y1, x2, y2]
    inter_x1 = max(box1[0], box2[0])
    inter_y1 = max(box1[1], box2[1])
    inter_x2 = min(box1[2], box2[2])
    inter_y2 = min(box1[3], box2[3])
    inter_area = max(0, inter_x2 - inter_x1) * max(0, inter_y2 - inter_y1)
    union_area = ((box1[2] - box1[0]) * (box1[3] - box1[1])
                  + (box2[2] - box2[0]) * (box2[3] - box2[1])
                  - inter_area)
    return inter_area / union_area if union_area > 0 else 0.0

# Call this when matching detections against ground truth
for pred_box, gt_box in zip(predictions, ground_truths):
    iou = calculate_iou(pred_box, gt_box)
    # store or report the IoU
```

#### 5. Unified output

Create a single reporting function:

```python
def print_metrics(params, gflops, fps, iou=None):
    print("Model Metrics:")
    print(f"├── Parameters: {params:.3f}M")
    print(f"├── GFLOPs: {gflops:.2f}G")
    print(f"├── FPS: {fps:.2f}")
    if iou is not None:
        print(f"└── mIoU: {iou:.4f}")

# Call this when the evaluation run finishes
```
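Putting the pieces together, here is a hedged sketch of how the snippets above could be combined into one harness (`yolact_model`, `device`, and the 550×550 input size follow the assumptions used in the snippets above; this is an illustration, not YOLACT's actual API):

```python
import time
import torch
from thop import profile  # assumed installed via `pip install thop`

def collect_metrics(yolact_model, device, img_size=550, num_frames=100):
    """Hypothetical harness: compute Params, GFLOPs, and FPS for a model."""
    yolact_model.eval()

    # Parameters, in millions
    params = sum(p.numel() for p in yolact_model.parameters() if p.requires_grad) / 1e6

    # GFLOPs from a single fixed-size forward pass
    dummy = torch.randn(1, 3, img_size, img_size).to(device)
    flops, _ = profile(yolact_model, inputs=(dummy,))
    gflops = flops / 1e9

    # FPS: warm up, then time repeated forward passes
    with torch.no_grad():
        for _ in range(10):
            yolact_model(dummy)
        if device != 'cpu':
            torch.cuda.synchronize()  # drain queued GPU work before timing
        start = time.perf_counter()
        for _ in range(num_frames):
            yolact_model(dummy)
        if device != 'cpu':
            torch.cuda.synchronize()
        fps = num_frames / (time.perf_counter() - start)

    return params, gflops, fps

# Usage (mIoU would come from the evaluation loop, as in section 4):
# params, gflops, fps = collect_metrics(yolact_model, 'cuda')
# print_metrics(params, gflops, fps, iou=mean_iou)
```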
#### 6. Where to make the changes

1. **Params and GFLOPs**: compute immediately after model initialization
2. **FPS**: in the inference loop of `eval.py` or the test script
3. **IoU**: in the evaluation function of `eval.py`, where predictions are matched against ground truth
4. **Config file**: add switches for the new metrics in `config.py`

> **Notes**:
> 1. Measure FPS on the GPU and exclude data-loading time
> 2. Compute IoU after NMS post-processing
> 3. Wrap inference in `torch.no_grad()` so autograd overhead does not distort the speed
> 4. Compute GFLOPs with a fixed input size (550×550)

#### 7. What the metrics mean

- **Params**: e.g., 41.353M means the model has 41.35 million learnable parameters[^1]
- **GFLOPs**: the computational complexity of a single forward pass[^2]
- **FPS**: the key performance metric for real-world deployment[^2]
- **IoU**: the core detection-quality metric, with values in [0, 1]

[^1]: Params is the total number of learnable parameters in the model; it determines model capacity and memory footprint
[^2]: GFLOPs measures the amount of computation and FPS measures real-time performance; they reflect efficiency along different dimensions

---

### Related questions

1. **How can end-to-end FPS, including data loading, be measured accurately?**
   Time the pipeline from before data loading to after post-processing, and account for the effect of batch size.

2. **Why do models with the same GFLOPs run at different speeds?**
   Hardware architecture (e.g., Tensor Cores), memory bandwidth, and operator-level optimization all affect actual speed[^2].

3. **How can YOLACT's parameter count be reduced without significantly hurting mAP?**
   Try channel pruning, knowledge distillation, or a lighter backbone (e.g., MobileNetV3).

4. **How should unmatched predicted boxes be handled in IoU computation?**
   Their IoU is usually counted as 0, or only predictions matched to a ground-truth box are included in the statistics.

5. **What is the difference between FLOPs and FLOPS?**
   FLOPs is the total number of floating-point operations; FLOPS is a device's floating-point operations per second[^2].