Object Detection Study Notes: Exploring YOLOv5

phily123

Last modified 2023-02-20 18:17:23

Category: Object Detection Study Notes. Tags: pytorch, learning, deep learning

First published 2022-03-03 11:56:43

Article link: https://blog.youkuaiyun.com/phily123/article/details/123250466

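This post walks through YOLOv5 training in detail, covering the dataset format, the learning-rate schedule, frozen-layer training, resuming from a checkpoint, and the NMS implementation. It also discusses how to modify the code to use F2 as the evaluation metric and points to data augmentation methods. Finally, it covers loss computation, automatic anchors (autoanchor), and common training errors with their fixes.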

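I. Dataset preparation

YOLOv5 labels are stored neither as COCO JSON nor as VOC XML, but as plain txt files. Each image has one label txt file, and each line in that file describes one object, so the number of lines equals the number of objects in the image. Each line has the form class_id x y w h (space-separated), where x, y are the normalized center coordinates and w, h are the normalized width and height. After preparing the labels, create a new data yaml file for the project.
Related: Converting between dataset formats: COCO, VOC, YOLO, TT100K
Related: YOLOv5 study notes (7): training on your own dataset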
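
The label format described above can be illustrated with a short sketch. It assumes the source annotation is a pixel-space box (xmin, ymin, xmax, ymax) and that the image size is known; the function name `to_yolo_line` is hypothetical and not part of YOLOv5.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert one pixel-space box into a YOLO label line (hypothetical helper)."""
    # Center coordinates, normalized by the image size.
    x = (xmin + xmax) / 2 / img_w
    y = (ymin + ymax) / 2 / img_h
    # Width and height, normalized the same way.
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


# One line per object; all lines for an image go into that image's txt file.
line = to_yolo_line(0, 100, 200, 300, 400, img_w=640, img_h=480)
print(line)
```

Writing one such line per object into a txt file named after the image gives the layout the text describes.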

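II. YOLOv5 learning-rate schedule

YOLOv5 uses LambdaLR as its learning-rate scheduler.
Reference: Object Detection YOLOv5 - Learning Rate
Reference: YOLOv5 learning-rate settings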
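
LambdaLR multiplies the initial learning rate by a factor returned from a user-supplied function of the epoch index. The sketch below shows two factor functions of the kind YOLOv5's train.py is commonly described as using (a linear decay and a one-cycle cosine decay). The exact formulas and the values lr0 = 0.01 and lrf = 0.1 are assumptions for illustration and may differ between YOLOv5 versions; the code is pure Python so it runs without PyTorch.

```python
import math


def linear_factor(epochs, lrf):
    """Factor that decays linearly from 1.0 at epoch 0 to lrf at the last epoch."""
    return lambda x: (1 - x / epochs) * (1.0 - lrf) + lrf


def cosine_factor(epochs, lrf):
    """Factor that follows half a cosine wave from 1.0 down to lrf."""
    return lambda x: ((1 - math.cos(x * math.pi / epochs)) / 2) * (lrf - 1) + 1


lr0, lrf, epochs = 0.01, 0.1, 100  # illustrative values, not read from any hyp file
lf = cosine_factor(epochs, lrf)
for epoch in (0, 50, 100):
    # LambdaLR would set lr = lr0 * lf(epoch) for each parameter group.
    print(epoch, lr0 * lf(epoch))
```

With PyTorch available, such a function would be passed as `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)`.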

III. YOLOv5 frozen-layer training

Reference: https://wandb.ai/glenn-jocher/yolov5_tutorial_freeze/reports/Freezing-Layers-in-YOLOv5–VmlldzozMDk3NTg

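IV. Resuming training from a checkpoint

(1) The previous run did not finish; continue from where it stopped
Set the default of resume in train.py to True.
(2) The previous run did not finish; continue it and also increase the final number of epochs
Set the default of resume in train.py to True, and also change epochs in opt.yaml.
(3) The previous run finished; increase the final number of epochs and continue from it
Set the default of resume in train.py to True, change epochs in opt.yaml, and then modify start_epoch in torch_utils.py.
(In this case the loss jumps up suddenly, the evaluation metrics drop, and lr0, lr1, lr2 start again from 0.01.)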

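V. NMS

YOLOv5 includes a weighted NMS (merge-NMS), but it is disabled by default, so traditional NMS is what actually runs.
The code shows that the weight applied to each box's coordinates is iou * scores, where iou is the boolean mask box_iou(...) > iou_thres.

The source code is in utils/general.py: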

    merge = False  # use merge-NMS

        # ... later, inside the per-image loop, after NMS has produced the kept indices i
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
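
To make the weighting concrete, here is a minimal pure-Python sketch of the same idea: run ordinary greedy NMS, then replace each kept box by the score-weighted mean of all boxes whose IoU with it exceeds the threshold. The helper names (`box_iou`, `nms`, `merge_nms`) and the list-based data layout are assumptions for illustration; this is not the YOLOv5 implementation, which works on tensors.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def nms(boxes, scores, iou_thres):
    """Greedy NMS: keep a box only if it does not overlap a higher-scored kept box."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        if all(box_iou(boxes[k], boxes[j]) <= iou_thres for j in keep):
            keep.append(k)
    return keep


def merge_nms(boxes, scores, iou_thres=0.45):
    """Replace every kept box by the score-weighted mean of its overlapping boxes."""
    keep = nms(boxes, scores, iou_thres)
    merged = []
    for i in keep:
        # weight = (IoU > threshold) * score, mirroring `iou * scores[None]`
        weights = [scores[j] if box_iou(boxes[i], boxes[j]) > iou_thres else 0.0
                   for j in range(len(boxes))]
        total = sum(weights)  # never zero: a box always overlaps itself
        merged.append([sum(w * b[c] for w, b in zip(weights, boxes)) / total
                       for c in range(4)])
    return keep, merged


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.6, 0.8]
keep, merged = merge_nms(boxes, scores)
print(keep, merged)
```

In the example, the second box is suppressed by the first, but it still pulls the first box's coordinates slightly toward itself in proportion to its score.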

VI. Data augmentation

https://blog.youkuaiyun.com/u010598525/article/details/112623356?utm_term=yolov5%E6%80%8E%E4%B9%88%E4%BF%AE%E6%94%B9%E6%95%B0%E6%8D%AE%E5%A2%9E%E5%BC%BA%E6%96%B9%E6%B3%95&utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2_allsobaiduweb~default-0-112623356&spm=3001.4430

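VII. Coordinate output format and loss computation

Object detectors generally output box coordinates as pred[:, :4], where pred[:, 0] and pred[:, 1] are offsets of the center coordinates and pred[:, 2] and pred[:, 3] are offsets of the width and height. These must be converted by a formula into Xmin, Ymin, Xmax, Ymax (yolov5) or Xmin, Ymin, W, H (efficientdet). Label coordinates are in xyxy or xywh format (yolov5 uses normalized xywh).
Reference: YOLOv5 loss summary
Reference: YOLOv5 code walkthrough: computing the loss function (https://blog.youkuaiyun.com/l13022736018/article/details/118346085)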
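
As an illustration of the conversion formula mentioned above, the sketch below decodes one raw prediction into a pixel-space box. It assumes the decoding used by recent YOLOv5 versions, xy = (2*sigmoid(t) - 0.5 + grid) * stride and wh = (2*sigmoid(t))^2 * anchor, with the anchor given in pixels; the function names are hypothetical and the formula may differ in other versions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t, grid_xy, anchor_wh, stride):
    """Turn raw outputs (tx, ty, tw, th) into a pixel-space (cx, cy, w, h) box."""
    tx, ty, tw, th = t
    cx = (sigmoid(tx) * 2 - 0.5 + grid_xy[0]) * stride
    cy = (sigmoid(ty) * 2 - 0.5 + grid_xy[1]) * stride
    w = (sigmoid(tw) * 2) ** 2 * anchor_wh[0]
    h = (sigmoid(th) * 2) ** 2 * anchor_wh[1]
    return cx, cy, w, h


def xywh_to_xyxy(cx, cy, w, h):
    """Center format to corner format (Xmin, Ymin, Xmax, Ymax)."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# Raw zeros decode to the cell center and exactly the anchor size.
box = decode_box((0.0, 0.0, 0.0, 0.0), grid_xy=(3, 4), anchor_wh=(10, 13), stride=8)
print(box, xywh_to_xyxy(*box))
```

Under this formula the center offset ranges over (-0.5, 1.5) cells, which is what allows a neighboring cell to predict a box whose center lies in the adjacent cell.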

VIII. Changing the evaluation metric to F2

Link: Logging F2 score while training YOLOv5

1. Modify metrics.py

In train.py, these lines

fi = fitness(np.array(results).reshape(1, -1))  # weighted combination of [P, R, mAP@.5, mAP@.5-.95]
            if fi > best_fitness:
                best_fitness = fi

use fitness to decide which checkpoint becomes best.pt. Note that results is a tuple of length seven, returned by the run function in val.py.
run returns return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t, so results[0:4] is what fitness is computed from. Inside run, these metrics come from the line tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names), that is, from ap_per_class in metrics.py.

In metrics.py, change this line

f1 = 2 * p * r / (p + r + eps)

to f2 = 5 * p * r / (4 * p + r + 1e-16)
Apply the same change to every f1 in the ap_per_class function, and change its return statement to return tp, fp, p, r, f2, ap, unique_classes.astype("int32")
Then modify the fitness function as follows

def fitness(x):
    # Model fitness as a weighted combination of metrics
    w = [0.0, 0.0, 0.0, 0.0, 1.0] # weights for [P, R, mAP@0.5, mAP@0.5:0.95, F2@0.3:0.8]
    return (x[:, :5] * w).sum(1)
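
The F2 expression is the beta = 2 case of the general F-beta score, (1 + beta^2) * P * R / (beta^2 * P + R), which weights recall more heavily than precision. A minimal pure-Python sketch follows; the names `f_beta` and `fitness_f2` are hypothetical, and the second function only mirrors the weighted sum above on a plain tuple instead of a NumPy array.

```python
def f_beta(p, r, beta=2.0, eps=1e-16):
    """General F-beta score; beta=1 gives F1, beta=2 gives F2."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r + eps)


def fitness_f2(results, weights=(0.0, 0.0, 0.0, 0.0, 1.0)):
    """Weighted sum over [P, R, mAP@0.5, mAP@0.5:0.95, F2]; only F2 counts here."""
    return sum(w * v for w, v in zip(weights, results[:5]))


p, r = 0.5, 1.0
print(f_beta(p, r, beta=1.0))  # F1
print(f_beta(p, r, beta=2.0))  # F2, higher because recall is high
```

With weights of this form, best.pt is selected purely by F2, and the other four metrics are only logged.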

2. Modify val.py

In val.py, change this line

iouv = torch.linspace(0.5, 0.95, 10).to(device)  # iou vector for mAP@0.5:0.95

to iouv= torch.from_numpy(np.arange(0.3, 0.85, 0.05)).to(device)
In this line

dt, p, r, f1, mp, mr, map50, map = [0.0, 0.0, 0.0], 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

and this line

tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)

rename f1 to f2.
Then change these lines

ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()

to ap50, ap, f2 = ap[:, 0], ap.mean(1), f2.mean(0) # AP@0.5, AP@0.5:0.95 and mp, mr, f2, map50, map = p.mean(), r.mean(), f2.mean(), ap50.mean(), ap.mean()

Then change the return value

 return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t

to return (mp, mr, map50, map, f2, *(loss.cpu() / len(dataloader)).tolist()), maps, t
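
One side effect of the new threshold vector may be worth checking. Assuming, as in the stock val.py, that column k of ap corresponds to iouv[k], ap[:, 0] becomes AP at IoU 0.3 rather than 0.5 once iouv starts at 0.3, and ap.mean(1) averages over 0.3:0.8. The pure-Python sketch below (hypothetical variable names, no NumPy) lists the thresholds and locates the 0.5 column.

```python
# Thresholds matching the intent of np.arange(0.3, 0.85, 0.05),
# rounded to two decimals to avoid floating-point drift.
iou_thresholds = [round(0.30 + 0.05 * k, 2) for k in range(11)]

# Column of the threshold vector that corresponds to IoU = 0.5.
idx_050 = iou_thresholds.index(0.5)

print(iou_thresholds)
print(idx_050)
```

If the mAP@.5 column is meant to keep its usual meaning, ap[:, idx_050] would be the column to read; whether that matters depends on which numbers you intend to report.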

3. Show F2 on the command line

In val.py, change this line

 s = ('%20s' + '%11s' * 6) % ('Class', 'Images', 'Labels', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')

to s = ('%20s' + '%11s' * 6) % ('Class', 'Images', 'Labels', 'P', 'R', 'F2', 'mAP@.5')
and change this line

LOGGER.info(pf % ('all', seen, nt.sum(), mp, mr, map50, map))

to LOGGER.info(pf % ('all', seen, nt.sum(), mp, mr, f2, map50))

4. Show and log F2 in wandb:

In utils/loggers/__init__.py, in this list

self.keys = ['train/box_loss', 'train/obj_loss', 'train/cls_loss',  # train loss
                     'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95',  # metrics
                     'val/box_loss', 'val/obj_loss', 'val/cls_loss',  # val loss
                     'x/lr0', 'x/lr1', 'x/lr2']  # params

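add "metrics/F2" (directly after 'metrics/mAP_0.5:0.95', so that it lines up with the position of f2 in the tuple returned by run).
In this list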

  self.best_keys = ['best/epoch', 'best/precision', 'best/recall', 'best/mAP_0.5', 'best/mAP_0.5:0.95',]

add "best/F2".
In the function on_fit_epoch_end, change this line

best_results = [epoch] + vals[3:7]

to best_results = [epoch] + vals[3:8]
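
The slice change from vals[3:7] to vals[3:8] follows from how the logged values line up with the key names. The sketch below assumes the logger pairs keys and values positionally (for example with zip) and that vals is the three training losses, then the results tuple from val.py, then the three learning rates; the numbers are made up for illustration.

```python
keys = ['train/box_loss', 'train/obj_loss', 'train/cls_loss',  # train loss
        'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5',
        'metrics/mAP_0.5:0.95', 'metrics/F2',  # metrics, F2 added last
        'val/box_loss', 'val/obj_loss', 'val/cls_loss',  # val loss
        'x/lr0', 'x/lr1', 'x/lr2']  # params
best_keys = ['best/epoch', 'best/precision', 'best/recall',
             'best/mAP_0.5', 'best/mAP_0.5:0.95', 'best/F2']

# Made-up values in the assumed order: train losses, results, learning rates.
mloss = [0.05, 0.03, 0.01]
results = (0.70, 0.60, 0.65, 0.40, 0.62, 0.06, 0.04, 0.02)  # mp, mr, map50, map, f2, val losses
lr = [0.01, 0.01, 0.01]
vals = mloss + list(results) + lr

epoch = 7
logged = dict(zip(keys, vals))
best = dict(zip(best_keys, [epoch] + vals[3:8]))  # five metrics instead of four
print(logged['metrics/F2'], best['best/F2'])
```

If "metrics/F2" were appended at the end of the list instead, every key after the metrics would be shifted by one position under this assumption.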

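IX. Loss computation

# This function gathers every positive sample for all ground-truth (gt) boxes in a batch (considering the three anchors of different aspect ratios and the two cells nearest to the gt center). It yields the gt offsets tx, ty relative to the top-left corner of the cell and the gt tw, th in feature-map units. These are compared with the predictions (tx, ty after decoding; tw, th after decoding and multiplication by the anchor width and height) to compute the loss.
    def build_targets(self, p, targets):
        # Build targets for compute_loss(), input targets(image,class,x,y,w,h); image lies in [0, bs) and is the index of the image within the batch
        # p is a list of length 3 with shapes (2,3,80,80,6), (2,3,40,40,6), (2,3,20,20,6): 2 is bs, 3 is the number of anchors per scale, 80 is the feature-map size
        # targets.shape = (5,6)
        # na = 3, nt = 5: na is the number of anchors per cell, nt is the number of targets in the batch
        na, nt = self.na, targets.shape[0]  # number of anchors, targets
        tcls, tbox, indices, anch = [], [], [], []
        # gain.shape = (7,)
        gain = torch.ones(7, device=targets.device)  # normalized to gridspace gain
        # ai.shape = (3,5)
        ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)
        # targets.shape = (3,5,7): the 5 targets in the batch are copied three times, once per anchor; 6 becomes 7 because an index is appended that records which anchor the copy is matched against
        targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices

        g = 0.5  # bias
        off = torch.tensor([[0, 0],
                            [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m
                            # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm
                            ], device=targets.device).float() * g  # offsets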
        # Iterate over each feature-map level
        for i in range(self.nl):
            # anchors.shape = (3,2): width and height of the three anchors
            # anchors = tensor([[1.25,1.625],[2,3.75],[4.125,2.875]]): the preset anchors for this level divided by the level's stride; see the next line
            # [10,13, 16,30, 33,23]  # P3/8
            anchors = self.anchors[i]
            # debug_a = tensor([2,3,80,80,6])
            debug_a = torch.tensor(p[i].shape)
            # debug_b = tensor([80,80,80,80])
            debug_b = debug_a[[3, 2, 3, 2]]
            gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain
            # gain = tensor([1.,1.,80.,80.,80.,80.,1.]), gain.shape = (7,)
            # Match targets to anchors
            # t.shape = (3,5,7)
            # Scale the normalized targets to the size of the current feature map, e.g. 80*80
            t = targets * gain
            if nt:
                # Matching is done separately for each output level. First the targets are scaled to the grid scale of the anchors for convenience;
                # then the ratio between the target wh and each anchor's wh is computed. If the ratio is too large the match is poor, so that box is filtered out and treated as background at this level
                # Matches
                # [4:6] -> w,h
                # r.shape = (3,5,2)
                r = t[:, :, 4:6] / anchors[:, None]  # wh ratio
                # j.shape = (3,5), values are True or False
                j = torch.max(r, 1 / r).max(2)[0] < self.hyp['anchor_t']  # compare
                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
                # j decides which rows of t are kept; False rows are dropped, so the shape (3,5,7) becomes (14,7) here because one entry is False.
                t = t[j]  # filter

                # Offsets
                gxy = t[:, 2:4]  # grid xy
                gxi = gain[[2, 3]] - gxy  # inverse
                # j,k,l,m.shape = (14,), boolean values
                j, k = ((gxy % 1 < g) & (gxy > 1)).T
                l, m = ((gxi % 1 < g) & (gxi > 1)).T
                # j.shape = (5,14), values True or False; 5 because a target is not predicted only by the cell containing its center: that cell and its four neighbours, five cells in total, are considered
                # in this example, of the 5*14 booleans, 3*14 are True and 2*14 are False
                j = torch.stack((torch.ones_like(j), j, k, l, m))
                # t.shape = (42,7); 42 because each target ends up with two of its four neighbouring cells, three cells in total, 3*14=42
                # t.repeat((5,1,1)).shape = (5,14,7), t.repeat((5, 1, 1))[j].shape = (42,7)
                # The resulting t holds the positions (xywh as floats) of the batch's (bs=2) 5 targets on the current feature map, replicated over the 3 anchors (minus the rejected match) and over the center cell plus two neighbours; it still contains only five distinct boxes
                t = t.repeat((5, 1, 1))[j]
                # torch.zeros_like(gxy)[None].shape = (1,14,2), off[:, None].shape = (5,1,2)
                # offsets.shape = (42,2), with 14 rows of (0,0), 5 of (0,0.5), 8 of (0.5,0), 9 of (0,-0.5), 6 of (-0.5,0)
                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
            else:
                t = targets[0]
                offsets = 0

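            # Define
            # b is the index of the image within the batch, c is the class; b.shape = c.shape = (42,)
            b, c = t[:, :2].long().T  # image, class
            # gxy is the gt center mapped onto the feature map; gxy.shape = (42,2), (5*3-1)*3=42; only five distinct gt xy values exist, repeated up to 42 rows
            gxy = t[:, 2:4]  # grid xy
            # gwh is the gt wh mapped onto the feature map; gwh.shape = (42,2)
            gwh = t[:, 4:6]  # grid wh
            # gij.shape = (42,2)
            # Shift the gt xy by offsets and truncate to integers to get the top-left grid coordinates of the three cells
            # gij now holds, for the five gt boxes in the batch, the top-left coordinates of three cells each (the center cell plus two neighbours) on the feature-map grid, i.e. the xy coordinates (indices) of the positive anchors.
            gij = (gxy - offsets).long()
            # gi.shape = gj.shape = (42,)
            gi, gj = gij.T  # grid xy indices

            # Append
            # a.shape = (42,)
            a = t[:, 6].long()  # anchor indices; values 0, 1, 2 identify the matched anchor
            # indices[0][0].shape = indices[0][1].shape = indices[0][2].shape = indices[0][3].shape = (42,)
            # b lies in [0, bs) and tells which image the sample belongs to
            indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
            # tbox[0].shape = (42,4), 42 rows; the xy values are offsets relative to the top-left corner of the assigned cell (tx, ty), and the wh values are the gt wh mapped onto the feature map. These are the positive samples returned for the loss.
            tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
            # anch[0].shape = (42,2), only 3 distinct values repeated up to 42 rows: the preset anchors ([10,13, 16,30, 33,23]  # P3/8) expressed as wh on the feature map
            anch.append(anchors[a])  # anchors
            # tcls[0].shape = (42,), class
            tcls.append(c)  # class

        return tcls, tbox, indices, anch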
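
The two selection rules in build_targets can be reproduced for a single target in plain Python, which makes the shapes in the comments easier to follow. The sketch below is a simplified illustration, not the YOLOv5 code: the function names are hypothetical, anchor_t = 4.0 is assumed as the threshold, and clamping to the grid border is omitted.

```python
def matched_anchors(gt_wh, anchors, anchor_t=4.0):
    """Indices of anchors whose width and height ratios to the gt stay below anchor_t."""
    matched = []
    for a, (aw, ah) in enumerate(anchors):
        rw, rh = gt_wh[0] / aw, gt_wh[1] / ah
        # Same test as torch.max(r, 1 / r).max(2)[0] < anchor_t
        if max(rw, 1 / rw, rh, 1 / rh) < anchor_t:
            matched.append(a)
    return matched


def positive_cells(gxy, grid_w, grid_h, g=0.5):
    """Cell containing the gt center plus the neighbours selected by j, k, l, m."""
    gx, gy = gxy
    ix, iy = grid_w - gx, grid_h - gy  # inverse coordinates (gxi)
    offsets = [(0.0, 0.0)]
    if gx % 1 < g and gx > 1:
        offsets.append((g, 0.0))    # j: center in left half, add the left cell
    if gy % 1 < g and gy > 1:
        offsets.append((0.0, g))    # k: center in upper half, add the upper cell
    if ix % 1 < g and ix > 1:
        offsets.append((-g, 0.0))   # l: center in right half, add the right cell
    if iy % 1 < g and iy > 1:
        offsets.append((0.0, -g))   # m: center in lower half, add the lower cell
    # Mirrors gij = (gxy - offsets).long()
    return [(int(gx - ox), int(gy - oy)) for ox, oy in offsets]


anchors = [(1.25, 1.625), (2.0, 3.75), (4.125, 2.875)]  # P3/8 anchors in grid units
print(matched_anchors((4.0, 6.0), anchors))
print(matched_anchors((9.0, 2.0), anchors))
print(positive_cells((10.3, 20.7), grid_w=80, grid_h=80))
```

For each returned cell, the regression target is gxy minus the cell's integer coordinates, so the xy offsets of neighbouring cells can fall outside the range 0 to 1.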