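I. IoU
1. Concept
IoU (Intersection over Union) is widely used in object detection. It measures how much the predicted bbox overlaps the ground-truth bbox: it is the ratio of the area of their intersection to the area of their union, and the larger the ratio, the better the prediction.
2. Computation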
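As a compact restatement of the definition above, the ratio can be written as follows. The symbols $B_p$ (predicted box) and $B_{gt}$ (ground-truth box) are labels assumed here for illustration, and $|\cdot|$ denotes area.

```latex
IoU = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}
    = \frac{|B_p \cap B_{gt}|}{|B_p| + |B_{gt}| - |B_p \cap B_{gt}|},
\qquad 0 \le IoU \le 1
```

The second form is the one usually implemented: the union is obtained from the two box areas minus the intersection, so only the intersection rectangle has to be computed explicitly.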

import torch


def IoU(box1, box2, x1y1x2y2=False):
    # box1, box2: tensors of shape (4,); returns a scalar (0-dim) tensor.
    # x1y1x2y2=True: boxes are (x1, y1, x2, y2); otherwise (cx, cy, w, h).
    if x1y1x2y2:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
    # Top-left corner of the intersection: the larger of the two top-left
    # coordinates (the image origin (0, 0) is the top-left corner).
    cap_x1, cap_y1 = torch.max(b1_x1, b2_x1), torch.max(b1_y1, b2_y1)
    # Bottom-right corner of the intersection.
    cap_x2, cap_y2 = torch.min(b1_x2, b2_x2), torch.min(b1_y2, b2_y2)
    # clamp at 0: a negative extent means the boxes do not intersect (area 0).
    in_area = torch.clamp(cap_x2 - cap_x1, min=0) * torch.clamp(cap_y2 - cap_y1, min=0)
    # Sum of the two box areas; the union is un_area - in_area.
    un_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) + (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    # Clamp the denominator to avoid division by zero.
    return in_area / torch.clamp(un_area - in_area, min=1e-10)


if __name__ == "__main__":
    # Float tensors keep every operation in floating point.
    boxs1 = torch.tensor([1.0, 1.0, 3.0, 3.0])
    boxs2 = torch.tensor([2.0, 2.0, 4.0, 4.0])
    print(IoU(boxs1, boxs2, x1y1x2y2=True))  # tensor(0.1429)
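3. Problems
When 1-IoU is used as the loss, the following problems arise:
(1) Whenever the two boxes do not intersect, IoU is 0 and the loss is 1. We would like different non-overlapping cases to produce different losses, for example a larger loss the farther apart the boxes are.
(2) Whenever the two boxes intersect with the same intersection area, IoU and the loss are the same, no matter how the boxes overlap. We would like the better-aligned overlap to have the smaller loss.
In the figure below, the former should score better than the latter.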
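Problem (1) can be checked numerically. The following is a minimal sketch in plain Python (floats instead of tensors, to keep the arithmetic visible); `iou_xyxy` is a hypothetical helper written for this illustration, and the box coordinates are made up.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2). Hypothetical helper."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-10)


gt = (0.0, 0.0, 2.0, 2.0)
near = (3.0, 0.0, 5.0, 2.0)    # 1 unit to the right of gt, no overlap
far = (30.0, 0.0, 32.0, 2.0)   # 28 units to the right of gt, no overlap

loss_near = 1 - iou_xyxy(gt, near)
loss_far = 1 - iou_xyxy(gt, far)
print(loss_near, loss_far)  # both are 1.0
```

Since the loss is flat over all non-overlapping placements, it gives no signal about which direction would bring the boxes closer together.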

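II. GIoU
1. Concept
To address the problems above, GIoU extends IoU with the smallest enclosing convex shape, that is, the smallest box that contains both boxes. The smaller the share of this enclosing box that is not covered by the union, the better the prediction.
2. Computation
As shown in the figure, $$GIoU = IoU - \frac{C - B}{C}$$ where $GIoU \in (-1, 1]$, $C$ is the area of the smallest enclosing box, and $B$ is the area of the union.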
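As a worked instance of the formula, take the two boxes used in the code of this section, $(1,1,3,3)$ and $(2,2,4,4)$ in $(x_1, y_1, x_2, y_2)$ form. The intermediate name $I$ for the intersection area is introduced here only for the calculation.

```latex
\begin{aligned}
I    &= (3-2)(3-2) = 1, \qquad B = 4 + 4 - 1 = 7, \qquad C = (4-1)(4-1) = 9 \\
IoU  &= \frac{I}{B} = \frac{1}{7} \\
GIoU &= \frac{1}{7} - \frac{9 - 7}{9} = \frac{9}{63} - \frac{14}{63} = -\frac{5}{63} \approx -0.079
\end{aligned}
```

The result is negative even though the boxes overlap, because the enclosing box contains more empty area (2/9 of it) than the overlap contributes.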

import torch


def GIoU(box1, box2, x1y1x2y2=False):
    # box1, box2: tensors of shape (4,); returns a scalar (0-dim) tensor.
    # x1y1x2y2=True: boxes are (x1, y1, x2, y2); otherwise (cx, cy, w, h).
    if x1y1x2y2:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
    # Intersection rectangle; clamp at 0 so disjoint boxes give area 0.
    cap_x1, cap_y1 = torch.max(b1_x1, b2_x1), torch.max(b1_y1, b2_y1)
    cap_x2, cap_y2 = torch.min(b1_x2, b2_x2), torch.min(b1_y2, b2_y2)
    in_area = torch.clamp(cap_x2 - cap_x1, min=0) * torch.clamp(cap_y2 - cap_y1, min=0)
    # Sum of the two box areas; the union is un_area - in_area.
    un_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) + (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    iou = in_area / torch.clamp(un_area - in_area, min=1e-10)
    # Smallest box enclosing both boxes (C in the formula).
    C_x1, C_y1 = torch.min(b1_x1, b2_x1), torch.min(b1_y1, b2_y1)
    C_x2, C_y2 = torch.max(b1_x2, b2_x2), torch.max(b1_y2, b2_y2)
    C_area = (C_x2 - C_x1) * (C_y2 - C_y1)
    # GIoU = IoU - (C - B) / C, with B = un_area - in_area.
    return iou - (C_area - (un_area - in_area)) / torch.clamp(C_area, min=1e-10)


if __name__ == "__main__":
    boxs1 = torch.tensor([1.0, 1.0, 3.0, 3.0])
    boxs2 = torch.tensor([2.0, 2.0, 4.0, 4.0])
    print(GIoU(boxs1, boxs2, x1y1x2y2=True))  # tensor(-0.0794)
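3. Problems
When 1-GIoU is used as the loss, the following problems remain:
(1) As in the figure below, when one box lies entirely inside the other, the enclosing box equals the union, so GIoU reduces to IoU and the loss is the same wherever the inner box sits.

(2) Convergence is slow.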
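Problem (1) can be reproduced with a few lines. This is a minimal plain-Python sketch, not the tensor code above; `giou_xyxy` is a hypothetical helper, and the boxes are chosen so that the predicted box lies entirely inside the ground-truth box.

```python
def giou_xyxy(a, b):
    """GIoU of two boxes given as (x1, y1, x2, y2). Hypothetical helper."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest box enclosing both a and b.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / max(union, 1e-10)
    return iou - (c_area - union) / max(c_area, 1e-10)


gt = (0.0, 0.0, 4.0, 4.0)
inner_center = (1.0, 1.0, 3.0, 3.0)  # 2x2 box in the middle of gt
inner_corner = (0.0, 0.0, 2.0, 2.0)  # same 2x2 box pushed into a corner of gt

g_center = giou_xyxy(gt, inner_center)
g_corner = giou_xyxy(gt, inner_corner)
print(g_center, g_corner)  # 0.25 0.25
```

Both placements score the same because the enclosing box is the ground-truth box itself in either case, so the penalty term is zero and only the plain IoU remains.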
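III. DIoU
1. Concept
To address the problems above, DIoU no longer measures the share of non-overlapping area; it introduces the distance between the box centers instead. The larger the overlap and the closer the centers, the better the prediction. Because it minimizes the distance between the two boxes directly, DIoU converges faster than GIoU.
2. Computation
As shown in the figure, $$DIoU = IoU - \frac{d^2}{c^2}$$ where $DIoU \in (-1, 1]$, $d$ is the distance between the centers of the predicted and ground-truth boxes, and $c$ is the diagonal length of the smallest enclosing box.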
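As a worked instance of the formula, take the two boxes used in the code of this section, $(1,1,3,3)$ and $(2,2,4,4)$ in $(x_1, y_1, x_2, y_2)$ form. Their centers are $(2,2)$ and $(3,3)$, and the smallest enclosing box spans $(1,1)$ to $(4,4)$.

```latex
\begin{aligned}
d^2  &= (3-2)^2 + (3-2)^2 = 2, \qquad c^2 = (4-1)^2 + (4-1)^2 = 18 \\
IoU  &= \frac{1}{4 + 4 - 1} = \frac{1}{7} \\
DIoU &= \frac{1}{7} - \frac{2}{18} = \frac{9}{63} - \frac{7}{63} = \frac{2}{63} \approx 0.032
\end{aligned}
```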

import torch


def DIoU(box1, box2, x1y1x2y2=False):
    # box1, box2: tensors of shape (4,); returns a scalar (0-dim) tensor.
    # x1y1x2y2=True: boxes are (x1, y1, x2, y2); otherwise (cx, cy, w, h).
    if x1y1x2y2:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
    # Intersection rectangle; clamp at 0 so disjoint boxes give area 0.
    cap_x1, cap_y1 = torch.max(b1_x1, b2_x1), torch.max(b1_y1, b2_y1)
    cap_x2, cap_y2 = torch.min(b1_x2, b2_x2), torch.min(b1_y2, b2_y2)
    in_area = torch.clamp(cap_x2 - cap_x1, min=0) * torch.clamp(cap_y2 - cap_y1, min=0)
    # Sum of the two box areas; the union is un_area - in_area.
    un_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) + (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    iou = in_area / torch.clamp(un_area - in_area, min=1e-10)
    # Squared distance between the two box centers (d^2).
    b1_cx, b1_cy = (b1_x1 + b1_x2) / 2, (b1_y1 + b1_y2) / 2
    b2_cx, b2_cy = (b2_x1 + b2_x2) / 2, (b2_y1 + b2_y2) / 2
    d2 = (b2_cx - b1_cx) ** 2 + (b2_cy - b1_cy) ** 2
    # Squared diagonal of the smallest enclosing box (c^2).
    C_x1, C_y1 = torch.min(b1_x1, b2_x1), torch.min(b1_y1, b2_y1)
    C_x2, C_y2 = torch.max(b1_x2, b2_x2), torch.max(b1_y2, b2_y2)
    c2 = (C_x2 - C_x1) ** 2 + (C_y2 - C_y1) ** 2
    return iou - d2 / torch.clamp(c2, min=1e-10)


if __name__ == "__main__":
    boxs1 = torch.tensor([1.0, 1.0, 3.0, 3.0])
    boxs2 = torch.tensor([2.0, 2.0, 4.0, 4.0])
    print(DIoU(boxs1, boxs2, x1y1x2y2=True))  # tensor(0.0317)
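3. Problems
Three factors matter for the quality of bbox regression: overlap area, center distance, and aspect ratio. As a loss, 1-DIoU ignores how well the aspect ratio of the predicted box matches that of the ground-truth box. In the figure below, we want state 1 to have the smaller loss.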
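The blind spot can be shown with two predictions that share the same center and area but differ in shape. This is a minimal plain-Python sketch; `diou_xyxy` is a hypothetical helper and the boxes are made up for the illustration.

```python
def diou_xyxy(a, b):
    """DIoU of two boxes given as (x1, y1, x2, y2). Hypothetical helper."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / max(union, 1e-10)
    # Squared distance between the box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
        + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou - d2 / max(c2, 1e-10)


gt = (0.0, 0.0, 4.0, 4.0)            # square ground truth, center (2, 2)
pred_square = (1.0, 1.0, 3.0, 3.0)   # 2x2, same aspect ratio as gt
pred_flat = (0.0, 1.5, 4.0, 2.5)     # 4x1, same area and center, different shape

d_square = diou_xyxy(pred_square, gt)
d_flat = diou_xyxy(pred_flat, gt)
print(d_square, d_flat)  # 0.25 0.25
```

Both predictions have the same overlap with the ground truth and zero center distance, so DIoU cannot tell the square prediction from the flat one.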

IV. CIoU
1. Concept
To address the problem above, CIoU introduces the aspect ratio. It evaluates bbox regression by combining overlap area, center distance, and aspect ratio.
2. Computation
$$CIoU = IoU - \frac{d^2}{c^2} - \alpha \times v$$ where $$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$ The graph of $\arctan$ is shown below. For positive arguments it takes values between 0 and $\frac{\pi}{2}$, so $v$ lies in $[0, 1)$, and the closer the two aspect ratios, the smaller $v$.
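A few sample values make the range of $v$ concrete. This is a minimal plain-Python sketch; `aspect_term` is a hypothetical helper that assumes all widths and heights are positive, and the sizes are made up for the illustration.

```python
import math


def aspect_term(w_gt, h_gt, w, h):
    """v from the CIoU formula; assumes positive widths and heights."""
    return 4 / math.pi ** 2 * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2


v_same = aspect_term(4, 2, 2, 1)        # both boxes are 2:1
v_close = aspect_term(4, 2, 3, 2)       # 2:1 against 1.5:1
v_far = aspect_term(1000, 1, 1, 1000)   # extremely wide against extremely tall
print(v_same, v_close, v_far)
```

Equal aspect ratios give exactly 0 regardless of box size, and even the extreme wide-versus-tall pair stays below 1, since each arctangent is confined to the interval between 0 and π/2.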

In addition, $$\alpha = \frac{v}{1 - IoU + v}$$ The graph of $\frac{x}{a + x}$ is shown below (left: $a = 0.1$; right: $a = 0.9$). It follows that, for a fixed $v$, the larger the $IoU$, the smaller $1 - IoU$ and the larger $\alpha$.
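The dependence of $\alpha$ on IoU can be tabulated directly. This is a minimal plain-Python sketch with a hypothetical helper `alpha_weight` and an arbitrarily chosen $v = 0.1$.

```python
def alpha_weight(iou, v):
    """alpha = v / (1 - IoU + v), with the denominator clamped away from zero."""
    return v / max(1 - iou + v, 1e-10)


v = 0.1  # fixed aspect-ratio mismatch, chosen only for illustration
ious = (0.0, 0.5, 0.9)
alphas = [alpha_weight(iou, v) for iou in ious]
for iou, a in zip(ious, alphas):
    print(f"IoU={iou:.1f}  alpha={a:.4f}")
```

With $v$ held fixed, the weight grows from roughly 0.09 at IoU 0 to roughly 0.5 at IoU 0.9, so the aspect-ratio term counts for more once the boxes already overlap well.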


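Overall, in $\alpha \times v$, the factor $v$ pushes the aspect ratio of the predicted box toward that of the ground-truth box, while $\alpha$ makes the aspect ratio matter more the more the two boxes overlap: for the same aspect-ratio mismatch $v$, a larger IoU gives a larger penalty $\alpha \times v$.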
import math

import torch


def CIoU(box1, box2, x1y1x2y2=False):
    # box1, box2: tensors of shape (4,); returns a scalar (0-dim) tensor.
    # x1y1x2y2=True: boxes are (x1, y1, x2, y2); otherwise (cx, cy, w, h).
    if x1y1x2y2:
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2
    # Intersection rectangle; clamp at 0 so disjoint boxes give area 0.
    cap_x1, cap_y1 = torch.max(b1_x1, b2_x1), torch.max(b1_y1, b2_y1)
    cap_x2, cap_y2 = torch.min(b1_x2, b2_x2), torch.min(b1_y2, b2_y2)
    in_area = torch.clamp(cap_x2 - cap_x1, min=0) * torch.clamp(cap_y2 - cap_y1, min=0)
    # Sum of the two box areas; the union is un_area - in_area.
    un_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) + (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    iou = in_area / torch.clamp(un_area - in_area, min=1e-10)
    # Squared distance between the two box centers (d^2).
    b1_cx, b1_cy = (b1_x1 + b1_x2) / 2, (b1_y1 + b1_y2) / 2
    b2_cx, b2_cy = (b2_x1 + b2_x2) / 2, (b2_y1 + b2_y2) / 2
    d2 = (b2_cx - b1_cx) ** 2 + (b2_cy - b1_cy) ** 2
    # Squared diagonal of the smallest enclosing box (c^2).
    C_x1, C_y1 = torch.min(b1_x1, b2_x1), torch.min(b1_y1, b2_y1)
    C_x2, C_y2 = torch.max(b1_x2, b2_x2), torch.max(b1_y2, b2_y2)
    c2 = (C_x2 - C_x1) ** 2 + (C_y2 - C_y1) ** 2
    # Aspect-ratio term v and its weight alpha; heights are clamped to avoid
    # division by zero.
    b1_w, b1_h = b1_x2 - b1_x1, b1_y2 - b1_y1
    b2_w, b2_h = b2_x2 - b2_x1, b2_y2 - b2_y1
    atan1 = torch.atan(b1_w / torch.clamp(b1_h, min=1e-10))
    atan2 = torch.atan(b2_w / torch.clamp(b2_h, min=1e-10))
    v = 4 / math.pi ** 2 * (atan1 - atan2) ** 2
    alpha = v / torch.clamp(1 - iou + v, min=1e-10)
    return iou - d2 / torch.clamp(c2, min=1e-10) - alpha * v


if __name__ == "__main__":
    boxs1 = torch.tensor([2.0, 2.0, 4.0, 4.0])
    boxs2 = torch.tensor([1.0, 1.0, 3.0, 3.0])
    # Both boxes are square, so v = 0 and the result equals DIoU.
    print(CIoU(boxs1, boxs2, x1y1x2y2=True))  # tensor(0.0317)
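To see the aspect-ratio term at work, the sketch below scores two predictions that share the same center and area but differ in shape. It is a plain-Python illustration under assumed inputs, not the tensor implementation above; `diou_ciou_xyxy` is a hypothetical helper that assumes positive widths and heights.

```python
import math


def diou_ciou_xyxy(pred, gt):
    """Return (DIoU, CIoU) for two (x1, y1, x2, y2) boxes. Hypothetical helper."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / max(w * h + w_gt * h_gt - inter, 1e-10)
    # Squared center distance and squared diagonal of the enclosing box.
    d2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
        + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    c2 = (max(pred[2], gt[2]) - min(pred[0], gt[0])) ** 2 \
        + (max(pred[3], gt[3]) - min(pred[1], gt[1])) ** 2
    diou = iou - d2 / max(c2, 1e-10)
    # Aspect-ratio penalty alpha * v.
    v = 4 / math.pi ** 2 * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / max(1 - iou + v, 1e-10)
    return diou, diou - alpha * v


gt = (0.0, 0.0, 4.0, 4.0)            # square ground truth
pred_square = (1.0, 1.0, 3.0, 3.0)   # 2x2, matches the aspect ratio of gt
pred_flat = (0.0, 1.5, 4.0, 2.5)     # 4x1, same area and center as pred_square

diou_sq, ciou_sq = diou_ciou_xyxy(pred_square, gt)
diou_flat, ciou_flat = diou_ciou_xyxy(pred_flat, gt)
print(diou_sq, diou_flat)   # identical DIoU
print(ciou_sq, ciou_flat)   # CIoU is lower for the flat box
```

DIoU is the same for both predictions, while CIoU leaves the square prediction untouched and lowers the score of the flat one, so 1-CIoU assigns the mismatched shape the larger loss.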
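3. Problems
CIoU takes all three factors into account, but at a higher computational cost.
Acknowledgements:
This blog is for personal notes only and has no commercial purpose. It draws on the following references: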
优化改进YOLOv5算法之添加GIoU、DIoU、CIoU、EIoU、Wise-IoU模块(超详细)
【IOU全系列】IOU GIOU DIOU CIOU 代码公式详细解读