Residual Network, YOLO

Residual Network: https://blog.youkuaiyun.com/koala_tree/article/details/78583979
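The linked post walks through the course's ResNet assignment. As a minimal sketch of the core idea, here is an identity (skip-connection) block in the Keras functional API; the layer choices and names below are illustrative, not the course's exact implementation:

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def identity_block_sketch(X, filters, kernel_size=3):
    # Residual block: output = ReLU(F(X) + X).
    # Assumes X already has `filters` channels so the addition shapes match.
    X_shortcut = X  # save the input for the skip connection

    # Main path: two conv -> batch norm stages
    X = Conv2D(filters, kernel_size, padding='same')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)

    X = Conv2D(filters, kernel_size, padding='same')(X)
    X = BatchNormalization()(X)

    # Skip connection: add the shortcut back before the final ReLU,
    # so gradients can flow directly through the addition
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
    return X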

The following is the YOLO part:

1. Model details (from: https://blog.youkuaiyun.com/koala_tree/article/details/78690396):

- The input is a batch of images of shape (m, 608, 608, 3) 
- The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c) as explained above. If you expand c into an 80-dimensional vector, each bounding box is then represented by 85 numbers. (Here there are 80 different classes in total.)

We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).

If the center/midpoint of an object falls into a grid cell, that grid cell is responsible for detecting that object.

Anchor boxes are defined only by their width and height.
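For instance, the course loads 5 anchors as (width, height) pairs measured in grid-cell units. A sketch of what that looks like (the values below are placeholders, not the exact yolo_anchors.txt contents):

# 5 anchor boxes, each given only as a (width, height) pair in grid-cell units
anchors = [(0.57, 0.68), (1.87, 2.06), (3.34, 5.47), (7.88, 3.53), (9.77, 9.17)]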

 

For simplicity, we will flatten the last two dimensions of the shape (19, 19, 5, 85) encoding. So the output of the Deep CNN is (19, 19, 425).
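This flattening is just a reshape of the last two dimensions, for example in numpy:

import numpy as np

encoding = np.zeros((19, 19, 5, 85))          # per-image YOLO encoding
flattened = encoding.reshape(19, 19, 5 * 85)  # merge the anchor and box-field dimensions
print(flattened.shape)                        # (19, 19, 425)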

Now, for each box (of each cell) we will compute the following elementwise product, multiplying the box confidence p_{c} by the 80 class probabilities (a softmax output), and extract a score that the box contains a certain class.
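A minimal numpy sketch of this score computation (random values for illustration; this mirrors the core of the course's yolo_filter_boxes):

import numpy as np

# Toy shapes for one image: 19x19 grid, 5 anchors, 80 classes
box_confidence = np.random.rand(19, 19, 5, 1)    # p_c for each box
box_class_probs = np.random.rand(19, 19, 5, 80)  # class probabilities for each box

# Elementwise product: broadcasting multiplies p_c into all 80 class probabilities
box_scores = box_confidence * box_class_probs    # (19, 19, 5, 80)

# For each box, keep the most likely class and its score
box_classes = np.argmax(box_scores, axis=-1)     # (19, 19, 5)
box_class_scores = np.max(box_scores, axis=-1)   # (19, 19, 5)

# Score-thresholding (used below) keeps only boxes whose best score clears a threshold
mask = box_class_scores >= 0.6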

Here’s one way to visualize what YOLO is predicting on an image: 
- For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes). 
- Color that grid cell according to what object that grid cell considers the most likely.

Doing this produces a picture in which each grid cell is colored according to its most likely class (the visualization image from the original post is omitted here).

IoU (Intersection over Union):

- In this exercise only, we define a box using its two corners (upper left and lower right): (x1, y1, x2, y2), rather than the midpoint and height/width.
- To calculate the area of a rectangle you multiply its height (y2 - y1) by its width (x2 - x1).
- You'll also need to find the coordinates (xi1, yi1, xi2, yi2) of the intersection of the two boxes. Remember that:
  - xi1 = maximum of the x1 coordinates of the two boxes
  - yi1 = maximum of the y1 coordinates of the two boxes
  - xi2 = minimum of the x2 coordinates of the two boxes
  - yi2 = minimum of the y2 coordinates of the two boxes

We use the convention that (0,0) is the top-left corner of an image, (1,0) is the upper-right corner, and (1,1) is the lower-right corner.

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (xi1, yi1, xi2, yi2) coordinates of the intersection of box1 and box2, and its area.
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp at 0 so that non-overlapping boxes get an intersection area of 0
    # (otherwise two negative side lengths would multiply to a positive area)
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    # Calculate the union area using the formula: Union(A,B) = A + B - Inter(A,B)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area

    # Compute the IoU
    iou = inter_area / union_area

    return iou
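A quick sanity check with two toy boxes:

box1 = (2, 1, 4, 3)   # (x1, y1, x2, y2)
box2 = (1, 2, 3, 4)
print(iou(box1, box2))  # intersection 1*1 = 1, union 4 + 4 - 1 = 7, so ~0.1429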

Non-max suppression function:

import tensorflow as tf
from keras import backend as K

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to a set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None,), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box

    Note: The "None" dimension of the output tensors will be at most max_boxes.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold, name=None)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes
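The K.variable/K.get_session calls above are TF1-era Keras. The underlying op, tf.image.non_max_suppression, also runs eagerly in TF2; a toy sanity check under that assumption:

import tensorflow as tf

boxes = tf.constant([[0., 0., 1., 1.],
                     [0., 0., 0.9, 0.9],   # heavily overlaps the first box
                     [2., 2., 3., 3.]])
scores = tf.constant([0.9, 0.8, 0.7])
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=2, iou_threshold=0.5)
print(keep.numpy())  # [0 2] -- the overlapping, lower-scored box is suppressed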

Integrating the steps into one function:

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the original image shape that boxes are scaled back to; here the default is (720., 1280.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """

    ### START CODE HERE ### 

    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions 
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    ### END CODE HERE ###

    return scores, boxes, classes
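The helpers yolo_boxes_to_corners and scale_boxes come with the course's utility code. Roughly, they do the following; this is a sketch of the idea under assumed conventions, not the exact implementations:

import numpy as np

def yolo_boxes_to_corners_sketch(box_xy, box_wh):
    # Convert (center, width/height) to corner coordinates:
    # min corner = center - size/2, max corner = center + size/2
    box_mins = box_xy - box_wh / 2.0
    box_maxes = box_xy + box_wh / 2.0
    # Assumed corner ordering (y_min, x_min, y_max, x_max), as expected by tf.image.non_max_suppression
    return np.concatenate([box_mins[..., 1:2], box_mins[..., 0:1],
                           box_maxes[..., 1:2], box_maxes[..., 0:1]], axis=-1)

def scale_boxes_sketch(boxes, image_shape):
    # Boxes come out in units of the 608x608 model input; rescale to the original image size
    height, width = image_shape
    return boxes * np.array([height, width, height, width])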

Finally, the function returns the scores, boxes, and classes.

In effect, starting from an already-trained YOLO network, we apply non-max suppression as a post-processing step to obtain the desired box information, which is then output and drawn onto the original image.
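One possible sketch of that drawing step with matplotlib (the function name and the (y1, x1, y2, x2) box format are assumptions for illustration):

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def draw_boxes_sketch(image, scores, boxes, classes, class_names):
    # image: HxWx3 array; boxes: (N, 4) array of (y1, x1, y2, x2) in pixels
    fig, ax = plt.subplots()
    ax.imshow(image)
    for score, (y1, x1, y2, x2), c in zip(scores, boxes, classes):
        ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                       fill=False, edgecolor='red', linewidth=2))
        ax.text(x1, y1, "%s %.2f" % (class_names[int(c)], score), color='red')
    plt.show()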

Summary for YOLO:

- Input image (608, 608, 3).
- The input image goes through a CNN, resulting in a (19, 19, 5, 85)-dimensional output.
- After flattening the last two dimensions, the output is a volume of shape (19, 19, 425):
  - Each cell in the 19x19 grid over the input image gives 425 numbers.
  - 425 = 5 x 85, because each cell contains predictions for 5 boxes, corresponding to 5 anchor boxes, as seen in lecture.
  - 85 = 5 + 80, where 5 is because (p_{c}, b_{x}, b_{y}, b_{h}, b_{w}) has 5 numbers, and 80 is the number of classes we'd like to detect.
- You then select only a few boxes based on:
  - Score-thresholding: throw away boxes that have detected a class with a score less than the threshold.
  - Non-max suppression: compute the Intersection over Union and avoid selecting overlapping boxes.
- This gives you YOLO's final output.
