Implementing a Human Detection Model for a Smart Doorbell Based on YOLOv3-Tiny (Part 3)

技术与健康

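License: CC 4.0 BY-SA

Tags: YOLO, pytorch, artificial intelligence

This is an original article by the author. Do not repost it without the author's permission.

Article link: https://blog.youkuaiyun.com/Practicer2015/article/details/149322022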

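This article follows on from "Implementing a Human Detection Model for a Smart Doorbell Based on YOLOv3-Tiny (Part 2)". To improve the model's performance further, it focuses on anchor box clustering.

Anchor box clustering is an important step in tuning YOLOv3-Tiny: it gives the model prior boxes whose sizes better match the objects in the dataset, which tends to improve detection accuracy and speed up convergence.

This part adds a Python script that runs K-means clustering over the ground-truth boxes. Running it produces anchors suited to the dataset, which are then written into config.py.

Anchor box clustering script

The script walks through the training-set label files, collects the width and height of every ground-truth bounding box, and runs K-means on those (width, height) pairs to find a good set of anchors.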
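
The distance used by this kind of clustering can be written out explicitly. The following is a sketch under the assumption, made by the `iou` function in the script, that a box and a centroid are compared by width and height only, as if they shared the same corner. The symbols $w_b, h_b$ (box) and $w_c, h_c$ (centroid) are introduced here for illustration.

```latex
\mathrm{IoU}(b, c) =
  \frac{\min(w_b, w_c)\,\min(h_b, h_c)}
       {w_b h_b + w_c h_c - \min(w_b, w_c)\,\min(h_b, h_c)},
\qquad
d(b, c) = 1 - \mathrm{IoU}(b, c)
```

For example, a box of size (0.2, 0.4) and a centroid of size (0.1, 0.5) intersect in an area of 0.1 × 0.4 = 0.04, their union is 0.08 + 0.05 − 0.04 = 0.09, so the IoU is about 0.444 and the distance about 0.556. Unlike Euclidean distance on widths and heights, this measure does not let large boxes dominate the clustering.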

# kmeans_anchors.py
import numpy as np
import os
from tqdm import tqdm
from config import Config # Import Config to get the dataset paths

def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and all clusters.
    box: (width, height)
    clusters: (N, 2) array of (width, height) cluster centroids
    """
    box_w, box_h = box
    cluster_w, cluster_h = clusters[:, 0], clusters[:, 1]

    # Intersection area
    inter_w = np.minimum(box_w, cluster_w)
    inter_h = np.minimum(box_h, cluster_h)
    inter_area = inter_w * inter_h

    # Union area
    box_area = box_w * box_h
    cluster_area = cluster_w * cluster_h
    union_area = box_area + cluster_area - inter_area

    return inter_area / (union_area + 1e-16) # Add epsilon to avoid division by zero

def kmeans(boxes, k, dist=np.median):
    """
    Performs K-means clustering with 1 - IoU as the distance to find anchor boxes.
    boxes: (N, 2) array of (width, height) of ground truth boxes
    k: Number of clusters (anchors) to find
    dist: Aggregation function used to update each centroid (e.g., np.mean, np.median)
    """
    rows = boxes.shape[0]
    if rows < k:
        raise ValueError(f"Need at least {k} boxes to find {k} anchors, got {rows}.")
    distances = np.empty((rows, k))
    # -1 never matches a real cluster index, so the first pass cannot be
    # mistaken for convergence
    last_clusters = np.full((rows,), -1)

    # Initialize clusters from k distinct, randomly chosen boxes
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            # Calculate 1 - IoU as distance
            distances[row] = 1 - iou(boxes[row], clusters)

        # Assign each box to the closest cluster
        nearest_clusters = np.argmin(distances, axis=1)

        if (last_clusters == nearest_clusters).all():
            break # Convergence: no box changed cluster

        # Update centroids
        for cluster_idx in range(k):
            # Filter boxes belonging to current cluster
            assigned_boxes = boxes[nearest_clusters == cluster_idx]
            if len(assigned_boxes) > 0: # Keep the old centroid if the cluster is empty
                clusters[cluster_idx] = dist(assigned_boxes, axis=0)

        last_clusters = nearest_clusters

    # Sort clusters by area, smallest first
    areas = clusters[:, 0] * clusters[:, 1]
    sorted_indices = np.argsort(areas)
    clusters = clusters[sorted_indices]

    return clusters

def get_wh_from_labels(label_dir):
    """
    Extracts width and height from all YOLO format label files.
    label_dir: Directory containing .txt label files.
    Returns: (N, 2) numpy array of (width, height) for all bounding boxes.
    """
    widths_heights = []
    for filename in tqdm(os.listdir(label_dir), desc="Loading bounding box dimensions"):
        if not filename.endswith(".txt"):
            continue
        filepath = os.path.join(label_dir, filename)
        with open(filepath, 'r') as f:
            for line in f:
                parts = line.split()
                # YOLO format: class_id x_center y_center width height
                if len(parts) < 5:
                    continue # Skip blank or malformed lines
                width, height = float(parts[3]), float(parts[4])
                if width > 0 and height > 0: # Ignore degenerate boxes
                    widths_heights.append([width, height])
    # reshape keeps the (N, 2) shape even when no boxes were found
    return np.array(widths_heights).reshape(-1, 2)

if __name__ == "__main__":
    print("Starting anchor box clustering...")
    # Cluster on the training-set labels
    label_data = get_wh_from_labels(Config.TRAIN_LABEL_DIR)

    if len(label_data) == 0:
        print("No bounding box data found. Please check your label directory.")
    else:
        # YOLOv3-Tiny typically uses 6 anchors: 2 scales with 3 anchors each, so k=6
        num_anchors = 6

        # Run K-means; both dist=np.mean and dist=np.median are worth trying
        print(f"Running K-means clustering with k={num_anchors}...")
        anchors_raw = kmeans(label_data, num_anchors)

        # kmeans() already returns the anchors sorted by area, smallest first
        sorted_anchors = anchors_raw.tolist()

        print("\nGenerated anchors (normalized [0,1] scale, sorted by area):")
        for i, (w, h) in enumerate(sorted_anchors):
            print(f"  Anchor {i+1}: Width={w:.4f}, Height={h:.4f}")

        # Config.ANCHORS holds absolute pixel values (the YOLOLayer handles the
        # scaling internally), so multiply the normalized sizes by the training
        # input size, e.g. 416 for 416x416 images.
        image_size = Config.IMAGE_SIZE
        # Round to the nearest pixel and keep every side at least 1 px
        final_anchors = [[max(1, round(w * image_size)), max(1, round(h * image_size))]
                         for w, h in sorted_anchors]

        # YOLOv3-Tiny has two detection heads:
        #   YOLOLayer1 receives the P5/32 feature map (large objects)
        #   YOLOLayer2 receives the P4/16 feature map (medium objects)
        # Config.ANCHORS is laid out as [P3/8, P4/16, P5/32], but only the last
        # two entries are used here. With anchors a1..a6 sorted by area:
        #   P4/16: [a1, a2, a3]
        #   P5/32: [a4, a5, a6]
        # This three-and-three split is a simplification; adjust it if your
        # YOLOLayer or your data calls for a different assignment.
        anchors_for_p4_16 = final_anchors[:3] # Smaller anchors for P4/16 (YOLOLayer2)
        anchors_for_p5_32 = final_anchors[3:] # Larger anchors for P5/32 (YOLOLayer1)

        print("\nUpdate your config.py with these anchors:")
        print("ANCHORS = [")
        # The first entry is a placeholder that keeps the three-element layout of
        # Config.ANCHORS; alternatively, change the structure of Config.ANCHORS.
        print("    [], # Placeholder for P3/8 if not used in model")
        print(f"    {anchors_for_p4_16}, # For P4/16 scale (medium objects)")
        print(f"    {anchors_for_p5_32}, # For P5/32 scale (large objects)")
        print("]")
        print("\nRemember to verify the assignment of anchors to correct YOLOLayers based on feature map scale.")

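How to use the anchor box clustering script

Save the script: save the code above as kmeans_anchors.py.
Prepare the data: make sure the training-set label files (YOLO-format .txt files) are in the directory given by Config.TRAIN_LABEL_DIR.
Run the script: execute the following on the command line: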
```
python kmeans_anchors.py
```
Update config.py: when the script finishes, it prints the computed anchors to the console. Copy them into config.py in place of the current value of Config.ANCHORS.
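
Before pointing the script at a real dataset, it can help to confirm what a YOLO-format label file looks like. The sketch below writes two hypothetical label files into a temporary directory and reads back the width and height columns in the same way the clustering script does. The file names, the box values, and the helpers `read_wh` and `make_demo_labels` are assumptions made for illustration only.

```python
import os
import tempfile

def read_wh(label_dir):
    """Collect (width, height) pairs from YOLO-format .txt files (illustrative helper)."""
    pairs = []
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name)) as f:
            for line in f:
                parts = line.split()
                if len(parts) < 5:
                    continue  # blank or malformed line
                # Columns: class_id x_center y_center width height (all normalized)
                pairs.append((float(parts[3]), float(parts[4])))
    return pairs

def make_demo_labels(label_dir):
    """Write two made-up label files; the numbers are not from any real dataset."""
    samples = {
        "img_0001.txt": "0 0.50 0.55 0.20 0.60\n0 0.25 0.40 0.10 0.30\n",
        "img_0002.txt": "0 0.70 0.50 0.35 0.80\n\n",  # trailing blank line on purpose
    }
    for name, text in samples.items():
        with open(os.path.join(label_dir, name), "w") as f:
            f.write(text)

with tempfile.TemporaryDirectory() as tmp:
    make_demo_labels(tmp)
    demo_pairs = read_wh(tmp)

print(demo_pairs)
```

Each line of a label file describes one box, and only the last two columns matter for anchor clustering. Because the values are normalized to the image size, the clustering result is also normalized until it is multiplied by the input size.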

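Note in particular:
- YOLOv3-Tiny normally has only two output layers (YOLOLayer). In my model.py implementation, self.yolo_layer1 corresponds to P5/32 (large objects) and self.yolo_layer2 corresponds to P4/16 (medium objects).
- So when the script prints the anchors, assign the three larger ones to the P5/32 entry of Config.ANCHORS and the three smaller ones to the P4/16 entry.
For example, if the six sorted anchors printed by kmeans_anchors.py are [(a1_w, a1_h), ..., (a6_w, a6_h)], then: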
```
# config.py
class Config:
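    # ... other settings ...
    ANCHORS = [
        [],  # Placeholder: YOLOv3-Tiny has only two detection heads, P4/16 and P5/32
        [(a1_w, a1_h), (a2_w, a2_h), (a3_w, a3_h)],  # P4/16 (medium objects, usually the three smaller anchors)
        [(a4_w, a4_h), (a5_w, a5_h), (a6_w, a6_h)],  # P5/32 (large objects, usually the three larger anchors)
    ]
    # ... other settings ...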
```
Adjust these values to match the script's actual output.
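
To see how pixel-valued anchors relate to the two feature maps, the sketch below expresses them in units of grid cells. It assumes a 416x416 input and the usual YOLOv3-Tiny strides of 16 and 32. The anchor values are the stock YOLOv3-Tiny defaults, used only as stand-ins for the script's output, and `to_grid_units` is a hypothetical helper rather than a function from model.py.

```python
IMAGE_SIZE = 416  # assumed training input size

# Stock YOLOv3-Tiny anchors in pixels, standing in for the clustered ones
ANCHORS_P4_16 = [(10, 14), (23, 27), (37, 58)]
ANCHORS_P5_32 = [(81, 82), (135, 169), (344, 319)]

def to_grid_units(anchors, stride):
    """Express pixel anchors in units of feature-map cells for a given stride."""
    return [(w / stride, h / stride) for w, h in anchors]

for name, anchors, stride in [("P4/16", ANCHORS_P4_16, 16),
                              ("P5/32", ANCHORS_P5_32, 32)]:
    grid = IMAGE_SIZE // stride  # number of cells along one side of the feature map
    scaled = to_grid_units(anchors, stride)
    print(f"{name}: {grid}x{grid} grid, anchors in cells = {scaled}")
```

With these assumptions the P4/16 head works on a 26x26 grid and the P5/32 head on a 13x13 grid, which is why the larger anchors are usually paired with the coarser P5/32 map.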

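What this achieves

Matching the data distribution: the default anchors were derived from the COCO dataset, a general-purpose object detection dataset. The size distribution of people in a smart doorbell scene may differ considerably from COCO. With clustering, the anchors follow the actual aspect ratios and sizes of the people in your dataset more closely, which makes these shapes easier for the model to learn.
Improving recall and precision: when the anchors fit the real objects better, the model can more easily predict high-quality bounding boxes with higher IoU, which in turn tends to raise recall and mean average precision (mAP).
Faster convergence: better anchors act as better priors, so the model can make reasonable predictions early in training, which tends to speed up convergence.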
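
One way to check how well a set of anchors matches your own data is to take, for each ground-truth box, the IoU with its best-matching anchor and average the result. The sketch below does this for two anchor sets on a handful of made-up tall, narrow boxes. The box values, the candidate anchors, and the function names are illustrative assumptions, and the stock YOLOv3-Tiny anchors are normalized by an assumed input size of 416.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """Pairwise IoU of (N, 2) boxes and (K, 2) anchors, compared by width/height only."""
    boxes = np.asarray(boxes, dtype=float)[:, None, :]      # (N, 1, 2)
    anchors = np.asarray(anchors, dtype=float)[None, :, :]  # (1, K, 2)
    inter = np.minimum(boxes, anchors).prod(axis=2)          # (N, K)
    union = boxes.prod(axis=2) + anchors.prod(axis=2) - inter
    return inter / union

def mean_best_iou(boxes, anchors):
    """Average over all boxes of the IoU with the best-matching anchor."""
    return float(wh_iou(boxes, anchors).max(axis=1).mean())

# Made-up normalized (width, height) boxes, roughly person-shaped
demo_boxes = [(0.10, 0.30), (0.12, 0.35), (0.20, 0.60), (0.22, 0.65), (0.35, 0.90)]

# Stock YOLOv3-Tiny anchors in pixels, normalized by the assumed 416 input size
default_anchors = np.array(
    [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]
) / 416

# Hypothetical anchors with person-like aspect ratios
candidate_anchors = [(0.05, 0.15), (0.11, 0.32), (0.15, 0.45),
                     (0.21, 0.62), (0.28, 0.75), (0.34, 0.88)]

print(f"default anchors:   {mean_best_iou(demo_boxes, default_anchors):.3f}")
print(f"candidate anchors: {mean_best_iou(demo_boxes, candidate_anchors):.3f}")
```

The numbers printed here say nothing about a real dataset. To use the idea in practice, replace `demo_boxes` with the array returned by `get_wh_from_labels` and `candidate_anchors` with the normalized output of `kmeans`.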

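With this anchor clustering step integrated, the YOLOv3-Tiny model should adapt better to the person sizes typical of a smart doorbell scene, which can further improve its detection performance.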