Selective Search Algorithm: Candidate Box Generation


X_Student737


Column: Computer Vision and Image Processing. Tags: algorithm, python, artificial intelligence, matrix, geometry

Article link: https://blog.youkuaiyun.com/Twilight737/article/details/115869089



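Compared with an exhaustive sliding-window search, Selective Search takes a heuristic approach: instead of scanning every window, it groups similar neighbouring regions of the image, discards many fragmented sub-regions, and outputs candidate object regions (Region Proposals). This greatly reduces the computation needed.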

Introduction: a question to consider before learning the algorithm

Question: how can we roughly measure the similarity between two images?

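[Figure: five example images]
Suppose we have the 5 images above. What mathematical measure could we use to compute the similarity between images? How can we roughly compute how similar each of the last 4 images is to the first one?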

(1) Measuring by color. Intuitively, image 3 is dark overall and looks most similar to image 1, while image 4 is light blue throughout and differs most from image 1.

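One way to do this: for each channel, compute the distribution of pixel values over 0-255 as a histogram. With 6 bins per channel, concatenating the three channels gives an 18-dimensional color histogram. To compare two images, take the smaller of the two values in each of the 18 bins and add them up (this is known as histogram intersection). After the histograms are normalized, the sum is a color similarity between 0 and 1.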
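Written as a formula (our own notation: $h_1$ and $h_2$ are the two normalized 18-bin histograms), the color similarity is:

$$S(h_1, h_2) = \sum_{k=1}^{18} \min(h_1^k, h_2^k), \qquad \sum_{k} h_1^k = \sum_{k} h_2^k = 1 \;\Rightarrow\; 0 \le S \le 1$$

S equals 1 only when the two histograms are identical, and 0 when the images share no occupied bin.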

import numpy as np


def metric_color_similarity(image1, image2, color_bin=6):
    # One color_bin-bin histogram per channel, concatenated: 3 * 6 = 18 values
    color_hist1 = np.array([])
    color_hist2 = np.array([])

    for colour_channel in (0, 1, 2):
        c1 = image1[:, :, colour_channel]
        color_hist1 = np.concatenate([color_hist1, np.histogram(c1, color_bin, (0.0, 255.0))[0]])

        c2 = image2[:, :, colour_channel]
        color_hist2 = np.concatenate([color_hist2, np.histogram(c2, color_bin, (0.0, 255.0))[0]])

    # Normalize so that each 18-dim histogram sums to 1
    color_hist1 = color_hist1 / color_hist1.sum()
    color_hist2 = color_hist2 / color_hist2.sum()

    # Histogram intersection: add up the smaller value of each bin
    color_sim = np.minimum(color_hist1, color_hist2).sum()

    return color_sim

The color similarities of images 2, 3, 4 and 5 to image 1 are:

[Figure: color similarity of images 2 to 5 relative to image 1]
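Because the article's photos are not available here, the sketch below checks the bounds of histogram intersection on synthetic images instead (the helper names `color_histogram` and `histogram_intersection` are ours). It uses the same 6-bin-per-channel histogram as above. Identical images score 1, a black and a white image share no bin and score 0, and a half-white image scores 0.5 against a black one.

```python
import numpy as np

def color_histogram(image, color_bin=6):
    # 3 channels x color_bin bins, normalized so that all values sum to 1
    hist = np.concatenate([
        np.histogram(image[:, :, c], color_bin, (0.0, 255.0))[0] for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    # Sum of the bin-wise minimum; lies in [0, 1] for normalized histograms
    return float(np.minimum(h1, h2).sum())

# Synthetic 4 x 4 test images (assumption: stand-ins for the article's photos)
black = np.zeros((4, 4, 3), dtype=np.uint8)
white = np.full((4, 4, 3), 255, dtype=np.uint8)
half = black.copy()
half[:2] = 255  # top half white, bottom half black

for other in (black, white, half):
    print(round(histogram_intersection(color_histogram(black), color_histogram(other)), 6))
```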

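(2) Measuring by texture. Convert the RGB image to grayscale and extract an LBP (Local Binary Pattern) texture map. Compute the distribution of the texture map's values over 0-255 as a histogram. Then compare the two images' texture histograms in the same way: take the smaller value in each bin and add them up to get a texture similarity.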

import cv2
import numpy as np
import skimage.feature


def metric_texture_similarity(image1, image2, texture_bin=20):
    # LBP works on grayscale images (cv2 loads images in BGR order)
    gray_image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
    gray_image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)

    # 8 sampling points on a circle of radius 1; default LBP codes lie in 0-255
    tex_img1 = skimage.feature.local_binary_pattern(gray_image1, 8, 1.0)
    tex_img2 = skimage.feature.local_binary_pattern(gray_image2, 8, 1.0)

    texture_hist1 = np.histogram(tex_img1.flatten(), texture_bin, (0.0, 255.0))[0]
    texture_hist2 = np.histogram(tex_img2.flatten(), texture_bin, (0.0, 255.0))[0]

    # Normalize to probability distributions
    p_hist1 = texture_hist1 / texture_hist1.sum()
    p_hist2 = texture_hist2 / texture_hist2.sum()

    # Histogram intersection
    similarity = np.minimum(p_hist1, p_hist2).sum()

    return similarity
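To show what an LBP code is, here is a minimal NumPy sketch of the classic 3 × 3 variant. `lbp_3x3` is a hypothetical helper, not part of any library. Each of the 8 neighbours that is at least as bright as the centre pixel sets one bit, which gives a code between 0 and 255. As far as we know, `skimage.feature.local_binary_pattern(image, 8, 1.0)` samples the neighbours on a circle with interpolation at the diagonals and may order the bits differently, so its codes can differ from this sketch. The idea is the same.

```python
import numpy as np

def lbp_3x3(gray):
    """Classic 8-neighbour LBP on 3x3 windows (border pixels are skipped).

    Each neighbour that is >= the centre pixel sets one bit; bits are numbered
    clockwise from the top-left neighbour, so codes lie in 0..255.
    """
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    centre = g[1:h - 1, 1:w - 1]
    # (dy, dx) offsets of the 8 neighbours, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int32) << bit
    return codes

flat = np.full((3, 3), 7)                            # no texture at all
peak = np.array([[0, 0, 0], [0, 9, 0], [0, 0, 0]])   # bright centre
edge = np.array([[0, 0, 0], [9, 5, 0], [0, 0, 0]])   # only the left neighbour is brighter
print(lbp_3x3(flat), lbp_3x3(peak), lbp_3x3(edge))
```

A flat patch gets code 255 and an isolated bright pixel gets code 0. Histograms of these codes therefore describe how often each local pattern occurs.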

The LBP texture maps extracted from the five images are shown below:

[Figure: LBP texture maps of the five images]

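The texture similarities of images 2, 3, 4 and 5 to image 1 are:

[Figure: texture similarity of images 2 to 5 relative to image 1]
Although rough, these measures do work as a simple way to compute image similarity.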

1. Implementation steps of the Selective Search algorithm

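Step 1: over-segment the RGB image with the felzenszwalb algorithm.

Suppose the input is a (250, 250, 3) RGB image, as shown below:

[Figure: the 250 × 250 input image]
Pre-segment it with skimage.segmentation.felzenszwalb (the code below calls it with scale=20, sigma=0.9, min_size=10). The segmentation result is shown below:

[Figure: felzenszwalb over-segmentation result]
The 250 × 250 = 62,500 pixels are grouped into 915 segments.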

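Step 2: build a dictionary region with 915 entries, one per segment label. Each entry stores 8 values describing the pixels of that label: min_x, min_y, max_x, max_y, the label itself, the pixel count size, a color histogram and a texture histogram.

For the bounding box, initialize min_x and min_y to infinity (the code uses 0xffff) and max_x and max_y to 0. Then visit every pixel and update the four values.

For the color and texture histograms, bin the segment's RGB pixel values and LBP texture values into histograms, and normalize each histogram to sum to 1.

The resulting dictionary entries look like this:

[Figure: example entries of the region dictionary]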
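The per-pixel loop in the code below visits all 62,500 pixels in Python. As a sketch of a vectorized alternative, the hypothetical helper `region_features` builds one boolean mask per label and reads the bounding box, size and normalized color histogram from it. The texture histogram, omitted here, follows the same pattern on the LBP map. Coordinates are inclusive pixel indices, as in the article.

```python
import numpy as np

def region_features(image, labels, color_bin=6):
    """Hypothetical helper: per-label bounding box, size and colour histogram.

    image: (H, W, 3) array; labels: (H, W) integer label map.
    """
    regions = {}
    ys, xs = np.indices(labels.shape)  # row and column index of every pixel
    for l in np.unique(labels):
        mask = labels == l
        px = image[mask]  # (size, 3) pixels of this region
        hist = np.concatenate([
            np.histogram(px[:, c], color_bin, (0.0, 255.0))[0] for c in range(3)
        ]).astype(float)
        regions[int(l)] = {
            "min_x": int(xs[mask].min()), "min_y": int(ys[mask].min()),
            "max_x": int(xs[mask].max()), "max_y": int(ys[mask].max()),
            "size": int(mask.sum()),
            "hist_c": hist / hist.sum(),
        }
    return regions

labels = np.array([[0, 0, 1],
                   [0, 0, 1],
                   [2, 2, 2]])
image = np.zeros((3, 3, 3), dtype=np.uint8)
regions = region_features(image, labels)
print(regions[1])
```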
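Step 3: build the list of neighbouring pairs neighbour_couple; here it contains 2429 pairs.

Compare the 915 entries of region pairwise, and use their bounding boxes (min_x, min_y, max_x, max_y) to decide whether two regions are neighbours. In the code below, the intersect function treats two regions as neighbours when a corner of the second box lies strictly inside the first. If they are neighbours, append the pair (r1, r2) to neighbour_couple.

The resulting neighbour_couple looks like this:

[Figure: example entries of neighbour_couple]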
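The article's `intersect` only checks whether a corner of b falls strictly inside a. It therefore misses pairs where a lies inside b, boxes that cross like a plus sign, and regions that sit side by side. The sketch below compares it with a hypothetical symmetric test, `boxes_touch_or_overlap`. That test accepts boxes that overlap or are at most one pixel apart, which is a necessary (but not sufficient) condition for two regions to share a pixel edge. Switching tests would change the number of pairs reported above.

```python
def intersect(a, b):
    # Copy of the article's test: is a corner of b strictly inside a?
    return ((a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]) or
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or
            (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]))

def boxes_touch_or_overlap(a, b):
    # Hypothetical symmetric alternative on inclusive boxes: overlap or adjacency
    return (a["min_x"] <= b["max_x"] + 1 and b["min_x"] <= a["max_x"] + 1 and
            a["min_y"] <= b["max_y"] + 1 and b["min_y"] <= a["max_y"] + 1)

def box(x1, y1, x2, y2):
    return {"min_x": x1, "min_y": y1, "max_x": x2, "max_y": y2}

big, small = box(0, 0, 10, 10), box(2, 2, 3, 3)
wide, tall = box(0, 4, 10, 6), box(4, 0, 6, 10)
left, right = box(0, 0, 4, 9), box(5, 0, 9, 9)  # side by side, sharing an edge

for a, b in [(big, small), (small, big), (wide, tall), (left, right)]:
    print(intersect(a, b), boxes_touch_or_overlap(a, b))
```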
Step 4: build the similarity dictionary sim_dictionary. For each of the 2429 neighbouring pairs, compute the similarity and store it as (i, j): sim.

The similarity between regions i and j combines the following four measures:

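(1) Color similarity

$$s_{\mathrm{color}}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k)$$

Here $C_i = \{c_i^1, \dots, c_i^n\}$ is the normalized color histogram of region $r_i$. The code below uses 6 bins per channel, so $n = 18$.

(2) Texture similarity

$$s_{\mathrm{texture}}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k)$$

Here $T_i = \{t_i^1, \dots, t_i^n\}$ is the normalized LBP texture histogram of $r_i$. The code below uses $n = 10$ bins.

(3) Size similarity

Size is the number of pixels in a region. The size similarity is 1 minus the fraction of the image covered by the two regions together. This makes small regions merge first and keeps a single large region from swallowing the small regions around it.

$$s_{\mathrm{size}}(r_i, r_j) = 1 - \frac{\mathrm{size}(r_i) + \mathrm{size}(r_j)}{\mathrm{size}(im)}$$

(4) Shape (fill) similarity

Shape similarity (called fill similarity in the original Selective Search paper) measures how well two regions "fit" together. Let $BB_{ij}$ be the tightest rectangle enclosing both regions. The smaller the gap between its area and the sum of the two region sizes, the better the fit.

$$s_{\mathrm{fill}}(r_i, r_j) = 1 - \frac{\mathrm{size}(BB_{ij}) - \mathrm{size}(r_i) - \mathrm{size}(r_j)}{\mathrm{size}(im)}$$

Finally, the four similarities are summed to give the similarity of the pair (i, j):

$$s(r_i, r_j) = s_{\mathrm{color}}(r_i, r_j) + s_{\mathrm{texture}}(r_i, r_j) + s_{\mathrm{size}}(r_i, r_j) + s_{\mathrm{fill}}(r_i, r_j)$$

[Figure: similarity values computed for some neighbouring region pairs]
These are the similarities computed for some neighbouring regions, and the values look reasonable.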
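To make the four terms concrete, here is a worked example on a hypothetical 10 × 10 image (100 pixels) with two 2 × 5 strips side by side. The short 3-bin and 2-bin histograms are made up for illustration. `similarity_terms` mirrors `calc_similarity` but returns the terms separately, and it treats bounding boxes as inclusive pixel ranges (hence the +1). The strips exactly fill their joint 4 × 5 bounding box, so the fill term is 1.0. Together they cover 20% of the image, so the size term is 0.8.

```python
import numpy as np

def similarity_terms(r1, r2, im_size):
    # The four terms of the region similarity, returned separately
    s_color = float(np.minimum(r1["hist_c"], r2["hist_c"]).sum())
    s_texture = float(np.minimum(r1["hist_t"], r2["hist_t"]).sum())
    s_size = 1.0 - (r1["size"] + r2["size"]) / im_size
    # Area of the joint bounding box; max_* are inclusive pixel indices
    bb = (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]) + 1) * \
         (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]) + 1)
    s_fill = 1.0 - (bb - r1["size"] - r2["size"]) / im_size
    return {"color": s_color, "texture": s_texture, "size": s_size, "fill": s_fill}

r1 = {"min_x": 0, "max_x": 1, "min_y": 0, "max_y": 4, "size": 10,
      "hist_c": np.array([0.5, 0.5, 0.0]), "hist_t": np.array([1.0, 0.0])}
r2 = {"min_x": 2, "max_x": 3, "min_y": 0, "max_y": 4, "size": 10,
      "hist_c": np.array([0.25, 0.25, 0.5]), "hist_t": np.array([0.5, 0.5])}

terms = similarity_terms(r1, r2, im_size=100)
print(terms)
print(sum(terms.values()))
```

Each term lies in [0, 1], so the summed similarity lies in [0, 4].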

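Step 5: find the pair (i, j) with the highest similarity in sim_dictionary, merge the two regions into a new region t, and add t to region. Then delete every entry of sim_dictionary that involves i or j. For each former neighbour n of i or j, add the pair (t, n) with a freshly computed similarity.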

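The key of the new region t is the current largest label + 1, and its 8 values are updated as follows:

$$\mathrm{min\_x}(r_t) = \min(\mathrm{min\_x}(r_i), \mathrm{min\_x}(r_j)), \qquad \mathrm{max\_x}(r_t) = \max(\mathrm{max\_x}(r_i), \mathrm{max\_x}(r_j))$$

and likewise for min_y and max_y;

$$\mathrm{size}(r_t) = \mathrm{size}(r_i) + \mathrm{size}(r_j)$$

$$C_t = \frac{\mathrm{size}(r_i)\, C_i + \mathrm{size}(r_j)\, C_j}{\mathrm{size}(r_i) + \mathrm{size}(r_j)}, \qquad T_t = \frac{\mathrm{size}(r_i)\, T_i + \mathrm{size}(r_j)\, T_j}{\mathrm{size}(r_i) + \mathrm{size}(r_j)}$$

and the label field is set to t.
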
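The size-weighted histogram update is exact, not an approximation. Multiplying a normalized histogram by the region size recovers the raw bin counts, so the weighted average equals the normalized histogram of the pooled pixels, and merged regions never need to revisit their pixels. The sketch below checks this on two hypothetical regions of 2 and 6 pixels (`merge_hist` is our own helper name).

```python
import numpy as np

def merge_hist(h1, size1, h2, size2):
    # Size-weighted average of two normalized histograms
    return (h1 * size1 + h2 * size2) / (size1 + size2)

# Hypothetical raw bin counts of two regions (2 and 6 pixels)
counts1 = np.array([2.0, 0.0])
counts2 = np.array([3.0, 3.0])
h1, h2 = counts1 / counts1.sum(), counts2 / counts2.sum()

merged = merge_hist(h1, counts1.sum(), h2, counts2.sum())
pooled = (counts1 + counts2) / (counts1 + counts2).sum()
print(merged, pooled)
```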
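When all neighbouring pairs in sim_dictionary have been merged, no new region is added to region and the merging process ends. Each merge appends exactly one new region and takes two regions out of further merging. So, starting from N felzenszwalb regions, the set grows to at most 2N - 1 regions (exactly 2N - 1 if all regions are linked through neighbour pairs).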
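The greedy loop of steps 4 and 5 can be sketched on a toy problem. In the hypothetical `hierarchical_grouping` below, each region has a single scalar feature (a stand-in for the histograms), and the similarity is a made-up score, -|f_i - f_j|. Only the bookkeeping follows the article's `second_calc_merge_category`. Four regions in a chain end up as 2 × 4 - 1 = 7 regions.

```python
def hierarchical_grouping(features, sizes, neighbours):
    """Hypothetical toy version of the greedy merge loop.

    features: dict label -> float (stand-in for the histograms)
    sizes: dict label -> int
    neighbours: iterable of (i, j) label pairs
    Toy similarity: -|f_i - f_j| (higher means more similar).
    """
    feats, szs = dict(features), dict(sizes)
    sim = {(i, j): -abs(feats[i] - feats[j]) for i, j in neighbours}
    merges = []
    while sim:
        i, j = max(sim, key=sim.get)   # most similar pair
        t = max(feats) + 1             # new label: largest label + 1
        szs[t] = szs[i] + szs[j]
        feats[t] = (feats[i] * szs[i] + feats[j] * szs[j]) / szs[t]
        merges.append((i, j, t))
        # Drop every pair touching i or j, then reconnect their other ends to t
        touched = [k for k in sim if i in k or j in k]
        for k in touched:
            del sim[k]
        for a, b in touched:
            n = b if a in (i, j) else a
            if n not in (i, j):
                sim[(t, n)] = -abs(feats[t] - feats[n])
    return feats, merges

feats, merges = hierarchical_grouping(
    {0: 0.0, 1: 0.1, 2: 5.0, 3: 5.2}, {0: 1, 1: 1, 2: 1, 3: 1},
    [(0, 1), (1, 2), (2, 3)])
print(merges)
print(len(feats))
```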

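Step 6: for every region in the merged region set, take min_x, min_y, max_x, max_y and apply a second round of filtering; the boxes that survive are the candidate regions. In the code below, a box is dropped if it duplicates an earlier box, if its region covers more than 1/4 or less than 1/36 of the image's pixels, if it has zero width or height, or if its width-to-height ratio exceeds 1.5 in either direction.

[Figure: coordinates of the candidate boxes]
Cropping the image at these positions gives the following candidate regions:

[Figure: cropped candidate regions]

The result shows that, for a face-detection task, Selective Search does produce candidate boxes around the target face.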
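The screening rules of step 6 can be tested in isolation. The sketch below mirrors `calc_candidate_box` (`filter_boxes`, `max_ratio` and `crop` are our own hypothetical names). Because max_x and max_y are inclusive pixel indices, `crop` adds 1 to keep the last row and column.

```python
import numpy as np

def filter_boxes(regions, total_size, max_ratio=1.5):
    """Sketch of the secondary screening, mirroring calc_candidate_box.

    regions: iterable of dicts with 'rect' = (x1, y1, x2, y2) and 'size'.
    """
    kept = set()
    for r in regions:
        x1, y1, x2, y2 = r["rect"]
        w, h = x2 - x1, y2 - y1
        if r["rect"] in kept:
            continue  # duplicate box
        if not total_size / 36 <= r["size"] <= total_size / 4:
            continue  # region too small or too large
        if w == 0 or h == 0:
            continue  # degenerate box
        if h / w > max_ratio or w / h > max_ratio:
            continue  # too elongated
        kept.add(r["rect"])
    return kept

def crop(image, rect):
    # Inclusive bounds: keep row y2 and column x2
    x1, y1, x2, y2 = rect
    return image[y1:y2 + 1, x1:x2 + 1]

regions = [
    {"rect": (0, 0, 20, 20), "size": 300},    # kept
    {"rect": (0, 0, 20, 20), "size": 300},    # duplicate
    {"rect": (0, 0, 5, 5), "size": 20},       # too small (< 3600 / 36)
    {"rect": (0, 0, 50, 50), "size": 2000},   # too large (> 3600 / 4)
    {"rect": (0, 0, 40, 10), "size": 300},    # aspect ratio 4
    {"rect": (10, 10, 10, 40), "size": 300},  # zero width
]
print(filter_boxes(regions, total_size=3600))
print(crop(np.arange(100).reshape(10, 10), (2, 3, 4, 5)).shape)
```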

2. Flowchart of the Selective Search algorithm

[Figure: Selective Search flowchart]

3. Code

import cv2
import numpy as np
import skimage.segmentation
import random
import skimage.feature


# Selective Search algorithm

# step 1: calculate the first fel_segment region
# step 2: calculate the neighbour couple
# step 3: calculate the similarity dictionary
# step 4: merge regions and calculate the second merged region
# step 5: obtain the target candidate regions by secondary screening


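def intersect(a, b):
    # Heuristic adjacency test on bounding boxes: True if a corner of b lies
    # strictly inside a. It is asymmetric and misses cases where a lies inside b
    # or where the two boxes cross without a corner of b falling inside a.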
    if (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]) or \
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or \
            (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or \
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]):
        return True
    return False


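def calc_similarity(r1, r2, size):
    # Sum of four similarities, each in [0, 1]; size is the image's pixel count

    # colour similarity: histogram intersection of the normalized colour histograms
    sim1 = 0
    for a, b in zip(r1["hist_c"], r2["hist_c"]):
        sim1 = sim1 + min(a, b)
    # texture similarity: histogram intersection of the normalized LBP histograms
    sim2 = 0
    for a, b in zip(r1["hist_t"], r2["hist_t"]):
        sim2 = sim2 + min(a, b)
    # size similarity: favours merging small regions first
    sim3 = 1.0 - (r1["size"] + r2["size"]) / size
    # fill similarity: area of the joint bounding box (max_* are inclusive, hence +1)
    rect_size = (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]) + 1) * \
                (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]) + 1)
    sim4 = 1.0 - (rect_size - r1["size"] - r2["size"]) / size
    similarity = sim1 + sim2 + sim3 + sim4

    return similarity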


def merge_region(r1, r2, t):
    # New region t: union bounding box, summed size, size-weighted average histograms
    new_size = r1["size"] + r2["size"]
    r_new = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        "hist_c": (
            r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (
            r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": t
    }
    return r_new


# Step 1: Calculate the different categories segmented by felzenszwalb algorithm

def first_calc_fel_category(image, scale, sigma, min_size):

    fel_mask = skimage.segmentation.felzenszwalb(image, scale=scale, sigma=sigma, min_size=min_size)
    # labels run from 0 to max, so the number of segments is max + 1
    print('The picture has been segmented in these categories : ', np.max(fel_mask) + 1)

    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # (250, 250)
    texture_img = skimage.feature.local_binary_pattern(gray_image, 8, 1.0)    # (250, 250)

    # fel_img = np.zeros((fel_mask.shape[0], fel_mask.shape[0], 3))
    # for i in range(np.max(fel_mask)):
    #     a = random.randint(0, 255)
    #     b = random.randint(0, 255)
    #     c = random.randint(0, 255)
    #     for j in range(fel_mask.shape[0]):
    #         for k in range(fel_mask.shape[1]):
    #             if fel_mask[j, k] == i:
    #                 fel_img[j, k, 0] = a
    #                 fel_img[j, k, 1] = b
    #                 fel_img[j, k, 2] = c
    #
    # cv2.namedWindow("image")
    # cv2.imshow('image', fel_img/255)
    # cv2.waitKey(0)
    # cv2.imwrite('felzenszwalb_img.jpg', fel_img)

    img_append = np.zeros((fel_mask.shape[0], fel_mask.shape[1], 4))  # (250, 250, 4)
    img_append[:, :, 0:3] = image
    img_append[:, :, 3] = fel_mask

    region = {}

    # calc the min_x, min_y, max_x, max_y and label of every category (0xffff plays the role of +inf)
    for y, i in enumerate(img_append):
        for x, (r, g, b, l) in enumerate(i):
            if l not in region:
                region[l] = {"min_x": 0xffff, "min_y": 0xffff, "max_x": 0, "max_y": 0, "labels": l}
            if region[l]["min_x"] > x:
                region[l]["min_x"] = x
            if region[l]["min_y"] > y:
                region[l]["min_y"] = y
            if region[l]["max_x"] < x:
                region[l]["max_x"] = x
            if region[l]["max_y"] < y:
                region[l]["max_y"] = y

    for k, v in list(region.items()):

        # calc the size feature in every category
        masked_color = image[:, :, :][img_append[:, :, 3] == k]
        region[k]["size"] = len(masked_color)

        # calc the color feature in every category
        color_bin = 6
        color_hist = np.array([])

        for colour_channel in (0, 1, 2):
            c = masked_color[:, colour_channel]
            color_hist = np.concatenate([color_hist] + [np.histogram(c, color_bin, (0.0, 255.0))[0]])

        color_hist = color_hist / sum(color_hist)
        region[k]["hist_c"] = color_hist

        # calc the texture feature in every category
        texture_bin = 10
        masked_texture = texture_img[:, :][img_append[:, :, 3] == k]
        texture_hist = np.histogram(masked_texture, texture_bin, (0.0, 255.0))[0]
        texture_hist = texture_hist / sum(texture_hist)
        region[k]["hist_t"] = texture_hist

    return region


# Step 2: Calculate the neighbour couple in the first fel_segment region

def calc_neighbour_couple(region):
    r = list(region.items())
    couples = []

    for cur, a in enumerate(r[:-1]):
        for b in r[cur + 1:]:
            if intersect(a[1], b[1]):
                couples.append((a, b))

    return couples


# Step 3: Calculate the sim_dictionary in the neighbour couple

def calc_sim_dictionary(couple, total_size):

    sim_dictionary = {}

    for (ai, ar), (bi, br) in couple:
        sim_dictionary[(ai, bi)] = calc_similarity(ar, br, total_size)

    return sim_dictionary


# step 4: merge the small regions and calculate the second merged region

def second_calc_merge_category(sim_dictionary, region,  total_size):

    while sim_dictionary != {}:
        i, j = sorted(sim_dictionary.items(), key=lambda i: i[1])[-1][0]
        t = max(region.keys()) + 1.0

        region[t] = merge_region(region[i], region[j], t)
        key_to_delete = []
        for k, v in list(sim_dictionary.items()):
            if (i in k) or (j in k):
                key_to_delete.append(k)
        for k in key_to_delete:
            del sim_dictionary[k]

        for k in [a for a in key_to_delete if a != (i, j)]:
            n = k[1] if k[0] in (i, j) else k[0]
            sim_dictionary[(t, n)] = calc_similarity(region[t], region[n], total_size)

    return region


# step 5: obtain the target candidate regions by secondary screening

def calc_candidate_box(second_region, total_size):
    category = []
    for k, r in list(second_region.items()):
        category.append({'rect': (r['min_x'], r['min_y'], r['max_x'], r['max_y']), 'size': r['size']})

    candidate_box = set()
    for r in category:
        if r['rect'] in candidate_box:
            continue

        if r['size'] > total_size / 4:
            continue

        if r['size'] < total_size / 36:
            continue

        x1, y1, x2, y2 = r['rect']

        if (x2-x1) == 0 or (y2-y1) == 0:
            continue

        if (y2-y1) / (x2-x1) > 1.5 or (x2-x1) / (y2-y1) > 1.5:
            continue

        candidate_box.add(r['rect'])

    return candidate_box


img = cv2.imread('/home/archer/CODE/PF/162.jpg')
if img is None:
    raise FileNotFoundError('cv2.imread could not read the input image')
total_size = img.shape[0] * img.shape[1]
print('The shape of the image is : ', img.shape)    # (250, 250, 3)

first_region = first_calc_fel_category(img, scale=20, sigma=0.9, min_size=10)
print('first segment categories: ', len(first_region))

neighbour_couple = calc_neighbour_couple(first_region)
print('first neighbour_couple : ', len(neighbour_couple))

sim_dictionary = calc_sim_dictionary(neighbour_couple, total_size)

second_region = second_calc_merge_category(sim_dictionary, first_region, total_size)
print('second merge categories: ', len(second_region))

candidate_box = calc_candidate_box(second_region, total_size)
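print('the candidate box we got by the selective search algorithm : ')

flag = 1
for (x1, y1, x2, y2) in candidate_box:
    # max_x / max_y are inclusive pixel indices, so add 1 to keep the last row and column
    select_img = img[y1:y2 + 1, x1:x2 + 1]
    print(x1, y1, x2, y2)
    # cv2.namedWindow("select_image")
    # cv2.imshow("select_image", select_img)
    # cv2.waitKey(0)
    img_path = '/home/archer/CODE/PF/selective/' + str(flag) + '.jpg'
    # cv2.imwrite typically returns False instead of raising, e.g. if the folder is missing
    if not cv2.imwrite(img_path, select_img):
        print('failed to write', img_path)
    flag = flag + 1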