Bounding Box Regression Explained

yllifesong

Bounding-Box regression

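I have recently been reading detection papers, from R-CNN, Fast R-CNN, Faster R-CNN, YOLO, R-FCN, and SSD to YOLO9000 from this year's CVPR. The loss functions in all of these papers include a bounding-box regression term, but apart from R-CNN, which describes it in detail, the others mention it only in passing or simply cite R-CNN and write down the loss. Of the five questions below, the first three are widely explained online; my answers to the last two are conclusions I reached after reading many papers.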

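Why do we need bounding-box regression?
What is bounding-box regression?
How is bounding-box regression done?
Why are the coordinate and width/height targets designed in this form?
Why can bounding-box regression only fine-tune, working only when the Proposal is close to the Ground Truth?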

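Why do we need bounding-box regression?

Here I borrow the explanation of my senior labmate Wang Bin, illustrated in the figure below:

In the figure above, the green box is the Ground Truth and the red box is a Region Proposal extracted by Selective Search. Even if the classifier recognizes the red box as an airplane, the box is poorly localized (IoU < 0.5), so the airplane in this image effectively counts as not correctly detected. If we could fine-tune the red box so that the adjusted window is closer to the Ground Truth, the localization would be more accurate. That is exactly what bounding-box regression does: it fine-tunes the window.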
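To make the IoU < 0.5 criterion concrete, here is a minimal sketch of how IoU can be computed. The corner format (x1, y1, x2, y2), the function name `iou`, and the two example boxes are my own assumptions for illustration; they are not taken from the figure.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (50, 50, 150, 150)  # hypothetical green box
proposal = (90, 90, 190, 190)      # hypothetical red box, shifted by 40 px
print(iou(proposal, ground_truth))  # well below 0.5, so it would not count as a hit
```

Even a 40-pixel shift on a 100-pixel box drops the IoU far below 0.5, which is why a correctly classified but poorly placed proposal still fails.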

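What is bounding-box regression?

Continuing with my labmate's explanation: a window is usually represented by a four-dimensional vector $(x, y, w, h)$, where, as in R-CNN, $(x, y)$ is the center of the window and $(w, h)$ are its width and height. Below, $P$ denotes the original Proposal, $G$ the Ground Truth, and $\hat{G}$ the regressed window.

The goal of bounding-box regression is: given $(P_x, P_y, P_w, P_h)$, find a mapping $f$ such that $f(P_x, P_y, P_w, P_h) = (\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h)$ with $(\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h) \approx (G_x, G_y, G_w, G_h)$.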

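How is bounding-box regression done?

What transformation takes the window $P$ in Figure 2 to the window $\hat{G}$? A simple approach is translation plus scaling.

First translate by $(\Delta x, \Delta y)$, with $\Delta x = P_w d_x(P)$ and $\Delta y = P_h d_y(P)$. These are equations (1) and (2) in R-CNN:

$$\hat{G}_x = P_w d_x(P) + P_x, \qquad (1)$$

$$\hat{G}_y = P_h d_y(P) + P_y, \qquad (2)$$

Then scale by $(S_w, S_h)$, with $S_w = \exp(d_w(P))$ and $S_h = \exp(d_h(P))$. These are equations (3) and (4) in R-CNN:

$$\hat{G}_w = P_w \exp(d_w(P)), \qquad (3)$$

$$\hat{G}_h = P_h \exp(d_h(P)), \qquad (4)$$

Looking at (1)-(4), bounding-box regression amounts to learning the four transformations $d_x(P), d_y(P), d_w(P), d_h(P)$. The next step is to design an algorithm that produces these four mappings.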
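The translate-then-scale step can be sketched in a few lines. This assumes the R-CNN form of equations (1)-(4) (center shift proportional to the proposal size, scale factor given by exp), with boxes stored as (center x, center y, width, height); the function name `apply_deltas` and the numbers are made up for illustration.

```python
import math


def apply_deltas(proposal, deltas):
    """Apply (d_x, d_y, d_w, d_h) to a proposal (P_x, P_y, P_w, P_h)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    gx = pw * dx + px        # (1) shift the centre by a width-relative amount
    gy = ph * dy + py        # (2) shift the centre by a height-relative amount
    gw = pw * math.exp(dw)   # (3) the scale factor exp(d_w) is always positive
    gh = ph * math.exp(dh)   # (4) the scale factor exp(d_h) is always positive
    return gx, gy, gw, gh


# Hypothetical proposal and predicted deltas.
print(apply_deltas((100.0, 100.0, 50.0, 80.0), (0.1, -0.05, math.log(1.2), 0.0)))
```

With all four deltas equal to zero the proposal is returned unchanged, which matches the idea that the regressor only fine-tunes the window.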

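Linear regression means: given an input feature vector $X$, learn a set of parameters $W$ such that the regressed value is very close to the true value $Y$ (the Ground Truth), i.e. $Y \approx WX$. So what are the input and the output in the bounding-box case?

Input:

Region Proposal $\rightarrow P = (P_x, P_y, P_w, P_h)$

Output:

The translation and scaling to apply, $d_x(P), d_y(P), d_w(P), d_h(P)$.
The values these predictions should match are the translation $(t_x, t_y)$ and scaling $(t_w, t_h)$ actually required, computed from the Ground Truth and the Proposal. These are equations (6)-(9) in R-CNN: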

$$t_x = (G_x - P_x) / P_w, \qquad (6)$$

$$t_y = (G_y - P_y) / P_h, \qquad (7)$$

$$t_w = \log(G_w / P_w), \qquad (8)$$

$$t_h = \log(G_h / P_h), \qquad (9)$$
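As a quick check of equations (6)-(9), here is a small sketch that computes the four regression targets from a Proposal and a Ground Truth box. Boxes are assumed to be (center x, center y, width, height); the function name `encode_targets` and the example boxes are hypothetical.

```python
import math


def encode_targets(proposal, ground_truth):
    """Compute (t_x, t_y, t_w, t_h) from P and G, following equations (6)-(9)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw      # (6) centre offset relative to the proposal width
    ty = (gy - py) / ph      # (7) centre offset relative to the proposal height
    tw = math.log(gw / pw)   # (8) log of the width ratio
    th = math.log(gh / ph)   # (9) log of the height ratio
    return tx, ty, tw, th


# Hypothetical boxes: G is shifted slightly and is 20% wider than P.
print(encode_targets((100.0, 100.0, 50.0, 80.0), (105.0, 96.0, 60.0, 80.0)))
```

When the Proposal already coincides with the Ground Truth, all four targets are zero.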

The prediction can then be written as $d_*(P) = w_*^T \phi_5(P)$, where $*$ stands for one of $x, y, w, h$ (one regressor per transformation), $\phi_5(P)$ is the feature vector of the Proposal (the pool5 feature in R-CNN), and $w_*$ is the parameter vector to learn. We want the prediction to be as close as possible to the target $t_*$, which gives the loss function:

$$Loss = \sum_i^N \left( t_*^i - \hat{w}_*^T \phi_5(P^i) \right)^2$$

The optimization objective is:

$$W_* = \operatorname{argmin}_{\hat{w}_*} \sum_i^N \left( t_*^i - \hat{w}_*^T \phi_5(P^i) \right)^2 + \lambda \lVert \hat{w}_* \rVert^2$$

$w_*$ can then be obtained by gradient descent or least squares.
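The regularized objective above is ordinary ridge regression, so for one target (say $t_x$) it has a closed-form solution via the normal equations $(X^T X + \lambda I) w = X^T t$. Below is a rough pure-Python sketch of that closed form. The function name `ridge_fit`, the two-dimensional toy features standing in for $\phi_5(P^i)$, and the toy targets are all my own assumptions; real pool5 features are far larger, and one would normally use a linear algebra library.

```python
def ridge_fit(features, targets, lam):
    """Solve (X^T X + lam * I) w = X^T t by Gaussian elimination."""
    d = len(features[0])
    # Build the normal-equation matrix A and right-hand side b.
    a = [[sum(x[i] * x[j] for x in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(features, targets)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, d):
            factor = a[row][col] / a[col][col]
            for k in range(col, d):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(a[i][k] * w[k] for k in range(i + 1, d))) / a[i][i]
    return w


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy stand-ins for phi_5(P^i)
targets = [0.2, -0.1, 0.1]                       # toy t_x values
print(ridge_fit(features, targets, lam=0.0))     # plain least squares
print(ridge_fit(features, targets, lam=1.0))     # weights shrink toward zero
```

Increasing `lam` pulls the weights toward zero, which is the effect of the $\lambda \lVert \hat{w}_* \rVert^2$ term.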

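Why are the coordinate and width/height targets designed in this form?

Here I want to focus on why $t_x, t_y$ are divided by the width and height, and why $t_w, t_h$ take a log form!

First, CNN features are scale-invariant; take Figure 3 as an example:

Dividing the x, y offsets by width and height

The two people in the figure above appear at different scales, but because both are people, the features we obtain are the same. Suppose the features are $\phi_1$ and $\phi_2$; a good feature should then satisfy $\phi_1 = \phi_2$. If we learned the raw coordinate difference directly, the mapping $f$ would have to satisfy $f(\phi_1) = x_1 - p_1$ and $f(\phi_2) = x_2 - p_2$, where $x_i$ and $p_i$ are the x coordinates of the $i$-th Ground Truth and Proposal, yet $x_1 - p_1 \neq x_2 - p_2$ because the boxes differ in size. In other words, the same input $x$ would correspond to multiple outputs $y$, which clearly violates the definition of a function. Bounding-box regression learns a regression function; if the target does not even satisfy the definition of a function, nothing can be learned. Dividing the offset by the Proposal's width and height removes this dependence on scale.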
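A tiny numeric sketch of this argument: take one Proposal/Ground Truth pair, magnify the whole scene by 2x, and compare the raw x offset with the width-normalized one. The boxes are (center x, center y, width, height); the helper names and the numbers are hypothetical.

```python
def scale_box(box, s):
    """Magnify an (x, y, w, h) box, centre and size alike, by factor s."""
    return tuple(v * s for v in box)


def raw_dx(proposal, ground_truth):
    """Unnormalized x offset G_x - P_x."""
    return ground_truth[0] - proposal[0]


def normalized_dx(proposal, ground_truth):
    """Width-normalized x offset (G_x - P_x) / P_w."""
    return (ground_truth[0] - proposal[0]) / proposal[2]


p1, g1 = (100.0, 100.0, 40.0, 80.0), (110.0, 100.0, 44.0, 80.0)
p2, g2 = scale_box(p1, 2.0), scale_box(g1, 2.0)  # same scene, twice as large

print(raw_dx(p1, g1), raw_dx(p2, g2))                # differ across scales
print(normalized_dx(p1, g1), normalized_dx(p2, g2))  # identical across scales
```

If the two scenes produce the same feature, only the normalized offset gives the regressor a single consistent target.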

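Log form for width and height

We want a scale factor, which must be greater than 0. How do we guarantee that the scale obtained from the learned $t_w, t_h$ is positive? The intuitive choice is the exp function, as in equations (3) and (4); inverting it gives the log.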
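To see why exp is a convenient choice, the sketch below compares it with a naive linear scale factor $(1 + d)$. The linear variant is purely a hypothetical alternative I made up for contrast; it is not something any of the papers propose.

```python
import math


def width_exp(pw, d):
    """Exp-based scaling: the factor exp(d) is positive for every real d."""
    return pw * math.exp(d)


def width_linear(pw, d):
    """Hypothetical alternative for contrast: the factor (1 + d) can go negative."""
    return pw * (1.0 + d)


for d in (-2.0, -0.5, 0.0, 0.5):
    print(d, width_exp(50.0, d), width_linear(50.0, d))
```

An unconstrained regressor can output any real number, and only the exp form maps every such output to a valid positive width.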

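Why can the transformation be treated as linear when the IoU is large?

When the input Proposal differs only slightly from the Ground Truth (R-CNN uses IoU > 0.6), the transformation can be regarded as linear, so linear regression can be used to model the fine-tuning of the window. Otherwise the trained regression model will not work: when the Proposal is far from the GT, the problem becomes a complex nonlinear one, and modeling it with linear regression is clearly unreasonable. Let me explain:

The log function is clearly not linear, so why can the transformation be regarded as linear when the Proposal and the Ground Truth are close? Remember this formula from first-year calculus?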

$$\lim_{x \to 0} \frac{\log(1 + x)}{x} = 1, \quad \text{i.e. } \log(1 + x) \approx x \text{ for } x \text{ close to } 0$$

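Now look back at equation (8):

$$t_w = \log(G_w / P_w) = \log\left(\frac{G_w + P_w - P_w}{P_w}\right) = \log\left(1 + \frac{G_w - P_w}{P_w}\right)$$

Only when $G_w - P_w$ is close to 0 (relative to $P_w$) does this become approximately linear, $t_w \approx (G_w - P_w) / P_w$; the same holds for $t_h$. That is, the widths and heights of the Proposal and the Ground Truth must be approximately equal.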
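The quality of this approximation is easy to check numerically. The sketch below compares the exact log target with its linear approximation $(G_w - P_w) / P_w$ for a fixed Proposal width and increasingly different Ground Truth widths. The function names and the widths are made-up illustration values.

```python
import math


def log_target(gw, pw):
    """Exact t_w = log(G_w / P_w)."""
    return math.log(gw / pw)


def linear_approx(gw, pw):
    """First-order approximation (G_w - P_w) / P_w of the log target."""
    return (gw - pw) / pw


pw = 100.0
for gw in (105.0, 120.0, 200.0, 400.0):
    exact, approx = log_target(gw, pw), linear_approx(gw, pw)
    print(f"G_w={gw:5.0f}  log={exact:.3f}  linear={approx:.3f}  gap={approx - exact:.3f}")
```

The gap is negligible when the widths differ by a few percent and grows quickly as they diverge, which is the sense in which the regression only works as a fine-tuning step.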

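On the requirement that the IoU exceed a given threshold, I do not agree with the authors. In my view, it is enough for the Region Proposal and the Ground Truth to have similar widths and heights for the regression condition to hold. The x, y position is not subject to much restriction. This can be seen from YOLOv2: in the original bounding-box regression the x, y offsets can be relatively large, which is one of the things YOLOv2 improves. For details, see my blog post on YOLOv2.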

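Summary

Much of this post draws on my labmate's answer in the Caffe community. I did not originally want to retype it all, but my obsession with tidy formatting made me type out every LaTeX formula by hand, and of course I also wanted it to be easy on the eyes. For the later parts about the formulas there is very little material available, so those are my own conclusions from reading papers. If anything is wrong, please leave a comment and correct me.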


