DSST (Accurate Scale Estimation for Robust Visual Tracking): Code Walkthrough


autocyz


License: CC 4.0 BY-SA

Category: Visual Tracking. Tags: DSST, correlation filter tracking, multi-scale tracking

Article link: https://blog.youkuaiyun.com/autocyz/article/details/48651013



Accurate Scale Estimation for Robust Visual Tracking

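In an earlier post, "Correlation Filter Tracking (MOSSE)", I covered the theory behind correlation filter tracking. That paper came without code, though, so it was hard to study in depth, and pure theory makes for dry reading. Martin Danelljan later improved on MOSSE and added multi-scale tracking. The improvement is significant: the tracker ranked first in the VOT2014 challenge. The paper is titled Accurate Scale Estimation for Robust Visual Tracking and its code is called DSST, so I use DSST to refer to the method from here on. I have uploaded the paper and code to http://yunpan.cn/cHcj9c4q9VLfL (access code: af88) and to https://github.com/Lennycyz/DSST, where you can download them. Now to the main text: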

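MOSSE (Visual Object Tracking using Adaptive Correlation Filters) uses the image itself (grayscale) as the input when solving for the filter, i.e. the grayscale feature of the image. Grayscale is a simple feature and does not describe the target's texture, edges, and other shape information well, so the DSST author replaces it with the HOG feature, which is widely used in tracking and recognition.

The DSST author splits tracking into two parts: position change (translation) and scale change (scale estimation). The implementation defines two correlation filters: one (the translation filter) is dedicated to locating the target in the new frame, and the other (the scale filter) is dedicated to estimating scale.

The translation filter works the same way as in MOSSE, except that the criterion for obtaining the optimal template H differs slightly. The translation filter gives the target position in the current frame; candidate boxes at different scales are then extracted at that position and passed through the scale filter to determine the new target scale.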
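To make the idea of a translation filter concrete, here is a minimal single-channel sketch in the MOSSE style. It is an illustration under assumptions, not the DSST code: the patch is a synthetic texture instead of HOG features, the regularization value and all variable names (`A`, `B`, `true_shift`, `est_shift`) are made up for the example, and the "new frame" is simply a circular shift of the training patch.

```matlab
% Toy single-channel correlation filter (illustrative names and values).
sz = [32 32];
lambda = 1e-2;                               % assumed regularization weight
[rs, cs] = ndgrid((1:sz(1)) - floor(sz(1)/2), (1:sz(2)) - floor(sz(2)/2));
g = exp(-0.5 * (rs.^2 + cs.^2) / 2^2);       % desired output, peak at index floor(sz/2)

% Deterministic pseudo-texture standing in for a feature patch.
[r, c] = ndgrid(1:sz(1), 1:sz(2));
f = mod(7*r.^2 + 3*r.*c + 5*c.^2, 17) / 17;

F = fft2(f);
G = fft2(g);
A = G .* conj(F);                            % filter numerator
B = F .* conj(F);                            % filter denominator (energy spectrum)

% "New frame": the same texture moved 2 rows down and 3 columns right (circularly).
true_shift = [2 3];
z = circshift(f, true_shift);
response = real(ifft2(A .* fft2(z) ./ (B + lambda)));

% The displacement of the response peak is the estimated translation.
[row, col] = find(response == max(response(:)), 1);
est_shift = [row, col] - floor(sz/2);
disp(est_shift);
```

The filter is kept as a numerator and a denominator so that each can be updated separately over time, and the regularization term is added only when the response is computed.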

Implementation:

First, the pseudocode given by the author:

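Algorithm 1 Proposed tracking approach: iteration at time step $t$.
Input:
    Image $I_t$.
    Previous target position $p_{t-1}$ and scale $s_{t-1}$.
    Translation model $A^{trans}_{t-1}$, $B^{trans}_{t-1}$ and scale model $A^{scale}_{t-1}$, $B^{scale}_{t-1}$.
Output:
    Estimated target position $p_t$ and scale $s_t$.
    Updated translation model $A^{trans}_t$, $B^{trans}_t$ and scale model $A^{scale}_t$, $B^{scale}_t$.

Translation estimation:
    1: Extract a translation sample $z_{trans}$ from $I_t$ at $p_{t-1}$ and $s_{t-1}$.
    2: Compute the translation correlation $y_{trans}$ using $z_{trans}$, $A^{trans}_{t-1}$ and $B^{trans}_{t-1}$ in (6).
    3: Set $p_t$ to the target position that maximizes $y_{trans}$.

Scale estimation:
    4: Extract a scale sample $z_{scale}$ from $I_t$ at $p_t$ and $s_{t-1}$.
    5: Compute the scale correlation $y_{scale}$ using $z_{scale}$, $A^{scale}_{t-1}$ and $B^{scale}_{t-1}$ in (6).
    6: Set $s_t$ to the target scale that maximizes $y_{scale}$.

Model update:
    7: Extract samples $f_{trans}$ and $f_{scale}$ from $I_t$ at $p_t$ and $s_t$.
    8: Update the translation model $A^{trans}_t$, $B^{trans}_t$ using (5).
    9: Update the scale model $A^{scale}_t$, $B^{scale}_t$ using (5).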
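
For reference, the equations that the pseudocode cites as (5) and (6) can be written as follows. This is a transcription of the paper's notation from memory, not a quotation, so check it against the paper. Here $A^l$ and $B$ are the numerator and denominator of the filter, $F_t^l$ is the Fourier transform of feature channel $l$ of the training sample, $G_t$ that of the desired Gaussian output, $Z^l$ that of the test sample, $d$ the number of feature channels, $\eta$ the learning rate, $\lambda$ the regularization weight, and the bar denotes complex conjugation.

$$A_t^l = (1-\eta)\,A_{t-1}^l + \eta\,\overline{G_t}\,F_t^l \qquad (5a)$$

$$B_t = (1-\eta)\,B_{t-1} + \eta \sum_{k=1}^{d} \overline{F_t^k}\,F_t^k \qquad (5b)$$

$$y = \mathcal{F}^{-1}\left\{ \frac{\sum_{l=1}^{d} \overline{A^l}\,Z^l}{B + \lambda} \right\} \qquad (6)$$

The same pair of rules serves both filters: the translation filter applies them to 2-D samples, the scale filter to 1-D samples taken across scales.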

Initialization stage:

1. Obtaining the input and output of the translation filter

% desired translation filter output (gaussian shaped), bandwidth proportional to target size
% prod(X) returns the product of the elements of X
% ndgrid replicates the two vectors into the grids rs (rows) and cs (cols)
output_sigma = sqrt(prod(base_target_sz)) * output_sigma_factor;
[rs, cs] = ndgrid((1:sz(1)) - floor(sz(1)/2), (1:sz(2)) - floor(sz(2)/2));
y = exp(-0.5 * (((rs.^2 + cs.^2) / output_sigma^2)));
yf = single(fft2(y)); % convert the matrix to single precision
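
To see what this desired output looks like, the snippet can be run on toy values. The target size, the patch size, and the value of `output_sigma_factor` below are assumptions chosen for illustration; only the three formulas are taken from the code above.

```matlab
% Toy values; only the formulas come from the snippet above.
base_target_sz = [20 30];          % [height width], assumed
sz = 2 * base_target_sz;           % padded patch size, assumed
output_sigma_factor = 1/16;        % assumed value

output_sigma = sqrt(prod(base_target_sz)) * output_sigma_factor;
[rs, cs] = ndgrid((1:sz(1)) - floor(sz(1)/2), (1:sz(2)) - floor(sz(2)/2));
y = exp(-0.5 * ((rs.^2 + cs.^2) / output_sigma^2));

% Locate the maximum of the Gaussian label.
[peak_val, idx] = max(y(:));
[peak_row, peak_col] = ind2sub(sz, idx);
fprintf('sigma = %.3f, peak %.1f at (%d, %d)\n', ...
        output_sigma, peak_val, peak_row, peak_col);
```

The peak has value 1 and sits at index `floor(sz/2)` in each dimension, which is where `rs` and `cs` are both zero. The width of the Gaussian grows with the geometric mean of the target's height and width.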

Given the initial target box $(x_0,y_0,w_0,h_0)$, the author first sets up the patch to extract. Its coordinate grid is generated as follows:

[rs, cs] = ndgrid((1:sz(1)) - floor(sz(1)/2), (1:sz(2)) - floor(sz(2)/2));

Here sz(1) = $2h_0$ and sz(2) = $2w_0$. The extent of the resulting patch is:

$patch_{rows} = \{r \mid 1-h_0 \le r \le h_0\}$

$patch_{cols} = \{c \mid 1-w_0 \le c \le w_0\}$
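
A quick check of these ranges on a toy target, with `h0` and `w0` chosen arbitrarily for illustration:

```matlab
h0 = 3; w0 = 4;                               % toy target height and width, assumed
sz = [2*h0, 2*w0];
row_offsets = (1:sz(1)) - floor(sz(1)/2);     % runs from 1-h0 to h0
col_offsets = (1:sz(2)) - floor(sz(2)/2);     % runs from 1-w0 to w0
disp(row_offsets);                            % -2 -1 0 1 2 3
disp(col_offsets);                            % -3 -2 -1 0 1 2 3 4
```

Both end points are included, so each range holds exactly `2*h0` or `2*w0` offsets, and the zero offset falls at index `h0` (or `w0`) and not at the exact middle.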
