Corresponding paper: Fast Tracking via Spatio-Temporal Context Learning, by Kaihua Zhang, Lei Zhang, Ming-Hsuan Yang, and David Zhang.
Authors' project page: http://www4.comp.polyu.edu.hk/~cslzhang/STC/STC.htm
Both the code and the paper can be viewed and downloaded from the project page.
This paper proposes a simple yet very effective visual tracking method. Even more appealing, it is fast: the authors' MATLAB implementation reaches 350 fps on an i7 machine.
The key idea of the paper is the use of spatio-temporal context information. Within a Bayesian framework, the method models the spatio-temporal relationship between the target and its local context region, capturing the statistical correlation between the low-level features of the target and of its surroundings. Combining this spatio-temporal relation with the focus-of-attention property of biological vision systems, it evaluates a confidence map for the target's location in the new frame; the position of maximum confidence is taken as the target's new location. Moreover, both the learning of the spatio-temporal model and the detection of the target are carried out with the FFT (fast Fourier transform), so learning and detection are both fast.
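In code, this learn-via-FFT, detect-via-FFT cycle is only a handful of lines. Here is a minimal, self-contained sketch on random stand-in data (the variable names deliberately mirror the full demo later in this post; it illustrates the idea and is not the authors' code):

% Minimal sketch of the STC learn/detect cycle on dummy stand-in data.
sz   = [64 64];
conf = rand(sz); conf = conf/sum(conf(:));       % stand-in for the confidence map, Eq.(6)
contextprior = rand(sz);                         % stand-in for the context prior, Eq.(4)
rho  = 0.075;                                    % learning rate from the paper
% Learning, Eq.(9): in the frequency domain, deconvolution is a pointwise division
conff = fft2(conf);
hscf  = conff ./ (fft2(contextprior) + eps);     % FFT of the spatial context model h^{sc}
% Temporal update, Eq.(12): exponential moving average of the model
Hstcf = hscf;                                    % first frame: initialize with h^{sc}
Hstcf = (1 - rho) * Hstcf + rho * hscf;
% Detection, Eq.(11): confidence map = inverse FFT of a pointwise product
confmap    = real(ifft2(Hstcf .* fft2(contextprior)));
[row, col] = find(confmap == max(confmap(:)), 1);% new target location = maximum response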
I downloaded the authors' source code and tried it out.
My test environment: Win7 64-bit + MATLAB 2015a.
No configuration is needed: download the source code and run demoSTC.m directly.
The tracking results are shown below (four frames captured):
The results look quite good.
I then modified the code so that the object to track can be selected manually; the modified demo is as follows:
% Demo for paper "Fast Tracking via Spatio-Temporal Context Learning," Kaihua Zhang, Lei Zhang, Ming-Hsuan Yang and David Zhang
% Paper can be available from http://arxiv.org/pdf/1311.1939v1.pdf
% Implemented by Kaihua Zhang, Dept. of Computing, HK PolyU.
% Email: zhkhua@gmail.com
% Date: 11/24/2013
%%
clc;close all;clear all;
%%
fftw('planner','patient');
%% set path
addpath('./data');
img_dir = dir('./data/*.jpg');
%% initialization
%initstate = [161,65,75,95];%initial rectangle [x,y,width, height]
% read and display the first frame
im = imread('./data/0001.jpg');
figure,imshow(im);
% manually select the object to track
[I,initstate] = imcrop(); %RECT:[XMIN YMIN WIDTH HEIGHT]
pos = [initstate(2)+initstate(4)/2 initstate(1)+initstate(3)/2];%center of the target
target_sz = [initstate(4) initstate(3)];%initial size of the target
%% parameters according to the paper
padding = 1; %extra area surrounding the target
rho = 0.075; %the learning parameter \rho in Eq.(12)
sz = floor(target_sz * (1 + padding));% size of context region
%% parameters of scale update. See Eq.(15)
scale = 1;%initial scale ratio
lambda = 0.25;% \lambda in Eq.(15)
num = 5; % number of average frames
%% store pre-computed confidence map
alpha = 2.25; %parameter \alpha in Eq.(6)
[rs, cs] = ndgrid((1:sz(1)) - floor(sz(1)/2), (1:sz(2)) - floor(sz(2)/2));
dist = rs.^2 + cs.^2;
conf = exp(-0.5 / alpha * sqrt(dist));%confidence map function Eq.(6)
conf = conf/sum(sum(conf));% normalization
conff = fft2(conf); %transform conf to frequency domain
%% store pre-computed weight window
hamming_window = hamming(sz(1)) * hann(sz(2))';
sigma = mean(target_sz);% initial \sigma_1 for the weight function w_{\sigma} in Eq.(11)
window = hamming_window.*exp(-0.5 / (sigma^2) *(dist));% use Hamming window to reduce frequency effect of image boundary
window = window/sum(sum(window));%normalization
%%
for frame = 1:numel(img_dir)
    sigma = sigma*scale;% update scale in Eq.(15)
    window = hamming_window.*exp(-0.5 / (sigma^2) *(dist));%update weight function w_{sigma} in Eq.(11)
    window = window/sum(sum(window));%normalization
    %load image
    img = imread(img_dir(frame).name);
    if size(img,3) > 1
        im = rgb2gray(img);
    else
        im = img;% frames may already be grayscale
    end
    contextprior = get_context(im, pos, sz, window);% the context prior model Eq.(4)
    %%
    if frame > 1
        %calculate response of the confidence map at all locations
        confmap = real(ifft2(Hstcf.*fft2(contextprior))); %Eq.(11)
        %target location is at the maximum response
        [row, col] = find(confmap == max(confmap(:)), 1);
        pos = pos - sz/2 + [row, col];
        contextprior = get_context(im, pos, sz, window);
        conftmp = real(ifft2(Hstcf.*fft2(contextprior)));
        maxconf(frame-1) = max(conftmp(:));
        %% update scale by Eq.(15)
        if (mod(frame,num+2)==0)
            scale_curr = 0;
            for kk = 1:num
                scale_curr = scale_curr + sqrt(maxconf(frame-kk)/maxconf(frame-kk-1));
            end
            scale = (1-lambda)*scale + lambda*(scale_curr/num);%update scale
        end
        %%
    end
    %% update the spatial context model h^{sc} in Eq.(9)
    contextprior = get_context(im, pos, sz, window);
    hscf = conff./(fft2(contextprior)+eps);% Note the hscf is the FFT of hsc in Eq.(9)
    %% update the spatio-temporal context model by Eq.(12)
    if frame == 1 %first frame, initialize the spatio-temporal context model as the spatial context model
        Hstcf = hscf;
    else
        %update the spatio-temporal context model H^{stc} by Eq.(12)
        Hstcf = (1 - rho) * Hstcf + rho * hscf;% Hstcf is the FFT of Hstc in Eq.(12)
    end
    %% visualization
    target_sz([2,1]) = target_sz([2,1])*scale;% update object size
    rect_position = [pos([2,1]) - (target_sz([2,1])/2), (target_sz([2,1]))];
    imagesc(uint8(img))
    colormap(gray)
    rectangle('Position',rect_position,'LineWidth',4,'EdgeColor','r');
    hold on;
    text(5, 18, strcat('#',num2str(frame)), 'Color','y', 'FontWeight','bold', 'FontSize',20);
    set(gca,'position',[0 0 1 1]);
    pause(0.001);
    hold off;
    drawnow;
end
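One usage note on the modified demo: with imcrop, drag a rectangle over the target and then double-click inside it to confirm the selection; the returned rect is in [x y width height] form, which is exactly what initstate expects.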
Here is the target I selected:
The tracking result is as follows:
The result is poor.
I then tried another image sequence, with the following result:
The result is still poor.
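The screenshots only show where the box ends up; to see when tracking falls apart, one can plot the maxconf array that the demo above already fills in during the loop. A sharp, sustained drop in this curve usually marks the frame where the tracker loses the target (this is just a post-hoc reading of an existing variable, run after the loop finishes):

% Plot the recorded per-frame maximum confidence;
% maxconf(k) was stored for frame k+1 inside the loop.
figure;
plot(2:numel(maxconf)+1, maxconf, 'LineWidth', 1.5);
xlabel('frame'); ylabel('max confidence');
title('Per-frame maximum STC confidence');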
Summary: the tracker appears to be very sensitive to how the initial rectangle is chosen; with a poor initialization the tracking quality degrades badly, so its robustness is limited.
Since I have not read the paper or studied the code carefully, I cannot say for certain what causes this lack of robustness.
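That said, one low-level cause is worth ruling out before blaming the algorithm itself: imcrop can return a rectangle with fractional coordinates that extends past the image border, and a very small box leaves STC almost no context region to learn from. A hypothetical helper (sanitize_rect is my own name, not part of the authors' code), saved as sanitize_rect.m and called right after imcrop, would guard against this:

function initstate = sanitize_rect(initstate, imsize)
% Sanitize the [x y w h] rectangle returned by imcrop
% (hypothetical helper, not part of the authors' code).
initstate = round(initstate);                    % imcrop may return fractional values
initstate(1:2) = max(initstate(1:2), 1);         % clamp the top-left corner into the image
initstate(3) = min(initstate(3), imsize(2) - initstate(1)); % clamp width to the image
initstate(4) = min(initstate(4), imsize(1) - initstate(2)); % clamp height to the image
if any(initstate(3:4) < 20)                      % 20 px is an arbitrary sanity threshold
    warning('Selected box is very small; STC will have little context to learn from.');
end
end

With this in place, the selection in the demo becomes [I,initstate] = imcrop(); initstate = sanitize_rect(initstate, size(im));. Whether this fully explains the sensitivity I saw is an open question; it only removes one obvious failure mode.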