
Object tracking

Video tracking is the process of estimating the position of an object over time using a camera. A tracking algorithm analyzes the video frames and estimates the position of the object.

Simple models for tracking are, for instance:

  • A planar object with a motion model of a 2D image transformation (affine or homography).
  • A rigid 3D object with a motion model depending on its 3D position and orientation.
  • An object with feature points (e.g. Harris points) or regions (e.g. MSERs) in the image, which are tracked separately. This provides robustness to occlusions.
  • A non-rigid (deformable) object approximated by a surface wireframe. The motion of the object is defined by the positions of the wireframe edges.

Popular tracking algorithms are: correlation, mean-shift, Kanade-Lucas-Tomasi (KLT) tracking, Kalman tracking, particle filtering, (sequential) linear predictors and so on. In this course you will familiarize yourself with tracking using correlation and the KLT tracker.

Tracking I: tracking by finding correspondences, correlation

We already know how to find correspondences between images. The task of tracking is very similar: it is searching for correspondences of an object (an image frame, region, or feature point with its neighborhood) in a given frame with subsequent, unknown frames. The difference from correspondence finding between two wide-baseline images is that now we can expect only small changes of the object (position, deformation, photometric) between successive frames. Nevertheless, the changes of the object's appearance over the whole sequence can be significant. Selection of the object for tracking is done manually or by automatic detection (which is outside the scope of this lab).

You can choose to implement one of these two methods:

  • tracking by correspondence finding using methods from the second task
  • tracking by correlations in Harris point neighborhoods

Since we will be working with video in Matlab, download the function processMpvVideo(filename,method,options), where filename is the name of the video file (e.g. example.avi), method is a tracking method and options is a structure with parameters for the tracking function. The function creates a video sequence with the tracked points plotted, writes the output into the folder ./export/ and shows a preview in the meantime.

The script works well with nearly all Windows versions, but it is possible that you will encounter problems with codecs. Therefore we prepared the input video in several versions. On a Linux system, or if none of the video versions works for you, use the slightly modified function processmpvvideo_jpeg.m, which reads from a video decomposed into a sequence of jpeg images. You can download or create these images:

  • Linux:  mplayer -vo png video.avi
  • Windows:  VirtualDub

The function processmpvvideo_jpeg.m creates a sequence of images in the folder ./export/. To join them into the video, which you have to submit, use:

  • Linux:  mencoder mf://*.png -mf w=640:h=480:fps=15:type=png -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=2400 -oac copy -o video.avi
  • Windows: utility  MakeAvi

You will implement all tracking methods as functions xNew = track_method(imgPrev,imgNew,xPrev,options), where imgPrev is the previous frame in the sequence, xPrev are the tracked points in the previous frame, imgNew is the new image in the sequence and xNew are the tracked points found in the new frame. Tracked points are represented in a structure:

  x.x    % COLUMN vector of 'x' coordinates
  x.y    % COLUMN vector of 'y' coordinates
  x.ID   % COLUMN vector of unique point identifiers (an index into the array is not enough, because some points will disappear during tracking)
  x.data % specific information for the tracking method (e.g. SIFT description)
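
For illustration, a point set with three points might be initialized like this (a minimal sketch; the coordinates and IDs are made up):

  % Minimal sketch: a tracked-point structure with three points
  x.x    = [10.5; 200.0;  57.3];  % x coordinates (column vector)
  x.y    = [33.0;  48.2;  99.9];  % y coordinates (column vector)
  x.ID   = [1; 2; 3];             % unique identifiers, kept stable as points disappear
  x.data = [];                    % method-specific data (e.g. descriptors), empty here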

As mentioned before, a tracking algorithm tracks the selected object during the entire sequence. Selection of the object for tracking is done in two phases. First, processMpvVideo calls the function data = processMpvInit(img,options), which selects the object for tracking in the first image img of the sequence. In our case it is represented by a bounding box.

  data.xRect    % x coordinates of bounding box corners (anti-clockwise)
  data.yRect    % y coordinates of bounding box corners (anti-clockwise)

The second function, x = track_init(img,options), finds the points x for tracking. options is a structure with parameters.

Finally, for each image the function processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options) is called, which uses the found tracking points, for instance for estimating a homography transformation between frames.

Set the structure options and call the function processMpvVideo.m in the “main” script cv08_method.m, which you will also submit.

To ease your task and to help you understand the parameters, we prepared templates of the described functions, which simulate a constant shift of randomly selected points.

All-in-one

Your task

Your task is to hide the identity of a selected object in a video sequence. You will select the object in the first frame (for instance your least favorite vegetable) and then blur this object in the whole sequence (in a similar way as you know from TV).

It can be done in two ways:

  • direct tracking of the selected object
  • assuming that the object is not moving, tracking the whole scene and blurring the right place

If we know or can assume that there is only one scene in the sequence and it is planar (e.g. images from an airplane), we can use the second method: tracking the scene and estimating a homography, in the same way as in the second task or the course A4M33TZ. This yields more robust tracking.

In the obligatory part of this task we will track the whole scene and estimate the frame-to-frame homographies. Optionally, you can compare this method with direct object tracking.

The goal of this task is to familiarize yourself with the universal tracking framework, fill it in, and implement a simple tracking algorithm based on your previous knowledge (either correlations or correspondences). Use the successfully tracked points in the scene for homography estimation between images and for propagating the rectangle across the sequence.

  1. Familiarize yourself with the tracking framework and the testing sequence
  2. Choose an implementation using correspondences or correlations
  3. Implement detection of tracking points in the whole image in the function track_init_method.m
  4. Implement tracking of the points of interest in the function track_method.m
  5. Try your algorithm on the testing video and draw the tracked points
  6. From the tracked points, estimate homographies between images
  7. Transform the bounding box of the selected object
  8. Blur the image inside the bounding box
  9. Join steps 6-8 into the function processmpvframe.m
  10. Generate the video with the blurred object

Example of a bounding box transformation (without blurring): export_billa_xvid.avi

Tracking using correspondences

On the basis of your knowledge and using your code from the second task, you should be able to implement a function for finding corresponding points between successive frames of the sequence and estimating the homography between the frames given tentative correspondences. Only a small portion of new code is needed. A suitable combination for a first experiment is Harris points and DCT descriptors. If your algorithms from the second task do not give good results, you can try to add a constraint that the distance between corresponding points in two frames cannot be too large (due to the slow motion, the scene cannot change much between successive frames).

The method you choose is up to you. The minimal acceptable solution for this task is your two functions track_init_correspond.m and track_correspond.m, which track and transform the selection during the whole sequence.

Tracking using the correlation

Tracking using correlation finds correspondences by directly comparing image intensities between the feature point neighborhoods, instead of constructing a compact normalized descriptor (SIFT, DCT, etc.). We will use the correlation coefficient as the similarity statistic.

  • The correlation coefficient (also called normalized cross-correlation, NCC) is defined as

$$\mathrm{NCC}(T,I) = \frac{\sum_k \sum_l \bigl(T(k,l) - \bar{T}\bigr)\bigl(I(x+k,y+l) - \bar{I}(x,y)\bigr)}{\sqrt{\sum_k \sum_l \bigl(T(k,l) - \bar{T}\bigr)^2 \; \sum_k \sum_l \bigl(I(x+k,y+l) - \bar{I}(x,y)\bigr)^2}}$$

where $T$ is the image template window and $I$ is the target image window.

The NCC value lies in the range $\langle -1, +1 \rangle$ and is commonly used as a statistic for the similarity of image windows; see the Matlab function corr2.

  • Advantages of NCC: invariance to affine transformations of intensity.
  • Disadvantages of NCC: not invariant to scale changes, rotation or perspective distortion. For some applications the computational complexity is a problem.
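
For illustration, the correlation coefficient between two equally-sized patches can be computed directly from the definition above (a minimal sketch; the Matlab function corr2 computes the same statistic):

  function c = ncc(T, I)
  % NCC between two equally-sized image windows T and I.
  T = double(T) - mean(double(T(:)));  % subtract the template mean
  I = double(I) - mean(double(I(:)));  % subtract the window mean
  c = sum(T(:).*I(:)) / sqrt(sum(T(:).^2) * sum(I(:).^2));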

Harris corner points are often suitable for tracking. Using the function harris(img,sigma_d,sigma_i,thresh), write a function x = track_init(img,options), which returns the Harris points x in a chosen region of interest options.ROI. The structure options will contain the fields sigma_d, sigma_i and thresh.
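
A possible shape of this function is sketched below; note that the output format of the provided harris function is an assumption here (adapt the call to its real interface), and the ROI is assumed to be given as [xmin ymin xmax ymax]:

  function x = track_init(img, options)
  % Sketch: detect Harris points and keep only those inside options.ROI.
  % ASSUMPTION: harris() returns row (ys) and column (xs) coordinates.
  [ys, xs] = harris(img, options.sigma_d, options.sigma_i, options.thresh);
  in = xs >= options.ROI(1) & xs <= options.ROI(3) & ...
       ys >= options.ROI(2) & ys <= options.ROI(4);
  x.x    = xs(in); x.x = x.x(:);   % COLUMN vectors, as required
  x.y    = ys(in); x.y = x.y(:);
  x.ID   = (1:numel(x.x))';        % fresh unique identifiers
  x.data = [];                     % filled by the specific tracking method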

As mentioned above, the method is analogous to the method for correspondence finding: detect Harris points in each frame (track_init will also be called in track_corr!) and find correspondences for them using the NCC. Set the size of the neighborhood with the parameter options.ps and compare the results for different settings. Assuming only small shifts between successive frames, it is not necessary (and also not desirable) to compute the correlation between all pairs. For each point from the first image, compute correlations only with points in the second image up to some distance; obviously, points too far apart cannot correspond. Set this threshold with the parameter options.corr_max_dist. The value depends on the character of the motion in the actual sequence. For our task we recommend the value 30 px.

To identify promising matches, i.e. correspondences of points with a high correlation coefficient, use for instance the principle of mutually nearest correspondences: two points are paired if the second point is the best match for the first among all points in the second image and, simultaneously, the first point is the best match for the second among all points in the first image.

If there is no point in the new frame that corresponds to a tracked point from the previous frame, that tracked point is discarded and not tracked any further.

Join this functionality into xNew = track_corr(imgPrev,imgNew,xPrev,options).
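
A sketch of the mutual best-match pairing follows; it assumes a precomputed score matrix C, with C(i,j) set to -Inf for point pairs farther apart than options.corr_max_dist:

  % Sketch: mutual best matches from an nPrev-by-nNew score matrix C,
  % where C(i,j) is the NCC between point i of imgPrev and point j of imgNew.
  [~, bestNew]  = max(C, [], 2);  % best new point for each previous point
  [~, bestPrev] = max(C, [], 1);  % best previous point for each new point
  pairs = [];
  for i = 1:size(C, 1)
    j = bestNew(i);
    if isfinite(C(i,j)) && bestPrev(j) == i  % both points agree on each other
      pairs = [pairs; i j];                  % point i tracks to point j
    end
  end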

Homography estimation

If potentially corresponding points between frames (tentative correspondences) are known, and assuming an (approximately) planar scene, the homography between the frames can be estimated. We will use our function [Hbest,inl] = ransac_h(u,threshold,confidence) from the second task. Create the vector of points u. The IDs are known, and you also know that xPrev contains all points from xNew. Keep the settings in the fields options.rnsc_threshold and options.rnsc_confidence.
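
Matching the point sets by their IDs and calling the estimator might look like this (a sketch; the layout of u as a 6xN matrix of stacked homogeneous point pairs is an assumption, so use the convention from your second task):

  % Sketch: build tentative correspondences by ID and estimate the homography.
  [~, iPrev, iNew] = intersect(xPrev.ID, xNew.ID);  % points present in both frames
  n = numel(iPrev);
  u = [xPrev.x(iPrev)'; xPrev.y(iPrev)'; ones(1,n); ...  % ASSUMED 6xN layout
       xNew.x(iNew)';   xNew.y(iNew)';   ones(1,n)];
  [Hbest, inl] = ransac_h(u, options.rnsc_threshold, options.rnsc_confidence);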

Discard points that are homography outliers from further tracking. Advanced algorithms have methods for adding new tracking points; however, we will not implement any in this course.

Knowing the homography (the matrix H), you can transform the corners of the bounding box which outlines the selected object from one frame to the next ($\mathbf{x}' \simeq H\mathbf{x}$ in homogeneous coordinates). Blur the interior region with the function gaussfilter.m with a high enough sigma. Do not blur the image outside the selection (you can use your knowledge from this course).
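
A sketch of the corner transformation and blurring for a grayscale frame (the signature gaussfilter(img,sigma) and the sigma value are assumptions):

  % Sketch: project the bounding-box corners into the new frame.
  c  = [data.xRect(:)'; data.yRect(:)'; ones(1,4)];  % homogeneous corners, 3x4
  ct = H * c;
  dataOut.xRect = (ct(1,:) ./ ct(3,:))';             % dehomogenize
  dataOut.yRect = (ct(2,:) ./ ct(3,:))';
  dataOut.H     = H;
  % Blur only inside the polygon: blur a copy, then combine via a mask.
  mask = poly2mask(dataOut.xRect, dataOut.yRect, size(imgNew,1), size(imgNew,2));
  blurred = gaussfilter(imgNew, 10);                 % ASSUMED signature and sigma
  out = imgNew;
  out(mask) = blurred(mask);                         % grayscale image assumed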

Join your functions into [dataOut, xNewOut] = processMpvFrame(data,imgPrev,imgNew,xPrev,xNew,options). This function returns the structure dataOut containing

  dataOut.xRect    % transformed x coordinates of bounding box corners (anti-clockwise)
  dataOut.yRect    % transformed y coordinates of bounding box corners (anti-clockwise)
  dataOut.H        % estimated homography

and xNewOut containing the tracked points in the current frame without outliers. Within this function, also implement drawing of the current image frame into a figure with the blurred selection, and show the transformed bounding box.

What should you upload?

Submit into the upload system: the file cv08.m with the structure options for your implementation, your function processmpvframe.m and either (track_init_correspond.m and track_correspond.m) or (track_init.m and track_corr.m), together with all non-standard functions you have created and used. Submit also the generated video file export_billa_xvid.avi with the blurred selection and the generated file homography.mat with all homographies.

The minimal acceptable solution for this task is to implement one of the two simple methods for tracking in such a way that it generates enough points and, without adding new points, finds homographies for the entire sequence (the acceptable minimum is 20 points in the scene).

Please do not submit the function processMpvVideo.m. It complicates the automatic evaluation.

Testing

To test your code, you can use a Matlab script and the function 'publish'. Copy track_test.zip (20.4.2010), unpack it to a directory which is on the Matlab path, and execute it (be careful, it contains our version of processMpvVideo.m). Compare your results with ours.

As a quality measure of your algorithm, you can use the last part of the test, where you track points on an image which is transformed with a known homography; in the ideal case you will get the same homography back. Other comparisons are made against the KLT tracker from OpenCV.

Where to go next

Tracking using correspondences has several disadvantages:

  • the necessity of feature detection (Harris points, …) in the next image
  • the precision is limited by the precision of the detected points

In the next task we will show how to search for the selection (or its transformation) directly in the next image. The object will be selected only in the first image.

2. Kanade-Lucas-Tomasi Tracking (KLT tracker)

KLT minimizes the sum of squared differences of image intensities between windows in subsequent frames. The minimum is found iteratively by the Newton-Raphson method.

We are given a patch template $T(\mathbf{x})$ centered at pixel $\mathbf{x} = [x, y]^T$ in the image frame at time $t$. In a subsequent frame, at time $t+1$, the target moves to a new position described by the coordinate transformation $W(\mathbf{x};\mathbf{p}) = [x + p_x,\, y + p_y]^T$. The task is to estimate the displacement $\mathbf{p} = [p_x, p_y]^T$.

$$\sum_{\mathbf{x}} \bigl[ I(W(\mathbf{x};\mathbf{p})) - T(\mathbf{x}) \bigr]^2 \qquad (1)$$

Considering the best shift to be $\mathbf{p} + \Delta\mathbf{p}$, from (1) we get

$$\sum_{\mathbf{x}} \bigl[ I(W(\mathbf{x};\mathbf{p} + \Delta\mathbf{p})) - T(\mathbf{x}) \bigr]^2 \qquad (2)$$

We minimize this expression with respect to $\Delta\mathbf{p}$. The nonlinear expression (2) is linearized by a (first-order) Taylor expansion

$$\sum_{\mathbf{x}} \Bigl[ I(W(\mathbf{x};\mathbf{p})) + \nabla I \frac{\partial W}{\partial \mathbf{p}} \Delta\mathbf{p} - T(\mathbf{x}) \Bigr]^2 \qquad (3)$$

where $\nabla I = \bigl[ \tfrac{\partial I}{\partial x}, \tfrac{\partial I}{\partial y} \bigr]$ is the image gradient evaluated at $W(\mathbf{x};\mathbf{p})$. The term $\frac{\partial W}{\partial \mathbf{p}}$ is the Jacobian matrix of the coordinate transformation; for a pure translation

$$\frac{\partial W}{\partial \mathbf{p}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad (4)$$

The minimum of expression (3) over $\Delta\mathbf{p}$ is

$$\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \Bigl[ \nabla I \frac{\partial W}{\partial \mathbf{p}} \Bigr]^T \bigl[ T(\mathbf{x}) - I(W(\mathbf{x};\mathbf{p})) \bigr] \qquad (5)$$

where

$$H = \sum_{\mathbf{x}} \Bigl[ \nabla I \frac{\partial W}{\partial \mathbf{p}} \Bigr]^T \Bigl[ \nabla I \frac{\partial W}{\partial \mathbf{p}} \Bigr] \qquad (6)$$

is an approximation of the Hessian matrix used in the Gauss-Newton gradient method. This is in fact a nonlinear regression (nonlinear least squares). Note that the approximation of the Hessian matrix in this case equals the autocorrelation matrix (the Harris matrix), i.e. a dot product of the first partial derivatives. This suggests that it is a good idea to track points in the near neighborhood of Harris points.

Substituting (4) into (5) and (6) simplifies $\nabla I \frac{\partial W}{\partial \mathbf{p}}$ to $\nabla I$. The displacement correction $\Delta\mathbf{p}$ is computed in each iteration and the estimated shift is updated by

$$\mathbf{p} \leftarrow \mathbf{p} + \Delta\mathbf{p} \qquad (7)$$

The iterations are terminated by reaching a maximum number of iteration steps and/or by the convergence condition

$$\|\Delta\mathbf{p}\|^2 \le \varepsilon^2 \qquad (8)$$

Your task

Implement the KLT tracking algorithm, i.e. the estimation of Harris point translations, and test it on the familiar sequence with the promotional leaflet:

  1. If you have not finished the previous task, implement the function track_init.m for Harris point detection in the image
  2. Implement the function getPatchSubpix.m for sub-pixel selection in the image
  3. Implement the KLT algorithm in the function track_klt.m
  4. Try the algorithm on the sequence from the first part of this task. Transform the selection and find homographies between frames in the same way as before. Integrate the whole process into cv09.m

KLT algorithm

For the KLT algorithm, it is necessary to implement all operations with sub-pixel precision. Think about why. Try to find a simple example in which a non-subpixel algorithm is not sufficient.

You will need a function for patch selection, patch = getPatchSubpix(img,x,y,win_x,win_y), where patch is a selection from the image img around the center x,y with size (win_x*2+1) x (win_y*2+1). Assume that win_x, win_y are integers, but x, y are real. The function interp2.m can be useful in your implementation. Tip: you will get a much faster computation if you crop the image before using interp2.m.
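
A direct implementation with interp2 might look like this (a minimal sketch; the cropping speed-up suggested above is omitted for clarity):

  function patch = getPatchSubpix(img, x, y, win_x, win_y)
  % Sketch: bilinear sampling of a (2*win_y+1) x (2*win_x+1) patch
  % centered at the real-valued position (x, y).
  [X, Y] = meshgrid(x-win_x : x+win_x, y-win_y : y+win_y);
  patch = interp2(double(img), X, Y, 'linear');  % NaN outside the image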

The KLT algorithm can be summarized in a few steps:

For the template $T$ (the neighborhood of a Harris point $\mathbf{x}$ from xPrev in the previous image imgPrev), set $\mathbf{p} = \mathbf{0}$ and iterate:

  1. Take the patch $I(W(\mathbf{x};\mathbf{p}))$ from the new image imgNew: the neighborhood of the Harris point $\mathbf{x}$ shifted by the current estimate $\mathbf{p}$.
  2. Estimate the error $E = I(W(\mathbf{x};\mathbf{p})) - T$.
  3. Compute the gradients $\nabla I = [G_x, G_y]$ at the translated coordinates $W(\mathbf{x};\mathbf{p})$.
  4. Approximate the Hessian matrix $H$ with the dot product $\sum_{\mathbf{x}} \nabla I^T \nabla I$.
  5. Estimate the displacement $\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \nabla I^T \bigl[ T - I(W(\mathbf{x};\mathbf{p})) \bigr]$.
  6. Update the translation estimate: $\mathbf{p} \leftarrow \mathbf{p} + \Delta\mathbf{p}$.
  7. Test the convergence $\|\Delta\mathbf{p}\|^2 \le \varepsilon^2$; if not converged, continue with step 1.

Set the new position of the Harris point to $\mathbf{x} \leftarrow \mathbf{x} + \mathbf{p}$.

If the algorithm does not converge within the maximum number of steps, the best decision is to discard the point from further tracking, because the real shift was most likely not found.

%% Implementation - example
[Gx,Gy]= ... ;        % gradient estimation
g = [ Gx(:) Gy(:) ];
H = g'*g;             % approximation of the Hessian matrix by a dot product

A simplified illustrative scheme of the algorithm, demonstrated on car tracking, is shown in the figure below.

(figure: illustrative scheme of the KLT iterations on a car-tracking example)

Implement the function xNew = track_klt(imgPrev,imgNew,xPrev,options), where the parameters are known from the previous task and the structure options contains:

 options.klt_window         % size of the patch W(x;p) in the sense of getPatchSubpix.m
 options.klt_stop_count     % maximal number of iteration steps
 options.klt_stop_treshold  % minimal change epsilon^2 for termination (squared for easier computation of the distance)
 options.klt_show_steps     % 0/1, turns drawing during tracking on and off

Tip: do not ignore warnings from Matlab, and use the operator \ rather than the function inv().
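
Putting the steps together for a single point (x0, y0), one iteration loop might look like this (a sketch under the assumptions above, reusing getPatchSubpix; klt_window is assumed to be a scalar half-window):

  % Sketch: KLT iterations for one point (x0, y0) tracked from imgPrev to imgNew.
  ws = options.klt_window;
  T  = getPatchSubpix(imgPrev, x0, y0, ws, ws);  % fixed template
  p  = [0; 0];                                   % current shift estimate
  for step = 1:options.klt_stop_count
    I = getPatchSubpix(imgNew, x0+p(1), y0+p(2), ws, ws);  % shifted patch
    E = I - T;                                   % error image
    [Gx, Gy] = gradient(I);                      % gradients at the shifted position
    g  = [Gx(:) Gy(:)];
    H  = g' * g;                                 % 2x2 Hessian approximation
    dp = H \ (g' * (-E(:)));                     % solve H*dp = g'*(T-I), no inv()
    p  = p + dp;                                 % update the shift
    if dp'*dp <= options.klt_stop_treshold       % converged?
      break;
    end
  end
  xNew = x0 + p(1); yNew = y0 + p(2);            % new point position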

To easily check your implementation, include a drawing function after each iteration:

 if (options.klt_show_steps)
  showKltStep(step,T,I,E,Gx,Gy,aP);
 end
 % step  - serial number of the iteration (zero based)
 % T     - template (patch from imgPrev)
 % I     - current shifted patch from imgNew
 % E     - current error (I - T)
 % Gx,Gy - gradients
 % aP    - current shift delta p

What should you upload?

Submit the file cv09.m together with the structure options of your implementation into the upload system. Include also the completed functions track_init.m, track_klt.m and getPatchSubpix.m, together with all non-standard functions you have created and used. Submit the generated video file export_billa_xvid.avi with the blurred selection and a highlighted bounding box, and the generated file homography.mat with all homographies.

Please do not submit the functions processMpvVideo.m or showKltStep.m. They complicate the automatic evaluation.

Testing

For testing and for your implementation, download showKltStep.m for visualization of the KLT iterations.

To test your code, you can use a Matlab script and the function 'publish'. Copy klt_test.zip, unpack it, and add your files which are requested for submission. Do not unpack it into your working folder, because it contains our versions of processMpvVideo.m and showKltStep.m. Compare your results with ours.

References

Lucas-Kanade 20 Years On: A Unifying Framework

Predator: A Smart Camera that Learns


from: https://cw.fel.cvut.cz/wiki/courses/ae4m33mpv/labs/4_tracking/start
