The OpenCV Video Surveillance / Blob Tracker Facility

This document describes in detail how the OpenCV video surveillance facility works, focusing on the foreground/background discrimination algorithm and providing parameter configuration notes for implementations such as the Mixture of Gaussians model.

Unofficial Documentation

 

This is an attempt to document the OpenCV Video Surveillance facility.

  • It is, at the moment, just a collection of insights from working with the under-documented code, and not complete, full documentation. For now, this document focuses only on the Foreground / Background Discrimination part of the complete algorithm. Please feel free to add more details, improve the layout and edit the contents. (AdiShavit)

The OpenCV "Video Surveillance" facility, also called "blob tracker" through much of the code, is a simple but practical facility intended to track moving foreground objects against a relatively static background. Conceptually it consists of a three-stage video processing pipeline:

  1. A foreground/background discriminator which labels each pixel as either foreground or background.
  2. A blob detector which groups adjacent "foreground" pixels into blobs, flood-fill style.
  3. A blob tracker which assigns ID numbers to blobs and tracks their motion frame-to-frame.

Almost all the sophistication (and CPU time!) in this facility is devoted to the first stage, which uses a state-of-the-art (as of 2003) algorithm. The other two stages use relatively unsophisticated algorithms. This has the advantage of making the module fast and quite generic.

More specialized applications will typically go on to use sophisticated algorithms to classify the blobs and perhaps extract six-degree-of-freedom orientation information from them using domain-specific object models. The basic module provided does not do this, but may be used as a jumping-off point to develop such a system, for example by using the OpenCV Haar-based routines to classify the blobs.

The foreground/background discrimination stage is both memory and CPU intensive. It builds and maintains a histogram-based statistical model of the video image background on a per-pixel basis; consequently it may easily wind up using on the loose order of a gigabyte of RAM to process a TV-resolution video stream. (This may sound like gross overkill. It is not. Vision is a hard problem. It seems deceptively easy only because humans come with extraordinarily good subconscious vision logic built in. Computers are not so lucky.)

Resource consumption may be controlled via a number of tuning parameters. In general, the results obtained do not depend too critically upon the parameter settings; the casual user can (and probably should) simply leave them at their default settings.

The casual user can simply treat the module as a black box which translates a video stream into a list of moving blobs. At this level, the sample program samples/c/blobtrack.cpp included with the OpenCV source distribution may be used as-is or lightly tweaked, or equivalent code may be written using (say) the OpenCV Python or Matlab bindings. A Linux-based Gtk wrapper in C is available here. "Beta" documentation of these modules can be found in the SVN repository: doc/vidsurv/Blob_Tracking_Modules.doc
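
As an illustration of this black-box usage, the following sketch mirrors the structure of blobtrack.cpp: the default modules are wired into the automatic pipeline via cvCreateBlobTrackerAuto1. The factory functions and struct fields are those of the legacy cvaux interface (cvvidsurv.hpp), and the training length of 10 frames is only illustrative; check the names against your OpenCV version before relying on this.

#include "cvaux.h"
#include "highgui.h"
#include <stdio.h>

int main()
{
    CvCapture* capture = cvCaptureFromCAM(0);              /* or cvCaptureFromFile(...) */

    /* Wire the default modules into the automatic pipeline. */
    CvBlobTrackerAutoParam1 param = {0};
    param.FGTrainFrames = 10;                               /* illustrative training length */
    param.pFG = cvCreateFGDetectorBase(CV_BG_MODEL_FGD, NULL);
    param.pBD = cvCreateBlobDetectorCC();
    param.pBT = cvCreateBlobTrackerCC();

    CvBlobTrackerAuto* pTracker = cvCreateBlobTrackerAuto1(&param);

    IplImage* frame;
    while((frame = cvQueryFrame(capture)) != NULL)
    {
        /* Runs FG/BG discrimination, blob detection and blob tracking internally. */
        pTracker->Process(frame, NULL);

        for(int i = 0; i < pTracker->GetBlobNum(); ++i)
        {
            CvBlob* pB = pTracker->GetBlob(i);
            printf("blob %d at (%.1f, %.1f), size %.1f x %.1f\n",
                   CV_BLOB_ID(pB), pB->x, pB->y, pB->w, pB->h);
        }
    }

    pTracker->Release();
    cvReleaseCapture(&capture);
    return 0;
}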

To build more sophisticated computer vision applications you will need to understand and extend the detailed internal structure of this facility. It actually involves six major video-pipeline processing stages, each of which is in general implemented by several alternative modules offering different space/time/complexity tradeoffs, and each alternative module has its own set of tuning parameters.

An article providing a good overview of the facility's complete internal architecture may be found here:

 

Foreground / Background Segmentation

 

The implementation provides a choice of two modules for this subtask:

  1. CV_BG_MODEL_FGD

  2. CV_BG_MODEL_MOG

 

CV_BG_MODEL_FGD

 

The most sophisticated (and default) module is based on an algorithm from Foreground Object Detection from Videos Containing Complex Background, Liyuan Li, Weimin Huang, Irene Y.H. Gu, and Qi Tian, ACM MM2003 9p, available here (pdf).

Internally (in cvbgfg_acmmm2003.cpp) it uses a change-detection algorithm from: P.Rosin, Thresholding for Change Detection, ICCV, 1998, available here (pdf).
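
In outline, Rosin's threshold for a unimodal histogram (such as a frame-difference histogram with its peak near zero) is found by drawing a line from the histogram peak to the end of its tail and picking the bin farthest from that line. The sketch below is a generic reconstruction of that idea, not the code actually used in cvbgfg_acmmm2003.cpp:

#include <math.h>

/* Illustrative sketch of Rosin's "corner" threshold for a unimodal histogram.
   hist[] holds the bin counts; the returned bin index is the threshold.       */
int rosin_threshold(const int hist[256])
{
    int peak = 0, last = 255;
    for(int i = 1; i < 256; i++)                 /* locate the histogram peak      */
        if(hist[i] > hist[peak]) peak = i;
    while(last > peak && hist[last] == 0)        /* locate the end of the tail     */
        last--;

    /* Line from (peak, hist[peak]) to (last, hist[last]); pick the bin with the
       largest perpendicular distance from the histogram curve to that line.      */
    double dx = last - peak, dy = hist[last] - hist[peak];
    double norm = sqrt(dx*dx + dy*dy);
    if(norm == 0.0) return peak;

    int    best = peak;
    double bestDist = 0.0;
    for(int i = peak; i <= last; i++)
    {
        double dist = fabs(dy*(i - peak) - dx*(hist[i] - hist[peak])) / norm;
        if(dist > bestDist) { bestDist = dist; best = i; }
    }
    return best;
}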

Parameters for this module are supplied via the CvFGDStatModelParams struct:

 

typedef struct CvFGDStatModelParams
{
    int    Lc;                  /* Quantized levels per 'color' component. Power of two, typically 32, 64 or 128.                               */
    int    N1c;                 /* Number of color vectors used to model normal background color variation at a given pixel.                    */
    int    N2c;                 /* Number of color vectors retained at given pixel.  Must be > N1c, typically ~ 5/3 of N1c.                     */
                                /* Used to allow the first N1c vectors to adapt over time to changing background.                               */

    int    Lcc;                 /* Quantized levels per 'color co-occurrence' component.  Power of two, typically 16, 32 or 64.                 */
    int    N1cc;                /* Number of color co-occurrence vectors used to model normal background color variation at a given pixel.      */
    int    N2cc;                /* Number of color co-occurrence vectors retained at given pixel.  Must be > N1cc, typically ~ 5/3 of N1cc.     */
                                /* Used to allow the first N1cc vectors to adapt over time to changing background.                              */

    int    is_obj_without_holes;/* If TRUE we ignore holes within foreground blobs. Defaults to TRUE.                                           */
    int    perform_morphing;    /* Number of erode-dilate-erode foreground-blob cleanup iterations.                                             */
                                /* These erase one-pixel junk blobs and merge almost-touching blobs. Default value is 1.                        */

    float  alpha1;              /* How quickly we forget old background pixel values seen.  Typically set to 0.1                                */
    float  alpha2;              /* "Controls speed of feature learning". Depends on T. Typical value circa 0.005.                               */
    float  alpha3;              /* Alternate to alpha2, used (e.g.) for quicker initial convergence. Typical value 0.1.                         */

    float  delta;               /* Affects color and color co-occurrence quantization, typically set to 2.                                      */
    float  T;                   /* "A percentage value which determines when new features can be recognized as new background." (Typically 0.9).*/
    float  minArea;             /* Discard foreground blobs whose bounding box is smaller than this threshold.                                  */

} CvFGDStatModelParams;

 

This is initialized with the defaults:

 

/* default parameters of foreground detection algorithm */
#define  CV_BGFG_FGD_LC              128
#define  CV_BGFG_FGD_N1C             15
#define  CV_BGFG_FGD_N2C             25

#define  CV_BGFG_FGD_LCC             64
#define  CV_BGFG_FGD_N1CC            25
#define  CV_BGFG_FGD_N2CC            40

/* BG reference image update parameter */
#define  CV_BGFG_FGD_ALPHA_1         0.1f

/* stat model update parameter
   0.002f ~ 1K frame(~45sec), 0.005 ~ 18sec (if 25fps and absolutely static BG) */
#define  CV_BGFG_FGD_ALPHA_2         0.005f

/* start value for alpha parameter (to fast initiate statistic model) */
#define  CV_BGFG_FGD_ALPHA_3         0.1f

#define  CV_BGFG_FGD_DELTA           2

#define  CV_BGFG_FGD_T               0.9f

#define  CV_BGFG_FGD_MINAREA         15.f

#define  CV_BGFG_FGD_BG_UPDATE_TRESH 0.5f
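
If you want to override some of the defaults, fill in a CvFGDStatModelParams and pass it to cvCreateFGDStatModel (passing NULL keeps all defaults). A minimal sketch, assuming an already-opened capture; the video file name is only illustrative:

#include "cvaux.h"
#include "highgui.h"

CvCapture* capture    = cvCaptureFromFile("video.avi");   /* illustrative source */
IplImage*  firstFrame = cvQueryFrame(capture);

CvFGDStatModelParams params;
params.Lc   = CV_BGFG_FGD_LC;    params.N1c  = CV_BGFG_FGD_N1C;   params.N2c  = CV_BGFG_FGD_N2C;
params.Lcc  = CV_BGFG_FGD_LCC;   params.N1cc = CV_BGFG_FGD_N1CC;  params.N2cc = CV_BGFG_FGD_N2CC;
params.is_obj_without_holes = 1;
params.perform_morphing     = 1;
params.alpha1 = CV_BGFG_FGD_ALPHA_1;
params.alpha2 = 0.002f;                     /* slower learning than the 0.005f default */
params.alpha3 = CV_BGFG_FGD_ALPHA_3;
params.delta  = CV_BGFG_FGD_DELTA;
params.T      = CV_BGFG_FGD_T;
params.minArea = CV_BGFG_FGD_MINAREA;

CvBGStatModel* bgModel = cvCreateFGDStatModel(firstFrame, &params);

/* Per frame: cvUpdateBGStatModel(frame, bgModel); bgModel->foreground is the FG mask. */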

 

If we want to update alpha2, the paper explains:

  • If we consider n frames a quick enough response to a “once-off” background change, we should choose the learning rate alpha2 according to (22):

    • (22) alpha2 > 1 - (1 - T)^(1/n)

    For example, if we want the system to respond to an ideal "once-off" background change after 20 seconds at a 25 fps frame rate with T = 90%, alpha2 should be larger than 0.0046, but not much larger, so that the system does not become overly sensitive to noise and foreground objects (a short computation sketch follows this list). So:

       n = 20*25 = 500 
       T = 0.9
       => 0.005f = CV_BGFG_FGD_ALPHA_2 = alpha2 > 1 - ((1 - 0.9)^(1 / 500)) = 0.00459458265 

    T is a percentage value which determines when new features can be recognized as new background appearance. With a large value of T, the system is stable but slow to respond to "once-off" changes. However, if T is small, the system tends to learn frequently seen foreground features as new background appearances. In the paper's tests, T was set to 90%.
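
The bound in (22) is easy to evaluate for other frame rates and response times. A minimal sketch; the helper name is just for illustration:

#include <math.h>
#include <stdio.h>

/* Lower bound on alpha2 from (22): alpha2 > 1 - (1 - T)^(1/n) */
float alpha2_lower_bound(float T, int n)
{
    return 1.0f - powf(1.0f - T, 1.0f / n);
}

int main()
{
    int   n = 20 * 25;     /* 20 seconds at 25 fps */
    float T = 0.9f;
    printf("alpha2 > %f\n", alpha2_lower_bound(T, n));   /* prints ~0.004595 */
    return 0;
}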

 

CV_BG_MODEL_MOG

 

This is an implementation of the Mixture of Gaussians paper: P. KaewTraKulPong and R. Bowden, An improved adaptive background mixture model for real-time tracking with shadow detection, in Proc. 2nd European Workshop on Advanced Video-Based Surveillance Systems, 2001. It can be found here (pdf) (CiteSeer).
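
Parameters for this module are supplied via the CvGaussBGStatModelParams struct, and the model is created with cvCreateGaussianBGModel (passing NULL selects the defaults). A minimal sketch; the field names below are taken from the legacy cvaux headers and the values mirror the usual defaults, so check both against your OpenCV version:

#include "cvaux.h"
#include "highgui.h"

CvCapture* capture    = cvCaptureFromFile("video.avi");    /* illustrative source */
IplImage*  firstFrame = cvQueryFrame(capture);

CvGaussBGStatModelParams params;
params.win_size      = 200;    /* learning window; roughly 1/alpha                */
params.n_gauss       = 5;      /* number of Gaussians per pixel                   */
params.bg_threshold  = 0.7;    /* portion of total weight attributed to background */
params.std_threshold = 2.5;    /* match threshold in standard deviations          */
params.minArea       = 15.0;   /* discard smaller foreground blobs                */
params.weight_init   = 0.05;
params.variance_init = 30;

CvBGStatModel* bgModel = cvCreateGaussianBGModel(firstFrame, &params);

/* As with the FGD model: cvUpdateBGStatModel(frame, bgModel); bgModel->foreground
   holds the foreground mask.                                                      */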

 

just a comment

 

Due to the complexity of the process and lack of documentation, the following link may be quite helpful in understanding all the necessary steps. http://www.merl.com/papers/docs/TR2003-36.pdf

 

Example

 

 

Blob Entrance Detection

 

The “Blob Entering Detection” module uses the result (the FG mask) of the “FG/BG Detection” module to detect new blob objects entering the scene.

Actually, two implementations exist for this module:

  1. BD_CC - Detect new blob by tracking connected components of ForeGround mask

  2. BD_Simple - Detect new blob by uniform moving of connected components of FG mask

The execution is similar for both; using the background model explained above, the use of this module can be summarized as follows:

 

#include "cvaux.h"

IplImage *current_frame;
int nextBlobID=1;
CvBlobSeq* newBlobList, CvBlobSeq* blobList;
CvBGStatModel* bgModel = cvCreateGaussianBGModel(cvQueryFrame(capture),NULL);
CvBlobDetector* blobDetect = cvCreateBlobDetectorCC(); //or cvCreateBlobDetectorSimple();
...
while(cvGrabFrame(capture)) {

      //Compute the FG
     current_frame = cvRetrieveFrame(capture);
     cvUpdateBGStatModel(current_frame,bgModel);

    ....

     //Then once the BG is trained use FG to detect new blob.
    if(FrameCount > FGTrainFrames) {
        
        blobDetect->DetectNewBlob(current_frame, bgModel->foreground, &newBlobList, &blobList);

        //Loop on the new blob found.
        for(i=0; i<newBlobList.GetBlobNum(); ++i)
        {
            CvBlob* pBN = NewBlobList.GetBlob(i);
            pBN->ID = nextBlobID;

            //Check if the size of the new blob is big enough to be inserted in the blobList.
            if(pBN && pBN->w >= CV_BLOB_MINW && pBN->h >= CV_BLOB_MINH)
            {
                   cout << "Add blob #" << nextBlobID << endl; 
                   blobList.AddBlob(pBN);
                   nextBlobID++;                  
            }
        }
    }

    //Then a tracking should be performed to follow the blob
    ...  
}

 

This code is a simplified version of the code in "cvaux/vs/blobtrackingauto.cpp".
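
To follow the blobs from frame to frame, the tracking step hinted at above can be added with a CvBlobTracker. A minimal sketch, assuming the legacy cvaux tracker interface (cvCreateBlobTrackerCC, AddBlob, Process, GetBlobNum, GetBlob); verify the exact signatures against cvvidsurv.hpp:

CvBlobTracker* blobTracker = cvCreateBlobTrackerCC();   // connected-component tracker

// Inside the frame loop, after new blobs have been detected:
for(int i = 0; i < newBlobList.GetBlobNum(); ++i)
{
    CvBlob* pBN = newBlobList.GetBlob(i);
    if(pBN && pBN->w >= CV_BLOB_MINW && pBN->h >= CV_BLOB_MINH)
        blobTracker->AddBlob(pBN, current_frame, bgModel->foreground);
}

// Update the positions of all tracked blobs for the current frame.
blobTracker->Process(current_frame, bgModel->foreground);

for(int i = 0; i < blobTracker->GetBlobNum(); ++i)
{
    CvBlob* pB = blobTracker->GetBlob(i);
    std::cout << "Blob #" << CV_BLOB_ID(pB)
              << " at (" << pB->x << ", " << pB->y << ")" << std::endl;
}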
