Video Processing and Feature Tracking Techniques in Detail

1. Four-Character Codes of Video Codecs

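A video codec is identified by a four-character code (FOURCC) made up of four ASCII characters, which can also be packed into a single integer. For an opened cv::VideoCapture instance, calling its get method with the cv::CAP_PROP_FOURCC flag returns the code of the opened video file. The following method, defined in the VideoProcessor class, returns the four-character code of the input video: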

// get the codec of input video 
int getCodec(char codec[4]) { 

  // undefined for vector of images 
  if (images.size()!=0) return -1; 
  union { // data structure for the 4-char code 
    int value; 
    char code[4]; 
  } returned; 

  // get the code 
  returned.value = static_cast<int>(capture.get(cv::CAP_PROP_FOURCC)); 
  // get the 4 characters 
  codec[0] = returned.code[0]; 
  codec[1] = returned.code[1]; 
  codec[2] = returned.code[2]; 
  codec[3] = returned.code[3]; 

  // return the int value corresponding to the code 
  return returned.value; 
}

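The get method always returns a double, which is then cast to an integer. This integer represents the code, and the four characters can be extracted from it with a union. After opening the test video sequence, the following statements can be used: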

char codec[4]; 
processor.getCodec(codec); 
std::cout << "Codec: " << codec[0] << codec[1] 
          << codec[2] << codec[3] << std::endl;

For example, the statements above might print:

Codec: XVID
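
The union used in getCodec depends on the byte order of the machine and on reading a union member other than the one last written, which the C++ standard does not guarantee. As a minimal sketch, assuming the usual FOURCC packing in which the first character occupies the least significant byte, the same conversion can be done with shifts. The names `makeFourcc` and `fourccToString` are hypothetical helpers introduced here for illustration; they are not part of OpenCV or of the VideoProcessor class:

```cpp
#include <cassert>
#include <string>

// Pack four ASCII characters into one integer, with the first character in
// the least significant byte (the packing assumed by this sketch).
int makeFourcc(char c1, char c2, char c3, char c4) {
  const char chars[4] = {c1, c2, c3, c4};
  unsigned code = 0;
  for (int i = 0; i < 4; ++i) {
    const unsigned byte = static_cast<unsigned char>(chars[i]);
    assert(byte < 128u);  // FOURCC characters are expected to be ASCII
    code |= byte << (8 * i);
  }
  return static_cast<int>(code);
}

// Unpack the integer code back into its four characters.
std::string fourccToString(int code) {
  const unsigned bits = static_cast<unsigned>(code);
  std::string text(4, ' ');
  for (int i = 0; i < 4; ++i)
    text[i] = static_cast<char>((bits >> (8 * i)) & 0xFFu);
  return text;
}
```

With these helpers, `fourccToString(makeFourcc('X', 'V', 'I', 'D'))` yields the string XVID, independent of the machine's byte order.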

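When writing a video file, the codec must be specified by its four-character code. This is the second argument of the open method of the cv::VideoWriter class. You can reuse the codec of the input video (the default in the setOutput method), or pass the value -1, in which case a window pops up asking you to choose from the list of available codecs (on platforms that support this dialog, typically Windows). The list shown in this window corresponds to the codecs installed on the machine. The code of the selected codec is then passed automatically to the open method.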

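The following table summarizes these operations:
| Operation | Description |
| ---- | ---- |
| Get the codec code | Call the get method of a cv::VideoCapture instance with the cv::CAP_PROP_FOURCC flag |
| Specify the codec when writing a video | Second argument of the open method of the cv::VideoWriter class |
| Choose a codec interactively | Pass -1 to pop up a selection window |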

2. Extracting Foreground Objects from Video

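Extracting foreground objects is an important task in video processing. When a fixed camera observes a scene, the background remains mostly unchanged, and the moving foreground objects are the elements of interest. To extract them, we build a model of the background and compare it with the current frame to detect the foreground objects.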

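If an image of the scene background is available (that is, a frame containing no foreground objects), the foreground of the current frame can be extracted with a simple image difference: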

// compute difference between current image and background 
cv::absdiff(backgroundImage, currentImage, foreground);

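In most cases, however, such a background image is not readily available. It is difficult to guarantee that a given image contains no foreground objects, and in busy scenes this rarely happens. Moreover, the background usually changes over time, for example when the lighting conditions change or when objects are added to or removed from the background.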

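The background model therefore has to be built dynamically. One approach is to compute the average of all observations, but this has several drawbacks: a large number of images must be stored, no foreground extraction is possible while the images are being accumulated, and it is hard to decide when and how many images to accumulate to obtain an acceptable background model.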

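A better strategy is to build the background model dynamically by updating it regularly, that is, by computing a running average (also called a moving average). If $p_t$ is the pixel value at time $t$ and $\mu_{t - 1}$ is the current average, the average is updated with the following formula:
$\mu_t = \alpha p_t + (1 - \alpha) \mu_{t - 1}$
Here the parameter $\alpha$ is called the learning rate; it defines the influence of the current value on the currently estimated average. The larger the learning rate, the faster the running average adapts to changes in the observed values, but if it is set too high, slowly moving objects may be absorbed into the background. The appropriate learning rate depends largely on the dynamics of the scene.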
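
To make the effect of the learning rate concrete, here is a minimal sketch that applies the update rule to a single pixel value. The function names are hypothetical and the numbers are chosen purely for illustration:

```cpp
#include <cassert>

// One step of the running average:
//   mu_t = alpha * p_t + (1 - alpha) * mu_{t-1}
double updateRunningAverage(double previousMean, double pixel, double alpha) {
  assert(alpha >= 0.0 && alpha <= 1.0);
  return alpha * pixel + (1.0 - alpha) * previousMean;
}

// Apply the update repeatedly with a constant observation and return the
// mean obtained after the given number of frames.
double meanAfterFrames(double initialMean, double pixel, double alpha,
                       int frames) {
  double mean = initialMean;
  for (int i = 0; i < frames; ++i)
    mean = updateRunningAverage(mean, pixel, alpha);
  return mean;
}
```

With a background value of 100 and an object of value 200 that stops moving, a single update with a learning rate of 0.1 moves the average to 110. The remaining gap shrinks by a factor of (1 - alpha) at every frame, which is why a high learning rate lets slow or stationary objects blend into the background quickly.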

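The following code builds a class that learns the background model with a running average and extracts foreground objects by subtraction: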

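class BGFGSegmentor : public FrameProcessor { 
  cv::Mat gray;          // current gray-level image 
  cv::Mat background;    // accumulated background (CV_32F) 
  cv::Mat backImage;     // current background image (CV_8U) 
  cv::Mat foreground;    // foreground image 
  // learning rate in background accumulation 
  double learningRate; 
  int threshold;         // threshold for foreground extraction 

  public: 
    // the default values here are illustrative 
    BGFGSegmentor() : learningRate(0.01), threshold(10) {} 
    void setThreshold(int t) { threshold = t; } 
    void setLearningRate(double r) { learningRate = r; } 
    // process(), shown below, is also a member of this class 
};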

// processing method (member of BGFGSegmentor) 
void process(cv::Mat &frame, cv::Mat &output) { 
  // convert to gray-level image 
  cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY); 
  // initialize background to 1st frame 
  if (background.empty()) 
    gray.convertTo(background, CV_32F); 
  // convert background to 8U 
  background.convertTo(backImage, CV_8U); 

  // compute difference between image and background 
  cv::absdiff(backImage, gray, foreground); 
  // apply threshold to foreground image: with THRESH_BINARY_INV, 
  // foreground pixels become 0 and background pixels become 255 
  cv::threshold(foreground, output, threshold, 
                255, cv::THRESH_BINARY_INV); 

  // accumulate background: output doubles as the mask, so only 
  // pixels classified as background update the model 
  cv::accumulateWeighted(gray, background,  
                         // alpha*gray + (1-alpha)*background 
                         learningRate,  // alpha 
                         output);       // mask 
}
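
The role of the mask may be easier to see without OpenCV. The following is a minimal sketch of the same per-pixel logic on plain arrays of gray levels. `segmentAndUpdate` is a hypothetical helper, not part of the BGFGSegmentor class; it assumes both buffers have the same size and, unlike process(), it compares against the floating-point background directly instead of first rounding it to 8 bits:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns a binary image: 0 for foreground pixels, 255 for background pixels
// (the same inverted convention as cv::THRESH_BINARY_INV).
// Only pixels classified as background are blended into the model.
std::vector<unsigned char> segmentAndUpdate(
    std::vector<float>& background, const std::vector<unsigned char>& gray,
    double learningRate, int threshold) {
  assert(background.size() == gray.size());
  std::vector<unsigned char> output(gray.size());
  for (std::size_t i = 0; i < gray.size(); ++i) {
    const float diff = std::fabs(background[i] - static_cast<float>(gray[i]));
    const bool isForeground = diff > static_cast<float>(threshold);
    output[i] = isForeground ? 0 : 255;
    if (!isForeground) {
      // running average, applied to background pixels only
      background[i] = static_cast<float>(
          learningRate * gray[i] + (1.0 - learningRate) * background[i]);
    }
  }
  return output;
}
```

Skipping the update for foreground pixels keeps a moving object from contaminating the background model while it passes in front of the camera.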

A foreground extraction program built with the video processing framework looks like this:

int main() { 
  // Create video processor instance 
  VideoProcessor processor; 

  // Create background/foreground segmentor 
  BGFGSegmentor segmentor; 
  segmentor.setThreshold(25); 

  // Open video file 
  processor.setInput("bike.avi"); 

  // Set frame processor 
  processor.setFrameProcessor(&segmentor); 

  // Declare a window to display the video 
  processor.displayOutput("Extracted Foreground"); 

  // Play the video at the original frame rate 
  processor.setDelay(1000. / processor.getFrameRate()); 

  // Start the process 
  processor.run(); 
}

The workflow is as follows:

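graph TD;
    A[Input video frame] --> B[Convert to gray-level image];
    B --> C{Is the background empty?};
    C -- Yes --> D[Initialize background];
    C -- No --> E[Convert background to 8U];
    D --> E;
    E --> F[Compute difference between image and background];
    F --> G[Threshold the foreground image];
    G --> H[Accumulate background];
    H --> I[Output foreground image];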

3. Tracking Feature Points in Video

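Tracking feature points across a video sequence helps us understand how the different elements of the scene move. To start the tracking process, feature points are first detected in an initial frame and then tracked in the following frames. Because objects may move during the sequence, the new position of each point has to be searched for around its previous position, which is what the cv::calcOpticalFlowPyrLK function does.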

The following code implements a class for feature point tracking:

class FeatureTracker : public FrameProcessor { 
  cv::Mat gray;      // current gray-level image 
  cv::Mat gray_prev; // previous gray-level image 
  // tracked features from 0->1 
  std::vector<cv::Point2f> points[2]; 
  // initial position of tracked points 
  std::vector<cv::Point2f> initial; 
  std::vector<cv::Point2f> features;  // detected features 
  int max_count;               // maximum number of features to detect 
  double qlevel;               // quality level for feature detection 
  double minDist;              // min distance between two points 
  std::vector<uchar> status;   // status of tracked features 
  std::vector<float> err;      // error in tracking 

  public: 
    FeatureTracker() : max_count(500), qlevel(0.01), minDist(10.) {}
    // process() and the helper methods shown below are members of this class 
};

// processing method (member of FeatureTracker) 
void process(cv::Mat &frame, cv::Mat &output) { 
  // convert to gray-level image 
  cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);  
  frame.copyTo(output); 

  // 1. if new feature points must be added 
  if (addNewPoints()) { 
    // detect feature points 
    detectFeaturePoints(); 
    // add the detected features to  
    // the currently tracked features 
    points[0].insert(points[0].end(), features.begin(), features.end()); 
    initial.insert(initial.end(), features.begin(), features.end()); 
  } 

  // for first image of the sequence 
  if (gray_prev.empty()) 
    gray.copyTo(gray_prev); 

  // 2. track features 
  cv::calcOpticalFlowPyrLK( 
    gray_prev, gray, // 2 consecutive images 
    points[0],       // input point positions in first image 
    points[1],       // output point positions in the 2nd image 
    status,          // tracking success 
    err);            // tracking error 

  // 3. loop over the tracked points to reject some 
  int k = 0; 
  for (int i = 0; i < points[1].size(); i++) { 
    // do we keep this point? 
    if (acceptTrackedPoint(i)) { 
      // keep this point in vector 
      initial[k] = initial[i]; 
      points[1][k++] = points[1][i]; 
    } 
  } 

  // eliminate unsuccessful points 
  points[1].resize(k); 
  initial.resize(k); 

  // 4. handle the accepted tracked points 
  handleTrackedPoints(frame, output); 

  // 5. current points and image become previous ones 
  std::swap(points[1], points[0]); 
  cv::swap(gray_prev, gray); 
} 

// feature point detection 
void detectFeaturePoints() { 
  // detect the features 
  cv::goodFeaturesToTrack(gray,  // the image 
                          features,    // the output detected features 
                          max_count,   // the maximum number of features  
                          qlevel,      // quality level 
                          minDist);    // min distance between two features 
} 

// determine if new points should be added 
bool addNewPoints() { 
  // if too few points 
  return points[0].size() <= 10; 
} 

// determine which tracked point should be accepted 
bool acceptTrackedPoint(int i) { 
  return status[i] &&  // status is false if unable to track point i 
    // if point has moved (std::abs from <cmath> keeps the float precision) 
    (std::abs(points[0][i].x - points[1][i].x) + 
     std::abs(points[0][i].y - points[1][i].y) > 2); 
} 

// handle the currently tracked points 
void handleTrackedPoints(cv:: Mat &frame, cv:: Mat &output) { 
  // for all tracked points 
  for (int i = 0; i < points[1].size(); i++) { 
    // draw line and circle 
    cv::line(output, initial[i],  // initial position  
             points[1][i],        // new position  
             cv::Scalar(255, 255, 255)); 
    cv::circle(output, points[1][i], 3,       
               cv::Scalar(255, 255, 255), -1); 
  } 
}

A simple main function looks like this:

int main() { 
  // Create video processor instance 
  VideoProcessor processor; 

  // Create feature tracker instance 
  FeatureTracker tracker; 
  // Open video file 
  processor.setInput("bike.avi"); 

  // set frame processor 
  processor.setFrameProcessor(&tracker); 

  // Declare a window to display the video 
  processor.displayOutput("Tracked Features"); 

  // Play the video at the original frame rate 
  processor.setDelay(1000. / processor.getFrameRate()); 

  // Start the process 
  processor.run(); 
}

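The feature tracking steps can be summarized as follows:
1. Detect feature points with the cv::goodFeaturesToTrack function.
2. Track the feature points with the cv::calcOpticalFlowPyrLK function.
3. Filter the feature points, keeping only those that satisfy the acceptance test.
4. Handle the tracked feature points by drawing the tracking lines and circles.
5. Update the current frame and feature point information.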
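
Step 3 is an in-place compaction: accepted points are copied toward the front of the vectors, which are then truncated. The following is a minimal sketch of that step without OpenCV. `Point2f` is a stand-in for cv::Point2f, and `keepMovingPoints` is a hypothetical helper that applies the same acceptance test as acceptTrackedPoint:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2f { float x; float y; };  // stand-in for cv::Point2f

// Keeps only the points that were tracked successfully and moved by more
// than minMotion (sum of absolute x and y displacements). The vectors
// current and initial stay aligned index by index.
// Returns the number of points kept.
std::size_t keepMovingPoints(const std::vector<Point2f>& previous,
                             std::vector<Point2f>& current,
                             std::vector<Point2f>& initial,
                             const std::vector<unsigned char>& status,
                             float minMotion) {
  assert(previous.size() == current.size());
  assert(initial.size() == current.size());
  assert(status.size() == current.size());
  std::size_t k = 0;
  for (std::size_t i = 0; i < current.size(); ++i) {
    const float motion = std::fabs(previous[i].x - current[i].x) +
                         std::fabs(previous[i].y - current[i].y);
    if (status[i] && motion > minMotion) {
      initial[k] = initial[i];  // safe because k <= i
      current[k] = current[i];
      ++k;
    }
  }
  current.resize(k);
  initial.resize(k);
  return k;
}
```

Rejecting points that barely move is a design choice of this tracker: stationary points are assumed to belong to the background, so dropping them keeps the display focused on moving objects and eventually triggers a new detection when too few points remain.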

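With the techniques above, you can query and select video codecs, extract foreground objects, and track feature points, which together provide a solid basis for video analysis and processing.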

Video Processing and Feature Tracking Techniques in Detail (Part 2)

4. More Sophisticated Background Modeling Methods

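The simple foreground extraction method presented earlier works well for simple scenes with a relatively stable background. In many situations, however, some regions of the background fluctuate between different values, which causes frequent false detections. Such fluctuations may be caused by moving background objects (such as tree leaves), glare (for example on a water surface), or shadows. More sophisticated background modeling methods were introduced to cope with these problems.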

4.1 The Mixture of Gaussians Method

The Mixture of Gaussians method is similar to the method presented above, with several improvements.

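First, the method maintains more than one model per pixel (that is, more than one running average). For example, if a background pixel fluctuates between two values, two running averages are stored. A new pixel value is declared foreground only if it does not belong to any of the most frequently observed models. The number of models is a parameter of the method; a typical value is 5.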

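Second, a running variance is maintained for each model in addition to its running average. It is computed as follows:
$\sigma_t^2 = \alpha (p_t - \mu_t)^2 + (1 - \alpha) \sigma_{t - 1}^2$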

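The computed averages and variances are used to build Gaussian models, from which the probability that a given pixel value belongs to the background can be estimated. This makes it easier to choose an appropriate threshold, because the threshold is now expressed as a probability rather than as an absolute difference. Consequently, in regions where the background values fluctuate more, a larger difference is required before a pixel is declared a foreground object.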
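
As a minimal sketch of how a running mean and variance turn a fixed threshold into a relative one, the following single-Gaussian model flags a pixel as foreground when it lies more than a chosen number of standard deviations from the mean. It assumes that the variance is updated with the same learning rate as the mean, $\sigma_t^2 = \alpha (p_t - \mu_t)^2 + (1 - \alpha) \sigma_{t - 1}^2$. The struct, the function names, and the parameter values are illustrative and do not reflect OpenCV's implementation:

```cpp
#include <cassert>
#include <cmath>

struct PixelModel {
  double mean;
  double variance;
};

// Update the running mean and variance of one pixel with learning rate alpha.
void updateModel(PixelModel& m, double pixel, double alpha) {
  assert(alpha >= 0.0 && alpha <= 1.0);
  m.mean = alpha * pixel + (1.0 - alpha) * m.mean;
  const double d = pixel - m.mean;  // deviation from the updated mean
  m.variance = alpha * d * d + (1.0 - alpha) * m.variance;
}

// A pixel is treated as foreground when it is more than k standard
// deviations away from the mean (k is a tuning parameter).
bool isForeground(const PixelModel& m, double pixel, double k) {
  return std::fabs(pixel - m.mean) > k * std::sqrt(m.variance);
}
```

With k = 2.5, a deviation of 20 gray levels counts as foreground for a stable pixel with variance 4 (standard deviation 2), but as background for a fluctuating pixel with variance 100 (standard deviation 10).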

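Finally, the model is adaptive. When a given Gaussian model is not hit often enough, it is excluded from the background model. Conversely, when a pixel value is found to be outside the currently maintained background models (that is, it is a foreground pixel), a new Gaussian model is created. If this new model frequently receives pixels in the future, it becomes associated with the background.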
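
The adaptive behavior can be sketched for a single pixel as follows. This is a deliberately simplified illustration, not the algorithm as implemented in OpenCV: the 2.5 standard deviation match test, the variance floor, the fixed weight threshold used to decide whether a model counts as background, and all names (`Gaussian`, `PixelMixture`) are assumptions chosen for the example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Gaussian {
  double mean;
  double variance;
  double weight;  // how often this model has been matched recently
};

class PixelMixture {
 public:
  PixelMixture(std::size_t maxModels, double alpha, double initialVariance,
               double minVariance, double backgroundWeight)
      : maxModels_(maxModels), alpha_(alpha),
        initialVariance_(initialVariance), minVariance_(minVariance),
        backgroundWeight_(backgroundWeight) {
    assert(maxModels_ > 0);
    assert(alpha_ > 0.0 && alpha_ < 1.0);
  }

  // Feeds one observation; returns true if it is classified as background.
  bool update(double pixel) {
    // 1. look for a model within 2.5 standard deviations of the value
    std::size_t match = models_.size();  // models_.size() means "no match"
    for (std::size_t i = 0; i < models_.size(); ++i) {
      const double d = pixel - models_[i].mean;
      if (d * d < 6.25 * models_[i].variance) { match = i; break; }
    }
    // 2. decay every weight and reinforce the matched model, if any
    for (std::size_t i = 0; i < models_.size(); ++i) {
      const double hit = (i == match) ? 1.0 : 0.0;
      models_[i].weight = (1.0 - alpha_) * models_[i].weight + alpha_ * hit;
    }
    if (match < models_.size()) {
      // 3. update the running mean and variance of the matched model
      Gaussian& g = models_[match];
      g.mean = alpha_ * pixel + (1.0 - alpha_) * g.mean;
      const double d = pixel - g.mean;
      g.variance = std::max(minVariance_,
                            alpha_ * d * d + (1.0 - alpha_) * g.variance);
      // background only if this model has been hit often enough
      return g.weight >= backgroundWeight_;
    }
    // 4. no match: foreground; start a new model, replacing the weakest
    //    one when the maximum number of models has been reached
    const Gaussian fresh{pixel, initialVariance_, alpha_};
    if (models_.size() < maxModels_) {
      models_.push_back(fresh);
    } else {
      *std::min_element(models_.begin(), models_.end(),
                        [](const Gaussian& a, const Gaussian& b) {
                          return a.weight < b.weight;
                        }) = fresh;
    }
    return false;
  }

 private:
  std::size_t maxModels_;
  double alpha_, initialVariance_, minVariance_, backgroundWeight_;
  std::vector<Gaussian> models_;
};
```

If a pixel alternates between two values for long enough, both values end up with their own well-weighted model and are reported as background, while a value that matches neither model is reported as foreground and seeds a new low-weight model.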

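OpenCV provides an implementation of this algorithm: the cv::bgsegm::createBackgroundSubtractorMOG function (declared in <opencv2/bgsegm.hpp>, part of the opencv_contrib modules) creates a cv::bgsegm::BackgroundSubtractorMOG object, whose class is a subclass of cv::BackgroundSubtractor. With its default parameters, the class is very easy to use: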

#include <opencv2/opencv.hpp>
#include <opencv2/bgsegm.hpp>  // requires the opencv_contrib modules

int main() {
  // Open the video file 
  cv::VideoCapture capture("bike.avi"); 
  // check if video successfully opened 
  if (!capture.isOpened()) 
    return 0; 

  // current video frame 
  cv::Mat frame; 

  // foreground binary image 
  cv::Mat foreground; 
  // background image 
  cv::Mat background; 
  cv::namedWindow("Extracted Foreground"); 

  // The Mixture of Gaussian object 
  // used with all default parameters 
  cv::Ptr<cv::BackgroundSubtractor> ptrMOG = cv::bgsegm::createBackgroundSubtractorMOG(); 
  bool stop(false); 
  // for all frames in video 
  while (!stop) { 
    // read next frame if any 
    if (!capture.read(frame)) 
      break; 

    // update the background 
    // and return the foreground 
    ptrMOG->apply(frame, foreground, 0.01); 

    // Complement the image 
    cv::threshold(foreground, foreground, 128, 
                  255, cv::THRESH_BINARY_INV); 
    // show the complemented foreground 
    cv::imshow("Extracted Foreground", foreground); 

    // introduce a delay 
    // or press key to stop 
    if (cv::waitKey(10) >= 0) 
      stop = true; 
  } 
}

The workflow of this method is as follows:

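graph TD;
    A[Open the video file] --> B[Create the Mixture of Gaussians object];
    B --> C[Read the next video frame];
    C --> D[Update the background and return the foreground];
    D --> E[Threshold the foreground image];
    E --> F[Display the foreground image];
    F --> G{Was a key pressed?};
    G -- Yes --> H[Stop the loop];
    G -- No --> C;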

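In addition, there is the cv::BackgroundSubtractorMOG2 implementation, created with cv::createBackgroundSubtractorMOG2. One of its improvements is that the appropriate number of Gaussian models used per pixel is now determined dynamically. cv::BackgroundSubtractorMOG2 is also generally faster.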

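The following table compares the different background modeling methods:
| Method | Advantages | Disadvantages | Suitable scenes |
| ---- | ---- | ---- | ---- |
| Simple background differencing | Simple to implement | Adapts poorly to background changes, prone to false detections | Simple scenes with a stable background |
| Mixture of Gaussians (MOG) | Handles background fluctuations, adaptive | Relatively expensive to compute | Scenes with some background fluctuation |
| Mixture of Gaussians (MOG2) | Fast, number of models determined dynamically | | Scenes where speed matters |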