EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
Most state-of-the-art optical flow approaches are built upon an energy minimization framework, often solved using efficient coarse-to-fine algorithms [1,5]. A major drawback of coarse-to-fine schemes is error propagation: errors made at coarser levels, where different motion layers can overlap, propagate across scales. Even though coarse-to-fine techniques work well in most cases, we are not aware of a theoretical guarantee or proof of convergence. Instead, we propose to simply interpolate a sparse set of matches in a dense manner to initialize the optical flow estimation, see Figure 1. This novel procedure enables us to leverage recent advances in matching algorithms, which can now output quasi-dense correspondence fields [6]. In the same spirit as [4], we perform a sparse-to-dense interpolation by fitting a local affine model at each pixel based on nearby matches. A major issue, however, is the preservation of motion boundaries. We make the following observation: motion boundaries tend to appear at image edges, see Figure 2. Consequently, we propose to replace the Euclidean distance with a better, i.e., edge-aware, distance and show that it offers a natural way to handle motion discontinuities. Moreover, we show how an approximation of the edge-aware distance allows fitting only one affine model per input match (instead of one per pixel), leading to a significant speed-up of the interpolation scheme without loss in performance.
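To illustrate the edge-aware idea, the sketch below interpolates sparse matches using a geodesic distance computed over an edge-cost map. For simplicity it assigns each pixel the flow of its geodesically nearest match (a piecewise-constant model) rather than fitting the local affine models used in the paper; the cost map, seed positions, and function names are all illustrative.

```python
import heapq
import numpy as np

def geodesic_distances(cost, seeds):
    """Multi-source Dijkstra on a 4-connected pixel grid.

    cost: HxW array of per-pixel edge costs (high on image edges).
    seeds: list of (row, col) match locations.
    Returns dist (HxW geodesic distance) and label (index of the
    geodesically nearest seed for every pixel).
    """
    H, W = cost.shape
    dist = np.full((H, W), np.inf)
    label = np.full((H, W), -1, dtype=int)
    heap = []
    for i, (r, c) in enumerate(seeds):
        dist[r, c] = 0.0
        label[r, c] = i
        heapq.heappush(heap, (0.0, r, c, i))
    while heap:
        d, r, c, i = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W:
                # Edge weight: average cost of the two pixels crossed.
                nd = d + 0.5 * (cost[r, c] + cost[nr, nc])
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    label[nr, nc] = i
                    heapq.heappush(heap, (nd, nr, nc, i))
    return dist, label

# Toy example: a strong vertical image edge separates two motion layers.
H, W = 8, 8
cost = np.full((H, W), 0.01)
cost[:, 4] = 10.0                       # image edge at column 4
seeds = [(4, 1), (4, 6)]                # one match per side
flows = np.array([[2.0, 0.0],           # left layer moves right
                  [0.0, 1.0]])          # right layer moves down
dist, label = geodesic_distances(cost, seeds)
dense_flow = flows[label]               # nearest-match interpolation
```

Because crossing the edge is expensive under the geodesic distance, each side of the boundary is filled from its own match, so the interpolated flow stays sharp at the edge, which is exactly the behavior a Euclidean distance would destroy.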
Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand drawn contours in images. Patches of human generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the top-down task of contour detection while being over 200× faster than competing methods. We also achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA [5] and PASCAL [10], respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.
Self-similarity
The second type of feature used by our method is based on self-similarity. It is well known that contours not only occur at intensity or color edges, but also at texture boundaries. The self-similarity features capture the portions of an image patch that contain similar textures based on color or gradient information. We compute texture information on an m×m grid over the patch, with m=5 yielding 7×7-pixel cells for 35×35 patches. For channel k and grid cells i and j, the self-similarity features are given by f(i,j,k) = s_jk − s_ik, where s_jk is the sum of grid cell j in channel k.
In summary, we utilize 3 color channels, 3 gradient magnitude channels, and 8 oriented gradient channels for a total of 14 channels. For a 35×35 patch this gives 35⋅35⋅14=17150 channel features and 300⋅14=4200 self-similarity features, yielding a 21350-dimensional feature vector (the learned model will use only a small subset of these features). Computing the channels given a 640×480 input image takes only a fraction of a second using optimized code from Dollár et al. [7] available online.
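The feature dimensions above can be checked with a small sketch: sum each channel over the 5×5 grid of 7×7-pixel cells, then take pairwise cell differences (25 choose 2 = 300 pairs per channel). The function name and random input are illustrative.

```python
import numpy as np
from itertools import combinations

def self_similarity_features(patch_channels, m=5):
    """Pairwise grid-cell differences per channel.

    patch_channels: (K, S, S) array with S divisible by m
    (here 14 channels of a 35x35 patch, so 7x7-pixel cells).
    """
    K, S, _ = patch_channels.shape
    cell = S // m
    # Sum each channel over the m x m grid of cells.
    sums = patch_channels.reshape(K, m, cell, m, cell).sum(axis=(2, 4))
    sums = sums.reshape(K, m * m)                    # (K, 25)
    pairs = list(combinations(range(m * m), 2))      # 300 cell pairs
    feats = [sums[k, j] - sums[k, i] for k in range(K) for i, j in pairs]
    return np.array(feats)

channels = np.random.rand(14, 35, 35)
ss = self_similarity_features(channels)   # 300 * 14 = 4200 features
raw = channels.size                       # 35 * 35 * 14 = 17150 features
total = raw + ss.size                     # 21350-dimensional vector
```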
Classification
Two considerations must be taken into account when choosing a classifier for labeling sketch tokens in image patches. First, every pixel in the image must be labeled, so the classifier must be efficient. Second, the number of potential classes for each patch ranges in the hundreds. In this work we use a random forest classifier, since it is an efficient method for multi-class problems.
A random forest is a collection of decision trees whose results are averaged to produce a final result. We randomly sample 150,000 contour patches (1000 per token class) and 160,000 “no contour” patches (800 per training image) for training each tree. The Gini impurity measure is used to select a feature and decision boundary for each branch node from a randomly selected subset of √F of the F possible features. The leaf nodes contain the probabilities of belonging to each class and are typically quite sparse. We use a collection of 25 trees trained until every leaf node is pure or contains fewer than 5 examples. The median depth of the trees is 20, although some branches are substantially deeper.
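The Gini-based split selection used at each branch node can be sketched as follows; this is a minimal single-feature version (function names and the toy data are illustrative, and a real tree would also sample √F candidate features).

```python
import numpy as np

def gini(labels, n_classes):
    """Gini impurity of a label set: 1 - sum_c p_c^2."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    return 1.0 - np.sum(p ** 2)

def best_split(feature, labels, n_classes):
    """Pick the threshold minimizing the weighted child impurity."""
    best_t, best_score = None, np.inf
    for t in np.unique(feature)[:-1]:
        left, right = labels[feature <= t], labels[feature > t]
        score = (len(left) * gini(left, n_classes)
                 + len(right) * gini(right, n_classes)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Perfectly separable toy data: threshold 1.0 isolates each class,
# giving pure children and hence zero weighted impurity.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1])
t, s = best_split(x, y, 2)
```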
Contour detection
We now describe a top-down approach to detecting contours. Sketch tokens provide an estimate of the local edge structure in a patch. However, contour detection only requires a binary labeling of contour pixels.
We show that computing mid-level sketch tokens provides accurate and efficient predictions of low-level contours. Our random forest classifier predicts the probability that an image patch belongs to each token class or the negative set. Since each token has a contour located at its center, we can compute the probability of a contour at the center pixel as the sum of the token probabilities.
Once the probability of a contour has been computed at each pixel, a standard non-maximal suppression scheme may be applied to find the peak response of a contour.
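These two steps can be sketched in a few lines. The contour probability is the total token mass (one minus the no-contour probability); the suppression step here is a simplified horizontal-only stand-in for full orientation-aware non-maximal suppression, and the toy probability map is illustrative.

```python
import numpy as np

def contour_probability(token_probs):
    """token_probs: (H, W, C) with class 0 = 'no contour'.

    Each sketch token has a contour at its center, so the contour
    probability is the total token mass, i.e. 1 - P(no contour).
    """
    return 1.0 - token_probs[..., 0]

def nms_horizontal(prob):
    """Zero out pixels that are not a horizontal local maximum
    (a stand-in for orientation-aware non-maximal suppression)."""
    out = prob.copy()
    left = np.roll(prob, 1, axis=1)
    right = np.roll(prob, -1, axis=1)
    out[(prob < left) | (prob < right)] = 0.0
    return out

# Toy map: a blurry vertical contour peaking at column 2.
H, W, C = 4, 5, 3
token_probs = np.zeros((H, W, C))
token_probs[..., 0] = 1.0                  # background everywhere
token_probs[:, 1:4, 0] = [0.8, 0.3, 0.8]   # less background near the edge
token_probs[:, 1:4, 1] = [0.2, 0.7, 0.2]   # token mass peaks at column 2
prob = contour_probability(token_probs)
thin = nms_horizontal(prob)                # only column 2 survives
```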
Structured Forests for Fast Edge Detection
Edge detection is critical for many vision systems, including object detection and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn an accurate and fast edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our method learns decision trees that robustly map the structured labels to a discrete space on which standard information gain measures can be evaluated. The result is an approach that obtains real-time performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing that our learned edge models generalize well across datasets.
State-of-the-art approaches to edge detection [1, 31, 21] use a variety of features as input, including brightness, color and texture gradients computed over multiple scales. For top accuracy, globalization based on spectral clustering may also be performed.
Since visually salient edges correspond to a variety of visual phenomena, finding a unified approach to edge detection is difficult.
We formulate the problem of edge detection as predicting local segmentation masks given input image patches. Our novel approach to learning decision trees uses structured labels to determine the splitting function at each branch in the tree. The structured labels are robustly mapped to a discrete space on which standard information gain measures may be evaluated. Each forest predicts a patch of edge pixel labels that are aggregated across the image to compute our final edge map, see Figure 1. We show state-of-the-art results on both the BSDS500 and the NYU Depth dataset. We demonstrate the potential of our approach as a general purpose edge detector by showing the strong cross dataset generalization of our learned edge models.
Structured learning
Structured learning addresses the problem of learning a mapping where the input or output space may be arbitrarily complex, representing strings, sequences, graphs, object poses, bounding boxes, etc.
Our structured random forests differ from these works in several respects. First, we assume that only the output space is structured and operate on a standard input space. Second, by default our model can only output examples observed during training (although this can be ameliorated with custom ensemble models). On the other hand, common approaches for structured prediction learn parameters of a scoring function, and to obtain a prediction, an optimization over the output space must be performed. This requires defining a scoring function and an efficient (possibly approximate) optimization procedure. In contrast, inference using our structured random forest is straightforward, general and fast (the same as for standard random forests).
Finally, our work was inspired by the recent paper by Kontschieder et al. on learning random forests for structured class labels for the specific case where the output labels represent a semantic image labeling for an image patch. The key observation made by Kontschieder et al. is that given a color image patch, the leaf node reached in a tree is independent of the structured semantic labels, and any type of output can be stored at each leaf. Building on this idea, we propose a general learning framework for structured output forests that can be used with a broad class of output spaces, and we apply our framework to learning an accurate and fast edge detector.
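The key step, mapping structured segmentation labels to a discrete space where information gain can be computed, can be sketched as follows. This is a hedged illustration, not the paper's exact pipeline: it encodes each mask by whether sampled pixel pairs fall in the same segment and quantizes with the sign of the top PCA direction into 2 discrete classes (the function name, pair count, and toy masks are illustrative).

```python
import numpy as np

def discretize_masks(masks, n_pairs=64, seed=0):
    """Map structured segmentation masks to discrete labels.

    masks: (N, S, S) integer segmentation masks (structured labels).
    For randomly sampled pixel pairs, encode whether the two pixels
    lie in the same segment, then quantize the top PCA projection's
    sign, yielding 2 discrete classes for information gain.
    """
    rng = np.random.default_rng(seed)
    N, S, _ = masks.shape
    flat = masks.reshape(N, -1)
    idx1 = rng.integers(0, S * S, n_pairs)
    idx2 = rng.integers(0, S * S, n_pairs)
    # Binary same-segment indicator per sampled pixel pair.
    z = (flat[:, idx1] == flat[:, idx2]).astype(float)  # (N, n_pairs)
    z = z - z.mean(axis=0)                              # center
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return (z @ vt[0] > 0).astype(int)                  # discrete labels

# Toy structured labels: masks split either vertically or horizontally.
S = 8
vert = np.zeros((S, S), int); vert[:, S // 2:] = 1
horz = np.zeros((S, S), int); horz[S // 2:, :] = 1
masks = np.stack([vert, vert, horz, horz])
labels = discretize_masks(masks)   # groups the two mask types apart
```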
edge-preserving smoothing
At a high level, edge-preserving smoothing (EPS) methods can be classified into two groups. The first group consists of the edge-preserving (EP) filters that explicitly compute the filtering output as a weighted average, sometimes in an iterative way.
The second group of existing EPS methods is based on global optimization formulations. These methods seek a globally optimal solution to an objective function that usually involves a data constraint term and a prior smoothness term.
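A minimal 1D sketch of the optimization-based group, in the spirit of weighted-least-squares smoothing: the objective has a data term pulling the output toward the input and a smoothness term whose weights are dropped across large input jumps, so edges survive. The weighting rule and all parameters here are illustrative assumptions.

```python
import numpy as np

def wls_smooth_1d(g, lam=5.0, edge_thresh=0.5):
    """Edge-preserving smoothing as a global optimization (1D sketch).

    Minimizes  sum_i (u_i - g_i)^2 + lam * sum_i w_i (u_{i+1} - u_i)^2,
    where the smoothness weight w_i is set to 0 across large input
    jumps (|g_{i+1} - g_i| >= edge_thresh), preserving those edges.
    The quadratic objective reduces to one sparse linear system.
    """
    n = len(g)
    w = (np.abs(np.diff(g)) < edge_thresh).astype(float)
    A = np.eye(n)                      # data term contributes identity
    for i in range(n - 1):             # smoothness term: graph Laplacian
        A[i, i] += lam * w[i]
        A[i + 1, i + 1] += lam * w[i]
        A[i, i + 1] -= lam * w[i]
        A[i + 1, i] -= lam * w[i]
    return np.linalg.solve(A, g)

# Noisy step signal: each side is flattened, but the jump is kept.
g = np.array([0.0, 0.1, -0.1, 0.0, 2.0, 2.1, 1.9, 2.0])
u = wls_smooth_1d(g)
```

Since the Laplacian annihilates constants, the solution preserves the mean of the input on each side of the edge while damping the noise, which is the qualitative behavior the data and smoothness terms trade off.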
Guided Filter
http://blog.youkuaiyun.com/wushanyun1989/article/details/18225259
http://research.microsoft.com/en-us/um/people/kahe/eccv10/index.html
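The two links above cover He et al.'s guided filter, a representative edge-preserving filter from the first (weighted-average) group. A minimal 1D numpy sketch of its box-filter formulation, with illustrative function names, radius, and toy signal:

```python
import numpy as np

def box_mean(x, r):
    """Mean over a sliding window of radius r (1D, edge-padded)."""
    xp = np.pad(x, r, mode='edge')
    k = 2 * r + 1
    c = np.cumsum(np.concatenate(([0.0], xp)))
    return (c[k:] - c[:-k]) / k

def guided_filter_1d(I, p, r=2, eps=1e-2):
    """Guided filter (1D sketch).

    Locally models the output as q = a*I + b. The slope a shrinks
    toward 0 in flat regions of the guide I (output -> local mean)
    and stays near 1 at strong edges, which preserves them.
    """
    mI, mp = box_mean(I, r), box_mean(p, r)
    cov = box_mean(I * p, r) - mI * mp     # local covariance of I and p
    var = box_mean(I * I, r) - mI * mI     # local variance of the guide
    a = cov / (var + eps)
    b = mp - a * mI
    # Average the per-window coefficients before forming the output.
    return box_mean(a, r) * I + box_mean(b, r)

# Self-guided filtering (I == p): smooths flat regions, keeps the step.
I = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
q = guided_filter_1d(I, I, r=2, eps=1e-2)
```

With the signal as its own guide, the flat regions stay near their local means while the step at the center remains sharp, illustrating why the guided filter belongs to the edge-preserving group.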