OpenCV contrib-master ximgproc module: documentation notes

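These notes cover an edge detection method that uses structured random forests to learn local image structure for fast, accurate edge detection. They also cover EpicFlow, a matching-based optical flow estimation method, and Sketch Tokens, a mid-level representation for contour and object detection.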


EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

Most state-of-the-art optical flow approaches are built upon an energy minimization framework, often solved using efficient coarse-to-fine algorithms [1,5]. A major drawback of coarse-to-fine schemes is error propagation, i.e., errors at coarser levels, where different motion layers can overlap, can propagate across scales. Even if coarse-to-fine techniques work well in most cases, we are not aware of a theoretical guarantee or proof of convergence. Instead, we propose to simply interpolate a sparse set of matches in a dense manner to initialize the optical flow estimation, see Figure 1. This novel procedure enables us to leverage recent advances in matching algorithms, which can now output quasi-dense correspondence fields [6]. In the same spirit as [4], we perform a sparse-to-dense interpolation by fitting a local affine model at each pixel based on nearby matches. Nevertheless, a major issue arises for the preservation of motion boundaries. We make the following observation: motion boundaries often tend to appear at image edges, see Figure 2. Consequently, we propose to exchange the Euclidean distance with a better, i.e., edge-aware, distance and show that it offers a natural way to handle motion discontinuities. Moreover, we show how an approximation of the edge-aware distance allows fitting only one affine model per input match (instead of one per pixel). This leads to an important speed-up of the interpolation scheme without loss in performance.
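The sparse-to-dense interpolation described above can be illustrated with a small sketch. It fits a weighted local affine model to the nearest matches of one pixel and reads off the flow. The function name, the Gaussian weighting, and the parameters `k` and `sigma` are assumptions made for illustration, and plain Euclidean distance is used here where EpicFlow substitutes an edge-aware distance.

```python
import numpy as np

def interpolate_flow_at(pixel, src_pts, dst_pts, k=3, sigma=10.0):
    """Estimate the flow at `pixel` from a local affine fit to nearby matches.

    Assumption: neighbours are chosen and weighted by Euclidean distance;
    the paper replaces this with an edge-aware distance.
    """
    pixel = np.asarray(pixel, dtype=float)
    dists = np.linalg.norm(src_pts - pixel, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-(dists[nearest] / sigma) ** 2)

    # Weighted least squares for dst ~ [x, y, 1] @ params, params has shape (3, 2).
    design = np.hstack([src_pts[nearest], np.ones((len(nearest), 1))])
    root_w = np.sqrt(weights)[:, None]
    params, *_ = np.linalg.lstsq(design * root_w, dst_pts[nearest] * root_w, rcond=None)

    # Apply the affine model to the pixel and subtract its position to get the flow.
    mapped = np.append(pixel, 1.0) @ params
    return mapped - pixel

# Toy matches generated by a known affine motion (hypothetical data).
src = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
A = np.array([[1.1, 0.0], [0.0, 0.9]])
t = np.array([2.0, -1.0])
dst = src @ A.T + t

flow = interpolate_flow_at((5, 5), src, dst, k=4)
print(flow)
```

Because the toy matches follow an exact affine motion, the fit recovers a flow of approximately (2.5, -1.5) at the centre pixel. Real matches are noisy and cross motion boundaries, which is where the choice of distance matters.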

Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection

We propose a novel approach to both learning and detecting local contour-based representations for mid-level features. Our features, called sketch tokens, are learned using supervised mid-level information in the form of hand drawn contours in images. Patches of human generated contours are clustered to form sketch token classes and a random forest classifier is used for efficient detection in novel images. We demonstrate our approach on both top-down and bottom-up tasks. We show state-of-the-art results on the bottom-up task of contour detection while being over 200× faster than competing methods. We also achieve large improvements in detection accuracy for the top-down tasks of pedestrian and object detection as measured on INRIA [5] and PASCAL [10], respectively. These gains are due to the complementary information provided by sketch tokens to low-level features such as gradient histograms.

self-similarity

The second type of feature used by our method is based on self-similarity. It is well known that contours not only occur at intensity or color edges, but also at texture boundaries. The self-similarity features capture the portions of an image patch that contain similar textures based on color or gradient information. We compute texture information on an $m \times m$ grid over the patch, with $m = 5$ yielding $7 \times 7$ cells for $35 \times 35$ patches. For channel $k$ and grid cells $i$ and $j$, we define the self-similarity feature $f_{ijk}$ as:

$$f_{ijk} = s_{jk} - s_{ik}$$

where $s_{jk}$ is the sum of grid cell $j$ in channel $k$. Since $f_{ijk} = -f_{jik}$ and $f_{iik} = 0$, the total number of self-similarity features per channel is given by $\binom{m \cdot m}{2}$. For $m = 5$ this yields 300 features per channel. For computational efficiency, the sums $s_{jk}$ over an entire image can be computed efficiently by convolving each channel by a box filter of width equal to the cell size.

In summary, we utilize 3 color channels, 3 gradient magnitude channels, and 8 oriented gradient channels for a total of 14 channels. For a $35 \times 35$ patch this gives $35 \cdot 35 \cdot 14 = 17150$ channel features and $300 \cdot 14 = 4200$ self-similarity features, yielding a 21350-dimensional feature vector (the learned model will use only a small subset of these features). Computing the channels given a 640×480 input image takes only a fraction of a second using optimized code from Dollár et al. [7], which is available online.
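The feature counts above can be checked with a short sketch that computes the cell sums and their pairwise differences for a single patch. The function name and the random test patch are hypothetical, and the cell sums are computed by reshaping rather than by the box-filter convolution mentioned in the text.

```python
import numpy as np
from itertools import combinations

def self_similarity_features(patch, m=5):
    """Pairwise differences of grid-cell sums for each channel.

    patch: array of shape (H, W, K). Returns an array of shape (K, C(m*m, 2)).
    """
    h, w, k = patch.shape
    cell_h, cell_w = h // m, w // m
    # sums[c, ch] is the sum of grid cell c in channel ch.
    cells = patch[:m * cell_h, :m * cell_w].reshape(m, cell_h, m, cell_w, k)
    sums = cells.sum(axis=(1, 3)).reshape(m * m, k)
    # One feature per unordered pair of cells (i < j): s_jk - s_ik.
    i_idx, j_idx = zip(*combinations(range(m * m), 2))
    return (sums[list(j_idx)] - sums[list(i_idx)]).T

rng = np.random.default_rng(0)
patch = rng.random((35, 35, 14))   # 14 channels, as in the text
features = self_similarity_features(patch)

n_channel_features = patch.size     # 35 * 35 * 14
n_total = n_channel_features + features.size
print(features.shape, n_channel_features, n_total)
```

For a 35×35 patch with 14 channels this gives 300 self-similarity features per channel and 21350 features in total, matching the counts stated above.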

Classification

Two considerations must be taken into account when choosing a classifier for labeling sketch tokens in image patches. First, every pixel in the image must be labeled, so the classifier must be efficient. Second, the number of potential classes for each patch ranges in the hundreds. In this work we use a random forest classifier, since it is an efficient method for multi-class problems.

A random forest is a collection of decision trees whose results are averaged to produce a final result. We randomly sample 150,000 contour patches (1000 per token class) and 160,000 “no contour” patches (800 per training image) for training each tree. The Gini impurity measure is used to select a feature and decision boundary for each branch node from a randomly selected subset of $\sqrt{F}$ of $F$ possible features. The leaf nodes contain the probabilities of belonging to each class and are typically quite sparse. We use a collection of 25 trees trained until every leaf node is pure or contains fewer than 5 examples. The median depth of the tree is 20 although some branches are substantially deeper.
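To make the split selection concrete, here is a minimal sketch of choosing a feature and threshold at one branch node by Gini impurity. It assumes the random feature subset has size about √F and tries every observed value as a threshold. The function names and the toy data are hypothetical, and this is not the authors' implementation.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(samples, labels, rng):
    """Return (score, feature, threshold) minimising the weighted Gini impurity.

    Assumption: only a random subset of about sqrt(F) of the F features is searched.
    """
    n_features = len(samples[0])
    subset = rng.sample(range(n_features), max(1, math.isqrt(n_features)))
    best = None
    for f in subset:
        for threshold in sorted({row[f] for row in samples}):
            left = [y for row, y in zip(samples, labels) if row[f] < threshold]
            right = [y for row, y in zip(samples, labels) if row[f] >= threshold]
            if not left or not right:
                continue  # a split must send examples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

# Toy data: four identical feature columns, two classes.
samples = [[0.1] * 4, [0.2] * 4, [0.8] * 4, [0.9] * 4]
labels = [0, 0, 1, 1]
score, feature, threshold = best_split(samples, labels, random.Random(0))
print(score, feature, threshold)
```

On this toy data the threshold 0.8 separates the two classes perfectly, so both children are pure. In a full tree the same step is applied recursively to each child until the stopping rule in the text is met.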

Contour detection

We now describe our approach to detecting contours using a top-down approach. Sketch tokens provide an estimate of the local edge structure in a patch. However, contour detection only requires the binary labeling of pixel contours.
We show that computing mid-level sketch tokens provides accurate and efficient predictions of low-level contours. Our random forest classifier predicts the probability that an image patch belongs to each token class or the negative set. Since each token has a contour located at its center, we can compute the probability of a contour at the center pixel using the sum of token probabilities. If $t_{ij}$ is the probability of patch $x_i$ belonging to token $j$, and $t_{i0}$ is the probability of belonging to the “no contour” class, the estimated probability of the patch’s center containing a contour is:

$$e_i = \sum_j t_{ij} = 1 - t_{i0}$$

Once the probability of a contour has been computed at each pixel, a standard non-maximal suppression scheme may be applied to find the peak response of a contour.
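As a small illustration of these two steps, the sketch below sums the token probabilities for a patch and then keeps only local maxima along a single scanline. The function names are hypothetical, and the one-dimensional suppression is a deliberately simplified stand-in for a standard non-maximal suppression scheme, which operates in two dimensions.

```python
def contour_probability(class_probs):
    """Probability of a contour at the patch centre.

    class_probs[0] is the "no contour" class; the rest are token classes.
    """
    return sum(class_probs[1:])

def suppress_non_maxima_1d(values):
    """Zero every value that is not a local maximum along a scanline (simplified)."""
    out = []
    for idx, v in enumerate(values):
        left = values[idx - 1] if idx > 0 else float("-inf")
        right = values[idx + 1] if idx + 1 < len(values) else float("-inf")
        out.append(v if v >= left and v >= right else 0.0)
    return out

# One patch with a "no contour" probability of 0.7 and two token classes.
probs = [0.7, 0.2, 0.1]
e = contour_probability(probs)

# Contour probabilities along one row of pixels (toy values).
row = [0.1, 0.5, 0.3, 0.2, 0.9]
peaks = suppress_non_maxima_1d(row)
print(e, peaks)
```

Because the class probabilities sum to one, summing the token probabilities gives the same value as one minus the "no contour" probability, up to floating-point rounding.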

Structured Forests for Fast Edge Detection

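Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn an accurate and fast edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. The approach learns decision trees by robustly mapping the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains real-time performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets.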

State-of-the-art approaches to edge detection [1, 31, 21] use a variety of features as input, including brightness, color and texture gradients computed over multiple scales. For top accuracy, globalization based on spectral clustering may also be performed.

Since visually salient edges correspond to a variety of visual phenomena, finding a unified approach to edge detection is difficult.

We formulate the problem of edge detection as predicting local segmentation masks given input image patches. Our novel approach to learning decision trees uses structured labels to determine the splitting function at each branch in the tree. The structured labels are robustly mapped to a discrete space on which standard information gain measures may be evaluated. Each forest predicts a patch of edge pixel labels that are aggregated across the image to compute our final edge map, see Figure 1. We show state-of-the-art results on both the BSDS500 and the NYU Depth dataset. We demonstrate the potential of our approach as a general purpose edge detector by showing the strong cross dataset generalization of our learned edge models.
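The aggregation step can be sketched as follows: each predicted patch of edge labels is added into an accumulator at its location, and every pixel is divided by the number of patches that covered it. Averaging is an assumption here, since the text only says the predictions are aggregated. The function name and the toy patches are hypothetical.

```python
import numpy as np

def aggregate_patches(patch_preds, centers, image_shape):
    """Average overlapping patch predictions into a single edge map.

    patch_preds: list of 2-D arrays of edge labels or probabilities.
    centers: list of (row, col) positions of the patch centres.
    """
    h, w = image_shape
    acc = np.zeros((h, w))
    count = np.zeros((h, w))
    for pred, (cy, cx) in zip(patch_preds, centers):
        ph, pw = pred.shape
        top, left = cy - ph // 2, cx - pw // 2
        # Clip the patch to the image bounds.
        y0, y1 = max(top, 0), min(top + ph, h)
        x0, x1 = max(left, 0), min(left + pw, w)
        acc[y0:y1, x0:x1] += pred[y0 - top:y1 - top, x0 - left:x1 - left]
        count[y0:y1, x0:x1] += 1
    # Pixels covered by no patch stay at zero.
    return np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)

# Two overlapping 3x3 predictions on a 4x4 image (toy data).
patches = [np.ones((3, 3)), np.zeros((3, 3))]
centers = [(1, 1), (2, 2)]
edge_map = aggregate_patches(patches, centers, (4, 4))
print(edge_map)
```

Pixels covered by both toy patches receive the mean of the two predictions, which is how overlapping votes smooth the final map.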

Structured learning

Structured learning addresses the problem of learning a mapping where the input or output space may be arbitrarily complex representing strings, sequences, graphs, object pose, bounding boxes etc.

Our structured random forests differ from these works in several respects. First, we assume that only the output space is structured and operate on a standard input space. Second, by default our model can only output examples observed during training (although this can be ameliorated with custom ensemble models). On the other hand, common approaches for structured prediction learn parameters to a scoring function, and to obtain a prediction, an optimization over the output space must be performed. This requires defining a scoring function and an efficient (possibly approximate) optimization procedure. In contrast, inference using our structured random forest is straightforward, general and fast (same as for standard random forests).

Finally, our work was inspired by the recent paper by Kontschieder et al. on learning random forests for structured class labels for the specific case where the output labels represent a semantic image labeling for an image patch. The key observation made by Kontschieder et al. is that given a color image patch, the leaf node reached in a tree is independent of the structured semantic labels, and any type of output can be stored at each leaf. Building on this idea, we propose a general learning framework for structured output forests that can be used with a broad class of output spaces and we apply our framework to learning an accurate and fast edge detector.

edge-preserving smoothing

At a high level, edge-preserving smoothing (EPS) methods can be classified into two groups. The first group consists of the edge-preserving (EP) filters that explicitly compute a filtering output as a weighted average, sometimes in an iterative way.
The second group is based on global optimization formulations. These methods seek a globally optimal solution to an objective function that usually involves a data constraint term and a prior smoothness term.
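As an illustration of the first group, the sketch below implements a one-dimensional bilateral filter, a well-known filter that computes each output as a weighted average. The weights fall off with both spatial distance and intensity difference, so samples across a strong edge contribute almost nothing. The function name and parameter values are chosen for illustration only.

```python
import math

def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_range=0.1):
    """Edge-preserving smoothing of a 1-D signal by an explicit weighted average."""
    out = []
    for i, center in enumerate(signal):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Weight combines spatial closeness and intensity similarity.
            w = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2)
                         - ((signal[j] - center) ** 2) / (2 * sigma_range ** 2))
            total += w * signal[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out

# A noisy step edge (toy data): noise is smoothed, the step is kept.
noisy_step = [0.0, 0.02, -0.01, 1.0, 0.98, 1.01]
smoothed = bilateral_filter_1d(noisy_step)
print(smoothed)
```

The small fluctuations on each side of the step are averaged out while the jump itself stays sharp. The second group of methods would instead obtain the output by minimising a global objective.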
Guided filter
http://blog.youkuaiyun.com/wushanyun1989/article/details/18225259
http://research.microsoft.com/en-us/um/people/kahe/eccv10/index.html

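### About the OpenCV `ximgproc` module

`opencv-contrib-python` is an extension package that provides functionality beyond OpenCV's main modules. Its `ximgproc` module offers many advanced image processing algorithms, typically used for edge detection, edge-aware filtering, and other image enhancement techniques.

The following describes the `ximgproc` module's functionality and a common usage example.

#### Feature overview

The main features of the `ximgproc` module include, but are not limited to:

- **Edge detection and edge-aware processing**: structured forest edge detection, plus related tools such as the Domain Transform filter (an edge-aware filter) and Niblack thresholding (a local thresholding method).
- **Superpixel segmentation**: an implementation of the SLIC (Simple Linear Iterative Clustering) superpixel algorithm.
- **Disparity map filtering**: improves the quality of disparity maps in stereo vision.
- **Guided image filtering**: a guided-filter-based method that smooths an image while preserving edge detail[^1].

#### Usage example

Below is a simple Python example showing how to use `cv2.ximgproc` for edge-aware smoothing with the fast bilateral solver:

```python
import cv2
import numpy as np

# Load the image (cv2.imread returns None if the file cannot be read)
image = cv2.imread('example.jpg')
if image is None:
    raise FileNotFoundError('example.jpg')

# Create a fast bilateral solver filter guided by the image itself.
# Positional arguments: guide, sigma_spatial, sigma_luma, sigma_chroma, lambda.
# sigma_chroma=8 is an assumed value; the original snippet omitted it.
solver = cv2.ximgproc.createFastBilateralSolverFilter(image, 8, 8, 8, 128)

# filter() also expects a single-channel confidence map; use full confidence here
confidence = np.full(image.shape[:2], 255, dtype=np.uint8)
filtered_image = solver.filter(image, confidence)

# Show the results
cv2.imshow('Original', image)
cv2.imshow('Filtered', filtered_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

This snippet shows how to call `createFastBilateralSolverFilter()` to create a fast bilateral solver filter instance and apply it to the input image to obtain an edge-aware smoothed result[^3]. Depending on how OpenCV was built, this filter may be unavailable (it is commonly reported to require a build with Eigen support).

#### Common problems and fixes

If installation fails or certain header files cannot be found, check that the environment variables are set correctly. On Linux, for example, you may need to give the CMake toolchain the path `/usr/lib/x86_64-linux-gnu/cmake/opencv4` so that it can locate the required dependencies[^2].

Also note that for older operating systems such as Ubuntu 10.04, the limitations of the package management system mean it is advisable to upgrade to a newer release before trying to deploy the latest OpenCV and related components[^4].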