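Summary
This article examines 3D convolution from the perspective of data dimensions and uses concrete examples to explain the principles and basic methods of object localization and feature point detection. Both are tasks on a single object. Starting from an image classification network, object localization adds the object's center coordinates, width, and height to the output, and feature point detection adds the coordinates of the feature points (points of interest). The article also introduces object detection based on sliding windows: a window of a fixed size moves over the image with a fixed stride, from left to right and top to bottom; each crop is passed to a convolutional neural network, which computes the prediction for that window. On the original network this approach is inefficient. The article then analyzes the cause: the overlapping regions of different windows go through the same convolutions again and again, so detecting each window independently produces a large amount of repeated computation. To address this, the article describes a convolutional implementation of sliding windows. It replaces the fully connected layers of the original network with corresponding convolutional layers, so that convolution takes the place of the fully connected operation and the network can handle inputs of different sizes. At detection time the whole image is fed through the modified network once, which yields the predictions for all sliding windows in a single pass, and the relative positions of the outputs reflect the positions of the original windows. Because the convolution results are shared across windows, repeated computation is avoided and efficiency improves.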
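The equivalence between running a network on every window and running a "convolutionalized" network once on the whole image can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the network discussed in this article: it uses a single 3x3 convolution with ReLU followed by one fully connected output, has no pooling (so the effective window stride is 1), and all function names (`conv2d_valid`, `window_score`, `sliding_window_scores`, `convolutional_scores`) are hypothetical helpers written in plain Python.

```python
import random

def conv2d_valid(image, kernel):
    """Valid cross-correlation of a 2D list with a 2D kernel, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU on a 2D list."""
    return [[max(0.0, v) for v in row] for row in fmap]

def window_score(window, conv_kernel, fc_weights):
    """Toy 'original' network: conv + ReLU, then one fully connected output."""
    fmap = relu(conv2d_valid(window, conv_kernel))
    flat = [v for row in fmap for v in row]  # row-major flatten
    return sum(v * w for v, w in zip(flat, fc_weights))

def sliding_window_scores(image, win, conv_kernel, fc_weights):
    """Crop every win x win window (stride 1) and run the network on each crop."""
    n_rows = len(image) - win + 1
    n_cols = len(image[0]) - win + 1
    return [
        [
            window_score([row[j:j + win] for row in image[i:i + win]],
                         conv_kernel, fc_weights)
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]

def convolutional_scores(image, conv_kernel, fc_weights, fc_side):
    """Same weights, but the FC layer is reshaped into an fc_side x fc_side
    convolution kernel and the whole image is processed in one pass."""
    fc_kernel = [fc_weights[r * fc_side:(r + 1) * fc_side] for r in range(fc_side)]
    fmap = relu(conv2d_valid(image, conv_kernel))  # computed once, shared by all windows
    return conv2d_valid(fmap, fc_kernel)

random.seed(0)
WIN = 4  # window size the toy network was "trained" on (assumed)
conv_kernel = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
fc_weights = [random.uniform(-1, 1) for _ in range(4)]  # 4x4 window -> 2x2 feature map
image = [[random.uniform(0, 1) for _ in range(6)] for _ in range(6)]

slow = sliding_window_scores(image, WIN, conv_kernel, fc_weights)
fast = convolutional_scores(image, conv_kernel, fc_weights, fc_side=2)

max_diff = max(abs(s - f) for rs, rf in zip(slow, fast) for s, f in zip(rs, rf))
print(len(fast), "x", len(fast[0]), "score map, max difference:", max_diff)
```

In this sketch each entry of `fast` corresponds to the window whose top-left corner sits at the same row and column in the image, which is the sense in which the output positions mirror the window positions. The sliding version recomputes the first convolution for every crop, while the convolutional version computes it once for the whole image.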
Abstract
This article examines 3D convolution from the perspective of data dimensions and uses concrete examples to illustrate the principles and basic methods of object localization and feature point detection. Both tasks operate on a single object. Building on an image classification network, object localization adds the object's center coordinates, width, and height to the output, while feature point detection adds the coordinates of the feature points (points of interest). The article also introduces a sliding-window object detection algorithm: a window of fixed size moves across the image with a fixed stride, from left to right and top to bottom, and each crop is fed to the convolutional neural network, which computes the prediction for that window. Run on the original network, this method is inefficient. The article analyzes why: the overlapping regions of different windows are convolved repeatedly, so detecting each window independently leads to a large amount of redundant computation. To address this issue, the article presents a convolutional implementation of sliding windows. It replaces the fully connected layers of the original network with corresponding convolutional layers, which makes the network more flexible because it can then handle inputs of different sizes. During detection,