Object Recognition and Scene Understanding(一)概述及HOG特征

本文是Object Recognition and Scene Understanding专题的第一部分,主要介绍基于HOG特征的行人检测。HOG特征是一种局部区域描述符,通过计算梯度方向直方图用于图像识别,特别是行人检测。HOG+SVM的组合在行人检测中表现优秀,涉及图像灰度化、梯度计算、单元格梯度方向投影、块归一化等多个步骤。

写一个简单的专题吧:Object Recognition and Scene Understanding,包括以下三大块内容:

1.Object Recognition from Local Scale-Invariant Features,基于特征的目标识别算法,最具代表性的就是David G. Lowe的SIFT特征。

该部分内容作者已经申请专利,这里就不多做介绍

2. Histograms of Oriented Gradients for Human Detection

基于HOG特征的行人检测

3. A Discriminatively Trained, Multiscale, Deformable Part Model

DPM 目前为止比较好的目标检测算法

 

按照以上框架,尽量利用网络资源,这样可以集合大家的力量,分享这部分内容。

 HOG特征

http://blog.youkuaiyun.com/carson2005/article/details/7782726

 

梯度直方图特征(HOG) 是一种对图像局部重叠区域的密集型描述符,它通过计算局部区域的梯度方向直方图来构成特征。Hog特征结合SVM分类器已经被广泛应用于图像识别中,尤其在行人检测中获得了极大的成功。需要提醒的是,HOG+SVM进行行人检测的方法是法国研究人员Dalal2005CVPR上提出的,而如今虽然有很多行人检测算法不断提出,但基本都是以HOG+SVM的思路为主。

        HOG特征是一种局部区域描述符,它通过计算局部区域上的梯度方向直方图来构成人体特征,能够很好地描述人体的边缘。它对光照变化和小量的偏移不敏感。

图像中像素点(x,y)的梯度为

    Dalal提出的Hog特征提取的过程:把样本图像分割为若干个像素的单元(cell),把梯度方向平均划分为9个区间(bin),在每个单元里面对所有像素的梯度方向在各个方向区间进行直方图统计,得到一个9维的特征向量,每相邻的4个单元构成一个块(block),把一个块内的特征向量联起来得到36维的特征向量,用块对样本图像进行扫描,扫描步长为一个单元。最后将所有块的特征串联起来,就得到了人体的特征。例如,对于64*128的图像而言,每2*2的单元(16*16的像素)构成一个块,每个块内有4*9=36个特征,以8个像素为步长,那么,水平方向将有7个扫描窗口,垂直方向将有15个扫描窗口。也就是说,64*128的图片,总共有36*7*15=3780个特征。

在行人检测过程中,除了上面提到的HOG特征提取过程,还包括彩图转灰度,亮度校正等步骤。总结一下,在行人检测中,HOG特征计算的步骤:

(1)将输入的彩图转换为灰度图;

(2)采用Gamma校正法对输入图像进行颜色空间的标准化(归一化);   目的是调节图像的对比度,降低图像局部的阴影和光照变化所造成的影响,同时可以抑制噪音的干扰;

(3)计算梯度;主要是为了捕获轮廓信息,同时进一步弱化光照的干扰。

(4)将梯度投影到单元的梯度方向;目的是为局部图像区域提供一个编码,

(5)将所有单元格在块上进行归一化;归一化能够更进一步对光照、阴影和边缘进行压缩,通常,每个单元格由多个不同的块共享,但它的归一化是基于不同块的,所以计算结果也不一样。因此,一个单元格的特征会以不同的结果多次出现在最后的向量中。我们将归一化之后的块描述符就称之为HOG描述符。

(6)收集得到检测空间所有块的HOG特征;该步骤就是将检测窗口中所有重叠的块进行HOG特征的收集,并将它们结合成最终的特征向量供分类使用。

 

Convolutional Neural Networks for Image Recognition Abstract: Convolutional Neural Networks (CNNs) have recently shown outstanding performance in many computer vision tasks, especially in image recognition. This paper presents an overview of CNNs, including their architecture, training, and applications. We first introduce the basic concepts of CNNs, including convolution, pooling, and nonlinearity, and then discuss the popular CNN models, such as LeNet-5, AlexNet, VGGNet, and ResNet. The training methods of CNNs, such as backpropagation, dropout, and data augmentation, are also discussed. Finally, we present several applications of CNNs in image recognition, including object detection, face recognition, and scene understanding. Introduction: Image recognition is a fundamental task in computer vision, which aims to classify the objects and scenes in images. Traditional methods for image recognition relied on handcrafted features, such as SIFT, HOG, and LBP, which were then fed into classifiers, such as SVM, boosting, and random forests, to make the final prediction. However, these methods suffered from several limitations, such as the need for manual feature engineering, sensitivity to image variations, and difficulty in handling large-scale datasets. Convolutional Neural Networks (CNNs) have emerged as a powerful technique for image recognition, which can automatically learn the features from raw images and make accurate predictions. CNNs are inspired by the structure and function of the visual cortex in animals, which consists of multiple layers of neurons that are sensitive to different visual features, such as edges, corners, and colors. The neurons in each layer receive inputs from the previous layer and apply a set of learned filters to extract the relevant features. The output of each layer is then fed into the next layer, forming a hierarchical representation of the input image. Architecture of CNNs: The basic building blocks of CNNs are convolutional layers, pooling layers, and fully connected layers. Convolutional layers apply a set of filters to the input image, which convolves the filters with the input to produce a set of activation maps. Each filter is responsible for detecting a specific feature, such as edges, corners, or blobs, at different locations in the input image. Pooling layers reduce the dimensionality of the activation maps by subsampling them, which makes the network more robust to spatial translations and reduces the computational cost. Fully connected layers connect all the neurons in one layer to all the neurons in the next layer, which enables the network to learn complex nonlinear relationships between the features. Training of CNNs: The training of CNNs involves minimizing a loss function, which measures the difference between the predicted labels and the true labels. The most common loss function for image classification is cross-entropy, which penalizes the predicted probabilities that deviate from the true probabilities. Backpropagation is used to compute the gradients of the loss function with respect to the network parameters, which are then updated using optimization algorithms, such as stochastic gradient descent (SGD), Adam, and RMSprop. Dropout is a regularization technique that randomly drops out some neurons during training, which prevents overfitting and improves generalization. Data augmentation is a technique that generates new training examples by applying random transformations to the original images, such as rotation, scaling, and flipping, which increases the diversity and quantity of the training data. Applications of CNNs: CNNs have been successfully applied to various image recognition tasks, including object detection, face recognition, and scene understanding. Object detection aims to localize and classify the objects in images, which is a more challenging task than image classification. The popular object detection frameworks based on CNNs include R-CNN, Fast R-CNN, and Faster R-CNN, which use region proposal methods to generate candidate object locations and then classify them using CNNs. Face recognition aims to identify the individuals in images, which is important for security and surveillance applications. The popular face recognition methods based on CNNs include DeepFace, FaceNet, and VGGFace, which learn the deep features of faces and then use metric learning to compare the similarity between them. Scene understanding aims to interpret the semantic meaning of images, which is important for autonomous driving and robotics applications. The popular scene understanding methods based on CNNs include SegNet, FCN, and DeepLab, which perform pixel-wise classification of the image regions based on the learned features. Conclusion: CNNs have revolutionized the field of computer vision and achieved state-of-the-art performance in many image recognition tasks. The success of CNNs can be attributed to their ability to learn the features from raw images and their hierarchical structure that captures the spatial and semantic relationships between the features. CNNs have also inspired many new research directions, such as deep learning, transfer learning, and adversarial learning, which are expected to further improve the performance and scalability of image recognition systems.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值