Understanding Deep Learning Papers 3: Flexible, high performance convolutional neural networks for image classification

This is a 2011 paper by Dan C. Ciresan et al. Its main contribution is a fast, fully GPU-based framework for computing CNNs. The speed of the GPU implementation let the authors experiment with CNNs deeper than earlier networks, trained purely with supervised learning.

I originally did not plan to write a summary of this paper, but the networks that have recently achieved strong results on ImageNet are all deployed on GPUs (Caffe, convnet) and train deeper CNNs using purely supervised learning. So I decided to summarize this paper as the opening piece on training deep CNNs on GPUs with supervised learning (Dan was not the first to compute CNNs on a GPU, in my opinion).

1. Introduction

... Despite advances in hardware, computation speed remains a major bottleneck for CNN research. To systematically test the effect of various architectures, this paper presents a fast GPU implementation of CNNs. Previous GPU implementations of CNNs were either hard-coded to satisfy the constraints of GPU hardware or relied on general-purpose libraries, whereas our implementation is flexible and learns weights online. It reduces CNN training time from months to days, which lets us explore a much larger parameter space and study the effect of various architectures.

2. CNN

2.1 Convolutional layer

Parameters of a convolutional (C) layer = f(feature map size, number of maps, filter size, skipping factors, connection table)

With "valid" convolution, the size of the output maps after convolution is given by:

$$M_x^n = \frac{M_x^{n-1} - K_x^n}{S_x^n + 1} + 1, \qquad M_y^n = \frac{M_y^{n-1} - K_y^n}{S_y^n + 1} + 1$$

Here $n$ indexes the layer, $M$ denotes the maps of each layer, with the $x$ and $y$ subscripts giving the map width and height; $K$ is the filter size and $S$ is the skipping factor, i.e., the number of pixels skipped between adjacent filter applications. (This formula does not seem to hold universally: when I applied it to some later papers, the map sizes it gives disagree with theirs. I suspect the cause is a different definition of stride, or pixel padding.)
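To make the stride convention concrete, here is a minimal sketch (my own, not the paper's code) comparing the paper's formula, where $S$ counts skipped pixels, with the now-common formula, where the stride is the step itself. This is likely the source of the mismatch noted above:

```python
def map_size_paper(m_prev: int, k: int, s: int) -> int:
    """Output map side length per the paper's formula, where s is the
    skipping factor: the number of pixels skipped between adjacent
    filter applications (effective stride = s + 1)."""
    return (m_prev - k) // (s + 1) + 1

def map_size_common(m_prev: int, k: int, stride: int) -> int:
    """The now-common 'valid' (no padding) convolution formula, where
    stride is the step between adjacent filter applications."""
    return (m_prev - k) // stride + 1

# With s = 0 (no pixels skipped) the two conventions agree:
assert map_size_paper(28, 5, 0) == map_size_common(28, 5, 1) == 24
# The paper's s = 1 corresponds to the common stride = 2:
assert map_size_paper(28, 4, 1) == map_size_common(28, 4, 2) == 13
```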

2.2 Max-pooling layer

Compared with mean-pooling, max-pooling converges faster, selects better invariant features, and improves generalization. Max-pooling provides invariance over larger regions, downsampling the feature maps by factors of Kx and Ky along each axis.
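As an illustration (a sketch of my own, not the paper's code), non-overlapping max-pooling over Kx-by-Ky blocks takes only a few lines of NumPy, assuming the map dimensions are divisible by the pooling factors:

```python
import numpy as np

def max_pool(feature_map: np.ndarray, kx: int, ky: int) -> np.ndarray:
    """Non-overlapping max-pooling: downsample an (H, W) map by the
    factors ky (rows) and kx (columns), keeping each block's maximum.
    Assumes H is divisible by ky and W is divisible by kx."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // ky, ky, w // kx, kx)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fmap, kx=2, ky=2))
# [[ 5.  7.]
#  [13. 15.]]
```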

2.3 Classification layer

Filter sizes and max-pooling rectangles must be chosen so that either the output maps of the last convolutional layer are downsampled to 1 pixel per map, or a fully connected layer combines the last convolutional layer's outputs into a 1D feature vector. The top layer is always fully connected, with one output unit per class label.
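As a sanity check of this design rule, the sketch below chains the layer size formulas through a hypothetical 29x29-input net (the layer sizes are my own example, not necessarily the paper's exact architecture) and verifies that the final maps are 1x1, ready to feed the fully connected classification layer:

```python
def final_map_size(input_size: int, layers) -> int:
    """Chain square 'valid' stride-1 convolutions ('C', kernel) and
    non-overlapping max-pooling layers ('MP', kernel); return the side
    length of the last layer's square output maps."""
    size = input_size
    for kind, k in layers:
        if kind == "C":
            size = size - k + 1      # valid convolution, stride 1
        elif kind == "MP":
            assert size % k == 0, "map size not divisible by pooling size"
            size //= k               # non-overlapping max-pooling
    return size

# Hypothetical net whose kernel sizes were picked so the last
# convolutional maps shrink to exactly 1x1:
net = [("C", 4), ("MP", 2), ("C", 5), ("MP", 3), ("C", 3)]
assert final_map_size(29, net) == 1
```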
