Introductory CNN papers

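This post introduces the basics of convolutional neural networks (CNNs), including their application to images, speech and time series, and the variable-size convolutional network SDNN. It covers the core ideas of convolutional networks: local receptive fields, shared weights and, sometimes, subsampling. It also covers convolution arithmetic for deep learning, including how the output shape of a convolutional layer follows from the input shape, kernel shape, zero padding and strides, as well as pooling operations such as average pooling and max pooling.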


Convolutional Networks for Images, Speech, and Time-Series

INTRODUCTION
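  1. Multilayer back-propagation networks are well suited to image recognition (see PATTERN RECOGNITION AND NEURAL NETWORKS).
  2. The traditional model of pattern recognition requires:
    • a hand-designed feature extractor, which gathers relevant information from the input image and eliminates irrelevant variabilities
    • a trainable classifier, which categorizes the resulting feature vectors (or strings of symbols)
    • standard, fully-connected multilayer networks can be used as the classifier
  3. Drawbacks of the traditional model:
    • typical inputs have thousands of variables, so a fully-connected layer needs a very large number of weights;
    • the fully-connected architecture ignores the topology of the input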
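
Both drawbacks can be made concrete with a minimal NumPy sketch. The 32x32 image size, the 100 hidden units and the helper name `fully_connected` are illustrative assumptions, not values from the paper. The sketch counts the parameters of one fully-connected layer, then shows that shuffling the pixels with a fixed permutation (and reordering the weight columns the same way) leaves the output unchanged. In other words, the layer has no notion of which pixels are neighbours.

```python
import numpy as np

def fully_connected(x, W, b):
    """Affine layer: one weight per (output unit, input variable) pair."""
    return W @ x + b

rng = np.random.default_rng(0)
height, width, hidden = 32, 32, 100   # illustrative sizes (assumption)
n_inputs = height * width

W = rng.standard_normal((hidden, n_inputs))
b = rng.standard_normal(hidden)
image = rng.standard_normal((height, width))

# Drawback 1: the parameter count grows with (inputs x hidden units).
n_params = W.size + b.size            # 100 * 1024 weights + 100 biases

# Drawback 2: pixel order carries no meaning for the layer.
perm = rng.permutation(n_inputs)      # a fixed shuffle of the pixels
x = image.reshape(-1)
y_original = fully_connected(x, W, b)
y_shuffled = fully_connected(x[perm], W[:, perm], b)
same_output = np.allclose(y_original, y_shuffled)

print(n_params, same_output)
```

Since any fixed ordering of the inputs works equally well, the layer has to learn spatial structure from data instead of having it built in.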
CONVOLUTIONAL NETWORKS
  1. CNNs combine three architectural ideas (to ensure some degree of shift and distortion invariance):
    • local receptive fields
    • shared weights (or weight replication)
    • sometimes, spatial or temporal subsampling
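
The three ideas can be sketched on a 1-D signal. This is a toy illustration under my own assumptions (the kernel values, the signal, the averaging form of subsampling, and the names `local_shared_layer` and `subsample` are all hypothetical), not the architecture from the paper. Each output unit looks at a small window of the input, every unit uses the same weights, and the result is then averaged down to a lower resolution.

```python
import numpy as np

def local_shared_layer(signal, kernel):
    """Each output unit sees only len(kernel) neighbouring inputs (local
    receptive field) and every unit applies the same kernel (shared weights)."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

def subsample(feature_map, factor=2):
    """Average non-overlapping windows to reduce the resolution."""
    usable = len(feature_map) - len(feature_map) % factor
    return feature_map[:usable].reshape(-1, factor).mean(axis=1)

kernel = np.array([-1.0, 0.0, 1.0])   # hypothetical edge-like detector
signal = np.zeros(12)
signal[4:7] = [1.0, 2.0, 1.0]         # a small pattern
shifted = np.roll(signal, 2)          # the same pattern, two steps later

out = local_shared_layer(signal, kernel)
out_shifted = local_shared_layer(shifted, kernel)

# Shifting the input shifts the feature map by the same amount.
print(np.allclose(out_shifted, np.roll(out, 2)))
print(subsample(out))
```

Because the weights are shared, a shifted input produces a shifted feature map. Subsampling then lowers the resolution, which makes the exact position of a feature matter less.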
VARIABLE-SIZE CONVOLUTIONAL NETWORKS, SDNN

A guide to convolution arithmetic for deep learning

Introduction

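The output shape of a convolutional layer depends on the input shape, the choice of kernel shape, zero padding and strides, and the relationship between these properties is not trivial to infer. By contrast, the output size of a fully-connected layer is independent of its input size.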
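
For a single axis, the relationship can be sketched with the commonly used formula o = floor((i + 2p - k) / s) + 1. Here `i`, `k`, `p` and `s` are shorthand for the input size, kernel size, zero padding and stride along that axis. The function name `conv_output_size` is a hypothetical helper, and the example sizes are chosen only for illustration.

```python
def conv_output_size(i, k, p=0, s=1):
    """Output size along one axis of a convolution.

    i: input size, k: kernel size, p: zero padding added on each side,
    s: stride. Assumes the same padding on both sides of the axis.
    """
    if i + 2 * p < k:
        raise ValueError("kernel is larger than the padded input")
    return (i + 2 * p - k) // s + 1

print(conv_output_size(4, 3))            # no padding, unit stride
print(conv_output_size(5, 4, p=2))       # padding enlarges the output
print(conv_output_size(5, 3, s=2))       # stride shrinks the output
print(conv_output_size(6, 3, p=1, s=2))  # floor drops the leftover position
```

The floor in the formula means that two different input sizes can give the same output size when the stride is greater than 1, which is one reason the relationship is easy to get wrong by hand.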

Discrete convolutions
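  1. Neural networks are built mainly on affine transformations: take a vector as input, multiply it by a matrix to get the output vector, and usually add a bias vector. In other words, a linear transformation plus a translation.

    Homogeneous coordinates: an N-dimensional coordinate is represented with N+1 dimensions, so that the translation can be folded into a single matrix multiplication.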
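  2. Whatever their dimensionality, images can be flattened into a vector, but the data has an intrinsic structure:
    • it is stored as a multi-dimensional array
    • it has one or more axes for which ordering matters, such as the width and height axes
    • one axis, the channel axis, gives access to different views of the data (e.g. the R, G and B channels of a color image)
  3. An affine transformation exploits none of these properties: all axes are treated the same way and the topological information is ignored. Enter discrete convolutions ٩(๑❛ᴗ❛๑)۶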
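  4. A discrete convolution is:
    • a linear transformation that preserves this notion of ordering
    • sparse (only a few input units contribute to a given output unit)
    • parameter-reusing (weights are shared: the same weights are applied at different locations)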
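  5. An example of a discrete convolution:
    • The original image data is called the input feature map.
    • The kernel slides across the input feature map; at each position, each kernel element is multiplied by the input element it overlaps, and the products are summed to give the output at that position.
    • The final results are called the output feature maps.
    • Having several input feature maps is common, e.g. the different channels of an image. In that case the kernel is three-dimensional (equivalently, one 2-D kernel per input feature map), and the per-map results are summed elementwise.
    • Several different kernels can be used to produce as many output feature maps as required.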
  6. A collection of kernels is described by its shape, $(n, m, k_1, \ldots, k_N)$:
    • n: number of output feature maps
    • m: number of input feature maps
    • k_j: kernel size along axis j
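
The sliding-kernel procedure and the $(n, m, k_1, k_2)$ kernel shape can be put together in a small NumPy sketch. It assumes N = 2 spatial axes, unit stride and no zero padding, and, as is common in deep learning code, it does not flip the kernel (strictly speaking a cross-correlation). The function name `conv2d` and the all-ones test data are illustrative choices.

```python
import numpy as np

def conv2d(inputs, kernels):
    """Naive multi-channel convolution (unit stride, no padding).

    inputs:  array of shape (m, h, w), i.e. m input feature maps
    kernels: array of shape (n, m, k1, k2), i.e. n kernels, each spanning
             all m input feature maps
    returns: array of shape (n, h - k1 + 1, w - k2 + 1)
    """
    m, h, w = inputs.shape
    n, m_kernel, k1, k2 = kernels.shape
    if m != m_kernel:
        raise ValueError("kernels must span all input feature maps")
    out = np.zeros((n, h - k1 + 1, w - k2 + 1))
    for o in range(n):                      # one output feature map per kernel
        for r in range(h - k1 + 1):         # slide along the height axis
            for c in range(w - k2 + 1):     # slide along the width axis
                patch = inputs[:, r:r + k1, c:c + k2]
                # multiply overlapping elements, sum over channels and window
                out[o, r, c] = np.sum(patch * kernels[o])
    return out

inputs = np.ones((3, 5, 5))       # m = 3 input feature maps of size 5x5
kernels = np.ones((2, 3, 3, 3))   # n = 2, m = 3, k1 = k2 = 3
out = conv2d(inputs, kernels)
print(out.shape, out[0, 0, 0])
```

With all-ones data every output value is simply the number of elements under the kernel (3 x 3 x 3), which gives a quick sanity check on the summation over channels. The explicit loops are for readability only; real libraries use far faster implementations.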