CRNN Paper Walkthrough and Reproduction

This article gives a detailed walkthrough of CRNN, a network architecture that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). The architecture is designed for sequence-like objects of varying length and is particularly suited to image-based recognition tasks such as text recognition. We cover the concrete network structure, including the design details of the stacked convolutional layers, pooling layers, and bidirectional recurrent layers, and explain how a conditional-probability (transcription) layer maps per-frame predictions to a label sequence.


一、Network Architecture

1. Convolutional Layers

    Input image: 100x32 (grayscale)

    Conv0: 100x32x1 * 3x3x1x64 ——> 100x32x64 (maps:64 k:3x3 s:1 p:1)

    Maxpool0: 100x32x64 ——> 50x16x64 (window:2x2 s:2)

    Conv1: 50x16x64 * 3x3x64x128 ——> 50x16x128 (maps:128 k:3x3 s:1 p:1)

    Maxpool1: 50x16x128 ——> 25x8x128 (window:2x2 s:2)

    Conv2: 25x8x128 * 3x3x128x256 ——> 25x8x256 (maps:256 k:3x3 s:1 p:1)

    Conv3: 25x8x256 * 3x3x256x256 ——> 25x8x256 (maps:256 k:3x3 s:1 p:1)

    Maxpool2: 25x8x256 ——> 13x4x256 (window:1x2 s:2)

    Conv4: 13x4x256 * 3x3x256x512 ——> 13x4x512 (maps:512 k:3x3 s:1 p:1)

    BatchNormalization

    Conv5: 13x4x512 * 3x3x512x512 ——> 13x4x512 (maps:512 k:3x3 s:1 p:1)

    BatchNormalization

    Maxpool3: 13x4x512 ——> 7x2x512 (window:1x2 s:2)

    Conv6: 7x2x512 * 2x2x512x512 ——> 6x1x512 (maps:512 k:2x2 s:1 p:0)

    Map-to-Sequence: 6x1x512 ——> 6x512
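
The layer table above maps almost line-for-line onto PyTorch. Below is a minimal sketch, assuming NCHW tensors, a fixed 100x32 grayscale input, and a ReLU after every convolution (the table omits activations); note that the table's 1x2 pooling window (width x height) becomes kernel_size=(2, 1) in PyTorch's (height, width) order:

    import torch
    import torch.nn as nn

    class CRNNBackbone(nn.Module):
        def __init__(self):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 64, 3, 1, 1), nn.ReLU(True),                         # Conv0
                nn.MaxPool2d(2, 2),                                               # Maxpool0 -> 64x16x50
                nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU(True),                       # Conv1
                nn.MaxPool2d(2, 2),                                               # Maxpool1 -> 128x8x25
                nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(True),                      # Conv2
                nn.Conv2d(256, 256, 3, 1, 1), nn.ReLU(True),                      # Conv3
                nn.MaxPool2d((2, 1), 2),                                          # Maxpool2 (1x2 window) -> 256x4x13
                nn.Conv2d(256, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(True), # Conv4 + BN
                nn.Conv2d(512, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(True), # Conv5 + BN
                nn.MaxPool2d((2, 1), 2),                                          # Maxpool3 -> 512x2x7
                nn.Conv2d(512, 512, 2, 1, 0), nn.ReLU(True),                      # Conv6 -> 512x1x6
            )

        def forward(self, x):              # x: (B, 1, 32, 100)
            f = self.cnn(x)                # (B, 512, 1, 6)
            f = f.squeeze(2)               # (B, 512, 6) -- drop the height-1 axis
            return f.permute(2, 0, 1)      # (T=6, B, 512): Map-to-Sequence

    print(CRNNBackbone()(torch.randn(2, 1, 32, 100)).shape)  # torch.Size([6, 2, 512])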

2. Recurrent Layers

   Bidirectional-LSTM0: 6x512 * 512x256 ——> 6x256

   Bidirectional-LSTM1: 6x256 * 256x5530 ——> 6x5530
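
A sketch of one such recurrent block, in the style of the widely used crnn.pytorch implementation: each bidirectional LSTM has 256 hidden units per direction, and a linear projection merges the two directions back to the listed output size. The projection layer is an assumption consistent with the 512x256 and 256x5530 weight shapes above, and 5530 is taken to be the alphabet size (including the CTC blank).

    import torch
    import torch.nn as nn

    class BidirectionalLSTM(nn.Module):
        def __init__(self, n_in, n_hidden, n_out):
            super().__init__()
            self.rnn = nn.LSTM(n_in, n_hidden, bidirectional=True)
            self.embedding = nn.Linear(n_hidden * 2, n_out)   # merge both directions

        def forward(self, x):              # x: (T, B, n_in)
            recurrent, _ = self.rnn(x)     # (T, B, 2 * n_hidden)
            T, B, H = recurrent.shape
            return self.embedding(recurrent.reshape(T * B, H)).reshape(T, B, -1)

    rnn_head = nn.Sequential(
        BidirectionalLSTM(512, 256, 256),    # Bidirectional-LSTM0: 6x512 -> 6x256
        BidirectionalLSTM(256, 256, 5530),   # Bidirectional-LSTM1: 6x256 -> 6x5530
    )
    print(rnn_head(torch.randn(6, 2, 512)).shape)  # torch.Size([6, 2, 5530])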

二、Details

1. Convolutional Layers

1) Before being fed into the network, all images need to be scaled to the same height. In the 3rd and 4th max-pooling layers, we adopt 1x2 sized rectangular pooling windows instead of the conventional square ones. This tweak yields feature maps with larger width, and hence a longer feature sequence.
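
A hypothetical preprocessing helper for point 1): scale every image to height 32 and normalize it to a single-channel tensor. The fixed width of 100 is an extra assumption that makes batching easy; the method itself only requires a fixed height.

    from PIL import Image
    import torchvision.transforms.functional as TF

    def preprocess(path, target_h=32, target_w=100):
        img = Image.open(path).convert("L")                 # force grayscale
        img = img.resize((target_w, target_h), Image.BILINEAR)
        t = TF.to_tensor(img)                               # (1, 32, 100), values in [0, 1]
        return t.sub_(0.5).div_(0.5)                        # normalize to [-1, 1]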

2) Each feature vector of the feature sequence is generated from left to right on the feature maps, column by column. This means the i-th feature vector is the concatenation of the i-th columns of all the maps. Because of translation invariance, each column of the feature maps corresponds to a rectangular region of the original image (termed its receptive field).
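
Point 2) in tensor terms: a tiny check, assuming a (C, H, W) feature map shaped like the backbone output above, that the i-th frame really is the stacked i-th column of all maps.

    import torch

    fmap = torch.randn(512, 1, 6)                       # (C, H, W) after Conv6
    frames = [fmap[:, :, i].flatten() for i in range(fmap.size(2))]
    seq = torch.stack(frames)                           # (6, 512) feature sequence
    assert torch.equal(seq, fmap.squeeze(1).t())        # same as the squeeze/permute route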

3) Compared to a plain CNN, CRNN removes the fully connected layers and converts deep features into sequential representations, so that the model is invariant to the length variation of sequence-like objects.

2. Recurrent Layers

1) A deep bidirectional recurrent neural network is built on top of the convolutional layers as the recurrent layers. The recurrent layers predict a label distribution y_t for each frame x_t in the feature sequence (see the sketch after this list).

2) An RNN has a strong capability of capturing contextual information within a sequence. Some ambiguous characters are easier to distinguish when their context is observed; for example, "il" is easier to recognize by contrasting the character heights than by recognizing each character separately.

3) An RNN can back-propagate error differentials to its input, i.e. the convolutional layers, allowing us to jointly train the recurrent and convolutional layers in a unified network.

4) An RNN is able to operate on sequences of arbitrary length, traversing from start to end.
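
To make point 1) concrete: the distribution y_t is simply a softmax over the class axis of the recurrent head's output. A sketch with the shapes from the table above:

    import torch

    logits = torch.randn(6, 2, 5530)        # (T, B, nclass) from the BiLSTM head
    y = logits.log_softmax(dim=-1)          # per-frame label (log-)distributions y_t
    print(y[0, 0].exp().sum())              # each frame's distribution sums to 1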

3. Transcription Layers

We adopt the conditional probability defined by the CTC (Connectionist Temporal Classification) layer. Let y = y_1, ..., y_T be the per-frame distributions, and let B be the mapping that removes repeated labels and blanks from a per-frame path π. The conditional probability of a label sequence l is then defined as the sum of the probabilities of all paths π that B maps onto l:

p(l | y) = \sum_{\pi : B(\pi) = l} p(\pi | y), where p(\pi | y) = \prod_{t=1}^{T} y^t_{\pi_t}
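
PyTorch ships this objective as torch.nn.CTCLoss, which computes exactly this sum over alignment paths with dynamic programming. A minimal sketch with placeholder labels; the alphabet size and blank index follow the assumptions above.

    import torch
    import torch.nn as nn

    T, B, NCLASS = 6, 2, 5530                              # frames, batch, alphabet (index 0 = blank)
    log_probs = torch.randn(T, B, NCLASS, requires_grad=True).log_softmax(-1)

    targets = torch.tensor([3, 8, 42, 7, 7])               # two label sequences, concatenated
    target_lengths = torch.tensor([3, 2])                  # "3 8 42" and "7 7"
    input_lengths = torch.full((B,), T, dtype=torch.long)  # all T frames are valid

    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()   # error differentials flow back through the recurrent and conv layers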

 
