Week 12: Deep Learning Supplement: RNN and LSTM


License: CC 4.0 BY-SA

Tags:

#deep-learning #rnn #lstm

Abstract

This week I continued following Professor Hung-yi Lee's course, focusing on recurrent neural networks (RNN) and long short-term memory (LSTM). I studied how they work and how they are formulated mathematically, and also looked at the reasons they are effective.

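1. Recurrent Neural Network

A typical application of RNNs is slot filling: a set of slots is defined, and the relevant parts of a sentence are parsed into the matching slots. For example, with the two slots “Destination” and “Time of Arrival”, the sentence “Arrive Taipei on November 2nd” should yield “Taipei” for “Destination” and “November 2nd” for “Time of Arrival”.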

Destination	Time of Arrival
Taipei	November 2nd

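The words of the sentence first have to be embedded as vectors, for example with 1-of-N encoding or word hashing. A plain feedforward neural network (FNN) then runs into a problem. Take the two sentences “Arrive Taipei on November 2nd” and “Leave Taipei on November 2nd”: the FNN processes “Arrive” or “Leave” first and “Taipei” afterwards, but by the time it sees “Taipei” it retains nothing about the preceding word. The input for “Taipei” is identical in both sentences, so the FNN cannot tell whether “Taipei” is the place of departure or the destination. It has no memory.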
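To make the limitation concrete, here is a minimal sketch of 1-of-N encoding over a toy vocabulary, followed by a memoryless per-word scoring layer. The vocabulary, the helper names `one_hot`, `feedforward` and `tag_sentence`, and the weight values are all assumptions chosen for illustration, not taken from the lecture.

```python
# Toy vocabulary (assumed for illustration).
VOCAB = ["arrive", "leave", "taipei", "on", "november", "2nd"]


def one_hot(word):
    """Return the 1-of-N encoding of `word` over VOCAB."""
    vec = [0] * len(VOCAB)
    vec[VOCAB.index(word.lower())] = 1
    return vec


# Hypothetical weights: one row per slot, one column per vocabulary word.
# Row 0 scores "destination", row 1 scores "place of departure".
WEIGHTS = [
    [0.0, 0.0, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
]


def feedforward(x):
    """A memoryless linear layer: the output depends only on the current word."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]


def tag_sentence(sentence):
    """Score every word independently, as a network without memory would."""
    return [feedforward(one_hot(word)) for word in sentence.split()]


arrive_scores = tag_sentence("Arrive Taipei on November 2nd")
leave_scores = tag_sentence("Leave Taipei on November 2nd")

# "Taipei" is the second word of both sentences and receives identical scores,
# so this model cannot separate a destination from a place of departure.
print(arrive_scores[1] == leave_scores[1])
```

Whatever weights are chosen, the scores for “Taipei” are the same in both sentences, because the preceding word never enters the computation.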

RNN Network Structure

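What distinguishes an RNN is an added temporary storage module: the output of the hidden layer is stored in memory, and the memory is fed in as an additional input together with the next input. Note that the memory must also be given an initial value before the first input is processed.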
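The recurrence described above is commonly written in the following Elman-style form. The symbols $W_x$, $W_h$, $W_y$, $b_h$, $b_y$ and the activation functions $\sigma$ and $g$ are notation assumed here for illustration, not taken from the lecture notes.

$$
h_t = \sigma(W_x x_t + W_h h_{t-1} + b_h), \qquad y_t = g(W_y h_t + b_y)
$$

Here $x_t$ is the input at step $t$, $h_{t-1}$ plays the role of the memory, and $h_0$ is its initial value.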

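Consider a simple RNN in which every weight is 1, there are no biases, and the memory is initialized to 0.

$$
\begin{aligned}
\text{Input sequence: } & \begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}2\\2\end{bmatrix}, \dots \\
\text{Output sequence: } & \begin{bmatrix}4\\4\end{bmatrix}, \begin{bmatrix}12\\12\end{bmatrix}, \begin{bmatrix}32\\32\end{bmatrix}, \dots
\end{aligned}
$$

For the first input $\begin{bmatrix}1\\1\end{bmatrix}$, the hidden layer outputs $\begin{bmatrix}2\\2\end{bmatrix}$: each hidden unit sums the two inputs ($1 + 1$) and the two zero-valued memory cells.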
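The whole sequence can be traced with a short simulation. This is a minimal sketch that assumes two hidden units, two output units, and linear (identity) activations. The text does not state the activation, but a linear one is consistent with the output sequence given. The function name `run_simple_rnn` is hypothetical.

```python
def run_simple_rnn(inputs, hidden_size=2, output_size=2):
    """Run a toy RNN: all weights 1, no bias, linear activations (assumed)."""
    memory = [0] * hidden_size  # the memory starts at zero
    outputs = []
    for x in inputs:
        # With unit weights, each hidden unit sums every input value
        # and every value currently held in memory.
        pre_activation = sum(x) + sum(memory)
        hidden = [pre_activation] * hidden_size
        # Each output unit sums every hidden value.
        outputs.append([sum(hidden)] * output_size)
        # The hidden output is stored and fed back in at the next step.
        memory = hidden
    return outputs


print(run_simple_rnn([[1, 1], [1, 1], [2, 2]]))
# [[4, 4], [12, 12], [32, 32]]
```

Feeding the same vectors in a different order changes the outputs, because the memory carries earlier inputs forward. A feedforward network has no such order dependence.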
