cs224n RNNs and Language Models (The Vanishing Gradient Issue)

大模型与Agent智能体

Published 2019-01-01 10:31:36

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/duan_zhihua/article/details/85523581

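This post looks at the vanishing gradient problem raised in the cs224n lecture on RNNs and language models. It compares sigmoid and ReLU activations in a small network with two hidden layers to show how they differ in gradient propagation. With sigmoid, the gradients that reach the earlier layers become very small, so those layers learn slowly. ReLU largely avoids this problem and converges faster.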
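The difference comes from the chain rule. Backpropagating through a layer multiplies the gradient by that layer's weight and the local derivative of its activation. The sigmoid derivative σ(z)(1 − σ(z)) is never larger than 0.25, while the ReLU derivative is exactly 1 for positive inputs. The sketch below is not part of Karpathy's demo and makes a deliberately simplified assumption: a chain of scalar layers that all share the same weight `w` and pre-activation `z`. The helper names (`sigmoid_grad`, `relu_grad`, `gradient_scale`) are hypothetical and exist only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s * (1 - s); it never exceeds 0.25 (maximum at z = 0)
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 where z > 0, 0 elsewhere
    return np.where(z > 0, 1.0, 0.0)

def gradient_scale(local_grad, depth, z=0.0, w=1.0):
    """Factor applied to a gradient after backpropagating through `depth`
    scalar layers h = f(w * h_prev).

    Simplifying assumption (hypothetical setup): every layer has the same
    weight w and pre-activation z, so the chain rule multiplies the
    gradient by the same factor w * f'(z) at each layer.
    """
    scale = 1.0
    for _ in range(depth):
        scale *= w * float(local_grad(z))
    return scale

if __name__ == "__main__":
    for depth in (1, 2, 5, 10):
        print(depth,
              gradient_scale(sigmoid_grad, depth),       # equals 0.25 ** depth at z = 0
              gradient_scale(relu_grad, depth, z=1.0))   # stays 1.0 for z > 0
```

Under these assumptions, the sigmoid factor after 10 layers is 0.25^10 ≈ 9.5e-7, even in its best case (z = 0). ReLU keeps the factor at 1.0, but only for active units. For z ≤ 0 it passes back no gradient at all, which is the "dead ReLU" effect. Real networks also have weights other than 1, and these can partly offset or amplify these factors.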

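The cs224n lecture on RNNs and language models mentions the vanishing gradient problem. The demo below compares a simple neural network with two hidden layers under two activation functions, sigmoid and ReLU. Andrej Karpathy originally built it as a small-network demonstration for cs231n. The code is as follows: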

# -*- coding: utf-8 -*-
# Setup
import numpy as np
import matplotlib.pyplot as plt

#%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
#%load_ext autoreload
#%autoreload 2


# generate random data that is not linearly separable
np.random.seed(0)
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D))
num