Combining RNNs with Neural ODEs

This article describes how Neural ODEs grew out of residual networks and how they use Euler's method to make predictions. By combining the residual-network update with the idea of a differential equation, the model can be trained with constant memory cost. It also discusses using a Neural ODE as a generative model inside a variational autoencoder to generate and interpolate continuous time series.
Intro to Neural ODEs

ResNets

Neural ODEs come from ResNets

As plain networks grew to hundreds of layers deep, their performance started to decrease; deep learning seemed to have reached its limit. ResNets changed this: their residual connections made it possible to train much deeper networks while still reaching state-of-the-art performance.

[Figure: a residual block, with a shortcut connection adding the block's input to its output]

A residual block adds its input directly to its output. This shortcut connection helps because, in the worst case, the block can simply do nothing and pass the input through unchanged.

One final thought. A ResNet can be described by the following equation:

$h_{t+1} = h_{t} + f(h_{t}, \theta_{t})$

h: the value of the hidden state;
t: the index of the layer we are looking at.

The next hidden state is the sum of the current one and a learned function of it, as we have seen.
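As a small illustration (my own sketch, not code from the original post), the update $h_{t+1} = h_{t} + f(h_{t}, \theta_{t})$ can be written as a residual block in PyTorch, with two convolutions standing in for $f$:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: output = input + f(input)."""
    def __init__(self, dim):
        super().__init__()
        # f(h_t, theta_t): two 3x3 convolutions with a ReLU in between
        self.f = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, h):
        # the shortcut connection: h_{t+1} = h_t + f(h_t)
        return h + self.f(h)
```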

An introduction to ResNets can be found in the references.

Euler’s Method

How Neural ODEs work

The equation above looks like something out of calculus. In case you do not remember it from calculus class, Euler's method is the simplest way to approximate the solution of a differential equation with an initial value:

Initial value problem: $y'(t) = f(t, y(t)), \quad y(t_{0}) = y_{0}$

Euler's method: $y_{n+1} = y_{n} + h\, f(t_{n}, y_{n})$

Stepping this update forward gives a numerical approximation of the solution.
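As a quick sketch (the test problem $y' = y$, $y(0) = 1$ is an illustrative choice, not from the post), Euler's method takes only a few lines and can be checked against the exact solution $e^t$:

```python
import math

def euler(f, t0, y0, h, n_steps):
    """Approximate y(t0 + n_steps * h) for the ODE y'(t) = f(t, y), y(t0) = y0."""
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)   # y_{n+1} = y_n + h * f(t_n, y_n)
        t = t + h
    return y

# y' = y with y(0) = 1 has exact solution e^t
approx = euler(lambda t, y: y, t0=0.0, y0=1.0, h=0.01, n_steps=100)
print(approx, math.exp(1.0))   # ~2.7048 vs ~2.7183
```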

Euler's method and the ResNet equation are nearly identical; the only difference is the step size $h$ that multiplies the function. Because of this similarity, we can think of a ResNet as a discretization, with step size 1, of an underlying differential equation.

Instead of going from a differential equation to Euler's method, we can reverse-engineer the problem. Starting from the ResNet update, the corresponding differential equation is

Neural ODE: $\frac{dh(t)}{dt} = f(h(t), t, \theta)$

which describes the dynamics of our model.
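To turn these dynamics into a prediction, the hidden state is integrated from an input time $t_0$ to an output time $t_1$; any ODE solver, Euler's method being the simplest, can approximate the integral:

$h(t_1) = h(t_0) + \int_{t_0}^{t_1} f(h(t), t, \theta)\, dt = \mathrm{ODESolve}(h(t_0), f, t_0, t_1, \theta)$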

The Basics

How a Neural ODE works

A Neural ODE combines two concepts, deep learning and differential equations: a neural network defines the dynamics $f$, and a numerical solver, Euler's method being the simplest, integrates those dynamics to make predictions.
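Here is a minimal sketch of that forward pass, assuming a small fully connected network as the dynamics function and a fixed-step Euler integrator (both are illustrative choices, not the architecture defined later in this post):

```python
import torch
import torch.nn as nn

class Dynamics(nn.Module):
    """A small network f(t, h) that defines dh/dt."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, h):
        # append t as an extra feature so the dynamics can depend on time
        t_col = torch.full_like(h[:, :1], float(t))
        return self.net(torch.cat([h, t_col], dim=1))

def euler_forward(f, h0, t0=0.0, t1=1.0, n_steps=20):
    """Integrate dh/dt = f(t, h) from t0 to t1 with fixed Euler steps."""
    h, t = h0, t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        h = h + dt * f(t, h)
        t = t + dt
    return h

f = Dynamics(dim=2)
h0 = torch.randn(8, 2)      # a batch of 8 input states
h1 = euler_forward(f, h0)   # the model's prediction at t = 1
```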

Q:
  How do we train it?
A:
  With the adjoint method.

This involves running a second numerical solver backwards through time (the continuous analogue of backpropagation) to obtain gradients and update the model's parameters.
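A brief sketch of the idea, following the original Neural ODE paper (Chen et al., 2018): define the adjoint state $a(t) = \partial L / \partial h(t)$. It obeys its own ODE, which is solved backwards from $t_1$ to $t_0$, and the parameter gradients come from one more integral along the way:

$\frac{da(t)}{dt} = -a(t)^{\top} \frac{\partial f(h(t), t, \theta)}{\partial h}, \qquad \frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top} \frac{\partial f(h(t), t, \theta)}{\partial \theta}\, dt$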

  • Defining the architecture:
import torch
import torch.nn as nn

class ODEfunc(nn.Module):
	def __init__(self, dim):
		super(ODEfunc, self).__init__()
		# ConcatConv2d is a 2d convolution that takes the time t as an extra input channel
		self.conv1 = ConcatConv2d(dim, dim, kernel_size=3, padding=1)
		self.norm1 = nn.BatchNorm2d(dim)
		self.conv2 = ConcatConv2d(dim, dim, kernel_size=3, padding=1)
		self.norm2 = nn.BatchNorm2d(dim)
		self.relu = nn.ReLU(inplace=True)
		# number of function evaluations, a proxy for model depth
		self.nfe = 0

	def forward(self, t, x):
		self.nfe += 1
		out = self.conv1(t, x)
		out = self.norm1(out)
		out = self.relu(out)
		out = self.conv2(t, out)
		out = self.norm2(out)
		out = self.relu(out)
		return out
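The ODEfunc above relies on a ConcatConv2d helper that is not defined in this post; it comes from the torchdiffeq MNIST example. A minimal sketch of what such a layer can look like, assuming its job is simply to concatenate the scalar time as one extra input channel before a regular convolution:

```python
import torch
import torch.nn as nn

class ConcatConv2d(nn.Module):
    """Conv2d that also sees the time t as an additional input channel."""
    def __init__(self, dim_in, dim_out, kernel_size=3, stride=1, padding=0):
        super().__init__()
        self._layer = nn.Conv2d(dim_in + 1, dim_out, kernel_size=kernel_size,
                                stride=stride, padding=padding)

    def forward(self, t, x):
        # broadcast the scalar t over the batch and spatial dimensions
        tt = torch.ones_like(x[:, :1, :, :]) * t
        return self._layer(torch.cat([tt, x], dim=1))
```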
  • Defining a neural ODE:
from torchdiffeq import odeint_adjoint

# numerically solves the diffeq, thereby obtaining predictions
class NeuralODE(nn.Module):
	def __init__(self, odefunc):
		super(NeuralODE, self).__init__()
		self.odefunc = odefunc
		self.integration_time = torch.tensor([0, 1]).float()

	# backpropagates with the adjoint method to train the model; this forward pass
	# follows the ODEBlock pattern from the torchdiffeq examples
	def forward(self, x):
		self.integration_time = self.integration_time.type_as(x)
		out = odeint_adjoint(self.odefunc, x, self.integration_time, rtol=1e-3, atol=1e-3)
		# out holds the state at t = 0 and t = 1; the prediction is the state at t = 1
		return out[1]
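A hypothetical end-to-end usage of these pieces (the downsampling layers, the feature width of 64, and the 10-class head are illustrative choices for an MNIST-like task, not taken from the original post):

```python
import torch
import torch.nn as nn

# assemble a classifier: downsample -> ODE block -> classification head
downsample = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
)
model = nn.Sequential(
    downsample,
    NeuralODE(ODEfunc(64)),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

images = torch.randn(16, 1, 28, 28)           # a dummy batch of images
labels = torch.randint(0, 10, (16,))
logits = model(images)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                               # gradients flow back via the adjoint method
```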