Combine RNN with Neural ODEs

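This article explains how Neural ODEs developed from residual networks and how Euler's method is used to make predictions. By combining residual networks with the idea of differential equations, the model can be trained with constant memory cost. It also explores how a Neural ODE can be used as a generative model inside a variational autoencoder to generate and interpolate continuous time series.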


Intro to Neural ODEs

ResNets

Neural ODEs come from ResNets

As plain networks grew to hundreds of layers deep, their performance decreased, and deep learning seemed to have reached its limit. ResNets were needed to train deeper networks while still reaching state-of-the-art performance.

A residual block also adds its input directly to its output. This shortcut connection improves the model because, at worst, the residual block does nothing.

One final thought. A ResNet can be described by the following equation:

$$h_{t+1} = h_{t} + f(h_{t}, \theta_{t})$$

$h$ - the value of the hidden layer;
$t$ - tells us which layer we are looking at.

The next hidden layer is the sum of the input and a function of the input, as we have seen.
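To make the residual update concrete, here is a minimal pure-Python sketch. The names `residual_stack`, `zero_block`, and `small_block` are made up for illustration and are not from any library; real residual blocks are learned layers, not hand-written functions.

```python
def residual_stack(h, blocks):
    """Apply h_{t+1} = h_t + f_t(h_t) once per block."""
    for f in blocks:
        h = [hi + fi for hi, fi in zip(h, f(h))]
    return h

def zero_block(h):
    # Hypothetical block that has learned nothing: f(h) = 0
    return [0.0 for _ in h]

def small_block(h):
    # Hypothetical block that nudges the state: f(h) = 0.1 * h
    return [0.1 * hi for hi in h]

# With f = 0 the shortcut passes the input through unchanged.
unchanged = residual_stack([1.0, 2.0], [zero_block, zero_block, zero_block])

# Two small blocks: 1.0 -> 1.1 -> 1.21 (up to floating-point rounding)
nudged = residual_stack([1.0], [small_block, small_block])
```

The first call illustrates the "at worst it does nothing" property: when every block outputs zero, the stack reduces to the identity.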

For an introduction to ResNets, see the References.

Euler’s Method

How Neural ODEs work

The equation above looks like something from calculus. In case you don't remember it from calculus class, Euler's method is the simplest way to approximate the solution of a differential equation with a given initial value.

Initial value problem: $$y'(t) = f(t, y(t)), \quad y(t_{0}) = y_{0}$$

Euler's method: $$y_{n+1} = y_{n} + h f(t_{n}, y_{n})$$

Iterating this update gives a numerical approximation of the solution.

Euler's method and the ResNet equation are nearly identical. The only difference is the step size $h$ that multiplies the function (a ResNet implicitly uses a step size of 1). Because of this similarity, we can think of a ResNet as the discretization of an underlying differential equation.
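The correspondence can be checked numerically with a small sketch. The helper `euler_solve` and the test equation `growth` (y' = y, whose exact solution is e^t) are chosen here for illustration only.

```python
import math

def euler_solve(f, t0, y0, h, n_steps):
    """Approximate y(t0 + n_steps * h) for y' = f(t, y), y(t0) = y0."""
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)  # same shape as the ResNet update, scaled by h
        t = t + h
    return y

def growth(t, y):
    # Illustrative right-hand side: y' = y, so y(1) = e when y(0) = 1
    return y

# One step of size 1 mirrors a single residual block: 1 + f(1) = 2
one_block = euler_solve(growth, 0.0, 1.0, 1.0, 1)

# Many small steps approach the exact value e = 2.71828...
fine = euler_solve(growth, 0.0, 1.0, 0.001, 1000)
error = abs(fine - math.e)
```

Shrinking the step size reduces the error, which is the intuition behind replacing a fixed stack of layers with a solver that chooses its own steps.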

Instead of going from a differential equation to Euler's method, we can reverse engineer the problem. Starting from the ResNet, the resulting differential equation is

Neural ODE: $$\frac{dh(t)}{dt} = f(h(t), t, \theta)$$

which describes the dynamics of our model.

The Basics

How a Neural ODE works

A Neural ODE combines two concepts: deep learning and differential equations. To make predictions we can use the simplest numerical method, Euler's method.

Q:
  How do we train it?
A:
  Adjoint method

This involves using another numerical solver to run backwards through time (backpropagation) and updating the model's parameters.

  • Defining the architecture:
import torch
import torch.nn as nn

# ConcatConv2d is not part of PyTorch. It is assumed to be a user-defined layer,
# called as layer(t, x), that feeds the time t to a 2D convolution as an extra input channel.
class ODEfunc(nn.Module):
    def __init__(self, dim):
        super(ODEfunc, self).__init__()
        self.conv1 = ConcatConv2d(dim, dim, kernel_size=3, padding=1)
        self.norm1 = nn.BatchNorm2d(dim)
        self.conv2 = ConcatConv2d(dim, dim, kernel_size=3, padding=1)
        self.norm2 = nn.BatchNorm2d(dim)
        self.relu = nn.ReLU(inplace=True)
        # number of function evaluations, a measure of the effective model depth
        self.nfe = 0

    def forward(self, t, x):
        self.nfe += 1
        out = self.conv1(t, x)
        out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(t, out)  # the time-aware convolution needs t as well
        out = self.norm2(out)
        out = self.relu(out)
        return out
  • Defining a neural ODE:
from torchdiffeq import odeint_adjoint

# Numerically solves the differential equation to obtain predictions
class NeuralODE(nn.Module):
    def __init__(self, odefunc):
        super(NeuralODE, self).__init__()
        self.odefunc = odefunc
        self.integration_time = torch.tensor([0, 1]).float()

    # odeint_adjoint backpropagates with the adjoint method to train the model
    def forward(self, x):
        self.integration_time = self.integration_time.type_as(x)
        out = odeint_adjoint(self.odefunc, x, self.integration_time)
        return out[1]  # state at the final integration time t = 1
  • Combine
input_layers = [nn.Conv2d(1, 64, kernel_size=3), nn.BatchNorm2d(64), nn.ReLU(inplace=True)]
feature_layers = [NeuralODE(ODEfunc(64))]
# flatten (N, 64, 1, 1) to (N, 64) before the linear classifier
output_layers = [nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(), nn.Linear(64, 5)]
neural_ode = nn.Sequential(*input_layers, *feature_layers, *output_layers)

Adjoint Method

How a Neural ODE backpropagates with the adjoint method

The adjoint method is what actually trains the model parameters.

[Figure: reverse-mode differentiation with the adjoint method]
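Since the figure is not reproduced here, the standard adjoint equations from the Neural ODE paper (Chen et al., 2018) are sketched below in the notation of this post. The loss $L$ and the adjoint state $a(t)$ are symbols introduced for this sketch; they do not appear elsewhere in the text.

```latex
% Adjoint state: sensitivity of the loss to the hidden state at time t
a(t) = \frac{\partial L}{\partial h(t)}

% The adjoint obeys its own ODE, solved backwards from t_1 to t_0
\frac{d a(t)}{dt} = -a(t)^{\top} \frac{\partial f(h(t), t, \theta)}{\partial h}

% Parameter gradient, accumulated along the same backward solve
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top} \frac{\partial f(h(t), t, \theta)}{\partial \theta} \, dt
```

Because $h(t)$ is recomputed by the backward solve instead of being stored for every step, the memory cost does not grow with the number of solver steps.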

Model Comparison

We start with a simple machine learning model to showcase the strengths and weaknesses of Neural ODEs.

  • The ResNet model takes less time per epoch; the Neural ODE model takes more.

  • The ResNet model uses more memory; the Neural ODE model uses O(1) memory.

Overall, one of the main benefits is the constant memory usage while training the model. However, this comes at the cost of training time.

Variational Autoencoders

Premise:

Generative model: Able to generate samples like those in training data

A VAE is a directed generative model with observed and latent variables, which gives us a latent space to sample from.

Motivated by the application of interpolating between sentences, I will use a VAE to connect the RNN and the Neural ODE.

VAE Architecture

[Figure: VAE architecture]

VAE Design

When used as an inference model, the input $x$ is passed to the encoder network, producing an approximate posterior $q(z|x)$ over the latent variables.

Sentence prediction by conventional autoencoder

Sentences produced by greedily decoding from points between two sentence encodings with a conventional autoencoder. The intermediate sentences are not plausible English.

VAE Language Model

Words are represented using a learned dictionary of embedding vectors.

VAE sentence interpolation

  • Paths between random points in VAE space
  • Intermediate sentences are grammatical
  • Topic and syntactic structure are consistent

Generative model

As a generative model, the Neural ODE is used within a VAE framework:


  • First, encode the input sequence with a time series model (here, an RNN)
  • Run the embedding through the Neural ODE to get the continuous embedding
  • Recover the initial sequence from the continuous embedding, as in a VAE

VAE as a generative model

Following the variational autoencoder approach, the generative model is defined through a sampling procedure:

$$z_{t_{0}} \sim \mathcal{N}(0, I)$$

$$z_{t_{1}}, z_{t_{2}}, \dots, z_{t_{M}} = \text{ODESolve}(z_{t_{0}}, f, \theta_{f}, t_{0}, \dots, t_{M})$$

$$\text{each } x_{t_{i}} \sim p(x \mid z_{t_{i}}; \theta_{x})$$
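The sampling procedure can be sketched in a few lines of pure Python. Everything here is a stand-in chosen for illustration: `ode_solve` is a fixed-step Euler solver, not a library ODESolve; `rotation` is a hand-written dynamics function where the real model uses a neural network; and `decode` is a toy Gaussian observation model.

```python
import random

def ode_solve(z0, f, times, substeps=20):
    """Fixed-step Euler stand-in for ODESolve; returns z at times[1:]."""
    z, t = list(z0), times[0]
    trajectory = []
    for t_next in times[1:]:
        h = (t_next - t) / substeps
        for _ in range(substeps):
            dz = f(z, t)
            z = [zi + h * dzi for zi, dzi in zip(z, dz)]
            t += h
        trajectory.append(list(z))
    return trajectory

def rotation(z, t):
    # Hypothetical latent dynamics dz/dt = f(z, t): a rotation in the plane
    return [-z[1], z[0]]

def decode(z, sigma_x, rng):
    # Hypothetical observation model: x ~ N(z[0], sigma_x^2)
    return rng.gauss(z[0], sigma_x)

rng = random.Random(0)
z_t0 = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # z_{t_0} ~ N(0, I)
times = [0.0, 0.5, 1.0, 1.5]                       # t_0, ..., t_M
latents = ode_solve(z_t0, rotation, times)          # z_{t_1}, ..., z_{t_M}
samples = [decode(z, 0.1, rng) for z in latents]    # one x_{t_i} per latent
```

Note that the observation times need not be evenly spaced: the solver is simply asked for the latent state at whatever times are listed.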

Training:

  1. Run the RNN encoder through the time series backwards in time to infer the parameters $\mu_{z_{t_{0}}}$, $\sigma_{z_{t_{0}}}$ of the variational posterior, and sample from it:

$$z_{t_{0}} \sim q(z_{t_{0}} \mid x_{t_{0}}, \dots, x_{t_{M}}; t_{0}, \dots, t_{M}; \theta_{q}) = \mathcal{N}(z_{t_{0}} \mid \mu_{z_{t_{0}}}, \sigma_{z_{t_{0}}})$$

  2. Obtain the latent trajectory:

$$z_{t_{1}}, z_{t_{2}}, \dots, z_{t_{M}} = \text{ODESolve}(z_{t_{0}}, f, \theta_{f}, t_{0}, \dots, t_{M}), \quad \text{where } \frac{dz}{dt} = f(z, t; \theta_{f})$$

  3. Map the latent trajectory onto the data space using another neural network: $\hat{x}_{t_{i}}(z_{t_{i}}, t_{i}; \theta_{x})$
  4. Maximize the Evidence Lower Bound (ELBO) estimate for the sampled trajectory:

$$\text{ELBO} \approx N \Big( \sum_{i=0}^{M} \log p(x_{t_{i}} \mid z_{t_{i}}; \theta_{x}) - KL\big(q(z_{t_{0}} \mid x_{t_{0}}, \dots, x_{t_{M}}; t_{0}, \dots, t_{M}; \theta_{q}) \,\|\, \mathcal{N}(0, I)\big) \Big)$$

In the case of a Gaussian likelihood $p(x \mid z_{t_{i}}; \theta_{x})$ with known noise level $\sigma_{x}$:

$$\text{ELBO} \approx -N \Big( \sum_{i=0}^{M} \frac{(x_{t_{i}} - \hat{x}_{t_{i}})^2}{\sigma_{x}^2} - \log \sigma_{z_{t_{0}}}^2 + \mu_{z_{t_{0}}}^2 + \sigma_{z_{t_{0}}}^2 \Big) + C$$
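As a rough sketch, the Gaussian form of the ELBO above can be evaluated for a single trajectory as follows. The function name `gaussian_elbo` is made up for this example. Two simplifications are assumed: the scale $N$ and the constant $C$ are dropped, and the factor 1/2 from the Gaussian log-density is written out explicitly. The posterior is parameterized by its log-variance, matching the encoder output used later.

```python
import math

def gaussian_elbo(xs, x_hats, mu, log_var, sigma_x):
    """ELBO for one trajectory, up to an additive constant and overall scale."""
    # Reconstruction term: squared error scaled by the known noise level
    recon = sum((x - xh) ** 2 for x, xh in zip(xs, x_hats)) / sigma_x ** 2
    # KL term against N(0, I), per latent dimension: mu^2 + sigma^2 - log sigma^2
    kl_terms = sum(m ** 2 + math.exp(lv) - lv for m, lv in zip(mu, log_var))
    return -0.5 * (recon + kl_terms)

# Perfect reconstruction with a standard-normal posterior (2 latent dims)
best = gaussian_elbo([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [0.0, 0.0], 1.0)

# A reconstruction error of 1.0 on the first point lowers the bound
worse = gaussian_elbo([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], [0.0, 0.0], 1.0)
```

In training, the negative of this quantity would be minimized with respect to the encoder, dynamics, and decoder parameters.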

Define Model

import torch
import torch.nn as nn

class RNNEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super(RNNEncoder, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.latent_dim = latent_dim

        # The GRU input is the observation plus one extra feature for time
        self.rnn = nn.GRU(input_dim + 1, hidden_dim)
        self.hid2lat = nn.Linear(hidden_dim, 2 * latent_dim)

    def forward(self, x, t):
        # Concatenate time to the input, encoded as differences between consecutive times
        t = t.clone()
        t[1:] = t[:-1] - t[1:]
        t[0] = 0.
        xt = torch.cat((x, t), dim=-1)
        # Run the GRU over the sequence reversed in time
        _, h0 = self.rnn(xt.flip((0,)))
        # Map the final hidden state to the mean and log-variance of z0
        z0 = self.hid2lat(h0[0])
        z0_mean = z0[:, :self.latent_dim]
        z0_log_var = z0[:, self.latent_dim:]
        return z0_mean, z0_log_var
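# NNODEF (the dynamics network) and a NeuralODE whose forward accepts
# (z0, t, return_whole_sequence=True) are assumed to be defined elsewhere;
# the NeuralODE class shown earlier in this post has a different forward signature.
class NeuralODEDecoder(nn.Module):
    def __init__(self, output_dim, hidden_dim, latent_dim):
        super(NeuralODEDecoder, self).__init__()
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.latent_dim = latent_dim

        func = NNODEF(latent_dim, hidden_dim, time_invariant=True)
        self.ode = NeuralODE(func)
        self.l2h = nn.Linear(latent_dim, hidden_dim)
        self.h2o = nn.Linear(hidden_dim, output_dim)

    def forward(self, z0, t):
        # Latent trajectory at every requested time
        zs = self.ode(z0, t, return_whole_sequence=True)
        # Map each latent state to the data space
        hs = self.l2h(zs)
        xs = self.h2o(hs)
        return xs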
class ODEVAE(nn.Module):
    def __init__(self, output_dim, hidden_dim, latent_dim):
        super(ODEVAE, self).__init__()
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.latent_dim = latent_dim

        self.encoder = RNNEncoder(output_dim, hidden_dim, latent_dim)
        self.decoder = NeuralODEDecoder(output_dim, hidden_dim, latent_dim)

    def forward(self, x, t, MAP=False):
        z_mean, z_log_var = self.encoder(x, t)
        if MAP:
            # Use the posterior mean instead of sampling
            z = z_mean
        else:
            # Reparameterization trick: z = mean + eps * std, with std = exp(0.5 * log_var)
            z = z_mean + torch.randn_like(z_mean) * torch.exp(0.5 * z_log_var)
        x_p = self.decoder(z, t)
        return x_p, z, z_mean, z_log_var

    def generate_with_seed(self, seed_x, t):
        # Encode only the seed prefix, then decode over the full time grid
        seed_t_len = seed_x.shape[0]
        z_mean, z_log_var = self.encoder(seed_x, t[:seed_t_len])
        x_p = self.decoder(z_mean, t)
        return x_p

References:

[1]: Intro to Neural ODEs

[2]: PyTorch Implementation of Differentiable ODE Solvers

[3]: Variational autoencoders

[4]: Variational Autoencoder (VAE): So That's What It Is | Open-Source Code Included (变分自编码器VAE:原来是这么一回事 | 附开源代码)

[5]: VAE Application
