ResNets
Neural ODEs come from ResNets
As these models grew to hundreds of layers deep, performance began to degrade; deep learning seemed to have reached its limit, yet state-of-the-art results demanded ever deeper networks. ResNets were introduced to make such deep networks trainable.

A residual block directly adds its input to its output. This shortcut connection improves the model because, at worst, the block can simply learn to do nothing (the identity), so extra layers cannot hurt performance.
One final thought. A ResNet can be described by the following equation:
$$h_{t+1} = h_{t} + f(h_{t}, \theta_{t})$$
h - the value of the hidden state;
t - the index of the layer we are looking at.
The next hidden state is the sum of the input and a function of the input, as we have seen.
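To make the shortcut concrete, here is a minimal sketch of a residual block in PyTorch; the two convolutions inside `f` are an illustrative choice, not prescribed by the text:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: h_{t+1} = h_t + f(h_t)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, h):
        # the shortcut connection: add the input back to the output
        return h + self.f(h)
```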
An introduction to ResNets can be found in the references.
Euler’s Method
How Neural ODEs work
The equation above looks like something from calculus, and in case you don't remember it from calculus class: Euler's method is the simplest way to approximate the solution of a differential equation with an initial value.
$$\text{Initial value problem:}\quad y'(t) = f(t, y(t)),\quad y(t_0) = y_0$$

$$\text{Euler's method:}\quad y_{n+1} = y_n + h\,f(t_n, y_n)$$
Iterating this update gives a numerical approximation of the solution.
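To see the method in action, here is a minimal Python sketch for the toy problem $y' = y$, $y(0) = 1$, whose exact solution at $t = 1$ is $e \approx 2.71828$ (the step size is an arbitrary choice for the example):

```python
def euler(f, t0, y0, h, steps):
    """Approximate the IVP y'(t) = f(t, y(t)), y(t0) = y0."""
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)   # y_{n+1} = y_n + h * f(t_n, y_n)
        t = t + h
    return y

# y' = y, y(0) = 1  ->  y(1) = e
print(euler(lambda t, y: y, 0.0, 1.0, h=0.001, steps=1000))  # ~2.7169
```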
Euler's method and the ResNet equation are nearly identical; the only difference is the step size $h$ that multiplies the function. Because of this similarity, we can think of a ResNet as the discretization of an underlying differential equation.
Instead of going from a differential equation to Euler's method, we can reverse-engineer the problem. Starting from the ResNet update, the resulting differential equation is
$$\text{Neural ODE:}\quad \frac{dh(t)}{dt} = f(h(t), t, \theta)$$
which describes the dynamics of our model.
The Basics
How a Neural ODE works
A Neural ODE combines two concepts, deep learning and differential equations; to make predictions, we can use the simplest of numerical solvers, Euler's method.
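Here is a minimal sketch of such a prediction: integrate learned dynamics `f` with fixed Euler steps (the interval and step count are illustrative):

```python
def neural_ode_forward(f, h0, t0=0.0, t1=1.0, steps=10):
    """Predict by Euler-integrating dh/dt = f(t, h) from t0 to t1."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(t, h)   # one Euler step
        t = t + dt
    return h
```

With a single step of size 1, this reduces exactly to the ResNet update from the first section.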
Q: How do we train it?
A: The adjoint method.
This involves using a second numerical solver to run backwards through time (the continuous analogue of backpropagation) and update the model's parameters.
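As a sketch of what this looks like with the `torchdiffeq` package: `odeint_adjoint` solves the ODE forward, and calling `.backward()` solves a second, adjoint ODE backwards in time, so no intermediate activations need to be stored. The toy dynamics below are purely illustrative:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint

class Dynamics(nn.Module):
    """Toy dynamics f(t, h) with trainable parameters theta."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(2, 2)

    def forward(self, t, h):
        return self.net(h)

f = Dynamics()
h0 = torch.randn(1, 2)               # initial state h(t0)
t = torch.tensor([0.0, 1.0])         # integrate from t=0 to t=1
h1 = odeint_adjoint(f, h0, t)[-1]    # forward solve
h1.sum().backward()                  # backward solve via the adjoint ODE
print(f.net.weight.grad)             # gradients w.r.t. the parameters
```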
- Defining the architecture:
```python
import torch
import torch.nn as nn

class ConcatConv2d(nn.Module):
    """A 2-D convolution that concatenates the time t to the input
    as an extra channel (as in the torchdiffeq examples)."""
    def __init__(self, dim_in, dim_out, kernel_size=3, padding=0):
        super(ConcatConv2d, self).__init__()
        self.conv = nn.Conv2d(dim_in + 1, dim_out,
                              kernel_size=kernel_size, padding=padding)

    def forward(self, t, x):
        tt = torch.ones_like(x[:, :1, :, :]) * t  # broadcast t spatially
        return self.conv(torch.cat([tt, x], dim=1))

class ODEfunc(nn.Module):
    def __init__(self, dim):
        super(ODEfunc, self).__init__()
        self.conv1 = ConcatConv2d(dim, dim, kernel_size=3, padding=1)
        self.norm1 = nn.BatchNorm2d(dim)
        self.conv2 = ConcatConv2d(dim, dim, kernel_size=3, padding=1)
        self.norm2 = nn.BatchNorm2d(dim)
        self.relu = nn.ReLU(inplace=True)
        # number of function evaluations, a proxy for model depth
        self.nfe = 0

    def forward(self, t, x):
        self.nfe += 1
        out = self.conv1(t, x)
        out = self.norm1(out)
        out = self.relu(out)
        out = self.conv2(t, out)
        out = self.norm2(out)
        out = self.relu(out)
        return out
```
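Note the `nfe` counter: the solver, not the architecture, decides how many times to evaluate the function, so the number of function evaluations plays the role that depth plays in a ResNet, and it is worth tracking as a diagnostic.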
- Defining a neural ODE:
```python
from torchdiffeq import odeint_adjoint as odeint

# Numerically solves the ODE, thereby obtaining predictions.
class NeuralODE(nn.Module):
    def __init__(self, odefunc):
        super(NeuralODE, self).__init__()
        self.odefunc = odefunc
        self.integration_time = torch.tensor([0, 1]).float()

    # Backpropagates with the adjoint method to train the model.
    # (The body was truncated in the original; completed here following
    # the standard torchdiffeq example.)
    def forward(self, x):
        self.integration_time = self.integration_time.type_as(x)
        out = odeint(self.odefunc, x, self.integration_time,
                     rtol=1e-3, atol=1e-3)
        return out[1]   # the state at t = 1
```
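Hypothetical usage chaining the two classes above (the batch and feature-map shapes are illustrative):

```python
odeblock = NeuralODE(ODEfunc(dim=64))
x = torch.randn(16, 64, 6, 6)    # a batch of 64-channel feature maps
out = odeblock(x)                # state at t = 1, same shape as x
print(odeblock.odefunc.nfe)      # how many times the solver evaluated f
```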

This article has shown how Neural ODEs evolved from residual networks and use Euler's method to make predictions. By combining the ideas of residual networks and differential equations, models can be trained with constant memory cost. It has also touched on applying Neural ODEs as generative models in variational autoencoders, enabling generation and interpolation of continuous time series.