Basic Ideas
Why use SDE solvers for GPs?
- The $O(n^3)$ computational complexity of standard GP regression is a challenge.
- What do we get:
  - $O(n)$ state-space methods for SDEs/SPDEs.
  - Sparse approximations developed for SPDEs.
  - Reduced-rank Fourier/basis function approximations.
  - A path to non-Gaussian processes.
- Downsides:
  - We often need to approximate.
  - The mathematics can become messy.
Stochastic differential equations and Gaussian processes
Ornstein-Uhlenbeck process
The mean and covariance functions:
$$
\begin{aligned}
m(x) &= 0 \\
k(x, x') &= \sigma^{2} \exp\left(-\lambda |x - x'|\right)
\end{aligned}
$$
This has a path representation as a stochastic differential equation (SDE):
$$
\frac{d f(t)}{d t} = -\lambda f(t) + w(t)
$$
where $w(t)$ is a white noise process with spectral density $q = 2\lambda\sigma^2$, and $x$ has been relabeled as $t$.
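To see the path representation at work, the sketch below simulates the SDE with an Euler-Maruyama scheme. It assumes the white noise has spectral density $q = 2\lambda\sigma^2$, so each time step of length $\Delta t$ adds Gaussian noise of variance $q\,\Delta t$. The helper names `simulate_ou_em` and `empirical_cov`, and all parameter values, are illustrative choices rather than anything from the notes. Over a long path the empirical covariance at lag $\tau$ should settle near $\sigma^2 e^{-\lambda\tau}$.

```python
import math
import random


def simulate_ou_em(lam, sigma2, dt, n_steps, seed=0):
    """Simulate df = -lam * f dt + dB with the Euler-Maruyama scheme.

    Assumption: the Brownian increment dB has variance q * dt with
    q = 2 * lam * sigma2, which makes the stationary variance sigma2.
    """
    rng = random.Random(seed)
    q = 2.0 * lam * sigma2
    step_sd = math.sqrt(q * dt)
    f = rng.gauss(0.0, math.sqrt(sigma2))  # start in the stationary distribution
    path = [f]
    for _ in range(n_steps):
        f = f - lam * f * dt + rng.gauss(0.0, step_sd)
        path.append(f)
    return path


def empirical_cov(path, lag):
    """Sample covariance between path[k] and path[k + lag] for a zero-mean process."""
    n = len(path) - lag
    return sum(path[k] * path[k + lag] for k in range(n)) / n


if __name__ == "__main__":
    lam, sigma2, dt = 1.0, 2.0, 0.01
    path = simulate_ou_em(lam, sigma2, dt, n_steps=200_000)
    for lag in (0, 50, 100):
        tau = lag * dt
        print(f"tau={tau:.2f}  empirical={empirical_cov(path, lag):.3f}  "
              f"k(tau)={sigma2 * math.exp(-lam * tau):.3f}")
```

Euler-Maruyama is only first-order accurate in $\Delta t$; it is used here because it mirrors the SDE term by term, not because it is the best way to sample this process.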
Proof (Fourier transform the SDE, then invert the spectral density):
$$
\begin{aligned}
\text{FT:} \quad (i\omega)\,\hat{f}(\omega) &= -\lambda \hat{f}(\omega) + \hat{w}(\omega) \\
\hat{f}(\omega) &= \frac{\hat{w}(\omega)}{\lambda + i\omega} \\
\text{Spectral density:} \quad S(\omega) &= \frac{\mathrm{E}\left[|\hat{w}(\omega)|^{2}\right]}{\omega^{2}+\lambda^{2}} = \frac{q}{\omega^{2}+\lambda^{2}} \\
\text{IFT:} \quad h(\tau) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{q}{\omega^{2}+\lambda^{2}} \exp(i\omega\tau)\, d\omega = \frac{q}{2\lambda} \exp(-\lambda|\tau|)
\end{aligned}
$$
which is the covariance $k(x, x')$ with $\tau = x - x'$ and $\sigma^2 = q/(2\lambda)$.
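A quick numerical sanity check of this Fourier pair: with the inverse transform written as $\frac{1}{2\pi}\int S(\omega) e^{i\omega\tau}\,d\omega$, the forward transform $\int k(\tau) e^{-i\omega\tau}\,d\tau$ of $k(\tau) = \sigma^2 e^{-\lambda|\tau|}$ should reproduce $S(\omega) = q/(\omega^2+\lambda^2)$ with $q = 2\lambda\sigma^2$. This is a sketch using a plain midpoint rule on a truncated interval; the function names, truncation length `t_max` and grid size `n` are arbitrary illustrative choices.

```python
import math


def ou_spectral_density(omega, lam, sigma2):
    """Closed form S(omega) = q / (omega^2 + lam^2) with q = 2 * lam * sigma2."""
    q = 2.0 * lam * sigma2
    return q / (omega ** 2 + lam ** 2)


def numerical_fourier_transform(omega, lam, sigma2, t_max=40.0, n=200_000):
    """Approximate the integral of sigma2 * exp(-lam*|tau|) * exp(-i*omega*tau) over tau.

    The covariance is even, so the imaginary part vanishes and the integral
    equals 2 * int_0^inf sigma2 * exp(-lam*tau) * cos(omega*tau) dtau.
    The tail beyond t_max is dropped; it is of order exp(-lam * t_max).
    """
    h = t_max / n
    total = 0.0
    for j in range(n):
        tau = (j + 0.5) * h  # midpoint rule
        total += math.exp(-lam * tau) * math.cos(omega * tau)
    return 2.0 * sigma2 * total * h


if __name__ == "__main__":
    lam, sigma2 = 1.5, 0.8
    for omega in (0.0, 1.0, 3.0):
        print(omega,
              numerical_fourier_transform(omega, lam, sigma2),
              ou_spectral_density(omega, lam, sigma2))
```

The forward direction is checked instead of the inverse integral because $k(\tau)$ decays exponentially, so the truncated integral converges quickly, whereas $q/(\omega^2+\lambda^2)$ only decays like $\omega^{-2}$.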
Consider a Gaussian process regression problem:
$$
\begin{aligned}
f(x) &\sim \mathrm{GP}\left(0, \sigma^{2} \exp\left(-\lambda |x - x'|\right)\right) \\
y_{k} &= f(x_{k}) + \varepsilon_{k}
\end{aligned}
$$
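For reference, the direct way to solve this regression problem forms the $n \times n$ matrix $K_{ij} = k(x_i, x_j)$ and solves $(K + \sigma_n^2 I)\alpha = y$, which costs $O(n^3)$. The sketch assumes i.i.d. noise $\varepsilon_k \sim \mathcal{N}(0, \sigma_n^2)$ (the notes do not name the noise variance; it is the hypothetical parameter `noise_var` here) and uses a hand-written Cholesky factorization so it needs only the standard library. In practice a library routine such as `numpy.linalg.cholesky` would be used. `ou_kernel` and `gp_posterior_mean` are illustrative names.

```python
import math


def ou_kernel(x1, x2, lam, sigma2):
    """Ornstein-Uhlenbeck (exponential) covariance sigma2 * exp(-lam * |x1 - x2|)."""
    return sigma2 * math.exp(-lam * abs(x1 - x2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite list of lists."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solve (L L^T) x = b by forward substitution, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior_mean(x_train, y, x_test, lam, sigma2, noise_var):
    """Posterior mean k_*^T (K + noise_var I)^{-1} y; the Cholesky step is O(n^3)."""
    n = len(x_train)
    k = [[ou_kernel(x_train[i], x_train[j], lam, sigma2) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(cholesky(k), y)
    return [sum(ou_kernel(xs, x_train[i], lam, sigma2) * alpha[i] for i in range(n))
            for xs in x_test]


if __name__ == "__main__":
    x = [0.0, 0.4, 1.1, 2.0, 2.3]
    y = [0.3, 0.1, -0.5, 0.8, 1.0]
    print(gp_posterior_mean(x, y, [0.0, 1.0, 2.5], lam=1.0, sigma2=1.0, noise_var=0.1))
```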
this is equivalent to the state-space model:
$$
\begin{aligned}
\frac{d f(t)}{d t} &= -\lambda f(t) + w(t) \\
y_{k} &= f(t_{k}) + \varepsilon_{k}
\end{aligned}
$$
that is, with $f_k = f(t_k)$ we have a Gauss-Markov model
$$
\begin{aligned}
f_{k+1} &\sim p(f_{k+1} \mid f_{k}) \\
y_{k} &\sim p(y_{k} \mid f_{k})
\end{aligned}
$$
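The two densities can be written out explicitly. Solving the linear SDE between observation times with the integrating factor $e^{\lambda t}$, and assuming that $w(t)$ has spectral density $q = 2\lambda\sigma^2$ and that $\varepsilon_k \sim \mathcal{N}(0, \sigma_n^2)$ (the notes do not name the noise variance $\sigma_n^2$), gives, with $\Delta t_k = t_{k+1} - t_k$:

$$
\begin{aligned}
f(t_{k+1}) &= e^{-\lambda \Delta t_k} f(t_k) + \int_{t_k}^{t_{k+1}} e^{-\lambda (t_{k+1} - s)}\, w(s)\, ds \\
\operatorname{Var}\left[\int_{t_k}^{t_{k+1}} e^{-\lambda (t_{k+1} - s)}\, w(s)\, ds\right] &= q \int_{0}^{\Delta t_k} e^{-2\lambda u}\, du = \frac{q}{2\lambda}\left(1 - e^{-2\lambda \Delta t_k}\right) = \sigma^{2}\left(1 - e^{-2\lambda \Delta t_k}\right) \\
p(f_{k+1} \mid f_k) &= \mathcal{N}\left(f_{k+1} \mid e^{-\lambda \Delta t_k} f_k,\ \sigma^{2}\left(1 - e^{-2\lambda \Delta t_k}\right)\right) \\
p(y_k \mid f_k) &= \mathcal{N}\left(y_k \mid f_k,\ \sigma_n^{2}\right)
\end{aligned}
$$

As a consistency check, if $f_k$ has the stationary variance $\sigma^2$, then $\operatorname{Var}[f_{k+1}] = e^{-2\lambda\Delta t_k}\sigma^2 + \sigma^2(1 - e^{-2\lambda\Delta t_k}) = \sigma^2$ and $\operatorname{Cov}(f_{k+1}, f_k) = \sigma^2 e^{-\lambda\Delta t_k} = k(t_{k+1}, t_k)$, matching the GP covariance.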
The model is solvable in $O(n)$ time using a Kalman filter/smoother.
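A minimal sketch of that $O(n)$ recursion for the OU model, under stated assumptions: the exact transition $f_{k+1} = a_k f_k + q_k$ with $a_k = e^{-\lambda\Delta t_k}$ and $q_k \sim \mathcal{N}(0, \sigma^2(1 - a_k^2))$, a stationary prior $f_0 \sim \mathcal{N}(0, \sigma^2)$, and Gaussian measurement noise with the hypothetical variance `noise_var`. The backward pass is the Rauch-Tung-Striebel (RTS) smoother; the function name `kalman_ou_smoother` is illustrative. Because the state is scalar, each step is constant-time, so the whole pass is linear in the number of observations (assumed sorted by time).

```python
import math


def kalman_ou_smoother(t, y, lam, sigma2, noise_var):
    """Posterior means and variances of f(t_k) given all y, via Kalman filter + RTS smoother.

    Model: f_{k+1} = a_k f_k + q_k,  q_k ~ N(0, sigma2 * (1 - a_k^2)),  a_k = exp(-lam * dt_k)
           y_k     = f_k + e_k,      e_k ~ N(0, noise_var)
    t must be sorted in increasing order.
    """
    n = len(t)
    m_pred, p_pred = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    m_filt, p_filt = [0.0] * n, [0.0] * n  # filtered estimates
    a = [0.0] * n                          # a[k] maps step k-1 to step k

    for k in range(n):
        if k == 0:
            m_pred[k], p_pred[k] = 0.0, sigma2  # stationary prior
        else:
            a[k] = math.exp(-lam * (t[k] - t[k - 1]))
            m_pred[k] = a[k] * m_filt[k - 1]
            p_pred[k] = a[k] ** 2 * p_filt[k - 1] + sigma2 * (1.0 - a[k] ** 2)
        # Measurement update with the scalar Kalman gain.
        gain = p_pred[k] / (p_pred[k] + noise_var)
        m_filt[k] = m_pred[k] + gain * (y[k] - m_pred[k])
        p_filt[k] = (1.0 - gain) * p_pred[k]

    # Rauch-Tung-Striebel backward pass; the last step is already smoothed.
    m_s, p_s = m_filt[:], p_filt[:]
    for k in range(n - 2, -1, -1):
        g = p_filt[k] * a[k + 1] / p_pred[k + 1]
        m_s[k] = m_filt[k] + g * (m_s[k + 1] - m_pred[k + 1])
        p_s[k] = p_filt[k] + g ** 2 * (p_s[k + 1] - p_pred[k + 1])
    return m_s, p_s


if __name__ == "__main__":
    t = [0.0, 0.4, 1.1, 2.0, 2.3]
    y = [0.3, 0.1, -0.5, 0.8, 1.0]
    means, variances = kalman_ou_smoother(t, y, lam=1.0, sigma2=1.0, noise_var=0.1)
    print(means)
    print(variances)
```

Since the state-space model is an exact representation of the OU prior, the smoothed means and variances should agree (up to rounding) with the dense $O(n^3)$ GP posterior at the training inputs.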