Decoding the Hack Behind Accurate Weather Forecasting: Variational Data Assimilation

Original article: towardsdatascience.com/decoding-the-hack-behind-accurate-weather-forecasting-variational-data-assimilation-3aea18be36f0

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/e7a25b3a4cb4a74f4488e88d9d27e7ba.png

Source: unsplash.com/

1. Quick Start: Why Data Assimilation

Weather forecasting models are chaotic dynamical systems: tiny perturbations in the model state can make a forecast unstable, so trusting a forecast blindly is risky. Yet today's forecasting services, such as the European Centre for Medium-Range Weather Forecasts (ECMWF), achieve high accuracy from the medium range (15 days) out to seasonal scales. The hack behind good forecasts is four-dimensional variational data assimilation (4D-Var), in operational use at ECMWF since 1997. The algorithm incorporates real-time observations to improve the forecast. As the primary technique for taming the butterfly effect (extreme sensitivity to initial conditions), 4D-Var is also widely used in operational time-series forecasting systems in many other fields.

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/ba366acb1dd4399778a0e2878c0c6bf1.png

Schematic of data assimilation at ECMWF. Source: www.ecmwf.int/sites/default/files/2023-08/Fact%20sheet%20-%20Reanalysis%20-v3_1.pdf. Copyright © ECMWF, licensed under CC BY-SA 4.0.

This blog post walks through the mathematics behind 4D-Var and implements it on a simple example using PyTorch, a modern deep learning framework whose features do much to accelerate traditional data assimilation. By the end of this post you will understand how 4D-Var works and be ready to apply it to your own problems. The post is structured as follows:

  • Quick Start: Why Data Assimilation

  • Prerequisites

  • The Standard 4D-Var Algorithm

  • The Incremental 4D-Var Algorithm

  • 4D-Var Implementation

  • Case Study

  • Summary

2. Prerequisites

Deriving 4D-Var requires basic vector calculus, Bayes' theorem, and multivariate statistics. It is entirely possible to play with the code without digging into the math. Familiarity with Python and PyTorch will help you follow the code. The code repository is available at pytorch-data-assimilation-tutorial.

3. The Standard 4D-Var Algorithm

3.1. Problem Formulation from a Bayesian Perspective

Unlike sequential data assimilation methods such as the Kalman filter, which update the model state at every time step, 4D-Var performs a batch optimization: it updates the model states over several consecutive steps at once. Suppose the simulation model consists of two functions, a state transition model m and an observation operator h:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/ad276462df1c920e59bf2e8538c2eb34.png

Here, x is the model state and y is the model output. Because of the sequential dependence between model states, optimizing a batch of states is equivalent to optimizing only the initial state, x0. Recall Bayes' theorem:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/c9418fe5e2cc9281242a6736bb7bf66d.png
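
Concretely, with x the model state and y the observations, Bayes' theorem reads

p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \;\propto\; p(y \mid x)\, p(x).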

With a Gaussian prior p(x) and a Gaussian likelihood p(y|x), we have:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/99903678698d5da60d58eeade3693653.png

Taking the negative logarithm of the posterior turns it into a cost function J to be minimized:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/06103a43ca91cb51fa534d15767bb00b.png
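
Spelled out in the notation used by the code later in this post (x_b for the background, i.e. prior mean of the initial state, Q for its error covariance, R for the observation-error covariance), the cost function takes the familiar 4D-Var form

J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^\top Q^{-1} (x_0 - x_b) + \tfrac{1}{2} \sum_{t=0}^{T} \big(y_t - h(x_t)\big)^\top R^{-1} \big(y_t - h(x_t)\big), \qquad x_t = m(x_{t-1}).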

3.2. Gradient Descent Optimization

The gradient of each term in the cost function is:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/a69150f4ca676c6f5af966f3fc2effa7.png

All of the derivatives can be expressed in terms of Jacobian matrices:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/4dd9bbcc06609abe956178a59bf76acd.png
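
Combining the two expressions above, the gradient with respect to the initial state can be written in terms of the step Jacobians M_t = \partial m / \partial x (for the step ending at time t) and H_t = \partial h / \partial x (evaluated at x_t):

\nabla_{x_0} J = Q^{-1}(x_0 - x_b) - \sum_{t=0}^{T} M_1^\top M_2^\top \cdots M_t^\top\, H_t^\top R^{-1} \big(y_t - h(x_t)\big),

where the product of M's is empty (the identity) for t = 0. Evaluating these long matrix products naively is expensive, which is exactly what the adjoint trick in the next subsection avoids.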

3.3. A Simplification Technique: The Adjoint

It is much easier to compute the terms above recursively, exploiting the adjoint operator from linear algebra:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/9efa5b3c8da57422db8cf3258533e787.png

This propagates information backward in time and allows the gradient of the cost function to be computed efficiently:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/b5517c5e771009a3c9fb4d1db580f05d.png
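
Before the full implementation in Section 5, here is a minimal sketch of what the adjoint looks like in PyTorch: autograd gives the Jacobian (the tangent linear model) of a step, and its transpose pulls a gradient one step backward in time. The one-step model below is a made-up toy, not the case-study model.

import torch

def step(x):
    # hypothetical nonlinear one-step state transition, for illustration only
    return torch.stack([x[0] * x[1], torch.sin(x[0]) + x[1]])

x_t = torch.tensor([0.3, 1.2])
M = torch.autograd.functional.jacobian(step, x_t)   # tangent linear model at x_t, shape (2, 2)
adjoint_next = torch.tensor([1.0, -0.5])            # gradient of the cost w.r.t. the state at t + 1
adjoint_t = M.t() @ adjoint_next                    # gradient pulled back to the state at t
print(adjoint_t)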

3.4. Summary: The Algorithm

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/2709da3aad572bdbce6e25d19da5b0c4.png

4. The Incremental 4D-Var Algorithm

4.1. Problem Formulation

If we define the increments of the model states and the observations:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/81172bf0b279910375dc3fb819fd1215.png

then the cost function of standard 4D-Var can be rewritten as:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/11e6211ac88df061f123d512a19fb273.png

4.2. A Further Simplification: The Tangent Linear Model

This leads to what ECMWF calls "the core of 4D-Var": linearized physics. The concept rests on the mathematical fact that, when the increment is small, a nonlinear function can be approximated by its tangent linear model, i.e. the Jacobian matrix of the function:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/6b4bdec9ac14889d6b683615f58a9f12.png
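
In symbols, for a small increment δx around the reference trajectory, the nonlinear model and the observation operator are replaced by their first-order (tangent linear) approximations

m(x + \delta x) \approx m(x) + \mathbf{M}\,\delta x, \qquad h(x + \delta x) \approx h(x) + \mathbf{H}\,\delta x,

where M and H are the Jacobians of m and h evaluated at the reference state x.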

In this way, the nonlinear optimization problem in x0 becomes a quadratic optimization problem in the increment of x0 (dx0 in the code):

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/18b91397cab454c323834ac886041050.png
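
Written out as it is implemented in the inner loop below, with the innovations d_t = y_t − h(x_t) computed along the reference trajectory and the background term measured directly on the increment, the cost is quadratic in δx_0:

J(\delta x_0) = \tfrac{1}{2}\,\delta x_0^\top Q^{-1} \delta x_0 + \tfrac{1}{2} \sum_{t=0}^{T} \big(d_t - \mathbf{H}_t \mathbf{M}_t \cdots \mathbf{M}_1\, \delta x_0\big)^\top R^{-1} \big(d_t - \mathbf{H}_t \mathbf{M}_t \cdots \mathbf{M}_1\, \delta x_0\big).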

Compared with plain gradient descent, solving a quadratic optimization problem with methods such as conjugate gradient or quasi-Newton optimizers is much easier.

4.3. Summary: The Algorithm

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/821bd5fb628105892117f828214f14a6.png

4.4. Comparison with Standard 4D-Var

Incremental 4D-Var breaks the nonlinear optimization problem into a sequence of linear ones. It does so by splitting the procedure into two nested loops: an outer loop that updates the state at the initial time step (x0), and an inner loop that derives the optimal increment of x0. Because the quadratic optimization in the inner loop converges to its global optimum faster than gradient descent, the outer loop of incremental 4D-Var converges faster than standard 4D-Var.

5. 4D-Var Implementation

First, import the necessary libraries. The entire pipeline is implemented with PyTorch, whose automatic gradient computation simplifies deriving the Jacobian matrices and the tangent linear model.

import torch
import numpy as np
from typing import Dict, Tuple
import pandas as pd
from scipy.optimize import minimize
import seaborn as sns
from sklearn.metrics import root_mean_squared_error

5.1. Standard 4D-Var

In the standard 4D-Var update step, the observations within the time window are used to derive the optimal model state at the beginning of the window (x0). The state x0 is then updated using gradient descent.

def standard_4dvar_update(x_background: torch.Tensor,
                          window_observations: torch.Tensor,
                          model: callable,
                          obs_operator: callable,
                          Q_inv: torch.Tensor,
                          R_inv: torch.Tensor,
                          window_model_input: Dict[str, torch.Tensor],
                          n_iter: int = 10,
                          lr: float = 0.01) -> torch.Tensor:
    '''
    :param x_background: (state_dim,), initial guess of state
    :param window_observations: (assimilation_window, y_dim), observations over the assimilation window
    :param model: state transition
    :param obs_operator: observation operator
    :param Q_inv: (state_dim, state_dim), inverse of the state transition noise covariance matrix
    :param R_inv: (y_dim, y_dim), inverse of the observation noise covariance matrix
    :param window_model_input: dict of (assimilation_window, input_dim), external input for the state transition model
    :param n_iter: number of iterations for the gradient descent
    :param lr: learning rate
    :return: (state_dim, ), updated state
    '''
    num_steps = len(window_observations)

    # Initialize the state
    x0 = x_background.clone()

    for _ in range(n_iter):
        # Forward integration (state propagation over the assimilation window)
        x_forward = torch.zeros((num_steps, x0.shape[0]), dtype=torch.float32)
        x_forward[0] = x0
        innovations = torch.zeros((num_steps, window_observations.shape[1]), dtype=torch.float32)
        for t in range(0, num_steps):
            if t > 0:
                step_model_input = {k: v[t] for k, v in window_model_input.items()}
                x_forward[t] = model(x_forward[t - 1], **step_model_input)
            innovations[t] = window_observations[t] - obs_operator(x_forward[t])

        # Backward integration to compute gradients of the cost w.r.t. initial state using adjoint
        x0_grad = nonlinear_delta(x_forward, innovations, model, obs_operator, Q_inv, R_inv, window_model_input, x_background)

        # Apply gradient descent to update x0
        x0 = x0 - lr * x0_grad

    return x0

In each gradient descent step, the total gradient of the cost function is computed by propagating gradients, represented by Jacobian matrices, backward in time. This procedure is implemented in the nonlinear_delta function. You can refer to the algorithm sections of this post to follow the code below.

def nonlinear_delta(x_forward: torch.Tensor,
                    innovations: torch.Tensor,
                    model: callable,
                    obs_operator: callable,
                    Q_inv: torch.Tensor,
                    R_inv: torch.Tensor,
                    window_model_input: Dict[str, torch.Tensor],
                    x_background: torch.Tensor) -> torch.Tensor:
    '''
    :param x_forward: (assimilation_window, state_dim), state trajectory over the assimilation window
    :param innovations: (assimilation_window, y_dim), innovations over the assimilation window
    :param model: state transition
    :param obs_operator: observation operator
    :param Q_inv: (state_dim, state_dim), inverse of the state transition noise covariance matrix
    :param R_inv: (y_dim, y_dim), inverse of the observation noise covariance matrix
    :param window_model_input: dict of (assimilation_window, input_dim), external input for the state transition model
    :param x_background: (state_dim,), background (prior) state used in the background term of the cost
    :return: (state_dim, ), gradient of the cost function w.r.t. the initial state
    '''
    num_steps = len(innovations)

    # Initialize adjoint tensor with zeros
    adjoint = torch.zeros_like(x_forward)

    # Backward pass to compute gradients with respect to the initial state (x0)
    for t in range(num_steps - 1, -1, -1):  # Going backward in time

        # Gradients of the observation term (J_y)
        operator_tlm = torch.autograd.functional.jacobian(lambda x: obs_operator(x), x_forward[t])
        grad_obs = -operator_tlm.t() @ R_inv @ innovations[t]

        # Compute the adjoint at time t (this is the gradient w.r.t. the state at time t)
        adjoint[t] = grad_obs  # Adjoint at time t comes from the innovation term

        if t < num_steps - 1:
            # Backpropagate through the model: compute the adjoint of the model at time t
            step_model_input = {k: v[t + 1] for k, v in window_model_input.items()}
            model_tlm = torch.autograd.functional.jacobian(lambda x: model(x, **step_model_input), x_forward[t])

            # Chain rule to propagate the adjoint backward
            adjoint[t] += model_tlm.t() @ adjoint[t + 1]

    J_b_grad = Q_inv @ (x_forward[0] - x_background)  # Gradient of the background term
    x0_grad = adjoint[0] + J_b_grad
    return x0_grad

5.2. Incremental 4D-Var

Like standard 4D-Var, incremental 4D-Var uses the observations within a time window to optimize the initial state x0. At each optimization step, however, incremental 4D-Var updates x0 by directly finding the optimal increment dx0 through a quadratic optimization, which is simpler than the nonlinear optimization problem in standard 4D-Var.

def incremental_4dvar_update(x_background: torch.Tensor,
                             window_observations: torch.Tensor,
                             model: callable,
                             obs_operator: callable,
                             Q_inv: torch.Tensor,
                             R_inv: torch.Tensor,
                             window_model_input: Dict[str, torch.Tensor],
                             n_iter: int = 10) -> torch.Tensor:
    '''
    :param x_background: (state_dim,), initial guess of state
    :param window_observations: (assimilation_window, y_dim), observations over the assimilation window
    :param model: state transition
    :param obs_operator: observation operator
    :param Q_inv: (state_dim, state_dim), inverse of the state transition noise covariance matrix
    :param R_inv: (y_dim, y_dim), inverse of the observation noise covariance matrix
    :param window_model_input: dict of (assimilation_window, input_dim), external input for the state transition model
    :param n_iter: number of iterations for the increment calculation
    :return: (state_dim, ), updated state
    '''
    num_steps = len(window_observations)

    # Initialize the state
    x0 = x_background.clone()

    for _ in range(n_iter):
        # Outer loop: forward integration (state propagation over the assimilation window)
        x_forward = torch.zeros((num_steps, x0.shape[0]), dtype=torch.float32)
        x_forward[0] = x0
        innovations = torch.zeros((num_steps, window_observations.shape[1]), dtype=torch.float32)
        for t in range(0, num_steps):
            if t > 0:
                step_model_input = {k: v[t] for k, v in window_model_input.items()}
                x_forward[t] = model(x_forward[t - 1], **step_model_input)
            innovations[t] = window_observations[t] - obs_operator(x_forward[t])

        # Compute the optimized increment
        dx0 = linear_delta(x_forward, innovations, model, obs_operator, Q_inv, R_inv, window_model_input)

        # Update the initial state (x0)
        x0 = x0 + dx0

    return x0

In the inner-loop function linear_delta, the tangent linear models along the trajectory started from the reference state, represented by Jacobian matrices, make the cost function quadratic, so it can be minimized with the conjugate gradient method from scipy.optimize.minimize.

def linear_delta(x_forward: torch.Tensor,
                 innovations: torch.Tensor,
                 model: callable,
                 obs_operator: callable,
                 Q_inv: torch.Tensor,
                 R_inv: torch.Tensor,
                 window_model_input: Dict[str, torch.Tensor]) -> torch.Tensor:
    '''
    :param x_forward: (assimilation_window, state_dim), state trajectory over the assimilation window
    :param innovations: (assimilation_window, y_dim), innovations over the assimilation window
    :param model: state transition
    :param obs_operator: observation operator
    :param Q_inv: (state_dim, state_dim), inverse of the state transition noise covariance matrix
    :param R_inv: (y_dim, y_dim), inverse of the observation noise covariance matrix
    :param window_model_input: dict of (assimilation_window, input_dim), external input for the state transition model
    :return: (state_dim, ), optimized increment of the initial state
    '''
    num_steps = len(innovations)
    state_dim = x_forward.shape[1]

    # Calculate the Tangent Linear Model (TLM) matrices
    model_tlm_stacks = torch.zeros((num_steps, state_dim, state_dim), dtype=torch.float32)
    operator_tlm_stacks = torch.zeros((num_steps, innovations.shape[1], state_dim), dtype=torch.float32)
    for t in range(num_steps):
        operator_tlm_stacks[t] = torch.autograd.functional.jacobian(lambda x: obs_operator(x), x_forward[t])
        if t < num_steps - 1:
            # TLM of the step from x_t to x_{t+1}: the forward model applies the external input at t + 1
            step_model_input = {k: v[t + 1] for k, v in window_model_input.items()}
            model_tlm_stacks[t] = torch.autograd.functional.jacobian(lambda x: model(x, **step_model_input), x_forward[t])

    def cost_function(dx0, num_steps, model_tlm_stacks_np, operator_tlm_stacks_np, innovations_np, Q_inv_np, R_inv_np):

        # Propagate the perturbation using the TLM and accumulate the quadratic cost
        J_b = 0.5 * dx0.T @ Q_inv_np @ dx0
        J_y = 0.
        dxi = dx0.copy()
        for t in range(num_steps):
            if t > 0:
                dxi = model_tlm_stacks_np[t - 1] @ dxi
            dyi = operator_tlm_stacks_np[t] @ dxi
            J_y += 0.5 * (innovations_np[t] - dyi).T @ R_inv_np @ (innovations_np[t] - dyi)
        return J_b + J_y

    dx0_init = np.full(x_forward.shape[1], 0.5, dtype=np.float32)  # initial guess for the increment
    model_tlm_stacks_np = model_tlm_stacks.numpy()
    operator_tlm_stacks_np = operator_tlm_stacks.numpy()
    innovations_np = innovations.numpy()
    Q_inv_np = Q_inv.numpy()
    R_inv_np = R_inv.numpy()

    res = minimize(
        cost_function, dx0_init,
        args=(num_steps, model_tlm_stacks_np, operator_tlm_stacks_np, innovations_np, Q_inv_np, R_inv_np),
        method='CG',
        options={'gtol': 1e-6}
    )
    dx0 = torch.tensor(res.x, dtype=torch.float32)

    return dx0

5.3. The Assimilation Loop

In real-world problems, data assimilation is carried out over consecutive time windows to produce a seamless correction of the model in time. Below is a simplified operational data assimilation module that runs 4D-Var (standard or incremental) over a period and updates the states at all time steps.

def vad_4dvar_loop(x_background: torch.Tensor,
                   observations_y: torch.Tensor,
                   assimilation_window: int,
                   model: callable,
                   obs_operator: callable,
                   Q: torch.Tensor,
                   R: torch.Tensor,
                   model_input: Dict[str, torch.Tensor],
                   method: str = "standard",
                   n_iter: int = 10,
                   lr: float = 0.01,
                   ) -> torch.Tensor:
    '''
    :param x_background: (state_dim,), initial guess of state
    :param observations_y: (time_steps, y_dim), observations
    :param assimilation_window: number of time steps to assimilate in a batch
    :param model: state transition
    :param obs_operator: observation operator
    :param Q: (state_dim, state_dim), state transition noise covariance matrix
    :param R: (y_dim, y_dim), observation noise covariance matrix
    :param model_input: dict of (time_steps, input_dim), external input for the state transition model
    :param method: "standard" or "incremental"
    :param n_iter: number of iterations for the gradient descent (standard) or increment calculation (incremental)
    :param lr: learning rate for the gradient descent
    :return: (time_steps, state_dim), updated state trajectory
    '''
    x_assimilated = []
    Q_inv, R_inv = torch.inverse(Q), torch.inverse(R)

    for start in range(0, len(observations_y), assimilation_window):
        window_observations = observations_y[start:(start + assimilation_window)]
        window_model_input = {k: v[start:(start + assimilation_window)] for k, v in model_input.items()}

        # Initialize the background state for the current window
        if start > 0:
            step_model_input = {k: v[0] for k, v in window_model_input.items()}
            x_background = model(x_assimilated[-1], **step_model_input)

        if method == "standard":
            x0 = standard_4dvar_update(x_background, window_observations, model, obs_operator, Q_inv, R_inv, window_model_input, n_iter, lr)
        elif method == "incremental":
            x0 = incremental_4dvar_update(x_background, window_observations, model, obs_operator, Q_inv, R_inv, window_model_input, n_iter)
        else:
            raise ValueError(f"Unknown method: {method}")

        # Update simulation states
        x_assimilated.append(x0)
        for t in range(1, len(window_observations)):  # fill in the remaining states of the current window
            step_model_input = {k: v[t] for k, v in window_model_input.items()}
            x_assimilated.append(model(x_assimilated[-1], **step_model_input))

    return torch.stack(x_assimilated)
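
To make the interface concrete, here is a minimal, hypothetical usage sketch: a toy damped linear model with an identity observation operator and synthetic noisy observations, all made up for illustration. The actual rainfall-runoff model and data generation used in the case study below live in the linked repository.

# A self-contained toy setup (hypothetical -- not the case-study model)
torch.manual_seed(0)
state_dim, y_dim, time_steps, window = 2, 2, 12, 4

def toy_model(x, forcing):
    # damped state transition driven by an external forcing term
    return 0.9 * x + forcing

def toy_obs_operator(x):
    # observe the state directly
    return x

model_input = {"forcing": 0.1 * torch.rand(time_steps, state_dim)}
Q = torch.eye(state_dim)   # background-error covariance (toy value)
R = torch.eye(y_dim)       # observation-error covariance (toy value)

# Synthetic observations: a noisy run of the toy model from an unknown "true" start
x_true = torch.rand(state_dim)
observations_y = torch.zeros(time_steps, y_dim)
x_t = x_true.clone()
for t in range(time_steps):
    if t > 0:
        x_t = toy_model(x_t, model_input["forcing"][t])
    observations_y[t] = toy_obs_operator(x_t) + 0.05 * torch.randn(y_dim)

x_background = torch.zeros(state_dim)   # deliberately poor first guess
x_assimilated = vad_4dvar_loop(x_background, observations_y, window,
                               toy_model, toy_obs_operator, Q, R,
                               model_input, method="standard", n_iter=30, lr=0.1)
# method="incremental" runs the inner-loop quadratic optimization instead
print(x_assimilated.shape)  # torch.Size([12, 2])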

6. Case Study

This post uses a simplified rainfall-runoff model as the case study. A description of the model can be found in the problem overview section of "Solving the Butterfly Effect: Data Assimilation Using the Ensemble Kalman Filter", and the code for the model and data generation is available in pytorch-data-assimilation-tutorial. We evaluate the effect of assimilation on simulation accuracy by comparing:

  • the simulated runoff (model output)

  • the assimilated runoff (model output after assimilation)

  • the observed runoff (ground truth)

The next two figures show these comparisons. Both the standard and the incremental 4D-Var algorithms produce model states and output variables that agree with the ground truth much more closely than the pure simulation does.

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/fbd9e7a0a280373a2c9cf7f45743751a.png

Trajectories of the model state and output in the case study (truth vs. simulation vs. assimilation). The assimilation algorithm is standard 4D-Var. Source: author.

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/bed01c3afbccc3bcdd1dbb406bc4f58c.png

Trajectories of the model state and output in the case study (truth vs. simulation vs. assimilation). The assimilation algorithm is incremental 4D-Var. Source: author.
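
A simple way to quantify what the figures show is the root mean squared error of each series against the observations, using the root_mean_squared_error already imported above. The arrays below are random stand-ins only meant to illustrate the call; the real series come from the case-study script in the repository.

rng = np.random.default_rng(0)
observed_runoff = rng.random(100)                                       # stand-in for the observed series
simulated_runoff = observed_runoff + 0.3 * rng.standard_normal(100)     # stand-in for the open-loop simulation
assimilated_runoff = observed_runoff + 0.1 * rng.standard_normal(100)   # stand-in for the 4D-Var corrected run

print("RMSE, simulation only:", root_mean_squared_error(observed_runoff, simulated_runoff))
print("RMSE, after 4D-Var:   ", root_mean_squared_error(observed_runoff, assimilated_runoff))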

7. Summary

In this post you learned about standard 4D-Var and its more efficient variant, incremental 4D-Var, a key technique for stabilizing weather forecasts, which are highly sensitive to the state of the weather system. The post covered the mathematical foundations of 4D-Var, rooted in Bayesian inference and nonlinear optimization, and showed how modern computational tools with automatic differentiation, such as PyTorch, simplify the implementation of the algorithm. In real-world operational weather forecasting systems, integrating 4D-Var is far more involved because of the dimensionality and volume of the data, and there are many more techniques to learn.

8. Further Reading


Follow me on Medium for more on how modern data science tools are disrupting traditional model updating in science and industry.
