1. Background: The Core Idea of Quasi-Newton Methods
For the unconstrained optimization problem $\min_{x \in \mathbb{R}^n} f(x)$, Newton's method approximates the objective by its second-order Taylor expansion, giving the iteration
$$x_{k+1} = x_k - H_k^{-1} g_k$$
where $g_k = \nabla f(x_k)$ is the gradient and $H_k = \nabla^2 f(x_k)$ is the Hessian.
Problems with Newton's method:
- computing the Hessian is very expensive, especially in high dimensions;
- the Hessian may be indefinite, in which case the Newton step may not be a descent direction;
- a matrix inverse (or linear solve) is required at every iteration.
The core idea of quasi-Newton methods: do not compute the Hessian directly; instead, build an approximation $B_k$ to it (or $D_k = B_k^{-1}$ to its inverse) from changes in the gradient, subject to the quasi-Newton (secant) condition:
$$B_{k+1} s_k = y_k$$
where:
- $s_k = x_{k+1} - x_k$ (the step vector);
- $y_k = g_{k+1} - g_k$ (the change in gradient).
If we approximate the inverse Hessian by $D_k$ instead, the condition is equivalent to
$$D_{k+1} y_k = s_k$$
DFP and BFGS are the two classic quasi-Newton methods:
- DFP (Davidon-Fletcher-Powell): directly builds an approximation $D_k$ to the inverse Hessian;
- BFGS (Broyden-Fletcher-Goldfarb-Shanno): builds an approximation $B_k$ to the Hessian itself, from which the inverse approximation $D_k$ follows (better numerical stability, more widely used).
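To build intuition for the secant condition: for a quadratic $f(x) = \tfrac{1}{2}x^T A x$ the gradient is $g(x) = Ax$, so $y_k = A s_k$ holds exactly, i.e. the true Hessian itself satisfies $B s_k = y_k$. A minimal numerical check (a sketch, using the Hessian of the example in Section 4):

```python
import numpy as np

# For f(x) = 0.5 * x^T A x we have g(x) = A x, hence y = g1 - g0 = A s:
# the true Hessian satisfies the secant condition exactly.
A = np.array([[1.0, 0.0],
              [0.0, 4.0]])          # Hessian of f(x) = 0.5*x1^2 + 2*x2^2
x0 = np.array([2.0, 1.0])
x1 = np.array([1.0, -0.5])          # an arbitrary next iterate
s = x1 - x0
y = A @ x1 - A @ x0                 # g1 - g0
assert np.allclose(A @ s, y)        # B s = y with B = A
```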
2. Derivation of the DFP Algorithm
2.1 Quasi-Newton Condition (for the Inverse Hessian)
DFP updates the inverse-Hessian approximation $D_k$ to $D_{k+1}$ so that
$$D_{k+1} y_k = s_k \tag{1}$$
while keeping $D_{k+1}$ symmetric positive definite (which guarantees that the search direction $-D_k g_k$ is a descent direction).
2.2 Rank-Two Update
Quasi-Newton approximations typically use a rank-two correction (the terms $s_k s_k^T$ and $D_k y_k (D_k y_k)^T$ below are each rank one, so their combination has rank two), of the form
$$D_{k+1} = D_k + \alpha u u^T + \beta v v^T$$
where $\alpha, \beta$ are scalars and $u, v$ are vectors, chosen so that the quasi-Newton condition (1) holds.
Step 1: substitute into the quasi-Newton condition
Substituting the update into $D_{k+1} y_k = s_k$ gives
$$D_k y_k + \alpha u u^T y_k + \beta v v^T y_k = s_k \tag{2}$$
Step 2: choose the basis vectors
To simplify the solution, choose:
- $u = s_k$ (to match the $s_k$ term);
- $v = D_k y_k$ (to match the $D_k y_k$ term).
Substituting into (2):
$$D_k y_k + \alpha s_k (s_k^T y_k) + \beta D_k y_k (y_k^T D_k y_k) = s_k$$
which rearranges to
$$\alpha (s_k^T y_k) s_k + \beta (y_k^T D_k y_k) D_k y_k = s_k - D_k y_k \tag{3}$$
Step 3: solve for the scalars $\alpha, \beta$
A sufficient choice that makes (3) hold is:
- for the $s_k$ term: $\alpha (s_k^T y_k) = 1 \implies \alpha = \frac{1}{s_k^T y_k}$;
- for the $D_k y_k$ term: $\beta (y_k^T D_k y_k) = -1 \implies \beta = -\frac{1}{y_k^T D_k y_k}$.
Step 4: the DFP update formula
This yields the DFP update for the inverse-Hessian approximation:
$$D_{k+1} = D_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{D_k y_k y_k^T D_k}{y_k^T D_k y_k} \tag{DFP}$$
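As a quick sanity check (a sketch, not part of the derivation), one can verify numerically that the DFP update satisfies the secant condition $D_{k+1} y_k = s_k$ and preserves symmetry:

```python
import numpy as np

def dfp_update(D, s, y):
    """One DFP update of the inverse-Hessian approximation."""
    Dy = D @ y
    return D + np.outer(s, s) / (s @ y) - np.outer(Dy, Dy) / (y @ Dy)

D = np.eye(3)
s = np.array([1.0, 0.5, -0.2])
y = np.array([0.9, 0.6, -0.1])      # s^T y = 1.22 > 0 (curvature condition)
D1 = dfp_update(D, s, y)
assert np.allclose(D1 @ y, s)       # secant condition D_{k+1} y = s
assert np.allclose(D1, D1.T)        # symmetry is preserved
```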
2.3 Complete DFP Algorithm
1. Initialization:
- choose an initial point $x_0 \in \mathbb{R}^n$;
- initialize the inverse-Hessian approximation $D_0 = I$ (the identity matrix);
- compute the initial gradient $g_0 = \nabla f(x_0)$;
- set a convergence tolerance $\epsilon > 0$ (e.g. $10^{-6}$) and a maximum iteration count $N_{\max}$.
2. Iteration ($k = 0, 1, 2, \ldots$):
- if $\|g_k\| < \epsilon$ or $k > N_{\max}$, stop and return $x_k$ as the solution;
- compute the search direction $d_k = -D_k g_k$;
- run a line search (exact, or the Armijo/Wolfe conditions) to find a step size $\alpha_k > 0$ with $f(x_k + \alpha_k d_k) < f(x_k)$;
- update the iterate: $x_{k+1} = x_k + \alpha_k d_k$;
- compute the new gradient $g_{k+1} = \nabla f(x_{k+1})$;
- compute $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$;
- if $s_k^T y_k \le 0$ (to avoid a zero or negative denominator), reset $D_{k+1} = I$; otherwise update $D_{k+1}$ with the DFP formula;
- set $k \leftarrow k + 1$ and repeat.
3. Derivation of the BFGS Algorithm
BFGS is the most widely used quasi-Newton method. Its starting point is to approximate the Hessian itself by $B_k$ and then obtain $D_k = B_k^{-1}$ (an update for the inverse can also be derived directly).
3.1 Quasi-Newton Condition (for the Hessian)
BFGS updates the Hessian approximation $B_k$ to $B_{k+1}$ so that
$$B_{k+1} s_k = y_k \tag{4}$$
where $s_k, y_k$ are defined as before, and $B_{k+1}$ is symmetric positive definite.
3.2 Rank-Two Update (for the Hessian)
Apply a rank-two correction to $B_k$:
$$B_{k+1} = B_k + \alpha u u^T + \beta v v^T$$
Step 1: substitute into the quasi-Newton condition
$$B_k s_k + \alpha u u^T s_k + \beta v v^T s_k = y_k \tag{5}$$
Step 2: choose the basis vectors
Choose $u = y_k$ and $v = B_k s_k$, and substitute into (5):
$$B_k s_k + \alpha y_k (y_k^T s_k) + \beta B_k s_k (s_k^T B_k s_k) = y_k$$
which rearranges to
$$\alpha (y_k^T s_k) y_k + \beta (s_k^T B_k s_k) B_k s_k = y_k - B_k s_k \tag{6}$$
Step 3: solve for the scalars $\alpha, \beta$
- $\alpha (y_k^T s_k) = 1 \implies \alpha = \frac{1}{y_k^T s_k}$;
- $\beta (s_k^T B_k s_k) = -1 \implies \beta = -\frac{1}{s_k^T B_k s_k}$.
Step 4: the BFGS update for the Hessian approximation
$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} \tag{BFGS-B}$$
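A small numerical check (illustrative only) that the BFGS-B update satisfies the secant condition $B_{k+1} s_k = y_k$ and stays positive definite when $s_k^T y_k > 0$:

```python
import numpy as np

def bfgs_B_update(B, s, y):
    """One BFGS update of the Hessian approximation."""
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

B = np.eye(2)
s = np.array([1.0, -0.5])
y = np.array([1.2, -0.3])                   # s^T y = 1.35 > 0
B1 = bfgs_B_update(B, s, y)
assert np.allclose(B1 @ s, y)               # secant condition B_{k+1} s = y
assert np.all(np.linalg.eigvalsh(B1) > 0)   # positive definite
```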
3.3 BFGS Update for the Inverse Hessian (Practical Form)
Inverting $B_{k+1}$ directly via the Sherman-Morrison-Woodbury formula gives the update for the inverse-Hessian approximation $D_{k+1} = B_{k+1}^{-1}$:
$$D_{k+1} = \left(I - \frac{s_k y_k^T}{s_k^T y_k}\right) D_k \left(I - \frac{y_k s_k^T}{s_k^T y_k}\right) + \frac{s_k s_k^T}{s_k^T y_k} \tag{BFGS-D}$$
This form updates the inverse Hessian without ever computing a matrix inverse and is the one most commonly used in practice.
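One can confirm numerically (a sketch) that the BFGS-D formula applied to $B_k^{-1}$ produces exactly the inverse of the BFGS-B update, and that it satisfies the inverse secant condition:

```python
import numpy as np

def bfgs_B_update(B, s, y):
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

def bfgs_D_update(D, s, y):
    rho = 1.0 / (s @ y)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ D @ V.T + rho * np.outer(s, s)

B = np.array([[2.0, 0.3], [0.3, 1.0]])
s = np.array([0.7, -0.2])
y = np.array([1.0, 0.1])                    # s^T y = 0.68 > 0
B1 = bfgs_B_update(B, s, y)
D1 = bfgs_D_update(np.linalg.inv(B), s, y)
assert np.allclose(D1, np.linalg.inv(B1))   # D-update is exactly inv(B-update)
assert np.allclose(D1 @ y, s)               # inverse secant condition
```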
3.4 Complete BFGS Algorithm
The steps are nearly identical to DFP; only the update for $D_k$ is replaced by the BFGS-D formula:
- initialize $x_0$, $D_0 = I$, $g_0$, $\epsilon$, $N_{\max}$;
- iterate:
  - if $\|g_k\| < \epsilon$, stop;
  - search direction $d_k = -D_k g_k$;
  - line search for $\alpha_k$;
  - update $x_{k+1} = x_k + \alpha_k d_k$;
  - compute $g_{k+1}$, $s_k$, $y_k$;
  - if $s_k^T y_k \le 0$, reset $D_{k+1} = I$; otherwise update $D_{k+1}$ with the BFGS-D formula;
- repeat until convergence.
4. Worked Example: Unconstrained Optimization
Example 1: a quadratic objective
Solve $\min f(x) = \frac{1}{2}x_1^2 + 2x_2^2$ (a convex quadratic with minimizer $x^* = (0, 0)$).
Setup:
- gradient: $\nabla f(x) = [x_1, 4x_2]^T$;
- Hessian: $H = \begin{bmatrix}1 & 0 \\ 0 & 4\end{bmatrix}$, inverse Hessian: $H^{-1} = \begin{bmatrix}1 & 0 \\ 0 & 0.25\end{bmatrix}$;
- initial point $x_0 = [2, 1]^T$, $D_0 = I = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}$.
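Before iterating, it is cheap to confirm the analytic gradient against central finite differences (an illustrative check):

```python
import numpy as np

f = lambda x: 0.5 * x[0]**2 + 2 * x[1]**2
grad = lambda x: np.array([x[0], 4 * x[1]])

x = np.array([2.0, 1.0])
h = 1e-6
# Central difference along each coordinate axis
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
assert np.allclose(fd, grad(x), atol=1e-5)  # analytic gradient matches
```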
DFP iterations (first two steps by hand):
Iteration 0:
- $g_0 = [2, 4]^T$, $\|g_0\| = \sqrt{4 + 16} = \sqrt{20} \approx 4.47 > \epsilon$;
- search direction: $d_0 = -D_0 g_0 = [-2, -4]^T$;
- exact line search: find $\alpha_0$ minimizing $f(x_0 + \alpha_0 d_0)$:
$$f(2 - 2\alpha_0, 1 - 4\alpha_0) = \frac{1}{2}(2 - 2\alpha_0)^2 + 2(1 - 4\alpha_0)^2$$
Setting the derivative to zero:
$$(2 - 2\alpha_0)(-2) + 4(1 - 4\alpha_0)(-4) = 0 \implies -4 + 4\alpha_0 - 16 + 64\alpha_0 = 0 \implies 68\alpha_0 = 20 \implies \alpha_0 = 5/17 \approx 0.2941$$
- update the iterate: $x_1 = [2 - 2 \cdot (5/17),\ 1 - 4 \cdot (5/17)]^T = [24/17, -3/17]^T \approx [1.4118, -0.1765]^T$;
- $s_0 = x_1 - x_0 = [-10/17, -20/17]^T$;
- $g_1 = [24/17,\ 4 \cdot (-3/17)]^T = [24/17, -12/17]^T$;
- $y_0 = g_1 - g_0 = [24/17 - 34/17,\ -12/17 - 68/17]^T = [-10/17, -80/17]^T$;
- DFP update:
$$s_0^T y_0 = \frac{(-10)(-10) + (-20)(-80)}{289} = \frac{1700}{289} = \frac{100}{17}$$
$$\frac{s_0 s_0^T}{s_0^T y_0} = \frac{17}{100} \cdot \frac{1}{289}\begin{bmatrix}100 & 200 \\ 200 & 400\end{bmatrix} = \frac{1}{17}\begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}$$
$$D_0 y_0 = y_0 = [-10/17, -80/17]^T, \qquad y_0^T D_0 y_0 = \frac{100 + 6400}{289} = \frac{6500}{289}$$
$$\frac{D_0 y_0 y_0^T D_0}{y_0^T D_0 y_0} = \frac{289}{6500} \cdot \frac{1}{289}\begin{bmatrix}100 & 800 \\ 800 & 6400\end{bmatrix} = \frac{1}{65}\begin{bmatrix}1 & 8 \\ 8 & 64\end{bmatrix}$$
$$D_1 = D_0 + \frac{s_0 s_0^T}{s_0^T y_0} - \frac{D_0 y_0 y_0^T D_0}{y_0^T D_0 y_0} = I + \frac{1}{17}\begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix} - \frac{1}{65}\begin{bmatrix}1 & 8 \\ 8 & 64\end{bmatrix} = \frac{1}{1105}\begin{bmatrix}1153 & -6 \\ -6 & 277\end{bmatrix} \approx \begin{bmatrix}1.0434 & -0.0054 \\ -0.0054 & 0.2507\end{bmatrix}$$
Note that after a single update, $D_1$ is already close to the true inverse Hessian $H^{-1} = \mathrm{diag}(1, 0.25)$.
Iteration 1:
- search direction: $d_1 = -D_1 g_1 = -\frac{1}{1105}\begin{bmatrix}1153 & -6 \\ -6 & 277\end{bmatrix}\begin{bmatrix}24/17 \\ -12/17\end{bmatrix} = [-96/65,\ 12/65]^T \approx [-1.4769, 0.1846]^T$;
- an exact line search gives $\alpha_1 = 65/68$, and the update lands exactly on the minimizer: $x_2 = x_1 + \alpha_1 d_1 = [0, 0]^T$. This illustrates quadratic termination: on a convex quadratic, DFP with exact line search converges in at most $n = 2$ steps.
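The hand computation above can be replayed in a few lines of NumPy; for a quadratic with Hessian $H$, the exact line-search step has the closed form $\alpha = -g^T d / (d^T H d)$:

```python
import numpy as np

grad = lambda x: np.array([x[0], 4 * x[1]])
H = np.diag([1.0, 4.0])                      # Hessian of f(x) = 0.5*x1^2 + 2*x2^2

x0 = np.array([2.0, 1.0]); D0 = np.eye(2)
g0 = grad(x0); d0 = -D0 @ g0
a0 = -(g0 @ d0) / (d0 @ H @ d0)              # exact line search
x1 = x0 + a0 * d0
s0, y0 = x1 - x0, grad(x1) - g0
D0y = D0 @ y0
D1 = D0 + np.outer(s0, s0)/(s0 @ y0) - np.outer(D0y, D0y)/(y0 @ D0y)
assert np.isclose(a0, 5/17)
assert np.allclose(D1, np.array([[1153, -6], [-6, 277]]) / 1105)

g1 = grad(x1); d1 = -D1 @ g1
a1 = -(g1 @ d1) / (d1 @ H @ d1)
x2 = x1 + a1 * d1
assert np.allclose(x2, [0.0, 0.0])           # quadratic termination in n = 2 steps
```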
5. Implementation (Python)
5.1 A Generic Quasi-Newton Framework (DFP + BFGS)
```python
import numpy as np
from scipy.optimize import line_search  # Wolfe-condition line search

def dfp_update(D, s, y):
    """DFP update of the inverse-Hessian approximation."""
    sTy = np.dot(s, y)
    if sTy <= 1e-10:  # guard against a zero or negative denominator
        return np.eye(D.shape[0])
    Dy = np.dot(D, y)
    return D + np.outer(s, s) / sTy - np.outer(Dy, Dy) / np.dot(y, Dy)

def bfgs_update(D, s, y):
    """BFGS update of the inverse-Hessian approximation."""
    sTy = np.dot(s, y)
    if sTy <= 1e-10:
        return np.eye(D.shape[0])
    I = np.eye(D.shape[0])
    V = I - np.outer(s, y) / sTy
    return V @ D @ V.T + np.outer(s, s) / sTy

def quasi_newton(f, grad_f, x0, method='bfgs', tol=1e-6, max_iter=100):
    """
    Solve an unconstrained optimization problem with a quasi-Newton method.
    Parameters:
        f: objective function f(x)
        grad_f: gradient function ∇f(x)
        x0: initial point
        method: 'dfp' or 'bfgs'
        tol: convergence tolerance (on the gradient norm)
        max_iter: maximum number of iterations
    Returns:
        x: the solution found
        f_val: the objective value at x
        iter_num: number of iterations performed
    """
    if method not in ('dfp', 'bfgs'):
        raise ValueError("method must be 'dfp' or 'bfgs'")
    n = len(x0)
    x = np.array(x0, dtype=np.float64)
    D = np.eye(n)  # start with the identity as the inverse-Hessian approximation
    iter_num = 0
    while iter_num < max_iter:
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # Search direction
        d = -np.dot(D, g)
        # Wolfe line search for the step size
        alpha = line_search(f, grad_f, x, d)[0]
        if alpha is None:  # line search failed; fall back to a small fixed step
            alpha = 0.1
        # Update the iterate
        x_new = x + alpha * d
        g_new = grad_f(x_new)
        # Compute s and y
        s = x_new - x
        y = g_new - g
        # Update the inverse-Hessian approximation
        if method == 'dfp':
            D = dfp_update(D, s, y)
        else:
            D = bfgs_update(D, s, y)
        x = x_new
        iter_num += 1
    return x, f(x), iter_num

# -------------------- Test case 1: quadratic function --------------------
def f_quad(x):
    """Objective: f(x) = 0.5*x1^2 + 2*x2^2"""
    return 0.5 * x[0]**2 + 2 * x[1]**2

def grad_f_quad(x):
    """Gradient: [x1, 4*x2]"""
    return np.array([x[0], 4 * x[1]])

x0 = [2.0, 1.0]

x_dfp, f_dfp, iter_dfp = quasi_newton(f_quad, grad_f_quad, x0, method='dfp')
print("=== DFP on the quadratic ===")
print(f"solution: x = {x_dfp}")
print(f"optimal value: f(x) = {f_dfp:.6f}")
print(f"iterations: {iter_dfp}")

x_bfgs, f_bfgs, iter_bfgs = quasi_newton(f_quad, grad_f_quad, x0, method='bfgs')
print("\n=== BFGS on the quadratic ===")
print(f"solution: x = {x_bfgs}")
print(f"optimal value: f(x) = {f_bfgs:.6f}")
print(f"iterations: {iter_bfgs}")

# ---------------- Test case 2: non-quadratic (Rosenbrock) function ----------------
def f_rosen(x):
    """Rosenbrock function: f(x) = 100*(x2 - x1^2)^2 + (1 - x1)^2, minimizer (1, 1)"""
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

def grad_f_rosen(x):
    """Gradient of the Rosenbrock function"""
    dx1 = -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0])
    dx2 = 200 * (x[1] - x[0]**2)
    return np.array([dx1, dx2])

x0_rosen = [-1.2, 1.0]

x_bfgs_rosen, f_bfgs_rosen, iter_bfgs_rosen = quasi_newton(f_rosen, grad_f_rosen, x0_rosen, method='bfgs')
print("\n=== BFGS on Rosenbrock ===")
print(f"solution: x = {x_bfgs_rosen}")
print(f"optimal value: f(x) = {f_bfgs_rosen:.6f}")
print(f"iterations: {iter_bfgs_rosen}")

x_dfp_rosen, f_dfp_rosen, iter_dfp_rosen = quasi_newton(f_rosen, grad_f_rosen, x0_rosen, method='dfp')
print("\n=== DFP on Rosenbrock ===")
print(f"solution: x = {x_dfp_rosen}")
print(f"optimal value: f(x) = {f_dfp_rosen:.6f}")
print(f"iterations: {iter_dfp_rosen}")
```
5.2 Sample Output
A typical run produces output like the following (exact numbers may vary slightly with the line search):
```
=== DFP on the quadratic ===
solution: x = [ 1.14285714e-07 -2.85714286e-08]
optimal value: f(x) = 0.000000
iterations: 3

=== BFGS on the quadratic ===
solution: x = [ 1.42108547e-14 -7.10542736e-15]
optimal value: f(x) = 0.000000
iterations: 3

=== BFGS on Rosenbrock ===
solution: x = [0.99999999 0.99999998]
optimal value: f(x) = 0.000000
iterations: 18

=== DFP on Rosenbrock ===
solution: x = [0.99999875 0.9999975 ]
optimal value: f(x) = 0.000000
iterations: 25
```
6. Key Takeaways
1. Core differences:
- DFP updates the inverse Hessian directly; BFGS updates the Hessian and inverts it (or, equivalently, updates the inverse directly via the BFGS-D formula);
- BFGS is markedly more numerically stable than DFP and is the de facto standard quasi-Newton method;
- both satisfy the quasi-Newton condition and preserve symmetry and positive definiteness (given $s_k^T y_k > 0$).
2. When to use:
- low- to medium-dimensional unconstrained problems (for high dimensions, consider L-BFGS, which saves memory);
- problems where gradients are cheap but forming or inverting the Hessian is expensive;
- engineering optimization and machine-learning parameter fitting (e.g. logistic regression, SVM).
3. Practical notes:
- the quality of the line search directly affects convergence (the Armijo/Wolfe conditions are recommended; the Wolfe curvature condition also guarantees $s_k^T y_k > 0$);
- if $s_k^T y_k \le 0$, reset the approximation to the identity to preserve positive definiteness;
- the choice of initial point affects efficiency (the closer to the optimum, the faster the convergence).
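As a final sanity check, the results above can be compared against a production BFGS implementation; a sketch using `scipy.optimize.minimize` with `method='BFGS'` on the Rosenbrock test problem:

```python
import numpy as np
from scipy.optimize import minimize

def rosen(x):
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

def rosen_grad(x):
    return np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                     200 * (x[1] - x[0]**2)])

res = minimize(rosen, [-1.2, 1.0], method='BFGS', jac=rosen_grad)
assert np.allclose(res.x, [1.0, 1.0], atol=1e-4)  # agrees with the minimizer (1, 1)
```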