1. ILQR
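The ILQR algorithm optimizes around a nominal trajectory $(\tilde{x}, \tilde{u})$. Instead of solving for the states and controls directly, ILQR solves for the increment sequences $(\delta x^*, \delta u^*)$ of the state and control variables, which gives a locally optimal trajectory.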
1.1 Unconstrained trajectory optimization problem
$$
\begin{split}
& x^*, u^* = \mathop{\arg\min}\limits_{x, u} \left[ \sum_{k=0}^{N-1} L^k(x_k, u_k) + L_F(x_N) \right] \\
& \text{s.t.} \quad \left\{\begin{aligned} x_{k+1} &= f(x_k, u_k) \\ x_0 &= x_{start} \end{aligned}\right.
\end{split}
$$
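To make the objective concrete, here is a minimal Python sketch that rolls out the dynamics from $x_{start}$ and evaluates the total cost for a given control sequence. The function names (`rollout`, `total_cost`), the double-integrator dynamics, and the particular cost terms are illustrative assumptions and do not come from the original text.

```python
import numpy as np

def rollout(f, x_start, us):
    """Simulate x_{k+1} = f(x_k, u_k) starting from x_0 = x_start."""
    xs = [np.asarray(x_start, dtype=float)]
    for u in us:
        xs.append(f(xs[-1], u))
    return np.array(xs)  # shape (N + 1, n)

def total_cost(stage_cost, final_cost, xs, us):
    """Evaluate sum_{k=0}^{N-1} L^k(x_k, u_k) + L_F(x_N)."""
    running = sum(stage_cost(k, x, u)
                  for k, (x, u) in enumerate(zip(xs[:-1], us)))
    return running + final_cost(xs[-1])

# Hypothetical example: a double integrator with time step dt.
dt = 0.1

def f(x, u):
    return np.array([x[0] + dt * x[1], x[1] + dt * u[0]])

def stage_cost(k, x, u):
    return 0.5 * float(u @ u)   # assumed control-effort penalty

def final_cost(x):
    return 0.5 * float(x @ x)   # assumed terminal penalty

us = np.zeros((10, 1))                      # N = 10 zero controls
xs = rollout(f, np.array([1.0, 0.0]), us)   # state stays at [1, 0]
J = total_cost(stage_cost, final_cost, xs, us)
```

With zero controls and zero initial velocity the state never moves, so only the terminal term contributes to `J`.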
1.2 Backward Pass
$V^k$ is the optimal cost-to-go at step $k$ ($k \in [0, N]$). By the Bellman equation:
$$
\left\{\begin{aligned}
V^N(x_N) &= L_F(x_N) \\
V^k(x_k) &= \min\limits_{u_k} \left[ L^k(x_k, u_k) + V^{k+1}(f(x_k, u_k)) \right]
\end{aligned}\right.
$$
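As a small illustration of the backward recursion, consider a scalar linear-quadratic problem, for which the Bellman equation can be solved in closed form. This is a sketch under assumptions that are not in the original text: dynamics $x_{k+1} = a x_k + b u_k$, stage cost $q x^2 + r u^2$, terminal cost $q_f x^2$, and the hypothetical function name `scalar_lq_cost_to_go`. In this case $V^k(x) = p_k x^2$, and minimizing over $u_k$ gives $p_k = q + a^2 p_{k+1} r / (r + b^2 p_{k+1})$.

```python
def scalar_lq_cost_to_go(a, b, q, r, q_f, N):
    """Return p[0..N] such that V^k(x) = p[k] * x**2.

    Assumed problem: x_{k+1} = a*x + b*u, L = q*x^2 + r*u^2, L_F = q_f*x^2.
    """
    p = [0.0] * (N + 1)
    p[N] = q_f                       # terminal condition V^N(x_N) = L_F(x_N)
    for k in range(N - 1, -1, -1):   # sweep backward from N-1 down to 0
        p_next = p[k + 1]
        # Closed-form minimum over u of q*x^2 + r*u^2 + p_next*(a*x + b*u)^2
        p[k] = q + a * a * p_next * r / (r + b * b * p_next)
    return p

p = scalar_lq_cost_to_go(a=1.0, b=1.0, q=1.0, r=1.0, q_f=1.0, N=3)
```

ILQR applies the same backward sweep to a nonlinear problem, with the exact minimization replaced by a local quadratic approximation around the nominal trajectory.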
Based on the right-hand side of the Bellman equation, define the perturbation as:
$$
P^k(\delta x, \delta u) \triangleq L^k(\tilde{x}_k + \delta x_k, \tilde{u}_k + \delta u_k) - L^k(\tilde{x}_k, \tilde{u}_k) + V^{k+1}(f(\tilde{x}_k + \delta x_k, \tilde{u}_k + \delta u_k)) - V^{k+1}(f(\tilde{x}_k, \tilde{u}_k))
$$
Here $P^k(0, 0) = 0$. Taking a second-order Taylor expansion:
$$
\begin{aligned}
P^k(\delta x, \delta u) &\approx \frac{1}{2}
\begin{bmatrix} 1 \\ \delta x \\ \delta u \end{bmatrix}^T
\begin{bmatrix}
0 & (P_x^k)^T & (P_u^k)^T \\
P_x^k & P_{xx}^k & P_{xu}^k \\
P_u^k & P_{ux}^k & P_{uu}^k
\end{bmatrix}
\begin{bmatrix} 1 \\ \delta x \\ \delta u \end{bmatrix} \\
&= \frac{1}{2} \left( \delta x^T P_x^k + \delta u^T P_u^k + (P_x^k)^T \delta x + \delta x^T P_{xx}^k \delta x + \delta u^T P_{ux}^k \delta x + (P_u^k)^T \delta u + \delta x^T P_{xu}^k \delta u + \delta u^T P_{uu}^k \delta u \right)
\end{aligned}
$$
Differentiating with respect to $\delta u$ and using $(P_{xu}^k)^T = P_{ux}^k$:

$$
\frac{\partial P^k}{\partial (\delta u)} = \frac{1}{2} \left( P_u^k + P_{ux}^k \delta x + P_u^k + (P_{xu}^k)^T \delta x + 2 P_{uu}^k \delta u \right) = P_u^k + P_{ux}^k \delta x + P_{uu}^k \delta u
$$
$P^k$ is a standard quadratic form, so, provided $P_{uu}^k$ is positive definite, it attains its minimum where the first-order condition holds. Setting $\frac{\partial P^k}{\partial (\delta u)} = P_u^k + P_{ux}^k \delta x + P_{uu}^k \delta u = 0$
gives $\delta u = -(P_{uu}^k)^{-1} (P_u^k + P_{ux}^k \delta x)$
where:
$$
\begin{cases}
\begin{aligned}
P_{x}^k &= L_{x}^k + f_{x}^T V_{x}^{k+1} \\
P_{u}^k &= L_{u}^k + f_{u}^T V_{x}^{k+1} \\
P_{xx}^k &= L_{xx}^k + f_{x}^T V_{xx}^{k+1} f_{x} + V_{x}^{k+1} f_{xx} \\
P_{uu}^k &= L_{uu}^k + f_{u}^T V_{xx}^{k+1} f_{u} + V_{x}^{k+1} f_{uu} \\
P_{ux}^k &= L_{ux}^k + f_{u}^T V_{xx}^{k+1} f_{x} + V_{x}^{k+1} f_{ux}
\end{aligned}
\end{cases}
$$
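The expansion terms and the resulting control update can be sketched in NumPy as a single backward step. This is a minimal sketch, not a full implementation: it assumes the ILQR approximation in which the second-order dynamics terms ($f_{xx}$, $f_{uu}$, $f_{ux}$) are dropped, and the names `backward_step`, `d` (feedforward term) and `K` (feedback gain) are illustrative choices that do not appear in the original text. The update is written as $\delta u = d + K \delta x$ with $d = -(P_{uu}^k)^{-1} P_u^k$ and $K = -(P_{uu}^k)^{-1} P_{ux}^k$.

```python
import numpy as np

def backward_step(L_x, L_u, L_xx, L_uu, L_ux, f_x, f_u, V_x, V_xx):
    """One ILQR backward step at time k (V_x, V_xx belong to step k+1).

    Assumption: second-order dynamics terms f_xx, f_uu, f_ux are ignored.
    """
    P_x = L_x + f_x.T @ V_x
    P_u = L_u + f_u.T @ V_x
    P_xx = L_xx + f_x.T @ V_xx @ f_x
    P_uu = L_uu + f_u.T @ V_xx @ f_u
    P_ux = L_ux + f_u.T @ V_xx @ f_x
    # delta_u = d + K @ delta_x; solve linear systems instead of inverting P_uu
    d = -np.linalg.solve(P_uu, P_u)
    K = -np.linalg.solve(P_uu, P_ux)
    return (P_x, P_u, P_xx, P_uu, P_ux), d, K

# Hypothetical numbers: 2 states, 1 control.
f_x = np.array([[1.0, 0.1], [0.0, 1.0]])
f_u = np.array([[0.0], [0.1]])
terms, d, K = backward_step(
    L_x=np.zeros(2), L_u=np.array([0.2]),
    L_xx=np.zeros((2, 2)), L_uu=np.eye(1), L_ux=np.zeros((1, 2)),
    f_x=f_x, f_u=f_u,
    V_x=np.array([1.0, 0.0]), V_xx=np.eye(2),
)
```

Using `np.linalg.solve` avoids forming the explicit inverse of $P_{uu}^k$. Practical implementations usually also regularize $P_{uu}^k$ to keep it positive definite, which this sketch omits.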
Since $V^{k+1}(x_{k+1}) = V^{k+1}(f^{k}(x, u))$, the chain rule means every derivative of $V$ is taken with respect to $x$. ILQR treats the second derivatives of the dynamics as $0$, i.e. it neglects $f_{xx}, f_{uu}, f_{ux}$. In DDP, the second