Deriving the Normal Equation in Machine Learning (No Matrix Calculus Required)

Article link: https://blog.youkuaiyun.com/ShadyPi/article/details/122565204

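This article derives the normal equation for linear regression step by step. Solving for the parameters with matrix operations avoids the iterations of gradient descent, and the method is suitable when the number of parameters is small.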


First, the cost function $J(\theta)$ is defined as
$J(\theta)=\frac{1}{2m}\sum_{i=1}^m(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\cdots+\theta_nx_n^{(i)}-y^{(i)})^2$
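As a concrete check of this definition, the following is a minimal sketch that evaluates $J(\theta)$ with plain Python lists. The function name `cost`, the toy data, and the convention that the first feature $x_0$ is always 1 are assumptions made for illustration, not part of the original article.

```python
def cost(theta, X, y):
    """Cost J(theta) = 1/(2m) * sum of squared residuals over m examples."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Hypothesis for one example: theta_0*x_0 + ... + theta_n*x_n
        prediction = sum(t * x for t, x in zip(theta, x_i))
        total += (prediction - y_i) ** 2
    return total / (2 * m)


# Hypothetical toy data: each row is [x_0, x_1], with x_0 = 1 by assumption.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]

j_zero = cost([0.0, 0.0], X, y)   # residuals are -2, -4, -6
j_exact = cost([0.0, 2.0], X, y)  # y = 2 * x_1 fits the data exactly
```

With all parameters at zero the cost is $(4+16+36)/6$, and it drops to 0 for the parameters that fit the data exactly.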
Taking the partial derivative with respect to an arbitrary parameter $\theta_j$ gives
$\frac{\partial J(\theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^m x_j^{(i)}(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\cdots+\theta_nx_n^{(i)}-y^{(i)})$
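The partial-derivative formula can be sanity-checked numerically. The sketch below compares it against a central finite difference of the cost; the helper names (`predict`, `cost`, `partial`, `numeric_partial`), the step size, and the toy data are all hypothetical choices for illustration.

```python
def predict(theta, x_i):
    """Hypothesis for one example: theta_0*x_0 + ... + theta_n*x_n."""
    return sum(t * x for t, x in zip(theta, x_i))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared residuals."""
    m = len(X)
    return sum((predict(theta, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y)) / (2 * m)


def partial(theta, X, y, j):
    """Analytic partial derivative of J with respect to theta_j."""
    m = len(X)
    return sum(x_i[j] * (predict(theta, x_i) - y_i) for x_i, y_i in zip(X, y)) / m


def numeric_partial(theta, X, y, j, eps=1e-6):
    """Central finite-difference approximation of the same derivative."""
    up, down = list(theta), list(theta)
    up[j] += eps
    down[j] -= eps
    return (cost(up, X, y) - cost(down, X, y)) / (2 * eps)


# Hypothetical toy data with x_0 = 1 in every row.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0]
theta = [0.5, -1.0]

analytic = [partial(theta, X, y, j) for j in range(len(theta))]
numeric = [numeric_partial(theta, X, y, j) for j in range(len(theta))]
```

Here the residuals are $-1.5, -4.5, -4.5$, so the analytic partials are $-3.5$ and $-8$, and the finite-difference values agree up to rounding.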
At an extremum every partial derivative equals 0, so
$\frac{1}{m}\sum_{i=1}^m x_j^{(i)}(\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+\cdots+\theta_nx_n^{(i)}-y^{(i)})=0$
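Writing the sum out term by term and expressing it as a matrix product (the constant factor $\frac{1}{m}$ can be dropped because the right-hand side is 0) gives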
$\left[\begin{matrix} x_j^{(1)}&x_j^{(2)}&\cdots&x_j^{(m)} \end{matrix}\right] \left[\begin{matrix} \theta_0x_0^{(1)}+\theta_1x_1^{(1)}+\cdots+\theta_nx_n^{(1)}-y^{(1)}\\ \theta_0x_0^{(2)}+\theta_1x_1^{(2)}+\cdots+\theta_nx_n^{(2)}-y^{(2)}\\ \vdots \\ \theta_0x_0^{(m)}+\theta_1x_1^{(m)}+\cdots+\theta_nx_n^{(m)}-y^{(m)}\\ \end{matrix}\right] =0$
The second factor above (the column vector) can itself be expanded as
$\left[\begin{matrix} \theta_0x_0^{(1)}+\theta_1x_1^{(1)}+\cdots+\theta_nx_n^{(1)}-y^{(1)}\\ \theta_0x_0^{(2)}+\theta_1x_1^{(2)}+\cdots+\theta_nx_n^{(2)}-y^{(2)}\\ \vdots \\ \theta_0x_0^{(m)}+\theta_1x_1^{(m)}+\cdots+\theta_nx_n^{(m)}-y^{(m)}\\ \end{matrix}\right]= \left[\begin{matrix} x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\ x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\ \vdots & \vdots & \ddots &\vdots\\ x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}\\ \end{matrix}\right] \left[\begin{matrix} \theta_0\\ \theta_1\\ \vdots \\ \theta_n\\ \end{matrix}\right]- \left[\begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\\ \end{matrix}\right]$
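To make this expansion tangible, here is a small sketch that builds the residual vector both ways: as a matrix-vector product minus the targets, and example by example. The helper `mat_vec` and the numbers are hypothetical and chosen only for illustration.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical toy data: 3 examples, 2 parameters.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
theta = [0.5, -1.0]
y = [1.0, 3.0, 2.0]

# Matrix form: (feature matrix) * (parameter vector) - (target vector)
residual_matrix = [p - y_i for p, y_i in zip(mat_vec(X, theta), y)]

# Expanded form: one entry per example, written out term by term
residual_expanded = [
    theta[0] * X[i][0] + theta[1] * X[i][1] - y[i] for i in range(len(X))
]
```

Both lists come out as `[-1.5, -4.5, -4.5]` for this data.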
Therefore, for every parameter $\theta_j$ we have
$\left[\begin{matrix} x_j^{(1)}&x_j^{(2)}&\cdots&x_j^{(m)} \end{matrix}\right] \left( \left[\begin{matrix} x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\ x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\ \vdots & \vdots & \ddots &\vdots\\ x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}\\ \end{matrix}\right] \left[\begin{matrix} \theta_0\\ \theta_1\\ \vdots \\ \theta_n\\ \end{matrix}\right]- \left[\begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\\ \end{matrix}\right] \right)=0$
Stacking the equations for all parameters $\theta_0,\theta_1,\dots,\theta_n$ into one system, with one row per parameter, gives
$\left[\begin{matrix} x_0^{(1)}&x_0^{(2)}&\cdots&x_0^{(m)}\\ x_1^{(1)}&x_1^{(2)}&\cdots&x_1^{(m)}\\ \vdots & \vdots & \ddots &\vdots\\ x_n^{(1)}&x_n^{(2)}&\cdots&x_n^{(m)}\\ \end{matrix}\right] \left( \left[\begin{matrix} x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\ x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\ \vdots & \vdots & \ddots &\vdots\\ x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}\\ \end{matrix}\right] \left[\begin{matrix} \theta_0\\ \theta_1\\ \vdots \\ \theta_n\\ \end{matrix}\right]- \left[\begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\\ \end{matrix}\right] \right)=0$
Let
$X=\left[\begin{matrix} x_0^{(1)}&x_1^{(1)}&\cdots&x_n^{(1)}\\ x_0^{(2)}&x_1^{(2)}&\cdots&x_n^{(2)}\\ \vdots & \vdots & \ddots &\vdots\\ x_0^{(m)}&x_1^{(m)}&\cdots&x_n^{(m)}\\ \end{matrix}\right] ,\Theta=\left[\begin{matrix} \theta_0\\ \theta_1\\ \vdots \\ \theta_n\\ \end{matrix}\right], Y=\left[\begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots \\ y^{(m)}\\ \end{matrix}\right]$
Then the system above can be written as
$X^T(X\Theta-Y)=0$
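The compact form can be checked against the per-parameter sums it was built from. In this sketch, entry $j$ of $X^T(X\Theta-Y)$ is compared with $\sum_i x_j^{(i)}\cdot(\text{residual}_i)$; the helpers `transpose` and `mat_vec` and the toy data are hypothetical, and the parameters are deliberately not optimal, so the result is nonzero.

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical toy data: 3 examples, 2 parameters.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
theta = [0.5, -1.0]
y = [1.0, 3.0, 2.0]
m = len(X)

# Residual vector X * Theta - Y
residual = [p - y_i for p, y_i in zip(mat_vec(X, theta), y)]

# Stacked matrix form: X^T (X * Theta - Y)
stacked = mat_vec(transpose(X), residual)

# Per-parameter sums: sum over i of x_j^(i) * residual_i, for each j
per_param = [
    sum(X[i][j] * residual[i] for i in range(m)) for j in range(len(theta))
]
```

Both computations give `[-10.5, -24.0]`, which is $m$ times the vector of partial derivatives at this $\Theta$. The normal equation asks for the $\Theta$ that makes this vector zero.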
Expanding and simplifying (the last step assumes $X^TX$ is invertible):
$\begin{aligned} X^TX\Theta-X^TY&=0\\ X^TX\Theta&=X^TY\\ \Theta&=(X^TX)^{-1}X^TY \end{aligned}$
This gives the Normal Equation
$\Theta=(X^TX)^{-1}X^TY$
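As an illustration of the final formula, the sketch below computes $\Theta$ for a tiny dataset using only the Python standard library. Note one deliberate substitution: rather than forming $(X^TX)^{-1}$ explicitly, it solves the linear system $X^TX\Theta=X^TY$ by Gaussian elimination, which yields the same $\Theta$ when $X^TX$ is invertible. All function names, the singularity tolerance, and the data are hypothetical.

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Matrix product A * B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # tolerance is an arbitrary assumption
            raise ValueError("X^T X is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def normal_equation(X, y):
    """Return Theta satisfying (X^T X) Theta = X^T Y."""
    Xt = transpose(X)
    XtX = mat_mul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical data generated from y = 1 + 2 * x_1, with x_0 = 1 in every row.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
theta = normal_equation(X, y)
```

For this data $X^TX=\left[\begin{matrix}3&6\\6&14\end{matrix}\right]$ and $X^TY=\left[\begin{matrix}15\\34\end{matrix}\right]$, so the recovered parameters are $\theta_0=1$ and $\theta_1=2$. If two feature columns are identical or otherwise linearly dependent, $X^TX$ is singular and the sketch raises an error instead of returning a result.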