Gradient Descent: Mathematical Derivation for Multiple Variables

This post works through the mathematical formulation of multivariate linear regression and its parameter optimization method, gradient descent. Using a concrete example, it explains the subscript and superscript notation for features and training examples, shows how to write the multivariate model in vector form for convenient computation, and derives how gradient descent works, including the iterative parameter update rule and the cost function.


Notation and Subscripts

| Notation | Size $x_1$ | Number of bedrooms $x_2$ | Number of floors $x_3$ | Years $x_4$ | Price $y$ |
| --- | --- | --- | --- | --- | --- |
| $x^{(1)}$ = $1^{st}$ training example | 2104 | 5 | 1 | 10 | 460 |
| $x^{(2)}$ = $2^{nd}$ training example | 1416 | 3 ($x^{(2)}_{2}$) | 2 | 8 | 232 |
| $x^{(3)}$ = $3^{rd}$ training example | 1534 | 3 | 2 | 5 | 315 |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |

$n$ = number of features = 4
$x^{(i)}$ = input of the $i^{th}$ training example, i.e. the $i$-th training sample, defined as a $4\times1$ column vector, e.g.
$$x^{(2)} = \begin{pmatrix} 1416 \\ 3 \\ 2 \\ 8 \end{pmatrix}$$
$x^{(i)}_{j}$ = value of feature $j$ in the $i^{th}$ training example, a scalar
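To make the notation concrete, here is a minimal sketch of the same table in NumPy; the array names `X` and `y` and the 0-based indexing are assumptions of this sketch, not part of the original post.

```python
import numpy as np

# One row per training example, one column per feature
# (Size, bedrooms, floors, years), matching the table above.
X = np.array([
    [2104, 5, 1, 10],
    [1416, 3, 2,  8],
    [1534, 3, 2,  5],
], dtype=float)
y = np.array([460, 232, 315], dtype=float)

m, n = X.shape        # m = 3 training examples, n = 4 features

x_2 = X[1]            # x^(2): the 2nd training example (0-based row 1)
x_2_2 = X[1, 1]       # x^(2)_2 = 3.0: feature 2 of the 2nd example
print(x_2, x_2_2)
```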

Multivariate Representation

The hypothesis is
$$h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$
Define $x_0 = 1$ and collect the features and parameters into column vectors:
$$x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}$$
Then the hypothesis can be written as an inner product:
$$h_{\theta}(x) = \theta^T x = \begin{pmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
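A minimal sketch of this vectorized hypothesis, assuming NumPy; the particular values of `theta` and `x` below are illustrative only.

```python
import numpy as np

def hypothesis(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return theta @ x

# Illustrative parameters for n = 4 features (theta is an (n+1)-vector).
theta = np.array([80.0, 0.1, 10.0, -5.0, -2.0])

# The 2nd training example from the table, with x_0 = 1 prepended.
x = np.array([1.0, 1416.0, 3.0, 2.0, 8.0])

print(hypothesis(theta, x))   # 80 + 0.1*1416 + 10*3 - 5*2 - 2*8 = 225.6
```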

Gradient Descent

Hypothesis: $h_{\theta}(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$
Parameters: $\theta$, an $(n+1) \times 1$ vector
Cost function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2$ (see the numeric sketch after the update rule below)
Gradient Descent:

Repeat {
$\qquad \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
}
(updating all $\theta_j$, $j = 0, \dots, n$, simultaneously)
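As a quick check of the cost function $J(\theta)$ defined above, here is a minimal NumPy sketch; the toy data and the choice $\theta = 0$ are illustrative assumptions.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/(2m)) * sum_i (h_theta(x^(i)) - y^(i))^2.
    X must already contain the x_0 = 1 column."""
    m = len(y)
    errors = X @ theta - y             # h_theta(x^(i)) - y^(i) for every i
    return (errors @ errors) / (2 * m)

# Toy data: the three examples from the table, with x_0 = 1 prepended.
X = np.array([[1.0, 2104, 5, 1, 10],
              [1.0, 1416, 3, 2,  8],
              [1.0, 1534, 3, 2,  5]])
y = np.array([460.0, 232.0, 315.0])
theta = np.zeros(X.shape[1])

print(cost(theta, X, y))   # with theta = 0 this is sum(y^2) / (2m) ≈ 60774.8
```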
So, for $j = 0$:

$$\begin{aligned}
\frac{\partial}{\partial \theta_0} J(\theta) &= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(h_{\theta}(x^{(i)})-y^{(i)}\right)}{\partial \theta_0} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_0} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_0^{(i)} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \qquad \text{(since } x_0^{(i)} = 1\text{)}
\end{aligned}$$

For $j = 1$:

$$\begin{aligned}
\frac{\partial}{\partial \theta_1} J(\theta) &= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(h_{\theta}(x^{(i)})-y^{(i)}\right)}{\partial \theta_1} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_1} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_1^{(i)}
\end{aligned}$$
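To sanity-check the derived partial derivatives, the vectorized gradient $\frac{1}{m}X^T(X\theta - y)$ can be compared against a finite-difference approximation; this is a small illustrative sketch assuming NumPy, with made-up data.

```python
import numpy as np

def cost(theta, X, y):
    m = len(y)
    e = X @ theta - y
    return (e @ e) / (2 * m)

def analytic_grad(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for all j at once."""
    return X.T @ (X @ theta - y) / len(y)

# Made-up data with a leading x_0 = 1 column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])
theta = np.array([0.5, -0.5, 1.0])

# Central finite differences, one parameter at a time.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * np.eye(3)[j], X, y) - cost(theta - eps * np.eye(3)[j], X, y)) / (2 * eps)
    for j in range(3)
])

print(np.allclose(analytic_grad(theta, X, y), numeric, atol=1e-4))   # True
```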

So:

Repeat {
$\qquad \theta_j := \theta_j - \alpha \dfrac{1}{m}\displaystyle\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$
}
(for every $j = 0, 1, \dots, n$, updated simultaneously)
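Putting the update rule together, here is a minimal batch gradient descent sketch in NumPy; the synthetic data, learning rate, and iteration count are illustrative assumptions, not values from the original post.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, iterations=5000):
    """Batch gradient descent for multivariate linear regression.
    X must include the x_0 = 1 column; returns the learned theta."""
    m, k = X.shape                          # k = n + 1 parameters
    theta = np.zeros(k)
    for _ in range(iterations):
        errors = X @ theta - y              # h_theta(x^(i)) - y^(i)
        grad = X.T @ errors / m             # (1/m) * sum_i error_i * x_j^(i)
        theta = theta - alpha * grad        # simultaneous update of every theta_j
    return theta

# Synthetic data generated from theta_0 = 2, theta_1 = 3 plus small noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, size=100)
X = np.column_stack([np.ones_like(x1), x1])
y = 2.0 + 3.0 * x1 + rng.normal(scale=0.01, size=100)

print(gradient_descent(X, y))               # approximately [2.0, 3.0]
```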
