Gradient Descent: Mathematical Derivation for Multiple Variables

This post works through the mathematical formulation of multivariate linear regression and its parameter optimization method, gradient descent. Using a concrete example, it explains the subscript and superscript notation for features and training examples, shows how to write the multivariate model in vector form for convenient computation, and derives how gradient descent works, including the iterative parameter update rule and the cost function.


Notation and Subscripts

| Notation | Size $x_1$ | Number of bedrooms $x_2$ | Number of floors $x_3$ | Years $x_4$ | Price $y$ |
| --- | --- | --- | --- | --- | --- |
| $x^{(1)}$ = $1^{st}$ training example | 2104 | 5 | 1 | 10 | 460 |
| $x^{(2)}$ = $2^{nd}$ training example | 1416 | 3 ($x^{(2)}_{2}$) | 2 | 8 | 232 |
| $x^{(3)}$ = $3^{rd}$ training example | 1534 | 3 | 2 | 5 | 315 |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |

$n$ = number of features = 4
$x^{(i)}$ = input of the $i^{th}$ training example, i.e. the $i$-th training sample, defined as a $4\times1$ column vector, e.g.
$$x^{(2)} = \begin{pmatrix} 1416 \\ 3 \\ 2 \\ 8 \end{pmatrix}$$
$x^{(i)}_{j}$ = value of feature $j$ in the $i^{th}$ training example, a scalar
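To make the notation concrete, here is a minimal sketch of the same table in NumPy; the array names `X` and `y` and the 0-based indexing are assumptions of this sketch, not part of the original post.

```python
import numpy as np

# One row per training example, one column per feature
# (Size, bedrooms, floors, years), matching the table above.
X = np.array([
    [2104, 5, 1, 10],
    [1416, 3, 2,  8],
    [1534, 3, 2,  5],
], dtype=float)
y = np.array([460, 232, 315], dtype=float)

m, n = X.shape        # m = 3 training examples, n = 4 features

x_2 = X[1]            # x^(2): the 2nd training example (0-based row 1)
x_2_2 = X[1, 1]       # x^(2)_2 = 3.0: feature 2 of the 2nd example
print(x_2, x_2_2)
```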

Multivariate Representation

The hypothesis is
$$h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$$
Define $x_0 = 1$ and collect the features and parameters into column vectors:
$$x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}$$
Then the hypothesis can be written as an inner product:
$$h_{\theta}(x) = \theta^T x = \begin{pmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}$$
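A minimal sketch of this vectorized hypothesis, assuming NumPy; the particular values of `theta` and `x` below are illustrative only.

```python
import numpy as np

def hypothesis(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return theta @ x

# Illustrative parameters for n = 4 features (theta is an (n+1)-vector).
theta = np.array([80.0, 0.1, 10.0, -5.0, -2.0])

# The 2nd training example from the table, with x_0 = 1 prepended.
x = np.array([1.0, 1416.0, 3.0, 2.0, 8.0])

print(hypothesis(theta, x))   # 80 + 0.1*1416 + 10*3 - 5*2 - 2*8 = 225.6
```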

Gradient Descent

Hypothesis: $h_{\theta}(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$
Parameters: $\theta$, an $(n+1) \times 1$ vector
Cost function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2$ (see the numeric sketch after the update rule below)
Gradient Descent:

Repeat {
$\qquad \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
}
(updating all $\theta_j$, $j = 0, \dots, n$, simultaneously)
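As a quick check of the cost function $J(\theta)$ defined above, here is a minimal NumPy sketch; the toy data and the choice $\theta = 0$ are illustrative assumptions.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/(2m)) * sum_i (h_theta(x^(i)) - y^(i))^2.
    X must already contain the x_0 = 1 column."""
    m = len(y)
    errors = X @ theta - y             # h_theta(x^(i)) - y^(i) for every i
    return (errors @ errors) / (2 * m)

# Toy data: the three examples from the table, with x_0 = 1 prepended.
X = np.array([[1.0, 2104, 5, 1, 10],
              [1.0, 1416, 3, 2,  8],
              [1.0, 1534, 3, 2,  5]])
y = np.array([460.0, 232.0, 315.0])
theta = np.zeros(X.shape[1])

print(cost(theta, X, y))   # with theta = 0 this is sum(y^2) / (2m) ≈ 60774.8
```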
So, for $j = 0$:

$$\begin{aligned}
\frac{\partial}{\partial \theta_0} J(\theta) &= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(h_{\theta}(x^{(i)})-y^{(i)}\right)}{\partial \theta_0} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_0} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_0^{(i)} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \qquad \text{(since } x_0^{(i)} = 1\text{)}
\end{aligned}$$

For $j = 1$:

$$\begin{aligned}
\frac{\partial}{\partial \theta_1} J(\theta) &= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(h_{\theta}(x^{(i)})-y^{(i)}\right)}{\partial \theta_1} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_1} \\
&= \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_1^{(i)}
\end{aligned}$$
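To sanity-check the derived partial derivatives, the vectorized gradient $\frac{1}{m}X^T(X\theta - y)$ can be compared against a finite-difference approximation; this is a small illustrative sketch assuming NumPy, with made-up data.

```python
import numpy as np

def cost(theta, X, y):
    m = len(y)
    e = X @ theta - y
    return (e @ e) / (2 * m)

def analytic_grad(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for all j at once."""
    return X.T @ (X @ theta - y) / len(y)

# Made-up data with a leading x_0 = 1 column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0])
theta = np.array([0.5, -0.5, 1.0])

# Central finite differences, one parameter at a time.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * np.eye(3)[j], X, y) - cost(theta - eps * np.eye(3)[j], X, y)) / (2 * eps)
    for j in range(3)
])

print(np.allclose(analytic_grad(theta, X, y), numeric, atol=1e-4))   # True
```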

So:

Repeat {
$\qquad \theta_j := \theta_j - \alpha \dfrac{1}{m}\displaystyle\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$
}
(for every $j = 0, 1, \dots, n$, updated simultaneously)
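Putting the update rule together, here is a minimal batch gradient descent sketch in NumPy; the synthetic data, learning rate, and iteration count are illustrative assumptions, not values from the original post.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, iterations=5000):
    """Batch gradient descent for multivariate linear regression.
    X must include the x_0 = 1 column; returns the learned theta."""
    m, k = X.shape                          # k = n + 1 parameters
    theta = np.zeros(k)
    for _ in range(iterations):
        errors = X @ theta - y              # h_theta(x^(i)) - y^(i)
        grad = X.T @ errors / m             # (1/m) * sum_i error_i * x_j^(i)
        theta = theta - alpha * grad        # simultaneous update of every theta_j
    return theta

# Synthetic data generated from theta_0 = 2, theta_1 = 3 plus small noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, size=100)
X = np.column_stack([np.ones_like(x1), x1])
y = 2.0 + 3.0 * x1 + rng.normal(scale=0.01, size=100)

print(gradient_descent(X, y))               # approximately [2.0, 3.0]
```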
