Subscripts and Notation
| Notation | Size $x_{1}$ | Number of bedrooms $x_{2}$ | Number of floors $x_{3}$ | Years $x_{4}$ | Price $y$ |
|---|---|---|---|---|---|
| $x^{(1)}$ = $1^{st}$ training example | 2104 | 5 | 1 | 10 | 460 |
| $x^{(2)}$ = $2^{nd}$ training example | 1416 | 3 ($x^{(2)}_{2}$) | 2 | 8 | 232 |
| $x^{(3)}$ = $3^{rd}$ training example | 1534 | 3 | 2 | 5 | 315 |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |
$n$ = number of features = 4
$x^{(i)}$ = input features of the $i^{th}$ training example; here a $4\times1$ vector, defined as a column vector
$$x^{(2)} = \left( \begin{matrix} 1416 \\ 3 \\ 2 \\ 8 \\ \end{matrix} \right)$$
$x^{(i)}_{j}$ = value of feature $j$ in the $i^{th}$ training example; a scalar
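As a sketch, the table above can be held in a NumPy array, which makes the indexing notation concrete (the variable names are illustrative):

```python
import numpy as np

# Rows are training examples; columns are size, bedrooms, floors, years.
X = np.array([
    [2104, 5, 1, 10],
    [1416, 3, 2,  8],
    [1534, 3, 2,  5],
])
y = np.array([460, 232, 315])  # price

m, n = X.shape      # m = number of examples, n = number of features = 4
x_2 = X[1]          # x^{(2)}: the 2nd training example (0-indexed row 1)
x_2_2 = X[1, 1]     # x^{(2)}_2: feature 2 of the 2nd example -> 3
```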
Multivariate Representation
$h_{\theta}(x)$ (the hypothesis) $= \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$
Define: $x_0 = 1$
$$x = \left( \begin{matrix} x_0 \\ x_1 \\ \vdots \\ x_n \\ \end{matrix} \right)$$
$$\theta = \left( \begin{matrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \\ \end{matrix} \right)$$
$$h_{\theta}(x) = \theta^T \cdot x = \left( \begin{matrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{matrix} \right) \cdot \left( \begin{matrix} x_0 \\ x_1 \\ \vdots \\ x_n \\ \end{matrix} \right)$$
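This inner product is a one-liner in NumPy. A minimal sketch, with $x_0 = 1$ prepended to the 2nd training example from the table (the $\theta$ values here are made up for illustration):

```python
import numpy as np

theta = np.array([80.0, 0.1, 10.0, -5.0, 2.0])  # theta_0 .. theta_4, illustrative values
x = np.array([1.0, 1416, 3, 2, 8])              # x_0 = 1 prepended to x^{(2)}

h = theta @ x  # h_theta(x) = theta^T x; same as np.dot(theta, x)
print(h)       # ~ 257.6
```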
Gradient Descent
Hypothesis: $h_{\theta}(x) = \theta^T \cdot x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$
Parameters: $\theta$, which is an $(n+1) \times 1$ vector
Cost function: $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2$
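The cost function vectorizes cleanly: stack the examples into a matrix $X$ with a column of ones for $x_0$, and the sum of squared residuals is a single inner product. A sketch with illustrative data:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2m) * sum((h_theta(x^(i)) - y^(i))^2); X has shape (m, n+1)."""
    m = len(y)
    residuals = X @ theta - y          # h_theta(x^{(i)}) - y^{(i)} for all i at once
    return (residuals @ residuals) / (2 * m)

# Tiny check: with theta = 0, every residual is -y^{(i)}, so J = sum(y^2) / (2m).
X = np.array([[1.0, 2104], [1.0, 1416], [1.0, 1534]])  # column of ones = x_0
y = np.array([460.0, 232.0, 315.0])
theta = np.zeros(2)
print(cost(theta, X, y))
```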
Gradient Descent:
Repeat {
$\qquad \theta_j := \theta_j - \alpha\frac{\partial}{\partial \theta_j}J(\theta)$ (update all $j$ simultaneously)
}
So,
For $j = 0$:
$$\frac{\partial}{\partial \theta_0}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(h_{\theta}(x^{(i)})-y^{(i)}\right)}{\partial \theta_0}$$

$$=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_0}$$

$$=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$

$$=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \quad \text{(since } x_0^{(i)} = 1\text{)}$$
For $j = 1$:
$$\frac{\partial}{\partial \theta_1}J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(h_{\theta}(x^{(i)})-y^{(i)}\right)}{\partial \theta_1}$$

$$=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)\frac{\partial\left(\theta_0 x_0^{(i)} + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} + \cdots\right)}{\partial \theta_1}$$

$$=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_1^{(i)}$$
So:
Repeat {
$\qquad \theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$
}
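Putting the update rule together, a minimal batch gradient descent sketch. The data here is synthetic with a known answer ($y = 3 + 2x_1$), and the learning rate and iteration count are arbitrary choices for this toy problem:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent; X has shape (m, n+1) with x_0 = 1 in column 0."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum((h - y) * x_j) for every j at once
        theta = theta - alpha * grad       # simultaneous update of all theta_j
    return theta

# Synthetic data generated from y = 3 + 2*x1, so theta should converge to [3, 2].
x1 = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x1), x1])  # prepend x_0 = 1
y = 3 + 2 * x1

theta = gradient_descent(X, y)
print(theta)  # ~ [3. 2.]
```

Note that `X.T @ (X @ theta - y)` computes the gradient for every $j$ in one matrix product, which is exactly the per-component sum in the update rule above.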