Linear Regression with Multiple Variables
Multiple Features
Notation
$n$ = number of features.
$x^{(i)}$ = input (features) of the $i$th training example.
$x_j^{(i)}$ = value of feature $j$ in the $i$th training example.
Hypothesis
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$

Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

For notational convenience, define $x_0 = 1$ (i.e. $x_0^{(i)} = 1$), so that

$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

$h_\theta(x) = \begin{bmatrix} \theta_0 & \theta_1 & \theta_2 & \dots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x$

So the hypothesis can be written:

$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$
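As a quick illustration, here is a minimal NumPy sketch of computing $h_\theta(x) = \theta^T x$ for a whole training set at once; the feature values and $\theta$ below are made up for illustration, not from the notes:

```python
import numpy as np

# Made-up data: m = 3 training examples, n = 2 features.
# The leading column of ones is the x0 = 1 term defined above.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 4.0]])
theta = np.array([80.0, 0.1, 50.0])   # [theta_0, theta_1, theta_2]

# h_theta(x) = theta^T x for every example, as one matrix-vector product
predictions = X @ theta
print(predictions)   # one prediction per training example
```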
Gradient Descent for Multiple Variables
Find the parameters $\theta$ that minimize the cost function:
repeat until convergence: {

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_0^{(i)}$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_1^{(i)}$

$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_2^{(i)}$

$\dots$

}
More compactly:
repeat until convergence: {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_j^{(i)}$  for $j := 0 \dots n$

}
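Below is a minimal NumPy sketch of this simultaneous update rule; the function name `gradient_descent` and the default values of `alpha` and `num_iters` are my own choices for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i),
    applied to all j = 0..n simultaneously."""
    m, n = X.shape            # X already includes the x0 = 1 column
    theta = np.zeros(n)
    for _ in range(num_iters):
        errors = X @ theta - y             # h_theta(x^(i)) - y^(i), shape (m,)
        gradient = (X.T @ errors) / m      # one component per feature j
        theta = theta - alpha * gradient   # simultaneous update of all theta_j
    return theta
```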
Gradient Descent in Practice (Feature Scaling)
In general, when the features are on similar scales, gradient descent takes a more direct path to the optimum and converges faster.
Feature scaling, or mean normalization:
$x_i := \frac{x_i - \mu_i}{s_i}$
where $\mu_i$ is the mean of feature $i$ and $s_i$ is its range (max $-$ min).
For example:
If $x_i$ represents house prices in the range 100-2000 with a mean of 1000, then the price input is rescaled as:
$x_i := \frac{\text{price} - 1000}{1900}$
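A minimal sketch of mean normalization in NumPy, assuming each column of `X` is one feature (the helper name `mean_normalize` is mine); the $x_0 = 1$ column would only be added after scaling:

```python
import numpy as np

def mean_normalize(X):
    """x_i := (x_i - mu_i) / s_i, column-wise: mu_i is the mean of
    feature i and s_i its range (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s

# Keep mu and s: inputs at prediction time must be scaled with the
# same statistics that were computed on the training set.
X_scaled, mu, s = mean_normalize(np.array([[2104.0, 3.0],
                                           [1600.0, 3.0],
                                           [2400.0, 4.0]]))
```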
Gradient Descent in Practice (Learning Rate)
Goals:

Gradient descent:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate $\alpha$.
As a rule of thumb, declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.
To summarize how the choice of $\alpha$ plays out:
- If $\alpha$ is too small: convergence is slow.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration, and may not converge at all.
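As one way to put this debugging advice into practice, here is a sketch that records $J(\theta)$ each iteration and applies the $10^{-3}$ rule of thumb above; the function names and the `tol`/`num_iters` defaults are my own assumptions:

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def debug_gradient_descent(X, y, alpha, num_iters=400, tol=1e-3):
    """Track J(theta) per iteration and stop once it drops by less
    than tol; an increasing J is a sign that alpha is too large."""
    m, n = X.shape
    theta = np.zeros(n)
    history = [cost(X, y, theta)]
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        history.append(cost(X, y, theta))
        if history[-2] - history[-1] < tol:   # the ~1e-3 convergence test
            break
    return theta, history

# Typical use: try alpha on a log scale (0.001, 0.01, 0.1, ...) and plot
# `history`; keep the largest alpha whose curve still decreases smoothly.
```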