Classification problems
- Binary classification
- Multi-class classification
Logistic regression algorithm
Logistic regression model
sigmoid function / logistic function
$$g(z) = \frac{1}{1 + e^{-z}}$$
$$h_\theta(x) = g(\theta^T x)$$
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} = p(y = 1 \mid x; \theta)$$
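A minimal NumPy sketch of the sigmoid and hypothesis (the names `sigmoid` and `hypothesis` are my own):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row of X; returns P(y=1 | x; theta)."""
    return sigmoid(X @ theta)
```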
Decision boundary: $\theta^T x = 0$
- Linear decision boundary
- Nonlinear decision boundary
If $h_\theta(x) \ge 0.5$ (equivalently $\theta^T x \ge 0$), predict $y = 1$; otherwise predict $y = 0$.
The decision boundary is a property not of the training set, but of the hypothesis and of the parameters.
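The threshold rule translates directly into a predictor; a minimal sketch building on the snippet above (the name `predict` is my own):

```python
def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```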
Cost function
Constructing the cost function
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}), y^{(i)}\big)$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \quad \text{if } y = 1$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x)) \quad \text{if } y = 0$$
This cost behaves as desired:
$$\mathrm{Cost}(h_\theta(x), y) = 0 \ \text{ if } h_\theta(x) = y$$
$$\mathrm{Cost}(h_\theta(x), y) \to \infty \ \text{ if } y = 0 \text{ and } h_\theta(x) \to 1$$
$$\mathrm{Cost}(h_\theta(x), y) \to \infty \ \text{ if } y = 1 \text{ and } h_\theta(x) \to 0$$
Combining the two cases into a single expression:
$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
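A NumPy sketch of this cost and gradient, vectorized over the $m$ examples (the names `cost` and `gradient` are mine; builds on the `sigmoid` helper above):

```python
def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[y*log(h) + (1-y)*log(1-h)]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m
```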
Maximum likelihood
$$y = \frac{1}{1 + e^{-z}}$$
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$
$$\ln \frac{y}{1 - y} = \theta^T x$$
That is, the log-odds are linear in $x$:
$$\ln \frac{p(y = 1 \mid x)}{p(y = 0 \mid x)} = \theta^T x$$
$$p(y = 1 \mid x; \theta) = h_\theta(x) = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$
$$p(y = 0 \mid x; \theta) = 1 - h_\theta(x) = \frac{1}{1 + e^{\theta^T x}}$$
The two cases combine into
$$p(y \mid x; \theta) = \big(h_\theta(x)\big)^y \big(1 - h_\theta(x)\big)^{1 - y}$$
$$L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}; \theta\big) = \prod_{i=1}^{m} \big(h_\theta(x^{(i)})\big)^{y^{(i)}} \big(1 - h_\theta(x^{(i)})\big)^{1 - y^{(i)}}$$
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
so $J(\theta) = -\frac{1}{m}\,\ell(\theta)$: maximizing the likelihood is the same as minimizing the cost above.
Since $g'(z) = g(z)(1 - g(z))$, we obtain:
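Writing out the step this identity supplies (the chain rule applied to $J(\theta)$; a standard derivation, included here for completeness):
$$\frac{\partial J(\theta)}{\partial \theta_j} = -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] g'(\theta^T x^{(i)})\, x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$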
Gradient descent
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
feature scaling
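For example, mean normalization in NumPy (one common choice; a minimal sketch, the name `feature_scale` is mine):

```python
def feature_scale(X):
    """Mean-normalize each feature: (x - mean) / std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```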
Vectorization
$$h = g(X\theta)$$
$$J(\theta) = \frac{1}{m} \big( -y^T \log(h) - (1 - y)^T \log(1 - h) \big)$$
$$\mathrm{grad} = \frac{1}{m} X^T (h - y)$$
$$\theta := \theta - \frac{\alpha}{m} X^T \big( g(X\theta) - \vec{y}\, \big)$$
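The vectorized update as a NumPy loop (a sketch; `alpha` and `num_iters` are hyperparameters to choose, and `sigmoid` is the helper defined earlier):

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```

Feature scaling (above) helps this loop converge with a larger `alpha`.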
Advanced optimization algorithms (alternatives to gradient descent):
- Conjugate gradient
- BFGS
- L-BFGS
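These methods choose the step size automatically, so no learning rate needs hand-tuning. One hedged way to use them in Python is SciPy's `minimize` with the `cost`/`gradient` sketches above (assuming `X` and `y` are already defined):

```python
from scipy.optimize import minimize

# L-BFGS needs only the cost and its gradient; no learning rate to tune.
res = minimize(cost, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=gradient, method='L-BFGS-B')
theta_opt = res.x
```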
Multi-class classification
One-vs-all classification
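A sketch of one-vs-all on top of the earlier helpers: train one binary classifier per class $c$, relabeling the data each time, then predict the class whose classifier is most confident (all names are mine):

```python
def one_vs_all(X, y, num_labels, alpha=0.1, num_iters=1000):
    """Train one logistic regression classifier per class c = 0..num_labels-1."""
    n = X.shape[1]
    all_theta = np.zeros((num_labels, n))
    for c in range(num_labels):
        # Relabel: 1 for class c, 0 for everything else.
        yc = (y == c).astype(float)
        all_theta[c] = gradient_descent(X, yc, np.zeros(n), alpha, num_iters)
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Pick the class with the highest h_theta(x)."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```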
Overfitting
- Underfitting / high bias
- Overfitting / high variance
1) Reduce the number of features:
- Manually select which features to keep.
- Use a model selection algorithm (studied later in the course).
2) Regularization
- Keep all the features, but reduce the magnitude of parameters θj.
- Regularization works well when we have a lot of slightly useful features.
$$\min_\theta \ \frac{1}{2m} \left[ \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
Regularized linear regression
Regularized gradient descent
$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
For $j \ge 1$ (the intercept $\theta_0$ is not regularized):
$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
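As a sketch, the regularized update for linear regression in NumPy, assuming the first column of `X` is the intercept column of ones so `theta[0]` is left unpenalized (names are mine):

```python
def ridge_gradient_descent(X, y, theta, alpha, lam, num_iters):
    """Regularized linear regression: shrink theta_j (j >= 1) each step."""
    m = len(y)
    for _ in range(num_iters):
        h = X @ theta                      # linear hypothesis h = X theta
        grad = (X.T @ (h - y)) / m
        grad[1:] += (lam / m) * theta[1:]  # do not regularize theta_0
        theta = theta - alpha * grad
    return theta
```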
Regularized normal equation
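This heading presumably refers to the closed-form solution from the same course material: add $\lambda L$ inside the inverse, where $L$ is the $(n+1) \times (n+1)$ identity matrix with its top-left entry zeroed so that $\theta_0$ is not penalized (this also makes the matrix invertible even when $m \le n$):
$$\theta = \big( X^T X + \lambda L \big)^{-1} X^T y, \qquad L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$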
Regularized logistic regression
Regularized gradient descent
Without regularization, the cost is the one derived earlier:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
Adding the regularization term (which again excludes $\theta_0$):
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)} \quad \text{for } j = 0$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{for } j \ge 1$$
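Both cases together in NumPy, as a sketch reusing `sigmoid` (again assuming column 0 of `X` is the intercept):

```python
def cost_reg(theta, X, y, lam):
    """Regularized logistic cost: J(theta) + (lambda/2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + penalty

def gradient_reg(theta, X, y, lam):
    """Gradient with (lambda/m) * theta_j added for j >= 1 only."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad
```

These two functions drop straight into the `scipy.optimize.minimize` call shown earlier, with `args=(X, y, lam)`.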