Andrew Ng Machine Learning, Chapter 7: Logistic Regression

Classification

Classification is the main application of logistic regression: the target $y$ takes discrete values, e.g. $y \in \{0, 1\}$ in the binary case.

Hypothesis Representation

Logistic/Sigmoid function

$$g(z) = \frac{1}{1+e^{-z}}, \qquad h_\theta(x) = g(\theta^T x)$$
$$h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$$

Interpretation of hypothesis

$h_\theta(x)$ = estimated probability that $y = 1$ on input $x$, i.e. $h_\theta(x) = P(y = 1 \mid x; \theta)$.
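
As a quick illustration, here is a minimal NumPy sketch of the sigmoid hypothesis (the function names and example values are mine, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}); works elementwise on arrays."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): estimated probability that y = 1 given x."""
    return sigmoid(np.dot(theta, x))

# Example: with theta = [0, 1] and x = [1, 2] (bias term x0 = 1),
# theta^T x = 2 and h_theta(x) = g(2) ≈ 0.88.
theta = np.array([0.0, 1.0])
x = np.array([1.0, 2.0])
print(hypothesis(theta, x))
```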

Decision Boundary

The decision boundary is a property of the hypothesis (of the parameters $\theta$), not of the data set.
Predict:

  • " y=1 " if h θ ( x ) ≥ 0.5 h_\theta(x) \geq 0.5 hθ(x)0.5 => θ T X ≥ 0 \theta^TX \geq 0 θTX0
  • " y=0 " if h θ ( x ) ≤ 0.5 h_\theta(x) \leq 0.5 hθ(x)0.5 => θ T X ≤ 0 \theta^TX \leq 0 θTX0

Non-linear boundaries

E.g. $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$
Decision boundaries can be more complex when higher-order polynomial features are used, as sketched below.
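
A minimal sketch of such a non-linear boundary, using the illustrative parameter choice $\theta = (-1, 0, 0, 1, 1)^T$, for which the hypothesis above predicts $y = 1$ exactly when $x_1^2 + x_2^2 \geq 1$ (the unit circle):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x1, x2):
    """Predict y for the quadratic hypothesis g(t0 + t1*x1 + t2*x2 + t3*x1^2 + t4*x2^2)."""
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return 1 if sigmoid(np.dot(theta, features)) >= 0.5 else 0

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])   # boundary: x1^2 + x2^2 = 1
print(predict(theta, 0.5, 0.5))  # inside the unit circle  -> 0
print(predict(theta, 1.0, 1.0))  # outside the unit circle -> 1
```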

Cost function

With the sigmoid hypothesis plugged in, the squared-error cost function from linear regression is non-convex in $\theta$ (it has many local optima).

So the old (linear-regression) cost cannot be applied directly: gradient descent on a non-convex function is not guaranteed to reach the global minimum.

Use log

$$Cost(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1-h_\theta(x)) & \text{if } y = 0 \end{cases}$$

Intuition

If $h_\theta(x) = 0$ (the hypothesis confidently predicts $y = 0$) but actually $y = 1$, then the learning algorithm is penalized by a very large cost: $-\log(h_\theta(x)) \to \infty$ as $h_\theta(x) \to 0$.

Simplified cost function

$$Cost(h_\theta(x), y) = -y\log h_\theta(x) - (1-y)\log(1-h_\theta(x))$$

This single expression reproduces both cases above: for $y = 1$ the second term vanishes, and for $y = 0$ the first term vanishes.
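
A vectorized sketch of this cost over a full training set (the names are illustrative; `X` is assumed to already include a column of ones for the intercept term):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = (1/m) * sum of -y*log(h) - (1-y)*log(1-h) over all examples.

    X : (m, n) design matrix, y : (m,) vector of 0/1 labels.
    """
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
```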

Gradient descent

Want $\min_\theta J(\theta)$, where $J(\theta) = \frac{1}{m}\sum Cost(h_\theta(x), y)$ summed over the $m$ training examples:
$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_i}
&=\frac{1}{m}\sum\Big(-y\,\frac{1}{h_\theta(x)}-\frac{1-y}{1-h_\theta(x)}\,(-1)\Big)\frac{\partial h_\theta(x)}{\partial \theta_i} \\
&=\frac{1}{m}\sum\Big(-y\,\frac{1}{h_\theta(x)}+\frac{1-y}{1-h_\theta(x)}\Big)\Big(-\frac{1}{(1+e^{-\theta^Tx})^2}\Big)e^{-\theta^Tx}(-x_i) \\
&=\frac{1}{m}\sum\Big(-y\,(1+e^{-\theta^Tx})+\frac{1-y}{1-\frac{1}{1+e^{-\theta^Tx}}}\Big)\frac{e^{-\theta^Tx}}{(1+e^{-\theta^Tx})^2}\,x_i \\
&=\frac{1}{m}\sum\Big(\frac{-y\,e^{-\theta^Tx}}{1+e^{-\theta^Tx}}+\frac{1-y}{1+e^{-\theta^Tx}}\Big)x_i \\
&=\frac{1}{m}\sum\frac{-y\,(e^{-\theta^Tx}+1)+1}{1+e^{-\theta^Tx}}\,x_i \\
&=\frac{1}{m}\sum\big(h_\theta(x)-y\big)\,x_i
\end{aligned}
$$

Algorithm

Repeat until convergence:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)} \qquad \text{(simultaneously update all } \theta_j)$$

The update rule looks identical to the one for linear regression, but $h_\theta(x)$ is now the sigmoid of $\theta^T x$ rather than $\theta^T x$ itself.
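
A minimal batch gradient descent sketch implementing this update (illustrative; `alpha` and `num_iters` are hyperparameters you would tune, and `X` again includes the intercept column):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.

    Each step applies theta := theta - alpha * (1/m) * X^T (h - y),
    the vectorized form of the update derived above.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)              # predictions for all m examples
        grad = (X.T @ (h - y)) / m          # (1/m) * sum (h - y) * x_j, for every j
        theta -= alpha * grad               # simultaneous update of all theta_j
    return theta
```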

Advanced optimization

  • Gradient descent
  • Conjugate gradient
  • BFGS
  • L-BFGS

Advantages:

  • No need to manually pick $\alpha$: a clever inner loop automatically chooses a good step size $\alpha$
  • Often faster than gradient descent

Disadvantages:

  • More complex
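
For example, the cost and gradient can be handed to an off-the-shelf optimizer. The sketch below uses SciPy's `minimize` with the BFGS method; it plays the role Octave's `fminunc` plays in the course, but the specific code is my own illustration, not the course implementation:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    h = sigmoid(X @ theta)
    return (X.T @ (h - y)) / X.shape[0]

def fit(X, y):
    """Let BFGS choose the step sizes; no learning rate alpha to pick by hand."""
    theta0 = np.zeros(X.shape[1])
    res = minimize(cost, theta0, args=(X, y), jac=grad, method="BFGS")
    return res.x
```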

Multi-class classification: One-vs-all

Fit one binary classifier per class, where classifier $i$ estimates the probability $P(y = i \mid x; \theta)$.

One-vs-all

Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y = i$.
To predict:
On a new input $x$, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$: predict $\arg\max_i h_\theta^{(i)}(x)$.
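
A sketch of one-vs-all built on the earlier pieces (the helper names are mine; `train_binary` stands for any binary logistic regression trainer, e.g. the gradient descent or BFGS sketches above):

```python
import numpy as np

def one_vs_all(X, y, num_classes, train_binary):
    """Train one binary logistic regression classifier per class.

    For class i, relabel the data as 1 (y == i) vs. 0 (everything else)
    and fit theta^(i) with the supplied training routine.
    """
    thetas = [train_binary(X, (y == i).astype(float)) for i in range(num_classes)]
    return np.array(thetas)          # shape: (num_classes, n)

def predict_one_vs_all(thetas, X):
    """Pick, for each example, the class whose classifier outputs the highest h_theta(x)."""
    probs = 1.0 / (1.0 + np.exp(-(X @ thetas.T)))   # shape: (m, num_classes)
    return np.argmax(probs, axis=1)
```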
