Andrew Ng's Machine Learning, Chapter 7: Logistic Regression
Classification
Logistic regression is a classification algorithm: despite its name, it is used to predict discrete labels (e.g. y ∈ {0, 1}) rather than continuous values.
Hypothesis Representation
Logistic/Sigmoid function
g(z) = \frac{1}{1+e^{-z}}

h_\theta(x) = g(\theta^Tx)

h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}
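A minimal NumPy sketch of these two definitions (NumPy, the function names, and the example values are my additions, not part of the original notes); X is assumed to be an m x (n+1) design matrix whose first column is all ones:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row of the design matrix X."""
    return sigmoid(X @ theta)

# Example: theta = [0, 1], x = [1, 2]  ->  g(2) ~ 0.88
theta = np.array([0.0, 1.0])
X = np.array([[1.0, 2.0]])
print(hypothesis(theta, X))   # [0.88079708]
```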
Interpretation of hypothesis
h_\theta(x) = estimated probability that y = 1 on input x
Decision Boundary
The decision boundary is a property of the hypothesis (of its parameters \theta), not of the data set.
Predict:
- " y=1 " if h θ ( x ) ≥ 0.5 h_\theta(x) \geq 0.5 hθ(x)≥0.5 => θ T X ≥ 0 \theta^TX \geq 0 θTX≥0
- " y=0 " if h θ ( x ) ≤ 0.5 h_\theta(x) \leq 0.5 hθ(x)≤0.5 => θ T X ≤ 0 \theta^TX \leq 0 θTX≤0
Non-linear boundaries
E.g.
h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_2^2)
Decision boundaries can be made more complex by adding higher-order polynomial features.
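For instance, with \theta = (-1, 0, 0, 1, 1) the hypothesis above predicts y = 1 exactly when x_1^2 + x_2^2 \geq 1, a circular decision boundary. A quick sketch (the parameter values are chosen for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x1, x2):
    """g(theta0 + theta1*x1 + theta2*x2 + theta3*x1^2 + theta4*x2^2)"""
    z = theta[0] + theta[1]*x1 + theta[2]*x2 + theta[3]*x1**2 + theta[4]*x2**2
    return sigmoid(z)

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])   # boundary: x1^2 + x2^2 = 1
print(h(theta, 0.0, 0.0) >= 0.5)   # False: inside the circle -> predict y = 0
print(h(theta, 2.0, 0.0) >= 0.5)   # True:  outside the circle -> predict y = 1
```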
Cost function
With the sigmoid hypothesis, the squared-error cost function is non-convex in \theta, so the cost function from linear regression cannot be applied directly (gradient descent may get stuck in a local optimum).
Use a log-based cost instead:
Cost(h_\theta(x), y) = -\log(h_\theta(x)) if y = 1
Cost(h_\theta(x), y) = -\log(1-h_\theta(x)) if y = 0
Intuition
If h_\theta(x) = 0 (the model predicts y = 0 with near certainty) but actually y = 1, the learning algorithm is penalized with a very large cost, since -\log(h_\theta(x)) \to \infty as h_\theta(x) \to 0.
Simplified cost function
Cost(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))
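The full cost averages this over the training set, J(\theta) = \frac{1}{m}\sum Cost(h_\theta(x), y). A NumPy sketch (the small eps inside the logs is my addition, to avoid log(0)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    """J(theta) = (1/m) * sum(-y*log(h) - (1-y)*log(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return (-y @ np.log(h + eps) - (1 - y) @ np.log(1 - h + eps)) / m

# Tiny example: with theta = 0, h = 0.5 everywhere, so J = -log(0.5)
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
print(cost(np.zeros(2), X, y))   # ~0.6931 = -log(0.5)
```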
Gradient descent
Want \min J(\theta):

\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_i} &= \frac{1}{m}\sum\left(-y\,\frac{1}{h_\theta(x)} - \frac{1-y}{1-h_\theta(x)}\,(-1)\right)\frac{\partial h_\theta(x)}{\partial \theta_i} \\
&= \frac{1}{m}\sum\left(-y\,\frac{1}{h_\theta(x)} + \frac{1-y}{1-h_\theta(x)}\right)\left(-\frac{1}{(1+e^{-\theta^Tx})^2}\right)e^{-\theta^Tx}(-x) \\
&= \frac{1}{m}\sum\left(-y\,(1+e^{-\theta^Tx}) + \frac{1-y}{1-\frac{1}{1+e^{-\theta^Tx}}}\right)\left(\frac{x}{(1+e^{-\theta^Tx})^2}\right)e^{-\theta^Tx} \\
&= \frac{1}{m}\sum\left(\frac{-y\,e^{-\theta^Tx}}{1+e^{-\theta^Tx}} + \frac{1-y}{1+e^{-\theta^Tx}}\right)x \\
&= \frac{1}{m}\sum\left(\frac{-y\,(e^{-\theta^Tx}+1) + 1}{1+e^{-\theta^Tx}}\right)x \\
&= \frac{1}{m}\sum\left(h_\theta(x) - y\right)x
\end{aligned}
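So the gradient has the same form as in linear regression, only with the logistic hypothesis. As a sanity check on the derivation, the vectorized gradient \frac{1}{m}X^T(h_\theta(X)-y) can be compared against finite differences of J(\theta); a sketch with made-up data (all values here are arbitrary assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / len(y)

def gradient(theta, X, y):
    # Analytic gradient from the derivation: (1/m) * X^T (h_theta(X) - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
theta = rng.normal(size=3)

# Central finite differences of J(theta), one coordinate at a time
eps = 1e-6
numeric = np.array([(cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.max(np.abs(gradient(theta, X, y) - numeric)))   # should be tiny (~1e-9 or less)
```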
Algorithm
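Using the gradient derived above, the gradient descent update (repeated until convergence, with all \theta_j updated simultaneously) is

\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum\left(h_\theta(x) - y\right)x_j

A vectorized sketch of the loop (the learning rate, iteration count, and toy data are placeholder choices of mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=2000):
    # Batch update: theta := theta - alpha * (1/m) * X^T (h_theta(X) - y)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= alpha * (X.T @ (sigmoid(X @ theta) - y)) / m
    return theta

# Toy data: y = 1 for positive x_1, y = 0 for negative x_1
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))   # expected: [0 0 1 1]
```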
Advanced optimization
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
Advantages:
- No need to manually pick \alpha: a clever inner loop (a line search) automatically chooses a good \alpha
- Often faster than gradient descent
Disadvantages:
- More complex
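One way to try these methods in Python (this mapping is my assumption, not part of the notes) is scipy.optimize.minimize with a quasi-Newton method such as BFGS, handing it the cost function and its gradient:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    h = sigmoid(X @ theta)
    return (-y @ np.log(h + eps) - (1 - y) @ np.log(1 - h + eps)) / len(y)

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Toy, non-separable 1-D data
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, -0.5],
              [1.0,  0.5], [1.0,  1.0], [1.0,  2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

# BFGS chooses its own step sizes; no learning rate to tune by hand.
res = minimize(cost, np.zeros(X.shape[1]), args=(X, y), jac=grad, method='BFGS')
print(res.x, cost(res.x, X, y))
```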
Multi-class classification: One-vs-all
Fit one binary classifier per class i; each estimates P(y = i \mid x;\theta), the probability that the example belongs to class i.
One-vs-all
Train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y = i.
To predict:
On a new input x, pick the class i that maximizes h_\theta^{(i)}(x).
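A compact sketch of one-vs-all on toy data (the simple gradient descent trainer, the data, and all names are illustrative assumptions): train one binary classifier per class with the labels relabeled to (y == i), then predict by taking the argmax of the per-class probabilities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.5, iters=3000):
    # One logistic regression classifier, fit with batch gradient descent
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * (X.T @ (sigmoid(X @ theta) - y)) / len(y)
    return theta

def one_vs_all(X, y, classes):
    # For each class i, relabel the targets to (y == i) and fit a classifier
    return np.array([train_binary(X, (y == c).astype(float)) for c in classes])

def predict(Theta, X):
    # Pick the class whose classifier reports the highest h_theta^(i)(x)
    return np.argmax(sigmoid(X @ Theta.T), axis=1)

# Toy 1-D problem with three classes laid out along x_1
X = np.array([[1.0, -3.0], [1.0, -2.5], [1.0, 0.0],
              [1.0,  0.3], [1.0,  2.8], [1.0, 3.0]])
y = np.array([0, 0, 1, 1, 2, 2])
Theta = one_vs_all(X, y, classes=[0, 1, 2])
print(predict(Theta, X))   # expected: [0 0 1 1 2 2]
```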