Classification problems
- Binary classification
- Multi-class classification
Logistic regression algorithm
Logistic regression model
sigmoid function / logistic function
$$g(z) = \frac{1}{1 + e^{-z}}$$
$$h_\theta(x) = g(\theta^T x)$$
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} = p(y = 1 \mid x; \theta)$$
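A minimal NumPy sketch of the sigmoid and hypothesis (the names `sigmoid` and `hypothesis` are my own):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row of X; returns P(y=1 | x; theta)."""
    return sigmoid(X @ theta)
```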
Decision boundary: $\theta^T x = 0$
- Linear decision boundary
- Nonlinear decision boundary
If $h_\theta(x) \ge 0.5$ (equivalently $\theta^T x \ge 0$), predict $y = 1$; otherwise predict $y = 0$.
The decision boundary is a property not of the training set, but of the hypothesis and of the parameters.
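The threshold rule translates directly into a predictor; a minimal sketch building on the snippet above (the name `predict` is my own):

```python
def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)
```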
Cost function
Constructing the cost function
$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}), y^{(i)}\big)$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \quad \text{if } y = 1$$
$$\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x)) \quad \text{if } y = 0$$
This cost behaves as desired:
$$\mathrm{Cost}(h_\theta(x), y) = 0 \ \text{ if } h_\theta(x) = y$$
$$\mathrm{Cost}(h_\theta(x), y) \to \infty \ \text{ if } y = 0 \text{ and } h_\theta(x) \to 1$$
$$\mathrm{Cost}(h_\theta(x), y) \to \infty \ \text{ if } y = 1 \text{ and } h_\theta(x) \to 0$$
Combining the two cases into a single expression:
$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
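A NumPy sketch of this cost and gradient, vectorized over the $m$ examples (the names `cost` and `gradient` are mine; builds on the `sigmoid` helper above):

```python
def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[y*log(h) + (1-y)*log(1-h)]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m = len(y)
    return X.T @ (sigmoid(X @ theta) - y) / m
```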
Maximum likelihood
$$y = \frac{1}{1 + e^{-z}}$$
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$
$$\ln \frac{y}{1 - y} = \theta^T x$$
That is, the log-odds are linear in $x$:
$$\ln \frac{p(y = 1 \mid x)}{p(y = 0 \mid x)} = \theta^T x$$
$$p(y = 1 \mid x; \theta) = h_\theta(x) = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$
$$p(y = 0 \mid x; \theta) = 1 - h_\theta(x) = \frac{1}{1 + e^{\theta^T x}}$$
The two cases combine into
$$p(y \mid x; \theta) = \big(h_\theta(x)\big)^y \big(1 - h_\theta(x)\big)^{1 - y}$$
$$L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}; \theta\big) = \prod_{i=1}^{m} \big(h_\theta(x^{(i)})\big)^{y^{(i)}} \big(1 - h_\theta(x^{(i)})\big)^{1 - y^{(i)}}$$
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
so $J(\theta) = -\frac{1}{m}\,\ell(\theta)$: maximizing the likelihood is the same as minimizing the cost above.
Since $g'(z) = g(z)(1 - g(z))$, we obtain:
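Writing out the step this identity supplies (the chain rule applied to $J(\theta)$; a standard derivation, included here for completeness):
$$\frac{\partial J(\theta)}{\partial \theta_j} = -\frac{1}{m} \sum_{i=1}^{m} \left[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1 - y^{(i)}}{1 - h_\theta(x^{(i)})} \right] g'(\theta^T x^{(i)})\, x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$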
Gradient descent
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) = \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
feature scaling
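For example, mean normalization in NumPy (one common choice; a minimal sketch, the name `feature_scale` is mine):

```python
def feature_scale(X):
    """Mean-normalize each feature: (x - mean) / std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```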
Vectorization
$$h = g(X\theta)$$
$$J(\theta) = \frac{1}{m} \big( -y^T \log(h) - (1 - y)^T \log(1 - h) \big)$$
$$\mathrm{grad} = \frac{1}{m} X^T (h - y)$$
$$\theta := \theta - \frac{\alpha}{m} X^T \big( g(X\theta) - \vec{y}\, \big)$$
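The vectorized update as a NumPy loop (a sketch; `alpha` and `num_iters` are hyperparameters to choose, and `sigmoid` is the helper defined earlier):

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```

Feature scaling (above) helps this loop converge with a larger `alpha`.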
Advanced optimization algorithms (alternatives to gradient descent):
- Conjugate gradient
- BFGS
- L-BFGS
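These methods choose the step size automatically, so no learning rate needs hand-tuning. One hedged way to use them in Python is SciPy's `minimize` with the `cost`/`gradient` sketches above (assuming `X` and `y` are already defined):

```python
from scipy.optimize import minimize

# L-BFGS needs only the cost and its gradient; no learning rate to tune.
res = minimize(cost, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=gradient, method='L-BFGS-B')
theta_opt = res.x
```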
Multi-class classification
One-vs-all classification
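A sketch of one-vs-all on top of the earlier helpers: train one binary classifier per class $c$, relabeling the data each time, then predict the class whose classifier is most confident (all names are mine):

```python
def one_vs_all(X, y, num_labels, alpha=0.1, num_iters=1000):
    """Train one logistic regression classifier per class c = 0..num_labels-1."""
    n = X.shape[1]
    all_theta = np.zeros((num_labels, n))
    for c in range(num_labels):
        # Relabel: 1 for class c, 0 for everything else.
        yc = (y == c).astype(float)
        all_theta[c] = gradient_descent(X, yc, np.zeros(n), alpha, num_iters)
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Pick the class with the highest h_theta(x)."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```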
Overfitting
- Underfitting / high bias
- Overfitting / high variance
1) Reduce the number of features:
- Manually select which features to keep.
- Use a model selection algorithm (studied later in the course).
2) Regularization
- Keep all the features, but reduce the magnitude of parameters θj.
- Regularization works well when we have a lot of slightly useful features.
$$\min_\theta \ \frac{1}{2m} \left[ \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
Regularized linear regression
Regularized gradient descent
$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
For $j \ge 1$ (the intercept $\theta_0$ is not regularized):
$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)}$$
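As a sketch, the regularized update for linear regression in NumPy, assuming the first column of `X` is the intercept column of ones so `theta[0]` is left unpenalized (names are mine):

```python
def ridge_gradient_descent(X, y, theta, alpha, lam, num_iters):
    """Regularized linear regression: shrink theta_j (j >= 1) each step."""
    m = len(y)
    for _ in range(num_iters):
        h = X @ theta                      # linear hypothesis h = X theta
        grad = (X.T @ (h - y)) / m
        grad[1:] += (lam / m) * theta[1:]  # do not regularize theta_0
        theta = theta - alpha * grad
    return theta
```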
Regularized normal equation
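This heading presumably refers to the closed-form solution from the same course material: add $\lambda L$ inside the inverse, where $L$ is the $(n+1) \times (n+1)$ identity matrix with its top-left entry zeroed so that $\theta_0$ is not penalized (this also makes the matrix invertible even when $m \le n$):
$$\theta = \big( X^T X + \lambda L \big)^{-1} X^T y, \qquad L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$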
Regularized logistic regression
Regularized gradient descent
Without regularization, the cost is the one derived earlier:
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
Adding the regularization term (which again excludes $\theta_0$):
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)} \quad \text{for } j = 0$$
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{for } j \ge 1$$
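Both cases together in NumPy, as a sketch reusing `sigmoid` (again assuming column 0 of `X` is the intercept):

```python
def cost_reg(theta, X, y, lam):
    """Regularized logistic cost: J(theta) + (lambda/2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + penalty

def gradient_reg(theta, X, y, lam):
    """Gradient with (lambda/m) * theta_j added for j >= 1 only."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad
```

These two functions drop straight into the `scipy.optimize.minimize` call shown earlier, with `args=(X, y, lam)`.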