Hypothesis Representation
Since $y \in \{0, 1\}$, the dependent variable $y$ takes only the two values 0 and 1. We therefore change the form of the hypothesis so that $h_\theta(x)$ satisfies $0 \le h_\theta(x) \le 1$:

$$h_\theta(x) = g(\theta^T x), \qquad z = \theta^T x, \qquad g(z) = \frac{1}{1 + e^{-z}}$$
This gives the hypothesis:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
$g$ is called the logistic function, or sigmoid function.
For a sample $x$, $h_\theta(x)$ gives the probability that the output is 1, i.e. $P(y = 1 \mid x; \theta)$.
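As a minimal sketch in NumPy (the function name is illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# The sigmoid maps any real number into (0, 1), centered at g(0) = 0.5:
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```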
Decision Boundary
To obtain the two discrete classes 0 and 1, we threshold the hypothesis as follows:

$$h_\theta(x) \ge 0.5 \rightarrow y = 1, \qquad h_\theta(x) < 0.5 \rightarrow y = 0$$

That is, when the hypothesis value is at least 0.5 we predict $y = 1$; when it is below 0.5 we predict $y = 0$.
Since

$$h_\theta(x) = g(\theta^T x) \ge 0.5 \quad \text{when} \quad \theta^T x \ge 0,$$

we have

$$\theta^T x \ge 0 \Rightarrow y = 1, \qquad \theta^T x < 0 \Rightarrow y = 0$$
The decision boundary is the curve separating the region where we predict $y = 1$ from the region where we predict $y = 0$. It is a property of the hypothesis function itself, not of the data set.
The curve $\theta^T x = 0$ is the decision boundary.
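Because $g(\theta^T x) \ge 0.5$ exactly when $\theta^T x \ge 0$, prediction never needs to evaluate the sigmoid. A sketch in NumPy (the parameter values are hypothetical, chosen to put the boundary at $x_1 + x_2 = 3$):

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 where theta^T x >= 0, i.e. where g(theta^T x) >= 0.5."""
    return (X @ theta >= 0).astype(int)

# Hypothetical theta = [-3, 1, 1]: boundary is -3 + x1 + x2 = 0.
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[1.0, 1.0, 1.0],   # x1 + x2 = 2 < 3  -> predict 0
              [1.0, 2.0, 2.0]])  # x1 + x2 = 4 >= 3 -> predict 1
print(predict(theta, X))  # [0 1]
```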
Cost Function
The cost function for logistic regression:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$$

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
The cost curves for $y = 1$ and $y = 0$ show the following properties:

$$\begin{aligned}
\mathrm{Cost}(h_\theta(x), y) &= 0 && \text{if } h_\theta(x) = y \\
\mathrm{Cost}(h_\theta(x), y) &\to \infty && \text{if } y = 0 \text{ and } h_\theta(x) \to 1 \\
\mathrm{Cost}(h_\theta(x), y) &\to \infty && \text{if } y = 1 \text{ and } h_\theta(x) \to 0
\end{aligned}$$
The two cases combine into a single expression:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$
The complete cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$$
Vectorized form:

$$h = g(X\theta), \qquad J(\theta) = \frac{1}{m} \cdot \left( -y^T \log(h) - (1 - y)^T \log(1 - h) \right)$$
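The vectorized form translates directly into NumPy. A sketch, assuming `X` already includes the intercept column of ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = (1/m) * (-y^T log(h) - (1-y)^T log(1-h)), with h = g(X theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m

# With theta = 0, h = 0.5 for every sample, so the cost is log(2) ~ 0.693.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
print(cost(np.zeros(2), X, y))
```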
Gradient Descent
$$\text{Repeat} \; \left\{ \; \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \; \right\}$$

$$\text{Repeat} \; \left\{ \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \; \right\}$$
Vectorized implementation:

$$\theta := \theta - \frac{\alpha}{m} X^T \left( g(X\theta) - \vec{y} \right)$$
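The vectorized update can be sketched as follows (the learning rate and iteration count are illustrative choices, not tuned values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Toy 1-D data with an intercept column: negative x -> class 0, positive -> 1.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))  # [0 0 1 1]
```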
Derivation of the gradient:

$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \frac{1}{h_\theta(x^{(i)})} \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} - (1 - y^{(i)}) \frac{1}{1 - h_\theta(x^{(i)})} \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \frac{1}{g(\theta^T x^{(i)})} - (1 - y^{(i)}) \frac{1}{1 - g(\theta^T x^{(i)})} \right) \frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \frac{1}{g(\theta^T x^{(i)})} - (1 - y^{(i)}) \frac{1}{1 - g(\theta^T x^{(i)})} \right) g(\theta^T x^{(i)}) \left( 1 - g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \left( 1 - g(\theta^T x^{(i)}) \right) - (1 - y^{(i)}) \, g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - g(\theta^T x^{(i)}) \right) x_j^{(i)} \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$
From step 2 to step 3 (the derivative of the sigmoid), letting $z = \theta^T x^{(i)}$:

$$\begin{aligned}
\frac{\partial g(\theta^T x^{(i)})}{\partial \theta_j}
&= \frac{d}{dz} \left( \frac{1}{1 + e^{-z}} \right) \frac{\partial (\theta^T x^{(i)})}{\partial \theta_j} \\
&= \frac{e^{-z}}{(1 + e^{-z})^2} \, \frac{\partial (\theta_0 + \theta_1 x_1 + \cdots + \theta_j x_j + \cdots + \theta_n x_n)}{\partial \theta_j} \\
&= \frac{e^{-z} + 1 - 1}{(1 + e^{-z})^2} \, x_j^{(i)} \\
&= \left[ \frac{1}{1 + e^{-z}} - \left( \frac{1}{1 + e^{-z}} \right)^2 \right] x_j^{(i)} \\
&= \left( g(z) - g^2(z) \right) x_j^{(i)} \\
&= g(\theta^T x^{(i)}) \left( 1 - g(\theta^T x^{(i)}) \right) x_j^{(i)}
\end{aligned}$$
Multiclass Classification
Pick one class and lump all the remaining classes together as the second class; this yields one binary classifier. Repeating this for each class gives one classifier per class.
$$\begin{aligned}
& y \in \{0, 1, \ldots, n\} \\
& h_\theta^{(0)}(x) = P(y = 0 \mid x; \theta) \\
& h_\theta^{(1)}(x) = P(y = 1 \mid x; \theta) \\
& \qquad \cdots \\
& h_\theta^{(n)}(x) = P(y = n \mid x; \theta) \\
& \text{prediction} = \max_i \left( h_\theta^{(i)}(x) \right)
\end{aligned}$$
The predicted class is the one whose classifier outputs the largest value among all the classifiers.
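The one-vs-all prediction step can be sketched in NumPy, assuming each trained classifier's parameters are stored as one row of a hypothetical `all_theta` matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(all_theta, X):
    """all_theta: (n_classes, n_features), one row per classifier.
    Returns, for each sample, the class i maximizing h_theta^{(i)}(x)."""
    probs = sigmoid(X @ all_theta.T)   # shape (m, n_classes)
    return np.argmax(probs, axis=1)

# Hypothetical parameters for three classes over [intercept, x1]:
all_theta = np.array([[1.0, -2.0],    # class 0: favors small x1
                      [0.0,  0.0],    # class 1: always outputs 0.5
                      [-1.0, 2.0]])   # class 2: favors large x1
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
print(predict_one_vs_all(all_theta, X))  # [0 0 2]
```

Because the sigmoid is monotonic, taking the arg-max of the probabilities is equivalent to taking the arg-max of $\theta^T x$ across the classifiers.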