Dataset:

$$T = \{(\vec{x}_1, y_1), (\vec{x}_2, y_2), \cdots, (\vec{x}_N, y_N)\}$$
$$\vec{x}_i \in \mathcal{X} \subseteq \mathbb{R}^n, \quad y_i \in \mathcal{Y} = \{0, 1\}$$

We want to predict the label $y$ of a given $\vec{x}$, i.e. $P(y \mid \vec{x})$, and in particular $P(y = 1 \mid \vec{x})$.
Logistic (log-odds) function:

$$P(y = 1 \mid \vec{x}) = \frac{1}{1 + e^{-(\vec{w}^T \cdot \vec{x} + b)}}$$

Write $\vec{\tilde{w}} = (\vec{w}, b)^T$ and $\vec{\tilde{x}} = (\vec{x}, 1)^T$. Then

$$D(\vec{\tilde{x}}) = P(y = 1 \mid \vec{\tilde{x}}) = \frac{1}{1 + e^{-\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}}} = \frac{e^{\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}}}{1 + e^{\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}}}$$

and likewise

$$1 - D(\vec{\tilde{x}}) = P(y = 0 \mid \vec{\tilde{x}}) = \frac{1}{1 + e^{\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}}}$$
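As a quick sanity check, $D(\vec{\tilde{x}})$ and its complement are easy to evaluate in NumPy (a minimal sketch; the sigmoid helper and the example numbers are illustrative, not part of the derivation):

import numpy as np

def sigmoid(z):
    # D = e^z / (1 + e^z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

w_tilde = np.array([0.5, -1.2, 0.3])    # illustrative (w, b) stacked into one vector
x_tilde = np.array([2.0, 1.0, 1.0])     # (x, 1): the sample with a constant 1 appended

p1 = sigmoid(np.dot(w_tilde, x_tilde))  # P(y = 1 | x)
p0 = 1.0 - p1                           # P(y = 0 | x)
print(p1, p0)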
Likelihood function:

$$S(\vec{\tilde{w}} \mid X) = \prod_{i=1}^{N} \left[D(\vec{\tilde{x}}_i)\right]^{y_i} \left[1 - D(\vec{\tilde{x}}_i)\right]^{1 - y_i}, \quad y_i \in \{0, 1\}$$
Log-likelihood function:

$$L(\vec{\tilde{w}}) = \log S(\vec{\tilde{w}} \mid X)$$
$$\Rightarrow L(\vec{\tilde{w}}) = \sum_{i}^{N} \left[ y_i \log D(\vec{\tilde{x}}_i) + (1 - y_i) \log\left(1 - D(\vec{\tilde{x}}_i)\right) \right]$$
$$\Rightarrow L(\vec{\tilde{w}}) = \sum_{i}^{N} \left[ y_i \log \frac{D(\vec{\tilde{x}}_i)}{1 - D(\vec{\tilde{x}}_i)} + \log\left(1 - D(\vec{\tilde{x}}_i)\right) \right]$$
$$\Rightarrow L(\vec{\tilde{w}}) = \sum_{i}^{N} \left[ y_i (\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i) - \log\left(1 + e^{\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i}\right) \right]$$
Maximum likelihood estimation:
$$\vec{\tilde{w}}^{*} = \arg\max_{\vec{\tilde{w}}} \sum_{i}^{N} \left[ y_i (\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i) - \log\left(1 + e^{\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i}\right) \right]$$
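This maximizer has no closed-form solution, which is why the next step turns to gradient descent. The objective itself, however, is straightforward to evaluate (a minimal sketch; X_tilde is assumed to hold the augmented samples $\vec{\tilde{x}}_i$ as rows and y the 0/1 labels, both names being illustrative):

import numpy as np

def log_likelihood(w_tilde, X_tilde, y):
    # L(w~) = sum_i [ y_i * (w~ . x~_i) - log(1 + exp(w~ . x~_i)) ]
    z = X_tilde @ w_tilde
    return np.sum(y * z - np.log(1.0 + np.exp(z)))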
Gradient descent:
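The derivation below repeatedly uses the standard derivative of the sigmoid, stated here once for reference:

$$\frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j} = D(\vec{\tilde{x}}_i)\left(1 - D(\vec{\tilde{x}}_i)\right) \frac{\partial (\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i)}{\partial \tilde{w}_j} = D(\vec{\tilde{x}}_i)\left(1 - D(\vec{\tilde{x}}_i)\right) \tilde{x}_i^{j}$$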
Maximizing $L(\vec{\tilde{w}}) = \sum_{i}^{N} \left[ y_i \log D(\vec{\tilde{x}}_i) + (1 - y_i) \log(1 - D(\vec{\tilde{x}}_i)) \right]$ is equivalent to minimizing

$$J(\vec{\tilde{w}}) = -\frac{1}{N} L(\vec{\tilde{w}}) = -\frac{1}{N} \sum_{i}^{N} \left[ y_i \log D(\vec{\tilde{x}}_i) + (1 - y_i) \log\left(1 - D(\vec{\tilde{x}}_i)\right) \right]$$

Differentiating with respect to the $j$-th component $\tilde{w}_j$:

$$\frac{\partial J(\vec{\tilde{w}})}{\partial \tilde{w}_j} = -\frac{1}{N} \sum_{i}^{N} \left[ y_i \frac{1}{D(\vec{\tilde{x}}_i)} \frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j} - (1 - y_i) \frac{1}{1 - D(\vec{\tilde{x}}_i)} \frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j} \right]$$
$$= -\frac{1}{N} \sum_{i}^{N} \left( y_i \frac{1}{D(\vec{\tilde{x}}_i)} - (1 - y_i) \frac{1}{1 - D(\vec{\tilde{x}}_i)} \right) \frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j}$$
$$= -\frac{1}{N} \sum_{i}^{N} \left( y_i \frac{1}{D(\vec{\tilde{x}}_i)} - (1 - y_i) \frac{1}{1 - D(\vec{\tilde{x}}_i)} \right) D(\vec{\tilde{x}}_i)\left(1 - D(\vec{\tilde{x}}_i)\right) \frac{\partial (\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i)}{\partial \tilde{w}_j}$$
$$= -\frac{1}{N} \sum_{i}^{N} \left( y_i \left(1 - D(\vec{\tilde{x}}_i)\right) - (1 - y_i) D(\vec{\tilde{x}}_i) \right) \frac{\partial (\vec{\tilde{w}}^T \cdot \vec{\tilde{x}}_i)}{\partial \tilde{w}_j}$$
$$= -\frac{1}{N} \sum_{i}^{N} \left( y_i - D(\vec{\tilde{x}}_i) \right) \tilde{x}_i^{j} = \frac{1}{N} \sum_{i}^{N} \left( D(\vec{\tilde{x}}_i) - y_i \right) \tilde{x}_i^{j}$$

In matrix form:

$$\frac{\partial J(\vec{\tilde{w}})}{\partial \tilde{w}_j} = \frac{1}{N} \left( D(\tilde{X}) - \vec{y} \right)^T \tilde{X}_{\cdot j}$$

so the update rule is

$$\tilde{w}_j \leftarrow \tilde{w}_j - \frac{\eta}{N} \left( D(\tilde{X}) - \vec{y} \right)^T \tilde{X}_{\cdot j}, \quad \text{where } D(\tilde{X}) = \frac{e^{\tilde{X} \cdot \vec{\tilde{w}}}}{1 + e^{\tilde{X} \cdot \vec{\tilde{w}}}}$$
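Before building the full training loop, the analytic gradient $\frac{1}{N}(D(\tilde{X}) - \vec{y})^T \tilde{X}_{\cdot j}$ can be checked against finite differences of $J$ (a minimal, self-contained sketch with illustrative names; not part of the original example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(w_tilde, X_tilde, y):
    # Cross-entropy cost J(w~) = -(1/N) * sum[ y*log(D) + (1 - y)*log(1 - D) ]
    d = sigmoid(X_tilde @ w_tilde)
    return -np.mean(y * np.log(d) + (1 - y) * np.log(1 - d))

def grad_J(w_tilde, X_tilde, y):
    # Analytic gradient: (1/N) * X~^T (D(X~) - y)
    d = sigmoid(X_tilde @ w_tilde)
    return X_tilde.T @ (d - y) / len(y)

rng = np.random.default_rng(0)
X_tilde = np.hstack([rng.random((10, 5)), np.ones((10, 1))])  # augmented samples (x, 1)
y = rng.integers(0, 2, 10).astype(float)                      # random 0/1 labels
w_tilde = rng.normal(size=6)

eps = 1e-6
numeric = np.array([(J(w_tilde + eps * e, X_tilde, y) -
                     J(w_tilde - eps * e, X_tilde, y)) / (2 * eps)
                    for e in np.eye(6)])
print(np.max(np.abs(numeric - grad_J(w_tilde, X_tilde, y))))  # should be very small (~1e-8 or less)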
Python example 1:
import numpy as np
import matplotlib.pyplot as plt

# Squared-error cost between the linear scores X.w_e and the labels y
# (used here only to monitor training; it is not the cross-entropy J derived above).
def computeCost(X, y, w_e):
    return 1.0 / (2 * len(y)) * np.dot((np.dot(X, w_e) - y).T, (np.dot(X, w_e) - y))

# Sigmoid of the linear score: D = e^z / (1 + e^z) = 1 / (1 + e^(-z)), with z = X.w_e
def D(X, w_e):
    return 1.0 / (1 + np.exp(-np.dot(X, w_e)))

# One full-batch gradient-descent step, updating w_e coordinate by coordinate:
# w_j <- w_j - (eta/N) * (D(X) - y)^T X[:, j]
def gradientDescent(X, y, w_e, alpha):
    for j in range(len(w_e)):
        w_e[j] = w_e[j] - alpha * 1.0 / len(y) * np.dot((D(X, w_e) - y).T, X[:, j])
    return w_e

# Toy dataset: 10 samples, 5 random features plus a bias column of ones
X = np.random.rand(10, 5)
m = np.ones((10, 1))
X = np.concatenate((X, m), axis=1)

# Binary labels generated by thresholding a linear function of X at its mean
w = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
y = np.dot(X, w)
a = np.mean(y, axis=0)
for i in range(10):
    if y[i][0] > a:
        y[i][0] = 1
    else:
        y[i][0] = 0

# Train with gradient descent and record the cost at every iteration
w_e = np.zeros_like(w)
cost = []
for i in range(10000):
    w_e = gradientDescent(X, y, w_e, 0.001)
    cost.append(computeCost(X, y, w_e)[0, 0])

print(y)
print(D(X, w_e))

# Plot the cost curve on a log scale
fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(range(10000), cost, label='loss')
ax1.set_yscale('log')
ax1.set_xlabel("Iteration")
ax1.set_ylabel("Loss")
ax1.legend(loc='best')
plt.show()
One run printed the labels y followed by the fitted probabilities D(X, w_e) (X is random, so exact values differ between runs):

[[1.]
[0.]
[1.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]]
[[0.63103198]
[0.54273564]
[0.72331236]
[0.55213718]
[0.41564281]
[0.78018675]
[0.49371131]
[0.59834743]
[0.67121118]
[0.86004243]]
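To turn these probabilities into hard class predictions, one would typically threshold at 0.5 (a small follow-up sketch reusing the variables X, w_e, and y from the script above):

pred = (D(X, w_e) >= 0.5).astype(int)   # predicted 0/1 labels
print(pred.ravel())
print(np.mean(pred == y))               # fraction of training samples classified correctly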
The loss curve:

Increase the number of iterations to 30,000:
import numpy as np
import matplotlib.pyplot as plt

# Same script as above, except the training loop (and plot range) now run for 30,000 iterations.
def computeCost(X, y, w_e):
    return 1.0 / (2 * len(y)) * np.dot((np.dot(X, w_e) - y).T, (np.dot(X, w_e) - y))

def D(X, w_e):
    return 1.0 / (1 + np.exp(-np.dot(X, w_e)))

def gradientDescent(X, y, w_e, alpha):
    for j in range(len(w_e)):
        w_e[j] = w_e[j] - alpha * 1.0 / len(y) * np.dot((D(X, w_e) - y).T, X[:, j])
    return w_e

X = np.random.rand(10, 5)
m = np.ones((10, 1))
X = np.concatenate((X, m), axis=1)
w = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
y = np.dot(X, w)
a = np.mean(y, axis=0)
for i in range(10):
    if y[i][0] > a:
        y[i][0] = 1
    else:
        y[i][0] = 0

w_e = np.zeros_like(w)
cost = []
for i in range(30000):
    w_e = gradientDescent(X, y, w_e, 0.001)
    cost.append(computeCost(X, y, w_e)[0, 0])

print(y)
print(D(X, w_e))

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(range(30000), cost, label='loss')
ax1.set_yscale('log')
ax1.set_xlabel("Iteration")
ax1.set_ylabel("Loss")
ax1.legend(loc='best')
plt.show()

The loss rises slightly toward the end! (The plotted cost is the squared error between the linear scores X·w_e and the 0/1 labels, so it can grow as the weights keep growing, even while the classification itself improves.) We can also reduce the learning rate further:
import numpy as np
import matplotlib.pyplot as plt

# Same script again, but with the learning rate reduced from 0.001 to 0.0001.
def computeCost(X, y, w_e):
    return 1.0 / (2 * len(y)) * np.dot((np.dot(X, w_e) - y).T, (np.dot(X, w_e) - y))

def D(X, w_e):
    return 1.0 / (1 + np.exp(-np.dot(X, w_e)))

def gradientDescent(X, y, w_e, alpha):
    for j in range(len(w_e)):
        w_e[j] = w_e[j] - alpha * 1.0 / len(y) * np.dot((D(X, w_e) - y).T, X[:, j])
    return w_e

X = np.random.rand(10, 5)
m = np.ones((10, 1))
X = np.concatenate((X, m), axis=1)
w = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
y = np.dot(X, w)
a = np.mean(y, axis=0)
for i in range(10):
    if y[i][0] > a:
        y[i][0] = 1
    else:
        y[i][0] = 0

w_e = np.zeros_like(w)
cost = []
for i in range(30000):
    w_e = gradientDescent(X, y, w_e, 0.0001)
    cost.append(computeCost(X, y, w_e)[0, 0])

print(y)
print(D(X, w_e))

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(range(30000), cost, label='loss')
ax1.set_yscale('log')
ax1.set_xlabel("Iteration")
ax1.set_ylabel("Loss")
ax1.legend(loc='best')
plt.show()
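One more note: computeCost measures the squared error between the linear scores X·w_e and the 0/1 labels, which is not the cross-entropy cost J(w~) derived earlier and need not keep shrinking even when the fit is good. A drop-in alternative for monitoring training could look like this (a sketch, not part of the original scripts; it reuses the D defined above):

def computeCrossEntropy(X, y, w_e):
    # J(w~) = -(1/N) * sum[ y*log(D) + (1 - y)*log(1 - D) ]; clip to avoid log(0)
    d = np.clip(D(X, w_e), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(d) + (1 - y) * np.log(1 - d))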
