西瓜书3.3&南瓜书3.3——对数几率回归_对率回归西瓜数据集-优快云博客

本文链接：https://blog.youkuaiyun.com/m0_73556978/article/details/142502128

对数几率回归

对数几率回归是广义线性回归的一种，将线性回归模型输入进sigmoid函数，将函数值压缩到0～1区间内。以二分类问题为例，sigmoid函数的输出值代表此对象是正样本的概率，函数值越大代表他是正样本的概率越大。选取一个阈值theta作为对照，函数值大于theta的别判为正例，函数值小于theta的被判为反例，通常theta取0.5。

sigmoid函数：
$sigmoid=\frac{1}{1+e^{-z}}$
z为线性回归模型：
$z=w^Tx+b$
综合两式可得对率回归表达式：
$y=\frac{1}{1+e^{-(w^Tx+b)}}$
在广义线性模型中有一个链接函数的概念，链接函数能将线性模型的输出与广义线性模型的输出结合起来，他们的关系如下：
$\begin{aligned} 链接函数&=g(·)\\ 线性模型输出&=w^Tx+b\\ 广义线性模型输出&=g^{-1}(线性模型输出) \end{aligned}$
在对率回归中，链接函数的反函数就是sigmoid，链接函数是sigmoid的反函数，哈哈哈是不是已经绕晕了：
$\begin{aligned} g^{-1}(·)=sigmoid&=\frac{1}{1+e^{-z}}\\ g(·)&=ln\frac{y}{1-y} \end{aligned}$
其实不必太过注意这些，我们只需记住对率回归模型是将线性回归输入进sigmoid函数即可。

参数估计

对率回归使用极大似然估计法进行参数估计，目的是让对率回归预测的结果和样本真实的分类结果尽量相同。

先简单讲下极大似然估计。极大似然估计就是，在已知样本的概率分布的条件下，估计出样本的参数，让样本符合这个分布的概率最大。对应到我们这个模型中就是：已知样本真实的分类结果，得到一组参数让模型预测的样本的分类与他真实的分类尽量相同。

下面讲解参数估计的数学推导：

前面讲过模型的输出值代表此对象是正样本的概率，所以我们将原公式进行变形：
$\begin{aligned} p(y=1|x)=\frac{1}{1+e^{-(w^Tx+b)}}=\frac{1*{e^{w^Tx+b}}}{(1+e^{-(w^Tx+b)})*e^{w^Tx+b}}&=\frac{e^{w^Tx+b}}{1+e^{e^Tx+b}}\\ p(y=0|x)=1-p(y=1|x)&=\frac{1}{1+e^{e^Tx+b}} \end{aligned}$
对w，b进行极大似然估计，构造似然函数 $L$
$L(w,b)=\prod^{m}_{i=1}p(y_i|x_i;w,b)$
为了方写代码我们对L取对数，转换为对数似然函数，让累乘变累加
$\begin{aligned} lnL(w,b)&=ln\prod^{m}_{i=1}p(y_i|x_i;w,b)\\ \iota(w,b)&=\sum^{m}_{i=1}lnp(y_i|x_i;w,b) \end{aligned}$
将y_i和p转换成矩阵形式， $\beta=[w;b],列向量(分母布局)$ ， $\vec{x}=[x;1]，列向量（分母布局）$ ， $\vec y=[y_1;y_2;···y_m]（分母布局）$ 所以 ${\beta^T}\vec x =w^T+b$ ，p_1= $\Large\frac{e^{\beta^T\vec x}}{1+{e^{\beta^T \vec x}}}$ ，所以对数似然函数整理为：
$\iota(\beta)=-\vec y*\beta^T\vec x+ln(1+e^{\beta^T\vec x})$
对 $\iota$ 取反，令 $-\iota$ 最小同样得到w，b的最优解
$-\iota(\beta)=\vec y*\beta^T\vec x-ln(1+e^{\beta^T\vec x})$
为了方便起见，令 $\iota\ =-\iota$ 下面的讨论都是基于此进行的。

所以求
$\underset{w,b}{arg}\ min\iota(\beta)$
即可。

梯度下降

使用梯度下降求解 $\large \underset{w,b}{arg}\ min\iota(\beta)$ 的最优解。
$\beta .grad=\frac{\partial\iota (\beta)}{\partial \beta}$
设置超参数——学习率r， $\beta$ 每次都向最优解方向移动一小步，但这个最优解不一定是全局最优解而是局部最优解，机器学习有一些防止陷入局部最优的方法如“批量随机梯度下降”等等。每轮 $\beta$ 更新的公式
$\beta-=r*\beta.grad$

总结

对数几率回归算法的机器学习三要素：

1、模型：线性模型，输出值的范围是[0,1]，近似阶跃的单调可微函数

2、策略：极大似然估计，信息；论

3、优化算法：梯度下降，牛顿法

代码

使用pytorch框架的自动求导求得 $\beta$ 的梯度

西瓜书课后作业3.3，使用西瓜书3 $\alpha$ 数据集

import torch
import numpy as np
import matplotlib.pyplot as plt

def logistic(x,beta):
    '''计算预测值,返回的是对象属于1的概率'''
    # y_hat=torch.where(torch.sigmoid(beta@x)>theta,torch.tensor(1),torch.tensor(0))
    p1=torch.sigmoid(beta@x)
    return p1
    
def prediction(p1,theta):
    '''p1为sigmoid函数预测的样本为正例的概率，theta是阈值'''
    return torch.where(p1>theta,torch.tensor(1),torch.tensor(0))
    
def F1_score(y_hat,y):
    '''输入预测值y_hat和实际值y，计算F1_score评估模型性能'''
    TP=torch.sum(torch.where((y==1)&(y_hat==1),torch.tensor(1),torch.tensor(0)))
    # TP=torch.sum(y==1&y_hat==1)
    FP=torch.sum(torch.where((y==0)&(y_hat==1),torch.tensor(1),torch.tensor(0)))
    # FP=torch.sum(y==0&y_hat==1)
    FN=torch.sum(torch.where((y==1)&(y_hat==0),torch.tensor(1),torch.tensor(0)))
    # FN=torch.sum(y==1&y_hat==0)
    P=TP/(TP+FP)
    R=TP/(TP+FN)
    return 2*P*R/(P+R)

def gd(beta,r):
    '''梯度下降算法,beta为参数矩阵，r为学习率'''
    with torch.no_grad():
        beta-=r*beta.grad
        beta.grad.zero_()

def log_loss(x,beta,y):
    '''参数beta的极小对数似然函数'''
    # a=torch.exp(beta@x)
    # return -torch.log(y.T*(a/1+a)+(1-y.T)*(1/1+a))
    # return -y*torch.log(a/(1+a))-(1-y)*torch.log(1/(1+a))
    return -y.T*(beta@x)+torch.log(1+torch.exp(beta@x))
    
def log_loss(x,beta,y):
    '''参数beta的极小对数似然函数'''
    # a=torch.exp(beta@x)
    # return -torch.log(y.T*(a/1+a)+(1-y.T)*(1/1+a))
    # return -y*torch.log(a/(1+a))-(1-y)*torch.log(1/(1+a))
    return -y.T*(beta@x)+torch.log(1+torch.exp(beta@x))
    
    
epoches = 100 #训练轮次
r = 0.05	#学习率
theta = 0.5	#阈值
# 初始化beta矩阵
beta = torch.rand(1, features.size(0) + 1, dtype=float, requires_grad=True)
X = torch.cat([features, torch.ones(1, features.size(1), dtype=float)], dim=0)
F1_history = []
for epoch in range(0, epoches):
    p1 = logistic(X, beta)  # 使用更新过的beta计算X是正样本的概率
    loss = log_loss(X, beta, labels)  # 计算本轮的负对数似然函数
    loss.sum().backward()  # 计算负对数似然函数对beta的导数
    gd(beta, r)  # 梯度下降
    y_hat = prediction(p1, theta).reshape(labels.shape)  # 得到预测值，并转换为与labels相同的形式
    F1 = F1_score(y_hat, labels)  # 计算F1值
    F1_history.append(F1)  # 储存F1值，方便后续画图
    print(f'eopch：{epoch+1},F1_score：{F1}')
# 画F1 score 曲线
plt.plot(np.linspace(0, epoches, epoches), F1_history)
plt.ylim(0, 1.0)
plt.title('F1 score')
plt.xlabel('epoch')
plt.ylabel('F1')
# 保存第二张图
# plt.savefig("f1_score.png")
plt.show()    

# 将边界显示在散点图中
# 画决策边界
w1 = beta[0, 0].item()
w2 = beta[0, 1].item()
b = beta[0, 2].item()
# 生成直线上的点
x_range = np.linspace(x_min, x_max, 100)
y = -w1 / w2 * x_range - b / w2

# 可视化数据和决策边界
plt.figure(figsize=(8, 6))
plt.scatter(features[0, :8], features[1, :8], c='red', marker='o', label='Positive')
plt.scatter(features[0, 9:], features[1, 9:], c='green', marker='x', label='Negative')
plt.plot(x_range, y, 'b-', label='Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Data with Decision Boundary')
plt.xlim(x_min, x_max)  # 设置 x 轴范围
plt.ylim(y_min, y_max)  # 设置 y 轴范围
plt.legend()
# 保存第三张图
# plt.savefig("data_with_boundary.png")
plt.show()

伪造数据集

import torch
import numpy as np
import matplotlib.pyplot as plt

def logistic(x,beta):
    '''计算预测值,返回的是对象属于1的概率'''
    # y_hat=torch.where(torch.sigmoid(beta@x)>theta,torch.tensor(1),torch.tensor(0))
    p1=torch.sigmoid(beta@x)
    return p1

def prediction(p1,theta):
    '''p1为sigmoid函数预测的样本为正例的概率，theta是阈值'''
    return torch.where(p1>theta,torch.tensor(1),torch.tensor(0))

def gd(beta,r):
    '''梯度下降算法,beta为参数矩阵，r为学习率'''
    with torch.no_grad():
        beta-=r*beta.grad
        beta.grad.zero_()

def log_loss(x,beta,y):
    '''参数beta的极小对数似然函数'''
    # a=torch.exp(beta@x)
    # return -torch.log(y.T*(a/1+a)+(1-y.T)*(1/1+a))
    # return -y*torch.log(a/(1+a))-(1-y)*torch.log(1/(1+a))
    return -y.T*(beta@x)+torch.log(1+torch.exp(beta@x))

def F1_score(y_hat,y):
    '''输入预测值y_hat和实际值y，计算F1_score评估模型性能'''
    TP=torch.sum(torch.where((y==1)&(y_hat==1),torch.tensor(1),torch.tensor(0)))
    # TP=torch.sum(y==1&y_hat==1)
    FP=torch.sum(torch.where((y==0)&(y_hat==1),torch.tensor(1),torch.tensor(0)))
    # FP=torch.sum(y==0&y_hat==1)
    FN=torch.sum(torch.where((y==1)&(y_hat==0),torch.tensor(1),torch.tensor(0)))
    # FN=torch.sum(y==1&y_hat==0)
    P=TP/(TP+FP)
    R=TP/(TP+FN)
    return 2*P*R/(P+R)
    
# 数据1、边界不清晰
# 设置随机种子，确保每次运行结果一致
np.random.seed(42)

# 生成正例数据
n_samples = 50
mean = [1, 1]  # 正例的中心点
cov = [[1, 0], [0, 1]]  # 协方差矩阵

positive_samples = np.random.multivariate_normal(mean, cov, n_samples)

# 生成负例数据
negative_samples = np.random.multivariate_normal([-1, -1], cov, n_samples)

# 合并数据
features = np.concatenate((positive_samples, negative_samples), axis=0).T
labels = np.concatenate((np.ones(n_samples), np.zeros(n_samples)))

# 打乱数据
indices = np.random.permutation(features.shape[1])
features = features[:, indices]
labels = labels[indices]

# 将数据转换为 PyTorch 张量
features = torch.tensor(features, dtype=torch.float)
labels = torch.tensor(labels, dtype=torch.float).reshape((-1, 1))


# 数据集2、边界清晰但是距离近
# np.random.seed(42)
# # 生成正例数据
# n_samples = 50
# mean_positive = [1, 1]  # 正例的中心点
# cov_positive = [[0.5, 0], [0, 0.5]]  # 协方差矩阵，减小方差让数据更集中

# positive_samples = np.random.multivariate_normal(mean_positive, cov_positive, n_samples)

# # 生成负例数据
# mean_negative = [-1, -1]  # 负例的中心点
# cov_negative = [[0.5, 0], [0, 0.5]]  # 协方差矩阵，减小方差让数据更集中

# negative_samples = np.random.multivariate_normal(mean_negative, cov_negative, n_samples)

# # 合并数据
# features = np.concatenate((positive_samples, negative_samples), axis=0).T
# labels = np.concatenate((np.ones(n_samples), np.zeros(n_samples)))

# # 打乱数据
# indices = np.random.permutation(features.shape[1])
# features = features[:, indices]
# labels = labels[indices]

# # 将数据转换为 PyTorch 张量
# features = torch.tensor(features, dtype=torch.float)
# labels = torch.tensor(labels, dtype=torch.float).reshape((-1, 1))



# 数据集3、边界清晰且距离远
# # 设置随机种子，确保每次运行结果一致
# np.random.seed(42)

# # 生成正例数据
# n_samples = 50
# mean_positive = [2, 2]  # 正例的中心点，将中心点向右上角移动
# cov_positive = [[0.5, 0], [0, 0.5]]  # 协方差矩阵，减小方差让数据更集中

# positive_samples = np.random.multivariate_normal(mean_positive, cov_positive, n_samples)

# # 生成负例数据
# mean_negative = [-2, -2]  # 负例的中心点，将中心点向左下角移动
# cov_negative = [[0.5, 0], [0, 0.5]]  # 协方差矩阵，减小方差让数据更集中

# negative_samples = np.random.multivariate_normal(mean_negative, cov_negative, n_samples)

# # 合并数据
# features = np.concatenate((positive_samples, negative_samples), axis=0).T
# labels = np.concatenate((np.ones(n_samples), np.zeros(n_samples)))

# # 打乱数据
# indices = np.random.permutation(features.shape[1])
# features = features[:, indices]
# labels = labels[indices]

# # 将数据转换为 PyTorch 张量
# features = torch.tensor(features, dtype=torch.float)
# labels = torch.tensor(labels, dtype=torch.float).reshape((-1, 1))

# 可视化数据
# 第一次画图，获取坐标轴范围和图像大小
x_min = -5
x_max = 5
y_min = -5
y_max = 5

# 第一次画图，使用设置的坐标范围
plt.figure(figsize=(8, 6))
plt.scatter(features[0, :], features[1, :], c=labels.numpy().flatten())
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Data Visualization')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.close()

epoches = 5
r = 0.05
theta = 0.5
# 初始化beta矩阵
beta = torch.rand(1, features.size(0) + 1, dtype=float, requires_grad=True)
X = torch.cat([features, torch.ones(1, features.size(1), dtype=float)], dim=0)
F1_history = []
for epoch in range(0, epoches):
    p1 = logistic(X, beta)  # 使用更新过的beta计算X是正样本的概率
    loss = log_loss(X, beta, labels)  # 计算本轮的负对数似然函数
    loss.sum().backward()  # 计算负对数似然函数对beta的导数
    gd(beta, r)  # 梯度下降
    y_hat = prediction(p1, theta).reshape(labels.shape)  # 得到预测值，并转换为与labels相同的形式
    F1 = F1_score(y_hat, labels)  # 计算F1值
    F1_history.append(F1)  # 储存F1值，方便后续画图
    print(f'eopch：{epoch+1},F1_score：{F1}')
# 画F1 score 曲线
plt.plot(np.linspace(0, epoches, epoches), F1_history)
plt.ylim(0, 1.0)
plt.title('F1 score')
plt.xlabel('epoch')
plt.ylabel('F1')
# 保存第二张图
# plt.savefig("f1_score.png")
plt.show()

# 将边界显示在散点图中
# 画决策边界
w1 = beta[0, 0].item()
w2 = beta[0, 1].item()
b = beta[0, 2].item()
# 生成直线上的点
x_range = np.linspace(x_min, x_max, 100)
y = -w1 / w2 * x_range - b / w2
# 第二次画图，使用设置的坐标范围
plt.figure(figsize=(8, 6))
plt.scatter(X[0, :], X[1, :], c=labels.numpy().flatten())
plt.plot(x_range, y, 'b-', label='Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Data Visualization with Decision Boundary')
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.legend()
# 保存第三张图
# plt.savefig("data_with_boundary.png")
plt.show()