Logistic regression is a classifier that weighs the input features to decide between "yes" and "no". The logistic regression function is

y = 1 / (1 + e^(-z))

where z is the same expression we used in linear regression, z = wx + b. This function is called the sigmoid (S-shaped) function.
1. The sigmoid function

As the name suggests, its graph is S-shaped. The sigmoid function is also called the logistic function; because it squashes any real input into the interval (0, 1), it can serve as an activation that maps outputs onto the two classes of a binary problem.
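As a quick illustration (a minimal sketch, not part of the original snippet), the sigmoid takes two lines of NumPy, and plotting it confirms the S shape:

import numpy as np
import matplotlib.pyplot as mp

def sigmoid(z):
    # squashes any real z into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.linspace(-10, 10, 200)
mp.plot(z, sigmoid(z))  # the curve rises from 0 to 1 in an S shape
mp.show()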
2. Simple classification with logistic regression
import numpy as np
import matplotlib.pyplot as mp
import sklearn.linear_model as lm

x = np.array([[3, 1],
              [2, 5],
              [1, 8],
              [6, 4],
              [5, 2],
              [3, 5],
              [4, 7],
              [4, -1]])
y = np.array([0, 1, 1, 0, 0, 1, 1, 0])
# Train the model
model = lm.LogisticRegression(solver='liblinear', C=1)
model.fit(x, y)
# Predict a few new samples
pred = model.predict([[1, 8], [5, 8], [8, 3]])
print(pred)
# Plot
mp.figure('Simple Classification', facecolor='lightgray')
mp.title('Simple Classification', fontsize=16)
# Draw the classification boundary
n = 500
l, r = x[:, 0].min() - 1, x[:, 0].max() + 1
b, t = x[:, 1].min() - 1, x[:, 1].max() + 1
grid_x, grid_y = np.meshgrid(np.linspace(l, r, n),
                             np.linspace(b, t, n))
# Predict the class of every grid point
mesh_x = np.column_stack((grid_x.ravel(), grid_y.ravel()))
grid_z = model.predict(mesh_x)
# Reshape grid_z back to (500, 500)
grid_z = grid_z.reshape(grid_x.shape)
mp.pcolormesh(grid_x, grid_y, grid_z, cmap='gray')
mp.scatter(x[:, 0], x[:, 1], s=80,
           c=y, cmap='brg_r', label='Samples')
mp.legend()
mp.show()

Above we defined the samples x and labels y and fit a logistic regression model on them. With a binary problem this small, the pattern is easy to read off directly: whenever x1 > x2 the label is y = 0, otherwise y = 1.
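To check that the model really learned this rule, we can inspect the fitted weights; this is a hypothetical extra check, not part of the original post. LogisticRegression exposes them as coef_ and intercept_:

# Continuing from model.fit(x, y) above
w = model.coef_[0]        # one weight per feature, shape (2,)
b0 = model.intercept_[0]  # bias term
print(w, b0)
# predict() returns class 1 when w[0]*x1 + w[1]*x2 + b0 > 0; for this
# data we would expect w[0] < 0 and w[1] > 0, matching "y = 1 when x2 > x1".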
Next, note how the plot is built. grid_x and grid_y span the plotting area, which is why we take the minimum and maximum of each feature, padded by 1. mesh_x combines two operations: ravel flattens each 500x500 grid into a one-dimensional array, and column_stack then stacks the two flattened arrays side by side as columns, so mesh_x ends up as a two-dimensional array of shape (250000, 2), one row per grid point. That is exactly the shape predict expects. The resulting grid_z, reshaped back to (500, 500), is what pcolormesh uses to decide the color of each cell, which is why the image splits into two regions corresponding to the predicted classes 0 and 1.
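For intuition, here is a tiny worked example of the same grid construction with n = 3 instead of 500 (purely illustrative), making the shapes concrete:

import numpy as np

gx, gy = np.meshgrid(np.linspace(0, 2, 3), np.linspace(0, 2, 3))
print(gx.shape)   # (3, 3)
pts = np.column_stack((gx.ravel(), gy.ravel()))
print(pts.shape)  # (9, 2): one (x, y) row per grid point
print(pts[:3])    # the first three grid points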
3. Naive Bayes classification
import numpy as np
import matplotlib.pyplot as mp
import sklearn.naive_bayes as nb

# Load the samples
data = np.loadtxt('multiple1.txt', delimiter=',')
x = data[:, :2]
y = data[:, -1]
print(x.shape, y.shape)
# Train the model
model = nb.GaussianNB()
model.fit(x, y)
# Plot
mp.figure('Naive Bayes Classification', facecolor='lightgray')
mp.title('Naive Bayes Classification', fontsize=16)
# Draw the classification boundary
n = 500
l, r = x[:, 0].min() - 1, x[:, 0].max() + 1
b, t = x[:, 1].min() - 1, x[:, 1].max() + 1
grid_x, grid_y = np.meshgrid(np.linspace(l, r, n),
                             np.linspace(b, t, n))
# Predict the class of every grid point
mesh_x = np.column_stack((grid_x.ravel(), grid_y.ravel()))
grid_z = model.predict(mesh_x)
# Reshape grid_z back to (500, 500)
grid_z = grid_z.reshape(grid_x.shape)
mp.pcolormesh(grid_x, grid_y, grid_z, cmap='gray')
mp.scatter(x[:, 0], x[:, 1], s=80,
           c=y, cmap='brg_r', label='Samples')
mp.legend()
mp.show()

This is almost identical to the previous example. The labels in the loaded file take the values 0, 1, 2 and 3, so the plot ends up divided into four colored regions. However, with this many samples the border between the blue and green regions is still somewhat hard to make out, which motivates the next version, where we plot only a held-out subset of the samples.
import numpy as np
import matplotlib.pyplot as mp
import sklearn.naive_bayes as nb
import sklearn.model_selection as ms

# Load the samples
data = np.loadtxt('multiple1.txt', delimiter=',')
x = data[:, :2]
y = data[:, -1]
print(x.shape, y.shape)
# Split off 25% of the data as a test set, then train on the rest
train_x, test_x, train_y, test_y = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)
model = nb.GaussianNB()
model.fit(train_x, train_y)
# Check the accuracy of the predictions on the test set
pred_test_y = model.predict(test_x)
acc = (test_y == pred_test_y).sum() / test_y.size
print(acc)
# Plot
mp.figure('Naive Bayes Classification', facecolor='lightgray')
mp.title('Naive Bayes Classification', fontsize=16)
# Draw the classification boundary
n = 500
l, r = x[:, 0].min() - 1, x[:, 0].max() + 1
b, t = x[:, 1].min() - 1, x[:, 1].max() + 1
grid_x, grid_y = np.meshgrid(np.linspace(l, r, n),
                             np.linspace(b, t, n))
# Predict the class of every grid point
mesh_x = np.column_stack((grid_x.ravel(), grid_y.ravel()))
grid_z = model.predict(mesh_x)
# Reshape grid_z back to (500, 500)
grid_z = grid_z.reshape(grid_x.shape)
mp.pcolormesh(grid_x, grid_y, grid_z, cmap='gray')
mp.scatter(test_x[:, 0], test_x[:, 1], s=80,
           c=test_y, cmap='brg_r', label='Samples')
mp.legend()
mp.show()

Here we split the data with ms.train_test_split(x, y, test_size=0.25, random_state=7); the test_size argument reserves a quarter of the samples as a test set, and only those are drawn in the scatter plot. With fewer points the figure becomes much easier to read, instead of everything being crowded together.
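As a minimal sketch of what the split returns (toy data, invented purely for illustration):

import numpy as np
import sklearn.model_selection as ms

x = np.arange(40).reshape(20, 2)  # 20 toy samples, 2 features each
y = np.arange(20) % 4             # 4 toy class labels
train_x, test_x, train_y, test_y = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)
print(train_x.shape, test_x.shape)  # (15, 2) (5, 2): 25% held out
# random_state fixes the shuffle, so the split is reproducible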
4. Cross-validation
import numpy as np
import matplotlib.pyplot as mp
import sklearn.naive_bayes as nb
import sklearn.model_selection as ms

# Load the samples
data = np.loadtxt('multiple1.txt', delimiter=',')
x = data[:, :2]
y = data[:, -1]
print(x.shape, y.shape)
# Split the data, then train the model
train_x, test_x, train_y, test_y = ms.train_test_split(
    x, y, test_size=0.25, random_state=7)
model = nb.GaussianNB()
# Cross-validation
acc = ms.cross_val_score(model, train_x, train_y,
                         cv=5, scoring='accuracy')
print(acc.mean())
pw = ms.cross_val_score(model, train_x, train_y,
                        cv=5, scoring='precision_weighted')
print(pw.mean())
rw = ms.cross_val_score(model, train_x, train_y,
                        cv=5, scoring='recall_weighted')
print(rw.mean())
f1 = ms.cross_val_score(model, train_x, train_y,
                        cv=5, scoring='f1_weighted')
print(f1.mean())
model.fit(train_x, train_y)
# Check the accuracy of the predictions on the test set
pred_test_y = model.predict(test_x)
acc = (test_y == pred_test_y).sum() / test_y.size
print(acc)
# Plot
mp.figure('Naive Bayes Classification', facecolor='lightgray')
mp.title('Naive Bayes Classification', fontsize=16)
# Draw the classification boundary
n = 500
l, r = x[:, 0].min() - 1, x[:, 0].max() + 1
b, t = x[:, 1].min() - 1, x[:, 1].max() + 1
grid_x, grid_y = np.meshgrid(np.linspace(l, r, n),
                             np.linspace(b, t, n))
# Predict the class of every grid point
mesh_x = np.column_stack((grid_x.ravel(), grid_y.ravel()))
grid_z = model.predict(mesh_x)
# Reshape grid_z back to (500, 500)
grid_z = grid_z.reshape(grid_x.shape)
mp.pcolormesh(grid_x, grid_y, grid_z, cmap='gray')
mp.scatter(test_x[:, 0], test_x[:, 1], s=80,
           c=test_y, cmap='brg_r', label='Samples')
mp.legend()
mp.show()

This time we also ran cross-validation on the training set before fitting. Of the four cross_val_score calls above, the first scores each fold by accuracy, the second by weighted precision, the third by weighted recall, and the fourth by the weighted F1 score (the classification counterpart of the r2 score we used for regression).
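To see what the '_weighted' scorers measure, here is a minimal sketch using sklearn.metrics directly on invented labels: the metric is computed per class and then averaged, weighting each class by its number of true samples.

import sklearn.metrics as sm

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
# Per-class precision, then the support-weighted average of those values
print(sm.precision_score(y_true, y_pred, average=None))
print(sm.precision_score(y_true, y_pred, average='weighted'))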
Here is a quick recap of some definitions we covered earlier:

Accuracy = correctly predicted samples / all samples = (TP + TN) / total
Precision = positives predicted as positive / all predicted positives = TP / (TP + FP)
Recall = positives predicted as positive / all true positives = TP / (TP + FN)
F1 = 2 * Precision * Recall / (Precision + Recall) (the harmonic mean of precision and recall)
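A minimal sketch that plugs one set of hypothetical counts into these formulas (the numbers are made up):

# Hypothetical counts for a binary problem
TP, FP, FN, TN = 40, 5, 10, 45
accuracy = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)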
Precision matters when we care about how often a positive prediction turns out to be correct, while recall tells us how many of the truly positive samples our predictions managed to cover; each is the right measure in different settings. An F1 of about 0.99 here means the model's predictions are very reliable.
If we want to examine these evaluation results in more detail, we can add the following code (note that it needs sklearn.metrics, imported here as sm):
import sklearn.metrics as sm

# Print the confusion matrix
cm = sm.confusion_matrix(test_y, pred_test_y)
print(cm)
# Print the classification report
cr = sm.classification_report(test_y, pred_test_y)
print(cr)
This displays the confusion matrix and the full classification report.
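For reference, a self-contained toy run (labels invented purely for illustration) shows what the output looks like: the report tabulates precision, recall, f1-score and support for every class, so the metrics above can be read off per class.

import sklearn.metrics as sm

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]
print(sm.confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted
print(sm.classification_report(y_true, y_pred))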
