Machine Learning Algorithms (2): Classification with Logistic Regression (Code Implementation)

License: CC 4.0 BY-SA

Original link: https://tianchi.aliyun.com/specials/promotion/aicampml?invite_channel=1

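This post shows how to fit a logistic regression model with Python's sklearn library and visualize both the decision boundary and the predictions. The code trains a logistic regression model on six 2-D points, predicts the class of two new points, and plots the decision boundary with matplotlib (seaborn is imported as well, but is not called in this trimmed version).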

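The code below comes from the Alibaba Cloud platform. I only trimmed it so that it is easier for me to read while studying, and changed a few names, comments, and test data points. For the complete version, see https://tianchi.aliyun.com/specials/promotion/aicampml?invite_channel=1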
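
For orientation, the model fitted by the script can be written as follows. This is the standard binary logistic regression form rather than something stated in the original post; here $w$ is assumed to correspond to `lr_clf.coef_` and $w_0$ to `lr_clf.intercept_`.

```latex
P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^\top x + w_0)}},
\qquad
P(y = 1 \mid x) = 0.5 \iff w^\top x + w_0 = 0
```

The second relation explains why the script draws its contour at level 0.5: that contour is the straight line on which the linear score is zero.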

## basic numerical library
import numpy as np

## plotting
import matplotlib.pyplot as plt
import seaborn as sns  # imported in the original; not called below

## logistic regression model
from sklearn.linear_model import LogisticRegression

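# training data: six 2-D points and their binary labels
x_feature = np.array([[1, 2], [2, 2], [3, 1], [-1, -1], [-3, -4], [-2, -2]])
y_label = np.array([0, 0, 0, 1, 1, 1])

# create the model with default settings
lr_clf = LogisticRegression()

# fit the model to the data; it learns the linear combination w0 + w1*x1 + w2*x2
lr_clf = lr_clf.fit(x_feature, y_label)

## print the learned weights w and the intercept w0
print('the weight w', lr_clf.coef_)
print('the weight w0', lr_clf.intercept_)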

## model and boundary visualization
plt.figure()
plt.scatter(x_feature[:, 0], x_feature[:, 1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

nx, ny = 200, 100
x_min, x_max = plt.xlim()
y_min, y_max = plt.ylim()
x_grid, y_grid = np.meshgrid(np.linspace(x_min, x_max, nx), np.linspace(y_min, y_max, ny))
z_proba = lr_clf.predict_proba(np.c_[x_grid.ravel(), y_grid.ravel()])
z_proba = z_proba[:, 1].reshape(x_grid.shape)
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()

# new data visualization
plt.figure()

# The label is passed positionally because the old `s=` keyword of
# plt.annotate was renamed to `text=` in newer Matplotlib releases.
# `cmap` is dropped from these scatter calls: it has no effect without `c`.
x_feature1 = np.array([[2, -2]])
plt.scatter(x_feature1[:, 0], x_feature1[:, 1], s=50)
plt.annotate('new point 1', xy=(2, -2), xytext=(1, -2), color='blue',
             arrowprops=dict(arrowstyle='-|>', connectionstyle='arc3', color='red'))

x_feature2 = np.array([[1, 1]])
plt.scatter(x_feature2[:, 0], x_feature2[:, 1], s=50)
plt.annotate('new point 2', xy=(1, 1), xytext=(1, 2), color='red',
             arrowprops=dict(arrowstyle='-|>', connectionstyle='arc3', color='red'))

## train data
plt.scatter(x_feature[:,0],x_feature[:,1], c=y_label, s=50, cmap='viridis')
plt.title('Dataset')

# boundary
plt.contour(x_grid, y_grid, z_proba, [0.5], linewidths=2., colors='blue')

plt.show()

# model predict
y_label1 = lr_clf.predict(x_feature1)
y_label2 = lr_clf.predict(x_feature2)

print('new point 1 predicted class:\n', y_label1)
print('new point 2 predicted class:\n', y_label2)

# logistic regression is a probabilistic model, so the probability of
# each class ([P(y=0), P(y=1)]) can be queried as well
y_label1_proba = lr_clf.predict_proba(x_feature1)
y_label2_proba = lr_clf.predict_proba(x_feature2)

print(y_label1_proba, y_label2_proba)
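
To make the link between the printed weights and the printed probabilities concrete, here is a minimal sketch that rebuilds the two class probabilities by hand from a weight vector and an intercept. It uses only the standard library. The helper names `sigmoid` and `proba_from_weights` and the numbers in `w_demo` / `w0_demo` are hypothetical and chosen purely for illustration; they are not the values the fitted model produces.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def proba_from_weights(w, w0, x):
    """Return [P(y=0 | x), P(y=1 | x)] for a binary logistic model."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))  # linear score w.x + w0
    p1 = sigmoid(z)
    return [1.0 - p1, p1]


# Hypothetical weights for illustration only (NOT the fitted values).
w_demo, w0_demo = [-0.7, -0.7], -0.05

for point in ([2, -2], [1, 1]):
    p0, p1 = proba_from_weights(w_demo, w0_demo, point)
    label = 1 if p1 > 0.5 else 0  # class 1 only when its probability exceeds 0.5
    print(point, 'P(y=0) =', round(p0, 3), 'P(y=1) =', round(p1, 3), 'class', label)
```

To compare against the script, one could pass `lr_clf.coef_[0]` and `lr_clf.intercept_[0]` as `w` and `w0`; for a binary problem the result should then agree with `lr_clf.predict_proba` up to floating-point error.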