Multivariable Binary Classification with Gradient Descent
First, let's be clear about what binary classification is.
Binary classification means assigning each data point to one of two possible classes, for example the classes "+1" and "-1".
Multivariable binary classification makes the same two-way assignment, but based on multiple input variables (features) at once.
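Before walking through the code, it helps to see the decision rule itself. A linear binary classifier predicts a class from the sign of a weighted sum of the inputs plus a bias. Here is a minimal sketch with made-up weights, purely for illustration:
import numpy as np

weights = np.array([2.0, -1.0])  # one weight per input variable (illustrative values)
bias = 0.5

def predict(xi):
    # Classify by the sign of the linear score w . x + b
    return 1 if np.dot(weights, xi) + bias >= 0 else -1

print(predict(np.array([1.0, 0.5])))   # 1  (score = 2.0 - 0.5 + 0.5 = 2.0)
print(predict(np.array([-1.0, 1.0])))  # -1 (score = -2.0 - 1.0 + 0.5 = -2.5)
Training is then the problem of finding weights and a bias that put each class on its own side of the line w . x + b = 0.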
The code below implements binary classification with gradient descent. (Strictly speaking, the update rule used here is the perceptron learning rule, which can be read as stochastic gradient descent on the perceptron loss.)
Imports
import numpy as np
import matplotlib.pyplot as plt
Data preparation
# Dataset: 11 samples, 2 features each
x=np.array([[0.180,0.001*1],[0.100,0.001*2],
            [0.160,0.001*3],[0.080,0.001*4],
            [0.090,0.001*5],[0.110,0.001*6],
            [0.120,0.001*7],[0.170,0.001*8],
            [0.150,0.001*9],[0.140,0.001*10],
            [0.130,0.001*11]]) # shape: 11 x 2
# Class labels (one per sample)
y=np.array([+1,-1,+1,-1,-1,-1,-1,+1,+1,+1,-1])
Next, we read the number of samples n_samples and the number of features n_features from the dataset x:
n_samples,n_features=x.shape
We normally use x.shape to get an array's dimensions. Here it returns the tuple (11, 2), and tuple unpacking assigns n_samples=11 and n_features=2.
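As a quick illustration, independent of the dataset above:
import numpy as np

demo = np.array([[1, 2], [3, 4], [5, 6]])  # 3 samples, 2 features each
print(demo.shape)        # (3, 2)
rows, cols = demo.shape  # tuple unpacking, same pattern as n_samples, n_features = x.shape
print(rows, cols)        # 3 2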
Parameter initialization
weights=np.zeros(n_features)
bias=0.0
lr=0.0001
epoch=10000
weights=np.zeros(n_features)
This line assigns one weight per feature and initializes them all to 0. Likewise, bias starts at 0.0; lr is the learning rate (the step size of each update), and epoch caps the number of passes over the dataset.
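For example, as a throwaway illustration separate from the training script:
import numpy as np

n_features = 2
weights = np.zeros(n_features)
print(weights)        # [0. 0.] -- one weight per feature, all starting at 0
print(weights.shape)  # (2,)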
The training loop
for _ in range(epoch):
    mis_flag=False
    for i in range(n_samples):
        # Compute the prediction
        y_pre=np.dot(weights,x[i])+bias
        # Adjust the weights and bias based on the prediction and the true label
        if y_pre*y[i]<=0:
            weights=weights+lr*y[i]*x[i]
            bias=bias+lr*y[i]
            mis_flag=True
    if not mis_flag:
        break
print(weights)
print(bias)
At the start of each epoch we reset mis_flag to False, meaning no misclassified sample has been found in this pass yet.
Then, for each sample, we compute the prediction y_pre as the dot product of the weights w and the sample x[i], plus the bias b.
Next, the product of the prediction and the true label tells us whether the sample is misclassified. If y_pre*y[i] <= 0, the prediction and the true label have opposite signs (or the prediction is exactly zero), so the sample is misclassified. In that case we update the weights and the bias, and set mis_flag to True to record that this pass contained at least one misclassified sample.
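To make the update concrete, here is a sketch of a single update step on one hypothetical misclassified sample; the numbers are illustrative, not taken from the dataset above:
import numpy as np

weights = np.zeros(2)
bias = 0.0
lr = 0.0001

xi = np.array([0.180, 0.001])  # one sample (hypothetical)
yi = +1                        # its true label

y_pre = np.dot(weights, xi) + bias  # 0.0, and 0.0 * yi <= 0, so it counts as misclassified
if y_pre * yi <= 0:
    weights = weights + lr * yi * xi  # nudge the weights toward the sample
    bias = bias + lr * yi             # nudge the bias toward the label's sign
print(weights)  # [1.8e-05 1.0e-07]
print(bias)     # 0.0001
Each such step moves the decision boundary a little in the direction that would classify this sample correctly.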
Finally,
if not mis_flag: break
means that if a full pass over the data produces no misclassified sample, the model already separates the training data and we can stop early. Note that this condition is only guaranteed to occur when the data are linearly separable; the epoch cap of 10000 ensures the loop terminates even when they are not.
Comparing predictions with the true labels
y_pre=[]
for i in range(n_samples):
    # Compute the prediction
    pre=np.dot(weights,x[i])+bias
    if pre>=0:
        y_pre.append(1)
    else:
        y_pre.append(-1)
print(y)
print(y_pre)
We append each sample's predicted label to y_pre, then print the true labels next to the predictions to check whether the classifier gets them all right.
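As a side note, the same loop can be written in vectorized NumPy; the sketch below assumes the trained weights, bias, x, and y from above:
import numpy as np

scores = x @ weights + bias            # all predictions at once, shape (n_samples,)
y_pre = np.where(scores >= 0, 1, -1)   # map scores to +1 / -1 labels

print(y)
print(y_pre)
print("accuracy:", np.mean(y_pre == y))  # fraction of correctly classified samples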
Visualization
# Plot the decision boundary and the data points, colored by true class
x1=np.arange(0.135,0.145,0.0001)
# On the boundary weights[0]*x1 + weights[1]*x2 + bias = 0, so solve for x2
x2=(-weights[0]*x1-bias)/weights[1]
plt.plot(x1,x2,'r',label='Decision Boundary')
for ii in range(n_samples):
    if y[ii]==+1:
        plt.plot(x[ii][0],x[ii][1],'bo')  # positive class in blue
    else:
        plt.plot(x[ii][0],x[ii][1],'ro')  # negative class in red
plt.legend()
plt.show()
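The original comment mentions misclassified points, but the plot itself only colors points by their true class. If you also want to highlight any misclassified samples, one possible sketch, assuming the trained weights, bias, x, and y from above, is:
import numpy as np
import matplotlib.pyplot as plt

scores = x @ weights + bias
mis = scores * y <= 0  # boolean mask: True where a sample is misclassified

# Draw a hollow black circle around each misclassified point
plt.scatter(x[mis, 0], x[mis, 1], s=120, facecolors='none',
            edgecolors='k', label='Misclassified')
plt.legend()
plt.show()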
Complete code
import numpy as np
import matplotlib.pyplot as plt
# Dataset: 11 samples, 2 features each
x=np.array([[0.180,0.001*1],[0.100,0.001*2],
            [0.160,0.001*3],[0.080,0.001*4],
            [0.090,0.001*5],[0.110,0.001*6],
            [0.120,0.001*7],[0.170,0.001*8],
            [0.150,0.001*9],[0.140,0.001*10],
            [0.130,0.001*11]]) # shape: 11 x 2
# Class labels (one per sample)
y=np.array([+1,-1,+1,-1,-1,-1,-1,+1,+1,+1,-1])
n_samples,n_features=x.shape
# Initialize weights and bias to 0
weights=np.zeros(n_features)
bias=0.0
lr=0.0001
epoch=10000
for _ in range(epoch):
    mis_flag=False
    for i in range(n_samples):
        # Compute the prediction
        y_pre=np.dot(weights,x[i])+bias
        # Adjust the weights and bias based on the prediction and the true label
        if y_pre*y[i]<=0:
            weights=weights+lr*y[i]*x[i]
            bias=bias+lr*y[i]
            mis_flag=True
    if not mis_flag:
        break
print(weights)
print(bias)
y_pre=[]
for i in range(n_samples):
    # Compute the prediction
    pre=np.dot(weights,x[i])+bias
    if pre>=0:
        y_pre.append(1)
    else:
        y_pre.append(-1)
print(y)
print(y_pre)
# Plot the decision boundary and the data points, colored by true class
x1=np.arange(0.135,0.145,0.0001)
x2=(-weights[0]*x1-bias)/weights[1]
plt.plot(x1,x2,'r',label='Decision Boundary')
for ii in range(n_samples):
    if y[ii]==+1:
        plt.plot(x[ii][0],x[ii][1],'bo')
    else:
        plt.plot(x[ii][0],x[ii][1],'ro')
plt.legend()
plt.show()
Final result: the script prints the learned weights and bias, then the true and predicted labels, and finally displays the decision-boundary plot.
Multivariable Linear Regression with Gradient Descent
import numpy as np
# Dataset
x = np.array([[0.180, 0.001 * 1], [0.100, 0.001 * 2],
[0.160, 0.001 *