Logistic regression (对数几率回归), often simply called logistic regression, is used for classification tasks.
Pros: low computational cost; easy to understand and implement.
Cons: prone to underfitting; classification accuracy may not be high.
When a linear model is used for binary classification, we need a monotone, differentiable function that links the true label y to the prediction of the linear regression model. In binary classification the output is either 0 or 1, while a linear model produces an arbitrary continuous value, so the two must be mapped onto each other. A first candidate is the Heaviside step function (unit step function):
$$f(x) = \begin{cases} 0 & x<0 \\ 0.5 & x=0 \\ 1 & x>0 \end{cases}$$
However, this function is discontinuous, which is inconvenient for the later steps, so we use another monotone differentiable function that approximates the step function: the logistic (sigmoid) function.
$$y = \frac{1}{1+e^{-z}}$$
Feeding the output z of the linear regression model through this function gives g(z); a sample is classified as 1 when g(z) > 0.5 (equivalently, when z > 0), and as 0 otherwise.
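As a quick illustration (the values of theta and x below are made up, not from the book), the mapping and the 0.5 threshold look like this in code:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.array([0.5, -1.2, 2.0])   # illustrative parameters
x = np.array([1.0, 0.3, 0.8])        # first component is the bias feature
z = theta @ x                        # linear score theta^T x
print(z, sigmoid(z), 1 if sigmoid(z) > 0.5 else 0)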
Derivation
Let the linear regression model be $z = \theta^{T}x$.
$$g(z) = \frac{1}{1+e^{-\theta^{T}x}} \Longrightarrow \frac{1-g(z)}{g(z)} = e^{-\theta^{T}x}$$
Let g(z) be the probability that the sample belongs to class 1; then:
$$P(y=1 \mid x;\theta) = \frac{1}{1+e^{-\theta^{T}x}} = h_\theta(x)$$
$$P(y=0 \mid x;\theta) = \frac{e^{-\theta^{T}x}}{1+e^{-\theta^{T}x}} = 1-h_\theta(x)$$
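Dividing the two probabilities and taking the logarithm shows where the name "log odds" (对数几率) comes from: the linear model directly predicts the log odds of the positive class.

$$\ln\frac{P(y=1\mid x;\theta)}{P(y=0\mid x;\theta)} = \ln\frac{h_\theta(x)}{1-h_\theta(x)} = \theta^{T}x$$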
Combining the two cases gives a single expression for each sample (written as a "loss" here, it is in fact the per-sample likelihood):
$$Loss(h_\theta(x),y) = h_\theta(x)^{y}\,(1-h_\theta(x))^{1-y}$$
By the method of maximum likelihood, we seek the parameters that maximize the joint likelihood over all n samples:
$$Cost(h_\theta(x),y) = \prod_{i}^{n} h_\theta(x^{i})^{y^{i}}\,(1-h_\theta(x^{i}))^{1-y^{i}}$$
Taking the logarithm turns the product into a sum (the log-likelihood):
$$J(\theta) = \sum_{i}^{n}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$
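A minimal numeric check of this formula (the tiny dataset and theta below are made up for illustration):

import numpy as np

X = np.array([[1.0, 2.0, 1.5],
              [1.0, -1.0, 0.5],
              [1.0, 0.3, -2.0]])        # one sample per row, first column is the bias
y = np.array([1, 0, 1])
theta = np.array([0.1, -0.2, 0.3])

h = 1 / (1 + np.exp(-X @ theta))        # h_theta(x^i) for every sample
J = np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
print(J)                                # log-likelihood; closer to 0 is better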
Gradient ascent is used to find the maximum-likelihood estimate of θ:
$$\theta_j := \theta_j + \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$
Expanding the derivative with the chain rule:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{\partial J(\theta)}{\partial h_\theta(x)} \cdot \frac{\partial h_\theta(x)}{\partial \theta^{T}x} \cdot \frac{\partial \theta^{T}x}{\partial \theta_j}$$
$$\frac{\partial J(\theta)}{\partial h_\theta(x)} = y\cdot\frac{1}{h_\theta(x)} + (y-1)\cdot\frac{1}{1-h_\theta(x)}$$
$$\frac{\partial h_\theta(x)}{\partial \theta^{T}x} = \frac{e^{-\theta^{T}x}}{(1+e^{-\theta^{T}x})^{2}} = h_\theta(x)\,(1-h_\theta(x))$$
$$\frac{\partial \theta^{T}x}{\partial \theta_j} = x_j$$
Multiplying the three factors and simplifying:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \left(\frac{y}{h_\theta(x)} - \frac{1-y}{1-h_\theta(x)}\right)h_\theta(x)\,(1-h_\theta(x))\,x_j = (y-h_\theta(x))\,x_j$$
Summing this per-sample gradient over all m training samples gives the update rule:
$$\theta_j := \theta_j + \alpha\sum_{i}^{m}(y^{i} - h_\theta(x^{i}))\,x_j^{i}$$
Iterating this update yields the best regression parameters.
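Stacking the m samples into a matrix X (one sample per row) and the labels into a vector y, the component-wise update can be written as a single matrix operation, which is exactly what the code below does, with σ denoting the sigmoid applied element-wise:

$$\theta := \theta + \alpha\,X^{T}\bigl(\mathbf{y} - \sigma(X\theta)\bigr)$$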
Implementing the algorithm
The example below again comes from Machine Learning in Action; the implementation turns the parameter update into matrix operations, which greatly simplifies the computation:
# -*- coding:utf-8 -*-
"""
@author liuning
@date 2019/10/26 16:13
@File logRegres.py
@Description logistic regression via gradient ascent
"""
from numpy import mat, shape, ones, exp, array, arange


def loadDataSet():
    """Load the sample data; each row is [1.0, x1, x2] plus a class label."""
    dataMat = []; labelMat = []
    with open('testSet.txt', 'r') as f:
        for line in f.readlines():
            lineArr = line.strip().split()
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
    return dataMat, labelMat


def sigmoid(inX):
    """Sigmoid function: 1/(1+exp(-inX))."""
    return 1 / (1 + exp(-inX))


def gradAscent(dataMat, labelMat):
    """Batch gradient ascent."""
    dataMat = mat(dataMat)                   # m x n matrix, one sample per row
    labelMat = mat(labelMat).transpose()     # m x 1 column vector of labels
    m, n = shape(dataMat)
    alpha = 0.001                            # step size
    generations = 500                        # number of iterations
    weights = ones((n, 1))
    # Update rule from the derivation above: theta := theta + alpha * X^T (y - h)
    for k in range(generations):
        h = sigmoid(dataMat * weights)
        error = labelMat - h
        weights = weights + alpha * dataMat.transpose() * error
    return weights


def plotBestFit(wei):
    """Plot the data and the fitted decision boundary."""
    import matplotlib.pyplot as plt
    weights = wei.getA()
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    # Decision boundary: 0 = w0 + w1*x1 + w2*x2  =>  x2 = (-w0 - w1*x1) / w2
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()


if __name__ == '__main__':
    inMat, labelMat = loadDataSet()
    weights = gradAscent(inMat, labelMat)
    print(weights)
    plotBestFit(weights)
Stochastic Gradient Ascent (SGA)
When the dataset is large, computing over all samples at once is too expensive. Stochastic gradient ascent updates the parameters using a single sample at a time; because the parameters can be updated incrementally as new data arrives, it is an "online learning" algorithm.
A related variant is mini-batch gradient ascent, which updates the parameters with batch_size samples at a time. It is a compromise between the two extremes and is commonly used in deep learning; a sketch is given below, before the book's single-sample implementation.
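A minimal sketch of the mini-batch variant (the function name and default batch_size are illustrative, not from the book):

import numpy as np

def miniBatchGradAscent(dataMatrix, classLabels, batch_size=16, numIter=150, alpha=0.01):
    X = np.array(dataMatrix, dtype=float)
    y = np.array(classLabels, dtype=float)
    m, n = X.shape
    weights = np.ones(n)
    for _ in range(numIter):
        perm = np.random.permutation(m)                 # shuffle the samples each pass
        for start in range(0, m, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            h = 1 / (1 + np.exp(-Xb @ weights))
            weights = weights + alpha * Xb.T @ (yb - h)  # ascend using one mini-batch
    return weights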
import random

def stocGradAscent0(dataMatrix, classLabel, numIter=150):
    """
    Stochastic gradient ascent.
    :param dataMatrix: feature matrix (list of lists or ndarray)
    :param classLabel: class labels
    :param numIter: number of passes over the data
    :return: weights, an ndarray
    """
    dataMatrix = array(dataMatrix)
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # The step size decays with the iteration count but never reaches 0
            alpha = 4 / (1.0 + j + i) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))  # pick a not-yet-used sample at random
            sampleIdx = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sampleIdx] * weights))
            error = classLabel[sampleIdx] - h
            weights = weights + alpha * error * dataMatrix[sampleIdx]  # update with this single sample
            del(dataIndex[randIndex])                    # each sample is used once per pass
    return weights
Here alpha is adjusted with the iteration count: it gradually decreases, so later updates have a smaller and smaller effect on the parameters, but because of the constant term 0.01 it never reaches 0. The constant should be tuned to the problem; in general, if the problem being modeled changes over time, the constant can be increased so that new samples produce larger updates to the regression coefficients.
Picking the sample at random reduces periodic fluctuations in the coefficients. The short printout below illustrates the step-size schedule.
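A quick look at the schedule used in the code above (j is the pass index, i is the index within a pass):

for j, i in [(0, 0), (0, 50), (10, 50), (100, 50)]:
    print(j, i, 4 / (1.0 + j + i) + 0.01)   # decays toward the 0.01 floor, never reaches 0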
A practical example
The example comes from Chapter 5 of Machine Learning in Action.
Collect the data
https://archive.ics.uci.edu/ml/datasets.php
Horse Colic dataset
Prepare the data
Fill missing values in the training data; drop entries with missing values from the test set.
Analyze the data
Visualize and inspect the data.
def loadData():
    """Load the training data: 21 features (plus a bias term) and a label."""
    dataMat = []; labelMat = []
    with open("horseColicTraining.txt", 'r') as f:
        for line in f.readlines():
            lineArr = line.strip().split()
            lineArr = [float(x) for x in lineArr]
            dataMat.append([1.0] + lineArr[0:21])
            labelMat.append(lineArr[21])
    return dataMat, labelMat


def loadTestData():
    """Load the test data in the same layout as the training data."""
    dataMat = []; labelMat = []
    with open("horseColicTest.txt", 'r') as f:
        for line in f.readlines():
            lineArr = line.strip().split()
            lineArr = [float(x) for x in lineArr]
            dataMat.append([1.0] + lineArr[0:21])
            labelMat.append(lineArr[21])
    return dataMat, labelMat
Train the algorithm
Use stochastic gradient ascent to find the best regression parameters.
import random
from numpy import array, exp, ones, shape


def sigmoid(x):
    return 1 / (1 + exp(-x))


def stocGradAscent(inMat, labels, numIter=150):
    """
    Stochastic gradient ascent.
    :param inMat: feature matrix
    :param labels: class labels
    :param numIter: number of passes over the data
    :return: weights
    """
    inMat = array(inMat)
    m, n = shape(inMat)
    weights = ones(n)
    for i in range(numIter):
        dataIndex = list(range(m))
        for j in range(m):
            alpha = 4.0 / (1.0 + i + j) + 0.01                   # decaying step size
            randIndex = int(random.uniform(0, len(dataIndex)))   # pick one sample at random
            h = sigmoid(sum(inMat[randIndex] * weights))
            error = labels[randIndex] - h
            weights = weights + alpha * error * inMat[randIndex]
    return weights
Test the algorithm
inMat, labelMat = loadData()
weights = stocGradAscent(inMat, labelMat, 500)
testMat, labelTest = loadTestData()
m, n = shape(testMat)
result = []
for i in range(m):
    # Classify as 1 when the predicted probability exceeds 0.5
    if sigmoid(sum(testMat[i] * weights)) > 0.5:
        result.append(1.0)
    else:
        result.append(0.0)
error = 0
for i in range(m):
    if result[i] != labelTest[i]:
        error += 1
print("errorRate:" + str(error * 100 / m) + "%")
As an appendix, here is a solution to the assignment from Andrew Ng's course:
# -*- coding:utf-8 -*-
"""
Logistic regression with a neural-network mindset (Andrew Ng course assignment).
"""
import numpy as np
import matplotlib.pyplot as plt
from andrew.logistic.lr_utils import load_dataset

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
img_width, img_height = train_set_x_orig.shape[1], train_set_x_orig.shape[2]
print("image size: " + str(img_width) + "*" + str(img_height) + "*3")
num_pics_train = train_set_y.shape[1]
num_pics_test = test_set_y.shape[1]
print("training images: " + str(num_pics_train))
print("test images: " + str(num_pics_test))
# -1 lets reshape infer the flattened dimension (width * height * 3)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
print("training set shape: " + str(train_set_x_flatten.shape))
print("test set shape: " + str(test_set_x_flatten.shape))
# Normalize pixel values to [0, 1]
train_set_x = train_set_x_flatten / 255
test_set_x = test_set_x_flatten / 255
print("training label shape: " + str(train_set_y.shape))


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def gradient(learning_rate=0.5, iteration=1000):
    w = np.zeros(shape=(train_set_x.shape[0], 1))
    b = 0
    costs = []
    epsilon = 1e-5                      # avoid log(0)
    m = train_set_x.shape[1]            # number of training samples
    for i in range(iteration):
        A = sigmoid(np.dot(w.T, train_set_x) + b)
        if i % 100 == 0:
            # Cross-entropy cost, recorded every 100 iterations
            costs.append(-(1 / m) * np.sum(np.dot(train_set_y, np.log(A + epsilon).T)
                                           + np.dot((1 - train_set_y), np.log(1 - A + epsilon).T)))
        # Gradients of the cost with respect to w and b
        dw = (1 / m) * np.dot(train_set_x, (A - train_set_y).T)
        db = (1 / m) * np.sum(A - train_set_y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b, costs


def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    A = sigmoid(np.dot(w.T, X) + b)
    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    return Y_prediction


w, b, costs = gradient(learning_rate=0.1, iteration=2000)
Y_predict_test = predict(w, b, test_set_x)
Y_predict_train = predict(w, b, train_set_x)
print(Y_predict_test)
print("training accuracy:", format(100 - np.mean(np.abs(Y_predict_train - train_set_y)) * 100), "%")
print("test accuracy:", format(100 - np.mean(np.abs(Y_predict_test - test_set_y)) * 100), "%")
print(costs)
costs = np.squeeze(costs)
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.show()
References:
Machine Learning in Action (《机器学习实战》), Chapter 5
Machine Learning (《机器学习》), Section 3.3
机器学习实战教程(六)
Gradient ascent algorithm (梯度上升算法)
Logistic Regression with a Neural Network Mindset (具有神经网络思维的Logistic回归)