This post covers a code implementation of univariate linear regression, as follows.
First, the gradient descent method.
Following Andrew Ng's Machine Learning course and its slides, the univariate linear regression model (hypothesis), the cost function, and the optimization goal are as follows (restated here in standard notation):
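Hypothesis:    h_\theta(x) = \theta_0 + \theta_1 x
Cost function: J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
Goal:          \min_{\theta_0, \theta_1} J(\theta_0, \theta_1)
Update rule:   \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) for j = 0, 1, with both parameters updated simultaneously on each iteration.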
The code implementation is as follows:
import numpy as np
import matplotlib.pyplot as plt

# Load the training data: column 0 is the feature x, column 1 is the target y
a = np.loadtxt('ex1data1.txt')
m = a.shape[0]          # number of training examples
print(m)
print(type(a))
x = a[:, 0]
y = a[:, 1]
plt.scatter(x, y, marker='*', color='r', s=20)

# Initial parameters and hyperparameters
theta0 = 0
theta1 = 0
iterations = 1500
alpha = 0.01

def gradientdescent(x, y, theta0, theta1, iterations, alpha):
    # Record the cost J after every iteration so convergence can be plotted later
    J_h = np.zeros((iterations, 1))
    for i in range(iterations):
        y_hat = theta0 + theta1 * x
        # Simultaneous update: compute both gradients before overwriting the thetas
        temp0 = theta0 - alpha * (1 / m) * sum(y_hat - y)
        temp1 = theta1 - alpha * (1 / m) * sum((y_hat - y) * x)
        theta0 = temp0
        theta1 = temp1
        # Cost with the updated parameters
        y_hat2 = theta0 + theta1 * x
        J = sum((y_hat2 - y) ** 2) * (1 / (2 * m))
        J_h[i, :] = J
    return theta0, theta1, J_h

theta0, theta1, J_h = gradientdescent(x, y, theta0, theta1, iterations, alpha)
print(theta1)
print(theta0)

# Fitted regression line over the scatter plot
plt.plot(x, theta0 + theta1 * x)
plt.title("fitting curve")
plt.show()

# Cost J versus iteration number
x2 = np.arange(iterations)
plt.plot(x2, J_h)
plt.title("cost function")
plt.show()
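As a quick sanity check, the fitted parameters can be compared against numpy's built-in least-squares polynomial fit. This is a minimal sketch, assuming the x and y arrays loaded above:

import numpy as np

# Assumes x and y are the 1-D arrays loaded from ex1data1.txt above.
# np.polyfit with degree 1 returns [slope, intercept], i.e. roughly [theta1, theta0].
slope, intercept = np.polyfit(x, y, 1)
print("polyfit: theta1 =", slope, "theta0 =", intercept)
# With alpha = 0.01 and 1500 iterations, gradient descent should land close to these values.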
The second method is the normal equation.
That is, the parameters are computed directly as the least-squares solution; anyone who has studied linear algebra should find this method easy to follow. The closed-form solution is shown below, followed by the code:
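With X the m×2 design matrix whose first column is all ones and y the m×1 vector of targets, the least-squares solution (assuming X^T X is invertible, which the code checks via the determinant) is

\theta = (X^{T} X)^{-1} X^{T} y

which is exactly what the train_wb function below computes.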
import numpy as np
import matplotlib.pyplot as plt

def train_wb(X, y):
    """
    :param X: N*D data matrix (first column is the constant 1 term)
    :param y: targets corresponding to X
    :return: the (b, w) parameter vector
    """
    # The normal equation requires X^T X to be invertible
    if np.linalg.det(X.T * X) != 0:
        wb = ((X.T * X).I * X.T) * y
        return wb
    return None   # X^T X is singular; no solution via the plain normal equation

def test(x, wb):
    # Predict the target for a single feature (column) vector x
    return x.T.dot(wb)

def getdata():
    x = []; y = []
    # ex0.txt: tab-separated, columns are [1.0, x, y]
    with open("ex0.txt", 'r') as file:
        for line in file.readlines():
            temp = line.strip().split("\t")
            x.append([float(temp[0]), float(temp[1])])
            y.append(float(temp[2]))
    return np.mat(x), np.mat(y).T

def draw(x, y, wb):
    # Plot the data points and the fitted regression line y = b + w*x
    a = np.linspace(0, np.max(x))      # x range for the fitted line
    b = wb[0][0] + a * wb[1][0]
    plt.plot(x, y, '.')
    plt.plot(a, b)
    plt.show()

X, y = getdata()
wb = train_wb(X, y)
draw(X[:, 1], y, wb.tolist())
Comparing the two methods: gradient descent also works for models with many features (for example multivariate linear regression), whereas the normal equation requires inverting X^T X, whose cost grows rapidly with the number of features, making it less suitable for high-dimensional models. For a univariate linear model, however, the two methods give essentially the same result.
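To make this comparison concrete, here is a minimal self-contained sketch (using synthetic data rather than the data files above) that fits the same line with both methods; the two estimates of theta should agree closely:

import numpy as np

# Synthetic data with known parameters: true theta1 = 2.5, theta0 = 1.0
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.5 * x + 1.0 + rng.normal(0, 0.5, size=100)
m = x.size

# Gradient descent
theta0, theta1 = 0.0, 0.0
alpha, iterations = 0.01, 5000
for _ in range(iterations):
    err = theta0 + theta1 * x - y
    theta0, theta1 = theta0 - alpha * err.mean(), theta1 - alpha * (err * x).mean()

# Normal equation: theta = (X^T X)^{-1} X^T y
X = np.column_stack([np.ones(m), x])
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print("gradient descent:", theta0, theta1)
print("normal equation: ", theta_ne[0], theta_ne[1])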
Appendix: blogs referenced in this post:
https://blog.youkuaiyun.com/qq_20406597/article/details/80020528
https://blog.youkuaiyun.com/u014028027/article/details/72667733