Inspecting the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv("code/ex2-logistic regression/ex2data1.txt",names=['Exam 1', 'Exam 2', 'Admitted'])
data.head()
positive = data[data["Admitted"].isin([1])]   # samples with label 1 (admitted)
negative = data[data["Admitted"].isin([0])]   # samples with label 0 (not admitted)
fig,ax=plt.subplots(figsize=[12,8])
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='positive')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='negative')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()
About pandas.isin() used above 👆:
isin() takes a list and returns, for each element of the column, whether it is contained in that list.
For example:
>>> df
A B C D E
0 -0.018330 2.093506 -0.086293 -2.150479 a
1 0.104931 -0.271810 -0.054599 0.361612 a
2 0.590216 0.218049 0.157213 0.643540 c
3 -0.254449 -0.593278 -0.150455 -0.244485 b
>>> df.E.isin(['a','c'])
0 True
1 True
2 True
3 False
Name: E, dtype: bool
(Example taken from https://blog.csdn.net/lzw2016/article/details/80472649)
After visualizing the data:
The sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
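A quick sanity check of the implementation (a minimal sketch; the expected values follow directly from the formula): sigmoid(0) should be exactly 0.5, and large negative/positive inputs should saturate towards 0 and 1.
print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-10, 0, 10])))   # ≈ [4.54e-05, 0.5, 0.99995]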
Cost function
For the linear regression model we defined the cost as the sum of squared errors over all training examples:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

But for the logistic regression model, substituting the hypothesis $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ into the cost above yields a non-convex function with many local minima, which makes it hard for gradient descent to find the global minimum.
So we redefine the cost function:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
data.insert(0, "ones", 1)  # add a column x0 = 1 so that X has one column per parameter in θ
cols=data.shape[1]
X=data.iloc[:,0:cols-1]
Y=data.iloc[:,cols-1:cols]
X=np.array(X.values)
Y=np.array(Y.values)
theta=np.zeros(3)
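Before writing the cost function it is worth checking the shapes (assuming the standard ex2data1.txt with 100 training examples):
print(X.shape, Y.shape, theta.shape)   # expected: (100, 3) (100, 1) (3,)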
def cost(theta, X, Y):
    theta = np.matrix(theta)
    z = np.dot(X, theta.T)
    m = len(X)
    cost = 1 / m * np.sum(np.multiply(-Y, np.log(sigmoid(z))) - np.multiply((1 - Y), np.log(1 - sigmoid(z))))
    return cost
cost(theta,X,Y)
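With θ initialized to all zeros, z = 0 and sigmoid(z) = 0.5 for every example, so the initial cost is −log(0.5) ≈ 0.693 regardless of the data. This makes a handy correctness check:
cost(theta, X, Y)   # ≈ 0.6931471805599453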
theta = np.matrix(theta) turns theta from shape (3,) into shape (1, 3), so that theta.T has shape (3, 1) and np.dot(X, theta.T) is well defined.
The former is one-dimensional, the latter is two-dimensional.
For example:
np.sum(<2-D array>, axis=1) gives a one-dimensional result;
np.sum(<2-D array>, axis=1, keepdims=True) gives a two-dimensional result.
See also: the differences between Python lists, NumPy arrays, and NumPy matrices.
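A small illustration of this difference, using a made-up 2×3 array:
a = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(a, axis=1).shape)                 # (2,)   -- one-dimensional
print(np.sum(a, axis=1, keepdims=True).shape)  # (2, 1) -- two-dimensional
print(np.matrix(np.zeros(3)).shape)            # (1, 3) -- np.matrix is always 2-D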
Gradient descent
def gradient(theta, X, Y):
    theta = np.matrix(theta)
    X = np.matrix(X)
    Y = np.matrix(Y)
    parameters = int(theta.ravel().shape[1])
    grads = np.zeros(parameters)
    z = np.dot(X, theta.T)
    error = sigmoid(z) - Y
    for i in range(parameters):
        term = np.multiply(error, X[:, i])
        grads[i] = np.sum(term) / len(X)
    return grads
gradient(theta,X,Y)
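The loop over the parameters can also be collapsed into a single matrix product. Here is an equivalent vectorized sketch (gradient_vectorized is a hypothetical name, not used in the rest of the code); it computes (1/m)·Xᵀ(sigmoid(Xθᵀ) − y):
def gradient_vectorized(theta, X, Y):
    theta = np.matrix(theta)
    error = sigmoid(np.dot(X, theta.T)) - Y                # (m, 1) residuals
    return np.array(np.dot(X.T, error) / len(X)).ravel()   # flattened back to shape (n,)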
Note that we are not actually performing gradient descent inside this function; it only computes a single gradient step (the gradient itself). In the original exercise, an Octave function called "fminunc" is used to find the optimal parameters given functions that compute the cost and the gradient. Since we are using Python, we can do the same thing with SciPy's "optimize" namespace.
import scipy.optimize as opt
res = opt.minimize(fun=cost, x0=theta, args=(X, Y), method='Newton-CG', jac=gradient)
print(res)
About scipy.optimize:
scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)
- fun: the objective function to minimize
- x0: the initial guess for the parameters
- args: extra arguments passed to the objective function
- method: the optimization method to use
- jac: a function that returns the gradient of the objective
- bounds: bounds on the variables (only for L-BFGS-B, TNC, SLSQP and trust-constr methods)
- constraints: constraint definitions (only for COBYLA, SLSQP and trust-constr)
…
See the SciPy documentation for scipy.optimize.minimize.
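The call returns a scipy OptimizeResult object; the optimized parameters are in res.x and the cost at the optimum in res.fun (for this dataset the final cost should land somewhere around 0.203):
print(res.success)   # whether the optimizer reports convergence
print(res.x)         # the optimized θ
print(res.fun)       # the cost at the optimum, roughly 0.203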
Prediction
def predict(theta, X):
    z = np.dot(X, theta.T)
    probs = sigmoid(z)
    return [1 if x >= 0.5 else 0 for x in probs]
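As a side note on the 0.5 threshold: sigmoid(z) >= 0.5 exactly when z >= 0, so an equivalent rule is to check the sign of θᵀx directly, e.g. [1 if z >= 0 else 0 for z in np.dot(X, theta.T)], without evaluating the sigmoid at all.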
theta_min = np.matrix(res.x)
predictions = predict(theta_min, X)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, Y)]
accuracy = sum(map(int, correct)) * 100 // len(correct)  # percentage of correct predictions
print ('accuracy = {0}%'.format(accuracy))
accuracy = 89%
Plotting the decision boundary
coef = -(res.x / res.x[2]) # find the equation
print(coef)
x = np.arange(130, step=0.1)
y = coef[0] + coef[1]*x
data.describe() # find the range of x and y
coef[0] is the intercept, roughly 125.
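The reasoning behind coef: the decision boundary is where the model outputs a probability of exactly 0.5, i.e. where $\theta^T x = 0$. For this two-feature problem:

$$\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0 \quad\Longrightarrow\quad x_2 = -\frac{\theta_0}{\theta_2} - \frac{\theta_1}{\theta_2}\,x_1$$

which is exactly y = coef[0] + coef[1] * x with coef = -res.x / res.x[2].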
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
ax.plot(x,y,'grey')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()