Introduction to Machine Learning: Linear Regression and Gradient Descent - 优快云 Blog

Article link: https://blog.youkuaiyun.com/clicx/article/details/118874183

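Preface

I'm a complete beginner who doesn't know much, but with summer camp interviews coming up I have to study this anyway.
Course followed:
Mainly Andrew Ng's (吴恩达) course: https://www.bilibili.com/video/BV164411b7dx?from=search&seid=4111199140701962956
These notes record key points as the course progresses, so I can review them later.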

Terminology

Supervised learning (监督学习)
Regression problem (回归问题)
Classification problem (分类问题)
Unsupervised learning (无监督学习)
Cost function (代价函数)
Combinatorial optimization (组合优化)
Ordinary least squares (最小二乘法)
Gradient descent (梯度下降)
Contour plots (等高线图)
Local optimum (局部最优)

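Course content: P1-P2

Some conventions used in the course:
$x^{(i)}$ denotes the $i$-th row of the $x$ column, i.e. the input of the $i$-th training example; $y^{(i)}$ is defined the same way.
P1 only covers basics, so these notes go straight to the linear regression material in P2.
Linear regression in outline (taking the case of just two parameters $\theta_0,\theta_1$ as the example):
Hypothesis: $h_\theta(x)=\theta_0+\theta_1x$
Parameters: $\theta_0,\theta_1$
Cost Function: $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2$, where $m$ is the number of training examples
Goal: $\underset{\theta_0,\theta_1}{\text{minimize}}\ J(\theta_0,\theta_1)$
Explanation: linear regression fits the data with a linear function, so the hypothesis is simply the general form of a line. The key question is how to find the two parameters $\theta_0,\theta_1$, and the cost function is the tool for doing that. Many cost functions exist, but this squared-error one is the usual choice for linear regression, so we use it. Looking at the formula, the smaller $J$ is, the better the line fits the data, so the problem becomes finding the minimum of $J$. The gradient descent algorithm introduced in the next section searches for that minimum.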
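To make the cost function concrete, here is a minimal sketch that evaluates $J$ for two candidate lines. The toy data and the function names `hypothesis` and `cost` are made up for illustration and are not from the course.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    residuals = hypothesis(theta0, theta1, x) - y
    return float(np.sum(residuals ** 2) / (2 * m))

# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

perfect_fit = cost(1.0, 2.0, x, y)  # the true line, so every residual is zero
poor_fit = cost(0.0, 0.0, x, y)     # h(x) = 0 everywhere, so residuals are large
print(perfect_fit, poor_fit)
```

The line that passes through every point gets a cost of zero, while the flat line $h_\theta(x)=0$ gets a much larger one, which matches the idea that a smaller $J$ means a better fit.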

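Gradient descent algorithm

This article covers it better and in more detail:
https://www.cnblogs.com/pinard/p/5970503.html
Gradient descent is not limited to the cost function above; it also applies to the more general form $J(\theta_0,\theta_1,\theta_2,\ldots)$.
The figure below helps illustrate how gradient descent proceeds: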
[Figure missing in source: surface plot whose Z axis is the value of J]

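First, about the figure: the Z axis is the value of $J$, so finding the minimum means finding the lowest point on the surface.
The steps are as follows: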
[Figure missing in source: the gradient descent update steps]
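The image with the steps did not survive. Assuming it showed the usual update rule from the course (an assumption, since the original figure cannot be recovered), the steps would read roughly as follows, with $\text{temp}_0$ and $\text{temp}_1$ as illustrative names for the intermediate values:

```latex
\begin{aligned}
&\text{repeat until convergence:}\\
&\qquad \text{temp}_0 := \theta_0 - \alpha\,\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)\\
&\qquad \text{temp}_1 := \theta_1 - \alpha\,\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)\\
&\qquad \theta_0 := \text{temp}_0\\
&\qquad \theta_1 := \text{temp}_1
\end{aligned}
```

Both partial derivatives are evaluated at the old $(\theta_0,\theta_1)$ before either parameter is overwritten.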
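Details:

$\theta_0$ and $\theta_1$ should be updated simultaneously.
$\alpha$ here is the learning rate (step size). It determines how fast the descent proceeds and has to be set in advance.
Update term for $\theta_0$: $\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\alpha\cdot\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot 1$
Update term for $\theta_1$: $\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\alpha\cdot\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x^{(i)}$
Note the summation: the gradient direction at the current point is determined by all of the samples.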
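The details above can be sketched in a few lines. This is only an illustration under assumptions: the toy data, the helper names `gradient_step` and `run`, and the chosen values of `alpha` are all made up here and are not taken from the course.

```python
import numpy as np

def gradient_step(theta0, theta1, x, y, alpha):
    """One gradient descent step that updates both parameters simultaneously."""
    m = len(x)
    error = (theta0 + theta1 * x) - y  # h_theta(x_i) - y_i for every sample
    grad0 = np.sum(error) / m          # partial derivative of J w.r.t. theta0
    grad1 = np.sum(error * x) / m      # partial derivative of J w.r.t. theta1
    # Both gradients come from the old parameters, so the update is simultaneous
    return theta0 - alpha * grad0, theta1 - alpha * grad1

def run(x, y, alpha, iters):
    """Repeat the step a fixed number of times, starting from (0, 0)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        theta0, theta1 = gradient_step(theta0, theta1, x, y, alpha)
    return theta0, theta1

# Hypothetical toy data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

first = gradient_step(0.0, 0.0, x, y, alpha=0.1)  # one step from the origin
fitted = run(x, y, alpha=0.1, iters=5000)         # small enough step: converges
diverged = run(x, y, alpha=0.6, iters=50)         # step too large for this data
print(first, fitted, diverged)
```

On this data a single step from $(0, 0)$ moves the parameters to roughly $(0.4, 0.85)$, and many steps with `alpha=0.1` approach the true values $(1, 2)$. With `alpha=0.6` the iterates overshoot further on every step and grow without bound, which is one reason $\alpha$ has to be chosen with some care.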

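Exercises

The problems come from the course's companion exercises.
A blog is not a great place to record them; I recommend working through this author's write-up instead:
https://www.heywhale.com/mw/project/5da16a37037db3002d441810
Single-variable regression:
Code and results, for the record: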

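import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the exercise data: one Population column and one Profit column
path = '/home/flokken/workspace/Machine_Learning/data_sets/ex1data1.txt'
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
print(data.head())
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12, 8))

# Prepend a column of ones so that theta_0 acts as the intercept term
data.insert(0, 'Ones', 1)
cols = data.shape[1]
X = data.iloc[:, :-1]            # every column except the last: Ones, Population
y = data.iloc[:, cols-1:cols]    # last column only: Profit
# With np.matrix, * means matrix multiplication
X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.zeros((1, 2)))  # start from theta_0 = theta_1 = 0

def computeCost(X, y, theta):
    # J(theta) = 1/(2m) * sum((X * theta^T - y)^2)
    inner = np.power(((X * theta.T) - y), 2)
    return np.sum(inner) / (2 * len(X))

alpha = 0.01   # learning rate (step size)
iters = 1500   # number of gradient descent iterations

def gradientDescent(X, y, theta, alpha, iters):
    temp = np.matrix(np.zeros(theta.shape))
    parameters = int(theta.ravel().shape[1])
    cost = np.zeros(iters)
    for i in range(iters):
        # Compute the error once so every theta_j is updated from the same old theta
        error = (X * theta.T) - y
        for j in range(parameters):
            term = np.multiply(error, X[:, j])
            temp[0, j] = theta[0, j] - ((alpha / len(X)) * np.sum(term))
        theta = temp.copy()  # copy so that theta and temp never share memory
        cost[i] = computeCost(X, y, theta)
    return theta, cost

g, cost = gradientDescent(X, y, theta, alpha, iters)

# Predictions for Population values of 3.5 and 7
predict1 = [1, 3.5] * g.T
predict2 = [1, 7] * g.T
print(predict1, predict2)

# Plot the fitted line on top of the training data
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + g[0, 1] * x
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()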
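For comparison, here is a rough vectorized sketch of the same batch gradient descent using plain `ndarray`s and the `@` operator instead of `np.matrix`. It is not the exercise's reference solution. Because `ex1data1.txt` is not included here, it runs on synthetic data, and the data, the function name `gradient_descent`, and the values of `alpha` and `iters` are all assumptions chosen for this illustration. The result is checked against the ordinary least squares solution from `np.linalg.lstsq`.

```python
import numpy as np

def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent for linear regression on plain ndarrays.

    X has shape (m, n) with a leading column of ones; y has shape (m,).
    Returns the fitted theta (shape (n,)) and the cost after each iteration.
    """
    m, n = X.shape
    theta = np.zeros(n)
    costs = np.zeros(iters)
    for i in range(iters):
        error = X @ theta - y                        # residual for every sample
        theta = theta - (alpha / m) * (X.T @ error)  # all theta_j updated at once
        costs[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, costs

# Synthetic stand-in for ex1data1.txt (hypothetical data, not the course file)
rng = np.random.default_rng(0)
population = rng.uniform(5.0, 25.0, size=100)
profit = -4.0 + 1.2 * population + rng.normal(0.0, 1.0, size=100)

X = np.column_stack([np.ones_like(population), population])
theta_gd, costs = gradient_descent(X, profit, alpha=0.005, iters=30000)

# Ordinary least squares solution for comparison
theta_ols, *_ = np.linalg.lstsq(X, profit, rcond=None)
print(theta_gd, theta_ols)
```

With enough iterations the two parameter vectors should agree closely, and plotting `costs` against the iteration index is a simple way to see whether the chosen `alpha` makes the cost go down.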