Andrew Ng Machine Learning, Notes 2: Regression

This post covers the principles and implementation of univariate and multivariate linear regression, including the hypothesis function, the cost function, gradient descent, and their use in TensorFlow.


1. Univariate Linear Regression

  Suppose the training set contains $m$ pairs of data $(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})$.
  Hypothesis function

$$h_\theta(x)=\theta_0+\theta_1 x$$
  Cost function
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
  Optimization objective
$$(\theta_0,\theta_1)=\arg\min_{\theta_0,\theta_1}J(\theta_0,\theta_1)$$
  A commonly used solution method is gradient descent. The gradient is closely related to the directional derivative, and the method can be justified from the Taylor expansion; the derivation is simple and omitted here. The algorithm proceeds as follows (a NumPy sketch is given after the notes below):
$$\text{temp}_0=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$$
$$\text{temp}_1=\theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$$
$$\theta_0=\text{temp}_0$$
$$\theta_1=\text{temp}_1$$

Notes

  • Never swap steps two and three of the gradient descent procedure, because the computation in step two still depends on $\theta_0$; all parameters must be updated simultaneously

  • Apply mean normalization to the training data to speed up convergence

  • Choose the learning rate (step size) carefully
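  As an illustration, here is a minimal NumPy sketch of the simultaneous update above; the toy data, learning rate, and iteration count are made up for the example:

import numpy as np

# toy data: y ≈ 3x + 1 plus noise (made-up values, just for illustration)
m = 100
x = np.random.rand(m)
y = 3.0 * x + 1.0 + 0.1 * np.random.randn(m)

theta0, theta1 = 0.0, 0.0
alpha = 0.1
for _ in range(2000):
    h = theta0 + theta1 * x             # h_theta(x)
    grad0 = np.mean(h - y)              # dJ/dtheta0
    grad1 = np.mean((h - y) * x)        # dJ/dtheta1
    temp0 = theta0 - alpha * grad0      # compute both temps first...
    temp1 = theta1 - alpha * grad1
    theta0, theta1 = temp0, temp1       # ...then update simultaneously

print(theta0, theta1)                   # should approach 1.0 and 3.0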

2. Multivariate Linear Regression

  Multivariate linear regression is a direct extension of the univariate case. The $m$ training examples are $(x_1^{(1)},x_2^{(1)},\dots,x_n^{(1)},y^{(1)}),\ (x_1^{(2)},x_2^{(2)},\dots,x_n^{(2)},y^{(2)}),\ \dots,\ (x_1^{(m)},x_2^{(m)},\dots,x_n^{(m)},y^{(m)})$.
  Hypothesis function

$$h_{\vec\theta}(\vec x)=\vec\theta^T\vec x=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_nx_n$$
where $\vec\theta=(\theta_0,\theta_1,\dots,\theta_n)^T$ and $\vec x=(1,x_1,x_2,\dots,x_n)^T$.
  Cost function
$$J(\vec\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_{\vec\theta}(\vec x^{(i)})-y^{(i)}\right)^2$$
  Optimization objective
$$\vec\theta=\arg\min_{\vec\theta}J(\vec\theta)$$
2.1 Gradient Descent

  Multivariate linear regression can still be solved by gradient descent; the only difference is that the descent now takes place in a higher-dimensional space and is no longer as easy to visualize as in three dimensions. The algorithm proceeds as follows (a vectorized sketch follows the equations):

$$\text{temp}_0=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\vec\theta)$$
$$\text{temp}_1=\theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\vec\theta)$$
$$\vdots$$
$$\text{temp}_n=\theta_n-\alpha\frac{\partial}{\partial\theta_n}J(\vec\theta)$$
$$\theta_0=\text{temp}_0$$
$$\theta_1=\text{temp}_1$$
$$\vdots$$
$$\theta_n=\text{temp}_n$$
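  With the design matrix $X$ whose rows are $\vec x^{(i)T}$ (defined in the next subsection), the whole update can be written in one vectorized step. A minimal NumPy sketch, assuming X already contains the leading column of ones; the names are illustrative:

import numpy as np

def gradient_descent(X, y, alpha=0.01, steps=5000):
    # X: (m, n+1) design matrix with a leading column of ones, y: (m,) targets
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / m    # gradient of J(theta)
        theta = theta - alpha * grad        # all components updated simultaneously
    return theta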
2.2 Least Squares

  Another way to solve multivariate linear regression is the least-squares (normal equation) method. The derivation is as follows:
  The whole prediction process can be written as the system of equations

$$\vec x^{(1)T}\vec\theta=y^{(1)}$$
$$\vec x^{(2)T}\vec\theta=y^{(2)}$$
$$\vdots$$
$$\vec x^{(m)T}\vec\theta=y^{(m)}$$

  Writing the system in matrix form,
$$X\vec\theta=\vec y$$
where $X=[\vec x^{(1)T};\vec x^{(2)T};\dots;\vec x^{(m)T}]$. Solving gives
$$\vec\theta=(X^TX)^{-1}X^T\vec y$$
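  A minimal NumPy sketch of this closed-form solution (using the pseudo-inverse, which is numerically safer than an explicit inverse when $X^TX$ is near-singular; the function name is illustrative):

import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# equivalently: theta, *_ = np.linalg.lstsq(X, y, rcond=None)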
2.3 Geometric Interpretation of Least Squares

  Rearranging the system of equations, let

$$\vec x_i=(x_i^{(1)},x_i^{(2)},\dots,x_i^{(m)})^T$$
$$\vec y=(y^{(1)},y^{(2)},\dots,y^{(m)})^T$$
Then
$$\vec y=\theta_0\vec x_0+\theta_1\vec x_1+\dots+\theta_n\vec x_n$$
  In general this equation has no exact solution, but we can find the solution that minimizes the cost function. Geometrically, minimizing the cost function means searching, in the linear subspace spanned by $\vec x_0,\vec x_1,\dots,\vec x_n$, for the point closest to $\vec y$. That point is exactly the projection of $\vec y$ onto the subspace, and this is the essence of the least-squares solution.
  In fact, each weight measures the contribution of the corresponding feature to the final result, and more training samples make that measurement of each feature's contribution more accurate.
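  The projection view is easy to verify numerically: the residual of the least-squares solution is orthogonal to every column of $X$. A small sketch with made-up data:

import numpy as np

m, n = 50, 3
X = np.column_stack([np.ones(m), np.random.randn(m, n)])   # made-up design matrix
y = np.random.randn(m)                                      # made-up targets

theta = np.linalg.pinv(X.T @ X) @ X.T @ y
residual = y - X @ theta
print(X.T @ residual)   # ≈ 0: the residual is orthogonal to the column space of X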
2.4 Choosing Between Least Squares and Gradient Descent

  When the number of features $n$ exceeds a certain threshold, inverting the matrix in the least-squares solution becomes very expensive, and gradient descent is generally preferred. As a rough reference, that threshold is around $10^5$ to $10^6$.

3. Polynomial Regression

  Polynomial regression can be converted into multivariate linear regression; the key is to combine known features into new ones. For example, the hypothesis function

$$h_{\vec\theta}(\vec x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3\sqrt{x_1}$$
where every term can be computed from the known feature.
  As for which higher-order terms to include in the polynomial, an initial choice can be made from the rough shape of the data. Practical constraints also matter: for example, the total price of a house generally does not decrease as its area grows, so in that case a quadratic term alone is not enough and a cubic term is also needed.
  In polynomial regression, feature scaling becomes especially important, because the model contains different powers of the same feature whose ranges differ greatly; yet since they come from the same feature, they cannot simply be rescaled by different amounts. A small sketch of one workable approach follows.
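  Consistent with the note above, one option is to mean-normalize the base feature first and only then form its powers as columns of the design matrix. A hedged sketch; the function name and the example data are made up:

import numpy as np

def polynomial_design_matrix(x, degree):
    # scale the base feature once, then raise the scaled feature to higher powers
    x_scaled = (x - x.mean()) / x.std()
    # columns: 1, x, x^2, ..., x^degree
    return np.column_stack([x_scaled ** k for k in range(degree + 1)])

# example: cubic features for (hypothetical) house areas
area = np.array([60.0, 80.0, 100.0, 120.0, 150.0])
X = polynomial_design_matrix(area, degree=3)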
4. Univariate Linear Regression in TensorFlow

  The code below draws on blog posts found online. This is my first time writing TensorFlow, and every beginning is hard, but I finally got the hang of it. I will put together a separate summary of commonly used TensorFlow functions later; after all, only what you have memorized thoroughly can be used freely.

'''
Author       :  vivalazxp
Date         :  8/23/2018
Description  :  linear regression with one value
'''
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
'''
Description   :  create data for linear regression with one value
Param         :  @weight    weight of the line needed to be fitting
                 @bias      bias of the line needed to be fitting
                 @numData   number of training data
                 @sigma     power of noises
Return        :  @data_horizon   horizontal-axis of training data    shape=(numData,)
                 @data_vertical  vertical-axis of training data    shape=(numData,)
'''
def data_create_lin_reg_one_val(numData, weight, bias, horizon_limit, sigma):
    data_horizon = horizon_limit * 2*(np.random.rand(numData)-0.5)
    data_vertical = weight * data_horizon + bias
    # add noise
    data_vertical += sigma * np.random.randn(numData)
    print('------------- create training data successfully --------------')
    return data_horizon, data_vertical
'''
Description   :  use tensorflow to complete linear regression with one value
Param         :  @alpha     learning rate
                 @steps     sum learning steps
Return        :  @weight_fitted    weight of the fitting line
                 @bias_fitted      bias of the fitting line
'''
def tf_lin_reg_one_val(data_horizon, data_vertical, steps, alpha):
    horizon_from_data = tf.placeholder(tf.float32)
    vertical_from_data = tf.placeholder(tf.float32)
    # randomly initialize weight and bias
    weight_fitted = tf.Variable(tf.random_normal([1]))
    bias_fitted = tf.Variable(tf.random_normal([1]))
    # cost function and optimizer
    vertical_pred = tf.multiply(weight_fitted, horizon_from_data) + bias_fitted
    cost = tf.reduce_mean(tf.pow(vertical_pred - vertical_from_data, 2))
    optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(cost)
    # session initialization
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)
    print('---------------- train started ------------------------')
    loss = np.zeros(steps)
    for step in range(steps):
        sess.run(optimizer, feed_dict={horizon_from_data: data_horizon, vertical_from_data: data_vertical})
        loss[step] = sess.run(cost,feed_dict={horizon_from_data: data_horizon, vertical_from_data: data_vertical})
    print('---------------- train finished ------------------------')
    weight_fitted = sess.run(weight_fitted)
    bias_fitted = sess.run(bias_fitted)
    return weight_fitted, bias_fitted, loss


if __name__ == "__main__":
    weight = 100
    bias = 2.0
    horizon_limit = 10
    numData = 1000
    sigma = weight
    steps = 10000
    alpha = 0.0001
    data_horizon, data_vertical = data_create_lin_reg_one_val(numData, weight, bias, horizon_limit, sigma)
    weight_fitted, bias_fitted, loss = tf_lin_reg_one_val(data_horizon, data_vertical, steps, alpha)
    # log
    print('expected  weight = ', weight, ', expected  bias = ', bias)
    print('regression weight = ', weight_fitted, ', regression bias = ', bias_fitted)
    # fitting line
    plt.figure(1)
    horizon_fit = np.linspace(-horizon_limit, horizon_limit, 200)
    vertical_fit = weight_fitted*horizon_fit + bias_fitted
    plt.plot(data_horizon, data_vertical, 'o', label='training data')
    plt.plot(horizon_fit, vertical_fit, 'r', label='regression line')
    plt.legend()
    plt.xlabel('horizontal axis')
    plt.ylabel('vertical axis')
    plt.title('linear regression with one value')
    # cost variation
    plt.figure(2)
    plt.plot(range(steps), loss)
    plt.xlabel('step')
    plt.ylabel('loss')
    plt.title('loss variation in linear regression with one value')

    plt.show()

[Figure: training data and the fitted regression line]
[Figure: loss versus training step]

5. Polynomial Regression with Gradient Descent in TensorFlow

  This code also draws on examples found online. While tuning the parameters I ran into NaN values; I will keep investigating this (see the note after the code).

'''
Author       :  vivalazxp
Date         :  11/9/2018
Description  :  non-linear (polynomial) regression of sin(x)
'''
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

'''
Description   :  create data for non-linear regression of sin(x)
Param         :  @numData   number of training data
                 @sigma     power of noises                      
Return        :  @data_horizon   horizontal-axis of training data    shape=(numData,)
                 @data_vertical  vertical-axis of training data    shape=(numData,)
'''
def data_create_sin_non_lin_reg(numData, sigma, horizon_limit):
     data_horizon = np.linspace(-horizon_limit, horizon_limit, numData)
     data_vertical = np.sin(data_horizon)
     # add noise
     data_vertical += sigma * np.random.randn(numData)
     print('---------- create data successfully ----------')
     return data_horizon, data_vertical
'''
Description   :  use tensorflow to complete non-linear regression of sin(x)
Param         :  @alpha    learning rate
                 @steps    sum learning steps
                 @n_order  use n-order polynomial to fit sin(x)
Return        :  @theta    weights of fitting sin(x)  shape=(n_order+1,)
'''
def tf_non_lin_reg(n_order,data_horizon, data_vertical, alpha, steps):
    numData = data_vertical.shape[0]
    #placeholder for training data
    horizon_from_data = tf.placeholder(tf.float32)
    vertical_from_data = tf.placeholder(tf.float32)
    # randomly initialize theta
    theta = tf.Variable(tf.random_normal([n_order+1]))
    # build the polynomial prediction theta_0 + theta_1*x + ... + theta_n*x^n
    vertical_pred = tf.zeros(numData)
    for index_n in range(n_order+1):
        vertical_pred = tf.add( vertical_pred, tf.multiply( theta[index_n], tf.pow( horizon_from_data, index_n*tf.ones([1,numData]))))

    #cost function and optimizer
    cost = tf.reduce_mean(tf.square(vertical_pred - vertical_from_data))
    optimizer = tf.train.GradientDescentOptimizer(alpha).minimize(cost)
    #session
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    print('-------- train started --------')
    loss = np.zeros(steps)
    for step in range(steps):
        sess.run(optimizer, feed_dict={horizon_from_data: data_horizon, vertical_from_data: data_vertical})
        loss[step] = sess.run(cost, feed_dict={horizon_from_data: data_horizon, vertical_from_data: data_vertical})
    print('-------- train finished --------')
    theta = sess.run(theta)
    return theta, loss

def main():
    numData = 100
    sigma = 0.2
    n_order = 3
    horizon_limit = 3
    alpha = 0.005
    steps = 1000
    data_horizon, data_vertical = data_create_sin_non_lin_reg(numData, sigma, horizon_limit)
    theta, loss = tf_non_lin_reg(n_order, data_horizon, data_vertical, alpha, steps)
    # fitting line
    plt.figure(1)
    horizon_fit = np.linspace(-horizon_limit, horizon_limit, 200)
    vertical_fit = np.zeros(200)
    for index in range(n_order+1):
        vertical_fit = np.add(vertical_fit, theta[index]* horizon_fit ** index)

    plt.plot(data_horizon, data_vertical, 'o', label='training data')
    plt.plot(horizon_fit, vertical_fit, 'r', label='regression curve')
    plt.legend()
    plt.xlabel('horizontal axis')
    plt.ylabel('vertical axis')
    plt.title('non-linear regression')

    # cost variation
    plt.figure(2)
    plt.plot(range(steps), loss)
    plt.xlabel('step')
    plt.ylabel('loss')
    plt.title('loss variation in non-linear regression')
    plt.show()

if __name__ == "__main__":
    main()
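Regarding the NaN values mentioned before the code: one plausible (unverified) cause is that the unscaled powers of data_horizon make the gradients large enough to diverge at higher learning rates. A minimal, hypothetical tweak along the lines of Section 3 would be to normalize the input before training (the plotting grid horizon_fit would then need the same scaling):

# hypothetical preprocessing, applied in main() before calling tf_non_lin_reg
data_horizon = (data_horizon - data_horizon.mean()) / data_horizon.std()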

[Figure: training data and the fitted polynomial curve]
[Figure: loss versus training step]
