Getting Started with Large Models: Machine Learning 3 (Multiple Linear Regression and Its Gradient, Vectorization, the Normal Equation)

Contents

1 Multiple linear regression

1.1 Definition

2 Vectorization

2.1 Example

2.2 Code implementation

2.3 NumPy rules

2.4 Vector dot product

2.5 Matrix creation

2.6 Matrix indexing

3 Gradient descent for multiple linear regression

3.1 Definition

3.2 Gradient function

4 Solving the cost function with the normal equation

4.1 Definition

4.2 Disadvantages


Learning objective:

Make linear regression faster and more powerful.

1 Multiple linear regression

1.1 Definition

x_{j}: the j-th feature;

n: the total number of features;

\vec{x}^{(i)}: all the features of the i-th training example;

x_{j}^{(i)}: the value of feature j in the i-th training example.

Multiple linear regression: a linear regression model with several input features, written as f_{\vec{w},b}(\vec{x})=\vec{w}\cdot \vec{x}+b.
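For instance, with n = 2 features the model expands to f(\vec{x})=w_{1}x_{1}+w_{2}x_{2}+b. A minimal NumPy sketch of a single prediction (the parameter and feature values below are made up purely for illustration):

import numpy as np

# hypothetical parameters and one example (illustrative values only)
w = np.array([0.5, -1.2])   # one weight per feature, shape (n,)
b = 4.0                     # bias term, a scalar
x = np.array([2.0, 3.0])    # one example with n = 2 features, shape (n,)

# f_wb(x) = w . x + b, computed as a vectorized dot product
f_wb = np.dot(w, x) + b
print(f"f_wb = {f_wb}")     # 0.5*2.0 + (-1.2)*3.0 + 4.0 = 1.4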

2 Vectorization

2.1 Example

Vectorization keeps the code concise and makes it run efficiently, as the sketch below shows.
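A sketch that computes the same dot product twice, once with an explicit loop and once with a single np.dot call (the array size and seed are arbitrary choices, and the exact timings will vary by machine):

import numpy as np
import time

np.random.seed(1)
a = np.random.rand(10_000_000)   # large arrays so the speed difference is visible
b = np.random.rand(10_000_000)

# non-vectorized: loop over every element
tic = time.time()
s = 0.0
for i in range(a.shape[0]):
    s = s + a[i] * b[i]
toc = time.time()
print(f"loop version:   s = {s:.4f}, {1000*(toc-tic):.1f} ms")

# vectorized: a single np.dot call
tic = time.time()
s = np.dot(a, b)
toc = time.time()
print(f"np.dot version: s = {s:.4f}, {1000*(toc-tic):.1f} ms")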

2.2 Code implementation

The code is written in Python and uses the NumPy package.

(1) Import modules/packages

import numpy as np    # it is an unofficial standard to use np for numpy
import time

(2) Array creation

Code:

# NumPy routines which allocate memory and fill arrays with value
a = np.zeros(4);                print(f"np.zeros(4) :   a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,));             print(f"np.zeros(4,) :  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")


np.zeros(4): creates an array of length 4 whose elements are all 0.0 (floating point);

a.shape: the shape of the array, here (4,), i.e. a 1-D array of length 4;

a.dtype: the data type of the elements.
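Arrays can also be created directly from Python lists with np.array. A small sketch, reusing the np imported above (the dtype printed for the integer list is platform dependent):

# NumPy routine which fills an array with user-specified values
a = np.array([5, 4, 3, 2]);  print(f"np.array([5,4,3,2]):  a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.array([5., 4, 3, 2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")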

2.3 NumPy rules

(1) Indexing

Negative indices count from the end of the array; an index outside the valid range raises an error.

Example:

#vector indexing operations on 1-D vectors
a = np.arange(10)
print(a)

#access an element
print(f"a[2].shape: {a[2].shape} a[2]  = {a[2]}, Accessing an element returns a scalar")

# access the last element, negative indexes count from the end
print(f"a[-1] = {a[-1]}")

# indexes must be within the range of the vector or they will produce an error
try:
    c = a[10]
except Exception as e:
    print("The error message you'll see is:")
    print(e)

Output:

[0 1 2 3 4 5 6 7 8 9]
a[2].shape: () a[2]  = 2, Accessing an element returns a scalar
a[-1] = 9
The error message you'll see is:
index 10 is out of bounds for axis 0 with size 10

(2) Slicing

a[start : stop : step]

start: the starting index (inclusive);

stop: the stopping index (exclusive);

step: the step size, i.e. take one element every step positions.

Example:

#vector slicing operations
a = np.arange(10)
print(f"a         = {a}")

# access all elements index 3 and above
c = a[3:];        print("a[3:]    = ", c)

# access all elements below index 3
c = a[:3];        print("a[:3]    = ", c)

# access all elements
c = a[:];         print("a[:]     = ", c)

Output:

a         = [0 1 2 3 4 5 6 7 8 9]
a[3:]    =  [3 4 5 6 7 8 9]
a[:3]    =  [0 1 2]
a[:]     =  [0 1 2 3 4 5 6 7 8 9]
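The example above always uses the default step of 1; a short sketch of slicing with an explicit step, assuming the same a = np.arange(10):

# access 5 consecutive elements (start:stop:step)
c = a[2:7:1];     print("a[2:7:1] = ", c)

# access every other element between index 2 and 7
c = a[2:7:2];     print("a[2:7:2] = ", c)

Output:

a[2:7:1] =  [2 3 4 5 6]
a[2:7:2] =  [2 4 6]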

2.4 Vector dot product

Code:

import numpy as np

# test 1-D
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ") 
c = np.dot(b, a)
print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} ")

Output:

# np.dot(a, b).shape = () is printed as empty because np.dot(a, b) returns a scalar, which has no dimensions
NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = () 
NumPy 1-D np.dot(b, a) = 24, np.dot(b, a).shape = () 
2.5 Matrix creation

Code:

a = np.zeros((1, 5))                                       
print(f"a shape = {a.shape}, a = {a}")                     

a = np.zeros((2, 1))                                                                   
print(f"a shape = {a.shape}, a = {a}") 

a = np.random.random_sample((1, 1))  
print(f"a shape = {a.shape}, a = {a}") 

2.6 Matrix indexing

Code:

#vector indexing operations on matrices
"""np.arange(6) → 生成一个一维数组 [0, 1, 2, 3, 4, 5]
   .reshape(-1, 2) → 重新调整形状为 N×2 的矩阵
   这里的 -1 表示 自动计算行数,使总元素数量不变。"""
a = np.arange(6).reshape(-1, 2)   #reshape is a convenient way to create matrices
print(f"a.shape: {a.shape}, \na= {a}")

#access an element
print(f"\na[2,0].shape:   {a[2, 0].shape}, a[2,0] = {a[2, 0]},     type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\n")

#access a row
print(f"a[2].shape:   {a[2].shape}, a[2]   = {a[2]}, type(a[2])   = {type(a[2])}")

Output:

a.shape: (3, 2), 
a= [[0 1]
 [2 3]
 [4 5]]

a[2,0].shape:   (), a[2,0] = 4,     type(a[2,0]) = <class 'numpy.int64'> Accessing an element returns a scalar

a[2].shape:   (2,), a[2]   = [4 5], type(a[2])   = <class 'numpy.ndarray'>
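Slicing also works on matrices by combining a row index (or :) with a slice. A brief sketch (this a is re-created with 2 rows and 10 columns, not the matrix above):

# matrix slicing operations
a = np.arange(20).reshape(-1, 10)
print(f"a = \n{a}")

# access 5 consecutive elements (start:stop:step) within one row
print("a[0, 2:7:1] = ", a[0, 2:7:1])

# access the same columns in all rows
print("a[:, 2:7:1] = \n", a[:, 2:7:1])

# access an entire row
print("a[1, :] = ", a[1, :])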

3 Gradient descent for multiple linear regression

For the single-feature case, see Section 3.5 of:

https://blog.youkuaiyun.com/weixin_45728280/article/details/153348420?spm=1001.2014.3001.5501

3.1 Definition

Parameters: \vec{w}=[w_{1},w_{2},...,w_{n}],\ b

Model: f_{\vec{w},b}(\vec{x})=\vec{w}\cdot \vec{x}+b

Cost function: J(\vec{w},b)

Gradient descent: w_{j}=w_{j}-\alpha \frac{\partial J(\vec{w},b)}{\partial w_{j}}

b=b-\alpha \frac{\partial J(\vec{w},b)}{\partial b}

Substituting the cost function, the updates become (for j = 1, ..., n):

w_{j}=w_{j}-\alpha \frac{\sum_{i=1}^{m}(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_{j}^{(i)}}{m}

b=b-\alpha \frac{\sum_{i=1}^{m}(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})}{m}

3.2 Gradient function

Code:
import copy
import math
import numpy as np

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient and update the parameters
        dj_db, dj_dw = gradient_function(X, y, w, b)

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))

        # Print the cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing
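
The compute_cost and compute_gradient helpers called below are not defined in this post; a minimal sketch of what they might look like, following the formulas in 3.1 (the names and signatures are assumed from the call site):

def compute_cost(X, y, w, b):
    """Squared-error cost J(w,b) averaged over the m examples."""
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b            # prediction for example i
        cost += (f_wb_i - y[i]) ** 2
    return cost / (2 * m)

def compute_gradient(X, y, w, b):
    """Partial derivatives of J(w,b) with respect to b and each w_j."""
    m, n = X.shape
    dj_dw = np.zeros(n)
    dj_db = 0.0
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]      # f_wb(x^(i)) - y^(i)
        dj_dw += err * X[i]                     # accumulates err * x_j^(i) for every j
        dj_db += err
    return dj_db / m, dj_dw / m                 # order matches the call in gradient_descent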






# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

Output:

Iteration    0: Cost  2529.46   
Iteration  100: Cost   695.99   
Iteration  200: Cost   694.92   
Iteration  300: Cost   693.86   
Iteration  400: Cost   692.81   
Iteration  500: Cost   691.77   
Iteration  600: Cost   690.73   
Iteration  700: Cost   689.71   
Iteration  800: Cost   688.70   
Iteration  900: Cost   687.69   
b,w found by gradient descent: -0.00,[ 0.2   0.   -0.01 -0.07] 
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178

4 Solving the cost function with the normal equation

The normal equation solves for w and b directly, with no need for the iterative updates of gradient descent.

4.1 Definition

It applies only to linear regression and solves for w and b directly, without iteration.
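As a sketch of what "solving directly" means: if a constant-1 column is appended to the feature matrix X so that the bias b becomes the last weight, the least-squares solution has the closed form

\vec{w}=(X^{T}X)^{-1}X^{T}\vec{y}

A small NumPy illustration (np.linalg.lstsq is used rather than an explicit inverse; X_train and y_train are the same assumed training data as in Section 3):

# fold the bias into the weights by appending a column of ones
Xb = np.c_[X_train, np.ones(X_train.shape[0])]

# solve the least-squares problem min ||Xb @ theta - y_train|| in one step, with no iteration
theta, *_ = np.linalg.lstsq(Xb, y_train, rcond=None)
w_ne, b_ne = theta[:-1], theta[-1]
print(f"normal-equation solution: w = {w_ne}, b = {b_ne:0.2f}")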

4.2 Disadvantages

(1) It does not generalize to other learning algorithms;

(2) It becomes very slow when the number of features n is large (> 10,000).
