Machine Learning: Linear Regression with Code

Licensed under CC 4.0 BY-SA

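This article covers the basic principles of linear regression, including the hypothesis function and the loss function, and describes how batch gradient descent, stochastic/incremental gradient descent, and the normal equations are used to train the model. Hands-on experiments show how strongly the step size affects convergence and the quality of the fit.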

Linear regression fits a model to data by assigning a weight to each feature.

Linear regression consists of a hypothesis function and a loss function. The goal is to minimize the loss function, and the usual approach is least squares.
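As a minimal sketch of these two pieces, assume the common convention J(θ) = ½ Σ (h_θ(x_i) − y_i)² and a feature matrix whose first column is all ones (the intercept term). The names `hypothesis` and `squared_loss` and the toy data are illustrative assumptions, not taken from the code later in this article.

```python
import numpy as np


def hypothesis(theta, X):
    """Linear hypothesis h_theta(x) = theta^T x, evaluated for every row of X."""
    return X @ theta


def squared_loss(theta, X, y):
    """Least-squares loss: half the sum of squared residuals (assumed convention)."""
    residual = hypothesis(theta, X) - y
    return 0.5 * float(residual @ residual)


# Toy data generated from y = 1 + 2 * x; the leading 1.0 in each row is the intercept feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

loss_at_zero = squared_loss(np.zeros(2), X, y)            # all weights zero
loss_at_truth = squared_loss(np.array([1.0, 2.0]), X, y)  # the generating weights
```

The loss is zero only when the weights reproduce every target exactly, which is why minimizing it pulls the weights toward the best linear fit.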

This article trains the model with three methods: batch gradient descent, stochastic/incremental gradient descent, and the normal equations.
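The three methods can be sketched roughly as follows. This is an illustrative version, not the implementation used in this article: the function names, default step sizes, iteration counts, and the noise-free toy data are all assumptions chosen so that each method reaches the same weights.

```python
import numpy as np


def batch_gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Each update uses the gradient averaged over all m training examples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
    return theta


def stochastic_gradient_descent(X, y, alpha=0.05, epochs=1000, seed=0):
    """Each update uses one training example, visited in shuffled order."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):
            error = X[i] @ theta - y[i]
            theta -= alpha * error * X[i]
    return theta


def normal_equation(X, y):
    """Closed form: solve (X^T X) theta = X^T y without forming an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Noise-free toy data from y = 1 + 2 * x, with a leading column of ones.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_batch = batch_gradient_descent(X, y)
theta_sgd = stochastic_gradient_descent(X, y)
theta_exact = normal_equation(X, y)
```

On noisy data the stochastic variant with a fixed step size typically hovers near the optimum rather than settling on it, so in practice the step size is often decayed over time.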

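During training, I found that the convergence criterion has a large effect on how well the model fits. As for the step size, at first it was set too large, so the loss kept growing (that is, training did not converge) and the updates overshot the optimum. After reducing the step size, the model finally trained to a good fit.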
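Both effects can be reproduced on a tiny example. The sketch below is only an illustration under assumed settings: the helper names, the step sizes 0.6 and 0.1, the tolerances, and the toy data are hypothetical and are not the values used in this article. The stopping rule assumed here is "stop when the loss changes by less than `tolerance` between two iterations".

```python
import numpy as np


def mean_squared_loss(theta, X, y):
    """Half the mean of squared residuals."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))


def gradient_descent(X, y, alpha, tolerance, max_iterations=10000):
    """Batch gradient descent that records the loss after every update."""
    theta = np.zeros(X.shape[1])
    losses = [mean_squared_loss(theta, X, y)]
    for _ in range(max_iterations):
        gradient = X.T @ (X @ theta - y) / len(y)
        theta = theta - alpha * gradient
        losses.append(mean_squared_loss(theta, X, y))
        # Assumed convergence test: the loss barely changed in this step.
        if abs(losses[-2] - losses[-1]) < tolerance:
            break
    return theta, losses


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Step size too large for this data: every update overshoots and the loss grows.
_, diverging = gradient_descent(X, y, alpha=0.6, tolerance=1e-12, max_iterations=20)

# Same smaller step size, two different convergence tolerances.
theta_loose, loose = gradient_descent(X, y, alpha=0.1, tolerance=1e-2)
theta_tight, tight = gradient_descent(X, y, alpha=0.1, tolerance=1e-12)
```

Here `diverging` increases at every iteration, while the loose tolerance stops after fewer updates than the tight one and leaves a larger final loss, which matches the observation that the convergence test itself changes the quality of the fit.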

The code follows.

Some general-purpose functions:

import csv
import numpy as np


### m is the number of training examples
### n is the number of features


# Read a CSV file with the csv module; the first row is treated as a header
# Input: file path of the CSV file
# Output: feature(array(m*(n+1))): each row is 1, value of feature 1, value of feature 2, ..., value of feature n
#         result(array(m)): target value of each training example
def read_csv(file_path):
    file_open=open(file_path,"r",newline="")
    file_read=csv.reader(file_open)
    res=[]
    for row in file_read:
        if len(row)==0:
            break
        if file_read.line_num==1:
            element_num=len(row)
            continue
        elif file_read.line_num==2:
            feature=[1.0]
            count=0
            for element in row:
                if count<element_num-1:
                    feature.append(float(element))
                    count+=1
                else:
                    res.append(float(element))