DataWhale

This article introduces the basic concepts of machine learning, including supervised learning, unsupervised learning, overfitting and underfitting, generalization, and cross-validation. It then examines the principle of linear regression, the notions of loss function, cost function, and objective function, and common optimization methods such as gradient descent and Newton's method. Finally, it covers the evaluation metrics for linear regression and the parameters of the linear regression model in the sklearn library.


1 SOME CONCEPTS

1.1 Supervised learning

There are inputs $x$ and labels $y$.
Supervised learning learns, from the labeled pairs, a mapping that takes an input $x$ to a predicted label $y'$, with the labels providing a clear target:
$(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), \ldots$

1.2 Unsupervised learning

In unsupervised learning, there are only inputs $x$, without labels $y$.
The method itself groups the inputs into categories called clusters:
$(x^{(1)}), (x^{(2)}), (x^{(3)}), \ldots$
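
For instance, k-means clustering groups unlabeled points on its own; here is a minimal sketch using scikit-learn (the two synthetic blobs and the random seed are illustrative assumptions, not from the original text):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic blobs of unlabeled points (illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),   # points around (0, 0)
               rng.normal(5.0, 0.5, (50, 2))])  # points around (5, 5)

# KMeans assigns each x to one of two clusters without any labels y.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])  # cluster id of each point
```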

1.3 Overfitting & Underfitting

A fitted curve that stays far from the data points is underfitting: the model is too simple to capture the trend.
A fitted curve that follows the data too closely, fitting the noise instead of the underlying rule, is overfitting.
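
To make the contrast concrete, the sketch below fits polynomials of three degrees to noisy linear data and compares the training error with the error on held-out points; the data, noise level, and degrees are illustrative assumptions:

```python
import numpy as np

# Noisy linear data (illustrative): y = 2x + 1 + noise.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0.03, 0.97, 15)            # unseen points, same range
y_train = 2 * x_train + 1 + 0.1 * rng.standard_normal(15)
y_test = 2 * x_test + 1 + 0.1 * rng.standard_normal(15)

for degree in (0, 1, 9):
    coefs = np.polyfit(x_train, y_train, degree)    # least-squares fit
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    # Degree 0 underfits (both errors high); degree 9 chases the noise,
    # so its training error drops while its test error tends to rise.
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```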

1.4 Generalization

In machine learning, fitting the training data well is not enough: the model must also perform well on the test data it has never seen. This ability is called generalization.

1.5 Cross-validation

The data can be divided into several subsets. Each time, one subset is chosen as the validation set, one as the test set, and the remaining subsets form the training set.
Rotating which subset serves as the validation set across these splits is called cross-validation.
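
As a concrete illustration, here is a sketch of 5-fold splitting with scikit-learn's `KFold`; the toy arrays and the choice of five folds are assumptions for the example:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples with 2 features each (illustrative).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Each of the 5 folds takes one turn as the validation set while the
# remaining folds form the training set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, valid_idx) in enumerate(kf.split(X)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_valid, y_valid = X[valid_idx], y[valid_idx]
    print(f"fold {fold}: validation indices = {valid_idx}")
```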

2 LINEAR REGRESSION

2.1 Principle

For a data set $(X, Y)$, we use a linear function $f(x) = w^\top x + b$ to fit the data.
The purpose is to predict the value of $y$ for a given $x$ using the optimal parameters $w$ and $b$.

2.2 Loss function, cost function and objective function

The loss function computes the error for a single training example.
The cost function is the average of the loss functions over the entire training set.
In linear regression, the function you minimize, the empirical risk optionally plus a structural (regularization) term, is called the objective function; minimizing it yields a fit that is neither under- nor overfitted.
Loss: $|y_i - f(x_i)|$; cost: $\frac{1}{N}\sum_{i=1}^{N}|y_i - f(x_i)|$; objective: $\min\left(\frac{1}{N}\sum_{i=1}^{N}|y_i - f(x_i)| + \lambda J(f)\right)$.
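
The three quantities differ only in scope, which a few lines of NumPy make explicit; the predictions, weights, and regularization weight below are illustrative stand-ins:

```python
import numpy as np

# Illustrative values (not from the original text).
y = np.array([1.0, 2.0, 3.0])      # actual labels y_i
f_x = np.array([1.1, 1.8, 3.3])    # model predictions f(x_i)
w = np.array([0.9, -0.2])          # hypothetical model weights

loss = np.abs(y - f_x)             # loss: error of each single example
cost = loss.mean()                 # cost: average loss over the training set
lam = 0.01                         # regularization weight λ (assumed)
objective = cost + lam * np.sum(w ** 2)   # cost + λ·J(f), with J(f) = ||w||²
print(loss, cost, objective)
```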

2.3 Optimization

  1. Gradient descent
    In a linear regression model, compute the gradient of the cost with respect to each parameter, then update the parameter in the direction of the negative gradient (see the sketch after this list).
  2. Newton method
    Input: objective function $f(x)$, gradient $g(x) = \nabla f(x)$, Hessian matrix $H(x)$, precision $\epsilon$.
    Output: the minimum point $x^*$ of $f(x)$.
    Steps:
    1. randomly select an initial point $x_0$ and set the iteration counter $k = 0$;
    2. compute the gradient $g(x_k)$ and the Hessian matrix $H(x_k)$ of the objective at point $x_k$; if $\|g(x_k)\| < \epsilon$, stop, and the approximate solution is $x^* = x_k$;
    3. update $x_{k+1}$ according to the equation $x_{k+1} = x_k - H^{-1}(x_k)\nabla f(x_k)$, set $k = k + 1$, and return to step 2.
  3. Quasi-Newton method
    The basic idea of the quasi-Newton method is to replace $H^{-1}(x_k)$ with an approximation $G(x_k)$, which simplifies the computation in the Newton method.
    The replacement must satisfy the following rules:

    the matrix $G(x_k)$ is positive definite;
    $G(x_{k+1})$ satisfies the quasi-Newton condition: $G(x_{k+1})(\nabla f(x_{k+1}) - \nabla f(x_k)) = x_{k+1} - x_k$.

Obviously, the choice of $G(x_k)$ is not unique; the common algorithms for constructing it are DFP, BFGS, and Broyden's class of algorithms.
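
Below is a minimal sketch of the first two methods on the least-squares cost $J(w) = \frac{1}{2N}\lVert Xw - y \rVert^2$, whose gradient is $X^\top (Xw - y)/N$ and whose Hessian is $X^\top X/N$; the synthetic data and learning rate are illustrative assumptions. Since this cost is quadratic, a single Newton step lands exactly on the minimum:

```python
import numpy as np

# Synthetic data y ≈ 2 + 3x (illustrative); the first column of X is a
# constant 1 so that the intercept is just another weight.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.random(100)])
y = 2 + 3 * X[:, 1] + 0.1 * rng.standard_normal(100)
N = len(y)

# 1. Batch gradient descent: repeatedly step against the gradient.
w = np.zeros(2)
lr = 0.5                                  # learning rate (assumed)
for _ in range(5000):
    grad = X.T @ (X @ w - y) / N          # ∇J(w)
    w -= lr * grad
print("gradient descent:", w)             # ≈ [2, 3]

# 2. Newton method: w ← w − H⁻¹∇J(w); one step suffices here because
# the cost is quadratic, so the Hessian is constant.
w = np.zeros(2)
g = X.T @ (X @ w - y) / N                 # gradient at w
H = X.T @ X / N                           # Hessian
w = w - np.linalg.solve(H, g)             # solve H·d = g instead of inverting
print("newton:", w)                       # ≈ [2, 3]
```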

2.4 Evaluation index of linear regression

  • R-Squared (coefficient of determination)
    $R^2 = 1 - \frac{\sum(Y_{actual} - Y_{predict})^2}{\sum(Y_{actual} - Y_{mean})^2}$
  • Adjusted R-Squared (degree-of-freedom adjusted coefficient of determination)
    $R^2_{adjusted} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$
  • RMSE (root mean squared error)
    $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(Y_{actual} - Y_{predict})^2}$
  • MSE (mean squared error)
    $MSE = \frac{1}{N}\sum_{i=1}^{N}(Y_{actual} - Y_{predict})^2$
  • MAE (mean absolute error)
    $MAE = \frac{1}{N}\sum_{i=1}^{N}|Y_{actual} - Y_{predict}|$
  • SSE (sum of squared errors)
    $SSE = \sum(Y_{actual} - Y_{predict})^2$
  • F statistic, which tests the overall significance of the fitted regression
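
Most of these metrics are available directly in scikit-learn, and the rest follow in one line each; a minimal sketch with illustrative toy values:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Toy actual/predicted values (illustrative, not from the original text).
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predict = np.array([2.8, 5.2, 7.0, 9.5])

r2 = r2_score(y_actual, y_predict)
mse = mean_squared_error(y_actual, y_predict)
rmse = np.sqrt(mse)                          # RMSE is the square root of MSE
mae = mean_absolute_error(y_actual, y_predict)
sse = np.sum((y_actual - y_predict) ** 2)    # sum of squared errors

# Adjusted R² needs the sample size n and the number of predictors p.
n, p = len(y_actual), 1
r2_adjusted = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R2={r2:.3f}, adjusted R2={r2_adjusted:.3f}, MSE={mse:.3f}, "
      f"RMSE={rmse:.3f}, MAE={mae:.3f}, SSE={sse:.3f}")
```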

2.5 Parameters of sklearn

Call the linear regression model constructor in sklearn:

sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
  • fit_intercept: whether to calculate an intercept for the model (if False, the intercept is fixed at zero)
  • normalize: whether to normalize the data before fitting
  • copy_X: whether to copy X (if False, X may be overwritten)
  • n_jobs: the number of CPU cores used for the computation
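
A minimal usage sketch on synthetic data (one caveat: the `normalize` argument has been removed in recent scikit-learn releases, so the call below passes only the parameters that still exist):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data y ≈ 3x + 2 (illustrative).
rng = np.random.default_rng(0)
X = rng.random((100, 1)) * 100
y = 3 * X[:, 0] + 2 + rng.standard_normal(100) * 10

# `normalize` is omitted: it was deprecated and later removed from sklearn.
model = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=1)
model.fit(X, y)
print(f"coefficient={model.coef_[0]:.3f}, intercept={model.intercept_:.3f}")
```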


Datawhale 是一个专注于数据科学和机器学习领域的开源社区,致力于为学习者提供高质量的学习资源和实践机会。该组织通过组队学习、开源课程以及竞赛基准方案等多种形式推动数据科学知识的普及和技术能力的提升。 在数据科学方面,Datawhale 提供了诸如《动手学数据分析》这样的精品入门课程[^5],帮助初学者系统性地掌握数据分析的基础技能。此外,还维护了一个名为 "China Competition Baseline" 的开源项目,该项目旨在为各类数据竞赛(如 Kaggle 和天池等平台的比赛)提供基础模板和参考代码,涵盖数据预处理、特征工程、模型构建及调参等关键步骤,使得参与者能够快速上手并深入理解比赛解决方案的设计思路[^1]。 对于机器学习领域,Datawhale 同样推出了多门相关课程,包括集成学习、基于Python的会员数据化运营、R语言数据科学、机器学习的数学基础以及李宏毅机器学习(含深度学习)等内容丰富的学习材料[^3]。这些课程不仅覆盖了理论知识,也注重实际应用,适合不同层次的学习者根据自身需求选择合适的内容进行学习。 如果你对参与具体的项目或者想要获取更多关于 Datawhale 组织的信息,可以访问其 GitHub 页面或是论坛版块来了解更多细节[^4]。 ```markdown ### 示例代码:使用Python实现简单的线性回归模型 ```python from sklearn.linear_model import LinearRegression from sklearn.model_selection import train_test_split import numpy as np # 创建一些示例数据 X = np.random.rand(100, 1) * 100 y = 3 * X.squeeze() + 2 + np.random.randn(100) * 10 # 分割训练集和测试集 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) # 创建线性回归模型实例 model = LinearRegression() # 训练模型 model.fit(X_train, y_train) # 输出系数和截距 print(f"Coefficient: {model.coef_[0]}, Intercept: {model.intercept_}") ```
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值