Machine Learning 02 - Multivariate Linear Regression

This post covers the multivariate linear regression material from Professor Andrew Ng's Stanford machine learning course, including the form of the multivariable hypothesis function and its vectorized expression, gradient descent for multiple variables, the feature scaling and mean normalization tricks, and the normal equation method.


I am working through Stanford's machine learning course taught by Andrew Ng and taking notes as I go, for review and consolidation.
My knowledge is limited, so please bear with any errors or omissions and feel free to point them out. Fellow learners are very welcome to join the discussion!

Week 02

2.1 Multivariate Linear Regression

2.1.1 Multiple Features
  • The multivariable form of the hypothesis function :
    $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$

    $= \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x$
  • Remark : For convenience, assume $x_0^{(i)} = 1$ for $i \in \{1, \cdots, m\}$.
  • The cost function $J(\theta)$ has the same form (a small NumPy sketch of $h_\theta$ and $J$ follows this list) :
    $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
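
To make the vectorized form $h_\theta(x) = \theta^T x$ concrete, here is a minimal NumPy sketch of the hypothesis and cost function; the array shapes and function names are my own assumptions for illustration, not from the course.

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = theta^T x for every row of X; X has shape (m, n+1)
    # with the convention x_0 = 1 in the first column.
    return X @ theta

def cost(theta, X, y):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    residual = hypothesis(theta, X) - y
    return residual @ residual / (2 * m)
```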
2.1.2 Gradient Descent


  • Gradient descent for multivariate linear regression - Algorithm 1’ (a NumPy sketch follows the pseudocode)

Repeat {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

(simultaneously update $\theta_j$ for $j = 0, \cdots, n$)
}
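
Below is a minimal NumPy sketch of the update above (batch gradient descent). The variable names, the zero initialization, and the default hyperparameters are assumptions for illustration, not prescribed by the course.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    # X : (m, n+1) design matrix with a leading column of ones (x_0 = 1)
    # y : (m,) target vector
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    cost_history = []
    for _ in range(num_iters):
        errors = X @ theta - y                 # h_theta(x^(i)) - y^(i) for all i
        gradient = X.T @ errors / m            # one component per theta_j
        theta = theta - alpha * gradient       # simultaneous update of all theta_j
        cost_history.append(errors @ errors / (2 * m))  # cost before this update, for the convergence plot
    return theta, cost_history
```

The returned cost history is exactly what the learning-rate check in 2.1.3 plots against the iteration count.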
2.1.3 Practical Tricks in GD
  • Feature Scaling $s_i$
    • Idea : Make sure features are on a similar scale. This is because $\theta$ descends quickly on small ranges and slowly on large ranges, so it oscillates inefficiently down to the optimum when the variables are very uneven.
    • Get every feature into approximately a $-1 \le x_i \le 1$ range (the exact bound of 1 is not essential).
    • Remark : The quizzes in this course use range - the programming exercises use standard deviation.
  • Mean Normalization $\mu_i$
    • Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).
    • In general, we have (see the NumPy sketch at the end of this subsection) :
      $x_i := \dfrac{x_i - \mu_i}{s_i}$

      where $\mu_i$ is the average of all the values for feature $i$ and $s_i$ is the range of values (max − min), or $s_i$ is the standard deviation.
  • Learning Rate Check
    • Debug gradient descent : plot $J(\theta)$ against the number of iterations (iterations on the x-axis) and check whether $J(\theta)$ keeps decreasing and converges :
      • If $\alpha$ is too small : slow convergence.
      • If $\alpha$ is too large : $J(\theta)$ may not decrease on every iteration.
    • Try values of the form $1 \times 10^{k}$ or $3 \times 10^{k}$ (..., 0.001, 0.003, 0.01, 0.03, 0.1, ...), judging from the plot.
    • It has been proven that if the learning rate $\alpha$ is sufficiently small, then $J(\theta)$ will decrease on every iteration.
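
As mentioned in the mean-normalization item above, here is a minimal NumPy sketch of the scaling step. Using the standard deviation for $s_i$ is an assumption here; the range (max − min) variant works the same way.

```python
import numpy as np

def feature_normalize(X):
    # Mean normalization: x_i := (x_i - mu_i) / s_i, column by column.
    # s_i is the standard deviation here; swap in max - min for the range variant.
    # Do not apply this to the bias column x_0 = 1 (add the column of ones afterwards).
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```

Keeping `mu` and `sigma` around matters: a new example must be normalized with the same $\mu_i$ and $s_i$ before prediction.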
2.1.4 Improvement of Linear Regression
  • Feature Combination
    • Combine some features into one using a variety of methods (e.g. combine the frontage and depth of a lot into a single area feature).
  • Polynomial Regression
    $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1^{a_1} + \theta_2 x_2^{a_2} + \cdots + \theta_n x_n^{a_n}$
  • Remark : One important thing to keep in mind is that if you choose your features this way, then feature scaling becomes very important; see the sketch below.
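
For a concrete case, here is a small sketch that builds a cubic hypothesis out of a single raw feature; the choice of powers (1, 2, 3) and the helper name are assumptions for illustration.

```python
import numpy as np

def cubic_features(x):
    # Map a single feature x to [1, x, x^2, x^3] so that ordinary linear
    # regression fits h_theta(x) = theta_0 + theta_1*x + theta_2*x^2 + theta_3*x^3.
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
```

If $x$ ranges from 1 to 1000, then $x^2$ ranges up to $10^6$ and $x^3$ up to $10^9$, which is exactly why feature scaling matters so much for these columns.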

2.2 Another Method : The Normal Equation

2.2.1 Normal Equation

$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}, \quad 1 \le i \le m, \qquad X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ (x^{(3)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ y^{(3)} \\ \vdots \\ y^{(m)} \end{bmatrix}$

and
$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix}$

Then the normal equation formula is given below :
$\theta = (X^T X)^{-1} X^T y$
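
A one-line NumPy sketch of the formula; using the pseudoinverse rather than a plain inverse is my own choice here, to match the non-invertible discussion in 2.2.3.

```python
import numpy as np

def normal_equation(X, y):
    # Closed-form solution theta = (X^T X)^(-1) X^T y.
    # pinv (pseudoinverse) is used instead of inv so the call still returns
    # a reasonable theta when X^T X is non-invertible (see 2.2.3).
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```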
2.2.2 Comparison of GD and NE
  • Gradient Descent
    • Need to choose the learning rate $\alpha$
    • Needs many iterations
    • $O(kn^2)$
    • Works well even when $n$ is large.
  • Normal Equation
    • No need to choose $\alpha$
    • No need to iterate
    • $O(n^3)$ (to compute $(X^T X)^{-1}$)
    • Slow if $n$ is large

2.2.3 The non-invertible case of $X^T X$
  • Reason 1 : Redundant features. That is, two or more features are linearly dependent; delete one or more of them (a small demo follows this list).
  • Reason 2 : Too many features (e.g. $m \le n$). Delete some features or use “regularization”.
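
A tiny made-up demo of Reason 1: duplicating a feature column makes $X^T X$ rank-deficient, yet the pseudoinverse form of the normal equation still returns a usable $\theta$. The numbers below are invented purely for illustration.

```python
import numpy as np

# The third column duplicates the second, so the features are linearly
# dependent and X^T X is singular.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([5.0, 7.0, 11.0])

print(np.linalg.matrix_rank(X.T @ X))       # 2 < 3, so the plain inverse does not exist
theta = np.linalg.pinv(X.T @ X) @ X.T @ y   # the pseudoinverse still yields a solution
print(X @ theta)                            # ~[5., 7., 11.]: the fit is unaffected
```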
