Machine Learning (Wu Enda / Andrew Ng), Part 2

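This article introduces the basic concepts of linear regression and its application to housing price prediction, and then looks closely at how the gradient descent algorithm works, including how the cost function is used to adjust the parameters and optimize the model.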

chapter 6 Model representation

Linear regression. This chapter shows what the model looks like and what the overall process of supervised learning looks like.

Supervised learning works from a data set, called the training set.
m = number of training examples
x's = "input" variables / features
y's = "output" variables / "target" variables
$(x, y)$ = a single training example
$(x^{(i)}, y^{(i)})$ = the i-th training example (the superscript is an index, not an exponent)
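
The notation above can be mirrored in a few lines of Python. This is only a sketch: the name `training_set`, the helper `example`, and the four (size, price) pairs are illustrative assumptions, not data from these notes.

```python
# Hypothetical training set: (house size, price) pairs, for illustration only.
training_set = [(2104, 460), (1416, 232), (1534, 315), (852, 178)]

m = len(training_set)                 # m: number of training examples
xs = [x for x, _ in training_set]     # x's: input variables / features
ys = [y for _, y in training_set]     # y's: output / target variables


def example(i):
    """Return the i-th training example, counting from 1 as the notes do."""
    return training_set[i - 1]


print(m)            # number of examples
print(example(2))   # the second training example
```

Note that the notes count examples from 1, while Python lists count from 0, which is why the helper subtracts 1.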

h = hypothesis, the function that maps an input x to a predicted y
The housing price prediction model is called linear regression; more precisely, linear regression with one variable (univariate linear regression).

chapter 7 Cost function

- Housing price prediction
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: parameters
Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.
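
As a minimal sketch, the hypothesis is just a straight-line function of x. The function name `h` follows the notes; the parameter values in the usage line are picked by hand for illustration and are not fitted to any data.

```python
def h(theta0, theta1, x):
    """Univariate linear hypothesis: predicted y for input x."""
    return theta0 + theta1 * x


# Hypothetical parameters: intercept 1.0 and slope 0.5.
print(h(1.0, 0.5, 4.0))  # 3.0
```

Different choices of the two parameters give different lines; the cost function below is what ranks them.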

$J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Goal: $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$
$J(\theta_0,\theta_1)$ is the cost function, also called the squared error cost function.
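
The squared error cost can be sketched directly in Python. This assumes the 1/(2m) scaling that appears in the derivative in chapter 12; the function name `cost` and the tiny data set are illustrative assumptions.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) with the 1/(2m) scaling."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y   # h(x) - y for one example
        total += error ** 2
    return total / (2 * m)


# Hypothetical data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

print(cost(0.0, 1.0, xs, ys))   # perfect fit, so the cost is 0.0
print(cost(0.0, 0.5, xs, ys))   # a worse line gives a larger cost
```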

chapter 8 Cost function intuition 1

This chapter gives some examples to build intuition about what the cost function is doing and why we use it.
We look at some plots to understand the cost function. To do so, we simplify the algorithm so that it has only one parameter, $\theta_1$ (that is, $\theta_0$ is fixed at 0).
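
In place of the plots, the one-parameter cost can be tabulated numerically. The sketch below fixes the intercept at 0 and evaluates J for a handful of slopes; the data points and candidate slopes are assumptions chosen for illustration, and the 1/(2m) scaling is assumed as in chapter 12.

```python
# Hypothetical data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def J(theta1):
    """Cost as a function of the slope only (intercept fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
for t in candidates:
    print(f"theta1={t:.1f}  J={J(t):.3f}")

# The slope with the smallest cost among the candidates.
best = min(candidates, key=J)
print(best)
```

The printed values trace out a bowl shape: the cost is zero at the slope that fits the data and grows symmetrically on either side of it.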

chapter 10 Gradient descent

This chapter talks about gradient descent for minimizing some arbitrary function J.

Have some function $J(\theta_0,\theta_1)$
Want $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$
Outline:

  • Start with some $\theta_0,\theta_1$.
  • Keep changing $\theta_0,\theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum.

Gradient descent algorithm:

repeat until convergence {

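$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1)$ (for j = 0 and j = 1)

}

$\alpha$ is called the learning rate. It controls how big a step we take downhill with gradient descent.

$\frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1)$ is the derivative term.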

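Update $\theta_0$ and $\theta_1$ simultaneously:

temp0 := $\theta_0 - \alpha \frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1)$

temp1 := $\theta_1 - \alpha \frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1)$

$\theta_0$ := temp0

$\theta_1$ := temp1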
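
To see why the temporaries matter, here is a small sketch that compares the simultaneous update with an in-place update. The function J(t0, t1) = t0² + t0·t1 + t1² is a made-up example (not from the notes), chosen because each partial derivative depends on both parameters; all names are hypothetical.

```python
def dJ_dtheta0(t0, t1):
    """Partial derivative of the toy J with respect to theta0."""
    return 2 * t0 + t1


def dJ_dtheta1(t0, t1):
    """Partial derivative of the toy J with respect to theta1."""
    return t0 + 2 * t1


def simultaneous_step(t0, t1, alpha):
    # Both derivatives are evaluated at the OLD (t0, t1).
    temp0 = t0 - alpha * dJ_dtheta0(t0, t1)
    temp1 = t1 - alpha * dJ_dtheta1(t0, t1)
    return temp0, temp1


def sequential_step(t0, t1, alpha):
    # Not the update described above: t0 is overwritten first,
    # so the second derivative sees the NEW t0.
    t0 = t0 - alpha * dJ_dtheta0(t0, t1)
    t1 = t1 - alpha * dJ_dtheta1(t0, t1)
    return t0, t1


sim = simultaneous_step(1.0, 2.0, 0.1)
seq = sequential_step(1.0, 2.0, 0.1)
print(sim)
print(seq)
```

The two results differ in the second parameter, which is the point of computing both temporaries before assigning either one.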

In the next chapter, we go into the details of the derivative term, which was written out here but not really defined.

chapter 11 Gradient descent intuition

This chapter builds better intuition about what the algorithm is doing and why the steps of the gradient descent algorithm make sense.

If the learning rate $\alpha$ is too small, gradient descent can be slow.

If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
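
Both failure modes are easy to reproduce on a toy problem. The sketch below minimizes J(θ) = θ², whose derivative is 2θ and whose minimum is at 0; the function, the starting point, and the three learning rates are assumptions picked for illustration.

```python
def run_gradient_descent(alpha, steps=20, theta=1.0):
    """Run a fixed number of gradient descent steps on J(theta) = theta**2."""
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta   # derivative of theta**2 is 2*theta
    return theta


slow = run_gradient_descent(alpha=0.01)      # tiny steps: still far from 0
good = run_gradient_descent(alpha=0.1)       # reasonable steps: close to 0
diverged = run_gradient_descent(alpha=1.1)   # overshoots further each step

print(slow, good, diverged)
```

With the largest rate, each step lands on the other side of the minimum and farther away than before, so the magnitude of θ grows instead of shrinking.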

If you are already at a local optimum, one step of gradient descent does absolutely nothing: the derivative there is zero, so the parameter does not change and your solution stays at the local optimum.

Gradient descent can converge to a local minimum even with the learning rate $\alpha$ fixed.

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1)$

As we approach a local minimum, the derivative shrinks toward zero, so gradient descent automatically takes smaller steps. There is therefore no need to decrease $\alpha$ over time.
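
A short sketch makes the shrinking steps visible. It again uses the toy function J(θ) = θ² (an assumption, not taken from the notes) with a fixed learning rate, and records the size of every step.

```python
def dJ(theta):
    """Derivative of the toy cost J(theta) = theta**2."""
    return 2.0 * theta


alpha = 0.1          # fixed learning rate, never decreased
theta = 1.0          # hypothetical starting point
step_sizes = []
for _ in range(10):
    step = alpha * dJ(theta)
    step_sizes.append(abs(step))
    theta -= step

print(step_sizes)    # each entry is smaller than the one before

# Exactly at the minimum the derivative is zero, so the step is zero.
step_at_minimum = alpha * dJ(0.0)
print(step_at_minimum)  # 0.0
```

The learning rate never changes here; the steps get smaller only because the derivative does.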

Note on terminology: with a single parameter the term is an ordinary derivative; with several parameters it is a partial derivative.

chapter 12 Gradient descent for linear regression

Here we put gradient descent together with our cost function, which gives us an algorithm for linear regression, that is, for fitting a straight line to our data.

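Gradient descent algorithm:

The key term we need is the derivative term:

$\frac{\partial}{\partial\theta_j} J(\theta_0,\theta_1) = \frac{\partial}{\partial\theta_j} \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right)^2$

j = 0: $\frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right)\cdot\frac{\partial}{\partial\theta_0}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right)$

j = 1: $\frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right)\cdot\frac{\partial}{\partial\theta_1}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0+\theta_1 x^{(i)} - y^{(i)}\right)\cdot x^{(i)}$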

“Batch” Gradient Descent :

“Batch”: Each step of gradient descent uses all the training examples.
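
Putting the pieces together, batch gradient descent for one-variable linear regression might look like the sketch below. It assumes the squared error cost with 1/(2m) scaling, so each derivative is the mean of the prediction errors (times x for the slope). The function name, the starting point of zero, the learning rate, the iteration count, and the data are all illustrative assumptions.

```python
def batch_gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # "Batch": every step sums over ALL m training examples.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical data generated from the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = batch_gradient_descent(xs, ys)
print(round(theta0, 4), round(theta1, 4))
```

A fixed iteration count stands in for "repeat until convergence" to keep the sketch short; a real run would more likely stop once the cost or the parameters stop changing noticeably.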
