Andrew Ng Machine Learning Notes - Course 1 Week 1

This article introduces the difference between supervised and unsupervised learning. Supervised learning includes classification and regression, with applications such as spam filtering and speech recognition; unsupervised learning covers clustering, anomaly detection, and dimensionality reduction. Linear regression is presented as an example of supervised learning: the model is trained with gradient descent, adjusting the weight and bias to minimize the squared-error cost function. The choice of learning rate is critical to how quickly gradient descent converges and whether it can find the global minimum.

Course 1: Supervised Machine Learning: Regression and Classification

Week 1: Introduction to Machine Learning

supervised learning vs. unsupervised learning

supervised learning:

algorithms that learn a mapping from input x to output y. You give your learning algorithm examples to learn from that include the “right answers” (output labels).
e.g.

| input (X) | output (Y) | application |
| --- | --- | --- |
| email | spam? (0/1) | spam filtering |
| audio | text transcript | speech recognition |
| English | Spanish | machine translation |
| ad, user info | click? (0/1) | online advertising |
| image, radar info | position of other cars | self-driving car |
| image of phone | defect? (0/1) | visual inspection |

Regression: predict a number from infinitely many possible outputs
Classification: predict categories from a small number of possible outputs

unsupervised learning:

given data that isn’t associated with any output label y, find some structure, pattern, or something interesting in the unlabeled data

Clustering: group similar data points together. e.g. Google news, DNA microarray, grouping customers
Anomaly Detection: find unusual data points. e.g. fraud detection
Dimensionality Reduction: compress data using fewer numbers

Regression model

Linear Regression with one variable

Notation:
$x$ = “input” variable, feature
$y$ = “output” variable, “target” variable
$m$ = number of training examples
$(x, y)$ = single training example
$(x^{(i)}, y^{(i)})$ = $i$-th training example

Univariate linear regression: linear regression with one variable, $f_{w,b}(x) = wx + b$

Cost Function:
squared-error cost function
$$J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
where $\hat{y}^{(i)} = f_{w,b}(x^{(i)})$

The squared-error cost function is bowl-shaped.
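A minimal sketch of this cost function in NumPy (the function name and the toy data below are my own, not from the course):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w,b) = (1/2m) * sum((f_wb(x^(i)) - y^(i))^2)."""
    m = x.shape[0]
    y_hat = w * x + b                      # model prediction f_{w,b}(x^(i)) for every example
    return np.sum((y_hat - y) ** 2) / (2 * m)

# Toy data generated from y = 2x, so w=2, b=0 gives zero cost
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([2.0, 4.0, 6.0])
print(compute_cost(x_train, y_train, w=2.0, b=0.0))  # 0.0
print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # > 0, the fit is worse
```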

Train the model with gradient descent

Gradient Descent:
repeat until convergence:
$$w = w - \alpha \frac{\partial}{\partial w} J(w,b)$$
$$b = b - \alpha \frac{\partial}{\partial b} J(w,b)$$
where $\alpha$ is the learning rate.
Note: simultaneously update $w$ and $b$. Simultaneously means that you calculate the partial derivatives for all the parameters before updating any of the parameters.
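A minimal sketch of one simultaneous-update step, assuming helper callables dj_dw and dj_db that return the two partial derivatives (these names are hypothetical, not from the course):

```python
def gradient_descent_step(w, b, alpha, dj_dw, dj_db):
    """One gradient descent step with a simultaneous update of w and b."""
    # Compute both partial derivatives first, at the *current* (w, b)...
    tmp_dw = dj_dw(w, b)
    tmp_db = dj_db(w, b)
    # ...then update both parameters. Updating w before evaluating dJ/db
    # would be a (subtly wrong) non-simultaneous update.
    w = w - alpha * tmp_dw
    b = b - alpha * tmp_db
    return w, b
```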

Choosing a different starting point (even one just a few steps away from the original starting point) may lead gradient descent to a different local minimum.


Learning Rate:
if $\alpha$ is too small, gradient descent will work but may be slow.
if $\alpha$ is too large, gradient descent may overshoot and never reach the minimum; it may fail to converge or even diverge.

If already at a local minimum, gradient descent leaves $w$ unchanged (since the slope is 0).

Gradient descent can reach a local minimum with a fixed learning rate, because as we get nearer to a local minimum the derivative automatically gets smaller, so gradient descent automatically takes smaller steps.
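A minimal sketch illustrating this on the toy cost $J(w) = w^2$ (my own example, not from the course): with a fixed learning rate, the step $\alpha \cdot \frac{dJ}{dw}$ shrinks on its own as $w$ approaches the minimum at $w = 0$.

```python
def toy_gradient(w):
    """Derivative of the toy cost J(w) = w^2."""
    return 2.0 * w

w, alpha = 1.0, 0.1
for i in range(5):
    step = alpha * toy_gradient(w)         # step size shrinks every iteration
    print(f"iter {i}: w = {w:.4f}, step = {step:.4f}")
    w = w - step
```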

Gradient Descent for Linear Regression:
$$w = w - \alpha \frac{\partial}{\partial w} J(w,b)$$
$$b = b - \alpha \frac{\partial}{\partial b} J(w,b)$$
where
$$\frac{\partial}{\partial w} J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x^{(i)}$$
$$\frac{\partial}{\partial b} J(w,b) = \frac{1}{m} \sum_{i=1}^{m} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$
The squared-error cost function is a convex function: because of the bowl shape it has a single global minimum and no other local minima. So as long as the learning rate is chosen appropriately, gradient descent will always converge to the global minimum.

“Batch” gradient descent: each step of gradient descent uses all the training examples.
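A minimal sketch of batch gradient descent for univariate linear regression, putting the formulas above together (the function names, learning rate, and toy data are my own, not from the course):

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Partial derivatives of the squared-error cost for f_{w,b}(x) = w*x + b."""
    m = x.shape[0]
    err = (w * x + b) - y                  # f_{w,b}(x^(i)) - y^(i) for every example
    dj_dw = np.sum(err * x) / m            # (1/m) * sum(err * x^(i))
    dj_db = np.sum(err) / m                # (1/m) * sum(err)
    return dj_dw, dj_db

def batch_gradient_descent(x, y, w, b, alpha, num_iters):
    """Each step uses all m training examples ("batch" gradient descent)."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradients(x, y, w, b)
        w = w - alpha * dj_dw              # simultaneous update of w and b
        b = b - alpha * dj_db
    return w, b

# Toy data generated from y = 2x + 1; gradient descent should recover w ≈ 2, b ≈ 1
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * x_train + 1.0
w, b = batch_gradient_descent(x_train, y_train, w=0.0, b=0.0, alpha=0.05, num_iters=5000)
print(w, b)
```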
