I recently started Andrew Ng's Stanford Machine Learning course and am taking notes along the way to review and consolidate what I learn.
My knowledge is limited, so please bear with any errors or omissions, and feel free to point them out.
Week 01
Introduction
- Applications of machine learning
- Database mining
- Applications that can't be programmed by hand
- Self-customizing programs
- Definition
- Arthur Samuel. Field of study that gives computers the ability to learn without being explicitly programmed.
- Tom Mitchell. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T as measured by P improves with experience E.
- Common Types
- Supervised Learning
- Given the “right answer” for each example in the data.
- Regression problem: predict real-valued output.
- Classification problem: predict discrete-valued output.
- Unsupervised Learning
- Unsupervised learning allows us to approach problems with little or no idea what the effect of the variables is.
- Clustering, Non-clustering
Model and Cost Function
- Basic Model Representation
- Number of training examples - m
- Input - x
- Output - y
- Input space - X
- Output space - Y
- Hypothesis - h:X->Y
- Cost Function
- The cost function measures the accuracy of our hypothesis function, for example (see the NumPy sketch after this list):
$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2 = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x_i) - y_i)^2$$
- Contour plot
- A contour plot is a graph that contains many contour lines. A contour line of a two-variable function has the same constant value at every point along that line.
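To make the formula concrete, here is a minimal NumPy sketch of the hypothesis and the squared-error cost function for univariate linear regression. The function names and the toy training set are my own, invented for illustration, not from the course:

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x (univariate linear regression)."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1 / 2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(y)
    return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)

# Toy training set, invented for illustration: y is roughly 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

print(cost(0.0, 2.0, x, y))  # small: this hypothesis nearly fits the data
print(cost(0.0, 0.0, x, y))  # much larger: every prediction is 0
```

Evaluating `cost` over a grid of (θ0, θ1) values is exactly what produces the contour plot described above.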
Parameter Learning
Outline
- Start with some $\theta_0, \theta_1$
- Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.
Algorithm
repeat until convergence {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \dots, \theta_n)$ (for j = 0, 1, ..., n)
}
simultaneous update {
$temp_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1, \dots, \theta_n)$
…
$temp_n := \theta_n - \alpha \frac{\partial}{\partial \theta_n} J(\theta_0, \theta_1, \dots, \theta_n)$
$\theta_0 := temp_0$
…
$\theta_n := temp_n$
}
- Comprehension
- It's like walking down a hill along the steepest path: the partial derivative gives us a direction to move in, and $\alpha$ (called the learning rate) controls the size of each step. A small sketch of the simultaneous update follows this list.
- As we approach the bottom of our convex function, the derivative tends toward 0, and at the bottom we have $\theta_1 := \theta_1 - \alpha \times 0$.
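As a sanity check on the update rule, here is a minimal NumPy sketch of one simultaneous gradient-descent step. The quadratic cost `J` and its gradient are hypothetical stand-ins I picked so the example is self-contained; they are not from the course:

```python
import numpy as np

def gradient_descent_step(theta, grad_J, alpha):
    """One simultaneous update: every temp_j is computed from the *old* theta
    before any theta_j is overwritten."""
    grad = grad_J(theta)         # all partial derivatives at the old theta
    temp = theta - alpha * grad  # temp_j := theta_j - alpha * dJ/dtheta_j
    return temp                  # theta_j := temp_j, for all j at once

# Hypothetical convex cost J(theta) = theta_0^2 + 2 * theta_1^2,
# whose gradient is (2 * theta_0, 4 * theta_1).
grad_J = lambda t: np.array([2.0 * t[0], 4.0 * t[1]])

theta = np.array([3.0, -2.0])
for _ in range(100):
    theta = gradient_descent_step(theta, grad_J, alpha=0.1)
print(theta)  # approaches the minimum at (0, 0)
```

Computing the whole gradient before assigning any parameter is what makes the update "simultaneous"; updating θ0 in place and then using the new θ0 to update θ1 would be a different (incorrect) algorithm.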
Gradient descent for linear regression - Algorithm
repeat until convergence (simultaneously update) {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}$
}
- This method looks at every example in the entire training set on every step, and is called batch gradient descent.
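Putting the two update rules together, below is a minimal NumPy sketch of batch gradient descent for univariate linear regression. The toy data, learning rate, and iteration count are hypothetical choices of mine, not values from the course:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Fit h_theta(x) = theta0 + theta1 * x by batch gradient descent.

    Each iteration sums over the *entire* training set, which is what
    makes this "batch" gradient descent.
    """
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = theta0 + theta1 * x - y                  # h_theta(x^(i)) - y^(i)
        temp0 = theta0 - alpha * error.sum() / m         # dJ/dtheta0 term
        temp1 = theta1 - alpha * (error * x).sum() / m   # dJ/dtheta1 term
        theta0, theta1 = temp0, temp1                    # simultaneous update
    return theta0, theta1

# Toy data roughly following y = 2x + 1, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
print(batch_gradient_descent(x, y))  # close to the least-squares fit, roughly (1.0, 2.0)
```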