Linear Regression with Multiple Variables
Multiple Features
Notation
$n$ = number of features.
$x^{(i)}$ = input (features) of the $i$th training example.
$x_j^{(i)}$ = value of feature $j$ in the $i$th training example.
Hypothesis
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$

Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

For notational convenience, define $x_0 = 1$ (i.e. $x_0^{(i)} = 1$), so that

$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$

$h_\theta(x) = \begin{bmatrix} \theta_0 & \theta_1 & \theta_2 & \dots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \theta^T x$

So the hypothesis can be written:

$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$
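As a quick illustration, here is a minimal NumPy sketch of computing $h_\theta(x) = \theta^T x$ for a whole training set at once; the feature values and $\theta$ below are made up for illustration, not from the notes:

```python
import numpy as np

# Made-up data: m = 3 training examples, n = 2 features.
# The leading column of ones is the x0 = 1 term defined above.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 4.0]])
theta = np.array([80.0, 0.1, 50.0])   # [theta_0, theta_1, theta_2]

# h_theta(x) = theta^T x for every example, as one matrix-vector product
predictions = X @ theta
print(predictions)   # one prediction per training example
```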
Gradient Descent for Multiple Variables
Find the parameters $\theta$ that minimize the cost function:
repeat until convergence: {

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_0^{(i)}$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_1^{(i)}$

$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_2^{(i)}$

$\dots$

}
More compactly:
repeat until convergence: {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) \cdot x_j^{(i)}$  for $j := 0 \dots n$

}
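Below is a minimal NumPy sketch of this simultaneous update rule; the function name `gradient_descent` and the default values of `alpha` and `num_iters` are my own choices for illustration:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i),
    applied to all j = 0..n simultaneously."""
    m, n = X.shape            # X already includes the x0 = 1 column
    theta = np.zeros(n)
    for _ in range(num_iters):
        errors = X @ theta - y             # h_theta(x^(i)) - y^(i), shape (m,)
        gradient = (X.T @ errors) / m      # one component per feature j
        theta = theta - alpha * gradient   # simultaneous update of all theta_j
    return theta
```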
Gradient Descent in Practice (Feature Scaling)
In general, when the features are on similar scales, gradient descent takes a more direct path to the optimum and converges faster.
Feature scaling, or mean normalization:
$x_i := \frac{x_i - \mu_i}{s_i}$
where $\mu_i$ is the mean of feature $i$ and $s_i$ is its range (max $-$ min).
For example:
If $x_i$ represents house prices in the range 100-2000 with a mean of 1000, then the price input is rescaled as:
$x_i := \frac{\text{price} - 1000}{1900}$
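A minimal sketch of mean normalization in NumPy, assuming each column of `X` is one feature (the helper name `mean_normalize` is mine); the $x_0 = 1$ column would only be added after scaling:

```python
import numpy as np

def mean_normalize(X):
    """x_i := (x_i - mu_i) / s_i, column-wise: mu_i is the mean of
    feature i and s_i its range (max - min)."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s

# Keep mu and s: inputs at prediction time must be scaled with the
# same statistics that were computed on the training set.
X_scaled, mu, s = mean_normalize(np.array([[2104.0, 3.0],
                                           [1600.0, 3.0],
                                           [2400.0, 4.0]]))
```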
Gradient Descent in Practice (Learning Rate)
Goals:

Gradient descent:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate $\alpha$.
As a rule of thumb, declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.
To summarize how the choice of $\alpha$ plays out:
- If $\alpha$ is too small: convergence is slow.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration, and may not converge at all.
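As one way to put this debugging advice into practice, here is a sketch that records $J(\theta)$ each iteration and applies the $10^{-3}$ rule of thumb above; the function names and the `tol`/`num_iters` defaults are my own assumptions:

```python
import numpy as np

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def debug_gradient_descent(X, y, alpha, num_iters=400, tol=1e-3):
    """Track J(theta) per iteration and stop once it drops by less
    than tol; an increasing J is a sign that alpha is too large."""
    m, n = X.shape
    theta = np.zeros(n)
    history = [cost(X, y, theta)]
    for _ in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        history.append(cost(X, y, theta))
        if history[-2] - history[-1] < tol:   # the ~1e-3 convergence test
            break
    return theta, history

# Typical use: try alpha on a log scale (0.001, 0.01, 0.1, ...) and plot
# `history`; keep the largest alpha whose curve still decreases smoothly.
```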