Andrew Ng Machine Learning Notes (Part 2) (with Programming Assignment Links)


蚍蜉_



License: CC 4.0 BY-SA

Column: Machine Learning | Tags: machine learning, optimization

Article link: https://blog.youkuaiyun.com/allen_li123/article/details/78768982


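This post covers the second part of Andrew Ng's machine learning course. It focuses on logistic regression: the logistic function, the decision boundary, the cost function and its simplified form, gradient descent, and advanced optimization algorithms. It also discusses overfitting, introducing the regularized cost function and remedies such as regularized linear regression and regularized logistic regression. Finally, the author provides a download link for the programming assignments.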



I. Logistic Regression

1. Logistic function & sigmoid function

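The linear regression hypothesis is not suitable for classification problems where the output takes only the two values 0 and 1. A simple modification turns it into the logistic function, also called the sigmoid function: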

$\begin{align*}& h_\theta (x) = g ( \theta^T x ) \newline& z = \theta^T x \newline& g(z) = \dfrac{1}{1 + e^{-z}}\end{align*}$

The graph of this function is shown below:

[Figure: the sigmoid function]

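When making a prediction for an input x, the resulting value g(z) is the estimated probability that the output is 1, i.e. $h_\theta(x) = P(y = 1 \mid x; \theta)$.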
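Here is a minimal sketch of the hypothesis in Python, using only the standard library (the course itself uses Octave). The function names `sigmoid` and `hypothesis` are illustrative. `x` is assumed to already contain the bias feature $x_0 = 1$.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    # Branch on the sign of z so that math.exp never overflows for large |z|;
    # both branches compute the same mathematical value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), the estimated probability that y = 1.

    x is assumed to include the bias feature x_0 = 1.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


if __name__ == "__main__":
    theta = [-3.0, 1.0]              # hypothetical parameters
    x = [1.0, 4.0]                   # x_0 = 1 (bias), x_1 = 4
    print(sigmoid(0.0))              # 0.5
    print(hypothesis(theta, x))      # g(1) ≈ 0.731
```

The sign-based branch only guards against overflow for very negative inputs. Mathematically it is still the same $g(z)$.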

2. Decision boundary

$\begin{align*}& h_\theta(x) \geq 0.5 \rightarrow y = 1 \newline& h_\theta(x) < 0.5 \rightarrow y = 0\end{align*}$
For the sigmoid function, $g(z) \geq 0.5$ exactly when $z \geq 0$, so:

$h_\theta(x) = g(\theta^T x) \geq 0.5 \quad \text{when} \quad \theta^T x \geq 0$
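Hence the prediction depends only on the sign of $\theta^T x$, and the set of points where $\theta^T x = 0$ is the decision boundary: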

$\begin{align*}& \theta^T x \geq 0 \Rightarrow y = 1 \newline& \theta^T x < 0 \Rightarrow y = 0\end{align*}$
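Because of this rule, you never have to evaluate the sigmoid just to classify an input. The sketch below assumes a hypothetical parameter vector $\theta = [-3, 1, 1]$, whose decision boundary is the line $x_1 + x_2 = 3$. `predict` is an illustrative name.

```python
def predict(theta, x):
    """Predict y using the decision boundary theta^T x = 0.

    Since g(z) >= 0.5 exactly when z >= 0, thresholding theta^T x gives the
    same answer as thresholding h_theta(x) at 0.5. x includes x_0 = 1.
    """
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0


if __name__ == "__main__":
    theta = [-3.0, 1.0, 1.0]                 # hypothetical: boundary x1 + x2 = 3
    print(predict(theta, [1.0, 2.0, 2.0]))   # x1 + x2 = 4 >= 3 -> 1
    print(predict(theta, [1.0, 1.0, 1.0]))   # x1 + x2 = 2 < 3  -> 0
```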

3. Cost function

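The cost function for classification differs from the one used for regression. Reusing the squared-error cost with the sigmoid hypothesis would make $J(\theta)$ non-convex, so the following cost is used instead: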

$\begin{align*}& J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) & \text{if } y = 1 \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)) & \text{if } y = 0\end{align*}$
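When y = 1, the cost is 0 if $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$:

[Figure: cost as a function of $h_\theta(x)$ for y = 1]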

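When y = 0, the cost is 0 if $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$:

[Figure: cost as a function of $h_\theta(x)$ for y = 0]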
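The behaviour shown by the two curves can also be checked numerically. The hypothetical helper `cost` below evaluates the piecewise definition directly. It assumes $0 < h_\theta(x) < 1$ so that both logarithms are defined.

```python
import math


def cost(h, y):
    """Per-example logistic cost, written piecewise as in the notes.

    h is h_theta(x), assumed to lie strictly between 0 and 1.
    """
    if y == 1:
        return -math.log(h)
    if y == 0:
        return -math.log(1.0 - h)
    raise ValueError("y must be 0 or 1")


if __name__ == "__main__":
    # Correct and confident -> cost near 0; confidently wrong -> large cost.
    print(cost(0.99, 1))  # ≈ 0.01
    print(cost(0.01, 1))  # ≈ 4.61
    print(cost(0.01, 0))  # ≈ 0.01
```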

4. Simplified cost function

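Because y is always 0 or 1, $\mathrm{Cost}(h_\theta(x),y)$ can be written as a single expression. For each value of y, one of the two terms vanishes: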

$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
This gives the simplified cost function:

$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$
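The sketch below shows the simplified per-example cost and $J(\theta)$ as an explicit loop over the m examples. The names `simplified_cost` and `J` are illustrative. `X` is assumed to be a list of rows that each start with $x_0 = 1$. For y = 1 or y = 0, the single expression reduces to the corresponding piecewise case.

```python
import math


def simplified_cost(h, y):
    """Cost(h, y) = -y log(h) - (1 - y) log(1 - h), for y in {0, 1}."""
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)


def J(theta, X, y):
    """Unvectorized logistic-regression cost averaged over the m examples.

    Assumes every h_theta(x^(i)) stays strictly between 0 and 1.
    """
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, x_i))
        h = 1.0 / (1.0 + math.exp(-z))          # h_theta(x^(i))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m
```

With $\theta = 0$, every $h_\theta(x^{(i)}) = 0.5$, so $J(\theta) = \log 2 \approx 0.693$ regardless of the data. This is a handy sanity check.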

In vectorized form:

$\begin{align*} & h = g(X\theta)\newline & J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right)\tag{important} \end{align*}$
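The vectorized formula can be mirrored with two small linear-algebra helpers: `mat_vec` for $X\theta$ and `dot` for $u^T v$. Both are hypothetical names for a plain-Python sketch. In Octave, or with a numerical array library, each helper becomes a single matrix expression.

```python
import math


def mat_vec(X, v):
    """Matrix-vector product X v, with X stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def cost_vectorized(theta, X, y):
    """J(theta) = (1/m) * (-y^T log(h) - (1 - y)^T log(1 - h)), h = g(X theta)."""
    m = len(y)
    h = [1.0 / (1.0 + math.exp(-z)) for z in mat_vec(X, theta)]  # h = g(X theta)
    log_h = [math.log(v) for v in h]                               # log(h)
    log_1mh = [math.log(1.0 - v) for v in h]                       # log(1 - h)
    one_minus_y = [1 - v for v in y]                               # 1 - y
    return (-dot(y, log_h) - dot(one_minus_y, log_1mh)) / m
```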

5. Gradient descent

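Applying gradient descent to this cost function gives the rule below, with all $\theta_j$ updated simultaneously. It looks identical to the rule for linear regression, but here $h_\theta(x) = g(\theta^T x)$: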

$\begin{align*} & \text{Repeat} \; \lbrace \newline & \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align*}$
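The sketch below writes out one pass of the `Repeat` loop element by element. It assumes, as in the course, that every $\theta_j$ is updated simultaneously: all new values are computed from the old $\theta$ before any of them is overwritten. `gd_step` is an illustrative name.

```python
import math


def gd_step(theta, X, y, alpha):
    """One iteration of Repeat { theta_j := theta_j - (alpha/m) * sum(...) }.

    Each new theta_j is computed from the *old* theta (simultaneous update);
    theta is only replaced once all components have been computed.
    """
    m = len(y)
    new_theta = []
    for j in range(len(theta)):
        total = 0.0
        for x_i, y_i in zip(X, y):
            z = sum(t * xk for t, xk in zip(theta, x_i))
            h = 1.0 / (1.0 + math.exp(-z))    # h_theta(x^(i)) with the old theta
            total += (h - y_i) * x_i[j]
        new_theta.append(theta[j] - alpha / m * total)
    return new_theta
```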

In vectorized form:

$\theta := \theta - \frac{\alpha}{m} X^{T} (g(X \theta ) - \vec{y})\tag{important}$
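The sketch below runs the vectorized update for a fixed number of iterations, starting from $\theta = 0$. `mat_vec` and `transpose` are small hypothetical helpers standing in for matrix operations. The default learning rate and iteration count are arbitrary illustration values, not recommendations.

```python
import math


def mat_vec(A, v):
    """A v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    """A^T for a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def gradient_descent_vectorized(X, y, alpha=0.1, iters=1000):
    """Repeat theta := theta - (alpha/m) X^T (g(X theta) - y), from theta = 0."""
    m = len(y)
    Xt = transpose(X)
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        h = [1.0 / (1.0 + math.exp(-z)) for z in mat_vec(X, theta)]  # g(X theta)
        grad = mat_vec(Xt, [hi - yi for hi, yi in zip(h, y)])         # X^T (h - y)
        theta = [t - alpha / m * g for t, g in zip(theta, grad)]
    return theta
```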

6. Faster optimization algorithms

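“Conjugate gradient”, “BFGS”, and “L-BFGS”
None of these algorithms require choosing the learning rate α by hand, and they often converge faster than gradient descent, but they are also more complex.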

How to use them:
1. Work out how to compute $J(\theta)$ and $\dfrac{\partial}{\partial \theta_j}J(\theta)$.
2. Write a cost function that returns both the value of J and the gradient:

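function [jVal, gradient] = costFunction(theta)
  % Example objective: J(theta) = (theta(1)-5)^2 + (theta(2)-5)^2, minimized at theta = [5; 5]
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % value of J(theta)
  gradient = zeros(2, 1);                        % partial derivatives dJ/dtheta_j
  gradient(1) = 2 * (theta(1) - 5);
  gradient(2) = 2 * (theta(2) - 5);
end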

3. Call the optimizer fminunc() (unconstrained nonlinear minimization):

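options = optimset('GradObj', 'on', 'MaxIter', 100);   % 'GradObj','on': costFunction also returns the gradient; 100 = maximum number of iterations
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
% optTheta: the theta found by the optimizer; functionVal: the minimum value of J; exitFlag: convergence status (positive values indicate convergence)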
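For logistic regression, the function handed to the optimizer would return the cost and gradient from the formulas in sections 4 and 5. Below is a Python sketch of that contract, with `cost_function` as an illustrative name. Here `X` and `y` are passed explicitly; in Octave you would typically wrap them as `@(t) costFunction(t, X, y)`. In Python, a comparable optimizer is `scipy.optimize.minimize(fun, x0, jac=True, method='BFGS')`, where `jac=True` means `fun` returns the pair (value, gradient). SciPy is not used here, so the sketch stays dependency-free.

```python
import math


def cost_function(theta, X, y):
    """Return (j_val, gradient) for unregularized logistic regression.

    Same contract as the Octave [jVal, gradient] = costFunction(theta), except
    that the data X (rows starting with x_0 = 1) and labels y are explicit.
    """
    m = len(y)
    # h_theta(x^(i)) for every training example
    h = [1.0 / (1.0 + math.exp(-sum(t * xj for t, xj in zip(theta, row)))) for row in X]
    # J(theta) = -(1/m) * sum[y log h + (1 - y) log(1 - h)]
    j_val = -sum(yi * math.log(hi) + (1 - yi) * math.log(1.0 - hi)
                 for yi, hi in zip(y, h)) / m
    # dJ/dtheta_j = (1/m) * sum (h - y) x_j
    gradient = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
                for j in range(len(theta))]
    return j_val, gradient
```

Before handing a function like this to an optimizer, it is worth comparing `gradient` with a central finite-difference estimate of `j_val`.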

7. Multiclass classification

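Split a multiclass problem into one binary problem per class (one-vs-all). For each class i, train a classifier that treats class i as positive and all other classes as negative. To predict, run every classifier on the input and pick the class whose classifier gives the highest output.
[Figure: one-vs-all classification]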

$\begin{align*}& y \in \lbrace 0, 1, \dots, n \rbrace \newline& h_\theta^{(0)}(x) = P(y = 0 \mid x ; \theta) \newline& h_\theta^{(1)}(x) = P(y = 1 \mid x ; \theta) \newline& \cdots \newline& h_\theta^{(n)}(x) = P(y = n \mid x ; \theta) \newline& \mathrm{prediction} = \arg\max_i \left( h_\theta^{(i)}(x) \right)\end{align*}$
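The sketch below covers the prediction step of one-vs-all. It assumes each class's parameter vector has already been trained as a separate binary classifier. `predict_one_vs_all` and the example parameter vectors are hypothetical.

```python
import math


def predict_one_vs_all(thetas, x):
    """Return the class i whose classifier h_theta^(i)(x) is largest.

    thetas[i] holds the parameters of the binary classifier for class i
    (class i vs. the rest); x includes x_0 = 1.
    """
    def h(theta):
        z = sum(t * xj for t, xj in zip(theta, x))
        return 1.0 / (1.0 + math.exp(-z))

    scores = [h(theta) for theta in thetas]
    # argmax over classes: the index of the largest score
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    thetas = [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]]   # hypothetical, 3 classes
    print(predict_one_vs_all(thetas, [1.0, 3.0]))      # 2
```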

II. Overfitting

1. Basic concepts

[Figure: underfitting (left), a good fit (middle), overfitting (right)]

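The left plot shows underfitting: with too few features, the model does not fit the data well and the cost function is large.
The middle plot fits the data well.
The right plot shows overfitting: with too many features, the model fits the training data too closely. The cost function is very small, but the curve does not reflect the real trend and generalizes poorly to new examples.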

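There are two ways to address overfitting:
1. Reduce the number of features

Manually select which features to keep
Use a model selection algorithm

2. Use regularization

Keep all the features, but reduce the magnitude of the parameters $\theta_j$
Works well when there are many features, each of which contributes a little to predicting y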

2. Regularized cost function

To keep the parameters small, add a penalty on $\theta_j$ to the earlier (linear regression) cost function. The regularized cost function is:

$\min_\theta\ \dfrac{1}{2m} \left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^n \theta_j^2 \right]$

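$\lambda$ is the regularization parameter and controls how strongly the parameters are penalized. Regularization makes the hypothesis smoother and reduces overfitting, but if $\lambda$ is too large the model underfits.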
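The sketch below computes the regularized cost for linear regression. `regularized_linear_cost`, and `lam` for $\lambda$, are illustrative names. The penalty sum starts at j = 1, so $\theta_0$ is not penalized.

```python
def regularized_linear_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [ sum (h - y)^2 + lam * sum_{j>=1} theta_j^2 ].

    h_theta(x) = theta^T x; X rows start with x_0 = 1; theta[0] is not penalized.
    """
    m = len(y)
    sq_err = sum((sum(t * xj for t, xj in zip(theta, row)) - yi) ** 2
                 for row, yi in zip(X, y))
    penalty = lam * sum(t * t for t in theta[1:])
    return (sq_err + penalty) / (2 * m)
```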

3. Regularized linear regression

1. Modified gradient descent

$\begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\dots,n\rbrace\newline & \rbrace \end{align*}$
Note: $\theta_0$ is updated separately from $\theta_j$ ($j \geq 1$) because we normally do not penalize $\theta_0$.

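Rearranging, the update for $\theta_j$ can be written as:

$\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$

Since $1 - \alpha\frac{\lambda}{m}$ is slightly less than 1, each step first shrinks $\theta_j$ a little and then applies the ordinary gradient descent update.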
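The sketch below performs one regularized gradient-descent step using the rewritten form. $\theta_0$ gets the plain update, while each $\theta_j$ with $j \geq 1$ is first multiplied by $(1 - \alpha\lambda/m)$. The function name is illustrative.

```python
def regularized_linear_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for linear regression.

    theta_0 uses the plain update; theta_j (j >= 1) is first shrunk by
    (1 - alpha*lam/m) and then moved along the usual gradient.
    """
    m = len(y)
    # h_theta(x^(i)) - y^(i), with h_theta(x) = theta^T x, using the old theta
    errors = [sum(t * xj for t, xj in zip(theta, row)) - yi for row, yi in zip(X, y)]
    new_theta = []
    for j, t in enumerate(theta):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        shrink = 1.0 if j == 0 else 1.0 - alpha * lam / m
        new_theta.append(t * shrink - alpha * grad_j)
    return new_theta
```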

2. Normal equation

$\begin{align*}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \end{bmatrix} \tag{important}\end{align*}$
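Note: L is an $(n+1) \times (n+1)$ matrix, namely the identity matrix with its top-left entry set to 0, so $\theta_0$ is not regularized. For $\lambda > 0$, $X^TX + \lambda \cdot L$ is invertible.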
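The sketch below implements the regularized normal equation without dependencies. Instead of forming the inverse explicitly, it solves the linear system $(X^TX + \lambda L)\theta = X^Ty$ by Gaussian elimination. In Octave the equivalent would be the backslash operator, `(X'*X + lambda*L) \ (X'*y)`. The helper `solve` is a hypothetical minimal implementation, not a library routine.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular (true here whenever lam > 0).
    """
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def regularized_normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^{-1} X^T y, with L = diag(0, 1, ..., 1)."""
    n1 = len(X[0])   # n + 1 columns, including x_0 = 1
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
          for j in range(n1)] for i in range(n1)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n1)]
    return solve(A, b)
```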

4. Regularized logistic regression

The regularized cost function is:

$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$
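Then minimize this cost with gradient descent by repeatedly updating $\theta_j$. The update has the same form as for regularized linear regression (again without penalizing $\theta_0$), except that $h_\theta(x) = g(\theta^T x)$.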
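The sketch below returns both the regularized cost and its gradient. The result can feed either a gradient-descent loop ($\theta_j := \theta_j - \alpha \cdot$ `grad[j]`) or an optimizer such as `fminunc`. The function and parameter names are illustrative.

```python
import math


def regularized_logistic(theta, X, y, lam):
    """Return (J, grad) for L2-regularized logistic regression.

    X rows start with x_0 = 1; theta[0] is excluded from the penalty.
    """
    m = len(y)
    h = [1.0 / (1.0 + math.exp(-sum(t * xj for t, xj in zip(theta, row)))) for row in X]
    # Unregularized cross-entropy term
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1.0 - hi) for yi, hi in zip(y, h)) / m
    # Penalty (lam / 2m) * sum_{j>=1} theta_j^2
    J += lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = []
    for j, t in enumerate(theta):
        g = sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
        if j > 0:
            g += lam / m * t          # derivative of the penalty term
        grad.append(g)
    return J, grad
```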

III. Summary of the key formulas

1. Vectorized cost function for logistic regression:

$\begin{align*} & h = g(X\theta)\newline & J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right)\tag{important} \end{align*}$
2. Vectorized gradient descent for logistic regression: