What is a derivative?
- A derivative of a function at any point tells us how much a minute increment to the argument of the function will increment the value of the function
- To be clear, what we want is not differentiability for its own sake, but how a change in the input affects the output.
- This is based on the fact that, at a fine enough resolution, any smooth, continuous function is locally linear at any point, so we can write (a numerical check follows the equation):
$$\Delta y = \alpha\, \Delta x$$
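As a quick numerical sanity check of this locally linear view, here is a minimal Python sketch (the function `f`, its derivative `df`, and the point `x0` are arbitrary choices for illustration):

```python
import numpy as np

# Example function and its analytic derivative (arbitrary choices)
f = lambda x: np.sin(x)
df = lambda x: np.cos(x)          # alpha = f'(x0)

x0 = 1.0
for dx in [1e-1, 1e-2, 1e-3]:
    dy_true = f(x0 + dx) - f(x0)  # actual increment of the function
    dy_lin = df(x0) * dx          # locally linear prediction: alpha * dx
    print(dx, dy_true, dy_lin)    # the two agree better as dx shrinks
```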
Multivariate scalar function
$$\Delta y=\alpha_{1}\, \Delta x_{1}+\alpha_{2}\, \Delta x_{2}+\cdots+\alpha_{D}\, \Delta x_{D}$$
- The partial derivative $\alpha_i$ tells us how $y$ increments when only $x_i$ is incremented
- It can be expressed as:
$$\Delta y=\nabla_{x} y\, \Delta x$$
where
$$\nabla_{x} y=\left[\frac{\partial y}{\partial x_{1}} \quad \cdots \quad \frac{\partial y}{\partial x_{D}}\right]$$
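The same numerical check works in the multivariate case. A minimal sketch (the example function and the hand-written `grad_f` are arbitrary illustrations):

```python
import numpy as np

# y = f(x1, x2) = x1**2 + 3*x1*x2  (an arbitrary smooth example)
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
grad_f = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])  # row vector of partials

x0 = np.array([1.0, -2.0])
dx = np.array([1e-3, 2e-3])   # a tiny perturbation of the arguments

dy_true = f(x0 + dx) - f(x0)  # actual increment
dy_lin = grad_f(x0) @ dx      # derivative (row) times increment (column)
print(dy_true, dy_lin)        # nearly identical for small dx
```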
Optimization
Single variable

- There are three different types of critical points with zero derivative (checked numerically in the sketch after this list)
- The second derivative is
- $\ge 0$ at minima
- $\le 0$ at maxima
- $= 0$ at inflection points
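A small worked check of the second-derivative test (a sketch; the three test functions, which all have a critical point at $x = 0$, are arbitrary choices):

```python
import numpy as np

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# All three functions have zero derivative at x = 0
cases = {
    "x**2 (minimum)":    lambda x: x**2,    # f''(0) = 2  > 0
    "-x**2 (maximum)":   lambda x: -x**2,   # f''(0) = -2 < 0
    "x**3 (inflection)": lambda x: x**3,    # f''(0) = 0
}
for name, f in cases.items():
    print(name, second_derivative(f, 0.0))
```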
Multiple variables
$$d f(X)=\nabla_{X} f(X)\, d X$$
- The gradient is the transpose of the derivative, $\nabla_{X} f(X)^{T}$ (the derivative gives us the change in $f(X)$ for tiny variations in $X$)
- This is a vector inner product
- $d f(X)$ is max if $dX$ is aligned with $\nabla_{X} f(X)^{T}$
- $\angle\left(\nabla_{X} f(X)^{T}, d X\right)=0$
- The gradient is the direction of fastest increase in $f(X)$ (illustrated in the sketch below)
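A short brute-force illustration that, among steps $dX$ of equal length, the one aligned with the gradient produces the largest increase in $f$ (the example function is an arbitrary choice):

```python
import numpy as np

f = lambda x: x[0]**2 + 2 * x[1]**2                # arbitrary smooth example
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])  # its gradient

x0 = np.array([1.0, 1.0])
eps = 1e-3
best = None

# Try small steps of the same length in many directions
for theta in np.linspace(0, 2 * np.pi, 360, endpoint=False):
    d = eps * np.array([np.cos(theta), np.sin(theta)])
    change = f(x0 + d) - f(x0)
    if best is None or change > best[1]:
        best = (d, change)

# The best direction found is (approximately) parallel to the gradient
print(best[0] / np.linalg.norm(best[0]))
print(grad_f(x0) / np.linalg.norm(grad_f(x0)))
```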
Hessian
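For reference, the Hessian of a scalar function $f(X)$ of $X=[x_1,\ldots,x_D]$ is the matrix of second partial derivatives, which is used in the second-order test below:

$$\nabla_{X}^{2} f(X)=\begin{bmatrix}\frac{\partial^{2} f}{\partial x_{1}^{2}} & \cdots & \frac{\partial^{2} f}{\partial x_{1} \partial x_{D}} \\ \vdots & \ddots & \vdots \\ \frac{\partial^{2} f}{\partial x_{D} \partial x_{1}} & \cdots & \frac{\partial^{2} f}{\partial x_{D}^{2}}\end{bmatrix}$$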

Unconstrained minimization of a function
- Solve for the $X$ where the derivative equals zero:
$$\nabla_{X} f(X)=0$$
- Compute the Hessian matrix at the candidate solution and verify that (see the sketch after this list)
- the Hessian is positive definite (all eigenvalues positive) -> local minimum
- the Hessian is negative definite (all eigenvalues negative) -> local maximum
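A minimal sketch of this recipe on an arbitrary quadratic example (the function, its hand-derived gradient, and its constant Hessian are all illustrative assumptions):

```python
import numpy as np

# f(x, y) = x**2 + x*y + 2*y**2 - 4*x   (arbitrary example)
# Gradient: [2x + y - 4, x + 4y]; setting it to zero gives a linear system
A = np.array([[2.0, 1.0],
              [1.0, 4.0]])
b = np.array([4.0, 0.0])
candidate = np.linalg.solve(A, b)      # point where the gradient vanishes

# For a quadratic the Hessian is constant: [[2, 1], [1, 4]]
hessian = np.array([[2.0, 1.0],
                    [1.0, 4.0]])
eigenvalues = np.linalg.eigvalsh(hessian)
print(candidate, eigenvalues)          # all eigenvalues > 0 -> local minimum
```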
When a closed-form solution is not available
- To find a maximum, move in the direction of the gradient:
$$x^{k+1}=x^{k}+\eta^{k}\, \nabla_{x} f\left(x^{k}\right)^{T}$$
- To find a minimum, move exactly opposite to the direction of the gradient:
$$x^{k+1}=x^{k}-\eta^{k}\, \nabla_{x} f\left(x^{k}\right)^{T}$$
- Choosing the step size $\eta^{k}$ (see the sketch after this list):
- fixed step size
- iteration-dependent step size: critical for fast optimization
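A minimal gradient-descent sketch comparing the two step-size choices (the example function and the $\eta^{k}=0.05/(k+1)$ schedule are arbitrary illustrative choices):

```python
import numpy as np

f = lambda x: x[0]**2 + 10 * x[1]**2               # arbitrary convex example
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])

def gradient_descent(x0, step, iters=100):
    """step(k) returns the step size eta^k used at iteration k."""
    x = np.array(x0, dtype=float)
    for k in range(iters):
        x = x - step(k) * grad_f(x)                # move opposite the gradient
    return x

x_fixed = gradient_descent([5.0, 5.0], step=lambda k: 0.01)            # fixed step
x_decay = gradient_descent([5.0, 5.0], step=lambda k: 0.05 / (k + 1))  # decaying step
print(x_fixed, f(x_fixed))
print(x_decay, f(x_decay))
```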
Convergence
- For convex functions
- gradient descent will always find the minimum.
- For non-convex functions
- it will find a local minimum or an inflection point (see the sketch below)
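A quick illustration on an arbitrary non-convex example, showing that the point gradient descent converges to depends on where it starts:

```python
# f(x) = x**4 - 3*x**2 + x has two local minima separated by a local maximum
f_grad = lambda x: 4 * x**3 - 6 * x + 1

def descend(x, eta=0.01, iters=2000):
    for _ in range(iters):
        x = x - eta * f_grad(x)
    return x

print(descend(-2.0))   # converges to the local (and global) minimum near x ≈ -1.3
print(descend(+2.0))   # converges to a different local minimum near x ≈ 1.1
```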
This post introduces the concept of the derivative, which captures how a tiny change in a function's input affects its output. It covers partial derivatives of multivariate scalar functions and explains how the gradient expresses the function's increment. On the optimization side, it discusses critical points of single-variable and multivariable functions, the relationship between the gradient direction and the fastest increase of the function, and the use of the Hessian matrix to classify local extrema in unconstrained minimization. Finally, it notes the convergence behavior of gradient descent on convex and non-convex functions.
