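Numerical Gradient
A numerical gradient is an estimate of the gradient. In learning tasks based on gradient descent, it can be used to check whether the code that computes gradients is correct. Autodiff frameworks have long since guaranteed correct gradients, but I want to write this up anyway, so there.
Two forms of numerical gradient are common, and the accuracy of each can be proved with a Taylor expansion.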
Difference quotient [1]
$$
\dfrac{f(x+h)-f(x)}{h}
$$
Its error is $O(h)$. The proof is as follows [2]:
$$
\begin{aligned}
&f(x+h)=f(x)+f'(x)h+O(h^2)\\
\Rightarrow \ &f'(x)h=f(x+h)-f(x)+O(h^2)\\
\Rightarrow \ &f'(x)=\dfrac{f(x+h)-f(x)}{h}+O(h)
\end{aligned}
$$
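The $O(h)$ behaviour can be observed numerically. The sketch below is only an illustration: the helper name `forward_difference`, the test function `math.sin`, and the evaluation point `x0 = 1.0` are assumptions chosen for this example, not part of the derivation.

```python
import math


def forward_difference(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


# Assumed example: f = sin, whose exact derivative is cos.
x0 = 1.0
exact = math.cos(x0)

forward_errors = {}
for h in (1e-1, 1e-2, 1e-3):
    # Absolute error of the estimate for this step size.
    forward_errors[h] = abs(forward_difference(math.sin, x0, h) - exact)
    print(f"h={h:.0e}  error={forward_errors[h]:.3e}")
```

Each time `h` shrinks by a factor of 10, the error should shrink by roughly a factor of 10 as well, which is what a first-order error term predicts.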
Symmetric difference quotient
$$
\dfrac{f(x+h)-f(x-h)}{2h}
$$
Its error is $O(h^2)$. The proof is as follows [2]:
$$
\begin{aligned}
&f(x+h)=f(x)+f'(x)h+\dfrac{f''(x)}{2}h^2+O(h^3)\\
&f(x-h)=f(x)-f'(x)h+\dfrac{f''(x)}{2}h^2+O(h^3)\\
&f(x+h)-f(x-h)=2f'(x)h+O(h^3)\\
\Rightarrow \ &f'(x)= \dfrac{f(x+h)-f(x-h)}{2h}+O(h^2)
\end{aligned}
$$
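The sketch below illustrates both the $O(h^2)$ error and the gradient-checking use case mentioned at the start of the post. Everything named here is an assumption made for the example: the helpers `central_difference` and `numerical_gradient`, the toy function `toy_loss`, its hand-derived gradient `toy_loss_grad`, and the step size `h=1e-5`.

```python
import math


def central_difference(f, x, h):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def numerical_gradient(f, point, h=1e-5):
    """Estimate the gradient of f at `point`, one coordinate at a time."""
    grad = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


# 1) Error scaling on an assumed example: f = sin, exact derivative cos.
x0 = 1.0
exact = math.cos(x0)
central_errors = {}
for h in (1e-1, 1e-2):
    central_errors[h] = abs(central_difference(math.sin, x0, h) - exact)
    print(f"h={h:.0e}  error={central_errors[h]:.3e}")


# 2) Gradient check on a hypothetical two-parameter function.
def toy_loss(w):
    return w[0] ** 2 + 3.0 * w[0] * w[1] + math.sin(w[1])


def toy_loss_grad(w):
    # Gradient derived by hand; this is the code being checked.
    return [2.0 * w[0] + 3.0 * w[1], 3.0 * w[0] + math.cos(w[1])]


w0 = [0.5, -1.2]
numeric = numerical_gradient(toy_loss, w0)
analytic = toy_loss_grad(w0)
max_abs_diff = max(abs(n - a) for n, a in zip(numeric, analytic))
print("max |numeric - analytic| =", max_abs_diff)
```

Shrinking `h` by a factor of 10 should reduce the error by roughly a factor of 100. In the gradient check, a small `max_abs_diff` suggests the hand-written gradient agrees with the estimate. Making `h` too small lets floating-point rounding dominate, so the step size is a trade-off rather than a fixed rule.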
On big-O notation
Mathematically, when there exists a constant $L$ such that
$$
\left|\dfrac{f(x)}{g(x)}\right|\leq L \ (x\rightarrow x_0)
$$
that is, the bound holds for all $x$ sufficiently close to $x_0$, we write
$$
f(x)=O(g(x))
$$
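As a quick worked example of this definition (the function is chosen here purely for illustration), take $f(h)=h^2+3h^3$ and $g(h)=h^2$ with $h\rightarrow 0$:

$$
\left|\dfrac{h^2+3h^3}{h^2}\right| = |1+3h| \leq 4 \quad (0<|h|\leq 1)
$$

so $h^2+3h^3=O(h^2)$ as $h\rightarrow 0$. This is the sense in which the higher-order remainder terms in the Taylor expansions above are absorbed into a single $O(\cdot)$ term.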