In this video we explored the scenario where we used one parameter $\theta_1$ and plotted its cost function to implement gradient descent. Our formula for a single parameter was:
Repeat until convergence:
$\theta_1 := \theta_1 - \alpha \dfrac{d}{d\theta_1} J(\theta_1)$
Regardless of the slope's sign for $\dfrac{d}{d\theta_1}J(\theta_1)$, $\theta_1$ eventually converges to its minimum value. The following graph shows that when the slope is negative, the value of $\theta_1$ increases, and when it is positive, the value of $\theta_1$ decreases.
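To make this concrete, here is a minimal sketch of the single-parameter update rule on a hypothetical convex cost $J(\theta_1) = (\theta_1 - 3)^2$ (this cost is an assumption for illustration, not the one from the video). Starting on either side of the minimum, the sign of the derivative pushes $\theta_1$ toward it:

```python
# Hypothetical convex cost J(theta1) = (theta1 - 3)**2,
# whose derivative is dJ/dtheta1 = 2 * (theta1 - 3).

def dJ(theta1):
    return 2.0 * (theta1 - 3.0)   # slope of the cost at theta1

alpha = 0.1                        # fixed learning rate (step size)

# Start left of the minimum: the slope is negative, so theta1 increases.
theta1 = 0.0
for _ in range(50):
    theta1 = theta1 - alpha * dJ(theta1)
print(theta1)   # approaches 3.0 from below

# Start right of the minimum: the slope is positive, so theta1 decreases.
theta1 = 6.0
for _ in range(50):
    theta1 = theta1 - alpha * dJ(theta1)
print(theta1)   # approaches 3.0 from above
```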

On a side note, we should adjust our parameter $\alpha$ to ensure that the gradient descent algorithm converges in a reasonable time. Failure to converge, or taking too long to reach the minimum value, implies that our step size is wrong.
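As a rough illustration, the sketch below reuses the hypothetical cost $J(\theta_1) = (\theta_1 - 3)^2$ from above; for this particular cost, any $\alpha$ greater than 1.0 makes each update overshoot further than the last, while a very small $\alpha$ converges painfully slowly:

```python
def dJ(theta1):
    return 2.0 * (theta1 - 3.0)   # slope of the hypothetical cost

def run(alpha, theta1=0.0, steps=20):
    for _ in range(steps):
        theta1 = theta1 - alpha * dJ(theta1)
    return theta1

print(run(alpha=0.1))    # converges quickly toward 3.0
print(run(alpha=0.001))  # moves toward 3.0, but very slowly
print(run(alpha=1.1))    # diverges: each step overshoots more than the last
```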

How does gradient descent converge with a fixed step size $\alpha$?
The intuition behind the convergence is that $\dfrac{d}{d\theta_1}J(\theta_1)$ approaches 0 as we approach the bottom of our convex function. At the minimum, the derivative will always be 0 and thus we get:
$\theta_1 := \theta_1 - \alpha \cdot 0$

so $\theta_1$ is updated to itself minus zero and remains unchanged.
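A short sketch of this effect, again on the hypothetical cost $J(\theta_1) = (\theta_1 - 3)^2$: the steps shrink on their own even though $\alpha$ stays fixed, because the derivative itself shrinks near the minimum.

```python
def dJ(theta1):
    return 2.0 * (theta1 - 3.0)   # slope of the hypothetical cost

alpha = 0.1
theta1 = 0.0
for i in range(10):
    step = alpha * dJ(theta1)     # the actual step taken this iteration
    print(f"iteration {i}: theta1 = {theta1:.4f}, step = {step:.4f}")
    theta1 = theta1 - step
# The printed step sizes decay toward 0 even though alpha never changes.
```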




