Momentum, RMSprop and Adam

Gradient Descent with Momentum

Compute an exponentially weighted average of the gradients, and use that average (rather than the raw gradient) to update the weights.

Algorithm

On iteration t:

  1. Compute $dW$ and $db$ on the current mini-batch
  2. $V_{dW} = \beta_1 V_{dW} + (1 - \beta_1)\,dW$
    $V_{db} = \beta_1 V_{db} + (1 - \beta_1)\,db$
  3. $W = W - \alpha V_{dW}$
    $b = b - \alpha V_{db}$
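
Below is a minimal NumPy sketch of one Momentum step. The function name momentum_update and the defaults $\alpha = 0.01$ and $\beta_1 = 0.9$ are illustrative assumptions, not values from these notes ($\beta_1 = 0.9$ is the commonly used default).

```python
import numpy as np

def momentum_update(W, b, dW, db, VdW, Vdb, alpha=0.01, beta1=0.9):
    """One Momentum step. VdW and Vdb hold the running gradient averages."""
    VdW = beta1 * VdW + (1 - beta1) * dW   # exponentially weighted average of dW
    Vdb = beta1 * Vdb + (1 - beta1) * db   # exponentially weighted average of db
    W = W - alpha * VdW                    # step along the smoothed gradient
    b = b - alpha * Vdb
    return W, b, VdW, Vdb

# Usage: initialize the averages to zero, then update once per mini-batch.
W, b = np.ones((3, 2)), np.zeros(2)
VdW, Vdb = np.zeros_like(W), np.zeros_like(b)
dW, db = np.full_like(W, 0.1), np.full_like(b, 0.1)  # stand-in gradients
W, b, VdW, Vdb = momentum_update(W, b, dW, db, VdW, Vdb)
```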

RMSprop

On iteration t:

  1. Compute $dW$ and $db$ on the current mini-batch
  2. $S_{dW} = \beta_2 S_{dW} + (1 - \beta_2)\,dW^2$ (element-wise square)
    $S_{db} = \beta_2 S_{db} + (1 - \beta_2)\,db^2$ (element-wise square)
  3. $W = W - \alpha \dfrac{dW}{\sqrt{S_{dW}} + \varepsilon}$
    $b = b - \alpha \dfrac{db}{\sqrt{S_{db}} + \varepsilon}$
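
A corresponding sketch for one RMSprop step. As above, the function name and the defaults are assumptions for illustration ($\beta_2 = 0.999$ and $\varepsilon = 10^{-8}$ are commonly used values); $\varepsilon$ only guards against division by zero.

```python
import numpy as np

def rmsprop_update(W, b, dW, db, SdW, Sdb, alpha=0.001, beta2=0.999, eps=1e-8):
    """One RMSprop step. SdW and Sdb hold running averages of squared gradients."""
    SdW = beta2 * SdW + (1 - beta2) * dW ** 2   # element-wise square of dW
    Sdb = beta2 * Sdb + (1 - beta2) * db ** 2   # element-wise square of db
    W = W - alpha * dW / (np.sqrt(SdW) + eps)   # damp updates along steep directions
    b = b - alpha * db / (np.sqrt(Sdb) + eps)
    return W, b, SdW, Sdb
```

Dividing by the root-mean-square of recent gradients shrinks the step where gradients are consistently large and enlarges it where they are small, which reduces oscillation.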

Adam

Adam (Adaptive Moment Estimation) combines Momentum (first moment) and RMSprop (second moment), adding bias correction for the moment estimates.

Algorithm

On iteration t:

  1. Compute $dW$ and $db$ on the current mini-batch
  2. $V_{dW} = \beta_1 V_{dW} + (1 - \beta_1)\,dW$
    $V_{db} = \beta_1 V_{db} + (1 - \beta_1)\,db$
    $S_{dW} = \beta_2 S_{dW} + (1 - \beta_2)\,dW^2$ (element-wise square)
    $S_{db} = \beta_2 S_{db} + (1 - \beta_2)\,db^2$ (element-wise square)
  3. $V_{dW}^{\text{corrected}} = \dfrac{V_{dW}}{1 - \beta_1^t}$
    $V_{db}^{\text{corrected}} = \dfrac{V_{db}}{1 - \beta_1^t}$
    $S_{dW}^{\text{corrected}} = \dfrac{S_{dW}}{1 - \beta_2^t}$
    $S_{db}^{\text{corrected}} = \dfrac{S_{db}}{1 - \beta_2^t}$
  4. $W = W - \alpha \dfrac{V_{dW}^{\text{corrected}}}{\sqrt{S_{dW}^{\text{corrected}}} + \varepsilon}$
    $b = b - \alpha \dfrac{V_{db}^{\text{corrected}}}{\sqrt{S_{db}^{\text{corrected}}} + \varepsilon}$
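
Putting the pieces together, here is a sketch of one Adam step with bias correction. The defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$ follow the values commonly recommended for Adam, and t is the 1-based iteration count; the function name and signature are illustrative assumptions.

```python
import numpy as np

def adam_update(W, b, dW, db, VdW, Vdb, SdW, Sdb, t,
                alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step. t is the 1-based iteration count for bias correction."""
    # Momentum-style first moment
    VdW = beta1 * VdW + (1 - beta1) * dW
    Vdb = beta1 * Vdb + (1 - beta1) * db
    # RMSprop-style second moment (element-wise square)
    SdW = beta2 * SdW + (1 - beta2) * dW ** 2
    Sdb = beta2 * Sdb + (1 - beta2) * db ** 2
    # Bias correction: undo the bias toward zero from zero initialization
    VdW_c = VdW / (1 - beta1 ** t)
    Vdb_c = Vdb / (1 - beta1 ** t)
    SdW_c = SdW / (1 - beta2 ** t)
    Sdb_c = Sdb / (1 - beta2 ** t)
    # Parameter update: smoothed gradient scaled by RMS of recent gradients
    W = W - alpha * VdW_c / (np.sqrt(SdW_c) + eps)
    b = b - alpha * Vdb_c / (np.sqrt(Sdb_c) + eps)
    return W, b, VdW, Vdb, SdW, Sdb
```

Bias correction matters mainly in the first few iterations, when $V$ and $S$ are still close to their zero initialization; as $t$ grows, the correction factors approach 1.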