cs231n Study Notes: Questions Recorded Along the Way

This note discusses why it matters to shift the dataset to zero mean during data preprocessing. By examining how weights are updated during gradient descent, it explains how zero-centering helps avoid unnecessary zig-zagging during training and mitigates the problem of all weight-gradient components sharing a single sign.


(1) Why the dataset is transformed to zero mean during preprocessing:

If all input data are positive, then during backpropagation the gradient components on the weights are either all positive or all negative, so each update pushes every weight in the same sign direction.

In practice, however, because updates are computed over a batch, the gradients of all samples are summed before the weights are updated, so the final update can contain components of mixed signs.

This has implications on the dynamics during gradient descent, because if the data coming into a neuron is always positive (e.g. x > 0 elementwise in f = w^T x + b), then the gradient on the weights w will during backpropagation become either all positive, or all negative (depending on the gradient of the whole expression f). This could introduce undesirable zig-zagging dynamics in the gradient updates for the weights. However, notice that once these gradients are added up across a batch of data the final update for the weights can have variable signs, somewhat mitigating this issue. Therefore, this is an inconvenience but it has less severe consequences compared to the saturated activation problem above.
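The sign argument above can be checked numerically: for a single sample with elementwise-positive input x, the gradient dL/dw = (dL/df) * x has every component sharing the sign of the scalar dL/df, while the batch-summed gradient can mix signs. This is a minimal sketch with hypothetical randomly drawn inputs and upstream gradients, not code from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# All-positive inputs (e.g. raw pixel values), as in f = w^T x + b.
X = rng.uniform(0.1, 1.0, size=(8, 5))   # 8 samples, 5 features, all > 0

# Hypothetical upstream gradients dL/df, one scalar per sample.
df = rng.normal(size=8)

# Per-sample gradient on w: dL/dw = (dL/df) * x, shape (8, 5).
per_sample_grads = df[:, None] * X

# For each single sample, every component shares the sign of its dL/df,
# because x is elementwise positive.
same_sign = [np.all(g > 0) or np.all(g < 0) for g in per_sample_grads]
print(same_sign)  # every entry is True

# Summed over the batch, the signs can mix, mitigating the zig-zag.
batch_grad = per_sample_grads.sum(axis=0)
print(np.sign(batch_grad))
```

Zero-centering X removes the "all positive inputs" premise, so even a single sample's gradient can have mixed-sign components.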
