Posting these two excellent resources here for now; I'll organize them properly over the weekend.

Weight decay: https://stats.stackexchange.com/questions/29130/difference-between-neural-net-weight-decay-and-learning-rate
Batch normalization: https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html
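In the meantime, a minimal sketch of the distinction the first link discusses: the learning rate scales the whole gradient step, while weight decay adds a separate pull of each weight toward zero, equivalent to an L2 penalty on the loss. The function name and toy quadratic loss below are my own illustration, not from the linked answer.

```python
def sgd_step(w, grad, lr=0.1, wd=0.0):
    """One plain SGD step with weight decay (my own illustration).

    w    -- current weight
    grad -- gradient of the data loss w.r.t. w
    lr   -- learning rate: scales the entire update
    wd   -- weight decay: shrinks w toward zero each step
    """
    return w - lr * (grad + wd * w)

# Toy usage: minimize f(w) = (w - 3)^2 with and without weight decay.
# The gradient of f is 2 * (w - 3).
w_plain, w_decay = 0.0, 0.0
for _ in range(200):
    w_plain = sgd_step(w_plain, 2 * (w_plain - 3))
    w_decay = sgd_step(w_decay, 2 * (w_decay - 3), wd=0.1)

print(w_plain)  # ~3.0: the unpenalized minimum
print(w_decay)  # ~2.857: settles below 3.0 because of the L2 pull toward zero
```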
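And a NumPy sketch of batch normalization's forward and backward pass, in the spirit of the second link (which derives the backward pass node by node through the computational graph). The compact `dx` expression below is the standard simplification of that derivation; the function names are my own.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Batch norm over a (N, D) mini-batch: normalize per feature, then scale/shift."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    xhat = (x - mu) / np.sqrt(var + eps)   # normalized input
    out = gamma * xhat + beta              # learnable scale and shift
    cache = (xhat, gamma, var, eps)
    return out, cache

def batchnorm_backward(dout, cache):
    """Backpropagate dout through batch norm, returning dx, dgamma, dbeta."""
    xhat, gamma, var, eps = cache
    dbeta = dout.sum(axis=0)
    dgamma = (dout * xhat).sum(axis=0)
    dxhat = dout * gamma
    # The batch mean and variance depend on every sample, so dx mixes a
    # per-sample term with two batch-average terms.
    dx = (dxhat - dxhat.mean(axis=0)
          - xhat * (dxhat * xhat).mean(axis=0)) / np.sqrt(var + eps)
    return dx, dgamma, dbeta

# Quick usage on a tiny batch.
x = np.random.randn(4, 3)
gamma, beta = np.ones(3), np.zeros(3)
out, cache = batchnorm_forward(x, gamma, beta)
dx, dgamma, dbeta = batchnorm_backward(np.random.randn(*out.shape), cache)
```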