https://distill.pub/2017/momentum/ ***Why Momentum Really Works
https://www.zhihu.com/question/32673260 ***How does batch size affect learning performance in deep machine learning?
https://www.zhihu.com/question/64134994/answer/216895968 ***How should one understand the relationship between large batch size and learning rate in distributed deep learning training?
https://zhuanlan.zhihu.com/p/27555858 ***A spectral approach
https://github.com/thomasyao3096/Batch_Adaptive_Framework ***Batch Adaptive Framework