References:
http://ruder.io/optimizing-gradient-descent/index.html#adadelta
https://blog.youkuaiyun.com/u010089444/article/details/76725843