An Information-Theoretic View for Deep Learning
Authors: Jingwei Zhang, Tongliang Liu, Dacheng Tao
The University of Sydney, NSW, Australia
Published: 3 May 2018
1. Abstract and Introduction
Two key open questions in deep learning:
- Why do deeper networks generalize better?
- Does a deeper network always perform better?
The paper's core result:
$$\mathbb{E}[R(W) - R_S(W)] \le \exp\left(-\frac{L}{2}\log\frac{1}{\eta}\right)\sqrt{\frac{2\sigma^2}{n}\, I(S,W)}$$
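As a rough numerical illustration, the sketch below evaluates the right-hand side of the bound. The function name `generalization_bound` and all input values are hypothetical, and I(S, W) is passed in as a fixed number purely for illustration. Assuming 0 < η < 1, log(1/η) > 0, so the factor exp(−(L/2) log(1/η)) = η^{L/2} shrinks geometrically as L grows.

```python
import math


def generalization_bound(L: int, eta: float, sigma: float, n: int, mutual_info: float) -> float:
    """Evaluate exp(-(L/2) * log(1/eta)) * sqrt(2 * sigma**2 / n * I(S, W)).

    Hypothetical helper for illustration only:
      L           -- number of layers appearing in the bound
      eta         -- information-loss factor, assumed to satisfy 0 < eta < 1
      sigma       -- loss-dependent constant from the bound
      n           -- number of training samples
      mutual_info -- I(S, W), supplied here as an assumed constant
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    # Depth-dependent factor; algebraically equal to eta ** (L / 2).
    decay = math.exp(-(L / 2) * math.log(1 / eta))
    # Depth-independent factor from the mutual-information term.
    base = math.sqrt(2 * sigma**2 / n * mutual_info)
    return decay * base


# Hypothetical illustration values, not taken from the paper.
for depth in (1, 5, 10, 20):
    print(depth, generalization_bound(depth, eta=0.9, sigma=1.0, n=10_000, mutual_info=50.0))
```

Holding every other term fixed, each additional layer multiplies the bound by √η, which is how the bound ties depth to a smaller expected generalization gap.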
Notation:
- $R(W)$ is the expected risk: $R(W) = \mathbb{E}_{Z\sim D}[\ell(W,Z)]$
- $Z$ is a data sample, $D$ is the data distribution, and $W$ is the trained network (a hypothesis)
- $R_S(W)$ is the empirical risk: E[<