In 2009, Alon Halevy, Peter Norvig, and Fernando Pereira from Google published a paper entitled The Unreasonable Effectiveness of Data, which emphasized the role of sample data in NLP.
Later, a team at Google published a revisit of the original paper entitled Revisiting the Unreasonable Effectiveness of Data, where they address the effect of data in deep learning.
In their paper, they validate the following hypotheses:
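(1) Large-scale data helps representation learning.
(2) Performance grows logarithmically with the amount of training data.
(3) Model capacity is crucial.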
In addition, the influence of the optimization strategy should also be taken into account.
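To make hypothesis (2) concrete, here is a minimal sketch of what "logarithmic growth" means in practice: performance is modeled as score ≈ a + b * log10(n), so every 10x increase in training data adds roughly the same number of points. The function names (fit_log_curve, predict) and the numbers are hypothetical and purely synthetic; they are not taken from either paper.

```python
import numpy as np


def fit_log_curve(n_samples, scores):
    """Least-squares fit of scores ≈ a + b * log10(n_samples)."""
    x = np.log10(np.asarray(n_samples, dtype=float))
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    b, a = np.polyfit(x, y, deg=1)
    return a, b


def predict(a, b, n_samples):
    """Evaluate the fitted log curve at the given dataset sizes."""
    return a + b * np.log10(np.asarray(n_samples, dtype=float))


# Synthetic, made-up values for illustration only (assumption, not real results).
sizes = [1e4, 1e5, 1e6, 1e7]
scores = [50.0, 55.0, 60.0, 65.0]

a, b = fit_log_curve(sizes, scores)

# Under this model, going from 1e7 to 1e8 samples yields the same gain
# as any other 10x step, namely the slope b.
gain_10x = float(predict(a, b, 1e8) - predict(a, b, 1e7))
print(f"intercept={a:.2f}, slope per 10x data={b:.2f}, next 10x gain={gain_10x:.2f}")
```

The practical reading is that the gains never vanish but become expensive: each fixed improvement requires multiplying the dataset size, not adding to it.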
References:
[1] https://miguelgfierro.com/blog/2019/revisiting-the-revisit-of-the-unreasonable-effectiveness-of-data/