Related video: https://www.youtube.com/watch?v=eindNx4Uk-M
Scaling Laws for Neural Language Models
Uses cross-entropy loss as the performance measure; the scaling variables are model parameters, dataset size, and compute.
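The paper fits loss to power laws in each variable when the others are not the bottleneck. A minimal sketch of that functional form, using the coefficients reported in the paper as illustrative values:

```python
# Power-law fits from Kaplan et al. (2020); coefficients are the paper's
# reported estimates and should be treated as illustrative, not exact.
ALPHA_N, N_C = 0.076, 8.8e13  # loss vs. non-embedding parameters N
ALPHA_D, D_C = 0.095, 5.4e13  # loss vs. dataset size D (tokens)

def loss_vs_params(n_params: float) -> float:
    """L(N) = (N_c / N)^alpha_N, with data and compute unconstrained."""
    return (N_C / n_params) ** ALPHA_N

def loss_vs_data(n_tokens: float) -> float:
    """L(D) = (D_c / D)^alpha_D, with model size unconstrained."""
    return (D_C / n_tokens) ** ALPHA_D

# A power law means doubling N always shrinks loss by the same factor 2**-alpha_N:
ratio = loss_vs_params(2e9) / loss_vs_params(1e9)
print(f"loss ratio when doubling N: {ratio:.4f}")
```

Under these fitted exponents, each doubling of model size cuts loss by only about 5%, which is why compute requirements grow so quickly.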
Training Compute-Optimal Large Language Models
Model size and training data should be scaled up in roughly equal proportion.
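The commonly cited rule of thumb implied by the paper's fits is roughly 20 training tokens per parameter, with training compute approximated as C ≈ 6·N·D. A hedged sketch of the resulting compute-optimal allocation:

```python
import math

# Chinchilla-style rule of thumb: D ≈ 20 * N tokens per parameter,
# with training compute C ≈ 6 * N * D FLOPs. The ratio 20 is an
# approximation of the paper's fits, used here for illustration.
TOKENS_PER_PARAM = 20

def compute_optimal(c_flops: float) -> tuple[float, float]:
    """Return (n_params, n_tokens) for a compute budget of c_flops."""
    n = math.sqrt(c_flops / (6 * TOKENS_PER_PARAM))
    return n, TOKENS_PER_PARAM * n

n, d = compute_optimal(5.76e23)  # roughly Chinchilla's training budget
print(f"N ≈ {n:.2e} params, D ≈ {d:.2e} tokens")
```

With this budget the sketch lands near 70B parameters and 1.4T tokens, consistent with the model trained in the paper.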
Beyond neural scaling laws: beating power law scaling via data pruning
Prunes easy training examples
With a good pruning metric, error scaling can improve from a power law toward exponential
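Why that transition matters can be seen numerically. A toy illustration (not the paper's fitted model): under a power law, halving the error requires a fixed *multiplier* on dataset size, while under exponential scaling it only requires a fixed *additive* increase.

```python
import math

# Toy comparison, not the paper's model: power-law error err(D) = D**-alpha
# versus exponential error err(D) = exp(-c * D). Constants are arbitrary.
alpha, c = 0.1, 1e-6

def power_law_err(d: float) -> float:
    return d ** -alpha

def exponential_err(d: float) -> float:
    return math.exp(-c * d)

# Power law: halving error multiplies the data requirement by 2**(1/alpha).
mult = 2 ** (1 / alpha)
# Exponential: halving error adds a constant ln(2)/c samples each time.
add = math.log(2) / c
print(f"power law: {mult:.0f}x more data per halving")
print(f"exponential: {add:.0f} extra samples per halving, independent of scale")
```

With alpha = 0.1 (typical of language-model error exponents), each halving of error costs 1024x more data under the power law, which is the regime pruning aims to escape.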
Scaling Laws and Interpretability of Learning from Repeated Data
Scaling Data-Constrained Language Models
Reuses (repeats) data when unique data is limited
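The qualitative finding is diminishing returns: early repeated epochs are worth nearly as much as fresh data, but the value of further repetition decays. A simplified illustration of that shape, with an assumed decay constant (this is NOT the paper's fitted formula):

```python
import math

# Illustrative model of diminishing returns from repetition (assumed form
# and constant, not the paper's fit): the first pass over the data counts
# fully, and additional epochs contribute with exponentially decaying value.
DECAY = 4.0  # assumed decay scale in epochs, for illustration only

def effective_tokens(unique_tokens: float, epochs: float) -> float:
    """Fresh-data-equivalent tokens after `epochs` passes over the data."""
    repeated = DECAY * (1 - math.exp(-(epochs - 1) / DECAY))
    return unique_tokens * (1 + repeated)

for e in (1, 4, 16):
    print(f"{e:2d} epochs -> {effective_tokens(1.0, e):.2f}x unique tokens")
```

Under these assumptions the effective data saturates: no amount of repetition exceeds a fixed multiple of the unique tokens, matching the qualitative conclusion that repetition helps but cannot substitute for new data indefinitely.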