
2023-09 Paper Reading Collection
https://github.com/inpluslab-wuhui/Systems-for-Foundation-Models (paper reading; reading notes shared here)
陈超帅 - Large-Model Agents
A good start is half of success, and even a bad start is a third of it, so just start!
15.ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning
Reading notes (original). This paper proposes a method that breaks the GPU memory limit for extreme-scale deep learning by spilling model states out of GPU memory.
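For a concrete reference point, ZeRO-Infinity's ideas ship in DeepSpeed as ZeRO stage 3 with optimizer and parameter offload to CPU or NVMe. Below is a minimal configuration sketch, not a tuned setup: the `nvme_path` value is a placeholder, and the exact set of tuning knobs depends on the DeepSpeed version.

```python
# Minimal sketch of a DeepSpeed ZeRO-Infinity style configuration.
# Stage 3 partitions parameters, gradients, and optimizer states across
# workers, and the offload sections push them to CPU/NVMe so model size
# is no longer bound by GPU RAM. "/local_nvme" is a placeholder path.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,                    # full model-state partitioning
        "offload_optimizer": {
            "device": "nvme",          # or "cpu"
            "nvme_path": "/local_nvme",
        },
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
        },
    },
    "fp16": {"enabled": True},
}
```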
14.Chimera: efficiently training large-scale neural networks with bidirectional pipelines
Reading notes (repost).
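Chimera's core idea is to run two pipelines in opposite directions over the same workers, so each worker hosts two stages and the bubbles of one direction are filled by work from the other. A toy sketch of that bidirectional stage-to-worker mapping, with the textbook GPipe bubble ratio for comparison (assuming p stages, m micro-batches, and uniform stage times; the exact bubble savings depend on the schedule details in the paper):

```python
def chimera_stage_mapping(p):
    """Chimera's bidirectional placement: the 'down' pipeline puts stage s
    on worker s, the 'up' pipeline puts stage s on worker p-1-s, so every
    worker hosts exactly two stages (one per direction)."""
    down = {s: s for s in range(p)}
    up = {s: p - 1 - s for s in range(p)}
    return down, up

def gpipe_bubble_ratio(p, m):
    """Idle fraction of a unidirectional GPipe-style schedule with p
    stages and m micro-batches, assuming uniform stage times."""
    return (p - 1) / (m + p - 1)

down, up = chimera_stage_mapping(4)
print("down:", down)   # {0: 0, 1: 1, 2: 2, 3: 3}
print("up:  ", up)     # {0: 3, 1: 2, 2: 1, 3: 0}
# Each direction carries half the micro-batches; interleaving them lets
# Chimera fill a large share of the bubbles a single direction would leave.
print("GPipe bubble ratio (p=4, m=8):", gpipe_bubble_ratio(4, 8))
```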
13.Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Reading notes (repost).
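Megatron-LM's tensor parallelism splits each transformer MLP across GPUs: the first weight matrix column-wise and the second row-wise, so a single all-reduce recombines the outputs. A NumPy sketch simulating two "GPUs" (shapes and the GeLU approximation are illustrative, not Megatron's actual code):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GeLU, common in transformer codebases
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
d, h = 8, 16                   # hidden size, MLP inner size
X = rng.normal(size=(4, d))    # a batch of activations
A = rng.normal(size=(d, h))    # first MLP weight
B = rng.normal(size=(h, d))    # second MLP weight

# Serial reference: Y = GeLU(X A) B
Y_ref = gelu(X @ A) @ B

# Two-way tensor parallelism, Megatron-style: split A column-wise and
# B row-wise; each "GPU" computes a partial output, and one all-reduce
# (here just a sum) combines them. Because GeLU is elementwise, splitting
# A by columns commutes with the nonlinearity.
A1, A2 = A[:, : h // 2], A[:, h // 2 :]
B1, B2 = B[: h // 2, :], B[h // 2 :, :]
Y1 = gelu(X @ A1) @ B1         # GPU 0
Y2 = gelu(X @ A2) @ B2         # GPU 1
Y = Y1 + Y2                    # all-reduce

assert np.allclose(Y, Y_ref)   # matches the serial computation
```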
12.ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Reading notes (original).
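The ZeRO paper's memory accounting: with mixed-precision Adam, each parameter costs 2 bytes (fp16 weights) + 2 bytes (fp16 gradients) + 12 bytes (fp32 master weights, momentum, variance), and the three ZeRO stages partition successively more of this across the N data-parallel workers. A sketch of that arithmetic, assuming those standard byte counts:

```python
def zero_memory_per_gpu(num_params, n_gpus, stage):
    """Approximate per-GPU model-state memory (bytes) for mixed-precision
    Adam under ZeRO: 2B fp16 params + 2B fp16 grads + 12B optimizer states
    per parameter, following the accounting in the ZeRO paper."""
    P, G, O = 2, 2, 12  # bytes per parameter
    if stage == 0:      # plain data parallelism: everything replicated
        per_param = P + G + O
    elif stage == 1:    # partition optimizer states
        per_param = P + G + O / n_gpus
    elif stage == 2:    # ... plus gradients
        per_param = P + (G + O) / n_gpus
    elif stage == 3:    # ... plus parameters
        per_param = (P + G + O) / n_gpus
    else:
        raise ValueError("stage must be 0-3")
    return num_params * per_param

# Example: a 7.5B-parameter model on 64 data-parallel GPUs
for s in range(4):
    gb = zero_memory_per_gpu(7.5e9, 64, s) / 1e9
    print(f"ZeRO stage {s}: {gb:.1f} GB of model states per GPU")
```

Stage 0 works out to about 120 GB per GPU and stage 3 to under 2 GB, which is why full partitioning makes trillion-parameter-scale training plausible.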
8.Generating Training Data with Language Models: Towards Zero-Shot Language Understanding
Reading notes (repost).
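The core recipe: prompt a large LM to generate labeled examples for each class, then train a small task model on the synthetic data, so the downstream task needs zero human labels. A minimal sketch; `generate` is a hypothetical stand-in for whatever LM completion API you use, and the prompts are illustrative:

```python
# Sketch of zero-shot training-data generation with a language model.
def generate(prompt: str, n: int) -> list[str]:
    """Hypothetical stand-in: return n LM completions for the prompt."""
    raise NotImplementedError("call your LM of choice here")

# Label-conditioned prompts steer the LM toward one class at a time.
LABEL_PROMPTS = {
    "positive": "Write a glowing movie review:",
    "negative": "Write a scathing movie review:",
}

def build_synthetic_dataset(samples_per_label: int = 100):
    """Generate (text, label) pairs by conditioning the LM on each label;
    the pairs then train a small classifier with no human annotation."""
    dataset = []
    for label, prompt in LABEL_PROMPTS.items():
        for text in generate(prompt, n=samples_per_label):
            dataset.append((text, label))
    return dataset
```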
5.Decision Transformer: Reinforcement Learning via Sequence Modeling
Reading notes (repost).
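Decision Transformer recasts RL as sequence modeling: the model consumes interleaved (return-to-go, state, action) tokens and is trained to predict actions autoregressively; at test time it is conditioned on a desired return. A sketch of the returns-to-go computation and the sequence layout (token embedding details are simplified away):

```python
import numpy as np

def returns_to_go(rewards):
    """R_t = sum of rewards from step t through the end of the episode."""
    return np.cumsum(rewards[::-1])[::-1]

def build_sequence(rtg, states, actions):
    """Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...); the
    transformer is trained to predict each a_t from everything before it."""
    seq = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq

rewards = [0.0, 0.0, 1.0]        # toy 3-step episode
print(returns_to_go(rewards))     # [1. 1. 1.]
```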
3.GPipe: efficient training of giant neural networks using pipeline parallelism
[pipeline parallelism] Reading notes (repost).
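GPipe partitions the model into p sequential stages and splits each mini-batch into m micro-batches that stream through them, so under uniform stage times the forward pass takes m + p - 1 steps and the idle "bubble" fraction is (p - 1)/(m + p - 1). A sketch of that schedule and trade-off:

```python
import numpy as np

def split_microbatches(batch, m):
    """GPipe's first step: split a mini-batch into m micro-batches that
    are pushed through the pipeline stages one after another."""
    return np.array_split(batch, m)

def pipeline_schedule(p, m):
    """Forward-pass schedule under uniform stage times: stage s begins
    micro-batch i at time step s + i."""
    return {(s, i): s + i for s in range(p) for i in range(m)}

# With p=4 stages and m=8 micro-batches the forward pass takes
# p + m - 1 = 11 steps, of which p - 1 = 3 per stage sit idle.
sched = pipeline_schedule(4, 8)
print(max(sched.values()) + 1)    # 11 total steps
print((4 - 1) / (8 + 4 - 1))      # bubble fraction ~= 0.27
```

More micro-batches shrink the bubble, which is why GPipe pairs this schedule with activation re-materialization to keep per-micro-batch memory small.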
1.FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models
[distributed MoE model training] Reading notes (repost).
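In an MoE layer, a gating network routes each token to one (or a few) of many expert FFNs; FasterMoE targets the skewed all-to-all traffic and straggler experts this produces in distributed training. A NumPy sketch of the plain top-1 gating/dispatch step that the paper's optimizations (such as replicating overloaded experts) build on:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d, n_experts = 16, 8, 4

tokens = rng.normal(size=(n_tokens, d))
W_gate = rng.normal(size=(d, n_experts))

# Top-1 gating: each token goes to the expert with the highest score.
scores = tokens @ W_gate
choice = scores.argmax(axis=1)

# Count how many tokens each expert receives. In real training this load
# is often heavily skewed, which is exactly the imbalance FasterMoE
# attacks by replicating hot experts and overlapping the all-to-all
# communication with expert computation.
load = np.bincount(choice, minlength=n_experts)
print("tokens per expert:", load)
```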