(WIP) Network Paradigm Fundamentals and Comparison

Preamble

Networking is hard, but "AI" demands it

Networking is hard. I did rather poorly in this subject back in college, and in my defense, even at the time of writing there are very few network health providers capable of delivering thorough and accurate checkups. That says a lot about the complexity of these systems, when even the experts don't fully understand what they are working with.

Still, as someone interested in deep learning and its acceleration, it is impossible not to look deeper into data communication systems once we are working with >100B-parameter models. More specifically, for large computation workloads we pull the oldest trick in the CS book and cut them into parallel or pipelined pieces; for large DNNs the usual flavors are EP, DP, PP, and TP. See

Paradigms of Parallelism | Colossal-AI

and

Expert parallelism - Amazon SageMaker

for briefings. These DNN parallelization schemes generate different synchronization patterns and memory-access workloads. For example, EP and DP consist of largely independent parallel workloads and have perhaps the highest compute-to-communication ratio, so they need the least bandwidth and have the least stringent latency requirements. PP and TP, on the other hand, cut the model itself and require frequent, high-speed synchronization to avoid stalling the tensor/vector processors.

(An even better summary of common LLM parallelization schemes from NVIDIA: Parallelisms — NVIDIA NeMo Framework User Guide, latest documentation.)
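To make that contrast concrete, here is a minimal back-of-the-envelope sketch (not from the sources above; the model size, layer count, hidden size, sequence length, and the "2x payload per ring all-reduce" factor are all illustrative assumptions) comparing how much data DP and TP move per training step for a GPT-like transformer, and how often.

```python
# Back-of-the-envelope communication volume for DP vs TP.
# Every number below is an illustrative assumption, not a measurement.

BYTES = 2        # bf16 element size
params = 100e9   # ~100B-parameter model
layers = 96      # transformer layers
hidden = 12288   # hidden size
batch = 1        # micro-batch size per GPU
seq = 4096       # sequence length

# Data parallelism: one gradient all-reduce per optimizer step.
# A ring all-reduce moves roughly 2x the payload per rank.
dp_bytes_per_step = 2 * params * BYTES

# Tensor parallelism (Megatron-style): roughly two activation all-reduces
# per transformer layer in the forward pass and two more in the backward
# pass, each of size batch * seq * hidden -- and every one of them sits on
# the critical path of the layer it belongs to.
act_bytes = batch * seq * hidden * BYTES
tp_collectives = 4 * layers
tp_bytes_per_step = tp_collectives * 2 * act_bytes

# Rough training compute for one micro-batch: ~6 * params FLOPs per token.
flops = 6 * params * batch * seq

print(f"compute per micro-batch : {flops / 1e12:10.0f} TFLOPs")
print(f"DP traffic per step     : {dp_bytes_per_step / 1e9:10.1f} GB in 1 large collective")
print(f"TP traffic per step     : {tp_bytes_per_step / 1e9:10.1f} GB in {tp_collectives} small, blocking collectives")
```

The totals alone don't tell the whole story: the DP all-reduce happens once per step, can overlap with the backward pass, and is amortized over gradient-accumulation micro-batches, whereas the TP collectives fire hundreds of times per step and block each layer, which is why TP is the one that demands the fastest, lowest-latency links.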

Layered Communication and Main Schools

The different parallelism schemes, with their distinct memory-access workloads and synchronization requirements, are hard to serve with a single network design, so in practice we use a mixture of systems at different distribution scales.
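As an entirely illustrative sketch of this layering, the snippet below maps a flat set of ranks onto a (DP, PP, TP) mesh the way most frameworks conventionally do: the innermost, fastest-varying dimension (TP) stays inside a node on the fastest interconnect, while the outer dimensions (PP, DP) span nodes over the scale-out network. The mesh shape, GPUs per node, and interconnect labels are assumptions for illustration, not a description of any particular framework's API.

```python
# Illustrative rank -> (dp, pp, tp) mapping, mirroring the usual convention
# that the innermost (fastest-varying) dimension gets the fastest fabric.
# Mesh shape and interconnect names are assumptions, not from the sources above.

GPUS_PER_NODE = 8
DP, PP, TP = 4, 2, 8          # world size = 64 GPUs across 8 nodes

def rank_to_coords(rank):
    """Row-major layout: rank = (dp * PP + pp) * TP + tp."""
    tp = rank % TP
    pp = (rank // TP) % PP
    dp = rank // (TP * PP)
    return dp, pp, tp

def interconnect(rank_a, rank_b):
    """Which fabric a hop between two ranks would use in this layout."""
    same_node = rank_a // GPUS_PER_NODE == rank_b // GPUS_PER_NODE
    return "NVLink/NVSwitch (intra-node)" if same_node else "InfiniBand/RoCE (inter-node)"

if __name__ == "__main__":
    for rank in range(GPUS_PER_NODE):  # first node only, to keep output short
        dp, pp, tp = rank_to_coords(rank)
        # TP neighbours differ only in the innermost index -> same node.
        tp_peer = (dp * PP + pp) * TP + (tp + 1) % TP
        # DP neighbours differ in the outermost index -> a different node.
        dp_peer = (((dp + 1) % DP) * PP + pp) * TP + tp
        print(f"rank {rank:2d} -> dp={dp} pp={pp} tp={tp} | "
              f"TP hop: {interconnect(rank, tp_peer)} | "
              f"DP hop: {interconnect(rank, dp_peer)}")
```

Running it shows every TP hop landing on the intra-node fabric and every DP hop crossing nodes, which is exactly the "mixture of systems at different distribution scales" idea: the latency-critical collectives never leave the box, and only the infrequent, overlappable traffic touches the slower scale-out network.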
