Preamble
Networking is hard, but "AI" demands it
Networking is hard. I did rather poorly in this subject back in college, and in my defense, even at the time of writing there are very few network health providers capable of delivering thorough and accurate checkups, which says a lot about the complexity of these systems when even the experts don't fully understand what they are working with.
Still, as someone interested in deep learning and its acceleration, it is impossible not to look deeper into data communication systems when working with >100B-parameter models. For large computation workloads we pull the oldest trick in the CS book and cut them into parallel or pipelined pieces; for large DNNs the usual flavours are expert parallelism (EP), data parallelism (DP), pipeline parallelism (PP), and tensor parallelism (TP). See
Paradigms of Parallelism | Colossal-AI
and
Expert parallelism - Amazon SageMaker
for briefings. These DNN parallelisation schemes generate different synchronization patterns and memory access workloads. For example, EP and DP run largely independent parallel workloads and have perhaps the highest compute-to-communication ratio, so they need the least bandwidth and the least stringent latency; PP and TP, on the other hand, cut the model itself and require frequent, fast synchronization to avoid stalling the tensor/vector processors. A rough sketch of the communication primitive each one maps to follows the NeMo reference below.
(an even better summary of common parallelization strategies for LLM deployment, by NVIDIA: Parallelisms — NVIDIA NeMo Framework User Guide latest documentation)
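To make that mapping concrete, here is a minimal sketch (my own, not from the sources above) of the collective each scheme tends to reduce to, written with torch.distributed. The script name, tensor shapes, launch command, and the gloo backend are illustrative assumptions; a real GPU cluster would use NCCL.

```python
# Minimal sketch, assuming a torch.distributed job launched with e.g.
# `torchrun --nproc_per_node=4 parallelism_primitives.py`.
import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="gloo")   # swap for "nccl" on GPUs
    rank, world = dist.get_rank(), dist.get_world_size()

    # DP (and EP at the data level): each rank computes on its own shard, so the
    # only synchronization is an occasional all-reduce of gradients.
    # Infrequent, bandwidth-bound, latency-tolerant.
    grad = torch.ones(4) * rank
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    # TP: activations of a single layer are sharded across ranks, so a collective
    # sits on the critical path of every layer. Frequent and latency-bound.
    shard = torch.full((2,), float(rank))
    gathered = [torch.empty(2) for _ in range(world)]
    dist.all_gather(gathered, shard)

    # PP: stages pass activations point-to-point to their neighbour; a slow link
    # here stalls the whole pipeline (bubbles).
    act = torch.empty(4)
    if rank > 0:
        dist.recv(act, src=rank - 1)           # wait for the previous stage
    if rank + 1 < world:
        dist.send(torch.ones(4) * rank, dst=rank + 1)

    # EP routing (MoE) additionally scatters tokens to experts with all-to-all;
    # omitted here since all_to_all is typically only available on NCCL/MPI.

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The point of the toy is only to show why the traffic profiles differ: the DP all-reduce fires once per step, while the TP all-gather and PP send/recv sit inside every layer or stage boundary.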
Layered Communication and Main Schools
The different parallelisms, and the memory access workloads and synchronization requirements they create, are hard to serve with a single network design, so in practice we use a mixture of systems at different scales of distribution. A sketch of how that layering typically shows up in the process-group setup follows.
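As one concrete example of that layering in software (an assumed, Megatron-style layout, not taken from this post): frameworks carve the global job into process groups so that the chatty parallelism stays on the fast intra-node fabric (NVLink/NVSwitch) while the tolerant one crosses the inter-node network (InfiniBand/Ethernet). The eight-GPUs-per-node constant below is an assumption.

```python
# Sketch of a hierarchical process-group layout: TP groups stay inside a node,
# DP groups span nodes. Launch with torchrun so rank/world size are set.
import torch.distributed as dist

GPUS_PER_NODE = 8   # assumed node size, purely illustrative


def build_groups():
    dist.init_process_group(backend="gloo")   # "nccl" in practice
    rank, world = dist.get_rank(), dist.get_world_size()
    num_nodes = world // GPUS_PER_NODE

    # TP groups: consecutive ranks on one node -> traffic rides NVLink/NVSwitch.
    tp_group = None
    for node in range(num_nodes):
        ranks = list(range(node * GPUS_PER_NODE, (node + 1) * GPUS_PER_NODE))
        group = dist.new_group(ranks)          # every rank must join every call
        if rank in ranks:
            tp_group = group

    # DP groups: the same local GPU index across nodes -> traffic crosses the
    # slower scale-out network, which is fine because DP syncs rarely.
    dp_group = None
    for local in range(GPUS_PER_NODE):
        ranks = list(range(local, world, GPUS_PER_NODE))
        group = dist.new_group(ranks)
        if rank in ranks:
            dp_group = group

    return tp_group, dp_group
```

Gradient all-reduces would then be issued on dp_group and per-layer activation collectives on tp_group, so each class of traffic lands on the network tier that can actually carry it.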
