System Optimization
Average article quality score: 68
HPC_C
Articles in this column
DistServe: Disaggregating Prefill and Decode for Goodput-optimized Large Language Model Serving
This paper makes two main contributions: … So, the core logic of the algorithm: …
Original · 2025-12-27 18:09:56 · 134 reads · 0 comments
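The title's "goodput" can be made concrete with a small calculation. The sketch below assumes goodput means the highest request rate at which the share of requests meeting both a time-to-first-token (TTFT) target and a time-per-output-token (TPOT) target stays at or above a chosen threshold. The names `RequestLatency`, `slo_attainment`, `goodput`, the 90% threshold, and all latency numbers are hypothetical illustrations, not code or data from the paper.

```python
from dataclasses import dataclass


@dataclass
class RequestLatency:
    ttft: float  # time to first token in seconds (driven by the prefill phase)
    tpot: float  # mean time per output token in seconds (driven by decode)


def slo_attainment(latencies, ttft_slo, tpot_slo):
    """Fraction of requests that meet BOTH the TTFT and the TPOT target."""
    if not latencies:
        return 0.0
    ok = sum(1 for r in latencies if r.ttft <= ttft_slo and r.tpot <= tpot_slo)
    return ok / len(latencies)


def goodput(measurements, ttft_slo, tpot_slo, target=0.9):
    """Highest request rate (req/s) whose SLO attainment is >= target.

    `measurements` maps a tested request rate to the per-request latencies
    observed at that rate (hypothetical profiling data)."""
    best = 0.0
    for rate, lats in sorted(measurements.items()):
        if slo_attainment(lats, ttft_slo, tpot_slo) >= target:
            best = rate
    return best


# Hypothetical profile: at 4 req/s both latencies exceed their targets.
measurements = {
    1.0: [RequestLatency(0.2, 0.05)] * 10,
    2.0: [RequestLatency(0.3, 0.05)] * 9 + [RequestLatency(1.5, 0.05)],
    4.0: [RequestLatency(0.9, 0.20)] * 10,
}
print(goodput(measurements, ttft_slo=0.4, tpot_slo=0.1))  # 2.0
```

Because TTFT is dominated by prefill and TPOT by decode, running the two phases on separate GPU pools lets each pool be sized against its own target instead of the two phases interfering on shared hardware.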
TileLang: A Composable Tiled Programming Model for AI Systems
TileLang closely resembles TVM: it cleanly separates the scheduling space (thread binding, layout, tensorization, and pipelining) from the pure data-flow description, and it exposes the same knobs through the Python API shown in Fig. 1. As illustrated in Fig. …
Original · 2025-12-20 12:10:23 · 205 reads · 0 comments
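The separation between data flow and schedule can be illustrated without TileLang itself. In this plain-Python sketch (not TileLang's API; the function name and block-size parameters are hypothetical), the data flow `C[i][j] += A[i][k] * B[k][j]` never changes, while the tile sizes act as scheduling knobs that reshape the loop nest but not the result.

```python
def matmul_tiled(A, B, block_m=2, block_n=2, block_k=2):
    """C = A @ B on lists of lists; block_* are pure schedule knobs."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, block_m):              # tiles over rows of C
        for j0 in range(0, N, block_n):          # tiles over columns of C
            for k0 in range(0, K, block_k):      # tiles over the reduction axis
                for i in range(i0, min(i0 + block_m, M)):
                    for j in range(j0, min(j0 + block_n, N)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + block_k, K)):
                            acc += A[i][k] * B[k][j]   # the fixed data flow
                        C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(A, B))  # [[58.0, 64.0], [139.0, 154.0]]
print(matmul_tiled(A, B, block_m=1, block_n=2, block_k=3) == matmul_tiled(A, B))  # True
```

On real hardware the choice of tile sizes decides what fits in shared memory or registers, which is why a compiler wants to search over them separately from the computation.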
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
TVM performs both graph-level and operator-level optimizations. At the graph level, it applies operator fusion, constant folding, static memory pre-allocation, and data layout transformation passes. Here I want to focus on operator fusion: it splits the operators …
Original · 2025-12-14 23:40:51 · 695 reads · 0 comments
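To make operator fusion concrete, here is a toy fusion pass in plain Python, borrowing the "injective" and "opaque" operator categories from the TVM paper. It is a sketch under stated assumptions, not TVM's implementation: each op is a hypothetical `(name, kind, fn)` triple, every run of consecutive injective (elementwise) ops is merged into one stage, and each stage stands for one kernel that makes a full pass over memory.

```python
def fuse_injective(ops):
    """Merge runs of consecutive 'injective' (elementwise) ops into one op.

    `ops` is a list of (name, kind, fn) triples, where fn maps one element to
    one element. Ops of any other kind stay as separate stages."""
    fused, group = [], []

    def flush():
        if group:
            fns = tuple(fn for _, _, fn in group)

            def composed(x, fns=fns):
                for fn in fns:          # apply the whole chain per element
                    x = fn(x)
                return x

            fused.append(("+".join(name for name, _, _ in group), "injective", composed))
            group.clear()

    for op in ops:
        if op[1] == "injective":
            group.append(op)
        else:
            flush()
            fused.append(op)
    flush()
    return fused


def run(ops, xs):
    """Execute stage by stage; each stage is one full pass ("kernel") over memory."""
    for _, _, fn in ops:
        xs = [fn(x) for x in xs]
    return xs


graph = [
    ("mul2", "injective", lambda x: x * 2),
    ("add1", "injective", lambda x: x + 1),
    ("custom", "opaque", lambda x: -x),   # stand-in for an op that is never fused
    ("relu", "injective", lambda x: max(x, 0)),
]
plan = fuse_injective(graph)
print([name for name, _, _ in plan])      # ['mul2+add1', 'custom', 'relu']
print(run(plan, [-3, 1]), run(graph, [-3, 1]))  # [5, 0] [5, 0]
```

The fused plan needs three passes over the data instead of four and never materializes the intermediate result of `mul2`, which is where the memory-traffic savings of fusion come from.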
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
This paper proposes a very simple method for parallelizing large matrix computations. The following figure describes the core logic: A is split by columns, and B is split by rows. Each shard of XA in the MLP, and each group consisting of Q1, K1, V1, is placed …
Original · 2025-12-07 23:13:53 · 271 reads · 0 comments
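A small numerical check of the split described above, for the MLP block Z = GeLU(XA)B. This pure-Python sketch simulates the "GPUs" with a loop; the helper names are hypothetical, and it assumes the split dimension divides evenly by the number of partitions. Splitting A by columns lets each partition apply GeLU to its own slice of XA with no communication, and splitting B by rows means the partial products only need to be summed once at the end (the all-reduce).

```python
import math


def matmul(X, W):
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]


def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def split_cols(M, parts):
    n = len(M[0]) // parts          # assumes the width divides evenly
    return [[row[p * n:(p + 1) * n] for row in M] for p in range(parts)]


def split_rows(M, parts):
    n = len(M) // parts             # assumes the height divides evenly
    return [M[p * n:(p + 1) * n] for p in range(parts)]


def mlp_serial(X, A, B):
    Y = [[gelu(v) for v in row] for row in matmul(X, A)]
    return matmul(Y, B)


def mlp_tensor_parallel(X, A, B, parts=2):
    partials = []
    for A_i, B_i in zip(split_cols(A, parts), split_rows(B, parts)):
        # Each "GPU" computes its columns of XA and applies GeLU locally.
        Y_i = [[gelu(v) for v in row] for row in matmul(X, A_i)]
        partials.append(matmul(Y_i, B_i))
    # All-reduce: element-wise sum of the partial outputs.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


X = [[1.0, -2.0]]
A = [[0.5, -1.0, 2.0, 0.1], [1.5, 0.3, -0.7, 1.0]]
B = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [0.0, 1.0, 1.0], [-1.0, 0.5, 0.3]]
serial = mlp_serial(X, A, B)
parallel = mlp_tensor_parallel(X, A, B, parts=2)
print(all(math.isclose(s, p, abs_tol=1e-12) for s, p in zip(serial[0], parallel[0])))  # True
```

The column split is what makes the nonlinearity safe: had A been split by rows, each partition would hold only a partial sum of XA, and GeLU of a partial sum is not the GeLU of the full value.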
Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations
Triton is composed of three parts: Triton-C, Triton-IR, and the Triton JIT. 1.1. get_global_range(axis) is associated with the kernel, as shown in the figure above. 1.2. Broadcast. The structure of Triton-IR is similar to that of MLIR: both include modules, func …
Original · 2025-11-22 17:20:32 · 423 reads · 0 comments
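The tile-level programming model can be emulated in plain Python to show what a per-axis global range and broadcasting buy. In this sketch, `global_range` is a hypothetical stand-in for the kind of index range the excerpt attributes to `get_global_range(axis)`, not Triton's actual API, and the sequential loop over `pid` stands in for program instances that a GPU would run in parallel.

```python
def global_range(pid, tile):
    """Hypothetical helper: the global indices owned by program `pid` on one axis."""
    return [pid * tile + i for i in range(tile)]


def broadcast_offsets(rows, cols, ld):
    """Broadcast a row range (as a column) against a column range (as a row):
    the 2-D tile of flat offsets r * ld + c, with `ld` the row stride."""
    return [[r * ld + c for c in cols] for r in rows]


def add_kernel(pid, x, y, out, tile):
    """Body of one program instance: it works on a whole tile, not one element."""
    for i in global_range(pid, tile):
        if i < len(out):                  # mask off the ragged last tile
            out[i] = x[i] + y[i]


def launch_add(x, y, tile=4):
    out = [0] * len(x)
    grid = (len(x) + tile - 1) // tile    # ceil-div: one program per tile
    for pid in range(grid):               # sequential here; parallel on a GPU
        add_kernel(pid, x, y, out, tile)
    return out


print(launch_add(list(range(10)), [10] * 10))  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(broadcast_offsets(global_range(1, 2), global_range(0, 3), ld=8))  # [[16, 17, 18], [24, 25, 26]]
```

Writing kernels against whole tiles rather than single threads is what leaves the compiler free to choose the thread mapping, memory coalescing, and shared-memory staging for each tile.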
MLIR: A Compiler Infrastructure for the End of Moore's Law
[Code] MLIR: A Compiler Infrastructure for the End of Moore's Law.
Original · 2025-11-16 21:35:22 · 341 reads · 0 comments
SGLang: Efficient Execution of Structured Language Model Programs
I think SGLang has three advantages: it allows programming directly in Python, it supports RadixAttention for efficient KVCache reuse, and it uses a compressed finite state machine to accelerate structured output. Reusing the KVCache across requests with the same prompts. …
Original · 2025-11-08 14:51:06 · 348 reads · 0 comments
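The core idea behind RadixAttention's KVCache reuse is a prefix lookup over token sequences. This sketch uses a plain per-token trie instead of SGLang's compressed radix tree and omits its LRU eviction. The class names, the token ids, and the string placeholders for KV entries are all hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}   # token id -> Node
        self.kv = None       # placeholder for the cached KV entry of this token


class PrefixCache:
    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV."""
        node, n = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None:
                break
            node, n = nxt, n + 1
        return n

    def insert(self, tokens, kv_entries):
        """Record the KV entries produced while prefilling `tokens`."""
        node = self.root
        for t, kv in zip(tokens, kv_entries):
            node = node.children.setdefault(t, Node())
            node.kv = kv


cache = PrefixCache()
system_prompt = [101, 7, 7, 42]                     # hypothetical token ids
cache.insert(system_prompt + [5, 6], [f"kv{i}" for i in range(6)])
hit = cache.match_prefix(system_prompt + [9, 9])
print(hit)  # 4: only the last 2 tokens still need prefill
```

A second request that shares the system prompt skips prefill for the matched prefix; a radix tree stores the same information with fewer nodes by collapsing single-child chains into one edge.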
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Kimi is the large language model tool I use most often. It is faster than other tools such as DeepSeek, so I decided to read this paper to figure out the magic behind its chat interface. As a MaaS provider, Kimi faces a lot of constraints, limited re …
Original · 2025-09-01 00:12:16 · 636 reads · 0 comments
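Understanding Object Pools in One Article
Building an object pool avoids frequent construction and destruction of objects and also prevents memory fragmentation.
Original · 2024-02-05 18:16:30 · 484 reads · 1 comment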
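A minimal sketch of the pattern, written in Python for illustration. The pool constructs its objects up front, `acquire` hands out a free one, and `release` resets an object's state and returns it instead of destroying it. The class name and the `factory`/`reset` parameters are hypothetical. In C++ the same structure would additionally let you keep the objects in one contiguous allocation, which is where the fragmentation benefit comes from; Python's allocator does not expose that level of control.

```python
class ObjectPool:
    def __init__(self, factory, reset, size):
        self._factory = factory
        self._reset = reset
        self._free = [factory() for _ in range(size)]   # construct everything up front
        self.created = size                              # total objects ever constructed

    def acquire(self):
        if self._free:
            return self._free.pop()       # reuse: no new construction
        self.created += 1                 # pool exhausted: grow on demand
        return self._factory()

    def release(self, obj):
        self._reset(obj)                  # clear state instead of destroying
        self._free.append(obj)


pool = ObjectPool(factory=list, reset=list.clear, size=2)
buf = pool.acquire()
buf.append(1)
pool.release(buf)
again = pool.acquire()
print(again is buf, again)  # True []
```

Whether an exhausted pool should grow, block, or fail is a policy choice; growing on demand is the simplest and keeps callers unaware of the pool's size.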
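Expression Templates in Eigen Explained
Using Eigen as a reference, I implemented an expression template of my own. It currently supports addition and subtraction and leaves interfaces for other operations. The main optimization techniques are: 1. header-only: everything lives in header files, so it is easy to use; 2. elements are stored on the stack; 3. expression templates enable lazy computation, reducing memory accesses and the creation of temporaries; 4. assignment loops are unrolled.
Original · 2024-03-05 16:35:33 · 1220 reads · 1 comment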
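Python has no templates, so the following is only an analogue of the runtime behaviour that expression templates produce, not of the C++ compile-time mechanism. Here `a + b - c` builds a small expression tree instead of computing temporary vectors, and the whole expression is evaluated element by element in a single loop when it is assigned. The class names `Expr`, `BinOp`, `Vec` and the `assign` method are hypothetical; the stack storage and loop unrolling listed above have no Python counterpart.

```python
class Expr:
    """Base class: arithmetic builds a tree node instead of computing a result."""
    def __add__(self, other):
        return BinOp(self, other, lambda a, b: a + b)

    def __sub__(self, other):
        return BinOp(self, other, lambda a, b: a - b)


class BinOp(Expr):
    def __init__(self, lhs, rhs, op):
        if len(lhs) != len(rhs):
            raise ValueError("size mismatch")
        self.lhs, self.rhs, self.op = lhs, rhs, op

    def __len__(self):
        return len(self.lhs)

    def __getitem__(self, i):
        # Computed on demand: no temporary vector for lhs or rhs.
        return self.op(self.lhs[i], self.rhs[i])


class Vec(Expr):
    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

    def assign(self, expr):
        """The single loop in which the whole expression is evaluated."""
        if len(expr) != len(self.data):
            raise ValueError("size mismatch")
        for i in range(len(self.data)):
            self.data[i] = expr[i]
        return self


a, b, c = Vec([1, 2, 3]), Vec([10, 20, 30]), Vec([5, 5, 5])
expr = a + b - c          # builds BinOp(BinOp(a, b), c); nothing computed yet
out = Vec([0, 0, 0]).assign(expr)
print(type(expr).__name__, out.data)  # BinOp [6, 17, 28]
```

In C++ the tree's shape is encoded in the type, so the compiler can inline the whole expression into one loop; here the same single-pass evaluation happens through dynamic dispatch.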