Algorithms
HPC_C
Articles in this column
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
This paper makes two main contributions, which together form the core logic of the algorithm: disaggregating the prefill and decoding phases of LLM serving so that each can be scheduled for goodput. Original 2025-12-27 18:09:56 · 134 views · 0 comments
SGLang: Efficient Execution of Structured Language Model Programs
I think SGLang has three advantages: it allows direct programming in Python, it supports RadixAttention for efficient KV cache reuse, and it uses a compressed finite state machine to accelerate structured output. RadixAttention reuses the KV cache across requests that share the same prompt prefix. Original 2025-11-08 14:51:06 · 348 views · 0 comments
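A minimal sketch of the prefix-reuse idea behind RadixAttention, not SGLang's actual data structure or API: a toy trie keyed on token IDs, where each node stands in for a cached KV block.

```python
# Toy illustration of prefix-based KV cache reuse (the idea behind
# RadixAttention); names and structure are illustrative, not SGLang's.

class TrieNode:
    def __init__(self):
        self.children = {}    # token id -> TrieNode
        self.kv_block = None  # stand-in for a cached KV tensor block

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens, kv_blocks):
        """Store one KV block per token along the trie path."""
        node = self.root
        for tok, kv in zip(tokens, kv_blocks):
            node = node.children.setdefault(tok, TrieNode())
            node.kv_block = kv

    def longest_prefix(self, tokens):
        """Return cached KV blocks for the longest shared prefix."""
        node, reused = self.root, []
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            reused.append(node.kv_block)
        return reused  # prefill only needs to run on tokens[len(reused):]

cache = PrefixCache()
cache.insert([1, 2, 3], ["kv1", "kv2", "kv3"])  # first request fills the cache
hit = cache.longest_prefix([1, 2, 4])           # second request shares [1, 2]
print(len(hit))  # 2 -> only the last token needs fresh prefill
```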
Why do LLMs possess the capability for chain of thought?
I remember when I started using LLM tools such as ChatGPT or Kimi, they were terrific at handling complex reasoning jobs: math, commonsense questions, and other tasks. The most important factor behind this capability is the chain of thought. Original 2025-08-17 16:48:30 · 131 views · 0 comments
What is model distillation?
Model distillation is a method for compressing a model for convenient deployment: a small student model is trained so that accuracy does not significantly decline. The principle behind this method is the use of "soft targets", which differ from hard one-hot labels. Original 2025-08-12 23:16:10 · 428 views · 0 comments
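A minimal PyTorch sketch of the soft-target idea, assuming the standard Hinton-style distillation loss; the temperature `T` and mixing weight `alpha` are illustrative choices, not values from the post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD: mix hard-label cross-entropy with KL divergence
    to the teacher's temperature-softened distribution (the soft targets)."""
    # Soft targets: teacher probabilities at temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales gradients back to the hard-label scale.
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage with random logits for a 10-class problem.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```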
GQA: Grouped-Query Attention
The principle is simple, as the figure in the post shows. To convert MHA to GQA, we mean-pool the key and value heads within each group. GQA achieves higher quality than MQA while remaining faster than MHA. Original 2025-08-09 15:40:09 · 321 views · 0 comments
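A minimal PyTorch sketch of that mean-pool conversion, as described in the GQA uptraining recipe; the shapes and function name here are my own illustration, not the post's code.

```python
import torch

def mha_to_gqa_kv(kv_heads, num_groups):
    """Convert MHA key (or value) head projections to GQA by mean-pooling
    the heads inside each group.

    kv_heads: (num_heads, head_dim, hidden) weight slices from MHA.
    Returns:  (num_groups, head_dim, hidden) grouped weights.
    """
    num_heads = kv_heads.shape[0]
    assert num_heads % num_groups == 0
    heads_per_group = num_heads // num_groups
    grouped = kv_heads.reshape(num_groups, heads_per_group, *kv_heads.shape[1:])
    return grouped.mean(dim=1)  # one shared K/V head per group

# Toy usage: 8 MHA heads pooled into 2 GQA groups.
k_proj = torch.randn(8, 64, 512)
print(mha_to_gqa_kv(k_proj, num_groups=2).shape)  # torch.Size([2, 64, 512])
```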
The core logic of Rotary Position Embedding
Original 2025-07-20 11:51:00 · 340 views · 0 comments
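Since the excerpt's formula did not survive extraction, here is a minimal NumPy sketch of the core RoPE operation as commonly formulated (rotate consecutive dimension pairs by position-dependent angles); the variable names and the `base=10000` default are standard conventions, not taken from the post.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by position-dependent angles,
    the core operation of Rotary Position Embedding.

    x:   (d,) query or key vector at one position (d must be even).
    pos: integer token position.
    """
    d = x.shape[0]
    # theta_i = base^(-2i/d), one angle per 2-D dimension pair.
    theta = base ** (-np.arange(0, d, 2) / d)
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin  # 2-D rotation applied per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

# Key property: rotated dot products depend only on relative position.
q, k = np.random.randn(8), np.random.randn(8)
a = rope(q, 5) @ rope(k, 3)
b = rope(q, 7) @ rope(k, 5)   # same offset of 2
print(np.isclose(a, b))       # True
```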
Why is the rotation matrix like this?
[cosθ, -sinθ; sinθ, cosθ]. Let us demonstrate this using the "complex-number shortcut". 1. We denote the 2-D vector [x, y] as the complex number z = x + iy. 2. To rotate by an angle θ, in complex numbers we multiply by e^(iθ) = cos(θ) + isin(θ). 3. So we expand e^(iθ)·z = (x·cosθ - y·sinθ) + i·(x·sinθ + y·cosθ), whose real and imaginary parts are exactly the two rows of the matrix above. Original 2025-07-16 23:00:08 · 168 views · 0 comments
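A quick numeric check of that equivalence (my own sketch, not from the post): rotating with the matrix and multiplying by e^(iθ) give the same point.

```python
import numpy as np

# Check: 2-D rotation matrix == multiplication by e^{i*theta}.
theta = 0.7
x, y = 1.0, 2.0

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
via_matrix = R @ np.array([x, y])

z = complex(x, y) * np.exp(1j * theta)       # e^{i*theta} * (x + iy)
via_complex = np.array([z.real, z.imag])

print(np.allclose(via_matrix, via_complex))  # True
```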
A Deep Analysis of the MLA Algorithm
(Because the formulas are difficult to type, I wrote them out by hand.) Original 2025-05-30 22:55:45 · 240 views · 0 comments
Introduction to BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model proposed by the Google AI research team in October 2018, introduced in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Its overall structure consists of the Embedding layer, the Transformer Encoder stack, and the model output. Original 2024-06-10 17:37:16 · 983 views · 0 comments
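A minimal sketch of inspecting those three parts with the Hugging Face transformers library, assuming the standard bert-base-uncased checkpoint is available; this is illustrative usage, not code from the post.

```python
# Requires: pip install transformers torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# The three parts the post lists map onto the model's submodules:
print(type(model.embeddings).__name__)  # BertEmbeddings  (Embedding layer)
print(type(model.encoder).__name__)     # BertEncoder     (Transformer Encoder)
print(type(model.pooler).__name__)      # BertPooler      (model output head)

inputs = tokenizer("hello world", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```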