TokenSwift Open-Source Project Best-Practices Tutorial

TokenSwift (From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation). Project repository: https://gitcode.com/gh_mirrors/to/TokenSwift

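1. Project Overview

TokenSwift is an open-source framework for accelerating ultra-long sequence generation. It can generate sequences of up to 100K tokens while preserving the output quality of the target model. Because the acceleration is lossless, generation time drops from hours to minutes without degrading model quality.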

2. Quick Start

Environment Requirements

Python 3.11
NVIDIA CUDA Toolkit

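Installation Steps

Clone the project code and enter the repository directory (the later commands reference requirements.txt and main.py by relative path):

git clone https://github.com/bigai-nlco/TokenSwift.git
cd TokenSwift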

Create and activate a virtual environment:

conda create -n tokenswift python=3.11
conda activate tokenswift

Install the dependencies:

conda install nvidia::cuda-nvcc
pip install -r requirements.txt
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
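
The flash-attention wheel filename in the last command encodes the environment it was built for, so a mismatched interpreter or PyTorch build is a common reason for this step to fail. The sketch below splits that filename into its parts so they can be compared with the local setup. The helper names parse_flash_attn_wheel and interpreter_tag are hypothetical and are not part of TokenSwift or flash-attention. The reading of the local version segment (cu12 as CUDA 12, torch2.4 as PyTorch 2.4, cxx11abiFALSE as the C++ ABI setting) is an assumption based on the naming used in that release.

```python
import re
import sys

# Filename taken from the pip install command in the tutorial.
WHEEL = "flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"


def parse_flash_attn_wheel(filename):
    """Split a flash-attn wheel filename into its build tags.

    Assumes the name has no optional build-number field, i.e. exactly
    name-version-python-abi-platform.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")

    # The part after '+' is the local version label chosen by the publisher.
    public_version, _, local = version.partition("+")
    match = re.fullmatch(r"cu(\d+)torch(\d+(?:\.\d+)*)cxx11abi(TRUE|FALSE)", local)
    if match is None:
        raise ValueError("unexpected local version label: " + local)
    cuda, torch_version, cxx11abi = match.groups()

    return {
        "name": name,
        "version": public_version,
        "cuda": cuda,
        "torch": torch_version,
        "cxx11abi": cxx11abi == "TRUE",
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag(version_info=sys.version_info):
    """Return the CPython tag of an interpreter version, e.g. 'cp311'."""
    return f"cp{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    info = parse_flash_attn_wheel(WHEEL)
    print(info)
    if info["python_tag"] != interpreter_tag():
        print("Warning: wheel targets", info["python_tag"],
              "but this interpreter is", interpreter_tag())
```

Run inside the activated tokenswift environment, the script should print no warning, because the environment was created with Python 3.11 and the wheel is tagged cp311. If a different Python or PyTorch version is installed, pick the matching wheel from the same release page rather than this exact URL.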

Running the Example

Assuming you already have a pretrained model and its weight files, the following command runs a simple example:

torchrun --master-port 1111 --nproc_per_node=1 main.py \
--model_type llama3_1 \
--ckpt_path your_checkpoint_path \
--prefill_len 4096 \
--retrival_max_budget 4096 \
--gen_len 102400 \
--gamma 4 \
--min_p 0.1 \
--temperature 1.0 \
--tree_decoding \
--ngram_topk 20 \
--penalty 1.2 \
--penalty_length 1024 \
--prompt_id 0

Be sure to replace your_checkpoint_path with the path to your weight files.
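
When the same command is launched repeatedly with different settings, it can help to assemble it programmatically. The following is a minimal sketch, not part of TokenSwift: build_command, DEFAULT_ARGS, and PLACEHOLDER are hypothetical names, and the flag names and values are copied verbatim from the command above (including the spelling retrival_max_budget). The function only builds the argument list and refuses to proceed while the checkpoint placeholder is still in place. It does not interpret what each flag means.

```python
import shlex

PLACEHOLDER = "your_checkpoint_path"

# Flag values copied from the tutorial command; True marks a value-less switch.
DEFAULT_ARGS = {
    "model_type": "llama3_1",
    "prefill_len": 4096,
    "retrival_max_budget": 4096,  # spelled as in the tutorial command
    "gen_len": 100 * 1024,        # 102400, the value used in the tutorial
    "gamma": 4,
    "min_p": 0.1,
    "temperature": 1.0,
    "tree_decoding": True,
    "ngram_topk": 20,
    "penalty": 1.2,
    "penalty_length": 1024,
    "prompt_id": 0,
}


def build_command(ckpt_path, master_port=1111, nproc_per_node=1, **overrides):
    """Return the torchrun invocation as a list of arguments."""
    if not ckpt_path or ckpt_path == PLACEHOLDER:
        raise ValueError("set ckpt_path to the real checkpoint location")

    args = {**DEFAULT_ARGS, **overrides}
    cmd = [
        "torchrun",
        "--master-port", str(master_port),
        f"--nproc_per_node={nproc_per_node}",
        "main.py",
        "--ckpt_path", ckpt_path,
    ]
    for name, value in args.items():
        if value is True:
            cmd.append(f"--{name}")          # switch without a value
        elif value is False or value is None:
            continue                          # switch turned off
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


if __name__ == "__main__":
    # "/path/to/checkpoint" is an illustrative path, not a real location.
    print(shlex.join(build_command("/path/to/checkpoint")))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root. Keyword overrides such as build_command(path, gen_len=20480) change a single value while leaving the other tutorial settings untouched.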