【大模型】 Qwen2 来了，27 种语言支持，上下文长度支持最高达到 128K tokens

原创已于 2024-06-07 19:25:01 修改 · 6.9k 阅读

9 ·

CC 4.0 BY-SA版权

文章标签：

#深度学习 #qwen2 #大模型

于 2024-06-07 09:54:26 首次发布

大语言模型专栏收录该内容

43 篇文章

订阅专栏

该文章已生成可运行项目，

【大模型】 Qwen2 来了，27 种语言支持，上下文长度支持最高达到 128K tokens

Qwen2 模型介绍
模型基础信息
支持语言
模型测评
运行模型
模型部署
下载
开源协议
参考

Qwen2 模型介绍

5个尺寸的预训练和指令微调模型, 包括 Qwen2-0.5B、Qwen2-1.5B、Qwen2-7B、Qwen2-57B-A14B 以及 Qwen2-72B；
在中文英语的基础上，训练数据中增加了27种语言相关的高质量数据；
多个评测基准上的领先表现；
代码和数学能力显著提升；
增大了上下文长度支持，最高达到128K tokens（Qwen2-72B-Instruct）。

模型基础信息

在这里插入图片描述

支持语言

在这里插入图片描述

模型测评

在这里插入图片描述

对比评测

在这里插入图片描述

运行模型

使用 transformers 后端进行推理:

要求: transformers>=4.37

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

模型部署

安装 vllm

pip install "vllm>=0.4.3"

修改：config.json

    {
        "architectures": [
            "Qwen2ForCausalLM"
        ],
        // ...
        "vocab_size": 152064,

        // adding the following snippets
        "rope_scaling": {
            "factor": 4.0,
            "original_max_position_embeddings": 32768,
            "type": "yarn"
        }
    }

启动服务：

python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-7B-Instruct --model path/to/weights

测试

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen2-7B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your Long Input Here."}
    ]
    }'