Large Model Source Code Walkthrough | Reading the Qwen2 Source: Environment Setup and Notes

Original article · last modified 2024-10-26 19:25:55


Licensed under CC 4.0 BY-SA


First published 2024-10-25 10:03:26


The source code discussed below comes from the transformers repository: transformers-4.45.2/src/transformers/models/qwen2/modeling_qwen2.py.

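1. Experiment Setup

First, download the configuration files that Qwen2 needs. They are listed at: https://hf-mirror.com/Qwen/Qwen2-0.5B/tree/main

# Download the configuration and tokenizer files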
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/config.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/generation_config.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/merges.txt
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/tokenizer.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/tokenizer_config.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/vocab.json

Note: the weight files do not need to be downloaded. We are only tracing the execution flow here, so randomly initialized weights are enough.
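If wget is not available, the same files can be fetched from Python. The following is a minimal sketch using only the standard library; the helper names build_urls and download_all are hypothetical and chosen for illustration, and the target directory name "config" is an assumption that matches the layout used later in this article.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names are the ones used by the wget commands above
BASE_URL = "https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main"
CONFIG_FILES = [
    "config.json",
    "generation_config.json",
    "merges.txt",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
]


def build_urls(base_url=BASE_URL, files=CONFIG_FILES):
    """Return (filename, url) pairs for every configuration file."""
    return [(name, f"{base_url}/{name}") for name in files]


def download_all(target_dir="config"):
    """Download each file into target_dir (this performs network requests)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, url in build_urls():
        urlretrieve(url, str(target / name))
```

Calling download_all() performs the actual downloads; build_urls() alone only constructs the addresses and touches no network.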

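Next, download the transformers source code. This walkthrough uses version 4.45.2; later versions should in principle work as well, although class names and printed output may differ.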

Modifying config.json

Original file contents:

{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 131072,
  "max_window_layers": 24,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Here we change num_hidden_layers to 4 and set use_cache to false. This is only an experiment, so we do not need all 24 layers. Everything else stays unchanged.
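The edit can be made by hand, but it can also be scripted. The sketch below assumes the file is plain JSON like the one shown above; the function name shrink_config is hypothetical.

```python
import json
from pathlib import Path


def shrink_config(path, num_layers=4):
    """Rewrite config.json in place with fewer layers and use_cache disabled."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["num_hidden_layers"] = num_layers  # 24 in the original file
    config["use_cache"] = False               # true in the original file
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A typical call would be shrink_config("config/config.json"), where the path is an assumption about where the file was saved.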

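File structure

Inside the examples directory of the transformers repository, create a Qwen2_learn directory, then create a config folder under Qwen2_learn and copy the downloaded configuration files into it. The resulting structure under Qwen2_learn looks like this: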

├── __init__.py
├── config
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
└── main.py

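Main code

Here is the main code. The imports assume the transformers repository root is on the Python path, and the relative ./config path assumes the script runs with Qwen2_learn as the working directory: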

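# Import directly from the source tree so that breakpoints land in the repository code
from src.transformers.models.qwen2.configuration_qwen2 import Qwen2Config
from src.transformers.models.qwen2.tokenization_qwen2 import Qwen2Tokenizer
from src.transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM

# Load the config and tokenizer from the local ./config directory
config = Qwen2Config.from_pretrained("./config")
tokenizer = Qwen2Tokenizer.from_pretrained("./config")
# Build the model from the config only, so the weights are randomly initialized
model = Qwen2ForCausalLM(config=config)
print("Model details: ")
print(model)
print("*="*80)
print("Text encoding:")
# Truncate to at most 10 tokens, pad to the longest sequence in the batch,
# and return PyTorch tensors
inputs = tokenizer(["你好啊", "简单的机器学习是为了让机器学习变得更简单而存在的"],
                add_special_tokens=True,
                max_length=10,
                padding=True,
                truncation=True,
                return_tensors="pt")
print(inputs)
print("*="*80)
print("Model output:")
# Forward pass; inputs supplies input_ids and attention_mask
print(model(**inputs))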

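If nothing goes wrong, you should see output like the following (the logits values will differ from run to run, because the weights are randomly initialized):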

Model details: 
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896)
    (layers): ModuleList(
      (0-3): 4 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): Qwen2RotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Text encoding:
{'input_ids': tensor([[108386, 103924, 151643, 151643, 151643, 151643, 151643, 151643, 151643,
         151643],
        [105172, 102182, 100134, 104802,  99258, 102182, 100134, 112606, 100405,
          68536]]), 'attention_mask': tensor([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
Model output:
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
CausalLMOutputWithPast(loss=None, logits=tensor([[[ 1.5059,  0.6765,  0.2425,  ...,  0.4329, -0.0789, -1.0450],
         [ 0.3321,  0.8809,  0.6826,  ...,  0.0330,  0.0865, -0.6893],
         [ 0.2471,  0.7115,  0.5307,  ..., -0.0703,  0.1209, -0.7370],
         ...,
         [ 0.3910,  0.7432,  0.3905,  ...,  0.0459,  0.2107, -0.6613],
         [ 0.3790,  0.7864,  0.4028,  ...,  0.0793,  0.2166, -0.6966],
         [ 0.3704,  0.8088,  0.4358,  ...,  0.0567,  0.2196, -0.7045]],

        [[ 1.4859, -0.7797,  0.9490,  ..., -0.0205, -0.2090, -0.7289],
         [ 1.5965, -0.2371,  0.7803,  ..., -0.8275, -0.1699, -0.0016],
         [ 1.2100, -0.2230,  0.8658,  ..., -0.0166, -0.0579, -0.5130],
         ...,
         [ 0.5131, -0.2756,  0.8019,  ..., -0.0026,  0.3006, -1.2691],
         [ 0.2210, -0.0853,  0.9619,  ..., -0.1808,  0.5546, -1.0678],
         [ 0.4743,  0.1699,  0.6723,  ..., -0.0445,  0.4406, -0.9143]]],
       grad_fn=<UnsafeViewBackward0>), past_key_values=None, hidden_states=None, attentions=None)
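Several numbers in the printed model follow directly from config.json, in particular why k_proj and v_proj have out_features=128 while q_proj has 896. The short calculation below is a sketch that assumes the usual grouped-query attention layout, where each head has hidden_size / num_attention_heads dimensions and keys and values use only num_key_value_heads heads.

```python
# Values copied from the config.json shown earlier in this article
hidden_size = 896
num_attention_heads = 14
num_key_value_heads = 2
vocab_size = 151936

head_dim = hidden_size // num_attention_heads                     # size of one head
q_out = num_attention_heads * head_dim                            # q_proj out_features
kv_out = num_key_value_heads * head_dim                           # k_proj / v_proj out_features
queries_per_kv_head = num_attention_heads // num_key_value_heads  # query heads sharing one KV head

# The batch has 2 sentences padded or truncated to 10 tokens each
logits_shape = (2, 10, vocab_size)

print(head_dim, q_out, kv_out, queries_per_kv_head)  # 64 896 128 7
print(logits_shape)                                  # (2, 10, 151936)
```

The results match the Linear layers in the printout: 896 output features for q_proj and 128 for k_proj and v_proj.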

With the above in place, the basic pipeline is set up, and we can step through it in whatever IDE we prefer. I use PyCharm here.

2. Qwen2Model

The bulk of Qwen2ForCausalLM is Qwen2Model, so we focus on the inputs and outputs of Qwen2Model.

Inputs

The forward method of Qwen2Model mainly takes the following parameters:

input_ids: torch.LongTensor = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,

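input_ids has shape [bs, seq_len], a two-dimensional matrix of batch size by sequence length. Each element is the index of a token in the vocabulary.
attention_mask has the same shape as input_ids, [bs, seq_len]. Each element is either 1 or 0: 1 means the token at that position in input_ids is valid, 0 means it is not (for example, padding). When attention is computed later, positions marked 0 are masked out, so only positions marked 1 contribute.
position_ids also has shape [bs, seq_len] and gives the position index of each token in its sequence.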
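To make the role of attention_mask concrete, here is a minimal torch-free sketch of how one row of a 0/1 padding mask can be combined with a causal mask into an additive mask. The function name build_additive_mask is hypothetical, and -inf is used for readability; as far as I know the library fills masked positions with the most negative finite value of the dtype instead.

```python
NEG_INF = float("-inf")


def build_additive_mask(attention_mask_row):
    """Combine a causal mask with one row of a 0/1 padding mask.

    Returns a seq_len x seq_len matrix that is added to the attention scores:
    0.0 where query i may attend to key j, -inf where key j lies in the
    future (j > i) or is a padding token.
    """
    seq_len = len(attention_mask_row)
    return [
        [0.0 if (j <= i and attention_mask_row[j] == 1) else NEG_INF
         for j in range(seq_len)]
        for i in range(seq_len)
    ]


# First sentence of the batch, shortened to 4 positions: 2 real tokens, 2 padding
mask = build_additive_mask([1, 1, 0, 0])
for row in mask:
    print(row)
```

After the softmax, positions holding -inf receive zero weight, which is what "only positions marked 1 contribute" means in practice.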
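past_key_values: pre-computed hidden states (the keys and values in the self-attention blocks and in the cross-attention blocks; Qwen2 is decoder-only, so only self-attention applies here) that can be used to speed up sequential decoding. This is typically the past_key_values returned by the model at a previous decoding step, when use_cache=True or config.use_cache=True.

Two formats are allowed:

A ~cache_utils.Cache instance; see the kv cache guide in the transformers documentation.
A tuple of length config.n_layers, where each element is a tuple of two torch.FloatTensor tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head). This is also known as the legacy cache format.

The model outputs the same cache format it was given. If no past_key_values is passed, the legacy cache format is returned.

If past_key_values is used, the user can optionally pass only the last input_ids (those whose key/value states have not yet been given to the model), of shape (batch_size, 1), instead of all input_ids of shape (batch_size, sequence_length).

Note: this parameter is generally used at inference time, not during training.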
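The legacy cache format is easier to picture with concrete numbers. The sketch below only computes shapes and sizes and stores no tensors. It assumes the 4-layer config from this article, the batch of 2 sentences with 10 tokens each, and, as I read the Qwen2 attention code, that keys and values are cached per key/value head (num_key_value_heads = 2, head size 64). The function names are hypothetical.

```python
def legacy_cache_shapes(num_layers, batch_size, num_kv_heads, seq_len, head_dim):
    """Describe a legacy cache: one (key_shape, value_shape) pair per layer."""
    shape = (batch_size, num_kv_heads, seq_len, head_dim)
    return tuple((shape, shape) for _ in range(num_layers))


def cache_bytes(shapes, bytes_per_element=2):
    """Total cache size in bytes; 2 bytes per element corresponds to bfloat16."""
    total_elements = 0
    for key_shape, value_shape in shapes:
        for shape in (key_shape, value_shape):
            count = 1
            for dim in shape:
                count *= dim
            total_elements += count
    return total_elements * bytes_per_element


shapes = legacy_cache_shapes(num_layers=4, batch_size=2, num_kv_heads=2,
                             seq_len=10, head_dim=64)
print(len(shapes), shapes[0][0])  # 4 (2, 2, 10, 64)
print(cache_bytes(shapes))        # 40960
```

Because use_cache is set to false in the modified config.json, the run shown earlier returns past_key_values=None; the numbers here describe what would be cached if it were enabled.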
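inputs_embeds: shape (batch_size, sequence_length, hidden_size). Optionally, instead of passing input_ids, you can pass an embedded representation directly. This is useful when you want more control over how input_ids indices are converted into their associated vectors than the model's internal embedding lookup matrix provides.
use_cache: if set to True, the past_key_values key/value states are returned and can be used to speed up decoding (see past_key_values).
output_attentions: whether to return the attention tensors of all attention layers. See attentions under the returned tensors for more detail.
output_hidden_states: whether to return the hidden states of all layers. See hidden_states under the returned tensors for more detail.
return_dict: whether to return a ~utils.ModelOutput instead of a plain tuple.
cache_position: indices describing the positions of the input sequence tokens within the sequence. Unlike position_ids, this tensor is not affected by padding. It is used to update the cache at the correct position and to infer the full sequence length.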
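The relationship between cache_position and position_ids is easiest to see across two steps: a first pass over the whole prompt and a later decoding step that feeds a single new token. The sketch below uses plain lists instead of tensors and mimics what I understand to be the fallback in modeling_qwen2.py (4.45.2) when neither argument is passed. The function name default_positions is hypothetical.

```python
def default_positions(past_seen_tokens, new_tokens):
    """Mimic the defaults used when cache_position and position_ids are omitted.

    cache_position counts upward from the number of tokens already in the
    cache; position_ids defaults to the same values with a leading batch
    dimension of size 1.
    """
    cache_position = list(range(past_seen_tokens, past_seen_tokens + new_tokens))
    position_ids = [cache_position]  # shape [1, new_tokens]
    return cache_position, position_ids


# First pass: nothing cached yet, 10 prompt tokens
prefill_cache_position, prefill_position_ids = default_positions(0, 10)
# Decoding step: 10 tokens already cached, 1 new token
decode_cache_position, decode_position_ids = default_positions(10, 1)

print(prefill_cache_position)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(decode_cache_position)   # [10]
```

In the decoding step the new token's keys and values are written at cache index 10, which is why only the last input_ids need to be passed once a cache is in use.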