My project index: Victor94-king/NLP__ManVictor: CSDN of ManVictor
Original article: WeChat Official Accounts platform
Project: QwenLM/Qwen2.5: Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Official site: Hello, Qwen2 | Qwen & Qwen2.5: A Party of Foundation Models! | Qwen
A note up front: writing these posts takes real effort, so if you are passing by, please leave a like and a follow!
The source code discussed below comes from the transformers code base: transformers-4.45.2/src/transformers/models/qwen2/modeling_qwen2.py.
Experiment preparation
First, download the configuration files Qwen2 needs. Download address: https://hf-mirror.com/Qwen/Qwen2-0.5B/tree/main
# Download the configuration-related files
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/config.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/generation_config.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/merges.txt
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/tokenizer.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/tokenizer_config.json
wget https://hf-mirror.com/Qwen/Qwen2-0.5B/resolve/main/vocab.json
Note: we do not download the weight files. We are only walking through the pipeline here, so randomly initialized weights are enough.
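If you prefer a programmatic download, here is a minimal sketch (purely optional; the wget commands above are sufficient) that pulls only the configuration and tokenizer files via huggingface_hub and skips the weight shards. The HF_ENDPOINT variable points at the same hf-mirror.com mirror used above.
# Optional: download only the config/tokenizer files, skipping the weights.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # must be set before importing huggingface_hub

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2-0.5B",
    allow_patterns=["*.json", "*.txt"],  # config.json, generation_config.json, tokenizer files, merges.txt
    local_dir="./config",
)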
Download the transformers source code. We use version 4.45.2 here; in principle later versions should also work.
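As an optional sanity check, if you install the downloaded source tree (for example with pip install -e . from the repository root), you can confirm from Python which version you are actually importing:
# Confirm the transformers version available to Python
import transformers
print(transformers.__version__)  # expected to start with 4.45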
Modifying config.json
Original file content:
{
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151643,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 131072,
"max_window_layers": 24,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 131072,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.40.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
Here we change num_hidden_layers to 4 and set use_cache to false, because this is only an experiment and we do not need that many layers. Everything else stays unchanged.
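If you would rather not edit the file by hand, a minimal sketch like the following (assuming config.json sits in the current directory where it was downloaded) applies the same two changes:
# Patch the downloaded config.json in place.
import json

with open("config.json", "r", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["num_hidden_layers"] = 4  # keep only 4 decoder layers for the experiment
cfg["use_cache"] = False      # disable the KV cache for this walk-through

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)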
File structure
Create a Qwen2_learn directory inside the examples directory of the transformers repo, create a config folder under Qwen2_learn, and copy the configuration files downloaded above into that config directory. You end up with the following directory structure:
├── __init__.py
├── config
│ ├── config.json
│ ├── generation_config.json
│ ├── merges.txt
│ ├── tokenizer.json
│ ├── tokenizer_config.json
│ └── vocab.json
└── main.py
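The layout can of course be created by hand; the sketch below (assuming your working directory contains both the downloaded files and the transformers checkout) is just one way to script it:
import shutil
from pathlib import Path

# Create examples/Qwen2_learn/config inside the transformers checkout
target = Path("transformers/examples/Qwen2_learn/config")
target.mkdir(parents=True, exist_ok=True)
(target.parent / "__init__.py").touch()

# Copy the downloaded configuration files into the config directory
for name in ["config.json", "generation_config.json", "merges.txt",
             "tokenizer.json", "tokenizer_config.json", "vocab.json"]:
    shutil.copy(name, target / name)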
Main code
Here is the main code:
# Import the Qwen2 classes directly from the transformers source tree
from src.transformers.models.qwen2.configuration_qwen2 import Qwen2Config
from src.transformers.models.qwen2.tokenization_qwen2 import Qwen2Tokenizer
from src.transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM

# Load the (modified) configuration and the tokenizer files from ./config
config = Qwen2Config.from_pretrained("./config")
tokenizer = Qwen2Tokenizer.from_pretrained("./config")

# Build the model from the config alone, i.e. with randomly initialized weights
model = Qwen2ForCausalLM(config=config)

print("Model details:")
print(model)
print("*=" * 80)

print("Text encoding:")
inputs = tokenizer(
    ["你好啊", "简单的机器学习是为了让机器学习变得更简单而存在的"],
    add_special_tokens=True,
    max_length=10,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
print(inputs)
print("*=" * 80)

print("Model output:")
print(model(**inputs))
If everything goes as expected, you should see output like the following:
Model details:
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 896)
(layers): ModuleList(
(0-3): 4 x Qwen2DecoderLayer(
(self_attn): Qwen2SdpaAttention(
(q_proj): Linear(in_features=896, out_features=896, bias=True)
(k_proj): Linear(in_features=896, out_features=128, bias=True)
(v_proj): Linear(in_features=896, out_features=128, bias=True)
(o_proj): Linear(in_features=896, out_features=896, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): Linear(in_features=896, out_features=4864, bias=False)
(up_proj): Linear(in_features=896, out_features=4864, bias=False)
(down_proj): Linear(in_features=4864, out_features=896, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
(post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
)
)
(norm): Qwen2RMSNorm((896,), eps=1e-06)
(rotary_emb): Qwen2RotaryEmbedding()
)
(lm_head): Linear(in_features=896, out_features=151936, bias=False)
)
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*= ...
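One detail worth pausing on in the printout: q_proj maps to 896 features while k_proj and v_proj map to only 128. That is because Qwen2 uses grouped-query attention: with num_attention_heads=14 the per-head dimension is 896 / 14 = 64, and the num_key_value_heads=2 key/value heads give 2 * 64 = 128. A minimal sketch (reusing the ./config path from above) that derives these numbers from the config:
# Derive the projection shapes seen in the printout from config.json
from src.transformers.models.qwen2.configuration_qwen2 import Qwen2Config

config = Qwen2Config.from_pretrained("./config")

head_dim = config.hidden_size // config.num_attention_heads  # 896 // 14 = 64
q_out = config.num_attention_heads * head_dim                # 14 * 64 = 896
kv_out = config.num_key_value_heads * head_dim               # 2 * 64 = 128 (grouped-query attention)

print(f"head_dim={head_dim}, q_proj out={q_out}, k/v_proj out={kv_out}")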