Llama2 Source Code Walkthrough: Model Transformation

Notes from my own debugging, for learning how the model is put together.

Step 1: the pretrained base model

After running model = AutoModelForCausalLM.from_config(cfg)

we get the model structure wrapped by LlamaForCausalLM.

The cfg:

LlamaConfig {
  "_name_or_path": "./llama-2-7b/7B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.39.3",
  "use_cache": false,
  "vocab_size": 32000
}



LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
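
A minimal sketch of how to reproduce this step (the path ./llama-2-7b/7B comes from _name_or_path in the config above; the rest is standard transformers usage):

from transformers import AutoConfig, AutoModelForCausalLM

cfg = AutoConfig.from_pretrained("./llama-2-7b/7B")   # -> the LlamaConfig printed above
model = AutoModelForCausalLM.from_config(cfg)         # -> LlamaForCausalLM
print(model)                                          # prints the module tree shown above

Note that from_config only builds the architecture with newly initialized weights; the pretrained weights are loaded into it in a separate step.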

Step 2: Parameter-Efficient Fine-Tuning (PEFT)

First, the linear layers are quantized.

After running:

model.model = replace_linear(model.model, Linear4bit, compute_dtype=compute_dtype,
                             quant_type='nf4', quant_storage=torch_dtype, skip_modules=skip_modules)
model.is_loaded_in_4bit = True
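
replace_linear is a project-specific helper, not a transformers/bitsandbytes API. A minimal sketch of what such a helper might do, assuming a recent bitsandbytes where Linear4bit accepts compute_dtype/quant_type/quant_storage; the body below is an assumption:

import torch.nn as nn
import bitsandbytes as bnb

def replace_linear(module, linear_cls=bnb.nn.Linear4bit, skip_modules=("lm_head",), **kwargs):
    """Recursively swap nn.Linear children for a 4-bit linear class (sketch only:
    a real helper would also copy/quantize the original weights)."""
    for name, child in module.named_children():
        if name in skip_modules:
            continue
        if isinstance(child, nn.Linear):
            setattr(module, name, linear_cls(child.in_features, child.out_features,
                                             bias=child.bias is not None, **kwargs))
        else:
            replace_linear(child, linear_cls, skip_modules, **kwargs)
    return module

Note that the call above applies it to model.model (the inner LlamaModel), which is why lm_head stays a plain Linear in the printout below.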


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)

The peft_config configuration:


peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, inference_mode=False,
    r=args["lora_rank"],
    lora_alpha=args["lora_alpha"],
    lora_dropout=args["lora_dropout"],
    target_modules=args["lora_target_modules"],
)

model = get_peft_model(model, peft_config)
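
get_peft_model freezes the base weights and attaches LoRA adapters for the configured target modules. A sketch of the surrounding pieces, assuming hypothetical values in args (r=64, dropout=0.1 and the target-module list can be read off the printed structure further below; lora_alpha is not visible there and 16 is only a placeholder):

args = {
    "lora_rank": 64,
    "lora_alpha": 16,                     # placeholder, not visible in the module printout
    "lora_dropout": 0.1,
    "lora_target_modules": ["q_proj", "k_proj", "v_proj",
                            "gate_proj", "up_proj", "down_proj"],
}

# After get_peft_model, only the LoRA adapter weights require gradients:
model.print_trainable_parameters()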
         
         
After this step:

the PeftModelForCausalLM wraps a LoraModel, which wraps the LlamaForCausalLM,
which in turn contains
		(model): LlamaModel()
		(lm_head): Linear(in_features=4096, out_features=32000, bias=False)

Under LlamaModel():
		(embed_tokens): Embedding(32000, 4096)
		(layers): ModuleList()
		(norm): LlamaRMSNorm()
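
A small sketch of how to walk down this nesting from the wrapped model; the attribute names (base_model, model, embed_tokens, layers) follow the printed structure:

peft_model = model                           # PeftModelForCausalLM
lora_model = peft_model.base_model           # LoraModel
llama_lm = lora_model.model                  # LlamaForCausalLM
llama_model = llama_lm.model                 # LlamaModel

print(llama_model.embed_tokens)              # Embedding(32000, 4096)
print(llama_model.layers[0].self_attn)       # LlamaSdpaAttention with lora.Linear4bit projections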
		

The embedding layer is a key component, especially for NLP tasks.
Here it is embed_tokens, the layer that maps input token indices to dense vector representations:
each of the 32,000 vocabulary entries is represented by a 4096-dimensional vector.
					# Vocabulary size 32000, embedding dimension 4096
					vocab_size = 32000
					embedding_dim = 4096
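
A self-contained sketch of what embed_tokens does (the token ids below are arbitrary example values):

import torch
import torch.nn as nn

vocab_size = 32000       # same values as above
embedding_dim = 4096

embed_tokens = nn.Embedding(vocab_size, embedding_dim)

input_ids = torch.tensor([[1, 306, 4966, 2]])   # (batch=1, seq_len=4), arbitrary token indices
hidden_states = embed_tokens(input_ids)
print(hidden_states.shape)                      # torch.Size([1, 4, 4096])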
					
							
Under ModuleList(): (0-31): 32 x LlamaDecoderLayer()

					
Decoder layers (layers)	# self-attention, feed-forward network, and the associated normalization steps
(layers): ModuleList(
    (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
            (q_proj): lora.Linear4bit()
            (k_proj): lora.Linear4bit()
            (v_proj): lora.Linear4bit()
            (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
            (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP( # feed-forward network: gating of the information flow, up-projection and down-projection
            (gate_proj): lora.Linear4bit()
            (up_proj): lora.Linear4bit()
            (down_proj): lora.Linear4bit()
            (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()  # input normalization:
        # normalizes the input before each sub-layer (self-attention / feed-forward) of the decoder layer
        (post_attention_layernorm): LlamaRMSNorm()  # post-attention normalization:
        # normalizes the output of the self-attention layer
    )
)
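
Conceptually, the LlamaMLP block above computes down_proj(SiLU(gate_proj(x)) * up_proj(x)); a minimal sketch (ignoring the 4-bit quantization and the LoRA wrappers):

import torch.nn as nn
import torch.nn.functional as F

class LlamaMLPSketch(nn.Module):
    """Conceptual forward of LlamaMLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    def __init__(self, hidden_size=4096, intermediate_size=11008):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj   = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))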

The (q_proj)/(k_proj)/(v_proj) linear layers (and, as the full printout below shows, the MLP's gate_proj/up_proj/down_proj) have been LoRA-ized, while o_proj keeps the plain Linear4bit.

How LoRA-ization changes the layers
Inside each LlamaDecoderLayer, the targeted linear layers (here Linear4bit) are wrapped into composite layers that carry the LoRA structure. Each LoRA-ized layer contains the following components:

1. Base layer
Function: the original linear transformation.
LoRA-ization: LoRA does not replace the base layer; the adapter runs in parallel with it.
2. LoRA components (see the sketch after this list)
lora_A and lora_B: these two modules implement the low-rank update. lora_A maps the original dimension down to a much smaller rank (64 here), and lora_B maps that low-rank space back up to the original output dimension.
Purpose: the model can learn task-specific adaptations without adding many trainable parameters.
3. Dropout
Function: dropout on the LoRA branch regularizes the low-rank update and helps prevent overfitting.
4. Embedding parameters
lora_embedding_A and lora_embedding_B: extra learnable parameters used only when LoRA targets embedding layers; they remain empty ParameterDicts here because only linear layers are targeted.
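
Putting these pieces together, a LoRA-wrapped projection computes base_layer(x) plus a scaled low-rank update. A conceptual sketch (not PEFT's actual lora.Linear4bit implementation; lora_alpha=16 is a placeholder, while r=64 and dropout=0.1 match the printout below):

import torch.nn as nn

class LoraLinearSketch(nn.Module):
    """y = base_layer(x) + scaling * lora_B(lora_A(dropout(x)))  -- conceptual only."""
    def __init__(self, base_layer, r=64, lora_alpha=16, lora_dropout=0.1):
        super().__init__()
        self.base_layer = base_layer                      # frozen, possibly 4-bit quantized
        self.lora_dropout = nn.Dropout(lora_dropout)
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)                # so the adapter starts as a no-op
        self.scaling = lora_alpha / r

    def forward(self, x):
        return self.base_layer(x) + self.scaling * self.lora_B(self.lora_A(self.lora_dropout(x)))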

The PeftModelForCausalLM model structure:

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (v_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (up_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=11008, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=11008, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (down_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=11008, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=11008, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
    )
  )
)
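
A quick check that after get_peft_model only the LoRA adapters are trainable; the example parameter names are what I would expect from the structure above, not copied from a run:

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), trainable[:2])
# Expected: only lora_A / lora_B weights, e.g.
# 'base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight'
# 'base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight'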
