Merging LoRA weights into the base model

Original post · last modified 2024-10-09 22:40:07

License: CC 4.0 BY-SA

Tags: #python #transformer #language-model

First published 2024-10-09 21:54:15

This is a script that merges LoRA parameters into the base model.

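Why merge? Running inference after fine-tuning does combine the LoRA weights with the base model, but the merged result does not necessarily have to be saved. However, some LLM evaluation projects test only against a standard Hugging Face checkpoint, so the weights have to be merged and saved beforehand.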

import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

def compare_model_weights(model1, model2):
    """
    Compare the parameters of two models.

    Return True as soon as one parameter differs or is missing from model2
    (early exit). Return False if every parameter matches.
    """
    # Build the second model's state dict once instead of on every iteration
    state_dict2 = model2.state_dict()
    for name1, param1 in model1.named_parameters():
        if name1 not in state_dict2:
            print(f"Layer '{name1}' not found in the second model.")
            return True
        # torch.allclose uses its default tolerances (rtol=1e-05, atol=1e-08)
        if not torch.allclose(param1, state_dict2[name1]):
            print(f"Layer '{name1}': Weights are DIFFERENT.")
            return True

    # Return False if no differences were found
    return False


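# Define the paths to your base model and LoRA directories
base_model_dir = os.environ.get("BASE_MODEL_DIR", None)
lora_model_dir = os.environ.get("LORA_MODEL_DIR", None)
merged_model_dir = os.environ.get("MERGED_MODEL_DIR", None)
# Fail early with a clear message if any of the three variables is unset
if not all([base_model_dir, lora_model_dir, merged_model_dir]):
    raise SystemExit("Set BASE_MODEL_DIR, LORA_MODEL_DIR and MERGED_MODEL_DIR before running.")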

# Step 1: Load the base model and tokenizer
# NOTE: torch_dtype below must match "torch_dtype" in the base model's config.json,
# otherwise the size of the saved checkpoint will change
print("Loading base model and tokenizer...")
model_base = AutoModelForCausalLM.from_pretrained(
    base_model_dir,
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
    )
tokenizer = AutoTokenizer.from_pretrained(base_model_dir)

# Optional: check model params before and after merging
import copy
model_base_original = copy.deepcopy(model_base)

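# Step 2: Load the LoRA configuration
print("Loading LoRA configuration...")
peft_config = PeftConfig.from_pretrained(lora_model_dir)
# Sanity check: show which base model the adapter was trained on
print(f"Adapter base model: {peft_config.base_model_name_or_path}")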

# Step 3: Load the LoRA weights into the base model
print("Loading LoRA model and applying weights...")
model_lora = PeftModel.from_pretrained(
    model_base, 
    lora_model_dir,
    device_map={"": "cpu"},
    torch_dtype=torch.float16,
    )

# Step 4: Merge the LoRA weights with the base model and unload LoRA
print("Merging LoRA weights into base model...")
model_merged = model_lora.merge_and_unload()
# Now `model_merged` contains the base model with LoRA weights merged

# Optional: check model params before and after merging
isdifferent = compare_model_weights(model_base_original, model_merged)
if isdifferent:
    print("Merging is valid.")
else:
    print("Merging changes no params. Merging may be invalid.")

# Save the merged model
print(f"Saving merged model to {merged_model_dir}...")
model_merged.save_pretrained(merged_model_dir, max_shard_size="1GB")
tokenizer.save_pretrained(merged_model_dir)

print("Model merging complete.")

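While merging, this script also checks whether the parameters before and after the merge are nearly identical (the LoRA training may not have learned anything, so the parameters barely change). If they are, it warns you that something may be wrong, but it still saves the merged result.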
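To see why an unchanged model is suspicious, it helps to write out what merging does. The formula below uses the standard LoRA notation, which is not taken from this post: $W$ is a frozen base weight, $A$ and $B$ are the low-rank adapter matrices, $r$ is the rank, and $\alpha$ is the scaling factor. It assumes the standard $\alpha / r$ scaling.

```latex
W_{\text{merged}} = W + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k}
```

In the usual initialization $B$ starts at zero, so an adapter that never trained gives $BA = 0$ and $W_{\text{merged}} = W$. That is the case the check is meant to flag.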

export BASE_MODEL_DIR=<?>
export LORA_MODEL_DIR=<?>
export MERGED_MODEL_DIR=<?>

python merge_and_check.py

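These are the commands to run the script. Replace the three variables with (1) the path to the base model, (2) the directory where the LoRA training output was saved, and (3) the directory in which to save the merged model.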
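Before starting a long load, it can help to confirm that the two input directories look the way the script expects. The following is a minimal sketch, not part of the original script: `missing_files` is a hypothetical helper, and it assumes the base model directory holds a `config.json` and the LoRA directory holds the `adapter_config.json` that PEFT writes when saving an adapter.

```python
import os


def missing_files(base_dir, lora_dir):
    """Return the expected files that are absent from the two input directories."""
    expected = [
        os.path.join(base_dir, "config.json"),          # base model config
        os.path.join(lora_dir, "adapter_config.json"),  # written by PEFT when saving an adapter
    ]
    return [path for path in expected if not os.path.isfile(path)]
```

Calling `missing_files(os.environ["BASE_MODEL_DIR"], os.environ["LORA_MODEL_DIR"])` returns an empty list when both files are present. A non-empty list usually means one of the paths points to the wrong directory.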

{
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",       <-- note this parameter
  "transformers_version": "4.31.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}
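Rather than checking the file by eye, the value can be read programmatically. The following is a minimal sketch using only the standard library. `read_torch_dtype` is a hypothetical helper name, and it assumes the model directory contains a valid `config.json` like the one shown here (without the inline annotation).

```python
import json
import os


def read_torch_dtype(model_dir):
    """Return the "torch_dtype" string from <model_dir>/config.json, or None if the key is absent."""
    with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as f:
        return json.load(f).get("torch_dtype")
```

The returned string (for example `"float16"`) can be turned into a dtype object with `getattr(torch, name)` and passed as `torch_dtype` when loading the base model.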

Problem: the model size differs before and after merging LoRA

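Explanation: pay attention to the torch_dtype parameter. It must match the torch_dtype in the config.json of the base model you downloaded, as in the JSON above. When the model is loaded, this parameter defaults to float32 if it is not specified, and the model size before and after merging will then differ. Take Llama2-7b as an example: the downloaded model is float16, so if you do not specify torch_dtype, the merged model will be twice the size of the original.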
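The doubling follows directly from the number of bytes per parameter. As a rough illustration, the sketch below estimates the size of the saved weights for each dtype. The helper name `estimate_weight_bytes` and the round figure of 7 billion parameters are assumptions made for this example, and file metadata is ignored.

```python
# Bytes used to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_weight_bytes(num_params, torch_dtype):
    """Approximate size of the saved weights in bytes, ignoring file metadata."""
    return num_params * BYTES_PER_PARAM[torch_dtype]


# Roughly 7e9 parameters, a round figure used only for illustration
num_params = 7_000_000_000
fp16_gb = estimate_weight_bytes(num_params, "float16") / 1e9
fp32_gb = estimate_weight_bytes(num_params, "float32") / 1e9
print(f"float16: {fp16_gb:.0f} GB, float32: {fp32_gb:.0f} GB")
# prints "float16: 14 GB, float32: 28 GB"
```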

Reference:

Model size doubles after .merge_and_unload() and .save_pretrained() · Issue #137 · bigcode-project/starcoder · GitHub