Some useful CUDA commands:
Check the current GPU status (refreshes every 1 second):
watch -n 1 nvidia-smi
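As a related sketch (an illustrative addition, assuming only that PyTorch is installed), CUDA availability can also be checked from Python before deciding which device to load a model onto:

import torch

# Report whether a CUDA-capable GPU is visible to PyTorch and how many devices there are;
# this mirrors the choice between device = "cpu" and device = "cuda" in the loading code below.
print("CUDA available:", torch.cuda.is_available())
print("Device count  :", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0 name :", torch.cuda.get_device_name(0))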
I. Model download and loading
1. Downloading a model from ModelScope:
Using Python code:
from modelscope import snapshot_download

model_dir = snapshot_download(
    'qwen/Qwen1.5-14B-Chat',      # model id on ModelScope
    cache_dir='/home/yt/models'   # local directory to store the model
)
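Note: snapshot_download returns the local directory the model was saved to, so the model_dir value above can be passed straight to AutoModelForCausalLM.from_pretrained / AutoTokenizer.from_pretrained in the loading step of the next section.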
You can also download with git, but that approach is slow and often breaks off partway or leaves the downloaded model files incomplete.
2. Loading a ModelScope model (with transformers)
- The following module versions are known to work (a quick version check is sketched after this list):
torch == 2.2.0+cpu
transformers == 4.37.2
modelscope == 1.12.0
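As a quick sanity check (an illustrative addition, assuming the three packages are importable; the expected values in the comments are simply the pins listed above), the installed versions can be printed from Python:

import modelscope
import torch
import transformers

# Print the installed versions to compare against the pins above.
print("torch       :", torch.__version__)          # expected: 2.2.0+cpu
print("transformers:", transformers.__version__)   # expected: 4.37.2
print("modelscope  :", modelscope.__version__)     # expected: 1.12.0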
The Python code is as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer


def model_inferByTrans(model_file="E:/CQF/LLM/ai_mycode/llm_api/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1___5B"):
    device = "cpu"  # the device to load the model onto; set this to "cuda" when a GPU is available
    # Now you do not need to add "trust_remote_code=True"
    model = AutoModelForCausalLM.from_pretrained(
        model_file,
        torch_dtype="auto",
        # Previously "auto"; when loading the model on CPU this must be "cpu", otherwise it raises:
        # ValueError: You are trying to offload the whole model to the disk. Please use ...
        device_map="cpu"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_file)

    # Instead of using model.chat(), we directly use model.generate()
    # But you need to use tokenizer.apply_chat_template() to format your inputs as shown below
    messages = [
        {"role": "system", "content": "You are a helpful assistant."}
    ]
    while True:
        prompt = input("Enter your prompt: ")
        messages.append({"role": "user", "content": prompt})
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        model_inputs = tokenizer([text], return_tensors="pt").to(device)

        # Directly use generate() and tokenizer.decode() to get the output.
        # Use `max_new_tokens` to control the maximum output length.
        generated_ids = model.generate(
            model_inputs.input_ids,
            max_new_tokens=512
        )
        # Strip the prompt tokens so that only the newly generated tokens are decoded.
        generated_ids = [
            output_ids[len(input_ids):]
            for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        print(response)
        # Keep the assistant reply in the history so the conversation is multi-turn.
        messages.append({"role": "assistant", "content": response})
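As a usage sketch (an illustrative addition; the entry-point guard and the example path are assumptions, not taken from the original notes), the chat loop above can be started like this:

if __name__ == "__main__":
    # Point model_file at a locally downloaded model directory,
    # e.g. the model_dir returned by snapshot_download() in the section above.
    model_inferByTrans(model_file="/home/yt/models/qwen/Qwen1.5-14B-Chat")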