Some useful CUDA commands:
Check the current GPU status (refreshes every 1 second):
watch -n 1 nvidia-smi
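As a related sketch (an illustrative addition, assuming only that PyTorch is installed), CUDA availability can also be checked from Python before deciding which device to load a model onto:

import torch

# Report whether a CUDA-capable GPU is visible to PyTorch and how many devices there are;
# this mirrors the choice between device = "cpu" and device = "cuda" in the loading code below.
print("CUDA available:", torch.cuda.is_available())
print("Device count  :", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0 name :", torch.cuda.get_device_name(0))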
I. Model download and loading
1. Downloading a model from ModelScope:
Using Python code:
from modelscope import snapshot_download

model_dir = snapshot_download(
    'qwen/Qwen1.5-14B-Chat',      # model id on ModelScope
    cache_dir='/home/yt/models'   # local directory to store the model
)
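Note: snapshot_download returns the local directory the model was saved to, so the model_dir value above can be passed straight to AutoModelForCausalLM.from_pretrained / AutoTokenizer.from_pretrained in the loading step of the next section.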
You can also download with git, but that approach is slow and often breaks off partway or leaves the downloaded model files incomplete.
2. Loading a ModelScope model (with transformers)
- The following module versions are known to work (a quick version check is sketched after this list):
torch == 2.2.0+cpu
transformers == 4.37.2
modelscope == 1.12.0
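As a quick sanity check (an illustrative addition, assuming the three packages are importable; the expected values in the comments are simply the pins listed above), the installed versions can be printed from Python:

import modelscope
import torch
import transformers

# Print the installed versions to compare against the pins above.
print("torch       :", torch.__version__)          # expected: 2.2.0+cpu
print("transformers:", transformers.__version__)   # expected: 4.37.2
print("modelscope  :", modelscope.__version__)     # expected: 1.12.0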
The Python code is as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer


def model_inferByTrans(model_file="E:/CQF/LLM/ai_mycode/llm_api/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1___5B"):
    device = "cpu"  # the device to load the model onto; set this to "cuda" when a GPU is available
    # Now you do not need to add "trust_remote_code=True"
    model = AutoModelForCausalLM.from_pretrained(
        model_file,
        torch_dtype="auto",
        # Previously "auto"; when loading the model on CPU this must be "cpu", otherwise it raises:
        # ValueError: You are trying to offload the whole model to the disk. Please use ...
        device_map="cpu"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_file)

    # Instead of using model.chat(), we directly use model.generate()
    # But you need to use tokenizer.apply_chat_template() to format your inputs as shown below
    messages = [
        {"role": "system", "content": "You are a helpful assistant."}
    ]
    while True:
        prompt = input("Enter your prompt: ")
        messages.append({"role": "user", "content": prompt})
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        model_inputs = tokenizer([text], return_tensors="pt").to(device)

        # Directly use generate() and tokenizer.decode() to get the output.
        # Use `max_new_tokens` to control the maximum output length.
        generated_ids = model.generate(
            model_inputs.input_ids,
            max_new_tokens=512
        )
        # Strip the prompt tokens so that only the newly generated tokens are decoded.
        generated_ids = [
            output_ids[len(input_ids):]
            for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        print(response)
        # Keep the assistant reply in the history so the conversation is multi-turn.
        messages.append({"role": "assistant", "content": response})
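As a usage sketch (an illustrative addition; the entry-point guard and the example path are assumptions, not taken from the original notes), the chat loop above can be started like this:

if __name__ == "__main__":
    # Point model_file at a locally downloaded model directory,
    # e.g. the model_dir returned by snapshot_download() in the section above.
    model_inferByTrans(model_file="/home/yt/models/qwen/Qwen1.5-14B-Chat")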