2024-7-15 22:34:02
System environment:

Create a conda environment:
conda create -n llamafactory python=3.10
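Then activate it (the environment name matches the create command above):
conda activate llamafactory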
Clone the code and install the dependencies:
git clone https://github.com/LlamaFamily/Llama-Chinese.git
cd Llama-Chinese
pip install -r requirements.txt
Check the current versions.
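A quick way to print the relevant versions from Python (a small sketch; it only assumes the torch and transformers packages installed by the requirements file):
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())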
Download the model
vim download_model.py
# Download the model from ModelScope
from modelscope import snapshot_download

model_dir = snapshot_download('FlagAlpha/Llama3-Chinese-8B-Instruct', local_dir='./models/FlagAlpha/Llama3-Chinese-8B-Instruct')
Set local_dir to your own local path.
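Then run the script to pull the weights (assuming it was saved as download_model.py as above):
python download_model.py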
Run inference
Create a file named quick_start.py and copy the following content into it. Replace the model path with your own local path from above.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Put the model on the first GPU if one is available
device_map = "cuda:0" if torch.cuda.is_available() else "auto"
model_id = "/home/cstu/jupyterspace/models/FlagAlpha/Llama3-Chinese-8B-Instruct"

# Load the model in 8-bit with FlashAttention-2 enabled
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map, torch_dtype=torch.float16, load_in_8bit=True, trust_remote_code=True, use_flash_attention_2=True)
model = model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

# Prompt in the Human/Assistant format this model expects ("介绍一下中国" = "Introduce China")
input_ids = tokenizer(['<s>Human: 介绍一下中国\n</s><s>Assistant: '], return_tensors="pt", add_special_tokens=False).input_ids
if torch.cuda.is_available():
    input_ids = input_ids.to('cuda')

# Generation parameters
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id
}
generate_ids = model.generate(**generate_input)
# decode() returns the prompt plus the generated continuation
text = tokenizer.decode(generate_ids[0])
print(text)
Run the quick_start.py script:
python quick_start.py
It fails with the following error (flash-attn is missing):

Fix:
pip install flash_attn
The install above is very slow; see a separate guide on installing flash-attn quickly.
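As a general hint (not from the original post): the flash-attention project documents installing with build isolation disabled, and prebuilt wheels on its GitHub releases page can avoid the long compile:
pip install flash-attn --no-build-isolation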
Run quick_start.py again; something is still wrong:
The transformers version does not match.

Fix:
pip uninstall transformers
pip install transformers
After that, the script runs and prints results normally, though with some warnings:

The place to modify is shown in the figure below:

Approximate GPU memory usage:
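One way to read this programmatically (a small sketch using PyTorch's CUDA memory counters, run while the model is loaded):
import torch

# Memory actually held by tensors vs. memory reserved by the caching allocator
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.1f} GiB")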

Final result:

Quick start: using Gradio
python examples/chat_gradio.py --model_name_or_path /home/cstu/jupyterspace/models/FlagAlpha/Llama3-Chinese-8B-Instruct
With the default launch, generation is very slow.

Modify the source code:
vim examples/chat_gradio.py
After launch it uses a little over 15 GB of GPU memory.

The output is still slow, and the results are not what I wanted.
Later I checked https://huggingface.co/FlagAlpha/Llama3-Chinese-8B-Instruct and followed the usage code from that page:
import transformers
import torch

model_id = "/home/cstu/jupyterspace/models/FlagAlpha/Llama3-Chinese-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.float16},
    device="cuda",
)

messages = [{"role": "system", "content": ""}]
messages.append(
    {"role": "user", "content": "介绍一下机器学习"}
)

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9
)

content = outputs[0]["generated_text"][len(prompt):]
print(content)
This returns results quickly.
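To chat over multiple turns with the same setup, a minimal sketch of a loop around the pipeline above (not from the model card; it simply reuses the pipeline and terminators objects defined above):
# Keep the running conversation and re-apply the chat template each turn
history = [{"role": "system", "content": ""}]
while True:
    user_input = input("User: ")
    if not user_input.strip():
        break
    history.append({"role": "user", "content": user_input})
    prompt = pipeline.tokenizer.apply_chat_template(
        history, tokenize=False, add_generation_prompt=True
    )
    outputs = pipeline(
        prompt,
        max_new_tokens=512,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )
    reply = outputs[0]["generated_text"][len(prompt):]
    history.append({"role": "assistant", "content": reply})
    print("Assistant:", reply)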

The preview of the training command after finishing the configuration:
llamafactory-cli train \
--stage sft \
--do_train True \
--model_name_or_path /home/cstu/jupyterspace/models/FlagAlpha/Llama3-Chinese-8B-Instruct \
--preprocessing_num_workers 16 \
--finetuning_type lora \
--quantization_method bitsandbytes \
--template llama3 \
--flash_attn fa2 \
--dataset_dir data \
--dataset alpaca_zh_demo \
--cutoff_len 1024 \
--learning_rate 5e-05 \
--num_train_epochs 3.0 \
--max_samples 1000 \
--per_device_train_batch_size 2 \
--gradient_a
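Once the LoRA run finishes, the resulting adapter can be loaded on top of the base model for inference. A minimal sketch using peft (the adapter_dir path here is hypothetical; use whatever output_dir the training was configured with):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "/home/cstu/jupyterspace/models/FlagAlpha/Llama3-Chinese-8B-Instruct"
adapter_dir = "saves/llama3-lora-sft"  # hypothetical path; use the output_dir from the training run

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="cuda:0")
model = PeftModel.from_pretrained(base, adapter_dir)  # attach the LoRA weights
model = model.eval()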
