Notes on Setting Up Qwen 0.5B on Ubuntu 20.04

Article link: https://blog.youkuaiyun.com/weixin_48622537/article/details/139574785

Environment overview

Ubuntu 20.04
NVIDIA-SMI 545.29.06
CUDA 11.4
Python 3.10
PyTorch 1.11.0

Setup

Python environment setup

Create a virtual environment

conda create --name qwen python=3.10
conda activate qwen

Install modelscope and transformers first

pip install modelscope
pip install transformers

Install PyTorch

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
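Before moving on, it can help to confirm that the three packages actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report` are made up for this example, and it assumes PyTorch registers itself under the distribution name `torch`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("modelscope", "transformers", "torch")):
    """Build one status line per package (names here are assumptions, adjust as needed)."""
    lines = []
    for name in packages:
        version = installed_version(name)
        lines.append(f"{name}: {version if version else 'not installed'}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it inside the conda environment created above; any line ending in "not installed" points at a step that needs repeating.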

Download the model

Create a Python file

gedit download.py

Paste the following into it

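from modelscope.hub.file_download import model_file_download

# Fetch one GGUF file from the hub; replace cache_dir with your own download directory.
model_dir = model_file_download(model_id='qwen/Qwen1.5-0.5B-Chat-GGUF',
                                file_path='qwen1_5-0_5b-chat-q5_k_m.gguf',
                                revision='master', cache_dir='path/to/local/dir')
print(model_dir)  # show the value returned by model_file_download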

Run the Python file to download the model

python download.py
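A quick sanity check on the downloaded file can catch a truncated or wrong download before llama.cpp gets involved. The sketch below relies on the GGUF convention that a file begins with the four ASCII bytes `GGUF` followed by a little-endian 32-bit format version. The function name `read_gguf_version` is made up for this example, and it only inspects the first 8 bytes, so it does not prove the rest of the file is intact.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF format version stored in the file header.

    Raises ValueError if the file does not start with the GGUF magic bytes.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Bytes 4..8 hold the format version as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Call it with the path of the downloaded file, for example `read_gguf_version("/path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf")`, substituting your own directory for the placeholder.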

Download llama.cpp

Clone the llama.cpp project with git

git clone https://github.com/ggerganov/llama.cpp

Once the clone finishes, enter the llama.cpp directory and build the project

cd llama.cpp
make -j
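After the build, it is worth checking that an executable was actually produced in the repository directory. The sketch below is one way to do that; the helper name `find_binary` is made up for this example. It assumes the executable is called `main` (the name used in this post) or `llama-cli` (the name used by newer llama.cpp releases) and that `make` places it in the repository root.

```python
import os


def find_binary(repo_dir, names=("main", "llama-cli")):
    """Return the path of the first executable found in repo_dir, or None."""
    for name in names:
        candidate = os.path.join(repo_dir, name)
        # Require a regular file with the executable bit set.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, `find_binary(".")` run from inside the llama.cpp directory returns the path to use in the commands that follow, or `None` if the build did not produce either name.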

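Download and run the model

Download the model from the ModelScope community and run it:
https://www.modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GGUF/files
I downloaded qwen1_5-0_5b-chat-q5_k_m.gguf.
Run it in a terminal, replacing the model path with your own. (The -cml flag in the official command does not appear in the help output and breaks the run, so I removed it.)
Official command: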

./main -m /path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt

What I ran:

./main -m /path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf -n 512 --color -i -f prompts/chat-with-qwen.txt
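The manual fix above (dropping a flag that the local build does not list in its help output) can also be done programmatically. The following is a rough sketch, not part of llama.cpp: `flag_supported` and `build_command` are hypothetical helpers that scan a help text for a flag and assemble the same argument list as the commands above, keeping `-cml` only when the help text mentions it.

```python
import re


def flag_supported(help_text, flag):
    """Return True if `flag` appears as a standalone option in the help text."""
    # Reject matches that are part of a longer option such as -co or --ctx-size.
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def build_command(model_path, prompt_file, help_text, optional_flags=("-cml",)):
    """Assemble the ./main argument list, dropping optional flags the build lacks."""
    cmd = ["./main", "-m", model_path, "-n", "512", "--color", "-i"]
    cmd += [flag for flag in optional_flags if flag_supported(help_text, flag)]
    cmd += ["-f", prompt_file]
    return cmd
```

The help text can be captured with something like `subprocess.run(["./main", "--help"], capture_output=True, text=True)`, reading `stdout` or `stderr` depending on where the build prints it; the resulting list can then be passed to `subprocess.run` from inside the llama.cpp directory.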

Help output:

usage: ./main [options]

general:

  -h,    --help, --usage          print usage and exit
         --version                show version and build info
  -v,    --verbose                print verbose information
         --verbosity N            set specific verbosity level (default: 0)
         --verbose-prompt         print a verbose prompt before generation (default: false)
         --no-display-prompt      don't print prompt at generation (default: false)
  -co,   --color                  colorise output to distinguish prompt and user input from generations (default: false)
  -s,    --seed SEED              RNG seed (default: -1, use random seed for < 0)
  -t,    --threads N              number of threads to use during generation (default: 8)
  -tb,   --threads-batch N        number of threads to use during batch and prompt processing (default: same as --threads)
  -td,   --threads-draft N        number of threads to use during generation (default: same as --threads)
  -tbd,  --threads-batch-draft N  number of threads to use during batch and prompt processing (default: same as --threads-draft)
         --draft N                number of tokens to draft for speculative decoding (default: 5)
  -ps,   --p-split N              speculative decoding split probability (default: 0.1)
  -lcs,  --lookup-cache-static FNAME
                                  path to static lookup cache to use for lookup decoding (not updated by generation)
  -lcd,  --lookup-cache-dynamic FNAME
                                  path to dynamic lookup cache to use for lookup decoding (updated by generation)
  -c,    --ctx-size N             size of the prompt context (default: 0, 0 = loaded from model)
  -n,    --predict N              number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
  -b,    --batch-size N           logical maximum batch size (default: 2048)
  -ub,   --ubatch-size N          physical maximum batch size (default: 512)
         --keep N                 number of tokens to keep from the initial prompt (default: 0, -1 = all)
         --chunks N               max number of chunks to process (default: -1, -1 = all)
  -fa,   --flash-attn             enable Flash Attention (default: disabled)
  -p,    --prompt PROMPT          prompt to start generation with (default: '')
  -f,    --file FNAME             a file containing the prompt (default: none)
         --in-file FNAME          an input file (repeat to specify multiple files)
  -bf,   --binary-file FNAME      binary file containing the prompt (default: none)
  -e,    --escape                 process escapes sequences