Running the Quantized Qwen 1.5B Model from Scratch

我是个大好人

Last modified: 2024-07-17 19:18:28

Tags: python, machine learning, deep learning, language model

First published: 2024-07-16 08:42:24

Article link: https://blog.youkuaiyun.com/a2503099087/article/details/140293293

Qwen-7B

GitHub repository: https://github.com/QwenLM/Qwen-7B

Usage

Requirements

pytorch>=1.12

transformers==4.37.0

Steps

1. Install the required dependencies

pip install transformers==4.37.0 accelerate tiktoken einops

The Python version should not be too new: installing under 3.12 causes many problems, so 3.10 is used here.
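As a quick sanity check before installing, the version constraints above can be tested from Python. This is only a minimal sketch: the helper names is_supported_python and installed_version are made up for illustration, and the 3.12 cutoff simply encodes the observation above rather than any official limit.

```python
import sys
from importlib import metadata


def is_supported_python(version_info=sys.version_info, first_bad=(3, 12)):
    """Return True when the interpreter is older than the first problematic release.

    The (3, 12) cutoff is an assumption taken from this article's experience.
    """
    return tuple(version_info[:2]) < first_bad


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", is_supported_python())
for name in ("torch", "transformers", "accelerate", "tiktoken", "einops"):
    print(name, "->", installed_version(name))
```

A package that prints None has not been installed into the active environment yet.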

Problem 1) Error: can't find Rust compiler

error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (tokenizers)

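This error means that installing the tokenizers package fell back to building it from source, which requires a Rust compiler. The steps below resolve it:
1. Upgrade pip

pip install --upgrade pip

Upgrade pip to the latest version first, because an outdated pip sometimes cannot install the prebuilt wheel.
2. Install the Rust compiler
To install the Rust compiler on Windows:
Download the installer:
Visit the official Rust website, https://www.rust-lang.org/.
Click the "Get Started" button and choose the "Windows" option.
Download the rustup-init.exe installer.
Run the installer:

Double-click the downloaded rustup-init.exe file.
Follow the prompts to finish the installation. The default options are usually the recommended ones, so you can simply press Enter to accept them.
Verify the installation:

Open Command Prompt or PowerShell.

Run the following command: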

rustc --version

If the Rust compiler's version information is displayed, the installation succeeded.
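The same verification can be scripted, which is handy when pip is launched from an environment whose PATH differs from the terminal where rustc was tested. A minimal sketch, assuming only the standard library; tool_version is a hypothetical helper name:

```python
import shutil
import subprocess


def tool_version(name):
    """Return the first line of `<name> --version`.

    Returns None if the tool is not on PATH or prints nothing.
    """
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    output = result.stdout.strip()
    return output.splitlines()[0] if output else None


for tool in ("rustc", "cargo"):
    print(tool, "->", tool_version(tool) or "not found on PATH")
```

If rustc is reported as missing right after installation, opening a new terminal usually picks up the updated PATH.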

Problem 2) Error: linker link.exe not found

 error: linker `link.exe` not found
        |
        = note: program not found

      note: the msvc targets depend on the msvc linker but `link.exe` was not found

      note: please ensure that Visual Studio 2017 or later, or Build Tools for Visual Studio were installed with the Visual C++ option.

      note: VS Code is a different product, and is not sufficient.

      error: could not compile `windows_x86_64_msvc` (build script) due to 1 previous error
        Caused by:
        process didn't exit successfully: `C:\Users\86151\.rustup\toolchains\stable-x86_64-pc-windows-msvc\bin\rustc.exe --crate-name build_script_build --edition=2021 C:\Users\86151\.
cargo\registry\src\index.crates.io-6f17d22bba15001f\windows_x86_64_msvc-0.52.6\build.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin -
-emit=dep-info,link -C embed-bitcode=no -C debug-assertions=off -C metadata=7af625c90bbc6e46 -C extra-filename=-7af625c90bbc6e46 --out-dir C:\Users\86151\AppData\Local\Temp\pip-install
-jkjp9s0u\tokenizers_cdf649cfe0ff4ec5a489183218d65cfb\target\release\build\windows_x86_64_msvc-7af625c90bbc6e46 -L dependency=C:\Users\86151\AppData\Local\Temp\pip-install-jkjp9s0u\tokenizers_cdf649cfe0ff4ec5a489183218d65cfb\target\release\deps --cap-lints allow` (exit code: 1)
      warning: build failed, waiting for other jobs to finish...
      error: could not compile `windows_x86_64_msvc` (build script) due to 1 previous error

      Caused by:
        process didn't exit successfully: `C:\Users\86151\.rustup\toolchains\stable-x86_64-pc-windows-msvc\bin\rustc.exe --crate-name build_script_build --edition=2018 C:\Users\86151\.
cargo\registry\src\index.crates.io-6f17d22bba15001f\windows_x86_64_msvc-0.48.5\build.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin -
-emit=dep-info,link -C embed-bitcode=no -C debug-assertions=off -C metadata=40923ccbd947c781 -C extra-filename=-40923ccbd947c781 --out-dir C:\Users\86151\AppData\Local\Temp\pip-install
-jkjp9s0u\tokenizers_cdf649cfe0ff4ec5a489183218d65cfb\target\release\build\windows_x86_64_msvc-40923ccbd947c781 -L dependency=C:\Users\86151\AppData\Local\Temp\pip-install-jkjp9s0u\tokenizers_cdf649cfe0ff4ec5a489183218d65cfb\target\release\deps --cap-lints allow` (exit code: 1)
      error: could not compile `proc-macro2` (build script) due to 1 previous error

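This error means that while compiling the windows_x86_64_msvc crate, the Rust compiler could not find link.exe. link.exe is the linker in the Microsoft Visual C++ (MSVC) toolchain; it links compiled object files into executables or libraries. To fix this, install the "Desktop development with C++" workload as follows:
Open Visual Studio Installer.

Click Modify and install the workload.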

Problem 3) Error: AssertionError: Torch not compiled with CUDA enabled

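This error means that the installed PyTorch build was not compiled with CUDA support. CUDA is NVIDIA's parallel computing platform and programming model for high-performance computing on NVIDIA GPUs.
To fix it, make sure your PyTorch build supports CUDA. The steps are:

1. Check CUDA support
First, check whether your PyTorch build supports CUDA: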

import torch
print(torch.cuda.is_available())

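If the output is False, PyTorch cannot use CUDA: either a CPU-only build is installed, or no usable GPU and driver were found.
How do you check which PyTorch version is installed?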

import torch
print(torch.__version__)

The output shows that the PyTorch version is 2.3.1+cpu, which contains the following information:
2.3.1: the PyTorch release number, meaning PyTorch 2.3.1 is in use.
+cpu: this suffix marks a CPU-only build. It does not include NVIDIA CUDA support, so it cannot use the GPU for acceleration.
In short, 2.3.1+cpu means that PyTorch 2.3.1 is installed as a CPU-only build without GPU acceleration. To compute on the GPU, install a CUDA-enabled PyTorch build.
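The reading of the version string described above can be expressed as a small parser. This is an illustrative sketch, not part of PyTorch; split_torch_version and describe_build are hypothetical names. A version without any "+" suffix does not tell you which build it is, so in that case torch.version.cuda is the more reliable check.

```python
def split_torch_version(version):
    """Split '2.3.1+cpu' into ('2.3.1', 'cpu'); the tag is None without a '+' suffix."""
    release, _, tag = version.partition("+")
    return release, (tag or None)


def describe_build(version):
    """Describe the build type implied by the local version tag."""
    _, tag = split_torch_version(version)
    if tag == "cpu":
        return "CPU-only build"
    if tag and tag.startswith("cu"):
        return "CUDA build (" + tag + ")"
    return "no build tag; check torch.version.cuda instead"


print(describe_build("2.3.1+cpu"))    # CPU-only build
print(describe_build("2.3.1+cu121"))  # CUDA build (cu121)
```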
2. Install a CUDA-enabled PyTorch build
Use one of the following commands to install a CUDA-enabled PyTorch build. Choose the command that matches your CUDA version.
CUDA 11.7

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117

CUDA 11.3

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Check the CUDA version
Make sure a suitable CUDA version is installed on your system. You can check the installed CUDA Toolkit version with:

nvcc --version

Or, on Windows, check the driver and the highest CUDA version it supports with:

nvidia-smi
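To read that value programmatically, one option is to run nvidia-smi and pull the "CUDA Version" field out of its header. A minimal sketch, assuming the header contains text of the form "CUDA Version: 12.3"; both function names are hypothetical:

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Extract the 'CUDA Version: X.Y' value from nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def driver_cuda_version():
    """Run nvidia-smi if it is on PATH and return the CUDA version it reports."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return parse_cuda_version(result.stdout)


print("Driver supports CUDA up to:", driver_cuda_version())
```

The value reported here is the maximum the driver supports, which can be higher than the CUDA Toolkit version printed by nvcc.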

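Install the NVIDIA driver
Install the NVIDIA driver from the official website (https://developer.nvidia.com/).
First, check which CUDA version your system supports in the NVIDIA Control Panel.

Click System Information.
On the Components tab, the third row shows the supported CUDA version, which is 12.3 here.

Download link for version 12.3: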
https://developer.nvidia.com/cuda-12-3-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local

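Environment variables

Use Anaconda3 to manage virtual environments.
Download: https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/?C=M&O=D
Configure the environment variables by adding these directories to PATH: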

D:\ProgramData\anaconda3 
D:\ProgramData\anaconda3\Scripts 
D:\ProgramData\anaconda3\Library\mingw-w64\bin
D:\ProgramData\anaconda3\Library\usr\bin 
D:\ProgramData\anaconda3\Library\bin
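Whether these entries actually took effect can be checked from Python. The sketch below assumes Anaconda lives under D:\ProgramData\anaconda3 as in the list above; missing_from_path is a hypothetical helper, and the comparison ignores case and trailing backslashes.

```python
import os

# Directories listed above; adjust them if Anaconda is installed elsewhere.
ANACONDA_DIRS = [
    r"D:\ProgramData\anaconda3",
    r"D:\ProgramData\anaconda3\Scripts",
    r"D:\ProgramData\anaconda3\Library\mingw-w64\bin",
    r"D:\ProgramData\anaconda3\Library\usr\bin",
    r"D:\ProgramData\anaconda3\Library\bin",
]


def missing_from_path(required, path_value=None, sep=os.pathsep):
    """Return the required directories that do not appear in the PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize each PATH entry: trim spaces, drop trailing backslashes, lowercase.
    present = {
        entry.strip().rstrip("\\").lower()
        for entry in path_value.split(sep)
        if entry.strip()
    }
    return [d for d in required if d.rstrip("\\").lower() not in present]


print("Missing from PATH:", missing_from_path(ANACONDA_DIRS))
```

An empty list means every directory is present; run it in a new terminal so that recent changes to PATH are visible.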

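After installation, open Anaconda Prompt to configure the mirror sources.
Change Anaconda's default download source (optional)
(1) Open Anaconda Prompt and enter the following commands to add the Tsinghua mirror (these four commands complete the switch):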

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/

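The mirror seemed to have some problems? (2024-07-15 19:18:03)
(2) Show channel URLs when searching (commands 2-7 below are optional; their meanings are given for reference)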

conda config --set show_channel_urls yes

(3) Set pip to use the Tsinghua mirror (open Anaconda Prompt and enter the following):

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

(4) Show the configured channels

conda config --show channels

(5) Add a channel (replace URL with the channel's web address)

conda config --add channels URL

(6) Remove a channel (replace URL with the channel's web address, as in the example below)

conda config --remove channels URL

For example: conda config --remove channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

(7) Restore the default channels

conda config --remove-key channels

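Open Anaconda Navigator and click Create to create a virtual environment.
Download the PyTorch build that matches CUDA 12.3.

For details, see http://t.csdnimg.cn/RSaJE and http://t.csdnimg.cn/pQMsb

Download cuDNN (https://developer.nvidia.com/rdp/cudnn-download).
If you do not have an NVIDIA account, register one first.
After the download finishes, extract the archive and copy its three folders into the CUDA Toolkit installation path from the earlier step.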
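The copy step can also be scripted. The sketch below assumes the three folders are named bin, include and lib, and both paths in the commented-out call are hypothetical placeholders; copy_cudnn is an illustrative helper, not an NVIDIA tool.

```python
import shutil
from pathlib import Path


def copy_cudnn(cudnn_dir, cuda_dir, folders=("bin", "include", "lib")):
    """Merge the extracted cuDNN folders into the CUDA Toolkit directory.

    Returns the names of the folders that were actually copied.
    """
    copied = []
    for name in folders:
        src = Path(cudnn_dir) / name
        if not src.is_dir():
            continue  # skip folders the extracted archive does not contain
        # dirs_exist_ok=True merges into the existing CUDA folder (Python 3.8+).
        shutil.copytree(src, Path(cuda_dir) / name, dirs_exist_ok=True)
        copied.append(name)
    return copied


# Hypothetical paths: replace both with the real locations on your machine.
# copy_cudnn(r"D:\Downloads\cudnn",
#            r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3")
```

Writing under Program Files normally requires an elevated (administrator) prompt.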

Add these three folders to the environment variables.
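Before loading the model, it can help to confirm that the new environment really sees the GPU. A minimal sketch; torch_cuda_report is a hypothetical helper that reports what it finds and does nothing else, and it returns early if torch is not installed.

```python
import importlib.util


def torch_cuda_report():
    """Collect basic facts about the installed PyTorch build and GPU visibility."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(torch_cuda_report())
```

If cuda_available is still False here, the AssertionError from Problem 3 is likely to reappear when the model is moved to the GPU.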

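Once the environment is configured, the code to run is as follows:

Note: the quantized Qwen 1.5B model cannot use the chat method; calling it raises AttributeError: 'Qwen2ForCausalLM' object has no attribute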

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Local directory containing the downloaded model. The raw string keeps the
# Windows backslashes from being read as escape sequences.
model_path = r"D:\pycharmworkSpace\Qwen1.5-1.8B-Chat-GPTQ-Int4"

# Note: the tokenizer applies special handling against special-token attacks, so inputs such as <|endoftext|> raise an error.
# To disable this policy, pass `allowed_special`, which accepts the string "all" or a `set` of special tokens.
# Example: tokens = tokenizer(text, allowed_special="all")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# CPU inference with Qwen-7B-Chat needs about 32 GB of RAM:
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# device_map="auto" places the quantized model on the available device.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True).eval()
# Generation length, top_p and other hyperparameters can be changed here.
model.generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)


prompt = "给我简短的介绍一下大模型。"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation into the prompt format the chat model expects.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to whichever device the model was placed on.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Passing the whole encoding also supplies the attention mask.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated reply remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]


print(response)
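Since the note above says the chat method is unavailable on this model, the generate-based steps can be wrapped in a small function that behaves like a chat call. This is only a sketch under stated assumptions: chat_once is a hypothetical helper, not part of transformers, and it expects the model and tokenizer objects loaded above.

```python
def chat_once(model, tokenizer, prompt, history=None,
              system="You are a helpful assistant.", max_new_tokens=512):
    """Send one user message and return (response, updated_history).

    `history` is a list of {"role": ..., "content": ...} dicts from earlier turns.
    """
    history = list(history or [])
    messages = [{"role": "system", "content": system}]
    messages += history
    messages.append({"role": "user", "content": prompt})

    # Render the conversation into the model's prompt format.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the tokens generated after the prompt.
    prompt_length = inputs.input_ids.shape[1]
    new_ids = output_ids[0][prompt_length:]
    response = tokenizer.decode(new_ids, skip_special_tokens=True)

    history.append({"role": "user", "content": prompt})
    history.append({"role": "assistant", "content": response})
    return response, history


# Usage, with the model and tokenizer loaded as shown earlier:
# response, history = chat_once(model, tokenizer, "Hello")
# response, history = chat_once(model, tokenizer, "Tell me more", history)
```

Passing the returned history back into the next call is what gives the model the context of earlier turns.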

It ran successfully:

(qw22) PS D:\pycharmworkSpace\pythonProject\qw22> python tttt.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
CUDA extension not installed.
CUDA extension not installed.
Some weights of the model checkpoint at D:\pycharmworkSpace\Qwen1.5-1.8B-Chat-GPTQ-Int4 were not used when initializing Qwen2ForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.l
ayers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model
.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'mo
del.layers.10.self_attn.o_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11.mlp.gate_proj.bias', 'model.layers.11.mlp.up_proj.bias', 'model.layers.11.self_attn.o_proj.
bias', 'model.layers.12.mlp.down_proj.bias', 'model.layers.12.mlp.gate_proj.bias', 'model.layers.12.mlp.up_proj.bias', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.13.mlp.dow
n_proj.bias', 'model.layers.13.mlp.gate_proj.bias', 'model.layers.13.mlp.up_proj.bias', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.14.mlp.down_proj.bias', 'model.layers.14.
mlp.gate_proj.bias', 'model.layers.14.mlp.up_proj.bias', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.15.mlp.down_proj.bias', 'model.layers.15.mlp.gate_proj.bias', 'model.lay
ers.15.mlp.up_proj.bias', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.16.mlp.down_proj.bias', 'model.layers.16.mlp.gate_proj.bias', 'model.layers.16.mlp.up_proj.bias', 'mode
l.layers.16.self_attn.o_proj.bias', 'model.layers.17.mlp.down_proj.bias', 'model.layers.17.mlp.gate_proj.bias', 'model.layers.17.mlp.up_proj.bias', 'model.layers.17.self_attn.o_proj.bi
as', 'model.layers.18.mlp.down_proj.bias', 'model.layers.18.mlp.gate_proj.bias', 'model.layers.18.mlp.up_proj.bias', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.19.mlp.down_
proj.bias', 'model.layers.19.mlp.gate_proj.bias', 'model.layers.19.mlp.up_proj.bias', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.2.mlp.down_proj.bias', 'model.layers.2.mlp.
gate_proj.bias', 'model.layers.2.mlp.up_proj.bias', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.20.mlp.down_proj.bias', 'model.layers.20.mlp.gate_proj.bias', 'model.layers.20
.mlp.up_proj.bias', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.21.mlp.down_proj.bias', 'model.layers.21.mlp.gate_proj.bias', 'model.layers.21.mlp.up_proj.bias', 'model.laye
rs.21.self_attn.o_proj.bias', 'model.layers.22.mlp.down_proj.bias', 'model.layers.22.mlp.gate_proj.bias', 'model.layers.22.mlp.up_proj.bias', 'model.layers.22.self_attn.o_proj.bias', '
model.layers.23.mlp.down_proj.bias', 'model.layers.23.mlp.gate_proj.bias', 'model.layers.23.mlp.up_proj.bias', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.3.mlp.down_proj.bi
as', 'model.layers.3.mlp.gate_proj.bias', 'model.layers.3.mlp.up_proj.bias', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.4.mlp.down_proj.bias', 'model.layers.4.mlp.gate_proj.
bias', 'model.layers.4.mlp.up_proj.bias', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.5.mlp.down_proj.bias', 'model.layers.5.mlp.gate_proj.bias', 'model.layers.5.mlp.up_proj.
bias', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.6.mlp.down_proj.bias', 'model.layers.6.mlp.gate_proj.bias', 'model.layers.6.mlp.up_proj.bias', 'model.layers.6.self_attn.o_
proj.bias', 'model.layers.7.mlp.down_proj.bias', 'model.layers.7.mlp.gate_proj.bias', 'model.layers.7.mlp.up_proj.bias', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.8.mlp.dow
n_proj.bias', 'model.layers.8.mlp.gate_proj.bias', 'model.layers.8.mlp.up_proj.bias', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.9.mlp.down_proj.bias', 'model.layers.9.mlp.gate_proj.bias', 'model.layers.9.mlp.up_proj.bias', 'model.layers.9.self_attn.o_proj.bias']
- This IS expected if you are initializing Qwen2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Qwen2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
D:\qw22\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py:698: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(


A large language model (LLM) is a type of artificial intelligence system that can generate human-like text based on patterns and rules learned from vast amounts of text data. These models have become increasingly popular in recent years due to their ability to process and understand complex language, as well as the potential applications they offer in various industries such as language translation, content generation, chatbots, and virtual assistants.

The main idea behind an LLM is to train a deep neural network, typically consisting of multiple layers of interconnected nodes, using a large dataset of text examples. The input to the network is usually a sentence or a piece of text, which it then processes through a series of steps:

1. Tokenization: Breaking down the input text into individual words or phrases called tokens.
2. Part-of-speech tagging: Identifying the grammatical structure of each token, such as noun, verb, adjective, etc.
3. Named entity recognition: Recognizing specific entities within the text, such as people, organizations, locations, and dates.
4. Dependency parsing: Analyzing the relationships between different parts of the sentence, such as subject-verb-object or prepositional phrase.
5. Sentiment analysis: Determining the overall sentiment expressed in the text, whether positive, negative, neutral, or mixed.
6. Contextual understanding: Taking into account the surrounding sentences, clauses, and phrases to generate coherent and relevant output.

During training, the LLM is exposed to a variety of language variations, genres, and contexts, which helps it learn to recognize patterns and common structures across different texts. This allows the model to generate natural-sounding responses to user queries or provide information on a wide range of topics.

The resulting output of an LLM can be used for various purposes, such as generating text-based content, summarizing lengthy documents, answering questions, translating languages, and even creating interactive dialogue with users. Some notable examples include:

1. Chatbots: Chatbots use LLMs to interact with humans by responding to queries, providing information, or engaging in casual conversation.
2. Content creation: LLMs can assist writers in generating new ideas, stories, scripts, or blog posts, by understanding context, generating alternative phrasing, or adapting to specific styles or tones.
3. Language translation: Translating text from one language to another using LLMs that have been trained on multilingual datasets.
4. Sentiment analysis: Using LLMs to analyze public opinions, news articles, social media posts, or product reviews to
