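OpenCompass Model Evaluation

OpenCompass is an open-source, efficient, and comprehensive evaluation platform for large models, aimed at both the teams that release large models and the people who use them.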

I. OpenCompass documentation

1. Basic installation
Use Conda to prepare the OpenCompass runtime environment:

conda create --name opencompass python=3.10 -y
conda activate opencompass

2. Install OpenCompass

git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
root@dsw-990246-5b546cb984-pcfzq:/mnt/workspace# conda create --name opencompass python=3.10 -y
root@dsw-990246-5b546cb984-pcfzq:/mnt/workspace# conda activate opencompass
(opencompass) root@dsw-990246-5b546cb984-pcfzq:/mnt/workspace# cd opencompass
(opencompass) root@dsw-990246-5b546cb984-pcfzq:/mnt/workspace/opencompass# pip install -e .
### The command fails with the following error
Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
Obtaining file:///mnt/workspace/opencompass
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [56 lines of output]
      /mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/_distutils_hack/__init__.py:53: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
        warnings.warn(
      /mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!
      
              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************
      
      !!
        dist.fetch_build_eggs(dist.setup_requires)
      WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mirrors.cloud.aliyuncs.com'. (_ssl.c:1007)"))': /pypi/simple/nltk/
      WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mirrors.cloud.aliyuncs.com'. (_ssl.c:1007)"))': /pypi/simple/nltk/
      WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mirrors.cloud.aliyuncs.com'. (_ssl.c:1007)"))': /pypi/simple/nltk/
      WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mirrors.cloud.aliyuncs.com'. (_ssl.c:1007)"))': /pypi/simple/nltk/
      WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'mirrors.cloud.aliyuncs.com'. (_ssl.c:1007)"))': /pypi/simple/nltk/
      ERROR: Could not find a version that satisfies the requirement nltk==3.8 (from versions: none)
      ERROR: No matching distribution found for nltk==3.8
      Traceback (most recent call last):
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/installer.py", line 107, in _fetch_build_egg_no_warn
          subprocess.check_call(cmd)
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/subprocess.py", line 369, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/mnt/workspace/Anaconda3/envs/opencompass/bin/python', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpvvttj4xe', '--quiet', 'nltk==3.8']' returned non-zero exit status 1.
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/mnt/workspace/opencompass/setup.py", line 164, in <module>
          do_setup()
        File "/mnt/workspace/opencompass/setup.py", line 108, in do_setup
          setup(
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/__init__.py", line 116, in setup
          _install_setup_requires(attrs)
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/__init__.py", line 89, in _install_setup_requires
          _fetch_build_eggs(dist)
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/__init__.py", line 94, in _fetch_build_eggs
          dist.fetch_build_eggs(dist.setup_requires)
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/dist.py", line 768, in fetch_build_eggs
          return _fetch_build_eggs(self, requires)
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/installer.py", line 44, in _fetch_build_eggs
          resolved_dists = pkg_resources.working_set.resolve(
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/pkg_resources/__init__.py", line 893, in resolve
          dist = self._resolve_dist(
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/pkg_resources/__init__.py", line 929, in _resolve_dist
          dist = best[req.key] = env.best_match(
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1267, in best_match
          return self.obtain(req, installer)
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1303, in obtain
          return installer(requirement) if installer else None
        File "/mnt/workspace/Anaconda3/envs/opencompass/lib/python3.10/site-packages/setuptools/installer.py", line 109, in _fetch_build_egg_no_warn
          raise DistutilsError(str(e)) from e
      distutils.errors.DistutilsError: Command '['/mnt/workspace/Anaconda3/envs/opencompass/bin/python', '-m', 'pip', '--disable-pip-version-check', 'wheel', '--no-deps', '-w', '/tmp/tmpvvttj4xe', '--quiet', 'nltk==3.8']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
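The log shows that the build step tried to fetch nltk==3.8 from https://mirrors.cloud.aliyuncs.com/pypi/simple and failed SSL certificate verification (hostname mismatch), so no matching distribution was found. Installing the runtime requirements directly succeeded instead: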
(opencompass) root@dsw-990246-5b546cb984-pcfzq:/mnt/workspace/opencompass# pip install -r /mnt/workspace/opencompass/requirements/runtime.txt
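
Because this command installs the dependencies from a requirements file rather than the package itself, it can help to confirm that the modules you need are importable before starting an evaluation. The following is a minimal sketch, not part of OpenCompass; the helper name missing_modules is hypothetical, and which modules to check is up to you.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["nltk", "torch"]) returns an empty list when both are installed in the active environment.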

3. Prepare the datasets
Run the following commands in the OpenCompass project root to place the datasets under ${OpenCompass}/data:

wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip
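
If unzip is not available, the extraction step can be approximated in Python. This is a minimal sketch under the assumption, implied by the text above, that the archive unpacks into a top-level data directory; the function name extract_datasets is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_datasets(zip_path, project_root):
    """Extract the archive into project_root and return the top-level names it contains."""
    project_root = Path(project_root)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(project_root)
        # Archive member names always use "/" as the separator.
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
    return sorted(top_level)
```

Calling extract_datasets("OpenCompassData-core-20240207.zip", ".") from the project root should then report which top-level directories were created, which is a quick way to check that the data landed where expected.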

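4. Configure the evaluation task
Command line (custom HF model)
For HuggingFace models, users can set the model parameters directly on the command line, without an additional configuration file. For example, for the internlm/internlm2-chat-1_8b model, an evaluation can be started with a command of the following form (the runs below use local Qwen checkpoints instead):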

This does not seem to work properly on Windows:
 python run.py --datasets demo_gsm8k_chat_gen demo_math_chat_gen  --hf-type chat --hf-path "D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\model\Qwen\Qwen1___5-0___5B-Chat" --debug
 python run.py --datasets demo_gsm8k_chat_gen demo_math_chat_gen --hf-path "D:\Program Files\python\PycharmProjects\AiStudyProject\demo07\models\Qwen\Qwen2___5-1___5B-Instruct" --debug

The error is as follows:

signal.SIGALRM is not available on this platform
signal.SIGALRM is not available on this platform
04/11 15:38:55 - OpenCompass - INFO - Loading demo_gsm8k_chat_gen: D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\opencompass\configs\./datasets\demo\demo_gsm8k_chat_gen.py
04/11 15:38:55 - OpenCompass - INFO - Loading demo_math_chat_gen: D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\opencompass\configs\./datasets\demo\demo_math_chat_gen.py
04/11 15:38:55 - OpenCompass - INFO - Loading example: D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\opencompass\configs\./summarizers\example.py
04/11 15:38:55 - OpenCompass - INFO - Current exp folder: outputs\default\20250411_153855
04/11 15:38:56 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
demo_gsm8k train 7473
demo_gsm8k test 1319
demo_math test 5000
demo_math train 5000
04/11 15:38:59 - OpenCompass - INFO - Partitioned into 1 tasks.
Traceback (most recent call last):
  File "D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\run.py", line 4, in <module>
    main()
  File "D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\opencompass\cli\main.py", line 339, in main
    runner(tasks)
  File "D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\opencompass\runners\base.py", line 38, in __call__
    status = self.launch(tasks)
  File "D:\Program Files\python\PycharmProjects\AiStudyProject\demo14\opencompass\opencompass\runners\local.py", line 102, in launch
    assert len(all_gpu_ids) >= num_gpus
AssertionError

AssertionError: assert len(all_gpu_ids) >= num_gpus. This indicates that the number of available GPUs detected on the system is smaller than the number of GPUs the task requires.
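
The failing condition can be pictured with a small sketch. This is not the actual code in opencompass/runners/local.py; it is an illustration that assumes the visible GPUs come either from CUDA_VISIBLE_DEVICES or from the device count reported by the framework (for example torch.cuda.device_count()). The function names are hypothetical.

```python
import os
import re


def visible_gpu_ids(device_count, env=None):
    """List the GPU ids a process may use.

    device_count: number of GPUs the framework reports.
    env: environment mapping; defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is not None:
        # Ignore negative entries such as "-1", which hide all GPUs.
        return [int(tok) for tok in re.findall(r"(?<!-)\d+", raw)]
    return list(range(device_count))


def enough_gpus(num_gpus, device_count, env=None):
    """Check the asserted condition: at least num_gpus GPUs are visible."""
    return len(visible_gpu_ids(device_count, env)) >= num_gpus
```

On a machine where no GPU is detected, enough_gpus(1, 0, env={}) is False while enough_gpus(0, 0, env={}) is True, which is consistent with the two workarounds that follow: make a GPU visible, or request zero GPUs.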

Install a CUDA-enabled PyTorch build (here the cu121 wheels), then specify the number of GPUs:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
 python run.py --datasets demo_gsm8k_chat_gen demo_math_chat_gen --hf-path "D:\Program Files\python\PycharmProjects\AiStudyProject\demo07\models\Qwen\Qwen2___5-1___5B-Instruct" --debug --hf-num-gpus 1  # number of GPUs to use

Or run directly on the CPU:

 python run.py --datasets demo_gsm8k_chat_gen demo_math_chat_gen --hf-path "D:\Program Files\python\PycharmProjects\AiStudyProject\demo07\models\Qwen\Qwen2___5-1___5B-Instruct" --debug --hf-num-gpus 0

After a successful run, an outputs directory is created.

However, it contained no evaluation results, and the cause was not obvious. The log shows:

D:\envs\opencompass\python.exe: can't open file 'D:\\Program': [Errno 2] No such file or directory
D:\envs\opencompass\python.exe: can't open file 'D:\\Program': [Errno 2] No such file or directory
D:\envs\opencompass\python.exe: can't open file 'D:\\Program': [Errno 2] No such file or directory

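The message suggests that the path under D:\Program Files is being cut at the space when the task command is launched. No matter what I changed, the same problem persisted, so I switched to a Linux environment.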
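
The sketch below only illustrates how a path containing a space gets split when a command is assembled as a plain string. It is a guess at the cause based on the log line, not a diagnosis of OpenCompass itself; the script path is hypothetical. subprocess.list2cmdline is the helper Python uses when it builds Windows command lines from an argument list.

```python
import subprocess

script = r"D:\Program Files\demo\task.py"  # hypothetical path containing a space

# Naive string concatenation leaves the space unprotected,
# so whitespace splitting yields "D:\Program" as the script argument.
naive = "python " + script
print(naive.split())

# Building the command from an argument list quotes the path as one token.
quoted = subprocess.list2cmdline(["python", script])
print(quoted)
```

If this is indeed the cause, placing the project in a directory without spaces would be one thing to try on Windows.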

On Linux it works without problems:
(opencompass) root@dsw-990246-5b546cb984-pcfzq:/mnt/workspace/opencompass# python run.py --datasets demo_gsm8k_chat_gen demo_math_chat_gen  --hf-type chat --hf-path "/mnt/workspace/llm/Qwen/Qwen1.5-0.5B-Chat" --debug

The run starts normally, writes the outputs files as expected, and produces evaluation results.

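Note that with this approach OpenCompass evaluates only one model at a time, whereas the other approaches can evaluate several models at once.
Command line
Users can combine the models and datasets they want to test with --models and --datasets.

The values passed to --models must be the names of existing model config files (without the .py extension), such as the two used in the command below; otherwise an error is raised.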

python run.py --models hf_qwen2_5_0_5b_instruct hf_qwen1_5_0_5b_chat --datasets demo_gsm8k_chat_gen demo_math_chat_gen --debug

The corresponding config files are located under opencompass/opencompass/configs/models/qwen2_5 or opencompass/opencompass/configs/models/qwen.
When the run finishes, the scores can be found in: opencompass/outputs/default/20250411_174951/summary/summary_20250411_174951.md
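
Since every run gets its own timestamped folder, a small helper can locate the most recent summary file. This is a minimal sketch that assumes the layout seen in the path above (outputs/default/<timestamp>/summary/summary_<timestamp>.md); the function name latest_summary is hypothetical.

```python
from pathlib import Path


def latest_summary(outputs_root="outputs/default"):
    """Return the summary .md path of the most recent run, or None if there is none."""
    root = Path(outputs_root)
    if not root.is_dir():
        return None
    # Run folders are named YYYYMMDD_HHMMSS, so name order is time order.
    run_dirs = sorted((p for p in root.iterdir() if p.is_dir()), reverse=True)
    for run_dir in run_dirs:
        candidate = run_dir / "summary" / f"summary_{run_dir.name}.md"
        if candidate.is_file():
            return candidate
    return None
```

Runs that failed before producing a summary, like the Windows attempt above, are skipped automatically.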

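The model and dataset configuration files are stored in configs/models and configs/datasets. Users can run tools/list_configs.py to view or filter the currently available model and dataset configurations.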

# List all configurations
python tools/list_configs.py
# List all configurations related to llama and mmlu
python tools/list_configs.py llama mmlu

Running python tools/list_configs.py llama mmlu produces output like the following:

+-----------------+-----------------------------------+
| Model           | Config Path                       |
|-----------------+-----------------------------------|
| hf_llama2_13b   | configs/models/hf_llama2_13b.py   |
| hf_llama2_70b   | configs/models/hf_llama2_70b.py   |
| ...             | ...                               |
+-----------------+-----------------------------------+
+-------------------+---------------------------------------------------+
| Dataset           | Config Path                                       |
|-------------------+---------------------------------------------------|
| cmmlu_gen         | configs/datasets/cmmlu/cmmlu_gen.py               |
| cmmlu_gen_ffe7c0  | configs/datasets/cmmlu/cmmlu_gen_ffe7c0.py        |
| ...               | ...                                               |
+-------------------+---------------------------------------------------+

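Users can pass the names in the first column as the --models and --datasets arguments of python run.py. For datasets, different suffixes on the same name usually indicate different prompts or evaluation methods.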
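
In the listing above, each name in the first column matches the file name of its config without the .py extension. The sketch below uses that observation to list candidate names from a config directory. It is an illustration only and does not reimplement tools/list_configs.py; the function name config_names is hypothetical.

```python
from pathlib import Path


def config_names(config_dir, keyword=""):
    """Return sorted config names (file stems) under config_dir that contain keyword."""
    names = {
        path.stem
        for path in Path(config_dir).rglob("*.py")
        if path.stem != "__init__" and keyword in path.stem
    }
    return sorted(names)
```

For instance, config_names("opencompass/configs/models", "qwen") would list the Qwen model config names that can be passed to --models, assuming that directory exists in your checkout.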

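II. Configuration files

Besides configuring experiments on the command line, OpenCompass also lets users write the complete experiment configuration in a config file and run it directly with run.py. Config files are written in Python and must include the datasets and models fields.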

from mmengine.config import read_base

with read_base():
    from .datasets.demo.demo_gsm8k_chat_gen import gsm8k_datasets
    from .datasets.demo.demo_math_chat_gen import math_datasets
    from .models.qwen.hf_qwen1_5_0_5b_chat import models as hf_qwen1_5_0_5b_chat
    from .models.qwen2_5.hf_qwen2_5_1_5b_instruct import models as hf_qwen2_5_1_5b_instruct

datasets = gsm8k_datasets + math_datasets
models = hf_qwen1_5_0_5b_chat + hf_qwen2_5_1_5b_instruct

To run the task, simply pass the path of the config file to run.py:

python run.py configs/eval_chat_demo.py --debug

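III. Custom datasets

Dataset format
Multiple-choice questions (mcq)

For multiple-choice (mcq) data, the default fields are:

  • question: the stem of the multiple-choice question.
  • A, B, C, ...: each option is a single uppercase letter; the number of options is not limited. By default, only consecutive letters starting from A are parsed as options.
  • answer: the correct answer; its value must be one of the option letters in use, such as A, B, or C.

Non-default fields are also read in but are not used by default. To use them, specify them in the .meta.json file.

.jsonl format sample: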
{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}

.csv format sample:
question,A,B,C,answer
127+545+588+620+556+199=,2632,2635,2645,B
735+603+102+335+605=,2376,2380,2410,B
506+346+920+451+910+142+659+850=,4766,4774,4784,C
504+811+870+445=,2615,2630,2750,B
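
The field rules above can be checked with a few lines of Python before handing a file to OpenCompass. This is a minimal sketch of those rules as stated in this section, not OpenCompass's own parser; the function names are hypothetical.

```python
import string


def option_letters(record):
    """Collect consecutive option keys starting from 'A' (A, B, C, ...)."""
    letters = []
    for letter in string.ascii_uppercase:
        if letter not in record:
            break  # stop at the first gap; later letters are not treated as options
        letters.append(letter)
    return letters


def is_valid_mcq(record):
    """True if the record has a question, at least one option, and an answer among the options."""
    options = option_letters(record)
    return "question" in record and bool(options) and record.get("answer") in options
```

A record with keys A and C but no B yields only ["A"] as options, so an answer of "C" would be rejected by this check.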
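Question answering (qa)

For question-answering (qa) data, the default fields are:

  • question: the stem of the question.
  • answer: the correct answer. It may be omitted, which means the dataset has no reference answers.

Non-default fields are also read in but are not used by default. To use them, specify them in the .meta.json file.

.jsonl format sample: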
{"question": "752+361+181+933+235+986=", "answer": "3448"}
{"question": "712+165+223+711=", "answer": "1811"}
{"question": "921+975+888+539=", "answer": "3323"}
{"question": "752+321+388+643+568+982+468+397=", "answer": "4519"}

.csv format sample:

question,answer
123+147+874+850+915+163+291+604=,3967
149+646+241+898+822+386=,3142
332+424+582+962+735+798+653+214=,4700
649+215+412+495+220+738+989+452=,4170
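
Both sample formats map naturally onto a list of dictionaries. The following is a minimal sketch for inspecting such files with the standard library; it is independent of how OpenCompass loads them, and the function name load_records is hypothetical.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a .jsonl or .csv dataset file into a list of dicts."""
    path = Path(path)
    if path.suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            # One JSON object per non-empty line.
            return [json.loads(line) for line in f if line.strip()]
    if path.suffix == ".csv":
        with path.open(encoding="utf-8", newline="") as f:
            # The header row supplies the field names.
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported file type: {path.suffix}")
```

Note that values read from .csv are always strings, which matches the .jsonl samples above where answers are also stored as strings.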

Command-line usage
A custom dataset can be evaluated directly from the command line.

python run.py --models hf_llama2_7b --custom-dataset-path xxx/test_mcq.csv --custom-dataset-data-type mcq --custom-dataset-infer-method ppl
python run.py \
--models hf_llama2_7b \
--custom-dataset-path xxx/test_qa.jsonl \
--custom-dataset-data-type qa \
--custom-dataset-infer-method gen

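In the vast majority of cases, --custom-dataset-data-type and --custom-dataset-infer-method can be omitted, and OpenCompass sets them according to the following logic:

  • If options such as A, B, C can be parsed from the dataset file, the dataset is treated as mcq; otherwise it is treated as qa.
  • The default infer_method is gen.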
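
The defaulting rules can be summarized in a short sketch. It simplifies option detection to the presence of an "A" field and is only an illustration of the logic described above, not OpenCompass's implementation; both function names are hypothetical.

```python
def infer_data_type(record):
    """Guess 'mcq' when an option field starting at 'A' is present, else 'qa'."""
    return "mcq" if "A" in record else "qa"


def resolve_settings(record, data_type=None, infer_method=None):
    """Fill in omitted settings following the defaults described above."""
    return {
        "data_type": data_type or infer_data_type(record),
        "infer_method": infer_method or "gen",
    }
```

Explicit values still win: resolve_settings(record, infer_method="ppl") keeps ppl, as in the first command above.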