Comparing Qwen2-7B and AlchemistCoder-DS-6.7B on HumanEval+ with OpenCompass

mybbsss

Article link: https://blog.youkuaiyun.com/mycomin/article/details/139624998

AlchemistCoder

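AlchemistCoder is a code generation model from InternLM.
Here it is compared against Qwen2-7B, a recently well-regarded model with a similar parameter count.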

Download the models

Download each model to a local directory (via hf-mirror):

huggingface-cli download --resume-download internlm/AlchemistCoder-DS-6.7B --local-dir /root/models/AlchemistCoder-DS-6.7B/

huggingface-cli download --resume-download Qwen/Qwen2-7B-Instruct --local-dir /root/models/Qwen2-7B-Instruct/
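
The two commands above do not show how hf-mirror is selected. Below is a minimal sketch, assuming the mirror is chosen by setting the HF_ENDPOINT environment variable to https://hf-mirror.com (the post only says hf-mirror was used, so this is an assumption). build_download is a hypothetical helper name, not part of any library.

```python
import os

# Assumption: hf-mirror is selected through the HF_ENDPOINT environment variable.
HF_MIRROR = "https://hf-mirror.com"

# Repositories and target directories taken from the commands in this post.
MODELS = {
    "internlm/AlchemistCoder-DS-6.7B": "/root/models/AlchemistCoder-DS-6.7B/",
    "Qwen/Qwen2-7B-Instruct": "/root/models/Qwen2-7B-Instruct/",
}


def build_download(repo_id, local_dir):
    """Return (argv, env) for one huggingface-cli download routed via the mirror."""
    argv = [
        "huggingface-cli", "download", "--resume-download",
        repo_id, "--local-dir", local_dir,
    ]
    env = dict(os.environ, HF_ENDPOINT=HF_MIRROR)  # copy, do not mutate os.environ
    return argv, env


for repo_id, local_dir in MODELS.items():
    argv, env = build_download(repo_id, local_dir)
    print(" ".join(argv))
    # To actually download, pass both values to subprocess:
    # subprocess.run(argv, env=env, check=True)
```

The loop only prints the commands; the subprocess call is left commented out so that nothing is downloaded by accident.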

Run the evaluation

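# Flags used below:
#   --hf-path           HuggingFace model path
#   --tokenizer-path    HuggingFace tokenizer path (can be omitted if it matches the model path)
#   --tokenizer-kwargs  arguments for building the tokenizer
#   --model-kwargs      arguments for building the model
#   --max-seq-len       maximum sequence length the model accepts
#   --max-out-len       maximum number of generated tokens
#   --batch-size        batch size
#   --num-gpus          number of GPUs needed to run the model
python run.py \
--datasets humaneval_plus_gen \
--hf-path /root/models/AlchemistCoder-DS-6.7B \
--tokenizer-path /root/models/AlchemistCoder-DS-6.7B \
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
--model-kwargs device_map='auto' trust_remote_code=True \
--max-seq-len 1024 \
--max-out-len 1024 \
--batch-size 2 \
--num-gpus 1 \
--debug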

python run.py \
--datasets humaneval_plus_gen \
--hf-path /root/models/Qwen2-7B-Instruct \
--tokenizer-path /root/models/Qwen2-7B-Instruct \
--tokenizer-kwargs trust_remote_code=True padding_side='left' truncation='left' \
--model-kwargs device_map='auto' trust_remote_code=True torch_dtype='auto' \
--max-seq-len 1024 \
--max-out-len 1024 \
--batch-size 2 \
--num-gpus 1 \
--debug
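
Long multi-line shell commands like the two above are fragile: a line continuation only works when the backslash is the very last character on the line, and a stray quote changes an argument. As a sketch of one way to avoid that, the same arguments can be assembled as a Python list. build_eval_command is a hypothetical helper, and the flag values are simply the ones used in this post.

```python
import shlex


def build_eval_command(model_dir, batch_size=2, extra_model_kwargs=()):
    """Assemble the run.py argument list used in this post for one local model."""
    return [
        "python", "run.py",
        "--datasets", "humaneval_plus_gen",
        "--hf-path", model_dir,
        "--tokenizer-path", model_dir,
        # key=value pairs; the shell strips the quotes around 'left' anyway
        "--tokenizer-kwargs", "padding_side=left", "truncation=left",
        "trust_remote_code=True",
        "--model-kwargs", "device_map=auto", "trust_remote_code=True",
        *extra_model_kwargs,
        "--max-seq-len", "1024",
        "--max-out-len", "1024",
        "--batch-size", str(batch_size),
        "--num-gpus", "1",
        "--debug",
    ]


alchemist_cmd = build_eval_command("/root/models/AlchemistCoder-DS-6.7B")
qwen_cmd = build_eval_command(
    "/root/models/Qwen2-7B-Instruct",
    extra_model_kwargs=("torch_dtype=auto",),  # only the Qwen2 run passed this
)

# shlex.join produces a single, correctly quoted shell line for copy-paste.
print(shlex.join(alchemist_cmd))
print(shlex.join(qwen_cmd))
```

Passing the list to subprocess.run from the OpenCompass checkout would skip the shell entirely, so continuation and quoting mistakes cannot occur.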

Errors

  File "/root/opencompass/opencompass/datasets/humaneval.py", line 71, in __init__
    raise ImportError(
ImportError: Please install evalplus use following steps:
git clone --recurse-submodules git@github.com:open-compass/human-eval.git
cd human-eval
pip install -e .
pip install -e evalplus

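Installing the package directly with pip (the post writes pip install parallel, presumably meaning pip install evalplus, since the later log shows evalplus 0.2.1 installed) also fails, with this error: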

File "/root/.conda/envs/opencompass/lib/python3.10/site-packages/evalplus/evaluate.py", line 128, in evaluate
    if flags.parallel is None:
AttributeError: 'dict' object has no attribute 'parallel'

I then installed from source instead (git clone --recurse-submodules git@github.com:open-compass/human-eval.git itself fails):

git clone https://github.com/open-compass/evalplus.git
cd evalplus
pip install -e .

This replaced the installed version:

Successfully built evalplus
Installing collected packages: evalplus
  Attempting uninstall: evalplus
    Found existing installation: evalplus 0.2.1
    Uninstalling evalplus-0.2.1:
      Successfully uninstalled evalplus-0.2.1
Successfully installed evalplus-0.1.0.dev598
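
According to the log above, the failing build was evalplus 0.2.1 and the working one is 0.1.0.dev598 from the open-compass fork. A quick way to confirm which build the current environment resolves is sketched below, using only the standard library. installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("evalplus")
if version is None:
    print("evalplus is not installed in this environment")
else:
    # In this post, 0.1.0.dev598 worked and 0.2.1 raised the AttributeError.
    print("evalplus version:", version)
```

Run it inside the same conda environment that OpenCompass uses, otherwise it reports a different installation.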

Batch size setting

batch size	time
2	1637.15s
10	526.18s
20	out of memory (24 GB of VRAM)
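
To make the timings easier to compare, the sketch below computes the speedup between the two batch sizes that completed. The numbers are copied from the table; the assumption that each time covers one full evaluation run is mine.

```python
# Wall-clock times from the table, keyed by batch size (seconds).
timings = {2: 1637.15, 10: 526.18}


def speedup(t_base, t_new):
    """How many times faster the new run is than the baseline run."""
    return t_base / t_new


ratio = speedup(timings[2], timings[10])
batch_ratio = 10 / 2

print(f"batch size grew {batch_ratio:.0f}x, run time shrank {ratio:.2f}x")
```

Raising the batch size fivefold cut the run time by a factor of roughly 3.1, so the gain is clearly sublinear, and batch size 20 no longer fits in 24 GB.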

Results

AlchemistCoder-DS-6.7B	Qwen2-7B-Instruct
45.73	35.36

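Both models score lower than the numbers reported on their home pages.
Qwen2-7B-Instruct initially scored 0 (the original post shows a screenshot of some of its predictions at this point).
The fix was to edit opencompass/opencompass/models/huggingface.py and comment out the following line: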

# self.model.generation_config.do_sample = False

After that, the evaluation runs to completion normally.
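
The one-line edit to huggingface.py can also be applied with a small script, which makes it repeatable after updating the checkout. This is only a sketch: comment_out is a hypothetical helper that works on text, and it assumes the target statement appears on a line of its own exactly as quoted in this post.

```python
TARGET = "self.model.generation_config.do_sample = False"


def comment_out(source, target=TARGET):
    """Prefix every line whose code starts with `target` with '# ', keeping indentation."""
    patched = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(target):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"
        patched.append(line)
    return "".join(patched)


# Demonstration on an in-memory snippet rather than the real file.
sample = (
    "        self.model.eval()\n"
    "        self.model.generation_config.do_sample = False\n"
)
print(comment_out(sample))

# To patch the real file (path as given in this post), something like:
# path = "opencompass/opencompass/models/huggingface.py"
# text = open(path, encoding="utf-8").read()
# open(path, "w", encoding="utf-8").write(comment_out(text))
```

Lines that are already commented out start with "#", so running the helper twice leaves the file unchanged.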