一、evalscope Usage Notes
pip install evalscope gradio
1、How to use the Zhizengzeng (智增增) API:
VLLM_USE_MODELSCOPE=True evalscope eval \
--model qwen2.5-14b-instruct \
--api-url https://api.zhizengzeng.com/v1/chat/completions \
--api-key skxxx \
--eval-type service \
--datasets gsm8k \
--limit 10
Notes:
(1) If there is no GPU and you hit torch_npu or similar errors, prefix the command with TORCH_DEVICE_BACKEND_AUTOLOAD=0
(2) Other datasets such as aime24 and live_code_bench can be used directly; for the full list, see the "Supported Datasets" page of the EvalScope documentation.
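Before spending a full evaluation run, it can help to sanity-check the endpoint with a single hand-built chat request. The helper below is an illustrative sketch (not part of evalscope); it only builds the OpenAI-style payload that the --api-url endpoint expects.

```python
# Illustrative sketch: build the OpenAI-style chat payload that a
# /v1/chat/completions endpoint expects. Actually sending it (with
# requests or urllib, plus an "Authorization: Bearer <key>" header)
# is left to the reader.
import json

def build_chat_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("qwen2.5-14b-instruct", "What is 1+1?")
print(json.dumps(payload, ensure_ascii=False))
```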
2、Using a local dataset
Download the dataset locally, then add the following flags to the eval command:
--datasets mmlu_pro \
--dataset-args '{"mmlu_pro": {"local_path": "MMLU-Pro"}}' \
--limit 10
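Quoting nested JSON on the command line is error-prone; one way to avoid mistakes is to generate the --dataset-args string from Python, since json.dumps guarantees valid JSON:

```python
# Build the --dataset-args JSON in Python instead of hand-quoting it
# in the shell; the keys mirror the CLI example above.
import json

dataset_args = {"mmlu_pro": {"local_path": "MMLU-Pro"}}
arg_str = json.dumps(dataset_args)
print(f"--dataset-args '{arg_str}'")
```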
3、Using OpenCompass as the backend
(1) Environment setup
# Install the OpenCompass backend dependencies
pip install evalscope[opencompass] -U
(2) Data preparation
# Download from ModelScope
wget -O eval_data.zip https://www.modelscope.cn/datasets/swift/evalscope_resource/resolve/master/eval.zip
# Unzip
unzip eval_data.zip
Note: flame and openFinData are not among OpenCompass's commonly used datasets.
(3) Deployment
The OpenCompass evaluation backend talks to the model through a unified OpenAI-style API call, so the model must be deployed (served) first.
(4) Configuration file
eval_backend: OpenCompass
eval_config:
  datasets:
    - mmlu
    - ceval
    - ARC_c
    - gsm8k
  models:
    - openai_api_base: http://127.0.0.1:8000/v1/chat/completions
      path: qwen2-0_5b-instruct
      temperature: 0.0
Note: path here is the model_type.
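The YAML above can equally be written as a Python dict, which is the form that "Option 1" of the run script in step (5) consumes (field names copied from the YAML; adjust to your setup):

```python
# The step (4) configuration as a Python dict; run_task() accepts this
# directly (see the run script in step (5)).
task_cfg_dict = {
    'eval_backend': 'OpenCompass',
    'eval_config': {
        'datasets': ['mmlu', 'ceval', 'ARC_c', 'gsm8k'],
        'models': [
            {
                'openai_api_base': 'http://127.0.0.1:8000/v1/chat/completions',
                'path': 'qwen2-0_5b-instruct',  # path here is the model_type
                'temperature': 0.0,
            },
        ],
    },
}
```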
(5) Run script
from evalscope.run import run_task
from evalscope.summarizer import Summarizer

def run_eval():
    # Option 1: Python dict (task_cfg_dict is the step (4) config as a dict)
    task_cfg = task_cfg_dict
    # Option 2: YAML config file
    # task_cfg = 'eval_openai_api.yaml'
    # Option 3: JSON config file
    # task_cfg = 'eval_openai_api.json'

    run_task(task_cfg=task_cfg)

    print('>> Start to get the report with summarizer ...')
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f'\n>> The report list: {report_list}')

run_eval()
4、Custom datasets
二、OpenCompass Usage Notes (Linux only; on other systems it fails with path errors)
GitHub: https://github.com/open-compass/opencompass (Chinese README: https://github.com/open-compass/opencompass/blob/main/README_zh-CN.md)
1、Installation
pip install -U opencompass
## Full installation (with support for more datasets)
# pip install "opencompass[full]"
## Environment with model acceleration frameworks
## Manage different acceleration frameworks using virtual environments
## since they usually have dependency conflicts with each other.
# pip install "opencompass[lmdeploy]"
# pip install "opencompass[vllm]"
## API evaluation (e.g. OpenAI, Qwen)
# pip install "opencompass[api]"
2、Evaluating openFinData via a custom API
Reference: "LLM evaluation: using opencompass with a custom API and custom dataset" (CSDN blog)
(1) Create custom_api.py
from opencompass.models import OpenAISDK

models = [
    dict(
        type=OpenAISDK,
        abbr='gpt-3.5-turbo',
        path='gpt-3.5-turbo',  # You need to set your own judge model path
        key='sk-',             # You need to set your own API key
        openai_api_base=[
            'http://localhost:8000/v1/chat/completions',  # You need to set your own API base
        ],
        query_per_second=10,
        batch_size=1,
        temperature=0.001,
        verbose=True,
        max_out_len=16384,
        max_seq_len=49152,
        retry=3,
    )
]
Copy this file into configs/models/ under the opencompass install directory, e.g.: cp .\custom_api.py D:\Users\Administrator\miniconda3\envs\3.10\lib\site-packages\opencompass\configs\models\
Note: path and abbr here must be set to gpt-3.5-turbo, otherwise the request is not forwarded correctly; the local forwarding service below then rewrites model so the request reaches the Zhizengzeng server.
Then create main.py as follows:
from fastapi import FastAPI, Request
import requests

app = FastAPI()

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    data = await request.json()
    print("Received POST request with data:", data)
    api_key = "sk-xxx3"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    target_url = "https://api.zhizengzeng.com/v1/chat/completions"
    data["model"] = "qwen2.5-3b-instruct"  # rewrite to the real model name
    response_data = None
    try:
        response = requests.post(target_url, json=data, headers=headers)
        response_data = response.json()
        print("Response from target URL:", response_data)
    except Exception as e:
        print(e)
    return response_data
Install the dependencies:
pip install fastapi uvicorn requests
Start the local forwarding service:
uvicorn main:app --reload
Note: to change the port, add --port 8080; --reload automatically restarts the web service after code changes.
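The only transformation the proxy applies is rewriting the model field. Factored into a pure function (the name rewrite_model is illustrative, not from the code above), that behavior can be checked without starting the server:

```python
# Pure-function version of the proxy's model rewrite, for testing.
def rewrite_model(payload: dict, real_model: str = "qwen2.5-3b-instruct") -> dict:
    body = dict(payload)        # shallow copy: don't mutate the caller's dict
    body["model"] = real_model  # OpenCompass always sends "gpt-3.5-turbo"
    return body

req = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}
print(rewrite_model(req)["model"])  # → qwen2.5-3b-instruct
```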
(2) Download the openfindata data
Unzip it into the current directory; expected layout: data/openfindata_release/*.json
(3) Run the evaluation
opencompass --models custom_api.py --datasets OpenFinData_gen
三、FAQ
1、Did you forget to register the node by TVM_REGISTER_NODE_TYPE?
Per the reference solution, the cause was two conflicting mlc packages installed side by side; after uninstalling mlc-llm-nightly, it runs normally on Windows.
pip list| grep mlc
mlc-ai-nightly 0.15.dev559
mlc-llm-nightly 0.1.dev1519