CLiB Chinese LLM Capability Benchmark Leaderboard

Latest recommended article published on 2025-07-15 13:36:19

Original

License: CC 4.0 BY-SA

Article tags:

#ArtificialIntelligence #LargeModels #Evaluation

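This post evaluates 48 large models across multiple capability dimensions, covering both commercial and open-source models from a wide range of sources. The benchmark assesses classification, information extraction, and other capabilities, publishes an overall leaderboard alongside per-capability leaderboards, and explains how scores are computed. It argues that an open, impartial, and fair evaluation system matters to the field, to industry, and to the researchers and engineers who build these models.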

1 Introduction

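The benchmark currently covers 48 large models. Commercial models include chatgpt, gpt4, Google Bard, Baidu ERNIE Bot (文心一言), Alibaba Tongyi Qianwen (通义千问), iFlytek Spark (讯飞星火), 360 Zhinao (360智脑), SenseTime SenseChat, Microsoft New Bing, minimax, and tigerbot. Open-source models include Baichuan, belle, chatglm6b, ziya, guanaco, Phoenix, linly, MOSS, AquilaChat, vicuna, wizardLM, InternLM (书生), and llama2-chat.
The models come from major Chinese and international tech companies, LLM startups, and universities and research institutes.
Evaluation covers multiple capability dimensions: classification, information extraction, reading comprehension, and table question answering.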

2 Basic Model Information

Because the benchmark includes many models, the table below lists only a subset. For fuller and more detailed information, see https://github.com/jeinlee1991/chinese-llm-benchmark

Model	Organization	Type	Link
chatgpt-3.5	openai	Commercial	https://chat.openai.com
ERNIE Bot (文心一言)	Baidu	Commercial	https://yiyan.baidu.com
chatglm (official)	Zhipu AI	Commercial	https://chatglm.cn
iFlytek Spark (讯飞星火)	iFlytek	Commercial	https://xinghuo.xfyun.cn/desk
360 Zhinao (360智脑)	Qihoo 360	Commercial	https://ai.360.cn/
Alibaba Tongyi Qianwen (通义千问)	Alibaba	Commercial	https://tongyi.aliyun.com
minimax	minimax	Commercial	https://api.minimax.chat
tigerbot-7b (official site)	Hubo Technology (虎博科技)	Commercial/Open-source	https://www.tigerbot.com/
chatglm-6b	Tsinghua University & Zhipu AI	Open-source	https://github.com/THUDM/ChatGLM-6B
belle-llama-7b-2m	Lianjia Tech (链家科技)	Open-source	https://github.com/LianjiaTech/BELLE
BELLE-on-Open-Datasets	Lianjia Tech (链家科技)	Open-source	https://github.com/LianjiaTech/BELLE
belle-llama-13b-2m	Lianjia Tech (链家科技)	Open-source	https://github.com/LianjiaTech/BELLE
belle-llama-13b-ext	Lianjia Tech (链家科技)	Open-source	https://github.com/LianjiaTech/BELLE
Ziya-LLaMA-13B-v1	IDEA Research Institute (IDEA研究院)	Open-source	https://mp.weixin.qq.com/s/IeXgq8blGoeVbpIlAUCAjA
guanaco-7b	JosephusCheung	Open-source	https://huggingface.co/JosephusCheung/Guanaco
phoenix-inst-chat-7b	CUHK (港中文)	Open-source	https://github.com/FreedomIntelligence/LLMZoo
linly-chatflow-13b	Shenzhen University	Open-source	https://github.com/CVI-SZU/Linly
MOSS-003-SFT	Fudan University	Open-source	https://github.com/OpenLMLab/MOSS
AquilaChat-7B	BAAI (智源研究院)	Open-source	https://github.com/FlagAI-Open/FlagAI/blob/master/examples/Aquila/README.md
tulu-30b	allenai	Open-source	https://github.com/allenai/open-instruct
chatglm2-6b	Tsinghua University & Zhipu AI	Open-source	https://github.com/THUDM/ChatGLM2-6B
Baichuan-13B-Chat	Baichuan Intelligence (百川智能)	Open-source	https://github.com/baichuan-inc/Baichuan-13B
……	……	……	……

3 Leaderboards

3.1 Overall Capability Leaderboard

The overall score is the mean of four capability scores: classification, information extraction, reading comprehension, and data analysis.
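The averaging rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CLiB's own scoring code: the dimension keys, the functions `overall_score` and `leaderboard`, and the per-model scores in the usage block are all hypothetical. It assumes the four capabilities share one scale and carry equal weight, which is what a plain mean implies.

```python
# Hypothetical sketch (not CLiB's code): overall score = plain mean of four
# equally weighted capability scores, then models ranked by that mean.
DIMENSIONS = (
    "classification",
    "information_extraction",
    "reading_comprehension",
    "data_analysis",
)


def overall_score(scores: dict[str, float]) -> float:
    """Return the unrounded arithmetic mean of the four capability scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing capability scores: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def leaderboard(per_model: dict[str, dict[str, float]]) -> list[tuple[int, str, float]]:
    """Rank models by overall score, highest first.

    Ranking uses the unrounded mean; the total is rounded to one decimal
    only for display, matching the precision of the published table.
    Models with exactly equal means keep their input order (sorted() is stable).
    """
    totals = {name: overall_score(s) for name, s in per_model.items()}
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, name, round(total, 1)) for rank, (name, total) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Made-up scores for illustration only; these are not CLiB results.
    example = {
        "model-a": {"classification": 95.0, "information_extraction": 90.0,
                    "reading_comprehension": 88.0, "data_analysis": 91.0},
        "model-b": {"classification": 80.0, "information_extraction": 85.0,
                    "reading_comprehension": 90.0, "data_analysis": 75.0},
    }
    for rank, name, total in leaderboard(example):
        print(rank, name, total)  # prints "1 model-a 91.0" then "2 model-b 82.5"
```

Ranking on the unrounded mean is a design choice of this sketch. The published table gives some models equal totals but different ranks (86.5 at ranks 3 and 4, 81.8 at 11 and 12, 79.4 at 15 and 16), so ties in the rounded score are presumably broken by more precise scores or by a rule the post does not state.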

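Type	Model	Total score	Rank
Commercial	gpt4	96.1	1
Commercial	chatgpt-3.5	93.6	2
Open-source	tigerbot-70b-chat-v2	86.5	3
Commercial	ERNIE Bot v2.2 (文心一言v2.2)	86.5	4
Commercial	iFlytek Spark v3 (讯飞星火v3)	85.8	5
Commercial	Google Bard	84.1	6
Open-source	tigerbot-70b-chat-v3	83.5	7
Open-source	openbuddy-llama2-70b-v10.1	83.2	8
Open-source	aquilachat2-34b	82.5	9
Commercial	SenseTime SenseChat	81.9	10
Commercial	ERNIE 4.0 (文心4.0)	81.8	11
Commercial	Baichuan2-53B	81.8	12
Open-source	BELLE-Llama2-13B-chat-0.4M	79.8	13
Commercial	Doubao (豆包)	79.5	14
Open-source	qwen-14b-chat	79.4	15
Open-source	Baichuan2-13B-Chat	79.4	16
Open-source	Baichuan2-7B-Chat	79.1	17
Commercial	Alibaba Tongyi Qianwen (通义千问)	79.0	18
Open-source	belle-l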