Math Evaluation Leaderboard - MathEval

| Rank | 🤖 Model | ⭐ Arena Elo | 📊 95% CI | 🗳️ Votes | Organization | License |
|---|---|---|---|---|---|---|
| 1 | | 1251 | +5/-4 | 48226 | OpenAI | Proprietary |
| 1 | | 1249 | +5/-6 | 22282 | OpenAI | Proprietary |
| 1 | | 1247 | +6/-6 | 14854 | Anthropic | Proprietary |
| 4 | | 1202 | +6/-7 | 12623 | | Proprietary |
| 4 | | 1190 | +6/-6 | 14845 | Anthropic | Proprietary |
| 5 | | 1185 | +4/-6 | 27245 | OpenAI | Proprietary |
| 7 | | 1159 | +4/-5 | 43783 | OpenAI | Proprietary |
| 7 | | 1155 | +5/-6 | 18959 | Mistral | Proprietary |
| 8 | | 1146 | +4/-5 | 16729 | Alibaba | Qianwen LICENSE |
| 8 | | 1145 | +5/-6 | 21929 | Anthropic | Proprietary |
| 8 | | 1145 | +5/-4 | 23931 | Mistral | Proprietary |
| 12 | | 1126 | +7/-5 | 13679 | Anthropic | Proprietary |
| 12 | | 1123 | +5/-5 | 11875 | Mistral | Proprietary |
| 12 | | 1118 | +5/-6 | 12146 | | Proprietary |
| 13 | | 1115 | +4/-6 | 32431 | Anthropic | Proprietary |
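
The Rank column is not a simple sort by Elo: several models share a rank. A minimal sketch, assuming the leaderboard ranks each model as 1 + the number of models whose 95% CI lower bound lies strictly above its upper bound, is shown below; under that assumption it reproduces every Rank value in the table. The helper names (`ROWS`, `ci_bounds`, `ci_ranks`, `expected_win_rate`) are hypothetical, and `expected_win_rate` assumes the standard Elo logistic model (base 10, scale 400) as a way to read a rating gap, not necessarily how the leaderboard itself fits its scores.

```python
from typing import List, Tuple

# (elo, ci_plus, ci_minus) for each table row, in table order.
# Model names are missing from the source, so rows are identified by position only.
ROWS: List[Tuple[int, int, int]] = [
    (1251, 5, 4), (1249, 5, 6), (1247, 6, 6), (1202, 6, 7), (1190, 6, 6),
    (1185, 4, 6), (1159, 4, 5), (1155, 5, 6), (1146, 4, 5), (1145, 5, 6),
    (1145, 5, 4), (1126, 7, 5), (1123, 5, 5), (1118, 5, 6), (1115, 4, 6),
]


def ci_bounds(elo: int, plus: int, minus: int) -> Tuple[int, int]:
    """Return the (lower, upper) bounds of the 95% confidence interval."""
    return elo - minus, elo + plus


def ci_ranks(rows: List[Tuple[int, int, int]]) -> List[int]:
    """Assumed rule: rank = 1 + count of models whose lower bound exceeds this model's upper bound."""
    bounds = [ci_bounds(*row) for row in rows]
    ranks = []
    for _, upper in bounds:
        # A model never counts against itself, since its own lower bound is below its upper bound.
        better = sum(1 for other_lower, _ in bounds if other_lower > upper)
        ranks.append(1 + better)
    return ranks


def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score of A against B (assumed base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


if __name__ == "__main__":
    print(ci_ranks(ROWS))  # prints [1, 1, 1, 4, 4, 5, 7, 7, 8, 8, 8, 12, 12, 12, 13]
    # Top row (1251) vs. bottom row (1115), a 136-point gap: about 0.686.
    print(round(expected_win_rate(1251, 1115), 3))
```

Because the intervals overlap, models only a few points apart (for example the three rows at rank 1) are treated as statistically tied rather than strictly ordered.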

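This article summarizes the evaluation results of various AI models, such as GPT-4, Claude, and Bard, on the MathEval platform, showing each model's position on the leaderboard along with its performance metrics and organizational affiliation.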