Benchmark Leaderboard

Model performance rankings across different benchmarks

aider-polyglot

Rank	Model	Vendor	Score
1	GPT-5.5	OpenAI	90.00
2	GPT-5.4 Pro	OpenAI	89.50
3	Claude Opus 4.7	Anthropic	88.50
4	O4 Mini	OpenAI	88.50
5	GPT-5	OpenAI	88.00
6	GPT-5.5 Pro	OpenAI	88.00
7	Claude Sonnet 4.6	Anthropic	86.00
8	DeepSeek V3.2	DeepSeek	85.00
9	DeepSeek V4 Pro	DeepSeek	85.00
10	O1	OpenAI	84.20

aime

Rank	Model	Vendor	Score
1	GPT-5.4 Pro	OpenAI	98.70
2	Gemini 3.1 Pro Preview	Google	98.20
3	Kimi K2.6	Moonshot AI	96.40
4	GLM-4.7	Zhipu AI	95.70
5	Claude Opus 4.6	Anthropic	95.60
6	GLM-5	Zhipu AI	95.40
7	GLM-5.1	Zhipu AI	95.30
8	Qwen3.6 Plus	Alibaba	95.10
9	DeepSeek V3.2	DeepSeek	95.10
10	Kimi K2.5	Moonshot AI	94.50

aime-2025

Rank	Model	Vendor	Score
1	MiniMax M2.5	MiniMax	86.30

amc

Rank	Model	Vendor	Score
1	O4 Mini	OpenAI	88.00
2	GPT-5.5	OpenAI	85.00
3	Claude Opus 4.7	Anthropic	82.00

arc-agi-2

Rank	Model	Vendor	Score
1	GPT-5.5	OpenAI	40.00
2	Claude Opus 4.7	Anthropic	35.00
3	GPT-5.4	OpenAI	32.00
4	Gemini 2.5 Pro	Google	30.00
5	Gemini 3.1 Pro Preview	Google	28.00
6	O1	OpenAI	25.00
7	DeepSeek V4 Pro	DeepSeek	22.00