WebDev Leaderboard

Compare the performance of AI models on web development tasks built in the Code Arena.

Last Updated: Dec 23, 2025

Total Votes: 75,257

Rank	Rank Spread	Model	Score	Confidence Interval	Votes	Organization	License
1	1◄─►1	claude-opus-4-5-20251101-thinking-32k	1520	+12/-12	4,088	Anthropic	Proprietary
2	2◄─►5	gpt-5.2-high	1484	+17/-17	1,647	OpenAI	Proprietary
3	2◄─►5	claude-opus-4-5-20251101	1480	+12/-12	4,010	Anthropic	Proprietary
4	2◄─►5	gemini-3-pro	1478	+10/-10	9,066	Google	Proprietary
5	2◄─►6	gemini-3-flash	1465	+13/-13	2,233	Google	Proprietary
6	5◄─►6	glm-4.7	1449	+15/-15	1,570	Z.ai	MIT
7	7◄─►13	gpt-5-medium	1398	+12/-12	3,949	OpenAI	Proprietary
8	7◄─►14	gpt-5.2	1398	+15/-15	1,641	OpenAI	Proprietary
9	7◄─►13	claude-sonnet-4-5-20250929-thinking-32k	1393	+9/-9	8,150	Anthropic	Proprietary
10	7◄─►14	gpt-5.1-medium	1392	+10/-10	5,191	OpenAI	Proprietary
11	7◄─►14	claude-opus-4-1-20250805	1388	+9/-9	7,786	Anthropic	Proprietary
12	7◄─►14	claude-sonnet-4-5-20250929	1387	+9/-9	9,174	Anthropic	Proprietary
13	7◄─►16	gemini-3-flash (thinking-minimal)	1381	+14/-14	1,883	Google	Proprietary
14	13◄─►16	glm-4.6	1367	+9/-9	7,489	Z.ai	MIT
15	9◄─►18	deepseek-v3.2-thinking	1366	+16/-16	1,404	DeepSeek AI	MIT
16	13◄─►17	gpt-5.1	1360	+9/-9	7,108	OpenAI	Proprietary
17	16◄─►19	kimi-k2-thinking-turbo	1341	+9/-9	6,882	Moonshot	Modified MIT
18	15◄─►20	mimo-v2-flash	1337	+18/-18	1,039	Xiaomi	MIT
19	17◄─►20	gpt-5.1-codex	1335	+10/-10	5,287	OpenAI	Proprietary
20	18◄─►20	minimax-m2	1316	+9/-9	7,592	MiniMax	Apache 2.0
21	21◄─►24	deepseek-v3.2-exp	1293	+10/-10	5,161	DeepSeek AI	MIT
22	21◄─►24	claude-haiku-4-5-20251001	1290	+9/-9	7,857	Anthropic	Proprietary
23	21◄─►24	qwen3-coder-480b-a35b-instruct	1289	+9/-9	7,756	Alibaba	Apache 2.0
24	21◄─►26	deepseek-v3.2	1281	+15/-15	1,707	DeepSeek AI	MIT
25	24◄─►26	KAT-Coder-Pro-V1	1263	+15/-15	1,946	KwaiKAT	Proprietary
26	24◄─►28	gpt-5.1-codex-mini	1251	+17/-17	1,565	OpenAI	Proprietary
27	26◄─►30	grok-4-1-fast-reasoning	1226	+13/-13	3,720	xAI	Proprietary
28	26◄─►30	mistral-large-3	1225	+20/-20	1,027	Mistral	Apache 2.0
29	27◄─►30	gemini-2.5-pro	1212	+13/-13	3,505	Google	Proprietary
30	27◄─►30	grok-4.1-thinking	1205	+19/-19	1,262	xAI	Proprietary
31	31◄─►32	grok-4-fast-reasoning	1152	+23/-23	945	xAI	Proprietary
32	31◄─►33	grok-code-fast-1	1142	+21/-21	1,014	xAI	Proprietary
33	32◄─►33	devstral-medium-2507	1102	+22/-22	1,033	Mistral	Proprietary
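The rows above can be read programmatically. The following is a minimal sketch, assuming each row is tab-separated in the order rank, rank spread, model, score, confidence interval, votes, organization, license. The names `Row`, `parse_row`, `interval`, and `overlaps` are hypothetical helpers introduced here for illustration. Checking whether two score intervals overlap is only a rough way to judge whether two models are separable; the leaderboard does not state how it derives the rank spread, so this is not presented as its method.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Row:
    rank: int
    spread: Tuple[int, int]  # (best, worst) rank as printed, e.g. 2◄─►5
    model: str
    score: int
    ci: Tuple[int, int]      # (plus, minus) magnitudes, e.g. +17/-17 -> (17, 17)
    votes: int
    organization: str
    license: str


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row (assumed column order)."""
    rank, spread, model, score, ci, votes, org, lic = line.strip().split("\t")
    best, worst = spread.split("◄─►")
    plus, minus = ci.split("/")
    return Row(
        rank=int(rank),
        spread=(int(best), int(worst)),
        model=model,
        score=int(score),
        ci=(abs(int(plus)), abs(int(minus))),
        votes=int(votes.replace(",", "")),
        organization=org,
        license=lic,
    )


def interval(row: Row) -> Tuple[int, int]:
    """Return (low, high) bounds: score minus/plus the printed interval."""
    plus, minus = row.ci
    return row.score - minus, row.score + plus


def overlaps(a: Row, b: Row) -> bool:
    """True if the two score intervals share at least one point."""
    a_lo, a_hi = interval(a)
    b_lo, b_hi = interval(b)
    return a_lo <= b_hi and b_lo <= a_hi


if __name__ == "__main__":
    top = parse_row("1\t1◄─►1\tclaude-opus-4-5-20251101-thinking-32k\t1520\t+12/-12\t4,088\tAnthropic\tProprietary")
    second = parse_row("2\t2◄─►5\tgpt-5.2-high\t1484\t+17/-17\t1,647\tOpenAI\tProprietary")
    print(interval(top), interval(second), overlaps(top, second))
```

For the first two rows the intervals are 1508 to 1532 and 1467 to 1501, which do not overlap.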

Leaderboard Plots

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)

Fraction of Model A Wins for All Non-tied A vs. B Battles

Confidence Intervals on Model Strength (via Bootstrapping)

Battle Count for Each Combination of Models (without Ties)
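The quantities named in the plot titles above can be illustrated on a toy battle log. This is a sketch under stated assumptions, not the leaderboard's own pipeline: the battle records, model names (`m1`, `m2`, `m3`), and all function names are hypothetical. Ties are dropped before counting, and the average win rate weights every opponent equally. The bootstrap here resamples battles and recomputes the average win rate, which is a simpler stand-in for bootstrapping a fitted model-strength score.

```python
import random
from collections import defaultdict

# Hypothetical battle log: (model_a, model_b, winner), winner in {"a", "b", "tie"}.
BATTLES = [
    ("m1", "m2", "a"),
    ("m1", "m2", "a"),
    ("m2", "m1", "a"),
    ("m1", "m3", "a"),
    ("m3", "m1", "tie"),
    ("m2", "m3", "b"),
    ("m2", "m3", "a"),
]


def pairwise_counts(battles):
    """wins[(x, y)] = number of non-tied battles in which x beat y."""
    wins = defaultdict(int)
    for a, b, winner in battles:
        if winner == "a":
            wins[(a, b)] += 1
        elif winner == "b":
            wins[(b, a)] += 1
        # Ties are ignored.
    return wins


def battle_count(wins, x, y):
    """Number of non-tied battles between x and y."""
    return wins.get((x, y), 0) + wins.get((y, x), 0)


def win_fraction(wins, x, y):
    """Fraction of non-tied x-vs-y battles won by x, or None if none exist."""
    n = battle_count(wins, x, y)
    return wins.get((x, y), 0) / n if n else None


def average_win_rate(wins, models):
    """Mean win fraction per model, each opponent weighted equally."""
    rates = {}
    for x in models:
        fracs = [win_fraction(wins, x, y) for y in models if y != x]
        fracs = [f for f in fracs if f is not None]
        rates[x] = sum(fracs) / len(fracs) if fracs else None
    return rates


def bootstrap_interval(battles, model, models, rounds=200, seed=0):
    """Percentile interval (2.5%, 97.5%) of a model's average win rate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        resample = rng.choices(battles, k=len(battles))
        rate = average_win_rate(pairwise_counts(resample), models)[model]
        if rate is not None:
            samples.append(rate)
    if not samples:
        return None
    samples.sort()
    lo = samples[int(0.025 * (len(samples) - 1))]
    hi = samples[int(0.975 * (len(samples) - 1))]
    return lo, hi


if __name__ == "__main__":
    models = ["m1", "m2", "m3"]
    wins = pairwise_counts(BATTLES)
    print(average_win_rate(wins, models))
    print(bootstrap_interval(BATTLES, "m1", models))
```

With so few battles the bootstrap interval is very wide. The real leaderboard draws on tens of thousands of votes, which is why its reported intervals are comparatively narrow.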