Track the top-performing AI models across multiple benchmarks. Data sourced from LMSYS Chatbot Arena, Hugging Face, Artificial Analysis, SWE-Bench, and Vectara.
Crowdsourced Elo rankings from blind A/B tests where users compare model outputs without knowing which model generated them.
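For context, a minimal sketch of the pairwise Elo update that underlies rankings like these. The K-factor of 32 and the 400-point scale are conventional defaults, not the Arena's exact parameters:

```python
# Minimal sketch of a pairwise Elo update from one blind A/B vote.
# K=32 and the 400-point scale are conventional defaults, not
# Chatbot Arena's exact parameters.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    rating_a += k * (s_a - e_a)
    rating_b += k * ((1.0 - s_a) - (1.0 - e_a))
    return rating_a, rating_b

# Example: a 1200-rated model upsets a 1300-rated one.
print(elo_update(1200, 1300, a_won=True))  # -> (~1220.5, ~1279.5)
```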
Benchmarks open-weight models on standardized academic tasks including reasoning, knowledge, and instruction following.
Independent assessment that combines multiple evaluation methods into a single comprehensive quality score for each model.
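As an illustration of how a composite score can be assembled, here is a hedged sketch. The metric names and weights are hypothetical, not Artificial Analysis's actual methodology:

```python
# Hypothetical composite-score sketch: metric names and weights are
# illustrative only, not Artificial Analysis's actual formula.

WEIGHTS = {"reasoning": 0.4, "knowledge": 0.3, "coding": 0.3}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted mean of per-benchmark scores, each normalized to 0-100."""
    return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)

print(composite_score({"reasoning": 88.0, "knowledge": 74.0, "coding": 81.0}))
# -> 81.7
```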
Evaluates AI models on real-world software engineering tasks drawn from GitHub issues, testing actual code-generation and bug-fixing ability.
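To see what a SWE-Bench task looks like, the public dataset can be inspected with the Hugging Face `datasets` library. The field names below reflect the published princeton-nlp/SWE-bench schema; treat them as assumptions if the dataset has since changed:

```python
# Sketch of inspecting a SWE-Bench task instance via the Hugging Face
# `datasets` library. Field names follow the public princeton-nlp
# dataset schema and may have changed.

from datasets import load_dataset

swebench = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")

task = swebench[0]
print(task["repo"])               # source repository, e.g. "astropy/astropy"
print(task["problem_statement"])  # the GitHub issue text the model must resolve
print(task["FAIL_TO_PASS"])       # tests that must pass after the model's patch
```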
Vectara's hallucination benchmark measures how often models fabricate information when summarizing documents. Lower is better — 7,700+ articles tested.
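A minimal sketch of how a hallucination rate like this can be computed over document/summary pairs. The `is_consistent` judge below is a hypothetical stand-in for a factual-consistency model such as Vectara's HHEM, not their actual API:

```python
# Sketch of a hallucination-rate calculation over document/summary pairs.
# `is_consistent` is a hypothetical placeholder for a factual-consistency
# judge such as Vectara's HHEM model; it is not their actual API.

def is_consistent(document: str, summary: str) -> bool:
    """Placeholder judge: True if the summary is grounded in the document."""
    return summary in document  # naive substring check, illustration only

def hallucination_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of summaries flagged as fabricated. Lower is better."""
    flagged = sum(1 for doc, summ in pairs if not is_consistent(doc, summ))
    return flagged / len(pairs)

pairs = [("The cat sat on the mat.", "The cat sat on the mat."),
         ("The cat sat on the mat.", "The dog barked all night.")]
print(hallucination_rate(pairs))  # -> 0.5
```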