AI Model Benchmarks
Comprehensive collection of benchmarks organized by category
Intelligence
MMLU Pro
A comprehensive benchmark for evaluating language models across multiple domains
Humanity's Last Exam
A challenging benchmark of expert-level questions spanning a broad range of academic subjects
Coding
Aider Polyglot
Evaluates AI models' ability to understand and generate code across multiple programming languages
Other
Berkeley Function-Calling Leaderboard
Measures AI models' ability to call functions correctly and use their results across a range of contexts