Model Comparison

Compare performance and costs across different AI models using standardized benchmarks.

Intelligence

Sorted by: MMLU Pro, highest first

Model                               Input Cost  Output Cost  MMLU Pro  Humanity's Last Exam
Gemini 2.5 Pro (Google, Reasoning)  $1.3        $10.0        78.4      18.2
o1 (OpenAI, Reasoning)              $15.0       $60.0        76.2      8.0
o3-mini (OpenAI, Reasoning)         $1.1        $4.4         72.1      13.4
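
The "Sorted by" ordering used for the Intelligence table can be reproduced with a small script. The following is a minimal sketch, not the site's implementation: the `Row` type, its field names, and the `sort_rows` helper are hypothetical names introduced for illustration, and the figures are copied from the Intelligence table.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One table row; field names are illustrative, not from the source page."""
    model: str
    input_cost: float   # dollars, as listed under "Input Cost"
    output_cost: float  # dollars, as listed under "Output Cost"
    mmlu_pro: float
    hle: float          # Humanity's Last Exam


# Values copied from the Intelligence table, deliberately stored unsorted.
ROWS = [
    Row("o3-mini", 1.1, 4.4, 72.1, 13.4),
    Row("Gemini 2.5 Pro", 1.3, 10.0, 78.4, 18.2),
    Row("o1", 15.0, 60.0, 76.2, 8.0),
]


def sort_rows(rows, key, descending=True):
    """Return a new list of rows ordered by the named field, highest first by default."""
    return sorted(rows, key=lambda r: getattr(r, key), reverse=descending)


by_mmlu = [r.model for r in sort_rows(ROWS, "mmlu_pro")]
by_hle = [r.model for r in sort_rows(ROWS, "hle")]

print(by_mmlu)
print(by_hle)
```

Sorting by `mmlu_pro` gives the order shown in the table, while sorting by `hle` swaps o1 and o3-mini, so the ranking depends on which benchmark is chosen as the sort key.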

Coding

Sorted by: Aider Polyglot, highest first

Model                                     Input Cost  Output Cost  Aider Polyglot
Gemini 2.5 Pro (Google, Reasoning)        $1.3        $10.0        72.9
Claude 3.7 Sonnet (Anthropic, Reasoning)  $3.0        $15.0        64.9
o1 (OpenAI, Reasoning)                    $15.0       $60.0        61.7

Other

Sorted by: Input Cost, highest first

Model                                     Input Cost  Output Cost  Berkeley Function-Calling Leaderboard
GPT-4.5 (OpenAI)                          $75.0       $150.0       62.5
o1 (OpenAI, Reasoning)                    $15.0       $60.0        67.9
Claude 3.7 Sonnet (Anthropic, Reasoning)  $3.0        $15.0        58.3