# Model Comparison
Compare performance and cost across AI models using standardized benchmarks. Costs are listed in USD per million tokens.
## Intelligence

Sorted by MMLU Pro score, highest first.

| Model | Provider | Input Cost | Output Cost | MMLU Pro | Humanity's Last Exam |
|---|---|---|---|---|---|
| Gemini 2.5 Pro (reasoning) | Google | $1.30 | $10.00 | 78.4 | 18.2 |
| o1 (reasoning) | OpenAI | $15.00 | $60.00 | 76.2 | 8.0 |
| o3-mini (reasoning) | OpenAI | $1.10 | $4.40 | 72.1 | 13.4 |
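
To see how these per-million-token prices translate into per-request costs, here is a minimal sketch that estimates the price of a single call using the Intelligence table above. The token counts (20,000 input, 2,000 output) and the helper name are hypothetical, chosen only for illustration.

```python
# Minimal sketch: estimating per-request cost from the per-million-token
# prices in the Intelligence table. Token counts are hypothetical examples.

PRICES_PER_MILLION = {
    # model: (input $/1M tokens, output $/1M tokens)
    "Gemini 2.5 Pro": (1.30, 10.00),
    "o1": (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# Example: a 20k-token prompt with a 2k-token response (hypothetical sizes).
for model in PRICES_PER_MILLION:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.4f}")
```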
## Coding

Sorted by Aider Polyglot score, highest first.

| Model | Provider | Input Cost | Output Cost | Aider Polyglot |
|---|---|---|---|---|
| Gemini 2.5 Pro (reasoning) | Google | $1.30 | $10.00 | 72.9 |
| Claude 3.7 Sonnet (reasoning) | Anthropic | $3.00 | $15.00 | 64.9 |
| o1 (reasoning) | OpenAI | $15.00 | $60.00 | 61.7 |
## Other

Sorted by input cost, highest first.

| Model | Provider | Input Cost | Output Cost | Berkeley Function-Calling Leaderboard |
|---|---|---|---|---|
| GPT-4.5 | OpenAI | $75.00 | $150.00 | 62.5 |
| o1 (reasoning) | OpenAI | $15.00 | $60.00 | 67.9 |
| Claude 3.7 Sonnet (reasoning) | Anthropic | $3.00 | $15.00 | 58.3 |
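
To weigh benchmark scores against price, one rough heuristic is score per dollar of output tokens. The sketch below applies it to the function-calling table; using the output price as the denominator is an assumption made for illustration, not part of the leaderboard's methodology.

```python
# Rough sketch: ranking models from the function-calling table by
# benchmark score per dollar of output tokens. The choice of output price
# as the denominator is an illustrative assumption.

models = [
    # (name, input $/1M tokens, output $/1M tokens, BFCL score)
    ("GPT-4.5", 75.00, 150.00, 62.5),
    ("o1", 15.00, 60.00, 67.9),
    ("Claude 3.7 Sonnet", 3.00, 15.00, 58.3),
]

ranked = sorted(models, key=lambda m: m[3] / m[2], reverse=True)
for name, _, output_price, score in ranked:
    print(f"{name}: {score / output_price:.2f} BFCL points per output dollar")
```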