# Model Comparison
Compare performance and cost across AI models using standardized benchmarks. Costs are listed in USD per million tokens.
## Intelligence

Sorted by MMLU Pro score, highest first.

| Model | Provider | Input Cost | Output Cost | MMLU Pro | Humanity's Last Exam |
|---|---|---|---|---|---|
| Gemini 2.5 Pro (reasoning) | Google | $1.30 | $10.00 | 78.4 | 18.2 |
| o1 (reasoning) | OpenAI | $15.00 | $60.00 | 76.2 | 8.0 |
| o3-mini (reasoning) | OpenAI | $1.10 | $4.40 | 72.1 | 13.4 |
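
To see how these per-million-token prices translate into per-request costs, here is a minimal sketch that estimates the price of a single call using the Intelligence table above. The token counts (20,000 input, 2,000 output) and the helper name are hypothetical, chosen only for illustration.

```python
# Minimal sketch: estimating per-request cost from the per-million-token
# prices in the Intelligence table. Token counts are hypothetical examples.

PRICES_PER_MILLION = {
    # model: (input $/1M tokens, output $/1M tokens)
    "Gemini 2.5 Pro": (1.30, 10.00),
    "o1": (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# Example: a 20k-token prompt with a 2k-token response (hypothetical sizes).
for model in PRICES_PER_MILLION:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.4f}")
```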
## Coding

Sorted by Aider Polyglot score, highest first.

| Model | Provider | Input Cost | Output Cost | Aider Polyglot |
|---|---|---|---|---|
| Gemini 2.5 Pro (reasoning) | Google | $1.30 | $10.00 | 72.9 |
| Claude 3.7 Sonnet (reasoning) | Anthropic | $3.00 | $15.00 | 64.9 |
| o1 (reasoning) | OpenAI | $15.00 | $60.00 | 61.7 |
## Other

Sorted by input cost, highest first.

| Model | Provider | Input Cost | Output Cost | Berkeley Function-Calling Leaderboard |
|---|---|---|---|---|
| GPT-4.5 | OpenAI | $75.00 | $150.00 | 62.5 |
| o1 (reasoning) | OpenAI | $15.00 | $60.00 | 67.9 |
| Claude 3.7 Sonnet (reasoning) | Anthropic | $3.00 | $15.00 | 58.3 |
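
To weigh benchmark scores against price, one rough heuristic is score per dollar of output tokens. The sketch below applies it to the function-calling table; using the output price as the denominator is an assumption made for illustration, not part of the leaderboard's methodology.

```python
# Rough sketch: ranking models from the function-calling table by
# benchmark score per dollar of output tokens. The choice of output price
# as the denominator is an illustrative assumption.

models = [
    # (name, input $/1M tokens, output $/1M tokens, BFCL score)
    ("GPT-4.5", 75.00, 150.00, 62.5),
    ("o1", 15.00, 60.00, 67.9),
    ("Claude 3.7 Sonnet", 3.00, 15.00, 58.3),
]

ranked = sorted(models, key=lambda m: m[3] / m[2], reverse=True)
for name, _, output_price, score in ranked:
    print(f"{name}: {score / output_price:.2f} BFCL points per output dollar")
```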