AI API pricing history

Every frontier AI model’s per-token price, current and historical

The cheapest current frontier-tier input price is $0.18 per million tokens — Meta Llama 4 Scout (Together), as of July 12, 2026. That’s a 167× compression against GPT-4’s March 2023 launch price of $30 per million. Across 44 current models from 8 providers, with 106 distinct price points charted from 2020 onward.

As of July 12, 2026. Input pricing only in the headline; output, cached, and batch tiers all live in the table below.

Cheapest current frontier

$0.18 / $0.59

Llama 4 Scout (Together) (in / out per M)

Cheapest current small-tier

$0.10

Gemini 2.5 Flash-Lite input / M

2023 baseline (GPT-4)

$30.00

input / M at launch

Compression vs GPT-4

~167×

cheaper today

Per-token input price over time

One marker per published price point, color-coded by provider. The dashed line traces the running minimum across the frontier tier — the cheapest input-token price for a flagship-class model at each point in time. Y-axis is log-scale to fit the 6,000× range between $60/M davinci tokens and the sub-penny cache-hit floor.

AnthropicOpenAIGooglexAIMetaDeepSeekMistralAlibaba

Hover or tap any dot for the model, the full price (input, output, cached), the effective date, lifecycle status, and the source link.

The frontier price-floor in fourteen moves

Every time the cheapest-frontier-input number on the chart above stepped down. Each entry links into the model’s row on the relevant /ai/<family>/versions/ page for the full release context.

Jun 11, 2020GPT-3 davinci ships at $60/M input
Mar 1, 2023GPT-3.5 Turbo at $2/M lands at 10x cheaper than GPT-3
Mar 14, 2023GPT-4 launches at $30/M input — the 2023 frontier price
Jul 11, 2023Claude 2 at $11/M with first 100K context
Mar 4, 2024Claude 3 Haiku at $0.25/M sets a new mainstream floor
May 13, 2024GPT-4o ships at $5/M input — 6x off GPT-4 launch
May 6, 2024DeepSeek-V2 at $0.14/M triggers the China price war
Jul 18, 2024GPT-4o mini at $0.15/M closes the cheap-tier gap
Sep 24, 2024Gemini 1.5 Pro cuts to $1.25/M — 82% off launch
Apr 14, 2025GPT-4.1 at $2/M input with 1M context
Jun 10, 2025o3 reasoning cut to $2/M input — 80% off launch
Aug 7, 2025GPT-5 at $1.25/M — first OpenAI frontier under $2 input
Nov 24, 2025Claude Opus 4.5 resets the Opus tier from $15 to $5
Apr 30, 2026Grok 4.3 ships at $1.25/$2.50 — sharpest output cut yet

By provider

Each provider’s current flagship, cheapest current input tier, and most-recent price-change log. Δ column shows the per-million input-price change from the same model’s prior price point; “tier hold” means the price held against the predecessor in the same family.

Anthropic

Claude family

Versions →

Current flagship

$3.00 / $15.00

Claude Sonnet 5

Cheapest current input

$1.00 / M

Claude Haiku 4.5

Recent price-change log

EffectiveModelIn / OutΔ

Jun 30, 2026Claude Sonnet 5$3.00 / $15.00launch

Jun 9, 2026Claude Fable 5$10.00 / $50.00launch

May 28, 2026Claude Opus 4.8$5.00 / $25.00launch

Apr 16, 2026Claude Opus 4.7$5.00 / $25.00launch

Feb 17, 2026Claude Sonnet 4.6$3.00 / $15.00launch

Feb 5, 2026Claude Opus 4.6$5.00 / $25.00launch

Nov 24, 2025Claude Opus 4.5$5.00 / $25.00launch

OpenAI

ChatGPT family

Versions →

Current flagship

$5.00 / $30.00

GPT-5.6 Sol

Cheapest current input

$0.15 / M

GPT-4o mini

Recent price-change log

EffectiveModelIn / OutΔ

Jul 9, 2026GPT-5.6 Sol$5.00 / $30.00launch

Jun 5, 2026GPT-5.4$2.50 / $15.00tier hold

Apr 23, 2026GPT-5.5$5.00 / $30.00launch

Mar 5, 2026GPT-5.4$2.50 / $10.00launch

Feb 5, 2026GPT-5.3$1.75 / $14.00launch

Dec 11, 2025GPT-5.2$1.75 / $14.00launch

Nov 13, 2025GPT-5.1$1.25 / $10.00launch

Google

Gemini family

Versions →

Current flagship

$1.50 / $9.00

Gemini 3.5 Flash

Cheapest current input

$0.10 / M

Gemini 2.5 Flash-Lite

Recent price-change log

EffectiveModelIn / OutΔ

May 19, 2026Gemini 3.1 Pro$2.00 / $12.00↑ 60%

May 19, 2026Gemini 3.5 Flash$1.50 / $9.00launch

Mar 3, 2026Gemini 3.1 Flash-Lite$0.25 / $1.50launch

Feb 19, 2026Gemini 3.1 Pro$1.25 / $10.00launch

Dec 17, 2025Gemini 3 Flash$0.50 / $3.00launch

Nov 18, 2025Gemini 3 Pro$1.25 / $10.00launch

Jul 22, 2025Gemini 2.5 Flash-Lite$0.10 / $0.40launch

xAI

Grok family

Versions →

Current flagship

$2.00 / $6.00

Grok 4.5

Cheapest current input

$0.50 / M

Grok Composer 2.5

Recent price-change log

EffectiveModelIn / OutΔ

Jul 8, 2026Grok 4.5$2.00 / $6.00launch

Jun 14, 2026Grok 4.20$1.25 / $2.50↓ 38%

Jun 10, 2026Grok 4.3$1.25 / $2.50tier hold

Jun 1, 2026Grok Composer 2.5$0.50 / $2.50launch

May 14, 2026Grok Build 0.1$1.00 / $2.00launch

Apr 30, 2026Grok 4.3$1.25 / $2.50launch

Mar 10, 2026Grok 4.20$2.00 / $6.00launch

Notes and caveats

Headline tier is API list price. Every figure on this page is the public per-million-token rate the provider documents on its own pricing page. Negotiated enterprise pricing, committed-spend discounts, and special programs (OpenAI startup credits, Anthropic enterprise volume, Google Vertex AI region rates) are out of scope — this is a list-price reference.

Cached input pricing is a third axis that has grown more important since OpenAI introduced automatic prompt caching in October 2024 and Anthropic shipped explicit prompt caching in mid-2024. For workloads with a long stable prefix (agents replaying the same system prompt; RAG against a fixed corpus), the cached input rate dominates the effective per-call cost. The page surfaces it as a column rather than the headline number because non-cached workloads still see the headline rate.

Batch endpoints typically halve both input and output rates (OpenAI Batch, Anthropic Message Batches, Google Batch API) in exchange for asynchronous turnaround up to 24 hours. The row note flags batch availability per model; the headline number is the synchronous-API rate.

Open-weights pricing is a hosting-provider proxy. Meta’s Llama, DeepSeek, Mistral’s Apache-2.0-licensed releases, and Alibaba’s open-weights Qwen sizes have no “official” per-token price — the model is free to download. The row records the Together AI or Groq published rate as a reference. Cheaper rates exist on self-hosted inference (vLLM, sglang) or on cost-optimized providers (Fireworks, DeepInfra); more expensive ones exist on the hyperscalers (AWS Bedrock, Azure AI Foundry).

Reasoning tokens count as output. The o-series, Anthropic’s extended-thinking models, and Gemini’s 2.5 Flash thinking budget all bill reasoning tokens at the output rate. A “cheap-looking” reasoning model can be expensive in practice because each call generates more reasoning than visible output. The page records the per-token rate; the effective per-call cost is workload-dependent.

Tokenizer differences cross-provider. $1 / M tokens does not buy the same amount of text from OpenAI’s o200k tokenizer as it does from Anthropic’s claude-3 tokenizer or Google’s SentencePiece. Anthropic’s 4.7-era tokenizer in particular maps 1.0–1.35× the prior token count for the same input, which compresses the effective price improvement. Compare numbers in tokens, not in characters or pages, and treat them as a same-provider apples-to-apples rather than a strict cross-provider one.

Historical pricing is sourced from Wayback Machine and launch posts. Every effective-date entry links to a primary source — provider blog post, archived pricing page, or release announcement. Where the Wayback snapshot is the only viable cite (the provider has rewritten the page since), the URL points there directly.

About this page

Cross-family comparison page in the /ai/ section. Each row’s price is sourced from the provider’s own pricing page — OpenAI’s openai.com/api/pricing, Anthropic’s anthropic.com/pricing, Google’s ai.google.dev/gemini-api/docs/pricing, xAI’s docs.x.ai, DeepSeek’s api-docs.deepseek.com, Mistral’s mistral.ai/pricing, and Alibaba’s help.aliyun.com. Meta’s Llama models are open-weights; this page cites Together AI’s published rate as the reference hosting-provider price.

The model roster mirrors the per-family pages already on this site — Claude, ChatGPT, Gemini, Grok, Llama, DeepSeek, Mistral, Qwen — so each row links back to the matching version-page entry for the full per-release context. The cross-family context-windows, release-cadence, and benchmarks pages cover the other comparison axes.

Refreshed daily as part of the cross-family /ai/ sweep. Each refresh re-verifies every active row against the provider’s current pricing page; values that changed since the previous run get a new entry appended to the model’s price history and the row’s effective date is bumped. Stale rows for deprecated models are kept as historical record — the price history is the chart, not just the current rate.

Last updated: July 12, 2026. 89 models · 8 providers · 106 price points.

Every frontier AI model’s per-token price, current and historical

Per-token input price over time

The frontier price-floor in fourteen moves

Every model’s current price

By provider

Notes and caveats

About this page