AI API pricing history
Every frontier AI model’s per-token price, current and historical
The cheapest current frontier-tier input price is $0.18 per million tokens — Meta Llama 4 Scout (Together), as of June 3, 2026. That’s a 167× compression against GPT-4’s March 2023 launch price of $30 per million. Across 32 current models from 8 providers, with 81 distinct price points charted from 2020 onward.
As of June 3, 2026. Input pricing only in the headline; output, cached, and batch tiers all live in the table below.
Per-token input price over time
One marker per published price point, color-coded by provider. The dashed line traces the running minimum across the frontier tier — the cheapest input-token price for a flagship-class model at each point in time. Y-axis is log-scale to fit the 6,000× range between $60/M davinci tokens and the sub-penny cache-hit floor.
Hover or tap any dot for the model, the full price (input, output, cached), the effective date, lifecycle status, and the source link.
The frontier price-floor in fourteen moves
Every time the cheapest-frontier-input number on the chart above stepped down. Each entry links into the model’s row on the relevant /ai/<family>/versions/ page for the full release context.
- Jun 11, 2020GPT-3 davinci ships at $60/M input
- Mar 1, 2023GPT-3.5 Turbo at $2/M lands at 10x cheaper than GPT-3
- Mar 14, 2023GPT-4 launches at $30/M input — the 2023 frontier price
- Jul 11, 2023Claude 2 at $11/M with first 100K context
- Mar 4, 2024Claude 3 Haiku at $0.25/M sets a new mainstream floor
- May 13, 2024GPT-4o ships at $5/M input — 6x off GPT-4 launch
- May 6, 2024DeepSeek-V2 at $0.14/M triggers the China price war
- Jul 18, 2024GPT-4o mini at $0.15/M closes the cheap-tier gap
- Sep 24, 2024Gemini 1.5 Pro cuts to $1.25/M — 82% off launch
- Apr 14, 2025GPT-4.1 at $2/M input with 1M context
- Jun 10, 2025o3 reasoning cut to $2/M input — 80% off launch
- Aug 7, 2025GPT-5 at $1.25/M — first OpenAI frontier under $2 input
- Apr 16, 2026Claude Opus 4.7 resets the Opus tier from $15 to $5
- Apr 30, 2026Grok 4.3 ships at $1.25/$2.50 — sharpest output cut yet
Every model’s current price
Sort by any column. Filter by provider, by tier, or to current-only. The Cached column shows the prompt-caching tier where the provider documents one (Anthropic, OpenAI, Google have one; xAI, DeepSeek, Mistral, and Alibaba sit at varying coverage).
Showing all 72 models.
By provider
Each provider’s current flagship, cheapest current input tier, and most-recent price-change log. Δ column shows the per-million input-price change from the same model’s prior price point; “tier hold” means the price held against the predecessor in the same family.
Notes and caveats
Headline tier is API list price. Every figure on this page is the public per-million-token rate the provider documents on its own pricing page. Negotiated enterprise pricing, committed-spend discounts, and special programs (OpenAI startup credits, Anthropic enterprise volume, Google Vertex AI region rates) are out of scope — this is a list-price reference.
Cached input pricing is a third axis that has grown more important since OpenAI introduced automatic prompt caching in October 2024 and Anthropic shipped explicit prompt caching in mid-2024. For workloads with a long stable prefix (agents replaying the same system prompt; RAG against a fixed corpus), the cached input rate dominates the effective per-call cost. The page surfaces it as a column rather than the headline number because non-cached workloads still see the headline rate.
Batch endpoints typically halve both input and output rates (OpenAI Batch, Anthropic Message Batches, Google Batch API) in exchange for asynchronous turnaround up to 24 hours. The row note flags batch availability per model; the headline number is the synchronous-API rate.
Open-weights pricing is a hosting-provider proxy. Meta’s Llama, DeepSeek, Mistral’s Apache-2.0-licensed releases, and Alibaba’s open-weights Qwen sizes have no “official” per-token price — the model is free to download. The row records the Together AI or Groq published rate as a reference. Cheaper rates exist on self-hosted inference (vLLM, sglang) or on cost-optimized providers (Fireworks, DeepInfra); more expensive ones exist on the hyperscalers (AWS Bedrock, Azure AI Foundry).
Reasoning tokens count as output. The o-series, Anthropic’s extended-thinking models, and Gemini’s 2.5 Flash thinking budget all bill reasoning tokens at the output rate. A “cheap-looking” reasoning model can be expensive in practice because each call generates more reasoning than visible output. The page records the per-token rate; the effective per-call cost is workload-dependent.
Tokenizer differences cross-provider. $1 / M tokens does not buy the same amount of text from OpenAI’s o200k tokenizer as it does from Anthropic’s claude-3 tokenizer or Google’s SentencePiece. Anthropic’s 4.7-era tokenizer in particular maps 1.0–1.35× the prior token count for the same input, which compresses the effective price improvement. Compare numbers in tokens, not in characters or pages, and treat them as a same-provider apples-to-apples rather than a strict cross-provider one.
Historical pricing is sourced from Wayback Machine and launch posts. Every effective-date entry links to a primary source — provider blog post, archived pricing page, or release announcement. Where the Wayback snapshot is the only viable cite (the provider has rewritten the page since), the URL points there directly.
About this page
Cross-family comparison page in the /ai/ section. Each row’s price is sourced from the provider’s own pricing page — OpenAI’s openai.com/api/pricing, Anthropic’s anthropic.com/pricing, Google’s ai.google.dev/gemini-api/docs/pricing, xAI’s docs.x.ai, DeepSeek’s api-docs.deepseek.com, Mistral’s mistral.ai/pricing, and Alibaba’s help.aliyun.com. Meta’s Llama models are open-weights; this page cites Together AI’s published rate as the reference hosting-provider price.
The model roster mirrors the per-family pages already on this site — Claude, ChatGPT, Gemini, Grok, Llama, DeepSeek, Mistral, Qwen — so each row links back to the matching version-page entry for the full per-release context. The cross-family context-windows, release-cadence, and benchmarks pages cover the other comparison axes.
Refreshed daily as part of the cross-family /ai/ sweep. Each refresh re-verifies every active row against the provider’s current pricing page; values that changed since the previous run get a new entry appended to the model’s price history and the row’s effective date is bumped. Stale rows for deprecated models are kept as historical record — the price history is the chart, not just the current rate.
Last updated: June 3, 2026. 72 models · 8 providers · 81 price points.