AI context windows
Every model’s context window, in one place
Largest current context window: 10,000,000 tokens — Meta Llama 4 Scout. Across 103 models from 8 providers.
As of May 9, 2026.
Maximum context window over time
One marker per public model release, color-coded by provider. The dashed line traces the running maximum across all providers — the “largest context window in production” at each point in time. The y-axis is log-scaled so that the roughly 5,000× range remains legible.
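The running-maximum trace described above is a simple fold over releases sorted by date. A minimal sketch, using a few illustrative rows rather than the page’s actual dataset:

```python
from datetime import date

# Illustrative (model, release date, context window) rows --
# not the page's real dataset, just enough to show the fold.
releases = [
    ("gpt-4", date(2023, 3, 14), 8_192),
    ("claude-2", date(2023, 7, 11), 100_000),
    ("gemini-1.5-pro", date(2024, 2, 15), 1_000_000),
    ("llama-4-scout", date(2025, 4, 5), 10_000_000),
]

# Sort by release date, then carry the largest window seen so far;
# each (date, best) pair is one point on the dashed line.
running_max = []
best = 0
for name, day, window in sorted(releases, key=lambda r: r[1]):
    best = max(best, window)
    running_max.append((day, best))

print(running_max[-1])  # the current record point
```

Plotted on a log-scaled y-axis, these points produce the step-shaped dashed line the chart shows.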
Every model
Sort by any column. Filter by provider or by minimum context window.
Showing all 103 models.
By provider
Each provider’s context-window history at a glance. The current maximum and the lifetime maximum may differ when a provider has rolled an early extended-context experiment back into a smaller production window.
Notes and caveats
Effective vs. nominal context. Several long-context models advertise large windows but degrade past a certain length on real-world tasks — this page records the documented nominal capacity, not benchmark-measured effective length. The latter is benchmark-dependent and out of scope per the section’s no-benchmarks rule.
Input vs. output limits. Most providers document a single “context window” that includes both prompt and response tokens. A few (notably the OpenAI o-series and the Anthropic Claude 4 generation) document a separate output-token cap. Where separately documented, the Output column shows it; otherwise the row treats the single documented window as the shared budget for prompt and response.
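The budgeting rule above can be sketched as a small helper. This is an illustrative function with made-up numbers, not any provider’s API:

```python
def fits(prompt_tokens, max_output_tokens, context_window, output_cap=None):
    """Check a request against a shared context window, with an optional
    separately documented output-token cap (illustrative helper)."""
    # Some providers cap output tokens separately from the window.
    if output_cap is not None and max_output_tokens > output_cap:
        return False
    # Most providers count prompt + response against one shared window.
    return prompt_tokens + max_output_tokens <= context_window

# Hypothetical 200k shared window with a 64k output cap:
print(fits(150_000, 50_000, 200_000, output_cap=64_000))  # True
print(fits(150_000, 70_000, 200_000, output_cap=64_000))  # False: over the output cap
```

When no separate cap is documented, only the shared-window check applies, which is exactly how the table’s rows treat a single documented number.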
Beta and tier-gated context. Some providers ship a default context size for the standard API and a larger one behind a beta flag, batch endpoint, or paid tier. The headline number on this page is the standard-API value documented as generally available; the per-row notes call out when a beta or tier-gated extended window exists.
Open-weights inference. For open-weights models (Llama, DeepSeek, Mistral, Qwen) the “context window” is the value the model card claims; serving infrastructure (vLLM, Together, Fireworks, Hugging Face Inference) often caps the deployed window lower for memory reasons. Always check the specific endpoint’s docs before relying on the full nominal window.
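As one concrete example of a serving-side cap, vLLM exposes a `--max-model-len` flag that bounds the deployed window regardless of what the model card claims. The command shape is vLLM’s; the model id and cap value here are illustrative:

```shell
# Serve an open-weights model with the context window capped well below
# the model card's nominal value, to fit available GPU memory.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --max-model-len 131072
```

Requests longer than the configured length are rejected by the endpoint even though the weights nominally support more, which is why the endpoint’s docs, not the model card, are authoritative.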
Tokenizer differences. One token is not a fixed unit across providers. OpenAI’s o200k tokenizer, Anthropic’s tokenizer, Google’s SentencePiece, and Meta’s tiktoken-derived tokenizers all produce different token counts for identical text. Compare context windows in tokens, not in characters or pages, and treat the numbers as strictly comparable only within a provider rather than across providers.
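The effect is easy to demonstrate with two toy tokenizers — crude word-level and character-level stand-ins, not any provider’s real scheme: the same text yields very different token counts, so the same nominal window buys different amounts of text.

```python
text = "Context windows are measured in tokens, not characters."

# Two toy tokenizers standing in for real, incompatible schemes.
word_tokens = text.split()   # crude word-level tokenization
char_tokens = list(text)     # crude character-level tokenization

print(len(word_tokens))  # 8
print(len(char_tokens))  # 55
```

Real provider tokenizers fall between these extremes, but the disagreement is the point: a “128k window” is a provider-specific unit, not a universal amount of text.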
About this page
Cross-family comparison page in the /ai/ section. Each row’s context-window value is sourced from the provider’s own model documentation — OpenAI’s platform.openai.com/docs/models, Anthropic’s docs.claude.com, Google’s ai.google.dev, xAI’s docs.x.ai, Meta’s llama.com and huggingface.co/meta-llama, DeepSeek’s api-docs.deepseek.com, Mistral’s docs.mistral.ai, and Alibaba’s help.aliyun.com/zh/dashscope.
The model roster mirrors the per-family pages already on this site — Claude, ChatGPT, Gemini, Grok, Llama, DeepSeek, Mistral, Qwen — so each row links back to the matching version-page entry for the full per-release context.
Refreshed monthly. Each refresh re-verifies every row against the provider’s current documentation; values that changed since the previous run are updated and the row’s “as of” date is bumped. See release cadence for the cross-family ship-cadence picture this page complements.
Last updated: May 9, 2026. 103 models · 8 providers.