Mungomash LLC
DeepSeek Versions

2023 – 2026

Every DeepSeek release — DeepSeek-LLM (November 2023) through DeepSeek-V4-Pro and V4-Flash (April 2026) — with HuggingFace ids, ship dates, family (Flagship / Reasoning / Specialized), license terms, and the major changes per version. Plus the High-Flyer hedge-fund parentage, the December 2024 / January 2025 V3 / R1 inflection that triggered the largest single-day market-cap loss in U.S. stock-market history, the U.S. chip export-control context, the DeepSeek License vs. MIT evolution, and the April 2026 Tencent / Alibaba funding talks.

Family & status

Family

Flagship — the main DeepSeek chat lineage from DeepSeek-LLM through V4
Reasoning — DeepSeek-R1 and R1-0528; converged into the V-series at V3.1's hybrid mode
Specialized — DeepSeek-Coder, DeepSeek-Math, DeepSeek-VL / VL2, and Janus-Pro

Status

Current — actively recommended; the latest in its family
Available — weights still served via HuggingFace and partner inference providers, but superseded
Legacy — deprecated, experimental and superseded, or no longer recommended

DeepSeek version table

Model
DeepSeek-V4-Pro
deepseek-ai/DeepSeek-V4-Pro
Flagship
Current
Apr 24, 2026
Current frontier flagship. 1.6T total / 49B active MoE. 1M-token context. Dual Thinking / Non-Thinking modes. MIT-licensed.
  • Released April 24, 2026 in public preview; the announcement is at api-docs.deepseek.com/news/news260424. HuggingFace card: deepseek-ai/DeepSeek-V4-Pro.
  • 1.6 trillion total parameters / 49B active per token — the largest publicly released DeepSeek model. Mixture-of-Experts architecture with DeepSeek Sparse Attention productionized from V3.2.
  • 1,000,000-token context window with up to 384K tokens of output; per the API release notes, V4-Pro requires only ~27% of single-token inference FLOPs and ~10% of KV cache compared with V3.2 in the 1M-token setting.
  • Dual Thinking / Non-Thinking modes in a single model, continuing the hybrid-reasoning architecture introduced in V3.1.
  • License: MIT. API pricing at launch: $1.74 / M input tokens, $3.48 / M output tokens. Available at chat.deepseek.com via Expert Mode, and via the OpenAI-compatible API endpoint.
  • Coverage of the launch in CNBC, Simon Willison, and Euronews. The release re-opened the U.S. Department of Commerce inquiry into whether V4 was trained on smuggled Nvidia Blackwell GPUs in violation of export controls (covered in the prose history below).
Model
DeepSeek-V4-Flash
deepseek-ai/DeepSeek-V4-Flash
Flagship
Current
Apr 24, 2026
Fast / cheap V4 companion. 284B total / 13B active MoE. Same 1M context and dual-mode architecture as V4-Pro. MIT-licensed.
  • Released April 24, 2026 alongside V4-Pro; HuggingFace card: deepseek-ai/DeepSeek-V4-Flash.
  • 284B total parameters / 13B active per token; same MoE-with-Sparse-Attention architecture as V4-Pro at a smaller scale, positioned for fast and economical inference.
  • Same 1,000,000-token context window and dual Thinking / Non-Thinking modes as V4-Pro.
  • License: MIT. API pricing at launch: $0.14 / M input tokens, $0.28 / M output tokens — roughly an order of magnitude cheaper than V4-Pro.
  • Available at chat.deepseek.com via Instant Mode, and via the OpenAI-compatible API.
Model
DeepSeek-V3.2 (+ Speciale)
deepseek-ai/DeepSeek-V3.2, deepseek-ai/DeepSeek-V3.2-Speciale
Flagship
Available
Dec 1, 2025
DeepSeek Sparse Attention productionized. 685B total MoE. 128K context. The high-compute Speciale variant claimed IMO and IOI gold medals at this scale.
  • Released December 1, 2025; the announcement is at api-docs.deepseek.com/news/news251201. Technical paper: “DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models” (arXiv 2512.02556).
  • 685B-total-parameter MoE with Multi-Head Latent Attention and the productionized DeepSeek Sparse Attention (DSA) from V3.2-Exp. 128K-token context window.
  • DeepSeek-V3.2-Speciale is the high-compute reasoning variant shipped alongside the standard V3.2; per DeepSeek's release post, Speciale achieved gold-medal performance at the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI), and was claimed to be on par with Gemini 3 Pro on broad reasoning benchmarks.
  • License: MIT. HuggingFace cards: DeepSeek-V3.2, DeepSeek-V3.2-Speciale.
  • Superseded by V4-Pro / V4-Flash four months later; weights remain available for self-host and via hosted inference.
Model
DeepSeek-V3.2-Exp
deepseek-ai/DeepSeek-V3.2-Exp
Flagship
Legacy
Sep 29, 2025
Experimental release. Introduced DeepSeek Sparse Attention (DSA) for long-context efficiency. Superseded by V3.2 stable two months later.
  • Released September 29, 2025 as an explicitly experimental release branched off V3.1; HuggingFace card: deepseek-ai/DeepSeek-V3.2-Exp; GitHub: github.com/deepseek-ai/DeepSeek-V3.2-Exp.
  • Introduced DeepSeek Sparse Attention (DSA) — an efficient attention mechanism that substantially reduces computational complexity in long-context scenarios while preserving model performance. The recipe became the architectural backbone of V3.2 stable and V4.
  • License: MIT. Status is Legacy because the release was explicitly experimental and has been superseded by the stable V3.2 release.
Model
DeepSeek-V3.1 (+ Terminus)
deepseek-ai/DeepSeek-V3.1, deepseek-ai/DeepSeek-V3.1-Terminus
Flagship
Legacy
Aug 21, 2025
Hybrid Thinking / Non-Thinking modes in a single model. 671B / 37B active. 128K context. The architectural convergence of V-series and R-series.
  • DeepSeek-V3.1 released August 21, 2025; the V3.1-Terminus stability / instruction-following update followed on September 22, 2025. Coverage in InfoQ; the API thinking-mode docs are at api-docs.deepseek.com/guides/thinking_mode.
  • Hybrid reasoning architecture — a single model that supports both fast non-thinking mode and a chain-of-thought thinking mode (called DeepSeek-V3.1-Think), governed by tokenizer parameters rather than separate model architectures.
  • 671B total parameters / ~37B active per token; 128K-token context window. Per DeepSeek, the thinking mode delivers quality comparable to DeepSeek-R1-0528 while reducing output tokens by 20–50% on the same tasks.
  • License: MIT. The convergence of the V-series and R-series in a single hybrid model is the architectural reason no further Reasoning-family rows have shipped after R1-0528.
  • HuggingFace cards: DeepSeek-V3.1, DeepSeek-V3.1-Terminus.
Model
DeepSeek-R1-0528
deepseek-ai/DeepSeek-R1-0528
Reasoning
Available
May 28, 2025
R1 update shipped in lieu of the rumored R2. Improved reasoning depth, hallucination rate, and tool use. The last standalone R-series release before V3.1's hybrid convergence.
  • Released May 28, 2025; HuggingFace card: deepseek-ai/DeepSeek-R1-0528.
  • An R1 update, not a DeepSeek-R2. The April–May 2025 rumor cycle had widely predicted an R2 release; Reuters later reported R2 was delayed by data-labelling and chip-availability constraints. R1-0528 shipped instead and was widely read as the substitute.
  • Same 671B / 37B-active MoE architecture as R1, with substantially improved reasoning depth, a lower hallucination rate, and better tool / function calling, per DeepSeek's release notes.
  • License: MIT. Last standalone R-series release; subsequent reasoning capability ships as the “Thinking” mode of the V-series starting at V3.1 (August 2025).
Model
DeepSeek-V3-0324
deepseek-ai/DeepSeek-V3-0324
Flagship
Legacy
Mar 24, 2025
First MIT-relicensed flagship. Improved reasoning, coding, and tool use over V3. Often cited as “DeepSeek-V3.1” informally before V3.1 proper shipped.
  • Released March 24, 2025; HuggingFace card: deepseek-ai/DeepSeek-V3-0324; coverage in SiliconANGLE.
  • First DeepSeek flagship released under the MIT License rather than the bespoke “DeepSeek License.” The shift signaled a structural pivot toward fully open licensing for the V-series; subsequent V3.1 / V3.2 / V4 releases all ship under MIT.
  • Same 671B / 37B-active MoE architecture as V3, with improved reasoning, coding, and tool / function calling per the model card.
  • Often referenced informally as “DeepSeek-V3.1” in third-party coverage during March–August 2025; the proper DeepSeek-V3.1 (the row above) shipped on August 21, 2025 and is architecturally distinct.
Model
Janus-Pro (1B / 7B)
deepseek-ai/Janus-Pro-{1B, 7B}
Specialized
Available
Jan 27, 2025
Unified multimodal understanding-and-generation. SigLIP-L vision encoder. Outperformed DALL-E 3 and SD3 on GenEval at 7B. MIT-licensed.
  • Released January 27, 2025 — the same trading day as the Nvidia stock crash triggered by the broader DeepSeek-R1 narrative (covered in the prose history below). Paper: “Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling”.
  • Unified multimodal architecture — a single model for both image understanding and text-to-image generation, building on the earlier Janus / JanusFlow family. Two sizes shipped (1B and 7B); SigLIP-L vision encoder.
  • Per DeepSeek's release post, Janus-Pro 7B outperformed OpenAI's DALL-E 3 and Stability AI's Stable Diffusion 3 medium on GenEval and DPG-Bench at launch.
  • License: MIT. HuggingFace card: deepseek-ai/Janus-Pro-7B.
Model
DeepSeek-R1 (+ R1-Zero)
deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Zero
Reasoning
Legacy
Jan 20, 2025
Open-weights reasoning model. RL-only training (R1-Zero) demonstrated emergent chain-of-thought. Triggered the largest single-day market-cap loss in U.S. stock-market history a week later.
  • Released January 20, 2025; the announcement is at api-docs.deepseek.com/news/news250120. Paper: “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” (arXiv 2501.12948); Nature follow-up: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning.
  • 671B total / ~37B active per token MoE built on the V3 base, with reasoning capability incentivized through reinforcement learning. Performance was characterized by DeepSeek as comparable to OpenAI's o1 across math, coding, and reasoning benchmarks.
  • DeepSeek-R1-Zero is the load-bearing scientific result — trained with RL only (no supervised fine-tuning at all), R1-Zero exhibited emergent chain-of-thought reasoning, self-verification, and reflection behaviors. R1 itself adds an SFT cold-start to fix the readability and language-mixing issues in R1-Zero outputs.
  • The earlier DeepSeek-R1-Lite preview (November 20, 2024) had been the public hint at the reasoning track; R1 itself was the production release.
  • Distilled smaller variants (1.5B / 7B / 8B / 14B / 32B / 70B based on Qwen and Llama base models) shipped alongside R1; status of the distilled set varies by base model and is tracked on the HuggingFace org.
  • License: MIT. The R1 release week culminated in the January 27, 2025 Nvidia crash described in the prose history below; the broader market-cap impact and the dispute over causation are factual events documented there.

The V3 / R1 inflection — late December 2024 / January 2025. Above this line: every DeepSeek release from R1 (January 20, 2025) onward, all under MIT or transitioning to MIT, shipped after the global-attention moment that DeepSeek-V3 (December 26, 2024) and DeepSeek-R1 created. Below: V3 itself and the pre-inflection lineage — DeepSeek-LLM, the V2 architectural foundation, Coder, Math, VL, VL2 — mostly under the bespoke “DeepSeek License,” quietly building the MoE / MLA recipe that V3 productionized at frontier scale.

Model
DeepSeek-V3
deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-V3-Base
Flagship
Legacy
Dec 26, 2024
671B total / 37B active MoE. 14.8T-token pretraining. Reported $5.576M training cost on 2.788M H800 GPU hours. The frontier-class disclosed-compute moment.
  • Released December 26, 2024 (V3-Base + V3 chat); technical report at arXiv 2412.19437; HuggingFace card: deepseek-ai/DeepSeek-V3.
  • 671 billion total parameters / ~37 billion activated per token in a MoE architecture, with Multi-Head Latent Attention (MLA) and the DeepSeekMoE recipe productionized at frontier scale for the first time. Pretrained on 14.8 trillion tokens.
  • Disclosed training cost of $5.576 million on 2.788 million H800 GPU hours — the figure that reset the public conversation about AI training cost. The number excludes prior-stage research and infrastructure capex (the Fire-Flyer cluster the run depended on); see the prose history below for the dispute.
  • Performance was characterized at launch as competitive with GPT-4o and Claude 3.5 Sonnet on broad benchmarks, at a fraction of the training compute assumed for those models. 128K context window.
  • Originally distributed under the bespoke DeepSeek License for the model weights (an OpenRAIL-derived license with use-based restrictions) and MIT for the code repository; relicensed to MIT for the V3-0324 update three months later.
Model
DeepSeek-VL2 (Tiny / Small / VL2)
deepseek-ai/deepseek-vl2-{tiny, small}, deepseek-ai/deepseek-vl2
Specialized
Available
Dec 13, 2024
First MoE vision-language line. Three sizes (1.0B / 2.8B / 4.5B activated). Dynamic-tiling vision encoder. OCR / chart / document understanding.
  • Released December 13, 2024; paper at arXiv 2412.10302; GitHub: deepseek-ai/DeepSeek-VL2.
  • Three sizes — VL2-Tiny (1.0B activated), VL2-Small (2.8B activated), VL2 (4.5B activated). MoE language tower with MLA, dynamic-tiling vision encoder for variable-aspect-ratio inputs.
  • Targets visual question answering, optical character recognition, document / table / chart understanding, and visual grounding. Strong OCR results on OCRBench at launch.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository. The Janus-Pro line (one row above) is the more recent multimodal flagship.
Model
DeepSeek-V2.5
deepseek-ai/DeepSeek-V2.5, deepseek-ai/DeepSeek-V2.5-1210
Flagship
Legacy
Sep 6, 2024
Merged V2 chat and Coder-V2 into a single general-purpose model. Revised in December 2024 (V2.5-1210). The bridge to V3.
  • Released September 6, 2024; revised December 10, 2024 as V2.5-1210. HuggingFace cards: DeepSeek-V2.5, DeepSeek-V2.5-1210.
  • Merged the V2 chat and Coder-V2 lineages into a single general-purpose model, simplifying the deployment story and serving as the bridge release between V2 and V3.
  • Same 236B / 21B-active MoE architecture as V2, with improved general / coding capability and better instruction following per the release notes.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository.
Model
DeepSeek-Coder-V2 (16B / 236B)
deepseek-ai/DeepSeek-Coder-V2-{Lite-, }Instruct, -Base
Specialized
Available
Jun 17, 2024
First MoE coding model. Two sizes (16B / 236B). 338-language coverage. Reported parity with GPT-4 Turbo / Claude 3 Opus on HumanEval at launch.
  • Released June 17, 2024; HuggingFace collection: DeepSeekCoder-V2.
  • Two sizes — Coder-V2-Lite (16B total / 2.4B active) and Coder-V2 (236B total / 21B active), the latter built on the V2 base. 338-language coverage; 128K context.
  • Reported parity with GPT-4 Turbo and Claude 3 Opus on HumanEval and MBPP at launch — the first open-weights coding model to claim that benchmark range.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository. Superseded as the recommended coding option by general-purpose V3 / V3.1+ instruction tuning by 2025.
Model
DeepSeek-V2
deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Lite
Flagship
Legacy
May 6, 2024
First DeepSeek MoE flagship. 236B total / 21B active. Multi-Head Latent Attention. 42.5% training-cost reduction and 93.3% smaller KV cache vs. DeepSeek-LLM 67B.
  • Released May 6, 2024; paper: “DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model” (arXiv 2405.04434).
  • 236 billion total parameters / 21 billion activated per token in a Mixture-of-Experts architecture; supported a 128K-token context window.
  • Multi-Head Latent Attention (MLA) debuted here — the architectural innovation that compresses KV cache into a low-rank latent vector. MLA + DeepSeekMoE delivered 42.5% training-cost reduction, 93.3% smaller KV cache, and 5.76× maximum throughput versus DeepSeek-LLM 67B per the paper. The architecture is the backbone every subsequent V-series release builds on.
  • A smaller V2-Lite (15.7B total / 2.4B active) companion shipped alongside.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository.

The MoE / MLA architecture turn — May 2024. Above this line: every DeepSeek flagship, reasoning, and specialized release built on the Multi-Head Latent Attention + DeepSeekMoE recipe introduced in V2 (the dense Janus-Pro multimodal line is the one exception). Below: the dense-architecture pre-MoE era — DeepSeek-LLM 7B / 67B, the original DeepSeek-Coder, DeepSeekMath, and DeepSeek-VL — the foundation lineage that established the lab and the Fire-Flyer infrastructure but had not yet found the architecture that would let DeepSeek compete at frontier scale.

Model
DeepSeek-VL (1.3B / 7B)
deepseek-ai/deepseek-vl-{1.3b, 7b}-{base, chat}
Specialized
Legacy
Mar 8, 2024
First DeepSeek vision-language model. Two sizes. SigLIP / SAM-B hybrid vision encoder. Superseded by VL2 nine months later.
  • Released March 8, 2024 as the first DeepSeek vision-language model; HuggingFace org: deepseek-ai.
  • Two sizes (1.3B and 7B). Hybrid vision encoder combining SigLIP and SAM-B for fine-grained image understanding.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository. Superseded as the recommended VL by VL2 (December 2024) and Janus-Pro (January 2025).
Model
DeepSeekMath 7B
deepseek-ai/deepseek-math-7b-{base, instruct, rl}
Specialized
Legacy
Feb 5, 2024
7B math-specialist. Introduced GRPO — Group Relative Policy Optimization — the RL method later used to train DeepSeek-R1.
  • Released February 5, 2024; paper: “DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models” (arXiv 2402.03300).
  • 7B math-specialist built on DeepSeek-Coder-Base; pretrained on a 120B-token math corpus filtered from Common Crawl. Three flavors shipped: base, instruct, and the RL-tuned variant.
  • Introduced Group Relative Policy Optimization (GRPO) — the RL recipe that DeepSeek later used to train DeepSeek-R1's reasoning behavior. The DeepSeekMath paper is the load-bearing methodology citation behind R1.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository.
Model
DeepSeek-Coder (1.3B / 6.7B / 33B)
deepseek-ai/deepseek-coder-{1.3b, 6.7b, 33b}-{base, instruct}
Specialized
Legacy
Nov 2, 2023
First DeepSeek-Coder. Three sizes. 87-language coverage, 16K context. Strong open-weights coding-model competitor to Code Llama and StarCoder at launch.
  • Released November 2, 2023 as the first DeepSeek-Coder; HuggingFace collection: DeepSeek-Coder.
  • Three sizes (1.3B, 6.7B, 33B). 87-language coverage. 16,384-token context window and project-level pretraining for repo-aware completion.
  • Reported strong results vs. Code Llama and StarCoder on HumanEval and MBPP at launch; established DeepSeek's reputation in the coding-model niche before the broader brand became globally cited.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository. Superseded by Coder-V2 (June 2024) and the general-purpose V3 / V3.1 lineage.
Model
DeepSeek-LLM (7B / 67B)
deepseek-ai/deepseek-llm-{7b, 67b}-{base, chat}
Flagship
Legacy
Nov 27, 2023
First DeepSeek general-purpose LLM. Dense architecture. Two sizes. 2T-token pretraining. Grouped-Query Attention. The lab's debut model line.
  • Released November 27, 2023; HuggingFace cards: deepseek-llm-7b-base, deepseek-llm-67b-base, plus -chat variants. GitHub: deepseek-ai/DeepSeek-LLM.
  • Two dense sizes (7B and 67B), pretrained from scratch on 2 trillion tokens in English and Chinese. Grouped-Query Attention at 67B. The lab's first general-purpose LLM line, four months after DeepSeek's July 2023 founding.
  • The 67B chat variant was widely considered competitive with Llama 2 70B and ChatGLM-3 at launch on broad benchmarks.
  • Distributed under the bespoke DeepSeek License for the model weights; MIT for the code repository. Architecturally superseded by the V2 MoE family six months later.

DeepSeek API docs: api-docs.deepseek.com; HuggingFace org: huggingface.co/deepseek-ai; GitHub org: github.com/deepseek-ai.

High-Flyer, Liang Wenfeng, and the July 2023 founding

DeepSeek was founded on July 17, 2023 in Hangzhou, China, by Liang Wenfeng. The lab's parent and sole funder is High-Flyer, the Chinese quantitative hedge fund Liang co-founded in February 2016. Liang serves as CEO of both companies.

Liang's background is unusual for a frontier-AI founder: he studied electrical engineering at Zhejiang University, began trading equities during the 2008 financial crisis as an undergraduate, and built High-Flyer as an AI-driven quant fund. By 2021, High-Flyer was reportedly using AI exclusively for trading decisions and had become one of the largest quantitative funds in China. The fund's profitability is what underwrote the AI-research investment that produced DeepSeek; reporting in Fortune and ChinaTalk covers the trajectory.

DeepSeek was wholly owned by High-Flyer from incorporation through April 2026. The April 2026 reports of Tencent and Alibaba investment talks at a $20B+ valuation (covered in the funding section below) are the company's first external funding round; even at the proposed valuation, Liang's stake remains majority and the High-Flyer relationship intact.

Fire-Flyer 2 and the GPU stockpile

Before DeepSeek existed as an independent entity, High-Flyer had been building the GPU infrastructure the lab would inherit. Liang began acquiring Nvidia GPUs at scale starting in 2021, reportedly building a stockpile of around 10,000 Nvidia A100 chips before the October 7, 2022 U.S. export controls first restricted top-tier AI-GPU exports to China. The pre-controls procurement window is the single most-cited piece of context for why a Chinese lab can train at scale despite the sanctions: the chips were already on the floor before the restriction took effect.

High-Flyer built the Fire-Flyer 2 cluster beginning in 2021 with a reported budget of 1 billion yuan. Per the cluster's published statistics, Fire-Flyer 2 had reached 5,000 PCIe A100 GPUs in 625 nodes with ~96% utilization through 2022, totaling ~56.74 million GPU-hours of capacity used. The cluster was the load-bearing infrastructure for everything DeepSeek shipped from DeepSeek-LLM through V3, and the unstated capital behind the “$5.576M training cost” figure for V3 — that figure is the marginal compute cost of one training run, not the cumulative R&D and infrastructure cost of building the cluster the run depended on. The dispute over whether the headline cost number is misleading hinges on this distinction.

The V3 / R1 inflection — December 2024 / January 2025

DeepSeek-V3 shipped on December 26, 2024 as a 671B-total / 37B-active MoE model with a disclosed training cost of $5.576 million on 2.788 million H800 GPU hours, pretrained on 14.8 trillion tokens. Performance on broad benchmarks at launch was characterized as competitive with GPT-4o and Claude 3.5 Sonnet. The combination — frontier-adjacent quality, fully open weights, and an order-of-magnitude-lower disclosed compute number — was the data point the AI-infrastructure capex thesis had not previously had to absorb. The technical report is at arXiv 2412.19437.

DeepSeek-R1 followed on January 20, 2025, built on the V3 base, with reasoning capability incentivized through reinforcement learning. The accompanying R1-Zero result — trained with RL only and no supervised fine-tuning — demonstrated emergent chain-of-thought reasoning, self-verification, and reflection behaviors, and was the load-bearing scientific claim of the release. Performance was characterized as comparable to OpenAI's o1 across math, coding, and reasoning benchmarks. The paper is at arXiv 2501.12948; a Nature follow-up published August 14, 2025 is at nature.com/articles/s41586-025-09422-z.

On January 27, 2025, the public-equity reaction to the R1 narrative produced what was at the time the largest single-day market-cap loss by a single company in U.S. stock-market history: Nvidia fell ~17% and shed approximately $589 billion in market capitalization in one trading session. The Nasdaq fell ~3% on the day, and AI-infrastructure names (Broadcom, Marvell, Vertiv, Constellation Energy) sold off in sympathy. Coverage: CNBC, Yahoo Finance.

The causation is disputed. Tim Lee at Understanding AI argued that the move was already in motion before the R1 release week, that the disclosed-compute figure excluded prior R&D and the Fire-Flyer build, and that an alternative reading is “efficiency gains expand the inference-compute market faster than they shrink the training-compute market” (the Jevons-paradox response that Microsoft, Meta, and Google all subsequently adopted publicly). DeepSeek did not retract the disclosed-compute figure; the dispute is over the framing, not the number.

The U.S. chip export-control context

DeepSeek's training infrastructure has been scrutinized by U.S. policymakers since the R1 release week. The October 7, 2022 U.S. Department of Commerce export controls restricted top-tier AI-GPU exports to China; Nvidia subsequently produced the H800, a deliberately degraded H100 variant designed to fall under the export-control thresholds, which it sold legally to Chinese customers including DeepSeek. The H800 was banned in turn in October 2023, but the year-long gap between the original control and the H800 ban was sufficient for DeepSeek to procure the chips it disclosed using to train V3 in 2024. Coverage at CSIS, RAND.

Following the R1 release, the U.S. Department of Commerce opened an inquiry into whether DeepSeek had used U.S. chips not legally exportable to China. In February 2025, the chair and ranking member of the House Select Committee on the Chinese Communist Party, John Moolenaar and Raja Krishnamoorthi, issued a public call to tighten the existing controls. Through 2025, several U.S. state governments and federal agencies banned the DeepSeek consumer chatbot on government devices on data-handling grounds (the same regime that had been applied to TikTok); the bans do not apply to the open-weights releases on HuggingFace, which can be self-hosted on Western infrastructure.

The April 2026 V4 release re-opened the chip-controversy docket. Reporting in 2026 alleged that the V4 training run used clusters of Nvidia Blackwell B200 GPUs — a chip class that is comprehensively export-controlled to China — reportedly housed at a data center in Inner Mongolia. The U.S. Department of Commerce investigation into the alleged Blackwell smuggling is open as of this page's publication date; DeepSeek has not publicly confirmed the chips it used to train V4.

The DeepSeek License vs. MIT — the licensing turn

DeepSeek's licensing has evolved across two distinct conventions. From DeepSeek-LLM (November 2023) through DeepSeek-V3 (December 2024), the model weights shipped under the bespoke “DeepSeek License” — an OpenRAIL-derived custom license with use-based restrictions (military, surveillance, deceptive content, certain weapons applications) and a separate commercial-license track. The associated GitHub source code repos shipped under MIT separately. This is the same code-vs-weights split Meta uses for the Llama lineage, but DeepSeek's bespoke license is differently shaped and was not OSI-approved. Black Duck's model-license review from January 2025 walks through the original terms.

The licensing turn came with DeepSeek-R1 (January 20, 2025), the first DeepSeek model released under the MIT License. DeepSeek-V3-0324 (March 24, 2025) re-released the V3 weights under MIT, bringing the flagship V-series under MIT as well. Every subsequent V-series and R-series release — R1-0528, V3.1, V3.1-Terminus, V3.2-Exp, V3.2, V3.2-Speciale, V4-Pro, V4-Flash — has shipped under MIT, as has Janus-Pro (January 27, 2025).

The pre-R1 specialized models (Coder, Coder-V2, Math, VL, VL2) remain on the original DeepSeek License for the model weights as of this page's publication date; whether DeepSeek will retroactively relicense the older specialized weights to MIT is an open question. For new builds, the practical guidance is “everything from R1 forward is MIT; the older specialized models retain the use-restriction terms of the DeepSeek License.” Read the LICENSE-MODEL file in the relevant GitHub repo before shipping at scale.

The April 2026 Tencent / Alibaba funding talks

Through April 2026, DeepSeek had raised no external capital — it was funded entirely by High-Flyer's profits since the July 2023 incorporation. On April 22, 2026, Bloomberg and The Information reported that Tencent and Alibaba were in talks to invest a combined ~$1.8 billion at a $20 billion+ valuation — the company's first external funding round.

Per the reporting, Tencent had proposed acquiring up to a 20% stake but DeepSeek was reluctant to cede that much control; Alibaba's role was reportedly smaller. As of this page's April 28, 2026 publication date, the round had not closed. The funding talks landed two days before the V4 release, and the timing has been read as DeepSeek positioning itself for a frontier-scale 2026 capex push that High-Flyer alone could not underwrite.

Where to run DeepSeek

DeepSeek is widely deployed because the weights are open and the API is OpenAI-compatible. Inference paths through 2025–2026 break into four categories.

DeepSeek's own API. The first-party endpoint at api-docs.deepseek.com is OpenAI-API-compatible, so any OpenAI SDK can be pointed at it with only a base-URL change. Pricing has historically been an order of magnitude cheaper than Western frontier-model APIs (V4-Flash at $0.14 / M input tokens at launch).
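
For reference, a minimal sketch of that base-URL change using the OpenAI Python SDK — assuming the openai package is installed and a DEEPSEEK_API_KEY environment variable; the model strings follow the identifier list further down:

# A minimal sketch — the OpenAI Python SDK pointed at DeepSeek's endpoint.
# Assumes the openai package is installed and DEEPSEEK_API_KEY is set.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # the only change versus a stock OpenAI call
)

response = client.chat.completions.create(
    model="deepseek-chat",  # "deepseek-reasoner" selects the reasoning / thinking line
    messages=[{"role": "user", "content": "Hello, DeepSeek."}],
)
print(response.choices[0].message.content)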

Self-host from HuggingFace. Download from the deepseek-ai org and run with vLLM, SGLang, llama.cpp, or Ollama. The full V3 / V3.1 / V3.2 / V4-Pro models require multi-node H100 / H200 / B200 deployments at full precision; quantized variants ship from the open-source community shortly after each release.
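
For illustration, a minimal self-host sketch with vLLM — shown with the small distilled R1 checkpoint from the R1 row above so it fits on a single GPU; the checkpoint and prompt are illustrative choices, not a recommendation:

# Self-host sketch with vLLM, using the small R1 distill so it runs on one GPU;
# the full V3 / V4 flagships need the multi-node deployments described above.
# Assumes the vllm package is installed and a CUDA GPU with enough memory.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
params = SamplingParams(temperature=0.6, max_tokens=256)

outputs = llm.generate(["Explain Mixture-of-Experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)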

Hosted-inference providers. Together AI, Fireworks AI, OpenRouter, SiliconFlow, Groq, Perplexity's public-API tier. Most providers serve the post-MIT weights (R1 forward) and clearly label which version is hosted.

Hyperscalers. AWS Bedrock, Microsoft Azure AI Foundry, NVIDIA NIM, IBM watsonx, and Oracle OCI have all added DeepSeek SKUs across 2025–2026 (typically the MIT-licensed R1, V3.1, V3.2, V4 lineage). Google Cloud Vertex has been slower to add DeepSeek; check the providers' model catalogs for the current state.

People who shaped DeepSeek

Liang Wenfeng — founder and CEO of DeepSeek, co-founder and CEO of High-Flyer. The 2021 GPU-stockpile decision, the July 2023 DeepSeek incorporation, the V2 MoE / MLA bet, and the 2025 MIT-licensing turn all trace through Liang's office. Profiled in Fortune; on Wikipedia at Liang Wenfeng.

High-Flyer (Hangzhou Huanfang Technology Co., Ltd.) — the parent quantitative hedge fund. Co-founded by Liang in February 2016; reported to be using AI exclusively for trading by 2021. The funder of the Fire-Flyer 2 cluster and DeepSeek's only investor through April 2026.

DeepSeek's research staff — the lab is known for an unusually flat structure, a young research team (many recent PhD graduates from Tsinghua, Peking University, and Zhejiang University), and a publication culture that ships technical reports alongside model releases. Named-author rosters appear on the V2 / V3 / R1 / V3.2 papers on arXiv. Several core researchers have reportedly been recruited away to ByteDance, Tencent, Xiaomi, and the autonomous-driving company Yuanrong Qihang during 2025; named-departure tracking is sparse compared to U.S. labs.

No publicly named CTO, second-in-command, or board. Unlike OpenAI, Anthropic, xAI, and Google DeepMind, DeepSeek does not maintain a leadership page; corporate governance sits inside the Liang-led High-Flyer / DeepSeek structure. The April 2026 Tencent / Alibaba round is the company's first external-investor relationship and may produce a board structure when it closes.

The competitive landscape

DeepSeek is, alongside Alibaba's Qwen line, one of the two dominant Chinese open-weights AI families through 2025–2026. The closest direct comparators on the open-weights axis are Alibaba's Qwen (also Apache-2.0-or-permissive across most releases, with strong HuggingFace-leaderboard presence — see Qwen Versions), Mistral (French; mixed Apache 2.0 / Mistral Research License / proprietary tiers across the line, with the December 2025 “Mistral 3” family relaunch re-committing the open releases to Apache 2.0 — see Mistral Versions), Meta's Llama (custom Llama Community License, see Llama Versions), and Moonshot AI's Kimi line. The closed-weights frontier competitors — ChatGPT, Claude, Gemini, Grok — are the practical benchmark for “is DeepSeek competitive at frontier scale,” which is the question the V3 / R1 / V3.2 / V4 release cycle has been answering in the affirmative since December 2024. This page does not attempt a benchmark roundup or a ranking.

Use DeepSeek

There is no way to auto-detect which DeepSeek model you've used — no fingerprint or header exposes it. The block below carries the practical information instead: the current model identifiers, a copy-paste API call, the surfaces where DeepSeek is available, and the licensing summary.

Current model identifiers

DeepSeek API model strings on the left; HuggingFace ids on the deepseek-ai org on the right. Verify against api-docs.deepseek.com and huggingface.co/deepseek-ai for the freshest list.

# V4 — current frontier line (April 2026)
deepseek-chat       # maps to V4-Flash via the OpenAI-compatible API
deepseek-reasoner   # maps to V4-Pro
deepseek-ai/DeepSeek-V4-Pro
deepseek-ai/DeepSeek-V4-Flash

# V3.2 — still widely served (December 2025)
deepseek-ai/DeepSeek-V3.2
deepseek-ai/DeepSeek-V3.2-Speciale

# V3.1 + Terminus — hybrid Thinking / Non-Thinking architecture (August/September 2025)
deepseek-ai/DeepSeek-V3.1
deepseek-ai/DeepSeek-V3.1-Terminus

# R1 line — reasoning
deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-R1-0528

# Specialized — Coder, Math, multimodal
deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-ai/Janus-Pro-7B

Quick API call (OpenAI-compatible)

DeepSeek's API endpoint is OpenAI-API-compatible — point any OpenAI SDK at the DeepSeek base URL with a DeepSeek API key. Replace the placeholder values before running.

$ curl https://api.deepseek.com/chat/completions \
    -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model":    "deepseek-chat",
      "messages": [{ "role": "user", "content": "Hello, DeepSeek." }]
    }'

Where to run DeepSeek

Four categories — DeepSeek's own API, self-host from HuggingFace, hosted-inference providers, and hyperscalers. Pricing varies by orders of magnitude; the open weights are the same across all of them.

# DeepSeek first-party
https://chat.deepseek.com/                  # consumer chat (Expert / Instant modes)
https://api-docs.deepseek.com/              # OpenAI-compatible API

# Self-host from HuggingFace
https://huggingface.co/deepseek-ai          # every model card lives here
https://github.com/vllm-project/vllm        # production-grade throughput
https://github.com/sgl-project/sglang
https://github.com/ggerganov/llama.cpp      # CPU + GPU, edge-friendly
https://ollama.com/                         # single-binary, easiest entry

# Hosted-inference providers
https://www.together.ai/
https://fireworks.ai/
https://openrouter.ai/
https://groq.com/
https://www.siliconflow.com/

# Hyperscalers
AWS Bedrock, Azure AI Foundry, NVIDIA NIM, IBM watsonx, Oracle OCI

Licensing

DeepSeek transitioned from a bespoke “DeepSeek License” on the model weights to MIT starting with R1 (January 2025). Read the LICENSE-MODEL file in the relevant GitHub repo before shipping at scale.

# MIT-licensed (R1 forward, January 2025+)
DeepSeek-R1, R1-0528, R1 distilled
DeepSeek-V3-0324, V3.1, V3.1-Terminus, V3.2-Exp, V3.2, V3.2-Speciale
DeepSeek-V4-Pro, V4-Flash
Janus-Pro 1B / 7B

# Bespoke "DeepSeek License" on model weights, MIT on code
DeepSeek-LLM 7B / 67B
DeepSeek-Coder, Coder-V2
DeepSeekMath
DeepSeek-VL, VL2
DeepSeek-V2, V2.5
DeepSeek-V3 (original December 2024 release; relicensed to MIT as V3-0324)

# LICENSE-MODEL files live in each GitHub repo
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL
https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL
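
As a quick programmatic cross-check — not a substitute for reading LICENSE-MODEL — the license tag of any deepseek-ai repo can also be read from the HuggingFace Hub. A sketch, assuming the huggingface_hub package is installed:

# Sketch: read the license tag of a few deepseek-ai repos from the HuggingFace Hub.
# Assumes the huggingface_hub package is installed; repo ids are from the lists above.
from huggingface_hub import model_info

for repo in [
    "deepseek-ai/DeepSeek-R1",
    "deepseek-ai/DeepSeek-V3",
    "deepseek-ai/deepseek-llm-67b-base",
]:
    tags = model_info(repo).tags
    license_tags = [t for t in tags if t.startswith("license:")]
    print(repo, license_tags)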