2023 – 2026
DeepSeek Versions
Every DeepSeek release — DeepSeek-LLM (November 2023) through DeepSeek-V4-Pro and V4-Flash (April 2026) — with HuggingFace IDs, ship dates, family (Flagship / Reasoning / Specialized), license terms, and the major changes per version. Plus the High-Flyer hedge-fund parentage, the December 2024 / January 2025 V3 / R1 inflection that triggered the largest single-day market-cap loss in U.S. stock-market history, the U.S. chip export-control context, the DeepSeek License vs. MIT evolution, and the April 2026 Tencent / Alibaba funding talks.
High-Flyer, Liang Wenfeng, and the July 2023 founding
DeepSeek was founded on July 17, 2023 in Hangzhou, China, by Liang Wenfeng. The lab's parent and sole funder is High-Flyer, the Chinese quantitative hedge fund Liang co-founded in February 2016. Liang serves as CEO of both companies.
Liang's background is unusual for a frontier-AI founder: he studied electrical engineering at Zhejiang University, began trading equities during the 2008 financial crisis as an undergraduate, and built High-Flyer as an AI-driven quant fund. By 2021, High-Flyer was reportedly using AI exclusively for trading decisions and had become one of the largest quantitative funds in China. The fund's profitability is what underwrote the AI-research investment that produced DeepSeek; reporting in Fortune and ChinaTalk covers the trajectory.
DeepSeek was wholly owned by High-Flyer from incorporation through April 2026. The April 2026 reports of Tencent and Alibaba investment talks at a $20B+ valuation (covered in the funding section below) describe what would be the company's first external funding round; even at the proposed valuation, Liang would retain a majority stake and the High-Flyer relationship would remain intact.
Fire-Flyer 2 and the GPU stockpile
Before DeepSeek existed as an independent entity, High-Flyer had been building the GPU infrastructure the lab would inherit. Liang began acquiring Nvidia GPUs at scale in 2021, reportedly building a stockpile of around 10,000 Nvidia A100 chips before the U.S. export controls of October 7, 2022 first restricted exports of top-tier AI GPUs to China. That pre-controls procurement window is the single most-cited piece of context for why a Chinese lab can train at scale despite the sanctions: the chips were already on the floor before the restriction took effect.
High-Flyer built the Fire-Flyer 2 cluster beginning in 2021 on a reported budget of 1 billion yuan. Per the cluster's published statistics, Fire-Flyer 2 had reached 5,000 PCIe A100 GPUs across 625 nodes with ~96% utilization through 2022, totaling ~56.74 million GPU-hours of capacity used. The cluster was the load-bearing infrastructure for everything DeepSeek shipped from DeepSeek-LLM through V3, and the unstated context behind the “$5.576M training cost” figure for V3: that figure is the marginal compute cost of one training run, not the cumulative R&D and infrastructure cost of building the cluster the run depended on. The dispute over whether the headline cost number is misleading hinges on this distinction.
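As a back-of-envelope check (an inference from the published figures, not a calculation from the Fire-Flyer paper itself), the node count and GPU-hour total above are mutually consistent:

```python
# Sanity-check the published Fire-Flyer 2 figures:
# 5,000 GPUs, 625 nodes, ~96% utilization, ~56.74M GPU-hours.
GPUS = 5_000
NODES = 625
UTILIZATION = 0.96
GPU_HOURS = 56.74e6

# 8 A100s per node, a standard PCIe chassis layout.
gpus_per_node = GPUS / NODES

# Wall-clock time implied by the GPU-hour total at the stated utilization.
wall_clock_hours = GPU_HOURS / (GPUS * UTILIZATION)
wall_clock_years = wall_clock_hours / (24 * 365)

print(gpus_per_node)               # 8.0
print(round(wall_clock_years, 2))  # 1.35
```

The implied ~1.35 years of wall-clock operation lines up with a cluster built from 2021 and measured through 2022.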
The V3 / R1 inflection — December 2024 / January 2025
DeepSeek-V3 shipped on December 26, 2024 as a 671B-total / 37B-active MoE model with a disclosed training cost of $5.576 million on 2.788 million H800 GPU hours, pretrained on 14.8 trillion tokens. Performance on broad benchmarks at launch was characterized as competitive with GPT-4o and Claude 3.5 Sonnet. The combination — frontier-adjacent quality, fully open weights, and an order-of-magnitude-lower disclosed compute number — was the data point the AI-infrastructure capex thesis had not previously had to absorb. The technical report is at arXiv 2412.19437.
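Dividing the two disclosed numbers recovers the per-hour rental rate the technical report assumed, which is how the headline figure was constructed:

```python
# The disclosed V3 training cost decomposes as GPU-hours x an assumed
# H800 rental rate; both factors are taken from the technical report.
H800_GPU_HOURS = 2.788e6       # disclosed total H800 GPU-hours
TRAINING_COST_USD = 5.576e6    # disclosed marginal training cost

implied_rate = TRAINING_COST_USD / H800_GPU_HOURS
print(implied_rate)  # 2.0 -> $2 per H800 GPU-hour
```

This makes explicit that the $5.576M is a rental-rate accounting of one run's compute, the framing the cost dispute below turns on.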
DeepSeek-R1 followed on January 20, 2025, built on the V3 base, with reasoning capability incentivized through reinforcement learning. The accompanying R1-Zero result — trained with RL only and no supervised fine-tuning — demonstrated emergent chain-of-thought reasoning, self-verification, and reflection behaviors, and was the load-bearing scientific claim of the release. Performance was characterized as comparable to OpenAI's o1 across math, coding, and reasoning benchmarks. The paper is at arXiv 2501.12948; a Nature follow-up published August 14, 2025 is at nature.com/articles/s41586-025-09422-z.
On January 27, 2025, the public-equity reaction to the R1 narrative produced what was at the time the largest single-day market-capitalization loss by one company in U.S. stock-market history: Nvidia fell ~17% and shed approximately $589 billion in a single trading session. The Nasdaq fell ~3% on the day, and AI-infrastructure names (Broadcom, Marvell, Vertiv, Constellation Energy) sold off in sympathy. Coverage at CNBC, Yahoo Finance.
The causation is disputed. Tim Lee at Understanding AI argued that the move was already in motion before the R1 release week, that the disclosed-compute figure excluded prior R&D and the Fire-Flyer build, and that an alternative reading is “efficiency gains expand the inference-compute market faster than they shrink the training-compute market” (the Jevons-paradox response that Microsoft, Meta, and Google all subsequently adopted publicly). DeepSeek did not retract the disclosed-compute figure; the dispute is over the framing, not the number.
The U.S. chip export-control context
DeepSeek's training infrastructure has been scrutinized by U.S. policymakers since the R1 release week. The October 7, 2022 U.S. Department of Commerce export controls restricted top-tier AI-GPU exports to China; Nvidia responded with the H800, a deliberately degraded H100 variant designed to fall under the export-control thresholds, which it sold legally to Chinese customers including DeepSeek. The H800 was banned in turn in October 2023, but the year-long gap between the original controls and the H800 ban was enough for DeepSeek to procure the chips it disclosed using to train V3 in 2024. Coverage at CSIS, RAND.
Following the R1 release, the U.S. Department of Commerce opened an inquiry into whether DeepSeek had used U.S. chips not legally exportable to China. House Select Committee chairs Krishnamoorthi and Moolenaar issued a public call to tighten the existing controls in February 2025. Through 2025, several U.S. state governments and federal agencies banned the DeepSeek consumer chatbot on government devices on data-handling grounds (the same regime that had been applied to TikTok); the bans do not apply to the open-weights releases on HuggingFace, which can be self-hosted on Western infrastructure.
The April 2026 V4 release re-opened the chip-controversy docket. Reporting in 2026 alleged that the V4 training run used clusters of Nvidia Blackwell B200 GPUs — a chip class that is comprehensively export-controlled to China — reportedly housed at a data center in Inner Mongolia. The U.S. Department of Commerce investigation into the alleged Blackwell smuggling is open as of this page's publication date; DeepSeek has not publicly confirmed the chips it used to train V4. The refresh task should re-check the docket on every run.
The DeepSeek License vs. MIT — the licensing turn
DeepSeek's licensing has evolved across two distinct conventions. From DeepSeek-LLM (November 2023) through DeepSeek-V3 (December 2024), the model weights shipped under the bespoke “DeepSeek License” — an OpenRAIL-derived custom license with use-based restrictions (military, surveillance, deceptive content, certain weapons applications) and a separate commercial-license track. The associated GitHub source-code repos shipped under MIT separately. This is the same code-vs-weights split Meta uses for the Llama lineage, but DeepSeek's bespoke license is differently shaped and was not OSI-approved. Black Duck's model-license review from January 2025 walks through the original terms.
The licensing turn is DeepSeek-R1 (January 20, 2025), the first DeepSeek flagship released under the MIT License. DeepSeek-V3-0324 (March 24, 2025) re-released the V3 weights under MIT, retroactively bringing the V-series flagship under MIT as well. Every subsequent V-series and R-series release — R1-0528, V3.1, V3.1-Terminus, V3.2-Exp, V3.2, V3.2-Speciale, V4-Pro, V4-Flash — has shipped under MIT. Janus-Pro (January 27, 2025) also shipped under MIT.
The pre-R1 specialized models (Coder, Coder-V2, Math, VL, VL2) remain on the original DeepSeek License for the model weights as of this page's publication date; whether DeepSeek will retroactively relicense the older specialized weights to MIT is open. For new builds, the practical guidance is “everything from R1 forward is MIT; the older specialized models retain the use-restriction terms of the DeepSeek License.” Read the LICENSE-MODEL file in the relevant GitHub repo before shipping at scale.
The April 2026 Tencent / Alibaba funding talks
Through April 2026, DeepSeek had raised no external capital — it was funded entirely by High-Flyer's profits since the July 2023 incorporation. On April 22, 2026, Bloomberg and The Information reported that Tencent and Alibaba were in talks to invest a combined ~$1.8 billion at a $20 billion+ valuation, which would be the company's first external funding round.
Per the reporting, Tencent had proposed acquiring up to a 20% stake, but DeepSeek was reluctant to cede that share of control; Alibaba's role was reportedly smaller. As of this page's April 28, 2026 publication date, the round had not closed. The funding talks landed two days before the V4 release, and the timing has been read as DeepSeek lining up capital for a frontier-scale 2026 capex push that High-Flyer alone could not underwrite. The refresh task should re-check the round status on every run.
Where to run DeepSeek
DeepSeek is widely deployed because the weights are open and the API is OpenAI-compatible. Inference paths through 2025–2026 break into four categories.
DeepSeek's own API. The first-party endpoint at api-docs.deepseek.com is OpenAI-API-compatible, so any OpenAI SDK can be pointed at it with only a base-URL change. Pricing has historically been an order of magnitude cheaper than Western frontier-model APIs (V4-Flash at $0.14 / M input tokens at launch).
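To make "OpenAI-compatible with only a base-URL change" concrete, here is a stdlib-only sketch of the wire format. The base URL, endpoint path, and the `deepseek-chat` model alias reflect the first-party docs at the time of writing; verify current values at api-docs.deepseek.com before relying on them.

```python
import json
import urllib.request

# Minimal sketch of an OpenAI-style chat-completions request against
# DeepSeek's first-party endpoint. Only the base URL and API key differ
# from a call to OpenAI; the request/response schema is the same.
BASE_URL = "https://api.deepseek.com"
API_KEY = "sk-..."  # placeholder; substitute a real DeepSeek key

payload = {
    "model": "deepseek-chat",  # documented first-party alias; check the docs
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# urllib.request.urlopen(req) would send it. Equivalently, any OpenAI SDK
# works by passing base_url=BASE_URL and the DeepSeek key at construction.
print(req.full_url)
```

The same swap applies to the official OpenAI Python SDK: construct the client with `base_url` set to the DeepSeek endpoint and the rest of the calling code is unchanged.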
Self-host from HuggingFace. Download from the deepseek-ai org and run with vLLM, SGLang, llama.cpp, or Ollama. The full V3 / V3.1 / V3.2 / V4-Pro models require multi-node H100 / H200 / B200 deployments at full precision; quantized variants ship from the open-source community shortly after each release.
Hosted-inference providers. Together AI, Fireworks AI, OpenRouter, SiliconFlow, Groq, Perplexity's public-API tier. Most providers serve the post-MIT weights (R1 forward) and clearly label which version is hosted.
Hyperscalers. AWS Bedrock, Microsoft Azure AI Foundry, NVIDIA NIM, IBM watsonx, and Oracle OCI have all added DeepSeek SKUs across 2025–2026 (typically the MIT-licensed R1, V3.1, V3.2, V4 lineage). Google Cloud Vertex has been slower to add DeepSeek; check the providers' model catalogs for the current state.
People who shaped DeepSeek
Liang Wenfeng — founder and CEO of DeepSeek, co-founder and CEO of High-Flyer. The 2021 GPU-stockpile decision, the July 2023 DeepSeek incorporation, the V2 MoE / MLA bet, and the 2025 MIT-licensing turn all trace through Liang's office. Profiled in Fortune; on Wikipedia at Liang Wenfeng.
High-Flyer (Hangzhou Huanfang Technology Co., Ltd.) — the parent quantitative hedge fund. Co-founded by Liang in February 2016; reported to be using AI exclusively for trading by 2021. The funder of the Fire-Flyer 2 cluster and DeepSeek's only investor through April 2026.
DeepSeek's research staff — the lab is known for an unusually flat structure, a young research team (many recent PhD graduates from Tsinghua, Peking University, and Zhejiang University), and a publication culture that ships technical reports alongside model releases. Named-author rosters appear on the V2 / V3 / R1 / V3.2 papers on arXiv. Several core researchers have reportedly been recruited away to ByteDance, Tencent, Xiaomi, and the autonomous-driving company Yuanrong Qihang during 2025; named-departure tracking is sparse compared to U.S. labs.
No publicly named CTO, second-in-command, or board. Unlike OpenAI, Anthropic, xAI, and Google DeepMind, DeepSeek does not maintain a leadership page; corporate governance sits inside the Liang-led High-Flyer / DeepSeek structure. The April 2026 Tencent / Alibaba round is the company's first external-investor relationship and may produce a board structure when it closes.
The competitive landscape
DeepSeek is, alongside Alibaba's Qwen line, one of the two dominant Chinese open-weights AI families through 2025–2026. The closest direct comparators on the open-weights axis are Alibaba's Qwen (also Apache-2.0-or-permissive across most releases, with strong HuggingFace-leaderboard presence — see Qwen Versions), Mistral (French; mixed Apache 2.0 / Mistral Research License / proprietary tiers across the line, with the December 2025 “Mistral 3” family relaunch re-committing the open releases to Apache 2.0 — see Mistral Versions), Meta's Llama (custom Llama Community License, see Llama Versions), and Moonshot AI's Kimi line. The closed-weights frontier competitors — ChatGPT, Claude, Gemini, Grok — are the practical benchmark for “is DeepSeek competitive at frontier scale,” which is the question the V3 / R1 / V3.2 / V4 release cycle has been answering in the affirmative since December 2024. This page does not attempt a benchmark roundup or a ranking.