Mungomash LLC

2023 – 2026

Llama Versions

Every Meta Llama release — LLaMA 1 in February 2023 (leaked the next month) through Llama 4 in April 2025, plus the closed-weights successor Muse Spark in April 2026 — with HuggingFace ids, parameter counts, context windows, license terms, and the major changes per version. Plus the March 2023 LLaMA leak, the FAIR vs. GenAI org split, the Llama 4 Behemoth delay, the June 2025 Scale AI acqui-hire and Meta Superintelligence Labs reorg, and the closed-weights turn.

Family & status

Family

Flagship — the main Llama chat lineage from LLaMA 1 through Llama 4
Specialized — Code Llama, Llama Guard safety classifiers, vision and edge variants
Successor — Muse Spark, the closed-weights frontier successor under Meta Superintelligence Labs

Status

Current — actively recommended; the latest in its family
Available — weights still served via HuggingFace and partner inference providers, but superseded
Legacy — deprecated, never publicly released, or research-only

Llama version table

Muse Spark
closed-weights, API only — no HuggingFace release
Successor
Current
Apr 8, 2026
First model from Meta Superintelligence Labs. Closed-weights, API-only. Marks the end of Meta's open-weights frontier-AI era. “Contemplating” reasoning mode.
  • Released April 8, 2026 in private API preview; the announcement is at ai.meta.com/blog/introducing-muse-spark-msl.
  • Closed-weights, API-only — the first Meta frontier-AI model not released as open weights since Llama 2 in July 2023. Branded under the new Muse family rather than Llama. Meta has stated it “hopes to open-source future versions” but the launch model is proprietary.
  • Natively multimodal. Two operating modes — a fast mode for casual queries, and a Contemplating reasoning mode that orchestrates parallel sub-agents (positioned against Gemini Deep Think and the OpenAI o-series).
  • Treated as the Llama line's successor here because it is shipped by the same Meta organization (Meta Superintelligence Labs, formed June 2025 under Alexandr Wang) and replaces the Llama line as Meta's frontier-AI surface, even though it is not Llama-branded. The closed-weights turn is covered in the prose history below.
  • Coverage in CNBC and VentureBeat.

The closed-weights turn — April 8, 2026. Above this line: Muse Spark, the closed-weights successor to Llama under Meta Superintelligence Labs. Below: the Llama line itself — thirteen open-weights releases between February 2023 and April 2025, the dominant open-weights frontier-AI lineage of the period. The transition follows the June 2025 Scale AI acqui-hire that brought Alexandr Wang to Meta as Chief AI Officer, the November 2025 departure of Chief AI Scientist Yann LeCun, and the dissolution of FAIR's open-research mandate into MSL's product organization.

Llama 4 Maverick
meta-llama/Llama-4-Maverick-17B-128E-Instruct
Flagship
Current
Apr 5, 2025
Current open-weights flagship. First Mixture-of-Experts Llama. 17B active params over 128 experts (~400B total). 1M context. Natively multimodal.
  • Released April 5, 2025; the announcement is at ai.meta.com/blog/llama-4-multimodal-intelligence. HuggingFace launch post: huggingface.co/blog/llama4-release.
  • First Mixture-of-Experts Llama — 17B active parameters routed over 128 experts (~400B total parameters). 1,000,000-token context window. Natively multimodal (text + image), not a bolted-on vision tower.
  • License: Llama 4 Community License — preserves the >700M monthly-active-user carve-out from the Llama 2 / 3 lineage. License text at llama.com/license.
  • LMArena episode at launch: a custom “Llama-4-Maverick-03-26-Experimental” variant (not the released weights, verbose, emoji-laden) was submitted to LMArena and rocketed to #2; after policy clarification by LMArena, the unmodified shipped Maverick was tested and ranked #32. TechCrunch reported Meta's framing and LMArena's policy update; LMArena said “Meta's interpretation of our policy did not match what we expect from model providers.”
  • Promoted by Meta as the open-weights flagship through 2025; shipped alongside Scout (the smaller MoE companion) and the never-released Behemoth (the 2T-parameter teacher model).
Llama 4 Scout
meta-llama/Llama-4-Scout-17B-16E-Instruct
Flagship
Available
Apr 5, 2025
Smaller MoE companion to Maverick. 17B active over 16 experts (~109B total). Headline 10M-token context window. Single-H100 deployable at Int4.
  • Released April 5, 2025 alongside Maverick and the announced-but-never-shipped Behemoth.
  • 10,000,000-token context window at launch — the largest ever claimed for an open-weights model, though the practical retrieval-quality envelope at full 10M is a topic of ongoing community evaluation.
  • 17B active parameters over 16 experts (~109B total). Designed to be deployable on a single H100 at Int4 quantization.
  • Same Llama 4 Community License as Maverick; HuggingFace download via meta-llama/Llama-4-Scout-17B-16E-Instruct.
Llama 4 Behemoth
never publicly released — weights not on HuggingFace
Flagship
Legacy
Apr 5, 2025
288B active over 16 experts (~2T total). Announced as “still training” at Llama 4 launch. Indefinitely delayed; never publicly released.
  • Announced April 5, 2025 alongside Maverick and Scout as “still training”; Meta has never publicly released the weights.
  • 288B active parameters, 16 experts, ~2 trillion total parameters — the largest model Meta has publicly disclosed training. Intended as the teacher model that distilled Maverick and Scout.
  • Originally targeted for an April 2025 debut alongside the rest of the Llama 4 line; pushed to June, then to fall 2025, then indefinitely. Reporting in mid-2025 said benchmark gains over Maverick were too incremental and MoE routing at the 2T scale proved unstable.
  • The Behemoth delay was one of the proximate causes of the June 2025 Scale AI acqui-hire and Meta Superintelligence Labs reorg covered in the prose history below.
  • Carried as a row here because it is named in Meta's own Llama 4 announcement and is the implicit denominator in any “current Meta frontier-flagship” question; status is Legacy on the “never publicly released, no longer recommended” reading.
Llama Guard 4 12B
meta-llama/Llama-Guard-4-12B
Specialized
Available
Apr 5, 2025
Latest Llama Guard safety classifier. 12B pruned dense model derived from Llama 4 Scout. Input/output moderation for Llama 4 deployments.
  • Released April 5, 2025 alongside Llama 4. The Llama Guard line is the safety-classifier sibling of the chat models — it does prompt and response moderation for Llama deployments.
  • 12B-parameter pruned dense model derived from Llama 4 Scout; HuggingFace at meta-llama/Llama-Guard-4-12B.
  • Replaces the Llama Guard 3 family (8B, 1B, and 11B Vision variants from 2024) for Llama 4 deployments.
  • Distributed under the Llama 4 Community License with the Acceptable Use Policy. Repo: github.com/meta-llama/PurpleLlama.
Llama 3.3 70B Instruct
meta-llama/Llama-3.3-70B-Instruct
Flagship
Available
Dec 6, 2024
70B-only refresh. Marketed as 405B-level performance at 70B inference cost. 128K context. The last 3.x release before the Llama 4 reset.
  • Released December 6, 2024. TechCrunch coverage; HuggingFace card: meta-llama/Llama-3.3-70B-Instruct.
  • 70B-only release — no 8B / 405B siblings. Marketed by Meta as delivering 405B-level performance at 70B inference cost; pretrained on ~15T tokens, with ~25M synthetic examples layered on top of public instruction sets during fine-tuning.
  • 128K-token context window; multilingual; improved instruction-following and coding over Llama 3.1 70B.
  • Distributed under the Llama 3.3 Community License — same shape as the 3.1 / 3.2 licenses, >700M MAU carve-out preserved, naming convention requires “Llama” prefix on derivatives.
  • The last 3.x model. The next release was Llama 4 four months later.
Llama 3.2 (Vision + Edge)
meta-llama/Llama-3.2-{1B, 3B, 11B-Vision, 90B-Vision}
Specialized
Available
Sep 25, 2024
First multimodal Llama (11B / 90B Vision) and first edge-targeted sizes (1B / 3B). EU users excluded from the multimodal license.
  • Released September 25, 2024 at Meta Connect 2024. Announcement: ai.meta.com.
  • Two product lines announced together: a vision-language pair (11B Vision built on Llama 3.1 8B; 90B Vision on Llama 3.1 70B) and an edge-targeted pair (1B and 3B text-only, distilled from larger 3.1 teachers, designed to run on phones and laptops).
  • 128K-token context window across all four sizes.
  • EU multimodal carve-out: “the rights granted under the Llama 3.2 Community License Agreement are not being granted” to EU-domiciled individuals or EU-headquartered companies for the multimodal models, citing regulatory uncertainty under the EU AI Act, GDPR, and Digital Markets Act. EU users could still use the 1B / 3B text models. Slator coverage.
  • HuggingFace ids on the meta-llama org. Llama Guard 3 11B Vision shipped alongside as the multimodal-safety classifier.
Llama Guard 3 family
meta-llama/Llama-Guard-3-{8B, 1B, 11B-Vision}
Specialized
Available
Jul 23, 2024
Three sizes shipped alongside Llama 3.1 / 3.2: 8B, 1B (edge), and 11B Vision (multimodal moderation). Input and output safety classification.
  • Llama Guard 3 8B shipped July 23, 2024 alongside Llama 3.1; the 1B and 11B Vision variants followed with Llama 3.2 on September 25, 2024.
  • Three sizes targeting different deployment surfaces — 8B for production servers, 1B for on-device, 11B Vision for multimodal-safety classification.
  • Input/output moderation: classifies prompts and responses against Meta's safety taxonomy (illegal violence, sexual content involving minors, etc.).
  • Distributed under the corresponding Llama 3.x Community License. Repo: github.com/meta-llama/PurpleLlama.
Llama 3.1 (8B / 70B / 405B)
meta-llama/Llama-3.1-{8B, 70B, 405B}-Instruct
Flagship
Available
Jul 23, 2024
First “frontier-class” open-weights model. 405B competitive with GPT-4 / Claude 3.5 Sonnet at launch. 128K context. Co-released with Zuckerberg's open-source-AI letter.
  • Released July 23, 2024; the announcement is at ai.meta.com/blog/meta-llama-3-1.
  • First “frontier-class” open-weights model — Llama 3.1 405B was widely characterized as the first openly-available model competitive with GPT-4 / Claude 3.5 Sonnet on broad benchmarks at launch.
  • Three sizes (8B, 70B, 405B); 128K context across all three, up from 8K on Llama 3. Improved tool use and multilingual capability.
  • 405B trained on ~16,000 H100 GPUs per Meta's accounting, ~3.8e25 FLOP total — the first Llama trained at this scale.
  • Co-released with Mark Zuckerberg's “Open Source AI Is the Path Forward” letter, which likened open-source AI's trajectory to Linux. The letter is now read with sharp dramatic irony given the closed-weights Muse Spark turn twenty-one months later (covered in the prose history below).
Llama 3 (8B / 70B)
meta-llama/Meta-Llama-3-{8B, 70B}-Instruct
Flagship
Legacy
Apr 18, 2024
Two sizes. ~15T-token pretraining (7× Llama 2). 128K-vocab tokenizer. Grouped-query attention across all sizes. 8K context.
  • Released April 18, 2024; the announcement is at ai.meta.com/blog/meta-llama-3.
  • Two public sizes (8B and 70B); pretrained on ~15 trillion tokens — roughly seven times Llama 2's pretraining corpus, with four times more code data.
  • New 128K-vocabulary tokenizer; Grouped-Query Attention across all sizes (Llama 2 had GQA only at 70B).
  • 8,192-token context window — bumped to 128K with Llama 3.1 three months later.
  • Distributed under the Meta Llama 3 Community License: same shape as Llama 2 (>700M MAU carve-out preserved); adds an attribution requirement that derivative model names include the “Llama 3” prefix.
Code Llama 70B
codellama/CodeLlama-70b-{hf, Python-hf, Instruct-hf}
Specialized
Available
Jan 29, 2024
Largest Code Llama tier. Three flavors: base, Python specialization, Instruct. 16K context. The high-water mark of the Code Llama line.
  • Released January 29, 2024 as the largest Code Llama tier; followed the original 7B / 13B / 34B sizes by five months.
  • Three flavors per size: base, Python specialization, and Instruct (the chat-tuned coding assistant variant).
  • 16,384-token context window — substantially longer than the 4K of the base Llama 2 line.
  • Distributed under the Llama 2 Community License. HuggingFace org: huggingface.co/codellama.
  • Marked the high-water mark of the named Code Llama line; later code-task work was folded into the general-purpose Llama 3.x and Llama 4 instruction-tuning.
Purple Llama / Llama Guard 7B
meta-llama/LlamaGuard-7b
Specialized
Legacy
Dec 7, 2023
First Llama Guard. 7B classifier built on Llama 2. Launched alongside Purple Llama, Meta's umbrella for AI-safety tooling.
  • Released December 7, 2023 as the first Llama Guard; HuggingFace at meta-llama/LlamaGuard-7b.
  • 7B classifier built on Llama 2 7B, fine-tuned for input and output safety classification (prompt and response moderation).
  • Launched as part of Purple Llama — Meta's umbrella for AI-safety tooling, also covering CyberSecEval (security benchmarks) and prompt-injection benchmarks. Repo: github.com/meta-llama/PurpleLlama.
  • Superseded by Llama Guard 3 family (mid-2024) and Llama Guard 4 (April 2025); status is Legacy on the “no longer the recommended classifier” reading.
Code Llama (7B / 13B / 34B)
codellama/CodeLlama-{7b, 13b, 34b}-{hf, Python-hf, Instruct-hf}
Specialized
Legacy
Aug 24, 2023
Initial Code Llama release. Fine-tune of Llama 2 on code corpus. Three sizes × three flavors. 16K context.
  • Released August 24, 2023; the announcement is at ai.meta.com/blog/code-llama-large-language-model-coding.
  • Three sizes (7B, 13B, 34B), three flavors per size (base, Python, Instruct). The 70B tier followed five months later (January 2024 row above).
  • 16,384-token context window — meaningfully longer than the 4K of the base Llama 2 line.
  • Distributed under the Llama 2 Community License (>700M MAU carve-out applies).
  • Superseded as the open-weights coding-model recommendation by general-purpose Llama 3.x / Llama 4 instruction-tuning by mid-2024.
Llama 2 / Llama 2-Chat
meta-llama/Llama-2-{7b, 13b, 70b}, -chat-hf variants
Flagship
Legacy
Jul 18, 2023
First open-weights Llama with commercial use permitted. Three sizes (7B, 13B, 70B). 4K context. The Community License debuts with the >700M-MAU carve-out.
  • Released July 18, 2023 in partnership with Microsoft (Azure preferred partner); the announcement is at about.fb.com/news/2023/07/llama-2.
  • Three public sizes (7B, 13B, 70B); a 34B was trained but withheld for what Meta described as red-teaming reasons. Pretraining doubled to ~2T tokens vs. LLaMA 1.
  • 4,096-token context window. The 70B tier introduced Grouped-Query Attention (GQA), which shrinks the KV cache and made later long-context serving practical.
  • First Llama released as open weights with commercial use permitted, under the bespoke Llama 2 Community License Agreement. The license carves out companies whose products had >700 million monthly active users on the Llama 2 release date — a deliberate exclusion aimed at TikTok, Google, Apple, and other direct platform competitors. The carve-out has been preserved through every subsequent Llama license.
  • Llama 2-Chat ships RLHF-tuned chat variants in matching 7B / 13B / 70B sizes; the chat models powered most early third-party Llama deployments.
  • The release came four months after the LLaMA 1 leak and is widely read as Meta deciding the marginal cost of an official open-weights release was near zero once the weights were already in the wild — and that the upside (developer mindshare, ecosystem) was substantial.

The open-weights commercial era — July 18, 2023. Above this line: every Llama released as open weights with commercial use permitted, under successive Llama Community Licenses with the >700M-MAU carve-out preserved. Below: the original LLaMA 1, released as research-only by application — not actually open weights, and accidentally became so via the March 2023 leak that arguably forced the Llama 2 open release four months later.

LLaMA 1 (7B / 13B / 33B / 65B)
research-only by application; weights leaked Mar 3, 2023
Flagship
Legacy
Feb 24, 2023
The original Llama. Four sizes. Research-only by application. Weights leaked to 4chan March 3, 2023, arguably forcing the Llama 2 open commercial release.
  • Released February 24, 2023; the paper is “LLaMA: Open and Efficient Foundation Language Models” (Touvron et al., 2023). The Meta page is at ai.meta.com/research/publications.
  • Four sizes (7B, 13B, 33B, 65B); pretrained on 1–1.4 trillion tokens of fully public data. 2,048-token context window.
  • Demonstrated that state-of-the-art performance was achievable using only publicly-available data; LLaMA 65B competed with PaLM-540B and GPT-3 on broad benchmarks.
  • Released research-only, by application — non-commercial, gated by request form, available to academic and government applicants only. Not actually open weights.
  • The leak (March 3, 2023): a user on 4chan's /g/ board posted a torrent containing the LLaMA weights; the same day, a pull request was filed on Meta's official Llama GitHub repo proposing to add the magnet link to documentation. Meta filed DMCA takedowns with HuggingFace and GitHub through March 20. The leak is widely credited with forcing the Llama 2 open-commercial release four months later. The Register coverage; DeepLearning.AI's The Batch.

Click any row to expand. Each row has a stable id for sharing — e.g. /ai/llama/versions/#llama-4-maverick, #llama-3-1, #llama-2, #muse-spark. Llama family hub: llama.com; license texts at llama.com/license; HuggingFace org: huggingface.co/meta-llama.

The March 2023 LLaMA leak

Meta released LLaMA 1 on February 24, 2023 as a research-only model gated by application form, accessible to academic and government applicants and not under any open-source license. On March 3, 2023 — nine days later — a user posted a torrent containing the LLaMA weights to 4chan's /g/ technology board under the handle llamanon. The same day, a pull request was filed against Meta's official Llama GitHub repository proposing to add the magnet link to the README. The torrent's files had reportedly been pulled directly from Facebook's CDN at high speed and carried a unique download URL traceable to the leaker's gated-access grant.

Meta filed DMCA takedowns with HuggingFace and GitHub through March 20, 2023, but the weights had already propagated. Coverage at the time in The Register and DeepLearning.AI's The Batch documents the propagation timeline. Within weeks, the leaked LLaMA 1 weights were the de facto foundation for an enormous open-source fine-tune ecosystem — Alpaca, Vicuna, Guanaco, and dozens of others — on HuggingFace.

The leak is widely credited with forcing Meta's hand on Llama 2: with the weights already in the wild, the marginal cost of an official open-weights commercial release dropped to near zero, while the upside (developer mindshare, ecosystem effects, reduced lawsuit risk) was substantial. Llama 2 shipped with permissive commercial terms four months later. That leak-then-open-release sequence is the precondition for everything that followed in the Llama story through 2025.

The Llama Community License and the >700M-MAU carve-out

Llama 2 introduced the bespoke Llama 2 Community License Agreement — not an OSI-approved open-source license, but a Meta-authored license with permissive commercial terms and a single load-bearing carve-out: companies whose products had more than 700 million monthly active users on the Llama 2 release date (July 18, 2023) must request a separate license from Meta. The carve-out is a deliberate exclusion aimed at TikTok, Google, Apple, and other direct platform competitors; reading the precise language requires the actual PDF, which Meta hosts at llama.com/license and ai.meta.com/llama/license.

The carve-out has been preserved through every subsequent Llama license: Llama 3 Community License (April 2024) added an attribution requirement that derivative model names include the “Llama 3” prefix; the Llama 3.1 / 3.2 / 3.3 licenses extended the naming convention to require the “Llama” prefix on derivatives; the Llama 4 Community License preserved the same general shape. The Open Source Initiative has consistently said the Llama license does not meet the Open Source Definition because the use-restriction (any restriction tied to who you are, not what you do) is incompatible with the OSD — this is the substantive basis for the “source-available, not open source” framing of the line.

Llama 3.2 added a regional carve-out: the multimodal models (11B and 90B Vision) were excluded from EU-domiciled individuals and EU-headquartered companies, citing regulatory uncertainty under the EU AI Act, GDPR, and Digital Markets Act. The text-only 1B / 3B models remained available in the EU. The carve-out is a precedent worth tracking through Llama 4 multimodal availability and any future multimodal Meta release; verify against the actual license PDF at write time.

FAIR, GenAI, and the org evolution

Meta's AI research home through 2022 was FAIR (Fundamental AI Research), founded in 2013 under Yann LeCun as Chief AI Scientist and led from 2023 by Joelle Pineau. FAIR's mandate was long-horizon basic research; its publication culture was open and academically-styled. LLaMA 1 was a FAIR project.

After ChatGPT's November 2022 launch caught Meta flat-footed, the company formed a separate Generative AI organization (GenAI) in February 2023, led by VP Ahmad Al-Dahle and tasked with shipping product-grade generative AI on a faster cadence than FAIR's research timeline. Llama 2 and onward shifted progressively into GenAI under Manohar Paluri; FAIR was pushed toward longer-horizon work as the shipping pace of the GenAI org accelerated through 2024.

Through late 2024 and into 2025, FAIR's relevance to Meta's shipping AI products visibly waned. Internal reporting in April 2025 described FAIR as “dying a slow death.” Joelle Pineau announced her departure on April 1, 2025 (last day May 30, 2025) after eight years at Meta AI; she was the most prominent internal advocate for open-source releases. Her exit set up the structural reorganization that came two months later.

The Scale AI acqui-hire and Meta Superintelligence Labs (June 2025)

On June 12–13, 2025, Meta announced a $14.3 billion investment for a 49% non-voting stake in Scale AI, valuing Scale at over $29 billion. The deal was structurally an acqui-hire: its purpose was to bring Scale CEO Alexandr Wang (then 28) to Meta as Chief AI Officer. Wang stepped down as Scale CEO but remained on Scale's board. Coverage in CNBC and Fortune.

On June 30, 2025, Mark Zuckerberg announced the formation of Meta Superintelligence Labs (MSL) with Wang as Chief AI Officer leading the new unit and former GitHub CEO Nat Friedman leading AI products. FAIR and GenAI were both consolidated under MSL; a new TBD Lab sub-group was created to work on next-generation LLMs. CNBC published the internal memo. The reorganization was the structural endpoint of Pineau's departure two months earlier and of the Llama 4 Behemoth delay reporting through May 2025.

Friction between Meta and Scale AI emerged through August 2025 as Scale's enterprise-data-labeling business saw revenue impact (competitors stopped sending training data to a Meta-owned vendor); TechCrunch covered the cracks in the partnership. MSL itself restructured into four sub-groups in August 2025.

Yann LeCun departs (November 2025)

On November 19, 2025, Chief AI Scientist Yann LeCun announced his departure from Meta to launch his own startup focused on Advanced Machine Intelligence and “world models.” Reporting in CNBC and Bloomberg tied the decision to LeCun being asked to report to Wang — a structural demotion — and to the conflict between Wang's closed-source product orientation and LeCun's long-held public positions that scaling LLMs is not a path to AGI and that open research is foundational to the field. LeCun's startup partners with Meta but pursues an independent research direction. With Pineau and LeCun both gone, the open-research mandate that FAIR had embodied, already dissolved into MSL at the June 2025 reorganization, was effectively over.

The closed-weights turn — Muse Spark (April 2026)

On April 8, 2026, Meta Superintelligence Labs released Muse Spark, its first model since the June 2025 reorganization. Muse Spark shipped closed-weights, API-only, in private preview — the first frontier-AI model from Meta released without open weights since Llama 2 in July 2023 — and is branded under the new Muse family rather than Llama. The announcement is at ai.meta.com/blog/introducing-muse-spark-msl; coverage in CNBC and VentureBeat. Meta has stated it “hopes to open-source future versions” of Muse, but the launch model is proprietary.

Muse Spark is natively multimodal and ships with two operating modes — a fast mode and a Contemplating reasoning mode that orchestrates parallel sub-agents, positioned competitively against Gemini Deep Think and the OpenAI o-series. Several outlets characterized the launch as the end of Meta's open-weights frontier-AI era. The closed-weights turn is the dramatic reversal of Mark Zuckerberg's July 2024 “Open Source AI Is the Path Forward” letter, which had argued (twenty-one months earlier) that open-source AI's trajectory would mirror Linux's path from inferior-but-cheap to dominant-via-ecosystem.

Muse Spark is treated on this page as Llama's successor — same Meta organization (MSL), same flagship-frontier slot — even though it is not Llama-branded. Whether future Muse releases are open-weights, partial open-weights, or fully closed is the central open question for the Meta-AI story going forward; the recurring refresh task should re-verify the open-weights status on every run.

Where to run Llama

Llama is the most widely-distributed frontier-AI line because the weights are open. Inference paths through 2025–2026 break into three categories.

Self-host. Download from the HuggingFace meta-llama org and run with vLLM, llama.cpp, Ollama, MLX (Apple Silicon), or TensorRT-LLM. The 1B / 3B Llama 3.2 edge models are specifically designed for on-device deployment.
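
A minimal serving path, as a sketch: huggingface-cli and vllm serve are real CLIs, but the model choice and defaults here are illustrative. The 8B is shown because the 70B-class models need multiple GPUs; accept the license on the HuggingFace model card first, since the gated repos reject anonymous pulls.

$ pip install vllm                                # brings in the vllm CLI
$ vllm serve meta-llama/Llama-3.1-8B-Instruct     # OpenAI-compatible server on :8000
$ curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
         "messages": [{"role": "user", "content": "Hello, Llama."}]}'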

Hosted-inference providers. Together AI, Fireworks AI, Groq (notable for very high tokens-per-second on Llama 3 70B), Replicate, Perplexity Labs, Cerebras, SambaNova. Pricing is typically a fraction of comparable closed-weights frontier-model API rates because providers compete on inference cost, not model rights.
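
Most of these providers expose OpenAI-compatible chat endpoints, so switching providers is largely a base-URL and model-id swap. A sketch against Together AI; the model id is Together's naming and is an assumption to verify against each provider's catalog.

# OpenAI-compatible call to a hosted provider (Together AI shown;
# Fireworks, Groq, etc. differ mainly in base URL and model id)
$ curl https://api.together.xyz/v1/chat/completions \
    -H "Authorization: Bearer $TOGETHER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
         "messages": [{"role": "user", "content": "Hello, Llama."}]}'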

Hyperscalers. AWS Bedrock, Azure AI (Microsoft was Meta's launch partner for Llama 2 in July 2023), Google Cloud Vertex AI, Oracle OCI, IBM watsonx. Meta's own Llama API (announced at LlamaCon on April 29, 2025) is the first-party hosted option; Muse Spark is API-only and runs on Meta's own infrastructure.
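
On the hyperscalers the same weights sit behind each cloud's own SDK and model catalog. A sketch using the AWS CLI's Converse API; the model id below is an assumption, so list what your region actually offers first.

# Bedrock via the AWS CLI Converse API
$ aws bedrock list-foundation-models --by-provider meta
# illustrative model id; verify against the catalog output above
$ aws bedrock-runtime converse \
    --model-id meta.llama3-3-70b-instruct-v1:0 \
    --messages '[{"role":"user","content":[{"text":"Hello, Llama."}]}]'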

People who shaped Llama

Mark Zuckerberg — CEO of Meta. The 2024 “Open Source AI Is the Path Forward” letter and the 2025 strategic decisions (Scale AI acqui-hire, MSL formation, Muse Spark launch) all run through Zuckerberg's office. The closed-weights turn reverses the public position the letter staked out twenty-one months earlier.

Yann LeCun — Chief AI Scientist 2013–2025, FAIR cofounder. 2018 Turing Award. Departed November 2025 to launch a startup focused on world models, citing direction conflict with the closed-weights / scaling-LLMs orientation under Wang.

Joelle Pineau — led FAIR 2023–2025. The internal advocate for open-source releases through the Llama 1–3 era. Departed May 2025 ahead of the MSL reorganization.

Ahmad Al-Dahle — VP of Generative AI; led the GenAI org through the Llama 2 / 3 / 4 launches. Meta's public voice during the April 2025 LMArena Maverick episode.

Alexandr Wang — Chief AI Officer of Meta since June 2025; head of Meta Superintelligence Labs. Previously CEO of Scale AI. The architect of the closed-weights pivot and the Muse line.

Nat Friedman — head of AI products at MSL since June 2025; previously CEO of GitHub.

The competitive landscape

Llama is — through April 2025 — the dominant open-weights frontier-AI line by deployment volume. The closest open-weights competitors are Mistral (French, also under a custom license rather than fully OSI-compliant), DeepSeek (Chinese, MIT-licensed for the V3 / R1 line and onward, the December 2024 / January 2025 inflection — see DeepSeek Versions), Alibaba's Qwen (also fully open-weights and dominant on HuggingFace leaderboards through 2025), and xAI's Grok 1 (open-weights only for the first generation, see Grok Versions). The closed-weights frontier competitors — ChatGPT, Claude, Gemini — have all stayed closed-weights since their inception. With Muse Spark in April 2026, Meta is the only major frontier lab to have started open-weights and pivoted closed; whether that pivot reverses with future Muse releases is the open question for the line going forward. This page does not attempt a benchmark roundup or a ranking.

Use Llama

There's no fingerprint or header a page can read to detect which Llama model you've used or are using. The block below carries the practical information instead: the current open-weights model identifiers, a copy-paste self-host command, and the surfaces where Llama is available.

Current open-weights model identifiers

HuggingFace ids on the meta-llama org. Verify against huggingface.co/meta-llama for the freshest list.

# Llama 4 — current open-weights flagship line
meta-llama/Llama-4-Maverick-17B-128E-Instruct
meta-llama/Llama-4-Scout-17B-16E-Instruct

# Llama 3.x — still widely served
meta-llama/Llama-3.3-70B-Instruct
meta-llama/Llama-3.1-{8B, 70B, 405B}-Instruct
meta-llama/Llama-3.2-{1B, 3B, 11B-Vision, 90B-Vision}-Instruct

# Specialized — coding and safety
meta-llama/Llama-Guard-4-12B
codellama/CodeLlama-70b-Instruct-hf

# Successor — closed-weights, API only (no HuggingFace release)
Muse Spark   # via Meta API; ai.meta.com/blog/introducing-muse-spark-msl
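
The meta-llama repos are gated: accept the license on the model card, then authenticate before downloading. A sketch with the huggingface_hub CLI; the repo id is taken from the list above.

$ pip install -U "huggingface_hub[cli]"
$ huggingface-cli login        # paste a token from an account with meta-llama access
$ huggingface-cli download meta-llama/Llama-3.3-70B-Instruct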

Quick self-host (Ollama)

Ollama wraps the weight download and the inference loop in a single tool. For higher throughput, vLLM and TensorRT-LLM are the routine production choices.

$ brew install ollama          # macOS; apt / curl install on Linux
$ ollama pull llama3.3:70b
$ ollama run  llama3.3:70b "Hello, Llama."
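
Once a model is pulled, Ollama also serves a local HTTP API on port 11434, which is the usual way to wire it into scripts. A sketch against the default endpoint; the prompt is illustrative.

$ curl http://localhost:11434/api/generate \
    -d '{"model": "llama3.3:70b", "prompt": "Hello, Llama.", "stream": false}'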

Where to run Llama

Three categories — self-host, hosted-inference providers, and hyperscalers. Pricing varies by orders of magnitude; the open weights are the same across all of them.

# Self-host runtimes
https://ollama.com/                         # single-binary, easiest entry
https://github.com/ggerganov/llama.cpp      # CPU + GPU, edge-friendly
https://github.com/vllm-project/vllm        # production-grade throughput

# Hosted-inference providers
https://www.together.ai/
https://fireworks.ai/
https://groq.com/                            # very high tok/s on Llama 3 70B
https://replicate.com/

# Hyperscalers
AWS Bedrock, Azure AI, Google Cloud Vertex AI, Oracle OCI, IBM watsonx

# Meta's first-party Llama API
https://www.llama.com/                       # family hub + Llama API entry

Licensing

Each Llama generation ships with its own “Llama Community License.” Read the actual PDF before shipping at scale — the >700M-MAU carve-out persists, and Llama 3.2 multimodal carries an EU restriction.

# Authoritative license texts
https://www.llama.com/license/
https://ai.meta.com/llama/license/

# Acceptable Use Policy applies to every Llama license
https://www.llama.com/use-policy/

# Notable carve-outs
>700M MAU on the release date → separate Meta license required
Llama 3.2 multimodal models → not granted to EU-domiciled users
Muse Spark → proprietary, closed-weights, no redistribution