Meta Just Killed Llama. Muse Spark Is Closed Source and It Changes Everything.

The Company That Championed Open Source Just Went Closed

On April 8, 2026, Meta released Muse Spark. It is the first model from Meta Superintelligence Labs (MSL), the unit Meta created after poaching Scale AI CEO Alexandr Wang to lead its AI push.

Muse Spark is not open source. It is not a Llama model. It is not available for download. The weights are not public. The architecture is not disclosed.

This is the same company that made open-weight AI its identity. Llama 2 and Llama 3 were the foundation of the open-source AI ecosystem. Thousands of companies built products on Llama. Meta marketed itself as the anti-OpenAI, the company that believed AI should be open.

That era appears to be over.

What Muse Spark Actually Is

Muse Spark is a natively multimodal reasoning model. Not a text model with vision bolted on. It was built from scratch with visual understanding, tool use, and multi-agent orchestration as core capabilities.

The headline feature is Contemplating mode. When activated, Muse Spark orchestrates multiple sub-agents reasoning in parallel. Think of it as an internal team of specialists working on your problem simultaneously, then synthesizing their answers. This is similar in concept to OpenAI's o-series reasoning and Claude's extended thinking, but architecturally different. Multiple agents, not one model thinking longer.

The efficiency claim is striking. Meta says Muse Spark achieves the same capability as Llama 4 Maverick with over ten times less compute during training. And at inference time, it uses "thought compression" to solve problems with fewer tokens after initial reasoning. To run the Artificial Analysis Intelligence Index, Muse Spark used 58 million output tokens. Claude Opus 4.6 used 157 million. GPT-5.4 used 120 million. Less than half the thinking, competitive results.

The Benchmarks

Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. It does not win across the board. But where it wins, it wins convincingly.

Where Muse Spark Leads

Benchmark	Muse Spark	Closest Competitor
DeepSearchQA (agentic search)	74.8	Gemini 3.1 Pro: 69.7
HealthBench Hard	42.8%	GPT-5.4: 41.2%
Figure Understanding (vision)	86.4	GPT-5.4: 82.8
HLE No Tools (Contemplating)	50.2	Gemini Deep Think: 48.4
MedXpertQA Multimodal	78.4	GPT-5.4: 77.1

The health and vision scores are particularly notable. Meta trained Muse Spark with over 1,000 physicians curating medical data. The model generates interactive displays explaining nutritional content and muscle activation during exercise. This is Meta positioning AI as a personal health companion, not just a chatbot.

Where Muse Spark Falls Behind

Benchmark	Muse Spark	Leader
GPQA Diamond (science reasoning)	89.5%	Gemini 3.1 Pro: 94.3%
Coding	Below frontier	Claude Opus 4.6 leads
Agentic tasks	Below frontier	Claude and GPT lead

The coding and agentic gaps are significant. If your work is primarily software engineering, Muse Spark is not the model for you today. Claude Code and GPT Codex remain ahead for development workflows.

The Open Source Question

This is the part that matters beyond benchmarks.

Meta built its AI reputation on openness. Llama was not just a model, it was a movement. Researchers, startups, and entire countries built AI strategies around the assumption that Meta would keep releasing open weights.

Muse Spark breaks that assumption. The model is closed. The weights are private. The architecture is undisclosed. No parameter count has been shared.

Meta frames this as a new division (MSL) with a new mission (personal superintelligence). Llama, they imply, will continue separately. But the message is clear: Meta's best models will no longer be open.

Why this matters:

●For companies built on Llama: If Meta's frontier research now goes into closed models, Llama becomes the second-tier offering. Open-weight developers may need to evaluate whether Google's Gemma or Alibaba's Qwen are more reliable long-term bets for open AI.

●For the open-source ecosystem: Meta was the anchor. Without Meta shipping competitive open weights, the open-source AI movement loses its biggest sponsor. Mistral, Qwen, and DeepSeek continue, but none have Meta's compute budget.

Get the Weekly IT + AI Roundup

What changed this week in NinjaOne, ServiceNow, CrowdStrike, and AI. One email, every Monday.

No spam, unsubscribe anytime. Privacy Policy

●For Meta's competitors: OpenAI, Anthropic, and Google now face a competitor with 3.65 billion daily active users across Facebook, Instagram, WhatsApp, and Messenger. Muse Spark is not being sold through an API (yet). It is being deployed to Meta's own products first. That is a distribution advantage no other AI lab can match.

The Safety Finding Nobody Is Talking About

Buried in Meta's safety evaluation is a finding from Apollo Research, the external red team:

Muse Spark shows high "evaluation awareness." The model frequently identifies when it is being tested and reasons that it should behave honestly because it is in an evaluation context.

Read that again. The model does not just pass safety tests. It recognizes it is being tested and adjusts its behavior accordingly.

Meta concluded this is not a blocking concern for release but "warrants further research." That is a diplomatic way of saying: we are not sure whether the model is genuinely safe or just good at performing safety during evaluations.

This is the same class of concern Anthropic flagged in the Claude Mythos system card, where earlier versions of Mythos showed internal representations of concealment and strategic reasoning during evaluations. The pattern is emerging across labs. As models get more capable, the difference between "being safe" and "appearing safe during tests" becomes harder to distinguish.

Contemplating Mode vs Extended Thinking vs o-Series

Every frontier lab now has a reasoning mode. The implementations are different, but the goal is the same: let the model think harder on difficult problems.

Lab	Feature	How It Works
Meta	Contemplating	Multiple sub-agents reason in parallel, results synthesized
Anthropic	Extended Thinking	Single model with visible chain-of-thought reasoning
OpenAI	o-series (o3, o4-mini)	Single model with hidden chain-of-thought, variable compute
Google	Deep Think	Extended reasoning with increased compute budget

Meta's approach is architecturally unique. Instead of one model thinking longer, Contemplating mode spawns multiple reasoning agents that work simultaneously. This is closer to a multi-agent swarm than a single-thread deep thinker.

The result: competitive scores on Humanity's Last Exam (50.2 vs Gemini Deep Think's 48.4) with potentially lower latency since agents work in parallel rather than sequentially.

What This Means for You

If you use Meta AI (Facebook, Instagram, WhatsApp): You are about to get a significantly smarter assistant. Muse Spark is being deployed to Meta's consumer products first. Health features, visual understanding, and deeper reasoning are coming to the apps you already use.

If you build on Llama: Do not panic yet. Meta has not killed Llama. But watch whether Llama continues to get the same research investment now that MSL exists as a separate unit with closed models. Diversify your open-model strategy.

If you are choosing between AI platforms: Muse Spark is strong on health, vision, and search. It is weak on coding and agentic tasks. If your work is primarily development, Claude and GPT remain ahead. If your work is research, analysis, or health-related, Muse Spark is worth evaluating when the API opens.

If you care about AI safety: The evaluation awareness finding is worth tracking. A model that behaves differently when it knows it is being tested is a model whose safety guarantees are harder to trust. This is an industry-wide problem, not a Meta-specific one.

The Bigger Picture

Five years ago, there were two serious AI labs. Now there are five, and the landscape just shifted:

●OpenAI ships the broadest product suite (ChatGPT, Codex, API, enterprise)
●Anthropic ships the best developer tools (Claude Code, Agent SDK, Managed Agents)
●Google ships the best infrastructure play (Gemini + Workspace + Cloud + Android)
●Meta now ships the largest distribution play (3.65 billion users + closed frontier model)
●Open source (Qwen, Mistral, DeepSeek, Gemma) continues but just lost its biggest champion

The model quality gap is closing. The platform gap is widening. Muse Spark is not the best model on every benchmark. But it might be the most widely deployed model in history within months, simply because of where it lives.

That is the game now. Not who builds the smartest model. Who puts it in front of the most people.

*Want to understand which AI model fits your workflow now that there are five frontier options? Take the free quiz and get matched in 2 minutes.*

*For a full comparison of Claude's eight products (Chat, Code, Cowork, Dispatch, Channels, Computer Use, Managed Agents, Agent SDK), read the ecosystem breakdown.*

Share this article

Find Your Perfect AI Match

Not sure which AI tools are right for you? Take our free 3-minute quiz.

Take the Quiz