The Paper That Raised One Billion Dollars
Yann LeCun's 2022 position paper left its hardest problem unsolved. Investors just bet $1 billion it doesn't matter.
In March 2026, a research paper raised $1.03 billion. Not a product. Not a company with revenue. A position paper — Yann LeCun’s own term for it — that ran no experiments, produced no benchmarks, and left its most important module, the one responsible for everything interesting, as “an open question for future investigation.”
The funding went to AMI Labs, LeCun’s new Paris-based company, which gathered nearly the entire senior research leadership of Meta’s AI division and launched at a $3.5 billion pre-money valuation before shipping a single product. The investor list reads like a catalog of global conviction: Jeff Bezos, Eric Schmidt, Tim Berners-Lee, Nvidia, Samsung, Toyota Ventures, Singapore’s sovereign wealth fund. What they purchased was not a working system. It was a technical argument — one that its author published in June 2022 under the title “A Path Towards Autonomous Machine Intelligence” — and a bet that the argument is correct.
What, exactly, is being purchased? And is it actually there?
The Diagnosis
LeCun’s argument begins with a capability gap that is genuinely striking. An adolescent learns to drive in approximately twenty hours. The most advanced autonomous driving systems in the world, trained on millions of simulated miles and billions of real-world data points, still fail at tasks a sixteen-year-old handles reflexively. A child acquires the rules of grammar from a few thousand hours of ambient conversation. Large language models require corpora measured in hundreds of billions of tokens and still confuse objects that pass through solid walls with objects that behave normally.
The explanation LeCun offers is architectural. Current AI systems — including the LLMs that dominate industry conversation — are what he calls “word models,” not world models. They learn from text, which is language’s compressed description of experience, not experience itself. They predict tokens in a finite vocabulary, not states in a continuous physical universe. They have no internal simulator for testing whether a proposed action would result in a cup falling, a robot crashing, or a patient’s condition worsening. They can describe a table without knowing that objects placed on its edge will fall.
The total text available for training a modern LLM is estimated at roughly 10¹³ bytes. Thirty minutes of high-quality video contains equivalent information. A four-year-old child has observed more visual data than the entire internet’s text. This is not a scaling argument against LLMs — it is a structural argument. The medium is wrong. Language is a low-bandwidth byproduct of human intelligence, not its substrate.
This diagnosis is good. It is specific, falsifiable in principle, and supported by documented failure modes that scaling has not eliminated: physical reasoning errors, spatial reasoning failures, hallucinated physics. The capability gap is real. The explanation in terms of missing world models is plausible and well-grounded in cognitive science. Whether the proposed remedy is equally well-grounded is a different question.
The Architecture and Its Honest Gap
The Joint Embedding Predictive Architecture — JEPA — is the paper’s core technical proposal, and it earns serious attention. The key insight is elegant: instead of predicting the next token, or the next frame, or the next pixel — all of which require representing the full dimensionality of irreducible noise — a JEPA predicts the representation of what comes next. The encoder learns to discard what cannot be predicted. The model learns what is invariant, what is structured, what is causally connected to what came before. Texture doesn’t matter. Trajectory does. Exact leaf position doesn’t matter. Tree stability does.
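The difference between predicting pixels and predicting representations can be made concrete with a toy sketch. Everything here is hypothetical for illustration: the hand-built `encode` function simply keeps the two predictable "trajectory" dimensions and discards the noise dimensions, whereas a real JEPA learns its encoder. The point is only that a pixel-space objective pays an irreducible penalty for unpredictable detail, while a representation-space objective does not:

```python
import random

random.seed(0)

def make_frame(t, noise_scale=1.0):
    # A toy "frame": two predictable values (a trajectory) plus eight
    # unpredictable noise values (texture, exact leaf positions, ...).
    signal = [t * 0.5, t * 0.5 + 1.0]
    noise = [random.gauss(0, noise_scale) for _ in range(8)]
    return signal + noise

def encode(frame):
    # Hypothetical encoder: keeps the predictable coordinates and
    # discards the noise dimensions. A real JEPA learns this.
    return frame[:2]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

frames = [make_frame(t) for t in range(200)]

pixel_loss = 0.0  # predicting the next frame in full
rep_loss = 0.0    # predicting only the next representation
for t in range(199):
    # The best a pixel predictor can do: extrapolate the trajectory
    # and predict the noise at its mean (zero).
    pred_pixels = [frames[t][0] + 0.5, frames[t][1] + 0.5] + [0.0] * 8
    pixel_loss += mse(pred_pixels, frames[t + 1])
    # The representation predictor only has to extrapolate the trajectory.
    pred_rep = [encode(frames[t])[0] + 0.5, encode(frames[t])[1] + 0.5]
    rep_loss += mse(pred_rep, encode(frames[t + 1]))

pixel_loss /= 199
rep_loss /= 199
print(f"pixel-space loss: {pixel_loss:.3f}")        # dominated by noise
print(f"representation-space loss: {rep_loss:.3f}")  # trajectory is exact
```

Even with a perfect model of the dynamics, the pixel-space loss stays near the noise variance; the representation-space loss goes to zero. That floor of irreducible error is what LeCun argues drowns out the learnable structure in generative video prediction.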
This is not just an engineering trick. It is a claim about the structure of intelligence: that knowing what can be predicted is the foundation of useful knowledge, and that the ability to discard irrelevant detail is not a limitation but a prerequisite for generalization. The training framework is correspondingly rigorous. LeCun frames self-supervised learning as Energy-Based Modeling: a system that assigns low “energy” to compatible states and high energy to incompatible ones, learning the shape of the possible rather than the statistics of the observed. The collapse taxonomy — the ways standard architectures find degenerate solutions — is analytically precise. VICReg, LeCun’s non-contrastive training method, prevents constant representations by forcing variance and decorrelating components, avoiding the exponential scaling problem that contrastive methods face in high-dimensional spaces.
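The anti-collapse machinery can be sketched directly. Below is a minimal pure-Python rendering of VICReg's three terms: an invariance term pulling two views of the same input together, a variance hinge that rules out the constant-output collapse, and a covariance penalty that decorrelates embedding dimensions. The term weights follow the defaults reported in the VICReg paper; the tiny hand-made batches are illustrative, not from any real model:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def vicreg_loss(za, zb, gamma=1.0, eps=1e-4, lam=25.0, mu=25.0, nu=1.0):
    """Toy VICReg loss on two batches of embeddings (lists of lists)."""
    n, d = len(za), len(za[0])

    # Invariance: embeddings of two views of the same input should match.
    inv = sum((za[i][j] - zb[i][j]) ** 2
              for i in range(n) for j in range(d)) / (n * d)

    def var_term(z):
        # Variance: hinge pushing each dimension's std above gamma,
        # so the encoder cannot output a constant.
        total = 0.0
        for j in range(d):
            col = [z[i][j] for i in range(n)]
            m = mean(col)
            std = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1) + eps)
            total += max(0.0, gamma - std)
        return total / d

    def cov_term(z):
        # Covariance: squared off-diagonal covariances, so dimensions
        # cannot all encode the same feature.
        means = [mean([z[i][j] for i in range(n)]) for j in range(d)]
        c = 0.0
        for j in range(d):
            for k in range(d):
                if j != k:
                    cjk = sum((z[i][j] - means[j]) * (z[i][k] - means[k])
                              for i in range(n)) / (n - 1)
                    c += cjk ** 2
        return c / d

    return (lam * inv
            + mu * (var_term(za) + var_term(zb))
            + nu * (cov_term(za) + cov_term(zb)))

# A collapsed encoder (constant output) is heavily penalized...
collapsed = [[0.0, 0.0] for _ in range(8)]
loss_collapsed = vicreg_loss(collapsed, collapsed)

# ...while spread-out, decorrelated embeddings are cheap.
spread = [[float(i), (-1.0) ** i] for i in range(8)]
loss_spread = vicreg_loss(spread, spread)
print(loss_collapsed, loss_spread)
```

Note that none of the three terms compares an embedding against negative examples: that is the sense in which VICReg is non-contrastive, and why it sidesteps the need for the ever-larger pools of negatives that contrastive methods require in high-dimensional spaces.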
Then there is the Configurator.
Buried in Section 6, after sixty pages of intricate derivation, LeCun arrives at the module that provides executive control to the entire system. Everything else — Perception, World Model, Cost, Actor, Short-Term Memory — functions as configured by this module. It is the component that takes a goal like “get a glass of water” and decomposes it into “stand up, walk to kitchen, open cabinet, reach for glass, fill glass, carry glass back.” Without it, the architecture is a sophisticated sensory-prediction system with no direction. With it, the architecture is an agent.
The learning mechanism for this module is left as an open question.
This is the Configurator Problem. It is not peripheral. It is the difference between a world model and an agent — the difference LeCun’s entire argument turns on. His critique of LLMs is precisely that they cannot plan, cannot decompose goals, cannot reason about sequences of action in a causally grounded world. The architecture he proposes to do these things leaves the mechanism unspecified.
What the Billion Dollars Is Buying
V-JEPA 2, published by Meta’s AI Research division in June 2025, is the closest thing we have to empirical validation of the JEPA approach at scale. Trained on over a million hours of video using a billion-parameter Vision Transformer, it achieves 77.3% top-1 accuracy on Something-Something-v2 — a benchmark testing whether systems understand physical motion, not just appearance. It detects physical impossibilities: when objects teleport, its prediction error spikes. After post-training on 62 hours of robot video, it can plan and execute manipulation tasks on a Franka arm. The model does not learn from reward. It learns from watching the world.
These results are meaningful. They are not the Hierarchical JEPA that learns object permanence, intuitive physics, and goal decomposition from passive observation. They are evidence that the direction is right, not that the destination has been reached.
AMI Labs is buying the time needed to close the distance between those two things. The strategic targets clarify what’s at stake. Healthcare, through the partnership with Nabla: a world-model-based system that can reason about patient outcomes, handle continuous physiological signals, and provide auditable clinical recommendations — an FDA-certifiable agent, not a hallucinating chatbot. Robotics, through V-JEPA 2’s demonstrated zero-shot manipulation: a path to robots that learn from watching humans rather than from millions of trial-and-error simulations. Wearables, through reported discussions with Meta about Ray-Ban smart glasses: an assistant that sees what you see and understands the causal structure of the physical situation you are in, not just the words you say.
None of these applications can tolerate hallucination. All of them require the system to know that it doesn’t know — to model its own uncertainty, to reason about physical possibility, to refuse to act when action would be dangerous. LLMs fail this test by design. The token-prediction objective has no built-in mechanism for saying “this sequence of tokens describes a physically impossible state.” JEPA’s energy function does: physically impossible states are high-energy states, and the system learns to distinguish them by learning the structure of the possible.
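The shape of that structural safety check can be shown with a deliberately tiny sketch. The energy function here is just the squared inconsistency between an observed transition and a constant-velocity world model; the threshold, the dynamics, and the `safe_to_act` gate are all hypothetical stand-ins for what a trained JEPA's energy landscape would provide, not anything from V-JEPA itself:

```python
def energy(state, next_state, dt=1.0):
    # Toy energy: squared inconsistency between an observed transition
    # and a constant-velocity world model. Low energy means "compatible
    # with physics as learned"; high energy means "implausible".
    x, v = state
    x_pred = x + v * dt  # what the world model expects to see next
    return (next_state[0] - x_pred) ** 2

def safe_to_act(state, next_state, threshold=1.0):
    # Refuse to act on transitions the world model deems implausible.
    # Threshold is an illustrative number, not a calibrated value.
    return energy(state, next_state) < threshold

# An object moving at its expected velocity: low energy, act normally.
plausible = safe_to_act((0.0, 1.0), (1.0, 1.0))

# The same object "teleporting" 50 units: high energy, refuse to act.
teleport = safe_to_act((0.0, 1.0), (50.0, 1.0))
print(plausible, teleport)
```

The refusal is not a filter added after the fact; it falls out of the same quantity the system was trained on. That is the structural sense in which an energy-based world model can "know that it doesn't know", where a token predictor has no analogous internal signal.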
This is the architecture’s deepest safety property. It is not bolted on. It is structural. And in the industries AMI is targeting — healthcare, manufacturing, logistics, autonomous systems — structural safety is worth more than any benchmark score.
The Historical Position
There have been three previous moments when the AI field converged on an architecture and declared it the answer. Symbolic systems in the 1960s. Neural networks in the 1980s. Deep learning in the 2010s. Each was not wrong — each captured something real — and each hit a wall that the next architecture addressed by abandoning the central assumption of the one before.
Symbolic systems could reason but couldn’t learn. Neural networks could learn but were shallow. Deep learning scaled but couldn’t reason about physics, causality, or what would happen if you moved one object next to another in a way that hadn’t appeared in training data.
LeCun’s proposal is that the wall we are now hitting is structural in a way that scaling will not fix. This is the same claim made at each previous transition. History suggests taking it seriously. It also suggests not confusing the diagnosis with the cure.
The paper that raised one billion dollars is not a completed theory. It is a research program with a good diagnosis, a promising architectural framework, a rigorous training paradigm, and an unsolved central problem. LeCun himself would agree with this characterization — he said as much on page 60. The question is whether the research program will close the gap between “JEPA learns to predict video representations” and “H-JEPA learns object permanence, goal decomposition, and causal models of the physical world.”
That question will be answered in the years AMI Labs is buying with this funding. What is already clear is that the direction matters. The grounding problem is real. The argument that the next frontier in AI is not larger language models but deeper world models is coherent, well-supported, and being tested at a scale that will determine whether the diagnosis was also a prescription.
The billion dollars is a bet that this time, the diagnosis and the cure are the same architecture.
Whether the Configurator Problem has a solution, we will find out soon enough.
Source
LeCun, Yann. “A Path Towards Autonomous Machine Intelligence.” Position paper, version 0.9.2, 27 June 2022. OpenReview Archive. https://openreview.net/forum?id=BZ5a1r-kVsf
Tags: AMI Labs JEPA world models billion dollar funding, Yann LeCun autonomous machine intelligence architecture, Joint Embedding Predictive Architecture vs LLM, V-JEPA 2 video prediction physical reasoning, world model AI grounding problem position paper


