The Grammar of Why: Pearl’s Revolution and the Problem of Fluency
The prohibition lasted nearly a century. From the 1890s, when Francis Galton discovered correlation and promptly abandoned his search for causation, until the 1990s, when Judea Pearl’s causal diagrams finally gave science permission to ask “why,” statisticians operated under what Pearl calls a “self-inflicted causal blindness.” The taboo was so complete that Karl Pearson declared causation “simply perfect correlation” and banished the word from statistical discourse entirely. Students learned to chant “correlation is not causation” while their textbooks contained no index entry for the forbidden concept.
This prohibition had consequences. In the 1950s, scientists couldn’t prove cigarettes caused cancer not because the evidence was weak—smokers had nine times the lung cancer risk—but because they lacked a mathematical vocabulary for “proof.” The smoking-cancer debate consumed fifteen years that a proper causal framework might have shortened. Physicians struggled with the birth-weight paradox for 40 years before someone recognized it as simple collider bias. Lives were lost and policies bungled. Questions couldn’t be asked because the grammar didn’t exist.
Judea Pearl’s The Book of Why, co-written with Dana Mackenzie, arrives as both indictment and remedy. The book’s central achievement is breathtaking in its simplicity: Pearl gave science back the ability to ask “why.” The three-rung Ladder of Causation (Seeing, Doing, Imagining) provides conceptual architecture. The do-calculus provides mathematical machinery. Together, they accomplish what generations of statisticians insisted was impossible: predicting the effects of interventions without running experiments, and answering counterfactual questions from observational data.
The Apparatus Works
Pearl’s framework rests on deceptive simplicity. Causal diagrams are just dots and arrows: A causes B if B “listens to” A and determines its value in response. From this elementary notation emerges genuine power. The backdoor criterion transforms confounding from philosophical quagmire into computational puzzle—you trace paths in a diagram, block the backdoor paths, and suddenly you know which variables to control for. Not “everything you can measure,” as Ezra Klein describes current practice, but precisely the set that eliminates spurious correlation.
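Path-blocking is mechanical enough to state in a few lines of code. The sketch below is a toy illustration rather than a general d-separation implementation: it checks Pearl’s blocking rule on the simplest confounded diagram (Z → X, Z → Y, X → Y). The graph, variable names, and simplifications are ours, not the book’s.

```python
# Toy backdoor check on the classic confounding diagram:
#   Z -> X, Z -> Y, X -> Y   (Z confounds the X -> Y effect)
# A backdoor path from X to Y starts with an arrow *into* X.
# Blocking rule: a non-collider on the path blocks it when adjusted for;
# a collider blocks it when NOT adjusted for (descendants of colliders
# are ignored in this toy version).

edges = {("Z", "X"), ("Z", "Y"), ("X", "Y")}

def is_collider(path, i):
    """Node path[i] is a collider if both neighbours point into it."""
    a, b, c = path[i - 1], path[i], path[i + 1]
    return (a, b) in edges and (c, b) in edges

def path_blocked(path, adjust):
    for i in range(1, len(path) - 1):
        if is_collider(path, i):
            if path[i] not in adjust:   # collider outside the set blocks
                return True
        elif path[i] in adjust:         # non-collider in the set blocks
            return True
    return False

# The only backdoor path here is X <- Z -> Y (Z is a fork, not a collider).
backdoor = ["X", "Z", "Y"]
print(path_blocked(backdoor, adjust=set()))   # False: confounding is open
print(path_blocked(backdoor, adjust={"Z"}))   # True: {Z} satisfies backdoor
```

Adjusting for Z closes the only backdoor path, so {Z} is a valid adjustment set; the empty set is not.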
The frontdoor adjustment demonstrates the framework’s elegance. Even with unmeasured confounders, you can estimate causal effects if you have the right mediating variables. Smoking → Tar → Cancer works even if there’s a “smoking gene” confounding the smoking-cancer relationship, provided tar deposits are measured and genuinely mediate the effect. The do-calculus completeness proof means we know exactly when observational data can answer interventional questions.
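The frontdoor computation itself is a short sum. The sketch below uses invented probabilities for a gene/smoking/tar/cancer toy model (every number is an assumption for illustration, not a figure from the book) and checks the frontdoor estimate against the ground-truth intervention computed directly from the structural model.

```python
# Hypothetical frontdoor toy: hidden "smoking gene" U confounds smoking X
# and cancer Y; tar M mediates and is caused only by X. All probabilities
# below are made up for illustration.
P_u = {0: 0.5, 1: 0.5}
P_x_u = {0: 0.2, 1: 0.8}        # P(smoke=1 | gene=u)
P_m_x = {0: 0.1, 1: 0.9}        # P(tar=1 | smoke=x)

def P_y_mu(m, u):               # P(cancer=1 | tar=m, gene=u)
    return 0.1 + 0.5 * m + 0.3 * u

def p(u, x, m):                 # exact joint P(u, x, m)
    px = P_x_u[u] if x == 1 else 1 - P_x_u[u]
    pm = P_m_x[x] if m == 1 else 1 - P_m_x[x]
    return P_u[u] * px * pm

# Observational quantities the frontdoor formula needs.
P_x = {x: sum(p(u, x, m) for u in (0, 1) for m in (0, 1)) for x in (0, 1)}

def P_y1_given_mx(m, x):        # P(y=1 | m, x), gene marginalised out
    num = sum(p(u, x, m) * P_y_mu(m, u) for u in (0, 1))
    return num / sum(p(u, x, m) for u in (0, 1))

# Frontdoor:  P(y | do(x)) = sum_m P(m|x) sum_x' P(y|m,x') P(x')
def frontdoor(x):
    return sum(
        (P_m_x[x] if m == 1 else 1 - P_m_x[x])
        * sum(P_y1_given_mx(m, xp) * P_x[xp] for xp in (0, 1))
        for m in (0, 1)
    )

# Ground truth from the structural model: sever all arrows into X.
def truth(x):
    return sum(
        P_u[u] * (P_m_x[x] if m == 1 else 1 - P_m_x[x]) * P_y_mu(m, u)
        for u in (0, 1) for m in (0, 1)
    )

print(round(frontdoor(1), 6), round(truth(1), 6))  # identical values
```

The two numbers agree exactly because the toy structure satisfies the frontdoor conditions: the gene never touches tar directly. Add an arrow from gene to tar, as the objection about gene-dependent tar processing suggests, and the agreement breaks.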
The mediation formula deserves particular attention. Pearl’s initial dismissal of indirect effects as “figments of imagination” followed by his recognition that they require counterfactual thinking demonstrates intellectual honesty rare in academic writing. His “embrace the would-haves” moment came from reading legal definitions of discrimination: “Had the employee been of a different race, and everything else had been the same.” This simple legal phrasing unlocked the mathematics. The formula reduces a conceptually slippery idea—how much of an effect passes through a mediator?—to a computable quantity. Chicago’s “Algebra for All” program showed a direct effect of +2.7 points but an indirect effect of -2.3 points through classroom environment changes. Understanding the mechanism explained why the policy initially disappointed and why “Double Dose Algebra” succeeded.
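The mediation formula is easy to exercise on a toy model. Below, a binary mediator with made-up parameters (nothing here comes from the Algebra-for-All study) shows a positive direct effect and a negative indirect effect in the same pattern Pearl describes; in this no-interaction case the two sum to the total effect.

```python
# Toy binary-mediator illustration of the mediation formula.
# Invented parameters: treatment X shifts the mediator M, and Y responds
# to both with no X*M interaction.
P_m_x = {0: 0.3, 1: 0.7}              # P(M=1 | X=x)

def E_y(x, m):                        # E[Y | X=x, M=m]
    return 10 + 2 * x - 3 * m

def nde():
    # Natural direct effect: flip X while M keeps its X=0 distribution.
    return sum((E_y(1, m) - E_y(0, m)) * (P_m_x[0] if m else 1 - P_m_x[0])
               for m in (0, 1))

def nie():
    # Natural indirect effect: hold X=0, let M shift as if X had changed.
    return sum(E_y(0, m) * ((P_m_x[1] if m else 1 - P_m_x[1])
                            - (P_m_x[0] if m else 1 - P_m_x[0]))
               for m in (0, 1))

total = sum(E_y(1, m) * (P_m_x[1] if m else 1 - P_m_x[1]) for m in (0, 1)) \
      - sum(E_y(0, m) * (P_m_x[0] if m else 1 - P_m_x[0]) for m in (0, 1))

print(round(nde(), 2), round(nie(), 2), round(total, 2))  # 2.0 -1.2 0.8
```

A positive direct effect partly cancelled by a negative mediated effect is exactly the Algebra-for-All shape: the policy helps directly while the mechanism it triggers hurts.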
Open epidemiology journals from 1995 and from 2015, and the transformation Pearl describes is visibly real. Causal diagrams appear routinely. The do-operator is standard notation. Researchers specify assumptions transparently. This represents science recovering capabilities it should never have surrendered.
The Specification Problem
But here we arrive at what Pearl understates: the enormous distance between having the right framework and using it correctly. Pearl makes path diagrams look easy because he’s already done the hard work. The guinea pig breeding diagram, the firing squad, the Berkeley admissions paradox—in each case, Pearl presents the “obvious” causal structure. Telling researchers to “just draw a causal diagram” is like telling writers to “just write a good book.”
Consider a real-world problem: international students processing SEC Form D filings to identify visa-sponsoring companies. The causal question seems straightforward: does receiving venture funding cause companies to sponsor work visas? But the diagram immediately explodes in complexity. Company age affects both funding and hiring. Industry sector confounds everything. The decision to file Form D might itself be an outcome variable—companies seeking foreign talent may be more likely to raise capital. Previous funding rounds create time-dependent confounding. Size mediates the funding-hiring relationship but is also confounded by sector.
The question of whether to control for company size becomes genuinely difficult. In Pearl’s framework, you control for a variable if doing so blocks backdoor paths. But size sits on a causal path (funding → size → hiring capacity → sponsorship) while simultaneously being driven by sector. Controlling for it blocks part of the very effect under study; leaving it uncontrolled lets sector-driven confounding leak through. Pearl’s framework can verify which approach is correct if you specify the causal model correctly. But reasonable experts would draw different arrows here.
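The stakes of that choice can be made concrete with a simulation. The linear structural model below (coefficients invented for illustration) has sector confounding everything and size mediating the funding-to-hiring effect. Regression recovers the true total effect only when you control for the backdoor variable (sector), not the mediator (size).

```python
import random

random.seed(0)
N = 200_000

# Invented structural model:
#   sector U -> funding F, sector U -> size S,
#   funding F -> size S -> hiring H, plus a direct F -> H arrow.
# True total effect of funding on hiring: 0.3 + 0.8 * 0.6 = 0.78.
U, F, S, H = [], [], [], []
for _ in range(N):
    u = random.gauss(0, 1)
    f = 0.5 * u + random.gauss(0, 1)
    s = 0.8 * f + 0.5 * u + random.gauss(0, 1)
    h = 0.3 * f + 0.6 * s + random.gauss(0, 1)
    U.append(u); F.append(f); S.append(s); H.append(h)

def cov(a, b):
    ma, mb = sum(a) / N, sum(b) / N
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / N

def slope_controlling(x, y, z):
    """OLS coefficient on x in a regression of y on x and z."""
    num = cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

print(round(cov(F, H) / cov(F, F), 2))      # ~0.90 naive: sector confounds
print(round(slope_controlling(F, H, S), 2)) # ~0.30 controlling for size
                                            #   kills the mediated path
print(round(slope_controlling(F, H, U), 2)) # ~0.78 controlling for sector
                                            #   recovers the total effect
```

Three defensible-sounding analyses of the same data give 0.90, 0.30, and 0.78; only the diagram tells you which one answers the causal question.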
This is the specification problem Pearl doesn’t adequately address. His framework is complete: if a causal effect is estimable from observational data, the do-calculus will find the estimand. But that completeness assumes correct model specification. Two researchers with different diagrams can analyze identical data and reach opposite conclusions, no matter how large the dataset. Pearl treats this as honest acknowledgment of assumptions. Fisher would call it abandoning objectivity.
The practical consequence appears in contemporary research. Epidemiologists now routinely draw causal diagrams, which is progress. But diagram quality varies wildly. Some researchers treat them as decorative—adding arrows to satisfy reviewers without genuine causal reasoning. Others over-specify, controlling for variables that introduce M-bias. The gap between Pearl’s elegant examples and messy reality is where most research lives.
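M-bias is easy to demonstrate and worth internalizing. In the simulation below (an invented M-structure), x has no effect on y at all, yet “controlling for everything you can measure”—here, the pre-treatment collider z—manufactures a spurious effect out of nothing.

```python
import random

random.seed(1)
N = 200_000

# Invented M-structure: hidden causes u1 and u2; z is their common effect
# (a pre-treatment collider), x depends only on u1, y only on u2.
# x has NO causal effect on y.
X, Y, Z = [], [], []
for _ in range(N):
    u1, u2 = random.gauss(0, 1), random.gauss(0, 1)
    Z.append(u1 + u2 + random.gauss(0, 1))
    X.append(u1 + random.gauss(0, 1))
    Y.append(u2 + random.gauss(0, 1))

def cov(a, b):
    ma, mb = sum(a) / N, sum(b) / N
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / N

naive = cov(X, Y) / cov(X, X)               # ~0: correctly finds no effect
adjusted = (cov(X, Y) * cov(Z, Z) - cov(Z, Y) * cov(X, Z)) / \
           (cov(X, X) * cov(Z, Z) - cov(X, Z) ** 2)
print(round(naive, 2), round(adjusted, 2))  # ~0.0 vs ~-0.2: adjusting for
                                            # z creates a spurious effect
```

The unadjusted analysis is the right one here, which is exactly what “control for everything” instincts get backwards.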
Pearl needed more on how to proceed when causal structure is uncertain. Sensitivity analysis (how wrong can your diagram be before conclusions flip?) gets mentioned but not developed. Model validation beyond conditional independence tests remains primitive. Iterative refinement procedures—how to update your diagram when predictions fail—barely appear. The book assumes either you know the causal structure or you don’t. But science operates in the middle ground where structure is partially known, contested, or genuinely ambiguous.
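A minimal sensitivity analysis is not hard to sketch, which makes its thin treatment more conspicuous. The sweep below, with made-up numbers, uses the standard omitted-variable-bias formula for a standardized linear model to ask how strong a hidden confounder must be before an observed slope of 0.5 is explained away entirely.

```python
# Omitted-variable sensitivity sweep. Assumed linear model:
#   Y = tau*X + beta*U + noise,   X = alpha*U + noise  (unit-variance U).
# OLS of Y on X then returns  tau + alpha*beta / var(X),
# with var(X) = alpha**2 + 1. The observed slope 0.5 is a made-up number.
observed = 0.5
for strength in [0.2, 0.4, 0.6, 0.8, 1.0]:
    alpha = beta = strength
    bias = alpha * beta / (alpha ** 2 + 1)
    tau = observed - bias
    flag = "  <- explained away" if tau <= 0 else ""
    print(f"confounder strength {strength:.1f}: implied tau = {tau:+.2f}{flag}")
```

Only at confounder strength 1.0 (a confounder as strong on both arrows as the exposure itself) does the implied causal effect reach zero; weaker confounding merely shrinks it. That kind of table is what a diagram-first analysis should ship alongside its estimate.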
The AI Overreach
Pearl’s Chapter 10 treatment of artificial intelligence reveals both the book’s ambitions and its limitations. He prescribes three components for strong AI: a causal model of the world, a causal model of the machine’s own software, and memory linking intentions to outcomes. Then: “I believe that strong AI with causal understanding and agency capabilities is a realizable promise.”
This claim requires scrutiny. Pearl is absolutely correct that current AI—including deep learning—operates entirely on Rung 1. AlphaGo predicts brilliantly but understands nothing. It cannot explain why moves work, cannot generalize beyond Go, cannot answer why-questions. Pearl’s dismissal of such systems as “machines with truly impressive abilities but no intelligence” captures something real.
But Pearl dramatically underestimates the gap between “causal frameworks exist” and “machines can acquire them.” His inference engine assumes someone provides the causal structure. The framework can manipulate diagrams once drawn—that’s pure symbol manipulation. But how does a machine learn that fire causes smoke rather than vice versa? That roosters don’t cause sunrise? That the correlation between chocolate consumption and Nobel Prizes is spurious?
Pearl gestures at “an intricate combination of inputs from active experimentation, passive observation, and the programmer”—essentially punting on the hardest problem. He acknowledges causal discovery is “much more difficult and perhaps impossible,” then immediately argues his framework makes strong AI achievable. This is sleight of hand. If machines can’t learn causal structure, someone must program it manually for every domain. That doesn’t scale. That isn’t intelligence.
The vision of robots “reflecting on their mistakes” and “functioning as moral entities” sounds compelling until you ask: where do the causal models come from? The Carnegie Mellon group of Spirtes, Glymour, and Scheines spent decades on causal discovery algorithms with modest success in restricted domains. The general problem—learning causal structure from experience without pre-specified possibilities—remains intractable. Pearl’s framework is powerful for humans with domain expertise. Whether it brings machines closer to human-like reasoning remains genuinely uncertain.
AlphaGo’s success doesn’t threaten Pearl’s framework—Go’s rules provide perfect causal structure. But Pearl’s claim that deep learning has “no intelligence” risks the same mistake he accuses others of making: confusing current limitations with fundamental ones. The representation learning, transfer learning, and meta-learning happening in modern ML all have causal interpretations Pearl doesn’t explore. The book needed more on how causal inference should interface with contemporary machine learning: using deep learning for feature extraction (Rung 1) while preserving causal reasoning (Rungs 2-3).
What We Have Now
For practitioners, the book provides actionable methodology if you can construct defensible diagrams. Understanding why RCTs work (randomization severs all incoming arrows to the treatment variable) suggests when observational studies achieve the same deconfounding. For international students processing hundreds of thousands of company filings, distinguishing association from causation determines their future in this country. For physicians prescribing statins, the difference between lowering cholesterol and observing low cholesterol determines treatment efficacy. For climate scientists, the probability of necessity—P(Y₁|X=1,Y=1)—transforms vague claims about “climate change contributing to extreme weather” into quantifiable attribution: there’s a 90% probability that anthropogenic warming was a necessary cause of the 2003 European heat wave.
The framework has changed discourse in epidemiology, social science, economics. That’s Pearl’s genuine achievement—not merely solving technical problems but providing grammar for questions that matter. The paradoxes chapter alone justifies the book’s existence. The “bad-bad-good drug” example (harmful to men, harmful to women, beneficial to “people”) crystallizes why causal thinking matters. The Sure-Thing Principle, properly stated with the do-operator, proves such drugs mathematically impossible.
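The reversal is easy to reproduce with invented counts, and the backdoor adjustment for sex restores the stratum-level verdict, just as the Sure-Thing Principle demands:

```python
# A "bad-bad-good" dataset with invented counts: the drug looks worse for
# men, worse for women, yet better for "people", because sex affects both
# who takes the drug and who recovers. Sex is the confounder to adjust for.
data = {                              # (recovered, total)
    ("men",   "drug"): (12, 40),   ("men",   "none"): (70, 200),
    ("women", "drug"): (140, 200), ("women", "none"): (30, 40),
}

def rate(sex, tx):
    r, n = data[(sex, tx)]
    return r / n

def agg(tx):
    """Aggregated recovery rate, ignoring sex."""
    r = sum(data[(s, tx)][0] for s in ("men", "women"))
    n = sum(data[(s, tx)][1] for s in ("men", "women"))
    return r / n

# Backdoor adjustment for sex gives P(recovery | do(treatment)).
n_total = sum(n for _, n in data.values())
p_sex = {s: sum(data[(s, t)][1] for t in ("drug", "none")) / n_total
         for s in ("men", "women")}

def do(tx):
    return sum(p_sex[s] * rate(s, tx) for s in ("men", "women"))

print(f"aggregate: drug {agg('drug'):.2f} vs none {agg('none'):.2f}")
print(f"adjusted:  drug {do('drug'):.2f} vs none {do('none'):.2f}")
```

The aggregate comparison flatters the drug (0.63 vs 0.42) only because healthier patients took it; the adjusted comparison (0.50 vs 0.55) agrees with both strata that the drug is harmful.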
But Pearl has given us a powerful language that still requires fluent speakers. The book is essential for anyone teaching data science or working with observational data. The correctives it provides—deep learning operates on Rung 1 only, data are “profoundly dumb” about causes, transparency matters more than performance in systems that must explain themselves—will remain relevant as long as people mistake correlation for causation.
The scaffolding shows. Pearl oscillates between accessible exposition and technical density. Co-author Dana Mackenzie’s warmer voice occasionally surfaces before Pearl’s formalism reasserts control. The treatment of competing frameworks carries the edge of old academic grievances. The AI discussion feels both rushed and overconfident.
Yet Pearl’s core insight stands: You are smarter than your data. Data tell you that people who took medicine recovered faster. They can’t tell you why. Maybe those who took medicine did so because they could afford it and would have recovered just as fast without it. The causal revolution enables us to answer such questions, provided we can specify our assumptions clearly enough and defend them on scientific grounds.
The book’s deepest contribution may be epistemological rather than technical. Pearl has shown that causal questions are answerable—not from data alone, but from data combined with explicit structural assumptions. This shifts emphasis from Fisher’s “objective” analysis (which hid assumptions) to transparent modeling (which advertises them). Whether this counts as progress depends on your faith that scientific communities can construct better diagrams than individuals can hide assumptions. The last two decades suggest cautious optimism.
Essential reading for anyone working with observational data. Occasionally exhausting. Undeniably important. But don’t expect the diagrams to draw themselves, and don’t believe strong AI is just around the corner. Pearl has given us the grammar. Fluency takes longer.


