You are standing in front of a vendor demo. The screen shows a causal AI platform — sleek, confident, color-coded. The presenter clicks a button. A causal effect estimate appears. A confidence interval. A recommendation. Everything looks exactly the way a result is supposed to look.
You do not know what the tool assumed to produce that number.
Neither does the presenter.
This is not a failure of the technology. The technology worked correctly. The estimation layer — the statistical machinery that takes inputs and produces an output — performed exactly as designed. The failure lives one layer earlier, in decisions that were made before the tool ran, decisions that most people using the tool do not know they are making. Someone drew a causal graph. Someone chose which variables to condition on. Someone encoded assumptions about which causes are real and which are artifacts. The tool accepted those inputs and did its job.
Whether the inputs were defensible — that part never came up.
The field has a name for what the tool cannot do. Identification. Not identification in the bureaucratic sense, not spotting a pattern in a dataset. Identification is the set of decisions that determine whether a causal analysis is asking the right question of the right data with the right structural assumptions. It is the layer between “I have observational data” and “I have a result I can act on.” Every causal AI tool in existence requires someone to perform this layer before the tool runs. No tool performs it for you. Most tools do not tell you this.
Consider what the identification layer actually involves. First, you must draw the causal graph — a directed acyclic graph, a DAG — that encodes your beliefs about what causes what in your domain. Every arrow is a claim. A missing arrow is also a claim. Then you must choose what to condition on: which variables to adjust for in order to block the paths that would otherwise confuse correlation with causation. Then you must defend those choices — explain what you are claiming, what you are assuming, and what you are honestly leaving unresolved — to statisticians who will estimate the effect and to decision-makers who will act on it.
None of those steps are statistical. All of them require domain expertise that no algorithm can supply.
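To make that concrete: here is what the first two decisions might look like once written down. A minimal sketch in Python, assuming the networkx library; the three-variable graph is invented for illustration and previews the hospital example below.

```python
import networkx as nx

# Every edge is an explicit causal claim; every absent edge is a claim too.
dag = nx.DiGraph([
    ("severity", "treatment"),   # claim: sicker patients get treated more often
    ("severity", "outcome"),     # claim: severity worsens outcomes directly
    ("treatment", "outcome"),    # claim: the treatment affects outcomes
])
assert nx.is_directed_acyclic_graph(dag)

# The second decision: what to condition on. Severity sits on a backdoor
# path between treatment and outcome, so it belongs in the adjustment set.
# That choice is the analyst's, not the library's.
adjustment_set = {"severity"}
```

The library checks acyclicity. It cannot check whether the arrows are true. That gap is the identification layer.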
Here is what happens when that layer is skipped.
A hospital reviews its treatment protocols. The data show that patients who received a particular treatment had worse outcomes than patients who did not. The statisticians are confident. The sample size is large. The confidence intervals are narrow. The administration considers discontinuing the treatment.
The hidden variable: the treatment was used preferentially for the sickest patients. Severity drives both the treatment decision and the outcome. When you condition on severity — when you compare sick patients to sick patients, not sick patients to healthy ones — the relationship reverses. The treatment is effective. The aggregate result was not wrong as a description of the data. It was wrong as a causal claim. The data cannot tell you which is which. Only someone who knows the domain — who knows how treatment decisions are actually made in that hospital — can supply the structural assumption that unlocks the right analysis.
This is Simpson’s Paradox. It is named, documented, and taught in introductory statistics. It appears in practice constantly, wearing the confident clothing of a large-sample result.
The analysis that nearly discontinued an effective treatment was not careless. It was careful, rigorous, and causally meaningless.
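The reversal is easy to reproduce. The counts below are invented purely to exhibit the structure, assuming nothing about any real hospital: the treatment goes preferentially to severe patients, helps within every stratum, and appears harmful in the aggregate.

```python
# (patients, recoveries) per severity stratum and treatment arm;
# hypothetical numbers chosen only to produce the reversal
strata = {
    "severe": {"treated": (800, 480), "untreated": (200, 100)},
    "mild":   {"treated": (200, 190), "untreated": (800, 680)},
}

for name, s in strata.items():
    (tn, tk), (un, uk) = s["treated"], s["untreated"]
    print(f"{name:>6}: treated {tk/tn:.0%} vs untreated {uk/un:.0%}")

tn, tk = (sum(s["treated"][i] for s in strata.values()) for i in (0, 1))
un, uk = (sum(s["untreated"][i] for s in strata.values()) for i in (0, 1))
print(f" total: treated {tk/tn:.0%} vs untreated {uk/un:.0%}")

# severe: treated 60% vs untreated 50%
#   mild: treated 95% vs untreated 85%
#  total: treated 67% vs untreated 78%   <- the aggregate reverses
```

Nothing in the data flags the reversal. The stratified and aggregate numbers are both correct arithmetic; only the causal structure says which comparison to believe.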
The curriculum has not caught up.
The technical literature on causal inference — Hernán and Robins, Chernozhukov and collaborators — assumes a statistician audience. The methods are real and powerful. The prerequisite is that someone with domain expertise has already performed the identification layer and handed the statistician a defensible model. That handoff is assumed. It is never taught, because the people writing the literature already know how to do it.
Judea Pearl’s The Book of Why built the intuition. It explained why causal reasoning matters, what confounders are, why correlation is structurally insufficient. Hundreds of thousands of people read it and understood the argument. Almost none of them left knowing how to build a defensible DAG for their own domain problem. The book stops before the doing layer.
Business analytics courses go further in the wrong direction. They teach students to say “correlation is not causation” as a warning. They do not teach what to do instead. The warning is correct. The toolkit is absent.
The people making high-stakes decisions with causal AI tools — the VPs of analytics, the health policy researchers, the marketing scientists, the engineers interpreting algorithmic outputs — have never been taught the one layer that determines whether those tools are being used correctly. The vendors do not tell them. The statisticians assume they already know. The business schools train them to recognize the problem without solving it.
The Human Half is a direct response to that gap.
The first course in the series — causal reasoning — teaches domain experts to construct, evaluate, and defend the causal graphical models that AI estimation tools require humans to supply. The course is built on a single architectural claim: the identification layer cannot be automated, and the people who need to perform it have never been taught how.
The claim is not uncontested. Causal discovery algorithms — PC, FCI, LiNGAM — can recover aspects of causal structure from data under specific assumptions. LLM-assisted DAG construction is an active research area. The course addresses this directly: current methods cannot reliably perform identification in the messy, high-stakes, observational settings where domain experts most need to act. The conditions under which automated discovery would work are rarely met in practice. The thesis stands with that qualification.
Eleven learning outcomes, organized across three zones, anchor the course. Zone one: understand why statistical association and causal effect are structurally different, and what that difference costs when it is ignored. Zone two: build the model — draw a DAG with confounders, mediators, and colliders placed correctly; apply the backdoor criterion; defend the result to a skeptical statistician. Zone three: deploy — translate the defended model into an estimation specification, read the output critically, quantify how wrong the assumptions would have to be before the conclusion changes.
By the end, the student called Sarah in the course design document — a VP of analytics with an MBA, healthcare domain knowledge, and no background in Pearl — can draw a defensible DAG for her domain problem, articulate her assumptions to a statistician, hand off estimation to a tool with confidence, and evaluate whether the result should be trusted.
That is the doing layer. It has never been taught to the people who need it.
The thirteen-chapter arc moves from the decision that looked right to the analysis that accounts for every decision it makes.
Act One — Establish (Chapters 1–4)
Chapter 1: The Decision That Looked Right. You are in the room where a causal failure was made by careful people with large samples and narrow confidence intervals. The analysis was rigorous. The conclusion was wrong. The difference between those two facts is the subject of the entire course.
Chapter 2: Three Words for the Same Problem. Conditioning, confounding, and controlling for a variable name the same underlying problem in different disciplinary clothes — one word from statistics, one from epidemiology, one from business analytics. Chapter 2 untangles them into a single structural idea the rest of the course depends on.
Chapter 3: The Map Before the Territory. The directed acyclic graph — the DAG — is introduced as a way to make causal beliefs explicit, arguable, and testable. Every arrow is a claim. Every missing arrow is also a claim.
Chapter 4: The Identification Layer: What Only You Can Do. The thesis chapter. Three identification failure types, named and illustrated. The argument that domain expertise is the non-delegatable input to causal analysis. The most dangerous failure is not drawing the wrong DAG — it is not knowing you drew one at all.
Act Two — Build (Chapters 5–9)
Chapter 5: Confounders: The Variable You Forgot. From intuitive recognition to structural identification. Three questions that find confounders systematically. What adjustment does to a backdoor path — and what to do when the confounder is unmeasured.
Chapter 6: Mediators: The Variable You Shouldn’t Touch. Conditioning on a mediator destroys the causal estimate rather than improving it. Total effect versus direct effect. How biomarkers become the most common source of this error in practice.
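The mediator failure is visible in a ten-line simulation. The coefficients below are invented; the pattern is the point: adjusting for the mediator quietly swaps the total effect for the direct effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                        # treatment
m = 0.8 * x + rng.normal(size=n)              # biomarker, a mediator: X -> M
y = 0.5 * x + 1.0 * m + rng.normal(size=n)    # outcome: X -> Y and M -> Y
# true total effect of X on Y: 0.5 + 0.8 * 1.0 = 1.3

def coef_of_x(predictors):
    X = np.column_stack(predictors + [np.ones(n)])
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

print(coef_of_x([x]))     # ~1.3: the total effect, the right answer
print(coef_of_x([x, m]))  # ~0.5: adjusting for the biomarker removed
                          # the very pathway we wanted to measure
```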
Chapter 7: Colliders: The Variable That Breaks Everything When You Look at It. The make-or-break chapter. A collider is closed by default and opened by conditioning — the only node type that works this way. The reversal must feel inevitable in retrospect. If your reaction is “I’ll trust you on that,” the pedagogy has failed. Selection bias is collider bias. Studying only successful founders makes the distortion worse, not better.
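And the collider reversal, in the same spirit, using the founders example with invented variables: talent and luck are generated independently, success is their shared effect, and selecting on success manufactures an association out of nothing.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
talent = rng.normal(size=n)
luck = rng.normal(size=n)          # independent of talent by construction
success = talent + luck            # collider: both arrows point into it

print(np.corrcoef(talent, luck)[0, 1])   # ~0.00, closed by default

winners = success > 1.0            # "study only successful founders"
print(np.corrcoef(talent[winners], luck[winners])[0, 1])
# clearly negative: conditioning on the collider (here, via selection)
# opened the path, so among the winners, the lucky look less talented
```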
Chapter 8: The Backdoor Criterion: Closing the Paths That Don’t Belong. The full node-type taxonomy is complete. Now: given any DAG, what is the correct adjustment set? The backdoor criterion turns DAG-reading into a systematic procedure rather than a judgment call.
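The criterion is mechanical enough to state in a few lines. A sketch assuming a recent networkx release (3.3 or later, where nx.is_d_separator is available) and the standard reformulation of the criterion: no member of the adjustment set may be a descendant of the treatment, and the set must d-separate treatment from outcome once the treatment’s outgoing edges are removed.

```python
import networkx as nx  # assumes networkx >= 3.3 for nx.is_d_separator

def satisfies_backdoor(G, x, y, Z):
    if Z & nx.descendants(G, x):                   # descendants of x are off-limits
        return False
    back = G.copy()
    back.remove_edges_from(list(G.out_edges(x)))   # keep only backdoor paths
    return nx.is_d_separator(back, {x}, {y}, Z)

G = nx.DiGraph([
    ("severity", "treatment"), ("severity", "outcome"),
    ("treatment", "biomarker"), ("biomarker", "outcome"),
])
print(satisfies_backdoor(G, "treatment", "outcome", {"severity"}))   # True
print(satisfies_backdoor(G, "treatment", "outcome", set()))          # False: open backdoor
print(satisfies_backdoor(G, "treatment", "outcome", {"biomarker"}))  # False: a mediator
```

The procedure is systematic. The graph it runs on is still yours.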
Chapter 9: Defending Your DAG: What You’re Claiming, Assuming, and Leaving Open. A three-part defense structure: explicit claims, plausibility-ranked assumptions, honestly acknowledged open questions. Two registers — one for the statistician, one for the decision-maker. The chapter closes by planting the question Act Three answers: what if the DAG is wrong?
Act Three — Apply (Chapters 10–13)
Chapter 10: From DAG to Data: What the Machine Needs. The specification document — translating a defended DAG into the exact inputs an estimation tool requires. Three handoff failure types, each of which produces a result that is technically clean and causally wrong.
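What such a specification might contain, sketched as a plain data structure rather than any real tool’s API; the field names are illustrative:

```python
# Hypothetical handoff document; every field is an identification decision
specification = {
    "estimand": "average treatment effect",
    "treatment": "treatment",
    "outcome": "outcome",
    "adjust_for": ["severity"],          # from the defended DAG
    "do_not_adjust_for": ["biomarker"],  # mediator: adjusting would bias it
    "assumptions": [
        "no unmeasured confounding beyond severity",
        "severity is recorded before the treatment decision",
    ],
}
```

None of these fields can be inferred from the data the tool will see. All of them are carried over from the identification layer.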
Chapter 11: Reading the Output: What to Trust and What to Interrogate. The result lands in your inbox. Narrow confidence intervals. p < 0.05. None of those features address whether identification was correct. Three questions every output must survive before you act on it.
Chapter 12: When the Assumptions Don’t Hold: Limits, Sensitivity, and Honesty. The E-value: a single number that answers “how wrong would my assumptions have to be to reverse this conclusion?” The conditions under which an analysis should not be reported as definitive. The honest version of confidence.
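The point-estimate E-value has a closed form, due to VanderWeele and Ding. A minimal implementation with illustrative numbers:

```python
import math

def e_value(rr):
    """E-value for a risk-ratio point estimate (VanderWeele & Ding, 2017):
    the minimum strength of association an unmeasured confounder would
    need with both treatment and outcome to explain the estimate away."""
    rr = max(rr, 1 / rr)       # protective estimates: use the inverse
    return rr + math.sqrt(rr * (rr - 1))

print(e_value(1.5))   # ~2.37: a confounder needs risk ratios of at least
                      # 2.37 with both treatment and outcome to nullify 1.5
print(e_value(0.75))  # handled symmetrically: 2.0
```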
Chapter 13: The Full Analysis: One Problem, Every Decision. A worked capstone — one complete domain problem, seven explicit stages, every identification decision made on the record, every limit named. The course ends where practice begins.
The course is being developed at Northeastern University’s College of Engineering. It is the first course in a graduate series called The Human Half: What AI Can’t Do — a series organized around the cognitive capacities the AI era most urgently requires us to develop, not because machines cannot assist with them, but because machines cannot perform them without a human in the loop who knows what they are doing.
The forklift metaphor is imprecise in one important direction.
A forklift replaces human labor in a bounded task. It lifts what you point it at. It does not decide what needs lifting, which fragile items need handling differently, or whether the warehouse layout makes the whole operation unsafe. Those decisions remain with the human.
Causal AI tools are forklifts that will lift whatever you point them at, with great precision and evident confidence, without telling you whether the thing you are pointing them at is the right thing to lift.
The identification layer is the judgment that precedes the pointing. It is the part that determines whether the tool is being used correctly or being used precisely and confidently in the wrong direction.
Teaching that judgment to the people who need it is not supplementary. It is the prerequisite to using the most powerful analytical tools in the field without systematically deceiving yourself.
The curriculum is coming. That is not an announcement. It is a correction.


