The Artifact Was Once Enough
This essay is a response to Lila Shroff’s “Is Schoolwork Optional Now?” published in The Atlantic on April 10, 2026. The argument it makes in full is developed in the preprint “Frictional: Measuring the Struggle” at irreducibly.xyz.
There is a word — decoupling — that sounds technical enough to keep us comfortable. Clinical. As if what has happened in classrooms since 2022 is primarily a logistics problem, a puzzle about detection and enforcement, a cat-and-mouse game that the right algorithm might someday win.
It is not that.
What has happened is something more fundamental than cheating at scale. The artifact — the essay, the proof, the lab report — used to be evidence of a process. The process was the point. The essay was proof that thinking had occurred, that a mind had engaged with difficulty and emerged changed. When we graded the essay, we were really grading the encounter: the hours of confusion, the drafts that failed, the moment when something clicked and then had to be organized into sentences for another person. The artifact was the residue of all that. It was upstream evidence of downstream consequence.
Generative AI has broken the causal chain. Not bent it — broken it.
A bot called Einstein, built by a 22-year-old entrepreneur named Advait Paliwal, recently completed all eight modules and seven quizzes of an introductory statistics course in under an hour. Perfect score. The human who set it loose reports that she “hardly so much as read the course website.” What Einstein produced — the evidence that a course had been completed — was real. The learning it was supposed to represent did not occur. The artifact existed. The process that should have produced it did not happen.
Paliwal says he released the tool to alert educators. His more honest statement is buried in the subtext: “If I didn’t post about this, someone would have used the same technology and hidden it from the professors.” He is right. He is also describing a world in which the distinction between using it secretly and not using it at all is narrowing toward irrelevance. The tool exists. The temptation exists. And the economic pressures on students — especially international students, especially students working jobs to pay tuition, especially students taking courses to satisfy requirements rather than out of genuine interest — exist independently of any single tool.
The institutional response has been to build better detectors. This is a reasonable first move. It is not a durable one.
Why Detection Cannot Save Us
Here is the structural problem with artifact-based AI detection: the arms race has a predetermined winner. Detection is always trained on the outputs of current-generation technology. Generation technology improves continuously. The detector trained on today’s AI writing fails on tomorrow’s — not because detectors are poorly built, but because that is how the mathematics of the problem works. The forensic window closes.
There is a deeper problem. The educationally relevant question was never did a human type these words. It was did a human develop this understanding. A student who dictated an essay to a transcriptionist and then submitted it word-for-word would have technically written no AI content. The essay would pass every detector. The learning would have occurred or not occurred based on whether they thought hard while dictating, not based on who typed it. The detector is solving the wrong problem.
And there is a third problem, the one that produces the most corrosive outcomes. When you build a system to catch AI use, you teach students to game the detector. They learn strategies for mimicking authentic writing — inserting typos, varying sentence structure, using phrases the model knows sound “human.” The simulation improves. The gap between simulated engagement and genuine engagement widens at precisely the moment we need it to narrow.
William Liu, a Stanford sophomore, puts it plainly: he finished high school only two years ago, yet his educational experience and his younger sibling’s are vastly different. The technology arrived. The classroom has not yet figured out what to do next.
What Genuine Learning Actually Leaves Behind
Here is the thing we have been too polite to say: learning is not the same as performance.
Robert Bjork has been saying this for thirty years, in academic papers that educators read, administrators do not, and curriculum designers read and then ignore when the calendar pressure comes. Performance is the observable, often temporary thing — how well a student does on a measure. Learning is the durable change in what the student can do, understand, and transfer to a new context. These two things are not the same. We have built an entire institutional infrastructure that measures only one of them.
Genuine human learning is a biological event. When a learner encounters material that genuinely challenges their current understanding — material in that productive zone where their current model is wrong or incomplete — something specific happens neurologically. Dopamine neurons fire in response to prediction errors. BDNF expression upregulates, sometimes by nearly three times. New dendritic spines form at the synaptic connections that will hold the memory. These are not metaphors. They are the physical substrate of the thing we call learning.
The behavioral consequences of these neurological events are traceable. A student engaged in genuine cognitive struggle spends time proportional to difficulty. Their errors follow a coherent developmental path — misconceptions that make sense given their current model, corrections that build on each other. When tested in a new context, they can transfer. When scaffolded with a partial hint, they respond — because there is a partially formed structure for the hint to connect to. Their confidence, over time, calibrates to their actual performance rather than inheriting the confidence of the AI explanation they processed.
These are what I have been calling friction traces — the behavioral signatures that genuine human cognitive engagement leaves in observable data. They exist because genuine learning is a biological event. An AI can produce the artifact without triggering any of these neurological events. It cannot produce the behavioral traces, because the biological events that generate those traces did not occur.
The Seven Things We Can Now Measure
The Genuine Learning Probability framework I have been developing with Humanitarians AI specifies seven such traces:
The temporal engagement pattern — the correlation between how hard an item is and how long a student spends on it. Genuine engagement produces this correlation. AI-assisted completion decouples time from difficulty.
The error trajectory — whether a student’s mistakes follow conceptually coherent developmental paths. Genuine learning produces coherent errors: the reward prediction error mechanism drives the learner’s internal model toward better ones in patterned ways. Borrowed certainty produces errors that are random with respect to conceptual structure.
Cross-context transfer — the Bjorkian definition of learning. A student who genuinely understood something can apply it in novel contexts. Borrowed certainty produces surface representations tied to the specific context of the AI explanation.
Uncertainty calibration — whether a student’s expressed confidence tracks their actual performance. Borrowed certainty produces systematic overconfidence: the student inherits the AI’s confidence distribution without the knowledge base that would justify it.
Social knowledge texture — the quality of a student’s engagement in discussion contexts. Genuine encounter with material leaves a characteristic texture: specific confusions, particular connections, the specific questions that arose from actual engagement. This texture cannot be manufactured without having had the encounter.
The retrieval strength decay signature — whether performance decays at rates consistent with genuine encoding. The spacing effect is the benchmark of genuine learning. Borrowed certainty has no storage strength to retrieve; performance decays monotonically and the spacing effect does not appear.
And the scaffolding response curve — whether a student’s performance responds appropriately to partial hints. A student with genuine partial understanding has a zone of proximal development. A partial hint activates the structure that is already forming. Borrowed certainty has no such zone.
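To make the first of these traces concrete: the temporal engagement pattern reduces, in its simplest form, to a rank correlation between per-item difficulty and time on task. The sketch below is mine, not the framework’s estimator; the data, the function names, and the choice of Spearman correlation are all illustrative assumptions.

```python
# Illustrative sketch (not the GLP framework's actual estimator):
# the temporal trace as a rank correlation between item difficulty
# and seconds spent. All data below are hypothetical.

def rank(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank across the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation between two equal-length lists."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

difficulty = [1, 2, 3, 4, 5, 3, 2]                 # per-item ratings
genuine_times = [40, 65, 120, 260, 400, 150, 70]   # time tracks difficulty
assisted_times = [29, 34, 35, 28, 33, 30, 31]      # flat, difficulty-blind

print(round(spearman(difficulty, genuine_times), 2))   # strongly positive
print(round(spearman(difficulty, assisted_times), 2))  # near zero
```

A strongly positive correlation is consistent with engagement that scales with difficulty; the flat, difficulty-blind pattern is what Einstein-style completion would leave behind.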
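The calibration trace admits an equally crude first pass: the gap between mean stated confidence and actual accuracy. A fuller treatment would bin by confidence level, as in expected calibration error; this sketch, with hypothetical data, is only meant to show the shape of the signal.

```python
# Illustrative sketch: calibration as mean confidence minus accuracy.
# A gap near zero suggests confidence tracks performance; a large
# positive gap is the overconfidence signature of borrowed certainty.
# All data below are hypothetical.

def calibration_gap(confidences, correct):
    """Mean self-reported confidence (0-1) minus observed accuracy."""
    return sum(confidences) / len(confidences) - sum(correct) / len(correct)

calibrated = calibration_gap(
    [0.9, 0.6, 0.8, 0.4, 0.7], [1, 1, 1, 0, 0])    # confidence ~ accuracy
overconfident = calibration_gap(
    [0.95, 0.9, 0.9, 0.85, 0.9], [1, 0, 0, 1, 0])  # confidence >> accuracy

print(round(calibrated, 2), round(overconfident, 2))
```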
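The decay trace can be operationalized, under the common assumption of roughly exponential forgetting, as a rate estimated from recall accuracy at increasing delays. Both the estimator and the numbers below are illustrative assumptions, not the framework’s.

```python
# Illustrative sketch: a forgetting rate from recall probabilities at
# increasing delays, assuming recall ~ exp(-rate * days). Hypothetical data.

import math

def decay_rate(days, recall):
    """Least-squares slope of -log(recall) against delay in days."""
    ys = [-math.log(r) for r in recall]
    n = len(days)
    mx, my = sum(days) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(days, ys))
    den = sum((x - mx) ** 2 for x in days)
    return num / den

days = [1, 7, 14, 28]
genuine_recall = [0.92, 0.85, 0.80, 0.74]   # slow decay: storage strength exists
borrowed_recall = [0.90, 0.55, 0.35, 0.15]  # fast monotonic decay

print(round(decay_rate(days, genuine_recall), 3))
print(round(decay_rate(days, borrowed_recall), 3))  # several times larger
```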
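And the scaffolding trace, in its simplest form, is just the accuracy lift a partial hint produces. A real measure would control for item difficulty and hint strength; the sketch below, with hypothetical numbers, shows only the basic contrast.

```python
# Illustrative sketch: scaffolding response as the accuracy lift
# from a partial hint. Hypothetical data; 1 = correct, 0 = not.

def scaffolding_gain(unaided, hinted):
    """Accuracy with partial hints minus accuracy without."""
    return sum(hinted) / len(hinted) - sum(unaided) / len(unaided)

# Genuine partial understanding: hints connect to forming structure.
genuine_gain = scaffolding_gain([0, 1, 0, 0, 1], [1, 1, 1, 0, 1])
# Borrowed certainty: no partially built structure for hints to activate.
borrowed_gain = scaffolding_gain([0, 1, 0, 0, 1], [0, 1, 0, 0, 1])

print(genuine_gain, borrowed_gain)
```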
What the Bot Cannot Manufacture
Here is the argument I want to make carefully, because it is often misunderstood: this framework is not about catching AI use. It is about measuring learning directly.
An AI detector fails when AI outputs become indistinguishable from human outputs. A learning measure fails when borrowed certainty becomes indistinguishable from genuine learning — which would require borrowed certainty to produce the same neurobiological events, the same schema formation, the same durable transfer. At that point, borrowed certainty has become learning. That is not AI defeating assessment. That is learning occurring through a different pathway than we expected.
What manufacturing all seven friction traces simultaneously — without performing the underlying cognitive work — actually requires is something close to performing the underlying cognitive work. A student who spends genuine time on difficult material, who makes and corrects errors in a conceptually coherent sequence, who demonstrates transfer across novel contexts, who maintains calibrated uncertainty, who engages with genuine texture in discussion, who shows the spacing effect across weeks, and who responds appropriately to partial hints — has learned the material. At that point the game has become indistinguishable from the thing we wanted in the first place.
Natalie Lahr, a Barnard sophomore studying history and political science, describes an “anti-AI radicalizing” experience: a tutor at the writing center pasted her essay prompt into Perplexity and handed her the AI-generated outline. “Why am I even here?” she asked afterward. The question is not rhetorical. It is the correct question.
What We Must Build Instead
The crisis of evidence facing educational institutions is not a technical problem. It is an epistemological problem. The evidence infrastructure we built assumed a world in which the artifact was upstream evidence of the process. That world no longer reliably exists.
What we need is an assessment infrastructure built on the process itself.
This means longitudinal process documentation — portfolios that capture the history of engagement, not just its products. It means embedded formative assessment that generates the data necessary to observe the seven friction traces over time. It means treating developmental trajectory as evidence: not what a student produced, but how their understanding developed, what they got wrong and corrected and why, where they transferred and where they didn’t.
Marc Watkins at the University of Mississippi describes an instructor who could, theoretically, set an AI to grade thirty essays during a fifteen-minute walk to Starbucks. He calls this “really scary.” He is right, but I want to be precise about why. The fear is not the efficiency. It is the loop: AI-generated assignments completed and assessed by AI agents, with human understanding nowhere in the chain. The fully automated loop is not a future dystopia. It is the logical endpoint of current trajectories. Einstein completes the course. The grader grades Einstein’s work. Both certificate and grade are real. The learning did not occur.
The artifact was once enough. It is no longer enough. The arms race between generation and detection has a winner, and it is not the detector.
We must now measure the struggle itself. Not because friction is intrinsically valuable — productive struggle matters only because of what it builds in the brain that does the struggling. We must measure it because the brain that struggles is the brain that learns, and the brain that learns is the only thing education was ever actually for.
The methodology is developed in full in “Frictional: Measuring the Struggle” — a preprint specifying the seven friction components, the ensemble architecture, and the tier calibration system — and at irreducibly.xyz. The framework is not a secret.
Nik Bear Brown is Associate Teaching Professor of Computer Science and AI at Northeastern University and founder of Humanitarians AI (501(c)(3)).
bear.musinique.com · skepticism.ai · theorist.ai
Tags: AI detection education failure, genuine learning probability framework, friction traces assessment, Bjork performance vs learning, Einstein bot Canvas schoolwork automation


