The paper rough draft: https://www.nikbearbrown.com/notes/Papers/glp-framework-genuine-learning-probability
What We Lost When We Made the Artifact the Grade
Here is the situation as it actually exists, not as anyone in an official capacity is willing to describe it clearly.
A student sits down to write a paper. The paper is due in twelve hours. The student has three other assignments due this week, a job that starts at six, and the accumulated evidence of two semesters telling them that the grade lives in the artifact — the paper itself — not in the thinking that was supposed to produce it. The student opens an AI tool. The paper gets written. It is, by most measurable standards, better than what the student would have produced alone at midnight after a shift.
In the next building, the professor who assigned the paper has used AI to draft the assignment prompt, the rubric, and the feedback comments they will paste into the LMS after running the submitted papers through a grading interface that summarizes them automatically.
Neither of them is a villain. Both of them are responding rationally to a system that has always rewarded the artifact and never found a way to measure the process that was supposed to produce it. Generative AI did not create this problem. It revealed it — suddenly, completely, and without the courtesy of suggesting a solution.
This essay is about what the solution might look like. It is not technical. The technical apparatus exists and is documented elsewhere. What doesn’t exist yet, in language plain enough to be useful, is a way of talking about why the solution matters — what it would mean for a student to be seen by an educational system that has, for most of institutional history, been looking at the wrong thing.
What the Artifact Was Supposed to Prove
The essay, the exam, the project, the recorded performance — these were never the thing education cared about. They were evidence. The artifact was valuable because it was causally downstream of a process: the reading, the confusion, the rereading, the argument with yourself at two in the morning about whether you actually understood what you thought you understood. The artifact was a trace of that process. Grading the artifact was a way of inferring the process, because the two were coupled tightly enough that measuring one was effectively measuring both.
That coupling has broken. This is not a scandal or a failure or a temporary condition that better AI detection will resolve. It is a structural change in what artifacts can tell us, and it is permanent. The forensic window, the period during which you can reliably distinguish a human-written essay from an AI-generated one, is closing domain by domain across every field in which humans produce artifacts. In writing it is largely closed already. In code it is closing. The detectors trained on today's AI outputs will be obsolete when tomorrow's outputs arrive.
Every educational institution that is currently responding to this situation by installing better detection software is solving last year’s problem with next year’s obsolescence already scheduled.
The Complicity No One Names
The conversation about AI and academic integrity is almost entirely conducted as a conversation about student dishonesty. This framing is not wrong, exactly. It is just so incomplete as to function as a kind of dishonesty itself.
Students are using AI because the artifact is the grade. The artifact is the grade because grading the process — the confusion, the revision, the dead ends, the moments of genuine understanding — is hard, and institutions have never built the infrastructure to do it at scale. The result is a system that has always been measuring the wrong thing, and now the wrong thing can be produced in thirty seconds by a tool that costs less than a textbook.
Professors are not innocent bystanders. Many are using the same tools to manage the same impossible workloads (drafting prompts, generating feedback, summarizing submissions) that the institution's growth model has made unmanageable. The incentive structure reaches all the way up. Publish or perish does not reward good teaching. The institution does not require teaching to be good, only for its artifacts, the syllabi, course evaluations, and enrollment numbers, to look like good teaching.
The student who uses AI to write a paper is not defecting from a system that is working. They are defecting from a system that has always asked them to perform learning rather than do it, and has never been able to tell the difference. AI has not corrupted that system. AI has made the corruption visible.
This is the thing worth sitting with before any solution is proposed: the problem is not the tools. The problem is what we decided to measure, and what we decided to ignore, long before the tools arrived.
What Genuine Learning Leaves Behind
Here is what the research shows, stated plainly.
When a human being genuinely learns something hard, the process is biological. Neurons fire in response to the gap between what the learner expected and what they encountered. That gap — the prediction error — is uncomfortable. It is the feeling of not understanding, the specific texture of confusion that is different from ignorance because it knows what it doesn’t know. Working through that discomfort produces measurable changes: in how information is encoded, in how long it persists, in whether it transfers to new contexts or stays locked to the specific example through which it was learned.
Genuine learning leaves traces. Not in the artifact — the artifact is the product, and products can be manufactured without the process. The traces are in the behavior that surrounds the artifact’s production: the time spent on the hard parts, the errors that follow a coherent path as the mental model develops, the ability to apply what was learned to a problem that looks different on the surface but has the same underlying structure, the calibrated uncertainty of someone who knows not just what they know but what they don’t.
None of these traces require looking at the artifact. They require looking at the process.
This is what the concept of friction in assessment is about. Not friction as punishment, not friction as obstacle, not friction as the gatekeeping logic that has always made elite education a credentialing system for people who already had advantages. Friction as signal. The productive struggle of genuine learning — the confusion, the revision, the wrong turn and the recovery — is not the unfortunate cost of arriving at the artifact. It is the thing the artifact was supposed to be evidence of. It is the learning itself.
The proposal is to measure it directly.
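To make "measure it directly" concrete, here is a minimal sketch of how process traces like the ones above might be blended into a single probability of genuine learning. Everything in it is hypothetical: the feature names, the weights, and the logistic form are illustrative choices, not the GLP framework's actual specification, which lives in the linked paper.

```python
from dataclasses import dataclass
import math


@dataclass
class ProcessTrace:
    """Hypothetical per-assignment process signals (names are illustrative)."""
    time_on_hard_sections: float  # fraction of work time spent where errors clustered
    revision_count: int           # substantive drafts, not autosaves
    transfer_score: float         # 0-1 performance on a surface-different, same-structure problem
    calibration_error: float      # |stated confidence - actual accuracy|; 0 is perfectly calibrated


def genuine_learning_probability(trace: ProcessTrace) -> float:
    """Toy logistic blend of friction signals into a 0-1 score.

    The weights below are invented for illustration; a real model would be
    fit to observed learning-outcome data and calibrated per task type.
    """
    z = (
        2.0 * trace.time_on_hard_sections
        + 0.3 * min(trace.revision_count, 10)  # diminishing returns past 10 revisions
        + 2.5 * trace.transfer_score
        - 3.0 * trace.calibration_error
        - 2.0  # bias: absent any process signal, the artifact alone proves little
    )
    return 1.0 / (1.0 + math.exp(-z))


# A rough, searching process: time spent on the hard parts, several drafts,
# good transfer, well-calibrated uncertainty.
engaged = ProcessTrace(0.6, 5, 0.8, 0.1)

# A smooth midnight submission with no visible process behind it.
smooth = ProcessTrace(0.0, 1, 0.2, 0.5)

print(round(genuine_learning_probability(engaged), 2))
print(round(genuine_learning_probability(smooth), 2))
```

Note what the bias term encodes: with no process evidence at all, the score sits low regardless of how polished the artifact is, which is exactly the "insufficient evidence" stance the essay argues for rather than a verdict of cheating.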
What This Would Mean for a Student
I want to be specific about what it would feel like to be in a classroom where this kind of assessment exists, because the abstract case is easy to make and the human case is the one that matters.
It would mean that the time you spent genuinely confused about something counts — not as performance of confusion, not as a participation grade for looking engaged, but as actual data about actual thinking. It would mean that the draft that was a mess, the question you asked in office hours that revealed you’d been working from the wrong assumption for two weeks, the revision that turned a competent response into a thinking one — these are evidence of the thing education is supposed to produce. They would be part of the record.
It would also mean that the smooth, perfectly structured submission produced at midnight with no evidence of genuine engagement is not, by itself, proof of anything. The artifact is not worthless. It has not become zero evidence. It has become insufficient evidence. Insufficient means it needs a partner — and the partner is the process that was supposed to produce it.
This is not a punishment for using AI. It is a recognition that the artifact alone was never the right thing to measure, and that the tools which have made that limitation undeniable have also, in the same move, made the solution more urgent than it has ever been.
The Uncomfortable Truth About Friction
The research contains a finding that takes a moment to absorb. The smooth, well-structured artifact — the one that reads with perfect confidence, that has no rough edges, no places where the writer lost the thread and found it again — may be mild negative evidence of genuine learning.
The rough, searching one may be positive evidence.
Not because roughness is a virtue. Not because difficulty signals intelligence. Because genuine struggle with hard material characteristically produces texture — places where the thinking was actually happening, where the writer was working something out rather than reporting a conclusion they arrived at before they started writing. The friction of genuine learning leaves marks. The borrowed certainty of an AI-assisted artifact is often smooth in a way that real thinking, at its most effortful, is not.
This is uncomfortable because educational institutions have spent generations rewarding the smooth artifact and interpreting roughness as inadequacy. We taught students that the goal was to arrive at certainty quickly and present it cleanly. We built rubrics that rewarded the appearance of knowing and had no mechanism for distinguishing it from the thing itself.
Generative AI did not create that confusion. It just made it expensive.
What Comes Next
The framework that formalizes this argument — the specific components of friction that genuine learning leaves in observable data, the way those components can be measured, combined, and calibrated to different kinds of cognitive work — is documented in the paper that follows this introduction. It is technical in the way that any serious methodology is technical, and it is also not the point of this essay.
The point of this essay is this: the crisis that AI has created for educational assessment is not primarily a cheating problem. It is an evidence problem. The artifact, which was always a proxy for the process, can now be produced without the process. Any response that tries to restore the artifact’s evidentiary value by detecting AI use is fighting a war that the progression of technology has already decided.
The response that might actually work is to stop relying on the artifact as the sole evidence of learning, and start building the infrastructure to measure what the artifact was always supposed to be downstream of.
Students are not wrong that the system gives them no choice but to produce the artifact by whatever means are available. They are responding rationally to a broken incentive structure. Educators are not wrong that something has been lost when the struggle disappears from the work. They are mourning the only evidence they were ever given access to.
The argument this paper makes is that the struggle was always the point. It is still the point. We have spent a long time measuring the wrong thing, and the tools that have made that undeniable have also, in the process, handed us a reason to build something better.
The infrastructure for measuring the struggle exists. The question is whether the institutions that credential learning are willing to build it before the artifact becomes so decoupled from the process that the credential stops meaning anything at all.
That window is not closed. But it is not wide open either.
The struggle is the point. It is time to measure it.
Tags: AI academic integrity assessment friction traces genuine learning, generative AI education artifact decoupling, GLP framework formative assessment process evidence, student professor AI use structural incentives, irreducibly human cognitive engagement pedagogy