The Assessment Was Already Broken
On Jessica Winter's "What Will It Take to Get A.I. Out of Schools?" and what the panic about AI reveals about everything that came before it
There is a moment in Jessica Winter’s New Yorker piece that contains the entire argument she doesn’t make. Her sixth-grade daughter runs a fifth-grade slide show through Gemini’s beautifying tools. In thirty seconds, the typography improves, the pictures reshuffle symmetrically, the design evokes fifteenth-century movable type against a background of aged vellum. Winter describes it as the pool race from Mommie Dearest: the larger, faster thing that will always beat you.
Her daughter is unmoved. “I like mine better, because it’s original and I worked really hard on it.”
Hold that sentence. It is the right answer. It is also the answer that does not appear on any rubric in any public school in Massachusetts or New York or Los Angeles. The rubric rewards the prettier slide. The rubric was always going to reward the prettier slide. Winter wants her daughter to hold values that the institution has never rewarded, and she writes a five-thousand-word piece about artificial intelligence without once asking why the institution doesn’t reward them.
This is the intellectual hole at the center of a piece that is otherwise sharp, well-reported, and morally earnest. AI didn’t break the assessment system. It exposed that the assessment system was already broken, and everyone was pretending otherwise.
What the Slide Show Already Was
The printing-press slide show existed before Gemini. It was made in fifth grade to demonstrate learning. Whether it demonstrated learning was always a question nobody asked, because asking it would require admitting that the artifact — the thing handed in, the thing graded — was never reliable evidence of the process. The slide show could have been made with a parent’s help, with a template, with a slightly older sibling, with a capable friend who understood visual design. These interventions existed before large language models. They produced polished artifacts that the teacher accepted as evidence of understanding.
The educational research on this predates AI by decades. Robert Bjork’s distinction between performance and learning — the observable output versus the durable cognitive change — is from 1992. The problem of using artifacts as proxies for thinking is at least as old as Vygotsky. What AI did was not create this problem. It made the problem so visible, so fast, so cheap, that willful ignorance became impossible.
Winter quotes USC professor Mary Helen Immordino-Yang: “We are cutting off learning at the knees.” She quotes University of Toronto psychologist Amy Finn on the magic of how children retain unexpected, non-strategic details that adults would find irrelevant, a kind of creative unpredictability fundamentally misaligned with LLMs’ orientation toward speed and sleekness. These are real insights. They are also insights that apply equally to the printing-press slide show assigned as homework, graded for visual appeal and accuracy, returned in two days, and forgotten. The neuropsychological substrate for creating narratives and thinking through arguments over time is not developed by making a slide show under time pressure at home with no adult monitoring the process.
The question is not whether AI belongs in schools. The question — the one the piece never asks — is whether the assessment was measuring what it was supposed to measure before AI arrived. The answer is: sometimes, unevenly, and less than we told ourselves.
The Tool Hierarchy Problem
Winter’s implicit argument, followed consistently, condemns more than Gemini. Calculators offload arithmetic before numeracy is built. Spell-check offloads orthography. Grammarly offloads syntax judgment. Google Search offloads memory and source evaluation. Slide templates offload visual design judgment. Word processors themselves offload handwriting, which Winter approvingly notes has developmental benefits — meaning she already believes at least one tool was introduced too early.
She draws the line at the tool that frightens her right now. This is a very human response and a terrible policy foundation.
The honest version of her argument looks like a developmental sequence: here are the cognitive substrates that must be built before each category of tool is introduced, and here is the evidence for that ordering. Immordino-Yang and Finn gesture at this — the “cognitive muscles” framing, the worry that offloading arrives before the onloading is complete — but nobody builds it out into something a school board could actually implement. Without that framework, the anti-AI position reduces to: tools I grew up with are fine, tools that postdate my childhood are suspect.
Amanda Bickerstaff, CEO of AI for Education, comes closest to the principled version: children should not be using chatbots under age ten, she says, because these tools require expertise and evaluation skills that even many adults don’t have. That’s a threshold with a rationale. It’s also the only threshold in the piece with a rationale. Everything else is rhetoric standing in for policy.
The Research That Isn’t Quite Research
The piece anchors much of its scientific authority in three studies. The 2025 MIT warning that LLMs “may inadvertently contribute to cognitive atrophy” — whose authors felt it necessary to append an FAQ asking journalists not to use phrases like “brain rot” or “brain damage,” which tells you something about how the finding was being reported before Winter’s piece and how it will be reported after. The multi-institution study (MIT, CMU, UCLA, Oxford) on fraction-solving, which found that students who lost AI access after using it performed significantly worse — not yet peer-reviewed, not yet published, though the findings are concerning and the concern is real. And the Brookings “premortem,” which pairs 400 studies with hundreds of interviews to conclude that AI tools “undermine children’s foundational development.”
These are worth taking seriously. They are also worth examining carefully.
The fraction-solving study is the most empirically specific, and it cuts against Winter’s piece more than it supports it. The students who used LLMs on fraction-solving and then lost access performed significantly worse and were more likely to give up. The proposed mechanism: AI gives answers, students become dependent on the answer-giving, and remove the answers, the capacity to generate them independently has atrophied.
But this is an argument about a specific implementation — an answer machine — not about the technology class. An LLM configured as a Socratic interlocutor, one that refuses to answer directly and instead returns questions that scaffold toward understanding, that detects when a student is stuck versus when they’re avoiding, that withholds confirmation until the student demonstrates the reasoning — that tool would presumably produce the opposite result. Students would have developed the reasoning process rather than outsourcing it, because outsourcing was never made available to them.
This is not an exotic capability. It is prompt engineering plus scaffolding logic. The reason it isn’t what’s being deployed in K-12 classrooms is that Google ships Gemini with a “Help me write” button because that’s the path of least resistance and maximum engagement. That is a product decision, not a technological inevitability. Winter never distinguishes between AI as answer machine and AI as thinking partner. The cognitive offloading critique collapses the moment you make that distinction, because the problem isn’t the tool — it’s the incentive structure of the company deploying it.
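To make “prompt engineering plus scaffolding logic” concrete, here is a minimal sketch of the thinking-partner configuration. It is an illustration, not any vendor’s product: the prompt text, the shows_reasoning heuristic, and the llm callable are all assumptions standing in for whatever a real deployment would use.

```python
# A minimal sketch of "prompt engineering plus scaffolding logic":
# an LLM configured as a Socratic interlocutor rather than an answer
# machine. Names here are illustrative; `llm` stands in for any
# chat-completion client a vendor might expose.

import re

SOCRATIC_SYSTEM_PROMPT = (
    "You are a math tutor. Never state the answer or perform the "
    "computation for the student. Reply with a single question that "
    "moves the student one step closer to solving it themselves. "
    "Confirm correctness only after the student has shown their own "
    "reasoning."
)

def shows_reasoning(message: str) -> bool:
    """Crude proxy for 'the student demonstrated the reasoning':
    does the message contain worked steps rather than a bare request?"""
    return bool(re.search(r"=|because|so |first|then|step", message.lower()))

def tutor_turn(llm, history: list[dict], student_message: str) -> str:
    """One turn of the scaffolding loop. `llm` is any callable taking
    (system_prompt, history) and returning a string."""
    history.append({"role": "user", "content": student_message})
    if not shows_reasoning(student_message):
        # The gate: no confirmation, no answer, until reasoning appears.
        nudge = "What have you tried so far? Walk me through your first step."
        history.append({"role": "assistant", "content": nudge})
        return nudge
    reply = llm(SOCRATIC_SYSTEM_PROMPT, history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

A production version would presumably swap the keyword heuristic for a second model call that classifies the student’s turn, but the architecture is the point: the same underlying model, behind a different gate, plausibly produces the opposite dependency profile from the answer machine in the fraction study.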
The social-emotional hijacking argument from UNC psychologist Mitch Prinstein is the weakest scientific claim in the piece, and it’s presented with the same credentialed authority as the others. Surging oxytocin and dopamine receptors around ages ten to eleven do drive peer-bonding — that’s established developmental neuroscience. Sycophantic LLMs “hijack the biological tendency to want peer feedback” — that’s a hypothesis, not a finding. The claim requires that chatbot interaction activates the same neurological pathways as peer interaction, that substituting chatbot interaction for peer interaction produces measurable deficits in social skill development, and that the effect is “hijacking” — a strong, directional, causal claim — rather than displacement or preference shift. No study is cited because none exists at the necessary scale with the necessary longitudinal follow-up.
This is speculation dressed in neuroscience’s authority. Which is particularly ironic, given that Winter is writing a piece about tools that generate confident-sounding output without rigorous foundations.
The Grade Your Daughter Is Going to Receive
Return to the slide show.
Winter’s daughter likes hers better because it’s original and she worked really hard on it. This is the right value. This is the value Winter wants the school to transmit. The school is not transmitting it, because the school is not grading for it.
If the rubric rewards polish, visual appeal, and impressive output — which most rubrics do, implicitly, because these are the things teachers can assess quickly across thirty slide shows at 11pm — then the student who uses Gemini gets the A. Not abstractly. On the transcript. The student who refuses Gemini, who holds Winter’s daughter’s values, receives the C. Neither of them learns the lesson Winter intends.
The deeper problem: homework was already a weak pedagogical instrument before AI. Most research on homework in K-8 is lukewarm. Homework was largely accountability theater — nominal proof that learning happened, easy to assign, easy to grade, and poor evidence of the process it was supposed to represent. AI exposed the theater. The theater had been playing for years before AI bought a ticket.
What would it look like to actually assess the process? That question is harder than “what do we do about Gemini,” and it requires admitting that the current system was already failing to measure what it claimed to measure. Winter doesn’t want to ask that question, because asking it would mean the problem is older and deeper than the creepy neighbor who moved in recently.
What Actually Needs to Change
The resistance movements Winter profiles — District 14 Families for Human Learning, the Coalition for an AI Moratorium, Schools Beyond Screens — are better at stopping things than proposing them. The Student Tech Bill of Rights includes the right to read whole books, write on paper, and learn in a low-stimulation environment free from undue corporate influence. These are reasonable demands. They don’t add up to a pedagogy.
The conflict-of-interest thread is the piece’s most structurally damning detail and the most underplayed. The NYC DOE official overseeing the preliminary AI guidelines holds a fellowship jointly offered by Google and GSV Ventures — whose portfolio includes Amira and MagicSchool, two of the primary AI tools being deployed in the classrooms those guidelines govern. Other Google-GSV fellowship recipients include top school officials in Berkeley, Dallas, Los Angeles, Newark, Colorado, and Maryland. “If you ask tobacco companies to help write your school’s policy on cigarettes,” one parent says, “you’re going to end up with guidance on how to smoke responsibly in school.”
This is the argument Winter should have built the piece around. Not “AI is cognitively harmful” — which is partly true, partly speculation, and entirely dependent on implementation — but “the people writing the rules are being paid by the companies they’re supposed to regulate.” That is verifiable, structural, and not dependent on a not-yet-peer-reviewed study about fractions.
The piece ends with Sinha’s question — “What do you want from this?” — and Winter’s answer: nothing. It’s a parent’s answer. A good parent’s answer. But it is not a policy answer, and it is not an answer that acknowledges what was already not working before the neighbor moved in.
The assessment was already broken. The rubric was already rewarding the wrong things. The slide show was already a poor proxy for thinking. AI made all of this impossible to ignore. That is a service, not a crime — even if the service was rendered by someone with cloven hooves in Yeezy Boosts and a market cap of four trillion dollars.
What we owe children is not the tools of the past but a clear account of what learning actually is, what evidence of it looks like, and how to build assessments that can tell the difference. That conversation is harder than banning Gemini. It is also the only conversation that addresses what Gemini exposed.
Nik Bear Brown is Associate Teaching Professor of Computer Science and AI at Northeastern University and founder of Humanitarians AI. His work on AI in education, including the Genuine Learning Protocol framework, is published at bearbrown.co.
Tags: AI education New Yorker critique, cognitive offloading assessment design, Bjork learning performance distinction, AI schools policy Jessica Winter, GLP genuine learning protocol


