
Stop Hunting for Answers. Ask Your Course

What a gambling algorithm reveals about the real problem with educational technology

Learn more → https://medhavy.ai

Read more on the Medhavy blog: https://medhavy.ai/blog

There is a moment most students know. You are twelve minutes into a lecture, or forty pages into a chapter, and the explanation has stopped making contact. The words are still arriving — the instructor is still talking, the textbook still has sentences — but something has decoupled. You are receiving information. You are not learning anything.

What happens next depends on who you are. Some students stop and ask a question. Some open a second tab. Some take more aggressive notes, as if the problem is that they haven’t written fast enough. Most do what people do when a machine stops working: they wait, and hope it starts again.

The system’s response to this moment is almost always the same. It continues. The lecture does not pause to recalibrate. The textbook does not offer a different approach. The platform logs that you have completed the module. You have not completed the module. You have sat in the room while the module happened.

This is not a technology problem. It is a philosophy problem. And the technology we have built to fix it has mostly encoded the same philosophy in a more expensive box.


The Illusion of Adaptation

For the past decade, the word adaptive has done significant damage to educational technology.

Adaptive, in the way most platforms use it, means personalized in the sense that a streaming service is personalized — the algorithm has observed your behavior and is now showing you more of what you already clicked on. Netflix knows you watch crime dramas. It does not know whether you understood them. It does not know whether watching more crime dramas is good for you. It knows you did not turn it off.

Apply this logic to learning and you get what we have: platforms that track completion, adjust pacing, and serve more of what a student has already engaged with. A student who moves quickly gets harder content. A student who slows down gets simpler content. This is not adaptation. This is a speed adjustment. The car is still going in the same direction. It is going faster or slower based on whether you look nervous.

The deeper variable — the one that actually determines whether a person learns something — is not pace. It is approach. Whether the concept is explained directly or discovered through questions. Whether it is anchored in a case study or built from first principles. Whether the learner is asked to produce something or receive something. Whether the material is revisited strategically or encountered once and abandoned to memory.

These are pedagogical choices. They have been studied for decades. There are researchers who have spent careers trying to understand which approach works for which person under which conditions. The literature is substantial and inconclusive — because the answer is not fixed. Different people learn differently. The same person learns differently on different days, at different moments in a topic, at different levels of prior knowledge.

The honest conclusion from all of this research is not a recommendation. It is a method. You have to run the experiment.


The Bandit

The multi-armed bandit is a framework borrowed from probability theory, named for the slot machines in a casino — each with a different payout rate, none of them labeled.

The problem the framework solves is this: you have several options, you don’t know which one is best, and you have to act while you’re still learning. You cannot spend all your time testing (you’ll never exploit what you’ve learned) and you cannot commit to the first option that works (you might be missing something better). The bandit framework manages this tradeoff — choosing the option that currently looks best while continuously allocating some probability to exploring the alternatives.
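The explore/exploit tradeoff can be sketched in a few lines. Below is an illustrative epsilon-greedy bandit, one of the simplest bandit strategies (not necessarily the one Medhavy uses), pulling three unlabeled "slot machines" with hidden payout rates:

```python
import random

def epsilon_greedy_bandit(true_rates, steps=10_000, epsilon=0.1, seed=0):
    """Explore with probability epsilon; otherwise exploit the arm with
    the best observed average payout so far."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms      # times each arm has been tried
    payout = [0.0] * n_arms   # total reward observed per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random arm
        else:                            # exploit the current best estimate
            arm = max(range(n_arms),
                      key=lambda a: payout[a] / pulls[a] if pulls[a] else 0.0)
        if rng.random() < true_rates[arm]:  # hidden payout rate pays off
            payout[arm] += 1.0
        pulls[arm] += 1
    return pulls

# Three machines with hidden payout rates. The bandit never sees these
# numbers, only the rewards -- yet its pulls concentrate on the best machine.
pulls = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Most pulls end up on the third machine, even though some probability is always reserved for checking the alternatives.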

Medhavy applies this framework not to slot machines but to pedagogical approaches. Five of them: direct instruction, Socratic questioning, case-based learning, spaced retrieval practice, and project-based generative learning. Each is a coherent educational philosophy with its own decades-long research tradition. Direct instruction works for foundational concepts, clear definitions, sequences that need to be right before anything else can proceed. Socratic questioning works for learners who have surface-level confidence and need to be pushed past the answer they’re pattern-matching toward. Case-based learning works for professionals whose knowledge only means something when it contacts a real decision. Spaced retrieval works for cumulative content where earlier concepts must survive long enough to support later ones. Project-based learning works when demonstrated output is the actual goal.

Each of these approaches requires different content, a different AI persona, a different conversational posture. The platform has to be built differently depending on which one is active. This is not a toggle. It is architecture.

What the bandit does is decide, for each learner at each moment, which approach to deploy — then observe what happens — then update its model. If a learner is producing grounded, engaged responses under the Socratic approach and then the pattern breaks, the bandit notices. It tries something else. When the evidence comes in, the model updates. Not for the cohort. For this learner, in this moment, in this chapter.
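Mechanically, "observe, then update, for this learner" might look something like the following Thompson-sampling sketch over the five approaches. The approach names come from the article; the engagement signal and the simulated learner are invented for illustration, and the production bandit is surely richer than this:

```python
import random

APPROACHES = ["direct", "socratic", "case_based",
              "spaced_retrieval", "project_based"]

class LearnerBandit:
    """Thompson sampling over pedagogical approaches for ONE learner.
    Each approach keeps a Beta(wins+1, losses+1) posterior; a 'win' is a
    hypothetical engagement signal (e.g. a substantive, grounded reply)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.wins = {a: 0 for a in APPROACHES}
        self.losses = {a: 0 for a in APPROACHES}

    def choose(self):
        # Sample a plausible success rate from each posterior and deploy
        # the approach whose sample is highest. Exploration falls out of
        # the posterior uncertainty; no separate exploration rate needed.
        samples = {a: self.rng.betavariate(self.wins[a] + 1, self.losses[a] + 1)
                   for a in APPROACHES}
        return max(samples, key=samples.get)

    def update(self, approach, engaged):
        # Evidence comes in; the model updates -- for this learner only.
        if engaged:
            self.wins[approach] += 1
        else:
            self.losses[approach] += 1

# Simulate a learner who (unknown to the bandit) responds best to Socratic
# questioning, and watch the deployments concentrate there.
bandit = LearnerBandit(seed=42)
for _ in range(200):
    arm = bandit.choose()
    engaged = bandit.rng.random() < (0.8 if arm == "socratic" else 0.3)
    bandit.update(arm, engaged)
```

The key design property: if the Socratic pattern breaks, its posterior drifts down and the sampling naturally shifts probability back to the alternatives, without anyone flipping a switch.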

Most adaptive platforms are adaptive at the level of the cohort, or at the level of the module, or at best at the level of the pacing track. Medhavy’s bandit is adaptive at the level of the pedagogical philosophy itself — the deepest variable, the one that actually determines contact.


What Running the Experiment Actually Means

Here is what it means in practice, because the abstraction is easy to nod at without grasping.

A business school executive logs into a white-labeled deployment of the platform — the institution’s logo, their colors, a persona configured to sound like a senior corporate strategy advisor. She is working through a module on AI literacy. The bandit has no prior data on her. It defaults to direct instruction — explicit definitions, worked examples, clear sequencing.

She moves through it quickly. Her dwell time on the explanatory sections is short. She is not pausing to absorb. She already knows this. The bandit observes this pattern and shifts: the persona begins responding with questions rather than answers. When she states that AI can reduce operational costs, the advisor asks: in which cost category, specifically? What assumption about labor productivity is that estimate resting on? She slows down. She starts typing longer responses.

This is contact. The bandit records it.

Three modules later, she is in unfamiliar territory. The Socratic approach that worked before has stopped working — she is guessing rather than reasoning, which looks the same from the outside but registers differently in the interaction pattern. The bandit shifts again, this time to case-based learning. The persona anchors the next concept in a documented business case. She can see what happened, evaluate what went wrong, apply the framework to the scenario. The abstraction becomes legible through the example.

None of this requires a human to observe her, diagnose her, and intervene. It runs continuously, invisibly, updating with every interaction. At the end of the cohort, the institution sees which pedagogical approaches drove the most durable engagement, where the content has gaps (the grounded / not in textbook ratio), and which modules generated the most friction. The credential the institution issues has actual learning evidence behind it.
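The grounded / not-in-textbook ratio mentioned above can be pictured as a simple proportion. This sketch assumes a hypothetical log of tutor replies tagged by whether they were grounded; the tag strings are invented here, not an actual schema:

```python
def grounded_ratio(replies):
    """Share of tutor replies grounded in the textbook versus refusals.
    A low ratio for a module flags a likely gap in the course content."""
    if not replies:
        return 0.0
    refusals = sum(1 for r in replies if r == "not_in_textbook")
    return (len(replies) - refusals) / len(replies)

# A hypothetical tagged log for one module: three grounded replies,
# one refusal.
log = ["grounded", "grounded", "not_in_textbook", "grounded"]
ratio = grounded_ratio(log)
```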

This is what it means to run the experiment. Not to have a theory about which approach is best. To find out.


The Constraint That Makes It Honest

There is one more piece of the architecture that matters, and it is the most counterintuitive.

The AI tutor that runs inside Medhavy is not allowed to use the internet. It is not allowed to draw on general knowledge. It is not allowed to speculate. When a student asks a question, the tutor searches the course content — the verified, expert-reviewed textbook built for this specific deployment — and grounds its response in what is actually there. If the answer is not in the textbook, the tutor says so. Not in the textbook. That is the response.
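A minimal sketch of what such a grounding constraint could look like, using naive keyword-overlap retrieval as a stand-in for whatever search a real deployment uses; the function name and threshold are invented for illustration:

```python
import re

def answer_from_textbook(question, textbook_passages, min_overlap=2):
    """Answer only from course content; refuse rather than speculate.
    Retrieval here is naive keyword overlap -- a stand-in for real search."""
    def tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    q = tokens(question)
    best_passage, best_score = None, 0
    for passage in textbook_passages:
        score = len(q & tokens(passage))
        if score > best_score:
            best_passage, best_score = passage, score
    if best_score < min_overlap:
        return "Not in the textbook."  # honest refusal, not a wrong answer
    return best_passage               # the reply stays anchored to the source

passages = [
    "Spaced retrieval practice schedules reviews at increasing intervals.",
    "Direct instruction presents explicit definitions and worked examples.",
]
in_scope = answer_from_textbook("What is spaced retrieval practice?", passages)
out_of_scope = answer_from_textbook("Who won the 2022 World Cup?", passages)
```

The second question scores zero overlap with every passage, so the only possible response is the refusal. Nothing is generated from outside the material, by construction.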

This sounds like a limitation. It is the point.

The failure mode of every general-purpose AI tutor is that it sounds authoritative whether or not it is correct. It produces fluent, confident, plausible responses. Students who cannot evaluate whether the response is accurate have no way to know when it has invented something. The TEXTBOOK_ONLY constraint eliminates this failure mode by eliminating the thing that causes it. The tutor cannot hallucinate because it cannot leave the source material.

A student who gets "not in the textbook" has not gotten a wrong answer. They have gotten a real signal: this question is beyond the scope of what we’re covering here, and you should know that. That is pedagogically useful. That is honest. The platform would rather say nothing than say something false.

Most EdTech does not make this choice. Most EdTech prioritizes the appearance of competence over the reality of it. Medhavy has decided that the constraint is the credibility.


What This Means for Anyone Paying Attention

The argument for Medhavy is not that it is smarter than other platforms. It is that it is more honest about what learning requires.

Learning requires contact — the moment when an explanation actually reaches someone. That moment is not guaranteed by pacing, or by completions, or by a student sitting in the virtual room while the module happens. It requires the right approach for this person at this moment, applied consistently enough to work, abandoned quickly enough when it stops.

The bandit does not know in advance which approach is right. It cannot. Nobody can. What it does instead is run the experiment continuously, update on evidence, and refuse to commit to a prior that the evidence no longer supports.

That is not a gambling algorithm applied to education. That is what good teaching has always been — the willingness to try something different when what you’re doing stops working, the discipline to notice when it stops working before the student gives up, and the honesty to say, when you don’t know the answer: I don’t know. But I know where to look.

The machine has learned something most platforms haven’t.

The question is whether the institutions that deploy it are willing to learn the same thing: that the evidence matters more than the assumption, and that running the experiment is not a sign of uncertainty.

It is the whole method.


Tags: Medhavy AI adaptive learning, multi-armed bandit pedagogy, EdTech platform architecture, personalized learning systems, AI tutor grounded retrieval

