This explains that comparing AI tutoring systems to human tutors isn’t fully fair because they measure different aspects of learning. AI systems mainly focus on procedural skills, while human tutors also handle understanding, emotions, and interaction. It raises an important point that instead of improving AI alone, we also need better ways to measure what real learning actually looks like.
This piece brilliantly exposes a decades-long apples-to-oranges comparison that's shaped educational technology research. The core insight, that Cognitive Tutor's 0.20 sigma effect isn't a failure but rather an honest measurement of what it was actually designed to do, reframes an entire field's narrative of disappointment. What's particularly striking is how clearly it documents that human tutoring isn't just procedural scaffolding plus interaction; it's a fundamentally different apparatus involving expectation-mapping, affective management, and flexible response to actual student thinking. The three diagnostic questions at the end are genuinely useful for cutting through AI-tutor hype. Worth reading carefully before accepting any claims.
Really appreciated the breakdown of the construct mismatch; the surgeon vs. GP analogy clicked for me. The point about the measurement apparatus being inherited even as the interaction layer changes feels like the most under-discussed issue in current AI-tutor hype.