There is a moment in every AI-assisted coding session that tells you everything about the developer sitting at the keyboard. The model generates a block of code — clean, confident, internally consistent. It compiles. The tests pass. The developer commits it and moves on.
What they never ask is the question that would save them three weeks in six months: Is this solving the right problem?
I came to Boondoggling the way most people come to uncomfortable realizations — after the thing that was supposed to work didn’t. The code was technically correct. The architecture was sound. And it was aimed, with beautiful precision, at a problem that had already been reframed by the time implementation began. Claude had done exactly what it was told. Nobody had told it the right thing.
This is not an AI failure. This is a human supervisory failure. And it is the failure that the developers now spending $20 a month on AI subscriptions are making, every day, at scale.
The 20% Problem
Here is what most developers actually do with Claude Code or Cursor: they describe a problem, they delegate the implementation, they verify that the output compiles, and they ship.
That is not 100% of the job. That is 20% of the job dressed up as 100%.
The other 80% — the part that determines whether the fast, confident, technically impeccable output is pointed in the right direction — requires five capacities that no model possesses. Not because current models are limited. Because of what statistical pattern matching structurally is and is not.
Claude solves problems faster than any human, and that gap will not close. But neither will this change: the model cannot verify whether its output is grounded in the specific domain reality at hand. It cannot reframe a poorly formulated problem. It cannot choose and sequence the right tools against the right parts of the work. It cannot interpret what an accurate result means in a specific human context. And it cannot integrate multiple legitimate but conflicting perspectives into a recommendation that someone is accountable for.
These are not bugs to be patched in the next release. They are features of the architecture. The model has been trained on what is common and likely. Your specific project, your specific codebase, your specific business constraint — these are neither common nor likely. The gap between what the model knows and what your situation requires is where all the damage lives.
The Conductor
The Boondoggling methodology is built around a single metaphor that earns its place rather than announcing itself. A conductor does not play any instrument. They hold the whole performance in mind while each section plays its part. They hear the wrong note before the score confirms it. They decide which piece is worth performing and how it should be interpreted. The performance collapses without them — even though they produce no sound themselves.
This is what graduate-level AI supervision looks like. And it is the role that most AI integration workflows currently fail to develop.
The developers who are getting genuine leverage from AI coding tools are not out-prompting the model. They are conducting it. Before Claude Code sees a single requirement, they have decided what the problem actually is. Before the first function is generated, they have specified what done looks like. After the output arrives, they verify it against domain reality before the next step begins.
The ones who are mostly generating technical debt faster than before — they learned to play their instrument. Nobody taught them to conduct.
Five Things the Model Cannot Do for You
The Irreducibly Human course at Northeastern — built on the same framework as Boondoggling — names these five supervisory capacities precisely. Not as professional development recommendations. As structural requirements for AI-assisted work.
Plausibility auditing is the judgment that happens before verification. It is knowing an output is wrong because of what you know about the domain — not because you ran a test. The model cannot audit its own plausibility. It does not know what it does not know. When it confabulates — when it produces a confident, internally consistent answer that is not grounded in reality — it does so fluently. The code runs. The tests pass. Plausibility auditing is the human capacity that catches this before it ships.
Problem formulation is deciding what the mission is before the model sees it. Not after. The quality of every output is determined here, at the moment of framing, before a single prompt is written. AI optimizes for the common and likely; humans must reframe toward the salient and important. The Semmelweis case — the formulation that saved lives was not the computationally tractable one — is the permanent lesson here. Hand problem definition to the model and you have not delegated. You have abdicated.
Tool orchestration is the sequencing decision. Which tool, in what order, with what context, and what does done look like at each handoff. The developer who reaches for Claude Code because it is already open is not orchestrating — they are defaulting. Orchestration means choosing the audit tool with a different failure mode than the generation tool, so they catch each other’s blind spots.
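One way to picture the orchestration decision is as an explicit pipeline in which the audit step is chosen independently of the generation step. This is a minimal sketch under assumed interfaces — `orchestrate`, `generate`, and the audit checks are stand-ins for whatever tools you actually sequence, not part of any published API:

```python
def orchestrate(spec, generate, audits):
    """Generate, then audit with independently chosen checks.

    `audits` is a list of (name, check) pairs whose failure modes should
    differ from the generator's, so they catch its blind spots.
    """
    artifact = generate(spec)
    failures = [name for name, check in audits if not check(artifact)]
    if failures:
        # Handoff refused: re-engage formulation or regenerate with new scope.
        return ("revise", failures)
    return ("handoff", artifact)

# Toy usage: the "generator" uppercases a spec; the audits check
# properties the generator itself never reasons about.
result = orchestrate(
    "spec",
    lambda s: s.upper(),
    [("nonempty", lambda a: bool(a)), ("uppercase", lambda a: a.isupper())],
)
```

The point of the shape is the refusal branch: an orchestrator that cannot say "revise" is just a default.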
Interpretive judgment is supplying meaning that the model cannot supply. Which of these three implementations is correct for this context — not in the abstract, but here, in this organization, for this user, at this moment. The model can tell you what each implementation does. It cannot tell you what it means. Somebody has to sign their name to that answer. The model cannot do that either.
Executive integration is not sequencing the four prior capacities. It is holding all four simultaneously toward a unified goal — recognizing when a plausibility audit finding requires problem formulation to re-engage, when an orchestration decision surfaces an interpretive judgment that wasn’t on the agenda. This is what the conductor does in the final movement of a difficult performance: not running a checklist, but maintaining a unified hold on where the whole thing is going.
Better models will not close these gaps. They will widen the stakes of them.
What the Build Actually Looks Like
A moderately complex website — six routes, hybrid architecture, admin dashboard, community upload pipeline, sandboxed iframe viewer, full prompt library — built using the Boondoggling method took roughly three hours. Two hours of conversation with Gru, the custom orchestration prompt. One hour with Claude Code.
Nearly all the time was spent talking. Not coding. Not debugging. Not searching documentation. Talking — precisely, in the right order, about what the site was, who it was for, what it would and would not do, and what each piece needed to be true before the next piece began.
The result was a Boondoggle Score: a conductor’s score with two simultaneous parts. The Minion Part — exact prompts for Claude, in dependency order, each with context required, expected output, and a handoff condition. The Gru Part — precise human actions, labeled by supervisory capacity, in the same dependency order.
Nine Claude tasks. Eleven human tasks. More human decisions than machine outputs. But the Claude tasks ran fast and clean because the structure was already there. Every prompt worked — not because the prompts were magic, but because the conversation that produced them was structured.
The handoff condition is the most important element in the score. It is the conductor’s downbeat. A model that does not know when to stop will stop at the wrong place or not stop at all.
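The two-part structure can be sketched as data. This is a hypothetical rendering — the method specifies the score as prose, and every field name here is illustrative, not canonical:

```python
from dataclasses import dataclass, field

@dataclass
class MinionTask:
    """One entry in the Minion Part (illustrative fields)."""
    prompt: str               # exact prompt handed to Claude Code
    context: list[str]        # what the model must see before starting
    expected_output: str      # what "done" looks like for this task
    handoff_condition: str    # human-verifiable check before the next task begins
    depends_on: list[int] = field(default_factory=list)

@dataclass
class GruTask:
    """One entry in the Gru Part (illustrative fields)."""
    action: str               # precise human action
    capacity: str             # supervisory capacity it exercises
    depends_on: list[int] = field(default_factory=list)

# Dependency order is the point: a Minion task never starts until every
# upstream handoff condition has been verified by the human conductor.
score_minion = [
    MinionTask(
        prompt="Scaffold the six routes with empty handlers.",
        context=["site map", "routing decision from formulation"],
        expected_output="six routes, each returning a placeholder",
        handoff_condition="all six routes resolve; nothing else exists yet",
    ),
]
score_gru = [
    GruTask(action="Confirm the route list matches the problem statement",
            capacity="plausibility auditing"),
]
```

Note that the handoff condition is a property a human can verify, not a property the model asserts about itself.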
The Vocabulary of What Is Actually Happening
The Boondoggling framework gives names to the different kinds of work in an AI-assisted build. The names are worth knowing because naming a thing is the first step to doing it deliberately.
Frick-fracking is the iterative work — small precise edits, one thing changed at a time, the kind of work Claude Code does exceptionally well when given clear scope. This is where the actual build lives after the structure is established. It is productive and it does not require your full attention. It is not, however, the whole job.
Noodling is the dreaming phase. Figuring out what to build before figuring out how. This happens before the model sees anything. It is the lightest touch — a thought that something could be interesting, a question about whether this feature serves the person the thing is built for. The discipline is knowing which noodle is worth developing. The problem statement is the filter.
Confabulating is the danger word. When the model produces plausible output that is not grounded in reality. It sounds correct. It reads correctly. The code compiles. Only domain knowledge catches it. This is precisely the failure mode that plausibility auditing exists to address — and precisely the failure mode that developers who have learned to prompt but not to supervise will miss every time.
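A toy illustration of the gap — the rate, the test, and the carrier rule are all invented for this example, not taken from any real system:

```python
# Generated code that is internally consistent and passes its unit test.
def shipping_cost(weight_kg: float) -> float:
    return 4.99 + 1.25 * weight_kg        # plausible-looking linear rate

assert round(shipping_cost(2.0), 2) == 7.49   # the test the model satisfies

# The plausibility audit encodes domain knowledge the model never had:
# (hypothetical) the carrier caps domestic parcel cost at $20.00.
def plausibility_audit(fn) -> bool:
    return all(fn(w) <= 20.00 for w in (1, 5, 10, 30))

# A 30 kg parcel costs 42.49 under the generated formula:
# fluent, compiling, test-passing, and wrong.
print(plausibility_audit(shipping_cost))  # -> False
```

The unit test encodes what the model was told; the audit encodes what nobody thought to tell it.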
What You Are Actually Responsible For
The developers most effectively using AI coding tools are not the ones generating the most code. They are the ones who have understood that their job changed — and changed in a specific direction.
The job is not to type less. The job is to decide more precisely.
You are responsible for what the problem actually is. You are responsible for what done actually looks like. You are responsible for whether the fast, confident, technically impeccable output is pointed at reality or pointed at a plausible simulation of it. The model takes no responsibility for any of this. It cannot.
The minions are excellent. They are enthusiastic. They will execute exactly what they understood you to mean.
That gap — between what you meant and what they understood — is where all the damage lives.
Anyone can use Claude Code. The question is whether you are playing an instrument or conducting the orchestra.
Tags: boondoggling AI methodology, Claude Code supervision framework, AI-assisted software development, solve-verify asymmetry, plausibility auditing human-AI collaboration