The Ladder That Isn't There
What Companies Are Building to Replace the Rung AI Eliminated
The argument goes like this: AI automates entry-level coding work, so companies stop hiring junior developers, so there is nobody to become the senior developers of 2030, so the companies that cut the pipeline will find themselves in 2030 with powerful AI tools and no one with the judgment to use them safely. IBM’s chief human resources officer, Nickle LaMoreaux, made exactly this case in February 2026, announced that IBM was tripling its entry-level hiring, and called on HR leaders across the industry to do the same. “The companies three to five years from now that are going to be the most successful,” she said, “are those companies that doubled down on entry-level hiring in this environment.”
It is a coherent argument. It is also, in its publicly available form, incomplete in precisely the ways that matter most.
The Gap Between the PR and the Pipeline
LaMoreaux is right about the pipeline problem. She is far less specific about the solution. What IBM has said publicly is that it “rewrote” entry-level software developer roles — less boilerplate coding, more AI oversight, more customer interaction, more focus on what the company calls “systems judgment.” Junior developers will spend less time on routine code generation and more time auditing AI output, working directly with clients, and doing the cognitive work of translating business requirements into prompts that produce useful results.
This is not nothing. It represents a genuine attempt to think through what the entry-level job becomes when AI can generate syntactically correct code faster than a human junior can type it. But there is a question embedded in the new job description that IBM has not publicly answered, and it is the only question that matters: does “AI oversight” actually develop the judgment needed to become a senior engineer?
The historical pathway was not glamorous. A junior developer spent two, three, four years writing boilerplate. Authentication flows, database migration scripts, unit tests, CRUD endpoints. Nobody loved the work. The work was, in terms of its immediate output, largely automatable. But the work was also, in terms of its developmental function, the curriculum — and the precise mechanism was not the writing. It was the failure. You wrote the authentication flow. It broke in production in ways you did not anticipate. The error message was visible, the gap between your expectation and reality was undeniable, and you had no choice but to struggle with it. You debugged it, which meant reading documentation you hadn’t read, asking a senior why your mental model was wrong, building a new mental model to replace it. You did this thousands of times. At the end of the process you were a senior engineer — not because you had written a lot of boilerplate, but because engaging repeatedly with its failures had built something durable in your brain.
This distinction matters, because it reframes the problem precisely. AI does not just remove the writing. It removes the visible failure. Code compiles. Tests pass. The race condition hides inside a sleep call. The memory leak is invisible to the test suite. The architectural drift from intent looks like a working feature until it fails at scale in production. The failure is still there — AI-generated code fails in ways human-generated code fails, and in new ways besides. But the failure is no longer surfacing where the junior developer can see it, at a latency and legibility that would allow them to learn from it. That is the actual developmental gap.
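The sleep-hidden race is concrete enough to sketch in miniature. The Python below is hypothetical, not drawn from any of the cited work: a test races a worker thread, a sleep makes it pass on one machine without fixing anything, and an explicit join actually synchronizes.

```python
import threading
import time

results = {}

def worker(key):
    # Simulate work that finishes at an unpredictable time.
    time.sleep(0.05)
    results[key] = "done"

# The flaky original races the worker: checking results["job"]
# immediately after start() raises KeyError most of the time.
t = threading.Thread(target=worker, args=("job",))
t.start()

# The "fix" an agent might emit: paper over the race with a sleep.
# The test goes green here, but nothing guarantees 0.2s is always enough.
time.sleep(0.2)
assert results["job"] == "done"

# The correct fix: an explicit synchronization point.
results.clear()
t2 = threading.Thread(target=worker, args=("job",))
t2.start()
t2.join()  # guarantees completion before the check
assert results["job"] == "done"
```

To the junior, both versions produce a passing test. Only the second one removed the bug.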
The Comprehension Debt Problem
Anthropic published research in January 2026 that should be uncomfortable for every company now designing “AI-native” entry-level roles. Junior developers who delegated code generation to AI tools scored between 24% and 39% on subsequent comprehension assessments. Those who used AI as a collaborator — asking questions, challenging outputs, forcing themselves to understand what the AI produced — scored between 65% and 86%. The difference is not AI versus no AI. The difference is how you use the tool.
The researchers called the gap “comprehension debt” — a cumulative deficit between what the codebase does and what the people managing it understand. It is a subtle disaster. The code works. The tests pass. The junior developer ships the feature. The comprehension debt doesn’t reveal itself until the system breaks in a way that requires architectural judgment to diagnose — which is precisely the moment when you need the senior engineer who was supposed to emerge from the junior developer who was supposed to be learning while working.
There is neurophysiological evidence for the mechanism. A 2025 MIT study by Kosmyna et al. tracked EEG connectivity in participants writing under three conditions: LLM-assisted, search-engine-assisted, and unaided. Across alpha, theta, and delta bands — associated with internal semantic processing, working memory, and self-directed ideation — connectivity scaled inversely with external support. LLM users showed the weakest brain network engagement. More consequentially: when LLM-habituated participants were later asked to work without the tool, their neural connectivity did not reset to novice levels, but it did not reach the levels achieved by practiced unassisted writers either. Alpha and beta engagement — associated with top-down planning and self-driven organization — remained measurably suppressed. The authors call this accumulation “cognitive debt.” The study involves essay writing rather than software development, and the sample of 54 students is too small to carry causal weight. But the finding is structurally consistent with the broader claim: if the generative cognitive work is externalized during the period when mental models are supposed to form, those models form incompletely — and the deficit persists when the tool is removed.
Microsoft’s Azure CTO Mark Russinovich and VP Scott Hanselman put the problem with blunt clarity in a February 2026 paper in Communications of the ACM. Senior engineers experience an “AI boost” — the tools multiply their throughput, and they have the judgment to steer and verify the output. Junior engineers experience what Russinovich and Hanselman call “AI drag” — the tools produce output that looks correct, which the junior developer lacks the judgment to evaluate, and the work is done without the learning happening. The rational economic response for any CFO is to hire seniors and automate juniors. The structural consequence is: no pipeline.
What makes their diagnosis particularly useful is that they catalogue the specific failure modes AI tools exhibit that juniors cannot catch without guidance: agents masking race conditions with sleep calls, agents claiming success on buggy code, agents implementing algorithms that pass tests but don’t generalize. These are Layer 1 failures — implementation-level breakdowns in code that appears to work. A junior developer encountering these outputs sees success where a senior sees warning signs. The failure signal exists. It is not visible to the person who needs to learn from it.
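The "passes tests but doesn't generalize" mode can also be shown in miniature. The sketch below is contrived and hypothetical, not an example from the paper: a primality check that satisfies every visible test case while being wrong in general.

```python
def is_prime_overfit(n):
    """An overfit implementation: trial-divide by just enough small
    primes to satisfy the visible tests, not enough to be correct."""
    if n < 2:
        return False
    for d in (2, 3, 5, 7):
        if n % d == 0:
            return n == d
    return True

# The test suite the junior reviews: all green.
for n, expected in [(2, True), (9, False), (17, True), (25, False), (97, True)]:
    assert is_prime_overfit(n) == expected

# The latent failure nothing in the suite surfaces: 121 = 11 * 11,
# and no divisor in (2, 3, 5, 7) catches it.
assert is_prime_overfit(121) is True  # wrong answer, quietly returned
```

A senior reads the hardcoded divisor tuple as a warning sign. A junior reads the green test suite as success. That gap is the failure signal going unseen.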
The IBM Critique, Sharpened
IBM’s rewritten roles can be mapped onto the three types of failure signal that produce engineering judgment. There is implementation-level failure — the race condition, the architectural drift, the code that claims success when bugs remain. There is systems-level failure — the customer complaint that maps through the stack to a root cause nobody documented. And there is specification-level failure — the moment someone has to stake their name on whether the requirements themselves were right.
The old boilerplate model exposed juniors to implementation-level failure almost exclusively, and accidentally. The new IBM model — AI oversight, customer interaction, requirements translation — is, in theory, exposure to all three. That is not a step backward. It might be a step forward.
But the theory collapses without the preceptorship. Implementation-level failures in AI output are invisible to someone who lacks enough technical intuition to recognize them. You cannot learn to catch the subtle wrong if no one makes the subtle wrong visible. IBM has rewritten the job description to include “AI oversight” without building the structural condition under which AI oversight actually teaches anything. Without a preceptor paired with the junior, making the failure legible — pointing at the sleep call masking the race condition and explaining why that is wrong, not just that it failed — the oversight role is compliance work, not learning. The junior sees that the tests passed. The preceptor sees the problem the tests don’t catch. Without the preceptor, that gap is just a gap.
What Is Actually Being Built
Some organizations are doing more than announcing intentions. The responses are uneven, but they are real.
Microsoft proposed a preceptorship model that is worth examining in detail. The structure is adapted from clinical nursing: senior engineers paired with early-in-career developers at three-to-one or five-to-one ratios, for a minimum of one year, on real product teams rather than training sidecars. AI tools are configured to operate in what Russinovich and Hanselman call “EiC mode” — Socratic coaching before code generation, forcing the junior to articulate what they’re trying to accomplish before receiving a solution. Mentorship hours are measured as “human impact” alongside product metrics in performance reviews, which means the senior engineer’s career is now connected to the junior’s development, not just the senior’s own throughput. The clinical template is deliberate, because nursing faced the same problem decades ago: how do you develop judgment in someone working in a high-stakes environment alongside experienced practitioners who have better things to do than teach?
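What EiC mode looks like mechanically can only be guessed at from the paper's description. The sketch below is entirely hypothetical (the names `eic_gate` and `generate_code` are illustrative, not Microsoft's API); it shows only the structural idea: generation is gated behind the junior's own stated intent and prediction.

```python
def generate_code(task):
    # Stand-in for a real model call.
    return f"# generated solution for: {task}"

def eic_gate(task, stated_intent, predicted_approach):
    """Gate code generation behind a Socratic check-in.

    Returns coaching questions until the developer has committed to
    an intent and a predicted approach; only then does generation run."""
    questions = []
    if not stated_intent.strip():
        questions.append("What is this code supposed to accomplish, in one sentence?")
    if not predicted_approach.strip():
        questions.append("Before seeing a solution: how would you approach "
                         "it, and where might it break?")
    if questions:
        return {"code": None, "coach": questions}
    return {"code": generate_code(task), "coach": []}

# First attempt: no prediction, so the tool coaches instead of generating.
blocked = eic_gate("dedupe a user list", stated_intent="", predicted_approach="")
assert blocked["code"] is None and len(blocked["coach"]) == 2

# Second attempt: the developer has done the generative work first.
allowed = eic_gate(
    "dedupe a user list",
    stated_intent="Remove duplicate users by email, keep first seen",
    predicted_approach="Track seen emails in a set; O(n); watch case sensitivity",
)
assert allowed["code"] is not None
```

The design point is that the prediction happens before the answer arrives, preserving the generative cognitive step that the Kosmyna study suggests is at risk.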
Russinovich and Hanselman are honest about the limits of their own proposal. Microsoft cut significant engineering headcount in 2024 and 2025. Whether the preceptorship model will scale into a sustained program depends on whether leadership changes the metrics they optimize — a “big ask” for organizations whose incentives have historically emphasized shipping velocity above all else.
McKinsey redesigned its screening process for the AI era through an assessment called Solve — a gamified evaluation that tests critical thinking, decision-making, and systems thinking, explicitly not prior business knowledge or technical credentials. The framing is sound: what the company needs is people who can learn in the new environment, not people who already know the old skills. Whether a better hiring filter compensates for a weaker developmental pathway is not yet clear.
IBM’s own “New Collar” apprenticeship program is being updated to include what the company calls “AI-native habits” — using AI tools to deconstruct pull requests rather than build from scratch, understanding the architecture of LLMs, designing with generative tools before implementing. The Flatiron School is running an “Accelerated AI Engineer Apprenticeship” that pairs participants with mentors on real agentic frameworks at $20 per hour, with a foundations-first approach that introduces concepts simply before revisiting them with increasing technical depth.
These are attempts. They are not yet evidence.
The Review Tax Nobody Discusses
There is a cost to the existing senior engineers that the pipeline conversation mostly ignores. When one senior working with AI can generate the output that three juniors once produced, the productivity gains are real. But generating code is cognitively different work from verifying it, and the verification is now happening at three times the volume.
Senior engineers are spending their days as high-speed compliance officers, auditing thousands of lines of AI-generated logic for subtle hallucinations: race conditions masked by sleep calls, code that passes tests but doesn’t generalize, architectural drift that looks fine in isolation and fails at scale. A 2025 paper found that after AI adoption, core developers reviewed more code but their own original productivity dropped 19%. The creative, architectural, problem-solving work that makes senior engineering satisfying and that produces the judgment juniors are supposed to be learning from — that work is being crowded out by the cognitive exhaustion of reviewing AI output at industrial scale.
The delegation vacuum compounds this. Seniors previously handed off lower-risk tasks to juniors as a pressure valve and as a teaching mechanism. Junior implements the UI component, senior reviews it, junior learns something. That loop no longer exists. The junior’s tasks were automated. The senior’s workload increased. The teaching is not happening.
This is the tax that makes the developmental problem worse. The senior engineers who were supposed to mentor are stretched thin doing work that used to be distributed. The preceptorship model addresses this in theory — by making mentorship a measured part of the senior’s job rather than an afterthought. Whether organizations are actually willing to accept the velocity tradeoff is a different question.
What Is Actually Known
The honest answer to the core question — can AI-assisted entry-level work produce the same developmental outcomes as the boilerplate-and-struggle model — is that nobody knows yet.
The cohort that entered the workforce in 2024 and 2025 under AI-assisted conditions will become mid-level engineers between 2027 and 2029. Whether they emerge with the architectural judgment, the debugging instincts, the systems thinking that the old pipeline produced will not be visible until then. The data will arrive precisely when it is needed most — when those engineers are supposed to be the senior developers filling the next generation’s pipeline — and if the answer is no, the remediation options will be limited and expensive.
The Dreyfus model of skill acquisition gives a name to what is at risk. Novices follow rules. Advanced beginners develop pattern recognition. Competent practitioners make choices and bear the consequences of those choices — this is where accountability and emotional investment enter, and where learning accelerates. Proficient practitioners sense problems before the data confirms them. Experts operate through intuition built from thousands of absorbed experiences. The concern is not that AI-assisted juniors are incompetent. It is that they plateau. They recognize patterns. They generate outputs that look like what competent practitioners produce. But they have not made choices whose consequences they had to live with. They have not debugged the 2am production failure that rewired their mental model of how distributed systems actually behave. They have not asked a senior why their elegant solution was wrong and received an answer that changed how they think permanently.
The Kosmyna finding is the most uncomfortable piece of evidence in this space. It is preliminary and domain-limited. But if it holds in technical domains — if the cognitive debt from AI-assisted early-career work doesn’t reverse when the tool is removed — then the preceptorship model is not sufficient on its own. The preceptor can make visible the failure the junior cannot yet see. But they cannot rebuild the neural substrate that early unassisted struggle was supposed to create. The minimum viable intervention may require some version of deliberately maintained struggle — manual-first implementation for foundational modules, Socratic AI tools that require the junior to predict before they receive — to preserve the generative cognitive engagement that builds the mental models the preceptorship then calibrates.
The Wager
IBM’s wager is that oversight, verification, and customer-facing accountability can replace the old developmental pathway. That a junior developer who spends years auditing AI output, explaining architectural choices to clients, and taking responsibility for the correctness of generated code will develop the judgment that used to come from writing and debugging the code yourself.
It might be true. And the three-layer framing suggests it could be more than just “not worse” — exposure to systems-level and specification-level failure earlier in a career, rather than after years of boilerplate, might actually compress the timeline to senior judgment rather than extend it. Customer-facing rotation, where the junior must translate vague failure descriptions into root-cause hypotheses, is the kind of developmental experience that the old model often didn’t provide until mid-career.
But the theory requires the load-bearing piece that IBM has not publicly committed to: preceptorship at Layer 1. The implementation-level failures in AI output are invisible to a junior who lacks the technical intuition to recognize them. Making those failures legible is the senior engineer’s job — not reviewing for correctness, but externalizing judgment that the junior cannot yet access. Without that, the oversight role is compliance work. The junior sees tests passing where the senior sees warning signs. The gap between those two observations is where the learning was supposed to happen.
LaMoreaux is right that the companies that double down on entry-level hiring in this environment will be better positioned in 2030. She is right that the pipeline problem is real. What she has not yet answered — what no major company has publicly answered with evidence — is whether the new developmental pathway they are building actually delivers at Layers 2 and 3. Whether the junior who spends a year doing AI oversight develops the systems intuition to translate “it stops working sometimes” into a root cause. Whether they get to the point of staking their name on an architectural judgment call, being wrong about something, and learning from the consequence.
The ladder looks different. Whether it goes to the same place, and whether the companies building it have designed the rungs deliberately enough to find out, we do not yet know.
Tags: junior developer pipeline AI, failure signal model developer expertise, IBM entry-level roles 2026, Kosmyna cognitive debt LLM, Russinovich Hanselman preceptorship ACM


