The Tutoring Paradox: Why Being Helped Can Prevent Learning

Aethel

There is a version of help that teaches, and a version that replaces teaching. The first leaves the learner more capable than before. The second leaves them more dependent — and, crucially, unaware of the dependence, because the help arrived so smoothly that no learning gap was ever visible. The distinction between these two kinds of help is one of the most important and least discussed problems in education. It has become urgent now that the help is instant, unlimited, and available to everyone.


I learned to drive in a car with a dual-control system — a second brake pedal on the passenger side that my instructor could use if I was about to do something catastrophic. I knew the brake was there. My instructor almost never used it. But knowing it was there changed how I drove, and not in the way you might expect.

The presence of the backup brake made me slightly less afraid. Slightly less afraid meant slightly less careful. Slightly less careful meant that I was, in the early lessons, learning somewhat less from my own near-mistakes than I would have if those near-mistakes had felt more genuinely consequential. My instructor understood this, and at some point told me — with more precision than I had expected from a driving instructor — that she was going to use the brake less often than I thought she would, and that I should proceed as if I were the only person in the car with access to the controls.

It was a small thing, and I understood what she was doing only in retrospect. She was removing a safety net that was preventing me from developing my own. The help she was not providing was, in a specific and important sense, more educational than the help she was providing.


The Scaffolding Problem

The concept of scaffolding, one of the most influential ideas in educational psychology, was introduced in the 1970s by Jerome Bruner and his colleagues David Wood and Gail Ross, building on the much earlier work of Lev Vygotsky. The image is architectural: a scaffold supports a structure under construction, enables building that could not happen without the support, and is then removed once the structure can support itself. Educational scaffolding, on this model, is temporary assistance that enables a learner to do something beyond their current independent capacity, with the explicit understanding that the assistance withdraws as capacity develops.

The critical word in that formulation is withdraws. Scaffolding that does not withdraw is not scaffolding. It is a prosthetic — a permanent substitution for a capacity that was never developed because the substitution was always available.

Vygotsky's Zone of Proximal Development, from which the scaffolding concept derives, describes the gap between what a learner can do independently and what they can do with appropriate assistance. Learning occurs in this gap: in the space between unassisted capacity and assisted capacity, where the assistance is calibrated to be sufficient to enable the task but insufficient to eliminate the cognitive effort required to do it. The assistance has to be incomplete — has to leave something for the learner to do — in order to produce learning rather than simply performance.

This is the theoretical structure that makes the tutoring paradox possible: help that is too complete — that removes all the difficulty, that answers rather than prompts, that does rather than enables — is help that operates outside the Zone of Proximal Development. It allows the learner to perform at a level beyond their current capacity without moving their capacity in that direction. The performance is real; the learning is not.


The Worked Example Effect and Its Reversal

There is a research finding in cognitive psychology that seems, at first reading, to contradict everything I have just said. It is called the worked example effect, and it shows that novice learners — people who are new to a domain and have very little prior knowledge — learn more effectively from studying worked examples of solved problems than from attempting to solve problems on their own.

The finding is robust and well-replicated. Novices who study fully worked examples make fewer errors, learn faster, and perform better on subsequent tests than novices who attempt to solve the same problems through discovery or guided inquiry. The implication seems to be: provide the answer; do the problem for the learner; tell them what to do. This is what produces learning, at least at the beginning.

But there is a companion finding that is less frequently discussed, called the expertise reversal effect, which shows that the advantage of worked examples disappears as expertise develops — and then reverses. For intermediate learners, worked examples produce no advantage over problem-solving. For more advanced learners, worked examples actively impair learning: learners who study worked examples that they have the capacity to work through themselves learn less than learners who work through the problems without the example provided.

The explanation for this reversal lies in cognitive load theory: worked examples reduce extraneous cognitive load (the effort of navigating an unfamiliar problem space) and thereby free cognitive resources for learning. For novices, this reduction is beneficial because their cognitive resources are consumed by the unfamiliarity of the domain. For experts, the reduction eliminates the productive cognitive effort that generates the deep encoding of new knowledge. The difficulty that was getting in the novice's way is the same difficulty that the expert needs in order to learn.

The implication — which is both well-established in the research and consistently ignored in practice — is that the appropriate level of help is a moving target. Help that is well-calibrated for a novice is too much help for an intermediate learner, and actively harmful for an advanced one. A system that provides the same level of help regardless of the learner's developing expertise is not adapting to learning. It is optimising for performance at the cost of development.


Why Instant and Complete Help Is Specifically Bad

The problem with the kind of help now most widely available — the kind produced by large language models in response to any question — is not that it is help. It is that it has two properties that make it particularly well-designed to prevent learning: it is instant, and it is complete.

Instant help eliminates the period of effortful search that precedes assistance. This period is not wasted time. It is, according to a substantial body of research on what Robert Bjork calls "desirable difficulties," the period during which the cognitive operations most associated with deep learning occur: the activation of prior knowledge, the identification of what is and is not understood, the generation of candidate answers that can be evaluated against the correct one when it arrives. A learner who struggles with a problem for a few minutes before receiving the answer tends to learn more from the answer than a learner who receives it immediately. The struggle is not the obstacle to learning; it is the mechanism by which learning occurs.

Complete help eliminates the gap — the Zone of Proximal Development — in which learning occurs. If the help provides not just direction but the full answer, not just a hint but the complete solution, not just a prompt but the finished formulation, there is nothing left for the learner to do except verify that the help is correct. Verification is easier than generation and produces substantially less learning. The learner who receives a complete answer can check it; they cannot construct it. And the cognitive operations involved in construction are not the same as, and are not substitutable by, the cognitive operations involved in checking.

The combination — instant and complete — is particularly effective at producing the experience of learning without the substance of it. The learner engages with difficult material, receives help that resolves the difficulty, understands the resolution, and feels the satisfaction of a problem solved. None of this is fake; all of it is real. What is not real is the implication that the satisfaction of a problem solved is the same as the development of the capacity to solve the problem. It is not, and the gap between them is not visible in the moment. It is visible later, when the problem reappears without the help available, and the capacity that the learner believed they had developed turns out not to be there.


The Kirschner-Sweller-Clark Problem

In 2006, Paul Kirschner, John Sweller, and Richard Clark published a paper in Educational Psychologist titled "Why Minimal Guidance During Instruction Does Not Work," which argued, on the basis of cognitive load research, that discovery-based and inquiry-based learning approaches — those that minimise direct instruction in favour of student exploration — are less effective than explicit, direct instruction for novice learners. The paper generated substantial controversy, and it deserves engagement rather than dismissal.

Kirschner, Sweller, and Clark are right about the novice problem: learners who lack the prior knowledge to navigate an unfamiliar problem space productively cannot learn effectively from unguided exploration, because they do not have the resources to distinguish productive paths from unproductive ones. The difficulty they encounter is not the productive difficulty that builds learning; it is the unproductive difficulty of being lost in a space you do not understand, which produces frustration and errors rather than insight.

But the argument proves less than it is often taken to prove. It establishes that unguided discovery is ineffective for novices. It does not establish that fully guided instruction — instruction that eliminates all generative effort from the learner — is optimal for anyone. The research supports a model in which the degree of guidance is calibrated to the learner's developing expertise: high guidance early, fading guidance as expertise develops, minimal guidance for advanced learners who benefit from the productive struggle that low guidance enables.
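For readers who think in code, the calibration described above can be sketched as a small policy. This is an illustration only, not a method taken from the research: the `mastery` score, the two cutoff values, and the three guidance levels are all assumptions made for the example, and how mastery would actually be estimated is left open.

```python
from enum import Enum


class Guidance(Enum):
    WORKED_EXAMPLE = "worked example"    # full solution shown step by step
    PARTIAL_EXAMPLE = "partial example"  # some steps left for the learner
    PROMPT_ONLY = "prompt only"          # a question or hint, no solution steps


def choose_guidance(mastery: float,
                    novice_cutoff: float = 0.4,
                    advanced_cutoff: float = 0.8) -> Guidance:
    """Pick a guidance level from an estimated mastery score in [0, 1].

    The cutoffs are hypothetical values chosen for illustration.
    """
    if not 0.0 <= mastery <= 1.0:
        raise ValueError("mastery must be between 0 and 1")
    if mastery < novice_cutoff:
        return Guidance.WORKED_EXAMPLE   # high guidance early
    if mastery < advanced_cutoff:
        return Guidance.PARTIAL_EXAMPLE  # fading guidance
    return Guidance.PROMPT_ONLY          # minimal guidance for advanced learners


# The same question gets less help as the learner's estimated mastery rises.
for score in (0.1, 0.5, 0.9):
    print(score, choose_guidance(score).value)
```

The point of the sketch is the shape of the function rather than its numbers: the help returned depends on the learner's state, not only on the question asked.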

The mistake that much educational technology makes — and that AI tutoring systems are currently in the process of making at scale — is to read the argument for explicit instruction as an argument for maximal help. It is not. Explicit instruction is not the same as doing the work for the learner. A good explanation does not substitute for the learner's own cognitive engagement; it provides the foundation from which that engagement can become productive. The distinction is precisely the distinction between scaffolding and prosthetics — between support that enables development and support that replaces it.


The Neuroscience of Productive Struggle

There is a neuroscientific dimension to this argument that deserves mention, not because neuroscience settles the educational question but because it explains the mechanism.

Learning — the formation of durable memory and transferable skill — depends on processes that operate preferentially under conditions of effortful retrieval and generative engagement. The spacing effect, the testing effect, the generation effect: all of these well-established findings in learning science describe situations in which the cognitive effort required to produce an answer, retrieve a memory, or construct a connection generates stronger neural encoding than the passive reception of information. The encoding is proportional to the effort; the effort is the mechanism.

This means that the ease of receiving an answer is not separable from the learning value of receiving it. The answer that costs nothing to obtain contributes little to the learner's capacity. The answer that costs effort — that was preceded by struggle, that required something from the learner before it arrived — contributes proportionally. A system that minimises the cost of answers is, in this sense, minimising the learning value of those answers. The efficiency it provides is real; it is efficiency at producing performance, not at developing capacity.

The implication for AI-assisted learning is specific and actionable: the most educationally valuable thing an AI tutor can do is frequently to not answer the question. Not to withhold help capriciously, but to respond to a question with a question — to redirect the learner toward the thinking that would produce the answer, rather than producing the answer itself. This is harder to build, harder to evaluate, and considerably less satisfying to the learner in the moment. It is also more likely to produce what a tutoring system is supposed to produce: a learner who, when the tutor is no longer present, can do what they could not do before.
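One way to picture "responding to a question with a question" is a hint ladder that releases more complete help only after the learner has made an attempt. The sketch below is a hypothetical design, not a description of any existing tutoring system; the class name `HintLadder`, its fields, and the rule of one escalation per non-empty attempt are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HintLadder:
    """Escalating responses for one problem, from least to most complete."""
    steps: List[str]         # ordered: opening question first, full answer last
    attempts_seen: int = 0   # non-empty learner attempts recorded so far

    def __post_init__(self) -> None:
        if not self.steps:
            raise ValueError("steps must contain at least one response")

    def respond(self, learner_attempt: Optional[str]) -> str:
        """Return the next response; a missing or blank attempt unlocks nothing."""
        if learner_attempt and learner_attempt.strip():
            self.attempts_seen += 1
        # Never step past the last (most complete) response.
        level = min(self.attempts_seen, len(self.steps) - 1)
        return self.steps[level]


ladder = HintLadder(steps=[
    "What is the problem asking you to find?",
    "Which rule connects the quantities you already know?",
    "Here is the full worked answer.",
])
print(ladder.respond(None))              # no attempt yet: the opening question
print(ladder.respond("Maybe it's 12?"))  # first attempt: a more specific hint
print(ladder.respond("   "))             # blank input: same hint again
```

The full answer remains reachable, but only after the learner has produced something of their own, which is the condition the argument above says help should preserve.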


What Good Help Looks Like

The best tutors I have encountered — and I mean this in the full range of contexts, from formal education to informal mentorship — shared a quality that was, initially, puzzling: they were reluctant to answer questions directly. Not incapable of answering; not withholding answers out of pedagogy performed for its own sake. But consistently inclined to respond to a question with a question, or with a partial answer that required something from me to complete, or with a redirection toward the resources that would let me find the answer myself.

This was sometimes frustrating. There were moments when I wanted the answer and instead received a question, and the question felt, in those moments, like being cheated. Looking back, those are the moments I remember most clearly — not because the frustration was memorable, but because what happened after the frustration was different from what happened when I simply received an answer. After the frustration, I had to do something. Doing something produced something that the answer would not have produced.

The model this points toward is not the model of an assistant — an entity whose function is to reduce the cost of tasks. It is the model of a teacher in the Socratic sense: an interlocutor whose function is to produce in the learner the conditions under which the learner can think better than they could before the conversation. The difference between these two models is the difference between an AI that helps you finish the problem and an AI that helps you become someone who can finish the problem. Both are forms of help. Only one of them is tutoring.


The Specific Problem With AI Tutoring

There is a version of this critique that applies to all tutoring systems, and a version that applies specifically to the kind of AI assistance now available. The specific version is worth making explicit.

Human tutors, even poor ones, have a property that AI systems largely lack: they are visibly imperfect sources of authority. A human tutor can be challenged. They can be asked to justify their answer, to explain why they think they are right, to account for the fact that another source says something different. The social dynamic of the interaction, in which the other person is present with their own fallibility, makes the knowledge being transmitted legible as something that could, in principle, be examined.

This does not mean human tutors are always challenged, or that their answers are always correct. It means that the social structure of the interaction preserves, at least in principle, the space for the learner's own critical engagement. The learner is not simply a recipient; they are a participant in an interaction with another fallible agent.

Large language models have, for most users in most situations, the opposite social structure. They produce answers with a confidence and fluency that signals authority without the fallibility that would make that authority contestable. The answer arrives as if from a source that has already done the work of checking, that has already resolved the uncertainties, that has already determined what the correct answer is and is now simply reporting it. Challenging a language model feels different from challenging a human tutor — not because the model's answers are more reliable, but because the performance of reliability is more complete.

This matters because the willingness to challenge a source of information — to ask why, to push back, to test the answer against other sources and your own reasoning — is precisely the cognitive activity that produces deep learning rather than surface learning. A source that discourages challenge by performing complete authority is a source that discourages the cognitive engagement that would convert the information it provides into understanding.

The educational ideal — the ideal that genuinely good tutors approximate and that most AI systems fail to approach — is a source that provides enough to enable your thinking without providing so much that it replaces it. A source that is authoritative in a way that invites examination rather than foreclosing it. A source that is, in the relevant sense, incomplete on purpose.


The Dependency That Cannot Be Seen

The most troubling feature of the tutoring paradox is not that it produces dependence — many things produce dependence, and dependence is not always bad. It is that it produces invisible dependence: dependence that does not feel like dependence because the help has been so consistently available that the absence of independent capacity has never been directly confronted.

The learner who has always had instant, complete help available does not know what they would be capable of without it. The help has been there every time the difficulty appeared, and the difficulty has therefore never been fully encountered. The incapacity is real; the experience of incapacity is not. And the experience of incapacity is the diagnostic that would prompt the learner to develop the capacity.

This is the version of the paradox that matters most at the current moment. It is not that AI assistance is bad for people who already have the capacity to do without it. It is that AI assistance, used before that capacity has been developed, may prevent the development from occurring — invisibly, painlessly, in a way that does not register as prevention because the help was there and the work got done and the problem was solved.

What did not happen was learning. And what cannot be seen, from inside the experience of being helped, is the difference between having done the work and having had the work done for you.


There is no version of this argument that concludes with "never use AI for learning." That conclusion is not supportable and would not be honest. The argument concludes with something considerably more specific: that help which is too complete, too instant, and too consistently available produces performance without development — and that the value of a tool for learning is measured not by how quickly it resolves difficulty but by whether the resolution leaves the learner more capable than they were before. This is a harder standard than most tools are currently designed to meet.