ChatGPT feels like learning. It's articulate, available 24/7, and infinitely patient. Ask it to explain recursion and you'll get a clear, well-structured answer in seconds. No wonder employees default to it — and no wonder L&D leaders are asking whether it's "good enough" to replace formal training programs.
It isn't. And the reason has nothing to do with the quality of its answers.
Bjork and Bjork (2011) describe what they call "desirable difficulties": the counterintuitive finding that conditions that make learning feel harder are precisely what build lasting competence. ChatGPT is engineered to remove difficulty. That's the problem.
Talking about code is not writing code
Here's the most consequential gap. ChatGPT conversations are, at best, what Chi and Wylie (2014) classify as "active" engagement in their ICAP (Interactive, Constructive, Active, Passive) framework: the learner receives information and responds to it. Occasionally, the interaction becomes "constructive," where the learner generates new ideas. But it is never truly hands-on. Nobody's fingers are on a keyboard writing real code, debugging real errors, or shipping real features.
This matters enormously. Programming is a procedural skill. Anderson's ACT-R theory makes the case that procedural knowledge is built through doing, through the slow compilation of declarative knowledge into automated action sequences via practice. You cannot talk your way into knowing how to code any more than you can talk your way into knowing how to play piano.
The data backs this up at scale. Freeman et al. (2014) conducted a meta-analysis of 225 studies and found that active learning in STEM cuts failure rates by roughly a third compared with lecture-based instruction: on average, 33.8% of students failed under traditional lecturing versus 21.8% under active learning. That's not a marginal improvement. That's about a third of the learners who would have failed now succeeding, simply because they spent more time doing and less time consuming.
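To make that arithmetic concrete, here is a minimal sketch of how a relative reduction in failure rate translates into learners. The two failure rates are the averages reported by Freeman et al. (2014); the cohort size of 500 and the function name `relative_reduction` are hypothetical, chosen only for illustration.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Return the fraction of baseline failures that no longer occur."""
    if not 0 < baseline_rate <= 1 or not 0 <= new_rate <= 1:
        raise ValueError("rates must be proportions, and baseline must be > 0")
    return (baseline_rate - new_rate) / baseline_rate


# Average failure rates reported by Freeman et al. (2014).
LECTURE_FAILURE_RATE = 0.338
ACTIVE_FAILURE_RATE = 0.218

# Hypothetical cohort size, for illustration only.
COHORT_SIZE = 500

reduction = relative_reduction(LECTURE_FAILURE_RATE, ACTIVE_FAILURE_RATE)
failures_lecture = round(COHORT_SIZE * LECTURE_FAILURE_RATE)
failures_active = round(COHORT_SIZE * ACTIVE_FAILURE_RATE)

print(f"Relative reduction in failures: {reduction:.1%}")
print(f"Expected failures, lecture-based: {failures_lecture}")
print(f"Expected failures, active learning: {failures_active}")
print(f"Learners who pass instead of failing: {failures_lecture - failures_active}")
```

Under these assumptions, a cohort of 500 would see about 60 fewer failures. The relative figure is the one to quote when comparing programs of different sizes; the absolute count is what a budget holder usually wants to see.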
A purpose-built training platform puts practice at the center. Learners spend upwards of 80% of their time writing code, not reading about it. They build portfolio-ready projects that prove applied competency. The output is demonstrable skill, not a chat transcript.
Feedback that actually changes behavior
ChatGPT can comment on code you paste into it. What it cannot do is watch you write that code in real time, detect patterns in your errors across sessions, or grade your work against a structured rubric tied to specific learning outcomes.
That distinction is critical. Hattie and Timperley (2007) found that feedback is among the most powerful influences on learning, with an average effect size of d = 0.79, but only when it operates at the task and process level. Telling a learner their code is "wrong" is low-value. Telling them why their loop logic fails, how it relates to a pattern they've struggled with before, and what a senior engineer would do differently is what changes behavior.
Purpose-built platforms deliver this kind of feedback automatically and at scale. AI grading that critiques style, structure, and correctness replicates the code review a junior developer would get from a senior colleague. Except it never gets tired, never falls behind, and never limits your cohort size.
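As a rough illustration of what "detecting patterns in errors across sessions" could mean in practice, the sketch below keeps a per-learner tally of error categories and words its feedback differently for a one-off slip and a repeated habit. Everything here is an assumption made for the example: the class `ErrorHistory`, the function `feedback_line`, and the error tags are hypothetical, and a real platform would derive the tags from linting, tests, or model-based review rather than receive them ready-made.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ErrorHistory:
    """Tally of how many sessions each error category has appeared in."""

    sessions_with_error: Counter = field(default_factory=Counter)

    def record_session(self, error_tags: list[str]) -> None:
        # Count each category once per session so that a single rough
        # session does not look like a long-running habit.
        self.sessions_with_error.update(set(error_tags))

    def recurring(self, min_sessions: int = 2) -> list[str]:
        """Categories seen in at least `min_sessions` separate sessions."""
        return sorted(
            tag for tag, n in self.sessions_with_error.items() if n >= min_sessions
        )


def feedback_line(tag: str, history: ErrorHistory) -> str:
    """Word the feedback differently for a one-off slip and a repeated pattern."""
    seen = history.sessions_with_error[tag]
    if seen >= 2:
        return f"{tag}: seen in {seen} sessions; revisit the pattern, not just this instance."
    return f"{tag}: first occurrence; fix it and move on."


# Hypothetical error tags from three practice sessions.
history = ErrorHistory()
history.record_session(["off_by_one", "unused_variable"])
history.record_session(["off_by_one", "off_by_one", "shadowed_name"])
history.record_session(["off_by_one"])

print(history.recurring())
print(feedback_line("off_by_one", history))
print(feedback_line("shadowed_name", history))
```

The design choice worth noting is the per-session deduplication: counting sessions rather than raw occurrences is one simple way to separate a bad afternoon from a persistent misconception.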
The case for strategic withholding
ChatGPT is optimized to be helpful. You ask, it answers. Fully. Immediately. Every time.
From a learning science perspective, this is exactly wrong.
Kapur (2008, 2016) has spent over a decade studying what he calls "productive failure" — the phenomenon whereby learners who struggle with a problem before receiving instruction develop deeper conceptual understanding and significantly better transfer to novel problems than learners who receive instruction first. The struggle is not a bug. It is the mechanism.
An AI tutor built for training must be designed to not answer directly. It prompts. It asks questions. It points the learner back toward the problem. This is the Socratic method operationalized in software — and the evidence says it works. Learners who are forced to generate their own solutions, even imperfect ones, build the kind of flexible knowledge that survives contact with real-world complexity.
ChatGPT does the opposite. It hands you the answer and robs you of the cognitive work that would have made the answer stick.
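The withholding behavior described above can be sketched as a simple hint ladder: the tutor answers each request for help with a progressively more specific question, and only offers a worked solution once the learner has made a set number of genuine attempts. This is an illustrative sketch under stated assumptions, not a description of any particular product. The class `HintLadderTutor`, the threshold of three attempts, and the wording of the hints are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical prompts, ordered from least to most revealing.
HINTS = (
    "What did you expect this code to do, and what did it do instead?",
    "Where is the first line at which the actual values differ from what you expected?",
    "Trace the loop by hand for the smallest input that fails. What changes on each pass?",
)

# Placeholder for the point at which direct instruction is finally offered.
WALKTHROUGH = "You have worked at this. Let's walk through a solution together."


@dataclass
class HintLadderTutor:
    """Withholds the solution until the learner has made enough attempts."""

    attempts_required: int = 3  # hypothetical threshold
    attempts_made: int = 0

    def record_attempt(self) -> None:
        """Call when the learner submits their own attempt at the problem."""
        self.attempts_made += 1

    def respond(self) -> str:
        """Reply to a request for help without giving the answer away early."""
        if self.attempts_made >= self.attempts_required:
            return WALKTHROUGH
        # Escalate the hint with each attempt, capped at the last hint.
        level = min(self.attempts_made, len(HINTS) - 1)
        return HINTS[level]


tutor = HintLadderTutor()
print(tutor.respond())      # no attempts yet: the most open-ended question
tutor.record_attempt()
print(tutor.respond())      # one attempt: a more targeted question
tutor.record_attempt()
tutor.record_attempt()
print(tutor.respond())      # three attempts: the walkthrough is unlocked
```

Tying the escalation to attempts rather than to the number of times help is requested is deliberate: asking again does not move the learner up the ladder, but trying again does.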
Generic skills don't transfer to real jobs
Ask ChatGPT to teach you Python and you'll learn generic Python. You'll manipulate toy datasets, build textbook examples, and solve algorithm puzzles that have no relationship to your actual job.
Baldwin and Ford (1988) identified this as the transfer problem nearly four decades ago, and it remains one of the most persistent failures in corporate training. Skills learned in one context, especially a decontextualized, generic one, transfer poorly to the tasks employees actually perform. The content has to mirror the work.
A purpose-built platform can generate curriculum from organizational context: your tech stack, your codebase conventions, your problem domains. This is the difference between "completed a Python course" and "can build and maintain production code in our environment." One looks good on a dashboard. The other moves the business.
One window, not five
This one is simpler, but the evidence is clear. Using ChatGPT for learning means toggling between a chat window, a code editor, documentation, and whatever content you're trying to learn from. Every switch costs you.
Sweller's (1988) cognitive load theory explains why. Working memory is finite. When learners split attention across multiple sources of information, they burn cognitive resources on the logistics of learning rather than the learning itself. This is extraneous load, the kind that adds nothing and takes from everything.
Embedding the coding environment, instructional content, and AI feedback into a single interface eliminates this tax. Learners stay in flow. Cognitive resources go where they belong: toward building skill.
Two different tools for two different jobs
ChatGPT is a brilliant answer engine. It was built to be helpful, and it is. But helpful and pedagogically effective are not the same thing, and in several important ways, they are opposites.
The question for L&D leaders and training operators is not whether AI belongs in workforce development. It does. The question is whether you're using AI designed to teach or AI designed to answer. One builds competence. The other builds the illusion of it.
Your workforce deserves the real thing.
References
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379–424.





