Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.
The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.


Something any (real, trained, educated) developer who has even touched AI in their career could have told you. Without a three-month study.
What’s funny is this guy has 25 years of experience as a software developer. But three months was all it took to render that experience worthless. He also said it was harder than if he’d just written the code himself. Claude would make a mistake, he would correct it. Claude would make the same mistake again, having learned nothing, and he’d fix it again. Constant firefighting, he called it.
As someone who has been shoved in the direction of using AI for coding by my superiors, that’s been my experience as well. It’s fine at cranking out Stack Overflow-level code regurgitation and mostly connecting things in a sane way if the concept is simple enough. The real breakthrough would be if the corrections you make persisted longer than a turn or two. As soon as your “fix-it prompt” is out of the context window, you’re effectively back to square one. If you’re expecting it to “learn”, you’re gonna have a bad time. If you’re not constantly double-checking its output, you’re gonna have a bad time.
@felbane @AutistoMephisto I don’t have a CS degree (and am more than willing to accept the conclusions of this piece), but how is it not viable to audit code as it’s produced, so that it’s both vetted and understood in sequence?
Auditing the code it produces is basically the only effective way to use coding LLMs at this point.
You’re basically playing the role of senior dev code reviewing and editing a junior dev’s code, except in this case the junior dev randomly writes an amalgamation of mostly valid, extremely wonky, and/or complete bullshit code. It has no concept of best practices, or fitness for purpose, or anything you’d expect a junior dev to learn as they gain experience.
Now given the above, you might ask yourself: “Self, what if I myself don’t have the skills or experience of a senior dev?” This is where vibe coding gets sketchy or downright dangerous: if you don’t notice the problems in generated code, you’re doomed to fail sooner or later. If you’re lucky, you end up having to do a big refactoring when you realize the code is brittle. If you’re unlucky, your backend is compromised and your CTO is having to decide whether to pay off the ransomware demands or just take a chance on restoring the latest backup.
If you’re just trying to slap together a quick and dirty proof of concept or bang out a one-shot script to accomplish a task, it’s fairly useful. If you’re trying to implement anything moderately complex or that you intend to support for months/years, you’re better off just writing it yourself as you’ll end up with something stylistically cohesive and more easily maintainable.
@felbane thanks for such a thorough response, I really appreciate your time.
It’s still useful to have an actual “study” (I’d rather call it a POC) with hard data you can point to, rather than just “trust me bro”.
Like the MIT study that the author refers to? The one that already existed before they decided they needed to do it themselves?
I was in charge of an AI pilot project two years back at my company. That was my conclusion, among others.
Also, it’s what MIT told them. Literally MIT lol
Untrained dev here, but the trend I’m seeing is spec-driven development where AI generates the specs with a human, then implements the specs. Humans can modify the specs, and AI can modify the implementation.
This approach seems like it can get us to 99%, maybe.
Trained dev with a decade of professional experience here: humans routinely fail to get me workable specs without hours of back-and-forth discussion. I’d say a solid 25% of my work week is spent understanding what the stakeholders are asking for and how to contort the requirements to fit into the system.
If these humans can’t be explicit enough with me, a living, thinking human who understands my architecture better than any LLM does, what chance does an LLM have at interpreting them?
Thus you get a piece of software that no one really knows shit about the inner workings of. Sure, you have a bunch of spec sheets, but no one was there doing the grunt work, so when something inevitably breaks in production there’s no one on the team saying “oh, that might be related to this system I set up over here.”
Even more efficient: humans do the specs and the implementation. AI has nothing to contribute to specs, and is worse at implementation than an experienced human. The process you describe, with current AIs, offers no advantages.
AI can write boilerplate code and implement simple small-scale features when given very clear and specific requests, sometimes. It’s basically an assistant to type out stuff you know exactly how to do and review. It can also make suggestions, which are sometimes informative and often wrong.
If the AI were a member of my team it would be that dodgy developer whose work you never trust without everyone else spending a lot of time holding their hand, to the point where you wish you had just done it yourself.
Have you used any AI to try and get it to do something? It learns generally, not specifically. So you give it instructions and then it goes, “How about this?” You tell it that it’s not quite right and to fix these things, and it goes off on a completely different tangent in other areas. It’s like working with an 8-year-old who has access to the greatest stuff around.
It doesn’t even actually learn, though.