> But there are serious limits. [Your coding agent] will lie to you, they don't really understand things, and they often generate bad code.
I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.
It's still engineering work. The code still matters. It's just a different tool to write the code.
I'd compare the difference between manually coding and operating a coding agent to the difference between a handsaw and a chainsaw - the end result is the same but the method is very different.
So far, I haven't seen any comparison of AI (using the best available models) and hand written code that illustrates what you are saying, especially the "it's orders of magnitude worse" part.
This is not my experience *at all*. Maybe models from like 18+ months ago would produce really bad code, but in general most coding agents are amazing at finding existing code and replicating the current patterns. My job as the operator then is to direct the coding agent to improve whatever it doesn't do well.
But a lot of people don't think like this, and the unavoidable conclusion is that the LLM code is better than what they are used to, be it their own code or their colleagues'.
I mean yes, I am speaking for myself. I am drowning in mountains of LLM slop patches lol. I WISH people were using LLMs as "just another tool to generate code, akin to a vim vs emacs discussion."
I'm so sick of having 1000-line diffs dumped on me by coworkers who have generated whole internal libraries that handle very complicated operations that are difficult to verify. And you just know they spent almost no time properly testing and verifying, since it was zero effort to generate it all in the first place.
Considering the seemingly increasing frequency of high-severity bugs at FAANG companies in the last year, I think perhaps the great getting greater is not actually the case.
I happen to think that's largely a self-delusion which nobody is immune to, no matter how smart you are (or think you are).
I've heard this from a few smart people whom I know really well. They strongly believe this, they also believe that most people are deluding themselves, but not them - they're in the actually-great group, and when I pointed out the sloppiness of their LLM-assisted work they wouldn't have any of it.
I'm specifically talking about experienced programmers who now let LLMs write the majority of their code.
All on my own, I hand-craft pretty good code, and I do it pretty fast. But one person is finite, and the amount of software to write is large.
If you add a second, skilled programmer, just having two people communicating imperfectly drops quality to 90% of the base.
If I add an LLM instead, it drops to maybe 80% of my base quality. But it's still not bad. I'm reading the diffs. There are tests and fancy property tests and even more documentation explaining constraints that Claude would otherwise miss.
So the question is if I can get 2x the features at 80% of the quality, how does that 80% compare to what the engineering problem requires?
I was somewhat surprised to find that the differentiator isn't being smart or not, but the ability to accurately assess when they know something.
From my own observations, the people I previously observed to be sloppy in their thinking and other work correlate almost perfectly with those who seem most eager to praise LLMs.
It's almost as if the ability to identify bullshit makes you critical of the ultimate bullshit generator.
This is very true. My biggest frustration is people who use LLMs to generate code and then don't use LLMs to refine that code. That is how you end up with slop. I would estimate that as an SDE I spend about 30% of my time reviewing and refining my own code, and I would encourage anyone operating a coding agent to still spend 30% of their time figuring out how to improve the code before shipping.
Sure, and lots of times I can walk places. That doesn't mean bikes, cars, trains, and planes aren't incredibly useful. They let me achieve things I can't in other ways, for example transporting cargo without a team of people to help me. Just like AI coding.
America went full car to a point where just going to the shops from the suburbs is a car drive. Crossing the ROAD needs a car in way too many places.
There are cities where you can find a shop for essentials within walking distance, bigger shops need a short to medium drive, but can be still walked to if you really want to.
Parent wasn't referring to a possible future, but present time. If we get AI I can trust 100% that's another discussion. For now I don't see it and I don't think LLMs are the solution to that problem, but we'll see.
>I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.
Do you have any advice to share (or resources)? Have you experienced it yourself?
The practical limit is the latency and inference cost. A full planning and validation loop burns a lot of tokens, and waiting for that cycle breaks flow compared to just writing the code.
Great questions. For me, high quality code is code that:
1) works (is functional, no bugs)
2) is secure (no security vulnerabilities)
3) is extendable (I can quickly and easily build new features with limited refactors)
I argue the code still matters for these 3 reasons. If the code doesn't work, your product won't work. If it's not secure, there are obvious consequences. If you can't build new features quickly, you will end up wasting money and time.
Author here. I kept hitting the same tradeoff with Claude Code:
1) move fast, ship bugs, slow down feature development due to bad code
2) manually review all code and move super slow
This post covers the workflow I landed on:
1) sub-agents for plan review and code review (each gets fresh context)
2) persistent memory across coding sessions (not just markdown files)
3) a closing session protocol that handles tests/lint/format/cleanup/commit/push/merge conflicts.
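The closing session protocol in step 3 is, at its core, an ordered checklist that must fully pass before anything is committed. A minimal sketch of that idea, where the step names and commands (`pytest`, `ruff`) are placeholders for whatever a given project actually uses, not the author's real tooling:

```python
import subprocess

# Hypothetical closing-session checklist; swap in your project's real commands.
CLOSING_STEPS = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
    ("format", ["ruff", "format", "--check", "."]),
]

def run_closing_steps(steps, runner=subprocess.run):
    """Run each step in order; stop at the first failure.

    `runner` is injectable so the protocol can be exercised without
    actually shelling out to the tools.
    """
    for name, cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            return f"FAILED at {name}"
    return "all steps passed"
```

The point of the ordered list is that commit/push only ever happens after every check passes, so the agent can't merge code that skipped validation.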
The key insight: your main agent juggles too much. Sub-agents specialize. Each starts fresh, does one job well, returns findings.
Example: the Code Review Sub-agent is backed by a detailed document describing my exact code standards. When it spins up, it has a brand-new context window, and its only job is to ensure the `git diff` matches those standards.
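One way to picture the fresh-context handoff: the sub-agent's entire input is just the standards document plus the diff, with nothing carried over from the main session. A sketch of that assembly step, where the function names and the reviewer wording are illustrative, not Claude Code's actual API:

```python
import subprocess

def build_review_prompt(standards: str, diff: str) -> str:
    """Assemble the sub-agent's entire context: standards + diff only.

    Because the sub-agent starts from a blank context window, everything
    it needs must be packed into this one prompt.
    """
    return (
        "You are a code reviewer. Check the diff below against these "
        "standards and report every violation.\n\n"
        f"## Code standards\n{standards}\n\n"
        f"## Diff under review\n```diff\n{diff}\n```"
    )

def current_diff() -> str:
    # Staged changes only, so the review covers exactly what is about to ship.
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True
    ).stdout
```

The design choice this illustrates: the main agent never reviews its own work with a polluted context; the reviewer sees only the standards and the diff.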