
> But there are serious limits. [Your coding agents] will lie to you, they don't really understand things, and they often generate bad code.

I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.
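
To make that concrete, here's a minimal sketch of the loop I mean. Everything here is hypothetical: `run_agent` is a placeholder for whatever model or agent call you use, not a real library function.

```python
# Hypothetical orchestration loop: plan, implement, then iterate on
# validate + review until both pass (or we run out of rounds).
# run_agent() is a placeholder, not a real library function.

def run_agent(role: str, task: str) -> str:
    raise NotImplementedError("wire this to your model/agent of choice")

def orchestrate(feature: str, max_rounds: int = 3) -> str:
    plan = run_agent("planner", f"Write an implementation plan for: {feature}")
    code = run_agent("implementer", f"Implement this plan:\n{plan}")
    for _ in range(max_rounds):
        defects = run_agent("validator", f"List concrete defects in:\n{code}")
        review = run_agent("reviewer", f"Review against our standards:\n{code}")
        if "none" in defects.lower() and "approved" in review.lower():
            break
        code = run_agent("implementer",
                         f"Revise.\nDefects: {defects}\nReview: {review}\nCode:\n{code}")
    return code
```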

It's still engineering work. The code still matters. It's just a different tool for writing the code.

I'd compare the difference between manually coding and operating a coding agent to the difference between a handsaw and a chainsaw - the end result is the same but the method is very different.


> the end result is the same but the method is very different.

I don't think anyone really cares at all about LLM code that is the exact same end result as the hand-written version.

It's just that in reality the LLM version is almost never the same as the hand-written version; it's orders of magnitude worse.


So far, I haven't seen any comparison of AI (using the best available models) and hand-written code that illustrates what you are saying, especially the "it's orders of magnitude worse" part.

> it's orders of magnitude worse

This is not my experience *at all*. Maybe models from like 18+ months ago would produce really bad code, but in general most coding agents are amazing at finding existing code and replicating the current patterns. My job as the operator then is to direct the coding agent to improve whatever it doesn't do well.


In the limited use cases where I've used it, it's alright / good enough. But it has lots of examples (of my own) to work off of.

But a lot of people don't think like this, and we must come to the unavoidable conclusion that the LLM code is better than what they are used to, be it their own code or their colleagues'.

Speak for yourself.

I mean yes, I am speaking for myself. I am drowning in mountains of LLM slop patches lol. I WISH people were using LLMs as "just another tool to generate code, akin to a vim vs emacs discussion."

I'm so sick of having 1000-line diffs dumped on me by coworkers who have generated whole internal libraries that handle very complicated operations that are difficult to verify. And you just know they spent almost no time properly testing and verifying, since it was zero effort to generate it all in the first place.

LLMs are an amplifier. The great get greater, and the lazy get lazier.

Considering the seemingly increasing frequency of high-severity bugs at FAANG companies in the last year, I think perhaps "the great getting greater" is not actually the case.

That's assuming FAANG engineers are actually great.

They're far more likely to be above average, I would say.

Above average in tolerance for immoral business models, certainly.

I happen to think that's largely a self-delusion which nobody is immune to, no matter how smart you are (or think you are).

I've heard this from a few smart people whom I know really well. They strongly believe this; they also believe that most people are deluding themselves, but not them: they're in the actually-great group. And when I pointed out the sloppiness of their LLM-assisted work, they wouldn't have any of it.

I'm specifically talking about experienced programmers who now let LLMs write the majority of their code.


All on my own, I hand-craft pretty good code, and I do it pretty fast. But one person is finite, and the amount of software to write is large.

If you add a second, skilled programmer, just having two people communicating imperfectly drops quality to 90% of the base.

If I add an LLM instead, it drops to maybe 80% of my base quality. But it's still not bad. I'm reading the diffs. There are tests and fancy property tests and even more documentation explaining constraints that Claude would otherwise miss.
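
For the property tests, picture something like this (hypothesis is the real library; `slugify` is a made-up example of the kind of function Claude might write):

```python
# Property tests check invariants over generated inputs rather than
# hand-picked cases, which catches the edge cases an LLM (or I) would miss.
from hypothesis import given, strategies as st

def slugify(s: str) -> str:
    # Example function under test (made up for illustration).
    return "-".join(s.lower().split())

@given(st.text())
def test_slugify_is_idempotent(s):
    # Applying slugify twice should change nothing the second time.
    assert slugify(slugify(s)) == slugify(s)

@given(st.text())
def test_slugify_contains_no_whitespace(s):
    assert not any(c.isspace() for c in slugify(s))
```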

So the question is if I can get 2x the features at 80% of the quality, how does that 80% compare to what the engineering problem requires?


I was somewhat surprised to find that the differentiator isn't how smart someone is, but their ability to accurately assess when they actually know something.

From my own observations, the people I previously found to be sloppy in their thinking and other work correlate almost perfectly with those who seem most eager to praise LLMs.

It's almost as if the ability to identify bullshit makes you critical of the ultimate bullshit generator.


This is very true. My biggest frustration is people who use LLMs to generate code, and then don't use LLMs to refine that code. That is how you end up with slop. I would estimate that as an SDE I spend about 30% of my time reviewing and refining my own code, and I would encourage anyone operating a coding agent to still spend 30% figuring out how to improve the code before shipping.

> Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing

Lots of times I could just write it myself and be done with it


Sure, and lots of times I can walk places. That doesn't mean bikes, cars, trains, and planes aren't incredibly useful. They let me achieve things I can't in other ways, for example transporting cargo without a team of people to help me. Just like AI coding.

Yet replacing walking with cars is often cited as one of the reasons for many of society's ills.

Yet no one seriously declares motor vehicles as useless.

Many who live in sufficiently walkable areas don't have one and are actively opposed to getting one.

There is a middle road.

America went full car to a point where just going to the shops from the suburbs is a car drive. Crossing the ROAD needs a car in way too many places.

There are cities where you can find a shop for essentials within walking distance; bigger shops need a short to medium drive, but can still be walked to if you really want to.


Would you still use your car if you ended up in the wrong destination half the time?

Yes, because I can drive to the other end of the state in an afternoon. Then if I get lost, I can just course correct.

Generating lots of pollution, cost, jams, noise and accidents globally. Not all cities need to be made for cars, right tool for the job etc.

Have fun getting stuck in a loop when it insists your destination exists in a place it doesn't.

Would you use your car if you ended up in the right destination 100% - epsilon of the time? Yes, you would.

Or do you suppose this is the best AI will ever get?


Parent wasn't referring to a possible future, but to the present. If we get AI I can trust 100%, that's another discussion. For now I don't see it, and I don't think LLMs are the solution to that problem, but we'll see.

Maybe your analogy would hold if driving and walking took the same amount of time.

Plus "planning, implementing, validating, and reviewing" would be a bit like walking anyway in your analogy.


> I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.

Do you have any advice to share (or resources)? Have you experienced it yourself?



The practical limit is the latency and inference cost. A full planning and validation loop burns a lot of tokens, and waiting for that cycle breaks flow compared to just writing the code.

Only if your flow is writing the actual code.

If your flow state involves elaborating complementary specifications in parallel, it's marvelous.


> high quality code

What does high quality code look like?

> The code still matters.

How so?


Great questions. For me, high quality code is code that: 1) works (is functional, no bugs), 2) is secure (no security vulnerabilities), 3) is extendable (I can quickly and easily build new features with limited refactors).

I argue the code still matters for these 3 reasons. If the code doesn't work, your product won't work. If it's not secure, there are obvious consequences. If you can't build new features quickly, you will end up wasting money/time.



Author here. I kept hitting the same tradeoff with Claude Code: 1) move fast, ship bugs, and slow down feature development due to bad code, or 2) manually review all code and move super slow.

This post covers the workflow I landed on: 1) sub-agents for plan review and code review (each gets fresh context), 2) persistent memory across coding sessions (not just markdown files), 3) a closing session protocol that handles tests/lint/format/cleanup/commit/push/merge conflicts.
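
The closing protocol is basically a fixed checklist run in order. A simplified sketch of the idea (the specific tools here, pytest and ruff, are stand-ins for whatever your repo uses; the real protocol in the post does more):

```python
# Simplified closing-session sketch: run checks in order, halt on the
# first failure, and only commit/push when everything is green.
# pytest/ruff are stand-ins for your repo's test/lint/format tools.
import subprocess, sys

CHECKS = [
    ["pytest", "-q"],                    # tests
    ["ruff", "check", "."],              # lint
    ["ruff", "format", "--check", "."],  # formatting
]

def run(cmd):
    print("$", " ".join(cmd))
    return subprocess.run(cmd).returncode == 0

def close_session(message):
    for check in CHECKS:
        if not run(check):
            sys.exit(f"closing protocol halted at: {' '.join(check)}")
    run(["git", "add", "-A"])
    run(["git", "commit", "-m", message])
    if not run(["git", "push"]):
        run(["git", "pull", "--rebase"])  # branch moved; rebase and retry
        run(["git", "push"])

if __name__ == "__main__":
    close_session(sys.argv[1] if len(sys.argv) > 1 else "close session")
```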

The key insight: your main agent juggles too much. Sub-agents specialize. Each starts fresh, does one job well, and returns findings. Example: the Code Review sub-agent is backed by a detailed document describing my exact code standards. When it spins up, it has a brand-new context window, and its only job is to ensure the `git diff` matches those standards.
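
In pattern form, it looks roughly like this (`spawn_subagent` is a placeholder for however your runtime starts a fresh-context agent, and the standards path is assumed):

```python
# The sub-agent pattern in miniature: a fresh context containing only
# the standing instructions plus the one artifact it needs, with
# findings returned to the main agent.
# spawn_subagent() is a placeholder, not a real API.
from pathlib import Path
import subprocess

def spawn_subagent(system_prompt: str, payload: str) -> str:
    raise NotImplementedError("call your agent runtime here")

def code_review() -> str:
    standards = Path("docs/code-standards.md").read_text()  # assumed path
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    # No accumulated session history: just the standards plus the diff.
    return spawn_subagent(
        system_prompt=f"You are a code reviewer. Enforce these standards:\n{standards}",
        payload=f"Review this diff against the standards:\n{diff}",
    )
```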

There's an interactive demo showing how it works.

Happy to answer questions about the setup.

