
> But there are serious limits. [Your coding agents] will lie to you, they don't really understand things, and they often generate bad code.

I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.
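
To make that concrete, here's a minimal sketch of the loop I mean. Everything here is hypothetical: `run_agent` is a placeholder for whatever model or agent call you use, not a real library function.

```python
# Hypothetical orchestration loop: plan, implement, then iterate on
# validate + review until both pass (or we run out of rounds).
# run_agent() is a placeholder, not a real library function.

def run_agent(role: str, task: str) -> str:
    raise NotImplementedError("wire this to your model/agent of choice")

def orchestrate(feature: str, max_rounds: int = 3) -> str:
    plan = run_agent("planner", f"Write an implementation plan for: {feature}")
    code = run_agent("implementer", f"Implement this plan:\n{plan}")
    for _ in range(max_rounds):
        defects = run_agent("validator", f"List concrete defects in:\n{code}")
        review = run_agent("reviewer", f"Review against our standards:\n{code}")
        if "none" in defects.lower() and "approved" in review.lower():
            break
        code = run_agent("implementer",
                         f"Revise.\nDefects: {defects}\nReview: {review}\nCode:\n{code}")
    return code
```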

It's still engineering work. The code still matters. It's just a different tool for writing the code.

I'd compare the difference between manually coding and operating a coding agent to the difference between a handsaw and a chainsaw - the end result is the same but the method is very different.


> the end result is the same but the method is very different.

I don't think anyone really cares at all about LLM code that is the exact same end result as the hand-written version.

It's just that in reality the LLM version is almost never the same as the hand-written version; it's orders of magnitude worse.


So far, I haven't seen any comparison of AI (using the best available models) and hand-written code that illustrates what you are saying, especially the "it's orders of magnitude worse" part.

> it's orders of magnitude worse

This is not my experience *at all*. Maybe models from like 18+ months ago would produce really bad code, but in general most coding agents are amazing at finding existing code and replicating the current patterns. My job as the operator then is to direct the coding agent to improve whatever it doesn't do well.


In the limited use cases where I've used it, it's alright / good enough. But it has lots of examples (of my own) to work off of.

But a lot of people don't think like this, and we must come to the unavoidable conclusion that the LLM code is better than what they are used to, be it their own code or their colleagues'.

Speak for yourself.

I mean yes, I am speaking for myself. I am drowning in mountains of LLM slop patches lol. I WISH people were using LLMs as "just another tool to generate code, akin to a vim vs emacs discussion."

I'm so sick of having 1000-line diffs dumped on me by coworkers who have generated whole internal libraries that handle very complicated operations that are difficult to verify. And you just know they spent almost no time properly testing and verifying, since it was zero effort to generate it all in the first place.

LLMs are an amplifier. The great get greater, and the lazy get lazier.

Considering the seemingly increasing frequency of high-severity bugs at FAANG companies in the last year, I think perhaps "the great getting greater" is not actually the case.

That's assuming FAANG engineers are actually great.

They're far more likely to be above average, I would say.

Above average in tolerance for immoral business models, certainly.

I happen to think that's largely a self-delusion which nobody is immune to, no matter how smart you are (or think you are).

I've heard this from a few smart people whom I know really well. They strongly believe this; they also believe that most people are deluding themselves, but not them: they're in the actually-great group. And when I pointed out the sloppiness of their LLM-assisted work, they wouldn't have any of it.

I'm specifically talking about experienced programmers who now let LLMs write the majority of their code.


All on my own, I hand-craft pretty good code, and I do it pretty fast. But one person is finite, and the amount of software to write is large.

If you add a second, skilled programmer, just having two people communicating imperfectly drops quality to 90% of the base.

If I add an LLM instead, it drops to maybe 80% of my base quality. But it's still not bad. I'm reading the diffs. There are tests and fancy property tests and even more documentation explaining constraints that Claude would otherwise miss.
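
For the property tests, picture something like this (hypothesis is the real library; `slugify` is a made-up example of the kind of function Claude might write):

```python
# Property tests check invariants over generated inputs rather than
# hand-picked cases, which catches the edge cases an LLM (or I) would miss.
from hypothesis import given, strategies as st

def slugify(s: str) -> str:
    # Example function under test (made up for illustration).
    return "-".join(s.lower().split())

@given(st.text())
def test_slugify_is_idempotent(s):
    # Applying slugify twice should change nothing the second time.
    assert slugify(slugify(s)) == slugify(s)

@given(st.text())
def test_slugify_contains_no_whitespace(s):
    assert not any(c.isspace() for c in slugify(s))
```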

So the question is if I can get 2x the features at 80% of the quality, how does that 80% compare to what the engineering problem requires?


I was somewhat surprised to find that the differentiator isn't how smart someone is, but their ability to accurately assess when they actually know something.

From my own observations, the people I previously found to be sloppy in their thinking and other work correlate almost perfectly with those who seem most eager to praise LLMs.

It's almost as if the ability to identify bullshit makes you critical of the ultimate bullshit generator.


This is very true. My biggest frustration is people who use LLMs to generate code, and then don't use LLMs to refine that code. That is how you end up with slop. I would estimate that as an SDE I spend about 30% of my time reviewing and refining my own code, and I would encourage anyone operating a coding agent to still spend 30% figuring out how to improve the code before shipping.

> Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing

Lots of times I could just write it myself and be done with it


Sure, and lots of times I can walk places. That doesn't mean bikes, cars, trains, and planes aren't incredibly useful. They let me achieve things I can't in other ways, for example transporting cargo without a team of people to help me. Just like AI coding.

Yet replacing walking with cars is often cited as one of the reasons for many of society's ills.

Yet no one seriously declares motor vehicles as useless.

Many who live in sufficiently walkable areas don't have one and are actively opposed to getting one.

There is a middle road.

America went full car to a point where just going to the shops from the suburbs is a car drive. Crossing the ROAD needs a car in way too many places.

There are cities where you can find a shop for essentials within walking distance; bigger shops need a short to medium drive, but can still be walked to if you really want to.


Would you still use your car if you ended up in the wrong destination half the time?

Yes, because I can drive to the other end of the state in an afternoon. Then if I get lost, I can just course correct.

Generating lots of pollution, cost, jams, noise and accidents globally. Not all cities need to be made for cars, right tool for the job etc.

Have fun getting stuck in a loop when it insists your destination exists in a place it doesn't.

Would you use your car if you ended up in the right destination 100% - epsilon of the time? Yes, you would.

Or do you suppose this is the best AI will ever get?


Parent wasn't referring to a possible future, but to the present. If we get AI I can trust 100%, that's another discussion. For now I don't see it, and I don't think LLMs are the solution to that problem, but we'll see.

Maybe your analogy would hold if driving and walking took the same amount of time.

Plus "planning, implementing, validating, and reviewing" would be a bit like walking anyway in your analogy.


> I think that really high quality code can be created via coding agents. Not in one prompt, but instead an orchestration of planning, implementing, validating, and reviewing.

Do you have any advice to share (or resources)? Have you experienced it yourself?



The practical limit is the latency and inference cost. A full planning and validation loop burns a lot of tokens, and waiting for that cycle breaks flow compared to just writing the code.

Only if your flow is writing the actual code.

If your flow state involves elaborating complementary specifications in parallel, it's marvelous.


> high quality code

What does high quality code look like?

> The code still matters.

How so?


Great questions. For me, high quality code is code that: 1) works (is functional, no bugs), 2) is secure (no security vulnerabilities), 3) is extendable (I can quickly and easily build new features with limited refactors).

I argue the code still matters for these 3 reasons. If the code doesn't work, your product won't work. If it's not secure, there are obvious consequences. If you can't build new features quickly, you will end up wasting money/time.



Author here. I kept hitting the same tradeoff with Claude Code: 1) move fast, ship bugs, and slow down feature development due to bad code, or 2) manually review all code and move super slow.

This post covers the workflow I landed on: 1) sub-agents for plan review and code review (each gets fresh context), 2) persistent memory across coding sessions (not just markdown files), 3) a closing session protocol that handles tests/lint/format/cleanup/commit/push/merge conflicts.
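
The closing protocol is basically a fixed checklist run in order. A simplified sketch of the idea (the specific tools here, pytest and ruff, are stand-ins for whatever your repo uses; the real protocol in the post does more):

```python
# Simplified closing-session sketch: run checks in order, halt on the
# first failure, and only commit/push when everything is green.
# pytest/ruff are stand-ins for your repo's test/lint/format tools.
import subprocess, sys

CHECKS = [
    ["pytest", "-q"],                    # tests
    ["ruff", "check", "."],              # lint
    ["ruff", "format", "--check", "."],  # formatting
]

def run(cmd):
    print("$", " ".join(cmd))
    return subprocess.run(cmd).returncode == 0

def close_session(message):
    for check in CHECKS:
        if not run(check):
            sys.exit(f"closing protocol halted at: {' '.join(check)}")
    run(["git", "add", "-A"])
    run(["git", "commit", "-m", message])
    if not run(["git", "push"]):
        run(["git", "pull", "--rebase"])  # branch moved; rebase and retry
        run(["git", "push"])

if __name__ == "__main__":
    close_session(sys.argv[1] if len(sys.argv) > 1 else "close session")
```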

The key insight: your main agent juggles too much. Sub-agents specialize. Each starts fresh, does one job well, and returns findings. Example: the Code Review sub-agent is backed by a detailed document describing my exact code standards. When it spins up, it has a brand-new context window, and its only job is to ensure the `git diff` matches those standards.
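
In pattern form, it looks roughly like this (`spawn_subagent` is a placeholder for however your runtime starts a fresh-context agent, and the standards path is assumed):

```python
# The sub-agent pattern in miniature: a fresh context containing only
# the standing instructions plus the one artifact it needs, with
# findings returned to the main agent.
# spawn_subagent() is a placeholder, not a real API.
from pathlib import Path
import subprocess

def spawn_subagent(system_prompt: str, payload: str) -> str:
    raise NotImplementedError("call your agent runtime here")

def code_review() -> str:
    standards = Path("docs/code-standards.md").read_text()  # assumed path
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    # No accumulated session history: just the standards plus the diff.
    return spawn_subagent(
        system_prompt=f"You are a code reviewer. Enforce these standards:\n{standards}",
        payload=f"Review this diff against the standards:\n{diff}",
    )
```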

There's an interactive demo showing how it works.

Happy to answer questions about the setup.

