My experience with LLM-based chat is so different from what the article (and some friends) describe.
I use LLM chat for a wide range of tasks including coding, writing, brainstorming, learning, etc.
It’s mostly right enough. And so my usage of it has only increased and expanded. I don’t know how much less right it would need to be, or how often, for me to reduce my usage.
Honestly, I think it’s hard to change habits, and LLM chat, at its most useful, is attempting to replace decades-long habits.
Doesn’t mean quality evaluation is bad. It’s what got us where we are today and what will help us get further.
My experience is anecdotal. But I see this divide in nearly all discussions about LLM usage and adoption.
Honestly, this is why your experience is different: your expectations are different (and likely lower). I never find they are "mostly right enough"; I find they are "mostly wrong in ways that range from subtle mistakes to the extremely incorrect". The more subtly they are wrong, the worse I rate their output, because that is what costs me more time when I try to use them.
I want tools that save me time. When I use LLMs I have to carefully write the prompts, read and understand, evaluate, and iterate on the output to get "close enough" then fix it up to be actually correct.
By the time I've done all of that, I probably could have just written it from scratch.
The fact is that typing speed has basically never been the bottleneck for developer productivity, and LLMs don't offer much except "generate the lines of code more quickly", imo.
It's also what you're writing. The GP commenter's bio shows they're a product lead, not a full-time software developer. To make some broad assumptions about what kind of code they're talking about: using an LLM for "write me a Python script that queries the Jira API for all tickets closed in the past week" is a much different task from "change the code in our 15 year old in-house accounting software to handle these tariffs", both in terms of the code that gets written as well as the consequences of the LLM getting it wrong.
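For concreteness, here's a rough sketch of what that first kind of script might look like. The site URL, credentials, and exact JQL here are placeholders and assumptions to check against your own Jira instance, not anyone's actual script:

```python
# Sketch of the "tickets closed in the past week" task, assuming the
# standard Jira Cloud REST search endpoint. URL, credentials, and JQL
# are placeholders, not a tested recipe for any particular Jira site.
from datetime import date, timedelta

def closed_last_week_jql(today=None):
    # Pure helper so the date logic can be sanity-checked on its own.
    today = today or date.today()
    since = today - timedelta(days=7)
    return f'status changed to Done after "{since.isoformat()}"'

def main():
    import requests  # deferred so the JQL helper imports without it
    resp = requests.get(
        "https://example.atlassian.net/rest/api/2/search",  # placeholder site
        params={"jql": closed_last_week_jql(), "fields": "key,assignee"},
        auth=("user@example.com", "api-token"),  # placeholder credentials
        timeout=30,
    )
    resp.raise_for_status()
    for issue in resp.json()["issues"]:
        assignee = (issue["fields"].get("assignee") or {}).get("displayName", "unassigned")
        print(issue["key"], assignee)

if __name__ == "__main__":
    main()
```

Even a script this small has the failure modes discussed elsewhere in the thread: an off-by-one in the date window, a JQL clause that silently excludes someone, or a missing pagination loop (the search endpoint caps results per page) are exactly the kinds of subtle wrongness that produce confidently wrong reports.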
To be clear this isn't a knock on anyone's work, but it does seem to be a source of why "pro-LLM" and "anti-LLM" groups tend to talk past each other.
Sure, but in both cases you are running a real risk of producing incorrect data.
If you're a product lead and you ask an LLM to produce a script that gets that output, you still should verify the output is correct.
Otherwise you run a real risk of seeming like an idiot later when you give a report on "tickets closed in the past week" and your data is completely wrong. "Why hasn't John closed any tickets this week? Is he slacking off?"... "What? He closed more tickets than anyone..." And then it turns out that the unreliable LLM script excluded him for whatever reason.
Of course I understand that people are not going to actually be this careful, because more and more people are trusting LLM output without verifying it. Because it's "right enough", we are becoming complacent.
You're absolutely right. You need to verify the script works, and you need to be able to read the code to see what it's actually doing and if it passes the smell test (as a sibling commenter said, the same way you would for a code snippet off StackOverflow). But ultimately for these bits which are largely rote "take data from API, transform into data format X" tasks, LLMs do a great job getting at least 95% of the way there, in my experience. In a lot of ways they're the perfect job for LLMs: most of the work is just typing (as in, pressing buttons on a keyboard) and passing the right arguments to an API, so why not outsource that to an LLM and verify the output?
The challenge comes when dealing with larger systems. Like an LLM might suggest Library A for accomplishing a task, but if your codebase already has Library B for that, or maybe Library A but a version from 2020 with a different API, you need to make judgment calls about the right approach to take, and the LLM can't help you there. Same with code style, architecture, how future-proof-but-possibly-YAGNI you want your design to be, etc.
I don't think "vibe coding" or making large changes across big code bases really works (or will ever really work), but I do think LLMs are useful for isolated tasks and it's a mistake to totally dismiss them.
> so why not outsource that to an LLM and verify the output?
I mean sure, why not. My argument isn't that it doesn't work; it's that it doesn't really save time.
If you try to have it do big changes, you will be swamped reviewing those changes for correctness for a long time while you build a mental model of the work.
If you have it do small changes, the actual performance improvement is marginal at best, because small changes already don't take much time or effort to create.
I really think that LLM coding has largely just shifted "time spent typing" to "time spent reviewing".
Yes, past a certain size, reviewing is faster than typing. But LLMs are still not producing terribly good output for large amounts of code.
I disagree that it doesn't save time for some classes of problems.
As a concrete recent example, I had to write a Python script which checked for any postgres tables where the primary key was of type 'INT' and print out the max value of the ID for each table. I know broadly how to do this, but I'd have to double check which information_schema table to use, the right names of the columns to use, etc. Plus a refresher on direct use of psycopg2 and the cursor API. Plus the typing itself. I just put that query into an LLM and it gave me exactly what I needed, took about 30-60 seconds total. Between the research and typing that's easily 10 minutes saved, maybe closer to 20 really.
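Sketched out, the script described would look something like the following. The connection string is a placeholder and the information_schema joins are my assumption about the standard catalog layout, so treat this as the shape of the answer rather than the exact script the LLM produced:

```python
# Sketch of the described task: find primary-key columns typed as plain
# INT (int4) and print each one's current max, since an int4 key tops
# out at 2**31 - 1. DSN is a placeholder; requires psycopg2 to run.
INT_MAX = 2**31 - 1  # upper bound of a postgres INT (int4) column

# Primary-key columns in the public schema whose type is plain integer.
PK_QUERY = """
SELECT kcu.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
 AND tc.table_schema = kcu.table_schema
JOIN information_schema.columns c
  ON c.table_schema = kcu.table_schema
 AND c.table_name = kcu.table_name
 AND c.column_name = kcu.column_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND c.data_type = 'integer'
  AND tc.table_schema = 'public'
"""

def max_id_sql(table, column):
    # Identifiers can't go through normal %s parameters; quote them directly.
    return 'SELECT MAX("{}") FROM "{}"'.format(column, table)

def main():
    import psycopg2  # deferred so the helpers import without the driver
    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(PK_QUERY)
        for table, column in cur.fetchall():
            cur.execute(max_id_sql(table, column))
            (max_val,) = cur.fetchone()
            pct = 100.0 * (max_val or 0) / INT_MAX
            print(f"{table}.{column}: max={max_val} ({pct:.1f}% of INT range)")

if __name__ == "__main__":
    main()
```

Double-checking those information_schema joins and the cursor API is exactly the 10-20 minutes of research being skipped.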
And I mean, no, this example isn't worth the $10 trillion or whatever the economy thinks AI is worth, but given that it exists, I'm happy to take advantage of it.
And that's a problem with the workflow, not a problem with the LLM.
It's no different than verifying that the information from your Google search or the Stack Overflow answer you found works. But for some reason there are people who have higher expectations of LLM output.
Having poked at a few database queries with subtle errors that compounded with a flawed understanding resulting in wildly incorrect conclusions, [a realistic expansion of] “write me a Python script that queries the Jira API for all tickets closed in the past week” is exactly the place where I expect those fuckups to come from.
They save me a tremendous amount of time, you just need to be smart about what you try to get them to do. _Busy work_ is what you want to focus on, not anything that takes a ton of domain knowledge and intelligence.
Just as an example from today: I had a huge pile of YAML documents that needed some transformations done to them -- they were pretty simple and obvious, but I just went into Cursor, gave it a before and after and a few notes, and it wrote a Python script in less than 10 seconds that converted everything exactly the way I needed. Did it save me a day of work? Probably not, but probably an hour or so of looking up Python docs and iterating until I worked out all the syntax errors myself. An hour here and an hour there adds up to a _lot_ of saved time.
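The transformation itself wasn't specified, so this sketch invents a trivial one (renaming a key) just to show the shape of the load-transform-dump script being described. The directory glob and key names are hypothetical:

```python
# Shape of the bulk YAML transform described above. The actual edit is
# hypothetical (renaming a top-level 'image_tag' key); the pattern is:
# load every document, apply a pure function, write the result back.
import glob

def transform(doc):
    # Hypothetical example transformation.
    if isinstance(doc, dict) and "image_tag" in doc:
        doc["tag"] = doc.pop("image_tag")
    return doc

def main():
    import yaml  # PyYAML; deferred so transform() is importable without it
    for path in glob.glob("manifests/*.yaml"):  # hypothetical directory
        with open(path) as f:
            docs = list(yaml.safe_load_all(f))
        with open(path, "w") as f:
            yaml.safe_dump_all([transform(d) for d in docs], f, sort_keys=False)

if __name__ == "__main__":
    main()
```

Keeping the transform a pure function makes the before/after easy to eyeball against the examples you fed the LLM.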
I spent more time just writing this comment than I did asking Cursor to write and run that script for me.
Other things I had an LLM do for me just _today_: fix a GitHub Action that was failing, and knock out a developer README for a Helm chart documenting what all the values do -- that's one of the kinds of things where it gets a lot of stuff wrong, but typing speed _is_ the bottleneck. It took me a minute or so to fix the stuff it misunderstood, but the formatting and the bulk of it was fine.
Isn't the article saying it's mainly useful for SW?
I'm an electrical engineer, and the only cases where LLMs were useful for me were developing Python scripts or translating text into a foreign language that I speak fluently.
They are absolutely garbage for anything electrical-engineering related, even coding RTL.
This. I use LLMs for some tasks, but for more complex issues, I do it myself. I tried to use it for a project by defining each task as clearly as possible, and I spent weeks trying to come up with something useful. Mind you, I achieved 80% of what I wanted after iterating and "telling" the chat that their answers were wrong, and going over the code to double-check if everything was okay. Now I use it for specific, simple tasks if these are work-related, and then use it for random kinds of stuff that I can verify by going to the actual source.
> Mind you, I achieved 80% of what I wanted after iterating and "telling" the chat that their answers were wrong, and going over the code to double-check if everything was okay
I very often read things like this, and I'm surprised how often the person estimates "around 80%" of the work was good. It feels so perfectly tailored to the Pareto Principle.
The LLM does the easy 80% (which we usually say takes 20% of the time anyway). Then the human has to go do the harder remaining 20%, only with a much smaller mental model of how the original 80% fits together.
Rather than a simple difference in expectations (which could explain your positive experience vs. others'), it seems to be a "comfort with uncertainty" difference that, from what I can tell, is a personality trait!
You're comfortable with the uncertainty, and accommodate it in your use and expectations. You're left feeling good about the experience, within that uncertainty. Others are repelled by uncertainty, so will have a negative experience, regardless of how well it may work for a subset of tasks they try, because that repulsive uncertainty is always present.
I think it would be interesting (and possibly very useful/profitable for the marketing/UI departments of companies that use AI) to find the relation between perceived AI usefulness and the results of some of the "standard" personality tests.
It's not comfort with uncertainty, it's discomfort with the predictable effects of uncertainty.
I don't want to have to waste time tidying up after an unreliable software tool which is being sold as saving me time. I don't want to be misled by hallucinated fantasies that have no relationship to reality. (See also - lawyers getting laughed out of courtrooms because of this.)
I don't want to have to cancel a travel booking because an AI agent booked me a holiday in Angkor Wat when I wanted a train ticket to Crystal Palace in South London.
Hypotheticals? Not even slightly. Ask anyone who's lost their KDP author account on Amazon or been locked out of Meta because of AI moderation errors.
This is common sense, not some kind of personality flaw.
I'm happy using LLMs for coding and research, but it's also clear the technology is in perpetual beta - at best - and is being wildly oversold.
Normal software operating with this level of reliability would be called "very buggy."
But apparently LLMs get a pass because one day they might not be as buggy as they are today.
Which - if you think about it - is ridiculous, even by the usual standards of the software industry.
I wonder if this is like dishwasher usage. As a kid growing up, we never used the dishwasher; it was just the drying rack. The reason was you had to rinse off the big stuff anyway, and then the resulting wash quality was poor. You'd often get a fork with rice still stuck between the tines, which was unacceptable.
As a grown-up now, I use the dishwasher for everything that is permitted to go in it. I still have to rinse off plates first, and occasionally I do see rice between the tines of a fork that I then have to clean manually. But I'm now comfortable knowing that it won't clean as well as I could by hand; it does a good enough job -- and in some ways a much better job (it uses much hotter water than I do by hand). I don't know if my mom could ever really be comfortable with it, though.
This is a funny example since, for a long time anyway, dishwashers have been much better at actually sanitizing dishes due to the much higher temperatures that can be used vs hand washing. I don't feel like hand washed dishes are truly clean. Oh you rubbed it with a nasty dish rag and water cool enough to touch? greeeeaaaaat
The analogy still falls apart because the main theme is "Learn how to work with AI so you won't be left behind in the future!" The equivalent in that case is wasting time pointlessly learning the quirks of old dishwashers when new dishwashers won't have them in the future.
It's fine if LLMs are used casually, for things that don't affect anyone but the user. But when someone plugs an LLM into Social Security or other governmental bodies to take action on real human beings, then disaster awaits. Nobody is going to care if the LLM got it wrong if you're just chatting with it or writing some wonky code that doesn't matter in the real world, but when your government check is reduced or deleted by an LLM that is hallucinating, then the real problems start. These things should not be trusted with anything but the least consequential actions an individual would use it for.
^This - we're trying to use one to partially automate some system engineering type activities.
It's great for reviews where any given reviewer could be expected to have a misunderstanding of certain details or skip a section (RAG somewhat helps this) - but it's frustrating for artifact generation where missing details cascade through the project.
As great as the technology is (right now), it seems so far from reliable business-process automation.
Charitably, your low expectations are probably the source of your finding them acceptable.
It’s also possible - and you should not take this as an insult, it’s just the way it is - you may not know enough about the subjects of your interactions to really spot how wrong they are.
However the cases you list - brainstorming - don’t really care about wrong answers.
Coding is in the eye of the beholder, but for anything that isn’t junk glue code, scripts or low-complexity web stuff, I find the output of LLMs just short of horrendous.
The code that the best frontier models produce is definitely good if you prompt it with what you believe "good" means, with the caveat that code quality depends heavily on the language -- Python, Typescript/Javascript, Java and C are quite good, Rust, C++ and Go tend to be decent to weak depending on the specific model, and other languages are poor.
The C output is absolutely terrible. I cannot fathom an experienced C coder finding otherwise for anything non-trivial. The code is full of things like returning pointers to stack memory, poor buffer-size discipline, etc.
Yeah, I've had mixed results with Rust. Oddly it's been most helpful for me so far in getting Rust code running in WASM without having to know anything about WASM, which I have found delightful.
I really don't understand people who are down on LLMs.
In terms of code output, I have gone from the productivity of a Sr. Engineer to that of a team with 0.8 of a Sr. Engineer, 5 Jr. Engineers, and one dude solely dedicated to reading/creating documentation.
Unlike a lot of my fellow engineers who are also from traditional CS backgrounds and haven't worked in revenue restricted startup environments, I also have been VERY into interpreted languages like ruby in the past.
Now compiled languages are even better: from a velocity perspective, I think they are on par with interpreted languages for prototyping and have had their last weakness removed.
It's both exciting and scary, I can't believe how people are still sleep walking in this environment and don't realize we are in a different world. Once again the human inability to "gut reason" about exponentials is going to screw us all over.
Within the population that writes code there are a small number of successful people who approach the topic in a ~purely mathematical approach, and a small number of successful people that approach writing code in a ~purely linguistic approach. Most people fall somewhere in the middle.
Those who are on the MOST extreme end of the mathematical side and are linguistically bereft HATE LLMs and effectively cannot use them.
My guess is that the HN population will tend to show stronger reactions against LLMs because it was heavily seeded with functional programmers, which I think has a concentration of the successful, extremely math-focused. I worked for several years in a purely functional shop and that was my observation: Elixir, Haskell, Ramda.
There is this interesting thing called the Paradox of Automation where increasing automation increases the importance of human intervention. We are trying this out on a societal level. It will be.. interesting, to say the least.
Also, congratulations on becoming a team. I sure hope you have the mental bandwidth to check all that output carefully. If so, doubly congrats, because you might be the smartest human that ever lived.
I appreciate your incredulity and snark! Dismissing without engagement is a fun ability to exercise. I look forward to talking past each other going forward :-)
HackerNews typically doesn't appreciate that type of engagement and will ban accounts for it, as it is just personal rather than a factual wrestling with the point of discussion. I see you are new here, and I would encourage you not to continue in the patterns you show.
At core, I think perhaps we have a different interpretation of what 20% of a Sr. Engineer can accomplish and what Jr. Devs are capable of accomplishing.
To be fair to your point, I think one of the enablers is that I actually enjoy working longer hours now so my net time engaging with code has gone up as well.
But I'm from the old school and I've always preferred time in code vs having outside hobbies, that's been true since the 90s.
I find code reviews relaxing and enjoyable and not particularly mentally taxing for 90% of what a decent jr. dev writes. I find it a nice little break from working on problems that can actually be classified as "hard".
Coincidentally, I've worked in human-in-the-loop automation for quite a long time; making Sr. individuals more efficient with their time and removing busy work has been a big focus.
There is a lot in that space to consider from a human-factors perspective. The intersection of creation vs. editing is a big one, decomposing problems for sure, and each individual seems to have different capabilities and natural bents in that regard. I've long been a thought-dump-and-edit person, and that's part of what I attribute my high personal productivity to.
I confess I might be showing signs of unlawful thought patterns. I will correct that, fellowniusmonk. Thanks for pointing that out.
I am in the "code is not an asset, it's a liability"-camp and our recently acquired ability to swiftly defecate metric tons of it is not something I am particularly thrilled about. In fact, I find "senior" engineers using LoC as a productivity metric highly suspect - at best. I thought we passed that phase a decade or two ago. Not saying you are one, but in the spirit of talking past each other I thought it prudent to put up a good straw man.
All in all to be completely honest I find it hard to parse your original point so I concur I wasn't engaging properly. To be fair you opened with "in terms of code output" so that's what triggered me I guess.
> Those who are on the MOST extreme end of the mathematic side and are linguistically bereft HATE LLM's and effectively cannot use them.
This is an interesting observation. It at least aligns with my experience. I wouldn't say I'm "linguistically bereft" lol, but I do lean more toward the "functional programming is beautiful" side. I even have a degree in math. I'm not totally down on LLM coding, but I do fall more on the unfavorable side. I mostly just hate the idea of having a bunch of code I don't fully understand, but am also responsible for.
I do use them, and find them helpful. But the idea of fully giving control of my codebase to LLM agents, like some people are suggesting, repels me.
Yeah, I certainly don't mean to imply that's the only reason. There are MANY reasons to hate LLMs and people all up and down the spectrum hate them for any number of reasons. I definitely think utility is still language specific as well (LLMs are just terrible with some languages), project specific, etc.
I think currently there are prompts and approaches that help ensure functions stay small and easy to reason about, but it's very context-dependent. Certainly any language or framework that has a large amount of boilerplate will be less painful to work with if you hate boilerplate, though I think that could arguably be increasing enshittification in a sense. The people who say tons of code is being generated and it will all come crashing down in an unmaintainable mess... I do kinda agree.
I'm glad I am not writing code in medical/flight-control systems or something like that. I think LLMs could be used in that context, but idk if they would save time or add to it.
Certain types of tasks require greater precision. Like in working with wood, framing a house is fine but building a dovetailed cabinet drawer is not on the table if that makes sense?
My impression is that, at this point, work in high-precision environments is still in the human domain and out of reach for LLMs. Multi-agent approaches, maybe; treating humans like the final agent in a multi-agent approach, maybe, idk. I'm not working on any life-or-death libraries or projects ATM, but I do feel good about test coverage, so maybe that's good enough in a lot of cases.
As for people who say non-devs can dev with AI or Cursor: I think at this point that's just a way of getting non-technical people to burn tokens and hand over more money, but idk if that will still be true in six months, you know?
In my space, "mostly right enough" isn't useful. Particularly when that means that the errors are subtle and I might miss them. I can't write whitepapers that tell people to do things that would result in major losses.
IMHO it's a great summarizing search engine. I now don't have to click on a link to go to the original source - Gemini just hands me a useful summary. Ask AI to do something specific that requires GI (General Intelligence) and your mileage may vary. So as OpenAI and Google suck in all your content (creators), you are going to find yourself deriving less and less revenue from visits to your site. Just sayin.
Gemini routinely misreports the contents in its summaries. I have found it actually reversing things on a regular basis: the summary says no and the source says yes.
DuckDuckGo, which uses Bing I think, now has Bing's AI summaries instead of the goddamn content in search results, which makes evaluating the search results at a glance useless!
For what it's worth, we produce our own summaries, and you can turn them off if you don't like them. We also offer noai.duckduckgo.com, which turns all of our AI features off automatically.