
I use LLM-based autocomplete in my IDE, and it’s not taking away my job unless/until it improves by multiple orders of magnitude. It’s good at filling in boilerplate, but even for that I have to carefully check its output because it can make little errors even when I feel like what I want should be obvious. The article is absolutely correct in saying you have to be critical of its output.

I would say it improves my productivity by maybe 5%, which is an incredible achievement. I’m already getting to where coding without it feels very tedious.



I find it increases my productivity about 5-10% when working with the technologies I'm the most familiar with and use regularly (Elixir, Phoenix, JavaScript, general web dev.) But when I'm doing something unfamiliar and new, it's more like 90%. It's incredible.

Recently at work, for example, I've been setting up a bunch of stuff with some new technologies and libraries that I'd never really used before. Without ChatGPT I'd have spent hours if not days poring through tedious documentation and outdated tutorials while trying to hack something together in an agonising process of trial and error. But ChatGPT gave me a fantastic proof-of-concept app that has everything I needed to get started. It's been enormously helpful and I'm convinced it saved me days of work. This technology is miraculous.

As for my job security... well, I think I'm safe for now; ChatGPT sped me up in this instance but the generated app still needs a skilled programmer to edit it, test it and deploy it.

On the other hand I am slightly concerned that ChatGPT will destroy my side income from selling programming courses... so if you're a Rails developer who wants to learn Elixir and Phoenix, please check out my course Phoenix on Rails before we're both replaced by robots: PhoenixOnRails.com

(Sorry for the self promotion but the code ELIXIRFORUM will give a $10 discount.)


The problem is the hallucinations. I also wasted a few hours trying to work on solutions with GPT where it just kept making up parameters and random functions.


So much this. The thing hallucinates far more than the hyperventilation seems willing to acknowledge.

You really need to be quite competent in the thing you're asking it to do in order to ferret out the hallucinations, which greatly diminishes the potency of GPT in the hands of someone who has no knowledge of the relevant language/runtime/problem domain/etc.


Hallucination is less of a problem for programming than for other use cases, because ultimately the program must be run.


Not if the hallucination introduces runtime errors that can't be identified a priori with any sort of static analysis or compilation/interpreting stage.

But no, you're fundamentally right. It just goes to the question of whether an LLM assistant can in any sense replace or displace human programmers, or save time for human programmers. The answer seems to be somewhat, and in certain cases, but not much else.

If I already know the technology I'm querying GPT about, I'm going to spend at least some time identifying its hallucinations or realising that it introduced some. I might have been better off just doing it myself. If I don't know the technology I'm querying GPT about, I'm going to be impacted by its hallucinations but will also have to spend time figuring out what the hallucinations are and why this unfamiliar code sample doesn't work.


A colleague of mine had trouble getting an email from Google Docs into listmonk.

She asked GPT to help her get an HTML version, since apparently she got stuck with the WYSIWYG editor.

However, GPT gave back a full HTML document, including head and body. Pasting that into listmonk broke the entire webpage. Then she freaked out and told me listmonk sucks :)


It's a huge problem on many levels, but in this case it's so much more time-intensive, diminishing its usefulness.


Try paying for GPT-4 - it barely hallucinates at all, at least as far as I've noticed.


I use GPT-4, and it definitely does if you do things that are a bit off the beaten path.


It referenced a made-up function I needed (one that should probably exist, lol) in BrightScript. The letdown after realizing as much was painful.


It did the same thing to me with docker compose the other day.

For features that probably should exist but don't it does a really good job of sending you on a wild goose chase.


This has happened to me with Django/DRF twice. I've just accepted that it's more efficient to read and internalize the documentation.


Did you try asking it to write the function?


We should probably implement a lot of the hallucinated methods; consider them the obvious missing pieces of our APIs.


This is the WORST feeling if you use co-pilot in an IDE. It's so incredibly disappointing.


There's a lot of things which could be done to improve this:

1) It could use the JSONformer idea [0] where we have a model of the language which determines what are the valid next tokens; we only ask it to supply a token when the language model gives us a choice, and when considering possible next tokens, we immediately ignore any which are invalid given the model. This could go beyond mere syntax to actually considering the APIs/etc which exist, so if the LLM has already generated tokens "import java.util.", then it could only generate a completion which was a public class (or subpackage) of "java.util.". Maybe something like language servers could help here.

2) Every output it generates, automatically compile and test it before showing it to the user. If compile/test fails, give it a chance to fix its mistake. If it gets stuck in a loop, or isn't getting anywhere after several attempts, fall back to next most likely output, and repeat. If after a while we still aren't getting anywhere, it can show the user its attempts (in case they give the user any idea).

[0] https://github.com/1rgs/jsonformer
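A toy sketch of idea (1), with the real language model and the grammar/API model replaced by stubs (the function name and token sets here are invented for illustration): the model proposes candidates in order of likelihood, and the constraint filters out anything the language's grammar or API surface says is invalid.

```python
def pick_token(ranked_candidates, valid_next):
    """Return the model's highest-ranked candidate that the grammar allows.

    ranked_candidates: the model's guesses, most likely first (stand-in for logits).
    valid_next: the set of tokens the language/API model says are legal here.
    """
    for tok in ranked_candidates:
        if tok in valid_next:
            return tok
    raise ValueError("grammar allows no candidate; fall back to next-best output")

# After "import java.util.", only real members of java.util are legal, so a
# hallucinated "ArrayLst" is skipped even if the model ranks it first.
valid = {"ArrayList", "HashMap", "List"}
ranked = ["ArrayLst", "ArrayList", "Vector"]
assert pick_token(ranked, valid) == "ArrayList"
```

A real implementation would apply the mask over the model's logits before sampling, as JSONformer does for JSON grammars, rather than filtering a decoded candidate list.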


Integration with linters is going to be the next stage in generative coding.

It should suggest, lint the suggestion in the background, and if it passes offer the suggestion and if not provide the linting issues output to rework the suggestion.

In general, token costs going down will in turn increase the number of multi-pass generation systems over single-pass systems, which is going to improve dramatically.

Combine all that with persistent memory stores that can provide additional in-context guidance on working with your codebase (and with you), and it's going to be quite a different experience than it is today.

And at the current rate of advancement, that's maybe going to be how things will look within a year or two.
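A minimal sketch of that suggest-then-lint loop, using Python's own compile() as a stand-in linter and a stubbed model (both are assumptions; a real system would call an LLM and an actual linter):

```python
def lint(code):
    """Stand-in linter: report (ok, message) using Python's own syntax check."""
    try:
        compile(code, "<suggestion>", "exec")
        return True, ""
    except SyntaxError as e:
        return False, str(e)

def generate_with_lint(generate, prompt, max_attempts=3):
    """Ask for a suggestion, lint it in the background, and feed lint errors
    back into the prompt for rework until it passes or we give up."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(prompt + feedback)
        ok, message = lint(code)
        if ok:
            return code
        feedback = f"\nThe previous attempt failed linting: {message}. Fix it."
    return None  # give up and surface the attempts to the user

# Stub "model": emits broken code first, fixed code once it sees feedback.
def fake_model(prompt):
    if "failed linting" not in prompt:
        return "def f(:\n  pass"      # syntax error on the first try
    return "def f():\n    pass"       # reworked after seeing the lint output

assert generate_with_lint(fake_model, "write f") == "def f():\n    pass"
```

The same multi-pass structure generalizes: swap `lint` for a compile-and-run-tests step and you get idea (2) from the comment above.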


> It should suggest, lint the suggestion in the background

This makes a big difference; I'm building code-writing tools at the moment.

Injecting results from a language server while it's generating would be huge imo - same as giving humans autocomplete & hints.


You wouldn’t believe what you can get past a linter. You need test cases that cover the intention of the code, but I’ve also seen well-tested code behave totally counter to its purpose.


Yeah it can give you something that works out of the box, but fixing it requires even more effort

Better to ask it for a bunch of small things and piece them together


Yes. Start small and build up.

I’ve found it to be very forgetful and have to work function-by-function, giving it the current code as part of the next prompt. Otherwise it randomly changes class names, invents new bits that weren’t there before or forgets entire chunks of functionality.

It’s a good discipline as I have to work out exactly what I want to achieve first and then build it up piece by piece. A great way to learn a new framework or language.

It also sometimes picks convoluted ways of doing things, so regularly asking whether there’s a simpler way of doing things can be useful.


IIRC its "memory" (actually input size, it remembers by taking its previous output as input) is only about 500 tokens, and that has to contain both your prompt and the beginning of the answer to hold relevance towards the end of its answer. So yes, it can't make anything bigger than maybe a function or two with any consistency. Writing a whole program is just not possible for an LLM without some other knowledge store for it to cross reference, and even then I have my doubts.


This isn't quite accurate.

GPT-3.5 is 4k tokens and has a 16k version; GPT-4 is 8k and has a 32k version.

You are correct that this needs to account for both input and output. I suspect that when you feed ChatGPT longer prompts, it may try to use the 16k/32k models when it makes sense.
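Whatever the exact limit, the practical upshot is the same: prompt plus answer share one token budget, so tooling has to trim the oldest turns to keep recent context in view. A rough sketch (token counting is crudely approximated by whitespace splitting, which real tokenizers don't do):

```python
def trim_history(turns, budget):
    """Keep the most recent conversation turns that fit in the token budget.

    turns: chronological list of strings (flattened prompt/response pairs).
    budget: max total "tokens", approximated here as whitespace-split words.
    """
    kept = []
    used = 0
    for turn in reversed(turns):          # walk newest-first
        cost = len(turn.split())
        if used + cost > budget:
            break                         # oldest turns fall off the window
        kept.append(turn)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["a b c", "d e f g", "h i"]
assert trim_history(history, 6) == ["d e f g", "h i"]
```

This is why the function-by-function workflow described above works: each prompt re-supplies the current code, so nothing essential depends on turns that have scrolled out of the window.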


Were you using GPT-3.5 or GPT-4?

GPT-4 reduces hallucinations by at least an order of magnitude, and hasn't failed me yet.


This is my experience too. Paying $20/month for GPT-4 has been absolutely worth it. It barely hallucinates at all; the results aren't always perfect (and the September 2021 knowledge cut-off can be frustrating given how quickly things get out of date in the programming world) but it's more than good enough. I don't remember how I ever got by without it.


You could save some money by using GPT-4’s API and a self hosted frontend like YakGPT.


You can also just use OpenAI’s Playground.


What's the problem with hallucinations when your editor can tell you automatically if the code compiles or not?


Sometimes the hallucinations compile.


It's nice that we've taught the robots to make off-by-one errors just like a real developer.


> Sometimes the hallucinations compile.

In that case they become complications.


Have it write the unit tests first.


Fuck it, ship it!


This is what ChatGPT and GPT-4 are good for: iterating quickly in an unfamiliar ecosystem. Picking up frameworks now feels like a ChatGPT superpower. It doesn't remove the need for reasoning, and I've seen some scary bugs introduced when you're not carefully monitoring what the AI is outputting.

Basically, these days before I dig into documentation I ask "How do I do X with Y framework in Language Z" and if it's pre-2021 tech it works amazingly well.


Especially when you know something similar. Like porting between front-end frameworks. Just sketch out some React code and ask it to port to Vue - you can even tell it to explain the Vue code line-by-line and ask follow up questions, ex "Oh, so $FEATURE is like hooks in React?" "Yes, but ..."


Funnily enough I find the opposite: it's most effective for me when using something familiar (though nowhere near 90%). If I'm familiar with it, I can figure out pretty quickly what's a hallucination and what's not, and to what extent it is (sometimes it's just a few values that need changing, sometimes it's completely wrong with almost no basis in reality). The time I spend attempting to fix its output in unfamiliar territory makes it more of a pain than it's worth for me.


I agree with 5%. That said, I've found rubber duck debugging to be an exceptionally effective use case for ChatGPT. Often it will surprise me by pinpointing the solution outright, but I'll always be making progress by clarifying my own thinking.


Yeah, it's an amazing rubber duck.

Even in the IDE I'll sometimes just write comments like (arbitrary example out of thin air):

  // Q: Should we use a for loop or a while loop here?
  // A:

It doesn't always have a great answer, but as you say, it almost always helps my own thinking about it, which is often much more valuable.


Fascinating! Can I ask how you use ChatGPT for debugging? Are the bugs you've used it with more high-level, "this is what's happening" kind of things? Or could you give an example?


It's similar to how you would describe a problem to a coworker on Slack. I give it some context, then I state the problem or paste in the error message/stacktrace. I might also list steps that I've taken already. Then I follow ChatGPT's suggestions to troubleshoot. Sometimes I need to supplement with my own ideas, but usually that's enough to iteratively bisect the issue.


Just give it the log output or error code along with your code; in a lot of cases it comes up with decent solutions.


Given that AI (today) has no direct agency and can't create anything unless directly prompted, and that engineering is largely domain discovery and resolving unforeseen edges in a domain, I don't think we'll see a time when generative AI alone is more than an assistant. It will likely improve, but since it can only react to what it is given/told/fed and inherently can't innovate, create, or discover (despite the illusion that it might, born of the user's ignorance of the details the AI can produce), it will be an increasingly powerful adjunct to increasingly capable engineers.

The problems we solve will be more interesting and we will produce better software faster. But I've never seen the world as lacking problems to solve, only the capacity to solve them well or quickly enough, given the iterative time it takes to develop software. I think this current trend of generative AI will help improve that situation, but it will likely make software engineers even more in demand as the possible uses of software become more ubiquitous and the per-unit cost of development goes down.


Best way to check LLM output is to make it write its own tests and do TDD. Obviously someone has to check the tests but that is a 1% of the effort problem.
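A toy sketch of that workflow: the human reviews only the test cases, and a generated implementation (stubbed here as plain lambdas) is accepted only if it passes them all. Everything here is illustrative, not a real harness.

```python
def accept(candidate, tests):
    """Run every human-reviewed test against a generated candidate."""
    return all(test(candidate) for test in tests)

# The tests are the part a human actually reads and signs off on
# (the "1% of the effort" in the comment above).
tests = [
    lambda f: f(2, 3) == 5,
    lambda f: f(-1, 1) == 0,
]

good = lambda a, b: a + b   # pretend this came from the model
bad = lambda a, b: a - b    # a hallucinated "solution" is caught by the tests

assert accept(good, tests)
assert not accept(bad, tests)
```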


One percent? Are you really suggesting with a straight face that generated code could provide the other ninety-nine percent? If not, say what you actually mean. Don't bullshit us with trash numbers.


Let me guess, ENTJ right?


No, I was just drinking at the time (so for a bit I may have been an alcohol induced ENTJ)... My reply was a little rude. I apologize for that. That wasn't a productive way for me to express my disagreement, and I should have chosen my words more thoughtfully.


I've also been using an LLM autocomplete for a few months, and yeah, it's pretty nice. My spouse was able to use it to write an Easter egg into a game while I was doing housework the other day.


It writes my unit tests super fast, and my method comments.

It's hard to say if it improves my productivity, because I just wouldn't have done those things.

But for the applications overall, I think it's improved a lot, because we can implement best practices more consistently and catch regressions thanks to the aforementioned unit tests and documentation.


Oh, it will improve by several orders of magnitude.

But even then, it's not 'replacing' you.

It's just going to let you spend less time on BS and more time on the things that are your maximal value contributions to a project.

If you had a dozen junior or mid-level devs you could hand work off to, would that save you time? Would you kick back and not review what they were doing, particularly around business-critical parts of the software?

The conversation around AI has become obscenely binary, pulling from (now obsolete) SciFi influences to cast it as humans vs machines.

But it's a false dichotomy. Collaborative efforts are almost certainly where this is going, and 100% human or 100% AI will both be significantly inferior to a mix of both.


For sure it will still mostly make sense to have a division of labour where you have people who are focused on building software.

The question is if generative AI is powerful enough to reduce the number of programmers needed to achieve a task, without creating enough opportunities to replace those programmers.

Before we are all replaced there could be a moment where demand for software engineers is 10x less.


Society would simply demand more capable and complex software. Specialized industrial applications that currently look like Windows 98-era Java apps would be expected to be as polished as iOS.


I don't think there is some natural law that dictates we will need enough new software that we will always increase demand in the face of efficiency gains.

For industrial applications in particular they need to be functional and operable, not shiny.


I think the real problem is going to be increased volatility in the work market. You get a chaotic situation in which the bullet that strikes you is the one you would have never guessed. For example, it could be that short term, the increased productivity squeezes workers in every industry and the concern becomes increased competition. You aren't getting replaced by AI, you're getting replaced by someone who out-competed you.

The market may adjust over the longer term, or it may just continue to be volatile as the rate of change accelerates. In that case, we can't fix the work market, and we instead have to address the need for people to feed themselves another way.


"Very tedious without it" doesn't sound like just 5% improvement?

I've started developing in a new language and I can hardly do any work without the LLM assistance; the friction is just too high. Even when auto-completions are completely wrong, they still get the ball rolling: it's so much easier to fix the nicely formatted code than to write from scratch. In my case the improvement is vast, the difference between slacking off and actually being productive.


Agreed 100%. It's helpful at filling out some functions (if you name them correctly) and boilerplate code. Eventually they will get better, because these things get orders of magnitude better with orders of magnitude more scale. Society will have to do something about all the jobs at that point, but hopefully we'll get a sense of how close or far that is with ChatGPT 5 and the next versions coming up.


The biggest benefit, I’ve found, is it makes me comment my code. If I can make the AI understand what I want, then it turns out that three months later I’ll also be able to understand the code.


That's the worst part about generative AI IMO - it makes writing new code faster - it barely helps with editing existing code. So when someone eventually updates the code and forgets to update the comments I wouldn't be surprised if the misleading comments made AI hallucinate.


I believe that AI will get so good at creating new code that a lot of existing libraries will be left unused. What is the point of using lots of libraries if AI can generate the code we need directly? The AI will be the library itself, and the generated code will embed the knowledge about doing lots of things for which we used libraries.


>What is the point of using lots of libraries if AI can generate the code we need directly?

They've been debugged.


And also have documentation.


At some point the models will produce code with a lower error rate than existing libraries.


How do you ensure quality?

Review and testing.

Reviewing is easier when there is less code (i.e. libraries are in use)


It’s crazy how many people miss this. GPT models can review code too! They can also write and run tests. Once the context window is big enough to fit the whole code base into it they will be better at review than you are. Eventually we’ll have fine tuned models that are experts in any subject you can think of, the only barrier is data and a lot of recent research is showing that that can be machine generated too.


GPT 4 pre nerf was terrible at reviewing non-trivial or non textbook code. I've decided to test it for a few weeks by checking stuff I caught in review or as bugs, to see if it would spot it. It was like 0% on first try (would always talk about something irrelevant) and after leading it with follow up questions it would figure out the problem half of the time and half of the time I'd just give up leading it.

These were tricky problems that were small scope - I've picked them so I could easily provide it to GPT for review.

So I doubt larger context window will do much.


It’s hard to tell why you ran into such a problem without seeing how you prompted but I can offer a few pointers. Use the OpenAI playground instead of chat, it allows you to specify the system prompt and edit the conversation before each submission. System prompt is good for providing general context, tools and options but you absolutely must provide a few example interactions in the conversation. Even just two prompt and response pairs will strongly influence the rest of the conversation. You can use that to shape the responses however you like and it focuses the model on the task at hand. If you get a bad response, delete or edit it. Bad examples beget more bad responses.
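Concretely, the few-shot setup described above just means placing example user/assistant pairs ahead of the real request in the messages list. The role/content message shape below is the one OpenAI's chat API uses; the helper function and the example strings are invented for illustration.

```python
def build_review_messages(system_prompt, examples, code_to_review):
    """Assemble a chat-completions message list with few-shot examples.

    examples: list of (user_code, assistant_review) pairs that shape the
    style and focus of the model's subsequent responses.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_code, assistant_review in examples:
        messages.append({"role": "user", "content": user_code})
        messages.append({"role": "assistant", "content": assistant_review})
    messages.append({"role": "user", "content": code_to_review})
    return messages

msgs = build_review_messages(
    "You are a meticulous code reviewer. Point out bugs only.",
    [("def add(a, b): return a - b", "Bug: subtracts instead of adding.")],
    "def mean(xs): return sum(xs) / len(xs)",
)
assert [m["role"] for m in msgs] == ["system", "user", "assistant", "user"]
```

In the Playground you can also edit or delete any of these turns before resubmitting, which is the "bad examples beget more bad responses" point above.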


What's the point of comments, then? Just feed it back in in three months.


The only widely available LLM-based autocomplete is GitHub Copilot, which is based on GPT 3.

Notably, it's not GPT 3.5, it's 3.0, which is pretty stupid as far as the state of the art goes.

The upcoming Copilot X will be based on GPT 4, which has "sparks of AGI".

In my experience there is no comparison. GPT 3 is barely good enough for some trivial tab-complete tasks. GPT 4 can do quite complex tasks like generating documentation, useful tests, finding obscure bugs, etc...


LLMs no matter how clever have no agency or creativity or ability to innovate, anticipate beyond what they’re prompted, etc. It’s crucial to realize that LLM chat interfaces disguise the fact they’re still completing a prompt. This isn’t AGI as AGI requires agency. GPT4/5 or whatever successor might be a key building block, and I suspect we’ve already discovered the missing elements in classical AI and the challenge will be integration, constraint, feedback, etc, but nothing will make LLMs alone AGI. That shouldn’t be surprising. Our brains are composed of many models, some heuristic, some optimizers, some solvers, some constrainers, and some generative. The answer won’t be a single magic thing operating in a black box. It’ll be an ensemble. We already see this effort beginning with plugins and things like langchain. This is the path forward.


> “No agency or creativity or ability to innovate, anticipate beyond what they’re prompted, etc.”

Sadly, you’ve just described the majority of the developers I’ve had to work with recently.

Most have no agency, write boilerplate code with no creativity, need their hand held every step of they way, and won’t do anything they’re not explicitly ordered to do.

You probably work in an SV startup with a highly skilled workforce. Out there in the real world there are armies of low-skill H1Bs and outsourcers that will soon be replaced with automation.

It’s a recurring theme in economics. Outsource to low cost labour, insource with automation, repeat.


I’m describing the human mind, which all people have. But I get your point, and I think most of those people who aren’t particularly adept or skilled or interested in their jobs might find their jobs are more easily done by more adept or skilled or interested people. Consider digging tunnels. John Henry was skilled and adept and interested in what he did. He could beat the steam drill (at a cost!). But if you visit tunnel digs today, it’s not a thousand people slinging hammers, most of them unskilled and uninterested in the labor itself. It’s a thousand skilled engineers digging tunnels never dreamed of in John Henry’s day.


“Our brains are composed of many models, some heuristic, some optimizers, some solvers, some constrainers, and some generative.”

We need an AI that iteratively tweaks its own architecture (to recreate and surpass those modules which are necessary for human thought), and maps out hardware enhancements* to accommodate the new architecture.

*I seem to remember Google working on ML software that proposes new chip designs a few years ago



