
These things are a matter of writing a correct prompt. Like "use this and that naming convention for variables and keep this and that style".

You can also ask it to write tests for the code so you can verify it is working, or add your own additional tests.
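To make that concrete, here is a made-up sketch of the verification step. slugify stands in for whatever the model generated; the tests below it are the part you write yourself:

    import re

    def slugify(text: str) -> str:
        # stand-in for the LLM-generated function under review
        text = text.strip().lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)
        return text.strip("-")

    def test_basic():
        assert slugify("Hello, World!") == "hello-world"

    def test_edge_cases():
        # the edge cases are where generated code tends to slip
        assert slugify("  --  ") == ""
        assert slugify("Crème brûlée") == "cr-me-br-l-e"

    if __name__ == "__main__":
        test_basic()
        test_edge_cases()
        print("ok")

If the generated code fails your own tests, you correct it or regenerate; the tests are the part you actually have to trust.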

Even if the produced code is wrong, it usually takes a few steps to correct it and still saves time.

It is an amazing tool and time saver if you know what you are doing, but it helps with research as well. For instance, if you want to code something in a domain you know little about, it can give you ideas about where to look, and you can then improve your prompts based on that.

As for taking anyone's job: that's like saying a spreadsheet is going to replace accountants.

It's just a tool.



> These things are a matter of writing a correct prompt.

No, they aren't.

ChatGPT doesn't know things. It's just a very fancy predictive text engine. For any given prompt, it will provide a response that is engineered to sound authoritative, regardless of whether any information is correct.

It will summon case law out of the aether when prompted by a lawyer; it will conjure paper titles and author names from thin air when prompted by a researcher; and it will just as readily generate semantically meaningless code. It's absolutely ludicrous to assert that you just need a "better prompt" to counteract these kinds of responses, because this is not a bug — it's literally just how it works.


Read the next sentence after your quote. The point is that you should include code and examples in your prompt (Copilot works as well as it does because it includes the surrounding code and open files in the prompt, so it understands your specific context), not that you should craft an exceptional "act as rockstar engineer" prompt.


I did read it, but the whole premise is flawed due to an apparently incomplete understanding of how LLMs work. Including code samples in your prompt won't have the effect you think it will.

LLMs are trained to produce results that are statistically likely to be syntactically well-formed, according to assumptions made about how "language" works. So when you provide code samples, the model incorporates them into the response. But it doesn't have any actual comprehension of what's going on in those code samples, or of what any code "means"; it's all just pushing syntax around. So you end up with responses that are more likely to look like what you want, but there's no guarantee, or even necessarily a correlation, that the tuned responses will produce meaningfully good code. This increases the odds of a bug slipping by because, at a glance, the output looked correct.

Until LLMs can generate code with proofs of semantic meaning, I don't think it's a good idea to trust them. You're welcome to do as you please, of course, but I would never use them for anything I work on.


If it works, it works, and it definitely works for me. I've been using Copilot for about a year and I can't imagine coding without it again. I cannot recall any bugs slipping by because of it. If anything, it makes me write fewer bugs, since it has no problem taking tedious edge cases into account.


> I've been using Copilot for about a year and I can't imagine coding without it again

I, for example, used Copilot for two months at work and wouldn't pay for it. Most suggestions were either useless or buggy. But I work in a huge C++ codebase; maybe that's hard for it, since C++ also seems hard for ChatGPT.


I think this is incorrect for most use-cases. LLMs do grok code semantically. Adding requests for coding style injects implementation specificity when flattening the semantic multidimensionality back into language.


No, they do not. That's not how LLMs work, and stating that it is betrays an absolute lack of understanding of the underlying mechanisms.

LLMs generate statistically likely sequences of tokens. Their statistical model is derived from huge corpora, such as the contents of the entire (easily searchable) internet, more or less. This makes it statistically likely that, given a common query, they will produce a common response. In the realm of code, this makes it likely the response will be semantically meaningful.
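To make that concrete, here is a toy sketch of that generation loop. The scores are fake; a real model computes them from billions of parameters, but the loop around them has the same shape, and nothing in it ever checks what a token means:

    import math, random

    vocab = ["return", "x", "+", "1", "y"]

    def fake_logits(tokens):
        # stand-in for the model: one score per vocabulary item
        return [random.uniform(-2, 2) for _ in vocab]

    def softmax(scores):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def generate(context, steps=6):
        tokens = list(context)
        for _ in range(steps):
            probs = softmax(fake_logits(tokens))
            # the next token is sampled from a distribution; no semantics anywhere
            tokens.append(random.choices(vocab, weights=probs)[0])
        return " ".join(tokens)

    print(generate(["def", "inc(x):", "return"]))

Everything a real LLM adds on top of this is a better way of scoring the next token, not a model of what the program does.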

But the statistical model doesn't know what the code means. It can't. (And trying to use large buzzwords to convince people otherwise doesn't prove anything, for what it's worth.)

To see what I mean, just ask ChatGPT about a slightly niche area. I work in programming languages research at a university, and I can't tell you how many times I've had to address student confusion because an LLM generated authoritative-sounding semantic garbage about my domain areas. It's not just that it was wrong; it makes things up in every facet of the exercise to a degree that a human simply couldn't. They don't understand things; they generate text from statistical models, and nothing more.


The question is, do you actually save time by coaxing the language model into an answer or would you just save time by writing it yourself?

I have this friend who gets obsessed with things very easily, and ChatGPT got to him quite a bit. He spent about two months perfecting his AI persona and starts every chat with several hundred words of directions before asking any questions. I find that, even so, it still produces wrong answers much of the time.


For me, it saves time by keeping my momentum up. I don't lean on it when I'm in a flow state and cruising through the code I'm writing, but as soon as I hit a wall I jump over to chat and start working on a solution with it. That saves a huge amount of time that would otherwise be spent banging my head against the keyboard, googling, reading random SO posts, dev blogs, and documentation, or even just the time wasted when I get frustrated enough to stop working on the problem and wind up browsing HN.


My first step when using an LLM is asking it to produce a test suite for a function, with a load of example inputs and outputs. Nine times out of ten, I've been presented with something incorrect that I need to fix first.
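To give one invented but representative example of the shape of those mistakes: expected outputs that read fine at a glance and are simply wrong, like assuming Python's round() rounds halves up.

    # Corrected version of the kind of assertion I have to fix; a generated
    # test will happily claim round(2.5) == 3, which looks plausible but
    # fails, because Python 3 rounds halves to the nearest even number.
    assert round(2.5) == 2
    assert round(3.5) == 4
    assert round(0.5) == 0
    print("ok")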

I'm reasonably good at being specific and clear in my directions, but I quickly arrived at the conclusion that LLMs are simply not good at producing accurate code in a way that saves me time.



