you're missing the point. SAT multiple choice negatives for random guesses, fine...

ACCount37 · 2025-09-06T17:34:45 1757180085

In RLVR? Quite easily.

And OpenAI has induced hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They used o4-mini as an example - similar training to o3 and similar issues.

Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.

RugnirViking · 2025-09-06T19:29:25 1757186965

isn't this just related to the question "how do you train a transformer"? you give it wrong examples, and use optimization algorithms to move away from those kinds of completions
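
To make the "move away from wrong completions" idea concrete, here is a minimal sketch of a verifiable reward that penalizes wrong answers and gives zero for abstaining, in the spirit of the SAT-style negative marking mentioned upthread. The function names, the `wrong_penalty` parameter, and the numbers are all assumptions for illustration, not anything OpenAI has described.

```python
def reward(answer, truth, wrong_penalty=1.0):
    """Hypothetical reward: +1 if correct, 0 if the model abstains (None),
    -wrong_penalty if it answers incorrectly."""
    if answer is None:
        return 0.0
    return 1.0 if answer == truth else -wrong_penalty


def expected_guess_reward(p_correct, wrong_penalty=1.0):
    """Expected reward of answering when the answer is right with probability p_correct."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct, wrong_penalty=1.0):
    """Answering beats abstaining (reward 0) only if its expected reward is positive."""
    return expected_guess_reward(p_correct, wrong_penalty) > 0.0
```

With five choices and a quarter-point penalty, `expected_guess_reward(0.2, 0.25)` comes out to roughly zero, so a blind guess gains nothing over abstaining. With no penalty, guessing always has non-negative expected reward, which is the incentive problem being discussed.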

throwawaymaths · 2025-09-06T21:27:46 1757194066

that's quite hard for the reasons i explained. might be solvable using Q-learning techniques, but those are not easy in the context of transformers iiuc