
I am pretty convinced that for most types of day-to-day work, any perceived improvements from the latest Claude models, for example, were total placebo. In blind tests with normal tasks, people would probably have no idea whether they're using Opus 4.5 or 4.6.


This has basically been my experience since Sonnet 3.5. I've been working on a personal project on and off with various models and tools since then. The biggest difference between then and now is that it will do larger chunks of work than it did before, but the quality of the code is not particularly better: I still have to do a lot of cleanup, and it still goes off the rails pretty frequently. I have to write fewer individual prompts, but reviewing the code takes longer because I also have to mentally process and fix larger chunks of code.

Is it a better user experience now? Yes. Has it boosted my productivity on this project? Absolutely.

But it still needs a ton of hand-holding for anything complicated, and I still deal with tons of "OK, this bug is fixed now!" followed by manually confirming that the bug still exists.


It's because they are getting so good it's impossible to recognize them.

Haiku 4.5 is already so good it's OK for 80% (95%?) of dev tasks.


I must be writing very different software than you. I keep Opus on a tight leash and it still comes to the strangest conclusions.


Very possible. Some things work like a charm on the first try for me; for others you can spell it out again and again, and then yet again. Something to do with training data, obviously.


I've found Haiku to be truly mediocre to work with. If you want a cheap model, the open-source ones are much better.


4.6 has been a very, very slight regression for me, but the tradeoff is that they've added better compaction and now larger context windows. That's a reasonable tradeoff for me.


I'd agree with you on 4.5 to 4.6, but going from gpt-5 or 4.0 to 4.5 was night and day.


GPT-5 added the router, which was def a downgrade. 4.5 was probably the best non-CoT model humanity has made, but too expensive to run.


Because post 4.0 dropped the sycophancy?



