
> The little 10% reasoning performance decrease seems worth it

We need about three orders of magnitude more tests to make these numbers meaningful.



Fair point. More benchmarks are definitely good, but I’m optimistic that they will show similar results.

Anecdotally, my personal experience with the model is in line with what the benchmarks claim: it’s a bit smarter than R1, a bit faster than R1, and much faster than R1-0528, but not quite as smart as R1-0528. (Faster meaning fewer output tokens.) For me, it’s at a sweet spot and I use it as my daily driver.



