Is 200% a way of saying 3x quicker? The little 10% reasoning performance decrease...

loherj · 2025-07-05T10:28:56 1751711336

Yes. If you look at the diagram that plots performance against the number of output tokens, you can see that R1T2 uses about 1/3 of the output tokens that R1-0528 uses.

Keep in mind, the speed improvement doesn’t come from the model running any faster (it’s the exact same architecture as R1, after all) but from using fewer output tokens while still achieving very good results.
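To spell out the arithmetic: if per-token speed is unchanged and a response needs about 1/3 of the tokens, it finishes in about 1/3 of the time, which is 3x as fast, or 200% faster. A quick sketch, where the function name is just for illustration and the only assumption is that generation time scales linearly with output tokens:

```python
def speedup_from_token_ratio(token_ratio: float) -> tuple[float, float]:
    """Convert an output-token ratio into a speed multiple and a "% faster" figure.

    token_ratio is new_tokens / old_tokens. Assumes the time per token is the
    same for both models, so total time scales with the token count.
    """
    speed_multiple = 1.0 / token_ratio              # e.g. 1/3 of the tokens -> 3x
    percent_faster = (speed_multiple - 1.0) * 100.0  # 3x as fast -> 200% faster
    return speed_multiple, percent_faster


# About 1/3 of the output tokens, as read off the diagram.
multiple, percent = speedup_from_token_ratio(1 / 3)
print(f"{multiple:.1f}x as fast, i.e. {percent:.0f}% faster")
```

So "200% faster" and "3x as fast" describe the same thing here; "300% faster" would mean 4x.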

MangoToupe · 2025-07-05T10:27:27 1751711247

> The little 10% reasoning performance decrease seems worth it

We need about three orders of magnitude more tests to make these numbers meaningful.
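For a rough sense of why sample size matters, here is a sketch that treats a benchmark score as a pass rate over n independent questions. The pass rate of 0.5 and the question counts are made up for illustration, not taken from any actual benchmark run:

```python
import math


def pass_rate_standard_error(p: float, n: int) -> float:
    """Standard error of a pass rate p measured on n independent questions."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical numbers: a 50% pass rate on 100 questions vs. 1000x as many.
for n in (100, 100_000):
    se = pass_rate_standard_error(0.5, n)
    print(f"n={n}: standard error is about {se * 100:.2f} percentage points")
```

Under these assumptions, 1000x more questions shrinks the noise by a factor of about sqrt(1000), roughly 32.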

loherj · 2025-07-05T10:34:20 1751711660

Fair point. More benchmarks would definitely be good, but I’m optimistic that they will show similar results.

Anecdotally, I can say that my personal experience with the model is in line with what the benchmarks claim: it’s a bit smarter than R1, a bit faster than R1, and much faster than R1-0528, but not quite as smart as R1-0528. (Faster meaning fewer output tokens.) For me, it’s at a sweet spot and I use it as my daily driver.