
>people re-creating R1 (some claim for $30)

R1 or the R1 finetunes? Not the same thing...

HF is busy recreating R1 itself, but that seems to be a pretty big endeavour, not a $30 thing.



This is indeed a massive exaggeration. I'm pretty sure the $30 experiment is this one: https://threadreaderapp.com/thread/1882839370505621655.html (GitHub: https://github.com/Jiayi-Pan/TinyZero).

And while it is true that this experiment shows you can reproduce the concept of applying reinforcement learning directly to an existing LLM, so that it develops reasoning in the same fashion DeepSeek-R1 did, this is very far from a re-creation of R1!


Maybe they mistake recreation for the cp command.



