>people re-creating R1 (some claim for $30)

R1 or the R1 finetunes? Not the same...

littlestymaar · on Jan 26, 2025

This is indeed a massive exaggeration. I'm pretty sure the $30 experiment is this one: https://threadreaderapp.com/thread/1882839370505621655.html (GitHub: https://github.com/Jiayi-Pan/TinyZero).

And while it is true that this experiment shows you can reproduce the concept of applying reinforcement learning directly to an existing LLM, in a way that makes it develop reasoning in the same fashion DeepSeek-R1 did, this is very far from a re-creation of R1!

m3kw9 · on Jan 26, 2025

Maybe they mistake recreation for the cp command