I really wonder whether the self-supervised data flywheel will be enough to reproduce this model.
There are also so many different tweaks that could be made to try to get these models to perform better. Kudos to the team reproducing it all in the open.