
You linked a paper with no results and no conclusion. Perhaps you meant to link a different paper?


I never finished it.


So it's unproven? What's the value of it, then?


It’s how we trained roughly 40 GPT 1.5B models. The technique works; it’s up to you to try it out.


The abstract mentions fine-tuning, not full pre-training?


Yeah, sorry for not being precise. We used the technique to fine-tune around 40 GPT 1.5B models, including the chess one.

It was very apparent that the technique was working well. The loss curve suddenly started dropping dramatically the first day we got it working.



