
You linked a paper with no results and no conclusion. Perhaps you meant to link a different paper?


I never finished it.


So it's unproven? What's the value of it, then?


It’s how we trained roughly 40 GPT 1.5B models. The technique works; it’s up to you to try it out.


The abstract mentions fine-tuning, not full pre-training?


Yeah, sorry for not being precise. We used the technique to fine-tune around 40 GPT 1.5B models, including the chess one.

It was very apparent that the technique was working well. The loss curve suddenly started dropping dramatically the first day we got it working.



