The Structure of Neural Embeddings (seanpedersen.github.io)
74 points by sean_pedersen on Dec 27, 2024 | 4 comments


Current embeddings are badly trained and are massively holding back networks. A core issue is something I call 'token drag'. Low-frequency tokens, when they finally come up, drag the model back towards an earlier state, wasting a lot of training. This leads to the first few layers of a model effectively being dedicated to acting as a buffer for the bad embeddings feeding it. Luckily, fixing this is actually really easy. Creating a sacrificial two-layer network to predict embeddings during training (and then just calculating the embeddings once for prod inference) gives a massive boost to training. To see this in action, check out the unified embeddings in this project: https://github.com/jmward01/lmplay
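
Roughly, the mechanism looks something like this minimal PyTorch sketch (class and method names here are illustrative only; see the repo for the real implementation):

    import torch
    import torch.nn as nn

    class SacrificialEmbedding(nn.Module):
        # During training, every token lookup passes through a small
        # two-layer MLP sitting on top of the raw embedding table.
        # After training, the MLP's outputs are baked into an ordinary
        # embedding table so inference pays no extra cost.
        def __init__(self, vocab_size, d_model, d_hidden=1024):
            super().__init__()
            self.raw = nn.Embedding(vocab_size, d_model)
            self.predictor = nn.Sequential(   # the "sacrificial" part
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, token_ids):
            # Training path: embeddings are predicted on every lookup.
            return self.predictor(self.raw(token_ids))

        @torch.no_grad()
        def bake(self):
            # Inference path: run the whole vocab through the predictor
            # once and keep only the resulting static table.
            baked = nn.Embedding(self.raw.num_embeddings,
                                 self.raw.embedding_dim)
            baked.weight.copy_(self.predictor(self.raw.weight))
            return baked

The bake() step is what makes the extra network "sacrificial": once the static table is computed, the predictor is thrown away and inference looks exactly like a normal embedding lookup.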


Do you have a peer-reviewed source you could link on this approach, or is it something you thought of and are experimenting with yourself? I couldn't tell from the lmplay repo in my skim, and the idea is intriguing.


All my own ideas in there. I was thinking of writing it up more formally, but I am more of a 'think -> build -> next thing' kind of person.


Oh wow, great set of reads. Thanks to @sean_pedersen for posting; looking forward to reviewing this in my closeout this year.



