
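Should be in the RoPE paper. The OG transformers used additive sinusoidal embeddings (a fixed position vector added to each token embedding), while RoPE is multiplicative: it rotates each pair of query/key dimensions by an angle that depends on position.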
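Rough sketch of the difference, in case it helps. This is plain Python with toy 4-dim vectors; the function names, the interleaved sin/cos layout, and the base of 10000 are my assumptions for illustration, not anyone's actual implementation.

```python
import math

def sinusoidal_embedding(pos, dim, base=10000.0):
    """Absolute position vector; it gets *added* to the token embedding."""
    out = []
    for i in range(0, dim, 2):
        angle = pos / base ** (i / dim)
        out += [math.sin(angle), math.cos(angle)]
    return out

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by a position-dependent angle."""
    out = []
    for i in range(0, len(vec), 2):
        angle = pos / base ** (i / len(vec))
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]   # toy query vector
k = [0.5, -1.0, 2.0, 0.0]  # toy key vector

# Additive: position info is summed into the token embedding.
token_plus_pos = [t + p for t, p in zip(q, sinusoidal_embedding(5, 4))]

# Rotary: the q.k score depends only on the offset between the two positions.
score_a = dot(rope(q, 5), rope(k, 3))    # offset 2
score_b = dot(rope(q, 12), rope(k, 10))  # offset 2 again
```

score_a and score_b should agree up to float rounding, which is the relative-position property people usually cite for RoPE. Rotation also leaves vector norms unchanged.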

There's also NoPE. I think SmolLM3 "uses NoPE" (i.e., doesn't apply any positional encoding) in every fourth layer.


