
I've had success with GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection https://arxiv.org/abs/2403.03507


This uses less memory, so you can do fine-tuning on hardware with less VRAM, but at the cost of slower training: there is a throughput penalty, and the paper detailing the technique reports something like a 15% decrease in throughput. A rough sketch of the idea is below.
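To make the idea concrete, here is a minimal sketch of gradient low-rank projection for a single 2D weight matrix, with a hand-rolled Adam update so the whole thing is self-contained. The names (rank, update_proj_gap, galore_adam_step) are illustrative and not the paper's exact API; the released GaLore code is the real reference implementation.

    # Minimal GaLore-style step: keep Adam state in a low-rank subspace of the
    # gradient, refreshing the projection matrix every few hundred steps.
    # This is a sketch under simplifying assumptions, not the official code.
    import torch

    def galore_adam_step(W, G, state, rank=4, update_proj_gap=200,
                         lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        """One optimizer step with Adam moments stored at rank `rank`."""
        step = state.get("step", 0)

        # Refresh the projection P from the top singular vectors of the
        # current gradient every `update_proj_gap` steps.
        if step % update_proj_gap == 0 or "P" not in state:
            U, _, _ = torch.linalg.svd(G, full_matrices=False)
            state["P"] = U[:, :rank]                              # (m, rank)
            state["m"] = torch.zeros(rank, G.shape[1],
                                     device=G.device, dtype=G.dtype)
            state["v"] = torch.zeros(rank, G.shape[1],
                                     device=G.device, dtype=G.dtype)

        P = state["P"]
        R = P.T @ G                                               # (rank, n)

        # Standard Adam moment updates, but on the small projected gradient.
        state["m"] = betas[0] * state["m"] + (1 - betas[0]) * R
        state["v"] = betas[1] * state["v"] + (1 - betas[1]) * R * R
        m_hat = state["m"] / (1 - betas[0] ** (step + 1))
        v_hat = state["v"] / (1 - betas[1] ** (step + 1))
        low_rank_update = m_hat / (v_hat.sqrt() + eps)

        # Project the update back to full size and apply it in place.
        W -= lr * (P @ low_rank_update)
        state["step"] = step + 1

The memory saving comes from the optimizer state: the Adam moments are rank x n instead of m x n per weight matrix, while the weights and gradients themselves stay full size. The periodic SVD to refresh the projection is part of where the throughput penalty comes from.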



