
To maximize quality we didn't want to use LoRA, so we did a native fine-tune on 32 A100-80GB GPUs with a sequence length of 4096. A native fine-tune is possible on as few as 8 A100-80GB GPUs with DeepSpeed ZeRO-3, but it will take longer.
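For anyone curious what that looks like in practice, here's a minimal sketch of a ZeRO-3 config as a Python dict passed to the Hugging Face Trainer; the bf16 and batch-size values are assumptions, not the exact setup described above:

    # Sketch of a DeepSpeed ZeRO-3 config, passed to the Hugging Face
    # Trainer via TrainingArguments(deepspeed=ds_config). Stage 3 shards
    # parameters, gradients, and optimizer state across GPUs, which is
    # what lets a full fine-tune fit on fewer cards.
    ds_config = {
        "bf16": {"enabled": True},                 # assumed precision
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        "train_micro_batch_size_per_gpu": "auto",  # resolved by the Trainer
        "gradient_accumulation_steps": "auto",
        "train_batch_size": "auto",
    }

The "auto" values are filled in from the Trainer's own arguments, so the same config works whether you launch on 8 or 32 GPUs.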

With LoRA you can probably get away with just a few 4090s.
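Roughly, a LoRA setup with PEFT looks like the sketch below; the model name, rank, alpha, and target modules are placeholders, not anything specific to the run described above:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # LoRA sketch: only the small adapter matrices are trained, so the
    # gradients and optimizer state fit in consumer-GPU memory.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model
    lora_config = LoraConfig(
        r=16,                       # assumed rank
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # prints the small trainable fraction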


