
Native FP4 quantization means the weights need half as many bytes as there are parameters (4 bits each), with next to zero quality loss (on the order of 0.1%) compared to an 8-bit version that needs twice the VRAM and much more expensive hardware. FP3 and below get messier.
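
To put rough numbers on the "half as many bytes as parameters" point, here is a back-of-the-envelope sketch. The 70B parameter count is a made-up example, not a figure from this thread, and the helper name `weight_bytes` is hypothetical. It counts raw weight storage only; real FP4 formats usually also store per-block scale factors, and the KV cache and activations come on top, so actual VRAM use will be somewhat higher.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Raw weight storage in bytes, ignoring scale factors and other overhead."""
    return num_params * bits_per_param / 8


def to_gib(num_bytes: float) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


# Hypothetical model size, chosen only for illustration.
EXAMPLE_PARAMS = 70_000_000_000

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit (FP4)", 4)]:
    size = weight_bytes(EXAMPLE_PARAMS, bits)
    print(f"{label:>12}: {to_gib(size):7.1f} GiB for weights alone")
```

At 4 bits per weight the byte count is exactly half the parameter count, and each step down from 16 to 8 to 4 bits halves the footprint again.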

