
LM Studio has an option on model load that I believe does what you're describing here: "K Cache Quantization Type" (and a similar one for "V"). It's marked as experimental, and the description says its effect is hard to predict. I've never tried it myself, though.

