riidom | 23 days ago | on: Right-sizes LLM models to your system's RAM, CPU, ...
LM Studio has an option on model load that I believe does what you're describing here: "K Cache Quantization Type" (and a similar one for "V"). It's marked as experimental, and the description says the effect is hard to predict. I've never tried it myself, though.
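
For a rough sense of how much memory a setting like this could save, here is a minimal sketch of the usual KV-cache size estimate. It assumes the option maps to llama.cpp-style cache types (f16, q8_0, q4_0) with their block sizes, which is my assumption and not something confirmed for LM Studio. The model shape in the example is hypothetical, and the function name is made up for illustration. It says nothing about output quality, which is the hard-to-predict part.

```python
# Rough KV-cache size estimate for different cache element types.
# Assumption: llama.cpp-style formats, where
#   f16  stores 2 bytes per value,
#   q8_0 stores 32 values in 34 bytes (32 int8 values + one fp16 scale),
#   q4_0 stores 32 values in 18 bytes (16 bytes of 4-bit values + one fp16 scale).
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   k_type="f16", v_type="f16"):
    """Estimate KV-cache size in bytes for one sequence (hypothetical helper)."""
    # K and V each hold one head_dim vector per layer, per KV head, per token.
    values_per_tensor = n_layers * n_kv_heads * head_dim * context_len
    return values_per_tensor * (BYTES_PER_ELEMENT[k_type] + BYTES_PER_ELEMENT[v_type])


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(**shape, k_type=cache_type, v_type=cache_type)
        print(f"{cache_type}: {size / 2**30:.2f} GiB")
```

K and V are passed separately because the setting is exposed as two options, so mixed choices such as `k_type="q8_0", v_type="f16"` can be compared as well.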