Nope, gemma3 and qwen3 exist of many sizes, including very small ones, that 4-bit quantized can run on very small systems. Qwen3-0.6B, 1.7B, ... imagine if you quantize those to 4 bit. But there is the space for the KV cache, if we don't want to limit the runs to very small prompts.