with quantization + CPU offloading, non-thinking models run reasonably well (around 2-5 tokens per second) even with 8 GB of VRAM
sure, it would be great if we could have models in all sizes imaginable (7/13/24/32/70/100+/1000+ B), but 20B and 120B cover a lot of ground.
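
for anyone wondering what the quantization + CPU offloading setup looks like in practice, here's a minimal sketch using llama-cpp-python with a GGUF quant — the model filename, layer count, and context size below are illustrative placeholders you'd tune for your own card, not specific recommendations:

```python
# minimal sketch: run a quantized model with partial GPU offload,
# keeping the remaining layers on the CPU
from llama_cpp import Llama

llm = Llama(
    model_path="./model-q4.gguf",  # hypothetical quantized model file
    n_gpu_layers=20,  # however many layers fit in 8 GB VRAM; the rest run on CPU
    n_ctx=4096,       # context window; larger contexts cost more memory
)

out = llm("Explain CPU offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

the key knob is n_gpu_layers: raise it until you run out of VRAM, and everything left over is computed on the CPU, which is where the 2-5 tokens/s figure comes from.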