I don't exactly have the ideal hardware to run locally, but I just ran the 20B in LM Studio on a 3080 Ti (12 GB VRAM) with some layers offloaded to the CPU. Ran a couple of quick code-generation tests and averaged about 20 tokens/sec. Response quality was on par with ChatGPT o3 for the same prompts, so it's not bad.
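For anyone who wants to try the same partial-offload setup without LM Studio, here's a minimal sketch using the llama.cpp CLI (which LM Studio uses as its inference backend). The GGUF filename, layer count, and prompt are placeholders, not what I actually ran; you'd tune --n-gpu-layers until VRAM is nearly full and let the rest run on the CPU.

```bash
# Partial GPU offload with llama.cpp: put as many transformer layers
# on the GPU as 12 GB of VRAM allows, run the remainder on the CPU.
# Model path and layer count below are hypothetical examples.
./llama-cli \
  -m ./models/20b-model.Q4_K_M.gguf \
  --n-gpu-layers 24 \
  --ctx-size 8192 \
  -p "Write a quicksort implementation in Python"
```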