Hacker News

> The 20B model runs on my Mac laptop using less than 15GB of RAM.

I was about to try the same. What TPS are you getting and on which processor? Thanks!



gpt-oss-20b: 9 threads, 131072 context window, 4 experts - 35-37 tok/s on M2 Max via LM Studio.


Interestingly, I am also on an M2 Max, and I get ~66 tok/s in LM Studio with the same 131072 context window. I have full offload to GPU, and I also turned on flash attention in advanced settings.


Thank you! Flash attention gives me a boost to ~66 tok/s indeed.


55 tok/s here on an M4 Pro; turning on flash attention puts it at 60 tok/s.


I got 70 tok/s on an M4 Max.


That M4 Max is really something else. I also get 70 tok/s on eval on an RTX 4000 SFF Ada server GPU.
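For anyone reproducing these numbers: LM Studio exposes an OpenAI-compatible server (by default on http://localhost:1234/v1), so throughput can be checked with a stdlib-only script. This is a sketch, not LM Studio's own benchmark; the model identifier below is an assumption, so check the one your install reports (e.g. via GET /v1/models).

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second."""
    return completion_tokens / elapsed_s


def benchmark(base_url: str = "http://localhost:1234/v1",
              model: str = "openai/gpt-oss-20b") -> float:
    """Time one non-streaming completion against a local LM Studio server.

    `model` is an assumed identifier; substitute whatever name your
    LM Studio instance lists for the downloaded model.
    """
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "Write a haiku about laptops."}],
        "max_tokens": 256,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # The OpenAI-style response reports generated tokens under usage.
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    print(f"{benchmark():.1f} tok/s")
```

Note that wall-clock timing here includes prompt processing, so the result will read slightly below LM Studio's generation-only tok/s figure.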



