GodelNumbering | 6 months ago | on: Open models by OpenAI
> The 20B model runs on my Mac laptop using less than 15GB of RAM.
I was about to try the same. What TPS are you getting and on which processor? Thanks!
hrpnk | 6 months ago
gpt-oss-20b: 9 threads, 131072 context window, 4 experts - 35-37 tok/s on M2 Max via LM Studio.
rt1rz | 6 months ago
Interestingly, I'm also on an M2 Max and get ~66 tok/s in LM Studio with the same 131072 context window. I have full offload to GPU, and I also turned on flash attention in advanced settings.
hrpnk | 6 months ago
Thank you! Flash attention indeed gives me a boost to ~66 tok/s.
mdz4040 | 6 months ago
55 tok/s here on an M4 Pro; turning on flash attention puts it at 60 tok/s.
mekpro | 6 months ago
I got 70 tok/s on an M4 Max.
mhitza | 6 months ago
That M4 Max is really something else; I also get 70 tokens/second on eval on an RTX 4000 SFF Ada server GPU.
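
For anyone wanting to check their own numbers, here's a minimal sketch for measuring tok/s against LM Studio's OpenAI-compatible local server. It assumes the server is running on the default port 1234; the model id and prompt are placeholders, so substitute whatever LM Studio shows for the model you have loaded:

    # Rough tok/s measurement against LM Studio's local server
    # (OpenAI-compatible API, default http://localhost:1234/v1).
    import time
    import requests

    URL = "http://localhost:1234/v1/chat/completions"
    payload = {
        "model": "openai/gpt-oss-20b",  # placeholder; use the id LM Studio shows
        "messages": [{"role": "user",
                      "content": "Explain flash attention in one paragraph."}],
        "max_tokens": 256,
        "stream": False,
    }

    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=600)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start

    # usage.completion_tokens is part of the standard OpenAI response shape.
    tokens = resp.json()["usage"]["completion_tokens"]
    print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")

Note that the elapsed time includes prompt processing, so this slightly underestimates pure generation speed; LM Studio also reports generation tok/s in its UI after each response.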