Agree. Concrete example: "What was the Japanese codeword for Midway Island in WW...

anorwell · 2025-08-06T00:07:30 1754438850

I think your example reflects well on oss-20b, not poorly. It (may) show that they've been successful in separating reasoning from knowledge. You don't _want_ your small reasoning model to waste weights memorizing minutiae.

sailingparrot · 2025-08-06T10:25:49 1754475949

> gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after 3 minutes !

To be fair, this is not the type of questions that benefit from reasoning, either the model has this info in it's parametric memory or it doesn't. Reasoning won't help.

bigmanhank · 2025-08-06T08:53:27 1754470407

Not true: During World War II the Imperial Japanese Navy referred to Midway Island in their communications as “Milano” (ミラノ). This was the official code word used when planning and executing operations against the island, including the Battle of Midway.

12.82 tok/sec 140 tokens 7.91s to first token

openai/gpt-oss-20b

WmWsjA6B29B4nfk · 2025-08-06T09:15:03 1754471703

What's not true? This is a wrong answer

bigmanhank · 2025-08-07T06:21:08 1754547668

this was the answer from my instance. it is true. "not true" was refering to the poster

seba_dos1 · 2025-08-06T12:41:37 1754484097

How would asking this kind of question without providing the model with access to Wikipedia be a valid benchmark for anything useful?