At first glance, CS3.5 appears to be slightly faster than gpt-4o (62 vs 49 tok/sec) and slightly less capable (78% vs 89% accuracy on our internal reasoning benchmark). When it first launched, gpt-4o ran at over 100 tok/sec; I'm surprised its speed dropped so quickly.
Do you let it use CoT? I think that first one is pretty hard if the model has to produce the answer directly, one token at a time, but I guess that's kind of the point.