
At first glance, CS3.5 appears to be slightly faster than gpt-4o (62 vs 49 tok/sec) and slightly less capable (78% vs 89% accuracy on our internal reasoning benchmark). When it initially launched, gpt-4o ran at over 100 tok/sec; I'm surprised its speed dropped that fast.


Have you tried our prompt generator? https://docs.anthropic.com/en/docs/build-with-claude/prompt-... . We've seen it improve performance.


The benchmark is designed to be zero-shot, so no prompt tweaking.


Got it, thanks for the feedback!


I'm not asking for actual examples, but what kind of thing is in your internal reasoning benchmark?


Things like “summarize this text in exactly 14 words”, programming questions, unstructured data to structured data transformations and so on…
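
For the first kind of task, a pass/fail check could look roughly like this. It is only a minimal sketch: the function names are hypothetical, and counting a "word" as a whitespace-separated token is an assumption, not necessarily how the benchmark scores it.

```python
def word_count(text: str) -> int:
    # Assumption: a "word" is any whitespace-separated token,
    # so "zero-shot" counts as one word and punctuation sticks to its word.
    return len(text.split())


def passes_exact_length(summary: str, target: int = 14) -> bool:
    # Strict check: one word too many or too few is a failure.
    return word_count(summary) == target


if __name__ == "__main__":
    candidate = "The quick brown fox jumps over the lazy dog while the cat sleeps soundly"
    print(word_count(candidate), passes_exact_length(candidate))
```

A strict equality check like this gives no partial credit, which fits a zero-shot setup where the model gets a single attempt.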


Do you let it use CoT? I think that first one is pretty hard if you have to produce it directly one token at a time, but I guess that's kind of the point.



