On a first glance, CS3.5 appears to be slightly faster than gpt-4o (62 vs 49 tok...

jasondclinton · on June 20, 2024

Have you tried our prompt generator? https://docs.anthropic.com/en/docs/build-with-claude/prompt-... . We've seen it improve performance.

freediver · on June 20, 2024

The benchmark is meant to be zero-shot, so no prompt tweaking.

jasondclinton · on June 20, 2024

Got it, thanks for the feedback!

sebzim4500 · on June 20, 2024

I'm not asking for actual examples, but what kind of thing is in your internal reasoning benchmark?

freediver · on June 20, 2024

Things like “summarize this text in exactly 14 words”, programming questions, unstructured-to-structured data transformations, and so on…
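To illustrate how a task like the 14-word summary could be checked automatically, here is a minimal sketch. It is not the benchmark's actual scoring code: the function name `has_exact_word_count` and the rule that a "word" is any whitespace-separated token are assumptions made for this example.

```python
def has_exact_word_count(text: str, target: int = 14) -> bool:
    """Return True if `text` has exactly `target` whitespace-separated tokens.

    Assumption: a "word" is anything separated by whitespace, so
    punctuation attached to a word does not count separately.
    """
    return len(text.split()) == target


if __name__ == "__main__":
    # Hypothetical model output, used only to exercise the check.
    summary = (
        "The quick brown fox jumps over the lazy dog "
        "while the cat sleeps soundly"
    )
    print(has_exact_word_count(summary))
```

A stricter grader might count hyphenated words or numbers differently, so the pass rate on a task like this depends on the counting rule chosen.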

sebzim4500 · on June 20, 2024

Do you let it use CoT? I think that first one is pretty hard if you have to produce it directly one token at a time, but I guess that's kind of the point.