The only one it doesn't win is SWE bench which it is significantly behind Claude... | Hacker News


ilaksh 4 months ago | parent | context | favorite | on: Gemini 3 Pro Model Card [pdf]

The only one it doesn't win is SWE-bench, where it is significantly behind Claude Sonnet. You just can't take down Sonnet.

svantana 4 months ago | [–]

One percentage point is not significant, in either the colloquial or the scientific sense [1].

[1] The binomial formula gives a 95% confidence interval of about ±3.7 percentage points, using p=0.77, N=500.
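
For anyone who wants to check the arithmetic in [1], here is a minimal sketch. It assumes the figure comes from the normal-approximation (Wald) interval for a binomial proportion, which the comment does not state explicitly. The function name `binomial_margin_of_error` and the z-value of 1.959964 for 95% confidence are illustrative choices, not anything from the model card.

```python
import math


def binomial_margin_of_error(p: float, n: int, z: float = 1.959964) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion.

    p: observed success rate (assumed 0.77, as in the comment)
    n: number of benchmark tasks (assumed 500, as in the comment)
    z: standard-normal quantile; 1.959964 corresponds to 95% confidence
    """
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return z * standard_error


moe = binomial_margin_of_error(p=0.77, n=500)
print(f"±{moe * 100:.1f} percentage points")  # prints "±3.7 percentage points"
```

Under these assumptions the margin is roughly 3.7 points either side of the score, so a one-point gap between two models sits well inside the noise of a single 500-task run.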

stavros 4 months ago | [–]

Codex has been much better than Sonnet for me.

dotancohen 4 months ago | [–]

On what types of tasks?
