Hacker News
ilaksh
4 months ago
on: Gemini 3 Pro Model Card [pdf]
The only benchmark it doesn't win is SWE-bench, where it is significantly behind Claude Sonnet. You just can't take down Sonnet.
svantana
4 months ago
One percentage point is not significant, neither in the colloquial nor the scientific sense[1].
[1] The binomial formula gives a 95% confidence interval of about ±3.7 percentage points, using p=0.77, N=500.
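For anyone who wants to check the footnote's arithmetic, here is a minimal sketch. It assumes the figure comes from the usual normal-approximation (Wald) interval for a binomial proportion, with z = 1.96 for 95% confidence. The function name `binomial_margin_of_error` is made up for illustration, and the comment does not say which interval formula was actually used.

```python
import math


def binomial_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion.

    p: observed success rate (0.77 in the footnote)
    n: number of independent trials (500 in the footnote)
    z: normal quantile; 1.96 is the assumed value for 95% confidence
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Values taken from the footnote above.
margin = binomial_margin_of_error(p=0.77, n=500)
print(f"margin of error: {margin:.3f}")  # prints "margin of error: 0.037"
```

Under these assumptions the half-width is roughly 3.7 percentage points, so a one-point gap between two models sits well inside the noise.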
stavros
4 months ago
Codex has been much better than Sonnet for me.
dotancohen
4 months ago
On what types of tasks?