> To highlight the main strength of o1 pro mode (improved reliability), we
> use a stricter evaluation setting: a model is only considered to solve a
> question if it gets the answer right in four out of four attempts ("4/4
> reliability"), not just one.
So $200/mo. gets you less than 12.5% randomly wrong answers?
And $20/mo. gets you more than 25% randomly wrong answers?
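
For what it's worth, here is a rough sketch of how a "4/4" criterion relates to per-attempt accuracy. It assumes each attempt is independent and succeeds with the same probability `p`, which is my simplification and not something the quoted text claims. The function names are made up for illustration, and the 12.5% and 25% figures are plugged in purely as hypothetical per-attempt error rates.

```python
def all_attempts_correct(p: float, attempts: int = 4) -> float:
    """Probability that every one of `attempts` tries is right.

    Assumption: tries are independent with identical success probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p ** attempts


def per_attempt_accuracy_needed(target: float, attempts: int = 4) -> float:
    """Per-attempt accuracy needed to hit a given all-correct rate
    (the inverse of the function above, under the same assumption)."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a probability in [0, 1]")
    return target ** (1.0 / attempts)


if __name__ == "__main__":
    # Hypothetical per-attempt error rates, not measured numbers.
    for error_rate in (0.125, 0.25):
        p = 1.0 - error_rate
        print(f"per-attempt error {error_rate:.1%} -> "
              f"4/4 reliability {all_attempts_correct(p):.1%}")
```

Under that independence assumption, a small per-attempt error rate compounds quickly: the stricter metric penalizes a model much more than its single-attempt accuracy would suggest. Real attempts on the same question are probably correlated, so treat this as a bound-style intuition rather than a prediction.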