    > To highlight the main strength of o1 pro mode (improved reliability), we 
    > use a stricter evaluation setting: a model is only considered to solve a 
    > question if it gets the answer right in four out of four attempts ("4/4 
    > reliability"), not just one.

So, $200/mo. gets you less than 12.5% randomly wrong answers?

And $20/mo. gets you >25% randomly wrong answers?
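
For a rough feel of what "4/4 reliability" implies, here is a minimal sketch. It assumes each attempt is independent with the same per-attempt accuracy `p`, which the quote does not state and which real models likely violate. The function names are hypothetical, just for illustration.

```python
def strict_pass_rate(p: float, attempts: int = 4) -> float:
    """Probability that all `attempts` tries are correct,
    ASSUMING independent attempts with a fixed accuracy p."""
    return p ** attempts


def per_attempt_accuracy(strict_rate: float, attempts: int = 4) -> float:
    """Inverse of the above: the per-attempt accuracy implied by an
    observed all-correct rate, under the same independence assumption."""
    return strict_rate ** (1.0 / attempts)


if __name__ == "__main__":
    # Illustrative inputs only, not figures from the announcement.
    for p in (0.5, 0.8, 0.9):
        print(f"p={p:.2f} -> 4/4 rate={strict_pass_rate(p):.4f}")
```

Under that assumption the strict score drops quickly as `p` falls, so a 4/4 number cannot be read directly as a per-answer error rate.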