Meanwhile, this new paper claims that GPT-5 surpasses medical professionals in medical reasoning:
"On MedXpertQA MM, GPT-5 improves reasoning and understanding scores by +29.62% and +36.18% over GPT-4o, respectively, and surpasses pre-licensed human experts by +24.23% in reasoning and +29.40% in understanding."
That's quite interesting. It also implies GPT-4o was worse than the experts, since GPT-5's margin over GPT-4o is larger than its margin over them, so presumably GPT-3.5 was much worse. I wonder where RFK Jr would come on that scale.
https://arxiv.org/abs/2508.08224