It shines on hard problems that have a definite answer.
Google's IMO gold model used parallel reasoning. I don't know exactly what theirs looks like, but their Mind Evolution paper described an approach similar to my llm-consortium. The main difference is that theirs carries out isolated reasoning threads, while mine, in its default mode, shares the synthesized answer back to the models. I don't have pockets deep enough to run benchmarks on a consortium, but I did try the example problems from that paper, and my method also solved them using gemini-1.5. Those were path-finding problems, like finding the optimal schedule for a trip given multiple people's calendars, locations, and transport options.
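The loop I'm describing can be sketched roughly like this. This is not the actual llm-consortium code, just a minimal illustration of the default mode: each round, every member answers in parallel, an arbiter synthesizes, and the synthesis is fed back into the next round's prompt (the isolated-reasoning variant would skip that feedback step). The model and arbiter functions here are stand-ins for real LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor

def consortium(members, synthesize, question, rounds=2):
    """Hypothetical sketch: parallel answers -> synthesis -> feedback."""
    context = ""  # shared synthesis; empty on the first round
    for _ in range(rounds):
        # In default mode, the previous synthesis is shared back to all members.
        prompt = question if not context else f"{question}\n\nCurrent synthesis:\n{context}"
        with ThreadPoolExecutor() as pool:
            answers = list(pool.map(lambda m: m(prompt), members))
        context = synthesize(answers)  # arbiter merges the candidate answers
    return context

# Trivial stand-ins so the sketch runs without any API calls:
members = [lambda p, i=i: f"m{i}:{p.splitlines()[0]}" for i in range(3)]
synth = lambda answers: max(answers, key=len)  # toy arbiter: pick longest
result = consortium(members, synth, "What is 2+2?", rounds=2)
print(result)
```

Swapping the stand-ins for real model calls (and an LLM-backed arbiter prompt) gives you the basic shape; the real tool also handles confidence thresholds and stopping criteria, which are omitted here.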
And it obviously works for code and math problems. My first test was to give the llm-consortium code to a consortium to look for bugs. It identified a serious bug that only one of the three models detected. So in that case it saved me time: using the models on their own would have missed the bug or required multiple attempts.