My LLM agent is currently running an experiment that generates many pelicans. It will compare various small-model consortiums against the same models running solo.
It should push the new pelicans to the repo after each run.
The horizon-beta results are up already. It's not small or open source, but I tested it anyway, and you can already see an improvement using a 2+1 consortium (2 models plus an arbiter) for that model.
https://irthomasthomas.github.io/Pelicans-consortium/ https://github.com/irthomasthomas/Pelicans-consortium
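For anyone curious what a 2+1 consortium looks like, here is a minimal sketch: two member models each draft an answer and a third arbiter model picks between them. The function names and the length-based tie-break are purely illustrative stand-ins (the real pipeline calls actual LLM APIs and an LLM arbiter):

```python
# Hypothetical sketch of a 2+1 consortium: two member models draft,
# a third "arbiter" chooses. All model calls are stubbed out here;
# a real run would hit an LLM API instead.

def model_a(prompt: str) -> str:
    # stand-in for the first consortium member
    return f"A's pelican for: {prompt}"

def model_b(prompt: str) -> str:
    # stand-in for the second consortium member
    return f"B's pelican for: {prompt}"

def arbiter(prompt: str, candidates: list[str]) -> str:
    # stand-in arbiter: just picks the longest draft; a real arbiter
    # would judge quality with another model call
    return max(candidates, key=len)

def consortium_2_plus_1(prompt: str) -> str:
    # 2+1: gather drafts from both members, let the arbiter decide
    drafts = [model_a(prompt), model_b(prompt)]
    return arbiter(prompt, drafts)

print(consortium_2_plus_1("an SVG of a pelican riding a bicycle"))
```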