Depending on when that was: in 2018 the free model was the macOS speech engine, in 2019 it was a fast but relatively weak model, and as of late 2021 it's a much stronger model. I'm currently working on the next model series with a lot more resources than I had before.
It's also worth saying that if you only tried things out briefly, there are a handful of reasons recognition may have seemed worse. Talon uses a strict command system by default because that improves precision and speed for trained users; the tradeoff is that it's more confusing for people who haven't learned it yet.
For example, Talon isn't in "dictation mode" by default, so you need to switch to that if you're trying to write email-like text and don't want to prefix your phrases with a command like "say".
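To make that concrete, here is a minimal sketch in .talon command syntax. The phrase capture and the insert() action are built into Talon, but this exact rule is illustrative rather than Talon's stock configuration:

    # hypothetical user file: prefix free-form text with "say" in command mode
    say <phrase>: insert(phrase)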
The timeout system may also be confusing at first. When you pause, Talon assumes you're done speaking and tries to run whatever you said. You can mitigate this by speaking faster or by increasing the timeout.
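Concretely, the pause length is the speech.timeout setting (the setting name is real; the value below is just an illustrative starting point), which you can bump in any .talon file:

    settings():
        speech.timeout = 0.5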
The default commands (like the alphabet) may also just not be very good for some accents; that will be the case for any speech engine, and you will likely need to change commands that are hard to enunciate in your accent.
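As a sketch of what that change looks like: with the popular community configuration, the spoken alphabet is a Talon list you can override from Python. The list name user.letter matches that config rather than Talon core, and the replacement words are only examples:

    from talon import Context

    ctx = Context()
    # Swap in spoken words that are easier to enunciate in your accent.
    # Note: assigning the list replaces it wholesale, so a real override
    # should include all 26 letters, not just the ones you change.
    ctx.lists["user.letter"] = {
        "alpha": "a",
        "bravo": "b",
        # ... one easy-to-say word per remaining letter
    }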
I recommend joining the Slack [1] and asking there if you want more specific feedback. I definitely want to support many accents, and I even have some users testing Talon with other spoken languages.
I don't know what type of speech each dataset represents, but the Talon results are extremely impressive... I assume it wasn't trained on at least some subset (depending on the train/test split) of this data?
A handful of the datasets I tested are fully held out (I have reason to believe none of the models were trained on them), and Talon was trained on none of the dev or test data of any of the datasets in question.
Due to Whisper's weakly supervised training on a large amount of automatically scraped data, and its reliance on a bigger language model, it's far more likely that Whisper has seen some of the test data before.
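To make the contamination concern concrete, here's a rough Python sketch (not how either model was actually audited) of the weakest possible check: exact transcript overlap between a scraped training corpus and a benchmark test set. Real leakage is usually fuzzier than this (near-duplicate text, re-encoded audio), so passing this check proves very little:

    def normalize(text: str) -> str:
        # case- and whitespace-insensitive comparison
        return " ".join(text.lower().split())

    def exact_overlap(train_transcripts, test_transcripts):
        # flag test utterances whose transcripts appear verbatim in training
        train_set = {normalize(t) for t in train_transcripts}
        return [t for t in test_transcripts if normalize(t) in train_set]

    train = ["the quick brown fox", "hello world"]
    test = ["Hello   world", "a novel sentence"]
    print(exact_overlap(train, test))  # ['Hello   world']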