> They don't show a massive jump in capabilities over your average transformer
Long context is a massive capability improvement.
> But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
Transformers themselves were an incremental improvement over RNNs with attention, and in terms of capabilities they weren't immediately superior to their predecessors.
What changed the game was that they were vastly cheaper to train, which made it possible to train massive models on phenomenal amounts of data.
Linear-attention models, being much more compute-efficient than transformers at long context lengths, may lead to a similar breakthrough.
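Roughly where that efficiency comes from, as a minimal numpy sketch (the shapes and the elu+1 feature map are my own assumptions, in the spirit of kernelized/linear attention, not anyone's production implementation): softmax attention materializes an n×n score matrix, so its cost grows as O(n²·d), while linear attention regroups the matmuls so the cost grows as O(n·d²), i.e. linear in sequence length.

```python
import numpy as np

n, d = 4096, 64                       # sequence length, head dimension
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)

# Softmax attention: the (n, n) score matrix makes the cost O(n^2 * d).
scores = Q @ K.T / np.sqrt(d)                          # (n, n)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out_softmax = weights @ V                              # (n, d)

# Linear attention: a positive feature map phi lets the matmuls be
# regrouped as phi(Q) @ (phi(K).T @ V), so the cost is O(n * d^2).
def phi(x):
    return np.where(x > 0, x + 1.0, np.exp(x))         # elu(x) + 1

Qf, Kf = phi(Q), phi(K)
kv = Kf.T @ V                                          # (d, d)
z = Qf @ Kf.sum(axis=0)                                # (n,) normalizer
out_linear = (Qf @ kv) / z[:, None]                    # (n, d)
```

The two outputs aren't identical, of course; linear attention swaps the softmax kernel for one that factorizes, and that factorization is exactly what buys the linear-in-n cost.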
It's very hard to tell in advance what will be a marginal improvement and what will be a game changer.