Hacker News

There has been a huge increase in context windows recently.

I think the larger problem is "effective context" and training data.

Being technically able to use a large context window doesn't mean a model can actually remember or attend to that larger context well. In my experience, the kinds of synthetic "needle in haystack" tasks that AI companies use to show how large of a context their model can handle don't translate very well to more complicated use cases.
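For readers unfamiliar with these benchmarks, here is a minimal sketch of how a synthetic needle-in-haystack test is typically constructed: a single out-of-place fact is buried at some depth in filler text, and the model is scored on whether it can surface it. All names, the filler sentence, and the "passcode" fact are made up for illustration; real benchmarks vary the needle, depth, and context length systematically.

```python
# Hypothetical needle-in-haystack prompt builder (illustrative only).
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret passcode is 7431."  # made-up fact to retrieve
QUESTION = "What is the secret passcode?"

def build_prompt(context_tokens: int, depth: float) -> str:
    """Embed the needle at a relative depth (0.0 = start, 1.0 = end)
    inside filler text of roughly `context_tokens` whitespace tokens."""
    n_filler = context_tokens // len(FILLER.split())
    sentences = [FILLER] * n_filler
    sentences.insert(int(depth * len(sentences)), NEEDLE + " ")
    return "".join(sentences) + "\n\n" + QUESTION

def passed(model_answer: str) -> bool:
    # Scoring is just a substring match on the needle's payload,
    # which is exactly why passing says little about harder tasks.
    return "7431" in model_answer

prompt = build_prompt(context_tokens=2000, depth=0.5)
```

The triviality of `passed` is the point of the comment above: retrieving one verbatim string is a much weaker skill than reasoning over relationships spread across the whole context.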

You can create data with large context for training by synthetically adding in random stuff, but there's not a ton of organic training data where something meaningfully depends on something 100,000 tokens back.

Also, even if the compute cost isn't scaling exponentially with context length, it's still scaling (quadratically, for standard attention): at what point does RAG become more effective than just having a large context?
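To make the RAG alternative concrete, here is a toy retrieval sketch: chunks are ranked against the query and only the top hits go into the prompt, so prompt size stays constant no matter how large the corpus gets. The scoring here is crude lexical overlap purely for illustration; a real system would use embeddings, and all the example chunks are invented.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def score(query: str, chunk: str) -> int:
    # Crude lexical overlap (multiset intersection of tokens);
    # a real RAG system would use embedding similarity instead.
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum((q & c).values())

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Keep only the k best-scoring chunks, regardless of corpus size.
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

chunks = [
    "Invoices are stored in the billing database.",
    "The deploy script lives in scripts/deploy.sh.",
    "Context windows keep growing every model release.",
]
top = retrieve("where is the deploy script", chunks, k=1)
```

The trade-off the question points at: retrieval caps prompt cost but can miss relevant context, while a giant window pays (at least) quadratic attention cost to avoid that miss.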



Great point about the meaningful datasets; this makes perfect sense, especially with regard to SFT and RLHF. Although I suppose it would be somewhat easier to do pretraining on really long contexts (books, I assume?)



