Conversational data between human beings is exactly the sort of token stream you’d want to train a massive in-house LLM on. Then you could keep embeddings for the local stuff on-device and still call the models “private.”
The magic data dust is only useful if they can use all of it, though. I just see an incentive there that I can’t imagine them ignoring for more than a couple of years.