Conversational data between human beings is exactly the sort of tokens you’d want to train a massive in-house LLM on. Then you could have embeddings for the local stuff on-device, and still market the models as “private.”
The magic data dust is only useful if they can use all of it, though. I just see an incentive there that I can’t imagine them ignoring in a couple of years.
Facebook uses user data to target ads. Apple doesn’t do this, so they don’t need the data.
If Siri gets better, it’s because Apple have switched it to a better LLM, not because Apple started sucking up magic data dust.
I’m pretty sure that Siri can already read your messages. That’s the point: it’s a speech interface that runs on your phone.