i don't think they really need to... maybe for citations, but for training, if the content is the same on site A and site B, it doesn't matter which one it was pulled from.
that said, if the content itself is bad then that'd be a problem. we'll probably start seeing that: sites designed to poison LLMs.