Yeah, probably right. If you want a great rabbit hole, look up "Common Crawl" and see how a great academic project was absolutely hijacked for pennies on the dollar to grab training data - the foundation for every LLM out there right now.
It was meant to be an open-source compilation of the crawled internet so that research could be done on web search given how opaque Google's process is. It was NOT meant to be a cheap source of data for for-profit LLM's to train on.
(Shrug) Multiple not-for-profit LLMs have trained on it as well.
If something I worked on turned out to play a significant part in something that turned out to be that big a deal, I'd be OK with it. And nobody's stopping people from doing web-search studies with it, to this day.