Hacker News

> respect signals like subscription paywalls, the robots.txt file, the HTML “noindex” keyword, terms of service, and other means by which copyright holders signal their intentions.

And if they disrespect those signals & terms, and lie, what then?

> the copyrighted content has been ingested, but it is detected during the output phase as part of an overall content management pipeline.

Yes, but how can it be detected reliably? Considering there is much to be gained in fooling us. (think parallel construction)



> Yes, but how can it be detected reliably? Considering there is much to be gained in fooling us. (think parallel construction)

I was fooling around with a fan-made add-on training module for NovelAI.

I had a blast; reconstructing a few different narratives and sort of melding them together was a lot of fun.

I let the tool name the characters. And the names it came up with were better than halfway decent. So I kept them.

Turns out that while NovelAI makes some sort of best effort to remove copyrighted proper nouns, the additional training module had reinserted some. After it had selected the first name, it immediately selected that character's brother for the next one.

If I hadn't gotten suspicious and googled the names in depth, I might have spread the story around. It wasn't ever destined to be published, but I could see people falling into the same trap.


The linked article cited YouTube's Content ID, so ... clearly reliability isn't expected.



